When making a compiler, how do you implement an operator that potentially returns a string not present anywhere in the program?
I've tried to implement the `TypeOf` operator in AEC. It works similarly to `typeof` in JavaScript, except that, when invoked on a structure, it returns the name of that structure's type (rather than "object"). For the most part it works; however, there is a problem: the compiler crashes if one tries to evaluate `TypeOf(AddressOf(someStructure))` and the string "SomeStructurePointer" isn't present anywhere in the program. You can read more about it here:
https://github.com/FlatAssembler/AECforWebAssembly/issues/22

Here is how strings are implemented in my programming language. Right after parsing, this function is invoked:
```cpp
protected:
  std::set<std::string> getStringsInSubnodes() const {
    auto setToBeReturned = std::set<std::string>();
    if (text == "asm(" or text == "asm_i32(" or text == "asm_i64(" or
        text == "asm_f32(" or
        text == "asm_f64(") // Inline assembly isn't a string that can be
                            // manipulated, and storing it in memory wastes
                            // memory (potentially a lot of it).
      return std::set<std::string>(); // That's why we will return an empty set,
                                      // as if we had no strings in our subnodes
                                      // (even though we have at least one).
    if (text.size() and text[0] == '"') {
      setToBeReturned.insert(text);
      return setToBeReturned;
    }
    for (const auto &child : children) {
      auto stringsInChild = child.getStringsInSubnodes();
      setToBeReturned.insert(stringsInChild.begin(), stringsInChild.end());
    }
    // Now follows the code for the `TypeOf` operator. You can read more about
    // it here: https://langdev.stackexchange.com/q/4189/330
    for (const auto &basicDataType : basicDataTypeSizes)
      setToBeReturned.insert("\"" + basicDataType.first + "\"");
    if (isPointerType(text))
      setToBeReturned.insert("\"" + demanglePointerType(text) + "\"");
    if (text == "Structure")
      setToBeReturned.insert("\"" + children.at(0).text + "\"");
    return setToBeReturned;
  }
```
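To make the failure concrete, here is a heavily simplified, self-contained model of that collection pass. The `Node` struct and `collectStrings` function are hypothetical stand-ins for `TreeNode` and `getStringsInSubnodes`; inline assembly, basic data types, and pointer types are deliberately left out. The model only illustrates that a purely syntactic traversal collects names that appear in the tree, not names that type checking synthesizes later.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified AST node: only `text` and `children`,
// mirroring the two fields the pass above relies on.
struct Node {
  std::string text;
  std::vector<Node> children;
};

// Collects quoted string literals and quoted structure names, roughly like
// getStringsInSubnodes (inline assembly, basic types and pointers omitted).
std::set<std::string> collectStrings(const Node &node) {
  std::set<std::string> result;
  if (!node.text.empty() && node.text[0] == '"') {
    result.insert(node.text); // a string literal token, quotes included
    return result;
  }
  for (const Node &child : node.children) {
    std::set<std::string> inChild = collectStrings(child);
    result.insert(inChild.begin(), inChild.end());
  }
  if (node.text == "Structure") {
    // Assumption: the structure's name is its first child.
    assert(!node.children.empty());
    result.insert("\"" + node.children.at(0).text + "\"");
  }
  return result;
}
```

In this model, a program that declares `SomeStructure` and evaluates `TypeOf(AddressOf(someStructure))` yields a set containing `"SomeStructure"` but not `"SomeStructurePointer"`, because the pointer type's name never appears as a node.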
Then all the strings collected from the abstract syntax tree into the set are placed at the beginning of the heap memory, like this:
```cpp
auto allTheStrings = getStringsInSubnodes();
for (auto string : allTheStrings) {
  context.globalVariables[string] = context.globalVariablePointer;
  context.variableTypes[string] = "CharacterPointer";
  if (string.back() != '"')
    string += '"';
  globalDeclarations += "\t(data 0 (i32.const " +
                        std::to_string(context.globalVariablePointer) +
                        ") " + string + ")\n";
  context.globalVariablePointer += string.size() - 1;
}
```
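The address arithmetic in that loop can be isolated into a small sketch. `layoutStrings` is a hypothetical helper, not part of AEC, and it assumes every token already carries both quotes and contains no escape sequences. Advancing by `size() - 1` on a token like `"abc"` (5 characters) reserves 4 bytes: the 3 characters between the quotes plus one byte that no data segment writes, which stays zero because WebAssembly linear memory starts zero-initialized.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical helper mirroring the loop above: each quoted literal is
// assigned the current pointer, which then advances by size() - 1.
// Escape sequences such as \n are not accounted for in this sketch.
std::map<std::string, int> layoutStrings(const std::set<std::string> &literals,
                                         int start) {
  std::map<std::string, int> addresses;
  int pointer = start;
  for (const std::string &literal : literals) {
    // Assumption: the token is "..." with both quotes present.
    assert(literal.size() >= 2 && literal.front() == '"' &&
           literal.back() == '"');
    addresses[literal] = pointer;
    // (size - 2) content bytes + 1 untouched (zero) byte = size - 1.
    pointer += static_cast<int>(literal.size()) - 1;
  }
  return addresses;
}
```

Because `std::set` iterates in lexicographic order, `"ab"` is placed at offset 0 and `"xyz"` at offset 3 when starting from 0.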
The `TypeOf` operator itself is implemented like this:
```cpp
if (text == "TypeOf(") {
  if (children.size() != 1) {
    std::cerr << "Line " << lineNumber << ", Column " << columnNumber
              << ", Compiler error: The TypeOf operator has either no "
                 "children or has more than 1 child!"
              << std::endl;
    std::exit(1);
  }
  TreeNode newTreeNode =
      TreeNode("\"" + children.at(0).getType(context) + "\"", lineNumber,
               columnNumber);
  return newTreeNode.compile(context);
}
```
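One general technique, sketched here rather than taken from AEC, is to intern string literals lazily: keep a table that hands out an address the first time code generation requests a literal, and emit the `(data ...)` declarations only after every function has been compiled. `StringPool`, `addressOf`, and `dataDeclarations` are hypothetical names; the spacing rule and the declaration format simply copy the layout loop in the question.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical string pool: a literal gets an address the first time code
// generation asks for it, so names synthesized during compilation (such as
// the result of TypeOf) do not have to be collected in advance.
class StringPool {
public:
  explicit StringPool(int start) : next_(start) {}

  // Returns the address of `literal` (a quoted token such as "\"abc\""),
  // allocating space for it on first use.
  int addressOf(const std::string &literal) {
    assert(literal.size() >= 2 && literal.front() == '"' &&
           literal.back() == '"');
    auto found = addresses_.find(literal);
    if (found != addresses_.end())
      return found->second;
    int address = next_;
    addresses_.emplace(literal, address);
    next_ += static_cast<int>(literal.size()) - 1; // same size() - 1 spacing
    return address;
  }

  // Meant to be called once, after all code has been generated, so every
  // literal requested along the way gets a data declaration.
  std::string dataDeclarations() const {
    std::string out;
    for (const auto &[literal, address] : addresses_)
      out += "\t(data 0 (i32.const " + std::to_string(address) + ") " +
             literal + ")\n";
    return out;
  }

private:
  std::map<std::string, int> addresses_; // literal -> assigned address
  int next_;                             // next free address in the pool
};
```

With a pool like this, the `TypeOf` branch could request the address of the synthesized literal directly instead of relying on the pre-pass. One question the sketch leaves open is where the pool's region sits relative to `context.globalVariablePointer`, since its final size is only known once compilation finishes.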