CGs are list of CG lines, which are lists of CG tokens.
- §1. Introduction
- §2. Text to a CG token list
- §7. The GTC
- §9. Logging
- §10. Parsing nonliteral tokens
- §12. Determining
- §13. Scoring
§1. Introduction. Until 2021, CG tokens were held as parse nodes in the syntax tree, with a special type TOKEN_NT and a set of annotations, but as cute as that was it was also obfuscatory, and now each CG token corresponds to a cg_token object as follows:
typedef struct cg_token { struct wording text_of_token; int grammar_token_code; struct parse_node *what_token_describes; 0 or else one of the *_GTC values struct binary_predicate *token_relation; struct noun_filter_token *noun_filter; struct command_grammar *defined_by; int slash_class; used in slashing: see CGLines::slash int slash_dash_dash; ditto struct cg_token *next_token; in the list for a CG line CLASS_DEFINITION } cg_token; cg_token *CGTokens::cgt_of(wording W, int lit) { cg_token *cgt = CREATE(cg_token); cgt->text_of_token = W; cgt->slash_dash_dash = FALSE; cgt->slash_class = 0; cgt->what_token_describes = NULL; cgt->grammar_token_code = lit?LITERAL_GTC:0; cgt->token_relation = NULL; cgt->noun_filter = NULL; cgt->defined_by = NULL; cgt->next_token = NULL; return cgt; }
- The structure cg_token is accessed in 5/cgl and here.
§2. Text to a CG token list. Tokens are created when text such as "drill [something] with [something]" is parsed, from an Understand sentence or elsewhere. What happens is much the same as when text with substitutions is read: the text is retokenised by the lexer to produce the following, in which the square brackets have become commas:
"drill" , something , "with" , something
In fact we use a different punctuation set from the lexer's default, because we want forward slashes to break words, so that we need / to be a punctuation mark: thus "get away/off/out" becomes
"get" "away" / "off" / "out"
define GRAMMAR_PUNCTUATION_MARKS L".,:;?!(){}[]/" note the slash
cg_token *CGTokens::tokenise(wording W) { wchar_t *as_wide_string = Lexer::word_text(Wordings::first_wn(W)); Reject this if it contains punctuation2.1; wording TW = Feeds::feed_C_string_full(as_wide_string, TRUE, GRAMMAR_PUNCTUATION_MARKS); Reject this if it contains two consecutive commas2.2; cg_token *tokens = CGTokens::break_into_tokens(TW); if (tokens == NULL) { StandardProblems::sentence_problem(Task::syntax_tree(), _p_(PM_UnderstandEmptyText), "'understand' should be followed by text which contains at least " "one word or square-bracketed token", "so for instance 'understand \"take [something]\" as taking' is fine, " "but 'understand \"\" as the fog' is not. The same applies to the contents " "of 'topic' columns in tables, since those are also instructions for " "understanding."); } return tokens; }
§2.1. Reject this if it contains punctuation2.1 =
int skip = FALSE, literal_punct = FALSE; for (int i=0; as_wide_string[i]; i++) { if (as_wide_string[i] == '[') skip = TRUE; if (as_wide_string[i] == ']') skip = FALSE; if (skip) continue; if ((as_wide_string[i] == '.') || (as_wide_string[i] == ',') || (as_wide_string[i] == '!') || (as_wide_string[i] == '?') || (as_wide_string[i] == ':') || (as_wide_string[i] == ';')) literal_punct = TRUE; } if (literal_punct) { StandardProblems::sentence_problem(Task::syntax_tree(), _p_(PM_LiteralPunctuation), "'understand' text cannot contain literal punctuation", "or more specifically cannot contain any of these: . , ! ? : ; since they " "are already used in various ways by the parser, and would not correctly " "match here."); return NULL; }
- This code is used in §2.
§2.2. Reject this if it contains two consecutive commas2.2 =
LOOP_THROUGH_WORDING(i, TW) if (i < Wordings::last_wn(TW)) if ((compare_word(i, COMMA_V)) && (compare_word(i+1, COMMA_V))) { StandardProblems::sentence_problem(Task::syntax_tree(), _p_(PM_UnderstandCommaCommand), "'understand' as an action cannot involve a comma", "since a command leading to an action never does. " "(Although Inform understands commands like 'PETE, LOOK' " "only the part after the comma is read as an action command: " "the part before the comma is read as the name of someone, " "according to the usual rules for parsing a name.) " "Because of the way Inform processes text with square " "brackets, this problem message is also sometimes seen " "if empty square brackets are used, as in 'Understand " "\"bless []\" as blessing.'"); return NULL; }
- This code is used in §2.
§3. The following tiny Preform grammar is then used to break up the resulting text at commas:
<grammar-token-breaking> ::= ... , ... | ==> { NOT_APPLICABLE, - } <quoted-text> | ==> { TRUE, - } ... ==> { FALSE, - }
- This is Preform grammar, not regular C code.
§4. The following function takes a wording and turns it into a linked list of CG tokens, divided by commas:
cg_token *CGTokens::break_into_tokens(wording W) { return CGTokens::break_into_tokens_r(NULL, W); } cg_token *CGTokens::break_into_tokens_r(cg_token *list, wording W) { <grammar-token-breaking>(W); switch (<<r>>) { case NOT_APPLICABLE: { wording LW = GET_RW(<grammar-token-breaking>, 1); wording RW = GET_RW(<grammar-token-breaking>, 2); list = CGTokens::break_into_tokens_r(list, LW); list = CGTokens::break_into_tokens_r(list, RW); break; } case TRUE: Word::dequote(Wordings::first_wn(W)); if (*(Lexer::word_text(Wordings::first_wn(W))) == 0) return list; W = Feeds::feed_C_string_full(Lexer::word_text(Wordings::first_wn(W)), FALSE, GRAMMAR_PUNCTUATION_MARKS); LOOP_THROUGH_WORDING(i, W) { cg_token *cgt = CGTokens::cgt_of(Wordings::one_word(i), TRUE); list = CGTokens::add_to_list(cgt, list); } break; case FALSE: { cg_token *cgt = CGTokens::cgt_of(W, FALSE); list = CGTokens::add_to_list(cgt, list); break; } } return list; }
§5. If list represents the head of the list (and is NULL for an empty list), this adds cgt at the end and returns the new head.
cg_token *CGTokens::add_to_list(cg_token *cgt, cg_token *list) { if (list == NULL) return cgt; if (cgt == NULL) return list; cg_token *x = list; while (x->next_token) x = x->next_token; x->next_token = cgt; return list; }
§6. As the above shows, the text of a token is not necessarily a single word, unless it's a literal.
wording CGTokens::text(cg_token *cgt) { return cgt?(cgt->text_of_token):(EMPTY_WORDING); }
§7. The GTC. The GTC, or grammar token code, is a sort of type indicator for tokens. As produced by the tokeniser above, tokens initially have GTC either UNDETERMINED_GTC or LITERAL_GTC. Differentiation of non-literal tokens into other types happens in CGTokens::determine.
Note that there are two sets of GTC values, one set positive, one negative. The negative ones correspond closely to command-parser grammar reserved tokens in the old I6 compiler, and this is indeed what they compile to if we are generating I6 code.
define NAMED_TOKEN_GTC 1 define RELATED_GTC 2 define STUFF_GTC 3 define ANY_STUFF_GTC 4 define ANY_THINGS_GTC 5 define LITERAL_GTC 6 define UNDETERMINED_GTC 0 define NOUN_TOKEN_GTC -1 like I6 noun define MULTI_TOKEN_GTC -2 like I6 multi define MULTIINSIDE_TOKEN_GTC -3 like I6 multiinside define MULTIHELD_TOKEN_GTC -4 like I6 multiheld define HELD_TOKEN_GTC -5 like I6 held define CREATURE_TOKEN_GTC -6 like I6 creature define TOPIC_TOKEN_GTC -7 like I6 topic define MULTIEXCEPT_TOKEN_GTC -8 like I6 multiexcept
int CGTokens::is_literal(cg_token *cgt) { if ((cgt) && (cgt->grammar_token_code == LITERAL_GTC)) return TRUE; return FALSE; } int CGTokens::is_I6_parser_token(cg_token *cgt) { if ((cgt) && (cgt->grammar_token_code < 0)) return TRUE; return FALSE; } int CGTokens::is_topic(cg_token *cgt) { if ((cgt) && (cgt->grammar_token_code == TOPIC_TOKEN_GTC)) return TRUE; return FALSE; }
§8. A multiple token is one which permits multiple matches in the run-time command parser: for instance, the player can type ALL where a MULTI_TOKEN_GTC is expected.
int CGTokens::is_multiple(cg_token *cgt) { switch (cgt->grammar_token_code) { case MULTI_TOKEN_GTC: case MULTIINSIDE_TOKEN_GTC: case MULTIHELD_TOKEN_GTC: case MULTIEXCEPT_TOKEN_GTC: return TRUE; } return FALSE; }
void CGTokens::log(cg_token *cgt) { if (cgt == NULL) LOG("<no-cgt>"); else { LOG("<CGT%d:%W", cgt->allocation_id, cgt->text_of_token); if (cgt->slash_class != 0) LOG("/%d", cgt->slash_class); if (cgt->slash_dash_dash) LOG("/--"); switch (cgt->grammar_token_code) { case NAMED_TOKEN_GTC: LOG(" = named token"); break; case RELATED_GTC: LOG(" = related"); break; case STUFF_GTC: LOG(" = stuff"); break; case ANY_STUFF_GTC: LOG(" = any stuff"); break; case ANY_THINGS_GTC: LOG(" = any things"); break; case NOUN_TOKEN_GTC: LOG(" = noun"); break; case MULTI_TOKEN_GTC: LOG(" = multi"); break; case MULTIINSIDE_TOKEN_GTC: LOG(" = multiinside"); break; case MULTIHELD_TOKEN_GTC: LOG(" = multiheld"); break; case HELD_TOKEN_GTC: LOG(" = held"); break; case CREATURE_TOKEN_GTC: LOG(" = creature"); break; case TOPIC_TOKEN_GTC: LOG(" = topic"); break; case MULTIEXCEPT_TOKEN_GTC: LOG(" = multiexcept"); break; } LOG(">"); } }
§10. Parsing nonliteral tokens. Unless a token is literal and in double-quotes, it will start out as having UNDETERMINED_GTC until we investigate what the words in it mean, which we will do with the following Preform grammar.
Note that <grammar-token> always matches any text, even if it sometimes throws a problem message on the way. Its return integer is a valid GTC, and its return pointer is a (non-null) description of what the token matches.
<grammar-token> ::= <named-grammar-token> | ==> Apply the command grammar10.1 any things | ==> { ANY_THINGS_GTC, Specifications::from_kind(K_thing) } any <s-description> | ==> { ANY_STUFF_GTC, RP[1] } anything | ==> { ANY_STUFF_GTC, Specifications::from_kind(K_thing) } anybody | ==> { ANY_STUFF_GTC, Specifications::from_kind(K_person) } anyone | ==> { ANY_STUFF_GTC, Specifications::from_kind(K_person) } anywhere | ==> { ANY_STUFF_GTC, Specifications::from_kind(K_room) } something related by reversed <relation-name> | ==> Apply the reversed relation10.2 something related by <relation-name> | ==> Apply the relation10.3 something related by ... | ==> Issue PM_GrammarBadRelation problem10.4 <standard-grammar-token> | ==> { pass 1 } <definite-article> <k-kind> | ==> { STUFF_GTC, Specifications::from_kind(RP[2]) } <s-description> | ==> { STUFF_GTC, RP[1] } <s-type-expression> | ==> Issue PM_BizarreToken problem10.9 ... ==> Issue PM_UnknownToken problem10.10 <standard-grammar-token> ::= something | ==> { NOUN_TOKEN_GTC, Specifications::from_kind(K_object) } things | ==> { MULTI_TOKEN_GTC, Specifications::from_kind(K_object) } things inside | ==> { MULTIINSIDE_TOKEN_GTC, Specifications::from_kind(K_object) } things preferably held | ==> { MULTIHELD_TOKEN_GTC, Specifications::from_kind(K_object) } something preferably held | ==> { HELD_TOKEN_GTC, Specifications::from_kind(K_object) } other things | ==> { MULTIEXCEPT_TOKEN_GTC, Specifications::from_kind(K_object) } someone | ==> { CREATURE_TOKEN_GTC, Specifications::from_kind(K_object) } somebody | ==> { CREATURE_TOKEN_GTC, Specifications::from_kind(K_object) } text | ==> { TOPIC_TOKEN_GTC, Specifications::from_kind(K_understanding) } topic | ==> Issue PM_UseTextNotTopic problem10.5 a topic | ==> Issue PM_UseTextNotTopic problem10.5 object | ==> Issue PM_UseThingNotObject problem10.6 an object | ==> Issue PM_UseThingNotObject problem10.6 something held | ==> Issue something held problem message10.7 things held ==> Issue things held problem message10.8 <named-grammar-token> internal { command_grammar *cg = CommandGrammars::named_token_by_name(W); if (cg) { ==> { -, cg }; return TRUE; } ==> { fail nonterminal }; }
- This is Preform grammar, not regular C code.
§10.1. Apply the command grammar10.1 =
==> { NAMED_TOKEN_GTC, ParsingPlugin::rvalue_from_command_grammar(RP[1]) }
- This code is used in §10.
§10.2. Apply the reversed relation10.2 =
==> { RELATED_GTC, Rvalues::from_binary_predicate(BinaryPredicates::get_reversal(RP[1])) }
- This code is used in §10.
§10.3. Apply the relation10.3 =
==> { RELATED_GTC, Rvalues::from_binary_predicate(RP[1]) }
- This code is used in §10.
§10.4. Issue PM_GrammarBadRelation problem10.4 =
Problems::quote_source(1, current_sentence); Problems::quote_wording(2, W); StandardProblems::handmade_problem(Task::syntax_tree(), _p_(PM_GrammarBadRelation)); Problems::issue_problem_segment( "The grammar token '%2' in the sentence %1 " "invites me to understand names of related things, " "but the relation is not one that I know."); Problems::issue_problem_end(); ==> { RELATED_GTC, Rvalues::from_binary_predicate(R_equality) }
- This code is used in §10.
§10.5. Issue PM_UseTextNotTopic problem10.5 =
Problems::quote_source(1, current_sentence); Problems::quote_wording(2, W); StandardProblems::handmade_problem(Task::syntax_tree(), _p_(PM_UseTextNotTopic)); Problems::issue_problem_segment( "The grammar token '%2' in the sentence %1 would in some " "ways be the right logical way to suggest 'any words at " "all here', but Inform in actually uses the special syntax " "'[text]' for that. %P" "This is partly for historical reasons, but also because " "'[text]' is a token which can't be used in every sort of " "Understand grammar - for example, it can't be used with 'matches' " "or in descriptions of actions or in table columns; it's really " "intended only for defining new commands."); Problems::issue_problem_end(); ==> { TOPIC_TOKEN_GTC, Specifications::from_kind(K_understanding) };
- This code is used in §10 (twice).
§10.6. Issue PM_UseThingNotObject problem10.6 =
Problems::quote_source(1, current_sentence); Problems::quote_wording(2, W); StandardProblems::handmade_problem(Task::syntax_tree(), _p_(PM_UseThingNotObject)); Problems::issue_problem_segment( "The grammar token '%2' in the sentence %1 would in some " "ways be the right logical way to suggest 'any object at " "all here', but Inform uses the special syntax '[thing]' " "for that. (Or '[things]' if multiple objects are allowed.)"); Problems::issue_problem_end(); ==> { MULTI_TOKEN_GTC, Specifications::from_kind(K_object) }
- This code is used in §10 (twice).
§10.7. Issue something held problem message10.7 =
CGTokens::incompatible_change_problem( "something held", "something", "something preferably held"); ==> { HELD_TOKEN_GTC, Specifications::from_kind(K_object) }
- This code is used in §10.
§10.8. Issue things held problem message10.8 =
CGTokens::incompatible_change_problem( "things held", "things", "things preferably held"); ==> { MULTIHELD_TOKEN_GTC, Specifications::from_kind(K_object) }
- This code is used in §10.
§10.9. Issue PM_BizarreToken problem10.9 =
LOG("$T", current_sentence); Problems::quote_source(1, current_sentence); Problems::quote_wording(2, W); Problems::quote_kind_of(3, RP[1]); StandardProblems::handmade_problem(Task::syntax_tree(), _p_(PM_BizarreToken)); Problems::issue_problem_segment( "The grammar token '%2' in the sentence %1 looked to me as " "if it might be %3, but this isn't something allowed in " "parsing grammar."); Problems::issue_problem_end(); ==> { STUFF_GTC, Specifications::from_kind(K_thing) }
- This code is used in §10.
§10.10. Issue PM_UnknownToken problem10.10 =
LOG("$T", current_sentence); Problems::quote_source(1, current_sentence); Problems::quote_wording(2, W); StandardProblems::handmade_problem(Task::syntax_tree(), _p_(PM_UnknownToken)); Problems::issue_problem_segment( "I was unable to understand what you meant by the grammar token '%2' " "in the sentence %1."); Problems::issue_problem_end(); ==> { STUFF_GTC, Specifications::from_kind(K_thing) }
- This code is used in §10.
§11. Something of an extended mea culpa: but it had the desired effect, in that nobody complained about what might have been a controversial change.
void CGTokens::incompatible_change_problem(char *token_tried, char *token_instead, char *token_better) { Problems::quote_source(1, current_sentence); Problems::quote_text(2, token_tried); Problems::quote_text(3, token_instead); Problems::quote_text(4, token_better); StandardProblems::handmade_problem(Task::syntax_tree(), _p_(PM_ObsoleteHeldTokens)); Problems::issue_problem_segment( "In the sentence %1, you used the '[%2]' as a token, which was " "allowed in the early Public Beta versions of Inform 7, but became " "out of date in August 2006.%L A change was then made so that if an " "action needed to apply to something which was carried, this would " "now be specified when the action is created - not in the Understand " "line for it. For instance, one might say 'Dismantling is an action " "which applies to one carried thing', instead of '...which applies " "to one thing', and then write grammar such as 'Understand \"dismantle " "[something] as dismantling' instead of '...[something held]...'. " "So you probably need to change your '[%2]' token to '[%3]', and " "change the action's definition (unless it is a built-in action " "such as 'dropping'). An alternative, though, for fine-tuning is to " "change it to '[%4]', which allows anything to be Understood, but " "in cases of ambiguity tends to guess that something held is more " "likely to be what the player means than something not held."); Problems::issue_problem_end(); }
§12. Determining. To calculate a description of what is being described by a token, then, we call the following function, which delegates to <grammar-token> above.
In the two cases NAMED_TOKEN_GTC and RELATED_GTC the pointer result is a temporary one telling us which named token, and which relation, respectively: we then convert those into the result. In all other cases, the parse_node pointer returned by <grammar-token> is the result.
parse_node *CGTokens::determine(cg_token *cgt, int depth) { if (CGTokens::is_literal(cgt)) return NULL; <grammar-token>(CGTokens::text(cgt)); cgt->grammar_token_code = <<r>>; parse_node *result = <<rp>>; switch (cgt->grammar_token_code) { case NAMED_TOKEN_GTC: cgt->defined_by = ParsingPlugin::rvalue_to_command_grammar(result); result = CommandGrammars::determine(cgt->defined_by, depth+1); break; case ANY_STUFF_GTC: Make sure the result is a description with one free variable12.1; cgt->noun_filter = UnderstandFilterTokens::nft_new(result, TRUE, FALSE); break; case ANY_THINGS_GTC: Make sure the result is a description with one free variable12.1; cgt->noun_filter = UnderstandFilterTokens::nft_new(result, TRUE, TRUE); break; case RELATED_GTC: cgt->token_relation = Rvalues::to_binary_predicate(result); kind *K = BinaryPredicates::term_kind(cgt->token_relation, 0); if (K == NULL) K = K_object; result = Specifications::from_kind(K); break; case STUFF_GTC: Make sure the result is a description with one free variable12.1; cgt->noun_filter = UnderstandFilterTokens::nft_new(result, FALSE, FALSE); break; default: Node::set_text(result, CGTokens::text(cgt)); break; } if (result) Vet the grammar token determination for parsability at run-time12.2; cgt->what_token_describes = result; return cgt->what_token_describes; }
§12.1. If the token determines an actual constant value — as it can when it is a named token which always refers to a specific thing, for example — it is possible for result not to be a description. Otherwise, though, it has to be a description which is true or false for any given value, so:
Make sure the result is a description with one free variable12.1 =
pcalc_prop *prop = Specifications::to_proposition(result); if ((prop) && (Binding::number_free(prop) != 1)) { LOG("So $P and $D\n", result, prop); StandardProblems::sentence_problem(Task::syntax_tree(), _p_(PM_FilterQuantified), "the [any ...] doesn't clearly give a description in the '...' part", "where I was expecting something like '[any vehicle]'."); result = Specifications::from_kind(K_object); }
- This code is used in §12 (three times).
§12.2. Vet the grammar token determination for parsability at run-time12.2 =
if (Specifications::is_description(result)) { kind *K = Specifications::to_kind(result); if ((K_understanding) && (Kinds::Behaviour::is_object(K) == FALSE) && (Kinds::eq(K, K_understanding) == FALSE) && (RTKindConstructors::request_I6_GPR(K) == FALSE)) { Problems::quote_source(1, current_sentence); Problems::quote_wording(2, CGTokens::text(cgt)); StandardProblems::handmade_problem(Task::syntax_tree(), _p_(PM_UnparsableKind)); Problems::issue_problem_segment( "The grammar token '%2' in the sentence %1 invites me to understand " "values typed by the player during play but for a kind of value which " "is beyond my ability. Generally speaking, the allowable kinds of value " "are number, time, text and any new kind of value you may have created - " "but not, for instance, scene or rule."); Problems::issue_problem_end(); result = Specifications::from_kind(K_object); } }
- This code is used in §12.
§13. Scoring. This score is needed when sorting CG lines in order of applicability: see the discussion at CGLines::cgl_determine. The function must return a value which is at least 0 but strictly less than CGL_SCORE_TOKEN_RANGE. The general idea is that higher scores cause tokens to take precedence over lower ones.
int CGTokens::score_bonus(cg_token *cgt) { if (cgt == NULL) internal_error("no cgt"); if (cgt->grammar_token_code == UNDETERMINED_GTC) internal_error("undetermined"); int gtc = cgt->grammar_token_code; switch(gtc) { case STUFF_GTC: return 5; case NOUN_TOKEN_GTC: return 1; case MULTI_TOKEN_GTC: return 1; case MULTIINSIDE_TOKEN_GTC: return 2; case MULTIHELD_TOKEN_GTC: return 3; case HELD_TOKEN_GTC: return 4; case CREATURE_TOKEN_GTC: return 1; case TOPIC_TOKEN_GTC: return 0; case MULTIEXCEPT_TOKEN_GTC: return 3; } return 1; }