CGs are list of CG lines, which are lists of CG tokens.


§1. Introduction. Until 2021, CG tokens were held as parse nodes in the syntax tree, with a special type TOKEN_NT and a set of annotations, but as cute as that was it was also obfuscatory, and now each CG token corresponds to a cg_token object as follows:

typedef struct cg_token {
    struct wording text_of_token;
    int grammar_token_code;
    struct parse_node *what_token_describes;  0 or else one of the *_GTC values
    struct binary_predicate *token_relation;
    struct noun_filter_token *noun_filter;
    struct command_grammar *defined_by;
    int slash_class;  used in slashing: see CGLines::slash
    int slash_dash_dash;  ditto
    struct cg_token *next_token;  in the list for a CG line
    CLASS_DEFINITION
} cg_token;

cg_token *CGTokens::cgt_of(wording W, int lit) {
    cg_token *cgt = CREATE(cg_token);
    cgt->text_of_token = W;
    cgt->slash_dash_dash = FALSE;
    cgt->slash_class = 0;
    cgt->what_token_describes = NULL;
    cgt->grammar_token_code = lit?LITERAL_GTC:0;
    cgt->token_relation = NULL;
    cgt->noun_filter = NULL;
    cgt->defined_by = NULL;
    cgt->next_token = NULL;
    return cgt;
}

§2. Text to a CG token list. Tokens are created when text such as "drill [something] with [something]" is parsed, from an Understand sentence or elsewhere. What happens is much the same as when text with substitutions is read: the text is retokenised by the lexer to produce the following, in which the square brackets have become commas:

"drill" , something , "with" , something

In fact we use a different punctuation set from the lexer's default, because we want forward slashes to break words, so that we need / to be a punctuation mark: thus "get away/off/out" becomes

"get" "away" / "off" / "out"
define GRAMMAR_PUNCTUATION_MARKS L".,:;?!(){}[]/"  note the slash
cg_token *CGTokens::tokenise(wording W) {
    wchar_t *as_wide_string = Lexer::word_text(Wordings::first_wn(W));
    Reject this if it contains punctuation2.1;
    wording TW = Feeds::feed_C_string_full(as_wide_string, TRUE,
        GRAMMAR_PUNCTUATION_MARKS);
    Reject this if it contains two consecutive commas2.2;

    cg_token *tokens = CGTokens::break_into_tokens(TW);
    if (tokens == NULL) {
        StandardProblems::sentence_problem(Task::syntax_tree(),
            _p_(PM_UnderstandEmptyText),
            "'understand' should be followed by text which contains at least "
            "one word or square-bracketed token",
            "so for instance 'understand \"take [something]\" as taking' is fine, "
            "but 'understand \"\" as the fog' is not. The same applies to the contents "
            "of 'topic' columns in tables, since those are also instructions for "
            "understanding.");
    }
    return tokens;
}

§2.1. Reject this if it contains punctuation2.1 =

    int skip = FALSE, literal_punct = FALSE;
    for (int i=0; as_wide_string[i]; i++) {
        if (as_wide_string[i] == '[') skip = TRUE;
        if (as_wide_string[i] == ']') skip = FALSE;
        if (skip) continue;
        if ((as_wide_string[i] == '.') || (as_wide_string[i] == ',') ||
            (as_wide_string[i] == '!') || (as_wide_string[i] == '?') ||
            (as_wide_string[i] == ':') || (as_wide_string[i] == ';'))
            literal_punct = TRUE;
    }
    if (literal_punct) {
        StandardProblems::sentence_problem(Task::syntax_tree(), _p_(PM_LiteralPunctuation),
            "'understand' text cannot contain literal punctuation",
            "or more specifically cannot contain any of these: . , ! ? : ; since they "
            "are already used in various ways by the parser, and would not correctly "
            "match here.");
        return NULL;
    }

§2.2. Reject this if it contains two consecutive commas2.2 =

    LOOP_THROUGH_WORDING(i, TW)
        if (i < Wordings::last_wn(TW))
            if ((compare_word(i, COMMA_V)) && (compare_word(i+1, COMMA_V))) {
                StandardProblems::sentence_problem(Task::syntax_tree(),
                    _p_(PM_UnderstandCommaCommand),
                    "'understand' as an action cannot involve a comma",
                    "since a command leading to an action never does. "
                    "(Although Inform understands commands like 'PETE, LOOK' "
                    "only the part after the comma is read as an action command: "
                    "the part before the comma is read as the name of someone, "
                    "according to the usual rules for parsing a name.) "
                    "Because of the way Inform processes text with square "
                    "brackets, this problem message is also sometimes seen "
                    "if empty square brackets are used, as in 'Understand "
                    "\"bless []\" as blessing.'");
                return NULL;
            }

§3. The following tiny Preform grammar is then used to break up the resulting text at commas:

<grammar-token-breaking> ::=
    ... , ... |      ==> { NOT_APPLICABLE, - }
    <quoted-text> |  ==> { TRUE, - }
    ...              ==> { FALSE, - }

§4. The following function takes a wording and turns it into a linked list of CG tokens, divided by commas:

cg_token *CGTokens::break_into_tokens(wording W) {
    return CGTokens::break_into_tokens_r(NULL, W);
}
cg_token *CGTokens::break_into_tokens_r(cg_token *list, wording W) {
    <grammar-token-breaking>(W);
    switch (<<r>>) {
        case NOT_APPLICABLE: {
            wording LW = GET_RW(<grammar-token-breaking>, 1);
            wording RW = GET_RW(<grammar-token-breaking>, 2);
            list = CGTokens::break_into_tokens_r(list, LW);
            list = CGTokens::break_into_tokens_r(list, RW);
            break;
        }
        case TRUE:
            Word::dequote(Wordings::first_wn(W));
            if (*(Lexer::word_text(Wordings::first_wn(W))) == 0) return list;
            W = Feeds::feed_C_string_full(Lexer::word_text(Wordings::first_wn(W)),
                FALSE, GRAMMAR_PUNCTUATION_MARKS);
            LOOP_THROUGH_WORDING(i, W) {
                cg_token *cgt = CGTokens::cgt_of(Wordings::one_word(i), TRUE);
                list = CGTokens::add_to_list(cgt, list);
            }
            break;
        case FALSE: {
            cg_token *cgt = CGTokens::cgt_of(W, FALSE);
            list = CGTokens::add_to_list(cgt, list);
            break;
        }
    }
    return list;
}

§5. If list represents the head of the list (and is NULL for an empty list), this adds cgt at the end and returns the new head.

cg_token *CGTokens::add_to_list(cg_token *cgt, cg_token *list) {
    if (list == NULL) return cgt;
    if (cgt == NULL) return list;
    cg_token *x = list;
    while (x->next_token) x = x->next_token;
    x->next_token = cgt;
    return list;
}

§6. As the above shows, the text of a token is not necessarily a single word, unless it's a literal.

wording CGTokens::text(cg_token *cgt) {
    return cgt?(cgt->text_of_token):(EMPTY_WORDING);
}

§7. The GTC. The GTC, or grammar token code, is a sort of type indicator for tokens. As produced by the tokeniser above, tokens initially have GTC either UNDETERMINED_GTC or LITERAL_GTC. Differentiation of non-literal tokens into other types happens in CGTokens::determine.

Note that there are two sets of GTC values, one set positive, one negative. The negative ones correspond closely to command-parser grammar reserved tokens in the old I6 compiler, and this is indeed what they compile to if we are generating I6 code.

define NAMED_TOKEN_GTC 1
define RELATED_GTC 2
define STUFF_GTC 3
define ANY_STUFF_GTC 4
define ANY_THINGS_GTC 5
define LITERAL_GTC 6
define UNDETERMINED_GTC 0
define NOUN_TOKEN_GTC -1         like I6 noun
define MULTI_TOKEN_GTC -2        like I6 multi
define MULTIINSIDE_TOKEN_GTC -3  like I6 multiinside
define MULTIHELD_TOKEN_GTC -4    like I6 multiheld
define HELD_TOKEN_GTC -5         like I6 held
define CREATURE_TOKEN_GTC -6     like I6 creature
define TOPIC_TOKEN_GTC -7        like I6 topic
define MULTIEXCEPT_TOKEN_GTC -8  like I6 multiexcept
int CGTokens::is_literal(cg_token *cgt) {
    if ((cgt) && (cgt->grammar_token_code == LITERAL_GTC)) return TRUE;
    return FALSE;
}

int CGTokens::is_I6_parser_token(cg_token *cgt) {
    if ((cgt) && (cgt->grammar_token_code < 0)) return TRUE;
    return FALSE;
}

int CGTokens::is_topic(cg_token *cgt) {
    if ((cgt) && (cgt->grammar_token_code == TOPIC_TOKEN_GTC)) return TRUE;
    return FALSE;
}

§8. A multiple token is one which permits multiple matches in the run-time command parser: for instance, the player can type ALL where a MULTI_TOKEN_GTC is expected.

int CGTokens::is_multiple(cg_token *cgt) {
    switch (cgt->grammar_token_code) {
        case MULTI_TOKEN_GTC:
        case MULTIINSIDE_TOKEN_GTC:
        case MULTIHELD_TOKEN_GTC:
        case MULTIEXCEPT_TOKEN_GTC:
            return TRUE;
    }
    return FALSE;
}

§9. Logging.

void CGTokens::log(cg_token *cgt) {
    if (cgt == NULL) LOG("<no-cgt>");
    else {
        LOG("<CGT%d:%W", cgt->allocation_id, cgt->text_of_token);
        if (cgt->slash_class != 0) LOG("/%d", cgt->slash_class);
        if (cgt->slash_dash_dash) LOG("/--");
        switch (cgt->grammar_token_code) {
            case NAMED_TOKEN_GTC:        LOG(" = named token"); break;
            case RELATED_GTC:            LOG(" = related"); break;
            case STUFF_GTC:              LOG(" = stuff"); break;
            case ANY_STUFF_GTC:          LOG(" = any stuff"); break;
            case ANY_THINGS_GTC:         LOG(" = any things"); break;
            case NOUN_TOKEN_GTC:         LOG(" = noun"); break;
            case MULTI_TOKEN_GTC:        LOG(" = multi"); break;
            case MULTIINSIDE_TOKEN_GTC:  LOG(" = multiinside"); break;
            case MULTIHELD_TOKEN_GTC:    LOG(" = multiheld"); break;
            case HELD_TOKEN_GTC:         LOG(" = held"); break;
            case CREATURE_TOKEN_GTC:     LOG(" = creature"); break;
            case TOPIC_TOKEN_GTC:        LOG(" = topic"); break;
            case MULTIEXCEPT_TOKEN_GTC:  LOG(" = multiexcept"); break;
        }
        LOG(">");
    }
}

§10. Parsing nonliteral tokens. Unless a token is literal and in double-quotes, it will start out as having UNDETERMINED_GTC until we investigate what the words in it mean, which we will do with the following Preform grammar.

Note that <grammar-token> always matches any text, even if it sometimes throws a problem message on the way. Its return integer is a valid GTC, and its return pointer is a (non-null) description of what the token matches.

<grammar-token> ::=
    <named-grammar-token> |       ==> Apply the command grammar10.1
    any things |                  ==> { ANY_THINGS_GTC, Specifications::from_kind(K_thing) }
    any <s-description> |         ==> { ANY_STUFF_GTC, RP[1] }
    anything |                    ==> { ANY_STUFF_GTC, Specifications::from_kind(K_thing) }
    anybody |                     ==> { ANY_STUFF_GTC, Specifications::from_kind(K_person) }
    anyone |                      ==> { ANY_STUFF_GTC, Specifications::from_kind(K_person) }
    anywhere |                    ==> { ANY_STUFF_GTC, Specifications::from_kind(K_room) }
    something related by reversed <relation-name> |   ==> Apply the reversed relation10.2
    something related by <relation-name> |            ==> Apply the relation10.3
    something related by ... |    ==> Issue PM_GrammarBadRelation problem10.4
    <standard-grammar-token> |    ==> { pass 1 }
    <definite-article> <k-kind> | ==> { STUFF_GTC, Specifications::from_kind(RP[2]) }
    <s-description> |             ==> { STUFF_GTC, RP[1] }
    <s-type-expression> |         ==> Issue PM_BizarreToken problem10.9
    ...                           ==> Issue PM_UnknownToken problem10.10

<standard-grammar-token> ::=
    something |                 ==> { NOUN_TOKEN_GTC, Specifications::from_kind(K_object) }
    things |                    ==> { MULTI_TOKEN_GTC, Specifications::from_kind(K_object) }
    things inside |             ==> { MULTIINSIDE_TOKEN_GTC, Specifications::from_kind(K_object) }
    things preferably held |    ==> { MULTIHELD_TOKEN_GTC, Specifications::from_kind(K_object) }
    something preferably held | ==> { HELD_TOKEN_GTC, Specifications::from_kind(K_object) }
    other things |              ==> { MULTIEXCEPT_TOKEN_GTC, Specifications::from_kind(K_object) }
    someone |                   ==> { CREATURE_TOKEN_GTC, Specifications::from_kind(K_object) }
    somebody |                  ==> { CREATURE_TOKEN_GTC, Specifications::from_kind(K_object) }
    text |                      ==> { TOPIC_TOKEN_GTC, Specifications::from_kind(K_understanding) }
    topic |                     ==> Issue PM_UseTextNotTopic problem10.5
    a topic |                   ==> Issue PM_UseTextNotTopic problem10.5
    object |                    ==> Issue PM_UseThingNotObject problem10.6
    an object |                 ==> Issue PM_UseThingNotObject problem10.6
    something held |            ==> Issue something held problem message10.7
    things held                 ==> Issue things held problem message10.8

<named-grammar-token> internal {
    command_grammar *cg = CommandGrammars::named_token_by_name(W);
    if (cg) {
        ==> { -, cg };
        return TRUE;
    }
    ==> { fail nonterminal };
}

§10.1. Apply the command grammar10.1 =

    ==> { NAMED_TOKEN_GTC, ParsingPlugin::rvalue_from_command_grammar(RP[1]) }

§10.2. Apply the reversed relation10.2 =

    ==> { RELATED_GTC, Rvalues::from_binary_predicate(BinaryPredicates::get_reversal(RP[1])) }

§10.3. Apply the relation10.3 =

    ==> { RELATED_GTC, Rvalues::from_binary_predicate(RP[1]) }

§10.4. Issue PM_GrammarBadRelation problem10.4 =

    Problems::quote_source(1, current_sentence);
    Problems::quote_wording(2, W);
    StandardProblems::handmade_problem(Task::syntax_tree(), _p_(PM_GrammarBadRelation));
    Problems::issue_problem_segment(
        "The grammar token '%2' in the sentence %1 "
        "invites me to understand names of related things, "
        "but the relation is not one that I know.");
    Problems::issue_problem_end();
    ==> { RELATED_GTC, Rvalues::from_binary_predicate(R_equality) }

§10.5. Issue PM_UseTextNotTopic problem10.5 =

    Problems::quote_source(1, current_sentence);
    Problems::quote_wording(2, W);
    StandardProblems::handmade_problem(Task::syntax_tree(), _p_(PM_UseTextNotTopic));
    Problems::issue_problem_segment(
        "The grammar token '%2' in the sentence %1 would in some "
        "ways be the right logical way to suggest 'any words at "
        "all here', but Inform in actually uses the special syntax "
        "'[text]' for that. %P"
        "This is partly for historical reasons, but also because "
        "'[text]' is a token which can't be used in every sort of "
        "Understand grammar - for example, it can't be used with 'matches' "
        "or in descriptions of actions or in table columns; it's really "
        "intended only for defining new commands.");
    Problems::issue_problem_end();
    ==> { TOPIC_TOKEN_GTC, Specifications::from_kind(K_understanding) };

§10.6. Issue PM_UseThingNotObject problem10.6 =

    Problems::quote_source(1, current_sentence);
    Problems::quote_wording(2, W);
    StandardProblems::handmade_problem(Task::syntax_tree(), _p_(PM_UseThingNotObject));
    Problems::issue_problem_segment(
        "The grammar token '%2' in the sentence %1 would in some "
        "ways be the right logical way to suggest 'any object at "
        "all here', but Inform uses the special syntax '[thing]' "
        "for that. (Or '[things]' if multiple objects are allowed.)");
    Problems::issue_problem_end();
    ==> { MULTI_TOKEN_GTC, Specifications::from_kind(K_object) }

§10.7. Issue something held problem message10.7 =

    CGTokens::incompatible_change_problem(
        "something held", "something", "something preferably held");
    ==> { HELD_TOKEN_GTC, Specifications::from_kind(K_object) }

§10.8. Issue things held problem message10.8 =

    CGTokens::incompatible_change_problem(
            "things held", "things", "things preferably held");
    ==> { MULTIHELD_TOKEN_GTC, Specifications::from_kind(K_object) }

§10.9. Issue PM_BizarreToken problem10.9 =

    LOG("$T", current_sentence);
    Problems::quote_source(1, current_sentence);
    Problems::quote_wording(2, W);
    Problems::quote_kind_of(3, RP[1]);
    StandardProblems::handmade_problem(Task::syntax_tree(), _p_(PM_BizarreToken));
    Problems::issue_problem_segment(
        "The grammar token '%2' in the sentence %1 looked to me as "
        "if it might be %3, but this isn't something allowed in "
        "parsing grammar.");
    Problems::issue_problem_end();
    ==> { STUFF_GTC, Specifications::from_kind(K_thing) }

§10.10. Issue PM_UnknownToken problem10.10 =

    LOG("$T", current_sentence);
    Problems::quote_source(1, current_sentence);
    Problems::quote_wording(2, W);
    StandardProblems::handmade_problem(Task::syntax_tree(), _p_(PM_UnknownToken));
    Problems::issue_problem_segment(
        "I was unable to understand what you meant by the grammar token '%2' "
        "in the sentence %1.");
    Problems::issue_problem_end();
    ==> { STUFF_GTC, Specifications::from_kind(K_thing) }

§11. Something of an extended mea culpa: but it had the desired effect, in that nobody complained about what might have been a controversial change.

void CGTokens::incompatible_change_problem(char *token_tried, char *token_instead,
    char *token_better) {
    Problems::quote_source(1, current_sentence);
    Problems::quote_text(2, token_tried);
    Problems::quote_text(3, token_instead);
    Problems::quote_text(4, token_better);
    StandardProblems::handmade_problem(Task::syntax_tree(), _p_(PM_ObsoleteHeldTokens));
    Problems::issue_problem_segment(
        "In the sentence %1, you used the '[%2]' as a token, which was "
        "allowed in the early Public Beta versions of Inform 7, but became "
        "out of date in August 2006.%L A change was then made so that if an "
        "action needed to apply to something which was carried, this would "
        "now be specified when the action is created - not in the Understand "
        "line for it. For instance, one might say 'Dismantling is an action "
        "which applies to one carried thing', instead of '...which applies "
        "to one thing', and then write grammar such as 'Understand \"dismantle "
        "[something] as dismantling' instead of '...[something held]...'. "
        "So you probably need to change your '[%2]' token to '[%3]', and "
        "change the action's definition (unless it is a built-in action "
        "such as 'dropping'). An alternative, though, for fine-tuning is to "
        "change it to '[%4]', which allows anything to be Understood, but "
        "in cases of ambiguity tends to guess that something held is more "
        "likely to be what the player means than something not held.");
    Problems::issue_problem_end();
}

§12. Determining. To calculate a description of what is being described by a token, then, we call the following function, which delegates to <grammar-token> above.

In the two cases NAMED_TOKEN_GTC and RELATED_GTC the pointer result is a temporary one telling us which named token, and which relation, respectively: we then convert those into the result. In all other cases, the parse_node pointer returned by <grammar-token> is the result.

parse_node *CGTokens::determine(cg_token *cgt, int depth) {
    if (CGTokens::is_literal(cgt)) return NULL;

    <grammar-token>(CGTokens::text(cgt));
    cgt->grammar_token_code = <<r>>;
    parse_node *result = <<rp>>;

    switch (cgt->grammar_token_code) {
        case NAMED_TOKEN_GTC:
            cgt->defined_by = ParsingPlugin::rvalue_to_command_grammar(result);
            result = CommandGrammars::determine(cgt->defined_by, depth+1);
            break;
        case ANY_STUFF_GTC:
            Make sure the result is a description with one free variable12.1;
            cgt->noun_filter = UnderstandFilterTokens::nft_new(result, TRUE, FALSE);
            break;
        case ANY_THINGS_GTC:
            Make sure the result is a description with one free variable12.1;
            cgt->noun_filter = UnderstandFilterTokens::nft_new(result, TRUE, TRUE);
            break;
        case RELATED_GTC:
            cgt->token_relation = Rvalues::to_binary_predicate(result);
            kind *K = BinaryPredicates::term_kind(cgt->token_relation, 0);
            if (K == NULL) K = K_object;
            result = Specifications::from_kind(K);
            break;
        case STUFF_GTC:
            Make sure the result is a description with one free variable12.1;
            cgt->noun_filter = UnderstandFilterTokens::nft_new(result, FALSE, FALSE);
            break;
        default:
            Node::set_text(result, CGTokens::text(cgt));
            break;
    }

    if (result) Vet the grammar token determination for parsability at run-time12.2;
    cgt->what_token_describes = result;
    return cgt->what_token_describes;
}

§12.1. If the token determines an actual constant value — as it can when it is a named token which always refers to a specific thing, for example — it is possible for result not to be a description. Otherwise, though, it has to be a description which is true or false for any given value, so:

Make sure the result is a description with one free variable12.1 =

    pcalc_prop *prop = Specifications::to_proposition(result);
    if ((prop) && (Binding::number_free(prop) != 1)) {
        LOG("So $P and $D\n", result, prop);
        StandardProblems::sentence_problem(Task::syntax_tree(), _p_(PM_FilterQuantified),
            "the [any ...] doesn't clearly give a description in the '...' part",
            "where I was expecting something like '[any vehicle]'.");
        result = Specifications::from_kind(K_object);
    }

§12.2. Vet the grammar token determination for parsability at run-time12.2 =

    if (Specifications::is_description(result)) {
        kind *K = Specifications::to_kind(result);
        if ((K_understanding) &&
            (Kinds::Behaviour::is_object(K) == FALSE) &&
            (Kinds::eq(K, K_understanding) == FALSE) &&
            (RTKindConstructors::request_I6_GPR(K) == FALSE)) {
            Problems::quote_source(1, current_sentence);
            Problems::quote_wording(2, CGTokens::text(cgt));
            StandardProblems::handmade_problem(Task::syntax_tree(), _p_(PM_UnparsableKind));
            Problems::issue_problem_segment(
                "The grammar token '%2' in the sentence %1 invites me to understand "
                "values typed by the player during play but for a kind of value which "
                "is beyond my ability. Generally speaking, the allowable kinds of value "
                "are number, time, text and any new kind of value you may have created - "
                "but not, for instance, scene or rule.");
            Problems::issue_problem_end();
            result = Specifications::from_kind(K_object);
        }
    }

§13. Scoring. This score is needed when sorting CG lines in order of applicability: see the discussion at CGLines::cgl_determine. The function must return a value which is at least 0 but strictly less than CGL_SCORE_TOKEN_RANGE. The general idea is that higher scores cause tokens to take precedence over lower ones.

int CGTokens::score_bonus(cg_token *cgt) {
    if (cgt == NULL) internal_error("no cgt");
    if (cgt->grammar_token_code == UNDETERMINED_GTC) internal_error("undetermined");
    int gtc = cgt->grammar_token_code;
    switch(gtc) {
        case STUFF_GTC:             return 5;
        case NOUN_TOKEN_GTC:        return 1;
        case MULTI_TOKEN_GTC:       return 1;
        case MULTIINSIDE_TOKEN_GTC: return 2;
        case MULTIHELD_TOKEN_GTC:   return 3;
        case HELD_TOKEN_GTC:        return 4;
        case CREATURE_TOKEN_GTC:    return 1;
        case TOPIC_TOKEN_GTC:       return 0;
        case MULTIEXCEPT_TOKEN_GTC: return 3;
    }
    return 1;
}