What This Module Does.

An overview of the syntax module's role and abilities.

@h Prerequisites.
The syntax module is a part of the Inform compiler toolset. It is
presented as a literate program or "web". Before diving in:
(a) It helps to have some experience of reading webs: see //inweb// for more.
(b) The module is written in C, in fact ANSI C99, but this is disguised by the
fact that it uses some extension syntaxes provided by the //inweb// literate
programming tool, making it a dialect of C called InC. See //inweb// for
full details, but essentially: it's C without predeclarations or header files,
and where functions have names like |Tags::add_by_name| rather than |add_by_name|.
(c) This module uses other modules drawn from the compiler (see //structure//), and also
uses a module of utility functions called //foundation//.
For more, see //foundation: A Brief Guide to Foundation//.

@h Syntax trees.
Most algorithms for parsing natural language involve the construction of
trees, in which the original words appear as leaves at the top of the tree,
while the grammatical functions they serve appear as the branches and trunk:
thus the word "orange", as an adjective, might be growing from a branch
which represents a noun clause ("the orange envelope"), growing in turn from
a trunk which might represent an assertion sentence:

>> The card is in the orange envelope.

The Inform tools represent syntax trees by //parse_node_tree// structures
(see //SyntaxTree::new//), but there are very few of these: the entire source
text compiled by //inform7// is just one syntax tree. When //supervisor//
manages extensions, it may generate one //parse_node_tree// object for each
extension whose text it reads. Still -- there are few trees.

@ The trunk of the tree can be grown in any sequence: call //SyntaxTree::push_bud//
to begin "budding" from a particular branch, and //SyntaxTree::pop_bud// to go
back to where you were.
These are also used automatically to ensure that sentences
arriving at //SyntaxTree::graft_sentence// are grafted under the headings to
which they belong. Thus, the sentences
= (text as Inform 7)
Chapter 20
Section 1
The cat is in the cardboard box.
Section 2
The ball of yarn is here.
=
would actually be grafted like so:
= (text)
	RESULT                                   BUD STACK BEFORE THIS
	Chapter 20                               (empty)
	    Section 1                            Chapter 20
	        The cat is in the cardboard box. Chapter 20 > Section 1
	    Section 2                            Chapter 20 > Section 1
	        The ball of yarn is here.        Chapter 20 > Section 2
=
But it is also possible to graft smaller (not-whole-sentence) cuttings onto
each other using //SyntaxTree::graft//, which doesn't involve the bud stack at all.

@ Meaning is an ambiguous thing, and so the tree needs to be capable of
representing multiple interpretations of the same wording. So nodes have not
only |next| and |down| links to other nodes, but also |next_alternative| links,
which -- if used -- fork the syntax tree into different possible readings.

These are not added to the tree by grafting: that's only done for definite
meanings. Instead, multiple ambiguous readings mostly lie beneath |AMBIGUITY_NT|
nodes -- see //SyntaxTree::add_reading//. For example, we might have:
= (text)
	sun is orange
	    sun
	    AMBIGUITY
	        orange (read as being a fruit)
	        orange (read as being a colour)
=

@ An extensive suite of functions is provided to make it easy to traverse
a syntax tree, calling a visitor function on each node: see //SyntaxTree::traverse//.

@h Nodes.
Syntax trees are made up of //parse_node// structures. While these are in
principle individual nodes, they effectively represent subtrees, because they
carry with them links to the nodes below. A //parse_node// object can
therefore equally represent "orange", "the orange envelope", or "now the card
is in the orange envelope".

Each node carries three essential pieces of information with it:
(1) The text giving rise to it (say, "Section Five - Fruit").
(2) A node type ID, which in broad terms says what kind of reference is being
made (say, |HEADING_NT|). The possible node types are stored in the C type
|node_type_t|, which corresponds to some metadata in a //node_type_metadata//
object: see //Node::get_type// and //NodeType::get_metadata//.
(3) A list of optional annotations, which are either integer or object-valued,
and which give specifics about the meaning (say, the level number in the
hierarchy of headings). See //Node Annotations//.

@h Fussy, defensive, pedantry.
Safe to say that Inform includes bugs: the more defensive coding we can do,
the better. That means not only extensive logging (see //Node::log_tree//)
but also strict verification tests on every tree made (see //Tree Verification//).
(a) The only nodes allowed to exist are those for node types declared
by //NodeType::new//: more generally, see //Node Types// on metadata associated
with these.
(b) A node of type |A| can only be a child of a node of type |B| if
//NodeType::parentage_allowed// says so, and this is (mostly) a matter of
calling //NodeType::allow_parentage_for_categories// -- parentage depends not
on the type per se, but on the category of the type, which groups types together.
(c) A node of type |A| can only have an annotation with ID |I| if
//Annotations::is_allowed// says so. To declare an annotation legal,
call |Annotations::allow(A, I)|, or |Annotations::allow_for_category(C, I)|
for the category |C| of |A|.
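
@ To make rules (b) and (c) more concrete, here is a minimal sketch in plain C
(not InC) of how permissions keyed on categories might be stored and checked.
It is an illustration only, and not the module's actual implementation: every
name in it (the enums, |allow_parentage|, |verify|, and so on) is hypothetical,
and the three node types and categories are invented for the example. The only
things borrowed from the text above are the ideas that each type belongs to a
category, that parentage is decided by category, and that nodes are joined by
|down| and |next| links.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical categories, node types and annotation IDs: illustration only. */
enum category { STRUCTURAL_CAT, SENTENCE_CAT, PHRASE_CAT, NUM_CATS };
enum node_type { HEADING_T, SENTENCE_T, NOUN_PHRASE_T, NUM_TYPES };
enum annotation { HEADING_LEVEL_ANN, NUM_ANNS };

/* Each node type belongs to exactly one category. */
static const enum category category_of[NUM_TYPES] = {
    [HEADING_T] = STRUCTURAL_CAT,
    [SENTENCE_T] = SENTENCE_CAT,
    [NOUN_PHRASE_T] = PHRASE_CAT
};

/* parentage[P][C]: may a type in category C sit beneath a type in category P? */
static bool parentage[NUM_CATS][NUM_CATS];
/* annotation_ok[T][I]: may a node of type T carry the annotation I? */
static bool annotation_ok[NUM_TYPES][NUM_ANNS];

/* Everything is forbidden until explicitly allowed. */
void allow_parentage(enum category parent, enum category child) {
    assert(parent < NUM_CATS && child < NUM_CATS);
    parentage[parent][child] = true;
}

void allow_annotation(enum node_type t, enum annotation i) {
    assert(t < NUM_TYPES && i < NUM_ANNS);
    annotation_ok[t][i] = true;
}

/* Allow annotation i for every type belonging to category c. */
void allow_annotation_for_category(enum category c, enum annotation i) {
    for (int t = 0; t < NUM_TYPES; t++)
        if (category_of[t] == c) allow_annotation((enum node_type) t, i);
}

/* The check depends on the categories of the two types, not the types per se. */
bool parentage_allowed(enum node_type parent, enum node_type child) {
    return parentage[category_of[parent]][category_of[child]];
}

bool annotation_allowed(enum node_type t, enum annotation i) {
    return annotation_ok[t][i];
}

/* A stripped-down node: a type, plus links to the first child and next sibling. */
struct node {
    enum node_type type;
    struct node *down, *next;
};

/* True if every parent-child link in the subtree below n is permitted. */
bool verify(const struct node *n) {
    for (const struct node *c = n->down; c != NULL; c = c->next)
        if (!parentage_allowed(n->type, c->type) || !verify(c)) return false;
    return true;
}

/* One possible set of rules for the three invented types. */
void declare_rules(void) {
    allow_parentage(STRUCTURAL_CAT, STRUCTURAL_CAT); /* headings under headings */
    allow_parentage(STRUCTURAL_CAT, SENTENCE_CAT);   /* sentences under headings */
    allow_parentage(SENTENCE_CAT, PHRASE_CAT);       /* phrases under sentences */
    allow_annotation(HEADING_T, HEADING_LEVEL_ANN);  /* only headings have levels */
}
```

Under these assumed rules, once |declare_rules| has been called, a heading
containing a sentence containing a noun phrase passes |verify|, whereas a noun
phrase with a heading beneath it does not. Because the default is to forbid,
a forgotten declaration shows up as a verification failure rather than as a
silently malformed tree, which is the point of being fussy.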