A general introduction to the S-parser and the data structures it makes use of.

§1. At this point, the text read in by Inform is a stream of words, each identified by a pointer to a vocabulary_entry structure in its dictionary. The words are numbered upwards from 0, and we refer to any contiguous run of words as an "excerpt", often writing (w1, w2) to mean the text starting with word w1 and continuing to word w2. The stream of words has also been divided into sentences.
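As a rough illustration of the (w1, w2) convention, here is a minimal sketch in C. It assumes that an excerpt includes both of its end words and that each distinct word has exactly one dictionary entry, so that two words are the same exactly when their pointers are equal. The one-field vocabulary_entry is only a stand-in, and the excerpt type and both functions are hypothetical names, not taken from Inform.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for Inform's vocabulary_entry; the real structure is not shown here. */
typedef struct vocabulary_entry {
    const char *text;
} vocabulary_entry;

/* Hypothetical: an excerpt (w1, w2), assumed to include both end words. */
typedef struct excerpt {
    int w1;
    int w2;
} excerpt;

/* Number of words in the excerpt, or 0 if w2 comes before w1. */
int excerpt_length(excerpt e) {
    return (e.w2 >= e.w1) ? (e.w2 - e.w1 + 1) : 0;
}

/* Do two excerpts of the same stream match word for word? Assumes equal
   words share one dictionary entry, so pointers are compared, not strings. */
int excerpts_match(vocabulary_entry **stream, excerpt a, excerpt b) {
    int n = excerpt_length(a);
    assert(stream != NULL);
    if (n != excerpt_length(b)) return 0;
    for (int i = 0; i < n; i++)
        if (stream[a.w1 + i] != stream[b.w1 + i]) return 0;
    return 1;
}
```

Under these assumptions, comparing two excerpts costs one pointer comparison per word, however long the words themselves are.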

Inform has two mechanisms for making sense of this text, the A-parser and the S-parser.

There are many similarities between the A-parser and the S-parser, partly because A makes use of S, but also because they contain parallel mechanisms which handle verbs and prepositions similarly. But there are also many differences. The A-parser will accept "On the dressing table is an amber comb." even if table and comb have never been mentioned before, whereas the S-parser can only recognise meanings already defined. On the other hand, the S-parser will accept conditions like the one in "if there are fewer than 8 men in Verona, ..." whereas the A-parser would reject the assertion "There are fewer than 8 men in Verona." as being too vague to act upon. Similarly, the A-parser works only in the present tense, whereas the S-parser can handle the past and perfect tenses. (Neither can handle any future tenses, since a computer can neither control nor definitely predict the future.)

The A-parser works by applying the S-parser to text at parse_node structures in the parse tree. So we will build the S-parser first, which won't involve the parse tree at all. We will then go back to the parse tree to write the A-parser.

§2. The S-parser is similar to the expression parser in a regular compiler. It is in some ways simpler because natural language tends not to form complex formulae, but in other ways more complicated, because performance issues are very significant when comparing excerpts of text, and because there are many more ambiguities to resolve.

Our aim is to turn any excerpt into a specification structure inside Inform. This is a universal holder for both values and descriptions of values, where "value" is interpreted very broadly. It is usually too difficult to go directly from text to a specification, so we use a two-stage process: the S-parser first parses the excerpt into parse_node structures, and these are then converted into a specification.

Thus parse_node structures are private to the S-parser, whereas specification structures appear all over Inform.

§3. Consider the following contrived example.

if Mr Fitzwilliam Darcy was carrying at least three things which are in the box, increase the score by 7;

So parsing the text "if Mr Fitzwilliam Darcy was carrying at least three things which are in the box, increase the score by 7" is going to result in a mass of pointers to different structures, and we need an umbrella structure to hold this mass together. This is what the parse_node is for, but as explained above, it's really only an intermediate state used while the S-parser is working.

§4. One obvious category of word is missing: there are no adjectives in this example. Inform currently supports many sorts of adjective — either/or properties, such as "open"; values of kinds of value which coincide with properties, such as "green" as a value of a "colour"; and adjectives defined with conditions or full phrases, such as "invisible" resulting from "Definition: a thing is invisible if...".

The S-parser treats all adjectives alike, more or less just as names. This is because "open" may mean one thing for containers and another for scenes, for example. An adjective's name is tied to its set of possible meanings by a structure called adjective.
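To make the idea of one name with several meanings concrete, here is a minimal sketch in C. The fields of the real adjective structure are not described here, so everything below is an assumption made for illustration: the adjective_meaning type, the fixed-size array, the use of a kind name as the key, and the lookup function are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ADJECTIVE_MEANINGS 4

/* Hypothetical: one sense of an adjective, applying to one kind of thing. */
typedef struct adjective_meaning {
    const char *domain;  /* e.g. "container" or "scene" */
    const char *sense;   /* informal description of this meaning */
} adjective_meaning;

/* Stand-in for Inform's adjective: a name plus its possible meanings. */
typedef struct adjective {
    const char *name;
    int no_meanings;
    adjective_meaning meanings[MAX_ADJECTIVE_MEANINGS];
} adjective;

/* Return the meaning applying to the given domain, or NULL if none does. */
const adjective_meaning *adjective_meaning_for(const adjective *adj,
                                               const char *domain) {
    assert(adj != NULL && domain != NULL);
    for (int i = 0; i < adj->no_meanings; i++)
        if (strcmp(adj->meanings[i].domain, domain) == 0)
            return &adj->meanings[i];
    return NULL;
}
```

In this sketch the parser only needs to recognise the name "open"; choosing between the container sense and the scene sense can wait until the kind of thing being described is known.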

§5. To sum up. If we write "text" \(\rightarrow\) structure used for parsing \(\rightarrow\) structure used to hold meaning, our example is parsed like so:

§6. To sum up further still, excerpt_meaning structures are used to parse simple nouns and imperative phrases, whereas other specialist structures (preposition, determiner, etc.) are used to parse the hinges which hold sentences together. Once parsed, individual excerpts tend to have meanings which might be pointers to a bewildering range of structures (instance, quantifier, binary_predicate, adjective, etc.) but these pointers are held together inside the S-parser by a single unifying construction: the parse_node. And we will eventually turn the whole thing into a specification for the rest of Inform to use.
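The unifying role of the parse_node can be pictured as a tagged union over the structures an excerpt might refer to. The following is only a sketch under stated assumptions, not Inform's actual definition: the field names, the meaning_kind tag, the down and next links, and count_nodes are all hypothetical, and the one-field instance, quantifier and binary_predicate types are stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for some of the structures a meaning might point to. */
typedef struct instance { const char *name; } instance;
typedef struct quantifier { const char *name; } quantifier;
typedef struct binary_predicate { const char *name; } binary_predicate;

/* Hypothetical tag recording which sort of pointer a node holds. */
typedef enum {
    INSTANCE_MNG,
    QUANTIFIER_MNG,
    BINARY_PREDICATE_MNG
} meaning_kind;

/* Sketch of a parse_node: an excerpt, a tagged meaning, and tree links. */
typedef struct parse_node {
    int w1, w2;                     /* the excerpt this node covers */
    meaning_kind kind;              /* which union member is valid */
    union {
        instance *as_instance;
        quantifier *as_quantifier;
        binary_predicate *as_binary_predicate;
    } meaning;
    struct parse_node *down;        /* first child, or NULL */
    struct parse_node *next;        /* next sibling, or NULL */
} parse_node;

/* Count the nodes in a list of siblings, including all their descendants. */
int count_nodes(const parse_node *pn) {
    int n = 0;
    for (; pn != NULL; pn = pn->next)
        n += 1 + count_nodes(pn->down);
    return n;
}
```

With a layout like this, code that walks the tree only ever handles one node type and inspects the tag when it needs the underlying structure.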