<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
	<head>
		<title>Introduction to Semantics</title>
<link href="../docs-assets/Breadcrumbs.css" rel="stylesheet" rev="stylesheet" type="text/css">
		<meta name="viewport" content="width=device-width, initial-scale=1">
		<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
		<meta http-equiv="Content-Language" content="en-gb">

<link href="../docs-assets/Contents.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Progress.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Navigation.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Fonts.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Base.css" rel="stylesheet" rev="stylesheet" type="text/css">
<script>
MathJax = {
	tex: {
		inlineMath: [['$', '$'], ['\\(', '\\)']]
	},
	svg: {
		fontCache: 'global'
	}
};
</script>
<script type="text/javascript" id="MathJax-script" async
	src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js">
</script>

<link href="../docs-assets/Colours.css" rel="stylesheet" rev="stylesheet" type="text/css">

	</head>
	<body class="commentary-font">
		<nav role="navigation">
		<h1><a href="../index.html">
<img src="../docs-assets/Inform.png" height="72">
</a></h1>
<ul><li><a href="../compiler.html">compiler tools</a></li>
<li><a href="../other.html">other tools</a></li>
<li><a href="../extensions.html">extensions and kits</a></li>
<li><a href="../units.html">unit test tools</a></li>
</ul><h2>Compiler Webs</h2><ul>
<li><a href="../inbuild/index.html">inbuild</a></li>
<li><a href="../inform7/index.html">inform7</a></li>
<li><a href="../inter/index.html">inter</a></li>
</ul><h2>Inbuild Modules</h2><ul>
<li><a href="../supervisor-module/index.html">supervisor</a></li>
</ul><h2>Inform7 Modules</h2><ul>
<li><a href="index.html"><span class="selectedlink">core</span></a></li>
<li><a href="../if-module/index.html">if</a></li>
<li><a href="../multimedia-module/index.html">multimedia</a></li>
<li><a href="../index-module/index.html">index</a></li>
</ul><h2>Inter Modules</h2><ul>
<li><a href="../bytecode-module/index.html">bytecode</a></li>
<li><a href="../building-module/index.html">building</a></li>
<li><a href="../codegen-module/index.html">codegen</a></li>
</ul><h2>Services</h2><ul>
<li><a href="../arch-module/index.html">arch</a></li>
<li><a href="../calculus-module/index.html">calculus</a></li>
<li><a href="../html-module/index.html">html</a></li>
<li><a href="../inflections-module/index.html">inflections</a></li>
<li><a href="../kinds-module/index.html">kinds</a></li>
<li><a href="../linguistics-module/index.html">linguistics</a></li>
<li><a href="../problems-module/index.html">problems</a></li>
<li><a href="../syntax-module/index.html">syntax</a></li>
<li><a href="../words-module/index.html">words</a></li>
<li><a href="../../../inweb/docs/foundation-module/index.html">foundation</a></li>

</ul>
		</nav>
		<main role="main">
		<!--Weave of 'Introduction to Semantics' generated by Inweb-->
<div class="breadcrumbs">
    <ul class="crumbs"><li><a href="../index.html">Home</a></li><li><a href="../compiler.html">Inform7 Modules</a></li><li><a href="index.html">core</a></li><li><a href="index.html#10">Chapter 10: The S-Parser</a></li><li><b>Introduction to Semantics</b></li></ul></div>
<p class="purpose">A general introduction to the S-parser and the data structures it makes use of.</p>

<p class="commentary firstcommentary"><a id="SP1" class="paragraph-anchor"></a><b>&#167;1.  </b>At this point, the text read in by Inform is a stream of words, each of
which is identified by a pointer to a <span class="extract"><span class="extract-syntax">vocabulary_entry</span></span> structure in its
dictionary. The words are numbered upwards from 0, and we refer to any
contiguous run of words as an "excerpt", often writing <span class="extract"><span class="extract-syntax">(w1, w2)</span></span> to mean
the text starting with word <span class="extract"><span class="extract-syntax">w1</span></span> and continuing to word <span class="extract"><span class="extract-syntax">w2</span></span>. The stream
of words has been divided further into sentences.
</p>
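<p class="commentary">As a rough illustration of the <span class="extract"><span class="extract-syntax">(w1, w2)</span></span> notation, an excerpt can be
modelled as an inclusive pair of word numbers into the stream. This is only a
sketch: the <span class="extract"><span class="extract-syntax">excerpt</span></span> type, the functions <span class="extract"><span class="extract-syntax">excerpt_length</span></span> and
<span class="extract"><span class="extract-syntax">excerpt_word</span></span>, and the one-field <span class="extract"><span class="extract-syntax">vocabulary_entry</span></span> are assumptions made
for the example, not Inform's actual definitions.
</p>

```c
#include <assert.h>

/* Hypothetical stand-in for Inform's vocabulary_entry: just the word's text. */
typedef struct vocabulary_entry {
    const char *text;
} vocabulary_entry;

/* An excerpt (w1, w2): an inclusive range of word numbers, counted from 0. */
typedef struct excerpt {
    int w1;
    int w2;
} excerpt;

/* Number of words in the excerpt; a range with w2 < w1 counts as empty. */
int excerpt_length(excerpt e) {
    return (e.w2 >= e.w1) ? (e.w2 - e.w1 + 1) : 0;
}

/* The i-th word of the excerpt (counting from 0) within the word stream. */
const vocabulary_entry *excerpt_word(const vocabulary_entry *stream, excerpt e, int i) {
    assert(i >= 0 && i < excerpt_length(e));
    return &stream[e.w1 + i];
}
```

<p class="commentary">With the stream "two men are in Verona", the excerpt <span class="extract"><span class="extract-syntax">(3, 4)</span></span> would then
cover "in Verona", and <span class="extract"><span class="extract-syntax">(4, 4)</span></span> the single word "Verona".
</p>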

<p class="commentary">Inform has two mechanisms for making sense of this text, the A-parser and
the S-parser.
</p>

<ul class="items"><li>(A) A stands for "assertion". For instance, "Two men are in Verona." is
an assertion, telling Inform that at the start of play there are to be two
previously unknown men and that they begin in the room called Verona. The
A-parser handles entire sentences.
</li><li>(S) S stands for "semantics", the study of how already-understood
meanings correspond to excerpts of text within a sentence. The S-parser
handles anything from tiny excerpts like "6" through noun phrases such as
"Verona" to complicated expressions like "the number of men in Verona".
</li></ul>
<p class="commentary">There are many similarities between the A-parser and the S-parser, partly
because A makes use of S, but also because they contain parallel mechanisms
which handle verbs and prepositions similarly. But there are also many
differences. The A-parser will accept "On the dressing table is an amber
comb." even if table and comb have never been mentioned before, whereas
the S-parser can only recognise meanings already defined. On the other
hand, the S-parser will accept conditions like the one in "if there are
fewer than 8 men in Verona, ..." whereas the A-parser would reject
the assertion "There are fewer than 8 men in Verona." as being too vague
to act upon. Similarly, the A-parser works only in the present tense,
whereas the S-parser can handle the past and perfect tenses. (Neither
can handle any future tenses, since a computer can neither control nor
definitely predict the future.)
</p>

<p class="commentary">The A-parser works by applying the S-parser to text at <span class="extract"><span class="extract-syntax">parse_node</span></span> structures
in the parse tree. So we will build the S-parser first, which won't involve
the parse tree at all. We will then go back to the parse tree to write the
A-parser.
</p>

<p class="commentary firstcommentary"><a id="SP2" class="paragraph-anchor"></a><b>&#167;2.  </b>The S-parser is similar to the expression parser in a regular compiler. It
is in some ways simpler because natural language tends not to form complex
formulae, but in other ways more complicated, because performance issues
are very significant when comparing excerpts of text, and because there are
many more ambiguities to resolve.
</p>

<p class="commentary">Our aim is to turn any excerpt into a <span class="extract"><span class="extract-syntax">specification</span></span> structure inside
Inform. This is a universal holder for both values and descriptions of
values, where "value" is interpreted very broadly. It is usually too
difficult to go directly from text to a <span class="extract"><span class="extract-syntax">specification</span></span>, so we use
a two-stage process:
</p>

<ul class="items"><li>(1) parse the text to a <span class="extract"><span class="extract-syntax">parse_node</span></span> which holds all possible interpretations
of it, and then
</li><li>(2) convert the most likely-looking interpretation(s) to a <span class="extract"><span class="extract-syntax">specification</span></span>.
</li></ul>
<p class="commentary">Thus <span class="extract"><span class="extract-syntax">parse_node</span></span> structures are private to the S-parser, whereas
<span class="extract"><span class="extract-syntax">specification</span></span> structures appear all over Inform.
</p>
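<p class="commentary">The two-stage process might be pictured as follows. This is a minimal
sketch under loud assumptions: the <span class="extract"><span class="extract-syntax">interpretation</span></span> type, its numeric <span class="extract"><span class="extract-syntax">score</span></span>,
and the function <span class="extract"><span class="extract-syntax">choose_specification</span></span> are all hypothetical, and the
<span class="extract"><span class="extract-syntax">specification</span></span> here is reduced to a single string. It shows only the shape of
stage (2), choosing among the alternatives that stage (1) produced.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one candidate reading of an excerpt, as stage (1) might leave it. */
typedef struct interpretation {
    const char *description;                 /* what this reading takes the text to mean */
    int score;                               /* assumed measure of how likely the reading is */
    struct interpretation *next_alternative; /* next candidate reading, or NULL */
} interpretation;

/* Hypothetical, much-reduced stand-in for the universal specification holder. */
typedef struct specification {
    const char *meaning;
} specification;

/* Stage (2): convert the highest-scoring reading; earlier readings win ties. */
specification choose_specification(const interpretation *alternatives) {
    assert(alternatives != NULL);
    const interpretation *best = alternatives;
    for (const interpretation *i = alternatives->next_alternative; i != NULL;
            i = i->next_alternative)
        if (i->score > best->score) best = i;
    specification spec = { best->description };
    return spec;
}
```

<p class="commentary">Keeping the alternatives in a list, rather than committing to one reading
during stage (1), is what allows a later pass to discard readings which turn
out not to make sense.
</p>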

<p class="commentary firstcommentary"><a id="SP3" class="paragraph-anchor"></a><b>&#167;3.  </b>Consider the following contrived example.
</p>

<blockquote>
    <p>if Mr Fitzwilliam Darcy was carrying at least three things which are in the box, increase the score by 7;</p>
</blockquote>

<ul class="items"><li>(1) There are <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> structures for "Mr Fitzwilliam Darcy" and
"Mr Bingham's box", which hold the wording needed to refer to these objects.
In parsing the example sentence, we connect these structures to the excerpts
"Mr Fitzwilliam Darcy" &mdash; an exact match &mdash; and "the box" &mdash; an
abbreviated one. The <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> structures contain pointers to
further <span class="extract"><span class="extract-syntax">instance</span></span> structures which represent the identities of these
two tangible things, that is, Darcy and his friend's box.
</li><li>(2) Another <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> holds the name "score" and points it to a
<span class="extract"><span class="extract-syntax">nonlocal_variable</span></span> structure for the relevant global variable.
</li><li>(3) And a further <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> holds the name "things" and points it
to an <span class="extract"><span class="extract-syntax">instance</span></span> structure representing the common identity shared by
all things. Inform represents both individual, tangible objects such as Mr Darcy
and intangible categories of objects such as "thing" with the same
structure, <span class="extract"><span class="extract-syntax">instance</span></span>. This mirrors the way that common
and proper nouns are grammatically quite similar in natural language.
</li><li>(4) The final noun phrase in the above example is "7". There's no
<span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> structure for this &mdash; it would be insanely inefficient
to make such things &mdash; and instead it is parsed directly as a "literal",
being converted immediately into a <span class="extract"><span class="extract-syntax">specification</span></span>, of which more below.
</li><li>(5) Another <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> structure holds the wording "if ... , ..."
and is connected to a <span class="extract"><span class="extract-syntax">phrase</span></span> structure for the "if" construction. Here,
the wording includes flexible-sized gaps (written "...") where excerpts
should appear: the S-parser will only recognise this if the excerpts make
sense in themselves. The combination of a <span class="extract"><span class="extract-syntax">phrase</span></span> plus the results of parsing
these gaps is stored in a structure called an <span class="extract"><span class="extract-syntax">invocation</span></span>.
</li><li>(6) In the example, the first gap is filled by "Mr Fitzwilliam Darcy was
carrying at least three things which are in the box", which the S-parser
detects as being a condition. This is translated into a <span class="extract"><span class="extract-syntax">pcalc_prop</span></span>
structure: that is, a predicate-calculus proposition, which is a
representation in mathematical logic of the meaning of this sentence.
<ul class="items"><li>(a) "was carrying" is recognised as matching wording in a <span class="extract"><span class="extract-syntax">verb_usage</span></span>
structure. This points to an underlying relation, stored in a <span class="extract"><span class="extract-syntax">binary_predicate</span></span>
structure, but combines it with an indication of tense stored in a <span class="extract"><span class="extract-syntax">time_period</span></span>.
Here the <span class="extract"><span class="extract-syntax">binary_predicate</span></span> is the carrying relation and the <span class="extract"><span class="extract-syntax">time_period</span></span>
is the past tense. (The term "binary predicate" comes from logic once
again; an Inform author would call the same concept a "relation".)
</li><li>(b) "are in" is recognised as a usage of the verb "to be" plus "in",
which matches the wording of a <span class="extract"><span class="extract-syntax">preposition</span></span> structure. Here the tense
derives only from the "to be" part, which is "are", so the <span class="extract"><span class="extract-syntax">time_period</span></span>
parsed is the present tense. This makes the <span class="extract"><span class="extract-syntax">preposition</span></span> a simpler
business than the <span class="extract"><span class="extract-syntax">verb_usage</span></span> structure &mdash; it only needs to refer to the
underlying meaning, which is once again a <span class="extract"><span class="extract-syntax">binary_predicate</span></span> structure,
the one for the containment relation.
</li><li>(c) "which" is a word introducing a relative clause. A sentence can
only have one primary verb, which in this example is "was carrying".
But other verbs can exist in relative clauses, and the effect of writing
"X which V Y" is to qualify X by saying that any noun N matching X must also
satisfy "N V Y", where V is the verb. The relative-clause construction
is an example of syntax built directly into the S-parser: unlike the
meanings of "score" or "Mr Fitzwilliam Darcy", it doesn't come from any
data structure.
</li><li>(d) "at least three things" is an example of a noun phrase which has a
head and a tail. The head, "at least three", is recognised as matching
the wording in a <span class="extract"><span class="extract-syntax">determiner</span></span> structure, "at least (number)", together
with the literal number 3. Once again, the <span class="extract"><span class="extract-syntax">determiner</span></span> describes textual
appearances; it points to another structure, a <span class="extract"><span class="extract-syntax">quantifier</span></span>, to hold the
meaning. This is another logical term, and Inform's debugging log would
write the resulting term as <span class="extract"><span class="extract-syntax">Card&gt;=3</span></span> ("cardinality of at least 3").
Inform only uses <span class="extract"><span class="extract-syntax">determiner</span></span> structures when they quantify, that is, when
they talk about a possible range of objects rather than a single item.
A grammar of English would probably say that the "the" in "the box" is
also grammatically a determiner, but it doesn't get a <span class="extract"><span class="extract-syntax">determiner</span></span> structure
in Inform.
</li></ul>
<li>(7) The second gap in the "if ... , ..." excerpt is "increase the score
by 7", which the S-parser detects as a use of yet another <span class="extract"><span class="extract-syntax">phrase</span></span>, this
time referred to by the <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> structure for "increase ... by
...". It's worth noting that the S-parser doesn't check types, so it would
have been happy to match "increase 2 by 1" &mdash; an impossibility. The
S-parser's job is to find all possible meanings at a textual level,
sometimes producing a list of options: the type-checker will winnow these
out later on.
</li></ul>
<p class="commentary">So parsing the text "if Mr Fitzwilliam Darcy was carrying at least three
things which are in the box, increase the score by 7" is going to result in
a mass of pointers to different structures, and we need an umbrella structure
to hold this mass together. This is what the <span class="extract"><span class="extract-syntax">parse_node</span></span> is for, but as
explained above, it's really only an intermediate state used while the S-parser
is working.
</p>
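<p class="commentary">The split between wording and meaning described in (6a) can be sketched
like so. The field names, the <span class="extract"><span class="extract-syntax">tense</span></span> enumeration standing in for
<span class="extract"><span class="extract-syntax">time_period</span></span>, and the function <span class="extract"><span class="extract-syntax">match_verb_usage</span></span> are assumptions made for
illustration; the real structures hold a good deal more, and the real matching
works on words rather than on C strings.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplification of time_period: just a tense. */
typedef enum { PRESENT_TENSE, PAST_TENSE, PERFECT_TENSE } tense;

/* The underlying relation, independent of how it is worded. */
typedef struct binary_predicate {
    const char *relation_name;
} binary_predicate;

/* One textual form of a verb: wording, plus the meaning and tense it implies. */
typedef struct verb_usage {
    const char *wording;
    binary_predicate *meaning;
    tense when;
} verb_usage;

/* Return the usage whose wording exactly matches the text, or NULL if none does. */
const verb_usage *match_verb_usage(const verb_usage *usages, int n, const char *text) {
    assert(usages != NULL && text != NULL);
    for (int i = 0; i < n; i++)
        if (strcmp(usages[i].wording, text) == 0) return &usages[i];
    return NULL;
}
```

<p class="commentary">Under this sketch "carries" and "was carrying" would be two <span class="extract"><span class="extract-syntax">verb_usage</span></span>
entries sharing one <span class="extract"><span class="extract-syntax">binary_predicate</span></span>, differing only in tense.
</p>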

<p class="commentary firstcommentary"><a id="SP4" class="paragraph-anchor"></a><b>&#167;4.  </b>One obvious category of word is missing: there are no adjectives in this
example. Inform currently supports many sorts of adjective &mdash; either/or
properties, such as "open"; values of kinds of value which coincide with
properties, such as "green" as a value of a "colour"; and adjectives
defined with conditions or full phrases, such as "invisible" resulting
from "Definition: a thing is invisible if...".
</p>

<p class="commentary">The S-parser treats all adjectives alike &mdash; more or less just as names.
This is because "open" may mean one thing for containers and another
for scenes, for example. The identification of an adjective's name with
its set of possible meanings is via a structure called <span class="extract"><span class="extract-syntax">adjective</span></span>.
</p>
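<p class="commentary">One way to picture an adjective name with several possible meanings is
sketched below. Everything here is hypothetical: the <span class="extract"><span class="extract-syntax">adjective_sense</span></span> type, the
fixed-size array, and the function <span class="extract"><span class="extract-syntax">sense_for_domain</span></span> are assumptions for the
example. The point is only that the S-parser can hand back the <span class="extract"><span class="extract-syntax">adjective</span></span>
as a whole, leaving the choice of sense to be made later.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SENSES 4

/* Hypothetical: one meaning of an adjective, valid for one domain of things. */
typedef struct adjective_sense {
    const char *domain;      /* e.g. "container" or "scene" */
    const char *definition;  /* placeholder for however this sense is defined */
} adjective_sense;

/* Hypothetical simplification: a name together with its possible meanings. */
typedef struct adjective {
    const char *name;
    int no_senses;
    adjective_sense senses[MAX_SENSES];
} adjective;

/* Find the sense applying to the given domain, or NULL if there is none. */
const adjective_sense *sense_for_domain(const adjective *adj, const char *domain) {
    assert(adj != NULL && domain != NULL);
    for (int i = 0; i < adj->no_senses; i++)
        if (strcmp(adj->senses[i].domain, domain) == 0) return &adj->senses[i];
    return NULL;
}
```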

<p class="commentary firstcommentary"><a id="SP5" class="paragraph-anchor"></a><b>&#167;5.  </b>To sum up. If we write "text" \(\rightarrow\) structure used for parsing
\(\rightarrow\) structure used to hold meaning, our example is parsed like so:
</p>

<ul class="items"><li>(1) "Mr Fitzwilliam Darcy" \(\rightarrow\) <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> \(\rightarrow\) <span class="extract"><span class="extract-syntax">instance</span></span>
</li><li>(2) "the score" \(\rightarrow\) <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> \(\rightarrow\) <span class="extract"><span class="extract-syntax">nonlocal_variable</span></span>
</li><li>(3) "things" \(\rightarrow\) <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> \(\rightarrow\) <span class="extract"><span class="extract-syntax">instance</span></span>
</li><li>(4) "7" \(\rightarrow\) ...none... \(\rightarrow\) <span class="extract"><span class="extract-syntax">specification</span></span>
</li><li>(5) "if Mr Fitzwilliam Darcy was carrying at least three things which are in the
box, increase the score by 7" \(\rightarrow\) <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> \(\rightarrow\)
<span class="extract"><span class="extract-syntax">invocation</span></span> (incorporating a <span class="extract"><span class="extract-syntax">phrase</span></span>)
</li><li>(6) "Mr Fitzwilliam Darcy was carrying at least three things which are in the box"
\(\rightarrow\) ...many... \(\rightarrow\) <span class="extract"><span class="extract-syntax">pcalc_prop</span></span>
<ul class="items"><li>(a) "was carrying" \(\rightarrow\) <span class="extract"><span class="extract-syntax">verb_usage</span></span> \(\rightarrow\) <span class="extract"><span class="extract-syntax">binary_predicate</span></span>
plus <span class="extract"><span class="extract-syntax">time_period</span></span>
</li><li>(b) "are in" \(\rightarrow\) <span class="extract"><span class="extract-syntax">preposition</span></span> \(\rightarrow\) <span class="extract"><span class="extract-syntax">binary_predicate</span></span>
plus <span class="extract"><span class="extract-syntax">time_period</span></span>
</li><li>(c) "at least three" \(\rightarrow\) <span class="extract"><span class="extract-syntax">determiner</span></span> \(\rightarrow\) <span class="extract"><span class="extract-syntax">quantifier</span></span> plus
literal number
</li></ul>
<li>(7) "increase the score by 7" \(\rightarrow\) <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> \(\rightarrow\)
<span class="extract"><span class="extract-syntax">invocation</span></span> (incorporating a <span class="extract"><span class="extract-syntax">phrase</span></span>)
</li><li>(8) Adjectives like the "closed" in "three closed doors" are identified
by name only, with little attempt to detect which sense is meant, so they
pass straight through the S-parser as pointers to <span class="extract"><span class="extract-syntax">adjective</span></span>
structures.
</li></ul>
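<p class="commentary">The pairing of a <span class="extract"><span class="extract-syntax">quantifier</span></span> with a literal number, as in (6c), can be
sketched as a test on a count. The enumeration, the one-field <span class="extract"><span class="extract-syntax">quantifier</span></span>
and the function <span class="extract"><span class="extract-syntax">quantifier_satisfied</span></span> are assumptions for illustration only.
With the "at least" quantifier and the number 3, the test corresponds to what
the debugging log writes as <span class="extract"><span class="extract-syntax">Card&gt;=3</span></span>.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: three ways a quantifier might constrain a count. */
typedef enum { AT_LEAST, AT_MOST, EXACTLY } quantifier_kind;

/* Hypothetical simplification: the meaning to which a determiner points. */
typedef struct quantifier {
    quantifier_kind kind;
} quantifier;

/* Does a set of the given size satisfy the quantifier with this parameter? */
int quantifier_satisfied(const quantifier *q, int parameter, int count) {
    assert(q != NULL && parameter >= 0 && count >= 0);
    switch (q->kind) {
        case AT_LEAST: return count >= parameter;
        case AT_MOST:  return count <= parameter;
        case EXACTLY:  return count == parameter;
    }
    return 0; /* not reached for valid kinds */
}
```

<p class="commentary">Keeping the number outside the <span class="extract"><span class="extract-syntax">quantifier</span></span> means that one structure can
serve "at least three", "at least four", and so on.
</p>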
<p class="commentary firstcommentary"><a id="SP6" class="paragraph-anchor"></a><b>&#167;6.  </b>To sum up further still, <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> structures are used to parse
simple nouns and imperative phrases, whereas other specialist structures
(<span class="extract"><span class="extract-syntax">preposition</span></span>, <span class="extract"><span class="extract-syntax">determiner</span></span>, etc.) are used to parse the hinges
which hold sentences together. Once parsed, individual excerpts tend to
have meanings which might be pointers to a bewildering range of structures
(<span class="extract"><span class="extract-syntax">instance</span></span>, <span class="extract"><span class="extract-syntax">quantifier</span></span>, <span class="extract"><span class="extract-syntax">binary_predicate</span></span>, <span class="extract"><span class="extract-syntax">adjective</span></span>,
etc.) but these pointers are held together inside the S-parser by a single
unifying construction: the <span class="extract"><span class="extract-syntax">parse_node</span></span>. And we will eventually turn the
whole thing into a <span class="extract"><span class="extract-syntax">specification</span></span> for the rest of Inform to use.
</p>

<nav role="progress"><div class="progresscontainer">
    <ul class="progressbar"><li class="progressprev"><a href="9-ef.html">&#10094;</a></li><li class="progresschapter"><a href="P-wtmd.html">P</a></li><li class="progresschapter"><a href="1-cm.html">1</a></li><li class="progresschapter"><a href="2-up.html">2</a></li><li class="progresschapter"><a href="3-bv.html">3</a></li><li class="progresschapter"><a href="4-dlr.html">4</a></li><li class="progresschapter"><a href="5-rpt.html">5</a></li><li class="progresschapter"><a href="6-lp.html">6</a></li><li class="progresschapter"><a href="7-am.html">7</a></li><li class="progresschapter"><a href="8-ptu.html">8</a></li><li class="progresschapter"><a href="9-ef.html">9</a></li><li class="progresscurrentchapter">10</li><li class="progresscurrent">its</li><li class="progresssection"><a href="10-aots.html">aots</a></li><li class="progresssection"><a href="10-pl.html">pl</a></li><li class="progresssection"><a href="10-cad.html">cad</a></li><li class="progresssection"><a href="10-teav.html">teav</a></li><li class="progresssection"><a href="10-varc.html">varc</a></li><li class="progresssection"><a href="10-cap.html">cap</a></li><li class="progresschapter"></li><li class="progresschapter"><a href="12-terr.html">12</a></li><li class="progresschapter"><a href="13-kak.html">13</a></li><li class="progresschapter"><a href="14-sp.html">14</a></li><li class="progresschapter"><a href="15-pr.html">15</a></li><li class="progresschapter"><a href="16-is.html">16</a></li><li class="progresschapter"><a href="17-tl.html">17</a></li><li class="progresschapter"><a href="18-lc.html">18</a></li><li class="progresschapter"><a href="19-tc.html">19</a></li><li class="progresschapter"><a href="20-eq.html">20</a></li><li class="progresschapter"><a href="21-rl.html">21</a></li><li class="progresschapter"><a href="22-itp.html">22</a></li><li class="progresschapter"><a href="23-ad.html">23</a></li><li class="progresschapter"><a href="24-lv.html">24</a></li><li class="progresschapter"><a 
href="25-in.html">25</a></li><li class="progresschapter"><a href="26-fc.html">26</a></li><li class="progresschapter"><a href="27-hr.html">27</a></li><li class="progressnext"><a href="10-aots.html">&#10095;</a></li></ul></div>
</nav><!--End of weave-->

		</main>
	</body>
</html>