mirror of
https://github.com/ganelson/inform.git
synced 2024-07-08 10:04:21 +03:00
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Introduction to Semantics</title>
<link href="../docs-assets/Breadcrumbs.css" rel="stylesheet" rev="stylesheet" type="text/css">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="en-gb">

<link href="../docs-assets/Contents.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Progress.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Navigation.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Fonts.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Base.css" rel="stylesheet" rev="stylesheet" type="text/css">
<script>
MathJax = {
    tex: {
        inlineMath: [['$', '$'], ['\\(', '\\)']]
    },
    svg: {
        fontCache: 'global'
    }
};
</script>
<script type="text/javascript" id="MathJax-script" async
src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js">
</script>

<link href="../docs-assets/Colours.css" rel="stylesheet" rev="stylesheet" type="text/css">

</head>
<body class="commentary-font">
<nav role="navigation">
<h1><a href="../index.html">
<img src="../docs-assets/Inform.png" height="72">
</a></h1>
<ul><li><a href="../compiler.html">compiler tools</a></li>
<li><a href="../other.html">other tools</a></li>
<li><a href="../extensions.html">extensions and kits</a></li>
<li><a href="../units.html">unit test tools</a></li>
</ul><h2>Compiler Webs</h2><ul>
<li><a href="../inbuild/index.html">inbuild</a></li>
<li><a href="../inform7/index.html">inform7</a></li>
<li><a href="../inter/index.html">inter</a></li>
</ul><h2>Inbuild Modules</h2><ul>
<li><a href="../supervisor-module/index.html">supervisor</a></li>
</ul><h2>Inform7 Modules</h2><ul>
<li><a href="index.html"><span class="selectedlink">core</span></a></li>
<li><a href="../if-module/index.html">if</a></li>
<li><a href="../multimedia-module/index.html">multimedia</a></li>
<li><a href="../index-module/index.html">index</a></li>
</ul><h2>Inter Modules</h2><ul>
<li><a href="../bytecode-module/index.html">bytecode</a></li>
<li><a href="../building-module/index.html">building</a></li>
<li><a href="../codegen-module/index.html">codegen</a></li>
</ul><h2>Services</h2><ul>
<li><a href="../arch-module/index.html">arch</a></li>
<li><a href="../calculus-module/index.html">calculus</a></li>
<li><a href="../html-module/index.html">html</a></li>
<li><a href="../inflections-module/index.html">inflections</a></li>
<li><a href="../kinds-module/index.html">kinds</a></li>
<li><a href="../linguistics-module/index.html">linguistics</a></li>
<li><a href="../problems-module/index.html">problems</a></li>
<li><a href="../syntax-module/index.html">syntax</a></li>
<li><a href="../words-module/index.html">words</a></li>
<li><a href="../../../inweb/docs/foundation-module/index.html">foundation</a></li>

</ul>
</nav>
<main role="main">
<!--Weave of 'Introduction to Semantics' generated by Inweb-->
<div class="breadcrumbs">
<ul class="crumbs"><li><a href="../index.html">Home</a></li><li><a href="../compiler.html">Inform7 Modules</a></li><li><a href="index.html">core</a></li><li><a href="index.html#10">Chapter 10: The S-Parser</a></li><li><b>Introduction to Semantics</b></li></ul></div>
<p class="purpose">A general introduction to the S-parser and the data structures it makes use of.</p>

<p class="commentary firstcommentary"><a id="SP1" class="paragraph-anchor"></a><b>§1. </b>At this point, the text read in by Inform is a stream of words, each of
which is identified by a pointer to a <span class="extract"><span class="extract-syntax">vocabulary_entry</span></span> structure in its
dictionary. The words are numbered upwards from 0, and we refer to any
contiguous run of words as an "excerpt", often writing <span class="extract"><span class="extract-syntax">(w1, w2)</span></span> to mean
the text starting with word <span class="extract"><span class="extract-syntax">w1</span></span> and continuing to word <span class="extract"><span class="extract-syntax">w2</span></span>. The stream
of words has been divided further into sentences.
</p>
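
<p class="commentary">As a rough illustration of this numbering, the following Python sketch splits a sentence into words counted from 0 and extracts an excerpt. It is not Inform's lexer: <span class="extract"><span class="extract-syntax">split_words</span></span> and <span class="extract"><span class="extract-syntax">excerpt</span></span> are hypothetical names, plain strings stand in for pointers to <span class="extract"><span class="extract-syntax">vocabulary_entry</span></span> structures, and the range <span class="extract"><span class="extract-syntax">(w1, w2)</span></span> is assumed to be inclusive at both ends.
</p>

```python
from typing import List


def split_words(text: str) -> List[str]:
    """Split text on whitespace: a crude stand-in for a real lexer."""
    return text.split()


def excerpt(words: List[str], w1: int, w2: int) -> List[str]:
    """Return the contiguous run of words numbered w1 to w2 inclusive."""
    if not (0 <= w1 <= w2 < len(words)):
        raise ValueError(f"({w1}, {w2}) is not a valid excerpt")
    return words[w1:w2 + 1]


# Words are numbered upwards from 0, so "Verona" is word 4 here.
words = split_words("Two men are in Verona")
two_men = excerpt(words, 0, 1)
verona = excerpt(words, 4, 4)
```

<p class="commentary">Under these assumptions an excerpt is just a pair of integers, so it is cheap to pass around and compare without copying any text.
</p>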

<p class="commentary">Inform has two mechanisms for making sense of this text, the A-parser and
the S-parser.
</p>

<ul class="items"><li>(A) A stands for "assertion". For instance, "Two men are in Verona." is
an assertion, telling Inform that at the start of play there are to be two
previously unknown men and that they begin in the room called Verona. The
A-parser handles entire sentences.
</li><li>(S) S stands for "semantics", the study of how already-understood
meanings correspond to excerpts of text within a sentence. The S-parser
handles anything from tiny excerpts like "6" through noun phrases such as
"Verona" to complicated expressions like "the number of men in Verona".
</li></ul>
<p class="commentary">There are many similarities between the A-parser and the S-parser, partly
because A makes use of S, but also because they contain parallel mechanisms
which handle verbs and prepositions similarly. But there are also many
differences. The A-parser will accept "On the dressing table is an amber
comb." even if table and comb have never been mentioned before, whereas
the S-parser can only recognise meanings already defined. On the other
hand, the S-parser will accept conditions like the one in "if there are
fewer than 8 men in Verona, ..." whereas the A-parser would reject
the assertion "There are fewer than 8 men in Verona." as being too vague
to act upon. Similarly, the A-parser works only in the present tense,
whereas the S-parser can handle the past and perfect tenses. (Neither
can handle any future tense, since a computer can neither control nor
definitely predict the future.)
</p>

<p class="commentary">The A-parser works by applying the S-parser to text at <span class="extract"><span class="extract-syntax">parse_node</span></span> structures
in the parse tree. So we will build the S-parser first, which won't involve
the parse tree at all. We will then go back to the parse tree to write the
A-parser.
</p>

<p class="commentary firstcommentary"><a id="SP2" class="paragraph-anchor"></a><b>§2. </b>The S-parser is similar to the expression parser in a regular compiler. It
is in some ways simpler because natural language tends not to form complex
formulae, but in other ways more complicated, because performance issues
are very significant when comparing excerpts of text, and because there are
many more ambiguities to resolve.
</p>

<p class="commentary">Our aim is to turn any excerpt into a <span class="extract"><span class="extract-syntax">specification</span></span> structure inside
Inform. This is a universal holder for both values and descriptions of
values, where "value" is interpreted very broadly. It is usually too
difficult to go directly from text to a <span class="extract"><span class="extract-syntax">specification</span></span>, so we use
a two-stage process:
</p>

<ul class="items"><li>(1) parse the text to a <span class="extract"><span class="extract-syntax">parse_node</span></span> which holds all possible interpretations
of it, and then
</li><li>(2) convert the most likely-looking interpretation(s) to a <span class="extract"><span class="extract-syntax">specification</span></span>.
</li></ul>
<p class="commentary">Thus <span class="extract"><span class="extract-syntax">parse_node</span></span> structures are private to the S-parser, whereas
<span class="extract"><span class="extract-syntax">specification</span></span> structures appear all over Inform.
</p>
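
<p class="commentary">The two-stage process can be sketched in Python as follows. This is a toy model under loud assumptions, not Inform's implementation: the classes <span class="extract"><span class="extract-syntax">Interpretation</span></span>, <span class="extract"><span class="extract-syntax">ParseNode</span></span> and <span class="extract"><span class="extract-syntax">Specification</span></span>, the numeric <span class="extract"><span class="extract-syntax">score</span></span> field and the two recognition rules are all invented for illustration.
</p>

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Interpretation:
    description: str  # what this reading of the text would mean
    score: int        # hypothetical plausibility; higher is more likely


@dataclass
class ParseNode:
    """Stage-one result: every reading of the text found so far."""
    text: str
    readings: List[Interpretation] = field(default_factory=list)


@dataclass
class Specification:
    """Stage-two result: the single reading handed to the rest of the program."""
    text: str
    meaning: str


def parse(text: str) -> ParseNode:
    """Stage 1: collect all readings (two toy rules only)."""
    node = ParseNode(text)
    if text.isdigit():
        node.readings.append(Interpretation("literal number", 2))
    if text[:1].isupper():
        node.readings.append(Interpretation("named instance", 1))
    return node


def to_specification(node: ParseNode) -> Specification:
    """Stage 2: keep the most likely-looking reading."""
    if not node.readings:
        raise ValueError(f"no interpretation of {node.text!r}")
    best = max(node.readings, key=lambda reading: reading.score)
    return Specification(node.text, best.description)
```

<p class="commentary">Keeping the stages apart means that ambiguity is recorded in one private structure and resolved in one place, which is the division of labour described above.
</p>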

<p class="commentary firstcommentary"><a id="SP3" class="paragraph-anchor"></a><b>§3. </b>Consider the following contrived example.
</p>

<blockquote>
<p>if Mr Fitzwilliam Darcy was carrying at least three things which are in the box, increase the score by 7;</p>
</blockquote>

<ul class="items"><li>(1) There are <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> structures for "Mr Fitzwilliam Darcy" and
"Mr Bingham's box", which hold the wording needed to refer to these objects.
In parsing the example sentence, we connect these structures to the excerpts
"Mr Fitzwilliam Darcy" (an exact match) and "the box" (an
abbreviated one). The <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> structures contain pointers to
further <span class="extract"><span class="extract-syntax">instance</span></span> structures which represent the identities of these
two tangible things, that is, Darcy and his friend's box.
</li><li>(2) Another <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> holds the name "score" and points it to a
<span class="extract"><span class="extract-syntax">nonlocal_variable</span></span> structure for the relevant global variable.
</li><li>(3) And a further <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> holds the name "things" and points it
to an <span class="extract"><span class="extract-syntax">instance</span></span> structure representing the common identity shared by
all things. Inform represents both individual, tangible objects such as
Mr Darcy and intangible categories of objects such as "thing"
with the same structure, <span class="extract"><span class="extract-syntax">instance</span></span>. This mirrors the way that common
and proper nouns are grammatically quite similar in natural language.
</li><li>(4) The final noun phrase in the above example is "7". There's no
<span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> structure for this (it would be insanely inefficient
to make such things); instead it is parsed directly as a "literal",
being converted immediately into a <span class="extract"><span class="extract-syntax">specification</span></span>, of which more below.
</li><li>(5) Another <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> structure holds the wording "if ... , ..."
and is connected to a <span class="extract"><span class="extract-syntax">phrase</span></span> structure for the "if" construction. Here,
the wording includes flexible-sized gaps (written "...") where excerpts
should appear: the S-parser will only recognise this if the excerpts make
sense in themselves. The combination of a <span class="extract"><span class="extract-syntax">phrase</span></span> plus the results of parsing
these gaps is stored in a structure called an <span class="extract"><span class="extract-syntax">invocation</span></span>.
</li><li>(6) In the example, the first gap is filled by "Mr Fitzwilliam Darcy was
carrying at least three things which are in the box", which the S-parser
detects as being a condition. This is translated into a <span class="extract"><span class="extract-syntax">pcalc_prop</span></span>
structure, that is, a predicate-calculus proposition: a
representation in mathematical logic of the meaning of this sentence.
<ul class="items"><li>(a) "was carrying" is recognised as matching wording in a <span class="extract"><span class="extract-syntax">verb_usage</span></span>
structure. This points to an underlying relation, stored in a <span class="extract"><span class="extract-syntax">binary_predicate</span></span>
structure, but combines it with an indication of tense stored in a <span class="extract"><span class="extract-syntax">time_period</span></span>.
Here the <span class="extract"><span class="extract-syntax">binary_predicate</span></span> is the carrying relation and the <span class="extract"><span class="extract-syntax">time_period</span></span>
is the past tense. (The term "binary predicate" comes from logic once
again; an Inform author would call the same concept a "relation".)
</li><li>(b) "are in" is recognised as a usage of the verb "to be" plus "in",
which matches the wording of a <span class="extract"><span class="extract-syntax">preposition</span></span> structure. Here the tense
derives only from the "to be" part, which is "are", so the <span class="extract"><span class="extract-syntax">time_period</span></span>
parsed is the present tense. This makes the <span class="extract"><span class="extract-syntax">preposition</span></span> a simpler
business than the <span class="extract"><span class="extract-syntax">verb_usage</span></span> structure: it only needs to refer to the
underlying meaning, which is once again a <span class="extract"><span class="extract-syntax">binary_predicate</span></span> structure,
the one for the containment relation.
</li><li>(c) "which" is a word introducing a relative clause. A sentence can
only have one primary verb, which in this example is "was carrying".
But other verbs can exist in relative clauses, and writing
"X which V Y" qualifies X by saying that any noun N matching X must also
satisfy "N V Y", where V is the verb. The relative-clause construction
is an example of syntax built directly into the S-parser. Unlike the
meanings of "score" or "Mr Fitzwilliam Darcy", it doesn't come
from any data structure.
</li><li>(d) "at least three things" is an example of a noun phrase which has a
head and a tail. The head, "at least three", is recognised as matching
the wording in a <span class="extract"><span class="extract-syntax">determiner</span></span> structure, "at least (number)", together
with the literal number 3. Once again, the <span class="extract"><span class="extract-syntax">determiner</span></span> describes textual
appearances; it points to another structure, a <span class="extract"><span class="extract-syntax">quantifier</span></span>, to hold the
meaning. This is another logical term, and Inform's debugging log would
write the resulting term as <span class="extract"><span class="extract-syntax">Card>=3</span></span> ("cardinality of at least 3").
Inform only uses <span class="extract"><span class="extract-syntax">determiner</span></span> structures when they quantify, that is, when
they talk about a possible range of objects rather than a single item.
A grammar of English would probably say that the "the" in "the box" is
also grammatically a determiner, but it doesn't get a <span class="extract"><span class="extract-syntax">determiner</span></span> structure
in Inform.
</li></ul>
<li>(7) The second gap in the "if ... , ..." excerpt is "increase the score
by 7", which the S-parser detects as a use of yet another <span class="extract"><span class="extract-syntax">phrase</span></span>, this
time referred to by the <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> structure for "increase ... by
...". It's worth noting that the S-parser doesn't check types, so it would
have been happy to match "increase 2 by 1", an impossibility. The
S-parser's job is to find all possible meanings at a textual level,
sometimes producing a list of options: the type-checker will winnow these
out later on.
</li></ul>
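
<p class="commentary">Wordings with flexible-sized gaps, as in items (5) and (7), can be illustrated with a small Python matcher. This is only a sketch: <span class="extract"><span class="extract-syntax">tokenise</span></span> and <span class="extract"><span class="extract-syntax">match</span></span> are hypothetical names, the comma is assumed to count as a word of its own, and, unlike the S-parser, the sketch does not check that the excerpts filling the gaps make sense in themselves. Like the S-parser, it checks no types and returns every textual possibility.
</p>

```python
from typing import List

GAP = "..."  # marks a flexible-sized gap in a wording pattern


def tokenise(text: str) -> List[str]:
    """Split on whitespace, treating each comma as a word of its own."""
    return text.replace(",", " , ").split()


def match(pattern: List[str], words: List[str]) -> List[List[List[str]]]:
    """Return every way of filling the gaps in pattern so that it matches words.

    Each result is a list of excerpts, one per gap, in order. A gap must
    absorb at least one word; fixed pattern words are written in lower case.
    """
    if not pattern:
        # An exhausted pattern matches only an exhausted text.
        return [[]] if not words else []
    head, rest = pattern[0], pattern[1:]
    if head != GAP:
        # A fixed word must match the next word of the text exactly.
        if words and words[0].lower() == head:
            return match(rest, words[1:])
        return []
    results = []
    for n in range(1, len(words) + 1):
        # Try letting this gap absorb the first n words.
        for tail in match(rest, words[n:]):
            results.append([words[:n]] + tail)
    return results


if_gaps = match(tokenise("if ... , ..."),
                tokenise("if the box is open, increase the score by 7"))
untyped = match(tokenise("increase ... by ..."), tokenise("increase 2 by 1"))
```

<p class="commentary">Here <span class="extract"><span class="extract-syntax">untyped</span></span> is non-empty even though "increase 2 by 1" is meaningless: in this sketch, as in the text above, rejecting it is left to a later type-checking stage.
</p>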
<p class="commentary">So parsing the text "if Mr Fitzwilliam Darcy was carrying at least three
things which are in the box, increase the score by 7" is going to result in
a mass of pointers to different structures, and we need an umbrella structure
to hold this mass together. This is what the <span class="extract"><span class="extract-syntax">parse_node</span></span> is for, but as
explained above, it's really only an intermediate state used while the S-parser
is working.
</p>

<p class="commentary firstcommentary"><a id="SP4" class="paragraph-anchor"></a><b>§4. </b>One obvious category of word is missing: there are no adjectives in this
example. Inform currently supports many sorts of adjective: either/or
properties, such as "open"; values of kinds of value which coincide with
properties, such as "green" as a value of a "colour"; and adjectives
defined with conditions or full phrases, such as "invisible" resulting
from "Definition: a thing is invisible if...".
</p>

<p class="commentary">The S-parser treats all adjectives alike, more or less just as names.
This is because "open" may mean one thing for containers and another
for scenes, for example. The identification of an adjective's name with
its set of possible meanings is via a structure called <span class="extract"><span class="extract-syntax">adjective</span></span>.
</p>
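
<p class="commentary">The idea of one name standing for a set of possible meanings can be sketched in Python like this. It is an illustration only: the <span class="extract"><span class="extract-syntax">Adjective</span></span> class, the <span class="extract"><span class="extract-syntax">registry</span></span> dictionary and the sense strings are all hypothetical, and nothing here reflects how Inform actually stores its <span class="extract"><span class="extract-syntax">adjective</span></span> structures.
</p>

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Adjective:
    """One adjective name together with all of its possible meanings."""
    name: str
    meanings: Dict[str, str] = field(default_factory=dict)  # domain -> sense


registry: Dict[str, Adjective] = {}


def declare(name: str, domain: str, sense: str) -> Adjective:
    """Record that the adjective has the given sense for the given domain."""
    adjective = registry.setdefault(name, Adjective(name))
    adjective.meanings[domain] = sense
    return adjective


def parse_adjective(word: str) -> Optional[Adjective]:
    """Identify an adjective by name only, without choosing a sense."""
    return registry.get(word)


# The sense strings below are placeholders, not Inform's own definitions.
declare("open", "container", "sense of 'open' for containers")
declare("open", "scene", "sense of 'open' for scenes")
found = parse_adjective("open")
```

<p class="commentary">Both declarations share a single <span class="extract"><span class="extract-syntax">Adjective</span></span> object, so the parser can return it without deciding which sense applies.
</p>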

<p class="commentary firstcommentary"><a id="SP5" class="paragraph-anchor"></a><b>§5. </b>To sum up. If we write "text" \(\rightarrow\) structure used for parsing
\(\rightarrow\) structure used to hold meaning, our example is parsed like so:
</p>

<ul class="items"><li>(1) "Mr Fitzwilliam Darcy" \(\rightarrow\) <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> \(\rightarrow\) <span class="extract"><span class="extract-syntax">instance</span></span>
</li><li>(2) "the score" \(\rightarrow\) <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> \(\rightarrow\) <span class="extract"><span class="extract-syntax">nonlocal_variable</span></span>
</li><li>(3) "things" \(\rightarrow\) <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> \(\rightarrow\) <span class="extract"><span class="extract-syntax">instance</span></span>
</li><li>(4) "7" \(\rightarrow\) ...none... \(\rightarrow\) <span class="extract"><span class="extract-syntax">specification</span></span>
</li><li>(5) "if Mr Fitzwilliam Darcy was carrying at least three things which are in the
box, increase the score by 7" \(\rightarrow\) <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> \(\rightarrow\)
<span class="extract"><span class="extract-syntax">invocation</span></span> (incorporating a <span class="extract"><span class="extract-syntax">phrase</span></span>)
</li><li>(6) "Mr Fitzwilliam Darcy was carrying at least three things which are in the box"
\(\rightarrow\) ...many... \(\rightarrow\) <span class="extract"><span class="extract-syntax">pcalc_prop</span></span>
<ul class="items"><li>(a) "was carrying" \(\rightarrow\) <span class="extract"><span class="extract-syntax">verb_usage</span></span> \(\rightarrow\) <span class="extract"><span class="extract-syntax">binary_predicate</span></span>
plus <span class="extract"><span class="extract-syntax">time_period</span></span>
</li><li>(b) "are in" \(\rightarrow\) <span class="extract"><span class="extract-syntax">preposition</span></span> \(\rightarrow\) <span class="extract"><span class="extract-syntax">binary_predicate</span></span>
plus <span class="extract"><span class="extract-syntax">time_period</span></span>
</li><li>(c) "at least three" \(\rightarrow\) <span class="extract"><span class="extract-syntax">determiner</span></span> \(\rightarrow\) <span class="extract"><span class="extract-syntax">quantifier</span></span> plus
literal number
</li></ul>
<li>(7) "increase the score by 7" \(\rightarrow\) <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> \(\rightarrow\)
<span class="extract"><span class="extract-syntax">invocation</span></span> (incorporating a <span class="extract"><span class="extract-syntax">phrase</span></span>)
</li><li>(8) Adjectives like the "closed" in "three closed doors" are identified
by name only, with little attempt to detect which sense is meant, so they
pass straight through the S-parser as pointers to <span class="extract"><span class="extract-syntax">adjective</span></span>
structures.
</li></ul>
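
<p class="commentary">To make one row of this summary concrete, here is a Python sketch of the route "at least three" \(\rightarrow\) determiner \(\rightarrow\) quantifier plus literal number. All names are hypothetical and the table of determiners is deliberately tiny; only the wording "at least (number)", the phrase "fewer than 8 men" and the log form <span class="extract"><span class="extract-syntax">Card&gt;=3</span></span> are taken from the text above.
</p>

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical, deliberately tiny table of spelled-out numbers.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


@dataclass(frozen=True)
class Quantifier:
    """The meaning of a determiner: a test on how many objects qualify."""
    symbol: str
    test: Callable[[int, int], bool]

    def holds(self, count: int, parameter: int) -> bool:
        return self.test(count, parameter)


# Determiner wordings (textual appearance) mapped to quantifiers (meaning).
DETERMINERS: Dict[Tuple[str, ...], Quantifier] = {
    ("at", "least"): Quantifier(">=", operator.ge),
    ("fewer", "than"): Quantifier("<", operator.lt),
}


def parse_number(word: str) -> Optional[int]:
    """Read a literal number written in digits or as a small number word."""
    if word.isdigit():
        return int(word)
    return NUMBER_WORDS.get(word.lower())


def split_head(words: List[str]) -> Optional[Tuple[Quantifier, int, List[str]]]:
    """Split a noun phrase into (quantifier, number, tail), or return None."""
    for wording, quantifier in DETERMINERS.items():
        n = len(wording)
        if len(words) > n and tuple(w.lower() for w in words[:n]) == wording:
            number = parse_number(words[n])
            if number is not None:
                return quantifier, number, words[n + 1:]
    return None


def log_form(quantifier: Quantifier, number: int) -> str:
    """Write the term in the style of the Card>=3 example."""
    return f"Card{quantifier.symbol}{number}"
```

<p class="commentary">In this sketch "the box" yields <span class="extract"><span class="extract-syntax">None</span></span>, echoing the point made in §3 that "the" gets no <span class="extract"><span class="extract-syntax">determiner</span></span> structure because it does not quantify.
</p>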
<p class="commentary firstcommentary"><a id="SP6" class="paragraph-anchor"></a><b>§6. </b>To sum up further still, <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> structures are used to parse
simple nouns and imperative phrases, whereas other specialist structures
(<span class="extract"><span class="extract-syntax">preposition</span></span>, <span class="extract"><span class="extract-syntax">determiner</span></span>, etc.) are used to parse the hinges
which hold sentences together. Once parsed, individual excerpts tend to
have meanings which might be pointers to a bewildering range of structures
(<span class="extract"><span class="extract-syntax">instance</span></span>, <span class="extract"><span class="extract-syntax">quantifier</span></span>, <span class="extract"><span class="extract-syntax">binary_predicate</span></span>, <span class="extract"><span class="extract-syntax">adjective</span></span>,
etc.) but these pointers are held together inside the S-parser by a single
unifying construction: the <span class="extract"><span class="extract-syntax">parse_node</span></span>. And we will eventually turn the
whole thing into a <span class="extract"><span class="extract-syntax">specification</span></span> for the rest of Inform to use.
</p>

<nav role="progress"><div class="progresscontainer">
<ul class="progressbar"><li class="progressprev"><a href="9-ef.html">❮</a></li><li class="progresschapter"><a href="P-wtmd.html">P</a></li><li class="progresschapter"><a href="1-cm.html">1</a></li><li class="progresschapter"><a href="2-up.html">2</a></li><li class="progresschapter"><a href="3-bv.html">3</a></li><li class="progresschapter"><a href="4-dlr.html">4</a></li><li class="progresschapter"><a href="5-rpt.html">5</a></li><li class="progresschapter"><a href="6-lp.html">6</a></li><li class="progresschapter"><a href="7-am.html">7</a></li><li class="progresschapter"><a href="8-ptu.html">8</a></li><li class="progresschapter"><a href="9-ef.html">9</a></li><li class="progresscurrentchapter">10</li><li class="progresscurrent">its</li><li class="progresssection"><a href="10-aots.html">aots</a></li><li class="progresssection"><a href="10-pl.html">pl</a></li><li class="progresssection"><a href="10-cad.html">cad</a></li><li class="progresssection"><a href="10-teav.html">teav</a></li><li class="progresssection"><a href="10-varc.html">varc</a></li><li class="progresssection"><a href="10-cap.html">cap</a></li><li class="progresschapter"></li><li class="progresschapter"><a href="12-terr.html">12</a></li><li class="progresschapter"><a href="13-kak.html">13</a></li><li class="progresschapter"><a href="14-sp.html">14</a></li><li class="progresschapter"><a href="15-pr.html">15</a></li><li class="progresschapter"><a href="16-is.html">16</a></li><li class="progresschapter"><a href="17-tl.html">17</a></li><li class="progresschapter"><a href="18-lc.html">18</a></li><li class="progresschapter"><a href="19-tc.html">19</a></li><li class="progresschapter"><a href="20-eq.html">20</a></li><li class="progresschapter"><a href="21-rl.html">21</a></li><li class="progresschapter"><a href="22-itp.html">22</a></li><li class="progresschapter"><a href="23-ad.html">23</a></li><li class="progresschapter"><a href="24-lv.html">24</a></li><li class="progresschapter"><a href="25-in.html">25</a></li><li 
class="progresschapter"><a href="26-fc.html">26</a></li><li class="progresschapter"><a href="27-hr.html">27</a></li><li class="progressnext"><a href="10-aots.html">❯</a></li></ul></div>
</nav><!--End of weave-->

</main>
</body>
</html>