mirror of
https://github.com/ganelson/inform.git
synced 2024-07-05 08:34:22 +03:00
962 lines
91 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
|
|
<html>
|
|
<head>
|
|
<title>2/em</title>
|
|
<meta name="viewport" content="width=device-width, initial-scale=1">
|
|
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
|
|
<meta http-equiv="Content-Language" content="en-gb">
|
|
<link href="../inweb.css" rel="stylesheet" rev="stylesheet" type="text/css">
|
|
</head>
|
|
<body>
|
|
<nav role="navigation">
|
|
<h1><a href="../webs.html">Sources</a></h1>
|
|
<ul>
|
|
<li><a href="../compiler.html"><b>compiler tools</b></a></li>
|
|
<li><a href="../other.html">other tools</a></li>
|
|
<li><a href="../extensions.html">extensions and kits</a></li>
|
|
<li><a href="../units.html">unit test tools</a></li>
|
|
</ul>
|
|
<h2>Compiler Webs</h2>
|
|
<ul>
|
|
<li><a href="../inbuild/index.html">inbuild</a></li>
|
|
<li><a href="../inform7/index.html">inform7</a></li>
|
|
<li><a href="../inter/index.html">inter</a></li>
|
|
</ul>
|
|
<h2>Inbuild Modules</h2>
|
|
<ul>
|
|
<li><a href="../inbuild-module/index.html">inbuild</a></li>
|
|
<li><a href="../arch-module/index.html">arch</a></li>
|
|
<li><a href="../words-module/index.html">words</a></li>
|
|
<li><a href="../syntax-module/index.html">syntax</a></li>
|
|
<li><a href="../html-module/index.html">html</a></li>
|
|
</ul>
|
|
<h2>Inform7 Modules</h2>
|
|
<ul>
|
|
<li><a href="../core-module/index.html">core</a></li>
|
|
<li><a href="../problems-module/index.html">problems</a></li>
|
|
<li><a href="../inflections-module/index.html">inflections</a></li>
|
|
<li><a href="../linguistics-module/index.html">linguistics</a></li>
|
|
<li><a href="../kinds-module/index.html">kinds</a></li>
|
|
<li><a href="../if-module/index.html">if</a></li>
|
|
<li><a href="../multimedia-module/index.html">multimedia</a></li>
|
|
<li><a href="../index-module/index.html">index</a></li>
|
|
</ul>
|
|
<h2>Inter Modules</h2>
|
|
<ul>
|
|
<li><a href="../inter-module/index.html">inter</a></li>
|
|
<li><a href="../building-module/index.html">building</a></li>
|
|
<li><a href="../codegen-module/index.html">codegen</a></li>
|
|
</ul>
|
|
<h2>Foundation</h2>
|
|
<ul>
|
|
<li><a href="../../../inweb/docs/foundation-module/index.html">foundation</a></li>
|
|
</ul>
|
|
|
|
|
|
</nav>
|
|
<main role="main">
|
|
|
|
<!--Weave of '2/em' generated by 7-->
|
|
<ul class="crumbs"><li><a href="../webs.html">Source</a></li><li><a href="../compiler.html">Compiler Modules</a></li><li><a href="index.html">linguistics</a></li><li><a href="index.html#2">Chapter 2: Excerpts</a></li><li><b>Excerpt Meanings</b></li></ul><p class="purpose">To register and deregister meanings for excerpts of text as nouns, adjectives, imperative phrases and other usages.</p>
|
|
|
|
<ul class="toc"><li><a href="#SP1">§1. Excerpt meanings</a></li><li><a href="#SP3">§3. Meaning codes</a></li><li><a href="#SP7">§7. Creating EMs</a></li><li><a href="#SP9">§9. Debugging log</a></li><li><a href="#SP10">§10. Hashing excerpts</a></li><li><a href="#SP13">§13. EM Listing</a></li><li><a href="#SP15">§15. Registration</a></li><li><a href="#SP15_3_2">§15.3.2. Meaning from assemblages</a></li><li><a href="#SP16">§16. Errors</a></li></ul><hr class="tocbar">
|
|
|
|
<p class="inwebparagraph"><a id="SP1"></a><b>§1. Excerpt meanings. </b>Most compilers keep a "symbols table" of identifier names and what
|
|
meanings they have: for instance, when compiling Inform, GCC's symbols
|
|
table records that <code class="display"><span class="extract">problem_count</span></code> is the name of an integer variable and
|
|
<code class="display"><span class="extract">excerpt_meaning</span></code> of a defined type. This is usually stored so that a
|
|
new name can rapidly be checked to see if it matches one that is currently
|
|
known.
|
|
</p>
|
|
|
|
<p class="inwebparagraph">In natural language we must similarly remember meanings of excerpts. (Recall
|
|
that an "excerpt" is a run of one or more adjacent words in the source text.)
|
|
Here we store just such a lexicon. We won't use this for every grammatical
|
|
category (determiners and verb forms are more efficiently stored elsewhere),
|
|
but otherwise it's a general grab-bag of meanings. Inform uses this data
|
|
structure to store (a) adjectives, (b) nouns and (c) imperative phrases
|
|
of the sort used to define rules. Examples include:
|
|
</p>
|
|
|
|
<blockquote>
|
|
<p>american dialect, say close bracket, player's command, open, Hall of Mirrors</p>
|
|
|
|
</blockquote>
|
|
|
|
<p class="inwebparagraph">Most compilers use a symbols table whose efficiency depends on the fact
|
|
that symbol names are relatively long strings (say, 8 or more units)
|
|
drawn from a small alphabet (say, the 37 letters, digits and the underscore).
|
|
But Inform has short symbols (typically one to three units) drawn from a
|
|
huge alphabet (say, all 5,000 different words found in the source text).
|
|
And we also need to parse in ways which a conventional compiler does not.
|
|
If C has registered the identifier <code class="display"><span class="extract">pink_martini</span></code> then it never needs to
|
|
notice <code class="display"><span class="extract">pnk_martin</span></code> as being related to it. But when Inform registers
|
|
"pink martini" as the name of an instance, it then has to spot that
|
|
either "pink" or "martini" alone might also refer to the same object.
|
|
So we are not going to use the conventional algorithms.
|
|
</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP2"></a><b>§2. </b>We now define the <code class="display"><span class="extract">excerpt_meaning</span></code> data structure, which holds a single
|
|
entry in this super-dictionary. The text to be matched is specified as a
|
|
sequence of at least one, and at most 32, tokens: these can either be
|
|
pointers to specific vocabulary, or can be null, which implies that
|
|
arbitrary non-empty text can appear in the given position. (It is forbidden
|
|
for the token list to contain two nulls in a row.) For instance, the
|
|
token list:
|
|
</p>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<pre class="display">
|
|
<span class="plain">drink # milk #</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph">matches "drink more milk today and every day", but not "drink milk". (The
|
|
sharp symbol <code class="display"><span class="extract">#</span></code> is printed in place of a null token, both in this documentation
|
|
and in the debugging log.)
|
|
</p>
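<p class="inwebparagraph">The matching rule just described can be sketched in a few lines of C. This is an illustration only, not Inform's own matcher: tokens and words are modelled here as plain C strings rather than <code class="display"><span class="extract">vocabulary_entry</span></code> pointers, and the function names <code class="display"><span class="extract">token_list_matches</span></code> and <code class="display"><span class="extract">match_from</span></code> are hypothetical.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in: a token is a C string, or NULL for a gap ("#"). */
static int match_from(const char **tokens, int nt, const char **words, int nw) {
    if (nt == 0) return (nw == 0);      /* both lists must run out together */
    if (tokens[0] != NULL) {            /* literal token: must equal the next word */
        if (nw == 0 || strcmp(tokens[0], words[0]) != 0) return 0;
        return match_from(tokens + 1, nt - 1, words + 1, nw - 1);
    }
    /* gap: absorb one or more words, then try to match the remainder */
    for (int k = 1; k <= nw; k++)
        if (match_from(tokens + 1, nt - 1, words + k, nw - k)) return 1;
    return 0;
}

/* Returns 1 if the excerpt words[0..nw-1] matches the token list, else 0. */
int token_list_matches(const char **tokens, int nt, const char **words, int nw) {
    assert(nt >= 1 && nt <= 32);        /* the limits stated in the text above */
    return match_from(tokens, nt, words, nw);
}
```

<p class="inwebparagraph">Because a gap must absorb at least one word, the token list "drink # milk #" accepts "drink more milk today and every day" but rejects "drink milk", as in the example above.</p>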
|
|
|
|
<p class="inwebparagraph">Each excerpt meaning also comes with a hash code, which is automatically
|
|
generated from its token list, and a pointer to some structure.
|
|
</p>
|
|
|
|
|
|
<pre class="definitions">
|
|
<span class="definitionkeyword">enum</span> <span class="constant">TooLongName_LINERROR</span><span class="definitionkeyword"> from </span><span class="constant">1</span>
|
|
</pre>
|
|
|
|
<pre class="display">
|
|
|
|
<span class="definitionkeyword">define</span> <span class="constant">MAX_TOKENS_PER_EXCERPT_MEANING</span><span class="plain"> </span><span class="constant">32</span>
|
|
<span class="reserved">typedef</span><span class="plain"> </span><span class="reserved">struct</span><span class="plain"> </span><span class="reserved">excerpt_meaning</span><span class="plain"> {</span>
|
|
<span class="reserved">unsigned</span><span class="plain"> </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">meaning_code</span><span class="plain">; </span><span class="comment">what kind of meaning: a single MC, not a bitmap</span>
|
|
<span class="reserved">struct</span><span class="plain"> </span><span class="identifier">general_pointer</span><span class="plain"> </span><span class="identifier">data</span><span class="plain">; </span><span class="comment">data structure being referred to</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">no_em_tokens</span><span class="plain">; </span><span class="comment">length of token list</span>
|
|
<span class="reserved">struct</span><span class="plain"> </span><span class="identifier">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">em_tokens</span><span class="plain">[</span><span class="constant">MAX_TOKENS_PER_EXCERPT_MEANING</span><span class="plain">]; </span><span class="comment">token list</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">excerpt_hash</span><span class="plain">; </span><span class="comment">hash code generated from the token list</span>
|
|
<span class="identifier">MEMORY_MANAGEMENT</span>
|
|
<span class="plain">} </span><span class="reserved">excerpt_meaning</span><span class="plain">;</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">The structure excerpt_meaning is accessed in 2/pe and here.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP3"></a><b>§3. Meaning codes. </b>These assign a context to a meaning, and so decide how the <code class="display"><span class="extract">data</span></code> pointer for
|
|
an excerpt meaning is to be interpreted. For instance, "Persian carpet" might
|
|
have a meaning with code <code class="display"><span class="extract">NOUN_MC</span></code>.
|
|
</p>
|
|
|
|
<p class="inwebparagraph">Meaning codes are used in other contexts in Inform besides this one. There
|
|
are up to 31 of them and each is a distinct power of two; there is no
|
|
significance to their ordering. The point is that a signed integer (which
|
|
we know can hold values at least up to 2<sup>31</sup>-1) can hold a bitmap
|
|
representing any subset of these meaning codes; for instance, <code class="display"><span class="extract">PROPERTY_MC + TABLE_MC</span></code>
might mean "either a property name or a table name".
</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP4"></a><b>§4. </b>The <code class="display"><span class="extract">meaning_code</span></code> field of an <code class="display"><span class="extract">excerpt_meaning</span></code> is always exactly
|
|
one of the <code class="display"><span class="extract">*_MC</span></code> values. (It is never a bitmap combination.)
|
|
</p>
|
|
|
|
|
|
<pre class="definitions">
|
|
<span class="definitionkeyword">define</span> <span class="constant">MISCELLANEOUS_MC</span><span class="plain"> </span><span class="constant">0x00000001</span><span class="plain"> </span><span class="comment">a grab-bag of other possible nouns</span>
|
|
<span class="definitionkeyword">define</span> <span class="constant">NOUN_MC</span><span class="plain"> </span><span class="constant">0x00000002</span><span class="plain"> </span><span class="comment">e.g., <code class="display"><span class="extract">upright chair</span></code></span>
|
|
<span class="definitionkeyword">define</span> <span class="constant">ADJECTIVE_MC</span><span class="plain"> </span><span class="constant">0x00000004</span><span class="plain"> </span><span class="comment">e.g., <code class="display"><span class="extract">invisible</span></code></span>
|
|
</pre>
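<p class="inwebparagraph">As a rough sketch of how such bitmaps behave, the following uses the three codes defined above, with a <code class="display"><span class="extract">u</span></code> suffix added so that the arithmetic is unsigned. The helper names <code class="display"><span class="extract">is_single_mc</span></code> and <code class="display"><span class="extract">bitmap_includes</span></code> are hypothetical and are not part of Inform.</p>

```c
#include <assert.h>

#define MISCELLANEOUS_MC 0x00000001u
#define NOUN_MC          0x00000002u
#define ADJECTIVE_MC     0x00000004u

/* True if mc is exactly one meaning code, i.e. a single power of two. */
int is_single_mc(unsigned int mc) {
    return (mc != 0) && ((mc & (mc - 1)) == 0);
}

/* True if the single code mc belongs to the set represented by bitmap. */
int bitmap_includes(unsigned int bitmap, unsigned int mc) {
    assert(is_single_mc(mc));
    return (bitmap & mc) != 0;
}
```

<p class="inwebparagraph">Since the codes are distinct powers of two, adding two different codes gives the same result as a bitwise OR, so <code class="display"><span class="extract">NOUN_MC + ADJECTIVE_MC</span></code> represents "either a noun or an adjective". A value of that sort would fail <code class="display"><span class="extract">is_single_mc</span></code> and so could not be stored in a <code class="display"><span class="extract">meaning_code</span></code> field.</p>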
|
|
<p class="inwebparagraph"><a id="SP5"></a><b>§5. </b>Each word in our vocabulary will be annotated with this structure:
|
|
</p>
|
|
|
|
|
|
<pre class="definitions">
|
|
<span class="definitionkeyword">define</span> <span class="constant">VOCABULARY_MEANING_INITIALISER</span><span class="plain"> </span><span class="functiontext">ExcerptMeanings::new_meaning</span>
|
|
</pre>
|
|
|
|
<pre class="display">
|
|
<span class="reserved">typedef</span><span class="plain"> </span><span class="reserved">struct</span><span class="plain"> </span><span class="reserved">vocabulary_meaning</span><span class="plain"> {</span>
|
|
<span class="reserved">struct</span><span class="plain"> </span><span class="identifier">kind</span><span class="plain"> *</span><span class="identifier">one_word_kind</span><span class="plain">; </span><span class="comment">this word read as a kind with a single-word name</span>
|
|
<span class="reserved">struct</span><span class="plain"> </span><span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">start_list</span><span class="plain">; </span><span class="comment">meanings starting with this</span>
|
|
<span class="reserved">struct</span><span class="plain"> </span><span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">end_list</span><span class="plain">; </span><span class="comment">meanings ending with this</span>
|
|
<span class="reserved">struct</span><span class="plain"> </span><span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">middle_list</span><span class="plain">; </span><span class="comment">meanings with this inside but at neither end</span>
|
|
<span class="reserved">struct</span><span class="plain"> </span><span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">subset_list</span><span class="plain">; </span><span class="comment">meanings allowing subsets which include this</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">subset_list_length</span><span class="plain">; </span><span class="comment">number of meanings in the subset list</span>
|
|
<span class="plain">} </span><span class="reserved">vocabulary_meaning</span><span class="plain">;</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">The structure vocabulary_meaning is accessed in 2/pe and here.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP6"></a><b>§6. </b>With the following initialiser:
|
|
</p>
|
|
|
|
<pre class="display">
|
|
<span class="reserved">vocabulary_meaning</span><span class="plain"> </span><span class="functiontext">ExcerptMeanings::new_meaning</span><span class="plain">(</span><span class="identifier">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">ve</span><span class="plain">) {</span>
|
|
<span class="plain">#</span><span class="identifier">ifdef</span><span class="plain"> </span><span class="identifier">CORE_MODULE</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">Kinds::Textual::parse_variable</span><span class="plain">(</span><span class="identifier">ve</span><span class="plain">)) </span><span class="identifier">ve</span><span class="plain">-></span><span class="identifier">flags</span><span class="plain"> |= </span><span class="identifier">KIND_FAST_MC</span><span class="plain">;</span>
|
|
<span class="plain">#</span><span class="identifier">endif</span>
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">ve</span><span class="plain">-></span><span class="identifier">flags</span><span class="plain">) & </span><span class="identifier">NUMBER_MC</span><span class="plain">) </span><span class="functiontext">Cardinals::mark_as_cardinal</span><span class="plain">(</span><span class="identifier">ve</span><span class="plain">);</span>
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">ve</span><span class="plain">-></span><span class="identifier">flags</span><span class="plain">) & </span><span class="identifier">ORDINAL_MC</span><span class="plain">) </span><span class="functiontext">Cardinals::mark_as_ordinal</span><span class="plain">(</span><span class="identifier">ve</span><span class="plain">);</span>
|
|
|
|
<span class="reserved">vocabulary_meaning</span><span class="plain"> </span><span class="identifier">vm</span><span class="plain">;</span>
|
|
<span class="identifier">vm</span><span class="plain">.</span><span class="element">start_list</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">; </span><span class="identifier">vm</span><span class="plain">.</span><span class="element">end_list</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">; </span><span class="identifier">vm</span><span class="plain">.</span><span class="element">middle_list</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
|
|
<span class="identifier">vm</span><span class="plain">.</span><span class="element">subset_list</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">; </span><span class="identifier">vm</span><span class="plain">.</span><span class="element">subset_list_length</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
|
|
<span class="identifier">vm</span><span class="plain">.</span><span class="element">one_word_kind</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
|
|
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">vm</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">The function ExcerptMeanings::new_meaning appears nowhere else.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP7"></a><b>§7. Creating EMs. </b>The following makes a skeletal EM structure, with no token list or hash code
|
|
as yet.
|
|
</p>
|
|
|
|
<pre class="display">
|
|
<span class="reserved">excerpt_meaning</span><span class="plain"> *</span><span class="functiontext">ExcerptMeanings::new</span><span class="plain">(</span><span class="reserved">unsigned</span><span class="plain"> </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">mc</span><span class="plain">, </span><span class="identifier">general_pointer</span><span class="plain"> </span><span class="identifier">data</span><span class="plain">) {</span>
|
|
<span class="reserved">excerpt_meaning</span><span class="plain"> *</span><span class="identifier">em</span><span class="plain"> = </span><span class="identifier">CREATE</span><span class="plain">(</span><span class="reserved">excerpt_meaning</span><span class="plain">);</span>
|
|
<span class="identifier">em</span><span class="plain">-></span><span class="element">meaning_code</span><span class="plain"> = </span><span class="identifier">mc</span><span class="plain">;</span>
|
|
<span class="identifier">em</span><span class="plain">-></span><span class="element">data</span><span class="plain"> = </span><span class="identifier">data</span><span class="plain">;</span>
|
|
<span class="identifier">em</span><span class="plain">-></span><span class="element">no_em_tokens</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
|
|
<span class="identifier">em</span><span class="plain">-></span><span class="element">excerpt_hash</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
|
|
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">em</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">The function ExcerptMeanings::new is used in <a href="#SP15">§15</a>, <a href="#SP15_3_2">§15.3.2</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP8"></a><b>§8. </b>Access routines:
|
|
</p>
|
|
|
|
<pre class="display">
|
|
<span class="identifier">general_pointer</span><span class="plain"> </span><span class="functiontext">ExcerptMeanings::data</span><span class="plain">(</span><span class="reserved">excerpt_meaning</span><span class="plain"> *</span><span class="identifier">em</span><span class="plain">) {</span>
|
|
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">em</span><span class="plain">-></span><span class="identifier">data</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">The function ExcerptMeanings::data is used in 2/pe (<a href="2-pe.html#SP6">§6</a>), 3/adj (<a href="3-adj.html#SP6">§6</a>), 3/nns (<a href="3-nns.html#SP9">§9</a>).</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP9"></a><b>§9. Debugging log. </b>First, to log a single excerpt meaning (its token list, then its meaning code), and then every excerpt meaning in turn:
|
|
</p>
|
|
|
|
<pre class="display">
|
|
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">ExcerptMeanings::log</span><span class="plain">(</span><span class="identifier">OUTPUT_STREAM</span><span class="plain">, </span><span class="reserved">void</span><span class="plain"> *</span><span class="identifier">vem</span><span class="plain">) {</span>
|
|
<span class="reserved">excerpt_meaning</span><span class="plain"> *</span><span class="identifier">em</span><span class="plain"> = (</span><span class="reserved">excerpt_meaning</span><span class="plain"> *) </span><span class="identifier">vem</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">em</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) { </span><span class="identifier">LOG</span><span class="plain">(</span><span class="string">"&lt;null-em&gt;"</span><span class="plain">); </span><span class="reserved">return</span><span class="plain">; }</span>
|
|
<span class="identifier">LOG</span><span class="plain">(</span><span class="string">"{"</span><span class="plain">);</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">=0; </span><span class="identifier">i</span><span class="plain">&lt;</span><span class="identifier">em</span><span class="plain">-></span><span class="element">no_em_tokens</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">++) {</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">i</span><span class="plain">>0) </span><span class="identifier">LOG</span><span class="plain">(</span><span class="string">" "</span><span class="plain">);</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">em</span><span class="plain">-></span><span class="element">em_tokens</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] == </span><span class="identifier">NULL</span><span class="plain">) { </span><span class="identifier">LOG</span><span class="plain">(</span><span class="string">"#"</span><span class="plain">); </span><span class="reserved">continue</span><span class="plain">; }</span>
|
|
<span class="identifier">LOG</span><span class="plain">(</span><span class="string">"%V"</span><span class="plain">, </span><span class="identifier">em</span><span class="plain">-></span><span class="element">em_tokens</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]);</span>
|
|
<span class="plain">}</span>
|
|
<span class="identifier">LOG</span><span class="plain">(</span><span class="string">" = $N"</span><span class="plain">, </span><span class="identifier">em</span><span class="plain">-></span><span class="element">meaning_code</span><span class="plain">);</span>
|
|
<span class="identifier">LOG</span><span class="plain">(</span><span class="string">"}"</span><span class="plain">);</span>
|
|
<span class="plain">}</span>
|
|
|
|
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">ExcerptMeanings::log_all</span><span class="plain">(</span><span class="reserved">void</span><span class="plain">) {</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
|
|
<span class="reserved">excerpt_meaning</span><span class="plain"> *</span><span class="identifier">em</span><span class="plain">;</span>
|
|
<span class="identifier">LOOP_OVER</span><span class="plain">(</span><span class="identifier">em</span><span class="plain">, </span><span class="reserved">excerpt_meaning</span><span class="plain">)</span>
|
|
<span class="identifier">LOG</span><span class="plain">(</span><span class="string">"%02d: %08x $M\n"</span><span class="plain">, </span><span class="identifier">i</span><span class="plain">++, (</span><span class="identifier">pointer_sized_int</span><span class="plain">) </span><span class="identifier">em</span><span class="plain">, </span><span class="identifier">em</span><span class="plain">);</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">The function ExcerptMeanings::log is used in 1/lm (<a href="1-lm.html#SP3_5">§3.5</a>).</p>
|
|
|
|
<p class="endnote">The function ExcerptMeanings::log_all is used in 2/pe (<a href="2-pe.html#SP7">§7</a>).</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP10"></a><b>§10. Hashing excerpts. </b>For excerpts <code class="display"><span class="extract">(w1, w2)</span></code>, we need a form of hash function which makes it
|
|
easy to test whether the words in one excerpt can all be found in another,
|
|
or to be more exact whether
|
|
</p>
|
|
|
|
<p class="inwebparagraph"> { I_j | w_1 &le; j &le; w_2 } ⊆
{ I_j | w_3 &le; j &le; w_4 }
|
|
</p>
|
|
|
|
<p class="inwebparagraph">where I_n is the identity of word n. As with all hash algorithms, we do
|
|
not need to guarantee a positive match, only a negative, so we can throw
|
|
away a lot of information. And we also want a hash function which makes it
|
|
easy to test whether an excerpt contains any of the literals.
|
|
</p>
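<p class="inwebparagraph">The idea of a hash that can refute a subset relationship, but never prove one, can be sketched as follows. This is only an illustration under stated assumptions: words are identified by small integers, and the per-word function <code class="display"><span class="extract">word_bit</span></code> is hypothetical, since the contribution each vocabulary entry really makes to the hash is defined elsewhere.</p>

```c
#include <assert.h>

/* Hypothetical per-word hash: maps a word's identity onto one of 30 low bits. */
unsigned int word_bit(unsigned int word_id) {
    return 1u << (word_id % 30);
}

/* OR together the bits contributed by every word in the excerpt. */
unsigned int excerpt_bits(const unsigned int *ids, int n) {
    assert(n >= 0);
    unsigned int h = 0;
    for (int i = 0; i < n; i++) h |= word_bit(ids[i]);
    return h;
}

/* Returns 0 if some word of the inner excerpt is certainly absent from the
   outer one; returns 1 if the subset relationship is merely possible. */
int may_be_subset(unsigned int h_inner, unsigned int h_outer) {
    return (h_inner & ~h_outer) == 0;
}
```

<p class="inwebparagraph">Two different words may share a bit, so a result of 1 still has to be confirmed by comparing the words themselves, whereas a result of 0 lets the comparison be skipped altogether.</p>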
|
|
|
|
<p class="inwebparagraph"><a id="SP11"></a><b>§11. </b>There are two sources of text which we might want to hash in this way:
|
|
first, actual excerpts found in the source text. These are not very
|
|
expensive to calculate, but every ounce of speed helps here, so we cache
|
|
the most recent.
|
|
</p>
|
|
|
|
<p class="inwebparagraph">The hash generated this way is an arbitrary bitmap of bits 1 to 30, with
|
|
bits 31 and 32 left clear.
|
|
</p>
|
|
|
|
<pre class="display">
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">cached_hash_w1</span><span class="plain"> = -2, </span><span class="identifier">cached_hash_w2</span><span class="plain"> = -2, </span><span class="identifier">cached_value</span><span class="plain">;</span>
|
|
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">ExcerptMeanings::hash_code</span><span class="plain">(</span><span class="identifier">wording</span><span class="plain"> </span><span class="identifier">W</span><span class="plain">) {</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">Wordings::empty</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">)) </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">0</span><span class="plain">;</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">w1</span><span class="plain"> = </span><span class="identifier">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">), </span><span class="identifier">w2</span><span class="plain"> = </span><span class="identifier">Wordings::last_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">);</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">, </span><span class="identifier">h</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">v</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">w1</span><span class="plain"> == </span><span class="identifier">cached_hash_w1</span><span class="plain">) && (</span><span class="identifier">w2</span><span class="plain"> == </span><span class="identifier">cached_hash_w2</span><span class="plain">)) </span><span class="reserved">return</span><span class="plain"> </span><span class="identifier">cached_value</span><span class="plain">;</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">i</span><span class="plain">=</span><span class="identifier">w1</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">&lt;=</span><span class="identifier">w2</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">++) {</span>
|
|
<span class="identifier">v</span><span class="plain"> = </span><span class="identifier">Lexer::word</span><span class="plain">(</span><span class="identifier">i</span><span class="plain">);</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">v</span><span class="plain">) </span>&lt;<span class="cwebmacro">Allow this vocabulary entry to contribute to the excerpt's hash code</span> <span class="cwebmacronumber">11.2</span>&gt;<span class="character">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">h</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">The function ExcerptMeanings::hash_code is used in 2/pe (<a href="2-pe.html#SP5">§5</a>).</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP11_1"></a><b>§11.1. </b>Second, when a new excerpt meaning is to be registered, we want to compute
a hash code for its token list. But only some of the tokens are vocabulary entries,
while others instead represent gaps where arbitrary text can appear (referred
to with a null pointer). Note that we simply ignore the gaps when hashing,
|
|
that is, we produce the same hash as we would if the gaps were not there at
|
|
all.
|
|
</p>
|
|
|
|
<p class="inwebparagraph">The hash generated this way is an arbitrary bitmap of bits 1 to 31, with
|
|
bit 32 left clear. Bit 31 is set, as a special case, for excerpts in the
|
|
context of text substitutions which begin with a word known to exist, and
|
|
with differing meanings, in two differently cased forms: this is how "[the
|
|
noun]" is distinguished from "[The noun]". (The lower 30 bits have the
|
|
same meaning as in the first case above.)
|
|
</p>
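<p class="inwebparagraph">A simplified model of these two rules (gaps contribute nothing, and a capitalised first word sets the extra flag bit) might look like this. It is a sketch, not the routine Inform uses: the <code class="display"><span class="extract">token</span></code> type, its <code class="display"><span class="extract">capitalised_variant</span></code> field and the per-token function <code class="display"><span class="extract">token_bit</span></code> are all hypothetical, and a capitalised word is modelled simply as sharing an identity with its lower-case form.</p>

```c
#include <assert.h>
#include <stddef.h>

#define CAPITALISED_VARIANT_FORM (1 << 30)
#define LOW_30_BITS ((1 << 30) - 1)

/* Hypothetical token: id identifies the word in its lower-case form, and
   capitalised_variant is nonzero if this is the upper-case variant of it. */
typedef struct token { unsigned int id; int capitalised_variant; } token;

/* Hypothetical per-token contribution: one of the 30 low bits. */
int token_bit(const token *t) { return 1 << (t->id % 30); }

/* Hash a token list in which NULL entries stand for gaps. */
int hash_token_list(const token **tokens, int n) {
    assert(n >= 1);                      /* empty token lists are not allowed */
    int h = 0;
    if (tokens[0] && tokens[0]->capitalised_variant)
        h |= CAPITALISED_VARIANT_FORM;   /* flag lives above the low 30 bits */
    for (int i = 0; i < n; i++)
        if (tokens[i]) h |= token_bit(tokens[i]);  /* gaps contribute nothing */
    return h;
}
```

<p class="inwebparagraph">Under this model "drink # milk #" and "drink milk" receive the same hash, while "[the noun]" and "[The noun]" agree in their lower 30 bits and differ only in the flag bit.</p>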
|
|
|
|
|
|
<pre class="definitions">
|
|
<span class="definitionkeyword">define</span> <span class="constant">CAPITALISED_VARIANT_FORM</span><span class="plain"> (1 &lt;&lt; </span><span class="constant">30</span><span class="plain">)</span>
|
|
</pre>
|
|
|
|
<pre class="display">
|
|
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">ExcerptMeanings::hash_code_from_token_list</span><span class="plain">(</span><span class="reserved">excerpt_meaning</span><span class="plain"> *</span><span class="identifier">em</span><span class="plain">) {</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">, </span><span class="identifier">h</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">em</span><span class="plain">-></span><span class="element">no_em_tokens</span><span class="plain"> == </span><span class="constant">0</span><span class="plain">) </span><span class="identifier">internal_error</span><span class="plain">(</span><span class="string">"Empty text when registering"</span><span class="plain">);</span>
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">em</span><span class="plain">-></span><span class="element">no_em_tokens</span><span class="plain"> >= </span><span class="constant">1</span><span class="plain">) && (</span><span class="identifier">em</span><span class="plain">-></span><span class="element">em_tokens</span><span class="plain">[0])) {</span>
|
|
<span class="identifier">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">lcf</span><span class="plain"> = </span><span class="identifier">Vocabulary::get_lower_case_form</span><span class="plain">(</span><span class="identifier">em</span><span class="plain">-></span><span class="element">em_tokens</span><span class="plain">[0]);</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">lcf</span><span class="plain">) {</span>
|
|
<span class="identifier">h</span><span class="plain"> = </span><span class="identifier">h</span><span class="plain"> | </span><span class="constant">CAPITALISED_VARIANT_FORM</span><span class="plain">;</span>
|
|
<span class="identifier">em</span><span class="plain">-></span><span class="element">em_tokens</span><span class="plain">[0] = </span><span class="identifier">lcf</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">}</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">i</span><span class="plain">=0; </span><span class="identifier">i</span><span class="plain"><</span><span class="identifier">em</span><span class="plain">-></span><span class="element">no_em_tokens</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">++) {</span>
|
|
<span class="identifier">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">v</span><span class="plain"> = </span><span class="identifier">em</span><span class="plain">-></span><span class="element">em_tokens</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">];</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">v</span><span class="plain">) </span><<span class="cwebmacro">Allow this vocabulary entry to contribute to the excerpt's hash code</span> <span class="cwebmacronumber">11.2</span>><span class="character">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="identifier">em</span><span class="plain">-></span><span class="element">excerpt_hash</span><span class="plain"> = </span><span class="identifier">h</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">The function ExcerptMeanings::hash_code_from_token_list is used in <a href="#SP13_1">§13.1</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP11_2"></a><b>§11.2. </b>Now each vocabulary entry <code class="display"><span class="extract">v</span></code>, i.e., each distinct word identity, itself has
|
|
a hash code to identify it. These are stored in <code class="display"><span class="extract">v->hash</span></code> and, except for
|
|
literals, are more or less evenly distributed in about the range 0 to 1000.
|
|
</p>
|
|
|
|
<p class="inwebparagraph">The contribution made by a single word's individual hash to the bitmap hash
|
|
for the whole excerpt is as follows.
|
|
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Allow this vocabulary entry to contribute to the excerpt's hash code</span> <span class="cwebmacronumber">11.2</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">v</span><span class="plain">-></span><span class="identifier">flags</span><span class="plain">) & </span><span class="identifier">NUMBER_MC</span><span class="plain">) </span><span class="identifier">h</span><span class="plain"> = </span><span class="identifier">h</span><span class="plain"> | </span><span class="constant">1</span><span class="plain">;</span>
|
|
<span class="reserved">else</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">v</span><span class="plain">-></span><span class="identifier">flags</span><span class="plain">) & </span><span class="identifier">TEXT_MC</span><span class="plain">) </span><span class="identifier">h</span><span class="plain"> = </span><span class="identifier">h</span><span class="plain"> | </span><span class="constant">2</span><span class="plain">;</span>
|
|
<span class="reserved">else</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">v</span><span class="plain">-></span><span class="identifier">flags</span><span class="plain">) & </span><span class="identifier">I6_MC</span><span class="plain">) </span><span class="identifier">h</span><span class="plain"> = </span><span class="identifier">h</span><span class="plain"> | </span><span class="constant">4</span><span class="plain">;</span>
|
|
<span class="reserved">else</span><span class="plain"> </span><span class="identifier">h</span><span class="plain"> = </span><span class="identifier">h</span><span class="plain"> | (8 << ((</span><span class="identifier">v</span><span class="plain">-></span><span class="identifier">hash</span><span class="plain">) % </span><span class="constant">27</span><span class="plain">));</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP11">§11</a>, <a href="#SP11_1">§11.1</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP12"></a><b>§12. </b>To sum up: the excerpt hash is a bitmap indicating what categories of
|
|
words are present in the excerpt. It ignores "gaps" in token lists, and
|
|
it ignores the order of the words and repetitions. The three least
|
|
significant bits indicate whether numbers, text or I6 verbatims are
|
|
present, and the next 27 bits indicate the presence of other words: e.g.,
|
|
bit 4 indicates that a word with hash code 0, 27, 54, ..., is present, and
|
|
so on. Bit 31, which is used only for token lists of excerpt meanings,
|
|
marks that an excerpt is a variant form whose first word must be
|
|
capitalised in order for it to match. Bit 32 is always left blank (for
|
|
superstitious reasons to do with the sign bit and differences between
|
|
platforms in handling signed bit shifts).
|
|
</p>
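<p class="inwebparagraph">The summary above can be illustrated with a minimal, self-contained sketch. It is not the Inform source: the names prefixed <code class="display"><span class="extract">sk_</span></code> and <code class="display"><span class="extract">SK_</span></code>, the flag values and the simplified word structure are all assumptions made for illustration, and the capitalised-variant lookup is reduced to a flag passed in by the caller. Per-word hashes are assumed to be non-negative.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: the real NUMBER_MC, TEXT_MC and I6_MC values
   are defined elsewhere in Inform and are not reproduced here. */
#define SK_NUMBER_MC 0x1u
#define SK_TEXT_MC   0x2u
#define SK_I6_MC     0x4u
#define SK_CAPITALISED_VARIANT_FORM (1 << 30) /* "bit 31", counting from 1 */

/* A much-simplified vocabulary entry (an assumption for this sketch). */
typedef struct sk_word {
    unsigned int flags; /* literal category bits, if any */
    int hash;           /* per-word hash, assumed non-negative */
} sk_word;

/* Fold one word into the bitmap, following the four cases of 11.2. */
static int sk_contribute(int h, const sk_word *v) {
    if (v->flags & SK_NUMBER_MC) return h | 1;
    if (v->flags & SK_TEXT_MC) return h | 2;
    if (v->flags & SK_I6_MC) return h | 4;
    return h | (8 << (v->hash % 27)); /* one of bits 4 to 30 */
}

/* Hash a token list. NULL entries are gaps and contribute nothing, so
   order, repetition and gaps all leave the result unchanged. */
int sk_excerpt_hash(sk_word *tokens[], int n, int capitalised) {
    assert(n > 0); /* mirrors the "Empty text when registering" check */
    int h = capitalised ? SK_CAPITALISED_VARIANT_FORM : 0;
    for (int i = 0; i < n; i++)
        if (tokens[i]) h = sk_contribute(h, tokens[i]);
    return h;
}
```

<p class="inwebparagraph">In this sketch two words whose hashes differ by a multiple of 27 set the same bit, which is one reason the bitmap is sparser than its width suggests.
</p>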
|
|
|
|
<p class="inwebparagraph">The result is not a tremendously good hashing number, since it generally
|
|
produces a sparse bitmap, so that the variety is not as great as might be
|
|
thought. But it is optimised for the trickiest parsing cases where the
|
|
rewards of saving unnecessary tests are greatest.
|
|
</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP13"></a><b>§13. EM Listing. </b>We are clearly not going to store the excerpt meanings in a hash table
|
|
keyed by the hash values of excerpts — with hash values as large as
|
|
2^{31}-1, that would be practically impossible.
|
|
</p>
|
|
|
|
<p class="inwebparagraph">Instead we key using the actual words. Each vocabulary entry has four
|
|
linked lists of EMs: its subset list, its start list, its middle list,
|
|
and its end list.
|
|
</p>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<ul class="items"><li>(a) If an EM needs to allow parsing as a subset, it must be placed in the
|
|
subset list of every word other than an article. For instance, "buttress against cathedral
|
|
wall" registered under the code <code class="display"><span class="extract">NOUN_MC</span></code> would be listed
|
|
in the subset lists of "buttress", "against", "cathedral" and "wall".
|
|
</li></ul>
|
|
<ul class="items"><li>(b) Otherwise it is placed in only one list:
|
|
</li></ul>
|
|
<ul class="items"><ul class="items"><li>(b1) If the token list consists only of a single gap <code class="display"><span class="extract">#</span></code>, we must be
|
|
registering a "say" phrase to say a value. (There is one of these for
|
|
each kind of value.) This meaning is listed under a special <code class="display"><span class="extract">blank_says_p</span></code>
|
|
list, which is not attached to any vocabulary entry.
|
|
</li><li>(b2) Otherwise, if the first token is not a <code class="display"><span class="extract">#</span></code> gap, it goes into the
|
|
start list for the first token's word: for instance, <code class="display"><span class="extract">award # points</span></code> joins
|
|
the start list for "award".
|
|
</li><li>(b3) Otherwise, if the last token is not a <code class="display"><span class="extract">#</span></code> gap, it goes into the end
|
|
list for the last token's word: for instance, <code class="display"><span class="extract"># in # from now</span></code> joins the
|
|
end list for "now".
|
|
</li><li>(b4) Otherwise, it goes into the middle list of the word for the leftmost
|
|
token which is not a <code class="display"><span class="extract">#</span></code>: for instance, <code class="display"><span class="extract"># plus #</span></code> joins the middle list for
|
|
"plus".
|
|
</li></ul>
|
|
</li></ul>
|
|
<p class="inwebparagraph">Since token lists of two or more consecutive <code class="display"><span class="extract">#</span></code>s cannot exist, this exhausts the possibilities.
|
|
</p>
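<p class="inwebparagraph">The case analysis (a) to (b4) can be sketched as a small classifier. This is an illustrative reduction, not the registration code itself: token lists are represented here as arrays of strings with a null pointer for each gap, the names <code class="display"><span class="extract">sk_placement</span></code> and <code class="display"><span class="extract">sk_classify</span></code> are hypothetical, and the extra test on the meaning code which guards case (b1) in the real function is left out.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Where a meaning is filed; one value per case in the list above. */
typedef enum {
    SK_SUBSET_LISTS, /* (a)  subset list of each word */
    SK_BLANK_LIST,   /* (b1) the lone gap "#" */
    SK_START_LIST,   /* (b2) start list of the first word */
    SK_END_LIST,     /* (b3) end list of the last word */
    SK_MIDDLE_LIST   /* (b4) middle list of the leftmost word */
} sk_placement;

/* tokens[i] is a word, or NULL for a gap. On return *key holds the index
   of the token whose list is used, or -1 when no single token applies. */
sk_placement sk_classify(const char *tokens[], int n, int subset, int *key) {
    assert(n > 0);
    *key = -1;
    if (subset) return SK_SUBSET_LISTS;
    if (n == 1 && tokens[0] == NULL) return SK_BLANK_LIST;
    if (tokens[0]) { *key = 0; return SK_START_LIST; }
    if (tokens[n-1]) { *key = n-1; return SK_END_LIST; }
    for (int i = 1; i < n-1; i++)
        if (tokens[i]) { *key = i; return SK_MIDDLE_LIST; }
    /* Unreachable for legal token lists: only gaps remain. */
    assert(!"token list of two or more gaps and no words");
    return SK_MIDDLE_LIST;
}
```

<p class="inwebparagraph">Under these assumptions, "award # points" is keyed on token 0, "# in # from now" on token 4, and "# plus #" on token 1.
</p>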
|
|
|
|
<p class="inwebparagraph">Outside of subset mode, we will then test a given excerpt <code class="display"><span class="extract">(w1, w2)</span></code> in the
|
|
source text against all possible meanings by checking the start list for <code class="display"><span class="extract">w1</span></code>,
|
|
the end list for <code class="display"><span class="extract">w2</span></code> and the middle list for every one of <code class="display"><span class="extract">(w1+1, w2-1)</span></code>.
|
|
Because of this:
|
|
</p>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<ul class="items"><li>(i) Performance suffers if lists for individual words become unbalanced
|
|
in size. This is why we register Unicode translations as "white chess
|
|
knight" rather than "Unicode white chess knight", and so on; the
|
|
alternative would be a stupendously long start list for "unicode".
|
|
</li><li>(ii) Middle lists are tested far more often than start or end lists, so
|
|
we should keep them as small as possible. This is why (b4) above is our last
|
|
resort; happily, phrases both starting and ending with <code class="display"><span class="extract">#</span></code> are uncommon.
|
|
</li></ul>
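<p class="inwebparagraph">To make point (ii) concrete, the following sketch enumerates which lists would be consulted for an excerpt <code class="display"><span class="extract">(w1, w2)</span></code> outside of subset mode. The types and the function <code class="display"><span class="extract">sk_probes</span></code> are hypothetical, words are reduced to their positions, and it is assumed that a one-word excerpt still consults both the start and the end list of its only word.
</p>

```c
#include <assert.h>

typedef enum { SK_START, SK_MIDDLE, SK_END } sk_list_kind;

/* One list to be searched: which word position, and which of its lists. */
typedef struct sk_probe {
    int word;
    sk_list_kind kind;
} sk_probe;

/* Fill out[] with the lists consulted for the excerpt (w1, w2) and return
   how many there are. out[] needs room for at least w2 - w1 + 2 entries. */
int sk_probes(int w1, int w2, sk_probe out[]) {
    assert(w1 <= w2);
    int n = 0;
    out[n].word = w1; out[n].kind = SK_START; n++;
    for (int w = w1 + 1; w <= w2 - 1; w++) {
        out[n].word = w; out[n].kind = SK_MIDDLE; n++;
    }
    out[n].word = w2; out[n].kind = SK_END; n++;
    return n;
}
```

<p class="inwebparagraph">A five-word excerpt thus consults one start list, one end list and three middle lists, and the share taken by middle lists grows with the length of the excerpt.
</p>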
|
|
<pre class="display">
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">blank_says_p</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
|
|
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">ExcerptMeanings::register_em</span><span class="plain">(</span><span class="reserved">unsigned</span><span class="plain"> </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">meaning_code</span><span class="plain">, </span><span class="reserved">excerpt_meaning</span><span class="plain"> *</span><span class="identifier">em</span><span class="plain">) {</span>
|
|
<span class="plain">#</span><span class="identifier">ifdef</span><span class="plain"> </span><span class="identifier">CORE_MODULE</span>
|
|
<span class="identifier">ExParser::warn_expression_cache</span><span class="plain">(); </span><span class="comment">the existence of new meanings jeopardises any cached parsing results</span>
|
|
<span class="plain">#</span><span class="identifier">endif</span>
|
|
|
|
<<span class="cwebmacro">Compute the new excerpt's hash code from its token list</span> <span class="cwebmacronumber">13.1</span>><span class="character">;</span>
|
|
<<span class="cwebmacro">Watermark each word in the token list with the meaning code being applied</span> <span class="cwebmacronumber">13.2</span>><span class="plain">;</span>
|
|
|
|
<span class="identifier">LOGIF</span><span class="plain">(</span><span class="identifier">EXCERPT_MEANINGS</span><span class="plain">,</span>
|
|
<span class="string">"Logging meaning: $M with hash %08x, mc=%d, %d tokens\n"</span><span class="plain">,</span>
|
|
<span class="identifier">em</span><span class="plain">, </span><span class="identifier">em</span><span class="plain">-></span><span class="element">excerpt_hash</span><span class="plain">, </span><span class="identifier">meaning_code</span><span class="plain">, </span><span class="identifier">em</span><span class="plain">-></span><span class="element">no_em_tokens</span><span class="plain">);</span>
|
|
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">meaning_code</span><span class="plain"> & </span><span class="constant">SUBSET_PARSING_BITMAP</span><span class="plain">) {</span>
|
|
<<span class="cwebmacro">Place the new meaning under the subset list for each non-article word</span> <span class="cwebmacronumber">13.3</span>><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">#</span><span class="identifier">ifdef</span><span class="plain"> </span><span class="identifier">EM_ALLOW_BLANK_TEST</span>
|
|
<span class="reserved">else</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">em</span><span class="plain">-></span><span class="element">no_em_tokens</span><span class="plain"> == </span><span class="constant">1</span><span class="plain">) && (</span><span class="identifier">em</span><span class="plain">-></span><span class="element">em_tokens</span><span class="plain">[0] == </span><span class="identifier">NULL</span><span class="plain">) &&</span>
|
|
<span class="plain">(</span><span class="identifier">EM_ALLOW_BLANK_TEST</span><span class="plain">(</span><span class="identifier">meaning_code</span><span class="plain">))) {</span>
|
|
<<span class="cwebmacro">Place the new meaning under the say-blank list</span> <span class="cwebmacronumber">13.4</span>><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">#</span><span class="identifier">endif</span>
|
|
<span class="reserved">else</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> (</span><span class="identifier">em</span><span class="plain">-></span><span class="element">em_tokens</span><span class="plain">[0]) {</span>
|
|
<<span class="cwebmacro">Place the new meaning under the start list of the first word</span> <span class="cwebmacronumber">13.5</span>><span class="plain">;</span>
|
|
<span class="plain">} </span><span class="reserved">else</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> (</span><span class="identifier">em</span><span class="plain">-></span><span class="element">em_tokens</span><span class="plain">[</span><span class="identifier">em</span><span class="plain">-></span><span class="element">no_em_tokens</span><span class="plain">-1]) {</span>
|
|
<<span class="cwebmacro">Place the new meaning under the end list of the last word</span> <span class="cwebmacronumber">13.6</span>><span class="plain">;</span>
|
|
<span class="plain">} </span><span class="reserved">else</span><span class="plain"> {</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">;</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">i</span><span class="plain">=1; </span><span class="identifier">i</span><span class="plain"><</span><span class="identifier">em</span><span class="plain">-></span><span class="element">no_em_tokens</span><span class="plain">-1; </span><span class="identifier">i</span><span class="plain">++)</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">em</span><span class="plain">-></span><span class="element">em_tokens</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]) { </span><<span class="cwebmacro">Place the new meaning under the middle list of word i</span> <span class="cwebmacronumber">13.7</span>><span class="plain">; </span><span class="reserved">break</span><span class="plain">; }</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">i</span><span class="plain"> >= </span><span class="identifier">em</span><span class="plain">-></span><span class="element">no_em_tokens</span><span class="plain">-1) </span><span class="identifier">internal_error</span><span class="plain">(</span><span class="string">"registered meaning of two or more #s"</span><span class="plain">);</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">The function ExcerptMeanings::register_em is used in <a href="#SP15">§15</a>, <a href="#SP15_3_2">§15.3.2</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP13_1"></a><b>§13.1. </b>See above.
|
|
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Compute the new excerpt's hash code from its token list</span> <span class="cwebmacronumber">13.1</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="functiontext">ExcerptMeanings::hash_code_from_token_list</span><span class="plain">(</span><span class="identifier">em</span><span class="plain">);</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP13">§13</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP13_2"></a><b>§13.2. </b>Another important optimisation is to flag each word in the meaning with
|
|
the given meaning code — this is why vocabulary flags and excerpt meaning
|
|
codes share the same numbering space. If we register "Table of Surgical
|
|
Instruments" as a table name, the word "surgical", for instance, picks
|
|
up the <code class="display"><span class="extract">TABLE_MC</span></code> bit in its <code class="display"><span class="extract">flags</span></code> bitmap.
|
|
</p>
|
|
|
|
<p class="inwebparagraph">The advantage of this is that if we want to see whether <code class="display"><span class="extract">(w1, w2)</span></code> might be
|
|
a table name, we can take a bitwise AND of the flags for each word in
|
|
the range; if the result doesn't have the <code class="display"><span class="extract">TABLE_MC</span></code> bit set, then at least
|
|
one of the words never occurs in a table name, so the answer must be
|
|
"no". This produces rapid, definite negatives with only a few false
|
|
positives.
|
|
</p>
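<p class="inwebparagraph">A minimal sketch of both halves of this optimisation follows. It assumes a toy vocabulary held as an array of flag words indexed by word number, with <code class="display"><span class="extract">SK_GAP</span></code> marking a gap. The value given to <code class="display"><span class="extract">SK_TABLE_MC</span></code> and the function names are invented for the example and are not Inform's.
</p>

```c
#include <assert.h>

#define SK_TABLE_MC 0x100u /* hypothetical value for illustration */
#define SK_GAP (-1)        /* stands in for a null token */

/* Registration side: OR the meaning code into the flags of every word in
   the token list, skipping gaps. */
void sk_watermark(unsigned int vocab[], const int tokens[], int n,
                  unsigned int meaning_code) {
    for (int i = 0; i < n; i++)
        if (tokens[i] != SK_GAP) vocab[tokens[i]] |= meaning_code;
}

/* Parsing side: AND together the flags of every word in the excerpt. If
   the meaning code's bit does not survive, some word never occurs in any
   meaning of that kind, so the answer is a definite "no". */
int sk_might_mean(const unsigned int vocab[], const int words[], int n,
                  unsigned int meaning_code) {
    assert(n > 0);
    unsigned int all = ~0u;
    for (int i = 0; i < n; i++) all &= vocab[words[i]];
    return (all & meaning_code) != 0;
}
```

<p class="inwebparagraph">A result of 1 means only "possibly": after "Table of Surgical Instruments" has been registered, the words "instruments of surgical" would pass this test too, which is the kind of false positive mentioned above.
</p>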
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Watermark each word in the token list with the meaning code being applied</span> <span class="cwebmacronumber">13.2</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">;</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">i</span><span class="plain">=0; </span><span class="identifier">i</span><span class="plain"><</span><span class="identifier">em</span><span class="plain">-></span><span class="element">no_em_tokens</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">++)</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">em</span><span class="plain">-></span><span class="element">em_tokens</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">])</span>
|
|
<span class="plain">((</span><span class="identifier">em</span><span class="plain">-></span><span class="element">em_tokens</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">])-></span><span class="identifier">flags</span><span class="plain">) |= </span><span class="identifier">meaning_code</span><span class="plain">;</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP13">§13</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP13_3"></a><b>§13.3. </b>Note that articles (a, an, the, some) are excluded: this means we don't
|
|
waste time trying to see if the excerpt "the" might be a reference to the
|
|
object "Gregory the Great".
|
|
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Place the new meaning under the subset list for each non-article word</span> <span class="cwebmacronumber">13.3</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">;</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">i</span><span class="plain">=0; </span><span class="identifier">i</span><span class="plain"><</span><span class="identifier">em</span><span class="plain">-></span><span class="element">no_em_tokens</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">++) {</span>
|
|
<span class="identifier">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">v</span><span class="plain"> = </span><span class="identifier">em</span><span class="plain">-></span><span class="element">em_tokens</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">];</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">v</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) {</span>
|
|
<span class="identifier">LOG</span><span class="plain">(</span><span class="string">"Logging meaning: $M with hash %08x\n"</span><span class="plain">, </span><span class="identifier">em</span><span class="plain">, </span><span class="identifier">em</span><span class="plain">-></span><span class="element">excerpt_hash</span><span class="plain">);</span>
|
|
<span class="identifier">internal_error</span><span class="plain">(</span><span class="string">"# in registration of subset meaning"</span><span class="plain">);</span>
|
|
<span class="plain">}</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">Preform::test_vocabulary</span><span class="plain">(</span><span class="identifier">v</span><span class="plain">, <</span><span class="identifier">article</span><span class="plain">>)) </span><span class="reserved">continue</span><span class="plain">;</span>
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">p</span><span class="plain"> = </span><span class="functiontext">ExcerptMeanings::new_em_pnode</span><span class="plain">(</span><span class="identifier">em</span><span class="plain">);</span>
|
|
<span class="identifier">p</span><span class="plain">-></span><span class="identifier">next_alternative</span><span class="plain"> = </span><span class="identifier">v</span><span class="plain">-></span><span class="identifier">means</span><span class="plain">.</span><span class="element">subset_list</span><span class="plain">;</span>
|
|
<span class="identifier">v</span><span class="plain">-></span><span class="identifier">means</span><span class="plain">.</span><span class="element">subset_list</span><span class="plain"> = </span><span class="identifier">p</span><span class="plain">;</span>
|
|
<span class="identifier">v</span><span class="plain">-></span><span class="identifier">means</span><span class="plain">.</span><span class="element">subset_list_length</span><span class="plain">++;</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP13">§13</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP13_4"></a><b>§13.4. </b>This registers <code class="display"><span class="extract">#</span></code>, which is what "To say (N - a number)" and similar
|
|
constructions translate to.
|
|
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Place the new meaning under the say-blank list</span> <span class="cwebmacronumber">13.4</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">p</span><span class="plain"> = </span><span class="functiontext">ExcerptMeanings::new_em_pnode</span><span class="plain">(</span><span class="identifier">em</span><span class="plain">);</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">blank_says_p</span><span class="plain">) {</span>
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">p2</span><span class="plain"> = </span><span class="identifier">blank_says_p</span><span class="plain">;</span>
|
|
<span class="reserved">while</span><span class="plain"> (</span><span class="identifier">p2</span><span class="plain">-></span><span class="identifier">next_alternative</span><span class="plain">) </span><span class="identifier">p2</span><span class="plain"> = </span><span class="identifier">p2</span><span class="plain">-></span><span class="identifier">next_alternative</span><span class="plain">;</span>
|
|
<span class="identifier">p2</span><span class="plain">-></span><span class="identifier">next_alternative</span><span class="plain"> = </span><span class="identifier">p</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="reserved">else</span><span class="plain"> </span><span class="identifier">blank_says_p</span><span class="plain"> = </span><span class="identifier">p</span><span class="plain">;</span>
|
|
<span class="identifier">LOGIF</span><span class="plain">(</span><span class="identifier">EXCERPT_MEANINGS</span><span class="plain">,</span>
|
|
<span class="string">"The blank list with $M is now:\n$T"</span><span class="plain">, </span><span class="identifier">em</span><span class="plain">, </span><span class="identifier">blank_says_p</span><span class="plain">);</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP13">§13</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP13_5"></a><b>§13.5. </b><code class="display">
|
|
<<span class="cwebmacrodefn">Place the new meaning under the start list of the first word</span> <span class="cwebmacronumber">13.5</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">p</span><span class="plain"> = </span><span class="functiontext">ExcerptMeanings::new_em_pnode</span><span class="plain">(</span><span class="identifier">em</span><span class="plain">);</span>
|
|
<span class="identifier">p</span><span class="plain">-></span><span class="identifier">next_alternative</span><span class="plain"> = </span><span class="identifier">em</span><span class="plain">-></span><span class="element">em_tokens</span><span class="plain">[0]-></span><span class="identifier">means</span><span class="plain">.</span><span class="element">start_list</span><span class="plain">;</span>
|
|
<span class="identifier">em</span><span class="plain">-></span><span class="element">em_tokens</span><span class="plain">[0]-></span><span class="identifier">means</span><span class="plain">.</span><span class="element">start_list</span><span class="plain"> = </span><span class="identifier">p</span><span class="plain">;</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP13">§13</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP13_6"></a><b>§13.6. </b>...and similarly...
|
|
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Place the new meaning under the end list of the last word</span> <span class="cwebmacronumber">13.6</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">p</span><span class="plain"> = </span><span class="functiontext">ExcerptMeanings::new_em_pnode</span><span class="plain">(</span><span class="identifier">em</span><span class="plain">);</span>
|
|
<span class="identifier">p</span><span class="plain">-></span><span class="identifier">next_alternative</span><span class="plain"> = </span><span class="identifier">em</span><span class="plain">-></span><span class="element">em_tokens</span><span class="plain">[</span><span class="identifier">em</span><span class="plain">-></span><span class="element">no_em_tokens</span><span class="plain">-1]-></span><span class="identifier">means</span><span class="plain">.</span><span class="element">end_list</span><span class="plain">;</span>
|
|
<span class="identifier">em</span><span class="plain">-></span><span class="element">em_tokens</span><span class="plain">[</span><span class="identifier">em</span><span class="plain">-></span><span class="element">no_em_tokens</span><span class="plain">-1]-></span><span class="identifier">means</span><span class="plain">.</span><span class="element">end_list</span><span class="plain"> = </span><span class="identifier">p</span><span class="plain">;</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP13">§13</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP13_7"></a><b>§13.7. </b>...and similarly again:
|
|
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Place the new meaning under the middle list of word i</span> <span class="cwebmacronumber">13.7</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">p</span><span class="plain"> = </span><span class="functiontext">ExcerptMeanings::new_em_pnode</span><span class="plain">(</span><span class="identifier">em</span><span class="plain">);</span>
|
|
<span class="identifier">p</span><span class="plain">-></span><span class="identifier">next_alternative</span><span class="plain"> = </span><span class="identifier">em</span><span class="plain">-></span><span class="element">em_tokens</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]-></span><span class="identifier">means</span><span class="plain">.</span><span class="element">middle_list</span><span class="plain">;</span>
|
|
<span class="identifier">em</span><span class="plain">-></span><span class="element">em_tokens</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]-></span><span class="identifier">means</span><span class="plain">.</span><span class="element">middle_list</span><span class="plain"> = </span><span class="identifier">p</span><span class="plain">;</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP13">§13</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP14"></a><b>§14. </b>Parse nodes are only created from excerpt meanings for storage inside the
|
|
excerpt parser, so these never live on into trees.
|
|
</p>
|
|
|
|
<pre class="display">
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="functiontext">ExcerptMeanings::new_em_pnode</span><span class="plain">(</span><span class="reserved">excerpt_meaning</span><span class="plain"> *</span><span class="identifier">em</span><span class="plain">) {</span>
    <span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">pn</span><span class="plain"> = </span><span class="identifier">ParseTree::new</span><span class="plain">(</span><span class="identifier">em</span><span class="plain">-></span><span class="element">meaning_code</span><span class="plain">);</span>
    <span class="identifier">ParseTree::set_meaning</span><span class="plain">(</span><span class="identifier">pn</span><span class="plain">, </span><span class="identifier">em</span><span class="plain">);</span>
    <span class="reserved">return</span><span class="plain"> </span><span class="identifier">pn</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">The function ExcerptMeanings::new_em_pnode is used in <a href="#SP13_3">§13.3</a>, <a href="#SP13_4">§13.4</a>, <a href="#SP13_5">§13.5</a>, <a href="#SP13_6">§13.6</a>, <a href="#SP13_7">§13.7</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP15"></a><b>§15. Registration. </b>The following is the main routine used throughout Inform to register new
|
|
meanings.
|
|
</p>
|
|
|
|
<pre class="display">
|
|
<span class="reserved">excerpt_meaning</span><span class="plain"> *</span><span class="functiontext">ExcerptMeanings::register</span><span class="plain">(</span>
|
|
<span class="reserved">unsigned</span><span class="plain"> </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">meaning_code</span><span class="plain">, </span><span class="identifier">wording</span><span class="plain"> </span><span class="identifier">W</span><span class="plain">, </span><span class="identifier">general_pointer</span><span class="plain"> </span><span class="identifier">data</span><span class="plain">) {</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">Wordings::empty</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">)) </span><span class="identifier">internal_error</span><span class="plain">(</span><span class="string">"tried to register empty excerpt meaning"</span><span class="plain">);</span>
|
|
|
|
<span class="plain">#</span><span class="identifier">ifdef</span><span class="plain"> </span><span class="identifier">CORE_MODULE</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">meaning_code</span><span class="plain"> == </span><span class="constant">NOUN_MC</span><span class="plain">)</span>
|
|
<span class="identifier">LOOP_THROUGH_WORDING</span><span class="plain">(</span><span class="identifier">i</span><span class="plain">, </span><span class="identifier">W</span><span class="plain">)</span>
|
|
<span class="identifier">Preform::mark_word</span><span class="plain">(</span><span class="identifier">i</span><span class="plain">, <</span><span class="identifier">s</span><span class="plain">-</span><span class="identifier">instance</span><span class="plain">-</span><span class="identifier">name</span><span class="plain">>);</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">meaning_code</span><span class="plain"> == </span><span class="identifier">KIND_SLOW_MC</span><span class="plain">)</span>
|
|
<span class="identifier">LOOP_THROUGH_WORDING</span><span class="plain">(</span><span class="identifier">i</span><span class="plain">, </span><span class="identifier">W</span><span class="plain">)</span>
|
|
<span class="identifier">Preform::mark_word</span><span class="plain">(</span><span class="identifier">i</span><span class="plain">, <</span><span class="identifier">k</span><span class="plain">-</span><span class="identifier">kind</span><span class="plain">>);</span>
|
|
<span class="plain">#</span><span class="identifier">endif</span>
|
|
|
|
<span class="reserved">excerpt_meaning</span><span class="plain"> *</span><span class="identifier">em</span><span class="plain"> = </span><span class="functiontext">ExcerptMeanings::new</span><span class="plain">(</span><span class="identifier">meaning_code</span><span class="plain">, </span><span class="identifier">data</span><span class="plain">);</span>
|
|
|
|
<<span class="cwebmacro">Unless this is parametrised, skip any initial article</span> <span class="cwebmacronumber">15.1</span>><span class="plain">;</span>
|
|
|
|
<span class="plain">#</span><span class="identifier">ifdef</span><span class="plain"> </span><span class="identifier">EM_CASE_SENSITIVITY_TEST</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">EM_CASE_SENSITIVITY_TEST</span><span class="plain">(</span><span class="identifier">meaning_code</span><span class="plain">))</span>
|
|
<<span class="cwebmacro">Detect use of upper case on the first word of this new text substitution</span> <span class="cwebmacronumber">15.2</span>><span class="plain">;</span>
|
|
<span class="plain">#</span><span class="identifier">endif</span>
|
|
|
|
<<span class="cwebmacro">Build the token list for the new EM</span> <span class="cwebmacronumber">15.3</span>><span class="plain">;</span>
|
|
|
|
<span class="functiontext">ExcerptMeanings::register_em</span><span class="plain">(</span><span class="identifier">meaning_code</span><span class="plain">, </span><span class="identifier">em</span><span class="plain">);</span>
|
|
<span class="plain">#</span><span class="identifier">ifdef</span><span class="plain"> </span><span class="identifier">IF_MODULE</span>
|
|
<span class="reserved">if</span><span class="plain"> ((<</span><span class="identifier">notable</span><span class="plain">-</span><span class="identifier">player</span><span class="plain">-</span><span class="identifier">variables</span><span class="plain">>(</span><span class="identifier">W</span><span class="plain">)) && (<<</span><span class="identifier">r</span><span class="plain">>> == </span><span class="constant">0</span><span class="plain">)</span>
|
|
<span class="plain">&& (</span><span class="identifier">meaning_code</span><span class="plain"> & </span><span class="identifier">VARIABLE_MC</span><span class="plain">)) </span><span class="identifier">meaning_of_player</span><span class="plain"> = </span><span class="identifier">RETRIEVE_POINTER_parse_node</span><span class="plain">(</span><span class="identifier">data</span><span class="plain">);</span>
|
|
<span class="plain">#</span><span class="identifier">endif</span>
|
|
|
|
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">em</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">The function ExcerptMeanings::register is used in 3/adj (<a href="3-adj.html#SP4">§4</a>), 3/nns (<a href="3-nns.html#SP4">§4</a>).</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP15_1"></a><b>§15.1. </b>An initial article is trimmed from the wording unless the meaning is
parametrised. Articles are preserved at the front of phrase definitions, mainly because
text substitutions need to distinguish (for instance) "say [the X]" from
"say [an X]".
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Unless this is parametrised, skip any initial article</span> <span class="cwebmacronumber">15.1</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
    <span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">meaning_code</span><span class="plain"> & </span><span class="constant">PARAMETRISED_PARSING_BITMAP</span><span class="plain">) == </span><span class="constant">0</span><span class="plain">)</span>
        <span class="reserved">if</span><span class="plain"> (</span><span class="identifier">Preform::test_word</span><span class="plain">(</span><span class="identifier">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">), <</span><span class="identifier">article</span><span class="plain">>)) {</span>
            <span class="identifier">W</span><span class="plain"> = </span><span class="identifier">Wordings::trim_first_word</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">);</span>
            <span class="reserved">if</span><span class="plain"> (</span><span class="identifier">Wordings::empty</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">))</span>
                <span class="identifier">internal_error</span><span class="plain">(</span><span class="string">"registered a meaning which was only an article"</span><span class="plain">);</span>
        <span class="plain">}</span>
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP15">§15</a>.</p>
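<p class="inwebparagraph">As a rough, self-contained sketch of the same decision, assuming words are
plain C strings rather than lexer word numbers: the function
<code class="display"><span class="extract">leading_article_skip</span></code>, its return codes and the three-word article list
are all hypothetical, and stand in for the Preform test and the internal
error above.
</p>

```c
#include <stddef.h>
#include <string.h>

/* Illustrative English article test. Inform itself asks Preform whether the
   word matches <article>; this short list is an assumption for the sketch. */
static int is_article(const char *w) {
    return strcmp(w, "a") == 0 || strcmp(w, "an") == 0 || strcmp(w, "the") == 0;
}

/* Returns how many leading words to drop (0 or 1), or -1 if dropping the
   article would leave nothing, which the real code treats as an internal
   error. Parametrised meanings keep their article. */
int leading_article_skip(const char *const *words, size_t n, int parametrised) {
    if (parametrised || n == 0) return 0;
    if (!is_article(words[0])) return 0;
    if (n == 1) return -1;
    return 1;
}
```

<p class="inwebparagraph">With the words "the", "red", "door" this returns 1 for an ordinary meaning
and 0 for a parametrised one.
</p>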
|
|
|
|
<p class="inwebparagraph"><a id="SP15_2"></a><b>§15.2. </b>The following examines the first character of the first word, provided
that word is not empty; an open bracket fails <code class="display"><span class="extract">isupper</span></code>, so a bracketed first token is never
mistaken for a capitalised one. If it finds upper case, as it would when
reading the "T" in:
</p>
|
|
|
|
<blockquote>
|
|
<p>To say The Portrait: ...</p>
|
|
|
|
</blockquote>
|
|
|
|
<p class="inwebparagraph">then it makes a new upper-case version of the word "the", i.e., "The",
|
|
with a distinct lexical identity; and places this distinguished identity as
|
|
the new first token. This ensures that we end up with a different token list
|
|
from the one in:
|
|
</p>
|
|
|
|
<blockquote>
|
|
<p>To say the Portrait: ...</p>
|
|
|
|
</blockquote>
|
|
|
|
<p class="inwebparagraph">(These are the only circumstances in which phrase parsing has any case
|
|
sensitivity.)
|
|
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Detect use of upper case on the first word of this new text substitution</span> <span class="cwebmacronumber">15.2</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">tx</span><span class="plain"> = </span><span class="identifier">Lexer::word_raw_text</span><span class="plain">(</span><span class="identifier">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">));</span>
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">tx</span><span class="plain">[0]) && ((</span><span class="identifier">isupper</span><span class="plain">(</span><span class="identifier">tx</span><span class="plain">[0])) || (</span><span class="identifier">tx</span><span class="plain">[1] == </span><span class="constant">0</span><span class="plain">))) {</span>
|
|
<span class="identifier">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">ucf</span><span class="plain"> = </span><span class="identifier">Vocabulary::make_case_sensitive</span><span class="plain">(</span><span class="identifier">Lexer::word</span><span class="plain">(</span><span class="identifier">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">)));</span>
|
|
<span class="reserved">if</span><span class="plain"> (!</span><span class="identifier">Characters::isupper</span><span class="plain">(</span><span class="identifier">tx</span><span class="plain">[0])) </span><span class="identifier">ucf</span><span class="plain"> = </span><span class="identifier">Vocabulary::get_lower_case_form</span><span class="plain">(</span><span class="identifier">ucf</span><span class="plain">);</span>
|
|
<span class="identifier">Lexer::set_word</span><span class="plain">(</span><span class="identifier">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">), </span><span class="identifier">ucf</span><span class="plain">);</span>
|
|
<span class="identifier">LOGIF</span><span class="plain">(</span><span class="identifier">EXCERPT_MEANINGS</span><span class="plain">,</span>
|
|
<span class="string">"Allowing initial capitalised word %w: meaning_code = %08x\n"</span><span class="plain">,</span>
|
|
<span class="identifier">tx</span><span class="plain">, </span><span class="identifier">meaning_code</span><span class="plain">);</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP15">§15</a>.</p>
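<p class="inwebparagraph">The effect on matching can be modelled, very loosely, as follows. This is
not how the vocabulary is implemented: it is only a sketch, using plain C
strings and two hypothetical functions, of the rule that first words must
agree on whether they begin with a capital letter, while case is otherwise
ignored.
</p>

```c
#include <ctype.h>

/* Compare two words, ignoring case throughout. */
static int equal_ignoring_case(const char *a, const char *b) {
    while (*a && *b) {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b)) return 0;
        a++; b++;
    }
    return *a == *b; /* both must end at the same point */
}

/* Hypothetical model of first-token identity: the words must be equal
   ignoring case, and must agree on whether the initial letter is upper
   case. So "The" and "the" count as different first tokens. */
int first_tokens_match(const char *a, const char *b) {
    if (!equal_ignoring_case(a, b)) return 0;
    int a_cap = isupper((unsigned char) a[0]) != 0;
    int b_cap = isupper((unsigned char) b[0]) != 0;
    return a_cap == b_cap;
}
```

<p class="inwebparagraph">Under this model "To say The Portrait" and "To say the Portrait" begin with
different tokens, which is the distinction the code above sets up.
</p>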
|
|
|
|
<p class="inwebparagraph"><a id="SP15_3"></a><b>§15.3. </b>We read text such as:
</p>
|
|
|
|
<blockquote>
|
|
<p>award (P - a number) points</p>
|
|
|
|
</blockquote>
|
|
|
|
<p class="inwebparagraph">and transcribe it into the token list, collapsing each bracketed part into a
single <code class="display"><span class="extract">#</span></code> token denoting a gap (stored as a <code class="display"><span class="extract">NULL</span></code> entry), to give:
</p>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<pre class="display">
|
|
<span class="plain">award # points</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph">with a token count of 3.
|
|
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Build the token list for the new EM</span> <span class="cwebmacronumber">15.3</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">tc</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">=0; </span><span class="identifier">i</span><span class="plain"> < </span><span class="identifier">Wordings::length</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">); </span><span class="identifier">i</span><span class="plain">++) {</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">tc</span><span class="plain"> >= </span><span class="constant">MAX_TOKENS_PER_EXCERPT_MEANING</span><span class="plain">) {</span>
|
|
<<span class="cwebmacro">Complain of excessive length of the new excerpt</span> <span class="cwebmacronumber">15.3.3</span>><span class="plain">;</span>
|
|
<span class="reserved">break</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">compare_word</span><span class="plain">(</span><span class="identifier">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">) + </span><span class="identifier">i</span><span class="plain">, </span><span class="identifier">OPENBRACKET_V</span><span class="plain">)) {</span>
|
|
<span class="identifier">em</span><span class="plain">-></span><span class="element">em_tokens</span><span class="plain">[</span><span class="identifier">tc</span><span class="plain">++] = </span><span class="identifier">NULL</span><span class="plain">;</span>
|
|
<<span class="cwebmacro">Skip over bracketed token description</span> <span class="cwebmacronumber">15.3.1</span>><span class="plain">;</span>
|
|
<span class="plain">} </span><span class="reserved">else</span><span class="plain"> </span><span class="identifier">em</span><span class="plain">-></span><span class="identifier">em_tokens</span><span class="plain">[</span><span class="identifier">tc</span><span class="plain">++] = </span><span class="identifier">Lexer::word</span><span class="plain">(</span><span class="identifier">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">) + </span><span class="identifier">i</span><span class="plain">);</span>
|
|
<span class="plain">}</span>
|
|
<span class="identifier">em</span><span class="plain">-></span><span class="element">no_em_tokens</span><span class="plain"> = </span><span class="identifier">tc</span><span class="plain">;</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP15">§15</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP15_3_1"></a><b>§15.3.1. </b>This is all a little defensive, but syntax bugs higher up tend to find
|
|
their way down to this plughole:
|
|
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Skip over bracketed token description</span> <span class="cwebmacronumber">15.3.1</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">bl</span><span class="plain"> = </span><span class="constant">1</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">++;</span>
|
|
<span class="reserved">while</span><span class="plain"> (</span><span class="identifier">bl</span><span class="plain"> > </span><span class="constant">0</span><span class="plain">) {</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">i</span><span class="plain"> >= </span><span class="identifier">Wordings::length</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">)) {</span>
|
|
<span class="identifier">LOG</span><span class="plain">(</span><span class="string">"Bad meaning: <%W>\n"</span><span class="plain">, </span><span class="identifier">W</span><span class="plain">);</span>
|
|
<span class="identifier">internal_error</span><span class="plain">(</span><span class="string">"Bracket mismatch when registering"</span><span class="plain">);</span>
|
|
<span class="plain">}</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">compare_word</span><span class="plain">(</span><span class="identifier">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">) + </span><span class="identifier">i</span><span class="plain">, </span><span class="identifier">OPENBRACKET_V</span><span class="plain">)) </span><span class="identifier">bl</span><span class="plain">++;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">compare_word</span><span class="plain">(</span><span class="identifier">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">) + </span><span class="identifier">i</span><span class="plain">, </span><span class="identifier">CLOSEBRACKET_V</span><span class="plain">)) </span><span class="identifier">bl</span><span class="plain">--;</span>
|
|
<span class="identifier">i</span><span class="plain">++;</span>
|
|
<span class="plain">}</span>
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">i</span><span class="plain"> < </span><span class="identifier">Wordings::length</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">)) && (</span><span class="identifier">compare_word</span><span class="plain">(</span><span class="identifier">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">) + </span><span class="identifier">i</span><span class="plain">, </span><span class="identifier">OPENBRACKET_V</span><span class="plain">))) {</span>
|
|
<span class="identifier">LOG</span><span class="plain">(</span><span class="string">"Bad meaning: <%W>\n"</span><span class="plain">, </span><span class="identifier">W</span><span class="plain">);</span>
|
|
<span class="identifier">internal_error</span><span class="plain">(</span><span class="string">"Two consecutive bracketed tokens when registering"</span><span class="plain">);</span>
|
|
<span class="plain">}</span>
|
|
<span class="identifier">i</span><span class="plain">--;</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP15_3">§15.3</a>.</p>
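<p class="inwebparagraph">Taken together, §15.3 and §15.3.1 amount to a single pass with a bracket
depth counter. The sketch below restates that loop over plain C strings so
that it can be run in isolation. The name <code class="display"><span class="extract">build_tokens</span></code>, the limit
<code class="display"><span class="extract">MAX_TOKENS</span></code> and the negative return codes are assumptions made for
illustration; the real code raises internal errors or a problem message instead.
</p>

```c
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS 32 /* hypothetical stand-in for MAX_TOKENS_PER_EXCERPT_MEANING */

/* Copy words[0..n) into out, collapsing each bracketed run into one NULL
   "gap" token. Returns the token count, or a negative code:
   -1 bracket mismatch, -2 two consecutive bracketed tokens, -3 too long. */
int build_tokens(const char *const *words, int n, const char **out) {
    int tc = 0;
    for (int i = 0; i < n; i++) {
        if (tc >= MAX_TOKENS) return -3;
        if (strcmp(words[i], "(") == 0) {
            out[tc++] = NULL;          /* the gap */
            int depth = 1;
            i++;                       /* step past the opening bracket */
            while (depth > 0) {
                if (i >= n) return -1; /* ran out of words before closing */
                if (strcmp(words[i], "(") == 0) depth++;
                if (strcmp(words[i], ")") == 0) depth--;
                i++;
            }
            if (i < n && strcmp(words[i], "(") == 0) return -2;
            i--;                       /* the loop's i++ then moves past ")" */
        } else {
            out[tc++] = words[i];
        }
    }
    return tc;
}
```

<p class="inwebparagraph">Given the eight words of "award ( P - a number ) points", this returns 3,
with the middle token <code class="display"><span class="extract">NULL</span></code>.
</p>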
|
|
|
|
<p class="inwebparagraph"><a id="SP15_3_2"></a><b>§15.3.2. Meaning from assemblages. </b>In a few cases it is convenient to register a meaning from a wording which
|
|
isn't contiguously present in the lexer, so we also provide a method for
|
|
taking it from a word assemblage.
|
|
</p>
|
|
|
|
<p class="inwebparagraph">In other respects this is a simpler routine, because it is never needed for
token lists containing gaps.
</p>
|
|
|
|
<pre class="display">
|
|
<span class="reserved">excerpt_meaning</span><span class="plain"> *</span><span class="functiontext">ExcerptMeanings::register_assemblage</span><span class="plain">(</span>
|
|
<span class="reserved">unsigned</span><span class="plain"> </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">meaning_code</span><span class="plain">, </span><span class="identifier">word_assemblage</span><span class="plain"> </span><span class="identifier">wa</span><span class="plain">, </span><span class="identifier">general_pointer</span><span class="plain"> </span><span class="identifier">data</span><span class="plain">) {</span>
|
|
<span class="reserved">excerpt_meaning</span><span class="plain"> *</span><span class="identifier">em</span><span class="plain"> = </span><span class="functiontext">ExcerptMeanings::new</span><span class="plain">(</span><span class="identifier">meaning_code</span><span class="plain">, </span><span class="identifier">data</span><span class="plain">);</span>
|
|
|
|
<span class="identifier">vocabulary_entry</span><span class="plain"> **</span><span class="identifier">array</span><span class="plain">; </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">len</span><span class="plain">;</span>
|
|
<span class="identifier">WordAssemblages::as_array</span><span class="plain">(&</span><span class="identifier">wa</span><span class="plain">, &</span><span class="identifier">array</span><span class="plain">, &</span><span class="identifier">len</span><span class="plain">);</span>
|
|
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">, </span><span class="identifier">tc</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">i</span><span class="plain">=0; </span><span class="identifier">i</span><span class="plain"><</span><span class="identifier">len</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">++) {</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">tc</span><span class="plain"> >= </span><span class="constant">MAX_TOKENS_PER_EXCERPT_MEANING</span><span class="plain">) {</span>
|
|
<<span class="cwebmacro">Complain of excessive length of the new excerpt</span> <span class="cwebmacronumber">15.3.3</span>><span class="plain">;</span>
|
|
<span class="reserved">break</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="identifier">em</span><span class="plain">-></span><span class="element">em_tokens</span><span class="plain">[</span><span class="identifier">tc</span><span class="plain">++] = </span><span class="identifier">array</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">];</span>
|
|
<span class="plain">}</span>
|
|
<span class="identifier">em</span><span class="plain">-></span><span class="element">no_em_tokens</span><span class="plain"> = </span><span class="identifier">tc</span><span class="plain">;</span>
|
|
|
|
<span class="functiontext">ExcerptMeanings::register_em</span><span class="plain">(</span><span class="identifier">meaning_code</span><span class="plain">, </span><span class="identifier">em</span><span class="plain">);</span>
|
|
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">em</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">The function ExcerptMeanings::register_assemblage appears nowhere else.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP15_3_3"></a><b>§15.3.3. </b>In practice, nobody ever hits this message except deliberately. It has
|
|
a tendency to fire twice or more on the same source text because of
|
|
registering multiple inflected forms of the same text; but it's not worth
|
|
going to any trouble to prevent this.
|
|
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Complain of excessive length of the new excerpt</span> <span class="cwebmacronumber">15.3.3</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
    <span class="functiontext">ExcerptMeanings::problem_handler</span><span class="plain">(</span><span class="constant">TooLongName_LINERROR</span><span class="plain">, </span><span class="identifier">EMPTY_WORDING</span><span class="plain">, </span><span class="identifier">NULL</span><span class="plain">, </span><span class="constant">0</span><span class="plain">);</span>
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP15_3">§15.3</a>, <a href="#SP15_3_2">§15.3.2</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP16"></a><b>§16. Errors. </b>Some tools using this module will want to push simple error messages out to
|
|
the command line; others will want to translate them into elaborate problem
|
|
texts in HTML. So the client is allowed to define <code class="display"><span class="extract">LINGUISTICS_PROBLEM_HANDLER</span></code>
|
|
to some routine of her own, gazumping this one.
|
|
</p>
|
|
|
|
<pre class="display">
|
|
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">ExcerptMeanings::problem_handler</span><span class="plain">(</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">err_no</span><span class="plain">, </span><span class="identifier">wording</span><span class="plain"> </span><span class="identifier">W</span><span class="plain">, </span><span class="reserved">void</span><span class="plain"> *</span><span class="identifier">ref</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">k</span><span class="plain">) {</span>
|
|
<span class="plain">#</span><span class="identifier">ifdef</span><span class="plain"> </span><span class="identifier">LINGUISTICS_PROBLEM_HANDLER</span>
|
|
<span class="identifier">LINGUISTICS_PROBLEM_HANDLER</span><span class="plain">(</span><span class="identifier">err_no</span><span class="plain">, </span><span class="identifier">W</span><span class="plain">, </span><span class="identifier">ref</span><span class="plain">, </span><span class="identifier">k</span><span class="plain">);</span>
|
|
<span class="plain">#</span><span class="identifier">endif</span>
|
|
<span class="plain">#</span><span class="identifier">ifndef</span><span class="plain"> </span><span class="identifier">LINGUISTICS_PROBLEM_HANDLER</span>
|
|
<span class="identifier">TEMPORARY_TEXT</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">);</span>
|
|
<span class="identifier">WRITE_TO</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="string">"%+W"</span><span class="plain">, </span><span class="identifier">W</span><span class="plain">);</span>
|
|
<span class="reserved">switch</span><span class="plain"> (</span><span class="identifier">err_no</span><span class="plain">) {</span>
|
|
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">TooLongName_LINERROR:</span>
|
|
<span class="identifier">Errors::nowhere</span><span class="plain">(</span><span class="string">"noun too long"</span><span class="plain">);</span>
|
|
<span class="reserved">break</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="identifier">DISCARD_TEXT</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">);</span>
|
|
<span class="plain">#</span><span class="identifier">endif</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">The function ExcerptMeanings::problem_handler is used in <a href="#SP15_3_3">§15.3.3</a>.</p>
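<p class="inwebparagraph">The override mechanism is an ordinary compile-time hook. A minimal sketch of
the pattern, with hypothetical names throughout (<code class="display"><span class="extract">DEMO_PROBLEM_HANDLER</span></code>,
<code class="display"><span class="extract">report_problem</span></code>, <code class="display"><span class="extract">TOO_LONG_NAME_ERROR</span></code>) and with a plain string in place
of a wording:
</p>

```c
#include <stdio.h>

enum { TOO_LONG_NAME_ERROR = 1 }; /* hypothetical error code */

static int errors_reported = 0;   /* bookkeeping added for this sketch only */

/* If the client defined DEMO_PROBLEM_HANDLER before this point, all errors
   are routed to it; otherwise a terse default goes to stderr. */
void report_problem(int err_no, const char *excerpt) {
#ifdef DEMO_PROBLEM_HANDLER
    DEMO_PROBLEM_HANDLER(err_no, excerpt);
#else
    switch (err_no) {
        case TOO_LONG_NAME_ERROR:
            fprintf(stderr, "noun too long: %s\n", excerpt);
            break;
        default:
            fprintf(stderr, "unrecognised error %d\n", err_no);
            break;
    }
#endif
    errors_reported++;
}
```

<p class="inwebparagraph">A client wanting richer output would define the macro to name its own
function before this code is compiled; nothing is decided at run time.
</p>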
|
|
|
|
<hr class="tocbar">
|
|
<ul class="toc"><li><i>(This section begins Chapter 2: Excerpts.)</i></li><li><a href="2-pe.html">Continue with 'Parse Excerpts'</a></li></ul><hr class="tocbar">
|
|
<!--End of weave-->
|
|
</main>
|
|
</body>
|
|
</html>