mirror of https://github.com/ganelson/inform.git synced 2024-07-18 15:04:25 +03:00
inform7/docs/linguistics-module/2-em.html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Excerpt Meanings</title>
<link href="../docs-assets/Breadcrumbs.css" rel="stylesheet" rev="stylesheet" type="text/css">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="en-gb">
<link href="../docs-assets/Contents.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Progress.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Navigation.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Fonts.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Base.css" rel="stylesheet" rev="stylesheet" type="text/css">
<script>
MathJax = {
tex: {
inlineMath: [['$', '$'], ['\\(', '\\)']]
},
svg: {
fontCache: 'global'
}
};
</script>
<script type="text/javascript" id="MathJax-script" async
src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js">
</script>
<script>
function togglePopup(material_id) {
var popup = document.getElementById(material_id);
popup.classList.toggle("show");
}
</script>
<link href="../docs-assets/Popups.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Colours.css" rel="stylesheet" rev="stylesheet" type="text/css">
</head>
<body class="commentary-font">
<nav role="navigation">
<h1><a href="../index.html">
<img src="../docs-assets/Inform.png" height="72">
</a></h1>
<ul><li><a href="../compiler.html">compiler tools</a></li>
<li><a href="../other.html">other tools</a></li>
<li><a href="../extensions.html">extensions and kits</a></li>
<li><a href="../units.html">unit test tools</a></li>
</ul><h2>Compiler Webs</h2><ul>
<li><a href="../inbuild/index.html">inbuild</a></li>
<li><a href="../inform7/index.html">inform7</a></li>
<li><a href="../inter/index.html">inter</a></li>
</ul><h2>Inbuild Modules</h2><ul>
<li><a href="../supervisor-module/index.html">supervisor</a></li>
</ul><h2>Inform7 Modules</h2><ul>
<li><a href="../core-module/index.html">core</a></li>
<li><a href="../inflections-module/index.html">inflections</a></li>
<li><a href="index.html"><span class="selectedlink">linguistics</span></a></li>
<li><a href="../kinds-module/index.html">kinds</a></li>
<li><a href="../if-module/index.html">if</a></li>
<li><a href="../multimedia-module/index.html">multimedia</a></li>
<li><a href="../problems-module/index.html">problems</a></li>
<li><a href="../index-module/index.html">index</a></li>
</ul><h2>Inter Modules</h2><ul>
<li><a href="../bytecode-module/index.html">bytecode</a></li>
<li><a href="../building-module/index.html">building</a></li>
<li><a href="../codegen-module/index.html">codegen</a></li>
</ul><h2>Shared Modules</h2><ul>
<li><a href="../arch-module/index.html">arch</a></li>
<li><a href="../syntax-module/index.html">syntax</a></li>
<li><a href="../words-module/index.html">words</a></li>
<li><a href="../html-module/index.html">html</a></li>
<li><a href="../../../inweb/docs/foundation-module/index.html">foundation</a></li>
</ul>
</nav>
<main role="main">
<!--Weave of 'Excerpt Meanings' generated by Inweb-->
<div class="breadcrumbs">
<ul class="crumbs"><li><a href="../index.html">Home</a></li><li><a href="../compiler.html">Inform7 Modules</a></li><li><a href="index.html">linguistics</a></li><li><a href="index.html#2">Chapter 2: Excerpts</a></li><li><b>Excerpt Meanings</b></li></ul></div>
<p class="purpose">To register and deregister meanings for excerpts of text as nouns, adjectives, imperative phrases and other usages.</p>
<ul class="toc"><li><a href="2-em.html#SP1">&#167;1. Excerpt meanings</a></li><li><a href="2-em.html#SP3">&#167;3. Meaning codes</a></li><li><a href="2-em.html#SP7">&#167;7. Creating EMs</a></li><li><a href="2-em.html#SP9">&#167;9. Debugging log</a></li><li><a href="2-em.html#SP10">&#167;10. Hashing excerpts</a></li><li><a href="2-em.html#SP13">&#167;13. EM Listing</a></li><li><a href="2-em.html#SP15">&#167;15. Registration</a></li><li><a href="2-em.html#SP15_3_2">&#167;15.3.2. Meaning from assemblages</a></li><li><a href="2-em.html#SP16">&#167;16. Errors</a></li></ul><hr class="tocbar">
<p class="commentary firstcommentary"><a id="SP1"></a><b>&#167;1. Excerpt meanings. </b>Most compilers keep a "symbols table" of identifier names and the
meanings they have: for instance, when compiling Inform, GCC's symbols
table records that <span class="extract"><span class="extract-syntax">problem_count</span></span> is the name of an integer variable and
<span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> that of a defined type. This is usually stored so that a
new name can rapidly be checked to see if it matches one that is currently
known.
</p>
<p class="commentary">In natural language we must similarly remember meanings of excerpts. (Recall
that an "excerpt" is a run of one or more adjacent words in the source text.)
Here we store just such a lexicon. We won't use this for every grammatical
category (determiners and verb forms are more efficiently stored elsewhere),
but otherwise it's a general grab-bag of meanings. Inform uses this data
structure to store (a) adjectives, (b) nouns and (c) imperative phrases
of the sort used to define rules. Examples include:
</p>
<blockquote>
<p>american dialect, say close bracket, player's command, open, Hall of Mirrors</p>
</blockquote>
<p class="commentary">Most compilers use a symbols table whose efficiency depends on the fact
that symbol names are relatively long strings (say, 8 or more units)
drawn from a small alphabet (say, the 37 characters comprising letters, digits and the underscore).
But Inform has short symbols (typically one to three units) drawn from a
huge alphabet (say, all 5,000 different words found in the source text).
And we also need to parse in ways which a conventional compiler does not.
If C has registered the identifier <span class="extract"><span class="extract-syntax">pink_martini</span></span> then it never needs to
notice <span class="extract"><span class="extract-syntax">pnk_martin</span></span> as being related to it. But when Inform registers
"pink martini" as the name of an instance, it then has to spot that
either "pink" or "martini" alone might also refer to the same object.
So we are not going to use the conventional algorithms.
</p>
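<p class="commentary">To make the "pink martini" requirement concrete, here is a deliberately naive
sketch in C of the subset test involved. It is not how Inform does this (the
module uses hash codes and per-word lists instead); words are plain strings
here rather than vocabulary entries, and the function names
<span class="extract"><span class="extract-syntax">word_in_name</span></span> and <span class="extract"><span class="extract-syntax">excerpt_is_subset_of_name</span></span> are hypothetical.
</p>

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `word` occurs among the `n` words of `name`. */
static bool word_in_name(const char *word, const char *const name[], int n) {
    for (int i = 0; i < n; i++)
        if (strcmp(word, name[i]) == 0) return true;
    return false;
}

/* True if every word of the excerpt occurs somewhere in the registered name,
   so that "pink" or "martini" alone can refer to "pink martini".
   Assumption: words are compared as C strings, purely for illustration. */
bool excerpt_is_subset_of_name(const char *const excerpt[], int excerpt_len,
                               const char *const name[], int name_len) {
    if (excerpt_len == 0) return false; /* an excerpt has at least one word */
    for (int i = 0; i < excerpt_len; i++)
        if (!word_in_name(excerpt[i], name, name_len)) return false;
    return true;
}
```

<p class="commentary">With the name "pink martini" registered, the excerpt "martini" passes this
test and "pink gin" fails it. The quadratic scan is fine for a two-word name
but would be far too slow across a whole lexicon.
</p>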
<p class="commentary firstcommentary"><a id="SP2"></a><b>&#167;2. </b>We now define the <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> data structure, which holds a single
entry in this super-dictionary. The text to be matched is specified as a
sequence of at least one, and at most 32, tokens: these can either be
pointers to specific vocabulary, or can be null, which implies that
arbitrary non-empty text can appear in the given position. (It is forbidden
for the token list to contain two nulls in a row.) For instance, the
token list:
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax">    drink # milk #</span>
</pre>
<p class="commentary">matches "drink more milk today and every day", but not "drink milk". (The
sharp symbol <span class="extract"><span class="extract-syntax">#</span></span> is printed in place of a null token, both in this documentation
and in the debugging log.)
</p>
<p class="commentary">Each excerpt meaning also comes with a hash code, which is automatically
generated from its token list, and a pointer to some structure.
</p>
<pre class="definitions code-font"><span class="definition-keyword">enum</span> <span class="constant-syntax">TooLongName_LINERROR</span><span class="plain-syntax"> </span><span class="identifier-syntax">from</span><span class="plain-syntax"> </span><span class="constant-syntax">1</span>
<span class="definition-keyword">define</span> <span class="constant-syntax">MAX_TOKENS_PER_EXCERPT_MEANING</span><span class="plain-syntax"> </span><span class="constant-syntax">32</span>
</pre>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">typedef</span><span class="plain-syntax"> </span><span class="reserved-syntax">struct</span><span class="plain-syntax"> </span><span class="reserved-syntax">excerpt_meaning</span><span class="plain-syntax"> {</span>
<span class="plain-syntax">    </span><span class="reserved-syntax">unsigned</span><span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">meaning_code</span><span class="plain-syntax">; </span><span class="comment-syntax"> what kind of meaning: a single MC, not a bitmap</span>
<span class="plain-syntax">    </span><span class="reserved-syntax">struct</span><span class="plain-syntax"> </span><span class="identifier-syntax">general_pointer</span><span class="plain-syntax"> </span><span class="identifier-syntax">data</span><span class="plain-syntax">; </span><span class="comment-syntax"> data structure being referred to</span>
<span class="plain-syntax">    </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">no_em_tokens</span><span class="plain-syntax">; </span><span class="comment-syntax"> length of token list</span>
<span class="plain-syntax">    </span><span class="reserved-syntax">struct</span><span class="plain-syntax"> </span><span class="identifier-syntax">vocabulary_entry</span><span class="plain-syntax"> *</span><span class="identifier-syntax">em_tokens</span><span class="plain-syntax">[</span><span class="constant-syntax">MAX_TOKENS_PER_EXCERPT_MEANING</span><span class="plain-syntax">]; </span><span class="comment-syntax"> token list</span>
<span class="plain-syntax">    </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">excerpt_hash</span><span class="plain-syntax">; </span><span class="comment-syntax"> hash code generated from the token list</span>
<span class="plain-syntax">    </span><span class="identifier-syntax">MEMORY_MANAGEMENT</span>
<span class="plain-syntax">} </span><span class="reserved-syntax">excerpt_meaning</span><span class="plain-syntax">;</span>
</pre>
<ul class="endnotetexts"><li>The structure excerpt_meaning is accessed in 2/pe and here.</li></ul>
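<p class="commentary">The matching rule for token lists described in this section can be sketched
in C as follows. This is only an illustration under simplifying assumptions:
tokens and words are C strings rather than <span class="extract"><span class="extract-syntax">vocabulary_entry</span></span> pointers, the
function name <span class="extract"><span class="extract-syntax">tokens_match</span></span> is hypothetical, and the simple backtracking
used here is not claimed to be the parser's actual method.
</p>

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Sketch: does the token list tokens[0..nt-1] match words[0..nw-1]?
   A NULL token plays the role of "#": one or more arbitrary words. */
bool tokens_match(const char *const tokens[], int nt,
                  const char *const words[], int nw) {
    if (nt == 0) return nw == 0;   /* both exhausted together: success */
    if (nw == 0) return false;     /* tokens remain but no words are left */
    if (tokens[0] != NULL)         /* literal token must equal the next word */
        return strcmp(tokens[0], words[0]) == 0 &&
               tokens_match(tokens + 1, nt - 1, words + 1, nw - 1);
    /* null token: let it absorb 1, 2, ... words and try the rest each time */
    for (int k = 1; k <= nw; k++)
        if (tokens_match(tokens + 1, nt - 1, words + k, nw - k)) return true;
    return false;
}
```

<p class="commentary">Given the token list "drink # milk #", this accepts "drink more milk today
and every day" and rejects "drink milk", because each null token must absorb
at least one word.
</p>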
<p class="commentary firstcommentary"><a id="SP3"></a><b>&#167;3. Meaning codes. </b>These assign a context to a meaning, and so decide how the <span class="extract"><span class="extract-syntax">data</span></span> pointer for
an excerpt meaning is to be interpreted. For instance, "Persian carpet" might
have a meaning with code <span class="extract"><span class="extract-syntax">NOUN_MC</span></span>.
</p>
<p class="commentary">Meaning codes are used in other contexts in Inform besides this one. There
are up to 31 of them and each is a distinct power of two; there is no
significance to their ordering. The point is that a signed integer (which
we know can hold values at least up to \(2^{31}-1\)) can hold a bitmap
representing any subset of these meaning codes; for instance, <span class="extract"><span class="extract-syntax">PROPERTY_MC + TABLE_MC</span></span>
might mean "either a property name or a table name".
</p>
<p class="commentary firstcommentary"><a id="SP4"></a><b>&#167;4. </b>The <span class="extract"><span class="extract-syntax">meaning_code</span></span> field of an <span class="extract"><span class="extract-syntax">excerpt_meaning</span></span> is always exactly
one of the <span class="extract"><span class="extract-syntax">*_MC</span></span> values. (It is never a bitmap combination.)
</p>
<pre class="definitions code-font"><span class="definition-keyword">define</span> <span class="constant-syntax">MISCELLANEOUS_MC</span><span class="plain-syntax"> </span><span class="constant-syntax">0x00000001</span><span class="plain-syntax"> </span><span class="comment-syntax"> a grab-bag of other possible nouns</span>
<span class="definition-keyword">define</span> <span class="constant-syntax">NOUN_MC</span><span class="plain-syntax"> </span><span class="constant-syntax">0x00000002</span><span class="plain-syntax"> </span><span class="comment-syntax"> e.g., </span><span class="extract"><span class="extract-syntax">upright chair</span></span>
<span class="definition-keyword">define</span> <span class="constant-syntax">ADJECTIVE_MC</span><span class="plain-syntax"> </span><span class="constant-syntax">0x00000004</span><span class="plain-syntax"> </span><span class="comment-syntax"> e.g., </span><span class="extract"><span class="extract-syntax">invisible</span></span>
</pre>
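<p class="commentary">As a small illustration of the distinction between a single meaning code and
a bitmap of them, the following C sketch reuses the three values defined
above. The helper names <span class="extract"><span class="extract-syntax">is_single_meaning_code</span></span> and <span class="extract"><span class="extract-syntax">bitmap_allows</span></span> are
hypothetical, and the <span class="extract"><span class="extract-syntax">u</span></span> suffixes are an addition made here so that the
arithmetic is unsigned.
</p>

```c
#include <stdbool.h>

/* The three codes listed in this section, written as unsigned constants. */
#define MISCELLANEOUS_MC 0x00000001u
#define NOUN_MC          0x00000002u
#define ADJECTIVE_MC     0x00000004u

/* True if mc is exactly one meaning code, i.e. a single power of two.
   (Hypothetical helper, not part of the module.) */
bool is_single_meaning_code(unsigned int mc) {
    return mc != 0 && (mc & (mc - 1)) == 0;
}

/* True if the single code mc belongs to the set that bitmap represents. */
bool bitmap_allows(unsigned int bitmap, unsigned int mc) {
    return (bitmap & mc) != 0;
}
```

<p class="commentary">Here <span class="extract"><span class="extract-syntax">NOUN_MC</span></span> counts as a single code, whereas <span class="extract"><span class="extract-syntax">NOUN_MC + ADJECTIVE_MC</span></span> is a
bitmap which allows <span class="extract"><span class="extract-syntax">ADJECTIVE_MC</span></span> but not <span class="extract"><span class="extract-syntax">MISCELLANEOUS_MC</span></span>.
</p>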
<p class="commentary firstcommentary"><a id="SP5"></a><b>&#167;5. </b>Each word in our vocabulary will be annotated with this structure:
</p>
<pre class="definitions code-font"><span class="definition-keyword">define</span> <span class="constant-syntax">VOCABULARY_MEANING_INITIALISER</span><span class="plain-syntax"> </span><a href="2-em.html#SP6" class="function-link"><span class="function-syntax">ExcerptMeanings::new_meaning</span></a>
</pre>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">typedef</span><span class="plain-syntax"> </span><span class="reserved-syntax">struct</span><span class="plain-syntax"> </span><span class="reserved-syntax">vocabulary_meaning</span><span class="plain-syntax"> {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">struct</span><span class="plain-syntax"> </span><span class="identifier-syntax">kind</span><span class="plain-syntax"> *</span><span class="identifier-syntax">one_word_kind</span><span class="plain-syntax">; </span><span class="comment-syntax"> this word as a kind with a single-word name</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">struct</span><span class="plain-syntax"> </span><span class="identifier-syntax">parse_node</span><span class="plain-syntax"> *</span><span class="identifier-syntax">start_list</span><span class="plain-syntax">; </span><span class="comment-syntax"> meanings starting with this</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">struct</span><span class="plain-syntax"> </span><span class="identifier-syntax">parse_node</span><span class="plain-syntax"> *</span><span class="identifier-syntax">end_list</span><span class="plain-syntax">; </span><span class="comment-syntax"> meanings ending with this</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">struct</span><span class="plain-syntax"> </span><span class="identifier-syntax">parse_node</span><span class="plain-syntax"> *</span><span class="identifier-syntax">middle_list</span><span class="plain-syntax">; </span><span class="comment-syntax"> meanings with this inside but at neither end</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">struct</span><span class="plain-syntax"> </span><span class="identifier-syntax">parse_node</span><span class="plain-syntax"> *</span><span class="identifier-syntax">subset_list</span><span class="plain-syntax">; </span><span class="comment-syntax"> meanings allowing subsets which include this</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">subset_list_length</span><span class="plain-syntax">; </span><span class="comment-syntax"> number of meanings in the subset list</span>
<span class="plain-syntax">} </span><span class="reserved-syntax">vocabulary_meaning</span><span class="plain-syntax">;</span>
</pre>
<ul class="endnotetexts"><li>The structure vocabulary_meaning is accessed in 2/pe and here.</li></ul>
<p class="commentary firstcommentary"><a id="SP6"></a><b>&#167;6. </b>With the following initialiser:
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">vocabulary_meaning</span><span class="plain-syntax"> </span><span class="function-syntax">ExcerptMeanings::new_meaning</span><button class="popup" onclick="togglePopup('usagePopup1')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup1">Usage of <span class="code-font"><span class="function-syntax">ExcerptMeanings::new_meaning</span></span>:<br/><a href="2-em.html#SP5">&#167;5</a></span></button><span class="plain-syntax">(</span><span class="identifier-syntax">vocabulary_entry</span><span class="plain-syntax"> *</span><span class="identifier-syntax">ve</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> #</span><span class="identifier-syntax">ifdef</span><span class="plain-syntax"> </span><span class="identifier-syntax">CORE_MODULE</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">Kinds::Textual::parse_variable</span><span class="plain-syntax">(</span><span class="identifier-syntax">ve</span><span class="plain-syntax">)) </span><span class="identifier-syntax">ve</span><span class="plain-syntax">-&gt;</span><span class="identifier-syntax">flags</span><span class="plain-syntax"> |= </span><span class="identifier-syntax">KIND_FAST_MC</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> #</span><span class="identifier-syntax">endif</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">ve</span><span class="plain-syntax">-&gt;</span><span class="identifier-syntax">flags</span><span class="plain-syntax">) &amp; </span><span class="identifier-syntax">NUMBER_MC</span><span class="plain-syntax">) </span><a href="3-cao.html#SP3" class="function-link"><span class="function-syntax">Cardinals::mark_as_cardinal</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">ve</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">ve</span><span class="plain-syntax">-&gt;</span><span class="identifier-syntax">flags</span><span class="plain-syntax">) &amp; </span><span class="identifier-syntax">ORDINAL_MC</span><span class="plain-syntax">) </span><a href="3-cao.html#SP3" class="function-link"><span class="function-syntax">Cardinals::mark_as_ordinal</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">ve</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">vocabulary_meaning</span><span class="plain-syntax"> </span><span class="identifier-syntax">vm</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">vm</span><span class="plain-syntax">.</span><span class="element-syntax">start_list</span><span class="plain-syntax"> = </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">; </span><span class="identifier-syntax">vm</span><span class="plain-syntax">.</span><span class="element-syntax">end_list</span><span class="plain-syntax"> = </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">; </span><span class="identifier-syntax">vm</span><span class="plain-syntax">.</span><span class="element-syntax">middle_list</span><span class="plain-syntax"> = </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">vm</span><span class="plain-syntax">.</span><span class="element-syntax">subset_list</span><span class="plain-syntax"> = </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">; </span><span class="identifier-syntax">vm</span><span class="plain-syntax">.</span><span class="element-syntax">subset_list_length</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">vm</span><span class="plain-syntax">.</span><span class="element-syntax">one_word_kind</span><span class="plain-syntax"> = </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">vm</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
</pre>
<p class="commentary firstcommentary"><a id="SP7"></a><b>&#167;7. Creating EMs. </b>The following makes a skeletal EM structure, with no token list or hash code
as yet.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">excerpt_meaning</span><span class="plain-syntax"> *</span><span class="function-syntax">ExcerptMeanings::new</span><button class="popup" onclick="togglePopup('usagePopup2')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup2">Usage of <span class="code-font"><span class="function-syntax">ExcerptMeanings::new</span></span>:<br/><a href="2-em.html#SP15">&#167;15</a>, <a href="2-em.html#SP15_3_2">&#167;15.3.2</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">unsigned</span><span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">mc</span><span class="plain-syntax">, </span><span class="identifier-syntax">general_pointer</span><span class="plain-syntax"> </span><span class="identifier-syntax">data</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">excerpt_meaning</span><span class="plain-syntax"> *</span><span class="identifier-syntax">em</span><span class="plain-syntax"> = </span><span class="identifier-syntax">CREATE</span><span class="plain-syntax">(</span><span class="reserved-syntax">excerpt_meaning</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">meaning_code</span><span class="plain-syntax"> = </span><span class="identifier-syntax">mc</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">data</span><span class="plain-syntax"> = </span><span class="identifier-syntax">data</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">no_em_tokens</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">excerpt_hash</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">em</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
</pre>
<p class="commentary firstcommentary"><a id="SP8"></a><b>&#167;8. </b>Access routines:
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="identifier-syntax">general_pointer</span><span class="plain-syntax"> </span><span class="function-syntax">ExcerptMeanings::data</span><button class="popup" onclick="togglePopup('usagePopup3')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup3">Usage of <span class="code-font"><span class="function-syntax">ExcerptMeanings::data</span></span>:<br/>Parse Excerpts - <a href="2-pe.html#SP6">&#167;6</a><br/>Adjectives - <a href="3-adj.html#SP6">&#167;6</a><br/>Nouns - <a href="3-nns.html#SP9">&#167;9</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">excerpt_meaning</span><span class="plain-syntax"> *</span><span class="identifier-syntax">em</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">data</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
</pre>
<p class="commentary firstcommentary"><a id="SP9"></a><b>&#167;9. Debugging log. </b>First to log a single excerpt meaning, and then to log all of them:
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">ExcerptMeanings::log</span><button class="popup" onclick="togglePopup('usagePopup4')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup4">Usage of <span class="code-font"><span class="function-syntax">ExcerptMeanings::log</span></span>:<br/>Linguistics Module - <a href="1-lm.html#SP3_5">&#167;3.5</a></span></button><span class="plain-syntax">(</span><span class="identifier-syntax">OUTPUT_STREAM</span><span class="plain-syntax">, </span><span class="reserved-syntax">void</span><span class="plain-syntax"> *</span><span class="identifier-syntax">vem</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">excerpt_meaning</span><span class="plain-syntax"> *</span><span class="identifier-syntax">em</span><span class="plain-syntax"> = (</span><span class="reserved-syntax">excerpt_meaning</span><span class="plain-syntax"> *) </span><span class="identifier-syntax">vem</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">em</span><span class="plain-syntax"> == </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) { </span><span class="identifier-syntax">LOG</span><span class="plain-syntax">(</span><span class="string-syntax">"&lt;null-em&gt;"</span><span class="plain-syntax">); </span><span class="reserved-syntax">return</span><span class="plain-syntax">; }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LOG</span><span class="plain-syntax">(</span><span class="string-syntax">"{"</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax">=0; </span><span class="identifier-syntax">i</span><span class="plain-syntax">&lt;</span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">no_em_tokens</span><span class="plain-syntax">; </span><span class="identifier-syntax">i</span><span class="plain-syntax">++) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax">&gt;0) </span><span class="identifier-syntax">LOG</span><span class="plain-syntax">(</span><span class="string-syntax">" "</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">em_tokens</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">] == </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) { </span><span class="identifier-syntax">LOG</span><span class="plain-syntax">(</span><span class="string-syntax">"#"</span><span class="plain-syntax">); </span><span class="reserved-syntax">continue</span><span class="plain-syntax">; }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LOG</span><span class="plain-syntax">(</span><span class="string-syntax">"%V"</span><span class="plain-syntax">, </span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">em_tokens</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">]);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LOG</span><span class="plain-syntax">(</span><span class="string-syntax">" = $N"</span><span class="plain-syntax">, </span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">meaning_code</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LOG</span><span class="plain-syntax">(</span><span class="string-syntax">"}"</span><span class="plain-syntax">);</span>
<span class="plain-syntax">}</span>
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">ExcerptMeanings::log_all</span><button class="popup" onclick="togglePopup('usagePopup5')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup5">Usage of <span class="code-font"><span class="function-syntax">ExcerptMeanings::log_all</span></span>:<br/>Parse Excerpts - <a href="2-pe.html#SP7">&#167;7</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">void</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">excerpt_meaning</span><span class="plain-syntax"> *</span><span class="identifier-syntax">em</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LOOP_OVER</span><span class="plain-syntax">(</span><span class="identifier-syntax">em</span><span class="plain-syntax">, </span><span class="reserved-syntax">excerpt_meaning</span><span class="plain-syntax">)</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LOG</span><span class="plain-syntax">(</span><span class="string-syntax">"%02d: %08x $M\n"</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">++, (</span><span class="identifier-syntax">pointer_sized_int</span><span class="plain-syntax">) </span><span class="identifier-syntax">em</span><span class="plain-syntax">, </span><span class="identifier-syntax">em</span><span class="plain-syntax">);</span>
<span class="plain-syntax">}</span>
</pre>
<p class="commentary firstcommentary"><a id="SP10"></a><b>&#167;10. Hashing excerpts. </b>For excerpts <span class="extract"><span class="extract-syntax">(w1, w2)</span></span>, we need a form of hash function which makes it
easy to test whether the words in one excerpt can all be found in another,
or to be more exact whether
</p>
<p class="commentary">$$ \lbrace I_j\mid w_1\leq j\leq w_2\rbrace \subseteq
    \lbrace I_j\mid w_3\leq j\leq w_4\rbrace $$
</p>
<p class="commentary">where \(I_n\) is the identity of word \(n\). As with all hash algorithms, a
matching hash need not guarantee a genuine match, but a failed hash test must
rule one out for certain, so we can throw away a lot of information. We also
want a hash function which makes it easy to test whether an excerpt contains
any literals.
</p>
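<p class="commentary">As a minimal sketch of the kind of test this makes possible, suppose each
word identity sets one bit of a bitmap. The names <span class="extract"><span class="extract-syntax">sketch_bitmap</span></span> and
<span class="extract"><span class="extract-syntax">sketch_may_be_subset</span></span>, and the modulus 30, are assumptions made for this
illustration only and are not part of the module.
</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: every word identity (a non-negative integer) sets one
   bit of a 30-bit bitmap. The modulus 30 is chosen only for illustration. */
static uint32_t sketch_bitmap(const int *ids, int n) {
    assert(n >= 0);
    uint32_t h = 0;
    for (int i = 0; i < n; i++) h |= (uint32_t) 1 << (ids[i] % 30);
    return h;
}

/* Returns 0 only when some word of the inner excerpt is certainly absent from
   the outer one. A return of 1 means "possibly a subset": the words themselves
   must still be compared, since different words can share a bit. */
static int sketch_may_be_subset(uint32_t h_inner, uint32_t h_outer) {
    return (h_inner & ~h_outer) == 0;
}
```

<p class="commentary">A zero result is a definite negative; a nonzero result may be a false
positive, which is the trade-off described above.
</p>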
<p class="commentary firstcommentary"><a id="SP11"></a><b>&#167;11. </b>There are two sources of text which we might want to hash in this way:
first, actual excerpts found in the source text. These are not very
expensive to calculate, but every ounce of speed helps here, so we cache
the most recent.
</p>
<p class="commentary">The hash generated this way is an arbitrary bitmap of bits 1 to 30, with
bits 31 and 32 left clear.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">cached_hash_w1</span><span class="plain-syntax"> = -2, </span><span class="identifier-syntax">cached_hash_w2</span><span class="plain-syntax"> = -2, </span><span class="identifier-syntax">cached_value</span><span class="plain-syntax">;</span>
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="function-syntax">ExcerptMeanings::hash_code</span><button class="popup" onclick="togglePopup('usagePopup6')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup6">Usage of <span class="code-font"><span class="function-syntax">ExcerptMeanings::hash_code</span></span>:<br/>Parse Excerpts - <a href="2-pe.html#SP5">&#167;5</a></span></button><span class="plain-syntax">(</span><span class="identifier-syntax">wording</span><span class="plain-syntax"> </span><span class="identifier-syntax">W</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">Wordings::empty</span><span class="plain-syntax">(</span><span class="identifier-syntax">W</span><span class="plain-syntax">)) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">w1</span><span class="plain-syntax"> = </span><span class="identifier-syntax">Wordings::first_wn</span><span class="plain-syntax">(</span><span class="identifier-syntax">W</span><span class="plain-syntax">), </span><span class="identifier-syntax">w2</span><span class="plain-syntax"> = </span><span class="identifier-syntax">Wordings::last_wn</span><span class="plain-syntax">(</span><span class="identifier-syntax">W</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax">, </span><span class="identifier-syntax">h</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">; </span><span class="identifier-syntax">vocabulary_entry</span><span class="plain-syntax"> *</span><span class="identifier-syntax">v</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">w1</span><span class="plain-syntax"> == </span><span class="identifier-syntax">cached_hash_w1</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">w2</span><span class="plain-syntax"> == </span><span class="identifier-syntax">cached_hash_w2</span><span class="plain-syntax">)) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">cached_value</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax">=</span><span class="identifier-syntax">w1</span><span class="plain-syntax">; </span><span class="identifier-syntax">i</span><span class="plain-syntax">&lt;=</span><span class="identifier-syntax">w2</span><span class="plain-syntax">; </span><span class="identifier-syntax">i</span><span class="plain-syntax">++) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">v</span><span class="plain-syntax"> = </span><span class="identifier-syntax">Lexer::word</span><span class="plain-syntax">(</span><span class="identifier-syntax">i</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">v</span><span class="plain-syntax">) </span><span class="named-paragraph-container code-font"><a href="2-em.html#SP11_2" class="named-paragraph-link"><span class="named-paragraph">Allow this vocabulary entry to contribute to the excerpt's hash code</span><span class="named-paragraph-number">11.2</span></a></span><span class="character-syntax">;</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> return h;</span>
<span class="character-syntax">}</span>
</pre>
<p class="commentary firstcommentary"><a id="SP11_1"></a><b>&#167;11.1. </b>Second, when a new excerpt meaning is to be registered, we want to hash
its token list. But only some of the tokens are vocabulary entries,
while others instead represent gaps where arbitrary text can appear (represented
by a null pointer). Note that we simply ignore those gaps when hashing,
that is, we produce the same hash as we would if the gaps were not there at
all.
</p>
<p class="commentary">The hash generated this way is an arbitrary bitmap of bits 1 to 31, with
bit 32 left clear. Bit 31 is set, as a special case, for excerpts in the
context of text substitutions which begin with a word known to exist, and
with differing meanings, in two differently cased forms: this is how "[the
noun]" is distinguished from "[The noun]". (The lower 30 bits have the
same meaning as in the first case above.)
</p>
<pre class="definitions code-font"><span class="definition-keyword">define</span> <span class="constant-syntax">CAPITALISED_VARIANT_FORM</span><span class="plain-syntax"> (1 &lt;&lt; </span><span class="constant-syntax">30</span><span class="plain-syntax">)</span>
</pre>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">ExcerptMeanings::hash_code_from_token_list</span><button class="popup" onclick="togglePopup('usagePopup7')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup7">Usage of <span class="code-font"><span class="function-syntax">ExcerptMeanings::hash_code_from_token_list</span></span>:<br/><a href="2-em.html#SP13_1">&#167;13.1</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">excerpt_meaning</span><span class="plain-syntax"> *</span><span class="identifier-syntax">em</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax">, </span><span class="identifier-syntax">h</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">no_em_tokens</span><span class="plain-syntax"> == </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="identifier-syntax">internal_error</span><span class="plain-syntax">(</span><span class="string-syntax">"Empty text when registering"</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">no_em_tokens</span><span class="plain-syntax"> &gt;= </span><span class="constant-syntax">1</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">em_tokens</span><span class="plain-syntax">[0])) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">vocabulary_entry</span><span class="plain-syntax"> *</span><span class="identifier-syntax">lcf</span><span class="plain-syntax"> = </span><span class="identifier-syntax">Vocabulary::get_lower_case_form</span><span class="plain-syntax">(</span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">em_tokens</span><span class="plain-syntax">[0]);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lcf</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">h</span><span class="plain-syntax"> = </span><span class="identifier-syntax">h</span><span class="plain-syntax"> | </span><span class="constant-syntax">CAPITALISED_VARIANT_FORM</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">em_tokens</span><span class="plain-syntax">[0] = </span><span class="identifier-syntax">lcf</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax">=0; </span><span class="identifier-syntax">i</span><span class="plain-syntax">&lt;</span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">no_em_tokens</span><span class="plain-syntax">; </span><span class="identifier-syntax">i</span><span class="plain-syntax">++) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">vocabulary_entry</span><span class="plain-syntax"> *</span><span class="identifier-syntax">v</span><span class="plain-syntax"> = </span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">em_tokens</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">];</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">v</span><span class="plain-syntax">) </span><span class="named-paragraph-container code-font"><a href="2-em.html#SP11_2" class="named-paragraph-link"><span class="named-paragraph">Allow this vocabulary entry to contribute to the excerpt's hash code</span><span class="named-paragraph-number">11.2</span></a></span><span class="character-syntax">;</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> em-&gt;excerpt_hash = h;</span>
<span class="character-syntax">}</span>
</pre>
<p class="commentary firstcommentary"><a id="SP11_2"></a><b>&#167;11.2. </b>Now each vocabulary entry <span class="extract"><span class="extract-syntax">v</span></span>, i.e., each distinct word identity, itself has
a hash code to identify it. These are stored in <span class="extract"><span class="extract-syntax">v-&gt;hash</span></span> and, except for
literals, are more or less evenly distributed in about the range 0 to 1000.
</p>
<p class="commentary">The contribution made by a single word's individual hash to the bitmap hash
for the whole excerpt is as follows.
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Allow this vocabulary entry to contribute to the excerpt's hash code</span><span class="named-paragraph-number">11.2</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">v</span><span class="plain-syntax">-&gt;</span><span class="identifier-syntax">flags</span><span class="plain-syntax">) &amp; </span><span class="identifier-syntax">NUMBER_MC</span><span class="plain-syntax">) </span><span class="identifier-syntax">h</span><span class="plain-syntax"> = </span><span class="identifier-syntax">h</span><span class="plain-syntax"> | </span><span class="constant-syntax">1</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">v</span><span class="plain-syntax">-&gt;</span><span class="identifier-syntax">flags</span><span class="plain-syntax">) &amp; </span><span class="identifier-syntax">TEXT_MC</span><span class="plain-syntax">) </span><span class="identifier-syntax">h</span><span class="plain-syntax"> = </span><span class="identifier-syntax">h</span><span class="plain-syntax"> | </span><span class="constant-syntax">2</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">v</span><span class="plain-syntax">-&gt;</span><span class="identifier-syntax">flags</span><span class="plain-syntax">) &amp; </span><span class="identifier-syntax">I6_MC</span><span class="plain-syntax">) </span><span class="identifier-syntax">h</span><span class="plain-syntax"> = </span><span class="identifier-syntax">h</span><span class="plain-syntax"> | </span><span class="constant-syntax">4</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="identifier-syntax">h</span><span class="plain-syntax"> = </span><span class="identifier-syntax">h</span><span class="plain-syntax"> | (8 &lt;&lt; ((</span><span class="identifier-syntax">v</span><span class="plain-syntax">-&gt;</span><span class="identifier-syntax">hash</span><span class="plain-syntax">) % </span><span class="constant-syntax">27</span><span class="plain-syntax">));</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="2-em.html#SP11">&#167;11</a>, <a href="2-em.html#SP11_1">&#167;11.1</a>.</li></ul>
<p class="commentary firstcommentary"><a id="SP12"></a><b>&#167;12. </b>To sum up: the excerpt hash is a bitmap indicating what categories of
words are present in the excerpt. It ignores "gaps" in token lists, and
it ignores the order of the words and repetitions. The three least
significant bits indicate whether numbers, text or I6 verbatims are
present, and the next 27 bits indicate the presence of other words: e.g.,
bit 4 indicates that a word with hash code 0, 27, 54, ..., is present, and
so on. Bit 31, which is used only for token lists of excerpt meanings,
marks that an excerpt is a variant form whose first word must be
capitalised in order for it to match. Bit 32 is always left blank (for
superstitious reasons to do with the sign bit and differences between
platforms in handling signed bit shifts).
</p>
<p class="commentary">The result is not a tremendously good hashing number, since it generally
produces a sparse bitmap, so that the variety is not as great as might be
thought. But it is optimised for the trickiest parsing cases where the
rewards of saving unnecessary tests are greatest.
</p>
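<p class="commentary">The summary above can be sketched as a standalone function. This is an
illustration under stated assumptions, not the module's code: the type
<span class="extract"><span class="extract-syntax">sk_word</span></span> and the <span class="extract"><span class="extract-syntax">SK_</span></span> flag values are hypothetical stand-ins for
<span class="extract"><span class="extract-syntax">vocabulary_entry</span></span> and the real meaning codes, and word hashes are assumed to
be non-negative.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: the real flag values and the layout of
   vocabulary_entry are defined elsewhere and are not reproduced here. */
#define SK_NUMBER 1u
#define SK_TEXT   2u
#define SK_I6     4u

typedef struct { unsigned int flags; int hash; } sk_word;

/* Bitmap for a list of words. NULL entries stand for gaps and are skipped,
   so order, repetition and gaps all leave the result unchanged. */
static int sk_excerpt_hash(sk_word *const *words, int n) {
    assert(n >= 0);
    int h = 0;
    for (int i = 0; i < n; i++) {
        const sk_word *v = words[i];
        if (v == NULL) continue;
        if (v->flags & SK_NUMBER) h |= 1;        /* bit 1: a number is present */
        else if (v->flags & SK_TEXT) h |= 2;     /* bit 2: literal text */
        else if (v->flags & SK_I6) h |= 4;       /* bit 3: an I6 verbatim */
        else h |= 8 << (v->hash % 27);           /* bits 4 to 30: other words */
    }
    return h;
}
```

<p class="commentary">Two words whose individual hashes differ by a multiple of 27 set the same
bit, which is one reason the bitmap can only rule matches out, never confirm
them.
</p>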
<p class="commentary firstcommentary"><a id="SP13"></a><b>&#167;13. EM Listing. </b>We are clearly not going to store the excerpt meanings in a hash table
keyed by the hash values of excerpts &mdash; with hash values as large as
\(2^{31}-1\), that would be practically impossible.
</p>
<p class="commentary">Instead we key using the actual words. Each vocabulary entry has four
linked lists of EMs: its subset list, its start list, its middle list,
and its end list.
</p>
<ul class="items"><li>(a) If an EM needs to allow parsing as a subset, it must be placed in the
subset list of every word it contains, articles excepted. For instance, "buttress against cathedral
wall" registered under the code <span class="extract"><span class="extract-syntax">NOUN_MC</span></span> would be listed
in the subset lists of "buttress", "against", "cathedral" and "wall".
</li><li>(b) Otherwise it is placed in only one list:
<ul class="items"><li>(b1) If the token list consists only of a single gap <span class="extract"><span class="extract-syntax">#</span></span>, we must be
registering a "say" phrase to say a value. (There is one of these for
each kind of value.) This meaning is listed under a special <span class="extract"><span class="extract-syntax">blank_says_p</span></span>
list, which is not attached to any vocabulary entry.
</li><li>(b2) Otherwise, if the first token is not a <span class="extract"><span class="extract-syntax">#</span></span> gap, it goes into the
start list for the first token's word: for instance, <span class="extract"><span class="extract-syntax">award # points</span></span> joins
the start list for "award".
</li><li>(b3) Otherwise, if the last token is not a <span class="extract"><span class="extract-syntax">#</span></span> gap, it goes into the end
list for the last token's word: for instance, <span class="extract"><span class="extract-syntax"># in # from now</span></span> joins the
end list for "now".
</li><li>(b4) Otherwise, it goes into the middle list of the word for the leftmost
token which is not a <span class="extract"><span class="extract-syntax">#</span></span>: for instance, <span class="extract"><span class="extract-syntax"># plus #</span></span> joins the middle list for
"plus".
</li></ul>
</li></ul>
<p class="commentary">Since a token list consisting only of two or more consecutive <span class="extract"><span class="extract-syntax">#</span></span>s cannot exist, this exhausts the possibilities.
</p>
<p class="commentary">Outside of subset mode, we will then test a given excerpt <span class="extract"><span class="extract-syntax">(w1, w2)</span></span> in the
source text against all possible meanings by checking the start list for <span class="extract"><span class="extract-syntax">w1</span></span>,
the end list for <span class="extract"><span class="extract-syntax">w2</span></span> and the middle list for every one of <span class="extract"><span class="extract-syntax">(w1+1, w2-1)</span></span>.
Because of this:
</p>
<ul class="items"><li>(i) Performance suffers if lists for individual words become unbalanced
in size. This is why we register Unicode translations as "white chess
knight" rather than "Unicode white chess knight", and so on; the
alternative would be a stupendously long start list for "unicode".
</li><li>(ii) Middle lists are tested far more often than start or end lists, so
we should keep them as small as possible. This is why (b4) above is our last
resort; happily phrases both starting and ending with <span class="extract"><span class="extract-syntax">#</span></span> are uncommon.
</li></ul>
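<p class="commentary">The placement rules (a) and (b1) to (b4) can be sketched as a small decision
function. This is only an illustration: <span class="extract"><span class="extract-syntax">sk_choose_list</span></span> and the
<span class="extract"><span class="extract-syntax">sk_list</span></span> enumeration are hypothetical names, tokens are shown as strings
rather than vocabulary entries, and the sketch ignores the conditional
compilation which guards the blank case in the real function.
</p>

```c
#include <assert.h>
#include <stddef.h>

enum sk_list { SK_SUBSET, SK_BLANK, SK_START, SK_END, SK_MIDDLE };

/* tokens[i] == NULL marks a # gap. On return, *which holds the index of the
   token whose word list receives the meaning, or -1 where no single word does
   (the subset case, which uses every non-article word, and the blank case). */
static enum sk_list sk_choose_list(const char *const *tokens, int n,
                                   int subset, int *which) {
    assert(n >= 1);
    *which = -1;
    if (subset) return SK_SUBSET;                               /* rule (a) */
    if (n == 1 && tokens[0] == NULL) return SK_BLANK;           /* rule (b1) */
    if (tokens[0]) { *which = 0; return SK_START; }             /* rule (b2) */
    if (tokens[n-1]) { *which = n-1; return SK_END; }           /* rule (b3) */
    for (int i = 1; i < n-1; i++)                               /* rule (b4) */
        if (tokens[i]) { *which = i; return SK_MIDDLE; }
    assert(0 && "token list made only of gaps");
    return SK_MIDDLE;
}
```

<p class="commentary">For example, a token list of "award", gap, "points" is classified as
<span class="extract"><span class="extract-syntax">SK_START</span></span> with index 0, while gap, "plus", gap is classified as
<span class="extract"><span class="extract-syntax">SK_MIDDLE</span></span> with index 1.
</p>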
<pre class="displayed-code all-displayed-code code-font">
<span class="identifier-syntax">parse_node</span><span class="plain-syntax"> *</span><span class="identifier-syntax">blank_says_p</span><span class="plain-syntax"> = </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">;</span>
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">ExcerptMeanings::register_em</span><button class="popup" onclick="togglePopup('usagePopup8')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup8">Usage of <span class="code-font"><span class="function-syntax">ExcerptMeanings::register_em</span></span>:<br/><a href="2-em.html#SP15">&#167;15</a>, <a href="2-em.html#SP15_3_2">&#167;15.3.2</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">unsigned</span><span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">meaning_code</span><span class="plain-syntax">, </span><span class="reserved-syntax">excerpt_meaning</span><span class="plain-syntax"> *</span><span class="identifier-syntax">em</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> #</span><span class="identifier-syntax">ifdef</span><span class="plain-syntax"> </span><span class="identifier-syntax">CORE_MODULE</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">ExParser::warn_expression_cache</span><span class="plain-syntax">(); </span><span class="comment-syntax"> the existence of new meanings jeopardises any cached parsing results</span>
<span class="plain-syntax"> #</span><span class="identifier-syntax">endif</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="2-em.html#SP13_1" class="named-paragraph-link"><span class="named-paragraph">Compute the new excerpt's hash code from its token list</span><span class="named-paragraph-number">13.1</span></a></span><span class="character-syntax">;</span>
<span class="character-syntax"> </span><span class="named-paragraph-container code-font"><a href="2-em.html#SP13_2" class="named-paragraph-link"><span class="named-paragraph">Watermark each word in the token list with the meaning code being applied</span><span class="named-paragraph-number">13.2</span></a></span><span class="character-syntax">;</span>
<span class="character-syntax"> LOGIF(EXCERPT_MEANINGS,</span>
<span class="character-syntax"> "Logging meaning: $M with hash %08x, mc=%d, %d tokens\n",</span>
<span class="character-syntax"> em, em-&gt;excerpt_hash, meaning_code, em-&gt;no_em_tokens);</span>
<span class="character-syntax"> if (meaning_code &amp; SUBSET_PARSING_BITMAP) {</span>
<span class="character-syntax"> </span><span class="named-paragraph-container code-font"><a href="2-em.html#SP13_3" class="named-paragraph-link"><span class="named-paragraph">Place the new meaning under the subset list for each non-article word</span><span class="named-paragraph-number">13.3</span></a></span><span class="character-syntax">;</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> #ifdef EM_ALLOW_BLANK_TEST</span>
<span class="character-syntax"> else if ((em-&gt;no_em_tokens == 1) &amp;&amp; (em-&gt;em_tokens[0] == NULL) &amp;&amp;</span>
<span class="character-syntax"> (EM_ALLOW_BLANK_TEST(meaning_code))) {</span>
<span class="character-syntax"> </span><span class="named-paragraph-container code-font"><a href="2-em.html#SP13_4" class="named-paragraph-link"><span class="named-paragraph">Place the new meaning under the say-blank list</span><span class="named-paragraph-number">13.4</span></a></span><span class="character-syntax">;</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> #endif</span>
<span class="character-syntax"> else if (em-&gt;em_tokens[0]) {</span>
<span class="character-syntax"> </span><span class="named-paragraph-container code-font"><a href="2-em.html#SP13_5" class="named-paragraph-link"><span class="named-paragraph">Place the new meaning under the start list of the first word</span><span class="named-paragraph-number">13.5</span></a></span><span class="character-syntax">;</span>
<span class="character-syntax"> } else if (em-&gt;em_tokens[em-&gt;no_em_tokens-1]) {</span>
<span class="character-syntax"> </span><span class="named-paragraph-container code-font"><a href="2-em.html#SP13_6" class="named-paragraph-link"><span class="named-paragraph">Place the new meaning under the end list of the last word</span><span class="named-paragraph-number">13.6</span></a></span><span class="character-syntax">;</span>
<span class="character-syntax"> } else {</span>
<span class="character-syntax"> int i;</span>
<span class="character-syntax"> for (i=1; i&lt;em-&gt;no_em_tokens-1; i++)</span>
<span class="character-syntax"> if (em-&gt;em_tokens[i]) { </span><span class="named-paragraph-container code-font"><a href="2-em.html#SP13_7" class="named-paragraph-link"><span class="named-paragraph">Place the new meaning under the middle list of word i</span><span class="named-paragraph-number">13.7</span></a></span><span class="character-syntax">; break; }</span>
<span class="character-syntax"> if (i &gt;= em-&gt;no_em_tokens-1) internal_error("registered meaning of two or more #s");</span>
<span class="character-syntax"> }</span>
<span class="character-syntax">}</span>
</pre>
<p class="commentary firstcommentary"><a id="SP13_1"></a><b>&#167;13.1. </b>See above.
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Compute the new excerpt's hash code from its token list</span><span class="named-paragraph-number">13.1</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><a href="2-em.html#SP11_1" class="function-link"><span class="function-syntax">ExcerptMeanings::hash_code_from_token_list</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">em</span><span class="plain-syntax">);</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="2-em.html#SP13">&#167;13</a>.</li></ul>
<p class="commentary firstcommentary"><a id="SP13_2"></a><b>&#167;13.2. </b>Another important optimisation is to flag each word in the meaning with
the given meaning code &mdash; this is why vocabulary flags and excerpt meaning
codes share the same numbering space. If we register "Table of Surgical
Instruments" as a table name, the word "surgical", for instance, picks
up the <span class="extract"><span class="extract-syntax">TABLE_MC</span></span> bit in its <span class="extract"><span class="extract-syntax">flags</span></span> bitmap.
</p>
<p class="commentary">The advantage of this is that if we want to see whether <span class="extract"><span class="extract-syntax">(w1, w2)</span></span> might be
a table name, we can take a bitwise AND of the flags for each word in
the range; if the result doesn't have the <span class="extract"><span class="extract-syntax">TABLE_MC</span></span> bit set, then at least
one of the words never occurs in a table name, so the answer must be
"no". This produces rapid, definite negatives with only a few false
positives.
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Watermark each word in the token list with the meaning code being applied</span><span class="named-paragraph-number">13.2</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax">=0; </span><span class="identifier-syntax">i</span><span class="plain-syntax">&lt;</span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">no_em_tokens</span><span class="plain-syntax">; </span><span class="identifier-syntax">i</span><span class="plain-syntax">++)</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">em_tokens</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">])</span>
<span class="plain-syntax"> ((</span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">em_tokens</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">])-&gt;</span><span class="identifier-syntax">flags</span><span class="plain-syntax">) |= </span><span class="identifier-syntax">meaning_code</span><span class="plain-syntax">;</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="2-em.html#SP13">&#167;13</a>.</li></ul>
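<p class="commentary">The watermarking scheme can be sketched in isolation as follows. This is a minimal illustration, not the module's own code: <span class="extract"><span class="extract-syntax">sketch_word</span></span>, <span class="extract"><span class="extract-syntax">sketch_watermark</span></span>, <span class="extract"><span class="extract-syntax">sketch_might_be</span></span> and the bit values chosen for the meaning codes are all hypothetical, and a word is reduced to its flags alone.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values: the real meaning codes are defined elsewhere. */
#define TABLE_MC 0x0001u
#define NOUN_MC  0x0002u

/* Cut-down stand-in for a vocabulary entry, holding only its flags. */
typedef struct sketch_word {
    unsigned int flags;
} sketch_word;

/* Registration side: OR the meaning code into every word of the meaning,
   skipping empty slots just as the loop in the original does. */
void sketch_watermark(sketch_word **tokens, int n, unsigned int mc) {
    for (int i = 0; i < n; i++)
        if (tokens[i]) tokens[i]->flags |= mc;
}

/* Parsing side: AND the flags across the range. A zero result is a
   definite "no"; a nonzero result only means "possibly".
   Assumes every entry in words[] is non-NULL. */
int sketch_might_be(sketch_word **words, int n, unsigned int mc) {
    assert(n > 0);
    unsigned int common = ~0u;
    for (int i = 0; i < n; i++) common &= words[i]->flags;
    return (common & mc) != 0;
}
```

<p class="commentary">Under these assumptions the test costs one AND per word and never consults the meaning lists; the price is that words watermarked by a meaning still pass when presented in a different order or combination, which is where the false positives come from.
</p>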
<p class="commentary firstcommentary"><a id="SP13_3"></a><b>&#167;13.3. </b>Note that articles (a, an, the, some) are excluded: this means we don't
waste time trying to see if the excerpt "the" might be a reference to the
object "Gregory the Great".
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Place the new meaning under the subset list for each non-article word</span><span class="named-paragraph-number">13.3</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax">=0; </span><span class="identifier-syntax">i</span><span class="plain-syntax">&lt;</span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">no_em_tokens</span><span class="plain-syntax">; </span><span class="identifier-syntax">i</span><span class="plain-syntax">++) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">vocabulary_entry</span><span class="plain-syntax"> *</span><span class="identifier-syntax">v</span><span class="plain-syntax"> = </span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">em_tokens</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">];</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">v</span><span class="plain-syntax"> == </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LOG</span><span class="plain-syntax">(</span><span class="string-syntax">"Logging meaning: $M with hash %08x\n"</span><span class="plain-syntax">, </span><span class="identifier-syntax">em</span><span class="plain-syntax">, </span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">excerpt_hash</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">internal_error</span><span class="plain-syntax">(</span><span class="string-syntax">"# in registration of subset meaning"</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">Preform::test_vocabulary</span><span class="plain-syntax">(</span><span class="identifier-syntax">v</span><span class="plain-syntax">, &lt;</span><span class="identifier-syntax">article</span><span class="plain-syntax">&gt;)) </span><span class="reserved-syntax">continue</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">parse_node</span><span class="plain-syntax"> *</span><span class="identifier-syntax">p</span><span class="plain-syntax"> = </span><a href="2-em.html#SP14" class="function-link"><span class="function-syntax">ExcerptMeanings::new_em_pnode</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">em</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">p</span><span class="plain-syntax">-&gt;</span><span class="identifier-syntax">next_alternative</span><span class="plain-syntax"> = </span><span class="identifier-syntax">v</span><span class="plain-syntax">-&gt;</span><span class="identifier-syntax">means</span><span class="plain-syntax">.</span><span class="element-syntax">subset_list</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">v</span><span class="plain-syntax">-&gt;</span><span class="identifier-syntax">means</span><span class="plain-syntax">.</span><span class="element-syntax">subset_list</span><span class="plain-syntax"> = </span><span class="identifier-syntax">p</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">v</span><span class="plain-syntax">-&gt;</span><span class="identifier-syntax">means</span><span class="plain-syntax">.</span><span class="element-syntax">subset_list_length</span><span class="plain-syntax">++;</span>
<span class="plain-syntax"> }</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="2-em.html#SP13">&#167;13</a>.</li></ul>
<p class="commentary firstcommentary"><a id="SP13_4"></a><b>&#167;13.4. </b>This registers <span class="extract"><span class="extract-syntax">#</span></span>, which is what "To say (N - a number)" and similar
constructions translate to.
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Place the new meaning under the say-blank list</span><span class="named-paragraph-number">13.4</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="identifier-syntax">parse_node</span><span class="plain-syntax"> *</span><span class="identifier-syntax">p</span><span class="plain-syntax"> = </span><a href="2-em.html#SP14" class="function-link"><span class="function-syntax">ExcerptMeanings::new_em_pnode</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">em</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">blank_says_p</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">parse_node</span><span class="plain-syntax"> *</span><span class="identifier-syntax">p2</span><span class="plain-syntax"> = </span><span class="identifier-syntax">blank_says_p</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">while</span><span class="plain-syntax"> (</span><span class="identifier-syntax">p2</span><span class="plain-syntax">-&gt;</span><span class="identifier-syntax">next_alternative</span><span class="plain-syntax">) </span><span class="identifier-syntax">p2</span><span class="plain-syntax"> = </span><span class="identifier-syntax">p2</span><span class="plain-syntax">-&gt;</span><span class="identifier-syntax">next_alternative</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">p2</span><span class="plain-syntax">-&gt;</span><span class="identifier-syntax">next_alternative</span><span class="plain-syntax"> = </span><span class="identifier-syntax">p</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="identifier-syntax">blank_says_p</span><span class="plain-syntax"> = </span><span class="identifier-syntax">p</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LOGIF</span><span class="plain-syntax">(</span><span class="identifier-syntax">EXCERPT_MEANINGS</span><span class="plain-syntax">,</span>
<span class="plain-syntax"> </span><span class="string-syntax">"The blank list with $M is now:\n$T"</span><span class="plain-syntax">, </span><span class="identifier-syntax">em</span><span class="plain-syntax">, </span><span class="identifier-syntax">blank_says_p</span><span class="plain-syntax">);</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="2-em.html#SP13">&#167;13</a>.</li></ul>
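<p class="commentary">Two different list disciplines are in use here: the subset list is extended at its head, whereas the say-blank list is extended at its tail. A minimal sketch of the two operations, using a hypothetical <span class="extract"><span class="extract-syntax">sketch_node</span></span> type in place of <span class="extract"><span class="extract-syntax">parse_node</span></span> and hypothetical function names:
</p>

```c
#include <stddef.h>

/* Cut-down stand-in for a parse node: an identifying number plus the
   link to the next alternative meaning. */
typedef struct sketch_node {
    int id;
    struct sketch_node *next_alternative;
} sketch_node;

/* Head insertion, as for the subset list: constant time, so the most
   recently registered meaning comes first. */
void sketch_push_front(sketch_node **head, sketch_node *p) {
    p->next_alternative = *head;
    *head = p;
}

/* Tail insertion, as for the say-blank list: walks to the end, so the
   list keeps the order in which meanings were registered. */
void sketch_append(sketch_node **head, sketch_node *p) {
    p->next_alternative = NULL;
    if (*head == NULL) { *head = p; return; }
    sketch_node *p2 = *head;
    while (p2->next_alternative) p2 = p2->next_alternative;
    p2->next_alternative = p;
}
```

<p class="commentary">The tail walk is linear in the length of the list, which is presumably acceptable because the say-blank list is short; that is an inference from the code rather than something the source states.
</p>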
<p class="commentary firstcommentary"><a id="SP13_5"></a><b>&#167;13.5. </b><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Place the new meaning under the start list of the first word</span><span class="named-paragraph-number">13.5</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="identifier-syntax">parse_node</span><span class="plain-syntax"> *</span><span class="identifier-syntax">p</span><span class="plain-syntax"> = </span><a href="2-em.html#SP14" class="function-link"><span class="function-syntax">ExcerptMeanings::new_em_pnode</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">em</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">p</span><span class="plain-syntax">-&gt;</span><span class="identifier-syntax">next_alternative</span><span class="plain-syntax"> = </span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">em_tokens</span><span class="plain-syntax">[0]-&gt;</span><span class="identifier-syntax">means</span><span class="plain-syntax">.</span><span class="element-syntax">start_list</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">em_tokens</span><span class="plain-syntax">[0]-&gt;</span><span class="identifier-syntax">means</span><span class="plain-syntax">.</span><span class="element-syntax">start_list</span><span class="plain-syntax"> = </span><span class="identifier-syntax">p</span><span class="plain-syntax">;</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="2-em.html#SP13">&#167;13</a>.</li></ul>
<p class="commentary firstcommentary"><a id="SP13_6"></a><b>&#167;13.6. </b>...and similarly...
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Place the new meaning under the end list of the last word</span><span class="named-paragraph-number">13.6</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="identifier-syntax">parse_node</span><span class="plain-syntax"> *</span><span class="identifier-syntax">p</span><span class="plain-syntax"> = </span><a href="2-em.html#SP14" class="function-link"><span class="function-syntax">ExcerptMeanings::new_em_pnode</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">em</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">p</span><span class="plain-syntax">-&gt;</span><span class="identifier-syntax">next_alternative</span><span class="plain-syntax"> = </span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">em_tokens</span><span class="plain-syntax">[</span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">no_em_tokens</span><span class="plain-syntax">-1]-&gt;</span><span class="identifier-syntax">means</span><span class="plain-syntax">.</span><span class="element-syntax">end_list</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">em_tokens</span><span class="plain-syntax">[</span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">no_em_tokens</span><span class="plain-syntax">-1]-&gt;</span><span class="identifier-syntax">means</span><span class="plain-syntax">.</span><span class="element-syntax">end_list</span><span class="plain-syntax"> = </span><span class="identifier-syntax">p</span><span class="plain-syntax">;</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="2-em.html#SP13">&#167;13</a>.</li></ul>
<p class="commentary firstcommentary"><a id="SP13_7"></a><b>&#167;13.7. </b>...and similarly again:
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Place the new meaning under the middle list of word i</span><span class="named-paragraph-number">13.7</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="identifier-syntax">parse_node</span><span class="plain-syntax"> *</span><span class="identifier-syntax">p</span><span class="plain-syntax"> = </span><a href="2-em.html#SP14" class="function-link"><span class="function-syntax">ExcerptMeanings::new_em_pnode</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">em</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">p</span><span class="plain-syntax">-&gt;</span><span class="identifier-syntax">next_alternative</span><span class="plain-syntax"> = </span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">em_tokens</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">]-&gt;</span><span class="identifier-syntax">means</span><span class="plain-syntax">.</span><span class="element-syntax">middle_list</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">em_tokens</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">]-&gt;</span><span class="identifier-syntax">means</span><span class="plain-syntax">.</span><span class="element-syntax">middle_list</span><span class="plain-syntax"> = </span><span class="identifier-syntax">p</span><span class="plain-syntax">;</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="2-em.html#SP13">&#167;13</a>.</li></ul>
<p class="commentary firstcommentary"><a id="SP14"></a><b>&#167;14. </b>Parse nodes are created from excerpt meanings only for storage inside the
excerpt parser, so they never live on into parse trees.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="identifier-syntax">parse_node</span><span class="plain-syntax"> *</span><span class="function-syntax">ExcerptMeanings::new_em_pnode</span><button class="popup" onclick="togglePopup('usagePopup9')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup9">Usage of <span class="code-font"><span class="function-syntax">ExcerptMeanings::new_em_pnode</span></span>:<br/><a href="2-em.html#SP13_3">&#167;13.3</a>, <a href="2-em.html#SP13_4">&#167;13.4</a>, <a href="2-em.html#SP13_5">&#167;13.5</a>, <a href="2-em.html#SP13_6">&#167;13.6</a>, <a href="2-em.html#SP13_7">&#167;13.7</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">excerpt_meaning</span><span class="plain-syntax"> *</span><span class="identifier-syntax">em</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">parse_node</span><span class="plain-syntax"> *</span><span class="identifier-syntax">pn</span><span class="plain-syntax"> = </span><span class="identifier-syntax">ParseTree::new</span><span class="plain-syntax">(</span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">meaning_code</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">ParseTree::set_meaning</span><span class="plain-syntax">(</span><span class="identifier-syntax">pn</span><span class="plain-syntax">, </span><span class="identifier-syntax">em</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">pn</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
</pre>
<p class="commentary firstcommentary"><a id="SP15"></a><b>&#167;15. Registration. </b>The following is the main routine used throughout Inform to register new
meanings.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">excerpt_meaning</span><span class="plain-syntax"> *</span><span class="function-syntax">ExcerptMeanings::register</span><button class="popup" onclick="togglePopup('usagePopup10')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup10">Usage of <span class="code-font"><span class="function-syntax">ExcerptMeanings::register</span></span>:<br/>Adjectives - <a href="3-adj.html#SP4">&#167;4</a><br/>Nouns - <a href="3-nns.html#SP4">&#167;4</a></span></button><span class="plain-syntax">(</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">unsigned</span><span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">meaning_code</span><span class="plain-syntax">, </span><span class="identifier-syntax">wording</span><span class="plain-syntax"> </span><span class="identifier-syntax">W</span><span class="plain-syntax">, </span><span class="identifier-syntax">general_pointer</span><span class="plain-syntax"> </span><span class="identifier-syntax">data</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">Wordings::empty</span><span class="plain-syntax">(</span><span class="identifier-syntax">W</span><span class="plain-syntax">)) </span><span class="identifier-syntax">internal_error</span><span class="plain-syntax">(</span><span class="string-syntax">"tried to register empty excerpt meaning"</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> #</span><span class="identifier-syntax">ifdef</span><span class="plain-syntax"> </span><span class="identifier-syntax">CORE_MODULE</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">meaning_code</span><span class="plain-syntax"> == </span><span class="constant-syntax">NOUN_MC</span><span class="plain-syntax">)</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LOOP_THROUGH_WORDING</span><span class="plain-syntax">(</span><span class="identifier-syntax">i</span><span class="plain-syntax">, </span><span class="identifier-syntax">W</span><span class="plain-syntax">)</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">Preform::mark_word</span><span class="plain-syntax">(</span><span class="identifier-syntax">i</span><span class="plain-syntax">, &lt;</span><span class="identifier-syntax">s</span><span class="plain-syntax">-</span><span class="identifier-syntax">instance</span><span class="plain-syntax">-</span><span class="identifier-syntax">name</span><span class="plain-syntax">&gt;);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">meaning_code</span><span class="plain-syntax"> == </span><span class="identifier-syntax">KIND_SLOW_MC</span><span class="plain-syntax">)</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LOOP_THROUGH_WORDING</span><span class="plain-syntax">(</span><span class="identifier-syntax">i</span><span class="plain-syntax">, </span><span class="identifier-syntax">W</span><span class="plain-syntax">)</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">Preform::mark_word</span><span class="plain-syntax">(</span><span class="identifier-syntax">i</span><span class="plain-syntax">, &lt;</span><span class="identifier-syntax">k</span><span class="plain-syntax">-</span><span class="identifier-syntax">kind</span><span class="plain-syntax">&gt;);</span>
<span class="plain-syntax"> #</span><span class="identifier-syntax">endif</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">excerpt_meaning</span><span class="plain-syntax"> *</span><span class="identifier-syntax">em</span><span class="plain-syntax"> = </span><a href="2-em.html#SP7" class="function-link"><span class="function-syntax">ExcerptMeanings::new</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">meaning_code</span><span class="plain-syntax">, </span><span class="identifier-syntax">data</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="2-em.html#SP15_1" class="named-paragraph-link"><span class="named-paragraph">Unless this is parametrised, skip any initial article</span><span class="named-paragraph-number">15.1</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> #</span><span class="identifier-syntax">ifdef</span><span class="plain-syntax"> </span><span class="identifier-syntax">EM_CASE_SENSITIVITY_TEST</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">EM_CASE_SENSITIVITY_TEST</span><span class="plain-syntax">(</span><span class="identifier-syntax">meaning_code</span><span class="plain-syntax">))</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="2-em.html#SP15_2" class="named-paragraph-link"><span class="named-paragraph">Detect use of upper case on the first word of this new text substitution</span><span class="named-paragraph-number">15.2</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> #</span><span class="identifier-syntax">endif</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="2-em.html#SP15_3" class="named-paragraph-link"><span class="named-paragraph">Build the token list for the new EM</span><span class="named-paragraph-number">15.3</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><a href="2-em.html#SP13" class="function-link"><span class="function-syntax">ExcerptMeanings::register_em</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">meaning_code</span><span class="plain-syntax">, </span><span class="identifier-syntax">em</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> #</span><span class="identifier-syntax">ifdef</span><span class="plain-syntax"> </span><span class="identifier-syntax">IF_MODULE</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((&lt;</span><span class="identifier-syntax">notable</span><span class="plain-syntax">-</span><span class="identifier-syntax">player</span><span class="plain-syntax">-</span><span class="identifier-syntax">variables</span><span class="plain-syntax">&gt;(</span><span class="identifier-syntax">W</span><span class="plain-syntax">)) &amp;&amp; (&lt;&lt;</span><span class="identifier-syntax">r</span><span class="plain-syntax">&gt;&gt; == </span><span class="constant-syntax">0</span><span class="plain-syntax">)</span>
<span class="plain-syntax"> &amp;&amp; (</span><span class="identifier-syntax">meaning_code</span><span class="plain-syntax"> &amp; </span><span class="identifier-syntax">VARIABLE_MC</span><span class="plain-syntax">)) </span><span class="identifier-syntax">meaning_of_player</span><span class="plain-syntax"> = </span><span class="identifier-syntax">RETRIEVE_POINTER_parse_node</span><span class="plain-syntax">(</span><span class="identifier-syntax">data</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> #</span><span class="identifier-syntax">endif</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">em</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
</pre>
<p class="commentary firstcommentary"><a id="SP15_1"></a><b>&#167;15.1. </b>An initial article is skipped for most meanings, but articles are preserved at the
front of phrase definitions (the parametrised case), mainly because
text substitutions need to distinguish (for instance) "say [the X]" from
"say [an X]".
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Unless this is parametrised, skip any initial article</span><span class="named-paragraph-number">15.1</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">meaning_code</span><span class="plain-syntax"> &amp; </span><span class="constant-syntax">PARAMETRISED_PARSING_BITMAP</span><span class="plain-syntax">) == </span><span class="constant-syntax">0</span><span class="plain-syntax">)</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">Preform::test_word</span><span class="plain-syntax">(</span><span class="identifier-syntax">Wordings::first_wn</span><span class="plain-syntax">(</span><span class="identifier-syntax">W</span><span class="plain-syntax">), &lt;</span><span class="identifier-syntax">article</span><span class="plain-syntax">&gt;)) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">W</span><span class="plain-syntax"> = </span><span class="identifier-syntax">Wordings::trim_first_word</span><span class="plain-syntax">(</span><span class="identifier-syntax">W</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">Wordings::empty</span><span class="plain-syntax">(</span><span class="identifier-syntax">W</span><span class="plain-syntax">))</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">internal_error</span><span class="plain-syntax">(</span><span class="string-syntax">"registered a meaning which was only an article"</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="2-em.html#SP15">&#167;15</a>.</li></ul>
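<p class="commentary">The rule can be sketched on a plain array of words. Everything here is illustrative: <span class="extract"><span class="extract-syntax">SKETCH_PARAMETRISED</span></span> is a hypothetical stand-in for the parametrised-parsing bitmap, the article list is the one quoted in &#167;13.3, and a simple string comparison takes the place of the Preform test.
</p>

```c
#include <string.h>

/* Hypothetical stand-in for the parametrised-parsing bitmap. */
#define SKETCH_PARAMETRISED 0x8000u

/* Exact-match test against the articles named in the commentary. */
static int sketch_is_article(const char *w) {
    static const char *articles[] = { "a", "an", "the", "some" };
    for (size_t i = 0; i < sizeof articles / sizeof articles[0]; i++)
        if (strcmp(w, articles[i]) == 0) return 1;
    return 0;
}

/* Returns the index of the first word to register: 1 if an initial
   article was skipped, 0 if nothing was skipped, and -1 if the wording
   is empty or consists only of an article (the internal-error case). */
int sketch_skip_article(const char **words, int n, unsigned int mc) {
    if (n <= 0) return -1;
    if ((mc & SKETCH_PARAMETRISED) == 0 && sketch_is_article(words[0]))
        return (n > 1) ? 1 : -1;
    return 0;
}
```

<p class="commentary">With these assumptions, "the red door" registers from "red" onwards when the meaning is not parametrised, but keeps its article when it is.
</p>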
<p class="commentary firstcommentary"><a id="SP15_2"></a><b>&#167;15.2. </b>Because an open bracket fails <span class="extract"><span class="extract-syntax">isupper</span></span>, the following looks at the first
letter of the first word only if it's not a blank. If it finds upper case, as
it would when reading the "T" in:
</p>
<blockquote>
<p>To say The Portrait: ...</p>
</blockquote>
<p class="commentary">then it makes a new upper-case version of the word "the", i.e., "The",
with a distinct lexical identity; and places this distinguished identity as
the new first token. This ensures that we end up with a different token list
from the one in:
</p>
<blockquote>
<p>To say the Portrait: ...</p>
</blockquote>
<p class="commentary">(These are the only circumstances in which phrase parsing has any case
sensitivity.)
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Detect use of upper case on the first word of this new text substitution</span><span class="named-paragraph-number">15.2</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="identifier-syntax">wchar_t</span><span class="plain-syntax"> *</span><span class="identifier-syntax">tx</span><span class="plain-syntax"> = </span><span class="identifier-syntax">Lexer::word_raw_text</span><span class="plain-syntax">(</span><span class="identifier-syntax">Wordings::first_wn</span><span class="plain-syntax">(</span><span class="identifier-syntax">W</span><span class="plain-syntax">));</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">tx</span><span class="plain-syntax">[0]) &amp;&amp; ((</span><span class="identifier-syntax">isupper</span><span class="plain-syntax">(</span><span class="identifier-syntax">tx</span><span class="plain-syntax">[0])) || (</span><span class="identifier-syntax">tx</span><span class="plain-syntax">[1] == </span><span class="constant-syntax">0</span><span class="plain-syntax">))) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">vocabulary_entry</span><span class="plain-syntax"> *</span><span class="identifier-syntax">ucf</span><span class="plain-syntax"> = </span><span class="identifier-syntax">Vocabulary::make_case_sensitive</span><span class="plain-syntax">(</span><span class="identifier-syntax">Lexer::word</span><span class="plain-syntax">(</span><span class="identifier-syntax">Wordings::first_wn</span><span class="plain-syntax">(</span><span class="identifier-syntax">W</span><span class="plain-syntax">)));</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (!</span><span class="identifier-syntax">Characters::isupper</span><span class="plain-syntax">(</span><span class="identifier-syntax">tx</span><span class="plain-syntax">[0])) </span><span class="identifier-syntax">ucf</span><span class="plain-syntax"> = </span><span class="identifier-syntax">Vocabulary::get_lower_case_form</span><span class="plain-syntax">(</span><span class="identifier-syntax">ucf</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">Lexer::set_word</span><span class="plain-syntax">(</span><span class="identifier-syntax">Wordings::first_wn</span><span class="plain-syntax">(</span><span class="identifier-syntax">W</span><span class="plain-syntax">), </span><span class="identifier-syntax">ucf</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LOGIF</span><span class="plain-syntax">(</span><span class="identifier-syntax">EXCERPT_MEANINGS</span><span class="plain-syntax">,</span>
<span class="plain-syntax"> </span><span class="string-syntax">"Allowing initial capitalised word %w: meaning_code = %08x\n"</span><span class="plain-syntax">,</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">tx</span><span class="plain-syntax">, </span><span class="identifier-syntax">meaning_code</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="2-em.html#SP15">&#167;15</a>.</li></ul>
<p class="commentary firstcommentary"><a id="SP15_3"></a><b>&#167;15.3. </b>We read in text such as:
</p>
<blockquote>
<p>award (P - a number) points</p>
</blockquote>
<p class="commentary">and transcribe it into the token list, collapsing each bracketed part into a <span class="extract"><span class="extract-syntax">#</span></span>
token denoting a gap, which gives something like:
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> award # points</span>
</pre>
<p class="commentary">with a token count of 3.
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Build the token list for the new EM</span><span class="named-paragraph-number">15.3</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">tc</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax">=0; </span><span class="identifier-syntax">i</span><span class="plain-syntax"> &lt; </span><span class="identifier-syntax">Wordings::length</span><span class="plain-syntax">(</span><span class="identifier-syntax">W</span><span class="plain-syntax">); </span><span class="identifier-syntax">i</span><span class="plain-syntax">++) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">tc</span><span class="plain-syntax"> &gt;= </span><span class="constant-syntax">MAX_TOKENS_PER_EXCERPT_MEANING</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="2-em.html#SP15_3_3" class="named-paragraph-link"><span class="named-paragraph">Complain of excessive length of the new excerpt</span><span class="named-paragraph-number">15.3.3</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">compare_word</span><span class="plain-syntax">(</span><span class="identifier-syntax">Wordings::first_wn</span><span class="plain-syntax">(</span><span class="identifier-syntax">W</span><span class="plain-syntax">) + </span><span class="identifier-syntax">i</span><span class="plain-syntax">, </span><span class="identifier-syntax">OPENBRACKET_V</span><span class="plain-syntax">)) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">em_tokens</span><span class="plain-syntax">[</span><span class="identifier-syntax">tc</span><span class="plain-syntax">++] = </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="2-em.html#SP15_3_1" class="named-paragraph-link"><span class="named-paragraph">Skip over bracketed token description</span><span class="named-paragraph-number">15.3.1</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">em_tokens</span><span class="plain-syntax">[</span><span class="identifier-syntax">tc</span><span class="plain-syntax">++] = </span><span class="identifier-syntax">Lexer::word</span><span class="plain-syntax">(</span><span class="identifier-syntax">Wordings::first_wn</span><span class="plain-syntax">(</span><span class="identifier-syntax">W</span><span class="plain-syntax">) + </span><span class="identifier-syntax">i</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">no_em_tokens</span><span class="plain-syntax"> = </span><span class="identifier-syntax">tc</span><span class="plain-syntax">;</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="2-em.html#SP15">&#167;15</a>.</li></ul>
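<p class="commentary">The collapsing step can be sketched in isolation. What follows is a minimal
standalone illustration, not the module's code: it assumes words are plain C strings rather
than vocabulary entries, and the names <span class="extract"><span class="extract-syntax">collapse_gaps</span></span> and
<span class="extract"><span class="extract-syntax">MAX_TOKENS</span></span> are hypothetical. Unlike the code above, it simply returns -1 on
an over-long excerpt or an unclosed bracket instead of reporting a problem.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS 32 /* hypothetical cap, standing in for MAX_TOKENS_PER_EXCERPT_MEANING */

/* Copy words[0..n-1] into tokens, replacing each bracketed group (nesting
   allowed) with a single NULL "gap" token. Returns the token count, or -1
   if there are too many tokens or a bracket is never closed. */
int collapse_gaps(const char **words, int n, const char **tokens) {
    assert(words != NULL && tokens != NULL);
    int tc = 0;
    for (int i = 0; i < n; i++) {
        if (tc >= MAX_TOKENS) return -1;
        if (strcmp(words[i], "(") == 0) {
            tokens[tc++] = NULL; /* the gap */
            int depth = 1;
            while (depth > 0) {
                if (++i >= n) return -1; /* ran off the end: unclosed bracket */
                if (strcmp(words[i], "(") == 0) depth++;
                else if (strcmp(words[i], ")") == 0) depth--;
            }
            /* i now rests on the matching ")"; the loop's i++ steps past it */
        } else {
            tokens[tc++] = words[i];
        }
    }
    return tc;
}
```

<p class="commentary">Given the eight words of "award ( P - a number ) points", this sketch returns 3,
with a NULL in the middle slot of the token array.
</p>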
<p class="commentary firstcommentary"><a id="SP15_3_1"></a><b>&#167;15.3.1. </b>This is all a little defensive, but syntax bugs higher up tend to find
their way down to this plughole:
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Skip over bracketed token description</span><span class="named-paragraph-number">15.3.1</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">bl</span><span class="plain-syntax"> = </span><span class="constant-syntax">1</span><span class="plain-syntax">; </span><span class="identifier-syntax">i</span><span class="plain-syntax">++;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">while</span><span class="plain-syntax"> (</span><span class="identifier-syntax">bl</span><span class="plain-syntax"> &gt; </span><span class="constant-syntax">0</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax"> &gt;= </span><span class="identifier-syntax">Wordings::length</span><span class="plain-syntax">(</span><span class="identifier-syntax">W</span><span class="plain-syntax">)) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LOG</span><span class="plain-syntax">(</span><span class="string-syntax">"Bad meaning: &lt;%W&gt;\n"</span><span class="plain-syntax">, </span><span class="identifier-syntax">W</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">internal_error</span><span class="plain-syntax">(</span><span class="string-syntax">"Bracket mismatch when registering"</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">compare_word</span><span class="plain-syntax">(</span><span class="identifier-syntax">Wordings::first_wn</span><span class="plain-syntax">(</span><span class="identifier-syntax">W</span><span class="plain-syntax">) + </span><span class="identifier-syntax">i</span><span class="plain-syntax">, </span><span class="identifier-syntax">OPENBRACKET_V</span><span class="plain-syntax">)) </span><span class="identifier-syntax">bl</span><span class="plain-syntax">++;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">compare_word</span><span class="plain-syntax">(</span><span class="identifier-syntax">Wordings::first_wn</span><span class="plain-syntax">(</span><span class="identifier-syntax">W</span><span class="plain-syntax">) + </span><span class="identifier-syntax">i</span><span class="plain-syntax">, </span><span class="identifier-syntax">CLOSEBRACKET_V</span><span class="plain-syntax">)) </span><span class="identifier-syntax">bl</span><span class="plain-syntax">--;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax">++;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">i</span><span class="plain-syntax"> &lt; </span><span class="identifier-syntax">Wordings::length</span><span class="plain-syntax">(</span><span class="identifier-syntax">W</span><span class="plain-syntax">)) &amp;&amp; (</span><span class="identifier-syntax">compare_word</span><span class="plain-syntax">(</span><span class="identifier-syntax">Wordings::first_wn</span><span class="plain-syntax">(</span><span class="identifier-syntax">W</span><span class="plain-syntax">) + </span><span class="identifier-syntax">i</span><span class="plain-syntax">, </span><span class="identifier-syntax">OPENBRACKET_V</span><span class="plain-syntax">))) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LOG</span><span class="plain-syntax">(</span><span class="string-syntax">"Bad meaning: &lt;%W&gt;\n"</span><span class="plain-syntax">, </span><span class="identifier-syntax">W</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">internal_error</span><span class="plain-syntax">(</span><span class="string-syntax">"Two consecutive bracketed tokens when registering"</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax">--;</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="2-em.html#SP15_3">&#167;15.3</a>.</li></ul>
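<p class="commentary">The two defensive checks made while skipping can be seen more easily in a
standalone function. This is only a sketch under assumptions: words are plain C strings, the
function name <span class="extract"><span class="extract-syntax">skip_bracketed</span></span> and its error codes are hypothetical, and
failures are returned to the caller rather than raised as internal errors.
</p>

```c
#include <assert.h>
#include <string.h>

enum { SKIP_MISMATCH = -1, SKIP_ADJACENT = -2 }; /* hypothetical error codes */

/* words[open] must be "(". Returns the index of its matching ")", or
   SKIP_MISMATCH if the bracket never closes, or SKIP_ADJACENT if another
   bracketed group begins immediately after the closing bracket. */
int skip_bracketed(const char **words, int n, int open) {
    assert(open >= 0 && open < n && strcmp(words[open], "(") == 0);
    int depth = 1, i = open + 1;
    while (depth > 0) {
        if (i >= n) return SKIP_MISMATCH;
        if (strcmp(words[i], "(") == 0) depth++;
        if (strcmp(words[i], ")") == 0) depth--;
        i++;
    }
    /* i is now one past the matching ")" */
    if (i < n && strcmp(words[i], "(") == 0) return SKIP_ADJACENT;
    return i - 1;
}
```

<p class="commentary">Returning the index of the closing bracket mirrors the final decrement in the
code above: the enclosing loop's own increment then moves on to the next word.
</p>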
<p class="commentary firstcommentary"><a id="SP15_3_2"></a><b>&#167;15.3.2. Meaning from assemblages. </b>In a few cases it is convenient to register a meaning from a wording which
isn't contiguously present in the lexer, so we also provide a method for
taking it from a word assemblage.
</p>
<p class="commentary">In other respects this is a simpler routine, because it is never needed for
token lists containing gaps.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">excerpt_meaning</span><span class="plain-syntax"> *</span><span class="function-syntax">ExcerptMeanings::register_assemblage</span><span class="plain-syntax">(</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">unsigned</span><span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">meaning_code</span><span class="plain-syntax">, </span><span class="identifier-syntax">word_assemblage</span><span class="plain-syntax"> </span><span class="identifier-syntax">wa</span><span class="plain-syntax">, </span><span class="identifier-syntax">general_pointer</span><span class="plain-syntax"> </span><span class="identifier-syntax">data</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">excerpt_meaning</span><span class="plain-syntax"> *</span><span class="identifier-syntax">em</span><span class="plain-syntax"> = </span><a href="2-em.html#SP7" class="function-link"><span class="function-syntax">ExcerptMeanings::new</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">meaning_code</span><span class="plain-syntax">, </span><span class="identifier-syntax">data</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">vocabulary_entry</span><span class="plain-syntax"> **</span><span class="identifier-syntax">array</span><span class="plain-syntax">; </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">len</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">WordAssemblages::as_array</span><span class="plain-syntax">(&amp;</span><span class="identifier-syntax">wa</span><span class="plain-syntax">, &amp;</span><span class="identifier-syntax">array</span><span class="plain-syntax">, &amp;</span><span class="identifier-syntax">len</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax">, </span><span class="identifier-syntax">tc</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax">=0; </span><span class="identifier-syntax">i</span><span class="plain-syntax">&lt;</span><span class="identifier-syntax">len</span><span class="plain-syntax">; </span><span class="identifier-syntax">i</span><span class="plain-syntax">++) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">tc</span><span class="plain-syntax"> &gt;= </span><span class="constant-syntax">MAX_TOKENS_PER_EXCERPT_MEANING</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="2-em.html#SP15_3_3" class="named-paragraph-link"><span class="named-paragraph">Complain of excessive length of the new excerpt</span><span class="named-paragraph-number">15.3.3</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">em_tokens</span><span class="plain-syntax">[</span><span class="identifier-syntax">tc</span><span class="plain-syntax">++] = </span><span class="identifier-syntax">array</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">];</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">em</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">no_em_tokens</span><span class="plain-syntax"> = </span><span class="identifier-syntax">tc</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><a href="2-em.html#SP13" class="function-link"><span class="function-syntax">ExcerptMeanings::register_em</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">meaning_code</span><span class="plain-syntax">, </span><span class="identifier-syntax">em</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">em</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
</pre>
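<p class="commentary">The capped copy at the heart of this routine can be sketched on its own. The
following is an illustration under assumptions, not the module's code: tokens are plain C
strings, <span class="extract"><span class="extract-syntax">copy_capped</span></span> is a hypothetical name, and the cap is set
deliberately small so that truncation is easy to provoke. Where the routine above reports a
problem, the sketch merely sets a flag.
</p>

```c
#include <stddef.h>

#define MAX_TOKENS 4 /* hypothetical and deliberately tiny */

/* Copy at most MAX_TOKENS entries from src to dst. Sets *truncated to 1 if
   any entries had to be dropped, else 0. Returns the number copied. */
int copy_capped(const char **src, int len, const char **dst, int *truncated) {
    int tc = 0;
    *truncated = 0;
    for (int i = 0; i < len; i++) {
        if (tc >= MAX_TOKENS) {
            *truncated = 1; /* stop copying but keep what we already have */
            break;
        }
        dst[tc++] = src[i];
    }
    return tc;
}
```

<p class="commentary">As in the routine above, an over-long input is not rejected outright: the
leading tokens are kept and the excess is dropped.
</p>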
<p class="commentary firstcommentary"><a id="SP15_3_3"></a><b>&#167;15.3.3. </b>In practice, nobody ever hits this message except deliberately. It has
a tendency to fire twice or more on the same source text because of
registering multiple inflected forms of the same text; but it's not worth
going to any trouble to prevent this.
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Complain of excessive length of the new excerpt</span><span class="named-paragraph-number">15.3.3</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><a href="2-em.html#SP16" class="function-link"><span class="function-syntax">ExcerptMeanings::problem_handler</span></a><span class="plain-syntax">(</span><span class="constant-syntax">TooLongName_LINERROR</span><span class="plain-syntax">, </span><span class="identifier-syntax">EMPTY_WORDING</span><span class="plain-syntax">, </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">);</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="2-em.html#SP15_3">&#167;15.3</a>, <a href="2-em.html#SP15_3_2">&#167;15.3.2</a>.</li></ul>
<p class="commentary firstcommentary"><a id="SP16"></a><b>&#167;16. Errors. </b>Some tools using this module will want to push simple error messages out to
the command line; others will want to translate them into elaborate problem
texts in HTML. So the client is allowed to define <span class="extract"><span class="extract-syntax">LINGUISTICS_PROBLEM_HANDLER</span></span>
to some routine of her own, gazumping this one.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">ExcerptMeanings::problem_handler</span><button class="popup" onclick="togglePopup('usagePopup11')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup11">Usage of <span class="code-font"><span class="function-syntax">ExcerptMeanings::problem_handler</span></span>:<br/><a href="2-em.html#SP15_3_3">&#167;15.3.3</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">err_no</span><span class="plain-syntax">, </span><span class="identifier-syntax">wording</span><span class="plain-syntax"> </span><span class="identifier-syntax">W</span><span class="plain-syntax">, </span><span class="reserved-syntax">void</span><span class="plain-syntax"> *</span><span class="identifier-syntax">ref</span><span class="plain-syntax">, </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">k</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> #</span><span class="identifier-syntax">ifdef</span><span class="plain-syntax"> </span><span class="identifier-syntax">LINGUISTICS_PROBLEM_HANDLER</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LINGUISTICS_PROBLEM_HANDLER</span><span class="plain-syntax">(</span><span class="identifier-syntax">err_no</span><span class="plain-syntax">, </span><span class="identifier-syntax">W</span><span class="plain-syntax">, </span><span class="identifier-syntax">ref</span><span class="plain-syntax">, </span><span class="identifier-syntax">k</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> #</span><span class="identifier-syntax">endif</span>
<span class="plain-syntax"> #</span><span class="identifier-syntax">ifndef</span><span class="plain-syntax"> </span><span class="identifier-syntax">LINGUISTICS_PROBLEM_HANDLER</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">TEMPORARY_TEXT</span><span class="plain-syntax">(</span><span class="identifier-syntax">text</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">WRITE_TO</span><span class="plain-syntax">(</span><span class="identifier-syntax">text</span><span class="plain-syntax">, </span><span class="string-syntax">"%+W"</span><span class="plain-syntax">, </span><span class="identifier-syntax">W</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">switch</span><span class="plain-syntax"> (</span><span class="identifier-syntax">err_no</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">case</span><span class="plain-syntax"> </span><span class="identifier-syntax">TooLongName_LINERROR:</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">Errors::nowhere</span><span class="plain-syntax">(</span><span class="string-syntax">"noun too long"</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">DISCARD_TEXT</span><span class="plain-syntax">(</span><span class="identifier-syntax">text</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> #</span><span class="identifier-syntax">endif</span>
<span class="plain-syntax">}</span>
</pre>
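<p class="commentary">The override mechanism is ordinary conditional compilation, and a stripped-down
sketch may make the pattern plainer. Everything named here is hypothetical
(<span class="extract"><span class="extract-syntax">SKETCH_PROBLEM_HANDLER</span></span>, <span class="extract"><span class="extract-syntax">report_problem</span></span>,
<span class="extract"><span class="extract-syntax">TooLongName_ERR</span></span>, and the counter, which exists only so that the fallback path can
be observed); the sketch also uses a single <span class="extract"><span class="extract-syntax">#else</span></span> where the code above
uses a paired <span class="extract"><span class="extract-syntax">#ifdef</span></span> and <span class="extract"><span class="extract-syntax">#ifndef</span></span>, which has the same effect.
</p>

```c
#include <stdio.h>

enum { TooLongName_ERR = 1 }; /* hypothetical error number */

/* Counts how often the built-in fallback ran; for illustration only. */
static int fallback_calls = 0;

/* If the client defined SKETCH_PROBLEM_HANDLER before this point, every
   error is passed to it; otherwise a plain message goes to stderr. */
void report_problem(int err_no) {
#ifdef SKETCH_PROBLEM_HANDLER
    SKETCH_PROBLEM_HANDLER(err_no);
#else
    switch (err_no) {
        case TooLongName_ERR:
            fprintf(stderr, "noun too long\n");
            fallback_calls++;
            break;
    }
#endif
}
```

<p class="commentary">A client wanting its own reporting would, under these assumptions, define
<span class="extract"><span class="extract-syntax">SKETCH_PROBLEM_HANDLER</span></span> as the name of a function taking an
<span class="extract"><span class="extract-syntax">int</span></span> before this code is compiled.
</p>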
<nav role="progress"><div class="progresscontainer">
<ul class="progressbar"><li class="progressprev"><a href="1-lm.html">&#10094;</a></li><li class="progresschapter"><a href="1-lm.html">1</a></li><li class="progresscurrentchapter">2</li><li class="progresscurrent">em</li><li class="progresssection"><a href="2-pe.html">pe</a></li><li class="progresschapter"><a href="3-aap.html">3</a></li><li class="progresschapter"><a href="4-vrb.html">4</a></li><li class="progresschapter"><a href="5-dgr.html">5</a></li><li class="progressnext"><a href="2-pe.html">&#10095;</a></li></ul></div>
</nav><!--End of weave-->
</main>
</body>
</html>