mirror of https://github.com/ganelson/inform.git synced 2024-07-08 18:14:21 +03:00
inform7/docs/words-module/2-vcb.html
2019-07-20 07:18:40 +01:00


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>1/wm</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="en-gb">
<link href="inweb.css" rel="stylesheet" rev="stylesheet" type="text/css">
</head>
<body>
<!--Weave of '2/vcb' generated by 7-->
<ul class="crumbs"><li><a href="../webs.html">&#9733;</a></li><li><a href="index.html">words</a></li><li><a href="index.html#2">Chapter 2: Words in Isolation</a></li><li><b>Vocabulary</b></li></ul><p class="purpose">To classify the words in the lexical stream, where two different words are considered equivalent if they are unquoted and have the same text, taken case insensitively.</p>
<ul class="toc"><li><a href="#SP1">&#167;1. Definitions</a></li><li><a href="#SP14">&#167;14. Hash coding of words</a></li><li><a href="#SP15">&#167;15. The hash table of vocabulary</a></li><li><a href="#SP17">&#167;17. Partial words</a></li><li><a href="#SP18">&#167;18. Ordinals</a></li></ul><hr class="tocbar">
<p class="inwebparagraph"><a id="SP1"></a><b>&#167;1. Definitions. </b></p>
<p class="inwebparagraph"><a id="SP2"></a><b>&#167;2. </b>The following structure is created for each different word found in the
source. (Recall that these are not necessarily words in the usual English
sense: for instance, <code class="display"><span class="extract">17</span></code> is a word here.)
</p>
<p class="inwebparagraph">The vocabulary entry structure exists to make textual comparisons faster,
which is essential to make NI run tolerably quickly: NI's speed on typical
source texts increased by a factor of 5-10 when this structure was
introduced. Firstly, the vocabulary is hashed so that it is not too
painful to compare a newly-read word against the known vocabulary;
secondly, each word stores linked lists of meanings which it begins,
occurs in the middle of, ends, or is optionally part of (in the sense
that "brown" is optionally part of the name "small brown shoe", which
could also be written "small shoe"); and thirdly, each word also carries
a bitmap of flags indicating the possible contexts in which it might
be used. Finally, to avoid parsing the same text over and over for its
possible meaning as a literal integer, we cache the result: for instance,
17 for the text <code class="display"><span class="extract">17</span></code>.
</p>
<p class="inwebparagraph">The meaning codes alluded to below are also used for excerpts of text
(i.e., are not just for single words) and are defined in Excerpt Meanings.
</p>
<pre class="definitions">
<span class="definitionkeyword">define</span> <span class="constant">ING_MC</span><span class="plain"> 0</span><span class="identifier">x04000000</span><span class="plain"> </span> <span class="comment">a word ending in -ing</span>
<span class="definitionkeyword">define</span> <span class="constant">NUMBER_MC</span><span class="plain"> 0</span><span class="identifier">x08000000</span><span class="plain"> </span> <span class="comment">one, two, ..., twelve, 1, 2, ...</span>
<span class="definitionkeyword">define</span> <span class="constant">I6_MC</span><span class="plain"> 0</span><span class="identifier">x10000000</span><span class="plain"> </span> <span class="comment">piece of verbatim I6 code</span>
<span class="definitionkeyword">define</span> <span class="constant">TEXTWITHSUBS_MC</span><span class="plain"> 0</span><span class="identifier">x20000000</span><span class="plain"> </span> <span class="comment">double-quoted text literal with substitutions</span>
<span class="definitionkeyword">define</span> <span class="constant">TEXT_MC</span><span class="plain"> 0</span><span class="identifier">x40000000</span><span class="plain"> </span> <span class="comment">double-quoted text literal without substitutions</span>
<span class="definitionkeyword">define</span> <span class="constant">ORDINAL_MC</span><span class="plain"> 0</span><span class="identifier">x80000000</span><span class="plain"> </span> <span class="comment">first, second, third, ..., twelfth</span>
</pre>
<pre class="display">
<span class="reserved">typedef</span><span class="plain"> </span><span class="reserved">struct</span><span class="plain"> </span><span class="reserved">vocabulary_entry</span><span class="plain"> {</span>
<span class="reserved">unsigned</span><span class="plain"> </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">flags</span><span class="plain">; </span> <span class="comment">bitmap of "meaning codes" indicating possible usages</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">literal_number_value</span><span class="plain">; </span> <span class="comment">evaluation as a literal number, if any</span>
<span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">exemplar</span><span class="plain">; </span> <span class="comment">text of one instance of this word</span>
<span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">raw_exemplar</span><span class="plain">; </span> <span class="comment">text of one instance in its raw untreated form</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">hash</span><span class="plain">; </span> <span class="comment">hash code derived from text of word</span>
<span class="reserved">struct</span><span class="plain"> </span><span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">next_in_vocab_hash</span><span class="plain">; </span> <span class="comment">next in list with this hash</span>
<span class="reserved">struct</span><span class="plain"> </span><span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">lower_case_form</span><span class="plain">; </span> <span class="comment">or null if none exists</span>
<span class="reserved">struct</span><span class="plain"> </span><span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">upper_case_form</span><span class="plain">; </span> <span class="comment">or null if none exists</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">nt_incidence</span><span class="plain">; </span> <span class="comment">bitmap hashing which Preform nonterminals it occurs in</span>
<span class="reserved">struct</span><span class="plain"> </span><span class="identifier">vocabulary_meaning</span><span class="plain"> </span><span class="identifier">means</span><span class="plain">;</span>
<span class="plain">} </span><span class="reserved">vocabulary_entry</span><span class="plain">;</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The structure vocabulary_entry is accessed in 4/prf and here.</p>
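<p class="inwebparagraph">As a minimal sketch of how such a bitmap of meaning codes might be tested: the struct <code class="display"><span class="extract">demo_entry</span></code> and the helper <code class="display"><span class="extract">has_meaning_code</span></code> below are hypothetical and are not part of the words module. Only the constant values are taken from the definitions above, with a <code class="display"><span class="extract">u</span></code> suffix added here so that the arithmetic is explicitly unsigned.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Values copied from the definitions above (the u suffix is an addition). */
#define ING_MC     0x04000000u
#define NUMBER_MC  0x08000000u
#define ORDINAL_MC 0x80000000u

/* Hypothetical cut-down stand-in for vocabulary_entry. */
typedef struct demo_entry {
    unsigned int flags;        /* bitmap of meaning codes */
    int literal_number_value;  /* cached value as a literal number, if any */
} demo_entry;

/* Hypothetical helper: nonzero if any of the bits in mc are set for this entry. */
int has_meaning_code(const demo_entry *de, unsigned int mc) {
    assert(de != NULL);
    return (de->flags & mc) != 0;
}
```

<p class="inwebparagraph">Because each code occupies its own bit, one word can carry several at once, and a caller can test for any of a set of codes in a single operation by passing their bitwise OR.
</p>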
<p class="inwebparagraph"><a id="SP3"></a><b>&#167;3. </b>Some standard punctuation marks:
</p>
<pre class="display">
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">CLOSEBRACE_V</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">CLOSEBRACKET_V</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">COLON_V</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">COMMA_V</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">DOUBLEDASH_V</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">FORWARDSLASH_V</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">FULLSTOP_V</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">OPENBRACE_V</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">OPENBRACKET_V</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">OPENI6_V</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">PARBREAK_V</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">PLUS_V</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">SEMICOLON_V</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">STROKE_V</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">Vocabulary::create_punctuation</span><span class="plain">(</span><span class="reserved">void</span><span class="plain">) {</span>
<span class="identifier">CLOSEBRACE_V</span><span class="plain"> = </span><span class="functiontext">Vocabulary::entry_for_text</span><span class="plain">(</span><span class="identifier">L</span><span class="string">"}"</span><span class="plain">);</span>
<span class="identifier">CLOSEBRACKET_V</span><span class="plain"> = </span><span class="functiontext">Vocabulary::entry_for_text</span><span class="plain">(</span><span class="identifier">L</span><span class="string">")"</span><span class="plain">);</span>
<span class="identifier">COLON_V</span><span class="plain"> = </span><span class="functiontext">Vocabulary::entry_for_text</span><span class="plain">(</span><span class="identifier">L</span><span class="string">":"</span><span class="plain">);</span>
<span class="identifier">COMMA_V</span><span class="plain"> = </span><span class="functiontext">Vocabulary::entry_for_text</span><span class="plain">(</span><span class="identifier">L</span><span class="string">","</span><span class="plain">);</span>
<span class="identifier">DOUBLEDASH_V</span><span class="plain"> = </span><span class="functiontext">Vocabulary::entry_for_text</span><span class="plain">(</span><span class="identifier">L</span><span class="string">"--"</span><span class="plain">);</span>
<span class="identifier">FORWARDSLASH_V</span><span class="plain"> = </span><span class="functiontext">Vocabulary::entry_for_text</span><span class="plain">(</span><span class="identifier">L</span><span class="string">"/"</span><span class="plain">);</span>
<span class="identifier">FULLSTOP_V</span><span class="plain"> = </span><span class="functiontext">Vocabulary::entry_for_text</span><span class="plain">(</span><span class="identifier">L</span><span class="string">"."</span><span class="plain">);</span>
<span class="identifier">OPENBRACE_V</span><span class="plain"> = </span><span class="functiontext">Vocabulary::entry_for_text</span><span class="plain">(</span><span class="identifier">L</span><span class="string">"{"</span><span class="plain">);</span>
<span class="identifier">OPENBRACKET_V</span><span class="plain"> = </span><span class="functiontext">Vocabulary::entry_for_text</span><span class="plain">(</span><span class="identifier">L</span><span class="string">"("</span><span class="plain">);</span>
<span class="identifier">OPENI6_V</span><span class="plain"> = </span><span class="functiontext">Vocabulary::entry_for_text</span><span class="plain">(</span><span class="identifier">L</span><span class="string">"(-"</span><span class="plain">);</span>
<span class="identifier">PARBREAK_V</span><span class="plain"> = </span><span class="functiontext">Vocabulary::entry_for_text</span><span class="plain">(</span><span class="constant">PARAGRAPH_BREAK</span><span class="plain">);</span>
<span class="identifier">PLUS_V</span><span class="plain"> = </span><span class="functiontext">Vocabulary::entry_for_text</span><span class="plain">(</span><span class="identifier">L</span><span class="string">"+"</span><span class="plain">);</span>
<span class="identifier">SEMICOLON_V</span><span class="plain"> = </span><span class="functiontext">Vocabulary::entry_for_text</span><span class="plain">(</span><span class="identifier">L</span><span class="string">";"</span><span class="plain">);</span>
<span class="identifier">STROKE_V</span><span class="plain"> = </span><span class="functiontext">Vocabulary::entry_for_text</span><span class="plain">(</span><span class="identifier">L</span><span class="string">"|"</span><span class="plain">);</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Vocabulary::create_punctuation is used in 1/wm (<a href="1-wm.html#SP3">&#167;3</a>).</p>
<p class="inwebparagraph"><a id="SP4"></a><b>&#167;4. </b>Each distinct word is to have a unique <code class="display"><span class="extract">vocabulary_entry</span></code> structure, and the
"identity" at word number <code class="display"><span class="extract">wn</span></code> is to point to the structure for the text
at that word. Two words are distinct if their lower-case forms are different,
except that two quoted literal texts are always distinct, even if they have
the same content. So for instance,
</p>
<blockquote>
<p>Daleks conquer and destroy! "Ba-dum." Exterminate, exterminate! "Ba-dum."</p>
</blockquote>
<p class="inwebparagraph">would be identified as
</p>
<blockquote>
<p>|ve0| |ve1| |ve2| |ve3| |ve4| |ve5| |ve6| |ve7| |ve6| |ve4| |ve8|</p>
</blockquote>
<p class="inwebparagraph">where <code class="display"><span class="extract">ve4</span></code> is the common identity of both exclamation marks, <code class="display"><span class="extract">ve6</span></code>
that of the two "exterminate"s, even though they have different casings, and
<code class="display"><span class="extract">ve7</span></code> that of the comma between them;
while the quoted text <code class="display"><span class="extract">"Ba-dum."</span></code> came out with two different identities
<code class="display"><span class="extract">ve5</span></code> and <code class="display"><span class="extract">ve8</span></code>.
</p>
<p class="inwebparagraph">When we want to set the identity for a given word, we call these front-door
routines, either on a single word or on a range.
</p>
<pre class="display">
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">Vocabulary::identify_word</span><span class="plain">(</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">wn</span><span class="plain">) {</span>
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">ve</span><span class="plain"> = </span><span class="functiontext">Vocabulary::entry_for_text</span><span class="plain">(</span><span class="functiontext">Lexer::word_text</span><span class="plain">(</span><span class="identifier">wn</span><span class="plain">));</span>
<span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;raw_exemplar</span><span class="plain"> = </span><span class="functiontext">Lexer::word_raw_text</span><span class="plain">(</span><span class="identifier">wn</span><span class="plain">);</span>
<span class="functiontext">Lexer::set_word</span><span class="plain">(</span><span class="identifier">wn</span><span class="plain">, </span><span class="identifier">ve</span><span class="plain">);</span>
<span class="plain">}</span>
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">Vocabulary::identify_word_range</span><span class="plain">(</span><span class="reserved">wording</span><span class="plain"> </span><span class="identifier">W</span><span class="plain">) {</span>
<span class="identifier">LOOP_THROUGH_WORDING</span><span class="plain">(</span><span class="identifier">i</span><span class="plain">, </span><span class="identifier">W</span><span class="plain">)</span>
<span class="functiontext">Vocabulary::identify_word</span><span class="plain">(</span><span class="identifier">i</span><span class="plain">);</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Vocabulary::identify_word is used in <a href="#SP5">&#167;5</a>, 3/lxr (<a href="3-lxr.html#SP26_5_2">&#167;26.5.2</a>, <a href="3-lxr.html#SP26_6">&#167;26.6</a>), 4/nw (<a href="4-nw.html#SP8">&#167;8</a>).</p>
<p class="endnote">The function Vocabulary::identify_word_range is used in 3/fds (<a href="3-fds.html#SP5">&#167;5</a>).</p>
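<p class="inwebparagraph">The identification rule described in this paragraph can be sketched in plain C as follows. This is an illustration under stated assumptions, not the module's implementation: it uses narrow ASCII strings rather than <code class="display"><span class="extract">wchar_t</span></code>, a fixed-size table searched linearly rather than a hash table, and the names <code class="display"><span class="extract">demo_vocab</span></code>, <code class="display"><span class="extract">demo_same_text</span></code> and <code class="display"><span class="extract">demo_identify</span></code> are all hypothetical.
</p>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define DEMO_MAX_ENTRIES 64

/* Hypothetical table of distinct words seen so far: slot i plays the role of "ve<i>". */
static const char *demo_vocab[DEMO_MAX_ENTRIES];
static int demo_vocab_size = 0;

/* Case-insensitive equality for ASCII text. */
static int demo_same_text(const char *a, const char *b) {
    while (*a && *b) {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b)) return 0;
        a++; b++;
    }
    return (*a == *b); /* equal only if both strings ended together */
}

/* Returns the identity number for a word, creating a new one if needed.
   Quoted text (beginning with a double quote) always gets a fresh identity. */
int demo_identify(const char *text) {
    assert(text != NULL);
    if (text[0] != '"')
        for (int i = 0; i < demo_vocab_size; i++)
            if ((demo_vocab[i][0] != '"') && (demo_same_text(demo_vocab[i], text)))
                return i;
    assert(demo_vocab_size < DEMO_MAX_ENTRIES);
    demo_vocab[demo_vocab_size] = text;
    return demo_vocab_size++;
}
```

<p class="inwebparagraph">Calling <code class="display"><span class="extract">demo_identify</span></code> on "Exterminate" and then on "exterminate" yields the same number, whereas two calls on the quoted text "Ba-dum." yield two different numbers.
</p>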
<p class="inwebparagraph"><a id="SP5"></a><b>&#167;5. </b>Should we ever change the text of a word, it's essential to re-identify it,
as otherwise its <code class="display"><span class="extract">lw_identity</span></code> points to the wrong vocabulary entry.
</p>
<pre class="display">
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">Vocabulary::change_text_of_word</span><span class="plain">(</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">wn</span><span class="plain">, </span><span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">new</span><span class="plain">) {</span>
<span class="functiontext">Lexer::set_word_text</span><span class="plain">(</span><span class="identifier">wn</span><span class="plain">, </span><span class="identifier">new</span><span class="plain">);</span>
<span class="functiontext">Lexer::set_word_raw_text</span><span class="plain">(</span><span class="identifier">wn</span><span class="plain">, </span><span class="identifier">new</span><span class="plain">);</span>
<span class="functiontext">Vocabulary::identify_word</span><span class="plain">(</span><span class="identifier">wn</span><span class="plain">);</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Vocabulary::change_text_of_word appears nowhere else.</p>
<p class="inwebparagraph"><a id="SP6"></a><b>&#167;6. </b>We now need some utilities for dealing with vocabulary entries. Here is a
creator, and a debugging logger:
</p>
<pre class="display">
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="functiontext">Vocabulary::vocab_entry_new</span><span class="plain">(</span><span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">text</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">hash_code</span><span class="plain">, </span><span class="reserved">unsigned</span><span class="plain"> </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">flags</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">val</span><span class="plain">) {</span>
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">ve</span><span class="plain"> = </span><span class="identifier">CREATE</span><span class="plain">(</span><span class="reserved">vocabulary_entry</span><span class="plain">);</span>
<span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;exemplar</span><span class="plain"> = </span><span class="identifier">text</span><span class="plain">; </span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;raw_exemplar</span><span class="plain"> = </span><span class="identifier">text</span><span class="plain">;</span>
<span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;next_in_vocab_hash</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;lower_case_form</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">; </span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;upper_case_form</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;hash</span><span class="plain"> = </span><span class="identifier">hash_code</span><span class="plain">;</span>
<span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;nt_incidence</span><span class="plain"> = 0;</span>
<span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;flags</span><span class="plain"> = </span><span class="identifier">flags</span><span class="plain">;</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">l</span><span class="plain"> = </span><span class="identifier">Wide::len</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">);</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">l</span><span class="plain">&gt;3) &amp;&amp; (</span><span class="identifier">text</span><span class="plain">[</span><span class="identifier">l</span><span class="plain">-3] == </span><span class="character">'i'</span><span class="plain">) &amp;&amp; (</span><span class="identifier">text</span><span class="plain">[</span><span class="identifier">l</span><span class="plain">-2] == </span><span class="character">'n'</span><span class="plain">) &amp;&amp; (</span><span class="identifier">text</span><span class="plain">[</span><span class="identifier">l</span><span class="plain">-1] == </span><span class="character">'g'</span><span class="plain">))</span>
<span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;flags</span><span class="plain"> |= </span><span class="constant">ING_MC</span><span class="plain">;</span>
<span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;literal_number_value</span><span class="plain"> = </span><span class="identifier">val</span><span class="plain">;</span>
<span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;means</span><span class="plain"> = </span><span class="identifier">VOCABULARY_MEANING_INITIALISER</span><span class="plain">(</span><span class="identifier">ve</span><span class="plain">);</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">ve</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">Vocabulary::log</span><span class="plain">(</span><span class="identifier">OUTPUT_STREAM</span><span class="plain">, </span><span class="reserved">void</span><span class="plain"> *</span><span class="identifier">vve</span><span class="plain">) {</span>
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">ve</span><span class="plain"> = (</span><span class="reserved">vocabulary_entry</span><span class="plain"> *) </span><span class="identifier">vve</span><span class="plain">;</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">ve</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) { </span><span class="identifier">WRITE</span><span class="plain">(</span><span class="string">"NULL"</span><span class="plain">); </span><span class="reserved">return</span><span class="plain">; }</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;exemplar</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) { </span><span class="identifier">WRITE</span><span class="plain">(</span><span class="string">"NULL-EXEMPLAR"</span><span class="plain">); </span><span class="reserved">return</span><span class="plain">; }</span>
<span class="identifier">WRITE</span><span class="plain">(</span><span class="string">"%08x-%w-%08x"</span><span class="plain">, </span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;hash</span><span class="plain">, </span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;raw_exemplar</span><span class="plain">, </span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;flags</span><span class="plain">);</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Vocabulary::vocab_entry_new is used in <a href="#SP10">&#167;10</a>, <a href="#SP16_1">&#167;16.1</a>, <a href="#SP16_2">&#167;16.2</a>.</p>
<p class="endnote">The function Vocabulary::log is used in 1/wm (<a href="1-wm.html#SP3_4">&#167;3.4</a>).</p>
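<p class="inwebparagraph">The test for <code class="display"><span class="extract">ING_MC</span></code> in the creator above can be isolated as a small function. The sketch below is hypothetical (<code class="display"><span class="extract">demo_ends_in_ing</span></code> is not a function of the words module), and it uses the standard <code class="display"><span class="extract">wcslen</span></code> where the module calls <code class="display"><span class="extract">Wide::len</span></code>. Since the comparison is against lower-case letters, it presumably relies on the text having been case-lowered already.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical standalone version of the suffix test: true for text of more
   than three characters whose last three characters are 'i', 'n', 'g'. */
int demo_ends_in_ing(const wchar_t *text) {
    assert(text != NULL);
    size_t l = wcslen(text);
    return (l > 3) && (text[l-3] == L'i') && (text[l-2] == L'n') && (text[l-1] == L'g');
}
```

<p class="inwebparagraph">Note that the test is purely textual: a word such as "king" qualifies, while the three-letter text "ing" on its own does not.
</p>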
<p class="inwebparagraph"><a id="SP7"></a><b>&#167;7. </b>It's perhaps unexpected that a vocabulary entry not only stores (a pointer
to) a copy of the text, the "exemplar" (since it is text which is an
example of this vocabulary being used), but also a separate raw copy of
the text: raw in the sense of retaining the original form in the source
files which the word came from. This looks strange because we normally
identify words on their case-lowered text, not on their raw text. In
the source material:
</p>
<blockquote>
<p>Former Marillion vocalist Fish derived his nickname not from a fish, but from habitual bathing.</p>
</blockquote>
<p class="inwebparagraph">words 4, "Fish", and 11, "fish", each have the same vocabulary entry
as identity, even though their raw texts differ. Clearly the ordinary
exemplar of this entry must be "fish". But what should the raw exemplar
be, "Fish" or "fish"? The answer is the latter: in general, the raw
exemplar is the same as the exemplar, unless we have amended
it by hand, using the following routine.
</p>
<pre class="display">
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">Vocabulary::set_raw_exemplar_to_text</span><span class="plain">(</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">wn</span><span class="plain">) {</span>
<span class="functiontext">Lexer::word</span><span class="plain">(</span><span class="identifier">wn</span><span class="plain">)-</span><span class="element">&gt;raw_exemplar</span><span class="plain"> = </span><span class="functiontext">Lexer::word_text</span><span class="plain">(</span><span class="identifier">wn</span><span class="plain">);</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Vocabulary::set_raw_exemplar_to_text is used in 4/nw (<a href="4-nw.html#SP8">&#167;8</a>).</p>
<p class="inwebparagraph"><a id="SP8"></a><b>&#167;8. </b>Here are some access routines for the data stored in this
structure:
</p>
<pre class="display">
<span class="identifier">wchar_t</span><span class="plain"> *</span><span class="functiontext">Vocabulary::get_exemplar</span><span class="plain">(</span><span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">ve</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">raw</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">raw</span><span class="plain">) </span><span class="reserved">return</span><span class="plain"> </span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;raw_exemplar</span><span class="plain">;</span>
<span class="reserved">else</span><span class="plain"> </span><span class="reserved">return</span><span class="plain"> </span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;exemplar</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">Vocabulary::writer</span><span class="plain">(</span><span class="identifier">OUTPUT_STREAM</span><span class="plain">, </span><span class="reserved">char</span><span class="plain"> *</span><span class="identifier">format_string</span><span class="plain">, </span><span class="reserved">void</span><span class="plain"> *</span><span class="identifier">vV</span><span class="plain">) {</span>
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">ve</span><span class="plain"> = (</span><span class="reserved">vocabulary_entry</span><span class="plain"> *) </span><span class="identifier">vV</span><span class="plain">;</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">ve</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) </span><span class="identifier">internal_error</span><span class="plain">(</span><span class="string">"tried to write null vocabulary"</span><span class="plain">);</span>
<span class="reserved">switch</span><span class="plain"> (</span><span class="identifier">format_string</span><span class="plain">[0]) {</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'+'</span><span class="plain">: </span><span class="identifier">WRITE</span><span class="plain">(</span><span class="string">"%w"</span><span class="plain">, </span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;raw_exemplar</span><span class="plain">); </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'V'</span><span class="plain">: </span><span class="identifier">WRITE</span><span class="plain">(</span><span class="string">"%w"</span><span class="plain">, </span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;exemplar</span><span class="plain">); </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">default</span><span class="plain">: </span><span class="identifier">internal_error</span><span class="plain">(</span><span class="string">"bad %V extension"</span><span class="plain">);</span>
<span class="plain">}</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Vocabulary::get_exemplar is used in 4/prf (<a href="4-prf.html#SP31">&#167;31</a>).</p>
<p class="endnote">The function Vocabulary::writer is used in 1/wm (<a href="1-wm.html#SP3_1">&#167;3.1</a>).</p>
<p class="inwebparagraph"><a id="SP9"></a><b>&#167;9. </b>An integer is stored at each vocabulary entry, recording its value
if it ever turns out to parse as a literal number:
</p>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Vocabulary::get_literal_number_value</span><span class="plain">(</span><span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">ve</span><span class="plain">) {</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;literal_number_value</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">Vocabulary::set_literal_number_value</span><span class="plain">(</span><span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">ve</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">val</span><span class="plain">) {</span>
<span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;literal_number_value</span><span class="plain"> = </span><span class="identifier">val</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Vocabulary::get_literal_number_value is used in 4/prf (<a href="4-prf.html#SP29_1_1">&#167;29.1.1</a>, <a href="4-prf.html#SP29_1_3">&#167;29.1.3</a>), 4/bn (<a href="4-bn.html#SP5">&#167;5</a>).</p>
<p class="endnote">The function Vocabulary::set_literal_number_value appears nowhere else.</p>
<p class="inwebparagraph"><a id="SP10"></a><b>&#167;10. </b>Almost all text is used case insensitively in Inform source, but we do
occasionally need to distinguish "The" from "the" and the like, when
parsing the names of text substitutions. When a new text substitution is
declared whose first word, in the definition, begins with a capital letter,
<code class="display"><span class="extract">Vocabulary::make_case_sensitive</span></code> is called on the first word, and its identity
is changed to the upper case variant form.
</p>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Vocabulary::used_case_sensitively</span><span class="plain">(</span><span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">ve</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;upper_case_form</span><span class="plain">) || (</span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;lower_case_form</span><span class="plain">)) </span><span class="reserved">return</span><span class="plain"> </span><span class="identifier">TRUE</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">FALSE</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="functiontext">Vocabulary::get_lower_case_form</span><span class="plain">(</span><span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">ve</span><span class="plain">) {</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;lower_case_form</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="functiontext">Vocabulary::make_case_sensitive</span><span class="plain">(</span><span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">ve</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;upper_case_form</span><span class="plain">) </span><span class="reserved">return</span><span class="plain"> </span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;upper_case_form</span><span class="plain">;</span>
<span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;upper_case_form</span><span class="plain"> =</span>
<span class="functiontext">Vocabulary::vocab_entry_new</span><span class="plain">(</span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;exemplar</span><span class="plain">, </span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;hash</span><span class="plain">, </span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;flags</span><span class="plain">, </span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;literal_number_value</span><span class="plain">);</span>
<span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;upper_case_form</span><span class="plain">-</span><span class="element">&gt;lower_case_form</span><span class="plain"> = </span><span class="identifier">ve</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;upper_case_form</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Vocabulary::used_case_sensitively appears nowhere else.</p>
<p class="endnote">The function Vocabulary::get_lower_case_form appears nowhere else.</p>
<p class="endnote">The function Vocabulary::make_case_sensitive appears nowhere else.</p>
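<p class="inwebparagraph">The linkage between an ordinary entry and its case-sensitive twin can be sketched in isolation as follows. This is only an illustration: the type <code class="display"><span class="extract">sketch_word</span></code> and both function names are hypothetical, and the real structure carries more fields than are shown here.
</p>

```c
#include <stdlib.h>

/* Hypothetical, cut-down stand-in for a vocabulary entry. */
typedef struct sketch_word {
    const char *text;
    struct sketch_word *upper_case_form; /* set on the ordinary entry once a twin exists */
    struct sketch_word *lower_case_form; /* set on the twin, pointing back to the ordinary entry */
} sketch_word;

/* Return the case-sensitive twin of w, creating it on first request.
   Returns NULL only if memory runs out. */
sketch_word *sketch_make_case_sensitive(sketch_word *w) {
    if (w->upper_case_form) return w->upper_case_form; /* twin already made: reuse it */
    sketch_word *twin = malloc(sizeof(sketch_word));
    if (twin == NULL) return NULL;
    twin->text = w->text;
    twin->upper_case_form = NULL;
    twin->lower_case_form = w;
    w->upper_case_form = twin;
    return twin;
}

/* 1 if w is either half of such a pair, 0 otherwise. */
int sketch_used_case_sensitively(const sketch_word *w) {
    return (w->upper_case_form || w->lower_case_form) ? 1 : 0;
}
```

<p class="inwebparagraph">Calling <code class="display"><span class="extract">sketch_make_case_sensitive</span></code> twice on the same entry returns the same twin, so that the twin's identity stays unique in the same way as the ordinary entry's.
</p>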
<p class="inwebparagraph"><a id="SP11"></a><b>&#167;11. </b>Finally, each vocabulary entry comes with a bitmap of flags, and here
we get to set and test them:
</p>
<pre class="display">
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">Vocabulary::set_flags</span><span class="plain">(</span><span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">ve</span><span class="plain">, </span><span class="reserved">unsigned</span><span class="plain"> </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">t</span><span class="plain">) {</span>
<span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;flags</span><span class="plain"> |= </span><span class="identifier">t</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">unsigned</span><span class="plain"> </span><span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Vocabulary::test_vflags</span><span class="plain">(</span><span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">ve</span><span class="plain">, </span><span class="reserved">unsigned</span><span class="plain"> </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">t</span><span class="plain">) {</span>
<span class="reserved">return</span><span class="plain"> (</span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;flags</span><span class="plain">) &amp; </span><span class="identifier">t</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">unsigned</span><span class="plain"> </span><span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Vocabulary::test_flags</span><span class="plain">(</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">wn</span><span class="plain">, </span><span class="reserved">unsigned</span><span class="plain"> </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">t</span><span class="plain">) {</span>
<span class="reserved">return</span><span class="plain"> (</span><span class="functiontext">Lexer::word</span><span class="plain">(</span><span class="identifier">wn</span><span class="plain">)-</span><span class="element">&gt;flags</span><span class="plain">) &amp; </span><span class="identifier">t</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Vocabulary::set_flags appears nowhere else.</p>
<p class="endnote">The function Vocabulary::test_vflags appears nowhere else.</p>
<p class="endnote">The function Vocabulary::test_flags is used in 3/wrd (<a href="3-wrd.html#SP16">&#167;16</a>), 4/prf (<a href="4-prf.html#SP29_1_1">&#167;29.1.1</a>, <a href="4-prf.html#SP29_1_3">&#167;29.1.3</a>), 4/bn (<a href="4-bn.html#SP5">&#167;5</a>, <a href="4-bn.html#SP6">&#167;6</a>).</p>
<p class="inwebparagraph"><a id="SP12"></a><b>&#167;12. </b>It can be useful to find the disjunction of the flags for all the words
in a range, as that gives us a single bitmap which tells us quickly whether
any of the words in that range is a number, or is a word ending in "-ing",
and so on:
</p>
<pre class="display">
<span class="reserved">unsigned</span><span class="plain"> </span><span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Vocabulary::disjunction_of_flags</span><span class="plain">(</span><span class="reserved">wording</span><span class="plain"> </span><span class="identifier">W</span><span class="plain">) {</span>
<span class="reserved">unsigned</span><span class="plain"> </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">d</span><span class="plain"> = 0;</span>
<span class="identifier">LOOP_THROUGH_WORDING</span><span class="plain">(</span><span class="identifier">i</span><span class="plain">, </span><span class="identifier">W</span><span class="plain">)</span>
<span class="identifier">d</span><span class="plain"> |= (</span><span class="functiontext">Lexer::word</span><span class="plain">(</span><span class="identifier">i</span><span class="plain">)-</span><span class="element">&gt;flags</span><span class="plain">);</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">d</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Vocabulary::disjunction_of_flags appears nowhere else.</p>
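<p class="inwebparagraph">As a minimal, self-contained sketch of why the disjunction is useful: once the bitmaps are ORed together, a single mask test answers "does any word in this range have this property?". The flag values and function names below are hypothetical, chosen for illustration, and are not the module's own constants.
</p>

```c
#include <stddef.h>

/* Hypothetical flag bits, for illustration only. */
#define SKETCH_NUMBER_FLAG 0x01u /* the word is a literal number */
#define SKETCH_ING_FLAG    0x02u /* the word ends in "-ing" */

/* OR together the flag bitmaps of n words. */
unsigned int sketch_disjunction(const unsigned int *flags, size_t n) {
    unsigned int d = 0;
    for (size_t i = 0; i < n; i++) d |= flags[i];
    return d;
}

/* 1 if at least one of the n words carries flag t, 0 otherwise. */
int sketch_any_word_has(const unsigned int *flags, size_t n, unsigned int t) {
    return (sketch_disjunction(flags, n) & t) ? 1 : 0;
}
```

<p class="inwebparagraph">If the disjunction lacks a bit, no word in the range has it, so a caller can skip a more expensive word-by-word scan.
</p>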
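<p class="inwebparagraph"><a id="SP13"></a><b>&#167;13. </b>Also, each entry stores an integer <code class="display"><span class="extract">nt_incidence</span></code>, which can be set and read: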
</p>
<pre class="display">
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">Vocabulary::set_ntb</span><span class="plain">(</span><span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">ve</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">R</span><span class="plain">) {</span>
<span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;nt_incidence</span><span class="plain"> = </span><span class="identifier">R</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Vocabulary::get_ntb</span><span class="plain">(</span><span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">ve</span><span class="plain">) {</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">ve</span><span class="plain">-</span><span class="element">&gt;nt_incidence</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Vocabulary::set_ntb is used in 4/prf (<a href="4-prf.html#SP34">&#167;34</a>).</p>
<p class="endnote">The function Vocabulary::get_ntb is used in 4/prf (<a href="4-prf.html#SP34">&#167;34</a>, <a href="4-prf.html#SP35">&#167;35</a>, <a href="4-prf.html#SP36">&#167;36</a>).</p>
<p class="inwebparagraph"><a id="SP14"></a><b>&#167;14. Hash coding of words. </b>To find all the different words used in the source text, we need in principle
to make an enormous number of comparisons of their texts. It is slow to make
a correct identification of two texts as being equal: we have to compare
every character of one against the other. Fortunately, it can be much
faster to tell if they are different. We do this by rapidly deriving a
number from their texts, and then comparing the numbers: if different,
the texts were different.
</p>
<p class="inwebparagraph">The most obvious number would be the length of the text, but this produces
too little variation, and too many false positives: "blue" and "cyan",
for instance, would each produce the number 4.
</p>
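<p class="inwebparagraph">The idea of rejecting by number before comparing by text can be sketched with the length as the (deliberately poor) number. The function names here are hypothetical. For brevity the code is recomputed on each call, whereas in practice it would be computed once per word and stored.
</p>

```c
#include <stddef.h>
#include <wchar.h>

/* A deliberately weak code: just the length of the text. */
size_t sketch_length_code(const wchar_t *text) {
    return wcslen(text);
}

/* 1 if the two texts are equal, 0 otherwise. The codes are compared first. */
int sketch_texts_equal(const wchar_t *a, const wchar_t *b) {
    if (sketch_length_code(a) != sketch_length_code(b))
        return 0; /* different codes: the texts are certainly different */
    return (wcscmp(a, b) == 0) ? 1 : 0; /* same code: only a full comparison can decide */
}
```

<p class="inwebparagraph">Here "blue" and "green" are told apart by their codes alone, but "blue" and "cyan" share the code 4 and still need the full comparison.
</p>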
<p class="inwebparagraph">Instead we use a standard method to derive a number traditionally called
a "hash code". This is the algorithm called "X 30011" in Aho, Sethi and
Ullman's standard reference "Compilers: Principles, Techniques and Tools" (1986).
Because it is derived from constantly overflowing integer arithmetic,
it will produce different codes on different architectures (say, where
<code class="display"><span class="extract">int</span></code> is 64 bits long rather than 32, or where <code class="display"><span class="extract">char</span></code> is unsigned).
All that matters is that it provides a good spread of hash codes for
typical texts fed into it on any given occasion.
</p>
<p class="inwebparagraph">Good results depend on the number of possible codes being not too tiny
compared to the number of different texts fed in, and also on the key value
30011 being coprime to this number (but 30011 is prime, so that's easily
arranged). A typical source text of 50,000 words has an unquoted vocabulary
of only about 2000 different words. The variation in vocabulary size
between the smallest source text and the largest is only about a factor of
three or four, so there is no need to make a dynamic estimate of the size
of the source. We will always choose 997 as the number of possible hash
codes produced by X 30011: we reserve a further three special codes to be
the hashes of literals rather than ordinary words, and this brings us up to
a round 1000.
</p>
<p class="inwebparagraph">Inside the lexer, decimal integers such as <code class="display"><span class="extract">-506</span></code> were treated as ordinary
words, as there were no lexical difficulties in parsing them. Here they
begin to diverge semantically from other ordinary words: they are handled
more like literal texts and I6 inclusions.
</p>
<pre class="definitions">
<span class="definitionkeyword">define</span> <span class="constant">HASH_TAB_SIZE</span><span class="plain"> 1000 </span> <span class="comment">the possible hash codes are 0 up to this minus 1</span>
<span class="definitionkeyword">define</span> <span class="constant">NUMBER_HASH</span><span class="plain"> 0 </span> <span class="comment">literal decimal integers, and no other words, have this hash code</span>
<span class="definitionkeyword">define</span> <span class="constant">TEXT_HASH</span><span class="plain"> 1 </span> <span class="comment">double quoted texts, and no other words, have this hash code</span>
<span class="definitionkeyword">define</span> <span class="constant">I6_HASH</span><span class="plain"> 2 </span> <span class="comment">the <code class="display"><span class="extract">(-</span></code> word introducing an I6 inclusion uniquely has this hash code</span>
</pre>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Vocabulary::hash_code_from_word</span><span class="plain">(</span><span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">text</span><span class="plain">) {</span>
<span class="reserved">unsigned</span><span class="plain"> </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">hash_code</span><span class="plain"> = 0;</span>
<span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">p</span><span class="plain"> = </span><span class="identifier">text</span><span class="plain">;</span>
<span class="reserved">switch</span><span class="plain">(*</span><span class="identifier">p</span><span class="plain">) {</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'-'</span><span class="plain">: </span><span class="reserved">if</span><span class="plain"> (</span><span class="identifier">p</span><span class="plain">[1] == 0) </span><span class="reserved">break</span><span class="plain">; </span> <span class="comment">an isolated minus sign is an ordinary word</span>
<span class="comment">and otherwise fall into...</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'0'</span><span class="plain">: </span><span class="reserved">case</span><span class="plain"> </span><span class="character">'1'</span><span class="plain">: </span><span class="reserved">case</span><span class="plain"> </span><span class="character">'2'</span><span class="plain">: </span><span class="reserved">case</span><span class="plain"> </span><span class="character">'3'</span><span class="plain">: </span><span class="reserved">case</span><span class="plain"> </span><span class="character">'4'</span><span class="plain">:</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'5'</span><span class="plain">: </span><span class="reserved">case</span><span class="plain"> </span><span class="character">'6'</span><span class="plain">: </span><span class="reserved">case</span><span class="plain"> </span><span class="character">'7'</span><span class="plain">: </span><span class="reserved">case</span><span class="plain"> </span><span class="character">'8'</span><span class="plain">: </span><span class="reserved">case</span><span class="plain"> </span><span class="character">'9'</span><span class="plain">:</span>
<span class="comment">the first character may prove to be the start of a number: is this true?</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">p</span><span class="plain">++; *</span><span class="identifier">p</span><span class="plain">; </span><span class="identifier">p</span><span class="plain">++) </span><span class="reserved">if</span><span class="plain"> (</span><span class="identifier">Characters::isdigit</span><span class="plain">(*</span><span class="identifier">p</span><span class="plain">) == </span><span class="identifier">FALSE</span><span class="plain">) </span><span class="reserved">goto</span><span class="plain"> </span><span class="identifier">Try_Text</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="constant">NUMBER_HASH</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">' '</span><span class="plain">: </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">I6_HASH</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'('</span><span class="plain">: </span><span class="reserved">if</span><span class="plain"> (</span><span class="identifier">p</span><span class="plain">[1] == </span><span class="character">'-'</span><span class="plain">) </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">I6_HASH</span><span class="plain">;</span>
<span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'"'</span><span class="plain">: </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">TEXT_HASH</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="identifier">Try_Text</span><span class="plain">:</span>
<span class="plain">#</span><span class="identifier">pragma</span><span class="plain"> </span><span class="identifier">clang</span><span class="plain"> </span><span class="identifier">diagnostic</span><span class="plain"> </span><span class="identifier">push</span>
<span class="plain">#</span><span class="identifier">pragma</span><span class="plain"> </span><span class="identifier">clang</span><span class="plain"> </span><span class="identifier">diagnostic</span><span class="plain"> </span><span class="identifier">ignored</span><span class="plain"> </span><span class="string">"-Wsign-conversion"</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">p</span><span class="plain">=</span><span class="identifier">text</span><span class="plain">; *</span><span class="identifier">p</span><span class="plain">; </span><span class="identifier">p</span><span class="plain">++) </span><span class="identifier">hash_code</span><span class="plain"> = </span><span class="identifier">hash_code</span><span class="plain">*30011 + (*</span><span class="identifier">p</span><span class="plain">);</span>
<span class="plain">#</span><span class="identifier">pragma</span><span class="plain"> </span><span class="identifier">clang</span><span class="plain"> </span><span class="identifier">diagnostic</span><span class="plain"> </span><span class="identifier">pop</span>
<span class="reserved">return</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain">) (3+(</span><span class="identifier">hash_code</span><span class="plain"> % (</span><span class="constant">HASH_TAB_SIZE</span><span class="plain">-3))); </span> <span class="comment">result of X 30011, plus 3</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Vocabulary::hash_code_from_word is used in <a href="#SP16">&#167;16</a>.</p>
<p class="inwebparagraph"><a id="SP15"></a><b>&#167;15. The hash table of vocabulary. </b>Armed with these hash codes, we now store the pointers to the vocabulary
entry structures in linked lists, one for each possible hash code.
These begin empty.
</p>
<pre class="display">
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">list_of_vocab_with_hash</span><span class="plain">[</span><span class="constant">HASH_TAB_SIZE</span><span class="plain">];</span>
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">Vocabulary::start_hash_table</span><span class="plain">(</span><span class="reserved">void</span><span class="plain">) {</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">=0; </span><span class="identifier">i</span><span class="plain">&lt;</span><span class="constant">HASH_TAB_SIZE</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">++) </span><span class="identifier">list_of_vocab_with_hash</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">Vocabulary::write_hash_table</span><span class="plain">(</span><span class="identifier">OUTPUT_STREAM</span><span class="plain">) {</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">=0; </span><span class="identifier">i</span><span class="plain">&lt;</span><span class="constant">HASH_TAB_SIZE</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">++) {</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">c</span><span class="plain">=0;</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">entry</span><span class="plain"> = </span><span class="identifier">list_of_vocab_with_hash</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">];</span>
<span class="identifier">entry</span><span class="plain">; </span><span class="identifier">entry</span><span class="plain"> = </span><span class="identifier">entry</span><span class="plain">-</span><span class="element">&gt;next_in_vocab_hash</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">c</span><span class="plain">++ == 0) </span><span class="identifier">PRINT</span><span class="plain">(</span><span class="string">"%d:"</span><span class="plain">, </span><span class="identifier">i</span><span class="plain">);</span>
<span class="identifier">PRINT</span><span class="plain">(</span><span class="string">" %w"</span><span class="plain">, </span><span class="identifier">entry</span><span class="plain">-</span><span class="element">&gt;exemplar</span><span class="plain">);</span>
<span class="plain">}</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">c</span><span class="plain">&gt;0) </span><span class="identifier">PRINT</span><span class="plain">(</span><span class="string">"\n"</span><span class="plain">);</span>
<span class="plain">}</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Vocabulary::start_hash_table is used in 3/lxr (<a href="3-lxr.html#SP11">&#167;11</a>).</p>
<p class="endnote">The function Vocabulary::write_hash_table appears nowhere else.</p>
<p class="inwebparagraph"><a id="SP16"></a><b>&#167;16. </b>And that leaves only one routine: for finding the unique vocabulary
entry pointer associated with the material in <code class="display"><span class="extract">text</span></code>. We search the
hash table to see if we have the word already, and if not, we add it.
Either way, we return a valid pointer. (Compare Isaiah 55:11, "So shall
my word be that goeth forth out of my mouth: it shall not return unto
me void.")
</p>
<p class="inwebparagraph">It is in order to set the initial values of the flags for the new
word (if it does turn out to be new) that we mandated special hash
codes for any number, any text, or any I6 inclusion.
</p>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">no_vocabulary_entries</span><span class="plain"> = 0;</span>
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="functiontext">Vocabulary::entry_for_text</span><span class="plain">(</span><span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">text</span><span class="plain">) {</span>
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">new_entry</span><span class="plain">;</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">hash_code</span><span class="plain"> = </span><span class="functiontext">Vocabulary::hash_code_from_word</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">), </span><span class="identifier">val</span><span class="plain"> = 0;</span>
<span class="reserved">unsigned</span><span class="plain"> </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">f</span><span class="plain"> = 0;</span>
<span class="reserved">switch</span><span class="plain">(</span><span class="identifier">hash_code</span><span class="plain">) {</span>
<span class="reserved">case</span><span class="plain"> </span><span class="constant">NUMBER_HASH</span><span class="plain">: </span><span class="identifier">f</span><span class="plain"> = </span><span class="constant">NUMBER_MC</span><span class="plain">; </span><span class="identifier">val</span><span class="plain"> = </span><span class="identifier">Wide::atoi</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">); </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="constant">TEXT_HASH</span><span class="plain">:</span>
<span class="reserved">switch</span><span class="plain"> (</span><span class="functiontext">Word::perhaps_ill_formed_text_routine</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">)) {</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">TRUE</span><span class="plain">: </span><span class="identifier">f</span><span class="plain"> = </span><span class="constant">TEXTWITHSUBS_MC</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">FALSE</span><span class="plain">: </span><span class="identifier">f</span><span class="plain"> = </span><span class="constant">TEXT_MC</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">NOT_APPLICABLE</span><span class="plain">: </span><span class="identifier">f</span><span class="plain"> = </span><span class="constant">TEXT_MC</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="constant">I6_HASH</span><span class="plain">: </span><span class="identifier">f</span><span class="plain"> = </span><span class="constant">I6_MC</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">default</span><span class="plain">:</span>
<span class="identifier">val</span><span class="plain"> = </span><span class="functiontext">Vocabulary::an_ordinal_number</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">);</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">val</span><span class="plain"> &gt;= 0) </span><span class="identifier">f</span><span class="plain"> = </span><span class="constant">NUMBER_MC</span><span class="plain"> + </span><span class="constant">ORDINAL_MC</span><span class="plain">; </span> <span class="comment">so that "4th", say, picks up both</span>
<span class="reserved">break</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">list_of_vocab_with_hash</span><span class="plain">[</span><span class="identifier">hash_code</span><span class="plain">] == </span><span class="identifier">NULL</span><span class="plain">) {</span>
&lt;<span class="cwebmacro">Pi-ty? That word is not in my vocabulary banks</span> <span class="cwebmacronumber">16.1</span>&gt;<span class="plain">;</span>
<span class="plain">} </span><span class="reserved">else</span><span class="plain"> {</span>
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">old_entry</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">n</span><span class="plain">;</span>
<span class="comment">search the non-empty list of words with this hash code</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">n</span><span class="plain">=0, </span><span class="identifier">new_entry</span><span class="plain"> = </span><span class="identifier">list_of_vocab_with_hash</span><span class="plain">[</span><span class="identifier">hash_code</span><span class="plain">];</span>
<span class="identifier">new_entry</span><span class="plain"> != </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="identifier">n</span><span class="plain">++, </span><span class="identifier">old_entry</span><span class="plain"> = </span><span class="identifier">new_entry</span><span class="plain">, </span><span class="identifier">new_entry</span><span class="plain"> = </span><span class="identifier">new_entry</span><span class="plain">-</span><span class="element">&gt;next_in_vocab_hash</span><span class="plain">)</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">Wide::cmp</span><span class="plain">(</span><span class="identifier">new_entry</span><span class="plain">-</span><span class="element">&gt;exemplar</span><span class="plain">, </span><span class="identifier">text</span><span class="plain">) == 0)</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">new_entry</span><span class="plain">;</span>
<span class="comment">and if we do not find <code class="display"><span class="extract">text</span></code> in there, then...</span>
&lt;<span class="cwebmacro">My vision is impaired! I cannot see!</span> <span class="cwebmacronumber">16.2</span>&gt;<span class="plain">;</span>
<span class="plain">}</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Vocabulary::entry_for_text is used in <a href="#SP3">&#167;3</a>, <a href="#SP4">&#167;4</a>, 4/prf (<a href="4-prf.html#SP25">&#167;25</a>, <a href="4-prf.html#SP26">&#167;26</a>).</p>
<p class="inwebparagraph"><a id="SP16_1"></a><b>&#167;16.1. </b>Here the list for this word's hash code was empty, which can happen for
one of two reasons. Either no word with this hash code has been seen before,
in which case we start the list for that hash code with the new word; or the
word is a text literal, because, for efficiency's sake, we deliberately keep
the hash list for all text literals empty. A text literal is therefore never
matched against an earlier one, and always receives a new entry.
</p>
<p class="macrodefinition"><code class="display">
&lt;<span class="cwebmacrodefn">Pi-ty? That word is not in my vocabulary banks</span> <span class="cwebmacronumber">16.1</span>&gt; =
</code></p>
<pre class="displaydefn">
<span class="identifier">new_entry</span><span class="plain"> = </span><span class="functiontext">Vocabulary::vocab_entry_new</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">hash_code</span><span class="plain">, </span><span class="identifier">f</span><span class="plain">, </span><span class="identifier">val</span><span class="plain">);</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">hash_code</span><span class="plain"> != </span><span class="constant">TEXT_HASH</span><span class="plain">) </span><span class="identifier">list_of_vocab_with_hash</span><span class="plain">[</span><span class="identifier">hash_code</span><span class="plain">] = </span><span class="identifier">new_entry</span><span class="plain">;</span>
<span class="identifier">LOGIF</span><span class="plain">(</span><span class="identifier">VOCABULARY</span><span class="plain">, </span><span class="string">"Word %d &lt;%w&gt; is first vocabulary with hash %d\</span><span class="plain">n</span><span class="string">"</span><span class="plain">,</span>
<span class="identifier">no_vocabulary_entries</span><span class="plain">++, </span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">hash_code</span><span class="plain">);</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">new_entry</span><span class="plain">;</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">This code is used in <a href="#SP16">&#167;16</a>.</p>
<p class="inwebparagraph"><a id="SP16_2"></a><b>&#167;16.2. </b>And here we searched the whole list, whose last entry is entry <code class="display"><span class="extract">n-1</span></code> and is
pointed to by <code class="display"><span class="extract">old_entry</span></code>, without finding a match. We add the new word at the end.
</p>
<p class="macrodefinition"><code class="display">
&lt;<span class="cwebmacrodefn">My vision is impaired! I cannot see!</span> <span class="cwebmacronumber">16.2</span>&gt; =
</code></p>
<pre class="displaydefn">
<span class="identifier">new_entry</span><span class="plain"> = </span><span class="functiontext">Vocabulary::vocab_entry_new</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">hash_code</span><span class="plain">, </span><span class="identifier">f</span><span class="plain">, </span><span class="identifier">val</span><span class="plain">);</span>
<span class="identifier">old_entry</span><span class="plain">-</span><span class="element">&gt;next_in_vocab_hash</span><span class="plain"> = </span><span class="identifier">new_entry</span><span class="plain">;</span>
<span class="identifier">LOGIF</span><span class="plain">(</span><span class="identifier">VOCABULARY</span><span class="plain">, </span><span class="string">"Word %d &lt;%w&gt; is vocabulary entry no. %d with hash %d\</span><span class="plain">n</span><span class="string">"</span><span class="plain">,</span>
<span class="identifier">no_vocabulary_entries</span><span class="plain">++, </span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">n</span><span class="plain">, </span><span class="identifier">hash_code</span><span class="plain">);</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">new_entry</span><span class="plain">;</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">This code is used in <a href="#SP16">&#167;16</a>.</p>
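<p class="inwebparagraph">The search-then-append logic of &#167;16, &#167;16.1 and &#167;16.2 can be sketched in isolation as follows. This is a minimal illustration rather than the module's own code: the names <code class="display"><span class="extract">sketch_entry</span></code>, <code class="display"><span class="extract">sketch_hash</span></code>, <code class="display"><span class="extract">sketch_entry_for</span></code>, <code class="display"><span class="extract">TABLE_SIZE</span></code> and <code class="display"><span class="extract">TEXT_BUCKET</span></code> are hypothetical, the hash function is a toy stand-in, and the flags and numeric value carried by a real vocabulary entry are left out.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

#define TABLE_SIZE 1000              /* assumed size, chosen for illustration */
#define TEXT_BUCKET (TABLE_SIZE - 1) /* hypothetical code reserved for text literals */

typedef struct sketch_entry {
    wchar_t *exemplar;                 /* private copy of the word's text */
    struct sketch_entry *next_in_hash; /* next word sharing this hash code */
} sketch_entry;

sketch_entry *bucket[TABLE_SIZE];      /* every list starts out empty (NULL) */

/* Toy hash: quoted text goes to TEXT_BUCKET, all else to 0..TABLE_SIZE-2. */
int sketch_hash(const wchar_t *text) {
    if (text[0] == L'"') return TEXT_BUCKET;
    unsigned int h = 0;
    for (int i = 0; text[i] != 0; i++)
        h = (h * 31u + (unsigned int) text[i]) % (unsigned int) TEXT_BUCKET;
    return (int) h;
}

/* Allocate a fresh entry holding its own copy of the text. */
sketch_entry *sketch_entry_new(const wchar_t *text) {
    size_t len = wcslen(text) + 1;
    sketch_entry *e = malloc(sizeof(sketch_entry));
    wchar_t *copy = malloc(len * sizeof(wchar_t));
    if ((e == NULL) || (copy == NULL)) abort();
    wmemcpy(copy, text, len);
    e->exemplar = copy;
    e->next_in_hash = NULL;
    return e;
}

/* Return the existing entry for text, or create one and return that. */
sketch_entry *sketch_entry_for(const wchar_t *text) {
    int h = sketch_hash(text);
    if (bucket[h] == NULL) {
        /* Empty list: a new hash code, or a text literal. */
        sketch_entry *e = sketch_entry_new(text);
        if (h != TEXT_BUCKET) bucket[h] = e; /* the literal list stays empty */
        return e;
    }
    /* Non-empty list: look for an exact match, remembering the last entry. */
    sketch_entry *last = NULL;
    for (sketch_entry *e = bucket[h]; e != NULL; last = e, e = e->next_in_hash)
        if (wcscmp(e->exemplar, text) == 0) return e;
    assert(last != NULL);                    /* the list had at least one entry */
    sketch_entry *e = sketch_entry_new(text);
    last->next_in_hash = e;                  /* append at the end of the list */
    return e;
}
```

<p class="inwebparagraph">Under these assumptions, two calls with the same unquoted word return the same pointer, so later comparisons reduce to pointer equality. Two calls with the same quoted text return different pointers, because the list for text literals is never populated.</p>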
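<p class="inwebparagraph"><a id="SP17"></a><b>&#167;17. Partial words. </b>This works in much the same way, except that we first enter a fragment of a
word into lexical memory, namely characters <code class="display"><span class="extract">from</span></code> to <code class="display"><span class="extract">to</span></code> of <code class="display"><span class="extract">str</span></code> inclusive, and then find
its identity as if it were a whole word. If the fragment produces no words at all, the result is <code class="display"><span class="extract">NULL</span></code>.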
</p>
<pre class="display">
<span class="reserved">vocabulary_entry</span><span class="plain"> *</span><span class="functiontext">Vocabulary::entry_for_partial_text</span><span class="plain">(</span><span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">str</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">from</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">to</span><span class="plain">) {</span>
<span class="identifier">TEMPORARY_TEXT</span><span class="plain">(</span><span class="identifier">TEMP</span><span class="plain">);</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">=</span><span class="identifier">from</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">&lt;=</span><span class="identifier">to</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">++) </span><span class="identifier">PUT_TO</span><span class="plain">(</span><span class="identifier">TEMP</span><span class="plain">, </span><span class="identifier">str</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]);</span>
<span class="identifier">PUT_TO</span><span class="plain">(</span><span class="identifier">TEMP</span><span class="plain">, 0);</span>
<span class="reserved">wording</span><span class="plain"> </span><span class="identifier">W</span><span class="plain"> = </span><span class="functiontext">Feeds::feed_stream</span><span class="plain">(</span><span class="identifier">TEMP</span><span class="plain">);</span>
<span class="identifier">DISCARD_TEXT</span><span class="plain">(</span><span class="identifier">TEMP</span><span class="plain">);</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="functiontext">Wordings::empty</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">)) </span><span class="reserved">return</span><span class="plain"> </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="functiontext">Lexer::word</span><span class="plain">(</span><span class="functiontext">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">));</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Vocabulary::entry_for_partial_text appears nowhere else.</p>
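<p class="inwebparagraph">The fragment-copying step above takes both ends of the range as inclusive, which is easy to get wrong by one. A minimal sketch of just that step, using plain heap allocation instead of the module's temporary text streams, might look like this; the function name <code class="display"><span class="extract">copy_fragment</span></code> is hypothetical, and feeding the result to the lexer is not shown.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Copy str[from..to], both ends inclusive, into a newly allocated,
   null-terminated buffer. An empty range (to < from) gives an empty
   string. The caller frees the result; NULL means allocation failed. */
wchar_t *copy_fragment(const wchar_t *str, int from, int to) {
    assert((str != NULL) && (from >= 0));
    int len = (to >= from) ? (to - from + 1) : 0;
    wchar_t *buf = malloc(((size_t) len + 1) * sizeof(wchar_t));
    if (buf == NULL) return NULL;
    for (int i = 0; i < len; i++) buf[i] = str[from + i];
    buf[len] = 0;
    return buf;
}
```

<p class="inwebparagraph">An empty range here corresponds to the case in which the function above returns <code class="display"><span class="extract">NULL</span></code>, since no words are produced from an empty fragment.</p>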
<p class="inwebparagraph"><a id="SP18"></a><b>&#167;18. Ordinals. </b>The following tests whether the string is a non-negative integer written as
an English ordinal (0th, 1st, 2nd, 3rd, 4th, 5th, ...), returning its value
if so and -1 if not. Note that we don't bother to police the finicky rules
on which suffix should accompany which value (22nd not 22th, and so on).
</p>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Vocabulary::an_ordinal_number</span><span class="plain">(</span><span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">fw</span><span class="plain">) {</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">=0; </span><span class="identifier">fw</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] != 0; </span><span class="identifier">i</span><span class="plain">++)</span>
<span class="reserved">if</span><span class="plain"> (!(</span><span class="identifier">Characters::isdigit</span><span class="plain">(</span><span class="identifier">fw</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]))) {</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">i</span><span class="plain">&gt;0) &amp;&amp;</span>
<span class="plain">(((</span><span class="identifier">fw</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] == </span><span class="character">'s'</span><span class="plain">) &amp;&amp; (</span><span class="identifier">fw</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">+1] == </span><span class="character">'t'</span><span class="plain">) &amp;&amp; (</span><span class="identifier">fw</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">+2] == 0)) ||</span>
<span class="plain">((</span><span class="identifier">fw</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] == </span><span class="character">'n'</span><span class="plain">) &amp;&amp; (</span><span class="identifier">fw</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">+1] == </span><span class="character">'d'</span><span class="plain">) &amp;&amp; (</span><span class="identifier">fw</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">+2] == 0)) ||</span>
<span class="plain">((</span><span class="identifier">fw</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] == </span><span class="character">'r'</span><span class="plain">) &amp;&amp; (</span><span class="identifier">fw</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">+1] == </span><span class="character">'d'</span><span class="plain">) &amp;&amp; (</span><span class="identifier">fw</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">+2] == 0)) ||</span>
<span class="plain">((</span><span class="identifier">fw</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] == </span><span class="character">'t'</span><span class="plain">) &amp;&amp; (</span><span class="identifier">fw</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">+1] == </span><span class="character">'h'</span><span class="plain">) &amp;&amp; (</span><span class="identifier">fw</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">+2] == 0))))</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">Wide::atoi</span><span class="plain">(</span><span class="identifier">fw</span><span class="plain">);</span>
<span class="reserved">break</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">return</span><span class="plain"> -1;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Vocabulary::an_ordinal_number is used in <a href="#SP16">&#167;16</a>.</p>
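<p class="inwebparagraph">The same test can be written against the standard C library, which makes the accepted and rejected cases easy to try out. This is a sketch under assumptions, not a replacement for the function above: <code class="display"><span class="extract">ordinal_value</span></code> is a hypothetical name, and <code class="display"><span class="extract">iswdigit</span></code> and <code class="display"><span class="extract">wcstol</span></code> stand in for <code class="display"><span class="extract">Characters::isdigit</span></code> and <code class="display"><span class="extract">Wide::atoi</span></code>, whose exact behaviour is not shown in this section.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Return the value of an ordinal such as L"4th", or -1 if fw is not one.
   The suffix must be exactly st, nd, rd or th, but it is not checked
   against the value, so L"22th" is accepted. */
int ordinal_value(const wchar_t *fw) {
    assert(fw != NULL);
    int i = 0;
    while (iswdigit((wint_t) fw[i])) i++;   /* skip the leading digits */
    if (i == 0) return -1;                  /* there must be at least one */
    const wchar_t *suffix = fw + i;
    if ((wcscmp(suffix, L"st") == 0) || (wcscmp(suffix, L"nd") == 0) ||
        (wcscmp(suffix, L"rd") == 0) || (wcscmp(suffix, L"th") == 0))
        return (int) wcstol(fw, NULL, 10);  /* parsing stops at the suffix */
    return -1;                              /* plain number, or bad suffix */
}
```

<p class="inwebparagraph">Note that a plain number such as 17 is rejected, as is a bare suffix with no digits before it, matching the behaviour of the function above.</p>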
<hr class="tocbar">
<ul class="toc"><li><i>(This section begins Chapter 2: Words in Isolation.)</i></li><li><a href="2-wa.html">Continue with 'Word Assemblages'</a></li></ul><hr class="tocbar">
<!--End of weave-->
</body>
</html>