</div>
<ul class="toc"><li><a href="P-wtmd.html#SP1">§1. Prerequisites</a></li><li><a href="P-wtmd.html#SP2">§2. Words, words, words</a></li><li><a href="P-wtmd.html#SP5">§5. Meaning codes</a></li><li><a href="P-wtmd.html#SP6">§6. Contiguous runs of words</a></li><li><a href="P-wtmd.html#SP7">§7. Hypothetical words</a></li><li><a href="P-wtmd.html#SP8">§8. Rock, paper, scissors</a></li><li><a href="P-wtmd.html#SP9">§9. Traditional identifiers</a></li><li><a href="P-wtmd.html#SP10">§10. Preform</a></li></ul><hr class="tocbar">
<p class="commentary firstcommentary"><a id="SP1" class="paragraph-anchor"></a><b>§1. Prerequisites. </b>The words module is a part of the Inform compiler toolset. It is
presented as a literate program or "web". Before diving in:
</p>
<ul class="items"><li>(a) It helps to have some experience of reading webs: see <a href="../../../inweb/docs/index.html" class="internal">inweb</a> for more.
</li><li>(b) The module is written in C, in fact ANSI C99, but this is disguised by the
fact that it uses some extension syntaxes provided by the <a href="../../../inweb/docs/index.html" class="internal">inweb</a> literate
programming tool, making it a dialect of C called InC. See <a href="../../../inweb/docs/index.html" class="internal">inweb</a> for
full details, but essentially: it's C without predeclarations or header files,
and where functions have names like <span class="extract"><span class="extract-syntax">Tags::add_by_name</span></span> rather than <span class="extract"><span class="extract-syntax">add_by_name</span></span>.
<p class="commentary firstcommentary"><a id="SP2" class="paragraph-anchor"></a><b>§2. Words, words, words. </b>Natural language text for use with Inform begins as text files written by
a <a href="2-vcb.html#SP1" class="internal">vocabulary_entry</a> object for each of these distinct words, and <a href="3-lxr.html#SP19" class="internal">Lexer::word</a>
<span class="plain-syntax"></span><span class="function-syntax">Lexer::word</span><span class="plain-syntax">(17) == </span><span class="function-syntax">Lexer::word</span><span class="plain-syntax">(25) </span><span class="comment-syntax"> both are uses of "Mary"</span>
<span class="plain-syntax"></span><span class="function-syntax">Lexer::word</span><span class="plain-syntax">(21) == </span><span class="function-syntax">Lexer::word</span><span class="plain-syntax">(29) </span><span class="comment-syntax"> both are uses of "lamb"</span>
<span class="plain-syntax"></span><span class="function-syntax">Lexer::word</span><span class="plain-syntax">(20) != </span><span class="function-syntax">Lexer::word</span><span class="plain-syntax">(24) </span><span class="comment-syntax"> one is "little", the other "that"</span>
</pre>
<p class="commentary">The important point is that words at two positions can be tested for textual
equality in an essentially instant process, by comparing <span class="extract"><span class="extract-syntax">vocabulary_entry *</span></span>
<p class="commentary">Nothing in life is free, and building the vocabulary efficiently is itself a
challenge: see <a href="2-vcb.html#SP13" class="internal">Vocabulary::hash_code_from_word</a>. The key function is
<a href="2-vcb.html#SP15" class="internal">Vocabulary::entry_for_text</a>, which takes a wide C string for a word and
returns its <a href="2-vcb.html#SP1" class="internal">vocabulary_entry</a>. There are also issues with casing: in
general we want "Lamb" and "lamb" to match, but not always.
</p>
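<p class="commentary">To make the idea concrete, here is a minimal sketch of word interning in plain C.
It is an illustration under stated assumptions, not the module's code: the names <code>vocab_entry</code>,
<code>intern_word</code>, <code>hash_word</code> and <code>HASH_SIZE</code> are hypothetical, it uses narrow strings
where the module uses wide ones, and it always folds case, whereas the module only usually does.
</p>

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define HASH_SIZE 1000  /* hypothetical table size, not taken from the module */

/* Hypothetical stand-in for vocabulary_entry: one object per distinct word. */
typedef struct vocab_entry {
    char *text;                /* the word, stored in lower case */
    struct vocab_entry *next;  /* next entry in the same hash bucket */
} vocab_entry;

static vocab_entry *buckets[HASH_SIZE];

/* A simple multiplicative string hash, chosen only for illustration. */
static unsigned int hash_word(const char *s) {
    unsigned int h = 0;
    for (; *s; s++) h = h * 31u + (unsigned char) *s;
    return h % HASH_SIZE;
}

/* Return the unique entry for a word, creating it on first sight.
   Case is folded first, so "Lamb" and "lamb" share one entry. */
vocab_entry *intern_word(const char *word) {
    size_t n = strlen(word);
    char *folded = malloc(n + 1);
    assert(folded != NULL);
    for (size_t i = 0; i <= n; i++)  /* i == n copies the terminator */
        folded[i] = (char) tolower((unsigned char) word[i]);

    unsigned int h = hash_word(folded);
    for (vocab_entry *ve = buckets[h]; ve; ve = ve->next)
        if (strcmp(ve->text, folded) == 0) { free(folded); return ve; }

    vocab_entry *ve = malloc(sizeof(vocab_entry));
    assert(ve != NULL);
    ve->text = folded;
    ve->next = buckets[h];  /* push onto the front of the bucket's chain */
    buckets[h] = ve;
    return ve;
}
```

<p class="commentary">Once every word has been interned this way, comparing two words costs one pointer
comparison: <code>intern_word("Mary") == intern_word("mary")</code> holds, while the entries for
"little" and "that" differ. The cost has moved into the hash lookup done once per word.
</p>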
<ul class="footnotetexts"><li class="footnote" id="fn:1"><p class="inwebfootnote"><sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup> A piece of text in double-quotes is treated as a single word by the lexer,
although <a href="../inform7/index.html" class="internal">inform7</a> may later unroll text substitutions in it, calling the
lexer again to do that.
<a href="#fnref:1" title="return to text">↩</a></p></li></ul>
<p class="commentary firstcommentary"><a id="SP3" class="paragraph-anchor"></a><b>§3. </b>A few <a href="2-vcb.html#SP1" class="internal">vocabulary_entry</a> objects are hardwired into <a href="index.html" class="internal">words</a>, but only
<span class="plain-syntax"></span><span class="function-syntax">Lexer::word</span><span class="plain-syntax">(27) == </span><span class="identifier-syntax">COMMA_V</span><span class="plain-syntax"></span><span class="comment-syntax"> the comma between "went" and "the"</span>
<p class="commentary">See <a href="2-vcb.html#SP2" class="internal">Vocabulary::create_punctuation</a>, and also <a href="4-lp.html#SP6" class="internal">LoadPreform::create_punctuation</a>,
where further punctuation marks are created in order to parse Preform syntax —
<p class="commentary firstcommentary"><a id="SP4" class="paragraph-anchor"></a><b>§4. </b>Lexical errors occur if words are too long, or quoted text continues without
<p class="commentary firstcommentary"><a id="SP5" class="paragraph-anchor"></a><b>§5. Meaning codes. </b>Each <a href="2-vcb.html#SP1" class="internal">vocabulary_entry</a> has a bitmap of <span class="extract"><span class="extract-syntax">*_MC</span></span> meaning codes assigned to it.
(<a href="2-vcb.html#SP10" class="internal">Vocabulary::test_flags</a> tests whether the Nth word has a given bit set.)
For example, <span class="extract"><span class="extract-syntax">ORDINAL_MC</span></span> is applied to ordinal numbers like "sixth" or "15th"
(see <a href="2-vcb.html#SP17" class="internal">Vocabulary::an_ordinal_number</a>), and <span class="extract"><span class="extract-syntax">NUMBER_MC</span></span> to cardinals. The
<a href="index.html" class="internal">words</a> module uses only a few bits in this map, but the <a href="../linguistics-module/index.html" class="internal">linguistics</a>
module develops the idea much further: for example, any word which can be used
in a particular semantic category (say, in a variable name) is marked
with the bit representing that category (say, <span class="extract"><span class="extract-syntax">VARIABLE_MC</span></span>). The <a href="../core-module/index.html" class="internal">core</a> module
uses this for 15 or so of the most commonly used semantic categories in the
Inform language. See <a href="../linguistics-module/P-wtmd.html" class="internal">What This Module Does (in linguistics)</a> to pick up the story.
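<p class="commentary">A bitmap of meaning codes can be sketched in a few lines of C. This is an
illustration only: the bit values below are invented, and <code>vocab_entry</code>, <code>mark_meaning</code>
and <code>has_meaning</code> are hypothetical names. The real test function works on a word number
rather than on an entry pointer.
</p>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values: the real *_MC constants are defined by the modules. */
#define NUMBER_MC   0x00000001u
#define ORDINAL_MC  0x00000002u
#define VARIABLE_MC 0x00000004u

/* Hypothetical stand-in for vocabulary_entry, reduced to what the sketch needs. */
typedef struct vocab_entry {
    const char *text;
    unsigned int flags;  /* bitmap of meaning codes, one bit per category */
} vocab_entry;

/* Switch on every bit in mc for this word. */
void mark_meaning(vocab_entry *ve, unsigned int mc) {
    ve->flags |= mc;
}

/* True if the word has at least one of the bits in mc. */
bool has_meaning(const vocab_entry *ve, unsigned int mc) {
    return (ve->flags & mc) != 0;
}
```

<p class="commentary">Because each category is a single bit, one word can belong to several categories at
once, and a parser can reject a word for a given category with one bitwise test before
attempting anything more expensive.
</p>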
<p class="commentary firstcommentary"><a id="SP6" class="paragraph-anchor"></a><b>§6. Contiguous runs of words. </b>Natural languages are fundamentally unlike programming languages because a noun
<p class="commentary firstcommentary"><a id="SP7" class="paragraph-anchor"></a><b>§7. Hypothetical words. </b>Sometimes Inform needs to make hypothetical passages of text. For example,
is called a "feed": see <a href="3-fds.html" class="internal">Feeds</a>. In particular, <a href="3-fds.html#SP3" class="internal">Feeds::feed_text</a> will
<p class="commentary firstcommentary"><a id="SP8" class="paragraph-anchor"></a><b>§8. Rock, paper, scissors. </b>We now have three ways to represent text which may contain multiple words:
as a <span class="extract"><span class="extract-syntax">text_stream</span></span>, as a <span class="extract"><span class="extract-syntax">wording</span></span>, as a <span class="extract"><span class="extract-syntax">word_assemblage</span></span>. Each can be
<ul class="items"><li>● Use <a href="3-fds.html#SP3" class="internal">Feeds::feed_text</a> to turn a <span class="extract"><span class="extract-syntax">text_stream</span></span> into a <span class="extract"><span class="extract-syntax">wording</span></span>.
</li><li>● Use <a href="2-wa.html#SP3" class="internal">WordAssemblages::from_wording</a> to turn a <span class="extract"><span class="extract-syntax">wording</span></span> into a <span class="extract"><span class="extract-syntax">word_assemblage</span></span>.
</li><li>● Use <a href="2-wa.html#SP6" class="internal">WordAssemblages::to_wording</a> to turn a <span class="extract"><span class="extract-syntax">word_assemblage</span></span> into a <span class="extract"><span class="extract-syntax">wording</span></span>.
</li><li>● Use <a href="3-wrd.html#SP21" class="internal">Wordings::writer</a> or the formatted <span class="extract"><span class="extract-syntax">WRITE</span></span> escape <span class="extract"><span class="extract-syntax">%W</span></span> to
write a <span class="extract"><span class="extract-syntax">wording</span></span> into a <span class="extract"><span class="extract-syntax">text_stream</span></span>.
</li><li>● Use <a href="2-wa.html#SP8" class="internal">WordAssemblages::writer</a> or the formatted <span class="extract"><span class="extract-syntax">WRITE</span></span> escape <span class="extract"><span class="extract-syntax">%A</span></span> to
write a <span class="extract"><span class="extract-syntax">word_assemblage</span></span> into a <span class="extract"><span class="extract-syntax">text_stream</span></span>.
<p class="commentary firstcommentary"><a id="SP9" class="paragraph-anchor"></a><b>§9. Traditional identifiers. </b>Imagine you're a compiler turning natural language into some sort of computer
code, just hypothetically: then you probably want "a little lamb" to come out
as a named location in memory, or object, or something like that: and this name
must be a valid identifier for some other compiler or assembler — alphanumeric,
not too long, and so on. Calling it "a little lamb" is not an option.
</p>
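<p class="commentary">One way to meet those constraints is sketched below in plain C. It is an assumption-laden
illustration, not the module's algorithm: <code>compose_identifier</code> and <code>MAX_IDENTIFIER_LENGTH</code>
are hypothetical names, and the scheme (a prefix letter, an ID number, then the words in
lower case joined by underscores, truncated to fit) is chosen only for the example.
</p>

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_IDENTIFIER_LENGTH 32  /* hypothetical limit, including the terminator */

/* Write into out (capacity cap) a prefix letter, the ID number, and then the
   text with each run of non-alphanumeric characters replaced by a single
   underscore. The result is truncated if it would not fit. */
void compose_identifier(char *out, size_t cap, char prefix, int id, const char *text) {
    assert(cap > 0);
    int n = snprintf(out, cap, "%c%d", prefix, id);
    /* snprintf returns the untruncated length; clamp to what was stored. */
    size_t len = (n < 0) ? 0 : ((size_t) n < cap ? (size_t) n : cap - 1);

    int pending_sep = 1;  /* emit '_' before the next alphanumeric run */
    for (const char *p = text; *p; p++) {
        if (isalnum((unsigned char) *p)) {
            if (pending_sep) {
                if (len + 1 >= cap) break;  /* keep room for the terminator */
                out[len++] = '_';
                pending_sep = 0;
            }
            if (len + 1 >= cap) break;
            out[len++] = (char) tolower((unsigned char) *p);
        } else {
            pending_sep = 1;  /* spaces and punctuation collapse to one '_' */
        }
    }
    out[len] = '\0';
}
```

<p class="commentary">With a 32-character buffer, prefix <code>'I'</code>, ID 15 and the text "a little lamb", this
produces <code>I15_a_little_lamb</code>. The number keeps the name unique even when two phrases
reduce to the same letters, and the words keep it readable.
</p>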
<p class="commentary">You could of course name it <span class="extract"><span class="extract-syntax">ref_15A40F</span></span>, or some such, because the user will
never see it anyway, so why have a helpful name? But that won't make debugging
your output easy. The function <a href="3-idn.html#SP3" class="internal">Identifiers::compose</a> therefore takes a
wording and a unique ID number and makes something sensible: <span class="extract"><span class="extract-syntax">I15_a_little_lamb</span></span>,
<p class="commentary firstcommentary"><a id="SP10" class="paragraph-anchor"></a><b>§10. Preform. </b>Preform is a meta-language for writing a simple grammar: it's in some sense