mirror of https://github.com/ganelson/inform.git synced 2024-07-05 00:24:22 +03:00
inform7/docs/words-module/3-lxr.html
2023-04-25 10:37:50 +01:00


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Lexer</title>
<link href="../docs-assets/Breadcrumbs.css" rel="stylesheet" rev="stylesheet" type="text/css">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="en-gb">
<link href="../docs-assets/Contents.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Progress.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Navigation.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Fonts.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Base.css" rel="stylesheet" rev="stylesheet" type="text/css">
<script>
function togglePopup(material_id) {
var popup = document.getElementById(material_id);
popup.classList.toggle("show");
}
</script>
<link href="../docs-assets/Popups.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Colours.css" rel="stylesheet" rev="stylesheet" type="text/css">
</head>
<body class="commentary-font">
<nav role="navigation">
<h1><a href="../index.html">
<img src="../docs-assets/Inform.png" height="72">
</a></h1>
<ul><li><a href="../index.html">home</a></li>
</ul><h2>Compiler</h2><ul>
<li><a href="../structure.html">structure</a></li>
<li><a href="../inbuildn.html">inbuild</a></li>
<li><a href="../inform7n.html">inform7</a></li>
<li><a href="../intern.html">inter</a></li>
<li><a href="../services.html">services</a></li>
<li><a href="../secrets.html">secrets</a></li>
</ul><h2>Other Tools</h2><ul>
<li><a href="../inblorbn.html">inblorb</a></li>
<li><a href="../indocn.html">indoc</a></li>
<li><a href="../inform6.html">inform6</a></li>
<li><a href="../inpolicyn.html">inpolicy</a></li>
<li><a href="../inrtpsn.html">inrtps</a></li>
</ul><h2>Resources</h2><ul>
<li><a href="../extensions.html">extensions</a></li>
<li><a href="../kits.html">kits</a></li>
</ul><h2>Repository</h2><ul>
<li><a href="https://github.com/ganelson/inform"><img src="../docs-assets/github.png" height=18> github</a></li>
</ul><h2>Related Projects</h2><ul>
<li><a href="../../../inweb/index.html">inweb</a></li>
<li><a href="../../../intest/index.html">intest</a></li>
</ul>
</nav>
<main role="main">
<!--Weave of 'Lexer' generated by Inweb-->
<div class="breadcrumbs">
<ul class="crumbs"><li><a href="../index.html">Home</a></li><li><a href="../services.html">Services</a></li><li><a href="index.html">words</a></li><li><a href="index.html#3">Chapter 3: Words in Sequence</a></li><li><b>Lexer</b></li></ul></div>
<p class="purpose">To break down a stream of characters into a numbered sequence of words, literal strings and literal I6 inclusions, removing comments and unnecessary whitespace.</p>
<ul class="toc"><li><a href="3-lxr.html#SP6">&#167;6. The lexical structure of source text</a></li><li><a href="3-lxr.html#SP10">&#167;10. What the lexer stores for each word</a></li><li><a href="3-lxr.html#SP16">&#167;16. External lexer states</a></li><li><a href="3-lxr.html#SP17">&#167;17. Definition of punctuation</a></li><li><a href="3-lxr.html#SP18">&#167;18. Definition of indentation</a></li><li><a href="3-lxr.html#SP19">&#167;19. Access functions</a></li><li><a href="3-lxr.html#SP20">&#167;20. Definition of white space</a></li><li><a href="3-lxr.html#SP21">&#167;21. Internal lexer states</a></li><li><a href="3-lxr.html#SP25">&#167;25. Feeding the lexer</a></li><li><a href="3-lxr.html#SP27">&#167;27. Lexing one character at a time</a></li><li><a href="3-lxr.html#SP27_1">&#167;27.1. Dealing with whitespace</a></li><li><a href="3-lxr.html#SP27_5">&#167;27.5. Completing a word</a></li><li><a href="3-lxr.html#SP27_6">&#167;27.6. Entering and leaving literal mode</a></li><li><a href="3-lxr.html#SP27_8">&#167;27.8. Breaking strings up at text substitutions</a></li><li><a href="3-lxr.html#SP29">&#167;29. Splicing</a></li><li><a href="3-lxr.html#SP30">&#167;30. Basic command-line error handler</a></li><li><a href="3-lxr.html#SP31">&#167;31. Logging absolutely everything</a></li></ul><hr class="tocbar">
<p class="commentary firstcommentary"><a id="SP1" class="paragraph-anchor"></a><b>&#167;1. </b>Lexical analysis is the process of reading characters from the source
text files and forming them into globs which we call "words": the part of
Inform which does this is the "lexical analyser", or lexer for short. The
algorithms in this chapter are entirely routine, but occasional eye-opening
moments come because natural language does not have the rigorous division
between lexical and semantic parsing which programming language theory
expects. For instance, we want Inform to be case insensitive for the most part,
but we cannot discard upper case entirely at the lexical stage because we
will need it later to decide whether punctuation at the end of a quotation
is meant to end the sentence making the quote, or not. Humans certainly
read these differently:
</p>
<blockquote>
<p>Say "Hello!" with alarm, ...</p>
<p>Say "Hello!" With alarm, ...</p>
</blockquote>
<p class="commentary">And paragraph breaks can also have semantic meanings. A gap between two words
does not end a sentence, but a paragraph break between two words clearly does.
So semantic considerations occasionally infiltrate themselves into even the
earliest parts of this chapter.
</p>
<p class="commentary firstcommentary"><a id="SP2" class="paragraph-anchor"></a><b>&#167;2. </b>We must never lose sight of the origin of text, because we may need to
print problem messages back to the user which refer to that original material.
We record the provenance of text using the following structure; the
<span class="extract"><span class="extract-syntax">lexer_position</span></span> is such a structure, and marks where the lexer is
currently reading.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">typedef</span><span class="plain-syntax"> </span><span class="reserved-syntax">struct</span><span class="plain-syntax"> </span><span class="reserved-syntax">source_location</span><span class="plain-syntax"> {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">struct</span><span class="plain-syntax"> </span><span class="reserved-syntax">source_file</span><span class="plain-syntax"> *</span><span class="identifier-syntax">file_of_origin</span><span class="plain-syntax">; </span><span class="comment-syntax"> or </span><span class="extract"><span class="extract-syntax">NULL</span></span><span class="comment-syntax"> if internally written and not from a file</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">line_number</span><span class="plain-syntax">; </span><span class="comment-syntax"> counting upwards from 1 within file (if any)</span>
<span class="plain-syntax">} </span><span class="reserved-syntax">source_location</span><span class="plain-syntax">;</span>
</pre>
<ul class="endnotetexts"><li>The structure source_location is accessed in 3/tff and here.</li></ul>
<p class="commentary firstcommentary"><a id="SP3" class="paragraph-anchor"></a><b>&#167;3. </b>When words are being invented by the compiler, we use:
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">source_location</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::as_if_from_nowhere</span><button class="popup" onclick="togglePopup('usagePopup1')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup1">Usage of <span class="code-font"><span class="function-syntax">Lexer::as_if_from_nowhere</span></span>:<br/>Feeds - <a href="3-fds.html#SP4_1">&#167;4.1</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">void</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">source_location</span><span class="plain-syntax"> </span><span class="identifier-syntax">as_if_from_nowhere</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">as_if_from_nowhere</span><span class="plain-syntax">.</span><span class="element-syntax">file_of_origin</span><span class="plain-syntax"> = </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">as_if_from_nowhere</span><span class="plain-syntax">.</span><span class="element-syntax">line_number</span><span class="plain-syntax"> = </span><span class="constant-syntax">1</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">as_if_from_nowhere</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
</pre>
<p class="commentary firstcommentary"><a id="SP4" class="paragraph-anchor"></a><b>&#167;4. </b>And while lexing, we maintain:
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">source_location</span><span class="plain-syntax"> </span><span class="identifier-syntax">lexer_position</span><span class="plain-syntax">;</span>
</pre>
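<p class="commentary">As a rough illustration of why provenance is kept, here is a minimal sketch of
how a location might be turned into text for a problem message. The
<span class="extract"><span class="extract-syntax">source_file</span></span> stand-in with a single name field, and the function
<span class="extract"><span class="extract-syntax">describe_location</span></span>, are assumptions made for this example and are not part
of Inform.
</p>

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in: the real source_file structure is defined elsewhere. */
typedef struct source_file {
    const char *name;
} source_file;

typedef struct source_location {
    struct source_file *file_of_origin; /* NULL if internally written */
    int line_number;                    /* counting upwards from 1 within file */
} source_location;

/* Write a readable origin such as "story.ni, line 12" into buf.
   Hypothetical helper, not a function of the real lexer. */
void describe_location(char *buf, size_t size, source_location sl) {
    assert(buf != NULL && size > 0);
    if (sl.file_of_origin == NULL)
        snprintf(buf, size, "(text invented by the compiler)");
    else
        snprintf(buf, size, "%s, line %d", sl.file_of_origin->name, sl.line_number);
}
```

<p class="commentary">A location with a null file of origin, such as the one returned by
<span class="extract"><span class="extract-syntax">Lexer::as_if_from_nowhere</span></span>, takes the first branch here.
</p>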
<p class="commentary firstcommentary"><a id="SP5" class="paragraph-anchor"></a><b>&#167;5. </b>A word can be an English word such as <span class="extract"><span class="extract-syntax">bedspread</span></span>, or a piece of punctuation
such as <span class="extract"><span class="extract-syntax">!</span></span>, or a number such as <span class="extract"><span class="extract-syntax">127</span></span>, or a piece of quoted text of arbitrary
size such as <span class="extract"><span class="extract-syntax">"I summon up remembrance of things past"</span></span>.
</p>
<p class="commentary">The words found are numbered 0, 1, 2, ... in order of being read by
the lexer. The first eight or so words come from the mandatory insertion
text (see Read Source Text.w), then come the words from the primary source
text, then those from the extensions loaded.
</p>
<p class="commentary">References to text throughout Inform's data structure are often in the form
of a pair of word numbers, usually called <span class="extract"><span class="extract-syntax">w1</span></span> and <span class="extract"><span class="extract-syntax">w2</span></span> or some variation
on that, indicating the text which starts at word <span class="extract"><span class="extract-syntax">w1</span></span> and finishes
at <span class="extract"><span class="extract-syntax">w2</span></span> (including both ends). Thus if the text is
</p>
<blockquote>
<p>When to the sessions of sweet silent thought</p>
</blockquote>
<p class="commentary">then the eight words are numbered 0 to 7 and a reference to <span class="extract"><span class="extract-syntax">w1=2</span></span>, <span class="extract"><span class="extract-syntax">w2=5</span></span>
would mean the sub-text "the sessions of sweet". The special null value
<span class="extract"><span class="extract-syntax">wn=-1</span></span> is used when no word reference has been made: never 0, as that would
mean the first word in the list. The maximum legal word number is always one
less than the following variable's value.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">; </span><span class="comment-syntax"> Number of words read in to arrays</span>
</pre>
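<p class="commentary">The inclusive convention for word ranges can be sketched as follows. Both
function names, and the idea of holding words in a plain array of C strings,
are assumptions for illustration only; the real lexer stores words differently.
</p>

```c
#include <assert.h>
#include <stdio.h>

/* Number of words in the inclusive range w1..w2. A null reference (-1) or an
   inverted range counts as empty. Hypothetical helper. */
int word_range_length(int w1, int w2) {
    if (w1 < 0 || w2 < w1) return 0;
    return w2 - w1 + 1;
}

/* Print words w1..w2 of a hypothetical word array, separated by single spaces. */
void print_word_range(FILE *out, const char *const *words, int wordcount, int w1, int w2) {
    assert(out != NULL && words != NULL);
    assert(w1 >= 0 && w2 < wordcount); /* the largest legal word number is wordcount-1 */
    for (int w = w1; w <= w2; w++)
        fprintf(out, "%s%s", (w > w1) ? " " : "", words[w]);
}
```

<p class="commentary">With the eight words of the sonnet line above in the array, the range from 2 to 5
has length four and prints as "the sessions of sweet".
</p>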
<p class="commentary firstcommentary"><a id="SP6" class="paragraph-anchor"></a><b>&#167;6. The lexical structure of source text. </b>The following definitions are fairly self-evident: they specify which
characters cause word divisions, or signal literals.
</p>
<pre class="definitions code-font"><span class="definition-keyword">define</span> <span class="constant-syntax">STRING_BEGIN</span><span class="plain-syntax"> </span><span class="character-syntax">'"'</span><span class="plain-syntax"> </span><span class="comment-syntax"> Strings are always double-quoted</span>
<span class="definition-keyword">define</span> <span class="constant-syntax">STRING_END</span><span class="plain-syntax"> </span><span class="character-syntax">'"'</span>
<span class="definition-keyword">define</span> <span class="constant-syntax">TEXT_SUBSTITUTION_BEGIN</span><span class="plain-syntax"> </span><span class="character-syntax">'['</span><span class="plain-syntax"> </span><span class="comment-syntax"> Inside strings, this denotes a text substitution</span>
<span class="definition-keyword">define</span> <span class="constant-syntax">TEXT_SUBSTITUTION_END</span><span class="plain-syntax"> </span><span class="character-syntax">']'</span>
<span class="definition-keyword">define</span> <span class="constant-syntax">TEXT_SUBSTITUTION_SEPARATOR</span><span class="plain-syntax"> </span><span class="character-syntax">','</span>
<span class="definition-keyword">define</span> <span class="constant-syntax">COMMENT_BEGIN</span><span class="plain-syntax"> </span><span class="character-syntax">'['</span><span class="plain-syntax"> </span><span class="comment-syntax"> Text between these, outside strings, is comment</span>
<span class="definition-keyword">define</span> <span class="constant-syntax">COMMENT_END</span><span class="plain-syntax"> </span><span class="character-syntax">']'</span>
<span class="definition-keyword">define</span> <span class="constant-syntax">INFORM6_ESCAPE_BEGIN_1</span><span class="plain-syntax"> </span><span class="character-syntax">'('</span><span class="plain-syntax"> </span><span class="comment-syntax"> Text beginning with this pair is literal I6 code</span>
<span class="definition-keyword">define</span> <span class="constant-syntax">INFORM6_ESCAPE_BEGIN_2</span><span class="plain-syntax"> </span><span class="character-syntax">'-'</span>
<span class="definition-keyword">define</span> <span class="constant-syntax">INFORM6_ESCAPE_END_1</span><span class="plain-syntax"> </span><span class="character-syntax">'-'</span>
<span class="definition-keyword">define</span> <span class="constant-syntax">INFORM6_ESCAPE_END_2</span><span class="plain-syntax"> </span><span class="character-syntax">')'</span>
<span class="definition-keyword">define</span> <span class="constant-syntax">PARAGRAPH_BREAK</span><span class="plain-syntax"> </span><span class="identifier-syntax">L</span><span class="string-syntax">"|__"</span><span class="plain-syntax"> </span><span class="comment-syntax"> Inserted as a special word to mark paragraph breaks</span>
<span class="definition-keyword">define</span> <span class="constant-syntax">UNICODE_CHAR_IN_STRING</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">wchar_t</span><span class="plain-syntax">) </span><span class="constant-syntax">0x1b</span><span class="plain-syntax">) </span><span class="comment-syntax"> To represent awkward characters in metadata only</span>
</pre>
<p class="commentary firstcommentary"><a id="SP7" class="paragraph-anchor"></a><b>&#167;7. </b>This is the standard set used for parsing source text.
</p>
<pre class="definitions code-font"><span class="definition-keyword">define</span> <span class="constant-syntax">STANDARD_PUNCTUATION_MARKS</span><span class="plain-syntax"> </span><span class="identifier-syntax">L</span><span class="string-syntax">".,:;?!(){}[]"</span><span class="plain-syntax"> </span><span class="comment-syntax"> Do not add to this list lightly!</span>
</pre>
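<p class="commentary">A membership test against such a set might look like the sketch below. The
function <span class="extract"><span class="extract-syntax">is_punctuation_mark</span></span> is a hypothetical name, and the lexer's own
test, defined later in this section, may well differ.
</p>

```c
#include <assert.h>
#include <wchar.h>

#define STANDARD_PUNCTUATION_MARKS L".,:;?!(){}[]"

/* Return 1 if c is one of the marks in the given set, 0 otherwise.
   Hypothetical helper, shown only to illustrate the use of the set. */
int is_punctuation_mark(wchar_t c, const wchar_t *marks) {
    assert(marks != NULL);
    if (c == L'\0') return 0; /* wcschr would otherwise match the terminator */
    return (wcschr(marks, c) != NULL) ? 1 : 0;
}
```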
<p class="commentary firstcommentary"><a id="SP8" class="paragraph-anchor"></a><b>&#167;8. </b>This seems a good point to describe how best to syntax-colour source
text, something which the user interfaces do on every platform. By
convention we are sparing with the colours: ordinary word-processing
is not a kaleidoscopic experience (even when Microsoft Word's impertinent
grammar checker is accidentally left switched on), and we want the experience
of writing Inform source text to be like writing, not like programming.
So we use just a little colour, and that goes a long way.
</p>
<p class="commentary">Because the Inform applications generally syntax-colour source text in the
Source panel of the user interface, it is probably worth writing down the
lexical specification. There are eight basic categories of text, and
they should be detected in the following order, with the first category
that applies being the one to determine the colour and/or font weight:
</p>
<ul class="items"><li>(1) Titling text (primary source text only: not found in extensions).
If the first non-whitespace in the file is a double-quoted text (see (4a)),
this is the title of the work.
</li><li>(2) Documentation text (extension text only: not found in primary source).
If a paragraph consists of a single non-whitespace token only, and that
token is <span class="extract"><span class="extract-syntax">----</span></span> (four hyphens in a row), then this paragraph and all
subsequent text down to the bottom of the file is documentation text.
</li><li>(3) Heading text. If a paragraph consists of a single line only, and that line
begins with one of the five words Volume, Book, Part, Chapter or Section,
capitalised as here, then that paragraph is a heading. (A paragraph
division is found at the start and end of a file, and also at any run
of white space containing two or more newline characters: a newline
can be any of the Unicode characters <span class="extract"><span class="extract-syntax">0x000A</span></span>, <span class="extract"><span class="extract-syntax">0x2028</span></span> or <span class="extract"><span class="extract-syntax">0x2029</span></span>.)
</li><li>(4a) Quoted text. Outside of (4b) and (4c), a double-quotation mark
(in principle any of Unicode <span class="extract"><span class="extract-syntax">0x0022</span></span>, <span class="extract"><span class="extract-syntax">0x201C</span></span>, <span class="extract"><span class="extract-syntax">0x201D</span></span>) begins
quoted text provided it follows either whitespace, or the start of
the file, or one of the punctuation marks in the <span class="extract"><span class="extract-syntax">STANDARD_PUNCTUATION_MARKS</span></span>
string defined above. Quoted text continues until the next
double-quotation mark (or the end of the file if there isn't one,
though Inform would issue Problems if asked to compile this).
</li><li>(4a1) Text substitution text. Within (4a) only, an open square bracket
introduces text substitution matter which continues until the next
close square bracket or the end of the quoted text. (Again, Inform would
issue problem messages if given a string malformed in this way.)
</li><li>(4b) Comment text. Outside of (4a) and (4c), an open square bracket begins
comment. Comment continues until the next matching close square
bracket. (This is the case even if that is in double quotes within the
comment, i.e., quotation marks should be ignored when matching <span class="extract"><span class="extract-syntax">[</span></span> and <span class="extract"><span class="extract-syntax">]</span></span>
inside a comment.) Thus, nested comments are allowed, and the following
text contains a single comment running from just after "the" through to
the full stop:
</li></ul>
<blockquote>
<p><span class="extract"><span class="extract-syntax">Snow White and the [Seven Dwarfs [but not Doc]].</span></span></p>
</blockquote>
<ul class="items"><li>(4c) Literal I6 code. Outside of (4a) and (4b), the combination <span class="extract"><span class="extract-syntax">(-</span></span> begins
literal I6 matter. This matter continues until the next <span class="extract"><span class="extract-syntax">-)</span></span> is reached.
Within literal I6 matter, one can escape back into I7 source text using a
matched pair of <span class="extract"><span class="extract-syntax">(+</span></span> and <span class="extract"><span class="extract-syntax">+)</span></span> tokens, but it really doesn't seem worth
syntax colouring this very much. And the authors of Inform will lose no
sleep if we miscolour this, for instance, especially if it deters people
from such horrible coding practices:
</li></ul>
<blockquote>
<p><span class="extract"><span class="extract-syntax">(- Constant BLOB = (+ the total weight of things in (- selfobj -) +); -)</span></span></p>
</blockquote>
<ul class="items"><li>(5) Normal text. Everything else.
</li></ul>
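<p class="commentary">The bracket-matching rule for comments in (4b) can be sketched as a simple depth
count. This is an illustration for writers of syntax-colourers, not the lexer's
own code, and <span class="extract"><span class="extract-syntax">find_comment_end</span></span> is a hypothetical name.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Given text and the index of a '[' which opens a comment, return the index of
   the matching ']', or -1 if the comment is never closed. Quotation marks are
   deliberately ignored, as rule (4b) requires. */
long find_comment_end(const wchar_t *text, size_t open_at) {
    assert(text != NULL && text[open_at] == L'[');
    int depth = 0;
    for (size_t i = open_at; text[i] != L'\0'; i++) {
        if (text[i] == L'[') depth++;
        else if (text[i] == L']') {
            depth--;
            if (depth == 0) return (long) i;
        }
    }
    return -1; /* unterminated: colour as comment to the end of the file */
}
```

<p class="commentary">Applied to the Snow White example, starting at the first open bracket, this
returns the position of the final close bracket, just before the full stop.
</p>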
<p class="commentary">Inform regards all of the Unicode characters <span class="extract"><span class="extract-syntax">0x0009</span></span>, <span class="extract"><span class="extract-syntax">0x000A</span></span>, <span class="extract"><span class="extract-syntax">0x000D</span></span>,
<span class="extract"><span class="extract-syntax">0x0020</span></span>, <span class="extract"><span class="extract-syntax">0x0085</span></span>, <span class="extract"><span class="extract-syntax">0x00A0</span></span>, <span class="extract"><span class="extract-syntax">0x2000</span></span> to <span class="extract"><span class="extract-syntax">0x200A</span></span>, <span class="extract"><span class="extract-syntax">0x2028</span></span> and <span class="extract"><span class="extract-syntax">0x2029</span></span>
as instances of white space. Of course, it's entirely open to the Inform
user interfaces to not allow the user to key some of these codes, but
we should bear in mind that projects using them might be created on one
platform and then reopened on another one, so it's probably best to be
careful.
</p>
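<p class="commentary">For interface authors, the white space and paragraph-break rules of this
specification might be coded along the following lines. This is a sketch based
only on the lists given here, reading the range as 0x2000 to 0x200A; all three
function names are hypothetical, and the lexer's own definition of white space
appears later in this section.
</p>

```c
#include <assert.h>
#include <wchar.h>

/* Return 1 if c is in the white space set listed above, 0 otherwise. */
int is_listed_whitespace(wchar_t c) {
    unsigned long u = (unsigned long) c;
    switch (u) {
        case 0x0009: case 0x000A: case 0x000D: case 0x0020:
        case 0x0085: case 0x00A0: case 0x2028: case 0x2029:
            return 1;
    }
    return (u >= 0x2000 && u <= 0x200A) ? 1 : 0;
}

/* Newlines are the subset of white space which can make a paragraph break. */
int is_listed_newline(wchar_t c) {
    unsigned long u = (unsigned long) c;
    return (u == 0x000A || u == 0x2028 || u == 0x2029) ? 1 : 0;
}

/* A run of white space is a paragraph break if it holds two or more newlines. */
int is_paragraph_break(const wchar_t *run) {
    assert(run != NULL);
    int newlines = 0;
    for (; *run != L'\0'; run++) {
        if (!is_listed_whitespace(*run)) return 0; /* not a pure white space run */
        if (is_listed_newline(*run)) newlines++;
    }
    return (newlines >= 2) ? 1 : 0;
}
```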
<p class="commentary firstcommentary"><a id="SP9" class="paragraph-anchor"></a><b>&#167;9. </b>These categories of text are conventionally displayed as follows:
</p>
<ul class="items"><li>(1) Titling text: black boldface.
</li><li>(2) Documentation text: grey type.
</li><li>(3) Heading text: black boldface, perhaps of a slightly larger point
size.
</li><li>(4a) Quoted text: dark blue boldface.
</li><li>(4a1) Text substitution text: lighter blue and not boldface.
</li><li>(4b) Comment text: darkish green type, perhaps of a slightly smaller point
size.
</li><li>(4c) Literal I6 code: grey type. (Inform for OS X rather coolly goes into
I6 syntax-colouring, which is considerably harder, for this material:
see "The Inform 6 Technical Manual" for an algorithm.)
</li><li>(5) Normal text: black type.
</li></ul>
<p class="commentary firstcommentary"><a id="SP10" class="paragraph-anchor"></a><b>&#167;10. What the lexer stores for each word. </b>The lexer builds a small data structure for each individual word it reads.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">typedef</span><span class="plain-syntax"> </span><span class="reserved-syntax">struct</span><span class="plain-syntax"> </span><span class="reserved-syntax">lexer_details</span><span class="plain-syntax"> {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">wchar_t</span><span class="plain-syntax"> *</span><span class="identifier-syntax">lw_text</span><span class="plain-syntax">; </span><span class="comment-syntax"> text of word after treatment to normalise</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">wchar_t</span><span class="plain-syntax"> *</span><span class="identifier-syntax">lw_rawtext</span><span class="plain-syntax">; </span><span class="comment-syntax"> original untouched text of word</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">struct</span><span class="plain-syntax"> </span><span class="reserved-syntax">source_location</span><span class="plain-syntax"> </span><span class="identifier-syntax">lw_source</span><span class="plain-syntax">; </span><span class="comment-syntax"> where it was read from</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">lw_break</span><span class="plain-syntax">; </span><span class="comment-syntax"> the divider (space, tab, etc.) preceding it</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">struct</span><span class="plain-syntax"> </span><span class="reserved-syntax">vocabulary_entry</span><span class="plain-syntax"> *</span><span class="identifier-syntax">lw_identity</span><span class="plain-syntax">; </span><span class="comment-syntax"> which distinct word</span>
<span class="plain-syntax">} </span><span class="reserved-syntax">lexer_details</span><span class="plain-syntax">;</span>
<span class="reserved-syntax">lexer_details</span><span class="plain-syntax"> *</span><span class="identifier-syntax">lw_array</span><span class="plain-syntax"> = </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">; </span><span class="comment-syntax"> a dynamically allocated (and mobile) array</span>
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">lexer_details_memory_allocated</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">; </span><span class="comment-syntax"> bytes allocated to this array</span>
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">lexer_workspace_allocated</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">; </span><span class="comment-syntax"> bytes allocated to text storage</span>
</pre>
<ul class="endnotetexts"><li>The structure lexer_details is private to this section.</li></ul>
<p class="commentary firstcommentary"><a id="SP11" class="paragraph-anchor"></a><b>&#167;11. </b>The following bounds on how much we can read can be changed only by
editing and recompiling Inform.
</p>
<p class="commentary">Some readers will be wondering about Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogochuchaf
(the upper old part of the village of Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch,
on the Welsh isle of Anglesey), but this has a mere 63 letters, and in any case
the name was "improved" by the village cobbler in the mid-19th century to
make it a tourist attraction for the new railway age.
</p>
<pre class="definitions code-font"><span class="definition-keyword">define</span> <span class="constant-syntax">TEXT_STORAGE_CHUNK_SIZE</span><span class="plain-syntax"> </span><span class="constant-syntax">600000</span><span class="plain-syntax"> </span><span class="comment-syntax"> Must exceed </span><span class="extract"><span class="extract-syntax">MAX_VERBATIM_LENGTH+MAX_WORD_LENGTH</span></span>
<span class="definition-keyword">define</span> <span class="constant-syntax">MAX_VERBATIM_LENGTH</span><span class="plain-syntax"> </span><span class="constant-syntax">200000</span><span class="plain-syntax"> </span><span class="comment-syntax"> Largest quantity of Inform 6 which can be quoted verbatim.</span>
<span class="definition-keyword">define</span> <span class="constant-syntax">MAX_WORD_LENGTH</span><span class="plain-syntax"> </span><span class="constant-syntax">128</span><span class="plain-syntax"> </span><span class="comment-syntax"> Maximum length of any unquoted word</span>
</pre>
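<p class="commentary">The constraint noted in the first comment could, as one option, be checked by
the C compiler itself. The sketch below restates the three constants as plain
C macros and assumes a C11 compiler; Inform itself is not known to do this.
</p>

```c
#include <assert.h>

#define TEXT_STORAGE_CHUNK_SIZE 600000
#define MAX_VERBATIM_LENGTH 200000
#define MAX_WORD_LENGTH 128

/* Compilation fails if a chunk could not hold the largest verbatim passage
   together with the longest unquoted word. */
static_assert(TEXT_STORAGE_CHUNK_SIZE > MAX_VERBATIM_LENGTH + MAX_WORD_LENGTH,
    "TEXT_STORAGE_CHUNK_SIZE must exceed MAX_VERBATIM_LENGTH+MAX_WORD_LENGTH");
```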
<p class="commentary firstcommentary"><a id="SP12" class="paragraph-anchor"></a><b>&#167;12. </b>The main text area of memory has a simple structure: it is allocated in
one contiguous block, and at any given time the memory is used from the
lowest address up to (but not including) the "high water mark", a pointer
in effect to the first free character.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="identifier-syntax">wchar_t</span><span class="plain-syntax"> *</span><span class="identifier-syntax">lexer_workspace</span><span class="plain-syntax">; </span><span class="comment-syntax"> Large area of contiguous memory for text</span>
<span class="identifier-syntax">wchar_t</span><span class="plain-syntax"> *</span><span class="identifier-syntax">lexer_word</span><span class="plain-syntax">; </span><span class="comment-syntax"> Start of current word in workspace</span>
<span class="identifier-syntax">wchar_t</span><span class="plain-syntax"> *</span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">; </span><span class="comment-syntax"> High water mark of workspace</span>
<span class="identifier-syntax">wchar_t</span><span class="plain-syntax"> *</span><span class="identifier-syntax">lexer_workspace_end</span><span class="plain-syntax">; </span><span class="comment-syntax"> Pointer to just past the end of the workspace: HWM must not exceed this</span>
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::start</span><button class="popup" onclick="togglePopup('usagePopup2')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup2">Usage of <span class="code-font"><span class="function-syntax">Lexer::start</span></span>:<br/>Words Module - <a href="1-wm.html#SP3">&#167;3</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">void</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP13" class="function-link"><span class="function-syntax">Lexer::ensure_space_up_to</span></a><span class="plain-syntax">(50000); </span><span class="comment-syntax"> the Standard Rules are about 44,000 words</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP14" class="function-link"><span class="function-syntax">Lexer::allocate_lexer_workspace_chunk</span></a><span class="plain-syntax">(1);</span>
<span class="plain-syntax"> </span><a href="2-vcb.html#SP14" class="function-link"><span class="function-syntax">Vocabulary::start_hash_table</span></a><span class="plain-syntax">();</span>
<span class="plain-syntax">}</span>
</pre>
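<p class="commentary">The high water mark scheme can be pictured with the miniature below. The
<span class="extract"><span class="extract-syntax">workspace</span></span> structure and both functions are hypothetical, written only to
show the idea: in particular this sketch simply refuses a word which does not
fit, whereas the real lexer has a function for allocating a further chunk.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical miniature of the workspace: fields mirror the variables above. */
typedef struct workspace {
    wchar_t *start; /* lowest address of the block */
    wchar_t *hwm;   /* high water mark: the first free character */
    wchar_t *end;   /* just past the end of the block */
} workspace;

void workspace_init(workspace *ws, wchar_t *block, size_t capacity) {
    assert(ws != NULL && block != NULL);
    ws->start = block;
    ws->hwm = block;
    ws->end = block + capacity;
}

/* Copy a null-terminated word in at the high water mark, then advance the mark.
   Returns the stored copy, or NULL if the word does not fit. */
wchar_t *workspace_store(workspace *ws, const wchar_t *word) {
    assert(ws != NULL && word != NULL);
    size_t needed = wcslen(word) + 1; /* include the terminator */
    if ((size_t) (ws->end - ws->hwm) < needed) return NULL;
    wchar_t *stored = ws->hwm;
    wmemcpy(stored, word, needed);
    ws->hwm += needed;
    return stored;
}
```

<p class="commentary">Memory below the mark is in use and memory from the mark upwards is free, so
storing a word never disturbs text already stored.
</p>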
<p class="commentary firstcommentary"><a id="SP13" class="paragraph-anchor"></a><b>&#167;13. </b>These are quite hefty memory allocations, with the expensive one &mdash;
<span class="extract"><span class="extract-syntax">lw_source</span></span> &mdash; also being the least essential to Inform's running. But at least
we use memory in a way vaguely related to the size of the source
text, never using more than twice what we need, and we impose no absolute
upper limits.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">current_lw_array_size</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">, </span><span class="identifier-syntax">next_lw_array_size</span><span class="plain-syntax"> = </span><span class="constant-syntax">75000</span><span class="plain-syntax">;</span>
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::ensure_space_up_to</span><button class="popup" onclick="togglePopup('usagePopup3')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup3">Usage of <span class="code-font"><span class="function-syntax">Lexer::ensure_space_up_to</span></span>:<br/><a href="3-lxr.html#SP12">&#167;12</a>, <a href="3-lxr.html#SP27_5_2">&#167;27.5.2</a>, <a href="3-lxr.html#SP29">&#167;29</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">n</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">n</span><span class="plain-syntax"> &lt; </span><span class="identifier-syntax">current_lw_array_size</span><span class="plain-syntax">) </span><span class="reserved-syntax">return</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">new_size</span><span class="plain-syntax"> = </span><span class="identifier-syntax">current_lw_array_size</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">while</span><span class="plain-syntax"> (</span><span class="identifier-syntax">n</span><span class="plain-syntax"> &gt;= </span><span class="identifier-syntax">new_size</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">new_size</span><span class="plain-syntax"> = </span><span class="identifier-syntax">next_lw_array_size</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">next_lw_array_size</span><span class="plain-syntax"> = </span><span class="identifier-syntax">next_lw_array_size</span><span class="plain-syntax">*2;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_details_memory_allocated</span><span class="plain-syntax"> = </span><span class="identifier-syntax">new_size</span><span class="plain-syntax">*((</span><span class="reserved-syntax">int</span><span class="plain-syntax">) </span><span class="reserved-syntax">sizeof</span><span class="plain-syntax">(</span><span class="reserved-syntax">lexer_details</span><span class="plain-syntax">));</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">lexer_details</span><span class="plain-syntax"> *</span><span class="identifier-syntax">new_lw_array</span><span class="plain-syntax"> =</span>
<span class="plain-syntax"> ((</span><span class="reserved-syntax">lexer_details</span><span class="plain-syntax"> *) (</span><span class="identifier-syntax">Memory::calloc</span><span class="plain-syntax">(</span><span class="identifier-syntax">new_size</span><span class="plain-syntax">, </span><span class="reserved-syntax">sizeof</span><span class="plain-syntax">(</span><span class="reserved-syntax">lexer_details</span><span class="plain-syntax">), </span><span class="constant-syntax">LEXER_WORDS_MREASON</span><span class="plain-syntax">)));</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">new_lw_array</span><span class="plain-syntax"> == </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP30" class="function-link"><span class="function-syntax">Lexer::lexer_problem_handler</span></a><span class="plain-syntax">(</span><span class="constant-syntax">MEMORY_OUT_LEXERERROR</span><span class="plain-syntax">, </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">, </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">exit</span><span class="plain-syntax">(1); </span><span class="comment-syntax"> in case the handler fails to do this</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax">=0; </span><span class="identifier-syntax">i</span><span class="plain-syntax">&lt;</span><span class="identifier-syntax">new_size</span><span class="plain-syntax">; </span><span class="identifier-syntax">i</span><span class="plain-syntax">++) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax"> &lt; </span><span class="identifier-syntax">current_lw_array_size</span><span class="plain-syntax">) </span><span class="identifier-syntax">new_lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">] = </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">];</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span><span class="plain-syntax"> {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">new_lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">].</span><span class="element-syntax">lw_text</span><span class="plain-syntax"> = </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">new_lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">].</span><span class="element-syntax">lw_rawtext</span><span class="plain-syntax"> = </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">new_lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">].</span><span class="element-syntax">lw_break</span><span class="plain-syntax"> = </span><span class="character-syntax">' '</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">new_lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">].</span><span class="element-syntax">lw_source</span><span class="plain-syntax">.</span><span class="element-syntax">file_of_origin</span><span class="plain-syntax"> = </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">new_lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">].</span><span class="element-syntax">lw_source</span><span class="plain-syntax">.</span><span class="element-syntax">line_number</span><span class="plain-syntax"> = -1;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">new_lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">].</span><span class="element-syntax">lw_identity</span><span class="plain-syntax"> = </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">) </span><span class="identifier-syntax">Memory::I7_array_free</span><span class="plain-syntax">(</span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">, </span><span class="constant-syntax">LEXER_WORDS_MREASON</span><span class="plain-syntax">,</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">current_lw_array_size</span><span class="plain-syntax">, ((</span><span class="reserved-syntax">int</span><span class="plain-syntax">) </span><span class="reserved-syntax">sizeof</span><span class="plain-syntax">(</span><span class="reserved-syntax">lexer_details</span><span class="plain-syntax">)));</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax"> = </span><span class="identifier-syntax">new_lw_array</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">current_lw_array_size</span><span class="plain-syntax"> = </span><span class="identifier-syntax">new_size</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
</pre>
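<p class="commentary">As an illustration of the doubling policy alone, here is a minimal sketch
of the size-selection loop, with the allocation and copying left out. The names
<span class="extract"><span class="extract-syntax">growth_state</span></span> and <span class="extract"><span class="extract-syntax">sketch_grow_to_hold</span></span> are hypothetical and do
not appear in Inform; only the starting figure of 75000 is taken from the code above.
</p>

```c
#include <assert.h>

/* Hypothetical stand-in for the two size variables used by the lexer. */
typedef struct growth_state {
    int current_size;  /* number of array slots available now */
    int next_size;     /* size to adopt at the next growth step */
} growth_state;

/* Returns the array size after ensuring that index n is valid.
   Sizes are taken from next_size, which doubles on each step. */
int sketch_grow_to_hold(growth_state *g, int n) {
    assert(n >= 0);
    if (n < g->current_size) return g->current_size;  /* already big enough */
    int new_size = g->current_size;
    while (n >= new_size) {
        new_size = g->next_size;
        g->next_size = g->next_size * 2;
    }
    g->current_size = new_size;
    return new_size;
}
```

<p class="commentary">Starting from a state of 0 and 75000, a request for word 400000 would pass
through sizes 150000 and 300000 before settling on 600000, which is less than
twice the number of slots actually needed.
</p>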
<p class="commentary firstcommentary"><a id="SP14" class="paragraph-anchor"></a><b>&#167;14. </b>Inform would almost certainly crash if we wrote past the end of the
workspace, so we need to watch for the water running high. The following
routine checks that there is room for another <span class="extract"><span class="extract-syntax">n</span></span> characters, plus a
termination character, plus breathing space for a single character's worth
of lookahead:
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::ensure_lexer_hwm_can_be_raised_by</span><button class="popup" onclick="togglePopup('usagePopup4')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup4">Usage of <span class="code-font"><span class="function-syntax">Lexer::ensure_lexer_hwm_can_be_raised_by</span></span>:<br/><a href="3-lxr.html#SP15">&#167;15</a>, <a href="3-lxr.html#SP27">&#167;27</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">n</span><span class="plain-syntax">, </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">transfer_partial_word</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax"> + </span><span class="identifier-syntax">n</span><span class="plain-syntax"> + </span><span class="constant-syntax">2</span><span class="plain-syntax"> &gt;= </span><span class="identifier-syntax">lexer_workspace_end</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">wchar_t</span><span class="plain-syntax"> *</span><span class="identifier-syntax">old_hwm</span><span class="plain-syntax"> = </span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">m</span><span class="plain-syntax"> = </span><span class="constant-syntax">1</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">transfer_partial_word</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">m</span><span class="plain-syntax"> = (((</span><span class="reserved-syntax">int</span><span class="plain-syntax">) (</span><span class="identifier-syntax">old_hwm</span><span class="plain-syntax"> - </span><span class="identifier-syntax">lexer_word</span><span class="plain-syntax">) + </span><span class="identifier-syntax">n</span><span class="plain-syntax"> + </span><span class="constant-syntax">3</span><span class="plain-syntax">)/</span><span class="constant-syntax">TEXT_STORAGE_CHUNK_SIZE</span><span class="plain-syntax">) + </span><span class="constant-syntax">1</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">m</span><span class="plain-syntax"> &lt; </span><span class="constant-syntax">1</span><span class="plain-syntax">) </span><span class="identifier-syntax">m</span><span class="plain-syntax"> = </span><span class="constant-syntax">1</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP14" class="function-link"><span class="function-syntax">Lexer::allocate_lexer_workspace_chunk</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">m</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">transfer_partial_word</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> *(</span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">++) = </span><span class="character-syntax">' '</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">wchar_t</span><span class="plain-syntax"> *</span><span class="identifier-syntax">new_lword</span><span class="plain-syntax"> = </span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">while</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lexer_word</span><span class="plain-syntax"> &lt; </span><span class="identifier-syntax">old_hwm</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> *(</span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">++) = *(</span><span class="identifier-syntax">lexer_word</span><span class="plain-syntax">++);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_word</span><span class="plain-syntax"> = </span><span class="identifier-syntax">new_lword</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax"> + </span><span class="identifier-syntax">n</span><span class="plain-syntax"> + </span><span class="constant-syntax">2</span><span class="plain-syntax"> &gt;= </span><span class="identifier-syntax">lexer_workspace_end</span><span class="plain-syntax">)</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">internal_error</span><span class="plain-syntax">(</span><span class="string-syntax">"further allocation failed to liberate enough space"</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax">}</span>
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::allocate_lexer_workspace_chunk</span><button class="popup" onclick="togglePopup('usagePopup5')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup5">Usage of <span class="code-font"><span class="function-syntax">Lexer::allocate_lexer_workspace_chunk</span></span>:<br/><a href="3-lxr.html#SP12">&#167;12</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">multiplier</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">extent</span><span class="plain-syntax"> = </span><span class="identifier-syntax">multiplier</span><span class="plain-syntax"> * </span><span class="constant-syntax">TEXT_STORAGE_CHUNK_SIZE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_workspace</span><span class="plain-syntax"> = ((</span><span class="identifier-syntax">wchar_t</span><span class="plain-syntax"> *) (</span><span class="identifier-syntax">Memory::calloc</span><span class="plain-syntax">(</span><span class="identifier-syntax">extent</span><span class="plain-syntax">, </span><span class="reserved-syntax">sizeof</span><span class="plain-syntax">(</span><span class="identifier-syntax">wchar_t</span><span class="plain-syntax">), </span><span class="constant-syntax">LEXER_TEXT_MREASON</span><span class="plain-syntax">)));</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_workspace_allocated</span><span class="plain-syntax"> += </span><span class="identifier-syntax">extent</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax"> = </span><span class="identifier-syntax">lexer_workspace</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_workspace_end</span><span class="plain-syntax"> = </span><span class="identifier-syntax">lexer_workspace</span><span class="plain-syntax"> + </span><span class="identifier-syntax">extent</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
</pre>
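<p class="commentary">The chunk multiplier computed above can be tried in isolation. The following
is a sketch only: <span class="extract"><span class="extract-syntax">SKETCH_CHUNK_SIZE</span></span> is an assumed value chosen for easy
arithmetic, not the real <span class="extract"><span class="extract-syntax">TEXT_STORAGE_CHUNK_SIZE</span></span>, and the function name is
hypothetical. On our reading, the extra 3 covers the space written before the
transferred word, the termination character, and the character of lookahead.
</p>

```c
#include <assert.h>

#define SKETCH_CHUNK_SIZE 1000  /* assumed value, for illustration only */

/* How many chunks to allocate so that a partial word of the given length
   can be moved into the new workspace with room for n more characters. */
int sketch_chunks_needed(int partial_word_length, int n, int transfer_partial_word) {
    assert((partial_word_length >= 0) && (n >= 0));
    int m = 1;  /* one chunk suffices when nothing is carried over */
    if (transfer_partial_word) {
        m = ((partial_word_length + n + 3)/SKETCH_CHUNK_SIZE) + 1;
        if (m < 1) m = 1;
    }
    return m;
}
```

<p class="commentary">Because the division rounds down before 1 is added, the resulting extent is
always strictly greater than the space required, which is what the final
check in the routine above relies on.
</p>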
<p class="commentary firstcommentary"><a id="SP15" class="paragraph-anchor"></a><b>&#167;15. </b>We occasionally want to reprocess the text of a word in higher-level
parsing, and it's convenient to use the lexer workspace to store the results
of such reprocessing. The following routine makes a persistent copy
of its argument. It should never be used while the lexer is actually
running.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="identifier-syntax">wchar_t</span><span class="plain-syntax"> *</span><span class="function-syntax">Lexer::copy_to_memory</span><button class="popup" onclick="togglePopup('usagePopup6')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup6">Usage of <span class="code-font"><span class="function-syntax">Lexer::copy_to_memory</span></span>:<br/>Numbered Words - <a href="2-nw.html#SP7">&#167;7</a></span></button><span class="plain-syntax">(</span><span class="identifier-syntax">wchar_t</span><span class="plain-syntax"> *</span><span class="identifier-syntax">p</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP14" class="function-link"><span class="function-syntax">Lexer::ensure_lexer_hwm_can_be_raised_by</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">Wide::len</span><span class="plain-syntax">(</span><span class="identifier-syntax">p</span><span class="plain-syntax">), </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">wchar_t</span><span class="plain-syntax"> *</span><span class="identifier-syntax">q</span><span class="plain-syntax"> = </span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax"> = </span><span class="identifier-syntax">q</span><span class="plain-syntax"> + </span><span class="identifier-syntax">Wide::len</span><span class="plain-syntax">(</span><span class="identifier-syntax">p</span><span class="plain-syntax">) + </span><span class="constant-syntax">1</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">wcscpy</span><span class="plain-syntax">(</span><span class="identifier-syntax">q</span><span class="plain-syntax">, </span><span class="identifier-syntax">p</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">q</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
</pre>
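<p class="commentary">The same copy-and-advance pattern can be shown with a small fixed buffer.
This is a sketch under stated assumptions: the workspace here is a static array
of 64 characters rather than a chain of allocated chunks, running out of room is
an assertion failure rather than a fresh allocation, and all of the
<span class="extract"><span class="extract-syntax">sketch_</span></span> names are hypothetical.
</p>

```c
#include <assert.h>
#include <wchar.h>

#define SKETCH_WORKSPACE_SIZE 64  /* assumed size, for illustration only */

static wchar_t sketch_workspace[SKETCH_WORKSPACE_SIZE];
static wchar_t *sketch_hwm = sketch_workspace;  /* high water mark */

/* Copies p to the high water mark and advances the mark past the copy
   and its terminator. The returned pointer stays valid afterwards. */
wchar_t *sketch_copy_to_memory(const wchar_t *p) {
    size_t len = wcslen(p);
    /* room for the text, a terminator, and one character of lookahead */
    assert(sketch_hwm + len + 2 < sketch_workspace + SKETCH_WORKSPACE_SIZE);
    wchar_t *q = sketch_hwm;
    sketch_hwm = q + len + 1;
    wcscpy(q, p);
    return q;
}
```

<p class="commentary">Two successive copies therefore sit end to end in the workspace, each with
its own terminator, and neither is disturbed by the other.
</p>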
<p class="commentary firstcommentary"><a id="SP16" class="paragraph-anchor"></a><b>&#167;16. External lexer states. </b>The lexer is a finite state machine at heart. Its current state is the
collective value of an extensive set of variables, almost all of them
flags, but with three exceptions this state is used only within the lexer.
</p>
<p class="commentary">Two of these exceptional modes are described below. Both are off by default,
and they stay off: the lexer never goes into either mode by itself.
</p>
<p class="commentary"><span class="extract"><span class="extract-syntax">lexer_divide_strings_at_text_substitutions</span></span> is used by some of the lexical writing-back
machinery, when it has been decided to compile something like
</p>
<blockquote>
<p>say "[The noun] falls onto [the second noun]."</p>
</blockquote>
<p class="commentary">In its ordinary mode, with this setting off, the lexer will render this as
two words, the second being the entire quoted text. But if
<span class="extract"><span class="extract-syntax">lexer_divide_strings_at_text_substitutions</span></span> is set then the text is reinterpreted as
</p>
<blockquote>
<p>say The noun, " falls onto ", the second noun, "."</p>
</blockquote>
<p class="commentary">which runs to eleven words, three of them commas (punctuation always counts
as a word).
</p>
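<p class="commentary">To make the effect concrete, here is a sketch which reproduces only the
visible result of the division on the contents of one quoted text. It is not
how the lexer does it: the real lexer works on wide characters inside its state
machine, whereas this uses plain <span class="extract"><span class="extract-syntax">char</span></span> strings, and the function name is
hypothetical. It also assumes that no empty literal is produced when a text
begins or ends with a substitution, as the example above suggests.
</p>

```c
#include <assert.h>
#include <string.h>

/* text: the contents of a quoted literal, without its outer quotes.
   out: receives the divided form; assumed large enough by the caller.
   Substitutions are emitted bare, literal runs are re-quoted, and the
   pieces are joined with commas. */
void sketch_divide_at_substitutions(const char *text, char *out) {
    out[0] = 0;
    int pieces = 0;
    const char *p = text;
    while (*p) {
        const char *start = p;
        int is_substitution = (*p == '[');
        if (is_substitution) {
            start = ++p;                       /* skip the opening bracket */
            while ((*p) && (*p != ']')) p++;   /* run to the closing bracket */
        } else {
            while ((*p) && (*p != '[')) p++;   /* run to the next substitution */
        }
        if (pieces++ > 0) strcat(out, ", ");
        if (!is_substitution) strcat(out, "\"");
        strncat(out, start, (size_t) (p - start));
        if (!is_substitution) strcat(out, "\"");
        if ((is_substitution) && (*p == ']')) p++;  /* skip the closing bracket */
    }
}
```

<p class="commentary">Given the quoted text from the example, the sketch produces four pieces
separated by three commas, matching the reinterpretation shown above once the
word "say" is put back in front.
</p>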
<p class="commentary"><span class="extract"><span class="extract-syntax">lexer_wait_for_dashes</span></span> is set by the extension-reading machinery, in
cases where it wants to get at the documentation text of an extension but
does not want to have to fill Inform's memory with the source text of its code.
In this mode, the lexer ignores the whole stream of words until it reaches
<span class="extract"><span class="extract-syntax">----</span></span>, the special marker used in extensions to divide source text from
documentation: it then drops out of this mode and back into normal running,
so that subsequent words are lexed as usual.
</p>
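<p class="commentary">The waiting mode can be pictured as a filter over a stream of words. The
sketch below works on an array of already-separated words, which is a
simplification, and its name is hypothetical. It assumes that the
<span class="extract"><span class="extract-syntax">----</span></span> marker which ends the wait is itself discarded; the text above
does not settle that point.
</p>

```c
#include <assert.h>
#include <string.h>

/* Returns the index of the first word to be lexed normally, that is, the
   word after the first "----" marker. If there is no marker, every word
   is ignored and word_count is returned. */
int sketch_skip_until_dashes(const char **words, int word_count) {
    assert(word_count >= 0);
    for (int i = 0; i < word_count; i++)
        if (strcmp(words[i], "----") == 0)
            return i + 1;  /* drop out of waiting mode here */
    return word_count;
}
```

<p class="commentary">Only the first marker matters: a later <span class="extract"><span class="extract-syntax">----</span></span> is reached in normal
running and is treated like any other word.
</p>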
<pre class="displayed-code all-displayed-code code-font">
<span class="identifier-syntax">wchar_t</span><span class="plain-syntax"> *</span><span class="identifier-syntax">lexer_punctuation_marks</span><span class="plain-syntax"> = </span><span class="identifier-syntax">L</span><span class="string-syntax">""</span><span class="plain-syntax">;</span>
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">lexer_divide_strings_at_text_substitutions</span><span class="plain-syntax">; </span><span class="comment-syntax"> Break up text substitutions in quoted text</span>
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">lexer_allow_I6_escapes</span><span class="plain-syntax">; </span><span class="comment-syntax"> Recognise </span><span class="extract"><span class="extract-syntax">(-</span></span><span class="comment-syntax"> and </span><span class="extract"><span class="extract-syntax">-)</span></span>
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">lexer_wait_for_dashes</span><span class="plain-syntax">; </span><span class="comment-syntax"> Ignore all text until first </span><span class="extract"><span class="extract-syntax">----</span></span><span class="comment-syntax"> found</span>
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">lexer_break_at_slashes</span><span class="plain-syntax">;</span>
</pre>
<p class="commentary firstcommentary"><a id="SP17" class="paragraph-anchor"></a><b>&#167;17. Definition of punctuation. </b>As we have seen, the question of whether something is a punctuation mark
or not depends slightly on the context:
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::is_punctuation</span><button class="popup" onclick="togglePopup('usagePopup7')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup7">Usage of <span class="code-font"><span class="function-syntax">Lexer::is_punctuation</span></span>:<br/><a href="3-lxr.html#SP26">&#167;26</a><br/>Text From Files - <a href="3-tff.html#SP4">&#167;4</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">c</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax">=0; </span><span class="identifier-syntax">lexer_punctuation_marks</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">]; </span><span class="identifier-syntax">i</span><span class="plain-syntax">++)</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="identifier-syntax">lexer_punctuation_marks</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">])</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">TRUE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
</pre>
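<p class="commentary">The dependence on context comes entirely from which string the punctuation
pointer currently addresses. A minimal sketch, in which both sets of marks are
invented for the example and are not Inform's own, and the names are
hypothetical:
</p>

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical current set of marks; a caller may point this elsewhere. */
const wchar_t *sketch_punctuation_marks = L".,:;?!(){}[]";

/* Same linear search as above, over whichever set is current. */
int sketch_is_punctuation(int c) {
    assert(sketch_punctuation_marks != NULL);
    for (int i = 0; sketch_punctuation_marks[i]; i++)
        if (c == (int) sketch_punctuation_marks[i])
            return 1;
    return 0;
}
```

<p class="commentary">With the set above a slash is an ordinary character, but after the pointer is
changed to a string which includes a slash, the same call reports it as
punctuation.
</p>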
<p class="commentary firstcommentary"><a id="SP18" class="paragraph-anchor"></a><b>&#167;18. Definition of indentation. </b>We're going to record the level of indentation in the "break" character.
We will recognise anything from 1 to 25 tabs as distinct indentation amounts;
a value of 26 means "26 or more", and at such sizes, indentation isn't
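distinguished. We'll do this with the letters <span class="extract"><span class="extract-syntax">A</span></span> to <span class="extract"><span class="extract-syntax">Z</span></span>, so that <span class="extract"><span class="extract-syntax">A</span></span> means one tab and <span class="extract"><span class="extract-syntax">Z</span></span> means 26 or more.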
</p>
<pre class="definitions code-font"><span class="definition-keyword">define</span> <span class="constant-syntax">GROSS_AMOUNT_OF_INDENTATION</span><span class="plain-syntax"> </span><span class="constant-syntax">26</span>
</pre>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::indentation_level</span><button class="popup" onclick="togglePopup('usagePopup8')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup8">Usage of <span class="code-font"><span class="function-syntax">Lexer::indentation_level</span></span>:<br/>Wordings - <a href="3-wrd.html#SP20">&#167;20</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">wn</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">q</span><span class="plain-syntax"> = </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">wn</span><span class="plain-syntax">].</span><span class="element-syntax">lw_break</span><span class="plain-syntax"> - </span><span class="character-syntax">'A'</span><span class="plain-syntax"> + </span><span class="constant-syntax">1</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">q</span><span class="plain-syntax"> &gt;= </span><span class="constant-syntax">1</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">q</span><span class="plain-syntax"> &lt;= </span><span class="constant-syntax">GROSS_AMOUNT_OF_INDENTATION</span><span class="plain-syntax">)) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">q</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::break_char_for_indents</span><button class="popup" onclick="togglePopup('usagePopup9')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup9">Usage of <span class="code-font"><span class="function-syntax">Lexer::break_char_for_indents</span></span>:<br/><a href="3-lxr.html#SP27_2">&#167;27.2</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">t</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">t</span><span class="plain-syntax"> &lt;= </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="identifier-syntax">internal_error</span><span class="plain-syntax">(</span><span class="string-syntax">"bad indentation break"</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">t</span><span class="plain-syntax"> &gt;= </span><span class="constant-syntax">26</span><span class="plain-syntax">) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="character-syntax">'Z'</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="character-syntax">'A'</span><span class="plain-syntax"> + </span><span class="identifier-syntax">t</span><span class="plain-syntax"> - </span><span class="constant-syntax">1</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
</pre>
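<p class="commentary">The two functions are inverses for small amounts of indentation, which a
standalone sketch can demonstrate. Here the break character is passed in
directly rather than read from the word array, an assertion stands in for the
internal error, and the <span class="extract"><span class="extract-syntax">sketch_</span></span> names are hypothetical.
</p>

```c
#include <assert.h>

#define SKETCH_GROSS_INDENT 26  /* mirrors GROSS_AMOUNT_OF_INDENTATION */

/* Encodes t tabs (t at least 1) as a letter; 26 or more all become 'Z'. */
int sketch_break_char_for_indents(int t) {
    assert(t > 0);
    if (t >= SKETCH_GROSS_INDENT) return 'Z';
    return 'A' + t - 1;
}

/* Decodes a break character; anything outside 'A' to 'Z' means no indentation. */
int sketch_indentation_from_break(int break_char) {
    int q = break_char - 'A' + 1;
    if ((q >= 1) && (q <= SKETCH_GROSS_INDENT)) return q;
    return 0;
}
```

<p class="commentary">Three tabs encode as <span class="extract"><span class="extract-syntax">C</span></span> and decode back to 3, while 40 tabs encode as
<span class="extract"><span class="extract-syntax">Z</span></span> and decode to 26. An ordinary break such as a space decodes to 0.
</p>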
<p class="commentary firstcommentary"><a id="SP19" class="paragraph-anchor"></a><b>&#167;19. Access functions. </b></p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">vocabulary_entry</span><span class="plain-syntax"> *</span><span class="function-syntax">Lexer::word</span><button class="popup" onclick="togglePopup('usagePopup10')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup10">Usage of <span class="code-font"><span class="function-syntax">Lexer::word</span></span>:<br/>Vocabulary - <a href="2-vcb.html#SP6">&#167;6</a>, <a href="2-vcb.html#SP10">&#167;10</a>, <a href="2-vcb.html#SP11">&#167;11</a>, <a href="2-vcb.html#SP16">&#167;16</a><br/>Word Assemblages - <a href="2-wa.html#SP3">&#167;3</a>, <a href="2-wa.html#SP8">&#167;8</a>, <a href="2-wa.html#SP9">&#167;9</a>, <a href="2-wa.html#SP10">&#167;10</a><br/>Numbered Words - <a href="2-nw.html#SP1">&#167;1</a><br/>Wordings - <a href="3-wrd.html#SP7">&#167;7</a>, <a href="3-wrd.html#SP17">&#167;17</a>, <a href="3-wrd.html#SP18">&#167;18</a>, <a href="3-wrd.html#SP19">&#167;19</a><br/>Text From Files - <a href="3-tff.html#SP4">&#167;4</a><br/>Loading Preform - <a href="4-lp.html#SP7">&#167;7</a>, <a href="4-lp.html#SP7_2">&#167;7.2</a>, <a href="4-lp.html#SP7_3">&#167;7.3</a>, <a href="4-lp.html#SP14_1_1">&#167;14.1.1</a>, <a href="4-lp.html#SP14_1_1_1">&#167;14.1.1.1</a>, <a href="4-lp.html#SP14_1_3">&#167;14.1.3</a>, <a href="4-lp.html#SP15_2">&#167;15.2</a><br/>Nonterminal Incidences - <a href="4-ni.html#SP4">&#167;4</a>, <a href="4-ni.html#SP5">&#167;5</a>, <a href="4-ni.html#SP8_1">&#167;8.1</a>, <a href="4-ni.html#SP8_2">&#167;8.2</a>, <a href="4-ni.html#SP8_3">&#167;8.3</a>, <a href="4-ni.html#SP8_4">&#167;8.4</a>, <a href="4-ni.html#SP8_5">&#167;8.5</a><br/>Preform - <a href="4-prf.html#SP1_3_2_1_4_4_2_3_1_3">&#167;1.3.2.1.4.4.2.3.1.3</a>, <a href="4-prf.html#SP3">&#167;3</a><br/>Basic Nonterminals - <a href="4-bn.html#SP1">&#167;1</a>, <a href="4-bn.html#SP6">&#167;6</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span 
class="identifier-syntax">wn</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">wn</span><span class="plain-syntax">].</span><span class="element-syntax">lw_identity</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::set_word</span><button class="popup" onclick="togglePopup('usagePopup11')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup11">Usage of <span class="code-font"><span class="function-syntax">Lexer::set_word</span></span>:<br/>Vocabulary - <a href="2-vcb.html#SP3">&#167;3</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">wn</span><span class="plain-syntax">, </span><span class="reserved-syntax">vocabulary_entry</span><span class="plain-syntax"> *</span><span class="identifier-syntax">ve</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">wn</span><span class="plain-syntax">].</span><span class="element-syntax">lw_identity</span><span class="plain-syntax"> = </span><span class="identifier-syntax">ve</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::break_before</span><button class="popup" onclick="togglePopup('usagePopup12')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup12">Usage of <span class="code-font"><span class="function-syntax">Lexer::break_before</span></span>:<br/><a href="3-lxr.html#SP31">&#167;31</a><br/>Wordings - <a href="3-wrd.html#SP20">&#167;20</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">wn</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">wn</span><span class="plain-syntax">].</span><span class="element-syntax">lw_break</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
<span class="reserved-syntax">source_file</span><span class="plain-syntax"> *</span><span class="function-syntax">Lexer::file_of_origin</span><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">wn</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">wn</span><span class="plain-syntax">].</span><span class="element-syntax">lw_source</span><span class="plain-syntax">.</span><span class="element-syntax">file_of_origin</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::line_of_origin</span><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">wn</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">wn</span><span class="plain-syntax">].</span><span class="element-syntax">lw_source</span><span class="plain-syntax">.</span><span class="element-syntax">line_number</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
<span class="reserved-syntax">source_location</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::word_location</span><button class="popup" onclick="togglePopup('usagePopup13')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup13">Usage of <span class="code-font"><span class="function-syntax">Lexer::word_location</span></span>:<br/>Wordings - <a href="3-wrd.html#SP10">&#167;10</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">wn</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">wn</span><span class="plain-syntax"> &lt; </span><span class="constant-syntax">0</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">source_location</span><span class="plain-syntax"> </span><span class="identifier-syntax">nowhere</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">nowhere</span><span class="plain-syntax">.</span><span class="element-syntax">file_of_origin</span><span class="plain-syntax"> = </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">nowhere</span><span class="plain-syntax">.</span><span class="element-syntax">line_number</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">nowhere</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">wn</span><span class="plain-syntax">].</span><span class="element-syntax">lw_source</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::set_word_location</span><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">wn</span><span class="plain-syntax">, </span><span class="reserved-syntax">source_location</span><span class="plain-syntax"> </span><span class="identifier-syntax">sl</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">wn</span><span class="plain-syntax"> &lt; </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="identifier-syntax">internal_error</span><span class="plain-syntax">(</span><span class="string-syntax">"can't set word location"</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">wn</span><span class="plain-syntax">].</span><span class="element-syntax">lw_source</span><span class="plain-syntax"> = </span><span class="identifier-syntax">sl</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
<span class="identifier-syntax">wchar_t</span><span class="plain-syntax"> *</span><span class="function-syntax">Lexer::word_raw_text</span><button class="popup" onclick="togglePopup('usagePopup14')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup14">Usage of <span class="code-font"><span class="function-syntax">Lexer::word_raw_text</span></span>:<br/>Vocabulary - <a href="2-vcb.html#SP3">&#167;3</a><br/>Numbered Words - <a href="2-nw.html#SP1">&#167;1</a>, <a href="2-nw.html#SP2">&#167;2</a>, <a href="2-nw.html#SP4">&#167;4</a>, <a href="2-nw.html#SP5">&#167;5</a>, <a href="2-nw.html#SP6">&#167;6</a>, <a href="2-nw.html#SP7">&#167;7</a><br/>Wordings - <a href="3-wrd.html#SP15">&#167;15</a>, <a href="3-wrd.html#SP21_3">&#167;21.3</a>, <a href="3-wrd.html#SP21_4">&#167;21.4</a><br/>Loading Preform - <a href="4-lp.html#SP15_1">&#167;15.1</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">wn</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">wn</span><span class="plain-syntax">].</span><span class="element-syntax">lw_rawtext</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::set_word_raw_text</span><button class="popup" onclick="togglePopup('usagePopup15')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup15">Usage of <span class="code-font"><span class="function-syntax">Lexer::set_word_raw_text</span></span>:<br/>Vocabulary - <a href="2-vcb.html#SP4">&#167;4</a><br/>Numbered Words - <a href="2-nw.html#SP7">&#167;7</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">wn</span><span class="plain-syntax">, </span><span class="identifier-syntax">wchar_t</span><span class="plain-syntax"> *</span><span class="identifier-syntax">rt</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">wn</span><span class="plain-syntax">].</span><span class="element-syntax">lw_rawtext</span><span class="plain-syntax"> = </span><span class="identifier-syntax">rt</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
<span class="identifier-syntax">wchar_t</span><span class="plain-syntax"> *</span><span class="function-syntax">Lexer::word_text</span><button class="popup" onclick="togglePopup('usagePopup16')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup16">Usage of <span class="code-font"><span class="function-syntax">Lexer::word_text</span></span>:<br/>Vocabulary - <a href="2-vcb.html#SP3">&#167;3</a>, <a href="2-vcb.html#SP6">&#167;6</a><br/>Numbered Words - <a href="2-nw.html#SP2">&#167;2</a>, <a href="2-nw.html#SP7">&#167;7</a><br/>Wordings - <a href="3-wrd.html#SP16">&#167;16</a><br/>Text From Files - <a href="3-tff.html#SP4">&#167;4</a><br/>Identifiers - <a href="3-idn.html#SP3">&#167;3</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">wn</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">wn</span><span class="plain-syntax">].</span><span class="element-syntax">lw_text</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::set_word_text</span><button class="popup" onclick="togglePopup('usagePopup17')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup17">Usage of <span class="code-font"><span class="function-syntax">Lexer::set_word_text</span></span>:<br/>Vocabulary - <a href="2-vcb.html#SP4">&#167;4</a><br/>Numbered Words - <a href="2-nw.html#SP7">&#167;7</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">wn</span><span class="plain-syntax">, </span><span class="identifier-syntax">wchar_t</span><span class="plain-syntax"> *</span><span class="identifier-syntax">rt</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">wn</span><span class="plain-syntax">].</span><span class="element-syntax">lw_text</span><span class="plain-syntax"> = </span><span class="identifier-syntax">rt</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::word_copy</span><button class="popup" onclick="togglePopup('usagePopup18')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup18">Usage of <span class="code-font"><span class="function-syntax">Lexer::word_copy</span></span>:<br/><a href="3-lxr.html#SP29">&#167;29</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">to</span><span class="plain-syntax">, </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">from</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">to</span><span class="plain-syntax">] = </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">from</span><span class="plain-syntax">];</span>
<span class="plain-syntax">}</span>
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::writer</span><button class="popup" onclick="togglePopup('usagePopup19')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup19">Usage of <span class="code-font"><span class="function-syntax">Lexer::writer</span></span>:<br/>Words Module - <a href="1-wm.html#SP3">&#167;3</a></span></button><span class="plain-syntax">(</span><span class="identifier-syntax">OUTPUT_STREAM</span><span class="plain-syntax">, </span><span class="reserved-syntax">char</span><span class="plain-syntax"> *</span><span class="identifier-syntax">format_string</span><span class="plain-syntax">, </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">wn</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">wn</span><span class="plain-syntax"> &lt; </span><span class="constant-syntax">0</span><span class="plain-syntax">) || (</span><span class="identifier-syntax">wn</span><span class="plain-syntax"> &gt;= </span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">)) </span><span class="reserved-syntax">return</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">switch</span><span class="plain-syntax"> (</span><span class="identifier-syntax">format_string</span><span class="plain-syntax">[0]) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">case</span><span class="plain-syntax"> </span><span class="character-syntax">'+'</span><span class="plain-syntax">: </span><span class="identifier-syntax">WRITE</span><span class="plain-syntax">(</span><span class="string-syntax">"%w"</span><span class="plain-syntax">, </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">wn</span><span class="plain-syntax">].</span><span class="element-syntax">lw_rawtext</span><span class="plain-syntax">); </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">case</span><span class="plain-syntax"> </span><span class="character-syntax">'~'</span><span class="plain-syntax">:</span>
<span class="plain-syntax"> </span><a href="2-nw.html#SP8" class="function-link"><span class="function-syntax">Word::compile_to_I6_dictionary</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">OUT</span><span class="plain-syntax">, </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">wn</span><span class="plain-syntax">].</span><span class="element-syntax">lw_text</span><span class="plain-syntax">, </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">case</span><span class="plain-syntax"> </span><span class="character-syntax">'&lt;'</span><span class="plain-syntax">:</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">STREAM_USES_UTF8</span><span class="plain-syntax">(</span><span class="identifier-syntax">OUT</span><span class="plain-syntax">)) </span><span class="identifier-syntax">Streams::enable_XML_escapes</span><span class="plain-syntax">(</span><span class="identifier-syntax">OUT</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">WRITE</span><span class="plain-syntax">(</span><span class="string-syntax">"%w"</span><span class="plain-syntax">, </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">wn</span><span class="plain-syntax">].</span><span class="element-syntax">lw_rawtext</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">STREAM_USES_UTF8</span><span class="plain-syntax">(</span><span class="identifier-syntax">OUT</span><span class="plain-syntax">)) </span><span class="identifier-syntax">Streams::disable_XML_escapes</span><span class="plain-syntax">(</span><span class="identifier-syntax">OUT</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">case</span><span class="plain-syntax"> </span><span class="character-syntax">'N'</span><span class="plain-syntax">: </span><span class="identifier-syntax">WRITE</span><span class="plain-syntax">(</span><span class="string-syntax">"%w"</span><span class="plain-syntax">, </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">wn</span><span class="plain-syntax">].</span><span class="element-syntax">lw_text</span><span class="plain-syntax">); </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">default:</span><span class="plain-syntax"> </span><span class="identifier-syntax">internal_error</span><span class="plain-syntax">(</span><span class="string-syntax">"bad %N extension"</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax">}</span>
</pre>
<p class="commentary firstcommentary"><a id="SP20" class="paragraph-anchor"></a><b>&#167;20. Definition of white space. </b>The following macro (to save time over a function call) is highly dangerous,
and of the kind which all books on C counsel against: its argument is expanded
up to three times, so if it were called with any argument whose evaluation had
side-effects, disaster would ensue. It is therefore used only twice, with care,
and only further down in this section.
</p>
<pre class="definitions code-font"><span class="definition-keyword">define</span> <span class="identifier-syntax">is_whitespace</span><span class="plain-syntax">(</span><span class="identifier-syntax">c</span><span class="plain-syntax">) ((</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="character-syntax">' '</span><span class="plain-syntax">) || (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="character-syntax">'\n'</span><span class="plain-syntax">) || (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="character-syntax">'\t'</span><span class="plain-syntax">))</span>
</pre>
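<p class="commentary">A minimal sketch of the hazard, not part of the lexer itself: the macro is
copied from the definition above, while <span class="extract"><span class="extract-syntax">next_char</span></span>, <span class="extract"><span class="extract-syntax">count_macro_reads</span></span> and
<span class="extract"><span class="extract-syntax">count_safe_reads</span></span> are hypothetical names invented for this illustration. An
argument which advances a cursor may be evaluated once, twice or three times,
depending on where the chain of comparisons stops.
</p>
<pre class="displayed-code all-displayed-code code-font">
#define is_whitespace(c) ((c == ' ') || (c == '\n') || (c == '\t'))

/* Hypothetical cursor over a string: returns the next character and advances. */
static const char *cursor;
static int reads;
static int next_char(void) { reads++; return *cursor++; }

/* Unsafe: the argument has a side-effect, and the macro expands it three times. */
int count_macro_reads(const char *text) {
    cursor = text; reads = 0;
    (void) is_whitespace(next_char());
    return reads;
}

/* Safe: evaluate the argument once into a local, then test the local. */
int count_safe_reads(const char *text) {
    cursor = text; reads = 0;
    int c = next_char();
    (void) is_whitespace(c);
    return reads;
}
</pre>
<p class="commentary">On the text <span class="extract"><span class="extract-syntax">"abc"</span></span> the unsafe version consumes three characters in a single
test, since none of the comparisons succeeds; the safe version always consumes one.
</p>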
<p class="commentary firstcommentary"><a id="SP21" class="paragraph-anchor"></a><b>&#167;21. Internal lexer states. </b>The current situation of the lexer is specified by the collective values
of all of the following. First, the start of the current word being
recorded, and the current high water mark &mdash; those are defined above.
Second, we need the feeder machinery to maintain a variable telling us
the previous character in the raw, un-respaced source. We need to be a
little careful about the type of this: it needs to be an <span class="extract"><span class="extract-syntax">int</span></span> so that it
can on occasion hold the pseudo-character value <span class="extract"><span class="extract-syntax">EOF</span></span>.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">lxs_previous_char_in_raw_feed</span><span class="plain-syntax">; </span><span class="comment-syntax"> Preceding character in raw file read</span>
</pre>
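<p class="commentary">A minimal sketch of why the type matters, under the assumption that characters
are fed as non-negative values: <span class="extract"><span class="extract-syntax">previous_char</span></span> and <span class="extract"><span class="extract-syntax">note_char</span></span> are hypothetical
stand-ins, not names used by the lexer. Since <span class="extract"><span class="extract-syntax">EOF</span></span> is negative, an <span class="extract"><span class="extract-syntax">int</span></span> can
hold every character value, including 0xFF, without any of them being mistaken for it.
</p>
<pre class="displayed-code all-displayed-code code-font">
#include &lt;stdio.h&gt;   /* for EOF */

/* Hypothetical stand-in for lxs_previous_char_in_raw_feed. */
static int previous_char = EOF;

/* Note a newly fed character; return 1 if a real character preceded it. */
int note_char(int c) {
    int had_predecessor = (previous_char != EOF);
    previous_char = c;
    return had_predecessor;
}
</pre>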
<p class="commentary firstcommentary"><a id="SP22" class="paragraph-anchor"></a><b>&#167;22. </b>There are four kinds of word: ordinary words, [comments in square brackets],
"strings in double quotes," and <span class="extract"><span class="extract-syntax">(- I6_inclusion_text -)</span></span>. The latter
three kinds are collectively called literals. As each word is read,
the variable <span class="extract"><span class="extract-syntax">lxs_kind_of_word</span></span> holds what it is currently believed to be.
</p>
<pre class="definitions code-font"><span class="definition-keyword">define</span> <span class="constant-syntax">ORDINARY_KW</span><span class="plain-syntax"> </span><span class="constant-syntax">0</span>
<span class="definition-keyword">define</span> <span class="constant-syntax">COMMENT_KW</span><span class="plain-syntax"> </span><span class="constant-syntax">1</span>
<span class="definition-keyword">define</span> <span class="constant-syntax">STRING_KW</span><span class="plain-syntax"> </span><span class="constant-syntax">2</span>
<span class="definition-keyword">define</span> <span class="constant-syntax">I6_INCLUSION_KW</span><span class="plain-syntax"> </span><span class="constant-syntax">3</span>
</pre>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">lxs_kind_of_word</span><span class="plain-syntax">; </span><span class="comment-syntax"> One of the defined values above</span>
</pre>
<p class="commentary firstcommentary"><a id="SP23" class="paragraph-anchor"></a><b>&#167;23. </b>Although there is a pile of state variables below, the basic situation is that
the lexer has two main modes: ordinary mode and literal mode, determined
by whether <span class="extract"><span class="extract-syntax">lxs_literal_mode</span></span> is false or true. It might look as if this
variable is redundant &mdash; can't we simply see whether <span class="extract"><span class="extract-syntax">lxs_kind_of_word</span></span>
is <span class="extract"><span class="extract-syntax">ORDINARY_KW</span></span> or not? &mdash; but in fact we return to ordinary mode slightly
before we finish recording a literal, as we shall see, so it is important
to be able to switch in and out of literal mode without changing the kind
of word.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">lxs_literal_mode</span><span class="plain-syntax">; </span><span class="comment-syntax"> Are we in literal or ordinary mode?</span>
<span class="comment-syntax"> significant in ordinary mode:</span>
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">lxs_most_significant_space_char</span><span class="plain-syntax">; </span><span class="comment-syntax"> Most significant whitespace character preceding</span>
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">lxs_number_of_tab_stops</span><span class="plain-syntax">; </span><span class="comment-syntax"> Number of consecutive tabs</span>
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">lxs_this_line_is_empty_so_far</span><span class="plain-syntax">; </span><span class="comment-syntax"> Is the current line only white space so far?</span>
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">lxs_this_word_is_empty_so_far</span><span class="plain-syntax">; </span><span class="comment-syntax"> Looking for a word to start?</span>
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">lxs_scanning_text_substitution</span><span class="plain-syntax">; </span><span class="comment-syntax"> Used to break up strings at [substitutions]</span>
<span class="comment-syntax"> significant in literal mode:</span>
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">lxs_comment_nesting</span><span class="plain-syntax">; </span><span class="comment-syntax"> For square brackets within square brackets</span>
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">lxs_string_soak_up_spaces_mode</span><span class="plain-syntax">; </span><span class="comment-syntax"> Used to fold strings which break across lines</span>
</pre>
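<p class="commentary">The distinction between mode and kind can be pictured with a toy model. This is
a sketch under simplifying assumptions, not the lexer's actual logic: it handles
only double-quoted strings, treats a space as the point at which a word is
recorded, and every <span class="extract"><span class="extract-syntax">toy_</span></span> name is hypothetical.
</p>
<pre class="displayed-code all-displayed-code code-font">
#define ORDINARY_KW 0
#define STRING_KW 2

static int toy_kind_of_word = ORDINARY_KW;
static int toy_literal_mode = 0;

void toy_reset(void) { toy_kind_of_word = ORDINARY_KW; toy_literal_mode = 0; }

void toy_feed(int c) {
    if (toy_literal_mode) {
        if (c == '"') toy_literal_mode = 0;   /* closing quote: mode ends, kind remains */
    } else if (c == '"') {
        toy_kind_of_word = STRING_KW;         /* opening quote: a string begins */
        toy_literal_mode = 1;
    } else if (c == ' ') {
        toy_kind_of_word = ORDINARY_KW;       /* word boundary: expect an ordinary word next */
    }
}

int toy_kind(void) { return toy_kind_of_word; }
int toy_in_literal(void) { return toy_literal_mode; }
</pre>
<p class="commentary">After the closing quote has been fed, the toy model is back in ordinary mode
while the kind of word is still <span class="extract"><span class="extract-syntax">STRING_KW</span></span>, which is the situation a single
variable could not represent.
</p>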
<p class="commentary firstcommentary"><a id="SP24" class="paragraph-anchor"></a><b>&#167;24. </b>The lexer needs to be reset each time it is used on a given feed of text,
whether from a file or internally. Note that this resets both external
and internal states to their defaults (the default for most external states
being "off": the exceptions are that the standard punctuation marks are
selected and I6 escapes are allowed).
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::reset_lexer</span><button class="popup" onclick="togglePopup('usagePopup20')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup20">Usage of <span class="code-font"><span class="function-syntax">Lexer::reset_lexer</span></span>:<br/><a href="3-lxr.html#SP25">&#167;25</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">void</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_word</span><span class="plain-syntax"> = </span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_previous_char_in_raw_feed</span><span class="plain-syntax"> = </span><span class="identifier-syntax">EOF</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="comment-syntax"> reset the external states</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_wait_for_dashes</span><span class="plain-syntax"> = </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_punctuation_marks</span><span class="plain-syntax"> = </span><span class="constant-syntax">STANDARD_PUNCTUATION_MARKS</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_divide_strings_at_text_substitutions</span><span class="plain-syntax"> = </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_allow_I6_escapes</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TRUE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_break_at_slashes</span><span class="plain-syntax"> = </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="comment-syntax"> reset the internal states</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_most_significant_space_char</span><span class="plain-syntax"> = </span><span class="character-syntax">'\n'</span><span class="plain-syntax">; </span><span class="comment-syntax"> we imagine each lexer feed starting a new line</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_number_of_tab_stops</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">; </span><span class="comment-syntax"> but not yet indented with tabs</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_this_line_is_empty_so_far</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TRUE</span><span class="plain-syntax">; </span><span class="comment-syntax"> clearly</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_this_word_is_empty_so_far</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TRUE</span><span class="plain-syntax">; </span><span class="comment-syntax"> likewise</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_literal_mode</span><span class="plain-syntax"> = </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">; </span><span class="comment-syntax"> begin in ordinary mode...</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_kind_of_word</span><span class="plain-syntax"> = </span><span class="constant-syntax">ORDINARY_KW</span><span class="plain-syntax">; </span><span class="comment-syntax"> ...expecting an ordinary word</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_string_soak_up_spaces_mode</span><span class="plain-syntax"> = </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_scanning_text_substitution</span><span class="plain-syntax"> = </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_comment_nesting</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
</pre>
<p class="commentary firstcommentary"><a id="SP25" class="paragraph-anchor"></a><b>&#167;25. Feeding the lexer. </b>The lexer takes its input as a stream of characters, sent from a "feeder
routine": there are two of these, one sending the stream from a file, the
other from a C string. A feeder routine is required to:
</p>
<ul class="items"><li>(1) call <span class="extract"><span class="extract-syntax">Lexer::feed_begins</span></span> before sending the first character,
</li><li>(2) send ISO Latin-1 characters which also exist in ZSCII, in sequence,
via <span class="extract"><span class="extract-syntax">Lexer::feed_triplet</span></span>,
</li><li>(3) conclude by calling <span class="extract"><span class="extract-syntax">Lexer::feed_ends</span></span>.
</li></ul>
<p class="commentary">Only one feeder can be active at a time, as the following routines ensure.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">lexer_feed_started_at</span><span class="plain-syntax"> = -1;</span>
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::feed_begins</span><button class="popup" onclick="togglePopup('usagePopup21')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup21">Usage of <span class="code-font"><span class="function-syntax">Lexer::feed_begins</span></span>:<br/>Text From Files - <a href="3-tff.html#SP2">&#167;2</a><br/>Feeds - <a href="3-fds.html#SP4_1">&#167;4.1</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">source_location</span><span class="plain-syntax"> </span><span class="identifier-syntax">sl</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lexer_feed_started_at</span><span class="plain-syntax"> &gt;= </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="identifier-syntax">internal_error</span><span class="plain-syntax">(</span><span class="string-syntax">"one lexer feeder interrupted another"</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_feed_started_at</span><span class="plain-syntax"> = </span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_position</span><span class="plain-syntax"> = </span><span class="identifier-syntax">sl</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP24" class="function-link"><span class="function-syntax">Lexer::reset_lexer</span></a><span class="plain-syntax">();</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LOGIF</span><span class="plain-syntax">(</span><span class="identifier-syntax">LEXICAL_OUTPUT</span><span class="plain-syntax">, </span><span class="string-syntax">"Lexer feed began at %d\n"</span><span class="plain-syntax">, </span><span class="identifier-syntax">lexer_feed_started_at</span><span class="plain-syntax">);</span>
<span class="plain-syntax">}</span>
<span class="reserved-syntax">wording</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::feed_ends</span><button class="popup" onclick="togglePopup('usagePopup22')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup22">Usage of <span class="code-font"><span class="function-syntax">Lexer::feed_ends</span></span>:<br/>Text From Files - <a href="3-tff.html#SP2">&#167;2</a><br/>Feeds - <a href="3-fds.html#SP4_2">&#167;4.2</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">extra_padding</span><span class="plain-syntax">, </span><span class="identifier-syntax">text_stream</span><span class="plain-syntax"> *</span><span class="identifier-syntax">problem_source_description</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lexer_feed_started_at</span><span class="plain-syntax"> == -1) </span><span class="identifier-syntax">internal_error</span><span class="plain-syntax">(</span><span class="string-syntax">"lexer feeder ended without starting"</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="3-lxr.html#SP25_1" class="named-paragraph-link"><span class="named-paragraph">Feed whitespace as padding</span><span class="named-paragraph-number">25.1</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">wording</span><span class="plain-syntax"> </span><span class="identifier-syntax">RRW</span><span class="plain-syntax"> = </span><span class="constant-syntax">EMPTY_WORDING</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lexer_feed_started_at</span><span class="plain-syntax"> &lt; </span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">)</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">RRW</span><span class="plain-syntax"> = </span><a href="3-wrd.html#SP5" class="function-link"><span class="function-syntax">Wordings::new</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">lexer_feed_started_at</span><span class="plain-syntax">, </span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">-1);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_feed_started_at</span><span class="plain-syntax"> = -1;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LOGIF</span><span class="plain-syntax">(</span><span class="identifier-syntax">LEXICAL_OUTPUT</span><span class="plain-syntax">, </span><span class="string-syntax">"Lexer feed ended at %d\n"</span><span class="plain-syntax">, </span><a href="3-wrd.html#SP7" class="function-link"><span class="function-syntax">Wordings::first_wn</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">RRW</span><span class="plain-syntax">));</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="3-lxr.html#SP25_3" class="named-paragraph-link"><span class="named-paragraph">Issue Problem messages if feed ended in the middle of quoted text, comment or verbatim I6</span><span class="named-paragraph-number">25.3</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">RRW</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
</pre>
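<p class="commentary">The three obligations on a feeder can be sketched in isolation. The following is a minimal illustration only, not the code of either real feeder: the <span class="extract"><span class="extract-syntax">mock_</span></span> functions are hypothetical stand-ins which merely record what they are sent, and <span class="extract"><span class="extract-syntax">feed_c_string</span></span> is an invented name. It assumes that a feeder marks the two ends of its stream with <span class="extract"><span class="extract-syntax">EOF</span></span> in the neighbouring positions of the triplet.
</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the lexer's entry points: they only record
   what a feeder sends, so that the calling protocol can be inspected. */
#define MAX_TRIPLETS 64
static int feed_active = 0;             /* plays the role of lexer_feed_started_at >= 0 */
static int triplets[MAX_TRIPLETS][3];   /* each row holds last_cr, cr, next_cr */
static int triplet_count = 0;

static void mock_feed_begins(void) {
    assert(feed_active == 0);           /* one feeder must not interrupt another */
    feed_active = 1;
    triplet_count = 0;
}

static void mock_feed_triplet(int last_cr, int cr, int next_cr) {
    assert(feed_active == 1);           /* characters may only arrive inside a feed */
    assert(cr != EOF);                  /* the middle character is never EOF */
    assert(triplet_count < MAX_TRIPLETS);
    triplets[triplet_count][0] = last_cr;
    triplets[triplet_count][1] = cr;
    triplets[triplet_count][2] = next_cr;
    triplet_count++;
}

static void mock_feed_ends(void) {
    assert(feed_active == 1);           /* a feed cannot end without starting */
    feed_active = 0;
}

/* A toy feeder for a C string: begin, send each character together with
   its neighbours (EOF at either end of the text), then end. */
static void feed_c_string(const char *text) {
    size_t n = strlen(text);
    mock_feed_begins();
    for (size_t i = 0; i < n; i++) {
        int last_cr = (i == 0) ? EOF : (unsigned char) text[i - 1];
        int next_cr = (i + 1 == n) ? EOF : (unsigned char) text[i + 1];
        mock_feed_triplet(last_cr, (unsigned char) text[i], next_cr);
    }
    mock_feed_ends();
}
```

<p class="commentary">Calling <span class="extract"><span class="extract-syntax">feed_c_string("ab.")</span></span> records three triplets, the first beginning with <span class="extract"><span class="extract-syntax">EOF</span></span> and the last ending with it, and leaves no feed active afterwards.
</p>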
<p class="commentary firstcommentary"><a id="SP25_1" class="paragraph-anchor"></a><b>&#167;25.1. </b>Whitespace padding guarantees that a word running right up to the end of
the feed will be processed, since (outside literal mode) that whitespace
signals to the lexer that a word is complete. (If we are still in literal mode
at the end of the feed, problem messages are produced. We code Inform to ensure
that this never occurs when feeding our own C strings through.)
</p>
<p class="commentary">At the end of each complete file, we also want to ensure there is always a
paragraph break, because this simplifies the parsing of headings: a file
boundary counts as a super-heading-break, and headings are only detected as
stand-alone paragraphs. We add a little more whitespace than is strictly
necessary (a space, four newlines and a final space), for two reasons. First,
it saves worrying about whether it is safe to look ahead to characters further
on in the lexer's workspace when we are close to the high water mark. Second,
it means that a source file which is empty, or which contains only a
byte-order marker, comes out as at least one paragraph, even if a blank one.
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Feed whitespace as padding</span><span class="named-paragraph-number">25.1</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">extra_padding</span><span class="plain-syntax"> == </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP27" class="function-link"><span class="function-syntax">Lexer::feed_char_into_lexer</span></a><span class="plain-syntax">(</span><span class="character-syntax">' '</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> {</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP27" class="function-link"><span class="function-syntax">Lexer::feed_char_into_lexer</span></a><span class="plain-syntax">(</span><span class="character-syntax">' '</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP27" class="function-link"><span class="function-syntax">Lexer::feed_char_into_lexer</span></a><span class="plain-syntax">(</span><span class="character-syntax">'\n'</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP27" class="function-link"><span class="function-syntax">Lexer::feed_char_into_lexer</span></a><span class="plain-syntax">(</span><span class="character-syntax">'\n'</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP27" class="function-link"><span class="function-syntax">Lexer::feed_char_into_lexer</span></a><span class="plain-syntax">(</span><span class="character-syntax">'\n'</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP27" class="function-link"><span class="function-syntax">Lexer::feed_char_into_lexer</span></a><span class="plain-syntax">(</span><span class="character-syntax">'\n'</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP27" class="function-link"><span class="function-syntax">Lexer::feed_char_into_lexer</span></a><span class="plain-syntax">(</span><span class="character-syntax">' '</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="3-lxr.html#SP25">&#167;25</a>.</li></ul>
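<p class="commentary">Why the padding matters can be seen in a toy model. The sketch below is not the lexer: it is a deliberately simplified word-splitter, with invented names throughout, which (like the lexer outside literal mode) only records a word once whitespace arrives after it. Without a trailing space, the final word is left stranded in the buffer.
</p>

```c
#include <string.h>

/* A toy word-splitter (all names hypothetical). A word is only recorded
   when a whitespace character arrives after it. */
#define TOY_MAX_WORDS 16
#define TOY_MAX_LEN 32
static char toy_words[TOY_MAX_WORDS][TOY_MAX_LEN];
static int toy_word_count = 0;
static char toy_current[TOY_MAX_LEN];
static int toy_current_len = 0;

static void toy_feed_char(int c) {
    if (c == ' ' || c == '\t' || c == '\n') {
        /* whitespace completes the word in progress, if there is one */
        if (toy_current_len > 0 && toy_word_count < TOY_MAX_WORDS) {
            toy_current[toy_current_len] = '\0';
            strcpy(toy_words[toy_word_count++], toy_current);
        }
        toy_current_len = 0;
    } else if (toy_current_len < TOY_MAX_LEN - 1) {
        toy_current[toy_current_len++] = (char) c;
    }
}

/* Feed a whole string, optionally followed by one space of padding, and
   return the number of words which were actually recorded. */
static int toy_feed(const char *text, int pad) {
    toy_word_count = 0;
    toy_current_len = 0;
    for (const char *p = text; *p; p++) toy_feed_char((unsigned char) *p);
    if (pad) toy_feed_char(' ');
    return toy_word_count;
}
```

<p class="commentary">Here <span class="extract"><span class="extract-syntax">toy_feed("hello world", 0)</span></span> records only one word, whereas <span class="extract"><span class="extract-syntax">toy_feed("hello world", 1)</span></span> records both.
</p>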
<p class="commentary firstcommentary"><a id="SP25_2" class="paragraph-anchor"></a><b>&#167;25.2. </b>These problem messages can, of course, never result from text which Inform
is feeding into the lexer itself, independently of source files. That would
be a bug, and Inform is bug-free, so it follows that it could never happen.
</p>
<pre class="definitions code-font"><span class="definition-keyword">enum</span> <span class="constant-syntax">MEMORY_OUT_LEXERERROR</span><span class="plain-syntax"> </span><span class="identifier-syntax">from</span><span class="plain-syntax"> </span><span class="constant-syntax">0</span>
<span class="definition-keyword">enum</span> <span class="constant-syntax">STRING_NEVER_ENDS_LEXERERROR</span>
<span class="definition-keyword">enum</span> <span class="constant-syntax">COMMENT_NEVER_ENDS_LEXERERROR</span>
<span class="definition-keyword">enum</span> <span class="constant-syntax">I6_NEVER_ENDS_LEXERERROR</span>
</pre>
<p class="commentary firstcommentary"><a id="SP25_3" class="paragraph-anchor"></a><b>&#167;25.3. </b><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Issue Problem messages if feed ended in the middle of quoted text, comment or verbatim I6</span><span class="named-paragraph-number">25.3</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lxs_kind_of_word</span><span class="plain-syntax"> != </span><span class="constant-syntax">ORDINARY_KW</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax"> &gt;= </span><span class="constant-syntax">20</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LOG</span><span class="plain-syntax">(</span><span class="string-syntax">"Last words: %W\n"</span><span class="plain-syntax">, </span><a href="3-wrd.html#SP5" class="function-link"><span class="function-syntax">Wordings::new</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">-20, </span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">-1));</span>
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax"> &gt;= </span><span class="constant-syntax">1</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LOG</span><span class="plain-syntax">(</span><span class="string-syntax">"Last words: %W\n"</span><span class="plain-syntax">, </span><a href="3-wrd.html#SP5" class="function-link"><span class="function-syntax">Wordings::new</span></a><span class="plain-syntax">(0, </span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">-1));</span>
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LOG</span><span class="plain-syntax">(</span><span class="string-syntax">"No words recorded\n"</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lxs_kind_of_word</span><span class="plain-syntax"> == </span><span class="constant-syntax">STRING_KW</span><span class="plain-syntax">)</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP30" class="function-link"><span class="function-syntax">Lexer::lexer_problem_handler</span></a><span class="plain-syntax">(</span><span class="constant-syntax">STRING_NEVER_ENDS_LEXERERROR</span><span class="plain-syntax">, </span><span class="identifier-syntax">problem_source_description</span><span class="plain-syntax">, </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lxs_kind_of_word</span><span class="plain-syntax"> == </span><span class="constant-syntax">COMMENT_KW</span><span class="plain-syntax">)</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP30" class="function-link"><span class="function-syntax">Lexer::lexer_problem_handler</span></a><span class="plain-syntax">(</span><span class="constant-syntax">COMMENT_NEVER_ENDS_LEXERERROR</span><span class="plain-syntax">, </span><span class="identifier-syntax">problem_source_description</span><span class="plain-syntax">, </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lxs_kind_of_word</span><span class="plain-syntax"> == </span><span class="constant-syntax">I6_INCLUSION_KW</span><span class="plain-syntax">)</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP30" class="function-link"><span class="function-syntax">Lexer::lexer_problem_handler</span></a><span class="plain-syntax">(</span><span class="constant-syntax">I6_NEVER_ENDS_LEXERERROR</span><span class="plain-syntax">, </span><span class="identifier-syntax">problem_source_description</span><span class="plain-syntax">, </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_kind_of_word</span><span class="plain-syntax"> = </span><span class="constant-syntax">ORDINARY_KW</span><span class="plain-syntax">;</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="3-lxr.html#SP25">&#167;25</a>.</li></ul>
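<p class="commentary">The two decisions made above (which words to log, and which error to raise) can be restated as a pair of small pure functions. This is only an illustrative sketch: the <span class="extract"><span class="extract-syntax">TOY_</span></span> constants and both function names are hypothetical, and the numeric values of the kind-of-word constants are assumptions, since they are defined elsewhere.
</p>

```c
/* Hypothetical mirror of the enumerations used above; the TOY_ kind-of-word
   values are assumptions made for this sketch only. */
enum { TOY_ORDINARY_KW, TOY_STRING_KW, TOY_COMMENT_KW, TOY_I6_INCLUSION_KW };
enum { TOY_MEMORY_OUT, TOY_STRING_NEVER_ENDS, TOY_COMMENT_NEVER_ENDS, TOY_I6_NEVER_ENDS };

typedef struct {
    int first;  /* first word number to log */
    int last;   /* last word number to log; first > last means no words */
} toy_word_range;

/* Which words would the "Last words" log line cover? At most the final
   window-many words, fewer if the lexer has not yet seen that many. */
static toy_word_range toy_last_words(int wordcount, int window) {
    toy_word_range r;
    if (wordcount >= window) { r.first = wordcount - window; r.last = wordcount - 1; }
    else if (wordcount >= 1) { r.first = 0; r.last = wordcount - 1; }
    else { r.first = 0; r.last = -1; }
    return r;
}

/* Which error, if any, corresponds to the kind of word left unfinished
   when the feed ended? Returns -1 when the feed ended cleanly. */
static int toy_error_for(int kind_of_word) {
    switch (kind_of_word) {
        case TOY_STRING_KW: return TOY_STRING_NEVER_ENDS;
        case TOY_COMMENT_KW: return TOY_COMMENT_NEVER_ENDS;
        case TOY_I6_INCLUSION_KW: return TOY_I6_NEVER_ENDS;
        default: return -1;
    }
}
```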
<p class="commentary firstcommentary"><a id="SP26" class="paragraph-anchor"></a><b>&#167;26. </b>The feeder routine is required to send us a triplet each time. <span class="extract"><span class="extract-syntax">cr</span></span>
must be a valid character (see above) and may not be <span class="extract"><span class="extract-syntax">EOF</span></span>; <span class="extract"><span class="extract-syntax">last_cr</span></span> must
be the character before it, or <span class="extract"><span class="extract-syntax">EOF</span></span> at the start of the feed;
and <span class="extract"><span class="extract-syntax">next_cr</span></span> must be the character after it, or <span class="extract"><span class="extract-syntax">EOF</span></span> at the end of the feed.
</p>
<p class="commentary">Spaces, often redundant, are inserted around punctuation unless one of the
following exceptions holds:
</p>
<p class="commentary">The lexer is in literal mode (inside strings, for instance);
</p>
<p class="commentary">Where a single punctuation mark occurs in between two digits, or between
a digit and a following minus sign, or (in the case of full stops) between two
characters each of which is a digit or a lower-case letter. This is done so
that, for instance, "0.91" does not split into three words in the lexer. Square
brackets are exempt from this exception, and so are always spaced out, because
otherwise there would be trouble in parsing
</p>
<blockquote>
<p>say "[if M is less than 10]0[otherwise]1";</p>
</blockquote>
<p class="commentary">where the <span class="extract"><span class="extract-syntax">0]0</span></span> would go unbroken in <span class="extract"><span class="extract-syntax">lexer_divide_strings_at_text_substitutions</span></span>
mode, and therefore the <span class="extract"><span class="extract-syntax">]</span></span> would remain glued to the preceding text;
</p>
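<p class="commentary">Where the character following is a slash. (This is done essentially to make
most common URLs glue up as single words.) Conversely, if <span class="extract"><span class="extract-syntax">lexer_break_at_slashes</span></span>
is set, then a slash which counts as punctuation and is not itself followed by
another slash is spaced out even when it lies between two digits.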
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::feed_triplet</span><button class="popup" onclick="togglePopup('usagePopup23')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup23">Usage of <span class="code-font"><span class="function-syntax">Lexer::feed_triplet</span></span>:<br/>Text From Files - <a href="3-tff.html#SP2">&#167;2</a><br/>Feeds - <a href="3-fds.html#SP4">&#167;4</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">last_cr</span><span class="plain-syntax">, </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">cr</span><span class="plain-syntax">, </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">next_cr</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_previous_char_in_raw_feed</span><span class="plain-syntax"> = </span><span class="identifier-syntax">last_cr</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">space</span><span class="plain-syntax"> = </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><a href="3-lxr.html#SP17" class="function-link"><span class="function-syntax">Lexer::is_punctuation</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">cr</span><span class="plain-syntax">)) </span><span class="identifier-syntax">space</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TRUE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">space</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">lxs_literal_mode</span><span class="plain-syntax">)) </span><span class="identifier-syntax">space</span><span class="plain-syntax"> = </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">space</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">cr</span><span class="plain-syntax"> != </span><span class="character-syntax">'['</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">cr</span><span class="plain-syntax"> != </span><span class="character-syntax">']'</span><span class="plain-syntax">)) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">next_cr</span><span class="plain-syntax"> == </span><span class="character-syntax">'/'</span><span class="plain-syntax">) </span><span class="identifier-syntax">space</span><span class="plain-syntax"> = </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span><span class="plain-syntax"> {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">lc</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">, </span><span class="identifier-syntax">nc</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">Characters::isdigit</span><span class="plain-syntax">((</span><span class="identifier-syntax">wchar_t</span><span class="plain-syntax">) </span><span class="identifier-syntax">last_cr</span><span class="plain-syntax">)) </span><span class="identifier-syntax">lc</span><span class="plain-syntax"> = </span><span class="constant-syntax">1</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">last_cr</span><span class="plain-syntax"> &gt;= </span><span class="character-syntax">'a'</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">last_cr</span><span class="plain-syntax"> &lt;= </span><span class="character-syntax">'z'</span><span class="plain-syntax">)) </span><span class="identifier-syntax">lc</span><span class="plain-syntax"> = </span><span class="constant-syntax">2</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">Characters::isdigit</span><span class="plain-syntax">((</span><span class="identifier-syntax">wchar_t</span><span class="plain-syntax">) </span><span class="identifier-syntax">next_cr</span><span class="plain-syntax">)) </span><span class="identifier-syntax">nc</span><span class="plain-syntax"> = </span><span class="constant-syntax">1</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">next_cr</span><span class="plain-syntax"> == </span><span class="character-syntax">'-'</span><span class="plain-syntax">) </span><span class="identifier-syntax">nc</span><span class="plain-syntax"> = </span><span class="constant-syntax">1</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">next_cr</span><span class="plain-syntax"> &gt;= </span><span class="character-syntax">'a'</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">next_cr</span><span class="plain-syntax"> &lt;= </span><span class="character-syntax">'z'</span><span class="plain-syntax">)) </span><span class="identifier-syntax">nc</span><span class="plain-syntax"> = </span><span class="constant-syntax">2</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">lc</span><span class="plain-syntax"> == </span><span class="constant-syntax">1</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">nc</span><span class="plain-syntax"> == </span><span class="constant-syntax">1</span><span class="plain-syntax">)) </span><span class="identifier-syntax">space</span><span class="plain-syntax"> = </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">cr</span><span class="plain-syntax"> == </span><span class="character-syntax">'.'</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">lc</span><span class="plain-syntax"> &gt; </span><span class="constant-syntax">0</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">nc</span><span class="plain-syntax"> &gt; </span><span class="constant-syntax">0</span><span class="plain-syntax">)) </span><span class="identifier-syntax">space</span><span class="plain-syntax"> = </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">lexer_break_at_slashes</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">cr</span><span class="plain-syntax"> == </span><span class="character-syntax">'/'</span><span class="plain-syntax">)) </span><span class="identifier-syntax">space</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TRUE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">space</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP27" class="function-link"><span class="function-syntax">Lexer::feed_char_into_lexer</span></a><span class="plain-syntax">(</span><span class="character-syntax">' '</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP27" class="function-link"><span class="function-syntax">Lexer::feed_char_into_lexer</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">cr</span><span class="plain-syntax">); </span><span class="comment-syntax"> which might take us into literal mode, so to be careful...</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lxs_literal_mode</span><span class="plain-syntax"> == </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">) </span><a href="3-lxr.html#SP27" class="function-link"><span class="function-syntax">Lexer::feed_char_into_lexer</span></a><span class="plain-syntax">(</span><span class="character-syntax">' '</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><a href="3-lxr.html#SP27" class="function-link"><span class="function-syntax">Lexer::feed_char_into_lexer</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">cr</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">cr</span><span class="plain-syntax"> == </span><span class="character-syntax">'\n'</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">lexer_position</span><span class="plain-syntax">.</span><span class="element-syntax">file_of_origin</span><span class="plain-syntax">))</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_position</span><span class="plain-syntax">.</span><span class="element-syntax">line_number</span><span class="plain-syntax">++;</span>
<span class="plain-syntax">}</span>
</pre>
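<p class="commentary">To experiment with these rules, the spacing decision can be lifted out into a pure function. The sketch below follows the same sequence of tests as <span class="extract"><span class="extract-syntax">Lexer::feed_triplet</span></span>, but it is an illustration rather than the real thing: the function names are invented, the lexer's state is passed in as arguments, and the punctuation set in <span class="extract"><span class="extract-syntax">toy_is_punctuation</span></span> is an assumption (the real set is whatever <span class="extract"><span class="extract-syntax">Lexer::is_punctuation</span></span> currently accepts).
</p>

```c
#include <string.h>

/* Assumption: an illustrative punctuation set, including the slash. */
static int toy_is_punctuation(int c) {
    return (c > 0) && (c < 128) && (strchr(".,:;?!(){}[]/", c) != NULL);
}

static int toy_is_digit(int c) { return (c >= '0') && (c <= '9'); }
static int toy_is_lower(int c) { return (c >= 'a') && (c <= 'z'); }

/* Returns 1 if spaces should be inserted around cr, and 0 otherwise. */
static int toy_wants_spaces(int last_cr, int cr, int next_cr,
                            int literal_mode, int break_at_slashes) {
    int space = toy_is_punctuation(cr);
    if (space && literal_mode) space = 0;        /* never inside literals */
    if (space && (cr != '[') && (cr != ']')) {   /* brackets get no exceptions */
        if (next_cr == '/') space = 0;           /* helps URLs to glue up */
        else {
            int lc = 0, nc = 0;  /* 1 for digit-like, 2 for lower-case letter */
            if (toy_is_digit(last_cr)) lc = 1;
            if (toy_is_lower(last_cr)) lc = 2;
            if (toy_is_digit(next_cr)) nc = 1;
            if (next_cr == '-') nc = 1;
            if (toy_is_lower(next_cr)) nc = 2;
            if ((lc == 1) && (nc == 1)) space = 0;               /* e.g. 0.91 */
            if ((cr == '.') && (lc > 0) && (nc > 0)) space = 0;  /* e.g. e.g */
            if (break_at_slashes && (cr == '/')) space = 1;      /* e.g. 1/2 */
        }
    }
    return space;
}
```

<p class="commentary">Under these assumptions, the full stop in "0.91" is not spaced out, whereas the <span class="extract"><span class="extract-syntax">]</span></span> in <span class="extract"><span class="extract-syntax">0]0</span></span> is; and the slash in "1/2" is spaced out only when <span class="extract"><span class="extract-syntax">break_at_slashes</span></span> is set.
</p>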
<p class="commentary firstcommentary"><a id="SP27" class="paragraph-anchor"></a><b>&#167;27. Lexing one character at a time. </b>We can think of characters as a stream of differently-coloured marbles,
flowing from various sources into a hopper above our marble-sorting
machine. The hopper lets the marbles drop through one at a time into the
mechanism below, but inserts transparent glass marbles of its own on either
side of certain colours of marble, so that the sequence of marbles entering
the mechanism is no longer the same as that which entered the hopper.
Moreover, the mechanism can itself cause extra marbles of its choice to
drop in from time to time, further interrupting the original flow.
</p>
<p class="commentary">The following routine is the mechanism which receives the marbles. We want
the marbles to run swiftly through and either be pulverised to glass
powder, or dropped into the output bucket, as the mechanism chooses.
(Whatever marbles from the original source survive will always emerge in
their original order, though.) Every so often the mechanism decides that it
has completed one batch, and moves on to dropping marbles into the next
bucket.
</p>
<p class="commentary">The marbles are characters; transparent glass ones are whitespace, which
will always now be <span class="extract"><span class="extract-syntax">' '</span></span>, <span class="extract"><span class="extract-syntax">'\t'</span></span> or <span class="extract"><span class="extract-syntax">'\n'</span></span>; the routine <span class="extract"><span class="extract-syntax">Lexer::feed_triplet</span></span>
above was the hopper; the routine <span class="extract"><span class="extract-syntax">Lexer::feed_char_into_lexer</span></span>, which occupies
the whole of the rest of this section, is the mechanism which takes each marble
in turn. (On occasion it calls itself recursively to cause extra characters of
its choice to drop in.) The batches are words, and the bucket receiving the
surviving marbles is the sequence of characters starting at <span class="extract"><span class="extract-syntax">lexer_word</span></span> and
extending to <span class="extract"><span class="extract-syntax">lexer_hwm-1</span></span>.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::feed_char_into_lexer</span><button class="popup" onclick="togglePopup('usagePopup24')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup24">Usage of <span class="code-font"><span class="function-syntax">Lexer::feed_char_into_lexer</span></span>:<br/><a href="3-lxr.html#SP25_1">&#167;25.1</a>, <a href="3-lxr.html#SP26">&#167;26</a>, <a href="3-lxr.html#SP27_3">&#167;27.3</a>, <a href="3-lxr.html#SP27_8">&#167;27.8</a>, <a href="3-lxr.html#SP27_9">&#167;27.9</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">c</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP14" class="function-link"><span class="function-syntax">Lexer::ensure_lexer_hwm_can_be_raised_by</span></a><span class="plain-syntax">(</span><span class="constant-syntax">MAX_WORD_LENGTH</span><span class="plain-syntax">, </span><span class="identifier-syntax">TRUE</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lxs_literal_mode</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="3-lxr.html#SP27_7" class="named-paragraph-link"><span class="named-paragraph">Contemplate leaving literal mode</span><span class="named-paragraph-number">27.7</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lxs_kind_of_word</span><span class="plain-syntax"> == </span><span class="constant-syntax">STRING_KW</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="3-lxr.html#SP27_8" class="named-paragraph-link"><span class="named-paragraph">Force string division at the start of a text substitution, if necessary</span><span class="named-paragraph-number">27.8</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="3-lxr.html#SP27_4" class="named-paragraph-link"><span class="named-paragraph">Soak up whitespace around line breaks inside a literal string</span><span class="named-paragraph-number">27.4</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="comment-syntax"> whitespace outside literal mode ends any partly built word and need not be recorded</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">lxs_literal_mode</span><span class="plain-syntax"> == </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">is_whitespace</span><span class="plain-syntax">(</span><span class="identifier-syntax">c</span><span class="plain-syntax">))) {</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="3-lxr.html#SP27_1" class="named-paragraph-link"><span class="named-paragraph">Admire the texture of the whitespace</span><span class="named-paragraph-number">27.1</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lexer_word</span><span class="plain-syntax"> != </span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">) </span><span class="named-paragraph-container code-font"><a href="3-lxr.html#SP27_5" class="named-paragraph-link"><span class="named-paragraph">Complete the current word</span><span class="named-paragraph-number">27.5</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="character-syntax">'\n'</span><span class="plain-syntax">) </span><span class="named-paragraph-container code-font"><a href="3-lxr.html#SP27_3" class="named-paragraph-link"><span class="named-paragraph">Line break outside a literal</span><span class="named-paragraph-number">27.3</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="comment-syntax"> otherwise record the current character as part of the word being built</span>
<span class="plain-syntax"> *(</span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">++) = (</span><span class="identifier-syntax">wchar_t</span><span class="plain-syntax">) </span><span class="identifier-syntax">c</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lxs_scanning_text_substitution</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="3-lxr.html#SP27_9" class="named-paragraph-link"><span class="named-paragraph">Force string division at the end of a text substitution, if necessary</span><span class="named-paragraph-number">27.9</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lxs_this_word_is_empty_so_far</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="3-lxr.html#SP27_2" class="named-paragraph-link"><span class="named-paragraph">Look at recent whitespace to see what break it followed</span><span class="named-paragraph-number">27.2</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="3-lxr.html#SP27_6" class="named-paragraph-link"><span class="named-paragraph">Contemplate entering literal mode</span><span class="named-paragraph-number">27.6</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_this_word_is_empty_so_far</span><span class="plain-syntax"> = </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_this_line_is_empty_so_far</span><span class="plain-syntax"> = </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
</pre>
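<p class="commentary">As a rough illustration of the mechanism only, and not the lexer's own code,
the following sketch receives characters one at a time and ends a word whenever
whitespace arrives. The names <span class="extract"><span class="extract-syntax">toy_feed_char</span></span>, <span class="extract"><span class="extract-syntax">toy_word</span></span> and <span class="extract"><span class="extract-syntax">toy_hwm</span></span>, and the
fixed-size arrays, are assumptions made for this example; literal mode, breaks
and paragraph dividers are all left out.
</p>
<pre class="displayed-code all-displayed-code code-font">
#define TOY_WORKSPACE 256  /* hypothetical workspace size, for this sketch only */
#define TOY_MAX_WORDS 32   /* hypothetical limit on recorded words */

static char toy_workspace[TOY_WORKSPACE];
static char *toy_word = toy_workspace;  /* start of the word being built */
static char *toy_hwm = toy_workspace;   /* high water mark: next free cell */
static char *toy_words[TOY_MAX_WORDS];  /* completed words, in order */
static int toy_wordcount = 0;

static int toy_is_whitespace(int c) {
    return (c == ' ') || (c == '\t') || (c == '\n');
}

/* Receive one character. Whitespace completes any partly built word and is
   not recorded; anything else is stored at the high water mark. */
void toy_feed_char(int c) {
    if (toy_is_whitespace(c)) {
        if ((toy_word != toy_hwm) &amp;&amp; (toy_wordcount &lt; TOY_MAX_WORDS)) {
            *toy_hwm++ = 0;                       /* terminate as a C string */
            toy_words[toy_wordcount++] = toy_word;
            toy_word = toy_hwm;                   /* the next word starts here */
        }
        return;
    }
    /* always leave room for the terminator of the word in progress */
    if (toy_hwm &lt; toy_workspace + TOY_WORKSPACE - 1) *toy_hwm++ = (char) c;
}
</pre>
<p class="commentary">Feeding the text <span class="extract"><span class="extract-syntax">"The  hopper\tdrops\nmarbles "</span></span> through this toy, one character
per call, leaves four words in <span class="extract"><span class="extract-syntax">toy_words</span></span>, however much whitespace lay
between them.
</p>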
<p class="commentary firstcommentary"><a id="SP27_1" class="paragraph-anchor"></a><b>&#167;27.1. Dealing with whitespace. </b>Let's deal with the different textures of whitespace first, as these are
surprisingly rich all by themselves.
</p>
<p class="commentary">The following keeps track of the most significant whitespace character it
has seen of late, ranking newlines above tabs, which in turn rank above
spaces; and it counts the tabs it has seen (resetting the count to zero
if a newline is found).
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Admire the texture of the whitespace</span><span class="named-paragraph-number">27.1</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="character-syntax">'\t'</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_number_of_tab_stops</span><span class="plain-syntax">++;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lxs_most_significant_space_char</span><span class="plain-syntax"> != </span><span class="character-syntax">'\n'</span><span class="plain-syntax">) </span><span class="identifier-syntax">lxs_most_significant_space_char</span><span class="plain-syntax"> = </span><span class="character-syntax">'\t'</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="character-syntax">'\n'</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_number_of_tab_stops</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_most_significant_space_char</span><span class="plain-syntax"> = </span><span class="character-syntax">'\n'</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="3-lxr.html#SP27">&#167;27</a>.</li></ul>
<p class="commentary firstcommentary"><a id="SP27_2" class="paragraph-anchor"></a><b>&#167;27.2. </b>To recall: we need to know what kind of whitespace prefaces each word
the lexer records.
</p>
<p class="commentary">When we record the first character of a new word, it cannot be whitespace,
but it probably follows a sequence of one or more whitespace characters,
and the code in the previous paragraph has been watching them for us.
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Look at recent whitespace to see what break it followed</span><span class="named-paragraph-number">27.2</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (((</span><span class="identifier-syntax">lxs_this_line_is_empty_so_far</span><span class="plain-syntax">) || (</span><span class="identifier-syntax">lxs_most_significant_space_char</span><span class="plain-syntax"> == </span><span class="character-syntax">'\n'</span><span class="plain-syntax">))</span>
<span class="plain-syntax"> &amp;&amp; (</span><span class="identifier-syntax">lxs_number_of_tab_stops</span><span class="plain-syntax"> &gt;= </span><span class="constant-syntax">1</span><span class="plain-syntax">))</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">].</span><span class="element-syntax">lw_break</span><span class="plain-syntax"> =</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP18" class="function-link"><span class="function-syntax">Lexer::break_char_for_indents</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">lxs_number_of_tab_stops</span><span class="plain-syntax">); </span><span class="comment-syntax"> newline followed by 1 or more tabs</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">].</span><span class="element-syntax">lw_break</span><span class="plain-syntax"> = </span><span class="identifier-syntax">lxs_most_significant_space_char</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_most_significant_space_char</span><span class="plain-syntax"> = </span><span class="character-syntax">' '</span><span class="plain-syntax">; </span><span class="comment-syntax"> waiting for the next run of whitespace, after this word</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_number_of_tab_stops</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="3-lxr.html#SP27">&#167;27</a>.</li></ul>
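<p class="commentary">The two steps above can be sketched together as a small standalone model. This
is a simplification under stated assumptions: the <span class="extract"><span class="extract-syntax">sk_</span></span> names are invented
for the example, the empty-line flag is maintained here rather than by the
line-break code, and an indentation of n tabs is reported as the hypothetical
value 1000+n, standing in for whatever <span class="extract"><span class="extract-syntax">Lexer::break_char_for_indents</span></span> returns.
</p>
<pre class="displayed-code all-displayed-code code-font">
#define SKETCH_INDENT_BASE 1000  /* hypothetical: n tabs of indent is reported as 1000+n */

static int sk_most_significant = ' ';  /* ' ', '\t' or '\n' */
static int sk_tab_stops = 0;           /* tabs seen since the last newline */
static int sk_line_empty = 1;          /* no word has started on this line yet */

/* Observe one whitespace character arriving outside a literal. */
void sk_observe_whitespace(int c) {
    if (c == '\t') {
        sk_tab_stops++;
        if (sk_most_significant != '\n') sk_most_significant = '\t';
    }
    if (c == '\n') {
        sk_tab_stops = 0;
        sk_most_significant = '\n';
        sk_line_empty = 1;
    }
}

/* Called when the first character of a new word arrives: report the break
   which preceded it, then reset ready for the next run of whitespace. */
int sk_break_for_new_word(void) {
    int brk = sk_most_significant;
    if ((sk_line_empty || (sk_most_significant == '\n')) &amp;&amp; (sk_tab_stops &gt;= 1))
        brk = SKETCH_INDENT_BASE + sk_tab_stops;  /* newline then 1 or more tabs */
    sk_most_significant = ' ';
    sk_tab_stops = 0;
    sk_line_empty = 0;
    return brk;
}
</pre>
<p class="commentary">In this model a space followed by a tab in mid-line yields a tab break, a
newline followed by two tabs yields the indentation value 1002, and a tab
followed by a newline yields a plain newline break, since the newline cancels
the tab count.
</p>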
<p class="commentary firstcommentary"><a id="SP27_3" class="paragraph-anchor"></a><b>&#167;27.3. </b>Outside literal mode, line breaks are usually treated like any other
whitespace, but we want to keep an eye out for paragraph breaks, because
these are sometimes semantically meaningful in Inform and so cannot be
discarded. A paragraph break, that is, a line break reached while the current
line is still empty, is converted into a special "divider" word.
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Line break outside a literal</span><span class="named-paragraph-number">27.3</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lxs_this_line_is_empty_so_far</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax">=0; </span><span class="constant-syntax">PARAGRAPH_BREAK</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">]; </span><span class="identifier-syntax">i</span><span class="plain-syntax">++)</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP27" class="function-link"><span class="function-syntax">Lexer::feed_char_into_lexer</span></a><span class="plain-syntax">(</span><span class="constant-syntax">PARAGRAPH_BREAK</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">]);</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP27" class="function-link"><span class="function-syntax">Lexer::feed_char_into_lexer</span></a><span class="plain-syntax">(</span><span class="character-syntax">' '</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_this_line_is_empty_so_far</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TRUE</span><span class="plain-syntax">;</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="3-lxr.html#SP27">&#167;27</a>.</li></ul>
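<p class="commentary">The rule can be tried out in isolation with the sketch below, which counts how
many divider words a piece of text would give rise to. It is only a model:
<span class="extract"><span class="extract-syntax">count_paragraph_dividers</span></span> is a hypothetical name, literal mode is ignored, and
the text is assumed to begin on an empty line.
</p>
<pre class="displayed-code all-displayed-code code-font">
/* Count one divider for each line break reached while the current line
   holds nothing but whitespace. */
int count_paragraph_dividers(const char *text) {
    int line_empty = 1;  /* assumption: the text starts on an empty line */
    int dividers = 0;
    for (int i = 0; text[i]; i++) {
        int c = text[i];
        if (c == '\n') {
            if (line_empty) dividers++;
            line_empty = 1;
        } else if ((c != ' ') &amp;&amp; (c != '\t')) {
            line_empty = 0;  /* a word has started on this line */
        }
    }
    return dividers;
}
</pre>
<p class="commentary">A single line break between two words produces no divider; a blank line
between them produces one, whether or not that line contains stray spaces
or tabs.
</p>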
<p class="commentary firstcommentary"><a id="SP27_4" class="paragraph-anchor"></a><b>&#167;27.4. </b>When working through a literal string, a new-line together with any
preceding whitespace is converted into a single space character, and we
enter "soak up spaces" mode, in which any subsequent whitespace is
ignored until something else is reached. If we reach another new-line while
still soaking up, then the literal text contained a paragraph break. In
that case, the run of whitespace is converted not to a single
space <span class="extract"><span class="extract-syntax">" "</span></span> but to two forced newlines in quick succession. In other words,
paragraph breaks in literal strings are converted to codes which will make
Inform print a paragraph break at run-time.
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Soak up whitespace around line breaks inside a literal string</span><span class="named-paragraph-number">27.4</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lxs_string_soak_up_spaces_mode</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">switch</span><span class="plain-syntax">(</span><span class="identifier-syntax">c</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">case</span><span class="plain-syntax"> </span><span class="character-syntax">' '</span><span class="plain-syntax">: </span><span class="reserved-syntax">case</span><span class="plain-syntax"> </span><span class="character-syntax">'\t'</span><span class="plain-syntax">: </span><span class="identifier-syntax">c</span><span class="plain-syntax"> = *(</span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">-1); </span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">--; </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">case</span><span class="plain-syntax"> </span><span class="character-syntax">'\n'</span><span class="plain-syntax">:</span>
<span class="plain-syntax"> *(</span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">-1) = </span><span class="identifier-syntax">NEWLINE_IN_STRING</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">c</span><span class="plain-syntax"> = </span><span class="identifier-syntax">NEWLINE_IN_STRING</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">default:</span><span class="plain-syntax"> </span><span class="identifier-syntax">lxs_string_soak_up_spaces_mode</span><span class="plain-syntax"> = </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">; </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="character-syntax">'\n'</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">while</span><span class="plain-syntax"> (</span><span class="identifier-syntax">is_whitespace</span><span class="plain-syntax">(*(</span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">-1))) </span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">--;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_string_soak_up_spaces_mode</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TRUE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="3-lxr.html#SP27">&#167;27</a>.</li></ul>
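<p class="commentary">The behaviour described above can be modelled on ordinary C strings. The sketch
below follows the prose description rather than the lexer's in-place
manipulation of <span class="extract"><span class="extract-syntax">lexer_hwm</span></span>; the function name <span class="extract"><span class="extract-syntax">soak_literal</span></span> is hypothetical,
and the character <span class="extract"><span class="extract-syntax">'\x01'</span></span> is merely a stand-in for <span class="extract"><span class="extract-syntax">NEWLINE_IN_STRING</span></span>,
whose real value is not shown in this section.
</p>
<pre class="displayed-code all-displayed-code code-font">
#include &lt;stddef.h&gt;

#define SKETCH_NEWLINE_CODE '\x01'  /* hypothetical stand-in for NEWLINE_IN_STRING */

static int sketch_is_ws(int c) {
    return (c == ' ') || (c == '\t') || (c == '\n');
}

/* Copy the contents of a literal from in to out, collapsing whitespace around
   line breaks. The result is never longer than the input, so out needs room
   for strlen(in)+1 characters. Returns the length of the result. */
size_t soak_literal(const char *in, char *out) {
    size_t n = 0;
    int soaking = 0;
    for (size_t i = 0; in[i]; i++) {
        int c = in[i];
        if (soaking) {
            if ((c == ' ') || (c == '\t')) continue;  /* ignored while soaking */
            if (c == '\n') {                          /* a paragraph break */
                out[n-1] = SKETCH_NEWLINE_CODE;       /* replaces the single space */
                out[n++] = SKETCH_NEWLINE_CODE;
                continue;
            }
            soaking = 0;                              /* something else reached */
        }
        if (c == '\n') {
            while ((n &gt; 0) &amp;&amp; (sketch_is_ws(out[n-1]))) n--;  /* drop preceding whitespace */
            out[n++] = ' ';
            soaking = 1;
            continue;
        }
        out[n++] = (char) c;
    }
    out[n] = 0;
    return n;
}
</pre>
<p class="commentary">Given <span class="extract"><span class="extract-syntax">"one  \n   two"</span></span> this produces <span class="extract"><span class="extract-syntax">"one two"</span></span>, while a blank line between the
two words produces the two words separated by two newline codes.
</p>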
<p class="commentary firstcommentary"><a id="SP27_5" class="paragraph-anchor"></a><b>&#167;27.5. Completing a word. </b>Outside of whitespace, then, our word (whatever it was &mdash; ordinary word,
literal string, I6 insertion or comment) has been stored character by
character at the steadily rising high water mark. We have now hit the end
by reaching whitespace (in the case of a literal, this has happened because
we found the end of the literal, escaped literal mode, and then hit
whitespace). The start of the word is at <span class="extract"><span class="extract-syntax">lexer_word</span></span>; the last character
is stored just below <span class="extract"><span class="extract-syntax">lexer_hwm</span></span>.
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Complete the current word</span><span class="named-paragraph-number">27.5</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> *</span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">++ = </span><span class="constant-syntax">0</span><span class="plain-syntax">; </span><span class="comment-syntax"> terminate the current word as a C string</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">lexer_wait_for_dashes</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">Wide::cmp</span><span class="plain-syntax">(</span><span class="identifier-syntax">lexer_word</span><span class="plain-syntax">, </span><span class="identifier-syntax">L</span><span class="string-syntax">"----"</span><span class="plain-syntax">) == </span><span class="constant-syntax">0</span><span class="plain-syntax">))</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_wait_for_dashes</span><span class="plain-syntax"> = </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">; </span><span class="comment-syntax"> our long wait for documentation is over</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">lexer_wait_for_dashes</span><span class="plain-syntax"> == </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">lxs_kind_of_word</span><span class="plain-syntax"> != </span><span class="constant-syntax">COMMENT_KW</span><span class="plain-syntax">)) {</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="3-lxr.html#SP27_5_1" class="named-paragraph-link"><span class="named-paragraph">Issue problem message and truncate if over maximum length for what it is</span><span class="named-paragraph-number">27.5.1</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="3-lxr.html#SP27_5_2" class="named-paragraph-link"><span class="named-paragraph">Store everything about the word except its break, which we already know</span><span class="named-paragraph-number">27.5.2</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="comment-syntax"> now get ready for what we expect by default to be an ordinary word next</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_word</span><span class="plain-syntax"> = </span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_this_word_is_empty_so_far</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TRUE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_kind_of_word</span><span class="plain-syntax"> = </span><span class="constant-syntax">ORDINARY_KW</span><span class="plain-syntax">;</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="3-lxr.html#SP27">&#167;27</a>.</li></ul>
<p class="commentary firstcommentary"><a id="SP27_5_1" class="paragraph-anchor"></a><b>&#167;27.5.1. </b>Note that here we are recording either an ordinary word, a literal string
or a literal I6 insertion: comments are also literal, but are thrown away,
and do not come here.
</p>
<pre class="definitions code-font"><span class="definition-keyword">define</span> <span class="constant-syntax">MAX_STRING_LENGTH</span><span class="plain-syntax"> </span><span class="constant-syntax">8</span><span class="plain-syntax">*1024</span>
<span class="definition-keyword">enum</span> <span class="constant-syntax">STRING_TOO_LONG_LEXERERROR</span>
<span class="definition-keyword">enum</span> <span class="constant-syntax">WORD_TOO_LONG_LEXERERROR</span>
<span class="definition-keyword">enum</span> <span class="constant-syntax">I6_TOO_LONG_LEXERERROR</span>
</pre>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Issue problem message and truncate if over maximum length for what it is</span><span class="named-paragraph-number">27.5.1</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">len</span><span class="plain-syntax"> = </span><span class="identifier-syntax">Wide::len</span><span class="plain-syntax">(</span><span class="identifier-syntax">lexer_word</span><span class="plain-syntax">), </span><span class="identifier-syntax">max_len</span><span class="plain-syntax"> = </span><span class="constant-syntax">MAX_WORD_LENGTH</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lxs_kind_of_word</span><span class="plain-syntax"> == </span><span class="constant-syntax">STRING_KW</span><span class="plain-syntax">) </span><span class="identifier-syntax">max_len</span><span class="plain-syntax"> = </span><span class="constant-syntax">MAX_STRING_LENGTH</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lxs_kind_of_word</span><span class="plain-syntax"> == </span><span class="constant-syntax">I6_INCLUSION_KW</span><span class="plain-syntax">) </span><span class="identifier-syntax">max_len</span><span class="plain-syntax"> = </span><span class="constant-syntax">MAX_VERBATIM_LENGTH</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">len</span><span class="plain-syntax"> &gt; </span><span class="identifier-syntax">max_len</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_word</span><span class="plain-syntax">[</span><span class="identifier-syntax">max_len</span><span class="plain-syntax">] = </span><span class="constant-syntax">0</span><span class="plain-syntax">; </span><span class="comment-syntax"> truncate to its maximum length</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lxs_kind_of_word</span><span class="plain-syntax"> == </span><span class="constant-syntax">STRING_KW</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP30" class="function-link"><span class="function-syntax">Lexer::lexer_problem_handler</span></a><span class="plain-syntax">(</span><span class="constant-syntax">STRING_TOO_LONG_LEXERERROR</span><span class="plain-syntax">, </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">, </span><span class="identifier-syntax">lexer_word</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lxs_kind_of_word</span><span class="plain-syntax"> == </span><span class="constant-syntax">I6_INCLUSION_KW</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_word</span><span class="plain-syntax">[100] = </span><span class="constant-syntax">0</span><span class="plain-syntax">; </span><span class="comment-syntax"> to avoid an absurdly long problem message</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP30" class="function-link"><span class="function-syntax">Lexer::lexer_problem_handler</span></a><span class="plain-syntax">(</span><span class="constant-syntax">I6_TOO_LONG_LEXERERROR</span><span class="plain-syntax">, </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">, </span><span class="identifier-syntax">lexer_word</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> {</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP30" class="function-link"><span class="function-syntax">Lexer::lexer_problem_handler</span></a><span class="plain-syntax">(</span><span class="constant-syntax">WORD_TOO_LONG_LEXERERROR</span><span class="plain-syntax">, </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">, </span><span class="identifier-syntax">lexer_word</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> }</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="3-lxr.html#SP27_5">&#167;27.5</a>.</li></ul>
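<p class="commentary">The shape of this check is easy to reproduce on its own. In the sketch below
the kinds, the function name and two of the three limits are assumptions made
for illustration: only the string limit of 8*1024 is taken from the definition
above, and problem reporting is reduced to a return value.
</p>
<pre class="displayed-code all-displayed-code code-font">
#include &lt;string.h&gt;

enum sketch_kind { SKETCH_ORDINARY, SKETCH_STRING, SKETCH_I6 };

#define SKETCH_MAX_WORD 128            /* hypothetical ordinary word limit */
#define SKETCH_MAX_STRING (8*1024)     /* as MAX_STRING_LENGTH above */
#define SKETCH_MAX_VERBATIM (64*1024)  /* hypothetical I6 inclusion limit */

/* Truncate the word in place if it is over the maximum length for its kind.
   Returns 1 if it was truncated (where the lexer would issue a problem
   message), and 0 otherwise. */
int truncate_if_too_long(char *word, enum sketch_kind kind) {
    size_t max_len = SKETCH_MAX_WORD;
    if (kind == SKETCH_STRING) max_len = SKETCH_MAX_STRING;
    if (kind == SKETCH_I6) max_len = SKETCH_MAX_VERBATIM;
    if (strlen(word) &gt; max_len) {
        word[max_len] = 0;  /* truncate to its maximum length */
        return 1;
    }
    return 0;
}
</pre>
<p class="commentary">With these limits a run of 199 letters is cut down to 128 if it is an ordinary
word, but left alone if it is a literal string.
</p>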
<p class="commentary firstcommentary"><a id="SP27_5_2" class="paragraph-anchor"></a><b>&#167;27.5.2. </b>We recorded the break for the word when it started (recall that, even if
the current word is a literal, its first character was read outside literal
mode, so it started out in life as an ordinary word and therefore had its
break recorded). So now we need to set everything else about it, and to
increment the word-count. We must not allow the word-count to reach its
maximum, since the next word's break setting would then be written past the
end of the array.
</p>
<p class="commentary">For ordinary words (but not literals), the copy of a word in the main array
<span class="extract"><span class="extract-syntax">lw_text</span></span> is converted to lower case. The original is preserved in <span class="extract"><span class="extract-syntax">lw_rawtext</span></span> and
is used to print more attractive error messages, and also to enable a few
semantic parts of Inform to be case sensitive. This copying means that in the
worst case, when we complete an ordinary word of maximal length, we need
to consume an additional <span class="extract"><span class="extract-syntax">MAX_WORD_LENGTH+2</span></span> characters of the lexer's workspace:
one terminator for the original, then the lowered copy and its own terminator.
This is why that was the amount we checked to ensure existed when the
lexer was called. The lowering loop can therefore never overspill the
workspace.
</p>
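<p class="commentary">The arithmetic can be checked with a small model working on plain <span class="extract"><span class="extract-syntax">char</span></span>
strings rather than the lexer's wide characters. The function name
<span class="extract"><span class="extract-syntax">store_lowered_copy</span></span> is hypothetical, and the standard <span class="extract"><span class="extract-syntax">tolower</span></span> stands in here
for <span class="extract"><span class="extract-syntax">Characters::tolower</span></span>.
</p>
<pre class="displayed-code all-displayed-code code-font">
#include &lt;ctype.h&gt;
#include &lt;stddef.h&gt;

/* The raw word starts at word and its last character lies just below hwm.
   Terminate it, then append a lower-cased, terminated copy. Returns the new
   high water mark, which is strlen(word)+2 cells beyond the old one. */
char *store_lowered_copy(char *word, char *hwm) {
    *hwm++ = 0;  /* terminate the raw text as a C string */
    for (size_t i = 0; word[i]; i++)
        *hwm++ = (char) tolower((unsigned char) word[i]);
    *hwm++ = 0;  /* terminate the lowered copy */
    return hwm;
}
</pre>
<p class="commentary">For the five-letter word <span class="extract"><span class="extract-syntax">Hello</span></span>, the high water mark rises by seven cells:
one terminator, five lowered letters, and a second terminator.
</p>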
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Store everything about the word except its break, which we already know</span><span class="named-paragraph-number">27.5.2</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">].</span><span class="element-syntax">lw_rawtext</span><span class="plain-syntax"> = </span><span class="identifier-syntax">lexer_word</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">].</span><span class="element-syntax">lw_source</span><span class="plain-syntax"> = </span><span class="identifier-syntax">lexer_position</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lxs_kind_of_word</span><span class="plain-syntax"> == </span><span class="constant-syntax">ORDINARY_KW</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">].</span><span class="element-syntax">lw_text</span><span class="plain-syntax"> = </span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax">=0; </span><span class="identifier-syntax">lexer_word</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">]; </span><span class="identifier-syntax">i</span><span class="plain-syntax">++) *(</span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">++) = </span><span class="identifier-syntax">Characters::tolower</span><span class="plain-syntax">(</span><span class="identifier-syntax">lexer_word</span><span class="plain-syntax">[</span><span class="identifier-syntax">i</span><span class="plain-syntax">]);</span>
<span class="plain-syntax"> *(</span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">++) = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">].</span><span class="element-syntax">lw_text</span><span class="plain-syntax"> = </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">].</span><span class="element-syntax">lw_rawtext</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><a href="2-vcb.html#SP3" class="function-link"><span class="function-syntax">Vocabulary::identify_word</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">); </span><span class="comment-syntax"> which sets </span><span class="extract"><span class="extract-syntax">lw_array[lexer_wordcount].lw_identity</span></span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">++;</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP13" class="function-link"><span class="function-syntax">Lexer::ensure_space_up_to</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">);</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="3-lxr.html#SP27_5">&#167;27.5</a>.</li></ul>
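<p class="commentary">The two-copy scheme can be illustrated in isolation. What follows is only a
minimal sketch, not Inform's code: it uses plain <span class="extract"><span class="extract-syntax">char</span></span> rather than wide characters,
an ASCII-only lowering function in place of <span class="extract"><span class="extract-syntax">Characters::tolower</span></span>, and a small
fixed workspace. Every name beginning <span class="extract"><span class="extract-syntax">lower_</span></span> is hypothetical.
</p>

```c
/* Hypothetical miniature of a lexer word record: not Inform's real structure. */
typedef struct {
    char *raw;      /* text exactly as typed, playing the role of lw_rawtext */
    char *lowered;  /* lower-cased copy, playing the role of lw_text */
} lower_word;

#define LOWER_WORKSPACE 256
char lower_workspace[LOWER_WORKSPACE];
char *lower_hwm = lower_workspace;  /* high-water mark: next free slot */

/* ASCII-only lowering, an assumed stand-in for Characters::tolower. */
char lower_char(char c) {
    return (c >= 'A' && c <= 'Z') ? (char) (c - 'A' + 'a') : c;
}

int lower_length(const char *s) {
    int n = 0;
    while (s[n]) n++;
    return n;
}

/* Record a completed word. An ordinary word gets a lowered copy written at
   the high-water mark; a literal shares one text between both fields.
   Returns 0 if the workspace lacks room for the copy and its terminator. */
int lower_store(lower_word *w, char *raw, int is_ordinary) {
    w->raw = raw;
    if (!is_ordinary) { w->lowered = raw; return 1; }
    int needed = lower_length(raw) + 1;
    int room = (int) ((lower_workspace + LOWER_WORKSPACE) - lower_hwm);
    if (needed > room) return 0;
    w->lowered = lower_hwm;
    for (int i = 0; raw[i]; i++) *(lower_hwm++) = lower_char(raw[i]);
    *(lower_hwm++) = 0;
    return 1;
}
```

<p class="commentary">In this sketch the room check happens at the moment of copying, whereas the
lexer described above checks once, in advance, that the worst-case amount is
available.
</p>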
<p class="commentary firstcommentary"><a id="SP27_6" class="paragraph-anchor"></a><b>&#167;27.6. Entering and leaving literal mode. </b>After a character has been stored, in ordinary mode, we see if it
provokes us into entering literal mode, by signifying the start of a
comment, string or passage of verbatim Inform 6.
</p>
<p class="commentary">In the case of a string, we positively want to keep the opening character
just recorded as part of the word: it's the opening double-quote mark.
In the case of a comment, we don't care, as we're going to throw it away
anyhow; as it happens, we keep it for now. But in the case of an I6
escape we are in danger, because of the auto-spacing around brackets, of
recording two words
</p>
<blockquote>
<p>|( -something|</p>
</blockquote>
<p class="commentary">when in fact we want to record
</p>
<blockquote>
<p>|(- something|</p>
</blockquote>
<p class="commentary">We do this by adding a hyphen to the previous word (the <span class="extract"><span class="extract-syntax">(</span></span> word), and by
throwing away the hyphen from the material of the current word.
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Contemplate entering literal mode</span><span class="named-paragraph-number">27.6</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="reserved-syntax">switch</span><span class="plain-syntax">(</span><span class="identifier-syntax">c</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">case</span><span class="plain-syntax"> </span><span class="identifier-syntax">COMMENT_BEGIN:</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_literal_mode</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TRUE</span><span class="plain-syntax">; </span><span class="identifier-syntax">lxs_kind_of_word</span><span class="plain-syntax"> = </span><span class="constant-syntax">COMMENT_KW</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_comment_nesting</span><span class="plain-syntax"> = </span><span class="constant-syntax">1</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">case</span><span class="plain-syntax"> </span><span class="identifier-syntax">STRING_BEGIN:</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_literal_mode</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TRUE</span><span class="plain-syntax">; </span><span class="identifier-syntax">lxs_kind_of_word</span><span class="plain-syntax"> = </span><span class="constant-syntax">STRING_KW</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">case</span><span class="plain-syntax"> </span><span class="identifier-syntax">INFORM6_ESCAPE_BEGIN_2:</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">lxs_previous_char_in_raw_feed</span><span class="plain-syntax"> != </span><span class="constant-syntax">INFORM6_ESCAPE_BEGIN_1</span><span class="plain-syntax">) ||</span>
<span class="plain-syntax"> (</span><span class="identifier-syntax">lexer_allow_I6_escapes</span><span class="plain-syntax"> == </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">)) </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_literal_mode</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TRUE</span><span class="plain-syntax">; </span><span class="identifier-syntax">lxs_kind_of_word</span><span class="plain-syntax"> = </span><span class="constant-syntax">I6_INCLUSION_KW</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="comment-syntax"> because of spacing around punctuation outside literal mode, the </span><span class="extract"><span class="extract-syntax">(</span></span><span class="comment-syntax"> became a word</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax"> &gt; </span><span class="constant-syntax">0</span><span class="plain-syntax">) { </span><span class="comment-syntax"> this should always be true: just being cautious</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">-1].</span><span class="element-syntax">lw_text</span><span class="plain-syntax"> = </span><span class="identifier-syntax">L</span><span class="string-syntax">"(-"</span><span class="plain-syntax">; </span><span class="comment-syntax"> change the previous word's text from </span><span class="extract"><span class="extract-syntax">(</span></span><span class="comment-syntax"> to </span><span class="extract"><span class="extract-syntax">(-</span></span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lw_array</span><span class="plain-syntax">[</span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">-1].</span><span class="element-syntax">lw_rawtext</span><span class="plain-syntax"> = </span><span class="identifier-syntax">L</span><span class="string-syntax">"(-"</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><a href="2-vcb.html#SP3" class="function-link"><span class="function-syntax">Vocabulary::identify_word</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">-1); </span><span class="comment-syntax"> and re-identify</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">--; </span><span class="comment-syntax"> erase the just-recorded </span><span class="extract"><span class="extract-syntax">INFORM6_ESCAPE_BEGIN_2</span></span><span class="comment-syntax"> character</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="3-lxr.html#SP27">&#167;27</a>.</li></ul>
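<p class="commentary">The repair made for I6 escapes can be sketched on its own. This is a hedged
illustration only: it assumes that <span class="extract"><span class="extract-syntax">INFORM6_ESCAPE_BEGIN_1</span></span> is <span class="extract"><span class="extract-syntax">(</span></span> and
<span class="extract"><span class="extract-syntax">INFORM6_ESCAPE_BEGIN_2</span></span> is <span class="extract"><span class="extract-syntax">-</span></span>, as the example above suggests, and it omits the
re-identification of the altered word. The names beginning <span class="extract"><span class="extract-syntax">esc_</span></span> are
hypothetical.
</p>

```c
/* Hypothetical miniature word record holding only the text. */
typedef struct { const char *text; } esc_word;

/* Decide whether the character just stored opens an I6 escape. If so,
   rewrite the previous word from "(" to "(-" and step the high-water mark
   back over the hyphen that was just recorded. Returns 1 on entering
   literal mode, 0 otherwise (in which case nothing is changed). */
int esc_open_i6(esc_word *words, int wordcount, char previous, char current,
                int allow_escapes, char **hwm) {
    if (current != '-' || previous != '(' || !allow_escapes) return 0;
    if (wordcount > 0) words[wordcount - 1].text = "(-";
    (*hwm)--;  /* erase the '-' from the material of the current word */
    return 1;
}
```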
<p class="commentary firstcommentary"><a id="SP27_7" class="paragraph-anchor"></a><b>&#167;27.7. </b>So literal mode is used for comments, strings and verbatim passages of
Inform 6 code. We are in this mode only while scanning the middle of
the literal: after all, we scanned (and recorded) the start of the literal
in ordinary mode, before noticing that the character(s) marked the onset of
a literal.
</p>
<p class="commentary">Note that, when we leave literal mode, we set the current character to a
space. This means the character forcing our departure is lost and not
recorded: but we only actually want it in the case of strings (because
we prefer to record them in the form <span class="extract"><span class="extract-syntax">"frogs and lilies"</span></span> rather than
<span class="extract"><span class="extract-syntax">"frogs and lilies</span></span>, for tidiness's sake). And so for strings we explicitly
record a close quotation mark.
</p>
<p class="commentary">The new current character, being a space and thus whitespace outside of
literal mode, triggers the completion of the word, recording whatever
literal we have just made. (Or, if it was a comment, discarding it.)
<span class="extract"><span class="extract-syntax">lxs_kind_of_word</span></span> continues to hold the kind of literal we have just
finished.
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Contemplate leaving literal mode</span><span class="named-paragraph-number">27.7</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="reserved-syntax">switch</span><span class="plain-syntax">(</span><span class="identifier-syntax">lxs_kind_of_word</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">case</span><span class="plain-syntax"> </span><span class="identifier-syntax">COMMENT_KW:</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="constant-syntax">COMMENT_BEGIN</span><span class="plain-syntax">) </span><span class="identifier-syntax">lxs_comment_nesting</span><span class="plain-syntax">++;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="constant-syntax">COMMENT_END</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_comment_nesting</span><span class="plain-syntax">--;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lxs_comment_nesting</span><span class="plain-syntax"> == </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="identifier-syntax">lxs_literal_mode</span><span class="plain-syntax"> = </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">case</span><span class="plain-syntax"> </span><span class="identifier-syntax">STRING_KW:</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="constant-syntax">STRING_END</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_string_soak_up_spaces_mode</span><span class="plain-syntax"> = </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> *(</span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">++) = (</span><span class="identifier-syntax">wchar_t</span><span class="plain-syntax">) </span><span class="identifier-syntax">c</span><span class="plain-syntax">; </span><span class="comment-syntax"> record the </span><span class="extract"><span class="extract-syntax">STRING_END</span></span><span class="comment-syntax"> character as part of the word</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_literal_mode</span><span class="plain-syntax"> = </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">case</span><span class="plain-syntax"> </span><span class="identifier-syntax">I6_INCLUSION_KW:</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="constant-syntax">INFORM6_ESCAPE_END_2</span><span class="plain-syntax">) &amp;&amp;</span>
<span class="plain-syntax"> (</span><span class="identifier-syntax">lxs_previous_char_in_raw_feed</span><span class="plain-syntax"> == </span><span class="constant-syntax">INFORM6_ESCAPE_END_1</span><span class="plain-syntax">)) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">--; </span><span class="comment-syntax"> erase the </span><span class="extract"><span class="extract-syntax">INFORM6_ESCAPE_END_1</span></span><span class="comment-syntax"> character recorded last time</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_literal_mode</span><span class="plain-syntax"> = </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">default:</span><span class="plain-syntax"> </span><span class="identifier-syntax">internal_error</span><span class="plain-syntax">(</span><span class="string-syntax">"in unknown literal mode"</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">lxs_literal_mode</span><span class="plain-syntax"> == </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">) </span><span class="identifier-syntax">c</span><span class="plain-syntax"> = </span><span class="character-syntax">' '</span><span class="plain-syntax">; </span><span class="comment-syntax"> trigger completion of this word</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="3-lxr.html#SP27">&#167;27</a>.</li></ul>
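<p class="commentary">The nesting counter for comments is the part of this most easily got wrong, so
here is a minimal sketch of it alone. It assumes that <span class="extract"><span class="extract-syntax">COMMENT_BEGIN</span></span> and
<span class="extract"><span class="extract-syntax">COMMENT_END</span></span> are the square brackets, and it scans a whole string at once
rather than one fed character at a time; <span class="extract"><span class="extract-syntax">cmt_find_end</span></span> is a hypothetical name.
</p>

```c
/* Given text whose first character opens a comment, return the index of the
   bracket which closes that comment, counting nested comments on the way.
   Returns -1 if text does not begin a comment or the comment never closes. */
int cmt_find_end(const char *text) {
    if (text[0] != '[') return -1;
    int nesting = 1;  /* the opening bracket has already been seen */
    for (int i = 1; text[i]; i++) {
        if (text[i] == '[') nesting++;
        if (text[i] == ']') {
            nesting--;
            if (nesting == 0) return i;  /* the point of leaving literal mode */
        }
    }
    return -1;
}
```

<p class="commentary">For the text <span class="extract"><span class="extract-syntax">[a [b] c] d</span></span> the first close bracket only reduces the nesting
from 2 to 1, so the comment ends at the second one.
</p>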
<p class="commentary firstcommentary"><a id="SP27_8" class="paragraph-anchor"></a><b>&#167;27.8. Breaking strings up at text substitutions. </b>When text contains text substitutions, these are ordinarily ignored by the
lexer, but in <span class="extract"><span class="extract-syntax">lexer_divide_strings_at_text_substitutions</span></span> mode, we need to
force strings to end and resume at the two ends of each substitution. For
instance:
</p>
<blockquote>
<p>"Hello, [greeted person]. Do you make it [supper time]?"</p>
</blockquote>
<p class="commentary">must be split as
</p>
<blockquote>
<p>|"Hello, " , greeted person , ". Do you make it " , supper time , "?"|</p>
</blockquote>
<p class="commentary">where our original single text literal is now three text literals, plus
eight ordinary words (four of them commas).
</p>
<p class="commentary">Note that each open square bracket, and each close square bracket, has been
removed and become a comma word. We see to open squares before we come
to recording the character, so to get rid of the <span class="extract"><span class="extract-syntax">[</span></span> character, we change
<span class="extract"><span class="extract-syntax">c</span></span> to a space:
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Force string division at the start of a text substitution, if necessary</span><span class="named-paragraph-number">27.8</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">lexer_divide_strings_at_text_substitutions</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="constant-syntax">TEXT_SUBSTITUTION_BEGIN</span><span class="plain-syntax">)) {</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP27" class="function-link"><span class="function-syntax">Lexer::feed_char_into_lexer</span></a><span class="plain-syntax">(</span><span class="constant-syntax">STRING_END</span><span class="plain-syntax">); </span><span class="comment-syntax"> feed </span><span class="extract"><span class="extract-syntax">"</span></span><span class="comment-syntax"> to close the old string</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP27" class="function-link"><span class="function-syntax">Lexer::feed_char_into_lexer</span></a><span class="plain-syntax">(</span><span class="character-syntax">' '</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP27" class="function-link"><span class="function-syntax">Lexer::feed_char_into_lexer</span></a><span class="plain-syntax">(</span><span class="constant-syntax">TEXT_SUBSTITUTION_SEPARATOR</span><span class="plain-syntax">); </span><span class="comment-syntax"> feed </span><span class="extract"><span class="extract-syntax">,</span></span><span class="comment-syntax"> to start new word</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">c</span><span class="plain-syntax"> = </span><span class="character-syntax">' '</span><span class="plain-syntax">; </span><span class="comment-syntax"> the lexer now goes on to record a space, which will end the </span><span class="extract"><span class="extract-syntax">,</span></span><span class="comment-syntax"> word</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_scanning_text_substitution</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TRUE</span><span class="plain-syntax">; </span><span class="comment-syntax"> but remember that we must get back again</span>
<span class="plain-syntax"> }</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="3-lxr.html#SP27">&#167;27</a>.</li></ul>
<p class="commentary firstcommentary"><a id="SP27_9" class="paragraph-anchor"></a><b>&#167;27.9. </b>Whereas we see to close squares after recording the character, so we have
to erase it to get rid of the <span class="extract"><span class="extract-syntax">]</span></span>. Note that since this was read in ordinary
mode, it was automatically spaced (being punctuation), and that therefore
the feeder above has just sent the second of a sequence of three characters:
space, <span class="extract"><span class="extract-syntax">]</span></span>, space. That means we have recorded, so far, a one-character
word in ordinary mode, whose text consists only of <span class="extract"><span class="extract-syntax">]</span></span>. By overwriting
this with a comma, we instead get a one-character word in ordinary mode
whose text consists only of a comma. We then feed a space to end that word;
then feed a double-quote to start text again.
</p>
<p class="commentary">But, it might be objected: surely the feeder above is still poised with
that third character in its sequence space, <span class="extract"><span class="extract-syntax">]</span></span>, space, and that means
it will now feed a spurious space into the start of our resumed text?
Happily, the answer is no: this is why the feeder above checks that it
is still in ordinary mode before sending that third character. Having
opened quotes again, we have put the lexer into literal mode: and so the
spurious space is never fed, and there is no problem.
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Force string division at the end of a text substitution, if necessary</span><span class="named-paragraph-number">27.9</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">lexer_divide_strings_at_text_substitutions</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="constant-syntax">TEXT_SUBSTITUTION_END</span><span class="plain-syntax">)) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lxs_scanning_text_substitution</span><span class="plain-syntax"> = </span><span class="identifier-syntax">FALSE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> *(</span><span class="identifier-syntax">lexer_hwm</span><span class="plain-syntax">-1) = </span><span class="constant-syntax">TEXT_SUBSTITUTION_SEPARATOR</span><span class="plain-syntax">; </span><span class="comment-syntax"> overwrite recorded copy of </span><span class="extract"><span class="extract-syntax">]</span></span><span class="comment-syntax"> with </span><span class="extract"><span class="extract-syntax">,</span></span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP27" class="function-link"><span class="function-syntax">Lexer::feed_char_into_lexer</span></a><span class="plain-syntax">(</span><span class="character-syntax">' '</span><span class="plain-syntax">); </span><span class="comment-syntax"> then feed a space to end the </span><span class="extract"><span class="extract-syntax">,</span></span><span class="comment-syntax"> word</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP27" class="function-link"><span class="function-syntax">Lexer::feed_char_into_lexer</span></a><span class="plain-syntax">(</span><span class="constant-syntax">STRING_BEGIN</span><span class="plain-syntax">); </span><span class="comment-syntax"> then feed </span><span class="extract"><span class="extract-syntax">"</span></span><span class="comment-syntax"> to open a new string</span>
<span class="plain-syntax"> }</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="3-lxr.html#SP27">&#167;27</a>.</li></ul>
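<p class="commentary">The combined effect of the two divisions can be imitated by a simple rewriting
of the text. The sketch below is not how the lexer works internally (it feeds
characters and records words, as described above); it merely produces the same
visible sequence of literals and commas, assuming that the substitution
delimiters are square brackets and the separator is a comma. The function name
<span class="extract"><span class="extract-syntax">split_at_substitutions</span></span> is hypothetical.
</p>

```c
/* Rewrite in (a quoted text, quotation marks included) into out, closing the
   string before each substitution and reopening it afterwards. Returns the
   number of characters written, or -1 if out is too small. */
int split_at_substitutions(const char *in, char *out, int capacity) {
    int n = 0, in_string = 0, in_substitution = 0;
    for (int i = 0; in[i]; i++) {
        char single[2] = { in[i], 0 };
        const char *emit = single;
        if (in_string && in[i] == '[') {
            emit = "\" , ";  /* close quote, then a comma word */
            in_string = 0; in_substitution = 1;
        } else if (in_substitution && in[i] == ']') {
            emit = " , \"";  /* a comma word, then reopen the quote */
            in_string = 1; in_substitution = 0;
        } else if (in[i] == '"' && !in_substitution) {
            in_string = !in_string;
        }
        for (int j = 0; emit[j]; j++) {
            if (n + 1 >= capacity) return -1;  /* keep room for terminator */
            out[n++] = emit[j];
        }
    }
    if (n >= capacity) return -1;
    out[n] = 0;
    return n;
}
```

<p class="commentary">Applied to the example text in &#167;27.8, this yields exactly the split form
shown there.
</p>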
<p class="commentary firstcommentary"><a id="SP28" class="paragraph-anchor"></a><b>&#167;28. </b>Finally, note that the breaking-up process may result in empty strings
where square brackets abut each other or the ends of the original string.
Thus
</p>
<blockquote>
<p>"[The noun] is on the [colour][style] table."</p>
</blockquote>
<p class="commentary">is split as: <span class="extract"><span class="extract-syntax">"" , The noun , " is on the " , colour , "" , style , " table."</span></span>
This is not a bug: empty strings are legal. It's for higher-level code to
remove them if they aren't wanted.
</p>
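<p class="commentary">As a hedged illustration of what such higher-level code might do, the sketch
below compacts a list of word texts, dropping each literal consisting only of
two quotation marks. It is hypothetical throughout: <span class="extract"><span class="extract-syntax">drop_empty_literals</span></span> is
not a function of this module, and a real client would work on word numbers
rather than on an array of C strings.
</p>

```c
/* Remove, in place, every token which is exactly an empty text literal.
   Returns the number of tokens kept. The comma words are left untouched. */
int drop_empty_literals(const char **tokens, int count) {
    int kept = 0;
    for (int i = 0; i < count; i++) {
        const char *t = tokens[i];
        int is_empty_literal = (t[0] == '"' && t[1] == '"' && t[2] == 0);
        if (!is_empty_literal) tokens[kept++] = t;
    }
    return kept;
}
```

<p class="commentary">Note that this leaves the neighbouring commas in place, so whether it is the
right clean-up depends on how the client goes on to read the comma-separated
list.
</p>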
<p class="commentary firstcommentary"><a id="SP29" class="paragraph-anchor"></a><b>&#167;29. Splicing. </b>Once in a while, we need to have a run of words in the lexer which
all do occur in the source text, but not contiguously, so that they
cannot be represented by a pair <span class="extract"><span class="extract-syntax">(w1, w2)</span></span>. In that event we use the
following routine to splice duplicate references at the end of the word
list (this does not duplicate the text itself, only references to it):
for instance, if we start with 10 words (0 to 9) and then splice <span class="extract"><span class="extract-syntax">(2,3)</span></span>
and then <span class="extract"><span class="extract-syntax">(6,8)</span></span>, we end up with 15 words, and the text of <span class="extract"><span class="extract-syntax">(10,14)</span></span>
contains the same material as words 2, 3, 6, 7, 8.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">wording</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::splice_words</span><button class="popup" onclick="togglePopup('usagePopup25')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup25">Usage of <span class="code-font"><span class="function-syntax">Lexer::splice_words</span></span>:<br/>Feeds - <a href="3-fds.html#SP5">&#167;5</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">wording</span><span class="plain-syntax"> </span><span class="identifier-syntax">W</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">L</span><span class="plain-syntax"> = </span><a href="3-wrd.html#SP7" class="function-link"><span class="function-syntax">Wordings::length</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">W</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP13" class="function-link"><span class="function-syntax">Lexer::ensure_space_up_to</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax"> + </span><span class="identifier-syntax">L</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax">=0; </span><span class="identifier-syntax">i</span><span class="plain-syntax">&lt;</span><span class="identifier-syntax">L</span><span class="plain-syntax">; </span><span class="identifier-syntax">i</span><span class="plain-syntax">++)</span>
<span class="plain-syntax"> </span><a href="3-lxr.html#SP19" class="function-link"><span class="function-syntax">Lexer::word_copy</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">+</span><span class="identifier-syntax">i</span><span class="plain-syntax">, </span><a href="3-wrd.html#SP7" class="function-link"><span class="function-syntax">Wordings::first_wn</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">W</span><span class="plain-syntax">)+</span><span class="identifier-syntax">i</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">wording</span><span class="plain-syntax"> </span><span class="identifier-syntax">N</span><span class="plain-syntax"> = </span><a href="3-wrd.html#SP5" class="function-link"><span class="function-syntax">Wordings::new</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">, </span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax"> + </span><span class="identifier-syntax">L</span><span class="plain-syntax"> - </span><span class="constant-syntax">1</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax"> += </span><span class="identifier-syntax">L</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">N</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
</pre>
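<p class="commentary">The arithmetic of the example can be checked with a small model. This sketch
assumes a fixed-size word list in place of the growable one maintained by
<span class="extract"><span class="extract-syntax">Lexer::ensure_space_up_to</span></span>, and a word record holding nothing but a pointer to
its text; all names beginning <span class="extract"><span class="extract-syntax">splice_</span></span> are hypothetical.
</p>

```c
#define SPLICE_MAX 64
typedef struct { const char *text; } splice_word;
typedef struct { int first, last; } splice_range;

splice_word splice_list[SPLICE_MAX];
int splice_count = 0;

/* Append duplicate references to words r.first to r.last at the end of the
   list, returning the range the copies occupy. Only the records are copied:
   the copies point at the same text as the originals. Returns {-1, -1} if
   the range is empty or invalid, or if the list has no room. */
splice_range splice_append(splice_range r) {
    splice_range none = { -1, -1 };
    int len = r.last - r.first + 1;
    if (len <= 0 || r.first < 0 || r.last >= splice_count) return none;
    if (splice_count + len > SPLICE_MAX) return none;
    for (int i = 0; i < len; i++)
        splice_list[splice_count + i] = splice_list[r.first + i];
    splice_range made = { splice_count, splice_count + len - 1 };
    splice_count += len;
    return made;
}
```

<p class="commentary">Starting from 10 words, appending the range 2 to 3 gives the copies numbers
10 and 11, and then appending 6 to 8 gives 12, 13 and 14, for 15 words in all.
</p>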
<p class="commentary firstcommentary"><a id="SP30" class="paragraph-anchor"></a><b>&#167;30. Basic command-line error handler. </b>Some tools using this module will want to push simple error messages out to
the command line; others will want to translate them into elaborate problem
texts in HTML. So the client is allowed to define <span class="extract"><span class="extract-syntax">PROBLEM_WORDS_CALLBACK</span></span>
to some routine of her own, gazumping this one.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::lexer_problem_handler</span><button class="popup" onclick="togglePopup('usagePopup26')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup26">Usage of <span class="code-font"><span class="function-syntax">Lexer::lexer_problem_handler</span></span>:<br/><a href="3-lxr.html#SP13">&#167;13</a>, <a href="3-lxr.html#SP25_3">&#167;25.3</a>, <a href="3-lxr.html#SP27_5_1">&#167;27.5.1</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">err</span><span class="plain-syntax">, </span><span class="identifier-syntax">text_stream</span><span class="plain-syntax"> *</span><span class="identifier-syntax">details</span><span class="plain-syntax">, </span><span class="identifier-syntax">wchar_t</span><span class="plain-syntax"> *</span><span class="identifier-syntax">word</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> #</span><span class="identifier-syntax">ifdef</span><span class="plain-syntax"> </span><span class="identifier-syntax">PROBLEM_WORDS_CALLBACK</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">PROBLEM_WORDS_CALLBACK</span><span class="plain-syntax">(</span><span class="identifier-syntax">err</span><span class="plain-syntax">, </span><span class="identifier-syntax">details</span><span class="plain-syntax">, </span><span class="identifier-syntax">word</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> #</span><span class="identifier-syntax">endif</span>
<span class="plain-syntax"> #</span><span class="identifier-syntax">ifndef</span><span class="plain-syntax"> </span><span class="identifier-syntax">PROBLEM_WORDS_CALLBACK</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">err</span><span class="plain-syntax"> == </span><span class="constant-syntax">MEMORY_OUT_LEXERERROR</span><span class="plain-syntax">)</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">Errors::fatal</span><span class="plain-syntax">(</span><span class="string-syntax">"Out of memory: unable to create lexer workspace"</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">TEMPORARY_TEXT</span><span class="plain-syntax">(</span><span class="identifier-syntax">word_t</span><span class="plain-syntax">)</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">word</span><span class="plain-syntax">) </span><span class="identifier-syntax">WRITE_TO</span><span class="plain-syntax">(</span><span class="identifier-syntax">word_t</span><span class="plain-syntax">, </span><span class="string-syntax">"%w"</span><span class="plain-syntax">, </span><span class="identifier-syntax">word</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">switch</span><span class="plain-syntax"> (</span><span class="identifier-syntax">err</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">case</span><span class="plain-syntax"> </span><span class="identifier-syntax">STRING_TOO_LONG_LEXERERROR:</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">Errors::with_text</span><span class="plain-syntax">(</span><span class="string-syntax">"Too much text in quotation marks: %S"</span><span class="plain-syntax">, </span><span class="identifier-syntax">word_t</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">case</span><span class="plain-syntax"> </span><span class="identifier-syntax">WORD_TOO_LONG_LEXERERROR:</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">Errors::with_text</span><span class="plain-syntax">(</span><span class="string-syntax">"Word too long: %S"</span><span class="plain-syntax">, </span><span class="identifier-syntax">word_t</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">case</span><span class="plain-syntax"> </span><span class="identifier-syntax">I6_TOO_LONG_LEXERERROR:</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">Errors::with_text</span><span class="plain-syntax">(</span><span class="string-syntax">"I6 inclusion too long: %S"</span><span class="plain-syntax">, </span><span class="identifier-syntax">word_t</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">case</span><span class="plain-syntax"> </span><span class="identifier-syntax">STRING_NEVER_ENDS_LEXERERROR:</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">Errors::with_text</span><span class="plain-syntax">(</span><span class="string-syntax">"Quoted text never ends: %S"</span><span class="plain-syntax">, </span><span class="identifier-syntax">details</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">case</span><span class="plain-syntax"> </span><span class="identifier-syntax">COMMENT_NEVER_ENDS_LEXERERROR:</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">Errors::with_text</span><span class="plain-syntax">(</span><span class="string-syntax">"Square-bracketed text never ends: %S"</span><span class="plain-syntax">, </span><span class="identifier-syntax">details</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">case</span><span class="plain-syntax"> </span><span class="identifier-syntax">I6_NEVER_ENDS_LEXERERROR:</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">Errors::with_text</span><span class="plain-syntax">(</span><span class="string-syntax">"I6 inclusion text never ends: %S"</span><span class="plain-syntax">, </span><span class="identifier-syntax">details</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">default:</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">internal_error</span><span class="plain-syntax">(</span><span class="string-syntax">"unknown lexer error"</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">DISCARD_TEXT</span><span class="plain-syntax">(</span><span class="identifier-syntax">word_t</span><span class="plain-syntax">)</span>
<span class="plain-syntax"> #</span><span class="identifier-syntax">endif</span>
<span class="plain-syntax">}</span>
</pre>
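<p class="commentary">The compile-time override used above can be sketched in isolation. What follows is
only an illustration, not code from this module: every name beginning <span class="extract"><span class="extract-syntax">demo_</span></span>
or <span class="extract"><span class="extract-syntax">DEMO_</span></span> is hypothetical, plain C strings stand in for
<span class="extract"><span class="extract-syntax">text_stream</span></span> and <span class="extract"><span class="extract-syntax">wchar_t</span></span> arguments,
and the client routine simply records the message rather than producing a real problem text.
</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error codes, standing in for the *_LEXERERROR constants. */
enum { DEMO_WORD_TOO_LONG = 1, DEMO_STRING_NEVER_ENDS = 2 };

/* Where the hypothetical client keeps the most recent message. */
static char demo_last_problem[128] = "";

/* A hypothetical client routine: it stores the message instead of printing
   it, much as a tool building its own problem texts might. */
void demo_client_handler(int err, const char *details, const char *word) {
    assert(details != NULL || word != NULL);
    snprintf(demo_last_problem, sizeof demo_last_problem,
        "problem %d: %s", err, (word != NULL) ? word : details);
}

/* Defining the macro is what opts the client in. Remove this line and the
   default branch below is compiled instead. */
#define DEMO_PROBLEM_CALLBACK demo_client_handler

/* The handler the lexer would call: exactly one of the two branches exists
   in the compiled program, chosen by whether the macro is defined. */
void demo_problem_handler(int err, const char *details, const char *word) {
    #ifdef DEMO_PROBLEM_CALLBACK
    DEMO_PROBLEM_CALLBACK(err, details, word);
    #endif
    #ifndef DEMO_PROBLEM_CALLBACK
    fprintf(stderr, "lexer error %d: %s\n", err, (word != NULL) ? word : details);
    #endif
}
```

<p class="commentary">Because the choice is made by the preprocessor, a client which defines no callback
pays nothing for the mechanism, and one which does define it never links in the default messages.
</p>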
<p class="commentary firstcommentary"><a id="SP31" class="paragraph-anchor"></a><b>&#167;31. Logging absolutely everything. </b>This is not to be done lightly: the output can be enormous, since one line is
written to the debugging log for every word lexed so far.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">Lexer::log_lexer_output</span><span class="plain-syntax">(</span><span class="reserved-syntax">void</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LOG</span><span class="plain-syntax">(</span><span class="string-syntax">"Entire lexer output to date:\n"</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax">=0; </span><span class="identifier-syntax">i</span><span class="plain-syntax">&lt;</span><span class="identifier-syntax">lexer_wordcount</span><span class="plain-syntax">; </span><span class="identifier-syntax">i</span><span class="plain-syntax">++)</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LOG</span><span class="plain-syntax">(</span><span class="string-syntax">"%d: &lt;%+N&gt; &lt;%N&gt; &lt;%02x&gt;\n"</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">, </span><a href="3-lxr.html#SP19" class="function-link"><span class="function-syntax">Lexer::break_before</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">i</span><span class="plain-syntax">));</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LOG</span><span class="plain-syntax">(</span><span class="string-syntax">"------\n"</span><span class="plain-syntax">);</span>
<span class="plain-syntax">}</span>
</pre>
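<p class="commentary">To give a feel for the shape of each logged line, here is a stand-alone sketch. It is
an approximation under stated assumptions: the <span class="extract"><span class="extract-syntax">demo_word</span></span> structure and
<span class="extract"><span class="extract-syntax">demo_format_word</span></span> function are hypothetical, and it assumes that the two
<span class="extract"><span class="extract-syntax">N</span></span> escapes print a word's raw text and its normalised text respectively,
which the format string suggests but this section does not spell out.
</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one lexed word. */
typedef struct demo_word {
    const char *raw_text;   /* assumed: the text as typed in the source */
    const char *text;       /* assumed: the text after normalisation */
    int break_before;       /* the character which preceded the word */
} demo_word;

/* Format the log line for word number i into out, using the same layout as
   the LOG call above: the index, two forms of the text in angle brackets,
   and the preceding character as two hexadecimal digits. Returns the length
   snprintf reports. */
int demo_format_word(char *out, size_t size, int i, const demo_word *w) {
    assert(out != NULL && w != NULL);
    return snprintf(out, size, "%d: <%s> <%s> <%02x>",
        i, w->raw_text, w->text, (unsigned int) w->break_before);
}
```

<p class="commentary">With one such line per word, a source text of a hundred thousand words produces a
log of a hundred thousand lines, which is why the function is best reserved for small test cases.
</p>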
</main>
</body>
</html>