mirror of
https://github.com/ganelson/inform.git
synced 2024-07-08 18:14:21 +03:00
791 lines
99 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
|
|
<html>
|
|
<head>
|
|
<title>2/pe</title>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
|
|
<meta http-equiv="Content-Language" content="en-gb">
|
|
<link href="inweb.css" rel="stylesheet" rev="stylesheet" type="text/css">
|
|
</head>
|
|
<body>
|
|
|
|
<!--Weave of '2/pe' generated by 7-->
|
|
<ul class="crumbs"><li><a href="../webs.html">★</a></li><li><a href="index.html">linguistics</a></li><li><a href="index.html#2">Chapter 2: Excerpts</a></li><li><b>Parse Excerpts</b></li></ul><p class="purpose">Given an excerpt of wording, to construct S-nodes for one or more registered excerpt meanings which it matches.</p>
|
|
|
|
<ul class="toc"><li><a href="#SP1">§1. Parsing methods</a></li><li><a href="#SP4_5">§4.5. Exact parsing mode</a></li><li><a href="#SP4_6">§4.6. Maximal parsing mode</a></li><li><a href="#SP4_7">§4.7. Parametrised parsing mode</a></li><li><a href="#SP4_8">§4.8. Subset parsing mode</a></li><li><a href="#SP6">§6. Monitoring the efficiency of the parser</a></li></ul><hr class="tocbar">
|
|
|
|
<p class="inwebparagraph"><a id="SP1"></a><b>§1. Parsing methods. </b>The excerpt parser tests a given wording to see if it matches something
|
|
in the bank of excerpt meanings. It looks only for atomic meanings ("box"):
|
|
more sophisticated grammar higher up will have to parse compound meanings
|
|
(such as "something in an open box").
|
|
</p>
|
|
|
|
<p class="inwebparagraph">We will return either a single result or a list of possible results, as
|
|
alternative readings. It is not at all easy to decide what "door"
|
|
means, for instance: the class of doors, or a particular door, and if so
|
|
then which one? We cannot answer that question here, and do not even try.
|
|
However, we can specify a context, in effect saying something like "what
|
|
would this mean if it had to be an adjective name?".
|
|
</p>
|
|
|
|
<p class="inwebparagraph">Depending on that context, four basic parsing modes can then be used.
|
|
</p>
|
|
|
|
<ul class="items"><li>(1) Exact parsing is what it sounds like: the texts have to match exactly,
|
|
except that an initial article is skipped. Thus "the going action"
|
|
exactly matches "going action", but "going" does not.
|
|
</li><li>(2) In subset parsing, a match is achieved if the text parsed consists of
|
|
words all of which are found in the meaning tested. Thus "red door" and
|
|
"red" are each subset matches for "ornate red door with brass handle".
|
|
</li><li>(3) In parametrised parsing, arbitrary (non-empty) text is allowed to
|
|
match against <code class="display"><span class="extract">#</span></code> gaps in the token list. Thus "award 5 points" is a
|
|
parametrised match for "award <code class="display"><span class="extract">#</span></code> points".
|
|
</li><li>(4) In maximal parsing, we find the longest possible initial match, allowing
it even if it does not reach to the end of the excerpt, and we return a unique
finding, not a list of possibilities.
|
|
</li></ul>
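<p class="inwebparagraph">As an illustration only, the first three modes can be sketched on plain arrays of C strings. Everything here is an assumption made for the sketch: the <code class="display"><span class="extract">toy_</span></code> function names, the article list, and the use of strings rather than vocabulary entries. It is not the parser's own code, which works from hash codes and start lists.</p>

```c
#include <string.h>

/* Hypothetical helper: a small assumed article list, not the real test. */
static int is_article(const char *w) {
    return strcmp(w, "the") == 0 || strcmp(w, "a") == 0 || strcmp(w, "an") == 0;
}

/* Exact mode: word-for-word match after skipping one initial article. */
int toy_exact_match(const char **text, int n, const char **meaning, int m) {
    if (n > 0 && is_article(text[0])) { text++; n--; }
    if (n != m) return 0;
    for (int i = 0; i < n; i++)
        if (strcmp(text[i], meaning[i]) != 0) return 0;
    return 1;
}

/* Subset mode: every word of the text occurs somewhere in the meaning. */
int toy_subset_match(const char **text, int n, const char **meaning, int m) {
    if (n == 0) return 0;
    for (int i = 0; i < n; i++) {
        int found = 0;
        for (int j = 0; j < m; j++)
            if (strcmp(text[i], meaning[j]) == 0) { found = 1; break; }
        if (!found) return 0;
    }
    return 1;
}

/* Parametrised mode: a "#" token in the meaning absorbs one or more words. */
int toy_param_match(const char **text, int n, const char **meaning, int m) {
    if (m == 0) return n == 0;
    if (strcmp(meaning[0], "#") == 0) {
        for (int k = 1; k <= n; k++)   /* try every non-empty gap length */
            if (toy_param_match(text + k, n - k, meaning + 1, m - 1)) return 1;
        return 0;
    }
    if (n == 0 || strcmp(text[0], meaning[0]) != 0) return 0;
    return toy_param_match(text + 1, n - 1, meaning + 1, m - 1);
}
```

<p class="inwebparagraph">With these, "the going action" is an exact match for "going action", "red door" is a subset match for "ornate red door with brass handle", and "award 5 points" is a parametrised match for "award # points", as in the examples given in the list.</p>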
|
|
|
|
<pre class="definitions">
|
|
<span class="definitionkeyword">define</span> <span class="constant">EXACT_PM</span><span class="plain"> 1</span>
|
|
<span class="definitionkeyword">define</span> <span class="constant">SUBSET_PM</span><span class="plain"> 2</span>
|
|
<span class="definitionkeyword">define</span> <span class="constant">PARAMETRISED_PM</span><span class="plain"> 4</span>
|
|
<span class="definitionkeyword">define</span> <span class="constant">MAXIMAL_PM</span><span class="plain"> 8</span>
|
|
</pre>
|
|
<p class="inwebparagraph"><a id="SP2"></a><b>§2. </b>To monitor the efficiency of the excerpt parser, we keep track of:
|
|
</p>
|
|
|
|
|
|
<pre class="display">
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">no_calls_to_parse_excerpt</span><span class="plain"> = 0,</span>
|
|
<span class="identifier">no_meanings_tried</span><span class="plain"> = 0,</span>
|
|
<span class="identifier">no_meanings_tried_in_detail</span><span class="plain"> = 0,</span>
|
|
<span class="identifier">no_successful_calls_to_parse_excerpt</span><span class="plain"> = 0, </span><span class="identifier">no_matched_ems</span><span class="plain"> = 0;</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="inwebparagraph"><a id="SP3"></a><b>§3. </b>In addition, it turns out to be convenient to have a global mode, for the
|
|
sake of disambiguating awkward cases:
|
|
</p>
|
|
|
|
|
|
<pre class="display">
|
|
<span class="identifier">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">word_to_suppress_in_phrases</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="inwebparagraph"><a id="SP4"></a><b>§4. </b>As input, we supply not just the excerpt but also a context; or, to put it
|
|
another way, a filter on which excerpt meanings to look at. This must be a
|
|
bitmap made up from meaning codes, such as <code class="display"><span class="extract">TABLE_MC + TABLE_COLUMN_MC</span></code>,
|
|
which would check for tables and table columns simultaneously.
|
|
</p>
|
|
|
|
<p class="inwebparagraph">However, there is one restriction on this. Recall that there are four
|
|
parsing modes, and that different modes are used for different meaning
|
|
codes. The <code class="display"><span class="extract">mc_bitmap</span></code> context is required not to mix MCs with different
|
|
parsing modes.
|
|
</p>
|
|
|
|
|
|
<pre class="display">
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">force_maximal_parsing</span><span class="plain"> = </span><span class="identifier">FALSE</span><span class="plain">;</span>
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="functiontext">ExParser::parse_excerpt_maximal</span><span class="plain">(</span><span class="reserved">unsigned</span><span class="plain"> </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">mc_bitmap</span><span class="plain">, </span><span class="identifier">wording</span><span class="plain"> </span><span class="identifier">W</span><span class="plain">) {</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">s</span><span class="plain"> = </span><span class="identifier">force_maximal_parsing</span><span class="plain">;</span>
|
|
<span class="identifier">force_maximal_parsing</span><span class="plain"> = </span><span class="identifier">TRUE</span><span class="plain">;</span>
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">p</span><span class="plain"> = </span><span class="functiontext">ExParser::parse_excerpt</span><span class="plain">(</span><span class="identifier">mc_bitmap</span><span class="plain">, </span><span class="identifier">W</span><span class="plain">);</span>
|
|
<span class="identifier">force_maximal_parsing</span><span class="plain"> = </span><span class="identifier">s</span><span class="plain">;</span>
|
|
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">p</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="functiontext">ExParser::parse_excerpt</span><span class="plain">(</span><span class="reserved">unsigned</span><span class="plain"> </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">mc_bitmap</span><span class="plain">, </span><span class="identifier">wording</span><span class="plain"> </span><span class="identifier">W</span><span class="plain">) {</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">parsing_mode</span><span class="plain">, </span><span class="identifier">h</span><span class="plain">;</span>
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">results</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
|
|
|
|
<span class="identifier">no_calls_to_parse_excerpt</span><span class="plain">++;</span>
|
|
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">Wordings::empty</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">)) </span><span class="reserved">return</span><span class="plain"> </span><span class="identifier">NULL</span><span class="plain">;</span>
|
|
<span class="reserved">while</span><span class="plain"> (</span><span class="identifier">Wordings::paired_brackets</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">)) </span><span class="identifier">W</span><span class="plain"> = </span><span class="identifier">Wordings::trim_both_ends</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">);</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">Wordings::empty</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">)) </span><span class="reserved">return</span><span class="plain"> </span><span class="identifier">NULL</span><span class="plain">;</span>
|
|
|
|
<span class="identifier">h</span><span class="plain"> = 0;</span>
|
|
|
|
<<span class="cwebmacro">Choose which parsing mode we should use, given the MC bitmap</span> <span class="cwebmacronumber">4.1</span>><span class="plain">;</span>
|
|
<<span class="cwebmacro">Take note of casing on first word, in the few circumstances when we care</span> <span class="cwebmacronumber">4.2</span>><span class="plain">;</span>
|
|
<<span class="cwebmacro">Skip an initial article most of the time</span> <span class="cwebmacronumber">4.3</span>><span class="plain">;</span>
|
|
|
|
<span class="identifier">h</span><span class="plain"> = </span><span class="identifier">h</span><span class="plain"> | </span><span class="functiontext">ExcerptMeanings::hash_code</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">);</span>
|
|
|
|
<span class="identifier">LOGIF</span><span class="plain">(</span><span class="identifier">EXCERPT_PARSING</span><span class="plain">,</span>
|
|
<span class="string">"Parsing excerpt <%W> hash %08x mc $N mode %d\</span><span class="plain">n</span><span class="string">"</span><span class="plain">, </span><span class="identifier">W</span><span class="plain">, </span><span class="identifier">h</span><span class="plain">, </span><span class="identifier">mc_bitmap</span><span class="plain">, </span><span class="identifier">parsing_mode</span><span class="plain">);</span>
|
|
|
|
<span class="reserved">switch</span><span class="plain">(</span><span class="identifier">parsing_mode</span><span class="plain">) {</span>
|
|
<span class="reserved">case</span><span class="plain"> </span><span class="constant">EXACT_PM</span><span class="plain">: </span><<span class="cwebmacro">Enter exact parsing mode</span> <span class="cwebmacronumber">4.5</span>><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
|
|
<span class="reserved">case</span><span class="plain"> </span><span class="constant">MAXIMAL_PM</span><span class="plain">: </span><<span class="cwebmacro">Enter maximal parsing mode</span> <span class="cwebmacronumber">4.6</span>><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
|
|
<span class="reserved">case</span><span class="plain"> </span><span class="constant">PARAMETRISED_PM</span><span class="plain">: </span><<span class="cwebmacro">Enter parametrised parsing mode</span> <span class="cwebmacronumber">4.7</span>><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
|
|
<span class="reserved">case</span><span class="plain"> </span><span class="constant">SUBSET_PM</span><span class="plain">: </span><<span class="cwebmacro">Enter subset parsing mode</span> <span class="cwebmacronumber">4.8</span>><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
|
|
<span class="reserved">case</span><span class="plain"> 0: </span><span class="identifier">LOG</span><span class="plain">(</span><span class="string">"mc_bitmap: $N\</span><span class="plain">n</span><span class="string">"</span><span class="plain">, </span><span class="identifier">mc_bitmap</span><span class="plain">); </span><span class="identifier">internal_error</span><span class="plain">(</span><span class="string">"Unknown parsing mode"</span><span class="plain">);</span>
|
|
<span class="reserved">default</span><span class="plain">: </span><span class="identifier">LOG</span><span class="plain">(</span><span class="string">"pm: %08x mc_bitmap: $N\</span><span class="plain">n</span><span class="string">"</span><span class="plain">, </span><span class="identifier">parsing_mode</span><span class="plain">, </span><span class="identifier">mc_bitmap</span><span class="plain">); </span><span class="identifier">internal_error</span><span class="plain">(</span><span class="string">"Mixed parsing modes"</span><span class="plain">);</span>
|
|
<span class="plain">}</span>
|
|
|
|
<span class="identifier">LOGIF</span><span class="plain">(</span><span class="identifier">EXCERPT_PARSING</span><span class="plain">, </span><span class="string">"Completed:\</span><span class="plain">n</span><span class="string">$m"</span><span class="plain">, </span><span class="identifier">results</span><span class="plain">);</span>
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">loopy</span><span class="plain">; </span><span class="reserved">for</span><span class="plain"> (</span><span class="identifier">loopy</span><span class="plain"> = </span><span class="identifier">results</span><span class="plain">; </span><span class="identifier">loopy</span><span class="plain">; </span><span class="identifier">loopy</span><span class="plain"> = </span><span class="identifier">loopy</span><span class="plain">-></span><span class="identifier">next_alternative</span><span class="plain">)</span>
|
|
<span class="identifier">no_matched_ems</span><span class="plain">++;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">results</span><span class="plain">) </span><span class="identifier">no_successful_calls_to_parse_excerpt</span><span class="plain">++;</span>
|
|
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">results</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">The function ExParser::parse_excerpt_maximal appears nowhere else.</p>
|
|
|
|
<p class="endnote">The function ExParser::parse_excerpt is used in 3/adj (<a href="3-adj.html#SP6">§6</a>).</p>
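<p class="inwebparagraph">The wrapper function above saves the global flag, sets it, calls the general parser, and then restores the old value, so that the caller's setting survives the call. A minimal sketch of that pattern follows; the <code class="display"><span class="extract">toy_</span></code> names are hypothetical, and the stand-in parser merely reports which regime it ran under.</p>

```c
/* Stand-in for force_maximal_parsing (hypothetical name). */
static int toy_force_maximal = 0;

/* Stand-in for the general parser: returns 8 (maximal) or 1 (otherwise). */
int toy_parse(void) {
    return toy_force_maximal ? 8 : 1;
}

/* Save, set, call, restore: the flag is left exactly as the caller had it. */
int toy_parse_maximal(void) {
    int saved = toy_force_maximal;
    toy_force_maximal = 1;
    int r = toy_parse();
    toy_force_maximal = saved;
    return r;
}
```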
|
|
|
|
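<p class="inwebparagraph"><a id="SP4_1"></a><b>§4.1. </b>Maximal parsing is something of a special case: it is used only for adjective
lists, and we can only enter that mode by setting <code class="display"><span class="extract">force_maximal_parsing</span></code>, as
<code class="display"><span class="extract">ExParser::parse_excerpt_maximal</span></code> does; this overrides the bitmap entirely.
Otherwise, the parsing mode depends on which MC(s) are included in the bitmap.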
|
|
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Choose which parsing mode we should use, given the MC bitmap</span> <span class="cwebmacronumber">4.1</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="identifier">parsing_mode</span><span class="plain"> = 0;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">mc_bitmap</span><span class="plain"> & </span><span class="identifier">EXACT_PARSING_BITMAP</span><span class="plain">) </span><span class="identifier">parsing_mode</span><span class="plain"> |= </span><span class="constant">EXACT_PM</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">mc_bitmap</span><span class="plain"> & </span><span class="identifier">SUBSET_PARSING_BITMAP</span><span class="plain">) </span><span class="identifier">parsing_mode</span><span class="plain"> |= </span><span class="constant">SUBSET_PM</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">mc_bitmap</span><span class="plain"> & </span><span class="identifier">PARAMETRISED_PARSING_BITMAP</span><span class="plain">) </span><span class="identifier">parsing_mode</span><span class="plain"> |= </span><span class="constant">PARAMETRISED_PM</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">force_maximal_parsing</span><span class="plain">) </span><span class="identifier">parsing_mode</span><span class="plain"> = </span><span class="constant">MAXIMAL_PM</span><span class="plain">;</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP4">§4</a>.</p>
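<p class="inwebparagraph">To see why mixed bitmaps are rejected, here is a small sketch with invented meaning codes. The <code class="display"><span class="extract">TOY_</span></code> values and their assignment to modes are assumptions for illustration; the real bitmaps are defined elsewhere. Since each mode is a single bit, a usable mode is a nonzero power of two. The sketch tests that directly, whereas the code in §4 reaches the same verdict through its <code class="display"><span class="extract">switch</span></code>.</p>

```c
/* Hypothetical meaning codes, one bit each (values invented for this sketch). */
#define TOY_TABLE_MC         0x01u
#define TOY_TABLE_COLUMN_MC  0x02u
#define TOY_NOUN_MC          0x04u
#define TOY_PHRASE_MC        0x08u

/* Assumed assignment of meaning codes to parsing modes. */
#define TOY_EXACT_BITMAP         (TOY_TABLE_MC | TOY_TABLE_COLUMN_MC)
#define TOY_SUBSET_BITMAP        (TOY_NOUN_MC)
#define TOY_PARAMETRISED_BITMAP  (TOY_PHRASE_MC)

enum { TOY_EXACT_PM = 1, TOY_SUBSET_PM = 2, TOY_PARAMETRISED_PM = 4, TOY_MAXIMAL_PM = 8 };

/* Accumulate one mode bit per family of meaning codes present in the bitmap. */
int toy_choose_mode(unsigned int mc_bitmap, int force_maximal) {
    int mode = 0;
    if (mc_bitmap & TOY_EXACT_BITMAP) mode |= TOY_EXACT_PM;
    if (mc_bitmap & TOY_SUBSET_BITMAP) mode |= TOY_SUBSET_PM;
    if (mc_bitmap & TOY_PARAMETRISED_BITMAP) mode |= TOY_PARAMETRISED_PM;
    if (force_maximal) mode = TOY_MAXIMAL_PM;   /* overrides the bitmap */
    return mode;
}

/* Exactly one bit set: 0 means "unknown", two or more bits mean "mixed". */
int toy_mode_is_valid(int mode) {
    return mode != 0 && (mode & (mode - 1)) == 0;
}
```

<p class="inwebparagraph">Here <code class="display"><span class="extract">TOY_TABLE_MC + TOY_TABLE_COLUMN_MC</span></code> selects exact mode alone, while <code class="display"><span class="extract">TOY_TABLE_MC | TOY_NOUN_MC</span></code> sets two mode bits and is rejected.</p>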
|
|
|
|
<p class="inwebparagraph"><a id="SP4_2"></a><b>§4.2. </b>Recall that excerpt parsing is case insensitive except for the first word
|
|
of a text substitution, and then only when two definitions have been given,
|
|
one capitalising the word and the other not, or when the word is a single
|
|
letter long.
|
|
</p>
|
|
|
|
<p class="inwebparagraph">If we find the upper case form of such a text substitution, we set a special
|
|
bit in the hash code. (The upper and lower case forms are both registered as
|
|
excerpt meanings, with the same hash code except that one has this extra bit
|
|
set and the other hasn't.)
|
|
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Take note of casing on first word, in the few circumstances when we care</span> <span class="cwebmacronumber">4.2</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="plain">#</span><span class="identifier">ifdef</span><span class="plain"> </span><span class="identifier">EM_CASE_SENSITIVITY_TEST</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">EM_CASE_SENSITIVITY_TEST</span><span class="plain">(</span><span class="identifier">mc_bitmap</span><span class="plain">)) {</span>
|
|
<span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">tx</span><span class="plain"> = </span><span class="identifier">Lexer::word_raw_text</span><span class="plain">(</span><span class="identifier">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">));</span>
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">tx</span><span class="plain">[0]) && (</span><span class="identifier">Characters::isupper</span><span class="plain">(</span><span class="identifier">tx</span><span class="plain">[0])) &&</span>
|
|
<span class="plain">((</span><span class="identifier">tx</span><span class="plain">[1] == 0) || (</span><span class="identifier">Vocabulary::used_case_sensitively</span><span class="plain">(</span><span class="identifier">Lexer::word</span><span class="plain">(</span><span class="identifier">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">)))))) {</span>
|
|
<span class="identifier">h</span><span class="plain"> = </span><span class="identifier">h</span><span class="plain"> | </span><span class="constant">CAPITALISED_VARIANT_FORM</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">#</span><span class="identifier">endif</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP4">§4</a>.</p>
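<p class="inwebparagraph">The casing rule can be restated as a small function. This is a sketch under assumptions: it uses narrow <code class="display"><span class="extract">char</span></code> strings where the code above uses <code class="display"><span class="extract">wchar_t</span></code>, the bit value chosen for the flag is invented, and the <code class="display"><span class="extract">used_case_sensitively</span></code> argument stands in for the vocabulary lookup.</p>

```c
#include <ctype.h>

/* Hypothetical bit position; the real CAPITALISED_VARIANT_FORM is defined elsewhere. */
#define TOY_CAPITALISED_VARIANT_FORM 0x80000000u

/* Return the extra hash bit to set for the first word, or 0 if casing is ignored. */
unsigned int toy_casing_bit(const char *first_word, int used_case_sensitively) {
    if (first_word[0] == 0) return 0;
    if (!isupper((unsigned char) first_word[0])) return 0;
    /* Capitalised: it matters for a one-letter word, or a word defined both ways. */
    if (first_word[1] == 0 || used_case_sensitively)
        return TOY_CAPITALISED_VARIANT_FORM;
    return 0;
}
```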
|
|
|
|
<p class="inwebparagraph"><a id="SP4_3"></a><b>§4.3. </b>An initial article is always skipped unless we are looking at a phrase.
For a phrase, we are only allowed to skip an initial "the", and even then only
if we aren't looking for text substitutions.
|
|
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Skip an initial article most of the time</span> <span class="cwebmacronumber">4.3</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">parsing_mode</span><span class="plain"> & </span><span class="constant">PARAMETRISED_PM</span><span class="plain">) {</span>
|
|
<span class="plain">#</span><span class="identifier">ifdef</span><span class="plain"> </span><span class="identifier">EM_IGNORE_DEFINITE_ARTICLE_TEST</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">EM_IGNORE_DEFINITE_ARTICLE_TEST</span><span class="plain">(</span><span class="identifier">mc_bitmap</span><span class="plain">))</span>
|
|
<span class="plain">#</span><span class="identifier">endif</span>
|
|
<span class="identifier">W</span><span class="plain"> = </span><span class="functiontext">Articles::remove_the</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">);</span>
|
|
<span class="plain">} </span><span class="reserved">else</span><span class="plain"> {</span>
|
|
<span class="identifier">W</span><span class="plain"> = </span><span class="functiontext">Articles::remove_article</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">);</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP4">§4</a>.</p>
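<p class="inwebparagraph">The two branches can be sketched as a function returning how many leading words to drop. The article list, the <code class="display"><span class="extract">may_drop_the</span></code> flag (standing in for <code class="display"><span class="extract">EM_IGNORE_DEFINITE_ARTICLE_TEST</span></code>), and the choice never to strip the only word of an excerpt are all assumptions of this sketch, not documented behaviour of the <code class="display"><span class="extract">Articles</span></code> functions.</p>

```c
#include <string.h>

/* Returns 0 or 1: the number of leading words to skip. */
int toy_words_to_skip(const char **w, int n, int parametrised, int may_drop_the) {
    if (n < 2) return 0;   /* sketch assumption: never empty the excerpt */
    if (parametrised)      /* phrases: only "the", and only if permitted */
        return (may_drop_the && strcmp(w[0], "the") == 0) ? 1 : 0;
    /* all other modes: any article from an assumed list */
    return (strcmp(w[0], "the") == 0 || strcmp(w[0], "a") == 0 ||
            strcmp(w[0], "an") == 0 || strcmp(w[0], "some") == 0) ? 1 : 0;
}
```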
|
|
|
|
<p class="inwebparagraph"><a id="SP4_4"></a><b>§4.4. </b>When checking cases below, we are always going to consider only those
|
|
which have a meaning code among those we are looking for:
|
|
</p>
|
|
|
|
|
|
<pre class="definitions">
|
|
<span class="definitionkeyword">define</span> <span class="identifier">EXCERPT_MEANING_RELEVANT</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)</span>
|
|
<span class="plain">(</span><span class="identifier">no_meanings_tried</span><span class="plain">++, ((</span><span class="identifier">mc_bitmap</span><span class="plain"> & (</span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>meaning_code</span><span class="plain">))!=0))</span>
|
|
<span class="definitionkeyword">define</span> <span class="constant">EXAMINE_EXCERPT_MEANING_IN_DETAIL</span>
|
|
<span class="identifier">LOGIF</span><span class="plain">(</span><span class="identifier">EXCERPT_PARSING</span><span class="plain">,</span>
|
|
<span class="string">"Trying $M (parsing mode %d)\</span><span class="plain">n</span><span class="string">"</span><span class="plain">, </span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">), </span><span class="identifier">parsing_mode</span><span class="plain">);</span>
|
|
<span class="identifier">no_meanings_tried_in_detail</span><span class="plain">++;</span>
|
|
</pre>
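<p class="inwebparagraph">The first definition relies on C's comma operator: the counter is incremented as a side effect, and the value of the whole expression is the relevance test. A self-contained sketch of the same idiom, with hypothetical names and plain integer codes in place of excerpt meanings:</p>

```c
/* Stand-in for no_meanings_tried. */
static int toy_meanings_tried = 0;

/* Comma operator: bump the counter, then yield the result of the bitmap test. */
#define TOY_MEANING_RELEVANT(filter, code) \
    (toy_meanings_tried++, (((filter) & (code)) != 0))

/* Count how many codes pass the filter; every code examined is tallied. */
int toy_count_relevant(unsigned int filter, const unsigned int *codes, int n) {
    int relevant = 0;
    for (int i = 0; i < n; i++)
        if (TOY_MEANING_RELEVANT(filter, codes[i])) relevant++;
    return relevant;
}
```

<p class="inwebparagraph">Because the increment sits inside the condition, every meaning examined is counted whether or not it turns out to be relevant.</p>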
|
|
<p class="inwebparagraph"><a id="SP4_5"></a><b>§4.5. Exact parsing mode. </b>Exact matching is just what it sounds like: the match must be word
|
|
for word. Because of that, the excerpt meaning is guaranteed to be listed
|
|
under the start list of the first word, if it matches (because there cannot
|
|
be <code class="display"><span class="extract">#</span></code> tokens in the token list — if there were, we would be in parametrised
|
|
parsing mode).
|
|
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Enter exact parsing mode</span> <span class="cwebmacronumber">4.5</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">p</span><span class="plain">;</span>
|
|
<span class="identifier">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">v</span><span class="plain"> = </span><span class="identifier">Lexer::word</span><span class="plain">(</span><span class="identifier">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">));</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">v</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) </span><span class="identifier">internal_error</span><span class="plain">(</span><span class="string">"Unidentified word when parsing"</span><span class="plain">);</span>
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">v</span><span class="plain">-></span><span class="identifier">flags</span><span class="plain">) & </span><span class="identifier">mc_bitmap</span><span class="plain">)</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">p</span><span class="plain"> = </span><span class="identifier">v</span><span class="plain">-></span><span class="identifier">means</span><span class="element">.start_list</span><span class="plain">; </span><span class="identifier">p</span><span class="plain">; </span><span class="identifier">p</span><span class="plain"> = </span><span class="identifier">p</span><span class="plain">-></span><span class="identifier">next_alternative</span><span class="plain">)</span>
|
|
<<span class="cwebmacro">Try to match excerpt in exact parsing mode</span> <span class="cwebmacronumber">4.5.1</span>><span class="plain">;</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP4">§4</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP4_5_1"></a><b>§4.5.1. </b>In exact parsing, the hash codes must agree perfectly:
|
|
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Try to match excerpt in exact parsing mode</span> <span class="cwebmacronumber">4.5.1</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">EXCERPT_MEANING_RELEVANT</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">) && (</span><span class="identifier">h</span><span class="plain"> == </span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>excerpt_hash</span><span class="plain">)) {</span>
|
|
<span class="constant">EXAMINE_EXCERPT_MEANING_IN_DETAIL</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>no_em_tokens</span><span class="plain"> == </span><span class="identifier">Wordings::length</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">)) {</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">j</span><span class="plain">, </span><span class="identifier">k</span><span class="plain">, </span><span class="identifier">err</span><span class="plain">;</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">j</span><span class="plain">=0, </span><span class="identifier">k</span><span class="plain">=</span><span class="identifier">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">), </span><span class="identifier">err</span><span class="plain"> = </span><span class="identifier">FALSE</span><span class="plain">; </span><span class="identifier">j</span><span class="plain"><</span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>no_em_tokens</span><span class="plain">; </span><span class="identifier">j</span><span class="plain">++, </span><span class="identifier">k</span><span class="plain">++)</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>em_tokens</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">] != </span><span class="identifier">Lexer::word</span><span class="plain">(</span><span class="identifier">k</span><span class="plain">)) { </span><span class="identifier">err</span><span class="plain">=</span><span class="identifier">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">; }</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">err</span><span class="plain"> == </span><span class="identifier">FALSE</span><span class="plain">)</span>
|
|
<span class="identifier">results</span><span class="plain"> = </span><span class="functiontext">ExParser::result</span><span class="plain">(</span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">), 1, </span><span class="identifier">results</span><span class="plain">);</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP4_5">§4.5</a>.</p>
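<p class="inwebparagraph">The shape of this test is: reject cheaply on the hash, then on the token count, and only then compare word by word. A sketch of that ordering follows. The <code class="display"><span class="extract">toy_meaning</span></code> structure and <code class="display"><span class="extract">toy_hash</span></code> function are invented for illustration; in particular, the hash here is not the one computed by <code class="display"><span class="extract">ExcerptMeanings::hash_code</span></code>.</p>

```c
#include <string.h>

typedef struct toy_meaning {
    unsigned int hash;       /* precomputed hash of the token list */
    int no_tokens;
    const char *tokens[4];
} toy_meaning;

/* Hypothetical stand-in hash over all the words in order. */
unsigned int toy_hash(const char **w, int n) {
    unsigned int h = 0;
    for (int i = 0; i < n; i++)
        for (const char *p = w[i]; *p; p++)
            h = h * 31u + (unsigned char) *p;
    return h;
}

/* Returns 1 on an exact match; *detailed counts meanings examined in detail. */
int toy_exact(const toy_meaning *em, const char **w, int n, int *detailed) {
    if (em->hash != toy_hash(w, n)) return 0;   /* cheap rejection first */
    (*detailed)++;
    if (em->no_tokens != n) return 0;
    for (int j = 0; j < n; j++)
        if (strcmp(em->tokens[j], w[j]) != 0) return 0;
    return 1;
}
```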
|
|
|
|
<p class="inwebparagraph"><a id="SP4_6"></a><b>§4.6. Maximal parsing mode. </b></p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Enter maximal parsing mode</span> <span class="cwebmacronumber">4.6</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="identifier">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">v</span><span class="plain"> = </span><span class="identifier">Lexer::word</span><span class="plain">(</span><span class="identifier">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">));</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">v</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) </span><span class="identifier">internal_error</span><span class="plain">(</span><span class="string">"Unidentified word when parsing"</span><span class="plain">);</span>
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">v</span><span class="plain">-></span><span class="identifier">flags</span><span class="plain">) & </span><span class="identifier">mc_bitmap</span><span class="plain">) {</span>
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">p</span><span class="plain">, *</span><span class="identifier">best_p</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">; </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">best_score</span><span class="plain"> = 0;</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">p</span><span class="plain"> = </span><span class="identifier">v</span><span class="plain">-></span><span class="identifier">means</span><span class="element">.start_list</span><span class="plain">; </span><span class="identifier">p</span><span class="plain">; </span><span class="identifier">p</span><span class="plain"> = </span><span class="identifier">p</span><span class="plain">-></span><span class="identifier">next_alternative</span><span class="plain">)</span>
|
|
<<span class="cwebmacro">Try to match excerpt in maximal parsing mode</span> <span class="cwebmacronumber">4.6.1</span>><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">best_p</span><span class="plain">)</span>
|
|
<span class="identifier">results</span><span class="plain"> =</span>
|
|
<span class="functiontext">ExParser::result</span><span class="plain">(</span>
|
|
<span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">best_p</span><span class="plain">), </span><span class="identifier">best_score</span><span class="plain">, </span><span class="identifier">results</span><span class="plain">);</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP4">§4</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP4_6_1"></a><b>§4.6.1. </b>In maximal matching, we keep only the longest excerpt meaning whose tokens
exactly match the opening words of the wording; if two such matches have equal
length, we keep the first one found. (Ideally, such clashes should never occur.)
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Try to match excerpt in maximal parsing mode</span> <span class="cwebmacronumber">4.6.1</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">EXCERPT_MEANING_RELEVANT</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">) &&</span>
|
|
<span class="plain">((</span><span class="identifier">h</span><span class="plain"> & </span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>excerpt_hash</span><span class="plain">) == </span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>excerpt_hash</span><span class="plain">)) {</span>
|
|
<span class="constant">EXAMINE_EXCERPT_MEANING_IN_DETAIL</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>no_em_tokens</span><span class="plain"> <= </span><span class="identifier">Wordings::length</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">)) {</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">j</span><span class="plain">, </span><span class="identifier">k</span><span class="plain">, </span><span class="identifier">err</span><span class="plain">;</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">err</span><span class="plain">=</span><span class="identifier">FALSE</span><span class="plain">, </span><span class="identifier">j</span><span class="plain">=0, </span><span class="identifier">k</span><span class="plain">=</span><span class="identifier">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">); </span><span class="identifier">j</span><span class="plain"><</span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>no_em_tokens</span><span class="plain">; </span><span class="identifier">j</span><span class="plain">++, </span><span class="identifier">k</span><span class="plain">++)</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>em_tokens</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">] != </span><span class="identifier">Lexer::word</span><span class="plain">(</span><span class="identifier">k</span><span class="plain">)) { </span><span class="identifier">err</span><span class="plain"> = </span><span class="identifier">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">; }</span>
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">err</span><span class="plain"> == </span><span class="identifier">FALSE</span><span class="plain">) && (</span><span class="identifier">j</span><span class="plain">></span><span class="identifier">best_score</span><span class="plain">)) {</span>
|
|
<span class="identifier">best_p</span><span class="plain"> = </span><span class="identifier">p</span><span class="plain">; </span><span class="identifier">best_score</span><span class="plain"> = </span><span class="identifier">j</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP4_6">§4.6</a>.</p>
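<p class="inwebparagraph">The "longest match, first one wins on a tie" rule of §4.6.1 can be illustrated
in isolation. The following is only a minimal sketch under stated assumptions: <code>toy_meaning</code>
and <code>toy_longest_match</code> are hypothetical names invented for this illustration, and words are
compared as C strings rather than as vocabulary entries.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an excerpt meaning: a fixed sequence of words. */
typedef struct {
    const char **tokens;
    int no_tokens;
} toy_meaning;

/* Returns the index of the longest candidate whose tokens match the opening
   words of the text, or -1 if none does. On a tie the earlier candidate is
   kept, because a later one must be strictly longer to displace it. */
int toy_longest_match(const toy_meaning *cands, int no_cands,
                      const char **words, int no_words, int *best_score) {
    assert(cands != NULL && words != NULL && best_score != NULL);
    int best = -1;
    *best_score = 0;
    for (int c = 0; c < no_cands; c++) {
        if (cands[c].no_tokens > no_words) continue; /* too long to fit */
        int j, err = 0;
        for (j = 0; j < cands[c].no_tokens; j++)
            if (strcmp(cands[c].tokens[j], words[j]) != 0) { err = 1; break; }
        if ((err == 0) && (j > *best_score)) { best = c; *best_score = j; }
    }
    return best;
}
```

<p class="inwebparagraph">The strict comparison against the best score so far is what makes the first of
two equally long matches survive, as it does in the woven code above.</p>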
|
|
|
|
<p class="inwebparagraph"><a id="SP4_7"></a><b>§4.7. Parametrised parsing mode. </b>This is the only parsing mode in which arbitrary text may appear:
for example, any text X can stand in "award X points".
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Enter parametrised parsing mode</span> <span class="cwebmacronumber">4.7</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="identifier">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">v</span><span class="plain"> = </span><span class="identifier">Lexer::word</span><span class="plain">(</span><span class="identifier">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">));</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">v</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) </span><span class="identifier">internal_error</span><span class="plain">(</span><span class="string">"Unidentified word when parsing"</span><span class="plain">);</span>
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">p</span><span class="plain">;</span>
|
|
<span class="plain">#</span><span class="identifier">ifdef</span><span class="plain"> </span><span class="identifier">EM_ALLOW_BLANK_TEST</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">EM_ALLOW_BLANK_TEST</span><span class="plain">(</span><span class="identifier">mc_bitmap</span><span class="plain">)) {</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">p</span><span class="plain"> = </span><span class="identifier">blank_says_p</span><span class="plain">; </span><span class="identifier">p</span><span class="plain">; </span><span class="identifier">p</span><span class="plain"> = </span><span class="identifier">p</span><span class="plain">-></span><span class="identifier">next_alternative</span><span class="plain">) {</span>
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">this_result</span><span class="plain"> =</span>
|
|
<span class="identifier">ParseTree::new_with_words</span><span class="plain">(</span><span class="identifier">mc_bitmap</span><span class="plain">, </span><span class="identifier">W</span><span class="plain">);</span>
|
|
<span class="identifier">wording</span><span class="plain"> </span><span class="identifier">SW</span><span class="plain"> = </span><span class="identifier">ParseTree::get_text</span><span class="plain">(</span><span class="identifier">this_result</span><span class="plain">);</span>
|
|
<span class="identifier">ParseTree::copy</span><span class="plain">(</span><span class="identifier">this_result</span><span class="plain">, </span><span class="identifier">p</span><span class="plain">);</span>
|
|
<span class="identifier">ParseTree::set_text</span><span class="plain">(</span><span class="identifier">this_result</span><span class="plain">, </span><span class="identifier">SW</span><span class="plain">);</span>
|
|
<span class="identifier">this_result</span><span class="plain">-></span><span class="identifier">down</span><span class="plain"> = </span><span class="identifier">ParseTree::new_with_words</span><span class="plain">(</span><span class="identifier">UNKNOWN_NT</span><span class="plain">, </span><span class="identifier">W</span><span class="plain">);</span>
|
|
<span class="identifier">this_result</span><span class="plain">-></span><span class="identifier">next_alternative</span><span class="plain"> = </span><span class="identifier">results</span><span class="plain">;</span>
|
|
<span class="identifier">results</span><span class="plain"> = </span><span class="identifier">this_result</span><span class="plain">;</span>
|
|
<span class="identifier">no_meanings_tried</span><span class="plain">++, </span><span class="identifier">no_meanings_tried_in_detail</span><span class="plain">++;</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">#</span><span class="identifier">endif</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">p</span><span class="plain"> = </span><span class="identifier">v</span><span class="plain">-></span><span class="identifier">means</span><span class="element">.start_list</span><span class="plain">; </span><span class="identifier">p</span><span class="plain">; </span><span class="identifier">p</span><span class="plain"> = </span><span class="identifier">p</span><span class="plain">-></span><span class="identifier">next_alternative</span><span class="plain">)</span>
|
|
<<span class="cwebmacro">Try to match excerpt in parametrised parsing mode</span> <span class="cwebmacronumber">4.7.1</span>><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">Wordings::length</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">) > 1) {</span>
|
|
<span class="identifier">v</span><span class="plain"> = </span><span class="identifier">Lexer::word</span><span class="plain">(</span><span class="identifier">Wordings::last_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">));</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">v</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) </span><span class="identifier">internal_error</span><span class="plain">(</span><span class="string">"Unidentified word when parsing"</span><span class="plain">);</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">p</span><span class="plain"> = </span><span class="identifier">v</span><span class="plain">-></span><span class="identifier">means</span><span class="element">.end_list</span><span class="plain">; </span><span class="identifier">p</span><span class="plain">; </span><span class="identifier">p</span><span class="plain"> = </span><span class="identifier">p</span><span class="plain">-></span><span class="identifier">next_alternative</span><span class="plain">)</span>
|
|
<<span class="cwebmacro">Try to match excerpt in parametrised parsing mode</span> <span class="cwebmacronumber">4.7.1</span>><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="identifier">LOOP_THROUGH_WORDING</span><span class="plain">(</span><span class="identifier">i</span><span class="plain">, </span><span class="identifier">W</span><span class="plain">)</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">i</span><span class="plain"> > </span><span class="identifier">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">)) {</span>
|
|
<span class="identifier">v</span><span class="plain"> = </span><span class="identifier">Lexer::word</span><span class="plain">(</span><span class="identifier">i</span><span class="plain">);</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">v</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) </span><span class="identifier">internal_error</span><span class="plain">(</span><span class="string">"Unidentified word when parsing"</span><span class="plain">);</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">p</span><span class="plain"> = </span><span class="identifier">v</span><span class="plain">-></span><span class="identifier">means</span><span class="element">.middle_list</span><span class="plain">; </span><span class="identifier">p</span><span class="plain">; </span><span class="identifier">p</span><span class="plain"> = </span><span class="identifier">p</span><span class="plain">-></span><span class="identifier">next_alternative</span><span class="plain">)</span>
|
|
<<span class="cwebmacro">Try to match excerpt in parametrised parsing mode</span> <span class="cwebmacronumber">4.7.1</span>><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP4">§4</a>.</p>
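<p class="inwebparagraph">To make the "award X points" idea concrete, here is a much simplified sketch of
matching a pattern that contains gaps. It is an illustration only, not the parser's actual
method: <code>toy_range</code>, <code>toy_match</code> and <code>TOY_MAX_PARAMS</code> are hypothetical names,
words are plain C strings, a <code>NULL</code> pattern token marks a parameter, and bracket levels and
phrase options are ignored.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_PARAMS 8

/* A matched parameter: the inclusive range of word positions it covers. */
typedef struct { int from, to; } toy_range;

/* Pattern tokens are fixed words, or NULL where a parameter may appear.
   Returns the number of parameters matched, or -1 if the match fails. */
int toy_match(const char **pattern, int no_tokens,
              const char **words, int no_words, toy_range *params) {
    assert(pattern != NULL && words != NULL && params != NULL);
    int pos = 0, t = 0;
    for (int j = 0; j < no_tokens; j++) {
        if (pattern[j]) {
            /* a fixed word must match the text exactly at this position */
            if ((pos >= no_words) || (strcmp(pattern[j], words[pos]) != 0)) return -1;
            pos++;
        } else {
            if (t >= TOY_MAX_PARAMS) return -1;
            int end;
            if (j == no_tokens - 1) {
                end = no_words; /* a final parameter takes all remaining words */
            } else {
                if (pattern[j+1] == NULL) return -1; /* adjacent gaps unsupported here */
                /* otherwise run up to the next fixed word, taking at least one word */
                end = pos + 1;
                while ((end < no_words) && (strcmp(words[end], pattern[j+1]) != 0)) end++;
                if (end >= no_words) return -1;
            }
            if (end <= pos) return -1; /* a parameter may not be empty */
            params[t].from = pos; params[t].to = end - 1; t++;
            pos = end;
        }
    }
    return (pos == no_words) ? t : -1;
}
```

<p class="inwebparagraph">With the pattern "award", gap, "points" and the text "award the score points",
this sketch records one parameter covering the words "the score"; the text "award points" is
rejected because the gap would be empty.</p>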
|
|
|
|
<p class="inwebparagraph"><a id="SP4_7_1"></a><b>§4.7.1. </b>The data supplied here must be a pointer to a phrase, though the phrase
can be of any type.
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Try to match excerpt in parametrised parsing mode</span> <span class="cwebmacronumber">4.7.1</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">eh</span><span class="plain"> = </span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>excerpt_hash</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">EXCERPT_MEANING_RELEVANT</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">) &&</span>
|
|
<span class="plain">((</span><span class="identifier">h</span><span class="plain"> & </span><span class="identifier">eh</span><span class="plain">) == </span><span class="identifier">eh</span><span class="plain">) &&</span>
|
|
<span class="plain">((</span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>em_tokens</span><span class="plain">[0] == 0) ||</span>
|
|
<span class="plain">((</span><span class="identifier">h</span><span class="plain"> & </span><span class="constant">CAPITALISED_VARIANT_FORM</span><span class="plain">) == (</span><span class="identifier">eh</span><span class="plain"> & </span><span class="constant">CAPITALISED_VARIANT_FORM</span><span class="plain">)))) {</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">no_tokens_to_match</span><span class="plain"> = </span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>no_em_tokens</span><span class="plain">;</span>
|
|
<span class="identifier">wording</span><span class="plain"> </span><span class="identifier">saved_W</span><span class="plain"> = </span><span class="identifier">W</span><span class="plain">;</span>
|
|
<span class="identifier">wording</span><span class="plain"> </span><span class="identifier">params_W</span><span class="plain">[</span><span class="constant">MAX_TOKENS_PER_EXCERPT_MEANING</span><span class="plain">];</span>
|
|
<span class="plain">#</span><span class="identifier">ifdef</span><span class="plain"> </span><span class="identifier">CORE_MODULE</span>
|
|
<span class="identifier">wording</span><span class="plain"> </span><span class="identifier">ph_opt_W</span><span class="plain"> = </span><span class="identifier">EMPTY_WORDING</span><span class="plain">;</span>
|
|
<span class="plain">#</span><span class="identifier">endif</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">bl</span><span class="plain">; </span> <span class="comment">the "bracket level" (0 for unbracketed, 1 for inside one pair, etc.)</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">j</span><span class="plain">, </span><span class="identifier">scan_pos</span><span class="plain">, </span><span class="identifier">t</span><span class="plain">, </span><span class="identifier">err</span><span class="plain">;</span>
|
|
<span class="constant">EXAMINE_EXCERPT_MEANING_IN_DETAIL</span><span class="plain">;</span>
|
|
|
|
<<span class="cwebmacro">Look through to see if there are phrase options at the end</span> <span class="cwebmacronumber">4.7.1.1</span>><span class="plain">;</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">err</span><span class="plain">=</span><span class="identifier">FALSE</span><span class="plain">, </span><span class="identifier">j</span><span class="plain">=0, </span><span class="identifier">scan_pos</span><span class="plain">=</span><span class="identifier">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">), </span><span class="identifier">t</span><span class="plain">=0, </span><span class="identifier">bl</span><span class="plain">=0;</span>
|
|
<span class="plain">(</span><span class="identifier">j</span><span class="plain"><</span><span class="identifier">no_tokens_to_match</span><span class="plain">) && (</span><span class="identifier">scan_pos</span><span class="plain"><=</span><span class="identifier">Wordings::last_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">)); </span><span class="identifier">j</span><span class="plain">++) {</span>
|
|
<span class="identifier">LOGIF</span><span class="plain">(</span><span class="identifier">EXCERPT_PARSING</span><span class="plain">, </span><span class="string">"j=%d, scan_pos=%d, t=%d\</span><span class="plain">n</span><span class="string">"</span><span class="plain">, </span><span class="identifier">j</span><span class="plain">, </span><span class="identifier">scan_pos</span><span class="plain">, </span><span class="identifier">t</span><span class="plain">);</span>
|
|
<span class="identifier">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">this_word</span><span class="plain"> = </span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>em_tokens</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">];</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">this_word</span><span class="plain">) </span><<span class="cwebmacro">We're required to match a fixed word</span> <span class="cwebmacronumber">4.7.1.2</span>>
|
|
<span class="reserved">else</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> (</span><span class="identifier">j</span><span class="plain"> == </span><span class="identifier">no_tokens_to_match</span><span class="plain">-1)</span>
|
|
<<span class="cwebmacro">We're required to match a parameter at the excerpt's end</span> <span class="cwebmacronumber">4.7.1.3</span>>
|
|
<span class="reserved">else</span>
|
|
<<span class="cwebmacro">We're required to match a parameter before the excerpt's end</span> <span class="cwebmacronumber">4.7.1.4</span>><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="identifier">LOGIF</span><span class="plain">(</span><span class="identifier">EXCERPT_PARSING</span><span class="plain">, </span><span class="string">"outcome has err=%d (hash here %08x)\</span><span class="plain">n</span><span class="string">"</span><span class="plain">, </span><span class="identifier">err</span><span class="plain">, </span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>excerpt_hash</span><span class="plain">);</span>
|
|
<<span class="cwebmacro">Check the matched parameters for sanity</span> <span class="cwebmacronumber">4.7.1.5</span>><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">err</span><span class="plain"> == </span><span class="identifier">FALSE</span><span class="plain">) </span><<span class="cwebmacro">Record a successful parametrised match</span> <span class="cwebmacronumber">4.7.1.6</span>><span class="plain">;</span>
|
|
<span class="identifier">W</span><span class="plain"> = </span><span class="identifier">saved_W</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP4_7">§4.7</a> (three times).</p>
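<p class="inwebparagraph">The guard <code>((h &amp; eh) == eh)</code> in the code above lets a candidate through
only if every bit set in its excerpt hash <code>eh</code> is also set in <code>h</code>, which appears
to be a hash of the wording being parsed. The sketch below shows this kind of bitmask prefilter
in isolation. The hashing function is purely an assumption made for illustration (it is not
Inform's), and all of the <code>toy_</code> names are hypothetical.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical word hash: maps each word to one of 32 bits. This is an
   assumption for illustration only, not the hashing Inform actually uses. */
unsigned int toy_word_bit(const char *word) {
    assert(word != NULL);
    unsigned int sum = 0;
    for (const char *p = word; *p; p++) sum = sum * 31u + (unsigned char) *p;
    return 1u << (sum % 32u);
}

/* The hash of a run of words is the bitwise OR of the bits of its words. */
unsigned int toy_hash_words(const char **words, int no_words) {
    assert(words != NULL);
    unsigned int h = 0;
    for (int i = 0; i < no_words; i++) h |= toy_word_bit(words[i]);
    return h;
}

/* A candidate needing bits eh can match text with bits h only if eh is a
   subset of h. Passing this test does not guarantee a match (bits can
   collide), but failing it rules the candidate out cheaply. */
int toy_could_match(unsigned int h, unsigned int eh) {
    return ((h & eh) == eh);
}
```

<p class="inwebparagraph">Under these assumptions, a candidate whose fixed words all occur in the text always
passes the test, so the filter never discards a genuine match; it only saves the cost of
examining hopeless candidates in detail.</p>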
|
|
|
|
<p class="inwebparagraph"><a id="SP4_7_1_1"></a><b>§4.7.1.1. </b><code class="display">
|
|
<<span class="cwebmacrodefn">Look through to see if there are phrase options at the end</span> <span class="cwebmacronumber">4.7.1.1</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="plain">#</span><span class="identifier">ifdef</span><span class="plain"> </span><span class="identifier">CORE_MODULE</span>
|
|
<span class="identifier">phrase</span><span class="plain"> *</span><span class="identifier">ph</span><span class="plain"> = </span><span class="identifier">Routines::ToPhrases::meaning_as_phrase</span><span class="plain">(</span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">));</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">Routines::ToPhrases::allows_options</span><span class="plain">(</span><span class="identifier">ph</span><span class="plain">)) {</span>
|
|
<span class="identifier">LOGIF</span><span class="plain">(</span><span class="identifier">EXCERPT_PARSING</span><span class="plain">, </span><span class="string">"Looking for phrase options\</span><span class="plain">n</span><span class="string">"</span><span class="plain">);</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">bl</span><span class="plain">=0, </span><span class="identifier">scan_pos</span><span class="plain">=</span><span class="identifier">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">)+1; </span><span class="identifier">scan_pos</span><span class="plain"><</span><span class="identifier">Wordings::last_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">); </span><span class="identifier">scan_pos</span><span class="plain">++) {</span>
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">Lexer::word</span><span class="plain">(</span><span class="identifier">scan_pos</span><span class="plain">) == </span><span class="identifier">COMMA_V</span><span class="plain">) && (</span><span class="identifier">bl</span><span class="plain">==0)) {</span>
|
|
<span class="identifier">ph_opt_W</span><span class="plain"> = </span><span class="identifier">Wordings::from</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">, </span><span class="identifier">scan_pos</span><span class="plain">+1);</span>
|
|
<span class="identifier">W</span><span class="plain"> = </span><span class="identifier">Wordings::up_to</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">, </span><span class="identifier">scan_pos</span><span class="plain">-1);</span>
|
|
<span class="identifier">LOGIF</span><span class="plain">(</span><span class="identifier">EXCERPT_PARSING</span><span class="plain">, </span><span class="string">"Found phrase options <%W>\</span><span class="plain">n</span><span class="string">"</span><span class="plain">, </span><span class="identifier">ph_opt_W</span><span class="plain">);</span>
|
|
<span class="reserved">break</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<<span class="cwebmacro">Maintain bracket level</span> <span class="cwebmacronumber">4.7.1.1.1</span>><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">#</span><span class="identifier">endif</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP4_7_1">§4.7.1</a>.</p>
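<p class="inwebparagraph">The search for phrase options in §4.7.1.1 amounts to finding the first comma which
is not inside brackets, looking only strictly between the first and last words. A minimal
sketch follows, assuming that the bracket level counts round brackets (the "Maintain bracket
level" code is not shown in this section) and using the hypothetical name
<code>toy_find_options_comma</code> with words held as C strings.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the position of the first comma found at bracket level 0, looking
   only at positions strictly between the first and last word, or -1 if there
   is none. As in the woven code, the comma test comes before the bracket
   level is updated for the current word. */
int toy_find_options_comma(const char **words, int no_words) {
    assert(words != NULL);
    int bl = 0; /* bracket level: 0 when outside all brackets */
    for (int pos = 1; pos < no_words - 1; pos++) {
        if ((strcmp(words[pos], ",") == 0) && (bl == 0)) return pos;
        if (strcmp(words[pos], "(") == 0) bl++;
        if (strcmp(words[pos], ")") == 0) bl--;
    }
    return -1;
}
```

<p class="inwebparagraph">If a comma is found at position <code>c</code>, the words after <code>c</code> would play the role
of the phrase options and the words before <code>c</code> that of the shortened wording, corresponding
to <code>ph_opt_W</code> and <code>W</code> above.</p>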
|
|
|
|
<p class="inwebparagraph"><a id="SP4_7_1_2"></a><b>§4.7.1.2. </b><code class="display">
|
|
<<span class="cwebmacrodefn">We're required to match a fixed word</span> <span class="cwebmacronumber">4.7.1.2</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">this_word</span><span class="plain"> != </span><span class="identifier">Lexer::word</span><span class="plain">(</span><span class="identifier">scan_pos</span><span class="plain">)) { </span><span class="identifier">err</span><span class="plain">=</span><span class="identifier">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">; }</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">this_word</span><span class="plain"> == </span><span class="identifier">word_to_suppress_in_phrases</span><span class="plain">) { </span><span class="identifier">err</span><span class="plain">=</span><span class="identifier">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">; }</span>
|
|
<span class="identifier">scan_pos</span><span class="plain">++;</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP4_7_1">§4.7.1</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP4_7_1_3"></a><b>§4.7.1.3. </b><code class="display">
|
|
<<span class="cwebmacrodefn">We're required to match a parameter at the excerpt's end</span> <span class="cwebmacronumber">4.7.1.3</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="identifier">params_W</span><span class="plain">[</span><span class="identifier">t</span><span class="plain">++] = </span><span class="identifier">Wordings::from</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">, </span><span class="identifier">scan_pos</span><span class="plain">);</span>
|
|
<span class="identifier">scan_pos</span><span class="plain"> = </span><span class="identifier">Wordings::last_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">) + 1;</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP4_7_1">§4.7.1</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP4_7_1_4"></a><b>§4.7.1.4. </b><code class="display">
|
|
<<span class="cwebmacrodefn">We're required to match a parameter before the excerpt's end</span> <span class="cwebmacronumber">4.7.1.4</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">fixed_words_at_end</span><span class="plain"> = 0;</span>
|
|
<span class="reserved">for</span><span class="plain"> (; </span><span class="identifier">j</span><span class="plain">+1+</span><span class="identifier">fixed_words_at_end</span><span class="plain"> < </span><span class="identifier">no_tokens_to_match</span><span class="plain">; </span><span class="identifier">fixed_words_at_end</span><span class="plain">++)</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>em_tokens</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">+1+</span><span class="identifier">fixed_words_at_end</span><span class="plain">] == </span><span class="identifier">NULL</span><span class="plain">) {</span>
|
|
<span class="identifier">fixed_words_at_end</span><span class="plain"> = 0; </span><span class="reserved">break</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">fixed_words_at_end</span><span class="plain"> > 0) {</span>
|
|
<span class="identifier">params_W</span><span class="plain">[</span><span class="identifier">t</span><span class="plain">++] = </span><span class="identifier">Wordings::new</span><span class="plain">(</span><span class="identifier">scan_pos</span><span class="plain">, </span><span class="identifier">Wordings::last_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">) - </span><span class="identifier">fixed_words_at_end</span><span class="plain">);</span>
|
|
<span class="identifier">scan_pos</span><span class="plain"> = </span><span class="identifier">Wordings::last_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">) - </span><span class="identifier">fixed_words_at_end</span><span class="plain"> + 1;</span>
|
|
<span class="plain">} </span><span class="reserved">else</span><span class="plain"> {</span>
|
|
<span class="identifier">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">sentinel</span><span class="plain"> = </span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>em_tokens</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">+1];</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">bl_initial</span><span class="plain"> = </span><span class="identifier">bl</span><span class="plain">;</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">start_word</span><span class="plain"> = </span><span class="identifier">scan_pos</span><span class="plain">;</span>
|
|
<span class="identifier">err</span><span class="plain"> = </span><span class="identifier">TRUE</span><span class="plain">;</span>
|
|
<span class="reserved">while</span><span class="plain"> (</span><span class="identifier">scan_pos</span><span class="plain"> <= </span><span class="identifier">Wordings::last_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">)) {</span>
|
|
<<span class="cwebmacro">Maintain bracket level</span> <span class="cwebmacronumber">4.7.1.1.1</span>><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">bl</span><span class="plain"> == </span><span class="identifier">bl_initial</span><span class="plain">) && (</span><span class="identifier">scan_pos</span><span class="plain"> > </span><span class="identifier">start_word</span><span class="plain">) &&</span>
|
|
<span class="plain">(</span><span class="identifier">sentinel</span><span class="plain"> == </span><span class="identifier">Lexer::word</span><span class="plain">(</span><span class="identifier">scan_pos</span><span class="plain">))) { </span><span class="identifier">err</span><span class="plain"> = </span><span class="identifier">FALSE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">; }</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">bl</span><span class="plain"> < </span><span class="identifier">bl_initial</span><span class="plain">) </span><span class="reserved">break</span><span class="plain">;</span>
|
|
<span class="identifier">scan_pos</span><span class="plain">++;</span>
|
|
<span class="plain">}</span>
|
|
<span class="identifier">params_W</span><span class="plain">[</span><span class="identifier">t</span><span class="plain">++] = </span><span class="identifier">Wordings::new</span><span class="plain">(</span><span class="identifier">start_word</span><span class="plain">, </span><span class="identifier">scan_pos</span><span class="plain">-1);</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP4_7_1">§4.7.1</a>.</p>
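<p class="inwebparagraph">The scan for a free-text parameter up to its sentinel word can be modelled in isolation. The following is only a simplified sketch, not the Inform code: words are plain C strings rather than <code class="display"><span class="extract">vocabulary_entry</span></code> pointers, the initial bracket level is taken to be 0, and the names <code class="display"><span class="extract">maintain_bracket_level</span></code> and <code class="display"><span class="extract">scan_to_sentinel</span></code> are hypothetical.
</p>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for "Maintain bracket level": adjust *bl for word w. */
static void maintain_bracket_level(const char *w, int *bl) {
    if (strcmp(w, "(") == 0 || strcmp(w, "{") == 0) (*bl)++;
    if (strcmp(w, ")") == 0 || strcmp(w, "}") == 0) (*bl)--;
}

/* Scan words[start..last] for the sentinel, which must occur at the bracket
   level the scan began on and must leave at least one word before it.
   Returns the position of the sentinel, so that the parameter occupies
   words[start..result-1], or -1 if no usable sentinel was found. */
int scan_to_sentinel(const char **words, int start, int last, const char *sentinel) {
    assert(words != NULL && sentinel != NULL);
    int bl = 0;
    for (int pos = start; pos <= last; pos++) {
        maintain_bracket_level(words[pos], &bl);
        if (bl == 0 && pos > start && strcmp(words[pos], sentinel) == 0) return pos;
        if (bl < 0) return -1; /* closed a bracket we never opened */
    }
    return -1;
}
```

<p class="inwebparagraph">In this model a sentinel inside brackets is ignored, so the parameter runs on until the sentinel next appears at the outer level.
</p>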
|
|
|
|
<p class="inwebparagraph"><a id="SP4_7_1_5"></a><b>§4.7.1.5. </b><code class="display">
|
|
<<span class="cwebmacrodefn">Check the matched parameters for sanity</span> <span class="cwebmacronumber">4.7.1.5</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">x</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">j</span><span class="plain"><</span><span class="identifier">no_tokens_to_match</span><span class="plain">) </span><span class="identifier">err</span><span class="plain"> = </span><span class="identifier">TRUE</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">scan_pos</span><span class="plain"> <= </span><span class="identifier">Wordings::last_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">)) </span><span class="identifier">err</span><span class="plain"> = </span><span class="identifier">TRUE</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">err</span><span class="plain"> == </span><span class="identifier">FALSE</span><span class="plain">)</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">x</span><span class="plain">=0; </span><span class="identifier">x</span><span class="plain"><</span><span class="identifier">t</span><span class="plain">; </span><span class="identifier">x</span><span class="plain">++) {</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">Wordings::empty</span><span class="plain">(</span><span class="identifier">params_W</span><span class="plain">[</span><span class="identifier">x</span><span class="plain">])) </span><span class="identifier">err</span><span class="plain"> = </span><span class="identifier">TRUE</span><span class="plain">;</span>
|
|
<span class="reserved">else</span><span class="plain"> {</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">bl</span><span class="plain"> = 0;</span>
|
|
<span class="identifier">LOOP_THROUGH_WORDING</span><span class="plain">(</span><span class="identifier">scan_pos</span><span class="plain">, </span><span class="identifier">params_W</span><span class="plain">[</span><span class="identifier">x</span><span class="plain">]) {</span>
|
|
<<span class="cwebmacro">Maintain bracket level</span> <span class="cwebmacronumber">4.7.1.1.1</span>><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">bl</span><span class="plain"> < 0) </span><span class="identifier">err</span><span class="plain"> = </span><span class="identifier">TRUE</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">bl</span><span class="plain"> != 0) </span><span class="identifier">err</span><span class="plain"> = </span><span class="identifier">TRUE</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP4_7_1">§4.7.1</a>.</p>
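<p class="inwebparagraph">The per-parameter part of this sanity check can be sketched on its own. This is an illustrative model under stated assumptions, not the Inform code: a parameter is an array of C strings, and <code class="display"><span class="extract">parameter_is_sane</span></code> is a hypothetical name. As in the code above, round and curly brackets share a single counter.
</p>

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if words[0..n-1] is non-empty, its bracket level never drops
   below zero, and the level is back at zero by the end; otherwise 0. */
int parameter_is_sane(const char **words, int n) {
    assert(n == 0 || words != NULL);
    if (n <= 0) return 0; /* an empty parameter is an error */
    int bl = 0;
    for (int i = 0; i < n; i++) {
        if (strcmp(words[i], "(") == 0 || strcmp(words[i], "{") == 0) bl++;
        if (strcmp(words[i], ")") == 0 || strcmp(words[i], "}") == 0) bl--;
        if (bl < 0) return 0; /* closed before opening */
    }
    return bl == 0; /* everything opened must be closed */
}
```

<p class="inwebparagraph">Because only one counter is kept, a mismatched pair such as an open bracket closed by a brace still counts as balanced in this sketch.
</p>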
|
|
|
|
<p class="inwebparagraph"><a id="SP4_7_1_1_1"></a><b>§4.7.1.1.1. </b>Monitor bracket level:
|
|
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Maintain bracket level</span> <span class="cwebmacronumber">4.7.1.1.1</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">Lexer::word</span><span class="plain">(</span><span class="identifier">scan_pos</span><span class="plain">) == </span><span class="identifier">OPENBRACKET_V</span><span class="plain">) || (</span><span class="identifier">Lexer::word</span><span class="plain">(</span><span class="identifier">scan_pos</span><span class="plain">) == </span><span class="identifier">OPENBRACE_V</span><span class="plain">)) </span><span class="identifier">bl</span><span class="plain">++;</span>
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">Lexer::word</span><span class="plain">(</span><span class="identifier">scan_pos</span><span class="plain">) == </span><span class="identifier">CLOSEBRACKET_V</span><span class="plain">) || (</span><span class="identifier">Lexer::word</span><span class="plain">(</span><span class="identifier">scan_pos</span><span class="plain">) == </span><span class="identifier">CLOSEBRACE_V</span><span class="plain">)) </span><span class="identifier">bl</span><span class="plain">--;</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP4_7_1_1">§4.7.1.1</a>, <a href="#SP4_7_1_4">§4.7.1.4</a>, <a href="#SP4_7_1_5">§4.7.1.5</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP4_7_1_6"></a><b>§4.7.1.6. </b>A happy ending. We add the result to our linked list, annotating it with
|
|
nodes for the parameters and any phrase options.
|
|
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Record a successful parametrised match</span> <span class="cwebmacronumber">4.7.1.6</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">last_param</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">this_result</span><span class="plain"> =</span>
|
|
<span class="identifier">ParseTree::new_with_words</span><span class="plain">(</span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>meaning_code</span><span class="plain">, </span><span class="identifier">W</span><span class="plain">);</span>
|
|
<span class="identifier">ParseTree::set_meaning</span><span class="plain">(</span><span class="identifier">this_result</span><span class="plain">, </span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">));</span>
|
|
<span class="identifier">this_result</span><span class="plain">-></span><span class="identifier">next_alternative</span><span class="plain"> = </span><span class="identifier">results</span><span class="plain">;</span>
|
|
<span class="identifier">ParseTree::set_score</span><span class="plain">(</span><span class="identifier">this_result</span><span class="plain">, 1);</span>
|
|
<span class="plain">#</span><span class="identifier">ifdef</span><span class="plain"> </span><span class="identifier">CORE_MODULE</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">Wordings::nonempty</span><span class="plain">(</span><span class="identifier">ph_opt_W</span><span class="plain">)) {</span>
|
|
<span class="identifier">this_result</span><span class="plain">-></span><span class="identifier">down</span><span class="plain"> = </span><span class="identifier">ParseTree::new_with_words</span><span class="plain">(</span><span class="identifier">UNKNOWN_NT</span><span class="plain">, </span><span class="identifier">ph_opt_W</span><span class="plain">);</span>
|
|
<span class="identifier">ParseTree::annotate_int</span><span class="plain">(</span><span class="identifier">this_result</span><span class="plain">-></span><span class="identifier">down</span><span class="plain">, </span><span class="identifier">is_phrase_option_ANNOT</span><span class="plain">, </span><span class="identifier">TRUE</span><span class="plain">);</span>
|
|
<span class="identifier">last_param</span><span class="plain"> = </span><span class="identifier">this_result</span><span class="plain">-></span><span class="identifier">down</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">#</span><span class="identifier">endif</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">x</span><span class="plain">=0; </span><span class="identifier">x</span><span class="plain"><</span><span class="identifier">t</span><span class="plain">; </span><span class="identifier">x</span><span class="plain">++) {</span>
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">p2</span><span class="plain">;</span>
|
|
<span class="identifier">p2</span><span class="plain"> = </span><span class="identifier">ParseTree::new_with_words</span><span class="plain">(</span><span class="identifier">UNKNOWN_NT</span><span class="plain">, </span><span class="identifier">params_W</span><span class="plain">[</span><span class="identifier">x</span><span class="plain">]);</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">last_param</span><span class="plain">) </span><span class="identifier">last_param</span><span class="plain">-></span><span class="identifier">next</span><span class="plain"> = </span><span class="identifier">p2</span><span class="plain">;</span>
|
|
<span class="reserved">else</span><span class="plain"> </span><span class="identifier">this_result</span><span class="plain">-></span><span class="identifier">down</span><span class="plain"> = </span><span class="identifier">p2</span><span class="plain">;</span>
|
|
<span class="identifier">last_param</span><span class="plain"> = </span><span class="identifier">p2</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="identifier">results</span><span class="plain"> = </span><span class="identifier">this_result</span><span class="plain">;</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP4_7_1">§4.7.1</a>.</p>
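<p class="inwebparagraph">The shape of the subtree built here, with an optional phrase-option node first and the parameter nodes chained after it, can be shown with a small stand-alone model. The <code class="display"><span class="extract">node</span></code> type and the functions <code class="display"><span class="extract">new_node</span></code> and <code class="display"><span class="extract">build_result</span></code> are hypothetical simplifications and are not part of the parse tree API.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, much reduced stand-in for parse_node. */
typedef struct node {
    const char *text;      /* the wording this node covers */
    int is_phrase_option;  /* models the is_phrase_option annotation */
    struct node *down;     /* first child */
    struct node *next;     /* next sibling */
} node;

static node *new_node(const char *text) {
    node *n = calloc(1, sizeof(node));
    if (n == NULL) abort();
    n->text = text;
    return n;
}

/* Build a result whose children are the phrase option (if option is not
   NULL) followed by the t parameters, in order. */
node *build_result(const char *whole, const char *option, const char **params, int t) {
    assert(t >= 0);
    node *result = new_node(whole);
    node *last = NULL;
    if (option != NULL) {
        result->down = new_node(option);
        result->down->is_phrase_option = 1;
        last = result->down;
    }
    for (int x = 0; x < t; x++) {
        node *p = new_node(params[x]);
        if (last) last->next = p; else result->down = p;
        last = p;
    }
    return result;
}
```

<p class="inwebparagraph">The sketch never frees its nodes, which is acceptable for a demonstration but not for anything longer-lived.
</p>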
|
|
|
|
<p class="inwebparagraph"><a id="SP4_8"></a><b>§4.8. Subset parsing mode. </b>In subset mode, each possible match is kept, and is assigned a numerical
|
|
score based purely on the number of words in the full description which were
|
|
missed out. This makes "door" a better match against "door" (0 words
|
|
missed out) than against "green door" (1 word missed out).
|
|
</p>
|
|
|
|
<p class="inwebparagraph">Note that a single word which also has a meaning as a number is never
|
|
matched. This is so that "11" (say) cannot be misinterpreted as an
|
|
abbreviated form of an object name like "Chamber 11".
|
|
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Enter subset parsing mode</span> <span class="cwebmacronumber">4.8</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">Wordings::length</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">) == 1) &&</span>
|
|
<span class="plain">((</span><span class="identifier">Vocabulary::test_flags</span><span class="plain">(</span><span class="identifier">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">), </span><span class="identifier">NUMBER_MC</span><span class="plain">)) != 0)) </span><span class="reserved">goto</span><span class="plain"> </span><span class="identifier">SubsetFailed</span><span class="plain">;</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">j</span><span class="plain"> = -1, </span><span class="identifier">k</span><span class="plain"> = -1;</span>
|
|
<span class="identifier">LOOP_THROUGH_WORDING</span><span class="plain">(</span><span class="identifier">i</span><span class="plain">, </span><span class="identifier">W</span><span class="plain">) {</span>
|
|
<span class="identifier">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">v</span><span class="plain"> = </span><span class="identifier">Lexer::word</span><span class="plain">(</span><span class="identifier">i</span><span class="plain">);</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">v</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) </span><span class="identifier">internal_error</span><span class="plain">(</span><span class="string">"Unidentified word when parsing"</span><span class="plain">);</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">Preform::test_vocabulary</span><span class="plain">(</span><span class="identifier">v</span><span class="plain">, <</span><span class="identifier">article</span><span class="plain">>)) </span><span class="reserved">continue</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">v</span><span class="plain">-></span><span class="identifier">means</span><span class="element">.subset_list_length</span><span class="plain"> == 0) </span><span class="reserved">goto</span><span class="plain"> </span><span class="identifier">SubsetFailed</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">v</span><span class="plain">-></span><span class="identifier">means</span><span class="element">.subset_list_length</span><span class="plain"> > </span><span class="identifier">j</span><span class="plain">) { </span><span class="identifier">j</span><span class="plain"> = </span><span class="identifier">v</span><span class="plain">-></span><span class="identifier">means</span><span class="element">.subset_list_length</span><span class="plain">; </span><span class="identifier">k</span><span class="plain"> = </span><span class="identifier">i</span><span class="plain">; }</span>
|
|
<span class="plain">}</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">k</span><span class="plain"> >= 0) {</span>
|
|
<span class="identifier">vocabulary_entry</span><span class="plain"> *</span><span class="identifier">v</span><span class="plain"> = </span><span class="identifier">Lexer::word</span><span class="plain">(</span><span class="identifier">k</span><span class="plain">);</span>
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">p</span><span class="plain">;</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">p</span><span class="plain"> = </span><span class="identifier">v</span><span class="plain">-></span><span class="identifier">means</span><span class="element">.subset_list</span><span class="plain">; </span><span class="identifier">p</span><span class="plain">; </span><span class="identifier">p</span><span class="plain"> = </span><span class="identifier">p</span><span class="plain">-></span><span class="identifier">next_alternative</span><span class="plain">)</span>
|
|
<<span class="cwebmacro">Try to match excerpt in subset parsing mode</span> <span class="cwebmacronumber">4.8.1</span>><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="identifier">SubsetFailed</span><span class="plain">: ;</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP4">§4</a>.</p>
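<p class="inwebparagraph">The choice of which word's subset list to search can be modelled as below. This is a sketch under simplifying assumptions: each word is reduced to the length of its subset list and a flag saying whether it is an article, the single-number test is left out, and <code class="display"><span class="extract">choose_subset_word</span></code> is a hypothetical name.
</p>

```c
#include <assert.h>

/* list_len[i]:   number of subset-mode meanings filed under word i
   is_article[i]: nonzero if word i is an article, and so is skipped
   Returns the index of the word whose list will be searched, or -1 if
   the excerpt cannot match anything in subset mode. */
int choose_subset_word(const int *list_len, const int *is_article, int n) {
    assert(n >= 0);
    int j = -1, k = -1;
    for (int i = 0; i < n; i++) {
        if (is_article[i]) continue;
        if (list_len[i] == 0) return -1; /* a word with no meanings: give up */
        if (list_len[i] > j) { j = list_len[i]; k = i; }
    }
    return k;
}
```

<p class="inwebparagraph">An excerpt made up only of articles leaves no word chosen, so nothing is searched.
</p>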
|
|
|
|
<p class="inwebparagraph"><a id="SP4_8_1"></a><b>§4.8.1. </b><code class="display">
|
|
<<span class="cwebmacrodefn">Try to match excerpt in subset parsing mode</span> <span class="cwebmacronumber">4.8.1</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">EXCERPT_MEANING_RELEVANT</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">) && ((</span><span class="identifier">h</span><span class="plain"> & </span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>excerpt_hash</span><span class="plain">) == </span><span class="identifier">h</span><span class="plain">)) {</span>
|
|
<span class="constant">EXAMINE_EXCERPT_MEANING_IN_DETAIL</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">Wordings::length</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">) <= </span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>no_em_tokens</span><span class="plain">) {</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">err</span><span class="plain"> = </span><span class="identifier">FALSE</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>meaning_code</span><span class="plain"> == </span><span class="constant">NOUN_MC</span><span class="plain">) {</span>
|
|
<span class="reserved">noun</span><span class="plain"> *</span><span class="identifier">nt</span><span class="plain"> = </span><span class="identifier">RETRIEVE_POINTER_noun</span><span class="plain">(</span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>data</span><span class="plain">);</span>
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">nt</span><span class="plain">) && (</span><span class="functiontext">Nouns::exactitude</span><span class="plain">(</span><span class="identifier">nt</span><span class="plain">))) {</span>
|
|
<span class="identifier">LOGIF</span><span class="plain">(</span><span class="identifier">EXCERPT_PARSING</span><span class="plain">, </span><span class="string">"Require exact matching of $M\</span><span class="plain">n</span><span class="string">"</span><span class="plain">, </span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">));</span>
|
|
<span class="identifier">err</span><span class="plain"> = </span><span class="identifier">TRUE</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>no_em_tokens</span><span class="plain"> == </span><span class="identifier">Wordings::length</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">)) {</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">j</span><span class="plain">=0, </span><span class="identifier">k</span><span class="plain">=</span><span class="identifier">Wordings::first_wn</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">), </span><span class="identifier">err</span><span class="plain"> = </span><span class="identifier">FALSE</span><span class="plain">;</span>
|
|
<span class="identifier">j</span><span class="plain"><</span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>no_em_tokens</span><span class="plain">; </span><span class="identifier">j</span><span class="plain">++, </span><span class="identifier">k</span><span class="plain">++)</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>em_tokens</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">] != </span><span class="identifier">Lexer::word</span><span class="plain">(</span><span class="identifier">k</span><span class="plain">)) {</span>
|
|
<span class="identifier">err</span><span class="plain">=</span><span class="identifier">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">}</span>
|
|
<span class="reserved">goto</span><span class="plain"> </span><span class="identifier">SubsetMatchDecided</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">}</span>
|
|
<span class="identifier">LOOP_THROUGH_WORDING</span><span class="plain">(</span><span class="identifier">k</span><span class="plain">, </span><span class="identifier">W</span><span class="plain">) {</span>
|
|
<span class="identifier">err</span><span class="plain"> = </span><span class="identifier">TRUE</span><span class="plain">;</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">j</span><span class="plain">=0; </span><span class="identifier">j</span><span class="plain"><</span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>no_em_tokens</span><span class="plain">; </span><span class="identifier">j</span><span class="plain">++)</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-</span><span class="element">>em_tokens</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">] == </span><span class="identifier">Lexer::word</span><span class="plain">(</span><span class="identifier">k</span><span class="plain">)) </span><span class="identifier">err</span><span class="plain">=</span><span class="identifier">FALSE</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">err</span><span class="plain">) </span><span class="reserved">break</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="identifier">SubsetMatchDecided</span><span class="plain">:</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">err</span><span class="plain"> == </span><span class="identifier">FALSE</span><span class="plain">) {</span>
|
|
<span class="reserved">excerpt_meaning</span><span class="plain"> *</span><span class="identifier">em</span><span class="plain"> = </span><span class="identifier">ParseTree::get_meaning</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">);</span>
|
|
<span class="identifier">results</span><span class="plain"> = </span><span class="functiontext">ExParser::result</span><span class="plain">(</span><span class="identifier">em</span><span class="plain">,</span>
|
|
<span class="plain">100-((</span><span class="identifier">em</span><span class="plain">-</span><span class="element">>no_em_tokens</span><span class="plain">) - (</span><span class="identifier">Wordings::length</span><span class="plain">(</span><span class="identifier">W</span><span class="plain">)-1)),</span>
|
|
<span class="identifier">results</span><span class="plain">);</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP4_8">§4.8</a>.</p>
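<p class="inwebparagraph">The general (non-exact) branch of this test, together with the score it awards, can be sketched as a stand-alone function. This is an illustrative model only: words are C strings, the hash and relevance filters and the exact-matching branch for nouns are omitted, and <code class="display"><span class="extract">subset_score</span></code> is a hypothetical name.
</p>

```c
#include <assert.h>
#include <string.h>

/* Returns the score for matching excerpt[0..len-1] against the registered
   token list tokens[0..no_tokens-1], or -1 if there is no match. Every
   excerpt word must occur somewhere among the tokens. */
int subset_score(const char **excerpt, int len, const char **tokens, int no_tokens) {
    assert(len > 0 && no_tokens > 0);
    if (len > no_tokens) return -1; /* excerpt is longer than the full name */
    for (int k = 0; k < len; k++) {
        int found = 0;
        for (int j = 0; j < no_tokens; j++)
            if (strcmp(tokens[j], excerpt[k]) == 0) found = 1;
        if (!found) return -1;
    }
    /* Same formula as above: fewer missed-out words gives a higher score. */
    return 100 - (no_tokens - (len - 1));
}
```

<p class="inwebparagraph">With this formula "door" scores 99 against "door" and 98 against "green door". Like the loop it models, the sketch checks neither word order nor repetition.
</p>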
|
|
|
|
<p class="inwebparagraph"><a id="SP5"></a><b>§5. </b>Making a result node.
|
|
</p>
|
|
|
|
|
|
<pre class="display">
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="functiontext">ExParser::result</span><span class="plain">(</span><span class="reserved">excerpt_meaning</span><span class="plain"> *</span><span class="identifier">em</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">score</span><span class="plain">, </span><span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">alternatives</span><span class="plain">) {</span>
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">this_result</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">VALID_POINTER_parse_node</span><span class="plain">(</span><span class="functiontext">ExcerptMeanings::data</span><span class="plain">(</span><span class="identifier">em</span><span class="plain">))) {</span>
|
|
<span class="identifier">parse_node</span><span class="plain"> *</span><span class="identifier">val</span><span class="plain"> = </span><span class="identifier">RETRIEVE_POINTER_parse_node</span><span class="plain">(</span><span class="functiontext">ExcerptMeanings::data</span><span class="plain">(</span><span class="identifier">em</span><span class="plain">));</span>
|
|
<span class="identifier">this_result</span><span class="plain"> = </span><span class="identifier">ParseTree::new</span><span class="plain">(</span><span class="identifier">INVALID_NT</span><span class="plain">);</span>
|
|
<span class="identifier">ParseTree::copy</span><span class="plain">(</span><span class="identifier">this_result</span><span class="plain">, </span><span class="identifier">val</span><span class="plain">);</span>
|
|
<span class="plain">} </span><span class="reserved">else</span><span class="plain"> {</span>
|
|
<span class="identifier">this_result</span><span class="plain"> = </span><span class="identifier">ParseTree::new</span><span class="plain">(</span><span class="identifier">em</span><span class="plain">-</span><span class="element">>meaning_code</span><span class="plain">);</span>
|
|
<span class="identifier">ParseTree::set_meaning</span><span class="plain">(</span><span class="identifier">this_result</span><span class="plain">, </span><span class="identifier">em</span><span class="plain">);</span>
|
|
<span class="plain">}</span>
|
|
<span class="identifier">this_result</span><span class="plain">-></span><span class="identifier">next_alternative</span><span class="plain"> = </span><span class="identifier">alternatives</span><span class="plain">;</span>
|
|
<span class="identifier">ParseTree::set_score</span><span class="plain">(</span><span class="identifier">this_result</span><span class="plain">, </span><span class="identifier">score</span><span class="plain">);</span>
|
|
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">this_result</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">The function ExParser::result is used in <a href="#SP4_5_1">§4.5.1</a>, <a href="#SP4_6">§4.6</a>, <a href="#SP4_8_1">§4.8.1</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP6"></a><b>§6. Monitoring the efficiency of the parser. </b>To give a rough idea, experiments made in 2008 showed that the medium-large
|
|
"Bronze" runs to about 40,000 words of Inform source text including the
|
|
extensions it uses, and around 4000 excerpt meanings are registered.
|
|
<code class="display"><span class="extract">ExParser::parse_excerpt</span></code> was called 40,046 times (as a rough rule
of thumb, it is called about once per word), so:
|
|
</p>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<ul class="items"><li>(a) a naive all-against-all comparison would mean 160,184,000 comparisons;
|
|
</li><li>(b) considering only those excerpt meanings attached to relevant words
|
|
reduces us to <code class="display"><span class="extract">no_meanings_tried</span></code> comparisons, which was 246,834;
|
|
</li><li>(c) considering only those with a compatible hash code reduces us
|
|
further to <code class="display"><span class="extract">no_meanings_tried_in_detail</span></code> comparisons, which was 68,541.
|
|
</li></ul>
|
|
<p class="inwebparagraph">Considering that <code class="display"><span class="extract">ExParser::parse_excerpt</span></code> was successful 13,752 times,
|
|
and that on many of those occasions it will have returned multiple results, we
|
|
can be pretty confident that the tests in (b) and (c) — both very fast to
|
|
perform — are quite good. In fact, further investigation showed that of the
|
|
68,541 comparisons we had to make, 63,762 were successful — so (b) and (c)
|
|
were correct 93% of the time.
|
|
</p>
|
|
|
|
<p class="inwebparagraph">Moreover, even the expensive checking in detail was not too bad: in all but
|
|
21,774 of the 68,541 cases the token list to check was empty, because it was a
|
|
test for "say [value]". Thus the mean number of direct word comparisons was
|
|
only 0.5 per call, and of course even those only involve comparing pointers,
|
|
not calling character-level routines like <code class="display"><span class="extract">strcmp</span></code>.
|
|
</p>
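<p class="inwebparagraph">The arithmetic behind these figures can be reproduced with a few lines of C. This is only a sketch: the <code class="display"><span class="extract">parser_stats</span></code> structure and the function names are hypothetical, and the per-call figure assumes roughly one word comparison for each non-empty token list, which is what the quoted 0.5 appears to imply.
</p>

```c
#include <assert.h>

/* Hypothetical record of the 2008 measurements quoted in the text. */
typedef struct parser_stats {
    long long calls;            /* calls to the excerpt parser */
    long long meanings;         /* excerpt meanings registered */
    long long tried_in_detail;  /* comparisons surviving tests (b) and (c) */
    long long matched;          /* detailed comparisons which succeeded */
    long long nonempty_lists;   /* detailed comparisons with a token list to check */
} parser_stats;

/* Cost of comparing every call against every registered meaning. */
long long naive_comparisons(const parser_stats *s) {
    return s->calls * s->meanings;
}

/* Fraction of detailed comparisons which turned out to be matches. */
double filter_accuracy(const parser_stats *s) {
    assert(s->tried_in_detail > 0);
    return (double) s->matched / (double) s->tried_in_detail;
}

/* Assumed model: about one word comparison per non-empty token list. */
double word_comparisons_per_call(const parser_stats *s) {
    assert(s->calls > 0);
    return (double) s->nonempty_lists / (double) s->calls;
}
```

<p class="inwebparagraph">Fed the figures above (40,046 calls, 4000 meanings, 68,541 detailed comparisons, 63,762 matches and 21,774 non-empty token lists), these give 160,184,000 naive comparisons, an accuracy of about 0.93 and about 0.54 word comparisons per call.
</p>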
|
|
|
|
<p class="inwebparagraph">Profiling shows that time spent in the S-parser is dominated by memory
allocation for the results it generates. Actual parsing is very rapid, and no
further optimisation is worthwhile.
</p>
<pre class="display">
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">ExParser::debug_parser_statistics</span><span class="plain">(</span><span class="reserved">void</span><span class="plain">) {</span>
<span class="identifier">LOG</span><span class="plain">(</span><span class="string">"no_calls_to_parse_excerpt = %d\</span><span class="plain">n</span><span class="string">"</span><span class="plain">, </span><span class="identifier">no_calls_to_parse_excerpt</span><span class="plain">);</span>
<span class="identifier">LOG</span><span class="plain">(</span><span class="string">"no_meanings_tried = %d\</span><span class="plain">n</span><span class="string">"</span><span class="plain">, </span><span class="identifier">no_meanings_tried</span><span class="plain">);</span>
<span class="identifier">LOG</span><span class="plain">(</span><span class="string">"no_meanings_tried_in_detail = %d\</span><span class="plain">n</span><span class="string">"</span><span class="plain">, </span><span class="identifier">no_meanings_tried_in_detail</span><span class="plain">);</span>
<span class="identifier">LOG</span><span class="plain">(</span><span class="string">"no_successful_calls_to_parse_excerpt = %d\</span><span class="plain">n</span><span class="string">"</span><span class="plain">, </span><span class="identifier">no_successful_calls_to_parse_excerpt</span><span class="plain">);</span>
<span class="identifier">LOG</span><span class="plain">(</span><span class="string">"no_matched_ems = %d\</span><span class="plain">n</span><span class="string">"</span><span class="plain">, </span><span class="identifier">no_matched_ems</span><span class="plain">);</span>
<span class="identifier">LOG</span><span class="plain">(</span><span class="string">"number of excerpt meanings registered = %d\</span><span class="plain">n</span><span class="string">"</span><span class="plain">, </span><span class="identifier">NUMBER_CREATED</span><span class="plain">(</span><span class="reserved">excerpt_meaning</span><span class="plain">));</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">Log::aspect_switched_on</span><span class="plain">(</span><span class="constant">EXCERPT_MEANINGS_DA</span><span class="plain">)) </span><span class="functiontext">ExcerptMeanings::log_all</span><span class="plain">();</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function ExParser::debug_parser_statistics appears nowhere else.</p>
<hr class="tocbar">
<ul class="toc"><li><a href="2-em.html">Back to 'Excerpt Meanings'</a></li><li><i>(This section ends Chapter 2: Excerpts.)</i></li></ul><hr class="tocbar">
<!--End of weave-->
</body>
</html>