<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>About Preform</title>
<link href="../docs-assets/Breadcrumbs.css" rel="stylesheet" rev="stylesheet" type="text/css">
<meta name="viewport" content="width=device-width initial-scale=1">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="en-gb">
<link href="../docs-assets/Contents.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Progress.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Navigation.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Fonts.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Base.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Colours.css" rel="stylesheet" rev="stylesheet" type="text/css">
</head>
<body class="commentary-font">
<nav role="navigation">
<h1><a href="../index.html">
<img src="../docs-assets/Inform.png" height="72">
</a></h1>
<ul><li><a href="../compiler.html">compiler tools</a></li>
<li><a href="../other.html">other tools</a></li>
<li><a href="../extensions.html">extensions and kits</a></li>
<li><a href="../units.html">unit test tools</a></li>
</ul><h2>Compiler Webs</h2><ul>
<li><a href="../inbuild/index.html">inbuild</a></li>
<li><a href="../inform7/index.html">inform7</a></li>
<li><a href="../inter/index.html">inter</a></li>
</ul><h2>Inbuild Modules</h2><ul>
<li><a href="../supervisor-module/index.html">supervisor</a></li>
</ul><h2>Inform7 Modules</h2><ul>
<li><a href="../core-module/index.html">core</a></li>
<li><a href="../inflections-module/index.html">inflections</a></li>
<li><a href="../linguistics-module/index.html">linguistics</a></li>
<li><a href="../kinds-module/index.html">kinds</a></li>
<li><a href="../if-module/index.html">if</a></li>
<li><a href="../multimedia-module/index.html">multimedia</a></li>
<li><a href="../problems-module/index.html">problems</a></li>
<li><a href="../index-module/index.html">index</a></li>
</ul><h2>Inter Modules</h2><ul>
<li><a href="../bytecode-module/index.html">bytecode</a></li>
<li><a href="../building-module/index.html">building</a></li>
<li><a href="../codegen-module/index.html">codegen</a></li>
</ul><h2>Shared Modules</h2><ul>
<li><a href="../arch-module/index.html">arch</a></li>
<li><a href="../syntax-module/index.html">syntax</a></li>
<li><a href="index.html"><span class="selectedlink">words</span></a></li>
<li><a href="../html-module/index.html">html</a></li>
<li><a href="../../../inweb/docs/foundation-module/index.html">foundation</a></li>
</ul>
</nav>
<main role="main">
<!--Weave of 'About Preform' generated by Inweb-->
<div class="breadcrumbs">
<ul class="crumbs"><li><a href="../index.html">Home</a></li><li><a href="../compiler.html">Shared Modules</a></li><li><a href="index.html">words</a></li><li><a href="index.html#4">Chapter 4: Parsing</a></li><li><b>About Preform</b></li></ul></div>
<p class="purpose">A brief guide to Preform and how to use it.</p>
<p class="commentary firstcommentary"><a id="SP1"></a><b>§1. </b>In the Preform file, a definition consists of its grammar alone, but here is
how it is typed in the Inform source code. Definitions like this one are scattered
across the Inform web, so that each stays close to the code which relates to it.
The <span class="extract"><span class="extract-syntax">inweb</span></span> tangler compiles them in two halves: the instructions to the right
of the <span class="extract"><span class="extract-syntax">==&gt;</span></span> arrows are extracted and compiled into a C routine called the
"compositor" for the nonterminal (see below), while the grammar itself is
extracted and placed into Inform's "Preform.txt" file.
</p>
<p class="commentary">In the document of Preform grammar which is extracted from Inform's source code
to set the language out for translators, the <span class="extract"><span class="extract-syntax">==&gt;</span></span> arrows and the formulae to
their right are omitted, since those represent semantics, not syntax.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax">    &lt;competitor&gt; ::=</span>
<span class="plain-syntax">        &lt;ordinal-number&gt; runner |    ==&gt; TRUE</span>
<span class="plain-syntax">        runner no &lt;cardinal-number&gt;  ==&gt; FALSE</span>
</pre>
<p class="commentary firstcommentary"><a id="SP2"></a><b>§2. </b>Each nonterminal, when successfully matched, can provide either or both of two
results, though more usually just one: an integer, to be stored in <span class="extract"><span class="extract-syntax">*X</span></span>, and a
void pointer, to be stored in <span class="extract"><span class="extract-syntax">*XP</span></span>. For example, &lt;k-kind&gt; matches if and only
if the text declares a legal kind, such as "number"; its pointer result points
to the kind found, such as <span class="extract"><span class="extract-syntax">K_number</span></span>. But &lt;competitor&gt; produces only an integer result.
The <span class="extract"><span class="extract-syntax">==&gt;</span></span> arrow is optional, but if present, it says what the result is when
the given production is matched: if the <span class="extract"><span class="extract-syntax">inweb</span></span> tangler sees an expression
to the right of the arrow, it assigns that value to the integer result. So,
for example, "runner bean" and "beetroot" would not match &lt;competitor&gt;;
"4th runner" would match with integer result <span class="extract"><span class="extract-syntax">TRUE</span></span>; and "runner no 17" would
match with integer result <span class="extract"><span class="extract-syntax">FALSE</span></span>.
</p>
<p class="commentary">Usually, though, the result(s) of a nonterminal depend on the result(s) of
the other nonterminals used to make the match. In the compositing expression,
so called because it composes the various intermediate results into one final
result, <span class="extract"><span class="extract-syntax">R[1]</span></span> is the integer result of the first nonterminal in the production,
<span class="extract"><span class="extract-syntax">R[2]</span></span> that of the second, and so on; <span class="extract"><span class="extract-syntax">RP[1]</span></span> and so on hold the pointer results.
Here, each of the two productions contains just one nonterminal: &lt;ordinal-number&gt;
in the first, &lt;cardinal-number&gt; in the second. So the following refinement of
&lt;competitor&gt; means that "4th runner" matches with integer result 4, because
&lt;ordinal-number&gt; matches "4th" with integer result 4, and that goes into <span class="extract"><span class="extract-syntax">R[1]</span></span>.
Similarly, "runner no 17" ends up with integer result 17. "The pacemaker" matches
with integer result 1; here there are no intermediate results to make use of,
so <span class="extract"><span class="extract-syntax">R[...]</span></span> can't be used.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax">    &lt;competitor&gt; ::=</span>
<span class="plain-syntax">        the pacemaker |              ==&gt; 1</span>
<span class="plain-syntax">        &lt;ordinal-number&gt; runner |    ==&gt; R[1]</span>
<span class="plain-syntax">        runner no &lt;cardinal-number&gt;  ==&gt; R[1]</span>
</pre>
<p class="commentary firstcommentary"><a id="SP3"></a><b>§3. </b>The arrows and expressions are optional, and if they are omitted, the
integer result is set to the production number, counting up from 0. For
example, given the following, "polkadot" matches with result 1, and "green"
with result 2.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax">    &lt;race-jersey&gt; ::=</span>
<span class="plain-syntax">        yellow | polkadot | green | white</span>
</pre>
<p class="commentary">Since I have found that well-known computer programmers look at me strangely
when I tell them that Inform doesn't use <span class="extract"><span class="extract-syntax">yacc</span></span>, or <span class="extract"><span class="extract-syntax">antlr</span></span>, or for that
matter any of the elegant theory of LALR parsers, perhaps an explanation
is called for.
</p>
<p class="commentary">One reason is that I am sceptical that formal grammars specify natural language
terribly well, which is ironic, considering that the relevant computer
science, dating from the 1950s and 1960s, was strongly influenced by Noam
Chomsky's generative linguistics. Such formal descriptions tend to be too rigid
to be applied universally. The classical use case for <span class="extract"><span class="extract-syntax">yacc</span></span> is to manage
hierarchies of associative operators on different levels: well, natural language
doesn't have those.
</p>
<p class="commentary">Another reason is that <span class="extract"><span class="extract-syntax">yacc</span></span>-style grammars tend to react badly to uncompliant
input: that is, they correctly reject it, but are bad at diagnosing the
problem, and at recovering their wits afterwards. For Inform purposes, this
would be too sloppy: the user more often miscompiles than compiles, and quality
lies in how good our problem messages are in reply.
</p>
<p class="commentary">Lastly, there are two pragmatic reasons. In order to make Preform grammar
extensible, we couldn't use a parser-compiler like <span class="extract"><span class="extract-syntax">yacc</span></span> anyway: we have to
interpret our grammar, not compile code to parse it. And we also want speed;
folk wisdom has it that <span class="extract"><span class="extract-syntax">yacc</span></span> parsers are about half as fast as a shrewdly
hand-coded equivalent. (<span class="extract"><span class="extract-syntax">gcc</span></span> abandoned the use of <span class="extract"><span class="extract-syntax">bison</span></span> for exactly this
reason some years ago.) Until Preform's arrival in February 2011, Inform had a
hard-coded syntax analyser scattered throughout its code, which often made what
were provably the minimum possible number of comparisons. Even Preform's
parser is intentionally lean.
</p>
<nav role="progress"><div class="progresscontainer">
<ul class="progressbar"><li class="progressprev"><a href="3-idn.html">❮</a></li><li class="progresschapter"><a href="P-wtmd.html">P</a></li><li class="progresschapter"><a href="1-wm.html">1</a></li><li class="progresschapter"><a href="2-vcb.html">2</a></li><li class="progresschapter"><a href="3-lxr.html">3</a></li><li class="progresscurrentchapter">4</li><li class="progresscurrent">ap</li><li class="progresssection"><a href="4-lp.html">lp</a></li><li class="progresssection"><a href="4-to.html">to</a></li><li class="progresssection"><a href="4-prf.html">prf</a></li><li class="progresssection"><a href="4-bn.html">bn</a></li><li class="progressnext"><a href="4-lp.html">❯</a></li></ul></div>
</nav><!--End of weave-->
</main>
</body>
</html>