mirror of https://github.com/ganelson/inform.git synced 2024-07-17 06:24:24 +03:00
inform7/services/words-module/Chapter 4/About Preform.w

[AboutPreform::] About Preform.
A brief guide to Preform and how to use it.
@ Preform is a meta-language for writing a simple grammar: it's in some sense
pre-Inform, because it defines the Inform language itself, and has to be read
by the //words// module (on behalf of Inform) before Inform can parse anything.
For example,
= (text as Preform)
<competitor> ::=
<ordinal-number> runner |
runner no <cardinal-number>
=
The |::=| indicates a definition: the lines which follow, divided by the
vertical stroke, are possibilities tried in turn. Each "non-terminal", written
in angle brackets, can in principle match (or not match) against any wording.
When writing code in InC (the slight extension of C granted by inweb: see
//inweb: The InC Dialect//), this can actually be written as a function call:
= (text as C)
if (<competitor>(W)) ...
=
This function returns |TRUE| if a match is made, and |FALSE| if it is not.
But if a match is indeed made, there are side-effects too, as we shall see.
So, for example, the above grammar would match any of these possibilities:
= (text)
7th runner
third runner
runner no 7
runner no three
=
but would fail to match, for example,
= (text)
runner
7 runner
runner no 7th
ice cream sandwich
=
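The behaviour just described can be sketched in plain C. This is only an illustration of "try each production in turn", not how the //words// module does it: the function names are hypothetical, and the two helpers are crude stand-ins which accept only digit forms such as "7" and "7th", whereas the real <cardinal-number> and <ordinal-number> also accept words such as "three" and "third".

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for <cardinal-number>: digit strings such as "7" only. */
static int is_cardinal(const char *w) {
    if (*w == '\0') return 0;
    for (; *w; w++)
        if (!isdigit((unsigned char) *w)) return 0;
    return 1;
}

/* Hypothetical stand-in for <ordinal-number>: digits then "st", "nd", "rd" or "th". */
static int is_ordinal(const char *w) {
    size_t n = strlen(w);
    if (n < 3) return 0;
    for (size_t i = 0; i + 2 < n; i++)
        if (!isdigit((unsigned char) w[i])) return 0;
    const char *suffix = w + n - 2;
    return strcmp(suffix, "st") == 0 || strcmp(suffix, "nd") == 0 ||
           strcmp(suffix, "rd") == 0 || strcmp(suffix, "th") == 0;
}

/* Try each production of <competitor> in turn; return 1 on the first success. */
int match_competitor(const char *words[], int n) {
    if (n == 2 && is_ordinal(words[0]) && strcmp(words[1], "runner") == 0)
        return 1;                               /* <ordinal-number> runner */
    if (n == 3 && strcmp(words[0], "runner") == 0 &&
        strcmp(words[1], "no") == 0 && is_cardinal(words[2]))
        return 1;                               /* runner no <cardinal-number> */
    return 0;
}
```

Under these assumptions the wordings "7th runner" and "runner no 7" match, while "runner", "7 runner" and "runner no 7th" do not.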
A small number of nonterminals are "internal", meaning that they are defined
by the Inform compiler modules; all of the rest are called "regular" and are
defined rather like <competitor>, i.e., with grammar spelled out.
@ Preform grammar is stored in a text file which is read by Inform early in
its run: see //LoadPreform::load//. In principle, different natural language
definitions can be made: thus, French translators could supply a French-localised
Preform grammar. In practice this whole area of Inform needs more work before
that is fully workable. Still, the principle is that the user can modify the
underlying grammar used by Inform.
The standard Inform distribution comes with the English Preform: in fact, the
file is in |inform7/Internal/Languages/English/Syntax.preform|. However,
this file is not the "original": it is mechanically generated from the source
code of Inform by //inweb//. For example, the excerpt of grammar above might
have come from some (hypothetical) source code looking like this:
= (text as Preform)
<competitor> ::=
<ordinal-number> runner | ==> { TRUE, - }
runner no <cardinal-number> ==> { FALSE, - }
=
Definitions like this one are scattered all across the Inform web, in order
to keep them close to the code which relates to them. //inweb// tears this
code in half lengthways: the left-hand side goes into the |Syntax.preform|
file mentioned above, and is then read into Inform at run-time; and the
right-hand side, which is essentially C, becomes code which takes action
on any successful match against the grammar.
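As a rough picture of that lengthways tear, the following C sketch splits one line of such a definition at its |==>| arrow. It is a simplification made for illustration: |tear_line| is a hypothetical name, and nothing here is taken from the actual //inweb// source.

```c
#include <string.h>

/* Split a definition line at the first "==>". The left-hand (grammar) side is
   copied into grammar, with trailing white space trimmed; *code is set to the
   right-hand side, or to NULL if the line has no arrow. Returns 0 if the
   buffer is too small to hold the grammar side, and 1 otherwise. */
int tear_line(const char *line, char *grammar, size_t size, const char **code) {
    const char *arrow = strstr(line, "==>");
    size_t len = arrow ? (size_t) (arrow - line) : strlen(line);
    while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t')) len--;
    if (len >= size) return 0;
    memcpy(grammar, line, len);
    grammar[len] = '\0';
    if (arrow) {
        arrow += 3;                              /* step over the arrow itself */
        while (*arrow == ' ' || *arrow == '\t') arrow++;
    }
    *code = arrow;
    return 1;
}
```

Applied to the first production above, this yields |<ordinal-number> runner ||| as the grammar side and |{ TRUE, - }| as the code side.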
@ Each nonterminal, when successfully matched, can provide either or both of
two results, though usually just one: an integer, to be stored in a variable
called |*X|, and a void pointer, to be stored in |*XP|, which usually points
to an object.
The example above, |<competitor>|, only results in an integer. The |==>| arrow
is optional, but if present, it says what the integer result is if the given
production is matched. So, for example, "runner bean" or "beetroot" would not
match <competitor>; "4th runner" would match with integer result |TRUE|;
"runner no 17" would match with integer result |FALSE|.
Usually, though, the result(s) of a nonterminal depend on the result(s) of
other nonterminals used to make the match. If that's so, then the expression
right of the arrow will have to combine these. In such a compositing expression,
so called because it composes together the various intermediate results into
one final result, |R[1]| is the integer result of the first nonterminal in
the production, |R[2]| the second, and so on; |RP[1]| and so on hold the
pointer results. For example, you could make a very crude calculator with:
= (text as Preform)
<arithmetic> ::=
<cardinal-number> | ==> { pass 1 }
<cardinal-number> plus <cardinal-number> ==> { R[1]+R[2], - }
=
Here |R[1]+R[2]| produces a result by composition of the two results of
the <cardinal-number> nonterminal which occurred when parsing the line.
So, for example, "seven" matches <arithmetic> with result 7, and "two plus
three" with result 5.
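To make the composition concrete, here is a hand-written C sketch of what <arithmetic> computes. It is an illustration only, not the code which inweb generates: |match_arithmetic| and |cardinal| are hypothetical names, and this stand-in for <cardinal-number> accepts only digit strings, so it would take "2 plus 3" but not "two plus three".

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for <cardinal-number>: accepts digit strings only and
   stores the value in *r. Overflow on very long numbers is not checked. */
static int cardinal(const char *w, int *r) {
    if (*w == '\0') return 0;
    int v = 0;
    for (; *w; w++) {
        if (!isdigit((unsigned char) *w)) return 0;
        v = 10 * v + (*w - '0');
    }
    *r = v;
    return 1;
}

/* Match <arithmetic>, storing the integer result in *X on success. */
int match_arithmetic(const char *words[], int n, int *X) {
    int R[3];   /* R[1] and R[2], numbered from 1 as in compositing expressions */
    if (n == 1 && cardinal(words[0], &R[1])) {
        *X = R[1];                       /* plays the part of { pass 1 } */
        return 1;
    }
    if (n == 3 && cardinal(words[0], &R[1]) &&
        strcmp(words[1], "plus") == 0 && cardinal(words[2], &R[2])) {
        *X = R[1] + R[2];                /* plays the part of { R[1]+R[2], - } */
        return 1;
    }
    return 0;
}
```

Only the integer result is modelled here; the pointer result is left out.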
Or consider the following refinement of <competitor>:
= (text as Preform)
<competitor> ::=
the pacemaker | ==> { 1, - }
<ordinal-number> runner | ==> { pass 1 }
runner no <cardinal-number> ==> { pass 1 }
=
Now "4th runner" matches with integer result 4, because <ordinal-number>
matches "4th" with integer result 4, and that goes into |R[1]|. Similarly,
"runner no 17" ends up with integer result 17. "The pacemaker" matches
with integer result 1; here there are no intermediate results to make use
of, so |pass| can't be used.
@ The arrows and expressions are optional, and if they are omitted, then the
result integer is set to the alternative number, counting up from 0. For
example, given the following, "polkadot" matches with result 1, and "green"
with result 2.
= (text as Preform)
<race-jersey> ::=
yellow | polkadot | green | white
=
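The default numbering amounts to a table lookup, as this small C sketch shows. The function |match_race_jersey| is a hypothetical name used for illustration, and the comparison here is case-sensitive purely for brevity.

```c
#include <string.h>

/* Match a single word against <race-jersey>. On success, store the
   alternative number, counting up from 0, in *X and return 1. */
int match_race_jersey(const char *word, int *X) {
    static const char *alternatives[] = { "yellow", "polkadot", "green", "white" };
    for (int i = 0; i < 4; i++)
        if (strcmp(word, alternatives[i]) == 0) {
            *X = i;
            return 1;
        }
    return 0;
}
```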
@ As a convenient abbreviation, a slash character can be used to divide
alternative possibilities for a single word. For example:
= (text as Preform)
<race-jersey> ::=
yellow | polkadot/polka-dot | green | white
=
matches "polka-dot" equivalently to "polkadot".
Another convenient notation is the caret |^|, which negates the effect of
a token. For example,
= (text as Preform)
<competitor> ::=
the ^adjudicator ==> { 1, - }
=
matches "the pacemaker", "the cyclist", etc. -- "the" followed by any single
word at all, except for "the adjudicator".
The final modifying notation is the underscore |_|, which forbids unexpected
use of upper casing. Thus
= (text as Preform)
<race-jersey> ::=
yellow | polkadot | _green | white
=
means that <race-jersey> will match Yellow, yellow, Polkadot, polkadot, green, White
and white, but not Green (except as the first word of a sentence, where
the use of capitalisation has no significance).
If the modifiers |^| or |_| are given for the first of a series of slashed
alternatives, they apply to all of the alternatives: thus |^cat/dog| matches
any word which is neither "cat" nor "dog".
If these characters are needed in their literal form, a backslash |\| can
be used to escape them. Thus |\_green| actually matches |_green|.
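The three modifiers can be modelled by a single test of one token against one word, sketched below in C. This is an assumption-laden illustration rather than the Preform implementation: |token_matches| is a hypothetical name, backslash escapes are not handled, and the way |^| and |_| combine when both are present is a guess.

```c
#include <ctype.h>
#include <string.h>

/* Case-insensitive comparison of word against the first len characters of alt. */
static int same_word(const char *alt, size_t len, const char *word) {
    if (strlen(word) != len) return 0;
    for (size_t i = 0; i < len; i++)
        if (tolower((unsigned char) alt[i]) != tolower((unsigned char) word[i]))
            return 0;
    return 1;
}

/* Does word match token, where token may begin with ^ and/or _ and may hold
   slashed alternatives? sentence_start is nonzero for the first word of a
   sentence, where capitalisation is not significant. */
int token_matches(const char *token, const char *word, int sentence_start) {
    int negated = 0, no_caps = 0;
    for (;; token++) {
        if (*token == '^') negated = 1;
        else if (*token == '_') no_caps = 1;
        else break;
    }
    int found = 0;
    const char *p = token;
    while (*p) {
        size_t len = strcspn(p, "/");     /* length of the next alternative */
        if (same_word(p, len, word)) found = 1;
        p += len;
        if (*p == '/') p++;
    }
    /* The underscore rejects an unexpected capital letter. */
    if (found && no_caps && !sentence_start && isupper((unsigned char) word[0]))
        found = 0;
    return negated ? !found : found;
}
```

Because the modifiers are read before the alternatives are split, they apply to every alternative, as with |^cat/dog| above.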
@ So far, the only ingredients of Preform syntax have been nonterminals and
fixed words, but Preform also has "wildcards". The grammar
= (text as Preform)
<competitor> ::=
man with ... on his ...
=
would match, for example, "man with number 17 on his back", or "man with a
chip on his shoulder". |...| matches any non-empty wording, and the text
actually matched is recorded for any successful match. Wordings like this
are numbered upwards from 1 to a maximum of 4, and are usually retrieved by
whatever part of Inform requested the parse, using the |GET_RW| macro. For
example:
= (text)
TEXT                              GET_RW(<competitor>, 1)   GET_RW(<competitor>, 2)
man with number 17 on his back    number 17                 back
man with a chip on his shoulder   a chip                    shoulder
=
A few internal nonterminals also generate word ranges, using |PUT_RW| to do so,
and word ranges can also be inherited up from one nonterminal to another with
|INHERIT_RANGES|: see //Loading Preform// for definitions of these macros.
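The way word ranges get filled in can be pictured with the C sketch below, which matches the production |man with ... on his ...| by hand. The type |word_range| and the function |match_man_with| are hypothetical, invented for this illustration; the real macros work on wordings rather than arrays of strings. Splitting at the leftmost "on his" is also an assumption.

```c
#include <string.h>

typedef struct { int from, to; } word_range;   /* inclusive word indices */

/* Match "man with ... on his ..." against words[0] to words[n-1]. On success,
   fill rw[1] and rw[2] (numbered from 1, as with GET_RW) and return 1. */
int match_man_with(const char *words[], int n, word_range rw[3]) {
    if (n < 6) return 0;    /* four fixed words plus two non-empty wildcards */
    if (strcmp(words[0], "man") != 0 || strcmp(words[1], "with") != 0) return 0;
    /* Find "on his" with at least one word before it and one after it. */
    for (int i = 3; i + 2 < n; i++) {
        if (strcmp(words[i], "on") == 0 && strcmp(words[i + 1], "his") == 0) {
            rw[1].from = 2;     rw[1].to = i - 1;
            rw[2].from = i + 2; rw[2].to = n - 1;
            return 1;
        }
    }
    return 0;
}
```

For "man with number 17 on his back", range 1 covers words 2 to 3 ("number 17") and range 2 covers word 6 ("back"), counting words from 0.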
There are in fact several different wildcards:
(a) |...| matches any non-empty text, as shown above.
(b) |***| matches any text, including possibly the empty text.
(c) |......| matches any non-empty text in which brackets are used in a
balanced way -- thus it would match "alpha beta gamma" or "alpha (the
Greek letter)", but not "alpha (the" or "Greek letter)".
(d) |###| matches any single word, counting words as the lexer does.
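The balance test needed for |......| is a simple depth count, sketched here in C over a plain character string. The name |matches_balanced_wildcard| is hypothetical, and treating only round brackets as significant is an assumption made for this illustration.

```c
/* Return 1 if text is non-empty and its round brackets are balanced: no
   bracket is closed before it is opened, and none is left open at the end. */
int matches_balanced_wildcard(const char *text) {
    if (*text == '\0') return 0;        /* this wildcard must match something */
    int depth = 0;
    for (; *text; text++) {
        if (*text == '(') depth++;
        else if (*text == ')' && --depth < 0) return 0;
    }
    return depth == 0;
}
```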
It is also possible to use braces to widen ranges. For example,
= (text as Preform)
<competitor> ::=
man with {... on his ...}
=
groups together anything matching |... on his ...| into a single range. There
need not even be a wildcard inside the braces:
= (text as Preform)
<competitor> ::=
{man} with {... on his ...}
=
works fine, and makes two ranges, the first of which is always just "man".
Once again, literal brace characters can be achieved using the |\| escape.
@ The alternative lines (or "productions", as they're called) in a regular
definition are normally given the internal numbers 0, 1, 2, 3... in the
order in which they appear. For example, in
= (text as Preform)
<competitor> ::=
the pacemaker | ==> { 1, - }
<ordinal-number> runner | ==> { pass 1 }
runner no <cardinal-number> ==> { pass 1 }
=
the |the pacemaker| row is numbered 0, |<ordinal-number> runner| is numbered 1,
and so on. Those "match numbers" have little outward significance, but help
to determine the result when a successful match is made. Consider:
= (text as Preform)
<letter-score> ::=
alpha | ==> { 10, - }
beta | ==> { 20, - }
gamma ==> { 30, - }
=
Here, matching against "beta" produces 20 -- the result on the same row. But
we can mess with that:
= (text as Preform)
<letter-score> ::=
/c/ alpha | ==> { 10, - }
/a/ beta | ==> { 20, - }
/b/ gamma ==> { 30, - }
=
The special notation |/X/|, where |X| is a lower-case letter, marks the row
as having a different number from the obvious one. |/a/| means 0, |/b/| means
1, and so on. The practical effect of the above is to achieve the equivalent
of this:
= (text as Preform)
<letter-score> ::=
beta | ==> { 10, - }
gamma | ==> { 20, - }
alpha ==> { 30, - }
=
That might seem a stupidly obfuscatory thing to do, and indeed it is, when
done in the main Inform source code -- which is why we never do it. But
Preform can also be used by translators of Inform to other languages, who might
supply, e.g., a French version of |Syntax.preform|. Or suppose in this instance
that the Inform source code contains <letter-score> but that a translator into
Hebrew wants to override that definition. Her Hebrew version of |Syntax.preform|
could then write:
= (text as Preform)
<letter-score> ::=
/a/ aleph |
/a/ alef |
/b/ beth
=
This translator wanted to provide two alternative ways to write the Hebrew
version of "alpha", one for "beta", but none for "gamma". Using the remappings
|/a/| and |/b/| here, she is able to give her lines the match numbers 0, 0, 1,
so that they behave like the first, first and second lines of the original,
rather than taking the default numbering 0, 1, 2.
Because there are a few rather long nonterminal definitions in Inform, the
labelling runs |/a/|, |/b/|, ..., |/z/| and then continues |/aa/|, |/bb/|,
..., |/zz/|, thus allowing for up to 52 productions to be remapped in this way.
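The labelling scheme can be turned into match numbers with a few lines of C. In this sketch |match_number_from_label| is a hypothetical name, and the assumption that |/aa/| means 26 (so that |/zz/| means 51) is inferred from the total of 52 rather than stated above.

```c
#include <string.h>

/* Convert a remapping label such as "/b/" or "/cc/" into a match number, or
   return -1 if the label is malformed. Assumes contiguous lower-case letters,
   as in ASCII. */
int match_number_from_label(const char *label) {
    size_t n = strlen(label);
    if (n < 3 || n > 4 || label[0] != '/' || label[n - 1] != '/') return -1;
    char c = label[1];
    if (c < 'a' || c > 'z') return -1;
    if (n == 3) return c - 'a';          /* "/a/" to "/z/" give 0 to 25 */
    if (label[2] != c) return -1;        /* a doubled letter must repeat itself */
    return 26 + (c - 'a');               /* "/aa/" to "/zz/" give 26 to 51 */
}
```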
@ A similar form of remapping is allowed with word ranges, using a special
notation. Suppose the Inform source contained:
= (text as Preform)
<coloured-thing> ::=
{ <race-colour> } { jersey/helmet }
=
but we want this in French, where adjectives usually come after nouns. So this:
= (text as Preform)
<coloured-thing> ::=
{ maillot/casque } { <race-colour> }
=
wouldn't work -- it would set the word ranges the wrong way around. Instead:
= (text as Preform)
<coloured-thing> ::=
{ maillot/casque }?2 { <race-colour> }?1
=
says that word range 2 is to be the article of clothing, and word range 1 the
colour.
@ Preform turns out to be a useful notation for patterns of wording, and can
be put to other uses besides parsing source text. For these other uses, see
//Preform Utilities//. Specifically, and in rough order of complexity:
(*) Specifying text being generated by Inform -- see //PreformUtilities::merge//.
(*) Specifying replacements of one set of words by another -- see
//PreformUtilities::find_corresponding_word//.
(*) Saying how to build a trie which will detect patterns in a single word and
then modify it -- see //PreformUtilities::define_trie//.
@ Finally, syntax errors in Preform are reported by //PreformUtilities::production_error//.