[Nonterminals::] Nonterminals.

The angle-bracketed terms appearing in Preform grammar.

@h How nonterminals are stored.
Each different nonterminal defined in the |Syntax.preform| code read in,
such as <any-integer>, is going to correspond to a global variable in the
program reading it in, such as |any_integer_NTM|. On the face of it, this is
impossible. How can what happens at run-time affect what variables are named
at compile time?

The answer is that the //inweb// literate programming tool looks through the
complete source code, sees the Preform nonterminals described in it, and
inserts declarations of the corresponding variables into the "tangled" form
of the source code sent to a C compiler to make the actual program. (This is
a feature of //inweb// available only for programs written in InC.)

In particular, the tangler of |inweb| replaces the |[[nonterminals]]| below with
invocations of the |REGISTER_NONTERMINAL| and |INTERNAL_NONTERMINAL| macros.
For example, it inserts the C line:
= (text as C)
	INTERNAL_NONTERMINAL(L"<any-integer>", any_integer_NTM, 1, 1);
=
since this is an "internal" nonterminal; and the macro will then expand
to code which sets up |any_integer_NTM| -- see below.

=
void Nonterminals::register(void) {
	/* The following is not valid C, but causes Inweb to insert lines which are */
	[[nonterminals]];
	/* Back to regular C now */
	nonterminal *nt;
	LOOP_OVER(nt, nonterminal)
		if ((nt->marked_internal) && (nt->internal_definition == NULL))
			internal_error("internal nonterminal has no definition function");
}

@ So, then, //inweb// tangles out code which uses the |REGISTER_NONTERMINAL|
macro for any regular nonterminal, and also tangles a compositor function for
it, whose name is the name of the nonterminal's variable with a |C| suffix. For
example, suppose //inweb// sees the following in the web it is tangling:
= (text as Preform)
	<competitor> ::=
		the pacemaker |                 ==> { 1, - }
		<ordinal-number> runner |       ==> { pass 1 }
		runner no <cardinal-number>     ==> { pass 1 }
=
It then tangles this macro usage into //Nonterminals::register// above:
= (text as C)
	REGISTER_NONTERMINAL(L"<competitor>", competitor_NTM);
=
And it also tangles matching declarations for:
(a) the global variable |competitor_NTM|, of type |nonterminal *|;
(b) the "compositor function" |competitor_NTMC|, which is a function to
deal with what happens when a successful match is made against the grammar --
this incorporates the material which //inweb// finds to the right of the |==>|
markers in the Preform definition.

But if we left things at that, we would find ourselves at run-time with
a null variable, a function not called from anywhere, and an instance
somewhere in memory of a nonterminal read in from Preform syntax and
called |"<competitor>"|, but which has no apparent connection to either
the function or the variable. We clearly need to join these together.

And so the |REGISTER_NONTERMINAL| macro expands to code which initialises the
variable to the nonterminal having its name, and then connects that to the
compositor function:

@d REGISTER_NONTERMINAL(quotedname, identifier)
	identifier = Nonterminals::find(Vocabulary::entry_for_text(quotedname));
	identifier->compositor_fn = identifier##C;

@ For example, this might expand to:
= (text as C)
	competitor_NTM = Nonterminals::find(Vocabulary::entry_for_text(L"<competitor>"));
	competitor_NTM->compositor_fn = competitor_NTMC;
=
Note that it is absolutely necessary that |Nonterminals::find| does
return a nonterminal. But we can be sure that it does, since the function creates
a nonterminal object of that name if one does not already exist.

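@ The joining-up performed by |REGISTER_NONTERMINAL| can be imitated in plain
C, outside of InC. What follows is only a minimal sketch under stated
assumptions, not the real implementation: |toy_nonterminal|, |toy_find|,
|TOY_REGISTER| and |toy_register_all| are hypothetical stand-ins, names are
compared as ordinary C strings rather than vocabulary entries, and the
compositor takes a single argument. It shows the two essentials: a
find-or-create lookup which never returns |NULL|, and token pasting with |##|
to derive the compositor's name from the variable's name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much-simplified stand-in for the real nonterminal structure */
typedef struct toy_nonterminal {
    const char *name;
    int (*compositor_fn)(int *r);
    struct toy_nonterminal *next;
} toy_nonterminal;

static toy_nonterminal *toy_list = NULL;

/* Find-or-create: always returns a nonterminal of the given name */
toy_nonterminal *toy_find(const char *name) {
    assert(name != NULL);
    for (toy_nonterminal *nt = toy_list; nt; nt = nt->next)
        if (strcmp(nt->name, name) == 0) return nt;
    toy_nonterminal *nt = calloc(1, sizeof(toy_nonterminal));
    if (nt == NULL) abort();
    nt->name = name;
    nt->next = toy_list;
    toy_list = nt;
    return nt;
}

/* Pasting C onto the identifier gives the compositor function's name */
#define TOY_REGISTER(quotedname, identifier) \
    identifier = toy_find(quotedname); \
    identifier->compositor_fn = identifier##C;

/* The declarations which a tangler would have written out for us */
toy_nonterminal *competitor_NTM = NULL;
int competitor_NTMC(int *r) { *r = 1; return 1; }

void toy_register_all(void) {
    TOY_REGISTER("<competitor>", competitor_NTM);
}
```

After |toy_register_all| has run, the variable, the named object in memory and
the compositor are all connected, which is the point of the macro.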
@ The position for internal nonterminals (i.e. those defined by a function
written by the programmer, not by Preform grammar lines) is similar:
(a) again there is a global variable, say |any_integer_NTM|, of type |nonterminal *|;
(b) but now there is no compositor, and instead there is a function |any_integer_NTMR|
which actually performs the parse directly.

The |INTERNAL_NONTERMINAL| macro similarly initialises and connects these
declarations. |min| and |max| are conveniences for speedy parsing, and supply
the minimum and maximum number of words that the nonterminal can match; these
are needed because the Preform optimiser can't see inside |any_integer_NTMR| to
calculate those bounds for itself. |max| can be infinity, in which case we
use the constant |INFINITE_WORD_COUNT| for it.

@d INTERNAL_NONTERMINAL(quotedname, identifier, min, max)
	identifier = Nonterminals::find(Vocabulary::entry_for_text(quotedname));
	identifier->opt.nt_extremes = LengthExtremes::new(min, max);
	identifier->internal_definition = identifier##R;
	identifier->marked_internal = TRUE;

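@ To see why declared bounds are worth having, here is a rough sketch in plain
C. Everything in it is an assumption made for illustration: |toy_internal_nt|,
|toy_any_integer_R| and |toy_parse| are hypothetical, words are ordinary C
strings, and the definition function has a simpler signature than the real
|internal_definition|. The idea shown is only that a parser which knows the
extremes can reject a word range of the wrong length without ever calling the
programmer's function.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical stand-in: an internal NT with declared length bounds */
typedef struct toy_internal_nt {
    int min_words, max_words;
    int (*internal_definition)(const char **words, int count, int *result);
    int calls; /* how many times the definition function was actually run */
} toy_internal_nt;

/* Matches a single word made of decimal digits; may rely on count being 1 */
int toy_any_integer_R(const char **words, int count, int *result) {
    assert(count == 1);
    const char *p = words[0];
    if (*p == 0) return 0;
    int n = 0;
    for (; *p; p++) {
        if (!isdigit((unsigned char) *p)) return 0;
        if (n > 99999999) return 0; /* crude guard against overflow */
        n = 10*n + (*p - '0');
    }
    *result = n;
    return 1;
}

/* The bounds are tested first, so wrong-length ranges cost almost nothing */
int toy_parse(toy_internal_nt *nt, const char **words, int count, int *result) {
    if (count < nt->min_words) return 0;
    if (count > nt->max_words) return 0;
    nt->calls++;
    return nt->internal_definition(words, count, result);
}
```

With bounds of 1 and 1, as in the |<any-integer>| line shown earlier, a
two-word range is turned away before the definition function is reached.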
@ So, then, the following rather lengthy class declaration shows what goes
into a nonterminal. Note that nonterminals are uniquely identifiable by their
names: there can be only one called, say, <any-integer>. This is why its
textual name is referred to as an "ID".

=
typedef struct nonterminal {
	struct vocabulary_entry *nonterminal_id; /* e.g. |"<any-integer>"| */

	/* For internal nonterminals */
	int marked_internal; /* has, or will be given, an internal definition... */
	int (*internal_definition)(wording W, int *result, void **result_p); /* ...this one */
	int voracious; /* if true, scans whole rest of word range */

	/* For regular nonterminals */
	struct production_list *first_pl; /* if not internal, this defines it */
	int (*compositor_fn)(int *r, void **rp, int *i_s, void **i_ps, wording *i_W, wording W);
	int multiplicitous; /* if true, matches are alternative syntax tree readings */
	int number_words_by_production; /* this parses names for numbers, like "huit" or "zwei" */
	unsigned int flag_words_in_production; /* all words in the production should get these flags */

	/* Storage for most recent correct match */
	struct wording range_result[MAX_RANGES_PER_PRODUCTION]; /* storage for word ranges matched */

	struct nonterminal_optimisation_data opt; /* see //The Optimiser// */
	struct nonterminal_instrumentation_data ins; /* see //Instrumentation// */

	CLASS_DEFINITION
} nonterminal;

@ A few notes on this are in order:

(a) As noted above, every nonterminal is either "internal" or "regular". If
internal, it is defined by a function; if regular, it is defined by lines
of grammar (called "productions") and a compositor function.

(b) A few internal nonterminals are "voracious". These are given the entire
word range for their productions to eat, and encouraged to eat as much as they
like, returning a word number to show how far they got. While this effect
could be duplicated with non-voracious nonterminals, that would be quite a bit
slower, since it would have to test every possible word range.

(c) A few regular nonterminals are "multiplicitous". These composite their
results in a way special to the Inform compiler's syntax tree, by stacking
them up as alternative possible readings of the same text. Ordinarily, the
result of parsing text against a nonterminal is that the first grammar line
matching that text determines the meaning, but for a multiplicitous nonterminal,
every line matching the text determines one of perhaps many possible meanings.

(d) For numbering and flagging on regular NTs, see //Nonterminals::make_numbering//
below.

(e) The optimisation data helps the parser to reject non-matching text quickly.
For example, if the optimiser can determine that <competitor> only ever matches
texts of between 3 and 7 words in length, it can quickly reject any run of
words outside that range. (However: note that a maximum of 0 means that the
maximum and minimum word counts are disregarded.) The other fields are harder
to explain -- see //The Optimiser//.

@ So, then, as noted above, nonterminals are identified by their name-words.
The following is not especially fast but doesn't need to be: it's used only
when Preform grammar is parsed, not when Inform text is parsed.

=
nonterminal *Nonterminals::detect(vocabulary_entry *name_word) {
	nonterminal *nt;
	LOOP_OVER(nt, nonterminal)
		if (name_word == nt->nonterminal_id)
			return nt;
	return NULL;
}

@ And the following always returns one, creating it if necessary:

=
nonterminal *Nonterminals::find(vocabulary_entry *name_word) {
	nonterminal *nt = Nonterminals::detect(name_word);
	if (nt == NULL) {
		nt = CREATE(nonterminal);
		nt->nonterminal_id = name_word;

		nt->marked_internal = FALSE; /* by default, nonterminals are regular */
		nt->internal_definition = NULL;
		nt->voracious = FALSE;

		nt->first_pl = NULL;
		nt->compositor_fn = NULL;
		nt->multiplicitous = FALSE;
		nt->number_words_by_production = FALSE; /* i.e., don't */
		nt->flag_words_in_production = 0; /* i.e., apply no flags */

		for (int i=0; i<MAX_RANGES_PER_PRODUCTION; i++)
			nt->range_result[i] = EMPTY_WORDING;

		Optimiser::initialise_nonterminal_data(&(nt->opt));
		Instrumentation::initialise_nonterminal_data(&(nt->ins));
	}
	return nt;
}

@h Word ranges in a nonterminal.
We now need to define the macros |GET_RW| and |PUT_RW|, which get and set
the results of a successful match against a nonterminal (see //About Preform//
for more on this).

We do so by giving each nonterminal a small array of |wording|s, which are
lightweight structures incurring little time or space overhead. The fact that
they are attached to the NT itself, rather than, say, being placed on a
parsing stack of some kind, makes them faster to access, but is possible only
because the parser never backtracks. Similarly, result word ranges are
overwritten if a nonterminal calls itself directly or indirectly: that is, the
inner one's results are wiped out by the outer one. But this is no problem,
since we never extract word-ranges from grammar which is recursive.

Word range 0 is reserved in case we ever need it for the entire text matched
by the nonterminal, though at present we don't need that.

@d MAX_RANGES_PER_PRODUCTION 5 /* in fact, one less than this, since range 0 is reserved */
@d GET_RW(nt, N) (nt->range_result[N])
@d PUT_RW(nt, N, W) { nt->range_result[N] = W; }
@d INHERIT_RANGES(from, to) {
	for (int i=1; i<MAX_RANGES_PER_PRODUCTION; i++) /* not copying range 0 */
		to->range_result[i] = from->range_result[i];
}
@d CLEAR_RW(from) {
	for (int i=0; i<MAX_RANGES_PER_PRODUCTION; i++) /* including range 0 */
		from->range_result[i] = EMPTY_WORDING;
}

@h Other results.
The parser records the result of the most recently matched nonterminal in the
following global variables -- which, unlike word ranges, are not attached to
any single NT.

//inweb// translates the notations |<<r>>| and |<<rp>>| to these variable names:

=
int most_recent_result = 0; /* the variable which |inweb| writes for |<<r>>| */
void *most_recent_result_p = NULL; /* the variable which |inweb| writes for |<<rp>>| */

@h Flagging and numbering.
The following mechanism arranges for words used in the grammar for an NT to
be given properties just because of that -- either flags or numerical values.
For example, if we wanted the numbers from Stoppard's play "Dogg's Hamlet",
we might have:
= (text as Preform)
	<dogg-numbers> ::=
		sun | dock | trog | slack | pan
=
And if <dogg-numbers> were made a "numbering" NT, the effect would be that
these five words would pick up the numerical values 1, 2, 3, 4, 5, because
they occur in production numbers 1, 2, 3, 4, 5 for the NT.

=
void Nonterminals::make_numbering(nonterminal *nt) {
	nt->number_words_by_production = TRUE;
}

@ Similarly, we could flag this NT with |NUMBER_MC|, and then the five words
sun, dock, trog, slack, pan would all pick up the |NUMBER_MC| flag
automatically.

=
void Nonterminals::flag_words_with(nonterminal *nt, unsigned int flags) {
	nt->flag_words_in_production = flags;
}

@ This is all done by the following function, which is called when a word |ve|
is read as part of a production with match number |pc| for the nonterminal |nt|:

=
void Nonterminals::note_word(vocabulary_entry *ve, nonterminal *nt, int pc) {
	ve->flags |= (nt->flag_words_in_production);
	if (nt->number_words_by_production) ve->literal_number_value = pc;
}
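
@ To make the Dogg example concrete, here is a minimal plain C sketch of the
same mechanism. The types |toy_vocab_entry| and |toy_numbering_nt|, the
function |toy_note_word| and the bit value given to |TOY_NUMBER_MC| are all
hypothetical, chosen only for illustration; the real |NUMBER_MC| value is not
stated in this section. Productions are counted from 1, as in the text above.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NUMBER_MC 0x00000001u /* assumed bit, for illustration only */

/* Hypothetical stand-ins carrying just the fields this mechanism touches */
typedef struct toy_vocab_entry {
    const char *text;
    unsigned int flags;
    int literal_number_value;
} toy_vocab_entry;

typedef struct toy_numbering_nt {
    int number_words_by_production;
    unsigned int flag_words_in_production;
} toy_numbering_nt;

/* Called for each word met in production number pc of the nonterminal */
void toy_note_word(toy_vocab_entry *ve, const toy_numbering_nt *nt, int pc) {
    assert(ve != NULL && nt != NULL);
    ve->flags |= nt->flag_words_in_production;
    if (nt->number_words_by_production) ve->literal_number_value = pc;
}
```

Calling |toy_note_word| on sun, dock, trog, slack and pan with production
numbers 1 to 5, for an NT which is both numbering and flagged, leaves each
word carrying its number and the flag.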