§13.1.1. Otherwise a request must be the name of a single debugging aspect.
diff --git a/docs/inflections-module/2-np.html b/docs/inflections-module/2-np.html
index 525277019..23e943d16 100644
--- a/docs/inflections-module/2-np.html
+++ b/docs/inflections-module/2-np.html
@@ -244,7 +244,7 @@ single avinue.
if (first == NULL) continue;ptoken *second = first->next_ptoken;if ((second) && (second->next_ptoken)) {
-LoadPreform::log_production(pr, FALSE);
+Instrumentation::log_production(pr, FALSE);Conjugation::trie_definition_error(nt, pr, "trie line with more than 2 words"); }Consider the one- or two-token production in this nonterminal7.1.1;
diff --git a/docs/inflections-module/3-ga.html b/docs/inflections-module/3-ga.html
index 6536325c4..1ad30b994 100644
--- a/docs/inflections-module/3-ga.html
+++ b/docs/inflections-module/3-ga.html
@@ -70,7 +70,7 @@ The following does the suffixing:
<comparative-construction>::=...than
-
§7. Plural inflections. The following takes a single word, assumes it to be a noun which meaningfully
has a plural, and modifies it to the plural form. ("Golf" is a noun which
doesn't sensibly have a plural; the algorithm here would return "golves".)
@@ -333,7 +333,7 @@ of a sibilant plus "o" suffix to include an "e", so that Conway produces
...<en-trie-plural-regular-inflections>|...<en-trie-plural-append-s>
-
§8. See Conway's table A.2. The following nouns, mostly names of kinds of animal,
have the same plural as singular form: for example, chamois, salmon, goldfish.
@@ -391,7 +391,7 @@ have the same plural as singular form: for example, chamois, salmon, goldfish.
elk0|pincers0
-
§11. Step 5. Now we reach a batch of irregular but fairly general inflected
endings; for example, protozoon to protozoa, or metamorphosis to metamorphoses.
Note that we differ from Conway in pluralizing blouse as blouses, not blice.
@@ -461,7 +461,7 @@ Note that we differ from Conway in pluralizing blouse as blouses, not blice.
*sis3ses|*xis3xes
-
§13. Step 11a. (We're not implementing Conway's steps in sequence: see below.)
These -o endings are mostly loan words from Romance languages whose original
inflections are assimilated.
@@ -534,7 +534,7 @@ inflections are assimilated.
stylostylos|tempotempos
-
§16. Verb inflections. "Le verbe est l'âme d'une langue" (attributed to Georges Duhamel). And the
care of the soul is, of course, complicated. For example, the source text can
say something like this:
@@ -787,7 +787,7 @@ Inform assertion sentences, but are needed for text substitutions.)
won't<informal-negated-modal-conjugation>|...<regular-verb-conjugation>
-
§21. We will start with two auxiliary verbs, that is, verbs used to construct
forms of other verbs. The first is "to have"; as we'll see, English uses
this to construct perfect tenses:
@@ -850,7 +850,7 @@ for "[have]".
<not-instance-of-verb-at-run-time>|<to-have-tabulation>
-
§23. And this is an example of splitting into cases for the six persons,
1PS, 2PS, 3PS, 1PP, 2PP, 3PP. I have, you have, he has, we have, you have,
they have. (This is more excitingly varied in other languages, of course.)
@@ -999,7 +999,7 @@ they have. (This is more excitingly varied in other languages, of course.)
<to-have-present>::=have|have|has|have|have|have
-
§24. Next we have "to do", which is like "to have" in being fairly regular,
as irregular verbs go. But we treat this as a special case because, again,
we're going to need it as an auxiliary verb when forming negatives ("Peter
@@ -1065,7 +1065,7 @@ may have to revisit this for languages other than English.)
<to-do-present>::=do|do|does|do|do|do
-
§25. Regular English verbs, then, look like so. We will, for the first time,
make heavy use of our numbered verb forms: for example, for the verb
"to take", they would be "take" (1), "taking" (2), "taken" (3),
@@ -1097,7 +1097,7 @@ to "grabs onto", "grabbing onto" and so on.
6<en-trie-past>|<regular-verb-tabulation>
-
§26. Here we see our auxiliary verbs in use. For the negated present tense,
"Peter does not carry the ball"; for the negated past tense, "Peter did
not carry the ball" — in both cases, this is "to do" plus the infinitive
@@ -1121,7 +1121,7 @@ a bequest".)
a5-willnot1|p*3by
-
§27. This looks odd, but what it says is that the present tense of a regular
English verb is always the infinitive (I take, you take, we take, and so on)
except for third person singular (he takes), which is different. (It's usually
@@ -1133,7 +1133,7 @@ as we'll see.)
<regular-verb-present>::=1|1|5|1|1|1
-
§29. Except for tense formation (Peter "will" take the ball), the most common
modal verb which can be used in Inform source text is "can". For example:
@@ -1198,7 +1198,7 @@ to elide, so we always pronounce it that way and the spelling now follows.
a5+willbeableto++1|a5-willnotbeableto++1
-
§30. Inform has only a simple understanding of what "can" means, so it doesn't
allow the source text to use "can" in combination with arbitrary verbs.
Instead, each legal combination has to be declared explicitly:
@@ -1241,7 +1241,7 @@ Jane".
a(beableto)4|p(beableto)be3(4)by
-
§31. The following handles the other English modal verbs ("might", "should"
and so on) surprisingly easily. The notation ++1 means that the verb
being modified should appear in verb form 1, and so on: for example,
@@ -1266,7 +1266,7 @@ being modified should appear in verb form 1, and so on: for example,
a5+4++1|a5-4not++1
-
§32. That completes our basic kit of verbs nicely. What's left is used only
for generating text at run-time — for printing adaptive messages, that is;
none of these oddball exceptional cases is otherwise used as a verb in
@@ -1328,7 +1328,7 @@ dialects — and we aren't even going to try to cope with that.
<contracted-to-be-past-negated>::=wasn't|weren't|wasn't|weren't|weren't|weren't
-
§34. Now we come to "aren't", a negated form of "to be", but where the
contraction occurs between the verb and the "not" rather than between
the subject and the verb.
@@ -1417,7 +1417,7 @@ that option here.)
<arent-perfect>::=haven'tbeen|haven'tbeen|hasn'tbeen|haven'tbeen|haven'tbeen|haven'tbeen
-
§37. We have special tries just to list the forms of the cases we will
deal with. Tries can do fancy things (see below), but here they act just as
a look-up table: for example, "won't" has present "won't", past
@@ -1516,7 +1516,7 @@ signs can be used if we absolutely have to introduce spaces.
couldn'tcouldn't|shouldn'tshouldn't
-
§38. That's the end of the conjugations — the easy part, it turns out. We now
need to create the four tries to make verb forms out of the infinitive:
the present participle, the past participle, the third-person singular
@@ -1574,7 +1574,7 @@ version of Greenbaum's rules above.
...<en-trie-regular-b-present-participle>|...<en-trie-regular-c-present-participle>
-
§40. First of all there are some irregular cases — some for the usual suspects,
but others for oddball verbs where English breaks the normal phonetic rules
for the sake of clarity. For example, the participle of "singe" ought to
@@ -1770,7 +1770,7 @@ from the act of producing a song.
stye1ing|undersaye1ing
-
§44. Next the past participle. As noted above, for most verbs this is the same
as the past (e.g., he agreed and it was agreed); but there's a list of
exceptions for Anglo-Saxon survivals (e.g., he chose and it was chosen).
@@ -2024,7 +2024,7 @@ removed.
wringwrung|writewritten
-
§45. That's the mandatory participles sorted out; so now we move on to the two
additional verb forms used by English. First, the present form: a curiosity
of English is that this is almost always formed as if it were the plural of the
@@ -2042,7 +2042,7 @@ of exceptions to this.
havehas|dodoes
-
§46. Second, the past. This is harder. Once again we have a catalogue of
Anglo-Saxon past forms (e.g., he chose, not he chooses); and after those
are out of the way, the rules are the same as for the present participle,
@@ -2675,7 +2675,7 @@ when to double the consonant, which again depends on stress.
*y1ied| shied, tried*0ed
-
§49. Grading of adjectives is more interesting. These spelling rules are taken
from the Oxford English Grammar at 4.24, "Gradability and comparison".
Something we can't easily implement is that a final vowel plus consonant
@@ -3237,7 +3237,7 @@ rare in English adjectives.
*<aeiou><bcdfghkmlnprstvxyz>0+est|*0est
-
§50. To the best of my knowledge there's no technical term for "the noun which
is formed from an adjective to refer to the quality it measures", so the
Inform source code calls this the "quiddity". English permits several
@@ -3257,7 +3257,7 @@ sometimes less elegant, but never means the wrong thing.
*<bcdfghkmlnprstvwxyz>y1iness| e.g. "happy" to "happiness"*0ness
-
§51. English has almost no noun cases at all, with the only exceptions being
Anglo-Saxon pronouns (thus we distinguish "they" and "them" as nominative
and accusative, for example); and pronouns we handle separately in any
@@ -3271,7 +3271,7 @@ case. We won't bother to distinguish gender:
<noun-declension>::=*<en-noun-declension-group><en-noun-declension-tables>
-
§2. Possessives. Inform uses these not only for parsing but also to inflect text. For example,
if every person is given a nose, the player will see it as "my nose" not
"your nose". Inform handles such inflections by converting a pronoun in
@@ -108,7 +108,7 @@ person to second person).
its/his/her|==> 1 singulartheir==> 2 plural
-
§4. The articles need to be single words, and the following two productions
have an unusual convention: they are required to have production numbers
which encode both the implied grammatical number and gender.
@@ -142,7 +142,7 @@ number in any case, so we end up with something quite dull:
/a/a/an|/d/some
-
§7. These are registered as excerpt meanings with the ADJECTIVE_MC meaning
code, so parsing a word range to match an adjective is easy. By construction
there is only one adjectival_phrase for any given excerpt of text, so
diff --git a/docs/linguistics-module/3-cao.html b/docs/linguistics-module/3-cao.html
index e49bf110b..3a12768a5 100644
--- a/docs/linguistics-module/3-cao.html
+++ b/docs/linguistics-module/3-cao.html
@@ -92,7 +92,7 @@ numbers run from 0 to 12 since that's what clocks need.
eleven|twelve
-
§3. Those two nonterminals here simply supply text: for efficiency reasons we
don't actually parse them, although they would give the correct response if
we did. Instead they're scanned for words which are marked with the appropriate
@@ -144,11 +144,11 @@ numbers.
Optimiser::assign_bitmap_bit(<cardinal-number>, 0);Optimiser::assign_bitmap_bit(<ordinal-number>, 1);
-<cardinal-number-in-words>->number_words_by_production = TRUE;
-<cardinal-number-in-words>->flag_words_in_production = NUMBER_MC;
+<cardinal-number-in-words>->opt.number_words_by_production = TRUE;
+<cardinal-number-in-words>->opt.flag_words_in_production = NUMBER_MC;
-<ordinal-number-in-words>->number_words_by_production = TRUE;
-<ordinal-number-in-words>->flag_words_in_production = ORDINAL_MC;
+<ordinal-number-in-words>->opt.number_words_by_production = TRUE;
+<ordinal-number-in-words>->opt.flag_words_in_production = ORDINAL_MC;}
§4. Actual parsing is done here. We look at a single word to see if it's a
@@ -175,7 +175,7 @@ in decimal digits, perhaps with a minus sign.
returnFALSE;}
-
§4.1. These mustn't match any number too large to fit into the virtual machine
being compiled to, so "42000", for instance, is not a valid literal if Inform
is parsing text in a work intended for the 16-bit Z-machine.
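
As a rough illustration of the range check described in §4.1, here is a minimal sketch, not the code used by Inform. The function name `literal_fits` and the assumption that literals are signed (so the 16-bit limit is 32767) are hypothetical choices made for the example.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true if text is a decimal literal, optionally
   with a leading minus sign, whose value fits in a signed integer of the
   given width. Only widths of 16 and 32 bits are distinguished here. */
bool literal_fits(const char *text, int bits) {
    long long limit = (bits == 16) ? 32767LL : 2147483647LL;
    bool negative = false;
    if (*text == '-') { negative = true; text++; }
    if (*text == '\0') return false;            /* no digits at all */
    long long value = 0;
    for (; *text; text++) {
        if (*text < '0' || *text > '9') return false;
        value = 10*value + (*text - '0');
        if (value > limit + 1) return false;    /* too large: stop before overflow */
    }
    /* the negative range extends one further than the positive range */
    return negative ? (value <= limit + 1) : (value <= limit);
}
```

Under these assumptions, `literal_fits("42000", 16)` is false while `literal_fits("42000", 32)` is true, which matches the example in the text.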
@@ -213,7 +213,7 @@ project, with the user not realising the consequences.
returnFALSE;}
-
diff --git a/docs/linguistics-module/3-daq.html b/docs/linguistics-module/3-daq.html
index 221490615..8de2adc8d 100644
--- a/docs/linguistics-module/3-daq.html
+++ b/docs/linguistics-module/3-daq.html
@@ -509,7 +509,7 @@ no text appears in front of the number "three".
greaterthan|otherthan
-
§14. Parsing the determiner at the head of a noun phrase. We run through the possible determiners in reverse creation order, choosing the
first which matches. The following returns \(-1\) if nothing was found, or
else the first word number after the determiner words, and in that case
@@ -568,7 +568,7 @@ names like "three of clubs", meaning a single playing card.
of...|==> TRUE; return FAIL_NONTERMINAL...==> TRUE
-
§17. We attempt to see if the word range begins with (or consists of) text which
refers to the given determiner, returning the first word past this text and
also (where appropriate) setting the number specified. For instance, for
diff --git a/docs/linguistics-module/4-aoc.html b/docs/linguistics-module/4-aoc.html
index 743270d95..cc5b20e2f 100644
--- a/docs/linguistics-module/4-aoc.html
+++ b/docs/linguistics-module/4-aoc.html
@@ -122,7 +122,7 @@ and what has been the case in the past.
never|==> IMPOSSIBLE_CEinitially==> INITIALLY_CE
-
diff --git a/docs/linguistics-module/4-prp.html b/docs/linguistics-module/4-prp.html
index ead509644..d9729fd3f 100644
--- a/docs/linguistics-module/4-prp.html
+++ b/docs/linguistics-module/4-prp.html
@@ -186,7 +186,7 @@ preposition, but note that it does so by testing in creation order.
returnFALSE;}
§9. It's often useful to look for prepositions which can be combined with the
copular verb "to be". These are tested in order of the list of possible
verb forms for "to be", which is constructed with longer prepositions first.
@@ -210,7 +210,7 @@ So it will find the longest match.
returnFALSE;}
-
§16. A copular verb is one which implies the equality relation: in practice,
that means it's "to be". So the following matches "is", "were not",
and so on.
@@ -568,7 +568,7 @@ and so on.
returnFALSE;}
-
§17. A noncopular verb is anything that isn't copular, but here we also require
it to be in the present tense and the negative sense. So, for example, "does
not carry" qualifies; "is not" or "supports" don't qualify.
@@ -593,7 +593,7 @@ not carry" qualifies; "is not" or "supports" don't qualify.
returnFALSE;}
-
§22. These three nonterminals are used by Inform only to recognise constant
names for verbs. For example, the parsing of the Inform constants "the verb take"
or "the verb to be able to see" use these.
@@ -746,7 +746,7 @@ or "the verb to be able to see" use these.
returnFALSE;}
-
§5. Articled nounphrases (NP2). Although, again, any text is acceptable, we now take note of the definite
or indefinite article, and also of whether it's used in the singular or the
plural.
@@ -186,7 +186,7 @@ leave the text empty.
<if-not-deliberately-capitalised><definite-article><nounphrase>|==> 0; *XP = RP[3]; Annotate node by definite article5.2;<nounphrase>==> 0; *XP = RP[1]
-
§11. Worldly nounphrases (NP4). That just leaves the big one. It comes in two versions, for the object and
subject NPs of a regular sentence, but they are almost exactly the same. They
differ slightly, as we'll see, in the handling of relative phrases; when
@@ -335,7 +335,7 @@ accusative.)
<if-not-deliberately-capitalised><np-relative-phrase-unlimited>|==> 0; *XP = RP[2]<np-inner-without-rp>==> 0; *XP = RP[1]
-
§12. So here we go with relative phrases. We've already seen that our two general
forms of NP differ only in the range of RPs allowed at the top level: here we
see, furthermore, that the only limitation is that in the subject of an
@@ -390,7 +390,7 @@ in the case of a participle like "holding".
<np-relative-phrase-implicit>|==> 0; *XP = RP[1]<np-relative-phrase-explicit>==> 0; *XP = RP[1]
-
§13. Finally, we define what we mean by implicit and explicit relative phrases.
As examples, the right-hand sides of:
@@ -420,7 +420,7 @@ directions, in particular, a little better.
<permitted-preposition>_,/and|==> 0; return FAIL_NONTERMINAL;<permitted-preposition><np-inner-without-rp>==> 0; Work out a meaning13.1;
-
§3. We will want to spot adverbs of certainty adjacent to the verb itself;
English allows these either side, so "A man is usually happy" and "Peter
certainly is happy" are both possible. Note that these adverbs can divide
@@ -160,7 +160,7 @@ the plain", where "mainly" divides "lies" from "in".
<post-verb-certainty>::=<certainty>...==> R[1]
-
§4. Relative clauses ("a woman who is on the stage") are detected by the presence
of a marker word before the verb (in this example, "who"). Of course, such
a word doesn't always mean we have a relative clause, so we will need to be a
@@ -174,7 +174,7 @@ little careful using this nonterminal.
<pre-verb-rc-marker>::=...<relative-clause-marker>
-
§5. For purely pragmatic reasons, we'll want to avoid reading prepositions (and
thus implicit relative clauses, such as "the cat in the hat") where they occur
after the word "called" in a sentence. For example, "a cat called Puss in
@@ -185,7 +185,7 @@ Boots" must not be thought to be in Boots.
<phrase-with-calling>::=...called...
-
§6. Main nonterminal. And so this nonterminal turns a sentence into a small parse tree. Between 2010
and early 2016, this was implemented in straight Preform rather than as an
internal, but that became simply too complicated to maintain once imperative
@@ -207,7 +207,7 @@ verbs were added to the grammar. Still, it was a pity.
returnrv;}
-
§7. The following routine is only very slightly recursive. It's used either
as above, to parse a whole sentence like "The coral snake is in the green
bucket", or else is called (once) from within itself to parse just the
diff --git a/docs/words-module/3-wrd.html b/docs/words-module/3-wrd.html
index 089237eb1..973d88992 100644
--- a/docs/words-module/3-wrd.html
+++ b/docs/words-module/3-wrd.html
@@ -145,7 +145,7 @@ by moving A pas
§1. What data we collect. This ought to be a privacy policy under GDPR, somehow. If so, our justification
+for logging usage data would be this:
+
+
+
(a) the Preform parser does something very complicated and has to be tuned just
+right to be efficient, so debugging logs are helpful;
+
(b) but it runs millions of times in each Inform compilation, in a wide variety
+of ways, and any kind of complete log would be both too large and too complex
+to take in. We want to be selective, and to be able to summarise.
+
+
So, in instrumentation mode, we gather the following data. For nonterminals,
+we record the number of hits and misses. If a nonterminal is "watched", we
+log its every parse.
+
The structure nonterminal_instrumentation_data is accessed in 4/prf and here.
+
§2. We count the number of hits and misses on each production, and also store
+some sample text matching it. (In fact, we store the longest text which ever
+matches it.)
+
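
The bookkeeping described in §1 and §2 can be sketched as follows. This is a simplified illustration only: the struct `production_stats` and its functions are hypothetical, the sample text is reduced to a word count rather than a wording, and the choice to count a try on both success and failure is an assumption rather than something the diff confirms.

```c
/* Hypothetical stand-in for per-production instrumentation data. */
typedef struct production_stats {
    int tries;          /* how many times the production was attempted */
    int matches;        /* how many of those attempts succeeded */
    int sample_length;  /* length in words of the longest text matched so far */
} production_stats;

void stats_initialise(production_stats *s) {
    s->tries = 0; s->matches = 0; s->sample_length = 0;
}

/* Record a successful match against text of the given length in words,
   keeping only the longest sample seen. */
void stats_note_match(production_stats *s, int length_in_words) {
    s->tries++; s->matches++;
    if (s->sample_length < length_in_words) s->sample_length = length_in_words;
}

/* Record a failed attempt. */
void stats_note_fail(production_stats *s) {
    s->tries++;
}
```

Keeping the counters in their own struct, with one initialiser and one function per event, mirrors the way the diff replaces scattered `#ifdef INSTRUMENTED_PREFORM` blocks with single calls.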
§1. Reading Preform syntax from a file or text. The parser reads source text against a specific language only, if
primary_Preform_language is set; or, if it isn't, from any language.
@@ -293,7 +293,7 @@ independent production_list object.
CLASS_DEFINITION} production_list;
-
The structure production_list is accessed in 4/to, 4/prf and here.
+
The structure production_list is accessed in 4/to, 4/prf, 4/ins and here.
§9.
@@ -368,28 +368,19 @@ only confuses the picture here.
     struct ptoken *first_ptoken; /* the linked list of ptokens */
     int match_number; /* 0 for /a/, 1 for /b/ and so on: see About Preform */
-    int no_ranges; /* actually one more, since range 0 is reserved (see above) */
+    int no_ranges; /* actually one more, since range 0 is reserved */
-    /* Optimiser data */
-    int min_pr_words, max_pr_words;
-    struct range_requirement production_req;
-    int no_struts; /* the actual number, this time */
-    struct ptoken *struts[MAX_STRUTS_PER_PRODUCTION]; /* first ptoken in strut */
-    int strut_lengths[MAX_STRUTS_PER_PRODUCTION]; /* length of the strut in words */
-
-    /* For debugging only */
-    int production_tries; /* for statistics collected in instrumented mode */
-    int production_matches; /* ditto */
-    struct wording sample_text; /* ditto */
+    struct production_optimisation_data opt; /* see The Optimiser */
+    struct production_instrumentation_data ins; /* see Instrumentation */
     struct production *next_production; /* within its production list */
     CLASS_DEFINITION
 } production;
-
The structure production is accessed in 4/to, 4/prf and here.
-
§12. And at the bottom of the chain, the lowly ptoken. Even this can spawn another
-list, though: the token fried/green/tomatoes is a list of three ptokens joined
-by the alternative_ptoken links.
+
The structure production is accessed in 4/nnt, 4/to, 4/prf, 4/ins and here.
+
§12. And at the bottom of God's great chain, the lowly ptoken. Even this can spawn
+another list, though: the token fried/green/tomatoes is a list of three ptokens
+joined by the alternative_ptoken links.
There are really only three kinds of ptoken, wildcards, fixed words, and
@@ -421,17 +412,14 @@ the balanced_wildcard<
     int range_starts; /* 1, 2, 3, ... if word range 1, 2, 3, ... starts with this */
     int range_ends; /* 1, 2, 3, ... if word range 1, 2, 3, ... ends with this */
-    /* Optimiser data */
-    int ptoken_position; /* fixed position in range: 1, 2, ... for left, -1, -2, ... for right */
-    int strut_number; /* if this is part of a strut, what number? or -1 if not */
-    int ptoken_is_fast; /* can be checked in the fast pass of the parser */
-    struct range_requirement token_req;
+    struct ptoken_optimisation_data opt; /* see The Optimiser */
+    struct ptoken_instrumentation_data ins; /* see Instrumentation */
     struct ptoken *next_ptoken; /* within its production list */
     CLASS_DEFINITION
 } ptoken;
-
The structure ptoken is accessed in 4/to, 4/prf and here.
+
The structure ptoken is accessed in 4/nnt, 4/to, 4/prf, 4/ins and here.
§13. Each ptoken has a range_starts and range_ends number. This is either -1,
or marks that the ptoken occurs as the first or last in a range (or both). For
example, in the production
@@ -458,11 +446,8 @@ and so on.
pr->no_ranges = 1; so that they count from 1; range 0 is unused
-pr->no_struts = 0; they will be detected later
-pr->min_pr_words = 1; pr->max_pr_words = INFINITE_WORD_COUNT;
-
-pr->production_tries = 0; pr->production_matches = 0;
-pr->sample_text = EMPTY_WORDING;
+Optimiser::initialise_production_data(&(pr->opt));
+Instrumentation::initialise_production_data(&(pr->ins));ptoken *head = NULL, *tail = NULL;Parse the row of production tokens into a linked list of ptokens14.1;
@@ -730,9 +715,8 @@ becomes a fixed word; otherwise it could be any of the five categories.
pt->result_index = 1;pt->range_starts = -1; pt->range_ends = -1;
-pt->ptoken_position = 0;
-pt->strut_number = -1;
-pt->ptoken_is_fast = FALSE;
+Optimiser::initialise_ptoken_data(&(pt->opt));
+Instrumentation::initialise_ptoken_data(&(pt->ins));
§16.2. If the text refers to a nonterminal which doesn't yet exist, then this
@@ -765,98 +749,8 @@ never returns NULLif (pt->ptoken_category == FIXED_WORD_PTC) Optimiser::flag_words(ve, nt, pc);
§1. How nonterminals are stored. Each different nonterminal defined in the Syntax.preform code read in,
such as <any-integer>, is going to correspond to a global variable in the
@@ -194,7 +180,7 @@ use the constant INFINITE_WOR
@@ -221,22 +207,13 @@ textual name is referred to as an "ID".
     /* Storage for most recent correct match */
     struct wording range_result[MAX_RANGES_PER_PRODUCTION]; /* storage for word ranges matched */
-    /* Optimiser data */
-    int optimised_in_this_pass; /* have the following been worked out yet? */
-    int min_nt_words, max_nt_words; /* for speed */
-    struct range_requirement nonterminal_req;
-    int nt_req_bit; /* which hashing category the words belong to, or -1 if none */
-    int number_words_by_production;
-    unsigned int flag_words_in_production;
+    struct nonterminal_optimisation_data opt; /* see The Optimiser */
+    struct nonterminal_instrumentation_data ins; /* see Instrumentation */
-    /* For debugging only */
-    int watched; /* watch goings-on to the debugging log */
-    int nonterminal_tries; /* for statistics collected in instrumented mode */
-    int nonterminal_matches; /* ditto */
     CLASS_DEFINITION
 } nonterminal;
-
The structure nonterminal is accessed in 4/lp, 4/to, 4/prf and here.
+
The structure nonterminal is accessed in 4/lp, 4/to, 4/prf, 4/ins and here.
§6. A few notes on this are in order:
@@ -289,20 +266,15 @@ when Preform grammar is parsed, not when Inform text is parsed.
nt->internal_definition = NULL;nt->voracious = FALSE;
-for (inti=0; i<MAX_RANGES_PER_PRODUCTION; i++)
-nt->range_result[i] = EMPTY_WORDING;
-
nt->first_production_list = NULL;nt->compositor_fn = NULL;nt->multiplicitous = FALSE;
-nt->optimised_in_this_pass = FALSE;
-nt->min_nt_words = 1; nt->max_nt_words = INFINITE_WORD_COUNT;
-nt->nt_req_bit = -1;
-nt->number_words_by_production = FALSE;
-nt->flag_words_in_production = 0;
-nt->watched = FALSE;
-nt->nonterminal_tries = 0; nt->nonterminal_matches = 0;
+for (inti=0; i<MAX_RANGES_PER_PRODUCTION; i++)
+nt->range_result[i] = EMPTY_WORDING;
+
+Optimiser::initialise_nonterminal_data(&(nt->opt));
+Instrumentation::initialise_nonterminal_data(&(nt->ins)); }returnnt;}
@@ -350,17 +322,8 @@ any single NT.
intmost_recent_result = 0; the variable which inweb writes <<r>>void *most_recent_result_p = NULL; the variable which inweb writes <<rp>>
-
§11. Watching. A "watched" nonterminal is one which the Preform parser logs its usage of;
-this is helpful when debugging.
-
diff --git a/docs/words-module/4-prf.html b/docs/words-module/4-prf.html
index 1275f30f3..3e3096cd4 100644
--- a/docs/words-module/4-prf.html
+++ b/docs/words-module/4-prf.html
@@ -97,14 +97,12 @@ only about 6\% of its time here.
void **result_p) {time_tstart_of_nt = time(0);if (nt == NULL) internal_error("can't parse a null nonterminal");
- #ifdefINSTRUMENTED_PREFORM
-nt->nonterminal_tries++;
- #endif
+nt->ins.nonterminal_tries++;intsuccess_rval = TRUE; what to return in the event of a successful matchfail_nonterminal_quantum = 0;intteppic = ptraci; Teppic saves Ptraci
-ptraci = nt->watched;
+ptraci = nt->ins.watched;if (ptraci) {if (preform_lookahead_mode) ptraci = FALSE;
@@ -112,10 +110,9 @@ only about 6\% of its time here.
}intinput_length = Wordings::length(W);
-if ((nt->max_nt_words == 0) ||
- ((input_length >= nt->min_nt_words) && (input_length <= nt->max_nt_words))) {
+if ((nt->opt.max_nt_words == 0) ||
+ ((input_length >= nt->opt.min_nt_words) && (input_length <= nt->opt.max_nt_words)))Try to match the input text to the nonterminal1.2;
- }The nonterminal has failed to parse1.1;}
@@ -127,6 +124,7 @@ only about 6\% of its time here.
+Instrumentation::note_nonterminal_fail(nt);if (ptraci) LOG("Failed %V (time %d)\n", nt->nonterminal_id, time(0)-start_of_nt);ptraci = teppic;returnFALSE;
@@ -140,11 +138,9 @@ and QP will hol
@@ -165,7 +161,7 @@ an internal NT, or try all possible productions for an external one.
unoptimised = TRUE;if (nt->internal_definition) {if (nt->voracious) unoptimised = TRUE;
-if ((unoptimised) || (Optimiser::nt_bitmap_violates(W, &(nt->nonterminal_req)) == FALSE)) {
+if ((unoptimised) || (Optimiser::nt_bitmap_violates(W, &(nt->opt.nonterminal_req)) == FALSE)) {intr, Q; void *QP = NULL;if (Wordings::first_wn(W) >= 0) r = (*(nt->internal_definition))(W, &Q, &QP);else { r = FALSE; Q = 0; }
@@ -177,12 +173,12 @@ an internal NT, or try all possible productions for an external one.
} else {if (ptraci) {LOG("%V: <%W> violates ", nt->nonterminal_id, W);
-Optimiser::log_range_requirement(&(nt->nonterminal_req));
+Optimiser::log_range_requirement(&(nt->opt.nonterminal_req));LOG("\n"); } } } else {
-if ((unoptimised) || (Optimiser::nt_bitmap_violates(W, &(nt->nonterminal_req)) == FALSE)) {
+if ((unoptimised) || (Optimiser::nt_bitmap_violates(W, &(nt->opt.nonterminal_req)) == FALSE)) {void *acc_result = NULL;production_list *pl;for (pl = nt->first_production_list; pl; pl = pl->next_production_list) {
@@ -193,8 +189,8 @@ an internal NT, or try all possible productions for an external one.
for (pr = pl->first_production; pr; pr = pr->next_production) {intviolates = FALSE;if (unoptimised == FALSE) {
-if (pr->production_req.ditto_flag) violates = last_v;
-elseviolates = Optimiser::nt_bitmap_violates(W, &(pr->production_req));
+if (pr->opt.production_req.ditto_flag) violates = last_v;
+elseviolates = Optimiser::nt_bitmap_violates(W, &(pr->opt.production_req));last_v = violates; }if (violates == FALSE) {
@@ -202,9 +198,9 @@ an internal NT, or try all possible productions for an external one.
} else {if (ptraci) {LOG("production in %V: ", nt->nonterminal_id);
-LoadPreform::log_production(pr, FALSE);
+Instrumentation::log_production(pr, FALSE);LOG(": <%W> violates ", W);
-Optimiser::log_range_requirement(&(pr->production_req));
+Optimiser::log_range_requirement(&(pr->opt.production_req));LOG("\n"); } }
@@ -218,7 +214,7 @@ an internal NT, or try all possible productions for an external one.
} else {if (ptraci) {LOG("%V: <%W> violates ", nt->nonterminal_id, W);
-Optimiser::log_range_requirement(&(nt->nonterminal_req));
+Optimiser::log_range_requirement(&(nt->opt.nonterminal_req));LOG("\n"); } }
@@ -236,25 +232,16 @@ text against a production.
if (ptraci) {LOG_INDENT;Log the production match number1.2.1.1;
-LoadPreform::log_production(pr, FALSE); LOG("\n");
+Instrumentation::log_production(pr, FALSE); LOG("\n"); }
- #ifdefINSTRUMENTED_PREFORM
-pr->production_tries++;
- #endif
-
intslow_scan_needed = FALSE; #ifdefCORE_MODULEparse_node *added_to_result = NULL; #endif
-if ((input_length >= pr->min_pr_words) && (input_length <= pr->max_pr_words)) {
+if ((input_length >= pr->opt.min_pr_words) && (input_length <= pr->opt.max_pr_words)) {intQ; void *QP = NULL;Actually parse the given production, going to Fail if we can't1.2.1.2;
-
- #ifdefINSTRUMENTED_PREFORM record the sentence containing the longest example
-pr->production_matches++;
-if (Wordings::length(pr->sample_text) < Wordings::length(W)) pr->sample_text = W;
- #endif
-
+Instrumentation::note_production_match(pr, W);if (ptraci) {Log the production match number1.2.1.1;LOG("succeeded (%s): ", (slow_scan_needed)?"slowly":"quickly");
@@ -264,6 +251,7 @@ text against a production.
}Fail:
+Instrumentation::note_production_fail(pr);if (ptraci) {Log the production match number1.2.1.1; #ifdefCORE_MODULE
@@ -374,8 +362,8 @@ are in those positions.
ptoken *pt;intwn = -1, tc;for (pt = pr->first_ptoken, tc = 0; pt; pt = pt->next_ptoken, tc++) {
-if (pt->ptoken_is_fast) {
-intp = pt->ptoken_position;
+if (pt->opt.ptoken_is_fast) {
+intp = pt->opt.ptoken_position;if (p > 0) wn = Wordings::first_wn(W)+p-1;elseif (p < 0) wn = Wordings::last_wn(W)+p+1;if (Preform::parse_fixed_word_ptoken(wn, pt) == FALSE) {
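
The position arithmetic in this hunk can be isolated as a small function. This is a sketch for illustration: `word_number_for_position` is a hypothetical name, and plain integers stand in for the values returned by `Wordings::first_wn(W)` and `Wordings::last_wn(W)`.

```c
/* Convert a fixed ptoken position into a word number within the range
   first_wn..last_wn. A positive p counts from the left (1 is the first
   word); a negative p counts from the right (-1 is the last word).
   Returns -1 when p is 0, meaning the ptoken has no fixed position. */
int word_number_for_position(int first_wn, int last_wn, int p) {
    if (p > 0) return first_wn + p - 1;
    if (p < 0) return last_wn + p + 1;
    return -1;
}
```

For a wording running from word 10 to word 14, position 1 gives word 10 and position -2 gives word 13.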
@@ -404,7 +392,7 @@ and that for each \(i\) the \(i\)-th strut matches the text beginning at \(s_i\)
intspos[MAX_STRUTS_PER_PRODUCTION]; word numbers for where we are trying the struts
-intNS = pr->no_struts;
+intNS = pr->opt.no_struts;Start from the lexicographically earliest strut position1.2.1.2.3.1;ptoken *backtrack_token = NULL;intbacktrack_index = -1, backtrack_to = -1, backtrack_tc = -1;
@@ -433,14 +421,14 @@ handling the popular case of one strut separately.
§1.2.1.2.3.3.3.1. How much text from the input should this ptoken match? We feed it as much
@@ -634,10 +622,10 @@ probably gives the wrong answer.)
ptoken *lookahead = nextpt;if (lookahead == NULL) wt = Wordings::last_wn(W);else {
-intp = lookahead->ptoken_position;
+intp = lookahead->opt.ptoken_position;if (p > 0) wt = Wordings::first_wn(W)+p-2;elseif (p < 0) wt = Wordings::last_wn(W)+p;
-elseif (lookahead->strut_number >= 0) wt = spos[lookahead->strut_number]-1;
+elseif (lookahead->opt.strut_number >= 0) wt = spos[lookahead->opt.strut_number]-1;elseif ((lookahead->nt_pt) && (pt->negated_ptoken == FALSE) && (Optimiser::ptoken_width(pt) == PTOKEN_ELASTIC)) {
@@ -686,10 +674,10 @@ last word in the input text.
elsebreak; } else {intq = Preform::parse_nt_against_word_range(pt->nt_pt,
-Wordings::new(pos, pos+pt->nt_pt->max_nt_words-1),
+Wordings::new(pos, pos+pt->nt_pt->opt.max_nt_words-1),NULL, NULL);if (pt->negated_ptoken) q = q?FALSE:TRUE;
-if (q) pos += pt->nt_pt->max_nt_words;
+if (q) pos += pt->nt_pt->opt.max_nt_words;elsebreak; }if (pos-from >= len) returnfrom;
@@ -715,7 +703,7 @@ last word in the input text.
}
diff --git a/docs/words-module/4-to.html b/docs/words-module/4-to.html
index 6c24021b6..01890283b 100644
--- a/docs/words-module/4-to.html
+++ b/docs/words-module/4-to.html
@@ -13,14 +13,6 @@
-
-
-
+
+
+
@@ -92,6 +92,53 @@ changed that.
 int first_round_of_nt_optimisation_made = FALSE;
+typedef struct nonterminal_optimisation_data {
+    int optimised_in_this_pass; /* have the following been worked out yet? */
+    int min_nt_words, max_nt_words; /* for speed */
+    struct range_requirement nonterminal_req;
+    int nt_req_bit; /* which hashing category the words belong to, or -1 if none */
+    int number_words_by_production;
+    unsigned int flag_words_in_production;
+} nonterminal_optimisation_data;
+
+
+void Optimiser::initialise_nonterminal_data(nonterminal_optimisation_data *opt) {
+    opt->optimised_in_this_pass = FALSE;
+    opt->min_nt_words = 1; opt->max_nt_words = INFINITE_WORD_COUNT;
+    opt->nt_req_bit = -1;
+    opt->number_words_by_production = FALSE;
+    opt->flag_words_in_production = 0;
+    Optimiser::clear_rreq(&(opt->nonterminal_req));
+}
+
+typedef struct ptoken_optimisation_data {
+    int ptoken_position; /* fixed position in range: 1, 2, ... for left, -1, -2, ... for right */
+    int strut_number; /* if this is part of a strut, what number? or -1 if not */
+    int ptoken_is_fast; /* can be checked in the fast pass of the parser */
+    struct range_requirement token_req;
+} ptoken_optimisation_data;
+
+void Optimiser::initialise_ptoken_data(ptoken_optimisation_data *opt) {
+    opt->ptoken_position = 0;
+    opt->strut_number = -1;
+    opt->ptoken_is_fast = FALSE;
+    Optimiser::clear_rreq(&(opt->token_req));
+}
+
+typedef struct production_optimisation_data {
+    int min_pr_words, max_pr_words;
+    struct range_requirement production_req;
+    int no_struts;
+    struct ptoken *struts[MAX_STRUTS_PER_PRODUCTION]; /* first ptoken in strut */
+    int strut_lengths[MAX_STRUTS_PER_PRODUCTION]; /* length of the strut in words */
+} production_optimisation_data;
+
+void Optimiser::initialise_production_data(production_optimisation_data *opt) {
+    opt->no_struts = 0;
+    opt->min_pr_words = 1; opt->max_pr_words = INFINITE_WORD_COUNT;
+    Optimiser::clear_rreq(&(opt->production_req));
+}
+
 typedef struct range_requirement {
     int no_requirements;
     int ditto_flag;
@@ -106,15 +153,15 @@ changed that.
 int no_req_bits = 0;
-void Optimiser::optimise_counts(void) {
+void Optimiser::optimise_counts(void) {
     nonterminal *nt;
     LOOP_OVER(nt, nonterminal) {
-        Optimiser::clear_rreq(&(nt->nonterminal_req));
+        Optimiser::clear_rreq(&(nt->opt.nonterminal_req));
         if (nt->marked_internal) {
-            nt->optimised_in_this_pass = TRUE;
+            nt->opt.optimised_in_this_pass = TRUE;
         } else {
-            nt->optimised_in_this_pass = FALSE;
-            nt->min_nt_words = 1; nt->max_nt_words = INFINITE_WORD_COUNT;
+            nt->opt.optimised_in_this_pass = FALSE;
+            nt->opt.min_nt_words = 1; nt->opt.max_nt_words = INFINITE_WORD_COUNT;
         }
     }
     if (first_round_of_nt_optimisation_made == FALSE) {
@@ -130,9 +177,9 @@ changed that.
     LOOP_OVER(nt, nonterminal) Optimiser::optimise_nt_reqs(nt);
 }
-void Optimiser::optimise_nt(nonterminal *nt) {
-    if (nt->optimised_in_this_pass) return;
-    nt->optimised_in_this_pass = TRUE;
+void Optimiser::optimise_nt(nonterminal *nt) {
+    if (nt->opt.optimised_in_this_pass) return;
+    nt->opt.optimised_in_this_pass = TRUE;
     Compute the minimum and maximum match lengths 1.1;
     production_list *pl;
@@ -149,7 +196,7 @@ changed that.
     Mark the vocabulary's incidence list with this nonterminal 1.7;
 }
-
The structure range_requirement is accessed in 4/prf and here.
+
The structure nonterminal_optimisation_data is accessed in 4/nnt, 4/prf, 4/ins and here.
The structure ptoken_optimisation_data is accessed in 4/prf, 4/ins and here.
The structure production_optimisation_data is accessed in 4/prf, 4/ins and here.
The structure range_requirement is accessed in 4/prf and here.
§1.1. The minimum matched text length for a nonterminal is the smallest of the
minima for its possible productions; for a production, it's the sum of the
minimum match lengths of its tokens.
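The rule in §1.1 can be sketched in isolation as follows. This is a minimal illustration and not the code in the Optimiser: the `toy_` types and functions are hypothetical, the value given to `INFINITE_WORD_COUNT` is assumed, and taking the largest of the production maxima for the nonterminal's maximum is an assumption made by analogy with the minimum.

```c
#include <assert.h>

#define INFINITE_WORD_COUNT 1000000000 /* assumed sentinel; the real value may differ */

/* Hypothetical stand-in for a production: just the extents of its tokens. */
typedef struct toy_production {
    int n_tokens;
    const int *token_min; /* fewest words each token can match */
    const int *token_max; /* most words each token can match */
} toy_production;

/* Clip a running total so that it never exceeds the sentinel. */
static int toy_clip(long v) {
    return (v > INFINITE_WORD_COUNT) ? INFINITE_WORD_COUNT : (int) v;
}

/* A production's extent is the sum of its tokens' extents. */
static void toy_production_extent(const toy_production *pr, int *min_p, int *max_p) {
    long lo = 0, hi = 0;
    for (int i = 0; i < pr->n_tokens; i++) {
        lo = toy_clip(lo + pr->token_min[i]);
        hi = toy_clip(hi + pr->token_max[i]);
    }
    *min_p = (int) lo;
    *max_p = (int) hi;
}

/* A nonterminal's minimum is the smallest production minimum; its maximum
   is assumed here to be the largest production maximum. */
static void toy_nonterminal_extent(const toy_production *prs, int n, int *min, int *max) {
    assert(n > 0);
    *min = -1; *max = -1;
    for (int i = 0; i < n; i++) {
        int min_p, max_p;
        toy_production_extent(&prs[i], &min_p, &max_p);
        if ((*min == -1) && (*max == -1)) { *min = min_p; *max = max_p; }
        else {
            if (min_p < *min) *min = min_p;
            if (max_p > *max) *max = max_p;
        }
    }
}
```

For a nonterminal with one production of widths (1, 2..3) and another consisting of a single elastic token, this gives a minimum of 1 and a maximum of `INFINITE_WORD_COUNT`.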
@@ -173,7 +220,7 @@ minimum match lengths of its tokens.
     if (min_p > INFINITE_WORD_COUNT) min_p = INFINITE_WORD_COUNT;
     if (max_p > INFINITE_WORD_COUNT) max_p = INFINITE_WORD_COUNT;
 }
-pr->min_pr_words = min_p; pr->max_pr_words = max_p;
+pr->opt.min_pr_words = min_p; pr->opt.max_pr_words = max_p;
 if ((min == -1) && (max == -1)) { min = min_p; max = max_p; }
 else {
     if (min_p < min) min = min_p;
@@ -182,7 +229,7 @@ minimum match lengths of its tokens.
     }
 }
 if (min >= 1) {
-    nt->min_nt_words = min; nt->max_nt_words = max;
+    nt->opt.min_nt_words = min; nt->opt.max_nt_words = max;
 }
@@ -211,10 +258,10 @@ starts and finishes; it's not enough just to know where it starts.
 last = pt;
 int L = Optimiser::ptoken_width(pt);
 if ((posn != 0) && (L != PTOKEN_ELASTIC)) {
-    pt->ptoken_position = posn;
+    pt->opt.ptoken_position = posn;
     posn += L;
 } else {
-    pt->ptoken_position = 0; /* thus clearing any expired positions from earlier */
+    pt->opt.ptoken_position = 0; /* thus clearing any expired positions from earlier */
     posn = 0;
 }
 }
@@ -236,10 +283,10 @@ production, but this is never larger than about 10.
 int posn = -1;
 ptoken *pt;
 for (pt = last; pt; ) {
-    if (pt->ptoken_position != 0) break; /* don't use a back-end position if there's a front one */
+    if (pt->opt.ptoken_position != 0) break; /* don't use a back-end position if there's a front one */
     int L = Optimiser::ptoken_width(pt);
     if ((posn != 0) && (L != PTOKEN_ELASTIC)) {
-        pt->ptoken_position = posn;
+        pt->opt.ptoken_position = posn;
         posn -= L;
     } else break;
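The positions assigned here are later turned back into word numbers by the parser: positive values count from the left end of the range, negative values from the right, and zero means no fixed position. A minimal sketch of that conversion, using a hypothetical helper name and plain integers in place of a `wording`, might look like this:

```c
#include <assert.h>

/* Hypothetical helper; the name and signature are illustrative only.
   p > 0 counts from the left (1 is the first word), p < 0 counts from the
   right (-1 is the last word), and p == 0 means no fixed position. */
static int toy_word_number_at(int first_wn, int last_wn, int p) {
    assert(first_wn <= last_wn);
    if (p > 0) return first_wn + p - 1;
    if (p < 0) return last_wn + p + 1;
    return -1; /* no fixed position: the caller has to search instead */
}
```

With a range running from word 10 to word 14, position 2 is word 11 and position -2 is word 13.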
@@ -289,20 +336,22 @@ position then all of them have, but we're in no hurry so we don't exploit that.)
§1.7. Weak requirement: one word in range must match one of these bits
@@ -327,7 +376,7 @@ Strong ": all bits in this range must be matched by one word
 int first_production = TRUE;
-Optimiser::clear_rreq(&(nt->nonterminal_req));
+Optimiser::clear_rreq(&(nt->opt.nonterminal_req));
 #ifdef PREFORM_CIRCULARITY_BREAKER
 PREFORM_CIRCULARITY_BREAKER(nt);
 #endif
@@ -354,27 +403,27 @@ Strong ": all bits in this range must be matched by one word
 int all = TRUE, first = TRUE;
 ptoken *pt;
 for (pt = pr->first_ptoken; pt; pt = pt->next_ptoken) {
-    Optimiser::clear_rreq(&(pt->token_req));
+    Optimiser::clear_rreq(&(pt->opt.token_req));
     if ((pt->ptoken_category == FIXED_WORD_PTC) && (pt->negated_ptoken == FALSE)) {
         ptoken *alt;
         for (alt = pt; alt; alt = alt->alternative_ptoken)
             Optimiser::set_nt_incidence(alt->ve_pt, nt);
-        Optimiser::atomic_rreq(&(pt->token_req), nt);
+        Optimiser::atomic_rreq(&(pt->opt.token_req), nt);
     } else all = FALSE;
     int self_referential = FALSE, empty = FALSE;
     if ((pt->ptoken_category == NONTERMINAL_PTC) &&
-        (pt->nt_pt->min_nt_words == 0) && (pt->nt_pt->max_nt_words == 0))
+        (pt->nt_pt->opt.min_nt_words == 0) && (pt->nt_pt->opt.max_nt_words == 0))
         empty = TRUE; /* even if negated, notice */
     if ((pt->ptoken_category == NONTERMINAL_PTC) && (pt->negated_ptoken == FALSE)) {
         if (pt->nt_pt == nt) self_referential = TRUE;
         Optimiser::optimise_nt(pt->nt_pt);
-        pt->token_req = pt->nt_pt->nonterminal_req;
+        pt->opt.token_req = pt->nt_pt->opt.nonterminal_req;
     }
     if ((self_referential == FALSE) && (empty == FALSE)) {
         if (first) {
-            prt = pt->token_req;
+            prt = pt->opt.token_req;
         } else {
-            Optimiser::concatenate_rreq(&prt, &(pt->token_req));
+            Optimiser::concatenate_rreq(&prt, &(pt->opt.token_req));
         }
         first = FALSE;
     }
@@ -385,10 +434,10 @@ Strong ": all bits in this range must be matched by one word
     Optimiser::disjoin_rreq(&nnt, &prt);
 }
 first_production = FALSE;
-pr->production_req = prt;
+pr->opt.production_req = prt;
 }
 }
-nt->nonterminal_req = nnt;
+nt->opt.nonterminal_req = nnt;
 #ifdef PREFORM_CIRCULARITY_BREAKER
 PREFORM_CIRCULARITY_BREAKER(nt);
 #endif
@@ -402,17 +451,17 @@ combination of the meaning codes found in an adjective list.
-int Optimiser::nt_bitmap_bit(nonterminal *nt) {
-    if (nt->nt_req_bit == -1) {
+int Optimiser::nt_bitmap_bit(nonterminal *nt) {
+    if (nt->opt.nt_req_bit == -1) {
         int b = RESERVED_NT_BITS + ((no_req_bits++)%(32-RESERVED_NT_BITS));
-        nt->nt_req_bit = (1 << b);
+        nt->opt.nt_req_bit = (1 << b);
     }
-    return nt->nt_req_bit;
+    return nt->opt.nt_req_bit;
 }
 void Optimiser::assign_bitmap_bit(nonterminal *nt, int b) {
     if (nt == NULL) internal_error("null NT");
-    nt->nt_req_bit = (1 << b);
+    nt->opt.nt_req_bit = (1 << b);
 }
 int Optimiser::test_word(int wn, nonterminal *nt) {
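The allocation in `nt_bitmap_bit` hands out bits round-robin above a reserved block, so that once the unreserved bits run out, later nonterminals share bits with earlier ones. A standalone sketch of that behaviour, with hypothetical `toy_` names and an assumed value for `RESERVED_NT_BITS`, is:

```c
#include <assert.h>

#define RESERVED_NT_BITS 6 /* assumed for illustration; not taken from this diff */

static int toy_no_req_bits = 0; /* how many bits have been handed out so far */

/* Return the bit for a nonterminal, allocating one on first use.
   *nt_req_bit holds -1 until a bit has been assigned. */
static int toy_bitmap_bit(int *nt_req_bit) {
    assert(nt_req_bit != 0);
    if (*nt_req_bit == -1) {
        int b = RESERVED_NT_BITS + ((toy_no_req_bits++) % (32 - RESERVED_NT_BITS));
        *nt_req_bit = (int) (1u << b); /* unsigned shift avoids overflow at bit 31 */
    }
    return *nt_req_bit;
}
```

Under these assumptions the first caller receives bit 6, the second bit 7, and the 27th caller wraps around to bit 6 again.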
@@ -540,7 +589,7 @@ combination of the meaning codes found in an adjective list.
§5.
-int Optimiser::nt_bitmap_violates(wording W, range_requirement *req) {
+int Optimiser::nt_bitmap_violates(wording W, range_requirement *req) {
     if (req->no_requirements) return FALSE;
     if (Wordings::length(W) == 1) {
         int bm = Vocabulary::get_ntb(Lexer::word(Wordings::first_wn(W)));
@@ -604,7 +653,7 @@ match some words against X, then some more against Y.
-void Optimiser::concatenate_rreq(range_requirement *req, range_requirement *with) {
+void Optimiser::concatenate_rreq(range_requirement *req, range_requirement *with) {
     req->DS_req = Optimiser::concatenate_ds(req->DS_req, with->DS_req);
     req->DW_req = Optimiser::concatenate_dw(req->DW_req, with->DW_req);
     req->CS_req = Optimiser::concatenate_cs(req->CS_req, with->CS_req);
@@ -620,7 +669,7 @@ the strongest requirement we can make. So:
@@ -642,7 +691,7 @@ the rarest for best effect, but that's too much work.
-int Optimiser::concatenate_dw(int m1, int m2) {
+int Optimiser::concatenate_dw(int m1, int m2) {
     if (m1 == 0) return m2; /* the case where we have no information about X */
     if (m2 == 0) return m1; /* and about Y */
     return m1; /* the general case discussed above */
@@ -654,7 +703,7 @@ the union, so:
-int Optimiser::concatenate_cw(int m1, int m2) {
+int Optimiser::concatenate_cw(int m1, int m2) {
     if (m1 == 0) return 0; /* the case where we have no information about X */
     if (m2 == 0) return 0; /* and about Y */
     return m1 | m2; /* the general case discussed above */
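To see how the two concatenation rules above differ when no information is available, here is a small standalone sketch. The `toy_` functions are hypothetical copies of the branch structure visible in this diff, with plain `int` bitmaps in which 0 means "no information"; they are not a substitute for the Optimiser's own functions.

```c
#include <assert.h>

/* DW rule as shown above: if one side is unknown, use the other;
   when both are known, keep the requirement from the first. */
static int toy_concatenate_dw(int m1, int m2) {
    if (m1 == 0) return m2;
    if (m2 == 0) return m1;
    return m1;
}

/* CW rule as shown above: if either side is unknown, nothing can be
   required; when both are known, take the union of the bits. */
static int toy_concatenate_cw(int m1, int m2) {
    assert(m1 >= 0 && m2 >= 0); /* keep the toy example to non-negative bitmaps */
    if (m1 == 0) return 0;
    if (m2 == 0) return 0;
    return m1 | m2;
}
```

So an unknown operand is ignored by the DW rule but wipes out the result of the CW rule.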
@@ -664,11 +713,11 @@ the union, so:
-int Optimiser::disjoin_dw(int m1, int m2) {
+int Optimiser::disjoin_dw(int m1, int m2) {
     if (m1 == 0) return 0; /* the case where we have no information about X */
     if (m2 == 0) return 0; /* and about Y */
     return m1 | m2; /* the general case discussed above */
@@ -724,34 +773,34 @@ must be found in X/Y, so:
-int Optimiser::disjoin_cw(int m1, int m2) {
+int Optimiser::disjoin_cw(int m1, int m2) {
     if (m1 == 0) return 0; /* the case where we have no information about X */
     if (m2 == 0) return 0; /* and about Y */
     return m1 | m2; /* the general case discussed above */
 }
-int Optimiser::disjoin_fw(int m1, int m2) {
+int Optimiser::disjoin_fw(int m1, int m2) {
     return Optimiser::disjoin_cw(m1, m2);
 }
-int Optimiser::disjoin_fs(int m1, int m2) {
+int Optimiser::disjoin_fs(int m1, int m2) {
     return Optimiser::disjoin_cs(m1, m2);
 }
-void Optimiser::clear_rreq(range_requirement *req) {
+void Optimiser::clear_rreq(range_requirement *req) {
     req->DS_req = 0; req->DW_req = 0;
     req->CS_req = 0; req->CW_req = 0;
     req->FS_req = 0; req->FW_req = 0;
 }
-void Optimiser::atomic_rreq(range_requirement *req, nonterminal *nt) {
+void Optimiser::atomic_rreq(range_requirement *req, nonterminal *nt) {
     int b = Optimiser::nt_bitmap_bit(nt);
     req->DS_req = b; req->DW_req = b;
     req->CS_req = b; req->CW_req = b;
     req->FS_req = 0; req->FW_req = 0;
 }
-void Optimiser::log_range_requirement(range_requirement *req) {
+void Optimiser::log_range_requirement(range_requirement *req) {
     if (req->DW_req) { LOG(" DW: %08x", req->DW_req); }
     if (req->DS_req) { LOG(" DS: %08x", req->DS_req); }
     if (req->CW_req) { LOG(" CW: %08x", req->CW_req); }
@@ -766,7 +815,7 @@ must be found in X/Y, so:
diff --git a/docs/words-module/P-wtmd.html b/docs/words-module/P-wtmd.html
index 321d186f3..934282e6d 100644
--- a/docs/words-module/P-wtmd.html
+++ b/docs/words-module/P-wtmd.html
@@ -243,17 +243,18 @@ wording and a unique ID number and makes something sensible: §10. Preform. Prefor is a meta-language for writing a simple grammar: it's in some sense
+
§10. Preform. Preform is a meta-language for writing a simple grammar: it's in some sense
pre-Inform, because it defines the Inform language itself. See About Preform.
Compilers are a little like the human body, in that most of their organs can
-be located in a single spot: there is just one lexical analyser, and it is
-entirely contained in the section Lexer. But just a few organs — the
-nervous system, or the blood vessels — are present almost everywhere in the
+be located in a single spot: the heart, for example, or the gall bladder.
+Or in the case of Inform, the Lexer. But a few organs of the body — like
+the nervous system, or blood vessels — are found almost everywhere in the
body, and the Inform syntax analyser is like that. While the basic code which
drives this is in Preform and in the syntax module, the actual
-syntax being read can be found all over Inform. This has a notation like so:
+syntax being read is in many, many different places. Such syntax has a notation
+like so:
@@ -261,7 +262,7 @@ syntax being read can be found all over Inform. This has a notation like so:
     <ordinal-number> runner |    ==> TRUE
     runner no <cardinal-number>  ==> FALSE
-
And such notation is mixed in with regular C code in many sections of the
+
This notation is mixed in with regular C code in many sections of the
core and other modules.
diff --git a/docs/words-module/index.html b/docs/words-module/index.html
index 3df3c6592..94551d030 100644
--- a/docs/words-module/index.html
+++ b/docs/words-module/index.html
@@ -172,6 +172,11 @@
Basic Nonterminals -
A handful of bare minimum Preform syntax.
+