What This Module Does.

An overview of the linguistics module's role and abilities.

@h Prerequisites.
The linguistics module is a part of the Inform compiler toolset. It is
presented as a literate program or "web". Before diving in:
(a) It helps to have some experience of reading webs: see //inweb// for more.
(b) The module is written in C, in fact ANSI C99, but this is disguised by the
fact that it uses some extension syntaxes provided by the //inweb// literate
programming tool, making it a dialect of C called InC. See //inweb// for
full details, but essentially: it's C without predeclarations or header files,
and where functions have names like |Tags::add_by_name| rather than |add_by_name|.
(c) This module uses other modules drawn from the compiler (see //structure//), and also
uses a module of utility functions called //foundation//.
For more, see //foundation: A Brief Guide to Foundation//.

@h Goals and non-goals.
The linguistics module aims to provide Preform grammar capable of taking a
verb phrase of natural language and parsing it into a "diagram" somewhat like
the ones students draw in school grammar classes. For example,

>> most of the heavy bricks have been on top of the wall

becomes a "diagram" -- a fragment of the tree provided by the //syntax//
module -- showing the linguistic structure of the sentence.

This work is done by the nonterminal <sentence> -- see //Verb Phrases//.
Many examples of sentence diagrams can be seen in //About Sentence Diagrams//;
read that first.

To be in a position to parse like this, the module needs a way to accumulate
knowledge about the possible nouns, verbs, adjectives, prepositions and so on
which might appear in such sentences. Where sensible, a design goal here is
for each of these grammatical categories to correspond closely to an object
class of the same name -- that is, //noun//, //verb//, //adjective// and so on.
For example, "brick" and "wall" correspond to two different instances of //noun//,
and "to be" to //verb//. The instances of these grammatical classes, taken
together, form what is called the "stock".

@ To be clear, though:
(a) The stock is a range of possibilities and not a tally of what actually
appears in any given source text being looked at. When Inform compiles a
source text, it may perhaps mention "brick" many times, but the stock would
still just contain one //noun// object for "brick"; equally, the text being
compiled might not contain the verb "to provide", even though the stock has it.
(b) For the most part, it is for the user of this module to create an
appropriate stock. This will never be so large as to meaningfully define the
entire English (or any other) language; it's likely to focus on some narrow
subject area. The stock does not have to be created all at once: it can fill
out over time.[1]
(c) For the most part, it is up to the user to give meanings to terms in the
stock, and the linguistics module tries to impose as few constraints as possible
about those meanings.[2]
(d) Only "for the most part", because certain "fixed" grammatical categories
are an exception -- for example, articles, or cardinal and ordinal
numbers, or determiners. In these categories, the linguistics module provides
the stock and the user cannot change or add to it; and the linguistics
module provides the meanings, too, which the user can read but cannot change.[3]

[1] When Inform reads "A brick is a kind of thing. Two bricks are here.",
for example, the first sentence causes it to create the //noun// for brick,
so that the second sentence is read with a stock which is one term larger.

[2] Verbs are a partial exception in that we need to make some minimal
assumptions about how verb meanings will work -- this is abstracted out in
the section //Verb Meanings//. Even there, though, the user will have to find
her own way to define what verbs actually mean.

[3] For example, the determiner "all of" has a fixed meaning defined here
as the //quantifier// object |for_all_quantifier|; and the cardinal "six"
always means 6.

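@ The difference between a stock and a tally can be pictured with a minimal
sketch in plain C (not InC). Everything named here, including |sk_stock|,
|sk_stock_find| and |sk_stock_declare|, is hypothetical and chosen only for
illustration; the module's real data structures are described in
//Stock Control// and will differ. The sketch assumes that an item enters the
stock only when it is declared, and that later mentions merely look it up.

```c
#include <assert.h>
#include <string.h>

#define SK_MAX_STOCK 64   /* arbitrary capacity, for the sketch only */

/* Hypothetical stock: one entry per term, however often it is mentioned. */
typedef struct {
    const char *words[SK_MAX_STOCK];
    int count;
} sk_stock;

/* Return the index of the term with this wording, or -1 if not in stock. */
int sk_stock_find(const sk_stock *s, const char *word) {
    assert(s != NULL && word != NULL);
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->words[i], word) == 0) return i;
    return -1;
}

/* Add a term: done when a sentence defines it, not when text mentions it. */
int sk_stock_declare(sk_stock *s, const char *word) {
    assert(s != NULL && word != NULL && s->count < SK_MAX_STOCK);
    s->words[s->count] = word;
    return s->count++;
}
```

Under these assumptions, reading "A brick is a kind of thing." would call
|sk_stock_declare| once, and every later occurrence of "brick" would go
through |sk_stock_find|, leaving the size of the stock unchanged.
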
@ This module is designed to work closely with the //inflections// module,
and the demarcation line between the two is that //inflections// knows nothing
about the meaning or usage of words, while //linguistics// is largely free
to ignore the complications brought in by inflected forms of words.

For example, a single //verb// object represents "to carry", even though it
has many different expressions in words -- "has not carried", "will carry",
and so on. A single //noun// representing "brick" would include its plural
"bricks" and indeed its declension into other cases, in a language which has
noun cases. Our goal here is that the stock should never contain two different
instances which merely represent inflected forms of the same thing.

@h Lcons.
An "lcon" is a reference to something in the linguistic stock, together with
annotations for the grammatical form used on a particular occasion. For
example, an lcon can mean "the indefinite article, in the third person
plural feminine form", where the stock item is just the indefinite article.
Lcons are packed into single integers for efficiency: no memory needs to be
allocated to create lcons, and they copy with value semantics.

Thus the stock is not just a figure of speech: it is an actual data structure.
See //Stock Control//.

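@ To make the idea of packing concrete, here is a minimal sketch in plain C
(not InC) of how a stock reference and its annotations might share one
integer. The bit layout, the type |sk_lcon| and every function name below are
assumptions made for illustration; the layout the module actually uses is
defined in //Stock Control// and may differ in both field widths and
the annotations it carries.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only:
   bits 0-1 hold gender, bit 2 holds number, bits 3-31 hold the stock ID. */
typedef uint32_t sk_lcon;

enum { SK_NEUTER = 0, SK_MASCULINE = 1, SK_FEMININE = 2 };
enum { SK_SINGULAR = 0, SK_PLURAL = 1 };

#define SK_GENDER_MASK  0x3u
#define SK_NUMBER_SHIFT 2
#define SK_NUMBER_MASK  0x1u
#define SK_ID_SHIFT     3
#define SK_MAX_ID       ((1u << 29) - 1)

/* Make an lcon referring to a stock item, with all annotations zero. */
sk_lcon sk_lcon_of_id(uint32_t id) {
    assert(id <= SK_MAX_ID);
    return id << SK_ID_SHIFT;
}

uint32_t sk_get_id(sk_lcon l) { return l >> SK_ID_SHIFT; }

/* Setters return a new value; the argument itself is never modified. */
sk_lcon sk_set_gender(sk_lcon l, uint32_t g) {
    assert(g <= SK_GENDER_MASK);
    return (l & ~SK_GENDER_MASK) | g;
}

uint32_t sk_get_gender(sk_lcon l) { return l & SK_GENDER_MASK; }

sk_lcon sk_set_number(sk_lcon l, uint32_t n) {
    assert(n <= SK_NUMBER_MASK);
    return (l & ~(SK_NUMBER_MASK << SK_NUMBER_SHIFT)) | (n << SK_NUMBER_SHIFT);
}

uint32_t sk_get_number(sk_lcon l) {
    return (l >> SK_NUMBER_SHIFT) & SK_NUMBER_MASK;
}
```

Because an |sk_lcon| is just an integer, assigning it to another variable
copies it, and changing the copy leaves the original alone. That is what
value semantics means here, and why no allocation is needed.
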
@h Cardinals and ordinals.
This module does not aspire to parse the full range of literals understood
by, for example, Inform. But it does provide <cardinal-number> and
<ordinal-number> nonterminals, which can recognise low numbers in words
or arbitrary numbers in digits. See //Cardinals and Ordinals//.

Note that these are not part of the stock. It wouldn't really make sense
for them to be so: we would need a potentially infinite number of them to
accommodate the possible range of integers which appear.

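@ The behaviour described for <cardinal-number> can be imitated by a short
sketch in plain C. This is not the module's Preform grammar, only an
illustration of the same idea: a small fixed table of number words, plus a
fallback which accepts any run of digits. The function name
|sk_parse_cardinal| is hypothetical, and the choice of "zero" to "twelve" as
the range of words is an assumption; see //Cardinals and Ordinals// for what
the module really accepts.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 and sets *value if word is a cardinal number, else returns 0. */
int sk_parse_cardinal(const char *word, long *value) {
    /* Assumed range of numbers recognised in words. */
    static const char *const in_words[] = {
        "zero", "one", "two", "three", "four", "five", "six",
        "seven", "eight", "nine", "ten", "eleven", "twelve"
    };
    assert(word != NULL && value != NULL);

    size_t n = sizeof in_words / sizeof in_words[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(word, in_words[i]) == 0) { *value = (long) i; return 1; }

    /* Otherwise the word must consist entirely of decimal digits. */
    if (word[0] == '\0') return 0;
    long total = 0;
    for (const char *p = word; *p != '\0'; p++) {
        if (!isdigit((unsigned char) *p)) return 0;
        int d = *p - '0';
        if (total > (LONG_MAX - d) / 10) return 0;  /* would overflow */
        total = total * 10 + d;
    }
    *value = total;
    return 1;
}
```

No object is created for any number recognised this way, which matches the
point made above: there is nothing to keep in the stock.
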
@h Stock for noun phrases.
For convenience we will run through the stock in two sets: those grammatical
categories used in noun phrases, and provided by //Chapter 2//; and those
used in verb phrases, //Chapter 3//. In alphabetical order:

@ Each adjective in the stock is an instance of //adjective//. All senses of
the same adjectival word -- "empty", say -- are represented by the same
//adjective//: it's for the user to manage multiple meanings, and to deduce
which one applies from the context.[1]

The user creates the stock of adjectives by calling //Adjectives::declare//.

[1] In Inform, for example, empty means something different for rulebooks
than for containers. An adjective is a unary predicate applying to something,
and the kind of that thing can be used to decide which meaning applies. See
//assertions: Adjective Meanings// for how Inform does this.

@ Each article in the stock is an instance of //article//. The stock is fixed
at two, both automatically created: the definite and the indefinite article.
See //Articles//.

@ The determiners are a fixed stock of roughly 20, each being an instance of
the class //determiner//. Properly speaking some are families of determiners --
"all but three" and "all but six" are the same //determiner// but with a
different numerical parameter. But others, like "most of", have fixed wording.
The meaning of each determiner is a logical //quantifier//.[1]
See //Determiners and Quantifiers//.

[1] Inform makes heavy use of these quantifiers when representing the meaning
of sentences in predicate calculus, but that's beyond the scope of this module.

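@ The notion of a determiner as a family with a numerical parameter can be
sketched as follows, in plain C. The types |sk_determiner| and
|sk_quantifier_kind|, the three sample determiners and the function
|sk_quantifier_holds| are all hypothetical. The truth conditions are also
assumptions made for the sake of the example: "most of" is taken to mean more
than half, and "all but N" to mean exactly N exceptions. The definitions the
module really uses are in //Determiners and Quantifiers//.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SK_ALL_OF, SK_MOST_OF, SK_ALL_BUT } sk_quantifier_kind;

typedef struct {
    const char *wording;      /* the fixed part of the wording */
    sk_quantifier_kind kind;  /* the logical quantifier it stands for */
    int takes_parameter;      /* 1 if a number follows, as in "all but three" */
} sk_determiner;

/* Three sample members of the fixed stock; one object serves a whole family. */
const sk_determiner sk_all_of  = { "all of",  SK_ALL_OF,  0 };
const sk_determiner sk_most_of = { "most of", SK_MOST_OF, 0 };
const sk_determiner sk_all_but = { "all but", SK_ALL_BUT, 1 };

/* Does the quantifier hold when `matching` of `total` things qualify?
   `parameter` is ignored by determiners which take no number. */
int sk_quantifier_holds(const sk_determiner *d, int parameter,
                        int matching, int total) {
    assert(d != NULL && matching >= 0 && matching <= total);
    switch (d->kind) {
        case SK_ALL_OF:  return matching == total;
        case SK_MOST_OF: return 2 * matching > total;
        case SK_ALL_BUT: return matching == total - parameter;
    }
    return 0;
}
```

Here "all but three" and "all but six" are both represented by |sk_all_but|,
with the number supplied separately on each use, so the stock does not grow
with the range of numbers which can appear.
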
@ Each noun in the stock is an instance of //noun//. Unlike with //adjective//,
when the same noun word has multiple meanings, it gets multiple //noun// objects,
one for each meaning.[1] Nouns divide into two subclasses: common nouns, such
as "monument", and proper nouns, such as "Statue of Liberty".[2]

The user creates the stock of nouns by calling //Nouns::new_proper_noun//
or //Nouns::new_common_noun//.

[1] This is because nouns, unlike adjectives, are not predicates, and so are not
always dependent on some other word which will establish which meaning is
intended. Our best option is instead to parse in some context-sensitive way.

[2] Inform source text contains more proper nouns than a grammarian would guess.
The sentence "Mary is carrying a kite" will result in two proper nouns being
added to the stock, "Mary" and "kite", unless "kite" is already known as a
common noun. This is because "the kite", from then on, will be understood
as referring to this one specific object, just as "Mary" will.

@ Each pronoun in the stock is an instance of //pronoun//. The stock is fixed
at three, all automatically created: the subject (he, she), object (him, her)
and possessive (his, her) pronouns. See //Pronouns//.

@h Stock for verb phrases.

Adverb phrases of occurrence refer to wording such as "for the sixth time".
The stock of these is fixed, though as with determiners, each is really a
parametrised family: the two adverb phrases here are the one measuring "times",
that is, repetitions, and the one measuring "turns", a purely
interactive-fictional construction useful to Inform. See //Adverb Phrases of Occurrence//.

@ Adverbs of certainty are words like "usually" or "always". There is a fixed
stock of these, at five certainty levels. See //Adverbs of Certainty//.

@ Prepositions are phrases like "over" or "on top of". Each is an instance of
the //preposition// class. See //Prepositions//.

The user creates the stock of prepositions by calling //Prepositions::make//,
but note that this function is also called when verbs are created, in order
to implement participles like "carrying" as prepositions -- which is not
linguistically ideal, but makes it possible to parse auxiliary uses of
"to be" efficiently, as in "X is carried by Y".

@ Each verb in the stock is an instance of //verb//. For example, "to carry"
or "to be" might be verbs. One verb in the stock is special and is the
copular verb -- in English, that's "to be". A copular verb is one which has two
interacting subject phrases rather than a subject and an object.[1]

The user creates the stock of verbs by calling //Verbs::new_verb// or
//Verbs::new_operator_verb//. (Operator verbs are mathematical operators
such as |<=|, which can be added to the stock as if they were regular verbs,
but which do not conjugate or have tenses.)

Each verb has a list of //verb_form// objects, one for each "form" it takes.
A form in this sense is a combination of a verb with prepositions; meanings
correspond to verb forms, not verbs alone. For example, the verb in the
sentences "Peter is hungry" and "Jane will be in the Dining Room" is in
each case "to be", but the forms are different: one is "to be" alone, the
other "to be" plus the preposition "in", which changes the meaning.

The user, then, creates //verb// objects and gives them //verb_form//s,
attaching meanings to each form via the mechanisms in //Verb Meanings//
(which wrap those meanings in //verb_meaning// objects). The linguistics
module then has to provide an efficient way to parse text to find uses
of these verbs, and it does so by constructing intermediate objects.
See //Verb Usages//.

[1] Compare "Peter carries a ball", where Peter and the ball have very
asymmetric roles because the action is done by Peter but to the ball, and
"Peter is the mayor", where two nouns are equated in a symmetrical sort of way.

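@ The relationship between verbs, forms and meanings described above can be
pictured with a minimal sketch in plain C. The structures |sk_verb| and
|sk_verb_form|, the functions |sk_add_form| and |sk_meaning_of|, and the
meaning names used as plain strings are all hypothetical. The real
//verb_form// and //verb_meaning// objects are considerably richer, and a form
can involve more than one preposition, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_MAX_FORMS 8   /* arbitrary capacity, for the sketch only */

typedef struct {
    const char *preposition;  /* NULL means the verb used alone */
    const char *meaning;      /* stand-in for a user-defined meaning */
} sk_verb_form;

typedef struct {
    const char *infinitive;
    sk_verb_form forms[SK_MAX_FORMS];
    int form_count;
} sk_verb;

/* Give the verb a further form, with its own meaning. */
void sk_add_form(sk_verb *v, const char *preposition, const char *meaning) {
    assert(v != NULL && meaning != NULL && v->form_count < SK_MAX_FORMS);
    v->forms[v->form_count].preposition = preposition;
    v->forms[v->form_count].meaning = meaning;
    v->form_count++;
}

/* Find the meaning of the verb in combination with this preposition
   (or alone, if preposition is NULL); returns NULL if there is no such form. */
const char *sk_meaning_of(const sk_verb *v, const char *preposition) {
    assert(v != NULL);
    for (int i = 0; i < v->form_count; i++) {
        const char *p = v->forms[i].preposition;
        if (p == NULL && preposition == NULL) return v->forms[i].meaning;
        if (p != NULL && preposition != NULL && strcmp(p, preposition) == 0)
            return v->forms[i].meaning;
    }
    return NULL;
}
```

With a single |sk_verb| for "to be", a form with no preposition could carry
the meaning used in "Peter is hungry", and a second form with "in" the one
used in "Jane will be in the Dining Room". The lookup is keyed on the form,
not on the verb alone.
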
@h Performance in practice.
The following tabulates the linguistic stock accumulated by a typical Inform 7
compilation (the same one used to generate the data in //inform7: Performance Metrics//).
Within each category, items are listed in order of creation.

= (hyperlinked undisplayed text from Figures/stock-diagnostics.txt)