inform7/services/linguistics-module/Chapter 4/Noun Phrases.w

[NounPhrases::] Noun Phrases.

To construct noun-phrase subtrees for assertion sentences.

@h Hierarchy of noun phrases.
Noun phrase nodes are built at four levels of elaboration, which we take in
turn:

(-NP1) Raw: where the text is entirely untouched and unannotated.
(-NP2) Articled: where any initial article is converted to an annotation.
(-NP3) List-divided: where, in addition, a list is broken up into individual items.
(-NP4) Full: where, in addition, pronouns, relative phrases establishing
relationships and properties, and so on are parsed.

@h Raw nounphrases (NP1).
A raw noun phrase is always a single |UNPARSED_NOUN_NT|. The following always
matches any non-empty text:

=
<np-unparsed> ::=
	...                 ==> { 0, Diagrams::new_UNPARSED_NOUN(W) }

@ This "balanced" version, however, requires any brackets and braces to be
used in a balanced way: thus |frogs ( and toads )| would match, but
|frogs ( and| would not. It therefore does not always match.

=
<np-unparsed-bal> ::=
	^<balanced-text> |  ==> { fail }
	<np-unparsed>       ==> { pass 1 }

@ The noun phrase of an existential sentence is recognised thus:

=
<np-existential> ::=
	there               ==> { 0, Diagrams::new_DEFECTIVE(W) }

@h Articled nounphrases (NP2).
Now an initial article becomes an annotation and is removed from the text.
Note that
(a) Unexpectedly upper-case articles are left well alone, as in the sentence:

>> On the table is a thing called A Town Called Alice.

(b) Articles are not removed if that would leave the text empty.

(c) If we are in a language where the same word might either be definite or
indefinite, the latter has precedence.

=
<np-articled> ::=
	... |                                              ==> { lookahead }
	<if-not-cap> <indefinite-article> <np-unparsed> |  ==> { 0, NounPhrases::add_art(RP[3], RP[2]) }
	<if-not-cap> <definite-article> <np-unparsed> |    ==> { 0, NounPhrases::add_art(RP[3], RP[2]) }
	<np-unparsed>                                      ==> { pass 1 }

<np-articled-bal> ::=
	^<balanced-text> |                                 ==> { fail }
	<np-articled>                                      ==> { pass 1 }

@ =
parse_node *NounPhrases::add_art(parse_node *p, article_usage *au) {
	Node::set_article(p, au);
	return p;
}

@ The following function is only occasionally useful; it takes an existing raw
node and retrospectively applies <np-articled> to it.

=
parse_node *NounPhrases::annotate_by_articles(parse_node *RAW_NP) {
	<np-articled>(Node::get_text(RAW_NP));
	parse_node *MODEL = <<rp>>;
	Node::set_text(RAW_NP, Node::get_text(MODEL));
	Node::set_article(RAW_NP, Node::get_article(MODEL));
	return RAW_NP;
}

@h List-divided nounphrases (NP3).
An "articled list" matches text like "the lion, a witch, and some wardrobes"
as a list of articled noun phrases.

Note that the requirement that non-final terms in the list have to be balanced
means that an and or a comma inside brackets can never be a divider. Thus
"the horse (and its boy)" would be one item, not two.

=
<np-articled-list> ::=
	... |                                   ==> { lookahead }
	<np-articled-bal> <np-articled-tail> |  ==> { 0, Diagrams::new_AND(R[2], RP[1], RP[2]) }
	<np-articled>                           ==> { pass 1 }

<np-articled-tail> ::=
	, {_and} <np-articled-list> |           ==> { Wordings::first_wn(W), RP[1] }
	{_,/and} <np-articled-list>             ==> { Wordings::first_wn(W), RP[1] }

@ "Alternative lists" divide up at "or" rather than "and", thus matching text
such as "voluminous, middling big or poky", and the individual entries are not
articled.

=
<np-alternative-list> ::=
	... |                                      ==> { lookahead }
	<np-unparsed-bal> <np-alternative-tail> |  ==> { 0, Diagrams::new_AND(R[2], RP[1], RP[2]) }
	<np-unparsed>                              ==> { pass 1 }

<np-alternative-tail> ::=
	, {_or} <np-alternative-list> |            ==> { Wordings::first_wn(W), RP[1] }
	{_,/or} <np-alternative-list>              ==> { Wordings::first_wn(W), RP[1] }

@h Full nounphrases (NP4).
When fully parsing the structure of a nounphrase, we have five different
constructions in play, and need to work out their precedence over each other:
rather as |*| takes precedence over |+| in arithmetic expressions in C, so
here we have --
= (text)
	RELATIONSHIP_NT > CALLED_NT > WITH_NT > AND_NT > KIND_NT
=
That is, relative clauses take precedence over callings, and so on. The
above hierarchy is arrived at thus:
(a) We need |RELATIONSHIP_NT > WITH_NT| so that "X is in a container with
carrying capacity 10" will work.
(b) We need |WITH_NT > AND_NT| so that "X is a container with carrying
capacity 10 and diameter 12" will work.
(c) We need |CALLED_NT > WITH_NT| so that "X is a container called the flask
with flange" will work.
(d) We need |RELATIONSHIP_NT > CALLED_NT| so that "A man called Horse is in
the High Sierra" will work.
(e) We want |KIND_NT| to be of low precedence because it is always either
the word "kind" alone, or "kind of N" for some atomic noun N.

See //About Sentence Diagrams// for numerous examples.

@ Full nounphrase parsing varies slightly according to the position of the
phrase, i.e., whether it is in the subject or object position. Thus "X is Y"
or "X is in Y" would lead to X being parsed by <np-as-subject>, Y by <np-as-object>.
They are identical except that:

(a) In subject position, a full nounphrase can use "there" to indicate
an existential sentence such as "there is a hair in my soup"; and

(b) In subject position, a relative phrase cannot begin with a word which
looks like a participle.

=
<np-as-subject> ::=
	<np-existential> |                             ==> { pass 1 }
	<if-not-cap> <np-relative-phrase-limited> |    ==> { pass 2 }
	<np-nonrelative>                               ==> { pass 1 }

<np-as-object> ::=
	<if-not-cap> <np-relative-phrase-unlimited> |  ==> { pass 2 }
	<np-nonrelative>                               ==> { pass 1 }

@ To explain the limitation here: RPs only exist in the subject position due
to subject-verb inversion in English. Thus, "In the Garden is a tortoise" is a
legal inversion of "A tortoise is in the Garden". Following this logic we ought
to accept Yoda-like inversions such as "Holding the light sabre is the young Jedi",
but we don't want to do that, because then a sentence like "Holding Area is a room"
might have to be read as saying that a nameless room is holding something
called "Area".

=
<np-relative-phrase-limited> ::=
	<np-relative-phrase-implicit> |                ==> { pass 1 }
	<probable-participle> *** |                    ==> { fail }
	<np-relative-phrase-explicit>                  ==> { pass 1 }

<np-relative-phrase-unlimited> ::=
	<np-relative-phrase-implicit> |                ==> { pass 1 }
	<np-relative-phrase-explicit>                  ==> { pass 1 }

@ Inform guesses above that most English words ending in "-ing" are present
participles -- like guessing, bluffing, cheating, and so on. But there is
a conspicuous exception to this; so any word found in <non-participles>
is never treated as a participle.

=
<non-participles> ::=
	thing/something

<probable-participle> internal 1 {
	if (Vocabulary::test_flags(Wordings::first_wn(W), ING_MC)) {
		if (<non-participles>(W)) { ==> { fail nonterminal }; }
		return TRUE;
	}
	==> { fail nonterminal };
}

@ An implicit RP is a word like "carried", or "worn", on its own -- this
implies a relation to some unspecified noun. We represent that in the tree
using the "implied noun" pronoun. For now, these are fixed.

=
<np-relative-phrase-implicit> ::=
	worn |              ==> @<Act on the implicit RP worn@>
	carried |           ==> @<Act on the implicit RP carried@>
	initially carried   ==> @<Act on the implicit RP initially carried@>

@<Act on the implicit RP worn@> =
	#ifndef IF_MODULE
	==> { fail production }
	#endif
	#ifdef IF_MODULE
	==> { 0, Diagrams::new_implied_RELATIONSHIP(W, R_wearing) }
	#endif

@<Act on the implicit RP carried@> =
	#ifndef IF_MODULE
	==> { fail production }
	#endif
	#ifdef IF_MODULE
	==> { 0, Diagrams::new_implied_RELATIONSHIP(W, R_carrying) }
	#endif

@<Act on the implicit RP initially carried@> =
	#ifndef IF_MODULE
	==> { fail production }
	#endif
	#ifdef IF_MODULE
	==> { 0, Diagrams::new_implied_RELATIONSHIP(W, R_carrying) }
	#endif

@ An explicit RP is one which uses a preposition and then a noun phrase: for
example, "on the table" is explicit.

Note that we throw out a relative phrase if the noun phrase within it would
begin with "and" or a comma; this enables us to parse sentences concerning
directions, in particular, a little better. But it means we do not recognise
"of, by and for the people" as an RP.

=
<np-relative-phrase-explicit> ::=
	<permitted-preposition> _,/and ... |       ==> { fail }
	<permitted-preposition> _,/and |           ==> { fail }
	<permitted-preposition> <np-nonrelative>   ==> @<Work out a meaning@>

@<Work out a meaning@> =
	VERB_MEANING_LINGUISTICS_TYPE *R = VerbMeanings::get_regular_meaning_of_form(
		Verbs::find_form(permitted_verb, RP[1], NULL));
	if (R == NULL) return FALSE;
	==> { -, Diagrams::new_RELATIONSHIP(W, VerbMeanings::reverse_VMT(R), RP[2]) };

@ We have now disposed of |RELATIONSHIP_NT| and are left with the constructs:
= (text)
	CALLED_NT > WITH_NT > AND_NT > KIND_NT
=
These are all handled by <np-nonrelative>. Two points to note:
(a) The first production accepts arbitrary text quickly and without allocating
memory if we're in lookahead mode -- an important economy since otherwise
parsing a list of $n$ items would have running time and memory of order $2^n$.
(b) If we regard the above constructs as being like operators in arithmetic,
then the operands have to match <np-operand>, and this requires text which has
balanced brackets. That ensures that, for example, "frog (called toad)"
is not misread as saying that "frog (" is called "toad )". But note that
the final <np-articled> production catches any unbalanced text, so even
text like "smile X-)" will in fact match <np-nonrelative>.

=
<np-nonrelative> ::=
	... |                                      ==> { lookahead }
	<np-operand> {called} <np-articled-bal> |  ==> { 0, Diagrams::new_CALLED(WR[1], RP[1], RP[2]) }
	<np-operand> <np-with-or-having-tail> |    ==> { 0, Diagrams::new_WITH(R[2], RP[1], RP[2]) }
	<np-operand> <np-and-tail> |               ==> { 0, Diagrams::new_AND(R[2], RP[1], RP[2]) }
	<np-kind-phrase> |                         ==> { pass 1 }
	<agent-pronoun> |                          ==> { 0, Diagrams::new_PRONOUN(W, RP[1]) }
	<here-pronoun> |                           ==> { 0, Diagrams::new_PRONOUN(W, RP[1]) }
	<np-articled>                              ==> { pass 1 }

<np-operand> ::=
	<if-not-cap> <np-relative-phrase-unlimited> |  ==> { pass 2 }
	^<balanced-text> |                             ==> { fail }
	<np-nonrelative>                               ==> { pass 1 }

@ The tail of with-or-having parses for instance "with carrying capacity 5"
in the NP

>> a container with carrying capacity 5

This makes use of a nifty feature of Preform: when Preform scans to see how to
divide the text, it tries <np-with-or-having-tail> in each possible position.
The reply can be yes, no, or no and move on a little. So if we spot "it with
action", the answer is no, and move on three words: that jumps over a "with"
which we don't want to recognise. (Because if we did, then "the locking it
with action" would be parsed as a property list, "action", attaching to a
bogus object called "locking it".)

=
<np-with-or-having-tail> ::=
	it with action *** |                       ==> { advance Wordings::delta(WR[1], W) }
	{with/having} (/) *** |                    ==> { advance Wordings::delta(WR[1], W) }
	{with/having} ... ( <response-letter> ) |  ==> { advance Wordings::delta(WR[1], W) }
	{with/having} <np-new-property-list>       ==> { Wordings::first_wn(WR[1]), RP[1] }

<np-new-property-list> ::=
	... |                                      ==> { lookahead }
	<np-new-property> <np-new-property-tail> | ==> { 0, Diagrams::new_AND(R[2], RP[1], RP[2]) }
	<np-new-property>                          ==> { pass 1 };

<np-new-property-tail> ::=
	, {_and} <np-new-property-list> |          ==> { Wordings::first_wn(W), RP[1] }
	{_,/and} <np-new-property-list>            ==> { Wordings::first_wn(W), RP[1] }

<np-new-property> ::=
	...                                        ==> { 0, Diagrams::new_PROPERTY_LIST(W) }

@ The "and" tail is much easier:

=
<np-and-tail> ::=
	, {_and} <np-operand> |                    ==> { Wordings::first_wn(W), RP[1] }
	{_,/and} <np-operand>                      ==> { Wordings::first_wn(W), RP[1] }

@ Kind phrases are easier:

>> A sedan chair is a kind of vehicle. A weather pattern is a kind.

Note that indefinite articles are permitted before the word "kind(s)",
but definite articles are not.

=
<np-kind-phrase> ::=
	<indefinite-article> <np-kind-phrase-unarticled> |  ==> { pass 2 }
	<np-kind-phrase-unarticled>                         ==> { pass 1 }

<np-kind-phrase-unarticled> ::=
	kind/kinds |                                        ==> { 0, Diagrams::new_KIND(W, NULL) }
	kind/kinds of <np-operand>                          ==> { 0, Diagrams::new_KIND(W, RP[1]) }