inform7/inter/building-module/Preliminaries/What This Module Does.w

What This Module Does.

An overview of the building module's role and abilities.

@h Prerequisites.
The building module is a part of the Inform compiler toolset. It is
presented as a literate program or "web". Before diving in:
(a) It helps to have some experience of reading webs: see //inweb// for more.
(b) The module is written in C, in fact ANSI C99, but this is disguised by the
fact that it uses some extension syntaxes provided by the //inweb// literate
programming tool, making it a dialect of C called InC. See //inweb// for
full details, but essentially: it's C without predeclarations or header files,
and where functions have names like |Tags::add_by_name| rather than just |add_by_name|.
(c) This module uses other modules drawn from the compiler (see //structure//), and also
uses a module of utility functions called //foundation//.
For more, see //foundation: A Brief Guide to Foundation//.

@h Introduction.
This module is essentially middleware. It acts as a bridge to the low-level
functions in the //bytecode// module, allowing them to be used with much
greater ease and consistency.

This module needs plenty of working data, and stashes that data inside the
|inter_tree| structure it is working on: in a component of that structure called
a //building_site//. Whereas the main data in an |inter_tree| affects the meaning
of the tree, i.e., makes a difference as to what program the tree represents,
the contents of the //building_site// component are used only while building
the tree, and are ignored by the //final// code-generator.

@h Large-scale architecture.
An inter tree is fundamentally a set of resources stored in a nested set of
|inter_package| boxes.

(*) The following resources are stored at the root level (i.e., not inside of
any package) and nowhere else:
(-*) Package type declarations. See //LargeScale::package_type//.
(-*) Primitive declarations. See //Inter Primitives//. Inter can in
principle support a variety of different "instruction sets", but this module
presents a single standardised instruction set.
(-*) Compiler pragmas. These are marginal tweaks on a platform-by-platform basis
and use of them is minimal, but see //LargeScale::emit_pragma//.

(*) Everything else is inside a single top-level package called |main|, which
has package type |_plain|.

(*) |main| contains only packages, and of only two types:
(-*) "Modules", which are packages of type |_module|. These occur nowhere else
in the tree.
(-*) "Linkages", which are packages of type |_linkage|. These occur nowhere else
in the tree.

(*) //inform7// compiles the material in each compilation unit to a module
named for that unit. That is:
(-*) The module |source_text| contains material from the main source text.
(-*) Each extension included produces a module, named, for example,
|locksmith_by_emily_short|.

(*) Each kit produces a module, named after it. Any Inter tree produced by
//inform7// will always contain the module |BasicInformKit|, for example.

(*) //inform7// generates an additional module called |generic|, holding
generic definitions -- material which is the same regardless of what is
being compiled.

(*) //inform7// generates an additional module called |completion|, holding
resources put together from across different compilation units.[1]

(*) //inter// generates an additional module called |synoptic|, made during
linking, which contains resources collated from or cross-referencing
everything else.

(*) Modules contain only further packages, called "submodules" and with the
package type |_submodule|. The Inform tools use a standard set of names for
such submodules: for example, in any module the resources defining its
global variables are in a submodule called |variables|. (If it defines no
variables, the submodule will not be present.)

(*) There are just two different linkages -- packages which have special
contents, and which the linking steps of //pipeline// treat differently from modules.
(-*) |architecture| has no subpackages, and contains only constant definitions,
drawn from a fixed and limited set. These definitions depend on, and indeed
express, the target architecture: for example, |WORDSIZE|, the number of
bytes per word, is defined here. Symbols here behave uniquely in linking:
when two trees are linked together, they will each have an |architecture|
package, and symbols in them will simply be identified with each other.
Thus the |WORDSIZE| defined in the main Inform 7 tree will be considered
the same symbol as the |WORDSIZE| defined in the tree for BasicInformKit.
(-*) |connectors| has no subpackages and no resources other than symbols.
It holds plugs and sockets enabling the Inter tree to be linked with other
Inter trees; during linking, these are removed when their purpose has been
served, so that after a successful link, |connectors| will always be empty.

See //Large-Scale Structure// for the code which builds all of the above
packages (though not their contents).
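The containment rules listed above can be summarised as a small table. The
following is only a sketch in plain C, not code from this module: the names
|demo_level| and |demo_may_contain| are hypothetical, and the rule for what
may sit below a submodule is an assumption, since nothing above specifies it.

```c
#include <assert.h>

/* Hypothetical classification of positions in the tree: not a real Inter type. */
typedef enum {
    DEMO_ROOT,      /* the root level, outside any package */
    DEMO_MAIN,      /* the single top-level package, main */
    DEMO_MODULE,    /* a package of type _module */
    DEMO_LINKAGE,   /* a package of type _linkage */
    DEMO_SUBMODULE, /* a package of type _submodule */
    DEMO_OTHER      /* any package deeper down */
} demo_level;

/* Returns 1 if a package at level child may sit directly inside parent. */
int demo_may_contain(demo_level parent, demo_level child) {
    switch (parent) {
        case DEMO_ROOT:    return child == DEMO_MAIN;
        case DEMO_MAIN:    return child == DEMO_MODULE || child == DEMO_LINKAGE;
        case DEMO_MODULE:  return child == DEMO_SUBMODULE;
        case DEMO_LINKAGE: return 0; /* linkages have no subpackages */
        default:           return child == DEMO_OTHER; /* assumption of this sketch */
    }
}

/* Usage: a position such as /main/source_text/variables passes every check. */
void demo_check_example(void) {
    assert(demo_may_contain(DEMO_ROOT, DEMO_MAIN));
    assert(demo_may_contain(DEMO_MAIN, DEMO_MODULE));
    assert(demo_may_contain(DEMO_MODULE, DEMO_SUBMODULE));
}
```

Keying the rules on the parent's level makes it easy to see that modules and
linkages can occur only directly inside |main|.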

[1] Ideally |completion| would not exist, and everything in it would be made
as part of |synoptic| during linking, but at present this is too difficult.

@ Inter code is a nested tree of boxes, |inter_package|s, which contain Inter
code defining various resources, cross-referenced by |inter_symbol|s.

But this tree cannot be magically made all at once. For much of the run of
a tool like //inform7//, a partly-built tree will exist, and this introduces
many potential race conditions -- where, for example, a call to function F
cannot be made until F itself has been made, and so on.

We also want to avoid bugs where one part of the compiler thinks that F will
live in one place, and another part thinks it is somewhere else.

To that end, we use a flexible way to describe naming and positioning
conventions for Inter resources (such as our hypothetical F). In this system,
a //package_request// stands for a package which may or may not already exist;
and an //inter_name//, similarly, is a symbol which may or may not exist yet.
This enables tools like //inform7// to build up elaborate if shadowy worlds
of references to tree positions which will be filled in later.
= (text)
				DEFINITELY MADE		PERHAPS NOT YET MADE
	PACKAGE		inter_package		//package_request//
	SYMBOL		inter_symbol		//inter_name//
=
So, for example, a //package_request// can represent |/main/synoptic/kinds|
either before or after that package has been built. At some point the package
ceases to be virtual and comes into being: this is called "incarnation". But
code in //inform7// using package requests never needs to know when this takes
place, and will function equally well before or after -- so, no race conditions.

And similarly for //inter_name//, which it would perhaps be more consistent
to call a |symbol_request|. But "iname" is now a term used almost ubiquitously
across //inform7// and //inter//, and it doesn't seem worth renaming it now.
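The idea of incarnation can be illustrated with a minimal sketch in plain C.
This is not the module's actual implementation: |demo_package|,
|demo_request| and |demo_incarnate| are hypothetical stand-ins, and the real
structures carry far more data. The sketch shows only that a request can be
passed around freely, and that the real package is made on first demand,
parents first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a package which definitely exists. */
typedef struct demo_package {
    const char *name;
    struct demo_package *parent;
} demo_package;

/* Hypothetical stand-in for a package which may not exist yet. */
typedef struct demo_request {
    const char *name;
    struct demo_request *parent; /* NULL for the outermost request */
    demo_package *incarnation;   /* NULL until the package is really made */
} demo_request;

static int demo_packages_made = 0;

/* Make the real package on first use; later calls return the same one.
   Packages are never freed in this sketch. */
demo_package *demo_incarnate(demo_request *R) {
    if (R == NULL) return NULL;
    if (R->incarnation == NULL) {
        demo_package *P = malloc(sizeof(demo_package));
        assert(P != NULL);
        P->name = R->name;
        P->parent = demo_incarnate(R->parent); /* incarnate ancestors too */
        R->incarnation = P;
        demo_packages_made++;
    }
    return R->incarnation;
}
```

A caller holding a request for |/main/synoptic/kinds| can call
|demo_incarnate| at any time: the first call makes three packages, and every
later call returns the same pointer without making anything new.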

@h Medium-scale blueprints.
The above systems make nested packages and symbols within them, but not the
actual content of these boxes, or the definitions which the symbols refer to.
In short, the actual Inter code.

The direct way to compile some Inter code is to make calls to functions
in //Producing Inter//, which provide a straightforward if low-level API. For example:
= (text as InC)
	inter_name *iname = HierarchyLocations::iname(I, CCOUNT_PROPERTY_HL);
	Produce::numeric_constant(I, iname, K_value, x);
=
Note that we do not need to say where this code will go. //Producing Inter//
looks at the iname, works out what package request it should go into, incarnates
that into a real |inter_package| if necessary, then incarnates the iname into
a real |inter_symbol| if necessary; and finally emits a |CONSTANT_IST| in the
relevant package, an instruction which defines the symbol.

And similarly for emitting code inside a function body, though then it is
necessary first to say what function (which can be done by calling //Produce::function_body//
with the iname for that function). For example:
= (text as InC)
	Produce::inv_primitive(I, RETURN_BIP);
	Produce::down(I);
		Produce::val(I, K_value, InterValuePairs::number(1));
	Produce::up(I);
=

@ But that is a laborious sort of notation for what, in a C-like language, would
be written just as |return 1|. It would be very painful to have to implement
kits such as BasicInformKit that way. Instead, we write them in a notation which
is very close indeed[1] to Inform 6 syntax.[2]

This means we need to provide what amounts to a pocket Inform-6-to-Inter compiler,
and we do that in this module, using a data structure called an //inter_schema// --
in effect, an annotated syntax tree -- to represent the results of parsing Inform 6
notation. For example, this:
= (text as InC)
	inter_schema *sch = ParsingSchemas::from_text(I"return true;", where);
	EmitInterSchemas::emit(I, ..., sch, ...);
=
generates Inter code equivalent to the example above.[3] But the real power of
the system comes from:

(a) The ability to handle much larger passages of I6 notation -- for example, a
function body 10K long -- in an acceptably speed-efficient way; and

(b) The ability to substitute values in for placeholders.

As an example of (b), an //inter_schema// is how //inform7// compiles so-called
inline phrase definitions such as:
= (text as Inform 7)
	To say (L - a list of values) in brace notation:
		(- LIST_OF_TY_Say({-by-reference:L}, 1); -).
=
Here, the text |LIST_OF_TY_Say({-by-reference:L}, 1);| is passed through to
//ParsingSchemas::from_text// to make a schema. When the phrase is invoked,
//EmitInterSchemas::emit// is used to generate Inter code from it; and a
reference to the list passed to the invocation as the token |L| is substituted
for the braced clause |{-by-reference:L}|.[4] Schemas are also used as convenient
shorthand in the compiler to express how to, for example, post-increment a
property value.
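To make the idea of substitution concrete, here is a deliberately naive sketch
in plain C. It works on raw text, whereas a real //inter_schema// is a parsed
syntax tree, so this is an illustration of the principle and not of the
implementation. The function |demo_substitute| is hypothetical, and the value
|my_list| in the usage note is made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy tmpl to out, replacing the first {...} clause with value.
   Returns 1 on success; returns 0, leaving out unspecified, if tmpl has
   no braced clause or if out is too small to hold the result. */
int demo_substitute(const char *tmpl, const char *value, char *out, size_t out_size) {
    assert(tmpl != NULL && value != NULL && out != NULL);
    const char *open = strchr(tmpl, '{');
    const char *close = (open != NULL) ? strchr(open, '}') : NULL;
    if (close == NULL) return 0;
    /* text before the clause, then the value, then text after the clause */
    int n = snprintf(out, out_size, "%.*s%s%s",
        (int) (open - tmpl), tmpl, value, close + 1);
    return (n >= 0 && (size_t) n < out_size);
}
```

Applied to |LIST_OF_TY_Say({-by-reference:L}, 1);| with the value |my_list|,
this yields |LIST_OF_TY_Say(my_list, 1);|.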

[1] Some antique syntaxes, such as |for| loops broken with semicolons not colons,
are missing; so are some hardly-used directives, the superclass |::| operator,
and built-in compiler symbols relevant only to particular virtual machines, such
as |#g$self|. But really, you will never notice they are gone.

[2] Using Inform 6 notation was very convenient in the years 2004-17, when Inform
generated only I6 code: it became more problematic in 2018, when Inter instructions
were needed instead, and much of this module was written as a response.

[3] Skipping over some of the arguments to the emission function, which basically
tell us how to resolve identifier names into variables, arrays, and so on.

[4] These braced placeholders are, of course, not Inform 6 notation, and
represent an extension of the I6 syntax.

@h Small-scale masonry.
Finally, there are also times when we want to compile explicit code, one
Inter instruction at a time, and for this the Produce API is provided.

This API keeps track of the current write position inside each tree (using
the //code_insertion_point// system), and then provides functions which call
down into //bytecode// for us, making use of that write position. So, for
example, we can write:
= (text as InC)
	Produce::inv_primitive(I, RETURN_BIP);
	Produce::down(I);
		Produce::val(I, K_value, InterValuePairs::number(17));
	Produce::up(I);
=
to produce the Inter code:
= (text as Inter)
	inv !return
		val K_unchecked 17
=
Note the use of //Produce::down// and //Produce::up// to step down and up the
hierarchy: calls to these functions always come in matching pairs.
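Why the calls must match can be seen from a toy model in plain C. This is a
sketch under simplifying assumptions, not the Produce API itself: the write
position is reduced to a nesting depth, the output is textual, and the names
|demo_writer|, |demo_emit|, |demo_down| and |demo_up| are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a write position: not the real code_insertion_point. */
typedef struct demo_writer {
    char text[256]; /* the textual Inter written so far */
    int depth;      /* current nesting level */
} demo_writer;

/* Write one instruction at the current depth, indented four spaces per level. */
void demo_emit(demo_writer *W, const char *instruction) {
    size_t used = strlen(W->text);
    snprintf(W->text + used, sizeof(W->text) - used, "%*s%s\n",
        W->depth * 4, "", instruction);
}

/* Step down: subsequent instructions become children of the last one. */
void demo_down(demo_writer *W) { W->depth++; }

/* Step back up: an unmatched call is a bug, so fail loudly. */
void demo_up(demo_writer *W) {
    assert(W->depth > 0);
    W->depth--;
}
```

Emitting |inv !return|, stepping down, emitting |val K_unchecked 17| and
stepping up again leaves the depth at zero, with the second line indented
beneath the first. An unmatched |demo_up| trips the assertion, and an
unmatched |demo_down| would leave later instructions nested in the wrong place.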

@ The //pipeline// module makes heavy use of the Produce API. Surprisingly,
//inform7// calls it in only a few places -- but in fact that is because
it provides still another middleware layer on top. See //runtime: Emit//.
But it's really only a very thin layer, allowing the caller not to have to
pass the |I| argument to every call (because it will always be the Inter tree
being compiled by //inform7//). Despite appearances, then, Produce makes all
of the Inter instructions generated inside either //inter// or //inform7//.
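The shape of such a thin layer can be sketched in plain C. Every name here is
hypothetical, and this is not the code of //runtime: Emit//: the sketch
assumes only what is said above, namely that the wrapper supplies the tree
argument on the caller's behalf.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an Inter tree. */
typedef struct demo_tree {
    int instructions; /* how many instructions have been written */
    int last_value;   /* the most recent value written */
} demo_tree;

/* Lower layer: every call must name the tree explicitly. */
void demo_produce_val(demo_tree *I, int value) {
    I->instructions++;
    I->last_value = value;
}

/* Upper layer: remembers the one tree being compiled. */
static demo_tree *demo_current_tree = NULL;

void demo_emit_set_tree(demo_tree *I) { demo_current_tree = I; }

/* Same effect as demo_produce_val, but with no tree argument. */
void demo_emit_val(int value) {
    assert(demo_current_tree != NULL);
    demo_produce_val(demo_current_tree, value);
}
```

The wrapper adds no behaviour of its own, which is why all of the
instructions are still made by the lower layer.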