inform7/inter/building-module/Preliminaries/What This Module Does.w

What This Module Does.

An overview of the building module's role and abilities.

@h Prerequisites.
The building module is a part of the Inform compiler toolset. It is
presented as a literate program or "web". Before diving in:
(a) It helps to have some experience of reading webs: see //inweb// for more.
(b) The module is written in C, specifically ANSI C99, but this is disguised by the
fact that it uses some extension syntaxes provided by the //inweb// literate
programming tool, making it a dialect of C called InC. See //inweb// for
full details, but essentially: it's C without predeclarations or header files,
and where functions have names like |Tags::add_by_name| rather than just |add_by_name|.
(c) This module uses other modules drawn from the //compiler//, and also
uses a module of utility functions called //foundation//.
For more, see //foundation: A Brief Guide to Foundation//.

@h Services for builders.
This module is essentially middleware. It acts as a bridge to the low-level
functions in the //bytecode// module, allowing them to be used with much
greater ease and consistency.

In particular, the functions here enforce a number of conventions about how an
Inter tree is laid out. Indiscriminate use of //bytecode// functions would allow
other layouts to be made, but we want to be systematic.

This module needs plenty of working data, and stashes that data inside the
|inter_tree| structure it is working on, in a component of that structure called
a //building_site//. Whereas the main data in an |inter_tree| affects the meaning
of the tree, i.e., makes a difference as to what program the tree represents,
the contents of the //building_site// component are only used to make it, and
are ignored by the //final// code-generator.
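
For illustration only, here is a plain C sketch of that division of labour. The
types |toy_tree| and |toy_building_site| and the function |toy_generate| are
hypothetical, and far simpler than the real |inter_tree| and //building_site//.
The point is just that a generator which reads only the meaningful part of a
tree gives the same output whatever the builder has left in its scratch area.
= (text as C)
	#include <assert.h>
	#include <stdio.h>
	#include <string.h>

	/* Hypothetical builder-only scratch data: code generation never reads it. */
	typedef struct toy_building_site {
		int next_free_id;         /* e.g. a counter used when inventing names */
		const char *last_package; /* e.g. where the builder last emitted code */
	} toy_building_site;

	/* Hypothetical tree: "program" determines the meaning, "site" does not. */
	typedef struct toy_tree {
		const char *program;    /* stands in for the tree's meaningful content */
		toy_building_site site; /* stands in for the building_site component */
	} toy_tree;

	/* A toy "code generator" which deliberately reads only t->program. */
	void toy_generate(const toy_tree *t, char *out, size_t n) {
		assert(t != NULL && out != NULL && n > 0);
		snprintf(out, n, "generated: %s", t->program);
	}
=
Two trees differing only in their scratch data therefore generate identical output.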

@h Structural conventions.
An inter tree is fundamentally a set of resources stored in a nested set of
|inter_package| boxes.

(*) The following resources are stored at the root level (i.e., not inside of
any package) and nowhere else:
(-*) Package type declarations. See //LargeScale::package_type//.
(-*) Primitive declarations. See //Inter Primitives//. Inter can in
principle support a variety of different "instruction sets", but this module
presents a single standardised instruction set.
(-*) Compiler pragmas. These are marginal tweaks on a platform-by-platform basis
and use of them is minimal, but see //LargeScale::emit_pragma//.

(*) Everything else is inside a single top-level package called |main|, which
has package type |_plain|.

(*) |main| contains only packages, and these are of just two types:
(-*) "Modules", which are packages of type |_module|. These occur nowhere else
in the tree.
(-*) "Linkages", which are packages of type |_linkage|. These occur nowhere else
in the tree.

(*) //inform7// compiles the material in each compilation unit to a module
named for that unit. That is:
(-*) The module |source_text| contains material from the main source text.
(-*) Each extension included produces a module, named, for example,
|locksmith_by_emily_short|.

(*) Each kit produces a module, named after it. Any Inter tree produced by
//inform7// will always contain the module |BasicInformKit|, for example.

(*) //inform7// generates an additional module called |generic|, holding
generic definitions -- material which is the same regardless of what is
being compiled.

(*) //inform7// generates an additional module called |completion|, holding
resources put together from across different compilation units.[1]

(*) //inter// generates an additional module called |synoptic|, made during
linking, which contains resources collated from or cross-referencing
everything else.

(*) Modules contain only further packages, called "submodules", which have the
package type |_submodule|. The Inform tools use a standard set of names for
such submodules: for example, in any module the resources defining its
global variables are in a submodule called |variables|. (If it defines no
variables, the submodule will not be present.)

(*) There are just two linkages: packages which have special contents, and
which the linking steps of //pipeline// treat differently from modules.
(-*) |architecture| has no subpackages, and contains only constant definitions,
drawn from a fixed and limited set. These definitions depend on, and indeed
express, the target architecture: for example, |WORDSIZE|, the number of
bytes per word, is defined here. Symbols here behave uniquely in linking:
when two trees are linked together, they will each have an |architecture|
package, and symbols in them will simply be identified with each other.
Thus the |WORDSIZE| defined in the main Inform 7 tree will be considered
the same symbol as the |WORDSIZE| defined in the tree for BasicInformKit.
(-*) |connectors| has no subpackages and no resources other than symbols.
It holds plugs and sockets enabling the Inter tree to be linked with other
Inter trees; during linking, these are removed when their purpose has been
served, so that after a successful link, |connectors| will always be empty.
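
To make the |architecture| rule concrete, here is a hedged sketch in plain C. It
models each |architecture| package as a hypothetical array of |arch_constant|
name/value pairs, and identifies constants with the same name across two trees.
The check that identified constants agree in value is an assumption added for
illustration, not something the text above says the linker does.
= (text as C)
	#include <assert.h>
	#include <stddef.h>
	#include <string.h>

	/* Hypothetical stand-in for a constant in an architecture package. */
	typedef struct arch_constant {
		const char *name; /* e.g. "WORDSIZE" */
		long value;       /* e.g. 4 bytes per word */
	} arch_constant;

	/* Identify same-named constants in two architecture packages.
	   Returns the number of pairs identified, or -1 if some pair disagrees
	   in value (an illustrative safety check, not taken from the real linker). */
	int identify_architecture(const arch_constant *a, size_t na,
		const arch_constant *b, size_t nb) {
		assert((a != NULL || na == 0) && (b != NULL || nb == 0));
		int identified = 0;
		for (size_t i = 0; i < na; i++)
			for (size_t j = 0; j < nb; j++)
				if (strcmp(a[i].name, b[j].name) == 0) {
					if (a[i].value != b[j].value) return -1;
					identified++;
				}
		return identified;
	}
=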

See //Large-Scale Structure// for the code which builds all of the above
packages (though not their contents).

[1] Ideally |completion| would not exist, and everything in it would be made
as part of |synoptic| during linking, but at present this is too difficult.
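
@ Several of these conventions are mechanical enough to check by program. The
following plain C sketch, using a hypothetical |toy_package| type rather than the
real |inter_package|, tests a few of them: that |main| has type |_plain|, that it
contains only modules and linkages, that linkages have no subpackages, and that
modules contain only submodules. It is only a checker; the building module
instead enforces these conventions as it builds the tree.
= (text as C)
	#include <assert.h>
	#include <stddef.h>
	#include <string.h>

	/* Hypothetical, heavily simplified package: a name, a type, and children.
	   Real packages also hold symbols and code, which this ignores. */
	typedef struct toy_package {
		const char *name;
		const char *type; /* e.g. "_plain", "_module", "_linkage" */
		const struct toy_package **children;
		size_t child_count;
	} toy_package;

	static int type_is(const toy_package *p, const char *t) {
		return strcmp(p->type, t) == 0;
	}

	/* Returns 1 if main_pkg obeys the conventions checked here, 0 if not. */
	int check_main_layout(const toy_package *main_pkg) {
		assert(main_pkg != NULL);
		if (strcmp(main_pkg->name, "main") != 0 || !type_is(main_pkg, "_plain"))
			return 0;
		for (size_t i = 0; i < main_pkg->child_count; i++) {
			const toy_package *c = main_pkg->children[i];
			if (type_is(c, "_linkage")) {
				if (c->child_count != 0) return 0; /* linkages have no subpackages */
				continue;
			}
			if (!type_is(c, "_module")) return 0; /* only modules and linkages */
			for (size_t j = 0; j < c->child_count; j++)
				if (!type_is(c->children[j], "_submodule")) return 0;
		}
		return 1;
	}
=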

@h Dealing with partly-built Inter.
Inter code is a nested tree of boxes, |inter_package|s, which contain Inter
code defining various resources, cross-referenced by |inter_symbol|s.

But this tree cannot be magically made all at once. For much of the run of
a tool like //inform7//, a partly-built tree will exist, and this introduces
many potential race conditions -- where, for example, a call to function F
cannot be made until F itself has been made, and so on.

We also want to avoid bugs where one part of the compiler thinks that F will
live in one place, and another part thinks it is somewhere else.

To that end, we use a flexible way to describe naming and positioning
conventions for Inter resources (such as our hypothetical F). In this system,
a //package_request// stands for a package which may or may not already exist;
and an //inter_name//, similarly, is a symbol which may or may not exist yet.
This enables tools like //inform7// to build up elaborate if shadowy worlds
of references to tree positions which will be filled in later.
= (text)
				DEFINITELY MADE		PERHAPS NOT YET MADE
	PACKAGE		inter_package		//package_request//
	SYMBOL		inter_symbol		//inter_name//
=
So, for example, a //package_request// can represent |/main/synoptic/kinds|
either before or after that package has been built. At some point the package
ceases to be virtual and comes into being: this is called "incarnation". But
code in //inform7// using package requests never needs to know when this takes
place, and will function equally well before or after -- so, no race conditions.

And similarly for //inter_name//, which it would perhaps be more consistent
to call a |symbol_request|. But "iname" is now a term used almost ubiquitously
across //inform7// and //inter//, and it doesn't seem worth renaming it now.
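
The essential trick can be sketched in a few lines of plain C. Here hypothetical
|toy_package_request| and |toy_iname| structures each hold a pointer to their
incarnation, which stays |NULL| until first needed; incarnating a symbol
incarnates its package first. The real //package_request// and //inter_name//
carry much more, and this sketch never frees what it allocates.
= (text as C)
	#include <assert.h>
	#include <stdlib.h>

	/* Hypothetical "real" objects, standing in for inter_package and inter_symbol. */
	typedef struct real_package { const char *path; } real_package;
	typedef struct real_symbol { const char *name; real_package *owner; } real_symbol;

	/* Hypothetical requests: each names a position in the tree, and remembers
	   the real object once it exists (incarnation is NULL until then). */
	typedef struct toy_package_request {
		const char *path;
		real_package *incarnation;
	} toy_package_request;

	typedef struct toy_iname {
		const char *name;
		toy_package_request *in;
		real_symbol *incarnation;
	} toy_iname;

	/* Make the package exist if it does not yet; either way, return it. */
	real_package *incarnate_package(toy_package_request *R) {
		assert(R != NULL);
		if (R->incarnation == NULL) {
			R->incarnation = malloc(sizeof *R->incarnation);
			if (R->incarnation == NULL) abort(); /* out of memory */
			R->incarnation->path = R->path;
		}
		return R->incarnation;
	}

	/* Make the symbol exist if need be, incarnating its package first. */
	real_symbol *incarnate_symbol(toy_iname *N) {
		assert(N != NULL);
		if (N->incarnation == NULL) {
			N->incarnation = malloc(sizeof *N->incarnation);
			if (N->incarnation == NULL) abort(); /* out of memory */
			N->incarnation->name = N->name;
			N->incarnation->owner = incarnate_package(N->in);
		}
		return N->incarnation;
	}
=
Because both functions return the existing object if there is one, callers can use
them freely whether or not the real objects exist yet.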

@h Code and schemas.
The above systems make nested packages and symbols within them, but not the
actual content of these boxes, or the definitions which the symbols refer to.
In short, the actual Inter code.

The most direct way to compile some Inter code is to make calls to functions
in //Producing Inter//, which provide a straightforward if low-level API. For example:
= (text as InC)
	inter_name *iname = HierarchyLocations::iname(I, CCOUNT_PROPERTY_HL);
	Produce::numeric_constant(I, iname, K_value, x);
=
Note that we do not need to say where this code will go. //Producing Inter//
looks at the iname, works out what package request it should go into, incarnates
that into a real |inter_package| if necessary, then incarnates the iname into
a real |inter_symbol| if necessary; and finally emits a |CONSTANT_IST| in the
relevant package, an instruction which defines the symbol.
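
A toy model of that behaviour, in plain C, might look like this. Everything in it
is hypothetical: |toy_iname| records the path of the package it belongs to,
|package_at| finds or creates that package in a small fixed registry, and
|toy_numeric_constant| then appends a text line standing in for a |CONSTANT_IST|
instruction. As with the real API, the caller says only which iname and which value.
= (text as C)
	#include <assert.h>
	#include <stdio.h>
	#include <string.h>

	#define MAX_PACKAGES 8
	#define MAX_LINES 16

	/* Hypothetical package: a path plus a list of emitted instruction lines. */
	typedef struct toy_package {
		char path[64];
		char lines[MAX_LINES][64];
		int line_count;
	} toy_package;

	static toy_package registry[MAX_PACKAGES];
	static int package_count = 0;

	/* Find the package at this path, creating it on first use. */
	static toy_package *package_at(const char *path) {
		for (int i = 0; i < package_count; i++)
			if (strcmp(registry[i].path, path) == 0) return &registry[i];
		assert(package_count < MAX_PACKAGES);
		toy_package *P = &registry[package_count++];
		snprintf(P->path, sizeof P->path, "%s", path);
		P->line_count = 0;
		return P;
	}

	/* Hypothetical iname: a symbol name plus the package it belongs in. */
	typedef struct toy_iname { const char *package_path; const char *name; } toy_iname;

	/* The iname alone determines where the constant definition goes. */
	void toy_numeric_constant(toy_iname N, long value) {
		toy_package *P = package_at(N.package_path);
		assert(P->line_count < MAX_LINES);
		snprintf(P->lines[P->line_count++], 64, "CONSTANT %s = %ld", N.name, value);
	}
=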

And similarly for emitting code inside a function body, though then it is
necessary first to say which function the code belongs to (which can be done by
calling //Produce::block// with the iname for that function). For example:
= (text as InC)
	Produce::inv_primitive(I, RETURN_BIP);
	Produce::down(I);
		Produce::val(I, K_value, LITERAL_IVAL, 1);
	Produce::up(I);
=
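
The |down| and |up| calls maintain a position in the tree being built, rather like
a cursor. A minimal sketch in plain C, assuming a hypothetical |toy_builder| which
keeps a stack of parent nodes: |toy_down| makes the most recently emitted node the
parent of whatever follows, and |toy_up| pops back out a level. This illustrates
the idea only, and is not the implementation in //Producing Inter//.
= (text as C)
	#include <assert.h>
	#include <string.h>

	#define MAX_NODES 32
	#define MAX_DEPTH 8

	/* Hypothetical code-tree node: a label and the index of its parent (-1 if none). */
	typedef struct toy_node { const char *label; int parent; } toy_node;

	typedef struct toy_builder {
		toy_node nodes[MAX_NODES];
		int node_count;
		int stack[MAX_DEPTH]; /* parents we have gone "down" into */
		int depth;            /* 0 means emitting at top level */
		int last;             /* index of the most recently emitted node */
	} toy_builder;

	void toy_init(toy_builder *B) { B->node_count = 0; B->depth = 0; B->last = -1; }

	/* Emit a node as a child of the current parent, if there is one. */
	void toy_emit(toy_builder *B, const char *label) {
		assert(B->node_count < MAX_NODES);
		int parent = (B->depth > 0) ? B->stack[B->depth - 1] : -1;
		B->nodes[B->node_count] = (toy_node){ label, parent };
		B->last = B->node_count++;
	}

	/* Subsequent emissions become children of the last node emitted. */
	void toy_down(toy_builder *B) {
		assert(B->last >= 0 && B->depth < MAX_DEPTH);
		B->stack[B->depth++] = B->last;
	}

	/* Return to emitting at the previous level. */
	void toy_up(toy_builder *B) {
		assert(B->depth > 0);
		B->depth--;
	}
=
Emitting |"inv return"|, going down, emitting |"val 1"| and coming back up mirrors
the shape of the |return 1| example above.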

@ But that is a laborious sort of notation for what, in a C-like language, would
be written just as |return 1|. It would be very painful to have to implement
kits such as BasicInformKit that way. Instead, we write them in a notation which
is very close indeed[1] to Inform 6 syntax.

This means we need to provide what amounts to a pocket Inform-6-to-Inter compiler,
and we do that in this module, using a data structure called an //inter_schema// --
in effect, an annotated syntax tree -- to represent the results of parsing Inform 6
notation. For example, this:
= (text as InC)
	inter_schema *sch = ParsingSchemas::from_text(I"return true;");
	EmitInterSchemas::emit(I, ..., sch, ...);
=
generates Inter code equivalent to the example above.[2] But the real power of
the system comes from:

(a) The ability to handle much larger passages of I6 notation (for example, a
function body 10K long) in an acceptably speed-efficient way; and

(b) The ability to substitute values in for placeholders.

As an example of (b), an //inter_schema// is how //inform7// compiles so-called
inline phrase definitions such as:
= (text as Inform 7)
	To say (L - a list of values) in brace notation:
		(- LIST_OF_TY_Say({-by-reference:L}, 1); -).
=
Here, the text |LIST_OF_TY_Say({-by-reference:L}, 1);| is passed through to
//ParsingSchemas::from_text// to make a schema. When the phrase is invoked,
//EmitInterSchemas::emit// is used to generate Inter code from it; and a
reference to the list passed to the invocation as the token |L| is substituted
for the braced clause |{-by-reference:L}|.[3] Schemas are also used as convenient
shorthand in the compiler to express how to, for example, post-increment a
property value.

[1] Some antique syntaxes, such as |for| loops whose clauses are divided by semicolons
rather than colons, are missing; so are some hardly-used directives; and the superclass |::| operator;
and built-in compiler symbols relevant only to particular virtual machines, such
as |#g$self|, are not there. But really, you will never notice they are gone.

[2] Skipping over some of the arguments to the emission function, which basically
tell us how to resolve identifier names into variables, arrays, and so on.

[3] These braced placeholders are, of course, not Inform 6 notation, and
represent an extension of the I6 syntax.
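
@ To illustrate placeholder substitution only, here is a deliberately naive plain C
sketch which works on raw text. The names |binding| and |substitute| are
hypothetical. It differs from the real system in an important way: an
//inter_schema// is parsed into an annotated syntax tree, and substitution happens
as Inter code is emitted from it, whereas this sketch rescans the text every time.
= (text as C)
	#include <assert.h>
	#include <string.h>

	/* Hypothetical binding of a placeholder's contents to replacement text. */
	typedef struct binding { const char *placeholder; const char *replacement; } binding;

	/* Copy src into out, replacing each {...} whose contents match a binding;
	   unmatched braces are copied through unchanged. Returns 0 on success,
	   or -1 if out (of size n) is too small. */
	int substitute(const char *src, const binding *b, size_t nb, char *out, size_t n) {
		assert(src != NULL && out != NULL);
		if (n == 0) return -1;
		size_t o = 0;
		while (*src) {
			const char *rbrace = (*src == '{') ? strchr(src, '}') : NULL;
			const char *text = src; /* by default, copy one character */
			size_t len = 1;
			if (rbrace) {
				size_t inner = (size_t)(rbrace - src - 1);
				for (size_t i = 0; i < nb; i++)
					if (strlen(b[i].placeholder) == inner &&
						strncmp(src + 1, b[i].placeholder, inner) == 0) {
						text = b[i].replacement;
						len = strlen(text);
						src = rbrace; /* the src++ below then skips the '}' */
						break;
					}
			}
			if (o + len >= n) return -1; /* keep room for the terminator */
			memcpy(out + o, text, len);
			o += len;
			src++;
		}
		out[o] = '\0';
		return 0;
	}
=
For instance, binding |-by-reference:L| to a hypothetical |my_list| turns the
inline definition's text into |LIST_OF_TY_Say(my_list, 1);|.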