inform7/inter/Manual/Code Packages in Textual Inter.w

Code Packages in Textual Inter.

How executable functions are expressed in textual inter programs.

@h Code packages.
To recap from //Textual Inter//: an Inter program is a nested hierarchy of
packages. Some of those are special |_code| packages which define functions,
special in several ways:

(*) Their names can be used as values: that's how functions are called. See
|inv| below.

(*) Their names can optionally have types: see //Data Packages in Textual Inter//
for details.

(*) They cannot have subpackages. Conceptually, a code package is a single
function body. Packages are not used for "code blocks", and there are no
nested functions.

(*) They cannot contain |constant|, |variable|, and similar instructions found
in data packages. Instead they can only contain the set of instructions which
are the subject of this section (and which are allowed only in |_code| packages).

@ The basic structure of a function body like this is that it begins with some
local variable declarations, and then has its actual content inside a |code|
block, like so:
= (text as Inter)
	package double _code
	    local x
	    code
	    	inv !return
	    		inv !plus
	    			val x
	    			val x
=
As with its global analogue, |variable|, a |local| instruction can optionally
specify a type:
= (text as Inter)
	local (int32) x
=

There can be at most one |code| instruction at the top level. This is incorrect:
= (text as Inter)
	package fails _code
	    code
	    	inv !enableprinting
	    code
	    	inv !print
	    		val "I am dismal.\n"
=
and should instead be:
= (text as Inter)
	package succeeds _code
	    code
	    	inv !enableprinting
	    	inv !print
	    		val "I am glorious.\n"
=

@ Surprisingly, perhaps, it's legal not to have a |code| block at all. This
function works:
= (text as Inter)
	package succeeds _code
=
But of course it does nothing. If the return value of such a function is used,
it will be 0.

@h Contexts.
At any point inside a function body (except at the very top level), the
instruction used is expected to have a given "category", decided by the
"context" at that point. These categories have names:

(*) |code| context. This means an instruction is expected to do something,
but not produce a resulting value.

(*) |val| context. This means an instruction is expected to produce a value.

(*) |ref| context. This means an instruction is expected to provide a
"reference" to some storage in the program. For example, it could indicate
a global variable, or a particular property of some instance.

(*) |lab| context. This means an instruction is expected to indicate a label
marking a position in that same function.

In a |code| block, the context is initially |code|. For example:
= (text as Inter)
	package double _code
	    local x                             top level has no context
	    code                                top level has no context
	    	inv !jump                       context is code
	    		lab .SkipWarning            context is lab
	    	inv !print                      context is code
	    		val "It'll get bigger!\n"   context is val
	    	.SkipWarning                    context is code
	    	inv !store                      context is code
	    		ref x                       context is ref
	    		inv !plus                   context is val
	    			val x                   context is val
	    			val x                   context is val
	    	inv !return                     context is code
	    		val x                       context is val
=
In this function, the |code| block contains five instructions, each of which
is read in a |code| context. Each of those then has its own expectations which
set the context for its child instructions, and so on. For example, |inv !store|
expects to see two child instructions, the first in |ref| context and the
second in |val| context.

Those uses of |inv !something| are called "primitive invocations". They are
like function calls, but where the function is built in to Inter and is not
itself defined in Inter. Each such has a "signature". For example, the
internal declaration of |!store| is:
= (text as Inter)
	primitive !store ref val -> val
=
So its signature is |ref val -> val|. This expresses that its two children
should be read in |ref| and |val| context, and that its result is a |val|.
(As in most C-like languages, stores are values in Inter, though in
practice those values are often thrown away.)

The standard built-in stock of primitive invocations is described in the
next section, on //Inform Primitives//.

@ How is all this policed? Whereas typechecking of data is often weak in Inter,
signature checking is taken much more seriously. If the context is |code|, then
the only legal primitives to invoke are those where the return part of the
signature is either |void| (no value) or |val| (a value, but which is thrown
away and ignored, as in most C-like languages). Otherwise, |ref| context
requires a |ref| result, and similarly for |val| and |lab|.

For example, |!return| has the signature |val -> void|, which makes it legal
to use in a |code| context as in the above example. But these two attempts
to use it would both be incorrect:
= (text as Inter)
	inv !return
	inv !printnumber
		inv !return
			val 10
=
The first fails because it tries to use |!return| as if it were |void -> void|,
i.e., with no supplied value; the second fails because it tries to use it as if
it were |val -> val|.
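
For comparison, here is a sketch of how those two fragments might be repaired,
assuming (as the other examples in this section suggest) that |!printnumber|
takes a single |val| child. Read these as two separate fragments: the first
supplies the value which |!return| requires, and the second gives |!printnumber|
a child which actually produces a value.
= (text as Inter)
	inv !return
		val 10
	inv !printnumber
		val 10
=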

@ Some primitives have |code| as one or more of their arguments. For example:
= (text as Inter)
	primitive !ifelse val code code -> void
=
This evaluates the first argument (a value), then executes the second argument
(a code block) if the value is non-zero, or alternatively the third if it is zero.
There is no result. For example:
= (text as Inter)
	inv !ifelse
		val x
		code
			inv !printnumber
				val x
		code
			inv !print
				val "I refuse to print zeroes on principle."
=

@ Rather like |code|, which executes a run of instructions as if they were a
single instruction, |evaluation| makes a run of evaluations. Thus:
= (text as Inter)
	inv !printnumber
		evaluation
			val 23
			val -1
			val 12
=
prints just "12". The point of this is that there may be side-effects in the
earlier evaluations, of course, though there weren't in this example.
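
As a sketch of an evaluation which does have a side-effect, suppose that |x|
is a local variable of the enclosing function (an assumption made only for
this illustration). Since |!store| has a |val| result, it can appear as one of
the evaluations:
= (text as Inter)
	inv !printnumber
		evaluation
			inv !store
				ref x
				val 7
			val x
=
Going by the rule above, the store is performed first and the final |val x|
then supplies the result, so this should print "7".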

Another converter, so to speak, is |reference|, but this is much more limited
in what it is allowed to do.
= (text as Inter)
	inv !store
		reference
			val x
		val 5
=
is exactly equivalent to:
= (text as Inter)
	inv !store
		ref x
		val 5
=
This is not a very useful example: but consider --
= (text as Inter)
	inv !store
		reference
			inv !propertyvalue
				val Odessa
				val area
		val 5000
=
which changes the property |area| for |Odessa| to 5000. The signature of
|!propertyvalue| is |val val -> val|, and ordinarily it evaluates the property.
But placed under a |reference|, it becomes a reference to where that property
is stored, and thus allows the value to be changed with |!store|. This:
= (text as Inter)
	inv !store
		inv !propertyvalue
			val Odessa
			val area
		val 5000
=
would by contrast be rejected with an error, as trying to use a |val| in a |ref|
context.

|reference| cannot be applied to anything other than storage (a local or global
variable, a memory location or a property value), so for example:
= (text as Inter)
	reference
		val 5
=
is meaningless and will be rejected. There is in general no way to make, say,
a pointer to a function or instance using |reference|. It is much more circumscribed
than the |&| operator in C.

@h Function calls.
This seems a good point to say how to make function calls, since it's almost
exactly the same. This:
= (text as Inter)
	inv !printnumber
		inv double
			val 10
=
prints "20". Note the lack of a |!| in front of the function name: this means
it is a regular function, not a primitive.

@ Function calls work in a rather assembly-language-like way, and Inter makes
much less effort to type-check these for any kind of safety: so beware. It
allows them to have any of the signatures |void -> val|, |val -> val|,
|val val -> val|, ... and so on: in other words, they can be called with
any number of arguments.

In particular, even if a function is declared with a type it is still legal to
call it with any number of arguments. Again: beware.

Those arguments become the initial values of the local variables. So for
example, if:
= (text as Inter)
	package example _code
	    local x
	    local y
=
then:

(*) a call with no arguments results in |x| and |y| equal to 0 and 0;
(*) a call with argument 7 results in |x| and |y| equal to 7 and 0;
(*) a call with arguments 7 and 81 results in |x| and |y| equal to 7 and 81;
(*) a call with three or more arguments has undefined results and may crash
the program altogether.
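
To make this concrete, here is a sketch using a hypothetical function |sum|,
which is not part of any kit and is invented purely for this illustration:
= (text as Inter)
	package sum _code
	    local x
	    local y
	    code
	    	inv !return
	    		inv !plus
	    			val x
	    			val y
=
Under the rules above, the following call supplies only one argument, so |y|
begins as 0 and the number printed should be "7":
= (text as Inter)
	inv !printnumber
		inv sum
			val 7
=
With a second child |val 81| added beneath |inv sum|, it should print "88" instead.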

@h Val, ref, lab and cast.
We have seen many examples already, but:

(*) |val V| allows us to use any simple value |V| in any |val| context. For
what is meant by a "simple" value, see //Data Packages in Textual Inter//.

(*) |ref R| allows us to refer to any variable, local or global, in a |ref|
context.

(*) |lab L| allows us to refer to any label declared somewhere in the current
function body, in a |lab| context.

@ The |val| and |ref| instructions both allow optional type markers to be placed,
so for example:
= (text as Inter)
	val (int32) x
	ref (text) y
=
Where no type marker is given, the type is always considered |unchecked|.

Types of |val| or |ref| tend not to be checked or looked at anyway, so this
feature is currently little used. For many primitives, some of which are quite
polymorphic, it would be difficult to impose a typechecking regime anyway.
But the ability to mark |val| and |ref| with types is preserved as a hedge
against potential future developments, when Inter might conceivably be
tightened up to typecheck explicitly typed values.

Of similarly little use for the moment is |cast|. This instruction allows us to
say "consider this value as if it had a different type". For example, if we
are using an enumerated type |city|, we could read the enumeration values as
numbers like so:
= (text as Inter)
	cast int32 <- city
		val (city) Odessa
=
Right now this is no different from:
= (text as Inter)
	val (int32) Odessa
=
but we keep |cast| around as a hedge against future developments, in case we
ever want to typecheck strictly enough that |val (int32) Odessa| is rejected
as a contradiction in terms.

@h Labels and assembly language.
Like labels in C, these are named reference points in the code. A label is
written |.NAME|: that is, its name must begin with a full stop |.|. Labels are
not values; they cannot be stored, or computed with, or cast. They can only be
used in a |lab| instruction.

@ Two uses of |inv| have already been covered: to call an Inter function, and
to invoke a primitive operation. The third is to execute an "assembly-language
opcode". What we mean by that is the direct use of the instruction set on the
target virtual machine we are expecting our program to run on.

This has always been a feature of Inform 6 code. For example, some real-number
arithmetic functions in BasicInformKit are written to use heavy amounts of
Glulx assembly language, in order to access functionality not present in the
Inform language itself. Here is a sample:
= (text as Inform 6)
	@fdiv sp $40135D8E log10val; ! $40135D8E is log(10)
	@floor log10val fexpo;
	@ftonumn fexpo expo;
=
Those "opcodes" beginning |@| are part of the instruction set for the
Glulx virtual machine. Real-number arithmetic is impossible on the smaller
Z-machine, so we couldn't meaningfully compile this code to that platform;
that is just as well, because the Z-machine has a completely different set
of opcodes from Glulx anyway. Still, there's no denying that Inter code
using assembly immediately becomes less portable. This is why it is always
better to use Inter primitives if possible.

Still, BasicInformKit must be compiled to Inter code, so we clearly need some
way to deal with those opcodes. The standard Inform-provided kits use two
different sets of opcodes, as noted: the Z-machine and Glulx instruction sets.
One conceivable way to deal with this would have been to provide primitives
equivalent to every opcode in either set (or at least every opcode used in
the standard Inform kits). But that would hugely increase the set of primitives,
and also incur a certain amount of awkward repetition.

Instead, the Inter specification goes to the opposite extreme. It makes no
assumptions about what assembly opcodes do, or do not, exist. Inter allows
absolutely anything, and would be quite happy to accept, say, |inv @flytothemoon|,
even though this opcode does not exist in any known system of assembly language.[1]

And so the above is in fact compiled to:
= (text as Inter)
	inv @fdiv
		assembly stack
		val 0x40135D8E
		val log10val
	inv @floor
		val log10val
		val fexpo
	inv @ftonumn
		val fexpo
		val expo
=
And when the //building// module performed that compilation, it knew nothing
about |@fdiv| and the rest: it just took on trust that this is meaningful.

[1] This can actually be useful, since it means people experimenting with new
hybrid forms of Inform can devise extra opcodes of their own.

@ So, how apparently generous: the Inter specification allows us to invoke
opcodes with arbitrary names. But that does not, of course, mean that those
opcodes can be compiled to code which does anything useful. The //final//
code-generation module probably won't know what to do with our hypothetical
|@flytothemoon| opcode.

In practice, therefore, //final// knows how to deal with the Z-machine
instruction set when compiling for Z via Inform 6, and how to deal with the
Glulx instruction set when compiling either for Glulx via Inform 6 or a
native executable via a C compiler like |clang|. Any further code-generators
are also likely to follow Glulx conventions. So: if you really must use
assembly language in your Inter code, good advice would be --

(1) Use the Glulx instruction set, for better chances of portability.

(2) Only use those opcodes which are also used in the standard Inform kits
somewhere, since those will probably be implemented.

@ If we look at this example in more detail:
= (text as Inter)
	inv @fdiv
		assembly stack
		val 0x40135D8E
		val log10val
=
we see some general features of assembly language. Inter allows any number
of child instructions to be supplied -- here, there are three. Since Inter knows
nothing about the meaning of |@fdiv|, it has no way to know how many are
expected. They should all be usages of |val|, |lab|, or |assembly|.

|val| and |lab| we have seen already. |assembly| is a sort of punctuation
instruction which allows various oddball syntaxes of Z-machine or Glulx
assembly to be imitated in Inter. There are only seven possible |assembly|
instructions. Two are very common:

(*) |assembly stack| is probably the most common. It stands for the top of
the virtual machine's stack, whether that is being read from or written to.

(*) |assembly store_to| indicates that a storage location follows (either
|assembly stack| or a local or global variable). This is only used in Z-machine
assembly language; Glulx assembly doesn't have this marker.

The other five apply only to "branch instructions", which perform some test
and then either return from the current function or make a jump to a label
(a "branch"), depending on the outcome of the test. By default the instruction
branches on a successful test, but one of these markers can be used to change that:

(*) |assembly branch_if_false|.

(*) |assembly return_true_if_true|.

(*) |assembly return_false_if_true|.

(*) |assembly return_true_if_false|.

(*) |assembly return_false_if_false|.

So for example the Z-machine instruction |@random sp -> i;| compiles to Inter as:
= (text as Inter)
	inv @random
		assembly stack
		assembly store_to
		val i
=
And note the use of |val i|, not |ref i|, even though the variable is being
written to here. Even Inter's normal rules of category checking do not apply
to assembly language, the lowest of the low.
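
Branch instructions can be sketched in the same way. The following is an
illustration only: it assumes that a branch marker is written immediately
before the |lab| it applies to, by analogy with |assembly store_to| above, and
the names |x| and |.IsNonZero| are hypothetical. On that assumption, the
Z-machine instruction |@jz x ?~IsNonZero;|, which branches if the test "is |x|
zero?" fails, might be written:
= (text as Inter)
	inv @jz
		val x
		assembly branch_if_false
		lab .IsNonZero
=
With no marker before the |lab|, the default would apply, and the branch would
be taken when the test succeeds.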