-§3. Translation into Unicode. The following handles sentences like:
+§3. Translation into Unicode. The following sentence form is now deprecated:
leftwards harpoon with barb upwards translates into Unicode as 8636.
-The subject "leftwards harpoon with barb upwards" is parsed against the
-Unicode character names known already to make sure that this new translation
-doesn't disagree with an existing one (that is, doesn't translate to a
-different code number).
+Until Inform 10.1, this equated a Unicode name to its code point value; see
+IE-0005 and Unicode Literals (in values) for what now happens instead.
The sentence "X translates into Y as Z" has this sense provided Y matches:
@@ -180,6 +178,7 @@ different code number).
§4.
+int PM_UnicodeDeprecated_thrown = FALSE;
int Translations::translates_into_unicode_as_SMF(int task, parse_node *V, wording *NPs) {
    wording SW = (NPs)?(NPs[0]):EMPTY_WORDING;
    wording OW = (NPs)?(NPs[1]):EMPTY_WORDING;
@@ -195,67 +194,23 @@ different code number).
        }
        break;
    case PASS_2_SMFT:
-        Create the Unicode character name 4.1;
+        if (PM_UnicodeDeprecated_thrown == FALSE) {
+            PM_UnicodeDeprecated_thrown = TRUE;
+            StandardProblems::sentence_problem(Task::syntax_tree(),
+                _p_(PM_UnicodeDeprecated),
+                "the sentence 'X translates into Unicode as Y' has been removed "
+                "from the Inform language",
+                "because it is now redundant. Inform already knows all the names "
+                "in the Unicode standard. If you're getting this problem message "
+                "because you included the extension 'Unicode Full Character Names' "
+                "or 'Unicode Character Names', all you need do is to not include it.");
+        }
        break;
    }
    return FALSE;
}
-§5. And this parses the noun phrases of such sentences. Note that the numeric
-values has to be given in decimal — I was tempted to allow hexadecimal here,
-but life's too short. Unicode translation sentences are really only
-technicalities needed by the built-in extensions, and those are mechanically
-generated anyway; Inform authors never type them.
-
-StandardProblems::sentence_problem(Task::syntax_tree(), _p_(PM_UnicodeNonLiteral),
-"a Unicode character name must be translated into a literal decimal "
-"number written out in digits",
-"which this seems not to be.");
-return FALSE;
-§6. Translation into Inter. There are three sentences here, but the first is now deprecated: it has split
+§5. Translation into Inter. There are three sentences here, but the first is now deprecated: it has split
off into two different meanings, each with its own wording for clarity.
@@ -263,7 +218,7 @@ off into two different meanings, each with its own wording for clarity.
define TRANSLATION_DEFINED_BY_FORM 2
define TRANSLATION_ACCESSIBLE_TO_FORM 3
-§7. The sentence "X translates into Y as Z" has this sense provided Y matches the
+§6. The sentence "X translates into Y as Z" has this sense provided Y matches the
following. Before the coming of Inter code, the only conceivable compilation
target was Inform 6, but these now set Inter identifiers, so really the first
wording is to be preferred.
@@ -276,13 +231,13 @@ wording is to be preferred.
inform6
§8.1. The object noun phrase is usually just an I6 identifier in quotation marks,
but it's also possible to list literal texts (for the benefit of rules).
Following the optional "with" is an articled list, each entry of which
will be required to pass <extra-response>.
@@ -441,7 +396,7 @@ will be required to pass <
<quoted-text> ==> { R[1], NULL }
§8.2.1. Ensure that we are translating to a quoted I6 identifier 8.2.1 =
int valid = TRUE;
if (<translates-into-i6-sentence-object>(Node::get_text(p2)) == FALSE) valid = FALSE;
else responses_list = <<rp>>;
-if (valid) Dequote it and see if it's valid 9.2.1.1;
+if (valid) Dequote it and see if it's valid 8.2.1.1;
if (valid == FALSE) {
    StandardProblems::sentence_problem(Task::syntax_tree(),
        _p_(PM_TranslatedToNonIdentifier),
@@ -483,11 +438,11 @@ will be required to pass <
    return FALSE;
}
diff --git a/docs/assertions-module/6-act.html b/docs/assertions-module/6-act.html
index d25a0847c..9633f2540 100644
--- a/docs/assertions-module/6-act.html
+++ b/docs/assertions-module/6-act.html
@@ -280,7 +280,7 @@ rulebook should have an identifier given to it which is accessible to Inter:
-void Activities::translates(wording W, parse_node *p2) {
+void Activities::translates(wording W, parse_node *p2) {
    if (<activity-name>(W)) {
        activity *av = (activity *) <<rp>>;
        RTActivities::translate(av, Node::get_text(p2));
diff --git a/docs/assertions-module/6-rlb.html b/docs/assertions-module/6-rlb.html
index ca529655e..41cb8f952 100644
--- a/docs/assertions-module/6-rlb.html
+++ b/docs/assertions-module/6-rlb.html
@@ -364,7 +364,7 @@ rulebook should have an identifier given to it which is accessible to Inter:
-void Rulebooks::translates(wording W, parse_node *p2) {
+void Rulebooks::translates(wording W, parse_node *p2) {
    if (<rulebook-name>(W)) {
        rulebook *B = (rulebook *) <<rp>>;
        RTRulebooks::translate(B, Node::get_text(p2));
diff --git a/docs/assertions-module/6-rls.html b/docs/assertions-module/6-rls.html
index a74d013da..268a59063 100644
--- a/docs/assertions-module/6-rls.html
+++ b/docs/assertions-module/6-rls.html
@@ -356,7 +356,7 @@ is wording which should contain just the double-quoted function name.
-void Rules::declare_Inter_rule(wording W, wording FW) {
+void Rules::declare_Inter_rule(wording W, wording FW) {
    rule *R = Rules::obtain(W, TRUE);
    R->defn_as_Inter_function = Str::new();
    WRITE_TO(R->defn_as_Inter_function, "%W", FW);
diff --git a/docs/core-module/1-cm.html b/docs/core-module/1-cm.html
index 0ccb5bdb6..e61438d3b 100644
--- a/docs/core-module/1-cm.html
+++ b/docs/core-module/1-cm.html
@@ -116,7 +116,7 @@ want to produce predictable output for easier testing.
}
diff --git a/docs/core-module/1-cp.html b/docs/core-module/1-cp.html
index a49a37f6b..8c6e91014 100644
--- a/docs/core-module/1-cp.html
+++ b/docs/core-module/1-cp.html
@@ -381,7 +381,7 @@ We begin with core itself.
DECLARE_CLASS(files_data)
diff --git a/docs/core-module/1-cp2.html b/docs/core-module/1-cp2.html
index 4b7a1a876..2bc20cf29 100644
--- a/docs/core-module/1-cp2.html
+++ b/docs/core-module/1-cp2.html
@@ -165,7 +165,7 @@ bit or the <k-kind> bit set, which as we see above is }
diff --git a/docs/core-module/1-cs.html b/docs/core-module/1-cs.html
index fb8d72a2c..16a7412f4 100644
--- a/docs/core-module/1-cs.html
+++ b/docs/core-module/1-cs.html
@@ -231,7 +231,7 @@ the above settings can be changed.)
}
diff --git a/docs/core-module/1-gtg.html b/docs/core-module/1-gtg.html
new file mode 100644
index 000000000..e7406ba6a
--- /dev/null
+++ b/docs/core-module/1-gtg.html
@@ -0,0 +1,186 @@
+
+
+
+ Gitignoring
+
Automatically creating or updating gitignore files within Inform projects, so that they can be put under version control with git more easily.
+
+
§1. Git is, so help us, the world's standard in version control, but is not
+the easiest system to configure, especially for beginners. One thing we can
+help with is the automatic setting up of .gitignore files, which tell git
+which files are ephemeral and need not be under source control.
+
+
+
This very simple feature was added to Inform as IE-0002 in October 2022.
+
§2. In .gitignore file syntax, pathnames are relative to that of the file.
+P/** means "ignore P and all its contents, to any depth". Lines beginning
+with a # are comments.
+
The function Gitignoring::for_materials is used in §1.
+
§3. What we do, for each of the directories relevant to a project (i.e. the project
+itself and its materials), is to see if a .gitignore file already exists. If it
+does, we look for a "stanza" between appropriate comments which will represent
+our contribution. If that stanza already contains the right contents, then we
+do not write the file. (There is no need, and we don't want to touch the timestamp
+on the file.) Otherwise, we write the file back but with our preferred contents
+of the stanza replacing whatever was there before.
+
+
+
As a special case, if there is no .gitignore file, we create one consisting
+only of our stanza.
+
+
+
+void Gitignoring::fix(filename *F, text_stream *stanza_wanted) {
+    gitignore_harvest H;
+    Harvest the existing gitignore file content, if any 3.2;
+
+    if (H.ignore) return;
+    if (Str::eq(stanza_wanted, H.G)) return;
+
+    text_stream F_struct; text_stream *OUT = &F_struct;
+    if (STREAM_OPEN_TO_FILE(OUT, F, ISO_ENC) == FALSE)
+        Errors::fatal_with_file("unable to open .gitignore file for output: %f", F);
+    WRITE("%S", H.B);
+    WRITE("# This stanza written automatically by inform7\n");
+    WRITE("%S", stanza_wanted);
+    WRITE("# End of stanza written automatically by inform7\n");
+    WRITE("%S", H.A);
+    STREAM_CLOSE(OUT);
+}
+
+
§3.1. The process of extracting the content of any existing .gitignore file
+is called "harvesting", and results in one of these:
+
+
+
+typedef struct gitignore_harvest {
+    int position; /* 1: before stanza, 2: inside it, 3: after it */
+    int ignore; /* have we seen a request not to do this? */
+    struct text_stream *B; /* content of file before stanza */
+    struct text_stream *G; /* content of stanza (not including comments) */
+    struct text_stream *A; /* content of file after stanza */
+} gitignore_harvest;
+
+
The structure gitignore_harvest is private to this section.
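
The harvesting idea can be sketched in plain C. This is only an illustration under stated assumptions, not Inform's implementation: it assumes the whole file is already in memory as a string, it recognises the two marker comments that Gitignoring::fix writes, and the names harvest, harvest_gitignore and append_text are hypothetical. It leaves out the "ignore" request, whose syntax is not shown in this section.

```c
#include <assert.h>
#include <string.h>

#define START_MARK "# This stanza written automatically by inform7\n"
#define END_MARK   "# End of stanza written automatically by inform7\n"
#define HARVEST_MAX 4096

/* Hypothetical stand-in for gitignore_harvest, using fixed-size buffers */
typedef struct harvest {
    int position;            /* 1: before stanza, 2: inside it, 3: after it */
    char B[HARVEST_MAX];     /* content of file before stanza */
    char G[HARVEST_MAX];     /* content of stanza, markers excluded */
    char A[HARVEST_MAX];     /* content of file after stanza */
} harvest;

/* Append n bytes of src to a NUL-terminated buffer, truncating if it is full */
static void append_text(char *dest, const char *src, size_t n) {
    size_t used = strlen(dest);
    if (n > HARVEST_MAX - 1 - used) n = HARVEST_MAX - 1 - used;
    memcpy(dest + used, src, n);
    dest[used + n] = '\0';
}

/* Does the line of the given length exactly match the marker? */
static int line_is(const char *line, size_t len, const char *mark) {
    return (len == strlen(mark)) && (strncmp(line, mark, len) == 0);
}

/* Split text into the parts before, inside and after the marked stanza */
void harvest_gitignore(harvest *H, const char *text) {
    assert(H != NULL && text != NULL);
    H->position = 1;
    H->B[0] = H->G[0] = H->A[0] = '\0';
    while (*text) {
        const char *nl = strchr(text, '\n');
        size_t len = nl ? (size_t) (nl - text) + 1 : strlen(text);
        if ((H->position == 1) && (line_is(text, len, START_MARK))) H->position = 2;
        else if ((H->position == 2) && (line_is(text, len, END_MARK))) H->position = 3;
        else if (H->position == 1) append_text(H->B, text, len);
        else if (H->position == 2) append_text(H->G, text, len);
        else append_text(H->A, text, len);
        text += len;
    }
}
```

With a harvest like this, the decision in Gitignoring::fix reduces to comparing G with the wanted stanza. If the file has no stanza at all, position stays at 1, everything lands in B, and G and A are empty.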
+
§3.2. Harvest the existing gitignore file content, if any 3.2 =
+
diff --git a/docs/core-module/1-pp.html b/docs/core-module/1-pp.html
index 90c278ae1..cfac17270 100644
--- a/docs/core-module/1-pp.html
+++ b/docs/core-module/1-pp.html
@@ -131,7 +131,7 @@ a final status indicator.
}
diff --git a/docs/core-module/1-wtc.html b/docs/core-module/1-wtc.html
index 896cd9b12..9185bad59 100644
--- a/docs/core-module/1-wtc.html
+++ b/docs/core-module/1-wtc.html
@@ -149,6 +149,7 @@ thing which is being compiled when it is.
    inform7_task->next_resource_number = 3;
    DefaultLanguage::set(Projects::get_language_of_syntax(project));
+    Gitignoring::automatic(project);
    int rv = Sequence::carry_out(TargetVMs::debug_enabled(inform7_task->task->for_vm));
    return rv;
@@ -555,7 +556,7 @@ flag stays FALSE}
diff --git a/docs/core-module/2-up.html b/docs/core-module/2-up.html
index a462a9061..b108f8f53 100644
--- a/docs/core-module/2-up.html
+++ b/docs/core-module/2-up.html
@@ -386,7 +386,7 @@ a message which diagnoses the problem rather better.
}
diff --git a/docs/core-module/index.html b/docs/core-module/index.html
index 1d60d90a4..f6d99a124 100644
--- a/docs/core-module/index.html
+++ b/docs/core-module/index.html
@@ -110,6 +110,11 @@
Internal Test Cases -
Handling requests to compile internal tests.
+
+
+ Gitignoring -
+ Automatically creating or updating gitignore files within Inform projects, so that they can be put under version control with git more easily.
+
diff --git a/docs/html-module/2-if.html b/docs/html-module/2-if.html
index 1f5f401cc..aed94a8cf 100644
--- a/docs/html-module/2-if.html
+++ b/docs/html-module/2-if.html
@@ -81,6 +81,7 @@ but they're just plain old files, and are not managed by Inbuild as "copies".
enum EXTENSION_DOCUMENTATION_MODEL_IRES
enum RESOURCE_JSON_REQS_IRES
enum REGISTRY_JSON_REQS_IRES
+enum UNICODE_DATA_IRES
filename *InstalledFiles::filename(int ires) {
@@ -102,6 +103,8 @@ but they're just plain old files, and are not managed by Inbuild as "copies".
        return Filenames::in(misc, I"resource.jsonr");
    case REGISTRY_JSON_REQS_IRES:
        return Filenames::in(misc, I"registry.jsonr");
+    case UNICODE_DATA_IRES:
+        return Filenames::in(misc, I"UnicodeData.txt");
    case CBLORB_REPORT_MODEL_IRES:
        return InstalledFiles::varied_by_platform(models, I"CblorbModel.html");
diff --git a/docs/lexicon-module/P-wtmd.html b/docs/lexicon-module/P-wtmd.html
index cadef2627..48e745a7e 100644
--- a/docs/lexicon-module/P-wtmd.html
+++ b/docs/lexicon-module/P-wtmd.html
@@ -182,7 +182,7 @@ number of successes.
Size of lexicon: 3118 excerpt meanings
- Stored among 844 words out of total vocabulary of 10734
+ Stored among 844 words out of total vocabulary of 10727
 714 words have a start list: longest belongs to report (with 293 meanings)
 15 words have an end list: longest belongs to case (with 6 meanings)
 29 words have a middle list: longest belongs to to (with 4 meanings)
diff --git a/docs/supervisor-module/3-is2.html b/docs/supervisor-module/3-is2.html
index b4f5f4c0e..081f9672b 100644
--- a/docs/supervisor-module/3-is2.html
+++ b/docs/supervisor-module/3-is2.html
@@ -117,6 +117,7 @@ folders anyway; maybe we should leave well be.)
    inform_project *project = ProjectBundleManager::from_copy(S->associated_copy);
    if (project == NULL) project = ProjectFileManager::from_copy(S->associated_copy);
    if (project == NULL) internal_error("no project");
+
    if (S->associated_copy->edition->work->genre == project_bundle_genre)
        Pathnames::create_in_file_system(Projects::materials_path(project));
    #ifdef CORE_MODULE
diff --git a/docs/values-module/1-vm.html b/docs/values-module/1-vm.html
index 4b52e042c..624349be6 100644
--- a/docs/values-module/1-vm.html
+++ b/docs/values-module/1-vm.html
@@ -67,6 +67,7 @@ which use this module:
enum TEXT_SUBSTITUTIONS_DA
enum VARIABLE_CREATIONS_DA
enum TABLES_DA
+enum UNICODE_DATA_MREASON
COMPILE_WRITER(instance *, Instances::log)
@@ -82,6 +83,7 @@ which use this module:
    Log::declare_aspect(TEXT_SUBSTITUTIONS_DA, L"text substitutions", FALSE, FALSE);
    Log::declare_aspect(VARIABLE_CREATIONS_DA, L"variable creations", FALSE, FALSE);
    Log::declare_aspect(TABLES_DA, L"table construction", FALSE, FALSE);
+    Memory::reason_name(UNICODE_DATA_MREASON, "Unicode data");
    REGISTER_WRITER('O', Instances::log);
    REGISTER_WRITER('q', Equations::log);
    REGISTER_WRITER('Z', NonlocalVariables::log);
diff --git a/docs/values-module/2-lvl.html b/docs/values-module/2-lvl.html
index c31b55b00..11ee13bff 100644
--- a/docs/values-module/2-lvl.html
+++ b/docs/values-module/2-lvl.html
@@ -134,7 +134,7 @@ all in the VALUE
parse_node *Lvalues::new_actual_NONLOCAL_VARIABLE(nonlocal_variable *nlv) {
    parse_node *spec = Node::new(NONLOCAL_VARIABLE_NT);
    Node::set_constant_nonlocal_variable(spec, nlv);
-    Node::set_text(spec, nlv->name);
+    Node::set_text(spec, nlv->name);
    return spec;
}
§1. Parsing. The following is called only on excerpts from the source where it is a
fairly safe bet that a Unicode character is referred to. For example, when
the player types either of these:
int UnicodeLiterals::max(int cc) {
-    if ((cc < 0) || (cc >= 0x10000)) {
-        StandardProblems::sentence_problem(Task::syntax_tree(), _p_(PM_UnicodeOutOfRange),
-            "Inform can only handle Unicode characters in the 16-bit range",
-            "from 0 to 65535.");
+    if ((cc < 0) || (cc >= MAX_UNICODE_CODE_POINT)) {
+        Issue PM_UnicodeOutOfRange 1.2;
        return 65;
    }
    return cc;
}
+
§1.2. Issue PM_UnicodeOutOfRange 1.2 =
+
+
+
+StandardProblems::sentence_problem(Task::syntax_tree(), _p_(PM_UnicodeOutOfRange),
+"this character value is beyond the range which the current story "
+"could handle",
+"which is from 0 to (hexadecimal) FFFF for stories compiled to the "
+"Z-machine, and otherwise 0 to 1FFFF.");
+
§2. Code points. Each distinct code point in the Unicode specification will correspond to one
+of these:
+
+
+
define MAX_UNICODE_CODE_POINT 0x20000
+enum Cc_UNICODE_CAT from 1 /* Other, Control */
+enum Cf_UNICODE_CAT /* Other, Format */
+enum Cn_UNICODE_CAT /* Other, Not Assigned: no character actually has this */
+enum Co_UNICODE_CAT /* Other, Private Use */
+enum Cs_UNICODE_CAT /* Other, Surrogate */
+enum Ll_UNICODE_CAT /* Letter, Lowercase */
+enum Lm_UNICODE_CAT /* Letter, Modifier */
+enum Lo_UNICODE_CAT /* Letter, Other */
+enum Lt_UNICODE_CAT /* Letter, Titlecase */
+enum Lu_UNICODE_CAT /* Letter, Uppercase */
+enum Mc_UNICODE_CAT /* Mark, Spacing Combining */
+enum Me_UNICODE_CAT /* Mark, Enclosing */
+enum Mn_UNICODE_CAT /* Mark, Non-Spacing */
+enum Nd_UNICODE_CAT /* Number, Decimal Digit */
+enum Nl_UNICODE_CAT /* Number, Letter */
+enum No_UNICODE_CAT /* Number, Other */
+enum Pc_UNICODE_CAT /* Punctuation, Connector */
+enum Pd_UNICODE_CAT /* Punctuation, Dash */
+enum Pe_UNICODE_CAT /* Punctuation, Close */
+enum Pf_UNICODE_CAT /* Punctuation, Final quote */
+enum Pi_UNICODE_CAT /* Punctuation, Initial quote */
+enum Po_UNICODE_CAT /* Punctuation, Other */
+enum Ps_UNICODE_CAT /* Punctuation, Open */
+enum Sc_UNICODE_CAT /* Symbol, Currency */
+enum Sk_UNICODE_CAT /* Symbol, Modifier */
+enum Sm_UNICODE_CAT /* Symbol, Math */
+enum So_UNICODE_CAT /* Symbol, Other */
+enum Zl_UNICODE_CAT /* Separator, Line */
+enum Zp_UNICODE_CAT /* Separator, Paragraph */
+enum Zs_UNICODE_CAT /* Separator, Space */
+
+
+typedef struct unicode_point {
+    int code_point; /* in the range 0 to MAX_UNICODE_CODE_POINT - 1 */
+    struct text_stream *name; /* e.g. "RIGHT-FACING ARMENIAN ETERNITY SIGN" */
+    int category; /* one of the *_UNICODE_CAT values above */
+    int tolower; /* -1 if no mapping to lower case is available, or a code point */
+    int toupper; /* -1 if no mapping to upper case is available, or a code point */
+    int totitle; /* -1 if no mapping to title case is available, or a code point */
+} unicode_point;
+
+unicode_point UnicodeLiterals::new_code_point(int C) {
+    unicode_point up;
+    up.code_point = C;
+    up.name = NULL;
+    up.category = Cn_UNICODE_CAT;
+    up.tolower = -1;
+    up.toupper = -1;
+    up.totitle = -1;
+    return up;
+}
+
+
The structure unicode_point is accessed in 2/spc, 2/rvl, 2/lvl, 5/dsh and here.
+
§3. Storage for these is managed on demand, in a flexibly-sized array:
+
+
+
+unicode_point *unicode_points = NULL; /* array indexed by code point */
+int unicode_points_extent = 0; /* current number of entries in that array */
+int max_known_unicode_point = 0;
+
+unicode_point *UnicodeLiterals::code_point(int U) {
+    if ((U < 0) || (U >= MAX_UNICODE_CODE_POINT)) internal_error("Unicode point out of range");
+    UnicodeLiterals::ensure_data();
+    if (U >= unicode_points_extent) {
+        int new_extent = unicode_points_extent;
+        if (new_extent == 0) new_extent = 1;
+        while (new_extent <= U) new_extent = 2*new_extent;
+        unicode_point *new_unicode_points = (unicode_point *)
+            (Memory::calloc(new_extent, sizeof(unicode_point), UNICODE_DATA_MREASON));
+        for (int i=0; i<unicode_points_extent; i++)
+            new_unicode_points[i] = unicode_points[i];
+        for (int i=unicode_points_extent; i<new_extent; i++)
+            new_unicode_points[i] = UnicodeLiterals::new_code_point(i);
+        if (unicode_points_extent > 0)
+            Memory::I7_array_free(unicode_points,
+                UNICODE_DATA_MREASON, unicode_points_extent, sizeof(unicode_point));
+        unicode_points = new_unicode_points;
+        unicode_points_extent = new_extent;
+    }
+    if (U > max_known_unicode_point) max_known_unicode_point = U;
+    return &(unicode_points[U]);
+}
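
The grow-by-doubling scheme used above can be tried out in standard C. The following is a minimal sketch, assuming the C library's calloc and free in place of Inform's Memory:: functions; the names point, points, points_extent and point_at are hypothetical, and the record is cut down to two fields.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cut-down record: just enough to show the storage scheme */
typedef struct point {
    int code_point;
    int category;    /* -1 here means "nothing known yet" */
} point;

static point *points = NULL;     /* array indexed by code point */
static int points_extent = 0;    /* current number of entries in that array */

/* Return the record for code point u, enlarging the array first if needed */
point *point_at(int u) {
    assert(u >= 0);
    if (u >= points_extent) {
        int new_extent = (points_extent == 0) ? 1 : points_extent;
        while (new_extent <= u) new_extent = 2*new_extent;
        point *grown = calloc((size_t) new_extent, sizeof(point));
        if (grown == NULL) abort();
        for (int i = 0; i < points_extent; i++)
            grown[i] = points[i];            /* keep what we already had */
        for (int i = points_extent; i < new_extent; i++) {
            grown[i].code_point = i;         /* fresh entries start blank */
            grown[i].category = -1;
        }
        free(points);
        points = grown;
        points_extent = new_extent;
    }
    return &points[u];
}
```

Because the extent doubles each time, asking for code points in ascending order triggers only a logarithmic number of reallocations. One consequence of the design is that a pointer returned by an earlier call may be invalidated by a later call which enlarges the array, so in this sketch callers should not hold on to such pointers across calls.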
+
+
§4. The standard Inform distribution includes the current Unicode specification's
+main data file. Although parsing that file is relatively fast, we do it only
+on demand, because it's not small (about 2 MB of text) and is often not needed.
+
+
+
+dictionary *UnicodeData_lookup = NULL;
+void UnicodeLiterals::ensure_data(void) {
+    if (UnicodeData_lookup == NULL) {
+        UnicodeData_lookup = Dictionaries::new(65536, FALSE);
+        filename *F = InstalledFiles::filename(UNICODE_DATA_IRES);
+        TextFiles::read(F, FALSE, "can't open UnicodeData file", TRUE,
+            &UnicodeLiterals::read_line, NULL, NULL);
+        LOG("Read Unicode data to code point 0x%06x in %f\n", max_known_unicode_point, F);
+    }
+}
+
+
§5. The format of this file is admirably stable. Lines look like so:
+
+
+
+ 0067;LATIN SMALL LETTER G;Ll;0;L;;;;;N;;;0047;;0047
+ 1C85;CYRILLIC SMALL LETTER THREE-LEGGED TE;Ll;0;L;;;;;N;;;0422;;0422
+ 1FAA1;SEWING NEEDLE;So;0;ON;;;;;N;;;;;
+
+
Each line corresponds to a code point. They're presented in the file in ascending
+order of these values, but we make no use of that fact. Each line contains fields
+divided by semicolons, and semicolon characters are illegal in any field.
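
To make the line format concrete, here is a minimal sketch in standard C of splitting one such line into fields. It is not Inform's read_line function, which is not shown in this section; the names ucd_line, parse_ucd_line and hex_or_minus_one are hypothetical. The field positions used (0 for the code point, 1 for the name, 2 for the general category, and 12, 13 and 14 for the simple upper, lower and title case mappings) are those of the Unicode Character Database's UnicodeData.txt format rather than anything stated above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FIELDS 15        /* a UnicodeData.txt line has 15 fields */
#define MAX_FIELD_LEN 128

/* Hypothetical record holding the fields this sketch extracts */
typedef struct ucd_line {
    long code_point;
    char name[MAX_FIELD_LEN];
    char category[MAX_FIELD_LEN];
    long upper, lower, title;    /* -1 if no such mapping is given */
} ucd_line;

/* An empty field means "no mapping"; otherwise it is a hexadecimal number */
static long hex_or_minus_one(const char *field) {
    if (field[0] == '\0') return -1;
    return strtol(field, NULL, 16);
}

/* Returns 1 and fills in *out on success, 0 if the field count is wrong */
int parse_ucd_line(const char *line, ucd_line *out) {
    assert(line != NULL && out != NULL);
    char fields[MAX_FIELDS][MAX_FIELD_LEN];
    int n = 0;
    size_t len = 0;
    for (const char *p = line; ; p++) {
        if ((*p == ';') || (*p == '\0') || (*p == '\n')) {
            fields[n][len] = '\0';           /* close the current field */
            n++;
            if (*p != ';') break;            /* end of the line */
            if (n == MAX_FIELDS) return 0;   /* a 16th field would begin */
            len = 0;
        } else if (len < MAX_FIELD_LEN - 1) {
            fields[n][len++] = *p;           /* overlong fields are truncated */
        }
    }
    if (n != MAX_FIELDS) return 0;
    out->code_point = strtol(fields[0], NULL, 16);
    strcpy(out->name, fields[1]);
    strcpy(out->category, fields[2]);
    out->upper = hex_or_minus_one(fields[12]);
    out->lower = hex_or_minus_one(fields[13]);
    out->title = hex_or_minus_one(fields[14]);
    return 1;
}
```

The splitter walks the line by hand rather than using strtok, because strtok would silently skip the empty fields, and most fields in a typical line are empty. For the LATIN SMALL LETTER G line above, this yields code point 0x67, category "Ll", upper and title case mappings of 0x47, and no lower case mapping.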
+
§5.4. Control codes in Unicode, a residue of ASCII, are given no names by the
+standard. For example:
+
+
+
+ 0004;<control>;Cc;0;BN;;;;;N;END OF TRANSMISSION;;;;
+
+
Indeed, at present every code with category Cc has the pseudo-name <control>.
+So we will mostly not allow these to be referred to by name in Inform. (In theory we
+could read the ISO-10646 comment as if it were a name: here, that would be
+"END OF TRANSMISSION", which isn't too bad. But "FORM FEED (FF)" and
+"CHARACTER TABULATION" are less persuasive, and anyway, we don't actually want
+users to insert control characters into Inform text literals.)
+
§6. Using the Unicode data. The first lookup here is slow, since it requires us to parse the Unicode
+specification data file. But after that everything runs quite swiftly.
+
diff --git a/docs/values-module/5-dsh.html b/docs/values-module/5-dsh.html
index fdeba2dcc..367a9b5ad 100644
--- a/docs/values-module/5-dsh.html
+++ b/docs/values-module/5-dsh.html
@@ -2524,7 +2524,7 @@ a property when recovering from other problems.
    THIS_IS_A_GROSSER_THAN_GROSS_PROBLEM;
    Problems::quote_source(1, current_sentence);
-    Problems::quote_wording(2, prn->name);
+    Problems::quote_wording(2, prn->name);
    Problems::quote_subject(3, owning_subject);
    StandardProblems::handmade_problem(Task::syntax_tree(), _p_(PM_LookedUpForbiddenProperty));
    Problems::issue_problem_segment(
@@ -3518,11 +3518,8 @@ common misunderstanding.
        "%PMaybe you intended this to produce a Unicode character? "
        "Unicode characters can be written either using their decimal "
        "numbers - for instance, 'Unicode 2041' - or with their standard "
-        "names - 'Unicode Latin small ligature oe'. For efficiency reasons "
-        "these names are only available if you ask for them; to make them "
-        "available, you need to 'Include Unicode Character Names by Graham "
-        "Nelson' or, if you really need more, 'Include Unicode Full "
-        "Character Names by Graham Nelson'.");
+        "names - 'Unicode Latin small ligature oe'. For the full list of "
+        "those names, see the Unicode standard version 15.0.0.");
    Problems::issue_problem_end();