mirror of
https://github.com/ganelson/inform.git
synced 2024-07-09 02:24:21 +03:00
1867 lines
362 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>RegExp Template</title>
<link href="../docs-assets/Breadcrumbs.css" rel="stylesheet" rev="stylesheet" type="text/css">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="en-gb">
<link href="../docs-assets/Contents.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Progress.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Navigation.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Fonts.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Base.css" rel="stylesheet" rev="stylesheet" type="text/css">
<script>
MathJax = {
tex: {
inlineMath: [['$', '$'], ['\\(', '\\)']]
},
svg: {
fontCache: 'global'
}
};
</script>
<script type="text/javascript" id="MathJax-script" async
src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js">
</script>
<link href="../docs-assets/Colours.css" rel="stylesheet" rev="stylesheet" type="text/css">
</head>
<body class="commentary-font">
<nav role="navigation">
<h1><a href="../index.html">
<img src="../docs-assets/Inform.png" height="72">
</a></h1>
<ul><li><a href="../compiler.html">compiler tools</a></li>
<li><a href="../other.html">other tools</a></li>
<li><a href="../extensions.html">extensions and kits</a></li>
<li><a href="../units.html">unit test tools</a></li>
</ul><h2>Extensions</h2><ul>
<li><a href="../basic_inform/index.html">basic_inform</a></li>
<li><a href="../standard_rules/index.html">standard_rules</a></li>
</ul><h2>Kits</h2><ul>
<li><a href="index.html"><span class="selectedlink">BasicInformKit</span></a></li>
<li><a href="../BasicInformExtrasKit/index.html">BasicInformExtrasKit</a></li>
<li><a href="../CommandParserKit/index.html">CommandParserKit</a></li>
<li><a href="../EnglishLanguageKit/index.html">EnglishLanguageKit</a></li>
<li><a href="../WorldModelKit/index.html">WorldModelKit</a></li>
</ul>
</nav>
<main role="main">
<!--Weave of 'RegExp Template' generated by Inweb-->
<div class="breadcrumbs">
<ul class="crumbs"><li><a href="../index.html">Home</a></li><li><a href="../extensions.html">Kits</a></li><li><a href="index.html">BasicInformKit</a></li><li><b>RegExp Template</b></li></ul></div>
<p class="purpose">Code to match and replace on regular expressions against indexed text strings.</p>
<ul class="toc"><li><a href="S-rgx.html#SP1">§1. Debugging</a></li><li><a href="S-rgx.html#SP2">§2. Algorithm</a></li><li><a href="S-rgx.html#SP3">§3. Class Codes</a></li><li><a href="S-rgx.html#SP4">§4. Packets</a></li><li><a href="S-rgx.html#SP5">§5. Nodes</a></li><li><a href="S-rgx.html#SP6">§6. Match Variables</a></li><li><a href="S-rgx.html#SP7">§7. Markers</a></li><li><a href="S-rgx.html#SP8">§8. Debugging</a></li><li><a href="S-rgx.html#SP9">§9. Compiling Tree For Substring Search</a></li><li><a href="S-rgx.html#SP10">§10. Compiling Tree For Regexp Search</a></li><li><a href="S-rgx.html#SP11">§11. Parser</a></li><li><a href="S-rgx.html#SP12">§12. Parse At Position</a></li><li><a href="S-rgx.html#SP13">§13. Backtracking</a></li><li><a href="S-rgx.html#SP14">§14. Fail Subexpressions</a></li><li><a href="S-rgx.html#SP15">§15. Erasing Constraints</a></li><li><a href="S-rgx.html#SP16">§16. Matching Literal Text</a></li><li><a href="S-rgx.html#SP17">§17. Matching Character Range</a></li><li><a href="S-rgx.html#SP18">§18. Search And Replace</a></li><li><a href="S-rgx.html#SP19">§19. Concatenation</a></li></ul><hr class="tocbar">
<p class="commentary firstcommentary"><a id="SP1" class="paragraph-anchor"></a><b>§1. Debugging. </b>Set this to true at your peril.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="identifier-syntax">Global</span><span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax"> = </span><span class="reserved-syntax">false</span><span class="plain-syntax">; </span><span class="comment-syntax">Change to true for (a lot of) debugging data in use</span>
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_RE_SetTrace</span><span class="plain-syntax"> </span><span class="identifier-syntax">F</span><span class="plain-syntax">; </span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax"> = </span><span class="identifier-syntax">F</span><span class="plain-syntax">; ];</span>
</pre>
<p class="commentary firstcommentary"><a id="SP2" class="paragraph-anchor"></a><b>§2. Algorithm. </b>Once Inform 7 supported (indexed) text, regular-expression matching became an
obvious goal: regexp-based features offer more or less the gold standard in
text search and replace facilities, and I7 is so concerned with text that
we shouldn't make do with less. But the best and most portable
implementation of regular expression matching, PCRE by Philip Hazel, is
about a hundred times larger than the code in this section, and also had
unacceptable memory needs: there was no practicable way to make it small
enough to do useful work on the Z-machine. Nor could an I6 regexp-matcher
compile just-in-time code, or translate the expression into a suitable
deterministic finite state machine. One day, though, I read one of the
papers which Brian Kernighan writes every few years to the effect that
regular-expression matching is much easier than you think. Kernighan is
right: writing a regexp matcher is indeed easier than you think (one day's
worth of cheerful hacking), but debugging one until it passes the trickiest
hundred of Perl's 645 test cases is another matter (and it took a whole
week more). Still, the result seems to be robust. The main compromise made
is that backtracking is not always comprehensive with regexps like
<span class="extract"><span class="extract-syntax">^(a\1?){4}$</span></span>, because we do not allocate separate storage to backtrack
independently through all possibilities of each of the four uses of the
bracketed subexpression. This means we miss some cases, since the
subexpression contains a reference to itself, so that its content can vary
in the four uses. PCRE's approach here is to expand the expression as if it
were a sequence of four bracketed expressions, thus removing the awkward
quantifier <span class="extract"><span class="extract-syntax">{4}</span></span>, but that costs memory: indeed this is why PCRE cannot
solve all of Perl's test cases without its default memory allocation being
raised. In other respects, the algorithm below appears to be accurate if
not very fast.
</p>
<p class="commentary firstcommentary"><a id="SP3" class="paragraph-anchor"></a><b>§3. Class Codes. </b>While in principle we could keep the match expression in textual form, in
practice the syntax of regular expressions is complex enough that this
would be tricky and rather slow: we would be parsing the same notations over
and over again. So we begin by compiling it to a simple tree structure. The
tree is made up of nodes, and each node has a "class code": these are
identified by the <span class="extract"><span class="extract-syntax">*_RE_CC</span></span> constants below. Note that the class codes
below are all negative: this is so that they are distinct from all valid
ZSCII or Unicode characters. (ZSCII is used only on the Z-machine, which
has a 16-bit word but an 8-bit character set, so that all character values
are positive; similarly, Unicode is (for our purposes) a 16-bit character
set on a 32-bit virtual machine.)
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="comment-syntax">Character classes</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">NEWLINE_RE_CC</span><span class="plain-syntax"> = -1;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">TAB_RE_CC</span><span class="plain-syntax"> = -2;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">DIGIT_RE_CC</span><span class="plain-syntax"> = -3;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">NONDIGIT_RE_CC</span><span class="plain-syntax"> = -4;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">WHITESPACE_RE_CC</span><span class="plain-syntax"> = -5;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">NONWHITESPACE_RE_CC</span><span class="plain-syntax"> = -6;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">PUNCTUATION_RE_CC</span><span class="plain-syntax"> = -7;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">NONPUNCTUATION_RE_CC</span><span class="plain-syntax"> = -8;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">WORD_RE_CC</span><span class="plain-syntax"> = -9;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">NONWORD_RE_CC</span><span class="plain-syntax"> = -10;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">ANYTHING_RE_CC</span><span class="plain-syntax"> = -11;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">NOTHING_RE_CC</span><span class="plain-syntax"> = -12;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">RANGE_RE_CC</span><span class="plain-syntax"> = -13;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">LCASE_RE_CC</span><span class="plain-syntax"> = -14;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">NONLCASE_RE_CC</span><span class="plain-syntax"> = -15;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">UCASE_RE_CC</span><span class="plain-syntax"> = -16;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">NONUCASE_RE_CC</span><span class="plain-syntax"> = -17;</span>
<span class="comment-syntax">Control structures</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">SUBEXP_RE_CC</span><span class="plain-syntax"> = -20;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">DISJUNCTION_RE_CC</span><span class="plain-syntax"> = -21;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">CHOICE_RE_CC</span><span class="plain-syntax"> = -22;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">QUANTIFIER_RE_CC</span><span class="plain-syntax"> = -23;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">IF_RE_CC</span><span class="plain-syntax"> = -24;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">CONDITION_RE_CC</span><span class="plain-syntax"> = -25;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">THEN_RE_CC</span><span class="plain-syntax"> = -26;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">ELSE_RE_CC</span><span class="plain-syntax"> = -27;</span>
<span class="comment-syntax">Substring matchers</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">VARIABLE_RE_CC</span><span class="plain-syntax"> = -30;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">LITERAL_RE_CC</span><span class="plain-syntax"> = -31;</span>
<span class="comment-syntax">Positional matchers</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">START_RE_CC</span><span class="plain-syntax"> = -40;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">END_RE_CC</span><span class="plain-syntax"> = -41;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">BOUNDARY_RE_CC</span><span class="plain-syntax"> = -42;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">NONBOUNDARY_RE_CC</span><span class="plain-syntax"> = -43;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">ALWAYS_RE_CC</span><span class="plain-syntax"> = -44;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">NEVER_RE_CC</span><span class="plain-syntax"> = -45;</span>
<span class="comment-syntax">Mode switches</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">SENSITIVITY_RE_CC</span><span class="plain-syntax"> = -50;</span>
</pre>
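<p class="commentary">Because every class code is negative, a single sign test is enough to tell a
class code from a literal character value. The routine below is only a sketch of
that idea, not code from the kit: the name <span class="extract"><span class="extract-syntax">ExampleIsClassCode</span></span> is hypothetical,
and whether the matcher performs the test in this form is an assumption.
</p>

<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax">[ ExampleIsClassCode v;</span>
<span class="plain-syntax">    if (v >= 0) rfalse; </span><span class="comment-syntax">! Zero or positive: could be a ZSCII or Unicode character</span>
<span class="plain-syntax">    rtrue; </span><span class="comment-syntax">! Negative: cannot be a character, so treat it as a class code</span>
<span class="plain-syntax">];</span>
</pre>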
<p class="commentary firstcommentary"><a id="SP4" class="paragraph-anchor"></a><b>§4. Packets. </b>The nodes of the compiled expression tree are stored in "packets", which
are segments of a fixed array. A regexp complicated enough that it cannot
be stored in <span class="extract"><span class="extract-syntax">RE_MAX_PACKETS</span></span> packets will be rejected with an error: this
limit looks rather low, but in fact suffices to handle all of Perl's
test cases, some of which are works of diabolism.
</p>
<p class="commentary">A packet is then a record containing 14 fields, at the offsets given by the
constants below. These fields combine the compilation of the
corresponding fragment of the regexp with both the tree structure holding
these packets together and also the current state of the temporary variables
recording how far we have progressed in trying all of the possible ways to
match the packet.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">RE_MAX_PACKETS</span><span class="plain-syntax"> = </span><span class="constant-syntax">32</span><span class="plain-syntax">;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">RE_PACKET_SIZE</span><span class="plain-syntax"> = </span><span class="constant-syntax">14</span><span class="plain-syntax">; </span><span class="comment-syntax">Words of memory used per packet</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">RE_PACKET_SIZE_IN_BYTES</span><span class="plain-syntax"> = </span><span class="identifier-syntax">WORDSIZE</span><span class="plain-syntax">*</span><span class="identifier-syntax">RE_PACKET_SIZE</span><span class="plain-syntax">; </span><span class="comment-syntax">Bytes used per packet</span>
<span class="reserved-syntax">Array</span><span class="plain-syntax"> </span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax"> --> </span><span class="identifier-syntax">RE_MAX_PACKETS</span><span class="plain-syntax">*</span><span class="identifier-syntax">RE_PACKET_SIZE</span><span class="plain-syntax">;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">RE_CCLASS</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">; </span><span class="comment-syntax">One of the class codes defined above</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">RE_PAR1</span><span class="plain-syntax"> = </span><span class="constant-syntax">1</span><span class="plain-syntax">; </span><span class="comment-syntax">Three parameters whose meaning depends on class code</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">RE_PAR2</span><span class="plain-syntax"> = </span><span class="constant-syntax">2</span><span class="plain-syntax">;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">RE_PAR3</span><span class="plain-syntax"> = </span><span class="constant-syntax">3</span><span class="plain-syntax">;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">RE_NEXT</span><span class="plain-syntax"> = </span><span class="constant-syntax">4</span><span class="plain-syntax">; </span><span class="comment-syntax">Younger sibling in the compiled tree</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">RE_PREVIOUS</span><span class="plain-syntax"> = </span><span class="constant-syntax">5</span><span class="plain-syntax">; </span><span class="comment-syntax">Elder sibling</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax"> = </span><span class="constant-syntax">6</span><span class="plain-syntax">; </span><span class="comment-syntax">Child</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">RE_UP</span><span class="plain-syntax"> = </span><span class="constant-syntax">7</span><span class="plain-syntax">; </span><span class="comment-syntax">Parent</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax"> = </span><span class="constant-syntax">8</span><span class="plain-syntax">; </span><span class="comment-syntax">Backtracking data</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax"> = </span><span class="constant-syntax">9</span><span class="plain-syntax">;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">RE_CONSTRAINT</span><span class="plain-syntax"> = </span><span class="constant-syntax">10</span><span class="plain-syntax">;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">RE_CACHE1</span><span class="plain-syntax"> = </span><span class="constant-syntax">11</span><span class="plain-syntax">;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">RE_CACHE2</span><span class="plain-syntax"> = </span><span class="constant-syntax">12</span><span class="plain-syntax">;</span>
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">RE_MODES</span><span class="plain-syntax"> = </span><span class="constant-syntax">13</span><span class="plain-syntax">;</span>
</pre>
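<p class="commentary">For a sense of scale, the array holds 32 × 14 = 448 words: 896 bytes where
<span class="extract"><span class="extract-syntax">WORDSIZE</span></span> is 2 (the Z-machine) and 1792 bytes where it is 4. The sketch below
shows how one field of one packet could be read using the constants above. It
is illustrative only: the routine name <span class="extract"><span class="extract-syntax">ExamplePacketField</span></span> is hypothetical and
does not appear in the kit.
</p>

<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax">[ ExamplePacketField n field;</span>
<span class="plain-syntax">    </span><span class="comment-syntax">! Packet n begins n*RE_PACKET_SIZE words into the array;</span>
<span class="plain-syntax">    </span><span class="comment-syntax">! field is one of the offset constants, such as RE_CCLASS or RE_NEXT</span>
<span class="plain-syntax">    return RE_PACKET_space-->(n*RE_PACKET_SIZE + field);</span>
<span class="plain-syntax">];</span>
</pre>

<p class="commentary">For instance, <span class="extract"><span class="extract-syntax">ExamplePacketField(3, RE_CCLASS)</span></span> would read the class code of
packet 3. The sketch makes no range check, so the caller would have to keep
<span class="extract"><span class="extract-syntax">n</span></span> between 0 and <span class="extract"><span class="extract-syntax">RE_MAX_PACKETS-1</span></span>.
</p>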
<p class="commentary firstcommentary"><a id="SP5" class="paragraph-anchor"></a><b>§5. Nodes. </b>Two routines follow: one creates a node, something which happens only during the
compilation phase, and the other returns the address of a given
node. Nodes are numbered from 0 up to \(M-1\), where \(M\) is the constant
<span class="extract"><span class="extract-syntax">RE_MAX_PACKETS</span></span>.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_RE_Node</span><span class="plain-syntax"> </span><span class="identifier-syntax">n</span><span class="plain-syntax"> </span><span class="identifier-syntax">cc</span><span class="plain-syntax"> </span><span class="identifier-syntax">par1</span><span class="plain-syntax"> </span><span class="identifier-syntax">par2</span><span class="plain-syntax"> </span><span class="identifier-syntax">par3</span><span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">n</span><span class="plain-syntax"><0) || (</span><span class="identifier-syntax">n</span><span class="plain-syntax"> >= </span><span class="identifier-syntax">RE_MAX_PACKETS</span><span class="plain-syntax">)) </span><span class="reserved-syntax">rfalse</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax"> = </span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax"> + </span><span class="identifier-syntax">n</span><span class="plain-syntax">*</span><span class="identifier-syntax">RE_PACKET_SIZE_IN_BYTES</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CCLASS</span><span class="plain-syntax"> = </span><span class="identifier-syntax">cc</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR1</span><span class="plain-syntax"> = </span><span class="identifier-syntax">par1</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR2</span><span class="plain-syntax"> = </span><span class="identifier-syntax">par2</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR3</span><span class="plain-syntax"> = </span><span class="identifier-syntax">par3</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_NEXT</span><span class="plain-syntax"> = </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PREVIOUS</span><span class="plain-syntax"> = </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax"> = </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_UP</span><span class="plain-syntax"> = </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax"> = -1; </span><span class="comment-syntax">Match start</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax"> = -1; </span><span class="comment-syntax">Match end</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CONSTRAINT</span><span class="plain-syntax"> = -1; </span><span class="comment-syntax">Rewind edge</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax">;</span>
<span class="plain-syntax">];</span>
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_RE_NodeAddress</span><span class="plain-syntax"> </span><span class="identifier-syntax">n</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">n</span><span class="plain-syntax"><0) || (</span><span class="identifier-syntax">n</span><span class="plain-syntax"> >= </span><span class="identifier-syntax">RE_MAX_PACKETS</span><span class="plain-syntax">)) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> -1;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax"> + </span><span class="identifier-syntax">n</span><span class="plain-syntax">*</span><span class="identifier-syntax">RE_PACKET_SIZE_IN_BYTES</span><span class="plain-syntax">;</span>
<span class="plain-syntax">];</span>
</pre>
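<p class="commentary">As an illustration of how these routines and the link fields of a packet fit
together, the sketch below creates three nodes and joins them into a parent
with two children. It is not taken from the kit: the routine name
<span class="extract"><span class="extract-syntax">ExampleBuildTree</span></span> is hypothetical, the parameter values of 0 are placeholders
(their meaning depends on the class code), and whether the kit's own compiler
sets every link in exactly this way is an assumption.
</p>

<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax">[ ExampleBuildTree top elder younger;</span>
<span class="plain-syntax">    top = TEXT_TY_RE_Node(0, SUBEXP_RE_CC, 0, 0, 0);</span>
<span class="plain-syntax">    elder = TEXT_TY_RE_Node(1, DIGIT_RE_CC, 0, 0, 0);</span>
<span class="plain-syntax">    younger = TEXT_TY_RE_Node(2, END_RE_CC, 0, 0, 0);</span>
<span class="plain-syntax">    top-->RE_DOWN = elder; </span><span class="comment-syntax">! Parent points to its first child</span>
<span class="plain-syntax">    elder-->RE_UP = top; </span><span class="comment-syntax">! Each child points back to the parent</span>
<span class="plain-syntax">    younger-->RE_UP = top;</span>
<span class="plain-syntax">    elder-->RE_NEXT = younger; </span><span class="comment-syntax">! Siblings are linked in both directions</span>
<span class="plain-syntax">    younger-->RE_PREVIOUS = elder;</span>
<span class="plain-syntax">    return top;</span>
<span class="plain-syntax">];</span>
</pre>

<p class="commentary">Note that, as written above, <span class="extract"><span class="extract-syntax">TEXT_TY_RE_Node</span></span> returns false for an out-of-range
node number whereas <span class="extract"><span class="extract-syntax">TEXT_TY_RE_NodeAddress</span></span> returns -1, so a caller checking for
failure would need to test for the value appropriate to each routine.
</p>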
<p class="commentary firstcommentary"><a id="SP6" class="paragraph-anchor"></a><b>§6. Match Variables. </b>A bracketed subexpression can be used as a variable: we support <span class="extract"><span class="extract-syntax">\1</span></span>, ..., <span class="extract"><span class="extract-syntax">\9</span></span>
to mean "the value of subexpression 1 to 9", and <span class="extract"><span class="extract-syntax">\0</span></span> to mean "the whole
text matched", as if the entire regexp were bracketed. (PCRE and Perl also
allow <span class="extract"><span class="extract-syntax">\10</span></span>, <span class="extract"><span class="extract-syntax">\11</span></span>, ..., but we don't, because it complicates parsing and
memory is too short.)
</p>
<p class="commentary"><span class="extract"><span class="extract-syntax">RE_Subexpressions-->10</span></span> stores the number of subexpressions in use, not
counting <span class="extract"><span class="extract-syntax">\0</span></span>. During the compiling stage, <span class="extract"><span class="extract-syntax">RE_Subexpressions-->N</span></span> is set
to point to the node representing <span class="extract"><span class="extract-syntax">\N</span></span>, where <span class="extract"><span class="extract-syntax">N</span></span> varies from 1 to 9.
When matching is complete, and assuming we care about the contents of
these variables (which we might not, and if not we certainly don't want
to waste time and memory), we call <span class="extract"><span class="extract-syntax">TEXT_TY_RE_CreateMatchVars</span></span> to allocate
text variables and fill them in as appropriate, memory permitting.
</p>
<p class="commentary"><span class="extract"><span class="extract-syntax">TEXT_TY_RE_EmptyMatchVars</span></span> empties any such variables which may survive from
a previous match, setting them to the empty text.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">Array</span><span class="plain-syntax"> </span><span class="identifier-syntax">RE_Subexpressions</span><span class="plain-syntax"> --> </span><span class="constant-syntax">11</span><span class="plain-syntax">; </span><span class="comment-syntax">Address of node for this subexpression</span>
<span class="reserved-syntax">Array</span><span class="plain-syntax"> </span><span class="identifier-syntax">Allocated_Match_Vars</span><span class="plain-syntax"> --> </span><span class="constant-syntax">10</span><span class="plain-syntax">; </span><span class="comment-syntax">Indexed text to hold values of the variables</span>
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_RE_DebugMatchVars</span><span class="plain-syntax"> </span><span class="identifier-syntax">txt</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax"> </span><span class="identifier-syntax">n</span><span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="identifier-syntax">RE_Subexpressions</span><span class="plain-syntax">-->10, </span><span class="string-syntax">" collecting subexps^"</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">n</span><span class="plain-syntax">=0:(</span><span class="identifier-syntax">n</span><span class="plain-syntax"><</span><span class="identifier-syntax">RE_Subexpressions</span><span class="plain-syntax">-->10) && (</span><span class="identifier-syntax">n</span><span class="plain-syntax"><10): </span><span class="identifier-syntax">n</span><span class="plain-syntax">++) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax"> = </span><span class="identifier-syntax">RE_Subexpressions</span><span class="plain-syntax">--></span><span class="identifier-syntax">n</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"Subexp "</span><span class="plain-syntax">, </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR1</span><span class="plain-syntax">,</span>
<span class="plain-syntax"> </span><span class="string-syntax">" = ["</span><span class="plain-syntax">, </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax">, </span><span class="string-syntax">","</span><span class="plain-syntax">, </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax">, </span><span class="string-syntax">"] = "</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax">=</span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax">:</span><span class="identifier-syntax">i</span><span class="plain-syntax"><</span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax">:</span><span class="identifier-syntax">i</span><span class="plain-syntax">++)</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> (</span><span class="identifier-syntax">char</span><span class="plain-syntax">) </span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"^"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax">];</span>
|
|
|
|
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_RE_CreateMatchVars</span><span class="plain-syntax"> </span><span class="identifier-syntax">txt</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax"> </span><span class="identifier-syntax">n</span><span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax"> </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> </span><span class="identifier-syntax">ctxt</span><span class="plain-syntax"> </span><span class="identifier-syntax">cl</span><span class="plain-syntax"> </span><span class="identifier-syntax">csize</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">n</span><span class="plain-syntax">=0:(</span><span class="identifier-syntax">n</span><span class="plain-syntax"><</span><span class="identifier-syntax">RE_Subexpressions</span><span class="plain-syntax">-->10) && (</span><span class="identifier-syntax">n</span><span class="plain-syntax"><10): </span><span class="identifier-syntax">n</span><span class="plain-syntax">++) {</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax"> = </span><span class="identifier-syntax">RE_Subexpressions</span><span class="plain-syntax">--></span><span class="identifier-syntax">n</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">Allocated_Match_Vars</span><span class="plain-syntax">--></span><span class="identifier-syntax">n</span><span class="plain-syntax">) </span><span class="identifier-syntax">BlkValueFree</span><span class="plain-syntax">(</span><span class="identifier-syntax">Allocated_Match_Vars</span><span class="plain-syntax">--></span><span class="identifier-syntax">n</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">Allocated_Match_Vars</span><span class="plain-syntax">--></span><span class="identifier-syntax">n</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueCreate</span><span class="plain-syntax">(</span><span class="identifier-syntax">TEXT_TY</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_Transmute</span><span class="plain-syntax">(</span><span class="identifier-syntax">Allocated_Match_Vars</span><span class="plain-syntax">--></span><span class="identifier-syntax">n</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">ctxt</span><span class="plain-syntax"> = </span><span class="identifier-syntax">Allocated_Match_Vars</span><span class="plain-syntax">--></span><span class="identifier-syntax">n</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">csize</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueLBCapacity</span><span class="plain-syntax">(</span><span class="identifier-syntax">ctxt</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">cl</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax">=</span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax">:</span><span class="identifier-syntax">i</span><span class="plain-syntax"><</span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax">:</span><span class="identifier-syntax">i</span><span class="plain-syntax">++) {</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">cl</span><span class="plain-syntax">+1 >= </span><span class="identifier-syntax">csize</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">BlkValueSetLBCapacity</span><span class="plain-syntax">(</span><span class="identifier-syntax">ctxt</span><span class="plain-syntax">, </span><span class="constant-syntax">2</span><span class="plain-syntax">*</span><span class="identifier-syntax">cl</span><span class="plain-syntax">) == </span><span class="reserved-syntax">false</span><span class="plain-syntax">) </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">csize</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueLBCapacity</span><span class="plain-syntax">(</span><span class="identifier-syntax">ctxt</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">BlkValueWrite</span><span class="plain-syntax">(</span><span class="identifier-syntax">ctxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">cl</span><span class="plain-syntax">++, </span><span class="identifier-syntax">ch</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">BlkValueWrite</span><span class="plain-syntax">(</span><span class="identifier-syntax">ctxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">cl</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax">];</span>
|
|
|
|
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_RE_EmptyMatchVars</span><span class="plain-syntax"> </span><span class="identifier-syntax">txt</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">n</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">n</span><span class="plain-syntax">=0:((</span><span class="identifier-syntax">n</span><span class="plain-syntax"><</span><span class="identifier-syntax">RE_Subexpressions</span><span class="plain-syntax">-->10) && (</span><span class="identifier-syntax">n</span><span class="plain-syntax"><10)): </span><span class="identifier-syntax">n</span><span class="plain-syntax">++)</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">Allocated_Match_Vars</span><span class="plain-syntax">--></span><span class="identifier-syntax">n</span><span class="plain-syntax"> ~= </span><span class="constant-syntax">0</span><span class="plain-syntax">)</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">BlkValueWrite</span><span class="plain-syntax">(</span><span class="identifier-syntax">Allocated_Match_Vars</span><span class="plain-syntax">--></span><span class="identifier-syntax">n</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax">];</span>
|
|
|
|
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_RE_GetMatchVar</span><span class="plain-syntax"> </span><span class="identifier-syntax">vn</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">vn</span><span class="plain-syntax"><0) || (</span><span class="identifier-syntax">vn</span><span class="plain-syntax">>=10) || (</span><span class="identifier-syntax">vn</span><span class="plain-syntax"> >= </span><span class="identifier-syntax">RE_Subexpressions</span><span class="plain-syntax">-->10)) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">EMPTY_TEXT_VALUE</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax"> = </span><span class="identifier-syntax">RE_Subexpressions</span><span class="plain-syntax">--></span><span class="identifier-syntax">vn</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">offset</span><span class="plain-syntax"> == </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">EMPTY_TEXT_VALUE</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax"> < </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">EMPTY_TEXT_VALUE</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">Allocated_Match_Vars</span><span class="plain-syntax">--></span><span class="identifier-syntax">vn</span><span class="plain-syntax"> == </span><span class="constant-syntax">0</span><span class="plain-syntax">) {</span>
|
|
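<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"*** "</span><span class="plain-syntax">, </span><span class="identifier-syntax">vn</span><span class="plain-syntax">, </span><span class="string-syntax">" unallocated! ***^"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">EMPTY_TEXT_VALUE</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">Allocated_Match_Vars</span><span class="plain-syntax">--></span><span class="identifier-syntax">vn</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax">];</span>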
|
|
</pre>
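<p class="commentary">As a rough illustration of how the routines above fit together, the following sketch lists every match variable currently available. It is not part of the kit: the name <span class="extract"><span class="extract-syntax">ShowMatchVars</span></span> is hypothetical, and the sketch assumes it is compiled alongside the routines above and called after <span class="extract"><span class="extract-syntax">TEXT_TY_RE_CreateMatchVars</span></span> has run on a successful match. If the variables have not been created, <span class="extract"><span class="extract-syntax">TEXT_TY_RE_GetMatchVar</span></span> prints its "unallocated" diagnostic and returns the empty text instead.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax">[ ShowMatchVars n;</span>
<span class="comment-syntax">    ! Hypothetical helper: same loop bounds as TEXT_TY_RE_CreateMatchVars</span>
<span class="plain-syntax">    for (n=0: (n < RE_Subexpressions-->10) && (n < 10): n++) {</span>
<span class="plain-syntax">        print "Match variable ", n, ": ";</span>
<span class="comment-syntax">        ! GetMatchVar returns EMPTY_TEXT_VALUE when nothing was captured</span>
<span class="plain-syntax">        TEXT_TY_Say(TEXT_TY_RE_GetMatchVar(n));</span>
<span class="plain-syntax">        print "^";</span>
<span class="plain-syntax">    }</span>
<span class="plain-syntax">];</span>
</pre>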
|
|
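<p class="commentary firstcommentary"><a id="SP7" class="paragraph-anchor"></a><b>§7. Markers. </b>At each node, the <span class="extract"><span class="extract-syntax">-->RE_DATA1</span></span> and <span class="extract"><span class="extract-syntax">-->RE_DATA2</span></span> fields hold the character positions of the start and end of the text matched by the node and its subtree (if any). These are called markers. The end position is exclusive: the copying loops above run while the index is less than <span class="extract"><span class="extract-syntax">-->RE_DATA2</span></span>. A marker of -1 means "not set": <span class="extract"><span class="extract-syntax">TEXT_TY_RE_Clear_Markers</span></span> resets both markers, together with <span class="extract"><span class="extract-syntax">-->RE_CONSTRAINT</span></span>, to -1 throughout a subtree.
</p>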
|
|
|
|
<p class="commentary">Thus <span class="extract"><span class="extract-syntax">TEXT_TY_MV_End(N, 0)</span></span> returns the start of <span class="extract"><span class="extract-syntax">\N</span></span> and <span class="extract"><span class="extract-syntax">TEXT_TY_MV_End(N, 1)</span></span> the end of <span class="extract"><span class="extract-syntax">\N</span></span>, according to the current match of subexpression <span class="extract"><span class="extract-syntax">N</span></span>.
</p>
|
|
|
|
<pre class="displayed-code all-displayed-code code-font">
|
|
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_MV_End</span><span class="plain-syntax"> </span><span class="identifier-syntax">n</span><span class="plain-syntax"> </span><span class="identifier-syntax">end</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax"> = </span><span class="identifier-syntax">RE_Subexpressions</span><span class="plain-syntax">--></span><span class="identifier-syntax">n</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">end</span><span class="plain-syntax">==0) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax">];</span>
|
|
|
|
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_RE_Clear_Markers</span><span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (: </span><span class="identifier-syntax">token</span><span class="plain-syntax"> ~= </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">: </span><span class="identifier-syntax">token</span><span class="plain-syntax"> = </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_NEXT</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax"> ~= </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) </span><span class="identifier-syntax">TEXT_TY_RE_Clear_Markers</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax"> = -1;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax"> = -1;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CONSTRAINT</span><span class="plain-syntax"> = -1;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax">];</span>
|
|
</pre>
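<p class="commentary">The markers can be read back with a few lines of code. The routine below is only a sketch under stated assumptions, not part of the kit: the name <span class="extract"><span class="extract-syntax">PrintMarkedSpan</span></span> and its locals are hypothetical, and it assumes that <span class="extract"><span class="extract-syntax">txt</span></span> is the text the regexp was last matched against. Since <span class="extract"><span class="extract-syntax">TEXT_TY_MV_End</span></span> does not check its argument, the sketch applies the same range test as <span class="extract"><span class="extract-syntax">TEXT_TY_RE_GetMatchVar</span></span> before calling it.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax">[ PrintMarkedSpan txt n p1 p2 i;</span>
<span class="comment-syntax">    ! Hypothetical helper: reject subexpression numbers that are out of range</span>
<span class="plain-syntax">    if ((n < 0) || (n >= 10) || (n >= RE_Subexpressions-->10)) rfalse;</span>
<span class="plain-syntax">    p1 = TEXT_TY_MV_End(n, 0);</span>
<span class="plain-syntax">    p2 = TEXT_TY_MV_End(n, 1);</span>
<span class="comment-syntax">    ! Cleared markers hold -1, in which case there is nothing to print</span>
<span class="plain-syntax">    if ((p1 < 0) || (p2 < 0)) rfalse;</span>
<span class="comment-syntax">    ! The end marker is exclusive, so stop just before p2</span>
<span class="plain-syntax">    for (i=p1: i < p2: i++) print (char) BlkValueRead(txt, i);</span>
<span class="plain-syntax">    rtrue;</span>
<span class="plain-syntax">];</span>
</pre>
<p class="commentary">The routine returns false when there is no span to print, so a caller can tell an unset subexpression apart from one that matched the empty text.
</p>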
|
|
<p class="commentary firstcommentary"><a id="SP8" class="paragraph-anchor"></a><b>§8. Debugging. </b>Code in this paragraph simply prints a convenient screen representation of the compiled regexp, together with the current values of its markers. Along the way it checks that the up/down and next/previous links between nodes are consistent, printing a starred warning wherever they are not. It is invaluable for debugging and, touch wood, may not be needed again; since it is relatively compact, we keep it just in case.
</p>
|
|
|
|
<pre class="displayed-code all-displayed-code code-font">
|
|
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_RE_DebugTree</span><span class="plain-syntax"> </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax"> </span><span class="identifier-syntax">detail</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"Pattern: "</span><span class="plain-syntax">, (</span><span class="identifier-syntax">TEXT_TY_Say</span><span class="plain-syntax">) </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="string-syntax">"^"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_DebugSubtree</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="constant-syntax">1</span><span class="plain-syntax">, </span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">, </span><span class="identifier-syntax">detail</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax">];</span>
|
|
|
|
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_RE_DebugSubtree</span><span class="plain-syntax"> </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax"> </span><span class="identifier-syntax">depth</span><span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax"> </span><span class="identifier-syntax">detail</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">cup</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">offset</span><span class="plain-syntax"> ~= </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">cup</span><span class="plain-syntax"> = </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_UP</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PREVIOUS</span><span class="plain-syntax"> ~= </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"*** broken initial previous ***^"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">while</span><span class="plain-syntax"> (</span><span class="identifier-syntax">offset</span><span class="plain-syntax"> ~= </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_UP</span><span class="plain-syntax"> ~= </span><span class="identifier-syntax">cup</span><span class="plain-syntax">) </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"*** broken up matching ***^"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">spaces</span><span class="plain-syntax">(</span><span class="identifier-syntax">depth</span><span class="plain-syntax">*2);</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_DebugNode</span><span class="plain-syntax">(</span><span class="identifier-syntax">offset</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">detail</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax"> ~= </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_UP</span><span class="plain-syntax"> ~= </span><span class="identifier-syntax">offset</span><span class="plain-syntax">)</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"*** broken down/up ***^"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_DebugSubtree</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">depth</span><span class="plain-syntax">+1, </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">, </span><span class="identifier-syntax">detail</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_NEXT</span><span class="plain-syntax"> ~= </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_NEXT</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_PREVIOUS</span><span class="plain-syntax"> ~= </span><span class="identifier-syntax">offset</span><span class="plain-syntax">)</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"*** broken next/previous ***^"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax"> = </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_NEXT</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax">];</span>
|
|
|
|
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_RE_DebugNode</span><span class="plain-syntax"> </span><span class="identifier-syntax">offset</span><span class="plain-syntax"> </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax"> </span><span class="identifier-syntax">detail</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax"> </span><span class="identifier-syntax">par1</span><span class="plain-syntax"> </span><span class="identifier-syntax">par2</span><span class="plain-syntax"> </span><span class="identifier-syntax">par3</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">offset</span><span class="plain-syntax"> == </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) </span><span class="string-syntax">"[NULL]"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"["</span><span class="plain-syntax">, (</span><span class="identifier-syntax">offset</span><span class="plain-syntax">-</span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">)/(</span><span class="identifier-syntax">RE_PACKET_SIZE_IN_BYTES</span><span class="plain-syntax">), </span><span class="string-syntax">"] "</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="comment-syntax">! for (i=0:i<RE_PACKET_SIZE:i++) print offset-->i, " ";</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">par1</span><span class="plain-syntax"> = </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR1</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">par2</span><span class="plain-syntax"> = </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR2</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">par3</span><span class="plain-syntax"> = </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR3</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">switch</span><span class="plain-syntax"> (</span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CCLASS</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">DIGIT_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"DIGIT"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">NONDIGIT_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"NONDIGIT"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">UCASE_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"UCASE"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">NONUCASE_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"NONUCASE"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">LCASE_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"LCASE"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">NONLCASE_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"NONLCASE"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">WHITESPACE_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"WHITESPACE"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">NONWHITESPACE_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"NONWHITESPACE"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">PUNCTUATION_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"PUNCTUATION"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">NONPUNCTUATION_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"NONPUNCTUATION"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">WORD_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"WORD"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">NONWORD_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"NONWORD"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">ALWAYS_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"ALWAYS"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">NEVER_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"NEVER"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">START_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"START"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">END_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"END"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">BOUNDARY_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"BOUNDARY"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">NONBOUNDARY_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"NONBOUNDARY"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">ANYTHING_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"ANYTHING"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">NOTHING_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"NOTHING"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">RANGE_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"RANGE"</span><span class="plain-syntax">; </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">par3</span><span class="plain-syntax"> == </span><span class="reserved-syntax">true</span><span class="plain-syntax">) </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">" (negated)"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">" "</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax">=</span><span class="identifier-syntax">par1</span><span class="plain-syntax">:</span><span class="identifier-syntax">i</span><span class="plain-syntax"><</span><span class="identifier-syntax">par2</span><span class="plain-syntax">:</span><span class="identifier-syntax">i</span><span class="plain-syntax">++) </span><span class="reserved-syntax">print</span><span class="plain-syntax"> (</span><span class="identifier-syntax">char</span><span class="plain-syntax">) </span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">VARIABLE_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"VARIABLE "</span><span class="plain-syntax">, </span><span class="identifier-syntax">par1</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">SUBEXP_RE_CC</span><span class="plain-syntax">:</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">par1</span><span class="plain-syntax"> == </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"EXP"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"SUBEXP "</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">par1</span><span class="plain-syntax"> >= </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"= V"</span><span class="plain-syntax">, </span><span class="identifier-syntax">par1</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">par2</span><span class="plain-syntax"> == </span><span class="constant-syntax">1</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">par3</span><span class="plain-syntax"> == </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">" (?=...) lookahead"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">" (?<=...) lookbehind of width "</span><span class="plain-syntax">, </span><span class="identifier-syntax">par3</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">par2</span><span class="plain-syntax"> == </span><span class="constant-syntax">2</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">par3</span><span class="plain-syntax"> == </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">" (?!...) negated lookahead"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">" (?&lt;!...) negated lookbehind of width "</span><span class="plain-syntax">, </span><span class="identifier-syntax">par3</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">par2</span><span class="plain-syntax"> == </span><span class="constant-syntax">3</span><span class="plain-syntax">) </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">" uncollecting"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">par2</span><span class="plain-syntax"> == </span><span class="constant-syntax">0</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">3</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">par3</span><span class="plain-syntax"> == </span><span class="constant-syntax">1</span><span class="plain-syntax">) </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">" forcing case sensitivity"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">par3</span><span class="plain-syntax"> == </span><span class="constant-syntax">2</span><span class="plain-syntax">) </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">" forcing case insensitivity"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">par2</span><span class="plain-syntax"> == </span><span class="constant-syntax">4</span><span class="plain-syntax">) </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">" (?>...) possessive"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">NEWLINE_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"NEWLINE"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TAB_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"TAB"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">QUANTIFIER_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"QUANTIFIER min="</span><span class="plain-syntax">, </span><span class="identifier-syntax">par1</span><span class="plain-syntax">, </span><span class="string-syntax">" max="</span><span class="plain-syntax">, </span><span class="identifier-syntax">par2</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">par3</span><span class="plain-syntax">) </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">" (lazy)"</span><span class="plain-syntax">; </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">" (greedy)"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">LITERAL_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"LITERAL"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">" "</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax">=</span><span class="identifier-syntax">par1</span><span class="plain-syntax">:</span><span class="identifier-syntax">i</span><span class="plain-syntax"><</span><span class="identifier-syntax">par2</span><span class="plain-syntax">:</span><span class="identifier-syntax">i</span><span class="plain-syntax">++) </span><span class="reserved-syntax">print</span><span class="plain-syntax"> (</span><span class="identifier-syntax">char</span><span class="plain-syntax">) </span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">DISJUNCTION_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"DISJUNCTION of "</span><span class="plain-syntax">, </span><span class="identifier-syntax">par1</span><span class="plain-syntax">, </span><span class="string-syntax">" choices"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">CHOICE_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"CHOICE no "</span><span class="plain-syntax">, </span><span class="identifier-syntax">par1</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">SENSITIVITY_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"SENSITIVITY"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">par1</span><span class="plain-syntax">) </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">" off"</span><span class="plain-syntax">; </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">" on"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">IF_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"IF"</span><span class="plain-syntax">; </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">par1</span><span class="plain-syntax"> >= </span><span class="constant-syntax">1</span><span class="plain-syntax">) </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">" = V"</span><span class="plain-syntax">, </span><span class="identifier-syntax">par1</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">CONDITION_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"CONDITION"</span><span class="plain-syntax">; </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">par1</span><span class="plain-syntax"> >= </span><span class="constant-syntax">1</span><span class="plain-syntax">) </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">" = V"</span><span class="plain-syntax">, </span><span class="identifier-syntax">par1</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">THEN_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"THEN"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">ELSE_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"ELSE"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">detail</span><span class="plain-syntax">)</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">": "</span><span class="plain-syntax">, </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax">, </span><span class="string-syntax">", "</span><span class="plain-syntax">, </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax">, </span><span class="string-syntax">", "</span><span class="plain-syntax">, </span><span class="identifier-syntax">offset</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CONSTRAINT</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"^"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax">];</span>
|
|
</pre>
|
|
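<p class="commentary firstcommentary"><a id="SP9" class="paragraph-anchor"></a><b>§9. Compiling Tree For Substring Search. </b>When we search for a literal substring, say for "per" in "Supernumerary", we use the same apparatus as when searching for a regular expression: we compile a very simple node tree in which the root, <span class="extract"><span class="extract-syntax">\0</span></span>, has just one child node, a <span class="extract"><span class="extract-syntax">LITERAL_RE_CC</span></span> matching exactly the text "per". We return 2, since that is the number of nodes in the tree. If <span class="extract"><span class="extract-syntax">exactly</span></span> is set, three further nodes are used to wrap the literal between a <span class="extract"><span class="extract-syntax">START_RE_CC</span></span> and an <span class="extract"><span class="extract-syntax">END_RE_CC</span></span> node, so that it must match the whole text rather than a substring of it.
</p>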
|
|
|
|
<pre class="displayed-code all-displayed-code code-font">
|
|
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_CHR_CompileTree</span><span class="plain-syntax"> </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax"> </span><span class="identifier-syntax">exactly</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">root</span><span class="plain-syntax"> </span><span class="identifier-syntax">literal</span><span class="plain-syntax"> </span><span class="identifier-syntax">fto</span><span class="plain-syntax"> </span><span class="identifier-syntax">no_packets</span><span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax"> </span><span class="identifier-syntax">attach_to</span><span class="plain-syntax">;</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">fto</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_CharacterLength</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">);</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">root</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_RE_Node</span><span class="plain-syntax">(0, </span><span class="identifier-syntax">SUBEXP_RE_CC</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">literal</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_RE_Node</span><span class="plain-syntax">(1, </span><span class="identifier-syntax">LITERAL_RE_CC</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">, </span><span class="identifier-syntax">fto</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">);</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">root</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax"> = </span><span class="identifier-syntax">literal</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">literal</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_UP</span><span class="plain-syntax"> = </span><span class="identifier-syntax">root</span><span class="plain-syntax">;</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">exactly</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">no_packets</span><span class="plain-syntax"> = </span><span class="constant-syntax">2</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">no_packets</span><span class="plain-syntax">+3 > </span><span class="identifier-syntax">RE_MAX_PACKETS</span><span class="plain-syntax">) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="string-syntax">"regexp too complex"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">exactly</span><span class="plain-syntax"> = </span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_RE_Node</span><span class="plain-syntax">(</span><span class="identifier-syntax">no_packets</span><span class="plain-syntax">++, </span><span class="identifier-syntax">START_RE_CC</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax"> = </span><span class="identifier-syntax">token</span><span class="plain-syntax">; </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_UP</span><span class="plain-syntax"> = </span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">attach_to</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_RE_Node</span><span class="plain-syntax">(</span><span class="identifier-syntax">no_packets</span><span class="plain-syntax">++, </span><span class="identifier-syntax">SUBEXP_RE_CC</span><span class="plain-syntax">, -1, </span><span class="constant-syntax">3</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_NEXT</span><span class="plain-syntax"> = </span><span class="identifier-syntax">attach_to</span><span class="plain-syntax">; </span><span class="identifier-syntax">attach_to</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PREVIOUS</span><span class="plain-syntax"> = </span><span class="identifier-syntax">token</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">attach_to</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_UP</span><span class="plain-syntax"> = </span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">attach_to</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_NEXT</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_RE_Node</span><span class="plain-syntax">(</span><span class="identifier-syntax">no_packets</span><span class="plain-syntax">++, </span><span class="identifier-syntax">END_RE_CC</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> (</span><span class="identifier-syntax">attach_to</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_NEXT</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_PREVIOUS</span><span class="plain-syntax"> = </span><span class="identifier-syntax">attach_to</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> (</span><span class="identifier-syntax">attach_to</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_NEXT</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_UP</span><span class="plain-syntax"> = </span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">attach_to</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax"> = </span><span class="identifier-syntax">exactly</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">while</span><span class="plain-syntax"> (</span><span class="identifier-syntax">exactly</span><span class="plain-syntax"> ~= </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">exactly</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_UP</span><span class="plain-syntax"> = </span><span class="identifier-syntax">attach_to</span><span class="plain-syntax">; </span><span class="identifier-syntax">exactly</span><span class="plain-syntax"> = </span><span class="identifier-syntax">exactly</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_NEXT</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> }</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">no_packets</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_RE_ExpandChoices</span><span class="plain-syntax">(</span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">, </span><span class="identifier-syntax">no_packets</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax">];</span>
|
|
</pre>
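<p class="commentary">The shape of the tree built above can be sketched in a self-contained way. What follows is a minimal model and not the kit's own code: the <span class="extract"><span class="extract-syntax">DEMO_</span></span> constants, the field layout, the <span class="extract"><span class="extract-syntax">demo_packets</span></span> array and the routine names are all assumptions made for illustration, and the value 5 returned in the exact case is simply the number of nodes this sketch has built. It also assumes a compilation target which allows four call arguments (Z-machine version 5 or later, or Glulx).
</p>

<pre class="displayed-code all-displayed-code code-font">
! Illustrative model only: the DEMO_* names, the field offsets and the
! packet array are assumptions for this sketch, not the kit's definitions.
Constant DEMO_CCLASS = 0;    ! which kind of node this is
Constant DEMO_PAR1 = 1;      ! first parameter
Constant DEMO_PAR2 = 2;      ! second parameter
Constant DEMO_NEXT = 3;      ! following sibling, or 0
Constant DEMO_PREVIOUS = 4;  ! preceding sibling, or 0
Constant DEMO_DOWN = 5;      ! first child, or 0
Constant DEMO_UP = 6;        ! parent, or 0
Constant DEMO_SIZE = 7;      ! words per packet

Constant DEMO_SUBEXP = 1;
Constant DEMO_LITERAL = 2;
Constant DEMO_START = 3;
Constant DEMO_END = 4;

Array demo_packets --> 35;   ! five packets of DEMO_SIZE words each

! Fill in packet number n and return its address, with no links set
[ DemoNode n cclass p1 p2   addr;
    addr = demo_packets + n*DEMO_SIZE*WORDSIZE;
    addr-->DEMO_CCLASS = cclass;
    addr-->DEMO_PAR1 = p1; addr-->DEMO_PAR2 = p2;
    addr-->DEMO_NEXT = 0; addr-->DEMO_PREVIOUS = 0;
    addr-->DEMO_DOWN = 0; addr-->DEMO_UP = 0;
    return addr;
];

! Build the tree for a literal of the given length; return the node count
[ DemoCompile len exactly   root literal s_node w_node e_node;
    root = DemoNode(0, DEMO_SUBEXP, 0, 0);
    literal = DemoNode(1, DEMO_LITERAL, 0, len);
    root-->DEMO_DOWN = literal; literal-->DEMO_UP = root;
    if (exactly == false) return 2;
    ! Wrap the literal as: START, uncollecting subexpression, END
    s_node = DemoNode(2, DEMO_START, 0, 0);
    w_node = DemoNode(3, DEMO_SUBEXP, -1, 3);
    e_node = DemoNode(4, DEMO_END, 0, 0);
    root-->DEMO_DOWN = s_node; s_node-->DEMO_UP = root;
    s_node-->DEMO_NEXT = w_node; w_node-->DEMO_PREVIOUS = s_node;
    w_node-->DEMO_NEXT = e_node; e_node-->DEMO_PREVIOUS = w_node;
    w_node-->DEMO_UP = root; e_node-->DEMO_UP = root;
    w_node-->DEMO_DOWN = literal; literal-->DEMO_UP = w_node;
    return 5;
];

[ Main;
    print DemoCompile(3, false), " nodes for a plain search^";
    print DemoCompile(3, true), " nodes for an exact match^";
];
</pre>

<p class="commentary">In the plain case the root has the literal as its only child. In the exact case the literal instead hangs beneath an uncollecting subexpression flanked by start and end anchors, mirroring the pointer assignments made in the routine above.
</p>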
|
|
<p class="commentary firstcommentary"><a id="SP10" class="paragraph-anchor"></a><b>§10. Compiling Tree For Regexp Search. </b>But in general we need to compile a regular expression string into a tree of the kind described above, and here is the routine which does that. It returns the number of nodes used to build the tree, or else an error message text if the pattern is malformed. The syntax it accepts is very fully documented in <i>Writing with Inform</i>, so no details are given here.
</p>
|
|
|
|
<pre class="displayed-code all-displayed-code code-font">
|
|
<span class="reserved-syntax">Array</span><span class="plain-syntax"> </span><span class="identifier-syntax">Subexp_Posns</span><span class="plain-syntax"> --> </span><span class="constant-syntax">20</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_RE_CompileTree</span><span class="plain-syntax"> </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax"> </span><span class="identifier-syntax">exactly</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">no_packets</span><span class="plain-syntax"> </span><span class="identifier-syntax">ffrom</span><span class="plain-syntax"> </span><span class="identifier-syntax">fto</span><span class="plain-syntax"> </span><span class="identifier-syntax">cc</span><span class="plain-syntax"> </span><span class="identifier-syntax">par1</span><span class="plain-syntax"> </span><span class="identifier-syntax">par2</span><span class="plain-syntax"> </span><span class="identifier-syntax">par3</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">quantifiable</span><span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax"> </span><span class="identifier-syntax">attach_to</span><span class="plain-syntax"> </span><span class="identifier-syntax">no_subs</span><span class="plain-syntax"> </span><span class="identifier-syntax">blevel</span><span class="plain-syntax"> </span><span class="identifier-syntax">bits</span><span class="plain-syntax">;</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">fto</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_CharacterLength</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">fto</span><span class="plain-syntax"> == </span><span class="constant-syntax">0</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_Node</span><span class="plain-syntax">(</span><span class="identifier-syntax">no_packets</span><span class="plain-syntax">++, </span><span class="identifier-syntax">NEVER_RE_CC</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">); </span><span class="comment-syntax">Empty regexp never matches</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="constant-syntax">1</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">attach_to</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_RE_Node</span><span class="plain-syntax">(</span><span class="identifier-syntax">no_packets</span><span class="plain-syntax">++, </span><span class="identifier-syntax">SUBEXP_RE_CC</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">RE_Subexpressions</span><span class="plain-syntax">-->0 = </span><span class="identifier-syntax">attach_to</span><span class="plain-syntax">; </span><span class="identifier-syntax">RE_Subexpressions</span><span class="plain-syntax">-->10 = </span><span class="constant-syntax">1</span><span class="plain-syntax">; </span><span class="identifier-syntax">no_subs</span><span class="plain-syntax"> = </span><span class="constant-syntax">1</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">quantifiable</span><span class="plain-syntax"> = </span><span class="reserved-syntax">false</span><span class="plain-syntax">; </span><span class="identifier-syntax">blevel</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ffrom</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">: </span><span class="identifier-syntax">ffrom</span><span class="plain-syntax"> < </span><span class="identifier-syntax">fto</span><span class="plain-syntax">: ) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">cc</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">ffrom</span><span class="plain-syntax">++); </span><span class="identifier-syntax">par1</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">; </span><span class="identifier-syntax">par2</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">; </span><span class="identifier-syntax">par3</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">cc</span><span class="plain-syntax"> == </span><span class="character-syntax">'\') {</span>
<span class="character-syntax"> if (ffrom == fto) return "Search pattern not terminated";</span>
<span class="character-syntax"> cc = BlkValueRead(ftxt, ffrom++);</span>
<span class="character-syntax"> switch (cc) {</span>
<span class="character-syntax"> '</span><span class="identifier-syntax">b</span><span class="character-syntax">': cc = BOUNDARY_RE_CC;</span>
<span class="character-syntax"> '</span><span class="identifier-syntax">B</span><span class="character-syntax">': cc = NONBOUNDARY_RE_CC;</span>
<span class="character-syntax"> '</span><span class="identifier-syntax">d</span><span class="character-syntax">': cc = DIGIT_RE_CC;</span>
<span class="character-syntax"> '</span><span class="identifier-syntax">D</span><span class="character-syntax">': cc = NONDIGIT_RE_CC;</span>
<span class="character-syntax"> '</span><span class="identifier-syntax">l</span><span class="character-syntax">': cc = LCASE_RE_CC;</span>
<span class="character-syntax"> '</span><span class="identifier-syntax">L</span><span class="character-syntax">': cc = NONLCASE_RE_CC;</span>
<span class="character-syntax"> '</span><span class="identifier-syntax">n</span><span class="character-syntax">': cc = NEWLINE_RE_CC;</span>
<span class="character-syntax"> '</span><span class="identifier-syntax">p</span><span class="character-syntax">': cc = PUNCTUATION_RE_CC;</span>
<span class="character-syntax"> '</span><span class="identifier-syntax">P</span><span class="character-syntax">': cc = NONPUNCTUATION_RE_CC;</span>
<span class="character-syntax"> '</span><span class="identifier-syntax">s</span><span class="character-syntax">': cc = WHITESPACE_RE_CC;</span>
<span class="character-syntax"> '</span><span class="identifier-syntax">S</span><span class="character-syntax">': cc = NONWHITESPACE_RE_CC;</span>
<span class="character-syntax"> '</span><span class="identifier-syntax">t</span><span class="character-syntax">': cc = TAB_RE_CC;</span>
<span class="character-syntax"> '</span><span class="identifier-syntax">u</span><span class="character-syntax">': cc = UCASE_RE_CC;</span>
<span class="character-syntax"> '</span><span class="identifier-syntax">U</span><span class="character-syntax">': cc = NONUCASE_RE_CC;</span>
<span class="character-syntax"> '</span><span class="identifier-syntax">w</span><span class="character-syntax">': cc = WORD_RE_CC;</span>
<span class="character-syntax"> '</span><span class="identifier-syntax">W</span><span class="character-syntax">': cc = NONWORD_RE_CC;</span>
<span class="character-syntax"> default:</span>
<span class="character-syntax"> if ((cc >= '</span><span class="constant-syntax">1</span><span class="character-syntax">') && (cc <= '</span><span class="constant-syntax">9</span><span class="character-syntax">')) {</span>
<span class="character-syntax"> par1 = cc-'</span><span class="constant-syntax">0</span><span class="character-syntax">';</span>
<span class="character-syntax"> cc = VARIABLE_RE_CC;</span>
<span class="character-syntax"> } else {</span>
<span class="character-syntax"> if (((cc >= '</span><span class="identifier-syntax">a</span><span class="character-syntax">') && (cc <= '</span><span class="identifier-syntax">z</span><span class="character-syntax">')) ||</span>
<span class="character-syntax"> ((cc >= '</span><span class="identifier-syntax">A</span><span class="character-syntax">') && (cc <= '</span><span class="identifier-syntax">Z</span><span class="character-syntax">'))) return "unknown escape";</span>
<span class="character-syntax"> cc = LITERAL_RE_CC;</span>
<span class="character-syntax"> par1 = ffrom-1; par2 = ffrom;</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> quantifiable = true;</span>
<span class="character-syntax"> } else {</span>
<span class="character-syntax"> switch (cc) {</span>
<span class="character-syntax"> '</span><span class="plain-syntax">(</span><span class="character-syntax">': par2 = 0;</span>
<span class="character-syntax">                    </span><span class="comment-syntax">if (BlkValueRead(ftxt, ffrom) == ')') return "empty subexpression";</span>
<span class="character-syntax"> if (BlkValueRead(ftxt, ffrom) == '</span><span class="plain-syntax">?</span><span class="character-syntax">') {</span>
<span class="character-syntax"> ffrom++;</span>
<span class="character-syntax"> bits = true;</span>
<span class="character-syntax"> if (BlkValueRead(ftxt, ffrom) == '</span><span class="plain-syntax">-</span><span class="character-syntax">') { ffrom++; bits = false; }</span>
<span class="character-syntax"> else if (BlkValueRead(ftxt, ffrom) == '</span><span class="plain-syntax"><</span><span class="character-syntax">') { ffrom++; bits = false; }</span>
<span class="character-syntax"> switch (cc = BlkValueRead(ftxt, ffrom++)) {</span>
<span class="character-syntax"> '</span><span class="plain-syntax">#</span><span class="character-syntax">': while (BlkValueRead(ftxt, ffrom++) ~= 0 or '</span><span class="plain-syntax">)</span><span class="character-syntax">') ;</span>
<span class="character-syntax"> if (BlkValueRead(ftxt, ffrom-1) == 0)</span>
<span class="character-syntax"> return "comment never ends";</span>
<span class="character-syntax"> continue;</span>
<span class="character-syntax"> '</span><span class="plain-syntax">(</span><span class="character-syntax">': cc = BlkValueRead(ftxt, ffrom);</span>
<span class="character-syntax"> if ((cc == '</span><span class="constant-syntax">1</span><span class="character-syntax">' or '</span><span class="constant-syntax">2</span><span class="character-syntax">' or '</span><span class="constant-syntax">3</span><span class="character-syntax">' or '</span><span class="constant-syntax">4</span><span class="character-syntax">' or</span>
<span class="character-syntax"> '</span><span class="constant-syntax">5</span><span class="character-syntax">' or '</span><span class="constant-syntax">6</span><span class="character-syntax">' or '</span><span class="constant-syntax">7</span><span class="character-syntax">' or '</span><span class="constant-syntax">8</span><span class="character-syntax">' or '</span><span class="constant-syntax">9</span><span class="character-syntax">') &&</span>
<span class="character-syntax"> (BlkValueRead(ftxt, ffrom+1) =='</span><span class="plain-syntax">)</span><span class="character-syntax">')) {</span>
<span class="character-syntax"> ffrom = ffrom + 2;</span>
<span class="character-syntax"> par1 = cc - '</span><span class="constant-syntax">0</span><span class="character-syntax">';</span>
<span class="character-syntax"> } else ffrom--;</span>
<span class="character-syntax"> cc = IF_RE_CC; </span><span class="comment-syntax">(?(...)...) conditional</span>
<span class="character-syntax"> quantifiable = false;</span>
<span class="character-syntax"> if (blevel == 20) return "subexpressions too deep";</span>
<span class="character-syntax"> Subexp_Posns-->(blevel++) = TEXT_TY_RE_NodeAddress(no_packets);</span>
<span class="character-syntax"> jump CClassKnown;</span>
<span class="character-syntax"> '</span><span class="plain-syntax">=</span><span class="character-syntax">': par2 = 1; </span><span class="comment-syntax">(?=...) lookahead/behind</span>
<span class="character-syntax"> par3 = 0; if (bits == false) par3 = -1;</span>
<span class="character-syntax"> '</span><span class="plain-syntax">!</span><span class="character-syntax">': par2 = 2; </span><span class="comment-syntax">(?!...) negated lookahead/behind</span>
<span class="character-syntax"> par3 = 0; if (bits == false) par3 = -1;</span>
<span class="character-syntax"> '</span><span class="plain-syntax">:</span><span class="character-syntax">': par2 = 3; </span><span class="comment-syntax">(?:...) uncollecting subexpression</span>
<span class="character-syntax"> '</span><span class="plain-syntax">></span><span class="character-syntax">': par2 = 4; </span><span class="comment-syntax">(?>...) possessive</span>
<span class="character-syntax"> default:</span>
<span class="character-syntax"> if (BlkValueRead(ftxt, ffrom) == '</span><span class="plain-syntax">)</span><span class="character-syntax">') {</span>
<span class="character-syntax"> if (cc == '</span><span class="identifier-syntax">i</span><span class="character-syntax">') {</span>
<span class="character-syntax"> cc = SENSITIVITY_RE_CC; par1 = bits; ffrom++;</span>
<span class="character-syntax"> jump CClassKnown;</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> if (BlkValueRead(ftxt, ffrom) == '</span><span class="plain-syntax">:</span><span class="character-syntax">') {</span>
<span class="character-syntax"> if (cc == '</span><span class="identifier-syntax">i</span><span class="character-syntax">') {</span>
<span class="character-syntax"> par1 = bits; par2 = 3; par3 = bits+1; ffrom++;</span>
<span class="character-syntax"> jump AllowForm;</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> return "unknown (?...) form";</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> .AllowForm;</span>
<span class="character-syntax"> if (par2 == 0) par1 = no_subs++; else par1 = -1;</span>
<span class="character-syntax"> cc = SUBEXP_RE_CC;</span>
<span class="character-syntax"> quantifiable = false;</span>
<span class="character-syntax"> if (blevel == 20) return "subexpressions too deep";</span>
<span class="character-syntax"> Subexp_Posns-->(blevel++) = TEXT_TY_RE_NodeAddress(no_packets);</span>
<span class="character-syntax"> '</span><span class="plain-syntax">)</span><span class="character-syntax">': if (blevel == 0) return "subexpression bracket mismatch";</span>
<span class="character-syntax"> blevel--;</span>
<span class="character-syntax"> attach_to = Subexp_Posns-->blevel;</span>
<span class="character-syntax"> if (attach_to-->RE_DOWN == NULL) {</span>
<span class="character-syntax"> if (no_packets >= RE_MAX_PACKETS) return "regexp too complex";</span>
<span class="character-syntax"> attach_to-->RE_DOWN =</span>
<span class="character-syntax"> TEXT_TY_RE_Node(no_packets++, ALWAYS_RE_CC, 0, 0, 0);</span>
<span class="character-syntax"> (attach_to-->RE_DOWN)-->RE_UP = attach_to;</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> quantifiable = true;</span>
<span class="character-syntax"> continue;</span>
<span class="character-syntax"> '</span><span class="plain-syntax">.</span><span class="character-syntax">': cc = ANYTHING_RE_CC; quantifiable = true;</span>
<span class="character-syntax"> '</span><span class="plain-syntax">|</span><span class="character-syntax">': cc = CHOICE_RE_CC; quantifiable = false;</span>
<span class="character-syntax"> '</span><span class="plain-syntax">^</span><span class="character-syntax">': cc = START_RE_CC; quantifiable = false;</span>
<span class="character-syntax"> '</span><span class="constant-syntax">$</span><span class="character-syntax">': cc = END_RE_CC; quantifiable = false;</span>
<span class="character-syntax"> '</span><span class="plain-syntax">{</span><span class="character-syntax">': if (quantifiable == false) return "quantifier misplaced";</span>
<span class="character-syntax"> par1 = 0; par2 = -1; bits = 1;</span>
<span class="character-syntax"> while ((cc=BlkValueRead(ftxt, ffrom++)) ~= 0 or '</span><span class="plain-syntax">}</span><span class="character-syntax">') {</span>
<span class="character-syntax"> if (cc == '</span><span class="plain-syntax">,</span><span class="character-syntax">') {</span>
<span class="character-syntax"> bits++;</span>
<span class="character-syntax">                            if (bits >= 3) return "too many commas in ?{...}";</span>
<span class="character-syntax"> continue;</span>
<span class="character-syntax"> }</span>
<span class="character-syntax">                        if ((cc >= '</span><span class="constant-syntax">0</span><span class="character-syntax">') && (cc <= '</span><span class="constant-syntax">9</span><span class="character-syntax">')) {</span>
<span class="character-syntax"> if (bits == 1) {</span>
<span class="character-syntax"> if (par1 < 0) par1 = 0;</span>
<span class="character-syntax"> par1 = par1*10 + (cc-'</span><span class="constant-syntax">0</span><span class="character-syntax">');</span>
<span class="character-syntax"> } else {</span>
<span class="character-syntax"> if (par2 < 0) par2 = 0;</span>
<span class="character-syntax"> par2 = par2*10 + (cc-'</span><span class="constant-syntax">0</span><span class="character-syntax">');</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> } else return "non-digit in ?{...}";</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> if (cc ~= '</span><span class="plain-syntax">}</span><span class="character-syntax">') return "{x,y} quantifier never ends";</span>
<span class="character-syntax"> cc = QUANTIFIER_RE_CC;</span>
<span class="character-syntax"> if (par2 == -1) {</span>
<span class="character-syntax"> if (bits == 2) par2 = 30000;</span>
<span class="character-syntax"> else par2 = par1;</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> if (par1 > par2) return "{x,y} with x greater than y";</span>
<span class="character-syntax"> if (BlkValueRead(ftxt, ffrom) == '</span><span class="plain-syntax">?</span><span class="character-syntax">') { ffrom++; par3 = true; }</span>
<span class="character-syntax"> quantifiable = false;</span>
<span class="character-syntax"> '</span><span class="plain-syntax"><</span><span class="character-syntax">', '</span><span class="plain-syntax">[</span><span class="character-syntax">': par3 = false; if (cc == '</span><span class="plain-syntax"><</span><span class="character-syntax">') bits = '</span><span class="plain-syntax">></span><span class="character-syntax">'; else bits = '</span><span class="plain-syntax">]</span><span class="character-syntax">';</span>
<span class="character-syntax"> if (BlkValueRead(ftxt, ffrom) == '</span><span class="plain-syntax">^</span><span class="character-syntax">') { ffrom++; par3 = true; }</span>
<span class="character-syntax"> par1 = ffrom;</span>
<span class="character-syntax"> if (BlkValueRead(ftxt, ffrom) == bits) { ffrom++; }</span>
<span class="character-syntax"> while (cc ~= bits or 0) {</span>
<span class="character-syntax"> cc = BlkValueRead(ftxt, ffrom++);</span>
<span class="character-syntax"> if (cc == '</span><span class="plain-syntax">\</span><span class="character-syntax">') {</span>
<span class="character-syntax"> cc = BlkValueRead(ftxt, ffrom++);</span>
<span class="character-syntax"> if (cc ~= 0) cc = BlkValueRead(ftxt, ffrom++);</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> if (cc == 0) return "Character range never ends";</span>
<span class="character-syntax"> par2 = ffrom-1;</span>
<span class="character-syntax"> if ((par2 > par1 + 1) &&</span>
<span class="character-syntax"> (BlkValueRead(ftxt, par1) == '</span><span class="plain-syntax">:</span><span class="character-syntax">') &&</span>
<span class="character-syntax"> (BlkValueRead(ftxt, par2-1) == '</span><span class="plain-syntax">:</span><span class="character-syntax">') &&</span>
<span class="character-syntax"> (BlkValueRead(ftxt, par2-2) ~= '</span><span class="plain-syntax">\</span><span class="character-syntax">'))</span>
<span class="character-syntax"> return "POSIX named character classes unsupported";</span>
<span class="character-syntax"> bits = TEXT_TY_RE_RangeSyntaxCorrect(ftxt, par1, par2);</span>
<span class="character-syntax"> if (bits) return bits;</span>
<span class="character-syntax"> if (par1 < par2) cc = RANGE_RE_CC;</span>
<span class="character-syntax"> else cc = NOTHING_RE_CC;</span>
<span class="character-syntax"> quantifiable = true;</span>
<span class="character-syntax"> '</span><span class="plain-syntax">*</span><span class="character-syntax">': if (quantifiable == false) return "quantifier misplaced";</span>
<span class="character-syntax"> cc = QUANTIFIER_RE_CC;</span>
<span class="character-syntax"> par1 = 0; par2 = 30000;</span>
<span class="character-syntax"> if (BlkValueRead(ftxt, ffrom) == '</span><span class="plain-syntax">?</span><span class="character-syntax">') { ffrom++; par3 = true; }</span>
<span class="character-syntax"> quantifiable = false;</span>
<span class="character-syntax"> '</span><span class="plain-syntax">+</span><span class="character-syntax">': if (quantifiable == false) return "quantifier misplaced";</span>
<span class="character-syntax"> cc = QUANTIFIER_RE_CC;</span>
<span class="character-syntax"> par1 = 1; par2 = 30000;</span>
<span class="character-syntax"> if (BlkValueRead(ftxt, ffrom) == '</span><span class="plain-syntax">?</span><span class="character-syntax">') { ffrom++; par3 = true; }</span>
<span class="character-syntax"> quantifiable = false;</span>
<span class="character-syntax"> '</span><span class="plain-syntax">?</span><span class="character-syntax">': if (quantifiable == false) return "quantifier misplaced";</span>
<span class="character-syntax"> cc = QUANTIFIER_RE_CC;</span>
<span class="character-syntax"> par1 = 0; par2 = 1;</span>
<span class="character-syntax"> if (BlkValueRead(ftxt, ffrom) == '</span><span class="plain-syntax">?</span><span class="character-syntax">') { ffrom++; par3 = true; }</span>
<span class="character-syntax"> quantifiable = false;</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> .CClassKnown;</span>
<span class="character-syntax"> if (cc >= 0) {</span>
<span class="character-syntax"> quantifiable = true;</span>
<span class="character-syntax"> if ((attach_to-->RE_CCLASS == LITERAL_RE_CC) &&</span>
<span class="character-syntax"> (BlkValueRead(ftxt, ffrom) ~= '</span><span class="plain-syntax">*</span><span class="character-syntax">' or '</span><span class="plain-syntax">+</span><span class="character-syntax">' or '</span><span class="plain-syntax">?</span><span class="character-syntax">' or '</span><span class="plain-syntax">{</span><span class="character-syntax">')) {</span>
<span class="character-syntax"> (attach_to-->RE_PAR2)++;</span>
<span class="character-syntax"> if (TEXT_TY_RE_Trace == 2) {</span>
<span class="character-syntax"> print "Extending literal by ", cc, "=", (char) cc, "^";</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> continue;</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> cc = LITERAL_RE_CC; par1 = ffrom-1; par2 = ffrom;</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> if (no_packets >= RE_MAX_PACKETS) return "regexp too complex";</span>
<span class="character-syntax"> if (TEXT_TY_RE_Trace == 2) {</span>
<span class="character-syntax"> print "Attaching packet ", no_packets+1, " to ";</span>
<span class="character-syntax"> TEXT_TY_RE_DebugNode(attach_to, ftxt);</span>
<span class="character-syntax"> TEXT_TY_RE_DebugTree(ftxt);</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> token = TEXT_TY_RE_Node(no_packets++, cc, par1, par2, par3);</span>
<span class="character-syntax"> if ((token-->RE_CCLASS == SUBEXP_RE_CC) && (token-->RE_PAR2 == 0)) {</span>
<span class="character-syntax"> RE_Subexpressions-->(token-->RE_PAR1) = token;</span>
<span class="character-syntax"> (RE_Subexpressions-->10)++;</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> if ((attach_to-->RE_CCLASS == SUBEXP_RE_CC or CHOICE_RE_CC or IF_RE_CC) &&</span>
<span class="character-syntax"> (attach_to-->RE_DOWN == NULL)) {</span>
<span class="character-syntax"> attach_to-->RE_DOWN = token; token-->RE_UP = attach_to;</span>
<span class="character-syntax"> } else {</span>
<span class="character-syntax"> if ((token-->RE_CCLASS == CHOICE_RE_CC) &&</span>
<span class="character-syntax"> ((attach_to-->RE_UP)-->RE_CCLASS == CHOICE_RE_CC)) {</span>
<span class="character-syntax"> no_packets--; token = attach_to-->RE_UP;</span>
<span class="character-syntax"> } else {</span>
<span class="character-syntax"> if (token-->RE_CCLASS == CHOICE_RE_CC) {</span>
<span class="character-syntax"> while (attach_to-->RE_PREVIOUS ~= NULL)</span>
<span class="character-syntax"> attach_to = attach_to-->RE_PREVIOUS;</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> if (token-->RE_CCLASS == QUANTIFIER_RE_CC or CHOICE_RE_CC) {</span>
<span class="character-syntax"> token-->RE_PREVIOUS = attach_to-->RE_PREVIOUS;</span>
<span class="character-syntax"> token-->RE_UP = attach_to-->RE_UP;</span>
<span class="character-syntax"> if ((attach_to-->RE_UP ~= NULL) && (attach_to-->RE_PREVIOUS == NULL))</span>
<span class="character-syntax"> (attach_to-->RE_UP)-->RE_DOWN = token;</span>
<span class="character-syntax"> token-->RE_DOWN = attach_to;</span>
<span class="character-syntax"> bits = attach_to;</span>
<span class="character-syntax"> while (bits ~= NULL) {</span>
<span class="character-syntax"> bits-->RE_UP = token;</span>
<span class="character-syntax"> bits = bits-->RE_NEXT;</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> attach_to-->RE_PREVIOUS = NULL;</span>
<span class="character-syntax"> if (token-->RE_PREVIOUS ~= NULL)</span>
<span class="character-syntax"> (token-->RE_PREVIOUS)-->RE_NEXT = token;</span>
<span class="character-syntax"> } else {</span>
<span class="character-syntax"> attach_to-->RE_NEXT = token; token-->RE_PREVIOUS = attach_to;</span>
<span class="character-syntax"> token-->RE_UP = attach_to-->RE_UP;</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> if (token-->RE_CCLASS == CHOICE_RE_CC) {</span>
<span class="character-syntax"> if (no_packets >= RE_MAX_PACKETS) return "regexp too complex";</span>
<span class="character-syntax"> token-->RE_NEXT = TEXT_TY_RE_Node(no_packets++, CHOICE_RE_CC, 0, 0, 0);</span>
<span class="character-syntax"> (token-->RE_NEXT)-->RE_PREVIOUS = token;</span>
<span class="character-syntax"> (token-->RE_NEXT)-->RE_UP = token-->RE_UP;</span>
<span class="character-syntax"> token = token-->RE_NEXT;</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> attach_to = token;</span>
<span class="character-syntax"> if (TEXT_TY_RE_Trace == 2) {</span>
<span class="character-syntax"> print "Result:^";</span>
<span class="character-syntax"> TEXT_TY_RE_DebugTree(ftxt);</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> if (blevel ~= 0) return "subexpression bracket mismatch";</span>
<span class="character-syntax"> if (exactly) {</span>
<span class="character-syntax"> if (no_packets+3 > RE_MAX_PACKETS) return "regexp too complex";</span>
<span class="character-syntax"> exactly = RE_PACKET_space-->RE_DOWN;</span>
<span class="character-syntax"> token = TEXT_TY_RE_Node(no_packets++, START_RE_CC, 0, 0, 0);</span>
<span class="character-syntax"> RE_PACKET_space-->RE_DOWN = token; token-->RE_UP = RE_PACKET_space;</span>
<span class="character-syntax"> attach_to = TEXT_TY_RE_Node(no_packets++, SUBEXP_RE_CC, -1, 3, 0);</span>
<span class="character-syntax"> token-->RE_NEXT = attach_to; attach_to-->RE_PREVIOUS = token;</span>
<span class="character-syntax"> attach_to-->RE_UP = RE_PACKET_space;</span>
<span class="character-syntax"> attach_to-->RE_NEXT = TEXT_TY_RE_Node(no_packets++, END_RE_CC, 0, 0, 0);</span>
<span class="character-syntax"> (attach_to-->RE_NEXT)-->RE_PREVIOUS = attach_to;</span>
<span class="character-syntax"> (attach_to-->RE_NEXT)-->RE_UP = RE_PACKET_space;</span>
<span class="character-syntax"> attach_to-->RE_DOWN = exactly;</span>
<span class="character-syntax"> while (exactly ~= NULL) {</span>
<span class="character-syntax"> exactly-->RE_UP = attach_to; exactly = exactly-->RE_NEXT;</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> no_packets = TEXT_TY_RE_ExpandChoices(RE_PACKET_space, no_packets);</span>
<span class="character-syntax"> if (TEXT_TY_RE_Trace) {</span>
<span class="character-syntax"> print "Compiled pattern:^";</span>
<span class="character-syntax"> TEXT_TY_RE_DebugTree(ftxt);</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> bits = TEXT_TY_RE_CheckTree(RE_PACKET_space, no_subs); if (bits) return bits;</span>
<span class="character-syntax"> return no_packets;</span>
<span class="character-syntax">];</span>
<span class="character-syntax">[ TEXT_TY_RE_RangeSyntaxCorrect ftxt rf rt</span>
<span class="character-syntax"> i chm;</span>
<span class="character-syntax"> for (i=rf: i<rt: i++) {</span>
<span class="character-syntax"> chm = BlkValueRead(ftxt, i);</span>
<span class="character-syntax"> if ((chm == '</span><span class="plain-syntax">\</span><span class="character-syntax">') && (i+1<rt)) {</span>
<span class="character-syntax"> chm = BlkValueRead(ftxt, ++i);</span>
<span class="character-syntax"> if (((chm >= '</span><span class="identifier-syntax">a</span><span class="character-syntax">') && (chm <= '</span><span class="identifier-syntax">z</span><span class="character-syntax">')) ||</span>
<span class="character-syntax"> ((chm >= '</span><span class="identifier-syntax">A</span><span class="character-syntax">') && (chm <= '</span><span class="identifier-syntax">Z</span><span class="character-syntax">'))) {</span>
<span class="character-syntax"> if (chm ~= '</span><span class="identifier-syntax">s</span><span class="character-syntax">' or '</span><span class="identifier-syntax">S</span><span class="character-syntax">' or '</span><span class="identifier-syntax">p</span><span class="character-syntax">' or '</span><span class="identifier-syntax">P</span><span class="character-syntax">' or '</span><span class="identifier-syntax">w</span><span class="character-syntax">' or '</span><span class="identifier-syntax">W</span><span class="character-syntax">' or '</span><span class="identifier-syntax">d</span><span class="character-syntax">'</span>
<span class="character-syntax"> or '</span><span class="identifier-syntax">D</span><span class="character-syntax">' or '</span><span class="identifier-syntax">n</span><span class="character-syntax">' or '</span><span class="identifier-syntax">t</span><span class="character-syntax">' or '</span><span class="identifier-syntax">l</span><span class="character-syntax">' or '</span><span class="identifier-syntax">L</span><span class="character-syntax">' or '</span><span class="identifier-syntax">u</span><span class="character-syntax">' or '</span><span class="identifier-syntax">U</span><span class="character-syntax">')</span>
<span class="character-syntax"> return "Invalid escape in {} range";</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> if ((i+2<rt) && (BlkValueRead(ftxt, i+1) == '</span><span class="plain-syntax">-</span><span class="character-syntax">')) {</span>
<span class="character-syntax"> if (chm > BlkValueRead(ftxt, i+2)) return "Invalid {} range";</span>
<span class="character-syntax"> i=i+2;</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> rfalse;</span>
<span class="character-syntax">];</span>
<span class="character-syntax">[ TEXT_TY_RE_ExpandChoices token no_packets</span>
<span class="character-syntax"> rv prev nex holder new ct n cond_node then_node else_node;</span>
<span class="character-syntax"> while (token ~= NULL) {</span>
<span class="character-syntax"> if (token-->RE_CCLASS == IF_RE_CC) {</span>
<span class="character-syntax"> if ((token-->RE_DOWN)-->RE_CCLASS == CHOICE_RE_CC) {</span>
<span class="character-syntax"> for (nex=token-->RE_DOWN, n=0: nex~=NULL: nex=nex-->RE_NEXT) n++;</span>
<span class="character-syntax"> if (n~=2) return "conditional has too many clauses";</span>
<span class="character-syntax"> if (no_packets >= RE_MAX_PACKETS) return "regexp too complex";</span>
<span class="character-syntax"> cond_node = TEXT_TY_RE_Node(no_packets++, CONDITION_RE_CC, 0, 0, 0);</span>
<span class="character-syntax"> if (token-->RE_PAR1 >= 1) {</span>
<span class="character-syntax"> cond_node-->RE_PAR1 = token-->RE_PAR1;</span>
<span class="character-syntax"> }</span>
<span class="character-syntax"> then_node = token-->RE_DOWN;</span>
<span class="character-syntax"> then_node-->RE_CCLASS = THEN_RE_CC;</span>
<span class="character-syntax"> else_node = then_node-->RE_NEXT;</span>
<span class="character-syntax"> else_node-->RE_CCLASS = ELSE_RE_CC;</span>
<span class="character-syntax"> if (cond_node-->RE_PAR1 < 1) {</span>
|
|
<span class="character-syntax"> cond_node-->RE_DOWN = then_node-->RE_DOWN;</span>
|
|
<span class="character-syntax"> then_node-->RE_DOWN = (then_node-->RE_DOWN)-->RE_NEXT;</span>
|
|
<span class="character-syntax"> if (then_node-->RE_DOWN ~= NULL)</span>
|
|
<span class="character-syntax"> (then_node-->RE_DOWN)-->RE_PREVIOUS = NULL;</span>
|
|
<span class="character-syntax"> (cond_node-->RE_DOWN)-->RE_NEXT = NULL;</span>
|
|
<span class="character-syntax"> (cond_node-->RE_DOWN)-->RE_UP = cond_node;</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> token-->RE_DOWN = cond_node; cond_node-->RE_UP = token;</span>
|
|
<span class="character-syntax"> cond_node-->RE_NEXT = then_node; then_node-->RE_PREVIOUS = cond_node;</span>
|
|
<span class="character-syntax"> } else {</span>
|
|
<span class="character-syntax"> if (no_packets >= RE_MAX_PACKETS) return "regexp too complex";</span>
|
|
<span class="character-syntax"> cond_node = TEXT_TY_RE_Node(no_packets++, CONDITION_RE_CC, 0, 0, 0);</span>
|
|
<span class="character-syntax"> if (no_packets >= RE_MAX_PACKETS) return "regexp too complex";</span>
|
|
<span class="character-syntax"> then_node = TEXT_TY_RE_Node(no_packets++, THEN_RE_CC, 0, 0, 0);</span>
|
|
<span class="character-syntax"> if (token-->RE_PAR1 >= 1) {</span>
|
|
<span class="character-syntax"> cond_node-->RE_PAR1 = token-->RE_PAR1;</span>
|
|
<span class="character-syntax"> then_node-->RE_DOWN = token-->RE_DOWN;</span>
|
|
<span class="character-syntax"> } else {</span>
|
|
<span class="character-syntax"> cond_node-->RE_DOWN = token-->RE_DOWN;</span>
|
|
<span class="character-syntax"> then_node-->RE_DOWN = (token-->RE_DOWN)-->RE_NEXT;</span>
|
|
<span class="character-syntax"> (cond_node-->RE_DOWN)-->RE_NEXT = NULL;</span>
|
|
<span class="character-syntax"> (cond_node-->RE_DOWN)-->RE_UP = cond_node;</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> token-->RE_DOWN = cond_node;</span>
|
|
<span class="character-syntax"> cond_node-->RE_UP = token; cond_node-->RE_NEXT = then_node;</span>
|
|
<span class="character-syntax"> then_node-->RE_PREVIOUS = cond_node; then_node-->RE_UP = token;</span>
|
|
<span class="character-syntax"> then_node-->RE_NEXT = NULL;</span>
|
|
<span class="character-syntax"> if (then_node-->RE_DOWN ~= NULL)</span>
|
|
<span class="character-syntax"> (then_node-->RE_DOWN)-->RE_PREVIOUS = NULL;</span>
|
|
<span class="character-syntax"> for (nex = then_node-->RE_DOWN: nex ~= NULL: nex = nex-->RE_NEXT) {</span>
|
|
<span class="character-syntax"> nex-->RE_UP = then_node;</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> }</span>
|
|
|
|
<span class="character-syntax"> if (cond_node-->RE_DOWN ~= NULL) {</span>
|
|
<span class="character-syntax"> nex = cond_node-->RE_DOWN;</span>
|
|
<span class="character-syntax"> if ((nex-->RE_CCLASS ~= SUBEXP_RE_CC) ||</span>
|
|
<span class="character-syntax"> (nex-->RE_NEXT ~= NULL) ||</span>
|
|
<span class="character-syntax"> (nex-->RE_PAR2 ~= 1 or 2)) {</span>
|
|
<span class="character-syntax"> </span><span class="comment-syntax">TEXT_TY_RE_DebugSubtree(0, 0, nex, true);</span>
|
|
<span class="character-syntax"> return "condition not lookahead/behind";</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> if ((token-->RE_CCLASS == CHOICE_RE_CC) && (token-->RE_PAR1 < 1)) {</span>
|
|
<span class="character-syntax"> prev = token-->RE_PREVIOUS;</span>
|
|
<span class="character-syntax"> nex = token-->RE_NEXT;</span>
|
|
<span class="character-syntax"> while ((nex ~= NULL) && (nex-->RE_CCLASS == CHOICE_RE_CC))</span>
|
|
<span class="character-syntax"> nex = nex-->RE_NEXT;</span>
|
|
<span class="character-syntax"> holder = token-->RE_UP; if (holder == NULL) return "bang";</span>
|
|
<span class="character-syntax"> if (no_packets >= RE_MAX_PACKETS) return "regexp too complex";</span>
|
|
<span class="character-syntax"> new = TEXT_TY_RE_Node(no_packets++, DISJUNCTION_RE_CC, 0, 0, 0);</span>
|
|
<span class="character-syntax"> holder-->RE_DOWN = new; new-->RE_UP = holder;</span>
|
|
<span class="character-syntax"> if (prev ~= NULL) {</span>
|
|
<span class="character-syntax"> prev-->RE_NEXT = new; new-->RE_PREVIOUS = prev;</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> if (nex ~= NULL) {</span>
|
|
<span class="character-syntax"> nex-->RE_PREVIOUS = new; new-->RE_NEXT = nex;</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> new-->RE_DOWN = token;</span>
|
|
<span class="character-syntax"> token-->RE_PREVIOUS = NULL;</span>
|
|
<span class="character-syntax"> ct = 1;</span>
|
|
<span class="character-syntax"> while (token ~= NULL) {</span>
|
|
<span class="character-syntax"> token-->RE_PAR1 = ct++;</span>
|
|
<span class="character-syntax"> token-->RE_UP = new;</span>
|
|
<span class="character-syntax"> if ((token-->RE_NEXT ~= NULL) &&</span>
|
|
<span class="character-syntax"> ((token-->RE_NEXT)-->RE_CCLASS ~= CHOICE_RE_CC))</span>
|
|
<span class="character-syntax"> token-->RE_NEXT = NULL;</span>
|
|
<span class="character-syntax"> token = token-->RE_NEXT;</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> new-->RE_PAR1 = ct-1;</span>
|
|
<span class="character-syntax"> if (token ~= NULL) token-->RE_NEXT = NULL;</span>
|
|
<span class="character-syntax"> token = new; continue;</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> if (token-->RE_DOWN ~= NULL) {</span>
|
|
<span class="character-syntax"> no_packets = TEXT_TY_RE_ExpandChoices(token-->RE_DOWN, no_packets);</span>
|
|
<span class="character-syntax"> if ((no_packets<0) || (no_packets >= RE_MAX_PACKETS)) break;</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> token = token-->RE_NEXT;</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> return no_packets;</span>
|
|
<span class="character-syntax">];</span>
|
|
|
|
<span class="character-syntax">[ TEXT_TY_RE_CheckTree token no_subs</span>
|
|
<span class="character-syntax"> rv;</span>
|
|
<span class="character-syntax"> while (token ~= NULL) {</span>
|
|
<span class="character-syntax"> if (token-->RE_CCLASS == VARIABLE_RE_CC) {</span>
|
|
<span class="character-syntax"> if (token-->RE_PAR1 >= no_subs) return "reference to nonexistent group";</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> if ((token-->RE_CCLASS == SUBEXP_RE_CC) &&</span>
|
|
<span class="character-syntax"> (token-->RE_PAR2 == 1 or 2) &&</span>
|
|
<span class="character-syntax"> (token-->RE_PAR3 == -1)) {</span>
|
|
<span class="character-syntax"> token-->RE_PAR3 = TEXT_TY_RE_Width(token-->RE_DOWN);</span>
|
|
<span class="character-syntax"> if (token-->RE_PAR3 == -1) return "variable length lookbehind not implemented";</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> if (token-->RE_DOWN ~= NULL) {</span>
|
|
<span class="character-syntax"> rv = TEXT_TY_RE_CheckTree(token-->RE_DOWN, no_subs);</span>
|
|
<span class="character-syntax"> if (rv) return rv;</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> token = token-->RE_NEXT;</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> rfalse;</span>
|
|
<span class="character-syntax">];</span>
|
|
|
|
<span class="character-syntax">[ TEXT_TY_RE_Width token downwards</span>
|
|
<span class="character-syntax"> w rv aw choice;</span>
|
|
<span class="character-syntax"> while (token ~= NULL) {</span>
|
|
<span class="character-syntax"> switch (token-->RE_CCLASS) {</span>
|
|
<span class="character-syntax"> DIGIT_RE_CC, NONDIGIT_RE_CC, WHITESPACE_RE_CC, NONWHITESPACE_RE_CC,</span>
|
|
<span class="character-syntax"> PUNCTUATION_RE_CC, NONPUNCTUATION_RE_CC, WORD_RE_CC, NONWORD_RE_CC,</span>
|
|
<span class="character-syntax"> ANYTHING_RE_CC, NOTHING_RE_CC, RANGE_RE_CC, NEWLINE_RE_CC, TAB_RE_CC,</span>
|
|
<span class="character-syntax"> UCASE_RE_CC, NONUCASE_RE_CC, LCASE_RE_CC, NONLCASE_RE_CC:</span>
|
|
<span class="character-syntax"> w++;</span>
|
|
<span class="character-syntax"> START_RE_CC, END_RE_CC, BOUNDARY_RE_CC, NONBOUNDARY_RE_CC, ALWAYS_RE_CC:</span>
|
|
<span class="character-syntax"> ;</span>
|
|
<span class="character-syntax"> LITERAL_RE_CC:</span>
|
|
<span class="character-syntax"> w = w + token-->RE_PAR2 - token-->RE_PAR1;</span>
|
|
<span class="character-syntax"> VARIABLE_RE_CC:</span>
|
|
<span class="character-syntax"> return -1;</span>
|
|
<span class="character-syntax"> IF_RE_CC:</span>
|
|
<span class="character-syntax"> rv = TEXT_TY_RE_Width((token-->RE_DOWN)-->RE_NEXT);</span>
|
|
<span class="character-syntax"> if (rv == -1) return -1;</span>
|
|
<span class="character-syntax"> if (rv ~= TEXT_TY_RE_Width(((token-->RE_DOWN)-->RE_NEXT)-->RE_NEXT))</span>
|
|
<span class="character-syntax"> return -1;</span>
|
|
<span class="character-syntax"> w = w + rv;</span>
|
|
<span class="character-syntax"> SUBEXP_RE_CC:</span>
|
|
<span class="character-syntax"> if (token-->RE_PAR2 == 1 or 2) rv = 0;</span>
|
|
<span class="character-syntax"> else {</span>
|
|
<span class="character-syntax"> rv = TEXT_TY_RE_Width(token-->RE_DOWN);</span>
|
|
<span class="character-syntax"> if (rv == -1) return -1;</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> w = w + rv;</span>
|
|
<span class="character-syntax"> QUANTIFIER_RE_CC:</span>
|
|
<span class="character-syntax"> if (token-->RE_PAR1 ~= token-->RE_PAR2) return -1;</span>
|
|
<span class="character-syntax"> rv = TEXT_TY_RE_Width(token-->RE_DOWN);</span>
|
|
<span class="character-syntax"> if (rv == -1) return -1;</span>
|
|
<span class="character-syntax"> w = w + rv*(token-->RE_PAR1);</span>
|
|
<span class="character-syntax"> DISJUNCTION_RE_CC:</span>
|
|
<span class="character-syntax"> aw = -1;</span>
|
|
<span class="character-syntax"> for (choice = token-->RE_DOWN: choice ~= NULL: choice = choice-->RE_NEXT) {</span>
|
|
<span class="character-syntax"> rv = TEXT_TY_RE_Width(choice-->RE_DOWN);</span>
|
|
<span class="character-syntax"> </span><span class="comment-syntax">print "Option found ", rv, "^";</span>
|
|
<span class="character-syntax"> if (rv == -1) return -1;</span>
|
|
<span class="character-syntax"> if ((aw >= 0) && (aw ~= rv)) return -1;</span>
|
|
<span class="character-syntax"> aw = rv;</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> w = w + aw;</span>
|
|
<span class="character-syntax"> SENSITIVITY_RE_CC:</span>
|
|
<span class="character-syntax"> ;</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> if (downwards) return w;</span>
|
|
<span class="character-syntax"> if (token ~= NULL) token = token-->RE_NEXT;</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> return w;</span>
|
|
<span class="character-syntax">];</span>
|
|
</pre>
|
|
<p class="commentary firstcommentary"><a id="SP11" class="paragraph-anchor"></a><b>§11. Parser. </b>The virtue of all of that tree compilation is that the code which actually
does the work (which parses the source text to see if the regular expression
matches it) is much shorter and quicker: indeed, it takes up fewer lines
than the compiler part, which goes to show that decoding regular expression
syntax is a more complex task than acting upon it. This would have surprised
the pioneers of regexp, but the syntax has become much more complicated over
the decades because of a steady increase in the number of extended notations.
The process shows no sign of stopping, with Python and PCRE continuing to
push boundaries beyond Perl, which was once thought the superest, duperest
regexp syntax there could be. However: to work.
</p>
|
|
|
|
<p class="commentary">The main matcher simply starts a recursive subroutine to perform the match.
However, the subroutine tests for a match only at one particular position in the
source text; so the main routine tries the subroutine at each position in turn,
from left to right, until a match is made. This includes the position just past
the final character, where an empty match is still possible. The exception is a
regexp constrained by a <span class="extract"><span class="extract-syntax">^</span></span> glyph to begin matching at the start of the
source text, which will cause a <span class="extract"><span class="extract-syntax">START_RE_CC</span></span> node to be the eldest child
of the <span class="extract"><span class="extract-syntax">\0</span></span> root: in that case only position 0 is tried, and the scan
gives up as soon as it would move past it.
</p>
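<p class="commentary">As a rough illustration of that scanning strategy, here is a minimal
stand-alone Inform 6 sketch. It is not part of the kit: the names
<span class="extract"><span class="extract-syntax">demo_text</span></span>, <span class="extract"><span class="extract-syntax">DEMO_LEN</span></span>, <span class="extract"><span class="extract-syntax">DemoMatchAt</span></span> and <span class="extract"><span class="extract-syntax">DemoScan</span></span> are all hypothetical,
and the stand-in matcher recognises only the fixed literal "ab" rather than
walking a compiled tree.
</p>

<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax">! Hypothetical subject text "xab", held as a byte array (not a kit text value)</span>
<span class="plain-syntax">Array demo_text -> 'x' 'a' 'b' 0;</span>
<span class="plain-syntax">Constant DEMO_LEN = 3;</span>

<span class="plain-syntax">! Stand-in for the recursive matcher: tries the literal "ab" at position pos,</span>
<span class="plain-syntax">! returning the number of characters matched, or -1 if there is no match</span>
<span class="plain-syntax">[ DemoMatchAt pos;</span>
<span class="plain-syntax">    if ((pos+1 &lt; DEMO_LEN) &amp;&amp; (demo_text->pos == 'a') &amp;&amp; (demo_text->(pos+1) == 'b'))</span>
<span class="plain-syntax">        return 2;</span>
<span class="plain-syntax">    return -1;</span>
<span class="plain-syntax">];</span>

<span class="plain-syntax">! Try each position from left to right, including pos == DEMO_LEN;</span>
<span class="plain-syntax">! if anchored, refuse to try anywhere but position 0.</span>
<span class="plain-syntax">! Returns the position of the first match, or -1 if none was found</span>
<span class="plain-syntax">[ DemoScan anchored pos rv;</span>
<span class="plain-syntax">    for (pos=0: pos&lt;=DEMO_LEN: pos++) {</span>
<span class="plain-syntax">        if (anchored &amp;&amp; (pos > 0)) return -1;</span>
<span class="plain-syntax">        rv = DemoMatchAt(pos);</span>
<span class="plain-syntax">        if (rv >= 0) return pos;</span>
<span class="plain-syntax">    }</span>
<span class="plain-syntax">    return -1;</span>
<span class="plain-syntax">];</span>

<span class="plain-syntax">[ Main;</span>
<span class="plain-syntax">    print DemoScan(false), "^";  ! unanchored: "ab" is found at position 1</span>
<span class="plain-syntax">    print DemoScan(true), "^";   ! anchored: nothing matches at position 0</span>
<span class="plain-syntax">];</span>
</pre>

<p class="commentary">The sketch returns the match position for brevity, whereas the real routine
returns the match length and records the start and end positions in the root node.
</p>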
|
|
|
|
<pre class="displayed-code all-displayed-code code-font">
|
|
<span class="identifier-syntax">Global</span><span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_RewindCount</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_RE_PrintNoRewinds</span><span class="plain-syntax">; </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_RewindCount</span><span class="plain-syntax">; ];</span>
|
|
|
|
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">CIS_MFLAG</span><span class="plain-syntax"> = </span><span class="constant-syntax">1</span><span class="plain-syntax">;</span>
|
|
<span class="reserved-syntax">Constant</span><span class="plain-syntax"> </span><span class="identifier-syntax">ACCUM_MFLAG</span><span class="plain-syntax"> = </span><span class="constant-syntax">2</span><span class="plain-syntax">;</span>
|
|
|
|
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_RE_Parse</span><span class="plain-syntax"> </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax"> </span><span class="identifier-syntax">txt</span><span class="plain-syntax"> </span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> </span><span class="identifier-syntax">insens</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">ilen</span><span class="plain-syntax"> </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> </span><span class="identifier-syntax">root</span><span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax"> </span><span class="identifier-syntax">initial_mode</span><span class="plain-syntax">;</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">ilen</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_CharacterLength</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">ipos</span><span class="plain-syntax"><0) || (</span><span class="identifier-syntax">ipos</span><span class="plain-syntax">></span><span class="identifier-syntax">ilen</span><span class="plain-syntax">)) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> -1;</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">root</span><span class="plain-syntax"> = </span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">;</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">initial_mode</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">; </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">insens</span><span class="plain-syntax">) </span><span class="identifier-syntax">initial_mode</span><span class="plain-syntax"> = </span><span class="identifier-syntax">CIS_MFLAG</span><span class="plain-syntax">;</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_Clear_Markers</span><span class="plain-syntax">(</span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">);</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (:</span><span class="identifier-syntax">ipos</span><span class="plain-syntax"><=</span><span class="identifier-syntax">ilen</span><span class="plain-syntax">:</span><span class="identifier-syntax">ipos</span><span class="plain-syntax">++) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax"> ~= </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) &&</span>
|
|
<span class="plain-syntax"> ((</span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_CCLASS</span><span class="plain-syntax"> == </span><span class="identifier-syntax">START_RE_CC</span><span class="plain-syntax">) &&</span>
|
|
<span class="plain-syntax"> (</span><span class="identifier-syntax">ipos</span><span class="plain-syntax">>0)) { </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> = -1; </span><span class="reserved-syntax">break</span><span class="plain-syntax">; }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> > </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="identifier-syntax">TEXT_TY_RE_EraseConstraints</span><span class="plain-syntax">(</span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">, </span><span class="identifier-syntax">initial_mode</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_RewindCount</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_RE_ParseAtPosition</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">, </span><span class="identifier-syntax">ilen</span><span class="plain-syntax">, </span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">, </span><span class="identifier-syntax">initial_mode</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">rv</span><span class="plain-syntax"> >= </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">rv</span><span class="plain-syntax"> == -1) {</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">root</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax"> = -1;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">root</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax"> = -1;</span>
|
|
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> {</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">root</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax"> = </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">root</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax"> = </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">+</span><span class="identifier-syntax">rv</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">rv</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax">];</span>
|
|
</pre>
|
|
<p class="commentary firstcommentary"><a id="SP12" class="paragraph-anchor"></a><b>§12. Parse At Position. </b><span class="extract"><span class="extract-syntax">TEXT_TY_RE_ParseAtPosition(ftxt, txt, ifrom, ito)</span></span> attempts to match text
beginning at position <span class="extract"><span class="extract-syntax">ifrom</span></span> in the text <span class="extract"><span class="extract-syntax">txt</span></span> and extending for
any length up to position <span class="extract"><span class="extract-syntax">ito</span></span>: it returns the number of characters which
were matched (which can legitimately be 0), or \(-1\) if no match could be
made. <span class="extract"><span class="extract-syntax">ftxt</span></span> is the original text of the regular expression in its
precompiled form, which we need partly to print good debugging information,
but mostly in order to match against a <span class="extract"><span class="extract-syntax">LITERAL_RE_CC</span></span> node. Two further
arguments, <span class="extract"><span class="extract-syntax">token</span></span> and <span class="extract"><span class="extract-syntax">mode_flags</span></span>, give the node of the compiled tree at
which matching begins and the matching modes in force, such as <span class="extract"><span class="extract-syntax">CIS_MFLAG</span></span>
for case insensitivity.
</p>
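<p class="commentary">Because a match of length 0 is a success, a caller has to test the result
with <span class="extract"><span class="extract-syntax">&gt;= 0</span></span> rather than treating it as a truth value. The following
stand-alone Inform 6 sketch illustrates the convention only: <span class="extract"><span class="extract-syntax">DemoReport</span></span> is a
hypothetical routine, and the values passed to it are made-up results, not
output from the real matcher.
</p>

<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax">! Hypothetical caller: interprets a result in the "length or -1" convention</span>
<span class="plain-syntax">[ DemoReport rv;</span>
<span class="plain-syntax">    if (rv >= 0) print "matched ", rv, " character(s)^";</span>
<span class="plain-syntax">    else print "no match^";</span>
<span class="plain-syntax">];</span>

<span class="plain-syntax">[ Main;</span>
<span class="plain-syntax">    DemoReport(0);   ! a zero-length match still counts as a match</span>
<span class="plain-syntax">    DemoReport(3);   ! three characters matched</span>
<span class="plain-syntax">    DemoReport(-1);  ! failure</span>
<span class="plain-syntax">];</span>
</pre>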
|
|
|
|
<pre class="displayed-code all-displayed-code code-font">
|
|
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_RE_ParseAtPosition</span><span class="plain-syntax"> </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax"> </span><span class="identifier-syntax">txt</span><span class="plain-syntax"> </span><span class="identifier-syntax">ifrom</span><span class="plain-syntax"> </span><span class="identifier-syntax">ito</span><span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax"> </span><span class="identifier-syntax">mode_flags</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> </span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> </span><span class="identifier-syntax">npos</span><span class="plain-syntax"> </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax"> </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> </span><span class="identifier-syntax">edge</span><span class="plain-syntax"> </span><span class="identifier-syntax">rewind_this</span><span class="plain-syntax">;</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ifrom</span><span class="plain-syntax"> > </span><span class="identifier-syntax">ito</span><span class="plain-syntax">) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> -1;</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> = </span><span class="identifier-syntax">ifrom</span><span class="plain-syntax">;</span>
|
|
|
|
<span class="plain-syntax"> .</span><span class="identifier-syntax">Rewind</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">while</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax"> ~= </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">false</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"Matching at "</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">, </span><span class="string-syntax">": "</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_DebugNode</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="reserved-syntax">true</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> }</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ipos</span><span class="plain-syntax"><</span><span class="identifier-syntax">ito</span><span class="plain-syntax">) </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">); </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_MODES</span><span class="plain-syntax"> = </span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax">; </span><span class="comment-syntax">Save in case of backtrack</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">switch</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CCLASS</span><span class="plain-syntax">) {</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="comment-syntax">Should never happen</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">CHOICE_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="string-syntax">"internal error"</span><span class="plain-syntax">;</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="comment-syntax">Mode switches</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">SENSITIVITY_RE_CC</span><span class="plain-syntax">:</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR1</span><span class="plain-syntax">) </span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax"> = </span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax"> | </span><span class="identifier-syntax">CIS_MFLAG</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax"> = </span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax"> & (~</span><span class="identifier-syntax">CIS_MFLAG</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">;</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="comment-syntax">Zero-length positional markers</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">ALWAYS_RE_CC</span><span class="plain-syntax">:</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">NEVER_RE_CC</span><span class="plain-syntax">:</span>
|
|
<span class="plain-syntax"> ;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">START_RE_CC</span><span class="plain-syntax">:</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> == </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">END_RE_CC</span><span class="plain-syntax">:</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">) == </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">BOUNDARY_RE_CC</span><span class="plain-syntax">:</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">) == </span><span class="constant-syntax">0</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">10</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">13</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">32</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">9</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'.'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">','</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'!'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'?'</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'-'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'/'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'"'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">':'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">';'</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'('</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">')'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'['</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">']'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'{'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'}'</span><span class="plain-syntax">) </span><span class="identifier-syntax">rv</span><span class="plain-syntax">++;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> == </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">-1);</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ch</span><span class="plain-syntax"> == </span><span class="constant-syntax">0</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">10</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">13</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">32</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">9</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'.'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">','</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'!'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'?'</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'-'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'/'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'"'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">':'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">';'</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'('</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">')'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'['</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">']'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'{'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'}'</span><span class="plain-syntax">) </span><span class="identifier-syntax">rv</span><span class="plain-syntax">++;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">rv</span><span class="plain-syntax"> == </span><span class="constant-syntax">1</span><span class="plain-syntax">) </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">;</span>
|
|
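<span class="plain-syntax"> </span><span class="identifier-syntax">NONBOUNDARY_RE_CC</span><span class="plain-syntax">: </span><span class="comment-syntax">Matches if both or neither of the characters at ipos-1 and ipos are separators</span>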
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">) == </span><span class="constant-syntax">0</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">10</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">13</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">32</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">9</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'.'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">','</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'!'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'?'</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'-'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'/'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'"'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">':'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">';'</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'('</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">')'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'['</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">']'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'{'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'}'</span><span class="plain-syntax">) </span><span class="identifier-syntax">rv</span><span class="plain-syntax">++;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> == </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">-1);</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ch</span><span class="plain-syntax"> == </span><span class="constant-syntax">0</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">10</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">13</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">32</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">9</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'.'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">','</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'!'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'?'</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'-'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'/'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'"'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">':'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">';'</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'('</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">')'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'['</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">']'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'{'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'}'</span><span class="plain-syntax">) </span><span class="identifier-syntax">rv</span><span class="plain-syntax">++;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">rv</span><span class="plain-syntax"> ~= </span><span class="constant-syntax">1</span><span class="plain-syntax">) </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">;</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="comment-syntax">Control constructs</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">IF_RE_CC</span><span class="plain-syntax">:</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax"> = </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR1</span><span class="plain-syntax">; </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> = </span><span class="reserved-syntax">false</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"Trying conditional from "</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">, </span><span class="string-syntax">": "</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_DebugNode</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="reserved-syntax">true</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax"> >= </span><span class="constant-syntax">1</span><span class="plain-syntax">) { </span><span class="comment-syntax">Condition names subexpression i: true if it has matched</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">i</span><span class="plain-syntax"><</span><span class="identifier-syntax">RE_Subexpressions</span><span class="plain-syntax">-->10) &&</span>
|
|
<span class="plain-syntax"> ((</span><span class="identifier-syntax">RE_Subexpressions</span><span class="plain-syntax">--></span><span class="identifier-syntax">i</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax"> >= </span><span class="constant-syntax">0</span><span class="plain-syntax">)) </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> { </span><span class="comment-syntax">Condition is itself a pattern, tested at ipos</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_RE_ParseAtPosition</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">, </span><span class="identifier-syntax">ito</span><span class="plain-syntax">,</span>
|
|
<span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">, </span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">rv</span><span class="plain-syntax"> >= </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"Condition found to be "</span><span class="plain-syntax">, </span><span class="identifier-syntax">ch</span><span class="plain-syntax">, </span><span class="string-syntax">"^"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ch</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_RE_ParseAtPosition</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">, </span><span class="identifier-syntax">ito</span><span class="plain-syntax">,</span>
|
|
<span class="plain-syntax"> ((</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_NEXT</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">, </span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="comment-syntax">print "Then clause returned ", rv, "^";</span>
|
|
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((((</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_NEXT</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_NEXT</span><span class="plain-syntax">) == </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">)</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">; </span><span class="comment-syntax">The empty else clause matches</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_RE_ParseAtPosition</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">, </span><span class="identifier-syntax">ito</span><span class="plain-syntax">,</span>
|
|
<span class="plain-syntax"> (((</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_NEXT</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_NEXT</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">, </span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="comment-syntax">print "Else clause returned ", rv, "^";</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">rv</span><span class="plain-syntax"> >= </span><span class="constant-syntax">0</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> = </span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> + </span><span class="identifier-syntax">rv</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">DISJUNCTION_RE_CC</span><span class="plain-syntax">:</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"Trying disjunction from "</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">, </span><span class="string-syntax">": "</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_DebugNode</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="reserved-syntax">true</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ch</span><span class="plain-syntax"> = </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">: </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> ~= </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">: </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> = </span><span class="identifier-syntax">ch</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_NEXT</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ch</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR1</span><span class="plain-syntax"> <= </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CONSTRAINT</span><span class="plain-syntax">) </span><span class="reserved-syntax">continue</span><span class="plain-syntax">; </span><span class="comment-syntax">Skip options numbered at or below the constraint</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"Trying choice at "</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">, </span><span class="string-syntax">": "</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_DebugNode</span><span class="plain-syntax">(</span><span class="identifier-syntax">ch</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="reserved-syntax">true</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_RE_ParseAtPosition</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">, </span><span class="identifier-syntax">ito</span><span class="plain-syntax">,</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">ch</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">, </span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">rv</span><span class="plain-syntax"> >= </span><span class="constant-syntax">0</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax"> = </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">; </span><span class="comment-syntax">Where match was made</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax"> = </span><span class="identifier-syntax">ch</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR1</span><span class="plain-syntax">; </span><span class="comment-syntax">Option taken</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> = </span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> + </span><span class="identifier-syntax">rv</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"Choice worked with width "</span><span class="plain-syntax">, </span><span class="identifier-syntax">rv</span><span class="plain-syntax">, </span><span class="string-syntax">": "</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_DebugNode</span><span class="plain-syntax">(</span><span class="identifier-syntax">ch</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="reserved-syntax">true</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax"> & </span><span class="identifier-syntax">ACCUM_MFLAG</span><span class="plain-syntax"> == </span><span class="reserved-syntax">false</span><span class="plain-syntax">)</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_FailSubexpressions</span><span class="plain-syntax">(</span><span class="identifier-syntax">ch</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> == </span><span class="reserved-syntax">false</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"Failed disjunction from "</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">, </span><span class="string-syntax">": "</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_DebugNode</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="reserved-syntax">true</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax"> = </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">; </span><span class="comment-syntax">Where match was tried</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax"> = -1; </span><span class="comment-syntax">No option was taken</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">SUBEXP_RE_CC</span><span class="plain-syntax">:</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR2</span><span class="plain-syntax"> == </span><span class="constant-syntax">1</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">2</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">npos</span><span class="plain-syntax"> = </span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> - </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR3</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">npos</span><span class="plain-syntax"><0) </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> = -1; </span><span class="comment-syntax">Lookbehind fails: nothing behind</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_RE_ParseAtPosition</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">npos</span><span class="plain-syntax">, </span><span class="identifier-syntax">ito</span><span class="plain-syntax">, </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">,</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">switch</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR3</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="constant-syntax">0</span><span class="plain-syntax">: </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_RE_ParseAtPosition</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">, </span><span class="identifier-syntax">ito</span><span class="plain-syntax">, </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">,</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="constant-syntax">1</span><span class="plain-syntax">: </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_RE_ParseAtPosition</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">, </span><span class="identifier-syntax">ito</span><span class="plain-syntax">, </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">,</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax"> & (~</span><span class="identifier-syntax">CIS_MFLAG</span><span class="plain-syntax">));</span>
|
|
<span class="plain-syntax"> </span><span class="constant-syntax">2</span><span class="plain-syntax">: </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_RE_ParseAtPosition</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">, </span><span class="identifier-syntax">ito</span><span class="plain-syntax">, </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">,</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax"> | </span><span class="identifier-syntax">CIS_MFLAG</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">npos</span><span class="plain-syntax"> = </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">rv</span><span class="plain-syntax"> >= </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="identifier-syntax">npos</span><span class="plain-syntax"> = </span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> + </span><span class="identifier-syntax">rv</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">switch</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR2</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="constant-syntax">1</span><span class="plain-syntax">: </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">rv</span><span class="plain-syntax"> >= </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="constant-syntax">2</span><span class="plain-syntax">: </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">rv</span><span class="plain-syntax"> >= </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> = -1; </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">rv</span><span class="plain-syntax"> >= </span><span class="constant-syntax">0</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax"> = </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> = </span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> + </span><span class="identifier-syntax">rv</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax"> = </span><span class="identifier-syntax">npos</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax"> & </span><span class="identifier-syntax">ACCUM_MFLAG</span><span class="plain-syntax"> == </span><span class="reserved-syntax">false</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax"> = -1;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax"> = -1;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR2</span><span class="plain-syntax"> == </span><span class="constant-syntax">2</span><span class="plain-syntax">) </span><span class="identifier-syntax">TEXT_TY_RE_FailSubexpressions</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">, </span><span class="reserved-syntax">true</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">QUANTIFIER_RE_CC</span><span class="plain-syntax">:</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax"> = </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_CCLASS</span><span class="plain-syntax"> == </span><span class="identifier-syntax">SUBEXP_RE_CC</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_CACHE1</span><span class="plain-syntax"> = -1;</span>
|
|
<span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_CACHE2</span><span class="plain-syntax"> = -1;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"Trying quantifier from "</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">, </span><span class="string-syntax">": "</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_DebugNode</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="reserved-syntax">true</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR3</span><span class="plain-syntax"> == </span><span class="reserved-syntax">false</span><span class="plain-syntax">) { </span><span class="comment-syntax">Greedy quantifier</span>
|
|
<span class="plain-syntax"> </span><span class="comment-syntax">edge = ito; if (token-->RE_CONSTRAINT >= 0) edge = token-->RE_CONSTRAINT;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">edge</span><span class="plain-syntax"> = </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR2</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CONSTRAINT</span><span class="plain-syntax"> >= </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="identifier-syntax">edge</span><span class="plain-syntax"> = </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CONSTRAINT</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> = -1;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax">=0, </span><span class="identifier-syntax">npos</span><span class="plain-syntax">=</span><span class="identifier-syntax">ipos</span><span class="plain-syntax">: </span><span class="identifier-syntax">i</span><span class="plain-syntax"><</span><span class="identifier-syntax">edge</span><span class="plain-syntax">: </span><span class="identifier-syntax">i</span><span class="plain-syntax">++) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"Trying quant rep "</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">+1, </span><span class="string-syntax">" at "</span><span class="plain-syntax">, </span><span class="identifier-syntax">npos</span><span class="plain-syntax">, </span><span class="string-syntax">": "</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_DebugNode</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="reserved-syntax">true</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_RE_ParseAtPosition</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">npos</span><span class="plain-syntax">, </span><span class="identifier-syntax">ito</span><span class="plain-syntax">, </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">,</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax"> | </span><span class="identifier-syntax">ACCUM_MFLAG</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">rv</span><span class="plain-syntax"> < </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_CCLASS</span><span class="plain-syntax"> == </span><span class="identifier-syntax">SUBEXP_RE_CC</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_CACHE1</span><span class="plain-syntax"> = (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_CACHE2</span><span class="plain-syntax"> = (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">rv</span><span class="plain-syntax"> == </span><span class="constant-syntax">0</span><span class="plain-syntax">) && (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR2</span><span class="plain-syntax"> == </span><span class="constant-syntax">30000</span><span class="plain-syntax">) && (</span><span class="identifier-syntax">i</span><span class="plain-syntax">>=1)) { </span><span class="identifier-syntax">i</span><span class="plain-syntax">++; </span><span class="reserved-syntax">break</span><span class="plain-syntax">; } </span><span class="comment-syntax">Zero-width repetition: count it and stop</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">npos</span><span class="plain-syntax"> = </span><span class="identifier-syntax">npos</span><span class="plain-syntax"> + </span><span class="identifier-syntax">rv</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">i</span><span class="plain-syntax"> >= </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR1</span><span class="plain-syntax">) && (</span><span class="identifier-syntax">i</span><span class="plain-syntax"> <= </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR2</span><span class="plain-syntax">))</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> { </span><span class="comment-syntax">Lazy quantifier</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">edge</span><span class="plain-syntax"> = </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR1</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CONSTRAINT</span><span class="plain-syntax"> > </span><span class="identifier-syntax">edge</span><span class="plain-syntax">) </span><span class="identifier-syntax">edge</span><span class="plain-syntax"> = </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CONSTRAINT</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax">=0, </span><span class="identifier-syntax">npos</span><span class="plain-syntax">=</span><span class="identifier-syntax">ipos</span><span class="plain-syntax">: (</span><span class="identifier-syntax">npos</span><span class="plain-syntax"><</span><span class="identifier-syntax">ito</span><span class="plain-syntax">) && (</span><span class="identifier-syntax">i</span><span class="plain-syntax"> < </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR2</span><span class="plain-syntax">): </span><span class="identifier-syntax">i</span><span class="plain-syntax">++) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax"> >= </span><span class="identifier-syntax">edge</span><span class="plain-syntax">) </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"Trying quant rep "</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">+1, </span><span class="string-syntax">" at "</span><span class="plain-syntax">, </span><span class="identifier-syntax">npos</span><span class="plain-syntax">, </span><span class="string-syntax">": "</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_DebugNode</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="reserved-syntax">true</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_RE_ParseAtPosition</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">npos</span><span class="plain-syntax">, </span><span class="identifier-syntax">ito</span><span class="plain-syntax">, </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">,</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax"> | </span><span class="identifier-syntax">ACCUM_MFLAG</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">rv</span><span class="plain-syntax"> < </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_CCLASS</span><span class="plain-syntax"> == </span><span class="identifier-syntax">SUBEXP_RE_CC</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_CACHE1</span><span class="plain-syntax"> = (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_CACHE2</span><span class="plain-syntax"> = (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">rv</span><span class="plain-syntax"> == </span><span class="constant-syntax">0</span><span class="plain-syntax">) && (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR2</span><span class="plain-syntax"> == </span><span class="constant-syntax">30000</span><span class="plain-syntax">) && (</span><span class="identifier-syntax">i</span><span class="plain-syntax">>=1)) { </span><span class="identifier-syntax">i</span><span class="plain-syntax">++; </span><span class="reserved-syntax">break</span><span class="plain-syntax">; } </span><span class="comment-syntax">Zero-width repetition: count it and stop</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">npos</span><span class="plain-syntax"> = </span><span class="identifier-syntax">npos</span><span class="plain-syntax"> + </span><span class="identifier-syntax">rv</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">i</span><span class="plain-syntax"> >= </span><span class="identifier-syntax">edge</span><span class="plain-syntax">) && (</span><span class="identifier-syntax">i</span><span class="plain-syntax"> <= </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR2</span><span class="plain-syntax">))</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">outcome</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR3</span><span class="plain-syntax"> == </span><span class="reserved-syntax">false</span><span class="plain-syntax">) { </span><span class="comment-syntax">Greedy quantifier</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax"> > </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR1</span><span class="plain-syntax">) { </span><span class="comment-syntax">I.e., if we have been greedy</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax"> = </span><span class="identifier-syntax">i</span><span class="plain-syntax">-1; </span><span class="comment-syntax">And its edge limitation</span>
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax"> = -1;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> { </span><span class="comment-syntax">Lazy quantifier</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax"> < </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR2</span><span class="plain-syntax">) { </span><span class="comment-syntax">I.e., if we have been lazy</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax"> = </span><span class="identifier-syntax">i</span><span class="plain-syntax">+1; </span><span class="comment-syntax">Edge limitation: on backtracking, try at least i+1 repetitions</span>
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax"> = -1;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> = </span><span class="identifier-syntax">npos</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">i</span><span class="plain-syntax"> == </span><span class="constant-syntax">0</span><span class="plain-syntax">) && (</span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax"> & </span><span class="identifier-syntax">ACCUM_MFLAG</span><span class="plain-syntax"> == </span><span class="reserved-syntax">false</span><span class="plain-syntax">))</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_FailSubexpressions</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_CCLASS</span><span class="plain-syntax"> == </span><span class="identifier-syntax">SUBEXP_RE_CC</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax"> = (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_CACHE1</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax"> = (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_CACHE2</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"Successful quant reps "</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">, </span><span class="string-syntax">": "</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_DebugNode</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="reserved-syntax">true</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> {</span>
<span class="plain-syntax"> </span><span class="comment-syntax">token-->RE_DATA2 = -1;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax"> & </span><span class="identifier-syntax">ACCUM_MFLAG</span><span class="plain-syntax"> == </span><span class="reserved-syntax">false</span><span class="plain-syntax">)</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_FailSubexpressions</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"Failed quant reps "</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">, </span><span class="string-syntax">": "</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_DebugNode</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="reserved-syntax">true</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="comment-syntax">Character classes</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">NOTHING_RE_CC</span><span class="plain-syntax">: ;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">ANYTHING_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ch</span><span class="plain-syntax">) </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">; </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">++;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">WHITESPACE_RE_CC</span><span class="plain-syntax">:</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ch</span><span class="plain-syntax"> == </span><span class="constant-syntax">10</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">13</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">32</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">9</span><span class="plain-syntax">) { </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">; </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">++; }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">NONWHITESPACE_RE_CC</span><span class="plain-syntax">:</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">ch</span><span class="plain-syntax">) && (</span><span class="identifier-syntax">ch</span><span class="plain-syntax"> ~= </span><span class="constant-syntax">10</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">13</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">32</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">9</span><span class="plain-syntax">)) { </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">; </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">++; }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">PUNCTUATION_RE_CC</span><span class="plain-syntax">:</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ch</span><span class="plain-syntax"> == </span><span class="character-syntax">'.'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">','</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'!'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'?'</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'-'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'/'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'"'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">':'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">';'</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'('</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">')'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'['</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">']'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'{'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'}'</span><span class="plain-syntax">) { </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">; </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">++; }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">NONPUNCTUATION_RE_CC</span><span class="plain-syntax">:</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">ch</span><span class="plain-syntax">) && (</span><span class="identifier-syntax">ch</span><span class="plain-syntax"> ~= </span><span class="character-syntax">'.'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">','</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'!'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'?'</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'-'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'/'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'"'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">':'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">';'</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'('</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">')'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'['</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">']'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'{'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'}'</span><span class="plain-syntax">)) { </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">; </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">++; }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">WORD_RE_CC</span><span class="plain-syntax">:</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">ch</span><span class="plain-syntax">) && (</span><span class="identifier-syntax">ch</span><span class="plain-syntax"> ~= </span><span class="constant-syntax">10</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">13</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">32</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">9</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'.'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">','</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'!'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'?'</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'-'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'/'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'"'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">':'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">';'</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'('</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">')'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'['</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">']'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'{'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'}'</span><span class="plain-syntax">)) { </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">; </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">++; }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">NONWORD_RE_CC</span><span class="plain-syntax">:</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ch</span><span class="plain-syntax"> == </span><span class="constant-syntax">10</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">13</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">32</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">9</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'.'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">','</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'!'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'?'</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'-'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'/'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'"'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">':'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">';'</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'('</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">')'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'['</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">']'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'{'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'}'</span><span class="plain-syntax">) { </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">; </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">++; }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">DIGIT_RE_CC</span><span class="plain-syntax">:</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ch</span><span class="plain-syntax"> == </span><span class="character-syntax">'0'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'1'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'2'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'3'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'4'</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'5'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'6'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'7'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'8'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'9'</span><span class="plain-syntax">) { </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">; </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">++; }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">NONDIGIT_RE_CC</span><span class="plain-syntax">:</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">ch</span><span class="plain-syntax">) && (</span><span class="identifier-syntax">ch</span><span class="plain-syntax"> ~= </span><span class="character-syntax">'0'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'1'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'2'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'3'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'4'</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'5'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'6'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'7'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'8'</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="character-syntax">'9'</span><span class="plain-syntax">)) { </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">; </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">++; }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">LCASE_RE_CC</span><span class="plain-syntax">:</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">CharIsOfCase</span><span class="plain-syntax">(</span><span class="identifier-syntax">ch</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">)) { </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">; </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">++; }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">NONLCASE_RE_CC</span><span class="plain-syntax">:</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">ch</span><span class="plain-syntax">) && (</span><span class="identifier-syntax">CharIsOfCase</span><span class="plain-syntax">(</span><span class="identifier-syntax">ch</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">) == </span><span class="reserved-syntax">false</span><span class="plain-syntax">)) { </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">; </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">++; }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">UCASE_RE_CC</span><span class="plain-syntax">:</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">CharIsOfCase</span><span class="plain-syntax">(</span><span class="identifier-syntax">ch</span><span class="plain-syntax">, </span><span class="constant-syntax">1</span><span class="plain-syntax">)) { </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">; </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">++; }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">NONUCASE_RE_CC</span><span class="plain-syntax">:</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">ch</span><span class="plain-syntax">) && (</span><span class="identifier-syntax">CharIsOfCase</span><span class="plain-syntax">(</span><span class="identifier-syntax">ch</span><span class="plain-syntax">, </span><span class="constant-syntax">1</span><span class="plain-syntax">) == </span><span class="reserved-syntax">false</span><span class="plain-syntax">)) { </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">; </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">++; }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">NEWLINE_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ch</span><span class="plain-syntax"> == </span><span class="constant-syntax">10</span><span class="plain-syntax">) { </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">; </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">++; }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">TAB_RE_CC</span><span class="plain-syntax">: </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ch</span><span class="plain-syntax"> == </span><span class="constant-syntax">9</span><span class="plain-syntax">) { </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">; </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">++; }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">RANGE_RE_CC</span><span class="plain-syntax">:</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Range</span><span class="plain-syntax">(</span><span class="identifier-syntax">ch</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">,</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR1</span><span class="plain-syntax">, </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR2</span><span class="plain-syntax">, </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR3</span><span class="plain-syntax">, </span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax"> & </span><span class="identifier-syntax">CIS_MFLAG</span><span class="plain-syntax">))</span>
|
|
<span class="plain-syntax"> { </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">; </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">++; }</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="comment-syntax">Substring matches</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">LITERAL_RE_CC</span><span class="plain-syntax">:</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_RE_MatchSubstring</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">,</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR1</span><span class="plain-syntax">, </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR2</span><span class="plain-syntax">, </span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax"> & </span><span class="identifier-syntax">CIS_MFLAG</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">rv</span><span class="plain-syntax"> >= </span><span class="constant-syntax">0</span><span class="plain-syntax">) { </span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> = </span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> + </span><span class="identifier-syntax">rv</span><span class="plain-syntax">; </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">; }</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">VARIABLE_RE_CC</span><span class="plain-syntax">:</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax"> = </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR1</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">RE_Subexpressions</span><span class="plain-syntax">--></span><span class="identifier-syntax">i</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax"> >= </span><span class="constant-syntax">0</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">rv</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_RE_MatchSubstring</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">,</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">txt</span><span class="plain-syntax">, (</span><span class="identifier-syntax">RE_Subexpressions</span><span class="plain-syntax">--></span><span class="identifier-syntax">i</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax">,</span>
|
|
<span class="plain-syntax"> (</span><span class="identifier-syntax">RE_Subexpressions</span><span class="plain-syntax">--></span><span class="identifier-syntax">i</span><span class="plain-syntax">)--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax">, </span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax"> & </span><span class="identifier-syntax">CIS_MFLAG</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">rv</span><span class="plain-syntax"> >= </span><span class="constant-syntax">0</span><span class="plain-syntax">) { </span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> = </span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> + </span><span class="identifier-syntax">rv</span><span class="plain-syntax">; </span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">; }</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> .</span><span class="identifier-syntax">NeverMatchIncompleteVar</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">outcome</span><span class="plain-syntax"> == </span><span class="reserved-syntax">false</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_RewindCount</span><span class="plain-syntax">++ >= </span><span class="constant-syntax">10000</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_RewindCount</span><span class="plain-syntax"> == </span><span class="constant-syntax">10001</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">style</span><span class="plain-syntax"> </span><span class="identifier-syntax">bold</span><span class="plain-syntax">; </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"OVERFLOW^"</span><span class="plain-syntax">; </span><span class="reserved-syntax">style</span><span class="plain-syntax"> </span><span class="identifier-syntax">roman</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> -1;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"Rewind sought from failure at pos "</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">, </span><span class="string-syntax">" with: "</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_DebugNode</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="reserved-syntax">true</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CCLASS</span><span class="plain-syntax"> == </span><span class="identifier-syntax">QUANTIFIER_RE_CC</span><span class="plain-syntax">) &&</span>
|
|
<span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_SeekBacktrack</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="reserved-syntax">false</span><span class="plain-syntax">, </span><span class="identifier-syntax">ito</span><span class="plain-syntax">, </span><span class="reserved-syntax">false</span><span class="plain-syntax">)))</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">jump</span><span class="plain-syntax"> </span><span class="identifier-syntax">RewindFound</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax"> & </span><span class="identifier-syntax">ACCUM_MFLAG</span><span class="plain-syntax"> == </span><span class="reserved-syntax">false</span><span class="plain-syntax">) </span><span class="identifier-syntax">TEXT_TY_RE_FailSubexpressions</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax"> = </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PREVIOUS</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">while</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax"> ~= </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_SeekBacktrack</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="reserved-syntax">true</span><span class="plain-syntax">, </span><span class="identifier-syntax">ito</span><span class="plain-syntax">, </span><span class="reserved-syntax">false</span><span class="plain-syntax">)) {</span>
|
|
<span class="plain-syntax"> .</span><span class="identifier-syntax">RewindFound</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> = </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax"> = </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_MODES</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">mode_flags</span><span class="plain-syntax"> & </span><span class="identifier-syntax">ACCUM_MFLAG</span><span class="plain-syntax"> == </span><span class="reserved-syntax">false</span><span class="plain-syntax">)</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_FailSubexpressions</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">, </span><span class="reserved-syntax">true</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> == -1)</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_DebugTree</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="reserved-syntax">true</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"^["</span><span class="plain-syntax">, </span><span class="identifier-syntax">ifrom</span><span class="plain-syntax">, </span><span class="string-syntax">","</span><span class="plain-syntax">, </span><span class="identifier-syntax">ito</span><span class="plain-syntax">, </span><span class="string-syntax">"] rewinding to "</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">, </span><span class="string-syntax">" at "</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_DebugNode</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="reserved-syntax">true</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">jump</span><span class="plain-syntax"> </span><span class="identifier-syntax">Rewind</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax"> = </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PREVIOUS</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">)</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"^Rewind impossible^"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> -1;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax"> = </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_NEXT</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> - </span><span class="identifier-syntax">ifrom</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax">];</span>
|
|
</pre>
<p class="commentary firstcommentary"><a id="SP13" class="paragraph-anchor"></a><b>&#167;13. Backtracking. </b>It would be very straightforward to match regular expressions with the
above recursive code if, for every node, there were a fixed number of
characters (depending on the node) such that there would either be a
match eating that many characters, or else no match at all. If that
were true, we could simply march through the text matching until we
could match no more, and although some nodes might have ambiguous readings,
we could always match the first possibility which worked. There would never
be any need to retreat.
</p>
<p class="commentary">Well, in fact that happy state does apply to a surprising number of nodes,
and some quite complicated regular expressions can be made which use only
them: <span class="extract"><span class="extract-syntax">(&lt;abc&gt;{2})\d\d\1</span></span>, for instance, matches a sequence of exactly 6
characters or else fails to match altogether, and there is never any need
to backtrack. One reason why backtracking is a fairly good algorithm in
practice is that these "good" cases occur fairly often, in subexpressions
if not in the entire expression, and the simple method above disposes of
them efficiently.
</p>
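<p class="commentary">As a minimal sketch of that no-retreat case, and not the kit's own code, the
following stand-alone Inform 6 program hard-codes the example pattern, reading
<span class="extract"><span class="extract-syntax">\1</span></span> as a repeat of the two characters matched by <span class="extract"><span class="extract-syntax">&lt;abc&gt;{2}</span></span>.
The names <span class="extract"><span class="extract-syntax">subject</span></span>, <span class="extract"><span class="extract-syntax">IsABC</span></span>, <span class="extract"><span class="extract-syntax">IsDigit</span></span> and <span class="extract"><span class="extract-syntax">MatchFixed</span></span> are hypothetical,
invented for this illustration. Every test looks at a fixed offset from the
starting position, so a failure anywhere is final and nothing is ever retried.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax">! Illustrative sketch only: a hard-coded, fixed-width match with no retreat.</span>
<span class="plain-syntax">Array subject string "ca42ca";   ! subject-&gt;0 holds the length; characters follow</span>

<span class="plain-syntax">[ IsABC ch; if (ch == 'a' or 'b' or 'c') rtrue; rfalse; ];</span>
<span class="plain-syntax">[ IsDigit ch; if ((ch &gt;= '0') &amp;&amp; (ch &lt;= '9')) rtrue; rfalse; ];</span>

<span class="plain-syntax">! Try to match exactly 6 characters starting at position pos (counting from 1).</span>
<span class="plain-syntax">[ MatchFixed pos;</span>
<span class="plain-syntax">    if (pos + 5 &gt; subject-&gt;0) rfalse;                ! fewer than 6 characters left</span>
<span class="plain-syntax">    if (~~IsABC(subject-&gt;pos)) rfalse;               ! first of the two a/b/c characters</span>
<span class="plain-syntax">    if (~~IsABC(subject-&gt;(pos + 1))) rfalse;         ! second of them</span>
<span class="plain-syntax">    if (~~IsDigit(subject-&gt;(pos + 2))) rfalse;       ! first digit</span>
<span class="plain-syntax">    if (~~IsDigit(subject-&gt;(pos + 3))) rfalse;       ! second digit</span>
<span class="plain-syntax">    if (subject-&gt;(pos + 4) ~= subject-&gt;pos) rfalse;  ! repeat of the first pair...</span>
<span class="plain-syntax">    if (subject-&gt;(pos + 5) ~= subject-&gt;(pos + 1)) rfalse;</span>
<span class="plain-syntax">    rtrue;</span>
<span class="plain-syntax">];</span>

<span class="plain-syntax">[ Main;</span>
<span class="plain-syntax">    if (MatchFixed(1)) print "Matched 6 characters^";</span>
<span class="plain-syntax">    else print "No match^";</span>
<span class="plain-syntax">];</span>
</pre>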
<p class="commentary">But in an expression like <span class="extract"><span class="extract-syntax">ab+bb</span></span>, there is no alternative to backtracking
if we are going to try to match the nodes from left to right: we match the
"a", then we match as many "b"s as we can, but then we find that we
have to match "bb", and this is necessarily impossible because we have
just eaten all of the "b"s available. We therefore backtrack one node
to the <span class="extract"><span class="extract-syntax">b+</span></span> and try again. We obviously can't literally try again because
that would give the same result: instead we impose a constraint. Suppose
it previously matched a row of 23 letter "b"s, so that the quantifier <span class="extract"><span class="extract-syntax">+</span></span>
resulted in a multiplicity of 23. We then constrain the node and in effect
consider it to be <span class="extract"><span class="extract-syntax">b{1,22}</span></span>, that is, to match at least 1 and at most 22
letter "b"s. That still won't work, as it happens, so we backtrack again
with a constraint tightened to make it <span class="extract"><span class="extract-syntax">b{1,21}</span></span>, and now the match occurs
as we would hope. When the expression becomes more complex, backtracking
becomes a longer-distance, recursive procedure: we have to exhaust all
possibilities of a more recent node before tracking back to one from longer
ago. (This is why the worst test cases are those which entice us into a long,
long series of matches only to find that a guess made right back at the
start was ill-fated.)
</p>
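<p class="commentary">The constraint tightening described above can be sketched as a toy, stand-alone
Inform 6 program. This is an illustration under stated assumptions and not the
kit's implementation: the pattern <span class="extract"><span class="extract-syntax">ab+bb</span></span> is hard-coded, the subject has five
"b"s rather than 23, and the names <span class="extract"><span class="extract-syntax">subject</span></span>, <span class="extract"><span class="extract-syntax">CountBs</span></span>, <span class="extract"><span class="extract-syntax">MatchBB</span></span> and <span class="extract"><span class="extract-syntax">cap</span></span>
are hypothetical. The variable <span class="extract"><span class="extract-syntax">cap</span></span> plays the part of the constraint: each
failure of the following "bb" lowers it to one less than the multiplicity just tried.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax">! Illustrative sketch only: constraint tightening for ab+bb.</span>
<span class="plain-syntax">Array subject string "abbbbb";   ! subject-&gt;0 holds the length; characters follow</span>

<span class="plain-syntax">! Greedily count consecutive "b"s from position pos, taking at most cap of them.</span>
<span class="plain-syntax">[ CountBs pos cap n;</span>
<span class="plain-syntax">    n = 0;</span>
<span class="plain-syntax">    while ((n &lt; cap) &amp;&amp; (pos + n &lt;= subject-&gt;0) &amp;&amp; (subject-&gt;(pos + n) == 'b')) n++;</span>
<span class="plain-syntax">    return n;</span>
<span class="plain-syntax">];</span>

<span class="plain-syntax">! Does the literal "bb" occur at position pos?</span>
<span class="plain-syntax">[ MatchBB pos;</span>
<span class="plain-syntax">    if (pos + 1 &gt; subject-&gt;0) rfalse;</span>
<span class="plain-syntax">    if ((subject-&gt;pos == 'b') &amp;&amp; (subject-&gt;(pos + 1) == 'b')) rtrue;</span>
<span class="plain-syntax">    rfalse;</span>
<span class="plain-syntax">];</span>

<span class="plain-syntax">[ Main cap n;</span>
<span class="plain-syntax">    if (subject-&gt;1 ~= 'a') { print "No match^"; return; }</span>
<span class="plain-syntax">    cap = subject-&gt;0;                 ! initially no effective constraint</span>
<span class="plain-syntax">    while (cap &gt;= 1) {</span>
<span class="plain-syntax">        n = CountBs(2, cap);          ! match b{1,cap} as greedily as allowed</span>
<span class="plain-syntax">        if (n &lt; 1) break;             ! b+ needs at least one "b"</span>
<span class="plain-syntax">        print "b+ takes ", n, ": ";</span>
<span class="plain-syntax">        if (MatchBB(2 + n)) { print "bb matches, success^"; return; }</span>
<span class="plain-syntax">        print "bb fails, tightening^";</span>
<span class="plain-syntax">        cap = n - 1;                  ! constrain to one fewer than last time</span>
<span class="plain-syntax">    }</span>
<span class="plain-syntax">    print "No match^";</span>
<span class="plain-syntax">];</span>
</pre>
<p class="commentary">With the five-"b" subject shown, the loop tries multiplicities 5, 4 and then 3
before the trailing "bb" fits, mirroring the 23, 22, 21 sequence in the text.
</p>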
<p class="commentary">Rather than describing <span class="extract"><span class="extract-syntax">TEXT_TY_RE_SeekBacktrack</span></span> in detail here, it is probably
more useful to suggest that the reader observe it in action by setting
<span class="extract"><span class="extract-syntax">TEXT_TY_RE_Trace</span></span> and trying a few regular expressions.
</p>
<pre class="displayed-code all-displayed-code code-font">
|
|
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_RE_SeekBacktrack</span><span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax"> </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax"> </span><span class="identifier-syntax">downwards</span><span class="plain-syntax"> </span><span class="identifier-syntax">ito</span><span class="plain-syntax"> </span><span class="identifier-syntax">report_only</span><span class="plain-syntax"> </span><span class="identifier-syntax">untried</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (: </span><span class="identifier-syntax">token</span><span class="plain-syntax"> ~= </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">: </span><span class="identifier-syntax">token</span><span class="plain-syntax"> = </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_NEXT</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">) && (</span><span class="identifier-syntax">report_only</span><span class="plain-syntax"> == </span><span class="reserved-syntax">false</span><span class="plain-syntax">)) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"Scan for rewind: "</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_DebugNode</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="reserved-syntax">true</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CCLASS</span><span class="plain-syntax"> == </span><span class="identifier-syntax">SUBEXP_RE_CC</span><span class="plain-syntax">) &&</span>
|
|
<span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR2</span><span class="plain-syntax"> == </span><span class="constant-syntax">1</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">2</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">4</span><span class="plain-syntax">)) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">downwards</span><span class="plain-syntax">) </span><span class="reserved-syntax">rfalse</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">continue</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax"> ~= </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">) && (</span><span class="identifier-syntax">report_only</span><span class="plain-syntax"> == </span><span class="reserved-syntax">false</span><span class="plain-syntax">)) </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"Descend^"</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_SeekBacktrack</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="reserved-syntax">false</span><span class="plain-syntax">, </span><span class="identifier-syntax">ito</span><span class="plain-syntax">, </span><span class="identifier-syntax">report_only</span><span class="plain-syntax">)) </span><span class="reserved-syntax">rtrue</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">untried</span><span class="plain-syntax"> = </span><span class="reserved-syntax">false</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">switch</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CCLASS</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">DISJUNCTION_RE_CC</span><span class="plain-syntax">:</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax"> >= </span><span class="constant-syntax">1</span><span class="plain-syntax">) &&</span>
|
|
<span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax"> < </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR1</span><span class="plain-syntax">) &&</span>
|
|
<span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CONSTRAINT</span><span class="plain-syntax"> < </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_PAR1</span><span class="plain-syntax">)) { </span><span class="comment-syntax">Matched but earlier than last</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">report_only</span><span class="plain-syntax">) </span><span class="reserved-syntax">rtrue</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CONSTRAINT</span><span class="plain-syntax"> == -1)</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CONSTRAINT</span><span class="plain-syntax"> = </span><span class="constant-syntax">1</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span>
|
|
<span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CONSTRAINT</span><span class="plain-syntax">)++;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">untried</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">QUANTIFIER_RE_CC</span><span class="plain-syntax">:</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CONSTRAINT</span><span class="plain-syntax"> ~= -2) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">) && (</span><span class="identifier-syntax">report_only</span><span class="plain-syntax"> == </span><span class="reserved-syntax">false</span><span class="plain-syntax">)) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"Quant with cons not -2: "</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_DebugNode</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="reserved-syntax">true</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax"> >= </span><span class="constant-syntax">0</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">report_only</span><span class="plain-syntax">) </span><span class="reserved-syntax">rtrue</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CONSTRAINT</span><span class="plain-syntax"> = </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">untried</span><span class="plain-syntax"> = </span><span class="reserved-syntax">true</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">untried</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"Grounds for rewind at: "</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_DebugNode</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="reserved-syntax">true</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_EraseConstraints</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_NEXT</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_EraseConstraints</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">rtrue</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">downwards</span><span class="plain-syntax">) </span><span class="reserved-syntax">rfalse</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">rfalse</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax">];</span>
|
|
</pre>
|
|
<p class="commentary firstcommentary"><a id="SP14" class="paragraph-anchor"></a><b>§14. Fail Subexpressions. </b>Here, an attempt to make a complicated match against the node in <span class="extract"><span class="extract-syntax">token</span></span> has
failed: that means that any subexpressions which were matched in the course of
the attempt must also in retrospect be considered unmatched. So we work down
through the subtree at <span class="extract"><span class="extract-syntax">token</span></span> and empty any markers for subexpressions,
which in effect clears their backslash variables. This is important because,
otherwise, the contents left over could cause the alternative reading of the
<span class="extract"><span class="extract-syntax">token</span></span> to be misparsed if it refers to the backslash variables in question.
(If you think nobody would ever be crazy enough to write a regular expression
like that, you haven't seen Perl's test suite.)
</p>
|
|
|
|
<p class="commentary">If the <span class="extract"><span class="extract-syntax">downwards</span></span> flag is clear, the routine invalidates subexpression
matches not only below the node but also to the right of the node, which is useful
for a backtrack which runs back quite some distance. If the flag is set, only the
node itself and the subtree below it are affected.
</p>
|
|
|
|
<pre class="displayed-code all-displayed-code code-font">
|
|
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_RE_FailSubexpressions</span><span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax"> </span><span class="identifier-syntax">downwards</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (: </span><span class="identifier-syntax">token</span><span class="plain-syntax"> ~= </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">: </span><span class="identifier-syntax">token</span><span class="plain-syntax"> = </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_NEXT</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax"> ~= </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) </span><span class="identifier-syntax">TEXT_TY_RE_FailSubexpressions</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CCLASS</span><span class="plain-syntax"> == </span><span class="identifier-syntax">SUBEXP_RE_CC</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax"> = -1;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax"> = -1;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">downwards</span><span class="plain-syntax">) </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax">];</span>
|
|
</pre>
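<p class="commentary">The effect of the <span class="extract"><span class="extract-syntax">downwards</span></span> flag can be seen in a small standalone model. This is only a sketch, written in Python rather than Inform 6 so that it runs on its own: the <span class="extract"><span class="extract-syntax">Node</span></span> class and its field names are hypothetical stand-ins for the node fields used above, and the marker values are arbitrary.
</p>

```python
from dataclasses import dataclass
from typing import Optional

SUBEXP = "subexp"    # stands in for SUBEXP_RE_CC

@dataclass
class Node:
    cclass: str
    data1: int = -1                 # stands in for RE_DATA1
    data2: int = -1                 # stands in for RE_DATA2
    next: Optional["Node"] = None   # stands in for RE_NEXT
    down: Optional["Node"] = None   # stands in for RE_DOWN

def fail_subexpressions(token, downwards=False):
    """Reset subexpression markers below token, and to its right unless downwards."""
    while token is not None:
        if token.down is not None:
            # The flag is not passed on, so the whole row of children is cleared.
            fail_subexpressions(token.down)
        if token.cclass == SUBEXP:
            token.data1 = -1
            token.data2 = -1
        if downwards:
            break
        token = token.next

# Two sibling groups, the first of which contains a nested group.
inner = Node(SUBEXP, 0, 1)
first = Node(SUBEXP, 0, 1, down=inner)
second = Node(SUBEXP, 1, 2)
first.next = second

fail_subexpressions(first, downwards=True)
after_downwards = (first.data1, inner.data1, second.data1)   # second is untouched

fail_subexpressions(first)
after_full = (first.data1, inner.data1, second.data1)        # second is cleared too

print(after_downwards, after_full)
```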
|
|
<p class="commentary firstcommentary"><a id="SP15" class="paragraph-anchor"></a><b>§15. Erasing Constraints. </b>As explained above, temporary constraints are placed on some nodes when we
are backtracking to test possible cases. When we do backtrack, though, it's
important to lift any constraints left over from the previous attempt to
parse material which is part of or subsequent to the token whose match
attempt has been abandoned. The routine therefore resets the constraint to -1
on every disjunction and quantifier node found at <span class="extract"><span class="extract-syntax">token</span></span>, to its right,
or below any of those nodes.
</p>
|
|
|
|
<pre class="displayed-code all-displayed-code code-font">
|
|
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_RE_EraseConstraints</span><span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">while</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax"> ~= </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">switch</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CCLASS</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">DISJUNCTION_RE_CC</span><span class="plain-syntax">: </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CONSTRAINT</span><span class="plain-syntax"> = -1;</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">QUANTIFIER_RE_CC</span><span class="plain-syntax">: </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_CONSTRAINT</span><span class="plain-syntax"> = -1;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">) </span><span class="identifier-syntax">TEXT_TY_RE_EraseConstraints</span><span class="plain-syntax">(</span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DOWN</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">token</span><span class="plain-syntax"> = </span><span class="identifier-syntax">token</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_NEXT</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax">];</span>
|
|
</pre>
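<p class="commentary">As a rough illustration, the same walk can be modelled outside the kit. The sketch below is in Python, with a hypothetical <span class="extract"><span class="extract-syntax">Node</span></span> class standing in for the real node fields; the constraint placed on the literal node is artificial, and is there only to show that other node classes are left alone.
</p>

```python
DISJUNCTION = "disjunction"   # stands in for DISJUNCTION_RE_CC
QUANTIFIER = "quantifier"     # stands in for QUANTIFIER_RE_CC
LITERAL = "literal"           # hypothetical: a node class the routine ignores

class Node:
    def __init__(self, cclass, constraint=-1):
        self.cclass = cclass
        self.constraint = constraint   # stands in for RE_CONSTRAINT
        self.next = None               # stands in for RE_NEXT
        self.down = None               # stands in for RE_DOWN

def erase_constraints(token):
    """Lift constraints on token, everything to its right, and everything below."""
    while token is not None:
        if token.cclass in (DISJUNCTION, QUANTIFIER):
            token.constraint = -1
        if token.down is not None:
            erase_constraints(token.down)
        token = token.next

quant = Node(QUANTIFIER, constraint=3)
alt = Node(DISJUNCTION, constraint=1)
lit = Node(LITERAL, constraint=7)
quant.down = alt
quant.next = lit

erase_constraints(quant)
state = (quant.constraint, alt.constraint, lit.constraint)
print(state)
```

<p class="commentary">In the rewind code earlier in this section, the routine is applied to both the <span class="extract"><span class="extract-syntax">RE_NEXT</span></span> and the <span class="extract"><span class="extract-syntax">RE_DOWN</span></span> of the token being retried, so the token's own constraint survives while everything after and beneath it is reset.
</p>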
|
|
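<p class="commentary firstcommentary"><a id="SP16" class="paragraph-anchor"></a><b>§16. Matching Literal Text. </b>Here we attempt to make a match of the substring of the text <span class="extract"><span class="extract-syntax">mtxt</span></span> which runs
from character <span class="extract"><span class="extract-syntax">mfrom</span></span> up to, but not including, character <span class="extract"><span class="extract-syntax">mto</span></span>, looking for it at the given
position <span class="extract"><span class="extract-syntax">ipos</span></span> in the source text <span class="extract"><span class="extract-syntax">txt</span></span>. The routine returns the number of characters
matched, or -1 if the match fails; if <span class="extract"><span class="extract-syntax">mfrom</span></span> is negative, it returns 0 without
comparing anything. If <span class="extract"><span class="extract-syntax">insens</span></span> is set, the comparison is made case insensitively.
</p>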
|
|
|
|
<pre class="displayed-code all-displayed-code code-font">
|
|
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_RE_MatchSubstring</span><span class="plain-syntax"> </span><span class="identifier-syntax">txt</span><span class="plain-syntax"> </span><span class="identifier-syntax">ipos</span><span class="plain-syntax"> </span><span class="identifier-syntax">mtxt</span><span class="plain-syntax"> </span><span class="identifier-syntax">mfrom</span><span class="plain-syntax"> </span><span class="identifier-syntax">mto</span><span class="plain-syntax"> </span><span class="identifier-syntax">insens</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax"> </span><span class="identifier-syntax">ch</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">mfrom</span><span class="plain-syntax"> < </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">insens</span><span class="plain-syntax">)</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax">=</span><span class="identifier-syntax">mfrom</span><span class="plain-syntax">:</span><span class="identifier-syntax">i</span><span class="plain-syntax"><</span><span class="identifier-syntax">mto</span><span class="plain-syntax">:</span><span class="identifier-syntax">i</span><span class="plain-syntax">++) {</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">mtxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">++) ~= </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RevCase</span><span class="plain-syntax">(</span><span class="identifier-syntax">ch</span><span class="plain-syntax">))</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> -1;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax">=</span><span class="identifier-syntax">mfrom</span><span class="plain-syntax">:</span><span class="identifier-syntax">i</span><span class="plain-syntax"><</span><span class="identifier-syntax">mto</span><span class="plain-syntax">:</span><span class="identifier-syntax">i</span><span class="plain-syntax">++)</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">ipos</span><span class="plain-syntax">++) ~= </span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">mtxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">))</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> -1;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">mto</span><span class="plain-syntax">-</span><span class="identifier-syntax">mfrom</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax">];</span>
|
|
</pre>
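<p class="commentary">The return convention is easiest to see with concrete values. The following is a minimal Python model of the routine, not the kit code itself: Python strings stand in for block values, <span class="extract"><span class="extract-syntax">swapcase</span></span> stands in for <span class="extract"><span class="extract-syntax">TEXT_TY_RevCase</span></span>, and it is an assumption of the sketch that reading past the end of the text yields a null character.
</p>

```python
def match_substring(txt, ipos, mtxt, mfrom, mto, insens=False):
    """Return the number of characters matched at txt[ipos:], or -1 on failure."""
    if mfrom < 0:
        return 0
    for i in range(mfrom, mto):          # mto is exclusive, as in the loops above
        want = mtxt[i]
        # Assumption: reading past the end of txt yields a null character.
        got = txt[ipos] if ipos < len(txt) else "\0"
        ipos += 1
        if got != want and not (insens and got == want.swapcase()):
            return -1
    return mto - mfrom

print(match_substring("the frog", 4, "frog", 0, 4))               # exact match
print(match_substring("the Frog", 4, "frog", 0, 4))               # case differs
print(match_substring("the Frog", 4, "frog", 0, 4, insens=True))  # case ignored
print(match_substring("abc", 0, "xyz", -1, -1))                   # negative mfrom
```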
|
|
<p class="commentary firstcommentary"><a id="SP17" class="paragraph-anchor"></a><b>§17. Matching Character Range. </b>Suppose that a character range is stored in <span class="extract"><span class="extract-syntax">ftxt</span></span> between the character
|
|
positions <span class="extract"><span class="extract-syntax">rf</span></span> and <span class="extract"><span class="extract-syntax">rt</span></span>. Then <span class="extract"><span class="extract-syntax">TEXT_TY_RE_Range(ch, ftxt, rf, rt, negate, insens)</span></span>
|
|
tests whether a given character <span class="extract"><span class="extract-syntax">ch</span></span> lies within that character range,
|
|
negating the outcome if <span class="extract"><span class="extract-syntax">negate</span></span> is set, and performing comparisons
|
|
case insensitively if <span class="extract"><span class="extract-syntax">insens</span></span> is set.
|
|
</p>
|
|
|
|
<pre class="displayed-code all-displayed-code code-font">
|
|
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_RE_Range</span><span class="plain-syntax"> </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax"> </span><span class="identifier-syntax">rf</span><span class="plain-syntax"> </span><span class="identifier-syntax">rt</span><span class="plain-syntax"> </span><span class="identifier-syntax">negate</span><span class="plain-syntax"> </span><span class="identifier-syntax">insens</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax"> </span><span class="identifier-syntax">chm</span><span class="plain-syntax"> </span><span class="identifier-syntax">upper</span><span class="plain-syntax"> </span><span class="identifier-syntax">crev</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ch</span><span class="plain-syntax"> == </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="reserved-syntax">rfalse</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">negate</span><span class="plain-syntax"> == </span><span class="reserved-syntax">true</span><span class="plain-syntax">) {</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Range</span><span class="plain-syntax">(</span><span class="identifier-syntax">ch</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">rf</span><span class="plain-syntax">, </span><span class="identifier-syntax">rt</span><span class="plain-syntax">, </span><span class="reserved-syntax">false</span><span class="plain-syntax">, </span><span class="identifier-syntax">insens</span><span class="plain-syntax">)) </span><span class="reserved-syntax">rfalse</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">rtrue</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax">=</span><span class="identifier-syntax">rf</span><span class="plain-syntax">: </span><span class="identifier-syntax">i</span><span class="plain-syntax"><</span><span class="identifier-syntax">rt</span><span class="plain-syntax">: </span><span class="identifier-syntax">i</span><span class="plain-syntax">++) {</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">chm</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">chm</span><span class="plain-syntax"> == </span><span class="character-syntax">'\') && (i+1<rt)) {</span>
|
|
<span class="character-syntax"> chm = BlkValueRead(ftxt, ++i);</span>
|
|
<span class="character-syntax"> switch (chm) {</span>
|
|
<span class="character-syntax"> '</span><span class="identifier-syntax">s</span><span class="character-syntax">':</span>
|
|
<span class="character-syntax"> if (ch == 10 or 13 or 32 or 9) rtrue;</span>
|
|
<span class="character-syntax"> '</span><span class="identifier-syntax">S</span><span class="character-syntax">':</span>
|
|
<span class="character-syntax"> if ((ch) && (ch ~= 10 or 13 or 32 or 9)) rtrue;</span>
|
|
<span class="character-syntax"> '</span><span class="identifier-syntax">p</span><span class="character-syntax">':</span>
|
|
<span class="character-syntax"> if (ch == '</span><span class="plain-syntax">.</span><span class="character-syntax">' or '</span><span class="plain-syntax">,</span><span class="character-syntax">' or '</span><span class="plain-syntax">!</span><span class="character-syntax">' or '</span><span class="plain-syntax">?</span><span class="character-syntax">'</span>
|
|
<span class="character-syntax"> or '</span><span class="plain-syntax">-</span><span class="character-syntax">' or '</span><span class="plain-syntax">/</span><span class="character-syntax">' or '</span><span class="string-syntax">"' or ':' or ';'</span>
|
|
<span class="string-syntax"> or '(' or ')' or '[' or ']' or '{' or '}') rtrue;</span>
|
|
<span class="string-syntax"> 'P':</span>
|
|
<span class="string-syntax"> if ((ch) && (ch ~= '.' or ',' or '!' or '?'</span>
|
|
<span class="string-syntax"> or '-' or '/' or '"</span><span class="character-syntax">' or '</span><span class="plain-syntax">:</span><span class="character-syntax">' or '</span><span class="plain-syntax">;</span><span class="character-syntax">'</span>
|
|
<span class="character-syntax"> or '</span><span class="plain-syntax">(</span><span class="character-syntax">' or '</span><span class="plain-syntax">)</span><span class="character-syntax">' or '</span><span class="plain-syntax">[</span><span class="character-syntax">' or '</span><span class="plain-syntax">]</span><span class="character-syntax">' or '</span><span class="plain-syntax">{</span><span class="character-syntax">' or '</span><span class="plain-syntax">}</span><span class="character-syntax">')) rtrue;</span>
|
|
<span class="character-syntax"> '</span><span class="identifier-syntax">w</span><span class="character-syntax">':</span>
|
|
<span class="character-syntax"> if ((ch) && (ch ~= 10 or 13 or 32 or 9</span>
|
|
<span class="character-syntax"> or '</span><span class="plain-syntax">.</span><span class="character-syntax">' or '</span><span class="plain-syntax">,</span><span class="character-syntax">' or '</span><span class="plain-syntax">!</span><span class="character-syntax">' or '</span><span class="plain-syntax">?</span><span class="character-syntax">'</span>
|
|
<span class="character-syntax"> or '</span><span class="plain-syntax">-</span><span class="character-syntax">' or '</span><span class="plain-syntax">/</span><span class="character-syntax">' or '</span><span class="string-syntax">"' or ':' or ';'</span>
|
|
<span class="string-syntax"> or '(' or ')' or '[' or ']' or '{' or '}')) rtrue;</span>
|
|
<span class="string-syntax"> 'W':</span>
|
|
<span class="string-syntax"> if (ch == 10 or 13 or 32 or 9</span>
|
|
<span class="string-syntax"> or '.' or ',' or '!' or '?'</span>
|
|
<span class="string-syntax"> or '-' or '/' or '"</span><span class="character-syntax">' or '</span><span class="plain-syntax">:</span><span class="character-syntax">' or '</span><span class="plain-syntax">;</span><span class="character-syntax">'</span>
|
|
<span class="character-syntax"> or '</span><span class="plain-syntax">(</span><span class="character-syntax">' or '</span><span class="plain-syntax">)</span><span class="character-syntax">' or '</span><span class="plain-syntax">[</span><span class="character-syntax">' or '</span><span class="plain-syntax">]</span><span class="character-syntax">' or '</span><span class="plain-syntax">{</span><span class="character-syntax">' or '</span><span class="plain-syntax">}</span><span class="character-syntax">') rtrue;</span>
|
|
<span class="character-syntax"> '</span><span class="identifier-syntax">d</span><span class="character-syntax">':</span>
|
|
<span class="character-syntax"> if (ch == '</span><span class="constant-syntax">0</span><span class="character-syntax">' or '</span><span class="constant-syntax">1</span><span class="character-syntax">' or '</span><span class="constant-syntax">2</span><span class="character-syntax">' or '</span><span class="constant-syntax">3</span><span class="character-syntax">' or '</span><span class="constant-syntax">4</span><span class="character-syntax">'</span>
|
|
<span class="character-syntax"> or '</span><span class="constant-syntax">5</span><span class="character-syntax">' or '</span><span class="constant-syntax">6</span><span class="character-syntax">' or '</span><span class="constant-syntax">7</span><span class="character-syntax">' or '</span><span class="constant-syntax">8</span><span class="character-syntax">' or '</span><span class="constant-syntax">9</span><span class="character-syntax">') rtrue;</span>
|
|
<span class="character-syntax"> '</span><span class="identifier-syntax">D</span><span class="character-syntax">':</span>
|
|
<span class="character-syntax"> if ((ch) && (ch ~= '</span><span class="constant-syntax">0</span><span class="character-syntax">' or '</span><span class="constant-syntax">1</span><span class="character-syntax">' or '</span><span class="constant-syntax">2</span><span class="character-syntax">' or '</span><span class="constant-syntax">3</span><span class="character-syntax">' or '</span><span class="constant-syntax">4</span><span class="character-syntax">'</span>
|
|
<span class="character-syntax"> or '</span><span class="constant-syntax">5</span><span class="character-syntax">' or '</span><span class="constant-syntax">6</span><span class="character-syntax">' or '</span><span class="constant-syntax">7</span><span class="character-syntax">' or '</span><span class="constant-syntax">8</span><span class="character-syntax">' or '</span><span class="constant-syntax">9</span><span class="character-syntax">')) rtrue;</span>
|
|
<span class="character-syntax"> '</span><span class="identifier-syntax">l</span><span class="character-syntax">': if (CharIsOfCase(ch, 0)) rtrue;</span>
|
|
<span class="character-syntax"> '</span><span class="identifier-syntax">L</span><span class="character-syntax">': if (CharIsOfCase(ch, 0) == false) rtrue;</span>
|
|
<span class="character-syntax"> '</span><span class="identifier-syntax">u</span><span class="character-syntax">': if (CharIsOfCase(ch, 1)) rtrue;</span>
|
|
<span class="character-syntax"> '</span><span class="identifier-syntax">U</span><span class="character-syntax">': if (CharIsOfCase(ch, 1) == false) rtrue;</span>
|
|
<span class="character-syntax"> '</span><span class="identifier-syntax">n</span><span class="character-syntax">': if (ch == 10) rtrue;</span>
|
|
<span class="character-syntax"> '</span><span class="identifier-syntax">t</span><span class="character-syntax">': if (ch == 9) rtrue;</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> } else {</span>
|
|
<span class="character-syntax"> if ((i+2<rt) && (BlkValueRead(ftxt, i+1) == '</span><span class="plain-syntax">-</span><span class="character-syntax">')) {</span>
|
|
<span class="character-syntax"> upper = BlkValueRead(ftxt, i+2);</span>
|
|
<span class="character-syntax"> if ((ch >= chm) && (ch <= upper)) rtrue;</span>
|
|
<span class="character-syntax"> if (insens) {</span>
|
|
<span class="character-syntax"> crev = TEXT_TY_RevCase(ch);</span>
|
|
<span class="character-syntax"> if ((crev >= chm) && (crev <= upper)) rtrue;</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> i=i+2;</span>
|
|
<span class="character-syntax"> } else {</span>
|
|
<span class="character-syntax"> if (chm == ch) rtrue;</span>
|
|
<span class="character-syntax"> if ((insens) && (chm == TEXT_TY_RevCase(ch))) rtrue;</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> }</span>
|
|
<span class="character-syntax"> rfalse;</span>
|
|
<span class="character-syntax">];</span>
|
|
</pre>
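<p class="commentary">A cut-down model may help in following the loop above. This Python sketch handles only literal characters, ranges such as <span class="extract"><span class="extract-syntax">a-f</span></span>, and the escapes <span class="extract"><span class="extract-syntax">\d</span></span> and <span class="extract"><span class="extract-syntax">\s</span></span>; the function name and the use of a whole Python string in place of the <span class="extract"><span class="extract-syntax">rf</span></span> and <span class="extract"><span class="extract-syntax">rt</span></span> positions are simplifications made for the example.
</p>

```python
def in_range(ch, spec, negate=False, insens=False):
    """Test whether ch lies in the character range described by spec."""
    if ch == "\0":
        return False                      # a null character never matches
    if negate:
        return not in_range(ch, spec, False, insens)
    i = 0
    while i < len(spec):
        chm = spec[i]
        if chm == "\\" and i + 1 < len(spec):
            i += 1
            esc = spec[i]
            if esc == "d" and ch in "0123456789":
                return True
            if esc == "s" and ch in "\n\r \t":
                return True
        elif i + 2 < len(spec) and spec[i + 1] == "-":
            upper = spec[i + 2]
            if chm <= ch <= upper:
                return True
            if insens and chm <= ch.swapcase() <= upper:
                return True
            i += 2                        # skip the hyphen and the upper bound
        else:
            if chm == ch or (insens and chm == ch.swapcase()):
                return True
        i += 1
    return False

print(in_range("c", "a-f0-9"))                 # inside the first range
print(in_range("C", "a-f0-9"))                 # wrong case
print(in_range("C", "a-f0-9", insens=True))    # case ignored
print(in_range("g", "a-f0-9", negate=True))    # outside, so the negation holds
```

<p class="commentary">As in the routine above, the null character is rejected before negation is considered, so it fails to match even a negated range.
</p>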
|
|
<p class="commentary firstcommentary"><a id="SP18" class="paragraph-anchor"></a><b>§18. Search And Replace. </b>And finally, last but not least: the routine which searches an indexed
|
|
text <span class="extract"><span class="extract-syntax">txt</span></span> trying to match it against <span class="extract"><span class="extract-syntax">ftxt</span></span>. If <span class="extract"><span class="extract-syntax">ftxtype</span></span> is set to
|
|
<span class="extract"><span class="extract-syntax">REGEXP_BLOB</span></span> then <span class="extract"><span class="extract-syntax">ftxt</span></span> is expected to be a regular expression such
|
|
as <span class="extract"><span class="extract-syntax">ab+(c*de)?</span></span>, whereas if <span class="extract"><span class="extract-syntax">ftxtype</span></span> is <span class="extract"><span class="extract-syntax">CHR_BLOB</span></span> then it is expected
|
|
only to be a simple string of characters taken literally, such as <span class="extract"><span class="extract-syntax">frog</span></span>.
|
|
</p>
|
|
|
|
<p class="commentary">Each match found is replaced with the contents of <span class="extract"><span class="extract-syntax">rtxt</span></span>, except
|
|
that if the blob type is <span class="extract"><span class="extract-syntax">REGEXP_BLOB</span></span> then we recognise a few syntaxes
|
|
as special: for instance, <span class="extract"><span class="extract-syntax">\2</span></span> expands to the value of subexpression 2
|
|
as it was matched — see <i>Writing with Inform</i> for details.
|
|
</p>
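<p class="commentary">For readers more used to other regular expression libraries, the expansion of <span class="extract"><span class="extract-syntax">\2</span></span> behaves much like a backreference in a replacement string elsewhere. The sketch below uses Python's standard <span class="extract"><span class="extract-syntax">re</span></span> module purely as an analogy; it is not Inform's matcher, and the two syntaxes should not be assumed to agree beyond this simple case.
</p>

```python
import re

# Group 1 matches "frog" and group 2 matches "toad"; in the replacement,
# \2 and \1 expand to whatever those groups matched.
result = re.sub(r"(\w+) and (\w+)", r"\2 and \1", "frog and toad")
print(result)
```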
|
|
|
|
<p class="commentary">The optional argument <span class="extract"><span class="extract-syntax">insens</span></span> is a flag which, if set, causes the matching
|
|
to be done case insensitively; the optional argument <span class="extract"><span class="extract-syntax">exactly</span></span>, if set,
|
|
causes the matching to work only if the entire <span class="extract"><span class="extract-syntax">txt</span></span> is matched. (This
|
|
is not especially useful with regular expressions, because the effect can
|
|
equally be achieved by turning <span class="extract"><span class="extract-syntax">ab+c</span></span>, say, into <span class="extract"><span class="extract-syntax">^ab+c$</span></span>, but it is
|
|
indeed useful where the blob type is <span class="extract"><span class="extract-syntax">CHR_BLOB</span></span>.)
|
|
</p>
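<p class="commentary">The equivalence claimed for <span class="extract"><span class="extract-syntax">exactly</span></span> can be checked by analogy in any regular expression library. The following sketch uses Python's standard <span class="extract"><span class="extract-syntax">re</span></span> module as a stand-in, with <span class="extract"><span class="extract-syntax">fullmatch</span></span> playing the part of <span class="extract"><span class="extract-syntax">exactly</span></span> and <span class="extract"><span class="extract-syntax">IGNORECASE</span></span> the part of <span class="extract"><span class="extract-syntax">insens</span></span>; it says nothing about how the routine below is implemented.
</p>

```python
import re

# Matching the whole text is equivalent to anchoring the pattern at both ends.
whole = re.fullmatch(r"ab+c", "abbbc") is not None
anchored = re.search(r"^ab+c$", "abbbc") is not None
partial = re.fullmatch(r"ab+c", "xabbbc") is not None

# Case-insensitive matching of a literal string.
sensitive = re.search("frog", "A FROG") is not None
insensitive = re.search("frog", "A FROG", flags=re.IGNORECASE) is not None

print(whole, anchored, partial, sensitive, insensitive)
```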
|
|
|
|
<p class="commentary">For an explanation of the use of the word "blob", see "Text.i6t".
|
|
</p>
|
|
|
|
<pre class="displayed-code all-displayed-code code-font">
|
|
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_Replace_RE</span><span class="plain-syntax"> </span><span class="identifier-syntax">ftxtype</span><span class="plain-syntax"> </span><span class="identifier-syntax">txt</span><span class="plain-syntax"> </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax"> </span><span class="identifier-syntax">rtxt</span><span class="plain-syntax"> </span><span class="identifier-syntax">insens</span><span class="plain-syntax"> </span><span class="identifier-syntax">exactly</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">r</span><span class="plain-syntax"> </span><span class="identifier-syntax">p</span><span class="plain-syntax"> </span><span class="identifier-syntax">p1</span><span class="plain-syntax"> </span><span class="identifier-syntax">p2</span><span class="plain-syntax"> </span><span class="identifier-syntax">cp</span><span class="plain-syntax"> </span><span class="identifier-syntax">cp1</span><span class="plain-syntax"> </span><span class="identifier-syntax">cp2</span><span class="plain-syntax">;</span>
|
|
<span class="plain-syntax"> </span><span class="comment-syntax">print "Find: "; BlkValueDebug(ftxt); print "^";</span>
<span class="plain-syntax"> </span><span class="comment-syntax">print "Rep: "; BlkValueDebug(rtxt); print "^";</span>
<span class="plain-syntax"> </span><span class="comment-syntax">print "In: "; BlkValueDebug(txt); print "^";</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">rtxt</span><span class="plain-syntax"> == </span><span class="constant-syntax">0</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">1</span><span class="plain-syntax">) { </span><span class="identifier-syntax">cp</span><span class="plain-syntax"> = </span><span class="identifier-syntax">txt</span><span class="plain-syntax">-->0; </span><span class="identifier-syntax">p</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_Temporarily_Transmute</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">); }</span>
|
|
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_Transmute</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">cp1</span><span class="plain-syntax"> = </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">-->0; </span><span class="identifier-syntax">p1</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_Temporarily_Transmute</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">cp2</span><span class="plain-syntax"> = </span><span class="identifier-syntax">rtxt</span><span class="plain-syntax">-->0; </span><span class="identifier-syntax">p2</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_Temporarily_Transmute</span><span class="plain-syntax">(</span><span class="identifier-syntax">rtxt</span><span class="plain-syntax">);</span>
|
|
<span class="plain-syntax"> </span><span class="identifier-syntax">r</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_Replace_REI</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxtype</span><span class="plain-syntax">, </span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">rtxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">insens</span><span class="plain-syntax">, </span><span class="identifier-syntax">exactly</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_Untransmute</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">p1</span><span class="plain-syntax">, </span><span class="identifier-syntax">cp1</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_Untransmute</span><span class="plain-syntax">(</span><span class="identifier-syntax">rtxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">p2</span><span class="plain-syntax">, </span><span class="identifier-syntax">cp2</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">rtxt</span><span class="plain-syntax"> == </span><span class="constant-syntax">0</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">1</span><span class="plain-syntax">) </span><span class="identifier-syntax">TEXT_TY_Untransmute</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">p</span><span class="plain-syntax">, </span><span class="identifier-syntax">cp</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">r</span><span class="plain-syntax">;</span>
<span class="plain-syntax">];</span>
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_Replace_REI</span><span class="plain-syntax"> </span><span class="identifier-syntax">ftxtype</span><span class="plain-syntax"> </span><span class="identifier-syntax">txt</span><span class="plain-syntax"> </span><span class="identifier-syntax">ftxt</span><span class="plain-syntax"> </span><span class="identifier-syntax">rtxt</span><span class="plain-syntax"> </span><span class="identifier-syntax">insens</span><span class="plain-syntax"> </span><span class="identifier-syntax">exactly</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">ctxt</span><span class="plain-syntax"> </span><span class="identifier-syntax">csize</span><span class="plain-syntax"> </span><span class="identifier-syntax">ilen</span><span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax"> </span><span class="identifier-syntax">cl</span><span class="plain-syntax"> </span><span class="identifier-syntax">mpos</span><span class="plain-syntax"> </span><span class="identifier-syntax">cpos</span><span class="plain-syntax"> </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> </span><span class="identifier-syntax">chm</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">ilen</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_CharacterLength</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_Err</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">switch</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ftxtype</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">REGEXP_BLOB</span><span class="plain-syntax">: </span><span class="identifier-syntax">i</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_RE_CompileTree</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">exactly</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">CHR_BLOB</span><span class="plain-syntax">: </span><span class="identifier-syntax">i</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_CHR_CompileTree</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">exactly</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">default</span><span class="plain-syntax">: </span><span class="string-syntax">"*** bad ftxtype ***"</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">i</span><span class="plain-syntax"><0) || (</span><span class="identifier-syntax">i</span><span class="plain-syntax">></span><span class="identifier-syntax">RE_MAX_PACKETS</span><span class="plain-syntax">)) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_Err</span><span class="plain-syntax"> = </span><span class="identifier-syntax">i</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"*** Regular expression error: "</span><span class="plain-syntax">, (</span><span class="reserved-syntax">string</span><span class="plain-syntax">) </span><span class="identifier-syntax">TEXT_TY_RE_Err</span><span class="plain-syntax">, </span><span class="string-syntax">" ***^"</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">RunTimeProblem</span><span class="plain-syntax">(</span><span class="identifier-syntax">RTP_REGEXPSYNTAXERROR</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_DebugTree</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"(compiled to "</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">, </span><span class="string-syntax">" packets)^"</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ftxtype</span><span class="plain-syntax"> == </span><span class="identifier-syntax">REGEXP_BLOB</span><span class="plain-syntax">) </span><span class="identifier-syntax">TEXT_TY_RE_EmptyMatchVars</span><span class="plain-syntax">();</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">mpos</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">; </span><span class="identifier-syntax">chm</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">; </span><span class="identifier-syntax">cpos</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">while</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Parse</span><span class="plain-syntax">(</span><span class="identifier-syntax">ftxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">mpos</span><span class="plain-syntax">, </span><span class="identifier-syntax">insens</span><span class="plain-syntax">) >= </span><span class="constant-syntax">0</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">chm</span><span class="plain-syntax">++;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"^*** Match "</span><span class="plain-syntax">, </span><span class="identifier-syntax">chm</span><span class="plain-syntax">, </span><span class="string-syntax">" found ("</span><span class="plain-syntax">, </span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax">, </span><span class="string-syntax">","</span><span class="plain-syntax">,</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax">, </span><span class="string-syntax">"): "</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax"> == </span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"<empty>"</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax">=</span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax">:</span><span class="identifier-syntax">i</span><span class="plain-syntax"><</span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax">:</span><span class="identifier-syntax">i</span><span class="plain-syntax">++) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> (</span><span class="identifier-syntax">char</span><span class="plain-syntax">) </span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">" ***^"</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">rtxt</span><span class="plain-syntax"> == </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="reserved-syntax">break</span><span class="plain-syntax">; </span><span class="comment-syntax">Accept only one match, replace nothing</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">rtxt</span><span class="plain-syntax"> ~= </span><span class="constant-syntax">0</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">1</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">chm</span><span class="plain-syntax"> == </span><span class="constant-syntax">1</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">ctxt</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueCreate</span><span class="plain-syntax">(</span><span class="identifier-syntax">TEXT_TY</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_Transmute</span><span class="plain-syntax">(</span><span class="identifier-syntax">ctxt</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">csize</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueLBCapacity</span><span class="plain-syntax">(</span><span class="identifier-syntax">ctxt</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax">=</span><span class="identifier-syntax">cpos</span><span class="plain-syntax">:</span><span class="identifier-syntax">i</span><span class="plain-syntax"><</span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax">:</span><span class="identifier-syntax">i</span><span class="plain-syntax">++) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">cl</span><span class="plain-syntax">+1 >= </span><span class="identifier-syntax">csize</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">BlkValueSetLBCapacity</span><span class="plain-syntax">(</span><span class="identifier-syntax">ctxt</span><span class="plain-syntax">, </span><span class="constant-syntax">2</span><span class="plain-syntax">*</span><span class="identifier-syntax">cl</span><span class="plain-syntax">) == </span><span class="reserved-syntax">false</span><span class="plain-syntax">) </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">csize</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueLBCapacity</span><span class="plain-syntax">(</span><span class="identifier-syntax">ctxt</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">BlkValueWrite</span><span class="plain-syntax">(</span><span class="identifier-syntax">ctxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">cl</span><span class="plain-syntax">++, </span><span class="identifier-syntax">ch</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">BlkValueWrite</span><span class="plain-syntax">(</span><span class="identifier-syntax">ctxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">cl</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">);</span>
<span class="plain-syntax">            </span><span class="identifier-syntax">TEXT_TY_RE_Concatenate</span><span class="plain-syntax">(</span><span class="identifier-syntax">ctxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">rtxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">ftxtype</span><span class="plain-syntax">, </span><span class="identifier-syntax">txt</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">csize</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueLBCapacity</span><span class="plain-syntax">(</span><span class="identifier-syntax">ctxt</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">cl</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_CharacterLength</span><span class="plain-syntax">(</span><span class="identifier-syntax">ctxt</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">mpos</span><span class="plain-syntax"> = </span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax">; </span><span class="identifier-syntax">cpos</span><span class="plain-syntax"> = </span><span class="identifier-syntax">mpos</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax"> == </span><span class="identifier-syntax">RE_PACKET_space</span><span class="plain-syntax">--></span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax">)</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">mpos</span><span class="plain-syntax">++;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">chm</span><span class="plain-syntax"> == </span><span class="constant-syntax">100</span><span class="plain-syntax">) { </span><span class="comment-syntax">Purely to keep the output from being excessive</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">print</span><span class="plain-syntax"> </span><span class="string-syntax">"(Stopping after 100 matches.)^"</span><span class="plain-syntax">; </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">chm</span><span class="plain-syntax"> > </span><span class="constant-syntax">0</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">rtxt</span><span class="plain-syntax"> ~= </span><span class="constant-syntax">0</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">1</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax">=</span><span class="identifier-syntax">cpos</span><span class="plain-syntax">:</span><span class="identifier-syntax">i</span><span class="plain-syntax"><</span><span class="identifier-syntax">ilen</span><span class="plain-syntax">:</span><span class="identifier-syntax">i</span><span class="plain-syntax">++) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">cl</span><span class="plain-syntax">+1 >= </span><span class="identifier-syntax">csize</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">BlkValueSetLBCapacity</span><span class="plain-syntax">(</span><span class="identifier-syntax">ctxt</span><span class="plain-syntax">, </span><span class="constant-syntax">2</span><span class="plain-syntax">*</span><span class="identifier-syntax">cl</span><span class="plain-syntax">) == </span><span class="reserved-syntax">false</span><span class="plain-syntax">) </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">csize</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueLBCapacity</span><span class="plain-syntax">(</span><span class="identifier-syntax">ctxt</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">BlkValueWrite</span><span class="plain-syntax">(</span><span class="identifier-syntax">ctxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">cl</span><span class="plain-syntax">++, </span><span class="identifier-syntax">ch</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ftxtype</span><span class="plain-syntax"> == </span><span class="identifier-syntax">REGEXP_BLOB</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_CreateMatchVars</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">TEXT_TY_RE_Trace</span><span class="plain-syntax">)</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">TEXT_TY_RE_DebugMatchVars</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">rtxt</span><span class="plain-syntax"> ~= </span><span class="constant-syntax">0</span><span class="plain-syntax"> </span><span class="reserved-syntax">or</span><span class="plain-syntax"> </span><span class="constant-syntax">1</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">BlkValueWrite</span><span class="plain-syntax">(</span><span class="identifier-syntax">ctxt</span><span class="plain-syntax">, </span><span class="identifier-syntax">cl</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">BlkValueCopy</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt</span><span class="plain-syntax">, </span><span class="identifier-syntax">ctxt</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">BlkValueFree</span><span class="plain-syntax">(</span><span class="identifier-syntax">ctxt</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">chm</span><span class="plain-syntax">;</span>
<span class="plain-syntax">];</span>
</pre>
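<p class="commentary">The loop in TEXT_TY_Replace_REI can be hard to follow through its buffer bookkeeping, so here is a minimal sketch of the same copy-and-append pattern in Python. It is an illustrative model only, not kit code: the function name replace_all is hypothetical, Python's re module stands in for TEXT_TY_RE_Parse (so pattern syntax and edge cases may differ), and the replacement is inserted literally with no escape handling.</p>

```python
import re


def replace_all(text, pattern, replacement):
    """Model of the copy-and-append loop; returns (new_text, match_count)."""
    regex = re.compile(pattern)  # stand-in for the compiled packet tree
    out = []    # plays the role of ctxt, the text being built up
    mpos = 0    # position at which the next search starts
    cpos = 0    # first character of text not yet copied to the output
    count = 0   # plays the role of chm, the number of matches so far
    while mpos <= len(text):
        m = regex.search(text, mpos)
        if m is None:
            break
        count += 1
        start, end = m.span()
        out.append(text[cpos:start])  # unmatched text before this match
        out.append(replacement)       # literal replacement (assumption: no escapes)
        mpos = cpos = end
        if start == end:
            mpos += 1                 # empty match: step on so the loop terminates
    if count == 0:
        return text, 0                # no match, so the text is left unchanged
    out.append(text[cpos:])           # tail after the last match
    return "".join(out), count
```

<p class="commentary">As in the routine above, an empty match advances the search position by one character while the copy position stays at the end of the match, so no character of the original text is skipped.</p>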
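<p class="commentary firstcommentary"><a id="SP19" class="paragraph-anchor"></a><b>&#167;19. Concatenation. </b>See the corresponding routine in "Text.i6t": this is a variation
which handles the special syntaxes used in search-and-replace replacement text, namely the
escapes \n and \t, the subexpression references \0 to \9, and the case modifiers \l and \u
which may precede such a reference.
</p>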
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax">[ </span><span class="identifier-syntax">TEXT_TY_RE_Concatenate</span><span class="plain-syntax"> </span><span class="identifier-syntax">txt_to</span><span class="plain-syntax"> </span><span class="identifier-syntax">txt_from</span><span class="plain-syntax"> </span><span class="identifier-syntax">blobtype</span><span class="plain-syntax"> </span><span class="identifier-syntax">txt_ref</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">pos</span><span class="plain-syntax"> </span><span class="identifier-syntax">len</span><span class="plain-syntax"> </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax"> </span><span class="identifier-syntax">tosize</span><span class="plain-syntax"> </span><span class="identifier-syntax">x</span><span class="plain-syntax"> </span><span class="identifier-syntax">y</span><span class="plain-syntax"> </span><span class="identifier-syntax">case</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">txt_to</span><span class="plain-syntax">==0) || (</span><span class="identifier-syntax">BlkValueWeakKind</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt_to</span><span class="plain-syntax">) ~= </span><span class="identifier-syntax">TEXT_TY</span><span class="plain-syntax">)) </span><span class="reserved-syntax">rfalse</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">txt_from</span><span class="plain-syntax">==0) || (</span><span class="identifier-syntax">BlkValueWeakKind</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt_from</span><span class="plain-syntax">) ~= </span><span class="identifier-syntax">TEXT_TY</span><span class="plain-syntax">)) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">txt_to</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">pos</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_CharacterLength</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt_to</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">tosize</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueLBCapacity</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt_to</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">len</span><span class="plain-syntax"> = </span><span class="identifier-syntax">TEXT_TY_CharacterLength</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt_from</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (</span><span class="identifier-syntax">i</span><span class="plain-syntax">=0:</span><span class="identifier-syntax">i</span><span class="plain-syntax"><</span><span class="identifier-syntax">len</span><span class="plain-syntax">:</span><span class="identifier-syntax">i</span><span class="plain-syntax">++) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt_from</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">);</span>
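<span class="plain-syntax">        </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">ch</span><span class="plain-syntax"> == </span><span class="character-syntax">'\'</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">i</span><span class="plain-syntax"> &lt; </span><span class="identifier-syntax">len</span><span class="plain-syntax">-1)) {</span>
<span class="plain-syntax">            </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt_from</span><span class="plain-syntax">, ++</span><span class="identifier-syntax">i</span><span class="plain-syntax">);</span>
<span class="plain-syntax">            </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ch</span><span class="plain-syntax"> == </span><span class="character-syntax">'n'</span><span class="plain-syntax">) </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> = </span><span class="constant-syntax">10</span><span class="plain-syntax">;</span>
<span class="plain-syntax">            </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ch</span><span class="plain-syntax"> == </span><span class="character-syntax">'t'</span><span class="plain-syntax">) </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> = </span><span class="constant-syntax">9</span><span class="plain-syntax">;</span>
<span class="plain-syntax">            </span><span class="identifier-syntax">case</span><span class="plain-syntax"> = -</span><span class="constant-syntax">1</span><span class="plain-syntax">;</span>
<span class="plain-syntax">            </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ch</span><span class="plain-syntax"> == </span><span class="character-syntax">'l'</span><span class="plain-syntax">) </span><span class="identifier-syntax">case</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax">            </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ch</span><span class="plain-syntax"> == </span><span class="character-syntax">'u'</span><span class="plain-syntax">) </span><span class="identifier-syntax">case</span><span class="plain-syntax"> = </span><span class="constant-syntax">1</span><span class="plain-syntax">;</span>
<span class="plain-syntax">            </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">case</span><span class="plain-syntax"> &gt;= </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt_from</span><span class="plain-syntax">, ++</span><span class="identifier-syntax">i</span><span class="plain-syntax">);</span>
<span class="plain-syntax">            </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">ch</span><span class="plain-syntax"> &gt;= </span><span class="character-syntax">'0'</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">ch</span><span class="plain-syntax"> &lt;= </span><span class="character-syntax">'9'</span><span class="plain-syntax">)) {</span>
<span class="plain-syntax">                </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> = </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> - </span><span class="character-syntax">'0'</span><span class="plain-syntax">;</span>
<span class="plain-syntax">                </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ch</span><span class="plain-syntax"> &lt; </span><span class="identifier-syntax">RE_Subexpressions</span><span class="plain-syntax">--&gt;10) {</span>
<span class="plain-syntax">                    </span><span class="identifier-syntax">x</span><span class="plain-syntax"> = (</span><span class="identifier-syntax">RE_Subexpressions</span><span class="plain-syntax">--&gt;</span><span class="identifier-syntax">ch</span><span class="plain-syntax">)--&gt;</span><span class="identifier-syntax">RE_DATA1</span><span class="plain-syntax">;</span>
<span class="plain-syntax">                    </span><span class="identifier-syntax">y</span><span class="plain-syntax"> = (</span><span class="identifier-syntax">RE_Subexpressions</span><span class="plain-syntax">--&gt;</span><span class="identifier-syntax">ch</span><span class="plain-syntax">)--&gt;</span><span class="identifier-syntax">RE_DATA2</span><span class="plain-syntax">;</span>
<span class="plain-syntax">                    </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">x</span><span class="plain-syntax"> &gt;= </span><span class="constant-syntax">0</span><span class="plain-syntax">) {</span>
<span class="plain-syntax">                        </span><span class="reserved-syntax">for</span><span class="plain-syntax"> (:</span><span class="identifier-syntax">x</span><span class="plain-syntax">&lt;</span><span class="identifier-syntax">y</span><span class="plain-syntax">:</span><span class="identifier-syntax">x</span><span class="plain-syntax">++) {</span>
<span class="plain-syntax">                            </span><span class="identifier-syntax">ch</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueRead</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt_ref</span><span class="plain-syntax">, </span><span class="identifier-syntax">x</span><span class="plain-syntax">);</span>
<span class="plain-syntax">                            </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">pos</span><span class="plain-syntax">+1 &gt;= </span><span class="identifier-syntax">tosize</span><span class="plain-syntax">) {</span>
<span class="plain-syntax">                                </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">BlkValueSetLBCapacity</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt_to</span><span class="plain-syntax">, </span><span class="constant-syntax">2</span><span class="plain-syntax">*</span><span class="identifier-syntax">tosize</span><span class="plain-syntax">) == </span><span class="reserved-syntax">false</span><span class="plain-syntax">) </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax">                                </span><span class="identifier-syntax">tosize</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueLBCapacity</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt_to</span><span class="plain-syntax">);</span>
<span class="plain-syntax">                            }</span>
<span class="plain-syntax">                            </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">case</span><span class="plain-syntax"> &gt;= </span><span class="constant-syntax">0</span><span class="plain-syntax">)</span>
<span class="plain-syntax">                                </span><span class="identifier-syntax">BlkValueWrite</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt_to</span><span class="plain-syntax">, </span><span class="identifier-syntax">pos</span><span class="plain-syntax">++, </span><span class="identifier-syntax">CharToCase</span><span class="plain-syntax">(</span><span class="identifier-syntax">ch</span><span class="plain-syntax">, </span><span class="identifier-syntax">case</span><span class="plain-syntax">));</span>
<span class="plain-syntax">                            </span><span class="reserved-syntax">else</span>
<span class="plain-syntax">                                </span><span class="identifier-syntax">BlkValueWrite</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt_to</span><span class="plain-syntax">, </span><span class="identifier-syntax">pos</span><span class="plain-syntax">++, </span><span class="identifier-syntax">ch</span><span class="plain-syntax">);</span>
<span class="plain-syntax">                        }</span>
<span class="plain-syntax">                    }</span>
<span class="plain-syntax">                }</span>
<span class="plain-syntax">                </span><span class="reserved-syntax">continue</span><span class="plain-syntax">;</span>
<span class="plain-syntax">            }</span>
<span class="plain-syntax">        }</span>
<span class="plain-syntax">        </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">pos</span><span class="plain-syntax">+1 &gt;= </span><span class="identifier-syntax">tosize</span><span class="plain-syntax">) {</span>
<span class="plain-syntax">            </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">BlkValueSetLBCapacity</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt_to</span><span class="plain-syntax">, </span><span class="constant-syntax">2</span><span class="plain-syntax">*</span><span class="identifier-syntax">tosize</span><span class="plain-syntax">) == </span><span class="reserved-syntax">false</span><span class="plain-syntax">) </span><span class="reserved-syntax">break</span><span class="plain-syntax">;</span>
<span class="plain-syntax">            </span><span class="identifier-syntax">tosize</span><span class="plain-syntax"> = </span><span class="identifier-syntax">BlkValueLBCapacity</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt_to</span><span class="plain-syntax">);</span>
<span class="plain-syntax">        }</span>
<span class="plain-syntax">        </span><span class="identifier-syntax">BlkValueWrite</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt_to</span><span class="plain-syntax">, </span><span class="identifier-syntax">pos</span><span class="plain-syntax">++, </span><span class="identifier-syntax">ch</span><span class="plain-syntax">);</span>
<span class="plain-syntax">    }</span>
<span class="plain-syntax">    </span><span class="identifier-syntax">BlkValueWrite</span><span class="plain-syntax">(</span><span class="identifier-syntax">txt_to</span><span class="plain-syntax">, </span><span class="identifier-syntax">pos</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">);</span>
<span class="plain-syntax">    </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">txt_to</span><span class="plain-syntax">;</span>
<span class="plain-syntax">];</span>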
|
|
</pre>
<nav role="progress"><div class="progresscontainer">
<ul class="progressbar"><li class="progressprev"><a href="S-chr.html">❮</a></li><li class="progresssection"><a href="S-dfn.html">dfn</a></li><li class="progresssection"><a href="S-utl.html">utl</a></li><li class="progresssection"><a href="S-gll.html">gll</a></li><li class="progresssection"><a href="S-zmc.html">zmc</a></li><li class="progresssection"><a href="S-prg.html">prg</a></li><li class="progresssection"><a href="S-mth.html">mth</a></li><li class="progresssection"><a href="S-fl.html">fl</a></li><li class="progresssection"><a href="S-srt.html">srt</a></li><li class="progresssection"><a href="S-tbl.html">tbl</a></li><li class="progresssection"><a href="S-mst.html">mst</a></li><li class="progresssection"><a href="S-rlb.html">rlb</a></li><li class="progresssection"><a href="S-flx.html">flx</a></li><li class="progresssection"><a href="S-blc.html">blc</a></li><li class="progresssection"><a href="S-txt.html">txt</a></li><li class="progresssection"><a href="S-unc.html">unc</a></li><li class="progresssection"><a href="S-chr.html">chr</a></li><li class="progresscurrent">rgx</li><li class="progresssection"><a href="S-lst.html">lst</a></li><li class="progresssection"><a href="S-cmb.html">cmb</a></li><li class="progresssection"><a href="S-rlt.html">rlt</a></li><li class="progresssection"><a href="S-rlt2.html">rlt2</a></li><li class="progresssection"><a href="S-rtp.html">rtp</a></li><li class="progressnext"><a href="S-lst.html">❯</a></li></ul></div>
</nav><!--End of weave-->
</main>
</body>
</html>