---- Input syntax description

The input syntax is loosely patterned after the VMS LIB$TPARSE
facility.  The bulk of the program consists of state and transition
declarations.  There are some other lines which are frequently used;
they are described after the state and transition declarations.

A state is declared with a `$state' line.  The syntax of this is

	$state	[name]

The optional name lets transitions elsewhere in the input refer to
this state.

A transition is declared with a `$tran' line.  The syntax of this is

	$tran	trigger [-> statename] [action]

The trigger specifies when this transition will be taken.  If the arrow
and statename are present, they specify what state the transition is
to; otherwise, the transition is to the next state listed (the one
corresponding to the next $state line in the input).  The statename can
also be $exit or $fail, causing the parse to succeed or fail
respectively.  An optional action routine may be specified.  This looks
like a normal C routine call (without the trailing semicolon that would
make it a statement), with one exception: if there are no explicit
arguments, the parentheses must be omitted.  One extra argument will be
added, after any arguments explicitly listed; what this argument is
depends on what sort of trigger is given.

The taking of a transition consumes zero or more characters of the
input string; how much of the string is consumed depends on what sort
of trigger is given.

The possible transitions out of a state are tried in order; the first
one to succeed is the one that is taken.  This means that it is legal
for more than one transition to match; the one taken is the one that is
listed first.  If none of the transitions out of a state match, the
parse fails as if a transition to $fail had been made.
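
As a rough illustration of the ordering rule, here is a hand-written
sketch in C.  It is not the code the generator emits; first_match and
its arguments are names made up for the example.  It tries a list of
string triggers in the order given and reports the first that matches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the index of the first trigger that is a prefix of input,
 * or -1 if none matches (the analogue of an implicit $fail). */
int first_match(const char *input, const char *const *triggers, int n)
{
    int i;

    assert(input != NULL && triggers != NULL);
    for (i = 0; i < n; i++) {
        size_t len = strlen(triggers[i]);
        if (strncmp(input, triggers[i], len) == 0)
            return i;   /* later entries are never examined */
    }
    return -1;
}
```

With the triggers listed as "<=" then "<", the input "<= 3" takes the
first one.  Listed the other way round, "<" matches first and the "<="
transition can never be taken, so the longer trigger should come first.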

Normally, all leading whitespace (as determined by isspace()) is
silently skipped upon entry to each state (i.e., immediately before an
attempt is made to match the first transition, but after any action
routine applying to the transition that entered the state).  There is a
flag that can be set to inhibit this behavior; see the C interface
section for details.
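
The skipping rule can be pictured with the following sketch.  It is an
illustration only; enter_state and its parse_blanks argument are
invented names standing in for the parser's internal input pointer and
the flag mentioned above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return where matching would begin on entry to a state.  When
 * parse_blanks is zero (the default behavior), leading whitespace as
 * determined by isspace() is skipped; otherwise p is left alone. */
const char *enter_state(const char *p, int parse_blanks)
{
    assert(p != NULL);
    if (!parse_blanks) {
        while (isspace((unsigned char)*p))
            p++;
    }
    return p;
}
```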

Normally, action routines are assumed to return nothing interesting.
It is possible to make a transition's success conditional on something
returned by the action routine.  To do this, prefix the action routine
with a question mark.  When this is done, the action routine is assumed
to return an integer value.  If this value is zero, the transition
fails; otherwise it succeeds as normal.
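
To make the calling convention concrete, here is a sketch of two
action routines as a user might write them.  The routine names, the
state names in the comment, and the fields array are all hypothetical;
the sketch assumes the extra argument for a $decimal trigger arrives
as a plain long int after any explicit arguments.

```c
#include <assert.h>

/*
 * Hypothetical input lines that would call the routines below:
 *
 *      $tran   $decimal -> have_width  set_field(0)
 *      $tran   $decimal -> have_port   ?check_port
 */

#define NFIELDS 4
long fields[NFIELDS];

/* The explicit argument comes first, then the extra argument supplied
 * by the trigger.  The return value is not used. */
void set_field(int which, long value)
{
    assert(which >= 0 && which < NFIELDS);
    fields[which] = value;
}

/* Written with a leading ? in the input, so the return value decides
 * whether the transition succeeds. */
int check_port(long value)
{
    if (value < 1 || value > 65535)
        return 0;       /* zero: the transition fails */
    fields[1] = value;
    return 1;           /* nonzero: the transition succeeds */
}
```

If check_port returns zero, the transition fails and the parser goes
on to try the next transition listed for that state, if any.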

The trigger can be

	- a string "...".  The transition will be taken if that string
	  is present at the current point in the input string; the
	  entire string will be consumed by the transition.  The extra
	  argument passed to the action routine is a pointer to a
	  string literal containing the trigger string (not a pointer
	  into the string being parsed).

	- a character 'c'.  The transition will be taken if that
	  character is the next character in the input string.  The
	  transition consumes the character from the input.  The extra
	  argument to the action routine is the character.

	- !statename (where statename must be the name of a state).
	  This corresponds to "calling a subroutine" in the state
	  table.  The current point in the input is remembered and the
	  state table is entered at the specified state.  The machine
	  is run until a transition to $exit or $fail (including
	  implicit $fail transitions due to failure to match any
	  transitions) is taken.  If the subparse succeeds (the
	  transition was to $exit), then the ! transition succeeds and
	  the input pointer is advanced to where it was when the $exit
	  transition was taken.  If the subparse failed, the input
	  pointer is backed up to the remembered value and the !
	  transition fails.  Side effects due to action routines called
	  during the (partial) subparse cannot, of course, be undone.
	  The extra argument to the action routine is the integer
	  constant zero.

	- a keyword specifying a class of characters or a particular
	  sort of character string:

	    $any
		Matches any character (except NUL).  Consumes exactly
		one character on match.  Action routine argument:
		character matched.

	    $decimal
		Any nonempty string of digits 0-9.  Consumes entire
		string.  Action routine argument: a `long int'
		containing the value of the number.

	    $digit
		Any single digit 0-9.  Consumes the digit.  Action
		routine argument: the digit matched.  (The character,
		not the corresponding integer 0-9.)

	    $eos
		Matches at end-of-string only.  Consumes nothing.
		Action routine argument: integer constant zero.

	    $hex
		Like $decimal except number is in hex.  (No leading 0x
		is included.)

	    $octal
		Like $decimal and $hex except number is in octal.

	    $binary
		Ditto for binary.

	    $symbol
		Matches a string of characters consisting of digits,
		letters of either case, dollar signs, and underscores,
		provided it does not begin with a digit.  Currently,
		this string must be no longer than 256 characters.
		Consumes the entire string.  Action routine argument: a
		copy of the string matched (not a pointer into the
		string being parsed).  This must be copied if it is to
		be saved; the string passed to the action routine will
		be destroyed at some more or less unpredictable time
		after the action routine returns.

	- the keyword $lambda, specifying a lambda transition.  This
	  type of transition always succeeds and consumes no
	  characters.  Action routine argument: integer constant zero.

	- the keyword $anyof, followed by a name that has previously
	  appeared on a line beginning with $anyof (see below).  This
	  matches exactly one character from the specified set.  It
	  consumes just that character; the action routine argument is
	  the character matched.
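
The save-and-restore behavior of the ! trigger can be sketched as
follows.  This is a model of the description above, not the generated
code; call_subparse, subparse_fn, and sub_abc are names made up for
the example, and a subparse is represented by a function that returns
1 for $exit and 0 for $fail.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*subparse_fn)(const char **pp);

int side_effects = 0;   /* stands in for work done by action routines */

/* Remember the input pointer, run the subparse, and back the pointer
 * up again if the subparse fails. */
int call_subparse(const char **pp, subparse_fn sub)
{
    const char *saved;

    assert(pp != NULL && *pp != NULL && sub != NULL);
    saved = *pp;
    if (sub(pp))
        return 1;       /* keep the pointer where $exit left it */
    *pp = saved;        /* failure: restore the remembered point */
    return 0;
}

/* Example subparse: matches "ab" and then 'c'.  It may fail after
 * having consumed "ab" and after its "action" has already run. */
int sub_abc(const char **pp)
{
    const char *p = *pp;

    if (p[0] != 'a' || p[1] != 'b')
        return 0;
    *pp = p + 2;
    side_effects++;
    if (**pp != 'c')
        return 0;
    *pp += 1;
    return 1;
}
```

On the input "abx" the subparse fails and the input pointer returns to
the 'a', but side_effects has still been incremented; this is the sense
in which side effects of a partial subparse cannot be undone.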

What other sorts of line are there?  They are distinguished by the
keyword at the beginning of the line.  $state and $tran have already
been described.  There are also $prefix, $trace, $initial, $action, and
$anyof.  Blank lines are ignored, and lines beginning with $$ have the
$$ stripped and are prepended (in order of appearance, otherwise
unchanged) to the output.

The $prefix line looks like `$prefix symbol' where symbol is any legal
C symbol.  This specifies a prefix which is put at the beginning of all
variables and functions generated for the machine.  This allows you to
use more than one fsm in a given program without name clashes.  If you
don't specify a $prefix line, the default is

	$prefix	FSM

The $trace, $action, and $initial lines each take a C function name, as
in

	$trace	trace_fxn
	$action	action_fxn
	$initial	initial_fxn

The function specified with $trace gets called each time a state is
entered.  It is called with two arguments.  The first argument is a
small integer giving the number of the state, which is usually close to
its number of appearance in the input (but this is not guaranteed); the
second one is the name given on the $state line, if any, otherwise a
concocted name of the form "State %d".

The function specified with $action gets called just before any action
routine is called.  It is called with one argument, that being a
character string showing the call about to be made, as it appears in
the generated C source (not as it appears in the input file).

The function specified with $initial gets called once each time the
parser is called; it is called just before the first state is entered.
In particular, it is called after the parser internals are set up but
before the input string is looked at.  This could be used, for example,
to turn on the FSM_FLAG_PARSE_BLANKS flag for a parser that is not
supposed to consider whitespace special.  (Beginning the parser with a
$lambda transition with an action would not be enough, because leading
whitespace is skipped on entry to the first state, before that
transition is tried.)

If there is no $trace line, no trace routine is called for each state
entry; if there is no $action line, the corresponding function calls
are omitted before the action routines are called.
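
A minimal pair of hooks might look like the sketch below.  The names
trace_fxn and action_fxn are the ones used in the example lines above;
the exact argument types (int and char *) are an assumption based on
the description, and the counters are only there for illustration.

```c
#include <assert.h>
#include <stdio.h>

int states_entered = 0;
int actions_called = 0;

/* Named on the $trace line: called each time a state is entered, with
 * the state's number and its name (given or concocted). */
void trace_fxn(int state, char *name)
{
    assert(name != NULL);
    states_entered++;
    fprintf(stderr, "state %d (%s)\n", state, name);
}

/* Named on the $action line: called just before each action routine,
 * with the text of the call about to be made. */
void action_fxn(char *call)
{
    assert(call != NULL);
    actions_called++;
    fprintf(stderr, "  calling %s\n", call);
}
```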

The $anyof line declares a character class for use with the $anyof
transition trigger.  The syntax is

	$anyof	name ...characters...

The name is any legal C symbol; the characters may be any characters.
Nonprintable characters are represented as in C, with backslash
escapes.  Spaces and tabs are ignored following the name up until the
first non-whitespace character.  (Spaces may be specified as \040 and
tabs as \t, if this causes any problem.)  The order of the characters
in the list is irrelevant and duplicates are ignored.
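
The sketch below shows one way such a character list could be turned
into a membership table.  It is not taken from the generator;
build_class is an invented name, the input is assumed to be the list
with the name and leading whitespace already removed, and only the
\t, \n, \\ and octal escapes are handled.

```c
#include <assert.h>
#include <string.h>

/* Set set[c] to 1 for every character c named in spec, 0 otherwise. */
void build_class(const char *spec, unsigned char set[256])
{
    assert(spec != NULL && set != NULL);
    memset(set, 0, 256);
    while (*spec != '\0') {
        unsigned int c = (unsigned char)*spec++;

        if (c == '\\' && *spec != '\0') {
            if (*spec >= '0' && *spec <= '7') {
                int digits = 0;

                c = 0;  /* octal escape, up to three digits */
                while (digits < 3 && *spec >= '0' && *spec <= '7') {
                    c = c * 8 + (unsigned int)(*spec++ - '0');
                    digits++;
                }
                c &= 0xff;
            } else if (*spec == 't') {
                c = '\t';
                spec++;
            } else if (*spec == 'n') {
                c = '\n';
                spec++;
            } else {
                c = (unsigned char)*spec++;     /* e.g. \\ */
            }
        }
        set[c] = 1;     /* setting it twice is harmless */
    }
}
```

Because the result is a table indexed by character, the order of the
list and any duplicates in it make no difference to the class.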

---- C interface to the resulting code

The generated C code contains several functions, some of which are
internal to the parser.  The ones whose interface is advertised are as
follows.  (PFX represents the name specified on the $prefix line, or FSM
if there is none.  The FSMarg type is defined in fsm.h.)

int PFX(s) char *s;

	This is the main entry point for the parser.  It takes a
	pointer to the string to parse and returns 0 if the parse
	failed and 1 if it succeeded.  The state table is entered at
	the first state given in the input file.

FSMarg *PFXgetarg()

	This returns a pointer to a structure used during the parse.
	This is provided so that action routines can do things like set
	and clear flags here.  See the fsm.h file for a description of
	this structure.  Currently, the only interesting thing to do is
	change the FSM_FLAG_PARSE_BLANKS bit in the flags field.  This
	bit is clear by default; when set, the parser's normal action
	of stripping all whitespace on entry to each state is
	inhibited.
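
	Setting and clearing the bit might look like the sketch below.
	To keep it compilable on its own it supplies stand-ins for
	what fsm.h and the generated code provide: the FSMarg layout
	(only the flags field is taken from this description), the
	value 0x01 for the flag, and the body of FSMgetarg are all
	assumptions, as is the argument-less signature of the two
	routines.  The default prefix FSM is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for fsm.h; the real definitions may differ. */
#define FSM_FLAG_PARSE_BLANKS 0x01      /* assumed value */
typedef struct {
    int flags;
} FSMarg;

/* Stand-in for the generated PFXgetarg() with the default prefix. */
FSMarg *FSMgetarg(void)
{
    static FSMarg arg;

    return &arg;
}

/* Could be named on an $initial line, or called from an action
 * routine: stop the parser from stripping whitespace. */
void keep_blanks(void)
{
    FSMarg *arg = FSMgetarg();

    assert(arg != NULL);
    arg->flags |= FSM_FLAG_PARSE_BLANKS;
}

/* Restore the default whitespace stripping. */
void strip_blanks(void)
{
    FSMarg *arg = FSMgetarg();

    assert(arg != NULL);
    arg->flags &= ~FSM_FLAG_PARSE_BLANKS;
}
```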

char *PFXrest()

	Once the main parsing function has returned, this can be called
	to determine the unparsed portion of the string.  It returns a
	pointer to the unparsed portion.  If the parse succeeded, this
	is where the parser had reached when the transition to $exit
	was made.  If the parse failed, this is an attempt at pointing
	to the character that caused it to fail.  This is not
	guaranteed, because recursion (! triggers) can confuse things.

All other names beginning with PFX or _PFX should be considered reserved.

Please mail compliments, flames, bug reports/fixes, comments, etc. to

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B