This is my multi-machine disassembler.  This is a compilation document;
for a usage document, see README.

Ideally, it should work to just type "make".  But the ideal is not
achieved as often as it should be.  In roughly decreasing likelihood
(in my estimation) of their causing trouble, here are some of the
issues you may run into:

- I've used a private compiler extension (tagged control structure),
  which will produce syntax errors in standard C.  See below for more
  on the philosophy of this and how to deal with it.  [At Blaz's
  insistence - apparently doing a "patch < fixup" on each import is too
  onerous - I replaced this with a goto and a label, so unless I've
  slipped up and used it again in another change, it shouldn't bother
  you.]

- The Makefile uses the "variable != command" assignment syntax.  If
  this doesn't work for you, either get a real make (like a recent
  Berkeley make) or just replace those lines with suitable variable
  settings: CC=gcc (or CC=cc), and VFLAG= (i.e., an empty value).

- The Makefile runs a command that pipes nm output through sed and an
  awk script to rebuild machines.c.  If you don't have nm, sed, and
  awk, or if your nm produces output in a form that can't easily be
  massaged this way, you may have to do away with this rule and put in
  a hand-written machines.c.

- The Makefile uses "ld -X -r" to combine multiple .o files into one.
  If you're not on a conventional Unix machine, this may cause trouble.
  You can probably work around it by removing that target and listing
  the multiple files instead of the single file in the MO variable.

- The code assumes a Unix-style I/O subsystem.  In particular (and,
  possibly, among other things), it does not make any distinction
  between binary and text files; this may cause trouble if you are
  saddled with an OS that does make such a distinction.

- The Makefile compiles a program and runs it to produce another source
  file.  This normally will not be a problem, but it may cause trouble
  if your CC is a cross-compiler.

- The code uses gcc-style nested functions.  If your compiler can't
  handle these, the simplest thing to do from my point of view is to
  get and use gcc instead.  It is probably possible to un-nest them,
  but it may not be simple to do so; I haven't looked into it in any
  detail.  The philosophical remarks below about compiler extensions
  apply here too.

- If you run into any other problems that you think belong on this
  list, please send me mail!
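
On the binary/text point in the list above: what follows is a minimal
sketch, assuming a hosted standard C library, of a file read that
behaves the same whether or not the OS makes the distinction.  The
function name read_whole_file is hypothetical and is not taken from the
disassembler's source.  The "b" in the fopen() mode is standard C and
has no effect on Unix, so adding it costs nothing there.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not from the disassembler's source: read an
   entire file into a malloc()ed buffer.  Returns NULL on failure; on
   success, stores the byte count in *lenp and returns the buffer,
   which the caller must free(). */
unsigned char *read_whole_file(const char *path, size_t *lenp)
{
    /* "rb" rather than "r": no newline or end-of-file-character
       translation on systems that distinguish text from binary. */
    FILE *f = fopen(path, "rb");
    unsigned char *buf = NULL;
    size_t len = 0, cap = 0;

    if (f == NULL) return NULL;
    for (;;) {
        size_t n;

        if (len == cap) {
            /* Grow the buffer geometrically. */
            size_t ncap = cap ? cap * 2 : 4096;
            unsigned char *nbuf = realloc(buf, ncap);

            if (nbuf == NULL) { free(buf); fclose(f); return NULL; }
            buf = nbuf;
            cap = ncap;
        }
        n = fread(buf + len, 1, cap - len, f);
        len += n;
        if (n == 0) break;          /* end of file or read error */
    }
    if (ferror(f)) { free(buf); fclose(f); return NULL; }
    fclose(f);
    *lenp = len;
    return buf;
}
```

Output files would need the matching "wb" mode.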

As for my use of compiler extensions...

The code uses two notable extensions to C, one being gcc-style nested
functions, the other being tagged control structure.  Whether it's
appropriate to use nonstandard extensions is arguable; I see arguments
in both directions.

On the one hand, using extensions makes the resulting code nonportable
to compilers that don't also implement them.

On the other hand, there's no point having extensions if they never get
used.

Using gccisms like nested functions doesn't bother me significantly.
gcc is widely enough available that there's comparatively little risk
it will be unavailable for a target platform (though I realize this may
be small consolation for someone wanting to build the code for a
platform where it's not), and the gccisms improve the expressive power
of the language greatly, making possible a number of fairly clean ways
of doing things that border on impossible in standard C.
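
To make the trade-off concrete, here is a rough sketch of what
un-nesting typically involves.  It is not taken from the disassembler's
source, and every name in it is hypothetical.  A gcc nested function
can use its enclosing function's locals directly; in standard C, that
state has to be bundled into a struct and passed explicitly.

```c
#include <stddef.h>

/* With gcc nested functions one could write (shown as a comment, since
   it is not standard C):

       int count_in_range(const int *v, size_t n, int lo, int hi)
       {
           int hits = 0;
           void check(int x) { if (x >= lo && x <= hi) hits++; }
           for (size_t i = 0; i < n; i++) check(v[i]);
           return hits;
       }
*/

/* Standard C version: the locals the nested function used are moved
   into an explicit context struct. */
struct range_ctx {
    int lo, hi;     /* bounds, formerly parameters of the outer function */
    int hits;       /* counter, formerly a local of the outer function */
};

static void check(struct range_ctx *ctx, int x)
{
    if (x >= ctx->lo && x <= ctx->hi) ctx->hits++;
}

int count_in_range(const int *v, size_t n, int lo, int hi)
{
    struct range_ctx ctx = { lo, hi, 0 };
    size_t i;

    for (i = 0; i < n; i++) check(&ctx, v[i]);
    return ctx.hits;
}
```

The mechanical part is easy when the nested function is only called
directly, as here.  It gets harder when its address is handed to code
that expects a plain function pointer with no slot for a context
argument.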

Using tagged control structure bothers me a bit more.  Like nested
functions, it does not, technically, make possible anything that's
impossible in standard C, but it makes a number of things clearer.
Since my patches to gcc to implement tagged control structure are
publicly available, anyone who has gcc to start with (see the previous
point) can get tagged control structure support.  And I'd like to see
such support spread; I think the extension is useful enough that it
deserves wider support.  Thus, even though it bothers me more, I still
come down in favor of using such tags.

At this writing, the relevant gcc patches can be had by anonymous ftp
to ftp.rodents.montreal.qc.ca, in /mouse/source-tree/patches/working/
in various files.  (Fetch the "doc" file for a list of patches; search
for "semantics" to find the section that's relevant to tagged control
structure.)

I've also written a program designed to be run over preprocessor
output, to convert tagged control structure to something stock gcc can
understand.  This program is up for ftp from ftp.rodents.montreal.qc.ca
in /mouse/hacks/ under the name lcs-cvt.c.  It's designed to be used

	gcc -E foo.c > tmp.i
	lcs-cvt tmp.i
	gcc -c -o foo.o tmp.i

For those who would prefer to remove the control structure labels by
editing the code, or who want to put them into a different compiler,
here's a brief sketch of how they work.

Some control structure constructs - while, do, for, and switch - can be
labeled with strings.  These strings appear in < > immediately
following the introductory keyword, as in

	for <"foo"> (i=0; i<100; i++) { ... }
	switch <"my top"> (cmdchar()) { ... }
	do <"err"> { ... } while (0);

Certain other constructs - break and continue statements, case labels,
and default: labels - can have a similar tag:

	break <"err">;
	continue <"foo">;
	case <"my top"> 99:
	default <"main">:

In each case, the tag serves to indicate which control structure
statement the tagged construct is to apply to.  For example, in

	for <"foo"> (i=0;i<100;i++)
	 { ...
	   switch (array[i])
	    { ...
	      break <"foo">;
	      ...
	    }
	   ...
	 }

the tag causes the break to be associated with the for loop rather than
the switch, thus obviating the need to either use a goto or add an
otherwise unnecessary boolean variable.
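
For comparison, here is a sketch of the same shape in standard C, using
the goto-and-label form instead of the tag.  The function name, what it
computes, and the label name are illustrative assumptions, not code
from the disassembler.

```c
#include <stddef.h>

/* Return the index of the first negative element, or n if there is
   none.  *seen receives the number of positive elements found before
   that point; zero elements are skipped. */
size_t first_negative(const int *array, size_t n, size_t *seen)
{
    size_t i;

    *seen = 0;
    for (i = 0; i < n; i++) {           /* was: for <"foo"> (...) */
        switch (array[i] < 0 ? -1 : array[i] == 0 ? 0 : 1) {
        case -1:
            goto break_foo;             /* was: break <"foo">; */
        case 0:
            break;                      /* plain break: leaves only the switch */
        default:
            (*seen)++;
            break;
        }
    }
break_foo:
    return i;
}
```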

When searching for a matching tag, inappropriate control structure
types are ignored.  For example, a tagged continue ignores switch()
tags, even if they match textually; a tagged case label ignores
everything but switches.  (I may change this someday, to error if the
nearest enclosing matching tagged structure is of the wrong type.)  The
scope of a tag is the body of the tagged construct.  Two constructs
with the same name will never produce an error; if they are not nested,
there is no ambiguity, and if they are nested, a reference to the tag
from within both will refer to the innermost.

It is always possible to eliminate tags by sufficient use of gotos.
Tagged breaks and continues simply turn into a goto to a label placed
appropriately with respect to the matching construct; tagged case and
default labels can be moved to the outermost level of their containing
switch and then followed by a goto to the place they were moved from.
I don't like using gotos instead, though:

- The scope of a goto label is the entire containing function (it's
  possible to restrict this in gcc, but it's a gccism).  This means
  that when reading the code, you can't know where control might go to
  that label from without reading the whole function.  It also means
  that you can't use the same label twice in any given function
  (problematic for use in macros).

- gotos kill optimization more effectively than the corresponding
  "structured" constructs.

- Even if a goto label is carefully and appropriately named, as in
  "goto break_array_search_loop", the risk exists that someone (perhaps
  not the person who initially wrote it) used it for some other
  purpose.  A tagged control structure tag _can't_ be used outside of
  its scope.
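
For anyone who does take the goto route, here is a sketch of the two
translations described above: a tagged continue, and a tagged case
label.  Both functions are hypothetical examples rather than code from
the disassembler, and the tagged forms in the comments reflect one
reading of the description above, not tested input to the patched gcc.

```c
/* Tagged form:

       for <"rows"> (r = 0; r < nrows; r++)
           for (c = 0; c < ncols; c++) {
               if (grid[r*ncols+c] < 0) continue <"rows">;
               sum += grid[r*ncols+c];
           }

   Sums each row up to (not including) its first negative entry. */
int sum_until_negative(const int *grid, int nrows, int ncols)
{
    int r, c, sum = 0;

    for (r = 0; r < nrows; r++) {
        for (c = 0; c < ncols; c++) {
            if (grid[r * ncols + c] < 0) goto continue_rows;
            sum += grid[r * ncols + c];
        }
continue_rows: ;                    /* label at the end of the outer loop body */
    }
    return sum;
}

/* Tagged form:

       switch <"top"> (cmd) {
       case 1:
           switch (sub) {
           case 0: n += 1;
           case <"top"> 99: n += 10; break;
           }
           break;
       }
*/
int dispatch(int cmd, int sub)
{
    int n = 0;

    switch (cmd) {
    case 1:
        switch (sub) {
        case 0:
            n += 1;
            /* falls through */
        from_top_99:                /* was: case <"top"> 99: */
            n += 10;
            break;                  /* leaves the inner switch */
        }
        break;
    case 99:                        /* label moved out to its own switch... */
        goto from_top_99;           /* ...followed by a goto back */
    }
    return n;
}
```

Note that both labels are visible throughout their functions, which is
exactly the scoping objection raised in the list above.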

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B