This is my multi-machine disassembler.  This is a compilation document;
for a usage document, see README.

Ideally, it should work to just type "make".  But the ideal is not
achieved as often as it might be.  In roughly decreasing likelihood (in
my estimation) of their causing trouble, here are some of the issues
you may run into:

- I've used a semi-private compiler extension (labeled control
  structure), which will produce syntax errors in standard C.  See
  below for more on the philosophy of this and how to deal with it.

- The Makefile uses the "variable != command" assignment syntax.  If
  this doesn't work for you, either get a real make (like a recent
  Berkeley make) (heh, are my biases showing?) or just replace such
  lines with suitable variable settings, such as CC=gcc.

- The Makefile runs a command that pipes nm output through sed and an
  awk script to rebuild machines.c.  If you don't have nm, sed, and
  awk, or if your nm produces output in a form that can't easily be
  massaged this way, you may have to do away with this rule and put in
  a hand-written machines.c.

- The Makefile uses "ld -X -r" to combine multiple .o files into one.
  If you're not on a conventional Unix machine, this may cause trouble.
  You can probably work around it by removing that target and listing
  the multiple files instead of the single file in the MO variable.

- The code assumes a Unix-style I/O subsystem.  In particular (and,
  possibly, among other things), it does not make any distinction
  between binary and text files; this may cause trouble if you are
  saddled with an OS that does make such a distinction.

- The Makefile compiles a program and runs it to produce another source
  file.  This normally will not be a problem, but it may cause trouble
  if your CC is a cross-compiler.

- The code uses gcc-style nested functions.  If your compiler can't
  handle these, the simplest thing to do from my point of view is to
  get and use gcc instead.  It is probably possible to un-nest them,
  but it may not be simple to do so; I haven't looked into it in any
  detail.  The philosophical remarks below about compiler extensions
  apply here too.

- If you run into any other problems that you think belong on this
  list, please send me mail!
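
On the nested-function item above: I haven't tried un-nesting the real
code, so the following is only a sketch of the usual technique for
simple cases, not a recipe for this program.  The variables a nested
function takes from its enclosing function are moved into a struct, and
the function is hoisted to file scope and handed a pointer to it.  All
the names here (check_ctx, check, count_above) are hypothetical and do
not come from the disassembler source.

```c
#include <stddef.h>

/* gcc nested-function form, for reference (not standard C):
 *
 *	int count_above(const int *v, size_t n, int limit)
 *	{
 *	 int count;
 *	 void check(int x) { if (x > limit) count ++; }
 *	 ...
 *	}
 */

/* The state the nested function took from its enclosing scope. */
struct check_ctx {
  int limit;
  int count;
  };

/* Formerly nested; now at file scope, with its context passed in. */
static void check(struct check_ctx *ctx, int x)
{
 if (x > ctx->limit) ctx->count ++;
}

/* Hypothetical caller: count the elements of v[] greater than limit. */
int count_above(const int *v, size_t n, int limit)
{
 struct check_ctx ctx;
 size_t i;

 ctx.limit = limit;
 ctx.count = 0;
 for (i=0;i<n;i++) check(&ctx,v[i]);
 return(ctx.count);
}
```

Cases where the nested function's address is passed to other code need
more work than this, since the context pointer has to travel with it.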

As for my use of compiler extensions...

The code uses two notable extensions to C, one being gcc-style nested
functions, the other being labeled control structure.  Whether it's
appropriate to use nonstandard extensions is arguable; I see arguments
in both directions.

On the one hand, using extensions makes the resulting code nonportable
to compilers that don't also implement them.

On the other hand, there's no point having extensions if they never get
used.

Using gccisms like nested functions doesn't bother me significantly.
gcc is widely enough available that there's comparatively little risk
it will be unavailable for a target platform (though I realize this may
be small consolation for someone wanting to build the code for a
platform where it's not), and the gccisms improve the expressive power
of the language greatly, making possible a number of fairly clean ways
of doing things that are relatively difficult in standard C.

Using labeled control structure bothers me a bit more.  Like nested
functions, it does not, technically, make possible anything that's
impossible in standard C, but it makes a number of things clearer.
Since my patches to gcc to implement labeled control structure are
publicly available, anyone who has gcc to start with (see the previous
point) can get labeled control structure support - though it may
require porting, or installing another gcc version, if the gcc version
at hand is too different from the ones I've added the support to.  And
I'd like to see such support spread; I think the extension is useful
enough that it deserves wider support.  Thus, even though it bothers me
more, I still come down in favor of using such tags.

For those who want the relevant gcc patches, they can be had in various
ways.  They can be found in my gitification of NetBSD's source tree,
which can be got via git clone on
git://git.rodents-montreal.org/Mouse/netbsd-fork/5.2/src,
.../4.0.1/src, or .../1.4T/src, depending on which version you want;
the relevant gcc versions are, respectively, 4.1.3, 4.1.2, and
egcs-1.1.2.  You can find the relevant commit by looking for a commit
whose message is "Add labeled control structure to gcc.".

For those who'd rather not grub about in their gcc, I've also written a
program designed to be run over preprocessor output, to convert most
uses of labeled control structure to something stock gcc can
understand.  This program is also available as a git repo,
git://git.rodents-montreal.org/lcs-cvt in this case.  It's designed to
be used as in

	gcc -E foo.c -o tmp.i
	lcs-cvt tmp.i
	gcc -c -o foo.o tmp.i

For those who would prefer to remove the control structure labels by
editing the code, or who want to put them into a different compiler,
here's a brief sketch of how they work.

Some control structure constructs - while, do, for, and switch - can be
labeled with strings.  These strings appear in < > immediately
following the introductory keyword, as in

	for <"foo"> (i=0; i<100; i++) { ... }
	switch <"my top"> (cmdchar()) { ... }
	do <"err"> { ... } while (0);

Certain other constructs - break and continue statements, case labels,
and default: labels - can have a similar tag:

	break <"err">;
	continue <"foo">;
	case <"my top"> 99:
	default <"main">:

In each case, the tag serves to indicate which control structure
statement the tagged construct is to apply to.  For example, in

	for <"foo"> (i=0;i<100;i++)
	 { ...
	   switch (array[i])
	    { ...
	      break <"foo">;
	      ...
	    }
	   ...
	 }

the tag causes the break to be associated with the for loop rather than
the switch, thus obviating the need to either use a goto or add an
otherwise unnecessary boolean variable.
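
For comparison, here is a sketch of the same shape of code in standard
C, with the tagged break replaced by a goto.  The function
sum_until_zero and the label break_foo are hypothetical names chosen
for illustration only.

```c
/* Hypothetical example: add up array[] entries, stopping at the first
   zero.  The zero case has to leave the for loop, not just the
   switch. */
int sum_until_zero(const int *array, int n)
{
 int i;
 int sum;

 sum = 0;
 for (i=0;i<n;i++)
  { switch (array[i])
     { case 0:
          goto break_foo; /* was: break <"foo">; */
       default:
          sum += array[i];
          break; /* untagged: leaves only the switch */
     }
  }
break_foo:;
 return(sum);
}
```

Given the four values 3, 7, 0, 5, this returns 10: the zero ends the
loop before the 5 is reached.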

When searching for a matching tag, inappropriate control structure
types are ignored.  For example, a tagged continue ignores switch()
tags, even if they match textually; a tagged case label ignores
everything but switches.  (I may change this someday, to error if the
nearest enclosing matching tagged structure is of the wrong type.)  The
scope of a tag is the body of the tagged construct.  Two constructs
with the same tag will never produce an error; if they are not nested,
there is no ambiguity, and if they are nested, a reference to the tag
from within both will refer to the innermost.

It is always possible to eliminate tags by sufficient use of gotos.
Tagged breaks and continues simply turn into a goto to a label placed
appropriately with respect to the matching construct; tagged case and
default labels can be moved to the outermost level of their containing
switch and then followed by a goto to the place they were moved from.
I don't like writing gotos in the original source instead, though:

- The scope of a goto label is the entire containing function
  (admittedly, it's possible to restrict this in gcc).  This means that
  when reading the code, you can't know where control might jump to
  that label from without reading the whole function.  It also means
  that you can't use the same label twice in any given function
  (problematic for use in macros).

- gotos kill optimization more effectively than the corresponding
  "structured" constructs.

- Even if a goto label is carefully and appropriately named, as in
  "goto break_array_search_loop", the risk exists that someone (perhaps
  not the person who initially wrote it) has used it for some other
  purpose.  A control structure tag _can't_ be used outside of its
  scope.
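
For anyone who does want to strip the tags by hand, here is a sketch of
the goto rewrites described above, one for a tagged continue and one
for a tagged case label.  The tagged forms in the comments are my
reading of the rules given earlier; the function names, label names,
and values are all hypothetical.

```c
/* Tagged form:
 *	for <"row"> (r=0;r<rows;r++)
 *	 { for (c=0;c<cols;c++)
 *	    { if (grid[r*cols+c] < 0) continue <"row">;
 *	      count ++;
 *	    }
 *	 }
 * Counts, in each row, the cells before the first negative one. */
int count_before_negative(const int *grid, int rows, int cols)
{
 int r;
 int c;
 int count;

 count = 0;
 for (r=0;r<rows;r++)
  { for (c=0;c<cols;c++)
     { if (grid[r*cols+c] < 0) goto continue_row; /* was: continue <"row">; */
       count ++;
     }
continue_row:; /* end of the outer loop body */
  }
 return(count);
}

/* Tagged form: the inner switch contained
 *	case <"outer"> 2:
 * where outer_case_2 now stands, making it a case of the outer
 * switch even though it sits inside the inner one. */
int dispatch(int cmd, int sub)
{
 int result;

 result = 0;
 switch (cmd)
  { case 2:
       goto outer_case_2; /* the moved case label */
    case 1:
       switch (sub)
        { case 0:
             result += 10;
             break;
outer_case_2:; /* was: case <"outer"> 2: */
             result += 20;
             break;
        }
       result += 1;
       break;
  }
 return(result);
}
```

In dispatch, a cmd of 2 enters the inner switch's body at the moved
label regardless of sub, while a cmd of 1 with a sub of 2 matches
nothing in the inner switch, since that case belongs to the outer one.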

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B