TECO-C

Programmer's Guide

(last updated February 18, 1991 to reflect TECO-C version 140)

1 INTRODUCTION

These notes apply to TECO-C version 135, which runs under VAX/VMS,
MS-DOS, and Unix (SunOS, which is BSD-based). See file AAREADME.TXT for
the specific operating systems and compilers that have been used.

TECO-C is meant to be a complete implementation of TECO as defined by
the Standard TECO User's Guide and Language Reference Manual (in file
TECO.DOC). It was written so that the author could move to many machines
without having to learn many editors.

2 COMPILING AND LINKING

Conditional compilation directives are used to build TECO-C correctly
for different environments, keyed on identifiers that each compiler
defines automatically. Some identifiers defined in file ZPORT.H control
whether video support or extra debugging code is included; see "VIDEO"
and "DEBUGGING". Files that "build" TECO-C in various environments are
provided; see file AAREADME.TXT for details.

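As a rough illustration of this approach (not code taken from ZPORT.H), the sketch below picks a platform name from identifiers that compilers predefine, and makes a debugging switch default to off. The names DEBUGGING, ZSysNm and DbgMsg are hypothetical, and the predefined identifiers tested (__VMS, __MSDOS__, __unix__ and so on) are common ones, not necessarily the ones TECO-C checks.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical feature switch in the spirit of ZPORT.H: off unless the
   build sets it, e.g. with -DDEBUGGING=1. */
#ifndef DEBUGGING
#define DEBUGGING 0
#endif

/* Pick a platform from identifiers the compiler defines by itself.
   The identifiers tested are common ones, not necessarily TECO-C's. */
const char *ZSysNm(void)
{
#if defined(__VMS) || defined(VMS)
    return "VAX/VMS";
#elif defined(__MSDOS__) || defined(MSDOS)
    return "MS-DOS";
#elif defined(__unix__) || defined(__unix) || defined(__APPLE__)
    return "Unix";
#else
    return "unknown";
#endif
}

/* Debug output exists only in builds where DEBUGGING is non-zero. */
void DbgMsg(const char *msg)
{
    assert(msg != NULL);
#if DEBUGGING
    fprintf(stderr, "debug: %s\n", msg);
#else
    (void)msg;                    /* compiled out: nothing is printed */
#endif
}
```

The same source file serves every platform; only the compiler (and any -D options) decide which branches survive preprocessing.
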
3 RUNNING TECO-C

When you run TECO, the command line used to invoke it is parsed for an
input/output file name and several optional switches. TECO-11 parses the
command line using a TECO macro embedded in the program, and TECO-C does
the same thing. In fact, the embedded macro used to parse the command
line was stolen from TECO-11. I commented it and then modified it to
repair minor inconsistencies. Using TECO-11's macro makes TECO-C
invocation identical to TECO-11's, right down to responding to "make
love" with "Not war?".

The macro is in file CLPARS.TES. The compressed version (no comments or
whitespace) is in file CLPARS.TEC. The GENCLP program converts CLPARS.TEC
into CLPARS.H, an include file suitable for compiling into TECO-C.

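The sketch below shows one way a GENCLP-like converter could turn macro text into a C array definition; it is not GENCLP's actual source. It assumes each byte is emitted as a decimal code (which sidesteps quoting TECO's control characters), and the names OutBuf, Put and EmitArr are hypothetical. It writes into a memory buffer so the result is easy to check; a real converter would read CLPARS.TEC and write CLPARS.H on disk.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Growing output text in a caller-supplied buffer. */
typedef struct {
    char  *Buf;
    size_t Size;                  /* capacity of Buf */
    size_t Len;                   /* characters written so far */
} OutBuf;

/* Append s, silently dropping it if it will not fit (sketch only). */
static void Put(OutBuf *o, const char *s)
{
    size_t n = strlen(s);
    if (o->Len + n < o->Size) {   /* keep room for the final '\0' */
        memcpy(o->Buf + o->Len, s, n + 1);
        o->Len += n;
    }
}

/* Emit "static unsigned char name[] = { ... };" for len bytes of src,
   twelve codes per line, followed by a terminating 0. */
void EmitArr(OutBuf *o, const char *name, const unsigned char *src, size_t len)
{
    char tmp[80];                 /* long names are truncated here */
    size_t i;

    assert(o != NULL && name != NULL);
    snprintf(tmp, sizeof tmp, "static unsigned char %s[] = {", name);
    Put(o, tmp);
    for (i = 0; i < len; i++) {
        if (i % 12 == 0)
            Put(o, "\n    ");     /* start a new output line */
        snprintf(tmp, sizeof tmp, "%u, ", (unsigned)src[i]);
        Put(o, tmp);
    }
    Put(o, "\n    0\n};\n");
}
```
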
4 CODE CONVENTIONS

The code is not modular. Communication between almost all functions is
through global variables, not argument lists. There is a reason: the
basic parsing algorithm uses the characters in the command string as
indices into a table of functions. This makes for very fast command
parsing, but it means that all the functions have to modify global
values, because no arguments are passed in. In other words, there were
going to be 130 or so un-modular functions anyway, so I gave up on
modularity. This explanation does not account for some of the
complications in the search code, like the global variable SrcTyp. Oh,
well.

Here's a brief list of some of the conventions followed by the code:

     1.  TECO-C is portable, so some convention was needed to separate
         portable code from system-dependent code. There is one file
         containing the system-dependent code for each platform TECO-C
         supports. These files have names that start with a "Z": ZVMS.C,
         ZMSDOS.C and ZUNIX.C.

         All the system-dependent functions in those files start with a
         "Z". For example, the function that allocates memory is called
         ZAlloc. A VMS version of ZAlloc can be found in ZVMS.C, and an
         MS-DOS version can be found in ZMSDOS.C.

         An extra file called ZUNKN.C exists to help efforts to port
         TECO-C to a new environment. This file contains stubs for all the
         system-dependent functions.

     2.  All system-independent global variables are defined, in
         alphabetical order, in file TECOC.C. They are declared (as
         externals) in DEFEXT.H, which is included by all modules.

     3.  File TECOC.H contains the "global" definitions, including those
         for most structures.

     4.  Variables and functions are defined using the portability
         identifiers defined in the ZPORT.H file. Functions which do not
         return a value are defined as VOID. TECO-C should compile and
         link on different machines by changing only the environment
         definitions in the ZPORT.H file.

     5.  At one time, every function was in a file with the same name as
         the function. This made it easy to find the code for a function.
         The problem was that some groups of functions use data not needed
         by the other functions. This was especially true of the
         system-dependent functions. Also, some functions were called by
         only one other function, so it made sense for them to be in the
         same module as the caller and be made "static". So now, most
         functions are in a file named the same as the function, with the
         following exceptions:

         1.  All the "Z" functions are in the "Z" file for the given
             system.

         2.  The conditionally-compiled functions (ZCpyBl in ZINIT.C, the
             "Dbg" functions at the bottom of TECOC.C, the "v" functions
             in EXEW.C) aren't in their own files. If they were, then the
             command procedures/makefiles that compile the files would
             need to contain logic to conditionally compile the files.

         3.  The functions for the "E" and "F" commands are in EXEE.C and
             EXEF.C, respectively. So if you want to find function ExeEX,
             don't look for a file named EXEEX.C.

     6.  Symbols are 6 characters long or less. The way I remember it,
         this was caused by the first system I wrote TECO-C for: CP/M-68k,
         which had a limit of 8 characters for file names. The last two
         characters had to be ".C", so 6 characters were left for the file
         name. Since the file name was the same as the function it
         contained, function names were limited to 6 characters. When I
         saw how nicely the function declarations looked (they fit in one
         tab slot), I used 6 characters for other symbols too.

         I've since been told that CP/M-68k has 8-character file names
         followed by 3-character file types, so CP/M-68k can't be blamed.
         So shoot me.

         This standard has prevented problems with compilers that treat
         only a few characters of an identifier as significant.

         To make up for the resulting cryptic names, upper and lower case
         are mixed: an uppercase letter indicates a new word. For example,
         "EBfEnd" stands for "Edit Buffer End". If you need to know what a
         variable name means, look at the declaration of the variable in
         DEFEXT.H; the expanded version of the abbreviated name appears in
         the comment on the same line. A detailed description can be found
         with the definition of the variable in TECOC.C.

         The limit of 6 characters in symbol names is relaxed in
         system-dependent code.

     7.  Variable and function names follow patterns where possible. For
         instance, "EBfBeg" and "EBfEnd" are the beginning and end of the
         edit buffer. If you see a variable named "BBfBeg", you can assume
         that it is the beginning of some other buffer, and that a
         "BBfEnd" exists which is the end of that buffer.

     8.  Character strings are usually represented in C by a pointer to a
         sequence of bytes terminated with a null character. I didn't do
         that in TECO-C because I thought it was too inefficient: to get
         the length of a string, you have to count characters. Most
         strings in TECO-C are therefore represented by two pointers, one
         to the first character and one to the character following the
         last character. With this representation, it's easy to add
         characters to a string and trivial to get the length.

     9.  Each file has a consistent format, which is:

         1.  a comment describing the function
         2.  include directives
         3.  the function declaration
         4.  local variable definitions, in alphabetical order
         5.  code

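To illustrate convention 8, here is a minimal sketch of the two-pointer string representation. The Str type and the StrLen and StrApp helpers are hypothetical (TECO-C mostly keeps such pointer pairs in separate global variables rather than in a structure), and the caller-supplied backing array is an assumption made to keep the example self-contained.

```c
#include <assert.h>
#include <string.h>

/* A string as a pair of pointers: Beg -> first character,
   End -> one past the last character.  No null terminator is needed. */
typedef struct {
    char *Beg;
    char *End;
} Str;

/* The length is a subtraction, not a scan for '\0'. */
size_t StrLen(const Str *s)
{
    assert(s->End >= s->Beg);     /* End never precedes Beg */
    return (size_t)(s->End - s->Beg);
}

/* Append n bytes at End.  This sketch assumes the caller guarantees
   room; real code must check the capacity of the backing memory. */
void StrApp(Str *s, const char *src, size_t n)
{
    memcpy(s->End, src, n);
    s->End += n;
}
```

An empty string is simply one whose two pointers are equal, so appending "EBf" and then "End" to it gives a string of length 6 without ever counting characters.
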
5 TOP LEVEL EXECUTION AND COMMAND PARSING

The top-level code for TECO-C is contained in file TECOC.C. It is very
simple: after initializing, a loop is entered which reads a command
string from the user, executes it, and loops back to read another command
string. If the user executes a command which causes TECO-C to exit, the
program is exited directly via a call to the TAbort function. TECO-C
never exits by "falling out the bottom" of the main function.

After a command string is read, the ExeCSt function is called to execute
it. ExeCSt contains the top-level parsing code. The parse is trivial:
each command character is used as an index into a table of functions.
The table contains one entry for each of the 128 possible characters.
Each function is responsible for "consuming" its command, so that when it
returns, the command string pointer points to the next command.

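The sketch below shows the shape of this jump-table parse in C. ExeCSt, ExeAtS, CmdMod and the SUCCESS/FAILURE convention are names from this guide, but their bodies here are simplified stand-ins. The pointer globals CBfPtr and CStEnd, the table name JmpTbl, the ATSIGN bit value, the stand-in command ExeT and the illegal-command handler ExeIll are hypothetical, and masking characters to 7 bits is an assumption of the sketch.

```c
#include <assert.h>
#include <stdio.h>

#define SUCCESS 1
#define FAILURE 0
#define ATSIGN  1                 /* hypothetical CmdMod bit for "@" */

typedef int (*CmdFn)(void);

/* Globals shared by all command functions: no arguments are passed. */
static const char *CBfPtr;        /* current command character */
static const char *CStEnd;        /* one past the last command character */
static int CmdMod;                /* command modifier flags */
static int TCount;                /* counts "T" commands, for the example */

/* "@": just record the modifier and consume the character. */
static int ExeAtS(void) { CmdMod |= ATSIGN; ++CBfPtr; return SUCCESS; }

/* Stand-in command: consumes itself and clears the modifiers. */
static int ExeT(void) { ++TCount; CmdMod = 0; ++CBfPtr; return SUCCESS; }

/* Default entry for characters that are not commands. */
static int ExeIll(void) { fputs("?ILL illegal command\n", stderr); return FAILURE; }

static CmdFn JmpTbl[128];         /* one entry per 7-bit character */

static void InitTbl(void)
{
    int i;
    for (i = 0; i < 128; i++)
        JmpTbl[i] = ExeIll;
    JmpTbl['@'] = ExeAtS;
    JmpTbl['T'] = ExeT;
    JmpTbl['t'] = ExeT;
}

/* Execute a command string by indexing the table with each character. */
static int ExeCSt(const char *beg, const char *end)
{
    CBfPtr = beg;
    CStEnd = end;
    while (CBfPtr < CStEnd) {
        unsigned char c = (unsigned char)(*CBfPtr & 0x7F);
        assert(JmpTbl[c] != NULL);
        if (JmpTbl[c]() == FAILURE)
            return FAILURE;       /* stop parsing; the caller reprompts */
    }
    return SUCCESS;
}
```

Because each command function advances CBfPtr itself, a command that takes a text argument can consume as many characters as it needs before returning, and ExeCSt never has to know how long a command is.
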
5.1 Error Handling

When an error is detected, an error message is displayed at the point
where the error is detected, and the function in which the error was
detected returns a FAILURE status to its caller. Almost always, the
caller returns a FAILURE status to its caller, which returns a FAILURE
status to its caller, and so on. When a FAILURE status is returned to the
main command string parser, parsing of the command string stops and the
user is prompted for a new command string.

This style tends to cause all function calls to follow the same form,
which is:

        if (function() == FAILURE)
                return(FAILURE);

Things get more complicated in the system-dependent code (in the files
with names that start with a "Z"). I extended TECO's error reporting
slightly to allow the user to see the operating system's reason for an
error, as this is often useful. For example, under VAX/VMS there are many
reasons why an attempt to create an output file might fail, including
errors in file name syntax, a nonexistent destination directory, file
protection violations, or a disk quota violation. To supply enough
information to the user, TECO-C outputs multiple-line error messages when
a system error occurs.

Multiple-line error messages contain one line that describes the
operating system's perception of the error and one line that describes
TECO's perception of the error. For instance, if a user of VAX/VMS does
an "EW[abc]test.txt$$" command when the directory [abc] does not exist,
the error message generated by TECO-C is:

        ?SYS %RMS-F-DNF, directory not found
        ?UFO unable to open file "[abc]test.txt" for output

System errors are therefore reported in a system-dependent fashion,
using whatever messages the operating system can supply. Under VAX/VMS,
the system service $GETMSG provides human-readable messages that TECO-C
can use in the "SYS" part of the error message. Under UNIX,
sys_errlist[errno] is a pointer to these messages.

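On a Unix-like system, the two-line report and the FAILURE return might look roughly like this. It is a sketch, not TECO-C's ZUNIX.C code: ZOpOut and ErrSys are hypothetical names, and the sketch calls the standard strerror(errno) rather than indexing sys_errlist directly, since sys_errlist is deprecated or missing on many current systems.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define SUCCESS 1
#define FAILURE 0

/* Format the two-line message: the system's view, then TECO's view. */
void ErrSys(char *buf, size_t size, int err, const char *fname)
{
    snprintf(buf, size, "?SYS %s\n?UFO unable to open file \"%s\" for output\n",
             strerror(err), fname);
}

/* Open an output file.  On failure, report both lines at the point of
   detection and return FAILURE, so every caller can simply pass FAILURE
   back up with "if (ZOpOut(...) == FAILURE) return(FAILURE);". */
int ZOpOut(const char *fname, FILE **fp)
{
    char msg[512];

    assert(fname != NULL && fp != NULL);
    *fp = fopen(fname, "w");
    if (*fp == NULL) {
        ErrSys(msg, sizeof msg, errno, fname);   /* errno read before any other call */
        fputs(msg, stderr);
        return FAILURE;
    }
    return SUCCESS;
}
```
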
There is another way in which error reporting in the system-dependent
code is tricky. Under VAX/VMS, some system calls may return a code that
is "successful" but contains extra information. For instance, when a user
has set up his directories so that only a limited number of versions of
a file can exist, RMS will automatically purge the oldest version of the
file when the user creates a file. This only happens if the newly created
file would cause too many versions of the file to exist. When this
happens, the VMS service returns a FILEPURGED status, which is
successful. TECO-C informs the user about these things by displaying the
message in brackets.

5.2 Command Modifiers (CmdMod)

Command parsing is complicated by command modifiers and numeric
arguments, which may precede some commands. These are implemented in a
way that maintains the basic "jump table" idea. For instance, when an
at-sign (@) modifier is encountered in a command string, the at-sign
command function (ExeAtS) is called. The only thing ExeAtS does is set a
flag indicating that an at-sign has been encountered. Commands which are
affected by an at-sign modifier check this flag and behave accordingly.

The flags which indicate command modifiers are contained in global
variable CmdMod. A bit in CmdMod is reserved for each command modifier.
The modifiers are "@", ":" and "::". Of course, once the flag has been
set, it must be cleared. With this parsing algorithm, the only way to do
that is to make every command function explicitly reset CmdMod before a
successful return. This is not too bad: clearing all the flags in CmdMod
is done with one statement: "CmdMod = '\0';".

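A minimal sketch of the modifier flags, assuming one bit per modifier. ExeAtS, CmdMod and MARGIS are names from this guide; the bit values, the bit names ATSIGN, COLON and DCOLON, the colon handler ExeCln and the stand-in command ExeX are hypothetical. In this sketch "::" is recorded as DCOLON alone, and the command string is assumed to be null-terminated so the colon handler can peek at the next character.

```c
#include <assert.h>

/* One bit per modifier (values are assumptions of this sketch). */
#define ATSIGN  0x01   /* "@" seen */
#define COLON   0x02   /* ":" seen */
#define DCOLON  0x04   /* "::" seen */
#define MARGIS  0x08   /* "m" part of an m,n pair is set */

static unsigned char CmdMod;      /* global: modifiers for the next command */
static const char   *CBfPtr;      /* global: current command character */

/* "@" command: only sets the flag, then consumes the character. */
int ExeAtS(void)
{
    CmdMod |= ATSIGN;
    ++CBfPtr;
    return 1;
}

/* ":" command: a second colon records DCOLON instead of COLON. */
int ExeCln(void)
{
    ++CBfPtr;
    if (*CBfPtr == ':') {
        CmdMod |= DCOLON;
        ++CBfPtr;
    } else {
        CmdMod |= COLON;
    }
    return 1;
}

/* Hypothetical command: reports which modifiers preceded it, then
   clears all of them before its successful return. */
int ExeX(int *sawAt, int *sawColon)
{
    *sawAt = (CmdMod & ATSIGN) != 0;
    *sawColon = (CmdMod & (COLON | DCOLON)) != 0;
    CmdMod = '\0';
    ++CBfPtr;
    return 1;
}
```
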
For numeric arguments to commands, an expression stack is used (see
Stacks). The EStTop variable is the index of the top of the expression
stack. Commands which handle numeric arguments check EStTop to see if the
expression stack contains a value.

A special case of numeric arguments is "m,n". The "m" part is
encountered and causes the value to be pushed onto the expression stack.
The comma causes the ExeCom function to move the value into a special
"m-argument" global variable (MArgmt), clear the expression stack and set
another flag in CmdMod indicating that the "m" part of an "m,n" pair is
defined. Then the "n" is encountered and pushed onto the stack. Commands
which can take "m,n" pairs check the flag in CmdMod.

To summarize, CmdMod and EStTop are variables which describe the context
of a command. Each command function tests these variables to see if it
was preceded by modifiers or numbers. For this to work, it is important
that the expression stack and the flags in CmdMod are cleared at the
right times. It is the responsibility of each command function to leave
CmdMod and EStTop with the proper values before successfully returning.
The rules are:

     1.  If the command function is returning FAILURE, don't worry about
         clearing CmdMod or EStTop. They will be cleared before the next
         command string is executed.

     2.  If the command function leaves a value on the expression stack,
         do not clear EStTop before returning SUCCESS. If the command
         calls GetNmA, do not clear EStTop, as GetNmA does it for you.
         Otherwise, clear EStTop before returning SUCCESS.

     3.  Clear CmdMod unless the command function sets flags or needs to
         leave them alone. ExeDgt, for example, handles digit strings and
         doesn't clear CmdMod because the MARGIS bit may be set.

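The sketch below pulls the m,n mechanism and the three rules together. ExeCom, MArgmt, EStTop, EStBot, GetNmA and MARGIS are names from this guide, but these bodies are simplified stand-ins: the expression stack here is a plain array of operands (the real EStack also holds operators), the MARGIS value is assumed, and PushNm and ExeRng are hypothetical helpers. Using 0 for a missing "m" is also an assumption.

```c
#include <assert.h>

#define SUCCESS 1
#define FAILURE 0
#define MARGIS  0x08              /* assumed bit: "m" of m,n is defined */

static long EStack[32];           /* simplified: operands only */
static int  EStTop, EStBot;       /* EStTop == EStBot means empty */
static int  CmdMod;
static long MArgmt;               /* the saved "m" argument */

/* Digits: push a value.  Like ExeDgt, leaves CmdMod alone (rule 3). */
static int PushNm(long v)
{
    assert(EStTop + 1 < 32);
    EStack[++EStTop] = v;
    return SUCCESS;
}

/* Comma: move the value into MArgmt, empty the stack, set MARGIS. */
static int ExeCom(void)
{
    if (EStTop == EStBot)
        return FAILURE;           /* no "m" value before the comma */
    MArgmt = EStack[EStTop];
    EStTop = EStBot;
    CmdMod |= MARGIS;
    return SUCCESS;
}

/* Fetch the numeric argument; clears EStTop for the caller (rule 2). */
static int GetNmA(long *n)
{
    if (EStTop == EStBot)
        return FAILURE;
    *n = EStack[EStTop];
    EStTop = EStBot;
    return SUCCESS;
}

/* Hypothetical command taking m,n: reports the pair it was given. */
static int ExeRng(long *m, long *n)
{
    if (GetNmA(n) == FAILURE)
        return FAILURE;           /* rule 1: no cleanup needed */
    *m = (CmdMod & MARGIS) ? MArgmt : 0;
    CmdMod = 0;                   /* rule 3: clear modifiers on success */
    return SUCCESS;
}
```

For a command string like "10,20" followed by ExeRng's command, the digits push 10, the comma moves it to MArgmt, the digits push 20, and ExeRng leaves both CmdMod and the stack empty.
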
6 SEARCHING

The search algorithm in TECO-C is complex. The war between the desire
for a fast search and the need to handle all the features of TECO's
search commands has produced code which can be a real pain to follow.
This section attempts to explain how things got the way they are. The
code is explained in a bottom-up fashion, to follow the way it evolved in
the author's twisted mind.

The basic search idea is to scan a contiguous edit buffer for a search
string. The steps are:

     1.  Search the edit buffer for the first character in the search
         string. If you reach the end of the edit buffer without matching,
         the search fails.

     2.  When the first character of the search string matches a
         character in the edit buffer, try to match successive characters
         in the search string with the characters which follow the found
         character in the edit buffer. If they all match, the search
         succeeds. If one doesn't, go back to step 1.

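The two steps can be sketched in C as follows. This illustrates the basic idea only and is not TECO-C's SSerch; the function name BasSrc and its pointer-pair interface (in the spirit of TECO-C's two-pointer strings) are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Search [bufBeg, bufEnd) for [patBeg, patEnd).  Returns a pointer to
   the start of the first match, or NULL if the search fails. */
const char *BasSrc(const char *bufBeg, const char *bufEnd,
                   const char *patBeg, const char *patEnd)
{
    const char *p;

    assert(patEnd > patBeg);      /* an empty pattern is the caller's job */
    for (p = bufBeg; p < bufEnd; p++) {
        const char *b, *s;
        /* Step 1: find the first character of the search string. */
        if (*p != *patBeg)
            continue;
        /* Step 2: compare the characters that follow it. */
        for (b = p + 1, s = patBeg + 1; s < patEnd; b++, s++)
            if (b >= bufEnd || *b != *s)
                break;            /* mismatch: back to step 1 */
        if (s == patEnd)
            return p;             /* every character matched */
    }
    return NULL;                  /* reached the end: the search fails */
}
```
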
This is basically what TECO-C does. The features of TECO's search
commands have buried these steps deep within some confusing code.

The first complication is introduced by pattern matching characters.
TECO has 17 "match constructs", which are indicated in the search string
by the special characters ^X, ^S, ^N and ^Ex, where "x" can be several
other characters. For instance, a ^X in the search string means that any
character is to be accepted as a match in place of the ^X. Characters
other than the match constructs represent themselves. An example: the
search string "a^Xb" contains 3 match constructs: a, ^X and b.

TECO also supports forward or backward searching. When searching
backwards, only the search for the first match construct in the search
string is done in a backwards direction. When the character is found, the
characters following it are compared in a forward direction to the edit
buffer characters. This means that once the first match construct has
been found, a single piece of code can be used to compare successive
characters in the search string with successive characters in the edit
buffer, regardless of whether the search is forwards or backwards.

Adding these new features, the new description of searching is:

     1.  Search the edit buffer forwards or backwards for a character
         which matches the first match construct in the search string. If
         you reach the end (or, for a backward search, the beginning) of
         the edit buffer without matching, the search fails.

     2.  When the first match construct of the search string matches a
         character in the edit buffer, try to match successive match
         constructs in the search string with the characters which follow
         the found character in the edit buffer. If they all match, the
         search succeeds. If one doesn't, go back to step 1.

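The sketch below adds direction and one match construct, ^X (ASCII 24, match any character). As described above, only the hunt for the first construct runs backwards; the rest of the comparison always runs forwards. The names MatOne, MatchAt and DirSrc are hypothetical, and unlike the real CMatch this handles only literals and ^X.

```c
#include <assert.h>
#include <stddef.h>

#define CTRL_X 0x18               /* ^X: matches any single character */

/* Does one match construct (here: ^X or a literal) accept character c? */
static int MatOne(char construct, char c)
{
    return construct == CTRL_X || construct == c;
}

/* Compare the whole pattern forwards starting at p (bounded by bufEnd). */
static int MatchAt(const char *p, const char *bufEnd,
                   const char *patBeg, const char *patEnd)
{
    const char *s;
    for (s = patBeg; s < patEnd; s++, p++)
        if (p >= bufEnd || !MatOne(*s, *p))
            return 0;
    return 1;
}

/* Search from dot: forwards if dir > 0, else backwards.  Only the scan
   for the first construct changes direction. */
const char *DirSrc(const char *bufBeg, const char *bufEnd, const char *dot,
                   int dir, const char *patBeg, const char *patEnd)
{
    const char *p;

    assert(patEnd > patBeg);
    if (dir > 0) {
        for (p = dot; p < bufEnd; p++)
            if (MatOne(*patBeg, *p) && MatchAt(p, bufEnd, patBeg, patEnd))
                return p;
    } else {
        for (p = dot; p > bufBeg; ) {
            --p;                  /* step back before examining */
            if (MatOne(*patBeg, *p) && MatchAt(p, bufEnd, patBeg, patEnd))
                return p;
        }
    }
    return NULL;
}
```

With the pattern "c^Xt" and the buffer "cat cot cut", a backward search from the end finds "cut", and a backward search starting just before "cut" finds "cot".
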
To begin a description of which routines implement the above steps,
and to have a reference for later discussion, the following hierarchy
chart of "who calls whom" is presented.

ExeEUn ExeFB ExeFC ExeFD ExeFK ExeFN ExeFS ExeFUn ExeN ExeS ExeUnd
  |      |     |     |     |     |     |     |     |    |     |
  |      |     |     |     |     |     |     |     |    |     |
  -------------------------------------------------------------
                               |
                               V
                            Search
                               |
                               V
                            SrcLop
                               |
                               V
                            SSerch
                            |  |  |
                     +------+  |  +------+
              +---+  |         |         |  +---+
              |   V  V         |         V  V   |
              |  ZFrSrc        |        BakSrc  |
              |   |  |         |         |  |   |
              +---+  |         |         |  +---+
                     +-------+ | +-------+
                             V V V
                            CMatch <--+
                               |      |
                               +------+

At the top are the functions that implement search commands (E_, FB,
FC, FD, FK, FN, FS, F_, N, S and _). All of these functions call the main
search function: Search.

At the lower level are the functions which implement steps 1 and 2
described above. ZFrSrc searches forwards in the edit buffer for
characters which match the first character in the search string. BakSrc
does the same thing, but searches backwards. SSerch calls one of these
two functions and then executes a loop which calls CMatch to compare
successive match constructs in the search string to characters following
the found character in the edit buffer. ZFrSrc, BakSrc and CMatch call
themselves recursively to handle some of the more esoteric match
constructs.

Case dependence in TECO is controlled by the search mode flag (see the
^X command). The variable SMFlag holds the value of the search mode flag,
and is used by ZFrSrc, BakSrc and CMatch.

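The case rule can be sketched as a single comparison routine. The sketch assumes the TECO manual's convention that a search mode flag of 0 lets letters match either case and a non-zero value requires an exact match; the function name ChrEq is hypothetical, and TECO-C applies the flag in its own way inside ZFrSrc, BakSrc and CMatch.

```c
#include <assert.h>
#include <ctype.h>

static int SMFlag = 0;            /* search mode flag: 0 = either case matches */

/* Compare an edit-buffer character with a search-string character. */
int ChrEq(char bufc, char srcc)
{
    if (SMFlag != 0)
        return bufc == srcc;      /* exact match required */
    return tolower((unsigned char)bufc) == tolower((unsigned char)srcc);
}
```

Keeping the test in one routine means the forward scan, the backward scan and the match loop cannot disagree about case.
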
One final point to help confuse things: ZFrSrc is system-dependent. It
contains a VAX/VMS-specific version which uses the LIB$SCANC run-time
library routine to access the SCANC instruction. The SCANC instruction
looks like it was designed to handle TECO's match constructs. I couldn't
resist using it, but it was a mistake, as it needlessly complicates an
already messy algorithm. I have decided to remove the VMS-specific code
some time in the future.

Further complications of the search algorithm arise because of the
following capabilities of TECO searches:

     1.  If there is no text argument, use the previous search argument.

     2.  If colon-modified, return success/failure and no error message.

     3.  If the search fails and we're in a loop and a semicolon follows
         the search command, exit the loop without displaying an error
         message.

     4.  Handle optional repeat counts.

     5.  If the ES flag is non-zero, verify the search based on the value
         of the flag.

     6.  If bit 64 of the ED flag is set, move dot by one on multiple
         searches.

     7.  If bit 16 of the ED flag is set, don't move after a failing
         search.

     8.  Be fast.

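The sketch below covers only items 2, 4 and 7, for a plain forward search over a contiguous buffer. It assumes TECO's documented conventions that a colon-modified search returns -1 on success and 0 on failure, and that without ED bit 16 a failing search leaves dot at the beginning of the buffer. The names Srch1, RptSrc, EBuf, ZLen, Dot, EDFlag and ED_PRESERVE are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define ED_PRESERVE 16            /* ED bit 16: don't move after a failing search */

static const char *EBuf;          /* edit buffer text (contiguous, no gap) */
static long        ZLen;          /* number of characters in the buffer */
static long        Dot;           /* character pointer */
static int         EDFlag;

/* One forward search from Dot; on success leaves Dot after the match. */
static int Srch1(const char *pat)
{
    long plen = (long)strlen(pat), i;
    for (i = Dot; i + plen <= ZLen; i++)
        if (memcmp(EBuf + i, pat, (size_t)plen) == 0) {
            Dot = i + plen;
            return 1;
        }
    return 0;
}

/* Repeat the search count times.  Returns -1 (success) or 0 (failure),
   the values a colon-modified search yields; a plain search would
   display an error message on failure instead. */
long RptSrc(const char *pat, long count)
{
    long saved = Dot;
    while (count-- > 0)
        if (!Srch1(pat)) {
            Dot = (EDFlag & ED_PRESERVE) ? saved : 0;
            return 0;
        }
    return -1;
}
```
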
7 MEMORY MANAGEMENT

7.1 The Edit Buffer And Input Buffer

TECO-C is based on TECO-11, but it uses a different form of edit buffer
memory management. Here's why.

The edit buffer in TECO-11 is implemented as a continuous block of
memory. This allows rapid movement through the edit buffer (by just
maintaining a pointer to the current spot) and makes searches very
straightforward. Insertion and deletion of text is expensive, because
each insertion or deletion requires moving the text following the spot
where the insertion or deletion occurs in order to maintain a continuous
block of memory. This gets to be a real pain when a video editing
capability is added to TECO, because in video mode text is added/deleted
one character at a time very rapidly.

TECO-C uses an edit buffer gap scheme. The edit buffer occupies a
continuous piece of memory, but there is a gap at the "current spot" in
the edit buffer. When the user moves around the edit buffer, the gap is
moved by shuffling text from one side of the gap to the other. This means
that moving around the text buffer is slower than with TECO-11's scheme,
but text insertion and deletion are very fast. Searches are still fast
because most searches start at the current spot and go forwards or
backwards, so a continuous piece of memory is searched. In the future,
when some kind of video mode is added, insertion and deletion
one-character-at-a-time will be fast using the gap scheme.

The variables that maintain pointers to the edit buffer and the gap
within the buffer can be confusing, so here are some examples. Suppose
that 10000 bytes are allocated for the edit buffer when TECO-C is
initialized, and that the allocated memory starts at address 3000. In
these examples, GapEnd and EBfEnd point to the last byte of the gap and
of the edit buffer, respectively, rather than one byte past it.

Empty edit buffer (the gap spans the whole edit buffer):

        EBfBeg = 3000           (edit buffer beginning)
        GapBeg = 3000           (gap beginning)
        GapEnd = 13000          (gap end)
        EBfEnd = 13000          (edit buffer end)

Buffer contains "test", character pointer is before the first 't':

        EBfBeg = 3000           (edit buffer beginning)
        GapBeg = 3000           (gap beginning)
        GapEnd = 12996          (gap end)
                 12997  't'
                 12998  'e'
                 12999  's'
        EBfEnd = 13000  't'     (edit buffer end)

Buffer contains "test", character pointer is after the last 't':

        EBfBeg = 3000   't'     (edit buffer beginning)
                 3001   'e'
                 3002   's'
                 3003   't'
        GapBeg = 3004           (gap beginning)
        GapEnd = 13000          (gap end)
        EBfEnd = 13000          (edit buffer end)

Buffer contains "test", character pointer is after the 'e':

        EBfBeg = 3000   't'     (edit buffer beginning)
                 3001   'e'
        GapBeg = 3002           (gap beginning)
        GapEnd = 12998          (gap end)
                 12999  's'
        EBfEnd = 13000  't'     (edit buffer end)

When an insertion command is executed, the text is inserted starting at
GapBeg. When a deletion command is executed, GapEnd is incremented for a
forward delete or GapBeg is decremented for a backwards delete. When the
character pointer is moved forwards, the gap is moved forwards by copying
the text just after the end of the gap to the beginning of the gap. When
the character pointer is moved backwards, the gap is moved backwards by
copying text from the area just before the gap to the area at the end of
the gap.

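Here is a minimal gap-buffer sketch that follows these rules and the conventions of the examples above (GapEnd and EBfEnd point at the last byte, not one past it). To keep it short and free of pointer edge cases, it uses integer offsets into a small static array instead of pointers. The helper names (GInit, GIns, GDelF, GDelB, GMovF, GMovB, GText) are hypothetical, not TECO-C's.

```c
#include <assert.h>
#include <string.h>

#define EBSIZE 16                 /* tiny edit buffer for illustration */

static char Mem[EBSIZE];
/* Offsets into Mem.  Text before the gap is [EBfBeg, GapBeg); text after
   the gap is (GapEnd, EBfEnd].  GapEnd and EBfEnd are inclusive. */
static int EBfBeg, GapBeg, GapEnd, EBfEnd;

static void GInit(void)           /* empty buffer: the gap spans everything */
{
    EBfBeg = GapBeg = 0;
    GapEnd = EBfEnd = EBSIZE - 1;
}

static int GIns(const char *s, int n)   /* insert at GapBeg */
{
    if (GapEnd - GapBeg + 1 < n)
        return 0;                 /* no room left in the gap */
    memcpy(Mem + GapBeg, s, (size_t)n);
    GapBeg += n;
    return 1;
}

static int GDelF(int n)           /* forward delete: grow the gap at its end */
{
    if (EBfEnd - GapEnd < n)
        return 0;
    GapEnd += n;
    return 1;
}

static int GDelB(int n)           /* backward delete: grow the gap at its start */
{
    if (GapBeg - EBfBeg < n)
        return 0;
    GapBeg -= n;
    return 1;
}

static int GMovF(int n)           /* move the pointer forwards n characters */
{
    if (EBfEnd - GapEnd < n)
        return 0;
    memmove(Mem + GapBeg, Mem + GapEnd + 1, (size_t)n);  /* after gap -> gap start */
    GapBeg += n;
    GapEnd += n;
    return 1;
}

static int GMovB(int n)           /* move the pointer backwards n characters */
{
    if (GapBeg - EBfBeg < n)
        return 0;
    GapBeg -= n;
    GapEnd -= n;
    memmove(Mem + GapEnd + 1, Mem + GapBeg, (size_t)n);  /* before gap -> gap end */
    return 1;
}

static int GText(char *out)       /* copy the text out, skipping the gap */
{
    int pre = GapBeg - EBfBeg, post = EBfEnd - GapEnd;
    memcpy(out, Mem + EBfBeg, (size_t)pre);
    memcpy(out + pre, Mem + GapEnd + 1, (size_t)post);
    out[pre + post] = '\0';
    return pre + post;
}
```

Inserting "test" and then moving back two characters reproduces the last example above in miniature: GapBeg is 2 (just after the 'e') and "st" sits in the last two bytes of the buffer. memmove is used for gap moves because the source and destination overlap whenever the gap is smaller than the distance moved.
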
There are a few messy cases, such as when a bounded search is executed
and the bounded text area includes the edit buffer gap. In this case, the
gap is temporarily moved so that the search can proceed over a continuous
memory area.

In order to confuse things a little, TECO-C has one addition to the basic
edit buffer gap management. Following the end of the edit buffer
(EBfEnd) is the current input stream buffer. Since file input commands
always cause text to be appended to the end of the edit buffer, this is
natural. Thus, no separate input buffer is needed: text is input directly
into the edit buffer. This makes the code a little confusing, but it
avoids the problem of having an input buffer. When you have an input
buffer, you have to deal with the question of how large the buffer should
be and what to do with it when it's too small. This scheme is fast and
saves some memory (see File Input).

7.2 Q-registers

Q-registers have two parts: a numeric part and a text part. Each
q-register is represented by a structure containing three fields: one to
hold the numeric part and two to point to the beginning and end of the
memory holding the text part. If the text part of the q-register is
empty, then the pointer to the beginning of the text is NULL.

There are 36 global q-registers, one for each letter of the alphabet and
one for each digit from 0 to 9. These q-registers are accessible from any
macro level. There are 36 local q-registers for each macro level. The
names of local q-registers are preceded by a period. Thus the command
"1xa" copies a line into global q-register "a", while the command "1x.a"
copies a line into local q-register ".a". Storage for the data structure
defining local q-registers is not allocated until a local q-register is
first used. This saves space and time, because local q-registers are
rarely used, and doing things this way avoids allocating and freeing
memory every time a macro is executed.

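A sketch of the q-register structure and the lazily allocated local set follows. The field names (Number, Start, End_P1), the type and function names (QReg, GlbQRg, MacLvl, QIndex, GetLcl) and the letter/digit index mapping are assumptions for illustration, not TECO-C's declarations.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define NQREGS 36                 /* a-z plus 0-9 */

/* One q-register: a number plus a text part held as a pointer pair.
   An empty text part has Start == NULL. */
typedef struct {
    long  Number;
    char *Start;                  /* first text character, or NULL */
    char *End_P1;                 /* one past the last text character */
} QReg;

static QReg GlbQRg[NQREGS];       /* global q-registers: every macro level */

/* Per-macro-level context; Lcl stays NULL until a local is first used. */
typedef struct {
    QReg *Lcl;
} MacLvl;

/* Map a q-register name to an index: letters 0-25, digits 26-35. */
int QIndex(char name)
{
    if (isalpha((unsigned char)name))
        return tolower((unsigned char)name) - 'a';
    if (isdigit((unsigned char)name))
        return 26 + (name - '0');
    return -1;                    /* not a q-register name */
}

/* Return local q-register "name" for this level, allocating the whole
   set of 36 only on first use. */
QReg *GetLcl(MacLvl *lvl, char name)
{
    int i = QIndex(name), j;

    if (i < 0)
        return NULL;
    if (lvl->Lcl == NULL) {
        lvl->Lcl = malloc(NQREGS * sizeof(QReg));
        if (lvl->Lcl == NULL)
            return NULL;          /* out of memory */
        for (j = 0; j < NQREGS; j++) {
            lvl->Lcl[j].Number = 0;
            lvl->Lcl[j].Start = lvl->Lcl[j].End_P1 = NULL;  /* empty text */
        }
    }
    return &lvl->Lcl[i];
}
```

A macro level that never mentions a local q-register never calls malloc, which is the saving the text describes.
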
8 STACKS

8.1 Expression Stack

An expression stack is used to parse TECO's expressions. Consider the
command string QA+50=$$. When the command string is executed, the value
of QA is pushed on the expression stack, then the operator "+" is pushed
on the expression stack, and then the value "50" is pushed on the
expression stack. Whenever a full expression that can be reduced is on
the expression stack, it is reduced. For the above example, the stack is
reduced when the value "50" is pushed.

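The push-and-reduce behavior can be sketched as follows. Each entry is tagged as an operand or an operator, and pushing an operand on top of "operand operator" reduces the three entries to one. EStack, EStTop and EStBot are names from this guide; the entry layout, the names EStEnt, Reduce, PshOpd and PshOpr, and the restriction to + - * evaluated strictly left to right (TECO has no operator precedence) are assumptions of this sketch.

```c
#include <assert.h>

#define ESTSIZE 64

enum { OPERAND, OPERATOR };

typedef struct {
    int  Type;                    /* OPERAND or OPERATOR */
    long Value;                   /* the number, or the operator character */
} EStEnt;

static EStEnt EStack[ESTSIZE];
static int    EStTop = 0;         /* index of the top entry; == EStBot when empty */
static int    EStBot = 0;         /* current bottom (raised on macro entry) */

/* Reduce "operand operator operand" at the top of the stack, if present. */
static void Reduce(void)
{
    long a, b, r;

    if (EStTop - EStBot < 3)
        return;
    if (EStack[EStTop].Type != OPERAND || EStack[EStTop - 1].Type != OPERATOR
        || EStack[EStTop - 2].Type != OPERAND)
        return;
    a = EStack[EStTop - 2].Value;
    b = EStack[EStTop].Value;
    switch ((char)EStack[EStTop - 1].Value) {
    case '+': r = a + b; break;
    case '-': r = a - b; break;
    case '*': r = a * b; break;
    default:  return;             /* other operators are not handled here */
    }
    EStTop -= 2;                  /* three entries become one operand */
    EStack[EStTop].Value = r;
}

void PshOpd(long v)
{
    assert(EStTop + 1 < ESTSIZE);
    EStack[++EStTop].Type = OPERAND;
    EStack[EStTop].Value = v;
    Reduce();                     /* a full expression may now be present */
}

void PshOpr(char op)
{
    assert(EStTop + 1 < ESTSIZE);
    EStack[++EStTop].Type = OPERATOR;
    EStack[EStTop].Value = op;
}
```

For QA+50 with 10 in q-register A, pushing 10, then '+', then 50 leaves a single operand, 60, on the stack, reduced at the moment 50 is pushed.
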
The expression stack is implemented in the following variables:

        EStack    the stack itself, containing saved operators and operands
        EStTop    index of the top element in EStack
        EStBot    index of the current "bottom" of the stack in EStack

The "bottom" of the expression stack can change because an expression
can include a macro invocation. For example, the command QA+M3=$$ causes
the value of "QA" to be pushed on the expression stack, then the "+" is
pushed, and then the macro contained in q-register 3 is executed. The
macro in q-register 3 returns a value to be used in the expression. When
the macro is entered, a new expression stack "bottom" is established.
This allows the macro to have a "local" expression stack bottom while
maintaining the stack outside the macro.

8.2 Loop Stack

The loop stack contains the loop count and the address of the first
command in the loop. For example, in the command 5<FSMP$mt$>$$, the loop
stack contains the loop count (5) and the address of the first command in
the loop (F). Whenever the end-of-loop character (>) is encountered, the
loop count is decremented. If the loop count is still greater than zero
after it has been decremented, then the command string pointer is reset
to point to the first character in the loop (F).

The loop stack is implemented in the following variables:

        LStack    the stack itself, containing saved counts and addresses
        LStTop    index of the top element in LStack
        LStBot    index of the current "bottom" of the stack in LStack

The loop stack needs a "floating" bottom for the same reason that the
expression stack needs one: macros. Consider the command string
4<Smp$M7$>$$. When the "<" is encountered, the loop count (4) and the
address of the first character in the loop (S) are placed on the loop
stack. Command execution continues, and the "M7" command is encountered.
Suppose that q-register 7 contains the erroneous command string 10>DL>$$.
When the ">" command is encountered in the macro, TECO expects the loop
stack to contain a loop count and an address for the first character in
the loop. In this example, there is no matching "<" command in the macro
which would have set up the loop stack. It would be very bad if TECO were
to think that the loop count was 4 and the first command in the loop was
"S". In this situation, what TECO should do is generate the error message
"BNI > not in iteration". To implement this, the variable LStBot is
adjusted each time a macro is entered or exited. LStBot represents the
bottom of the loop stack for the current macro level.

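Below is a sketch of the loop stack with a floating bottom. Entering a macro saves LStBot and raises it to LStTop, so a stray ">" inside the macro sees an empty stack and reports BNI instead of reusing the caller's loop. LStack, LStTop and LStBot are names from this guide; the entry layout and the functions LpBeg, LpEnd, MacEnt and MacExt are hypothetical, and where TECO-C saves such context on its macro stack, this sketch simply has the caller keep the old bottom.

```c
#include <assert.h>
#include <stdio.h>

#define SUCCESS 1
#define FAILURE 0
#define LSTSIZE 32

typedef struct {
    long        LIndex;           /* remaining loop count */
    const char *LAddr;            /* first command in the loop */
} LStEnt;

static LStEnt LStack[LSTSIZE];
static int    LStTop = 0;         /* index of the top entry; == LStBot when empty */
static int    LStBot = 0;         /* bottom for the current macro level */

/* "<": push the count and the address of the loop's first command. */
int LpBeg(long count, const char *first)
{
    assert(LStTop + 1 < LSTSIZE);
    LStack[++LStTop].LIndex = count;
    LStack[LStTop].LAddr = first;
    return SUCCESS;
}

/* ">": decrement the count; set *next to the loop start if more passes
   remain, or to NULL when the loop is finished. */
int LpEnd(const char **next)
{
    if (LStTop == LStBot) {       /* nothing at this macro level */
        fputs("?BNI > not in iteration\n", stderr);
        return FAILURE;
    }
    if (--LStack[LStTop].LIndex > 0) {
        *next = LStack[LStTop].LAddr;
    } else {
        --LStTop;                 /* loop done: pop it */
        *next = NULL;
    }
    return SUCCESS;
}

/* Entering a macro: the current top becomes the new bottom. */
int MacEnt(void)
{
    int saved = LStBot;
    LStBot = LStTop;
    return saved;
}

/* Leaving a macro: drop any loops it left open (an assumption of this
   sketch) and restore the caller's bottom. */
void MacExt(int saved)
{
    LStTop = LStBot;
    LStBot = saved;
}
```

Replaying the example above: the outer "<" pushes count 4, M7 raises the bottom, the macro's ">" fails with BNI, and after the macro exits the outer ">" still finds its own loop with the count intact.
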



8.3 Macro Stack

The macro stack is used to preserve context each time a macro is
entered. All important values are pushed onto the stack before a macro is
entered and popped off the stack when the macro is exited. The macro stack
is also used by the EI command, which means it's used when executing
initialization files and mung files.
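
Here is a minimal C sketch of the idea, assuming a hypothetical Context structure. The real set of values TECO-C saves is larger, and the field and function names here are illustrative only. The same push/pop pair would also serve EI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical execution context; TECO-C saves more than this. */
typedef struct {
    const char *cmdBeg;   /* start of the command string being executed */
    const char *cmdEnd;   /* end of that command string */
    const char *cmdPtr;   /* next command to execute */
    int         estBot;   /* expression stack bottom */
    int         lstBot;   /* loop stack bottom */
} Context;

#define MSTACK_SIZE 64

static Context MStack[MSTACK_SIZE];
static int     MStTop = 0;            /* number of saved frames */
static Context cur;                   /* the context now executing */

/* Enter a macro whose text is [text, text+len): save the caller's
   context, then execute the macro with fresh stack bottoms set to the
   current stack tops.  Returns -1 if nesting is too deep. */
static int MacroPush(const char *text, size_t len, int estTop, int lstTop)
{
    if (MStTop == MSTACK_SIZE)
        return -1;
    MStack[MStTop++] = cur;
    cur.cmdBeg = text;
    cur.cmdEnd = text + len;
    cur.cmdPtr = text;
    cur.estBot = estTop;              /* caller's entries are off-limits */
    cur.lstBot = lstTop;
    return 0;
}

/* Leave the macro: restore exactly what the caller had. */
static void MacroPop(void)
{
    assert(MStTop > 0);
    cur = MStack[--MStTop];
}
```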



9 HELP

This section discusses on-line HELP, which is available only under
VAX/VMS.

The HELP command is not documented in the TECO manual distributed by
DEC, even though it is supported in TECO-11 and TECO-32. To get help,
simply type "HELP" followed by a carriage return. HELP is the only TECO
command that is not terminated by double escapes.

Help in TECO-C is different from help in TECO-11. In TECO-C,
interactive help mode is entered, so that a user can browse through a help
tree, just as from DCL. In TECO-C, access is provided to only two
libraries: the library specific to TECO-C (pointed to by logical name
TEC$INIT) and the system help library. To get help on TECO-C, just say
"HELP", with or without arguments. To get help from the system library,
say "HELP/S". I find this easier to use than TECO-11's syntax.

The help library for TECO-C is contained in file TECOC.HLB, which is
generated from TECOC.HLP, which is generated from TECOC.RNH. See file
TECOC.RNH for a description of how to do it. This help library is far
broader than the library for TECO-11, but much of it has yet to be filled
in.

The help library is also the repository for verbose error messages,
which are displayed when the help flag (EH) is set to 3. For systems other
than VMS, the ZHelp function displays verbose text contained in static
memory (see file ZHELP.C).
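
For non-VMS systems, a static-memory message table of the kind ZHelp relies on might look like the minimal sketch below. The ErrText table, FormatError, the output layout and the placeholder verbose text are all hypothetical. Testing only the low two bits of EH is an assumption; other EH bits are ignored here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One error message; the table layout is hypothetical. */
typedef struct {
    const char *code;      /* three-letter error code */
    const char *terse;     /* one-line message */
    const char *verbose;   /* longer explanation (placeholder text here) */
} ErrText;

static const ErrText errTable[] = {
    { "BNI", "> not in iteration",
      "(verbose explanation of BNI would go here)" },
};

/* Writes the message for `code` into buf.  The verbose text is appended
   only when the low two bits of the EH flag are 3.  Returns what
   snprintf returns, or -1 if the code is not in the table. */
static int FormatError(int ehFlag, const char *code, char *buf, size_t size)
{
    size_t i;

    for (i = 0; i < sizeof errTable / sizeof errTable[0]; i++) {
        const ErrText *e = &errTable[i];
        if (strcmp(e->code, code) != 0)
            continue;
        if ((ehFlag & 3) == 3)
            return snprintf(buf, size, "?%s   %s\n%s",
                            e->code, e->terse, e->verbose);
        return snprintf(buf, size, "?%s   %s", e->code, e->terse);
    }
    return -1;
}
```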



10 FILE INPUT

TECO has an elegant design that allows high-speed input. There are no
linked list data structures to keep track of, and most file input goes
directly to the end of the edit buffer.

TECO-C takes advantage of this by reading normal file input directly
to the end of the edit buffer. After each input call, nothing needs to be
moved; the pointer to the end of the edit buffer is simply adjusted to
point to the end of the new record. The pointer to the end of the edit
buffer (EBfEnd) serves two purposes: it points to the end of the edit
buffer and to the beginning of the input buffer.
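
Here is a minimal C sketch of reading a record straight into the space after the edit buffer. It assumes a single static block and ignores the edit-buffer gap. Mem, EBfBeg, IBfEnd and AppendLine are hypothetical names; only EBfEnd comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layout: edit text occupies [EBfBeg, EBfEnd), and the free
   space [EBfEnd, IBfEnd) doubles as the input buffer. */
static char  Mem[4096];
static char *EBfBeg = Mem;
static char *EBfEnd = Mem;
static char *IBfEnd = Mem + sizeof Mem;

/* Reads one line from fp straight into the input area.  Nothing is
   copied afterwards: advancing EBfEnd moves the line into the edit
   buffer.  Returns the number of characters appended (0 at end-of-file
   or when the input area is full). */
static size_t AppendLine(FILE *fp)
{
    char *p = EBfEnd;
    int c;

    while (p < IBfEnd && (c = getc(fp)) != EOF) {
        *p++ = (char)c;
        if (c == '\n')
            break;
    }
    size_t n = (size_t)(p - EBfEnd);
    EBfEnd = p;                   /* the record is now part of the edit buffer */
    return n;
}
```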

A side effect of this scheme is the sharing of memory between the edit
buffer and the input buffer. When the edit buffer is empty, it can be made
smaller by shrinking the edit buffer gap in order to make the input buffer
larger. Likewise, if the edit buffer needs to be expanded, the input
buffer can give up space before more memory is actually requested from the
operating system. This is easily achieved by moving the pointer to the
"end-of-the-edit-buffer"/"beginning-of-the-input-buffer".
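
Trading space between the two buffers can be sketched with offsets into one block, which keeps the realloc step simple. Buffers and GrowEditBuffer are hypothetical. The doubling growth policy is an arbitrary choice, and any unread input sitting in the input area is ignored here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout using offsets into one malloc'd block:
   [0, ebfEnd) is edit text; [ebfEnd, size) is the input buffer. */
typedef struct {
    char  *mem;
    size_t ebfEnd;   /* end of edit buffer == start of input buffer */
    size_t size;     /* total bytes in the block */
    int    reallocs; /* times memory was requested from the system */
} Buffers;

/* Claims `need` more bytes for edit text.  If the input buffer has room,
   it simply gives up the space; only otherwise is the block enlarged.
   The caller then fills the newly claimed bytes.  Returns 0 on success,
   -1 if memory is exhausted. */
static int GrowEditBuffer(Buffers *b, size_t need)
{
    if (b->size - b->ebfEnd < need) {
        size_t newSize = b->size * 2 + need;
        char *p = realloc(b->mem, newSize);
        if (p == NULL)
            return -1;
        b->mem = p;
        b->size = newSize;
        b->reallocs++;
    }
    b->ebfEnd += need;            /* boundary moves; input buffer shrinks */
    return 0;
}
```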

This scheme works, but provides no support for the other forms of file
input. The EP and ER$ commands provide a complete secondary input stream
which can be open at the same time as the primary stream (two input files
at once). The EI command reads and executes files containing TECO
commands, and is used to execute the initialization file, if one exists.
The EQq command, if implemented, reads the entire contents of a file
directly into a Q-register.

A second problem arises: on each of the open files, the quantum unit
of input is not standard. For A, Y and P commands, a form feed or
end-of-file "terminates" the read. For n:A commands, a form feed,
end-of-line or end-of-file "terminates" each read. For EI commands, two
escapes or end-of-file "terminate" the read. The input code must "save"
the portion of an input record following a special character and yield the
saved text when the next command for the file is executed.
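
Finding where one read ends can be sketched as a scan over text already read. The names below are hypothetical. LF stands in for end-of-line (line terminators vary by system), and end-of-file is represented by reaching the end of the data. Whatever follows the returned length is the portion that must be saved for the next command.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical read modes: PAGE for A, Y and P; LINE for n:A;
   MACRO for EI. */
typedef enum { MODE_PAGE, MODE_LINE, MODE_MACRO } ReadMode;

#define FF  '\f'
#define LF  '\n'
#define ESC '\033'

/* Returns how many bytes of buf[0..len) make up one read for the given
   mode, counting the terminator, or len if no terminator is present.
   Whether the terminator itself is stored is up to the caller (TECO
   does not store a page-ending form feed in the edit buffer). */
static size_t QuantumLength(ReadMode mode, const char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        char c = buf[i];
        switch (mode) {
        case MODE_PAGE:
            if (c == FF)
                return i + 1;
            break;
        case MODE_LINE:
            if (c == FF || c == LF)
                return i + 1;
            break;
        case MODE_MACRO:
            if (c == ESC && i + 1 < len && buf[i + 1] == ESC)
                return i + 2;
            break;
        }
    }
    return len;
}
```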

The scheme used in TECO-C is to read text from the current input
stream directly to the end of the edit buffer. When the input stream is
switched via an EP or ER$ command, the obvious switching of file
descriptors happens, and any text that's "leftover" from the last read is
explicitly saved elsewhere. Note that this happens VERY rarely, so a
malloc/free is acceptable.
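
Here is a minimal C sketch of saving and restoring leftover text when streams are switched. InStream, SwitchStream and RestoreLeftover are hypothetical. The point is only that the copy happens once per switch, so a plain malloc/free is good enough.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-stream state: the file plus any text read past the
   last terminator that has not yet been given to a command. */
typedef struct {
    FILE  *fp;
    char  *left;      /* malloc'd copy of leftover text, or NULL */
    size_t leftLen;
} InStream;

static InStream streams[2];     /* primary and secondary (EP) streams */
static int      curStream = 0;

/* Switches input to stream `to`.  The unread tail that sits after the
   edit buffer (tail, tailLen) would be overwritten by the next read, so
   it is copied to the heap.  Returns 0 on success, -1 on failure. */
static int SwitchStream(int to, const char *tail, size_t tailLen)
{
    InStream *s = &streams[curStream];

    free(s->left);
    s->left = NULL;
    s->leftLen = 0;
    if (tailLen > 0) {
        s->left = malloc(tailLen);
        if (s->left == NULL)
            return -1;
        memcpy(s->left, tail, tailLen);
        s->leftLen = tailLen;
    }
    curStream = to;
    return 0;
}

/* Copies the new current stream's saved text (if any) back to dst, the
   start of the input area, and frees it.  Returns bytes restored. */
static size_t RestoreLeftover(char *dst)
{
    InStream *s = &streams[curStream];
    size_t n = s->leftLen;

    if (n > 0)
        memcpy(dst, s->left, n);
    free(s->left);
    s->left = NULL;
    s->leftLen = 0;
    return n;
}
```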

For EI and EQq commands, the input memory following the edit buffer is
used as a temporary input buffer. After the file is read, the text is
copied to a Q-register in the case of EQq and to a separate buffer in the
case of EI.
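
Here is a minimal C sketch of the EI/EQq path. The whole file is read into the free area after the edit buffer, then copied into its own block. ReadWholeFile is hypothetical. It simply gives up when the file does not fit, where a real implementation would obtain more memory.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads all of fp into the free area [area, area+cap) that follows the
   edit buffer, without moving the end-of-edit-buffer pointer, then
   copies the text into a new heap block (standing in for a Q-register
   or the EI buffer).  Returns the block and sets *lenOut, or returns
   NULL on a read error, allocation failure, or if the file does not
   fit. */
static char *ReadWholeFile(FILE *fp, char *area, size_t cap, size_t *lenOut)
{
    size_t n = fread(area, 1, cap, fp);

    if (ferror(fp))
        return NULL;
    if (n == cap && getc(fp) != EOF)
        return NULL;              /* file is larger than the free area */
    char *copy = malloc(n > 0 ? n : 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, area, n);
    *lenOut = n;
    return copy;
}
```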



11 VIDEO

As of 18-Feb-1991, TECO-C supports video only under Unix. The code
was written by Mark Henderson, using the CURSES package. See file
VIDEO.TXT for a discussion of how it works.



12 PORTABILITY

TECO-C was written with portability in mind. The first development
machine was "minimal": a SAGE IV (68000) running CP/M-68k. In that
environment, there was no "make" utility.

Initially, the system-independent code (files that don't start with a
"Z") had absolutely no calls to standard C runtime functions. This was
because I had several problems with the "standard" functions not being
"standard" on different machines. With the onset of ANSI C I've grown less
timid, but the main code still references almost no standard functions.
This is less of a limitation than you might think: TECO-C doesn't use
null-terminated strings. It also doesn't use unions, floating point or bit
fields.
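
As an illustration of working without null-terminated strings (my sketch, not code from TECO-C), a counted string carries its own length, and comparing two of them needs no runtime call. The CStr type and CStrEqual are hypothetical. Text containing NUL characters also compares correctly, which strcmp cannot do.

```c
#include <assert.h>
#include <stddef.h>

/* A counted string: pointer plus length, no terminating NUL. */
typedef struct {
    const char *ptr;
    size_t      len;
} CStr;

/* Compares two counted strings without any C runtime call; returns 1 if
   they are equal, 0 otherwise.  Embedded NULs are ordinary data. */
static int CStrEqual(CStr a, CStr b)
{
    size_t i;

    if (a.len != b.len)
        return 0;
    for (i = 0; i < a.len; i++)
        if (a.ptr[i] != b.ptr[i])
            return 0;
    return 1;
}
```

Here strcmp would report "ab\0c" and "ab\0d" as equal, because it stops at the first NUL. CStrEqual does not.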



13 PORTING TO A NEW ENVIRONMENT


1.  Move the source code to the target machine.

2.  Inspect file ZPORT.H. You need to select the compiler you want
    the code compiled for. For instance, if you are porting to a Unix
    system, then fix ZPORT.H so that the unix identifier is defined
    (it is usually defined by default by the compiler). If your
    compiler is nothing like anything supported by ZPORT.H, then set
    the UNKNOWN identifier.

3.  Compile and link. See file AAREADME.TXT for descriptions of how
    TECO-C is built in supported environments, and steal like mad.
    The problem here is that you need a "Z" file for your environment,
    containing all the "Z" functions needed by TECO-C. The easiest
    thing to do is copy ZUNKN.C to your own "Z" file and link against
    that. For instance, if I ever port TECO-C to a Macintosh, I'll
    copy ZUNKN.C to ZMAC.C.

4.  Fix things so the compile/link is successful. If you have
    compiled with UNKNOWN set, you should get an executable file that
    displays a message and dies when the first system-dependent
    function is called. The strategy is to fix that function (often
    by stealing from the code for other operating systems), relink and
    deal with the next message until you have something that works.
    Functions should be implemented in roughly the following order:
    ZInit, ZTrmnl, ZExit, ZDspCh, ZAlloc, ZRaloc, ZFree, ZChin. This
    will give you a TECO with everything but file I/O. You can run
    it, add text to the edit buffer, delete text, search, use
    expressions and the = command (a calculator). Then do file
    input: ZOpInp, ZRdLin, ZIClos. Then do file output: ZOpout,
    ZWrBfr, ZOClos, ZOClDe. Use the test macros (*tst*.tec) to test
    how everything works (see Testing).
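
The stub strategy in step 4 can be sketched as follows. ZNotYet is a hypothetical helper. The ZAlloc, ZFree and ZOpInp signatures are simplified guesses for illustration, not the ones in ZUNKN.C or the other "Z" files.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Every system-dependent routine that has not been written yet calls
   this, so the first one the program reaches announces itself and the
   program stops. */
static void ZNotYet(const char *name)
{
    fprintf(stderr, "TECO-C port: %s is not implemented yet\n", name);
    exit(EXIT_FAILURE);
}

/* Already ported: memory allocation, done with the C library. */
static void *ZAlloc(size_t size) { return malloc(size); }
static void  ZFree(void *p)      { free(p); }

/* Not ported yet: fails loudly when first called. */
static int ZOpInp(const char *name)
{
    (void)name;
    ZNotYet("ZOpInp");
    return -1;                    /* not reached */
}
```

Each pass through the fix-relink cycle replaces one such stub with a real implementation, in the order given in step 4.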



14 TESTING

Testing of TECO-C is performed by executing macros. The macros are
contained in files named TSTxxx.TEC, where xxx is some kind of indication
of what is tested. For instance, TSTQR.TEC tests q-registers. The test
macros do not test all the functions provided by TECO. They were
originally used to verify that TECO-C performs exactly the same as TECO-11
under the VMS operating system. When I needed to test a chunk of code, I
sometimes did it the right way and wrote a macro.



15 DEBUGGING

A debugging system (very ugly, very useful) is imbedded within the
code. It is conditionally compiled into the code by turning on or off an
identifier (DEBUGGING) defined in the TECOC.H file. When debugging code is
compiled in, you can access it using the ^P command, which is not used by
regular TECO. The ^P command with no argument will display help about how
to use ^P.

If you are working under VMS, it sometimes helps to compare the
execution of TECO-C with TECO-11. Put a test command string into a file.
Use DEFINE/USER_MODE to redirect the output of TECO-C to a file and execute
the macro with TECO-C. Then do the same thing with TECO-11. Use the
DIFFERENCES command to compare the two output files. They should be 100
percent identical.