#	@(#)TOUR	8.1 (Berkeley) 5/31/93
# $FreeBSD$

NOTE -- This is the original TOUR paper distributed with ash and
does not represent the current state of the shell.  It is included
anyway because it gives helpful information about how the shell is
structured, but be warned that things have changed -- the current
shell is still under development.

================================================================

                       A Tour through Ash

               Copyright 1989 by Kenneth Almquist.


DIRECTORIES:  The subdirectory bltin contains commands which can
be compiled stand-alone.  The rest of the source is in the main
ash directory.

SOURCE CODE GENERATORS:  Files whose names begin with "mk" are
programs that generate source code.  A complete list of these
programs is:

        program         input files         generates
        -------         -----------         ---------
        mkbuiltins      builtins            builtins.h builtins.c
        mknodes         nodetypes           nodes.h nodes.c
        mksyntax            -               syntax.h syntax.c
        mktokens            -               token.h

There are undoubtedly too many of these.

EXCEPTIONS:  Code for dealing with exceptions appears in
exceptions.c.  The C language doesn't include exception handling,
so I implement it using setjmp and longjmp.  The global variable
exception contains the type of exception.  EXERROR is raised by
calling error.  EXINT is an interrupt.
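
The setjmp/longjmp technique can be sketched as follows.  This is a
minimal illustration, not the ash source:  the numeric values of
EXINT and EXERROR are made up, and jmploc, handler, exraise,
try_call, fails and succeeds are names assumed for the example.

```c
#include <setjmp.h>
#include <stddef.h>

#define EXINT   0               /* interrupt (value assumed) */
#define EXERROR 1               /* raised by calling error (value assumed) */

struct jmploc {
        jmp_buf loc;
};

static struct jmploc *handler;  /* innermost active handler, if any */
static volatile int exception;  /* type of the most recent exception */

/* Record the exception type and jump to the innermost handler.
 * The sketch assumes a handler is installed. */
static void
exraise(int e)
{
        exception = e;
        longjmp(handler->loc, 1);
}

/* Run fn under a fresh handler.  Returns -1 if fn returns normally,
 * otherwise the type of the exception that fn raised. */
static int
try_call(void (*fn)(void))
{
        struct jmploc jmploc;
        struct jmploc *saved = handler;

        if (setjmp(jmploc.loc)) {
                handler = saved;        /* pop our handler */
                return exception;
        }
        handler = &jmploc;
        fn();
        handler = saved;
        return -1;
}

static void fails(void) { exraise(EXERROR); }
static void succeeds(void) { }
```

Calling try_call(fails) returns EXERROR, and try_call(succeeds)
returns -1.  Handlers nest because each call saves and restores the
previous value of handler.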

INTERRUPTS:  In an interactive shell, an interrupt will cause an
EXINT exception to return to the main command loop.  (Exception:
EXINT is not raised if the user traps interrupts using the trap
command.)  The INTOFF and INTON macros (defined in exception.h)
provide uninterruptible critical sections.  Between the execution
of INTOFF and the execution of INTON, interrupt signals will be
held for later delivery.  INTOFF and INTON can be nested.
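
One way to get nestable critical sections is a depth counter plus a
pending flag.  The sketch below assumes that design; the variables
suppressint, intpending and delivered and the functions deliver and
onint_sketch are hypothetical, and deliver merely counts where a
real shell would raise EXINT.

```c
#include <signal.h>

static volatile sig_atomic_t suppressint;  /* nesting depth of INTOFF */
static volatile sig_atomic_t intpending;   /* interrupt arrived while held */
static volatile sig_atomic_t delivered;    /* interrupts acted upon so far */

/* Stand-in for raising EXINT; a real shell would longjmp from here. */
static void
deliver(void)
{
        intpending = 0;
        delivered++;
}

#define INTOFF  (suppressint++)
#define INTON \
        do { \
                if (--suppressint == 0 && intpending) \
                        deliver(); \
        } while (0)

/* What the interrupt handler does:  act now, or hold for later. */
static void
onint_sketch(void)
{
        if (suppressint)
                intpending = 1;
        else
                deliver();
}
```

An interrupt that arrives inside two nested INTOFF sections is
delivered only when the second INTON brings the depth back to zero.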

MEMALLOC.C:  Memalloc.c defines versions of malloc and realloc
which call error when there is no memory left.  It also defines a
stack-oriented memory allocation scheme.  Allocating off a stack
is probably more efficient than allocation using malloc, but the
big advantage is that when an exception occurs all we have to do
to free up the memory in use at the time of the exception is to
restore the stack pointer.  The stack is implemented using a
linked list of blocks.
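
A stack built from a linked list of blocks might look like the
sketch below.  It is an illustration under simplifying assumptions
(fixed-size blocks, no alignment handling, abort in place of error);
every name in it is hypothetical.

```c
#include <stdlib.h>

#define BLOCKSIZE 64            /* tiny, so the example spans blocks */

struct block {
        struct block *prev;     /* block allocated before this one */
        size_t used;            /* bytes handed out from space[] */
        char space[BLOCKSIZE];
};

struct mark {                   /* a saved "stack pointer" */
        struct block *blk;
        size_t used;
};

static struct block *top;       /* current block, NULL if stack is empty */

/* Allocate n bytes (n <= BLOCKSIZE) from the top of the stack. */
static void *
stalloc_sketch(size_t n)
{
        if (n > BLOCKSIZE)
                return NULL;
        if (top == NULL || top->used + n > BLOCKSIZE) {
                struct block *b = malloc(sizeof(*b));

                if (b == NULL)
                        abort();        /* ash would call error here */
                b->prev = top;
                b->used = 0;
                top = b;
        }
        top->used += n;
        return top->space + top->used - n;
}

/* Remember the current top of the stack. */
static void
setmark_sketch(struct mark *m)
{
        m->blk = top;
        m->used = top != NULL ? top->used : 0;
}

/* Free everything allocated since the mark was taken. */
static void
popmark_sketch(const struct mark *m)
{
        while (top != m->blk) {
                struct block *b = top;

                top = b->prev;
                free(b);
        }
        if (top != NULL)
                top->used = m->used;
}

/* Count the blocks currently on the stack. */
static int
nblocks(void)
{
        int n = 0;

        for (struct block *b = top; b != NULL; b = b->prev)
                n++;
        return n;
}
```

An exception handler only needs the mark taken before the failed
operation:  popmark_sketch releases every later allocation without
knowing what any of them were.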

STPUTC:  If the stack were contiguous, it would be easy to store
strings on the stack without knowing in advance how long the
string was going to be:
        p = stackptr;
        *p++ = c;       /* repeated as many times as needed */
        stackptr = p;
The following three macros (defined in memalloc.h) perform these
operations, but grow the stack if you run off the end:
        STARTSTACKSTR(p);
        STPUTC(c, p);   /* repeated as many times as needed */
        grabstackstr(p);
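
The grow-on-demand behaviour can be imitated with a single buffer
that is reallocated when it fills up.  This is a sketch only:  it
uses realloc where ash uses its block stack, and the helpers whose
names end in _sketch, the variables stackbuf and stacksize, and the
function collect are all assumptions made for the example.

```c
#include <stdlib.h>
#include <string.h>

static char *stackbuf;          /* start of the string being built */
static size_t stacksize;        /* bytes available at stackbuf */

/* Make sure there is some space and return its start. */
static char *
startstackstr_sketch(void)
{
        if (stackbuf == NULL) {
                stacksize = 8;
                stackbuf = malloc(stacksize);
                if (stackbuf == NULL)
                        abort();
        }
        return stackbuf;
}

/* Double the space and return p relocated into the new space. */
static char *
growstackstr_sketch(char *p)
{
        size_t len = (size_t)(p - stackbuf);
        char *nb = realloc(stackbuf, stacksize * 2);

        if (nb == NULL)
                abort();
        stackbuf = nb;
        stacksize *= 2;
        return nb + len;
}

/* Claim the finished string; the next string gets fresh space. */
static char *
grabstackstr_sketch(char *p)
{
        char *s = stackbuf;

        (void)p;
        stackbuf = NULL;
        stacksize = 0;
        return s;
}

#define STARTSTACKSTR(p)  ((p) = startstackstr_sketch())
#define STPUTC(c, p) \
        do { \
                if ((size_t)((p) - stackbuf) == stacksize) \
                        (p) = growstackstr_sketch(p); \
                *(p)++ = (c); \
        } while (0)

/* Copy s, terminator included, one character at a time.
 * In this sketch the caller frees the result. */
static char *
collect(const char *s)
{
        size_t i, n = strlen(s);
        char *p;

        STARTSTACKSTR(p);
        for (i = 0; i <= n; i++)
                STPUTC(s[i], p);
        return grabstackstr_sketch(p);
}
```

The caller never checks for space; STPUTC replaces p with a pointer
into the enlarged buffer whenever the old one is full.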

We now start a top-down look at the code:

MAIN.C:  The main routine performs some initialization, executes
the user's profile if necessary, and calls cmdloop.  Cmdloop
repeatedly parses and executes commands.

OPTIONS.C:  This file contains the option processing code.  It is
called from main to parse the shell arguments when the shell is
invoked, and it also contains the set builtin.  The -i and -m
options (the latter turns on job control) require changes in
signal handling.  The routines setjobctl (in jobs.c) and
setinteractive (in trap.c) are called to handle changes to these
options.

PARSING:  The parser code is all in parser.c.  A recursive
descent parser is used.  Syntax tables (generated by mksyntax) are
used to classify characters during lexical analysis.  There are
four tables:  one for normal use, one for use when inside single
quotes and dollar single quotes, one for use when inside double
quotes and one for use in arithmetic.  The tables are machine
dependent because they are indexed by character variables and
the range of a char varies from machine to machine.
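
The machine dependence comes from char being signed on some machines
and unsigned on others.  The sketch below shows one way a table can
be indexed by a char safely, by offsetting the index with CHAR_MIN.
The class names, the two tables shown and the functions are
assumptions for illustration; the real tables are generated by
mksyntax.

```c
#include <limits.h>

/* Hypothetical character classes. */
enum { CWORD, CNL, CSPCL, CSQUOTE, CDQUOTE, CVAR };

#define SYNBASE  (-CHAR_MIN)            /* makes every index >= 0 */
#define TABSIZE  (CHAR_MAX - CHAR_MIN + 1)

static unsigned char basesyntax[TABSIZE];       /* normal use */
static unsigned char sqsyntax[TABSIZE];         /* inside single quotes */

/* Fill in the entries that differ from CWORD, which is 0 and so is
 * already the value of every entry in a static array. */
static void
init_syntax(void)
{
        basesyntax[SYNBASE + '\n'] = CNL;
        basesyntax[SYNBASE + ';'] = CSPCL;
        basesyntax[SYNBASE + '\''] = CSQUOTE;
        basesyntax[SYNBASE + '"'] = CDQUOTE;
        basesyntax[SYNBASE + '$'] = CVAR;

        /* Inside single quotes only the closing quote is special. */
        sqsyntax[SYNBASE + '\''] = CSQUOTE;
}

/* Look up the class of c in the given table. */
static int
classify(const unsigned char *table, char c)
{
        return table[SYNBASE + c];
}
```

The lexer switches tables as it enters and leaves quotes, so the same
character can be special in one context and ordinary in another.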

PARSE OUTPUT:  The output of the parser consists of a tree of
nodes.  The various types of nodes are defined in the file
nodetypes.

Nodes of type NARG are used to represent both words and the
contents of here documents.  An early version of ash kept the
contents of here documents in temporary files, but keeping here
documents in memory typically results in significantly better
performance.  It would have been nice to make it an option to use
temporary files for here documents, for the benefit of small
machines, but the code to keep track of when to delete the
temporary files was complex and I never fixed all the bugs in it.
(AT&T has been maintaining the Bourne shell for more than ten
years, and to the best of my knowledge they still haven't gotten
it to handle temporary files correctly in obscure cases.)

The text field of a NARG structure points to the text of the
word.  The text consists of ordinary characters and a number of
special codes defined in parser.h.  The special codes are:

        CTLVAR              Variable substitution
        CTLENDVAR           End of variable substitution
        CTLBACKQ            Command substitution
        CTLBACKQ|CTLQUOTE   Command substitution inside double quotes
        CTLESC              Escape next character

A variable substitution contains the following elements:

        CTLVAR type name '=' [ alternative-text CTLENDVAR ]

The type field is a single character specifying the type of
substitution.  The possible types are:

        VSNORMAL            $var
        VSMINUS             ${var-text}
        VSMINUS|VSNUL       ${var:-text}
        VSPLUS              ${var+text}
        VSPLUS|VSNUL        ${var:+text}
        VSQUESTION          ${var?text}
        VSQUESTION|VSNUL    ${var:?text}
        VSASSIGN            ${var=text}
        VSASSIGN|VSNUL      ${var:=text}

In addition, the type field will have the VSQUOTE flag set if the
variable is enclosed in double quotes.  The name of the variable
comes next, terminated by an equals sign.  If the type is not
VSNORMAL, then the text field in the substitution follows,
terminated by a CTLENDVAR byte.
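
The layout just described can be decoded with a few lines of C.  The
sketch below is illustrative only:  the byte values given to the CTL
and VS constants are assumptions (the real ones are in parser.h), the
VSTYPE mask, struct varsub and decode_var are hypothetical, and
nested substitutions and CTLESC bytes inside the alternative text are
ignored.

```c
#include <string.h>

/* Assumed values, for the sketch only. */
#define CTLVAR     '\202'
#define CTLENDVAR  '\203'
#define VSTYPE     0x0f         /* mask for the basic type */
#define VSNUL      0x10         /* the ':' variants */
#define VSQUOTE    0x80         /* inside double quotes */
#define VSNORMAL   1
#define VSMINUS    2

struct varsub {
        int type;               /* VS* value, flags included */
        char name[32];
        char alt[64];           /* alternative text, "" if none */
};

/* Decode one substitution starting at s, which must point at a
 * CTLVAR byte.  Returns a pointer just past the substitution, or
 * NULL if the bytes do not match the layout or do not fit in v. */
static const char *
decode_var(const char *s, struct varsub *v)
{
        const char *eq, *end;

        if (*s++ != CTLVAR || *s == '\0')
                return NULL;
        v->type = (unsigned char)*s++;

        eq = strchr(s, '=');
        if (eq == NULL || (size_t)(eq - s) >= sizeof(v->name))
                return NULL;
        memcpy(v->name, s, (size_t)(eq - s));
        v->name[eq - s] = '\0';
        s = eq + 1;

        v->alt[0] = '\0';
        if ((v->type & VSTYPE) == VSNORMAL)
                return s;

        end = strchr(s, CTLENDVAR);
        if (end == NULL || (size_t)(end - s) >= sizeof(v->alt))
                return NULL;
        memcpy(v->alt, s, (size_t)(end - s));
        v->alt[end - s] = '\0';
        return end + 1;
}
```

With these assumed values, ${HOME:-/tmp} would be stored as the bytes
CTLVAR, VSMINUS|VSNUL, "HOME=", "/tmp", CTLENDVAR, and decode_var
recovers the type, the name and the alternative text from them.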

Commands in back quotes are parsed and stored in a linked list.
The locations of these commands in the string are indicated by
CTLBACKQ and CTLBACKQ|CTLQUOTE characters, depending upon whether
the back quotes were enclosed in double quotes.

The character CTLESC escapes the next character, so that in case
any of the CTL characters mentioned above appear in the input,
they can be passed through transparently.  CTLESC is also used to
escape '*', '?', '[', and '!' characters which were quoted by the
user and thus should not be used for file name generation.

CTLESC characters have proved to be particularly tricky to get
right.  In the case of here documents which are not subject to
variable and command substitution, the parser doesn't insert any
CTLESC characters to begin with (so the contents of the text
field can be written without any processing).  Other here
documents, and words which are not subject to splitting and file
name generation, have the CTLESC characters removed during the
variable and command substitution phase.  Words which are subject
to splitting and file name generation have the CTLESC characters
removed as part of the file name phase.
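
Removing the escapes is a single in-place pass.  The sketch below
shows the idea; the value of CTLESC and the name rmescapes_sketch are
assumptions made for the example.

```c
#define CTLESC '\201'           /* assumed value, for the sketch only */

/* Remove CTLESC bytes in place, keeping the character each one
 * escapes.  A CTLESC that escapes another CTLESC leaves one literal
 * CTLESC behind. */
static void
rmescapes_sketch(char *s)
{
        char *q = s;

        while (*s != '\0') {
                if (*s == CTLESC && s[1] != '\0')
                        s++;            /* skip the escape itself */
                *q++ = *s++;
        }
        *q = '\0';
}
```

Because the result is never longer than the input, no extra storage
is needed.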

EXECUTION:  Command execution is handled by the following files:
        eval.c     The top level routines.
        redir.c    Code to handle redirection of input and output.
        jobs.c     Code to handle forking, waiting, and job control.
        exec.c     Code to do path searches and the actual exec system call.
        expand.c   Code to evaluate arguments.
        var.c      Maintains the variable symbol table.  Called from expand.c.

EVAL.C:  Evaltree recursively executes a parse tree.  The exit
status is returned in the global variable exitstatus.  The
alternative entry evalbackcmd is called to evaluate commands in
back quotes.  It saves the result in memory if the command is a
builtin; otherwise it forks off a child to execute the command
and connects the standard output of the child to a pipe.

JOBS.C:  To create a process, you call makejob to return a job
structure, and then call forkshell (passing the job structure as
an argument) to create the process.  Waitforjob waits for a job
to complete.  These routines take care of process groups if job
control is defined.

REDIR.C:  Ash allows file descriptors to be redirected and then
restored without forking off a child process.  This is
accomplished by duplicating the original file descriptors.  The
redirtab structure records where the file descriptors have been
duplicated to.
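
The save-and-restore idea can be sketched with fcntl and dup2 on a
POSIX system.  This is not the ash code:  redirect_sketch and
restore_sketch are hypothetical, the choice of descriptor 10 as the
lowest number for saved copies is an assumption, and a single int
stands in for the redirtab structure.

```c
#include <fcntl.h>
#include <unistd.h>

/* Redirect fd to the file at path.  Returns a duplicate of the
 * original descriptor for later restoration, or -1 on failure. */
static int
redirect_sketch(int fd, const char *path)
{
        int saved, newfd;

        /* Copy the original out of the way, at 10 or above. */
        saved = fcntl(fd, F_DUPFD, 10);
        if (saved < 0)
                return -1;
        newfd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
        if (newfd < 0) {
                close(saved);
                return -1;
        }
        if (newfd != fd) {
                dup2(newfd, fd);
                close(newfd);
        }
        return saved;
}

/* Put the saved copy back in place of fd and discard the copy. */
static void
restore_sketch(int fd, int saved)
{
        dup2(saved, fd);
        close(saved);
}
```

A builtin whose output is redirected can then run inside the shell
process:  redirect, run the builtin, restore.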

EXEC.C:  The routine find_command locates a command, and enters
the command in the hash table if it is not already there.  The
third argument specifies whether it is to print an error message
if the command is not found.  (When a pipeline is set up,
find_command is called for all the commands in the pipeline
before any forking is done, so as to get the commands into the
hash table of the parent process.  But to make command hashing as
transparent as possible, we silently ignore errors at that point
and only print error messages if the command cannot be found
later.)

The routine shellexec is the interface to the exec system call.

EXPAND.C:  Arguments are processed in three passes.  The first
(performed by the routine argstr) performs variable and command
substitution.  The second (ifsbreakup) performs word splitting
and the third (expandmeta) performs file name generation.

VAR.C:  Variables are stored in a hash table.  Probably we should
switch to extensible hashing.  The variable name is stored in the
same string as the value (using the format "name=value") so that
no string copying is needed to create the environment of a
command.  Variables which the shell references internally are
preallocated so that the shell can reference the values of these
variables without doing a lookup.
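
The benefit of storing "name=value" strings shows up when the
environment is built:  the array handed to exec can point straight at
the table's strings.  The sketch below assumes a simple chained hash
table; the table size, the hash function and all the names are
hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define VTABSIZE 39

struct var {
        struct var *next;       /* next entry in the hash chain */
        char *text;             /* "name=value" */
};

static struct var *vartab[VTABSIZE];

/* Hash the name part of s (up to '=' or the end of the string). */
static unsigned
hashvar_sketch(const char *s)
{
        unsigned h = 0;

        while (*s != '\0' && *s != '=')
                h = h * 31 + (unsigned char)*s++;
        return h % VTABSIZE;
}

/* Does entry ("name=value") have the name found at the start of s? */
static int
varequal_sketch(const char *entry, const char *s)
{
        while (*entry == *s && *s != '=' && *s != '\0') {
                entry++;
                s++;
        }
        return *entry == '=' && (*s == '=' || *s == '\0');
}

/* Set a variable from a "name=value" string; the table keeps a copy. */
static void
setvareq_sketch(const char *s)
{
        struct var **vpp = &vartab[hashvar_sketch(s)];
        struct var *vp;
        char *copy = malloc(strlen(s) + 1);

        if (copy == NULL)
                abort();
        strcpy(copy, s);
        for (vp = *vpp; vp != NULL; vp = vp->next) {
                if (varequal_sketch(vp->text, s)) {
                        free(vp->text);
                        vp->text = copy;
                        return;
                }
        }
        vp = malloc(sizeof(*vp));
        if (vp == NULL)
                abort();
        vp->text = copy;
        vp->next = *vpp;
        *vpp = vp;
}

/* Return the value of name, or NULL if it is not set. */
static const char *
lookupvar_sketch(const char *name)
{
        struct var *vp;

        for (vp = vartab[hashvar_sketch(name)]; vp != NULL; vp = vp->next)
                if (varequal_sketch(vp->text, name))
                        return strchr(vp->text, '=') + 1;
        return NULL;
}

/* Build a NULL-terminated array pointing at the stored strings.
 * Only the array is allocated; no string is copied. */
static char **
environment_sketch(void)
{
        size_t n = 0, i;
        struct var *vp;
        char **env, **ep;

        for (i = 0; i < VTABSIZE; i++)
                for (vp = vartab[i]; vp != NULL; vp = vp->next)
                        n++;
        ep = env = malloc((n + 1) * sizeof(*env));
        if (env == NULL)
                abort();
        for (i = 0; i < VTABSIZE; i++)
                for (vp = vartab[i]; vp != NULL; vp = vp->next)
                        *ep++ = vp->text;
        *ep = NULL;
        return env;
}
```

Setting the same name twice replaces the old string, so the array
never contains duplicates.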

When a program is run, the code in eval.c sticks any environment
variables which precede the command (as in "PATH=xxx command") in
the variable table as the simplest way to strip duplicates, and
then calls "environment" to get the value of the environment.

BUILTIN COMMANDS:  The procedures for handling these are
scattered throughout the code, depending on which location
appears most appropriate.  They can be recognized because their
names always end in "cmd".  The mapping from names to procedures
is specified in the file builtins, which is processed by the
mkbuiltins command.

A builtin command is invoked with argc and argv set up like a
normal program.  A builtin command is allowed to overwrite its
arguments.  Builtin routines can call nextopt to do option
parsing.  This is kind of like getopt, but you don't pass argc
and argv to it.  Builtin routines can also call error.  This
routine normally terminates the shell (or returns to the main
command loop if the shell is interactive), but when called from a
builtin command it causes the builtin command to terminate with
an exit status of 2.

The directory bltin contains commands which can be compiled
independently but can also be built into the shell for efficiency
reasons.  The makefile in this directory compiles these programs
in the normal fashion (so that they can be run regardless of
whether the invoker is ash), but also creates a library named
bltinlib.a which can be linked with ash.  The header file bltin.h
takes care of most of the differences between the ash and the
stand-alone environment.  The user should call the main routine
"main", and #define main to be the name of the routine to use
when the program is linked into ash.  This #define should appear
before bltin.h is included; bltin.h will #undef main if the
program is to be compiled stand-alone.

CD.C:  This file defines the cd and pwd builtins.

SIGNALS:  Trap.c implements the trap command.  The routine
setsignal figures out what action should be taken when a signal
is received and invokes the signal system call to set the signal
action appropriately.  When a signal that a user has set a trap
for is caught, the routine "onsig" sets a flag.  The routine
dotrap is called at appropriate points to actually handle the
signal.  When an interrupt is caught and no trap has been set for
that signal, the routine "onint" in error.c is called.
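
The split between catching a signal and acting on it can be sketched
as below.  The handler only records that the signal arrived; the
action runs later from ordinary code.  MAXSIG, the variables and the
_sketch functions are hypothetical, and the trap action is reduced to
bumping a counter.

```c
#include <signal.h>

#define MAXSIG 64               /* assumed upper bound, for the sketch */

static volatile sig_atomic_t gotsig[MAXSIG];    /* per-signal flags */
static volatile sig_atomic_t pendingsigs;       /* any flag set? */
static int handled;             /* number of trap actions run */
static int lasttrap;            /* signal of the last trap run */

/* Signal handler:  note the signal and return at once. */
static void
onsig_sketch(int signo)
{
        if (signo > 0 && signo < MAXSIG)
                gotsig[signo] = 1;
        pendingsigs = 1;
}

/* Called from the main flow of control at points where it is safe
 * to run the user's trap actions. */
static void
dotrap_sketch(void)
{
        int i;

        if (!pendingsigs)
                return;
        pendingsigs = 0;
        for (i = 1; i < MAXSIG; i++) {
                if (gotsig[i]) {
                        gotsig[i] = 0;
                        lasttrap = i;
                        handled++;      /* the trap action would run here */
                }
        }
}
```

Keeping the handler this small means the trap action never runs in
the middle of an operation that was interrupted by the signal.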

OUTPUT:  Ash uses its own output routines.  There are three
output structures allocated.  "Output" represents the standard
output, "errout" the standard error, and "memout" contains output
which is to be stored in memory.  This last is used when a
builtin command appears in backquotes, to allow its output to be
collected without doing any I/O through the UNIX operating
system.  The variables out1 and out2 normally point to output and
errout, respectively, but they are set to point to memout when
appropriate inside backquotes.

INPUT:  The basic input routine is pgetc, which reads from the
current input file.  There is a stack of input files; the current
input file is the top file on this stack.  The code allows the
input to come from a string rather than a file.  (This is for the
-c option and the "." and eval builtin commands.)  The global
variable plinno is saved and restored when files are pushed and
popped from the stack.  The parser routines store the number of
the current line in this variable.

DEBUGGING:  If DEBUG is defined in shell.h, then the shell will
write debugging information to the file $HOME/trace.  Most of
this is done using the TRACE macro, which takes a set of printf
arguments inside two sets of parentheses.  Example:
"TRACE(("n=%d\n", n))".  The double parentheses are necessary
because the preprocessor can't handle macros with a variable
number of arguments.  Defining DEBUG also causes the shell to
generate a core dump if it is sent a quit signal.  The tracing
code is in show.c.
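
The double-parenthesis trick works because the macro takes exactly
one argument, namely the whole parenthesized argument list, and
pastes it after a function name.  A minimal sketch, in which
trace_sketch and tracefile are hypothetical names and DEBUG is
defined in place so that the example traces:

```c
#include <stdarg.h>
#include <stdio.h>

#define DEBUG 1                 /* defined here so the sketch traces */

static FILE *tracefile;         /* where trace output goes, if open */

/* printf-style output to the trace file. */
static void
trace_sketch(const char *fmt, ...)
{
        va_list ap;

        if (tracefile == NULL)
                return;
        va_start(ap, fmt);
        vfprintf(tracefile, fmt, ap);
        va_end(ap);
        fflush(tracefile);
}

#ifdef DEBUG
#define TRACE(args)  trace_sketch args  /* args carries its own parentheses */
#else
#define TRACE(args)                     /* compiles to nothing */
#endif
```

With DEBUG defined, TRACE(("n=%d\n", n)) expands to
trace_sketch("n=%d\n", n); without it the call disappears entirely.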