# @(#)TOUR 8.1 (Berkeley) 5/31/93
# $FreeBSD$

NOTE -- This is the original TOUR paper distributed with ash and
does not represent the current state of the shell. It is included
anyway because it gives helpful information about how the shell is
structured, but be warned that things have changed -- the current
shell is still under development.

================================================================

                       A Tour through Ash

               Copyright 1989 by Kenneth Almquist.


DIRECTORIES: The subdirectory bltin contains commands which can
be compiled stand-alone. The rest of the source is in the main
ash directory.

SOURCE CODE GENERATORS: Files whose names begin with "mk" are
programs that generate source code. A complete list of these
programs is:

        program         input files     generates
        -------         -----------     ---------
        mkbuiltins      builtins        builtins.h builtins.c
        mknodes         nodetypes       nodes.h nodes.c
        mksyntax        -               syntax.h syntax.c
        mktokens        -               token.h

There are undoubtedly too many of these.

EXCEPTIONS: Code for dealing with exceptions appears in error.c.
The C language doesn't include exception handling, so I implement
it using setjmp and longjmp. The global variable exception
contains the type of exception. EXERROR is raised by calling
error. EXINT is an interrupt.

INTERRUPTS: In an interactive shell, an interrupt will cause an
EXINT exception to return to the main command loop.
(Exception: EXINT is not raised if the user traps interrupts
using the trap command.) The INTOFF and INTON macros (defined in
error.h) provide uninterruptible critical sections. Between the
execution of INTOFF and the execution of INTON, interrupt signals
will be held for later delivery. INTOFF and INTON can be nested.

MEMALLOC.C: Memalloc.c defines versions of malloc and realloc
which call error when there is no memory left. It also defines a
stack-oriented memory allocation scheme. Allocating from a stack
is probably more efficient than allocating with malloc, but the
big advantage is that when an exception occurs, all we have to do
to free the memory in use at the time of the exception is to
restore the stack pointer. The stack is implemented using a
linked list of blocks.

STPUTC: If the stack were contiguous, it would be easy to store
strings on the stack without knowing in advance how long the
string was going to be:
        p = stackptr;
        *p++ = c;       /* repeated as many times as needed */
        stackptr = p;
The following three macros (defined in memalloc.h) perform these
operations, but grow the stack if you run off the end:
        STARTSTACKSTR(p);
        STPUTC(c, p);   /* repeated as many times as needed */
        grabstackstr(p);

We now start a top-down look at the code:

MAIN.C: The main routine performs some initialization, executes
the user's profile if necessary, and calls cmdloop. Cmdloop
repeatedly parses and executes commands.

OPTIONS.C: This file contains the option processing code.
It is called from main to parse the shell arguments when the
shell is invoked, and it also contains the set builtin. The -i
and -m options (the latter turns on job control) require changes
in signal handling. The routines setjobctl (in jobs.c) and
setinteractive (in trap.c) are called to handle changes to these
options.

PARSING: The parser code is all in parser.c. A recursive descent
parser is used. Syntax tables (generated by mksyntax) are used to
classify characters during lexical analysis. There are four
tables: one for normal use, one for use inside single quotes and
dollar single quotes, one for use inside double quotes, and one
for use in arithmetic. The tables are machine dependent because
they are indexed by character variables, and the range of a char
varies from machine to machine (plain char is signed on some
machines and unsigned on others).

PARSE OUTPUT: The output of the parser consists of a tree of
nodes. The various types of nodes are defined in the file
nodetypes.

Nodes of type NARG are used to represent both words and the
contents of here documents. An early version of ash kept the
contents of here documents in temporary files, but keeping here
documents in memory typically results in significantly better
performance. It would have been nice to make it an option to use
temporary files for here documents, for the benefit of small
machines, but the code to keep track of when to delete the
temporary files was complex and I never fixed all the bugs in it.
(AT&T has been maintaining the Bourne shell for more than ten
years, and to the best of my knowledge they still haven't gotten
it to handle temporary files correctly in obscure cases.)

The text field of a NARG structure points to the text of the
word. The text consists of ordinary characters and a number of
special codes defined in parser.h. The special codes are:

        CTLVAR              Variable substitution
        CTLENDVAR           End of variable substitution
        CTLBACKQ            Command substitution
        CTLBACKQ|CTLQUOTE   Command substitution inside double quotes
        CTLESC              Escape next character

A variable substitution contains the following elements:

        CTLVAR type name '=' [ alternative-text CTLENDVAR ]

The type field is a single character specifying the type of
substitution. The possible types are:

        VSNORMAL            $var
        VSMINUS             ${var-text}
        VSMINUS|VSNUL       ${var:-text}
        VSPLUS              ${var+text}
        VSPLUS|VSNUL        ${var:+text}
        VSQUESTION          ${var?text}
        VSQUESTION|VSNUL    ${var:?text}
        VSASSIGN            ${var=text}
        VSASSIGN|VSNUL      ${var:=text}

In addition, the type field will have the VSQUOTE flag set if the
variable is enclosed in double quotes. The name of the variable
comes next, terminated by an equals sign. If the type is not
VSNORMAL, the alternative text follows, terminated by a CTLENDVAR
byte.

Commands in back quotes are parsed and stored in a linked list.
The locations of these commands in the string are indicated by
CTLBACKQ and CTLBACKQ|CTLQUOTE characters, depending on whether
the back quotes were enclosed in double quotes.

The character CTLESC escapes the next character, so that if any
of the CTL characters mentioned above appears in the input, it can
be passed through transparently. CTLESC is also used to escape
'*', '?', '[', and '!' characters that were quoted by the user
and thus should not be used for file name generation.

CTLESC characters have proved to be particularly tricky to get
right. In the case of here documents which are not subject to
variable and command substitution, the parser doesn't insert any
CTLESC characters to begin with (so the contents of the text
field can be written without any processing). Other here
documents, and words which are not subject to splitting and file
name generation, have the CTLESC characters removed during the
variable and command substitution phase. Words which are subject
to splitting and file name generation have the CTLESC characters
removed as part of the file name generation phase.

EXECUTION: Command execution is handled by the following files:
        eval.c      The top level routines.
        redir.c     Code to handle redirection of input and output.
        jobs.c      Code to handle forking, waiting, and job control.
        exec.c      Code to do path searches and the actual exec system call.
        expand.c    Code to evaluate arguments.
        var.c       Maintains the variable symbol table. Called from expand.c.

EVAL.C: Evaltree recursively executes a parse tree.
The exit status is stored in the global variable exitstatus. The
alternative entry point evalbackcmd is called to evaluate commands
in back quotes. It saves the result in memory if the command is a
builtin; otherwise it forks off a child to execute the command and
connects the standard output of the child to a pipe.

JOBS.C: To create a process, you call makejob to return a job
structure, and then call forkshell (passing the job structure as
an argument) to create the process. Waitforjob waits for a job
to complete. These routines take care of process groups if job
control is enabled.

REDIR.C: Ash allows file descriptors to be redirected and then
restored without forking off a child process. This is
accomplished by duplicating the original file descriptors. The
redirtab structure records where the file descriptors have been
duplicated to.

EXEC.C: The routine find_command locates a command and enters it
in the hash table if it is not already there. The third argument
specifies whether it is to print an error message if the command
is not found. (When a pipeline is set up, find_command is called
for all the commands in the pipeline before any forking is done,
so as to get the commands into the hash table of the parent
process. But to make command hashing as transparent as possible,
we silently ignore errors at that point and only print error
messages if the command cannot be found later.)

The routine shellexec is the interface to the exec system call.
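
The redirect-and-restore technique described under REDIR.C can be
sketched in a few lines of POSIX C. This is a minimal illustration
under stated assumptions, not the code in redir.c: the saved_fd
structure and the redirect_fd and restore_fd functions are
hypothetical, and putting the saved copy at descriptor 10 or above
is an arbitrary choice of this sketch. The only system interfaces
assumed are fcntl with F_DUPFD, dup2 and close.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical record of one redirected descriptor, playing the
   role of an entry in ash's redirtab structure. */
struct saved_fd {
    int fd;    /* descriptor that was redirected */
    int copy;  /* duplicate of the original file, or -1 if fd was closed */
};

/* Make fd refer to the same file as newfd, saving the original in *save.
   Returns 0 on success and -1 on failure. */
static int redirect_fd(int fd, int newfd, struct saved_fd *save)
{
    save->fd = fd;
    /* Duplicate the original onto a descriptor >= 10 (arbitrary bound). */
    save->copy = fcntl(fd, F_DUPFD, 10);
    if (save->copy < 0 && errno != EBADF)
        return -1;              /* fd is open but could not be saved */
    if (dup2(newfd, fd) < 0) {
        if (save->copy >= 0)
            close(save->copy);
        return -1;
    }
    return 0;
}

/* Undo redirect_fd: put the original file back on save->fd.
   No child process is involved at any point. */
static void restore_fd(const struct saved_fd *save)
{
    if (save->copy >= 0) {
        int r = dup2(save->copy, save->fd);
        assert(r == save->fd);
        (void)r;                /* avoid an unused warning under NDEBUG */
        close(save->copy);
    } else {
        close(save->fd);        /* fd was not open before the redirection */
    }
}
```

In a shell, the two calls would bracket the execution of a
builtin: redirect_fd before it runs and restore_fd after it
returns, so the redirection never outlives the command.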

EXPAND.C: Arguments are processed in three passes. The first
(performed by the routine argstr) performs variable and command
substitution. The second (ifsbreakup) performs word splitting,
and the third (expandmeta) performs file name generation.

VAR.C: Variables are stored in a hash table. Probably we should
switch to extensible hashing. The variable name is stored in the
same string as the value (using the format "name=value") so that
no string copying is needed to create the environment of a
command. Variables which the shell references internally are
preallocated so that the shell can reference their values without
doing a lookup.

When a program is run, the code in eval.c places any environment
variables which precede the command (as in "PATH=xxx command")
into the variable table as the simplest way to strip duplicates,
and then calls "environment" to get the value of the environment.

BUILTIN COMMANDS: The procedures for handling these are scattered
throughout the code, depending on which location appears most
appropriate. They can be recognized because their names always
end in "cmd". The mapping from names to procedures is specified
in the file builtins, which is processed by the mkbuiltins
program.

A builtin command is invoked with argc and argv set up like a
normal program. A builtin command is allowed to overwrite its
arguments. Builtin routines can call nextopt to do option
parsing. This is similar to getopt, but you don't pass argc and
argv to it.
Builtin routines can also call error. This routine normally
terminates the shell (or returns to the main command loop if the
shell is interactive), but when called from a builtin command it
causes the builtin command to terminate with an exit status of 2.

The directory bltin contains commands which can be compiled
independently but can also be built into the shell for
efficiency. The makefile in this directory compiles these
programs in the normal fashion (so that they can be run
regardless of whether the invoker is ash), but also creates a
library named bltinlib.a which can be linked with ash. The header
file bltin.h takes care of most of the differences between the
ash environment and the stand-alone environment. The programmer
should name the main routine "main", and #define main to be the
name of the routine to use when the program is linked into ash.
This #define should appear before bltin.h is included; bltin.h
will #undef main if the program is to be compiled stand-alone.

CD.C: This file defines the cd and pwd builtins.

SIGNALS: Trap.c implements the trap command. The routine
setsignal figures out what action should be taken when a signal
is received and invokes the signal system call to set the signal
action appropriately. When a signal that a user has set a trap
for is caught, the routine "onsig" sets a flag. The routine
dotrap is called at appropriate points to actually handle the
signal. When an interrupt is caught and no trap has been set for
that signal, the routine "onint" in error.c is called.
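
The split described under SIGNALS, where the handler merely
records the signal and the real work happens later at a safe
point, can be sketched as follows. Only onsig and dotrap are names
from the text, and their bodies here are simplified assumptions;
settrap and MAXSIG are hypothetical, the sketch uses sigaction in
place of the older signal call, and this dotrap only reports which
signals are pending instead of evaluating the user's trap commands.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

#define MAXSIG 65   /* hypothetical upper bound on signal numbers tracked */

/* Written by the handler, read and cleared by dotrap(). The C standard
   only guarantees stores to volatile sig_atomic_t objects from a handler. */
static volatile sig_atomic_t gotsig[MAXSIG];
static volatile sig_atomic_t pendingsigs;

/* Signal handler: record the signal and return immediately. */
static void onsig(int signo)
{
    if (signo > 0 && signo < MAXSIG) {
        gotsig[signo] = 1;
        pendingsigs = 1;
    }
}

/* Arrange for onsig to catch signo, as "trap cmd SIG" would. */
static int settrap(int signo)
{
    struct sigaction sa;

    assert(signo > 0 && signo < MAXSIG);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = onsig;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Called from the main loop at a point where running shell code is safe.
   Stores up to max caught signal numbers in sigs[], clears their flags
   and returns how many were stored; any overflow stays pending for the
   next call. A real shell would evaluate each trap command here. */
static int dotrap(int sigs[], int max)
{
    int n = 0;

    while (pendingsigs && n < max) {
        pendingsigs = 0;
        for (int i = 1; i < MAXSIG; i++) {
            if (!gotsig[i])
                continue;
            if (n == max) {
                pendingsigs = 1;    /* no room left; keep it pending */
                break;
            }
            gotsig[i] = 0;
            sigs[n++] = i;
        }
    }
    return n;
}
```

Clearing pendingsigs before the scan, rather than after, means a
signal that arrives while the scan is in progress sets the flag
again and causes another pass instead of being lost.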

OUTPUT: Ash uses its own output routines. There are three output
structures allocated. The structure "output" represents the
standard output, "errout" the standard error, and "memout"
contains output which is to be stored in memory. This last is
used when a builtin command appears in back quotes, to allow its
output to be collected without doing any I/O through the UNIX
operating system. The variables out1 and out2 normally point to
output and errout, respectively, but they are set to point to
memout when appropriate inside back quotes.

INPUT: The basic input routine is pgetc, which reads from the
current input file. There is a stack of input files; the current
input file is the top file on this stack. The code allows the
input to come from a string rather than a file. (This is for the
-c option and the "." and eval builtin commands.) The global
variable plinno is saved and restored when files are pushed onto
and popped off the stack. The parser routines store the number of
the current line in this variable.

DEBUGGING: If DEBUG is defined in shell.h, then the shell will
write debugging information to the file $HOME/trace. Most of
this is done using the TRACE macro, which takes a set of printf
arguments inside two sets of parentheses, for example:
        TRACE(("n=%d\n", n));
The double parentheses are necessary because the C preprocessor
of the time (before C99) could not handle macros with a variable
number of arguments. Defining DEBUG also causes the shell to
generate a core dump if it is sent a quit signal. The tracing
code is in show.c.
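
The double-parentheses idiom described under DEBUGGING works
because the inner parentheses turn the whole printf-style argument
list into a single macro argument, which the macro can paste after
a function name unchanged. A minimal sketch follows; TRACE and
DEBUG are names from the text, while trace_printf and the
tracefile stream are hypothetical stand-ins for the code in show.c.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define DEBUG           /* in ash this would be set in shell.h */

static FILE *tracefile; /* hypothetical; the text says ash writes $HOME/trace */

/* An ordinary variadic function: only the compiler, never the
   preprocessor, has to deal with the variable argument list. */
static void trace_printf(const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);
    if (tracefile == NULL)
        return;             /* tracing not set up: discard the message */
    va_start(ap, fmt);
    vfprintf(tracefile, fmt, ap);
    va_end(ap);
    fflush(tracefile);
}

#ifdef DEBUG
/* TRACE(("n=%d\n", n)) expands to trace_printf ("n=%d\n", n). */
#define TRACE(param) trace_printf param
#else
/* Without DEBUG the arguments are never evaluated. */
#define TRACE(param) ((void)0)
#endif
```

Since C99 the preprocessor does support variadic macros, so a
modern equivalent could be written as
#define TRACE(...) trace_printf(__VA_ARGS__)
and called with a single set of parentheses.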