@(#)README 8.1 (Berkeley) 6/9/93
$FreeBSD$

Compress version 4.0 improvements over 3.0:
    o compress() speedup (10-50%) by changing division hash to xor
    o decompress() speedup (5-10%)
    o Memory requirements reduced (3-30%)
    o Stack requirements reduced to less than 4kb
    o Removed 'Big+Fast' compress code (FBITS) because of compress speedup
    o Portability mods for Z8000 and PC/XT (but not zeus 3.2)
    o Default to 'quiet' mode
    o Unification of 'force' flags
    o Manual page overhaul
    o Portability enhancement for M_XENIX
    o Removed trailing text on #else and #endif lines
    o Added "-V" switch to print version and options
    o Added #defines for SIGNED_COMPARE_SLOW
    o Added Makefile and "usermem" program
    o Removed all floating point computations
    o New programs: [deleted]

The "usermem" script attempts to determine the maximum process size. Some
editing of the script may be necessary (see the comments). [It should work
fine on 4.3 BSD.] If you can't get it to work at all, just create a file
named "USERMEM" containing the maximum process size in decimal.
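The first 4.0 item above replaces a division hash with an XOR hash in compress(). A minimal sketch of the idea follows, assuming a 16-bit string table of HSIZE = 69001 slots (a prime), keys packed as (c << 16) + prefix code, and a secondary probe step of HSIZE - i on collision. The table layout, constants, and function names are illustrative assumptions, not code copied from compress.c.

```c
#include <assert.h>

/* Sketch: LZW string table with an XOR primary hash and open addressing.
 * HSIZE (a prime) and the key layout are assumptions for the 16-bit case. */
#define HSIZE   69001L
#define MAXBITS 16

static long htab[HSIZE];               /* packed (c, ent) key; -1 = empty */
static unsigned short codetab[HSIZE];  /* code assigned to that string */

static void table_clear(void)
{
    for (long i = 0; i < HSIZE; i++)
        htab[i] = -1;
}

/* XOR hash: with c < 256 and ent < 65536 the result is below 65536,
 * so it is always a valid slot and no division is needed. */
static long xor_hash(int c, unsigned ent)
{
    long i = ((long)c << 8) ^ (long)ent;
    assert(i >= 0 && i < HSIZE);
    return i;
}

/* Find the slot for string (ent, c). Sets *found and returns either the
 * matching slot or the empty slot where the string would be inserted. */
static long table_find(unsigned ent, int c, int *found)
{
    long key = ((long)c << MAXBITS) + (long)ent;
    long i = xor_hash(c, ent);
    long disp = (i == 0) ? 1 : HSIZE - i;  /* secondary probe step */

    for (;;) {
        if (htab[i] == key) { *found = 1; return i; }
        if (htab[i] < 0)    { *found = 0; return i; }
        i -= disp;                          /* probe backwards, wrapping */
        if (i < 0)
            i += HSIZE;
    }
}

static void table_insert(long slot, unsigned ent, int c, unsigned short code)
{
    htab[slot] = ((long)c << MAXBITS) + (long)ent;
    codetab[slot] = code;
}
```

Because (c << 8) ^ ent already lies inside the table, the common path costs one shift and one XOR; a division-style hash would need a % HSIZE on every lookup. Since HSIZE is prime, any probe step visits every slot before repeating, so the search ends as long as the table is not full.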

The following preprocessor symbols control the compilation of "compress.c":

    o USERMEM              Maximum process memory on the system
    o SACREDMEM            Amount to reserve for other processes
    o SIGNED_COMPARE_SLOW  Unsigned compare instructions are faster
    o NO_UCHAR             Don't use "unsigned char" types
    o BITS                 Overrules default set by USERMEM-SACREDMEM
    o vax                  Generate inline assembler
    o interdata            Defines SIGNED_COMPARE_SLOW
    o M_XENIX              Makes arrays < 65536 bytes each
    o pdp11                BITS=12, NO_UCHAR
    o z8000                BITS=12
    o pcxt                 BITS=12
    o BSD4_2               Allows long filenames (> 14 characters) and
                           calls setlinebuf(stderr)

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified with the "-b" flag.

    memory (bytes): at least    BITS
    ------------------------    ----
                     433,484     16
                     229,600     15
                     127,536     14
                      73,464     13
                           0     12

The default is BITS=16.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

WARNING: files compressed on a large machine with more bits than allowed by
a version of compress on a smaller machine cannot be decompressed! Use the
"-b12" flag to generate a file on a large machine that can be uncompressed
on a 16-bit machine.

The output of compress 4.0 is fully compatible with that of compress 3.0.
In other words, the output of compress 4.0 may be fed into uncompress 3.0 or
the output of compress 3.0 may be fed into uncompress 4.0.
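The memory table above amounts to a threshold lookup. The sketch below expresses it both at compile time (so that -DUSERMEM, -DSACREDMEM and -DBITS behave as described) and as a runtime helper, plus a clamp for the "-b" request. PBITS, max_bits_for_memory(), clamp_b_flag() and INIT_BITS = 9 are hypothetical names and values chosen for illustration; only the thresholds come from the 4.0 table.

```c
#include <assert.h>

/* Compile-time selection (sketch). Build with, for example,
 *   cc -O -DUSERMEM=300000 -DSACREDMEM=50000 ...
 * An explicit -DBITS=... always wins over the table. */
#ifndef SACREDMEM
# define SACREDMEM 0
#endif

#ifdef USERMEM
# if USERMEM >= (433484 + SACREDMEM)
#  define PBITS 16
# elif USERMEM >= (229600 + SACREDMEM)
#  define PBITS 15
# elif USERMEM >= (127536 + SACREDMEM)
#  define PBITS 14
# elif USERMEM >= (73464 + SACREDMEM)
#  define PBITS 13
# else
#  define PBITS 12
# endif
#endif

#ifndef BITS
# ifdef PBITS
#  define BITS PBITS
# else
#  define BITS 16              /* "The default is BITS=16." */
# endif
#endif

#define INIT_BITS 9            /* smallest code size (assumed) */

/* Runtime version of the same table: usable memory -> largest BITS. */
static int max_bits_for_memory(long usermem, long sacredmem)
{
    long avail;

    assert(usermem >= 0 && sacredmem >= 0);
    avail = usermem - sacredmem;
    if (avail >= 433484L) return 16;
    if (avail >= 229600L) return 15;
    if (avail >= 127536L) return 14;
    if (avail >= 73464L)  return 13;
    return 12;
}

/* A "-b" request is limited to the range this build was compiled for. */
static int clamp_b_flag(int requested)
{
    if (requested < INIT_BITS) return INIT_BITS;
    if (requested > BITS)      return BITS;
    return requested;
}
```

For example, building with -DUSERMEM=300000 -DSACREDMEM=50000 leaves 250,000 bytes, which this sketch maps to BITS=15; adding -DBITS=12 overrides the table entirely.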

The output of compress 4.0 is not compatible with that of
compress 2.0. However, compress 4.0 still accepts the output of
compress 2.0. To generate output that is compatible with compress
2.0, use the undocumented "-C" flag.

        -from mod.sources, submitted by vax135!petsd!joe (Joe Orost), 8/1/85
--------------------------------

Enclosed is compress version 3.0 with the following changes:

1.  "Block" compression is performed. Once the codes available at the
    maximum BITS run out, the compression ratio is checked every so
    often. If it is decreasing, the table is cleared and a new set of
    substrings is generated.

    This makes the output of compress 3.0 not compatible with that of
    compress 2.0. However, compress 3.0 still accepts the output of
    compress 2.0. To generate output that is compatible with compress
    2.0, use the undocumented "-C" flag.

2.  A quiet "-q" flag has been added for use by the news system.

3.  The character chaining has been deleted and the program now uses
    hashing. This improves the speed of the program, especially
    during decompression. Other speed improvements have been made,
    such as using putc() instead of fwrite().

4.  A large table is used on large machines when a relatively small
    number of bits is specified. This saves much time when compressing
    for a 16-bit machine on a 32-bit virtual machine. Note that the
    speed improvement only occurs when the input file is > 30000
    characters, and the -b BITS is less than or equal to the cutoff
    described below.
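Item 1 describes block compression: once every code is in use, the ratio of input to output is sampled periodically and the table is reset when the ratio stops improving. A sketch of that check follows, assuming a sampling interval of 10,000 input bytes and an integer ratio with 8 fractional bits (in keeping with 4.0's removal of floating point). struct block_state, block_check() and CHECK_GAP are hypothetical names; a real encoder would also emit a clear code and reset its next free code whenever block_check() returns 1.

```c
#include <assert.h>

#define CHECK_GAP 10000L   /* input bytes between ratio checks (assumed) */

struct block_state {
    long long in_count;    /* input bytes consumed so far */
    long long bytes_out;   /* output bytes written so far */
    long long checkpoint;  /* in_count at which the next check happens */
    long long ratio;       /* best ratio so far: in/out scaled by 256 */
};

/* Call only after the code table has filled up. Returns 1 when the
 * caller should clear the table, 0 to keep using it. */
static int block_check(struct block_state *s)
{
    long long rat;

    assert(s->in_count >= 0 && s->bytes_out >= 0);
    if (s->in_count < s->checkpoint)
        return 0;                          /* not time to sample yet */
    s->checkpoint = s->in_count + CHECK_GAP;
    if (s->bytes_out == 0)
        return 0;                          /* nothing written; no ratio yet */

    rat = (s->in_count << 8) / s->bytes_out;   /* 8 fractional bits */
    if (rat > s->ratio) {
        s->ratio = rat;                    /* still improving: keep table */
        return 0;
    }
    s->ratio = 0;                          /* stopped improving: reset */
    return 1;
}
```

This sketch resets whenever the ratio fails to improve, which covers the decreasing case described in item 1; resetting the saved ratio to 0 lets the next sample after a clear always count as an improvement.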

Most of these changes were made by James A. Woods (ames!jaw). Thank you
James!

To compile compress:

        cc -O -DUSERMEM=usermem -o compress compress.c

Here "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, also add
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff: the largest "-b" BITS value for which the
large+fast table is used.

    memory (bytes): at least    BITS    cutoff
    ------------------------    ----    ------
                   4,718,592     16       13
                   2,621,440     16       12
                   1,572,864     16       11
                   1,048,576     16       10
                     631,808     16       --
                     329,728     15       --
                     178,176     14       --
                      99,328     13       --
                           0     12       --

The default memory size is 750,000, which gives a maximum BITS=16 and no
large+fast table.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.

If "int" is 16 bits on your machine, define "SHORT_INT" when compiling.

After compilation, move "compress" to a standard executable location, such
as /usr/local. Then:
        cd /usr/local
        ln compress uncompress
        ln compress zcat

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb ("setstack compress 12" on Perkin-Elmer).

Next, install the manual (compress.l):
        cp compress.l /usr/man/manl
        cd /usr/man/manl
        ln compress.l uncompress.l
        ln compress.l zcat.l

                - or -

        cp compress.l /usr/man/man1/compress.1
        cd /usr/man/man1
        ln compress.1 uncompress.1
        ln compress.1 zcat.1

                                        regards,
                                        petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan 5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory. A program (usermem) to calculate the
usable amount of physical user memory is enclosed, as well as a sample
4.2BSD Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1. The packed files produced by compress are different on different
>   machines and dependent on the vax sysgen option.
>       The bug was in the different byte/bit ordering on the
>       various machines. This has been fixed.
>
>       This version is NOT compatible with the original vax posting
>       unless the '-DCOMPATIBLE' option is specified to the C
>       compiler. The original posting has a bug which I fixed,
>       causing incompatible files. I recommend that you NOT use this
>       option unless you already have a lot of packed files from
>       the original posting by Thomas.
>2. The exit status is not well defined (on some machines), causing the
>   scripts to fail.
>       The exit status is now 0, 1, or 2 and is documented in
>       compress.l.
>3. The function getopt() is not available in all C libraries.
>       The function getopt() is no longer referenced by the
>       program.
>4. Error status is not being checked on the fwrite() and fflush() calls.
>       Fixed.
>
>The following enhancements have been made:
>
>1. Added facilities of "compact" into the compress program. "Pack",
>   "Unpack", and "Pcat" are no longer required (no longer supplied).
>2. Installed a workaround for a C compiler bug with "-O".
>3. Added a magic number header (\037\235). The number of bits
>   specified is stored in the file.
>4. Added "-f" flag to force overwrite of output file.
>5. Added "-c" flag and "zcat" program. 'ln compress zcat' after you
>   compile.
>6. The 'uncompress' script has been deleted; simply
>   'ln compress uncompress' after you compile and it will work.
>7. Removed extra bit masking for machines that support unsigned
>   characters. If your machine doesn't support unsigned characters,
>   define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags. Move "compress" to a
>standard executable location, such as /usr/local. Then:
>       cd /usr/local
>       ln compress uncompress
>       ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l):
>       cp compress.l /usr/man/manl             - or -
>       cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1). Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible. (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>   prematurely. This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version. Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine. The author claimed
>>>blinding speed and good compression ratios. It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them. On 350K bytes of
>>>Unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds. But, compact and
>>>pack got about 30% compression, whereas compress got over 50%. So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok. (Can anybody
>>>elucidate on this?) There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time). Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere. Note the longs in the
>>>code; you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>                                      regards,
>>                                      joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844
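The 2.0 notes above mention the magic number header (\037\235) and that the bits setting is recorded in the file; this is what lets a decompressor refuse input made with more bits than it supports, as the WARNING near the top describes. A sketch of such a header check follows. It assumes the commonly documented .Z layout, in which the third byte carries maxbits in its low five bits and a block-compression flag in its high bit (0x80); check_header() and its return codes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAGIC_1    0x1f   /* \037 */
#define MAGIC_2    0x9d   /* \235 */
#define BIT_MASK   0x1f   /* low 5 bits of byte 3: maxbits (assumed layout) */
#define BLOCK_MASK 0x80   /* high bit of byte 3: block compression flag */

/* Returns maxbits (> 0) on success, -1 if the data is not a compress
 * file, -2 if it needs more bits than this build (build_bits) supports. */
static int check_header(const unsigned char *buf, size_t len,
                        int build_bits, int *block_mode)
{
    int maxbits;

    assert(buf != NULL && block_mode != NULL);
    if (len < 3 || buf[0] != MAGIC_1 || buf[1] != MAGIC_2)
        return -1;
    maxbits = buf[2] & BIT_MASK;
    *block_mode = (buf[2] & BLOCK_MASK) != 0;
    if (maxbits > build_bits)
        return -2;                 /* e.g. 16-bit file, 12-bit build */
    return maxbits;
}
```

For instance, a header of 1f 9d 90 describes a 16-bit, block-compressed file, which a build limited to 12 bits would reject; a file written with "-b12" would instead carry 12 in those low bits.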