
	@(#)README	8.1 (Berkeley) 6/9/93
  $FreeBSD$

Compress version 4.0 improvements over 3.0:
	o compress() speedup (10-50%) by changing division hash to xor
	o decompress() speedup (5-10%)
	o Memory requirements reduced (3-30%)
	o Stack requirements reduced to less than 4kb
	o Removed 'Big+Fast' compress code (FBITS) because of compress speedup
	o Portability mods for Z8000 and PC/XT (but not zeus 3.2)
	o Default to 'quiet' mode
	o Unification of 'force' flags
	o Manual page overhaul
	o Portability enhancement for M_XENIX
	o Removed text on #else and #endif
	o Added "-V" switch to print version and options
	o Added #defines for SIGNED_COMPARE_SLOW
	o Added Makefile and "usermem" program
	o Removed all floating point computations
	o New programs: [deleted]
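
The first item in the list above replaces a division (modulus) hash with an
xor.  A minimal sketch of the difference, assuming a table keyed by a
(prefix code, next character) pair.  HSIZE, HSHIFT, and both function names
are illustrative assumptions, not taken from compress.c:

```c
#define HSIZE	69001L	/* assumed table size, for illustration only */
#define HSHIFT	8	/* assumed shift; keeps (c << HSHIFT) below HSIZE */

/* Division hash: fold the pair into one key, then reduce it with '%'. */
static long hash_div(long prefix, int c)
{
	long key = ((long)c << 16) + prefix;

	return key % HSIZE;
}

/*
 * Xor hash: one shift and one xor, no divide instruction.
 * With c in 0..255 and prefix below 65536 the result is below 65536,
 * so it is already a valid index into a table of HSIZE slots.
 */
static long hash_xor(long prefix, int c)
{
	return ((long)c << HSHIFT) ^ prefix;
}
```

Both functions return a slot index; a real table would still need a probe
sequence to resolve collisions, which is left out here.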

The "usermem" script attempts to determine the maximum process size.  Some
editing of the script may be necessary (see the comments).  [It should work
fine on 4.3 BSD.] If you can't get it to work at all, just create file
"USERMEM" containing the maximum process size in decimal.

The following preprocessor symbols control the compilation of "compress.c":

	o USERMEM		Maximum process memory on the system
	o SACREDMEM		Amount to reserve for other processes
	o SIGNED_COMPARE_SLOW	Unsigned compare instructions are faster
	o NO_UCHAR		Don't use "unsigned char" types
	o BITS			Overrules default set by USERMEM-SACREDMEM
	o vax			Generate inline assembler
	o interdata		Defines SIGNED_COMPARE_SLOW
	o M_XENIX		Makes arrays < 65536 bytes each
	o pdp11			BITS=12, NO_UCHAR
	o z8000			BITS=12
	o pcxt			BITS=12
	o BSD4_2		Allow long filenames ( > 14 characters) &
				Call setlinebuf(stderr)

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified with the "-b" flag.

memory: at least		BITS
------  -- -----		----
     433,484			 16
     229,600			 15
     127,536			 14
      73,464			 13
           0			 12

The default is BITS=16.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.
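
The table above can be read as a chain of threshold checks on
"usermem-sacredmem".  A sketch of that lookup as an ordinary C function
follows.  The name max_bits_for() is hypothetical, and since the symbols
listed earlier are preprocessor symbols, the real choice is made at
compile time rather than by a run-time call like this one:

```c
/*
 * Map available memory (usermem - sacredmem, in bytes) to the
 * maximum BITS, using the thresholds from the table above.
 */
static int max_bits_for(long avail)
{
	if (avail >= 433484L)
		return 16;
	if (avail >= 229600L)
		return 15;
	if (avail >= 127536L)
		return 14;
	if (avail >= 73464L)
		return 13;
	return 12;
}
```

A "-DBITS=bits" setting would bypass this lookup entirely.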

WARNING: files compressed on a large machine with more bits than allowed by
a version of compress on a smaller machine cannot be decompressed!  Use the
"-b12" flag to generate a file on a large machine that can be uncompressed
on a 16-bit machine.

The output of compress 4.0 is fully compatible with that of compress 3.0.
In other words, the output of compress 4.0 may be fed into uncompress 3.0 or
the output of compress 3.0 may be fed into uncompress 4.0.

The output of compress 4.0 is not compatible with that of
compress 2.0.  However, compress 4.0 still accepts the output of
compress 2.0.  To generate output that is compatible with compress
2.0, use the undocumented "-C" flag.

	-from mod.sources, submitted by vax135!petsd!joe (Joe Orost), 8/1/85
--------------------------------

Enclosed is compress version 3.0 with the following changes:

1.	"Block" compression is performed.  After the BITS run out, the
	compression ratio is checked every so often.  If it is decreasing,
	the table is cleared and a new set of substrings is generated.

	This makes the output of compress 3.0 not compatible with that of
	compress 2.0.  However, compress 3.0 still accepts the output of
	compress 2.0.  To generate output that is compatible with compress
	2.0, use the undocumented "-C" flag.

2.	A quiet "-q" flag has been added for use by the news system.

3.	The character chaining has been deleted and the program now uses
	hashing.  This improves the speed of the program, especially
	during decompression.  Other speed improvements have been made,
	such as using putc() instead of fwrite().

4.	A large table is used on large machines when a relatively small
	number of bits is specified.  This saves much time when compressing
	for a 16-bit machine on a 32-bit virtual machine.  Note that the
	speed improvement only occurs when the input file is > 30000
	characters, and the -b BITS is less than or equal to the cutoff
	described below.
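
The ratio check in item 1 can be sketched as follows.  This is a simplified
illustration and not the code from compress.c: the struct, the function
name, and the choice of a ratio scaled by 256 are all assumptions made for
the example.

```c
#include <limits.h>

struct ratio_state {
	long best;	/* best scaled ratio seen since the last clear */
};

/*
 * Meant to be called every so often once the code table is full.
 * Returns 1 if the table should be cleared, 0 otherwise.
 * The ratio is input bytes per output byte, scaled by 256 so that
 * only integer arithmetic is needed.
 */
static int should_clear(struct ratio_state *st, long bytes_in, long bytes_out)
{
	long ratio;

	if (bytes_out <= 0)
		return 0;
	if (bytes_in > LONG_MAX / 256) {
		/* Scaling bytes_in would overflow; scale the divisor instead. */
		long scaled_out = bytes_out / 256;

		ratio = (scaled_out == 0) ? LONG_MAX : bytes_in / scaled_out;
	} else {
		ratio = (bytes_in * 256) / bytes_out;
	}
	if (ratio >= st->best) {
		st->best = ratio;	/* not decreasing: keep the table */
		return 0;
	}
	st->best = 0;			/* ratio dropped: start over */
	return 1;
}
```

A caller that gets 1 back would empty its table and tell the decompressor
to do the same; how that signal is encoded is not covered by this sketch.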

Most of these changes were made by James A. Woods (ames!jaw).  Thank you
James!

To compile compress:

	cc -O -DUSERMEM=usermem -o compress compress.c

Where "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, put in
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff bits where the large+fast table is used.

memory: at least		BITS		cutoff
------  -- -----		----		------
   4,718,592			 16		  13
   2,621,440			 16		  12
   1,572,864			 16		  11
   1,048,576			 16		  10
     631,808			 16		  --
     329,728			 15		  --
     178,176			 14		  --
      99,328			 13		  --
           0			 12		  --

The default memory size is 750,000, which gives a maximum BITS=16 and no
large+fast table.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.
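
What "NO_UCHAR" changes can be sketched as below.  The typedef name
char_type and the BYTE() macro are illustrative assumptions; the point is
only that a plain (possibly signed) char needs an explicit mask before it
is used as a value in the range 0..255.

```c
#ifdef NO_UCHAR
typedef char char_type;			/* may be signed on this machine */
#define BYTE(c)	((int)(c) & 0xff)	/* mask off any sign extension */
#else
typedef unsigned char char_type;
#define BYTE(c)	((int)(c))		/* already 0..255, no mask needed */
#endif

/* Turn a stored character into a safe index for a 256-entry table. */
static int table_index(char_type c)
{
	return BYTE(c);
}
```

Compiled either with or without -DNO_UCHAR, table_index() yields a value
from 0 to 255; only the "NO_UCHAR" build pays for the extra mask.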

If "int" is 16 bits on your machine, define "SHORT_INT" when compiling.

After compilation, move "compress" to a standard executable location, such
as /usr/local.  Then:
	cd /usr/local
	ln compress uncompress
	ln compress zcat
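
The links work because a single binary can choose its behavior from the
name it was invoked under.  A minimal sketch of that dispatch, using the
three names above; the enum and the function name are hypothetical and do
not come from compress.c:

```c
#include <string.h>

enum mode { MODE_COMPRESS, MODE_UNCOMPRESS, MODE_ZCAT };

/* Pick a mode from argv[0], ignoring any leading directory path. */
static enum mode mode_from_name(const char *argv0)
{
	const char *base = strrchr(argv0, '/');

	base = (base != NULL) ? base + 1 : argv0;
	if (strcmp(base, "uncompress") == 0)
		return MODE_UNCOMPRESS;
	if (strcmp(base, "zcat") == 0)
		return MODE_ZCAT;	/* decompress to standard output */
	return MODE_COMPRESS;
}
```

A main() would call mode_from_name(argv[0]) once, before parsing flags.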

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).

Next, install the manual (compress.l).
	cp compress.l /usr/man/manl
	cd /usr/man/manl
	ln compress.l uncompress.l
	ln compress.l zcat.l

		- or -

	cp compress.l /usr/man/man1/compress.1
	cd /usr/man/man1
	ln compress.1 uncompress.1
	ln compress.1 zcat.1

					regards,
					petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory.  A program (usermem) to calculate the
usable amount of physical user memory is enclosed, as well as a sample
4.2BSD Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1.	The packed files produced by compress are different on different
>	machines and dependent on the vax sysgen option.
>		The bug was in the different byte/bit ordering on the
>		various machines.  This has been fixed.
>
>		This version is NOT compatible with the original vax posting
>		unless the '-DCOMPATIBLE' option is specified to the C
>		compiler.  The original posting has a bug which I fixed,
>		causing incompatible files.  I recommend you NOT to use this
>		option unless you already have a lot of packed files from
>		the original posting by Thomas.
>2.	The exit status is not well defined (on some machines) causing the
>	scripts to fail.
>		The exit status is now 0,1 or 2 and is documented in
>		compress.l.
>3.	The function getopt() is not available in all C libraries.
>		The function getopt() is no longer referenced by the
>		program.
>4.	Error status is not being checked on the fwrite() and fflush() calls.
>		Fixed.
>
>The following enhancements have been made:
>
>1.	Added facilities of "compact" into the compress program.  "Pack",
>	"Unpack", and "Pcat" are no longer required (no longer supplied).
>2.	Installed work around for C compiler bug with "-O".
>3.	Added a magic number header (\037\235).  Put the bits specified
>	in the file.
>4.	Added "-f" flag to force overwrite of output file.
>5.	Added "-c" flag and "zcat" program.  'ln compress zcat' after you
>	compile.
>6.	The 'uncompress' script has been deleted; simply
>	'ln compress uncompress' after you compile and it will work.
>7.	Removed extra bit masking for machines that support unsigned
>	characters.  If your machine doesn't support unsigned characters,
>	define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
>standard executable location, such as /usr/local.  Then:
>	cd /usr/local
>	ln compress uncompress
>	ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>	cp compress.l /usr/man/manl		- or -
>	cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>prematurely.  This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version.  Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine.  The author claimed
>>>blinding speed and good compression ratios.  It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them.  On 350K bytes of
>>>Unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds.  But, compact and
>>>pack got about 30% compression, whereas compress got over 50%.  So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok.  (Can anybody
>>>elucidate on this?)  There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time).  Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere.  Note the longs in the
>>>code, you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>					regards,
>>					joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844
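
The 2.0 notes quoted above mention a magic number header (\037\235)
followed by the bits setting.  A sketch of reading that header follows.
The two magic bytes come from the text; the layout of the third byte (low
five bits for the maximum code size, high bit for block mode) is the usual
layout of compress output and is not spelled out in this README, and the
function name is hypothetical.

```c
#include <stddef.h>

/*
 * Inspect the first three bytes of a compressed file.
 * Returns the maximum code size in bits, or -1 if the magic number
 * is wrong or the buffer is too short.  If block_mode is not NULL,
 * it is set to 1 when the block-compression flag is on, else 0.
 */
static int z_header_bits(const unsigned char *buf, size_t len, int *block_mode)
{
	if (len < 3 || buf[0] != 037 || buf[1] != 0235)
		return -1;
	if (block_mode != NULL)
		*block_mode = (buf[2] & 0x80) != 0;
	return buf[2] & 0x1f;
}
```

A decompressor could compare the returned value with its own maximum BITS
and refuse files it cannot handle, which is the situation the WARNING in
the 4.0 notes describes.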