This is a patched version of zlib, modified to use
Pentium-Pro-optimized assembly code in the deflation algorithm. The
files changed/added by this patch are:

README.686
match.S

The speedup that this patch provides varies, depending on whether the
compiler used to build the original version of zlib falls afoul of the
PPro's speed traps. My own tests show a speedup of around 10-20% at
the default compression level, and 20-30% at level 9 (-9), against a
version compiled using gcc 2.7.2.3. Your mileage may vary.

Note that this code has been tailored for the PPro/PII in particular,
and will not perform particularly well on a Pentium.

If you are using an assembler other than GNU as, you will have to
translate match.S to use your assembler's syntax. (Have fun.)

Brian Raiter
breadbox@muppetlabs.com
April, 1998


Added for zlib 1.1.3:

The patches come from
http://www.muppetlabs.com/~breadbox/software/assembly.html

To compile zlib with this asm file, copy match.S to the zlib directory,
then run:

CFLAGS="-O3 -DASMV" ./configure
make OBJA=match.o


Update:

I've been ignoring these assembly routines for years, believing that
gcc's generated code had caught up with them sometime around gcc 2.95
and the major rearchitecting of the Pentium 4. However, I recently
learned that, contrary to what I believed, this code still has some
life in it. On the Pentium 4 and AMD64 chips, it continues to run
about 8% faster than the code produced by gcc 4.1.

In acknowledgement of its continuing usefulness, I've altered the
license to match that of the rest of zlib. Share and Enjoy!

Brian Raiter
breadbox@muppetlabs.com
April, 2007