History of LZMA Utils and XZ Utils
==================================

Tukaani distribution

    In 2005, there was a small group working on the Tukaani distribution,
    which was a Slackware fork. One of the project goals was to fit the
    distro on a single 700 MiB ISO-9660 image. Using LZMA instead of gzip
    helped a lot. Roughly speaking, one could fit data that took 1000 MiB
    in gzipped form into 700 MiB with LZMA. Naturally, the compression
    ratio varied across packages, but this was what we got on average.

    Slackware packages have traditionally had .tgz as the filename suffix,
    which is an abbreviation of .tar.gz. A logical name for LZMA
    compressed packages was .tlz, being an abbreviation of .tar.lzma.

    At the end of 2007, there was no distribution under the Tukaani
    project anymore, but development of LZMA Utils was kept going.
    Still, there were .tlz packages around, because at least Vector Linux
    (a Slackware-based distribution) used LZMA for its packages.

    The first versions of the modified pkgtools used the LZMA_Alone tool
    from Igor Pavlov's LZMA SDK as is. It was fine, because users wouldn't
    need to interact with LZMA_Alone directly. But people soon wanted to
    use LZMA for other files too, and the interface of LZMA_Alone wasn't
    comfortable for those used to gzip and bzip2.


First steps of LZMA Utils

    The first version of LZMA Utils (4.22.0) included a shell script called
    lzmash. It was a wrapper that had a gzip-like command line interface.
    It used the LZMA_Alone tool from LZMA SDK to do all the real work.
    zgrep, zdiff, and related scripts from gzip were adapted to work with
    LZMA and were part of the first LZMA Utils release too.

    LZMA Utils 4.22.0 also included lzmadec, which was a small (less than
    10 KiB) decoder-only command line tool. It was written on top of the
    decoder-only C code found in the LZMA SDK. lzmadec was convenient in
    situations where LZMA_Alone (a few hundred KiB) would be too big.

    lzmash and lzmadec were written by Lasse Collin.


Second generation

    The lzmash script was an ugly and not very secure hack. The last
    version of LZMA Utils to use lzmash was 4.27.1.

    LZMA Utils 4.32.0beta1 introduced a new lzma command line tool written
    by Ville Koskinen. It was written in C++, and used the encoder and
    decoder from C++ LZMA SDK with few modifications. This tool replaced
    both the lzmash script and the LZMA_Alone command line tool in LZMA
    Utils.

    Introducing this new tool caused some temporary incompatibilities,
    because the LZMA_Alone executable was simply named lzma like the new
    command line tool, but they had completely different command line
    interfaces. The file format was still the same.

    Lasse wrote liblzmadec, which was a small decoder-only library based
    on the C code found in LZMA SDK. liblzmadec had an API similar to
    zlib, although there were some significant differences, which made it
    non-trivial to use it in some applications designed for zlib and
    libbzip2.

    The lzmadec command line tool was converted to use liblzmadec.

    Alexandre Sauvé helped convert the build system to use GNU Autotools.
    This made it easier to test for certain less portable features needed
    by the new command line tool.

    Since the new command line tool never got completely finished (for
    example, it didn't support the LZMA_OPT environment variable), the
    intent was not to call 4.32.x stable. Similarly, liblzmadec wasn't
    polished, but appeared to work well enough, so some people started
    using it too.

    Because the development of the third generation of LZMA Utils was
    delayed considerably (3-4 years), the 4.32.x branch had to be kept
    maintained. It got some bug fixes now and then, and finally it was
    decided to call it stable, although most of the missing features were
    never added.


File format problems

    The file format used by LZMA_Alone was primitive. It was designed with
    embedded systems in mind, and thus provided only a minimal set of
    features. The two biggest problems for non-embedded use were the lack
    of magic bytes and the lack of an integrity check.
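
    To make the missing magic bytes concrete, here is a minimal sketch
    that splits the fixed 13-byte header of a .lzma file into its fields.
    It uses Python's standard lzma module to produce a sample file; that
    module is an assumption of this example and is not part of LZMA Utils,
    and parse_lzma_alone_header is a hypothetical helper name.

```python
import lzma
import struct


def parse_lzma_alone_header(blob: bytes) -> dict:
    """Split the fixed 13-byte .lzma header into its three fields."""
    # Layout: 1 properties byte, 4-byte dictionary size, 8-byte
    # uncompressed size; the multi-byte fields are little-endian.
    props, dict_size, raw_size = struct.unpack("<BIQ", blob[:13])

    # The properties byte packs three small values: (pb * 5 + lp) * 9 + lc.
    lc = props % 9
    lp = (props // 9) % 5
    pb = props // 45

    return {
        "lc": lc,
        "lp": lp,
        "pb": pb,
        "dict_size": dict_size,
        # All bits set means "size unknown"; the stream then has to end
        # with an explicit end marker instead.
        "uncompressed_size": None if raw_size == 2**64 - 1 else raw_size,
    }


# Produce a small file in the old container and inspect its header.
sample = lzma.compress(b"hello, tukaani\n" * 100, format=lzma.FORMAT_ALONE)
header = parse_lzma_alone_header(sample)
print(header)
```

    None of the three fields is a fixed signature, and nothing in the
    header is a checksum, so a reader can only guess that a file is in
    this format and cannot verify the decoded data against the file.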

    Igor and Lasse started developing a new file format with some help
    from Ville Koskinen. Also Mark Adler, Mikko Pouru, H. Peter Anvin,
    and Lars Wirzenius helped with some minor things at some point of the
    development. Designing the new format took quite a long time (actually,
    too long would be a more appropriate expression). It was mostly
    because Lasse was quite slow at getting things done due to personal
    reasons.

    Originally the new format was supposed to use the same .lzma suffix
    that was already used by the old file format. Switching to the new
    format wouldn't have caused much trouble while the old format wasn't
    used by many people. But since the development of the new format took
    so long, the old format got quite popular, and it was decided that
    the new file format must use a different suffix.

    It was decided to use .xz as the suffix of the new file format. The
    first stable .xz file format specification was finally released in
    December 2008. In addition to fixing the most obvious problems of
    the old .lzma format, the .xz format added some new features like
    support for multiple filters (compression algorithms), filter chaining
    (like piping on the command line), and limited random-access reading.
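
    The following sketch illustrates three of these points at once: the
    magic bytes at the start of a .xz file, a two-filter chain, and the
    integrity check catching a damaged file. It relies on Python's
    standard lzma module, which is an assumption of this example rather
    than anything shipped with XZ Utils. The sample data and the choice
    of the Delta filter are arbitrary.

```python
import lzma

# The six magic bytes that begin every .xz file.
XZ_MAGIC = b"\xfd7zXZ\x00"

# Arbitrary sample data: a byte ramp, which the Delta filter turns into
# a highly repetitive stream before LZMA2 sees it.
payload = bytes(range(256)) * 64

# A two-filter chain: the output of Delta is fed into LZMA2, much like
# connecting two programs with a pipe on the command line.
chain = [
    {"id": lzma.FILTER_DELTA, "dist": 1},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]
packed = lzma.compress(
    payload,
    format=lzma.FORMAT_XZ,
    check=lzma.CHECK_CRC64,  # integrity check stored in the file
    filters=chain,
)

has_magic = packed.startswith(XZ_MAGIC)
restored = lzma.decompress(packed)

# Damage a single byte in the middle of the file and try to decode it.
damaged = bytearray(packed)
damaged[len(damaged) // 2] ^= 0xFF
try:
    lzma.decompress(bytes(damaged))
    corruption_detected = False
except lzma.LZMAError:
    corruption_detected = True

print(has_magic, restored == payload, corruption_detected)
```

    The filter chain is recorded in the file itself, so the decoder needs
    no extra options to undo it.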

    Currently the primary compression algorithm used in .xz is LZMA2.
    It is an extension on top of the original LZMA to fix some practical
    problems: LZMA2 adds support for flushing the encoder and for
    uncompressed chunks, eases stateful decoder implementations, and
    improves support for multithreading. Since LZMA2 is better than the
    original LZMA, the original LZMA is not supported in .xz.


Transition to XZ Utils

    The early versions of XZ Utils were called LZMA Utils. The first
    releases were 4.42.0alphas. They dropped the rest of the C++ LZMA SDK.
    The code was still directly based on LZMA SDK but ported to C and
    converted from a callback API to a stateful API. Later, Igor Pavlov
    made a C version of the LZMA encoder too; these ports from C++ to C
    were independent in LZMA SDK and LZMA Utils.

    The core of the new LZMA Utils was liblzma, a compression library with
    a zlib-like API. liblzma supported both the old and new file formats.
    The gzip-like lzma command line tool was rewritten to use liblzma.
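
    As a small illustration of one library handling both formats, the
    sketch below writes the same data in the old .lzma container and in
    the new .xz container, then decodes both without saying which is
    which. It uses Python's standard lzma module, which is built on
    liblzma; using Python here is an assumption of the example, and the
    sample text is arbitrary.

```python
import lzma

text = b"The quick brown fox jumps over the lazy dog.\n" * 50

legacy = lzma.compress(text, format=lzma.FORMAT_ALONE)  # old .lzma container
modern = lzma.compress(text, format=lzma.FORMAT_XZ)     # new .xz container

# FORMAT_AUTO asks the library to work out which of the two containers
# it has been handed.
from_legacy = lzma.decompress(legacy, format=lzma.FORMAT_AUTO)
from_modern = lzma.decompress(modern, format=lzma.FORMAT_AUTO)

print(from_legacy == text, from_modern == text)
```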

    The new LZMA Utils code base was renamed to XZ Utils once the name
    of the new file format had been decided. The liblzma compression
    library retained its name though, because changing it would have
    caused unnecessary breakage in applications already using the early
    liblzma snapshots.

    The xz command line tool can emulate the gzip-like lzma tool by
    creating appropriate symlinks (e.g. lzma -> xz). Thus, practically
    all scripts using the lzma tool from LZMA Utils will work as is with
    XZ Utils (and will keep using the old .lzma format). Still, the .lzma
    format is more or less deprecated. XZ Utils will keep supporting it,
    but new applications should use the .xz format, and migrating old
    applications to .xz is often a good idea too.
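
    For readers unfamiliar with the symlink trick, the sketch below shows
    the general idea: one program picks its default file format from the
    name it was started under. This is a hypothetical illustration only.
    The function name and the matching rule are assumptions made for the
    example and are not taken from the xz source code.

```python
import os


def default_format_for(argv0: str) -> str:
    """Pick a default file format from the name the program was run as.

    Hypothetical rule for illustration: any name containing "lzma"
    selects the old .lzma format, everything else selects .xz.
    """
    name = os.path.basename(argv0)
    if "lzma" in name:
        return "lzma"
    return "xz"


# The same program, reached through two different names.
print(default_format_for("/usr/bin/xz"))
print(default_format_for("/usr/bin/lzma"))
```

    With a rule like this, a symlink named lzma pointing at the xz binary
    is enough for old scripts to keep producing .lzma files.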