README for JTE 1.22
Steve McIntyre <steve@einval.com>
20 November 2019
License - GPL v2+. See the file COPYING for more details.
JTE - Jigdo Template Export
===========================
Introduction - jigdo and JTE
----------------------------
Jigdo is a useful tool to help in the distribution of large files like CD and
DVD images. See Richard Atterer's site for more details. Debian CDs and DVD ISO
images are published on the web in jigdo format to allow end users to download
them more efficiently.
Jigdo is generic and powerful - it can be used for any large files that are
made up of smaller files. However, to be this generic is costly. Creating jigdo
files from ISO images is quite inefficient - to work out which files are
included in the ISO image, jigdo has to calculate and compare checksums of
every possible file and every extent in the image. Essentially it has to
brute-force the image. It can take a long time to do this for a large image
(imagine a 4.5GB DVD image or a 30+GB Blu-Ray image).
I first started looking for ways to improve this back in 2004:
1. Modify jigdo so it knew about the internals of ISO images and could
efficiently scan them (bad, not very generic for jigdo)
2. Write a helper tool to dump extra information for jigdo to use alongside
the ISO image (I had a helper tool written, but modifying jigdo to use this
looked HARD)
3. Patch mkisofs/genisoimage to write .jigdo and .template files alongside the
ISO image
I completed the third of these options, and called it JTE (or Jigdo Template
Export). The code worked fine, and ran in a very small fraction of the time
taken to run genisoimage and jigdo separately. The output .jigdo and .template
files worked correctly, i.e. jigdo-file and the wrapper script jigdo-mirror
accepted them and generated an ISO image that exactly matched the original.
Debian used that code for a number of years within genisoimage, but we've since
switched over to using xorriso for our image building instead. It has a
lot of useful features that we want compared to genisoimage, not least a
friendly and engaged author in Thomas Schmitt!
Thomas and I and George Danchev worked together to package up my JTE code into
libjte such that xorriso could use it effectively. Xorriso has been capable of
generating jigdo files since 2010.
In late 2019, I took over maintenance of the jigdo upstream code and added
support for a new (v2) jigdo data format, using SHA256 instead of MD5
internally. See my jigdo page for more details about that. I have also updated
the JTE codebase to support this new format, of course.
As genisoimage is effectively dead at this point, I took the decision to not
add the jigdo v2 support into the genisoimage codebase. If you need to generate
jigdo v2 format, either use jigdo itself or xorriso if you'd like the
performance benefit of the libjte integration.
JTE includes a few tools:
• jigit-mkimage, a simple and very fast tool to reconstruct image files from
.jigdo and .template files. It doesn't have any logic to cope with
downloading missing files, but will list the missing files that are needed.
It is also much faster for people (like me!) who already have full local
mirrors.
• parallel-sums is a simple extra utility to generate checksums quickly and
efficiently, reading file data only once and calculating checksums using
multiple algorithms in parallel using threads.
• jigsum, jigsum-sha256 and rsyncsum are checksum tools which will output
checksums in jigdo's base64-like format rather than the normal hexadecimal
format. Useful for debugging jigdo issues.
• jigdump is a tool to dump the contents of a jigdo template or .iso.tmp
file. Useful for debugging jigdo issues.
• mkjigsnap is a utility to help with maintaining the "snapshots" that jigdo
needs if you're going to be keeping data around for users in the long term.
We use this on some Debian systems.
Why the "jigit" name? The packages and source are named jigit to match the name
of a long-dead wrapper script.
----------------------------------------------------------------------
Download
--------
The jigit source package (and hence the various binary packages it builds) is
included in the main Debian archive, so your best bet is to get binary packages
from there. Check for the current version(s) using tracker.debian.org.
Source and backported versions are in the download area [1] alongside
the current ChangeLog. All the files for download are PGP-signed for
safety. You can find my keys online if you need them [2].
jigit is maintained in git [3].
[1] https://www.einval.com/~steve/software/JTE/download/
[2] https://www.einval.com/~steve/pgp/
[3] https://git.einval.com/cgi-bin/gitweb.cgi?p=jigit.git
----------------------------------------------------------------------
How to use JTE
--------------
To use the jigdo creation code, specify the location of the output .jigdo and
.template files alongside the ISO image. You can also specify the minimum size
beneath which files will just be dropped into the binary template file data
rather than listed as separate files to be found on the mirror, and exclude
patterns to ignore certain files in the same way. And paths in the original
filesystem can be mapped onto more global namespaces using the [Servers]
section in the .jigdo file. For example:
  genisoimage -J -r -o /home/steve/test1.iso \
      -jigdo-jigdo /home/steve/test1.jigdo \
      -jigdo-template /home/steve/test1.template \
      -jigdo-min-file-size 16384 \
      -jigdo-ignore "README*" \
      -jigdo-force-md5 "/pool/" \
      -jigdo-map Debian=/mirror/debian \
      -md5-list /home/steve/md5.list \
      /mirror/jigdo-test
If the -jigdo-* options are not used, the normal genisoimage execution path is
not affected at all. The above invocation will create 3 output files (.iso,
.jigdo and .template). Multiple -jigdo-ignore and -jigdo-map options are
accepted, for multiple ignore and map patterns.
Use the -md5-list option to specify the location of a list of files and their
md5sums in normal md5sum format. genisoimage will then compare the checksum of
each file it is asked to write against the checksum of that file in the list.
It will abort on any mismatches. The MD5 list file must list all the files that
are expected to be found and listed in the output .jigdo file. The
-jigdo-force-md5 option specifies a path where all files are expected to have
an MD5 entry (e.g. /pool/). Then if any files do not have a match, they must
have been corrupted and genisoimage will abort.
----------------------------------------------------------------------
How JTE works
-------------
I hooked all the places in genisoimage where it will normally write image data.
All the normal data write calls (directory entries etc.) I simply copy through
and build into the template file. Any file data entries are instead passed
through with information about the original file. If that file is large enough
(see -jigdo-min-file-size above), I grab the filename and the MD5 of the file's
data. If that MD5, size and length match an entry in the md5-list, I can just
write a file match record into the template file (and then the jigdo file)
instead of the file data itself.
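As a rough illustration of that per-file decision (this is not the JTE code
itself), the sketch below classifies a single file. It assumes GNU md5sum is
available and uses a hypothetical list whose lines start with a hex MD5
followed by whitespace. The function name classify_file and the simplified
matching on MD5 alone are assumptions made for the example.

```shell
#!/bin/sh
# Illustration only: the per-file decision described above, not JTE's code.
# Prints "match" if the file could be written as a file match record,
# or "data" if its contents would be copied into the template.
classify_file() {
    file=$1
    md5_list=$2    # hypothetical list: each line starts with a hex MD5
    min_size=$3    # plays the role of -jigdo-min-file-size

    # Arithmetic expansion strips any padding that wc may print.
    size=$(($(wc -c < "$file")))
    if [ "$size" -lt "$min_size" ]; then
        echo data
        return
    fi

    # Look the file's MD5 up in the list; anything unlisted stays as data.
    sum=$(md5sum "$file" | cut -d' ' -f1)
    if grep -q "^$sum[[:space:]]" "$md5_list"; then
        echo match
    else
        echo data
    fi
}
```

Small files and files that are not in the list both end up as "data", which
mirrors how they are simply dropped into the template.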
----------------------------------------------------------------------
How to use jigit-mkimage
------------------------
jigit-mkimage is a faster, more minimal version of "jigdo-file make-image",
written in portable C. It takes a few options:
┌──────────────────┬──────────────────────────────────────────────────────────┐
│-f <MD5 file>     │Specify a file containing MD5sums for files we should     │
│                  │attempt to use when rebuilding the image                  │
├──────────────────┼──────────────────────────────────────────────────────────┤
│-j <jigdo file>   │Specify the input jigdo file                              │
├──────────────────┼──────────────────────────────────────────────────────────┤
│-t <template file>│Specify the input template file                           │
├──────────────────┼──────────────────────────────────────────────────────────┤
│-m <item=path>    │Map <item> to <path> to find the files in the mirror      │
├──────────────────┼──────────────────────────────────────────────────────────┤
│-M <Missing file> │Don't attempt to build the image; just verify that all the│
│                  │components needed are available. If some are missing,     │
│                  │list them in the specified file.                          │
├──────────────────┼──────────────────────────────────────────────────────────┤
│-v                │Make the output logging more verbose.                     │
├──────────────────┼──────────────────────────────────────────────────────────┤
│-l <log file>     │Specify a logfile. If not specified, will log to stderr   │
│                  │just like genisoimage                                     │
├──────────────────┼──────────────────────────────────────────────────────────┤
│-q                │Don't bother checking md5sums of the input files, or of   │
│                  │the output image.                                         │
│                  │WARNING: this may lead to corrupt images, but is much     │
│                  │faster.                                                   │
├──────────────────┼──────────────────────────────────────────────────────────┤
│-s <start offset> │Specify where to start in the image (in bytes). If not    │
│                  │specified, will start at the beginning (offset 0). Added  │
│                  │for iso-image.pl use                                      │
├──────────────────┼──────────────────────────────────────────────────────────┤
│-e <end offset>   │Specify where to end in the image (in bytes). If not      │
│                  │specified, will run all the way to the end of the image.  │
│                  │Added for iso-image.pl use                                │
├──────────────────┼──────────────────────────────────────────────────────────┤
│-z                │Don't attempt to reassemble the image; simply parse the   │
│                  │image descriptor in the template file and print the image │
│                  │size. Added for iso-image.pl use                          │
└──────────────────┴──────────────────────────────────────────────────────────┘
Specifying a start or end offset implies -q - it's difficult to check MD5 sums
if the full image is not generated!
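A small wrapper around the -M check might look like the sketch below. It only
uses options from the table above; the function name check_image_parts and all
paths are hypothetical. Because the exit status of jigit-mkimage in -M mode is
not described here, the sketch decides mainly on whether the missing-file list
ends up non-empty.

```shell
#!/bin/sh
# Sketch only: report whether all components needed for an image are present.
# check_image_parts and every path passed to it are hypothetical names.
check_image_parts() {
    jigdo=$1       # input jigdo file (-j)
    template=$2    # input template file (-t)
    mapping=$3     # item=path mapping (-m), e.g. Debian=/mirror/debian
    missing=$4     # file to receive the list of missing components (-M)

    # Remove any stale list from a previous run before checking again.
    rm -f "$missing"
    status=0
    jigit-mkimage -j "$jigdo" -t "$template" -m "$mapping" -M "$missing" \
        || status=$?

    # Assumption: a non-empty list means something is missing.
    if [ -s "$missing" ]; then
        echo "missing components:"
        cat "$missing"
        return 1
    fi
    if [ "$status" -ne 0 ]; then
        echo "jigit-mkimage failed with exit status $status" >&2
        return 2
    fi
    echo "all components available"
}

# Example call (paths are placeholders):
#   check_image_parts test1.jigdo test1.template Debian=/mirror/debian missing.list
```

Running a check like this before a full rebuild avoids starting a long image
assembly that cannot complete.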
----------------------------------------------------------------------
(Dead) experiments
------------------
I had extra plans for JTE that never really came to fruition due to a lack of
time and energy... :-/ Check git history if you're interested.
* iso-image.pl - on-the-fly rebuild of ISO images for HTTP
iso-image.pl was a small Perl wrapper script written to drive jigit-mkimage
and turn it into a CGI. It would parse the incoming request (including
byte-ranges) and call jigit-mkimage to generate the image pieces wanted.
This code worked, but was always too slow for production use. Each CGI request
needed to index into the ISO image independently, leading to lots and lots of
overlapping calls to decompress the template data.
* jigdoofus - a better way to do on-the-fly assembly
I started on a new project, creating a FUSE-based filesystem that would rebuild
ISOs on the fly. I decided to use a database backend and a caching system to
solve the problem of the repetitive decompression that stopped iso-image.pl. I
made some progress, but ran out of steam. Code is still in the "jigdoofus"
branch in git in case anybody ever finds it useful.
* jigit - a friendly wrapper for jigit-mkimage
Similarly to the jigdo-lite script in the jigdo package, I wanted to provide a
nicer user experience for easy downloading of Debian and Ubuntu CD images. It
worked, but never really gained much traction. It needed much more effort to
make things reliable for production use.
----------------------------------------------------------------------
External integration
--------------------
* debian-cd
The debian-cd package in Debian is what we use to generate installer CDs and
DVDs. It has supported JTE since 2005, and we still use it every day.
* cdrkit/genisoimage
genisoimage in Debian shipped with integrated JTE code for a long time, but is
basically dead upstream. Not recommended for use any more.
* xorriso
xorriso uses libjte to generate jigdo and template files, and has worked this
way since 2010.
----------------------------------------------------------------------
What's left to do?
------------------
1. Testing! :-) This is where you lot come in! Please play with this some more
and let me know if you have any problems, especially with data corruption.
2. More documentation.