423 lines
24 KiB
Plaintext
423 lines
24 KiB
Plaintext
README for JTE 1.22
|
|
|
|
Steve McIntyre <steve@einval.com>
|
|
20 November 2019
|
|
|
|
License - GPL v2+. See the file COPYING for more details.
|
|
|
|
JTE - Jigdo Template Export
|
|
===========================
|
|
|
|
Introduction - jigdo and JTE
|
|
----------------------------
|
|
|
|
Jigdo is a useful tool to help in the distribution of large files like CD and
|
|
DVD images. See Richard Atterer's site for more details. Debian CDs and DVD ISO
|
|
images are published on the web in jigdo format to allow end users to download
|
|
them more efficiently.
|
|
|
|
Jigdo is generic and powerful - it can be used for any large files that are
|
|
made up of smaller files. However, to be this generic is costly. Creating jigdo
|
|
files from ISO images is quite inefficient - to work out which files are
|
|
included in the ISO image, jigdo has to calculate and compare checksums of
|
|
every possible file and every extent in the image. Essentially it has to
|
|
brute-force the image. It can take a long time to do this for a large image
|
|
(imagine a 4.5GB DVD image or a 30+GB Blu-Ray image).
|
|
|
|
I first started looking for ways to improve this back in 2004:
|
|
|
|
1. Modify jigdo so it knew about the internals of ISO images and could
|
|
efficiently scan them (bad, not very generic for jigdo)
|
|
2. Write a helper tool to dump extra information for jigdo to use alongside
|
|
the ISO image (I had a helper tool written, but modifying jigdo to use this
|
|
looked HARD)
|
|
3. Patch mkisofs/genisoimage to write .jigdo and .template files alongside the
|
|
ISO image
|
|
|
|
I completed the third of these options, and called it JTE (or Jigdo Template
|
|
Export). The code worked fine, and ran in a very small fraction of the time
|
|
taken to run genisoimage and jigdo separately. The output .jigdo and .template
|
|
files worked correctly, i.e. jigdo-file and the wrapper script jigdo-mirror
|
|
accept them and would generate an ISO image that exactly matches the original.
|
|
|
|
Debian used that code for a number of years within genisoimage, but we've since
|
|
switched over to using xorriso instead for our image building instead. It has a
|
|
lot of useful features that we want compared to genisoimage, not least a
|
|
friendly and engaged author in Thomas Scmitt!
|
|
|
|
Thomas and I and George Danchev worked together to package up my JTE code into
|
|
libjte such that xorriso could use it effectively. Xorriso has been capable of
|
|
generating jigdo files since 2010.
|
|
|
|
In late 2019, I took over maintenance of the jigdo upstream code and added
|
|
support for a new (v2) jigdo data format, using SHA256 instead of MD5
|
|
internally. See my jigdo page for more details about that. I have also updated
|
|
the JTE codebase to support this new format, of course.
|
|
|
|
As genisoimage is effectively dead at this point, I took the decision to not
|
|
add the jigdo v2 support into the genisoimage codebase. If you need to generate
|
|
jigdo v2 format, either use jigdo itself or xorriso if you'd like the
|
|
performance benefit of the libjte integration.
|
|
|
|
JTE includes a few tools:
|
|
|
|
• jigit-mkimage, a simple and very fast tool to reconstruct image files from
|
|
.jigdo and .template files. It doesn't have any logic to cope with
|
|
downloading missing files, but will list the missing files that are needed.
|
|
It is also much faster for people (like me!) who already have full local
|
|
mirrors.
|
|
• parallel-sums is a simple extra utility to generate checksums quickly and
|
|
efficiently, reading file data only once and calculating checksums using
|
|
multiple algorithms in parallel using threads.
|
|
• jigsum, jigsum-sha256 and rsyncsum are checksum tools which will output
|
|
checksums in jigdo's base64-like format rather than the normal hexadecimal
|
|
format. Useful for debugging jigdo issues.
|
|
• jigdump is a tool to dump the contents of a jigdo template or .iso.tmp
|
|
file. Useful for debugging jigdo issues.
|
|
• mkjigsnap is a utility to help with maintaining the "snapshots" that jigdo
|
|
needs if you're going to be keeping data around for users in the long term.
|
|
We use this on some Debian systems.
|
|
|
|
Why the "jigit" name? The packages and source are named jigit to match the name
|
|
of a long-dead wrapper script.
|
|
|
|
----------------------------------------------------------------------
|
|
|
|
Download
|
|
--------
|
|
|
|
The jigit source package (and hence the various binary packages it builds) is
|
|
included in the main Debian archive, so your best bet is to get binary packages
|
|
from there. Check for the current version(s) using tracker.debian.org).
|
|
|
|
Source and backported versions are in the download area [1] alongside
|
|
the current ChangeLog. All the files for download are PGP-signed for
|
|
safety. You can find my keys online if you need them [2].
|
|
|
|
jigit is maintained in git [3].
|
|
|
|
[1] https://www.einval.com/~steve/software/JTE/download/
|
|
[2] https://www.einval.com/~steve/pgp/
|
|
[3] https://git.einval.com/cgi-bin/gitweb.cgi?p=jigit.git.
|
|
|
|
----------------------------------------------------------------------
|
|
|
|
How to use JTE
|
|
|
|
To use the jigdo creation code, specify the location of the output .jigdo and
|
|
.template files alongside the ISO image. You can also specify the minimum size
|
|
beneath which files will just be dropped into the binary template file data
|
|
rather than listed as separate files to be found on the mirror, and exclude
|
|
patterns to ignore certain files in the same way. And paths in the original
|
|
filesystem can be mapped onto more global namespaces using the [Servers]
|
|
section in the .jigdo file. For example:
|
|
|
|
genisoimage -J -r -o /home/steve/test1.iso \
|
|
-jigdo-jigdo /home/steve/test1.jigdo \
|
|
-jigdo-template /home/steve/test1.template \
|
|
-jigdo-min-file-size 16384 \
|
|
-jigdo-ignore "README*" \
|
|
-jigdo-force-md5 "/pool/" \
|
|
-jigdo-map Debian=/mirror/debian \
|
|
-md5-list /home/steve/md5.list \
|
|
/mirror/jigdo-test
|
|
|
|
If the -jigdo-* options are not used, the normal genisoimage execution path is
|
|
not affected at all. The above invocation will create 3 output files (.iso,
|
|
.jigdo and .template). Multiple -jigdo-ignore and -jigdo-map options are
|
|
accepted, for multiple ignore and map patterns.
|
|
|
|
Use the -md5-list option to specify the location of a list of files and their
|
|
md5sums in normal md5sum format. genisoimage will then compare the checksum of
|
|
each file it is asked to write against the checksum of that file in the list.
|
|
It will abort on any mismatches. The MD5 list file must list all the files that
|
|
are expected to be found and listed in the output .jigdo file. The
|
|
-jigdo-force-md5 option specifies a path where all files are expected to have
|
|
an MD5 entry (e.g. /pool/). Then if any files do not have a match, they must
|
|
have been corrupted and genisoimage will abort.
|
|
|
|
----------------------------------------------------------------------
|
|
|
|
How JTE works
|
|
|
|
I hooked all the places in genisoimage where it will normally write image data.
|
|
All the normal data write calls (directory entries etc.) I simply copy through
|
|
and build into the template file. Any file data entries are instead passed
|
|
through with information about the original file. If that file is large enough
|
|
(see -jigdo-min-file-size above), I grab the filename and the MD5 of the file's
|
|
data. If that MD5, size and length match an entry in the md5-list, I can just
|
|
write a file match record into the template file (and then the jigdo file)
|
|
instead of the file data itself.
|
|
|
|
----------------------------------------------------------------------
|
|
|
|
How to use jigit-mkimage
|
|
|
|
jigit-mkimage is a faster, more minimal version of "jigdo-file make-image",
|
|
written in portable C. It takes a few options:
|
|
|
|
┌─────────┬─────────────────────────────────────────────────────────────┐
|
|
│-f <MD5 │Specify a file containing MD5sums for files we should attempt to │
|
|
│file> │use when rebuilding the image │
|
|
├─────────┼─────────────────────────────────────────────────────────────┤
|
|
│-j <jigdo │Specify the input jigdo file │
|
|
│file> │ │
|
|
├─────────┼─────────────────────────────────────────────────────────────┤
|
|
│-t │ │
|
|
│<template │Specify the input template file │
|
|
│file> │ │
|
|
├─────────┼─────────────────────────────────────────────────────────────┤
|
|
│-m <item= │Map <item> to <path> to find the files in the mirror │
|
|
│path> │ │
|
|
├─────────┼─────────────────────────────────────────────────────────────┤
|
|
│-M │Don't attempt to build the image; just verify that all the │
|
|
│<Missing │components needed are available. If some are missing, list them in │
|
|
│file> │the specified file. │
|
|
├─────────┼─────────────────────────────────────────────────────────────┤
|
|
│-v │Make the output logging more verbose. │
|
|
├─────────┼─────────────────────────────────────────────────────────────┤
|
|
│-l <log │Specify a logfile. If not specified, will log to stderr just like │
|
|
│file> │genisoimage │
|
|
├─────────┼─────────────────────────────────────────────────────────────┤
|
|
│ │Don't bother checking md5sums of the input files, or of the output │
|
|
│-q │image. │
|
|
│ │WARNING: this may lead to corrupt images, but is much faster. │
|
|
├─────────┼─────────────────────────────────────────────────────────────┤
|
|
│-s <start │Specify where to start in the image (in bytes). If not specified, │
|
|
│offset> │will start at the beginning (offset 0). Added for iso-image.pl use │
|
|
├─────────┼─────────────────────────────────────────────────────────────┤
|
|
│-e <end │Specify where to end in the image (in bytes). If not specified, │
|
|
│offset> │will run all the way to the end of the image. Added for │
|
|
│ │iso-image.pl use │
|
|
├─────────┼─────────────────────────────────────────────────────────────┤
|
|
│ │Don't attempt to reassemble the image; simply parse the image │
|
|
│-z │descriptor in the template file and print the image size. Added for│
|
|
│ │iso-image.pl use │
|
|
└─────────┴─────────────────────────────────────────────────────────────┘
|
|
|
|
Specifying a start or end offset implies -q - it's difficult to check MD5 sums
|
|
if the full image is not generated!
|
|
|
|
----------------------------------------------------------------------
|
|
|
|
(Dead) experiments
|
|
------------------
|
|
|
|
I had extra plans for JTE that never really came to fruition due to a lack of
|
|
time and energy... :-/ Check git history if you're interested.
|
|
|
|
* iso-image.pl - on-the-fly rebuild of ISO images for HTTP
|
|
|
|
iso-image.pl was a small perl wrapper script written to drive mkimage and turn
|
|
it into a CGI. It would parse the incoming request (including byte-ranges) and
|
|
call jigit-mkimage to actually generate the image pieces wanted.
|
|
|
|
This code worked, but was always too slow for production use. Each CGI request
|
|
needed to index into the ISO image independently, leading to lots and lots of
|
|
overlapping calls to decompress the template data.
|
|
|
|
* jigdoofus - a better way to do on-the-fly assembly
|
|
|
|
I started on a new project, creating a FUSE-based filesystem that would rebuild
|
|
ISOs on the fly. I decided to use a database backend and a caching system to
|
|
solve the problem of the repetitive decompression that stopped iso-image.pl. I
|
|
made some progress, but ran out of steam. Code is still in the "jigdoofus"
|
|
branch in git in case anybody ever finds it useful.
|
|
|
|
* jigit - a friendly wrapper for jigit-mkimage
|
|
|
|
Similarly to the jigdo-lite script in the jigdo package, I wanted to provide a
|
|
nicer user experience for easy downloading of Debian and Ubuntu CD images. It
|
|
worked, but never really gained much traction. It needed much more effort to
|
|
make things reliable for production use.
|
|
|
|
----------------------------------------------------------------------
|
|
|
|
External integration
|
|
--------------------
|
|
|
|
* debian-cd
|
|
|
|
The debian-cd package in Debian is what we use to generate installer CDs and
|
|
DVDs. It has supported JTE since 2005, and we still use it every day.
|
|
|
|
* cdrkit/genisoimage
|
|
|
|
genisoimage in Debian shipped with integrated JTE code for a long time, but is
|
|
basically dead upstream. Not recommended for use any more.
|
|
|
|
* xorriso
|
|
|
|
xorriso uses libjte to generate jigdo and template files, and has worked this
|
|
way since 2010.
|
|
|
|
----------------------------------------------------------------------
|
|
|
|
What's left to do?
|
|
------------------
|
|
|
|
1. Testing! :-) This is where you lot come in! Please play with this some more
|
|
and let me know if you have any problems, especially with data corruption.
|
|
2. More documentation.
|
|
|
|
|
|
|
|
|
|
----------------------------------------------------------------------
|
|
|
|
How to use JTE
|
|
|
|
To use the jigdo creation code, specify the location of the output .jigdo and
|
|
.template files alongside the ISO image. You can also specify the minimum size
|
|
beneath which files will just be dropped into the binary template file data
|
|
rather than listed as separate files to be found on the mirror, and exclude
|
|
patterns to ignore certain files in the same way. And paths in the original
|
|
filesystem can be mapped onto more global namespaces using the [Servers]
|
|
section in the .jigdo file. For example:
|
|
|
|
genisoimage -J -r -o /home/steve/test1.iso \
|
|
-jigdo-jigdo /home/steve/test1.jigdo \
|
|
-jigdo-template /home/steve/test1.template \
|
|
-jigdo-min-file-size 16384 \
|
|
-jigdo-ignore "README*" \
|
|
-jigdo-force-md5 "/pool/" \
|
|
-jigdo-map Debian=/mirror/debian \
|
|
-md5-list /home/steve/md5.list \
|
|
/mirror/jigdo-test
|
|
|
|
If the -jigdo-* options are not used, the normal genisoimage execution path is
|
|
not affected at all. The above invocation will create 3 output files (.iso,
|
|
.jigdo and .template). Multiple -jigdo-ignore and -jigdo-map options are
|
|
accepted, for multiple ignore and map patterns.
|
|
|
|
Use the -md5-list option to specify the location of a list of files and their
|
|
md5sums in normal md5sum format. genisoimage will then compare the checksum of
|
|
each file it is asked to write against the checksum of that file in the list.
|
|
It will abort on any mismatches. The MD5 list file must list all the files that
|
|
are expected to be found and listed in the output .jigdo file. The
|
|
-jigdo-force-md5 option specifies a path where all files are expected to have
|
|
an MD5 entry (e.g. /pool/). Then if any files do not have a match, they must
|
|
have been corrupted and genisoimage will abort.
|
|
|
|
----------------------------------------------------------------------
|
|
|
|
How JTE works
|
|
|
|
I hooked all the places in genisoimage where it will normally write image data.
|
|
All the normal data write calls (directory entries etc.) I simply copy through
|
|
and build into the template file. Any file data entries are instead passed
|
|
through with information about the original file. If that file is large enough
|
|
(see -jigdo-min-file-size above), I grab the filename and the MD5 of the file's
|
|
data. If that MD5, size and length match an entry in the md5-list, I can just
|
|
write a file match record into the template file (and then the jigdo file)
|
|
instead of the file data itself.
|
|
|
|
----------------------------------------------------------------------
|
|
|
|
How to use jigit-mkimage
|
|
|
|
jigit-mkimage is a faster, more minimal version of "jigdo-file make-image",
|
|
written in portable C. It takes a few options:
|
|
|
|
┌─────────┬───────────────────────────────────────────────────────────────────┐
|
|
│-f <MD5 │Specify a file containing MD5sums for files we should attempt to │
|
|
│file> │use when rebuilding the image │
|
|
├─────────┼───────────────────────────────────────────────────────────────────┤
|
|
│-j <jigdo│Specify the input jigdo file │
|
|
│file> │ │
|
|
├─────────┼───────────────────────────────────────────────────────────────────┤
|
|
│-t │ │
|
|
│<template│Specify the input template file │
|
|
│file> │ │
|
|
├─────────┼───────────────────────────────────────────────────────────────────┤
|
|
│-m <item=│Map <item> to <path> to find the files in the mirror │
|
|
│path> │ │
|
|
├─────────┼───────────────────────────────────────────────────────────────────┤
|
|
│-M │Don't attempt to build the image; just verify that all the │
|
|
│<Missing │components needed are available. If some are missing, list them in │
|
|
│file> │the specified file. │
|
|
├─────────┼───────────────────────────────────────────────────────────────────┤
|
|
│-v │Make the output logging more verbose. │
|
|
├─────────┼───────────────────────────────────────────────────────────────────┤
|
|
│-l <log │Specify a logfile. If not specified, will log to stderr just like │
|
|
│file> │genisoimage │
|
|
├─────────┼───────────────────────────────────────────────────────────────────┤
|
|
│ │Don't bother checking md5sums of the input files, or of the output │
|
|
│-q │image. │
|
|
│ │WARNING: this may lead to corrupt images, but is much faster. │
|
|
├─────────┼───────────────────────────────────────────────────────────────────┤
|
|
│-s <start│Specify where to start in the image (in bytes). If not specified, │
|
|
│offset> │will start at the beginning (offset 0). Added for iso-image.pl use │
|
|
├─────────┼───────────────────────────────────────────────────────────────────┤
|
|
│-e <end │Specify where to end in the image (in bytes). If not specified, │
|
|
│offset> │will run all the way to the end of the image. Added for │
|
|
│ │iso-image.pl use │
|
|
├─────────┼───────────────────────────────────────────────────────────────────┤
|
|
│ │Don't attempt to reassemble the image; simply parse the image │
|
|
│-z │descriptor in the template file and print the image size. Added for│
|
|
│ │iso-image.pl use │
|
|
└─────────┴───────────────────────────────────────────────────────────────────┘
|
|
|
|
Specifying a start or end offset implies -q - it's difficult to check MD5 sums
|
|
if the full image is not generated!
|
|
|
|
----------------------------------------------------------------------
|
|
|
|
(Dead) experiments
|
|
|
|
I had extra plans for JTE that never really came to fruition due to a lack of
|
|
time and energy... :-/ Check git history if you're interested.
|
|
|
|
iso-image.pl - on-the-fly rebuild of ISO images for HTTP
|
|
|
|
iso-image.pl was a small perl wrapper script written to drive mkimage and turn
|
|
it into a CGI. It would parse the incoming request (including byte-ranges) and
|
|
call jigit-mkimage to actually generate the image pieces wanted.
|
|
|
|
This code worked, but was always too slow for production use. Each CGI request
|
|
needed to index into the ISO image independently, leading to lots and lots of
|
|
overlapping calls to decompress the template data.
|
|
|
|
jigdoofus - a better way to do on-the-fly assembly
|
|
|
|
I started on a new project, creating a FUSE-based filesystem that would rebuild
|
|
ISOs on the fly. I decided to use a database backend and a caching system to
|
|
solve the problem of the repetitive decompression that stopped iso-image.pl. I
|
|
made some progress, but ran out of steam. Code is still in the "jigdoofus"
|
|
branch in git in case anybody ever finds it useful.
|
|
|
|
jigit - a friendly wrapper for jigit-mkimage
|
|
|
|
Similarly to the jigdo-lite script in the jigdo package, I wanted to provide a
|
|
nicer user experience for easy downloading of Debian and Ubuntu CD images. It
|
|
worked, but never really gained much traction. It needed much more effort to
|
|
make things reliable for production use.
|
|
|
|
----------------------------------------------------------------------
|
|
|
|
External integration
|
|
|
|
debian-cd
|
|
|
|
The debian-cd package in Debian is what we use to generate installer CDs and
|
|
DVDs. It has supported JTE since 2005, and we still use it every day.
|
|
|
|
cdrkit/genisoimage
|
|
|
|
genisoimage in Debian shipped with integrated JTE code for a long time, but is
|
|
basically dead upstream. Not recommended for use any more.
|
|
|
|
xorriso
|
|
|
|
xorriso uses libjte to generate jigdo and template files, and has worked this
|
|
way since 2010.
|
|
|
|
----------------------------------------------------------------------
|
|
|
|
What's left to do?
|
|
|
|
1. Testing! :-) This is where you lot come in! Please play with this some more
|
|
and let me know if you have any problems, especially with data corruption.
|
|
2. More documentation.
|
|
|