torrentcheck - catalog a .torrent file and optionally verify content hashes Usage: torrentcheck -t torrent-file [-p content-path] [-n] [-h] [-c] [-d] Options: -n suppresses progress count, -h shows all hash values, -c or -d uses comma or dot formatted byte counts. Returns 0 if successful, nonzero return code if errors found. Option: -sha1 [optional hash] acts as a simple SHA1 filter. If -sha1 is followed by a hex hash, the return code will be zero on match and nonzero otherwise. Summary ------- This program is a command-line utility to catalog and verify torrent files. Run with only the -t option, it displays the metadata, name, and size of each file in the torrent. Run with the -t and -p options, it computes the hashes of all files in the torrent, compares them against the hashes stored in the metadata, and warns of any errors. If torrentcheck returns "torrent is good" at the end of its output, every byte of every file in the torrent is present and correct, to a high degree of certainty as explained below. For example, if you run torrents on a fast external server and then download the files, this utility will verify that the files you received are complete and uncorrupted. It can also be used to verify backups or to automatically check a series of torrents using scripting. The -t parameter should be the path to the .torrent metadata file. The -p path should point to the file or files. It can include or leave out the torrent name. The -n option suppresses the running count, which is useful if you are writing the output to a file. The -h option shows all piece hash values. The -c or -d options produce comma or dot formatted byte counts for readability. The -sha1 option disables torrent checking, and instead acts as a SHA1 filter. Most Windows machines do not have a SHA1 utility, so I included this mode as a convenience feature. It reads in binary data from standard input until end of file, and prints the SHA1 hash. If a SHA1 hash is provided on the command line, it will return 0 if the hashes match or nonzero if they do not. This mode should agree with the output of "openssl dgst -sha1" or "digest -a sha1" Examples -------- torrentcheck -t \torrents\ubuntu-10.10-desktop-i386.iso.torrent torrentcheck -t \torrents\ubuntu-10.10-desktop-i386.iso.torrent -p \download torrentcheck -t \torrents\ubuntu-10.10-desktop-i386.iso.torrent -p \download && echo good torrentcheck -t \torrents\ubuntu-10.10-desktop-i386.iso.torrent -p \download || echo bad torrentcheck -t \torrents\ubuntu-10.10-desktop-i386.iso.torrent -p \download\ubuntu-10.10-desktop-i386.iso torrentcheck -sha1 < \download\ubuntu-10.10-desktop-i386.iso torrentcheck -sha1 b28bbd742aff85d21b9ad96bb45b67c2d133be99 < \download\ubuntu-10.10-desktop-i386.iso && echo good (These are for Windows; use forward slashes in Unix/Linux) Automation and scripting ------------------------ Torrentcheck returns 0 in the Unix $? return code or Windows errorlevel if it successfully verifies a torrent, or nonzero return codes if it fails. If you have your torrents in \torrents and the downloaded files in \share, make a "bad" directory under \torrents, cd to \torrents, and run: (Windows) for %i in (*.torrent) do torrentcheck -t "%i" -p \share || move "%i" bad (Linux) for i in *.torrent; do torrentcheck -t "$i" -p /share || mv "$i" bad ; done This will check all the torrents, and move any that are not fully downloaded and correct into \torrents\bad. Run this command to generate a master list file with the contents of all your torrents. This file can be searched to find a particular file and which torrent it comes from. (Windows) for %i in (*.torrent) do torrentcheck -t "%i" >> masterlist.txt & echo. >> masterlist.txt (Linux) for i in *.torrent; do torrentcheck -t "$i" >> masterlist.txt ; echo >> masterlist.txt ; done Detailed description -------------------- BitTorrent is a file sharing system which uses a metadata file, usually with the .torrent extension, to identify a data file or group of files. Given the metadata file, a BitTorrent client can download and share the data files. It can also verify the integrity of the files. The metadata file uses an encoding scheme called "bencode" which can store integers, strings, lists, and key-value pairs. It can represent binary values without any escaping, so a bencoded string can be loaded into memory and parsed in place, without any decoding. Torrent metadata contains the names and sizes of all the files in the torrent, and also contains a series of SHA1 hashes on each piece of the data file or files. The piece size is specified in the metadata, ranging from 32KiB (32768) to 4MiB (4194304) in a sample of torrents. SHA1 is a complex error-checking code designed by the National Security Agency for the military Defense Messaging System. It inputs an arbitrarily long byte string and outputs a 20-byte check code. If any bit in the input changes, the check code will change. SHA1 is complex enough so that even by deliberate effort it is very difficult to find two strings with the same check code. The chance of this happening by accident is small enough to ignore. To check a single-file torrent, allocate a buffer equal to the "piece size" string in the metadata, open the input file identified by the "name" string or specified on the command line, and read in pieces one at a time. The last piece will likely be short; keep track of the number of bytes actually read. Hash each piece, and compare the hash code against the corresponding hash code in the metadata. Any mismatch is an error. To check a multiple-file torrent, allocate a buffer as above. Read files in order from the "files" list in the metadata and reconstruct the paths, where the "name" string may be the base directory. Read from each file in sequence into the buffer until the buffer is full or the last file has been read, then hash it and check against the list in the metadata. Any mismatch is an error. Hash pieces span multiple files, so a missing or corrupt file can cause the previous or next file to fail as well. In particular, a missing file usually causes the previous and next files to fail verification. This is an artifact of the torrent format, and there is no way to avoid it. Torrents often contain a large media file and a small descriptive text file. If the text file is missing, the media file usually cannot be verified. Torrentcheck also verifies the length of each file, and flags an error if the length is wrong even if the hash codes match. It is designed to handle files over 4GB on a 32-bit machine. The SHA1 implementation used by torrentcheck was written by David Ireland, AM Kuchling, and Peter Gutmann. The source code does not contain a copyright notice, and this file is widely used on the Internet. Compiling --------- There is no makefile. The required gcc lines are at the top of the torrentcheck.c source file. The major catch in compiling is making 64-bit file I/O work. It is tested on Windows, Linux, and Solaris, but you may have to experiment with compiler options to get 64-bit ftell and fseek working.