eMule uses various means to make sure that files in the network are
shared and downloaded without errors. Should an error occur, called
corruption, eMule has advanced functions to correct this corruption
with a minimum of additional data to redownload.
File Hash and ICH - Intelligent Corruption Handling
File Hash, Part Hashes & Hashset
For every file shared in the network an unique identification value is
created using the mathematical crypto algorithm MD4. This value is
called file hash and is contained in every standard eD2k link, e.g.
ed2k://|file|name|12043984|6744FC42EDA527B27F0B2F2538728B3E|/
where 6744FC42EDA527B27F0B2F2538728B3E is the file hash making this
file uniquely identified all over the network.
This File Hash is calculated by dividing the entire file into
parts of 9.28 MB. For each of the parts a Part Hash is calculated using
the same MD4 algorithm. These Part Hashes, called
Hashset, are then used to calculate the final File Hash. For
example a 600 MB file would be divided into 65 parts each with its own
Part Hash which are then used to create the final File
Hash.
To make sure that eMule always receives the correct Hashset a special
link can be created containing this, e.g.
ed2k://|file|name|12043984|6744FC42EDA527B27F0B2F2538728B3E|p=264E6F6B587985D87EB0157A2A7BAF40:17B9A4D1DCE0E4C2B672DF257145E98A|/
where the p= value denotes the Hashset. Each Part
Hash is divided by a ":". This file has a size of 12043984 Bytes
(=11.49 MB) which means it has one full 9.28 part and the rest left to
11.49 MB resulting in two Part Hashes.
ICH Intelligent Corruption Handling
Whenever eMule finishes such a part it will be checked if the
downloaded data matches the Part Hash value for this finished part. If
yes, this part is offered for upload to help spreading it.
If no, a corruption has occurred and the part has to be downloaded
again. To avoid downloading the full 9.28 MB, ICH redownloads 180 KB
from the beginning of the part and then check the whole part again to
see if the Part Hash is now correct. If not, the next 180 KB are
downloaded, check again etc. until the part hash is correct again. In
the best case eMule has to download only 180 KB again if the corruption
is right at the beginning of the part. In the worst case the entire
part has to be downloaded again if the corruption is somewhere near the
end of the part. On average ICH saves 50% of redownloading in case of
part corruptions.
AICH - Advanced Intelligent Corruption Handling
The standard ICH is pretty effective although it has its limitations as
only the whole 9.28 MB can be verified and no finer blocks. If more
than one position is corrupted or if malicious clients spread corrupted
data over and over again or even fake an entire Part Hash, ICH
is no long effective.
Here AICH will care for complete data integrity with a minimum cost of
redownloading or overhead by creating much finer hashes.
Root Hash, Block Hashes & AICH Hashset
This time our starting point are the 9.28 MB parts in a file. Each part
is divided into 180 KB blocks, resulting in 53 blocks per part and for
each block a hash value is calculated using the SHA1 hash algorithm.
These values are called Block Hashes and form the lowest level
of a complete AICH Hashset.
The image above shows how such a full hash tree is build upon the
blocks of a full 4 parts file. Each part contains 53 blocks resulting
in 212 Block Hashes which builds up a hash tree of further 7
Levels until the Root Hash is reached. The entire tree is
called AICH Hashset.
The green and yellow dots show the mathematical dependencies of the
smallest Block Hash to the Root Hash. This means if
we have a trusted Root Hash the entire tree can be verified against
it.
eMule can create links containing the Root Hash, e.g.
ed2k://|file|name|12043984|6744FC42EDA527B27F0B2F2538728B3E|h=A2NWOTYURUU3P3GCUB6KCNW3FTYYELQB|/
where h= is the Root Hash. For releases this value should be
included as it significantly improves the corruption resistance of the
file by providing a trusted Root Hash. Read Trusting the
Root Hash
Recovering from a corruption
Whenever eMule detects a corruption in a part it requests a Recovery
Packet from a random client with a complete AICH Hash Set. This
Recovery Packet contains all 53 Block Hashes of the corrupted
part and a number of Verifying Hashes of the entire hash tree.
The image above shows a Recovery Packet for a 4 parts file. The number
of Verifying Hashes is determined by the part count of the
file (2^x >= 'part count', with x = Number of Verifying Hashes).
After receiving the Recovery Packet eMule checks the Verifying
Hashes against its trusted Root Hash. If they match, eMule checks
all 53 blocks of the corrupted part against the Block Hashes
from the Recovery Packet. AICH then restores all blocks which match
against their Block Hash to leave only the corrupted block(s)
for redownload.
In the Log a successful recovery will look like:
09.09.2004 02:43:43: Downloaded part 6 is corrupt ([file])
09.09.2004 02:43:46: AICH successfully recovered 8.22 MB of 9.28 MB
from part 6 for [file]
Trusting the Root Hash
The best thing is downloading from a link with a Root Hash.
Assuming that the source of this link is trustworthy the Root Hash is
trusted at once and saved to disk for this file.
If no Root Hash is provided in the link eMule has to trust the Root
Hash, which the sources for the file send. It only trusts a Root
Hash if at least 10 different sources send the same value and if
at least 92% of all sources agree to this value. Because this Root
Hash is not as trustworthy it is only valid for the current
session and does not get saved nor can links with Root Hash be
created.
Once eMule has built an entire AICH Hashset, i.e. the file is
finished, it starts propagating the Root Hash to other
clients.
Notes:
- New releases or rare files will probably not have enough full
sources to generate a trusted Root Hash. It is recommended to
Release files with attached hash.
- If there is no Root Hash or even a faked one eMule will be
able to successfully download and finish the file under normal
conditions. The AICH feature cannot be used in this case.
- As AICH Hashsets can be very large they are not stored in memory
but in the known2.met and are only read upon demand.
- AICH will only be effective for eMule clients v.44a and above but
retains backward compatibility to older clients.
Applies to: v.44a+
Update on: 2004-09-11 by Monk
|