Basics
File identification
All files are given a hash value. This hash is a combination
of numbers and letters that uniquely identifies the file.
Numerous filenames may be associated with a file, but this does
not change the file’s hash value. This allows each user
to find all sources for a particular file no matter what
filename each user has given the file.
In addition, files are broken into parts of 9.28 MB each.
Each part is also given a hash value. For example, a 600 MB
file would contain 65 parts. The file hash is then created
from these part hashes to be used in the networks.
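The part arithmetic and the two-level hashing can be sketched as follows. This is a minimal illustration, not the client's actual code: it assumes the 9.28 MB part size corresponds to 9,728,000 bytes, and it uses `hashlib.md5` purely as a stand-in hash function because the hash used on the real network is not available in every Python build. The function names are hypothetical.

```python
import hashlib

PART_SIZE = 9_728_000  # bytes, roughly 9.28 MB (assumed exact value)

def part_count(file_size: int, part_size: int = PART_SIZE) -> int:
    """Number of parts needed to cover file_size bytes (ceiling division)."""
    return -(-file_size // part_size)

def part_hashes(data: bytes, part_size: int = PART_SIZE) -> list:
    """Hash each consecutive part of the data separately (stand-in hash)."""
    return [hashlib.md5(data[i:i + part_size]).digest()
            for i in range(0, len(data), part_size)]

def file_hash(data: bytes, part_size: int = PART_SIZE) -> str:
    """Derive a single file hash from the concatenated part hashes."""
    return hashlib.md5(b"".join(part_hashes(data, part_size))).hexdigest()

print(part_count(600 * 1024 * 1024))  # 65 parts for a 600 MB file
```

Because the filename never enters the calculation, renaming a file leaves both the part hashes and the file hash unchanged.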
Identifying other clients
Like the file hash, each user in the network gets a unique
and permanent user hash. This user identification is highly
secured by a public / private key handshake to prevent misuse.
Downloading Data
It is important to understand that the actual downloading
in eMule is not affected by the choice of the network. The
network topology is only related to searching for files and
finding clients that are sources to a file.
Once a source has been found, your client contacts it. The
source then reserves a queue place for that specific download.
When you reach the first queue place after a certain waiting
time, you are entitled to receive data.
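The queue behaviour described above can be pictured with a small sketch. It assumes a strict first-come, first-served queue; the real client may weigh other factors when ordering the queue, and the `UploadQueue` class and its method names are hypothetical.

```python
from collections import deque

class UploadQueue:
    """A source's waiting list, modelled as a plain FIFO (an assumption)."""

    def __init__(self):
        self._waiting = deque()

    def request(self, client_id: str) -> int:
        """Reserve a queue place for the client and return its position (1 = first)."""
        if client_id not in self._waiting:
            self._waiting.append(client_id)
        return self._waiting.index(client_id) + 1

    def next_upload(self):
        """Remove and return the client in first place, or None if nobody waits."""
        return self._waiting.popleft() if self._waiting else None

queue = UploadQueue()
queue.request("client-a")
queue.request("client-b")
served = queue.next_upload()  # "client-a" now receives data
```

Once `client-a` has been served, `client-b` moves up to the first place and is next to receive data.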
Classic server based eD2k
Connecting to the network
The key to this network is the eD2k server. Each client must
be connected to a server to enter the network.
When connecting your client to a server, the server checks
to see if other clients can freely connect to your client.
If yes, the server assigns your client a so-called high ID.
If communication is blocked, the server assigns your client
a low ID.
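The reachability check can be sketched as a simple connect-back test. This is only an illustration of the idea, assuming the server decides by trying to open a TCP connection to the client's listening port; the function name `assign_id` and the timeout value are hypothetical.

```python
import socket

def assign_id(client_host: str, client_port: int, timeout: float = 2.0) -> str:
    """Return "HighID" if the client's port accepts a connection, else "LowID"."""
    try:
        # If the connection succeeds, other clients can reach this client freely.
        with socket.create_connection((client_host, client_port), timeout=timeout):
            return "HighID"
    except OSError:
        # Refused, filtered, or timed out: communication is blocked.
        return "LowID"
```

A client behind a firewall or router without port forwarding would typically fail this test and end up with a low ID.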
After the ID is assigned, eMule will send a list of all shared
files to the server. The server adds the filenames and hash
values you sent to its database.
Searching for files
Once connected to the network, the client can search for keywords
in filenames. A search can be either local or global. A local
search (which queries only the server you are connected to)
is quicker but returns fewer results. A global search (which
queries all the servers within the network) takes longer but
returns more results. Each server looks up the keyword in its
local database and returns any filenames (with their hash
values) that match the keyword.
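The difference between a local and a global search can be sketched with a toy server model. The `Server` class, the case-insensitive substring matching, and the example hashes are assumptions made for illustration; they do not describe how a real eD2k server indexes filenames.

```python
class Server:
    """Toy server database: file hash -> set of filenames published for it."""

    def __init__(self):
        self.files = {}

    def publish(self, file_hash: str, filename: str) -> None:
        self.files.setdefault(file_hash, set()).add(filename)

    def search(self, keyword: str) -> list:
        """Return (filename, hash) pairs whose filename contains the keyword."""
        kw = keyword.lower()
        return sorted((name, h) for h, names in self.files.items()
                      for name in names if kw in name.lower())

def local_search(connected: Server, keyword: str) -> list:
    """Ask only the server the client is connected to."""
    return connected.search(keyword)

def global_search(servers: list, keyword: str) -> list:
    """Ask every server in turn and merge the answers."""
    results = set()
    for server in servers:
        results.update(server.search(keyword))
    return sorted(results)
```

The global search returns the union of all answers, which is why it finds more results but has to wait for more servers.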
Finding sources for files
Downloads can be added through eMule’s search function or
through special eD2k links offered on many websites.
Once they are in the Download list, eMule first queries the
local (connected) server, then all other servers in the network,
for sources for that particular download. Each server looks
up the file’s hash value in its database and returns
the clients it knows to have the file.
Sources are other clients who have downloaded at least one
complete part (9.28 MB) of the file matching the hash.
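The lookup order (connected server first, then the rest) can be sketched as below. Each server database is modelled as a plain dictionary from file hash to a list of client addresses; that representation, the addresses, and the function name `find_sources` are hypothetical.

```python
def find_sources(file_hash: str, local_server: dict, other_servers: list) -> list:
    """Collect known sources for a hash, asking the local server first."""
    sources = []
    for server in [local_server, *other_servers]:
        for client in server.get(file_hash, ()):
            if client not in sources:  # the same client may be known to several servers
                sources.append(client)
    return sources

local = {"aa11": ["10.0.0.1:4662"]}
remote = {"aa11": ["10.0.0.2:4662", "10.0.0.1:4662"]}
sources = find_sources("aa11", local, [remote])
```

Only the hash is used for the lookup, so clients that share the file under a different filename are still found.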
Kademlia serverless network
Connecting to the network
The only thing needed to connect to this network is the IP
and port of any eMule client already connected. This is called
a Boot Strap.
Once a client is in the network, it asks other clients to determine
whether it can be contacted freely. This process is very similar to
the HighID/LowID check on the servers. If you can be freely contacted, you are
assigned an ID (similar to a HighID) and given an open status. If you
cannot be freely contacted, you are given a firewalled status. From
v.44a on, the Kademlia network supports a Buddy for firewalled users.
Buddies are other Kademlia clients who have an open status and work as
a relay for connections that the firewalled user cannot manage.
Searching in Kademlia
In this network it does not matter what you search for. Be
it a search for filenames, for sources of a download or for
other users, all work pretty much the same.
There are no servers to keep track of clients and the files
they share so it has to be done by each participating client
in the network – in essence, every client is also a
small server.
Since every client is identified by a unique hash value, the
idea of Kademlia is to associate a certain “responsibility”
based on this hash. Each client in the Kademlia network works
as a server for certain keywords or sources. The client’s
hash determines the specific keywords or sources.
So the goal of any kind of search is to find those clients
that have the responsibility for the current search topic.
This is accomplished by calculating the distance to the target
client and asking other clients for the shortest route to it.
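The lookup described above can be sketched as follows. Kademlia measures the distance between two IDs as their bitwise XOR; everything else here is a simplification for illustration: the IDs are tiny integers rather than real hashes, each client's contact list is a fixed set, and the `lookup` function is hypothetical.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two IDs: their bitwise XOR, read as a number."""
    return a ^ b

def lookup(start: int, target: int, contacts: dict) -> list:
    """Hop from client to client, each time moving to the known contact
    closest to the target, until no contact is closer than the current one."""
    current = start
    path = [current]
    while True:
        candidates = contacts.get(current, set()) | {current}
        best = min(candidates, key=lambda node: xor_distance(node, target))
        if best == current:
            return path  # the current client is the closest one it knows of
        current = best
        path.append(current)

# Toy network: client ID -> IDs of the clients it knows about.
contacts = {
    0b0001: {0b1000, 0b0100},
    0b0100: {0b0110},
    0b0110: {0b0111},
    0b0111: set(),
    0b1000: set(),
}
print(lookup(0b0001, 0b0111, contacts))  # [1, 4, 6, 7]
```

Every hop strictly reduces the XOR distance to the target, so the search always ends at the client that is closest among those reachable, which is the one treated as responsible for that keyword or source.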
Summary
Both networks take entirely different approaches to the same
goals: searching for files and finding sources for a file.
The main goal of the Kademlia network is to be independent
of servers and improve scalability. Servers can only handle
a certain number of users, and should a large server go down,
the network is severely handicapped.
Kademlia is self-organising and tunes itself for best possible
performance depending on the number of users and their connection
qualities. Therefore, it is more resistant to a large-scale
network loss.