How can we efficiently manage memory and disk usage in a BitTorrent client?

Yoss · Apr 10, 2006

My friend and I are developing a BitTorrent Client for our CS class, but we've run into a dilemma. If anyone can point us in the right direction we'll be thankful.

A little background info if you're unfamiliar with BT. The file that is to be downloaded from multiple peers is broken up into file_length/32K (+ 1) pieces (conventionally.) And typically BT clients do not choose the pieces in order to download (they use the rarest first algorithm.) That ought to be enough!

Anyway, the scheme we developed was to create a 2D byte array of size [num_pieces][piece_length]. So once we've successfully downloaded a piece from a Peer and checked the infohashes and such, we dump the piece into its position in the byte array. We realized the problem of seriously overloading memory because the array in the data part of memory. Obviously this is a terrible solution..imagine downloading a 2Gb file and it all being stored in system memory (or VM) until it is finished.

We developed a partial solution. Once the file is completed, we dump all the pieces to the output file. To note, we create a temporary .dat file when the user prematurely quits before the file is complete, we save meta info (e.g. amount downloaded, uploaded, etc) and the 2D byte array with the downloaded pieces. When the program is loaded again it loads the data through a hashmap, etc. and the byte array is loaded into memory ONLY if the file is not complete. This is what we still have to fix. When the file is complete, this condition is checked when the user quits and the byte array of pieces is dumped to the output file, but not the .dat file. What we came up with, when the program is loaded again and we are uploading to peers, when the peer requests a piece, we just scan the save file and extract the bytes for the piece from there. This works like a charm because we don't have to load all the pieces into memory, at most , 32KB*(num peers uploading to). Much better solution.

When the file is not done, and we are uploading to peers the pieces we have, we need to develop a way to get the pieces off of the disk. But we don't necessarily download the pieces in order, and we might not get all of the pieces with the current peers. If anyone can give us suggestions of how to dump a piece to disk once we get it, and be able to extract it upon request, given that the pieces do not come in order. And how to assemble the pieces in order when we dump it to the final file.

Sorry for the longwindedness! Thanks

dduardo · Apr 10, 2006

What all bittorrent clients I've seen do is pre-allocate the file on disk and then write the data to file when the chunk is completed. You can use fseek to offset the file pointer and write the chunk where you need to.

Also, I don't know if you know but there is a bittorrent library that you could have used:

http://libtorrent.rakshasa.no/

chroot · Apr 10, 2006

Do you not know how to use the seek() function?

- Warren

Yoss · Apr 10, 2006

dduardo said:

What all bittorrent clients I've seen do is pre-allocate the file on disk and then write the data to file when the chunk is completed. You can use fseek to offset the file pointer and write the chunk where you need to.

Also, I don't know if you know but there is a bittorrent library that you could have used:

http://libtorrent.rakshasa.no/

Thanks dduardo. I bet that would have been the solution we would come to . We are using the seek() function to extract the pieces from the 100% file, I just didn't think of pre-allocation, duh! I was thinking a solution of keeping track of the order of downloaded pieces, writing the pieces in the order downloaded, and rearranging them at the end. Your solution is definitely more natural (and easier.)

chroot, do I dare say that your reply approaching scorn? Funny feeling, is all.

How can we efficiently manage memory and disk usage in a BitTorrent client?

Thread 'Learning Assembly and computer architecture for x86'

Thread 'Learning data structures and algorithms in different programming languages'

Thread 'A Crisis for Newly Minted CompSci Majors -- entry level jobs gone'

Similar threads

Hot Threads

Hackathon ideas?

Touch-typing for programmers

How to calculate Tension for a series of connected points?

Trying To Debug A Python File

Python Complaining About Python

Recent Insights

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight

Insights Relativator (Circular Slide-Rule): Simulated with Desmos - Insight

Insights Fixing Things Which Can Go Wrong With Complex Numbers

Insights Fermat's Last Theorem

Insights Why Vector Spaces Explain The World: A Historical Perspective