How Can I Efficiently Build a Huffman Tree for Huffman Encoding?

ktoz · Dec 16, 2007

Hi

I'm playing around with Huffman encoding and am having a major problem getting my head around how to efficiently build the tree. I Googled "huffman coding" and have a basic grasp of what needs to be done but the actual process of walking through a list of sorted values has turned into a big headache.

Assuming I start with a list of factored values and their rank, in either descending (or ascending) order, the algorithm I came up with takes almost 3 minutes on a dual core 2.16 ghz Macintosh. This is clearly ridiculous as it takes roughly half a second for the rest of my code to read the file and create the factored list of values and their frequencies.

I tried writing to the model http://en.wikipedia.org/wiki/Image:Huffman_tree_2.svg" but due to the large number of initial factors (3000+) I ran into a problem where the sum of the ranks grew so large over subsequent iterations that it didn't even fit in an unsigned long long value without truncation.

Here's a snippet showing a section of the sorted, pre-Huffman data I'm starting with

Code:

87 = ({rank = 87; word = hate; }, {rank = 87; word = GONZALEZ; }); 
88 = ({rank = 88; word = one; }); 
89 = ({rank = 89; word = mainstream; }, {rank = 89; word = "don't"; }); 
90 = (
	{rank = 90; word = like; }, 
	{rank = 90; word = "it's"; }, 
	{rank = 90; word = his; }
); 
91 = ({rank = 91; word = he; }, {rank = 91; word = all; }); 
92 = ({rank = 92; word = legislation; }); 
93 = (
	{rank = 93; word = by; }, 
	{rank = 93; word = more; }, 
	{rank = 93; word = had; }, 
	{rank = 93; word = report; }, 
	{rank = 93; word = at; }
); 
94 = ({rank = 94; word = being; }); 
95 = ({rank = 95; word = them; }); 
96 = ({rank = 96; word = right; }, {rank = 96; word = crimes; });

What this shows is an associative array sorted and grouped by rank of each word in the source file. For example, under the key "93" we see that the five words "by," "more," "had," "report," and "at" all have the same frequency in the document

From this, could someone help me out with a step by step on how to create the huffman tree?

Thanks for any help

Ken

ktoz · Dec 16, 2007

Never mind. I figured it out

Turns out I wasn't properly grouping values after adding their ranks. I was just changing their rank without combining symbol pairs into a new node.

Got the total encoding time down to .5 seconds for a 90K file. 5 seconds for a 10 MB file etc... Much better.

How Can I Efficiently Build a Huffman Tree for Huffman Encoding?

Thread 'For those who ask: "What programming language should I learn?"'

Similar threads

How to increase phone signal strength by lying about it

A Crisis for Newly Minted CompSci Majors -- entry level jobs gone

Who is responsible for the software when AI takes over programming?

How to calculate Tension for a series of connected points?

Learning Assembly and computer architecture for x86

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight

Insights Relativator (Circular Slide-Rule): Simulated with Desmos - Insight

Insights Fixing Things Which Can Go Wrong With Complex Numbers