How Can I Efficiently Build a Huffman Tree for Huffman Encoding?

  • Thread starter Thread starter ktoz
  • Start date Start date
  • Tags Tags
    Building Tree
Click For Summary
SUMMARY

The discussion focuses on efficiently building a Huffman Tree for Huffman encoding. The original algorithm took nearly 3 minutes on a dual-core 2.16 GHz Macintosh due to improper grouping of values after rank adjustments. After correcting the approach to combine symbol pairs into new nodes, the encoding time improved significantly to 0.5 seconds for a 90K file and 5 seconds for a 10 MB file. This highlights the importance of correctly implementing the tree-building process to optimize performance.

PREREQUISITES
  • Understanding of Huffman encoding principles
  • Familiarity with associative arrays and data structures
  • Basic programming skills in a language suitable for implementing algorithms
  • Knowledge of algorithmic efficiency and performance optimization techniques
NEXT STEPS
  • Research efficient data structures for implementing Huffman Trees
  • Learn about priority queues and their role in Huffman coding
  • Explore algorithm optimization techniques for large datasets
  • Study the implementation of Huffman encoding in programming languages like Python or C++
USEFUL FOR

Software developers, algorithm enthusiasts, and anyone interested in data compression techniques will benefit from this discussion, particularly those working with Huffman encoding and tree data structures.

ktoz
Messages
170
Reaction score
12
Hi

I'm playing around with Huffman encoding and am having a major problem getting my head around how to efficiently build the tree. I Googled "huffman coding" and have a basic grasp of what needs to be done but the actual process of walking through a list of sorted values has turned into a big headache.

Assuming I start with a list of factored values and their rank, in either descending (or ascending) order, the algorithm I came up with takes almost 3 minutes on a dual core 2.16 ghz Macintosh. This is clearly ridiculous as it takes roughly half a second for the rest of my code to read the file and create the factored list of values and their frequencies.

I tried writing to the model http://en.wikipedia.org/wiki/Image:Huffman_tree_2.svg" but due to the large number of initial factors (3000+) I ran into a problem where the sum of the ranks grew so large over subsequent iterations that it didn't even fit in an unsigned long long value without truncation.

Here's a snippet showing a section of the sorted, pre-Huffman data I'm starting with

Code:
87 = ({rank = 87; word = hate; }, {rank = 87; word = GONZALEZ; }); 
88 = ({rank = 88; word = one; }); 
89 = ({rank = 89; word = mainstream; }, {rank = 89; word = "don't"; }); 
90 = (
	{rank = 90; word = like; }, 
	{rank = 90; word = "it's"; }, 
	{rank = 90; word = his; }
); 
91 = ({rank = 91; word = he; }, {rank = 91; word = all; }); 
92 = ({rank = 92; word = legislation; }); 
93 = (
	{rank = 93; word = by; }, 
	{rank = 93; word = more; }, 
	{rank = 93; word = had; }, 
	{rank = 93; word = report; }, 
	{rank = 93; word = at; }
); 
94 = ({rank = 94; word = being; }); 
95 = ({rank = 95; word = them; }); 
96 = ({rank = 96; word = right; }, {rank = 96; word = crimes; });

What this shows is an associative array sorted and grouped by rank of each word in the source file. For example, under the key "93" we see that the five words "by," "more," "had," "report," and "at" all have the same frequency in the document

From this, could someone help me out with a step by step on how to create the huffman tree?

Thanks for any help

Ken
 
Last edited by a moderator:
Technology news on Phys.org
Never mind. I figured it out

Turns out I wasn't properly grouping values after adding their ranks. I was just changing their rank without combining symbol pairs into a new node.

Got the total encoding time down to .5 seconds for a 90K file. 5 seconds for a 10 MB file etc... Much better.
 
Last edited:

Similar threads

  • · Replies 1 ·
Replies
1
Views
2K
  • · Replies 1 ·
Replies
1
Views
2K
  • · Replies 3 ·
Replies
3
Views
4K
  • · Replies 3 ·
Replies
3
Views
3K
  • · Replies 18 ·
Replies
18
Views
15K
  • · Replies 3 ·
Replies
3
Views
1K
  • · Replies 6 ·
Replies
6
Views
3K
  • · Replies 12 ·
Replies
12
Views
3K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 9 ·
Replies
9
Views
6K