Huffman file Decompression algorithm

In summary, the article explains how the Huffman code algorithm works and provides an example of a table that uses 3 bit codes to represent 3 bit codewords. The article also explains how to extract bit fields from memory and how to use a table to decode a Huffman code.
  • #1
DODGEVIPER13
672
0
I am going to be compleley honest, I don't know anything about Huffman coding. Except that I think I understand the Huffman tree algorithm. Can anyone assist me in getting started the problems states:

The preceding exercise compresses a file to generate
two files filename.huf and filename.new. Write a program that decompresses
the file. The program prompts the user to enter the file name and
decompresses it into a file named filename

Sorry I don't have any code but the copy and paste feature always jumbles up the format and I don't wish to manuallly enter it. If you have the book it is Introduction To Java Programming 8th ed written by Daniel Liang
 
Technology news on Phys.org
  • #2
DODGEVIPER13 said:
...the copy and paste feature always jumbles up the format...
Put your code inside [noparse]
Code:
...
[/noparse] tags.
 
  • #3
You have to know the huffman coding scheme. It may be a fixed scheme, or it could be defined by the initial data in the file. Generally there's some minimal pattern of leading bits that determines the actual length of a "codeword".

In this wiki article

http://en.wikipedia.org/wiki/Huffman_coding

The example table on the right side of the page uses three 3 bit values, 000, 010, 111, to represent 3 bit codewords. If the first 3 bits are 001, 011, 100, 101, or 101, it's a longer code word, and you need to include a 4th bit and use the leading 4 bits to determine if a code is 4 bits or 5 bits.

That example table could have used huffman codes that were ordered numerically: 3 bit codes: {000, 001, 010}, 4 bit codes {0110, 0111, 1000, 1001, 1010, 1011, 1100}, 5 bit codes: {11010, 11011, 11100, 11101, 11110, 11111}, but this won't have much effect on the software used to encode or decode a huffman string of data.
 
  • #4
Hey DODGEVIPER13 and welcome to the forums.

One of the most important elements of coding mechanisms like the Huffman codes is that the codes themselves are non-ambiguous.

In other words the codes are uniquely decodable in the way that there is a bijection between the compressed and uncompressed sources.

By creating codes that have this property and also in the way that you get an optimal code alphabet that corresponds to probability information for your source file, then the result is basically the huffman coding system.

Once this is understood, it will be a lot easier to understand the algorithm and its implementation.
 
  • #5
thanks guys I am sure if I knew what I was doing I could understand what yah gave meh but I don't so I really can't do anything but that's ok I need a tutor.
 
  • #6
The first step you need to do is to be able to extract bit fields from memory. You'll have to keep track of the bit and byte offset as you "read" huffman codes. As an example, pretend that all codes are 3 bits long, using a group of letters to represent the codes it would look like this in memory (spaces used to separate the bytes)

aaabbbcc cdddeeef ffggghhh jjjkkkll lmmmnnno oopppqq ...

so numbering bits and bytes left to right, aaa starts at byte 0, bit 0, bbb starts at byte 0, bit 3, ccc starts at byte 0, bit 6, and ends at byte 1 bit 0. ddd starts at byte 1, bit 1, ...

Using the wiki example, the longest field is 5 bits. What you could do is get 5 bits from the current byte and bit postion, then use that 5 bit value to index a 32 entry table that indicates the actual number of bits used for that code, based on the upper bits of the code, and the character that the code represents. You would then update the byte and bit offset pointers by the actual number of bits in the code you just retrieved, and then continue the process.
 

What is the Huffman file decompression algorithm?

The Huffman file decompression algorithm is a method used to compress and decompress data in computer files. It uses a variable-length coding scheme to reduce the size of a file by assigning shorter codes to frequently occurring characters and longer codes to less frequently occurring characters.

How does the Huffman file decompression algorithm work?

The Huffman file decompression algorithm works by first analyzing the input file to determine the frequency of each character. Then, it uses this information to create a Huffman tree, a binary tree where the most frequently occurring characters have the shortest code paths. Finally, the algorithm traverses the Huffman tree and replaces each character with its corresponding code, resulting in a compressed file.

What are the advantages of using the Huffman file decompression algorithm?

The Huffman file decompression algorithm offers several advantages, including efficient compression of data, fast decompression speeds, and the ability to preserve the original data without any loss. It can also be used for different types of data, including text, images, and audio files.

What types of data are best suited for the Huffman file decompression algorithm?

The Huffman file decompression algorithm is most effective for data that contains a limited number of characters with varying frequencies. This includes text-based data, such as documents or source code, as well as images and audio files with a limited color or sound palette.

Are there any limitations to the Huffman file decompression algorithm?

Although the Huffman file decompression algorithm is widely used and efficient, it does have some limitations. It may not be suitable for compressing data with a large number of unique characters or data that is already compressed, as the compression may be minimal or even increase the file size. Additionally, the algorithm may not be as effective for data with a uniform distribution of characters.

Similar threads

  • Programming and Computer Science
Replies
30
Views
4K
  • Programming and Computer Science
Replies
1
Views
1K
  • Programming and Computer Science
Replies
4
Views
5K
  • Programming and Computer Science
Replies
4
Views
8K
  • Programming and Computer Science
Replies
12
Views
14K
Replies
14
Views
2K
  • Programming and Computer Science
Replies
6
Views
1K
  • Programming and Computer Science
Replies
4
Views
11K
  • Programming and Computer Science
Replies
22
Views
5K
  • Programming and Computer Science
Replies
10
Views
25K
Back
Top