File Header Size Information Required for Document Files

In summary, the thread discusses changing the headers of document files such as PDF, MS Word, and LibreOffice files programmatically with byte-level functions such as putc() and getc(). The asker knows that a BMP header is 54 bytes and requests a link or reference giving the header sizes of the other formats, but is told that PDF files have no fixed header size, that HTML is plain text with no header, and that Notepad has no file format of its own. One reply suggests reading the LibreOffice source code to see how it handles these formats, PDF viewers that can display a file's structure are also mentioned, and a reference manual for PDF version 1.4 is provided.
  • #1
zak100
Hi,
I want to change the header of some document files like PDF, MS Word, and Libre Office programmatically. I know that I have to use some byte-level functions like putc(..) and getc(). But I don't know the header size of the above-mentioned file formats. I saw a list of file formats on Wikipedia: https://en.wikipedia.org/wiki/File_format
but I can't see information about the header size. For instance, I know the header size of a BMP file is 54 bytes. Can somebody please point me to a link that gives this information?

Zulfi.
 
  • #2
Where have you looked already? Wikipedia has some information about all of these formats (e.g. https://en.wikipedia.org/wiki/PDF#File_structure) with links to more detailed specifications. PDF files can be very difficult to alter. MS Word files stored in the Open XML (docx) format, and the OpenDocument files used by Open/Libre Office, are a little easier, but since each of these is a set of XML documents stored in a ZIP archive, you will want to work with libraries that do the heavy lifting for you rather than work at the byte level.
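
To illustrate the point about ZIP-based formats, here is a minimal Python sketch (not from the thread) that uses the standard zipfile module instead of byte-level access. The archive is built in memory, and the part names and XML content are simplified placeholders rather than a valid Word document.

```python
import io
import zipfile

def build_sample_package():
    """Build a tiny in-memory ZIP that mimics a docx-style layout (placeholder content)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<document>Hello</document>")
    return buf.getvalue()

def list_parts(data):
    """Return the names of the files stored inside a ZIP-based document."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()

package = build_sample_package()
print(package[:4])          # b'PK\x03\x04', the ZIP local file header signature
print(list_parts(package))  # ['[Content_Types].xml', 'word/document.xml']
```

Passing a real file path to zipfile.ZipFile in place of the in-memory buffer should list the parts of an actual docx or odt file in the same way.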
 
  • #3
Sorry, I can't find any information about header size in terms of bytes on the link you have provided. Do you have any information about header size related to HTML or notepad files?

Zulfi.
 
  • #4
zak100 said:
Sorry, I can't find any information about header size in terms of bytes on the link you have provided.
Perhaps that is because PDF files do not have a fixed header size?
zak100 said:
Do you have any information about header size related to HTML or notepad files?
HTML is plain text so it doesn't have a 'header' in the sense you are using this word. Notepad is an editor for plain text files and doesn't have a file format of its own.

'Header size' isn't really a thing; if you want to learn how these files are structured, just read what the documentation says rather than search it for terms that may not be relevant.
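
As a rough illustration of why a byte count is not the right thing to look for in a PDF, the following Python sketch (not from the thread) searches for the "%PDF-" version marker instead of assuming a fixed length. The function name is hypothetical, and the 1024-byte search window is an assumption about how lenient a reader might be, not a rule taken from this discussion.

```python
import re

# The marker is "%PDF-" followed by a major.minor version number.
PDF_HEADER = re.compile(rb"%PDF-(\d+)\.(\d+)")

def pdf_version(data):
    """Return (major, minor) if a PDF version marker appears near the start, else None."""
    match = PDF_HEADER.search(data[:1024])  # assumed search window
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# Hand-made sample bytes; this is not a complete, valid PDF.
sample = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n1 0 obj\n<< >>\nendobj\n"
print(pdf_version(sample))            # (1, 4)
print(pdf_version(b"<html></html>"))  # None
```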
 
  • #5
As others said, different formats have different headers. Some have variable size headers. I suppose there must be some with no header at all.

How would the information on byte count help you? What are you trying to accomplish?
 
  • #6
I just saw this thread while wandering around the PF site. I think your best option would be to look at the Libre Office source code and see how it handles the various formats. You can start here: https://www.libreoffice.org/about-us/source-code/
 
  • #8
The reference manual for PDF v 1.4 is:

PDF Reference, third edition, Adobe Portable Document Format, Version 1.4

It is available as a free download from:

https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference.pdf

Size: 8.6MB, 978 pages
(that ought'a keep you busy for awhile)

Cheers,
Tom
 

1. What is the purpose of file header size information in document files?

A file header identifies the file type and format and usually carries metadata about how the rest of the file is laid out. Software reads it to decide how to process and display the file. The size of the header matters only when a program needs to skip over or rewrite the header at the byte level.

2. What are the common file header sizes for document files?

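There is no common header size for document files. As noted in the thread, a BMP file has a 54-byte header in its usual form (a 14-byte file header followed by a 40-byte info header), but PDF has no fixed header size and plain text formats such as HTML have no header at all. Word's docx files and OpenDocument files are ZIP archives of XML documents, so their layout follows the ZIP format rather than a single fixed-size header.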
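
For the one fixed figure quoted in the thread, BMP's 54 bytes, the following Python sketch shows where the number comes from in the common layout: a 14-byte file header plus a 40-byte info header. The helper names are hypothetical, and the header built here describes a 24-bit image with no pixel data, purely for illustration.

```python
import struct

# Little-endian, no padding.
FILE_HEADER = struct.Struct("<2sIHHI")       # signature, file size, 2 reserved fields, pixel data offset
INFO_HEADER = struct.Struct("<IiiHHIIiiII")  # the 40-byte info header fields

def make_bmp_header(width, height):
    """Build the two headers of a 24-bit uncompressed BMP (no pixel data follows)."""
    offset = FILE_HEADER.size + INFO_HEADER.size
    file_header = FILE_HEADER.pack(b"BM", offset, 0, 0, offset)
    info_header = INFO_HEADER.pack(INFO_HEADER.size, width, height, 1, 24, 0, 0, 0, 0, 0, 0)
    return file_header + info_header

def pixel_data_offset(data):
    """Read the pixel data offset stored in the file header instead of assuming 54."""
    signature, _size, _res1, _res2, offset = FILE_HEADER.unpack_from(data, 0)
    if signature != b"BM":
        raise ValueError("not a BMP file")
    return offset

header = make_bmp_header(2, 2)
print(FILE_HEADER.size, INFO_HEADER.size, pixel_data_offset(header))  # 14 40 54
```

Reading the stored offset is the safer habit, because BMP files with a colour palette or a larger info header start their pixel data later than byte 54.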

3. Why is it important to include file header size information in document files?

A well-defined header lets different programs recognize the format and locate the rest of the file's contents, so the same file opens correctly in each of them. Formats whose headers vary in size typically store lengths or offsets inside the file (BMP, for example, records the offset of its pixel data), so software reads those values rather than assuming a size.

4. How can I find the file header size of a document file?

The header layout is defined by the format's specification, such as the PDF Reference mentioned in the thread. To inspect a particular file, open it in a hex editor or a file analysis tool and look at the first bytes; a plain text editor will not display binary headers reliably.
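
One way to look at the first few bytes of a file programmatically is sketched below in Python. The signature table is a small, hand-picked assumption covering only the formats discussed in the thread, the function names are hypothetical, and a match is a heuristic hint about the format, not a measurement of any header size.

```python
# (leading bytes, description) pairs; a deliberately incomplete list.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"PK\x03\x04", "ZIP container (docx, odt, ...)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE compound file (legacy .doc, ...)"),
    (b"BM", "BMP"),
]

def sniff(data):
    """Guess a format from the leading bytes of a file."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown (possibly plain text)"

def sniff_file(path):
    """Read only the first 8 bytes of a file, in binary mode, and guess its format."""
    with open(path, "rb") as fh:
        return sniff(fh.read(8))

print(sniff(b"%PDF-1.4\n"))        # PDF
print(sniff(b"<html><body>hi"))    # unknown (possibly plain text)
```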

5. Is file header size information required for all types of document files?

No. Plain text files, including HTML, have no header in this sense. For other formats, check the specification to see whether a header exists and whether its size is fixed or variable.
