File Header Size Information Required for Document Files

Discussion Overview

The discussion revolves around the header size of various document file formats, specifically PDF, MS Word, Libre Office, HTML, and plain text files. Participants explore the challenges of programmatically altering file headers and seek information on the specific byte sizes of these headers.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant seeks guidance on the header sizes of PDF, MS Word, and Libre Office files, noting they are familiar with the BMP header size.
  • Another participant suggests that while Wikipedia has some information, PDF files can be difficult to alter and recommends using libraries for MS Word files stored in Open XML format.
  • A participant questions the existence of a fixed header size for PDF files and inquires about header sizes for HTML and notepad files, noting that HTML is plain text and lacks a traditional header.
  • Some participants mention that different formats have varying header sizes, with some potentially having no header at all.
  • One participant suggests examining the Libre Office source code for insights into how it handles different formats.
  • Another participant points out that file systems maintain file metadata, implying that headers are common across file types.
  • A participant provides a reference manual for PDF version 1.4 as a resource for understanding PDF structure.

Areas of Agreement / Disagreement

Participants express differing views on the existence and significance of header sizes across file formats. There is no consensus on specific header sizes or the relevance of byte count in altering file headers.

Contextual Notes

Participants note that the concept of 'header size' may not apply uniformly across all file formats, and some formats may have variable or no headers. The discussion does not resolve the specific sizes or implications of these headers.

zak100
Hi,
I want to change the header of some document files (PDF, MS Word, Libre Office) programmatically. I know that I have to use byte-level functions such as putc() and getc(), but I don't know the header size of these file formats. I saw a list of file formats on Wikipedia (https://en.wikipedia.org/wiki/File_format), but I can't find any information about header sizes there. For instance, I know that the header size of a BMP file is 54 bytes. Can somebody please point me to a link that gives this information?

Zulfi.
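For context on the 54-byte figure: in the common BMP layout it is a 14-byte file header followed by a 40-byte info header, and the file header itself records where the pixel data begins, so the size does not have to be hard-coded. A minimal Python sketch, assuming that layout (the function name `bmp_pixel_data_offset` and the synthetic header are illustrative, not taken from any post in this thread):

```python
import struct

def bmp_pixel_data_offset(data: bytes) -> int:
    """Return the byte offset at which pixel data starts in a BMP."""
    if len(data) < 14 or data[:2] != b"BM":
        raise ValueError("not a BMP file")
    # 14-byte file header, little-endian: 2-byte signature, 4-byte file
    # size, two 2-byte reserved fields, 4-byte offset to the pixel data.
    _sig, _size, _res1, _res2, offset = struct.unpack_from("<2sIHHI", data, 0)
    return offset

# Synthetic header for illustration only: file header plus 40 zero bytes
# standing in for the info header (not a viewable image).
header = struct.pack("<2sIHHI", b"BM", 54, 0, 0, 54) + bytes(40)
print(bmp_pixel_data_offset(header))
```

BMP files that carry a colour palette store a larger offset in the same field, which is one reason to read the value rather than assume 54.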
 
Where have you looked already? Wikipedia has some information about all of these formats (e.g. https://en.wikipedia.org/wiki/PDF#File_structure) with links to more detailed specifications. PDF files can be very difficult to alter. MS Word (if stored in the Open XML (docx) format which it shares with Open/Libre Office) is a little easier but as this is a set of XML documents stored in a ZIP archive, you will want to work with libraries that do the heavy lifting with these formats for you rather than work at the byte level.
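To illustrate the point about ZIP-based formats, here is a rough sketch using Python's standard `zipfile` module. The package it builds is a toy stand-in with two placeholder parts, not a valid .docx, and the helper names `build_toy_package` and `list_package_parts` are made up for this example:

```python
import io
import zipfile

def build_toy_package() -> bytes:
    """Build an in-memory ZIP that mimics the layout of a document package."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as package:
        # Placeholder content only; a real .docx has more parts than this.
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("word/document.xml", "<document/>")
    return buf.getvalue()

def list_package_parts(data: bytes) -> list:
    """Return the names of the parts stored inside a ZIP-based document."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return package.namelist()

data = build_toy_package()
print(data[:4])                  # ZIP local-file signature
print(list_package_parts(data))
```

The same `list_package_parts` call should work on the bytes of a real .docx or .odt file, which is why a ZIP/XML library is a more natural tool here than getc() and putc().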
 
Sorry, I can't find any information about header size in terms of bytes on the link you have provided. Do you have any information about header size related to HTML or notepad files?

Zulfi.
 
zak100 said:
Sorry, I can't find any information about header size in terms of bytes on the link you have provided.
Perhaps that is because PDF files do not have a fixed header size?
zak100 said:
Do you have any information about header size related to HTML or notepad files?
HTML is plain text so it doesn't have a 'header' in the sense you are using this word. Notepad is an editor for plain text files and doesn't have a file format of its own.

'Header size' isn't really a thing; if you want to learn how these files are structured, just read what the documentation says rather than search it for terms that may not be relevant.
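As a concrete case: a PDF starts with a short text marker such as `%PDF-1.4`, and the rest of the file is located through a cross-reference table near the end rather than through a fixed-size header block. A small sketch that reads only that leading marker (the function name `pdf_version` is illustrative, and the sample bytes are a fragment, not a complete PDF):

```python
import re

# Leading marker of a PDF file, e.g. b"%PDF-1.4".
_PDF_HEADER = re.compile(rb"%PDF-(\d+)\.(\d+)")

def pdf_version(data: bytes):
    """Return (major, minor) from the leading %PDF-x.y marker, or None."""
    match = _PDF_HEADER.match(data)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

sample = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n1 0 obj\n"
print(pdf_version(sample))
print(pdf_version(b"<html><body>plain text</body></html>"))
```

The HTML input returns `None` because plain text and HTML carry no such marker at all.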
 
As others said, different formats have different headers. Some have variable size headers. I suppose there must be some with no header at all.

How would the information on byte count help you? What are you trying to accomplish?
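If the goal is only to tell these formats apart, the first few bytes are usually enough, whatever the header length is. A hedged sketch with a handful of well-known signatures (the table is deliberately incomplete, and the labels and the name `sniff_format` are chosen for this example):

```python
# (signature, label) pairs checked against the start of the file.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),                           # docx, odt, plain zip
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy .doc container
    (b"BM", "bmp"),
]

def sniff_format(data: bytes) -> str:
    """Guess a file format from its leading bytes."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    # Plain text and HTML have no signature, so they end up here.
    return "unknown"

print(sniff_format(b"%PDF-1.4\n"))
print(sniff_format(b"Hello, this is a plain text file."))
```

A match only identifies the container: a "zip" result could be a .docx, an .odt, or an ordinary archive, so further inspection is still needed.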
 
I just saw this thread while wandering around the PF site. I think your best option would be to look at the Libre Office source code and see how it handles the various formats. You can start here: https://www.libreoffice.org/about-us/source-code/
 
The reference manual for PDF v 1.4 is:

PDF Reference, third edition, Adobe Portable Document Format, Version 1.4

It is available as a free download from:

https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference.pdf

Size: 8.6MB, 978 pages
(that ought'a keep you busy for awhile)

Cheers,
Tom
 