I have been studying binary files. My main textbook is not very clear, and the other 3 books are useless. I went online and read a lot of articles. I want to run my understanding by you guys to check whether I am correct:

1) Binary files are written in binary or Hex (the same thing to me, since each hex digit is just 4 bits). Text files are written in ASCII (still binary, in the sense that each number stands for a character). They have different uses; each is better in some cases and not as good in others. Bottom line, text files and binary files are like English and Chinese writing... They are just different. Just that simple.

Yes, I understand there are a lot of details, like .exe, .jpg, .png, etc. all being binary. Those are just details.

2) fstream has different functions for working on text files and binary files. You have to specify binary (ios::binary), or else the file is treated as a text file. Then you have ways to translate back and forth (reinterpret_cast). You literally treat text and binary files as being as different as apples and oranges.

Let me know whether I am correct. I have some more questions later.

Thanks

pbuk
Gold Member
That's close enough, except:

Bottom line, text files and binary files are like English and Chinese writing... They are just different. Just that simple.
No, you can store both English and Chinese text in a text file using the right encoding (not ASCII). A better analogy would be to say that you could store the words to a song in a text file but a recording of the song or picture of the album cover would be stored in a binary file.

Edit: as @PeterDonis points out below, 'text' and 'binary' are just convenient labels we notionally attach to files according to how we intend to use them, there is no difference in the files themselves, they are still just a sequence of 1's and 0's.

You literally treat text and binary files as being as different as apples and oranges.
Again, not a good analogy: you can't cast an apple to an orange.

yungman
PeterDonis
Mentor
2019 Award
Binary files are written in binary or Hex (the same thing to me, since each hex digit is just 4 bits). Text files are written in ASCII (still binary, in the sense that each number stands for a character).
When you say "written in binary or Hex", that's wrong; files aren't "written" in anything. Files are just bytes. "Binary" and "Hex" as you're using those terms in that phrase are notations for describing bytes. They're things that humans write (or have their computer programs display) when they want to describe bytes. There is also a "text" notation for bytes--the byte that in Hex is described as 0x41 can also be described in "text" notation as "A", for example. These are just two different notations for the same byte.

The distinction between "binary" and "text" files (as opposed to notations for bytes) is a distinction that programs, and sometimes also operating systems, make in how they interpret the bytes in the file. "Binary" in this context just means "not text"--it covers all of the interpretations that aren't "text", and as you note, there are a lot of them, so "binary" is not just one thing as far as how programs interpret bytes is concerned.

"Text" actually isn't just one thing either, because there is ASCII text and also Unicode text, and Unicode has multiple encodings that map Unicode code points to bytes. All this, as noted, is part of how programs interpret the bytes in a file. ("ASCII" can actually be thought of as just another encoding, one in which only the bytes in the range 0x00 to 0x7F are valid.)

phinds, yungman and pbuk

Now I have this program that confuses me. The program first writes a text line, "This is a test, another test, more test", into test.dat in BINARY file mode. Then the program reads the file back in BINARY mode and displays the content. Then the last part of the program reads the file back in TEXT mode, as shown in the comment. Both display the EXACT sentence "This is a test, another test, more test".

C++:
//12.13 binary file read write
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main()
{
    //First create and write to file test.dat.
    const int size = 81;
    char data[size] = "This is a test, another test, more test";
    fstream file;
    file.open("test.dat", ios::out | ios::binary);//Open in binary mode
    cout << " Write the characters to file.\n\n";
    file.write(data, sizeof(data));
    file.close();

    cout << " Read back in binary file mode = {";
    file.open("test.dat", ios::in | ios::binary);//Open in binary mode
    for (int count = 0; count < size; count++)
        cout << data[count] ;
    cout << "}\n\n";//Notice } is so far from the end of the sentence?
    file.close();

    cout << " Now file read back in text mode : ";
    string str = { '\0' }, st1 = { '\0' };
    file.open("test.dat", ios::in);//Open in text mode.
    while (!file.eof())
    {
        getline(file, st1);
        str.append(st1);
    }
    cout << "  {" << str << "}\n\n";//Notice } is so far from the end of the sentence?
    file.close();
    return 0;
}
The file is written in binary, so why do I read back the exact same thing in TEXT mode? I can change the sentence and the result is the same.

Also, in both cases I try to put "}" at the end of the sentence, but it ends up far from the last word of the sentence. The code is supposed to stop reading when it reaches eof in one case, or at sizeof(file) in the other. Why is there so much space after the last word?

Thanks

EDIT:

I just experimented by changing the program to create and write the same sentence to text.txt, then read it back in both BINARY and TEXT mode. Both display the EXACT sentence.

Mark44
Mentor
Your program is not a good example of the difference between a text file and a binary file.
Try writing this data to your binary file, and then opening it as a text file and displaying the result.
C++:
unsigned data[] = {0x4481, 0x0000, 0x2031, 0xFF13, 0x2931};
The results will not be the same.
One way to think about text files is that they contain a sequence of lines, separated by newline characters. There are no such separators in a binary file.

To make sense of a binary file, you have to know how the data is organized in it. For example, a .jpg image file is a binary file, with different parts that specify such things as how many bytes make up the image data, where the image data starts, the file format, how many colors are used, how many bytes are used per pixel, how many bits are used for red, green, blue and alpha (if used), and lots more stuff. If you open such a file as a text file and display it as characters, you'll get a lot of garbage.

yungman
pbuk
Gold Member

Now I have this program that confuses me. The program first writes a text line, "This is a test, another test, more test", into test.dat in BINARY file mode. Then the program reads the file back in BINARY mode and displays the content. Then the last part of the program reads the file back in TEXT mode, as shown in the comment. Both display the EXACT sentence "This is a test, another test, more test".

...

The file is written in binary, so why do I read back the exact same thing in TEXT mode? I can change the sentence and the result is the same.

...

I just experimented by changing the program to create and write the same sentence to text.txt, then read it back in both BINARY and TEXT mode. Both display the EXACT sentence.
As @PeterDonis points out below, 'text' and 'binary' are just convenient labels we notionally attach to files according to how we intend to use them, there is no difference in the files themselves, they are still just a sequence of 1's and 0's.
As to the second question:
Also, in both cases I try to put "}" at the end of the sentence, but it ends up far from the last word of the sentence. The code is supposed to stop reading when it reaches eof in one case, or at sizeof(file) in the other. Why is there so much space after the last word?
I could tell you why, but I think you should work it out for yourself. Clue: if you want to understand why something is being written to a file then you need to look where you write something to the file.

yungman
Your program is not a good example of the difference between a text file and a binary file.
Try writing this data to your binary file, and then opening it as a text file and displaying the result.
C++:
unsigned data[] = {0x4481, 0x0000, 0x2031, 0xFF13, 0x2931};
The results will not be the same.
One way to think about text files is that they contain a sequence of lines, separated by newline characters. There are no such separators in a binary file.

To make sense of a binary file, you have to know how the data is organized in it. For example, a .jpg image file is a binary file, with different parts that specify such things as how many bytes make up the image data, where the image data starts, the file format, how many colors are used, how many bytes are used per pixel, how many bits are used for red, green, blue and alpha (if used), and lots more stuff. If you open such a file as a text file and display it as characters, you'll get a lot of garbage.
I put in your suggestion and changed data[]. This is the result of reading it back as binary and as text, respectively:
Read back in binary file mode = {ü 11 }

Now file read back in text mode : { ü 11 }

They look the same.

I have tried so hard to look for more information on this. It's just so hard to find online, and none of my 4 textbooks say much about it. I don't even know how to read the content of the file to show what is really being written. I know binary files are important; I just wish the books would elaborate on this more.

PeterDonis
Mentor
2019 Award
I don't even know how to read the content of the file to show what is really being written in.
That's easy: read the file in binary mode and look at each byte, in whatever notation you're most comfortable with. The file contains bytes, and reading it in binary mode just gives you those bytes. Then you just check that those bytes are the ones you wanted to write to the file. Which in turn means that, if your next question is "how do I know what bytes I wanted to write to the file?", then you need to fix that problem first--which means you need to understand what you are writing to the file.

You seem to be confused about what writing a file in text mode means. It doesn't mean you are magically writing something to the file besides bytes. It just means your program is taking the text you tell it to write to the file and encoding it to bytes, using some encoding. So to know what bytes should be in the file, you need to know what encoding is being used.

You also seem to be confused about what reading a file in text mode means. It means that you are assuming that the bytes in the file are to be interpreted as text--again, using some encoding, which means to understand how the bytes in the file get translated into text in your program, you need to know what encoding is being used. If you're going to print the bytes to the screen, reading them from the file in text mode and then printing them might be exactly equivalent to reading the file in binary mode and then printing the bytes you get as if they were text--which is basically what your program is doing. In either case your program is going to be using an encoding for the bytes to text translation--the only difference is whether the encoding is being used when the file is read from disk (if you open it in text mode) or when your program tries to print the data to the screen (if you open the file in binary mode and then try to print the bytes as text).

jtbell
Mentor
I don't even know how to read the content of the file to show what is really being written in
Surely there is some Windows app or utility that can display the contents of a file (binary or not) in hexadecimal. Under MacOS I can open a Terminal window and use the Unix command-line utility 'hexdump'.

yungman
That's easy: read the file in binary mode and look at each byte, in whatever notation you're most comfortable with. The file contains bytes, and reading it in binary mode just gives you those bytes. Then you just check that those bytes are the ones you wanted to write to the file. Which in turn means that, if your next question is "how do I know what bytes I wanted to write to the file?", then you need to fix that problem first--which means you need to understand what you are writing to the file.
You can see in my first post that the program already wrote the file in binary, and I read it back in both binary and text mode; they look the same. Also, in post #4, I already tried writing to a .txt file and reading it back in binary. It read out EXACTLY the same. Do you have a way to read the content without any encoding at all? That was my question.

You seem to be confused about what writing a file in text mode means. It doesn't mean you are magically writing something to the file besides bytes. It just means your program is taking the text you tell it to write to the file and encoding it to bytes, using some encoding. So to know what bytes should be in the file, you need to know what encoding is being used.

You also seem to be confused about what reading a file in text mode means. It means that you are assuming that the bytes in the file are to be interpreted as text--again, using some encoding, which means to understand how the bytes in the file get translated into text in your program, you need to know what encoding is being used. If you're going to print the bytes to the screen, reading them from the file in text mode and then printing them might be exactly equivalent to reading the file in binary mode and then printing the bytes you get as if they were text--which is basically what your program is doing. In either case your program is going to be using an encoding for the bytes to text translation--the only difference is whether the encoding is being used when the file is read from disk (if you open it in text mode) or when your program tries to print the data to the screen (if you open the file in binary mode and then try to print the bytes as text).
I am not confused. As I already said in the first post, ASCII is still binary in the sense that the number means a character. I might not have used the word "encode", but each char is a unique binary number.

I am very familiar with bytes, binary and Hex. I grew up with those in my day, when everything we did in hardware, assembly language and machine language was BINARY or HEX. Of course it IS a given that they are encoded bytes. A .txt file is encoded (as you call it) in either ASCII or something else; binary is different. My first post was just to verify that they are just different formats (or encodings, as you put it), and that you can do it either way (of course there are preferences and all that).

In fact, I am more comfortable working with bytes, binary and Hex. Every bit means something. For example, 0AH means D3=1 and D1=1 and all the others are 0. I have a problem working with decimal, as that's NOT how the computer works; it works in binary (say, Hex).

I understand you can make up your own encoding method, and it is still in the form of bytes, binary and all that. That's a given. Whether your encoding method is useful mainly depends on whether other people are willing to follow your method and use it. If everyone likes it and uses it, your encoding can be just as popular, but if no one uses it, then it's useless. It's the format; they are all bytes, binary (not what they call a binary file, but JUST SIMPLE BINARY NUMBERS).

Surely there is some Windows app or utility that can display the contents of a file (binary or not) in hexadecimal. Under MacOS I can open a Terminal window and use the Unix command-line utility 'hexdump'.
That's really my question. I have to look it up. I thought there must be a way just as simple as declaring binary or text to print it out.

PeterDonis
Mentor
2019 Award
I am not confused
You might not think you are, but you keep saying things that indicate confusion. For example:

Of course it IS a given that they are encoded bytes. A .txt file is encoded (as you call it) in either ASCII or something else; binary is different
No, you have it backwards. "Binary" and "text" files are not different; they're both just bytes. The "binary" or "text", and the "encoding" for "text", is how a program interprets the bytes in the file. It's not a property of the file. It's a property of the program that reads or writes the file. When we say a file is "binary" or "text", what we really mean is that the program we intend to use for reading/writing the file will interpret it as binary (one of the many interpretations that falls into that category) or text (and then, as I said, we have to know what text encoding the program will use).

PeterDonis
Mentor
2019 Award
That's really my question.
The fact that it took you this many posts to ask it also indicates confusion. You could have just asked "does anyone know of a Windows app that will display files in binary/hex?" in the OP of this thread and saved everyone a lot of time and effort.

Is there really any encoding going on when reading and writing text? Whether it's in memory or on disk, it's still just bytes.

PeterDonis
Mentor
2019 Award
Is there really any encoding going on when reading and writing text? Whether it's in memory or on disk, it's still just bytes.
There is if it's Unicode text; there are many different Unicode encodings, which are mappings of Unicode code points to bytes.

Plus, even if it's ASCII text, there are still bytes that are invalid (any byte 0x80 or higher), and the program has to check for them. Also, over the years, many programs and operating systems have devised encodings that use ASCII for bytes 0x7F and lower and invent new meanings for bytes 0x80 and higher (in old versions of Windows these were called "code pages"), and these are different possible text encodings as well.

Jarvis323
I have some questions on reinterpret_cast. I marked up some notes on p. 678 of the Gaddis book, scanned the page and posted it here. I red-lined the relevant parts and wrote notes. Can you guys confirm I am correct?

1) From my understanding, reinterpret_cast can change one data type to another. The book ONLY shows converting to the char type, which is why the first argument is <char*>. Can it be <int*> also?

2) reinterpret_cast<datatype*>(value) is a POINTER. This is according to the example in the book.

3) In the example shown, the data is converted to a binary file BECAUSE of file.write(reinterpret_cast<char*>(&x), sizeof(x)); it is .write that by definition converts to a binary file, not the reinterpret_cast.

Thanks

phinds
Gold Member
2019 Award
That's really my question. I have to look it up. I thought there must be a way just as simple as declaring binary or text to print it out.
I suggest HEX EDIT, a free app that displays both binary and text side by side (of course, if the file is not a text file then the "text" displayed will be just random characters).

yungman
I suggest HEX EDIT, a free app that displays both binary and text side by side (of course, if the file is not a text file then the "text" displayed will be just random characters).
I really thought I asked a very obvious and simple question. I was expecting that there must be a simple command that does no encoding and just dumps out the hex value of each byte so I can look at it. How can it be any simpler than that? Now I find out you have to go online to get a tool to dump the file!!!

Tom.G
edit: See The Ahhh moment: below

From the program example you gave in the fourth post, it looks like cout defaults to treating its argument as a string (characters).

If you write a binary file that has 0x41 in it, it is written on disk as a bunch of 1's and 0's. When it is read back in, you can tell the program to treat them as:
binary (0100 0001)
octal (101)
ASCII (A)
It appears that cout assumes you want that bunch of 1's and 0's to be considered as alphabetical characters. There is probably a way to tell cout to show them in any of the above formats.

I'm not familiar with the language, so you can either look in the documentation for cout, or ask others here how to do that, if possible.

The Ahhh moment:
After reading the program again, I noticed ln11:
char data[size] = "This is a test, another test, more test";
It declares "data[]" as an array of type 'char'.
When cout is passed "data[]", it recognizes that it is of type char and prints it as such.

So maybe not such of a mystery after all!

Cheers,
Tom

Mark44
Mentor
I have some questions on reinterpret_cast. I marked up some notes on p. 678 of the Gaddis book, scanned the page and posted it here. I red-lined the relevant parts and wrote notes. Can you guys confirm I am correct?

1) From my understanding, reinterpret_cast can change one data type to another. The book ONLY shows converting to the char type, which is why the first argument is <char*>. Can it be <int*> also?
No, it's converting the type of &x to char *, not char. In other words it's converting &x, the address of x, which is of type int *, to a char pointer.
yungman said:
2) reinterpret_cast <datatype*>(value) is a POINTER. This is according to the example in the book.
Yes, in the case of the scanned image you posted, but it doesn't necessarily convert to a pointer type.
yungman said:
3) In the example shown, the data is converted to a binary file BECAUSE of file.write(reinterpret_cast<char*>(&x), sizeof(x)); it is .write that by definition converts to a binary file, not the reinterpret_cast.
No, write doesn't convert the file. Presumably the file was already opened in binary mode (ios::binary). The reinterpret_cast operator converts the type of some variable to a different type.

The file.write() call above writes sizeof(x) bytes into the file, which is four on most current systems. For x = 27 on a typical little-endian machine they are, in hex, 1B 00 00 00.
C++:
int x = 27;
dataFile2.open("demoFile2.txt",  ios::out);
dataFile2 << x;
The stream insertion operator, <<, will write two bytes into demoFile2.txt: 32 37. These are the ASCII codes, in hex, of '2' and '7'.

edit: See The Ahhh moment: below

From the program example you gave in the fourth post, it looks like cout defaults to treating its argument as a string (characters).

If you write a binary file that has 0x41 in it, it is written on disk as a bunch of 1's and 0's. When it is read back in, you can tell the program to treat them as:
binary (0100 0001)
octal (101)
ASCII (A)
It appears that cout assumes you want that bunch of 1's and 0's to be considered as alphabetical characters. There is probably a way to tell cout to show them in any of the above formats.

I'm not familiar with the language, so you can either look in the documentation for cout, or ask others here how to do that, if possible.

The Ahhh moment:
After reading the program again, I noticed ln11:
char data[size] = "This is a test, another test, more test";
It declares "data[]" as an array of type 'char'.
When cout is passed "data[]", it recognizes that it is of type char and prints it as such.

So maybe not such of a mystery after all!

Cheers,
Tom
I kind of give up on this. The book makes a big thing about files written in .txt mode and binary mode. The program in post #4 shows that if I write and store a binary file, I can read back the EXACT same thing in text mode by using getline!!!

Post #16 is very similar also. I quoted the page and showed exactly what the book said. But there is inconsistency again.

Normally, when I study a subject, I get 3 or 4 books and I get a clear answer to all my questions. Not with C++. With all due respect, C++ is NOT the most difficult subject; it's just that there are no straight answers. It really does NOT help to go online. This is an example of looking up reinterpret_cast:
https://en.cppreference.com/w/cpp/language/reinterpret_cast
This does anything BUT show how to use reinterpret_cast. It is NOT that hard if you just read the page of the book that I copied out: just a simple translation from one type to another!!! But then it's NOT.

I studied advanced calculus, electromagnetics and microwave RF on my own and used them on the job for years, so I don't think I am slow. You cannot put C++ in the same league as those. It's the inconsistency; it's almost like they make it up as they go. For example, I got tripped up by the "*" and "&" in pointers and addresses. I have never seen another scientific subject where the same symbol means totally different things.

It's like in post 16: the book clearly says it is transforming from integer to char. Try to use the result as if it's a char. It doesn't work. Unless I am totally wrong in reading the few lines on the page, it has a problem. 4 books, no answer; go online, and you get pieces like that.

If only I could be happy working out problems like the book does, no more and no less, I would not have nearly as many questions. In fact, I don't have any questions if I just follow the procedure in the book!!! But as soon as I step out of line a little, I find all the holes that the book missed and never explained.

Mark44
Mentor
I'm not familiar with the language, so you can either look in the documentation for cout, or ask others here how to do that, if possible.
Actually, cout is just the output stream. The documentation to look at is for the stream insertion operator, <<.

Mark44
Mentor
I kind of give up on this. The book makes a big thing about files written in .txt mode and binary mode. The program in post #4 shows that if I write and store a binary file, I can read back the EXACT same thing in text mode by using getline!!!
The two file types, text and binary, are not the same. I showed you an example in post #5 that demonstrates this. You can't use getline() to determine this, because getline() works with ordinary characters. With a binary file, there can be bytes that are not ordinary printable characters.
yungman said:
Post #16 is very similar also. I quoted the page and showed exactly what the book said. But there is inconsistency again.
See my post above. You have some confusion about what the book is actually saying.
yungman said:
Normally, when I study a subject, I get 3 or 4 books and I get a clear answer to all my questions. Not with C++. With all due respect, C++ is NOT the most difficult subject; it's just that there are no straight answers. It really does NOT help to go online.
If a book's explanation is confusing, go to the documentation of the function that is causing the confusion.
yungman said:
This is an example of looking up reinterpret_cast:
https://en.cppreference.com/w/cpp/language/reinterpret_cast
This does anything BUT show how to use reinterpret_cast. It is NOT that hard if you just read the page of the book that I copied out: just a simple translation from one type to another!!! But then it's NOT.
I don't know what you mean. It's a straightforward conversion from one type to another.
yungman said:
I studied advanced calculus, electromagnetics and microwave RF on my own and used them on the job for years, so I don't think I am slow. You cannot put C++ in the same league as those.
It's the inconsistency; it's almost like they make it up as they go. For example, I got tripped up by the "*" and "&" in pointers and addresses. I have never seen another scientific subject where the same symbol means totally different things.
There are only so many characters on a standard keyboard, so I imagine this is the reason for different meanings for * and & in different contexts. In mathematics, what does - mean? Does it mean the negative of something or does it mean the difference of two expressions? The context can tell you.
For the * operator, the context tells you whether two things are being multiplied, a pointer variable is being declared, or a pointer variable is being dereferenced. And similar for &.
Rome wasn't built in a day, and you didn't learn advanced calculus in just a few weeks.
yungman said:
It's like in post 16: the book clearly says it is transforming from integer to char. Try to use the result as if it's a char. It doesn't work. Unless I am totally wrong in reading the few lines on the page, it has a problem.
No, as I explained in my previous post. It is converting an int pointer to a char pointer.

sysprog
No, it's converting the type of &x to char *, not char. In other words it's converting &x, the address of x, which is of type int *, to a char pointer.
As given in the book, and as I red-lined, in reinterpret_cast<datatype>(value), datatype is the data type you convert to, and value is the value you are converting. This is STRAIGHT from the page in post 16 that I red-lined. So it is CONVERTING from the data type of value to the data type in <datatype>.

The giveaway is the line ptr=reinterpret_cast<char*>(&x), where ptr is a pointer to char. So according to the line above, this is converting the integer value of x to a character.

Mark44 said:
Yes, in the case of the scanned image you posted, but it doesn't necessarily convert to a pointer type.
But then ptr=reinterpret_cast<char*>(&x) claims it is a pointer. For you it's obvious, because you know all this. But for someone like me who is just starting to learn C++, I have to take it very precisely and literally.
Mark44 said:
No, write doesn't convert the file. Presumably the file was already opened in binary mode (ios::binary). The reinterpret_cast operator converts the type of some variable to a different type.
I should have said that .write and .read ARE for binary files only; I don't mean they convert to a binary file. It's the file.open("name.dat", ios::out | ios::binary) that defines that the file stored is going to be a binary file. That I know very clearly. Actually, that is where my question is. I experimented:
C++:
int Iw[] = { 1, 2, 3, 4, 5 };
index = 0;
test.open("test1.txt", ios::out);
if (!test)
{
    cout << " Fail to open test1.txt\n\n";
    return 0;
}
cout << " ready to write to test1.\n\n";
while (index < sizeof(Iw))
{
    test << reinterpret_cast<char*>(Iw[index]);
    cout << "index = " << index << " ";
    index++;
}
test.close();
It doesn't work. If I understand it correctly, I convert int Iw[] = {1,2,3,4,5} to characters, so I can just store them in the file the regular way. Don't pick on the sizeof(Iw); if it worked, I should have written something. It created test1.txt, but there's nothing in it.

Mark44 said:
The file.write() function above writes four bytes into the file. In hex, they are 1B 00 00 00.
C++:
int x = 27;
dataFile2.open("demoFile2.txt",  ios::out);
dataFile2 << x;
The stream insertion operator, <<, will write two bytes into demoFile2.txt: 32 37. These are the ASCII codes, in hex, of '2' and '7'.
Sorry, I did not see this post when I answered the other one until now.

I know you have really gone out of your way to help me, and I really appreciate this. I cannot say it enough times.

I really tried to look for the answer myself, but it's like that page: it spells everything out so clearly, but when I experiment with it, it doesn't quite work the way it says. Like I said, if I had just followed how the book uses it, I would not have questions. It's very simple, but as soon as I venture out a little, that's where all the problems start. Apparently reinterpret_cast is a much bigger subject, and this is only a very small portion of it. So I just have to use what the book shows in limited situations and "trust" that it will work!!!
Thanks

pbuk