Are Binary Files and Text Files the Same?

yungman · Sep 16, 2020

I have been studying binary files. My main textbook is not very clear, and the other 3 books are useless. I went online and read a lot of articles. I want to run my understanding by you guys to see whether I am correct:

1) Binary files are written in binary or hex (the same thing to me, since every 4 bits read as one hex digit). Text files are written in ASCII (still binary, in the sense that each number stands for a character). They have different uses: each is better in some cases and worse in others. Bottom line, text files and binary files are like English and Chinese writing... They are just different. Just that simple.

Yes, I understand there are a lot of details, like .exe, .jpg, .png, etc. all being binary. Those are just details.

2) fstream has different functions to work on text files and binary files. You have to specify binary (ios::binary), or else the file is treated as a text file. Then you have functions to translate back and forth (reinterpret_cast). You literally treat text and binary files as differently as apples and oranges.

Let me know whether I am correct. I have some more questions later.

Thanks

pbuk · Sep 16, 2020

That's close enough, except:

yungman said:

Bottom line, text files and binary files are like English and Chinese writing... They are just different. Just that simple.

No, you can store both English and Chinese text in a text file using the right encoding (not ASCII). A better analogy would be to say that you could store the words to a song in a text file but a recording of the song or picture of the album cover would be stored in a binary file.

Edit: as @PeterDonis points out below, 'text' and 'binary' are just convenient labels we notionally attach to files according to how we intend to use them, there is no difference in the files themselves, they are still just a sequence of 1's and 0's.

yungman said:

You literally treat text and binary files as differently as apples and oranges.

Again not a good analogy, you can't cast an apple to an orange.

PeterDonis · Sep 16, 2020

yungman said:

Binary files are written in binary or hex (the same thing to me, since every 4 bits read as one hex digit). Text files are written in ASCII (still binary, in the sense that each number stands for a character).

When you say "written in binary or Hex", that's wrong; files aren't "written" in anything. Files are just bytes. "Binary" and "Hex" as you're using those terms in that phrase are notations for describing bytes. They're things that humans write (or have their computer programs display) when they want to describe bytes. There is also a "text" notation for bytes--the byte that in Hex is described as 0x41 can also be described in "text" notation as "A", for example. These are just two different notations for the same byte.

The distinction between "binary" and "text" files (as opposed to notations for bytes) is a distinction that programs, and sometimes also operating systems, make in how they interpret the bytes in the file. "Binary" in this context just means "not text"--it covers all of the interpretations that aren't "text", and as you note, there are a lot of them, so "binary" is not just one thing as far as how programs interpret bytes is concerned.

"Text" actually isn't just one thing either, because there is ASCII text and also Unicode text, and Unicode has multiple encodings that map Unicode code points to bytes. All this, as noted, is part of how programs interpret the bytes in a file. ("ASCII" can actually be thought of as just another encoding, one in which only the bytes in the range 0x00 to 0x7F are valid.)
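To make "an encoding maps code points to bytes" concrete, here is a minimal sketch of UTF-8, one of several Unicode encodings. The helper names encode_utf8 and print_bytes are hypothetical, and the sketch assumes the input is a valid Unicode scalar value (at most 0x10FFFF and not a surrogate). It shows that 'A' (U+0041) becomes the single byte 41, the same as ASCII, while the Chinese character 中 (U+4E2D) becomes the three bytes E4 B8 AD.

C++:
```cpp
#include <cstdio>
#include <string>

// Encode one Unicode code point as UTF-8 bytes.
// Assumes cp is a valid scalar value (<= 0x10FFFF, not a surrogate).
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                    // 1 byte: 0xxxxxxx (identical to ASCII)
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {            // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {          // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                            // 4 bytes: 11110xxx then three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Print each byte of a string as two hex digits, e.g. "E4 B8 AD".
void print_bytes(const std::string& s) {
    for (unsigned char b : s) std::printf("%02X ", b);
    std::printf("\n");
}
```

For example, print_bytes(encode_utf8(0x4E2D)) prints E4 B8 AD. A different encoding, such as UTF-16, would map the same code point to different bytes, which is exactly why a program must know the encoding to turn bytes back into text.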

yungman · Sep 16, 2020

Thanks for the reply.

Now I have a program that confuses me. The program first writes a line of text, "This is a test, another test, more test", into test.dat in BINARY file mode. Then it reads the file back in BINARY mode and displays the content. Finally, it reads the file back in TEXT mode, as shown in the comments. Both display the EXACT sentence "This is a test, another test, more test".

C++:

//12.13 binary file read write
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main()
{
//First create and write to file test.dat.
    const int size = 81;
    char data[size] = "This is a test, another test, more test";
    fstream file;
    file.open("test.dat", ios::out | ios::binary);//Open in binary mode
    cout << " Write the characters to file.\n\n";
    file.write(data, sizeof(data));
    file.close();

//Read back in binary mode:
    cout << " Read back in binary file mode = {";
    file.open("test.dat", ios::in | ios::binary);//Open in binary mode
    file.read(data, sizeof(data));
    for (int count = 0; count < size; count++)
        cout << data[count] ;
    cout << "}\n\n";//Notice } is so far from the end of the sentence?
    file.close();

//Read back in text mode:
    cout << " Now file read back in text mode : ";
    string str = { '\0' }, st1 = { '\0' };
    file.open("test.dat", ios::in);//Open in text mode.
    while (!file.eof())
    {
        getline(file, st1);
        str.append(st1);
    }
    cout << "  {" << str << "}\n\n";//Notice } is so far from the end of the sentence?
    file.close();
    return 0;
}

The file is written in binary mode, so why do I read back exactly the same thing in TEXT mode? I can change the sentence, and the result is the same.

Also, in both cases I try to put "}" at the end of the sentence, but it ends up far from the last word. The code is supposed to stop reading when it reaches eof in one case, or after sizeof(data) bytes in the other. Why is there so much space after the last word?

Thanks

EDIT:

I just experimented by changing the program to create and write the same sentence to text.txt, then read it back in both BINARY and TEXT mode. Both display the EXACT sentence.

Mark44 · Sep 16, 2020

Your program is not a good example of the difference between a text file and a binary file.
Try writing this data to your binary file, and then opening it as a text file and displaying the result.

C++:

unsigned data[] = {0x4481, 0x0000, 0x2031, 0xFF13, 0x2931};

The results will not be the same.
One way to think about text files is that they contain a sequence of lines, separated by newline characters. There are no such separators in a binary file.

To make sense of a binary file, you have to know how the data is organized in it. For example, a .jpg image file is a binary file, with different parts that specify such things as how many bytes make up the image data, where the image data starts, the file format, how many colors are used, how many bytes are used per pixel, how many bits are used for red, green, blue and alpha (if used), and lots more stuff. If you open such a file as a text file and display it as characters, you'll get a lot of garbage.
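A minimal sketch of Mark44's suggestion, writing his array with write() in binary mode and reading every byte back without any text interpretation. The function names write_raw and read_raw are hypothetical. The file holds 5 * sizeof(unsigned) bytes, and their order depends on the machine's endianness: on a little-endian PC with a 4-byte unsigned, the first value 0x4481 is stored as 81 44 00 00, and 0x81 is outside ASCII's range, while the 00 bytes are not printable characters at all.

C++:
```cpp
#include <fstream>
#include <string>
#include <vector>

// Write the raw bytes of Mark44's array to a file opened in binary mode.
void write_raw(const std::string& path) {
    const unsigned data[] = {0x4481, 0x0000, 0x2031, 0xFF13, 0x2931};
    std::ofstream out(path, std::ios::out | std::ios::binary);
    // data decays to const unsigned*; the cast gives a char* to its first byte.
    out.write(reinterpret_cast<const char*>(data), sizeof(data));
}

// Read every byte of a file back, one at a time, in binary mode.
std::vector<unsigned char> read_raw(const std::string& path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    std::vector<unsigned char> bytes;
    char c;
    while (in.get(c)) bytes.push_back(static_cast<unsigned char>(c));
    return bytes;
}
```

Looking at the numeric values in the returned vector, rather than printing them as characters, is what reveals the difference between this file and one holding ordinary text.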

pbuk · Sep 16, 2020

yungman said:

Thanks for the reply.

Now I have a program that confuses me. The program first writes a line of text, "This is a test, another test, more test", into test.dat in BINARY file mode. Then it reads the file back in BINARY mode and displays the content. Finally, it reads the file back in TEXT mode, as shown in the comments. Both display the EXACT sentence "This is a test, another test, more test".

...

The file is written in binary mode, so why do I read back exactly the same thing in TEXT mode? I can change the sentence, and the result is the same.

...

I just experimented by changing the program to create and write the same sentence to text.txt, then read it back in both BINARY and TEXT mode. Both display the EXACT sentence.

This has already been answered:

pbuk said:

As @PeterDonis points out below, 'text' and 'binary' are just convenient labels we notionally attach to files according to how we intend to use them, there is no difference in the files themselves, they are still just a sequence of 1's and 0's.

As to the second question:

yungman said:

Also, in both cases I try to put "}" at the end of the sentence, but it ends up far from the last word. The code is supposed to stop reading when it reaches eof in one case, or after sizeof(data) bytes in the other. Why is there so much space after the last word?

I could tell you why, but I think you should work it out for yourself. Clue: if you want to understand why something is being written to a file then you need to look where you write something to the file.
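Following pbuk's clue: in the program above, sizeof(data) is the size of the whole 81-element array, not the length of the sentence. The sentence has 39 characters, so write() also stores the 42 unused '\0' bytes after it; both reads bring them back, and sending them to cout typically shows up as blank space before the "}". A minimal sketch (write_sentence is a hypothetical name) that writes only the characters of the sentence:

C++:
```cpp
#include <cstring>
#include <fstream>
#include <string>

// Write only the characters of the sentence, not the whole 81-byte buffer.
// Returns the number of bytes written.
std::size_t write_sentence(const std::string& path) {
    const int size = 81;
    char data[size] = "This is a test, another test, more test";
    std::ofstream file(path, std::ios::out | std::ios::binary);
    std::size_t n = std::strlen(data);   // 39: counts characters up to the first '\0'
    file.write(data, static_cast<std::streamsize>(n));
    return n;                             // file is closed when 'file' goes out of scope
}
```

With this change the file is 39 bytes long, so a reader that stops at end of file has no padding left to print.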

yungman · Sep 16, 2020

Mark44 said:
Your program is not a good example of the difference between a text file and a binary file.
Try writing this data to your binary file, and then opening it as a text file and displaying the result.
C++:
unsigned data[] = {0x4481, 0x0000, 0x2031, 0xFF13, 0x2931};
The results will not be the same.

I put in your suggestion, changing data[]. These are the results of reading it back in binary and text mode, respectively:
Read back in binary file mode = {ü 11 }

Now file read back in text mode : { ü 11 }

They look the same.

I tried so hard to find more information on this, but it's just so hard to find online, and none of my 4 textbooks say much about it. I don't even know how to read the content of the file to see what is really written in it. I know binary files are important; I just wish the books would elaborate on this more.

PeterDonis · Sep 16, 2020

yungman said:

I don't even know how to read the content of the file to see what is really written in it.

That's easy: read the file in binary mode and look at each byte, in whatever notation you're most comfortable with. The file contains bytes, and reading it in binary mode just gives you those bytes. Then you just check that those bytes are the ones you wanted to write to the file. Which in turn means that, if your next question is "how do I know what bytes I wanted to write to the file?", then you need to fix that problem first--which means you need to understand what you are writing to the file.
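Reading the file in binary mode and looking at each byte can be done in a few lines of standard C++. A minimal sketch of a hex dump (hex_dump is a hypothetical name): it prints each byte as two hex digits, 16 per line, with the byte offset at the left, and returns the number of bytes read.

C++:
```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Print every byte of a file in hex, 16 bytes per line, with offsets.
// Returns the number of bytes read (0 if the file cannot be opened).
std::size_t hex_dump(const std::string& path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in) return 0;
    std::size_t offset = 0;
    char c;
    while (in.get(c)) {
        if (offset % 16 == 0)   // start a new line with the offset in hex
            std::printf("%s%08zX:", offset ? "\n" : "", offset);
        std::printf(" %02X", static_cast<unsigned char>(c));
        ++offset;
    }
    std::printf("\n");
    return offset;
}
```

For a file containing the three bytes "AB\n", this prints 00000000: 41 42 0A. Running it on test.dat from the program above would show the 39 sentence bytes followed by a run of 00 bytes.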

You seem to be confused about what writing a file in text mode means. It doesn't mean you are magically writing something to the file besides bytes. It just means your program is taking the text you tell it to write to the file and encoding it to bytes, using some encoding. So to know what bytes should be in the file, you need to know what encoding is being used.

You also seem to be confused about what reading a file in text mode means. It means that you are assuming that the bytes in the file are to be interpreted as text--again, using some encoding, which means to understand how the bytes in the file get translated into text in your program, you need to know what encoding is being used. If you're going to print the bytes to the screen, reading them from the file in text mode and then printing them might be exactly equivalent to reading the file in binary mode and then printing the bytes you get as if they were text--which is basically what your program is doing. In either case your program is going to be using an encoding for the bytes to text translation--the only difference is whether the encoding is being used when the file is read from disk (if you open it in text mode) or when your program tries to print the data to the screen (if you open the file in binary mode and then try to print the bytes as text).
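On the common platforms, the most visible thing text mode does at the byte level is newline translation, which is also why the sentence in the program above comes back unchanged: it contains no '\n'. A minimal sketch (text_mode_file_size is a hypothetical name) that writes two lines in text mode and then measures the file in binary mode. On Windows, text-mode output turns each '\n' into the two bytes 0D 0A, so the file is 14 bytes; on Linux and macOS no translation happens and it is 12.

C++:
```cpp
#include <fstream>
#include <string>

// Write "line1\nline2\n" (12 characters) in TEXT mode, then return the
// file's size in bytes as seen in BINARY mode.
long long text_mode_file_size(const std::string& path) {
    {
        std::ofstream out(path, std::ios::out);   // text mode: no ios::binary
        out << "line1\nline2\n";
    }                                            // closed (flushed) here
    std::ifstream in(path, std::ios::in | std::ios::binary | std::ios::ate);
    return static_cast<long long>(in.tellg());  // opened at end, so position == size
}
```

Reading the same file back in text mode on Windows would turn each 0D 0A pair back into '\n', so a round trip in text mode looks identical even though the bytes on disk differ.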

jtbell · Sep 17, 2020

yungman said:

I don't even know how to read the content of the file to see what is really written in it

Surely there is some Windows app or utility that can display the contents of a file (binary or not) in hexadecimal. Under MacOS I can open a Terminal window and use the Unix command-line utility 'hexdump'.

yungman · Sep 17, 2020

PeterDonis said:

That's easy: read the file in binary mode and look at each byte, in whatever notation you're most comfortable with. The file contains bytes, and reading it in binary mode just gives you those bytes. Then you just check that those bytes are the ones you wanted to write to the file. Which in turn means that, if your next question is "how do I know what bytes I wanted to write to the file?", then you need to fix that problem first--which means you need to understand what you are writing to the file.
You can see in my first post that the program already wrote the file in binary, and I read it back in both binary and text mode; they look the same. Also, in post #4, I already tried writing to a .txt file and reading it back in binary. It read out EXACTLY the same. Do you have a way to read the content without any encoding at all? That was my question.

You seem to be confused about what writing a file in text mode means. It doesn't mean you are magically writing something to the file besides bytes. It just means your program is taking the text you tell it to write to the file and encoding it to bytes, using some encoding. So to know what bytes should be in the file, you need to know what encoding is being used.

You also seem to be confused about what reading a file in text mode means. It means that you are assuming that the bytes in the file are to be interpreted as text--again, using some encoding, which means to understand how the bytes in the file get translated into text in your program, you need to know what encoding is being used. If you're going to print the bytes to the screen, reading them from the file in text mode and then printing them might be exactly equivalent to reading the file in binary mode and then printing the bytes you get as if they were text--which is basically what your program is doing. In either case your program is going to be using an encoding for the bytes to text translation--the only difference is whether the encoding is being used when the file is read from disk (if you open it in text mode) or when your program tries to print the data to the screen (if you open the file in binary mode and then try to print the bytes as text).

I am not confused; as I already said in the first post, ASCII is still binary in the sense that each number means a character. I might not use the word "encode", but each char is a unique binary number.

I am very familiar with bytes, binary and hex. I grew up with them in the days when everything we did in hardware, assembly language and machine language was BINARY or HEX. Of course it IS a given that they are encoded bytes. A .txt file is encoded (as you call it) in ASCII or something else; binary is different. My first post was just to verify that they are simply different formats (or encodings, as you say), and that you can do it either way (of course there are preferences and all that).

In fact, I am more comfortable working with bytes, binary and hex. Every bit means something. For example, 0AH means D3=1 and D1=1 and all the others are 0. I have trouble working with decimal, as that's NOT how the computer works; it works in binary (or, say, hex).

I understand you can make up your own encoding method and it is still in the form of bytes, binary and all that. That's a given. Whether your encoding method is useful mainly depends on whether other people are willing to follow it. If everyone likes it and uses it, your encoding can become popular, but if no one uses it, it's useless. It's the format; they are all bytes, binary (not what they call binary files, but JUST SIMPLE BINARY NUMBERS).

yungman · Sep 17, 2020

jtbell said:

Surely there is some Windows app or utility that can display the contents of a file (binary or not) in hexadecimal. Under MacOS I can open a Terminal window and use the Unix command-line utility 'hexdump'.

That's really my question. I have to look it up. I thought there must be a way just as simple as declaring binary or text to print it out.

PeterDonis · Sep 17, 2020

yungman said:

I am not confused

You might not think you are, but you keep saying things that indicate confusion. For example:

yungman said:

Of course it IS a given that they are encoded bytes. A .txt file is encoded (as you call it) in ASCII or something else; binary is different

No, you have it backwards. "Binary" and "text" files are not different; they're both just bytes. The "binary" or "text", and the "encoding" for "text", is how a program interprets the bytes in the file. It's not a property of the file. It's a property of the program that reads or writes the file. When we say a file is "binary" or "text", what we really mean is that the program we intend to use for reading/writing the file will interpret it as binary (one of the many interpretations that falls into that category) or text (and then, as I said, we have to know what text encoding the program will use).

PeterDonis · Sep 17, 2020

yungman said:

That's really my question.

The fact that it took you this many posts to ask it also indicates confusion. You could have just asked "does anyone know of a Windows app that will display files in binary/hex?" in the OP of this thread and saved everyone a lot of time and effort.

Jarvis323 · Sep 17, 2020

Is there really any encoding going on when reading and writing text? Whether it's in memory or on disk, it's still just bytes.

PeterDonis · Sep 17, 2020

Jarvis323 said:

Is there really any encoding going on when reading and writing text? Whether it's in memory or on disk, it's still just bytes.

There is if it's Unicode text: there are many different Unicode encodings, which are mappings of Unicode code points to bytes.

Plus, even if it's ASCII text, there are still bytes that are invalid (any byte 0x80 or higher), and the program has to check for them. Also, over the years, many programs and operating systems have devised encodings that use ASCII for bytes 0x7F and lower and invent new meanings for bytes 0x80 and higher (in old versions of Windows these were called "code pages"), and these are different possible text encodings as well.
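The check described above, that a program assuming ASCII has to reject bytes 0x80 and higher, is a one-line loop. A minimal sketch (is_ascii is a hypothetical name):

C++:
```cpp
#include <string>

// A byte sequence is valid ASCII only if every byte is in 0x00-0x7F.
// Bytes 0x80-0xFF have no ASCII meaning; a program assuming ASCII must
// reject them, or interpret them with some other encoding (e.g. a code page).
bool is_ascii(const std::string& bytes) {
    for (unsigned char b : bytes)   // unsigned char so 0x80-0xFF compare correctly
        if (b > 0x7F) return false;
    return true;
}
```

For example, is_ascii("Hello") is true, while the UTF-8 bytes of "café" (63 61 66 C3 A9) fail the check because of C3 and A9.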

yungman · Sep 18, 2020

I have some questions on reinterpret_cast. I marked up some notes on p. 678 of the Gaddis book, scanned it and posted it here. I red-lined the relevant parts and wrote notes; can you guys confirm whether I am correct?

1) From my understanding, reinterpret_cast can change one data type to another. The book shows ONLY converting to a char type; that's the reason the first argument is <char*>. Can it be <int*> also?

2) reinterpret_cast <datatype*>(value) is a POINTER. This is according to the example in the book.

3) In the example shown, it is converted to a binary file BECAUSE of file.write(reinterpret_cast<char*>(&x), sizeof(x)); the .write by definition converts to a binary file, not the reinterpret_cast.

Thanks

phinds · Sep 18, 2020

yungman said:

That's really my question. I have to look it up. I thought there must be a way just as simple as declaring binary or text to print it out.

I suggest HEX EDIT, a free app that displays both binary and text side by side (of course if the file is not a text file then the "text" displayed will be just random characters).

yungman · Sep 18, 2020

phinds said:

I suggest HEX EDIT, a free app that displays both binary and text side by side (of course if the file is not a text file then the "text" displayed will be just random characters).

I really thought I asked a very obvious and simple question. I was expecting a simple command that just doesn't encode and dumps out the hex value of each byte so I can look at it. How can it be simpler than that? Now I find out you have to go online to get a tool to dump the file!

Tom.G · Sep 18, 2020

Edit: see "The Ahhh moment" below.

From the program example you gave in the fourth post, it looks like cout defaults to treating its argument as a string (characters).

If you write a binary file that has 0x41 in it, it is written on disk as a bunch of 1's and 0's. When it is read back in, you can tell the program to treat it as:
binary (0100 0001)
octal (101)
hexadecimal (0x41)
ASCII (A)
It appears that cout assumes you want that bunch of 1's and 0's to be treated as alphabetic characters. There is probably a way to tell cout to show them in any of the above formats.

I'm not familiar with the language, so you can either look in the documentation for cout, or ask others here how to do that, if possible.
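There is: the formats in the list above can be selected with standard stream features. std::oct and std::hex are standard manipulators, and std::bitset<8> prints its bits as 0s and 1s; the stream chooses character output only when the value's type is char. A minimal sketch (show_byte is a hypothetical name) that renders one byte in all four notations:

C++:
```cpp
#include <bitset>
#include <sstream>
#include <string>

// Render one byte as binary, octal, hex and as a character.
std::string show_byte(unsigned char b) {
    std::ostringstream os;
    os << std::bitset<8>(b) << ' '                            // binary
       << std::oct << static_cast<int>(b) << ' '              // octal
       << "0x" << std::hex << std::uppercase
       << static_cast<int>(b) << ' '                          // hexadecimal
       << static_cast<char>(b);                               // as a character
    return os.str();
}
```

For example, show_byte(0x41) returns "01000001 101 0x41 A". The static_cast<int> matters: without it, the stream would print the unsigned char as a character regardless of std::hex or std::oct.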

The Ahhh moment:
After reading the program again, I noticed line 11:
char data[size] = "This is a test, another test, more test";
It declares "data[]" as an array of type 'char'.
When cout is passed "data[]", it recognizes that it is of type char and prints it as such.

So maybe not such a mystery after all!

Cheers,
Tom

Mark44 · Sep 19, 2020

yungman said:

I have some questions on reinterpret_cast. I marked up some notes on p. 678 of the Gaddis book, scanned it and posted it here. I red-lined the relevant parts and wrote notes; can you guys confirm whether I am correct?

1) From my understanding, reinterpret_cast can change one data type to another. The book shows ONLY converting to a char type; that's the reason the first argument is <char*>. Can it be <int*> also?

No, it's converting the type of &x to char *, not char. In other words it's converting &x, the address of x, which is of type int *, to a char pointer.

yungman said:

2) reinterpret_cast <datatype*>(value) is a POINTER. This is according to the example in the book.

Yes, in the case of the scanned image you posted, but it doesn't necessarily convert to a pointer type.

yungman said:

3) In the example shown, it is converted to a binary file BECAUSE of file.write(reinterpret_cast<char*>(&x), sizeof(x)); the .write by definition converts to a binary file, not the reinterpret_cast.

No, write doesn't convert the file. Presumably the file was already opened in binary mode (ios::binary). The reinterpret_cast operator converts the type of some variable to a different type.

The file.write() function above writes four bytes into the file. In hex, they are 1B 00 00 00.
If instead you do this:

C++:

int x = 27;
dataFile2.open("demoFile2.txt",  ios::out);
dataFile2 << x;

The stream insertion operator, <<, will write two bytes into demoFile2.txt: 32 37. These are the ASCII codes, in hex, of '2' and '7'.
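The two cases above can be checked side by side. A minimal sketch, assuming x = 27 as in the book's example; compare_int_output and the file name demoFile1.dat are hypothetical. It writes the int's own bytes with write() and its decimal text with <<, then returns both file sizes.

C++:
```cpp
#include <fstream>
#include <utility>

// Returns {size written by write(), size written by <<} for x = 27.
std::pair<long long, long long> compare_int_output() {
    int x = 27;
    {
        std::ofstream bin("demoFile1.dat", std::ios::out | std::ios::binary);
        // The int's own bytes: sizeof(int) of them
        // (1B 00 00 00 on a little-endian machine with a 4-byte int).
        bin.write(reinterpret_cast<const char*>(&x), sizeof(x));
    }
    {
        std::ofstream txt("demoFile2.txt", std::ios::out);
        txt << x;   // formatted output: the characters '2' '7' (32 37 in hex)
    }
    std::ifstream b("demoFile1.dat", std::ios::in | std::ios::binary | std::ios::ate);
    std::ifstream t("demoFile2.txt", std::ios::in | std::ios::binary | std::ios::ate);
    return {static_cast<long long>(b.tellg()), static_cast<long long>(t.tellg())};
}
```

The difference comes from the function used (write() copies bytes, << formats the number as characters), not from anything stored in the file that marks it as "binary" or "text".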

yungman · Sep 19, 2020


I kind of give up on this. The book makes a big deal about files written in text mode versus binary mode. The program in post #4 shows that if I write and store a binary file, I can read back the EXACT same thing just by reading it in text mode with getline!

Post #16 is similar too. I quoted the page to show exactly what the book said, but there is an inconsistency again.

Normally, when I study a subject, I get 3 or 4 books and I get clear answers to all my questions. Not with C++. With all due respect, C++ is NOT the most difficult subject; it's just that there are no straight answers. Going online really does NOT help. This is an example from looking up reinterpret_cast:
https://en.cppreference.com/w/cpp/language/reinterpret_cast
This is anything BUT a demonstration of how to use reinterpret_cast. It is NOT that hard if you just read the page of the book that I copied: just a simple translation from one type to the other! But then it's NOT.

I studied advanced calculus, electromagnetics and microwave RF on my own and used them on the job for years, so I don't think I am slow. You cannot put C++ in the same league as those. It's the inconsistency; it's almost like they make it up as they go. For example, I got tripped up by "*" and "&" in pointers and addresses. I have never seen another scientific subject where the same symbol means totally different things.

It's like in post 16: the book clearly says it is transforming from integer to char. Try to use the result as if it's a char, and it doesn't work. Unless I am totally wrong in reading those few lines on the page, it has a problem. 4 books, no answer; go online, and you get pieces like that.

If I could be happy working out problems the way the book does, no more and no less, I would not have nearly as many questions. In fact, I don't have any questions if I just follow the procedure in the book! But as soon as I step out of line a little, I find all the holes the book missed and never explains.

Mark44 · Sep 19, 2020

Tom.G said:

I'm not familiar with the language, so you can either look in the documentation for cout, or ask others here how to do that, if possible.

Actually, cout is just the output stream. The documentation to look at is for the stream insertion operator, <<.

Mark44 · Sep 19, 2020

yungman said:

I kind of give up on this. The book makes a big deal about files written in text mode versus binary mode. The program in post #4 shows that if I write and store a binary file, I can read back the EXACT same thing just by reading it in text mode with getline!

The two file types, text and binary, are not the same. I showed you an example in post #5 that demonstrates this. You can't use getline() to determine this, because getline() works with ordinary characters. With a binary file, there can be bytes that are not ordinary printable characters.

yungman said:

Post #16 is similar too. I quoted the page to show exactly what the book said, but there is an inconsistency again.

See my post above. You have some confusion about what the book is actually saying.

yungman said:

Normally, when I study a subject, I get 3 or 4 books and I get clear answers to all my questions. Not with C++. With all due respect, C++ is NOT the most difficult subject; it's just that there are no straight answers. Going online really does NOT help.

If a book's explanation is confusing, go to the documentation of the function that is causing the confusion.

yungman said:

This is an example from looking up reinterpret_cast:
https://en.cppreference.com/w/cpp/language/reinterpret_cast
This is anything BUT a demonstration of how to use reinterpret_cast. It is NOT that hard if you just read the page of the book that I copied: just a simple translation from one type to the other! But then it's NOT.

I don't know what you mean. It's a straightforward conversion from one type to another.

yungman said:

I studied advanced calculus, electromagnetics and microwave RF on my own and used them on the job for years, so I don't think I am slow. You cannot put C++ in the same league as those.
It's the inconsistency; it's almost like they make it up as they go. For example, I got tripped up by "*" and "&" in pointers and addresses. I have never seen another scientific subject where the same symbol means totally different things.

There are only so many characters on a standard keyboard, so I imagine this is the reason for different meanings for * and & in different contexts. In mathematics, what does - mean? Does it mean the negative of something or does it mean the difference of two expressions? The context can tell you.
For the * operator, the context tells you whether two things are being multiplied, a pointer variable is being declared, or a pointer variable is being dereferenced. And similar for &.
Rome wasn't built in a day, and you didn't learn advanced calculus in just a few weeks.
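The context rule for * and & fits in one small function. A minimal sketch (symbol_demo is a hypothetical name) using each symbol in its different roles, with comments saying which role applies on each line:

C++:
```cpp
// The same symbols with different meanings, decided by context.
int symbol_demo() {
    int a = 6, b = 7;
    int product = a * b;    // * as a binary operator: multiplication (42)
    int* p = &a;            // * in a declaration: p is a pointer to int
                            // & as a unary operator: the address of a
    *p = 10;                // * as a unary operator: dereference, so a becomes 10
    int& r = b;             // & in a declaration: r is a reference to b
    r = 3;                  // assigning through r changes b to 3
    int mask = a & 0x2;     // & as a binary operator: bitwise AND (10 & 2 == 2)
    return product + a + b + mask;   // 42 + 10 + 3 + 2
}
```

symbol_demo() returns 57. Just as with the minus sign in mathematics, the grammar around the symbol (declaration, unary, or binary position) tells you which meaning applies.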

yungman said:

It's like in post 16: the book clearly says it is transforming from integer to char. Try to use the result as if it's a char, and it doesn't work. Unless I am totally wrong in reading those few lines on the page, it has a problem.

No, as I explained in my previous post. It is converting an int pointer to a char pointer.

yungman · Sep 19, 2020

Mark44 said:

No, it's converting the type of &x to char *, not char. In other words it's converting &x, the address of x, which is of type int *, to a char pointer.

reinterpret_cast<datatype>(value), as given in the book and as I red-lined: datatype is the data type you convert to, and value is the value you are converting. This is STRAIGHT out of the page in post 16 that I red-lined. So it is CONVERTING from the data type of value to the data type in <datatype>.

The giveaway is the line ptr = reinterpret_cast<char*>(&x), which shows it is a pointer to char. So according to the line above, this is converting the integer value of x to a character.

Mark44 said:

Yes, in the case of the scanned image you posted, but it doesn't necessarily convert to a pointer type.

But then ptr = reinterpret_cast<char*>(&x) claims it is a pointer. For you, it's obvious because you know all this. But someone like me who is just starting to learn C++ has to take it very precisely and literally.

Mark44 said:

No, write doesn't convert the file. Presumably the file was already opened in binary mode (ios::binary). The reinterpret_cast operator converts the type of some variable to a different type.

I should have said that .write and .read ARE for binary files only; I don't mean they convert to a binary file. It's file.open("name.dat", ios::out | ios::binary) that determines that the stored file is going to be a binary file. That much is very clear to me. Actually, that is where my question is. I experimented:

C++:

    int Iw[] = { 1, 2, 3, 4, 5 };
    index = 0;
    test.open("test1.txt", ios::out);
    if (!test)
    {
        cout << " Fail to open test1.txt\n\n";
        return 0;
    }
    cout << " ready to write to test1.\n\n";
    while (index < sizeof(Iw))
    {
        test << reinterpret_cast<char*>(Iw[index]);
        cout << "index = " << index << " ";
        index++;
    }
    test.close();

It doesn't work. If I understand it correctly, I convert int Iw[] = {1, 2, 3, 4, 5} to characters, so I can just store them in the file the regular way. Don't pick on the sizeof(Iw); if it worked, something would have been written. It created test1.txt, but there's nothing in it.
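For comparison with the loop above: reinterpret_cast<char*>(Iw[index]) is applied to each int value (1, 2, 3, ...), not to an address, so the stream is handed a "pointer" whose address is 1, 2, 3, and << then tries to print a C string at that address. That is undefined behavior, typically a crash or a failed stream, which would fit an empty file. Separately, sizeof(Iw) is the array's size in bytes (20 with a 4-byte int), not its 5 elements. A minimal sketch of two working alternatives, one text and one binary; the function names are hypothetical.

C++:
```cpp
#include <fstream>
#include <string>

// Text: << formats each int as characters, e.g. 1 -> '1' (0x31).
void write_ints_as_text(const std::string& path) {
    const int Iw[] = {1, 2, 3, 4, 5};
    std::ofstream test(path, std::ios::out);
    for (int v : Iw)          // range-for visits 5 elements, no sizeof needed
        test << v << ' ';     // file contains "1 2 3 4 5 "
}

// Binary: the ints' own bytes. Here reinterpret_cast is applied to an
// ADDRESS (Iw decays to const int*), giving a char* to the first byte.
void write_ints_as_binary(const std::string& path) {
    const int Iw[] = {1, 2, 3, 4, 5};
    std::ofstream test(path, std::ios::out | std::ios::binary);
    test.write(reinterpret_cast<const char*>(Iw), sizeof(Iw));  // 5 * sizeof(int) bytes
}
```

The text file can be read back with >> into ints; the binary file must be read back with read() into an int array of the same size.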

Mark44 said:
The file.write() function above writes four bytes into the file. In hex, they are 1B 00 00 00.
If instead you do this:
C++:
int x = 27;
dataFile2.open("demoFile2.txt",  ios::out);
dataFile2 << x;
The stream insertion operator, <<, will write two bytes into demoFile2.txt: 32 37. These are the ASCII codes, in hex, of '2' and '7'.

Sorry, I did not see this post until now, when I answered the other one.

I know you have really gone out of your way to help me, and I really appreciate it. I cannot say that enough.

I really tried to look for answers myself, but a page can spell things out so clearly, and then when I experiment, it doesn't quite work the way it says. Like I said, if I had just followed the book's usage, I would have no questions; it's very simple. But as soon as I venture out a little, that's where all the problems start. Apparently reinterpret_cast is a much bigger subject, and this is only a very small portion of it. So I just have to use what the book shows in limited situations and "trust" that it will work!
Thanks

pbuk · Sep 19, 2020

phinds said:

I suggest HEX EDIT, a free app that displays both binary and text side by side (of course, if the file is not a text file, the "text" displayed will just be random characters).

There is a hex editor built into Visual Studio (which the OP is using). Use File -> Open -> File..., select the file in the dialog then click the drop-down on Open and select Open With... then choose Binary Editor.

phinds · Sep 19, 2020

pbuk said:

There is a hex editor built into Visual Studio (which the OP is using). Use File -> Open -> File..., select the file in the dialog then click the drop-down on Open and select Open With... then choose Binary Editor.

Very cool. Thanks. I've been using VS as long as there has BEEN a VS and I did not know that was there.

Mark44 · Sep 19, 2020

pbuk said:

There is a hex editor built into Visual Studio (which the OP is using). Use File -> Open -> File..., select the file in the dialog then click the drop-down on Open and select Open With... then choose Binary Editor.

I've only been using VS for 22 years, but didn't know that! Thanks!

Mark44 · Sep 19, 2020

yungman said:

The giveaway is the line ptr=reinterpret_cast<char*>(&x), which shows it is a pointer to char. So according to the line above, this converts the integer value of x to a character.

NO!
x is type int, but &x is the address of x, so its type is int * (int pointer).
The type being converted to is char *, not char.
The conversion is from an int pointer to a char pointer.

Maybe you need a stronger pair of glasses, because not noticing these details has caused you lots of confusion in all the threads you've posted that have questions about pointers.

yungman said:

But then ptr=reinterpret_cast<char*>(&x) claims it is a pointer.

The claim is true, for the reason above.

yungman said:

It doesn't work. If I understand it correctly, I'm converting int Iw[] = {1,2,3,4,5} to characters, so I can just store them in the file the regular way. Don't pick on the sizeof(Iw); you know that if it worked, something should have been written. It created test1.txt, but there's nothing in it.

Here's the code you wrote, line 12:

C++:

test << reinterpret_cast<char*>(Iw[index]);

You are not converting Iw[index] to a character.
The type of Iw[index] is int.
The type to convert to is in angle brackets, char *.
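Your code didn't work because Iw[index] is an int value, not an address. reinterpret_cast turns the value 1 into a pointer to memory address 1, which isn't valid memory, so when << tries to read characters from it, the behavior is undefined.
It would at least have been a valid pointer if the last expression were &Iw[index], since that is the address of that array element.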

yungman said:

Apparently reinterpret_cast is a much bigger subject, this is only very small portion.

It's really not that complicated. Here's the summary for this operator in the VS documentation (https://docs.microsoft.com/en-us/cpp/cpp/reinterpret-cast-operator?view=vs-2019).

Allows any pointer to be converted into any other pointer type. Also allows any integral type to be converted into any pointer type and vice versa.

What is implied in this summary is that you can't convert a pointer type to a floating-point type (float, double), or vice versa.

pbuk · Sep 19, 2020

phinds said:

Very cool. Thanks. I've been using VS as long as there has BEEN a VS and I did not know that was there.

Mark44 said:

I've only been using VS for 22 years, but didn't know that! Thanks!

Before I blush any more, I should point out that I only found this out myself today: I was about to post "I use Visual Studio Code for nearly all my coding these days and this has a useful add-in 'Hex Editor'" when I noticed that that add-in is credited to Microsoft and I thought I should check VS just in case ... and there it was, just a dropdown away from the 'File Open' dialog!

(Edit: for anyone who doesn't know, Visual Studio Code is a completely different product from Visual Studio, based on different technology; all they share is the name and their 'author', Microsoft.)

yungman · Sep 19, 2020

Mark44 said:

NO!
x is type int, but &x is the address of x, so its type is int * (int pointer).
The type being converted to is char *, not char.
The conversion is from an int pointer to a char pointer.

See, that's the problem with the book. I red-lined it on that page: the first line says dataType is the data type that you are converting to, and value is the value that you are converting. You tell me.

I know an expert like you knows exactly what's going on. But put yourself in the shoes of a student trying to learn: how deceiving can this be? One side of the mouth says it's a data conversion, the other side says it's a pointer!! ...in two sentences, one right after the other! I know they are character pointers, converting from one to another. But without finding more info, I have to assume reinterpret_cast TRANSLATES the value to characters, puts them in memory and gives back a char* pointer to them to be used! What else can I assume?

Then I go online and search, and find articles like this:
https://en.cppreference.com/w/cpp/language/reinterpret_cast

I am sure an expert like you can understand that, and to you it's obvious. But put yourself in the shoes of a student and read the 11 conditions. They might as well be written in Russian. It's almost as if their mission is to play with English so only a chosen few can understand. I have already studied 700 pages of the book, more than 3/4 of it, but I still have NO IDEA what the article is talking about. I have read it 5 or 6 times already, and there are so many terms I don't know. This is NOT helping students; it's for experts and advisors like the few of you on the forum here to say "Ah, now I see!" But then you guys don't need the article!

Mark44 said:

Maybe you need a stronger pair of glasses, because not noticing these details has caused you lots of confusion in all the threads you've posted that have questions about pointers.

That's exactly the reason: the book doesn't explain it clearly, and going online is NOT any better. Again, for experts like you and the others here, it's very obvious. But just put yourself in the shoes of a student trying to learn. Reading online is NO better.

Mark44 said:
The claim is true, for the reason above.
Here's the code you wrote, line 12:
C++:
test << reinterpret_cast<char*>(Iw[index]);
You are not converting Iw[index] to a character.
The type of Iw[index] is int.
The type to convert to is in angle brackets, char *.
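Your code didn't work because Iw[index] is an int value, not an address. reinterpret_cast turns the value 1 into a pointer to memory address 1, which isn't valid memory, so when << tries to read characters from it, the behavior is undefined.
It would at least have been a valid pointer if the last expression were &Iw[index], since that is the address of that array element.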
It's really not that complicated. Here's the summary for this operator in the VS documentation (https://docs.microsoft.com/en-us/cpp/cpp/reinterpret-cast-operator?view=vs-2019).

What is implied in this summary is that you can't convert a pointer type to a floating-point type (float, double), or vice versa.

Thanks for your patience. I'm just very frustrated trying to learn one step beyond the book. I have already given up and am trying to move on. I already did the problem using reinterpret_cast for the limited purpose of putting a structure into a binary file, with no problem. Maybe I should not dig deeper than that and just move on to random-access files. I have been stuck on this for over 2 days and am going nowhere. Maybe future chapters will make it clearer. If I just don't venture outside the book and get on with it, I might have far fewer questions and move a lot faster.

Thanks

Mark44 · Sep 19, 2020

yungman said:

See, that's the problem with the book. I red-lined it on that page: the first line says dataType is the data type that you are converting to, and value is the value that you are converting. You tell me.

Here's what it says on that page, including the part you underlined and the part you didn't.

where dataType is the data type that you are converting to, and value is the value that you are converting. For example, the following code uses the type cast to store the address of an int in a char pointer variable.

Here's the line in question:

C++:

 ptr=reinterpret_cast<char*>(&x);

In what I quoted, dataType is char *, or char pointer, and value is &x, which is int pointer. The sentence that starts with "For example, ..." explains that an address of one type is being cast as an address of another type. IOW, the cast is changing the type of pointer.

yungman said:

I know an expert like you knows exactly what's going on. But put yourself in the shoes of a student trying to learn: how deceiving can this be? One side of the mouth says it's a data conversion, the other side says it's a pointer!! ...in two sentences, one right after the other! I know they are character pointers, converting from one to another. But without finding more info, I have to assume reinterpret_cast TRANSLATES the value to characters, puts them in memory and gives back a char* pointer to them to be used! What else can I assume?

It's not deceiving at all if you read it carefully. And no, reinterpret_cast does not translate a value to characters. It converts a pointer of one type to a pointer of another type.

Jarvis323 · Sep 19, 2020

yungman said:

I kind of give up on this. The book makes a big thing about files written in text mode versus binary mode. The program in post #4 shows that if I write and store a binary file, I can read back the EXACT same thing just by reading it in text mode with getline!

Post #16 is very similar. I quoted the page and showed exactly what the book said. But there is inconsistency again.

Normally, when I study a subject, I get 3 or 4 books and get a clear answer to all my questions. Not with C++. With all due respect, C++ is NOT the most difficult subject; it's just that there are no straight answers. Going online really does NOT help. This is an example from looking up reinterpret_cast:
https://en.cppreference.com/w/cpp/language/reinterpret_cast
This is anything BUT showing how to use reinterpret_cast. It is NOT that hard if you just read the page of the book that I copied out: just a simple translation from one type to the other! But then it's NOT.

I studied advanced calculus, electromagnetics and microwave RF on my own and used them on the job for years, so I don't think I am slow. You cannot put C++ in the same league as those. It's the inconsistency; it's almost like they make it up as they go. For example, I got tripped up by "*" and "&" in pointers and addresses. I have never seen another scientific subject use the same symbol to mean totally different things.

It's like in post 16: the book clearly says it is transforming from integer to char. Try using the result as if it's a char, and it doesn't work. Unless I am totally wrong in reading those few lines on the page, the book has a problem. 4 books and no answer; go online, and you get pieces like that.

If I could be happy just working problems like the book, no more and no less, I would not have nearly as many questions. In fact, I don't have any questions if I just follow the procedure in the book! But as soon as I step out of line a little, I find all the holes that the book missed and never explains.

https://en.cppreference.com/w/cpp/language/reinterpret_cast

That seems like a pretty precise and complete explanation to me. And it does have examples. You can even edit the example, and recompile and run it right there on the web page.

The explanation gives more and more detail as it goes on, but I think this is all of it that you need to read to get the idea.

In fact, you can understand it pretty well if you just carefully read this sentence alone:

"Converts between types by reinterpreting the underlying bit pattern."

It might be important to know that "reinterpret the underlying bit pattern" implies that the underlying bit pattern is not changed, just interpreted differently.

All of that extra detail, in the list and so forth, is difficult to understand, even for intermediates. In C++ there is a lot of detail, and you haven't even gotten to (and probably never will get to) the most difficult stuff to understand or memorize. I feel that C++ is almost like a bottomless pit. Most people just stop at some point and work with the knowledge they have. It can take many, many years and dedication to become a true expert in C++. I've been using C++ for about 10 years, and I wouldn't consider myself even close to being a C++ expert. There is a lot in the language that I don't use and haven't bothered to memorize. My philosophy is to keep things relatively simple anyway. Whenever I feel the need to use features that I haven't studied in depth, or don't trust my memory, I check the documentation. This is how programming is in general, though: you get used to reading documentation efficiently and carefully.

Binary files become more useful when writing and reading non-text data (e.g. numeric data). For example, the float 2.984938984398489 written as text uses an 8-bit ASCII code for each digit as well as the decimal point. That's 17 bytes. Meanwhile, its binary representation is 4 bytes. Also, its binary representation is exactly what is stored in memory, so if you write it out, you can expect to read it back and get exactly the same thing. And if you were to read it as text, you would then have to use an algorithm to convert the string to a float.

Also, binary file IO is much faster than text IO, because it's literally just copying the bits into memory without any parsing, checking or encoding. For example, if you are using getline, it scans through the text for the next '\n'. That will be slow compared to copying a fixed number of bytes straight into memory.

yungman · Sep 19, 2020

Mark44 said:
Here's what it says on that page, including the part you underlined and the part you didn't.

Here's the line in question:
C++:
 ptr=reinterpret_cast<char*>(&x);
In what I quoted, dataType is char *, or char pointer, and value is &x, which is int pointer. The sentence that starts with "For example, ..." explains that an address of one type is being cast as an address of another type. IOW, the cast is changing the type of pointer.
It's not deceiving at all if you read it carefully. And no, reinterpret_cast does not translate a value to characters. It converts a pointer of one type to a pointer of another type.

Thanks for the explanation.

So there's no translation, just forcing the int* pointer to become a char* pointer that points to the same address &x? That's it? Would it kill them to add a few extra words saying that <dataType> here is a pointer to the data type it's converting to? I would not have wasted two days.

BTW, I tried your suggestion of &Iw[index] instead of Iw[index] from post 24. This is what is written in the file:

Iw[] = {1,2,3,4,5};

Jarvis323 · Sep 19, 2020

yungman said:

So there's no translation, just forcing the int* pointer to become a char* pointer that points to the same address &x? That's it? Would it kill them to add a few extra words saying that <dataType> here is a pointer to the data type it's converting to? I would not have wasted two days.

It is just treating it as if it's a different type, but not changing what is stored in memory.

In general, dataType is just a type, not necessarily a pointer.

You could have saved those days just reading this sentence from en.cppreference carefully:

"Converts between types by reinterpreting the underlying bit pattern."

yungman · Sep 19, 2020

Jarvis323 said:

It is just treating it as if it's a different type, but not changing what is stored in memory.

In general, dataType is just a type, not necessarily a pointer.

You could have saved those days just reading this sentence from en.cppreference carefully:

"Converts between types by reinterpreting the underlying bit pattern."

Ha ha, my English is bad, and this sounds even worse.
