Is there a byte that never occurs in a float?

Jarfi · Nov 1, 2017

A float consists of four bytes. I am sending a stream of floats over radio. Occasionally this data is not a float, though, and sometimes I want to stop logging the floats from the stream by inserting a "stop char" that the program checks for.

Previously I was using a simple LF or NULL. Obviously this did not work, as many float values contain a null byte, such as 0.00.

LF also appears in some floats, as it is simply the byte value 10 (0x0A). Statistically, almost any byte value will eventually appear inside a float, so I cannot insert a stop char: there would be too many false positives in the float stream. Is there any byte that is statistically impossible to show up in a float, and that could thus be used as a stop char?

jedishrfu · Nov 1, 2017

Java and other languages reserve NaN bit patterns in a float. I think that would be the strategy. They also have bit patterns for negative and positive infinity.

https://en.m.wikipedia.org/wiki/NaN

jim mcnamara · Nov 1, 2017

There are two non-numbers, NaN and INF, and I think jedishrfu picked the best choice. Next is the question: why a one-byte (or char) stop? How can you have a stream of floats (4 bytes each for IEEE 754) and then have just a single stop byte/char? You need 4 bytes to make a float. I suppose you could define some kind of union, but the two non-numbers are four bytes as well, not a char. Although, in theory at least, CHAR_BIT can be 32 if the hardware supports it.

Enlighten us about stop char. I must simply not understand.

jbriggs444 · Nov 1, 2017

Jarfi said:

A float consists of four bytes. I am sending a stream of floats over radio. Occasionally this data is not a float, though, and sometimes I want to stop logging the floats from the stream by inserting a "stop char" that the program checks for.

The mantissa of a normalized float can contain an arbitrary binary value. For a 32-bit float, the 23 expressed bits in the mantissa end on a byte boundary, so there are at least two bytes that can take on any value. That means that no one-byte or two-byte stop pattern is viable.
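A quick way to check the mantissa argument, as a C sketch that is not from the thread (the helper names are made up, and a 32-bit IEEE 754 float is assumed): put each of the 256 byte values into the lowest or second-lowest byte of 1.0f and confirm that every result is an ordinary normal number.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float (assumes 32-bit IEEE 754 float). */
static float bits_to_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Returns 1 if, for every byte value b, the patterns 0x3F8000bb and
   0x3F80bb00 (1.0f with b placed in the lowest or second-lowest mantissa
   byte) are both finite, normal floats, i.e. legitimate data values. */
int every_byte_appears_in_a_normal_float(void) {
    for (uint32_t b = 0; b <= 0xFF; b++) {
        float lo = bits_to_float(0x3F800000u | b);
        float mid = bits_to_float(0x3F800000u | (b << 8));
        if (fpclassify(lo) != FP_NORMAL || fpclassify(mid) != FP_NORMAL)
            return 0;
    }
    return 1;
}
```

All of these values lie between 1.0 and about 1.008, so nothing about them is unusual enough to be ruled out as data.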

The bit pattern for a 32-bit NaN has almost 24 "don't care" bits. It has an exponent of all ones, a sign bit and a 23-bit mantissa. The all-zeroes value of the mantissa is used to express plus and minus infinity. All the other 16,777,214 patterns are available for NaNs.
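The 16,777,214 figure follows from the field widths: the exponent is fixed at all ones, the sign bit is free, and the 23-bit mantissa can be anything except zero (zero is reserved for the two infinities):

$$2 \times \left(2^{23} - 1\right) = 2 \times 8{,}388{,}607 = 16{,}777{,}214$$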

I am no expert but...

https://stackoverflow.com/questions/19800415/why-does-ieee-754-reserve-so-many-nan-values

You could glom onto a particular NaN and be fairly likely not to see that four byte pattern used in a real life IEEE float.

Jarfi · Nov 1, 2017

jbriggs444 said:

You could glom onto a particular NaN and be fairly likely not to see that four byte pattern used in a real life IEEE float.

I see. I reckon checking each float for NaN would be fairly straightforward. I'll either have to do that or make up a password composed of, say, 8 bytes, where the program always checks the newest two floats for a match. It's vanishingly unlikely that this password would turn up naturally. There could even be a register of numbers (either extremely high or low) that are not expected in the data stream, each encoded to a different signal (stop feed, new file, etc.).

Come to think of it, this seems like a good way to hide data in white noise and other inconspicuous data streams (steganography). I'm sure it's been done.
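The two-float "password" check can be sketched in C as below. The 8-byte value is an arbitrary placeholder, the function name is hypothetical, and the buffer is assumed to hold whole 4-byte floats starting at offset 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte "password" marking a stop; any value works as long as
   the two floats it spans are not expected in the real data. */
static const uint8_t STOP_PASSWORD[8] = {0xDE, 0xAD, 0xBE, 0xEF,
                                         0xFE, 0xED, 0xFA, 0xCE};

/* Scans a buffer of whole 4-byte floats and returns the byte offset of the
   first float of the password, or -1 if it does not occur. After each float
   boundary, the newest two floats (8 bytes) are compared with the password. */
long find_stop_password(const uint8_t *buf, size_t len) {
    for (size_t end = 8; end <= len; end += 4) {
        if (memcmp(buf + end - 8, STOP_PASSWORD, 8) == 0)
            return (long)(end - 8);
    }
    return -1;
}
```

The comparison is on raw bytes, so no floating-point interpretation is involved, but a real pair of floats with exactly this bit pattern would still be misread as a stop.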

Jarfi · Nov 1, 2017

jim mcnamara said:

There are two non-numbers, NaN and INF, and I think jedishrfu picked the best choice. Next is the question: why a one-byte (or char) stop? How can you have a stream of floats (4 bytes each for IEEE 754) and then have just a single stop byte/char? You need 4 bytes to make a float. I suppose you could define some kind of union, but the two non-numbers are four bytes as well, not a char. Although, in theory at least, CHAR_BIT can be 32 if the hardware supports it.

Enlighten us about stop char. I must simply not understand.

A data stream is always a byte feed under the hood. The first thing to do is to log the data into a byte array; then it is up to you how you interpret that data. You can look for whatever you want, be it floats (4-byte arrays) or a stop char that tells the program to stop logging data.
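Interpreting the logged byte array as floats means both ends have to agree on byte order. A minimal C sketch, assuming the sender transmits each float's IEEE 754 bit pattern least significant byte first, and that float and uint32_t share a byte order on the receiver (true on common platforms); the function name is hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Decodes the float stored in 4 bytes at p, assuming little-endian
   transmission order. Assembling the integer with shifts makes the result
   independent of the receiver's own byte order. */
float float_from_le_bytes(const uint8_t *p) {
    uint32_t u = (uint32_t)p[0]
               | ((uint32_t)p[1] << 8)
               | ((uint32_t)p[2] << 16)
               | ((uint32_t)p[3] << 24);
    float f;
    memcpy(&f, &u, sizeof f); /* reinterpret the bits without aliasing issues */
    return f;
}
```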

I like Serena · Nov 1, 2017

It seems to me that it's not safe to make assumptions of NaN encodings.
However, if your stream of floats contains only valid floating point values, we can detect any INF or NaN patterns and consider them a stop sequence.
That's an FF value for the exponent, which is encoded somewhat awkwardly since these are bits 2-17, meaning they don't quite match a specific byte.
Additionally, we'll probably run into endianness issues: on a little-endian machine the bytes are stored in reverse order, so the exponent bits are not consecutive in the byte stream.

jbriggs444 · Nov 1, 2017

You could always use some form of byte stuffing. https://en.wikipedia.org/wiki/Consistent_Overhead_Byte_Stuffing
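To show why byte stuffing answers the original question, here is a sketch of COBS in C following the algorithm described on the linked Wikipedia page (the function names are my own). After encoding, a packet of floats contains no 0x00 bytes at all, so a single 0x00 really can serve as the stop char.

```c
#include <stddef.h>
#include <stdint.h>

/* COBS-encodes len bytes from in into out and returns the encoded length.
   out must have room for len + len / 254 + 1 bytes. The result contains
   no 0x00 bytes, so the caller can append a single 0x00 as a delimiter. */
size_t cobs_encode(const uint8_t *in, size_t len, uint8_t *out) {
    size_t write = 1, code_idx = 0; /* out[code_idx] holds the pending code */
    uint8_t code = 1;               /* 1 + number of non-zero bytes in block */
    for (size_t read = 0; read < len; read++) {
        if (in[read] == 0) {
            out[code_idx] = code;   /* close the block; the zero is implied */
            code = 1;
            code_idx = write++;
        } else {
            out[write++] = in[read];
            if (++code == 0xFF) {   /* block full: 254 data bytes */
                out[code_idx] = code;
                code = 1;
                code_idx = write++;
            }
        }
    }
    out[code_idx] = code;
    return write;
}

/* Decodes a COBS block (without the trailing 0x00 delimiter). Returns the
   decoded length, or (size_t)-1 if the input is malformed. */
size_t cobs_decode(const uint8_t *in, size_t len, uint8_t *out) {
    size_t read = 0, write = 0;
    while (read < len) {
        uint8_t code = in[read++];
        if (code == 0 || read + code - 1 > len)
            return (size_t)-1;
        for (uint8_t i = 1; i < code; i++)
            out[write++] = in[read++];
        if (code != 0xFF && read < len)
            out[write++] = 0;       /* restore the zero the code stood for */
    }
    return write;
}
```

The sender transmits the encoded bytes followed by one 0x00; the receiver collects bytes up to each 0x00 and decodes them. The overhead is at most one byte per 254 bytes of payload, plus the delimiter.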

I like Serena · Nov 1, 2017

I guess we could also use FF FF FF FF as delimiter. It's a NaN value - assuming NaN values are not part of the floating point stream.
Or to be safe, we could pick FF FF FF FF FF FF FF if we're not quite sure where the floating point boundaries are.
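Reading FF FF FF FF as a float does give a NaN whatever the byte order, since the exponent field and the mantissa are both all ones. A small C check (assumes 32-bit IEEE 754 floats; the function name is hypothetical). For the unknown-alignment case, a receiver whose float boundary may fall up to 3 bytes into the run needs 3 + 4 = 7 bytes of 0xFF to be sure of seeing one complete aligned FF FF FF FF word.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reads the bytes FF FF FF FF as a float and reports whether it is a NaN.
   Exponent all ones plus a non-zero mantissa means NaN (here a negative one). */
int all_ones_is_nan(void) {
    const uint8_t bytes[4] = {0xFF, 0xFF, 0xFF, 0xFF};
    float f;
    memcpy(&f, bytes, sizeof f);
    return isnan(f) != 0;
}
```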

jedishrfu · Nov 1, 2017

I like Serena said:

I guess we could also use FF FF FF FF as delimiter. It's a NaN value - assuming NaN values are not part of the floating point stream.
Or to be safe, we could pick FF FF FF FF FF FF FF if we're not quite sure where the floating point boundaries are.

It's really best to follow the IEEE guidelines here, as these values are recognized by arithmetic hardware.

https://stackoverflow.com/questions/25050133/are-the-bit-patterns-of-nans-really-hardware-dependent

phinds · Nov 1, 2017

Jarfi said:

A data stream is always a byte feed under the hood. The first thing to do is to log the data into a byte array; then it is up to you how you interpret that data. You can look for whatever you want, be it floats (4-byte arrays) or a stop char that tells the program to stop logging data.

You are missing the point. A single byte would not even come close to guaranteeing you a unique stop pattern since any 8-bit pattern would occur in HUGE numbers of different legitimate float numbers, so the fact that the transmission happens to be a byte at a time is irrelevant. No one byte is going to tell you anything significant.

jim mcnamara · Nov 1, 2017

Phinds explained my view exactly.

I like Serena · Nov 1, 2017

A single byte is not going to do the job.
To do the job we need a sequence of bytes with a specific pattern.

jim mcnamara · Nov 1, 2017

@I like Serena - jedishrfu explained the problem precisely as well. Hardware recognizes NaN and INF values, so fooling around with making other bytes of the float some arbitrary value is pointless. Below is the definition of a NaN.

If you look in the (UNIX/Linux) /usr/include directory (possibly in a subdirectory), you may find ieeefp.h on some systems. Look at the defines there; those are IEEE-754 compliant.

The value NaN (Not a Number) is used to represent a value that does not represent a real number. NaN's are represented by a bit pattern with an exponent of all 1s and a non-zero fraction. There are two categories of NaN: QNaN (Quiet NaN) and SNaN (Signalling NaN).

From:
http://steve.hollasch.net/cgindex/coding/ieeefloat.html

Also read the first figure 'Floating point components' to see where an exponent is encoded in floats.
Note the mention of the signaling SNaN.

Further detail on the two types: https://www.doc.ic.ac.uk/~eedwards/compsys/float/nan.html
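Whether a NaN is quiet or signaling is encoded in the mantissa. A C sketch, assuming the IEEE 754-2008 recommended convention (quiet bit = most significant mantissa bit), which x86 and ARM follow; legacy MIPS used the opposite sense, so treat this as platform-dependent. The names are hypothetical.

```c
#include <stdint.h>

typedef enum { NOT_NAN, QUIET_NAN, SIGNALING_NAN } nan_kind;

/* Classifies a 32-bit float bit pattern. Assumes a NaN is quiet when the
   most significant mantissa bit (bit 22) is set, signaling when it is clear. */
nan_kind classify_nan_bits(uint32_t u) {
    uint32_t exponent = (u >> 23) & 0xFFu;
    uint32_t mantissa = u & 0x7FFFFFu;
    if (exponent != 0xFFu || mantissa == 0)
        return NOT_NAN;             /* finite number or infinity */
    return (mantissa & 0x400000u) ? QUIET_NAN : SIGNALING_NAN;
}
```

This matters for a marker: an operation on a signaling NaN raises the invalid-operation exception and delivers a quiet NaN, so a signaling NaN that passes through arithmetic may not arrive with the bit pattern that was sent. Comparing raw bytes avoids the issue.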

I like Serena · Nov 1, 2017

Erm... @jim mcnamara, are you contradicting me, confirming me, or clarifying what I wrote?
If you're contradicting me, which it almost seems you're doing from how you've phrased your response, can you clarify please?
No need to explain how an IEEE number is encoded - I already know.

Edit: rereading your response a couple of times, it seems you're confirming what I wrote, and you seamlessly seem to have added more information on how IEEE numbers work.
I guess I should have started my own sentence with 'Indeed' or some such to avoid a sense of contradiction. Sorry for that.

FactChecker · Nov 1, 2017

As others have stated several times, there is no way to use a single byte. And it is very easy to list a set of floats that contain every combination of possible bytes. The safest and most common solution is to attach a coded prefix to each message that indicates the length or type of information in the message.
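A coded prefix can be as simple as a type byte and a count byte. The layout, enum values, and function names below are hypothetical, chosen only to illustrate the idea in C; floats are copied in the sender's native byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message layout: one type byte, one count byte, then `count`
   raw 4-byte floats. Because the receiver knows exactly how many bytes
   follow, no byte value ever has to be reserved as a stop char. */
enum { MSG_FLOATS = 1, MSG_STOP = 2 };

/* Writes a message into out (which needs 2 + 4 * count bytes) and returns
   its length. */
size_t write_message(uint8_t type, const float *vals, uint8_t count,
                     uint8_t *out) {
    out[0] = type;
    out[1] = count;
    memcpy(out + 2, vals, (size_t)count * sizeof(float));
    return 2 + (size_t)count * sizeof(float);
}

/* Parses one message from buf. Returns the number of bytes consumed, or 0
   if buf does not yet hold a complete message. */
size_t read_message(const uint8_t *buf, size_t len, uint8_t *type,
                    float *vals, uint8_t *count) {
    if (len < 2)
        return 0;
    size_t need = 2 + (size_t)buf[1] * sizeof(float);
    if (len < need)
        return 0;
    *type = buf[0];
    *count = buf[1];
    memcpy(vals, buf + 2, (size_t)buf[1] * sizeof(float));
    return need;
}
```

The catch is synchronization: if a single byte is dropped, every later header is misread.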

wle · Nov 1, 2017

Or pick some rarely used floating point number x (like some random NULL value) to use as the start of an escape sequence (like '\' in string literals): send x followed by some different float y to signal some special condition (like "end of transmission"), and just send x twice if you want to actually transmit the float x itself.
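The escape scheme can be sketched in C as follows. The values chosen for x (ESC) and for the end-of-transmission code y (CTRL_END) are arbitrary placeholders, and the function names are hypothetical. Comparisons use bit patterns rather than ==, so that -0.0 and NaNs are handled predictably.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical escape value x and control code y. */
static const float ESC = -9999.5f;
static const float CTRL_END = 1.0f;

static int same_bits(float a, float b) { return memcmp(&a, &b, sizeof a) == 0; }

/* Appends data float v to out, doubling it if it equals ESC.
   Returns the number of floats written (1 or 2). */
size_t put_data(float v, float *out) {
    out[0] = v;
    if (!same_bits(v, ESC))
        return 1;
    out[1] = ESC;
    return 2;
}

/* Appends the end-of-transmission sequence ESC, CTRL_END. */
size_t put_end(float *out) {
    out[0] = ESC;
    out[1] = CTRL_END;
    return 2;
}

/* Decodes n received floats into data. Returns the number of data floats;
   *ended is set to 1 if ESC followed by CTRL_END was seen. */
size_t decode(const float *in, size_t n, float *data, int *ended) {
    size_t count = 0;
    *ended = 0;
    for (size_t i = 0; i < n; i++) {
        if (!same_bits(in[i], ESC)) {
            data[count++] = in[i];      /* ordinary data float */
            continue;
        }
        if (i + 1 >= n)
            break;                      /* ESC at the very end: incomplete */
        float y = in[++i];              /* the float after ESC */
        if (same_bits(y, ESC)) {
            data[count++] = ESC;        /* ESC ESC: literal x */
        } else if (same_bits(y, CTRL_END)) {
            *ended = 1;                 /* ESC y: end of transmission */
            break;
        }                               /* other ESC y codes are ignored here */
    }
    return count;
}
```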

rootone · Nov 1, 2017

If you are not expecting the 4-byte pattern for infinity ever to turn up in your normal data stream, you could kludge things by using it as a stop marker.
It's not a single byte but it should work.
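Positive infinity has a fixed bit pattern, 0x7F800000 for a 32-bit IEEE 754 float (sign 0, exponent all ones, mantissa zero). A C sketch of building and recognizing it, using the standard INFINITY and isinf from <math.h>; the function names are hypothetical. It assumes the sender's own arithmetic never overflows to +infinity, since a legitimate overflow would then look like a stop.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Returns the 32-bit pattern the sender would transmit as the stop marker. */
uint32_t positive_infinity_bits(void) {
    float f = INFINITY;
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Receiver-side check: treat a decoded float as the stop marker if it is
   +infinity (negative infinity is left alone). */
int is_stop_marker(float f) {
    return isinf(f) && f > 0.0f;
}
```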

FactChecker · Nov 1, 2017

I agree with @rootone . If real physical data is being transmitted in engineering units and not scaled, then there are numbers so large that they will never appear. You can use a huge number as a termination flag. That takes more than 1 byte, but it is simple to implement.

jedishrfu · Nov 1, 2017

FactChecker said:

I agree with @rootone . If real physical data is being transmitted in engineering units and not scaled, then there are numbers so large that they will never appear. You can use a huge number as a termination flag. That takes more than 1 byte, but it is simple to implement.

Again, it's better to use the positive infinity bit pattern than a large number. That's so old school, à la the "999999999" card at the end of a card deck.

Alternatively, you could place a count of float numbers that follow as the first number in the list.

Also, you have to watch the cleaning crew as they would sometimes pluck the last card of the deck in the hopper to pick up dust on the ground and cause the batch job to fail. True story... but I digress.

FactChecker · Nov 1, 2017

jedishrfu said:

Again, it's better to use the positive infinity bit pattern than a large number. That's so old school, à la the "999999999" card at the end of a card deck.

Ok. I'll buy that.

Alternatively, you could place a count of float numbers that follow as the first number in the list.

That is what I have almost always seen -- some standard header that indicates the length or structure of the message that it is prepended to. But that either requires buffering up a bunch of data or every number would need a header. We don't know what the nature of the data transmission is.

Also, you have to watch the cleaning crew as they would sometimes pluck the last card of the deck in the hopper to pick up dust on the ground and cause the batch job to fail. True story... but I digress.

Ha! That would be annoying after a weekend run.

jbriggs444 · Nov 2, 2017

The problem at hand can be seen as one of "framing". We can imagine this as a stream of arbitrary byte values. The task is to identify boundaries in the stream without impairing the ability of the user to transmit arbitrary binary data over the framed stream.

http://www.linktionary.com/f/framing.html

One way of doing framing is, as has been suggested, to use a packet format that includes a byte count. As long as sending and receiving equipment can agree on an initial frame boundary and never lose synch, the rest is easy.

Some framing techniques concern themselves with the problem of establishing or re-establishing synchronization. For this you need something more than just a byte count. You need to establish a pattern that can never occur in the transmitted data except as a frame boundary.

See, for instance the section on asynchronous framing in this document:

https://en.wikipedia.org/wiki/High-Level_Data_Link_Control#Asynchronous_framing.
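A simplified C sketch of the asynchronous (byte-stuffed) HDLC framing described in the linked section, as PPP uses it: 0x7E marks frame boundaries, 0x7D escapes, and an escaped byte is sent XORed with 0x20. It omits the frame check sequence and control-character escaping, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { FLAG = 0x7E, ESCAPE = 0x7D };

/* Wraps len payload bytes in flags, escaping any FLAG or ESCAPE bytes.
   out needs room for 2 + 2 * len bytes. Returns the framed length. */
size_t hdlc_frame(const uint8_t *in, size_t len, uint8_t *out) {
    size_t w = 0;
    out[w++] = FLAG;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == FLAG || in[i] == ESCAPE) {
            out[w++] = ESCAPE;
            out[w++] = in[i] ^ 0x20;    /* 0x7E -> 7D 5E, 0x7D -> 7D 5D */
        } else {
            out[w++] = in[i];
        }
    }
    out[w++] = FLAG;
    return w;
}

/* Removes escaping from the bytes between two flags and returns the
   unstuffed length. A receiver that loses sync discards bytes until the
   next 0x7E, which can never appear inside a stuffed frame. */
size_t hdlc_unstuff(const uint8_t *in, size_t len, uint8_t *out) {
    size_t w = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == ESCAPE && i + 1 < len)
            out[w++] = in[++i] ^ 0x20;
        else
            out[w++] = in[i];
    }
    return w;
}
```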

FactChecker · Nov 2, 2017

jbriggs444 said:

One way of doing framing is, as has been suggested, to use a packet format that includes a byte count.

Or a message type that implies a known length for each type. I think that this could be smaller than a byte count, although I have never seen it used alone. I have only seen it used where the underlying message protocol included a byte count.

As long as sending and receiving equipment can agree on an initial frame boundary and never lose synch, the rest is easy.

Is it ever safe to assume that they would never lose sync?

Some framing techniques concern themselves with the problem of establishing or re-establishing synchronization. For this you need something more than just a byte count. You need to establish a pattern that can never occur in the transmitted data except as a frame boundary.

See, for instance the section on asynchronous framing in this document:

https://en.wikipedia.org/wiki/High-Level_Data_Link_Control#Asynchronous_framing.

Very interesting. I have a feeling that standard protocols must use this at the lower levels to recover from losing sync. Is that right?

newjerseyrunner · Nov 2, 2017

I'm a little confused.

So you have a stream of floats and you want to use a specific input to cause some type of interrupt in your code? You want a specific byte that never appears in a float?

That shouldn't be possible; there is no way you can know that at the byte-by-byte level. So I'd deal only with floats and not worry about individual byte encoding. Why would you go so low-level that you're not even sure of the endianness you'll be receiving? I can see no performance benefit to that. Is there some performance or technical reason that you don't want to use something more standard like TCP/IP?

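Code:

#include <string.h>
/* Stand-in for the hypothetical socket.recvNextFloat(): assumes the four
   received bytes are already in rx[], in the receiver's native byte order. */
void handle_next(const unsigned char rx[4]) {
    float nextNumber;
    memcpy(&nextNumber, rx, sizeof nextNumber);
    if (nextNumber != nextNumber) {
        // encountered a NaN, since a NaN is not equal to itself
    } else {
        // have a normal float or infinity
    }
}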

jbriggs444 · Nov 2, 2017

FactChecker said:

Or a message type that implies a known length for each type. I think that this could be smaller than a byte count, although I have never seen it used alone. I have only seen it used where the underlying message protocol included a byte count.

Is it ever safe to assume that they would never lose sync?

Very interesting. I have a feeling that standard protocols must use this at the lower levels to recover from losing sync. Is that right?

It's a common problem, yes. If one is delivering a bit stream on a physical medium, one has to identify bit boundaries (things like NRZI, PE, GCR, Manchester encoding or T1 framing). If one is delivering an octet stream over a bit stream, one has to identify byte boundaries (if you sync up on a bit boundary and have a good clock, bit-level synchronization can take care of this, but you do need to worry about clocking). If one is delivering a packet stream over a byte stream, one has to identify packet boundaries (with things like bisync, DDCMP, HDLC and PPP).

None of this is really my area of expertise. I tend to deal with higher layers in the network stack.

Mark44 · Nov 2, 2017

I like Serena said:

It seems to me that it's not safe to make assumptions of NaN encodings.
However, if your stream of floats contains only valid floating point values, we can detect any INF or NaN patterns and consider them a stop sequence.
That's an FF value for the exponent, which is encoded somewhat awkwardly since these are bits 2-17, meaning they don't quite match a specific byte.

For a float, which is what I think we're talking about here, the exponent field is bits 30...23, with bit 31 being the sign bit. I don't get what you're saying about bits 2 - 17.
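The field layout described here can be checked in C. This sketch (names hypothetical, 32-bit IEEE 754 float assumed) numbers bits from 0 at the least significant end, as the Wikipedia page does, and uses the all-ones exponent to detect infinities and NaNs.

```c
#include <stdint.h>
#include <string.h>

/* Sign in bit 31, exponent in bits 30..23, mantissa (fraction) in 22..0. */
typedef struct { uint32_t sign, exponent, mantissa; } float_fields;

float_fields split_float(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    float_fields r = { u >> 31, (u >> 23) & 0xFFu, u & 0x7FFFFFu };
    return r;
}

/* An exponent of all ones (0xFF) marks an infinity or a NaN, which is the
   "detect any INF or NaN pattern" test suggested earlier in the thread. */
int is_inf_or_nan(float f) {
    return split_float(f).exponent == 0xFFu;
}
```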

I like Serena · Nov 2, 2017

Mark44 said:

For a float, which is what I think we're talking about here, the exponent field is bits 30...23, with bit 31 being the sign bit. I don't get what you're saying about bits 2 - 17.

I looked at https://en.wikipedia.org/wiki/Single-precision_floating-point_format .
Counting from the left it's bits 2-17, but indeed, as the wiki page shows, the bits are numbered from right to left, so they are bits 23-30.

jbriggs444 · Nov 2, 2017

I like Serena said:

I looked at https://en.wikipedia.org/wiki/Single-precision_floating-point_format .
Counting from the left it's bits 2-17, but indeed, as the wiki page shows, the bits are numbered from right to left, so they are bits 23-30.

Counting from the left on a quad precision IEEE float it would be bits 2-16 (15 bits). A single precision format with 16 bits of exponent and only 15 bits of expressed mantissa would be quite unusual.

Counted from the left on a single precision IEEE float the exponent is in bits 2-9 (8 bits).

[At a guess, there was a failed subtraction of 23 from 40 yielding 17 when a correct subtraction would have been 23 from 32 yielding 9]

Mark44 · Nov 2, 2017

I like Serena said:

I looked at https://en.wikipedia.org/wiki/Single-precision_floating-point_format .
Counting from the left it's bits 2-17

No. This would imply that there are 16 bits used for the exponent, which is incorrect.
For a float (four byte single-precision floating point number) 8 bits are used. For a double (eight byte double-precision number) 11 bits are used.

I like Serena said:

, but indeed, as the wiki page shows, the bits are numbered from right to left, so they are bits 23-30.

Which makes 30 - 23 + 1 = 7 + 1 = 8 bits.

jim mcnamara · Nov 2, 2017

@newjerseyrunner

if (nextNumber != nextNumber) { ...

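I would consider using the isnan() macro from <math.h> instead. It is type-generic, so it works on floats; isnanf is a nonstandard float-only variant that some systems also provide.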
