Why do programming languages usually not implement number types with units?

PeterDonis · Sep 26, 2021

FactChecker said:

I think that you are underestimating the benefit of a compiler that can detect a mismatch of units and dimensions in an equation.

Compilers can do that with appropriate class definitions as well as with built-in language features.

Baluncore · Sep 26, 2021

FactChecker said:

I think that you are underestimating the benefit of a compiler that can detect a mismatch of units and dimensions in an equation.

I once thought that dimension checking would be valuable, so as a challenge I wrote the code to do it really well. It was great fun to write, and quite educational.
But now I find that after all, I never really needed it.

One of the challenges was using signed integers for dimensions, then deciding how to handle the square root of kg/m³. I found a very simple solution which was fun to implement.
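The post does not say which solution was used for the square root. One possible approach, sketched here as an assumption and not as the method described above, is to store the dimension exponents as exact rationals instead of signed integers, so that roots stay exact. The helper names `dim`, `dim_mul` and `dim_pow` are invented for this illustration.

```python
from fractions import Fraction

# Base dimensions in a fixed order: (mass, length, time).
# Exponents are Fractions rather than ints so that roots remain exact.

def dim(mass=0, length=0, time=0):
    """Build a dimension as a tuple of rational exponents."""
    return (Fraction(mass), Fraction(length), Fraction(time))

def dim_mul(a, b):
    """Multiplying two quantities adds their exponents."""
    return tuple(x + y for x, y in zip(a, b))

def dim_pow(a, p):
    """Raising a quantity to the power p scales every exponent by p."""
    p = Fraction(p)
    return tuple(x * p for x in a)

density = dim(mass=1, length=-3)           # kg/m^3
root = dim_pow(density, Fraction(1, 2))    # kg^(1/2) m^(-3/2)

# Squaring the root gives back the original dimension exactly.
print(dim_mul(root, root) == density)
```

With integer exponents the same square root would have to be rejected or rounded; with rationals it is just another exponent tuple.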

All computations should be in SI units. Then all unit conversions are to or from SI. If I get an equation wrong, the numbers will not pass testing; the same thing happens if I get a conversion factor wrong or get my loops inside out.

The units you use will be decided by your data source and destination. How can those imports be forced to use the same convention as your compiler? All you can do is verify that the input data and results fall in reasonable ranges.
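A minimal sketch of the "convert at the boundary, compute in SI, sanity-check the ranges" approach described above. The table `TO_SI`, the helper names and the range limits are all assumptions made for this illustration.

```python
# Conversion factors from a few length units to the SI unit (metre).
TO_SI = {"m": 1.0, "cm": 0.01, "in": 0.0254, "ft": 0.3048}

def to_si(value, unit):
    """Convert an imported value to metres."""
    return value * TO_SI[unit]

def from_si(value, unit):
    """Convert a result in metres to the unit the destination expects."""
    return value / TO_SI[unit]

def check_range(value, low, high, name):
    """Reject values outside a plausible range (limits chosen by the caller)."""
    if not (low <= value <= high):
        raise ValueError(f"{name} = {value} is outside [{low}, {high}]")
    return value

# Input arrives in feet, all internal work is in metres, output leaves in cm.
height_m = check_range(to_si(6.0, "ft"), 0.3, 3.0, "height")
print(from_si(height_m, "cm"))
```

A wrong conversion factor is not caught by any type system here; it is caught, if at all, by the range check and by testing.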

Dimensional analysis should be done before you write any code. It is hard enough getting to know and trust a good compiler without burdening it with doing your due diligence for you. There will still be many coding mistakes for you to make, things that could never be detected by dimensional analysis.

The only time dimensional verification or tracking might be useful would be in a once-off numerical calculator like Mathcad. Then you could enter the dimensional units as well as the numbers, and so check the dimensions of your equations before writing your code to be compiled for speed, free of all run-time dimensional analysis.

Jarvis323 · Sep 26, 2021

Vanadium 50 said:

I agree with this.

First, it's not entirely clear what is being proposed. If it is that "length in meters" is an internal type and "length in centimeters" is a different internal type such that they cannot be added without explicit conversion, that means that the only units that can ever be used are the ones built-in to the language.

Now, if one says, "no, this can be extended in the language to go beyond these intrinsic types", well, we're there now. I can do this in C++ today, where length_in_meters is an instance of the length class, which has two members: the value, and the units. (And if you like, length and area are derived classes from a base class)

There is actually a Boost library to do this. Maybe it will even be part of the standard library one day. To do it all efficiently at compile time, they use template metaprogramming, of course. The downside is that template metaprogramming is enormously complex.

The Boost.Units library is a C++ implementation of dimensional analysis in a general and extensible manner, treating it as a generic compile-time metaprogramming problem. With appropriate compiler optimization, no runtime execution cost is introduced, facilitating the use of this library to provide dimension checking in performance-critical code. Support for units and quantities (defined as a unit and associated value) for arbitrary unit system models and arbitrary value types is provided, as is a fine-grained general facility for unit conversions. Complete SI and CGS unit systems are provided, along with systems for angles measured in degrees, radians, gradians, and revolutions and systems for temperatures measured in Kelvin, degrees Celsius and degrees Fahrenheit. The library architecture has been designed with flexibility and extensibility in mind; demonstrations of the ease of adding new units and unit conversions are provided in the examples.

In order to enable complex compile-time dimensional analysis calculations with no runtime overhead, Boost.Units relies heavily on the Boost Metaprogramming Library (MPL) and on template metaprogramming techniques, and is, as a consequence, fairly demanding of compiler compliance to ISO standards.

https://www.boost.org/doc/libs/1_74_0/doc/html/boost_units.html

It might be that if you incorporated it into the language itself, you could achieve a cleaner and easier-to-use design. Probably far fewer than 1% of scientists are experts in C++, probably about 1% of C++ programmers are experts in template metaprogramming, and even fewer are experts in the Boost Metaprogramming Library.

Where I imagine it could shine would be in a high-level domain-specific language for scientific programming. Technically, incorporating units makes the language more expressive. The compiler will know more about your intent, and can thus do more in terms of optimization as well as generate more helpful warnings and error messages. Of course, you have to make such a compiler.

You could use Boost.Units as a backend for the implementation of a simpler language.

elcaro · Sep 26, 2021

jbergman said:

Type safety is always preferable for critical systems. A programmer can make a human error in a conversion library. With units of measure that code won't even compile.

Now in some cases there might be tradeoffs that force one to abandon such an approach, i.e., critical performance constraints but as a general rule one should prefer type safety.

Perhaps these cases could be handled by performance optimization, which can be done automatically as a last step without losing the constraints on units enforced by the compiler.

elcaro · Sep 26, 2021

Vanadium 50 said:

I agree with this.

First, it's not entirely clear what is being proposed. If it is that "length in meters" is an internal type and "length in centimeters" is a different internal type such that they cannot be added without explicit conversion, that means that the only units that can ever be used are the ones built-in to the language.

Now, if one says, "no, this can be extended in the language to go beyond these intrinsic types", well, we're there now. I can do this in C++ today, where length_in_meters is an instance of the length class, which has two members: the value, and the units. (And if you like, length and area are derived classes from a base class)

Both meters and centimeters are lengths, and both are of a similar type that you can add; you just need a conversion, which can be done automatically. Adding a length and, let's say, a time would however raise a compilation error. Multiplying or dividing a length and a time would be OK though, creating a new type.
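The rules described above (automatic conversion within one dimension, an error when adding different dimensions, a new type from multiplication or division) can be sketched roughly as follows. Python only checks at run time, so this illustrates the bookkeeping and not compile-time enforcement; the `Quantity` class and the constructor helpers are hypothetical names, not part of any library mentioned in the thread.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quantity:
    value: float   # magnitude, always stored in SI base units
    dims: tuple    # exponents of (length, time)

    def __add__(self, other):
        # Addition is only defined for identical dimensions.
        if self.dims != other.dims:
            raise TypeError(f"cannot add dimensions {self.dims} and {other.dims}")
        return Quantity(self.value + other.value, self.dims)

    def __mul__(self, other):
        # Multiplication adds exponents, producing a new dimension.
        dims = tuple(a + b for a, b in zip(self.dims, other.dims))
        return Quantity(self.value * other.value, dims)

    def __truediv__(self, other):
        # Division subtracts exponents, producing a new dimension.
        dims = tuple(a - b for a, b in zip(self.dims, other.dims))
        return Quantity(self.value / other.value, dims)

LENGTH = (1, 0)
TIME = (0, 1)

def meters(x):
    return Quantity(float(x), LENGTH)

def centimeters(x):
    return Quantity(x / 100.0, LENGTH)   # converted to metres on construction

def seconds(x):
    return Quantity(float(x), TIME)

total = meters(2) + centimeters(50)   # same dimension, conversion is automatic
speed = total / seconds(5)            # new dimension: length / time

try:
    meters(1) + seconds(1)            # different dimensions
except TypeError as err:
    print("rejected:", err)
```

A compiled language with units support would report the last case at compile time instead of raising an exception while the program runs.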

Vanadium 50 · Sep 26, 2021

Jarvis323 said:

There is actually a Boost library to do this.

My understanding from reading the description is that this would be perfectly happy adding an energy to a torque.

Jarvis323 · Sep 26, 2021

Vanadium 50 said:

My understanding from reading the description is that this would be perfectly happy adding an energy to a torque.

I think it allows you to define what can be added to what. But it will give a compile-time error if the evaluated expression type cannot be converted to the return type. So if energy and torque have a conversion defined, so that one can be converted to the other, then you could add them; otherwise you would get a compiler error.

Vanadium 50 · Sep 26, 2021

Energy and torque have the same dimensions, but are not the same thing, so one cannot convert between them.

As I understand it, that Boost library will throw an error if I attempt to add an energy to a position, but not when I attempt to add an energy to a torque. Or a Reynolds number to a Mach number.

Baluncore · Sep 26, 2021

Vanadium 50 said:

My understanding from reading the description is that this would be perfectly happy adding an energy to a torque.

That is true. It would also be happy to add your height to your chest measurement, and to subtract that from the distance to Paris. Dimensional analysis cannot be relied upon to trap all silly errors.

elcaro · Sep 26, 2021

Vanadium 50 said:

Energy and torque have the same dimensions, but are not the same thing, so one cannot convert between them.

As I understand it, that Boost library will throw an error if I attempt to add an energy to a position, but not when I attempt to add an energy to a torque. Or a Reynolds number to a Mach number.

The type system could then treat them as distinct types despite their having the same physical dimensions (units). The drawback is that for any calculation that produces something with the dimension of energy or torque, you would have to specify which type you intend it to be.

Baluncore · Sep 26, 2021

elcaro said:

The drawback is that for any calculation that produces something with the dimension of energy or torque, you would have to specify which type you intend it to be.

The solution to the problem is forever expanding. It is actually better to verify that the dimensions are correct before you waste time writing code. Simply plugging numbers into equations without careful thought, is not something to be encouraged.

Jarvis323 · Sep 26, 2021

Vanadium 50 said:

Energy and torque have the same dimensions, but are not the same thing, so one cannot convert between them.

As I understand it, that Boost library will throw an error if I attempt to add an energy to a position, but not when I attempt to add an energy to a torque. Or a Reynolds number to a Mach number.

You could differentiate those types in a special way such that no new conversion functions are required, but you have compiler flags to give either an error or a warning if the compiler tries to convert one to the other.

e.g. something like this

delineate unit torque as Nm from joule warn
torque theTorque = calculateTorque()
joule energy = theTorque

//compiler says warning: implicit conversion of delineated unit torque as Nm to joule.
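The "delineate" syntax above is a proposal, not an existing language feature. As a rough run-time analogue, assuming a hypothetical `Tagged` class that carries a "kind" label in addition to the dimension, the same warning behaviour might be sketched like this:

```python
import warnings

class Tagged:
    """A number with a dimension string and a 'kind' label (both hypothetical)."""

    def __init__(self, value, dimension, kind):
        self.value = value
        self.dimension = dimension
        self.kind = kind

    def convert_to(self, kind, dimension):
        # Different dimensions can never be converted: hard error.
        if dimension != self.dimension:
            raise TypeError(f"{self.dimension} is not {dimension}")
        # Same dimension but a different kind: allowed, with a warning.
        if kind != self.kind:
            warnings.warn(f"implicit conversion of {self.kind} to {kind}")
        return Tagged(self.value, dimension, kind)

ENERGY_DIM = "kg m^2 s^-2"   # joule and newton-metre share this dimension

the_torque = Tagged(12.0, ENERGY_DIM, "torque")

# Record the warning instead of printing it, so it can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    energy = the_torque.convert_to("energy", ENERGY_DIM)

print(len(caught), energy.kind)
```

Switching the `warnings.warn` call to `raise TypeError` would correspond to the "error" flag instead of the "warn" flag.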

elcaro · Sep 26, 2021

pbuk said:

Precisely. So we deal with the unit conversions in the presentation layer, not the implementation layer (and certainly not embedded in the language).

The point is that you only want these constraints enforced at the compilation phase without any runtime performance penalty.

Jarvis323 · Sep 26, 2021

One thing that would be pretty cool is if a tool could use the units to generate LaTeX in the autodocs. It would probably be useful for research papers where the code is coupled with the paper, and you have to describe the code mathematically and algorithmically and link it to the theory.

pbuk · Sep 26, 2021

elcaro said:

The point is that you only want these constraints enforced at the compilation phase without any runtime performance penalty.

No, my point was not about performance (which is irrelevant when talking about the user interface, humans have a pathetic clock rate), it was about the Separation of Concerns. The business logic of my program should not have to worry about the UI.

Every language is a balance between 'enforcing' things and being easy to code in. Because you can only prevent a subset of coding errors at compile time I believe that the balance should be low on enforcement and high on ease of coding (and unit testing). This belief comes partly from experience with Ada (including a project abandoned with USD20m in today's money on the clock), which as has been mentioned upthread is so strict in its persecution of errors that can be caught at compile time that it is almost impossible to create a system of any size that ever gets to run and exhibit the algorithmic errors that can't be caught!

Anyway it has been established above that among general purpose languages Measure types exist in at least F# and Haskell (edit: and in C++ via a Metaprogramming Library): isn't that enough for you?

elcaro · Sep 26, 2021

pbuk said:

No, my point was not about performance (which is irrelevant when talking about the user interface, humans have a pathetic clock rate), it was about the Separation of Concerns. The business logic of my program should not have to worry about the UI.

The idea is that you can also use numbers with measures outside the UI; this won't generate code, it just checks that your unit usage is consistent.

pbuk said:

Every language is a balance between 'enforcing' things and being easy to code in. Because you can only prevent a subset of coding errors at compile time I believe that the balance should be low on enforcement and high on ease of coding (and unit testing). This belief comes partly from experience with Ada (including a project abandoned with USD20m in today's money on the clock), which as has been mentioned upthread is so strict in its persecution of errors that can be caught at compile time that it is almost impossible to create a system of any size that ever gets to run and exhibit the algorithmic errors that can't be caught!

Errors in the usage of units should be caught; they are programming errors.

pbuk said:

Anyway it has been established above that among general purpose languages Measure types exist in at least F# and Haskell (edit: and in C++ via a Metaprogramming Library): isn't that enough for you?

For me that is ok, not an active programmer anymore, I meant the physics community.

Vanadium 50 · Sep 26, 2021

I agree that performance is often way overemphasized. There is no benefit to getting an incorrect answer faster than everybody else.

The idea that sometimes certain errors can be caught by the compiler, and sometimes the programmer has to think about what she intends seems to me not so helpful. The programmer doesn't need to know any less. The programmer doesn't need to think any less. If the compiler gives the code a clean bill of health do we know it's OK? No, we don't.

Further, this adds to the complexity. You want fewer programmer errors? Then you want the code less complex and not more.

C++ (and many other languages) provide features to do this, and in a more flexible way than fiddling around with the intrinsic types.

Rive · Sep 27, 2021

Vanadium 50 said:

I agree that performance is often way overemphasized.

Maybe, but not necessarily. Somewhere above (life) critical systems were mentioned.
Those usually do not have the newest CPU and top-notch hardware. Just something reliable.

Vanadium 50 said:

You want fewer programmer errors? Then you want the code less complex and not more.

Yeah. Less code, more documentation and engineering :doh:

Especially in case of 'critical' systems.
But if you have that, then why is this whole thing needed?

The only place I can think it would be slightly useful is for some specialized physics- or engineering-oriented language (for beginners/non-programmers).

jbergman · Sep 30, 2021

Vanadium 50 said:

I agree with this.

First, it's not entirely clear what is being proposed. If it is that "length in meters" is an internal type and "length in centimeters" is a different internal type such that they cannot be added without explicit conversion, that means that the only units that can ever be used are the ones built-in to the language.

Now, if one says, "no, this can be extended in the language to go beyond these intrinsic types", well, we're there now. I can do this in C++ today, where length_in_meters is an instance of the length class, which has two members: the value, and the units. (And if you like, length and area are derived classes from a base class)

This is completely wrong.

Look at the F# example I posted earlier. In F# you can define your own unit-of-measure types. Then you annotate numeric types with their unit of measure, and you also define conversion functions.

See this blog for a more detailed exposition.

jbergman · Sep 30, 2021

elcaro said:

Perhaps these cases could be handled by performance optimization, which can be done automatically as a last step without losing the constraints on units enforced by the compiler.

Agree. I believe that F# removes the unit of measure checks as part of the compilation process.

jbergman · Sep 30, 2021

Vanadium 50 said:

My understanding from reading the description is that this would be perfectly happy adding an energy to a torque.

In F# units of measure you get an error if you add types which are not of the same measure or convertible to the same measure.

https://fsharpforfunandprofit.com/posts/units-of-measure/

jbergman · Sep 30, 2021

Rive said:

Maybe, but not necessarily. Somewhere above (life) critical systems were mentioned.
Those usually do not have the newest CPU and top-notch hardware. Just something reliable.

Yeah. Less code, more documentation and engineering
Especially in case of 'critical' systems.
But if you have that, then why is this whole thing needed?

The only place I can think it would be slightly useful is for some specialized physics- or engineering-oriented language (for beginners/non-programmers).

This comment is ludicrous.

Vanadium 50 · Sep 30, 2021

jbergman said:

This is completely wrong.

Which part?

Is it that things are completely clear? Not to me!

Is it that checking dimensions is not the same as checking units? I think that's self-evident, but in any event examples have been provided where this does not work.

Is it that we can achieve this today without changing intrinsic variables? Again the C++ examples have been discussed.

"You're just wrong" is not so helpful.

Rive · Sep 30, 2021

jbergman said:

This comment is ludicrous.

I can imagine that some would see it that way.
Care to elaborate?

jack action · Sep 30, 2021

I don't know if I'm hijacking the thread, but I would prefer to have a language that can carry the error throughout the calculations before one that takes care of units. Something like the inputs are 12.34 (±0.01) and 1.234 × 10⁶ (±1000), then [calculations, calculations, calculations], and the output turns out to be 5.32846895031784 (±0.1). This way, I know that the final answer is 5.3 and all other decimals are superfluous, except if used in other calculations. You change the accuracy of the inputs and your answer might become 5 or 5.3285.

elcaro · Sep 30, 2021

jack action said:

I don't know if I'm hijacking the thread, but I would prefer to have a language that can carry the error throughout the calculations before one that takes care of units. Something like the inputs are 12.34 (±0.01) and 1.234 × 10⁶ (±1000), then [calculations, calculations, calculations], and the output turns out to be 5.32846895031784 (±0.1). This way, I know that the final answer is 5.3 and all other decimals are superfluous, except if used in other calculations. You change the accuracy of the inputs and your answer might become 5 or 5.3285.

Seems a useful addition. It is part of extended numeric types that carry the information one needs when doing calculations in, for example, physical models, including the unit of measure and the error.

Vanadium 50 · Sep 30, 2021

jack action said:

I would prefer to have a language that can carry the error throughout the calculations before one that takes care of units.

Do you need it to be a change in how intrinsics behave or can it be a class?

Given that error propagation can be non-trivial, it is a lot easier to do as a class.
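As a rough illustration of the class-based route, here is a minimal sketch that assumes independent errors and first-order (linearised) propagation, combining contributions in quadrature. Correlated inputs, higher-order terms and floating-point rounding are all ignored, and the class name `Uncertain` is invented for this example.

```python
import math

class Uncertain:
    """A value with an absolute error, assuming independent inputs."""

    def __init__(self, value, err):
        self.value = value
        self.err = abs(err)

    def __add__(self, other):
        # Absolute errors of a sum combine in quadrature.
        return Uncertain(self.value + other.value,
                         math.hypot(self.err, other.err))

    def __sub__(self, other):
        # A difference has the same error as a sum.
        return Uncertain(self.value - other.value,
                         math.hypot(self.err, other.err))

    def __mul__(self, other):
        # d(ab) = b*da + a*db, combined in quadrature.
        return Uncertain(self.value * other.value,
                         math.hypot(other.value * self.err,
                                    self.value * other.err))

    def __truediv__(self, other):
        # d(a/b) = da/b - a*db/b^2, combined in quadrature.
        return Uncertain(self.value / other.value,
                         math.hypot(self.err / other.value,
                                    self.value * other.err / other.value ** 2))

    def __repr__(self):
        return f"{self.value} (±{self.err})"

# The inputs quoted earlier in the thread.
a = Uncertain(12.34, 0.01)
b = Uncertain(1.234e6, 1000.0)
print(a * b)
print(b / a)
```

Because the operators are overloaded, expressions keep their usual `a * b` form; the propagation rule itself is the part that has to be chosen and justified for each application.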

Jarvis323 · Sep 30, 2021

Vanadium 50 said:

Given that error propagation can be non-trivial, it is a lot easier to do as a class.

I agree it is not trivial. Usually it is the work of mathematicians to apply theory based on the numerical algorithms, to get error bounds on things like matrix operations.

But then you also have the issue that high level code may not compile to what you expect.

If you have error propagation in a class, then the class has to know the hardware it will run on and the code the compiler will generate, in addition to being able to apply complex global analysis of the dataflow and algorithms.

Vanadium 50 · Sep 30, 2021

Jarvis323 said:

the class has to know the hardware it will run on

Why?

Why is arithmetic used in error propagation different from any other use of arithmetic?

jack action · Sep 30, 2021

Vanadium 50 said:

Do you need it to be a change in how intrinsics behave

I don't know if I need it, but it would be nice if, when I write an equation with basic expressions, the program identifies the error based on how the number is written (say ±1 on the last significant digit) and gives the final answer rounded up. How idiotic is it when you get a float set to 3.00000000000000008 as an answer? It is literally a wrong answer. I think basic computing could correct that very easily. And getting 3.0 or 3.0000, instead of 3, would add meaning to the number.
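The rounding idea can be sketched as a small formatting helper, under the assumption that the error is given explicitly and that only its leading digit decides how many decimals are shown. The function name `round_to_error` is hypothetical.

```python
import math

def round_to_error(value, err):
    """Format value with as many decimals as the error's leading digit justifies."""
    if err <= 0:
        raise ValueError("err must be positive")
    # Decimal position of the error's first significant digit,
    # e.g. -1 for 0.1 and -2 for 0.01.
    exponent = math.floor(math.log10(err))
    decimals = max(0, -exponent)
    return f"{round(value, -exponent):.{decimals}f}"

print(round_to_error(5.32846895031784, 0.1))     # prints 5.3
print(round_to_error(3.00000000000000008, 0.01)) # prints 3.00
```

The trailing zeros in the second result are kept on purpose: they show the precision the value is known to, which a bare 3 would not.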

I've done the unit conversion thing because it is something that bothers me as well. I've tried to do the error propagation, but it is much more complex to do (in a general way) without replacing all expressions (say, 'a + b' becomes 'add(a, b)') and all your programs become much harder to read (and write).
