Should definitions of future computer languages incorporate IDE technologies?

Stephen Tashi · Sep 7, 2020

pbuk said:

What advantage would there be in fixing this in the language definition? What about the loss of usability for people that can't easily distinguish Blue-13?

I use "Blue-13" as an abstraction of a color, not a particular color. A particular IDE could use a red color for "Blue-13". The definition of an abstract color implies that when Bob writes code in an IDE where he prefers "Blue-13" to be blue, then Alice can examine the code in her preferred IDE and have it be her preferred color for debug code.

How do you keep the two versions consistent? What would be the advantage over implementing a 'Hide debug code' option in an IDE if you really wanted one?

There would have to be a definition of what "debug code" is before any IDE could implement hiding it. A definition of debug code as an aspect of a language would mean that debug code written on one IDE would be treated as debug code on another IDE.

How to keep two versions consistent is, as the saying goes, a topic for further research. In actuality, the two versions (with debug and without) are not consistent if we interpret "consistent" to mean identical. So what's a good computer science type definition for "consistent"? One thought is that the inclusion or non-inclusion of the debug code in a function or class should not affect the values of any variables external to that function or class.

If I'm editing the non-debug version of a file, is there technology that can warn me if I'm creating an inconsistency? Or perhaps I want to "disable" the debug version until I have time to fix it.
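One way to make this notion of "consistent" concrete, as a rough sketch only: run the same function with its debug code enabled and disabled and compare what the outside world can observe. The names `mean_with_optional_debug` and `is_consistent` are hypothetical, and comparing return values for a few inputs is a test, not a proof of consistency.

```python
def mean_with_optional_debug(values, debug=False, log=None):
    """Return the mean of values; the debug code only appends to the log it is given."""
    total = 0.0
    for v in values:
        total += v
        if debug and log is not None:
            # Debug code: writes to its own log, touches nothing else.
            log.append(f"running total = {total}")
    return total / len(values)


def is_consistent(func, *args):
    """Hypothetical check: debug and non-debug runs must return equal results."""
    plain = func(*args, debug=False)
    traced = func(*args, debug=True, log=[])
    return plain == traced


print(is_consistent(mean_with_optional_debug, [1.0, 2.0, 3.0]))
```

A tool built on this idea could warn when an edit to the non-debug version makes the two runs diverge.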

fresh_42 · Sep 7, 2020

pbuk said:

No, I don't agree with this. Implementations of computer languages are commands how to store and process bits, not the languages themselves.

I think this is nitpicking. The language is what enables humans to tell the compiler how the machine commands have to be set up in the compiled file such that bits can be moved, since this is the ultimate goal. The language is just the first layer. Whether we start counting there or at the implementation or the final executable code is playing in the sandbox. It is not different from machine codes, only a few conveniences above that level.

Computer languages are by no means an abstraction of calculations. Turing machines are an abstraction, not C#.

And none of it, regardless of where you draw your personal line, has anything to do with an IDE. Talking about a definition "incorporating" IDE technologies is gibberish.

pbuk · Sep 7, 2020

Stephen Tashi said:

In the model of a language definition as a specification for a printed document, there are daring ideas like expanded character sets

Most languages in current use already implement handling of expanded character sets, either by default as part of the specification of the native string type (e.g. Java, JavaScript, Python 3) or through libraries (e.g. ICU for C++). This has nothing to do with the IDE though.

Or do you mean expanded character sets in source code for e.g. variable names? This is possible in a number of languages but is a very bad idea and will get you sacked/result in rejection from an open source project.

Stephen Tashi said:

and color printing.

What has this got to do with an IDE, or is this about syntax highlighting again?

Stephen Tashi said:

Thinking of documents as electronic, there are features like HTML.

HTML is simply character strings (or alternatively a DOM node tree), what has this got to do with an IDE?

Stephen Tashi said:

IDE's offer various techniques for debugging. The current computer languages that I know define ways to insert debug code and comment it out or insert it using macros that can be disabled. In a printed document this is distracting clutter.

I haven't seen printed source code for 20 years. And if the debug code still needed to be there, what is the point in having a hard copy without it?

Stephen Tashi said:

In an electronic document, we can imagine the document having various versions and having one version that makes debug code invisible.

Different versions; how would you keep them consistent? It would make more sense to flag the debug code in some way and allow the IDE to hide it if requested. This could easily be implemented currently (e.g. with a JavaDoc style flag) - but I don't know of any IDE that does this, probably because this is not something programmers want (generally debug code should be deleted once the program works; debug code is not an alternative to regression testing).
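To illustrate the "flag it and let the IDE hide it" idea, here is a minimal sketch of a filter that removes flagged regions from view. The markers `# @debug-begin` and `# @debug-end` are hypothetical; they are not a convention of any real IDE or documentation tool.

```python
DEBUG_BEGIN = "# @debug-begin"  # hypothetical marker, not part of any real tool
DEBUG_END = "# @debug-end"      # hypothetical marker


def hide_debug(source: str) -> str:
    """Return the source with flagged debug regions (markers included) left out."""
    visible = []
    hidden = False
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == DEBUG_BEGIN:
            hidden = True
        elif stripped == DEBUG_END:
            hidden = False
        elif not hidden:
            visible.append(line)
    return "\n".join(visible)


sample = """\
def area(w, h):
    # @debug-begin
    print("area called with", w, h)
    # @debug-end
    return w * h
"""
print(hide_debug(sample))
```

Because there is only one file and the hidden view is derived from it, there are no two versions to keep consistent.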

Stephen Tashi said:

In C, there is the distinction between "#include <name.h>" versus "#include "name.h"".

But they look for files in different places - and they are defined the way they are precisely to AVOID filesystem issues (e.g. Windows %PATH% and POSIX PATH).

Stephen Tashi said:

I'd prefer to have all the filesystem issues handled in some systematic way

Then you would be tying your language to a particular filesystem. In practice all portable languages have a library abstracting the runtime filesystem, and this is nothing to do with the IDE.

Stephen Tashi said:

- perhaps a computer language needs a sub-language just to handle file system issues. We could even have two different languages for handling computations that employ the same sub-language for handling file system issues.

One way to look at the situation is this: A language must be designed so it can be implemented by current technology to do certain computations. One step in this implementation is compiling or interpreting the program. Hence file systems (or more abstractly, information retrieval systems) must be navigated. If there is a utility in abstracting the notion of computations then there should be a utility in having a language that abstracts the notion of information retrieval.

I don't understand what you mean: can you provide a concrete example of a language that does not abstract the notion of information retrieval?

Stephen Tashi · Sep 7, 2020

pbuk said:

I don't understand what you mean: can you provide a concrete example of a language that does not abstract the notion of information retrieval?

I know of no current computer language that abstracts the notion of information retrieval in a systematic fashion. Languages have ad hoc aspects that affect information retrieval, such as the class naming convention in Java.

The basic aspects of information retrieval (for the purposes of compiling or interpreting) are:

1. Within the data for computer code, what are the identifiers that label the information to be retrieved? (e.g. What is the identifier for a function used in this document, whose code is not part of this document?)

2. What are the identifiers for the places where the data for the code is stored? (e.g. What is the name of the file that stores the function whose identifier is "int f234( int k)" ?)

Of course, it's convenient if the name of the file that stores the information for "int f234(int k)" is simply "f234" or "intf234", and the naming of public classes in Java is an example of an attempt to enforce a relation between aspects 1. and 2.

One might take the attitude that only aspect 1. should be treated in the definition of a computer language. From this point of view, computer code should not contain statements that deal with the file names for places where code is stored.

A contrasting view is that a computer language should associate each object in the language with an abstract file identifier and that particular implementations (compilers, linkers) would be configured to map the abstract file identifiers to particular files.

Current computer languages treat aspect 1. in a reasonably systematic manner. The way they handle aspect 2. is not systematic.
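The "abstract file identifier" view could be sketched roughly as follows. Everything here is an assumption made for illustration: the identifier `lib.geometry`, the two tables, and the paths are invented, and no existing compiler or linker is configured this way.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical abstract identifiers that source code would mention.
# Each implementation supplies its own table; the source code never changes.
LINUX_TABLE = {"lib.geometry": PurePosixPath("/opt/project/src/geometry.src")}
WINDOWS_TABLE = {"lib.geometry": PureWindowsPath(r"C:\project\src\geometry.src")}


def resolve(abstract_id, table):
    """Map an abstract file identifier to a concrete path for one implementation."""
    try:
        return table[abstract_id]
    except KeyError:
        raise LookupError(f"no file configured for {abstract_id!r}") from None


print(resolve("lib.geometry", LINUX_TABLE))
print(resolve("lib.geometry", WINDOWS_TABLE))
```

The point of the sketch is only that aspect 2 becomes a lookup in a declared data structure rather than a convention about file names.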

pbuk · Sep 7, 2020

Stephen Tashi said:

I know of no current computer language that abstracts the notion of information retrieval in a systematic fashion. Languages have ad hoc aspects that affect information retrieval, such as the class naming convention in Java.

The basic aspects of information retrieval (for the purposes of compiling or interpreting) are:

1. Within the data for computer code, what are the identifiers that label the information to be retrieved? (e.g. What is the identifier for a function used in this document, whose code is not part of this document?)

I don't understand what "information retrieval (for the purposes of compiling or interpreting)" or "the data for computer code" are - are you simply talking about source code?

Stephen Tashi said:

2. What are the identifiers for the places where the data for the code is stored? (e.g. What is the name of the file that stores the function whose identifier is "int f234( int k)" ?)
...
Current computer languages treat aspect 1. in a reasonably systematic manner. The way they handle aspect 2. is not systematic.

But these things are entirely implementation dependent: in the GNU C++ compiler on Linux the source code for cin is contained somewhere like /usr/include/stdio.h which is compiled into the executable whereas in Microsoft Visual C++ the object code for cin is implemented in the DLL (dynamic link library) C:\Windows\System32\vcruntime140.dll which is linked in at runtime.

So the definition of include in C++ IS systematic, it is the implementation that is different. Other languages are similar - Python's and Java's import, Node JS's and Ruby's require, even PHP's include, they are all filesystem independent.
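As a small illustration of that separation in Python: the source code names a module, and the running implementation decides where it lives. The helper name `locate` is made up for this sketch; `importlib.util.find_spec` is the standard library call that does the lookup.

```python
import importlib.util


def locate(module_name):
    """Return the origin the running interpreter reports for a module name."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


# The path printed differs between machines; the import statement does not.
print(locate("json"))
print(locate("no_such_module_here"))
```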

Mark44 · Sep 7, 2020

Much of the discussion about incorporating file system semantics seems like a solution in search of a problem.

Stephen Tashi · Sep 7, 2020

pbuk said:

I don't understand what "information retrieval (for the purposes of compiling or interpreting)" or "the data for computer code" are - are you simply talking about source code?

Yes - Aspect 1 refers to source code that is source for objects in the language. Source code as identified by the name of the object, not by the name of the file containing the code.

But these things are entirely implementation dependent:

I agree that in current languages they are. Current languages are not the topic of this post.

in the GNU C++ compiler on Linux the source code for cin is contained somewhere like /usr/include/stdio.h which is compiled into the executable whereas in Microsoft Visual C++ the object code for cin is implemented in the DLL (dynamic link library) C:\Windows\System32\vcruntime140.dll which is linked in at runtime.

I agree that the design of current languages does not standardize the relation between source code and the files containing the code. The relation is established by a set of miscellaneous conventions.

So the definition of include in C++ IS systematic,

I agree there is a system to it, but it isn't a modern approach to handling data. It doesn't define any requirement that there be a data structure that tells which program objects are stored in which files.
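For a sense of what such a data structure might look like, here is a sketch that builds a table from program objects to the files defining them. It relies on what one Python implementation can report at run time through `inspect.getsourcefile`; it is not something the language definition requires, and the function name `object_file_map` is hypothetical.

```python
import inspect
import json
import textwrap


def object_file_map(*objects):
    """Build a table from qualified object names to the files that define them."""
    table = {}
    for obj in objects:
        name = f"{obj.__module__}.{obj.__qualname__}"
        table[name] = inspect.getsourcefile(obj)
    return table


for name, path in object_file_map(json.dumps, textwrap.dedent).items():
    print(name, "->", path)
```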

pbuk · Sep 7, 2020

Stephen Tashi said:

I agree there is a system to it, but it isn't a modern approach to handling data. It doesn't define any requirement that there be a data structure that tells which program objects are stored in which files.

It is true that C++ doesn't map what you call 'program objects' to the source files containing them, but that is because C++ does not allow any symbol to be defined more than once in all the files that are included in a compilation so it doesn't need a map. I suppose this approach could be described as not 'modern' since it originated decades ago, but I don't think this is relevant.

Other languages do have such a map. As you have noted, in Java this is a 1:1 correspondence with file names, although other languages allow the renaming of a module on import, or the import of specific symbols from a file:

[code lang=python title="Python"]
import numpy as np
from scipy.integrate import solve_ivp
[/code]

[code lang=javascript title="Node JS"]
const Custom = require('./custom-class');
const { readFileSync } = require('fs');
[/code]

I still don't see what any of this has to do with an IDE, it is simply about how the source code for the language is separable into different source files, and how you refer to symbols that are defined in the language's core (and managed extension) packages.

Rive · Sep 8, 2020

Stephen Tashi said:

The definition of a future language could assume the basic functions of an IDE exist and define the language in terms of those functions.

I think the way you present this does not reflect the reason IDEs exist. It's about the complexity of tasks/jobs (as the basis of a programmer's payment, not as some machine-specific abstraction). IDEs were developed to handle complexity. Programming languages (of the future) should be able to handle complexity too, even if they are built around some kind of 'keep it pure' philosophy. Both revolve around the same thing, but it's not a cause-and-effect relation.

At least, in my opinion.

harborsparrow · Sep 12, 2020

In my view, language specification should always be separate from IDE technology. However, those arguing against the need for IDEs have obviously never written graphical user interface applications. A good IDE gives the programmer shortcuts for generating much of the tedious boilerplate code for widget event handling. Other essential-to-me features of a good IDE are syntax coloring and auto-completion. But there are many other possible benefits to a good IDE. So, for me, separate but equal in value.

Stephen Tashi · Sep 13, 2020

pbuk said:

I still don't see what any of this has to do with an IDE, it is simply about how the source code for the language is separable into different source files, and how you refer to symbols that are defined in the language's core (and managed extension) packages.

As I mentioned before, the current model for computer language definition is hardcopy printed documents. In a hardcopy printed document we don't have the concept of links, which we can implement in electronic documents. If one hardcopy document needs to refer to something in another hardcopy document, it does so by some hardcopy text. This practice leads to two extremes.

On one hand, the hardcopy text may refer to a vast document (e.g. #include "vastlib.h"). To know what part of the vast document is relevant, a person must be familiar with the contents of the vast document.

At the other extreme, the text may refer to something very specific (e.g. extern int specialfunt( int k, float w, float v, int rowdim, int coldim); ). Such printed detail is convenient for the task of finding the relevant text in another document, but it is inconvenient to write this type of detailed reference in every document where it is needed.

If we use an electronic document as the model for defining a computer language then we can use the concept of links to implement references to specific places in electronic documents. I consider this to be using IDE technology because IDEs provide limited versions of this concept. They treat code as an electronic document. They allow treating sections of text as links to other text.

Ideally, if I see some object mentioned in computer code then an IDE can let me click on it and bring up the other code that defines that object. This ideal is imperfectly realized, depending on the IDE and the language.
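The "click on it and bring up the definition" behaviour rests on an index from names to definition sites. A minimal sketch of such an index for Python source, using the standard `ast` module, is below; `index_definitions` is a hypothetical helper, and a real IDE would also have to handle scopes, imports and multiple files.

```python
import ast


def index_definitions(source):
    """Map each function or class name to the line where it is defined."""
    index = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            index[node.name] = node.lineno
    return index


code = """\
def helper(x):
    return x + 1

class Widget:
    pass

result = helper(2)
"""
links = index_definitions(code)
print(links)  # {'helper': 1, 'Widget': 4}
```

With this table, the use of `helper` on the last line can be treated as a link to line 1.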

Mark44 · Sep 13, 2020

Stephen Tashi said:

As I mentioned before, the current model for computer language definition is hardcopy printed documents.

Hardly. All of the documentation for C, C++, C#, and other languages that Microsoft implements compilers for exists purely in electronic form. I'm sure the same is true for the languages implemented under GNU as well.
As far as language specifications go, I'm reasonably sure that they are available primarily, if not exclusively, online as PDF files. For example, the ISO C++ standard is available here: https://www.iso.org/standard/68564.html.

Stephen Tashi said:

If we use an electronic document as the model for defining a computer language then we can use the concept of links to implement references to specific places in electronic documents. I consider this to be using IDE technology because IDEs provide limited versions of this concept. They treat code as an electronic document. They allow treating sections of text as links to other text.

This has nothing to do with IDEs, but instead relates to the use of HTML and CSS (Cascading Style Sheets) technologies in electronic documents.

Stephen Tashi · Sep 13, 2020

Mark44 said:

Hardly. All of the documentation for C, C++, C#, and other languages that Microsoft implements compilers for exists purely in electronic form. I'm sure the same is true for the languages implemented under GNU as well.
As far as language specifications go, I'm reasonably sure that they are available primarily, if not exclusively, online as PDF files. For example, the ISO C++ standard is available here: https://www.iso.org/standard/68564.html.

My remarks do not concern the file formats in which text that defines a computer language is stored. My remarks concern the content of the definition of computer languages.

Mark44 · Sep 13, 2020

Stephen Tashi said:

My remarks do not concern the file formats in which text that defines a computer language is stored. My remarks concern the content of the definition of computer languages.

Then I am completely lost in trying to understand your point.
This is what you wrote a couple of posts ago:

Stephen Tashi said:

As I mentioned before, the current model for computer language definition is hardcopy printed documents.

Stephen Tashi · Sep 13, 2020

Mark44 said:

Then I am completely lost in trying to understand your point.
This is what you wrote a couple of posts ago:

If we consider the content of a current computer language definition, it describes how non-electronic text documents in that language should be written - e.g. there are certain characters that can be used, they can be used to form certain words, certain phrases using those words are permitted or required. For example, the definition of the C language describes the use of text phrases like "#include "vastlib.h""

The definitions of current computer languages do not describe how electronic documents are written. For example, there is nothing in the C language that says "#include "vastlib.h"" must be a link to some other document. The concept of a link, in that sense, is not used in defining the C language.

So I say that current computer languages use hardcopy text documents as the model for what they are describing. (I am not saying that the text that states the definition of a computer language can't be stored in various electronic file formats. I am saying that what those definitions describe is how to write text that does not have the functionality of electronic documents.)

Mark44 · Sep 13, 2020

Stephen Tashi said:

The definitions of current computer languages do not describe how electronic documents are written.

Any such documents used by C or C++, say, must adhere to the syntax of the language, which includes, in part, the use of such punctuation as semicolons, single and double quotes, pound signs (used by the preprocessor), carriage return characters, end-of-file marks, and others. These documents are exclusively electronic in form.

Stephen Tashi said:

For example, there is nothing in the C language that says "#include "vastlib.h"" must be a link to some other document. The concept of a link, in that sense, is not used in defining the C language.

I don't understand what you're trying to say here. A #include preprocessor directive is absolutely a link to the file named in the directive. How else would the preprocessor "know" to insert the text of the include file into the program that is to be compiled?

Stephen Tashi said:

So I say that current computer languages use hardcopy text documents as the model for what they are describing.

This makes no sense to me.

jedishrfu · Sep 14, 2020

I think I understand what @Stephen Tashi is proposing here. He'd like a language to be defined precisely even beyond Backus-Naur notation. Any reference to a function or method within the language would indicate the method name and a unique version number, so you could in essence rebuild a program at any time in the future and be sure it's built with the same code.

I've had cases of code I built and encapsulated in a Docker image that works as expected until a new feature is needed. As I try to build a new image, I find pieces missing from my environment, code obsoleted and replaced with new code with a different calling API... newer library versions...

One way folks get around this is with Maven builds, where the library versions are defined precisely, or with fat jars in Java, where all the classes needed are encapsulated in a single jar and that jar runs on a specific version of Java.

There is nothing today that ties all components of a program together unambiguously so that you can go back and check a specific version of a function in a library that you once used some time ago.
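As a rough sketch of what tying a reference to one specific version of a function could mean, a tool might record a fingerprint of the function's source text next to its name. This is an assumption made for illustration, not an existing registry scheme, and the helper `fingerprint` is hypothetical; it only works where the source file is available to `inspect.getsource`.

```python
import hashlib
import inspect
import textwrap


def fingerprint(func):
    """Return a short hash of a function's source text, as a stand-in for a version id."""
    source = inspect.getsource(func)
    return hashlib.sha256(source.encode("utf-8")).hexdigest()[:12]


# The value depends on the installed Python version, which is the point:
# a different library release with different source gives a different id.
print("textwrap.dedent @", fingerprint(textwrap.dedent))
```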

This kind of metadata can't easily be encoded in a textual document like a source code listing, as it would make the document difficult to read. However, an IDE, like a browser, could make every method or function a link to its specific versioned source, whose code would in turn link to its own versioned sources. That brings in issues of proprietary code.

I could see a problem here which pops up in Maven, where one library has a dependency on a specific version of a library whereas another library has a dependency on another version, and the question then is which version dominates.

It happened to me once where I got a fat jar of NetCDF code that had an embedded version of log4j that conflicted with the version I was using in my project. Finally I had to replace the fat jar with references to unbundled code to make things work again. Basically, you could wind up with multiple versions of the same method being used throughout your program, with a variety of different, hopefully minor, behaviors.
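Detecting that kind of clash is at least mechanical once the pins are written down. The sketch below is not how Maven resolves versions; it only reports dependencies that different packages pin to different versions. The package names and version numbers are invented for illustration, loosely modelled on the log4j clash described above.

```python
from collections import defaultdict


def find_conflicts(requirements):
    """Return the dependencies that different packages pin to different versions."""
    wanted = defaultdict(set)
    for package, pins in requirements.items():
        for dependency, version in pins.items():
            wanted[dependency].add(version)
    return {dep: sorted(vs) for dep, vs in wanted.items() if len(vs) > 1}


# Hypothetical pins; the names and numbers are made up.
pins = {
    "my-project": {"log4j": "1.2.17"},
    "netcdf-fat-jar": {"log4j": "1.2.15", "slf4j": "1.7.5"},
}
print(find_conflicts(pins))  # {'log4j': ['1.2.15', '1.2.17']}
```

Deciding which version should dominate is the hard part, and the sketch deliberately leaves it open.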

I don't think an IDE would fix this problem, as it's a much bigger and more fundamental problem, perhaps needing a common repository/registry of publicly used source code along with all versions of that code.

Having said all that, I think it's time to close off this thread and to thank @Stephen Tashi for bringing up the issue and for everyone else responding with some great commentary and ideas on the problem.

Jedi
