Storing program source as relations in a database instead of text files

In summary, most programming languages store program source code as bytes in text files, with the grammar enforced by a parser. However, a program can also be seen as a collection of relations between objects, which could be stored in a database. The proposed benefits include enforcing strong typing rules, exposing dependencies between objects, and simplifying refactoring and versioning. Lotus Agenda is mentioned in the thread as an earlier program that stored information as relations, but the approach has not been widely adopted for source code. The original poster suggests it would mainly pay off for complex programs and large coding teams.
  • #1
elcaro
TL;DR Summary
Most programming languages and environments store program source as text in source (and accompanying) files. But a program can be seen as a collection of relations between different objects, and thus these relations could also be stored in a database. For example: every variable has a relation to a type, every function call implies a relation between caller and callee, etc. Has there ever been an attempt to store program source in the form of relations in a database?
Almost all (compiled or interpreted) programming languages store the program source as a series of bytes (using an encoding like ASCII or UTF-8) in a text file, with the grammar of the programming language enforced by a parser (as part of compilation or interpretation of the source text).

But intrinsically a program can also be seen as a collection of relations between different objects, which can be stored as tuples in a relational database system.

For example:
  • A program Variable needs to have a definition, which creates both a relation between the variable and a type, and also a location within another object (for example a function and/or module) in which the definition takes place.
  • A function call creates a relation between the calling function or module and a called function, a relation between its return value and a type, and a relation between the function and the object or module in which it was defined.
Etc.
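
The relations listed above could be sketched roughly as follows. This is only an illustration using SQLite through Python's standard `sqlite3` module; every table name, column name, and sample row is a hypothetical choice and not part of any existing tool.

```python
import sqlite3

# Hypothetical schema: all table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE types     (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE modules   (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE functions (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                            module_id INTEGER NOT NULL REFERENCES modules(id),
                            return_type_id INTEGER NOT NULL REFERENCES types(id));
    CREATE TABLE variables (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                            type_id INTEGER NOT NULL REFERENCES types(id),
                            function_id INTEGER NOT NULL REFERENCES functions(id));
    CREATE TABLE calls     (caller_id INTEGER NOT NULL REFERENCES functions(id),
                            callee_id INTEGER NOT NULL REFERENCES functions(id));
""")

# Sample rows: one type, one module, two functions, one variable, two calls.
conn.execute("INSERT INTO types VALUES (1, 'int')")
conn.execute("INSERT INTO modules VALUES (1, 'mathutil')")
conn.executemany("INSERT INTO functions VALUES (?, ?, ?, ?)",
                 [(1, "main", 1, 1), (2, "factorial", 1, 1)])
conn.execute("INSERT INTO variables VALUES (1, 'n', 1, 2)")
conn.executemany("INSERT INTO calls VALUES (?, ?)", [(1, 2), (2, 2)])

# Which functions call 'factorial'? This is a join over the caller/callee relation.
callers = [row[0] for row in conn.execute("""
    SELECT DISTINCT f.name
    FROM calls c
    JOIN functions f ON f.id = c.caller_id
    JOIN functions g ON g.id = c.callee_id
    WHERE g.name = 'factorial'
    ORDER BY f.name
""")]
print(callers)  # ['factorial', 'main']
```

Note that this captures only who refers to whom; the body of each function is not represented in these tables.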

However, no programming language or environment (perhaps with the exception of a language/programming environment like Smalltalk) stores these relations as such; most languages store the program as text in source files.

The advantages of storing program source as relations in a database are multiple, like for instance:
  • One can enforce strong typing rules for all objects.
  • One can produce many useful insights into the program source, like dependencies between objects (what function gets called by what function, what modules or objects use what variables, etc.).
  • Compilation could be done piecewise (per object) as part of the editing process (saving the object also pre-compiles it and shows any errors encountered during compilation; the source is stored anyway).
  • Each variable or object name needs to be stored only once, and renaming objects would only take place at one point. Optionally, each programmer could name objects in their native language and script.
  • The program itself can be stored in the form of an abstract syntax tree, simplifying the process of creating object code and generating executables.
  • Refactoring a program, or moving functions between modules while retaining all dependency relations (like header files), could be done much more simply and with fewer errors than in a textual environment.
  • Build/make information can be easily extracted from the dependency relations already stored in the database.
  • Optionally also the versioning system itself could be implemented as part of the programming development system, also storing version information as relations into a database.
  • For compatibility with the usual work environment and programming development tools, easy import and export functionality could be provided for such a programming development environment.
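
As a hedged illustration of the build/make point in the list above, a build order can be derived from stored dependency relations with a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the object names and the `depends_on` rows are invented for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency relation: (object, object it depends on).
depends_on = [
    ("main", "parse_args"),
    ("main", "report"),
    ("report", "format_row"),
    ("parse_args", "format_row"),
]

# Build a mapping from each object to the set of objects it depends on.
graph = {}
for obj, dep in depends_on:
    graph.setdefault(obj, set()).add(dep)
    graph.setdefault(dep, set())

# static_order() yields every object after all of its dependencies.
build_order = list(TopologicalSorter(graph).static_order())
print(build_order)
```

Here `format_row` always comes first and `main` last; the relative order of `parse_args` and `report` is not fixed, because neither depends on the other.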
 
  • #2
For a while, there was an almost legendary program called Lotus Agenda, that did store things as relations. It was rumored to be extremely powerful in such applications as deducing conclusions from diverse evidence in law enforcement investigations.

Unfortunately, Lotus abandoned it because the learning curve was too steep for most users.

So with the Agenda history behind, that suggests that the field is still ripe for invention. Be warned though that other very smart and very rich people have tried and failed. If that doesn't deter you, go for it.
 
  • #3
The vast majority of programs that I have seen have their major structure as a sequence of operations to perform. IMO, any representation of those programs should have the sequence of operations as their "primary" organization. How would your idea implement that?
Consider the simple example of calculating the factorial of a positive integer. How would your method represent that?
a) using recurrence
b) using iterative multiplication.
 
  • #4
From my point of view you have just described an IDE (Integrated Development Environment). The power of an IDE comes from the fact that it allows you to code in a language that is appropriate to the task, and the IDE does the hard work of parsing your source code and storing it all in... a database (well sort of, some kind of indexed structure anyway).
 
  • #5
FactChecker said:
The vast majority of programs that I have seen have their major structure as a sequence of operations to perform. IMO, any representation of those programs should have the sequence of operations as their "primary" organization. How would your idea implement that?
Consider the simple example of calculating the factorial of a positive integer. How would your method represent that?
a) using recurrence
b) using iterative multiplication.
Such simple programs are not worth the overhead of storing them in such a relational development system. The type of applications I had in mind are those with a large coding team (working concurrently on the same code set) and a large program source base. Any program that can be maintained by a single programmer would not need such a sophisticated development environment IMO; it is simply not worth the overhead...
 
  • #6
pbuk said:
From my point of view you have just described an IDE (Integrated Development Environment). The power of an IDE comes from the fact that it allows you to code in a language that is appropriate to the task, and the IDE does the hard work of parsing your source code and storing it all in... a database (well sort of, some kind of indexed structure anyway).
Sort of, but with a different way of storing the code, i.e. not as text files but as relations in a database.
 
  • #7
elcaro said:
Such simple programs are not worth the overhead of storing them in such a relational development system. The type of applications I had in mind are those with a large coding team (working concurrently on the same code set) and a large program source base. Any program that can be maintained by a single programmer would not need such a sophisticated development environment IMO; it is simply not worth the overhead...
I am just wondering how you would do it for a simple case. I could ask for much more complicated examples, but I don't know if it would be worth the work on your part to do it and explain it to me.
 
  • #8
elcaro said:
a program can also be seen as a collection of relations between different objects
Such relations are part of the information contained in a program, but I don't see how they're all of it. For example, while a function call does contain a relation between caller and callee, that relation does not tell you what the called function actually does or what the caller is going to do with the information returned.
 
  • #9
The idea of the OP seems vague to me. I would like to see how it would work in a simple case of a sequence of arithmetic operations.
 
  • #10
FactChecker said:
I am just wondering how you would do it for a simple case. I could ask for much more complicated examples, but I don't know if it would be worth the work on your part to do it and explain it to me.
Well, I don't have an implementation of the idea I described. But I imagine the database would contain records for all the objects your program uses: the entry function (usually called 'main'), the source module in which it is defined, which predefined objects it references or uses (like stdlib calls), which variables it declares, and which functions it calls. For each called function, the same kind of information is stored, and since a recursive function like factorial calls itself, the database would record that that function (e.g. named 'factorial') makes a call to itself. A sophisticated version of the development environment could then also recognize this as a direct or indirect form of recursion, try to detect whether there is a condition under which the function exits without recursively calling itself, and generate a warning if no such condition can be detected. But such detections are purely informational, since not all such checks can be made deterministically (there is no general way to prove whether a given program with a given input will end; that is undecidable in principle). Further, it is not the task of this environment to check the semantic validity of the program, only to ensure that the program is syntactically correct and that all dependency relations are met.
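
A minimal sketch of the recursion check described above, assuming the call relation has already been loaded into a plain dictionary (the function names and the `calls` data are hypothetical). It only finds cycles in the call graph; it says nothing about whether the recursion terminates.

```python
def reachable(calls, start):
    """Return every function reachable from `start` through one or more calls."""
    seen = set()
    stack = list(calls.get(start, ()))
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(calls.get(name, ()))
    return seen


def recursion_kind(calls, name):
    """Classify `name` as 'direct', 'indirect', or 'none' based on the call graph."""
    if name in calls.get(name, ()):
        return "direct"
    if name in reachable(calls, name):
        return "indirect"
    return "none"


# Hypothetical call relation: caller -> set of callees.
calls = {
    "main": {"factorial", "is_even"},
    "factorial": {"factorial"},
    "is_even": {"is_odd"},
    "is_odd": {"is_even"},
}

for fn in ("factorial", "is_even", "main"):
    print(fn, recursion_kind(calls, fn))
```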
 
  • #11
FactChecker said:
The idea of the OP seems vague to me. I would like to see how it would work in a simple case of a sequence of arithmetic operations.
True. Since the idea has no implementation, I can only describe in general terms how it could be implemented, but the key feature is that the program source is stored not as text files but as relations in a database. Using known methods of relational analysis, one can detect which relations are implied by a given program. One of the key points is to reduce redundancy. For example, renaming an object used in many modules throughout a program can be cumbersome and error-prone with textual tools. In a relational system there is only one place where the name of the object is stored, and updating the name there makes the new name available to all other objects that reference or use the renamed object. Another example is header files, which can contain many definitions (types and function prototypes), all of which get included in a source file. In many instances, however, only a few definitions are really used, so in a relational development tool only those definitions get referenced, avoiding the overhead of having to read and parse all the other unused definitions. Etc.
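
The renaming argument can be illustrated with a small sketch, assuming references are stored by numeric ID and the name lives in exactly one row. The table names (`objects`, `uses`) and the sample data are hypothetical; SQLite via Python's `sqlite3` module is used only for convenience.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE uses (user_id INTEGER NOT NULL REFERENCES objects(id),
                       used_id INTEGER NOT NULL REFERENCES objects(id));
    INSERT INTO objects VALUES (1, 'main'), (2, 'report'), (3, 'calc_totl');
    INSERT INTO uses VALUES (1, 3), (2, 3);
""")


def users_of(name):
    """Names of all objects that reference the object called `name`."""
    rows = conn.execute("""
        SELECT u.name
        FROM uses
        JOIN objects u ON u.id = uses.user_id
        JOIN objects d ON d.id = uses.used_id
        WHERE d.name = ?
        ORDER BY u.name
    """, (name,))
    return [row[0] for row in rows]


# Rename the misspelled object: one UPDATE, no edits to the referencing rows.
conn.execute("UPDATE objects SET name = 'calc_total' WHERE id = 3")

print(users_of("calc_total"))  # ['main', 'report']
print(users_of("calc_totl"))   # []
```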

For a sequence of arithmetic operations: first, it would be part of a program source, i.e. it implies a module and a function therein, and it would be contained in a block as a sequence of statements. Next, the function would have a list of parameters (possibly empty), declare some local variables, and/or have access to some global variables. Each statement references one or more of these variables and changes at least one variable. Optionally it will call other functions or standard library functions. All of this information can be expressed as relations between objects. So a variable definition is a relation between a named object and its definition (and type), and a statement is a relation to the one or more variables it reads plus a separate relation to the variable being changed. Etc.

That is, if an implementation really wants and needs to go into such fine granularity. I think this would be much too detailed, and the only meta information one would want to store goes up to the level of functions, including the variables a function uses and the functions it calls, but not on a per-statement basis. The database could store each function both in textual form and as a pre-parsed abstract syntax tree, and potentially also the generated object code for the function.
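
As a rough sketch of storing a function both as text and as a pre-parsed tree, the example below uses Python's own `ast` module as a stand-in parser (the thread's examples are C-like, so this is an assumption made purely to keep the sketch runnable). The `functions` table and its columns are hypothetical.

```python
import ast
import sqlite3

source = "def factorial(n):\n    return 1 if n <= 1 else n * factorial(n - 1)\n"

# Parse once at save time; a syntax error would be reported here, before storing.
tree = ast.parse(source)
func = tree.body[0]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE functions (
    name TEXT PRIMARY KEY,
    source TEXT NOT NULL,     -- the text exactly as the programmer typed it
    ast_dump TEXT NOT NULL    -- a textual dump of the parsed syntax tree
)""")
conn.execute("INSERT INTO functions VALUES (?, ?, ?)",
             (func.name, source, ast.dump(func)))

stored_source, stored_ast = conn.execute(
    "SELECT source, ast_dump FROM functions WHERE name = 'factorial'"
).fetchone()
print(stored_ast[:40])
```

In this arrangement the function body is still text (plus a serialized tree) inside a database field, not a set of relations.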

Concluding remarks:
  1. Most likely, we do not want to store all the details of a function block as relations, apart from the relations it has with other objects, such as creating, deleting, reading, or writing objects, calls to other (function) objects, etc.
  2. A textual representation of a sequence of statements always presumes a certain ordering. But there are cases in which (a series of) statements can be reordered or executed in parallel without changing the results, which cannot always be expressed in textual languages, as in the following case:
Line 1: int x = 2;​
Line 2: int y = 3;​
Line 3: int z = x + y - 3;​
Here the ordering of lines 1 and 2 doesn't matter; what matters is that line 3 is executed after the statements on lines 1 and 2 have been executed. For this situation one could introduce an intermediate level of program structure, for instance a 'sentence' between the block and statement levels: blocks consist of one or more sentences, and sentences consist of one or more statements (for which the order of execution is irrelevant). Note that line numbers are not stored as relations, since they have no meaning for the program and are in most cases generated when displaying the program source text or exporting it to a text file.
  3. Storing the body of functions as relations, as stated in 1., would normally require too much overhead and is less necessary for the purpose of the relational environment. It would suffice to store the program text of the function body as a text field in the database, and store the abstract syntax tree of the pre-parsed source text in a separate field or other database structure.
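
The 'sentence' idea from point 2 could be sketched as grouping statements by data dependency. The code below is only an illustration under two stated assumptions: each variable is assigned exactly once, and the statements arrive in some valid order. The tuple layout (text, variable written, variables read) is hypothetical.

```python
# Each statement: (source text, variable it writes, set of variables it reads).
statements = [
    ("int x = 2;",         "x", set()),
    ("int y = 3;",         "y", set()),
    ("int z = x + y - 3;", "z", {"x", "y"}),
]


def group_into_sentences(statements):
    """Group statements so that those within one group may run in any order."""
    level_of_var = {}   # variable name -> index of the sentence that writes it
    sentences = []
    for text, target, used in statements:
        # A statement must come after every sentence that writes a variable it reads.
        level = 1 + max((level_of_var[v] for v in used), default=-1)
        level_of_var[target] = level
        while len(sentences) <= level:
            sentences.append([])
        sentences[level].append(text)
    return sentences


sentences = group_into_sentences(statements)
for i, sentence in enumerate(sentences):
    print(i, sentence)
```

For the three-line example this yields two sentences: the assignments to `x` and `y` together, followed by the assignment to `z`.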
 
  • #12
PeterDonis said:
Such relations are part of the information contained in a program, but I don't see how they're all of it. For example, while a function call does contain a relation between caller and callee, that relation does not tell you what the called function actually does or what the caller is going to do with the information returned.
That is true, but neither does your editor, compiler, make tool or versioning system have a clue about that. The semantic content of your program is usually information that you store as documents (or as text comments accompanying your code), which in a relational development environment can be done in similar ways (comments on every object and every relation between objects in the database).
 
  • #13
anorlunda said:
For a while, there was an almost legendary program called Lotus Agenda, that did store things as relations. It was rumored to be extremely powerful in such applications as deducing conclusions from diverse evidence in law enforcement investigations.

Unfortunately, Lotus abandoned it because the learning curve was too steep for most users.

So with the Agenda history behind, that suggests that the field is still ripe for invention. Be warned though that other very smart and very rich people have tried and failed. If that doesn't deter you, go for it.
Are you signing up for the development team?
 
  • #14
elcaro said:
That is true, but neither does your editor, compiler, make tool or versioning system have a clue about that.
Sure they do, at least the editor and compiler. The editor has every line of source code in it. The compiler knows how to translate that source code into machine instructions. So the source code does store all the information about what the program does.

elcaro said:
The semantic content of your program is usually information that you store as documents
No, it isn't. Your program ends up as an executable file. That's what actually realizes whatever semantic content it has. Documentation is nice, but it's not your program, and if it is not carefully kept up to date as the source changes, it will not correctly describe the semantic content of your program.
 
  • #15
PeterDonis said:
For example, while a function call does contain a relation between caller and callee, that relation does not tell you what the called function actually does or what the caller is going to do with the information returned.
Exactly. That was my biggest objection to the idea expressed in the first post in this thread.
elcaro said:
That is true, but neither does your editor, compiler, make tool or versioning system have a clue about that.
I disagree, at least insofar as editors and compilers are concerned. Some editors in IDEs are able to check the code for syntactical errors, and can give tips on how a function should be called (i.e., number and types of parameters). The compiler knows exactly what the function does -- it has to translate the source code of the function to machine code.
 
  • #16
FactChecker said:
The vast majority of programs that I have seen have their major structure as a sequence of operations to perform. IMO, any representation of those programs should have the sequence of operations as their "primary" organization. How would your idea implement that?
Consider the simple example of calculating the factorial of a positive integer. How would your method represent that?
a) using recurrence
b) using iterative multiplication.
As a second and more specific answer, the way this would be represented in a relational system would be:
In case a) we would register the function calling itself, and in case b) we would register an additional loop variable for use as an iterator; for the rest the relations would be the same, I guess.
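
To make the two cases concrete, here is a hedged sketch that extracts such relations from a recursive and an iterative factorial. It uses Python and its standard `ast` module as a stand-in for whatever language and parser a real system would use; the function names and the shape of the `relations` result are assumptions.

```python
import ast

SOURCE = '''
def factorial_rec(n):
    return 1 if n <= 1 else n * factorial_rec(n - 1)

def factorial_iter(n):
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result
'''


def relations(func):
    """Collect the names a function calls and the local names it assigns."""
    calls = sorted({node.func.id for node in ast.walk(func)
                    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)})
    assigned = sorted({node.id for node in ast.walk(func)
                       if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)})
    return {"calls": calls, "locals": assigned}


facts = {f.name: relations(f)
         for f in ast.parse(SOURCE).body if isinstance(f, ast.FunctionDef)}
for name, rel in facts.items():
    print(name, rel)
```

The recursive version records a call to itself and no locals; the iterative version records a call to the built-in `range` and the locals `i` and `result`. Neither set of relations records the multiplication itself, so the relations alone do not say what the function computes.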
 
  • #17
Your concept seems to me to be a grossly inefficient and unnecessarily complicated way to implement an OO language inside of an IDE.
 
  • #18
Mark44 said:
Exactly. That was my biggest objection to the idea expressed in the first post in this thread.

I disagree, at least insofar as editors and compilers are concerned. Some editors in IDEs are able to check the code for syntactical errors, and can give tips on how a function should be called (i.e., number and types of parameters). The compiler knows exactly what the function does -- it has to translate the source code of the function to machine code.
But for this 'relational IDE' this would not work any differently. For instance a function call requires the function to be defined (function prototype). The invocation of the function and the prototype of the function are both objects in this system, and there is a relation between these objects, which means that a function cannot be used before its prototype is defined. Each parameter of the function is also an object, which itself has a relation with its type and with the function in which it is used according to the prototype.

So in short, similar types of checks are performed in this relational IDE. Entering source code also triggers the syntax check and (if no syntax errors are encountered) results in the generation of an abstract syntax tree.
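
One way the "no use before the prototype exists" rule could be enforced is with a foreign-key constraint. The sketch below assumes SQLite via Python's `sqlite3` module; the `prototypes` and `invocations` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE prototypes  (name TEXT PRIMARY KEY, return_type TEXT NOT NULL);
    CREATE TABLE invocations (caller TEXT NOT NULL,
                              callee TEXT NOT NULL REFERENCES prototypes(name));
    INSERT INTO prototypes VALUES ('factorial', 'int');
""")

# A call to a function with a known prototype is accepted.
conn.execute("INSERT INTO invocations VALUES ('main', 'factorial')")

# A call to a function without a prototype violates the foreign key.
try:
    conn.execute("INSERT INTO invocations VALUES ('main', 'undefined_fn')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored_calls = conn.execute("SELECT COUNT(*) FROM invocations").fetchone()[0]
print(rejected, stored_calls)  # True 1
```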
 
  • #19
elcaro said:
For instance a function call requires the function to be defined (function prototype). The invocation of the function and the prototype of the function are both objects in this system, and there is a relation between these objects, which means that a function can not be used before its prototype is defined.
I have no idea what you are talking about. Outside of OO programming, I have never defined a "prototype" of a function, I've just defined the function (written its code) and that's all that's ever needed.

I still think you are massively over-complicating things.
 
  • #20
phinds said:
Your concept seems to me to be a grossly inefficient and unnecessarily complicated way to implement an OO language inside of an IDE.
This relational IDE does not imply that the programming language should be an object-oriented language, nor that an OO language would be needed for the implementation. But your argument would be similarly valid against existing IDEs versus a simple text editor, so why use an IDE in the first place?

We don't know if an implementation of this idea would make code development inefficient and unnecessarily complicated; I would say it depends on the implementation. The intended use is not small programming projects but larger ones, where no single programmer can grasp the whole project. And the benefit of such a relational IDE should be primarily in program maintenance, not only development. In most large programming projects, maintenance costs are much larger than development costs.

I could argue against your statement by claiming that:
- This relational IDE can do a per-object compilation, no need to recompile a whole source file if only one function changed.
- No need to manually write and maintain build/make scripts, that meta info is already stored in the database and readily available.
- All kinds of program restructuring can be done more easily and without introducing the syntactic errors that textual tools could introduce.
- You can generate more in-depth information about the structure of the program.
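
The per-object compilation claim in the list above could be sketched with content hashes: only objects whose stored digest differs from the digest of the edited text need recompiling. The names and source strings below are invented, and the sketch deliberately ignores callers that would also need recompiling if a signature changed.

```python
import hashlib


def digest(text):
    """SHA-256 digest of a function's source text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Digests recorded at the last successful compilation (hypothetical data).
stored_digests = {
    "add": digest("int add(int a, int b) { return a + b; }"),
    "sub": digest("int sub(int a, int b) { return a - b; }"),
}

# Current source texts after an editing session; only 'add' was changed.
edited_sources = {
    "add": "int add(int a, int b) { return b + a; }",
    "sub": "int sub(int a, int b) { return a - b; }",
}

to_recompile = sorted(name for name, text in edited_sources.items()
                      if stored_digests.get(name) != digest(text))
print(to_recompile)  # ['add']
```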
 
  • #21
phinds said:
I have no idea what you are talking about. Outside of OO programming, I have never defined a "prototype" of a function, I've just defined the function (written its code) and that's all that's ever needed.

I still think you are massively over-complicating things.
In C programming, function prototypes are required. The compiler needs to know what parameters are used in the function and what types they have in order to put them on the stack, and the called function needs to retrieve them from the stack in the reverse order; for that kind of precise synchronization a function prototype is needed.
 
  • #22
elcaro said:
For instance a function call requires the function to be defined (function prototype). The invocation of the function and the prototype of the function are both objects in this system, and there is a relation between these objects, which means that a function cannot be used before its prototype is defined. Each parameter of the function is also an object, which itself has a relation with its type and with the function in which it is used according to the prototype.
But @PeterDonis's and my objection still stands. There don't seem to be any details given about what the function actually does.
phinds said:
I have never defined a "prototype" of a function
This is common C and C++ parlance. It's basically a declaration of the function, including the number and types of parameters, and the return value type.
 
  • #23
Mark44 said:
But @PeterDonis's and my objection still stands. There don't seem to be any details given about what the function actually does.
Whether your program resides in a filesystem as text or as relations in a database makes no difference to this issue. Where do you normally store info about what a function does? Either as comments in the source text or as external documentation. The relational source system would do it in a similar way, I would guess...
 
  • #24
elcaro said:
As a second and more specific answer, the way this would be represented in a relational system would be:
In case a) we would register the function calling itself, and in case b) we would register an additional loop variable for use as an iterator; for the rest the relations would be the same, I guess.
This doesn't represent what the function actually does, which was what you were asked to represent.
 
  • #25
elcaro said:
The invocation of the function and the prototype of the function are both objects in this system
The prototype of the function is not the same as the actual code of the function. You still have not shown how the actual code of a function would be represented in your system, even though you have been asked repeatedly.
 
  • #26
PeterDonis said:
This doesn't represent what the function actually does, which was what you were asked to represent.
Where would you normally get that info from? Either from the source text itself (the code) or comments in the code, or external documentation. In the relational source system, similar ways for storing that info could be implemented.
 
  • #27
phinds said:
Outside of OO programming, I have never defined a "prototype" of a function
Obviously you've never programmed in C and had to write a header file. I wish I could say the same, as it would mean I would have avoided much pain and frustration. :wink:
 
  • #28
elcaro said:
Where would you normally get that info from?
From the source code of the function, of course.

elcaro said:
Either from the source text itself (the code)
Yes (see above).

elcaro said:
or comments in the code, or external documentation.
No. Neither of those are the actual implementation of the function, as I've already said.

elcaro said:
In the relational source system, similar ways for storing that info could be implemented.
How would you store the actual source code of the function as relations in a database? Again, you've been asked that repeatedly and have not answered it. You were even given an explicit function as a test case.
 
  • #29
PeterDonis said:
The prototype of the function is not the same as the actual code of the function. You still have not shown how the actual code of a function would be represented in your system, even though you have been asked repeatedly.
It could be simply implemented as a text field as part of the object. But how to represent that would be implementation-defined. Since I do not have an implementation, I cannot "show" anything. But there are different ways of representing or storing the actual code of a function body. How much detail do you need? I hope you are familiar with relational databases; text can be stored there, for example, in a child table that has a relation with the parent table or object. But there are other possibilities.
 
  • #30
PeterDonis said:
From the source code of the function, of course. Yes (see above). No. Neither of those are the actual implementation of the function, as I've already said. How would you store the actual source code of the function as relations in a database? Again, you've been asked that repeatedly and have not answered it. You were even given an explicit function as a test case.
Both PostgreSQL and Oracle databases have stored procedures, which store the actual code of the procedure or function in the database. That would be one way of doing it.
 
  • #31
elcaro said:
It could be simply implemented as a text field as part of the object.
In other words, not as a relation in a database, just as source code text stored in a database instead of in a text file. Which is not what you originally described.

elcaro said:
there are different ways of representing or storing the actual code of a function body
So far the only way you have described is the way we all already know about: text. Storing the text in a database doesn't change the fact that it's text.

What you originally described, and what the title of this thread says, is storing program source as relations in a database. Relations in a database are not text.
 
  • #32
elcaro said:
Both PostgreSQL and Oracle databases have stored procedures, which store the actual code of the procedure or function in the database.
These aren't relations either, so they aren't what you were originally describing.
 
  • #33
PeterDonis said:
These aren't relations either, so they aren't what you were originally describing.
It depends of course on how far you are willing to go in breaking down the source text into relations. You could break it down further, below the level of the function body, into nested blocks, each block as a sequence of statements, etc. But then you in fact get an abstract syntax tree, and maybe you would leave it at that, storing the original source text as text and the pre-parsed representation as an AST. Otherwise it would presumably get too complicated to represent as relations.
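
For comparison, here is one hedged sketch of what breaking a statement all the way down into relations might look like: every syntax-tree node becomes a row with a reference to its parent and a position among its siblings. Python's `ast` module serves as a stand-in parser, and the `ast_nodes` table and its columns are hypothetical.

```python
import ast
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE ast_nodes (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES ast_nodes(id),
    position INTEGER NOT NULL,   -- order among siblings, so sequence is preserved
    kind TEXT NOT NULL,          -- node class name, e.g. 'BinOp'
    value TEXT                   -- identifier or literal, if the node has one
)""")


def store(node, parent_id=None, position=0):
    """Insert `node` and, recursively, all of its children as rows."""
    value = None
    if isinstance(node, ast.Name):
        value = node.id
    elif isinstance(node, ast.Constant):
        value = repr(node.value)
    cur = conn.execute(
        "INSERT INTO ast_nodes (parent_id, position, kind, value) VALUES (?, ?, ?, ?)",
        (parent_id, position, type(node).__name__, value))
    node_id = cur.lastrowid
    for i, child in enumerate(ast.iter_child_nodes(node)):
        store(child, node_id, i)


store(ast.parse("z = x + y - 3"))

root_kind = conn.execute(
    "SELECT kind FROM ast_nodes WHERE parent_id IS NULL").fetchone()[0]
names = [row[0] for row in conn.execute(
    "SELECT value FROM ast_nodes WHERE kind = 'Name' ORDER BY id")]
print(root_kind, names)  # Module ['z', 'x', 'y']
```

Even this one-line statement produces more than a dozen rows, which gives a feel for the overhead discussed in the thread.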
 
  • #34
PeterDonis said:
These aren't relations either, so they aren't what you were originally describing.
Partly they are; what gets stored is, for example, that proc A depends on proc B.
 
  • #35
elcaro said:
It depends of course on how far you are willing to go in breaking down the source text into relations
Your OP and thread title imply that everything is expressed in terms of relations. So if there is anything that isn't, you would need to break things down further.

elcaro said:
what gets stored for example is that proc A depends on proc B, for example.
That's storing the function call, not what either function actually does.
 
