Optimizing Code for Domain-Specific Connections and Data Structures

In summary, the individual wants to learn assembly language to write code for a new Operating System, without taking help from any predefined functions. They also want an editor window to aid in their programming.
  • #36
The Java Virtual Machine basically executes a bunch of tokens in the same way that a CPU does.

A CPU has a token structure where the first byte corresponds to an actual instruction and based on that it grabs some other data that it needs to execute the instruction.

For example, in the case of MOV AX, 8000h, if I recall correctly the opcode is B8 in hex on x86 systems. Because x86 is little-endian, the 16-bit immediate is stored low byte first, so the instruction is B8 00 80 in memory (B8 80 00 would be the layout on a big-endian system).
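As a rough sketch of the byte layout described above, the following C function builds the three bytes of MOV AX, imm16 as they would sit in memory on x86 in 16-bit mode. The helper name encode_mov_ax_imm16 is made up for illustration and is not part of any assembler or library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes the 16-bit-mode encoding of "MOV AX, imm16" into out[0..2] and
 * returns the instruction length in bytes.
 * encode_mov_ax_imm16 is a hypothetical helper used only for illustration. */
size_t encode_mov_ax_imm16(uint16_t imm, uint8_t out[3])
{
    assert(out != NULL);
    out[0] = 0xB8;                    /* opcode byte: MOV AX, imm16 */
    out[1] = (uint8_t)(imm & 0xFF);   /* low byte first (little-endian) */
    out[2] = (uint8_t)(imm >> 8);     /* high byte second */
    return 3;
}
```

Calling encode_mov_ax_imm16(0x8000, buf) fills buf with B8 00 80, which is the order the CPU fetches the bytes in.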

In the JVM, the same thing happens, except that all of it is done in software. The program allocates large chunks of memory and uses routines to link the virtual tokens in Java (i.e. the instructions, class definitions and so on). It has a general way of building up all the metadata and structures that a normal compiler builds (parsing, structure building, and all of that), except that it does so at run-time.

So think of a compiler taking the code and definitions, creating the memory layout for the structure and the instructions and then taking that static definition and turning that static code into something the OS can use.

The JVM acts as a kind of compiler: it compiles the code, but instead of turning it into the tokens the OS uses, it turns it into tokens that it uses itself. It has a support system that eventually executes those tokens, indirectly or directly, with code that is already in a form the OS knows.
 
  • #37
pairofstrings said:
Please tell me if the following is correct and tell me something more related to it if possible*.

  • Java Virtual Machine software is able to execute programs of Assembly (architecture known), C, CPP, SQL and PL/SQL (code if needed must connect to database), Java (given) and the part of the JVM architecture that takes care of this is "Native Method Stacks".
No, I don't believe this is true at all. The Java compiler (javac, I think) compiles Java code (not assembly, C, C++, SQL, etc.) to byte codes. The JVM interprets these byte codes into machine language for the architecture the JVM is running on.

I'm more familiar with C# and .NET programming (including a little experience writing Intermediate Language (IL) code), but I believe that the underlying mechanisms of Java byte codes and C# IL are similar, in that both are low-level, stack-based languages. That is, there are byte codes that represent pushing something onto the stack and popping a result off the stack, and so on.
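To make the "stack-based" idea concrete, here is a minimal sketch of an interpreter loop in C. The opcodes OP_PUSH, OP_ADD, OP_MUL and OP_HALT and the function run_bytecode are invented for illustration; they are not real JVM byte codes or IL instructions, and a real virtual machine does far more (class loading, verification, JIT compilation and so on).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for illustration; not real JVM or IL byte codes. */
enum { OP_PUSH = 1, OP_ADD = 2, OP_MUL = 3, OP_HALT = 4 };

#define STACK_MAX 16

/* Decodes one opcode at a time, much as a CPU decodes its first byte,
 * and returns the value on top of the stack when OP_HALT is reached. */
int run_bytecode(const unsigned char *code, size_t len)
{
    int stack[STACK_MAX];
    size_t sp = 0;   /* number of values currently on the stack */
    size_t pc = 0;   /* index of the next byte to decode */

    while (pc < len) {
        unsigned char op = code[pc++];
        switch (op) {
        case OP_PUSH:                 /* operand byte follows the opcode */
            assert(pc < len && sp < STACK_MAX);
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:                  /* pop two values, push their sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:                  /* pop two values, push their product */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(0 && "unknown opcode");
            return 0;
        }
    }
    assert(0 && "program ended without OP_HALT");
    return 0;
}
```

For instance, the byte sequence OP_PUSH 2, OP_PUSH 3, OP_ADD, OP_PUSH 4, OP_MUL, OP_HALT computes (2 + 3) * 4 without naming any registers, which is what distinguishes a stack machine from a register-based CPU.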
pairofstrings said:
I am trying to get a big picture of how I can approach something that is related to programming.

Thank you very much. The above might appear specific to Java but since assembly language is related to it I am putting the post here.
 
  • #38
pairofstrings said:
[*]Java Virtual Machine software is able to execute programs of Assembly (architecture known), C, CPP, SQL and PL/SQL (code if needed must connect to database), Java (given) and the part of the JVM architecture that takes care of this is "Native Method Stacks".

That much is correct, but your idea about what it means and how to use it is wrong.

You develop the native method library completely outside of Java, using a compiler for whatever language you write it in (C++, C#, Assembler, whatever). You have to write the code in the other language in a particular way, so that the library is in exactly the format that Java expects it to be.

Then within your Java program, instead of writing the code for function XYZ, you write a declaration that says "function XYZ is in a native library", and tells Java where to find the library.

And the really "fun part" is when you don't get the format of the library absolutely 100% correct, and trying to use it just crashes your computer without giving any messages about what you did wrong :smile:
 
  • #39
pairofstrings, you seem to have a lot of energy on this. Why don't you start learning how to program (pick a language, any language) and go from there. You are asking questions that clearly imply that you need to learn far more of the basics and you need to get to that through programming and programming is like riding a bike. You don't learn how to do it by reading a book or asking people, you learn it by doing it.
 
  • #40
To clarify what I said and reconcile it against what AlephZero said, the JVM takes byte codes (only) as input. There are a number of compilers that can translate C, C++, etc. into byte codes.

The basic flow would be:
[Program source code - C, C++, etc.] ---> compiler ----> [Byte codes] ---- JVM ----> [machine code]
 
  • #41
phinds said:
pairofstrings, you seem to have a lot of energy on this. Why don't you start learning how to program (pick a language, any language) and go from there. You are asking questions that clearly imply that ... don't learn how to do it by reading a book or asking people, you learn it by doing it.

I am a little bit happy to see the dots getting connected.
You are right. I am going to start learning how to create device drivers in C a few months from now, once I have finished learning the fundamentals, and finally I will learn about virtual memory and the meaning of static definitions. But I would really prefer to learn to write device drivers in assembly language rather than use another language and its function calls to invoke something.
I know I have a long journey to cover before I reach expertise.
Thanks for the suggestion.

The fundamental question that I am going to raise is:
When a computer is given a task, a pattern of circuitry is selected, signals traverse that circuitry, and we obtain an output. My question is: Is there any way I can visualize the clock cycles/timing diagrams while I am writing a program in Assembly language or in Java?
My interest in Java got bigger since I came to know that JVM executes the token in much the same way as CPU does.
 
  • #42
pairofstrings said:
My question is: Is there any way I can visualize the clock cycles/timing diagrams while I am writing a program in Assembly language or in Java?
In assembly, yes. In Java, I doubt very much that this is possible.

To do it in assembly, you need information from the CPU vendor on how long (how many "clocks") each instruction takes. That information might be difficult to find, depending on what processor you are targeting with your assembly code. For example, I have Intel 64 and IA-32 Architectures Software Developer's Manual (published online), dated 2008. I did a quick search but was unable to find op-code-specific information on clock cycles. I believe that this information was published in documentation for earlier Intel processors like the 80286 and 80386 processor families.
pairofstrings said:
My interest in Java got bigger since I came to know that JVM executes the token in much the same way as CPU does.
 
  • #43
Mark44 said:
In assembly, yes. In Java, I doubt very much that this is possible.

To do it in assembly, you need information from the CPU vendor on how long (how many "clocks") each instruction takes. That information might be difficult to find, depending on what processor you are targeting with your assembly code. For example, I have Intel 64 and IA-32 Architectures Software Developer's Manual (published online), dated 2008. I did a quick search but was unable to find op-code-specific information on clock cycles. I believe that this information was published in documentation for earlier Intel processors like the 80286 and 80386 processor families.

That's interesting because when I did assembler programming (back in the era of the 386 and 486) the manuals included the clock cycle information (the ones I used anyway) that I got straight off the Intel website.
 
  • #44
pairofstrings said:
My interest in Java got bigger since I came to know that JVM executes the token in much the same way as CPU does.

I don't know what you are talking about here but it sounds like misinformation. Java does NOT execute any code at all. It generates machine language and the machine executes it.

Java does not emulate machine registers, CPU processes, and so forth, so I just don't see why anyone knowledgeable would say that Java does anything like what the machine does.
 
  • #45
chiro said:
That's interesting because when I did assembler programming (back in the era of the 386 and 486) the manuals included the clock cycle information (the ones I used anyway) that I got straight off the Intel website.
Yeah, I remember that, too. I was not able to find it in the manuals I cited, which were several generations into the Pentium series.
 
  • #46
phinds said:
pairofstrings, you seem to have a lot of energy on this. Why don't you start learning how to program (pick a language, any language) and go from there. You are asking questions that clearly imply that you need to learn far more of the basics and you need to get to that through programming and programming is like riding a bike. You don't learn how to do it by reading a book or asking people, you learn it by doing it.
Second that...
 
  • #47
chiro said:
That's interesting because when I did assembler programming (back in the era of the 386 and 486) the manuals included the clock cycle information (the ones I used anyway) that I got straight off the Intel website.

Counting clock cycles by hand still made sense (just about) for a chip as simple (by modern standards) as a 386. For modern CPUs, overlapping execution, pipelining, speculative execution of both paths following a branch, multi-level memory caches, etc., mean you can't do it by hand.

"Assembler programming" doesn't really equate to "writing lists of machine code instructions" any more, except for simple processor architectures.
 
  • #48
Wow, this thread brings back memories. I did most of my assembler language programming way back on the 8085 and the 8048 and 8051 micro controllers. Things were simple then. :smile:

I also did my share of 8086 assembler and I seem to remember that integer division was by far the most costly instruction there in terms of CPU cycles. From memory it was something like 40 CPU cycles for one integer division. Mind you, this was back before these CPUs even had any floating point instructions at all (with floating point support being provided by an optional 8087 coprocessor).

I can also remember hand optimizing a few code segments for my old AMD K6 processor. This is where it started getting pipelined and "superscalar" with parallel execution units. My hand optimization mostly just consisted of interleaving load/store instructions with arithmetic/logic ones, which allowed them to execute in parallel (in places where this was possible without changing the result, of course). That and a little bit of pre-fetching. It wasn't much, but you could make measurable improvements. I wouldn't like to try to hand optimize assembler for any CPUs these days though.
 
  • #49
Thank you.

Question one:
Can a C language program have code written in C plus plus and/or Assembly language? I will use Turbo C compiler v3.0

Question two:
Can a C plus plus language program have code written in C and/or Assembly language? I will use Turbo C compiler v3.0

Question three:
Can a Assembly language program have code written in C and/or C plus plus? I will use TASM or NASM.

Question four:
Can Java language program have code written in C and/or C plus plus and/or Assembly language? (Can be invoked?) I will use Notepad and Java Development Kit.

Question five:
Can Assembly language program have code written in C and/or C plus plus and/or Java? (Can be invoked?) I will use TASM or NASM.

Where can I invoke and where can the mixing be done directly?

I only have knowledge that:
C, C plus plus and Assembly language code are mixed in one single program to program ARM processors. I do not know if C plus plus and Assembly language are mixed in C program or C and Assembly language are mixed in C plus plus program or C and C plus plus are mixed in Assembly language program.

Please tell me which is where.
Sorry for asking so many questions.
 
  • #50
pairofstrings said:
Thank you.

Question one:
Can a C language program have code written in C plus plus and/or Assembly language? I will use Turbo C compiler v3.0
Yes to both. You would compile the C and C++ code using the Turbo C compiler (tc.exe if I'm remembering correctly), and assemble the assembly code using tasm.exe. Note that C++ is always written this way, not as "C plus plus".
The linker (tlink.exe) would be used to combine the object code produced by the compiler and assembler into an executable. One thing to be aware of is that the compiler "mangles" the names of C++ functions, so you have to take that into account when you call them from the C or assembly portions.

I'm assuming that the different types of code would be in different files, although you can write assembly code inline inside C or C++ code.
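One common way to take name mangling into account is to give the shared function C linkage, so that the C, C++ and assembly portions all refer to the same unmangled symbol. The sketch below is plain C and assumes a C++ compiler that supports the standard extern "C" linkage specification; the function name sum_array is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Declaration as it would appear in a header shared by C and C++ code.
 * sum_array is a hypothetical function name used for illustration. */
#ifdef __cplusplus
extern "C" {   /* seen only by a C++ compiler: use the unmangled C name */
#endif

int sum_array(const int *values, size_t count);

#ifdef __cplusplus
}
#endif

/* Definition as it would appear in the C source file. */
int sum_array(const int *values, size_t count)
{
    int total = 0;
    assert(values != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

A C compiler never defines __cplusplus, so it sees only the plain declaration, while a C++ compiler that includes the same header is told not to mangle the name. The linker then matches the call to the definition regardless of which language each object file came from.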
pairofstrings said:
Question two:
Can a C plus plus language program have code written in C and/or Assembly language? I will use Turbo C compiler v3.0
A C++ module can call a function written in C or a PROC written in assembly.
pairofstrings said:
Question three:
Can a Assembly language program have code written in C and/or C plus plus? I will use TASM or NASM.
An assembly "main" program can call code written in C or C++. The linker doesn't particularly care what language the code was written in - it just combines object code into an executable.
pairofstrings said:
Question four:
Can Java language program have code written in C and/or C plus plus and/or Assembly language? (Can be invoked?) I will use Notepad and Java Development Kit.
I'm pretty sure you can't mix Java and C/C++/assembly, but I could be wrong. I would be very surprised to find that you can mix Java with C/C++/assembly. How Java gets interpreted and executed is very different from how C, C++, and assembly are translated, linked, and executed.
pairofstrings said:
Question five:
Can Assembly language program have code written in C and/or C plus plus and/or Java? (Can be invoked?) I will use TASM or NASM.
No Java.
pairofstrings said:
Where can I invoke and where can the mixing be done directly?

I only have knowledge that:
C, C plus plus and Assembly language code are mixed in one single program to program ARM processors. I do not know if C plus plus and Assembly language are mixed in C program or C and Assembly language are mixed in C plus plus program or C and C plus plus are mixed in Assembly language program.
I believe that the most common arrangement would be to have a C++ main program that calls functions written in C++ and possibly some legacy code that was written in C. For time-critical or hardware-specific applications, some assembly code might be used to speed up processing.


pairofstrings said:
Please tell me which is where.
Sorry for asking so many questions.
 
  • #51
Mark44 said:
I'm pretty sure you can't mix Java and C/C++/assembly, but I could be wrong. I would be very surprised to find that you can mix Java with C/C++/assembly.

Yes you can: http://en.wikipedia.org/wiki/Java_Native_Interface

But doing it isn't a good "beginners project" IMO.
 
  • #52
Mark44 said:
For time-critical or hardware-specific applications, some assembly code might be used to speed up processing.
or to implement processor specific instructions needed to support multi-tasking and interrupt handling.
 
  • #53
Mark44 said:
For time-critical or hardware-specific applications, some assembly code might be used to speed up processing.


rcgldr said:
or to implement processor specific instructions needed to support multi-tasking and interrupt handling.

Right. That's why I included "hardware-specific" but I didn't complete my thought.
 
  • #55
Mark44 said:
Right. That's why I included "hardware-specific" but I didn't complete my thought.
I was only trying to clarify that sometimes the hardware-specific issue is one of functionality rather than speed: a type of hardware-specific functionality that C / C++ doesn't support and/or that would be awkward to implement via a library function call. These types of issues mostly show up when writing code for an operating system or device drivers, and even then, only a small part of that code is best written in assembly.

pairofstrings said:
C, C plus plus and Assembly language code are mixed in one single program to program ARM processors.
This could be done, but not normally as a single "program". Part of the operating system and device drivers for the ARM and perhaps some hardware specific stuff would be written in assembly language, but most of the code for an ARM embedded application would be written in C or C++. Most of the assembly code would be in separate source files, as opposed to inline assembly code in C or C++ source files.
 
  • #56
[Attached image: EAC_zps319bf79c.jpg]


I do not know if the following is a programming practice.

1. I am thinking that if I have a good understanding of how registers are utilized when a program is executed, then by counting the clock cycles the processor takes to perform an operation I can begin optimizing code. Optimization should become a somewhat easier task if I write assembly language code inside programs written in C, C++ and Java, and I see that low-level information is required for optimization.
For this purpose I need to obtain the equivalent assembly language code for whatever code I write in C, C++ or Java.
I know that on a Unix platform I can get the assembly equivalent of a C program by using
gcc -S sample.c (the -S option stops after generating assembly, whereas -c produces an object file) and maybe the code can then be optimized.
The idea is to get equivalent assembly language for programs written in C, C++ and Java.

2. How can I determine which section of a program takes the most clock cycles to perform its computation, especially when a Java program includes code written in C, C++, Assembly, SQL or PL/SQL? The idea is to generate the equivalent assembly language code for the entire project and then modify that code for optimization.
 
  • #57
pairofstrings said:
1. I am thinking that if I have good understanding of how registers are utilized when a program is executed and then by counting clock cycles.
I don't know if this is possible anymore. I've read that current documentation on Intel processors no longer includes clock cycles because of the internal optimizations of the code, such as out-of-order instruction processing, which vary between processors. Cache implementation as well as the RAM interface would also be an issue.

The compilers do a fairly good job, and if using 64 bit mode on an Intel X86 processor, you get 8 additional registers, which makes it much easier for the compiler to optimize.
 
  • #58
Showing your "Generator" working on assembly code just shows once again how little you seem to have listened to all the advice you have already been given in this thread. What do you think is the "equivalent assembly code" to assembly code? That's like saying "I'm going to take English prose and translate it into English prose".

Writing programs in a high-level language and then getting the assembly language of the machine code that the compiler generates and then optimizing it is a TERRIBLE idea. For one thing, you may well miss optimizations that the compiler made and by screwing around with it, you will make the code worse, not better.

If you want to optimize code in the way you seem to be thinking of, then write it in assembly language and optimize it as you write it. Unless you plan on doing nothing but writing device drivers, then for 99.9% of all code you are ever likely to need to write, this is insanity, but that has already been pointed out in this thread and you don't seem to be interested in listening.

As rcgldr pointed out, counting clock cycles on modern machines is likely to be a waste of time, if it can even be done in the way it could when machines were simpler.
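An alternative to counting cycles by hand, and one way to approach the earlier question of which section of a program is the slow one, is simply to measure it. The following is a minimal sketch in C, assuming the standard clock() function is precise enough for the section being timed. The names time_section and sum_workload are hypothetical, and a real project would more likely use a profiler.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Returns the CPU time, in seconds, consumed by calling fn(arg)
 * `repeats` times. time_section is a hypothetical helper name. */
double time_section(void (*fn)(void *), void *arg, int repeats)
{
    assert(fn != NULL && repeats > 0);
    clock_t start = clock();
    for (int i = 0; i < repeats; i++)
        fn(arg);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Hypothetical workload: stores the sum of 0..999 in the long that arg
 * points to. The volatile qualifier keeps the compiler from removing
 * the loop entirely. */
void sum_workload(void *arg)
{
    volatile long *total = (volatile long *)arg;
    *total = 0;
    for (long i = 0; i < 1000; i++)
        *total += i;
}
```

Timing two candidate implementations with the same repeat count and comparing the results gives a direct answer on the machine at hand, with caches, pipelining and out-of-order execution already accounted for.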
 
  • #59
Okay I get your point.
Earlier I wanted to know if writing huge programs in assembly language was good or not. Now I wanted to know if there is any such "Generator" which can generate equivalent assembly code for above things for optimization.

I recently saw this and wondered how programmers could optimize their code for such a complex component.

Thank you for answering my question. I was looking for an answer which could explain to me the process of optimization, not in detail, only the important things. But as you said equivalent assembly code is generated from the machine code generated by compilers. But can I use decompilers in a way which gives me only assembly code of the entire code written in Java having C, C++, SQL, PL/SQL? What decompiler is that? There is no such decompiler, right? And I don't think there is any compiler which can take in the code of a Java program having code written in C, C++, SQL, PL/SQL and produce equivalent assembly code. Am I right?
If yes that means equivalent assembly code can be generated only for C, C++ language code. Correct?

May be someone can tell me what book to refer to for optimization.
I just want to know the process of how optimization takes place in simple English language, may be someone can tell me name of a good book and provide a basic idea about the working of optimization technique.
 
  • #60
pairofstrings said:
equivalent assembly code is generated from the machine code generated by compilers.
Some compilers have the option of producing assembly code as an output.

pairofstrings said:
But can I use decompilers in a way which gives me only assembly code of the entire code.
There are disassemblers, but these are time-consuming to use, since you would need to figure out which parts of a program are data and which parts are code. Assuming you're starting with source code in some high-level language, there's no point in doing this.

pairofstrings said:
May be someone can tell me what book to refer to for optimization.
The issue here is optimization depends on the processor(s), and the application. Some applications, such as video rendering, can be easily split up to work on independent portions of the video image, and can take advantage of parallel processing.
 
  • #61
To add to rcgldr's comment, optimization depends more on the domain for practical purposes than anything else.

Optimizing code to get the best use of "cycles" or "CPU time" is one thing, but a lot of optimization is about looking at your domain and seeing whether there are domain-specific connections in the code that can be exploited, or whether the domain lends itself to data structures and algorithms that do a task more quickly than another implementation that achieves the same thing.

Typically there is a rule-of-thumb trade-off between the memory use and the computational complexity of a task (or algorithm): if you spend more memory, the computational complexity decreases, and if you spend less, it increases.

The best example of this would be to compare a search algorithm over data that has no memory overhead with one over data that has a lot (i.e. a hash table).

Somewhere in between you have, say, a binary-tree classification system for records, or even some kind of graph structure to help organize the data. With a hash table, if it's a good table with a good hash algorithm and low collisions (you don't aim to remove collisions, you just aim to make them as uniform as possible), then spending memory on the table tends to make lookups a lot faster. As a rule of thumb, if you sacrifice less memory, you increase computational complexity.
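The comparison above can be sketched in C as two lookups over the same keys: a linear scan that needs no extra memory, and a small open-addressing hash table that spends memory to shorten the search. The table size, the modulo hash and the function names are all assumptions made for illustration, not a recommended design.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 64   /* hypothetical capacity; must exceed the key count */
#define EMPTY (-1)      /* marks an unused slot, so keys must be non-negative */

/* No extra memory: scan every element, O(n) comparisons per lookup. */
int linear_contains(const int *keys, size_t n, int key)
{
    for (size_t i = 0; i < n; i++)
        if (keys[i] == key)
            return 1;
    return 0;
}

/* Deliberately simple hash: the key modulo the table size. */
static size_t hash_slot(int key)
{
    return (size_t)key % TABLE_SIZE;
}

void table_init(int table[TABLE_SIZE])
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Inserts with linear probing; assumes the table never becomes full. */
void table_insert(int table[TABLE_SIZE], int key)
{
    assert(key >= 0);
    size_t slot = hash_slot(key);
    while (table[slot] != EMPTY && table[slot] != key)
        slot = (slot + 1) % TABLE_SIZE;   /* collision: try the next slot */
    table[slot] = key;
}

/* Extra memory: usually a handful of probes regardless of the key count. */
int table_contains(const int table[TABLE_SIZE], int key)
{
    size_t slot = hash_slot(key);
    while (table[slot] != EMPTY) {
        if (table[slot] == key)
            return 1;
        slot = (slot + 1) % TABLE_SIZE;
    }
    return 0;
}
```

With a few keys the difference is negligible. With many keys the linear scan grows in proportion to the number of keys, while the table lookup stays short as long as the table is kept sparse and collisions are spread evenly.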
 