I want to open any given program and look at its code. Can anyone tell me how this can be done?
Yes. Get a machine code reader / bit parser.
And learn machine code.
If this were possible, a lot of companies would be out of business.
There are reverse compilers around.
Google and thou shalt find!
I have some experience with some of those.
You won't get very far on a large and complex program though.
How about a hex editor?
Yeah. Same thing. You're just looking at the same raw bytes, written in base 16.
But it gets you no closer to understanding the workings of the app.
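To make that concrete, here is a rough sketch of what a hex editor shows you. The function name `hex_dump` and the 11-byte sample are made up for illustration (the sample is a tiny DOS-style .com program that prints a string), so treat it as a demonstration rather than any real tool's output format.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes roughly the way a hex editor displays them."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Each byte becomes two hex digits
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is, everything else as a dot
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


# Hypothetical sample: MOV AH,09h / MOV DX,0108h / INT 21h / RET / "Hi$"
sample = bytes.fromhex("b409 ba0801 cd21 c3") + b"Hi$"
print(hex_dump(sample))
```

You can spot the text "Hi$" in the right-hand column, but nothing in the dump tells you which bytes are instructions and which are data.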
The most common tool used for this is a disassembler, but this would be very time-consuming on a large program. It's a form of reverse engineering, and the legality of this depends on the circumstances. Wiki article:
There are decompilers for popular high-level platforms such as Java, VB and .NET which can yield very readable code and give substantial insight as to how the app operates.
Some languages are more vulnerable than others. In my experience, C and C++ programs are much more difficult to reverse engineer but can still be modified depending on how the code is structured - this is how hackers bypass software protection for example.
That's because, in some cases, even the language itself is little more than memory allocations and pointers and incomprehensible at the best of times.
Ha, that's true. The reality is that .NET languages compile into an intermediate representation, aptly called "Intermediate Language" (IL).
.NET ships with Ildasm.exe, a disassembler that outputs the IL for a given .NET program. If you take a look you'll see that the IL is quite readable, so .NET decompilers have a lot more information to work with. Java also retains symbol information when compiled into class files.
C and C++ compile to native machine code, which makes it much more difficult to piece back together, but you can always stare at and step through the assembly code. Visual Studio has dumpbin for producing a listing (its /DISASM option), and there are also more sophisticated tools like IDA.
What the OP wants to do is pick an app (or apps) he wants to examine, determine what language they are written in, and then determine the feasibility of decompiling them.
Feasibility and choice of solution will be on a case-by-case basis.
A 1 million byte .exe file, disassembled and printed out on paper, would come to approximately 100,000 pages, give or take.
The OP has to know the internals of the operating system, DLLs, threads, assembly and machine code of the CPU, drivers, ...
I have tried disassembly of .com programs on 8086s and that is just about impossible. A .com program is loaded into a single 64 KB segment that holds both its code and its data, but you still had to worry about the program's calls into the operating system. As the system was DOS, there was none of the threading and other stuff that Windows programs do nowadays.
There are cases where a person or a group has reverse engineered an old PC game to enhance it and/or to make it work with current versions of Windows. This involved disassembling and understanding key aspects of the program, and figuring out key data files for games. One example of this is the racing game Grand Prix Legends (originally made in 1997), where a group of people modified the game to include shifter support and race cars from other years, and a large number of tracks were made as add-ons.
I guess it depends on what you want to do.
If all you want to do is crack the software protection, then it's going to be pretty quick with something like SoftICE to at least get a basic idea of the regions of the program that are involved with this feature. If the protection is not really advanced (like, say, a simple function), then writing a few NOP codes in the right places (maybe with some extra modification) is pretty easy. If it's more complex protection, then the structure of the protection is still there, but it's usually more complicated and more integrated into the code base than the simpler mechanisms.
But yeah, if you want to actually understand the whole executable, then in most cases, good luck.
I thought the OP's question was fairly clear on that.
That's assuming that you can fit only 10 bytes on a page.
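For what it's worth, the arithmetic can be sketched like this. The 100,000-page figure works out to 1,000,000 / 100,000 = 10 bytes per page. The other numbers below (an average of 3 bytes per instruction, one instruction per printed line, 60 lines per page) are my own rough assumptions, not measurements, and the function name `estimate_pages` is made up for the example.

```python
import math


def estimate_pages(size_bytes: int, avg_instr_bytes: float, lines_per_page: int) -> int:
    """Estimate printed pages for a listing with one instruction per line."""
    instructions = size_bytes / avg_instr_bytes
    return math.ceil(instructions / lines_per_page)


exe_size = 1_000_000

# Bytes per page implied by the 100,000-page estimate
implied_bytes_per_page = exe_size / 100_000

# Assumed: 3 bytes per instruction on average, 60 lines per page
pages = estimate_pages(exe_size, avg_instr_bytes=3.0, lines_per_page=60)

print(implied_bytes_per_page, pages)
```

Under those assumptions the listing is closer to a few thousand pages, which is still far more than anyone would read end to end.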
Not necessarily. I had a disassembler about 20 years ago, and I was able to disassemble and modify several .com and .exe files.
One of the .com files was a DOS utility that would display a text file. By default, the utility displayed white letters against a blue background. I found the code that was setting the text and background colors, and changed it so that it displayed black letters on a gray background.
Another .com file was called bubble.com, IIRC. When it was running it seemed to make the text on the screen drop off the bottom, one character at a time. If it ran long enough, the screen would eventually become blank. I disassembled that code and changed it so that the letters rose up instead of falling down.
These were relatively small files, so disassembling them wasn't that difficult. One that took more time was an .exe that was part of Norton Utilities. One of the utilities they offered was the ability to change the name of a directory. The standard way of doing this back then (early 90s) was to create a directory with the name you wanted, move all the files from the old directory, and then delete the old directory. I couldn't imagine that Norton was doing all of this just to rename a directory. I disassembled the file, and looked through all of the assembly code (about 35K bytes) for all of the int 21H instructions, the interrupts that go into the DOS functions. For each of the int 21H instructions, I wrote down how the AX and other registers were set, so as to determine the particular DOS function being used. I then looked up each function call in a reference I have, and found that one of them was using the DOS Rename File function, which is what Norton was using to rename a directory. The only difference between a directory and a file is that the directory has the directory attribute set.
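The manual search described above can be roughly automated. This is only a sketch under a big simplifying assumption: it looks for the byte pair CD 21 (the encoding of INT 21h) and reports a function number only when AH was loaded by a MOV AH, imm8 (bytes B4 xx) immediately before it. Real code often sets AX differently or earlier, and CD 21 can also turn up inside data, so the hits still need checking by hand. The function name `find_dos_calls` and the sample bytes are made up; the three entries in the lookup table are standard DOS function numbers.

```python
# A few well-known INT 21h function numbers (the value passed in AH)
DOS_FUNCTIONS = {
    0x09: "Display string",
    0x4C: "Terminate with return code",
    0x56: "Rename file",
}


def find_dos_calls(code: bytes):
    """Return (offset, ah_or_None) for every CD 21 byte pair found."""
    calls = []
    pos = code.find(b"\xcd\x21")
    while pos != -1:
        ah = None
        # Assumption: AH is set by MOV AH, imm8 (B4 xx) right before the INT
        if pos >= 2 and code[pos - 2] == 0xB4:
            ah = code[pos - 1]
        calls.append((pos, ah))
        pos = code.find(b"\xcd\x21", pos + 2)
    return calls


# Hypothetical fragment: MOV AH,56h / INT 21h / MOV AH,4Ch / INT 21h
fragment = bytes.fromhex("b456cd21 b44ccd21")
for offset, ah in find_dos_calls(fragment):
    name = DOS_FUNCTIONS.get(ah, "unknown") if ah is not None else "AH not found"
    print(f"{offset:04X}  INT 21h  {name}")
```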
Can you name the disassembler? Is it powerful enough to handle complex software? How does it work?
Not sure if a C=64 disassembler could handle a Java app unless it'll fit on a 5 1/4" floppy...
IIRC, the name was MD86, but I don't think they're in business any longer.
A disassembler works by attempting to translate binary machine code back into a particular assembly language. I say "attempt" because a program will typically have blocks of code, which are relatively easy to translate, as well as blocks of data, which the disassembler will sometimes try to turn into code instructions. Part of being able to disassemble an executable is being able to recognize blocks of data for what they are: just numbers or characters.
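Here is a toy illustration of that code-versus-data problem. It decodes only a handful of 8086 opcodes (MOV with an immediate, INT, INC/DEC on a 16-bit register, RET, NOP) and walks straight through the bytes from start to finish, which is the simplest possible strategy and not how any particular product works. The names `disassemble`, `REG8` and `REG16` and the sample program are made up for the example.

```python
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def disassemble(code: bytes, origin: int = 0x100):
    """Linear sweep over a tiny subset of 8086 opcodes; .com files load at 100h."""
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        if 0xB0 <= op <= 0xB7 and i + 1 < len(code):
            text, size = f"MOV {REG8[op - 0xB0]}, {code[i + 1]:02X}h", 2
        elif 0xB8 <= op <= 0xBF and i + 2 < len(code):
            imm = code[i + 1] | (code[i + 2] << 8)  # little-endian 16-bit value
            text, size = f"MOV {REG16[op - 0xB8]}, {imm:04X}h", 3
        elif op == 0xCD and i + 1 < len(code):
            text, size = f"INT {code[i + 1]:02X}h", 2
        elif 0x40 <= op <= 0x47:
            text, size = f"INC {REG16[op - 0x40]}", 1
        elif 0x48 <= op <= 0x4F:
            text, size = f"DEC {REG16[op - 0x48]}", 1
        elif op == 0xC3:
            text, size = "RET", 1
        elif op == 0x90:
            text, size = "NOP", 1
        else:
            # Anything this toy does not know is emitted as a raw data byte
            text, size = f"DB {op:02X}h", 1
        out.append((origin + i, text))
        i += size
    return out


# Hypothetical program: print a string via DOS, return, then the string "HI$"
program = bytes.fromhex("b409 ba0801 cd21 c3") + b"HI$"
for addr, text in disassemble(program):
    print(f"{addr:04X}  {text}")
```

The first four lines come out as sensible instructions, but the string at 0108h is decoded as DEC AX and DEC CX because the letters "H" and "I" happen to be valid opcodes. Recognizing that those bytes are really text is the part a human has to do.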
Another thing that makes a disassembled executable or COM file difficult to understand is that named labels and named variables in the original assembly code show up in the disassembly code as just labels with uninformative names such as B200 and the like. The disassembler I had let you replace these label names with ones of your choice.
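The renaming feature is easy to imitate on a text listing. This is a minimal sketch, assuming the listing is plain text and the generated labels are whole words. The function name `rename_labels`, the sample listing and the replacement name `print_loop` are all invented for the example, and I don't know how the original tool did it internally.

```python
import re


def rename_labels(listing: str, names: dict) -> str:
    """Replace auto-generated labels with chosen names, matching whole words only."""
    if not names:
        return listing
    alternatives = "|".join(re.escape(label) for label in names)
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda m: names[m.group(1)], listing)


listing = "B200:\n    MOV AH, 09h\n    JMP B200\n    CALL B2000\n"
renamed = rename_labels(listing, {"B200": "print_loop"})
print(renamed)
```

Matching on word boundaries keeps a longer label such as B2000 from being mangled when B200 is renamed.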