How to open any software and look at its code?

  • Thread starter: pairofstrings
  • Tags: Code, Software

Discussion Overview

The discussion revolves around the methods and challenges associated with opening software to examine its underlying code. Participants explore various tools and techniques for reverse engineering software, including disassemblers, hex editors, and machine code readers, while also considering the implications of legality and complexity involved in such processes.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • Some participants suggest using a machine code reader or bit parser to examine compiled software, noting that most software is not in a human-readable format.
  • Others mention the existence of reverse compilers and disassemblers, which can provide insights into high-level languages like Java and .NET, but caution that this can be time-consuming for larger programs.
  • There is a discussion about the effectiveness of hex editors, with some arguing they only allow viewing bits without clarifying the software's functionality.
  • Participants highlight that C and C++ programs are particularly challenging to reverse engineer due to their low-level nature, which complicates understanding the code structure.
  • Some contributions detail personal experiences with disassembling older software and the complexities involved, including the need for knowledge of operating system internals and assembly language.
  • There are references to specific tools like Ildasm.exe for .NET and various disassemblers that can yield more readable code, contrasting them with the difficulties faced when dealing with C/C++ binaries.
  • Participants discuss the potential for modifying software through reverse engineering, citing examples from gaming and utility software, while also acknowledging the legal and ethical considerations involved.

Areas of Agreement / Disagreement

Participants express a range of views on the feasibility and methods of reverse engineering software, with no consensus on a single approach or solution. The discussion reflects a variety of opinions on the challenges and tools available, indicating that multiple competing views remain.

Contextual Notes

Limitations include the complexity of modern software, the need for specific knowledge about programming languages and operating systems, and the varying legality of reverse engineering practices depending on context.

pairofstrings
I want to open any given program and look at its code. Can anyone tell me how it can be done?
 
Yes. Get a machine code reader / bit parser.

And learn machine code. Seriously though, depending on the kind of program, most software has been compiled into a form that is efficient for the machine to read: zeros and ones. It is not interpreted, human-readable code like HTML and JavaScript, for example.
 
If this were possible, a lot of companies would be out of business.
 
There are reverse compilers around.
Google and thou shall find! :wink:
I have some experience with some of those.
You won't get very far on a large and complex program though.
 
How about a Hex Editor ?
 
uart said:
How about a Hex Editor ?

Yeah. Same thing. You're just looking at the bytes, 16 at a time.

But it is no closer to illuminating the workings of the app.
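To make the point concrete, here is a minimal Python sketch of roughly what a hex editor shows: an offset, 16 bytes in hex, and a printable-ASCII column. The function name hex_dump and the 16-byte row width are illustrative choices, not any particular editor's behavior.

```python
def hex_dump(data: bytes, width: int = 16) -> list:
    """Render bytes the way a typical hex editor does: offset, hex bytes, ASCII."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Non-printable bytes are shown as '.', as most hex editors do
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on a short final row
        rows.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return rows


if __name__ == "__main__":
    # A few bytes of 16-bit x86 machine code followed by a short string
    sample = bytes([0xB4, 0x09, 0xCD, 0x21]) + b"Hello$"
    for row in hex_dump(sample):
        print(row)
```

To look at a real file, read it in binary mode (open(path, "rb").read()) and pass the bytes in. The text fragments stand out, but the machine code itself stays just as opaque, which is the point being made above.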
 
The most common tool used for this is a disassembler, but this would be very time-consuming on a large program. It's a form of reverse engineering, and its legality depends on the circumstances. Wikipedia article:

http://en.wikipedia.org/wiki/Reverse_engineering
 
There are disassemblers for popular high-level languages such as Java, VB and .NET which can yield very readable code and give substantial insight into how the app operates.

Some languages are more vulnerable than others. In my experience, C and C++ programs are much more difficult to reverse engineer but can still be modified depending on how the code is structured - this is how hackers bypass software protection for example.
 
-Job- said:
In my experience, C and C++ programs are much more difficult to reverse engineer

That's because, in some cases, even the language itself is little more than memory allocations and pointers and incomprehensible at the best of times. :biggrin::biggrin::biggrin:
 
  • #10
DaveC426913 said:
That's because, in some cases, even the language itself is little more than memory allocations and pointers and incomprehensible at the best of times. :biggrin::biggrin::biggrin:

Ha, that's true. The reality is that .NET languages compile into an intermediate representation, aptly called "Intermediate Language" (IL).

.NET ships with Ildasm.exe, a disassembler that outputs the IL for a given .NET program. If you take a look, you'll see that the IL is quite readable, so .NET decompilers have a lot more information to work with. Java also retains symbol information when compiled into class files.
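As a small illustration of how much structure a Java class file keeps, here is a Python sketch that reads its fixed header. Per the Java Virtual Machine Specification, a class file starts with the magic number 0xCAFEBABE followed by the minor and major version numbers; the constant pool that comes next holds the class, method and field names that decompilers rely on. The function name is hypothetical, and the sketch reads only the header, not the constant pool.

```python
import struct


def read_class_header(data: bytes) -> tuple:
    """Return (major, minor) version numbers from a Java .class file header."""
    if len(data) < 8:
        raise ValueError("too short to be a class file")
    # Big-endian layout: u4 magic, u2 minor_version, u2 major_version
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("missing 0xCAFEBABE magic number")
    return major, minor


if __name__ == "__main__":
    # Header bytes of a class file compiled for Java 8 (major version 52)
    print(read_class_header(bytes.fromhex("CAFEBABE00000034")))
```

A native C/C++ executable has no comparable table of source-level names unless debug symbols were shipped with it, which is part of why the next paragraph's point holds.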

C/C++ compile to lower-level code, which makes it much more difficult to piece back together, but you can always stare at and step through the assembly code. Visual Studio has dumpbin for this, and there are also more sophisticated tools like IDA.
 
  • #11
So:

What the OP wants to do is pick an app (or apps) he wants to examine, determine what language they are written in, and then assess the feasibility of decompiling them.

Feasibility and choice of solution will be on a case-by-case basis.
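A first step toward judging feasibility is finding out what kind of file you have. This Python sketch checks a few well-known magic numbers at the start of a file; the list is deliberately short and the function name is made up for illustration. It cannot by itself tell a .NET assembly from a native Windows program, since both start with "MZ".

```python
def identify_format(data: bytes) -> str:
    """Guess an executable's format from its leading 'magic' bytes."""
    if data.startswith(b"\xCA\xFE\xBA\xBE"):
        # The same magic is also used by Mach-O universal (fat) binaries
        return "Java class file (or Mach-O universal binary)"
    if data.startswith(b"MZ"):
        # Native Win32 programs and .NET assemblies both start with MZ;
        # telling them apart needs the CLR runtime header entry in the PE file
        return "DOS/Windows executable (MZ/PE)"
    if data.startswith(b"\x7fELF"):
        return "ELF executable (Linux and other Unix-like systems)"
    if data.startswith(b"PK\x03\x04"):
        return "ZIP archive (a Java .jar is one)"
    # DOS .COM files have no header at all, so they end up here
    return "unknown"


if __name__ == "__main__":
    print(identify_format(b"MZ\x90\x00"))
```

Reading the first few bytes of a file in binary mode is enough to make this first guess; the case-by-case work starts after that.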
 
  • #12
A 1-million-byte .exe file, debugged and printed out on paper, would run to approximately 100,000 pages, give or take.

The OP would have to know the internals of the operating system, DLLs, threads, the assembly and machine code of the CPU, drivers, ...

I have tried disassembling .com programs on 8086s, and that is just about impossible. .COM programs were limited to a single 64K segment holding both code and data, but you still had to worry about the program's calls into the operating system. As the system was DOS, there was none of the oddball threading and other stuff that Windows programs do nowadays.
 
  • #13
There are cases where a person or a group has reverse engineered an old PC game to enhance it and/or make it work with current versions of Windows. This involved disassembling and understanding key aspects of the program, and figuring out the game's key data files. One example is the racing game Grand Prix Legends (originally made in 1997), where a group of people modified the game to add shifter support and race cars from other years, and a large number of tracks were made as add-ons.
 
  • #14
256bits said:
A 1-million-byte .exe file, debugged and printed out on paper, would run to approximately 100,000 pages, give or take.

The OP would have to know the internals of the operating system, DLLs, threads, the assembly and machine code of the CPU, drivers, ...

I have tried disassembling .com programs on 8086s, and that is just about impossible. .COM programs were limited to a single 64K segment holding both code and data, but you still had to worry about the program's calls into the operating system. As the system was DOS, there was none of the oddball threading and other stuff that Windows programs do nowadays.

I guess it depends on what you want to do.

If all you want to do is crack the software protection, then it's going to be pretty quick with something like SoftICE to at least get a basic idea of which regions of the program are involved in this feature. If the protection is not really advanced (say, a simple function), then writing a few NOP codes in the right places (maybe with some extra modification) is pretty easy. If it's more complex protection, the structure of the protection is still there, but it's usually more complicated and more integrated into the code base than the simpler mechanisms.

But yeah, if you want to actually understand the whole executable, then in most cases, good luck.
 
  • #15
chiro said:
I guess it depends on what you want to do.

I thought the OP's question was fairly clear on that.
 
  • #16
256bits said:
A 1-million-byte .exe file, debugged and printed out on paper, would run to approximately 100,000 pages, give or take.
That's assuming that you can fit only 10 bytes on a page.
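The arithmetic is easy to check: 1,000,000 bytes over 100,000 pages is 10 bytes per page. This Python sketch computes page counts under other layouts; the 60 lines per page, 16 bytes per hex-dump line, and roughly 3 bytes per x86 instruction in a disassembly listing are assumptions chosen for illustration, not measured figures.

```python
def pages_needed(total_bytes: int, bytes_per_line: int, lines_per_page: int) -> int:
    """Pages required to print total_bytes, rounding any partial page up."""
    bytes_per_page = bytes_per_line * lines_per_page
    return -(-total_bytes // bytes_per_page)  # integer ceiling division


if __name__ == "__main__":
    size = 1_000_000
    print(pages_needed(size, 10, 1))   # the 10-bytes-per-page reading
    print(pages_needed(size, 16, 60))  # hex dump: 16 bytes/line, 60 lines/page
    print(pages_needed(size, 3, 60))   # listing: ~3 bytes/instruction, 1 per line
```

Under those assumptions the three cases come to 100,000, 1,042 and 5,556 pages respectively: still a lot of paper, but well short of 100,000 for any realistic layout.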
256bits said:
The OP would have to know the internals of the operating system, DLLs, threads, the assembly and machine code of the CPU, drivers, ...

I have tried disassembling .com programs on 8086s, and that is just about impossible.
Not necessarily. I had a disassembler about 20 years ago, and I was able to disassemble and modify several .com and .exe files.

One of the .com files was a DOS utility that would display a text file. By default, the utility displayed white letters against a blue background. I found the code that was setting the text and background colors, and changed it so that it displayed black letters on a gray background.
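For a change like this, the relevant fact is that in the standard PC text modes each character cell has an attribute byte: the low four bits are the foreground color and bits 4 to 6 the background (bit 7 is blink or bright background). Bright white on blue is 1Fh and black on light gray is 70h. The Python sketch below computes those values and patches the first matching byte sequence. The assumption that the utility loaded the attribute with MOV BH, 1Fh (bytes B7 1F, as used with the BIOS scroll function) is purely hypothetical, for illustration.

```python
def text_attribute(foreground: int, background: int) -> int:
    """Build a PC text-mode attribute byte (blink/bright-background bit left clear)."""
    return ((background & 0x7) << 4) | (foreground & 0xF)


def patch_first(code: bytes, old: bytes, new: bytes) -> bytes:
    """Replace the first occurrence of old with new; both must be the same length."""
    if len(old) != len(new):
        raise ValueError("a patch must not change the file size")
    index = code.find(old)
    if index < 0:
        raise ValueError("byte sequence not found")
    return code[:index] + new + code[index + len(old):]


if __name__ == "__main__":
    BRIGHT_WHITE, BLUE, BLACK, LIGHT_GRAY = 15, 1, 0, 7
    old_attr = text_attribute(BRIGHT_WHITE, BLUE)  # 0x1F
    new_attr = text_attribute(BLACK, LIGHT_GRAY)   # 0x70
    # Hypothetical program bytes: NOP, MOV BH,1Fh, RET
    program = bytes([0x90, 0xB7, old_attr, 0xC3])
    patched = patch_first(program, bytes([0xB7, old_attr]), bytes([0xB7, new_attr]))
    print(patched.hex(" "))
```

Keeping the patch the same length matters because inserting or removing bytes would shift every address that comes after it.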

Another .com file was called bubble.com, IIRC. When it was running it seemed to make the text on the screen drop off the bottom, one character at a time. If it ran long enough, the screen would eventually become blank. I disassembled that code and changed it so that the letters rose up instead of falling down.

These were relatively small files, so disassembling them wasn't that difficult. One that took more time was an .exe that was part of Norton Utilities. One of the utilities they offered was the ability to change the name of a directory. The standard way of doing this back then (early 90s) was to create a directory with the name you wanted, move all the files from the old directory, and then delete the old directory. I couldn't imagine that Norton was doing all of this just to rename a directory. I disassembled the file, and looked through all of the assembly code (about 35K bytes) for all of the int 21H instructions, the interrupts that go into the DOS functions. For each of the int 21H instructions, I wrote down how the AX and other registers were set, so as to determine the particular DOS function being used. I then looked up each function call in a reference I have, and found that one of them was using the DOS Rename File function, which is what Norton was using to rename a directory. The only difference between a directory and a file is that the directory has the directory attribute set.
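The int 21H survey described above can be partly automated. This Python sketch scans raw bytes for CD 21 (the encoding of INT 21h) and, as a heuristic, checks whether the instruction just before it was MOV AH, imm8 (B4 xx) or MOV AX, imm16 (B8 lo hi) to recover the DOS function number. A byte scan like this can also hit data or the middle of another instruction, and it misses cases where AH was set further back, so it only narrows down what to read by hand. The function name and the small name table are illustrative.

```python
from typing import List, Optional, Tuple

# A few DOS INT 21h functions, keyed by the value in AH
DOS_FUNCTIONS = {
    0x09: "Display String",
    0x39: "Create Subdirectory",
    0x3A: "Remove Subdirectory",
    0x4C: "Terminate with Return Code",
    0x56: "Rename File",
}


def find_dos_calls(code: bytes) -> List[Tuple[int, Optional[int]]]:
    """Return (offset, AH value or None) for every CD 21 byte pair found."""
    calls = []
    for i in range(len(code) - 1):
        if code[i] != 0xCD or code[i + 1] != 0x21:
            continue
        ah = None
        if i >= 2 and code[i - 2] == 0xB4:    # MOV AH, imm8 right before
            ah = code[i - 1]
        elif i >= 3 and code[i - 3] == 0xB8:  # MOV AX, imm16; AH is the high byte
            ah = code[i - 1]
        calls.append((i, ah))
    return calls


if __name__ == "__main__":
    sample = bytes([0xB4, 0x56, 0xCD, 0x21,         # MOV AH,56h / INT 21h
                    0xB8, 0x00, 0x4C, 0xCD, 0x21])  # MOV AX,4C00h / INT 21h
    for offset, ah in find_dos_calls(sample):
        name = DOS_FUNCTIONS.get(ah, "unknown") if ah is not None else "AH not found"
        print(f"{offset:04X}: INT 21h {name}")
```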
256bits said:
.COM programs were limited to a single 64K segment holding both code and data, but you still had to worry about the program's calls into the operating system. As the system was DOS, there was none of the oddball threading and other stuff that Windows programs do nowadays.
 
  • #17
Mark44 said:
I had a disassembler about 20 years ago, and I was able to disassemble and modify several .com and .exe files.

Can you name the disassembler? Is it powerful enough to handle complex software? How does it work?
 
  • #18
pairofstrings said:
Can you name the disassembler? Is it powerful enough to handle complex software? How does it work?

Not sure if a C=64 disassembler could handle a Java app unless it'll fit on a 5 1/4" floppy... :biggrin:
 
  • #19
pairofstrings said:
Can you name the disassembler? Is it powerful enough to handle complex software? How does it work?
IIRC, the name was MD86, but I don't think they're in business any longer.

A disassembler works by attempting to translate binary machine code back into a particular assembly language. I say "attempt" because a program will typically have blocks of code, which are relatively easy to translate, as well as blocks of data, which the disassembler will sometimes try to turn into code instructions. Part of being able to disassemble an executable is being able to recognize blocks of data for what they are: just numbers or characters.
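To show what "attempting to translate" means, here is a toy linear-sweep disassembler in Python that knows only a handful of 8086 opcodes. It is a sketch for illustration, nowhere near a real disassembler. Bytes it does not recognize are emitted as db (define byte). Note what happens to the message text after the RET: 48h, the letter "H", comes out as DEC AX, which is exactly the code-versus-data confusion described above. The default origin of 100h reflects where DOS loads a .COM program.

```python
# Opcode -> (format string for the instruction, number of immediate bytes)
OPCODES = {
    0x48: ("dec ax", 0),
    0x90: ("nop", 0),
    0xC3: ("ret", 0),
    0xB4: ("mov ah, 0x{:02X}", 1),
    0xCD: ("int 0x{:02X}", 1),
    0xB8: ("mov ax, 0x{:04X}", 2),
    0xBA: ("mov dx, 0x{:04X}", 2),
}


def disassemble(code: bytes, origin: int = 0x100) -> list:
    """Decode bytes front to back; unknown or truncated opcodes become db."""
    lines = []
    i = 0
    while i < len(code):
        op = code[i]
        entry = OPCODES.get(op)
        if entry is not None and i + 1 + entry[1] <= len(code):
            template, size = entry
            # 8086 immediates are stored little-endian
            imm = int.from_bytes(code[i + 1:i + 1 + size], "little")
            lines.append(f"{origin + i:04X}  {template.format(imm)}")
            i += 1 + size
        else:
            lines.append(f"{origin + i:04X}  db 0x{op:02X}")
            i += 1
    return lines


if __name__ == "__main__":
    # MOV DX,0108h / MOV AH,09h / INT 21h / RET, then the text "Hi$"
    program = bytes([0xBA, 0x08, 0x01, 0xB4, 0x09, 0xCD, 0x21, 0xC3]) + b"Hi$"
    for line in disassemble(program):
        print(line)
```

A real tool has to recognize that everything from 0108h on is the string passed to the Display String call, which is the judgment the post describes.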

Another thing that makes a disassembled executable or COM file difficult to understand is that named labels and named variables in the original assembly code show up in the disassembly code as just labels with uninformative names such as B200 and the like. The disassembler I had let you replace these label names with ones of your choice.
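The label-renaming feature can be sketched as a whole-word substitution over the listing. The auto-generated label B200 comes from the post; the replacement name print_msg and the listing lines are hypothetical.

```python
import re
from typing import Dict, List


def rename_labels(listing: List[str], names: Dict[str, str]) -> List[str]:
    """Replace auto-generated labels with meaningful names, matching whole words only."""
    if not names:
        return list(listing)
    # \b on both sides keeps B200 from matching inside a longer label like B2001
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
    return [pattern.sub(lambda m: names[m.group(0)], line) for line in listing]


if __name__ == "__main__":
    listing = ["B200: mov ah, 0x09", "      jmp B200", "      call B2001"]
    for line in rename_labels(listing, {"B200": "print_msg"}):
        print(line)
```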
 
