Is it possible to deduce the software from the hardware activity?

Summary:
Deducing software functionality solely from hardware activity is highly challenging, even for someone well-versed in physics but unfamiliar with computing. Understanding a computer's operation requires knowledge of its components, such as CPUs and memory, which are complex and interdependent. While resources like Ben Eater's educational videos can provide foundational knowledge, they do not enable one to reverse-engineer software from hardware alone. The intricacies of modern computing, including the vast array of possible programming tasks and the abstraction of high-level languages, complicate any attempts to derive software from hardware. Ultimately, while some insights may be gained through analysis, fully reconstructing software from hardware is generally impractical and often impossible.
  • #31
A related RESEARCH ARTICLE:

Could a Neuroscientist Understand a Microprocessor?

https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005268

Abstract

There is a popular belief in neuroscience that we are primarily data limited, and that producing large, multimodal, and complex datasets will, with the help of advanced data analysis algorithms, lead to fundamental insights into the way the brain processes information. These datasets do not yet exist, and if they did we would have no way of evaluating whether or not the algorithmically-generated insights were sufficient or even correct. To address this, here we take a classical microprocessor as a model organism, and use our ability to perform arbitrary experiments on it to see if popular data analysis methods from neuroscience can elucidate the way it processes information. Microprocessors are among those artificial information processing systems that are both complex and that we understand at all levels, from the overall logical flow, via logical gates, to the dynamics of transistors. We show that the approaches reveal interesting structure in the data but do not meaningfully describe the hierarchy of information processing in the microprocessor. This suggests current analytic approaches in neuroscience may fall short of producing meaningful understanding of neural systems, regardless of the amount of data. Additionally, we argue for scientists using complex non-linear dynamical systems with known ground truth, such as the microprocessor as a validation platform for time-series and structure discovery methods.
 
  • #32
Mark44 said:
But disassembling the code is no guarantee that you're going to understand what the code is doing. Back in the late '80s/early '90s I played around with an x86 disassembler called MD86. A big problem for a disassembler is determining whether a sequence of bytes is machine instructions or data. One can make an educated guess, but that is, after all, only a guess.
I have never found that to be a problem. In 1973, I wrote a disassembler for the CDC-3300. It showed interpretations of memory as binary, as EBCDIC characters (not ASCII), and as assembly, side by side on the same page of the printout. There was never any doubt which was the correct interpretation.

In some cases (including with the x86) you will see short pieces of code mixed with data structures - but even in that case, you will see function or interrupt returns at the end of the code segments.
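To make the side-by-side idea concrete, here is a rough Python sketch of that kind of listing. The toy opcode table is invented purely for illustration (the real tool used the CDC's actual instruction set and character code); the point is just that when the bytes are really text, the character column reads cleanly and the "disassembly" column turns into nonsense, and vice versa.

Code:
# Toy side-by-side memory listing: address, hex, printable character, mnemonic.
# The opcode table is invented for illustration; a real tool would use the
# target machine's actual instruction set and character encoding.

TOY_OPCODES = {0x01: "LDA", 0x02: "STA", 0x03: "ADD", 0xC3: "RET"}

def listing(memory: bytes, base: int = 0) -> None:
    # One line per byte: address, hex value, printable character, toy mnemonic.
    for offset, byte in enumerate(memory):
        char = chr(byte) if 32 <= byte < 127 else "."
        mnem = TOY_OPCODES.get(byte, "???")
        print(f"{base + offset:06X}  {byte:02X}  {char}  {mnem}")

if __name__ == "__main__":
    # The leading bytes decode as toy opcodes; the trailing "Hello" only
    # makes sense in the character column.
    listing(b"\x01\x02\x03\xC3Hello")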

Our Ideal Naive Investigator would be able to instrument the running PC and capture what is executed and what is not (a code coverage exercise).
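A minimal sketch of that coverage exercise, assuming the investigator can trap every instruction fetch: run the program in a toy fetch-execute loop and record which addresses are ever fetched as code; whatever is never fetched is data or dead code. The three-instruction toy machine below is invented for illustration, not any real architecture.

Code:
# Minimal sketch of the "code coverage" idea: run a toy program and record
# which memory addresses are ever fetched as instructions. Addresses that
# never appear in the returned set are data (or unreachable code).

def run_and_trace(memory: list[int], start: int = 0, max_steps: int = 1000) -> set[int]:
    executed: set[int] = set()
    pc, acc = start, 0
    for _ in range(max_steps):
        executed.add(pc)
        op = memory[pc]
        if op == 0x01:            # LOAD addr -> acc
            acc = memory[memory[pc + 1]]
            executed.add(pc + 1)
            pc += 2
        elif op == 0x03:          # ADD addr -> acc
            acc += memory[memory[pc + 1]]
            executed.add(pc + 1)
            pc += 2
        elif op == 0xC3:          # HALT
            break
        else:
            raise ValueError(f"unknown opcode {op:#x} at {pc:#x}")
    return executed

program = [0x01, 6, 0x03, 7, 0xC3, 0, 40, 2]   # code at addresses 0-4, data at 6-7
covered = run_and_trace(program)
print("fetched as code:", sorted(covered))
print("never executed :", sorted(set(range(len(program))) - covered))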
 
  • #33
.Scott said:
In 1973, I wrote a disassembler for the CDC-2100.
I was not able to find any reference to a CDC-2100 computer. This wiki page, https://en.wikipedia.org/wiki/Control_Data_Corporation, lists all of the models produced by Control Data Corp., but doesn't show a model CDC-2100. HP had a model 2100.

In any case, computers changed a fair amount from 1973 to 1993, when I was disassembling x86 code, and they have changed even more drastically from the '90s to the present day. If you're working with a system like ARM or MIPS, with a relatively small number of fixed-length instructions, that's a whole different ball game from current architectures built on complex instruction sets. For example, 64-bit Intel and AMD instructions can range from a single byte up to 15 bytes in length. Besides that, modern processors can execute instructions out of order, and even execute multiple instructions in parallel in one machine cycle.
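A toy sketch of why variable-length encodings make disassembly harder: with an invented length table (not a real x86 decoder), the same byte string parses into completely different "instruction" streams depending on the offset where decoding starts, whereas a fixed-length ISA admits only a handful of possible alignments.

Code:
# Toy illustration of the alignment ambiguity in variable-length encodings.
# The opcode-to-length table is invented; real x86-64 needs a full decoder.

LENGTHS = {0x90: 1, 0xB8: 5, 0x05: 5, 0xC3: 1}   # toy: opcode -> total length

def decode(blob: bytes, start: int) -> list[str]:
    out, i = [], start
    while i < len(blob):
        op = blob[i]
        n = LENGTHS.get(op, 1)                    # unknown opcodes: assume 1 byte
        out.append(f"{i:04X}: {blob[i:i + n].hex(' ')}")
        i += n
    return out

blob = bytes.fromhex("b8 05 00 00 00 05 01 00 00 00 c3")
print("from offset 0:", decode(blob, 0))          # three "instructions"
print("from offset 1:", decode(blob, 1))          # a completely different parse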
 
  • #34
The OP phrased this question very loosely. But I would compare the complexity of reverse engineering computer hardware/software to the complexity of deciphering Egyptian and/or Maya hieroglyphics. Both of those took a lot of time and effort to solve.

Remember that advanced aliens would certainly understand digital logic, and logic building blocks, and programmable devices with hardware and software, and would no doubt use them in their own devices. Some things are more alien (like spoken language) or less alien (like mathematics). Circuits IMO are somewhere in the middle.
 
  • #35
Mark44 said:
I was not able to find any reference to a CDC-2100 computer. This wiki page, https://en.wikipedia.org/wiki/Control_Data_Corporation, lists all of the models produced by Control Data Corp., but doesn't show a model CDC-2100. HP had a model 2100.

In any case, computers changed a fair amount from 1973 to 1993, when I was disassembling x86 code, and they have changed even more drastically from the '90s to the present day. If you're working with a system like ARM or MIPS, with a relatively small number of fixed-length instructions, that's a whole different ball game from current architectures built on complex instruction sets. For example, 64-bit Intel and AMD instructions can range from a single byte up to 15 bytes in length. Besides that, modern processors can execute instructions out of order, and even execute multiple instructions in parallel in one machine cycle.
You're right, it was a CDC-3300 (and I fixed that post). It was in 1973 while I was a student at Lowell Tech (now UMass Lowell).
They also had an IBM-1620 and an HP-2100 series machine (an HP-2136, I think), plus another computer in the Physics department that I never worked with.

I programmed one system before those (a Honeywell Series 200) and scores more since then. I am currently working with a TI MPS µP; last year it was an Infineon Aurix TC397. In all of these I have run across the machine language. If you just do a simple histogram on the bytes, you will get vastly different patterns depending on whether it is executable code, ASCII data, numeric data, or the kind of pointer/numeric mish-mash you find on the stack. As for executing instructions out of order, I first worked on a pipeline processor in the early 1980s.
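That histogram heuristic is easy to try. Here is a rough Python sketch with a few invented features and no calibrated thresholds; a real tool would want many more features, but even this crude profile separates ASCII text from code-like bytes clearly.

Code:
# Rough sketch of the byte-histogram heuristic: code, ASCII text, and other
# data have visibly different byte-value distributions. The features below
# are invented for illustration, not a calibrated classifier.

from collections import Counter

def profile(blob: bytes) -> dict[str, float]:
    n = len(blob)
    hist = Counter(blob)
    return {
        "printable_ascii": sum(hist[b] for b in range(0x20, 0x7F)) / n,
        "zero_bytes": hist[0x00] / n,
        "distinct_values": len(hist) / 256,
    }

text = b"Deducing software from hardware activity is hard. " * 20
code_like = bytes.fromhex("4889e5b8050000000501000000c3") * 50

for name, blob in [("text", text), ("code-like", code_like)]:
    print(name, profile(blob))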

But before our Ideal Naive Investigator looks at the code, it should nail down the hardware. Even if it miscategorizes some data, it would discover that soon enough simply by trying to execute it. Executing data generally produces pointless results - like repeatedly moving the same content into the same register, or overwriting a register's contents before they could have been used.
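That "overwritten before it could have been used" tell can also be checked mechanically. A minimal sketch over a made-up trace format (instruction name, destination register, source registers): flag any register write that is followed by another write to the same register with no intervening read.

Code:
# Sketch of the "dead write" tell: in a trace of executed instructions, flag
# registers that are written and then written again before ever being read.
# The (op, dest, sources) trace format is made up for this illustration.

def dead_writes(trace: list[tuple[str, str, tuple[str, ...]]]) -> list[int]:
    pending: dict[str, int] = {}     # register -> index of last unread write
    flagged: list[int] = []
    for i, (_op, dest, srcs) in enumerate(trace):
        for r in srcs:               # a read retires any pending write
            pending.pop(r, None)
        if dest in pending:          # written again before being read
            flagged.append(pending[dest])
        if dest:
            pending[dest] = i
    return flagged

# Executing data tends to look like this: r1 written three times, never read.
trace = [("mov", "r1", ()), ("mov", "r1", ()), ("mov", "r1", ()),
         ("add", "r2", ("r2", "r3"))]
print("dead writes at trace indices:", dead_writes(trace))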
 
