Is it possible to deduce the software from the hardware activity?


Discussion Overview

The discussion revolves around the possibility of deducing software functionality from hardware activity in computers. Participants explore whether a person with a strong understanding of physics but no knowledge of computers could reconstruct software or understand its operation solely by analyzing hardware components like the CPU and transistors. The scope includes theoretical considerations, practical implications, and the limits of understanding based on hardware analysis.

Discussion Character

  • Debate/contested
  • Conceptual clarification
  • Exploratory

Main Points Raised

  • Some participants argue that without prior knowledge of computers, it would be difficult to deduce how a computer works or to rebuild software from hardware alone.
  • Others suggest that it is possible to understand some aspects of software by studying the hardware, citing resources like Ben Eater's videos as helpful for learning.
  • A participant notes that understanding how a semiconductor works does not equate to knowing how an integrated circuit operates, raising questions about the depth of knowledge required.
  • There is a discussion about the complexity of modern computing systems, with some asserting that the intricate nature of microprocessors makes it challenging to recover the software from hardware analysis alone.
  • One participant proposes a bottom-up approach: starting from basic components like transistors, one could work up to logic gates and then CPU instructions (see the toy sketch after this list), but the participant is uncertain whether this process is feasible without prior knowledge of programming languages.
  • Another participant mentions that while basic functions of a CPU can be understood through reverse engineering, this does not imply that one could recreate complex software systems like operating systems from hardware alone.
  • There is a consideration of whether the difficulty in deducing software from hardware is a matter of complexity or if it presents a logical impossibility, referencing concepts like Gödel's incompleteness theorem.
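
As a rough illustration of that bottom-up idea (an illustrative sketch, not code posted in the thread): every Boolean function can be composed from a single primitive such as NAND, which is essentially what a handful of transistors implement. The snippet below builds the familiar gates, and a one-bit half adder, from NAND alone.

```python
# Toy bottom-up construction: everything built from a single NAND primitive,
# which is roughly what a small cluster of transistors provides in silicon.

def nand(a: int, b: int) -> int:
    return 0 if (a and b) else 1

def not_(a: int) -> int:
    return nand(a, a)

def and_(a: int, b: int) -> int:
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))

def xor(a: int, b: int) -> int:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) -- the first step from gates toward arithmetic."""
    return xor(a, b), and_(a, b)

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", half_adder(a, b))
```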

Areas of Agreement / Disagreement

Participants express differing views on the feasibility of deducing software from hardware, with no consensus reached. Some believe it is possible under certain conditions, while others maintain that it is unlikely or impractical without prior knowledge of computing concepts.

Contextual Notes

The discussion highlights limitations in understanding due to the complexity of modern computing systems and the varying levels of knowledge required to analyze hardware effectively. There are unresolved questions about the depth of understanding necessary to connect hardware functionality with software operations.

  • #31
A related research article:

Could a Neuroscientist Understand a Microprocessor?

https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005268

Abstract

There is a popular belief in neuroscience that we are primarily data limited, and that producing large, multimodal, and complex datasets will, with the help of advanced data analysis algorithms, lead to fundamental insights into the way the brain processes information. These datasets do not yet exist, and if they did we would have no way of evaluating whether or not the algorithmically-generated insights were sufficient or even correct. To address this, here we take a classical microprocessor as a model organism, and use our ability to perform arbitrary experiments on it to see if popular data analysis methods from neuroscience can elucidate the way it processes information. Microprocessors are among those artificial information processing systems that are both complex and that we understand at all levels, from the overall logical flow, via logical gates, to the dynamics of transistors. We show that the approaches reveal interesting structure in the data but do not meaningfully describe the hierarchy of information processing in the microprocessor. This suggests current analytic approaches in neuroscience may fall short of producing meaningful understanding of neural systems, regardless of the amount of data. Additionally, we argue for scientists using complex non-linear dynamical systems with known ground truth, such as the microprocessor as a validation platform for time-series and structure discovery methods.
 
  • #32
Mark44 said:
But disassembling the code is no guarantee that you're going to understand what the code is doing. Back in the late '80s/early '90s I played around with an x86 disassembler called MD86. A big problem for a disassembler is determining whether a sequence of bytes is machine instructions or data. One can make an educated guess, but that is, after all, only a guess.
I have never seen a problem. In 1973, I wrote a disassembler for the CDC-3300. It showed interpretations of memory as binary, EBCDIC (not ASCII), and assembly side-by-side on the same page of the printout. There was never any doubt which was the correct interpretation.
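
For readers who have never seen that style of listing, here is a minimal sketch of the idea (mine, not .Scott's actual tool): the same bytes rendered side by side as hex, ASCII, and EBCDIC, where the correct interpretation usually jumps out at a glance.

```python
# Minimal side-by-side memory dump: hex, ASCII, and EBCDIC views of the same bytes.
# Illustrative only; real listings of this kind also include a disassembly column.

def printable(s: str) -> str:
    return "".join(c if c.isprintable() else "." for c in s)

def dump(data: bytes, width: int = 16) -> None:
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        ascii_part = printable(chunk.decode("latin-1"))   # one-byte-per-char view
        ebcdic_part = printable(chunk.decode("cp037"))    # EBCDIC code page 037
        print(f"{off:08x}  {hexpart:<{width * 3}}  |{ascii_part:<{width}}|  |{ebcdic_part:<{width}}|")

if __name__ == "__main__":
    dump(b"Hello, world!\x00\x01\x02" + "HELLO".encode("cp037"))
```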

In some cases (including with the x86) you will see short pieces of code mixed in with data structures - but even in that case, you will see function or interrupt returns at the end of the code segments (a rough sketch of this heuristic appears below).

Our Ideal Naive Investigator would be able to instrument the running PC and capture what is executed and what is not (a code coverage exercise).
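
Here is a crude version of the "returns at the end of code segments" heuristic mentioned above (an illustrative sketch only; a real disassembler is far more careful, since the same byte values can also occur inside data or inside a longer instruction's operand bytes).

```python
# Heuristic scan for x86 near-return opcodes (0xC3 = ret, 0xC2 = ret imm16).
# Clusters of plausible instruction bytes ending at such an opcode are
# candidate code segments; everything else is more likely data.

RET_OPCODES = {0xC3, 0xC2}

def candidate_code_ends(blob: bytes) -> list[int]:
    """Offsets of bytes that could terminate a code segment."""
    return [i for i, b in enumerate(blob) if b in RET_OPCODES]

if __name__ == "__main__":
    # push rbp; mov rbp, rsp; pop rbp; ret -- followed by plain ASCII data.
    sample = bytes([0x55, 0x48, 0x89, 0xE5, 0x5D, 0xC3]) + b"some ASCII data..."
    print(candidate_code_ends(sample))   # -> [5] here; false hits are always possible
```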
 
  • #33
.Scott said:
In 1973, I wrote a disassembler for the CDC-2100.
I was not able to find any reference to a CDC-2100 computer. This wiki page, https://en.wikipedia.org/wiki/Control_Data_Corporation, lists all of the models produced by Control Data Corp., but doesn't show a model CDC-2100. HP had a model 2100.

In any case, computers changed a fair amount from 1973 to 1993, when I was disassembling x86 code, and they have changed even more drastically from the 90s to the present day. If you're working with a system like ARM or MIPS, with a relatively small number of fixed-length instructions, that's a whole different ball game from other current architectures that are built using complex instruction sets. For example, 64-bit Intel and AMD instructions can range from a single byte up to 12 or more bytes in length. Besides that, modern processors can execute instructions out of order, and even execute multiple instructions in parallel in one machine cycle.
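
To make the fixed- versus variable-length point concrete, here is a small sketch using the third-party Capstone disassembly engine (my choice of tool, not something mentioned in the thread; `pip install capstone` provides it). The same few x86-64 instructions decode to lengths of 1, 3, 7, and 1 bytes, whereas classic ARM or MIPS instructions are uniformly 4 bytes long.

```python
# Disassemble a few x86-64 instructions and print how long each one is.
# Requires the third-party Capstone engine: pip install capstone
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

CODE = bytes([
    0x55,                                      # push rbp          (1 byte)
    0x48, 0x89, 0xE5,                          # mov rbp, rsp      (3 bytes)
    0x48, 0x8B, 0x05, 0x00, 0x00, 0x00, 0x00,  # mov rax, [rip+0]  (7 bytes)
    0xC3,                                      # ret               (1 byte)
])

md = Cs(CS_ARCH_X86, CS_MODE_64)
for insn in md.disasm(CODE, 0x1000):
    print(f"{insn.address:#x}  {insn.size:2d} bytes  {insn.mnemonic} {insn.op_str}")
```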
 
  • #34
The OP phrased this question very loosely. But I would compare the complexity of reverse engineering computer hardware/software to the complexity of deciphering Egyptian and/or Maya hieroglyphics. Both of those took a lot of time and effort to solve.

Remember that advanced aliens would certainly understand digital logic, and logic building blocks, and programmable devices with hardware and software, and would no doubt use them in their own devices. Some things are more alien (like spoken language) or less alien (like mathematics). Circuits IMO are somewhere in the middle.
 
  • #35
Mark44 said:
I was not able to find any reference to a CDC-2100 computer. This wiki page, https://en.wikipedia.org/wiki/Control_Data_Corporation, lists all of the models produced by Control Data Corp., but doesn't show a model CDC-2100. HP had a model 2100.

In any case, computers changed a fair amount from 1973 to 1993, when I was disassembling x86 code, and they have changed even more drastically from the 90s to the present day. If you're working with a system like ARM or MIPS, with a relatively small number of fixed-length instructions, that's a whole different ball game from other current architectures that are built using complex instruction sets. For example, 64-bit Intel and AMD instructions can range from a single byte up to 12 or more bytes in length. Besides that, modern processors can execute instructions out of order, and even execute multiple instructions in parallel in one machine cycle.
You're right, it was a CDC-3300 (and I fixed that post). It was in 1973 while I was a student at Lowell Tech (now UMass Lowell).
They also had an IBM-1620 and an HP-2100 series machine (an HP-2136, I think), plus another computer in the Physics department that I never worked with.

I programmed one system before those (Honeywell Series 200) and scores more since then. I am currently working with a TI MPS uP; last year it was an Infineon Aurix TC397. In all cases I have run across the machine language. If you just do a simple histogram on the bytes, you will get vastly different patterns depending on whether it is executable code, ASCII data, numeric data, or the kind of pointer/numeric mishmash you find on the stack. As for executing instructions out of order, I first worked on a pipeline processor in the early 1980s.
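
A minimal version of that histogram idea (my own sketch, not .Scott's code): executable code, ASCII text, and zero-padded numeric data give visibly different byte-frequency profiles, which a naive investigator could compute without knowing anything about the instruction set.

```python
# Rough byte-frequency profile of a memory region: code, ASCII text, and
# zero-heavy numeric data produce very different distributions.
from collections import Counter
import math

def profile(blob: bytes) -> dict:
    counts = Counter(blob)
    n = len(blob)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    return {
        "entropy_bits_per_byte": round(entropy, 2),
        "printable_ascii_fraction": round(sum(1 for b in blob if 0x20 <= b < 0x7F) / n, 2),
        "zero_fraction": round(counts.get(0, 0) / n, 2),
        "top_bytes": [f"{b:#04x}" for b, _ in counts.most_common(3)],
    }

if __name__ == "__main__":
    text = b"Executing data generally produces pointless results." * 4
    numeric = bytes(100) + bytes(range(16)) * 4   # zero-padded numeric-style data
    print("text   :", profile(text))
    print("numeric:", profile(numeric))
```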

But before our Ideal Naive Investigator looks at the code, it should nail down the hardware. Even if it miscategorized some data as code, it would discover the mistake soon enough by simply trying to execute it. Executing data generally produces pointless results - like repeatedly moving the same content into the same register, or overwriting a register's contents before they could have been used.
 
