Simulate computer inside a computer

  • Thread starter: fredreload
  • Tags: Computer
AI Thread Summary
Simulating a virtual computer within a real computer generally requires more resources than running a native system, as it involves complex operations to emulate the behavior of each component. While there may be ways to optimize resource usage, such as limiting the simulation's fidelity or skipping certain CPU functions, the overall performance is typically slower. The discussion also touches on the limitations of emulation, emphasizing that simulating a system does not inherently increase computational power. Participants highlight the importance of understanding CPU architecture and the role of transistors in parallel processing to maximize calculations. Ultimately, simulating a computer is a resource-intensive process that may not yield faster performance compared to native execution.
fredreload
If I simulate a virtual computer inside a real computer graphically, would it take more resources or would it work like before? It is different from cloud computing, I'm simulating a computer graphically.
 
Look up emulators and "virtual machine" on Wikipedia. After reading those articles, if you still have questions, post them here.

Edit: I don't understand what you mean by graphically.
 
More resources than what? Before what?
 
In general, yes. It will use more resources. There's no free lunch.

It might be possible to trade resources, though. For example, certain unused portions of the CPU, like branch prediction, could be skipped at the cost of lower execution speed, since only one branch will be taken anyway.

It also might be possible to limit resources by limiting the faithfulness of the simulation. A trivial example: A brick can simulate a computer that's been turned off.
 
Your modern Intel x86 instruction-set CPU doesn't run native x86 instructions on the silicon. It runs RISC-like micro-instructions internally, with very complex micro-coded decoders for the CISC instructions. It's both faster and more efficient to emulate/simulate the CISC instruction set on modern processors than to run the native instructions in silicon.

http://www.hardwaresecrets.com/inside-pentium-m-architecture/4/
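As a rough illustration of that decode step (the instruction and micro-op names below are invented for the sketch, not Intel's actual encodings, which are proprietary), a decoder can be thought of as a table that expands one complex instruction into several simple internal operations:

```python
# Toy sketch of CISC-to-micro-op decoding (illustrative only; real Intel
# micro-op encodings are proprietary and far more complex).

# One complex instruction expands into several simple internal steps.
MICROCODE = {
    # ADD [mem], reg  ->  load, add, store
    "ADD_MEM_REG": ["LOAD_TMP_FROM_MEM", "ADD_TMP_REG", "STORE_TMP_TO_MEM"],
    # a simple register add maps to a single micro-op
    "ADD_REG_REG": ["ADD_REG_REG_UOP"],
}

def decode(instruction):
    """Return the list of micro-ops the core would actually execute."""
    return MICROCODE[instruction]

print(decode("ADD_MEM_REG"))  # ['LOAD_TMP_FROM_MEM', 'ADD_TMP_REG', 'STORE_TMP_TO_MEM']
```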
 
  • Like
Likes dlgoff and fredreload
I thought simulating a computer would create a faster one, but I feel that the exascale computer is still going in the right direction; we'll wait for that and go from there.
 
Well hmm, is it possible to simulate a computer with a particle simulation for the transistors? After all, they are just on and off switches. So each particle would represent an on or off switch with a different type of computing. Eventually it would simulate how a CPU works. That way you don't need to build those transistors, you just simulate how they work. But again, simulating how these particles behave might take up more computing power than a CPU can handle. What do you guys think? It's only a thought.
 
Typically they are not just on/off switches. Timing is critically important, and most systems use the transition states to control timing. For example, the state of one bit might be latched by the transition of another.

These are technical issues for optical and quantum computing. Solving them is a high priority.
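For what "latched by the transition of another" means in practice, here is a minimal software model of a rising-edge-triggered D flip-flop (a toy sketch, not any particular device):

```python
# Minimal model of a rising-edge-triggered D flip-flop: the output only
# changes when the clock goes from 0 to 1, not while it sits at either level.

class DFlipFlop:
    def __init__(self):
        self.q = 0            # stored bit
        self._last_clk = 0    # previous clock level, to detect edges

    def tick(self, clk, d):
        if self._last_clk == 0 and clk == 1:   # rising edge
            self.q = d
        self._last_clk = clk
        return self.q

ff = DFlipFlop()
for clk, d in [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]:
    print(f"clk={clk} d={d} -> q={ff.tick(clk, d)}")
```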
 
You are right, all you need is timing and RAM. We need a lot of RAM, and more RAM for the clock assuming 1 byte each
 
  • #10
Back when I was in school, we did a graphical simulator of the micro-coding of an 80x86 for teaching. It wasn't that hard. (Getting the artwork right was the hardest part.) But it was a simulator, not any sort of emulator intended to run code.
 
  • #11
Hmm, a quantum computer is for exponential calculation; couldn't we optimize arithmetic or create a supercomputer with a graphical simulator and use that for our calculations? The actual computer hardware is limited to binary calculations, but a simulation program can do just about anything.
 
  • #12
The simulation program is running on a computer that is limited to binary calculations. Where is the win?
 
  • #13
Jeff Rosenbury said:
Back when I was in school, we did a graphical simulator of the micro-coding of an 80x86 for teaching. It wasn't that hard. (Getting the artwork right was the hardest part.) But it was a simulator, not any sort of emulator intended to run code.

Enlighten me please Jeff. Graphical? I don't know what you mean.
  • You used an interactive circuit builder in Spice?
  • You sketched artwork for a chip etching mask or a printed circuit mask?
 
  • #14
anorlunda said:
Enlighten me please Jeff. Graphical? I don't know what you mean.
  • You used an interactive circuit builder in Spice?
  • You sketched artwork for a chip etching mask or a printed circuit mask?

SPICE is far, far too slow to use as an interactive microcode simulator.

I'm sure Jeff meant he had a program that allowed the various registers to be printed to the screen as the simulation progressed. I've done similar things but I've always just printed them to stdout or redirected to a log file.
 
  • #15
analogdesign said:
SPICE is far, far too slow to use as an interactive microcode simulator.

I'm sure Jeff meant he had a program that allowed the various registers to be printed to the screen as the simulation progressed. I've done similar things but I've always just printed them to stdout or redirected to a log file.
Yes. The program displayed the CPU in block diagram form. It would execute commands by loading the registers, running the ALU, setting the flags, etc. Basically it displayed the machine code and how the µcode drove the machine code.

It was intended for teaching CPU architecture classes.
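Something in that spirit fits in a few lines of Python: a toy register machine with an invented two-instruction set that prints its registers and flags after each step, much as a teaching simulator would display them (names and the instruction set are made up for illustration):

```python
# Toy register-level simulator in the spirit of a teaching tool: it executes
# a tiny invented instruction set and prints registers/flags after each step.

def run(program):
    regs = {"A": 0, "B": 0}
    flags = {"Z": 0}
    for op, *args in program:
        if op == "LOAD":            # LOAD reg, value
            regs[args[0]] = args[1]
        elif op == "ADD":           # ADD dest, src (8-bit wraparound)
            regs[args[0]] = (regs[args[0]] + regs[args[1]]) & 0xFF
        flags["Z"] = int(regs["A"] == 0)
        print(f"{op:<5} regs={regs} flags={flags}")

run([("LOAD", "A", 5), ("LOAD", "B", 3), ("ADD", "A", "B")])
```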
 
  • #16
Can I emulate more CPU power, i.e. a greater number of calculations than the CPU I use can do itself?
 
  • #17
That's kind of like asking if you can run faster by carrying yourself.
 
  • Like
Likes Averagesupernova
  • #18
But Wikipedia says it is possible to emulate an IBM PC with a Commodore 64 here. I'm interested in the performance I would get from that
 
  • #19
fredreload said:
But Wikipedia says it is possible to emulate an IBM PC with a Commodore 64 here.

A. "But Wikipedia says" is not a very good argument.

B. You didn't - and should have - included the second half of that quote. "Yes, it's possible for a 64 to emulate an IBM PC, in the same sense that it's possible to bail out Lake Michigan with a teaspoon." It was disingenuous not to do that.

fredreload said:
Can I emulate more CPU power, i.e. a greater number of calculations than the CPU I use can do itself?

Of course not. If you could run an emulator faster on the same hardware, we'd run emulators on top of emulators until we had infinitely fast computers and we could just sit back and await the robot apocalypse.
 
  • Like
Likes Borg
  • #20
Darn it, I had my hopes up. It's not that it would be faster, but how would you maximize calculations per second? Is increasing the number of CPUs the only way?
 
  • #21
Note: The last few posts were merged into this thread; they were a separate thread before.

fredreload said:
I thought simulating a computer would create a faster one
It cannot - you cannot simulate more than one operation with one operation. Actual simulations of different processor architectures are way slower than those processors, as they need much more complex high-level operations to decide what each individual component in the simulated system would do. Simulating systems is a massive performance loss. This can be acceptable if the simulated system itself does not exist yet (you want to design something new), or does not exist any more (you want to see how some C64 software worked but you don't have a C64); otherwise it is a huge waste of resources.
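To make the "more than one operation per operation" point concrete, here is a hedged sketch: computing an AND natively is a single operator, while simulating the same AND as a gate-level "component" costs the host several bookkeeping operations per simulated operation (the gate model is invented for illustration):

```python
# Native AND: one host operation.
result_native = 1 & 1

# Simulating the same AND gate as a "component": the host has to look up the
# gate type, fetch its inputs, evaluate it, and store the output -- several
# host operations for every simulated operation.
def simulate_gate(gate, state):
    kind, in_a, in_b, out = gate
    a, b = state[in_a], state[in_b]
    if kind == "AND":
        state[out] = a & b
    elif kind == "OR":
        state[out] = a | b
    return state[out]

state = {"x": 1, "y": 1, "z": 0}
result_simulated = simulate_gate(("AND", "x", "y", "z"), state)
print(result_native, result_simulated)   # same answer, much more work
```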
 
Last edited:
  • #22
A CPU is essentially a clock that turns on billions of times per second. Having a multi-core CPU just for a clock sounds rather inefficient, and that's a lot of clocks at that. Just saying, I think there should be a better design, maybe simulating these clocks?
 
  • #23
A CPU is not a clock. It has a clock, and every elementary operation in the CPU needs one clock cycle; afterwards, the next operations can be performed in the next clock cycle. Modern CPUs can do many operations in parallel, and having more CPUs allows even more calculations to be performed at the same time. A faster clock speeds up calculations as well, but every component in the CPU has to be fast enough to keep up.

I suggest you look up how a CPU works before you continue this thread. It will help a lot to understand what limits computation speeds.
 
  • #24
mfb said:
I suggest you look up how a CPU works before you continue this thread. It will help a lot to understand what limits computation speeds.
Yes it seems that someone who is asking the questions is pretty bent on giving the answers.
 
  • #25
Precisely, I am referring to how a CPU works here. I don't think you can make the clock go any faster. The thing that bothers me is the number of transistors you need. Since you can't make the clock go faster, you can only increase the amount of data being calculated. How does the number of transistors factor into this? Does anyone know?
 
  • #26
fredreload said:
I don't think you can make the clock go any faster.
CPU clock speeds increased by orders of magnitude in the last decades.
fredreload said:
How does the number of transistors factor into this?
More transistors allow more things to be done in parallel - if the software allows it. If every calculation step depends on the result of the previous step, a computer is very slow. An example: [embedded video]
Modern CPUs spend a large fraction of their transistors on logic to check which steps can be done in parallel.
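A small sketch of why those dependencies matter: in the first loop every step needs the previous result, so nothing can overlap; in the second, the pieces are independent and could in principle be done in parallel (plain Python won't actually parallelize this, it just shows the dependency structure):

```python
# Dependent chain: each step needs the previous result, so the steps cannot
# overlap no matter how many execution units or cores are available.
x = 1
for _ in range(8):
    x = (x * 3 + 1) % 1000   # depends on the previous x

# Independent work: each partial result depends only on its own input, so the
# pieces could be computed at the same time and combined afterwards.
data = list(range(8))
partials = [d * d for d in data]   # no cross-dependencies
total = sum(partials)
print(x, total)
```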
 
  • #27
mfb said:
Note: The last few posts were merged into this thread; they were a separate thread before.

It cannot - you cannot simulate more than one operation with one operation. Actual simulations of different processor architectures are way slower than those processors, as they need much more complex high-level operations to decide what each individual component in the simulated system would do. Simulating systems is a massive performance loss. This can be acceptable if the simulated system itself does not exist yet (you want to design something new), or does not exist any more (you want to see how some C64 software worked but you don't have a C64); otherwise it is a huge waste of resources.
1. Let's see: first we need a clock, then we link multiple CPU designs onto this one clock, then we create CPU emulator software from this. The CPU designs repeat infinitely, like a skyscraper, and each floor receives 4 bits of input, 0000 to 1111. Now assume I have the 16 bits of information 1011 0100 1100 0011; I need 4 repeated floors of this skyscraper in parallel, but is there a way for me to truncate it down to just 1 floor and pass all 16 bits in one go? Like stack them or something.
2. Can't I make a CPU that takes 100 bits of information and outputs 100 bits? This sounds like a silly question.
 
  • #28
fredreload, the type of questions you ask tells me that you need to start with more basics.

A young teenage boy took a look at my bench one day not long ago. I was showing him how I prototyped some stuff and a motion control project with a graphic LCD. He asked how he could do that kind of stuff. I told him he needed to go to school for electronics and start with the basics. He claimed he already knew the basics. His experience with 'the basics' involved hooking up some LEDs in a gifted program at school. On a residential wiring project he was with me one day, I was explaining some basics to him, and he told me he knew the two types of circuits, which were series and parallel. Well, I found that he knew OF those types of circuits but really didn't understand them at all. He recognized the names. So, it is not uncommon to not realize how much you don't know.

fredreload, I suspect you fall into this category. Do you know what a register is? A shift register? Gates? Latches? Counters? If you don't have a really good grasp of those things, and others I have missed, you will fail miserably with microcontrollers. Take it from folks on this forum who have gone through it. You will appreciate and understand microcontrollers much more when you have a good grasp of the basics. It is my opinion that I had a very poor microcontroller program back when I went to school, and it affected my understanding of microcontroller systems for a long time.
 
Last edited:
  • Like
Likes rbelli1
  • #29
fredreload said:
1. Let's see: first we need a clock, then we link multiple CPU designs onto this one clock, then we create CPU emulator software from this. The CPU designs repeat infinitely, like a skyscraper, and each floor receives 4 bits of input, 0000 to 1111. Now assume I have the 16 bits of information 1011 0100 1100 0011; I need 4 repeated floors of this skyscraper in parallel, but is there a way for me to truncate it down to just 1 floor and pass all 16 bits in one go? Like stack them or something.
That does not make sense at all.

Imagine you were to simulate a computer adding two numbers with pen and paper: look up how your computer would do the addition internally, write down the corresponding state of all ~1 billion transistors, then calculate how they change during a clock cycle. Repeat for 10 clock cycles, or however many your computer needs. Total time you need? If you manage to simulate one transistor per second (very optimistic, as you would be going through a library of a thousand books just to check all the previous states for each step), you are done in about 300 years.
How long do you need if you just add the numbers? A few seconds.
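Checking the order of magnitude of that estimate (one transistor update per second, roughly a billion transistors, ten clock cycles):

```python
transistors = 1_000_000_000      # ~1 billion
cycles = 10                      # clock cycles for the addition
seconds = transistors * cycles   # at one transistor update per second
years = seconds / (3600 * 24 * 365)
print(round(years))              # ~317, i.e. roughly 300 years
```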
 
  • #30
Simulation will always be slower than "the real thing" unless you make approximations in some way.

In order to emulate a TARGET computer, the HOST computer must parse every instruction and perform the operations contained in the instruction. Also, it must determine the program flow, that is, emulate the hardware portions of the TARGET computer regarding the program counter, memory access, cache operation and so on.

If you could take each TARGET computer instruction apart and execute it in less than 100 HOST computer instructions, you would be doing well.

Some languages are emulated (they call it interpreted) on the HOST computer, such as Java, Python, Tcl, etc. These are always slower than executing fully compiled and optimized code in the computer's native language.

Parsing and emulating the TARGET instruction set on the HOST is a big task. But don't underestimate the hardware activity that happens every TARGET clock cycle, which must also be emulated by the HOST.
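A hedged sketch of that fetch-decode-execute overhead, using a toy TARGET instruction set invented for the example; the counter is only there to show that the HOST does a pile of bookkeeping for every single TARGET instruction:

```python
# Toy fetch-decode-execute loop for an invented TARGET instruction set.
# Every TARGET instruction costs the HOST a fetch, a decode, an execute,
# and a program-counter update.

def emulate(program, steps):
    regs = {"R0": 0, "R1": 0}
    pc = 0
    host_work = 0                       # rough count of HOST-side operations
    for _ in range(steps):
        op, *args = program[pc]         # fetch
        host_work += 1
        if op == "LOAD":                # decode + execute
            regs[args[0]] = args[1]
        elif op == "ADD":
            regs[args[0]] += regs[args[1]]
        elif op == "JMP":
            pc = args[0]
            host_work += 2
            continue
        host_work += 3
        pc += 1                         # advance program counter
        host_work += 1
    return regs, host_work

program = [("LOAD", "R0", 0), ("LOAD", "R1", 1), ("ADD", "R0", "R1"), ("JMP", 2)]
regs, work = emulate(program, 20)
print(regs, work)   # many HOST operations per emulated TARGET instruction
```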
 
  • #31
I mentioned that you can speed things up by making approximations. But, that is something you can/must do in any program to get the best performance. Adding the overhead of emulating a different machine is just wasteful. It adds nothing. The only reason anyone ever does it is because they want machine code compatibility so they can (for example) execute the game machine ROMs exactly as the old machine did. It is horribly inefficient.

If you want better performance, you need to optimize your program to incorporate the algorithms, approximations and hierarchical reductions that you are willing to trade off for better performance. For example, you can add 100 to a number by writing a loop that adds one 100 times, or you can execute a single ADD instruction with the original number and 100 as operands to the ALU. That is your choice when you write the program.
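In code form, that trade-off looks like this:

```python
# Wasteful: add 1, one hundred times -- a hundred loop iterations.
n = 42
for _ in range(100):
    n += 1

# Direct: let the ALU do it as a single add.
m = 42 + 100

print(n, m)   # both 142, but the second way is one instruction's worth of work
```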
 
  • #32
fredreload said:
Precisely I am referring to how CPU works here, I don't think you can make the clock goes any faster. The thing that bothers me is the amount of transistors you need. Since you can't make the clock goes faster, you can only increase the amount of data being calculated. How does the number of transistors factor in into this? Does anyone know?
The performance per transistor is actually very low in modern CPUs. For example, imagine you recreated a 6502 processor (similar to the one used in the C64) with a modern 22 nm manufacturing process and ran it at 4 GHz. That thing only has about 3,500 transistors, and its performance for most types of software would be about 50 to 100 times lower than that of a quad-core i7 with more than 1 billion transistors. So you put 285,000 times as many transistors in it but only get 100 times the processing power.
In other words, the performance per transistor and clock cycle is about 3,000 times worse.
The main reason for this is the fact that single-thread performance is extremely important. Putting 10 times as many transistors in each core to increase its performance by just a few percent can be more valuable than putting 10 times as many cores in there. Most software isn't even able to use more than one core.
If we managed to find a way to automatically rewrite a piece of software that can only run on one core into one that can make use of hundreds of cores efficiently, we could fundamentally change the way CPUs are designed and use huge numbers of small cores instead of a few big ones. That would then give us a lot more power per transistor.
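Checking the ratios quoted above (the transistor counts and the 50-100x speed factor are the figures from the post, taken at face value):

```python
transistors_6502 = 3_500
transistors_i7 = 1_000_000_000
speedup = 100                              # quoted i7 advantage

transistor_ratio = transistors_i7 / transistors_6502
print(round(transistor_ratio))             # ~285,714 times as many transistors
print(round(transistor_ratio / speedup))   # ~2,857x worse performance per transistor
```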
 
  • Like
Likes Silicon Waffle
  • #33
People use Minecraft as a means through which they can build and use computers.

 
  • #34
I was thinking about it too, but doesn't the CPU power it takes to simulate such a computer outweigh the computer itself? Or is it possible to build one with even better performance?
 
  • #35
It's a total waste to build a faster HOST computer to use multiple cycles to simulate a TARGET doing something. Just do it with the HOST as efficiently as possible.
If the TARGET has a better architecture, then build a faster computer using the TARGET architecture.

A common way to get peak performance from a machine is to hand code in assembly language. In many cases a human can do a better job of optimizing than the compiler (but that isn't always true). A prime example where humans do well is in tightly coded pipelined DSP arithmetic loops. A human can sniff out tricks and shuffle code to cut cycles from the loops that an optimizing compiler could never find.

The next level of optimization is to build dedicated hardware coprocessors to do certain operations more efficiently, like a graphics GPU, or a custom coprocessor to do a square root or an FFT (or even just use multiple CPUs to execute parallel threads).

The only way to efficiently emulate a TARGET computer is to build dedicated hardware (that's a joke, BTW, since that would be the TARGET computer itself)
 
  • #36
meBigGuy said:
It's a total waste to build a faster HOST computer to use multiple cycles to simulate a TARGET doing something. Just do it with the HOST as efficiently as possible.
If the TARGET has a better architecture, then build a faster computer using the TARGET architecture.

A common way to get peak performance from a machine is to hand code in assembly language. In many cases a human can do a better job of optimizing than the compiler (but that isn't always true). A prime example where humans do well is in tightly coded pipelined DSP arithmetic loops. A human can sniff out tricks and shuffle code to cut cycles from the loops that an optimizing compiler could never find.

The next level of optimization is to build dedicated hardware coprocessors to do certain operations more efficiently, like a graphics GPU, or a custom coprocessor to do a square root or an FFT (or even just use multiple CPUs to execute parallel threads).

The only way to efficiently emulate a TARGET computer is to build dedicated hardware (that's a joke, BTW, since that would be the TARGET computer itself)
Well, but let's say I emulate 10, 20, or even 100 of these software CPU emulators, will that dramatically boost the number of calculations I can do in a second? I am not sure of the amount of data I can process; someone has to enlighten me.
 
  • #37
fredreload said:
Well, but let's say I emulate 10, 20, or even 100 of these software CPU emulators, will that dramatically boost the number of calculations I can do in a second? I am not sure of the amount of data I can process; someone has to enlighten me.
What you seem to want to do is break tasks into parts and do the parts at the same time. Yes, this works.

But the tricky part of this is to break problems into parts so one part doesn't depend on the result of other parts. Usually this is handled logically (in software). It is a very difficult problem. One could do it in hardware, but software costs the time of the programmer. Hardware costs $100,000 a chip run plus the cost of the Cadence chair.
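A sketch of the software side of that, splitting a sum into independent chunks with Python's multiprocessing; this only pays off because no chunk needs another chunk's result and each chunk is big enough to outweigh the overhead:

```python
# Split a big sum into independent chunks and compute them in parallel.
# This only helps because no chunk depends on another chunk's result.

from multiprocessing import Pool

def chunk_sum(bounds):
    lo, hi = bounds
    return sum(range(lo, hi))

if __name__ == "__main__":
    n = 10_000_000
    chunks = [(i, min(i + 2_500_000, n)) for i in range(0, n, 2_500_000)]
    with Pool(4) as pool:
        total = sum(pool.map(chunk_sum, chunks))
    print(total == sum(range(n)))   # True: same answer, work done in parallel
```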
 
  • #38
Simulating old 8-bit or 16-bit computers on a much faster PC has been done already. I'm not sure how much slower a virtual PC is on Windows, but I suspect CPU stuff runs at the same speed and I/O emulation is much slower. Mainframes like IBM Z systems can run in multiple modes (trimodal addressing) for backwards-compatibility support at full speed. In addition to current and classic IBM OS support, it also supports a UNIX API. The series includes computers with 80 to 100+ central processors (like having 80 to 100+ cores on a PC).

http://en.wikipedia.org/wiki/IBM_System_z

http://en.wikipedia.org/wiki/Z/OS
 
  • #39
Detailed simulations of the exact working and timing of a computer can take hours to just simulate a second of the actual computer operations. That is what others meant when they said that SPICE is slow. It is possible to have simplified models of computers that run much faster than the actual computers would. The simplified models would be useful to see if a computer system is adequate for some task. It would not really perform the same calculations as the computer being simulated. For instance, I might model an entire computer calculation as a simple delay along a signal path, just to see if the delay would be too great for a system to work right.
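A sketch of that kind of simplified model: the "computer" below is reduced to nothing but a fixed processing delay, which is enough to check whether a signal path meets a deadline (all numbers invented for illustration):

```python
# Simplified system model: each stage of a signal path is reduced to a fixed
# delay, so we can check end-to-end timing without simulating any real work.

stages = {
    "sensor":   0.002,   # seconds, invented figures
    "computer": 0.015,   # the entire computation modeled as one delay
    "actuator": 0.005,
}

deadline = 0.025
total_latency = sum(stages.values())
print(f"latency {total_latency*1000:.1f} ms, "
      f"{'meets' if total_latency <= deadline else 'misses'} {deadline*1000:.0f} ms deadline")
```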
 
  • #40
I don't know what is behind your thinking that emulating something will lead to better performance. You are starting with that wrong assumption and trying to somehow fit it into reality without doing any real thinking. I challenge you to give one detailed example. I think part of the problem is that you really don't understand how computers actually work. What level of programming skills do you have? Try to write code that emulates a 6502 microprocessor. You will see very quickly what others are telling you.

fredreload said:
Well, but let's say I emulate 10, 20, or even 100 of these software CPU emulators,

Let's say I emulate a TARGET that does an ADD with a loop that adds 1 each time. If I emulate 1000 of these loops on my HOST, it doesn't go any faster, since my HOST has only a certain performance. But if I execute the add natively on my HOST, it executes in 1 cycle.

How many times will I have to say this: Your idea of emulating another computer to achieve performance is totally off base. There is no way. Forcing a HOST to behave like a TARGET will always take extra cycles and will be slower.

I already outlined what is done to achieve extra performance.
1. Better algorithms (more efficient code)
2. Co processors
3. Multiple CPU's
4. Enhanced Architecture (faster clock, better cache, better instructions, etc)

The only thing emulation will do is slow you down.
 
  • Like
Likes Klystron, Vanadium 50 and mfb
  • #41
fredreload said:
I was thinking about it too, but doesn't the CPU power it takes to simulate such a computer outweigh the computer itself? Or is it possible to build one with even better performance?

You keep asking that. It's still impossible. (See my post #19)
 
  • #42
Right, I thought about it, and the CPU emulator really does run on the original hardware. If we run the hardware design in parallel, all we need is a single clock to drive the whole thing; I suppose you need current running through it to drive the design, something I never really thought about. I've been interested in how supercomputers work, especially the K supercomputer and brain computation. Feel free to leave any comments about how a supercomputer or exascale computer could be improved, in terms of CPU usage, for more calculations per second.
 
  • #43
fredreload said:
If we run the hardware design in parallel, all we need is a single clock to drive the whole thing,

What you say above makes no sense in the context of what you are asking.

You are talking about designing a different architecture to do things more efficiently. It is called computer science.

Have you ever written a computer program of any sort? Do you have any idea what machine language is? Do you have any concept of the way in which hardware controls the program sequencing? Do you know what an ALU is? Do you understand the structure of cache memories and the algorithms that drive them? Do you know about the difference between von Neumann and Harvard architectures?

There is no way you can even begin to understand enough to ask intelligent questions about computer performance until you start studying computer science and computer architecture.

I don't say this to discourage you from asking questions, but rather to motivate you to read about how a CPU operates and the part it plays in a computer.

Previously posted link: http://www.hardwaresecrets.com/inside-pentium-m-architecture/4/ for example explains the innards of the Pentium CPU. That is probably too advanced, but it serves as an introduction to the richness of CPU architecture as an engineering discipline.

One of the simplest and most common architectures is the Intel 8051. Much has been written about its hardware architecture. Here is a reference manual.
http://web.mit.edu/6.115/www/document/8051.pdf You should read about how it actually operates.

Here is an introduction to what a CPU actually is. You especially need to read through the Operation and Structure sections (several times, BTW):
https://en.wikipedia.org/wiki/Central_processing_unit

Without understanding what a CPU is and what a computer is (and how they are different) you will not be able to understand how performance is improved and how different architectures are suited to different tasks.

It would be like asking if hooking together 100 Volkswagen engines would yield a Formula A race car. If you understood how an engine works, and what comprises a complete high-performance automobile, that is not the question you would be asking.

Again, I'm not discouraging questions, just pointing out you need to do a bit more of your own home study. The fact that you are asking these questions shows an inquisitive nature. Focus for a bit on the fundamentals, and you will never regret it.
 
  • Like
Likes Silicon Waffle, mfb and Mark44
  • #44
If I am running software like NEST here, is it better to use a video card or just use the CPU for computation?
 
  • Like
Likes Silicon Waffle
  • #45
If you have software that uses the graphics card in a meaningful way, it can speed up calculations, see the video in post 26. The speedup you get will depend on the task and the implementation in software.
 
  • Like
Likes Silicon Waffle
  • #46
Thanks, I was looking for that video. Also, here is a note on Nvidia's new supercomputer, which supposedly helps out the Google Brain project.
 