I'm not sure this is the right forum for this. Anyway, I just finished a course in computer organization and design, and there's one thread that's left hanging.

If a pipelined design only increases throughput, and a single-cycle datapath has lower latency per instruction because it doesn't need the intermediate (pipeline) registers, why not just have several single-cycle datapaths working in parallel instead of one pipelined multistage design? Five parallel single-cycle datapaths should have higher throughput than a five-stage pipelined datapath, and better latency too. Is it just because of chip space?

Also, one disadvantage given for the single-cycle datapath was that every instruction has to take the same amount of time, because the clock cycle has to be long enough for the slowest instruction. But couldn't you run a much faster clock and simply ignore most of its edges, so that an edge is always available whenever the next instruction is ready to start?
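To make the first question concrete, here is the back-of-the-envelope comparison I have in mind, as a rough sketch. The stage delays and the pipeline-register overhead are made-up illustrative numbers, and it assumes every instruction is independent of the others (no hazards or stalls), which real programs won't satisfy.

```python
# Rough comparison: N parallel single-cycle datapaths vs. one pipelined datapath.
# All numbers are hypothetical and chosen only for illustration.

STAGE_DELAYS_PS = [200, 100, 200, 200, 100]  # assumed IF, ID, EX, MEM, WB delays
PIPELINE_REG_OVERHEAD_PS = 20                # assumed cost of a pipeline register
N_PARALLEL = 5                               # number of single-cycle copies


def single_cycle_period(stages):
    """Clock period of a single-cycle datapath: all stages in one cycle."""
    return sum(stages)


def pipelined_period(stages, overhead):
    """Clock period of a pipeline: slowest stage plus register overhead."""
    return max(stages) + overhead


def parallel_total_time(n, stages, copies):
    """Time for n independent instructions on `copies` single-cycle datapaths."""
    batches = -(-n // copies)  # ceiling division
    return batches * single_cycle_period(stages)


def pipelined_total_time(n, stages, overhead):
    """Time for n independent instructions on an ideal pipeline (no stalls)."""
    cycles = len(stages) + n - 1  # fill the pipeline, then one result per cycle
    return cycles * pipelined_period(stages, overhead)


if __name__ == "__main__":
    n = 1000
    print("single-cycle latency (ps):", single_cycle_period(STAGE_DELAYS_PS))
    print("pipelined latency (ps):   ",
          len(STAGE_DELAYS_PS) * pipelined_period(STAGE_DELAYS_PS,
                                                  PIPELINE_REG_OVERHEAD_PS))
    print("parallel total (ps): ",
          parallel_total_time(n, STAGE_DELAYS_PS, N_PARALLEL))
    print("pipelined total (ps):",
          pipelined_total_time(n, STAGE_DELAYS_PS, PIPELINE_REG_OVERHEAD_PS))
```

Under these idealized assumptions the parallel version comes out ahead on both latency and total time, which is why I'm asking what the model leaves out (chip area, or something else).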