Please be gentle
Russ, wimms,
Many thanks for your replies to my ignorant posts. I see now that introducing telecoms was, on balance, more of a distraction than a benefit.
Back to my original comment ("A good infrastructure should be able to isolate local failures, irrespective of how heavily loaded it is; it's surely not a very challenging technical problem."), and a (hopefully!) wiser re-casting of it.
(This is a zeroth-order take; many devils, a.k.a. details, are licking their lips in anticipation of ambushes on the road ahead.)
Demand varies seasonally (~100 days characteristic time), weekly (~10 days), daily, and hourly. A significant part of this demand is predictable; much detailed historical data is available to characterise variance about (modelled) means within all periods.
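To make the "variance about (modelled) means" idea concrete, here's a toy Python sketch. The data is entirely synthetic and only the daily cycle is modelled; real work would use measured grid demand and fit the seasonal (~100 day) and weekly (~10 day) cycles too.

```python
# Toy sketch: estimate a per-hour-of-day mean demand and the spread of the
# residuals about that modelled mean. Synthetic data, daily cycle only.
import math
import random

random.seed(0)

# Hypothetical month of hourly demand in MW: daily cycle plus noise.
history = [1000 + 300 * math.sin(2 * math.pi * h / 24) + random.gauss(0, 40)
           for h in range(24 * 30)]

# Modelled mean: average demand for each hour of the day.
hourly_mean = [sum(history[h::24]) / len(history[h::24]) for h in range(24)]

# Variance of demand about the modelled mean, pooled over all hours.
residuals = [d - hourly_mean[i % 24] for i, d in enumerate(history)]
variance = sum(r * r for r in residuals) / len(residuals)

print(f"peak modelled hourly mean: {max(hourly_mean):.0f} MW")
print(f"std dev about modelled mean: {math.sqrt(variance):.0f} MW")
```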
Broadly speaking, supply is available to meet all but peak hourly demand. However, there are unplanned supply failures, and the characteristic time for indications of incipient failure ranges from days ("that unit sure has been acting strange!") to milliseconds (or less). Further, a great deal of historical data is available to characterise the root causes, frequency, and 'phenomenology' of all failure modes.
Technology to detect, analyse, and transmit useful information about demand, supply, and failure already exists. As long as the response times are greater than 1 second, 'pre-canned' or algorithm-based automatic response decisions can be implemented. These automatic decisions can, in principle, be optimised according to a wide range of equipment, supply, demand, down-stream impact, ... conditions. These optimisations can be performed either 'off-line' (independent of the particulars of the event) or 'on-line'.
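By 'pre-canned' I mean something no more exotic than a prioritised rule table mapping detected conditions to actions; the sketch below is purely illustrative, with made-up condition names, thresholds, and actions, but the off-line/on-line optimisation would amount to re-tuning exactly such a table.

```python
# Toy sketch of 'pre-canned' automatic responses: a rule table, checked in
# priority order, mapping detected grid conditions to actions. Everything
# here (thresholds, actions) is invented for illustration.
from dataclasses import dataclass

@dataclass
class GridState:
    frequency_hz: float   # system frequency
    line_load_pct: float  # loading on a monitored line
    unit_tripped: bool    # has a generating unit just failed?

RULES = [
    (lambda s: s.unit_tripped and s.frequency_hz < 49.5, "shed lowest-tier load"),
    (lambda s: s.line_load_pct > 95.0,                   "reconfigure / reroute flows"),
    (lambda s: s.frequency_hz < 49.8,                    "dispatch spinning reserve"),
]

def automatic_response(state: GridState) -> str:
    for condition, action in RULES:
        if condition(state):
            return action
    return "no action"

print(automatic_response(GridState(frequency_hz=49.4, line_load_pct=80.0, unit_tripped=True)))
```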
... and that's as far as technology could take us, in a reactive sense.
Proactively, we could fairly accurately characterise future demand, supply, and improvements in failure detection and remediation capabilities. Through risk analyses (crudely, prioritisation by the 'impact' metric: probability of event x cost of event), main areas to be addressed can be confidently identified (and research investment targeted to improving the probability and cost estimates of the top 3 risks, say). Installing, testing, and refining equipment, maintenance schedules, operations procedures, etc then follows, using standard QA methodologies.
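Just to show the 'impact' metric at work, here's a trivial ranking of some hypothetical failure modes; all the probabilities and costs are invented, but this is the shape of the prioritisation I have in mind.

```python
# Toy illustration of the impact metric (probability of event x cost of event)
# used to rank hypothetical failure modes. All numbers are invented.
failure_modes = [
    ("transformer failure",      0.02, 5_000_000),  # (name, annual prob., cost in $)
    ("transmission line outage", 0.10,   800_000),
    ("generator forced outage",  0.30,   200_000),
    ("control-system fault",     0.05,   400_000),
]

ranked = sorted(failure_modes, key=lambda m: m[1] * m[2], reverse=True)

for name, prob, cost in ranked:
    print(f"{name:26s} impact = {prob * cost:>10,.0f} $/yr")
```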
Finally, the key dimension: economics. Crudely, economics is all about how to better match supply and demand, through price. In the case of grid-supplied electricity, IMHO, there is enormous opportunity for basic economic principles to be better applied. For example, as wimms said "When you switch on consumer device, power starts to flow, and grid has no control over it other than cutting off completely". Yet no (residential) consumer has ever been asked what price they would be prepared to pay for 99% (or four/five/six/seven 9s) availability. With today's technology, I would guess, a multi-tiered set of service contracts could be easily implemented - from 'el-cheapo' electricity (but can have supply cut for up to 10 hours with no notice), to guaranteed 99.9999% availability and 10 seconds restoration in the event of failure (for a VERY large fee).
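The rough arithmetic behind those tiers is simple enough; only the downtime figures below follow from the percentages, the tier names and prices would of course be a commercial question.

```python
# How much downtime per year each availability level actually allows.
MINUTES_PER_YEAR = 365.25 * 24 * 60

for availability in (0.99, 0.999, 0.9999, 0.99999, 0.999999):
    downtime_min = (1 - availability) * MINUTES_PER_YEAR
    print(f"{availability:.6f} availability -> {downtime_min:8.2f} minutes/year down")
```

(So 99% availability allows roughly 88 hours of outage a year, while six 9s allows only about half a minute.)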
This is the kind of thing I was referring to when I said "the root cause is bad regulation and wilful ignorance of economics. Behind that there is, without a doubt, the hand of Big Oil [...]
A competitive market should be able to meet demand, unless the regulatory barriers are inefficient."