Yes, compilers have become better at optimising code, but everything after that stage has stayed largely the same. The linking phase of C/C++ programs is still largely a brute-force operation, pulling in everything the program might need, very often including code that will never be executed. This leads to enormously bloated programs that have to be:
- stored in non-volatile storage, and
- loaded into the RAM of the system that executes them.
A simple “Hello World” can occupy a few megabytes and link in tens of thousands of functions. While this matters less on desktop-class systems, which have ample cheap (D)RAM and reasonably sized caches, the same is not true for embedded systems, which represent the ever-growing bulk of computer-driven systems on the planet.
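To make this concrete, here is the canonical example. The build and measurement commands in the comments are a sketch, assuming a GNU toolchain on Linux; the exact figures vary with the libc and compiler version.

```c
/* hello.c -- five lines of source, yet a statically linked build
 * drags in a large slice of the C library.
 *
 * A sketch, assuming gcc and binutils on Linux:
 *   gcc -static -o hello hello.c
 *   size hello          # text/data/bss section sizes
 *   nm hello | wc -l    # count of linked-in symbols
 */
#include <stdio.h>

int main(void)
{
    printf("Hello, world\n");
    return 0;
}
```

On a typical glibc system the statically linked binary comes out in the high hundreds of kilobytes to a few megabytes, with thousands of symbols, even though only a handful of them can ever execute.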
Needing a lot of (D)RAM costs not only money but also energy, because (D)RAM must be refreshed continuously and an access can incur hundreds of wait states compared with today's GHz CPUs. This is part of the power wall we are currently hitting. And to keep following Moore's law, the only way forward is to put more processing cores on the same die, even though that does nothing to increase the access speed to the external (D)RAM: in the end, chips are pin-bound. Performance on such chips is cache-bound, and therefore code size still matters.
However, this is only a sideline of the real problem developers hit today when trying to exploit this parallelism. First of all, the threading approach used today is difficult to get right: the state space easily explodes beyond what a single developer can keep in his head, and traditional testing cannot cope with it. The situation is worsened by the fact that most thread-synchronisation mechanisms are themselves hard to use correctly. However, there is good news.
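As a minimal illustration of why this is hard, consider two threads incrementing a shared counter. The sketch below assumes POSIX threads; the lost updates it exhibits are exactly the kind of bug testing rarely catches, because the interleaving that triggers them differs from run to run.

```c
/* race.c -- a classic data race: counter++ is a read-modify-write,
 * so two threads can both read the same old value and one update
 * is lost. Compile with: gcc -pthread race.c
 * The final count is usually less than 2000000 and varies per run.
 */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;            /* shared, unprotected state */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++)
        counter++;                  /* not atomic: load, add, store */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("expected 2000000, got %ld\n", counter);
    return 0;
}
```

Protecting this one counter with a mutex is easy, but in a real system with many locks and many threads the number of possible interleavings grows combinatorially, which is exactly the state-space explosion described above.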
The problem was already solved more than 30 years ago, by C. A. R.