# The Machine that Designs the Machine
> [!warning]
> This section is under #development
All in all, engineering still requires humans to make decisions in a coordinated manner. If humans were removed from the decision-making loop and replaced by some sort of algorithm, tech organizations could go "robotic" and be controlled by computers. This would, of course, need to include the design process, meaning a product would be programmatically designed by said algorithm from a set of rules, specifications, or measured market fit. Such an algorithm could, for example, analyze sales, competitors, and market share, and decide when to make changes in order to meet preset revenue targets, for instance by launching a new product.
No matter how advanced computers are today, running a company and designing complex systems remain very human-centered activities. Computers help, and they help a lot, but they do so by taking on, quickly and efficiently, the repetitive and compute-intensive tasks we humans do not want to do ourselves. However, computers still do not design. Capturing and codifying knowledge in ways computers can understand could pave the way for more AI-driven decisions during the design process, and at the organizational level, in the future.
All we know is that, at least currently, all AI runs on top of digital systems full of CPU cores, GPUs, a plethora of PCBs, cables, racks, cabinets, routers, hubs, and whatnot. All of this runs on top of research done by humans over the last 300 years. If an AI were to take over, a few questions remain:
- Will the machine pursue profit? Or are products purely a human thing? Is the transactional nature of the global economic system compatible with machines?
- Will the machine design things of high quality? Is quality only a human thing?
- Will the machine design intuitive objects? Will the machine design objects for humans, or for other machines?
- Will the machine design simple things?
- Will the machine that designs the machine create secure machines? Will machines spy on each other?
- Will machines have to agree on standards for them to interoperate?
- Will machines do their own research on new materials and methods to overcome the physical limitations current state-of-the-art systems face in terms of data speeds, energy generation, and consumption? Will they refute existing theories?
- Will the machine have a sense of sustainability? Will the machine care about the environment?
- What will it take for the machine that designs the machine to replicate itself?
Moreover, a machine that intelligently designs another machine will need to deal with the fact that supply chains are another human-made invention, and very analog in nature. For instance, an algorithm creating a digital system will need to produce its own [[Semiconductors|semiconductors]], design and produce [[Printed Circuit Boards|PCBs]], and deal with manufacturers' tolerances and [[Semiconductors#Process Design Kits (PDKs)|PDKs]], [[Dependability, Reliability, and Availability|reliability]], [[Software Bugs, Glitches, and The Big Lie Behind Unit Testing|bugs]], and all the pains we suffer when we create systems. This also means machines will need to either pay for the software licenses used to design machines, or create and code their own tools.
## Are We Drowning in Data?
As we speak, petabytes of data are stored only to be ignored forever. It may be time-series telemetry from a machine, sales data from an e-commerce platform, or video footage from a camera pointing at a bucolic street. Neglecting data is a growing problem: we generate more and more data, but we humans are still instrumental in performing all sorts of data [cleansing](https://en.wikipedia.org/wiki/Data_cleansing), [wrangling](https://en.wikipedia.org/wiki/Data_wrangling), tidying^[https://www.jstatsoft.org/article/view/v059i10], and [feature engineering](https://en.wikipedia.org/wiki/Feature_engineering) for algorithms to perform better. No matter how sophisticated or complicated our machine learning algorithm might be, it still requires a human brain equipped with good domain knowledge to make the algorithm aware of the nuances it cannot parse by [itself](https://towardsdatascience.com/stacking-machine-learning-models-for-multivariate-time-series-28a082f881). In the words of Robert Monarch: no algorithm survives bad data^[Robert Monarch is the author of “Human In the Loop Machine Learning”: https://www.manning.com/books/human-in-the-loop-machine-learning].
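As a concrete, if contrived, illustration, here is a minimal sketch in Python with pandas. The telemetry table, column names, and thresholds below are made up for the example; the point is the kind of cleansing, wrangling, and feature engineering a human still has to spell out before any model sees the data:

```python
import numpy as np
import pandas as pd

# Hypothetical telemetry with the defects humans routinely clean up by hand:
# duplicated rows, missing samples, and a physically implausible reading.
raw = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 00:10", "2024-01-01 00:10",
        "2024-01-01 00:20", "2024-01-01 00:40",
    ]),
    "temp_c": [21.4, 21.7, 21.7, 22.1, 80.0],  # 80.0 is a sensor glitch, not a reading
})

clean = raw.drop_duplicates(subset="timestamp").set_index("timestamp")
clean.loc[clean["temp_c"] > 60, "temp_c"] = np.nan    # domain knowledge: >60 °C is implausible here
clean = clean.resample("10min").mean().interpolate()  # tidy to one row per interval, fill benign gaps

# Feature engineering: a rolling mean the downstream model will not invent on its own.
clean["temp_c_rolling_mean"] = clean["temp_c"].rolling(window=3, min_periods=1).mean()
print(clean)
```

Every step above encodes a judgment call: which duplicates are safe to drop, which values are physically implausible, which derived features matter. None of these decisions can be made by the model itself.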
However, the ranks of data wranglers and feature-engineering experts are not growing at the same rate as the data itself. Therefore, just like an overflowing sink, raw data is filling hard disk drives everywhere. Ironically, such data “waste” is actively backed up, that is, ignored not once but several times, just in case. You never know.
This data _surplus_ appears to be a ‘good problem’ to have: it is better to have more data than less data. But is it? Data seems to exhibit what economics calls _diminishing marginal utility_. The effect is illustrated by the famous diamond-water paradox, where an element essential for life, like water, is comparatively cheaper than an object with far less practical use, like diamonds. One way to look at this paradox is to apply the simple principles of supply and demand. The availability of water at no marginal cost^[In economics, the marginal cost is the change in total cost that arises when the quantity produced is incremented; i.e., the cost of producing one additional unit.]—although many would argue that this is not entirely true—relative to demand means that the equilibrium price^[The equilibrium price is where the supply of goods matches demand.] will be low or negligible for water. Diamonds, on the other hand, are in high demand and are expensive to find and produce, so the supply is limited and the intersection of the supply and demand curves occurs at a high price. Hence water is cheap and diamonds are dear.
Data follows a similar pattern: the more data there is, the lower the value of each additional "unit" of data generated. Yet during a data drought—typically during failures or malfunctions—every single bit of information becomes a matter of life or death.
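To put that intuition into symbols, consider a deliberately naive model, an assumption for illustration rather than an empirical law, in which the value $V$ extracted from a data set grows logarithmically with the number of records $n$:

$$
V(n) = k \ln n \qquad \Rightarrow \qquad \frac{dV}{dn} = \frac{k}{n}
$$

The marginal value $k/n$ of one extra record shrinks as $n$ grows: the millionth sample is worth a tiny fraction of the hundredth, while during a drought (small $n$) every record is precious. It is the diamond-water paradox restated for bytes.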
You would think data is *always* valuable, as everyone makes it appear. But data can exhibit the same effects intangible assets do, such as sunkenness^[For more details about intangible assets, see _Capitalism without Capital: The Rise of the Intangible Economy_ by Jonathan Haskel and Stian Westlake]. Certain types of data are difficult to liquidate or sell to others, because such data is only of value, if of any value at all, to its creator: think of the temperature logs of an IoT-equipped fridge stored in the cloud. For free, any data scientist would accept practically any data set as a gift. For a price, that’s a different story.
Also, [research](https://www.di.ens.fr/users/longo/files/BigData-Calude-LongoAug21.pdf) has shown that as data sets grow larger, they necessarily contain arbitrary correlations. These correlations appear simply because of the size of the data, which means many of them will be [spurious](https://www.tylervigen.com/spurious-correlations). Too much information tends to behave like very little information.
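The effect is easy to reproduce with a quick simulation (a sketch with arbitrary sizes, not a reproduction of the cited paper's argument): fill a table with purely independent noise and watch the strongest pairwise correlation climb as the table gets wider.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows = 100  # number of observations; arbitrary choice for the sketch

# Every column is independent noise, so any correlation we find is spurious by construction.
for n_cols in (10, 100, 1000):
    data = rng.standard_normal((n_rows, n_cols))
    corr = np.corrcoef(data, rowvar=False)   # pairwise correlations between columns
    np.fill_diagonal(corr, 0.0)              # drop each column's trivial correlation with itself
    print(f"{n_cols:>5} columns -> strongest spurious |correlation|: {np.abs(corr).max():.2f}")
```

With only 100 rows, pure noise typically yields maximum pairwise correlations that drift toward 0.5 as the width reaches 1,000 columns: patterns that look meaningful but carry no information at all.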
And, last but not least, data gets old. For instance, Earth observation data loses value as time passes and as the scenery under observation changes. Try using Google Street View when the available imagery is 10 years old or more and the buildings and features of the landscape have all changed.
In times when Machine Learning and [[Artificial Intelligence|Artificial Intelligence]] are the buzzwords everyone wants to spout in their marketing materials, it is good to think about the practical limits of data hoarding and the effort—human effort—required to improve data quality so that algorithms can perform better.