Data-driven models can describe relationships observed in data. However, so far we have only covered how models can be used to describe past changes in technology performance. Now we want to look into the future, to forecast rates of technology performance improvement in order to help decide how best to invest time and money.
Importantly, the goal of forecasting is to inform decisions while recognizing that we will always be uncertain about the future. Rather than trying to create perfect predictions, the goal is to use forecasts to make decisions that are more successful than those based on random guesses.
In this section, we will discuss two approaches to forecasting: expert elicitations and data-driven forecasts.
> [!info]
> Expert Elicitation involves surveying a group of experts on their expectations for future improvements in technology. The results of this survey are then compiled, and a forecast is developed for a technology’s performance over a given time period (e.g., 1, 5, 10 years). Experts are typically asked to estimate the uncertainty associated with their predictions, and the overall uncertainty in the aggregate forecast is based on this set of responses.
> [!info]
> Data-driven models involve observing relationships in past data as a basis for forecasting future changes. The steps include:
>
>1. Gathering relevant data.
>2. Identifying a model that describes some of the variability in the data.
>3. Developing a forecasting model that incorporates the observed trends and estimates the associated uncertainty.
>
One forecasting approach is to search for a general model that can be applied to many different technologies. In this case the uncertainty in the forecast can be estimated based on the uncertainty associated with the general model and with the particular technology being considered.
# Expert elicitations
Formal expert elicitations generally follow this procedure in generating forecasts:
1. Develop a survey, including questions on the technology or technologies of interest, and background information.
2. Assemble a group of experts to take the survey.
3. Harmonize the results from the survey and develop a forecast.
> [!caution]
> While we draw a distinction here between data-driven forecasting and expert elicitations, expert elicitations may involve data as well, since experts may base their predictions on quantitative or qualitative data. Some experts may rely on their intuition about industry trends, while others may analyze quantitative data. Experts may even develop their own data-driven forecasting models. An important distinction between expert elicitations and data-driven forecasts is that the forecasting models applied by experts are typically not known, while they are specified in the case of a data-driven forecast. The exception would be cases where surveys have been explicitly designed to collect information on experts’ internal models.
> [!important]
> In 2007, Curtright, Morgan, and Keith conducted an expert elicitation to predict prices and efficiencies for 26 photovoltaic (PV) technologies through 2030 and beyond. They identified 58 experts from research centers such as national laboratories, industry, and academia. Of those, they selected 18, sent each an initial mailed survey, and then followed up with an interview. They published their results in 2008.
> Then, 13 years later, Meng et al. (2021) checked back: How did these projections compare with what actually happened? The figure below shows the results. The blue circles show the observed data, which run up to 2019. The colored lines show various data-driven projections going forward from those data. To the right, the black box-and-whisker plot indicates the 5th, 10th, 50th, 90th, and 95th percentiles of the 2007 expert elicitation results for 2030.
>The graph shows that even the most optimistic of the experts didn’t predict just how low PV costs would drop, since the observed costs in 2019 (blue circles) are already below the lowest costs predicted by the experts for 2030.
>
![[Pasted image 20250322105507.png]]
> [!Figure]
> Model-based forecasts of PV electricity cost. Adapted from Meng, J., Way, R., Verdolini, E., & Diaz Anadon, L. (2021). Comparing expert elicitation and model-based probabilistic technology cost forecasts for the energy transition. _Proceedings of the National Academy of Sciences_, _118_(27). [URL](https://doi.org/10.1073/pnas.1917165118)
> [!info]
> **Recommended readings**
>
> Read more about the methods and results of the 2008 expert elicitation by Curtright, Morgan, and Keith in “[Expert assessments of future photovoltaic technologies](https://pubs.acs.org/doi/10.1021/es8014088)”.
>
>
> Learn more about the comparison of expert elicitation and data-driven methods published by Meng et al. in “[Comparing expert elicitation and model-based probabilistic technology cost forecasts for the energy transition](https://www.pnas.org/doi/epdf/10.1073/pnas.1917165118)”.
# Data-driven forecasting
Another approach to forecasting technological innovation is data-driven forecasting. Broadly speaking, data-driven forecasting consists of three steps:
1. Gather data: This step involves collecting and processing data on a technology's past performance changes. The data of interest include measures of performance intensity, such as the cost per unit service, extending back as far as possible. Finding sufficient data is often challenging, as historical records may be limited. Additionally, there is often substantial uncertainty in the data due to sampling error (e.g., not including all manufacturers when tracking an industry-average performance) or measurement error (incorrect records of performance).
2. Select a model: This step involves identifying a model that describes the data and has a reasonable predictive power. Moore's, Wright's, and Goddard's Laws are examples of data-driven models that might be selected after evaluating how well they describe the data. ==An important approach to testing their predictive power is to perform hindcasting, where a portion of the data is used to train the model and the model predictions are then compared to the real values in another portion of the data.== Multiple different models may be fitted to the data sets describing each technology's performance over time, to see which one performs best for a single technology and across a set of technologies.
3. Develop a forecast: In this step, the model selected in Step 2 and the results of testing the model’s performance on a set of technologies are used as a basis for developing a forecasting model. The forecast should capture the expected trend and the model- and data-set-specific uncertainty. Some technologies may show more steady improvement while others may improve more erratically, and this should be captured in the forecast.
## Gathering data
Developing a data-driven forecasting model requires data. The process of data collection is therefore an important step in developing a forecast. The International Energy Agency, for example, which publishes extensive international data on technology performance, identifies four main sources (IEA, 2014):
1. **Administrative sources:** Data collected by government agencies, industry associations, and other organizations. These data may come from surveys, measurements, and models.
2. **Surveys:** Data collected via questionnaires from a target population, e.g., vehicle manufacturers, and vehicle owners.
3. **Measurements:** Data collected by installing meters for direct measurement.
4. **Models:** Data produced from a model based on inputs and assumptions. Inputs may include the units of a technology sold and the revenue, while the assumptions may be about the efficiency of a technology or other operational features, or about market-related characteristics such as profit margins.
Once data are gathered they must also be cleaned and harmonized before they can be analyzed. This ensures that any data that do not reflect the correct quantity (for example if there was human error in data entry) are excluded, and it puts all data in comparable form (for example, the units need to be the same, the currency needs to be the same, inflation needs to be adjusted for, etc.).
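As a minimal sketch of what this cleaning and harmonization step can look like in practice, the Python snippet below works through made-up cost records with mixed units and an assumed price-index series; the column names, numbers, and deflator values are purely illustrative, not real data.

```python
import pandas as pd

# Hypothetical cost records from two sources; the column names, units,
# and deflator values below are illustrative, not real data.
records = pd.DataFrame({
    "year":   [2000, 2005, 2010, 2000, 2005, 2010],
    "cost":   [5.0, 3.2, 1.9, 4800.0, 3100.0, 1850.0],
    "unit":   ["USD/W", "USD/W", "USD/W", "USD/kW", "USD/kW", "USD/kW"],
    "source": ["survey", "survey", "survey", "admin", "admin", "admin"],
})

# 1. Harmonize units: convert everything to USD per watt.
records.loc[records["unit"] == "USD/kW", "cost"] /= 1000.0
records["unit"] = "USD/W"

# 2. Adjust for inflation to a common base year using an assumed
#    price-index series (a real analysis would use a published deflator).
deflator = {2000: 0.72, 2005: 0.81, 2010: 0.92}  # relative to base-year dollars
records["cost_real"] = records["cost"] / records["year"].map(deflator)

# 3. Drop implausible entries (e.g., non-positive costs) and average
#    across sources to get one observation per year.
clean = (records[records["cost_real"] > 0]
         .groupby("year")["cost_real"]
         .mean())
print(clean)
```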
## Selecting a model
As the phrase goes, all models are wrong but some are useful. Models are simplifications of reality intended to describe some real-world phenomenon in a way that is revealing and useful. They never reflect the full complexity of the real world, both because it would be impractically difficult to do so and because the result would be so messy and complicated that the model wouldn’t help us understand things very well. So whichever model you use, it won’t fit the data perfectly. That means we need a principled way of choosing which model best describes our data and is most likely to generate useful predictions.
The key to doing this is to quantify how well each model approximates the variability in a data set (called the "goodness of fit") and how good their predictions are. We can then test the performance of each model and select the best one for our purposes. These approaches are discussed in more detail below.
### R-Squared
Consider the plotted data set in the figure below:
![[Pasted image 20250322110456.png]]
Suppose we fit a linear model to these data using the least squares approach: we choose the slope and intercept that minimize the sum of the squared differences between the model’s predictions and the observed values. Having set the parameters of our model, we can then calculate R-squared to evaluate just how good the model is. R-squared compares the model to the simplest possible one: a constant function equal to the mean of the y values. The mean of the y values provides a useful baseline of comparison because it offers no information about the relationship between the y values and the x values in the data set. In other words, the mean y value remains the same no matter what the x value is. The line in the figure below displays the average of the data set’s y values.
![[Pasted image 20250322110604.png]]
The figure below shows the squares of the differences between the actual values and the mean. Comparing these squares with how closely the fitted line tracks the data, it is clear at a glance that the linear model gives better predictions, with smaller resulting squares. So the model fits the data better than the mean (at least by the definition of “goodness” that R-squared provides).
![[Pasted image 20250322110712.png]]
To quantify how much better in a single number, we calculate R-squared. R-squared involves the following values:
- Total sum of squares (SStot): The sum of the squared differences between the actual values and the mean, represented in the figure below by the orange boxes. In our example, SStot is 948.
![[Pasted image 20250322110937.png]]
- Residual sum of squares (SSres): The sum of the squared differences between the actual values and the values predicted by the model, represented in the figure below by the blue boxes. In our example, SSres is 15.
![[Pasted image 20250322111103.png]]
R-squared is defined this way:
$\large R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$
Since we wouldn’t ever consider a model that’s worse than the mean, SSres will always be smaller than SStot; thus the ratio of SSres to SStot will be less than one, and R-squared will always be between 0 and 1.
The smaller SSres is, the better the fit, and the smaller the ratio of SSres to SStot will be. That means that the closer R-squared is to one, the better the model fits the data.
In our example:
$\large R^2 = 1 - (15/948) = 0.98$
This indicates that this model is a good fit.
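The same calculation is easy to reproduce in code. The sketch below fits a two-parameter line to a small made-up data set (not the data behind the figures above) and computes SStot, SSres, and R-squared exactly as defined here.

```python
import numpy as np

# Illustrative data (not the values behind the figures above).
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([3.1, 5.8, 9.2, 11.9, 15.2, 17.8, 21.1, 23.9])

# Fit a two-parameter linear model y = a*x + b by least squares.
a, b = np.polyfit(x, y, deg=1)
y_pred = a * x + b

# Total sum of squares: squared differences from the mean (the baseline model).
ss_tot = np.sum((y - y.mean()) ** 2)
# Residual sum of squares: squared differences from the fitted model.
ss_res = np.sum((y - y_pred) ** 2)

r_squared = 1 - ss_res / ss_tot
print(f"SS_tot = {ss_tot:.1f}, SS_res = {ss_res:.2f}, R^2 = {r_squared:.3f}")
```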
## Moving beyond a linear model, using R-Squared
The example we walked through above involved a linear model, but a similar process can be followed for fitting power laws (the functional forms in Wright’s and Goddard’s Laws), and exponential curves (the functional form in Moore’s Law) to data.
In those cases we would take the log of cumulative or annual production and the log of cost (or the performance metric of interest). This is because taking the log of the x- and y-axis values in the case of a power law, and taking the log of the y-axis values in the case of an exponential curve, converts the models to linear models. This effect can be observed by comparing the figure below (left), in which the data are presented on a linear scale, with the figure on the right, in which they are presented on a log scale.
![[Pasted image 20250322111633.png]]
> [!Figure]
> (left) Data showing cost over cumulative production on a linear scale. (right) Data showing cost over cumulative production on a log scale, with a linear model that has been fitted to it.
Having performed this log transformation, we can then proceed with linear regression applying the least squares approach described above.
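As an illustration of this workflow, the sketch below fits Wright’s Law to a synthetic cost-versus-cumulative-production series by regressing log cost on log production; the data values, and therefore the fitted exponent and learning rate, are illustrative only.

```python
import numpy as np

# Synthetic cost vs. cumulative production data following a rough power law
# with noise; the values are illustrative only.
cumulative_production = np.array([10, 30, 100, 300, 1_000, 3_000, 10_000], dtype=float)
cost = np.array([95.0, 62.0, 41.0, 27.0, 18.5, 12.0, 8.1])

# Wright's law: cost = c * production^(-w).  Taking logs gives a linear model:
# log(cost) = log(c) - w * log(production).
log_x = np.log(cumulative_production)
log_y = np.log(cost)
slope, intercept = np.polyfit(log_x, log_y, deg=1)

w = -slope                      # learning exponent
learning_rate = 1 - 2 ** slope  # fractional cost drop per doubling of production
print(f"exponent w = {w:.3f}, learning rate = {learning_rate:.1%}")
```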
## Measuring prediction error out of sample
### Out-of-sample testing
Our aim in building a model isn't just to fit a set of data well—our hope is that it will predict future data points. One way to test that is to set the parameters (i.e., "train the model") using a subset of all the available data. Then you can check it against additional data and see how well it predicts those results. This is called "out-of-sample testing". This process enables us to observe the difference between the predicted and the observed data, known as the model’s prediction error. If the model is able to predict the test data well, yielding a small prediction error, we have more confidence in the model’s predicted values of data that have yet to be collected.
![[Pasted image 20250322111939.png]]
The figure above gives an example of this, with the training data in blue and the test data in orange.
The model is trained on the training subset, excluding the test subset, as displayed below:
![[Pasted image 20250322112019.png]]
The slope and intercept of the model’s line are determined by minimizing the sum of the squared differences for the training data only. We can then apply the model to all the data, including the test data, as shown in the figure below:
![[Pasted image 20250322112059.png]]
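A minimal sketch of out-of-sample testing on a synthetic (log-transformed) cost series might look like the following; the series, the split point, and the error metric (root-mean-square error) are all choices made here for illustration.

```python
import numpy as np

# Illustrative time series of a (log-transformed) performance metric.
years = np.arange(2000, 2020, dtype=float)
log_cost = 4.5 - 0.12 * (years - 2000) + np.random.default_rng(0).normal(0, 0.05, years.size)

# Hold out the last five observations as the test subset.
train_x, test_x = years[:-5], years[-5:]
train_y, test_y = log_cost[:-5], log_cost[-5:]

# Fit the model on the training subset only.
slope, intercept = np.polyfit(train_x, train_y, deg=1)

# Apply the trained model to the test subset and measure prediction error.
test_pred = slope * test_x + intercept
rmse = np.sqrt(np.mean((test_y - test_pred) ** 2))
print(f"out-of-sample RMSE (log units): {rmse:.3f}")
```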
## Hindcasting
When we’re dealing with time-series data, out-of-sample testing gives us a way to essentially time travel. We imagine we’re at an earlier moment, train our model on the data we have up to that point, and then we test the model to see how it performs going forward. This is called “hindcasting”.
We can even travel back in time again and again, performing multiple iterations of hindcasting on a single data set. We start with a training subset consisting only of data from the earliest period in the data set, projecting far out into time, and conclude with a training subset that includes nearly the entire period for which data have been collected, projecting over a relatively short period. The figure below displays a three-part iteration process for our example data set. As one might expect, the more data is included in the training subset, the more closely the model predicts the values of the test subset. If the model changes enormously as it’s trained on more data, we might have less confidence in its predictions, while if the model stays fairly stable, we might regard its predictions as more likely to hold. More generally, the process provides information about how accurate the model can be expected to be over a certain time horizon given a certain amount of data.
![[Pasted image 20250322112239.png]]
> [!Figure]
> Three iterations of hindcasting on a data set
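Building on the previous sketch, repeated hindcasting can be implemented by sweeping the “origin year” and retraining on an expanding window, as below; again the data are synthetic and the origin years are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(2000, 2020, dtype=float)
log_cost = 4.5 - 0.12 * (years - 2000) + rng.normal(0, 0.05, years.size)

# Repeat the train/test split at several "origin years", each time training
# only on data up to the origin and forecasting the remaining years.
for origin in (2005, 2010, 2015):
    train = years <= origin
    slope, intercept = np.polyfit(years[train], log_cost[train], deg=1)
    pred = slope * years[~train] + intercept
    rmse = np.sqrt(np.mean((log_cost[~train] - pred) ** 2))
    print(f"origin {int(origin)}: trained on {train.sum():2d} points, "
          f"forecast RMSE = {rmse:.3f}")
```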
## Quantifying uncertainty
Quantifying uncertainty is a key step in developing a robust forecast. For a forecast to be useful, we need to know how much confidence to put in it. A forecast for the near future based on a trend that has held up over a long period of time with a lot of data is likely to be more accurate than one for the distant future based on a small amount of data for a short time. Quantifying that degree of uncertainty will help guide better decision-making.
There are a number of ways of visualizing uncertainty. The figure below, for example, illustrates the prediction errors for Moore’s, Wright’s, and Goddard’s Laws over various time horizons on a single data set. The x-axis gives the origin year—in other words, the last year for which we’re including training data, which we can imagine as the “now” from which we’re forecasting. The y-axis gives the target year —the time for which we’re making our prediction. And the z-axis gives the level of uncertainty. The graph shows a “mountain of error,” with its precise shape and size describing the level of uncertainty.
Of course, we’ve all observed that the further into the future a forecast goes, the greater the uncertainty—a weather forecast that says it will rain in ten minutes is more certain than one that says it will rain in two weeks. Notice that this is reflected in these graphs, which compare the uncertainty from Moore’s Law, Wright’s Law, and Goddard’s Law in forecasting a particular dataset: The lower we are on the x-axis and the higher we are on the y-axis, the higher the z-axis will be.
![[Pasted image 20250322112432.png]]
> [!Figure]
> Mountains of error. Adapted from Nagy, B., Farmer, J. D., Bui, Q. M., & Trancik, J. E (2013). Statistical basis for predicting technological progress. _PLoS ONE_, _8_(2). [URL](https://doi.org/10.1371/journal.pone.0052669)
The analysis of prediction errors provides information on the uncertainty associated with a given model and a given dataset. We can use this information in a variety of ways. For example, it may be observed with a particular model and data set that the prediction error tends to grow in some regular way with the forecasting horizon. Provided that the data set for the analysis is sufficiently large and dependable, we can use this information to inform the development of a forecasting model.
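One way to turn this idea into numbers is to tabulate prediction errors by forecast horizon across many origin years, as in the sketch below; the resulting table of mean errors (computed here on synthetic data) is the kind of information that could inform the uncertainty ranges attached to a forecast at each horizon.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)
years = np.arange(1990, 2020, dtype=float)
log_cost = 5.0 - 0.10 * (years - 1990) + rng.normal(0, 0.08, years.size)

# Collect absolute prediction errors by forecast horizon, sweeping the origin year.
errors_by_horizon = defaultdict(list)
for i in range(5, years.size - 1):          # need at least a few points to fit
    slope, intercept = np.polyfit(years[:i + 1], log_cost[:i + 1], deg=1)
    for j in range(i + 1, years.size):
        horizon = int(years[j] - years[i])
        err = abs(log_cost[j] - (slope * years[j] + intercept))
        errors_by_horizon[horizon].append(err)

# Average error typically grows with horizon; this table could be used to
# attach uncertainty ranges to a forecast at each horizon.
for h in sorted(errors_by_horizon)[:10]:
    print(f"horizon {h:2d} yr: mean abs error = {np.mean(errors_by_horizon[h]):.3f}")
```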
## Avoiding overfitting
The [[Data-Driven Models|models]] introduced before (Moore’s, Wright’s, and Goddard’s Laws) are all quite simple in terms of the number of parameters that need to be estimated through the fitting process described above—each has just two parameters. Having only two parameters to play with limits how tightly the model can fit the data. It can be very tempting to choose a more complex model with more parameters so that you get a model that fits the data beautifully. For forecasting, though, it’s a dangerous temptation that must be resisted. The problem is that the more parameters you include, the more choices you have about which function to use and, hence, what its predictions will be. And the more choices you have, the more the results will be determined by that choice rather than by the data itself. Essentially, you’ll end up identifying patterns in the training data that are spurious and won’t hold up as new data comes in.
The simplicity of the linear model makes it more likely to hold predictive power, even though it fits the existing data less well.
Models incorporating more parameters than the data can justify are said to be overfitted. Because overfitted models are unlikely to have predictive power, maximal parsimony is a desirable trait in model selection.
> [!cite]
> _With four parameters I can fit an elephant, and with five I can make him wiggle his trunk._
>
> John von Neumann
John von Neumann, one of the greatest mathematical minds of the 20th century, reportedly said that with just four parameters he could find patterns in a dataset conjuring something as outlandish as an elephant (where in reality there was no elephant). He went on to say that, with a fifth parameter, he could even make the elephant wiggle its trunk. What he meant was that ==one should beware of overly complex models, meaning models that have too many parameters, because they can find structure in a dataset where there is none.==
This issue can be hidden when the goodness of fit is examined using only the R-squared value, which may simply indicate that the overfitted model matches the existing data quite closely. Hindcasting is a much more powerful tool for detecting overfitting. As more recent data is added, an overfit model will produce wildly varying errors, with no identifiable trend over time. An overfit model has low predictive power and is therefore not useful for our purposes of forecasting technology performance.
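The contrast is easy to demonstrate. In the sketch below, a two-parameter line and a degree-8 polynomial (nine parameters) are both fitted to the first part of a synthetic series; the polynomial will typically show a higher in-sample R-squared but a far larger out-of-sample error when extrapolated. The data, the rescaled time axis, and the split point are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(20) / 10.0                       # rescaled time axis (e.g., 2000-2019)
log_cost = 4.0 - 1.0 * t + rng.normal(0, 0.1, t.size)

train, test = slice(0, 14), slice(14, None)    # hold out the last six points

for degree in (1, 8):   # two-parameter line vs. nine-parameter polynomial
    coeffs = np.polyfit(t[train], log_cost[train], deg=degree)
    fit_train = np.polyval(coeffs, t[train])
    fit_test = np.polyval(coeffs, t[test])

    # In-sample goodness of fit (R-squared) and out-of-sample error.
    ss_res = np.sum((log_cost[train] - fit_train) ** 2)
    ss_tot = np.sum((log_cost[train] - log_cost[train].mean()) ** 2)
    r2_in = 1 - ss_res / ss_tot
    rmse_out = np.sqrt(np.mean((log_cost[test] - fit_test) ** 2))
    print(f"degree {degree}: in-sample R^2 = {r2_in:.3f}, "
          f"out-of-sample RMSE = {rmse_out:.3f}")
```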
>[!tip]
>The key is to make our models simple but without losing meaning related to the question we are attempting to answer.
In the context of the data-driven models we are discussing here, the amount of data we have constrains how complex we can make the model. ==The more data we have, the more parameters we can include in our models and still reliably estimate their values. The less data we have, the fewer parameters we should include in our models.==
Wright’s, Moore’s, and Goddard’s Laws each try to find the right balance between model complexity, the available data, and the question of interest. But this is a work in progress. As more data becomes available on technology change trajectories, other data-driven models may well emerge that perform as well or better than the models we have introduced so far.
# Conclusion
Anticipating future trends in technology performance using principled forecasts allows decision-makers to strategize more effectively to achieve their desired outcomes. Forecasts can help them set achievable technology goals, identify promising technologies, create strategies based on likely technological developments, and much more.
While it is clear from these examples that forecasting has often been used implicitly or explicitly, the data-driven technology forecasting we have covered in this section can be applied far more broadly. The potential for further applications of technology forecasting will be explored as we proceed through the rest of the course.