As your data acquisition system comes online, begin data collection so you can evaluate and train the model(s) you will use.
Main Points
- Even before you begin building your forecasting system, you can collect historical data and make sure it is in the format the system will require.
- The more data you have, the more robust the statistics available for training your forecast model.
- If your forecast system requires new data collection, you can start testing that system as it comes online.
First Read This
Even before your forecasting system is built, you can begin the process of collecting the necessary data and placing it in the data repository for later use. As a reminder, the types of data needed are listed below. Specific variables are listed in the previous section.
WEATHER FORECAST is the primary input. For wind energy, forecasts of the following weather parameters are of interest: wind speed, wind direction, and air density—all preferably at the generator hub height. For solar energy, forecasts of the following weather parameters are of interest: irradiation, temperature, cloud cover, and precipitation.
HISTORICAL DATA pertaining to generation and weather are the second input and are used in algorithms to infer seasonality, trends, and conditions in the immediate past.
GENERATOR PLANT DETAILS are static data about the VRE generating plant, such as total installed capacity, number of units (number of wind turbines for wind plants or number of inverters and number of modules for solar plants), the capacity, latitude/longitude, and characteristics of each unit (power production versus wind speed, power production versus solar radiation and temperature), overall plant losses, on-site weather measurement stations (list of parameters measured), and others.
GENERATOR AVAILABILITY FORECAST is the last input and is the amount of installed capacity available for production for each time block in the forecast horizon. This input is provided by the VRE generation facility and depends on the scheduled and unscheduled maintenance of the VRE plant, planned curtailments, and other factors.
The inputs and outputs are the same regardless of whether the VRE forecast is plant-level or substation level. For the substation level, the VRE forecast would require input data from each of the connected VRE plants.
Excerpt from page 7 of Scaling Up Renewable Energy Project Grid Integration Series: Variable Renewable Energy Forecasting by USAID.
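As a rough illustration of the generator availability input, the sketch below derives available capacity per time block from installed capacity, capacity out for maintenance, and an optional curtailment limit. The function name, the parameter names, and the rule of treating a planned curtailment as a simple cap are assumptions made for this example; they are not taken from the USAID report, and a real facility may report availability differently.

```python
def available_capacity_kw(installed_kw, outage_kw_by_block, curtailment_limit_kw_by_block):
    """Return available capacity (kW) for each time block.

    Assumption: availability = installed capacity minus capacity on outage,
    capped by a planned curtailment limit when one applies (None = no limit).
    """
    available = []
    for outage_kw, limit_kw in zip(outage_kw_by_block, curtailment_limit_kw_by_block):
        # Capacity left after scheduled or unscheduled maintenance.
        remaining_kw = max(installed_kw - outage_kw, 0.0)
        # Apply the planned curtailment cap, if any, for this block.
        if limit_kw is not None:
            remaining_kw = min(remaining_kw, limit_kw)
        available.append(remaining_kw)
    return available


# Hypothetical 50 MW plant over four time blocks.
installed_kw = 50_000.0
outage_kw = [0.0, 5_000.0, 5_000.0, 0.0]
curtailment_limit_kw = [None, None, 40_000.0, 30_000.0]

availability = available_capacity_kw(installed_kw, outage_kw, curtailment_limit_kw)
print(availability)
```

For a substation-level forecast, the same calculation would be repeated for each connected plant.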
Next, Read This
One way to improve forecast accuracy is to use historical data and statistical methods to train the forecasting model you select.
On-site observations of power production or the renewable resource (e.g., irradiance or wind speed at hub height) are critical for helping the FSPs [forecast service providers] statistically “train” their forecast models and thus reduce error and bias in the forecasts. Good quality data is critical. “Good quality” means that the data does not, for example, contain many gaps or unrepresentative values. Curtailed power data should be accompanied by plant availability or a curtailment flag. Data time intervals should be regular and there should be a clear documentation of the units, how the observations were averaged, the time zone of the data, and whether there’s a shift in time due to daylight savings time.
…
For locations in which there are significant seasonal differences in weather conditions and the associated renewable generation levels and variability, it is best to provide 12 months or more of historical data from the target generation facilities to the FSPs for the purpose of training their forecast models. However, if it is not feasible to make this amount of data available or if the target location does not exhibit much seasonal variation, most FSPs can typically train their forecast models reasonably well with 3-6 months of on-site historical observations.
It should be noted that advanced machine learning methods often exhibit significantly greater performance improvement over less sophisticated methods as the training sample size increases. Thus, FSPs that employ the latest and most advanced machine learning prediction tools may not be able to demonstrate the ultimate value of their approaches, if only short historical data sets are provided. If 6-12 months of data are not available, the trial operator might consider another location or conduct a longer trial on the order of 4-6 months.
Excerpts from pages 14 – 15 of Recommended Practice for the Implementation of Renewable Energy Forecasting Solutions – Part 2: Designing and Executing Forecasting Benchmarks and Trials by IEA Wind Task 36
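The data quality points in the excerpt above (gaps, unrepresentative values, regular time intervals, a documented time zone) can be checked with a few lines of code before data is handed to a forecast provider. The following is a minimal sketch using only the Python standard library. The 15-minute interval, the 1,000 kW capacity, the sample records, and the function names are all hypothetical, and "unrepresentative" is narrowed here to mean negative power or power above installed capacity.

```python
from datetime import datetime, timedelta, timezone

INTERVAL = timedelta(minutes=15)  # assumed reporting interval
CAPACITY_KW = 1000.0              # assumed installed capacity


def find_gaps(timestamps, interval):
    """Return (before, after) pairs where consecutive timestamps are more than one interval apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > interval]


def find_out_of_range(records, capacity_kw):
    """Return records whose power is negative or exceeds installed capacity."""
    return [(t, p) for t, p in records if p < 0 or p > capacity_kw]


# Timestamps carry an explicit time zone (UTC) so there is no daylight saving ambiguity.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
records = [
    (start + 0 * INTERVAL, 120.0),
    (start + 1 * INTERVAL, 135.0),
    (start + 3 * INTERVAL, 1500.0),  # the 00:30 record is missing; 1500 kW exceeds capacity
    (start + 4 * INTERVAL, 140.0),
]

gaps = find_gaps([t for t, _ in records], INTERVAL)
suspect = find_out_of_range(records, CAPACITY_KW)

print("gaps:", gaps)
print("suspect values:", suspect)
```

Storing timestamps in UTC is one way to avoid the daylight saving shift mentioned in the excerpt. Whichever convention is used, it should be documented alongside the units and the averaging method.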
Now See This Example

Read Excerpt: Page(s) 13-14 of Using Forecasting Systems to Reduce Cost and Improve Dispatch of Variable Renewable Energy by ESMAP.
Suggested Actions & Next Steps
- Begin the data collection process and test the flow of generator details, weather data, and any other information the forecasting system needs.
- Ensure that any upgrades or changes you have made to your data collection systems are working properly.
- Identify any gaps in data collection versus the needs you previously identified and plan how to address them.
- Begin statistical analysis of the data to support training the model and improving its performance.
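As one possible starting point for the statistical analysis in the last action, the sketch below computes two common summary statistics, mean bias and mean absolute error, from paired forecast and observed generation values. The sample numbers are invented for illustration, and the choice of these two metrics is an assumption; the source documents do not prescribe a specific set.

```python
from statistics import mean


def mean_bias(forecast, observed):
    """Average of (forecast - observed); a positive value means over-forecasting."""
    if len(forecast) != len(observed):
        raise ValueError("forecast and observed must have the same length")
    return mean(f - o for f, o in zip(forecast, observed))


def mean_absolute_error(forecast, observed):
    """Average magnitude of the forecast error, ignoring its sign."""
    if len(forecast) != len(observed):
        raise ValueError("forecast and observed must have the same length")
    return mean(abs(f - o) for f, o in zip(forecast, observed))


# Hypothetical paired values (kW) for four time blocks.
forecast_kw = [100.0, 200.0, 300.0, 400.0]
observed_kw = [90.0, 210.0, 280.0, 400.0]

bias = mean_bias(forecast_kw, observed_kw)
mae = mean_absolute_error(forecast_kw, observed_kw)

print(f"bias = {bias:.1f} kW, MAE = {mae:.1f} kW")
```

Dividing either statistic by installed capacity gives a normalized value that is easier to compare across plants of different sizes.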