Historical Data Accuracy

Bias and error validation of Solcast historical data against surface measurements

The purpose of this validation analysis is to let users estimate the accuracy of Solcast’s historical timeseries for their site(s) before committing to a subscription or integration effort. It is based on 15 years of data, compared against surface measurements from high-quality measurement sites. Error statistics and benchmarks against a common alternative are also included. Users focused on live and forecast data will find a separate live and forecast data analysis.

For brevity, only a selection of commonly used statistics is included, summarised by climate type. For the full range of statistics at site level, or to request the raw data, please contact the Solcast team.

Executive Summary

A global accuracy verification analysis of Solcast Version 2.0 Historical Data was performed using surface measurements from 70 sites spanning a range of climate types and latitude zones. The measurements span 2007 to 2021. Results were compared to the commonly used ERA5 reanalysis dataset (publicly available from the European Centre for Medium-Range Weather Forecasts, ECMWF).

The mean bias in estimated actual GHI across all the measurement sites is -0.1%, compared with +2.4% for ERA5. The hourly-average MAPE of the Solcast GHI estimates across all sites is 10.7%, compared with 18.4% for ERA5.

For DNI, the mean bias across all sites is +1.3%, compared with +23.3% for ERA5, and the hourly-average MAPE across all sites is 23.6%, compared with 46.9% for ERA5.

| Metric | Bias (mean & 10/90 percentile) | Bias (standard deviation) | nRMSE (mean & 10/90 percentile) | MAPE (mean & 10/90 percentile) |
| --- | --- | --- | --- | --- |
| Solcast GHI (all 70 global sites) | -0.1% (-2.4% to +2.2%) | 2.2% | 17.5% (10.7% to 23.9%) | 10.7% (6.5% to 15.0%) |
| Solcast DNI (global average, all 70 global sites) | +1.3% (-7.9% to +10.5%) | 6.9% | 40.4% (24.3% to 58.4%) | 23.6% (15.0% to 32.9%) |

Historical data products

Solcast operates a global cloud opacity estimation system, using satellite imagery from 11 weather satellites, supplemented with atmospheric information from global NWP (numerical weather prediction) models. These inputs are used to produce irradiance data, which is distributed via the Solcast API (Application Programming Interface). The API enables automated, synchronous data requests for any point on earth.

This analysis focuses on historical data, which gives our best estimate of what the irradiance was between 2007 and the present, at time granularities of 5, 10, 15, 30 and 60 minutes. Coverage is all non-polar continental areas and nearby islands, with a spatial resolution of 2 km.

Accuracy verification methodology

Accuracy is typically the most important product attribute for commercial users of the Solcast API. Customer evaluations or trials can produce definitive results; however, they require substantial resources, expertise, and elapsed time to gather and analyse the data, and to quality control the surface measurements.

This document analyses global horizontal irradiance (GHI) and direct normal irradiance (DNI). Users focused on live and forecast data or TMYs will find a separate analysis on the Solcast website.

Measurement site selection

This analysis uses research-grade, quality-controlled surface measurements from a variety of sources, including but not limited to the Baseline Surface Radiation Network (BSRN; global), the Surface Radiation Budget Network (SURFRAD; US) and the enerMENA Meteo Network (Middle East and North Africa). The sites used were selected for (1) quality, characterised by robust calibration and maintenance standards and quality control of data (a challenge for irradiance measurements, where calibration and maintenance problems often cause large errors); (2) recency, i.e. data is available for recent periods, so that the analysis reflects the recent Solcast algorithmic configuration and is relevant to newer customer sites; (3) availability, i.e. as far as possible the measurements should be non-private and readily available to users who may want to replicate the results; and (4) broad geographic and climate-type coverage, so that users can estimate accuracy for their own sites.

In addition to automated quality control, all the measurement data was subjected to manual quality control to censor data that exhibited anomalous characteristics. Even highly regarded datasets often have periods of sensor drift, inconsistent diffuse and direct components, sensor obstruction, and sensor malfunction that remain after standard automated quality control is applied. This process is of utmost importance to ensure that the verification results are as representative as possible of the true irradiance, rather than the quirks of the instrument at the time.

Sites at very high elevation (over 2000 metres) have been excluded due to limited applicability to solar energy applications. Otherwise, all sites with data available during the analysis period are included, giving a total of 70 sites. For climate type categorisation, two dimensions are used: (1) the latitude zone (Tropical/Subtropical being latitudes equatorward of 35 degrees, and Temperate being latitudes between 35 and 60 degrees north and south; no sites poleward of 60 degrees are included); and (2) the site climate type, based on annual average precipitation from the CPC Merged Analysis of Precipitation (CMAP) from NOAA (Humid sites receive more than 750 mm per year, Arid sites less than 350 mm per year, and Semi-Arid sites between 350 mm and 750 mm per year). Solcast does not perform site-specific adaptation of its satellite-derived data (i.e. measurements from a site are not specifically used in the dataset for that site), although site measurements have been used for algorithm tuning at a regional level.
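
The two-dimensional categorisation described above can be sketched as a small function. This is an illustrative sketch only: the function name `classify_site` is hypothetical, and the handling of values that fall exactly on a threshold (35 degrees, 350 mm, 750 mm) is an assumption, since the text does not specify it.

```python
def classify_site(latitude_deg, annual_precip_mm):
    """Return (latitude zone, climate type) for a site.

    Thresholds follow the text; boundary handling is an assumption.
    """
    abs_lat = abs(latitude_deg)
    if abs_lat > 60:
        # No sites poleward of 60 degrees are included in the analysis.
        raise ValueError("site is poleward of 60 degrees")
    # Assumption: a site at exactly 35 degrees is treated as Temperate.
    zone = "Tropical/Subtropical" if abs_lat < 35 else "Temperate"

    if annual_precip_mm > 750:
        climate = "Humid"
    elif annual_precip_mm < 350:
        climate = "Arid"
    else:
        # Assumption: exactly 350 mm or 750 mm counts as Semi-Arid.
        climate = "Semi-Arid"
    return zone, climate


# Hypothetical sites, for illustration only.
print(classify_site(-23.5, 1200.0))
print(classify_site(48.0, 500.0))
```

In the results tables, Arid and Semi-Arid sites are reported together within each latitude zone.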

Map of measurement sites included in the analysis. Of the 70 included sites, a total of 35 are designated Tropical/Subtropical (21 Humid, 5 Semi-Arid, 9 Arid). A total of 35 are designated Temperate (13 Humid, 16 Semi-Arid, and 6 Arid).

Benchmark data

ERA5 reanalysis data is used as a benchmark. ERA5 is a publicly available, high quality, and well-regarded reanalysis dataset provided by ECMWF (European Centre for Medium-Range Weather Forecasts).

ERA5 data is at 1-hourly resolution on a 0.25 by 0.25 degree grid. Data was bilinearly interpolated to the verification site locations, reprocessed from accumulations to hourly averages and, in the case of the ERA5 parameter fdir, reprojected from a plane horizontal to the ground to a plane perpendicular to the solar beam (i.e. converted to DNI). The native hourly temporal resolution was left unchanged to ensure a fair comparison.
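
The three ERA5 processing steps can be sketched as follows. This is a minimal illustration, not the code used for the analysis. It assumes that the accumulated value is in J/m^2 over a one-hour window, that an hourly-mean cosine of the solar zenith angle is supplied by a separate solar position calculation, and that hours with the sun very low are skipped (the `min_cos_zenith` cut-off is a hypothetical choice). All function names are illustrative.

```python
def bilinear(x, y, x0, x1, y0, y1, v00, v10, v01, v11):
    """Interpolate four grid-corner values to the point (x, y).

    v00 is the value at (x0, y0), v10 at (x1, y0),
    v01 at (x0, y1) and v11 at (x1, y1).
    """
    tx = (x - x0) / (x1 - x0)
    ty = (y - y0) / (y1 - y0)
    row0 = v00 * (1 - tx) + v10 * tx  # interpolate along x at y0
    row1 = v01 * (1 - tx) + v11 * tx  # interpolate along x at y1
    return row0 * (1 - ty) + row1 * ty


def accumulation_to_mean(joules_per_m2, seconds=3600.0):
    """Convert an accumulated energy (J/m^2) to a mean flux (W/m^2)."""
    return joules_per_m2 / seconds


def horizontal_to_dni(direct_horizontal, cos_zenith, min_cos_zenith=0.05):
    """Project direct irradiance from a horizontal plane to beam-normal.

    Returns None when the sun is too low for a stable division
    (assumed cut-off, not taken from the source).
    """
    if cos_zenith < min_cos_zenith:
        return None
    return direct_horizontal / cos_zenith


# Hypothetical grid-corner accumulations (J/m^2) around a site.
accumulated = bilinear(0.5, 0.5, 0.0, 1.0, 0.0, 1.0,
                       1.6e6, 1.8e6, 1.8e6, 2.0e6)
direct_horizontal = accumulation_to_mean(accumulated)
print(horizontal_to_dni(direct_horizontal, cos_zenith=0.5))
```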

Verification analysis method

The measurements span 2007 to 2021, with different measurement periods for different sites because of differing availability and the removal of data during quality control. This gives an average of 7 years' worth of data per site. All data were converted to hourly means prior to analysis.
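
The conversion to hourly means can be sketched with the standard library. This is an illustration under stated assumptions: timestamps are taken to label the start of each sub-hourly period, missing values are represented as None, and no minimum number of samples per hour is enforced. The function name `hourly_means` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hourly_means(samples):
    """Average (timestamp, value) samples into hourly means.

    Assumes each timestamp marks the start of its period.
    Samples whose value is None (e.g. removed by quality control) are skipped.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        if value is None:
            continue
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


# Hypothetical 30-minute GHI samples in W/m^2.
samples = [
    (datetime(2020, 1, 1, 10, 0), 100.0),
    (datetime(2020, 1, 1, 10, 30), 300.0),
    (datetime(2020, 1, 1, 11, 0), 500.0),
    (datetime(2020, 1, 1, 11, 30), None),
]
print(hourly_means(samples))
```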

For each individual site, a range of error statistics was calculated for both GHI and DNI, following “Metrics for Evaluating the Accuracy of Solar Power Forecasting” (NREL, 2013). The statistics were normalised by the daily mean irradiance for each site, which makes results comparable across sites with differing climates. Data was confined to sun-up hours so that reported performance was not inflated by the trivial 0 W/m^2 estimate during night-time hours.
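
A minimal sketch of the per-site statistics follows. It rests on several assumptions that the text does not confirm: the normalising value is taken to be the mean measured irradiance over the retained sun-up hours, MAPE is taken to be the mean absolute error divided by that same mean (rather than by each hourly value), and sun-up hours are identified by a flag supplied by the caller. The function name `error_stats` is hypothetical.

```python
import math


def error_stats(estimated, measured, sun_up):
    """Return normalised bias, MAPE and nRMSE in percent for one site.

    Only samples where sun_up is True are used, so night-time zeros
    do not inflate the reported performance.
    """
    pairs = [(e, m) for e, m, up in zip(estimated, measured, sun_up) if up]
    if not pairs:
        raise ValueError("no sun-up samples")
    n = len(pairs)
    mean_measured = sum(m for _, m in pairs) / n  # assumed normaliser
    errors = [e - m for e, m in pairs]

    bias = sum(errors) / n
    mae = sum(abs(err) for err in errors) / n
    rmse = math.sqrt(sum(err * err for err in errors) / n)

    scale = 100.0 / mean_measured
    return {
        "bias_pct": bias * scale,
        "mape_pct": mae * scale,
        "nrmse_pct": rmse * scale,
    }


# Hypothetical hourly values in W/m^2; the last hour is night-time.
stats = error_stats(
    estimated=[110.0, 190.0, 300.0, 5.0],
    measured=[100.0, 200.0, 300.0, 0.0],
    sun_up=[True, True, True, False],
)
print(stats)
```

Because the squared errors weight large deviations more heavily, nRMSE is never smaller than MAPE under these definitions, which matches the pattern in the results tables.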

Accuracy verification results

GHI estimated actuals accuracy

The following table shows statistics for the normalised bias, Mean Absolute Percentage Error (MAPE) and normalised Root Mean Square Error (nRMSE), as defined in the above-mentioned NREL 2013 analysis, of GHI.

Errors in Estimated Actuals for GHI

Data: hourly average, nocturnal zeros excluded

| Site type | Estimate | Bias (mean & 10/90 percentile) | Bias (standard deviation) | nRMSE (mean & 10/90 percentile) | MAPE (mean & 10/90 percentile) |
| --- | --- | --- | --- | --- | --- |
| Tropical/Sub-Tropical, Arid & Semi-Arid (14 sites) | Solcast | +0.1% (-4.2% to +3.5%) | 3.2% | 11.5% (6.9% to 16.5%) | 7.3% (4.1% to 10.1%) |
| Tropical/Sub-Tropical, Arid & Semi-Arid (14 sites) | ERA5 | +1.6% (-1.8% to +5.7%) | 3.3% | 19.0% (12.2% to 27.7%) | 11.5% (6.4% to 18.2%) |
| Tropical/Sub-Tropical, Humid (21 sites) | Solcast | +0.5% (-2.0% to +3.6%) | 2.3% | 17.9% (13.4% to 22.5%) | 11.3% (8.3% to 15.0%) |
| Tropical/Sub-Tropical, Humid (21 sites) | ERA5 | +2.9% (-1.6% to +7.4%) | 4.9% | 32.3% (24.7% to 38.8%) | 21.2% (16.0% to 26.0%) |
| Temperate, Arid & Semi-Arid (22 sites) | Solcast | -0.7% (-2.3% to +1.4%) | 1.5% | 19.7% (12.7% to 27.3%) | 11.8% (7.0% to 16.0%) |
| Temperate, Arid & Semi-Arid (22 sites) | ERA5 | +2.9% (-2.7% to +8.6%) | 5.1% | 31.1% (21.0% to 41.2%) | 18.6% (12.6% to 23.5%) |
| Temperate, Humid (13 sites) | Solcast | -0.1% (-2.3% to +1.4%) | 1.6% | 19.6% (13.7% to 26.0%) | 11.4% (7.4% to 15.0%) |
| Temperate, Humid (13 sites) | ERA5 | +1.4% (-1.6% to +5.0%) | 3.2% | 34.4% (28.1% to 41.3%) | 21.2% (15.6% to 26.1%) |
| Global average (all 70 sites) | Solcast | -0.1% (-2.4% to +2.2%) | 2.2% | 17.5% (10.7% to 23.9%) | 10.7% (6.5% to 15.0%) |
| Global average (all 70 sites) | ERA5 | +2.4% (-2.0% to +7.5%) | 4.4% | 29.7% (18.1% to 40.0%) | 18.4% (9.5% to 25.4%) |
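
Each table cell summarises per-site statistics across the sites in a group: a mean, a standard deviation, and the 10th and 90th percentiles. A minimal sketch of such a summary is shown here. The percentile interpolation method and the use of the sample (rather than population) standard deviation are assumptions, the function name `summarise_sites` is hypothetical, and the input values are made up for illustration.

```python
import statistics


def summarise_sites(per_site_values):
    """Summarise one statistic (e.g. bias in %) across a group of sites."""
    # Deciles: the first cut point is the 10th percentile, the last the 90th.
    deciles = statistics.quantiles(per_site_values, n=10, method="inclusive")
    return {
        "mean": statistics.fmean(per_site_values),
        "std": statistics.stdev(per_site_values),  # sample std (assumed)
        "p10": deciles[0],
        "p90": deciles[-1],
    }


# Hypothetical per-site GHI bias values in percent (not from the analysis).
example_bias = [-2.5, -1.0, -0.4, 0.0, 0.3, 0.8, 1.9, 2.6]
print(summarise_sites(example_bias))
```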

DNI estimated actuals accuracy

The following table shows statistics for the normalised bias, Mean Absolute Percentage Error (MAPE) and normalised Root Mean Square Error (nRMSE), as defined in the above-mentioned NREL 2013 analysis, of DNI.

Errors in Estimated Actuals for DNI

Data: hourly average, nocturnal zeros excluded

| Site type | Estimate | Bias (mean & 10/90 percentile) | Bias (standard deviation) | nRMSE (mean & 10/90 percentile) | MAPE (mean & 10/90 percentile) |
| --- | --- | --- | --- | --- | --- |
| Tropical/Sub-Tropical, Arid & Semi-Arid (14 sites) | Solcast | -2.1% (-12.5% to +7.3%) | 8.0% | 25.8% (18.2% to 35.0%) | 17.0% (11.2% to 23.5%) |
| Tropical/Sub-Tropical, Arid & Semi-Arid (14 sites) | ERA5 | +13.8% (+1.6% to +29.2%) | 13.4% | 43.1% (28.6% to 66.1%) | 31.7% (18.3% to 50.7%) |
| Tropical/Sub-Tropical, Humid (21 sites) | Solcast | +4.4% (-6.2% to +13.7%) | 7.7% | 41.2% (27.3% to 57.1%) | 25.6% (18.4% to 35.6%) |
| Tropical/Sub-Tropical, Humid (21 sites) | ERA5 | +32.4% (+13.2% to +58.8%) | 22.9% | 74.1% (45.4% to 98.1%) | 55.1% (33.8% to 75.9%) |
| Temperate, Arid & Semi-Arid (22 sites) | Solcast | -0.8% (-6.6% to +5.0%) | 4.9% | 45.8% (24.8% to 62.9%) | 25.2% (14.9% to 32.3%) |
| Temperate, Arid & Semi-Arid (22 sites) | ERA5 | +21.8% (+8.0% to +26.6%) | 21.4% | 69.4% (42.7% to 94.7%) | 46.8% (30.6% to 61.7%) |
| Temperate, Humid (13 sites) | Solcast | +3.4% (-1.5% to +9.7%) | 4.9% | 45.8% (29.2% to 58.9%) | 24.5% (15.4% to 31.8%) |
| Temperate, Humid (13 sites) | ERA5 | +21.6% (+13.1% to +37.2%) | 11.3% | 73.8% (54.5% to 99.2%) | 50.1% (34.3% to 70.1%) |
| Global average (all 70 sites) | Solcast | +1.3% (-7.9% to +10.5%) | 6.9% | 40.4% (24.3% to 58.4%) | 23.6% (15.0% to 32.9%) |
| Global average (all 70 sites) | ERA5 | +23.3% (+6.5% to +47.6%) | 19.8% | 66.4% (37.2% to 95.1%) | 46.9% (25.2% to 69.8%) |
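
As a consistency check on the GHI and DNI tables above, the global Solcast means can be approximately recovered from the four group means, weighted by the number of sites in each group. This sketch assumes the global figure is a simple mean over all 70 sites; because the group means are published rounded to one decimal place, agreement is only expected to within about 0.1 percentage points. The names used here are illustrative.

```python
# Sites per group, in table order: Tropical/Sub-Tropical Arid & Semi-Arid,
# Tropical/Sub-Tropical Humid, Temperate Arid & Semi-Arid, Temperate Humid.
SITE_COUNTS = [14, 21, 22, 13]

# Solcast group means (%) copied from the tables above.
SOLCAST_GHI = {
    "bias": [0.1, 0.5, -0.7, -0.1],
    "nrmse": [11.5, 17.9, 19.7, 19.6],
    "mape": [7.3, 11.3, 11.8, 11.4],
}
SOLCAST_DNI = {
    "bias": [-2.1, 4.4, -0.8, 3.4],
    "nrmse": [25.8, 41.2, 45.8, 45.8],
    "mape": [17.0, 25.6, 25.2, 24.5],
}


def weighted_mean(values, weights):
    """Mean of group values weighted by group size."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


for name, table in (("GHI", SOLCAST_GHI), ("DNI", SOLCAST_DNI)):
    for metric, group_means in table.items():
        value = weighted_mean(group_means, SITE_COUNTS)
        print(f"{name} {metric}: {value:.2f}%")
```

The results can be compared with the published global rows: -0.1%, 17.5% and 10.7% for GHI, and +1.3%, 40.4% and 23.6% for DNI.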
