Historical Data Accuracy

Bias and error validation of Solcast historical data against surface measurements

These validation analyses help users evaluate the accuracy of Solcast’s historical time series data and estimate the accuracy they can expect in their own region(s). The studies compare Solcast data against quality-controlled surface measurements from across the world. Users focused on live and forecast data should refer to our separate live and forecast data analysis. For users interested in TMY data, these historical analyses are the relevant resource, since TMY data is sourced from the historical time series. Further information on Solcast’s TMY methodology is also available here.

The most recent study, the DNV bankability report, was completed by DNV in 2023 and published in early 2024. This study reviewed Solcast’s methodology and validated GHI and DNI against measurements from 207 sites globally.

An earlier study was completed by Solcast in 2022 on 70 sites. For transparency, a summary of that validation is included at the bottom of this page. The bias and error statistics of the DNV and Solcast studies are very similar.

A selection of commonly used statistics from each study is included here. For the full range of statistics, including site-level results, or to request the raw data from the Solcast report, please contact the Solcast team.

DNV Bankability Report

Executive Summary

The full report was published by DNV in February 2024. The study continued validation work that DNV had started before acquiring Solcast in 2023. DNV reviewed Solcast’s irradiance and TMY methodology and conducted a global validation analysis of the Solcast historical time series (HTS) GHI and DNI data against surface irradiance measurements. Data was included from 207 sites, of which 53 are considered highest quality based on cleaning schedules, data quality and sensor classification.

The data used in the validation analysis was subject to an initial and secondary quality assurance process, and sites have been classified and grouped by continent, according to the World Bank approved administrative boundaries, and by Köppen-Geiger-Photovoltaic (KGPV) Climate Zones.

The study considers three main metrics per site, with per-site values provided in the full report: Mean Bias Difference/Error (MBD), Mean Absolute Difference/Error (MAD) and Root Mean Square Difference/Error (RMSD). The key findings for bias were:

| Data | Mean Bias | Standard Deviation |
| --- | --- | --- |
| GHI - Highest Quality Sites | +0.05% | ±2.05% |
| GHI - All sites | +0.33% | ±2.47% |
| DNI - All sites | +1.5% | ±5.75% |

The report concluded as follows: “DNV finds the Solcast methodology to be consistent with industry best practices. DNV finds the results of the validation study to be within expectations and that the Solcast data is suitable for use to accurately predict project energy yield in energy assessment and for use in energy assessment for project financing purposes. The Solcast bias is considered low and there are limited variations between different regions and climatic zones for both GHI and DNI.”

Review of Solcast Methodology

DNV reviewed the Solcast irradiance and TMY methodology and found that it is consistent with industry best practices. The study assessed the various inputs to the Solcast algorithm, including our sources of geostationary weather satellite imagery and weather model data. Models were reviewed and validated as separate components:

  • The cloud model: Detects and characterises clouds from satellite imagery
  • The clear sky model: Calculates irradiance under clear sky conditions
  • The separation model: Decomposes global horizontal irradiance into diffuse and direct components
  • The transposition model: Converts irradiance on a horizontal surface to plane of array irradiance (GTI was not validated in this study)
  • The terrain shading model: Accounts for beam blocking and reduced sky view due to horizon terrain
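
The components listed above are linked by the standard closure relation between the three irradiance quantities, GHI = DHI + DNI × cos(solar zenith angle). The sketch below shows only that identity, which the outputs of any separation model must satisfy; it is not Solcast's separation model. The function name and the sample values are hypothetical.

```python
import math

def dni_from_closure(ghi, dhi, zenith_deg):
    """Direct normal irradiance implied by GHI = DHI + DNI * cos(zenith)."""
    cos_z = math.cos(math.radians(zenith_deg))
    if cos_z <= 0.0:
        return 0.0  # sun at or below the horizon: no direct beam
    # Clamp at zero so measurement noise cannot produce a negative beam.
    return max(ghi - dhi, 0.0) / cos_z

# Illustrative values in W/m^2, not taken from any dataset.
ghi, dhi, zenith = 800.0, 100.0, 30.0
dni = dni_from_closure(ghi, dhi, zenith)
```

Near sunrise and sunset cos(zenith) becomes very small, so real implementations usually treat low sun angles separately; this sketch does not.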

A review of Solcast’s TMY methodology was included alongside the model reviews.

Validation Methodology

Data used in the study came from publicly available data sources, measurement stations used for project development and measurements from operational assets. Site selection focussed on achieving the best possible global coverage, specifically including most major solar markets. All sites provided data at a resolution of 1 hour or finer.
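
Because sites report at 1 hour or finer while metrics are computed on hourly values, finer data has to be averaged to the hour before comparison. A minimal sketch of that step follows, assuming period-beginning timestamps and no gap handling; the function name and sample readings are hypothetical, and the report's actual aggregation rules are not described here.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean

def hourly_means(samples):
    """Average (timestamp, value) samples into bins keyed by the hour start."""
    bins = defaultdict(list)
    for ts, value in samples:
        # Truncate each timestamp to the start of its hour (assumed convention).
        bins[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return {hour: fmean(values) for hour, values in sorted(bins.items())}

# Illustrative 30-minute readings in W/m^2.
samples = [
    (datetime(2021, 6, 1, 10, 0), 600.0),
    (datetime(2021, 6, 1, 10, 30), 700.0),
    (datetime(2021, 6, 1, 11, 0), 750.0),
    (datetime(2021, 6, 1, 11, 30), 650.0),
]
hourly = hourly_means(samples)
```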

Sites were classified by region according to World Bank administrative boundaries, and by climate zone using the Köppen-Geiger-Photovoltaic (KGPV) classification. These classifications were used to provide regional and climate zone breakdowns of validation data, as the accuracy of solar irradiance estimates is expected to vary between regions and with climatic conditions. These breakdowns are presented in the full report.

Every site included in the study passed two rounds of quality assurance and additional exclusion criteria. The quality assurance processes excluded data that exhibited unphysical values, sensor drift or calibration errors, or that was otherwise unsuitable. Exclusion criteria applied to sites that were at a very high elevation, at polar latitudes, did not provide sufficient time coverage or fell outside satellite coverage boundaries. This process ensures that the sites used are both accurate and representative of typical solar asset locations.

A total of 207 sites were used for the validation study. These sites passed through both stages of quality assurance and the exclusion criteria. GHI measurements were available for all sites, shown below. DNI measurements are less frequently available and were available for a total of 117 sites, also shown below.

[Figure: map of GHI validation sites] [Figure: map of DNI validation sites]

DNV identified a total of 53 highest quality sites that were confirmed to have Class A pyranometers cleaned at least every 2 weeks. The measurements at these locations are expected to be more accurate, so validation at these locations is expected to be more indicative of the performance of Solcast’s estimates than validation at other locations.

[Figure: map of highest quality GHI validation sites]

Metrics were calculated against hourly values of irradiance for all locations. Equations for the metrics used can be found in the full report.
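
As a rough illustration of the three metrics, the sketch below computes MBD, MAD and RMSD from paired hourly values and normalises each by the mean measured value. The normalisation choice, the function name and the sample data are assumptions made for illustration; the definitive equations are those in the full report.

```python
import math

def validation_metrics(measured, modelled):
    """Return normalised MBD, MAD and RMSD as percentages of the mean measurement."""
    if len(measured) != len(modelled) or not measured:
        raise ValueError("need two non-empty series of equal length")
    n = len(measured)
    # Difference is modelled minus measured, so a positive bias means overestimation.
    diffs = [mod - obs for mod, obs in zip(modelled, measured)]
    mean_obs = sum(measured) / n
    mbd = sum(diffs) / n
    mad = sum(abs(d) for d in diffs) / n
    rmsd = math.sqrt(sum(d * d for d in diffs) / n)
    return {
        "nMBD": 100.0 * mbd / mean_obs,
        "nMAD": 100.0 * mad / mean_obs,
        "nRMSD": 100.0 * rmsd / mean_obs,
    }

# Illustrative hourly GHI values in W/m^2, not taken from any site.
measured = [100.0, 400.0, 700.0, 500.0, 300.0]
modelled = [120.0, 380.0, 690.0, 530.0, 300.0]
metrics = validation_metrics(measured, modelled)
```

Positive and negative differences cancel in MBD but not in MAD or RMSD, which is why a site can show a bias near zero alongside a much larger hourly error.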

Key Results and Findings

The key results and findings are below. Confidence intervals, regional breakdowns and climate zone breakdowns are available in the full report.

| Metric | Highest Quality GHI Sites | All GHI Sites | All DNI Sites |
| --- | --- | --- | --- |
| No. of sites | 53 | 207 | 117 |
| Mean Bias | +0.05% | +0.33% | +1.50% |
| Bias Std. Dev. | ±2.05% | ±2.47% | ±5.75% |
| 80% CI Bias (10% to 90%) | -2.57% to 2.67% | -2.84% to 3.50% | -5.87% to 8.86% |
| 90% CI Bias (5% to 95%) | -3.31% to 3.41% | -3.74% to 4.40% | -7.96% to 10.95% |
| Mean nMAD (nMAE) | 10.37% | 10.33% | 19.97% |
| Std. Dev. nMAD (nMAE) | ±5.03% | ±3.72% | ±5.94% |
| Mean nRMSD (nRMSE) | 16.16% | 15.99% | 31.51% |
| Std. Dev. nRMSD (nRMSE) | ±7.94% | ±5.74% | ±9.99% |
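
The bias intervals above are close to what a normal approximation gives when applied to the mean bias and its standard deviation. The sketch below reproduces that arithmetic for the all-sites GHI figures (mean +0.33%, standard deviation 2.47%). This is an observation from the rounded published values, not a statement of how DNV derived the intervals, and the function name is hypothetical.

```python
from statistics import NormalDist

def central_interval(mean, std, coverage):
    """Central interval of a normal distribution with the given coverage (0 to 1)."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return mean - z * std, mean + z * std

# All GHI sites: mean bias and bias standard deviation, in percent.
low80, high80 = central_interval(0.33, 2.47, 0.80)
low90, high90 = central_interval(0.33, 2.47, 0.90)
```

Small differences from the published bounds are expected, because the inputs used here are already rounded to two decimal places.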

See how these results compare to both free and paid alternatives in the market here.

Solcast 2022 Validation Study

Executive Summary

A global accuracy verification analysis of Solcast Historical Data was performed in 2022, using surface measurements from 70 sites spanning a range of climate types and latitude zones. The measurements span 2007 to 2021. Results were compared to the commonly used ERA5 reanalysis dataset (publicly available from the European Centre for Medium-Range Weather Forecasts, ECMWF).

The mean bias in estimated actual GHI across all measurement sites is -0.1%, compared with +2.4% for ERA5. The hourly-average MAPE of the Solcast GHI estimates across all sites is 10.7%, compared with 18.4% for ERA5. For DNI, the bias across all sites is +1.3%, compared with +23.3% for ERA5, and the hourly-average MAPE across all sites is 23.6%, compared with 46.9% for ERA5.

| Metric | Bias, mean (10/90 percentile) | Bias, standard deviation | nRMSE, mean (10/90 percentile) | MAPE, mean (10/90 percentile) |
| --- | --- | --- | --- | --- |
| Solcast GHI (all 70 global sites) | -0.1% (-2.4% to +2.2%) | 2.2% | 17.5% (10.7% to 23.9%) | 10.7% (6.5% to 15.0%) |
| Solcast DNI, global average (all 70 global sites) | +1.3% (-7.9% to +10.5%) | 6.9% | 40.4% (24.3% to 58.4%) | 23.6% (15.0% to 32.9%) |
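
Each cell above summarises a per-site statistic across sites as a mean, a standard deviation or a 10th/90th percentile range. The sketch below shows one way to produce such a summary from a list of per-site values. The use of the sample standard deviation, the percentile interpolation method, the function name and the input values are all assumptions for illustration, not details taken from the study.

```python
from statistics import fmean, quantiles, stdev

def summarise_sites(per_site_values):
    """Summarise one per-site statistic across sites."""
    # quantiles(..., n=10) returns the nine decile cut points; the first and
    # last are the 10th and 90th percentiles.
    deciles = quantiles(per_site_values, n=10, method="inclusive")
    return {
        "mean": fmean(per_site_values),
        "std": stdev(per_site_values),  # sample standard deviation (assumed)
        "p10": deciles[0],
        "p90": deciles[-1],
    }

# Hypothetical per-site bias values in percent, not the study's data.
site_bias = [-2.0, -1.0, 0.0, 1.0, 2.0]
summary = summarise_sites(site_bias)
```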

Measurement site selection

This analysis uses research-grade, quality-controlled surface measurements from a variety of sources, including but not limited to the Baseline Surface Radiation Network (BSRN; global), the Surface Radiation Budget Network (SURFRAD; US) and the enerMENA Meteo Network (Middle East and North Africa). The sites were selected for: (1) quality, characterised by robust calibration and maintenance standards and quality control of data, which is challenging for irradiance measurements, where calibration and maintenance problems often cause large errors; (2) recency, i.e. data is available for recent periods, so that the analysis reflects a recent Solcast algorithm configuration and is relevant to newer customer sites; (3) availability, i.e. as far as possible the measurements should be non-private and readily available to users who may want to replicate the results; and (4) broad geographic and climate-type coverage, so that users can estimate accuracy for their own sites.

Map of measurement sites included in the analysis. Of the 70 included sites, a total of 35 are designated Tropical/Subtropical (21 Humid, 5 Semi-Arid, 9 Arid). A total of 35 are designated Temperate (13 Humid, 16 Semi-Arid, and 6 Arid).

Accuracy verification results

GHI estimated actuals accuracy

The following table shows statistics for the normalised bias, Mean Absolute Percentage Error (MAPE) and normalised Root Mean Square Error (nRMSE) of GHI, as defined in the NREL 2013 analysis.

Errors in Estimated Actuals for GHI

Data: hourly average, nocturnal zeros excluded

| Site type | Estimate | Bias, mean (10/90 percentile) | Bias, standard deviation | nRMSE, mean (10/90 percentile) | MAPE, mean (10/90 percentile) |
| --- | --- | --- | --- | --- | --- |
| Tropical/Sub-Tropical, Arid & Semi-Arid (14 sites) | Solcast | +0.1% (-4.2% to +3.5%) | 3.2% | 11.5% (6.9% to 16.5%) | 7.3% (4.1% to 10.1%) |
| Tropical/Sub-Tropical, Arid & Semi-Arid (14 sites) | ERA5 | +1.6% (-1.8% to +5.7%) | 3.3% | 19.0% (12.2% to 27.7%) | 11.5% (6.4% to 18.2%) |
| Tropical/Sub-Tropical, Humid (21 sites) | Solcast | +0.5% (-2.0% to +3.6%) | 2.3% | 17.9% (13.4% to 22.5%) | 11.3% (8.3% to 15.0%) |
| Tropical/Sub-Tropical, Humid (21 sites) | ERA5 | +2.9% (-1.6% to +7.4%) | 4.9% | 32.3% (24.7% to 38.8%) | 21.2% (16.0% to 26.0%) |
| Temperate, Arid & Semi-Arid (22 sites) | Solcast | -0.7% (-2.3% to +1.4%) | 1.5% | 19.7% (12.7% to 27.3%) | 11.8% (7.0% to 16.0%) |
| Temperate, Arid & Semi-Arid (22 sites) | ERA5 | +2.9% (-2.7% to +8.6%) | 5.1% | 31.1% (21.0% to 41.2%) | 18.6% (12.6% to 23.5%) |
| Temperate, Humid (13 sites) | Solcast | -0.1% (-2.3% to +1.4%) | 1.6% | 19.6% (13.7% to 26.0%) | 11.4% (7.4% to 15.0%) |
| Temperate, Humid (13 sites) | ERA5 | +1.4% (-1.6% to +5.0%) | 3.2% | 34.4% (28.1% to 41.3%) | 21.2% (15.6% to 26.1%) |
| Global average (all 70 sites) | Solcast | -0.1% (-2.4% to +2.2%) | 2.2% | 17.5% (10.7% to 23.9%) | 10.7% (6.5% to 15.0%) |
| Global average (all 70 sites) | ERA5 | +2.4% (-2.0% to +7.5%) | 4.4% | 29.7% (18.1% to 40.0%) | 18.4% (9.5% to 25.4%) |

DNI estimated actuals accuracy

The following table shows statistics for the normalised bias, Mean Absolute Percentage Error (MAPE) and normalised Root Mean Square Error (nRMSE) of DNI, as defined in the NREL 2013 analysis.

Errors in Estimated Actuals for DNI

Data: hourly average, nocturnal zeros excluded

| Site type | Estimate | Bias, mean (10/90 percentile) | Bias, standard deviation | nRMSE, mean (10/90 percentile) | MAPE, mean (10/90 percentile) |
| --- | --- | --- | --- | --- | --- |
| Tropical/Sub-Tropical, Arid & Semi-Arid (14 sites) | Solcast | -2.1% (-12.5% to +7.3%) | 8.0% | 25.8% (18.2% to 35.0%) | 17.0% (11.2% to 23.5%) |
| Tropical/Sub-Tropical, Arid & Semi-Arid (14 sites) | ERA5 | +13.8% (+1.6% to +29.2%) | 13.4% | 43.1% (28.6% to 66.1%) | 31.7% (18.3% to 50.7%) |
| Tropical/Sub-Tropical, Humid (21 sites) | Solcast | +4.4% (-6.2% to +13.7%) | 7.7% | 41.2% (27.3% to 57.1%) | 25.6% (18.4% to 35.6%) |
| Tropical/Sub-Tropical, Humid (21 sites) | ERA5 | +32.4% (+13.2% to +58.8%) | 22.9% | 74.1% (45.4% to 98.1%) | 55.1% (33.8% to 75.9%) |
| Temperate, Arid & Semi-Arid (22 sites) | Solcast | -0.8% (-6.6% to +5.0%) | 4.9% | 45.8% (24.8% to 62.9%) | 25.2% (14.9% to 32.3%) |
| Temperate, Arid & Semi-Arid (22 sites) | ERA5 | +21.8% (+8.0% to +26.6%) | 21.4% | 69.4% (42.7% to 94.7%) | 46.8% (30.6% to 61.7%) |
| Temperate, Humid (13 sites) | Solcast | +3.4% (-1.5% to +9.7%) | 4.9% | 45.8% (29.2% to 58.9%) | 24.5% (15.4% to 31.8%) |
| Temperate, Humid (13 sites) | ERA5 | +21.6% (+13.1% to +37.2%) | 11.3% | 73.8% (54.5% to 99.2%) | 50.1% (34.3% to 70.1%) |
| Global average (all 70 sites) | Solcast | +1.3% (-7.9% to +10.5%) | 6.9% | 40.4% (24.3% to 58.4%) | 23.6% (15.0% to 32.9%) |
| Global average (all 70 sites) | ERA5 | +23.3% (+6.5% to +47.6%) | 19.8% | 66.4% (37.2% to 95.1%) | 46.9% (25.2% to 69.8%) |

The purpose of this validation analysis is to enable users to estimate the accuracy of Solcast’s historical time series for their site(s) before committing to a subscription or integration effort. It is based on 15 years of data, compared against surface measurements from high-quality measurement sites. Error statistics and benchmarks against a common alternative are also included.
