CaptureMap is based on public data. That means the facilities on the map measure or calculate their CO2 emissions and report them to their national authorities, who then publish the data openly. We pick the data up and integrate it into our tool. Sounds easy, right?
One of the main databases we use is the E-PRTR database, which stands for European Pollutant Release and Transfer Register. This database is a treasure trove of emissions data, at least for CO2 geeks like us. The database actually covers many other pollutants, but we’re mostly into CO2 data. Alas, public data does not necessarily mean easy to use and free. Despite many qualities, the E-PRTR has several downsides. Let’s look at four of them.
1. The location data is sometimes good, sometimes completely wrong
At the moment the E-PRTR database includes 3634 facilities with CO2 emissions in Europe. That is too many facilities to quality-check their locations one by one. At the same time, correct location data is essential for analysing potential emission clusters and transport routes.
Looking at the map to the right, one would assume that the data looks correct, except for the missing data in Norway and Slovakia.
But knowing a bit about industry in France, you might start questioning what is happening in Bretagne in the west for example. That region is famous for its farming, but not for its industrial activity, so there should not be significant emitters there.
Let’s zoom in on “ArcelorMittal Mediterranée”, France’s second largest iron and steel plant. The plant is actually located in Fos-Sur-Mer (near Marseille in the south of France), but in a few versions of the database its location had moved to Bretagne (west of France). This is a pretty major error, in particular when you know that this facility is the second largest CO2 emitter in France.
Actually, most of the sites in France were wrongly located in earlier versions of the database. You can easily see that on the map to the right: notice the lack of overlap between the blue and grey dots? These are facilities from version 3 (blue) and version 7 (grey) of the E-PRTR.
And this was not limited to France: there were wrong locations in Germany, Ireland and other places too.
Knowing a little bit about industry and geography, you would easily catch some of these issues, but repeat the exercise 3000 times and you get a good headache. In order to make CaptureMap we had to cross-check the location data of our facilities with other databases, manually correct many of them, and establish a master database with what we believe is the correct location for all facilities in Europe. We do this so that our users can trust what they see on the map.
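To give an idea of what such a cross-check can look like, here is a minimal sketch that compares the coordinates of the same facilities in two versions of a dataset and flags the ones that moved further than a chosen distance. The facility labels, the coordinates and the 50 km threshold are assumptions made up for illustration; this is not our production code and these are not actual E-PRTR records.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def flag_moved_facilities(old, new, max_km=50.0):
    """Return {facility: distance_km} for facilities that moved more than max_km."""
    moved = {}
    for name, (lat_old, lon_old) in old.items():
        if name not in new:
            continue  # facility absent from the newer version, nothing to compare
        lat_new, lon_new = new[name]
        distance = haversine_km(lat_old, lon_old, lat_new, lon_new)
        if distance > max_km:
            moved[name] = round(distance)
    return moved

# Hypothetical, approximate coordinates (latitude, longitude) for illustration only
version_a = {"Steel plant": (48.10, -3.00), "Cement plant": (45.75, 4.85)}
version_b = {"Steel plant": (43.43, 4.90), "Cement plant": (45.76, 4.86)}

suspicious = flag_moved_facilities(version_a, version_b)
print(suspicious)
```

A check like this does not tell you which of the two locations is right, only which facilities deserve a closer look.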
2. It contains errors in the reported CO2 amounts
Every site in the E-PRTR database measures or calculates its CO2 emissions and reports them to the relevant authorities. This process often relies on manual work, involving calculations in Excel spreadsheets and copy-pasting data into forms. With manual work comes human error, and this sometimes leads to under- or over-reporting of the emission data.
Take Alcoa Fjarðaál in Iceland. The site has consistently been reporting CO2 emissions of around 540 000 tonnes per year, except in 2019, when it reported only 536 tonnes. This is about 1000 times too low, and someone probably reported the emissions in tonnes in a database that requires kilograms.
Here is another example: the landfill site “Centre de Stockage de St-Jean de Libron” in France had never reported significant CO2 emissions until version 3 of the E-PRTR database, when it suddenly reported 455 million tonnes of CO2 emissions for 2019.
That sounds high, too high for a landfill site, and even too high for a whole country: the total CO2 emissions in France are about 350 million tonnes per year. How could one facility emit more than a whole country? This error has been corrected in more recent versions of the database, but similar issues regularly come up in updates.
We embedded quality checks in our processing code for CaptureMap. Among other things, we check for deviations from earlier emission patterns, we implement thresholds, and we cross-check emission numbers across databases. This allows us to catch and correct many reporting errors.
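As an illustration of these kinds of checks, the sketch below tests a newly reported value against the facility's own history and against a country-level ceiling. The function name, the factor-of-10 tolerance and the range used to spot a factor-1000 unit mix-up are assumptions chosen for this example, not the actual rules used in CaptureMap. The history values are rounded, illustrative figures.

```python
from statistics import median

def check_emission(history, value, national_total=None, max_ratio=10.0):
    """Return a list of warnings for a newly reported yearly value (tonnes CO2).

    history: earlier yearly values for the same facility (tonnes CO2).
    national_total: optional country total (tonnes CO2), used as a hard ceiling.
    max_ratio: how far from the historical median a value may be before it is flagged.
    """
    warnings = []
    if national_total is not None and value > national_total:
        warnings.append("exceeds national total")
    past = [v for v in history if v > 0]
    if past:
        ratio = value / median(past)
        if ratio > max_ratio or ratio < 1 / max_ratio:
            warnings.append("deviates from history")
            # A factor close to 1000 hints at a unit mix-up (tonnes vs kilograms)
            if 500 <= ratio <= 2000 or 1 / 2000 <= ratio <= 1 / 500:
                warnings.append("possible unit error (factor ~1000)")
    return warnings

# Smelter reporting around 540 000 tonnes per year, then 536 tonnes
smelter = check_emission([540_000, 545_000, 538_000], 536)

# Landfill with no significant history, then 455 million tonnes,
# compared with a national total of about 350 million tonnes
landfill = check_emission([], 455_000_000, national_total=350_000_000)

# A value in line with earlier years raises no warning
normal = check_emission([540_000, 545_000, 538_000], 550_000)
print(smelter, landfill, normal)
```

Flagged values still need a human decision: the check only points at numbers that look implausible, it does not correct them.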
3. Some countries are lagging behind in reporting
The E-PRTR reporting relies on every country in Europe sending its emission data for the previous year to the EU every year. Unfortunately some countries are lagging behind: Germany, for example, has not sent data since 2017 (red in the figure below). Other significant emitters are also behind: the UK and Italy still have not sent their data for 2020 (orange and yellow in the figure, respectively). Minor emitters like Slovakia and Norway also stopped reporting in 2017.
The situation is improving, and each new version of the E-PRTR database contains updates for some of the missing countries. But this still creates serious challenges when trying to give an up-to-date overview of emissions for large emitters like Germany: not only is the latest emission data old, but none of the new sites that started emitting CO2 after 2017 are included in the database for that country.
We make up for this major gap in the database by relying on other databases and on our in-house emission model, so that we can show a complete and up-to-date overview of emitters and CO2 data.
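Spotting which countries are behind is the easy part. A minimal sketch, assuming the data has been reduced to (country, reporting year) pairs, could look like this. The country codes and years below are made up for the example and the function names are hypothetical.

```python
def latest_reported_year(records):
    """Map each country to the most recent reporting year found in the records."""
    latest = {}
    for country, year in records:
        if year > latest.get(country, 0):
            latest[country] = year
    return latest

def lagging_countries(records, expected_year):
    """Return {country: years_behind} for countries whose latest data is older than expected_year."""
    return {
        country: expected_year - year
        for country, year in latest_reported_year(records).items()
        if year < expected_year
    }

# Illustrative (country, reporting year) pairs, not actual E-PRTR rows
records = [("DE", 2016), ("DE", 2017), ("FR", 2019), ("FR", 2020), ("IT", 2019)]
lagging = lagging_countries(records, expected_year=2020)
print(lagging)
```

Filling the gaps is the hard part, and that is where other databases and an emission model come in.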
4. It is difficult to track facilities over time
One neat feature of the E-PRTR database is that it contains emission data for several years, sometimes going back to before 2010. This makes it possible to track emissions over time for specific facilities.
That is, as long as there is a way to recognise the facilities over time: many facilities change name and parent company. The database contains a field with a unique ID (‘Facility_INSPIRE_ID’). Unfortunately that unique ID … is not unique. And when the location data is also inaccurate and changing, there is hardly any way to track facilities over time. Imagine trying to keep track of an old friend who has changed name, address and phone number. You’d have to do some serious stalking (sorry, research) to be able to keep in touch.
Take for example Vattenfall’s Nordjyllandsværket in Denmark. The facility has been emitting about one million tonnes of CO2 each year over the past five years, so this is a pretty significant emitter. Nordjyllandsværket had a unique ID number until 2015, then it changed in 2018 and 2019, and then again in 2020. And notice how data is missing for 2016 and 2017 on the chart below? Luckily we are able to estimate these missing data points with our in-house model.
Here is another example: Ardagh Glass Barnsley in the UK. The site has had three ID numbers over the course of 10 years.
We track facilities over time, check their names, locations and emissions, and assign them our own unique ID to make sure that we show the most complete data in CaptureMap.
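Here is a simplified sketch of how records can be linked over time without trusting the reported ID: a record is attached to an already known facility in the same country when the normalised name matches or when the coordinates are very close. The record layout, the 1 km tolerance, the ID format and the example values are all assumptions for illustration, and real matching needs more safeguards than this.

```python
import math

def normalise(name):
    """Lower-case a facility name and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def distance_km(a, b):
    """Approximate distance between two (lat, lon) points, adequate for short distances."""
    (lat1, lon1), (lat2, lon2) = a, b
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6371.0 * math.hypot(x, y)

def assign_stable_ids(records, max_km=1.0):
    """Return one stable ID per record, reusing an ID when name or location matches.

    records: list of dicts with keys 'name', 'country' and 'coords'.
    The reported ID is deliberately ignored, since it may change between years.
    """
    known = []   # one entry per facility seen so far
    result = []
    for rec in records:
        match = None
        for fac in known:
            if fac["country"] != rec["country"]:
                continue
            same_name = normalise(rec["name"]) in fac["names"]
            close = distance_km(rec["coords"], fac["coords"]) <= max_km
            if same_name or close:
                match = fac
                break
        if match is None:
            match = {
                "stable_id": f"{rec['country']}-{len(known) + 1:04d}",
                "country": rec["country"],
                "names": set(),
                "coords": rec["coords"],
            }
            known.append(match)
        match["names"].add(normalise(rec["name"]))
        result.append(match["stable_id"])
    return result

# Hypothetical reported IDs and approximate coordinates, for illustration only
records = [
    {"reported_id": "DK.A.001", "name": "Nordjyllandsværket", "country": "DK", "coords": (57.070, 10.040)},
    {"reported_id": "DK.B.777", "name": "NORDJYLLANDSVÆRKET", "country": "DK", "coords": (57.070, 10.040)},
    {"reported_id": "DK.C.123", "name": "Nordjyllandsvaerket A/S", "country": "DK", "coords": (57.071, 10.041)},
    {"reported_id": "GB.X.009", "name": "Ardagh Glass Barnsley", "country": "GB", "coords": (53.55, -1.48)},
]
ids = assign_stable_ids(records)
print(ids)
```

In this example the three Danish records end up with the same stable ID even though their reported IDs and spellings differ, while the UK site gets its own.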
This happens despite the EU having a pretty good standard for data reporters
The EU’s Manual for Reporters describes the requirements for reporting data to the EU Registry on Industrial Sites database, which includes the E-PRTR data. Some of the requirements there are pretty specific, but unfortunately they are not being followed in practice. Here are some examples:
- Point 2.8 about deadlines for reporting states that “reports must be submitted annually within 9 months of the end of the reporting year”. However, to this date some countries still have not reported data for 2017, which puts them more than five years past their reporting deadline (see section 3 above).
- Point 3.3 about identifiers sets four requirements for compliance, including “Persistence: The identifier has to remain unchanged during the lifetime of the entity.” In practice identifiers do change (see section 4 above).
- Point 3.4 about coordinates mentions that the geographic coordinates should be “to five decimal places” and represent “the approximate centre of a given entity”. As seen above in section 1, this is far from being the case. Despite the decimal places, the coordinates do not represent the site location for many of the facilities in the database.
Regardless of all these issues, we wish more countries and regions would publish their data like the EU does
The issues listed in this article are just a few of those we encounter on a daily basis with public data. We could also mention the fact that Denmark does not report biogenic CO2 emissions (which can be significant for power from biomass, or waste-to-energy facilities), or that the activity categorisation of some of the sites is wrong (for example power plants being categorised as “iron and steel” because they use blast furnace gas as a fuel).
Making CaptureMap sometimes gives us headaches, but it is always possible to quality-check, filter and fix incorrect data. It is much more difficult to work with emissions when there is no data available at all.
Many countries have now implemented a PRTR in one way or another, but few publish their database of site-specific data for CO2 emissions like the European Commission does. For that reason, and despite all these issues with missing or incorrect data, we still wish many more countries in the world would publish their data like the EU does.
Our quick assessment of key points for the E-PRTR dataset
Based on our experience with CaptureMap, a few key points make for a good CO2 emission database. Here is how the E-PRTR dataset does:
- Site names and ownership data: the database reports both facility name and parent company name. There are no further details about company ownership structure.
- CO2 emission data: yearly CO2 emissions are provided, and some countries distinguish between fossil and biogenic CO2. Most countries include biogenic CO2 emissions in total CO2 (Denmark is an exception). The lower threshold for reporting is 100 000 tonnes CO2/year per facility, which is higher than the thresholds some countries use at the national level (e.g. 10 000 tonnes CO2/year). There are unfortunately errors in the reported amounts for some of the sites.
- Reporting coverage: the coverage of the reporting is mostly complete, meaning that most sites with emissions above the threshold are reported in the dataset. An exception is biogenic CO2 emissions from the production of biofuels (e.g. biogas): emissions from fermentation are not included in the reporting. This can represent significant biogenic amounts for larger production plants.
- Activity data: the facilities are categorised by activity following standard activity categories. The activity categorisation of some specific sites is incorrect, and it is not always consistent between countries. The main fuel type for power plants is not provided in the dataset.
- Location data: both geographic coordinates and facility address are provided in the database. Unfortunately many coordinates are imprecise or completely wrong.
- Details about CO2 sources at facility level: details are not provided in this dataset.