Using the COINS data to recreate PESA reports
On June 4th, 2010 HM Government released Combined Online Information System (COINS) database, the main database used by HM Treasury budgeting. I’ve written a brief overview of the data here. This post is about a little programming project I set myself to actually do something with this data.
The COINS data is in a raw format and needs to be converted to a more useful format before any useful reports can be generated. Mainly out of curiosity, I set myself a project to do something useful with this data. I wanted to do three things:
- Extract useful subsets of the data that were small enough to be loaded into Excel.
- Load the COINS data into an SQL database.
- Recreate some of the government PESA(Public Expenditure Statistical Analyses) reports from the COINS data and check that my results are the same.
The project to do all this is written in python, and I’ve placed it on github here
Extracting useful subsets of the COINS data
The COINS fact_table_extract_2008_09.txt file contains 2,043,129 records that have a non-zero value field: of these 20,566 are outturn records, 1,610,900 are forecast records, 13,733 are plans records, and 397,930 are snapshot records. Older Excel spreadsheets are limited to 65535 rows, and I wanted to produce a useful data set that fit within this limit. The non-zero outturn records easily fit, I’ve made them available on rapidshare: coins2008_09 and coins2009_10.
Loading the COINS data into an SQL database.
Checking my reports against published government data
HM Treasury produces Public Expenditure Statistical Analyses(PESA) reports. I wanted to recreate some of these reports from the COINS data.
I’ve produced outturn Resource, Capital, DEL and AME reports by departmental group from the 2008-09 COINS data, and checked it against the estimated 2008-09 outturn from the PESA 2009 report and the actual 2008-09 outturn from the PESA 2010 report.
The PESA data used was:
The spreadsheet PESA 2010, Chapter 1 tables which contains tables of DEL, AME and TME grouped by Resource and Capital expenditure (from webpage PESA 2010 section 1 – Budgets). 2008-09 subtotals by departmental group are provided in Column G (2008-09 outturn) of the worksheets:
- Table 1.3 Resource budgets, 2004-05 to 2008-09 (by departmental group)
- Table 1.6 Capital budgets, 2004-05 to 2008-09 (by departmental group)
The spreadsheet PESA 2010, Chapter 1 tables which contains tables of DEL, AME and TME grouped by Resource and Capital expenditure (from webpage Public Expenditure Statistical Analyses 2009). 2008-09 subtotals by departmental group are provided in Column G (2008-09 estimated outturn) of the worksheets:
- Table 1.5 Resource budgets, 2003-04 to 2010-10
- Table 1.10 Capital budgets, 2003-04 to 2010-11
The following tables give my results and the PESA result side by side for comparison.
|Children, Schools and Families||46,848||46,848||46,848|
|Innovation, Universities and Skills||N/A||N/A||16,666|
|CLG Communities and Local Government||28,755||4,105
|Law Officers’ Departments||723||724||733|
|Foreign and Commonwealth Office||2,027||2,027||2,025|
|Energy and Climate Change||292||293||1,016|
|Business, Enterprise and Regularity Reform||175||N/A||1,594|
|Environment, Food and Rural Affairs||2,449||2,446||2,654|
|Culture, Media and Sport||1,455||1,456||1,633|
|Work and Pensions||7,937||7,937||8,059|
|Northern Ireland Executive||7,926||7,926||8,117|
|Northern Ireland Office||1,177||1,177||1,342|
|Children, Schools and Families||-2||10,652||10,914|
|Innovation, Universities and Skills||N/A||N/A||274|
|CLG Communities and Local Government||1,282||621
|Law Officers’ Departments||9||9||N/A|
|Foreign and Commonwealth Office||-28||-28||-7|
|Energy and Climate Change||2,403||2,403||4,589|
|Business, Enterprise and Regularity Reform||1||N/A||819|
|Environment, Food and Rural Affairs||0||0||-52|
|Culture, Media and Sport||3,890||3,890||3,911|
|Work and Pensions||135,344||135,344||135,546|
|Northern Ireland Executive||6,711||6,104||8,142|
|Northern Ireland Office||395||396||264|
|Children, Schools and Families||5,519||5,519||5,634|
|Innovation, Universities and Skills||0||N/A||2,123|
|CLG Communities and Local Government||7,233||7,112
|Law Officers’ Departments||8||9||12|
|Foreign and Commonwealth Office||226||227||218|
|Energy and Climate Change||1,667||1,667||1,688|
|Business, Enterprise and Regularity Reform||2||N/A||16|
|Environment, Food and Rural Affairs||609||610||618|
|Culture, Media and Sport||841||842||791|
|Work and Pensions||91||91||57|
|Northern Ireland Executive||1,233||1,233||1,137|
|Northern Ireland Office||67||68||78|
|Children, Schools and Families||0||N/A||N/A|
|Innovation, Universities and Skills||0||N/A||4,230|
|CLG Communities and Local Government||3,766||516||-|
|Law Officers’ Departments||0||N/A||N/A|
|Foreign and Commonwealth Office||41||N/A||0|
|Energy and Climate Change||-35||-279||-419|
|Business, Enterprise and Regularity Reform||-210||N/A||N/A|
|Environment, Food and Rural Affairs||-12||1||0|
|Culture, Media and Sport||571||572||717|
|Work and Pensions||126||136||142|
|Northern Ireland Executive||611||378||413|
|Northern Ireland Office||0||N/A||N/A|
There is good agreement between some of my calculated results and the PESA reports, but there are also some significant differences between the figures.
In the “Resource DEL” and “Capital DEL” tables there is generally good agreement, except for Health, Transport, Defence, Cabinet Office and Independent Bodies departmental groups. The Cabinet Office and Independent Bodies discrepancies could be because some departmental data is missing from the COINS data (see next section). I think the differences in the values for Health are probably because the PESA reports do not include the figures for Scotland, but I haven’t checked this. I have no explanation for the differences between the figures for Transport and Defence.
The agreement for the “Resource AME” and “Capital AME” tables is less good. Some of this disagreement is for the same reasons as the “DEL” differences, but I suspect the main reason for the differences is that my SQL query for the “AME” figures is not correct. (Any corrections to my SQL queries are most appreciated, as are any bug reports or corrections to my programs.)
Missing COINS departmental data
The COINS data does not seem to have spending figures for the following government departments:
- Assets Recovery Agency
- Department for Business, Enterprise and Regulatory Reform
- Office of Communications
- Forrestry Commision
- National Investment and Loans Office
- Royal Mint
- Office of Government Commerce
- Security and Intelligence Agencies
This may because the departments have been renamed, or subsumed into other departments.
I’ve satisfied my curiosity about the COINS data. I’ve managed to recreate some of the PESA reports, and although there are differences between my reports and the PESA reports, the differences are mostly explainable. I’m not Cliff Stoll, and won’t be tracking down the exact nature of the differences. I don’t plan to do any further work on the COINS data, but may look at future data that is released, especially if it is more detailed.
Feel free to use my programs if you wish – it would nice if you credit me if you do so. I’m also interested in other programming projects that use the COINS data – feel free to post a comment with a link if you do anything interesting with the COINS data.