Generally three types of data are used: data collected nationally by Government and made available to cities, data collected by private sector companies, and data collected by cities themselves. Some of this data is collected as a by-product of delivering services, while some is collected specifically for analytical purposes.
ONS data is widely used by most departments in all cities engaged. However, the data is not consistently visualised or presented in a way that makes it easy to draw insight from. This is an area that many cities have discussed improving.
There are further questions about the robustness of ONS data. In one city, the correlation with surgery-level birth rates suggests that ONS data underestimates the population by around 10%. Even when the data aligns with local knowledge, census data has a drawback: it is very good for the two to three years following the survey, but becomes less accurate with time.
There are many issues surrounding privacy. Data held by health, police, benefits and social care services is highly personal. This data is rarely shared between departments and is never published in its original form. There are legal restrictions on sharing children’s data.
Before publishing, any data that allows an individual or household to be traced must be pseudonymised or aggregated to an anonymous level. In practice, if fewer than six records fall in a single group, they are aggregated up to the next unit or boundary.
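The roll-up rule described above can be illustrated with a minimal sketch. The function name, the unit codes and the `parent_of` lookup are hypothetical and only serve the illustration; the threshold of six comes from the text. The sketch assumes each small unit has exactly one "next unit" to roll up into, and it performs a single roll-up pass only.

```python
from collections import Counter

# From the text: groups with fewer than six records are rolled up.
MIN_GROUP_SIZE = 6


def aggregate_small_groups(records, parent_of, min_size=MIN_GROUP_SIZE):
    """Count records per unit, rolling small units up to their parent unit.

    records   -- iterable of unit codes, one entry per record (hypothetical codes)
    parent_of -- dict mapping each unit code to its next-larger unit (assumed lookup)
    Returns a dict of {unit code: record count} after one roll-up pass.
    """
    counts = Counter(records)
    published = Counter()
    for unit, n in counts.items():
        if n < min_size:
            # Too few records to publish at this level: add them to the parent unit.
            published[parent_of[unit]] += n
        else:
            published[unit] += n
    return dict(published)


# Three small areas inside one larger area "A".
records = ["A1"] * 8 + ["A2"] * 3 + ["A3"] * 4
parent_of = {"A1": "A", "A2": "A", "A3": "A"}
print(aggregate_small_groups(records, parent_of))
```

Here "A1" keeps its own count of 8, while "A2" and "A3" (3 and 4 records) are combined into a count of 7 for the parent "A". A rolled-up total can itself still fall below the threshold, so a real process would need to re-check the result and roll up again where necessary.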
Cities are large objects that are difficult to define cleanly in relation to the systems and structures within them. In some cases data is required across a scale larger than the city itself, and in other cases there are inconsistencies between the boundaries used by different departments. Education, for example, clusters schools into groups that cross LSOAs and other commonly used boundaries.
The redefinition of LSOAs between Censuses creates limitations in making comparisons over time.
Private sector companies also collect and create data. The most widely used dataset is Mosaic, produced by Experian, which uses some modelling to create household-level socio-economic data. It is understood that many cities use this data but that it is acquired on a city-by-city basis.
The process of collecting and inputting data is largely manual and sensitive to the way each person collects it. There are consequently many issues around data quality: individually collected datasets do not follow standards (e.g. there is no consistent ID on which to join them), data is not linked across departments, historic data may have been collected in the wrong format, and it is not clear which data comes from a primary source and which from a secondary source.
The creation of metadata is important so that the robustness of data can be fully understood. This is one area where the INSPIRE legislation will have an impact.
To summarise, the collection of data creates many issues around the consistency, efficiency and sustainability of data sources. Cities need to use data at the level of the individual, household, property, street, estate, school, education boundary, allocation site, green space, surgery, hospital, LSOA, city, region, infrastructure network and sometimes the country.