Anna Rosso and I have just published the next phase of our big data project. Kindly funded by NESTA, this builds on the work we did with Google last year. As before, we’re working with Growth Intelligence, who developed the very nice multi-layer dataset we use. We’ll be publishing a further paper sometime in the New Year.
The abstract is below, or take a look at this writeup in the FT.
Governments around the world want to develop their ICT and digital industries. Policymakers thus need a clear sense of the size and characteristics of digital businesses, but this is hard to do with conventional datasets and industry codes. This paper uses innovative ‘big data’ resources to perform an alternative analysis at company level, focusing on ICT-producing firms in the UK (which the UK government refers to as the ‘information economy’). Exploiting a combination of public, observed and modelled variables, we develop a novel ‘sector-product’ approach and use text mining to provide further detail on the activities of key sector-product cells. On our preferred estimates, we find that counts of information economy firms are 42% larger than SIC-based estimates, with at least 70,000 more companies. We also find ICT employment shares over double the conventional estimates, although this result is more speculative. Our findings are robust to various scope, selection and sample construction challenges. We use our experiences to reflect on the broader pros and cons of frontier data use.