And here are some emerging conclusions:
In one sense big data is nothing new – ever since the Romans conducted a census to aid tax collection in Britain, governments have been using data about citizens to help them manage more effectively. In slightly more recent times, the public sector, and particularly local authorities, has made increasingly sophisticated use of datasets to segment customers and provide more targeted and more efficient services. Big data is an extension of this sort of analysis.
And yet in another sense it is new. Many conversations about big data start by asking ‘what is big data?’ My answer is that it is a useful catch-all phrase for the new tools that let us store and analyse more data, more quickly and more cheaply than ever before. The two benefits of this that excite me most are:
· Greater opportunities to fish: in the past you would typically test a single hypothesis (for example, that truancy is related to crime), gather two datasets and run them against each other. Now you don’t need a grand hypothesis, because you can run multiple large datasets against each other; all you are looking for is a pattern that you can then attempt to explain. In statistical terms you are looking for correlation rather than causation, at least initially, and this makes life a lot more interesting.
· Less anxiety about ‘perfect data’: because you can search across so many datasets, and because you are looking for patterns rather than definitive answers, you can afford to be (a bit) less fussy about the data.
The public sector (and particularly local government) is perfectly placed to take advantage of big data: it holds lots of data about the public, including very personal data about, for example, income, health and lifestyle. Mining this data appropriately and legally could yield massive dividends. Moreover, this data has typically been stored in disparate places (different databases, different languages) and in a messy state. In the past these challenges have prevented effective data analysis; big data techniques can help overcome them.
However, the results so far have been underwhelming; disappointingly, I have yet to come across any dazzling examples of big data analysis yielding massive dividends. New York City is widely lauded as the most advanced city in its use of data, but the improvements it has made (better identification of unstable manhole covers, of buildings vulnerable to fire and of restaurants that dispose of cooking oil illegally) are marginal rather than transformative. Is big data a game of incremental improvements rather than large, exciting ones?
And the appetite for doing more is currently limited amongst public sector bodies and suppliers. Possibly because the results have been underwhelming, most organisations appear to be at the sage nod stage (‘this looks interesting’) rather than the action stage. This is perhaps unsurprising in public sector bodies, which want compelling evidence before investing time and money, but I was disconcerted that the offerings from suppliers are so cautious.
Yet the barriers may not be that high: I have yet to come across a barrier to big data that seems insurmountable. Issues of data compatibility and cleanliness are much easier to overcome than they used to be. Issues of data ownership and governance can be troublesome, depending on the personalities and context, but should always be soluble. And although the legal side is not yet crystal clear (admittedly, this is the sort of statement that sends shivers down the spines of legal departments), the ability to use aggregate data for the greater good makes it likely that courts will view the careful efforts of public sector bodies favourably.
At a time when public sector organisations are looking for an edge to help them save money without reducing services, investigating the insights that could be gleaned from big data, at relatively low cost and low risk, seems like an obvious thing to do.
Postscript: some colleagues and I are working with a handful of local authorities to carry out precisely this sort of investigation. We will be testing some specific hypotheses using whichever datasets are most easily accessible. Hopefully there will be some useful and interesting results to share at some point.