The Independent Expert Advisory Group on ‘data revolution for sustainable development’ set up by the United Nations, and led by Claire Melamed, published the first public draft of their report on October 24, 2014. The PDF version of the report can be accessed here.
Various potential criticisms of the draft report aside, it surely marks a splendid beginning. Though sometimes unevenly, it crucially includes explicit acknowledgements of (1) the reality of an unfolding ‘data revolution’ almost without public regulation and enforcement of the ‘data rights’ of individuals, (2) the inadequacy of the actually existing ‘data revolution’ to achieve sustainable development for all, (3) the importance of re-imagining National Statistical Organisations as nodal agencies in directing the ‘data revolution for sustainable development,’ (4) the need to adopt open data principles and practices, and (5) the critical necessity of international and national principles and policies to govern the ‘data rights’ of individuals. Of course, this is a very selective list from the great range of themes covered in the draft report.
Below are the comments submitted by me in response to the first public draft report. The relevant sections of the report are in bold followed by my comments.
Data are the lifeblood of decision-making, and the raw material for accountability. Without data, we cannot know how many people are born and die; how many men and women still live in poverty; how many children need educating, and how many teachers to train or schools to build; the prevalence and incidence of diseases; if water is polluted or if the fish stocks in the ocean are dangerously low; how many adolescent girls are getting pregnant and what policies are effective in helping them; what companies are trading and whether demand for their product is expanding.
Here is an inflation of the meaning of data to all forms of record-keeping. This may create a vagueness around the term ‘data’ early on. Data, in this context, refers to a specifically structured form of record-keeping that allows easier querying, analysis, extraction of insight and portability (across different media / file formats). Further, calling data the ‘lifeblood’ of decision-making tends to de-politicise the processes of decision-making, and the reference is perhaps best avoided. Also, decision-making is critically shaped by which party has access to what kind of data. It is in this way, perhaps, that data can be called the ‘lifeblood’ of decision-making.
To know what we need to know involves a deliberate and systematic effort of finding out. It means seeking out high quality information that can be compared over time, between and within countries, and continuing to do so, year after year. It means careful planning, spending money on technical expertise, robust systems, and ever changing technologies. It means building public trust in the data, and expanding people’s ability to use it.
The phrase ‘To know what we need to know’ leads to three difficulties – (1) it can be understood as ‘to know about the things that are important for us to know’ and also as ‘to know what things we should be wanting to know,’ (2) it introduces a ‘we’ that is not defined yet, and (3) it does not mention the objective of the ‘data revolution’ that is being talked about. While the first requires re-phrasing, the second and third are crucial. The full paragraph gives a sense of the ‘we’ being the governments of the world. If that is the case, it needs a clear statement early on. If the ‘data revolution’ is about both top-down and bottom-up approaches, then that needs to be stressed much more. Say, by adding the ideas of ‘enabling data as medium of monitoring and transparency of governance processes, ensuring public institutes both support and respond to data-driven engagements with the public, and expanding people’s ability to produce, co-produce, and act upon data.’
And now the stakes are rising. In 2015, the world will embark on an even more ambitious initiative, a new development agenda underpinned by the Sustainable Development Goals (SDGs). Achieving these goals will require integrated action on social, environmental and economic challenges, with a focus on being inclusive and thus ensuring that no one is left behind. This in turn will require another significant increase in the information that is available to governments, civil society, companies and international organisations to plan, monitor and be held accountable for their actions.
‘…significant increase in the information’ – Perhaps the word ‘information’ is best avoided (and the word ‘data’ used instead), unless of course the sentence specifically means information as opposed to data.
Fortunately, this challenge has come together with a huge opportunity. The volume of data in the world is increasing exponentially: one estimate has it that 90% of the data in the world has been created in the last two years. As the graph below demonstrates, the volume of both existing sources of data (represented in the graph by the number of household surveys conducted) and new sources (represented by mobile subscriptions per 100 people) have been rising, as has the openness of data (illustrated by the number of surveys placed on line). Thanks to new technologies, the volume, level of detail, and speed of data available on societies, the economy and the environment is without precedent. Governments, companies, researchers and citizens groups are in a ferment of experimentation, innovation and adaptation to the new world of data. This is the data revolution.
‘Thanks to new technologies…’ – The phrasing gives a sense of technology being the driving force in the production of an increasing amount of (born-digital) data in the contemporary world (as opposed to business interests being the driving force, see: http://www.theatlantic.com/technology/archive/2014/08/advertising-is-the-internets-original-sin/376041/). This positioning of technology is controversial and best avoided. Rephrasing it as ‘Enabled by new technologies…’ will perhaps neutralise the positioning.
Further, as the embedded chart shows, a large share (if not the majority) of this newly available data is privately owned. The implications of this private ownership of a large quantity of data about citizens can be discussed later in this report, but its reality deserves a mention in this paragraph.
When discussing the existing ‘ferment of experimentation, innovation and adaptation,’ it should also be mentioned that there is an unfolding competition between governments, private entities and citizen groups in producing and getting access to data. While certain parts of this competition (say, cyber-surveillance) may not be discussed in this report, its reality is important to note, since it may have negative impacts upon the goal of sustainable development.
Revolutions do not begin with reports, and the data revolution is no different. This report is not about how to create a data revolution – it is already happening – but how to mobilise it for sustainable development. It is an urgent call for action now to support the aspiration for sustainable development and avert major social and environmental disasters, to stop and reverse growing information inequalities, and to ensure that the promise of the data revolution is realised for all.
Great paragraph. The only suggestion is the possible introduction of the idea of negative impacts of the ‘data revolution.’ The paragraph makes it seem that the ‘data revolution’ is great but just not realised for everyone yet. Though this is discussed in the next section, it will be useful to have a phrase connecting ‘data revolution’ to ‘growing information inequalities.’ Also, perhaps consider calling it ‘data inequalities’ and not ‘information inequalities.’
This involves new sources of data – satellite imagery, social media or anonymous mobile phone records, or data created and willingly shared by citizens to monitor and reflect their own circumstances and priorities. It involves the quantification of what was previously considered qualitative data – for instance, defining proxies for the measurement of happiness or the fulfilment of human rights. Bringing together established and new sources in the service of sustainable development can shed new light on old problems, reveal new possibilities for action, identify what remains to be done and provide the real time monitoring that allow policies to be adapted for maximum effect. To fulfil this promise, it must be done in a way that adheres to the highest standards of honesty, respect of privacy, rigour and impartiality that have been developed over decades and centuries of academic research, statistical practice and political negotiation.
When it comes to engaging with qualitative data, it is true that the ‘data revolution’ so far has primarily been interested in its quantification (through proxy variables, sentiment analysis, etc.). While this gives new instruments for the ‘data revolution for sustainable development’ to work with qualitative data, I agree with Neva Frecheville that the phrasing in this paragraph suggests that this is the only potential way of using qualitative data for sustainable development. I second Frecheville’s suggestion to insert a sentence stating that the ‘data revolution’ allows new ways of generating and combining quantitative, as well as qualitative data, to ‘allow for a more timely, nuanced … decision making.’
It is also a revolution of expectations – of people demanding that these changes and innovations be used to enhance their control over their own lives and the decisions that affect them. Data is the bedrock of accountability. More information opens up the possibility for an honest, informed dialogue between service providers and beneficiaries, between tax payers and governments who spend tax revenues, between companies and employees and between the private sector, governments and civil society. Data is the basis for social compacts and ultimately this contributes to improving the responsiveness, efficiency and effectiveness of institutions, and, eventually, the overall welfare of citizens.
‘Data is the bedrock of accountability’ – The metaphor of ‘bedrock’ is problematic. The bedrock for accountability is perhaps the (social) contract. But keeping that aside, it is important nonetheless to emphasise the importance of data in ensuring accountability. The word to use here is perhaps not accountability but transparency. What the availability of (reliable) data does is make processes transparent to those who were not involved in the process itself. But transparency can also be ensured through other means, such as opening up decision-making processes to public participation. Hence the point to emphasise here is the key role data can play in ensuring transparency of processes at (the global) scale.
‘More information opens up the possibility…’ – ‘Information’ can perhaps be substituted by ‘data,’ so as to avoid the quick shift from the usage of ‘data’ in the previous sentence.
Further, there is a darker side to the ‘revolution of expectations’ point. The ‘data revolution’ tends to normalise the expectation that extraction and mining of personal usage data (of digital devices and networks) is an acceptable business model in particular, and an acceptable form of data gathering in general. The ‘data revolution for sustainable development’ may have to act against the normalisation of this expectation, which is also hinted at later in the report.
But the data revolution comes with a range of new risks, posing questions and difficult challenges concerning the rights to access and use data. Fundamental issues of human rights: privacy, respect for minorities or data sovereignty requires us to balance the rights of individuals with the benefits of the collective. As more is known about people and the environment, so there is a correspondingly greater risk that the information could be used to harm, rather than to help. They could be harmed deliberately, if the huge amount that can be known about people’s movements, their likes and dislikes, their social interactions and relationships is used with malicious intent, such as discriminating in access to services. Or they could be harmed inadvertently, if information that has not been checked for quality or standardised in accepted ways is used for policy or decision making and turns out to be wrong.
‘… challenges concerning the rights to access and use data’ – The challenges also involve conceptualisation and enforcement of the rights to opt out of behaviour quantification and data collection through digital devices and networks.
‘… balance the rights of individuals with the benefits of the collective’ – Please also add the economic opportunities of data as the third factor to be balanced against ‘rights of individuals’ and ‘benefits of the collective.’
‘… the information could be used’ – Change to ‘the data could be used.’
Major gaps are already opening up between the information haves and have-nots. Without action, a whole new inequality frontier will open up, splitting the world between those who know, and those who do not.
‘Major gaps are already opening up between the information haves and have-nots’ – Please change to ‘data haves and have-nots.’
New institutions, new actors, new ideas and new partnerships are needed, and all have something to offer the revolution. But national statistical offices, the traditional guardians of public data for the public good, will remain central to government efforts to harness the data revolution for sustainable development. To fill this role, however, they will need to change, and more quickly than in the past, and continue to adapt, abandoning expensive and cumbersome production processes, incorporating new data sources, and focusing on providing data that is human and machine-readable, compatible with geospatial information systems and available quickly enough to ensure that the data cycle matches the decision cycle. In many cases, technical and financial investments will be needed to enable those changes to happen.
It is indeed commendable that the National Statistical Offices are being re-imagined in this report as a fundamental actor in realising the ‘data revolution for sustainable development.’ The addition of two concerns here, however, can be crucial.
Firstly, along with the desired qualities of the data created by NSOs mentioned here, please also suggest ‘openness’ of data as a fundamental pre-condition for collection and usage of data for sustainable development. ‘Openness,’ in this context, is needed not only to ensure the wide availability and usability of the data concerned across various types of actors, but also to make the data collection processes themselves more transparent, and thus accountable.
Secondly, harnessing possibilities opened up by ‘data revolution,’ as discussed above, requires various government agencies to team up. Data privacy and rights guidelines will perhaps be prepared by Ministries of Information Technology, Communication and Law, while implementation of government-wide re-engineering of data collection, processing, archival and analysis processes will perhaps be governed by Ministries of Personnel, Governance Innovation, and Home Affairs. NSOs are often responsible for undertaking only the major data gathering exercises of the government (such as census and national sample surveys). ‘Data revolution for sustainable development’ will require more granular and broad-based changes in how governance of and through data happens across the state agencies.
Not enough good quality data. In a world increasingly awash with information, it is shocking how little we know about some people and some parts of our environment.
‘…how little we know about’ – The use of ‘we’ here is misleading. Perhaps it can be changed to ‘how little is publicly known about.’
Data that is not used or not usable. To be useful, data must be of high quality and must be made accessible to those who want or need to use it. Comparability and standardisation are crucial, as they allow data from different sources or time periods to be combined, and the more data can be combined, the more useful it is. Combining data allows for changes of scale – aggregating data from different countries to produce regional or global figures. It allows for comparison over time, if data on the same thing collected at different moments can be brought together to reveal trends. Too much data is still produced using different standards – household surveys that ask slightly different questions or geo-spatial data that uses different geographical definitions. And too little data is available at a level of disaggregation that is appropriate to policy makers trying to make decisions about national level allocation or monitoring equitable outcomes. This prevents researchers, policy makers, companies or NGOs from realising the full value of the data produced.
It would be useful to identify in this paragraph that a great volume of data is also not used or usable either because it is privately owned, or because it is not available in open formats and under open licenses. The former issue highlights the predominantly privately-owned nature of data coming out of the ‘data revolution.’ The latter foregrounds that though government, private, academic and civic entities do produce a significant amount of data, it often goes unused not only because of its lacking quality, but also because of its simple unavailability as data open for re-usage and re-sharing.
It’s not only about standards. Access is often restricted behind technical and/or legal barriers that prevent or limit effective use of data. Data buried in pdf documents, for example, is much harder for potential users to work with, though it represents an improvement on data that is only accessible to a small pool of well-connected statisticians and policy makers; administrative data that are not transferred to statistical offices; data generated by the private sector or by academic researchers that are never released or data released too late to be useful; data that cannot be translated into action because of lack of operational tools to leverage it. This is a huge loss in terms of the benefits that could be gained from more open data and from linking data across different sectors.
This will be a great opportunity to mention the need to embrace ‘open data’ as a necessary condition of data coming out of the ‘data revolution for sustainable development.’ The argument is already present in the text, but the term ‘open data’ is missing.
The value of better and more open data. As well as being important in its own right for accountability purposes, through its impact on policy and behaviour better and more open data can save money and create economic, social and environmental value. Although research in this area is still limited, modelling exercises and evidence from actual examples illustrates the scale of the potential impact of better and more open data on the economy.
A caveat regarding the estimates of the economic value of open data can perhaps be mentioned here. Many such estimates take for granted the gathering of personal data of users of digital devices and networks as an unproblematic starting point for value creation (in terms of monetisation of personal data). It is important for the ‘data revolution for sustainable development’ to address this issue critically, and not avoid its discussion on the justification of economic value creation.
Mobilising the data revolution for sustainable development and ending information inequalities is a long and complex endeavour. The main objective is to enable data to play its full role in the realisation of sustainable development by closing key gaps: between developed and developing countries, between information-rich and information-poor people, and between the private and public sectors.
‘[I]nformation inequalities’ can perhaps be substituted by ‘data inequalities (and hence all the various forms of inequality of power and privileges it creates and is created by).’
Basic Principles for Data Revolution for Sustainable Development
Great section overall. Congratulations!
Just to add one thing – along with principles and standards, it is important for the UN to push governments to enact policies enforcing such principles and standards. Without policies backing them, such measures are often difficult to enforce, and especially difficult for citizens to monitor and audit.