The OKFN AU listserv is a mailing list for those who want to get involved in building communities around open knowledge and open data in Australia as a part of the Open Knowledge Foundation’s global network. It’s a fantastic place to learn more about open data in Australia, connect with others working in the area, and ask questions.
Recently I put a question to the list about the notion of ‘high value datasets’, asking whether there were useful criteria or methodologies for identifying them that members could point me to. I felt it was important to note the difficulty of trying to establish a normative measure of something that is inherently subjective (of value to whom? in what terms?). Despite the vagueness of my question, the list responded brilliantly. Here are some of the key suggestions that came back:
- feedback on the quality of published datasets should be routinely gathered and fed back to creators and decision makers
- downloads and page hits should be monitored for popularity of published datasets as well as of standard website pages and FOI requests – all of which helps build a picture of public interest trends (although be wary of automated workflows that artificially increase page hit counts)
- the Open Data Index offers a valuable bird’s eye view of where we are at in Australia and globally, and of where there are gaps and more progress needs to be made (note, however, that there was some debate as to how accurate and useful it really is beyond being a good marketing tool)
- criteria for ‘value’ can vary depending on the discipline, but having the dataset (more) publicly accessible is a key measure for determining how much its maintenance is worth supporting in $ terms; and
- adopting standards and tools for a frictionless data ecosystem (see OKFN Frictionless data) will in turn increase the value of data in the community.
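As a rough illustration of the caution about automated traffic inflating hit counts, here is a minimal Python sketch that tallies hits per dataset while skipping requests whose user agent looks automated. The dataset names, the log format and the list of user-agent markers are all assumptions made up for this example; real bot detection is considerably more involved.

```python
from collections import Counter

# Hypothetical markers for automated clients; not an exhaustive or authoritative list.
BOT_MARKERS = ("bot", "crawler", "spider", "curl", "python-requests")


def human_hits(log_entries):
    """Count hits per dataset, skipping entries whose user agent looks automated."""
    counts = Counter()
    for dataset, user_agent in log_entries:
        ua = user_agent.lower()
        if any(marker in ua for marker in BOT_MARKERS):
            continue  # likely an automated workflow, so leave it out of the tally
        counts[dataset] += 1
    return counts


# Made-up log entries: (dataset name, user agent string)
sample_log = [
    ("bus-timetables", "Mozilla/5.0 (Windows NT 10.0)"),
    ("bus-timetables", "ExampleBot/1.0"),
    ("bus-timetables", "python-requests/2.31"),
    ("tree-register", "Mozilla/5.0 (Macintosh)"),
]

print(human_hits(sample_log).most_common())
```

In this made-up sample, two of the three hits on the first dataset come from automated clients, so a raw count would overstate its popularity.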
Some neat matrices / sets of criteria that I found in my research, again with the help of the excellent OKFN AU community, include:
>>From the City of Philadelphia’s Open Data Census
- Publication Quality – The team found that whether a dataset was “published” is more complicated than “true or false,” and thus recorded information about what formats were available, how up-to-date they were, how well documented they were, etc., and used that information to inform a publication quality score.
- Other Cities – To get a sense of what high demand datasets were being released elsewhere and help inform departments of existing precedents, the team researched the data portals of four other major U.S. cities – Baltimore, Boston, Chicago, and New York City. Popular datasets not yet published by the City of Philadelphia were recorded as “unpublished” datasets.
- Demand / Impact – The team used information derived from an analysis of over 2,800 Right to Know requests, voting on the Open Data Pipeline Trello board, and nominations on OpenDataPhilly.org to estimate demand for each dataset using a scale of 1-5 (5 being greatest).
- Cost / Complexity – Information about the level of technical effort required to prepare each dataset for publishing was used to produce an estimate of the cost/complexity on a scale of 1-5 (5 being greatest).
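To make the two 1-5 scales concrete, here is a minimal Python sketch of one way such scores could be used to order datasets for release. The ranking rule (highest demand first, with ties broken by lowest cost), the `Dataset` class and the example entries are assumptions for illustration only; the description above does not say how Philadelphia actually combined the scores.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    demand: int  # 1-5, 5 being greatest demand / impact
    cost: int    # 1-5, 5 being greatest cost / complexity


def rank_for_release(datasets):
    """Order datasets by highest demand first, breaking ties by lowest cost.

    This rule is an assumption for illustration, not the Census's method.
    """
    for d in datasets:
        if not (1 <= d.demand <= 5 and 1 <= d.cost <= 5):
            raise ValueError(f"scores for {d.name!r} must be between 1 and 5")
    return sorted(datasets, key=lambda d: (-d.demand, d.cost))


# Hypothetical datasets and scores, invented for this example.
candidates = [
    Dataset("hypothetical dataset A", demand=5, cost=4),
    Dataset("hypothetical dataset B", demand=3, cost=1),
    Dataset("hypothetical dataset C", demand=5, cost=2),
]

for d in rank_for_release(candidates):
    print(f"{d.name}: demand {d.demand}, cost {d.cost}")
```

Other rules are equally plausible, for example a weighted difference of the two scores; which is appropriate depends on how much an agency wants effort to count against demand.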
>>From Steve Bennett (http://stevebennett.me/):
“three criteria when pondering priorities for government data release:
1. Uniqueness: to what extent are there no other sources of this information? A council’s collection of street information is valuable but there’s a lot of overlap with OpenStreetMap, for instance. But no one else could have the garbage collection zone boundaries.
2. Maintenance. Datasets age pretty quickly, and a dataset that’s more than a year out of date seems to go downhill in value pretty fast.
3. Reusability: was the data being collected with a general purpose in mind, or are there limitations due to the original purpose for which it was collected (eg, lack of comprehensiveness, idiosyncratic groupings, jurisdictional filtering…)”
>>From the European Commission’s Report on High Value Datasets from EU Institutions, 2014:
“a dataset may be considered of high-value when one or more of the following criteria are met
It contributes to transparency:
These datasets are published because they increase the transparency and openness of the government towards its citizens. For instance the publication of parliaments’ data, such as election results, or the way governmental budgets are spent, or staff cost of public administrations all contribute to the transparency of the way public administrations are working.
Its publication is subject to a legal obligation:
In some cases the publication of data is enforced by law.
The PSI Directive for instance, regulates the publication of policy-related documents by (semi) public organisations.
It directly or indirectly relates to their public task:
A public administration may publish a dataset because it directly relates to its public task. For instance DG CLIMA may publish statistics on CO2-emission as part of its task for raising awareness about climate change.
It realises a cost reduction:
The availability and re-use of a dataset, e.g. contact information, code lists, reference data and controlled vocabularies, eliminates the need for duplication of data and effort, reduces costs and increases interoperability.
Collections of data housed in the base registers and geospatial data are prime examples of dataset which opening up will lead to direct cost reductions in data management, production and exchange.
The type and size of its target audience:
A dataset may be useful for/relevant to a large audience (size-based value), for instance traffic data.
On the other hand a dataset may bring large value to a specific target audience (target/subject-based value), for instance a dataset containing data of particles colliding at high speed in a particle accelerator.”
About the author
Cassie Findlay is a Senior Consultant with Recordkeeping Innovation. In past roles, Cassie has worked strategically at the whole of public sector level on digital recordkeeping, training and open data / open government initiatives, and implemented NSW’s first digital archive for born-digital government records. Cassie has a Master’s degree in information management from the University of NSW and is a co-founder of the Recordkeeping Roundtable.