YLibrary? Embracing Open Data: The Library’s Role as Digital Curator by Rachel Cobcroft

In May 2011, Europe faced a significant health crisis: a deadly outbreak of the bacterium E. coli had emerged from an unknown food source, affecting 4000 people and killing 53. Researchers turned to the broader scientific community for help, releasing details of the sequenced bacterial genome via Twitter[i] and sharing publicly accessible sequence data via the NCBI (National Center for Biotechnology Information) database.[ii] Within 24 hours, international teams were uploading analyses and annotations to the open repository GitHub,[iii] and within days, possible ancestral strains were being identified. At record speed, scientists were able to pinpoint the source of the contamination, allowing authorities to isolate the farms in question and to declare the outbreak over by the end of July.[iv]

Such collaboration was enabled by the open licensing of the genomic data under the ‘no rights reserved’ CC0 licence.[v] This licence, released by Creative Commons,[vi] enables copyright holders to waive their rights to materials, placing them as completely as possible in the public domain. This allows scientists, educators, artists, and other creators to build upon, enhance, and reuse these materials for any purpose, without restriction under copyright or database law.

‘Open data’ is data ‘that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike.’[vii] According to the Open Knowledge Foundation, open data’s key features are:[viii]

  • Availability and Access: data must be available as a whole, and at no more than reasonable reproduction cost, preferably over the Internet. It must be in convenient and modifiable form;
  • Reuse and Redistribution: data must be made available under terms that permit reuse and redistribution, including intermixing with other datasets. It must be machine-readable;
  • Universal Participation: data must be available to everyone to use, reuse, and redistribute. There must be no restrictions against persons or groups, or against commercial interests, for example.

The many benefits of open data for both the institution and the community include greater accessibility, collaboration and innovation, greater transparency and accountability, and greater responsiveness of institutions to changing conditions, including emergency scenarios such as the above.[ix]

As individuals and institutions face increasingly complex computational challenges and grapple with exponentially increasing amounts of data, there is an urgent need to establish frameworks to support open data distribution, use, and reuse. It is here that the library of the 21st century plays a critical role – that of digital curator.

What is Digital Curation? Why Does It Matter?

The UK’s Digital Curation Centre (DCC) defines this essential research activity as follows:[x]

‘Digital curation involves maintaining, preserving, and adding value to digital research data throughout its lifecycle.’

By providing education about, and oversight of, the various stages of the digital curation lifecycle,[xi] depicted in the DCC model below, the library can ensure that data is appropriately captured, described, stored and secured, appraised and preserved, and disposed of, according to relevant policies, procedures, and legal requirements. Such frameworks will ensure that meaningful data is preserved for others to access, use, share, and re-use in both the short and long term.

The Digital Curation Centre’s Digital Curation Lifecycle Model[xii]

Digital curation processes also play a crucial role in guaranteeing that data is accurate, authentic, and has integrity; i.e., it is what it says it is, and has not been added to, deleted, or otherwise modified since creation. This is fundamental in many fields, and forms the very foundation of scientific endeavour. By its insistence on internationally recognised information standards, the library ensures that both the value and veracity of data can be established on an ongoing basis.
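
One common way to check that a file has not been modified since it was deposited is a fixity check: a cryptographic digest is recorded when the file is ingested and recomputed later for comparison. The following is a minimal sketch in Python, offered as an illustration only. The function names `sha256_of` and `verify_fixity` are hypothetical, and the choice of SHA-256 is an assumption; it is not drawn from the DCC model.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large datasets need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_digest: str) -> bool:
    """Return True if the file still matches the digest recorded at ingest."""
    return sha256_of(path) == recorded_digest
```

In practice the recorded digest would be stored with the dataset's metadata, so that anyone who later downloads the file can repeat the comparison. A matching digest shows only that the bytes are unchanged; it says nothing about whether the data was accurate when first captured.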

Embracing ‘Intelligent Openness’

Worldwide, scientific institutions such as the Royal Society have called for ‘intelligent openness,’[xiii] in which data and its associated metadata (‘data about data,’ which enables its retrieval, management, and use)[xiv] must be accessible, intelligible, assessable, and re-usable.

Here, the library plays an integral role in achieving intelligent openness – by encouraging owners of data to engage in the following steps, defined by the Open Knowledge Foundation:[xv]

  • Make your data available: in bulk and in a useful format;
  • Make it discoverable: put it on the web with its associated metadata;
  • Apply an open licence to your datasets.[xvi]

In this way, with the help of the library, we will be free to use, reuse, and redistribute data in all its forms.
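
The three steps above can be pictured as a simple catalogue record that travels with a dataset. The sketch below, in Python, is illustrative only: the field names, the `missing_fields` helper, the example title, and the `example.org` download address are all assumptions rather than any formal metadata standard. The licence address is the Creative Commons page for CC0 1.0.

```python
import json

# Hypothetical minimum fields for a catalogue record; not a formal schema.
REQUIRED_FIELDS = ("title", "description", "format", "download_url", "licence_url")


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Illustrative record; the title and download URL are placeholders.
record = {
    "title": "Example outbreak genome assembly",
    "description": "Sequence reads and assembly released for open analysis.",
    "format": "FASTA",  # step 1: bulk data in a useful, machine-readable format
    "download_url": "https://example.org/data/assembly.fasta",  # step 2: on the web
    "licence_url": "https://creativecommons.org/publicdomain/zero/1.0/",  # step 3: CC0
}

if __name__ == "__main__":
    print(json.dumps(record, indent=2))
    print("Missing fields:", missing_fields(record) or "none")
```

A check of this kind could be run before a dataset is published, so that a record without a licence or a download location is flagged rather than released in a form others cannot confidently reuse.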


    I agree that the library should be the centre of the community, and that access to knowledge is critical for children’s education. I sympathise that closures in Oakland may have affected your branch. It’s impressive how organised the http://saveoaklandlibrary.org/ campaign is, with a popular Facebook presence! I note that you contributed your story to the campaign also (http://saveoaklandlibrary.org/tell-us-your-stories/#comment-77), which now appears to have saved the library for the 2013-15 period! Great news!
    The digital library is one aspect of a library service. In fact, ensuring that materials are licensed openly means that they will exist for a long time – they can be shared easily, and they can be kept in multiple locations, and adapted to multiple devices. They can even be translated into many languages! Moreover, materials which are ‘free’ in both cost and licensing mean that there are as few impediments to access as possible – ensuring that people can enjoy them both now and into the future.

