Research Data: Open and/or preserve after project end

Tips and support for data management for researchers at ÅAU

What happens to the research data after the project?

Your data are valuable! Preserve, archive, publish, or open your research data and the metadata (descriptions of the data) in a responsible way, at the latest by the end of the project. Note that major research funders may require FAIR and open data unless there are reasons for keeping the data closed or destroying them.

A research project collects and generates large amounts of data. It is therefore a good idea to consider, already at the planning stage, which datasets may be useful or valuable in the future.

The researcher needs to assess which data (all or part of them) should be preserved, for which purposes access can be provided (for example, research only, or also teaching), and for whom. Research data may be deposited in a data archive, which curates the data and provides access. In addition, legal, contractual and research-ethical considerations may affect the extent to which the data can be published and opened. Disposal of data also requires planning in advance.

FAIR and open data

ÅAU's open science policy promotes the openness, transparency and reuse of research data following the FAIR principles, according to which research data should be findable, accessible, interoperable and reusable. According to the national policy on open research data and methods (2021-2025), research data and methods should be made as open as possible and as closed as necessary. In addition, data should be managed properly to meet the FAIR principles.

FAIR data and open data are not synonymous, although the terms often appear together. Research data may be FAIR without being (completely) open, and data may be open without following the FAIR principles. The aim is to make research data as open and as FAIR as possible while respecting legal and research-ethical requirements.

For example, the metadata (description of a dataset) may meet the FAIR principles even when the data themselves cannot be completely opened (such as sensitive data or data related to patents or innovations). In many cases, anonymized data (from which personal, sensitive and confidential information has been removed) can be archived and/or published/opened for future use. Embargoes can be applied when the data cannot be made available immediately.
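The distinction between open metadata, closed data and embargoed data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, its parameters and the three status labels are hypothetical and do not correspond to any particular archive's access model. The metadata are assumed to be public in every case.

```python
from datetime import date
from typing import Optional


def access_status(is_open: bool, embargo_until: Optional[date], today: date) -> str:
    """Describe who can reach the data files (hypothetical status labels).

    The metadata are assumed to be publicly findable in all three cases.
    """
    if not is_open:
        # Data stay closed (e.g. sensitive data); only the description is public.
        return "metadata only"
    if embargo_until is not None and today < embargo_until:
        # Data will be opened, but not before the embargo date.
        return "embargoed"
    return "open"


# Made-up dates for illustration.
print(access_status(True, date(2030, 1, 1), date(2029, 6, 1)))
```

A dataset that is never opened can still be FAIR at the metadata level, which is why the sketch treats the description and the data files separately.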

Where should I archive, publish or open my data after the project?

How to find a data archive/repository for your research data:

  • Check your funder's requirements for FAIR and/or open data
  • Check the publisher's/journal's submission guidelines or data policy
    • Some publishers/journals recommend or require that authors deposit the data in specific data archives/repositories. More information is usually found in the submission guidelines or in a data policy.
    • Some publishers' data policies are listed at FAIRsharing.org.
  • Use search services to explore data archives/repositories:
    • Repository Finder and the FAIR filter (a service maintained by DataCite that builds on Re3data)
    • Re3data - a registry of data archives/repositories
  • Where do researchers in your field publish/open their data?
    • A rule of thumb is to publish/open data in a data archive/repository where you find relevant and useful data in your field; researchers in your field are then likely to find your data as well.

 

When archiving data, it is important that

  • the dataset receives a persistent identifier (PID), such as a DOI, handle or similar, which serves as a permanent link on the internet,
  • the metadata (description) follows specific standards and is sufficiently detailed, so that others can find the dataset and understand what it is about,
  • the data/materials are provided with a license (Creative Commons, GNU etc.) which guarantees good conditions for re-use, and
  • the data archive/repository guarantees access in a long-term perspective.
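The checklist above can be turned into a small self-check before depositing. The following Python sketch is only an illustration: the field names (`pid`, `description`, `license`, `repository_long_term_access`) and the 50-character threshold standing in for "sufficiently detailed" are assumptions made for this example, not the metadata schema of any real archive.

```python
import re

# Simple pattern for a bare DOI such as "10.1234/abcd"; an assumption for this sketch.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def check_deposit(record: dict) -> list[str]:
    """Return the checklist criteria that a planned deposit does not yet meet."""
    problems = []

    pid = record.get("pid", "")
    # Accept a bare DOI or a Handle URL as a persistent identifier.
    if not (DOI_PATTERN.match(pid) or pid.startswith("https://hdl.handle.net/")):
        problems.append("no persistent identifier (DOI, handle or similar)")

    # Arbitrary threshold standing in for "sufficiently detailed".
    if len(record.get("description", "").strip()) < 50:
        problems.append("metadata description is missing or too short")

    if not record.get("license"):
        problems.append("no license stated")

    if not record.get("repository_long_term_access", False):
        problems.append("repository does not guarantee long-term access")

    return problems


example = {
    "pid": "10.1234/example.5678",  # made-up DOI for illustration
    "description": "Anonymized interview transcripts from a hypothetical study, with codebook.",
    "license": "CC-BY-4.0",
    "repository_long_term_access": True,
}
print(check_deposit(example))
```

An empty list means that all four criteria are met. Whether a description really is detailed enough remains a judgment call that a length check cannot replace.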

Discipline-specific data archives/repositories are recommended in the first instance, provided that they adhere to the FAIR principles. As long as the criteria above are met, the archive/repository can be considered appropriate for your data. If there is no suitable discipline-specific archive for your data, a generalist data archive/repository is a good option.

 

Some discipline-specific data archives/repositories

Humanities, social sciences, health sciences etc.:

  • Finnish Social Science Data Archive (FSD) / Tietoarkisto - a curated archive which provides support for the archiving process, proofreading etc. Deposited data adhere to the FAIR principles. The archive accepts both quantitative and qualitative data (not audio-visual materials).

Language research:

Resources for finding data archives/repositories in natural sciences (also according to data types):

Service for publishing code:

  • GitHub - a platform used by millions for sharing code, collaboration and version control.
 

Some generalist data archives/repositories (all disciplines)

The following are free to use for researchers and maintain a good archiving policy:

  • Zenodo - archiving of all types of materials, such as "publications (book, book section, conference paper, journal article, patent, preprint, report, thesis, technical note, working paper, etc.), posters, presentations, datasets, images (figures, plots, drawings, diagrams, photos), software, videos/audio and interactive materials such as lessons".
  • FAIRdata.fi - a service package which consists of IDA for data storage (for stable datasets, open or closed), the Qvain tool for describing and publishing datasets, and the data finder Etsin for exploring available datasets.
  • EUDAT - B2SHARE - European network for storing data, with servers in Finland and, later, the possibility of choosing among different European server locations.
  • Figshare - easy publishing of all research data. All formats, including video and datasets. CC licenses.

 

Other services and tools for research

  • EOSC - European Open Science Cloud, a collection of open science services and tools for researchers in Europe
  • OSF.io - Open Science Framework, platform for open workflows

Guides for publishing and opening data

Why open data?

Opening data makes them available for reuse by other researchers and by society as a whole. Open data also promote the transparency and reliability of research.

CC BY Danny Kingsley & Sarah Brown

Fairdata.fi - Finnish services for FAIR data

The Finnish FAIR data services, provided by CSC, consist of IDA for data storage, the Qvain tool for describing and publishing datasets, and the data finder Etsin for exploring available datasets.

The recommended minimum effort is to make a description (metadata) of your dataset available in Etsin. Enter the metadata using the Qvain tool, which provides your dataset with a landing page. Once the dataset has been published, other researchers and research funders can find it in Etsin.

Open licenses for research data

Applying an open license is a way of informing others of what rights they have to share and reuse one's research data. Without a license, potentially valuable reuse may be unintentionally restricted.

  • Creative Commons licenses are often used, complemented with CC0 (for data which are not covered by copyright law).
  • Graphics, diagrams etc. are covered by copyright law and are often included in publications; see the open access guide.
  • Algorithms, scripts and programs can be licensed under the MIT or GNU GPL licenses.
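The pairing of material types and licenses in the list above can be written down as a simple lookup. The Python sketch below is illustrative only: the category names are hypothetical, and the specific license versions (given as SPDX identifiers) are assumptions, since the list names license families rather than versions.

```python
# Mapping follows the list above; category names are hypothetical and the
# license versions (SPDX identifiers) are assumptions made for this sketch.
LICENSE_SUGGESTIONS = {
    "data": ["CC-BY-4.0"],
    "data_without_copyright": ["CC0-1.0"],
    "code": ["MIT", "GPL-3.0-or-later"],
}


def suggest_licenses(material_type: str) -> list[str]:
    """Return candidate licenses for a material type, or an empty list if unknown."""
    return LICENSE_SUGGESTIONS.get(material_type.strip().lower(), [])


print(suggest_licenses("code"))
```

An empty result means that the lookup has no suggestion, for example for graphics that are part of a publication, where the publication's own licensing applies.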

How can I protect my data?

It is possible to legally protect one's data and restrict their reuse on the basis of:

  • contractual conditions (the owner of the data restricts the rights to use them)
  • business secrets (the data may contain business secrets, or such secrets can be inferred from them)
  • protection as a database or catalogue
  • copyright (protection as a work)

Publishing data under a restrictive license (such as CC BY-NC-ND) is preferable to keeping them on your own hard drive.

Opening the data under the CC BY license (or under CC0 with a request to cite the source) explicitly gives others the right to reuse them. This may prove beneficial in the long run, since those who want to use the data will not have to track down every single participant in the creation of the data to get permission for reuse.