
Research Data: Open and/or preserve after project end

Tips and support for data management for researchers at ÅAU

What happens to the research data after the project?

Your research data are valuable! By the end of the project at the latest, the data and metadata (the description of the data and their origin) must be archived and/or published/opened in a responsible manner. Note that several major research funders require FAIR and open data unless there are grounds to keep the material closed or to destroy it.
You can deposit the material in an archive service that takes care of it and ensures that others can access it. You decide for which purposes your research data become available (e.g. research and teaching), and whether you open all or only part of your data. Research-ethical, legal and practical considerations can limit how, and to what degree, research data can be archived, accessed and published.
A research project often collects large amounts of data/material. It is worth considering already at the planning stage which datasets may be useful or important to save. The researcher must also plan when and how any data will be destroyed. However, it is advisable to keep data for at least 3–5 years so that research results can be verified if needed.

FAIR and open data

ÅAU's open science policy promotes the openness, transparency and reuse of research data following the FAIR principles, according to which research data should be findable, accessible, interoperable and reusable. According to the national policy on open research data and methods (2021–2025), research data and methods should be made as open as possible and as closed as necessary. In addition, data should be managed properly so that they meet the FAIR principles.

FAIR data and open data are not synonymous, although the terms often appear together. Research data may be FAIR without being (completely) open, and data may be open without following the FAIR principles. The aim is to make research data as open and as FAIR as possible while respecting the relevant legal and research-ethical requirements.

For example, the metadata (the description of a dataset) may meet the FAIR principles even when it is not possible to open the data completely (such as sensitive data or data related to patents or innovations). In many cases, anonymized data (from which personal, sensitive or confidential information has been removed) can be archived and/or published/opened for future use. Embargoes can be applied when the data cannot be made available immediately.

Where should I archive, publish or open my data after the project?

How to find a data archive/repository for your research data:

  • Use search services to explore data archives/repositories:
    • DataCite Commons and the FAIR filter
    • Brand - a registry of data archives/repositories
    • Discipline-specific and general data archives are found below
  • Check your funder's requirements for FAIR and/or open data
  • Check the publisher's/journal's submission guidelines or data policy
    • Some publishers/journals recommend or require that authors deposit their data in specific data archives/repositories. More information is usually found in the submission guidelines or in a data policy.
    • Some publishers' data policies are listed at FAIRsharing.org.
  • Where do researchers in your field publish/open their data?
    • A rule of thumb is to publish/open data in a data archive/repository where you find relevant and useful data in your field; then you can be confident that researchers in your field will find them.

 

When archiving data, it is important that

  • you inform research participants in advance about the intention to open the data,
  • the dataset receives a persistent identifier (PID), such as a DOI, a handle or similar, which serves as a permanent link on the internet,
  • the metadata (description) follow established standards and are detailed enough for others to find and understand the data,
  • the data/materials are provided with a license (Creative Commons, GNU, etc.) that guarantees good conditions for reuse, and
  • the data archive/repository guarantees long-term access.

Discipline-specific data archives/repositories are the first choice, provided that they adhere to the FAIR principles. If the criteria above are met, the archive/repository can be considered appropriate for your data. If there is no suitable discipline-specific archive for your data, a generalist data archive/repository is a good option. Note also that some repositories support versioning, so that you can upload a new version of your dataset if, for example, you find an error.

 

Discipline-specific data archives/repositories

Humanities, social sciences, health sciences etc.:

  • Finnish Social Science Data Archive (FSD) / Tietoarkisto - a curated archive that provides support for the archiving process, proofreading, etc. Deposited data adhere to the FAIR principles. The archive accepts both quantitative and qualitative data (but not audio-visual materials).

Language research:

Natural sciences:

Service for publishing code:

  • GitHub - a platform used by millions for sharing code, collaboration and version control.
 

Generalist data archives/repositories (all disciplines)

The following services are free for researchers to use and have sound archiving policies:

  • Zenodo - archiving of all types of materials, such as publications (book, book section, conference paper, journal article, patent, preprint, report, thesis, technical note, working paper, etc.), posters, presentations, datasets, images (figures, plots, drawings, diagrams, photos), software, videos/audio and interactive materials such as lessons.
  • FAIRdata.fi - a service package consisting of IDA for data storage (for stable datasets, open or closed), the Qvain tool for describing and publishing datasets, and the data finder Etsin for exploring available datasets.
  • EUDAT B2SHARE - a European network for storing data, with servers in Finland and a possibility to choose other European server locations later.
  • Figshare - easy publishing of all kinds of research data in all formats, including video and datasets, under CC licenses.

 

Other services and tools for research

  • EOSC - European Open Science Cloud, a collection of open science services and tools for researchers in Europe
  • OSF.io - Open Science Framework, platform for open workflows

Guides for publishing and opening data

Why open data?

Opening data makes them available for reuse by other researchers and by society as a whole. Open data also promote the transparency and reliability of research.

CC BY Danny Kingsley & Sarah Brown

Fairdata.fi - Finnish services for FAIR data

The Finnish FAIR data services, provided by CSC, consist of IDA for data storage, the Qvain tool for describing datasets, and the data finder Etsin for exploring available datasets.

At a minimum, it is recommended to register a description (metadata) of your dataset in Qvain. This gives the dataset an entry page that allows other researchers to find it in Etsin and to refer to it.

Open licenses for research data

Applying an open license is a way of informing others of the rights they have to share and reuse your research data. Without a license, potentially valuable reuse may be unintentionally restricted.

  • Creative Commons licenses are often used, complemented by CC0 for data that are not covered by copyright law.
  • Graphics, diagrams, etc. are covered by copyright law and are often included in publications; see the open access guide.
  • Algorithms, scripts and programs can be licensed under, for example, the MIT license or the GNU GPL.

How can I protect my data?

It is possible to legally protect one's data and restrict their reuse on the following grounds:

  • contractual conditions (the owner of the data restricts the rights to use them)
  • business secrets (the data contain business secrets, or such secrets can be inferred from them)
  • protection as a database or catalogue
  • protection as a work (copyright)

Publishing data under a restrictive license (such as CC BY-NC-ND) is preferable to keeping them on your own hard drive.

Opening the data under the CC BY license (or CC0 combined with a request to cite) explicitly gives others the right to reuse them. This can make things easier in the long run, because anyone who wants to use the data will not have to contact you or any other co-owners.