Libguides: Research Data: Open and/or preserve after project end

What happens to the research data after the project?

Your research data are valuable! By the end of the project at the latest, the data and metadata (the description of the data and their origin) must be archived and/or published/opened responsibly. Note that several major research funders require FAIR and open data, unless there are grounds to keep the material closed or to destroy it.
You can transfer the material to an archive service that takes care of it and makes sure that others can access it. You decide for which purposes your research data are made available (e.g. research and teaching), and whether you open all or only part of your data. Research ethics, legal and practical considerations can limit how, and to what degree, research data can be archived, accessed and published.
A research project often collects large amounts of data and material. It is worth considering already at the planning stage which datasets may be useful or important to keep. The researcher must also plan when and how the data may be destroyed. Even so, it is advisable to keep the data for at least 3–5 years so that research results can be verified if needed.

FAIR and open data

ÅAU's open science policy promotes the openness, transparency and reuse of research data following the FAIR principles, according to which research data should be findable, accessible, interoperable and reusable. According to the national policy on open research data and methods (2021-2025), research data and methods should be made as open as possible, and as closed as necessary. In addition, data should be managed properly so that they meet the FAIR principles.

FAIR data and open data are not synonymous, although the terms often appear together. Research data may be FAIR without being (completely) open, and data may be open without following the FAIR principles. The aim is to make research data as open and as FAIR as possible, while respecting the necessary legal and research ethics considerations.

For example, if it is not possible to open the data completely (as with sensitive data or data related to patents or innovations), the metadata (the description of the dataset) can still meet the FAIR principles. In many cases, anonymized data (from which personal, sensitive and confidential information has been removed) can be archived and/or published/opened for future use. Embargoes can be applied when the data cannot be made available immediately.

Where should I archive, publish or open my data after the project?

How to find a data archive/repository for your research data:

Use search services to explore data archives/repositories:
- and the FAIR filter
- - a registry of data archives/repositories
- Discipline-specific and general data archives are found below
Check your funder's requirements for FAIR and/or open data
Check the publisher's/journal's submission guidelines or data policy
- Some publishers/journals recommend or require that authors deposit their data in specific data archives/repositories. More information is usually found in the submission guidelines or in a data policy.
- Some publishers' data policies are listed at FAIRsharing.org.
Where do researchers in your field publish/open their data?
- A rule of thumb is to publish/open data in a data archive/repository where you find relevant and useful data in your field; then you can be sure that researchers in your field will find them.

When archiving data, it is important that

you inform research participants in advance about the intention to open the data,
the dataset receives a persistent identifier (PID), such as a DOI, handle or similar, which serves as a permanent link on the internet,
the metadata (description) follows specific standards and is detailed enough for others to find and understand the data,
the data/materials are provided with a license (Creative Commons, GNU etc.) that guarantees good conditions for reuse, and
the data archive/repository guarantees access in the long term.

Discipline-specific data archives/repositories are the first choice, provided that they adhere to the FAIR principles. If the criteria above are met, the archive/repository can be considered appropriate for your data. If there is no suitable discipline-specific archive for your data, a generalist data archive/repository is a good option. Note also that some repositories support versioning, so that you can upload a new version of your dataset if, for example, you find an error.

Discipline-specific data archives/repositories

Humanities, social sciences, health sciences etc.:

Finnish Social Science Data Archive (FSD) / Tietoarkisto is a curated archive that offers support in the archiving process and in following the FAIR principles. FSD accepts anonymized quantitative and qualitative research data on Finnish society, the Finnish population and cultural phenomena, but not audiovisual data. FSD is a digital repository for long-term archiving, and the researcher retains ownership of the material. Contact FSD well in advance and offer your research data to receive support and instructions. Remember to inform the research participants of your intention to archive in FSD.
Traditional archives and research archives preserve, in the traditional sense, qualitative research material from cultural or social science research, including audiovisual material. Some examples are the Cultural Sciences Archive Cultura, the Society of Swedish Literature in Finland (SLS) and the Archives of the School of History, Culture and Arts Studies (SHCAS Archives). Please note that such archives have their own processes and forms that must be followed. Contact them well before data collection for information on how to formulate your privacy notice and other information for the research participants. These archives prefer material that has not been anonymized or pseudonymized, and the material is donated to the archive, so that ownership is transferred to the archive.

Language research:

Language Bank of Finland / Språkbanken i Finland / Kielipankki - services for language research, with annotated speech and text corpora in more than 60 languages.

Natural sciences:

EMBL-EBI's data submission wizard - the abbreviation stands for European Molecular Biology Laboratory's European Bioinformatics Institute, and the wizard lists repositories for different biological subjects.
ELIXIR Deposition Databases for Biomolecular Data - ELIXIR is a European intergovernmental organization that brings together electronic research resources, such as data repositories.
Scientific Data journal's Recommended Data Repositories - Scientific Data is a Nature journal that requires the data presented in its articles to be made openly available.

Service for publishing code:

GitHub - platform used by millions for sharing of code for collaboration and version control.

Generalist data archives/repositories (all disciplines)

The following services are free for researchers to use and maintain good archiving policies:

Zenodo - archiving of all types of materials, such as "publications (book, book section, conference paper, journal article, patent, preprint, report, thesis, technical note, working paper, etc.), posters, presentations, datasets, images (figures, plots, drawings, diagrams, photos), software, videos/audio and interactive materials such as lessons".
- Open datasets and other open materials can be shared in Åbo Akademi's own community in Zenodo, Åbo Akademi University - ÅAopen!
FAIRdata.fi - a service package which consists of IDA for data storage (for stable datasets, open or closed), the Qvain tool for describing and publishing datasets, and the data finder Etsin for exploring available datasets
EUDAT - B2SHARE - a European network for storing data, with servers in Finland and, later on, the option of choosing among different European server locations.
Figshare - easy publishing of all kinds of research data in all formats, including video and datasets, under CC licenses.

Other services and tools for research

EOSC - European Open Science Cloud, a collection of open science services and tools for researchers in Europe
OSF.io - Open Science Framework, platform for open workflows

Data archives and repositories according to field
A more extensive list of archives and repositories according to field. PDF, 155 kB.

Guides for publishing and opening data

Five steps to decide what data to keep (Digital Curation Centre)
Researcher’s check list for publishing research data (Vastuullinentiede.fi)
The FAIR principles underpin good quality in research (Vastuullinentiede.fi)
How do I license my research data (OpenAire) - OpenAire guide for licensing research data

Why open data?

Opening data makes them available for reuse by other researchers and by society as a whole. Open data also promotes the transparency and reliability of research.

CC BY Danny Kingsley & Sarah Brown

Fairdata.fi - Finnish services for FAIR data

The Finnish FAIR data services, provided by CSC, consist of IDA for data storage, the Qvain tool for describing datasets, and the data finder Etsin for exploring available datasets.

As a minimum, it is recommended that you register a description (metadata) of your dataset in Qvain. This gives the dataset a landing page that lets other researchers find it in Etsin and refer to it.

Open licenses for research data

Applying an open license is a way of informing others of what rights they have to share and reuse your research data. Without a license, potentially valuable reuse may be unintentionally restricted.

Creative Commons licenses are often used, complemented by CC0 for data that are not covered by copyright law.
Graphics, diagrams etc. are covered by copyright law and are often included in publications; see the open access guide.
Algorithms, scripts and programs can be licensed under, for example, the MIT License or the GNU GPL.

How can I protect my data?

It is possible to legally protect one's data and restrict their reuse by referring to:

contractual conditions (the owner of the data restricts the rights to use them)
as a business secret (it may contain business secrets or such can be inferred from it)
as a database or catalogue
as a work (copyright)

Publishing data under a restrictive license (such as CC BY-NC-ND) is preferable to keeping them on your own hard drive.

Opening the data under CC BY (or under CC0 with a request to cite) explicitly gives others the right to reuse them. This can make things easier in the long run, because anyone who wants to use the data will not have to contact you or any other co-owners.