Libguides: Research Data: Documentation and metadata

Why documentation and metadata?

Documentation means describing what happens to the data during the research process. Researchers are the experts on their own research data and material, so they are the ones best placed to create the documentation needed. Data without documentation and metadata are meaningless, since they cannot be understood or re-used. If it is difficult to assess what documentation is needed, imagine what an outsider would need to know in order to understand what your data is about and how it can be used.

The researcher documents the work throughout the project on several levels: 1) project level (background information, methods, etc.), 2) file level (relations between files), and 3) variable level (descriptions of variables, their origin, use, etc.). The FAIR principles (findable, accessible, interoperable and reusable) must characterize the entire research process, with reproducibility as the criterion for how the research process and data are documented and which data are opened in the long term.
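As an illustration of variable-level documentation, the following is a minimal sketch of a code book written as a CSV table. The variable names, column names and descriptions are hypothetical placeholders, not a prescribed format; a discipline-specific standard may require other fields.

```python
import csv
import io

# Hypothetical variables for illustration only; replace with your own.
CODEBOOK = [
    {
        "variable": "resp_id",
        "description": "Anonymous respondent identifier",
        "type": "integer",
        "unit": "",
        "origin": "assigned at data entry",
    },
    {
        "variable": "height_cm",
        "description": "Body height of the respondent",
        "type": "float",
        "unit": "cm",
        "origin": "measured on site",
    },
]


def codebook_to_csv(rows):
    """Serialize variable descriptions to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


codebook_csv = codebook_to_csv(CODEBOOK)
print(codebook_csv)
```

Keeping the code book in a plain, tabular file next to the data means it can be read both by people and by software.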

Examples of documentation are:

code books/schemas, lab books, field diaries, notes,
descriptions of settings and calibrations for instruments and equipment,
descriptions of methods used,
readme file: a .txt file that describes the origin of the data and its contents,
administrative documents related to the research project, such as research plans, data management plans, contracts and agreements, research permits, scientific publications, permissions for data use, licenses, etc.

In general, documentation practices vary across disciplines and depend on the needs of the project.

The advantages of good documentation are:

The contents of the research project and its data are made understandable for the researcher and for others. Without documentation it is difficult to remember afterwards what has been done, when and how.
The risk of misinterpretation and misunderstanding is minimized.
Documentation is needed at the end of the project, at the latest when archiving and publishing/opening the research data. Proper documentation practices from the very start of the project make the archiving process smoother.
Detailed documentation is necessary for validation of results and potential replications of the study.

Metadata means "data about data" and refers to the information needed to understand, interpret and use the data: for example, the origin of the data, who collected or generated it, the time, place and methods, and keywords that describe the main content. Consequently, metadata is a crucial part of the documentation. A central aspect of the FAIR principles is that the metadata is structured and machine-readable, which means that the data can be transferred between different data services.
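To make "structured and machine-readable" more concrete, here is a minimal sketch of a metadata record stored as JSON. The field names and values are hypothetical and do not follow any particular metadata standard; in practice, the repository or discipline usually dictates the schema.

```python
import json

# All values are hypothetical placeholders. The field names are
# illustrative and are not taken from a specific metadata standard.
metadata = {
    "title": "Example survey dataset",
    "creators": ["Example Researcher"],
    "date_collected": "2023-01-15",
    "location": "Example City",
    "methods": "Online questionnaire",
    "keywords": ["example", "survey"],
    "license": "CC-BY-4.0",
}

# Assumed minimum set of fields for this sketch.
REQUIRED_FIELDS = ("title", "creators", "date_collected", "methods", "keywords")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in the record."""
    return [name for name in required if not record.get(name)]


metadata_json = json.dumps(metadata, indent=2, ensure_ascii=False)
print(metadata_json)
print("Missing fields:", missing_fields(metadata))
```

Because the record is structured, a simple check such as `missing_fields` can flag incomplete descriptions before the data is deposited.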

CHECKLIST: Can your research be reproduced?

1. Is your work steered by a data management plan throughout the entire data lifecycle so that all the data processing procedures are open and sufficiently documented?

2. How have you taken into account the openness of data and usage restrictions throughout the process?

3. Have you utilised shared practices, such as standards and glossaries, in the metadata and the actual data?

4. Have you systematically documented the lifecycle of the research data and is the description accurate? Are as many data processing stages as possible automated and is the code stored? Are the software and settings used documented (technical documentation)?

5. Have you versioned the data and other outputs?

6. Have you stored the data and its documentation in a referenceable form (persistent identifiers and metadata)?

When reproducibility is ensured, the FAIR principles are implemented naturally as part of the research process and there is no need to create data or documents separately in the publication phase of the article or data.

Lehtisalo, A. et al. (2023). Improve the quality and impact of your research through data management - A guide for making your data FAIR. Zenodo. https://doi.org/10.5281/zenodo.8012377
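Checklist item 4 asks whether the software and settings used are documented. One possible way to capture this automatically, sketched here under the assumption that the processing is done in Python, is to save a small record of the environment alongside the results. The function name and the settings shown are hypothetical.

```python
import json
import platform
from datetime import datetime, timezone


def describe_environment(settings):
    """Collect basic technical documentation for one processing run."""
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        # Copy the settings so later changes do not alter the record.
        "settings": dict(settings),
    }


# Hypothetical settings for illustration only.
record = describe_environment(
    {"random_seed": 42, "input_file": "survey_V01-00.csv"}
)
print(json.dumps(record, indent=2))
```

Storing such a record with each output makes it easier to state later which software version and settings produced a given result.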

Guides about documentation

CSC - data documentation https://research.csc.fi/metadata-and-documentation

DCC - disciplinary metadata https://www.dcc.ac.uk/guidance/standards/metadata

Finnish Social Science Data Archive's guide on data description and metadata https://www.fsd.tuni.fi/en/services/data-management-guidelines/data-description-and-metadata/#metadata-standards

Siiri Fuchs, & Mari Elisa Kuusniemi. Making a research project understandable - Guide for data documentation (Version 1.2). Zenodo. http://doi.org/10.5281/zenodo.1914401

Improve the quality and impact of your research through data management - A guide for making your data FAIR - AVOTT working group (2023)

Organize your data files

Tired of not finding what you are looking for? It is a good idea to have a clear system for file management. Some best practices when organizing your files:

Create a simple, consistent and meaningful system for file names at the very beginning of the project. Do not use the same file name more than once.
Create a logical folder structure so that files are easier to search for and find. When needed, use a hierarchical structure (main folders and subfolders).
Tag files to find them more easily. A file can be located in only one folder, but it can have many tags.
Use version control to manage old and new versions of the files, either manually or with software for automatic version control (e.g. in GitLab). Manual version control is usually enough in projects that are not data intensive. Indicate the version at the end of the file name, for example: V02-03
Write a readme file that contains all the information needed for interpreting the data, for example the origin of the data, its contents and the naming conventions. Place the readme file in a logical location in the folder together with the data files.
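The manual versioning convention above can be sketched in a few lines of code. This example assumes that the two numbers in a suffix such as V02-03 are a major and a minor version, and that the suffix is separated from the rest of the name by an underscore; the guide itself does not specify either, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed pattern: "_V" + two-digit major + "-" + two-digit minor,
# placed at the end of the file name (before the extension).
VERSION_PATTERN = re.compile(r"_V(\d{2})-(\d{2})$")


def versioned_name(stem, major, minor, suffix):
    """Build a file name that ends with a version marker such as V02-03."""
    return f"{stem}_V{major:02d}-{minor:02d}{suffix}"


def parse_version(filename):
    """Return (major, minor) from a file name, or None if it has no version."""
    match = VERSION_PATTERN.search(Path(filename).stem)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def latest_version(filenames):
    """Return the file name with the highest version, ignoring unversioned files."""
    versioned = [name for name in filenames if parse_version(name) is not None]
    return max(versioned, key=parse_version) if versioned else None


files = [
    "interviews_V01-00.csv",
    versioned_name("interviews", 2, 3, ".csv"),
    "interviews_V01-10.csv",
    "readme.txt",
]
print(latest_version(files))
```

Zero-padding the numbers keeps the files in version order when they are sorted alphabetically in a file browser.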