BASIC INFORMATION
What are research data?
“Research data are (digital) data that, depending on the scientific context, are related to, originate from, or are the result of a research process.” (Kindling et al. 2013).
Scientific data are created by a variety of methods, depending on the research question. These include studying source material, experiments, measurements, descriptions, surveys and polls. The data form the basis of scientific results. As a result, research data are discipline- and project-specific, with different requirements for processing and managing them.
Since research data are necessary to verify the results based on them, the preservation of such data is a recognised part of good scientific practice (see, for example, “DFG-Leitlinien zum Umgang mit Forschungsdaten” (Guidelines on the handling of research data)).
Research data include measurement data, laboratory results, audio-visual information, texts, survey data, and objects from collections or samples that are produced, developed or evaluated during scientific work. Software, simulations and images are also included.
Publishing research data provides opportunities not just for researchers, but also for science in general:
Opportunities for researchers
- Your research becomes more visible. Publications are cited significantly more often when the underlying data are publicly available (Piwowar and Vision 2013).
- The publication of research data is gaining more and more recognition as a scientific achievement.
- You can increase the quality and credibility of your research by offering others a chance to verify your data.
- You comply with the current requirements of research funding agencies (see above).
- You can protect your own research investment by setting embargo periods.
Opportunities for science
- The publication of data opens up new potentials for research as data become available for re-analysis in the context of new research questions and methods or for combining data from different sources.
- It also reduces the production of redundant scientific data, which saves time and money.
Research funding agencies and the scientific community increasingly demand Open Access to research data (achieved by publication of data in Open Access) so that published research results can be verified and the data accessed for reuse.
The science ministers of the G8 signed one of the most important international commitments to Open Science in 2013: “…to the greatest extent and with the fewest constraints possible, publicly funded scientific research data should be open […] whilst acknowledging the legitimate concerns of private partners.” (G8 Science Ministers 2013). The German Federal Ministry of Education and Research supports research data management initiatives that include the publication of research data.
The geoscientific community is engaged internationally in the project Enabling FAIR Data, led by the American Geophysical Union (AGU), the Earth Science Information Partners (ESIP) and the Research Data Alliance (RDA). The project promotes open and FAIR research data and advocates the publication of research data in the Enabling FAIR Data Commitment Statement.
Some research funders, such as the EU (Pilot on Open Research Data in the Horizon 2020 programme) and the DFG (“Leitlinien zum Umgang mit Forschungsdaten” (Guidelines on the Handling of Research Data)), urge scientists to publish their research data.
The Registry of Research Data Repositories re3data offers a global overview of data repositories and is especially useful when you are searching for data. To find repositories in re3data that A) accept data uploads from scientists worldwide and B) adhere to the high standards defined by the Enabling FAIR Data project, use Repository Finder, a tool that filters the large re3data database accordingly. Alternatively, you can follow our recommendation of three data publication services:
GFZ Data Services
GFZ German Research Centre for Geosciences cooperates with FID GEO in data publishing. It has been issuing Digital Object Identifiers (DOIs) for data sets since 2004 and publishes data sets in GFZ Data Services, the data repository of GFZ. Almost all geoscientific disciplines are covered by this service.
Datasets are submitted online by the author and are described using an online metadata editor. This editor is easy to use and provides extensive help functions. You may refer to the “Quick Start Guide for Data Publications”.
As a special feature, GFZ Data Services can publish the research data of entire projects, or all data sets of a particular institution, on websites with the “look and feel” of the respective project or institution. In addition, GFZ Data Services supports institutions in harvesting metadata from the GFZ repository and transferring them to their own systems, e.g. a university bibliography.
PANGAEA
PANGAEA is an open-access database that archives, publishes and makes available georeferenced data from earth and life sciences. Long-term availability of the content is guaranteed by PANGAEA’s operating institutions, the Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, and MARUM, the Center for Marine Environmental Sciences at the University of Bremen.
Data sets are uploaded electronically by the authors and described by means of an electronic form. Data and metadata submission starts with a registration and is carried out via a so-called ticket system. The PANGAEA website describes this system in more detail and also provides information on data upload, workflow and possible cost sharing.
EarthChem Library
The EarthChem Library is a data repository that archives, publishes and makes accessible geoscientific data and other digital objects. EarthChem Library publishes analytical data, data syntheses, models, technical reports, etc. The EarthChem Library Submission Guidelines provide detailed instructions for submitting research data. Public access to submitted data sets can be restricted with an embargo up to a maximum of 2 years.
Access to data in the EarthChem Library is open (Open Access) under the terms of the Creative Commons license BY-NC-SA 3.0. The EarthChem Library ensures long-term availability of its content by working with the Columbia University Libraries Digital Program. Data sets in the library are equipped with a Digital Object Identifier (DOI). The EarthChem Library is part of IEDA, a publishing agent of the DataCite Consortium.
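The 2-year embargo limit of the EarthChem Library mentioned above can be checked with a small Python sketch. It assumes that the limit is counted in calendar years from the submission date and that a release exactly two years later is still allowed; `add_years` and `embargo_allowed` are hypothetical helpers, not part of any EarthChem tool.

```python
from datetime import date

# Maximum embargo stated for the EarthChem Library; other repositories differ.
MAX_EMBARGO_YEARS = 2


def add_years(start: date, years: int) -> date:
    """Return the same calendar day `years` later (29 February falls back to 28 February)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def embargo_allowed(submitted: date, release: date) -> bool:
    """True if the release date lies between submission and the end of the maximum embargo."""
    return submitted <= release <= add_years(submitted, MAX_EMBARGO_YEARS)


print(embargo_allowed(date(2024, 3, 1), date(2026, 3, 1)))  # True
print(embargo_allowed(date(2024, 3, 1), date(2026, 3, 2)))  # False
```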
Just send us an e-mail or call. Publishing research data is usually straightforward, but depending on the type and amount of data different things may need to be considered. We will help you and make sure that your data are citable, licensed and permanently available and that the data can be found worldwide, all according to the latest standards.
Publishing data means that the data can be accessed and cited. For this purpose, the data need a persistent identifier that acts as an electronic signpost, ensuring that the data can always be found on the Internet, even if the web address (URL) at which they are accessible changes. Scientific publications usually use the DOI (Digital Object Identifier) for this purpose. Our “Frequently Asked Questions” list contains more information on DOIs and how to create them (in fact, we will create the DOI for you).
For your data to be found on the Internet, they must be described in a way that search engines can read. This description is provided by metadata. Our “Frequently Asked Questions” list contains more information on metadata and how you can use them to describe your data set. For other people to evaluate and use your data, the data often also require a human-readable description in addition to the machine-readable metadata.
An electronic licence attached to your data indicates how other people may reuse them. The appeal for the use of open licences in science (“Nutzung Offener Lizenzen in der Wissenschaft”) by the Alliance of Science Organisations in Germany, which is also supported by research funders such as the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), recommends open licences. Creative Commons licences have proven a good way to publish openly accessible research data. You can choose between several options: a CC BY licence, for example, grants other people free use of your data; they can modify and even redistribute your data, but they must always credit you as the author. In most cases, we recommend this type of licence. Our “Frequently Asked Questions” list contains more information on electronic licences and on our recommendations.
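The practical difference between licences can be sketched in Python. The sketch below summarises two Creative Commons licences that appear on this page, CC BY 4.0 and CC BY-NC-SA 3.0 (the latter used by the EarthChem Library), under their SPDX identifiers. It is a simplified illustration, not legal advice, and `Licence` and `obligations` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Licence:
    spdx_id: str
    url: str
    commercial_use: bool  # may others use the data commercially?
    share_alike: bool     # must adaptations carry the same licence?
    # Both licences below require attribution and allow modification.


LICENCES = {
    "CC-BY-4.0": Licence(
        "CC-BY-4.0", "https://creativecommons.org/licenses/by/4.0/", True, False
    ),
    "CC-BY-NC-SA-3.0": Licence(
        "CC-BY-NC-SA-3.0", "https://creativecommons.org/licenses/by-nc-sa/3.0/", False, True
    ),
}


def obligations(spdx_id: str, commercial: bool) -> list[str]:
    """List what a reuser must do, or raise if the intended use is not permitted."""
    lic = LICENCES[spdx_id]
    if commercial and not lic.commercial_use:
        raise PermissionError(f"{spdx_id} does not permit commercial use")
    duties = ["credit the data authors"]
    if lic.share_alike:
        duties.append("release adaptations under the same licence")
    return duties


print(obligations("CC-BY-4.0", commercial=True))  # ['credit the data authors']
```

The sketch shows why CC BY is the simpler choice for reuse: attribution is the only obligation, whatever the purpose.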
Yes. And only you decide what others may or may not do with your data. In order to make clear in which way your data may be reused or not, a licence directly connected to the data defines the copyrights and access rights. We will be happy to recommend a suitable licence and advise you on this topic.
Yes. As the creator of the data, you have the right of first publication. It is therefore possible to set embargo periods, in particular, but not only, for work associated with earning academic credit, such as PhD theses. Even though the data are not publicly accessible during the embargo, they can already be published: the assigned DOI makes your data citable, and the metadata allow search engines to find your data publication.
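What an embargoed data publication exposes can be illustrated with a minimal Python sketch. It assumes a landing page that always shows the DOI and metadata and opens the data files on the embargo end date itself; the class, the DOI and the title are hypothetical and do not describe GFZ Data Services internals.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataPublication:
    doi: str             # hypothetical DOI, registered when the data are published
    title: str           # part of the public discovery metadata
    embargo_until: date  # data files stay closed before this date

    def landing_page(self, today: date) -> dict:
        """Return what a visitor sees: DOI and metadata always, files only after the embargo."""
        return {
            "doi": self.doi,
            "title": self.title,
            "files_available": today >= self.embargo_until,
        }


pub = DataPublication("10.1234/example-thesis-data", "Example borehole temperatures", date(2026, 1, 1))
print(pub.landing_page(date(2025, 6, 1)))
# {'doi': '10.1234/example-thesis-data', 'title': 'Example borehole temperatures', 'files_available': False}
```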
Most publishers of geoscience journals support the publication of research data, and some of them even require it. A great number of publishers have signed the COPDESS “Statement of Commitment”, in which they commit to supporting the publication of data and to accepting citations of data sets in the reference lists of scientific articles. Copernicus, Elsevier, Science, SpringerNature and Wiley as well as societies such as the American Geophysical Union, the European Geosciences Union and the Geological Society of London have signed this statement.
Discipline-specific data repositories offer additional benefits that publishers of journals usually do not offer.
For more than 10 years now, publishers have offered the option of adding electronic data supplements to scientific articles. For a long time, this was the only option for publishing research data. Now there are data repositories (special electronic archives for research data only), and these place great value on the publication of reusable data: the quality of the metadata (and data) is checked by scientists from the corresponding field. Moreover, data repositories have extensive experience in the curation and long-term archiving of data.
This is why many journal publishers now even recommend publishing research data through dedicated data repositories and linking the data to the articles they publish. This recommendation is also part of the COPDESS “Statement of Commitment” signed by, for example, Copernicus, Elsevier, Science, SpringerNature, Wiley and societies such as the American Geophysical Union, the European Geosciences Union and the Geological Society of London.
What are the advantages of publishing research data with FID GEO?
The service is supervised by specialists at the German Research Centre for Geosciences (GFZ) in Potsdam who are familiar with geoscientific data. They oversee the GFZ data repository, which has been publishing data sets with DOIs since 2004, and they are actively involved in the international development of state-of-the-art research data management.
- DOIs (Digital Object Identifiers) are assigned to guarantee the unique and permanent identification of data on the Internet.
- We make sure that your data appear in catalogues relevant for geoscientists around the world, thus guaranteeing the highest possible level of visibility in your research community.
- We know which metadata are best suited to describe geoscientific data.
- We also offer advice on data documentation and are able to provide information on data quality.
- We make sure that data and corresponding published texts are electronically connected.
- We can also advise you regarding legal aspects, in particular on topics such as control and reuse of your data. Depending on your needs, we can recommend the appropriate licenses that clearly define how others can use your data.
The FID GEO service is limited to the publication of research data that provide the basis for an article published in a scientific journal. If you would like to publish other types of data, such as data not yet linked to a text published in a scientific journal or data you don’t plan to link to a published text, we will be happy to advise you.
Metadata are data that provide information about data. They consist of structured information that describes or helps localise resources or that makes it easier to access, use, or handle the corresponding resources in another way. The National Information Standards Organization offers a detailed description of what metadata are and how they are used: Understanding Metadata.
There are different types of metadata. The metadata for data discovery are the most important. In addition, there are structural and contextual metadata.
To make the automatic exchange of metadata possible, standardised, machine-readable metadata formats have been developed. These standards usually cover the metadata for data discovery and include, for example, information on the authors and/or creators of the data, the title of the data set, the year the data were published and the geographic location, as well as a brief description of the data set and cross-references to related published articles.
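To illustrate why machine-readable metadata make automatic exchange possible, the following Python sketch serialises a discovery record to JSON and reads it back as a receiving catalogue would. The record is hypothetical, and its property names only loosely follow DataCite; a real repository defines its own exact format.

```python
import json

# Hypothetical discovery metadata; property names loosely follow DataCite.
record = {
    "creators": ["Doe, Jane", "Roe, Richard"],
    "titles": ["Example soil moisture measurements"],
    "publicationYear": 2024,
    "geoLocationPoint": {"pointLatitude": 52.38, "pointLongitude": 13.06},
    "descriptions": ["Hourly soil moisture at two depths."],
    "relatedIdentifiers": [
        {"relationType": "IsSupplementTo", "relatedIdentifier": "10.1234/example-article"}
    ],
}

# Serialise to JSON, a plain-text format that any receiving system can parse.
exported = json.dumps(record, indent=2)

# A receiving catalogue reads the same structure back without manual re-entry.
received = json.loads(exported)
print(received["titles"][0], received["publicationYear"])
# Example soil moisture measurements 2024
```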
Contextual and structural metadata are information required for reusing the data, such as an overview of the units of the parameters in a table or information on data processing or an overview of all individual files of a data package. This type of metadata is often made available in the form of README.txt files or other supplementary documents.
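A README.txt of this kind can be assembled from a few simple lists. The following Python sketch is one possible layout, not a required template; the file names, column units and processing note are hypothetical.

```python
# Hypothetical data package: file names, column units and a processing note.
FILES = {
    "station_A.csv": "hourly measurements, station A",
    "station_B.csv": "hourly measurements, station B",
}
COLUMNS = {"timestamp": "ISO 8601, UTC", "temperature": "degC", "pressure": "hPa"}
PROCESSING = "Outliers removed; no gap filling."


def build_readme(files: dict[str, str], columns: dict[str, str], processing: str) -> str:
    """Assemble a plain-text README with structural and contextual metadata."""
    lines = ["FILES"]
    lines += [f"  {name}: {description}" for name, description in files.items()]
    lines += ["", "COLUMNS (name: unit)"]
    lines += [f"  {name}: {unit}" for name, unit in columns.items()]
    lines += ["", "PROCESSING", f"  {processing}"]
    return "\n".join(lines) + "\n"


text = build_readme(FILES, COLUMNS, PROCESSING)
print(text)
# To ship it with the data, write it next to the files, e.g.:
# pathlib.Path("README.txt").write_text(text, encoding="utf-8")
```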
You provide the metadata for data discovery (see answer to previous question) online, using an online metadata editor. Most users find the form easy to fill in, and the editor also offers extensive user support. Since metadata play such an important role in making your data findable on the Internet, we check your entries before publishing them.
Contextual and structural metadata (see answer to previous question) are provided in a useful format depending on the data set. We will be happy to give advice.
Not all data require the same type of metadata, which is why different metadata schemas have emerged over time for different types of data and different disciplines. We use the DataCite Metadata Schema.
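The DataCite Metadata Schema defines six mandatory properties: Identifier, Creator, Title, Publisher, PublicationYear and ResourceType. A minimal Python completeness check for a draft record might look like the sketch below. It uses simplified key names of our own choosing rather than the exact DataCite XML or JSON names, and the draft record and its DOI are hypothetical.

```python
# Mandatory DataCite properties, under simplified key names (an assumption of this sketch).
MANDATORY = ("identifier", "creators", "titles", "publisher", "publicationYear", "resourceType")


def missing_mandatory(record: dict) -> list[str]:
    """Return mandatory properties that are absent or empty in `record`."""
    return [key for key in MANDATORY if not record.get(key)]


draft = {
    "identifier": "10.1234/example-dataset",  # hypothetical DOI
    "creators": ["Doe, Jane"],
    "titles": ["Example gravity measurements"],
    "publicationYear": 2024,
}
print(missing_mandatory(draft))  # ['publisher', 'resourceType']
```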
A DOI is a persistent identifier assigned to a digital resource (e.g. a journal article or a research data set) that gives it a unique and permanent reference on the Internet. A DOI stays permanently connected to its resource, regardless of changes to websites or servers being shut down: in that case, the DOI is simply redirected to the new URL. DOIs therefore prevent dead links, for example when publishers change the web address of a server. Among the various systems for permanently referencing digital objects on the Internet, DOIs have become the leading one for publishing text and data.
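The redirection principle can be shown with a toy model in Python. `ToyResolver` is a hypothetical stand-in for the real DOI infrastructure (which is reached via https://doi.org/), and the DOI and server URLs are invented for illustration. The helper accepts common DOI spellings and lower-cases them, since DOI names are case-insensitive.

```python
def normalise_doi(text: str) -> str:
    """Strip resolver or 'doi:' prefixes and lower-case the DOI (DOI names are case-insensitive)."""
    text = text.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:"):
        if text.lower().startswith(prefix):
            text = text[len(prefix):]
            break
    if not text.startswith("10.") or "/" not in text:
        raise ValueError(f"not a DOI: {text!r}")
    return text.lower()


class ToyResolver:
    """Stand-in for the DOI system: the DOI stays fixed, the target URL may change."""

    def __init__(self) -> None:
        self._targets: dict[str, str] = {}

    def register(self, doi: str, url: str) -> None:
        # Registering again with a new URL "reroutes" the same DOI.
        self._targets[normalise_doi(doi)] = url

    def resolve(self, doi: str) -> str:
        return self._targets[normalise_doi(doi)]


resolver = ToyResolver()
resolver.register("10.1234/example-dataset", "https://old-server.example.org/data/42")
# The server moves; only the registered URL is updated, the DOI stays the same.
resolver.register("10.1234/example-dataset", "https://new-server.example.org/datasets/42")
print(resolver.resolve("https://doi.org/10.1234/EXAMPLE-dataset"))
# https://new-server.example.org/datasets/42
```

Anyone who cited the DOI still reaches the data after the move, which is exactly what prevents dead links.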
We will take care of assigning a DOI to your publication. DOIs are assigned based on the rules of the International DOI Foundation. The German Research Centre for Geosciences, GFZ, is a DOI publication agent and assigns the DOIs for data publications of GFZ Data Services. FID GEO takes advantage of the competency and infrastructure of GFZ Data Services not only to assign DOIs, but for the entire process of data publication.
There are no fixed specifications, but recommendations are offered by, for example, the UK Data Service and Stanford University. We will be happy to advise you.
In general, data should be exchangeable without barriers and readable by others. Ideal formats are non-proprietary, unencrypted, commonly used in your research community and based on open, documented standards. If your data are stored in proprietary formats, in particular those of commercial software, you may be able to convert them into open, standardised formats. Open and common formats are always preferable to proprietary ones if they achieve the same results or can be used in the same way without much effort.
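As an example of such a conversion, the following Python sketch writes tabular records to CSV, an open, plain-text format, using the standard `csv` module. It assumes the data have already been read from the proprietary source into a list of dictionaries; the sample rows are hypothetical, and encoding units in column names (`depth_m`) is one convention among several.

```python
import csv
import io

# Hypothetical rows, e.g. exported from a proprietary spreadsheet or instrument format.
rows = [
    {"sample_id": "S1", "depth_m": 0.5, "porosity": 0.21},
    {"sample_id": "S2", "depth_m": 1.0, "porosity": 0.18},
]


def to_csv(records: list[dict]) -> str:
    """Write records as CSV with a header row; the csv module uses CRLF line endings by default."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


text = to_csv(rows)
print(text)
```

Any spreadsheet program, script or database can read the result without the original software.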
The data are stored in the data repository of the German Research Centre for Geosciences, GFZ, where they are permanently available. The GFZ has been publishing geoscientific research data since 2004 and ensures technical integrity and long-term availability of the data.
Data sets that have been assigned a DOI must not be changed. It is, however, possible to assign a new DOI to changed data sets.
Constantly growing, dynamic data sets, such as the time series from a climatological station, are an exception. Here, new data can be added to the existing data set without changing the DOI, provided that the already published data are not changed. As soon as already published data are changed (e.g. after removing outliers or recalibrating), a new version with a new DOI must be created. Both the original and the new version then indicate this versioning.
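The rule above can be expressed as a small decision function in Python. This is a sketch under the assumption that a time series is stored as an ordered list of (timestamp, value) records; `update_kind` is a hypothetical helper, not part of any repository software.

```python
def update_kind(published: list, updated: list) -> str:
    """Classify an update of a growing time series.

    'append'      -> published records unchanged, new ones added: keep the DOI
    'unchanged'   -> nothing new
    'new version' -> a published record was changed or removed: new version, new DOI
    """
    if updated[: len(published)] == published:
        return "append" if len(updated) > len(published) else "unchanged"
    return "new version"


published = [("2024-01-01", 3.1), ("2024-01-02", 2.7)]
print(update_kind(published, published + [("2024-01-03", 2.9)]))  # append
print(update_kind(published, [("2024-01-01", 3.1), ("2024-01-02", 2.8)]))  # new version
```

The second call models a recalibration: an already published value changes, so the data set needs a new version.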
Subject-specific services such as FID GEO are deeply rooted in their disciplines and therefore usually offer specific advantages, for example regarding the documentation of the data or their visibility in the research community. Our “Frequently Asked Questions” list describes further advantages of publishing research data with FID GEO.
If your home institution agrees, the presentation of your data published with FID GEO on the Internet can be adapted to the look and feel of your home institution’s websites. It is possible, for example, to display all FID GEO data publications of a specific university on the Internet with the web design elements of that university. This emphasizes your affiliation with your home institution and at the same time increases the visibility of the institution on the Internet.
If your home institution also offers data publication, you should inform it about your publication with FID GEO. This ensures that the metadata of your publication are also added to your home institution’s catalogue, where they can be used, for example, for statistical evaluations related to the performance-based allocation of resources. We will be happy to contact your institution.
There are different initiatives that promote progress in the publication of research data. Three of the most important initiatives for the publication of geoscientific data are described below.
Coalition on Publishing Data in the Earth and Space Sciences (COPDESS)
With this initiative, the group of publishers and data facilities has created a framework for jointly developing policies and approaches for publishing and citing data. In its “Statement of Commitment”, the group recommends, among other things, making specialist databases accessible. In this statement, the publishers commit to providing information on the storage location and availability of the data related to a journal article. Moreover, citations of data sets in scientific publications are treated as equivalent to citations of published journal articles.
Since January 2015, more than 40 publishers and scientific institutions, organisations and data facilities have signed the statement. These include publishers like Copernicus, Elsevier, Science, SpringerNature and Wiley as well as societies such as the American Geophysical Union, the European Geosciences Union and the Geological Society of London.
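Citing a data set in a reference list works much like citing an article. The following Python sketch formats a citation following a pattern similar to the one recommended by DataCite (Creators (Year): Title. Publisher. DOI as a link). The exact style is set by each journal, and the author names, title and DOI are hypothetical.

```python
def cite(creators: list[str], year: int, title: str, publisher: str, doi: str) -> str:
    """Format a data citation as 'Creators (Year): Title. Publisher. https://doi.org/DOI'."""
    authors = "; ".join(creators)
    return f"{authors} ({year}): {title}. {publisher}. https://doi.org/{doi}"


print(cite(["Doe, Jane", "Roe, Richard"], 2024, "Example seismic station data",
           "GFZ Data Services", "10.1234/example-dataset"))
# Doe, Jane; Roe, Richard (2024): Example seismic station data. GFZ Data Services. https://doi.org/10.1234/example-dataset
```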
Joint Declaration of Data Citation Principles (JDDCP)
Guiding Principles for Findable, Accessible, Interoperable and Re-usable Data Publishing (FAIR Principles)
Licensed under the Creative Commons Attribution International 4.0 license.