To date, the use of digital repositories to manage and share research data is not a well-established practice among German social and economic science scholars. Scholars in these fields usually resist the idea of publishing their research data. This is unsurprising since only a small number of German research funders and journals engage in data archiving or have replication policies (Gherghina & Katsanidou, 2013; Kvalheim & Kvamme, 2014). The GESIS Data Archive for the Social Sciences1 has provided data-archiving services for decades, storing a number of well-documented and securely-archived social science datasets that are reusable by other scientific scholars. Yet the obvious advantages of sharing data have not produced the emergence of a nationwide data-sharing culture among the German social and economic sciences research community. Even today, most German researchers do not publish the data that serves as the basis of their publications (Vlaeminck & Siegert, 2012; Zenk-Möltgen & Lepthien, 2014).
To change this, GESIS – in collaboration with the Social Science Centre Berlin, the German Institute for Economic Research, and the German National Library of Economics – initiated the development of the research data repository SowiDataNet. This service focuses on the specific demands of social and economic scholars and institutions, which were identified during a comprehensive requirements analysis (Droß, 2015). SowiDataNet’s two main objectives are to 1) reduce the reluctance amongst researchers to share their data and 2) promote the ideal of data sharing in the long term.
For far too long the management and dissemination of research data as an essential complement to research findings were neglected in the German research data landscape. Accordingly, research data management is still a relatively new topic to German scholars. As a result, most scientific institutions have neither the required expertise nor do they provide the technical prerequisites for professional research data management. Yet, at the same time, they have to meet the requirements of funding agencies for research data management and data publication. These institutions focus on research and scientific outputs, but often have neither the financial resources nor technical equipment to develop their own research data management tools.
Now in the final stages of development, SowiDataNet is a research data infrastructure service for the social and economic sciences (https://sowidatanet.de/). One central element is a web-based research data repository that enables researchers to safely and permanently document, publish and share their data with others. The service is aligned with the specific requirements of the social and economic sciences community, which were established in a detailed requirements analysis. Particular emphasis is placed on the repository’s flexible linkup with the practical workflows of institutional research data management. In many disciplines, research data form the basis of new scientific insights and are, therefore, prerequisites for the resulting publications. With increasing digitisation, and with the majority of research data already available electronically, flexible data distribution and the subsequent use of research data are important. The data’s long-term availability ensures an increase of knowledge and secures the needed degree of transparency and traceability in the research process (cf. Jasny, Chin, Chong, & Vignieri, 2011). The option to reuse data prevents redundant surveys while contributing to professional quality assurance and providing a useful basis for scientific education. Additionally, increasing research data availability also increases citations for the primary researchers, generating an essential incentive for the often time-consuming data documentation process (Piwowar, Day, & Fridsma, 2007).
SowiDataNet builds on the development and experiences of the GESIS data sharing repository datorium, which was successfully launched in January 2014 and has been online since then.2 datorium’s development began at the start of 2012 and was initially focused on individual researchers working in the social sciences (Linne, 2013; Zenk-Möltgen & Linne, 2013). In datorium, researchers found an instrument with which they could independently document, share, manage and publish their research data, making them both visible and available to the scholarly community. Nevertheless, to counteract growing disparities in the social and economic scientific data landscape, additional development work focusing on integrating research data from different sources was needed. In this context, integrating data from scientific institutions is of special interest. Only a small number of research institutes are able to assemble and continually operate research data infrastructures using their own resources. For this reason the project SowiDataNet was initiated. GESIS started SowiDataNet’s development in collaboration with the Social Science Centre Berlin (WZB), the German Institute for Economic Research (DIW), and the German National Library of Economics (ZBW); the project is funded by the Leibniz Association.
While datorium is aimed at individual social scientists who are not necessarily employed at a research institution, SowiDataNet focuses on data produced at scientific institutions. The thematic scope and functional requirements of datorium need to be extended to the field of the economic sciences and the specific requests of scientific institutions. As a consequence, all workflows, metadata fields, controlled vocabularies, curation processes, etc. have to be adjusted to fit the demands of scientific institutions and a range of scientific fields. Furthermore, SowiDataNet, in contrast to datorium, will support data versioning and is designed as both an internationalized and a localized application.3 When SowiDataNet goes online in 2017, GESIS will work towards consolidating datorium and SowiDataNet to provide a single user-friendly service for different target groups. The main focus of this new repository service rests upon quantitative data from the social and economic sciences and therefore on two specifically empirically-oriented scientific disciplines. The overarching objective is the implementation of a national data infrastructure which aims at centralising research data from scientific institutions of the social and economic sciences in Germany. Special attention is paid to the research data generated by scientific institutions, since these often do not have their own research data infrastructure. SowiDataNet will be provided to different types of research institutions such as universities or university faculties, non-university research institutes (e.g. from the Leibniz Association like the above mentioned German Institute for Economic Research or The Berlin Social Science Center), other not-for-profit research centres or federal institutions from the social and economic sciences.
With SowiDataNet, research and scientific institutions will be provided with a convenient and professional tool to document and publish their unpublished research data, safely archiving them for reuse by the scientific community. Employees at the participating institutions will have to be qualified for data curation and depositing. GESIS supports this qualification, for instance with regular research data management training. Moreover, as already mentioned, funders’ terms increasingly require specified research data management plans, and many institutions are already aware of this imperative. As a consequence, they are starting either to hire employees with appropriate qualifications or to educate existing staff in professional training workshops.
To date, German research data holdings are heavily fragmented and this precludes user-friendly, centralised, and therefore quick, data retrieval. Although research data are held by individual scholars, research data centres or scientific institutions in a more or less standardised form, all too often these data are neither visible nor available to the scientific community. Consequently, a comprehensive overview of previously conducted research within a certain topic cannot be easily obtained. Internationally, the situation is similar and that is one of the reasons why the European Union initiated a framework for a Collaborative Data Infrastructure (European Union, 2010), e.g. resulting in the Open Research Data Pilot (OpenAIRE, 2016). Improvements were also implemented through the use of persistent identifiers, such as DOIs or URNs for research data. For the German social and economic sciences this was introduced with the data registration agency da|ra, assigning persistent identifiers to research data of several major German data centres. However, there is no common service for data management, archiving, and publishing within a single system for different institutions (Hausstein & Zenk-Möltgen, 2011). Due to this major hurdle, data reuse by other scholars might require dealing with high levels of complexity and effort, or in the worst – yet not uncommon – case, data reuse is simply impossible. SowiDataNet aims to solve this unsatisfying situation by integrating decentralised research data together within one repository-network. The core of this network will be a web-based, independent infrastructure that allows for low-threshold standardised data documentation and distribution of research data.
3. Current Situation in Germany
As early as 1998, the seventh recommendation of the German Research Foundation’s (DFG) Memorandum “Safeguarding Good Scientific Practice”, entitled “Safeguarding and Storing of Primary Data”, stated that “Primary data as the basis for publications shall be securely stored for ten years in a durable form in the institution of their origin” (German Research Foundation, 1998, p. 74). Since then, it has been discussed several times whether funding agencies should recommend preserving research data for later use, but there was never general agreement that this should be mandatory in every case. In 2015, the DFG published guidelines for research data handling, encouraging researchers to evaluate which research data of a project application might be of interest for other contexts and how they can be preserved and made available (German Research Foundation, 2015). Nineteen years after the 1998 Memorandum there are still too many German scientific institutions that cannot yet follow this basic recommendation.
Institutions with modern technical equipment and a budget for data preservation can easily comply with this recommendation. Unfortunately, not all institutions in Germany are sufficiently provided with the technical and financial resources to meet this demand. And even though “data sharing is essential for all verifications and all secondary analyses” (Fienberg, Martin, & Straf, 1985, p. 9), archiving and sharing social science research data is still the exception rather than the rule. One of the reasons for this unsatisfactory situation is that data are viewed as the basis for publications, not as a research output in their own right. Beyond that, datasets are often neither well documented, nor professionally archived, nor made available to the research community (Kühne & Meusel, 2007; Nelson, 2009; Weichselgartner, Günther, & Dehnhard, 2011). This applies to research conducted on an international scale and even to research published in social science or economics journals that have a dedicated data policy for authors in place (Gherghina & Katsanidou, 2013; Vlaeminck & Siegert, 2012; Zenk-Möltgen & Lepthien, 2014). Given the current situation, the German Council of Science and Humanities issued specific recommendations for Germany regarding the improvement of research data management and data availability (Wissenschaftsrat, 2012). These recommendations cover the increased publication of data; the assignment of persistent identifiers to enable citation; data curation services for error correction and quality metadata; and, possibly, funding and/or scientific acknowledgement for research data management. Furthermore, these recommendations are all in line with similar international initiatives, e.g. the FAIR4 guiding principles to make data findable, accessible, interoperable, and reusable or the several DA-RT5 policies to enable data access and research transparency.
As stated in the “Report Commissioned by the Research Information Network,” the attitude of scholars towards data sharing is often “willing but unable” (RIN, 2008, p. 25). This can be seen in a practice such as storing data on hard drive, DVD, CD, USB stick, etc., which prevents systematic documentation, management and data sharing. The reasons for these habits are a lack of knowledge as well as a lack of appropriate repository tools for professional research data management. The risk of data loss can only be minimised by professional research data management with reliable tools such as SowiDataNet or datorium, where trained data curators offer practical guidance to data providers. This data sharing ensures research integrity and facilitates replication. To sum up, it can be stated that research data in Germany in the social and economic sciences is still not adequately preserved and made available for replication by the research community as demanded by the German Research Foundation.
4. Research Data Management with SowiDataNet
To enable researchers to conduct standardised data documentation autonomously, SowiDataNet applies a user-friendly design intended for self-explanatory and intuitive usage. The repository provides an easy way to document, archive and share research projects together with their related material. Additionally, the loss of data and metadata can be prevented. International metadata standards, such as the Metadata Terms by the Dublin Core Metadata Initiative6 and the DataCite Metadata Schema7, are being applied, and compatibility with the DDI-Lifecycle8 standard is being pursued. This is of particular interest for smaller research projects, since data from these projects are at present practically invisible and usually not available for reuse. To address this, SowiDataNet focuses especially on standardised archiving of these smaller projects, enhancing their accessibility and reuse. In contrast to larger national or international survey programmes – e.g. the European Values Study (EVS) or the International Social Survey Programme (ISSP) – which are well documented, published and intensively reused by other scientists, small research projects rarely have a budget that covers the costs of archiving and publishing data and metadata. For this reason, data documentation with SowiDataNet will be as time-saving as possible. Institutional researchers will be supported by the data curators of their affiliated institution. Individual data providers without an institutional affiliation will be supported by GESIS’s data curator; institutional data curators, in turn, will be supported by GESIS.
Bringing together data from different sources (institutions with or without their own repositories, research data centres or individual scholars) has the advantage that research data from German social and economic sciences can be archived and documented according to common standards. At the same time the individual requirements of any one institution or research data centre will be taken into account (e.g. access protection, user rights management, versioning, institutional data management and curation, secure data storage, professional support for data management). Centralising databases from different sources also enables a user-friendly integrated search option for research data, whereby – whenever possible – direct access to these data will be facilitated.
Another benefit of SowiDataNet will be a substantial improvement enabling the effective long-term archiving of social and economic sciences research data by means of an eventual linkup to the GESIS data archive. Long-term archiving involves bit stream preservation as well as technical and intellectual readability. The availability of primary researchers (in case there are questions by secondary users concerning the data) is often difficult if not impossible to establish. For this reason, SowiDataNet’s goal is to enable research data documentation in a manner that ensures later understanding and reuse even without the assistance of those who generated the data.
5. Content and Features of SowiDataNet
As mentioned above, the main purpose of SowiDataNet is to archive unpublished data from smaller research projects held in scientific institutions with little or no budget for data documentation. This may comprise primary as well as secondary data, syntax for conducting replication analyses, data sets extended by contextual data, or in some cases even experimental data. Data providers can decide individually which access rights to grant for their data, so that they maintain full control over them. In addition, all submitted data will be reviewed by an institutional curator. This ensures high documentation quality as well as data protection.
To lower the threshold of data documentation, SowiDataNet has only five mandatory metadata properties (title, primary investigator and/or institution, publisher, access category and data resource type), but it also contains various optional properties for data specification. Using controlled vocabularies, e.g. from the DDI Alliance, DataCite, da|ra, SSOAR, and certain ISO standards such as ISO 639 for languages and ISO 3166 for countries, the repository supports standardised data documentation allowing for better search results. All published research projects receive a digital object identifier (DOI) to enable reliable identification and citation of a research data set.
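The rule of five mandatory properties can be illustrated with a minimal Python sketch. The field names (`title`, `primary_investigator`, `institution`, `publisher`, `access_category`, `resource_type`) and the function `missing_mandatory` are hypothetical and chosen for illustration only; they do not reproduce SowiDataNet’s actual metadata schema or software.

```python
# Hypothetical field names; not SowiDataNet's actual metadata schema.
SIMPLE_MANDATORY = ("title", "publisher", "access_category", "resource_type")
CREATOR_FIELDS = ("primary_investigator", "institution")


def _filled(record: dict, key: str) -> bool:
    """Return True if the record holds a non-empty string under key."""
    value = record.get(key)
    return isinstance(value, str) and value.strip() != ""


def missing_mandatory(record: dict) -> list:
    """List the mandatory properties that are absent or empty in record."""
    missing = [key for key in SIMPLE_MANDATORY if not _filled(record, key)]
    # The creator requirement is met by an investigator, an institution, or both.
    if not any(_filled(record, key) for key in CREATOR_FIELDS):
        missing.append("primary_investigator or institution")
    return missing


if __name__ == "__main__":
    draft = {"title": "Example survey 2015", "institution": "Example Institute"}
    print(missing_mandatory(draft))
```

A deposit form built along these lines could let a researcher save a draft at any time and would only block submission to the curator while the returned list is non-empty, which keeps the entry threshold low.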
The first prototype of SowiDataNet has already been implemented and was tested in 2015 by a heterogeneous group of researchers from different institutions. At present the prototype is being adjusted according to the results of this test and preparation has started for the second test phase.
5.1. Research Data Types and Target Group
In order to include a wide variety of data types, the project uses a very broad definition of the concept of “research data.” Both primary data, newly generated with empirical research tools, and secondary data, transformed from existing sources, may be included in the repository. Moreover, it is possible to incorporate combinations of primary and secondary data as well as routines or scripts used for data transformations and analyses. Additionally, documents that facilitate subsequent use of data (e.g. questionnaires, codebooks, or technical reports) can be provided. SowiDataNet initially focuses on projects with no or little staff and financial resources for the documentation and archiving of research data. As a rule, these are mid-sized and small research and doctoral thesis projects. The professional target group is the social and economic research community of Germany. The included scientific disciplines altogether have the largest number of students and university departments in Germany.
In terms of the target group, it is necessary to distinguish between data depositors and data users. SowiDataNet enables data depositors to document, archive and render accessible their research data in an easy manner. Data users shall be able to use metadata for research, use data for re-analyses or for new research projects, with SowiDataNet offering a central access for working with social and economic research data. Linking the system to graduate colleges, special research areas (the German Sonderforschungsbereiche), or experimental laboratories will also be considered in the course of the project. The question of whether and to what extent it is reasonable to link research data from more comprehensive projects and institutions as well as established data collections to the network will be assessed during the operating phase. This assessment will be performed in different ways. All users will have the option to give feedback via the repository. This feedback will be collected and reviewed by GESIS. Personal and constant contact with the participating institutions will be of great help evaluating the system. Since GESIS takes an active part in the curation process, problems or missing features can be identified quickly. Also, GESIS conducts regular research data management training, during which time any emerging requirements on or for the scientific community can come to light.
5.2. Institutional Curators Mediate between Research Activities and Research Infrastructure
Empirical social and economic research continues to take place essentially in institutional contexts, both inside universities and in non-university settings. As already mentioned above, only a small number of research institutes are able to assemble and continually operate research data infrastructures based on their own resources, thus meeting the two basic requirements of long-term preservation and accessibility of research data for the community. It is clear that high-quality curation of research data is a very time-consuming endeavour that poses a particular challenge in terms of research data management. Naturally, the capacities of research institutes for curating research data are limited. This often leads to the result that quality standards of data documentation have to be substantially reduced. This is unfortunately being encouraged by a number of international repository solutions that allow for data publication without a review. The advantage of data repositories without review is undoubtedly that data are directly traceable and citable via a persistent identifier. The drawback is that the quality of the data documentation is too often not sufficient to make the data reusable. Consequently, these data are mostly not suitable for a direct subsequent use. If, in addition, it is not possible to get in touch with the primary researcher, a subsequent use of data may in fact be impossible. The solution for this dilemma can be found in an intermediate instance to mediate between the service infrastructure and the individual researcher: the institutional data curator.9
SowiDataNet will, therefore, implement functions that are immediately directed towards institutional data curators, and which can be integrated within organisational workflows. In a first step, researchers can generate new data projects, upload research data and describe them with standardised metadata. It will be possible to directly attach questionnaires, codebooks or syntax files. Along with the data entry it will be possible to use comment functions in order to record open questions and immediately forward them to the curator. Once researchers have completed their data entry, the data project is transferred to an institutional project pool. In a second step, the institute’s curator accesses this pool, chooses a project and starts content review. During this step the curator checks the data, metadata and the documents in terms of formal criteria, i.e. readability, completeness, data protection regulations and correct description. Wherever necessary, the curator – in agreement with the involved researchers – can amend or correct the documentation. To facilitate the process of standardisation and as an aid to the curator, SowiDataNet provides a checklist to verify the provided data projects. In the future, this checklist should be adaptable to the particular institute’s requirements, and it will be developed further in cooperation with the users following the official project start. Once the institutional review has been completed, the curator transmits the data project in step three to the technical repository operator at GESIS. This is where the final technical examinations are conducted before the project is assigned a persistent identifier (digital object identifier, DOI) by the registration service da|ra, and is published via the SowiDataNet portal. By using a persistent identifier, the dataset and its metadata are made visible on the da|ra and DataCite metadata portals as well as all services which make use of the metadata harvesting possibilities (e.g. the OAI-PMH). Another advantage is the integration with other interdisciplinary and international initiatives like re3data.org and OpenAIRE.
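The three-step workflow described above (data entry and transfer to the institutional pool, curator review, technical check and publication) can be sketched as a small state machine. This is a simplified illustration under our own assumptions: the stage names and the function `advance` are hypothetical, and the sketch models only the strictly forward path named in the text, not SowiDataNet’s actual implementation.

```python
from enum import Enum


class Stage(Enum):
    DATA_ENTRY = 1          # researcher uploads data and describes them
    INSTITUTIONAL_POOL = 2  # completed project waits for a curator
    CURATOR_REVIEW = 3      # institutional curator checks data and metadata
    TECHNICAL_CHECK = 4     # repository operator runs final technical checks
    PUBLISHED = 5           # DOI registered, project visible in the portal


# Each stage may only move to the next one named in the text.
NEXT_STAGE = {
    Stage.DATA_ENTRY: Stage.INSTITUTIONAL_POOL,
    Stage.INSTITUTIONAL_POOL: Stage.CURATOR_REVIEW,
    Stage.CURATOR_REVIEW: Stage.TECHNICAL_CHECK,
    Stage.TECHNICAL_CHECK: Stage.PUBLISHED,
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move a data project to target, rejecting skipped or backward steps."""
    if NEXT_STAGE.get(current) is not target:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target


if __name__ == "__main__":
    stage = Stage.DATA_ENTRY
    while stage in NEXT_STAGE:
        stage = advance(stage, NEXT_STAGE[stage])
        print(stage.name)
```

Encoding the permitted transitions in a single table makes the design intent explicit: a data project cannot reach publication without passing through institutional review first.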
It is recommended that curators inform the researchers at the beginning of the respective projects about how data curation and publication can be prepared while the project is still under way. It makes sense, for example, for the curator to explain potential embargo deadlines, formal requirements pertaining to data preparation or useful handling of data documentation. This helps to avoid additional workload during the often busy final phase of the project. In the medium term, SowiDataNet will be employed to manage institutional data collections, support data versioning and provide reporting information for internal and external use. If errors require correction, a new version is created and receives a new DOI. Corrections can become necessary if errors occur either in the data themselves or in significant metadata information. In both cases the curator is expected to document the error, and this documentation will be published with the newly generated DOI of the data project. By following this procedure all changes to a project will be transparent. By creating a service that enables institutional research data curators to flexibly integrate the repository functions into their respective organisational workflows, the project also serves to strengthen the role of institutional data management (Droß & Linne, 2016).
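The versioning rule described above (each correction yields a new version with its own DOI and a documented reason) can be illustrated with a short Python sketch. The class `Version`, the function `add_corrected_version` and the DOI strings are hypothetical placeholders chosen for illustration; in the actual service, DOIs are assigned by the registration service da|ra.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    number: int
    doi: str          # placeholder values only, not registered DOIs
    change_note: str  # curator's description of what was corrected


def add_corrected_version(history: list, new_doi: str, change_note: str) -> list:
    """Return a new history with a corrected version appended.

    Earlier versions are never modified, so every change stays traceable.
    """
    if not history:
        raise ValueError("history must contain the initial version")
    if not change_note.strip():
        raise ValueError("a correction must be documented")
    if any(version.doi == new_doi for version in history):
        raise ValueError("each version needs its own DOI")
    next_number = history[-1].number + 1
    return history + [Version(next_number, new_doi, change_note)]


if __name__ == "__main__":
    history = [Version(1, "10.0000/example.1", "initial publication")]
    history = add_corrected_version(
        history, "10.0000/example.2", "corrected value labels"
    )
    for version in history:
        print(version.number, version.doi, version.change_note)
```

Keeping the version records immutable and append-only mirrors the transparency goal: a citation of an earlier DOI continues to point to exactly the data that were cited.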
Growing demand for and awareness of making scientific data available, a practice that will increase the transparency and validity of research findings, can undoubtedly only be met by solving all kinds of legal, technical, and organisational challenges. While individual researchers may play a certain role, the institutional policies and the availability of technical tools that are adapted appropriately to scholars’ needs are central for achieving a broad practice of data sharing. SowiDataNet will provide the possibilities for archiving, documenting, and disseminating the data from research projects or scientific institutions, thereby adhering to subject-specific best practices and standards. Considerations regarding more restricted access to some scientific datasets due to legal or privacy requirements will also be incorporated. Built by a consortium of organisations that have experience in data archiving, data documentation, data analysis, and data publication, SowiDataNet overcomes limitations of missing expertise or implementation difficulties and enables a broad target group of researchers to archive and share their data. The SowiDataNet repository is adequately suited for primary data as well as for secondary data, numerous data types and additional material like syntax scripts and documentation files. By providing SowiDataNet as a more mature service in the area of research data management especially aimed at scientific institutions, we hope to make a contribution towards an attitude shift regarding data sharing in the social and economic sciences.
The project SowiDataNet is funded by the Leibniz Association.