Research data management needs to consider individual aspects

Elena Ibello

“Research results can only be understood transparently if you are also familiar with the data on which they are based,” says Simon van Rekum from the ZHAW University Library. He is heading up the “DSembedded” project, which offers support to researchers in the area of research data management. The aim of the Swiss National Strategy for Open Research Data (ORD), which has been in force in Switzerland since July 2021, is to make the handling of research data as open as possible. The ZHAW is promoting this practice with various projects. Sharing research data is not only about making research results comprehensible. “We want to make it possible for the data to be reused,” says van Rekum, whose work focuses on open science and research data management. “Reuse” here refers to instances in which somebody is able to answer a scientific question using data that had already been collected in a previous research project. In other words, researchers should be able to draw on existing data.

To make this possible, we need the corresponding infrastructure, on the one hand, and the endeavour of those involved to process their data in such a way as to render it understandable and useful for others, on the other. According to van Rekum, the latter, in particular, often entails additional work for researchers. This is where the “DSembedded” project comes into play. It sees researchers support other researchers within their Schools in systematically organising, securing and documenting scientific data – from the collection stage right through to archiving. “By providing this service, we want to help people weigh up whether and how data should be shared, how this can be organised at the outset of a project so that it can be tracked further down the line and how questions about suitable tools for sharing data can be answered,” says van Rekum. This also includes the right place for publication. There are various public cloud databases, referred to as repositories, for this purpose.

“The question as to whether data should be disclosed in full cannot generally be answered with a simple yes or no.”

Simon van Rekum, “DSembedded” project manager

Taking data sharing into consideration

Roman Grüter and Nils Ratnaweera also thought about all of these things when they launched their research project entitled “Impact of climate change on pepper cultivation.” The project looked at changes in the suitability of areas around the world for the cultivation of black pepper against the backdrop of current and future climatic conditions. Ratnaweera and Grüter are not only making the results of their project publicly accessible, but everything else too: from the raw data they used and the scripts they developed to calculate the scenarios for pepper cultivation right through to some of the correspondence between them. The two of them opted to adopt this completely open approach right at the start of the research project. “We used openly accessible data as a starting point as well and were grateful to have this opportunity,” says Ratnaweera.

The tool used by the project team is an open source program. The tool recorded the coding path: Every step and detail of the development of the code that Ratnaweera wrote when programming the tool for the calculation of the various cultivation scenarios was captured. The same is also true for the communication sent between Grüter and Ratnaweera using the tool's chat function, making it possible to follow the entire development path from outside. Publishing the chat history, in particular, took a little courage. “There was a moment when I actually realised what this means. You make yourself quite vulnerable,” says Grüter. It is even possible to read where the two researchers disagreed or pointed out possible errors. Nevertheless, it was always clear to both of them that this was the right path to take. “Ultimately, there is no data in this project that would be deemed sensitive to publish. So right from the off, everything pointed towards sharing everything,” says Grüter.

“For a long time, companies have heard that data is the gold of the 21st century. We cannot expect them to now suddenly hand it over.”

Nima Riahi, lecturer at the Institute of Data Analysis and Process Design

Not all data can be shared

Simon van Rekum is of the same view. He not only advises that projects share their data wherever possible, but also that they use open source programs that make it as easy as possible to reuse the data. However, this is not always possible, and sharing data is not harmless in every case. “The question as to whether research data should be disclosed in full cannot generally be answered with a simple yes or no,” he says. You have to ask yourself a number of questions: for example, is it a project supported by the Swiss National Science Foundation (SNSF)? If so, the sharing of data is mandatory, as the SNSF has been pursuing an open research data strategy since 2017. In addition, researchers should ask themselves at an early stage of their projects what added value the research data could have for third parties, how great this benefit would be in relation to the effort involved in sharing the data and, of course, whether the data is sensitive or otherwise worthy of protection. “Personal data has to be anonymised and cleaned up. In some instances, however, so much information has to be removed from a data set that it is no longer of much use in terms of subsequent utilisation,” says van Rekum. A similar situation can be observed when industry partners are involved in a research project.

The School of Engineering, for example, is familiar with such challenges. There is a conflict of objectives: You want to share as much knowledge and data as possible in the interests of science, but at the same time you have to meet your obligations towards the project clients, most of whom come from industry. Many projects are undertaken in cooperation with companies that have a tangible economic interest in this research. Instances in which these companies are willing to publish data tend to be the exception. Nima Riahi and Reto Bürgin from the School of Engineering are involved in the “DSembedded” project and also know from other researchers how many hurdles there are when it comes to sharing data. Sometimes, it is purely down to data protection. Often, however, there are also political reasons or high costs that nobody is prepared to finance. What's more, it is difficult for companies to assess the risks involved in sharing data. What if somebody develops insights by using your data that go beyond the knowledge you have yourself? Or what if something is found in an extensive data set that could be interpreted in a negative light? “Things get difficult as soon as industry is involved,” says Bürgin. Unlike within the scientific community, it is not usual in the business world for different parties to cooperate in order to make progress. Instead, their goal is to gain a competitive advantage. “For a long time, companies have heard that data is the gold of the 21st century. We cannot expect them to now suddenly hand it over,” says Riahi.

Creating incentives

Nevertheless, the two researchers do believe there are opportunities to promote the sharing of data even more. One example is the use of artificially generated, so-called synthetic data, which is similar to the original data in terms of its structure and properties, but without the parts deemed worthy of protection. And who knows, perhaps it could even be beneficial for a company’s image in future if it were to share its knowledge and data more openly. Incentives also need to be considered. Here too, van Rekum believes there is potential for the scientific community as well as for individual researchers themselves. He welcomes the current endeavours with regard to the assessment of research. Until now, it has been important for researchers to publish as many articles as possible, and to do so in acclaimed scientific journals. At a research policy level, efforts are now being made to also acknowledge published data sets as important research output so that this has a positive impact on the careers of researchers.

According to van Rekum, the ZHAW is in a strong position when it comes to open research data and is actively promoting both its development and implementation. At the same time, he sees room for improvement: “There are still very big differences within the ZHAW,” says van Rekum. In certain disciplines and teams, the open sharing of data has already become fully established, while it has yet to take root in others, he continues. “Overall, there are still many questions in terms of practice that are occupying researchers. We can do more here and are working on doing so.”

More information

The final report of the research project “Impact of climate change on black pepper cultivation – a global suitability analysis” is available here.

(Photo: Conradin Frei)

