Digital data management

Digital transformation is increasingly reaching the classical engineering disciplines. In materials science and engineering in particular, modern data processing methods are enabling groundbreaking innovations. The research group Digital Engineering therefore develops concepts for sustainably digitalizing the management of data originating from the various research areas at EMI.

© FE model: NHTSA/DOT
The research group Digital Engineering addresses solutions for the digital management of research data and cross-domain data spaces.

FAIR data: findable – accessible – interoperable – reusable data

In recent years, the pace at which heterogeneous and unstructured data are generated worldwide has accelerated dramatically. Besides being inaccessible, the larger part of these data assets is not reusable because they are poorly annotated and lack rich metadata. Particularly in light of increasingly data-driven research approaches, the large quantity of information contained in data assets is of crucial value to future R&D activities. The overarching goal is therefore to address the entire life cycle of research data and to exploit it by consistently implementing the FAIR guiding principles. To this end, metadata play a key role because they form the basis for describing and exploiting the meaning of data. Semantic technologies allow metadata to be integrated as context information in order to harness the knowledge contained in the underlying data.

Interoperability

The comprehensive reuse of data is also inhibited because institutions accumulate heterogeneous data assets in so-called data silos, where part of them lies idle. Semantic data enrichment counteracts this by making transdisciplinary and cross-domain interoperability possible. Owing to their high expressiveness, ontologies as knowledge organization systems make it possible to preserve and infer domain knowledge. They formally and explicitly describe the shared concepts within a domain, the logical relations among them, and the governing rules and restrictions. This yields a human- and machine-interpretable specification of the underlying data structures and contents and promotes interoperability across heterogeneous data, models and systems. The result is a productivity gain that can be crucial to success, because misconceptions, costly mistakes and extra work can be prevented.
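To illustrate how an ontology lets domain knowledge be inferred rather than only stored, the following minimal sketch derives the implied classes of a resource from a small class hierarchy. It uses plain Python tuples as stand-ins for RDF triples instead of a real RDF library. The predicates `rdfs:subClassOf` and `rdf:type` are standard RDF/RDFS terms, while every `ex:` name (such as `ex:TensileTest`) is a hypothetical example and not taken from an actual EMI ontology.

```python
# Toy triples (subject, predicate, object); all "ex:" names are hypothetical.
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

triples = {
    ("ex:TensileTest", SUBCLASS_OF, "ex:MechanicalTest"),
    ("ex:MechanicalTest", SUBCLASS_OF, "ex:Experiment"),
    ("ex:test_042", TYPE, "ex:TensileTest"),
}


def superclasses(cls, graph):
    """Return all direct and indirect superclasses of cls."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                stack.append(o)
    return found


def inferred_types(resource, graph):
    """Return the asserted types plus those implied by the class hierarchy."""
    asserted = {o for s, p, o in graph if s == resource and p == TYPE}
    inferred = set(asserted)
    for cls in asserted:
        inferred |= superclasses(cls, graph)
    return inferred


print(sorted(inferred_types("ex:test_042", triples)))
# prints ['ex:Experiment', 'ex:MechanicalTest', 'ex:TensileTest']
```

Only the type `ex:TensileTest` is stated explicitly; the other two follow from the hierarchy. A query for all experiments would therefore also find this resource, even though no one annotated it as such.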

Research activities are not limited to maximizing the internal value created from research data held at institutions like EMI. In cooperation with external partners, cross-company data spaces such as the International Data Spaces are also harnessed for materials science and engineering. In this way, the reuse, versatile exploitation and combination of data create the conditions for drawing new conclusions and producing research results of higher value. Furthermore, modern data processing methods from materials informatics can be employed, for example the automated analysis of large volumes of data by means of intelligent algorithms. Last but not least, interoperable data and models form the basis for innovative digital workflows, for example in model-based systems engineering.

© Fraunhofer EMI
Simple semantic network consisting of two Resource Description Framework (RDF) graphs.
© Fraunhofer EMI
Digital data management sustainably exploits the entire life cycle of research data. (Source: f1000research.com/articles/6-1618/v2)

Reproducibility

Exact knowledge of data provenance is essential for precisely identifying and fully tracing both the current and the previous processing status of resources along a process chain.

To this end, data and their evolution are described by semantically interoperable metadata in the form of RDF (Resource Description Framework) graphs. If these graphs build on ontologies, their context becomes fully transparent and interpretable. The individual entities are logically linked to form a semantic network, resulting in a knowledge graph that accomplishes the transition from data to information to knowledge. Based on these principles, the entire life cycle of resources is represented. This serves the objective of universal transparency regarding the current or past status of materials, technical systems and the data accumulating from each process step, and it improves overall data quality.
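The idea of linking graphs into a knowledge graph and tracing a resource back along its process chain can be sketched as follows. This is a simplified illustration under stated assumptions: triples are plain Python tuples rather than objects from an RDF library, `prov:wasDerivedFrom` is borrowed from the W3C PROV-O vocabulary as one possible way to express derivation, and all `ex:` resources are hypothetical.

```python
# Derivation predicate from the W3C PROV-O vocabulary (one possible choice).
DERIVED_FROM = "prov:wasDerivedFrom"

# Two separate toy graphs; all "ex:" resources are hypothetical.
graph_a = {
    ("ex:raw_signal", "rdf:type", "ex:Dataset"),
    ("ex:filtered_signal", DERIVED_FROM, "ex:raw_signal"),
}
graph_b = {
    ("ex:stress_strain_curve", DERIVED_FROM, "ex:filtered_signal"),
    ("ex:material_model", DERIVED_FROM, "ex:stress_strain_curve"),
}

# Because both graphs share the identifier ex:filtered_signal,
# their union forms one connected semantic network.
knowledge_graph = graph_a | graph_b


def trace_back(resource, graph):
    """List every resource that `resource` was derived from, nearest first."""
    lineage = []
    queue = [resource]
    while queue:
        current = queue.pop(0)
        # sorted() makes the traversal order deterministic.
        for s, p, o in sorted(graph):
            if s == current and p == DERIVED_FROM and o not in lineage:
                lineage.append(o)
                queue.append(o)
    return lineage


print(trace_back("ex:material_model", knowledge_graph))
# prints ['ex:stress_strain_curve', 'ex:filtered_signal', 'ex:raw_signal']
```

Neither graph alone contains the full history of `ex:material_model`; the complete chain back to the raw measurement only becomes traceable once the graphs are combined through their shared identifier.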