⚠️ This document is a whitepaper for a service that now exists at the DZD. A technical implementation is described at https://git.apps.dzd-ev.org/dzdtools/dzddatasharingservice.
With a DZD account it is also possible to use our instance of this service at share.apps.dzd-ev.org.
This document describes a concept for sharing pseudonymized medical data within the German Center for Diabetes Research (DZD) with data protection in mind.
The data should not contain any direct connection to patients, nor should it allow any conclusions about or connections to patients to be drawn. Example data points that are not allowed to be contained in a dataset:
The amount of data should be reasonable. At the moment this concept does not cover a large storage solution, so a dataset should not be larger than ~100 gigabytes.
These areas should be negotiated by involved parties (including your Data Protection Officer) beforehand.
Clarification
This concept does not cover full end-to-end encryption. In favor of usability, transport encryption and storage encryption are combined to still achieve reasonable access security.
Roles
Name | Description |
---|---|
Administrator | The group responsible for the technical operation. Concerned with running the required software. |
Owner | Person who uploaded a dataset |
Manager | Person responsible for a certain dataset. A dataset can have multiple managers. The owner is always a dataset manager too |
User | Person able to download a dataset (if having permission) |
Entities
Name | Description |
---|---|
Dataset | A bundled amount of data, which will have at least one responsible manager. It can be a file or directory. It must always be defined and visible who has clearance for a certain dataset |
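
The roles and the Dataset entity above can be illustrated with a small data model. This is only a sketch of the concept, not the implementation of the DZD service. The class and method names (`Dataset`, `grant_clearance`, `withdraw_clearance`, `may_download`) are hypothetical, and treating managers as implicitly allowed to download is an assumption made here.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A file or directory with an owner, managers, and cleared users."""

    name: str
    owner: str
    managers: set[str] = field(default_factory=set)
    cleared_users: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # The owner is always a manager of the dataset.
        self.managers.add(self.owner)

    def grant_clearance(self, manager: str, user: str) -> None:
        """Give `user` download permission; only a manager may do this."""
        if manager not in self.managers:
            raise PermissionError(f"{manager} is not a manager of {self.name}")
        self.cleared_users.add(user)

    def withdraw_clearance(self, actor: str, user: str,
                           actor_is_admin: bool = False) -> None:
        """Withdraw an existing clearance; allowed for managers and administrators."""
        if not actor_is_admin and actor not in self.managers:
            raise PermissionError(f"{actor} may not withdraw clearance")
        self.cleared_users.discard(user)

    def may_download(self, user: str) -> bool:
        # Assumption: managers may download without a separate clearance.
        return user in self.managers or user in self.cleared_users


dataset = Dataset(name="Dataset 1A", owner="person_a")
dataset.grant_clearance("person_a", "person_c")
print(sorted(dataset.cleared_users))  # who currently has clearance
```

Keeping the cleared users as an explicit set on the dataset is one way to satisfy the requirement that it is always defined and visible who has clearance.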
Events
Name | Description |
---|---|
Clearance | The process of giving a User the permission to download a dataset. Done only by a manager |
Rejection | The opposite of Clearance: withdrawing an existing Clearance. Done only by a Manager or an Administrator |
Upload | The process of uploading a new dataset into the system. Done by a User (becoming Owner/Manager of this Dataset due to this process) |
Download | The process of downloading a dataset. Usually done by a User with Clearance |
User Creation | The process of creating a new user, to later give Clearance. Done by an Administrator. The User must be related to a real person. For event logging reasons, Users will not be deleted, only disabled if necessary |
Storage encryption | The process of permanently encrypting a dataset, so that it can only be accessed by persons with the permission or a secret key. This applies even on a low technical level (e.g. persons having access to the server's operating system, or persons having access to the storage devices) |
Transport encryption | The process of temporarily encrypting data while it is moved from an Owner to the server or from the server to a User. This is needed for uploading and downloading datasets and ensures that no third parties (e.g. network administrators, network providers, etc.) can read the dataset. In this context it will always be achieved with HTTP over TLS (HTTPS) |
Auto Deletion | An automatic background job that deletes any uploaded files after 30 days. The purpose of this platform is to share files, not to store them. Auto-deletion prevents an endlessly growing demand for storage space and avoids the data protection requirements that apply to long-term storage |
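
The Auto Deletion event can be sketched as a selection step that a background job would run periodically. This is a minimal illustration, not the behavior of the actual service. The function name `expired_datasets` and the mapping of dataset names to upload timestamps are hypothetical, and counting a dataset as expired once it is exactly 30 days old is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Retention period taken from the Auto Deletion event description.
RETENTION = timedelta(days=30)


def expired_datasets(uploaded_at: dict[str, datetime], now: datetime) -> list[str]:
    """Return the names of datasets whose upload is at least 30 days old."""
    cutoff = now - RETENTION
    return sorted(name for name, ts in uploaded_at.items() if ts <= cutoff)


now = datetime(2024, 3, 1, tzinfo=timezone.utc)
uploads = {
    "Dataset 1A": now - timedelta(days=31),  # past the retention period
    "Dataset 2B": now - timedelta(days=5),   # still within the retention period
}
for name in expired_datasets(uploads, now):
    # A real job would delete the stored files here and log the event.
    print(f"would delete {name}")
```

Separating the selection from the deletion keeps the rule easy to test and allows a dry run before any data is removed.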
In the following, the whole process of sharing a fictional dataset will be described:
Name | Description |
---|---|
Person A | Owner of the dataset |
Person B | Owner of the sharing system/server |
Person C | Consumer of the dataset |
Name | Description |
---|---|
Dataset 1A | The example dataset |
Sharing system | The software system realizing the process described here |
This only needs to be done when a person wants to share data for the first time:
The need for an HMGU account or an account created manually by Person B could be replaced at a later stage by an existing federated authentication system like https://www.aai.dfn.de/. Most DZD users could then log in with their local institute account.
This part of the document describes how to apply the concept documented above to a real-world implementation.
The requirements and processes described above could be realized with a suitably configured Nextcloud instance.
Here we describe which Nextcloud features can realize these concepts and processes.
Roles
Name | Equivalent in Nextcloud |
---|---|
Administrator | The Person/Group running the Nextcloud instance |
Owner | Person who uploads the data into Nextcloud |
Manager | A manager is the person who uploaded the data into Nextcloud or has Nextcloud sharing permissions on the data |
User | A user of a dataset is every Nextcloud account having read permissions on a file or directory containing a dataset |
Entities
Name | Equivalent in Nextcloud |
---|---|
Dataset | A Dataset will be a certain file or a certain directory in Nextcloud |
Events
Name | Equivalent in Nextcloud |
---|---|
Clearance | The process can be mapped to the sharing feature in Nextcloud by giving read access to a certain user. A clearance in Nextcloud could even be of limited duration |
Rejection | This can be achieved by removing read access for a certain user |
Upload | Uploading a file to Nextcloud |
Download | Downloading a file from Nextcloud |
User Creation | This can be achieved either by giving access to the Nextcloud instance via an LDAP group for HMGU users or by creating a local Nextcloud user in the Nextcloud user management |
Storage encryption | This can be achieved with the server-side encryption feature of Nextcloud |
Transport encryption | Transport encryption will be achieved with HTTPS. The certificates will be provided by https://letsencrypt.org |
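
The mapping of Clearance to Nextcloud sharing can be illustrated with the Nextcloud OCS Share API, which creates a share via `POST /ocs/v2.php/apps/files_sharing/api/v1/shares`. The sketch below only builds such a request and does not send it. The helper name `build_clearance_request`, the host `nextcloud.example.org`, and the user ID are hypothetical, and authentication (for example an app password) is deliberately left out.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Values defined by the Nextcloud OCS Share API.
SHARE_TYPE_USER = 0   # share with a single user
PERMISSION_READ = 1   # read-only access

SHARES_ENDPOINT = "/ocs/v2.php/apps/files_sharing/api/v1/shares"


def build_clearance_request(base_url: str, dataset_path: str, user_id: str,
                            expires: Optional[date] = None) -> Request:
    """Build (but do not send) a request giving `user_id` read access."""
    if not base_url.startswith("https://"):
        # Transport encryption: refuse to talk to the server without HTTPS.
        raise ValueError("base_url must use https://")
    fields = {
        "path": dataset_path,
        "shareType": SHARE_TYPE_USER,
        "shareWith": user_id,
        "permissions": PERMISSION_READ,
    }
    if expires is not None:
        # Optional time-limited clearance, formatted as YYYY-MM-DD.
        fields["expireDate"] = expires.isoformat()
    return Request(
        url=base_url.rstrip("/") + SHARES_ENDPOINT,
        data=urlencode(fields).encode("utf-8"),
        headers={"OCS-APIRequest": "true"},
        method="POST",
    )


request = build_clearance_request(
    "https://nextcloud.example.org", "/Dataset 1A", "person_c",
    expires=date(2030, 1, 31),
)
print(request.get_method(), request.full_url)
```

A Rejection would correspond to deleting the created share again through the same API. Building the request separately from sending it makes the parameters easy to review before they reach a production instance.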
Monitoring and Event Logging
Monitoring and Event logging will be achieved with the Nextcloud Activity feature.
Some base configuration guidelines within Nextcloud that should be considered:
Nextcloud offers both server-side encryption and end-to-end encryption (E2EE), each serving different purposes and security models:
The current server-side encryption approach enables administrators to set a system-wide recovery key for encrypted files, ensuring that even when users lose their password, files can always be decrypted. This means:
Administrator Access:
Use Cases:
E2EE files are only accessible in the mobile apps or desktop clients and not on the server. By design, E2EE files are inaccessible from the Nextcloud Web UI to minimize the need to trust the server.
Key Benefits for DZD:
Limitations:
Concerns about administrator access to server-side encrypted files are valid. Nextcloud's default server-side encryption does allow administrators to access encrypted files through recovery keys.
The original concern about administrator access is legitimate. For the DZD data sharing platform, server-side encryption provides adequate protection for most use cases when combined with proper administrative controls, role separation, and transparent documentation of access capabilities. E2EE should be considered for future phases when handling the most sensitive research data or when zero-knowledge guarantees are specifically required by data protection agreements.