⚠️ This document is a work in progress and not finalized. Expect changes.
This document describes a concept for sharing pseudonymized medical data within the German Center for Diabetes Research (hereafter referred to as DZD) with data protection in mind.
The data should not contain any direct link to patients, nor should it allow any conclusions about or connections to patients. Examples of data points that must not be contained in a dataset:
The amount of data should be reasonable. At the moment, this concept does not cover a large-scale storage solution. A dataset should not be larger than ~100 gigabytes.
These areas should be negotiated beforehand by the involved parties (including your Data Protection Officer).
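The size limit could be checked before an upload is accepted. The following is a minimal sketch, assuming a dataset is a single file or a directory tree on local disk. Reading "~100 Gigabyte" as 100 × 10^9 bytes is an assumption, and `MAX_DATASET_BYTES`, `dataset_size` and `within_size_limit` are hypothetical names.

```python
import os
from pathlib import Path

# Assumed limit: "~100 Gigabyte" read as 100 * 10**9 bytes (decimal gigabytes).
MAX_DATASET_BYTES = 100 * 10**9


def dataset_size(path) -> int:
    """Total size in bytes of a dataset, which may be a single file or a directory tree."""
    p = Path(path)
    if p.is_file():
        return p.stat().st_size
    total = 0
    for root, _dirs, files in os.walk(p):
        for name in files:
            file_path = Path(root) / name
            # Skip symbolic links so that files outside the dataset are not counted.
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def within_size_limit(path, limit: int = MAX_DATASET_BYTES) -> bool:
    """True if the dataset at `path` is not larger than `limit` bytes."""
    return dataset_size(path) <= limit
```

Because the concept only gives an approximate figure, the limit is a parameter rather than a hard-coded constant inside the check.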
Clarification
This concept does not cover full end-to-end encryption. In favor of usability, transport encryption and storage encryption are combined to still achieve reasonable access security.
Roles
Name | Description |
---|---|
Administrator | The group responsible for technical operations, i.e. running the required software |
Owner | Person who uploaded a dataset |
Manager | Person responsible for a certain dataset. A dataset can have multiple managers. The owner is always a dataset manager too |
User | Person able to download a dataset (if granted permission) |
Entities
Name | Description |
---|---|
Dataset | A bundle of data with at least one responsible manager. It can be a file or a directory. It must always be defined and visible who has clearance for a given dataset |
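The role and entity rules above can be expressed as a small data structure. This is a hypothetical in-memory sketch, not part of any real system: the class and method names are invented for illustration. It also anticipates the Clearance and Rejection rules from the Events table (Rejection may additionally be done by an Administrator), and it assumes that Managers can always download their own dataset, which the concept implies but does not state.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical model of a shared dataset (a file or a directory)."""
    name: str
    owner: str
    managers: set[str] = field(default_factory=set)
    cleared_users: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # The owner is always a dataset manager too.
        self.managers.add(self.owner)

    def grant_clearance(self, actor: str, user: str) -> None:
        """Clearance: only a manager may allow a user to download the dataset."""
        if actor not in self.managers:
            raise PermissionError(f"{actor} is not a manager of {self.name}")
        self.cleared_users.add(user)

    def revoke_clearance(self, actor: str, user: str, actor_is_admin: bool = False) -> None:
        """Rejection: a manager or an Administrator withdraws an existing clearance."""
        if actor not in self.managers and not actor_is_admin:
            raise PermissionError(f"{actor} may not revoke clearance on {self.name}")
        self.cleared_users.discard(user)

    def can_download(self, user: str) -> bool:
        # Assumption: managers may always download their own dataset.
        return user in self.managers or user in self.cleared_users

    def clearance_list(self) -> list[str]:
        """Who has clearance must always be visible."""
        return sorted(self.cleared_users)
```

Keeping the cleared users in one explicit set makes the "always defined and visible" requirement a simple lookup rather than something derived from scattered permissions.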
Events
Name | Description |
---|---|
Clearance | The process of giving a User permission to download a dataset. Done only by a Manager |
Rejection | The reverse of Clearance: withdrawing an existing Clearance. Done only by a Manager or an Administrator |
Upload | The process of uploading a new dataset into the system. Done by a User, who thereby becomes Owner and Manager of this Dataset |
Download | The process of downloading a dataset. Usually done by a User with Clearance |
User Creation | The process of creating a new User who can later be given Clearance. Done by an Administrator. Every User must be linked to a real person. For event-logging reasons, Users are not deleted, only disabled if necessary |
Storage encryption | The process of permanently encrypting a dataset so that it can only be accessed by persons with permission or a secret key. This applies even at a low technical level (e.g. persons with access to the server's operating system or to the storage devices) |
Transport encryption | The process of temporarily encrypting data while it moves from an Owner to the server or from the server to a User. This is needed for uploading and downloading datasets and ensures that no third parties (e.g. network administrators, network providers, ...) can read the dataset. In this context it is always achieved with HTTP over SSL/TLS (HTTPS) |
Auto Deletion | An automatic background job that deletes every uploaded file after 30 days. The purpose of this platform is to share files, not to store them. Auto deletion prevents an ever-growing demand for storage space and also sidesteps data protection regulations regarding long-term storage |
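The Auto Deletion job could be sketched as a periodic scan that removes files older than 30 days. This sketch assumes the uploaded datasets are plain files below one root directory and uses the file modification time as a stand-in for the upload time; `delete_expired_files` is a hypothetical name. On a real server, deleting files behind the back of the sharing software would leave its metadata inconsistent, so treat this as an illustration of the rule rather than a deployable job.

```python
import os
import time
from typing import Optional

RETENTION_SECONDS = 30 * 24 * 60 * 60  # 30 days, as stated in the concept


def delete_expired_files(root: str, retention: float = RETENTION_SECONDS,
                         now: Optional[float] = None) -> list[str]:
    """Delete every regular file below `root` older than `retention` seconds.

    Age is measured from the file's modification time (assumed upload time).
    Returns the deleted paths; empty directories are left in place.
    """
    now = time.time() if now is None else now
    deleted = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # never delete through symbolic links
            if now - os.path.getmtime(path) > retention:
                os.remove(path)
                deleted.append(path)
    return deleted
```

The optional `now` parameter makes the job testable without waiting 30 days; in operation it could be run once a day by a scheduler such as cron.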
The following describes the whole process of sharing a fictional dataset:
Name | Description |
---|---|
Person A | owner of the dataset |
Person B | owner of the sharing system/server |
Person C | consumer of the dataset |
Name | Description |
---|---|
Dataset 1A | the example dataset |
Sharing system | the software system implementing the process described here |
This only needs to be done the first time a person wants to share data.
At a later stage, the need for an HMGU account or an account created manually by Person B could be replaced by an existing federated authentication system such as https://www.aai.dfn.de/. Most DZD users could then log in with their local institute account.
This part of the document describes how to apply the concept documented above to a real-world implementation.
The requirements and processes described above could be realized with a Nextcloud instance, given a suitable configuration.
Here we describe which Nextcloud features can accomplish the concepts and processes described above.
Roles
Name | Equivalent in Nextcloud |
---|---|
Administrator | The person/group running the Nextcloud instance |
Owner | Person who uploads the data into Nextcloud |
Manager | The person who uploaded the data into Nextcloud or anyone with Nextcloud sharing permissions on the data |
User | Every Nextcloud account with read permission on a file or directory containing a dataset |
Entities
Name | Equivalent in Nextcloud |
---|---|
Dataset | A Dataset will be a certain file or a certain directory in Nextcloud |
Events
Name | Equivalent in Nextcloud |
---|---|
Clearance | Maps to the Nextcloud sharing feature: read access is given to a specific user. In Nextcloud a clearance can even be time-limited |
Rejection | This can be achieved by removing read access for a certain user |
Upload | Uploading a file to Nextcloud |
Download | Downloading a file from Nextcloud |
User Creation | Achieved either by granting HMGU users access to the Nextcloud instance via an LDAP group or by creating a local Nextcloud user in the Nextcloud user management |
Storage encryption | This can be achieved with the server-side encryption feature of Nextcloud |
Transport encryption | Transport encryption is achieved with HTTPS. The certificates are provided by https://letsencrypt.org |
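Clearance and Rejection could also be scripted against the Nextcloud OCS Share API. The sketch below only builds the HTTP requests and does not send them. The server URL, login and app password are placeholders, and the helper names are hypothetical. It assumes the documented share parameters `shareType` 0 (user share) and `permissions` 1 (read only), plus an optional `expireDate` for a time-limited clearance; whether an expiration date is honored on user shares depends on the Nextcloud version.

```python
import base64
from datetime import date
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

SHARES_ENDPOINT = "/ocs/v2.php/apps/files_sharing/api/v1/shares"
SHARE_TYPE_USER = 0   # share with a single Nextcloud user
PERMISSION_READ = 1   # read only: the user can download but not modify


def _headers(login: str, app_password: str) -> dict:
    token = base64.b64encode(f"{login}:{app_password}".encode()).decode()
    return {
        "Authorization": f"Basic {token}",
        "OCS-APIRequest": "true",  # header expected by the OCS API
    }


def clearance_request(base_url: str, login: str, app_password: str, path: str,
                      user: str, expires: Optional[date] = None) -> Request:
    """Build (but do not send) a request sharing `path` read-only with `user`."""
    params = {
        "path": path,
        "shareType": SHARE_TYPE_USER,
        "shareWith": user,
        "permissions": PERMISSION_READ,
    }
    if expires is not None:
        params["expireDate"] = expires.isoformat()  # YYYY-MM-DD
    headers = _headers(login, app_password)
    headers["Content-Type"] = "application/x-www-form-urlencoded"
    return Request(base_url.rstrip("/") + SHARES_ENDPOINT,
                   data=urlencode(params).encode(), headers=headers, method="POST")


def rejection_request(base_url: str, login: str, app_password: str,
                      share_id: int) -> Request:
    """Build a request deleting an existing share, i.e. withdrawing the clearance."""
    return Request(f"{base_url.rstrip('/')}{SHARES_ENDPOINT}/{share_id}",
                   headers=_headers(login, app_password), method="DELETE")
```

A request would then be sent with `urllib.request.urlopen(req)` over HTTPS. A Rejection needs the id of the share, which Nextcloud reports when the share is created.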
Monitoring and Event Logging
Monitoring and event logging will be achieved with the Nextcloud Activity feature.
Some basic Nextcloud configuration guidelines that should be considered:
A setup how-to is now also available at https://git.apps.dzd-ev.org/dzdtools/dzddatasharingservice
End-to-end encryption (https://nextcloud.com/endtoend/) could also be an interesting feature for realizing a secure sharing concept.
From the Nextcloud documentation (https://docs.nextcloud.com/server/latest/user_manual/files/encrypting_files.html):
If your Nextcloud server is not connected to any remote storage services, then it is better to use some other form of encryption such as file-level or whole disk encryption. Because the keys are kept on your Nextcloud server, it is possible for your Nextcloud admin to snoop in your files, and if the server is compromised the intruder may get access to your files. (Read Encryption in Nextcloud to learn more.)
This information was news to me. It means that the Administrator can access the files even if they are encrypted. This could be mitigated by separating the roles of a storage Administrator and a Nextcloud Administrator, or by simply declaring the Administrator a trusted part of the system.
However, this contradicts other information published by Nextcloud: https://nextcloud.com/blog/encryption-in-nextcloud/
Perhaps the quote above only applies to older Nextcloud versions in which user encryption was not yet available.
Further research and testing are needed.