
Data Privacy & Responsible Data Handling

Many CACoM projects deal with sensitive or clinically derived data.
Respecting data privacy is both a legal requirement and a professional responsibility.
This page summarizes how to handle biomedical and other personal data in accordance with TUM and EU GDPR principles.


Guiding Principle

Treat every dataset as if it contained information about someone you personally know and care about.

This simple rule captures the spirit of responsible data handling: protect confidentiality, minimize exposure, and only process what is truly needed.


All CACoM activities fall under:

  • The EU General Data Protection Regulation (GDPR),
  • TUM's internal data protection guidelines for teaching and research.

You are not required to become a legal expert, but you are expected to follow the course's operational rules for safe and ethical data use.


Data Classification in CACoM

| Category | Description | Examples | Public sharing allowed? |
| --- | --- | --- | --- |
| Identifiable data | Contains personal identifiers or metadata that could directly identify an individual. | Names, hospital IDs, GPS traces, raw medical records. | ❌ Never |
| Pseudonymized data | Identifiers replaced by codes, but re-identification is still possible with auxiliary information. | “Patient_001”, timestamped CTG traces. | ❌ No, internal use only |
| Anonymized data | All identifiers removed, and re-identification is intended to be impossible, though this can rarely be guaranteed. | Aggregated metrics, derived features, downsampled recordings. | ⚠️ Only with explicit instructor approval |
| Synthetic data | Artificially generated by algorithms or simulations and not based on any real individual or measurement. | Simulated CTG signals, mock IMU datasets, random noise generators. | ✅ Yes, freely shareable |

Tip: If your “synthetic” dataset was generated entirely from code (e.g., statistical sampling, procedural simulation), you may share it publicly without restrictions. If it was derived from real data, even indirectly, treat it as anonymized and request instructor approval before uploading. Approval is quick, but it is required.
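
As a rough illustration of data "generated entirely from code", the sketch below builds a heart-rate-like trace from a sine wave plus Gaussian noise. The function name `synthetic_fhr_trace` and all parameter values (baseline, amplitude, noise level, sample rate) are assumptions chosen for illustration; they are not clinically validated and not part of any CACoM tooling.

```python
import math
import random


def synthetic_fhr_trace(n_samples=240, sample_rate_hz=4.0, baseline_bpm=140.0, seed=0):
    """Return a list of (time_s, bpm) pairs produced purely by code.

    No real recording is read or used at any point, so the result
    falls into the "synthetic" category. All defaults are illustrative.
    """
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    trace = []
    for i in range(n_samples):
        t = i / sample_rate_hz
        # Slow oscillation around the baseline (period of 30 s, assumed).
        slow_wave = 5.0 * math.sin(2 * math.pi * t / 30.0)
        # Gaussian noise as a crude stand-in for short-term variability.
        noise = rng.gauss(0.0, 2.0)
        trace.append((t, baseline_bpm + slow_wave + noise))
    return trace


# Usage: print the first three samples of a 60-second trace.
for time_s, bpm in synthetic_fhr_trace()[:3]:
    print(f"{time_s:.2f} s  {bpm:.1f} bpm")
```

If any parameter were fitted to real recordings, the output would count as derived from real data and would need instructor approval before upload.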

Caution: Anonymization is much harder than it seems. Many “de-identified” datasets can still be traced back to individuals when combined with external information. When unsure, treat all data as pseudonymized and keep it private.


Data Handling Rules

✅ You must:

  • Store sensitive or pseudonymized data only on approved TUM or course-managed systems (e.g., institutional cloud, encrypted drives, or CACoM Google Drive).
  • Document data sources and access permissions in your README or report.
  • Delete all local copies after submission unless explicitly instructed otherwise.
  • Share sensitive data only internally via the official CACoM Google Drive submission folder.

🚫 You must not:

  • Upload clinical, patient, or proprietary data to public platforms, or share it through them (GitHub, Kaggle, publicly shared Google Drive links, personal websites, etc.).
  • Email datasets to external parties without written permission from the course staff.
  • Attempt to re-identify individuals from pseudonymized data.
  • Combine datasets in ways that could indirectly reveal identities.

Data in Reproducibility Packages

When preparing your Reproducibility Package:

  • Include synthetic or example data in public repositories for demonstration.
  • Upload real datasets only to the internal CACoM Google Drive.
  • Clearly label what type of data (synthetic, anonymized, pseudonymized) each file represents.
  • Provide metadata and descriptions, not raw identifiers.

Example:

data/
├── synthetic_ctg_sample.csv # OK to publish
├── real_patient_signals.csv # INTERNAL ONLY
└── README_data.md # explains origins and permissions
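
The labeling rule above can be sketched as a small manifest check that separates files for the public repository from files that stay on the internal drive. This is a minimal sketch, not an official CACoM tool: `DataClass`, `MANIFEST`, and `split_for_publication` are hypothetical names, and classifying `real_patient_signals.csv` as pseudonymized is an assumption made for the example.

```python
from enum import Enum


class DataClass(Enum):
    IDENTIFIABLE = "identifiable"
    PSEUDONYMIZED = "pseudonymized"
    ANONYMIZED = "anonymized"
    SYNTHETIC = "synthetic"


# Hypothetical manifest: one entry per data file in the package.
MANIFEST = {
    "synthetic_ctg_sample.csv": DataClass.SYNTHETIC,
    "real_patient_signals.csv": DataClass.PSEUDONYMIZED,  # assumed label
}


def split_for_publication(manifest, instructor_approved=frozenset()):
    """Return (public, internal) file lists following the classification table.

    Synthetic files are public. Anonymized files are public only if their
    name appears in `instructor_approved`. Everything else stays internal.
    """
    public, internal = [], []
    for name, data_class in sorted(manifest.items()):
        approved_anonymized = (
            data_class is DataClass.ANONYMIZED and name in instructor_approved
        )
        if data_class is DataClass.SYNTHETIC or approved_anonymized:
            public.append(name)
        else:
            internal.append(name)
    return public, internal


public, internal = split_for_publication(MANIFEST)
print("public:", public)
print("internal:", internal)
```

Defaulting every unlisted case to "internal" means a mislabeled or forgotten file fails safe.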

External Collaborations

Some projects involve clinicians or industry partners.
In these cases:

  • Follow their institutional data policies and confidentiality agreements.
  • Do not redistribute data obtained through such collaborations.
  • Report any data breach, accidental exposure, or uncertainty immediately to the instructors.

Remember: professionalism in handling real-world data reflects directly on TUM's reputation and on yours.


Proprietary or Restricted Components

Some CACoM projects rely on proprietary hardware, software, or datasets — for example, the fetal heartbeat simulator or other tools provided by collaborators or industrial partners.

In such cases:

  • You may not publish or redistribute the full project if it includes or depends on proprietary components.
  • You may, however, share open parts of your work — such as analysis scripts, derived metrics, or simulation examples — provided they do not reveal or replicate proprietary details.
  • Before publishing or uploading anything, check with the instructors or your project supervisor which parts of your project may be shared and which must remain private.
  • When preparing your Reproducibility Package, clearly mark restricted files or modules (e.g., hardware_interface_proprietary/) and document how they fit into your workflow.

Note: Projects involving proprietary resources are still fully valid academic work. In such cases, reproducibility refers to methodological transparency, not open publication of all materials.


Derived & Aggregated Data

Aggregated or summary-level results are generally safe to share — but always verify that they cannot be traced back to individual participants.

You may share derived, aggregated, or statistical summaries publicly if:

  • Individual participants cannot be identified, and
  • The summaries do not reveal private or proprietary information.

Examples of safe outputs:

  • Group-level averages (e.g., mean heart rate per minute).
  • Statistical model coefficients.
  • Performance metrics (accuracy, RMSE, etc.).
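
One common way to reduce the risk that a group-level average points back to an individual is to suppress groups with too few participants. The sketch below shows the idea. The threshold `MIN_PARTICIPANTS = 5` and the function name `group_means` are assumptions for illustration, not a CACoM or GDPR rule, and passing this check does not by itself make a summary safe to publish.

```python
from collections import defaultdict
from statistics import mean

MIN_PARTICIPANTS = 5  # assumed threshold; confirm the real value with the instructors


def group_means(records, min_participants=MIN_PARTICIPANTS):
    """Average values per group, dropping groups with too few participants.

    `records` is an iterable of (group_label, participant_code, value).
    Group size is counted in distinct participants, not in rows, because
    many rows from one person still describe only that person.
    """
    values = defaultdict(list)
    participants = defaultdict(set)
    for group, participant, value in records:
        values[group].append(value)
        participants[group].add(participant)
    return {
        group: mean(values[group])
        for group in values
        if len(participants[group]) >= min_participants
    }


# Usage: group "B" has two rows but one participant, so it is suppressed.
records = [("A", f"p{i}", 140 + i) for i in range(5)]
records += [("B", "p9", 150), ("B", "p9", 152)]
print(group_means(records))
```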

Common Pitfalls

Caution:

  • Confusing pseudonymized data with anonymized data.
  • Uploading “cleaned” or “trimmed” datasets that still contain traceable timestamps or IDs.
  • Sharing metadata files that reveal sensitive information (e.g., hospital location, device serials).
  • Forgetting that derived features can still leak identity (e.g., rare clinical conditions).
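
Several of these pitfalls involve leftover columns in a "cleaned" file. A quick header scan can flag column names that deserve a second look before anything is shared. This is only a sketch: the pattern list and the function name `suspicious_columns` are assumptions, the list is not exhaustive, and name matching cannot detect identifying values hidden in innocently named columns. Treat it as a prompt for manual review, not as proof of anonymization.

```python
import csv
import io
import re

# Assumed, non-exhaustive name fragments that often indicate identifiers.
SUSPICIOUS = re.compile(
    r"name|patient|hospital|birth|timestamp|date|serial|gps|(^|_)id$",
    re.IGNORECASE,
)


def suspicious_columns(csv_text):
    """Return header names in `csv_text` that match the SUSPICIOUS pattern."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    return [column for column in header if SUSPICIOUS.search(column)]


# Usage with a small in-memory example (column names are made up).
sample = "patient_id,timestamp,fhr_bpm,device_serial\n1,2,3,4\n"
print(suspicious_columns(sample))
```

False positives are expected (for example, `update_count` matches `date`); for a pre-sharing check, erring on the side of flagging too much is the safer trade-off.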

Quick Checklist

  • All datasets classified correctly (identifiable / pseudonymized / anonymized / synthetic).
  • Sensitive data stored only on approved systems.
  • No patient data uploaded to GitHub or any public cloud service.
  • Real data included only in the internal CACoM Google Drive submission folder.
  • Synthetic examples provided for reproducibility.
  • README includes data source and access explanation.