Privacy, Data Security, and Confidentiality: What Researchers Get Wrong

Cluster Post 3  |  Module 10: Research Ethics and the IRB Process

From Concept to Submission Series  |  2026

Academic Writing Mastery: The Complete 2026 Guide To Research Papers, Thesis & Dissertation Writing


Privacy, Data Security, and Confidentiality

The module overview covers privacy and data security at a general level. This post goes deeper: the specific practices that create privacy risks researchers do not anticipate, the practical data security measures that are actually adequate (vs. security theatre), India’s DPDPA compliance requirements for researchers, and the de-identification failures that expose participants even in supposedly anonymised datasets.


The Gap Between Promised and Actual Confidentiality

Most researchers promise confidentiality in their consent forms. Fewer actually deliver it consistently throughout the research process. The gap between promised and actual confidentiality typically opens in four places:

  • During transcription: researchers who send audio recordings to external transcription services — even professional ones — are sharing identifiable participant data with a third party. If this is not disclosed in the consent form and covered by a confidentiality agreement with the transcription service, it is a confidentiality breach. Either use institutional transcription services under confidentiality agreement, or transcribe yourself, or disclose third-party transcription in the consent form.
  • In research notes and memos: research notes often contain participant names alongside quoted material, demographic details, or observations that could identify participants. If these notes are on personal devices, in unsecured cloud storage, or in email — they are vulnerable. Apply the same security standards to research notes as to the formal dataset.
  • In informal conversations: researchers discussing their work with colleagues, supervisors, or family members sometimes share details about participants that could identify them — a distinctive job title, a specific incident, a recognisable situation. This is a confidentiality breach even in an informal context if the information is identifiable.
  • In publications: the most common confidentiality failure occurs at the publication stage, when de-identified quotes are combined with participant descriptions (role, institution type, years of experience, gender) that together make the participant identifiable to anyone in their professional context. See the de-identification section below.

The De-identification Problem

True anonymisation is harder than most researchers realise. The combination of multiple attributes — even individually non-identifying ones — can identify individuals in small professional communities.

Consider these individually non-identifying attributes: ‘a female judge, senior level, in the Delhi High Court, who has been on the bench for more than fifteen years, specialising in commercial cases.’ Combined, this description may fit only one or two individuals in the Delhi High Court. Any colleague in that court would know who this is. The participant is not anonymous.

The k-anonymity test: before publishing a participant description, ask how many people in the relevant community fit all the attributes you are using. If the answer is ‘probably fewer than five,’ the description may be identifying. Generalise: ‘a senior commercial judge in a major Indian High Court’ instead of the specific attributes.
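The k-anonymity test can be run mechanically whenever a roster of the relevant community is available. A minimal sketch, assuming a hypothetical roster and hypothetical attribute names:

```python
# Minimal k-anonymity check: count how many people in the relevant
# community match every attribute you plan to publish.

def k_anonymity(population, description):
    """Return the number of people matching all published attributes."""
    return sum(
        all(person.get(attr) == value for attr, value in description.items())
        for person in population
    )

# Hypothetical roster of a professional community (illustrative only).
population = [
    {"gender": "F", "seniority": "senior", "specialisation": "commercial"},
    {"gender": "F", "seniority": "senior", "specialisation": "criminal"},
    {"gender": "M", "seniority": "senior", "specialisation": "commercial"},
    {"gender": "F", "seniority": "junior", "specialisation": "commercial"},
]

# The full description matches only one person: identifying.
print(k_anonymity(population, {"gender": "F", "seniority": "senior",
                               "specialisation": "commercial"}))  # 1

# Dropping attributes widens the match and protects the participant.
print(k_anonymity(population, {"seniority": "senior"}))  # 3
```

If the count comes back below five, generalise the description before publishing.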

Qualitative data de-identification checklist

  • Replace all names (participant, institution, location, named third parties) with pseudonyms or generalisations.
  • Check all direct quotes for identifying details embedded in the content — names of cases, colleagues, specific incidents that insiders would recognise.
  • Generalise specific role descriptions that might be unique within their institutional context.
  • Check the combination of characteristics (gender + seniority + specialisation + location) for potential uniqueness.
  • For small population studies (all partners at a specific law firm, all principals in a specific district), consider whether individual anonymity is even achievable — and whether participants understood this limitation when they consented.
  • Have a colleague who knows the research context review de-identified data before publication and attempt to re-identify participants.
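The mechanical part of this checklist (replacing known identifiers and verifying that none survive) can be sketched in a few lines. The names, institutions, and transcript text below are hypothetical; the combination and insider-knowledge checks still require human review:

```python
# First-pass de-identification: replace each known identifier with its
# pseudonym or generalisation, then verify no raw identifier survives.

REPLACEMENTS = {
    "Asha Mehta": "Participant P1",                    # participant name
    "Delhi High Court": "a major Indian High Court",   # institution
    "Connaught Place": "a central business district",  # location
}

def deidentify(text, replacements):
    for real, substitute in replacements.items():
        text = text.replace(real, substitute)
    return text

def residual_identifiers(text, replacements):
    """Identifiers still present after replacement -- should be empty."""
    return [real for real in replacements if real in text]

transcript = ("Asha Mehta described a hearing at the Delhi High Court, "
              "near Connaught Place.")
clean = deidentify(transcript, REPLACEMENTS)
assert residual_identifiers(clean, REPLACEMENTS) == []
print(clean)
```

This only automates the name-replacement step; it cannot detect a recognisable incident or a unique combination of characteristics, which is why the colleague review remains essential.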

Practical Data Security: What Is Actually Adequate

The module lists security practices. This section distinguishes between measures that provide meaningful protection and measures that create a false sense of security.

  • Password on files (single password, no encryption): minimal protection. Stops casual access but not determined access. Better than nothing; not a substitute for encryption.
  • File encryption (AES-256 via VeraCrypt, 7-Zip, or institutional tools): meaningful protection for stored data. Use this for any file containing identifiable participant information.
  • Institutional cloud storage (university OneDrive, institutional Google Workspace): generally adequate for most research data, provided institutional two-factor authentication is enabled. Better than personal cloud storage because institutional accounts have security management.
  • Personal cloud storage (personal Google Drive, Dropbox, iCloud): inadequate for identifiable participant data without additional encryption. Terms of service may allow platform access to content. Use institutional accounts instead.
  • Email (unencrypted): inadequate for transmitting identifiable participant data. Use secure file transfer or encrypted attachments.
  • WhatsApp or personal messaging for research communication: not appropriate for sharing participant data. Convenient but not secure for research purposes.
  • Locked physical cabinet for paper materials: required for paper consent forms and any physical research materials containing participant identifiers. Standard practice.
  • Separate storage for code key (the link between pseudonyms and real identities): essential. The code key is the most sensitive file — if it is compromised, all pseudonymised data is re-identifiable. Store it separately from the main dataset, preferably on a different system.
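The code-key separation in the last item can be sketched concretely: pseudonyms drawn from a cryptographic random source, with the mapping written to a different store from the dataset. The file paths and participant name below are hypothetical:

```python
import json
import secrets
from pathlib import Path

# Sketch of code-key separation: the pseudonym map goes to a different
# store from the de-identified dataset, so a breach of one store does
# not re-identify the other. Paths and names are illustrative.

def assign_pseudonym(code_key, real_name):
    """Assign a random, non-sequential pseudonym to a participant."""
    if real_name not in code_key:
        code_key[real_name] = f"P-{secrets.token_hex(4)}"
    return code_key[real_name]

code_key = {}
pseudo = assign_pseudonym(code_key, "Asha Mehta")  # hypothetical name

dataset_dir = Path("dataset")    # e.g. institutional cloud share
keystore_dir = Path("keystore")  # e.g. a separate encrypted device
dataset_dir.mkdir(exist_ok=True)
keystore_dir.mkdir(exist_ok=True)

# The dataset contains pseudonyms only; the key file holds the mapping.
(dataset_dir / "interviews.json").write_text(
    json.dumps([{"participant": pseudo, "quote": "..."}]))
(keystore_dir / "code_key.json").write_text(json.dumps(code_key))
```

Random tokens rather than sequential codes (P1, P2, …) also avoid leaking the order in which participants were interviewed.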

DPDPA 2023 Compliance for Indian Researchers

The Digital Personal Data Protection Act 2023 creates legal obligations for Indian researchers who collect, process, or store personal data about identifiable individuals. The Act applies to research conducted in India and to data about individuals in India.

Key provisions affecting research practice

  • Consent as the primary legal basis (Section 7): collecting personal data requires free, specific, informed, unconditional, and unambiguous consent. Research participants must consent to data collection specifically for research purposes. The consent obtained through the ethics consent form should also satisfy DPDPA consent requirements if it is properly designed.
  • Purpose limitation: personal data collected for one research purpose may not be used for a different purpose without new consent. If you collect interview data for a thesis study and later want to use the same data for a separate publication with a different focus, this may require participant notification or re-consent depending on the degree of purpose change.
  • Data minimisation: collect only the personal data necessary for the research purpose. This is consistent with good research practice, but DPDPA makes it a legal requirement.
  • Storage limitation: personal data should not be retained beyond the period necessary for the research purpose. Indefinite retention of identifiable participant data is not compliant.
  • Right to withdrawal and erasure: participants have the right to withdraw consent and request erasure of their data. Your consent form should describe how participants can exercise this right and what happens to their data if they do.

Practical compliance steps for researchers

Full DPDPA compliance for research does not require a legal department — it requires thoughtful practice. The key steps: design consent forms that satisfy both ethics committee and DPDPA requirements simultaneously (most well-designed ethics consent forms already do); establish and document a data retention and deletion schedule; have a documented process for responding to participant data requests; store personal data only in India or in jurisdictions with adequate protection frameworks (check with your institution’s data protection officer).
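The retention and deletion schedule mentioned above can be as simple as a dated record per participant plus a periodic check. A minimal sketch, with an illustrative five-year retention period and hypothetical records:

```python
from datetime import date, timedelta

# Sketch of a documented retention schedule: each record carries a
# collection date and a retention period; anything past its deletion
# due date is flagged. The period and records are illustrative.

RETENTION = timedelta(days=5 * 365)  # e.g. a five-year retention policy

records = [
    {"id": "P-01", "collected": date(2019, 3, 1)},
    {"id": "P-02", "collected": date(2025, 6, 15)},
]

def due_for_deletion(records, today, retention=RETENTION):
    """IDs whose identifiable data has exceeded the retention period."""
    return [r["id"] for r in records if today - r["collected"] > retention]

print(due_for_deletion(records, today=date(2026, 1, 1)))  # ['P-01']
```

Running a check like this on a fixed schedule, and recording the deletions it triggers, is the documented process the DPDPA expects.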

Exemptions: the DPDPA provides some exemptions for research purposes, but these are not blanket — they apply where the research is in the public interest and where anonymisation is technically sufficient. Researchers who claim a research exemption should document their basis for doing so.

Handling Mandatory Reporting Situations

The module correctly identifies mandatory reporting obligations. This section adds the practical guidance that researchers need when they actually face one of these situations.

The most common mandatory reporting situation in Indian social science research is disclosure of child abuse or risk of harm to a child. If a research participant discloses information suggesting a child is being abused or is at serious risk:

  • Stop the interview or data collection activity.
  • Acknowledge the disclosure: ‘Thank you for telling me this. I want to be honest with you about what I need to do next.’
  • Explain the reporting obligation clearly and compassionately: ‘I am required by law to report information about harm to children. I am going to need to contact [relevant authority]. I can help you understand this process.’
  • Do not promise confidentiality for information that you have a legal obligation to report.
  • Contact your IEC and institution for guidance before making the report if circumstances allow — not instead of reporting, but to ensure you are reporting through the right channel.

The key error researchers make: promising unconditional confidentiality in their consent form and conversation, then facing a mandatory reporting situation without having told participants about this limit. The consent form must state clearly, in plain language, that confidentiality has this specific exception. This is not a hypothetical — it happens in research on education, social work, community development, and many other fields.


FAQs

Q: How should researchers store research data securely?

Secure data storage requires: encrypting all digital files containing personal data (use AES-256 encryption minimum); storing files on password-protected institutional systems rather than personal devices or unencrypted cloud services; restricting access to named research team members only; maintaining physical security for any paper records; and keeping a data access log. For audio recordings: store encrypted on a dedicated research device, not on personal phones. For interview transcripts: remove all identifying information before storing on any shared or cloud system. Describe your storage measures specifically in the ethics application.
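The data access log can be a simple append-only file recording who accessed which file, when, and why. A minimal sketch with hypothetical user and file names:

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Minimal append-only data access log: every access to files containing
# identifiable data is recorded with who, when, which file, and why.
# The log path, user, and filenames are illustrative.

LOG = Path("data_access_log.csv")

def log_access(user, filename, purpose):
    new = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.writer(f)
        if new:
            writer.writerow(["timestamp_utc", "user", "file", "purpose"])
        writer.writerow([datetime.now(timezone.utc).isoformat(),
                         user, filename, purpose])

log_access("researcher_a", "interviews_raw.zip", "transcription check")
```

A log like this also gives you something concrete to show the ethics committee when describing your access controls.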

Q: What is data minimisation and why is it required?

Data minimisation means collecting only the personal data that is strictly necessary for your research question — no more. It is required under the DPDPA 2023 and is a core principle of research ethics. In practice: do not collect names if anonymous codes serve the same purpose; do not record video if audio is sufficient; do not collect full dates of birth if age categories are adequate; and do not retain identifying information after the analysis stage. Data minimisation reduces privacy risk for participants, simplifies compliance, and focuses data collection on what actually matters for the research.
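The date-of-birth example can be made concrete: derive the analysis-relevant age band at collection time and never store the full date of birth. A minimal sketch with illustrative bands:

```python
from datetime import date

# Data minimisation in practice: record only the age band the analysis
# needs; the full date of birth is computed with and then discarded.
# The band boundaries are illustrative.

BANDS = [(18, 29), (30, 44), (45, 59), (60, 120)]

def age_band(dob, today):
    # Exact age, adjusting for whether the birthday has passed this year.
    age = today.year - dob.year - ((today.month, today.day)
                                   < (dob.month, dob.day))
    for low, high in BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "under 18"

# Only the band is stored in the dataset; the DOB is never written out.
print(age_band(date(1988, 7, 4), today=date(2026, 1, 1)))  # 30-44
```

The same pattern applies to other over-specific fields: store the derived, coarser value (district rather than address, role category rather than job title) and discard the original.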

Q: How do you anonymise qualitative research data?

Anonymise qualitative data by: replacing participant names with pseudonyms or codes (Participant A, P1); removing or generalising geographic identifiers (city name → ‘a large metropolitan city’); removing identifying workplace details; removing specific dates that could identify time-bound events; generalising unusual demographic combinations that could identify individuals; and removing any information participants asked to be kept off the record. After anonymisation, ask a colleague unfamiliar with the participants to read a sample of transcripts and identify whether any participant could be identified from the text. Residual identifiability after this check must be addressed.

Q: Can researchers share research data with other researchers?

Researchers can share data with other researchers subject to: participant consent for data sharing (this must have been included in the original consent process); removal of all personal identifiers (anonymisation to a standard that prevents re-identification); a data sharing agreement specifying how the receiving researcher can use the data; and compliance with DPDPA 2023 requirements for data transfer. Many funders (ICSSR, international bodies) now require data sharing as a condition of funding. If you plan to share data, state this in the original consent form — you cannot obtain consent for data sharing after data collection without contacting participants again.

Q: What are the most common data security mistakes researchers make?

The most common mistakes are: storing identified data (recordings with names, transcripts with identifying details) on personal phones or unencrypted cloud drives; using participant names in research notes and working documents; failing to delete data when the retention period ends; emailing unencrypted data files between team members; and sharing anonymised data that is not actually fully anonymised — small unique combinations of characteristics can identify individuals even without names. The test: could a determined person re-identify a participant from the ‘anonymised’ data combined with publicly available information? If yes, further anonymisation is needed.


Next: Cluster Post 4 — Ethical Research with Vulnerable Populations: Power, Protection, and Genuine Respect