Along with nearly everyone rushing to update their privacy policies and asking you to reaffirm your consent, the “other” copies of your data have begun to creep into the discussion, for example in a piece that appeared on The Register. Wait, what other copies?!? You know, all those secondary copies of data that you keep:
- The copies sitting in your array or cloud snapshots
- Those copies that you replicated to “site B” for disaster recovery
- The sandbox copies you gave to the developers for them to play with and test against
- That copy you put on your laptop because you were going to work on it outside of the office
- Those copies you made of the data for your regular hourly/daily/weekly/yearly backups
- The copies you have sitting on tapes in a vault because you haven’t defined anything more than “keep it all forever” in your retention policy
You’ve never bothered tracking all of these backup and redundant copies before because the risk was low and storage was cheap, but times have changed. Or have they?
The GDPR police
What happens when the General Data Protection Regulation (GDPR) police come knocking on your door? Well, there aren’t really any actual GDPR police, but there are regulators with audit powers and the ability to fine you firmly wedged in their back pockets. So when someone asks you to forget them, the real challenge begins: do you know where that person’s data is? It could be sitting not just in that one “primary” location happily flipping bits in the data center, but also in any and every one of those copies. What now? Can you “see” all these copies? Can you find the specific data in the copies? Can you delete those specific items without compromising your overall recovery capabilities or other compliance and regulatory mandates? And what happens if you DO manage to delete the data, but the forgotten data gets “remembered” by a restore or other recovery event?
The Information Commissioner’s Office (a member of the European Data Protection Board, EDPB) has indicated that if data can be shown to be beyond normal use (as in a backup), organizations may consider removing it from backups to be a disproportionate response to an erasure request. Of course, the organization must have a documented process, with safeguards ensuring that the data in question is not restored for active processing again.
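One way to picture such a safeguard is a ledger of completed erasures that is kept outside the backup set and re-applied to anything that comes back from a restore. The sketch below is a minimal illustration under stated assumptions, not any product’s implementation: `ForgetLedger`, `record`, `reapply` and the `subject_id` field are hypothetical names, and it assumes every restored record can be mapped to a data subject, which real systems often cannot do so cleanly.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ForgetLedger:
    """Hypothetical record of completed erasure ("forget") requests.

    Maps a data-subject ID to the date the erasure was carried out on
    primary systems. It is stored outside the backups, so a restore
    cannot roll it back.
    """
    erased: dict[str, date] = field(default_factory=dict)

    def record(self, subject_id: str, erased_on: date) -> None:
        # Log the erasure so it can be audited and replayed later.
        self.erased[subject_id] = erased_on

    def reapply(self, restored_records: list[dict]) -> list[dict]:
        # Assumption: each restored record carries a "subject_id" key.
        # Records of already-forgotten subjects are filtered out before
        # the restored data is returned to active processing.
        return [r for r in restored_records if r["subject_id"] not in self.erased]


# Usage: a restore brings back a record for someone already forgotten.
ledger = ForgetLedger()
ledger.record("cust-1042", date(2018, 5, 30))
restored = [
    {"subject_id": "cust-1042", "email": "jane@example.com"},
    {"subject_id": "cust-0007", "email": "sam@example.com"},
]
active = ledger.reapply(restored)
print([r["subject_id"] for r in active])  # ['cust-0007']
```

Replaying the ledger is an extra step after every restore, which is one reason re-forgetting can stretch recovery times.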
This does, however, lead to other considerations:
- If you have to “re-forget” personal data after a restore, it could lengthen your recovery SLAs
- Backup software that integrates with service desk software will help you to manage and record these actions
- Process forget requests quickly: if the time you take to act on a forget request plus your backup retention period is shorter than the time you are allowed for the whole “forget” process, the older backup copies expire on their own and your risk is vastly reduced
- Automate the expiration and cleanup of dev and test data or, preferably, anonymize it
- Don’t just focus on applications – unstructured data and laptops probably account for 70-80 percent of your data and will still contain large amounts of personal data/PII
- Archive more – shorter backup retention, combined with content-driven archiving and content indexing will mean you can deal with forget requests much more effectively
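The retention arithmetic behind processing forget requests quickly can be checked with a few lines of code. This is a minimal sketch under simplifying assumptions: the function name `backups_expire_in_time` is hypothetical, only backups taken up to the day the primary copy is erased are assumed to hold the data, each backup is assumed to expire exactly `backup_retention_days` after it was taken, and the GDPR one-month response period is approximated as 30 days.

```python
from datetime import date, timedelta


def backups_expire_in_time(request_received: date,
                           days_to_erase_primary: int,
                           backup_retention_days: int,
                           response_window_days: int = 30) -> bool:
    """Return True if the last backup that could still hold the erased
    data ages out on or before the response deadline.

    Illustrative only: assumes the newest affected backup is taken on the
    day the primary copy is erased and expires after the retention period.
    """
    erased_on = request_received + timedelta(days=days_to_erase_primary)
    last_copy_expires = erased_on + timedelta(days=backup_retention_days)
    deadline = request_received + timedelta(days=response_window_days)
    return last_copy_expires <= deadline


# 3 days to erase + 14-day retention = 17 days, inside a 30-day window.
print(backups_expire_in_time(date(2018, 6, 1), 3, 14))   # True
# 10 days to erase + 90-day retention = 100 days, well past the window.
print(backups_expire_in_time(date(2018, 6, 1), 10, 90))  # False
```

The two levers are visible in the sum: acting faster shrinks the first term, and shorter backup retention (backed by archiving for long-term needs) shrinks the second.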
GDPR means backup is now a backup for another reason
Commvault software can actually delete backup data if required, even single files on