Eight million police files have been deleted in Dallas – was it an accident? Negligence? Something more sinister?
And what does it mean for the 175,000 cases that have been impacted?
Officials are scrambling to find out. A September 30 report from the City of Dallas offers some answers.
So how does something like this happen?
The report found that the IT team was attempting to move content from the cloud onto on-premises servers due to cost constraints. The files were heavy with images and video, and storage at that scale gets expensive in the cloud (the City had originally estimated Azure storage at about $60k a year; it ended up costing $1.8M while handling only about 7% of their IT processes).
The data in scope of the migration back to the on-prem network was considered ‘archival’, not operational. However, police have since identified over 1,000 ‘priority’ cases for which they are urgently trying to recover copies, so the data clearly still had continuing value. But the business case was driven entirely by cost. The report found that the leadership team did not pause to consider the risks or review data migration best practices, and they had no recovery plan in the event of something going wrong.
And something did go wrong. The IT team did not follow the vendor’s instructions and SOPs for migrating data, and ignored multiple warnings in the interface that data would be lost.
The City is now searching every system they can for any old copies of parts of the data, and reuploading it (back into that expensive Azure cloud).
There are three main takeaways from this that I think apply to everyone managing data:
1: Archives are not records management systems.
2: Migration is not a copy/paste exercise, it’s always a complex project.
3: Organisations need the ability to discover their data, across their whole environment.
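Takeaway 2 has a concrete shape. As a minimal sketch (in Python, with hypothetical paths — this is not the City's tooling), here is the kind of verify-before-delete step a migration like this needs: copy each file, compare checksums of source and destination, and only mark a source file as safe to remove once the copy is proven intact. Deletion itself stays a separate, reviewed step.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path: Path) -> str:
    """Hash a file in chunks so large video files don't exhaust memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def migrate_verified(src_root: Path, dst_root: Path) -> list[Path]:
    """Copy every file under src_root to dst_root, verify each copy
    byte-for-byte via checksum, and return the sources proven safe to
    delete. Nothing is deleted here -- any mismatch aborts the run."""
    safe_to_delete = []
    for src in src_root.rglob("*"):
        if not src.is_file():
            continue
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        if sha256(src) != sha256(dst):
            raise RuntimeError(f"Checksum mismatch for {src} -- aborting")
        safe_to_delete.append(src)
    return safe_to_delete
```

Even a few lines like these would have turned "the data is gone" into "the copy failed, the source is untouched".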
The City is now engaging contractors to help search across the network for any possible copies of the important records. They are crafting searches in Office 365 (although it’s taking them an hour to create each string). The good news is that there will almost certainly be duplicates of much of this data, all over the network and on devices. The bad news is that manual searching won’t find all of it, and won’t do so affordably (and this whole problem started because cost was a constraint).
With AI technology, we can rapidly search whole networks in seconds. It’s simple to create taxonomies of terms and run them instantaneously, across every system and any data format. Last month, we were contacted by an organisation experiencing a data breach on a Friday night. By Saturday afternoon, we had deployed in their network, run across all of their compromised data, and found everything with material risk (including, for example, IP addresses, passwords, PII, credit card and passport numbers, and information on topics subject to secrecy provisions under law). Importantly, we had also coded and applied their Records Authority, so we could tell which of the spilled data should already have been disposed of (which helps determine their scope of liability).
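The pattern-matching core of that kind of sweep is not exotic. A toy sketch of scanning text for some of the categories mentioned above — the regex patterns here are illustrative placeholders, not a real product taxonomy, and a production version would add validation such as Luhn checks on card numbers:

```python
import re

# Illustrative patterns only -- a real taxonomy of terms would be far
# richer and would validate candidate matches before reporting them.
PATTERNS = {
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}


def scan(text: str) -> dict[str, list[str]]:
    """Return every match found in the text, grouped by category."""
    hits = {}
    for label, pattern in PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[label] = found
    return hits
```

Run over a document store, a scanner like this produces exactly the inventory a breach response needs: what sensitive material exists, and where.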
These kinds of data confidentiality, integrity, and availability breaches happen every day, all over the world. They’re not always avoidable, but they can be mitigated. And the only way to effectively mitigate them is to have the tools on hand to know what you have and where it is, all the time.