A Guide to Managing High-Risk Information with Data Mapping

Four ways organisations can discover the high-risk and dark data they hold, and use data mapping to understand and mitigate risk. Find out how you can take a proactive approach to minimising the impact of a data breach and protect your organisation and customers.

In March 2023 alone, the records of 41,970,182 people were breached, and those are only the ones we know about. Latitude Financial alone spilled 14 million records, including 8 million driver’s licences and 53,000 passport numbers. And they didn’t know their own data or high-risk information: they originally thought the compromised dataset affected only 300,000 people (a fraction of the actual scale). If your network were breached, would you be in a similar situation? Would you know what information you had lost? And if you didn’t know, and couldn’t explain, what repercussions would you face?

It’s not just big financial institutions having these experiences. Recently, universities, health providers, school districts, car makers, phone providers, sports bodies, retailers, and even a gun auction company reported breaches. This last one is hugely concerning, as the data linked owners, and all their contact details and addresses, with the specific weapons they purchased, making them highly vulnerable to targeted burglaries.

Knowing the risk

These numbers came from just 100 reported breaches. We know that around 75% of breaches were not reported. We also know that 87% of small businesses, who were not required to report, have customer data that could be compromised, and nearly half of all breaches happened to organisations with fewer than 1,000 staff. Whatever size the business is or whatever industry it belongs to, it will be at risk.

Let’s consider a case study that highlights why knowing your data is so important. Optus, Australia’s second-largest telco, suffered a breach in September 2022. This breach could have happened to anyone, but the size of the organisation and the amount of customer data it held made it a high-profile one. Optus had to rally 120 staff to help find out what data was spilled. Their vendors couldn’t help them. They had to build their own software application from scratch, just to try to map and understand what data was in the spill. Optus was strongly criticised by the government, journalists, and on social media for how long this took, and also for holding a significant volume of data they didn’t need to keep. They earmarked $140M to cover the immediate cost of the breach, but then also faced class action in the billions.

While breaches are almost inevitable, if we don’t know what we have, and have not taken proactive steps to responsibly dispose of information, we can’t manage the risk. When it comes to sensitive information, the only way to mitigate the impact is to minimise what we no longer need in a timely and responsible way, keep what must be retained in secure systems, and track how it is being managed. More importantly, if we don’t actually know what data we have, we certainly aren’t getting any value from it! So hoarding this ‘dark data’ carries significantly more risk than reward.

Four ways to manage high-risk information by mapping data

High-risk information takes many forms and lurks in many places. It’s not just personal information or classified records that we need to worry about. Many information assets can cause harm if seen by people who shouldn’t see them, and a lot of the time, the people handling those records just aren’t aware of the risk. The first thing governance teams in any business should do is map the Business Impact Level of their information assets. The process is fairly simple.

  • Step 1: make a list of the types of assets you probably have. This will include personal information, legal information, financial records, health records, intellectual property, audit records… broad categories that will usually align with organisational functions. Then consider the impact of a breach of those records, from a confidentiality perspective as well as integrity (if the data is corrupted) and availability (if the data is destroyed or encrypted). You can make your own risk matrix, or use an existing one: most governments have a Business Impact Assessment tool freely available online that helps you consider types of harm, and rank them.
  • Step 2: understand your threat sources. Governments publish cyber threat reports regularly, and these will help you understand the players in the cybersphere. Who might want your data? How motivated are they likely to be? And how capable? If you have any data that a foreign state would want, you are immediately at risk, because state actors are such sophisticated hackers.
  • Step 3: know your secrecy obligations. There are secrecy provisions sitting quietly in dozens of pieces of legislation, and they will result in civil or criminal penalties (sometimes both) if breached. Certain, sometimes very specific, types of information can be time bombs waiting to go off – you can check with your legal team what legislation you need to abide by, and whether those Acts or Regulations have any secrecy provisions in them.
  • Step 4: know your other obligations for your data stores, specifically how long you have to retain the information under the laws that apply. Data is like uranium. It’s powerful, and valuable, but gets dangerous as it’s allowed to decay. You need to be destroying data that has risk, but no longer has value, as soon as you legally can.
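The four steps above can be captured in a simple asset register. What follows is a minimal Python sketch, not a prescribed method: the five-point `Impact` scale, the `AssetType` fields, the `triage` ordering, and every rating, threat actor, provision, and retention period in the example register are hypothetical placeholders. Substitute the risk matrix your organisation actually uses and the obligations your legal team confirms.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class Impact(IntEnum):
    """Hypothetical five-point harm scale; replace with your own risk matrix."""
    NEGLIGIBLE = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4
    EXTREME = 5


@dataclass
class AssetType:
    name: str
    confidentiality: Impact  # Step 1: harm if the data is disclosed
    integrity: Impact        # Step 1: harm if the data is corrupted
    availability: Impact     # Step 1: harm if the data is destroyed or encrypted
    threat_actors: list[str] = field(default_factory=list)       # Step 2
    secrecy_provisions: list[str] = field(default_factory=list)  # Step 3
    retention_years: Optional[int] = None                        # Step 4

    @property
    def business_impact_level(self) -> Impact:
        # Rate the asset by its worst-case dimension rather than an average.
        return max(self.confidentiality, self.integrity, self.availability)


def triage(register: list[AssetType]) -> list[AssetType]:
    """Order asset types from highest to lowest impact level.

    Ties are broken by the number of secrecy provisions recorded.
    This ordering is an illustrative choice, not a standard.
    """
    return sorted(
        register,
        key=lambda a: (a.business_impact_level, len(a.secrecy_provisions)),
        reverse=True,
    )


# Illustrative entries only: all values below are made up.
register = [
    AssetType("Marketing brochures", Impact.LOW, Impact.LOW, Impact.NEGLIGIBLE),
    AssetType(
        "Health records", Impact.EXTREME, Impact.HIGH, Impact.HIGH,
        threat_actors=["criminal groups"],
        secrecy_provisions=["placeholder secrecy provision"],
        retention_years=7,
    ),
    AssetType("Financial records", Impact.HIGH, Impact.HIGH, Impact.MEDIUM,
              retention_years=7),
]

for asset in triage(register):
    print(f"{asset.name}: BIL {asset.business_impact_level.name}")
```

Taking the maximum across confidentiality, integrity, and availability is one reasonable way to assign a single level; some organisations prefer to keep the three ratings separate.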

How does AI play a part?

By now we can see that we have:

  • A lot of data
  • A lot of threats
  • A lot of inherent risk, and
  • A lot of legal obligations

In addition to a big volume problem, we also have a velocity problem. Our information stores are constantly growing and changing. So are our legal obligations, and so is the threat environment. How can we manage this huge scale, with all this complexity, when nothing stands still? The answer is, we can’t – at least not by using traditional approaches. That’s why Optus still had so many records hanging around from people who weren’t even customers any more, and didn’t know where those records were sitting, and couldn’t tell that they’d been taken. Optus had actually pushed back on proposed government privacy reforms in 2020, saying that implementing systems to be able to destroy customer data on request would have ‘significant hurdles’ and ‘significant cost’.

That was before AI.

The role of ethical AI in understanding data

Artificial Intelligence is a catch-all name for a range of technologies that aim to automate some aspect of work that people have traditionally had to do manually. Previously, to know where all the risky and valuable information was, we had to have people manage file structures with a lot of rigour. We had to ask people to use metadata and naming conventions to clearly mark information. This worked OK with paper files, but it all started to go out the window very early in the digital era, as the volume and velocity of data took off. So, for the last 30 years or so, even though technology has gotten better and better, information control has gotten worse and worse. Castlepoint founders recognised this in their roles as auditors around ten years ago, and conceptualised and invented a new kind of explainable AI (XAI) to help solve this problem. Not all AI is the same.

To know and map data effectively, organisations need to be able to read and register everything in the whole environment, and automatically classify it for known risk, rules, and reusability. But organisations run into the issue of the volume of data. For context, it would take 130 years to read one terabyte of Word documents, and even longer to try to manually match them to all the applicable rules. AI was (and remains) the only way to do this at scale in hours or days, without needing anyone to assign any metadata or follow a file plan. However, Castlepoint’s XAI goes one step further. It is explainable and transparent, so organisations know why the rules were applied. It is not based on a Large Language Model (LLM) or a black-box algorithmic AI. With Castlepoint’s XAI, organisations know what data they have, what risk it has, and its value. It shows clearly what rules apply to the information, where it is, and who is doing what to it. This helps organisations discover, from a single interface, what needs to be locked down, and what can be disposed of (across the whole enterprise).
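To make ‘explainable’ concrete, here is a minimal Python sketch of classification in which every label carries the reason and the exact text that triggered it. It is not Castlepoint’s method, which this article does not describe at implementation level. It swaps in a deliberately simple regular-expression rule engine as a stand-in. The `Rule` and `Finding` types, the `classify` function, the two rules, and the passport number format are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    label: str    # a risk category or an applicable obligation
    pattern: str  # regular expression that signals the label
    reason: str   # plain-language justification shown to reviewers


@dataclass(frozen=True)
class Finding:
    label: str
    reason: str
    evidence: str  # the exact text that triggered the rule


# Hypothetical rules for illustration; the passport format is made up.
RULES = [
    Rule(
        "passport_number",
        r"\bpassport\s+(?:no\.?|number)\s*[:#]?\s*[A-Z]{1,2}\d{7}\b",
        "Contains what looks like a passport number",
    ),
    Rule(
        "health_information",
        r"\b(?:diagnosis|prescription|medical history)\b",
        "Mentions health-related terms",
    ),
]


def classify(text: str, rules: list[Rule] = RULES) -> list[Finding]:
    """Return one Finding per rule match, so each label can be justified."""
    findings = []
    for rule in rules:
        for match in re.finditer(rule.pattern, text, flags=re.IGNORECASE):
            findings.append(Finding(rule.label, rule.reason, match.group(0)))
    return findings


sample = "Applicant passport number: N1234567. No prior diagnosis recorded."
for finding in classify(sample):
    print(f"{finding.label}: {finding.reason} (evidence: {finding.evidence!r})")
```

The point of the design is that a reviewer can trace each classification back to a named rule and a specific passage, which is what makes the outcome defensible.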

When considering using AI to help you know your own information, think about Optus. Do you have 120 staff and $140M spare to start the process of knowing your own data after you have already been breached? If not, it pays to start now: understand what you have, where it is, and who is doing what to it, as well as what risk and value it has, and what rules apply (and whether they are being met). These days, we need to assume that being breached is not an ‘if’ question, it’s a ‘when’. We need to minimise the impact of any breach by protecting, and destroying, high-risk information in accordance with regulations, so that our exposure to those bad actors is minimised.
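Destroying information ‘in accordance with regulations’ comes down to a date check once retention periods are known. The Python sketch below assumes a hypothetical `Record` shape, a retention clock that starts at the last action on the record, and a `legal_hold` flag that blocks disposal. Real retention triggers and periods vary by law and record type, so treat every value here as a placeholder.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    record_id: str
    last_action: date         # assumed trigger for the retention clock
    retention_years: int      # minimum period from your retention schedule
    legal_hold: bool = False  # e.g. litigation or an investigation in progress


def earliest_disposal(record: Record) -> date:
    """Return the first date on which the retention period has elapsed."""
    start = record.last_action
    target_year = start.year + record.retention_years
    try:
        return start.replace(year=target_year)
    except ValueError:
        # 29 February with a non-leap target year: roll forward to 1 March.
        return date(target_year, 3, 1)


def due_for_disposal(records: list[Record], today: date) -> list[Record]:
    """List records past retention that are not under a legal hold."""
    return [
        r for r in records
        if not r.legal_hold and earliest_disposal(r) <= today
    ]


# Illustrative records only: dates and periods are made up.
records = [
    Record("A", date(2015, 6, 30), 7),
    Record("B", date(2020, 1, 15), 7),
    Record("C", date(2014, 2, 1), 7, legal_hold=True),
    Record("D", date(2016, 2, 29), 7),
]

for record in due_for_disposal(records, today=date(2023, 4, 1)):
    print(record.record_id, "eligible since", earliest_disposal(record))
```

A list like this is only a starting point for review: the decision to destroy should still be approved and logged, so it can be explained later.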

Transparent, explainable AI can help to do that, and keep you defensible in the event of a spill.