Recently we were alerted to news of a data breach affecting a large local bank group. It was highly publicized, and the institution promptly denied that any data was leaked and maintained that its cybersecurity is still tight (you can't expect any other answer, unless the response starts with "We take your security very seriously!", in which case you know something has hit the fan).
What is a data leak?
A data leak happens when threat actors post publicly about the availability of private/confidential data belonging to an organization, on a website or a dark web site. The post often comes with a ransom payment request, a victim notification or just a public announcement that the organization has been hacked. The most common currency of a data leak is customer data, but we do see corporate data in the mix as well. P&C material, patent-pending research and financial numbers are just some past examples.
The motivation for a data leak varies between threat actors. We used to have a category for casual/teenage hackers who work during school holidays, but that category doesn't seem to exist anymore (I guess with social media/gaming/networking platforms people might be more organised now).
A fictitious case study
To better understand data leaks, I have come up with a simple and FICTITIOUS case study that closely mirrors realistic Malaysian information. The data you see below was generated purely using GenAI and should not have any link to a living person.
So, we have Syarikat Hang Zoro Sdn Bhd (SHZSB) (also fictitious), which has the following customer database. Let's assume this is the complete database.

I'll repeat this - FICTITIOUS DATA generated by GenAI. If this resembles someone living with the exact data, please inform me and I will remove it immediately!
Back to the show. A threat actor called UNOT365 claims to have hacked SHZSB and threatens to release all its customer data to the public if a ransom of 10 BTC is not paid.
The SHZSB Head of Cybersecurity decides to call the bluff and challenges UNOT365 to release proof of the data obtained. UNOT365 responds by releasing the following dataset.
` Daniel Tan, 900115-05-7319
Jason Wong, 910817-04-6611 `
Armed with this new information, the SHZSB Head of Cybersecurity has now confirmed that the data does belong to SHZSB. But herein lies a problem.
The first question any Cybersecurity Head gets asked during a confirmed data leak is ...
When did the breach happen???
All systems seem fine, the IPS/firewall shows normal activity, and there is no database monitoring system, so visibility into database activity is limited. The web logs show almost normal activity.
We're left with only one clue, which is the leaked data that was shared. From the details shared, we have 2 registration dates: Daniel Tan on 12/12/2022 and Jason Wong on 27/04/2023. By logical deduction, the leak must have happened on or after 27/04/2023. Not much to work on.
If we do take 27/04/2023, then the next logical assumption is that the company has lost only about 4 records (those registered up to that date). If the leak happened yesterday, then this figure is completely inaccurate. To know how extensive the leak is, one has to have the full leak data.
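The deduction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a forensic method: the two leaked rows and their registration dates come from the case study, while `customer_db`, the placeholder customers "Customer C" to "Customer J" and all of their dates are invented stand-ins for the SHZSB table, arranged so that four of the ten rows were registered by 27/04/2023.

```python
from datetime import date

# Leaked sample: names and registration dates as stated in the case study.
leaked_sample = [
    {"name": "Daniel Tan", "registered": date(2022, 12, 12)},
    {"name": "Jason Wong", "registered": date(2023, 4, 27)},
]

# HYPOTHETICAL stand-in for the SHZSB customer table. "Customer C" to
# "Customer J" and their dates are invented placeholders.
customer_db = [
    {"name": "Customer C", "registered": date(2022, 10, 3)},
    {"name": "Daniel Tan", "registered": date(2022, 12, 12)},
    {"name": "Customer D", "registered": date(2023, 2, 20)},
    {"name": "Jason Wong", "registered": date(2023, 4, 27)},
    {"name": "Customer E", "registered": date(2023, 6, 1)},
    {"name": "Customer F", "registered": date(2023, 8, 15)},
    {"name": "Customer G", "registered": date(2023, 9, 30)},
    {"name": "Customer H", "registered": date(2023, 11, 11)},
    {"name": "Customer I", "registered": date(2024, 1, 5)},
    {"name": "Customer J", "registered": date(2024, 3, 18)},
]


def earliest_possible_breach(sample):
    """The data cannot have left before its newest record existed."""
    return max(row["registered"] for row in sample)


def records_registered_by(db, cutoff):
    """Rows that already existed on the cutoff date."""
    return [row for row in db if row["registered"] <= cutoff]


lower_bound_date = earliest_possible_breach(leaked_sample)
best_case = len(records_registered_by(customer_db, lower_bound_date))
worst_case = len(customer_db)  # if the breach happened yesterday

print(f"Breach happened on or after: {lower_bound_date:%d/%m/%Y}")
print(f"Records exposed: between {best_case} and {worst_case}")
```

With these placeholder rows the sample only tells us that somewhere between 4 and 10 records are exposed, which is exactly why a two-row proof is so little to work on.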
Considerations around data leaks
Timing plays a crucial part for the defender when it comes to data leaks. Often, threat actors don't publicize leaks right after the hack happens. They take time, sort the data, identify which part is the treasure trove and look at means of monetizing the data (e.g. credit card information gets used and sold immediately).
For an organization that has previously experienced data leaks, this becomes complex. Data leaks often rehash existing leaks and re-release them to the public. Hence, the assumption that a data leak is fresh is always doubtful. Also, smarter threat actors repackage existing leak data and claim it belongs to another organization (e.g. the millions of records from the Nuemera leak being reused to pin a data leak on another organization).
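One way to test the "is this a rehash?" question is to measure how much of the new sample already appears in a leak you have on file. The sketch below is an assumption about how that could be done, not a standard tool: `fingerprint`, `overlap_ratio` and the `prior_leak` list are hypothetical, and `000000-00-0000` is a placeholder value. Only the two IC numbers in `new_sample` come from the case study.

```python
import hashlib


def fingerprint(ic_number: str) -> str:
    """Normalize an IC number to digits only, then hash it.

    Unsalted hashing is a convenience for comparing values without passing
    raw identifiers around; it is not strong protection for such short IDs.
    """
    digits = "".join(ch for ch in ic_number if ch.isdigit())
    return hashlib.sha256(digits.encode("utf-8")).hexdigest()


def overlap_ratio(new_sample, prior_leak) -> float:
    """Fraction of the new sample that already appears in an older leak."""
    new_fp = {fingerprint(ic) for ic in new_sample}
    old_fp = {fingerprint(ic) for ic in prior_leak}
    if not new_fp:
        return 0.0
    return len(new_fp & old_fp) / len(new_fp)


# IC numbers from the sample released in the case study.
new_sample = ["900115-05-7319", "910817-04-6611"]

# HYPOTHETICAL older leak on file: one matching entry in a different
# format, plus a placeholder value.
prior_leak = ["900115057319", "000000-00-0000"]

ratio = overlap_ratio(new_sample, prior_leak)
print(f"{ratio:.0%} of the sample was already in the older leak")
```

A high ratio suggests recycled data; a low one does not prove the leak is fresh, only that it is not a copy of what you already hold.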
Threat actors aren't dumb. Sometimes the sample data is tainted so that they can see whether the company is serious about paying the ransom and getting the data back.
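A defender can at least check a sample for tainting by comparing every field against the system of record, rather than stopping at the first familiar name. The snippet below is a rough sketch under assumptions: `classify_sample` is a hypothetical helper, `db_by_ic` stands in for a lookup against the real customer table, and the altered name "Jason Wang" and the `000000-00-0000` row are invented to show what tainted entries might look like.

```python
def classify_sample(sample, db_by_ic):
    """Label each leaked row as exact_match, partial_match or not_found."""
    results = {}
    for ic_number, name in sample:
        if ic_number not in db_by_ic:
            results[ic_number] = "not_found"      # never in our database
        elif db_by_ic[ic_number].casefold() == name.casefold():
            results[ic_number] = "exact_match"    # genuine record
        else:
            results[ic_number] = "partial_match"  # real ID, altered field
    return results


# Stand-in for the system of record, keyed by IC number (case study values).
db_by_ic = {
    "900115-05-7319": "Daniel Tan",
    "910817-04-6611": "Jason Wong",
}

# HYPOTHETICAL sample: the second and third rows are invented examples
# of tainted data.
sample = [
    ("900115-05-7319", "Daniel Tan"),
    ("910817-04-6611", "Jason Wang"),
    ("000000-00-0000", "Placeholder Person"),
]

report = classify_sample(sample, db_by_ic)
for ic_number, verdict in report.items():
    print(ic_number, verdict)
```

Partial matches are the interesting ones: they show the actor holds real identifiers but has altered the row, so a hasty "confirmed" or "denied" would both be wrong.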
This leaves very little wriggle room for corporates to confirm data leaks without having access to the full data. Paying ransom is illegal; I note references to AML/CTF laws, which can be invoked if someone does pay a ransom. However, paying for data recovery services is perfectly legal. There are legal brokers (from my past experience, mostly in SG) who do data recovery through negotiation and proxy work to obtain "confirmation" of the data being lost (I think you can figure out what I am trying to say). This endeavour takes time, and it is not something organizations can put a fixed timeline on.
Some thoughts
I'm wary of organizations that take an unrealistic amount of time to disclaim or disprove a data leak. As you have seen from the simple example above, that's just 10 rows of data. In reality, it's much more than that. Organizations often respond to and manage such incidents poorly because everyone's on edge, and it's nowhere close to the annual cyberdrill they had in a conference room. Because, you know, the cyberdrill is a mandatory regulatory exercise and operations need to go on with minimal or no disruption.
Sorry, but attackers don't wait for a convenient time or even raise tickets. You just have to deal with the hand given to you at that point in time.
Author: ORCID ID - Suresh Ramasamy: 0000-0003-4562-037X
This article is mirrored on LinkedIn at https://www.linkedin.com/pulse/data-leaks-investigations-ramasamy-cissp-cism-gcti-gnfa-gcda-cipm-ldync