Dark Data, Real Liability: What General Counsel Need to Know About Unmanaged Information

Every organization generates vast amounts of data – much of it forgotten, unmanaged, and invisible. Think old email archives, legacy backup servers, migrated-but-not-deleted file shares, or unauthorized messaging apps on employee devices. This is dark data: information that accumulates silently, creating significant legal, financial, and security risk. Research suggests approximately 55% of an organization’s data is dark, accounting for ~60 zettabytes of global storage in 2025, a figure expected to grow 20% by 2027. More alarming still, nearly 70% of organizations don’t know what sensitive data they store or where it resides.[1] [2]

Dark data accumulates for many reasons: poor information governance, staff turnover, siloed systems, or a “save everything” culture. Whatever the cause, the scale and lack of visibility create serious risks, particularly in litigation and cyber incidents.

Risks associated with dark data

  1. Hidden evidence – missing the “smoking gun”

Dark data may contain vital evidence: an email chain proving or contradicting a claim, document drafts showing how a decision was reached, or unmonitored communications between key witnesses. Overlooking it can mean being blindsided by evidence you didn’t know existed, undermining your litigation process and case strategy.

  2. Increased eDiscovery costs

Lack of visibility drives over-collection, especially under deadline pressure. This inflates downstream costs for processing, review, and hosting, much of it spent on redundant or irrelevant data. The darker your data, the more expensive eDiscovery becomes.

  3. Harder, slower, less effective eDiscovery

Without knowing what data exists or where, teams spend time sifting through disorganized, duplicative, and obsolete data. The result: more human review, longer timelines, and unnecessary extension requests, all of which compound the cost and complexity of an already demanding process.

  4. Increased cyber security risks

Cybercriminals actively seek out forgotten, unmonitored corners of a network: unpatched legacy servers, unencrypted environments, or historic file shares sitting outside the active security perimeter. These systems often contain high-value targets: financial records, customer data, and plain-text credentials that enable deeper network infiltration.

A recent example is the 2025 Oracle Health data breach, in which hackers compromised obsolete servers that were unpatched and internet-facing. Those servers contained medical details and personal information for over 14,000 individuals.[1] Attackers know these dark corners exist, often better than the organizations themselves.

  5. Regulatory, financial and reputational damage

Regulatory frameworks such as GDPR require organizations to retain personal data only for as long as necessary, and violations can be costly. In 2020, Telecom Italia was fined €27.8m for retaining customer data beyond both legal requirements and its own internal policy.[2]

Reputational damage can be equally severe. The Post Office Horizon IT Inquiry is a stark example: poor data mapping, multiple legacy systems, and failure to identify potentially unique emails led to profound credibility loss, public distrust, and suspicion that evidence had been withheld.[3]

Practical steps to reduce and manage dark data

The risks are real, but manageable. Here are four practical approaches to regain control:

  1. Comprehensive data inventory

Use automated tools to map file shares, cloud platforms, and legacy systems. Technology alone is not enough; understanding how departments generate, store, and access data is equally critical. Together, the two give you a clearer picture of your data landscape and support more accurate scoping during litigation and incident response.
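For teams that want to see what the automated part of an inventory involves, the following is a minimal Python sketch, not a substitute for a commercial mapping tool. It walks a single file share and flags files that have not been modified for a given period. The names `FileRecord`, `inventory`, and `stale`, and the 365-day threshold, are illustrative assumptions rather than anything prescribed by a standard or vendor.

```python
import os
import time
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FileRecord:
    path: str
    size_bytes: int
    days_since_modified: float


def inventory(root, now=None):
    """Walk `root` and return one FileRecord per regular file found."""
    now = time.time() if now is None else now
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            try:
                stat = full.stat()
            except OSError:
                continue  # unreadable or vanished file: skip it
            age_days = (now - stat.st_mtime) / 86400  # seconds per day
            records.append(FileRecord(str(full), stat.st_size, age_days))
    return records


def stale(records, threshold_days=365):
    """Return records not modified within `threshold_days` (assumed cut-off)."""
    return [r for r in records if r.days_since_modified > threshold_days]
```

Calling `stale(inventory("/path/to/share"))` on a hypothetical share would return the candidates for closer review. Last-modified time is only a rough proxy for relevance, so a list like this is a starting point for conversations with the departments that own the data, not a deletion list.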

  2. Classify and tag data based on risk

Once mapped, data should be categorized by risk. Sensitive information (customer data, financial records, medical information) must be handled per regulatory requirements. Tools like Microsoft Purview can automatically apply classification labels based on keywords, sender, or document location, reducing accidental exposure and supporting defensible, compliant decision-making.
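To illustrate the idea of keyword-based labelling, here is a small Python sketch. It does not use or imitate the Microsoft Purview API; the `RULES` list, the label names, and the `classify` function are hypothetical and deliberately simplistic. A production rule set would be far broader and would also draw on sender and document location.

```python
import re

# Hypothetical rules: (label, pattern). Real rule sets are much richer.
RULES = [
    ("medical", re.compile(r"\b(diagnosis|patient|prescription)\b", re.I)),
    ("financial", re.compile(r"\b(iban|invoice|account number)\b", re.I)),
    ("personal", re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")),  # email address
]


def classify(text):
    """Return every label whose pattern appears in `text`."""
    labels = [label for label, pattern in RULES if pattern.search(text)]
    return labels or ["unclassified"]


print(classify("Patient invoice attached"))   # ['medical', 'financial']
print(classify("Contact jane@example.com"))   # ['personal']
print(classify("Lunch menu for Friday"))      # ['unclassified']
```

Simple pattern matching like this will produce both false positives and misses, which is one reason classification results should feed human review rather than replace it.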

  3. Implement strict data retention policies

Organizations should implement retention policies aligned with legal and regulatory requirements such as GDPR and HIPAA. Examples include deleting chat messages after 30 days or auto-purging file share documents after 90 days. A key component is defensible deletion of ROT – Redundant, Obsolete, and Trivial – data with no legal, regulatory, or business value. Eliminating ROT reduces both your attack surface and your litigation burden.
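The retention examples above can be expressed as a simple rule check. The Python sketch below uses the 30-day and 90-day periods from the text; the `Item` structure, the `legal_hold` flag, and the `due_for_deletion` function are assumptions added for illustration. The legal hold exemption in particular is not described above, but is included because data under a preservation obligation is normally excluded from routine deletion.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Retention periods in days, taken from the examples in the text.
RETENTION_DAYS = {"chat": 30, "file_share": 90}


@dataclass
class Item:
    item_id: str
    kind: str               # e.g. "chat" or "file_share"
    last_modified: date
    legal_hold: bool = False  # assumed flag: never auto-delete held data


def due_for_deletion(items, today):
    """Return IDs of items past their retention period and not on hold."""
    due = []
    for item in items:
        limit = RETENTION_DAYS.get(item.kind)
        if limit is None or item.legal_hold:
            continue  # no policy for this kind, or preserved for litigation
        if today - item.last_modified > timedelta(days=limit):
            due.append(item.item_id)
    return due
```

Items of a kind with no defined policy are left alone rather than deleted by default, a conservative design choice that keeps the deletion decision defensible.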

  4. Review and repeat

Data governance is not a one-time exercise. As systems, staff, and regulations evolve, dark data will creep back in without periodic review. Organizations should audit policies and data inventories at least annually, treating dark data management as an ongoing program.

Conclusion

Dark data is out of sight, but its impact is very real. While complete eradication is unlikely, good governance – inventory, classification, retention, and regular review – can reduce exposure, enable faster and more defensible decisions, and strengthen both eDiscovery and cybersecurity resilience.


[1] 80 Hospitals May Have Been Affected by the Oracle Health Data Breach

[2] MARKETING: THE ITALIAN SA FINES TIM EUR 27.8 MILLION | European Data Protection Board

[3] How legal disclosure failures disrupted the Post Office Horizon inquiry | Computer Weekly


[1] The State of Dark Data

[2] Dark Data Statistics For 2025–2026

About the author(s)

James Grant | KLDiscovery