The Importance of AI in eDiscovery

As AI continues to evolve, the way machine learning capabilities are implemented in the eDiscovery process has evolved with it. With features such as predictive coding, email threading, and duplicate detection, sifting through documents has become much easier. Attorneys can now devote more of their time to meaningful work, and less to tedious tasks.

One technique that has transformed how AI is used in eDiscovery is clustering. Read on to learn more.

What is AI Clustering?

Clustering is a form of unsupervised machine learning that groups “similar” items. It identifies the characteristics that documents share, so you can learn about the data set’s composition and take action on all similar documents at once.

Because this is an unsupervised approach, clustering doesn’t let you control the dimensions that define similarity. You do not label example items in a cluster to train the system, and you do not have to specify the features that measure how similar the items are.

Exploring Data Discovery with AI Clustering

As the volume of electronic data continues to expand every second, the ability to organize and sort through data electronically offers a significant advantage to legal professionals. It provides meaningful insights in a short amount of time and with unmatched scalability. With AI clustering, legal professionals can organize documents around key concepts, which means they can sort thousands of documents in minutes or even seconds.

The main goal of AI clustering is to group similar unlabeled items together. By grouping documents on the basis of their content, clustering gives you a better understanding of the concepts and themes in the dataset. It pinpoints conceptually similar documents without requiring you to build a search manually.
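To make the idea concrete, here is a minimal sketch in Python. It assumes a simple bag-of-words representation, cosine similarity, and a greedy similarity threshold. Commercial eDiscovery platforms use more sophisticated methods, and the function names and the threshold value are illustrative assumptions, not any vendor’s implementation.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn a document into a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, threshold=0.3):
    """Greedy clustering: put each document into the first cluster whose
    seed document is similar enough, otherwise start a new cluster.
    No labels and no search terms are supplied (unsupervised)."""
    vectors = [vectorize(d) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


docs = [
    "Quarterly revenue forecast and budget review",
    "Budget review for the quarterly revenue forecast",
    "Lunch menu for the office party",
    "Office party lunch menu and catering",
]
groups = cluster(docs)
print(groups)
```

In this toy example, the two finance documents land in one group and the two office-party documents in another, even though no one told the program what either topic was.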

How eDiscovery Concept Clustering Works

AI clustering is the electronic equivalent of gathering documents, labeling them, and sorting them into boxes that carry matching labels. After similar documents have been grouped together, they are assigned to the same reviewers. This results in a more efficient review process, as related documents are reviewed together.
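As a rough illustration of the assignment step, the sketch below hands whole clusters to reviewers so that related documents stay together while the workload stays roughly balanced. The cluster names, sizes, reviewer names, and the balancing rule are all hypothetical and chosen only for the example.

```python
import heapq


def assign_clusters(cluster_sizes, reviewers):
    """Assign whole clusters to reviewers, largest cluster first, always
    giving the next cluster to the reviewer with the lightest load."""
    # Min-heap of (documents assigned so far, reviewer name).
    loads = [(0, name) for name in reviewers]
    heapq.heapify(loads)
    assignment = {name: [] for name in reviewers}
    for cluster_id, size in sorted(cluster_sizes.items(), key=lambda kv: -kv[1]):
        load, name = heapq.heappop(loads)
        assignment[name].append(cluster_id)
        heapq.heappush(loads, (load + size, name))
    return assignment


# Hypothetical clusters and their document counts.
cluster_sizes = {"contracts": 500, "invoices": 300, "travel": 250, "hr": 100}
plan = assign_clusters(cluster_sizes, ["Alice", "Bob"])
print(plan)
```

Because a cluster is never split, each reviewer sees all the documents on a given theme.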

There are three main functions of AI clustering:

Examines the Text in Documents

Determines Related Documents


In the case of AI clustering, the organization of documents occurs without the use of query terms. Instead, a group of keywords is used to label each cluster. This way, you have a quick explanation of why the documents have been grouped together. The keywords describe the contents of each cluster and indicate its theme.
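One simple way to produce such keyword labels is sketched below: pick the terms that appear in many documents inside a cluster and in few documents outside it. This is an illustrative scoring rule with a tiny, made-up stopword list, not the method used by any particular product.

```python
import re
from collections import Counter

STOPWORDS = {"the", "for", "and", "a", "of", "to", "in"}  # tiny illustrative list


def tokens(text):
    """Lowercase words in a document, minus stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def label_cluster(cluster_docs, other_docs, top_n=3):
    """Rank terms by (documents inside the cluster containing the term)
    minus (documents outside the cluster containing the term)."""
    inside = Counter(t for d in cluster_docs for t in set(tokens(d)))
    outside = Counter(t for d in other_docs for t in set(tokens(d)))
    scores = {t: inside[t] - outside[t] for t in inside}
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_n]


contracts = ["Signed supplier contract renewal", "Contract renewal terms for supplier"]
travel = ["Flight booking confirmation", "Hotel booking for the conference"]

contract_label = label_cluster(contracts, travel)
travel_label = label_cluster(travel, contracts)
print(contract_label, travel_label)
```

The resulting labels give a reviewer a quick hint about each cluster’s theme before opening a single document.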

As a legal professional, you can use machine learning clustering to filter and sort documents. With AI-powered advanced analytics, you can prioritize documents and make critical decisions without spending hours rummaging through hundreds or thousands of documents manually. With clustering, you can prioritize documents with relevant themes. You will also discover unexpected themes that come up during the eDiscovery process and require further review. Clustering can scale to millions of records and support a targeted review, which saves you a lot of time and resources.

Why Clustering Is Critical in eDiscovery

With AI clustering, you have a powerful tool at your fingertips for eDiscovery. It automatically groups conceptually similar content without requiring any input from you. You can use it to quickly identify important documents pertaining to the case.

AI clustering automatically groups irrelevant documents together. These can then be removed from the review or set aside, saving you significant time.

No Clutter

Since documents with similar content are grouped together and assigned to the same reviewer, you can narrow your focus to a certain type of document or topic.

Improved Output

AI clustering groups documents together on the basis of critical subject matter. This helps ensure that the relevant documents are identified during the review process.

Quality Control

If you are training a system for technology-assisted review, you can use machine learning clustering to improve the document set.

Improved Dataset

With AI clustering, you will be able to easily focus on a particular subject matter, even when the data belongs to several custodians.

Improved Focus

Yes, search can also help you identify certain documents. However, it is nowhere near as powerful as clustering. When you are searching through a large volume of data, AI clustering can facilitate a much more thorough and productive process. In the end, you will have less irrelevant data to weed through.

How to Expand AI Expertise in eDiscovery

There is no denying the cost benefits of using AI in eDiscovery. Even if you simply group the documents into conceptual clusters, you will substantially increase your review speed. If you want even more savings and speed, you can use machine learning to identify relevant, irrelevant, and privileged documents. Sophisticated eDiscovery software such as Casepoint identifies patterns in the data in order to build a better understanding of the investigation. This will save you an enormous amount of time.

Once the algorithm has been trained enough on what is considered relevant and irrelevant, it will reach a point where it can set aside the irrelevant documents on its own. The number of hours you save by using AI for the review process will depend on your data set as well as the topics in your investigation.

However, when the data sources are highly relevant on their own, the investigation covers a wide range of topics, or the density of relevant documents is extremely high, using AI in eDiscovery may not yield the same savings. It will still save you time, just not as much as you might have expected.

In some cases, machine learning algorithms are also used to improve the quality of the document review process. The algorithm is trained on a sample of reviewed documents and then classifies previously reviewed documents as relevant, irrelevant, or privileged. This can surface mismatches between the reviewer’s manual coding and the machine learning model. For instance, the algorithm might have coded a document as relevant with a high confidence level, while the reviewer coded it as non-relevant, or vice versa. To improve quality control, these mismatches are resolved by an experienced reviewer.
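The quality-control check described above can be sketched as a simple comparison. The record fields, document IDs, and the 0.9 confidence cutoff below are assumptions made for illustration. A real review platform would supply its own data model and thresholds.

```python
from dataclasses import dataclass


@dataclass
class CodedDocument:
    doc_id: str
    reviewer_label: str      # "relevant", "irrelevant", or "privileged"
    model_label: str         # the model's prediction, same label set
    model_confidence: float  # 0.0 to 1.0


def find_mismatches(docs, min_confidence=0.9):
    """Return documents where a confident model prediction disagrees with
    the reviewer's coding, most confident disagreements first."""
    flagged = [
        d for d in docs
        if d.model_label != d.reviewer_label and d.model_confidence >= min_confidence
    ]
    return sorted(flagged, key=lambda d: d.model_confidence, reverse=True)


# Hypothetical sample of coded documents.
sample = [
    CodedDocument("DOC-001", "relevant", "relevant", 0.97),
    CodedDocument("DOC-002", "irrelevant", "relevant", 0.95),
    CodedDocument("DOC-003", "relevant", "irrelevant", 0.55),
    CodedDocument("DOC-004", "irrelevant", "privileged", 0.99),
]
queue = [d.doc_id for d in find_mismatches(sample)]
print(queue)
```

Only the confident disagreements are queued for the experienced reviewer. The low-confidence mismatch (DOC-003) is left out, since the model itself is unsure.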

How to Leverage AI Clustering in eDiscovery Software

There are a lot of solutions available in the market that use AI to help you with the eDiscovery process. However, not all of these are worth the investment. The depth of expertise for each of these solutions is different. For instance, Casepoint’s AI has been created by people who specialize in machine learning. This expertise makes all the difference.

Imagine you have to search through 10 million emails. In this case, you must be absolutely certain that the algorithm makes the right choice. Even an error rate of just 0.1% means that you might end up missing 10,000 emails. Casepoint’s experts have been working on their algorithms, tweaking them, and refining them. Their in-depth understanding of machine learning and how to adapt it makes them a frontrunner in the eDiscovery field.
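The arithmetic behind that figure is easy to check. The short sketch below assumes the error rate implied by the numbers in the example (10,000 out of 10 million, or 0.1%). The function name is purely illustrative.

```python
def expected_misses(total_documents, error_rate):
    """Number of documents expected to be misclassified at a given error rate."""
    return round(total_documents * error_rate)


missed = expected_misses(10_000_000, 0.001)  # 0.1% of 10 million emails
print(missed)
```

Even a seemingly tiny error rate translates into thousands of documents at this scale, which is why algorithm quality matters so much.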

Redacting Personal Information Using AI

Privacy laws have evolved over the past few decades. When it comes to protecting and redacting personal information, machine learning can be extremely beneficial. Whether the documents contain the client’s trade secrets or communications protected by attorney-client privilege, confidential information can be identified and redacted.

Data Redaction Process

Different countries have different privacy regulations. For instance, in Switzerland, disclosing personal information such as driver’s licenses, credit card numbers, or passport information to foreign regulators is prohibited. In such cases, before you send over the documents, machine learning can be used to find and redact personal data.
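To show what automated redaction looks like in its simplest form, here is a sketch that masks credit card numbers. Note that it swaps in plain pattern matching plus the standard Luhn checksum rather than machine learning, so it is only a stand-in for the ML-based detection described above. The pattern, the mask text, and the sample sentence are assumptions for illustration.

```python
import re

# Assumed pattern: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(number):
    """Luhn checksum, which payment card numbers satisfy."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def redact_card_numbers(text, mask="[REDACTED]"):
    """Replace digit runs that look like card numbers and pass the Luhn check."""
    def replace(match):
        return mask if luhn_valid(match.group()) else match.group()
    return CARD_PATTERN.sub(replace, text)


sample = "Paid with card 4111 1111 1111 1111 on invoice 1234 5678 9012 3456."
redacted = redact_card_numbers(sample)
print(redacted)
```

The checksum keeps the invoice number intact while the well-known test card number is masked. Machine learning models go further by using context to catch personal data that no fixed pattern describes.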


It is important to note that AI is not a replacement for humans. It is an assistive tool that improves job quality instead of eliminating jobs. It slashes the number of menial, time-consuming tasks, allowing you to focus on value-added services. It makes the review process more efficient and the reviewers more productive. Instead of digging through the documents and seeing the same content multiple times, you will be able to quickly locate the documents you need with the assistance of AI. It will help you meet your deadlines and stay under budget.