Collection: How It Differs From Preservation

Of all the eDiscovery stages, data collection is widely considered the most complex and technically rigorous, and that complexity can lead to a lot of confusion. In this article, we hope to clear some of it up.

Even experienced professionals have a hard time differentiating between the particulars of data collection and preservation. If you use data collection as a way to preserve information to meet the court's requirements, you will end up wasting a lot of resources. Think of preservation as a way to ensure that relevant information isn't deleted. Courts expect you to take the necessary steps to preserve relevant data, but they do not prescribe a particular preservation method.

Data collection is the first step toward producing relevant documents. Not all collected documents will be produced, but every collected document enters the review process, which determines what gets produced.


Types of ESI That Must Be Collected

When it comes to eDiscovery, almost any form of electronic data can be subject to discovery. While it is important to preserve the different forms of ESI, you don't have to collect all of them. Let's look at the common categories of ESI that you may need to collect:

Active Data

Active data is the data you interact with regularly, such as traditional files or emails stored on a network or a local hard drive. This form of ESI is easy to access and collect. For data from messaging apps such as Slack, you will need a connector to simplify collection for eDiscovery.


Cloud Data

Cloud data ranges from social media accounts to data held in SaaS applications. In the past decade, the volume of cloud data has increased significantly, and every cloud provider has its own policies for accessing data. With an advanced eDiscovery solution such as Casepoint, you can access this data easily.


Mobile Data

To collect data from mobile phones, you will need experts and sophisticated tools. Be prepared to collect ESI from instant messaging apps, text messages, call logs, geolocation data, and more.


Offline Data

Offline data is data that has been stored or archived and is no longer active. Collecting it is fairly easy as long as you know where it is physically stored.


Backup Data

Backup and disaster recovery systems are designed to ensure that data can be restored. Backups are typically compressed files that can't be easily searched or accessed, so collecting from them presents many hurdles.


Hidden Files

Hidden files are fragmented or previously deleted files that are not readily visible and may exist on multiple systems. Accessing these files is difficult, and recovering them requires specialized solutions.


Difference Between Forensic Image and Logical Copy

When you collect data from a hard drive, in most cases you will get a forensic image of the drive, also known as a bit-stream or bit-by-bit copy. A forensic image is an exact duplicate of the drive: in addition to everything you would expect to see while browsing the file listing, it includes the areas of the drive that are not allocated to active files. As a result, it also captures data from deleted files. Forensic imaging requires special tools.
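To make the "bit-by-bit" idea concrete, here is a minimal Python sketch that reads every byte of a source in fixed-size chunks, writes it to an image file, and computes a SHA-256 hash of the stream along the way. It is an illustration under stated assumptions, not a forensic tool: the helper name `image_source` and the chunk size are hypothetical, the source could be a raw device such as `/dev/sdb` (reading one usually requires administrator rights), and real forensic imaging relies on dedicated tools and write blockers so the original drive is never modified.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary illustrative choice


def image_source(source_path: str, dest_path: str, chunk_size: int = CHUNK_SIZE) -> str:
    """Copy every byte of source_path to dest_path and return the SHA-256 of the stream.

    Because the source is read as raw bytes rather than as a list of files,
    copying a whole device this way also copies unallocated areas and
    deleted-file remnants. (Hypothetical helper for illustration only.)
    """
    digest = hashlib.sha256()
    with open(source_path, "rb") as src, open(dest_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of the source reached
                break
            digest.update(chunk)
            dst.write(chunk)
    return digest.hexdigest()
```

The returned hash can be recorded at imaging time so the image can later be re-hashed and compared against it.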

A logical copy is a copy of the contents of the directories on a disk. It captures only the active files, so it contains neither the unallocated space that a forensic image would capture nor previously deleted data. Logical copies are also less technically intensive and can be made by anyone with the right collection software and a little training.
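By contrast, a logical copy works at the file level. The sketch below, which assumes the custodian's data is reachable as an ordinary directory, walks the directory tree and copies only the files the file system exposes, keeping their relative paths. The helper name `logical_copy` is hypothetical; `shutil.copy2` is the standard-library call that also carries over metadata such as modification times where the platform supports it.

```python
import shutil
from pathlib import Path


def logical_copy(source_dir: str, dest_dir: str) -> list[str]:
    """Copy the active files under source_dir into dest_dir, keeping relative paths.

    Only files visible through the file system are copied; unallocated space
    and deleted data are not. Returns the relative paths that were copied.
    (Hypothetical helper for illustration.)
    """
    src_root = Path(source_dir)
    dst_root = Path(dest_dir)
    copied = []
    for path in sorted(src_root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(src_root)
            target = dst_root / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copies content plus basic metadata
            copied.append(rel.as_posix())
    return copied
```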

For eDiscovery, a logical copy meets the court's expectations in most cases. Forensic imaging is typically required when you need to recover previously deleted files or when data tampering is suspected.

Different Data Collection Methodologies

Before deciding which data collection methodology to use, consider a few factors:

  • The volume of data involved
  • The number of data sources and their accessibility
  • Specialized tools or skills needed for the collection
  • Involvement of sensitive or encrypted data
  • Available internal IT resources 
  • Time constraints
  • Technologies available for data collection
  • Civil or criminal case

Using these factors, you can determine the right data collection strategy. Here are some of the most common types of collection strategies you can use:

Employee Self-Collection

In this method, custodians copy the relevant files to a portable storage device or a shared drive. Experts don't recommend this method, because not every employee is technically savvy enough to do it correctly. Custodians might overlook relevant documents or make mistakes, and the court might even question the method. However, it can still be a useful option for small matters that involve low data volumes. If you opt for this method, it is advisable to obtain sign-off from the judge and the opposing party.

IT Collection

This is the most common data collection method: members of your IT team work with the legal department to perform the collection. Although this makes sense on paper, it has certain downsides. If your organization has limited IT resources, it can be a time-consuming process. Your IT team might also equate data collection with forensic imaging and end up over-collecting data, which drives up your eDiscovery costs.

External Collection

If your organization has limited IT resources, it makes sense to hire a third-party expert to perform the ESI collection. Such experts have established procedures and the skills and tools required to collect data that will hold up to intense judicial scrutiny. The trade-off is that you will spend a considerable amount of money.

Remote Collection

In this method, an IT professional handles the data collection process remotely, without direct, hands-on interaction with the data sources. This is typically a much quicker and more efficient method, and it supports targeted collections through analytics and search technologies. Many experts consider it the most cost-effective approach, especially for large organizations.

Developing a Data Collection Strategy

No single data collection strategy applies to every case, so it is important to have a flexible plan that can be adapted to each one. For instance, in high-stakes matters that involve volatile data sources, where data is at risk of being lost or altered, it is important to collect data immediately. In other cases, immediate collection might not be required, especially if you already have a preservation process in place.

Highly relevant data must be collected early. However, collecting too early and too broadly can drive up your costs while offering little benefit.

Another factor to consider is how the case strategy will affect your collection strategy. If you are aiming for an early settlement, collecting and processing data may not be required.

You should also consider whether you need outside experts and whether any employees have an incentive to delete or alter relevant data.

With all of this in mind, you will be able to develop a data collection strategy that suits your case.

Validating the Collection

If the collection process alters the contents of the data, or if there is suspicion of bad faith, data collection can lead to disputes. When these controversies occur, you can rely on a few mechanisms to prove that your eDiscovery collection process is defensible:

Chain of Custody

Chain of custody refers to the documentation of how evidence was collected, moved, handled, and stored, from the time it came into your possession until the time it is presented or transferred. With thorough chain of custody documentation, you can prove the document's authenticity.
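If custody records are kept electronically, one way to make them tamper-evident is to chain each entry to the hash of the previous entry, so that silently editing an earlier entry breaks every later link. The sketch below illustrates that idea under stated assumptions: the hash-chaining technique is our illustrative choice rather than a court requirement or a description of any product, and the class `CustodyLog` and its field names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class CustodyLog:
    """Append-only chain-of-custody log (hypothetical illustration).

    Each entry stores the hash of the previous entry, so editing or
    deleting an earlier entry breaks the links that follow it.
    """
    entries: list = field(default_factory=list)

    def record(self, timestamp: str, actor: str, action: str, item_sha256: str) -> dict:
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else "0" * 64
        body = {
            "timestamp": timestamp,
            "actor": actor,
            "action": action,  # e.g. "collected", "transferred", "stored"
            "item_sha256": item_sha256,  # hash of the evidence item itself
            "prev_hash": prev_hash,
        }
        body["entry_hash"] = self._hash(body)
        self.entries.append(body)
        return body

    @staticmethod
    def _hash(body: dict) -> str:
        # Hash a canonical JSON form of every field except the entry's own hash.
        payload = {k: v for k, v in body.items() if k != "entry_hash"}
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

    def verify(self) -> bool:
        """Return True only if every entry is unmodified and correctly linked."""
        prev_hash = "0" * 64
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash or entry["entry_hash"] != self._hash(entry):
                return False
            prev_hash = entry["entry_hash"]
        return True
```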

Hash Values

A hash value, or digital fingerprint, is a fixed-length value computed from a file's contents by a cryptographic hash function such as MD5 or SHA-256. If the file's contents change, even by a single bit, the hash value will change as well. By comparing hash values computed before and after ESI collection, you can show that there was no tampering.
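The before-and-after comparison can be sketched in a few lines of Python with the standard `hashlib` module. This is a minimal illustration, assuming the item being collected is a single file; the helper names `sha256_of` and `collect_and_verify` are hypothetical, and a real workflow would also record both hash values in the collection documentation.

```python
import hashlib
import shutil


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_and_verify(source: str, destination: str) -> tuple[str, str, bool]:
    """Hash the source, copy it, hash the copy, and report whether the hashes match.

    (Hypothetical helper for illustration.)
    """
    before = sha256_of(source)
    shutil.copy2(source, destination)
    after = sha256_of(destination)
    return before, after, before == after
```

If the copy is later modified in any way, recomputing `sha256_of` on it will no longer match the recorded `before` value.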


Audit Trail

Audit trails are automated records that track user activity. You can use them to show when data collections took place, how much data was collected, and more.
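As an illustration of how audit records answer "when" and "how much," the sketch below summarizes collection events per custodian. The record layout (`AuditEvent` and its fields) is a hypothetical schema assumed for this example; real eDiscovery platforms define their own audit formats.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditEvent:
    """One automated audit record (hypothetical schema for illustration)."""
    timestamp: datetime
    user: str
    action: str  # e.g. "collect", "search", "login"
    custodian: str
    file_count: int
    bytes_collected: int


def summarize_collections(events: list[AuditEvent]) -> dict[str, dict]:
    """Per custodian: first and last collection time, total files and bytes collected."""
    summary: dict[str, dict] = defaultdict(
        lambda: {"first": None, "last": None, "files": 0, "bytes": 0}
    )
    for e in events:
        if e.action != "collect":
            continue  # ignore non-collection activity such as searches
        s = summary[e.custodian]
        s["first"] = e.timestamp if s["first"] is None else min(s["first"], e.timestamp)
        s["last"] = e.timestamp if s["last"] is None else max(s["last"], e.timestamp)
        s["files"] += e.file_count
        s["bytes"] += e.bytes_collected
    return dict(summary)
```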

Data Processing

Data processing is the next step in the eDiscovery process; it prepares the collected data for review. Once you have collected the data, you will have a mix of different file types, attachments, duplicate files, and meaningless records. eDiscovery processing clears up this mess so that the data can be reviewed.

Because this is a highly technical process, many organizations outsource it to third parties that use specialized technologies to reduce the data volume and deliver the data back for review. Alternatively, you can use an eDiscovery solution that combines data collection and eDiscovery processing.
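Two of the processing steps mentioned above, removing exact duplicates and sorting out file types, can be sketched as follows. This is a simplified stand-in, assuming collected files sit in a local directory and that "duplicate" means byte-for-byte identical content; the helper name `dedupe_collection` is hypothetical, and it reads each file fully into memory, which a production tool would avoid.

```python
import hashlib
from collections import Counter
from pathlib import Path


def dedupe_collection(root: str) -> tuple[list[Path], Counter]:
    """Keep one copy of each unique file (by SHA-256) and tally the kept files by type.

    (Hypothetical helper for illustration.)
    """
    seen: set[str] = set()
    unique: list[Path] = []
    types: Counter = Counter()
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            continue  # exact duplicate of a file already kept
        seen.add(digest)
        unique.append(path)
        types[path.suffix.lower() or "(none)"] += 1
    return unique, types
```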

Data Collection Best Practices

Data collection requires technical expertise. The following practices can help ensure that this dynamic, multifaceted process runs smoothly:

Don't Over-Collect

It might be easy to copy a custodian's entire hard drive, but it isn't smart: the more data you collect, the more data you will have to process and review. What you need are strong early case assessment processes and preservation policies, so that you can target your data collection. An advanced eDiscovery solution such as Casepoint can help you collect potentially relevant ESI and nothing more.
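Targeting a collection usually comes down to applying criteria, such as a date range and search terms, before anything is copied. The sketch below assumes those criteria have already been set during early case assessment; the `Document` type and the `targeted_selection` helper are hypothetical stand-ins, and real tools apply far richer filters.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    """Minimal stand-in for an item found in a custodian's data (hypothetical)."""
    path: str
    modified: date
    text: str


def targeted_selection(docs, start: date, end: date, keywords) -> list[Document]:
    """Select only documents inside the date range that mention at least one search term."""
    terms = [k.lower() for k in keywords]
    return [
        d for d in docs
        if start <= d.modified <= end and any(t in d.text.lower() for t in terms)
    ]
```

Only the selected documents would be collected; the rest are left where they are.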

Be Proactive

Every matter brings its own challenges, and it is important to recognize them early. Assessing your needs proactively will help you determine whether you need outside resources. Even if you don't, it will help your IT and legal teams prepare for whatever comes their way.

Tier Your Collections

To avoid over-collecting data, carry out data collection in tiers or phases. Start by prioritizing data so that the most relevant data is collected immediately; you can then collect less relevant data when it is needed.
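A tiered plan can be as simple as assigning each data source a priority tier and splitting the sources into "collect now" and "defer" lists. The sketch below assumes the tiers have already been assigned by the case team; the `Source` type and the `collection_plan` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    tier: int  # 1 = most relevant, collect now; higher numbers = collect later if needed


def collection_plan(sources: list[Source], tiers_now: int = 1) -> tuple[list[str], list[str]]:
    """Split data sources into 'collect now' and 'defer' lists by priority tier."""
    ordered = sorted(sources, key=lambda s: (s.tier, s.name))
    now = [s.name for s in ordered if s.tier <= tiers_now]
    later = [s.name for s in ordered if s.tier > tiers_now]
    return now, later
```

Raising `tiers_now` later moves the next tier into scope without redoing the earlier collection.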

Create an ESI Repository

Cases can overlap: you might face multiple lawsuits involving the same people, data, or issues. Instead of creating multiple copies of the same data, you need a centralized repository and a systematic process for reusing data.
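One way to picture a reusable repository is a content-addressed store: each item is keyed by its hash, stored once, and tagged with every matter that uses it. The in-memory sketch below assumes that design for illustration; the class `EsiRepository` is hypothetical, and a real repository would persist data and record chain-of-custody details.

```python
import hashlib
from collections import defaultdict


class EsiRepository:
    """Content-addressed store: each unique item is kept once and tagged with its matters.

    (Hypothetical in-memory sketch for illustration.)
    """

    def __init__(self) -> None:
        self._items: dict[str, bytes] = {}
        self._matters: dict[str, set[str]] = defaultdict(set)

    def add(self, content: bytes, matter: str) -> str:
        key = hashlib.sha256(content).hexdigest()
        self._items.setdefault(key, content)  # store the bytes only the first time
        self._matters[key].add(matter)
        return key

    def items_for(self, matter: str) -> list[str]:
        """Return the keys of all items associated with a matter."""
        return sorted(k for k, ms in self._matters.items() if matter in ms)

    def __len__(self) -> int:
        return len(self._items)
```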

Data Collection Tools

Many tools can support your data collection needs. Let's look at some specific systems you should consider:

In-Place Processing

Processing usually takes place after data has been collected. However, newer technologies can collect and process data at the same time, so you don't have to send the collected data to a third party.

Pre-Collection Analytics

Pre-collection analysis requires tools capable of crawling multiple data sources and providing insights such as document volume. These tools can also perform advanced filtering and searching to help you find relevant content, giving you the visibility you need for a targeted data collection.
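A basic form of pre-collection analytics is a volume report: crawl the sources and count documents and bytes by file type without copying anything. The sketch below assumes the sources are reachable as directory trees; the helper name `volume_report` is hypothetical, and commercial tools also crawl email servers, cloud applications, and other systems through their own integrations.

```python
import os
from collections import Counter


def volume_report(roots: list[str]) -> dict:
    """Crawl directory trees and summarize document count and size by file type.

    Nothing is copied; the report only estimates what a collection would involve.
    (Hypothetical helper for illustration.)
    """
    counts: Counter = Counter()
    sizes: Counter = Counter()
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                ext = os.path.splitext(name)[1].lower() or "(none)"
                counts[ext] += 1
                sizes[ext] += os.path.getsize(os.path.join(dirpath, name))
    return {
        "total_files": sum(counts.values()),
        "total_bytes": sum(sizes.values()),
        "by_type": {ext: (counts[ext], sizes[ext]) for ext in sorted(counts)},
    }
```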

Data Source Integrations

Your eDiscovery collection software should integrate with your data sources, such as structured databases, email servers, and Slack. This streamlines the collection process and enables the IT team to handle collection remotely. It also lets non-IT professionals get more involved in the data collection process.
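Integrations are easier to manage when every source is accessed through one common interface. The sketch below shows that design under stated assumptions: the `Connector` interface, the `FolderConnector` file-share implementation, and `collect_all` are all hypothetical, and an email or Slack connector would implement the same two methods against that system's own API, which is not shown here.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Iterator


class Connector(ABC):
    """Hypothetical common interface that each data-source integration implements."""

    name: str

    @abstractmethod
    def list_items(self) -> Iterator[str]:
        """Yield identifiers for the items available in this source."""

    @abstractmethod
    def fetch(self, item_id: str) -> bytes:
        """Return the raw content of one item."""


class FolderConnector(Connector):
    """Example implementation for a file share (hypothetical)."""

    def __init__(self, root: str) -> None:
        self.name = f"folder:{root}"
        self.root = Path(root)

    def list_items(self) -> Iterator[str]:
        for path in sorted(self.root.rglob("*")):
            if path.is_file():
                yield path.relative_to(self.root).as_posix()

    def fetch(self, item_id: str) -> bytes:
        return (self.root / item_id).read_bytes()


def collect_all(connectors: list[Connector]) -> dict[tuple[str, str], bytes]:
    """Collect every item from every connected source through the same interface."""
    return {(c.name, item): c.fetch(item) for c in connectors for item in c.list_items()}
```

Because `collect_all` only depends on the interface, adding a new source means writing one new connector rather than changing the collection workflow.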

Spot Collectors

Occasionally, you will need access to data that isn't on the network. Spot collectors are portable devices that let you collect data from non-networked systems. You can preconfigure these devices to collect only relevant information.

Mobile Collection Tools

Collecting mobile data comes with its own set of challenges. You will need specialized solutions that can extract data from mobile devices and reformat it for review and production.


The process of data collection is changing. If you want to stay ahead of the game, you need to develop a modern collection strategy and use solutions that are capable of handling it. eDiscovery solutions such as Casepoint are an essential part of this.