The arrival of generative AI (GenAI) has created thrilling technological advancements that accelerate growth and innovation across modern business. No company or government agency is immune to the question of how GenAI will impact its operations. However, while GenAI technology has evolved rapidly, the associated government regulations and legal precedents have not, leaving companies at the forefront of GenAI facing new challenges and responsibilities.
How Is GenAI Being Used?
GenAI creates new content (images, text, audio, video, code) by leveraging underlying AI models. Large language models (LLMs) are pre-trained on massive amounts of language and text data, which is what gives GenAI its ability to understand relationships between words and phrases and, as a result, to create meaningful content.
The experience is unlike any traditional Google or Bing search. Enter a simple prompt in a program like Claude (Anthropic) or ChatGPT (OpenAI) and you receive an automatically generated response tailored to your request.
- Ask for a letter to your manager requesting a raise, and the technology produces a new letter that is clear, concise, and fully responsive to any details you provide.
- Ask for a Python script that translates computer virus source code into Java, and the code you receive can be run immediately.
- Ask for an image of a young professional riding the subway in New York on a rainy morning, and you’ll get a picture that has never been seen before.
You don’t need to be a writer, coder, or artist. And no human imagination or effort is required.
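Behind consumer-facing apps like these, the same capability is available through developer APIs. As a rough sketch, generating a response programmatically with the Anthropic Python SDK might look like the following; the model name and token limit shown here are illustrative assumptions, not recommendations.

```python
# A minimal sketch of sending a prompt to a GenAI model programmatically,
# using the Anthropic Python SDK. The model name and max_tokens value
# are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()  # reads the ANTHROPIC_API_KEY environment variable

message = client.messages.create(
    model="claude-3-5-sonnet-latest",  # assumed model identifier
    max_tokens=500,
    messages=[
        {
            "role": "user",
            "content": "Draft a short, professional letter to my manager requesting a raise.",
        }
    ],
)

print(message.content[0].text)  # the generated letter
```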
GenAI can field a seemingly endless number of requests and produce a seemingly endless amount of content in response, which starts to reveal where the risk lies.
GenAI as an Emerging Data Risk
GenAI technology has quickly found its way into almost every aspect of modern life. Employees are using it to do their jobs, from automating repetitive tasks to generating more innovative ideas.
At the same time, prompts are becoming increasingly sophisticated. They represent thoughts and activities that many would consider to be private intellectual property (IP). They can also pertain to highly confidential or sensitive information.
What if an employee writes a prompt that contains a competitor’s IP? Or uses the technology to generate ideas for bribing a prospect?
These examples show how someone might use GenAI for questionable purposes. In every case, prompts and responses should be treated the same way as other data in enterprise systems. Just as emails, documents, and chat messages fall under normal management and retention requirements, so should GenAI activity.
This new data type will be used across legal compliance, litigation, and investigations.
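To make that concrete, here is a hypothetical sketch of what capturing a single prompt/response exchange as a managed enterprise record could look like. The field names, retention period, and legal-hold flag are illustrative assumptions, not a description of Casepoint’s schema or any vendor’s API.

```python
# A hypothetical record type for GenAI activity, managed like email or chat.
# Field names and retention values are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timezone
import uuid


@dataclass
class GenAIInteractionRecord:
    """One prompt/response exchange, captured for compliance and eDiscovery."""

    user_id: str
    model: str      # e.g., "claude" or "gpt-4o"
    prompt: str
    response: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    record_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    retention_days: int = 2555   # hypothetical ~7-year retention policy
    legal_hold: bool = False     # set to True when the record is subject to litigation

    def eligible_for_deletion(self, now: datetime) -> bool:
        """Defensible deletion only after retention lapses and no legal hold applies."""
        age_days = (now - self.created_at).days
        return age_days > self.retention_days and not self.legal_hold
```

A record like this can be placed on legal hold, searched, and produced in the same way an email or chat message would be.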
An Example: Litigation and Safety
One of the first legal cases to illustrate the challenges introduced by GenAI prompt and response data came when a group of music publishers sued Anthropic for copyright infringement, claiming copyrighted song lyrics were used to train GenAI models. The lawsuit was filed in a Tennessee federal court in October 2023 (No. 3:23-cv-01092).
The case is interesting for a couple of reasons. First, U.S. copyright law has not kept pace with the capabilities of AI, so the outcome may set precedent for how courts handle GenAI. Second, it involves Anthropic, widely regarded as a pioneer in AI safety thanks to its focus on research and policy.
Anthropic Co-Founder and CEO Dario Amodei is active in Washington, D.C., where he has testified before senators and “nurtured ties with the Executive Branch, lawmakers, and the national-security establishment in Washington,” according to a TIME profile published in May 2024. Amodei earned a great deal of credibility, along with plenty of criticism and praise, for the decision to keep Claude in safety testing rather than launch it a year after founding the company. Rival product ChatGPT launched three months later, and the decision likely cost Anthropic billions.
As for the case itself, the documents filed in court show that many of the evidence requests and subsequent productions include prompt and response data from Claude, Anthropic’s GenAI technology. In other words, user input data is central to the case.
As Cleary’s IP & Technology reported, the court said that the plaintiffs made a “strategic decision to sue a California-based company in the Middle District of Tennessee, and in doing so ran the risk of encountering a jurisdictional hurdle too high to climb.”
Casepoint and GenAI Data
At Casepoint, prompt and response data from LLMs is now a supported data type in the platform. Organizations can collect, process, and review large GenAI data sets in a secure, cloud-based, end-to-end platform with advanced analytics and AI.
As you can imagine, the ability to not only handle incredibly large data sets but also work quickly and efficiently within them is key. For instance, compare search speeds across a universe of 5 million documents:
- The industry average is 10 search terms in five hours.
- Casepoint runs 10 search terms in just under 10 minutes.
That is roughly a 97% time savings (just under 10 minutes versus 300 minutes), even when working with extremely large data sets.
The Future
If the Anthropic case is any indication, GenAI prompt and response data represents the next frontier in legal compliance and investigation data.
According to VentureBeat, OpenAI’s enterprise subscription tier had 260 business customers and 150,000 users in January 2024, four months after launch. GenAI is still in its infancy, and hundreds of thousands of enterprise and government agency customers will soon be using the technology.
And all of them will be faced with an important question — how can they manage prompt and response data created by GenAI?
Learn how Casepoint’s secure data discovery platform helps enterprise corporations defensibly reduce risk.
Author
Advisor and Executive Vice President, Corporate Solutions
Categories:
- genai
- compliance
- ChatGPT