Preserving Generative AI Data: Legal Risks and Discovery Obligations for Law Firms and Clients
Learn why generative AI prompts, outputs, and training data must be preserved for litigation, compliance, and legal holds in today’s AI-driven workflows.
Dean Taylor
6/16/2025 · 22 min read


As generative AI tools like ChatGPT and Claude become embedded in legal and business operations, organizations face growing obligations to preserve AI-generated content. From prompts and outputs to fine-tuning data, these digital artifacts can carry significant legal weight in discovery, litigation, and regulatory compliance. This post breaks down what law firms and in-house counsel need to know now to stay defensible and proactive.
Generative AI tools like ChatGPT, Anthropic’s Claude, and similar large language models are now creating content across industries – drafting emails, summarizing meetings, generating code, and more. With rapid AI adoption comes a new question for all of us lawyers: How do we preserve the artifacts of generative AI – the prompts we input, the outputs we receive, and even the fine-tuning materials or training data behind the scenes? Those same questions pertain to our clients as well.
These AI-generated artifacts present unique challenges under U.S. law when it comes to record-keeping and e-discovery obligations. In the rush to harness AI’s benefits, corporations (and even small businesses) likely have the same duty to preserve this information for litigation, compliance, risk management and governance purposes.
Below I explore the emerging legal expectations around preserving generative AI artifacts, the practical challenges organizations face, and steps you can take to ensure your clients meet their obligations. We’ll draw on recent commentary and case law – including a Reuters analysis from June 2025 – to shed light on best practices and risks for those integrating generative AI into their operations.
Generative AI Artifacts as New Business Records and Evidence
Generative AI systems produce unique, novel records that organizations have never had to deal with before. Every interaction with a generative AI (think querying and reviewing responses from a chatbot like ChatGPT) can generate a prompt (the user’s input or question) and an output (the AI’s generated response). These prompts and outputs don’t fit neatly into traditional document categories, yet courts are increasingly viewing them as electronically stored information (ESI) subject to discovery. As the Reuters analysis notes, in discovery “GAI prompts and outputs may be considered unique information that must be preserved for litigation.” In other words, an AI-generated email draft or a ChatGPT conversation could be as relevant to a case as any internal memo or email chain.
Beyond the prompts and outputs, consider the fine-tuning materials and training data behind generative AI models. Fine-tuning materials typically include specialized datasets or instructions used to adapt a pre-trained model to a specific task or company data. Training data refers to the large corpus of text, images, or other information originally used to train the AI model. These too can become relevant evidence. For example, in copyright infringement disputes against AI companies, courts have compelled production of the AI’s training dataset when plaintiffs allege it contains their copyrighted works. In one case, access to a defendant’s training data revealed copied text and even led to a summary judgment of infringement. This underscores that the data used to train or fine-tune AI can itself be evidence – evidence of what the AI was exposed to, which may be pivotal in intellectual property or bias-related cases.
Whether we’re dealing with user prompts, AI outputs, or underlying training data, these artifacts count as “writings” or records if they contain information relevant to business operations. A prompt entered by an employee and the AI-generated result of that prompt can be considered part of the business record – especially if that output is used to make a decision or is incorporated into some work product. One legal commentator noted organizations must determine if GAI-generated prompts and outputs are considered official “records” and, if so, update their retention policies accordingly. In the public sector, this notion has been codified in some states. Washington State’s public records guidance suggests AI prompts used by government employees, if saved, qualify as public records, just like browser history logs. The City of Seattle’s policy explicitly requires that “all generative AI solutions… support retrieval and export of all prompts and outputs” and that city employees “maintain, or be able to retrieve upon request, records of inputs, prompts, and outputs” in line with records management rules. Although the private sector doesn’t operate under public records laws, the principle carries over: if generative AI content relates to your business activities, it’s prudent to assume it may need to be treated as a record for legal purposes.
Emerging Legal Expectations and Case Law on AI Data Preservation
Because generative AI is so new, courts and regulators are only starting to grapple with how to handle AI-generated material. Nonetheless, early cases provide a roadmap. A headline example is Tremblay v. OpenAI (N.D. Cal. 2024), a copyright suit by authors who claimed OpenAI’s ChatGPT was trained on their books. We have talked about that case in prior posts. During discovery in Tremblay, a dispute arose over the preservation and production of ChatGPT prompts and outputs from the plaintiffs’ pre-lawsuit testing of the model. OpenAI requested all prompts the authors or their lawyers had tried (and the outputs) – including those not quoted in the complaint. The authors resisted, arguing that prompts entered as part of preparing the lawsuit were attorney work product and privileged.
A magistrate judge initially sided with OpenAI, reasoning that by referencing some of these prompts and outputs in the complaint, the plaintiffs had waived work product protection. However, the district judge overturned part of that, holding that only the specific prompts and outputs actually referenced in the complaint (and certain account settings) had to be produced. This ruling signaled two important points for practitioners: First, generative AI prompts/outputs can indeed be discoverable evidence, even if generated during a legal investigation. Second, courts will wrestle with how privilege and work product doctrines apply – here the judge drew a line, limiting the waiver to what was used in the pleading. The Tremblay case effectively put parties on notice that if you plan to sue over AI, you must preserve your relevant AI interactions, because you may later need to disclose at least those you rely on in your claims.
Beyond the IP context, expect AI prompts and outputs to surface in many types of litigation. A Reuters commentary observed that as use of ChatGPT and similar tools becomes relevant in all manner of cases, so too will the records of that use. If an employer uses an AI tool for hiring or promotions, the AI’s recommendations and the input data could be evidence in a discrimination case. If a vendor’s software uses generative AI under a contract, the prompts and outputs might be evidence in a breach of contract dispute. The key point: wherever AI is involved in decision-making or content creation related to the dispute, those AI-generated artifacts are likely fair game in discovery. This reality “triggers preservation obligations to avoid sanctions for spoliation”, as one legal analysis bluntly stated. In other words, once litigation is reasonably anticipated, a company that fails to preserve relevant AI prompts or outputs risks court sanctions just as if it deleted emails or documents.
Courts haven’t yet issued definitive rules tailored to AI, but the existing frameworks (the Federal Rules of Civil Procedure, evidence rules, etc.) are being applied. The U.S. Judicial Conference’s Advisory Committee on Evidence recently discussed whether new evidence rules are needed for AI, such as how to authenticate AI-generated material or deal with reliability concerns. No new rules have been adopted yet, meaning for now we proceed under traditional principles: Is the AI information relevant? Is it non-privileged? Is it proportional to the case to preserve and produce it? These are the questions lawyers must be ready to address. Notably, relevance and proportionality act as gatekeepers – a 2023 Law360 article emphasizes that like any ESI, GPT prompts must meet relevance standards under Fed. R. Civ. P. 26(b)(1), and parties can argue that massive troves of AI data that have little importance may be outside the scope or unduly burdensome. In a recent case against Meta, a court limited discovery to a refined subset of post-training data because the raw training dataset was “massive” and not proportional to the needs of the case. Lawyers should be prepared to make similar arguments or agreements in cases involving AI: preserve what’s relevant and necessary, but you may not need to turn over every log of every AI query ever made if it’s not germane.
The Duty to Preserve GenAI Data and Updating Legal Holds
Under U.S. law, the duty to preserve evidence kicks in when a party reasonably anticipates litigation. At that point, the party must suspend routine deletion and take reasonable steps to preserve relevant information – now including generative AI artifacts. Failure to do so can lead to spoliation sanctions. This means companies using generative AI need to proactively integrate it into their legal hold workflows. If a lawsuit or investigation seems likely, you should identify any AI tools or data used that might be relevant, and ensure those prompts, outputs, or related data are saved. For example, if an employee used ChatGPT to draft a proposal that is later alleged to contain misrepresentations, the exact prompts and outputs from that ChatGPT session could become critical evidence of intent or knowledge. They must be preserved like any other ESI once a legal hold is in place.
Commentators are urging organizations and counsel to update their legal hold notices and policies to explicitly cover AI-generated content. Many legal hold templates already mention email, texts, and social media; now they should mention “generative AI prompts and outputs” where applicable. Employees need to be instructed that if they have used tools like ChatGPT, Bing Chat, Bard, Copilot, or any AI assistant in ways related to the matter, those uses are subject to the hold. This might involve preserving conversation histories or outputs that would otherwise vanish. In fact, technology counsel at Skadden recommend incorporating questions about GenAI use into custodian interviews and data collection checklists – essentially, don’t forget to ask “Did you use any AI tools for this project or communication?” as part of your discovery scoping. If the answer is yes, the IT team may need to retrieve data from the AI platform (or the user’s account history) to secure it.
One practical complication is that not all AI tools automatically save user prompts or outputs in a retrievable form. Some do – for instance, ChatGPT’s web interface keeps a history (unless the user opts out of data retention), and an enterprise license might log content. Other tools may default to ephemeral use. Microsoft’s Copilot, for example, integrates with Office apps and may not separately archive everything it generates. As a result, if you wait too long to put a hold on AI data, it might be gone. A legal tech expert described this as “data everywhere (and nowhere)” – AI content may be generated across various platforms, making it like “playing hide-and-seek in a digital jungle” when trying to find relevant items later. To counter this, legal teams are negotiating with their IT and vendors up front. Some companies, recognizing this challenge, are “making sure AI vendors agree to preserve data and provide access when needed”, baking those requirements into contracts. For lawyers, it’s wise to advise clients: if you’re adopting a new AI tool, consider how you’d preserve its data in a dispute. Early consultation between legal, IT, and the business units can ensure the organization isn’t blindsided by an inability to retrieve AI-related information when a hold arises.
Unique Challenges in Preserving Generative AI Outputs and Data
Preserving generative AI artifacts isn’t as straightforward as keeping email or paper files. These new data types come with unique challenges.
Ephemeral and Dynamic Content
Many AI-generated outputs are ephemeral by design. Some tools generate text or images that disappear as soon as they are shown or used. Think of an AI writing assistant that suggests a sentence as you type – if you don’t copy and save it, it’s gone. Even when outputs are saved, generative models can produce different results each time for the same prompt. As one e-discovery expert noted, “the dynamic nature [of AI models] makes it challenging to capture and preserve a specific response tied to a particular prompt.” In other words, if an employee runs the same prompt a week later, the wording of the answer might differ due to model updates or randomness. This variability complicates preservation: it’s not enough to save the prompt; you may need to save the exact output as it was generated at that time, or you won’t be able to replicate it later. No built-in tracking or versioning exists for most AI tools. The best practice is to actively log or export results at the time of generation if they could be important. Organizations might use screenshots, copy-paste into documents, or AI provider export functions (if available) to freeze the state of an output. Failing that, once the AI’s context resets, you’ve lost that data. This is a radical shift from email threads that sit in an inbox until deleted – AI content might never be stored at all unless you intentionally preserve it.
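To make the idea of freezing an output at generation time concrete, here is a minimal sketch of an in-house preservation wrapper. Everything in it is hypothetical – the function name, log directory, and record format are illustrative, not any vendor’s API – but it shows the essential moves: capture the prompt and output together, stamp them with time and user metadata, and hash the content so later alteration is detectable.

```python
import hashlib
import json
import time
from pathlib import Path

LOG_DIR = Path("ai_interaction_log")  # hypothetical preservation store

def preserve_interaction(user_id: str, prompt: str, output: str) -> Path:
    """Freeze a prompt/output pair at generation time, with metadata
    (timestamp, user, content hash) that can support later authentication."""
    LOG_DIR.mkdir(exist_ok=True)
    record = {
        "user_id": user_id,
        "timestamp_utc": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "prompt": prompt,
        "output": output,
    }
    # Hash the content so any later alteration is detectable.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["sha256"] = hashlib.sha256(payload).hexdigest()
    path = LOG_DIR / f"{record['sha256'][:16]}.json"
    path.write_text(json.dumps(record, indent=2))
    return path
```

A real deployment would write to a controlled repository rather than a local folder, but the principle is the same: if the capture doesn’t happen at the moment of generation, there may be nothing left to capture later.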
Data Dispersed Across Systems
Generative AI often operates across various platforms and storage locations. A single AI-powered application can touch multiple data stores. For instance, consider a meeting assistant that transcribes a Zoom call, then uses an AI model to generate a summary. The transcript might be saved in one system (e.g., the organizer’s cloud storage), while the summary could be emailed to participants or saved in the app’s own repository. If your client’s organization adopts such a tool, you must ask: where are those transcripts and AI summaries stored – on the vendor’s cloud? in your SharePoint? in each user’s cache? And how long are they kept? The Reuters piece highlights that each GAI tool may store data differently, so lawyers must ascertain “the form and function of a given tool, including where it stores its prompts and outputs.” Small businesses especially might not realize how fragmented this can be; they might assume “it’s all on my laptop,” when in fact the AI service might be hosting the data externally. During preservation, this means you may need to pull data from multiple sources. It also raises the concern that some data might fall through the cracks – for example, if an employee used a personal OpenAI account on their phone, that prompt/output might reside outside any corporate system. Clear policies (discussed later) and thorough custodian interviews can help map these out.
Volume and Proportionality Concerns
Another challenge is the sheer volume that AI interactions can produce. While a single prompt-output pair is usually small (a few lines of text), organizations with heavy AI use could accumulate thousands of such exchanges, not to mention large training datasets. If a company fine-tunes an AI model, the fine-tuning dataset could be millions of records. The primary training data for big models is enormous – e.g., Meta’s recent Llama 3 model was trained on 15 trillion tokens of data. Clearly, no one is going to dump all of that into a court filing. The volume raises issues of cost and feasibility in preservation. You might not need or want to preserve every AI output ever generated if it’s not pertinent. U.S. discovery rules allow parties to argue that certain ESI is not reasonably accessible or not proportional to the case due to burden or volume. In AI-related cases, courts have shown some sympathy to these proportionality arguments. We saw a judge agree to limit discovery to a refined dataset rather than raw training data because the raw data was massively larger and mostly irrelevant to the specific claims. As lawyers we should be prepared to negotiate similar protocols for AI data – for example, maybe you preserve a random sample of outputs, or only outputs containing certain keywords, rather than every single interaction, if that will satisfy relevance needs. However, be cautious: if you do know certain AI outputs are relevant, you cannot ignore them just because of the volume. A balanced approach is needed – preserve broadly enough to avoid losing key evidence, but use filtering and agreements to keep the scope reasonable. Keeping good records of how you applied proportionality (criteria used, etc.) is also advised, in case a court later questions your decisions.
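The filtering approach described above – preserving keyword hits plus a sample, while documenting the criteria used – can be sketched in a few lines. This is an illustration under assumed record formats, not a prescribed methodology; the point is that the selection is reproducible (a fixed seed) and the criteria travel with the result, so the decision can be defended if a court later asks how the scope was set.

```python
import random

def select_for_preservation(records, keywords, sample_rate=0.1, seed=42):
    """Keep every record that hits an agreed keyword, plus a reproducible
    random sample of the rest. Return the selection alongside the criteria
    used, so the methodology is documented for later defensibility."""
    rng = random.Random(seed)  # fixed seed -> repeatable, auditable sample
    hits, rest = [], []
    for rec in records:
        text = (rec.get("prompt", "") + " " + rec.get("output", "")).lower()
        (hits if any(k.lower() in text for k in keywords) else rest).append(rec)
    sampled = [r for r in rest if rng.random() < sample_rate]
    criteria = {"keywords": keywords, "sample_rate": sample_rate, "seed": seed}
    return hits + sampled, criteria
```

In practice the keyword list and sample rate would come out of negotiation with opposing counsel or an agreed ESI protocol, not a unilateral choice.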
Privacy, Confidentiality, and IP Risks
Preserving AI artifacts can sometimes conflict with other obligations, such as privacy laws or confidentiality concerns. Privacy: Imagine an AI chatbot used by a health clinic – if an employee prompted it with patient information to draft a letter, that prompt/output may contain sensitive personal health data. Normally, privacy regulations (like HIPAA or California’s CCPA) would require limiting retention of such data. European GDPR actually encourages minimizing storage of personal data. But if a legal hold requires preserving that AI prompt, you’re in a bind between deleting it for privacy vs. keeping it for litigation. So when your obligations to preserve and delete (PII) collide lawyers have to carefully navigate how to comply with both sets of obligations. Often, the legal hold will take priority to avoid spoliation, but companies should then secure the data appropriately (access controls, perhaps anonymization) to still honor privacy as much as possible. Also, any plan to preserve AI data should include capturing metadata (timestamps, user IDs, etc.) in a secure manner, which can help with authenticity but also raises issues if that metadata contains personal identifiers.
Confidentiality and privilege: If lawyers or employees are using generative AI for work product, the prompts and outputs might contain highly sensitive information – trade secrets, legal strategies, client data, etc. Yet if these become relevant in a dispute, there’s a risk they might have to be produced. In Tremblay, prompts crafted by attorneys were initially deemed work product but risked disclosure once the content was referenced in litigation. Likewise, in internal investigations or in-house counsel work, it’s currently unclear how courts will treat AI-generated material that contains legal advice or was prepared at the direction of counsel. Until law develops, the safest course is to assume minimal privilege protection for AI outputs – treat them as you would any other business-created document. If you want to keep something privileged, don’t blindly paste it into a public AI system in the first place. We’ve already seen cautionary tales: an attorney used ChatGPT’s free version to help rewrite a client email, only to realize later this may have compromised privilege and created discoverable records of that communication. This highlights the need for clear internal guidelines: e.g., lawyers should avoid inputting confidential client info into AI tools unless the tool is vetted and secure, and they should certainly save a copy of any output if they do use it, given it could end up subject to disclosure.
Intellectual property: Another angle – if you fine-tuned a model on proprietary data or licensed content, that fine-tuning dataset might be confidential or IP-sensitive. However, if a dispute arises (say, a third party claims you misused their data in training), you may have to reveal portions of that fine-tuning material to defend your client or comply with discovery. Courts have handled this by using protective orders and controlled review processes similar to source code reviews. You might be allowed to provide access in a secure room or via hashes/indexes instead of handing over raw data. The practical tip here is: if your client is developing or fine-tuning AI, inventory what data is in the training sets and know how you would produce it if challenged. Deleting training data to avoid trouble is not a solution – aside from compliance issues, courts might consider such deletion spoliation if done when litigation was foreseeable.
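The “hashes instead of raw data” approach mentioned above is straightforward to implement. The sketch below – hypothetical in its function name and output format – builds a file-by-file SHA-256 manifest of a dataset directory. Such a manifest can be exchanged or filed in lieu of the data itself, letting the other side verify whether a specific file was in the training set without the producing party disclosing the contents.

```python
import hashlib
from pathlib import Path

def build_manifest(dataset_dir: str) -> list[dict]:
    """Produce a file-by-file SHA-256 manifest of a training or
    fine-tuning dataset, recording path, size, and content hash."""
    manifest = []
    for path in sorted(Path(dataset_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest.append({
                "file": str(path.relative_to(dataset_dir)),
                "bytes": path.stat().st_size,
                "sha256": digest,
            })
    return manifest
```

Because a cryptographic hash changes if even one byte changes, the manifest also serves a second purpose: it fixes the state of the dataset at a point in time, which helps rebut any later suggestion that data was altered after a hold attached.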
Third-Party Control of Data
Many generative AI services are cloud-based or provided by third-party vendors, which complicates preservation. If you ask ChatGPT a question via its API or website, the actual data resides on OpenAI’s servers, not on your local machine. Some AI vendors store user prompts and outputs for a limited time (often to further train their models or for moderation purposes), while others may allow opting out of retention. For example, Google’s AI models allow users to turn off the chat history, meaning prompts won’t be saved to your account. Microsoft’s Copilot had no such user-controlled setting as of early 2024. If a company relies on an external AI platform, preservation may require cooperation from the vendor to access logs or data exports. This is why negotiating data handling in your client’s vendor contracts is critical. Corporate legal teams are now making sure that contracts with AI providers “support retrieval and export of all prompts and outputs (either via functionality or contract assurances).” In other words, if you’re going to use a SaaS AI tool, include a clause that the vendor will assist in data collection for legal matters. Small businesses might feel they have little leverage with major AI vendors, but even using the vendor’s enterprise tier often comes with admin controls to export user histories or audit logs. If no such mechanism exists, the organization should at least have a policy for end-users to manually save important outputs.
Moreover, monitoring usage is part of this challenge. A company can’t preserve what it doesn’t know exists. If employees experiment with AI tools without informing anyone, relevant data could be sitting out of reach in a personal account. To mitigate this, some companies restrict use of unapproved AI services, funneling employees to approved tools that the company can supervise. Others simply educate staff: if you use any AI for work, you must do so through your corporate account or otherwise keep a copy of what you did. The theme here is control – the more an organization can bring generative AI usage under its IT governance umbrella, the easier it will be to preserve (and later collect) the artifacts. Conversely, shadow AI usage (the AI equivalent of shadow IT) could lead to lost evidence or nasty surprises in litigation.
Adapting Information Governance and Retention Policies
Faced with these challenges, organizations should not treat generative AI data on an ad hoc, case-by-case basis. Instead, they should proactively adapt their information governance and records retention policies to account for AI. This is a point emphasized in the June 2025 Reuters commentary: companies need to decide if AI prompts and outputs are “records” under their internal definitions and then update retention schedules accordingly. If, for example, an AI output is essentially performing the role of a first draft of a business document, your policy might classify it as a transient record (to be kept 30 days unless used) or perhaps as part of the documentation of that business process (to be kept like any other draft). Every organization may answer differently, but the key is to have an answer and put it in writing. Without a policy, employees won’t know what to do, and IT won’t know what to save or delete – and when practices inevitably diverge in their handling of similar data, courts may be more receptive to opposing counsel’s claims of intentional mishandling.
A strong governance approach includes: (1) Policy language that covers generative AI (what it can be used for, what must be saved, what shouldn’t be input, etc.), (2) Technical measures to enforce those policies (like disabling AI tools that don’t meet retention or security standards), and (3) Training and awareness. Training is critical – users need to understand that AI isn’t “magic” that operates outside normal rules. As one article put it, “if you’re using GenAI, document it properly.” Employees should be taught that AI outputs used for work should be treated like any document: saved in the appropriate repository and subject to the same retention rules. They also must be aware of the risks of AI “hallucinations” (fabricated information) – a risk unique to AI is that it might create inaccurate records that look official. Without caution, an organization could inadvertently preserve and rely on an AI-generated falsehood. Thus, some advise that “any AI-generated output must be reviewed and verified before preservation” to ensure it’s accurate. This doesn’t mean you should delete it if it’s wrong (especially not under a legal hold), but you might annotate it or avoid treating it as truth until verified. Training employees to double-check AI work product before circulating it can prevent a lot of headaches – both for business accuracy and later legal defensibility.
From a retention scheduling perspective, organizations might categorize AI artifacts under existing categories (e.g., treat AI communications as you would email or chats). Some governments are analogizing prompts to internet search history and deeming them transitory (no retention required unless used). Private companies might similarly decide that “temporary AI drafts” are not official records and can be purged quickly – unless a draft is used or shared in which case it graduates to a real record. This kind of nuance can be baked into policy. The bottom line is that, as one legal expert noted, “broad integration of GAI into a corporate environment… requires a thoughtful and comprehensive approach.” It’s not just an IT issue or a legal issue; it’s a multidisciplinary governance issue.
Another aspect of governance is updating ESI protocols and agreements with opposing counsel in litigation. Parties can specifically address AI data in their discovery plans. For instance, if certain AI outputs exist only in a database, lawyers can agree on a format for production (perhaps exporting them as PDF or CSV). If there are concerns about confidentiality (like sensitive training data), the protective order can include special provisions (inspection only, attorneys’ eyes only, etc.). Forward-thinking counsel have even suggested that ESI protocols include whether either side will use AI tools during discovery review and whether that needs to be disclosed or limited. All of this is negotiable, and including it early can prevent disputes. Small businesses involved in litigation should not shy away from raising these points; even if they use simpler tools, clarifying obligations around AI can save costs later.
Regulatory Compliance: Beyond Litigation (Records and Retention Requirements)
Thus far, we’ve focused on litigation and discovery, but organizations also must consider regulatory and compliance obligations related to AI-generated records. In highly regulated industries like financial services, communications must often be retained for set periods and supervised for compliance breaches, regardless of whether litigation is on the horizon. Regulators such as the U.S. Financial Industry Regulatory Authority (FINRA) have made it clear that using a generative AI tool doesn’t exempt firms from existing recordkeeping rules. FINRA recently reminded member firms that its rules “continue to apply when a member firm uses GenAI or similar technologies,” expressly extending to recordkeeping requirements. FINRA’s guidance implies that if, say, a broker-dealer uses ChatGPT to generate a message to a client, that message must be archived just like any other business communication. Indeed, FINRA has stated that firms are responsible for communications “regardless of whether they are generated by a human or AI,” meaning AI-generated client communications must be retained and supervised just as human-written ones are.
The same likely holds for other sectors: If a company in a regulated space uses AI to create records (e.g. an AI-generated safety report in a pharmaceutical company, or an AI chatbot’s conversation with a customer in insurance), those outputs may fall under existing retention mandates (FDA record rules, state insurance regs, etc.). There may not be new laws yet specifically about “AI records,” but the old laws are tech-neutral – a communication is a communication, whether AI had a hand in it or not. For example, SEC rules for investment advisers require retention of advertising materials and certain disclosures; an adviser who uses AI to draft a social media post would need to keep that post. The general consensus from compliance experts is that if generative AI is used within business communications, “those channels will need to be captured to meet recordkeeping obligations.”
We should also consider public companies and their disclosure obligations. If AI outputs contribute to decision-making or forecasts that feed into public disclosures, companies might need to document the basis of those decisions. While not a direct “preservation” rule, good governance suggests maintaining a clear record of how AI contributed to any statements, in case regulators (like the SEC) come asking.
Finally, beyond the U.S., international frameworks are emerging which emphasize accountability for AI. For instance, the European Union’s AI Act (now adopted, with obligations phasing in over the coming years) will require providers of certain AI systems to maintain documentation about training data, algorithms, and usage of the AI. This doesn’t directly force companies to save every prompt, but it does push for thorough record-keeping about how AI models are developed and used. Additionally, data protection laws abroad (e.g., GDPR) require that if AI systems process personal data, individuals have rights to access that data or have it erased in some cases – which gets complicated if that data is intertwined with an AI’s training or output. Multinational companies thus face a double complexity: ensuring compliance with U.S. litigation holds while also not running afoul of international data deletion requirements. Typically, an exemption in GDPR allows data to be retained for the establishment or defense of legal claims, which would cover preservation during U.S. litigation, but companies must document that rationale. It’s a delicate balance – one privacy misstep, and an attempt to preserve AI data could trigger regulatory scrutiny overseas.
In sum, the regulatory landscape is still catching up, but the safest assumption is that any rule that applies to business records or communications also applies to AI-generated material. Until specific AI regulations say otherwise, treat AI outputs as you would any other output of your business processes when it comes to retention and oversight. As one global compliance blog succinctly noted, firms should be “mindful of the potential implications [of AI] for their regulatory obligations” – a gentle way of saying, don’t let the glitz of ChatGPT make you forget the good old rules.
Conclusion: Navigating the Preservation Minefield of Generative AI
Generative AI offers incredible efficiencies and creative power for organizations, but it also introduces a host of new legal considerations. The artifacts of AI – prompts, responses, fine-tuning data, training sets – are now part of the corporate information landscape. As we’ve discussed, U.S. law is rapidly adapting to ensure these AI artifacts are preserved and producible in litigation when relevant. Companies large and small must be proactive: failing to account for AI in your preservation and compliance practices is a risk you can’t afford. The duty to preserve doesn’t care if information was generated by a human or an algorithm; if it’s relevant evidence, it must be saved. At the same time, practical challenges from ephemeral outputs to enormous datasets require new strategies and collaboration between legal, IT, and business teams. Lawyers advising organizations on AI usage should take the lead in building understanding and frameworks to manage these obligations.
On a forward-looking note, we can expect clearer standards to evolve – perhaps new court rules or legislation will eventually address AI record-keeping explicitly. But waiting for that would be a mistake. The time to lay the groundwork is now, when policies are malleable and before disputes arise. With thoughtful planning, companies can enjoy the fruits of generative AI without stepping into legal quagmires over lost or mishandled data. It’s about integrating good old-fashioned record management with cutting-edge tech deployment.
Actionable Recommendations for Lawyers: To wrap up, here are some concrete steps legal professionals can take to help their organizations or clients meet the challenges of preserving generative AI materials:
Update Policies and Training – Develop clear internal guidelines on generative AI use and retention. For example, require employees to save AI outputs used in work product to approved storage and to avoid inputting sensitive data into unofficial AI tools. Update records retention schedules to specify how long AI-generated content is kept (or not kept) under various scenarios. Conduct training sessions so that staff (and fellow attorneys) understand that AI prompts/outputs may be business records that need saving, and highlight the risks (privilege waiver, accuracy issues) of improper use.
Incorporate AI into Legal Hold and E-Discovery Processes – The next time you issue a litigation hold, explicitly mention generative AI data. In interviews or custodian questionnaires, ask about AI usage (“Did you use any AI tools related to this matter?”). Coordinate with IT to retrieve any relevant AI logs, chat histories, or outputs before they disappear. In meet-and-confer discussions, raise the topic of AI-produced ESI and negotiate scope and format (e.g., agreeing on search parameters for AI logs or using a sampling protocol for large datasets). Proactively updating your standard e-discovery protocols now to address AI will save scrambling later.
Engage with IT and Vendors Early – Work closely with your IT department to understand what generative AI tools are in use (officially or unofficially) and how their data is stored. Map out data flows for each tool (Where do prompts and outputs reside? For how long?). Ensure that enterprise AI tools are configured to log or retain data as needed. Just as importantly, review and negotiate vendor contracts: include provisions requiring the AI service provider to preserve and supply your organization’s data upon request. If a vendor cannot comply with basic e-discovery needs (e.g., no way to export conversation histories), consider that a red flag and either mitigate it or choose a different solution. By having these conversations at the procurement stage, legal teams can prevent downstream preservation nightmares.
Monitor Legal Developments and Adjust – Stay informed on the fast-evolving case law and regulatory guidance regarding AI. For instance, keep an eye on court decisions that discuss AI-related spoliation or admissibility issues, and watch for any new rules from rulemaking bodies or legislation at the state or federal level. If your organization operates internationally, monitor those jurisdictions as well for AI record-keeping requirements or conflicts (like the EU’s AI Act or privacy regulations). Be ready to update your advice and internal policies as new best practices emerge. In the meantime, leaning on general principles – preserve what’s relevant, ensure authenticity, and protect confidentiality – will guide you well. As one commentator noted, integrating AI into corporate environments “requires a thoughtful and comprehensive approach” – which lawyers are well-suited to lead.
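To make the logging idea in the IT recommendation concrete, here is a minimal, hypothetical sketch in Python of an append-only audit log for prompt/output pairs. The function name, file format (JSON Lines), and hashing approach are illustrative assumptions for a homegrown wrapper – not any vendor’s actual API – but they show the core idea: capture each interaction with a timestamp and a content hash so the record is retained and its integrity can later be demonstrated (e.g., when authenticating the record as evidence).

```python
import hashlib
import json
from datetime import datetime, timezone

def log_ai_interaction(prompt: str, output: str,
                       log_path: str = "ai_audit.jsonl") -> dict:
    """Append one AI prompt/output pair to an append-only JSONL audit log.

    Hypothetical sketch: each record carries a UTC timestamp and a SHA-256
    hash of the prompt and output, so later tampering is detectable.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "output": output,
    }
    # Hash prompt and output together (NUL-separated to avoid ambiguity).
    record["sha256"] = hashlib.sha256(
        (prompt + "\x00" + output).encode("utf-8")
    ).hexdigest()
    # Open in append mode: existing records are never overwritten.
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

In practice, a legal team would want this kind of capture to happen centrally (at the enterprise AI gateway or via the vendor’s logging features) rather than relying on individual employees, but even a simple wrapper like this illustrates what “configure AI tools to log or retain data” means in concrete terms.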
By taking these steps, lawyers can help their organizations harness generative AI’s benefits while confidently managing the legal obligations that come with it. Preservation of AI artifacts may be a new challenge, but with a proactive and informed strategy, it’s one that can be met in a defensible and pragmatic way. The tools may be new, but the mission remains the same: maintain the integrity of the evidence and records, and thereby protect the organization’s legal interests in an AI-powered world.
Sources:
Tara Lawler et al., “It’s time to address preservation of generative AI prompts and outputs,” Reuters Legal News (June 10, 2025).
Lauren G. Leipold & Owen R. Wolfe, “Rules for use of AI-generated evidence in flux,” Reuters (Sept. 23, 2024).
Andrew M. Good et al., “Litigation and Investigation Implications for Companies Adopting GenAI,” Skadden Insights (Mar. 4, 2024).
Rose Jones et al., “Understanding Discovery Obligations in the Era of Generative AI,” Law360 (Nov. 29, 2023).
Sasha S. Rao & Richard A. Crudo, “Discovery of Training Data in AI Litigation,” Corporate Counsel (Apr. 30, 2025).
Rian Kennedy, “Legal Hold Compliance Challenges in the Age of Generative AI,” CS Disco Blog (Jan. 22, 2025).
MRSC, “Navigating the Intersection of Public Records and Generative AI” (July 2024).
Global Relay, “What are the recordkeeping rules for generative AI platforms?” (July 8, 2024).