Preparing Data for AI

Preparing data for artificial intelligence (AI) calls for a holistic approach involving data readiness, security, ethical frameworks, and robust governance.

In September 2024, I discussed this topic at the 12th annual 930gov, the Digital Government Institute’s convention for government IT professionals in the Washington, D.C. area.

There are several key areas for successfully preparing data for AI.

  • Understanding the Data Landscape: Government agencies have a unique data landscape, including highly sensitive, classified, and unstructured data. A comprehensive data inventory is essential — mapping where data resides, its type, and its security classification. This process involves breaking down silos between departments and ensuring data is centralized and accessible.

  • Data Governance and Compliance: Due to the complexity of government regulations, compliance with standards like FedRAMP, DoD Impact Levels, and NIST frameworks is non-negotiable. Agencies need clear data governance policies to ensure data quality, ethical usage, and compliance. AI models must also be auditable so that users can understand and trust the results and so that bias can be identified and avoided.

  • Data Cleanliness and Quality: AI systems thrive on quality data. Agencies must focus on data cleansing that removes duplicates, incomplete records, and outdated information. To further enhance AI effectiveness, agencies should implement data normalization and common taxonomies across departments. Adopting interoperable data standards, such as universal formats, APIs, and protocols, can ensure seamless data exchange across departments and agencies.

  • Ethical AI and Bias Mitigation: AI should align with ethical standards, especially in government use cases where decisions can impact public policies or citizen services. Implementing bias detection mechanisms and building ethical AI frameworks into models from the start can prevent unequal outcomes and build public trust. Agencies should also consider the broader concept of data diversity. Preventing algorithmic bias is not enough on its own: for AI models to serve all citizens effectively, they must be trained on datasets that are representative of diverse populations.

  • Model Training and Continuous Learning: AI thrives on iterative learning and refining models. Agencies must focus on creating high-quality training datasets, enabling AI systems to learn and adapt continuously. This involves implementing robust feedback loops and ensuring datasets are diverse enough to reflect all aspects of the federal landscape.
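The point about representative datasets can be made concrete with a simple check. The following is a minimal sketch, not a complete bias audit: it compares each group's share of a dataset against a reference share and flags large differences. The function name, the `region` field, the reference shares, and the 5% tolerance are all assumptions chosen for illustration.

```python
from collections import Counter


def representation_gaps(records, group_key, reference_shares, tolerance=0.05):
    """Return groups whose share of the dataset differs from the
    reference share by more than `tolerance` (hypothetical helper)."""
    counts = Counter(r[group_key] for r in records)
    total = sum(counts.values())
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if abs(observed - expected) > tolerance:
            # Positive means over-represented, negative means under-represented.
            gaps[group] = round(observed - expected, 3)
    return gaps


# Illustrative data only: 8 urban records and 2 rural records.
sample = [{"region": "urban"}] * 8 + [{"region": "rural"}] * 2
reference = {"urban": 0.6, "rural": 0.4}  # assumed population shares
gaps = representation_gaps(sample, "region", reference)
print(gaps)
```

A check like this only surfaces imbalance in one attribute at a time; deciding which attributes matter and what reference shares to use remains a policy question.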

Next Steps: 3 Action Items to Prepare Data for AI

With these priority areas in mind, government agencies can take the following steps to start the process.

1. Conduct a Data Inventory

Agencies should understand what data they have and what data is in scope.

A clear data inventory gives agencies clarity about what they’re trying to do and how they can move forward. In complex environments, where the data has several layers and different security protocols apply, the stakes are even higher, and one person may not have immediate insight into what other groups or sections hold.

Get started by cataloging data types (structured, unstructured, and semi-structured) along with their locations. Consider creating a visual representation of data flows across departments to understand how data is accessed and shared. Make sure you regularly update the data inventory and governance practices to adapt to changing needs and technologies.
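As a rough illustration of what a catalog entry might capture, the sketch below records each dataset's type, location, and classification, then groups entries by type and flags those that have not been reviewed recently. The `InventoryEntry` fields, the dataset names and locations, and the one-year review window are hypothetical; a real inventory would follow the agency's own schema and tooling.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class InventoryEntry:
    """One row of a hypothetical data inventory."""
    name: str
    location: str
    data_type: str        # "structured", "unstructured", or "semi-structured"
    classification: str   # e.g. "public", "confidential", "classified"
    last_reviewed: date


def group_by_type(entries):
    """Map each data type to the names of the datasets of that type."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.data_type].append(entry.name)
    return dict(grouped)


def stale_entries(entries, today, max_age_days=365):
    """Names of entries not reviewed within `max_age_days` (assumed window)."""
    return [e.name for e in entries
            if (today - e.last_reviewed).days > max_age_days]


# Illustrative entries only.
inventory = [
    InventoryEntry("case_files", "records-db", "structured", "confidential", date(2024, 6, 1)),
    InventoryEntry("email_archive", "backup-tapes", "unstructured", "confidential", date(2021, 3, 15)),
    InventoryEntry("web_logs", "log-server", "semi-structured", "public", date(2024, 8, 20)),
]
by_type = group_by_type(inventory)
needs_review = stale_entries(inventory, today=date(2024, 9, 30))
print(by_type)
print(needs_review)
```

Tracking a review date per entry is one simple way to support the regular updates mentioned above.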

2. Clean Up the Data

Another important step is cleaning the data to remove duplicates, incomplete records, and outdated information. Data should be as normalized as possible for the AI model.

There may be situations where agencies don’t have full control over the data, but it is still important to establish a common taxonomy. That way, agencies can train the model the right way and get good data out in the end. The “garbage in, garbage out” principle applies here: the more bad data that can be removed, the better the results.
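The cleanup described above might look something like the following sketch. It builds a cleaned working copy for model training rather than deleting source records, which matters where retention rules apply. The field names (`id`, `name`, `updated`), the cutoff date, and the normalization rules (trimming whitespace, case-insensitive keys) are assumptions for illustration.

```python
from datetime import date


def clean_records(records, required, key_fields, date_field, cutoff):
    """Return a cleaned copy of `records`: normalized, with incomplete,
    outdated, and duplicate rows left out. The input list is not modified."""
    seen = set()
    cleaned = []
    for rec in records:
        # Normalize: trim surrounding whitespace on string values.
        norm = {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        # Skip incomplete records (a required field is missing or empty).
        if any(norm.get(f) in (None, "") for f in required):
            continue
        # Skip outdated records (no date, or last updated before the cutoff).
        updated = norm.get(date_field)
        if updated is None or updated < cutoff:
            continue
        # Skip duplicates, comparing key fields case-insensitively.
        key = tuple(str(norm[f]).lower() for f in key_fields)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(norm)
    return cleaned


# Illustrative records only.
raw = [
    {"id": "A1", "name": " Jane Doe ", "updated": date(2024, 5, 1)},
    {"id": "a1", "name": "Jane Doe", "updated": date(2024, 5, 1)},   # duplicate of A1
    {"id": "B2", "name": "", "updated": date(2024, 2, 1)},           # incomplete
    {"id": "C3", "name": "Sam Lee", "updated": date(2015, 1, 1)},    # outdated
    {"id": "D4", "name": "Ana Ruiz", "updated": date(2023, 11, 9)},
]
cleaned = clean_records(raw, required=("id", "name"), key_fields=("id",),
                        date_field="updated", cutoff=date(2020, 1, 1))
print([r["id"] for r in cleaned])
```

Which fields count as required, and how old is too old, depends on the dataset and the model's purpose, so those values are parameters rather than fixed rules.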

Implement a classification system to label data based on sensitivity (e.g., public, confidential, classified). Ensure compliance with relevant laws and regulations for data protection. Also, be sure to develop clear policies for data management, access, sharing, and usage. Remember to comply with your records retention policies. 

Finally, assign roles and responsibilities for data stewardship within departments.
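A sensitivity labeling scheme and steward assignments can be represented quite simply. The sketch below uses the three example levels mentioned above and assumes they are strictly ordered; the dataset names, steward names, and helper functions are hypothetical, and a real scheme would follow the agency's own classification rules.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Example sensitivity levels, ordered from least to most restricted."""
    PUBLIC = 1
    CONFIDENTIAL = 2
    CLASSIFIED = 3


def allowed_datasets(labels, max_level):
    """Names of datasets labeled at or below `max_level`."""
    return sorted(name for name, level in labels.items() if level <= max_level)


def missing_stewards(labels, stewards):
    """Names of labeled datasets that have no assigned steward."""
    return sorted(name for name in labels if name not in stewards)


# Illustrative labels and steward assignments only.
labels = {
    "press_releases": Sensitivity.PUBLIC,
    "personnel_records": Sensitivity.CONFIDENTIAL,
    "threat_reports": Sensitivity.CLASSIFIED,
}
stewards = {"press_releases": "Public Affairs", "personnel_records": "HR"}

eligible = allowed_datasets(labels, Sensitivity.CONFIDENTIAL)
unowned = missing_stewards(labels, stewards)
print(eligible)
print(unowned)
```

Keeping labels and stewards in one place makes it easier to spot datasets that nobody is responsible for before they reach an AI pipeline.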

3. Leverage Experience and Expertise

Agencies don’t have to do it alone. A trusted partner can make the process much easier and more efficient.

There’s a lot that goes into preparing data for AI, and it’s easy to get caught up in the steps. At Casepoint, we’ve seen agencies contend with outdated information and lingering data. How should an agency deal with a bunch of backup tapes and data stored on legacy technologies?

Working with a company like Casepoint can provide the experience and expertise needed. After all, we know how to do this and have done so many times. For instance, take a look at how Casepoint helped a major U.S. federal regulatory agency efficiently migrate up to 1 petabyte of legacy data across multiple teams and within stringent security environments, achieving time and cost savings with AI and advanced analytics.

Contact us to discuss your agency’s specific needs and explore how we can help in preparing your data for AI.

How Government Agencies Can Prepare Data for AI
Kevin Albert

Author

Director of Data Operations and Product Support, Government

Kevin Albert is the Director of Data Operations and Product Support for Casepoint’s government team. He oversees department-wide management, focusing on aligning operations to Casepoint’s strategic vision by managing resource assignments and prioritizing activities based on customer requirements and SLAs to ensure the highest quality of services…
