
Gartner: Ways to ensure your data is AI-ready


A data strategy has always been crucial, but the emergence of AI and GenAI has raised the stakes. According to Gartner research, only 4% of organizations report that their data is ready for AI; in other words, 96% do not.

Organizations that do not recognize the significant differences between the data requirements of AI and those of traditional data management put their AI initiatives at risk. Data and analytics leaders must be able to demonstrate that their data is ready to meet the demands of their AI use cases.

AI-ready data means that your data must be representative of the use case, including all the patterns, errors, outliers and unexpected emergent behavior needed to train or run the AI model for that specific use. Data readiness for AI is not a one-time task, nor can all data be prepared for it in advance. It is an ongoing process and practice that relies on the availability of metadata to align, qualify and govern the data.

To prove data is AI-ready, data and analytics (D&A) and AI teams need to be able to iterate quickly and converge on data that is fit for use throughout the development and operationalization of the AI use case. D&A leaders should act on the following three recommendations to ensure their data is AI-ready.

Align Data with Use-Case Requirements

Every AI use case should describe the data it needs, and those needs also depend on the AI technique used. The requirements may not be fully defined up front; they emerge as the data is used and the AI requirements are met. D&A leaders must ensure their data meets AI use-case expectations for the following parameters:

  • AI techniques, which place specific requirements on the data. For instance, GenAI training data will have different requirements than the data for a simulation model, so the data must meet the expectations of whichever AI technique supports the use case.
  • Quantification involves ensuring that there is enough data, such as sufficient training data spanning multiple years if there is a seasonal pattern. Where data falls short, options such as supplementing the existing data with synthetic data can be included in a remediation plan to meet the quantification requirements.
  • Semantics, annotation and labeling to enrich the data. This includes annotating or labeling images and videos, along with taxonomies and ontologies, which are often represented as a knowledge graph.
  • Quality defines the extent to which the data meets the data quality requirements of the AI use case.
  • Trust encompasses AI use-case requirements regarding the origin of the data, the level of trust in the sources and pipelines involved, and the trustworthiness of outputs from any other AI models that participate in the use case.
  • Diversity requires drawing on sufficiently diverse sources to reduce the risk of bias stemming from any one source.
  • Lineage provides end-to-end transparency about where the data is coming from and how it is used across all data usage scenarios in the context of a specific AI use case.
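Parameters such as quantification and diversity lend themselves to automated checks. The Python sketch below is illustrative only: it assumes each record is a dict with a `date` and a `source` field, and that the use case declares a minimum row count, a minimum span of history and a maximum share for any single source. `UseCaseRequirements` and `check_readiness` are hypothetical names, not a Gartner-defined schema or tool.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class UseCaseRequirements:
    """Hypothetical thresholds a use case might declare; not a Gartner-defined schema."""
    min_rows: int
    min_years: int           # e.g. at least 2 years of history to observe a yearly seasonal pattern
    max_source_share: float  # no single source may supply more than this fraction of rows


def check_readiness(records, req):
    """Return a list of gaps against the requirements; an empty list means none were found.

    Each record is assumed to be a dict with a 'date' (datetime.date) and a 'source' (str).
    """
    gaps = []
    if len(records) < req.min_rows:
        gaps.append(f"quantification: {len(records)} rows < {req.min_rows}")
    if not records:
        return gaps

    # Quantification: does the history cover enough time for seasonal patterns?
    dates = [r["date"] for r in records]
    span_years = (max(dates) - min(dates)).days / 365.25
    if span_years < req.min_years:
        gaps.append(f"quantification: history spans {span_years:.1f} years < {req.min_years}")

    # Diversity: is any single source over-represented?
    source, top = Counter(r["source"] for r in records).most_common(1)[0]
    share = top / len(records)
    if share > req.max_source_share:
        gaps.append(f"diversity: source '{source}' supplies {share:.0%} of rows")
    return gaps


records = [
    {"date": date(2022, 3, 1), "source": "crm"},
    {"date": date(2022, 6, 1), "source": "crm"},
    {"date": date(2022, 12, 31), "source": "web"},
]
req = UseCaseRequirements(min_rows=3, min_years=2, max_source_share=0.6)
for gap in check_readiness(records, req):
    print(gap)
# quantification: history spans 0.8 years < 2
# diversity: source 'crm' supplies 67% of rows
```

Returning a list of gaps rather than a single pass/fail flag fits the remediation-plan idea: each gap names the parameter that needs work, such as sourcing more history or supplementing with synthetic data.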

Qualify Use to Meet AI-Expected Confidence Requirements

Qualifying the use ensures that the data continues to meet the requirements, whether it is used to train, develop or run a model in operations. Use the following parameters to ensure that the data meets the confidence requirements expected for AI use cases:

  • Validation and verification, which ensure that all data requirements are regularly met during development and in operations.
  • Performance, cost, and non-functional requirements ensure that the data meets the minimum operational service-level agreements (SLAs), such as response time, timeliness, high availability, disaster recovery, or cost.
  • Versioning, which makes sure every use of the data is versioned, with the ability to revert to an older version of the AI-ready data and to audit all data versions.
  • Continuous regression testing, in which teams develop a variety of test cases to detect when something goes wrong as the data and systems change.
  • Observability metrics and monitoring, which provide transparency into the system’s health and help track it over time.
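Validation and versioning can work together: data is checked against declared requirements before a version is recorded, and every version stays retrievable for audit or rollback. The following is a minimal in-memory sketch, assuming rows are JSON-serializable dicts; `DatasetVersionStore` and `no_missing_labels` are hypothetical names, and a production system would rely on a dedicated data-versioning or catalog tool instead.

```python
import hashlib
import json
from datetime import datetime, timezone


def no_missing_labels(rows):
    """Hypothetical validator: return a problem description, or None if the rows pass."""
    missing = sum(1 for row in rows if row.get("label") is None)
    return f"{missing} rows without a label" if missing else None


class DatasetVersionStore:
    """Hypothetical in-memory store illustrating validated, auditable dataset versions."""

    def __init__(self, validators):
        self._validators = validators
        self._versions = {}  # version id -> stored copy of the rows
        self.history = []    # audit log of (version id, UTC timestamp, note)

    def commit(self, rows, note=""):
        # Validation: refuse to record a version that breaks a declared requirement.
        for check in self._validators:
            problem = check(rows)
            if problem:
                raise ValueError(f"validation failed: {problem}")
        # Content hash as version id: identical data always maps to the same id.
        payload = json.dumps(rows, sort_keys=True).encode("utf-8")
        version = hashlib.sha256(payload).hexdigest()[:12]
        self._versions[version] = json.loads(payload)  # store a copy, not the caller's list
        self.history.append((version, datetime.now(timezone.utc).isoformat(), note))
        return version

    def get(self, version):
        return self._versions[version]

    def revert(self, version):
        # Re-commit an older version so the rollback itself appears in the audit log.
        return self.commit(self.get(version), note=f"revert to {version}")


store = DatasetVersionStore([no_missing_labels])
v1 = store.commit([{"text": "a", "label": "pos"}], note="initial")
v2 = store.commit([{"text": "a", "label": "pos"}, {"text": "b", "label": "neg"}], note="add b")
store.revert(v1)
for version, timestamp, note in store.history:
    print(version, timestamp, note)
```

Deriving the version ID from a content hash means identical data always gets the same ID, so a revert returns the original version rather than minting a new one, while the audit history still records that the revert happened.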

Govern AI-Ready Data in the Context of the Use Case

D&A leaders must define the ongoing data governance requirements the data has to meet to support the AI use case, using the parameters listed below:

  • Data stewardship, which guarantees that the use case has proper policies applied across its full life cycle. This is supported by defining and monitoring the required observability metrics.
  • Data and AI standards and regulations, such as the EU AI Act, are still emerging. These new regulations will add to existing regulatory and compliance requirements on data.
  • AI ethics requirements, which form part of the governance requirements of the use case.
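The stewardship parameter relies on defining and monitoring observability metrics with agreed thresholds. As an illustration only, the sketch below computes two common metrics, completeness (the null rate of one field) and freshness (time since the last update), and flags any breach; the function name `observability_report`, the fields and the thresholds are hypothetical, not metrics prescribed by Gartner.

```python
from datetime import datetime, timedelta, timezone


def observability_report(rows, field, last_updated, now, max_null_rate, max_age):
    """Compute illustrative completeness and freshness metrics and flag threshold breaches."""
    nulls = sum(1 for row in rows if row.get(field) is None)
    null_rate = nulls / len(rows) if rows else 1.0  # treat an empty dataset as fully incomplete
    age = now - last_updated
    breaches = []
    if null_rate > max_null_rate:
        breaches.append("completeness")
    if age > max_age:
        breaches.append("freshness")
    return {
        "null_rate": null_rate,
        "age_hours": age.total_seconds() / 3600,
        "breaches": breaches,
    }


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [{"price": 10}, {"price": None}, {"price": 12}, {"price": 9}]
report = observability_report(
    rows,
    "price",
    last_updated=now - timedelta(hours=30),
    now=now,
    max_null_rate=0.1,
    max_age=timedelta(hours=24),
)
print(report)
# {'null_rate': 0.25, 'age_hours': 30.0, 'breaches': ['completeness', 'freshness']}
```

Keeping the thresholds as explicit parameters lets the data steward own and adjust them per use case, rather than burying them in pipeline code.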

Managing data for AI is a task that continues in perpetuity. The premise behind all three pillars is that AI readiness is not a one-time process; it requires continuous effort.

Gartner analysts will discuss more topics related to AI and data governance and management at the Gartner IT Symposium/Xpo conference, taking place November 11-13, in Kochi, India.

Author Bio: Roxane Edjlali, Senior Director Analyst at Gartner

Image by DC Studio on Freepik
