The growing adoption of Artificial Intelligence (AI) and Machine Learning (ML) technologies is becoming a key driver for organisations looking to lead their fields and earn the label of ‘best in class’. Sadly, however, too many AI and ML projects fail to reach their full potential.

The reasons are numerous and varied, including poor goal setting, budgetary constraints, and scope creep during the planning, proof-of-concept and realisation phases.

Bad data is also a key reason for project failure. It comes down to the oft-repeated truism ‘Garbage In, Garbage Out’, which is as relevant today as it has ever been.

Two of the four key roles and responsibilities for AI projects outlined by analyst firm Gartner relate to data. Alongside AI Architect and ML Engineer, Gartner lists Data Scientist (responsible for identifying use cases, determining which data sets and algorithms are required, and building AI models) and Data Engineer (responsible for making the appropriate data available, with a focus on data integration, modelling, optimisation, quality and self-service). Organisations ignore these data-related roles, and the wider importance of data, at their peril; they risk falling on the wrong side of the 50% of IT leaders who, Gartner predicts, will struggle to move their AI projects beyond proof of concept through the end of 2023.

Work backwards from your intended outcome

No AI or ML project has a chance of being successful if there is not an accompanying data strategy. A key aspect of getting this in place is to work backwards from your desired outcome to figure out what ingredients it requires.

For this, the project goals should be articulated with crystal clarity. It is not enough to say “AI will help us streamline our production plant”. That needs to be broken down into greater detail, with a focus on specific process elements.

Now it is possible to see what data is required to achieve each goal. And this can’t be generalised: it is important to list every data element required. These might not all exist before the project begins, so working out what they are, and how they will be collected, is vital.
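To make this concrete, the goal-to-data mapping can start life as something as simple as a structured checklist. The short Python sketch below is purely illustrative; the goal, data elements and sources are hypothetical assumptions, not a template from any real project.

```python
# Hypothetical example: one production-plant goal mapped to the data it needs.
# Every name and data element here is an illustrative assumption.

goal = "Reduce unplanned downtime on the packaging line by 20%"

required_data = {
    "machine_sensor_logs":  {"exists": True,  "source": "PLC historian"},
    "maintenance_records":  {"exists": True,  "source": "CMMS export"},
    "operator_shift_notes": {"exists": False, "source": "collect from Day 1"},
    "ambient_conditions":   {"exists": False, "source": "new IoT sensors"},
}

# Surface the gaps before development begins, not after.
gaps = [name for name, info in required_data.items() if not info["exists"]]
print(f"Goal: {goal}")
print(f"Data still to be collected: {gaps}")
```

The value of the exercise is in the last two lines: data that does not yet exist is identified before development begins, not after.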

This is often much trickier than it sounds. External data scientists and data engineers with experience of AI/ML development will bring an ability to ask the right questions, look around corners at problems, keep a lid on scope creep, and make sure the most difficult questions are addressed rather than parked.

Bringing them in early means an organisation doesn’t find itself having to do this work at a later stage, when it can add expense and time to a project or, worse, contribute to its failure.


By the end of the ‘working backwards’ process, an organisation should be able to proceed with confidence, having worked through what it needs to know, what it already knows, and how it will obtain the rest.

Establish an effective data policy

Organisations can’t assume that because they already collect data, they can simply hand it over to the AI/ML developer and it will drop neatly into new applications, out of the other end of which will come amazingly informative dashboards of new information. If only it were that simple.

The way data is collected has changed in recent years, in part due to the European Union’s General Data Protection Regulation (GDPR). Even in Southeast Asian countries such as Malaysia and Singapore, where Personal Data Protection Acts (PDPA) are in place, the regulations are still heavily based on the European GDPR. As a result, this year’s data set may not be compatible or comparable with one from four years ago from a privacy policy perspective, and compliance may require more from your organisation than has previously been in place. With users in Southeast Asia becoming more conscious of where and how their personal data is used, data privacy requirements are only set to increase. Perhaps significant amounts of historical data, even recent historical data, are missing. Perhaps the organisation needs to put entirely new data collection policies in place and start from a designated ‘Day 1’.

Working out a ‘Day 1’ data policy is one thing; however, to hit the ground running as soon as the AI/ML project kicks off, some historical data will be useful. This requires several decisions: which data sets are most important, how far back to go, whether to work on a subset for the proof of concept and bring in more data sets later, and whether some poorly managed data can be cleaned enough for use, or not.
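As an illustration of how such decisions might be supported, the following sketch triages candidate historical data sets by coverage and completeness. It assumes hypothetical file names, a ‘timestamp’ column, and a 10% missing-data threshold; the real criteria would come from the project’s own data strategy.

```python
# A sketch of triaging historical data sets before an AI/ML project starts.
# The file names, the 'timestamp' column and the 10% threshold are assumed
# for illustration; real criteria come from the project's data strategy.
import pandas as pd

CANDIDATE_SETS = ["sales_2019.csv", "sales_2021.csv", "sales_2023.csv"]
MAX_MISSING = 0.10  # tolerate up to 10% missing values after cleaning

for path in CANDIDATE_SETS:
    df = pd.read_csv(path, parse_dates=["timestamp"])
    missing = df.isna().mean().mean()          # overall fraction of missing cells
    start, end = df["timestamp"].min(), df["timestamp"].max()
    verdict = "use" if missing <= MAX_MISSING else "clean further or drop"
    print(f"{path}: {start:%Y-%m} to {end:%Y-%m}, {missing:.1%} missing -> {verdict}")
```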

Making the right decisions will be vitally important, and this is another area where the trusted third-party view of the AI/ML specialist will be extremely valuable.

Dismantle data silos 

It is possible an entirely new data policy will be needed going forward to keep the AI/ML system fed with the right quality data. Not only might new data need to be gathered, but new working practices might also be needed. This means there could be significant implications across the whole organisation. For example, it might be important to do a ‘once-and-for-all’ purge of data silos that can often hold data-related projects back.

Recent research has found that many IT teams spend 40% of their time managing and maintaining data infrastructure, and that only 32% of the data available to enterprises is put to work, while the remaining 68% goes unleveraged.

In too many organisations, different lines of business still capture the same data for their own use. This is cost-inefficient, creates data silos that lead to data fragmentation and governance issues, and inevitably means there is variance in the accuracy and quality of the data.

Which data set should the AI/ML project use? That’s the wrong question to ask. The right question to ask is “How do we ensure there is just one set of this data, shared across all lines of business?”

Ask the question, find the answer, then implement it and repeat for all silos. This will help the current AI/ML project immensely, should create cost efficiencies, and should support future AI/ML and other projects going forwards. It will also be beneficial for other data management processes such as backup, restore and archive.
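As a minimal illustration of the ‘one shared set’ principle, the sketch below collapses two hypothetical departmental copies of the same customer data into a single canonical set. The column names and the reconciliation rule (keep the first record seen) are assumptions; a real consolidation would reconcile conflicting values under agreed governance rules.

```python
# A sketch of collapsing per-department copies of the same customer data
# into one canonical, shared set. Column names and the 'keep first record'
# rule are illustrative assumptions.
import pandas as pd

sales = pd.DataFrame({"customer_id": [1, 2], "email": ["a@example.com", "b@example.com"]})
support = pd.DataFrame({"customer_id": [2, 3], "email": ["b@example.com", "c@example.com"]})

canonical = (
    pd.concat([sales, support], ignore_index=True)  # pool the silo copies
      .drop_duplicates(subset="customer_id")        # one record per customer
      .reset_index(drop=True)
)
print(canonical)  # three customers, each appearing exactly once
```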

Even in the era of digital transformation, Fortune 500 companies take weeks or, more often, months to deliver clean data to their teams, often requiring a carefully coordinated effort across multiple teams. This has driven the use of ingenious, albeit often insufficient, workarounds such as synthetic data sets or subsets of data.

The answer is to deploy zero-cost clones. When users can instantly provision clones of backup data, files, objects, or entire views, those clones can be presented to support a variety of use cases. Zero-cost clones are extremely efficient and can be created instantly, without moving any data. This is in stark contrast to the inefficiency of the traditional DevTest paradigm, in which full copies of data are shuttled between infrastructure silos. It is a dramatic step towards modernisation.

By decoupling data from the underlying infrastructure in this way, we enable organisations to automate data delivery and provide data mobility. Zero-cost clones can be spun up in minutes rather than weeks. As a result, customers have been able to shorten their service level agreements (SLAs) for data delivery, accelerate application delivery and migration, and greatly simplify their data preparation.
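For readers unfamiliar with the underlying idea, the sketch below illustrates the general copy-on-write technique that makes zero-cost clones possible: a clone shares its parent’s data until it writes, so provisioning one moves no data. This is a conceptual illustration only, not Cohesity’s implementation or API.

```python
# Conceptual copy-on-write sketch: a clone shares the parent snapshot's
# blocks until it writes, so creating the clone moves no data.
# Illustrative only; not Cohesity's implementation or API.

class Snapshot:
    """Immutable backup data, shared read-only by any number of clones."""
    def __init__(self, blocks):
        self.blocks = list(blocks)

class Clone:
    def __init__(self, snapshot):
        self.snapshot = snapshot   # shared, read-only parent
        self.overrides = {}        # only the blocks this clone has written

    def read(self, i):
        # Serve locally written blocks; fall back to the shared parent.
        return self.overrides.get(i, self.snapshot.blocks[i])

    def write(self, i, value):
        self.overrides[i] = value  # copy-on-write: the parent is untouched

backup = Snapshot(["block0", "block1", "block2"])
dev_clone = Clone(backup)          # 'provisioned' instantly: no data copied
dev_clone.write(1, "patched")
print(dev_clone.read(1))           # patched
print(backup.blocks[1])            # block1 (the original backup is unchanged)
```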

With the right data flowing in, AI and ML projects can provide dashboards of insights that the organisation can use in the transformational ways it envisions. Focusing on the data from the very outset of an AI or ML project is what will help an organisation land on the right side of Gartner’s 50%.

This article has been contributed by Sheena Chin, Head of ASEAN, Cohesity

About the author

Sheena Chin is the managing director for Cohesity in the Association of Southeast Asian Nations (ASEAN) region, responsible for driving Cohesity’s rapid growth there, including go-to-market strategy and execution, working with channel partners, systems engineering, operations, and marketing teams. Sheena brings years of data management expertise to Cohesity. She joins Cohesity from Veritas, where she served as Singapore country director, building a strong presence in the region and significantly growing revenues during her tenure, with consistent year-on-year growth. Previously, Sheena worked as an enterprise sales director for Symantec, growing the company’s cybersecurity and information management software business in Southeast Asia. With more than 20 years in enterprise technology sales, Sheena has extensive experience in developing deep customer engagements with large global customers and securing strategic wins in hyper-competitive markets.