Why data must be refined like oil to drive business value

Cloudera CEO Charles Sansbury tells Gulf Business that data, like oil, requires substantial effort and infrastructure to unlock its full potential

by Marisha Singh
September 13, 2024

In this conversation with Gulf Business, Charles Sansbury, CEO of Cloudera, discusses the parallels between data and oil, emphasising the importance of refining raw data through advanced data engineering to unlock its full business potential.

Q: How has data management evolved for the industry over the past decade, and how is Cloudera keeping pace with that change?

A: Cloudera has been at the forefront of the big data movement for over 15 years. As one of the first companies to assist global organisations in managing vast amounts of data, we’ve enabled them to store, analyse, and utilise unprecedented volumes of information. Today, we provide data platforms and analytics for some of the world’s largest enterprises, including nine of the ten biggest global banks, major telecom operators, insurance companies, and government branches.

Over the past decade, businesses have realised that their massive data reserves contain valuable insights. Initially, they experimented with machine learning and data science to extract this information. However, in recent years, the focus has shifted to leveraging advanced AI technologies, particularly generative AI, to support better decision-making. Our platform manages more data than nearly any other company worldwide, and our customers increasingly turn to us to safely and securely enable their AI initiatives.

One key challenge for organisations is managing disparate data from various sources, often stored in different formats. This has led to the rise of the ‘open data lakehouse’ – an architecture which Cloudera powers.
Our technology allows companies to consolidate structured and unstructured data, preparing it for training AI models. By shaping and standardising the data, we enable businesses to run AI models and gain actionable insights. We’ve been integral to the big data revolution and remain pivotal in helping our clients embrace AI for future growth.

Q: ‘Data is the new oil’ – this phrase was commonly used in the 2000s and 2010s. Does this analogy still apply, especially in this region, known for its oil industry?

A: I think there are clear parallels between data and oil, particularly in the sense that data, much like oil, is a raw material that exists within a company’s internal ecosystem. However, the key to realising its value lies in how well it is refined. As with oil, you need to process and refine data to make it usable. Data engineering plays a crucial role in transforming raw data into something actionable and meaningful for business applications.

Companies today are increasingly recognising that the data they possess, whether it’s customer data or transaction records, is one of their most valuable assets. The challenge now is that every boardroom is asking how AI can be used to improve business outcomes. CEOs and CIOs are under pressure to show progress in their AI initiatives, and a key part of that is gaining control over their raw data. This involves significant data engineering to refine the data so that it can be applied effectively in analytics.

So yes, there are definite parallels between oil as a raw material and data as one. Both require substantial effort to unlock their full potential. In fact, I hadn’t fully considered it before, but the comparison is quite fitting: data, like oil, needs to be refined to drive real value.

Q: How are you approaching the mass adoption of generative AI?

A: We don’t build large language models ourselves.
Our core focus is on helping companies gather, manage, store, and make large pools of data accessible. We provide the tools that allow businesses to apply large language models to that data. This includes support with model management, performance evaluation, and ensuring these models work effectively with the data held within our platforms.

Generative AI isn’t a competitor to us. Rather, it is a technology that companies are eager to leverage, and Cloudera plays an essential role in enabling this. Every client we work with is asking how they can better organise their AI operations and quickly realise the value hidden in their vast data reserves.

Historically, many businesses lacked the tools, focus, or capabilities to uncover insights from their data, even though they knew it held valuable information. Our role is to assist in this transformation, helping them to harness generative AI by ensuring their data is in the right shape and format. In this way, Cloudera is integral to enabling the AI initiatives our customers are keen to pursue, ensuring they extract meaningful insights from their data effectively and efficiently.

Q: When you realised that generative AI was coming on a mass scale, did you, as CEO of Cloudera, experience a shift in how you approach AI for your clients and Cloudera itself?

A: About a year ago, shortly after I joined Cloudera, we held a customer event in New York, where some of our largest clients discussed how they were using Cloudera to drive their AI initiatives. These included the world’s leading technology companies, auto manufacturers, and financial institutions. What struck me from that event was that these companies were thought leaders, ahead of the curve. It became evident that Cloudera wasn’t just relevant; we were central to enabling secure and reliable data for AI initiatives.
Being able to trust your data and use it safely in the context of both analytical and generative AI was becoming foundational for the world’s largest companies.

This led me to two realisations: first, the importance of Cloudera’s role, and second, how we could help our customers by sharing the reference architecture we’ve seen early AI adopters implement. We could advise on organising their data lakes, managing storage, and evaluating the models, whether open-source or proprietary, that they apply to their data.

Initial use cases such as code generation, chatbots, and automated customer support were quick wins, but we’re still in the early stages of understanding AI’s potential to add business value.

One compelling use case involves a global pharmaceutical research company. They have 30 years of research and clinical trial data in various formats and languages. Using Cloudera, they’ve built a common data structure, allowing them to query this disparate data using natural language. This has enabled them to identify relationships between organic compounds and genes, accelerating drug discovery and saving hundreds of millions in R&D costs.

This is just one example of AI’s emerging potential. While it’s still early, such applications will appear across industries. We’re committed to enabling our customers to explore and uncover these possibilities by empowering their data scientists with the right tools and capabilities.

Q: There’s a lot of development happening across the GCC countries. Are you focusing on any particular industries or customer profiles? Who are you targeting next?

A: This region is probably one of the fastest-growing, if not the fastest-growing, in the world for data and analytics software. Within Cloudera, it is among our fastest-expanding markets. Our customer base consists mainly of large global organisations and major public sector entities.
If you look at our clients, you’ll find we work with nearly every bank, large manufacturer, and several branches of government across the region. We already have a strong customer foundation here. Our goal is to ensure these clients understand the full potential of our technology so that we can grow together. In addition, we’re actively targeting new customers in areas such as global infrastructure projects, financial services, and government sectors. There’s also a lot of innovation happening here, which makes it an exciting environment. I’ve been visiting the region for over 20 years, and the pace of growth always amazes me. The GCC is a critical region for us, and we are investing significant time and resources here. What’s happening in the region is very exciting, and seeing the rapid trajectory, especially after a few years away due to the pandemic, is truly impressive.