India must shift from AI consumer to contributor

Key Points

India has 21.9 million open-source developers but contributes little to foundational AI projects
Open models typically catch up with proprietary models within three to six months
Red Hat is using the Linux Foundation’s Model Openness Framework to evaluate AI model transparency

India has the world’s second-largest community of open-source contributors, with 21.9 million developers. Yet the country contributes relatively little to the foundational projects — PyTorch, vLLM (a high-performance library for running large language models efficiently), and the core architectures — that power modern AI systems.

Vincent Caldeira, chief technology officer for Asia-Pacific at Red Hat, believes this gap represents both a problem and an opportunity. In an interview with TechObserver.in’s Mohd Ujaley, he outlined how Red Hat plans to invest in India’s engineering capability to help the country move beyond AI adoption into foundational innovation.

“At Red Hat, we want to invest in India by helping the country evolve from being primarily a consumer of AI technologies to becoming a contributor and influencer in AI innovation. That is where our engineering focus lies,” he said.

Caldeira also addressed the practical challenges facing enterprise technology leaders: how to select AI models when open alternatives catch up with proprietary systems within months, why centralised data lakes create compliance problems for agentic AI, and what India’s model of government-built infrastructure and private-sector applications means for the country’s AI trajectory.

Edited excerpts:

AI models now evolve faster than traditional enterprise technology cycles. How should technology leaders respond?

Over the past 18 months, we have seen a significant shift. In the early days of ChatGPT, improvements came from training models with increasingly larger datasets. That is no longer the primary driver.

Open models have matured considerably. Studies increasingly show they typically catch up with proprietary models within three to six months. Models such as Qwen and Mistral are good examples.

Innovation is now driven by advances in model architecture and specialisation. Techniques such as Mixture of Experts (a design in which a model contains several specialist sub-networks and a routing mechanism decides which of them handle each input, so only part of the model runs for any given query, making it faster and cheaper to serve) are becoming much more important. The industry is focusing on three areas: model affordability, serving costs and domain expertise.
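To make the routing idea concrete, here is a minimal sketch of top-k expert selection in plain Python. It is an illustration only: the function names, the toy experts and the hand-supplied router scores are all assumptions, and in a real model the router scores are computed from the input by a learned layer and the experts are neural sub-networks.

```python
import math


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k=2):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Weights are renormalised over the selected experts only.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_scores, k=2):
    """Run only the selected experts and combine their outputs by weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))


# Four toy "experts" standing in for neural sub-networks (hypothetical).
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x ** 2]

# Hypothetical router scores for one input; only two of four experts run.
selected = route_top_k([0.1, 2.0, -1.0, 1.5], k=2)
output = moe_forward(3.0, experts, [0.1, 2.0, -1.0, 1.5], k=2)
```

The cost saving comes from the last step: with k=2, half of the experts in this toy setup are never evaluated for the input.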

For enterprises, the challenge is no longer simply selecting a model. It is about identifying the right use case, choosing a trusted model, enriching it with enterprise data, and adapting it to specific requirements. The model is only one component. To deliver real business value, enterprises must build an entire AI system around it.

With so many models available, what should CIOs prioritise when selecting one?

Enterprises need to consider two dimensions. The first, and perhaps the biggest barrier to AI adoption, is safety. Whether in financial services, healthcare or defence, every organisation needs its own risk management framework for AI.

Organisations must define which use cases they are comfortable pursuing and how they will assess AI readiness for those specific applications. These decisions require clear policies, governance and organisational understanding of how AI should be deployed.

From the IT perspective, the underlying architecture and information retrieval strategy matter as much as model selection. Should enterprise data fine-tune the model, or remain separate and be accessed through retrieval? These application design decisions significantly impact the overall solution.

This is similar to where the industry was seven or eight years ago when adopting microservices (an architecture where applications are built as independent, loosely coupled services rather than one large program). The challenge was not understanding the technology but learning how to use it to solve business problems. AI is following exactly the same path.

Enterprises can choose from open-source models like Mistral and DeepSeek or proprietary ones like GPT-4 and Claude. Why should organisations consider open-source?

We should not look at this as a binary choice. Model openness exists on a continuum.

At the Linux Foundation, we developed the Model Openness Framework, which evaluates sixteen aspects of a model, including training data, training code, fine-tuning code, evaluation process and documentation. Rather than asking whether a model is open or closed, we assess each artefact individually.

Take training data. It is very rare for a model to use entirely open or entirely closed data. OpenAI does not disclose the data used to train its models. Most commercial AI developers follow a mixed approach. The same applies to models themselves — they exist at different levels of openness.

IBM’s Granite models are highly open. Qwen and Mistral also provide relatively high openness. The question should not be whether a model is open-source or proprietary. It should be whether you understand how open your chosen model is and whether that aligns with your risk appetite.

Can you give a practical example of how this assessment works?

If I am a bank building a customer support chatbot, I first assess the risk level. If the chatbot answers questions about digital services or documentation, it is not high-risk. If it provides regulatory guidance or investment advice, the risk changes significantly.

The model you choose should be driven by the specific use case and its risk profile, not by whether it is labelled open-source or proprietary.
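One way to picture this kind of assessment is as a per-artefact checklist compared against a risk tier. The sketch below is hypothetical: the artefact names are the five mentioned in the interview, but the risk tiers, the required sets and the example model are invented for illustration and are not part of the Model Openness Framework itself.

```python
# Artefacts named in the interview; the framework covers more than these.
ARTEFACTS = (
    "training_data",
    "training_code",
    "fine_tuning_code",
    "evaluation_process",
    "documentation",
)

# Hypothetical internal policy: artefacts that must be open per risk tier.
REQUIRED_OPEN = {
    "low": {"documentation"},
    "high": {"documentation", "evaluation_process", "training_data"},
}


def openness_gaps(model_artefacts, risk_tier):
    """Return the required artefacts that are not open for this model."""
    required = REQUIRED_OPEN[risk_tier]
    return sorted(a for a in required if not model_artefacts.get(a, False))


# Invented example: everything is open except the training data.
example_model = {
    "training_data": False,
    "training_code": True,
    "fine_tuning_code": True,
    "evaluation_process": True,
    "documentation": True,
}

faq_bot_gaps = openness_gaps(example_model, "low")
advice_bot_gaps = openness_gaps(example_model, "high")
```

Under these assumed rules the same model clears the low-risk documentation chatbot but leaves a gap for the investment-advice use case, which mirrors the point that the use case and its risk profile should drive the choice rather than the label.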

As AI moves to the edge, enterprises deal with data across data centres and IoT devices. What security considerations should they not overlook when building AI-ready data platforms?

About two years ago, I spoke about what I called the reversal of the data gravity discussion. Many organisations believed the answer was moving all data into the cloud. That approach is not only impractical but can introduce significant security risks.

When you centralise data into a single data lake, you often lose the compliance controls that exist within individual systems. A better approach is keeping data distributed while applying policies at the layer that consumes it.

Modern data platforms do not depend on moving all data to one location. They create a unified governance and querying layer across data wherever it resides. On top of that, organisations build what we call a data product: a business view of information that draws on multiple sources in real time while preserving each source’s compliance policies.

Which technologies support this architecture?

Several open-source technologies now support it. Apache Iceberg (an open table format for large analytic datasets that multiple query engines can read in place, with snapshot history and schema evolution) provides large-scale indexing. Rather than centralising data, it allows organisations to keep data within existing governance boundaries while creating unified discovery and access.

The next layer is a data compliance engine. Every request is evaluated against enterprise policies before access is granted. The consumer could be a business intelligence user, an application, or an AI agent.

That last point is important. AI agents will have their own identities and will only access data required for specific, approved use cases. Distributed architecture suits agentic AI far better than centralised data lakes. It becomes extremely difficult to associate the identity of a distributed process with a centralised data environment.
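The compliance layer described here can be sketched as a check that runs before every read, using the consumer's own identity and the policy of the source being read. This is a simplified illustration under stated assumptions: the class names, fields and use-case labels are hypothetical and do not correspond to any particular product or to Red Hat's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consumer:
    identity: str                  # e.g. an AI agent's own identity
    kind: str                      # "bi_user", "application" or "ai_agent"
    approved_use_cases: frozenset  # use cases this consumer is approved for


@dataclass(frozen=True)
class SourcePolicy:
    source: str                    # the system that owns the data
    allowed_use_cases: frozenset   # use cases this source permits
    masked_fields: frozenset       # fields never released by this source


def evaluate(consumer, use_case, policy):
    """Allow only if the use case is approved for consumer and source."""
    return (
        use_case in consumer.approved_use_cases
        and use_case in policy.allowed_use_cases
    )


def read(consumer, use_case, policy, record):
    """Evaluate the request, then return the record minus masked fields."""
    if not evaluate(consumer, use_case, policy):
        raise PermissionError(
            f"{consumer.identity} denied on {policy.source} for {use_case}"
        )
    return {k: v for k, v in record.items() if k not in policy.masked_fields}


# Hypothetical agent approved for a single use case.
agent = Consumer("support-agent-01", "ai_agent", frozenset({"support_faq"}))
crm_policy = SourcePolicy(
    "crm",
    frozenset({"support_faq", "fraud_review"}),
    frozenset({"account_number"}),
)
record = {"name": "A. Customer", "account_number": "XXXX", "plan": "basic"}
visible = read(agent, "support_faq", crm_policy, record)
```

Because the policy travels with the source rather than with a central copy of the data, the same check applies whether the consumer is a business intelligence user, an application or an agent.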

India is one of the fastest-growing AI markets. Where do you see its greatest opportunity?

I have spoken about what I call India’s middle-path model. In the United States, AI innovation is driven by private investment and venture capital. In China, it is government-driven. India has adopted a different approach.

The government builds foundational infrastructure — affordable GPU compute, initiatives like BharatGen and Sarvam AI. The private sector then builds practical use cases on that foundation. This has enabled India to build strong engineering capabilities around AI adoption and implementation.

By the numbers

21.9 million: Open-source developers in India
3-6 months: Time for open models to match proprietary ones
16: Aspects evaluated in Model Openness Framework

Where India still has an opportunity is in contributing more to foundational technologies. India has become the world’s second-largest community of open-source contributors, with 21.9 million developers. That is remarkable.

However, India still contributes relatively little to low-level engineering innovation in projects such as PyTorch (the open-source machine learning framework developed by Meta that has become the standard for AI research), vLLM and other foundational AI technologies. These are the engines that power modern AI.

What is Red Hat doing to address this gap?

At Red Hat, we want to invest in India by helping the country evolve from being primarily a consumer of AI technologies to becoming a contributor and influencer in AI innovation. That is where our engineering focus lies.

Your Questions, Answered

How quickly do open-source AI models catch up with proprietary ones?

According to Red Hat’s APAC CTO, studies show open models typically catch up with proprietary models within three to six months. Models like Qwen and Mistral demonstrate this trend.

What is the Linux Foundation’s Model Openness Framework?

It evaluates sixteen aspects of AI models including training data, training code, fine-tuning code, evaluation process and documentation to assess how transparent a model actually is, rather than simply labelling it open or closed.

Why are centralised data lakes problematic for AI?

Centralising data into a single lake often removes compliance controls that exist within individual systems and makes it difficult to associate AI agent identities with data access permissions in agentic AI scenarios.

What is India’s middle-path model for AI?

Unlike the US (private investment-driven) or China (government-driven), India’s government builds foundational infrastructure like GPU compute while the private sector builds applications on top, combining both approaches.
