Sarvam’s Indus app puts a 105B Indian model in users’ hands, and that matters beyond the chatbot race
Sarvam has launched Indus, a beta AI chat app for iOS, Android, and the web, backed by its new 105-billion-parameter model. Access is gated through a waitlist, and for now it’s limited to India while the company adds more compute.
That makes this launch worth paying attention to. Sarvam is trying to prove that an Indian model stack can hold up in a market where the global leaders still have clear edges in model quality, tooling, and infrastructure.
Its pitch is also pretty specific. Sarvam is building for India first, with local-language support, code-mixed inputs, voice input and output, and a product plan that extends into enterprise systems, feature phones, and cars. That’s a serious strategy if it works.
Why this launch matters
India is already one of the biggest AI markets in the world. TechCrunch reported that ChatGPT has around 100 million weekly active users in India. Anthropic is seeing usage there too. Sarvam is entering a market where users already know what a strong assistant feels like.
So the standard is high from day one.
A domestic model has to handle the way people actually write and speak: Hinglish, transliterated Hindi, Tamil mixed with English product names, Bengali in native script, voice prompts from noisy environments, and all the rest of the linguistic messiness that trips up English-first systems. If Sarvam gets that right, it has room to compete in places where US frontier models still need work.
That’s where the 105B figure matters, though only to a point. Fit matters more than raw size.
The technical bet
Sarvam hasn’t published a full architecture paper for the 105B model, so some details are still inferred rather than confirmed. But the basic shape is fairly obvious.
The core model is almost certainly a decoder-only Transformer, still the default design for chat LLMs. To work well across Indian languages, it also needs a tokenizer that doesn’t butcher Indic scripts or blow up token counts on transliterated text. That sounds like plumbing. It has a direct effect on product quality.
Tokenizer quality is one of the least glamorous parts of multilingual AI, and one of the most important. Bad tokenization on Devanagari, Tamil, Telugu, or Bengali can split words badly, which hurts efficiency and output quality. If users are mixing scripts and Romanized text in the same prompt, tokenization stops being an academic detail.
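As a quick illustration, here is a small sketch comparing how an English-centric tokenizer handles the same intent across scripts. The gpt2 tokenizer is only a baseline stand-in, since Sarvam has not published its tokenizer, and the sample strings are ours:

```python
# Sketch: measure tokenizer efficiency across scripts. gpt2 is an
# English-centric baseline, not Sarvam's tokenizer (which isn't public).
from transformers import AutoTokenizer

samples = {
    "english": "Please tell me when my refund will arrive.",
    "hindi_devanagari": "कृपया बताएं मेरा रिफंड कब आएगा।",
    "hinglish_roman": "please batao mera refund kab aayega",
}

tok = AutoTokenizer.from_pretrained("gpt2")

for name, text in samples.items():
    ids = tok.encode(text)
    # Fewer chars per token means more of the context window burned per word.
    print(f"{name}: {len(ids)} tokens ({len(text) / len(ids):.2f} chars/token)")
```

Against an English-centric tokenizer, the Devanagari line typically costs several times more tokens than the English one, and that overhead compounds into higher latency and cost per query.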
Indus also supports voice input and audio replies, which likely means the app is stitching together multiple systems:
- ASR to convert speech to text
- LLM inference on the text prompt
- TTS to return spoken output
That’s the practical way to ship this today. A unified multimodal model is possible, but separate components are easier to tune, replace, and scale.
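Under that assumption, one plausible shape for the voice path is sketched below. The interfaces are hypothetical stand-ins, not Sarvam's actual components:

```python
# Hedged sketch of a stitched voice pipeline; ASR, LLM, and TTS are
# hypothetical interfaces standing in for whatever the product actually runs.
from typing import Protocol

class ASR(Protocol):
    def transcribe(self, audio: bytes, language_hint: str = "auto") -> str: ...

class LLM(Protocol):
    def chat(self, messages: list[dict]) -> str: ...

class TTS(Protocol):
    def synthesize(self, text: str, voice: str) -> bytes: ...

def handle_voice_turn(audio: bytes, history: list[dict],
                      asr: ASR, llm: LLM, tts: TTS) -> bytes:
    """One conversational turn: speech in, speech out."""
    text = asr.transcribe(audio, language_hint="auto")   # speech -> text
    history.append({"role": "user", "content": text})
    reply = llm.chat(history)                            # text -> text
    history.append({"role": "assistant", "content": reply})
    return tts.synthesize(reply, voice="hi-IN")          # text -> speech
```

The operational advantage is that each stage can be swapped, quantized, or scaled independently, which matters when the LLM stage is the bottleneck.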
A 105B model is expensive to run
A 105B-parameter model is expensive to serve, full stop.
At fp16, you’re looking at roughly 210 GB of VRAM just for model weights. At int8, around 105 GB. At int4, roughly 52 GB, before overhead such as the KV cache, batching, and runtime infrastructure. In production, that usually means multiple GPUs, tensor parallelism, and a lot of serving work.
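The arithmetic behind those figures is simple enough to sanity-check yourself, weights only, before any serving overhead:

```python
# Weights-only VRAM estimate for a 105B-parameter model.
# Excludes KV cache, activations, and serving overhead.
PARAMS = 105e9

for precision, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    gb = PARAMS * bytes_per_param / 1e9
    print(f"{precision}: ~{gb:.0f} GB")
# fp16: ~210 GB, int8: ~105 GB, int4: ~52 GB
```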
So when Sarvam uses a waitlist and talks about capacity, that tracks. Serving a model this size to consumer traffic is hard.
It also helps explain the rough edges. TechCrunch noted that:
- response times can be slow
- reasoning mode can’t be disabled
- users can’t clear chat history without deleting the account
That last one is bad. A consumer AI app without basic chat deletion controls looks sloppy. For enterprise buyers, it immediately raises data-governance questions.
Reasoning mode has a cost
One of the more telling details is the app’s non-optional reasoning mode. That suggests Sarvam is using some form of multi-step inference, deliberate decoding, or chain-of-thought-style internal exploration before generating the final answer.
That may help quality. It also burns tokens and compute.
So the latency complaints may not be only about insufficient GPU capacity. Some of the slowness may be structural. If the system depends on inference-time reasoning to get stronger answers, the cost per query goes up fast, especially under load.
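A rough cost model shows how quickly this adds up. The numbers below are illustrative, not Sarvam's actual figures:

```python
# Illustrative: forced reasoning multiplies output tokens, and therefore
# compute, per query. All numbers below are made up for shape.
visible_tokens = 300        # tokens the user actually sees
reasoning_tokens = 1500     # hidden deliberation, if reasoning can't be turned off
cost_per_1k_tokens = 0.002  # hypothetical serving cost per 1k generated tokens

base = visible_tokens / 1000 * cost_per_1k_tokens
forced = (visible_tokens + reasoning_tokens) / 1000 * cost_per_1k_tokens
print(f"{forced / base:.0f}x cost per query with always-on reasoning")  # 6x
```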
This is where the broader model lineup matters. Sarvam has also announced a 30B model. For many real deployments, that may be the more interesting one.
A 30B model, quantized and tuned well for multilingual use, could land at a better cost-performance point for:
- enterprise assistants
- document QA
- customer support
- workflow copilots
- speech-first products with tight latency budgets
The 105B model still has a place for harder reasoning and cross-lingual synthesis. But production fit is usually decided by serving economics.
Sarvam is building a stack
This is the part that matters most for product teams and enterprise buyers.
Sarvam isn’t treating Indus as only a consumer app. It has already pointed to deals with HMD for AI on Nokia feature phones and Bosch for automotive AI. Those are very different environments from a chatbot, but they fit the same pattern.
The company appears to be aiming for a full stack:
- large cloud-hosted models for top-end tasks
- smaller distilled models for constrained devices
- speech systems that work across low-end hardware
- enterprise integrations where local compliance and language support matter
That’s a harder business than shipping a chat app. It’s also a sturdier one if the execution holds.
Feature phone support will force real discipline. A 105B model is obviously not running locally there. Sarvam will need either efficient server-side paths with low enough latency for weak networks, or smaller distilled models for offline and semi-offline tasks. Probably both.
That’s where plenty of AI companies find out whether they have real systems depth or a polished demo.
Developers still need the missing pieces
Indus is useful as a signal. The developer story depends on what Sarvam exposes beyond the app.
A serious platform needs:
- API access
- structured outputs
- tool or function calling
- retrieval hooks for RAG
- context window specs
- rate limits and pricing that are workable
One of the biggest open questions is context length. If the model tops out below 32k tokens, that limits its usefulness for long-document analysis, enterprise search, and retrieval-heavy workflows. If Sarvam wants to sell into BFSI, healthcare, or government use cases, that detail will matter quickly.
Evaluation matters too. Any team considering these models should test beyond standard English benchmarks; a minimal harness sketch follows the list below. A useful India-focused eval suite needs:
- native script inputs
- transliterated text
- code-mixed prompts
- named entities across languages
- speech samples from varied accents and noisy environments
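A minimal harness for that kind of suite might look like this. The cases and the keyword check are deliberately crude, just to show the shape; `ask_model` is a stand-in for whatever client you wire up:

```python
# Crude eval-harness sketch: the same intent, expressed in three scripts,
# should produce consistent answers. Cases and checks are illustrative only.
CASES = [
    {"id": "refund-native", "prompt": "मेरा रिफंड कब आएगा?",
     "must_contain": "refund"},
    {"id": "refund-roman", "prompt": "mera refund kab aayega?",
     "must_contain": "refund"},
    {"id": "refund-mixed", "prompt": "Flipkart order ka refund kab aayega?",
     "must_contain": "refund"},
]

def run_suite(ask_model) -> float:
    """ask_model: callable taking a prompt string, returning the reply text."""
    passed = sum(
        case["must_contain"] in ask_model(case["prompt"]).lower()
        for case in CASES
    )
    return passed / len(CASES)
```

A real suite would score semantic equivalence rather than keywords, but even this shape catches models that silently degrade on Romanized or code-mixed input.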
If your indexing pipeline can’t match a Romanized user query to a source document in native script, retrieval quality falls apart before the model even answers. That’s a common failure mode, and a bigger model doesn’t fix it.
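The usual fix is to normalize both sides of the match to a common representation before indexing. The toy lookup table below stands in for a real transliterator such as the indic-transliteration package:

```python
# Toy sketch of script-normalized retrieval. A real pipeline would use a
# proper transliterator; this two-entry table just shows where the
# normalization step sits.
TOY_TRANSLIT = {"रिफंड": "refund", "ऑर्डर": "order"}

def normalize(text: str) -> str:
    for native, roman in TOY_TRANSLIT.items():
        text = text.replace(native, roman)
    return text.lower()

docs = {"doc1": "आपका रिफंड 5 दिनों में आएगा"}
index = {doc_id: normalize(body) for doc_id, body in docs.items()}

query = normalize("refund status")
hits = [d for d, body in index.items() if any(w in body for w in query.split())]
print(hits)  # ['doc1'] -- without normalization, this query misses the doc
```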
The sovereignty case is real, with limits
There’s a straightforward argument for domestic AI infrastructure in India: local hosting options, better support for Indian languages, and easier alignment with the Digital Personal Data Protection Act, 2023 and sector-specific requirements.
That matters for regulated buyers. A bank or hospital may care less about squeezing out the last benchmark points and more about auditability, data handling, latency, and vendor responsiveness.
But local positioning doesn’t excuse weak product decisions. Privacy controls matter. So do transparent retention policies and admin-grade deployment options. Sarvam still has work to do there.
And the global players will keep improving multilingual performance if the market demands it. Sarvam’s opening comes from focus and distribution. It’s not a permanent moat.
What to watch next
The launch matters, but the next steps matter more.
Watch for:
- published model specs, especially context length
- API and enterprise access details
- actual quality on Indic languages under real usage
- whether the 30B model becomes the practical workhorse
- clearer privacy controls and data lifecycle options
- evidence that the HMD and Bosch partnerships turn into deployed systems
If Sarvam can deliver strong multilingual performance with sane serving economics, it has a credible place in India’s AI stack. If the 105B model ends up as a prestige layer sitting above a laggy app, that edge shrinks quickly.
The market already has plenty of chatbots. What it needs are models that can handle Indian language reality without collapsing on cost, latency, or governance. Indus is an early test of that.