Key Points
- Hush is an 8 MB model that needs no GPU and processes a 10-millisecond audio frame in under 1 millisecond
- Trained on more than 10,000 hours of audio; the company says it works across all spoken languages
- Ranked fifth on Hugging Face Audio-to-Audio leaderboard at launch
Noida-based weya AI on Thursday released Hush, an open-source tool designed to filter out background noise and competing voices from audio streams used by voice-based artificial intelligence systems. The model, which the company claims can process audio in under one millisecond, is aimed at improving the reliability of AI-powered phone agents and call centre systems.
Voice AI refers to systems that can understand and respond to human speech, such as automated customer service lines, voice assistants and real-time transcription tools. These systems often fail not because the underlying language processing is flawed but because the audio input itself is poor, with background chatter, traffic noise or multiple speakers confusing the system.
Hush attempts to solve this by isolating the primary speaker’s voice while suppressing everything else, including secondary speakers, background hum and ambient noise, in real time.
How Hush works
At 8 MB in size, Hush is designed to run on standard computer processors without requiring a graphics processing unit, the specialised hardware typically needed for AI workloads. The company said the model can process a 10-millisecond audio frame in under one millisecond, making it suitable for real-time applications where delays would disrupt conversation flow.
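To see why the sub-millisecond claim matters, the arithmetic can be sketched directly. The numbers below come from the article; the notion of a "real-time factor" is standard in streaming audio, and the loop-free calculation is purely illustrative.

```python
# Sketch: why sub-millisecond processing matters for streaming audio.
# The frame and timing figures are taken from the article's claims.

FRAME_MS = 10.0   # audio arrives in 10 ms frames
PROC_MS = 1.0     # claimed worst-case processing time per frame

# Real-time factor: fraction of each frame interval spent processing.
# Anything below 1.0 keeps up with live audio; lower values leave
# headroom for the rest of the voice stack (recognition, the language
# model, speech synthesis).
rtf = PROC_MS / FRAME_MS
print(f"real-time factor: {rtf:.2f}")  # 0.10 -> 90% of the budget left

# Per-second cost: 100 frames per second at <= 1 ms each is
# <= 100 ms of CPU time per second of audio.
frames_per_second = 1000 / FRAME_MS
cpu_ms_per_second = frames_per_second * PROC_MS
print(f"CPU time per second of audio: <= {cpu_ms_per_second:.0f} ms")
```

A real-time factor of 0.1 means the model consumes at most a tenth of the available time budget, which is why delays would not disrupt conversation flow.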
The model has been trained on more than 10,000 hours of mixed audio data. According to weya AI, 60 per cent of this training data included competing human voices at signal-to-interference ratios of 12 to 24 decibels. In simpler terms, the model learned to pick out a primary speaker even when other voices were clearly audible in the mix.
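The decibel figures can be translated into plain ratios. The conversion below is the standard definition of a power ratio in decibels; only the 12 and 24 dB values come from the article.

```python
import math

# Sketch: what a 12-24 dB signal-to-interference ratio (SIR) means.
# SIR in dB compares the power of the target voice to the power of
# the competing voices mixed into the same audio.

def power_ratio(sir_db: float) -> float:
    """Power of the target voice relative to the interference."""
    return 10 ** (sir_db / 10)

for sir_db in (12, 24):
    print(f"SIR {sir_db} dB -> target voice carries "
          f"{power_ratio(sir_db):.0f}x the interference power")
# 12 dB -> ~16x; 24 dB -> ~251x
```

So at the low end of the training range, the competing voices carried roughly a sixteenth of the primary speaker's power: quieter, but far from silent.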
Hush is built on DeepFilterNet3, an existing open-source architecture for audio enhancement, with an additional component called an Auxiliary Separation Head that helps distinguish between multiple speakers. The model is language-agnostic, meaning it works across all spoken languages without requiring separate training for each one, according to the company.
Performance and availability
At launch, Hush ranked fifth on Hugging Face’s Audio-to-Audio leaderboard, a benchmark that compares open-source models for audio processing tasks. Hugging Face is a platform widely used by AI researchers and developers to share and evaluate machine learning models.
The model contains 1.8 million parameters, the adjustable values that determine how an AI system processes information. For comparison, large language models used for text generation typically contain billions of parameters, making Hush significantly lighter and easier to deploy.
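The parameter count lines up with the quoted file size. The check below assumes 32-bit (4-byte) floating-point weights, a common default; the actual on-disk format is not specified in the article.

```python
# Sketch: sanity-checking the 8 MB figure against the parameter count.
# Assumes float32 (4-byte) weights, which is a common default; the
# model's actual storage format may differ.

params = 1_800_000
bytes_per_param = 4  # float32
size_mb = params * bytes_per_param / 1_000_000
print(f"~{size_mb:.1f} MB of weights")  # ~7.2 MB, close to the quoted 8 MB
```

A billion-parameter language model stored the same way would need around 4 GB, which is why Hush is described as significantly lighter to deploy.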
weya AI has released Hush as open-source software, so developers and organisations can use, modify and distribute it freely. The company said it built the tool because audio quality issues were a primary reason voice AI systems failed in production environments.
“We built this because we kept seeing high-quality language models fail in the field, not because of the model, but because of the audio it was receiving. This is the first of several models we are developing internally, all oriented toward a single vision: giving enterprises (banks, financial institutions, and regulated industries) the ability to deploy world-class AI entirely on-premises, with full control over their data and infrastructure,” weya AI CTO Atul Singh said.
The model is now available for download through standard open-source channels. Developers working on voice AI applications can integrate Hush into their existing systems.
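In a typical voice-AI pipeline, a speech enhancer like Hush sits between audio capture and the speech recogniser. The sketch below shows that placement; `enhance_frame` is a hypothetical stand-in, since the article does not describe Hush's actual programming interface.

```python
# Sketch of where a speech enhancer sits in a voice-AI pipeline.
# `enhance_frame` is a hypothetical placeholder for Hush's actual API,
# which the article does not specify; the surrounding flow is generic.

from collections.abc import Iterator

def enhance_frame(frame: bytes) -> bytes:
    """Placeholder: would run one 10 ms frame through the model."""
    return frame

def pipeline(frames: Iterator[bytes]) -> Iterator[bytes]:
    # Enhancement runs per frame, before transcription, so the
    # recogniser only ever sees cleaned audio.
    for frame in frames:
        yield enhance_frame(frame)

# One 10 ms frame of 16 kHz mono 16-bit audio is 320 bytes.
cleaned = list(pipeline(iter([b"\x00" * 320])))
print(len(cleaned))  # 1
```

The frame-by-frame structure is what makes the per-frame latency figure the relevant benchmark: each frame must be cleaned before the next one arrives.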
The company describes itself as focused on the banking, financial services, and insurance sector, building AI agents for customer onboarding, sales, and collections, and it claims Kotak Mahindra Bank among its clients.
Your Questions, Answered
What is weya AI’s Hush model?
Hush is an open-source speech enhancement tool that filters background noise, competing voices and ambient sounds from audio streams used by voice AI systems such as call centre bots and phone agents.
Does Hush require specialised hardware to run?
No. The model is 8 MB in size and runs on standard processors without requiring a GPU. It can process a 10-millisecond audio frame in under one millisecond.
What languages does Hush support?
According to the company, Hush is language-agnostic, meaning it works across all spoken languages without requiring separate training data or configuration for each language.
Is Hush free to use?
Yes. weya AI has released Hush as open-source software, allowing developers and organisations to use, modify and distribute it freely.
