Meet ‘Kani-TTS-2’: A 400M Param Open Source Text-to-Speech Model that Runs in 3GB VRAM with Voice Cloning Support

February 15, 2026
The landscape of generative audio is shifting toward efficiency. A new open-source contender, Kani-TTS-2, has been released by the team at nineninesix.ai. This model marks a departure from heavy, compute-expensive TTS systems. Instead, it treats audio as a language, delivering high-fidelity speech synthesis with a remarkably small footprint.

Kani-TTS-2 offers a lean, high-performance alternative to closed-source APIs. It is currently available on Hugging Face in both English (EN) and Portuguese (PT) versions.
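
For readers who want the checkpoints on disk before experimenting, the minimal sketch below uses huggingface_hub; the repository IDs are assumptions for illustration, so confirm the exact names on the nineninesix.ai organization page on Hugging Face.

```python
# Minimal sketch: fetch the model files locally with huggingface_hub.
# NOTE: the repo IDs below are assumptions for illustration; check the
# nineninesix.ai organization page on Hugging Face for the exact names.
from huggingface_hub import snapshot_download

for repo_id in ("nineninesix/kani-tts-2-en", "nineninesix/kani-tts-2-pt"):
    local_path = snapshot_download(repo_id=repo_id)  # downloads or reuses the local cache
    print(f"{repo_id} -> {local_path}")
```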

The Architecture: LFM2 and NanoCodec

Kani-TTS-2 follows the ‘Audio-as-Language’ philosophy. The model does not use conventional mel-spectrogram pipelines. Instead, it converts raw audio into discrete tokens using a neural codec.

The system relies on a two-stage process:

  1. The Language Backbone: The model is built on LiquidAI’s LFM2 (350M) architecture. This backbone generates ‘audio intent’ by predicting the next audio tokens. Because LFMs (Liquid Foundation Models) are designed for efficiency, they provide a faster alternative to standard transformers.
  2. The Neural Codec: It uses the NVIDIA NanoCodec to turn those tokens into 22kHz waveforms (a rough sketch of this hand-off follows the list).
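
The sketch below is a conceptual, non-authoritative illustration of that hand-off: a backbone emits discrete audio tokens and a codec decoder turns them into a waveform. All class and method names are hypothetical stand-ins, not the actual Kani-TTS-2 API.

```python
# Conceptual sketch of the two-stage 'audio-as-language' flow described above.
# The class and method names are hypothetical stand-ins, not the real Kani-TTS-2 API;
# they only illustrate the token -> waveform hand-off.
import numpy as np

class AudioLanguageBackbone:
    """Stand-in for the LFM2-based backbone: text in, discrete audio tokens out."""
    def generate_audio_tokens(self, text: str, max_tokens: int = 256) -> np.ndarray:
        # A real backbone predicts the next audio token autoregressively;
        # here we just emit placeholder token IDs from a small codebook.
        rng = np.random.default_rng(0)
        return rng.integers(0, 4096, size=max_tokens)

class NeuralCodecDecoder:
    """Stand-in for NVIDIA NanoCodec: discrete tokens in, 22 kHz waveform out."""
    sample_rate = 22_050
    def decode(self, tokens: np.ndarray) -> np.ndarray:
        # A real codec decoder reconstructs audio frames from token IDs;
        # here we return a silent buffer of a plausible shape.
        frames_per_token = 512
        return np.zeros(len(tokens) * frames_per_token, dtype=np.float32)

backbone, codec = AudioLanguageBackbone(), NeuralCodecDecoder()
tokens = backbone.generate_audio_tokens("Hello from Kani-TTS-2.")
waveform = codec.decode(tokens)
print(f"{len(tokens)} tokens -> {len(waveform) / codec.sample_rate:.2f} s of audio")
```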

By using this architecture, the model captures human-like prosody (the rhythm and intonation of speech) without the ‘robotic’ artifacts found in older TTS systems.

Efficiency: 10,000 Hours in 6 Hours

The training metrics for Kani-TTS-2 are a masterclass in optimization. The English model was trained on 10,000 hours of high-quality speech data.

While that scale is impressive, the speed of training is the real story. The research team trained the model in only 6 hours using a cluster of 8 NVIDIA H100 GPUs. This shows that large datasets no longer require weeks of compute time when paired with efficient architectures like LFM2.
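
A quick back-of-the-envelope check of what those reported numbers imply, assuming roughly one pass over the dataset:

```python
# Back-of-the-envelope throughput implied by the reported training run
# (assumes roughly one pass over the data).
dataset_hours = 10_000   # hours of speech in the EN training set
wall_clock_hours = 6     # reported training time
num_gpus = 8             # NVIDIA H100s

gpu_hours = wall_clock_hours * num_gpus
print(f"Total GPU-hours: {gpu_hours}")                                          # 48
print(f"Audio hours processed per GPU-hour: {dataset_hours / gpu_hours:.0f}")   # ~208
```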

Zero-Shot Voice Cloning and Performance

The standout feature for developers is zero-shot voice cloning. Unlike traditional models that require fine-tuning for new voices, Kani-TTS-2 uses speaker embeddings.

  • How it works: You provide a short reference audio clip.
  • The result: The model extracts the distinctive characteristics of that voice and applies them to the generated text instantly (a hypothetical call pattern is sketched after this list).
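
In code, the flow boils down to "embed the reference clip, then condition generation on that embedding". The sketch below is purely illustrative; the function names are hypothetical stand-ins rather than the published interface.

```python
# Hypothetical zero-shot cloning flow; names are illustrative stand-ins,
# not the actual Kani-TTS-2 interface.
import numpy as np

def extract_speaker_embedding(reference_wav: np.ndarray) -> np.ndarray:
    # A real encoder maps the reference clip to a fixed-size speaker vector.
    return np.zeros(256, dtype=np.float32)

def synthesize(text: str, speaker_embedding: np.ndarray) -> np.ndarray:
    # Generation is conditioned on the embedding, so no fine-tuning is needed
    # to speak in the reference voice.
    return np.zeros(22_050 * 2, dtype=np.float32)  # 2 s of placeholder audio

reference_clip = np.zeros(22_050 * 5, dtype=np.float32)   # ~5 s reference recording
voice = extract_speaker_embedding(reference_clip)
audio = synthesize("Cloned in one shot.", voice)
```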

From a deployment perspective, the model is highly accessible:

  • Parameter Count: 400M (0.4B) parameters.
  • Speed: It features a Real-Time Factor (RTF) of 0.2. This means it can generate 10 seconds of speech in roughly 2 seconds (see the short check after this list).
  • Hardware: It requires only 3GB of VRAM, making it compatible with consumer-grade GPUs like the RTX 3060 or 4050.
  • License: Released under the Apache 2.0 license, allowing for commercial use.
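
Since RTF is simply generation time divided by the duration of the audio produced, the stated figure is easy to sanity-check:

```python
# Real-Time Factor: time spent generating divided by duration of audio produced,
# so generation time = audio duration * RTF.
def generation_time(audio_seconds: float, rtf: float) -> float:
    return audio_seconds * rtf

print(generation_time(audio_seconds=10.0, rtf=0.2))  # -> 2.0 seconds
```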

Key Takeaways

  • Efficient Architecture: The model uses a 400M parameter backbone based on LiquidAI’s LFM2 (350M). This ‘Audio-as-Language’ approach treats speech as discrete tokens, allowing for faster processing and more human-like intonation compared to traditional architectures.
  • Rapid Training at Scale: Kani-TTS-2-EN was trained on 10,000 hours of high-quality speech data in just 6 hours using 8 NVIDIA H100 GPUs.
  • Instant Zero-Shot Cloning: There is no need for fine-tuning to replicate a specific voice. By providing a short reference audio clip, the model uses speaker embeddings to instantly synthesize text in the target speaker’s voice.
  • High Performance on Edge Hardware: With a Real-Time Factor (RTF) of 0.2, the model can generate 10 seconds of audio in roughly 2 seconds. It requires only 3GB of VRAM, making it fully functional on consumer-grade GPUs like the RTX 3060.
  • Developer-Friendly Licensing: Released under the Apache 2.0 license, Kani-TTS-2 is ready for commercial integration. It offers a local-first, low-latency alternative to expensive closed-source TTS APIs.

Check out the Model Weights. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and subscribe to our Newsletter. Wait! Are you on Telegram? Now you can join us on Telegram as well.


Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.





