Microsoft's New Homegrown AI Models

Microsoft has unveiled its first in-house AI models: MAI-Voice-1 and MAI-1-preview. MAI-Voice-1 is a natural speech generation model that produces human-like audio, generating a full minute of speech in under a second on a single GPU. MAI-1-preview is a text-based model trained on roughly 15,000 Nvidia H100 GPUs and designed to give helpful, instruction-following responses to everyday queries.

Microsoft’s AI division has launched its first in-house AI models: MAI-Voice-1 and MAI-1-preview. The models mark a significant milestone in the company’s effort to compete in the AI race and to build solutions for both consumers and businesses. Let’s take a technical look at each.

MAI-Voice-1 is Microsoft’s first natural speech generation model, which can turn text into speech in seconds. It is designed to deliver distortion-free, expressive audio that sounds human-like across a variety of contexts.

MAI-Voice-1 is also highly efficient. It can generate a full minute of audio in under a second on a single GPU, making it one of the fastest speech models currently available. This efficiency lets Microsoft power features like Copilot Daily and Copilot Podcasts.
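
To put the speed claim in perspective, here is a minimal sketch of the arithmetic behind it. The function name `real_time_factor` is made up for illustration, and the 1.0-second figure is only the upper bound implied by "under a second", not a measured benchmark.

```python
def real_time_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Return seconds of audio produced per second of generation time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


# Assumption: one minute of audio, generated in at most one second.
rtf = real_time_factor(60.0, 1.0)
print(f"At least {rtf:.0f}x faster than real time")
```

Because the generation time is stated as "under a second", the result is a lower bound: any shorter generation time gives a higher factor.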

MAI-1-preview is a text-based model, trained end-to-end on around 15,000 Nvidia H100 GPUs, marking Microsoft’s first venture into developing a foundation model in-house. It’s built to give helpful responses to everyday queries while following detailed instructions.