Microsoft Teams now uses AI to improve echo, interruptions, and acoustics - The Verge
Microsoft is rolling out new audio quality improvements to Teams. The software maker is using AI to improve echo, interruptions, and room acoustics.
Microsoft has spent the past two years adding flashy new productivity features to Teams, and now the company is overhauling how the fundamentals work thanks to AI. We’ve all been on a call where someone’s poor room acoustics make it hard to hear them, or watched two people try to talk at the same time, creating an awkward “no, you go ahead” moment. Microsoft’s new AI-powered voice quality improvements should reduce or even eliminate these day-to-day annoyances.
Microsoft is now using machine learning models to improve room acoustics so you’ll no longer sound like you’re hiding in a cave. “While we have been trying our best with digital signal processing to do a really good job in Teams, we have now started using machine learning for the first time to build echo cancellation where you can truly reduce echo from all the different devices,” explains Robert Aichner, a principal program manager for intelligent conversation and communications cloud at Microsoft, in an interview with The Verge.
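For context on the approach Microsoft says it is moving beyond: classic digital signal processing handles acoustic echo with an adaptive filter that learns the path from the loudspeaker back into the microphone and subtracts the estimated echo. Below is a minimal sketch of that traditional approach, a normalized least-mean-squares (NLMS) canceller; this is a textbook illustration, not Teams' implementation, and the function name and parameters are hypothetical.

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, filter_len=64, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of the far-end (loudspeaker)
    signal from the microphone signal, returning the residual."""
    w = np.zeros(filter_len)                   # adaptive filter taps
    out = np.zeros_like(mic, dtype=float)
    for n in range(filter_len, len(mic)):
        x = far_end[n - filter_len:n][::-1]    # most recent far-end samples
        echo_est = w @ x                       # current echo estimate
        e = mic[n] - echo_est                  # residual: near-end speech + error
        out[n] = e
        w += mu * e * x / (x @ x + eps)        # NLMS tap update
    return out
```

The filter converges as long as the real echo path is roughly linear and shorter than `filter_len`; the quote above points at the weakness of this model, since real devices and rooms introduce nonlinearities that a fixed linear filter cannot capture, which is where learned models come in.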
Microsoft has been testing this for months, measuring its models in the real world to ensure Teams users are noticing the echo reduction and improvements in call quality. The software maker used 30,000 hours of speech to help train its models, and captured data from thousands of devices through crowdsourcing, paying Teams users to record their voices and play back audio on their devices.
“We also simulate about 100,000 different rooms... the room acoustics play a big role in echo cancellation,” says Aichner. The result is big improvements in call audio quality, and an elimination of echo that also allows multiple people to speak at the same time. You can see all of the improvements in action in the video above.
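Simulating rooms for training data like this is commonly done by generating synthetic room impulse responses and convolving them with clean speech. As a rough illustration of the idea (not Microsoft's pipeline; the exponentially decaying noise model and the `simulate_room` helper are assumptions for this sketch):

```python
import numpy as np

def simulate_room(dry_speech, rt60=0.4, sr=16000, rng=None):
    """Apply a synthetic room impulse response (exponentially decaying
    white noise, a common cheap RIR model) to dry speech, so the result
    sounds reverberant. rt60 is the time for energy to decay by 60 dB."""
    rng = rng or np.random.default_rng(0)
    n_taps = int(rt60 * sr)
    t = np.arange(n_taps) / sr
    envelope = 10 ** (-3 * t / rt60)           # -60 dB over rt60 seconds
    rir = rng.standard_normal(n_taps) * envelope
    rir /= np.max(np.abs(rir))                 # normalize the peak
    return np.convolve(dry_speech, rir)[:len(dry_speech)]
```

Varying the decay time, tap pattern, and random seed yields many distinct "rooms," which is how a training set can cover far more acoustic conditions than could be recorded by hand.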
If Teams detects sound bouncing or reverberating around a room, resulting in shallow audio, the model will also process the captured audio so that Teams participants sound as if they’re speaking into a close-range microphone rather than an echoey room.