
Tuning a voice capture system to track a single speaker while many other conversations, sometimes louder ones, take place in the same space is a long-standing problem that the audio industry has tried to solve in many different ways. On the front end, some companies have focused on directional microphones and cancellation of unwanted sources, while others have explored sound recognition and user identification techniques, even before machine learning solutions were available. More recently, with the availability of AI, multiple companies are using source identification and separation to track and process multiple concurrent sounds captured at the same time.
Attention Labs’ first major achievement came in 2022, when it found a processing solution for the persistent ‘howling’ interference problem, helping 40 different videoconferencing platforms achieve the quality required for natural hybrid meetings.
The Toronto-based company is now exploring voice AI and personal voice assistant applications in which its SAA (Selective Auditory Attention) layer of artificial intelligence would add the ability to actually "understand" human conversations. As the company reports, SAA will work on different media devices such as headsets, TVs, smart glasses, tablets, and robots.
"If a room is full of multiple voices, humans can pick out one of them, even though all the sounds have equal volume. Until the advent of SAA, machines lacked this ability, so it was nearly impossible to single out the important voice in a conversation," explains David J. Kim, CEO and co-founder of Attention Labs. "Our selective auditory attention engine runs locally on devices, delivering crystal‑clear audio with millisecond latency and zero cloud dependency that averages 97% accuracy. Voices keep their snappy interaction timing, even with crosstalk."
"SAA mimics a psycho-acoustic phenomenon known as the cocktail party effect, based on our human brain, which can tune into a specific voice in a noisy environment. If a room is full of multiple voices, humans can pick out one of them, even though all the sounds have equal volume," he adds.

SAA features a real‑time auditory perception layer that allows voice recognition systems to respond more quickly and reliably. "Voice AI mishears in group conversations. This is the challenge we set out to address. The result was SAA’s real‑time, attention‑driven conversation intelligence. Our on-device engine identifies and routes relevant voices. The AI identifies and refines high‑fidelity audio‑visual data, capturing the spatial and conversational nuances of group conversations as scenes constantly shift with acoustics and behavior," Kim states.
"Our engine models a selective attention engine in real time across devices, enabling dynamic, inclusive interactions. The technology supports 2 to 8 microphones with flexible setups for built‑in mics, headsets, and peripherals."
The on‑device audio engine runs entirely on existing hardware, isolating relevant voices with sub‑100 ms latency and ultra‑low power draw. Sensitive audio processing stays on‑device, so raw voices and soundscapes never leave the hardware. Speech is separated locally, data is kept private, response times are reduced, and accuracy is maintained because the AI can recognize and enhance speech across diverse accents and overlapping talk. Applying AI to the captured sound also helps the system understand conversational context.
"SAA instantly finds, focuses, and separates the voices that matter directly on your device," adds Kim. "Plug it into any LLM and get real‑time context‑aware responses."
Attention Labs is working with leading developers, Tier‑1 OEMs, and platform teams to bring the solution to market.
www.attentionlabs.ai