Phonexia has just launched Deep Embeddings™ – the latest generation of its voice biometrics engine for speaker identification and verification. The new technology exclusively uses deep neural networks (DNN) to map each voice directly to a small, fixed-length record called a voiceprint.
Deep Embeddings™ – available within the Phonexia Speech Platform – is the world’s first commercially available voice biometric engine with this machine learning capability.
Applied DNN technology brings clear benefits
Phonexia Deep Embeddings™ uses a discriminative training model to identify the truly unique features in each individual’s voice. As a result of incorporating these new training models in its DNN, the new Deep Embeddings™ technology creates voiceprints twice as fast, is 2.4 times more accurate, and consumes just a quarter of the memory of the previous Phonexia voice biometric engine – which was already one of the fastest and most accurate on the market.
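To illustrate what a fixed-length voiceprint means in practice, the sketch below maps recordings of different lengths to vectors of the same size and compares them. It is a minimal illustration only, not Phonexia's method: simple mean pooling stands in for the trained DNN, cosine similarity is an assumed scoring function, and the function names and toy feature values are hypothetical.

```python
import math


def mean_pool(frames):
    """Average per-frame feature vectors into one fixed-length vector.

    Stand-in for a trained DNN embedding extractor (an assumption made
    for illustration; a real engine learns this mapping).
    """
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def cosine_similarity(a, b):
    """Score two voiceprints; values near 1.0 suggest the same speaker."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy features: recordings of 3 and 5 frames, 3 values per frame.
short_recording = [[1.0, 0.0, 2.0], [1.2, 0.1, 1.8], [0.8, -0.1, 2.2]]
long_recording = [[1.0, 0.0, 2.0]] * 5

voiceprint_a = mean_pool(short_recording)
voiceprint_b = mean_pool(long_recording)

# Both voiceprints have the same length regardless of recording duration.
print(len(voiceprint_a), len(voiceprint_b))
print(cosine_similarity(voiceprint_a, voiceprint_b))
```

Because every voiceprint has the same size, storage and comparison cost do not grow with the length of the original recording.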
“The technical benefits – accuracy, speed, and reduced memory use – from transitioning completely to deep neural networks in our engine have exceeded our expectations,” stated Petr Schwarz, Phonexia CTO. “We are looking forward to our clients seizing these benefits as they implement our technology in their systems.”
Increased accuracy creates value
Deep Embeddings™ shows a significant improvement in accuracy as measured by the Equal Error Rate (EER) – the error rate at the operating point where the False Accept rate equals the False Reject rate. Deep Embeddings™ reduced the EER by a factor of 2.4 in comparison to the previous voice biometric engine.
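For readers unfamiliar with the metric, the Equal Error Rate can be estimated from two sets of comparison scores: genuine trials (same speaker) and impostor trials (different speakers). The sketch below is a minimal illustration in Python; the scores are hypothetical toy values, not Phonexia results, and the threshold scan is one simple way to approximate the point where the two error rates meet.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false_accept_rate, false_reject_rate) at a threshold.

    A trial is accepted when its score is >= threshold.
    """
    far = sum(score >= threshold for score in impostor) / len(impostor)
    frr = sum(score < threshold for score in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER by trying every observed score as a threshold
    and keeping the one where FAR and FRR are closest."""
    best = None
    for threshold in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]


# Hypothetical toy scores, for illustration only.
genuine_scores = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor_scores = [0.7, 0.5, 0.3, 0.2, 0.1]

eer, eer_threshold = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.0%} at threshold {eer_threshold}")  # EER = 20% at threshold 0.6
```

As a purely illustrative calculation, an engine with an EER of 4.8% that improved by a factor of 2.4 would reach an EER of 2.0%; these figures are hypothetical and are not taken from Phonexia's measurements.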
“At the end of the day, higher accuracy saves money — whether this is decreasing the probability of a client having a false rejection during a call center’s phone verification or increasing the accurate identification of fraudsters misusing someone’s identity to take out a bank loan,” said Pavel Matějka, Phonexia CSO.
Free up memory for other processing tasks
In addition to increased speed and accuracy, Deep Embeddings™ slashes the memory requirements for creating and processing voiceprints, taking just one seventh of the RAM required earlier. The freed memory unblocks GPU processing, leading to an additional increase in processing speed.
“Cutting the memory requirement thanks to DNN is a real revolution in speeding up voice identification and verification. In addition to making biometric adoption easier for traditional clients, we expect this new performance to accelerate a much broader adoption of speaker identification into new segments such as 4.0 devices, automotive, smart wearables, IoT devices, and devices with no permanent connection to the Internet,” explained Mr. Schwarz.
A platform designed for integration
The Deep Embeddings™ engine is part of the modular Phonexia Speech Platform, which provides a wide portfolio of technologies such as speech-to-text, keyword spotting, language identification, gender identification, and age estimation, all within a single platform. Designed to be fully scalable according to individual client needs, the platform can be easily integrated into other solutions. Due to its high modularity and ease of integration, it is the ideal technology for system integrators needing voice biometric components in their client solutions. Phonexia is now rolling out Deep Embeddings™ for speaker identification and verification to its partners.