Google Pixel Recorder
Google describes how its artificial intelligence (AI) labels speakers in Pixel Recorder.
Google says the feature should become more power-efficient in the near future.
Google recently added Speaker Labels to its incredibly useful Pixel Recorder app. The feature automatically identifies the different speakers in a recording and gives each one a distinct label in the transcript. Users can then rename those labels with the speakers' names. It sounds simple, but a lot of time and effort went into Recorder's in-app solution for tagging speakers.
In a blog post, Google explains that Speaker Labels is powered by Turn-to-Diarize, its new speaker diarization system. The system combines several highly efficient machine learning models and algorithms. Together they diarize hours of audio in real time within the limited computing power of Pixel phones.
An encoder model extracts the voice characteristics of each speaker, which lets the system detect when the speaker changes. A multi-stage clustering algorithm then assigns a label to each speaker.

According to Google, recordings made with the Recorder app can last anywhere from a few seconds to 18 hours. As the model processes more audio, it becomes more confident in its speaker labels. It also occasionally corrects earlier predictions that had low confidence. Throughout the recording, the Recorder app automatically updates the on-screen speaker labels to reflect the latest and most accurate predictions.
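Google's post doesn't include code, and its multi-stage clustering is more elaborate than what fits here. The sketch below is a rough illustration of the clustering step alone. It assumes a speaker encoder has already turned each speaker turn into a fixed-length embedding vector. It then groups those vectors with plain average-linkage agglomerative clustering on cosine similarity, a simpler stand-in for Google's approach. The function names, the 0.7 similarity threshold, and the toy 2-D embeddings are hypothetical; real speaker embeddings are far higher-dimensional.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cluster_turns(embeddings: List[Sequence[float]], threshold: float = 0.7) -> List[int]:
    """Average-linkage agglomerative clustering of speaker-turn embeddings.

    Hypothetical stand-in for a real diarization clusterer. Starts with one
    cluster per turn and repeatedly merges the pair of clusters with the
    highest average pairwise cosine similarity, stopping once no pair scores
    above `threshold`. Returns one label per turn, numbered by first appearance.
    """
    clusters = [[i] for i in range(len(embeddings))]

    def avg_sim(c1: List[int], c2: List[int]) -> float:
        total = sum(cosine(embeddings[i], embeddings[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_pair = None
        best_sim = threshold
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = avg_sim(clusters[a], clusters[b])
                if sim > best_sim:
                    best_pair, best_sim = (a, b), sim
        if best_pair is None:  # no remaining pair is similar enough to merge
            break
        a, b = best_pair
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is still valid

    # Number clusters by the earliest turn they contain: 0 -> "Speaker 1", ...
    clusters.sort(key=min)
    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


if __name__ == "__main__":
    # Toy 2-D "embeddings" for four alternating turns by two speakers.
    turns = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
    print([f"Speaker {label + 1}" for label in cluster_turns(turns)])
    # → ['Speaker 1', 'Speaker 2', 'Speaker 1', 'Speaker 2']
```

As written, this loop recomputes pairwise similarities on every merge. That is fine for a handful of turns but far too slow for hours of audio. The cost is one reason an on-device system needs a more efficient design, such as the multi-stage clustering Google describes.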
That your phone can accomplish all of that seems almost miraculous, don't you think?
According to Google, upcoming changes should make Speaker Labels use less power. The system currently runs on the CPU block of the Google Tensor chip. Google is working on moving more of the computation to the TPU block, which should reduce the diarization system's power consumption.
Modification and Personalization
As the real-time streaming speaker diarization system processes more audio, it becomes more confident in its predicted speaker labels. It may occasionally correct earlier predictions that had low confidence. During recording, the Recorder app automatically updates the on-screen speaker labels to reflect the latest and most accurate predictions.
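The post doesn't say how Recorder keeps the on-screen labels consistent while revising them. One plausible approach, sketched here purely as an assumption, is to re-cluster all turns seen so far whenever new audio arrives. The fresh cluster ids are then renamed to match the labels already displayed, so only turns whose assignment genuinely changed appear as corrections. `align_labels` and `changed_turns` are hypothetical names. The greedy largest-overlap matching is a simplification; an optimal one-to-one matching could use the Hungarian algorithm instead.

```python
from collections import Counter
from typing import Dict, List


def align_labels(previous: List[int], current: List[int]) -> List[int]:
    """Rename cluster ids in `current` to agree with `previous` where possible.

    `previous` holds the labels already shown for the turns seen so far;
    `current` is a fresh clustering of those turns plus any new ones, with
    arbitrary cluster ids. Each current id is greedily matched to the previous
    label it shares the most turns with; unmatched ids get fresh label numbers.
    """
    # zip stops at the shorter list, so only turns present in both are counted.
    overlap = Counter(zip(current, previous))
    mapping: Dict[int, int] = {}
    used = set()
    for (cur, prev), _count in sorted(overlap.items(), key=lambda kv: -kv[1]):
        if cur not in mapping and prev not in used:
            mapping[cur] = prev
            used.add(prev)

    next_label = max(previous, default=-1) + 1  # never reuse an existing label
    aligned = []
    for cur in current:
        if cur not in mapping:
            mapping[cur] = next_label
            next_label += 1
        aligned.append(mapping[cur])
    return aligned


def changed_turns(previous: List[int], aligned: List[int]) -> List[int]:
    """Indices of earlier turns whose displayed label was corrected."""
    return [i for i, (p, a) in enumerate(zip(previous, aligned)) if p != a]


if __name__ == "__main__":
    shown = [0, 1, 0, 1]           # labels on screen after four turns
    reclustered = [5, 7, 7, 7, 5]  # new clustering after a fifth turn
    aligned = align_labels(shown, reclustered)
    print(aligned)                        # → [0, 1, 1, 1, 0]
    print(changed_turns(shown, aligned))  # → [2]
```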
The Recorder app's user interface (UI) lets users rename the anonymous speaker labels (such as "Speaker 2") within each recording. They can replace them with personalised labels (for example, "car dealer") that are easier to read and remember.
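One way to support renaming, assumed here rather than taken from Google's post, is to keep user-chosen names in a per-recording map keyed by speaker label. That map is applied only when the transcript is displayed. Because the names live outside the diarization output, a later label correction automatically moves a turn under the right name. `render_transcript` and its data layout are hypothetical.

```python
from typing import Dict, List, Tuple


def render_transcript(turns: List[Tuple[int, str]], custom_names: Dict[int, str]) -> List[str]:
    """Format (speaker_label, text) turns, preferring user-chosen names.

    Labels the user hasn't renamed fall back to the anonymous form
    (label 0 -> "Speaker 1", label 1 -> "Speaker 2", ...).
    """
    lines = []
    for label, text in turns:
        name = custom_names.get(label, f"Speaker {label + 1}")
        lines.append(f"{name}: {text}")
    return lines


if __name__ == "__main__":
    turns = [(0, "Hi, thanks for coming in."), (1, "I'd like to see the hatchback.")]
    names = {0: "car dealer"}  # the user renamed "Speaker 1"
    for line in render_transcript(turns, names):
        print(line)
    # → car dealer: Hi, thanks for coming in.
    # → Speaker 2: I'd like to see the hatchback.
```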
Future Work
Currently, the diarization system runs mainly on the CPU block of Google Tensor, the custom-built chip that powers recent Pixel phones. Google aims to reduce the system's overall power consumption further by moving more of the computation to the TPU block. Another direction for future work is to use the multilingual capabilities of speaker encoders and speech recognition models to bring the feature to more languages.
Reference: Google's blog post on Speaker Labels in Pixel Recorder