Summary:
UCLA researchers in the Department of Electrical and Computer Engineering have developed a technology for high-accuracy, real-time lip-reading that can be integrated into wearable smart devices.
Background:
Silent speech is a promising interaction modality for exchanging information between human users and ambient intelligence. A gap currently exists in the wearable smart device space that accurate, real-time silent speech recognition could fill. Human speech involves intertwined movements of the lips, tongue, and other facial muscles. Conventional visual speech recognition approaches use visible-light (RGB) cameras to “read” these facial movements; however, these techniques are susceptible to environmental noise from lighting conditions, skin tone, and background differences. Integrating silent speech recognition with wearable technology presents additional challenges: device size constraints, unpredictable and variable sensor position, and near-constant, extended movement make facial analysis difficult. Depth sensing has proven a precise and compact solution for gesture sensing, such as that required for sign language transcription or extended reality projection. An accurate speech detection technology is still needed, and it could benefit greatly from advances in depth sensing.
Innovation:
UCLA researchers have developed a novel silent speech recognition modality for smartwatches that uses depth sensing as a robust facial analysis tool. This technology enables users both to give commands without speaking and to transcribe speech more accurately using commercially available wearable technology. Depth sensing adds a measured dimension to speech recognition and shows improved accuracy compared with standard RGB video modalities, owing to its robustness to environmental noise. The viseme recognition algorithm developed by the inventors maps 3D visemes to corresponding phonemes, reducing the computing power required relative to traditional translation algorithms. The technology is optimized to run on a smartwatch, increasing its accessibility and utility in users' daily lives.
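As a rough illustration of what a viseme-to-phoneme mapping involves, the sketch below uses a plain lookup table. It is not the inventors' algorithm: the viseme class names, the phoneme groupings, and both function names are hypothetical and chosen only for illustration. It shows the general point that several phonemes can share one visible mouth shape, so a recognized viseme sequence narrows speech down to a set of candidate phoneme strings.

```python
# Hypothetical viseme classes and phoneme groupings, for illustration only;
# they are not taken from the inventors' system.
VISEME_TO_PHONEMES = {
    "bilabial": ["p", "b", "m"],     # lips pressed together
    "labiodental": ["f", "v"],       # lower lip against upper teeth
    "rounded": ["w", "uw", "ow"],    # rounded lips
}


def candidate_phonemes(viseme_sequence):
    """Return, for each recognized viseme, the phonemes it could represent.

    Unknown viseme labels map to an empty candidate list.
    """
    return [list(VISEME_TO_PHONEMES.get(v, [])) for v in viseme_sequence]


def count_candidate_strings(viseme_sequence):
    """Count the phoneme strings consistent with a viseme sequence."""
    total = 1
    for options in candidate_phonemes(viseme_sequence):
        total *= len(options)
    return total
```

For example, under these assumed groupings the sequence "bilabial", "labiodental" is consistent with six phoneme strings, which a downstream language model or vocabulary would then have to disambiguate.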
Demonstration Video:
Watch Your Mouth: Silent Speech Recognition with Depth Sensing (Paper Submission Video)
Potential Applications:
• Speech therapy
• Hearing impairments
• Multi-modal human-computer interaction
• Virtual and augmented reality
• Biometric verification
• Linguistic research
• Virtual assistants
• Security measures
Advantages:
• Smartwatch-enabled
• Immune to environmental noise
• Improved speech recognition
• Alleviates privacy concerns
State of Development:
The inventors have developed software for visual speech detection and conducted in-lab and in-the-wild user studies demonstrating efficacy.
Related Papers:
• Xiaoying Yang, Xue Wang, Gaofeng Dong, Zihan Yan, Mani Srivastava, Eiji Hayashi, and Yang Zhang. 2023. Headar: Sensing Head Gestures for Confirmation Dialogs on Smartwatches with Wearable Millimeter-Wave Radar. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 7, 3, Article 138 (September 2023), 27 pages. https://doi.org/10.1145/3610900
• Yang Zhang and Chris Harrison. 2015. Tomo: Wearable, Low-Cost Electrical Impedance Tomography for Hand Gesture Recognition. In Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology (UIST '15), 167-173.
Reference:
UCLA Case No. 2023-266
Lead Inventor:
Yang Zhang, UCLA Professor of Electrical and Computer Engineering