Vocal Feature Extraction in Environments with Extraneous Sounds
Mentor: Kenneth Faller, Assistant Professor of Computer Engineering, California State University Fullerton
Researchers at the National Center for Border Security and Immigration at the University of Arizona have created a system to interview travelers at ports-of-entry. The system is intended to help border agents identify travelers attempting to enter the U.S. illegally or to import contraband. One way it does this is by extracting vocal features that can be used to detect deception. However, ports-of-entry are “noisy” environments that contain significant extraneous sounds (e.g., chatter, laughing, sliding chairs). These sounds can corrupt the vocal features required for deception detection and lead to erroneous classification of deceptive speech. Reducing their influence is a non-trivial task and usually requires expensive microphone arrays to sufficiently suppress the background sources. The work discussed here describes a method of extracting vocal features from vocal tracks acquired in noisy environments using a low-cost two-microphone array and an open-source speech analysis package called Praat. Stationary noise can be removed using existing noise removal techniques, but these techniques are generally ineffective for non-stationary sources. As a result, a segmentation technique was developed that attempts to isolate the vocal responses in the vocal track. To do this, vocal feature information from the track is used to distinguish between background sounds and the vocal response(s) of interest. Additionally, signal processing techniques were applied to further refine the resulting segmentation. Initial analysis indicates that the method is capable of isolating non-overlapping vocal responses (i.e., cases in which the vocal response and background sound do not occur simultaneously). Future work will investigate methods of vocal feature extraction that work with overlapping vocal responses and background sounds.
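The abstract does not specify which vocal features or refinement steps the segmentation uses, so the following is only a minimal sketch of one plausible approach, not the authors' method. It assumes a per-frame pitch (F0) contour has already been extracted (for example with Praat) and that unvoiced frames are encoded as 0. The function name `segment_voiced`, the pitch range, and the frame-count thresholds are all hypothetical choices made for illustration.

```python
def segment_voiced(pitch_hz, f0_min=75.0, f0_max=500.0,
                   max_gap_frames=5, min_frames=10):
    """Return half-open (start_frame, end_frame) spans of likely speech.

    Illustrative only: the thresholds are assumptions, not values
    taken from the original work.
    """
    # Step 1: a frame counts as voiced if its pitch lies in a plausible
    # speech range (unvoiced frames are assumed to be encoded as 0).
    voiced = [f0_min <= f <= f0_max for f in pitch_hz]

    # Step 2: collect consecutive runs of voiced frames.
    runs = []
    start = None
    for i, is_voiced in enumerate(voiced):
        if is_voiced and start is None:
            start = i
        elif not is_voiced and start is not None:
            runs.append([start, i])
            start = None
    if start is not None:
        runs.append([start, len(voiced)])

    # Step 3 (refinement): merge runs separated by short unvoiced gaps,
    # such as brief pauses or unvoiced consonants within one response.
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] <= max_gap_frames:
            merged[-1][1] = run[1]
        else:
            merged.append(run)

    # Step 4 (refinement): discard runs too short to be a vocal response,
    # e.g. a brief tonal background sound.
    return [(s, e) for s, e in merged if e - s >= min_frames]
```

With a 10 ms frame step, the default thresholds correspond to merging gaps of up to 50 ms and discarding segments shorter than 100 ms. A pitch-only rule like this cannot separate a response from background sound that overlaps it in time, which is consistent with the limitation noted in the abstract.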