Publications | Speech Communication Laboratory

Copyright and all rights therein for the documents available in this webpage are maintained by the authors or by other copyright holders. The documents made available here are purely meant for ensuring timely dissemination of scholarly and technical work on a non-commercial basis. It is understood that all persons accessing, storing or copying the information in any of these documents will adhere to the terms and constraints invoked by each copyright holder. These works may not be reposted without the explicit permission of the copyright holder.

Journal Papers

Siriwardena, Y. M., Boyce, S. E., Tiede, M. K., Oren, L., Fletcher, B., Stern, M., & Espy-Wilson, C. Y. (2024). Speaker-independent speech inversion for recovery of velopharyngeal port constriction degree. The Journal of the Acoustical Society of America, 156(2), 1380–1390. (paper)
M. Marge, C. Espy-Wilson, N. Ward, A. Alwan, Y. Artzi, M. Bansal, G. Blankenship, J. Chai, H. Daumé III, D. Dey, M. Harper, T. Howard, C. Kennington, I. Kruijff-Korbayová, D. Manocha, C. Matuszek, R. Mead, R. Mooney, R. K. Moore, M. Ostendorf, H. Pon-Barry, A. Rudnicky, M. Scheutz, R. St. Amant, T. Sun, S. Tellex, D. Traum, Z. Yu (2022), “Spoken Language Interaction for Robotics: Research Issues and Recommendations”, Computer Speech & Language. (paper)
S. Sahu, R. Gupta and C. Espy-Wilson, “Modeling Feature Representations for Affective Speech using Generative Adversarial Networks”, IEEE Transactions on Affective Computing. (paper)
Sivaraman G., Mitra V., Nam H., Tiede M., Espy-Wilson C. (2019), “Unsupervised speaker adaptation for speaker independent acoustic to articulatory speech inversion”, The Journal of the Acoustical Society of America. (paper)
Vikramjit Mitra, Ganesh Sivaraman, Hosung Nam, Carol Espy-Wilson, Elliot Saltzman, Mark Tiede (2017), “Hybrid Convolutional Neural Networks For Articulatory And Acoustic Information Based Speech Recognition”, Speech Communication. (paper)
S. Gordon-Salant, D. Zion and C. Espy-Wilson (2014), “Recognition of time-compressed speech does not predict recognition of natural fast-rate speech by older listeners”, Journal of the Acoustical Society of America, Express Letters, vol. 136, pp. 268-274. (paper)
X. Zhou, J. Zhou, M. Stone, J. Prince and C. Espy-Wilson (2013), “Improve vocal tract reconstruction and modeling using an image super-resolution technique”, Journal of the Acoustical Society of America Express Letters, vol. 133, no. 6, pp. 439-445. (paper)
H. Nam, V. Mitra, M. Tiede, M. Hasegawa-Johnson, C. Espy-Wilson, E. Saltzman, L. Goldstein (2012), “A procedure for estimating gestural scores from speech acoustics”, Journal of the Acoustical Society of America, vol. 132, no. 6, pp. 3980-3989. (paper)
V. Mitra, H. Nam, C. Espy-Wilson, E. Saltzman, L. Goldstein (2012), “Recognizing articulatory gestures from speech for robust speech recognition”, Journal of the Acoustical Society of America, vol. 131, no. 3, pp. 2270-2287. (paper)
V. Mitra, H. Nam, C. Espy-Wilson, E. Saltzman, L. Goldstein (2011), “Articulatory Information for Noise Robust Speech Recognition”, IEEE Transactions on Audio, Speech and Language Processing, vol. 19, no. 7, pp. 1913-1924. (paper)
V. Mitra, H. Nam, C. Espy-Wilson, E. Saltzman, L. Goldstein, “Retrieving Tract Variables from Acoustics: a comparison of different Machine Learning strategies”, IEEE Journal of Selected Topics on Signal Processing, Sp. Iss. on Statistical Learning Methods for Speech and Language Processing, Vol. 4, Iss. 6, pp. 1027-1045, 2010. (paper)
Xinhui Zhou, Carol Y Espy-Wilson, Mark Tiede, Suzanne Boyce, Christy Holland and Ann Choe, “A magnetic resonance imaging-based articulatory and acoustic study of ‘retroflex’ and ‘bunched’ American English /r/”, J. Acoust. Soc. Am., June 2008, pp. 4466-4481. (paper)
Amit Juneja and Carol Y Espy-Wilson, “A probabilistic framework for landmark detection based on phonetic features for automatic speech recognition”, J. Acoust. Soc. Am., June 2008, pp. 1154-1168. (paper)
Om D Deshmukh, Carol Y Espy-Wilson and Laurel H Carney, “Speech Enhancement Using The Modified Phase Opponency Model”, J. Acoust. Soc. Am., June 2007, pp. 3886-3898. (paper)
Tarun Pruthi, Carol Y Espy-Wilson and Brad H Story, “Simulation and analysis of nasalized vowels based on MRI data”, J. Acoust. Soc. Am., June 2007, pp. 3858-3873. (paper)
“The Development and Testing of a Phase-Opponent Noise-Reduction Algorithm”, submitted to IEEE Transactions on Speech and Audio Processing.
Om Deshmukh, Carol Y Espy-Wilson, Ariel Salomon and Jawahar Singh, “Use of Temporal Information: Detection of the Periodicity and Aperiodicity Profile of Speech”, IEEE Transactions on Speech and Audio Processing, Vol. 13 (5), pp. 776-786, Sept. 2005. (paper)
Zhaoyan Zhang and Carol Y Espy-Wilson, “A vocal tract model of American English /l/”, J. Acoust. Soc. Am., March 2004, pp. 1274-1280. (paper)
Ariel Salomon, Carol Y Espy-Wilson and Om Deshmukh, “Detection of Speech Landmarks: Use of Temporal Information”, J. Acoust. Soc. Am., March 2004, pp. 1296-1305. (paper)
Tarun Pruthi and Carol Y Espy-Wilson, “Acoustic parameters for automatic detection of nasal manner”, Speech Communication, 43(3), pp. 225-239, 2004. (paper)
Michel Tah-Tung Jackson, Carol Espy-Wilson and Suzanne E Boyce, “Verifying a vocal tract model with a closed side-branch”, J. Acoust. Soc. Am., June 2001, pp. 2983-2987. (paper)
Carol Y Espy-Wilson, Suzanne E Boyce, Michel Jackson, Shrikanth Narayanan and Abeer Alwan, “Acoustic Modeling of American English /r/”, J. Acoust. Soc. Am., July 2000, pp. 343-356. (paper)
Frank H Guenther, Carol Y Espy-Wilson, Suzanne E Boyce, Melanie L Matthies, Majid Zandipour and Joseph S Perkell, “Articulatory tradeoffs reduce acoustic variability during American English /r/ production”, J. Acoust. Soc. Am., May 1999. (paper)
Carol Espy-Wilson, Venkatesh Chari, Joel M MacAuslan, Caroline B Huang and Michael J Walsh, “Improvement of Electrolaryngeal Speech by Adaptive Filtering”, Journal of Speech, Language and Hearing Research, Vol. 41, pp. 1253-1264. (paper)
Suzanne Boyce and Carol Y Espy-Wilson, “Coarticulatory Stability in American English /r/”, J. Acoust. Soc. Am., June 1997, pp. 3741-3753. (paper)
Venkatesh R Chari and Carol Y Espy-Wilson, “Extraction of Formant Frequencies by Adaptive Enhancement of Fourier Spectra”, IEEE Transactions on Speech and Audio Processing, vol. 3, pp. 35-39, Jan. 1995. (paper)
Carol Y Espy-Wilson, “A Feature-Based Approach to Speech Recognition”, J. Acoust. Soc. Am., 96, pp. 65-72, 1994. (paper)
Carol Espy-Wilson, “Acoustic measures for linguistic features distinguishing the semivowels /wjrl/ in American English”, J. Acoust. Soc. Am., 92, pp. 736-751, 1992. (paper)
Carol Y Espy and Jae S Lim, “Effects of Additive Noise on Signal Reconstruction from Fourier Transform Phase”, IEEE Transactions on Acoustics, Speech, and Signal Processing, 31, 1983. (paper)

Conference Papers

Attia, A. A., Demszky, D., Ogunremi, T., Liu, J., & Espy-Wilson, C. (2025). CPT-Boosted Wav2vec2.0: Towards Noise Robust Speech Recognition for Classroom Environments. arXiv [Cs.CL]. Retrieved from http://arxiv.org/abs/2409.14494. Accepted for ICASSP 2025 (paper)
Premananth, G., & Espy-Wilson, C. (2025). Self-supervised Multimodal Speech Representations for the Assessment of Schizophrenia Symptoms. arXiv [Eess.AS]. Retrieved from http://arxiv.org/abs/2409.09733. Accepted for ICASSP 2025 (paper)
Premananth, G., & Espy-Wilson, C. (2025). Speech-Based Estimation of Schizophrenia Severity Using Feature Fusion. arXiv [Eess.AS]. Retrieved from http://arxiv.org/abs/2411.06033. Accepted for ICASSP-SPADE workshop 2025 (paper)
Ojha, S., Gervits, F., & Espy-Wilson, C. (2025). Speaking with Robots in Noisy Environments. Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction, 1057–1061. Melbourne, Australia: IEEE Press. (paper)
Attia, A. A., Liu, J., Ai, W., Demszky, D., & Espy-Wilson, C. (2024). Kid-whisper: Towards bridging the performance gap in automatic speech recognition for children vs. adults. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 7, 74–80. (paper)
Benway, N.R., Preston, J.L., Espy-Wilson, C. (2024) Examining Vocal Tract Coordination in Childhood Apraxia of Speech with Acoustic-to-Articulatory Speech Inversion Feature Sets. Proc. Interspeech 2024, 5138-5142. (paper)
Premananth, G., Siriwardena, Y.M., Resnik, P., Bansal, S., Kelly, D.L., Espy-Wilson, C. (2024) A Multimodal Framework for the Assessment of the Schizophrenia Spectrum. Proc. Interspeech 2024, 1470-1474. (paper)
Siriwardena, Y. M., Swedlow, N., Howard, A., Gitterman, E., Darcy, D., Espy-Wilson, C., & Fanelli, A. (2024). Accent Conversion with Articulatory Representations. Proc. Interspeech 2024, 4383-4387 (paper)
Attia, A. A., Siriwardena, Y. M., & Espy-Wilson, C. (2024). Improving speech inversion through self-supervised embeddings and enhanced tract variables. 2024 32nd European Signal Processing Conference (EUSIPCO), 306–310. (paper)
Premananth, G., Siriwardena, Y. M., Resnik, P., & Espy-Wilson, C. (2024). A multi-modal approach for identifying schizophrenia using cross-modal attention. 2024 46th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). (paper)
Siriwardena, Y.M., Espy-Wilson, C., Shamma, S. (2023) Learning to Compute the Articulatory Representations of Speech with the MIRRORNET. Proc. INTERSPEECH 2023
Benway, N.R., Siriwardena, Y.M., Preston, J.L., Hitchcock, E., McAllister, T., Espy-Wilson, C. (2023) Acoustic-to-Articulatory Speech Inversion Features for Mispronunciation Detection of /ɹ/ in Child Speech Sound Disorders. Proc. INTERSPEECH 2023
Siriwardena, Y.M., Espy-Wilson, C., Boyce, S., Tiede, M., Oren, L. (2023) Speaker-independent Speech Inversion for Estimation of Nasalance. Proc. INTERSPEECH 2023
Attia, A.A., Tiede, M., Espy-Wilson, C. (2023) Enhancing Speech Articulation Analysis Using A Geometric Transformation of the X-ray Microbeam Dataset. Proc. INTERSPEECH 2023
Siriwardena, Y. M., Adel Attia, A., Sivaraman, G., and Espy-Wilson, C. “Audio Data Augmentation for Acoustic-to-Articulatory Speech Inversion,” 2023 31st European Signal Processing Conference (EUSIPCO) (paper)
Adel Attia, A., and Espy-Wilson, C., “Masked Autoencoders are Articulatory Learners,” ICASSP 2023 (paper)
Siriwardena, Y. M., and Espy-Wilson, C., “The Secret Source: Incorporating Source Features to Improve Acoustic-To-Articulatory Speech Inversion,” ICASSP 2023 (paper)
Siriwardena, Y.M., Sivaraman, G., Espy-Wilson, C. “Acoustic-to-articulatory Speech Inversion with Multi-task Learning”. Proc. Interspeech 2022 (paper)
Parikh, R., Seneviratne, N., Sivaraman, G., Shamma, S., Espy-Wilson, C. “Acoustic To Articulatory Speech Inversion Using Multi-Resolution Spectro-Temporal Representations Of Speech Signals”. Proc. Interspeech 2022 (paper)
Parikh, Rahil, Gaspar Rochette, Carol Espy-Wilson, and Shihab Shamma. “An Empirical Analysis on the Vulnerabilities of End-to-End Speech Segregation Models.” Proc. Interspeech 2022 (paper)
Parikh, Rahil, Ilya Kavalerov, Carol Espy-Wilson, and Shihab Shamma. “Harmonicity Plays a Critical Role in DNN Based Versus in Biologically-Inspired Monaural Speech Segregation Systems.” In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 536-540. IEEE, 2022 (paper)
N. Seneviratne and C. Espy-Wilson, “Multimodal Depression Classification using Articulatory Coordination Features and Hierarchical Attention Based text Embeddings,” ICASSP 2022 – 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 6252-6256 (paper).
Siriwardena, Y.M., Kitchen, C., Kelly, D.L. & Espy-Wilson, C. “Multimodal Approach for Assessing Neuromotor Coordination in Schizophrenia using Convolutional Neural Networks”, In Proceedings of the 2021 International Conference on Multimodal Interaction (ICMI ’21) (paper)
Nadee Seneviratne and Carol Espy-Wilson, “Speech based Depression Severity Level Classification Using a Multi-Stage Dilated CNN-LSTM Model”. Proc. Interspeech 2021 (paper)
Nadee Seneviratne and Carol Espy-Wilson, “Generalized Dilated CNN Models for Depression Detection Using Inverted Vocal Tract Variables”. Proc. Interspeech 2021 (paper)
Nadee Seneviratne, Carol Espy-Wilson, James Williamson, Adam C. Lammert, Thomas F. Quatieri. “Classification of Depression by Quantifying Neuromotor Coordination Using Inverted Vocal Tract Variables”. 12th International Seminar on Speech Production (ISSP 2020).
Siriwardena, Y.M., Kitchen, C., Kelly, D.L. & Espy-Wilson, C. (2021). Inverted vocal tract variables and facial action units to quantify neuromotor coordination in schizophrenia. In Proceedings of the 12th International Seminar on Speech Production (ISSP 2020), 174-177. (paper)
Seneviratne, N., Williamson, J.R., Lammert, A.C., Quatieri, T.F., Espy-Wilson, C. (2020) Extended Study on the Use of Vocal Tract Variables to Quantify Neuromotor Coordination in Depression. Proc. Interspeech 2020. (paper)
Espy-Wilson, C., Lammert, A.C., Seneviratne, N., Quatieri, T.F. “Assessing Neuromotor Coordination in Depression Using Inverted Vocal Tract Variables”. Proc. Interspeech 2019 (paper)
Sahu, S., Mitra, V., Seneviratne, N., Espy-Wilson, C. “Multi-Modal Learning for Speech Emotion Recognition: An Analysis and Comparison of ASR Outputs with Ground Truth Transcription”. Proc. Interspeech 2019 (paper)
Seneviratne, N., Sivaraman, G., Espy-Wilson, C. “Multi-Corpus Acoustic-to-Articulatory Speech Inversion”. Proc. Interspeech 2019 (paper)
S. Sahu and C. Espy-Wilson, “On Enhancing Speech Emotion Recognition Using Generative Adversarial Networks”, In Proceedings of Interspeech, 2018. (paper)
N. Seneviratne, G. Sivaraman, V. Mitra and C. Espy-Wilson, “Noise Robust Articulatory to Acoustic Speech Inversion”, In Proceedings of Interspeech, 2018. (paper)
S. Sahu, R. Gupta, G. Sivaraman, and C. Espy-Wilson, “Smoothing Model predictions Using Adversarial Training Procedures for Speech based Emotion Recognition”, In Proceedings of ICASSP, 2018. (paper)
R. Gupta, S. Sahu, C. Espy-Wilson and S. Narayanan, “Semi-supervised and Transfer learning approaches for low resource sentiment classification”, In Proceedings of ICASSP, 2018. (paper)
R. Gupta, S. Sahu, C. Espy-Wilson, and S. Narayanan, “An Affect Prediction Approach through Depression Severity Parameter Incorporation in Neural Networks”, In Proceedings of Interspeech, 2017. (paper)
S. Sahu, R. Gupta, G. Sivaraman, W. AbdAlmageed, and C. Espy-Wilson, “Adversarial auto-encoders for speech based emotion recognition”, In Proceedings of Interspeech, 2017, pp. 1243–1247. (paper)
G. Sivaraman, C. Espy-Wilson, M. Wieling, “Analysis of acoustic-to-articulatory speech inversion across different accents and languages”, In Proceedings of Interspeech 2017, pp. 974-978. (paper)
G. Sivaraman, V. Mitra, H. Nam, M.K. Tiede, C. Espy-Wilson, “Vocal tract length normalization for speaker independent acoustic-to-articulatory speech inversion”, In Proceedings of Interspeech 2016. (paper)
S. Sahu, C. Espy-Wilson, “Speech features for depression detection”, In Proceedings of Interspeech, Sept. 2016, pp. 1928-1932. (paper)
V. Mitra, W. Wang, Y. Lei, A. Kathol, G. Sivaraman and C. Espy-Wilson, “Robust features and system fusion for reverberation and their role in speech recognition”, ICASSP, 2015. (paper)
G. Sivaraman, V. Mitra, M. Tiede, E. Saltzman, L. Goldstein and C. Espy-Wilson, “Analysis of coarticulated speech using estimated articulatory trajectories”, Interspeech, 2015. (paper)
V. Mitra, G. Sivaraman, H. Nam, C. Espy-Wilson, E. Saltzman, “Articulatory features from deep neural networks and their role in speech recognition”, Proceedings of ICASSP 2014. (paper)
G. Sivaraman, V. Mitra and C. Espy-Wilson, “Fusion Of Acoustic, Perceptual And Production Features For Robust Speech Recognition In Highly Non-Stationary Noise”, presented at the CHIME Challenge, ICASSP 2013. (paper)
X. Zhou, J. Woo, M. Stone and C. Espy-Wilson, “A Cine MRI-Based Study Of Sibilant Fricatives Production In Post-Glossectomy Speakers”, Proceedings of ICASSP 2013, pp. 7780-7784. (paper)
D. Garcia-Romero, X. Zhou and C. Espy-Wilson, “Multicondition Training Of Gaussian PLDA Models In I-Vector Space For Noise And Reverberation Robust Speaker Recognition”, Proceedings of ICASSP 2012. (paper)
X. Zhou, D. Garcia-Romero, N. Mesgarani, M. Stone, C. Espy-Wilson and S. Shamma, “Automatic intelligibility assessment of pathologic speech in head and neck cancer based on auditory-inspired spectro-temporal modulations”, Proc. of Interspeech, September 2012. (paper)
V. Mahadevan and C. Espy-Wilson, “Maximum Likelihood Pitch Estimation Using Sinusoidal Modeling”, Proceedings of the International Conference on Communications and Signal Processing, 2011. (paper)
X. Zhou, D. Garcia-Romero, C.Y. Espy-Wilson, “Linear versus Mel-Frequency coefficients for speaker recognition”, ASRU 2011 (IEEE Automatic Speech Recognition and Understanding Workshop), accepted.
X. Zhou, M. Stone, C.Y. Espy-Wilson, “A comparative acoustic study on speech of patients and normal subjects”, in Proceedings of Interspeech, Florence, Italy, August 2011. (paper)
D. Garcia-Romero and C. Y. Espy-Wilson, “Analysis of I-vector Length Normalization in Speaker Recognition Systems”, in Proceedings of Interspeech, Florence, Italy, August 2011, pp. 249-252. (paper)
J. Zhou, D. Garcia-Romero and C. Y. Espy-Wilson, “Automatic Speech Codec Identification with Applications to Tampering”, in Proceedings of Interspeech, Florence, Italy, August 2011, pp. 2533-2536. (paper)
V. Mitra, H. Nam, C. Espy-Wilson, E. Saltzman, L. Goldstein, “Gesture-based Dynamic Bayesian Network for Noise robust Speech Recognition”, to appear in the Proc. of ICASSP, 2011. (paper)
V. Mitra, H. Nam, C. Espy-Wilson, E. Saltzman, L. Goldstein, “Speech Inversion: Benefits of Tract Variables over Pellet Trajectories”, Proceedings of ICASSP, 2011. (paper)
X. Zhou, C.Y. Espy-Wilson, M. Tiede, S. Boyce, “An MRI-based articulatory and acoustic study of lateral sound in American English”, ICASSP 2010, Dallas, USA. (paper)
D. Garcia-Romero and C. Y. Espy-Wilson, “Joint Factor Analysis for Speaker Recognition reinterpreted as Signal Coding using Overcomplete Dictionaries”, in Proc. of Odyssey 2010: The Speaker and Language Workshop, Brno, Czech Republic, July 2010, pp. 43-51. (paper)
V. Mitra, H. Nam, C. Espy-Wilson, E. Saltzman, L. Goldstein, “Robust Word Recognition using articulatory trajectories and Gestures”, Proc. of Interspeech, pp. 2038-2041, Japan, 2010. (paper)
H. Nam, V. Mitra, M. Tiede, E. Saltzman, L. Goldstein, C. Espy-Wilson, M. Hasegawa-Johnson, “A procedure for estimating gestural scores from natural speech”, Proc. of Interspeech, pp. 30-33, Japan, 2010. (paper)
V. Mitra, H. Nam, C. Espy-Wilson, E. Saltzman, L. Goldstein, “Noise Robustness of Tract Variables and their Application to Speech Recognition”, Proc. of Interspeech, pp. 2759-2762, Brighton, UK, 2009.
V. Mitra, B.J. Borgstrom, C. Espy-Wilson, A. Alwan, “A Noise-type and Level-dependent MPO-based Speech Enhancement Architecture with Variable Frame Analysis for Noise-robust Speech Recognition”, Proc. of Interspeech, pp. 2751-2754, Brighton, UK, 2009. (paper)
V. Mitra, I. Özbek, H. Nam, X. Zhou, C. Espy-Wilson, “From Acoustics to Vocal Tract Time Functions”, Proc. of International Conference on Acoustics, Speech and Signal Processing, ICASSP, pp. 4497-4500, Taiwan, 2009. (paper)
Daniel Garcia-Romero and Carol Y Espy-Wilson, “Intersession Variability in Speaker Recognition: A Behind the Scene Analysis”, in Proceedings of Interspeech 2008, Melbourne, Australia, pp. 1413-1416. (paper)
Vikramjit Mitra, Daniel Garcia-Romero and Carol Y Espy-Wilson, “Language and Genre detection in Audio Content Analysis”, in Proceedings of Interspeech 2008, Melbourne, Australia, pp. XX-XX. (paper)
Srikanth Vishnubhotla and Carol Espy-Wilson, “An Algorithm for Multi-Pitch Tracking in Co-Channel Speech”, in Proceedings of Interspeech 2008, Melbourne, Australia, pp. XX-XX. (paper)
Vikramjit Mitra, Daniel Garcia-Romero and Carol Y Espy-Wilson, “Language Detection in Audio Content Analysis”, in Proceedings of IEEE ICASSP, 2008, Las Vegas, USA, pp. 2109-2112. (paper)
Xinhui Zhou, Carol Y Espy-Wilson, Mark Tiede and Suzanne Boyce, “An Articulatory and Acoustic Study of ‘Retroflex’ and ‘Bunched’ American English Rhotic Sound Using Magnetic Resonance Imaging”, in Proceedings of Interspeech 2007, Antwerp, Belgium, pp. 54-57. (paper)
Tarun Pruthi and Carol Y Espy-Wilson, “Acoustic Parameters for the Automatic Detection of Vowel Nasalization”, in Proceedings of Interspeech 2007, Antwerp, Belgium, pp. 1925-1928. (paper)
Carol Y Espy-Wilson, Tarun Pruthi, Amit Juneja and Om Deshmukh, “Landmark-based Approach to Speech Recognition: An Alternative to HMMs”, in Proceedings of Interspeech 2007, Antwerp, Belgium, pp. 886-889. (paper)
Srikanth Vishnubhotla and Carol Y Espy-Wilson, “Detection of Irregular Phonation in Speech”, The 16th International Congress of Phonetic Sciences (ICPhS) 2007, Saarbrucken, Germany, Aug 2007, pp. 2053-2056. (paper)
Om D Deshmukh and Carol Y Espy-Wilson, “Speech Enhancement Using Modified Phase Opponency Model”, in Proceedings of Interspeech 2006, Pittsburgh, USA. (paper)
Om Deshmukh and Carol Y Espy-Wilson, “Modified Phase Opponency Based Solution to the Speech Separation Challenge”, in Proceedings of Interspeech 2006, Pittsburgh, USA. (paper)
Tarun Pruthi and Carol Y Espy-Wilson, “An MRI based Study of the Acoustic Effects of Sinus Cavities and its Application to Speaker Recognition”, in Proceedings of Interspeech 2006, Pittsburgh, USA, pp. 2110-2113. (paper)
Carol Espy-Wilson, Sandeep Manocha and Srikanth Vishnubhotla, “A New Set of Parameters for Text-Independent Speaker Identification”, in Proceedings of Interspeech 2006, Pittsburgh, USA. (paper)
Srikanth Vishnubhotla and Carol Espy-Wilson, “Automatic Detection of Irregular Phonation in Continuous Speech”, in Proceedings of Interspeech 2006, Pittsburgh, USA. (paper)
Om Deshmukh and Carol Espy-Wilson, “Speech Enhancement Using Auditory Phase Opponency Model”, in Proceedings of Eurospeech, Lisbon, Portugal, pp. 2117-2120, 2005. (paper)
Om Deshmukh, Jawahar Singh and Carol Espy-Wilson, “A Novel Method for Computation of Periodicity, Aperiodicity and Pitch of Speech Signals”, in Proceedings of IEEE ICASSP, 2004, Montreal, Canada, pp. I.117-120. (paper)
Om Deshmukh and Carol Espy-Wilson, “Detection of Periodicity and Aperiodicity in Speech Signal Based on Temporal Information”, in Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS), Barcelona, Spain, pp. 1365-1368, 2003. (paper)
Tarun Pruthi and Carol Y Espy-Wilson, “Automatic Classification of Nasals and Semivowels”, in Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS), Barcelona, Spain, pp. 3061-3064, 2003. (paper)
Zhaoyan Zhang, Suzanne Boyce, Carol Espy-Wilson, Mark Tiede, “Acoustic Strategies for Production of American English retroflex /r/”, in Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS), Barcelona, Spain, 2003.
Amit Juneja and Carol Espy-Wilson, “An Event-Based Acoustic-Phonetic Approach to Speech Segmentation and E-set Recognition”, in Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS), Barcelona, Spain, 2003. (paper)
Om Deshmukh and Carol Espy-Wilson, “A measure of Periodicity and Aperiodicity in Speech”, in Proceedings of IEEE ICASSP 2003, Hong Kong, pp. 448-451. (paper)
Zhaoyan Zhang, Carol Espy-Wilson and Mark Tiede, “Acoustic Modeling of American English Lateral Approximants”, in Proceedings of Eurospeech, 2003, Switzerland.
Om Deshmukh, Carol Y Espy-Wilson and Amit Juneja, “Acoustic-Phonetic Speech Parameters for Speaker-Independent Speech Recognition”, in Proceedings of IEEE ICASSP 2002, May 13-17, 2002, Orlando, Florida, pp. 593-596. (paper)
Amit Juneja, Om Deshmukh and Carol Espy-Wilson, “An Event-Based Acoustic-Phonetic Approach For Speech Segmentation And E-set Recognition”, Presented in the Student Forum, IEEE ICASSP 2002. (paper)
Beth Logan, Pedro Moreno and Om Deshmukh, “Word and Sub-word Indexing Approaches for Reducing the Effects of OOV Queries on Spoken Audio” in Human Language Technology Conference (HLT), March 2002. (paper)
Amit Juneja and Carol Espy-Wilson, “Segmentation of Continuous Speech Using Acoustic-Phonetic Parameters and Statistical Learning”, in Proceedings of ICONIP 2002, Singapore, November 18-22, 2002. (paper)
Suzanne Boyce and Carol Espy-Wilson, “Reading Tongue Configuration for /r/ from Acoustic Data”, American Speech-Language-Hearing Association, Nov. 2000, Washington DC.
Harriet Fell, Linda Ferrier, Carol Espy-Wilson, Susan Worst, Eric Craft, Karen Chenausky, Joel MacAuslan and Glenna Hennessey, “Automatic Analysis of Infant Babbling in EVA, the Early Vocalization Analyzer”, American Speech-Language-Hearing Association, Nov. 2000, Washington DC.
Kun Xia and Carol Espy-Wilson, “A New Formant Tracking Algorithm Based on Dynamic Programming”, in Proceedings of ICSLP, Oct. 2000, Beijing, China, pp. III55-58.
Ariel Salomon and Carol Espy-Wilson, “Automatic Detection of Speech Landmarks from Temporal Cues”, in Proceedings of ICSLP, Oct. 2000, Beijing, China, pp. III762-765.
Carol Espy-Wilson and Suzanne Boyce, “A Simple Tube Model for American English /r/”, in Proceedings of the International Congress of Phonetic Sciences (ICPhS), August 1999, San Francisco, CA, pp. 2137-2140. (paper)
C Espy-Wilson, P Demirel, K Ma and J MacAuslan, “Using a Natural Excitation Signal to Improve Artificial Larynx Speech”, in Proceedings of Eurospeech 99. (paper)
Ariel Salomon and Carol Espy-Wilson, “Automatic Detection of Manner Events for a Knowledge-based Speech Signal Representation”, in Proceedings of Eurospeech, Sept. 1999, Budapest, Hungary, pp. 2797-2800. (paper)
Carol Espy-Wilson, Shrikanth Narayanan, Suzanne Boyce and Abeer Alwan “Acoustic modeling of American English /r/”, in Proceedings of Eurospeech, Patras, Greece, September 1997, pp. 393-396. (paper)
Carol Espy-Wilson and Nabil Bitar, “The Design of Acoustic Parameters for Speaker-Independent Speech Recognition”, in Proceedings of Eurospeech, Patras, Greece, September 1997, pp. 1239-1242. (paper)
Nabil Bitar and Carol Espy-Wilson, “Knowledge-Based Parameters for HMM Speech Recognition”, in Proceedings of the IEEE ICASSP, pp. 29-32, Atlanta, GA, May 1996. (paper)
Suzanne Boyce and Carol Espy-Wilson, “Coarticulatory Stability of American English /r/”, in Proceedings of the International Conference on Spoken Language Processing (ICSLP), pp.1577-1580, Philadelphia, PA., October 1996. (paper)
Nabil Bitar and Carol Espy-Wilson, “A Signal Representation of Speech Based on Phonetic Features”, in Proceedings of the 1995 IEEE Dual-Use Technologies & Applications Conference, May 22-25, SUNY Inst. of Tech., Utica/Rome, 1995. (paper)
Nabil Bitar, Carol Espy-Wilson, Hamid Nawab, and Ramamurthy Mani, “Issues in Feature-Based Recognition of Speech Mixed with Impulsive Sounds”, in Proceedings of IJCAI-95 Workshop on Computational Auditory Scene Analysis, Montreal, Canada, August 1995. (paper)
Nabil Bitar, Carol Espy-Wilson, Ramamurthy Mani, and Hamid Nawab, “Knowledge-Based Analysis of Speech Mixed with Sporadic Environmental Sounds”, in Proceedings of IJCAI-95 Workshops on Computational Auditory Scene Analysis, Montreal, Canada, August 1995. (paper)
Carol Espy-Wilson and Nabil Bitar, “Speech Parameterization Based on Phonetic Features: Application to Speech Recognition”, in Proceedings of Eurospeech 95, Madrid, Spain, September 1995. (paper)
Carol Espy-Wilson and Nabil Bitar, “Knowledge-Based vs. Cepstral-Based Parameters for Broad-Class HMM Speech Recognition”, in Proceedings of the IEEE Workshop on Speech Recognition, pp. 203-204, Snowbird, Utah, December 1995. (paper)
Nabil Bitar, “A Feature-Based Broad Speech Classifier”, in Proceedings of the Conference of the Acoustical Society of America, Boston, June 1994.
Nabil Bitar, Ramamurthy Mani, Carol Espy-Wilson and Hamid Nawab, “Issues in Feature-Based Recognition of Speech Mixed with Impulsive Sounds”, in Proceedings of the Conference of the Acoustical Society of America, Boston, June 1994.
Nabil Bitar and Armen Balian, “Strident-Feature Extraction in English Fricatives”, in Proceedings of the Conference of the Acoustical Society of America, Ottawa, May 1993.
Armen Balian and Nabil Bitar, “Methods for Separating Adjacent Sounds with the Same Manner of Articulation”, in Proceedings of the Conference of the Acoustical Society of America, Ottawa, May 1993.
Suzanne Boyce and Carol Espy-Wilson, “Coarticulatory Stability in American English /r/s”, Speech Communication Group Working Papers, Vol. IX, pp. 80-93, Research Laboratory of Electronics, MIT, Cambridge, MA, December 1993.
Carol Espy-Wilson, “Consistency in /r/ trajectories in American English”, in Proceedings of the 12th International Congress of Phonetic Sciences (ICPhS), Aix En Provence, France, pp. 370-373, August 1991. (paper)
Carol Espy-Wilson, “A Semivowel Recognition System”, in Proceedings of the 11th International Congress of Phonetic Sciences, Tallinn, Estonia, U.S.S.R., pp. 403-406, August 1987. (paper)
Carol Espy-Wilson, “A Phonetically Based Semivowel Recognition System”, in Proceedings of the IEEE ICASSP, Tokyo, Japan, pp. 2775-2778, April 1986. (paper)
Carol Espy and Jae Lim, “Effects of Noise on Signal Reconstruction from Fourier Transform Phase”, in Proceedings of the IEEE ICASSP, Paris, France, pp. 1833-1836, May 1982. (paper)

Book Chapters

C. Espy-Wilson, G. Sivaraman, M. Tiede, V. Mitra, E. Saltzman, L. Goldstein, H. Nam (2018), “Modeling of Articulatory Gestures to Control Effects of Production Variability on Speech Technologies”. In Cangemi, Clayards, Niebuhr, Schupler & Zellers (eds.), Rethinking Reduction, Berlin: Mouton de Gruyter.
M. Tiede, S. Boyce and C. Espy-Wilson (2010), “Variability of North-American /r/ production in response to palatal perturbation”. In Maassen, Ben and van Lieshout, Pascal (eds.), Speech Motor Control: New developments in basic and applied research, Oxford University Press.
Carol Espy-Wilson (2007), “Phonological Models of Variation in Computer Speech Processing: Commentary on the papers by Nam, Son et al., and Hirschberg”. In Cole, Jennifer and Hualde, José I. (eds.), Laboratory Phonology 9. Berlin: Mouton de Gruyter. pp. 535-546.
S. Nawab, C. Espy-Wilson, R. Mani, and N. Bitar (1998), “Knowledge-Based Analysis of Speech Mixed with Sporadic Environmental Sounds”. In Rosenthal and Okuno (eds.), Computational Auditory Scene Analysis, Lawrence Erlbaum Associates Inc. Publishers.

Workshops

V. Mitra, H. Nam and C. Espy-Wilson, “Robust speech recognition using articulatory gestures in a Dynamic Bayesian Network framework“, Automatic Speech Recognition and Understanding Workshop, Dec. 2011, Hawaii.
X. Zhou, D. Garcia-Romero, R. Duraiswami, C. Espy-Wilson and S. Shamma, ““Linear versus Mel- Frequency Cepstral Coefficients for Speaker Recognition” Automatic Speech Recognition and Understanding Workshop, Dec. 2011, Hawaii.
H. Nam, V. Mitra, M. Tiede, C. Espy-Wilson, M. Hasegawa-Johnson, E. Saltzman and L. Goldstein, “ Automatic gestural annotation of the U. Wisconsin X-ray Microbeam corpus“, U Penn workshop on New Tools and Methods for Very-Large-Scale Phonetics Research, to appear, January 2011.
H. Nam, V. Mitra, K. Iskarous, “ Artificial Neural Network prediction of mid-sagittal pharynx shape from dynamic ultrasound image“, Ultrafest V, Haskins Laboratories, New Haven CT, 19-21 March 2010.
V. Mitra, H. Nam, C. Espy-Wilson, E. Saltzman, L. Goldstein, “ Machine Learning strategies for recovering Speech Articulatory trajectories and Gestures from Speech“, 159th meeting of the ASA., J. Acoust. Soc. Am. 127, 2041, 2010.
H. Nam, V. Mitra, M. Tiede, E. Saltzman, L. Goldstein, C. Espy-Wilson, M. Hasegawa-Johnson, “A procedure for estimating gestural scores from articulatory data, 159th meeting of the ASA., J. Acoust. Soc. Am. 127, 1851, 2010.

V. Mitra, H. Nam, C. Espy-Wilson, “A Step in the Realization of a Speech Recognition System based on Gestural Phonology and Landmarks”, 157th Meeting of the Acoust. Soc. of Am., Portland, J. Acoust. Soc. Am. 125, 2530, 2009.

Presentations

D. Zion, C. Espy-Wilson, S. Gordon-Salant, “Recognition of natural-rate, time-compressed, and natural fast-rate sentences by younger and older listeners”, presented at the Aging and Speech Communication Conference, 5th International and Interdisciplinary Research Conference, Indiana University, Bloomington, October 6-9, 2013.

Research Posters at Conferences, Technical Reports and Undergraduate Research

V. Mitra, “Machine Learning strategies for recovering Speech Articulatory trajectories and Gestures from Speech”, Invited Speaker, 159th meeting of the Acoust. Soc. of Am., Baltimore, 2010.
V. Mitra, “From Speech to Articulatory Regime”, HESP seminar series, Hearing & Speech Sciences Dept., U. of MD, 2009.
V. Mitra, “Articulatory information to improve robustness of speech recognition systems”, Electrical and Computer Engineering Graduate Students Association (ECEGSA) seminar, U. of MD, 2009.
V. Mitra, C. Espy-Wilson, H. Nam, E. Saltzman, L. Goldstein, “Tract variables and their application for noise robust speech recognition”, ISR Research day, 2009.
V. Mitra and C. Espy-Wilson, “An Enhancement of Modified Phase Opponency for Noise-Robust Speech Recognition”, Institute of Systems Research open house, 2009.
V. Mitra, I.Y. Ozbek, Hosung Nam, Xinhui Zhou and C. Espy-Wilson, “From Acoustics to Vocal Tract time functions”, Institute of Systems Research open house, 2009.
V. Mitra and C. Espy-Wilson, “Speech-Enhancement for Noise-Robust Speech Recognition”, Systems Symposium, Institute of Systems Research, 2008.
V. Mitra, D. Garcia-Romero and C. Espy-Wilson, “Language Detection for Music Information Retrieval”, Systems Symposium, Institute of Systems Research, 2008.
Om Deshmukh, Amit Juneja, Carol Espy-Wilson, “Synergy of Acoustic-Phonetics and Peripheral Auditory Modeling Towards Robust Speech Recognition”, TECH 2004, University of Maryland, College Park.
Tarun Pruthi and Carol Y Espy-Wilson, “Acoustic Parameters for Automatic Detection of Nasal Manner,” TECH 2004, University of Maryland, College Park.
Om Deshmukh and Carol Espy-Wilson, “Detection of the Periodicity and Aperiodicity Profile and Pitch of Speech Signals Using Temporal Cues”, TECH 2003, University of Maryland, College Park.
Tarun Pruthi and Carol Y Espy-Wilson, “Automatic Classification of Nasals and Semivowels,” TECH 2003, University of Maryland, College Park.
Om Deshmukh, Amit Juneja and Carol Espy-Wilson, “Acoustic-Phonetic Speech Parameters for Speaker-Independent Speech Recognition”, TECH 2002, University of Maryland, College Park.
Om Deshmukh and Carol Espy-Wilson, “Detection of Periodicity, Pitch and Aperiodicity in Speech Signals Using Strictly Temporal Cues”, TECH 2002, University of Maryland, College Park.
Om Deshmukh and Carol Espy-Wilson, “Evaluating the Perceptual Quality of Speech Signals Enhanced Using the Modified Phase Opponency Model”, submitted to the 152nd meeting of the Acoustical Society of America.
Tarun Pruthi and Carol Y. Espy-Wilson, “Acoustic Parameters for Nasality Based on a Model of the Auditory Cortex”, 151st meeting of the Acoustical Society of America, Providence, Rhode Island, Jun 2006.
Om Deshmukh and Carol Espy-Wilson, “Speech Enhancement based on Modified Phase-Opponency Detectors”, 150th meeting of the Acoustical Society of America, Minneapolis, Minnesota, 2005.
Tarun Pruthi and Carol Y. Espy-Wilson, “Simulating and understanding the effects of velar coupling area on nasalized vowel spectra”, 150th meeting of the Acoustical Society of America, Minneapolis, Minnesota, 2005. Best Student Paper Award.
Srikanth Vishnubhotla and Carol Espy-Wilson, “Analysis of Modal and Creaky Voice Quality Variations”, 150th meeting of the Acoustical Society of America, Minneapolis, Minnesota, 2005.
Om Deshmukh, Michael Anzalone, Carol Espy-Wilson and Laurel Carney, “A Noise-Reduction Strategy for Speech based on Phase-Opponency Detectors”, 149th meeting of the Acoustical Society of America, Vancouver, Canada, 2005.
Tarun Pruthi and Carol Y Espy-Wilson, “Advances in the Acoustic Correlates of Nasals from Analysis of MRI Data”, 148th meeting of the Acoustical Society of America, New York, May 2004.
Om Deshmukh and Carol Espy-Wilson, “A Measure of Aperiodicity Content in Speech”, 145th meeting of the Acoustical Society of America, Nashville, TN, 2003. Best Presentation Award.
Om Deshmukh, Carol Espy-Wilson and Ariel Salomon, “Robust Speech Event Detection Using Strictly Temporal Information”, 141st meeting of the Acoustical Society of America, Chicago, IL, 2001. Best Presentation Award.
Carol Y Espy-Wilson and Suzanne E Boyce, “The Role of F4 in Determining Tongue Shape for American English /r/”, in the 137th Meeting of the Acoustical Society of America, March 1999.

Theses

Saurabh Sahu, Ph.D. Thesis, “Towards Building Generalizable Speech Emotion Recognition Models”, 2019 (thesis)
Ganesh Sivaraman, Ph.D. Thesis, “Articulatory representations to address acoustic variability in speech”, 2017 (thesis)
Daniel Garcia-Romero, Ph.D. Thesis, “Robust Speaker Recognition Based on Latent Variable Models”, 2012 (thesis)
Srikanth Vishnubhotla, Ph.D. Thesis, “Segregation of Speech Signals in Noisy Environments”, 2011 (thesis)
Vikramjit Mitra, Ph.D. Thesis, “Articulatory Information for Robust Speech Recognition”, 2010 (thesis)
Tarun Pruthi, Ph.D. Thesis, “Analysis, Vocal-tract Modeling and Automatic Detection of Vowel Nasalization”, 2006 (thesis)
Srikanth Vishnubhotla, Master’s Thesis, “Detection of Irregular Phonation in Speech”, 2006 (thesis)
Om Deshmukh, Ph.D. Thesis, “Synergy of Acoustic-Phonetics and Auditory Modeling Towards Robust Speech Recognition”, 2006 (thesis)
Sandeep Manocha, Master’s Thesis, “Robust Voice Mining Techniques for Telephone Conversations”, 2006 (thesis)
Amit Juneja, Ph.D. Thesis, “Speech Recognition Based on Phonetic Features and Acoustic Landmarks”, 2006 (thesis)
Ariel Salomon, Master’s Thesis, “Speech Event Detection Using Strictly Temporal Information”, 2006 (thesis)
Nabil Bitar, Ph.D. Thesis, “Acoustic modeling of speech based on phonetic features”, Recipient of GRS Travel Award (thesis)
Demetri Paneras, Ph.D. Thesis, “Lexical Access based on phonetic features”, Recipient of GRS Travel Award
Deborah Schwartz, Master’s Thesis, “Signal Processing Algorithms for Electrolaryngeal Speech Enhancement”, 1996
Neeraj Deshmukh, Master’s Thesis, “A Strategy for Acoustic Modeling to Increase Efficiency of HG”, 1995
Kazuhito Niimi, Bachelor’s Thesis, “Automatic classification of stop consonants”, 1994
Armen Balien, Bachelor’s Thesis, “Automatic Detection of Acoustic Properties that Separate Adjacent Sounds with the Same Manner of Articulation”, 1993
Stephanie Zierten, Bachelor’s Thesis, “Automatic Detection of Place of Articulation in Stop Consonants”, 1993
Venkatesh Chari, Master’s Thesis, “Extraction of Formant Frequencies by Adaptive Enhancement of Fourier Spectra”, 1992
Vinay Chandra, Bachelor’s Thesis, “Automatic Discrimination of Strident and Nonstrident Fricatives”, 1992
Tamer Onat, Master’s Thesis, “Vowel recognition using neural networks and phonetic features”, 1992
Jack McLaughlin, Master’s Thesis, “Extraction of the glottal waveform using inverse filtering”, 1992
Kenneth Grimes, Master’s Thesis, “Formant estimation of vowels using Critical-band Filtering”, 1992
Valerie Padilla, Bachelor’s Thesis, “Detecting linguistic features for use in a speech recognition system”, 1991
Carla Valera, “Common Features of Devoiced Semivowels”
Carol Espy-Wilson, “An Acoustic Phonetic Approach to Speech Recognition: Application to the Semivowels”, RLE Technical Report 531, 1987. (thesis)