Forensic comparison of ageing voices from automatic and auditory perspectives
Issue: Vol 22 No. 2 (2015)
Journal: International Journal of Speech Language and the Law
Subject Areas: Linguistics
DOI: 10.1558/ijsll.v22i2.21760
Abstract:
Comparison of non-contemporary speech samples occurs frequently in forensic speaker-recognition cases. While the ageing-related changes in the voice have been well investigated, the effect of ‘vocal ageing’ on forensic speaker recognition has yet to be fully established. In this article, auditory and automatic experiments providing a deeper insight into the impact of ageing on forensic speaker recognition are presented. A listener test investigating the extent to which vocal ageing is detectable by lay listeners is first presented. A test set of 10 males and 10 females, with recordings spanning approximately 30 years per speaker, is taken from the Trinity College Dublin Speaker Ageing (TCDSA) database. Correct detection of ageing in two samples of the same speaker is found to increase from 64% at a 10-year age difference to 86% at a 30-year age difference. Ageing is significantly more detectable in female speakers than in male speakers, and female listeners are significantly better at detecting ageing than male listeners. A link between ageing detectability and speaking fundamental frequency is also observed. A forensic automatic speaker recognition (FASR) experiment with ageing speakers is then presented. Given a test set of five male speakers from the TCDSA database, each with multiple recordings spanning 30–50 years, ageing is shown to progressively weaken the strength-of-evidence (likelihood ratios) of same-speaker comparisons. While there is inter-speaker variability in the extent of the ageing effect, instances of erroneous support for the different-speaker hypothesis are introduced for all speakers within a time-lapse of 10 years. The detrimental effect of ageing on the overall FASR system is also illustrated via Tippett plots.
Author: Finnian Kelly, Naomi Harte