Welcome to the website for the ESRC-funded project 'Person-specific automatic speaker recognition: understanding the behaviour of individuals for applications of ASR' (ES/W001241/1; £1,012,570). This is a three-year project running from 2022 to 2025, led by Dr Vincent Hughes (PI), Professor Paul Foulkes (CI) and Dr Philip Harrison in the Department of Language and Linguistic Science at the University of York. The project involves collaboration with the Netherlands Forensic Institute and Oxford Wave Research.
Project overview
The voice encodes considerable speaker-specific information. It is thus widely used by commercial organisations as a biometric, e.g. to verify an individual’s identity when accessing bank accounts. The voice is also widely used for investigative purposes and as forensic evidence presented to courts. These tasks are increasingly undertaken using automatic speaker recognition (ASR) technology: software that processes and analyses voices with minimal human input. This field is now dominated by machine learning algorithms focused on reducing error rates, yet systems still perform better with some voices than with others (e.g. making some bank accounts more susceptible to hacking). A fundamental question therefore remains: what makes a particular voice easy or difficult for ASR to recognise?
This interdisciplinary project uses innovative methods and large-scale data to address this question, uniting expertise from linguistics, speech technology, and forensic speech analysis across the academic, professional, and commercial sectors. The overall aim of the project is to analyse systematically how ASR systems perform with individual speakers and to develop methods to handle problematic types of speakers.
Research questions
(1) What systematic properties of speakers make them more or less susceptible to ASR errors, in terms of voice (e.g. pitch, voice quality) and demographic factors (e.g. accent, ethnicity, age, sex)? And how do the magnitudes of these effects compare to known technical effects?
(2) How consistent are results for individual speakers within and across ASR systems?
(3) How do results produced by techniques that combine ASR and linguistic methods on a person-specific basis compare with the current one-size-fits-all approach?
(4) How generalisable are methods and results across datasets and languages?
Impact objectives
(1) To improve ASR performance by developing ways to identify problematic speakers (i.e. those more susceptible to errors) and tailoring methods to deal with them.
(2) To guide data collection for ASR, particularly for validation in forensic contexts.
(3) To contribute towards ASR becoming admissible as forensic evidence in England & Wales by validating performance on forensically realistic materials and through engagement with the legal community.
News
04/07/22: We are recruiting again! We're looking for a post-doctoral research associate with expertise in automatic speaker recognition, speech technology and/or machine learning. The position is for 2.5 years, is based at the University of York, and is due to start on 01/01/23. The closing date for applications is 12/08/22. More information, including the job description and details of how to apply, can be found here. For informal enquiries, please contact the PI, Vincent Hughes.
01/07/22: The project has now officially started. We'll be at IAFPA in Prague next week and we'll update the website with our poster!
04/03/22: We are currently recruiting for a post-doctoral research associate with expertise in linguistics (forensic speech science, phonetics, sociophonetics). The position is for 2.5 years, is based at the University of York, and is due to start on 01/06/22. The closing date for applications is 08/03/22. More information, including the job description and details of how to apply, can be found here. For informal enquiries, please contact the PI, Vincent Hughes.