
Speech Intelligibility

Speech intelligibility is an area in which forensic audio examiners are frequently engaged. An audio recording containing dialogue or speech that requires enhancement to make it more intelligible often presents several challenges.

This article outlines some of the challenges found in several key areas of a recording and describes several processes that forensic audio specialists use to tackle speech intelligibility issues.

Besides the recording having an acceptable balance between acoustic noise and dialogue, the speech itself must also be articulated to an acceptable level.

From a listener’s point of view, sounds vary in pitch, loudness and quality. In vowels, the resonant frequencies of the vocal tract are known as formants. Most people cannot hear the pitches of individual formants in normal speech but are able to identify them more easily in whispered speech.

Disorders impacting the form of speech sounds are usually referred to as articulation disorders.

Below are some key intelligibility elements:

Speech level and distance from the microphone of the recording device greatly affect the perceived intelligibility.
Crest factor: the difference between the peak and RMS (root mean square) levels.
Narrowband, wideband and fullband digital audio coding algorithms.

Frequency range is of great importance to speech intelligibility, particularly the 1 kHz to 4 kHz range.

Noise present in the background of a recording greatly affects the intelligibility of the speech in it. In a recording containing dialogue, any other signal present can be regarded as noise.
Reverberation can also be considered noise, since it can interfere with the intelligibility of the speech by smearing out the consonants.

Directivity: the position of the speaking person’s head and body and distance from the microphone.

Other factors, such as the choice of microphone used for a recording, also have an impact on intelligibility.
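The crest factor listed above can be illustrated with a short calculation. The following is a minimal sketch in Python, assuming the recording is already available as a list of floating-point sample values; the function name crest_factor_db is hypothetical and chosen only for this example.

```python
import math

def crest_factor_db(samples):
    """Return the crest factor (peak level minus RMS level) in decibels."""
    if not samples:
        raise ValueError("samples must not be empty")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("signal is silent; crest factor is undefined")
    return 20.0 * math.log10(peak / rms)

# Illustrative test signal (an assumption, not real speech):
# one second of a 100 Hz sine sampled at 8 kHz.
sine = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(8000)]
print(round(crest_factor_db(sine), 2))  # about 3.01 dB for a pure sine
```

A pure sine has a crest factor of roughly 3 dB. Speech generally has a noticeably higher crest factor than a steady tone, and heavy limiting or compression tends to lower it.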

Enhancing Speech

Depending on the quality of the recording and the speech present in it, the forensic audio examiner may use filtering, that is, equalising certain frequencies or frequency ranges, to provide a more intelligible version.
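As a rough illustration of what equalising a frequency range can mean in practice, the sketch below applies a single band-pass filter centred on the 1 kHz to 4 kHz region mentioned earlier. It is only a sketch under stated assumptions: the coefficient formulas follow the commonly used Audio EQ Cookbook band-pass form, the function names and default band edges are hypothetical, and real casework would normally rely on dedicated tools rather than a hand-written filter.

```python
import math

def rms(samples):
    """Root mean square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def bandpass_biquad(samples, sample_rate, low_hz=1000.0, high_hz=4000.0):
    """Filter samples with a second-order band-pass (0 dB gain at the centre)."""
    f0 = math.sqrt(low_hz * high_hz)   # geometric centre of the band
    q = f0 / (high_hz - low_hz)        # approximate Q for the chosen bandwidth
    w0 = 2.0 * math.pi * f0 / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    # Coefficients normalised by a0
    b0, b1, b2 = alpha / a0, 0.0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0

    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        # Direct Form I difference equation
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

# Illustrative signals (assumptions): a 2 kHz tone inside the band
# and a 100 Hz hum well below it, both sampled at 16 kHz.
fs = 16000
tone = [math.sin(2 * math.pi * 2000 * n / fs) for n in range(fs)]
hum = [math.sin(2 * math.pi * 100 * n / fs) for n in range(fs)]
print(rms(bandpass_biquad(tone, fs)) / rms(tone))  # close to 1: tone is kept
print(rms(bandpass_biquad(hum, fs)) / rms(hum))    # well below 1: hum is reduced
```

A gentle second-order filter like this reduces out-of-band energy without removing it entirely, so the filtered result should still be compared by ear with the unprocessed recording.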

Noise reduction, ambient space enhancement or reduction tools, dynamic processors, single band or multi band, and several other processors may be employed in order to tackle speech intelligibility issues.

All of the processes mentioned above must be checked frequently against the original version to make sure that the processed speech is actually more intelligible.

In certain cases we may even need to isolate words and attempt to enhance them letter by letter in order to increase intelligibility. The degree of difficulty increases dramatically in such attempts, making the task time-consuming and challenging.

Clean recordings and well-enunciated speech are essential to how we hear and understand dialogue. Noisy recordings with inconsistent volume levels result in poor and uncertain speech intelligibility.
In certain cases, objective measurement methods can be applied to estimate intelligibility, such as A-weighted signal-to-noise ratio, the Articulation Index, and the Speech Transmission Index (STI). Subjective methods are usually not well accepted in courtrooms.
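To make the signal-to-noise idea concrete, here is a minimal sketch of an unweighted estimate in Python. It assumes the examiner has already picked out one segment containing speech and one containing background noise only, both as lists of sample values. The function names are hypothetical, and the A-weighting step is deliberately left out, so this is not the full A-weighted measurement.

```python
import math

def mean_power(samples):
    """Average power (mean of squared samples) of a segment."""
    if not samples:
        raise ValueError("segment must not be empty")
    return sum(s * s for s in samples) / len(samples)

def snr_db(speech_segment, noise_segment):
    """Unweighted SNR estimate in dB from a speech segment and a noise-only segment.

    The speech segment still contains background noise, so strictly this is
    a (speech + noise) to noise ratio; no A-weighting is applied.
    """
    p_speech = mean_power(speech_segment)
    p_noise = mean_power(noise_segment)
    if p_speech == 0.0 or p_noise == 0.0:
        raise ValueError("both segments must contain non-zero signal")
    return 10.0 * math.log10(p_speech / p_noise)

# Illustrative stand-ins (assumptions): a full-scale tone for "speech"
# and a tone at one tenth of the amplitude for "noise".
fs = 8000
speech_like = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]
noise_like = [0.1 * math.sin(2 * math.pi * 50 * n / fs) for n in range(fs)]
print(round(snr_db(speech_like, noise_like), 1))  # 20.0
```

An amplitude ratio of ten to one corresponds to 20 dB. The estimate depends heavily on which segments are chosen, so it is best treated as a rough indicator.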

In summary, intelligibility is a subjective perceptual judgement based on the percentage of the speaker’s content that the listener understands.
