Individual voices are not uniformly similar to others, even when factoring out speaker characteristics such as sex, age, dialect, and so on. Some speakers share common features and can cohere into groups based on gross vocal similarity, but, to date, no attempt has been made to describe these features systematically or to generate a taxonomy based on such “voice types.” For this purpose, perceived similarity judgments of voice pairs drawn from a database of 100 male and female American English voices were collected and submitted to a hierarchical clustering analysis to generate the initial groupings of individual voices into types, separately for male and female voices. These types, in turn, were labeled based on auditory judgments by expert listeners on five nominal scales (voice quality, nasality/orality, mean pitch, pitch variability, and speaking rate) as well as an initial acoustic analysis using automated measures. The new typology revealed a total of 9 female and 12 male voice types, with speaking rate, pitch variability, and mean pitch playing the largest roles in determining the taxonomy for both sexes. This new vocal typology of American voices, along with future study and revision, should find utility in academia (phonetics, discourse, sociolinguistics, genetics, and other fields), forensic linguistics, public and private sector business and marketing, voice acting, and public interest.
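
As a rough illustration of how pairwise similarity judgments can be turned into groupings by hierarchical clustering, the following minimal sketch may help. It is not the procedure used in this study: the voice labels, the 1-7 rating scale, the rating values, the average-linkage rule, and the choice to stop at three clusters are all assumptions made for the example.

```python
from itertools import combinations


def average_linkage(dissim, labels, n_clusters):
    """Agglomerative clustering: merge the two closest clusters until
    n_clusters remain. The linkage rule (average) and the stopping rule
    are illustrative assumptions, not the study's settings."""
    clusters = [[i] for i in range(len(labels))]

    def dist(a, b):
        # Mean pairwise dissimilarity between members of clusters a and b
        return sum(dissim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest average dissimilarity
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)

    return [sorted(labels[i] for i in c) for c in clusters]


# Hypothetical mean similarity ratings (1 = very different, 7 = very similar)
voices = ["F01", "F02", "F03", "F04", "F05"]
similarity = [
    [7.0, 6.1, 2.0, 1.8, 2.5],
    [6.1, 7.0, 2.2, 1.5, 2.9],
    [2.0, 2.2, 7.0, 5.8, 3.0],
    [1.8, 1.5, 5.8, 7.0, 3.2],
    [2.5, 2.9, 3.0, 3.2, 7.0],
]

# Convert similarity to dissimilarity so that smaller means closer
dissimilarity = [[7.0 - s for s in row] for row in similarity]

voice_types = average_linkage(dissimilarity, voices, n_clusters=3)
print(voice_types)
```

With these toy ratings, F01 groups with F02 and F03 with F04, while F05 remains a type of its own. In practice the number of clusters would be chosen by inspecting the full merge hierarchy rather than fixed in advance.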

That voice types naturally occur is not especially surprising. Speaker identities can be confused over the phone or in other degraded listening conditions. Colloquial terms exist for vocal qualities that are not necessarily pathological but are distinctive, such as “nasally,” “whiny,” “gravelly,” “droning,” “staccato,” and others. What remains lacking, however, is a systematic approach for identifying the number and nature of the most common vocal stereotypes, or types, that speakers cohere into based on human perception. An inventory of voice types should be developed that is independent of other speaker characteristics (e.g., age, sex, dialect, pathology) and that serves to reduce the vast population of speaker identities by voice into a more manageable taxonomy of common types. Such voice types may play a role, as do other indexical properties, as perceptual units that partly influence the processing of linguistic and nonlinguistic information by human listeners. Their existence also points to numerous applications. In the forensic domain, speaker identification is a very common analysis required of audio evidence, yet the duration and quality of the speech samples can often preclude a highly confident judgment of the match/mismatch to the voice of a defendant or another relevant party in the case. However, such evidence recordings may be of sufficient caliber to permit a match/mismatch determination on the basis of a grosser category, such as a voice type. The evaluation of voice talent is also a growing field, given the increasing use of digital animation in the entertainment industry. While individual vocal attributes such as “pleasantness” or “authority” have been examined in prior work (Beebe-Center, 1965; Oyer & Trudeau, 1984; Bugental & Lin, 1997; and others), there is currently no rubric or automated procedure for classifying all of the relevant characteristics of a talented voice.
Voice talent could be fitted into a voice type taxonomy to make it easier to identify the proper vocal talent for a given commercial application. This would include public service announcements and advertisement narration, where correlates of vocal pleasantness such as trustworthiness, sex appeal, and overall pleasantness or friendliness play a significant role in listener impression, attention to the message, decision making, and the overall effectiveness of the message. The positing of a voice type taxonomy ultimately serves to reduce the vast number of speaker identities within a given sex/age/dialect subpopulation to a manageable and useful number of categories.