Taxonomic scoring system

The figures in brackets throughout the taxonomy sections correspond to the scores assigned to the characters via the Tobias criteria scoring system. Learn more about the criteria through the following excerpts from the introduction to the HBW and BirdLife International Illustrated Checklist of the Birds of the World.

Under a recently proposed system of quantitative criteria for species delimitation (Tobias et al. 2010) phenotypic differences (i.e. differences in plumage, morphology, measurements and vocalizations) are scored as minor (1), medium (2), major (3) and exceptional (4), depending on their perceived degree of strength. Co-varying differences (e.g. longer wing length and proportionately longer bill size) can be scored only once, and (to avoid maximizing the value of minor differences) only three morphological, two morphometric and two vocal differences may be scored. Molecular differences between taxa are not given quantitative scores because genetic and phenotypic differences have no consistent correlation; although this omission has drawn criticism, genetic information is repeatedly used in this work to illuminate or infer evolutionary history, and in some cases molecular evidence has been central to the way species have been arranged and their limits drawn. Ecological and behavioural differences are also taken into account and, if present, they receive an extra score of 1 (with 2 allowed for “non-overlapping differences in courtship display”). Finally, distributional data are also incorporated, and, while allopatric (no matter how disjunct) ranges do not score, parapatry scores 3, a narrow zone of hybridization 2 and a broad zone of hybridization 1 (see text for details and reasoning). Taxa scoring a total of 7 or more are considered distinct enough to be accorded full species status (based on scores achieved by similar species living in sympatry and compared with lower scores for taxa widely considered to be subspecies).

The abbreviation “ns” is used for “not scored”. Because the scoring system sets a “frequency of scoring” for each type of taxonomic character, only a limited number of characters of each type can count towards the total. Where that limit was reached but further characters were worth explaining, they are included with the abbreviation “(ns[#])”. This indicates that the character was not scored and gives the score it would have received had it been counted.

The “Tobias criteria” were not introduced as a new species concept or a truly objective method, but rather as a practical tool to help assess the degree of difference between non-sympatric taxa, in as consistent and transparent a way as possible.

The Tobias criteria: an outline

In assessing the overall degree of difference between taxa the key threshold under the Tobias criteria is a total score of 7: any taxon at or above this score is adjudged to possess species status. This total score can be reached by combinations of smaller scores generated by two types of criteria: phenotypic and distributional.

Phenotypic criteria

Phenotypic differentiation between taxa (involving plumage colour, pattern and structure, morphometric evidence and vocal characters) is scored according to four categories of magnitude, each so far as possible defined by quantitative thresholds; some smaller allowance is also made for differences in ecology and behaviour. Morphometric differences are quantified by using effect sizes (a measure of the magnitude of a relationship based on the spread of individual data-points) for the largest degree of difference computed from means and standard deviations (which show the degree of variation from the mean) and presented as the Cohen’s d statistic; characters in a taxon that evidently co-vary (e.g. longer wing and longer tail) can be scored only once against another taxon, but characters that evidently do not co-vary (e.g. longer wing and shorter tail) can both be scored, involving the strongest increase and strongest decrease in effect size. Vocal characters are scored through spectrographic analysis based on the strongest temporal and strongest spectral effect size in analogous vocalizations in two taxa.
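
As an illustration of the effect-size calculation described above, here is a minimal Python sketch of Cohen's d computed from means and standard deviations. The text does not say which variant of the statistic is used, so the pooled-standard-deviation form is an assumption, and the function name and the wing-length figures are hypothetical.

```python
import math

def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d between two samples (assumption: pooled standard deviation)."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

# Hypothetical wing lengths (mm) for two taxa: mean, standard deviation, sample size.
d = cohens_d(70.0, 2.0, 10, 64.0, 2.0, 10)
print(abs(d))  # 3.0
```

The sign of d only shows which taxon is larger; it is the magnitude that is compared against the score thresholds.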

The four categories of magnitude in the phenotypic criteria are minor, which scores 1, medium 2, major 3 and exceptional 4.

  • A minor difference involves weak divergence in a plumage or morphometric character, in the form of a slightly different wash or suffusion on an area of feathering or on a bare part (although minor differences in bare part coloration are either not common or infrequently detected). A minor morphometric or vocal character is one in which the effect size is 0.2–1.99.
  • A medium difference involves a distinctly different tone (shade: light yellow vs dusky yellow, etc.) on an area of feathering or bare part. A medium morphometric or vocal character is one in which the effect size is 2–4.99.
  • A major difference involves a contrastingly different hue (colour: e.g. white vs yellow) on an area of feathering or bare part, and/or the presence of an entirely different patterning (such as strong spotting vs strong stripes). A major morphometric or vocal character is one in which the effect size is 5–9.99.
  • An exceptional difference involves a radically different coloration or pattern (a striking contrast in colours or shapes) applying to the majority of the plumage area, or any trait directly involved in courtship and mate choice. An exceptional morphometric or vocal character is one in which the effect size is 10 or more.
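
The effect-size bands in the four categories above can be expressed as a small lookup. This is only a sketch: the function name is hypothetical, the bands are treated as contiguous (so a value such as 1.995 counts as minor), and an effect size below 0.2 is assumed to score nothing, which the text implies but does not state.

```python
def magnitude_score(effect_size):
    """Map a morphometric or vocal effect size (Cohen's d) to a score of 0-4."""
    d = abs(effect_size)  # only the magnitude matters, not the direction
    if d >= 10:
        return 4  # exceptional
    if d >= 5:
        return 3  # major
    if d >= 2:
        return 2  # medium
    if d >= 0.2:
        return 1  # minor
    return 0      # assumption: below 0.2 is too weak to score

for d in (0.1, 1.5, 3.0, 7.2, 12.0):
    print(d, magnitude_score(d))
```

Plumage and bare-part characters are scored by eye against the verbal descriptions, so this mapping applies only to measured characters.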

Obviously it is the highest-scoring characters that must be used in the assessment of species rank. However, to constrain the effects of interdependence in phenotypic characters, several conditions apply. The number of characters relating to differences in plumage and bare-part colours and patterns is capped at three. Morphometric and vocal characters are capped at two each. Differences in ecology and behaviour can be scored only once, and except for non-overlapping differences in courtship display (allowed a score of 2) all such differences are limited to a score of 1.
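
A minimal Python sketch of how these caps might be applied: take the highest-scoring characters first and set aside the rest as “not scored”. The category labels, function name and example characters are all hypothetical. The sketch assumes two morphometric and two vocal characters, as in the introductory excerpt, and assumes that each ecological or behavioural score has already been limited to 1 (or 2 for courtship display) by whoever assigned it.

```python
# Maximum number of characters that may be counted per category.
CAPS = {"plumage": 3, "morphometric": 2, "vocal": 2, "ecology_behaviour": 1}

def select_scored(characters):
    """Split (category, name, score) tuples into counted and not-scored lists."""
    counted, not_scored = [], []
    used = {category: 0 for category in CAPS}
    # Highest scores first, so the strongest characters fill each cap.
    for character in sorted(characters, key=lambda c: c[2], reverse=True):
        category = character[0]
        if used[category] < CAPS[category]:
            counted.append(character)
            used[category] += 1
        else:
            not_scored.append(character)
    return counted, not_scored

# Hypothetical comparison between two taxa.
characters = [
    ("plumage", "crown colour", 3),
    ("plumage", "breast band", 2),
    ("plumage", "wing-bar", 1),
    ("plumage", "tail tip", 1),
    ("morphometric", "wing length", 2),
    ("vocal", "song pace", 2),
]
counted, not_scored = select_scored(characters)
print(sum(score for _, _, score in counted))  # 10
print(not_scored)  # [('plumage', 'tail tip', 1)]
```

In this example the fourth plumage character is the kind of case that would appear in the taxonomy text as “(ns[1])”.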

Distributional criteria

These involve five conditions of geographical relationship: allopatry, broad hybrid zone, narrow hybrid zone, parapatry and sympatry. Of these, allopatry scores 0, because it cannot be quantified and supplies no evidence of evolutionary separation, while sympatry automatically scores 7 since the taxa in question are behaving demonstrably as species. The three intermediate conditions, however, can be allowed scores which reflect the approximate degree of the resistance of the taxa to phenotypic merging.

  • A broad hybrid zone is one in which hybridization between two taxa occurs over a range more than 200 km wide at its maximum point. The breadth of the zone suggests a relatively low resistance, thus allowing a “minor” score of 1.
  • A narrow hybrid zone is one in which hybridization between two taxa occurs over a range less than 200 km wide at its maximum point. The narrowness of the zone suggests relatively high resistance, reflected in a “medium” score of 2.
  • Parapatry involves an extremely narrow line along which the boundaries of two taxa abut with no or minimal hybridization. The taxa are not dissimilar enough ecologically to coexist in sympatry, but appear to exclude each other (i.e. there is no assistance from a geographical barrier such as a broad river), suggesting strong resistance worthy of a “major” score of 3.

Obviously, these three conditions exclude each other: a taxon can be scored only once on distributional criteria.
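
The distributional scores can be summarised as a simple lookup, sketched here in Python. The labels and function names are hypothetical. The text does not say how a hybrid zone exactly 200 km wide is treated, so the sketch refuses to classify that case instead of guessing.

```python
# One score per pair of taxa, since the conditions are mutually exclusive.
DISTRIBUTION_SCORES = {
    "allopatry": 0,
    "broad_hybrid_zone": 1,
    "narrow_hybrid_zone": 2,
    "parapatry": 3,
    "sympatry": 7,
}

def hybrid_zone_kind(max_width_km):
    """Classify a hybrid zone by its width at the widest point."""
    if max_width_km > 200:
        return "broad_hybrid_zone"
    if max_width_km < 200:
        return "narrow_hybrid_zone"
    raise ValueError("a zone exactly 200 km wide is not defined in the text")

def distribution_score(relationship):
    """Return the score for one geographical relationship."""
    return DISTRIBUTION_SCORES[relationship]

print(distribution_score(hybrid_zone_kind(350)))  # 1
print(distribution_score(hybrid_zone_kind(80)))   # 2
print(distribution_score("parapatry"))            # 3
```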

A score of 7 can be reached purely on phenotypic characters, but combinations of phenotypic characters and a particular distributional condition can also make up the necessary total. However, a score of 7 achieved on minor characters only (which here include a broad hybrid zone) does not qualify a taxon for species status.
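
Putting the two kinds of criteria together, the decision rule can be sketched in Python as follows. The function name and the example scores are hypothetical, and the sketch assumes that the caps on phenotypic characters have already been applied before the scores are passed in.

```python
SPECIES_THRESHOLD = 7

def qualifies_as_species(phenotypic_scores, distribution_score=0):
    """Apply the threshold of 7, rejecting totals built only from minor scores."""
    scores = [s for s in list(phenotypic_scores) + [distribution_score] if s > 0]
    total = sum(scores)
    # A broad hybrid zone scores 1, so it counts as a minor character here.
    only_minor = all(s == 1 for s in scores)
    return total >= SPECIES_THRESHOLD and not only_minor

print(qualifies_as_species([3, 2, 1, 1]))      # True: 7 on phenotype alone
print(qualifies_as_species([2, 2], 3))         # True: 4 plus parapatry
print(qualifies_as_species([1, 1, 1, 1, 1, 1], 1))  # False: 7, but all minor
print(qualifies_as_species([3, 2, 1]))         # False: total is only 6
```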

The fact that hybridization is treated in these criteria as a positive rather than a negative characteristic in determining species rank must appear counterintuitive to many people who, perhaps for many decades, have assumed that almost any serious degree of hybridization between two taxa is evidence of their reproductive compatibility and hence of their conspecificity. The fact that at least 9% of all bird species have interbred in the wild (Grant & Grant 1992) tends, however, to suggest that hybridization is on the one hand a widespread and common phenomenon and on the other very rarely capable of producing significant changes in parent taxa (mostly on oceanic islands and only as a result of anthropogenic interference). So if taxa—lineages—meet and hybridize on a regular basis but their genomes have not merged (as judged by molecular or phenotypic evidence), then there is every reason to consider them species (Johnson et al. 1999, Helbig et al. 2002, Carling & Brumfield 2009, Harr & Price 2012). If Icterine Warblers Hippolais icterina and Melodious Warblers H. polyglotta were allopatric, the relatively low levels of differentiation between them would form an arguable case for their conspecificity; but, precisely because they slightly overlap and hybridize without merging into one another, this possibility is quashed outright. Thus, in this checklist, we accept the specific status of a suite of taxa which previously had been considered subspecies because of their hybridizations—Franklin’s Grouse Falcipennis franklinii, White-faced Barbet Pogonornis macclounii, Iberian Green Woodpecker Picus sharpei, various Colaptes flickers and, perhaps most notably, a suite of Pteroglossus and Ramphastos toucans from Amazonia.

The key point is that, in evolutionary terms, hybrids are less fit (Harr & Price 2012). If hybrids were fully viable, genomes fully compatible and signals not reproductively isolating, then the contact zone between two hybridizing taxa would be a broad cline, and in this case the taxa would be conspecific—and indeed there is an increasing trend not to give any taxonomic recognition to components (even the two ends) of a cline, which thus become (part of) the range of a single taxon. (One might then add “cline” to “allopatry” in the list of distribution conditions above, and allow them both no score.) Between the cline and the line of parapatry lie the two types of hybrid zone determined by their width, on the reasonable assumption that fitness decreases with decreasing width of zone; hence a narrow hybrid zone provides evidence of greater genomic integrity and should be scored accordingly. (The inevitable corollary is, of course, that very broad hybrid zones reflect relatively high levels of hybrid fitness, and we acknowledge that these pose challenges that deserve thought and reflection, for example in the cases of Masked and Black-shouldered Lapwings Vanellus miles and V. novaehollandiae and of Campo and Pampas Flickers Colaptes campestris and C. campestroides, both pairs of which we split, with some uncertainty; indeed in one case, involving the Oriental Dwarf-kingfisher Ceyx erithaca, the hybrid zone between northern nominate erithaca and southern rufidorsa is so wide—far wider than the range of pure rufidorsa—that logic and practicality militate altogether against establishing the taxa as species.)

A further important point made but not discussed in any detail by Tobias et al. (2010) is that, although it may play a part in the speciation process, disjunction is not a taxonomic character. In recent years several splits have been proposed on the basis of the existence of a great distance between one taxonomically distinct population and another (and indeed distance between islands forms part of a system for determining taxonomic rank proposed in Pratt 2010). Paradoxically it is also sometimes remarked that two taxa separated by only a short distance could also be judged two species because, in spite of their proximity, they have managed to maintain the integrity of their characters. In both cases, however, it needs to be recognized that the distance between the ranges of taxa, whether very small or very large in size, has no taxonomic value per se. Disjunction is simply the circumstance that triggers the need for criteria to gauge the differences in character between the taxa involved. It cannot then also be invoked as one of the factors on which the degree of difference is assessed.

It is perhaps also worth noting that broad rivers render the ranges of understorey birds disjunct, since such species cannot cross them; but this means that these rivers do not represent a line of parapatry. On the other hand, the same rivers should not pose a barrier to larger canopy species such as parrots and toucans, so for these kinds of bird rivers may indeed be considered, potentially, as forming lines of parapatry.