Non-unique species names

If you’re given a species name of the form genus + specific epithet, does it uniquely identify a particular species-level taxon? Nope.

In R, playing again with taxa.csv from https://www.inaturalist.org/taxa/inaturalist-taxonomy.dwca.zip, provided by iNaturalist:

taxa <- read.csv("~/Downloads/inaturalist-taxonomy.dwca/taxa.csv", stringsAsFactors=FALSE)

species <- taxa[taxa$taxonRank == "species", ]$scientificName

> length(species)
[1] 1065860
> length(unique(species))
[1] 1065767
> length(species) - length(unique(species))
[1] 93
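The same count can be had without `unique()`. `duplicated()` flags every occurrence of a value after its first, so summing it counts the surplus copies directly. A minimal sketch on a made-up vector (the values are illustrative, not read from taxa.csv):

```r
# Toy stand-in for the species vector (illustrative values only)
toySpecies <- c("Lasia rufa", "Stelis alta", "Lasia rufa", "Iris orientalis")

# duplicated() is TRUE for each repeat after a value's first occurrence,
# so the sum equals length(x) - length(unique(x))
nExtra <- sum(duplicated(toySpecies))
nExtra
```

On the real vector, `sum(duplicated(species))` should agree with the 93 above, since both expressions count the same thing.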

(This reminded me of an old conversation I had on r-sig-mac: https://stat.ethz.ch/pipermail/r-sig-mac/2008-September/005304.html.)

Extracting the non-unique species:

speciesCounts <- table(species)
nonUnique <- names(speciesCounts)[speciesCounts > 1]

nonUniqueDf <- taxa[taxa$scientificName %in% nonUnique, c("id", "kingdom", "phylum", "class", "order", "family", "scientificName")]

nonUniqueDf <- nonUniqueDf[order(nonUniqueDf$scientificName, nonUniqueDf$kingdom, nonUniqueDf$phylum, nonUniqueDf$class), ]

> max(speciesCounts)
[1] 2

> write.csv(nonUniqueDf, file="~/Downloads/non_unique_species.csv", row.names=FALSE)
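The `table()` step can also be replaced by a logical mask: `duplicated(x) | duplicated(x, fromLast = TRUE)` is TRUE for every copy of a repeated value, the first one included. A sketch on a small hypothetical data frame that borrows the taxa.csv column names (ids and rows are invented):

```r
# Hypothetical miniature of taxa.csv (ids and rows invented for illustration)
toyTaxa <- data.frame(
  id = c(1, 2, 3, 4, 5),
  kingdom = c("Animalia", "Plantae", "Plantae", "Plantae", "Fungi"),
  taxonRank = c("species", "species", "genus", "species", "species"),
  scientificName = c("Lasia rufa", "Lasia rufa", "Lasia", "Stelis alta", "Clavulina rugosa"),
  stringsAsFactors = FALSE
)

# Keep species-rank rows only
sp <- toyTaxa[toyTaxa$taxonRank == "species", ]

# TRUE for every row whose name occurs more than once (first copy included)
isRepeated <- duplicated(sp$scientificName) |
  duplicated(sp$scientificName, fromLast = TRUE)

toyNonUnique <- sp[isRepeated, c("id", "kingdom", "scientificName")]
toyNonUnique
```

Unlike the `%in%` lookup above, which scans every row of `taxa` regardless of rank, this version only ever returns species-rank rows.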

Here’s a spreadsheet with the results of the above: https://docs.google.com/spreadsheets/d/1nJKqWPyx9XuhWNgnqUtaby_3J_Jqrr75nb3URf5AblQ/edit?gid=1239078923#gid=1239078923. One thing stands out: many of the non-unique names look like genuine duplicate entries, the same species appearing twice under two distinct IDs.

Looking at one of these apparent duplicate entries, Utetheisa watubela, I see that the two iNaturalist records differ in that one, https://www.inaturalist.org/taxa/1522834-Utetheisa-watubela, includes a subgenus in its hierarchy while the other, https://www.inaturalist.org/taxa/1550471-Utetheisa-watubela, does not.

Comparing columns across consecutive rows suggests that the pairs fall into roughly two groups: those where every level of the hierarchy matches (duplicate entries?) and those with more differences (distinct taxa that just happen to share the same genus and specific epithet).

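# Row index of the first member of each pair: 1, 3, 5, ..., n - 1
# (valid because every repeated name occurs exactly twice and the sort keeps pairs adjacent)
i <- seq(1, nrow(nonUniqueDf), 2)

# Compare rows i and i + 1 column by column (dropping id) and count the matching columns
numEqual <- rowSums(nonUniqueDf[i, -1] == nonUniqueDf[i + 1, -1])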

> table(numEqual)
numEqual
 1  5  6 
39  1 53 
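The odd/even pairing relies on every repeated name occurring exactly twice and on the sort keeping each pair adjacent, which `max(speciesCounts)` confirms here. A sketch that drops that assumption: split the rows by name and report, for each name, the highest rank at which its entries disagree. The helper `firstDifferingRank` is hypothetical and the rows are invented to mirror the three patterns in the table above.

```r
# Rank columns from highest to lowest, as used in the post
rankCols <- c("kingdom", "phylum", "class", "order", "family")

# Hypothetical helper: first rank column whose values are not all identical
# within one group of rows; NA if every rank matches (a likely duplicate entry)
firstDifferingRank <- function(rows) {
  for (col in rankCols) {
    if (length(unique(rows[[col]])) > 1) return(col)
  }
  NA_character_
}

# Invented toy rows: one cross-kingdom pair, one cross-family pair, one exact duplicate
toyDf <- data.frame(
  id = 1:6,
  kingdom = c("Animalia", "Plantae", "Animalia", "Animalia", "Animalia", "Animalia"),
  phylum  = c("Arthropoda", "Tracheophyta", rep("Arthropoda", 4)),
  class   = c("Insecta", "Liliopsida", rep("Insecta", 4)),
  order   = c("Hymenoptera", "Asparagales", "Coleoptera", "Coleoptera", "Lepidoptera", "Lepidoptera"),
  family  = c("Formicidae", "Orchidaceae", "Anthribidae", "Brentidae", "Erebidae", "Erebidae"),
  scientificName = rep(c("Alpha beta", "Gamma delta", "Epsilon zeta"), each = 2),
  stringsAsFactors = FALSE
)

# One result per name; works for groups of any size, sorted or not
byName <- split(toyDf, toyDf$scientificName)
differsAt <- vapply(byName, firstDifferingRank, character(1))
differsAt
```

On the real data, `table(differsAt, useNA = "ifany")` would play the role of `table(numEqual)`, with `NA` standing in for the all-equal pairs. Note that `table()` drops `NA` by default, so the same argument is worth adding to `table(numEqual)` if any hierarchy field might be missing.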

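A count of 6 means every column matches, while a count of 1 means only scientificName does, so those pairs already differ at the kingdom level. For a project, I had been hoping that keeping things separate at the kingdom level would be enough. However, the one pair with 5 columns equal is a problem: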

> print(nonUniqueDf[i, ][numEqual == 5, ], row.names=FALSE)
      id  kingdom     phylum   class      order      family        scientificName
 1490794 Animalia Arthropoda Insecta Coleoptera Anthribidae Rhaphitropis guttifer

> print(nonUniqueDf[i + 1, ][numEqual == 5, ], row.names=FALSE)
      id  kingdom     phylum   class      order    family        scientificName
 1567926 Animalia Arthropoda Insecta Coleoptera Brentidae Rhaphitropis guttifer

So, it looks like there are two distinct species named Rhaphitropis guttifer: one in the family Anthribidae, the other in the family Brentidae. Wikispecies only seems to know about the one in Anthribidae: https://species.wikimedia.org/wiki/Rhaphitropis. Some photos of the cutie: https://www.inaturalist.org/taxa/1490794-Rhaphitropis-guttifer/browse_photos.
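The project question, whether kingdom plus name is enough to tell species apart, amounts to asking whether those columns form a unique key. `anyDuplicated()` on a data frame answers that directly, returning 0 when no row repeats. A sketch with a hypothetical helper `isUniqueKey` and invented rows modelled on the Rhaphitropis case:

```r
# Hypothetical helper: TRUE if the given columns identify each row uniquely
isUniqueKey <- function(df, cols) {
  anyDuplicated(df[, cols, drop = FALSE]) == 0
}

# Invented rows; ids are made up
toyRows <- data.frame(
  id = c(10, 11, 12),
  kingdom = c("Animalia", "Animalia", "Plantae"),
  family = c("Anthribidae", "Brentidae", "Araceae"),
  scientificName = c("Rhaphitropis guttifer", "Rhaphitropis guttifer", "Lasia rufa"),
  stringsAsFactors = FALSE
)

isUniqueKey(toyRows, c("kingdom", "scientificName"))            # kingdom + name
isUniqueKey(toyRows, c("kingdom", "family", "scientificName"))  # adding family
```

On the real species rows, even the second check would still fail because of the 53 pairs whose compared columns all match; those need de-duplicating (or a tie-break on `id`) before any combination of these columns can serve as a key.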

Finally, here are the other 39 non-unique species names, along with the two kingdoms each one is split across.

> print(cbind(nonUniqueDf[i, ][numEqual == 1, c("scientificName", "kingdom")], kingdom2=nonUniqueDf[i + 1, ][numEqual == 1, ]$kingdom), row.names=FALSE)
          scientificName   kingdom kingdom2
         Acmella pusilla  Animalia  Plantae
            Actaea bocki  Animalia  Plantae
      Actaea jacquelinae  Animalia  Plantae
        Canarium elegans  Animalia  Plantae
   Chrysopogon castaneus  Animalia  Plantae
        Clavulina rugosa Chromista    Fungi
            Clusia flava  Animalia  Plantae
      Diplotaxis simplex  Animalia  Plantae
           Eulalia aurea  Animalia  Plantae
         Ficus variegata  Animalia  Plantae
       Fischeria bicolor  Animalia  Plantae
 Fritillaria messanensis  Animalia  Plantae
        Gaussia princeps  Animalia  Plantae
         Iris orientalis  Animalia  Plantae
            Laelia rosea  Animalia  Plantae
              Lasia rufa  Animalia  Plantae
         Lasia splendens  Animalia  Plantae
      Lecania olivacella  Animalia    Fungi
   Leschenaultia expansa  Animalia  Plantae
          Lola insularis  Animalia  Plantae
    Mallotus floribundus  Animalia  Plantae
     Nidularia pulvinata  Animalia    Fungi
        Orestias elegans  Animalia  Plantae
         Ormosia nobilis  Animalia  Plantae
         Orthosia mollis  Animalia  Plantae
        Osbornia cornuta  Animalia  Plantae
     Pilophorus clavatus  Animalia    Fungi
     Psychopsis gracilis  Animalia  Plantae
  Rhaphidophora beccarii  Animalia  Plantae
    Schaefferia oaxacana  Animalia  Plantae
  Sirindhornia pulchella  Animalia  Plantae
      Solenopsis bicolor  Animalia  Plantae
       Solieria pacifica  Animalia  Plantae
             Stelis alta  Animalia  Plantae
       Stelis anthracina  Animalia  Plantae
          Stelis elegans  Animalia  Plantae
      Stelis franciscana  Animalia  Plantae
         Stelis maculata  Animalia  Plantae
        Tritonia pallida  Animalia  Plantae
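To summarize that list by which kingdoms collide, the two kingdom columns can be pasted into one label and tabulated. A sketch on a few rows copied from the output above; `kingdomPairs` is a hypothetical name, and on the real data it would be built from the same `cbind(...)` expression.

```r
# A few rows copied from the printed table above
kingdomPairs <- data.frame(
  scientificName = c("Acmella pusilla", "Clavulina rugosa", "Lecania olivacella", "Stelis alta"),
  kingdom  = c("Animalia", "Chromista", "Animalia", "Animalia"),
  kingdom2 = c("Plantae", "Fungi", "Fungi", "Plantae"),
  stringsAsFactors = FALSE
)

# Sort each pair so "Plantae / Animalia" and "Animalia / Plantae" count together
pairLabel <- apply(kingdomPairs[, c("kingdom", "kingdom2")], 1,
                   function(k) paste(sort(k), collapse = " / "))
kingdomPairCounts <- table(pairLabel)
kingdomPairCounts
```

Counting the 39 printed rows by hand gives 35 Animalia / Plantae, 3 Animalia / Fungi and 1 Chromista / Fungi.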
