If you’re given a species name of the form genus + specific epithet, does it uniquely identify a particular species-level taxon? Nope.
In R, playing again with taxa.csv from https://www.inaturalist.org/taxa/inaturalist-taxonomy.dwca.zip, provided by iNaturalist:
taxa <- read.csv("~/Downloads/inaturalist-taxonomy.dwca/taxa.csv", stringsAsFactors=FALSE)
species <- taxa[taxa$taxonRank == "species", ]$scientificName
> length(species)
[1] 1065860
> length(unique(species))
[1] 1065767
> length(species) - length(unique(species))
[1] 93
(This reminded me of an old conversation I had on r-sig-mac: https://stat.ethz.ch/pipermail/r-sig-mac/2008-September/005304.html.)
Extracting the non-unique species:
speciesCounts <- table(species)
nonUnique <- names(speciesCounts)[speciesCounts > 1]
nonUniqueDf <- taxa[taxa$scientificName %in% nonUnique, c("id", "kingdom", "phylum", "class", "order", "family", "scientificName")]
nonUniqueDf <- nonUniqueDf[order(nonUniqueDf$scientificName, nonUniqueDf$kingdom, nonUniqueDf$phylum, nonUniqueDf$class), ]
> max(speciesCounts)
[1] 2
> write.csv(nonUniqueDf, file="~/Downloads/non_unique_species.csv", row.names=FALSE)
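The pairwise comparison further down relies on every name in nonUniqueDf filling exactly two rows. max(speciesCounts) confirms that for the species-rank rows, but nonUniqueDf is selected from all of taxa, so a taxon of some other rank with the same scientificName could in principle break the pairing. Here is a minimal sketch of a guard for that, on made-up data; the helper name hasOnlyPairs and the toy rows are hypothetical, not part of the iNaturalist export.

```r
# Hypothetical helper: TRUE when every scientificName in df fills exactly two rows.
hasOnlyPairs <- function(df) {
  all(table(df$scientificName) == 2)
}

# Made-up rows for illustration only.
toyPairs <- data.frame(
  id = 1:4,
  scientificName = c("Aus bus", "Aus bus", "Cus dus", "Cus dus"),
  stringsAsFactors = FALSE
)

# Same rows plus a third "Aus bus", which breaks the two-rows-per-name assumption.
toyTriple <- rbind(
  toyPairs,
  data.frame(id = 5L, scientificName = "Aus bus", stringsAsFactors = FALSE)
)

hasOnlyPairs(toyPairs)
hasOnlyPairs(toyTriple)
```

On the real data, something like stopifnot(hasOnlyPairs(nonUniqueDf)) before the comparison would fail fast if the assumption did not hold.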
Here’s a spreadsheet with the results of the above: https://docs.google.com/spreadsheets/d/1nJKqWPyx9XuhWNgnqUtaby_3J_Jqrr75nb3URf5AblQ/edit?gid=1239078923#gid=1239078923. One thing that stands out: many of the apparent duplicates seem to actually be duplicate entries — the same species appearing twice, with two distinct IDs.
Looking at one of these apparently real duplicates, Utetheisa watubela, I see that the two iNaturalist entries differ in that one, https://www.inaturalist.org/taxa/1522834-Utetheisa-watubela, includes a subgenus in the hierarchy while the other, https://www.inaturalist.org/taxa/1550471-Utetheisa-watubela, does not.
Comparing columns across consecutive rows, it looks like the duplicates fall into pretty much two groups: ones for which all the levels of the hierarchy are the same (duplicate entries?) and ones that have more differences — distinct taxa that just happen to share the same genus and specific epithet.
# 1, 3, 5, ... n - 1
i <- seq(1, nrow(nonUniqueDf), 2)
# Compare columns across consecutive rows
numEqual <- rowSums(nonUniqueDf[i, -1] == nonUniqueDf[i + 1, -1])
> table(numEqual)
numEqual
1 5 6
39 1 53
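To make the numEqual values easier to read: the comparison drops only the id column, so scientificName is always counted as equal. That means 1 is "nothing but the name matches", 6 is "the whole hierarchy matches", and 5 is "one level differs". Here is the same computation on a small made-up data frame; the rows and the placeholder names (Aus bus, P1, F1, and so on) are invented for illustration.

```r
# Made-up rows, already sorted so that each name's two rows are consecutive.
toy <- data.frame(
  id      = 1:6,
  kingdom = c("Animalia", "Animalia", "Animalia", "Animalia", "Animalia", "Plantae"),
  phylum  = c("P1", "P1", "P1", "P1", "P1", "P2"),
  class   = c("C1", "C1", "C1", "C1", "C1", "C2"),
  order   = c("O1", "O1", "O1", "O1", "O1", "O2"),
  family  = c("F1", "F1", "F1", "F2", "F1", "F3"),
  scientificName = rep(c("Aus bus", "Cus dus", "Eus fus"), each = 2),
  stringsAsFactors = FALSE
)

# First row of each pair: 1, 3, 5
iToy <- seq(1, nrow(toy), 2)

# Count matching columns (everything except id) between the rows of each pair.
numEqualToy <- rowSums(toy[iToy, -1] == toy[iToy + 1, -1])

# Pair 1 matches everywhere, pair 2 differs in family, pair 3 shares only the name.
print(numEqualToy)
```

One caveat: if any of the compared cells were NA, the comparison would yield NA and so would rowSums for that pair, so this approach assumes the hierarchy columns are fully populated strings.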
For a project, I had been hoping that keeping things separate at the kingdom level would be enough. However, the one pair with 5 columns equal, which differs only in family, is a problem:
> print(nonUniqueDf[i, ][numEqual == 5, ], row.names=FALSE)
id kingdom phylum class order family scientificName
1490794 Animalia Arthropoda Insecta Coleoptera Anthribidae Rhaphitropis guttifer
> print(nonUniqueDf[i + 1, ][numEqual == 5, ], row.names=FALSE)
id kingdom phylum class order family scientificName
1567926 Animalia Arthropoda Insecta Coleoptera Brentidae Rhaphitropis guttifer
So, it looks like there are two distinct species named Rhaphitropis guttifer, one in the family Anthribidae, the other in the family Brentidae. Wikispecies only seems to know about the one in Anthribidae: https://species.wikimedia.org/wiki/Rhaphitropis. Some photos of the cutie: https://www.inaturalist.org/taxa/1490794-Rhaphitropis-guttifer/browse_photos.
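For the project question of which columns are enough to tell same-named taxa apart, a quick way to test a candidate key is anyDuplicated on just those columns. This is only a sketch: the helper name isUniqueKey is hypothetical, the Rhaphitropis guttifer rows use the families shown above, and the family values for Ficus variegata are placeholders rather than real data.

```r
# Hypothetical helper: TRUE when no two rows of df share the same values
# in all of the columns named in keys.
isUniqueKey <- function(df, keys) {
  anyDuplicated(df[, keys, drop = FALSE]) == 0
}

# Toy rows: one same-kingdom pair and one cross-kingdom pair.
toyKeys <- data.frame(
  kingdom = c("Animalia", "Animalia", "Animalia", "Plantae"),
  family  = c("Anthribidae", "Brentidae", "placeholderA", "placeholderB"),
  scientificName = c("Rhaphitropis guttifer", "Rhaphitropis guttifer",
                     "Ficus variegata", "Ficus variegata"),
  stringsAsFactors = FALSE
)

isUniqueKey(toyKeys, "scientificName")
isUniqueKey(toyKeys, c("kingdom", "scientificName"))
isUniqueKey(toyKeys, c("kingdom", "family", "scientificName"))
```

In this toy data the name alone and kingdom plus name both collide, while adding family separates all four rows. Entries whose whole hierarchy is identical would still collide on any such key, so those would need the id, or deduplication, to tell apart.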
Finally, here are the other 39 non-unique species names along with the kingdoms that divide them.
> print(cbind(nonUniqueDf[i, ][numEqual == 1, c("scientificName", "kingdom")], kingdom2=nonUniqueDf[i + 1, ][numEqual == 1, ]$kingdom), row.names=FALSE)
scientificName kingdom kingdom2
Acmella pusilla Animalia Plantae
Actaea bocki Animalia Plantae
Actaea jacquelinae Animalia Plantae
Canarium elegans Animalia Plantae
Chrysopogon castaneus Animalia Plantae
Clavulina rugosa Chromista Fungi
Clusia flava Animalia Plantae
Diplotaxis simplex Animalia Plantae
Eulalia aurea Animalia Plantae
Ficus variegata Animalia Plantae
Fischeria bicolor Animalia Plantae
Fritillaria messanensis Animalia Plantae
Gaussia princeps Animalia Plantae
Iris orientalis Animalia Plantae
Laelia rosea Animalia Plantae
Lasia rufa Animalia Plantae
Lasia splendens Animalia Plantae
Lecania olivacella Animalia Fungi
Leschenaultia expansa Animalia Plantae
Lola insularis Animalia Plantae
Mallotus floribundus Animalia Plantae
Nidularia pulvinata Animalia Fungi
Orestias elegans Animalia Plantae
Ormosia nobilis Animalia Plantae
Orthosia mollis Animalia Plantae
Osbornia cornuta Animalia Plantae
Pilophorus clavatus Animalia Fungi
Psychopsis gracilis Animalia Plantae
Rhaphidophora beccarii Animalia Plantae
Schaefferia oaxacana Animalia Plantae
Sirindhornia pulchella Animalia Plantae
Solenopsis bicolor Animalia Plantae
Solieria pacifica Animalia Plantae
Stelis alta Animalia Plantae
Stelis anthracina Animalia Plantae
Stelis elegans Animalia Plantae
Stelis franciscana Animalia Plantae
Stelis maculata Animalia Plantae
Tritonia pallida Animalia Plantae