Here’s something fun: boundaries for cities in California: https://lab.data.ca.gov/dataset/california-city-boundaries-and-identifiers.
Working with this in R:
> library(sf)
> cities <- read_sf("~/Downloads/California_Cities_and_Identifiers_Blue_Version_view_-6943741225906831761.gpkg")
There are a few columns with city names. “CENSUS_PLACE_NAME” and “CDTFA_CITY” (CA Department of Tax and Fee Administration?) agree, so let’s just use “CDTFA_CITY”:
> sum(cities$CENSUS_PLACE_NAME != cities$CDTFA_CITY)
[1] 0
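One subtlety with the sum(... != ...) check: != yields NA wherever either side is NA, so a missing value in either column would make sum() return NA rather than a count. The 0 above therefore also means neither column has an NA. Here is a small sketch of the difference, plus an NA-aware alternative, on hypothetical toy vectors (not the real dataset):

```r
# Hypothetical toy vectors, not values from the California dataset.
census_name <- c("Adelanto", "Agoura Hills", NA)
cdtfa_name  <- c("Adelanto", "Agoura Hills", NA)

# `!=` yields NA where either side is NA, so the count itself becomes NA.
sum(census_name != cdtfa_name)          # NA

# identical() compares the whole vectors and treats NA vs NA as equal.
identical(census_name, cdtfa_name)      # TRUE

# NA-aware mismatch count: a position differs if exactly one side is NA,
# or if both are present and the strings differ.
both_present <- !is.na(census_name) & !is.na(cdtfa_name)
mismatch <- xor(is.na(census_name), is.na(cdtfa_name)) |
  (both_present & census_name != cdtfa_name)
sum(mismatch)                           # 0
```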
It’s nice to see that there are no duplicate city names. Very thoughtful of the town founders. Since new cities don’t get added so often, that uniqueness seems like something pretty safe to depend on when looking at California data specifically.
> length(unique(cities$CDTFA_CITY))
[1] 483
> nrow(cities)
[1] 483
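If we're going to depend on that uniqueness, anyDuplicated() gives a more direct check than comparing length(unique(...)) against nrow(). It can also be wrapped in stopifnot() as a guard, e.g. stopifnot(anyDuplicated(cities$CDTFA_CITY) == 0). Here is a sketch on a hypothetical toy vector:

```r
# Hypothetical toy vector standing in for cities$CDTFA_CITY.
city_names <- c("Adelanto", "Agoura Hills", "Alameda")

# 0 means every element is unique; otherwise it is the index of the
# first element that repeats an earlier one.
anyDuplicated(city_names)                  # 0
anyDuplicated(c(city_names, "Alameda"))    # 4

# Fails loudly if a duplicate ever shows up in a future release.
stopifnot(anyDuplicated(city_names) == 0)
```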
483 seemed too small to me for the number of cities, but Wikipedia confirms it: https://en.wikipedia.org/wiki/List_of_municipalities_in_California — “California is divided into 58 counties and contains 483 municipalities.”
Let’s also pull in those stats that people have so nicely shared on Wikipedia, following the recipe provided by snee:
library(httr)
library(XML)
wikiTables <- readHTMLTable(
doc=content(GET("https://en.wikipedia.org/wiki/List_of_municipalities_in_California"), "text"))
# After a little tweaking: drop the first two (non-city) rows and name the columns
> stats <- setNames(tail(wikiTables[[2]], n=-2), c("city_name", "type", "county", "pop_2020", "pop_2010", "pop_delta", "area_mi2", "area_km2", "pop_density", "incorporation_date"))
> nrow(stats)
[1] 483
# Take a peek
> head(stats)
city_name type county pop_2020 pop_2010 pop_delta area_mi2 area_km2 pop_density incorporation_date
3 Adelanto City San Bernardino 38,046 31,765 +19.8% 52.87 136.9 719.6/sq mi (277.8/km2) December 22, 1970
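The number-like columns come through as text: commas in the populations, a leading "+" and a trailing "%" in the change. Below is a minimal sketch of one way to parse them, using a hypothetical helper, parse_wiki_number(). The handling of the Unicode minus sign is an assumption about how negative changes might be written, not something checked against this table.

```r
# Hypothetical helper, not part of any package. Assumes "," only appears
# as a thousands separator and that negative changes may be written with
# the Unicode minus sign (an unverified assumption about the table).
parse_wiki_number <- function(x) {
  x <- gsub(",", "", x, fixed = TRUE)          # "38,046" -> "38046"
  x <- gsub("%", "", x, fixed = TRUE)          # "+19.8%" -> "+19.8"
  x <- gsub("\u2212", "-", x, fixed = TRUE)    # Unicode minus -> ASCII "-"
  as.numeric(x)                                # as.numeric() accepts a leading "+"
}

parse_wiki_number(c("38,046", "+19.8%", "\u{2212}3.5%"))
```

That call should give 38046, 19.8 and -3.5. On the real data it would be applied column by column, e.g. stats$pop_2020 <- parse_wiki_number(stats$pop_2020). The density column ("719.6/sq mi (277.8/km2)") would need its own handling.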
Let’s munge it a bit more to make it nicer to work with:
> stats$county_seat <- grepl("†|‡", stats$city_name)
> stats$city_name <- gsub("†|‡", "", stats$city_name)
> sum(stats$county_seat)
[1] 50
Doh! There should be 58 county seats, right? A quick in-browser search of https://en.wikipedia.org/wiki/List_of_municipalities_in_California also turns up only 50 marked rows.
I happen to have all the county seats here: https://github.com/fadend/county_seat_coords/blob/main/county_seat_coords/generated/ca_usa_county_seats.csv. (See my post California County Seats.) Let’s compare:
seats <- read.csv(url("https://raw.githubusercontent.com/fadend/county_seat_coords/refs/heads/main/county_seat_coords/generated/ca_usa_county_seats.csv"), stringsAsFactors=FALSE)
# Test whether all county seats are in the list of cities.
> all(seats$county_seat %in% stats$city_name)
[1] FALSE
# What the ...?!
> setdiff(seats$county_seat, stats$city_name)
[1] "Markleeville" "San Andreas" "Independence" "Mariposa" "Bridgeport" "Quincy" "Downieville" "Weaverville"
# Otherwise, test for agreement on what is a county seat.
> all(stats$county_seat[stats$city_name %in% seats$county_seat])
[1] TRUE
# Otherwise, test for agreement on what is *not* a county seat.
> all(!stats$county_seat[!(stats$city_name %in% seats$county_seat)])
[1] TRUE
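The checks above can also be read off a single cross-tabulation of "in the county-seat list" against "flagged as a seat on Wikipedia." Any count off the diagonal is a disagreement, and setdiff() still catches seats that are missing from the city list entirely. Here is a sketch on hypothetical toy vectors standing in for stats and seats:

```r
# Hypothetical toy stand-ins for stats$city_name, stats$county_seat,
# and seats$county_seat.
city_name   <- c("Adelanto", "Alturas", "Auburn", "Alameda")
county_seat <- c(FALSE, TRUE, TRUE, FALSE)
seat_list   <- c("Alturas", "Auburn", "Mariposa")

# Rows: in the seat list? Columns: flagged as a seat? Any count in the
# off-diagonal cells is a disagreement between the two sources.
agreement <- table(in_seat_list = city_name %in% seat_list,
                   flagged = county_seat)
print(agreement)

# Seats that don't appear among the cities at all.
setdiff(seat_list, city_name)
```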
What the heck is going on here? Why aren’t Bridgeport, Downieville, Independence, Mariposa, Markleeville, Quincy, San Andreas, or Weaverville getting counted as cities? Those eight account exactly for the gap between 58 counties and 50 marked seats. Having visited Mariposa, I can say it seems very city-like.
Ah, Wikipedia to the rescue again: https://en.wikipedia.org/wiki/Mariposa,_California “Mariposa… is an unincorporated community and census-designated place (CDP)…”
Geometries for the census-designated places are also available: https://lab.data.ca.gov/dataset/california-census-designated-places-cdps. If we did want to include, say, Mariposa, can we come up with a clear rule for which of them are “city-like” enough? At the very least, it seems odd for a list of California cities to be missing county seats, even if their residents have never, for whatever reason, officially incorporated as a city.
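One possible rule, offered only as a sketch: keep every incorporated city and add any CDP that is a county seat. The names below are hypothetical stand-ins, and the CDP dataset's actual column names aren't shown here, so this sticks to plain name vectors. With the real sf objects, the geometries could then be stacked with rbind() once the columns are aligned.

```r
# Hypothetical toy stand-ins; the real inputs would come from the city
# and CDP datasets plus the county-seat list.
incorporated <- c("Adelanto", "Alturas", "Auburn")
cdps         <- c("Alamo", "Mariposa", "Markleeville")
seat_list    <- c("Alturas", "Auburn", "Mariposa", "Markleeville")

places <- data.frame(
  name = c(incorporated, cdps),
  incorporated = rep(c(TRUE, FALSE), c(length(incorporated), length(cdps))),
  stringsAsFactors = FALSE
)
places$county_seat <- places$name %in% seat_list

# Candidate rule: incorporated cities, plus CDPs that are county seats.
city_like <- places[places$incorporated | places$county_seat, ]
city_like$name
```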