I had a Google Sheet with data like the following:
City | Count |
Los Angeles | 10000 |
San Diego | 9000 |
San Jose | 1000 |
Tiburon | 10 |
Palo Alto | 5 |
I wanted to understand what fraction of the count was in cities lying north of Fresno — “Northern California”.
My first thought was to look up latitude using a geocoding service; the solution here using Google’s Geocoder API worked pretty well for me: https://www.googlecloudcommunity.com/gc/Exploring-Curating-Data/Get-latitude-longitude-for-any-location-through-Google-Sheets/m-p/570196. Also helpful for this was this list of bounding boxes for the US states (CSV here). But then I hit the daily limit for calls to the API.
So, instead I got the list of California cities and longitude + latitude from https://www.kaggle.com/datasets/camnugent/california-housing-feature-engineering and inserted it into a new worksheet “CACities”:
Name | Latitude | Longitude |
Adelanto | 34.582769 | -117.409214 |
Agoura Hills | 34.153339 | -118.761675 |
Alameda | 37.765206 | -122.241636 |
Albany | 37.886869 | -122.297747 |
Then in the first sheet, we can look up the latitude for a given city with a formula like the following:
=VLOOKUP($A2, CACities!$A$2:$B$460, 2, FALSE)
Looking in CACities, we see that Fresno has latitude 36.746842. Latitude increases as you go north (hitting 90 at the north pole) so we just need to look for latitudes greater than this.
To find the sum of counts north of Fresno, we can use the following:
=SUMIF(C:C, "> 36.746842", B:B)
To find the total sum of counts with a known latitude:
=SUMIF(C:C, "<>#N/A", B:B)
And putting these together, we can find the fraction (weighted by count) north of Fresno as:
=SUMIF(C:C, "> 36.746842", B:B) / SUMIF(C:C, "<>#N/A", B:B)
Is there a better way to do this?
Leave a Reply