Help Needed: Gather admin levels for Countries

I need help gathering the admin levels for regions & cities in countries. The first sheet (How To) describes the process you’ll take to get this data & the second sheet (Countries) is where you’ll enter the actual data. The third sheet is there for reference - it’s a complete list of all Countries with all of their Regions as they exist in CityStrides. This might be useful for determining what the region admin level value is for a country (if the list for an admin level matches / closely matches that list, then that’s probably the right value).

If you have some free time, and my instructions actually make sense, I’d love your help in filling out this list.
If the process is too much for you, but there’s a country that you want worked on, click on its name and apply bold formatting.

This is part of my ongoing work to completely automate the city/street/node data process. One of the biggest hurdles to automating the city import process is this step of figuring out which data should be imported. Each country has its regions & cities at different admin levels, so I have to figure this out for every country that I want to bulk-import.

Update: I just found this list of admin values for specific countries, so that may help.

Update: I’ve got enough of the countries/regions mapped out to match the data currently in CityStrides … everything from here on out in that spreadsheet is pure progress towards worldwide coverage. :tada:

Also: If any of you are familiar with Overpass API, I could use some help with it - I’m trying to figure out how to limit searches to Regions within specific Countries

James:

A few things I have discovered that may or may not be helpful…

It’s not perfectly universal, but admin_level=8 is the right city/town/municipal boundary level for most countries. You could probably use that and deal with any oddball countries as exceptions later. admin_level=8 should cover the vast majority of use cases and simplify your processing to the point where you don’t actually care what country/region the boundary is in. In fact, I’m pretty sure every single admin_level=8 boundary would, by spec, be a valid “city” per the concept of citystrides.

The following overpass query will generate a list of admin=8 boundaries whose bounding box intersects a rectangle specified by coordinates. You could replace the part in parentheses with some other constraint. The “out tags” makes it only return the meta information and not the boundary itself. So you could even come up with a scheme where it adds a city only when an activity is first registered there.

The two parentheses sections are actually two intersecting rectangles which form a cross-shape in order to deal with the case where the searched rectangle is entirely inside the boundary of a city. If that makes sense…

[out:json];
(
relation["type"="boundary"]["admin_level"=8]({s},{ww},{n},{ee})({ss},{w},{nn},{e});
);
out tags;
>;

Yeah, I’ve noticed how a 4/8 combo usually works.
When I run this process on a global scale, I expect it to take many hours, so I’d prefer that my first run is as close to “done” as possible.

Check out the last link in the original post - I share my Overpass API query in there, where I’m limiting level 8 results by only querying within their corresponding Region (usually level 4). This way, I don’t need a bounding box - this seems simpler to me because I already have the Country/Region relation but I don’t have bounding box data. I may be missing something there, though…

Update: Thanks for mentioning out tags; … that helps trim down my queries quite a bit (I’m only using the id & the name tag in this step).

The stackexchange suggestion to use is_in is pretty terrible… that tag is recommended for deletion in the openstreetmap project.

So far I’ve found that OSM doesn’t have a reliable way to express parent/child boundary relationships, other than using the polygons themselves (workable, but painful).

The iso country code is an interesting idea that may work at the country level.

Does citystrides really need to know that THIS Pleasantville is different from THAT Pleasantville, or is it just enough to know that they are different cities?

Kind of.

The query I use to collect streets is basically “give me streets in this region” … but if two regions have the same name in different countries then I get the cities in both regions.

I’m really hoping that the ISO idea is going to let me change that query to “give me the streets in this region (the one in this country)”…

If it were me, I would do a global out=tags query for all admin=8 boundaries and just record name and id (and maybe some other tags if you want them for display purposes). Then, I would one by one query for all of the streets in each city.

The public overpass endpoints may not let you hammer them that hard, so you might consider downloading an offline copy of OSM and running your own overpass instances for the initial load.

Interesting - I think that would mean dropping the Country/Region layers in my Rails app … :thinking: I’m not sure if that matters though … maybe I don’t need those layers (as browsable pages in the site, anyway).

You’ve helped a bunch, especially with how I even think about this problem. I figured it might help if you saw the overall flow for my current efforts. There are so many OSM tools out there, and the data is a bit difficult to work with (the way that I want to work with it) - maybe some of my efforts can be simplified at a higher level.


For each Region in the CityStrides database, the process can be broken down into three large steps:

  1. The first big step is generating the .poly files for each city in the Region.
    I currently only do this once, but as things get more automated I may do this monthly/quarterly/yearly. This section is scripted out in a more automated way, but for ease of explanation here…
    I have a query.xml file containing:
[out:json];
area['ISO3166-2'~'^COUNTRY_CODE']['admin_level'='REGION_ADMIN_LEVEL']['name'='REGION_NAME'];
(relation['admin_level'='CITY_ADMIN_LEVEL'](area););
out;
  • I use this to query Overpass API for all the Cities in that Region:
    • wget -nc www.overpass-api.de/api/interpreter --post-file=data/query.xml -O REGION_NAME.json
  • I have a script that iterates over the elements in that REGION_NAME.json file, downloading the response from this URL for each city:
  • Now I have a .poly file for every city in that Region

  2. Next, I have to massage the data into a usable format.
    I download the smallest geofabrik extract of that Region (http://download.geofabrik.de/)
  • Extract the area of the .poly file from the geofabrik extract file
    • osmconvert REGION-latest.osm.pbf -B=CITY.poly -o=CITY.osm
  • Extract just the streets from that osm file
    • osmium tags-filter CITY.osm w/highway!=motorway_link -o CITY.osm
  • Convert the streets to json format (you’ve seen this script, where you suggested ignoring more tags … the previous step is new, and handles most of that effort for me)
    • ruby get_streets.rb CITY.osm CITY_streets.json
  • Convert the nodes to json format
    • ruby get_nodes.rb CITY.osm CITY_nodes.json

  3. Finally, I have a script that reads those json files, writing the streets/nodes into the database. I could probably rewrite the get_streets.rb and get_nodes.rb scripts into one script that went directly into the database instead of the interim json file step…
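
As a rough illustration of step 1, the query.xml template could be filled in per Region from code rather than edited by hand. This is only a sketch: `build_city_query` and its parameter names are made up for the example, the defaults assume the common 4/8 level combination mentioned earlier in the thread, and it uses `out tags;` as suggested above.

```python
# Hypothetical helper: fills in the Region/City query template from step 1.
QUERY_TEMPLATE = """[out:json];
area['ISO3166-2'~'^{country_code}']['admin_level'='{region_level}']['name'='{region_name}'];
(relation['admin_level'='{city_level}'](area););
out tags;"""


def build_city_query(country_code, region_name, region_level=4, city_level=8):
    """Return an Overpass QL query for all city relations inside one region.

    Note: a region name containing a single quote would break the quoting
    here; handling that is left out of this sketch.
    """
    return QUERY_TEMPLATE.format(
        country_code=country_code,
        region_level=region_level,
        region_name=region_name,
        city_level=city_level,
    )


print(build_city_query("US", "Massachusetts"))
```

The resulting string can be written to query.xml and posted with the same wget command as before.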

So … after all the :scream: :scream_cat: subsides … can you spot anywhere that I can improve the overall process?
Your suggestion here makes me think there is a whole world of Overpass API queries that I haven’t figured out yet:

If it were me, I would do a global out=tags query for all admin=8 boundaries and just record name and id (and maybe some other tags if you want them for display purposes). Then, I would one by one query for all of the streets in each city.

I’m pretty sure there’s a massive simplification available here, unless I’m missing something. I have not actually tried any of this processing at a global level so I’m not sure how it would scale but I see no reason why it wouldn’t work.

First you need the list of city IDs. The overpass query below will generate that list for the entire globe. You could also use out ids instead of out tags; in theory that returns just the ID values, and you could query separately for the tags. I tried running the out ids version against overpass turbo and it eventually blew up with an out of memory error, but you could probably make it work against one of the other servers, download an offline copy, or else break the globe up into chunks.

[out:json];
(
    relation["type"="boundary"]["admin_level"=8];
);
out tags;
>;

Okay, so now you have a list of OSM ids for city boundary relations. For example, the boundary relation for Honolulu has an ID of 119231.

Next, you have to get the area ID for Honolulu. Areas don’t exist in OSM; they’re only an Overpass concept. The area ID is just the relation ID plus 3.6 billion (3,600,000,000). So, the area ID for Honolulu is 3600119231.
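
In code, that conversion is a single addition. A minimal sketch (the function name `relation_to_area_id` is made up for illustration):

```python
# Overpass derives the area ID of a relation by adding a fixed offset.
RELATION_AREA_OFFSET = 3_600_000_000


def relation_to_area_id(relation_id: int) -> int:
    """Convert an OSM relation ID into the matching Overpass area ID."""
    return RELATION_AREA_OFFSET + relation_id


print(relation_to_area_id(119231))  # Honolulu -> 3600119231
```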

Now, you make an overpass query for all of the ways in Honolulu, excluding all the things that aren’t streets:

[timeout:180][out:json];

area(3600119231);
(
way(area)
["name"]
["highway"]
["highway" !~ "path"]
["highway" !~ "steps"]
["highway" !~ "motorway"]
["highway" !~ "motorway_link"]
["highway" !~ "raceway"]
["highway" !~ "bridleway"]
["highway" !~ "proposed"]
["highway" !~ "construction"]
["highway" !~ "elevator"]
["highway" !~ "bus_guideway"]
["highway" !~ "footway"]
["foot" !~ "no"]
["access" !~ "private"]
["access" !~ "no"];
);
 
(._;>;);
out body;

I ran that on overpass turbo and it gave me a result in about 5 seconds. Not bad for a medium-sized city. The response will have all of the nodes first (id, lat, lon) followed by all the ways which list the nodes contained within each way, in sequence order.

Voila!

(Apologies, I feel like we’ve already discussed this but I want to make sure I understand)

As I’m iterating over the last query’s response elements array… I’m going through all of the items where type == way, and each segment of the street will be returned to me “in order”.

  • Is this “in order” in reference to an order that is drawable on a map?
    • If I draw each segment in its provided order, placing each of its nodes on the map in their given order, will I get a single continuous line?
    • Or, is OSM expecting me to handle each segment in isolation, with no continuous path from “start” to “end” of the full street?
  • Is this order based on the id field?
    • Can I sort by id to retain this order?
    • Or, is this order based on the actual index location of this item in this elements array?

The order is based on the order that they appear in the array within each way. Each node has a unique ID and can appear in multiple ways.

Per spec:

A way is an ordered list of nodes
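
To make that concrete, here is a small sketch of reading the elements array: nodes are indexed by ID first, then each way's nodes list is walked in the order given. The sample data is invented, and `way_coordinates` is a hypothetical name.

```python
def way_coordinates(elements):
    """Map each way ID to its (lat, lon) points, in the way's own node order."""
    # Index every node by ID so ways can look up coordinates.
    nodes = {
        e["id"]: (e["lat"], e["lon"]) for e in elements if e["type"] == "node"
    }
    # The order comes from each way's "nodes" array, not from node IDs.
    return {
        e["id"]: [nodes[node_id] for node_id in e["nodes"]]
        for e in elements
        if e["type"] == "way"
    }


# Invented sample in the shape of an Overpass JSON "elements" array.
sample_elements = [
    {"type": "node", "id": 1, "lat": 42.0, "lon": -72.8},
    {"type": "node", "id": 2, "lat": 42.1, "lon": -72.7},
    {"type": "node", "id": 3, "lat": 42.2, "lon": -72.6},
    {"type": "way", "id": 10, "nodes": [2, 3, 1], "tags": {"name": "Cherry Street"}},
]

print(way_coordinates(sample_elements)[10])
```

Sorting nodes by ID would lose the drawing order; only the position inside the way's nodes array matters.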

Thank you!

With your help, I think I’ve come up with this new plan which will allow me to keep the Country/Region/City relation structure that already exists (with your global city collection plan, I don’t think I would have knowledge of the Country/Region data for each city).

I’ll do this process for every Region in the CityStrides database:

Get all the Cities within the Region

For example, retrieve all Cities within Massachusetts, United States:

[out:json];
area['ISO3166-2'~'^US']['admin_level'='4']['name'='Massachusetts'];
(relation['admin_level'='8'](area););
out tags;

Iterate over the elements array, using the ID to collect all of the streets in each city

For example, retrieve all the Streets (with their Nodes) within Holyoke, MA United States:

[timeout:180][out:json];
area(3601181614);
(
way(area)
["name"]
["highway"]
["highway" !~ "path"]
["highway" !~ "steps"]
["highway" !~ "motorway"]
["highway" !~ "motorway_link"]
["highway" !~ "raceway"]
["highway" !~ "bridleway"]
["highway" !~ "proposed"]
["highway" !~ "construction"]
["highway" !~ "elevator"]
["highway" !~ "bus_guideway"]
["highway" !~ "footway"]
["foot" !~ "no"]
["access" !~ "private"]
["access" !~ "no"];
);
(._;>;);
out body;

Iterate over all of the "way" items in the elements array to create the Street/Node records

I think for now I’ll stick with my current data structure (condensing down all of OSM’s ‘segments’ into single Street records within CityStrides), then I’ll find/create the Street by the name value for each item, and create each Node in the nodes array (storing its index in the array as an osm_order field of some sort).
I may opt to expand that osm_order field by including both the way ID as well as the node index e.g. 8766660-0, 8766660-1, 8766660-2, etc. This would allow me to draw Streets on the map as lines.
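
A sketch of what that grouping could look like, assuming the Overpass JSON shape from the query above. The function name, the dictionary layout and the node IDs are all made up for illustration; only the way ID and the "wayid-index" format come from the description.

```python
from collections import defaultdict


def street_nodes_by_name(elements):
    """Group way nodes under the way's name, tagging each with an osm_order key."""
    streets = defaultdict(list)
    for element in elements:
        if element["type"] != "way":
            continue
        name = element.get("tags", {}).get("name")
        if name is None:
            continue  # the query already requires a name; this is a safety net
        for index, node_id in enumerate(element["nodes"]):
            streets[name].append(
                {"node_id": node_id, "osm_order": f"{element['id']}-{index}"}
            )
    return dict(streets)


# Invented node IDs; the way ID matches the example in the post.
sample_ways = [
    {"type": "way", "id": 8766660, "nodes": [11, 12, 13], "tags": {"name": "Cherry Street"}},
]

for record in street_nodes_by_name(sample_ways)["Cherry Street"]:
    print(record["osm_order"], record["node_id"])
```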

I do expect that I’ll need to stand up my own Overpass server to do all this work against, so that I can avoid rate limiting. I am worried about the build time there - based on the docs, it looks like it could take 48 hours to start up an Overpass server.

I think that I can do the whole process in parallel across Regions. Maybe have 10-20 processes running at once, each working on their own Region.

Update: I’ve just realized that I need a geojson copy of the city border, in order to display that on the map. It looks like I can use out geom; to include the lat/lon values that comprise each City’s border. Then I’d have to build the geojson by hand…

This example is for all the cities in Massachusetts:
[out:json];
area['ISO3166-2'~'^US']['admin_level'='4']['name'='Massachusetts'];
(relation['admin_level'='8'](area););
out geom;
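
One low-effort way to build that geojson is sketched below. It assumes the JSON returned for `out geom;` lists each member way with a "geometry" array of lat/lon points, and that border ways carry the "outer" role (not every boundary is tagged that way). It emits the border as a MultiLineString of the individual ways rather than a stitched Polygon, which is enough to draw an outline. `boundary_feature` is a hypothetical name and the sample coordinates are invented.

```python
import json


def boundary_feature(relation):
    """Build a GeoJSON Feature outlining a boundary relation from `out geom` data."""
    lines = [
        # GeoJSON wants [lon, lat], the reverse of Overpass's lat/lon keys.
        [[point["lon"], point["lat"]] for point in member["geometry"]]
        for member in relation.get("members", [])
        if member["type"] == "way"
        and member.get("role") == "outer"
        and member.get("geometry")
    ]
    return {
        "type": "Feature",
        "id": relation["id"],
        "properties": {"name": relation.get("tags", {}).get("name")},
        "geometry": {"type": "MultiLineString", "coordinates": lines},
    }


# Invented member data in the shape of an `out geom` relation element.
sample_relation = {
    "type": "relation",
    "id": 1181614,
    "tags": {"name": "Holyoke"},
    "members": [
        {"type": "way", "ref": 1, "role": "outer",
         "geometry": [{"lat": 42.2, "lon": -72.6}, {"lat": 42.3, "lon": -72.7}]},
        {"type": "node", "ref": 2, "role": "admin_centre", "lat": 42.2, "lon": -72.6},
    ],
}

print(json.dumps(boundary_feature(sample_relation)))
```

A filled polygon would additionally need the member ways joined end to end into closed rings.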

The next hurdle is how to continue doing this (monthly/quarterly/yearly?) to keep everything up to date. I think that depends on how long the process above takes…
My thought at the moment is to have this whole process writing to a brand new database, which I can swap out in the live site. This would also mean that I’d need to reprocess all activities against the new database - so I might also extract that out to its own database that also gets swapped out in the live site.

  1. New database for Country/Region/City/Street/Node data & new database for completion data
  2. Run the above process to completely populate Country/Region/City/Street/Node data
  3. Reprocess activities into the new completion database
  4. Swap out both databases from old → new for the website

:man_shrugging: we’ll see…

Updates you say?

Overpass has you covered. Add:

[diff:"2019-08-01T00:00:00Z"]

…to the first line, and it will give you only the differences since the listed date. It only supports XML, but it returns very small output files with what has been created, deleted, or changed. In theory you should be able to do incremental re-processing rather than complete bulk loads.
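
As a small sketch of what that first line might look like when generated from code: the helper name and the 180 second timeout default are assumptions, and the output format is set to XML per the note above.

```python
def diff_settings(since, timeout=180):
    """Build an Overpass settings line that requests changes since `since`.

    `since` is an ISO 8601 UTC timestamp string, e.g. "2019-08-01T00:00:00Z".
    """
    return f'[out:xml][timeout:{timeout}][diff:"{since}"];'


print(diff_settings("2019-08-01T00:00:00Z"))
```

The rest of the query (the area and way statements) would follow this line unchanged.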

Ugh. I just hit a snag with this result set… Maybe you can help me figure it out.

This query returns all of the streets, with their nodes, in a city. :+1:
The nodes in the returned nodes array for each way are in a displayable/usable order. :+1:
However, streets can be split up among multiple way records & these don’t appear to be in a usable/displayable order. :frowning_face:

I spot-checked a single street (Cherry Street in Holyoke, MA US because I know that this is split into many sections). If I work through the instances in the order they’re provided to me (seems to be sorted by OSM ID) then the sections are out of order. This means I can’t convert it to an encoded polyline.

:man_shrugging: I guess it’s not the end of the world, but encoded polylines are much more efficient. :slightly_frowning_face:
Do you have any ideas, or would you just plop the individual sections on the map (or set an encoded polyline for each section, instead of each street)?

No worries if all you’ve got is a big :man_shrugging: … you’ve been a huge help already!

OSM only knows about nodes, ways, and relations. So yes, there is no implied ordering between multiple ways that happen to have the same name – they’re completely separate objects. If you must conjoin adjacent ways, you have to walk through the node endpoints and stitch them together. It’s actually not a terribly difficult piece of code to write. City boundaries have this problem too: you have to stitch the individual ways together to make a usable polygon.

Also keep in mind that streets are not necessarily contiguous. They could be divided into several disjoint sections in the same city. For a particularly extreme example of this, check out Cape Coral, Florida.
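
A minimal sketch of that endpoint stitching, working on the node ID lists of same-named ways. It is a simple greedy approach: it joins whichever way it finds first at a shared endpoint, so at junctions where three or more sections meet the result depends on input order. Disjoint sections come back as separate chains. `stitch_ways` is a hypothetical name and the node IDs in the example are invented.

```python
def stitch_ways(ways):
    """Join node-ID lists that share an endpoint into longer chains."""
    remaining = [list(way) for way in ways if way]
    chains = []
    while remaining:
        chain = remaining.pop(0)
        extended = True
        while extended:
            extended = False
            for i, way in enumerate(remaining):
                if way[0] == chain[-1]:
                    chain.extend(way[1:])                 # append as-is
                elif way[-1] == chain[-1]:
                    chain.extend(reversed(way[:-1]))      # append reversed
                elif way[-1] == chain[0]:
                    chain[:0] = way[:-1]                  # prepend as-is
                elif way[0] == chain[0]:
                    chain[:0] = list(reversed(way[1:]))   # prepend reversed
                else:
                    continue
                del remaining[i]
                extended = True
                break
        chains.append(chain)
    return chains


# Three sections of one street (out of order) plus one disjoint section.
print(stitch_ways([[3, 4], [1, 2], [2, 3], [10, 11]]))
```

Each resulting chain could then become its own encoded polyline, which sidesteps the case where a street's sections are not contiguous.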

I might be misunderstanding - coding is a bit over my head and I got lost at encoded polyline - but is your intention to combine all streets with the same name in a city into one? This seems to be what happens at present - it might just be a quirk of English road names being quite repetitive, but I’ve noticed fairly often that running all the nodes in a street named, say, Church Street, won’t complete it because there are five other distinct Church Streets in the city and all nodes for all of them are lumped in together.

I don’t know if this is an OSM thing or a citystrides import thing, but I thought it might be worth mentioning.

is your intention to combine all streets with the same name in a city into one?

At the moment, yes, that’s my plan.

The reason is that OpenStreetMap doesn’t have the same concept of “Street” that CityStrides has. In CityStrides, there’s one thing - a Street - that has many Nodes. While in OSM, there are just piles of different ways which have many nodes - so a single street may be split up across a dozen different ways.
When I import that data into CityStrides, I need to decide how to handle it. At the moment I base it on the name of the way, so all of the ways with the name of “Church Street” are all lumped together as one Street. In my area, that’s not a problem - it sounds like this can be an issue in some areas.

It would be quite a bit of effort to determine which streets need to be split up, and maintain that split over time (an upcoming effort is to continuously sync OSM data into CityStrides).

:thinking: :man_shrugging:

That makes sense, thanks for explaining.
