Fixing street double count

On a recent run I had an idea of how to resolve the street double-counting problem. I am not a database expert, so maybe there is something obviously wrong with this approach that makes it unworkable. If this idea is bad, hopefully better ones will emerge through Cunningham’s Law :smile:. Aside from the syncing stuff, I think the lack of comparability of strider stats due to double counting is one of the main issues on the site, so I’m curious to hear everyone’s thoughts!

The idea is to add a level to the current Country->Region->City->Street->Node structure, turning it into Country->Region->City->SubCity->Street->Node.

In the new hierarchy, the City level would be similar to the current Region level, in the sense that it would contain a number of SubCities. The way to implement this would be to turn all existing Cities into SubCities of themselves, such that for each City SubCity0 is essentially the city itself. Additional SubCities can then be added as required, like it is possible to add Cities to Regions currently. Importantly, the City would take street completion data from SubCity0 only, and City level data would continue to be used for any reporting on a macro level.

For example, Manhattan and Manhattan Community Board 1 are currently causing double counting.

The structure is:

USA->NewYork->Manhattan
USA->NewYork->Manhattan Community Board 1

This would turn into:

USA->NewYork->Manhattan->

  1. Manhattan
  2. Manhattan Community Board 1
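To make the proposal concrete, here is a minimal Python sketch of the SubCity idea. It is not how CityStrides actually stores anything: the class names, the `counted_streets` helper, and the street names are all assumptions made up for illustration. The only rule it encodes is the one described above, that a City takes its completion data from SubCity0 only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SubCity:
    name: str
    # Hypothetical: names of streets a runner has completed in this area.
    completed_streets: set[str] = field(default_factory=set)


@dataclass
class City:
    name: str
    # sub_cities[0] plays the role of "SubCity0", i.e. the city itself.
    sub_cities: list[SubCity] = field(default_factory=list)

    def counted_streets(self) -> set[str]:
        """Streets that count toward runner totals: SubCity0 only."""
        return self.sub_cities[0].completed_streets


manhattan = City(
    "Manhattan",
    [
        SubCity("Manhattan", {"Broadway", "Wall Street", "Canal Street"}),
        SubCity("Manhattan Community Board 1", {"Broadway", "Wall Street"}),
    ],
)

# Total under the proposal: only SubCity0 contributes.
total = len(manhattan.counted_streets())

# Total if every area were counted independently (the double count).
naive = sum(len(s.completed_streets) for s in manhattan.sub_cities)

print(total, naive)
```

Each SubCity still carries its own completion set, so per-area stats and leader boards would remain possible alongside the single city-level total.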

The pros:

  • The processing of streets would stay unchanged
  • Double counting is eliminated, since only streets completed in SubCity0 (i.e. the city itself) are counted toward runner totals. All street count figures become comparable, as do leader boards
  • Each SubCity can have its own completion stats and leader boards, since processing etc. is the same as it is for cities currently
  • As long as suitable objects exist in OSM it is easy to create smaller projects for those daunted by big cities
  • SubCities are not constrained geographically or in number. Basically, it would be possible to add in any number of SubCities without breaking the structure. This opens a number of possibilities, like custom areas (maybe with a citystrides tag in OSM), or challenges like “run this random selection of 100 streets”.

The cons:

  • Current overlapping areas would have to be manually assigned to their “proper” city as Subs and deleted. Since it’s a finite list, this should be fixable
  • All accounts would need to be reprocessed (a good reason to implement this before map updates, imo)
  • The layout of the site would need to be adjusted to reflect the new structure

Do you have insight into his data structures already? I made some suggestions on another thread but that was just shooting from the hip as I don’t have a deep understanding of how he stores the data or how the city data is related. For example I have “Village of Menands” & “Village of Colonie” which are both part of “Town of Colonie” but is that information readily available from OSM and stored in City Strides? If so the problem seems not overly complex but I’m not familiar with how many other “overlap” issues exist…only the ones in my area. Also someone brought up the problem of a road that cuts across many cities.

In my example below, City AB and City AC are actually part of City A and share full streets with it, and City D has a shared street where part of the road is in both cities. The roads for AB, AC and A that overlap should not be separate records but the same street with the same id. Use the complete column on the city_streets table to designate these roads as complete for the towns. A street is only marked complete in the streets table when it has been marked complete for all the cities it exists in, yet this still allows you to determine that a specific city is complete even though the part of the street that extends beyond the city is not. So if half of the road is in City A and half in City D but you only ran City A’s part, it will show complete for City A in the city_streets table but not in the streets table until you finish the D portion as well. When running aggregate queries for leaderboards/challenges you would only ever count a street once, unless you want to allow the streets shared across towns (not sub-towns) to count multiple times. Anyways, as I said in the other thread, I’m sure things are more complex than this and James probably has a lot of caching tables and other complexities to deal with; this is just a thought based on my limited knowledge.

Assuming tables: cities, streets, city_streets

cities

city_id  city_name
1        City A
2        City AB
3        City AC
4        City D

streets

street_id  street_name  complete
1          Street A     1
2          Street B     0

city_streets

city_id  street_id  complete
1        1          1
2        1          1
3        1          1
4        1          1
1        2          1
4        2          0
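The tables above can be tried out with Python's built-in sqlite3 module. This is only a sketch under assumptions: the real CityStrides schema is unknown, the cities table is left out because the queries only need ids, and the street-level complete flag is derived from city_streets rather than stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE streets (street_id INTEGER PRIMARY KEY, street_name TEXT);
    CREATE TABLE city_streets (
        city_id   INTEGER,
        street_id INTEGER REFERENCES streets(street_id),
        complete  INTEGER,
        PRIMARY KEY (city_id, street_id)
    );
    INSERT INTO streets VALUES (1, 'Street A'), (2, 'Street B');
    INSERT INTO city_streets VALUES
        (1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 1, 1), (1, 2, 1), (4, 2, 0);
    """
)

# A street is complete overall only if it is complete in every city it is in.
street_complete = dict(
    conn.execute(
        "SELECT street_id, MIN(complete) FROM city_streets GROUP BY street_id"
    )
)

# Per-city progress: (streets completed, streets in the city).
city_progress = {
    city_id: (done, total)
    for city_id, done, total in conn.execute(
        "SELECT city_id, SUM(complete), COUNT(*) FROM city_streets GROUP BY city_id"
    )
}

# Counting every city/street row double counts the shared street.
naive_total = conn.execute("SELECT SUM(complete) FROM city_streets").fetchone()[0]

# Counting each fully completed street once.
strict_total = sum(street_complete.values())

print(street_complete, city_progress, naive_total, strict_total)
```

With one row per street and a link table, city 1 can show as fully complete while Street B stays incomplete overall, and the aggregate total counts Street A once instead of four times.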

My understanding is that there are two separate issues: streets that straddle the border of a city incorrectly extending outside the city and being duplicated in the neighbouring city, and entire geographical areas existing in multiple cities in citystrides (in your example, both Colonie and Village of Menands are overlapping with Town of Colonie).

I believe the first issue is already being addressed, but needs a reimport of cities to take effect. My idea only addresses the second duplication scenario. Here the problem is that cities geographically overlap, causing the double count. My suggestion removes that by having overall strider completion figures sourced from only the “main” city (in your example, Town of Colonie would be SubCity0, Village of Menands SubCity1, etc.), leaving the cities (or other areas) contained inside it as personal challenges that don’t affect street counts. In OSM Town of Colonie is an admin level 7 town, the other two are admin level 8 villages, making the overlap possible in the first place.


I think with the right database design it doesn’t matter if the subcities exist, as you can easily write queries to dedupe. I “think” the real issue is that “Street A”, which exists in the main city and possibly one or more sub-cities, is stored in the database as separate records instead of using the same street record for all of them. The fact that I can go to the same street in the example links below with different ids seems to support this (street_id: 21496 & 198114 are the same except for what town they belong to). Ideally there should be just one record for this road; then you could list all the cities that road belongs to using a table that sets up and stores that relationship (city_streets above). All of this assumes that you can actually identify that streets 21496 & 198114 are the same street and store it as just one record instead of two. All of that being said, this is probably a bigger change than just identifying and excluding sub-cities, but it’s probably the better approach long term.

edit: I do like your idea for a short term fix; it does seem somewhat simple if you can easily identify which cities are subcities of which (this is really the key for both scenarios). I think it’s a less clean approach from a data/architecture side, but it also seems like a much smaller, simpler change than restructuring everything.

Considering that he imports individual cities as requested, it would appear to me that the database is structured with streets as a subsection of city, with each city being a separate item.

If I’m correct, then the geographical update will merely truncate and separate the street that straddles two cities into two separate streets, one for each city. BUT each piece will be inside the city, and not extending outside the city limits.

Remember that he’s working with the data in Open Street Map, which has no concept of a street, just segments. He takes all the segments with the same name in a city and assumes that they all make up a single road, whether they are contiguous or not.
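As a rough illustration of the grouping described above, the sketch below collects segments that share a name within a city into one street. The tuple layout, the `group_into_streets` helper, and all names and ids are hypothetical; nothing here reflects the actual import code.

```python
from collections import defaultdict

# Hypothetical input: (city, segment name, segment id).
segments = [
    ("City A", "Main Street", 101),
    ("City A", "Main Street", 102),  # may or may not touch segment 101
    ("City A", "Oak Avenue", 103),
    ("City B", "Main Street", 104),
]


def group_into_streets(segments):
    """Treat all same-named segments within a city as a single street."""
    streets = defaultdict(list)
    for city, name, segment_id in segments:
        streets[(city, name)].append(segment_id)
    return dict(streets)


streets = group_into_streets(segments)
for (city, name), segment_ids in streets.items():
    print(city, name, segment_ids)
```

Because the key is (city, name), the same physical road becomes a separate street record in every city that contains part of it, which is consistent with the duplicate ids reported earlier in the thread.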

I know nothing about the data in Open Street Map. Truncating or breaking up a single street into multiple streets seems like a much different issue than duplicate streets, but I included it as just part of the exercise. Most of the duplicates are just the exact same street multiple times. If the segments that are used to define a street are the same then the streets should be the same, right? I mean, I care a lot less about some 40 mile street that crosses 3 different cities being broken into 3 separate streets than about the same half mile road being duplicated 2-4 times for hundreds of streets in a town.
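One way to test the "same segments means same street" idea is to fingerprint each street record by its name and its set of segment ids. The record layout, the `find_duplicates` helper, and all ids below are invented for the example; whether segment ids are actually available per street is an assumption.

```python
from collections import defaultdict

# Hypothetical street records: street_id -> (city, name, segment ids).
street_records = {
    10: ("City A", "Elm Street", {501, 502}),
    11: ("City AB", "Elm Street", {501, 502}),       # exact duplicate of 10
    12: ("City A", "Long Road", {601, 602, 603}),
    13: ("City AB", "Long Road", {601, 602}),        # only part of 12
}


def find_duplicates(records):
    """Group street ids whose name and segment set are identical."""
    by_fingerprint = defaultdict(list)
    for street_id, (_city, name, segment_ids) in records.items():
        by_fingerprint[(name, frozenset(segment_ids))].append(street_id)
    return [sorted(ids) for ids in by_fingerprint.values() if len(ids) > 1]


duplicates = find_duplicates(street_records)
print(duplicates)
```

Records 10 and 11 collapse into one street, while 12 and 13 do not, because the smaller area only contains part of the road. Partial overlaps would need a separate rule.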

I’m still confused as to your point.

Do you mean the same city block counts multiple times within a single city? Or do you mean the same city block counts as a street segment in multiple separate cities? If the latter, that’s being worked on. Basically, if it’s outside the city limits it will be removed from that city’s streets. Eventually.

The third option is the one best displayed by Manhattan (NY, USA), where there are two “cities” covering the same geographic location. One being the actual island of Manhattan, and the second being an administrative boundary within the island of Manhattan. (Interestingly, I don’t think the site considers New York City a city, just each of the separate boroughs. It’s probably a question of just too many streets in the entire city.)

The site does not (as far as I can tell) even consider unincorporated sections of the world. For example, I live in Deland, FL, which is in Volusia County. While there are a lot of cities in the county – perhaps you’ve heard of Daytona Beach – there’s a large amount of the county that’s not in any city. If I run a street in Deleon Springs or Glenwood, it wouldn’t count for anything on City Strides. What should City Strides do about those unincorporated communities, if anything?

Fredrick, I give specific examples earlier in the thread. Sub-cities count for the sub-city as well as the bigger city. The Village of Menands & Village of Colonie are both contained entirely in Town of Colonie. Whenever you run a street in Menands it also counts for Town of Colonie, as 2 separate streets, when it’s the exact same street. This means that instead of linking the same street to two different cities there are two copies of the same street. This is very prevalent in my area. If you click on any of the leaders in the leaderboard that made their profile public you can see they have huge amounts of duplicates in most of their activities (pretty much the only way you can get into the leaderboards).

One issue I see with storing streets as single records is that subcities could include only a part of a longer street. For example, city A has a fully enclosed subcity B, and there is a street with 3 nodes where all 3 are in A, but only 2 in B.
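The 3-node example can be written out as a small sketch. It assumes completion is decided per city from the nodes of the street that fall inside that city, and that every such node must be visited; the node ids and the `is_complete` helper are hypothetical.

```python
# Hypothetical node ids for one street, and the nodes each city contains.
street_nodes = {"n1", "n2", "n3"}
nodes_in_city = {
    "A": {"n1", "n2", "n3"},  # all 3 nodes are in city A
    "B": {"n1", "n2"},        # only 2 of them are in subcity B
}


def is_complete(city: str, visited: set) -> bool:
    """Complete in a city once every node of the street inside it is visited."""
    required = street_nodes & nodes_in_city[city]
    return required <= visited


visited = {"n1", "n2"}
complete_in_b = is_complete("B", visited)
complete_in_a = is_complete("A", visited)
print(complete_in_b, complete_in_a)
```

A single street record therefore needs a per-city completion flag (or per-city node list): the same run completes the street in B but not in A.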

I have an attempt at this being released right now.

I approached this by:

  • Adding the ability to mark a city as “nested”
  • Adding the ability to relate a nested city to its parent city (a link to the parent city is displayed on nested city pages)
  • Not including nested cities when doing street counts
  • Not including nested cities in the Challenge calculations (unless the Challenge happens to be on a nested city)

The caveat is that I have to mark cities as “nested” manually. I’ve opened up a new thread for people to report those cities: Discussing 'nested' cities
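The approach listed above can be sketched roughly as follows. This is not the released code: the `City` class, the `parent` field, the helper functions, and the sample completions are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class City:
    name: str
    parent: Optional[str] = None  # name of the parent city when nested

    @property
    def nested(self) -> bool:
        return self.parent is not None


cities = {
    c.name: c
    for c in [
        City("Manhattan"),
        City("Manhattan Community Board 1", parent="Manhattan"),
    ]
}

# Hypothetical completed streets as (city, street) pairs.
completed = [
    ("Manhattan", "Broadway"),
    ("Manhattan", "Canal Street"),
    ("Manhattan Community Board 1", "Broadway"),
]


def street_count(completed, cities):
    """Overall street count: skip anything completed in a nested city."""
    return sum(1 for city, _street in completed if not cities[city].nested)


def challenge_count(completed, challenge_city):
    """A Challenge on a specific city counts that city, nested or not."""
    return sum(1 for city, _street in completed if city == challenge_city)


overall = street_count(completed, cities)
board_1 = challenge_count(completed, "Manhattan Community Board 1")
print(overall, board_1)
```

In this sketch Broadway counts once toward the overall total (via Manhattan) and still counts for a Challenge that targets the nested city.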

I’m going to keep this thread open for a bit, to capture bugs / opinions.


Hi James, I’m running some Boroughs of London which were double counted in Greater London, making my street count much too high. I notice that my total street count is now much lower, but it is now actually lower than the number of streets displayed that I have run in a single borough, i.e. total number of streets run = 2198 but number of streets run in Borough of Croydon alone = 2475.

:thinking:
The updated code changes the counting (or, well, should have :flushed: :grimacing:) from
“all streets”
to
“all streets in cities that aren’t nested”

So my guess (I haven’t had time to look into it) is that you’ve got some streets in nestedCity that are complete, but not complete in parentCity.
That or there are some streets that exist in nestedCity that don’t exist in parentCity.
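Both guesses can be checked with simple set comparisons between the nested city and its parent. The sketch below uses made-up street names and plain dictionaries; how streets are actually matched between a nested city and its parent is an assumption (here, by name).

```python
# Hypothetical data: street name -> whether the runner has completed it.
parent_streets = {"High Street": True, "Church Road": False}
nested_streets = {"High Street": True, "Church Road": True, "Mill Lane": True}

complete_in_nested = {name for name, done in nested_streets.items() if done}

# Guess 1: complete in the nested city but not complete in the parent.
incomplete_in_parent = {
    name
    for name in complete_in_nested
    if name in parent_streets and not parent_streets[name]
}

# Guess 2: present in the nested city but absent from the parent entirely.
missing_from_parent = complete_in_nested - set(parent_streets)

parent_total = sum(parent_streets.values())
nested_total = len(complete_in_nested)
print(incomplete_in_parent, missing_from_parent, parent_total, nested_total)
```

Either case makes the nested total exceed the parent total once nested cities are excluded from the overall count, which would match the kind of gap reported above.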

:thinking:

Another guess (haven’t looked at the data yet, just wanted to write down the thought for later) is that the Boroughs of London were imported after I fixed the “streets extending beyond the city border” issue & you’ve completed some streets in the boroughs, but not their entirety as they exist within London.

I notice the ‘Completed Streets’ total from the dropdown list is still the double-counted version as the list of streets continues to show duplicates. The 2198 (too low) figure is the one displayed at the top-left of my profile page.
Personally, I’m not that interested in my overall total - just the individual cities’ totals and they’re all showing correct as far as I can tell.

Great to see this being implemented!

Is there a way to see what areas are nested within a parent? It would be cool to have this as a tab similar to Streets/Striders, otherwise, there is no way of knowing what nested areas are available in a given city.

Maybe nested areas could also be marked somehow in the profile city list (indented maybe?).

I also noticed that the city border outlines have been changed to a different shade of blue that is very similar to what mapbox uses for waterways. It makes the outline very hard to see, especially as part of a busy lifemap (example).

Not yet. I’m displaying the ‘parent’ city as a link on the nested city pages e.g. Manhattan Community Board 1, New York - CityStrides includes “Nested in Manhattan”. I don’t have the reverse UI built yet.

Yeah, seems like that might be a good idea. That whole display needs a re-think anyway.
Eventually I also want to update the completed/progressed street lists to either indicate their city or separate/mark them as parent/nested.

Yeah, someone else mentioned that as well in Something amiss with colour scheme - #14 by hjkiddk
I do plan on fixing that … just wild busy with this street-count effort :sweat_smile:
