Filter out buildings and other non-runnable features

Okay, I see what they did. Basically, this script pulls down every way that has a “name” tag.

I modified this a bit to also exclude any ways that have one of a handful of “definitely not runnable” features.

See Code: https://repl.it/@BrianSperlongan/QuickwittedIndolentNonagon

Sorry, not able to test or run it, but I think you get the idea…

2 Likes

Quick question… is the process to flag the non-runnable features AND mark them manually complete, or just flag them?

I think I mentioned somewhere else that it would be great if we could flag/complete individual nodes… gated communities being one example. The street might be a runnable street but end prematurely at a gated community, so I don’t want to manually complete the entire street, just the nodes I can’t get to.

4 Likes

Marking them complete will screw up your percentages. Frankly, I wish the “mark manually complete” option would go away. If it’s in the system and it’s runnable, you should have to run it.

1 Like

Could something like this be done?

CityStrides users/volunteers could submit a list of all streets, in a predetermined format, for their town, and that list would be used as the standard for completing 100% of the streets in that town. If a user completes all the streets on that list, it would show 100%. If a new street was created due to a new development, a user could submit the new street name to be added to that list, and users would then see a completion of 99%, signaling them that a new street has been created in that town.

Users could continue to run other features, like parks, trails, etc and have them display on their LifeMap but would not count towards % of streets complete.

2 Likes

Sounds like the perfect solution to me.

The ‘manually complete’ idea came from the worry that someone might be an avid runner who isn’t using a run tracking service … then they start both run tracking & CityStrides … and they’d want to be able to tick off all the streets they know they’ve run (maybe they have a particular loop they do all the time).
That, or GPS issues during tracking that manage to miss some Nodes on a run, or something.

Thanks for sharing your perspective on this. I’m imagining what things might look like without that feature. I think the GPS issues problem is real (especially the way that I currently determine if a Node is ‘completed’ or not), but I’m not sure if that occurs frequently enough to really matter. I think the ‘new user with an untracked history’ problem is garbage, upon reconsidering it. :laughing:

I want to see what everyone thinks about this idea, so I added it to the Ideas category.

@JamesChevalier, what are you using as the input for generating the streets list? Are you working from the offline copy of OSM? I’ve been playing with a subsetting tool called osmfilter, and it looks like it’s possible to filter an offline file in place, which could solve the vast majority of the problem with non-runnable features without having to change any code – the global OSM input file could just be pre-filtered before letting CityStrides loose on it.

Is that a viable approach?

Also, this could considerably cut down on processing time if your starting OSM file is much smaller.
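
To make that concrete, a pre-filter pass might look something like this. It’s only a rough sketch: the Geofabrik-style file names are placeholders, and the exact tag expression would need tuning.

# osmfilter can't read .pbf directly, so convert the regional extract to .o5m first
osmconvert connecticut-latest.osm.pbf -o=connecticut.o5m
# keep only named ways that carry a highway tag; everything else is dropped before import
osmfilter connecticut.o5m --keep="highway= and name=" -o=connecticut_runnable.osm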

Re ‘new user with untracked history’: I’m one such user, in a way. While I did start out with a Garmin watch, for a few months/years (?) I switched completely to an app that rewarded my runs with gift cards but isn’t recognized by CityStrides (i.e. I used only one tracking device at a time). My plan is to re-run whatever streets I know I already ran back then. It may be possible to find the old data and somehow import it into CityStrides, but it’s more enjoyable just re-doing the physical work.

I’ve thought about this also, but I realized that before the advent of GPS devices, most runs were out-and-back or loop courses whose distance was determined by measuring with an odometer from a car or bike. Even if all my years of old data were uploaded, I would have to run those routes again just to reach all the missing side streets that I didn’t run. So… it is much easier for a new user to start with a clean slate and enjoy the process.

I would love that option too–I’m down to the last 2% of my city and have quite a few streets that just have a few nodes I can’t get to because they’re in driveways or behind a fence or something. I agree that it would be nice to just be able to mark the node as non-runnable and not the whole street. Regarding the process of flagging them, I think maybe it’s changed, because when I went to flag one yesterday I no longer had the option of explaining why the street wasn’t runnable. There used to be a box for a comment but I’m not seeing that anymore. I’m guessing maybe because James doesn’t have the time to review them anyway? I know at one point he was deleting all of the non-runnable features manually and it was taking forever!

1 Like

Sorry, still super busy post-house-purchase ( :tada: ) … I’ll try to reply with more thoroughness as soon as I get the chance.

There is a new street flagging system, along with a new “flagged street review system” (for lack of a better term). There is a Flagged Streets link in the menu for subscribers. It’s not available to everyone yet because I’m still working on it (monthly contributors get early access to some stuff).

These changes enable me to make a “street modification system” (again, I need some Marketing Terms :smile: ). So in the future we will be able to suggest new streets and suggest changes (delete nodes or move them around).

:rocket:

2 Likes

Congrats on the house! Just a thought: would it be an idea to use the real source, OpenStreetMap, to improve your DB? When a few nodes are off, or a street isn’t public, one could enter the changes in OpenStreetMap and you could refresh from that source. That way we have a double win: OSM is improved and you wouldn’t have to do too much manually (assuming OSM downloads are easy to work with).

How about we start by removing all the non-streets from the database?

I love your enthusiasm.

What you’re stating is very simple from a non-technical perspective. It is quite difficult to do while also not deleting tons of good data.

I’m doing everything I can to give CityStrides users the best, within my limits. :smile:

1 Like

What you’re stating is very simple from a non-technical perspective. It is quite difficult to do while also not deleting tons of good data.

@JamesChevalier, this idea of accidentally throwing away good data is simply not true. If you were to use the osmfilter utility to strip out all nodes that don’t have a highway= tag, you would eliminate 95%+ of the bad data in the data set. It would also eliminate zero good data. Zero. EVERY path, trail, street, or highway has this tag. I challenge you to come up with a single runnable feature that doesn’t have a highway= tag.

You are wasting your own time and everyone else’s time trying to do human review on every lake, park, and McDonald’s in the data set.

Also, if you pre-filtered the data set, it would make it smaller and cut down on processing time.

I’d be happy to help with coming up with the right command line switches for osmfilter.
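
For example, here is one way to sanity-check that claim against any extract before committing to the filter. This is just a sketch; it assumes the extract has already been converted to .o5m with osmconvert, and the file names are placeholders.

# list every value of the highway key and how often it occurs
# (footways, tracks, residential streets, and so on all show up here)
osmfilter region.o5m --out-key=highway
# then produce the filtered file and eyeball what survives
osmfilter region.o5m --keep="highway=" --drop-relations -o=region_highways_only.osm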

1 Like

A big problem in all of this, and the reason that I claim it’s quite difficult, is that I already have 70GB+ of data in CityStrides that was imported with the old script. No amount of OSM tooling is going to fix this problem. While technically possible I’m not willing/able to spend the hundreds of dollars over some months to duplicate portions of my infrastructure, do a clean import, reprocess all activities against the new data, and switch over to use this new data.

I might be able to use your suggestions (going back to December and January in this thread) to make sure that new cities don’t include lots of bad data. In the past I’ve found that if I only import ‘highway’ tags, I lose out on things that are not tagged completely. You’re making a strong argument against this, so perhaps some things have changed – I’ll have to take another look.

This thread has run quite long, so I want to point out some key entries that explain some of the details:

2 Likes

I might be able to use your suggestions (going back to December and January in this thread) to make sure that new cities don’t include lots of bad data.

This is a pragmatic approach. It would at least stop the unconstrained growth of the “flagged streets” queue and make it manageable.

While technically possible I’m not willing/able to spend the hundreds of dollars over some months to duplicate portions of my infrastructure, do a clean import, reprocess all activities against the new data, and switch over to use this new data.

I would totally donate to that project :smiley: . If there were a process for doing this, you would also solve the problem of adding new streets to cities (which I’ve definitely encountered) as new streets are added all the time.

Hypothetically – is there any pedigree maintained between CS and OSM? If IDs were maintained, then you could build in all sorts of functionality by reaching back to the big data set on a piecemeal rather than bulk-import basis.
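
As one illustration of the piecemeal idea (the way ID below is made up): if CityStrides kept the OSM way ID alongside each street, refreshing a single street would just be a matter of asking the OSM API for that one element again.

# fetch a single way, plus all of its nodes, from the live OSM database
curl "https://api.openstreetmap.org/api/0.6/way/123456789/full"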

1 Like

@JamesChevalier so check this out.

I was able to follow your process for generating JSON files and I used the small town of Moosup, CT as a test case. I created an OSM file for Moosup by using the corresponding .poly from your repo and chopping it out of the Connecticut OSM file. So far so good.
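
(For anyone wanting to reproduce that clipping step: osmconvert’s polygon option is one way to do it. A sketch only, with a Geofabrik-style file name as a placeholder; the exact invocation may differ from what I used.)

# cut the Moosup polygon out of the statewide extract, keeping border-crossing ways intact
osmconvert connecticut-latest.osm.pbf -B=moosup.poly --complete-ways -o=moosup.osm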

Next, I created a “clean” OSM file for Moosup (stripping out non-runnable features) with the following osmfilter command:

osmfilter moosup.osm --keep="highway= and name=" --drop="access=no or access=private" --drop-relations -o=moosup_clean.osm

This command keeps all nodes/ways that have a highway= tag AND have a name. It also drops relations, which are not used by the Ruby scripts, and drops any nodes/ways explicitly tagged as no-access or private.

I then ran the two Ruby scripts against both the original and the clean version. I did some cursory checks, and it looks like the streets are still present in the clean version while things like rivers are gone.

Also, I was able to get some serious file size reduction!

OSM file:
Original = 438 kB
Clean = 166 kB

Nodes JSON file:
Original = 112 kB
Clean = 45 kB

Streets JSON file:
Original = 80 kB
Clean = 42 kB

I put my example files in a GitHub repo for reference.

What would you say to trying this pre-filter on a new city as a test case?

2 Likes

Can we see a list of what’s been deleted? I’ve lost a good number of “streets” I’ve actually run, legally. Someone in Denver is deleting stuff they just don’t want to run.

3 Likes

I am currently the only person capable of deleting data. I’ve been deleting many non-street records based on their names (well-known business names, parks, ponds, rivers, etc.).
Is there any chance you recall a street name that you have completed that is no longer present?