Filter out buildings and other non-runnable features

kimluce99 · March 21, 2019, 11:14pm

I would love that option too–I’m down to the last 2% of my city and have quite a few streets that just have a few nodes I can’t get to because they’re in driveways or behind a fence or something. I agree that it would be nice to just be able to mark the node as non-runnable and not the whole street. Regarding the process of flagging them, I think maybe it’s changed, because when I went to flag one yesterday I no longer had the option of explaining why the street wasn’t runnable. There used to be a box for a comment but I’m not seeing that anymore. I’m guessing maybe because James doesn’t have the time to review them anyway? I know at one point he was deleting all of the non-runnable features manually and it was taking forever!

JamesChevalier · March 21, 2019, 11:38pm

Sorry, still super busy post-house-purchase … I’ll try to reply with more thoroughness as soon as I get the chance.

There is a new street flagging system, along with a new “flagged street review system” (for lack of a better term). There is a Flagged Streets link in the menu for subscribers. It’s not available to everyone yet because I’m still working on it (monthly contributors get early access to some stuff).

These changes enable me to make a “street modification system” (again, I need some Marketing Terms). So in the future we will be able to suggest new streets and suggest changes (delete nodes or move them around).

petje · March 22, 2019, 7:05am

Congrats on the house! Just a thought: would it be an idea to use the real source, OpenStreetMap, to improve your DB? When a few nodes are off, or a street is not public, one could enter the changes in OpenStreetMap and you could refresh from that source. That’s a double win: OSM is improved and you wouldn’t have to do too much manually (assuming OSM downloads are easy to set up).

zelonewolf · March 23, 2019, 10:34pm

How about we start by removing all the non-streets from the database?

JamesChevalier · March 24, 2019, 2:57am

I love your enthusiasm.

What you’re stating is very simple from a non-technical perspective. It is quite difficult to do while also not deleting tons of good data.

I’m doing everything I can to give CityStrides users the best, within my limits.

zelonewolf · March 24, 2019, 11:32am

What you’re stating is very simple from a non-technical perspective. It is quite difficult to do while also not deleting tons of good data.

@JamesChevalier, this idea of accidentally throwing away good data is simply not true. If you were to use the osmfilter utility to strip out all features that don’t have a highway= tag, you would eliminate 95%+ of the bad data in the data set. It would also eliminate zero good data. Zero. EVERY path, trail, street, or highway has this tag. I challenge you to come up with a single runnable feature that doesn’t have a highway= tag.

You are wasting your own time and everyone else’s time trying to do human review on every lake, park, and McDonald’s in the data set.

Also, if you pre-filtered the data set, it would make it smaller and cut down on processing time.

I’d be happy to help with coming up with the right command line switches for osmfilter.

JamesChevalier · March 24, 2019, 2:52pm

A big problem in all of this, and the reason that I claim it’s quite difficult, is that I already have 70GB+ of data in CityStrides that was imported with the old script. No amount of OSM tooling is going to fix this problem. While technically possible I’m not willing/able to spend the hundreds of dollars over some months to duplicate portions of my infrastructure, do a clean import, reprocess all activities against the new data, and switch over to use this new data.

I might be able to use your suggestions (going back to December and January in this thread) to make sure that new cities don’t include lots of bad data. In the past I’ve found that if I only import ‘highway’ tags, I would lose out on things that are not tagged completely. You’re making a heavy argument against this, so perhaps some things have changed - I’ll have to take another look.

This thread has run quite long, so I want to point out some key entries that explain some of the details:

zelonewolf · March 24, 2019, 5:33pm

I might be able to use your suggestions (going back to December and January in this thread) to make sure that new cities don’t include lots of bad data.

This is a pragmatic approach. It would at least stop the unconstrained growth of the “flagged streets” queue and make it manageable.

While technically possible I’m not willing/able to spend the hundreds of dollars over some months to duplicate portions of my infrastructure, do a clean import, reprocess all activities against the new data, and switch over to use this new data.

I would totally donate to that project. If there were a process for doing this, you would also solve the problem of adding new streets to cities (which I’ve definitely encountered) as new streets are added all the time.

Hypothetically – is there any pedigree maintained between CS and OSM? If IDs were maintained, then you could build in all sorts of functionality by reaching back to the big data set on a piecemeal rather than bulk-import basis.

zelonewolf · March 25, 2019, 12:44am

@JamesChevalier so check this out.

I was able to follow your process for generating JSON files and I used the small town of Moosup, CT as a test case. I created an OSM file for Moosup by using the corresponding .poly from your repo and chopping it out of the Connecticut OSM file. So far so good.

Next, I created a “clean” OSM file for Moosup (stripping out non-runnable features) with the following osmfilter command:

osmfilter moosup.osm --keep="highway= and name=" --drop="access=no or access=private" --drop-relations -o=moosup_clean.osm

This command keeps all nodes/ways that have a highway= tag AND have a name. It also drops relations, which are not used by the ruby scripts, and it drops any nodes/ways explicitly marked as legally no access or private.
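To make the keep/drop rule concrete, here is a rough Python sketch of the same logic applied to plain tag dictionaries. It only mirrors the rule as described in this post; it is not how osmfilter works internally, and the `is_runnable` helper and the sample features are made up for illustration.

```python
def is_runnable(tags):
    """Return True if a feature's tags pass the rule described in this post."""
    # Keep rule: the feature must carry both a highway tag and a name tag.
    if "highway" not in tags or "name" not in tags:
        return False
    # Drop rule: anything explicitly marked private or no-access is removed.
    if tags.get("access") in ("no", "private"):
        return False
    return True


# Hypothetical sample features, not taken from the real Moosup data.
features = [
    {"highway": "residential", "name": "Main Street"},
    {"waterway": "river", "name": "Example River"},
    {"highway": "service", "name": "Mill Lane", "access": "private"},
    {"highway": "footway"},  # a path without a name
]

kept = [f["name"] for f in features if is_runnable(f)]
print(kept)
```

Note that the unnamed footway is dropped too, which is the kind of "not tagged completely" loss James mentioned earlier, so the name= requirement is worth checking on a real city before relying on it.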

I then ran the two ruby scripts against both the original and the clean version. I did some cursory checks and it looks like the streets are still present in the clean version but stuff like rivers is gone.

Also, I was able to get some serious file size reduction!

OSM file:
Original = 438 kB
Clean = 166 kB

Nodes JSON file:
Original = 112 kB
Clean = 45 kB

Streets JSON file:
Original = 80 kB
Clean = 42 kB
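
For a sense of scale, the reductions can be worked out as percentages from the sizes listed above. This is just arithmetic on those six numbers; the `reduction_percent` helper is a made-up name for the sketch.

```python
# File sizes in kB as (original, clean), copied from the figures above.
sizes_kb = {
    "OSM file": (438, 166),
    "Nodes JSON file": (112, 45),
    "Streets JSON file": (80, 42),
}


def reduction_percent(original, clean):
    """Percentage by which the clean file is smaller than the original."""
    return round(100 * (original - clean) / original, 1)


reductions = {name: reduction_percent(o, c) for name, (o, c) in sizes_kb.items()}

for name, pct in reductions.items():
    print(f"{name}: {pct}% smaller")
```

That works out to roughly 62% for the OSM file, 60% for the nodes JSON, and 48% for the streets JSON.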

I put my example files in a GitHub repo for reference.

What would you say to trying this pre-filter on a new city as a test case?

mayeradamj · March 26, 2019, 11:32pm

Can we see a list of what’s been deleted? I’ve lost a good number of “streets” I’ve actually run, legally - Someone in Denver is deleting stuff they just don’t want to run.

JamesChevalier · March 27, 2019, 3:15am

I am currently the only person capable of deleting data. I’ve been deleting many non-street records based on the names (well known business names, parks, ponds, rivers, etc).
Is there any chance you recall a street name that you have completed that is no longer present?

mayeradamj · March 27, 2019, 3:46am

Ah ok, I didn’t know if there was some logic like if X% of people agreed on deleting it was auto removed…

I’m guessing they are all buildings or non street features, lots of stuff from the University of Denver college campus - stuff I got by actually running some weird routes but not genuine streets from what I can see

JamesChevalier · March 27, 2019, 1:03pm

No, I don’t have that logic in place yet - I want to keep an eye on what “bubbles up” to my admin level to get a sense of how accurate everyone is in reviewing flagged streets & if we all share the same perspective.

It looks like I’m confusing more than just you. I thought I was helping … I mean, I am, but I’m not being clear enough along the way & I’m unsure how to indicate that data within CityStrides is dynamic. Maybe I’ll have a better way of showing this after I build out the street modification system…

petje · March 27, 2019, 1:10pm

How about a popup for everyone who logs in at a certain point in time, with a big RED/Purple flashing colour stating you are maintaining the quality of the streets/non streets thingy and statistics and everything related will change in the coming period?

zelonewolf · March 28, 2019, 2:24am

How will the street modification system deal with updates from OSM?

The more I think about this, the better it seems to leave this to an OSM function. i.e., if there’s a street that’s partially private, the correct approach should be to mark that section of the street private in OSM and then CityStrides should update from OSM.

It seems silly to create a whole infrastructure and user interface for map editing when there’s a mature existing infrastructure already in place in OSM.

petje · March 28, 2019, 9:11am

I already mentioned in other threads that, for instance, they built a 28 km long new highway over here. That messed up the infrastructure really well. Some runners ran the streets a few years ago, but just this morning, for instance, I had to mark a street as manually completed because it doesn’t exist anymore, and several others as manually completed because they are now part of an unrunnable highway.

JamesChevalier · March 28, 2019, 12:47pm

There is no OSM → CityStrides (ongoing/continuous) sync and I don’t think it’s possible to build this. While I was first building CityStrides, I read that the IDs in OSM cannot be trusted because they can change and they can be reused. If that’s still true, then there’s no way to maintain parity between the two sites.

zelonewolf · March 28, 2019, 5:33pm

That’s a good point, actually. If you were to mark part of a street private in OSM, what you’d really be doing is splitting a way into two ways, each with its own ID.

I think what would have to happen is that a user could mark a street as needing update (this button used to exist). Separately, you could have a way to mark a missing street on a map. The UI could point the user to a page that describes what needs to be edited in OSM and the user would confirm that OSM has the correct data.

Then, you could prompt the user to put in the direct link to the corrected/new way or ways (example: Way: Main Street (19386891) | OpenStreetMap)

That way, the system could just query for the one specific way and avoid resource-intensive actions.
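
As a rough sketch of that last step, the snippet below pulls the way ID out of a user-supplied link and builds the OSM API v0.6 URL for that single way. It assumes the link has the usual `openstreetmap.org/way/<id>` form, and the helper names are hypothetical. It does not make the request itself, and nothing here reflects how CityStrides actually imports data.

```python
import re

# Public OSM API v0.6 base; ".../way/<id>/full" returns the way plus its nodes.
API_BASE = "https://api.openstreetmap.org/api/0.6"


def way_id_from_link(link):
    """Extract the numeric way ID from an OpenStreetMap way link, or None."""
    match = re.search(r"openstreetmap\.org/way/(\d+)", link)
    return int(match.group(1)) if match else None


def way_api_url(way_id):
    """Build the API URL that fetches one way together with its nodes."""
    return f"{API_BASE}/way/{way_id}/full"


# Assumed link format, using the way ID from the example above.
link = "https://www.openstreetmap.org/way/19386891"
way_id = way_id_from_link(link)
if way_id is not None:
    print(way_api_url(way_id))
```

Rejecting links that don't match (the `None` case) would keep users from submitting nodes, relations, or unrelated URLs by mistake.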