Server issues March 21 ~7-9pm Eastern Time

:sweat: :confounded:
The server that stores all the background job info ran out of memory because someone mentioned CityStrides in #ukrunchat on Twitter. I already knew that you all run too much, but it turns out there’s a bunch of other people who haven’t heard of CityStrides yet who also run too much (hey, maybe you can be friends!), so now that I’ve resized that server (goodbyeeeeeee :money_with_wings: :money_with_wings: :money_with_wings:) things are back in shape.

The time frame is a bit of a guess :grimacing: it definitely ended by 9:30pm Eastern, though.

I think there could be some issues around activities synced during this time frame where either the full activity won’t be present in CityStrides or the progress won’t be correctly/fully calculated. So if you notice any troubles with activities that would have synced in that time frame, let me know here and I can clean it up for you. :+1:
An activity might have synced during that time frame if you saved it around then or if you signed up sometime before then & your sync was ongoing at that time. :man_shrugging: That’s not super helpful, but hopefully it’s better than nothing.

If you’re new here and this is your introduction to CityStrides :sweat_smile: oh no :sob:

Update: Of course, everything happens all at once … Today the database also ran low on space. I have alerting set up to catch this in time, so there was never any danger of completely running out. The process of resizing that disk, however, degrades database performance while it runs. The fact that this is happening right when there’s a massive increase in background job activity means I’m having trouble keeping the site up. I’m looking into throttling the background jobs in a way that lets the site keep running. :sweat:


I’ll keep subscribing!!! :grinning: :+1: :metal:


My run from 3/21 at 3:37 EDT hasn’t synced yet. Is this part of the broader server issue?

Could be … I queued up a sync in case it is.

Looks like another swamping/attack.

Yeah, it’s the same situation from my ‘update’ on the first post here.

I had to increase the size of the database disk, which makes the database less performant while the resize runs … that, paired with all the new signups, means things are extremely fragile right now. I’d normally resize the disk at night to avoid all this trouble, but it all kind of happened at once. I’ve got some solid ideas on how to avoid this in the future (mainly taking earlier action on my existing ‘low disk’ alerts :sweat:).

For the original issue - lots of signups queuing up more work than can be handled - I’ll be spending a bunch of time tonight figuring out where I can ‘pause’ things if server resources decrease enough.
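Roughly what I have in mind (a minimal sketch in Python, not the actual CityStrides code — the queue object, thresholds, and check interval are all placeholders just to show the shape of it): watch disk and memory, pause background job processing when either gets tight, and resume once there’s headroom again.

```python
# Hypothetical sketch: pause/resume background jobs based on server resources.
# Uses psutil for system stats; everything else here is illustrative only.
import time
import psutil

DISK_PAUSE_PERCENT = 90    # pause workers when the data disk is this full
MEM_PAUSE_PERCENT = 85     # ...or when memory use climbs this high
CHECK_INTERVAL_SECONDS = 30


class JobQueue:
    """Stand-in for whatever job system is actually in use."""

    def __init__(self):
        self.paused = False

    def pause(self):
        self.paused = True
        print("background jobs paused")

    def resume(self):
        self.paused = False
        print("background jobs resumed")


def resources_strained() -> bool:
    # Check how full the disk and memory are right now.
    disk = psutil.disk_usage("/").percent
    mem = psutil.virtual_memory().percent
    return disk >= DISK_PAUSE_PERCENT or mem >= MEM_PAUSE_PERCENT


def watch(queue: JobQueue) -> None:
    # Simple loop: pause the queue under strain, resume when things recover.
    while True:
        strained = resources_strained()
        if strained and not queue.paused:
            queue.pause()
        elif not strained and queue.paused:
            queue.resume()
        time.sleep(CHECK_INTERVAL_SECONDS)


if __name__ == "__main__":
    watch(JobQueue())
```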

Being selfish, since at this point all my new road runs are on weekends (I’ve run all the roads that I can get to and back in an hour), feel free to do all your maintenance activities during the week. It’s unimportant if it affects anyone else as long as I’m OK.