According to the website there are 775 subscribers. Anyone who runs later in the day has to wait until the next day for their run to upload, which I think everyone will agree sucks. Can you reserve at least 1 run per subscriber so that we don’t have to wait forever? Also, I didn’t realize until recently that some people go to the track and save each interval as a separate Strava activity. Can we limit the number of uploads per day so we don’t always hit the 30k API call limit?
It’s funny, Jeremy, I was just going to post exactly the same thing! (Regarding reserving a slot)
Have 2 pools of Strava slots, one for subscribers (subscriber count * 1.1 or whatever multiplier handles most cases, maybe it’s less than 1.0) and one for non-subscribers. If extra slots are still available in either pool near the end of the day, use them up in the 30 min before the limit resets?
I can’t directly do this, because of how everything works (between CityStrides as well as Strava limits). There might be something that can be done, though… Here are the details, maybe people smarter than me have some ideas.
This is detailed in CityStrides
- 600 requests every 15 minutes
- 30,000 requests every 24 hours
- Full account syncs; this is new members and any Supporters using the “Sync Now” feature
- Individual activity syncs; Strava sends me alerts (webhooks) every time someone saves an activity in Strava which CityStrides responds to by saving the activity
- An activity sync uses 2 API requests to Strava.
- A new-member account sync is 2 API requests per activity plus 1 API request per 100 activities. So 1,000 activities would be (1000/100) + (1000*2) = 2,010 requests.
- A Supporter using the “Sync Now” feature is a little more complicated because I’m skipping activities that already exist in CityStrides but I’m still iterating through all of them. The 1 API request per 100 activities is a constant but the 2 API requests per activity is only for activities that don’t exist in CityStrides.
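The request arithmetic in the list above can be written out as a small helper. This is only a sketch: the function name and the `already_synced` parameter are made up for illustration, and rounding the page count up is an assumption (a partial page of activities presumably still costs one request).

```python
import math

PAGE_SIZE = 100            # activities covered by one listing request (from the post)
REQUESTS_PER_ACTIVITY = 2  # requests needed to save one activity (from the post)


def full_sync_requests(total_activities: int, already_synced: int = 0) -> int:
    """Estimate Strava API requests for a full account sync.

    Listing requests cover every activity; the per-activity requests
    apply only to activities that are not yet in CityStrides.
    """
    if not 0 <= already_synced <= total_activities:
        raise ValueError("already_synced must be between 0 and total_activities")
    listing = math.ceil(total_activities / PAGE_SIZE)
    saving = (total_activities - already_synced) * REQUESTS_PER_ACTIVITY
    return listing + saving


print(full_sync_requests(1000))       # new member with 1,000 activities -> 2010
print(full_sync_requests(1000, 990))  # "Sync Now" with 10 unsynced activities -> 30
```

The second call shows why a "Sync Now" on a mostly synced account is cheap: the listing cost stays at 10 requests, but only the 10 missing activities cost 2 requests each.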
This is detailed in CityStrides
The reason for these throttles is to not block people from signing into CityStrides (sign in is an API request)
- Syncing is paused with 100 remaining requests for the 15 minute limit
- Syncing is paused with 1,000 remaining requests for the 24 hour limit
- Limit “Sync Now” to once per day
- Allow webhooks to work as-is for Supporters, and handle non-Supporters differently (some ideas below)
- Schedule non-Supporters to some time in the second half of the limiting period - if the API limits are all ok, then that would mean webhooks would schedule their activity sync to some time later in the 15-min period (or the next period if the current is exceeded) … if the 24-hour limit is reached, then their activity sync would be scheduled some time later in the next 24-hour period
- Set up a daily non-Supporter limit e.g. 5,000 activities synced per day
- Throttle new member activity syncing (or tie this into the daily non-Supporter limit)
- Limit the number of new signups per day (I’m guessing that the bulk of the rate limit issue is on full-history syncing, which is from new signups)
I guess, taking a step back, CityStrides can sync roughly 15,000 activities per day and I need to figure out a way to split that up among new members, existing non-Supporters, and Supporters.
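For reference, the two pause thresholds from the throttle list above can be sketched as a single check. This assumes the app tracks how many requests it has used in the current 15-minute and 24-hour windows; the function and constant names are hypothetical, and whether syncing pauses at exactly the threshold or just below it is a guess.

```python
SHORT_LIMIT = 600      # requests per 15 minutes (Strava limit from the post)
DAILY_LIMIT = 30_000   # requests per 24 hours (Strava limit from the post)
SHORT_RESERVE = 100    # kept free in each 15-minute window so sign-ins still work
DAILY_RESERVE = 1_000  # kept free in each 24-hour window so sign-ins still work


def syncing_paused(used_15min: int, used_daily: int) -> bool:
    """Return True when syncing should stop so sign-ins keep working."""
    short_remaining = SHORT_LIMIT - used_15min
    daily_remaining = DAILY_LIMIT - used_daily
    return short_remaining <= SHORT_RESERVE or daily_remaining <= DAILY_RESERVE


print(syncing_paused(used_15min=200, used_daily=10_000))  # False
print(syncing_paused(used_15min=500, used_daily=10_000))  # True
print(syncing_paused(used_15min=0, used_daily=29_000))    # True
```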
As I sit here having just mapped roughly 30 new streets but needing to wait almost 6 hours to find out, whatever option prevents this situation I’m fine with.
Without knowing the exact breakdown of new, non-sub, and sub I can’t suggest a solution. If it was mostly new or non-sub I would probably just limit those to a daily limit. I also like your idea of putting them in the second half of the period, but that wouldn’t solve the problem of thousands queuing up at the end of the day.
Ok, so I’ll take a stab, hopefully I’m not missing something entirely obvious…
If I’m reading your post correctly, Strava wants to have API calls limited in total, and also spread over 24h. If you have the ability to control the timing of the calls you make (it sounds like you do) it is possible to structure these in a way that would have a better outcome than first-come-first-served. I think pre-processing the required calls into different priority tiers and directing them to take full advantage of the limits would work.
The priority tiers could look something like this:
- High: activities from subscribers (webhook or “Sync Now”)
- Medium: activities from non-subscribers
- Low: past activities from new members
- Filler: activities of members who have not signed in to the website for x months
Once the lists are established, the calls could be made like so:
- High: immediate
- Medium: first (312 - calls made via High), send calls at the last possible time in the 15 min slots
- Low: if there are calls left after Medium, send these
- Filler: if there are calls left, send these
Unless I missed something, this should leave subscribers with no wait at all, ever, regular activities have same/better wait time, new members imports get processed without clogging up everything else, and fillers obviously don’t care about delays…
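A rough sketch of how the tiers above could be drained within one 15-minute slot, using a priority queue. Everything here is hypothetical: the `Tier` names mirror the list above, the default budget of 312 requests assumes the 30,000 daily requests are spread evenly over 96 slots, and each job is assumed to cost 2 requests. It does not model sending High-tier calls immediately or holding Medium-tier calls until the end of the slot; it only shows ordering under a budget.

```python
import heapq
from enum import IntEnum
from itertools import count


class Tier(IntEnum):
    HIGH = 0    # Supporter activities (webhook or "Sync Now")
    MEDIUM = 1  # non-Supporter activities
    LOW = 2     # past activities from new members
    FILLER = 3  # members who have not signed in for a while


class SyncQueue:
    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker: first in, first out within a tier

    def add(self, tier: Tier, job: str, cost: int = 2) -> None:
        heapq.heappush(self._heap, (tier, next(self._order), job, cost))

    def drain_slot(self, budget: int = 312) -> list:
        """Pop jobs in tier order until the next job no longer fits the budget."""
        done = []
        while self._heap and self._heap[0][3] <= budget:
            _tier, _seq, job, cost = heapq.heappop(self._heap)
            budget -= cost
            done.append(job)
        return done


queue = SyncQueue()
queue.add(Tier.LOW, "new member history page")
queue.add(Tier.MEDIUM, "non-Supporter run")
queue.add(Tier.HIGH, "Supporter run")
print(queue.drain_slot(budget=4))  # ['Supporter run', 'non-Supporter run']
```

Whatever does not fit stays queued for the next slot, so the new-member import waits without blocking anyone ahead of it.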
The new day started at 8pm EST. It’s now 10pm and my run from 2pm still hasn’t uploaded and there are 5800 activities queued. That means we started today with a backlog of about 8200 activities, or about 30% of the next day’s limit. I don’t want to come off as an entitled ass, but this is starting to get pretty ridiculous. It really takes a lot of shine off the product when I have to wait until a day later to view my run as a paying supporter. The longer it takes to find a solution to this the worse it’s going to get for everyone.
I suggest trying to be a little patient.
I have been here for more than 2 years and have never seen so many new members in such a short time.
I think there have been something like 1500 new members in the last 2 weeks.
Not sure where all the new members come from, but CityStrides must have been featured in some popular media somewhere.
And keep in mind this is a one-man hobby project. I know James does whatever he can to keep up, but as long as Strava has limits on their API, there will be problems when so many new members sign up in such a short time.
You could consider using RunKeeper instead. You can copy all your activities with https://tapiriik.com/
This virus is the best thing for James’s hobby, as people are looking for new ways to have fun running without people. I was promoting it as an alternative on a couple of our local running group pages. Time to take it full time, James!
@8f7162110d9eeaf907ab I think that’s a great idea! I’m curious how many accounts on CityStrides are essentially inactive (i.e. the users don’t log in or actually care about seeing updates). Is there some way to filter those out? I have to assume that’s at least a decent percentage.
Great tiering suggestion, just wanted to second what Just_one_more_street said above^
Thanks for giving us information about the processes and your approach. It is difficult to give any advice without knowing everything you have done already, and please excuse me if most of my points in this long message are obvious, but if I were you, I would do/check the following.
Subscribers: I would not think twice: I would first reserve 1000 daily Strava API requests for them and then check how to optimize the whole Strava API process.
Strava API: Obviously, the biggest gain would be if you could upload activities using 1 API request instead of 2. Probably it is not doable, but that’s something I would check at least once a year just to be sure.
Strava Daily limit:
a. Use the last daily API requests you have to alleviate the problem for the next day. I have the impression that almost 1,000 API requests are lost whenever the daily Strava limits are reached.
These almost 1,000 unused API requests should be used in the last 60 minutes before midnight. The pause condition should be relaxed at the end of the day so that no more than a few API requests are lost per day, without blocking sign-ins.
b. Apply once a year for more Strava API requests.
Eliminating process flaws: Check that the API really works the way it is supposed to work.
We know that recently, 29,000 API requests have often been used up as early as 18:00 UTC, sometimes even earlier.
Do you really upload more than 14,000 activities in 18 hours on these days? If yes, you need to understand why, because something is going wrong (depending on the nature of the problem(s), there may be a workaround or not).
To my mind, there must be a limited number of processes or users causing a large number of API requests. It could be that activities are processed several times or wrongly for an as yet unidentified reason. For example:
a. An update in Strava: I expect that most Strava users edit an activity within minutes of uploading it to Strava (to change the title, write comments, etc.). Are you sure that you don’t multiply the requests this way? You could use old data to predict whether the API request should be immediate or not.
b. Impatient ‘Sync now’ requests by subscribers. (It should not have serious consequences on the number of API requests, but you need to be 100% sure.)
c. API requests for activities that are not relevant in CityStrides (bike, swimming…)
Monitoring: Set up monitoring of how API requests are distributed over time, grouped as follows:
a. Individual activity syncs
b. Full accounts sync for subscribers (aka Sync now)
c. Full accounts sync for new CityStrides members
a. Individual activity syncs should grow slowly over time as your site gains members, but the numbers should be both stable and unproblematic.
b. Full account syncs for subscribers should not use a lot of API requests, since most activities must be synced already.
c. That leaves us with the new members, who can have varying numbers of Strava activities, let’s say from 1 to 5,000 for most users. I fully understand why you want new members to get the most out of the CityStrides features from the very first day, and contrary to others on this thread, I encourage you to give them a high priority. But if the daily API calls for this group are very volatile and sometimes exceed 20,000, maybe you need to impose some limits specific to this group. My point is that your approach of “new users first” doesn’t work as soon as the syncing is paused for everybody. The current process does not support your goal.
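As a minimal sketch of the monitoring idea above, the request counts could be tallied per hour and per group. The class, the three category labels, and the example timestamp are all hypothetical; timestamps are assumed to be timezone-aware.

```python
from collections import Counter
from datetime import date, datetime, timezone

CATEGORIES = ("activity_sync", "sync_now", "new_member_sync")  # hypothetical labels


class RequestLog:
    """Counts API requests per (UTC hour, category)."""

    def __init__(self):
        self.by_hour = Counter()

    def record(self, category: str, when: datetime, requests: int = 1) -> None:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        # Truncate the timezone-aware timestamp to the start of its UTC hour.
        hour = when.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
        self.by_hour[(hour, category)] += requests

    def daily_totals(self, day: date) -> dict:
        """Sum the requests of each category over one UTC day."""
        totals = Counter()
        for (hour, category), n in self.by_hour.items():
            if hour.date() == day:
                totals[category] += n
        return dict(totals)


log = RequestLog()
noon = datetime(2020, 4, 1, 12, 30, tzinfo=timezone.utc)  # arbitrary example time
log.record("activity_sync", noon, requests=2)
log.record("new_member_sync", noon, requests=2010)
print(log.daily_totals(noon.date()))
```

Comparing the daily totals of the three groups would show directly whether new-member syncs are the volatile part of the load.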
- Managing new user syncing
a. Time-based approach
If the number of API requests leads to a bottleneck in the afternoon or the evening, you can limit the number of API requests per new user, first limiting the syncing to, let’s say, one year, thus allowing fast syncing for other Strava accounts.
Something like: if Daily_API_request_left > TIME * (30,000 - 1,000) / 24 then syncing iff Today - Last_Synced_Activity_Date < 365
b. Second approach
Beyond the time stamp, you can also make the condition a function of the number of new users in the queue. If you have more new users than usual at a given time of the day, you may further limit the complete syncing so that all of them can enjoy CityStrides. Obviously, you should eventually upload all the activities, and if you want to avoid questions, do it within a reasonable number of days while explaining proactively that the upload may take some time.
I am not a yield management expert, but you can certainly find ideas online to improve on my first ideas. Since I am curious about that, I am happy to help further as well.
I don’t have time to respond to everyone individually, I’m sorry. There are some good ideas in here, and I appreciate the effort from everyone!
I’m going to release an update to how API limits are handled in a few minutes. I’m starting simple, and then I’m going to branch out from there.
I’ve reviewed some numbers and usage, and based on that I’m going to limit the overall usage of the API calls in a way that guarantees Supporters to have 1/3 of the total number of calls per day.
This should be enough to cover all Supporters logging one activity every day (and then some). This does not cover people signing up and then immediately becoming Supporters, and I don’t have a great way to gauge what usage levels that accounts for (yet?).
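One way such a guarantee could work, purely as an illustration and not the actual CityStrides code: hold back a third of the daily requests, and refuse non-Supporter work once it would eat into the part of that reserve Supporters have not used yet. The class and method names are hypothetical, and the sketch ignores the 15-minute limit.

```python
DAILY_LIMIT = 30_000
SUPPORTER_SHARE = DAILY_LIMIT // 3  # 10,000 requests held back for Supporters


class DailyBudget:
    def __init__(self):
        self.supporter_used = 0
        self.other_used = 0

    def try_spend(self, requests: int, supporter: bool) -> bool:
        """Record the spend and return True if it fits, else return False."""
        total = self.supporter_used + self.other_used
        if total + requests > DAILY_LIMIT:
            return False
        if supporter:
            self.supporter_used += requests
            return True
        # Non-Supporters may not eat into the unused part of the reserve.
        unused_reserve = max(0, SUPPORTER_SHARE - self.supporter_used)
        if total + requests > DAILY_LIMIT - unused_reserve:
            return False
        self.other_used += requests
        return True


budget = DailyBudget()
print(budget.try_spend(20_000, supporter=False))  # True: fills the non-Supporter share
print(budget.try_spend(2, supporter=False))       # False: only the reserve is left
print(budget.try_spend(2, supporter=True))        # True: Supporters draw on the reserve
```

In this version Supporters can also use the shared two thirds, so the reserve is a floor for them rather than a cap.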
Based on some poking around, it looks like there’s anywhere from 3-12 people who have done this each day in the past week.
I’m (probably) going to implement a limit on the number of new users every day. That may end up being A Whole Thing, so I want to monitor this situation first for a little bit before I start working on that.
Hi James, did you roll out the new code? I have 2 hikes and a run today that don’t show in the Status page and don’t look processed. Not sure if it’s related to your change or not.
Yeah, step one is done. The queue is weird right now, so I’m probably not going to take any further action until sometime later tomorrow.
Can I manually upload recent runs to MapMyRun while Strava gets figured out? Do you do any deduplication from runs that are basically the same?
edit: also if it helps, looks like my run from a few hours ago synced before it paused but my runs from yesterday did not.
We hit the call limit for the day and we have so many activities queued that we’ve already hit tomorrow’s limit also. I really like this website but did we just come to the surface for our last breath yesterday? If we exceed 15k activities per day we’ll never catch up. Is there a path forward?
It’s a bit early to think about throwing in the towel, no? The problem only becomes unsolvable if there are in excess of 15k new Strava activities per day, which works out to somewhere around 2(?) activities per user per day, on average. As long as that number isn’t reached it’s not a capacity problem but “merely” one of allocation while working through the glut of new signups.
The number of new signups seems to be dropping off a bit now, so it’s possible that the problem will resolve itself in due course. If not there are still ways to shift the wait so it doesn’t suck for active users. These are exceptional times and good solutions don’t happen overnight…
@dallas.devries There’s no de-duplication. The reason a new activity can sync before an old activity is because the old activity could have gotten bumped out due to a limit, then the new activity could have arrived at a time that we were under the limit.
@f05f244860901569362a It’s not that “we’ll never catch up” it’s that it will always be delayed. There’s probably some math to figure out when the delay could be considered infinite e.g. the theoretical addition of 100 million activities each day while only being able to process 15k would effectively make it “stopped” as opposed to “delayed”.
@8f7162110d9eeaf907ab The 30k daily requests works out to roughly 15k activities (2 calls per activity, but there is other API activity like logins that decrease the total throughput). There are 11,323 connected Strava accounts right now, which means that we’re pretty close to the limit being 1 activity per day per connected Strava account.
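The capacity arithmetic above, and the "delayed versus never catching up" question, can be checked with a few lines. This is a back-of-the-envelope sketch: it ignores logins and other API traffic, the helper name is made up, and the arrival rates in the examples are invented numbers, with the 8,200 backlog taken from an earlier post in this thread.

```python
DAILY_REQUESTS = 30_000
REQUESTS_PER_ACTIVITY = 2
CONNECTED_ACCOUNTS = 11_323  # figure quoted in the post

capacity = DAILY_REQUESTS // REQUESTS_PER_ACTIVITY  # 15,000 activities per day at best
per_account = capacity / CONNECTED_ACCOUNTS
print(f"{per_account:.2f} activities per account per day")  # 1.32


def backlog_after(days, start, arrivals_per_day, capacity_per_day=capacity):
    """Queue length after `days`, given constant daily arrivals and capacity."""
    backlog = start
    for _ in range(days):
        backlog = max(0, backlog + arrivals_per_day - capacity_per_day)
    return backlog


print(backlog_after(7, start=8_200, arrivals_per_day=14_000))  # 1200: slowly clearing
print(backlog_after(7, start=8_200, arrivals_per_day=16_000))  # 15200: growing every day
```

As long as daily arrivals stay below capacity the queue shrinks, however slowly; once they exceed it, the delay grows without bound.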
New signups kick off a full account sync - anywhere from dozens to thousands of activities. This is likely the most important collection for me to focus on.
I released a change last night that significantly prioritizes supporters for syncing, but I mistakenly ordered the queues in a way that puts too much priority on activity processing. So, while I expect/hope that all supporter activities will be able to sync over the course of a day, that syncing will be slowed until I can release a fix for that queue order.
Is there a way to prioritize supporter activities based on date? I assume that new users who became supporters are monopolizing the 1/3 pool of requests by importing their activities. Also, is the 1/3 pool used up right away or is it distributed throughout the day? For example, a few new users who become supporters will dry up the pool for those wanting to sync later in the day (assuming that’s how it works). Seems tricky, but I’m just thinking of ideas that would get some of our activities from a few days ago going.
Wow, I was expecting some bias towards Strava (maybe a 50-25-25 split), but nothing like >80% Strava! Really not much wiggle-room left then. Would pausing sync for inactive members free up a significant number of calls?