We've resolved the issue we wrote about yesterday, where server performance had been degraded over the past few days. We traced the cause to a ballooning number of entries in our Redis cache, driven by what we believe to be an information-seeking spam operation on our login endpoint. A spammer might issue hundreds of thousands of requests against a login endpoint, each time with a different email, in an attempt to determine whether each email is registered. Done with enough emails, this can allow spammers to compile a list of a service's registered users.
The spammer in this case will have wasted a tremendous amount of resources only to gather nothing, because we're well protected against these kinds of scenarios. Typically, a service will just return "invalid email or password" during a login attempt to disguise whether the account actually exists or not. But there are two tell-tale signs that can sometimes leak out of these endpoints. One is an account lock after too many attempts. If account locks are applied only to real accounts, then a lockout error after too many sign-in attempts tells the spammer that the account does indeed exist.
The other leak is 2FA. If on a sign-in attempt you are prompted for 2FA, this is also an indication the account exists.
Our servers protect against both scenarios and do not allow account-existence leaks. For the case of account locking, we lock out all email addresses after a certain number of invalid logins, regardless of whether the email represents a real account or not.
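To make the idea concrete, here is a minimal sketch of lockout tracking that never consults the account database, so it behaves identically for real and nonexistent emails. It is an illustration under stated assumptions, not our actual implementation: the threshold, the lock duration, the in-memory dictionary standing in for the cache, and the function names are all hypothetical.

```python
import time
from typing import Optional

MAX_FAILED_ATTEMPTS = 5     # assumed threshold, not a value from this post
LOCKOUT_SECONDS = 15 * 60   # assumed lock duration, not a value from this post

# In-memory stand-in for the cache: normalized email -> (failed_count, locked_until)
_attempts: dict = {}


def normalize(email: str) -> str:
    """Treat differently cased or padded spellings of an email as one entry."""
    return email.strip().lower()


def is_locked(email: str, now: Optional[float] = None) -> bool:
    """Return True while the email is inside its lockout window."""
    now = time.time() if now is None else now
    _, locked_until = _attempts.get(normalize(email), (0, 0.0))
    return now < locked_until


def record_failed_login(email: str, now: Optional[float] = None) -> None:
    """Count a failed login for any email, whether or not an account exists."""
    now = time.time() if now is None else now
    key = normalize(email)
    count, locked_until = _attempts.get(key, (0, 0.0))
    count += 1
    if count >= MAX_FAILED_ATTEMPTS:
        # Start the lockout window and reset the counter for the next window.
        locked_until = now + LOCKOUT_SECONDS
        count = 0
    _attempts[key] = (count, locked_until)
```

Because the account table is never checked, the response gives an outsider nothing to distinguish. The trade-off is that the store gains an entry for every distinct email that is attempted.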
For 2FA existence leaks, we present a decoy 2FA prompt randomly (but deterministically) for any email address, regardless of whether the account exists or not. In addition, some services ask for 2FA only after your password is correctly verified. We take this a step further and do not allow password verification without the correct 2FA token first. This prevents outsiders from attempting to make password guesses without having the correct 2FA token.
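One way to get a choice that looks random but is deterministic is to derive it from a keyed hash of the email. The sketch below is an assumption about how such a scheme could work, not a description of our code: the secret, the decoy fraction, the function names, and the rule that existing accounts always follow their real 2FA setting are all hypothetical.

```python
import hashlib
import hmac

SERVER_SECRET = b"replace-with-a-real-secret"  # hypothetical server-side key
DECOY_FRACTION = 0.3  # assumed share of emails that get a decoy prompt


def shows_decoy_2fa(email: str) -> bool:
    """Decide, stably per email, whether to show a decoy 2FA prompt."""
    normalized = email.strip().lower().encode("utf-8")
    digest = hmac.new(SERVER_SECRET, normalized, hashlib.sha256).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < DECOY_FRACTION


def requires_2fa_prompt(email: str, account_exists: bool, account_has_2fa: bool) -> bool:
    """Existing accounts use their real setting; everything else may get a decoy."""
    if account_exists:
        return account_has_2fa
    return shows_decoy_2fa(email)
```

The same email always gets the same answer, so repeating a request reveals nothing new, and without the secret an outsider cannot predict which emails fall into the decoy bucket.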
Back to the server incident: our account lockout entries were stored in a temporary cache in Redis. The number of these entries had ballooned due to a sudden increase in lockout entries for emails that didn't exist. We identified an area of our Redis usage where we had used the SCAN command for searching instead of a constant-time lookup method. After we made some small refactors to our Redis usage to better handle a large number of entries, our server congestion issues were immediately remedied, and we are back to full health.
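The difference between the two access patterns can be sketched as follows. This is a simplified illustration rather than our actual code: the `lockout:` key scheme and the function names are hypothetical, and `InMemoryStore` is a tiny stand-in that mimics two redis-py-style methods (`scan_iter` and `get`) so the example runs without a Redis server. A real client would typically return bytes rather than strings.

```python
from fnmatch import fnmatchcase


class InMemoryStore:
    """Stand-in for a Redis client that counts how many keys each call touches."""

    def __init__(self):
        self.data = {}
        self.keys_visited = 0

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        # Direct lookup: one key touched regardless of store size.
        self.keys_visited += 1
        return self.data.get(key)

    def scan_iter(self, match="*"):
        # Like SCAN, this walks the whole keyspace and filters by pattern.
        for key in list(self.data):
            self.keys_visited += 1
            if fnmatchcase(key, match):
                yield key


def lockout_key(email):
    """Build the (hypothetical) cache key for an email's lockout entry."""
    return "lockout:" + email.strip().lower()


def find_lockout_by_scan(store, email):
    """Slow path: iterate every lockout key until the target turns up."""
    target = lockout_key(email)
    for key in store.scan_iter(match="lockout:*"):
        if key == target:
            return store.get(key)
    return None


def find_lockout_by_key(store, email):
    """Fast path: compute the exact key and fetch it in one lookup."""
    return store.get(lockout_key(email))
```

With 1,000 entries in the stand-in store, searching for an email that has no entry visits all 1,000 keys on the scan path and exactly one key on the direct path, which is why the scan-based search degrades as entries accumulate.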
Unrelated to this incident, we've also recently made improvements to our server-side caching mechanism that should yield a perceptible performance improvement for syncing requests. If your sense of time is particularly acute, you might notice that syncing now completes about 40ms quicker.
Thanks for reading ✓