Last week's downtime and what you can expect from us in the future
April 29, 2014
Between April 17th and April 22nd, many of our customers experienced a service interruption due to a series of large, sophisticated denial of service attacks against our infrastructure. We apologize for the inconvenience and service interruption you experienced. Our team worked around the clock to keep your blogs up and running, but we couldn’t keep your sites online to the level that you expect from us, or to the level that we expect of ourselves. We also want to give you some insight into what happened, what steps we took, and what steps we are taking to try to ensure that we can deliver the service that you deserve.
First, was my data at risk?
No. This was purely a denial of service attack, and your data was never at risk.
So, what happened?
What follows is a reasonably technical description of the attacks we faced. The short version is that we were hit by repeated, large denial of service attacks that left Typepad only intermittently available over the course of five days. If you’re not interested in the technical details, you can skip to the next section.
Starting Thursday, April 17, we were hit by a very large DDoS attack that used NTP, DNS, and SNMP amplification. We worked with our upstream providers to block the attack before it hit our network, and we moved all traffic over to the provider that was able to block the attack and bring sites back online.
On Friday, April 18, the attacker switched to a new attack vector: a SYN flood directed at the IPs we use to host Typepad’s services. To stop the attack, we had to move services around inside the Typepad network. We had services fully back online by 1pm EDT Friday, but about 600 customer blogs were down. Once the attack was mitigated, we reached out to a new vendor and began working with them to further protect our infrastructure and implement long-term mitigations.
The attack returned on Easter Sunday, April 20, spreading across our IP space. We once again worked to stabilize the infrastructure by moving services around. The attack returned again on Monday, April 21, at which point we moved all of the blogs behind a service that could mitigate the attack. This meant that mapped domains would not load: blogs served from a customer’s own domain were unavailable, but .typepad.com blogs were unaffected. All services were back up by 10am EDT on Monday, with the exception of the mapped domains. We spent the rest of Monday working on getting as many customer blogs up as possible, while putting in place a broader mitigation.
We completed the full mitigation implementation on Tuesday, April 22, just as we were hit with another massive flood. We finished bringing all blogs and mapped domains back online (with the exception of customers whose domains were mapped to IP addresses). By early evening, we were just waiting for DNS records to update so that customers could see their sites loading.
Since Tuesday, we have seen new attacks, but our mitigation is holding up (you may have noticed a few minutes of slowness on Friday, when our new defenses were being used to stop an attack). You may also have experienced some small issues resulting from the changes we made to protect our network (things like CSS being cached longer than expected, certain feed readers being blocked, etc.). We believe most of these issues have now been resolved. However, if you are still experiencing lingering issues, please contact our support team via a ticket. They are ready to jump on any issues they see.
Why did it take this long to stop?
The simplest answer: Typepad had never been subjected to this type of attack before, which placed stress on parts of our infrastructure. Each day, Typepad serves up a tremendous volume of content, images, video files, etc. Our systems have always been able to scale and handle whatever traffic was thrown at them. However, these attacks were much larger and more persistent than anything we had previously experienced. We believe that these attacks are similar to the attacks that have recently brought other online services down.
So, this is never going to happen again, right?
Unfortunately, there’s just no way we can say that. This bad actor has proven to be very determined, returning repeatedly with new attack vectors and looking for weaknesses. Every day, we’re putting in place new rules and new ways of stopping attacks. But there is no way for anyone to be 100% protected. We’re working with our partners, providers, and our team to make sure that we are proactive in defending ourselves against future attacks.
Can you give us more detailed information?
Right now, we’re not able to provide more detailed information. The authorities are involved, and providing more information could make it easier for the attacker to find a way to exploit our system. We are going to err on the side of being conservative here. If you really need more information, please submit a support ticket, and our team will do their best to answer your question.
Why didn’t you email us to tell us this was happening?
During an issue like this, our team is very active on Twitter and Facebook, and in answering support tickets. Our Twitter feed is also reflected on our status site, which serves as a one-stop shop for updates on where things stand.
We tend not to use email during an issue like this since email relies on our @typepad.com domain, and that is often the service that is not responding. Posting updates through off-site channels is a safer, more effective way of making sure we can keep everyone up-to-date.
We want to say thank you to the Typepad community, the vast majority of whom were overwhelmingly supportive during the last few days. We’ve gotten so many wonderful tweets, Facebook messages, support tickets, and blog posts supporting us and backing us that we can’t possibly express our gratitude.
If you have any questions, comments, or concerns, please reach out via support ticket, and we will get back to you as fast as possible.
Typepad General Manager