Now that you know what DoS attacks are and why attackers perform them, let's discuss how you can protect yourself and your services. Most common mitigation techniques work by detecting illegitimate traffic and blocking it at the routing level, managing and analyzing the bandwidth of the services, and being mindful when architecting your APIs, so they're able to handle large amounts of traffic.
The first step of any mitigation strategy is understanding when you are the target of a DoS attack. Analyzing incoming traffic and determining whether or not it's legitimate is essential to keeping your service available and responsive. Scalable cloud service providers may even "absorb" a DoS attack transparently, which is fantastic until you receive an enormous bill for bandwidth or resource overuse. Making sure your cloud provider makes scaling decisions based only on legitimate traffic is the best way to ensure your company is not spending unnecessary elasticity dollars due to an attack. Early detection of an attack dramatically increases the efficacy of any mitigation strategy.
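As a rough illustration of traffic analysis, the sketch below counts requests per source IP in fixed time windows and flags sources that exceed a threshold. The function name, the input shape (timestamp and IP pairs, for instance parsed from an access log), and the threshold of 100 requests per minute are all assumptions made for this example. Real detection usually weighs many more signals than raw request counts.

```python
from collections import Counter


def flag_heavy_sources(requests, window_seconds=60, max_per_window=100):
    """Return the source IPs that exceed max_per_window in any fixed window.

    `requests` is an iterable of (timestamp_seconds, source_ip) tuples.
    The default threshold is a placeholder, not a recommendation.
    """
    counts = Counter()
    for timestamp, ip in requests:
        # Bucket each request into a fixed window of window_seconds.
        window = int(timestamp // window_seconds)
        counts[(window, ip)] += 1
    return sorted({ip for (_, ip), n in counts.items() if n > max_per_window})
```

A flagged address is only a candidate for closer inspection: a busy but legitimate client behind a shared IP address can trip the same threshold.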
The simplest defense against a DoS attack is either whitelisting only legitimate IP addresses or blocking ones from known attackers. For instance, if the application is meant to be used only by employees of a specific company, a hardware or software rule could be created to disallow any traffic that does not come from a specific IP range. For example, a rule for 192.168.0.0/16 would allow any IP address between 192.168.0.0 and 192.168.255.255 and reject any IP address outside that range. If the software is only meant to be used in the US, a rule could be created to allow access only from US IP addresses. Inversely, IP blacklisting adds a rule to reject traffic from specific IP addresses or IP ranges, making it possible to create rules that disallow traffic coming from, say, China or Russia.
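The range check described above can be sketched with Python's standard `ipaddress` module. The allowed range is the 192.168.0.0/16 example from the text. The blocked range (203.0.113.0/24, a range reserved for documentation) and the function name are hypothetical and stand in for a real list of known attackers.

```python
import ipaddress

# Range taken from the example in the text.
ALLOWED_NETWORKS = [ipaddress.ip_network("192.168.0.0/16")]
# Hypothetical blocklist entry; 203.0.113.0/24 is reserved for documentation.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_allowed(client_ip: str) -> bool:
    """Reject blocklisted addresses first, then require an allowlist match."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return False
    return any(addr in net for net in ALLOWED_NETWORKS)
```

In practice a rule like this usually lives in a firewall or router rather than in application code, so that rejected traffic never reaches the service at all.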
It is important to remember that blocking IP addresses in this way may prevent legitimate traffic from those countries. Blacklisting IP addresses is also dangerous in that you may end up blacklisting all users sharing an IP address, even if many of those users are legitimate. For example, what would happen if a bad actor used an Amazon EC2 server instance to attack a host and that host blocked all Amazon EC2 IP addresses? While the attack might stop, all legitimate Amazon users are now blacklisted from accessing the service.
Also, this strategy may not be effective against DDoS attacks or DoS attacks using spoofed IP addresses. In the distributed scenario, there may be zombie computers with IP addresses all over the world, and creating rules to filter all of them out may become complicated and untenable. In the spoofing scenario, if an attacker is generating many requests to your service using a single spoofed IP address, the attacker can simply start spoofing a new IP address and continue the attack as soon as you block the first one.
Rate limiting is the practice of limiting the amount of traffic available to a specific Network Interface Controller (NIC). It can be done at the hardware or software level to mitigate the chances of falling victim to a DoS attack. At the hardware level, switches and routers usually have some degree of rate-limiting capabilities. At the software level, it's essential to have a limit on the number of concurrent calls available to a specific customer. Giving users strictly defined limits on concurrent requests or total requests over a given duration (for example, 50 requests per minute) can be an excellent way to reject excess traffic and maintain service stability. The rate limit is usually tied to the customer's plan or payment level. For example, customers on a free plan may only get 1,000 API calls in a given period, whereas customers at the premium level may get 10,000. Once the user reaches their rate limit, the service returns HTTP status code 429 ("Too Many Requests").
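A software-level limit of this kind can be sketched as a fixed-window counter per customer. The class name, the in-memory dictionary, and the injectable clock are assumptions made for this example. A production service would more likely keep counters in a shared store and might prefer a sliding window or token bucket.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per customer in each fixed window."""

    def __init__(self, limit, window_seconds=60, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._windows = {}  # customer_id -> (window_start, request_count)

    def check(self, customer_id):
        """Return the HTTP status to send: 200 if allowed, 429 if over limit."""
        now = self.clock()
        start, count = self._windows.get(customer_id, (now, 0))
        if now - start >= self.window_seconds:
            # The previous window has expired; start a fresh one.
            start, count = now, 0
        if count >= self.limit:
            return 429
        self._windows[customer_id] = (start, count + 1)
        return 200
```

With `limit=50` and `window_seconds=60` this matches the "50 requests per minute" example. Tying `limit` to the customer's plan would reproduce the free versus premium split described above.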
While rate limiting is useful, depending on it alone is not enough. Using a router's rate-limiting features means that requests will still reach the router, and even the best routers can be overwhelmed and DoSed. At the software level, a request must still reach your service before it can be answered with a 429 status code, even after the rate limit has been hit. This means that your service could still be overwhelmed by requests, even if it is only returning an error status code.
One of the best mitigation strategies is to filter requests upstream, long before they reach the target network. When this is done effectively, your API never even sees the malicious traffic, so rate-limiting policies are not triggered. There are many providers of "Mitigation Centers" that will filter the incoming network traffic. For example, AWS Shield and Cloudflare both offer products that protect against DoS and DDoS attacks by checking incoming packet IPs against known attackers and botnets and attempting to forward only legitimate traffic. Various API gateways have the same capabilities but can also filter based on the requested endpoint, allowed HTTP verbs, or even a combination of verbs and endpoints.
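To make the verb-and-endpoint filtering concrete, here is a minimal sketch of the kind of rule a gateway might evaluate before forwarding a request. The route table and function name are hypothetical and do not reflect any particular gateway product's configuration format.

```python
# Hypothetical route table: endpoint path -> HTTP verbs allowed on it.
ALLOWED_ROUTES = {
    "/orders": {"GET", "POST"},
    "/orders/export": {"GET"},
}


def gateway_allows(method: str, path: str) -> bool:
    """Forward a request only if its verb is allowed on its endpoint."""
    allowed_methods = ALLOWED_ROUTES.get(path)
    return allowed_methods is not None and method.upper() in allowed_methods
```

Requests for unknown endpoints or disallowed verbs are dropped at the gateway, so they never consume resources on the service behind it.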
Passing DoS mitigation responsibility to upstream providers can be a great way to reduce liability and risk as mitigation can be incredibly complex and is an ever-changing cat-and-mouse game between service providers and attackers.
These companies typically offer support while your service is under attack to help minimize the damage. It then becomes the responsibility of the provider to keep abreast of new DDoS attack vectors and strategies, leaving you to focus on building your service.
With the proliferation of easily scalable cloud services, it's easy to become lazy and not think about efficient development patterns. Sometimes it's easy to spot DoS-vulnerable parts of your application, while other times it's not so apparent. It's vital to offload resource-intensive processes to systems that are designed to handle those operations. For instance, uploading or encoding images or video can take a lot of processing power, and it's essential that the rest of your application is not affected by those processes. In some cases, you may even be able to queue expensive work for later batch processing, reducing the DoS attack surface area. In other cases, a well-configured cache, at the network or application level, can return previously processed data that has not changed. After all, the fastest processing possible is the processing you don't have to perform.
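An application-level cache can be sketched with `functools.lru_cache` from Python's standard library. The `render_thumbnail` function and the call counter are hypothetical stand-ins for an expensive operation such as image encoding. This sketch never invalidates entries, so it only suits results that do not change for a given input.

```python
from functools import lru_cache

# Counts how often the expensive path actually runs (illustration only).
CALLS = {"count": 0}


@lru_cache(maxsize=1024)
def render_thumbnail(image_id: str, width: int) -> str:
    """Hypothetical expensive operation; repeated inputs hit the cache."""
    CALLS["count"] += 1
    # Stand-in for real work such as decoding and resizing an image.
    return f"thumb:{image_id}:{width}"
```

Calling `render_thumbnail("cat", 128)` a thousand times performs the expensive work once, which blunts attacks that repeatedly request the same costly resource.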
Sometimes, when a startup is first creating its product, the team pays less attention to performance and more attention to shipping features. While this can be okay early on, as a service becomes popular it's hard to go back and fix performance issues before they widen the surface area for attackers. It's good practice to make performance testing part of the development cycle and continuous integration process. By running the ApacheBench (ab) command, you can get basic performance information about your service. You can also use ab in automated tests that simulate many users and check that your service responds to requests within a specified time. These performance tests can be run during the continuous integration process to ensure the application code performs at a level that is satisfactory to your organization.
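The same idea can be sketched in Python for teams that prefer to keep the check inside their test suite. This is a stand-in for ApacheBench, not a wrapper around it. The helper names and defaults are assumptions, and `call` is any zero-argument function, which in a real test would issue a request against a staging deployment.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_latencies(call, total_requests=100, concurrency=10):
    """Run `call` total_requests times across worker threads.

    Returns the per-call latencies in seconds, sorted ascending.
    """
    def timed(_):
        start = time.perf_counter()
        call()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return sorted(pool.map(timed, range(total_requests)))


def percentile(sorted_values, fraction):
    """Return the value at the given fraction (0.0 to 1.0) of a sorted list."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]
```

A continuous integration job could then fail the build when, say, `percentile(latencies, 0.95)` exceeds a budget your organization has agreed on. The budget itself is a decision for your team, not something this sketch prescribes.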