Get Started with Spring Boot, OAuth 2.0, and Okta

Matt Raible

If you’re building a Spring Boot application, you’ll eventually need to add user authentication. You can do this with OAuth 2.0 (henceforth: OAuth). OAuth is a standard that applications can use to provide client applications with “secure delegated access”. It works over HTTP and authorizes devices, APIs, servers, and applications with access tokens rather than credentials.

Very simply, OAuth is a protocol that supports authorization workflows. It gives you a way to ensure that a specific user has specific permission.

OAuth doesn’t validate a user’s identity — that’s taken care of by an authentication service like Okta. Authentication is when you validate a user’s identity (like asking for a username / password to log in), whereas authorization is when you check to see what permissions an existing user already has.

In this tutorial you’ll build an OAuth client for a Spring Boot application, plus add authentication with the Okta Platform API. You can sign up for a forever-free Okta developer account here.

If you don’t want to code along, feel free to grab the source code from GitHub! You can also watch a video of this tutorial below.

Get Started with Spring Cloud

Spring Cloud Security is a project from the good folks at Pivotal that “offers a set of primitives for building secure applications and services with minimum fuss”. Not only is it easy to use in platforms like Cloud Foundry, but it builds on Spring Boot, Spring Security, and OAuth. Because it builds on OAuth, it’s easy to integrate it with an authentication API like Okta’s.

The Spring Cloud Security project includes a great quickstart that will help you get started with very few lines of code.

Create a Secure Spring Boot App

Creating a Spring Boot application is dirt simple if you use the Spring CLI. It allows you to write Groovy scripts that get rid of the boilerplate Java and build file configuration, so you can focus on the code that matters. Refer to the project’s official documentation for installation instructions. To install the Spring CLI, I recommend using SDKMAN!:

sdk install springboot

Or use Homebrew if you’re on a Mac:

brew tap pivotal/tap
brew install springboot

Create a helloWorld.groovy file that has a Controller in it.

@Grab('spring-boot-starter-security')
@RestController
class Application {

  @RequestMapping('/')
  String home() {
    'Hello World'
  }
}

The @Grab annotation invokes Grape to download dependencies. Having Spring Security on the classpath causes its default security rules to be applied: protect everything, allow a single user with the username user, and generate a random password for that user on startup.

Run this app with the following command:

spring run helloWorld.groovy

Navigate to http://localhost:8080 and you’ll be prompted to log in with your browser’s basic authentication dialog. Enter user for the username and copy/paste the generated password from your console. If you entered the password correctly, you’ll see Hello World in your browser.

Hello World

Create an Authorization Server in Okta

To start authenticating against Okta’s API, you first have to create a developer account at http://developer.okta.com. After activating your account, sign in and navigate to Security > API and click on the Add Authorization Server button.

Add Authorization Server

Enter the name and Resource URI of your choosing. The names aren’t important at this time. I used the following values:

  • Name: Oktamus Prime
  • Resource URI: http://authenticat.is.easy/withokta

Authorization Server Settings

The Metadata URI you see in this screenshot will come in handy later when you need to specify accessTokenUri and userAuthorizationUri values.

Create an OpenID Connect App in Okta

To get a client id and secret, you need to create a new OpenID Connect (OIDC) app. Navigate to Applications > Add Application and click on the Create New App button. The application name isn’t important; use whatever you like.

OIDC App Name

Click Next to configure OIDC. Add http://localhost:8080 as a Redirect URI and click Finish.

OIDC Redirects

The next screen should look similar to the following screenshot.

OIDC Settings

Your clientId and clientSecret values for this app will be just below the fold.

Create a Spring Boot OAuth Client

Create a helloOAuth.groovy file that uses Spring Security and its OAuth2 support.

@Grab('spring-boot-starter-security')
@RestController
@EnableOAuth2Sso
class Application {

  @GetMapping('/')
  String home() {
    'Hello World'
  }
}

Adding the @EnableOAuth2Sso annotation causes Spring Security to look for a number of properties. Create application.yml in the same directory and specify the following key/value pairs.

security:
  oauth2:
    client:
      # From OIDC app
      clientId: # clientId
      clientSecret: # clientSecret
      # From Authorization Server's metadata
      accessTokenUri: # token_endpoint
      userAuthorizationUri: # authorization_endpoint 
      clientAuthenticationScheme: form
    resource:
      # from your Auth Server's metadata, check .well-known/openid-configuration if not in .well-known/oauth-authorization-server
      userInfoUri: # userinfo_endpoint
      preferTokenInfo: false

Start your app with spring run helloOAuth.groovy and navigate to http://localhost:8080. You’ll be redirected to Okta, but likely see the following error.

Bad Request, Invalid Redirect

This happens because Spring Security sends a redirect_uri value of http://localhost:8080/login. Navigate to your Okta developer instance and change your OIDC app to have this as a Redirect URI.

Add Redirect URI

If you hit http://localhost:8080 again, this time you’ll get an error that doesn’t explain as much.

No Scopes

The whitelabel error page doesn’t tell you anything, but your browser’s address window does: no scopes were requested. Modify application.yml to have a scope property at the same level as clientAuthenticationScheme. These are some standard OIDC scopes.

      clientAuthenticationScheme: form
      scope: openid profile email

Try http://localhost:8080 again and you’ll get an error that User is not assigned to the client app. Again, you’ll have to look in the address bar to see it.

User Not Assigned

Open your OIDC app in Okta and Assign People to it. Adding your own account is the easiest way to do this.

The next error you’ll see when trying to authenticate is Policy evaluation failed.

Policy Evaluation Failure

In Okta’s UI, navigate to Security > API and click on your Authorization Server’s name and Access Policies. Click Add Policy to continue.

Access Policies

Enter a name and description and set it to apply to all clients.

Add Policy

Click Create Policy to continue. Once that completes, click the Add Rule button.

Add Rule

Give the rule a name, accept the default values, and click the Create Rule button.

Default Grant Rules

Try http://localhost:8080 again and this time it should work. If it does - congrats!

You can make one additional change to the helloOAuth.groovy file to prove it’s really working: change the home() method to return Hello $name, where $name comes from java.security.Principal.

@GetMapping('/')
String home(java.security.Principal user) {
  'Hello ' + user.name
}

This should result in your app showing a result like the following.

Success

Get the Source Code

The source code for this tutorial and the examples in it are available on GitHub.

Summary

This tutorial showed you how to use Spring CLI, Groovy, Spring Boot, Spring Security, and Okta to quickly prototype an OAuth client. This information is useful for those that are developing a Spring MVC application with traditional server-rendered pages. However, these days, lots of developers are using JavaScript frameworks and mobile applications to build their UIs.

In a future tutorial, I’ll show you how to develop one of these fancy UIs in Angular and use the access token retrieved to talk to a Spring Boot API that’s secured by Spring Security and does JWT validation.

Get Started with Spring Boot, SAML, and Okta

Matt Raible

Today I’d like to show you how to build a Spring Boot application that leverages Okta’s Platform API for authentication via SAML. SAML (Security Assertion Markup Language) is an XML-based standard for securely exchanging authentication and authorization information between entities—specifically between identity providers, service providers, and users. Well-known IdPs include Salesforce, Okta, OneLogin, and Shibboleth.

My Okta developer experience began a couple years ago (in December 2014) when I worked for a client that was adopting it. I was tasked with helping them decide on a web framework to use, so I built prototypes with Node, Ruby, and Spring. I documented my findings in a blog post. Along the way, I tweeted my issues with Spring Boot and asked how to fix them on Stack Overflow. I ended up figuring out the solution through trial and error, and my findings made it into the official Spring documentation. Things have changed a lot since then, and now Spring Security 4.2 has support for auto-loading custom DSLs. And guess what, there’s even a DSL for SAML configuration!

Ready to get started? You can follow along with the written tutorial below, check out the code on GitHub, or watch the screencast I made to walk you through the same process.

Sign Up for an Okta Developer Account

Fast forward two years, and I find myself as an Okta employee. To start developing with Okta, I created a new developer account at http://developer.okta.com. Make sure you take a screenshot or write down your Okta URL after you’ve signed up. You’ll need this URL to get back to the admin console.

You’ll receive an email to activate your account and change your temporary password. After completing these steps, you’ll land on your dashboard with some annotations about “apps”.

Create a SAML Application on Okta

At the time of this writing, the easiest way to create a SAML-aware Spring Boot application is to use Spring Security’s SAML DSL project. It contains a sample project that provides instructions for configuring Okta as a SAML provider. Those instructions will likely work for you if you’re an experienced Spring Boot and Okta developer. If you’re new to both, this “start from scratch” tutorial might work better for you.

Just like I did, the first thing you’ll need to do is create a developer account at https://developer.okta.com. After activating your account, log in and click on the “Admin” button in the top right.

Okta UserHome

On the next screen, click “Add Applications” in the top right.

Okta Dashboard

This will bring you to a screen with a “Create New App” green button on the left.

Create New App

Click the button and choose “Web” for the platform and “SAML 2.0” for the sign on method.

New App with SAML 2.0

Click the “Create” button. The next screen will prompt you for an application name. I used “Spring SAML”, but any name will work.

Enter App name

Click the “Next” button. This brings you to the second step, configuring SAML. Enter the following values:

  • Single sign on URL: https://localhost:8443/saml/SSO
  • Audience URI: https://localhost:8443/saml/metadata

SAML Integration

Scroll to the bottom of the form and click “Next”. This will bring you to the third step, feedback. Choose “I’m an Okta customer adding an internal app” and optionally select the App type.

Customer or Partner

Click the “Finish” button to continue. This will bring you to the application’s “Sign On” tab, which has a section with a link to your application’s metadata in a yellow box. Copy the Identity Provider metadata link, as you’ll need it to configure your Spring Boot application.

SAML Metadata

The final setup step you’ll need is to assign people to the application. Click on the “People” tab and the “Assign to People” button. You’ll see a list of people with your account in it.

Assign People

Click the assign button, accept the default username (your email), and click the “Done” button.

Create a Spring Boot Application with SAML Support

Navigate to https://start.spring.io in your favorite browser and select Security, Web, Thymeleaf, and DevTools as dependencies.

start.spring.io

Click “Generate Project”, download the generated ZIP file and open it in your favorite editor. Add the spring-security-saml-dsl dependency to your pom.xml.

<dependency>
    <groupId>org.springframework.security.extensions</groupId>
    <artifactId>spring-security-saml-dsl</artifactId>
    <version>1.0.0.M3</version>
</dependency>

You’ll also need to add the Spring Milestone repository since a milestone release is all that’s available at the time of this writing.

<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/libs-milestone</url>
    </repository>
</repositories>

If you’d like to see instructions for Gradle, please view the project’s README.md.

In src/main/resources/application.properties, add the following key/value pairs. Make sure to use the “Identity Provider metadata” value you copied earlier (hint: you can find it again under the “Sign On” tab in your Okta application).

server.port = 8443
server.ssl.enabled = true
server.ssl.key-alias = spring
server.ssl.key-store = src/main/resources/saml/keystore.jks
server.ssl.key-store-password = secret

security.saml2.metadata-url = <your metadata url>

From a terminal window, navigate to the src/main/resources directory of your app and create a saml directory. Navigate into the directory and run the following command. Use “secret” when prompted for a keystore password.

keytool -genkey -v -keystore keystore.jks -alias spring -keyalg RSA -keysize 2048 -validity 10000

The values for the rest of the questions don’t matter since you’re not generating a real certificate. However, you will need to answer “yes” to the following question.

Is CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown correct?
  [no]:

Create a SecurityConfiguration.java file in the com.example package.

package com.example;

import static org.springframework.security.extensions.saml2.config.SAMLConfigurer.saml;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.method.configuration.EnableGlobalMethodSecurity;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
import org.springframework.security.config.annotation.web.configuration.WebSecurityConfigurerAdapter;

@EnableWebSecurity
@Configuration
@EnableGlobalMethodSecurity(securedEnabled = true)
public class SecurityConfiguration extends WebSecurityConfigurerAdapter {
    @Value("${security.saml2.metadata-url}")
    String metadataUrl;

    @Override
    protected void configure(HttpSecurity http) throws Exception {
        http
            .authorizeRequests()
                .antMatchers("/saml/**").permitAll()
                .anyRequest().authenticated()
                .and()
            .apply(saml())
                .serviceProvider()
                    .keyStore()
                        .storeFilePath("saml/keystore.jks")
                        .password("secret")
                        .keyname("spring")
                        .keyPassword("secret")
                        .and()
                    .protocol("https")
                    .hostname("localhost:8443")
                    .basePath("/")
                    .and()
                .identityProvider()
                    .metadataFilePath(metadataUrl)
                    .and();
    }
}

Create an MvcConfig.java file in the same directory and use it to set the default view to index.

package com.example;

import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.ViewControllerRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurerAdapter;

@Configuration
public class MvcConfig extends WebMvcConfigurerAdapter {
    @Override
    public void addViewControllers(ViewControllerRegistry registry) {
        registry.addViewController("/").setViewName("index");
    }
}

Since you chose Thymeleaf when creating your application, you can create a src/main/resources/templates/index.html file and it will automatically be rendered after you sign in. Create this file and populate it with the following HTML.

<!DOCTYPE html>
<html>
<head>
    <title>Spring Security SAML Example</title>
</head>
<body>
Hello SAML!
</body>
</html>

Run the App and Login with Okta

Start the app using your IDE or mvn spring-boot:run and navigate to https://localhost:8443. If you’re using Chrome, you’ll likely see a privacy error.

Connection Not Private

Click the “ADVANCED” link at the bottom. Then click the “proceed to localhost (unsafe)” link.

Proceed to localhost

Next, you’ll be redirected to Okta to sign in and redirected back to your app. If you’re already logged in, you won’t see anything from Okta. If you sign out from Okta, you’ll see a login screen such as the one below.

Okta Login

After you’ve logged in, you should see a screen like the one below.

Hello SAML

Source Code

You can find the source code for this article at https://github.com/oktadeveloper/okta-spring-boot-saml-example.

Learn More

This article showed you how to create a SAML application in Okta and talk to it using Spring Boot and Spring Security’s SAML extension. The SAML extension hasn’t had a GA release, but hopefully will soon. I also believe it’s possible to take the SAML DSL (in SecurityConfiguration.java) and create a Spring Boot starter that allows you to get started with SAML simply by configuring application properties.

Have questions or comments? Post your question to Stack Overflow with the “okta” or “okta-api” tag, hit me up via email at matt.raible@okta.com, or ping me on Twitter @mraible. In future articles, I’ll show you how to configure Spring Boot with OAuth 2.0 and Okta. Then I’ll explore different techniques of authenticating with Angular and using the access token to talk to a secured Spring Boot application. Until then, happy authenticating! 😊

How to use KentorIT AuthServices with Okta

Raphael Londner

If you’re wondering how to configure an ASP.NET application with KentorIT’s AuthServices and Okta, you’ve come to the right place. But before delving into the specifics of how to make Okta work with a SAML-enabled ASP.NET application powered by KentorIT AuthServices, it is worth spending some time going over a critical, but easily fixable, issue:

Important note : As of March 22nd, 2016, you have 2 choices:

  1. Either get the source code of the AuthServices assemblies and compile them on your own machine. In this case, no specific adjustment is necessary.

  2. Or use the v0.17 KentorIT NuGet assemblies. In this case, if you plan to use the SampleApplication project (not the SampleMvcApplication) for testing purposes, make sure you remove the following line from the web.config file:

    <requestedAuthnContext classRef="Password" comparison="Minimum" />

    If you don’t, the SP-initiated login flow will fail because Okta won’t manage to deserialize the SAMLRequest parameter (due to a case issue).

Here’s how you should configure an app powered by Kentor AuthServices to make it work with Okta:

  1. Download the latest version of KentorIT’s AuthServices from https://github.com/KentorIT/authservices and open the Kentor.AuthServices.sln solution in Visual Studio.
  2. Identify the SampleApplication project and make a note of its URL property: Visual Studio Project properties
  3. Go to your Okta organization and navigate to Admin => Applications.
  4. Press the Add Application button and then the green Create New App button. Create a new Okta app
  5. Select the SAML 2.0 option and press the Create button. Choose the SAML 2.0 template
  6. Give your application a name and optionally upload a custom logo. We’ll call it “Kentor AuthServices App 1”. Give your Okta app a name
  7. Press Next.
  8. In the Single sign on URL field, enter the URL you noted in step #2 and append "/AuthServices/Acs", for instance http://localhost:18714/SamplePath/AuthServices/Acs
  9. For the Audience URI field, enter the URL you noted in step #2 and append "/AuthServices", for instance http://localhost:18714/SamplePath/AuthServices
  10. In the Name ID format field, select the default Unspecified (or select any other value of your choice).
  11. Select the Show Advanced Settings link. For the Signature Algorithm field, we suggest that you leave the default value, SHA-256. However, if you do, you will need to add the following line of code to the Application_Start() method of your Global.asax.cs file:

    Kentor.AuthServices.Configuration.Options.GlobalEnableSha256XmlSignatures();

    Otherwise, you may switch to RSA-SHA1, though we do not recommend it (as it is less secure than SHA-256).

  12. In the Attribute Statements section, optionally enter additional attributes, such as in the following screenshot: Optional Attribute Statements
  13. Press the Next button. Select the I’m a software vendor option (if you’re indeed a vendor - if you are developing an internal app, select the first option) and press the Finish button. Select the customer or vendor option
  14. Now edit the web.config file of the SampleApplication project.
  15. In the <kentor.authServices> section, locate the <identityProviders> element.
  16. In the <identityProviders> section, enter the following values:
    • entityId = the Identity Provider Issuer value from Sign On => View Setup Instructions
    • signOnUrl = the Identity Provider Single Sign-On URL value from the same setup instructions
    • In the <signingCertificate> section, download the okta.cert X.509 certificate from the instructions page in the Okta app and put it in the App_Data folder of your web application. Then reference it accordingly (such as with fileName="~/App_Data/okta.cert") in the web.config file.

You should be good to go now! Don’t forget to assign users to your Okta application and test that you can sign in to your SAML application both from the Okta portal (IdP-initiated sign-in flow) and from your SAML application itself (SP-initiated sign-in flow).

If you run into any issue while using the SP-initiated login flow (when a user clicks on the “Sign In” link of the /SamplePath page), then try to recompile the Kentor.AuthServices project and make sure it is used by your project. If your project uses v0.17 of the corresponding NuGet library, make sure to comment out any <requestedAuthnContext> section in your web.config file.

Happy Okta’ing!

REST Service Authorization with JWTs

Jon Todd
William Dawson

Many companies are adopting microservices-based architectures to promote decoupling and separation of concerns in their applications. One inherent challenge with breaking applications up into small services is that now each service needs to deal with authenticating and authorizing requests made to it. JSON Web Tokens (JWTs) offer a clean solution to this problem, along with TLS client authentication lower down in the stack.

Wils Dawson and I presented these topics to the Java User Group at Okta’s HQ in December and are thrilled to offer the slides, code, and the following recording of the presentation. In the talk, we cover authentication and authorization both at a server level with TLS and a user level with OAuth 2.0. In addition, we explain claims based auth and federation while walking through demos for these concepts using Java and Dropwizard. We purposely skipped over client (e.g. browser) side authentication as it’s enough material for a future talk and focused on solutions for authentication and authorization between services within an application.
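For a rough illustration of the core idea (this is my sketch, not code from the talk), here is how one service might issue and another verify an HMAC-signed JWT using the jjwt library; the shared secret and subject value are placeholders.

import io.jsonwebtoken.Claims;
import io.jsonwebtoken.Jwts;
import io.jsonwebtoken.SignatureAlgorithm;

import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.Key;

public class JwtSketch {

    public static void main(String[] args) {
        // Placeholder shared HMAC secret; in practice this comes from configuration, not source code.
        Key key = new SecretKeySpec(
                "change-this-demo-secret-please!!".getBytes(StandardCharsets.UTF_8), "HmacSHA256");

        // The issuing service puts the caller's identity in the subject claim and signs the token.
        String token = Jwts.builder()
                .setSubject("calling-service")
                .signWith(SignatureAlgorithm.HS256, key)
                .compact();

        // The receiving service verifies the signature and reads the claims before authorizing the request.
        Claims claims = Jwts.parser()
                .setSigningKey(key)
                .parseClaimsJws(token)
                .getBody();

        System.out.println("Authenticated caller: " + claims.getSubject());
    }
}

In a real deployment the token would typically be minted by an authorization server rather than the caller itself, and the claims would carry the scopes or roles the receiving service checks before doing any work.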

Demystifying OAuth

Karl McGuinness

It seems that OAuth 2.0 is everywhere these days. Whether you are building a hot new single page web application (SPA), a native mobile experience, or just trying to integrate with the API economy, you can’t go far without running into the popular authorization framework for REST/APIs and social authentication.

During Oktane15, Karl McGuinness, our Senior Director of Identity, demystified the powerful, yet often misunderstood, world of OAuth 2.0 and shared details on Okta’s growing support for OpenID Connect.

Slides

TLS Client Authentication for Internal Services

William Dawson

If you’re like me, the most aggravating thing is finding a Stack Overflow question that exactly describes the issue you are facing, only to scroll down and see that it has remained unanswered since 2011. I was recently trying to configure Transport Layer Security (TLS) client authentication (also referred to as mutual SSL) between two internal services at Okta and found the lack of complete examples astonishing. I hope that this blog post provides a better understanding of how to accomplish client authentication in your applications and makes all that hard security stuff a bit easier.

TLS Background

In a normal TLS handshake, the server sends its certificate to the client so that the client can verify the authenticity of the server. It does this by following the certificate chain that issued the server’s certificate until it arrives at a certificate that it trusts. If the client reaches the end of the chain without finding a certificate that it trusts, it will reject the connection. For an example of what a server might send, see this gist.

TLS handshake

Image reprinted with permission from CloudFlare

In mutual SSL, the client also sends its certificate to the server for the server to authenticate along with an additional message (called the CertificateVerify message), which assures the server that the client is the true owner of the certificate. The server follows the same process of checking the certificate chain until it finds one it trusts, refusing the connection if it can’t find such a certificate.

So why is that useful? You probably interact with typical TLS all the time in your browser. For example, when you visit https://www.okta.com, your browser is verifying that the server serving Okta’s site is authentic (that it’s not impersonating a legitimate Okta server). But Okta’s server has no idea who your browser is. In this case it doesn’t care too much, so it lets you connect.

When we start talking about services talking to each other, authenticating the client becomes important because it lowers the risk of our servers divulging information to machines impersonating our services. For example, let’s say we have a service called the User Service that holds all the information about users in our application. We have another service called the Home Page Service that serves up the home page to the browser. The home page has the user’s name, email, phone number, and other personal information. The Home Page Service needs to talk to the User Service to get the user’s name to display on the page. In this case, the Home Page Service is the client and the User Service is the server. If we only used normal TLS, only the User Service would be authenticated! We need TLS client authentication to make sure the User Service doesn’t provide data to a random client.
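Before getting into the framework specifics below, it may help to see what "require a client certificate" boils down to at the JSSE level. This is just a sketch using the JVM's default SSL context (the port is arbitrary); Dropwizard's needClientAuth setting, shown later, flips the same switch for you.

import javax.net.ssl.SSLServerSocket;
import javax.net.ssl.SSLServerSocketFactory;
import java.io.IOException;

public class MutualTlsServerSketch {

    public static void main(String[] args) throws IOException {
        // Uses the JVM's default SSLContext (configured via javax.net.ssl.keyStore / trustStore properties).
        SSLServerSocketFactory factory =
                (SSLServerSocketFactory) SSLServerSocketFactory.getDefault();

        try (SSLServerSocket serverSocket = (SSLServerSocket) factory.createServerSocket(8443)) {
            // Require a client certificate: the handshake fails unless the client presents
            // a chain that our trust store accepts, plus the CertificateVerify proof of ownership.
            serverSocket.setNeedClientAuth(true);

            // serverSocket.accept() would now only succeed for mutually authenticated peers.
        }
    }
}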

Implementing TLS Client Authentication

In our case, the client and server are internal services communicating with each other. I won’t cover configuring a browser client or other clients that may be not under your control. In this post, I’ll give examples for the technology we use at Okta. Specifically, we use Dropwizard as the server framework and Jersey for the client framework. We’ll also use Java’s keytool for building the key and trust stores in Java KeyStore (JKS) format. The examples below use these technologies, but I hope they’ll be fairly transferable to choices you make in your applications. In addition, these samples are not meant to be complete, so you may need to modify them to fit in your environment.

Certificates and Key Stores

CA hierarchy

First, let’s set up our trust store, which is just a key store that contains only certificates. Let’s assume we have a layered Certificate Authority (CA) structure, like the image above, with a root CA and a subordinate global CA. The root CA has its private key stored offline, and its certificate is the only certificate we want our services to trust on this channel. We don’t even want a certificate issued by a reputable third-party CA to be trusted by our service. So our trust store will contain only the root certificate, which means the server will only establish connections with clients whose certificates were issued by the root CA or its child, the global CA, which will also be the issuer of our server’s certificate. This makes it easy to rotate our server’s certificate, either when it expires or if it is somehow compromised: we can just change it on that service and not worry about the other services it communicates with losing trust, because they trust the root. If all our services trusted each other explicitly, the rotation would be much more difficult, especially if you can’t take downtime. We’ll use the trust store for both the client and the server, so you only need to make one, which you can copy if you need to.

# Import your root certificate into a new trust store and follow the prompts
keytool -import -alias root -file root.crt -keystore truststore.jks
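If you ever want to sanity-check the trust store outside of your application, one option (my sketch, not part of the original walkthrough) is to load it with the standard KeyStore and TrustManagerFactory APIs. The file name and alias match the keytool command above; the password is whatever you chose at the prompt.

import javax.net.ssl.TrustManagerFactory;
import java.io.FileInputStream;
import java.security.KeyStore;

public class TrustStoreCheck {

    public static void main(String[] args) throws Exception {
        // Load the JKS trust store created with keytool above.
        KeyStore trustStore = KeyStore.getInstance("JKS");
        try (FileInputStream in = new FileInputStream("truststore.jks")) {
            trustStore.load(in, "changeit".toCharArray()); // the password you chose at the keytool prompt
        }

        // It should contain exactly one entry: the root CA certificate under the "root" alias.
        System.out.println("Entries in trust store: " + trustStore.size());
        System.out.println("Root certificate: " + trustStore.getCertificate("root"));

        // The same store can seed a TrustManagerFactory, which is what the TLS stack
        // consults during the handshake to decide whether a peer's chain is trusted.
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
    }
}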

Now that we’ve set up trust, we want to issue the certificate for our service that chains up to the root. We’ll use the global CA to issue our server its certificate, and since the global CA’s certificate is issued by the root CA, we have a chain of trust. When we create the server’s certificate, we’ll include the chain as well for clients to verify. The TLS standard specifies that the certificate chain does not require the actual root of trust since the endpoints will have it already, so we’ll omit it to save bandwidth. Once we have the certificate we’ll put it in a JKS for our Dropwizard application to use. If your client does not have a certificate for service-to-service communication, you can follow a similar pattern to create its certificate. But if it does have an existing certificate, you can just reuse that one.

# Create our server's key
openssl genrsa -out server.key 2048

# Create the csr and follow the prompts for country code, ou, etc
openssl req -new -key server.key -sha256 -out server.csr

# Sign the csr with your CA
openssl ca -in server.csr -days 365 -config my-ca-conf.cnf -out server.crt

# Cat the cert chain together (except the root)
cat server.crt global.crt > chain.crt

# Create pkcs12 file for key and cert chain
openssl pkcs12 -export -name server-tls -in chain.crt -inkey server.key -out server.p12

# Create JKS for server
keytool -importkeystore -destkeystore keystore.jks -srckeystore server.p12 -srcstoretype pkcs12 -alias server-tls

Server Configuration

Now that we have our key and trust stores, let’s configure the server’s Dropwizard application connector.

server:
  applicationConnectors:
    - type: https
      port: 8443

      # Key store settings
      keyStorePath: path/to/keystore.jks
      keyStorePassword: "notsecret"
      certAlias: server-tls
      enableCRLDP: true

      # Trust store settings
      trustStorePath: path/to/truststore.jks
      trustStorePassword: "notsecret"

      # Fail fast at startup if the certificates are invalid
      validateCerts: true

      # Whether or not to require authentication by peer certificate.
      needClientAuth: true

      # Check peer certificates for validity when establishing a connection
      validatePeers: true

      # The list of supported SSL/TLS protocols. You may need to modify
      # this section to support clients that you have.
      supportedProtocols: ["TLSv1.2"]
      supportedCipherSuites: ["TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"]
      allowRenegotiation: false

Dropwizard code is Copyright © 2010-2013 Coda Hale, Yammer Inc., 2014-2015 Dropwizard Team and/or its affiliates. Apache 2.0.

That was pretty easy, huh? No cryptic OpenSSL commands! Now our server should be configured to refuse connections from clients not presenting a root issued certificate chain. We can test to make sure that happens! We can start our server, telling Java to debug the SSL handshakes, and make sure we see it refusing the connection for the right reason. In one terminal start the Dropwizard server debugging SSL.

$ java -Djavax.net.debug=SSL,keymanager,trustmanager -jar your/jar.jar server config.yml

In another terminal run the following curl commands and verify you get the expected results. First, make sure that the server does not talk HTTP over our port.

$ curl localhost:8443
curl: (52) Empty reply from server

# The server should print something like the following because of no TLS:
# javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?

Next, check that the server is sending your certificate back over HTTPS. curl has a preconfigured list of trusted certs and chances are your root certificate is not in there.

$ curl https://localhost:8443
curl: (60) SSL certificate problem: Invalid certificate chain

# The server will print a bunch of stuff ending with something like:
# javax.net.ssl.SSLException: Received close_notify during handshake

Finally, ensure that the server terminates the connection if no client cert is provided.

$ curl -k https://localhost:8443
curl: (35) Server aborted the SSL handshake

# The server will, again, print a bunch of stuff ending with something like:
# javax.net.ssl.SSLHandshakeException: null cert chain

Client Configuration

Now we’ll configure our client to talk to the server. I’ll use the Jersey 2.X API, but there are equivalents in the 1.X as well as in the Apache HTTP library.

// Assume the following variables are initialized already
String password;
RSAPrivateKey clientKey;
X509Certificate clientCert;
X509Certificate globalCert;
X509Certificate rootCert;

X509Certificate[] certChain = {clientCert, globalCert};

// setup key store
KeyStore clientKeyStore = KeyStore.getInstance("JKS");
clientKeyStore.load(null, password.toCharArray());
clientKeyStore.setKeyEntry("service-tls", clientKey, password.toCharArray(), certChain);

// setup trust store
KeyStore clientTrustStore = KeyStore.getInstance("JKS");
clientTrustStore.load(null, password.toCharArray());
clientTrustStore.setCertificateEntry("root-ca", rootCert);

// setup Jersey client
SslConfigurator sslConfig = SslConfigurator.newInstance()
        .keyStore(clientKeyStore)
        .keyStorePassword(password)
        .keyPassword(password)

        .trustStore(clientTrustStore)
        .trustStorePassword(password)

        .securityProtocol("TLSv1.2");

SSLContext sslContext = sslConfig.createSSLContext();
Client client = ClientBuilder.newBuilder().sslContext(sslContext).build();

Jersey code is Copyright © 2010-2015 Oracle and/or its affiliates. GPL 2.0 Selected.

Hooray authentication!

xkcd-identity

Comic is Copyright © xkcd.com. CC BY-NC 2.5.

Tightening Things Up

Right now we are allowing any service with a certificate signed by our root CA to talk to our server. Chances are we’d like to trim this down to only the clients that should be talking to the server, so we can refuse some other service that has no business with our server even though it has a certificate issued by our root CA. This is useful for preventing another service we have from accessing our new service. For example, suppose in addition to a User Service and a Home Page Service, we have an Event Service. We may want to block the Event Service from communicating with the User Service while allowing the Home Page Service to do that communication.

To accomplish this, we could change our server’s trust store to only contain the public key of the client, but this presents problems (and more work) when we try to rotate that key pair. So, instead, let’s try having the server check that the hostname of the client is one that it expects to hear from. We can also do this in the other direction (client verifying the server).
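For the client-verifying-the-server direction, the JDK has endpoint identification built in. Here is a minimal sketch (the hostname and port are made up) that turns it on for a raw SSLSocket; the Dropwizard setting discussed next enables the equivalent check on the server side.

import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocket;

public class HostnameVerificationSketch {

    public static void main(String[] args) throws Exception {
        SSLContext sslContext = SSLContext.getDefault();

        // "user-service.internal" and 8443 are placeholders for your real service endpoint.
        try (SSLSocket socket = (SSLSocket) sslContext.getSocketFactory()
                .createSocket("user-service.internal", 8443)) {

            // Ask JSSE to verify that the server certificate's CN/SAN matches the hostname we dialed.
            SSLParameters params = socket.getSSLParameters();
            params.setEndpointIdentificationAlgorithm("HTTPS");
            socket.setSSLParameters(params);

            socket.startHandshake(); // throws an SSLHandshakeException on a hostname mismatch
        }
    }
}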

Several options exist for verifying the hostname on the server side. The first is one that Dropwizard supports with a tricky configuration change to the underlying Java SSL connection.

server:
  applicationConnectors:
    - type: https
      #...
      endpointIdentificationAlgorithm: HTTPS

The HTTPS endpoint identification algorithm will cause Java to do hostname verification against your cert. Specifically, this will check the hostname of the client that made the request against the DN given in the client’s certificate. If they do not match, the connection will be refused. This is a great, standard way to solve the problem; however, it can be tricky to know what the hostnames will be or to make a wildcard pattern (or subject alternative name extension) for your clients. We can take a higher-level approach than hostname comparison.

We can, instead, provide our server with a regular expression that matches the DNs that we expect in our certificates. This means we no longer have to worry about hostnames. So as services move from host to host, they can keep the same certificate and everything will Just Work™. Additionally, a certificate can belong to a service rather than an individual host now so there’s less management that needs to happen. To do this, we just need to set up a filter in our server and configure a regex to match the DN in the certificate(s) that are allowed to communicate with our service or else return a 403 response.

import javax.annotation.Priority;
import javax.servlet.http.HttpServletRequest;
import javax.ws.rs.Priorities;
import javax.ws.rs.container.ContainerRequestContext;
import javax.ws.rs.container.ContainerRequestFilter;
import javax.ws.rs.container.PreMatching;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.Response;
import java.io.IOException;
import java.security.cert.X509Certificate;
import java.util.regex.Pattern;

/**
* A ContainerRequestFilter to do certificate validation beyond the tls validation.
* For example, the filter matches the subject against a regex and will 403 if it doesn't match
*
* @author <a href="mailto:wdawson@okta.com">wdawson</a>
*/
@PreMatching
@Priority(Priorities.AUTHENTICATION)
public class CertificateValidationFilter implements ContainerRequestFilter {

    private static final String X509_CERTIFICATE_ATTRIBUTE = "javax.servlet.request.X509Certificate";

    private final Pattern dnRegex;

    // Although this is a class-level field, Jersey actually injects a proxy
    // so that it can serve multiple requests simultaneously.
    @Context
    private HttpServletRequest request;

    /**
     * Constructor for the CertificateValidationFilter.
     *
     * @param dnRegex The regular expression to match subjects of certificates with.
     *                E.g.: "^CN=service1\.example\.com$"
     */
    public CertificateValidationFilter(String dnRegex) {
        this.dnRegex = Pattern.compile(dnRegex);
    }

    @Override
    public void filter(ContainerRequestContext requestContext) throws IOException {
        X509Certificate[] certificateChain = (X509Certificate[]) request.getAttribute(X509_CERTIFICATE_ATTRIBUTE);

        if (certificateChain == null || certificateChain.length == 0 || certificateChain[0] == null) {
            requestContext.abortWith(buildForbiddenResponse("No certificate chain found!"));
            return;
        }

        // The certificate of the client is always the first in the chain.
        X509Certificate clientCert = certificateChain[0];
        String clientCertDN = clientCert.getSubjectDN().getName();

        if (!dnRegex.matcher(clientCertDN).matches()) {
            requestContext.abortWith(buildForbiddenResponse("Certificate subject is not recognized!"));
        }
    }

    private Response buildForbiddenResponse(String message) {
        return Response.status(Response.Status.FORBIDDEN)
                .entity("{\"message\":\"" + message + "\"}")
                .build();
    }
}

Dropwizard code is Copyright © 2010-2013 Coda Hale, Yammer Inc., 2014-2015 Dropwizard Team and/or its affiliates. Apache 2.0. Jersey code is Copyright © 2010-2015 Oracle and/or its affiliates. GPL 2.0 Selected.
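To put the filter into service, it needs to be registered with Jersey in your Dropwizard Application class. Here is a minimal sketch (the application class, configuration class, and DN regex are placeholders, not from the original post):

import io.dropwizard.Application;
import io.dropwizard.Configuration;
import io.dropwizard.setup.Environment;

public class ExampleApplication extends Application<Configuration> {

    public static void main(String[] args) throws Exception {
        new ExampleApplication().run(args);
    }

    @Override
    public void run(Configuration configuration, Environment environment) {
        // Only allow clients whose certificate subject matches this DN.
        // In a real application the regex would come from the configuration class.
        environment.jersey().register(
                new CertificateValidationFilter("^CN=service1\\.example\\.com$"));
    }
}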

Circling Back

We defined TLS client authentication and went over how it can help secure your backend services. We walked through configuring a Dropwizard server with mandatory TLS client authentication and creating a Jersey client to provide the appropriate credentials when talking to that server. We also talked about options to further restrict clients’ ability to talk to the server based on their certificates. I hope you have a better understanding of how to implement mutual SSL in your applications. Below are a few things to also keep in mind as you implement these authentication concepts in your applications.

References

  1. Common keytool commands
  2. Common openssl commands
  3. Dropwizard https configuration manual
  4. Jersey client documentation

The New Age of Trust

Vimarsh Karbhari

I recently read an excellent article about how amazing products shape the trust relationship with customers. I think great products are the first step in building a trust relationship. And like other aspects of the product that are derived from the product but are not physically part of it, the trust relationship is now more important than ever before.


When you use a product, every engagement with that product has a direct correlation with your perception of the value of that product. — From Product Loyalty Follows Trust Like Form Follows Function


These days, making sure your product solves problems that customers face every day is not just a good idea, it’s table stakes. And the bar is rising constantly. The more products improve, the less customers are willing to tolerate bad experiences. With the consumerization of the enterprise, good product design and favorable user experiences are indispensable to every kind of product distribution model.

Two of the most important aspects of the product are Product Status/Trust and Customer Support. Having dissected both of these from many angles in my recent work, I know that businesses neglect them at their peril.

Communicate transparently

The Product Status/Trust page is the first place customers visit on your site when they have a problem. Consider yourself lucky if the customer only had a problem with their browser (like a stale cache) or an issue with their laptop. For more serious issues, customers expect transparency and effective communication from the Product Status/Trust page. They want information that communicates exactly what the problem is.

Serve multiple audiences

A Product Status/Trust page is also one of the first places that prospects are likely to visit. In fact, when the product is gaining momentum, prospect traffic may exceed customer traffic. 

I’ve rarely seen a Status/Trust page actually serve both customers and prospects effectively. The key is to find the correct balance. As the product becomes popular, customers and prospects visit the page with very different mindsets.

For customers, the Status/Trust page must be the source of truth for issues, downtime, and root cause analysis. This requires radical transparency and effective communication.

For prospects, the Status/Trust page must do at least the following:

  • Display all the information they need to make an informed decision
  • Make them love the product even more
  • Serve as an effective talking point for your sales reps

Once users get their hands on a product, they often find new ways to use it that the founders never imagined. This experimentation can shape the future of products and platforms. This is especially true of products that also provide APIs. The growth of B2B2C, B2C2B and other hybrid distribution models means that you are putting your product in front of many different types of audiences, including direct customers, partners, resellers, channel partners, and early and late stage prospects. Understanding how all of these audiences report on the service, interpret the SLA, and react to downtime is crucial if you want to create products that users value.

Be highly available but prepare for risk

It is a truism that one cannot solve for all technical constraints. Choosing the right platform on which to build your Product and Status/Trust pages is very important, but hosting both pages on the same platform risks both being down at the same time if your site crashes. 

Obviously, you cannot afford to let your Trust/Status page go down, so high availability is key. But just in case the page ever does go down, you need a risk mitigation plan. Central to this is making sure that you are constantly monitoring your Trust/Status page with enterprise-class monitoring tools.

Send rapid, robust, and consistent notifications

In the event of trust page problems, make sure that you have ways to easily and automatically notify customers and site ops as soon as issues are detected. Employ RSS, Twitter, and other channels to notify customers; invest in monitoring tools to notify site ops.

If you send notifications manually, take care not to introduce human error when the site is having issues, as these are usually chaotic periods for your ops team.

Realize that many of your customers will probably check your trust page and call support, so it’s important that the trust page and the support team provide the same information. This also argues for ensuring that your support team is a key stakeholder in the development of your trust page. 

Conclusion

Designing and developing the Trust Page has taught me much in the last few months. I’d be happy to hear from you on this topic, so please feel free to send comments or questions to vimarsh.karbhari@okta.com.

The Trust Page is the work of an amazing team that includes Tim Gu, Shawn Gupta, Nathan Tate, Wendy Liao, and myself.

How Okta Chased Down Severe System CPU Contention in MySQL

Okta Staff

Sometimes fixing a problem causes or reveals a new one. And sometimes this sets off a chain reaction of problems and fixes, where each solution exposes a deeper issue. In technology, cascades like these are common, often painful, and occasionally welcome.

Our battle against CPU contention last fall is a good example of such a cascade. What began as a buffer pool adjustment triggered a series of issues and fixes that generated plenty of stress, but ultimately strengthened our platform.

Underlying each of the challenges we faced in that period was the huge amount of business our Sales organization had closed in late summer and early Fall of 2014. Growth brought a dramatic increase in the number of new customers running large import jobs and new orgs running agents.

As problems go, growing pains are good problems to have. But they usually come at a cost: the increased traffic caused significant CPU contention, as shown in the following image.

Before tuning the database

Those red and yellow spikes in late October 2014 seized our attention and spurred an aggressive response from Okta’s site operations team. The team took immediate action to prevent this situation from getting worse and potentially causing an issue with our site.

Tuning the database

As a first step, we tuned our MySQL database to fully utilize the amount of RAM in our server instances. We had been running with a relatively small buffer pool compared to the amount of available RAM, which meant that we were sacrificing both performance and money. Increasing the size of the buffer pool decreased page response times and nearly eliminated disk reads.

Almost eliminated disk reads

Doubling hardware resources

Despite the buffer pool adjustment, we continued to see significant CPU contention. In response, we doubled the size of our servers (244 GB of RAM, 32 CPU cores, and 2 x 320 GB HDDs). CPU contention decreased (see the trough in the following image), but probably because of the Thanksgiving holiday, not the additional hardware.

After the holiday, CPU spikes returned, now worse than ever. Page render time slowed down, queries against the database took longer, and jobs backed up.

Thanksgiving holiday

Note: Flat areas in the graph showing no CPU usage indicate periods when we were running on a secondary server.

Why did CPU contention increase after we’d doubled the CPUs? Shouldn’t it have decreased?

Kernel mutex bottleneck

The alarming amount of yellow in our graphs showed extremely high system CPU usage (and user CPU usage was also too high). Clearly, the operating system was working very hard at something. The metrics we pulled revealed that all the InnoDB threads were busy waiting on the kernel mutex. We had known that kernel mutex was a bottleneck even before we’d doubled hardware resources, but we hadn’t understood why.

A closer look at the MySQL source code showed that kernel mutex was trying to allocate memory to all of our transactions. This is perfectly normal behavior, but it proved to be very limiting in our case because we perform approximately 85,000 transactions per minute. The kernel has to create a transaction ID for each transaction and allocate a tiny block of memory in RAM before giving it to the thread handling the transaction.

Now we knew why doubling the number of CPUs caused greater contention: instead of providing transaction IDs and associated memory to approximately 24 InnoDB threads, kernel mutex was now working like mad to provide IDs and memory to approximately 48 InnoDB threads. Imagine having a single toll booth on a 16 lane highway and then doubling the number of lanes.

In the discussions that followed, some called for rolling back to the smaller machines, reasoning that fewer threads would mean less CPU contention. Others believed that rolling backward would be a mistake, arguing that our business growth required the more powerful servers in any case, and that doubling the number of CPUs was not itself a problem, but rather part of the ultimate solution because it exposed the root cause of the extreme system CPU usage.

The right course – the one we ultimately took – was to stick with the more powerful servers and tune them properly.

Adopting TCMalloc

We quickly found several resources online, including a key blog post about TCMalloc (Thread-Caching Memory Allocation) and an article about debugging MySQL.

Traditional memory allocation schemes, like the glibc malloc that we were then using, employ a mutex to prevent concurrent access to the transaction ID counter. Preventing concurrency is totally wrong for a multi-core, multi-thread architecture like ours.

In contrast, TCMalloc allocates a small pool of memory to each CPU core. Individual processor threads obtain RAM directly from their core, ideally from the L2 cache nearest the thread’s section of the CPU. This sounded promising, so we switched to TCMalloc.

Following the switch, things looked pretty good. User CPU decreased dramatically, never to return to the 50%+ usage we’d seen before. We had finally solved the memory allocation bottleneck. If we hadn’t doubled the number of CPUs, we wouldn’t have found the problem that led us to adopt TCMalloc.

Had we finally solved our scalability problem?

Transparent Huge Pages: Thanks for your help…please don’t help

By the next morning CPU contention was worse.

The alarmingly high system CPU usage that we’d seen in the previous 3 months was always due to MySQL using kernel mutex. But since we’d fixed that problem, what the heck was this?

We discussed turning off TCMalloc, but that would’ve been a mistake. Implementing TCMalloc was a critical link in the chain of problems and solutions that ultimately strengthened our platform.

We discovered very quickly that the culprit this time was khugepaged, a kernel daemon enabled by a Linux kernel feature called Transparent Huge Pages (THP), which is turned on by default in most Linux distributions. Huge pages are designed to improve performance by helping the operating system manage large amounts of memory. They effectively increase the page size from the standard 4 KB to 2 MB or 1 GB (depending on how it is configured).

THP makes huge pages easier to use by, among other things, arranging your memory into larger chunks. It works great for app servers that are not performing memory-intensive operations.

Which is why THP is so wrong for our platform. By late 2014 we were using 95% of the RAM and 58% of the 32 CPU cores in our servers. In order to store all of those tiny transaction IDs, we were rewriting memory so rapidly that THP’s efforts to move pages around couldn’t keep up. Clearly, standard 4 KB pages were much more efficient for us than the larger page size that THP was “helping” us with. So we turned THP off. The following image tells the story.

TCMalloc

Note: Flat areas in the graph showing no CPU usage indicate periods when we were running on a secondary server.

In a sense, encountering the dramatic effect of THP, an operating system problem, was clarifying. It validated our previous remedies, and turning it off definitely strengthened our platform.

Lessons learned

Beyond the technical lessons we learned during this period, we were reminded that sometimes the best thing to do is stay the course. At times we were tempted to pull back, but moving forward ultimately paid off as each improvement we made exposed the inadequacy (for our platform) of a downstream component.

Okta Software Engineering Design Principles

Jon Todd

Okta has been an agile development shop since the beginning. One important aspect of being agile is enabling a mix of bottom-up and top-down decision making: high-level vision and strategy are clearly communicated, enabling teams to autonomously deliver value, while learnings from the trenches feed back to inform the high-level goals.1 Below are the tacit engineering design principles we’ve used to guide development at Okta. They continue to evolve as we experiment and learn.

1. Create User Value

First and foremost, writing software is about creating value for users. This seems straightforward, but as systems evolve and become more complex, we start introducing more abstraction and layering, which takes us further away from the concrete problem we’re trying to solve. It’s important to keep in mind the reason for writing software in the first place and to use our understanding of the audience to inform priority.

At Okta, our entire company is aligned on this principle because our #1 core value is customer success. In practice this means there’s almost always a number of customers eager to beta a new feature we’re working on. We collaborate closely with customers while building features allowing for continuous feedback as we iterate and get changes out in weekly sprints.

xkcd - pass the salt

2. Keep it Simple

Everything should be made as simple as possible, but no simpler — Albert Einstein

This truism has been around for ages, and it goes hand-in-hand with the first principle. If it doesn’t add value to users now, you ain’t gonna need it - YAGNI!

We all encounter overly complex code where it’s nearly impossible to reason about what it does. Part of this confusion is because it’s generally harder to read code than to write it, but beyond that, there are fundamental qualities that make some code more intuitive than other code. There’s a lot of prior art on this topic, and a great place to start is Clean Code by Robert C. Martin, aka Uncle Bob. The book breaks down the qualities of code which make it intuitive, and provides a framework for reasoning about code quality.

Here are some guiding principles about writing clean code we use in practice which are also covered in the book.

Clean code:

  • Makes intent clear; use comments when code isn’t expressive enough
  • Can be read and enhanced by others (or the author after a few years)
  • Provides one way, rather than many, to do a particular task
  • Is idiomatic
  • Is broken into pieces that each do one thing and do it well

At the end of the day there is no substitute for experience; like any craft, writing clean code takes practice. At Okta every engineer is constantly honing their skills, and we rely heavily on code reviews and pair programming to sharpen each other’s.

wtfs per minute

3. Know Thy Service With Data

In the world of “big data” this point needs little explanation. Okta collects massive amounts of operational data about our systems to:

  • Monitor health
  • Monitor performance
  • Debug issues
  • Audit security
  • Make decisions

With every new feature we add, developers are responsible for ensuring that their designs provide visibility into these dimensions. In order to make this an efficient process we’ve invested in:

  • Runtime logging controls that can be toggled by level, class, tenant, or user
  • Self-service creation of dashboards and alerts
  • Every developer has access to metrics and anonymized, unstructured data
  • A request ID generated at the edge is passed along at every layer of the stack for correlation (a hypothetical propagation sketch follows the tools list below)
  • An engineering control panel for common operational tasks like taking thread dumps

Technologies we use to gain visibility include PagerDuty, Redshift, Zabbix, ThousandEyes, Boundary, Pingdom, AppDynamics, Splunk, ELK, and S3.
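
As a concrete illustration of the request-ID item above, here is a minimal, hypothetical sketch of how an ID generated at the edge can ride along with every log line in a Java servlet stack via SLF4J’s MDC. The filter, the X-Request-Id header name, and the MDC key are illustrative assumptions, not a description of Okta’s actual implementation.

    import java.io.IOException;
    import java.util.UUID;

    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletRequest;

    import org.slf4j.MDC;

    public class RequestIdFilter implements Filter {

        // Illustrative header name; the real edge component may use something else.
        private static final String HEADER = "X-Request-Id";

        @Override
        public void init(FilterConfig filterConfig) throws ServletException {
        }

        @Override
        public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
                throws IOException, ServletException {
            String requestId = ((HttpServletRequest) request).getHeader(HEADER);
            if (requestId == null || requestId.isEmpty()) {
                requestId = UUID.randomUUID().toString(); // generated at the edge if missing
            }
            MDC.put("requestId", requestId); // every log line on this thread now carries the ID
            try {
                chain.doFilter(request, response);
            } finally {
                MDC.remove("requestId"); // don't leak the ID across pooled threads
            }
        }

        @Override
        public void destroy() {
        }
    }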

4. Make Failure Cheap

Every software system will experience failures and all code has bugs. While we constantly work at having fewer, it’s unrealistic to assume they won’t occur. So, in addition to investing in prevention, we invest in making failure cheap.

The cost of failure rises significantly the further out it occurs on the development timeline. Making adjustments during requirements gathering and design is significantly cheaper than finding issues in production.2

cost curve of development

One fundamental we take from both Agile and XP is to invest in pushing failure as early in the development timeline as possible. We mitigate failures from poor requirements gathering by iterating quickly with the customer as described in Principle 1. Once we get to design and development we make failure cheap through:

  • Design reviews with stakeholders ahead of writing code
  • TDD - developers write all tests for their code; test isn’t a separate phase from development
  • Keeping master stable - check-in to master is gated by passing all unit, functional and UI tests
  • Developers can trigger CI on any topic branch; CI is massively parallelized over a cloud of fast machines

Since our testing happens during development, the next phase is production deployment. At this stage we reduce the cost of failure by:

  • Hiding beta features behind flags in the code (a minimal flag-check sketch appears below)
  • Incremental rollout, first to test accounts and then in batches of customers
  • An automated deployment process
  • Code and infrastructure that are forward- and backward-compatible, allowing rollback
  • Health checks that automatically remove down nodes
  • Returning a degraded / read-only response rather than nothing at all

An escalator can never break; it can only become stairs – Mitch Hedberg
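
To make the first item in that list concrete, here is a minimal flag-check sketch. The FeatureFlags interface, flag name, and per-tenant lookup are hypothetical; the point is the pattern: the new code path ships dark, and turning the flag off is the rollback.

    public final class BetaFeatureExample {

        /** Hypothetical flag store, e.g. backed by a per-tenant configuration table. */
        interface FeatureFlags {
            boolean isEnabled(String flagName, String tenantId);
        }

        private final FeatureFlags flags;

        BetaFeatureExample(FeatureFlags flags) {
            this.flags = flags;
        }

        String renderDashboard(String tenantId) {
            // The new code path ships dark; turning the flag off is the rollback.
            if (flags.isEnabled("new-dashboard", tenantId)) {
                return renderNewDashboard(tenantId);
            }
            return renderLegacyDashboard(tenantId);
        }

        private String renderNewDashboard(String tenantId) { return "new:" + tenantId; }
        private String renderLegacyDashboard(String tenantId) { return "legacy:" + tenantId; }
    }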

5. Automate Everything

All tasks performed routinely should be automated. These are automation principles we follow:

  • Automate every aspect of the deployment, including long-running DB migrations
  • All artifacts are immutable and versioned
  • All code modules get dependencies automatically from a central artifact server
  • Creation of base images and provisioning of new hardware is automated
  • All forms of testing are automated
  • Development environment setup is automated

Tools we use:

  • AWS - Automated provisioning of hardware
  • Chef - Configuration management
  • Ansible - Automated deployment orchestration
  • Jenkins - Continuous integration
  • Gearman - To get Jenkins to scale
  • Docker - Containerizing services

6. With Performance, Less is More

We find that, especially with performance, there are typically huge wins to be had in up-front design decisions that come at little to no cost. Our design mantras for performance are:

  1. Don’t do it
  2. Do it, but don’t do it again
  3. Do it less
  4. Do it later
  5. Do it when they’re not looking
  6. Do it concurrently
  7. Do it cheaper

In practice we implement a number of strategies to limit the risk of poorly performing code:

  • Major new features and performance tunings live behind feature flags, allowing slow rollout and tuning in a real-life environment
  • Chunk everything that scales on the order of N. When N is controlled by the customer, enforce limits and design for infinity (see the sketch after this list).
  • Slow-query and frequent-query monitoring to detect poor access patterns
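
Here is a rough sketch of the chunking strategy under assumed limits (the batch size and the hard cap are illustrative): work that scales with a customer-controlled N is processed in fixed-size batches and refused beyond a limit, so memory and transaction size stay bounded no matter how large N grows.

    import java.util.List;

    public final class ChunkedProcessor {

        private static final int CHUNK_SIZE = 500;      // illustrative batch size
        private static final int MAX_ITEMS = 100_000;   // illustrative per-request cap

        public static void processAll(List<String> items) {
            if (items.size() > MAX_ITEMS) {
                throw new IllegalArgumentException("Too many items: " + items.size());
            }
            for (int start = 0; start < items.size(); start += CHUNK_SIZE) {
                int end = Math.min(start + CHUNK_SIZE, items.size());
                processChunk(items.subList(start, end)); // bounded work per iteration
            }
        }

        private static void processChunk(List<String> chunk) {
            // ... do the actual work; memory and transaction size stay O(CHUNK_SIZE)
        }
    }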

if less is more, does that mean more is less?

Reference

  1. Ikujiro Nonaka and Hirotaka Takeuchi. The Knowledge Creating Company. Oxford University Press, 1995. Print. https://books.google.com/books/about/The_Knowledge_creating_Company.html?id=B-qxrPaU1-MC

  2. Scott Ambler. Examining the Agile Cost of Change Curve. Website. http://www.agilemodeling.com/essays/costOfChange.htm 

Productionalizing ActiveMQ

avatar-okta_logo.jpg
Okta Staff
  ·

This post describes our odyssey with ActiveMQ, an open-source message broker that implements the Java Message Service (JMS) API. We use ActiveMQ as the message broker among our app servers.

First, a word of thanks. To overcome the challenges we faced with ActiveMQ, we are greatly indebted to a very thorough description of an OpenJDK bug, as well as some other online resources. If you’re having problems with ActiveMQ, read on. Maybe our story can help you.

Growing Pains

Our problems with ActiveMQ date all the way back to 2012. They centered around high memory and CPU usage, message timeout errors, and message queue delays.

Let’s pick up the action in the spring of 2014. At that time we were battling a new wave of timeout storms and message queue delays caused by our mixed ActiveMQ configuration (broker 5.4.1, client 5.7) and increasing traffic on our site.

Of course we welcomed the growth in traffic as a byproduct of our growing business. And although we did plan to address our mixed ActiveMQ configuration, we decided to delay doing so at that time, opting instead to tune the configuration. So we increased the maximum session size from 500 to 2000, and the page size from 200 to 2000 messages. Increasing the page size served to minimize “hung queue” scenarios — a side effect of using message selectors.

Another Inflection Point

Business and site traffic continued to grow, contributing to another inflection point in the fall of 2014. Timeout storms, CPU spikes, and memory issues returned. It was clear that we could no longer put off upgrading to a newer version of ActiveMQ.

We decided to skip versions 5.7 and 5.8 in favor of 5.10, mainly because 5.7 was considered unstable, and 5.10 provided improved failover performance.

Would this upgrade finally deliver the stability that had eluded us for so long?

When Upgrades Bite Back

Unfortunately, no. Within 24 hours, memory usage soared, CPUs spiked, and instability returned. Note the dramatic CPU spikes in the following screenshot.

Active MQ CPU

To prevent these issues from impacting customers, we were forced to restart brokers, which is always an option of last resort. Restarting brokers is a delicate operation, which can entail a less-than-smooth failover, risking message loss.

We immediately increased memory, but within a day or two we ran out of memory again.

Searching for the Root Cause

An online search turned up an OpenJDK bug that identified an out-of-memory issue in ConcurrentLinkedQueue, a class in the java.util.concurrent package included with Java 1.6. When working properly, ConcurrentLinkedQueue allows elements to be added to and removed from the queue in a thread-safe manner.

The bug caused a null object to be created whenever an element at the end of the queue was added and then deleted. This behavior is particularly unfavorable to the way we use queuing. We call ActiveMQ to create and destroy objects in the queue very quickly, tens of millions of times a day, as users and agents connect to Okta. As a result, null objects rapidly fill up the queue, memory usage soars, and CPUs spike.
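
To picture that access pattern, here is a rough, illustrative sketch of the kind of churn the bug report describes: one element stays in the queue while another is added at the tail and immediately removed, millions of times. On the affected Java 6 builds this reportedly leaves debris behind and memory climbs; on a fixed JDK the queue stays flat. This is a paraphrase of the bug report’s reproduction, not Okta’s production code.

    import java.util.concurrent.ConcurrentLinkedQueue;

    public class QueueChurn {
        public static void main(String[] args) {
            ConcurrentLinkedQueue<Object> queue = new ConcurrentLinkedQueue<Object>();
            queue.add(new Object()); // keep one element so removals always hit the tail
            for (long i = 0; i < 100000000L; i++) {
                Object element = new Object();
                queue.add(element);    // add at the tail...
                queue.remove(element); // ...then immediately remove it again
            }
            System.out.println("done, final size = " + queue.size());
        }
    }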

Conference Call

With customer authentication at risk, several key engineers, including Hector Aguilar, Okta’s CTO and SVP of Engineering, met on a Saturday afternoon conference call. Discussion was intense, and our options were few and unappealing: (a) revert all the way back to broker version 5.4.1, or (b) upgrade to broker version 5.11, which was still unreleased and might introduce new problems.

As team members recall, Hector said very little during the first half of the meeting.

A bug in the JVM surprised Hector, as critical JVM bugs are relatively rare. Fortunately, the OpenJDK bug we’d found included a very thorough description of the problem, as well as sample code to reproduce it.

Initially motivated by curiosity, Hector analyzed the code and the bug description. He saw where the problem was, then checked online to see whether it had been fixed in newer JDK versions. He noticed that several things were changing in the class and that others had attempted to resolve the bug in different ways, but none in a way that would solve our particular problem. Hector developed a very simple fix of his own, trying to remain consistent with the work of others, and then verified it using the provided sample code.

The JVM has a mechanism called endorsed libraries that allows developers to override an existing class with a new class, effectively patching the JVM. Hector used this mechanism, packaged his fix into a jar file, tried it against ActiveMQ, and found that it worked.

The mood and direction of the meeting shifted dramatically when Hector said, “Guys, I have a wild idea. What if we patch the JVM?” As none of us had ever patched a JVM before, this seemed like a novel approach, even a long shot.

The Fix

Hector sent his JVM patch and sample code to the team and walked us through it. First, he explained why the other attempted fixes wouldn’t solve our particular problem. He then demonstrated how his override effectively patched the original (faulty) removal method. Members of the team volunteered to test the override at scale with our simulated environments. Within a few hours, we were fairly sure that Hector’s fix would work.

Deploying the Patch

We deployed the patch and restarted brokers. It was a success! ActiveMQ no longer ran out of memory and the CPU spikes ceased.

Active MQ CPU

Some minor memory leaks remained, but these were eliminated by upgrading to java-1.7.0.

Stable, and Looking at Other Solutions

Patching the ConcurrentLinkedQueue with ActiveMQ v5.10 and upgrading to java-1.7.0 provided acceptable stability and faster failover performance. While this is a significant improvement over where we were last fall, our goal is zero failover time, which ActiveMQ cannot deliver. So, we’re exploring other messaging solutions.

In telling our story, we couldn’t resist tooting our own horn a bit. How many CTOs actually get their hands dirty tackling product issues? Our CTO doesn’t code very often, but when he does, he patches the JVM.

Android Unit Testing Part IV: Mocking

avatar-victor_ronin.png
Victor Ronin
  ·

This is the fourth of a four-part series on Android unit testing. In the previous articles I discussed the general principles of good tests, how to run Android tests on the JVM to make them fast, and how to make your code less coupled. This article explains how to make tests isolated.

We need to mock a dependency, inject it, and then modify our test to indicate that we are not testing an end-to-end scenario anymore, but are now testing just one class at a time.

  • Modify application Gradle file

    Add the following code under the dependency section:

    androidTestCompile 'org.easymock:easymock:3.1'
    
  • Replace FooTest with the following code:

    package com.example.myapplication;
    
    import junit.framework.Assert;
    
    import org.easymock.EasyMockSupport;
    import org.junit.Before;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.robolectric.RobolectricTestRunner;
    
    import static org.easymock.EasyMock.expect;
    
    @RunWith(RobolectricTestRunner.class)
    public class FooTest extends EasyMockSupport {
        Foo sut;
    
        // Mocks
        Bar barMock;
    
        @Before
        public void setUp() {
            sut = new Foo();
    
            // Create mocks
            barMock = createMock(Bar.class);
    
            // Inject mock
            InjectHelper.injectMock(sut, barMock);
        }
    
        @Test
        public void testGetFoo_returns4() {
            // Arrange
            expect(barMock.getBar()).andReturn(4);
            replayAll();
    
            // Act
            int actualResult = sut.getFoo();
    
            // Assert
            verifyAll();
            Assert.assertEquals(4, actualResult);
        }
    }
    
  • Create a class InjectHelper under androidTest

    (I believe the original code for injecting fields is from Spring; however, it was modified afterwards.)

    package com.example.myapplication;
    
    import java.lang.reflect.Field;
    import javax.inject.Inject;
    
    public class InjectHelper {
    
        @SuppressWarnings("unchecked")
        public static void injectMock(Object target, Object mock)
        {
            Class targetClass = target.getClass();
            do {
                Field[] fields = targetClass.getDeclaredFields();
                // Iterate through all members
                for (Field field : fields) {
                    // Skip all non injectable members
                    if (field.getAnnotation(Inject.class) == null)
                        continue;
    
                    // Make private/protected members accessible
                    field.setAccessible(true);
    
                    // Get a class of the member
                    Class injectedClass = field.getType();
                    Class mockClass = mock.getClass();
    
                    // Check that mock is essentially the same class
                    if (!injectedClass.isAssignableFrom(mockClass))
                        continue;
    
                    try {
                        // Inject mock
                        field.set(target, mock);
                    } catch (IllegalAccessException e)
                    {
                        throw new RuntimeException(e);
                    }
    
                    // return accessibility
                    field.setAccessible(false);
                }
                targetClass = targetClass.getSuperclass();
            }
            while (targetClass != null && targetClass != Object.class);
        }
    }
    

    Woo-Hoo! We are finally done!

    Now, your tests are:

    • fast — they are executed on a JVM and don’t require going to the network or a persistent layer.
    • repeatable — they don’t depend on emulator stability or network quality.
    • (potentially!) simple and consistent — there is a lot of good information out there on how to write good unit tests.
    • independent — since the persistent layer isn’t used, one test won’t influence another.

    In addition to all of this awesomeness, your code should actually be better off, too. Hopefully writing unit tests will force you to simplify classes with too many dependencies and to think through interfaces more carefully.

    Thanks!

    Let me mention several people who helped me to put this article together: Wils Dawson made the initial move to use Robolectric, Nadeem Khan figured out all those pesky details about usage of Robolectric, and Hans Reichenbach put a lot of these integration steps in writing on our wiki. Thanks guys!

    https://github.com/vronin-okta/okta_blog_samples/tree/master/android_unit_testing

Android Unit Testing Part III: Disintegration

avatar-victor_ronin.png
Victor Ronin
  ·

This is the third of a four-part series on Android unit testing. In the last two articles I discussed the general principles of good tests and how to run Android tests on the JVM to make them fast. This part will show how to make your Android code less heavily coupled, a preparation step to ensure that your tests are isolated from each other.

We want to test each unit of work separately to make sure that each piece of our machinery is working properly. To do that, we need to be able to inject all dependencies into the class under test (instead of the class instantiating its production dependencies). I am not a fan of manual dependency injection (i.e., passing dependencies through constructors or setters): it requires a lot of code and drags every dependency through multiple classes just to deliver it to the end class.

There are a lot of dependency injection frameworks for Android out there: Dagger, RoboGuice, SpringAndroid, Guice, and Transfuse are a few. I won’t go into a detailed comparison, but I like Dagger the most because it provides compile-time injection and doesn’t influence runtime (specifically startup time) too much.

Again, there are detailed tutorials linked at the end of this post; my summary is below:

My Summary

  • Modify the application Gradle file

    Add the following code under the dependency section:

    compile 'com.squareup.dagger:dagger:1.2.1'
    provided 'com.squareup.dagger:dagger-compiler:1.2.1'
    
  • Modify the manifest

    Add the following code to the Application tags:

    android:name=".MyApplication"
    
  • Create the classes

    Add the MyApplication class.

    package com.example.myapplication;

    import android.app.Application;

    import java.util.Arrays;
    import java.util.List;

    import dagger.ObjectGraph;

    public class MyApplication extends Application {
        private ObjectGraph applicationGraph;
    
        @Override
        public void onCreate() {
            super.onCreate();
    
            applicationGraph = ObjectGraph.create(getModules().toArray());
        }
    
        protected List<Object> getModules() {
            return Arrays.<Object>asList(
                    new MyModule(this)
            );
        }
        public void inject(Object object) {
            applicationGraph.inject(object);
        }
    }
    
  • Add the MyModule class.

    package com.example.myapplication;
    
    import dagger.Module;
    import dagger.Provides;
    
    @Module(
            injects = {
                    MainActivity.class
            }
    )
    public class MyModule {
        private final MyApplication application;
    
        public MyModule(MyApplication application) {
            this.application = application;
        }
    }
    
  • Modify MainActivity and replace code private Foo foo = new Foo(); with:

    @Inject
    Foo foo;
    

    and add the following code to onCreate():

    // This will inject all @Inject members
    // recursively for everything that is marked as @Inject
    ((MyApplication)getApplicationContext()).inject(this);
    
  • Modify the Foo class and replace code Bar bar = new Bar(); with:

    @Inject
    Bar bar;
    

    Modify the Bar class and add:

    @Inject
    Bar() {
    }
    

Now, instances of Foo and Bar are automatically injected at runtime. However, your test will fail with a NullPointerException, because the Foo instance created in the test never gets its Bar dependency. And in the test we don’t want Dagger to inject it anyway; we want a mocked dependency, not a real one.
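
One caveat worth flagging: in Dagger 1, a class generally needs an @Inject-annotated constructor before Dagger can construct it on behalf of another class. So if the Dagger compiler complains that it cannot provide Foo for MainActivity, Foo likely needs one too. A minimal sketch of what Foo might look like at this point (building on the classes from the previous part):

    package com.example.myapplication;

    import javax.inject.Inject;

    public class Foo {

        @Inject
        Bar bar;

        // Lets Dagger construct Foo when injecting MainActivity.
        @Inject
        public Foo() {
        }

        public int getFoo() {
            return bar.getBar();
        }
    }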

Resources

Stay tuned for the final part of our series, where I will show you how to make tests isolated. You can also check out the full code at GitHub.

Android Unit Testing Part II: Escaping Dalvik’s Hold

avatar-victor_ronin.png
Victor Ronin
  ·

This is the second of a four-part series on Android unit testing. In these posts, we’ll walk through the key steps engineers should take to make Android tests fast by running them on the JVM (versus running them on an emulator).

For background information on the importance of Android testing, visit Part I of the series.

It appears that the need to run tests on an Android device or an emulator has concerned Android engineers for almost as long as Android has existed – and Christian Williams created Robolectric to solve this problem. Robolectric allows you to run unmodified test code (code that references Android-specific classes) on your desktop, in a regular JVM, instead of on an emulator or device running Android’s Dalvik VM.

I have listed several good tutorials at the end of this post that illustrate exactly how this can be done, but they also include some details you may not yet need. So, use the tutorial links for details, but check out “My Summary” for a short overview of what you need to do:

My Summary

  • Create a new project in Android Studio (I used Studio 0.8.14)

    Choose the Blank Activity example project.

  • Modify the top-level Gradle file

    Add the following code to the dependencies section:

    classpath 'org.robolectric:robolectric-gradle-plugin:0.12.+'
    
  • Modify the application Gradle file

    Add the following code under apply plugin: 'com.android.application':

    apply plugin: 'robolectric'
    
  • Add the following under the dependencies section:

    androidTestCompile('junit:junit:4.11')
    androidTestCompile('org.robolectric:robolectric:2.3')
    
  • Add this section:

    robolectric {
        include '**/*Test.class'
    }
    
  • Create the code that you want to test

    Modify MainActivity. Add the following code to it:

    private Foo foo = new Foo();
    public int getSomething() {
        return foo.getFoo();
    }
    

    Add the Foo class:

    package com.example.myapplication;
    
    public class Foo {
        Bar bar = new Bar();
    
        public int getFoo() {
            return bar.getBar();
        }
    }
    
  • Add the Bar class:

    package com.example.myapplication;
    
    public class Bar {
    
        public int getBar() {
            return 4;
        }
    }
    
  • Create a test

    Delete the ApplicationTest file.

    Create the following FooTest class under your androidTest:

    package com.example.myapplication;
    
    import junit.framework.Assert;
    
    import org.junit.Before;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.robolectric.RobolectricTestRunner;
    
    @RunWith(RobolectricTestRunner.class)
    public class FooTest {
        Foo sut;
    
        @Before
        public void setUp() {
            sut = new Foo();
        }
    
        @Test
        public void testGetFoo_returns4() {
            // Arrange
    
            // Act
            int actualResult = sut.getFoo();
    
            // Assert
            Assert.assertEquals(4, actualResult);
        }
    }
    
  • Create the configuration

    1. Create a Gradle run configuration.
    2. Name it "Tests".
    3. Choose the top-level Gradle file as the project.
    4. Enter test in the Tasks field.

    Now, without launching the emulator, you can run this configuration and see that your test has passed. It is much faster than before—and repeatable. You can put this under build automation and it will totally work.

  • JVM

    There are alternative ways to run tests on a JVM. For example, you can create a JUnit task and ensure that none of your tests and classes touch any Android-specific classes. However, this is not easy, as you must design all your code with this restriction in mind.

    The changes we made to run on the JVM are great, but we are still facing the limitations of integration tests. For example, if the implementation of the Bar class changes and now uses the network, you might start seeing flakiness in the testGetFoo_returns4 test because of a bad network connection.

Additional Resources

Stay tuned for part three of our series, where I will show you how to achieve test isolation using dependency injection. You can also check out the full code at GitHub.