Monday, April 6, 2015

Reactive Systems

Manifestos

From time to time, groups try to succinctly state a set of guiding principles. A great example is the Agile Manifesto, which taught software developers and their leadership to adapt more than plan.

Until 2012 there was no succinct engineering explanation of the desired properties of a system built to take advantage of the cloud. The cloud used to be the territory of startups who couldn't afford to run their own servers, or web-scale companies who couldn't afford not to manage massive data centers all around the world.

More recently, companies of every size have seen the need for scalable systems. Due to a massive increase in interest, we are starting to see a consensus form on the best practices for services in cloud environments. Enter the Reactive Manifesto.

Reactive Manifesto

The Reactive Manifesto was released in June 2012 by the good folks at Typesafe and gurus like Erik Meijer.

The word "Reactive" is used because it advocates systems that "react" to changes:

  • React to Messages (Be Message Driven)
  • React to Load (Be Elastic)
  • React to Failures (Be Resilient)
  • React to Users (Be Responsive)

Isolated Components

Before diving into the properties of a reactive system I think it's helpful to understand what an isolated component is.

An isolated component is a self-contained, encapsulated process. Your component could be a Windows service, Java daemon, Erlang module, RESTful API endpoint or any number of other technologies. It just has to isolate its work from the other components inside your system.



A and B are separate components. A and B do different work. They are isolated components.

Message-Driven

Having a Message-Driven system means relying on asynchronous messages between isolated components. We want to send a message to a different component in our system, have it read the message and do some work.

Let's take a look at an example:


  1. Component A sends a message to component B.
  2. B reads the message and does some work.
While this is a simple concept it's also powerful because:
  • A and B are Loosely Coupled:
    • A and B can be changed separately with little fear of one component's changes affecting the other.
    • A and B do not need to be on the same piece of hardware or even in the same building. A can place a message on a Message Queue and it will eventually make it to B.
    • B does not need to work on the message as soon as A is done. B can work on the message when it has the capacity to.
  • B is Non-Blocking to A:
    • A doing work is not tied to B being able to do work. After A sends the message to B, A is free to work on its next task.
  • B can give Back-Pressure to A:
    • If B is falling behind on its work, it can tell A to slow down or stop sending messages.
  • Because we are using a Message Queue we achieve Location Transparency:
    • A does not have to know where B is in physical space. A only needs to know the name of B's queue. The message will eventually make it to B.
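
As a rough illustration of the points above, here is a minimal Python sketch. It assumes an in-process queue.Queue as a stand-in for a real message queue; the names b_inbox, component_a and component_b are made up for this example. A only knows the queue, never B itself, and the bounded queue gives a crude form of back-pressure.

```python
import queue
import threading

# Hypothetical stand-in for a message broker. maxsize bounds the queue so a
# slow consumer eventually pushes back on the producer.
b_inbox = queue.Queue(maxsize=2)
results = []

def component_b():
    """B reads messages when it has capacity and does some work."""
    while True:
        message = b_inbox.get()
        if message is None:              # sentinel: no more messages
            break
        results.append(message.upper())  # the "work"

def component_a(messages):
    """A sends messages to B's queue and never calls B directly."""
    for message in messages:
        # put() blocks only while B's queue is full (back-pressure).
        b_inbox.put(message)
    b_inbox.put(None)

worker = threading.Thread(target=component_b)
worker.start()
component_a(["order-1", "order-2", "order-3"])
worker.join()
print(results)  # ['ORDER-1', 'ORDER-2', 'ORDER-3']
```

In a real deployment the queue would live in a broker outside both processes, which is what lets A and B run on different hardware.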

Elastic

Your system needs to be elastic. Not only does it need to be designed so that adding more capacity is easy, it also has to add or remove capacity automatically. When the Reactive Manifesto was first introduced this property was named "Scalable." However, the authors didn't feel it emphasized automatic handling of capacity, so it was changed to "Elastic."

A system has to be able to:
  • Scale up
  • Scale down
  • Detect when a component has failed and relaunch it

This ensures the following features:
  • Resilience:
    • If a component dies it will be relaunched.
  • Efficient use of resources: 
    • If a system is on a public cloud, it will only be charged for the resources it actually needs.
    • If it's in a private cloud, other systems of lesser priority will have access to resources when the resources become available.
  • Responsiveness:
    • If the system comes under heavy load we can scale to meet demand.
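
One way to picture the scale up / scale down decision is a small function that maps the current backlog to a number of instances. This is only a sketch under assumed inputs: queue_depth, per_instance_capacity and the min/max limits are hypothetical parameters, and a real platform would use its own metrics and autoscaling rules.

```python
import math

def desired_instances(queue_depth, per_instance_capacity,
                      min_instances=1, max_instances=10):
    """Return how many instances should run for the current backlog."""
    needed = math.ceil(queue_depth / per_instance_capacity)
    # Clamp so we never scale to zero or beyond the allowed budget.
    return max(min_instances, min(max_instances, needed))

print(desired_instances(queue_depth=450, per_instance_capacity=100))   # 5
print(desired_instances(queue_depth=0, per_instance_capacity=100))     # 1
print(desired_instances(queue_depth=5000, per_instance_capacity=100))  # 10
```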

Resilience

Resilience is a measure of how quickly your system can recover from:
  • Software Failures
  • Hardware Failures
  • Connection Failures
Because each component encapsulates its own work, failures do not propagate to other components. Moreover, if a server dies or a component crashes, our system will relaunch the component somewhere else.
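
The relaunch behaviour can be sketched as a tiny supervisor loop. This is an illustration only: supervise, flaky_component and max_restarts are invented names, and it restarts the component in the same process rather than on another server.

```python
def supervise(component, max_restarts=3):
    """Run component(); if it raises, relaunch it up to max_restarts times."""
    restarts = 0
    while True:
        try:
            return component(), restarts
        except Exception as error:
            if restarts == max_restarts:
                raise  # give up and let a higher level deal with it
            restarts += 1
            print(f"component failed ({error}); relaunch attempt {restarts}")

attempts = {"count": 0}

def flaky_component():
    """Hypothetical component that crashes twice before succeeding."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("connection lost")
    return "done"

result, restarts = supervise(flaky_component)
print(result, restarts)  # done 2
```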

Responsive

The system should provide rapid and consistent response times even:
  • Under heavy load
  • When failures are encountered
A system that responds in an untimely manner is not just performing poorly. The slow response is considered a failure and should be dealt with automatically.
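
Treating a slow response as a failure can be sketched with a timeout and a fallback answer. The example below uses Python's asyncio.wait_for; the respond helper, the handlers and the 0.1 second budget are assumptions made for illustration.

```python
import asyncio

async def respond(handler, timeout_seconds, fallback):
    """Await handler(); treat a late answer as a failure and return fallback."""
    try:
        return await asyncio.wait_for(handler(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return fallback

async def fast_handler():
    return "full result"

async def slow_handler():
    await asyncio.sleep(1.0)  # simulates a component under heavy load
    return "full result"

print(asyncio.run(respond(fast_handler, 0.1, "try again later")))  # full result
print(asyncio.run(respond(slow_handler, 0.1, "try again later")))  # try again later
```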

Four Principles Interacting

At this point it should be obvious that each principle overlaps with the others. Having a responsive system also means having a resilient system for the same basic reasons. This is usually demonstrated by showing the following diagram:


Message-Driven supports all of the other properties of Reactive Systems. Elastic and Resilient support each other and Responsive.

An N-Tier Application

Let's try to apply these principles. We want to build a system that:
  • Takes a request
  • Contacts an external resource for some data
  • Does a calculation
  • Stores the result

An N-Tier system would look like:



This design is fine for a small system that does not experience a lot of traffic. Limited to thousands of requests a day, this is a perfectly appropriate solution. However, if it gets hundreds of thousands or millions of requests, we need to start breaking it apart into isolated components.

This design also suffers from a number of problems:
  • All work for each request is done all at once:
    • If any part of the system fails, including the External Resource (which our system has no control over) the whole request fails and must be tried again.
    • Parts of our system sit idle waiting for other parts to finish. The Business Layer cannot do calculations until a result is returned from the External Resource.
  • In order to scale this application we have to deploy the whole system multiple times.
  • If we need to update one piece of the system there is a higher probability of affecting functionality in a different layer.
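
To make the first problem concrete, here is a hedged sketch of the N-Tier request path as one synchronous function. The function and resource names are hypothetical and the calculation is a placeholder. Every step waits for the one before it, and a failure in the External Resource fails the whole request.

```python
def handle_request(request_id, fetch_external, store):
    """Handle one request the N-Tier way: every step runs in sequence."""
    data = fetch_external(request_id)  # blocks until the External Resource answers
    result = data * 2                  # Business Layer calculation (placeholder)
    store[request_id] = result         # Data Layer write
    return result

def healthy_resource(request_id):
    return 21

def broken_resource(request_id):
    raise ConnectionError("external resource unavailable")

store = {}
print(handle_request(1, healthy_resource, store))  # 42

try:
    handle_request(2, broken_resource, store)
except ConnectionError as error:
    # Nothing was calculated or stored for request 2; it must be retried.
    print(f"request 2 failed: {error}")

print(store)  # {1: 42}
```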

A Reactive Implementation

Let's take a look at a reactive implementation.



While this system is more complicated it provides a number of benefits:
  • Independent Scalability:
    • If Calculator takes more time to do work the system can spin up more instances of it independent of the rest of the system.
  • Isolated Components:
    • If Calculator fails or is slow it will not stop Data Retriever or DB Updater from working.
    • The External Resource's impact on our system is isolated to the Data Retriever component.
    • I can make changes to any component with little fear of affecting the other components.
  • These components can be spread across the globe and I don't have to know their locations.
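
The sketch below shows roughly how such a design could hang together, assuming the three components (Data Retriever, Calculator, DB Updater) are connected by queues. The in-process queues, the threads and the placeholder work functions are all assumptions for illustration; in production each stage would be a separate process behind a real message queue.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down

def stage(work, inbox, outbox):
    """Run one isolated component: read a message, do work, pass it on."""
    while True:
        message = inbox.get()
        if message is STOP:
            if outbox is not None:
                outbox.put(STOP)  # let the next stage shut down too
            break
        result = work(message)
        if outbox is not None:
            outbox.put(result)

database = {}  # stands in for the real data store

def retrieve_data(request_id):
    # Placeholder for the call to the External Resource.
    return (request_id, request_id * 10)

def calculate(item):
    request_id, value = item
    return (request_id, value + 1)

def update_db(item):
    request_id, value = item
    database[request_id] = value

requests, retrieved, calculated = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=stage, args=(retrieve_data, requests, retrieved)),
    threading.Thread(target=stage, args=(calculate, retrieved, calculated)),
    threading.Thread(target=stage, args=(update_db, calculated, None)),
]
for thread in threads:
    thread.start()
for request_id in (1, 2, 3):
    requests.put(request_id)
requests.put(STOP)
for thread in threads:
    thread.join()
print(database)  # {1: 11, 2: 21, 3: 31}
```

Because each stage only touches its own inbox and outbox, more Calculator workers could read from the same queue without the other stages changing.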

It's All Been Done Before

The principles outlined in the Reactive Manifesto are not new. In the 1980s Erlang was using them. There's even significant overlap with Service Oriented Architecture and Microservices. However, having a succinct and accessible way to communicate a complex topic like cloud architecture is a great way to get discussions going inside companies.