Caching With Hapi

In most cases, it’s better to serve old data quickly than to make the user wait for the “correct” data. It’s certainly faster. This is called caching. You want your APIs to be fast? Cache your APIs.

You could probably slap redis or memcached into your existing setup, but what if you’re running an environment like MRI that has to spin up a new process for each request? Or what if you’ve got a number of legacy systems that serve XML, and you want to combine parts of their data into a JSON API for your new mobile app?

Enter hapi.js, a full-featured node.js server framework from WalmartLabs. It was first deployed to power Walmart.com under the extreme load of the Black Friday rush.

So yes, I’d say it’s production ready.

Starting Simple

Let’s start with the example app from the hapi.js README. I’ve made some slight modifications, removing comments and changing the route to be /.

Create a new directory, and copy the following into an index.js file:

var Hapi = require('hapi');

var server = Hapi.createServer('localhost', 8000);

server.route({
  method: 'GET',
  path: '/',
  handler: function (request, reply) {
    reply('hello world');
  }
});

server.start();

Create a package.json file, install hapi, and start your server by running:

npm init
npm install hapi --save
node .

You can now view your app at http://localhost:8000/

You might notice that Hapi doesn’t output anything. Unlike Express or Sinatra, there’s no helpful message to tell you where to point your browser, or any default request logging. If you can’t remember the settings you used, you can add a line like this to your file:

console.log("Listening on: " + server.info.uri);

Getting Data

In order to display data, we need to get data. We’ll use another module in the Hapi family called Nipple to make an HTTP request out to a public API at data.nasa.gov. All the Hapi modules are Ren & Stimpy themed, which is either awesome or awkward depending on who you have to explain it to. This module is a reference to season 2 episode 3.

First install the module: npm install nipple --save

Then require the module at the top of your file:

var Nipple = require('nipple');

Lastly, modify the server route to make the request:

server.route({
  method: 'GET',
  path: '/',
  handler: function (request, reply) {
    Nipple.get('http://data.nasa.gov/api/get_recent_datasets/', function (err, res, payload) {
      reply(err || payload);
    });
  }
});

Load up the page, and you’ll see a bunch of JSON, but wait! Check the headers on that response and you’ll see content-type: text/html.

Hapi automatically sets a content type based on what you hand to the reply function. Since we’re passing a string, Hapi sets a text/html content type. We could force the content type, but we don’t want to accidentally serve invalid JSON, so let’s parse the JSON we received from NASA. Modify the reply line:

reply(err || JSON.parse(payload));

We’re now passing an object to the reply function, so Hapi will stringify it and set the application/json content-type header. If NASA happens to return invalid JSON, our parse will throw an error and Hapi will respond with a 500 status code and the following JSON:

{
  "statusCode": 500,
  "error": "Internal Server Error",
  "message": "An internal server error occurred"
}
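
As an aside, the “force the content type” route is also available: the reply interface hands back a response object you can chain configuration onto. A minimal sketch, assuming the response object’s type method, and trusting NASA’s payload to be valid JSON (which is exactly the risk we avoid by parsing):

handler: function (request, reply) {
  Nipple.get('http://data.nasa.gov/api/get_recent_datasets/', function (err, res, payload) {
    if (err) {
      return reply(err);
    }
    // Skip the parse and declare the content type ourselves; this trusts the upstream payload as-is.
    reply(payload).type('application/json');
  });
}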

Caching

Hapi lets you set cache settings on routes, but that only affects the cache headers it sends down. In order to cache the server data we need to define a server method. (You can also cache manually, but server methods are so convenient that I encourage you to use them.)
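
For reference, route-level cache config looks something like this sketch (option names to the best of my knowledge for this version of hapi); it shapes the Cache-Control header the client sees and nothing more:

server.route({
  method: 'GET',
  path: '/',
  config: {
    // Only affects the Cache-Control header; nothing is cached on the server.
    cache: {
      expiresIn: 30 * 1000,
      privacy: 'public'
    }
  },
  handler: function (request, reply) {
    reply('hello world');
  }
});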

Server methods are designed to be asynchronous, so they’ll take a node-style callback as the final argument: function (err, result){}. By convention, this callback is called next.

Define the following in your file:

var getNasaData = function(next) {
  Nipple.get('http://data.nasa.gov/api/get_recent_datasets/', function (err, res, payload) {
    console.log("Getting data..."); // log so we can tell when a request is made
    if (err) {
      next(err);
    } else {
      next(null, JSON.parse(payload));
    }
  });
};

Now we need to register this function as a server method:

server.method('getNasaData', getNasaData, {});

We’ve used the same name for the server method as the function we defined above. This is purely for convenience; we can name the server method whatever we want. We’re also passing in an empty object for the config. We’ll fix that shortly.

First, modify the route handler to use the new server method:

handler: function (request, reply) {
  server.methods.getNasaData(function(error, result) {
    reply(error || result);
  });
}

Restart your server, and you should see the same data, along with a Getting data... line in the log output.

Now for the cool part! Modify your server method to this:

var SECOND = 1000;
server.method('getNasaData', getNasaData, {
  cache: {
    expiresIn: 10 * SECOND
  }
});

Restart the server, and hit that endpoint. You’ll see Getting data... the first time, but if you hit the endpoint again within 10 seconds, you won’t see that log output.

If you find your data is not being cached, the first thing to check is that you’re passing the results of your server function to next correctly. If you leave off the first argument to next, you’re passing the result in as the error argument, which (because it’s not really an error object) will get relayed as a response. The end result is that everything looks fine, but caching doesn’t work.
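
In other words, watch the difference between these two calls inside getNasaData:

// Wrong: the result lands in the error slot, gets relayed as the response, and never gets cached.
next(JSON.parse(payload));

// Right: null for the error, then the result. This is what gets cached.
next(null, JSON.parse(payload));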

I’d Like to Have an Argument

Let’s get a little more complicated. It’s rare that you just want to get data from a URL that never changes. The NASA API accepts a count query param, so let’s add that in:

var getNasaData = function(count, next) {
  var url = 'http://data.nasa.gov/api/get_recent_datasets/?count=' + count;
  Nipple.get(url, function (err, res, payload) {
    console.log("Getting data...");
    if (err) {
      next(err);
    } else {
      next(null, JSON.parse(payload));
    }
  });
};

Now we can make our route use the query param passed, if available:

server.route({
  method: 'GET',
  path: '/',
  handler: function (request, reply) {
    var count = request.query.count || 10;
    server.methods.getNasaData(count, function(error, result) {
      reply(error || result);
    });
  }
});

Restart your server, and repeated quick calls will not call the function. But if you change the count param, it will!

Since our arguments are simple strings/numbers, Hapi can generate our cache key automatically. If we use an object, we have to tell it how to generate the key. If the object has a unique ID, you can use that, but in many cases, the solution is as simple as stringifying the object into JSON.

var getNasaData = function(opts, next) {
  var count = opts.count || 10;
  var url = 'http://data.nasa.gov/api/get_recent_datasets/?count=' + count;
  Nipple.get(url, function (err, res, payload) {
    console.log("Getting data...");
    if (err) {
      next(err);
    } else {
      next(null, JSON.parse(payload));
    }
  });
};

var SECOND = 1000;
server.method('getNasaData', getNasaData, {
  cache: {
    expiresIn: 60 * SECOND
  },
  generateKey: function(opts) {
    return JSON.stringify(opts);
  }
});

server.route({
  method: 'GET',
  path: '/',
  handler: function (request, reply) {
    server.methods.getNasaData(request.query, function(error, result) {
      reply(error || result);
    });
  }
});

Of course, this approach means that http://localhost:8000/?count=10&foo=bar will be cached separately from http://localhost:8000/?count=10, which is not what we want. In this case, our previous solution was better.
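
If you do want to keep handing the whole query object to the server method, one option is a generateKey that only looks at the params you actually care about. A sketch, assuming count is the only param that should affect the cache:

server.method('getNasaData', getNasaData, {
  cache: {
    expiresIn: 60 * SECOND
  },
  generateKey: function(opts) {
    // Ignore unrelated query params, so ?count=10&foo=bar shares a cache entry with ?count=10.
    return String(opts.count || 10);
  }
});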

Persisting the Cache

You can cache the results of your function for as long as you want, but if you restart the server with the code above, you’ll lose your cache. In order to prevent that, we’ll need to use an external persistence mechanism. Let’s show an example with redis.

First, make sure you have redis installed and running on your machine on the default port (6379). If you run redis-cli on your machine, you should get:

redis 127.0.0.1:6379>

Run npm install catbox-redis --save. Then change your createServer call to include some cache config:

var server = Hapi.createServer('localhost', 8000, {
  cache: 'catbox-redis'
});

Voila! Restart your server, and results should still be cached, even between server restarts. The default value for cache is catbox-memory, which is why we were able to use caching before specifying this server config.
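
In other words, the setup we had before behaves roughly as if we’d spelled out the default explicitly:

var server = Hapi.createServer('localhost', 8000, {
  cache: 'catbox-memory'
});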

If your redis box isn’t on localhost or the default port, you can give more verbose config options like so:

var server = Hapi.createServer('localhost', 8000, {
  cache: {
    engine: 'catbox-redis',
    host: '0.0.0.0',
    port: 6379
  }
});

NOTE: Don’t use localhost or 127.0.0.1 for the host. Those won’t work.

If you’re using RedisToGo on Heroku (which sets a REDISTOGO_URL environment variable), you’ll need some code like this:

var cache_cfg;

if (process.env.REDISTOGO_URL) {
  var rtg = require("url").parse(process.env.REDISTOGO_URL);
  cache_cfg = {
    engine: 'catbox-redis',
    host: rtg.hostname,
    port: rtg.port,
    password: rtg.auth.split(":")[1]
  };
} else {
  cache_cfg = 'catbox-redis';
}

var server = Hapi.createServer('localhost', 8000, {
  cache: cache_cfg
});

There are better ways to write environment-specific configuration (including Hapi’s configuration module), but this will get you started.

One Weird Trick

If you play around with the server, you’ll see that cached results come back in just a couple milliseconds, but uncached results are much slower, around a second.

We can do better. Let’s fix it with some configuration:

var SECOND = 1000;
var MINUTE = 60 * SECOND;
server.method('getNasaData', getNasaData, {
  cache: {
    expiresIn: 60 * MINUTE,
    staleIn: 10 * SECOND,
    staleTimeout: 100
  }
});

Now values are considered stale after 10 seconds. If a value is stale, the server will try to generate a new value, but if that takes longer than the timeout (100 milliseconds), we respond with the stale cached value. The calculation of the new value still happens in the background, so if someone hits the endpoint a few seconds later, they’ll get the new cached value.

With this config in place, the only time someone will have to pay the full cost of the external API call is if nobody else has requested that data for an hour and the cache has actually expired. Otherwise they’ll never wait more than 100ms!

I would not recommend using these values in production, but they do illustrate the behavior nicely.

Conclusion

Hapi makes it simple to cache your server responses in very smart ways. The ability to cache intermediate results means that you repeat the least amount of work possible. The stale config settings warm your cache while remaining completely invisible to the user.

We haven’t even talked about monitoring your Hapi app in production, running packs of servers, or Hapi’s awesome configuration possibilities. The documentation for Hapi v3 is still a few months out, but the team is responsive on Github, and the project has a lot going for it.

Completed code

var Hapi = require('hapi');
var Nipple = require('nipple');

var server = Hapi.createServer('localhost', 8000, {
  cache: 'catbox-redis'
});

var getNasaData = function(count, next) {
  var url = 'http://data.nasa.gov/api/get_recent_datasets/?count=' + count;
  Nipple.get(url, function (err, res, payload) {
    console.log("Getting data... Count: " + count);
    if (err) {
      next(err);
    } else {
      next(null, JSON.parse(payload));
    }
  });
};

var SECOND = 1000;
var MINUTE = 60 * SECOND;
server.method('getNasaData', getNasaData, {
  cache: {
    expiresIn: 60 * MINUTE,
    staleIn: 10 * SECOND,
    staleTimeout: 100
  }
});

server.route({
  method: 'GET',
  path: '/',
  handler: function (request, reply) {
    var count = request.query.count || 10;
    server.methods.getNasaData(count, function(error, result) {
      reply(error || result);
    });
  }
});

server.start();
console.log("Listening on: " + server.info.uri);