
Chapter 3. Building Robust Node Applications

To make the most of the server-side JavaScript environment, it's important to understand some core concepts behind the design choices that were made for Node.js and JavaScript in general. Understanding the decisions and tradeoffs will make it easier for you to write great code and architect your systems. It will also help you explain to other people why Node.js is different from other systems they've used and where the performance gains come from. No engineer likes unknowns in their systems. Since "magic" is not an acceptable answer, it helps to be able to explain why a particular architecture is beneficial and under what circumstances.

This chapter will cover the coding styles, design patterns and production know-how you need to write good, robust Node code.

The Event Loop

A fundamental part of Node is the event loop, a concept underlying the behavior of JavaScript as well as most other interactive systems. In many languages, event models are bolted onto the side, but JavaScript events have always been a core part of the language. This is because JavaScript has always dealt with user interaction. Anyone who has used a modern Web browser is used to Web pages that do things "onclick," "onmouseover," etc. These events are so common that we hardly think about them when writing Web page interaction, but having this event support in the language is incredibly powerful. On the server, instead of the limited set of events based on the user-driven interaction with the Web page's DOM, we have an infinite variety of events based on what's happening in the server software we use. For example, the HTTP server module provides an event called "request," emitted when a user sends the Web server a request.

The event loop is the system that JavaScript uses to deal with these incoming requests from various parts of the system in a sane manner. There are a number of ways people deal with "real-time" or "parallel" issues in computing. Most of them are fairly complex and frankly make our brains hurt. JavaScript takes a simple approach that makes the process much more understandable but does introduce a few constraints. By having a grasp of how the event loop works, you'll be able to use it to its full advantage and avoid the pitfalls of this approach.

Node takes the approach that all I/O activities should be non-blocking (for reasons we'll explain more later). This means that HTTP requests, database queries, file I/O, and other things that require the program to wait do not halt execution until they return data. Instead, they run independently, and then emit an event when their data is available. This means that programming in Node.js has lots of callbacks dealing with all kinds of I/O. Callbacks often initiate other callbacks in a cascading fashion. This is very different from browser programming. There is still a certain amount of linear setup, but the bulk of the code involves dealing with callbacks.

Because of this somewhat unfamiliar programming style, we need to look for patterns to help us effectively program on the server. That starts with the event loop. I think that most people intuitively understand event-driven programming because it is like everyday life. Imagine you are cooking. You are chopping a bell pepper and a pot starts to boil over (Figure 3.1, “Event Driven People”). You finish the slice you are doing, and then turn down the stove. Rather than trying to chop and turn down the stove at the same time, you achieve the same result in a much safer manner by rapidly switching contexts. Event-driven programming does the same thing. By allowing the programmer to write code that only ever works on one callback at a time, the program is both understandable and able to quickly perform many tasks efficiently.

Figure 3.1. Event Driven People


In everyday life we are used to having all sorts of internal callbacks for dealing with events, and yet, like JavaScript, we always do just one thing at once. Yes, yes, I can see you are rubbing your tummy and patting your head at the same time, well done. But if you try to do any serious activities at the same time, it goes wrong pretty quickly. This is like JavaScript. It's great at letting events drive the action, but it's "single-threaded" so that only one thing happens at once.

This single-threaded concept is really important. One of the criticisms leveled at Node.js fairly often is its lack of "concurrency." That is, it doesn't use all of the CPUs on a machine to run the JavaScript. The problem with running code on multiple CPUs at once is that it requires co-ordination between multiple "threads" of execution. In order for multiple CPUs to effectively split up work, they would have to be able to talk to each other about the current state of the program, what work they'd each done, etc. While this is possible, it's a more complex model that requires more effort from both the programmer and the system. JavaScript's approach is simple: there is only one thing happening at once. Since everything that Node does is non-blocking, the time between an event being emitted and Node being able to act on that event is very small, because it's not waiting on things like disk I/O.

Another way to think about the event loop is to compare it to a postman (mailman). To our event loop postman, each letter is an event. He has a stack of events to deliver in order. For each letter (event) the postman gets, he walks the route to deliver the letter (Figure 3.2, “The Event Loop Postman”). The route is the callback function assigned to that event (sometimes more than one). However, critically, since our postman only has a single set of legs, he can walk only a single code path at once.

Figure 3.2. The Event Loop Postman


Sometimes, while the postman is walking a code route, someone will give him another letter. That someone is the callback function he is visiting at the moment. In this case, the postman delivers the new message immediately (after all, someone gave it to him directly instead of going via the post office, so it must be urgent). The postman will diverge from his current code path and walk the proper code path to deliver the new event. He then carries on walking the original route, the one whose callback emitted the event he just delivered.
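The postman's detour matches how Node's EventEmitter behaves: emit() calls the listeners synchronously, so an event emitted inside a callback is handled before that callback carries on. Here is a minimal sketch; the event names letter and urgent and the log array are made up for illustration.

```javascript
const { EventEmitter } = require('events');

const postman = new EventEmitter();
const log = [];

// Route for an ordinary letter. Part-way along, the callback hands the
// postman a second, urgent letter by emitting another event.
postman.on('letter', function deliverLetter() {
  log.push('start letter route');
  postman.emit('urgent');          // handled right now, before the next line
  log.push('finish letter route');
});

// Route for the urgent letter.
postman.on('urgent', function deliverUrgent() {
  log.push('walk urgent route');
});

postman.emit('letter');
console.log(log);
// [ 'start letter route', 'walk urgent route', 'finish letter route' ]
```

The urgent route is walked in the middle of the letter route, and only then does the original callback finish.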

Let's look at the behaviour of our postman in a typical program by picking something really simple. Suppose we have a Web (HTTP) server that gets requests, retrieves some data from a database, and returns it to the user. In this scenario we have a few events to deal with. First (as in most cases) comes the request event from the user asking the Web server for a Web page. The callback that deals with the initial request (let's call it callback A) looks at the request object and figures out what data it needs from the database. It then makes a request to the database for that data, passing another function, callback B, to be called on the response event. Having handled the request, callback A returns. When the database has found the data, it issues the response event. The event loop then calls callback B, which sends the data back to the user.

This seems fairly straightforward. The obvious thing to note here is the "break" in the code, which you wouldn't get in a procedural system. Since Node.js is a non-blocking system, when we get to the database call that would make us wait, we instead issue a callback. This means that different functions must start handling the request and finish handling it when the data is ready to return. So we need to make sure that we pass any state we need to the callback, or make it available in some other way. JavaScript programming typically does it through closures. We'll discuss that in more detail later.
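The request/response flow described above can be sketched without a real database. In this sketch, fakeDb, its rows, and the send function are hypothetical stand-ins; only setImmediate is a real Node API, used here to make the reply arrive on a later turn of the event loop.

```javascript
// Hypothetical stand-in for a database client: it answers on a later turn
// of the event loop, the way a non-blocking driver would.
const fakeDb = {
  rows: { '/users/1': 'Alice', '/users/2': 'Bob' },
  query(key, onResponse) {
    setImmediate(() => onResponse(this.rows[key]));
  }
};

const sent = [];

// Callback A: handles the request event, starts the query, then returns.
function callbackA(request, send) {
  // Callback B: runs later, on the response event. It can still see
  // `request` and `send` because it closes over them.
  fakeDb.query(request.url, function callbackB(name) {
    send(request.url + ' -> ' + name);
  });
}

callbackA({ url: '/users/1' }, (body) => sent.push(body));
callbackA({ url: '/users/2' }, (body) => sent.push(body));

// Both calls to callback A have returned, but no reply has gone out yet.
const repliesBeforeDbAnswered = sent.length;
console.log('replies so far:', repliesBeforeDbAnswered);

setImmediate(() => console.log(sent));
```

Callback A finishes for both requests before either callback B runs, which is the "break" in the code, and each callback B still has the right request in hand because of the closure.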

Why does this make Node more efficient? Imagine ordering food at a fast food restaurant. When you get in line at the counter, the server taking your order can behave in two ways. One of them is event driven and one of them isn't. Let's start with the typical approach taken by PHP and many other web platforms. When you give the server your order, he takes it but won't serve any other customers until he has completed your order. There are a few things he can do after he's typed in your order: process your payment, pour your drink, and so on. However, the server is still going to have to wait an unknown amount of time for the kitchen to make your burger (to one of the authors, who is vegetarian, orders always seem to take ages). If, as in the traditional approach of web application frameworks, each server (thread) is allocated to just one request at a time, the only way to scale up is to add more threads. However, it's also very obvious that our server isn't being very efficient. He's spending a lot of time waiting for the kitchen to cook the food.

Obviously, in real life restaurants use a much more efficient model. When a server has finished taking your order, you receive a number which they can use to call you back. You could say a call-back number. This is how Node works. When slow things like I/O start, Node simply gives them a callback reference and then gets on with other work that is ready now, like the next customer (or event in Node's case). It's important to note that, as we saw in the example of the postman, at no time does a restaurant server ever deal with two customers at the same time. When they are calling someone back to collect an order, they are not taking a new one, and vice versa. By acting in an event-driven way, the servers are able to maximize their throughput.

Another interesting thing this analogy illustrates is cases where Node fits well and where it doesn't fit. In a small restaurant where the kitchen staff and the wait staff are the same people, no tradeoff can be made by becoming event-driven. Since all the work is being done by the same people, event-driven architectures don't add anything. If all (or most of) the work your server does is computation, Node might not be the ideal model.

However, we can also see when the architecture fits. Imagine there are two servers and four new customers in a restaurant (Figure 3.3, “Fast-food, fast-code”). If the servers serve only one customer at a time, the first two customers will get the fastest possible order, but the third and fourth customers will get a terrible experience. The first two customers will get their food as soon as it is ready because the servers have dedicated their whole attention to fulfilling their order. That comes at the cost of the other two customers. In an event-driven model, the first two customers might have to wait a short amount of time for the servers to finish taking the orders of the third and fourth customers before they get their food, but the average wait time (latency) of the system will be much, much lower.

Figure 3.3. Fast-food, fast-code


Let's look at another example. We've given the event loop postman a letter to deliver that requires a gate to be opened. He gets there and the gate is closed, so he simply waits and tries again, and again. He's trapped in an endless loop waiting for the gate to open (Figure 3.4, “Blocking the event loop”). Perhaps there is a letter on the stack that will ask someone to open the gate so the postman can get through. Surely that will solve things, right? Unfortunately it won't unless the postman gets to deliver the letter, and currently he's stuck waiting endlessly for the gate to open. This is because the event that opens the gate is external to the current event callback. If we emit the event from within a callback, we already know our postman will go and deliver that letter before carrying on, but when events are emitted outside the currently executing piece of code, they will not be called until that piece of code has been fully evaluated to its conclusion.

Figure 3.4. Blocking the event loop


As an example, the following code creates a loop that Node.js (or a browser) will never break out of:

Example 3.1. Event loop blocking code

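var EE = require('events').EventEmitter;
var ee = new EE();

var die = false;

// Flip the flag when the 'die' event arrives
ee.on('die', function() {
    die = true;
});

// Emit 'die' after 100ms, but only if the event loop gets a turn
setTimeout(function() {
    ee.emit('die');
}, 100);

// This loop never yields to the event loop, so the timeout never fires
while(!die) {
}

console.log('done');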

            

In this example, console.log will never be called because the while loop stops Node from ever getting a chance to call back the timeout and emit the die event. Although it's unlikely we'd program a loop like this that relies on an external condition to exit, it illustrates how Node.js can only do one thing at once, and getting a fly in the ointment can really screw up the whole server. This is why non-blocking I/O is an essential part of event driven programming.

Let's consider some numbers. When we run an operation in the CPU (not a line of JavaScript but a single machine code operation), it takes about 1/3 of a nanosecond. A 3GHz processor runs 3×10^9 instructions a second, so each instruction takes 10^-9/3 seconds. There are typically two types of memory in a CPU, L1 and L2 cache, each of which takes approximately 2-5ns to access. If we get data from memory (RAM), it takes about 80ns, which is about 2 orders of magnitude slower than running an instruction. However, all of these things are in the same ballpark. Getting things from slower forms of I/O is not quite so good. Imagine that getting data from RAM is equivalent to the weight of a cat. Retrieving data from the hard drive, then, could be considered to be the weight of a whale. Getting things from the network is like 100 whales. Think about how running var foo = "bar" versus a database query is a single cat versus 100 blue whales. Blocking I/O doesn't put an actual gate in front of the event loop postman, but it does send him via Timbuktu when he is delivering his events.
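The arithmetic behind those figures is easy to check. The sketch below uses only the numbers quoted in the text (a 3GHz clock and an 80ns RAM access); the variable names are ours, and real hardware varies.

```javascript
// Figures quoted in the text; real hardware varies.
const clockHz = 3e9;                       // 3GHz processor
const nsPerInstruction = 1e9 / clockHz;    // nanoseconds per instruction
const ramAccessNs = 80;

// How many instructions could have run while waiting for one RAM access?
const instructionsPerRamAccess = Math.round(ramAccessNs / nsPerInstruction);

console.log(nsPerInstruction.toFixed(2) + ' ns per instruction');        // 0.33 ns per instruction
console.log(instructionsPerRamAccess + ' instructions per RAM access');  // 240 instructions per RAM access
```

A couple of hundred instructions per RAM access is where the "about 2 orders of magnitude" comes from.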

Given a basic understanding of the event loop, let's look at the standard Node.js code for creating an HTTP server:

Example 3.2. A basic HTTP server

var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(8124, "127.0.0.1");
console.log('Server running at http://127.0.0.1:8124/');

            

This code is the most basic example from the Node.js Web site (but as we'll see soon, it's not the ideal way to code). The example creates an HTTP server using a factory method in the http library. The factory method creates a new HTTP server and attaches a callback to the request event. The callback is specified as the argument to the createServer method. What's interesting here is what happens when this code is run. The first thing Node.js does is run the code above from top to bottom. This can be considered the 'setup' phase of Node programming. Since we attached some event listeners, Node.js doesn't exit, but waits for an event to be fired. If we didn't attach any events, Node.js would exit as soon as it had run the code.

So what happens when the server gets an HTTP request? Node.js emits the request event, which causes the callbacks attached to that event to be run in order. In this case, there is only one callback, the anonymous function we passed as an argument to createServer. Let's assume it's the first request the server has had since setup. Since there is no other code running, the request event is handled immediately and the callback is run. It's a very simple callback and it runs pretty fast.

Let's assume that our site gets really popular and we get lots of requests. If, for the sake of argument, our callback takes 1 second and we get a second request shortly after the first one, the second request isn't going to be acted on for another second or so. Obviously, a second is a really long time, and as we look at the requirements of real-world applications, the problem of blocking the event loop becomes more damaging to the user experience. The operating system kernel actually handles the TCP connections to clients for the HTTP server, so there isn't a risk of rejecting new connections, but there is a real danger of not acting on them. The upshot of this is that we want to keep Node.js as event-driven and non-blocking as possible. In the same way that a slow I/O event should use callbacks to indicate the presence of data Node.js can act on, the Node.js program itself should be written in such a way that no single callback ties up the event loop for extended periods of time.
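The effect of a slow callback on everything queued behind it can be seen with a timer. In this sketch, busyWait is a hypothetical helper standing in for a slow callback, and the 10ms and 200ms figures are arbitrary.

```javascript
// Busy-wait for roughly `ms` milliseconds, standing in for a slow callback.
function busyWait(ms) {
  const start = Date.now();
  while (Date.now() - start < ms) { /* hog the event loop */ }
  return Date.now() - start;
}

const scheduledAt = Date.now();

// Ask for a callback in 10ms...
setTimeout(function onTimer() {
  console.log('timer fired after ' + (Date.now() - scheduledAt) + ' ms');
}, 10);

// ...but keep the event loop busy for 200ms first. The timer cannot fire
// until this code has returned, so it reports roughly 200ms rather than 10.
busyWait(200);
```

The timer plays the part of the second request: it is ready long before it is acted on.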

This means that you should follow two strategies when writing a Node.js server:

  • Once setup has been completed, make all actions event-driven.

  • If Node.js is required to process something that will take a long time, consider delegating it to Web workers.
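As a rough sketch of the second strategy, and assuming a Node version that ships the built-in worker_threads module (its closest counterpart to Web workers), a long computation can be handed to another thread and its result collected through an event. The sumInWorker helper and the summing task are made up for illustration.

```javascript
const { Worker } = require('worker_threads');

// Code the worker runs, kept in a string so the sketch fits in one file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  let total = 0;
  for (let i = 1; i <= workerData.limit; i++) total += i;
  parentPort.postMessage(total);
`;

// Hypothetical helper: sum 1..limit off the main thread, then call back.
function sumInWorker(limit, callback) {
  const worker = new Worker(workerSource, {
    eval: true,
    workerData: { limit: limit }
  });
  worker.once('message', function onResult(total) { callback(null, total); });
  worker.once('error', function onError(err) { callback(err); });
}

sumInWorker(1e7, function (err, total) {
  if (err) throw err;
  console.log('sum from worker:', total);
});
console.log('main thread is free while the worker counts');
```

The main thread only issues the work and handles the message event, so the event loop stays available for other callbacks while the loop runs elsewhere.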

Taking the event-driven approach works effectively with the event loop (the name is a hint it would), but it's also important to write event-driven code in a way that is easy to read and understand. In the previous example, we used an anonymous function as the event callback, which makes things hard in a couple of ways. First, we have no control over where the code is used. An anonymous function's call stack starts from when it is used, rather than when the callback is attached to an event. This affects debugging. If everything is an anonymous function, it can sometimes be hard to distinguish similar callbacks when an exception occurs.
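The debugging difference is visible in a stack trace. The following sketch attaches one anonymous and one named callback to a made-up request event on a plain EventEmitter and prints the first stack frame from each; the name handleRequest is our own.

```javascript
const { EventEmitter } = require('events');

const server = new EventEmitter();
const stacks = {};

// Anonymous callback: nothing in the stack frame says what it is for.
server.on('request', function () {
  stacks.anonymous = new Error('boom').stack;
});

// Named callback: the name shows up in the stack frame.
server.on('request', function handleRequest() {
  stacks.named = new Error('boom').stack;
});

server.emit('request');

// Line 0 of a stack is the error message; line 1 is the innermost frame.
console.log(stacks.anonymous.split('\n')[1]);
console.log(stacks.named.split('\n')[1]);
```

Only the second frame mentions handleRequest, which is what lets you tell similar callbacks apart when an exception is reported.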

Patterns

Event-driven programming is different from procedural programming. The easiest way to learn it is to practice routine patterns that have been discovered by previous generations of programmers. That is the purpose of this chapter.

Before we launch into patterns, we'll take a look at what is really happening behind various programming styles to give the patterns some context. Most of this chapter will focus on I/O, because, as discussed in the previous chapter, event-driven programming is focused on solving problems with I/O. When it is working with data in memory that doesn't require I/O, Node can be completely procedural.

The I/O Problem Space

We'll start by looking at the types of I/O required in efficient systems. These will be the basis of our patterns.

The first obvious distinction to look at is serial versus parallel I/O. Serial is obvious: do this I/O, and after it is finished do that I/O. Parallel is more complicated to implement but also easy to understand: do this I/O and that I/O at the same time. The important point here is that ordering is normally considered implicit in serial tasks, but parallel tasks could return in any order.

Groups of serial and parallel work can also be combined. For example, two groups of parallel requests could execute serially: do this and that together, then do other and another together.

In Node, we assume that all I/O has unbounded latency. That means that any I/O task could take from 0 to infinite time. We don't know, and can't assume, how long these tasks take. So instead of waiting for them, we use placeholders (events), which then fire callbacks when the I/O happens. Since we have assumed unbounded latency, it's easy to perform parallel tasks. You simply make a number of calls for various I/O tasks. They will return whenever they are ready, in whatever order that happens to be. Ordered serial requests are also easy to make by nesting or chaining callbacks so that the first callback will initiate the second I/O request, the second callback will initiate the third, etc. Even though each request is asynchronous and doesn't block the event loop, the requests are made in serial. This pattern of ordered requests is useful when the results of one I/O operation have to inform the details of the next I/O request.

So far we have two ways to do I/O: ordered serial requests, and unordered parallel requests. Ordered parallel requests are also a useful pattern. This happens when we allow the I/O to take place in parallel but we deal with the results in a particular sequence. Unordered serial I/O offers no particular benefits, so we won't consider it as a pattern.
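One way to get ordered parallel behaviour is to start every task at once and file each result by position rather than by arrival time. This is a sketch under our own assumptions: parallelInOrder and fakeRead are hypothetical helpers, the delays are arbitrary, and error handling and the empty-list case are left out for brevity.

```javascript
// Hypothetical helper: start every task at once, but hand the results to
// `done` in the order the tasks were listed, not the order they finished.
function parallelInOrder(tasks, done) {
  const results = new Array(tasks.length);
  let remaining = tasks.length;

  tasks.forEach(function (task, index) {
    task(function onResult(value) {
      results[index] = value;          // slot chosen by position, not arrival
      remaining -= 1;
      if (remaining === 0) done(results);
    });
  });
}

// Stand-in for I/O with unpredictable latency.
function fakeRead(name, delayMs) {
  return function (callback) {
    setTimeout(function () { callback(name + ' contents'); }, delayMs);
  };
}

parallelInOrder(
  [fakeRead('foo.txt', 30), fakeRead('bar.txt', 10)],
  function (results) {
    // foo.txt is listed first even though bar.txt finished first
    console.log(results);
  }
);
```

Both reads are in flight together, so the total wait is that of the slowest task, while the code consuming the results can still rely on their order.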

Unordered parallel I/O

Let's start with unordered parallel I/O because it's by far the easiest to do in Node. In fact, all I/O in Node is unordered parallel by default. This is because all I/O in Node is asynchronous and non-blocking. When we do any I/O, we simply throw the request out there and see what happens. It's possible that all the requests will happen in the order we made them, but maybe they won't. When we talk about unordered, we aren't talking about randomized, simply that there is no guaranteed order.

Example 3.3. Unordered parallel I/O in Node

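var fs = require('fs');

fs.readFile('foo.txt', 'utf8', function(err, data) {
  if (err) { return console.error(err); }
  console.log(data);
});
fs.readFile('bar.txt', 'utf8', function(err, data) {
  if (err) { return console.error(err); }
  console.log(data);
});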

          

Simply making I/O requests with callbacks will create unordered parallel I/O. At some point in the future, both of these callbacks will fire. Which happens first is unknown, and either one could return an error rather than data without affecting the other request.

Ordered serial I/O

In this pattern, we want to do some I/O (unbounded latency) tasks in sequence. Each previous task must be completed before the next task is started. In Node, this means nesting callbacks so that the callback from each task starts the next task.

Example 3.4. Nesting callbacks to produce serial requests

server.on('request', function(req, res) {
  //get session information from memcached
  memcached.getSession(req, function(session) {
    //get information from db
    db.get(session.user, function(userData) {
      //some other web service call
      ws.get(req, function(wsData) {
        //render page
        var page = pageRender(req, session, userData, wsData);
        //output the response and finish it
        res.end(page);
      });
    });
  });
});
      

          

Although nesting callbacks allows easy creation of ordered serial I/O, it also creates so called 'pyramid' code.[6] This code can be hard to read and understand, and as a consequence hard to maintain. For instance, a glance at the previous example doesn't reveal that the completion of the memcached.getSession request launches the db.get request, that the completion of the db.get request launches the ws.get request, and so on. There are a few ways to make this code more readable without breaking the fundamental ordered serial pattern.

First, we can continue to use inline function declarations, but can name them. This makes debugging a lot easier as well as giving an indication of what the callback is going to do.

Example 3.5. Naming function calls in callbacks

server.on('request', function getMemCached(req, res) {
  memcached.getSession(req, function getDbInfo(session) {
    db.get(session.user, function getWsInfo(userData) {
      ws.get(req, function render(wsData) {
        //render page
        var page = pageRender(req, session, userData, wsData);
        //output the response and finish it
        res.end(page);
      });
    });
  });
});
      

          

Another approach that changes the style of code is to use declared functions instead of just anonymous or named ones. This removes the natural pyramid seen in the other approaches, which shows the order of execution, but it also breaks the code out into more manageable chunks.

Example 3.6. Using declared functions to separate out code

var render = function(wsData) {
  page = pageRender(req, session, userData, wsData);
}; 

var getWsInfo = function(userData) {
  ws.get(req, render);
};

var getDbInfo = function(session) {
  db.get(session.user, getWsInfo);
};

var getMemCached = function(req, res) {
  memcached.getSession(req, getDbInfo);
};

          

The code just shown won't actually work: render refers to req, session, and userData, none of which are in scope once the functions are declared separately. The original nested code used closures to encapsulate some variables and make them available to subsequent functions. Hence, declared functions can be good when state doesn't need to be maintained across three or more callbacks. If you need only the information from the last callback in order to do the next one, it works well. It can be a lot more readable (especially with documentation) than a huge lump of nested functions.

There are, of course, ways of passing data around between functions. Mostly it comes down to using the features of the JavaScript language itself. JavaScript has function scope. That means that when you declare a variable with var within a function, the variable becomes local to that function. However, simply having { and } does not limit the scope of a variable declared with var. This allows us to define variables in the outer callback that can be accessed by the inner callbacks even when the outer callbacks have "closed" by returning. When we nest callbacks, we are implicitly binding the variables from all the previous callbacks into the most recently defined callback. It just turns out that lots of nesting isn't very easy to work with.
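Both points, function scope and closures outliving the function that created them, fit in a few lines. The names handleRequest, label, and onData are invented for this sketch.

```javascript
function handleRequest(requestId) {
  // A `var` declared inside an if-block is still scoped to the whole function.
  if (requestId > 0) {
    var label = 'request #' + requestId;
  }

  // The inner function closes over `label`.
  return function onData(data) {
    return label + ' got ' + data;
  };
}

// handleRequest has already returned by the time the callbacks run,
// yet each callback still sees the variables from its own call.
const first = handleRequest(1);
const second = handleRequest(2);

console.log(first('foo'));    // request #1 got foo
console.log(second('bar'));   // request #2 got bar
```

Each call to handleRequest creates a fresh set of variables, so callbacks belonging to different requests do not interfere with one another.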

We can still perform the flattening refactoring we did, but we should do it within the shared scope of the original request, to form a closure environment around all the callbacks we want to do. This way all the callbacks relating to that initial request can be encapsulated and can share state via variables in the encapsulating callback.

Example 3.7. Encapsulating within a callback

server.on('request', function(req, res) {

  // State shared by all the callbacks for this request
  var session, userData;

  var render = function(wsData) {
    var page = pageRender(req, session, userData, wsData);
    res.end(page);
  };

  var getWsInfo = function(data) {
    userData = data;
    ws.get(req, render);
  };

  var getDbInfo = function(data) {
    session = data;
    db.get(session.user, getWsInfo);
  };

  var getMemCached = function() {
    memcached.getSession(req, getDbInfo);
  };

  // Start the chain
  getMemCached();

});

         

Not only does this approach organise code in a logical way, it also allows you to flatten a lot of the callback hell.

Other organizational innovations are also possible. Sometimes there is code you want to reuse across many functions. This is the province of middleware. There are many ways to do middleware. One of the most popular in Node is the model used by Connect, which could be said to be based on Rack from the Ruby world. The general idea behind its implementation is that we pass around some variables that represent not only the state but also the methods of interacting with that state.

In JavaScript, objects are passed by reference. That means when you call myFunction(someObject), any changes the function makes to someObject are visible through every other reference to that object. This is potentially tricky, but gives you some great powers if you are careful about any side effects created. Side effects are largely dangerous in asynchronous code. When something modifies an object used by a callback, it can often be very difficult to figure out when that change happened because it happens in a non-linear order. If you use the ability to change objects passed by argument, be considerate of where those objects are going to be used.

The basic idea is to take something that represents the state and pass it between all functions that need to act on that state. This means that all the things acting on the state need to have the same interface so they can pass between themselves. This is why Connect (and therefore Express) middleware all take the form function(req, res, next). We will discuss Connect/Express middleware in more detail later in the book.
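To make the function(req, res, next) shape concrete, here is a toy runner written for illustration; it is not Connect's implementation, and runMiddleware, parseUser, addGreeting, and the x-user header are all hypothetical names.

```javascript
// Hypothetical runner: each middleware receives the shared req and res
// objects plus a next() function that hands control to the next layer.
function runMiddleware(stack, req, res, done) {
  let index = 0;
  function next() {
    const layer = stack[index++];
    if (!layer) return done(req, res);
    layer(req, res, next);
  }
  next();
}

function parseUser(req, res, next) {
  req.user = req.headers['x-user'] || 'anonymous';   // add state for later layers
  next();
}

function addGreeting(req, res, next) {
  res.body = 'Hello, ' + req.user;                    // uses what parseUser added
  next();
}

const req = { headers: { 'x-user': 'tom' } };
const res = {};

runMiddleware([parseUser, addGreeting], req, res, function (req, res) {
  console.log(res.body);   // Hello, tom
});
```

Because every layer has the same signature, layers can be reordered or reused across applications, and each one sees the changes made to req and res by the layers before it.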

In the meantime, let's look at the basic approach. When we share objects between functions, earlier functions in the call stack can affect the state of those objects such that the later functions utilize the changes:

Example 3.8. Passing changes between functions

var AwesomeClass = function() {
  this.awesomeProp = 'awesome!'
  this.awesomeFunc = function(text) {
    console.log(text + ' is awesome!')
  }
}

var awesomeObject = new AwesomeClass()

function middleware(func) {
  var oldFunc = func.awesomeFunc
  func.awesomeFunc = function(text) {
    text = text + ' really'
    oldFunc(text)
  }
}

function anotherMiddleware(func) {
  func.anotherProp = 'super duper' 
}

function caller(input) {
  input.awesomeFunc(input.anotherProp)
}

middleware(awesomeObject)
anotherMiddleware(awesomeObject)
caller(awesomeObject)

         

Writing Code for Production

One of the challenges of writing a book is how to explain things in the simplest way possible. That runs counter to showing techniques and functional code that you'd want to deploy. While we should always strive to have the simplest, most understandable code possible, sometimes you need to do things that make code more robust or faster, at the cost of making it less simple. This section provides guidance about how to harden the applications you deploy, which you can take with you as you explore upcoming chapters. This section is about writing code with maturity that will keep your application running long into the future. It's not exhaustive, but if you write robust code you won't have to deal with so many maintenance issues. One of the trade-offs of Node's single-threaded approach is a tendency to be brittle. These techniques help mitigate this risk.

Deploying a production application is not the same as running test programs on your laptop. Servers can have a wide variety of resource constraints, but they tend to have a lot more resources than the typical machine you would develop on. Typically, front-end servers have many more cores (CPUs) than a laptop or desktop machine, but less hard drive space. They also have a lot of RAM. Node itself currently has some constraints, such as a maximum JavaScript heap size. This affects the way you deploy because you want to maximize the use of the CPUs and memory on the machine while using Node's easy-to-program, single-threaded approach.

Error Handling

As we saw earlier in the chapter, you can split I/O activities from other things in Node. Error handling is one of those things. JavaScript includes try/catch functionality, but it's appropriate only for errors that happen inline. When you do non-blocking I/O in Node, you pass a callback to the function. This means the callback is going to run when the event happens, outside of the try/catch block. We need to be able to provide error handling that works in asynchronous situations. Consider the following code:

Example 3.9. Trying to catch an error in a callback, and failing

var http = require('http')

var opts = {
  host: 'sfnsdkfjdsnk.com',
  port: 80,
  path: '/'
}

try {
  http.get(opts, function(res) {
    console.log('Will this get called?')
  })
}
catch (e) {
  console.log('Will we catch an error?')
}

        

When you call http.get(), what is actually happening? We pass some parameters specifying the I/O we want to happen and a callback function. When the I/O completes, the callback function will be fired. However, the http.get() call itself succeeds as soon as the request has been issued and the callback registered. An error that occurs later, during the GET, cannot be caught by a try/catch block.

The disconnect from I/O errors is even more obvious in the Node REPL. Since the REPL shell prints out any return values that are not assigned, we can see that the return value of calling http.get() is the http.ClientRequest object that is created. This means that the try/catch did its job by making sure the specified code returned without errors. However, since the hostname is nonsense, a problem will occur within this I/O request. This means the callback can't be successfully completed. A try/catch can't help with this because the error has happened outside the JavaScript, and when Node is ready to report it we are not in that call stack any more. We've moved on to dealing with another event.
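The same disconnect can be reproduced without touching the network. In this sketch, fakeGet is a hypothetical stand-in for an I/O call; it reports its failure through a callback rather than an error event, purely to keep the example short.

```javascript
// Hypothetical stand-in for an I/O call that fails some time after it returns.
function fakeGet(host, callback) {
  setImmediate(function () {
    callback(new Error('cannot resolve ' + host));
  });
  return { host: host };   // returns normally, much as http.get() returns its request object
}

let caughtInline = false;
let reportedLater = null;

try {
  fakeGet('sfnsdkfjdsnk.com', function (err) {
    reportedLater = err;   // runs on a later turn of the event loop
  });
} catch (e) {
  caughtInline = true;     // never reached: nothing was thrown inside the try block
}

console.log('caught inline?', caughtInline);   // caught inline? false
setImmediate(function () {
  console.log('reported later:', reportedLater.message);
});
```

By the time the failure is known, the try block has long since finished, so the only place left to handle the error is wherever the I/O layer chooses to report it.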

The way we deal with this in Node is by using the error event. This is a special event that is fired when an error occurs. It allows a module engaging in I/O to fire an alternative event to the one the callback was listening for to deal with the error. The error event allows us to deal with any errors that might occur in any of the callbacks that happen in any modules we use. Let's write the previous example correctly:

Example 3.10. Catching an I/O error with the error event

var http = require('http')

var opts = {
  host: 'dskjvnfskcsjsdkcds.net',
  port: 80,
  path: '/'
}

var req = http.get(opts, function(res) {
  console.log('This will never get called')
})

req.on('error', function(e) {
  console.log('Got that pesky error trapped')
})

        

By using the error event, we got to deal with the error (in this case by ignoring it). Our program survived, however, which is the main thing. Like try/catch in JavaScript, the error event catches all kinds of exceptions. A good general approach to exception handling is to set up conditionals to check for known error conditions and deal with them if possible. Otherwise, catching any remaining errors, logging them, and keeping your server running is probably the best approach.
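One way to check for known error conditions is to look at the code property that Node attaches to system errors. This is only a sketch under some assumptions: the function names and the action strings are hypothetical, and the two codes shown (ENOTFOUND for a failed hostname lookup, ECONNREFUSED for a refused connection) are the ones reported by recent Node releases, so check what your version actually emits.

```javascript
// Decide what to do with an error delivered through the error event.
// The action names returned here are made up for this sketch.
function classifyError(e) {
  if (e && e.code === 'ENOTFOUND') {
    return 'bad-hostname'       // known condition: the host does not resolve
  }
  if (e && e.code === 'ECONNREFUSED') {
    return 'retry-later'        // known condition: nothing is listening yet
  }
  return 'log-and-continue'     // anything else: log it and keep running
}

// Listener suitable for req.on('error', onRequestError)
function onRequestError(e) {
  var action = classifyError(e)
  if (action === 'log-and-continue') {
    console.log('Unexpected error: ' + (e && e.message))
  } else {
    console.log('Known error (' + e.code + '), action: ' + action)
  }
  return action
}
```

In the previous example, this would be attached with req.on('error', onRequestError) in place of the anonymous function.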

Using Multiple Processors

As we've mentioned, Node is single-threaded. This means Node uses only one processor to do its work. However, most servers have several multi-core processors, and a single multi-core processor contains many cores. A server with two physical CPU sockets might have "24 logical cores," that is, 24 processors exposed to the operating system. To make the best use of Node, we should use those too. So if we don't have threads, how do we do that?

Node provides a module called cluster that allows you to delegate work to child processes. This means that Node creates a copy of its current program in another process (on Windows, it is actually another thread). Each child process has some special abilities, such as the ability to share a socket with other children. This allows us to write Node programs that start many other Node programs and then delegate work to them.

It is important to understand that when you use cluster to share work between a number of copies of a Node program, the master process isn't involved in every transaction. The master process manages the child processes, but when the children interact with I/O they do it directly, not through the master. This means if you set up a web server using cluster, requests don't go through your master process, but directly to the children. Hence, dispatching requests does not create a bottleneck in the system.

By using the cluster API you can distribute work to a Node process on every available core of your server. This makes the best use of the hardware you have. Let's look at a simple cluster script:

Example 3.11. Using Cluster to distribute work

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork workers.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('death', function(worker) {
    console.log('worker ' + worker.pid + ' died');
  });
} else {
  // Worker processes have a http server.
  http.Server(function(req, res) {
    res.writeHead(200);
    res.end("hello world\n");
  }).listen(8000);
}

        

In this example, we use a few parts of Node core to evenly distribute the work across all of the CPUs available: the cluster module, the http module, and the os module. From the latter, we simply get the number of CPUs on the system.

The way cluster works is that each Node process becomes either a "master" or a "worker" process. When a master process calls the cluster.fork() method, it creates a child process that is identical to the master, except for two attributes that each process can check to see whether it is a master or child. In the master process, which is the one in which the script has been directly invoked by calling it with Node, cluster.isMaster returns true whereas cluster.isWorker returns false. cluster.isMaster returns false on the child, whereas cluster.isWorker returns true.

The example shows a master script that invokes a worker for each CPU. Each child starts an HTTP server. This is another unique aspect of cluster. When you listen() on a socket where cluster is in use, many processes can listen on the same socket. If you simply started several Node processes with node myscript.js, this wouldn't be possible, because the second process to start would throw the EADDRINUSE exception. cluster provides a cross-platform way to invoke several processes that share a socket. And even if the children all share a connection to a port, if one of them is jammed, it doesn't stop the other workers from getting connections.
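The EADDRINUSE failure can be reproduced without cluster. The sketch below makes a simplifying assumption: it uses two servers inside one process rather than two separate Node processes, which runs into the same restriction. The function name tryDoubleListen is hypothetical, and port 0 asks the operating system for any free port. Because an error listener is attached, the failure arrives as an error event instead of an uncaught exception.

```javascript
var http = require('http')

// Start one server on a free port, then try to bind a second server to the
// same port. Calls back with the second server's error code, or with null if
// the second listen() unexpectedly succeeded.
function tryDoubleListen(callback) {
  var first = http.createServer()
  var second = http.createServer()

  first.listen(0, '127.0.0.1', function() {
    var port = first.address().port

    second.on('error', function(e) {
      first.close()             // release the port so the process can exit
      callback(e.code)
    })
    second.on('listening', function() {
      second.close()
      first.close()
      callback(null)
    })
    second.listen(port, '127.0.0.1')
  })
}

tryDoubleListen(function(code) {
  console.log('Second listen() result: ' + code)
})
```

Without cluster, the second listen() should report EADDRINUSE. Workers created with cluster.fork() avoid this because they share the listening socket instead of each trying to bind the port.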

We can do more with cluster than simply share sockets, because it is based on the child_process module. This gives us a number of attributes, some of the most useful ones relating to the health of the child processes. In the previous example, when a child dies, the master process uses console.log() to print out a death notification. However, a more useful script would cluster.fork() a new child.

Example 3.12. Forking a new worker when a death occurs

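if (cluster.isMaster) {
  // Fork workers.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  // Replace any worker that dies so that every CPU stays busy.
  cluster.on('death', function(worker) {
    console.log('worker ' + worker.pid + ' died');
    cluster.fork();
  });
}
// The require() calls and the worker branch (the else block) are unchanged
// from Example 3.11.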

        

This simple change means that our master process can keep restarting dying processes to keep our server firing on all CPUs. However, this is just a basic check for running processes. We can also do some fancier tricks. Since workers can pass messages to the master, we can have each worker report some stats, such as memory usage, to the master. This will allow the master to determine when workers are becoming unruly or confirm that workers are not freezing or getting stuck in long-running events.

Example 3.13. Monitoring worker health using message passing

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

var rssWarn = (12 * 1024 * 1024)
  , heapWarn = (10 * 1024 * 1024)

if(cluster.isMaster) {
  for(var i=0; i<numCPUs; i++) {
    var worker = cluster.fork();
    worker.on('message', function(m) {
      if (m.memory) {
        if(m.memory.rss > rssWarn) {
          console.log('Worker ' + m.process + ' using too much memory.')
        }
      }
    })
  }
} else {
  //Server
  http.Server(function(req,res) {
    res.writeHead(200);
    res.end('hello world\n')
  }).listen(8000)
  //Report stats once a second
  setInterval(function report(){
    process.send({memory: process.memoryUsage(), process: process.pid});
  }, 1000)
}

        

In this example, workers report on their memory usage and the master sends an alert to the log when a process uses too much memory. This replicates the functionality of many health reporting systems that operations teams already use. It gives control to the master Node process, however, which has some benefits. This message passing interface allows the master process to send messages back to the workers too. This means you can treat a master process as a lightly loaded admin interface to your workers.
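Messages in the other direction work the same way. The following is a sketch, not part of the cluster API itself: the settings object, the handleMasterMessage function, and the command names are all hypothetical. It relies only on the object returned by cluster.fork() having a send() method and on workers receiving those messages through process.on('message').

```javascript
var cluster = require('cluster')

// Worker-side state that the master is allowed to change (hypothetical)
var settings = { reportInterval: 1000, accepting: true }

// Apply one command sent by the master. Returns true if it was understood.
// The command names are invented for this sketch.
function handleMasterMessage(m) {
  if (!m || !m.cmd) {
    return false
  }
  if (m.cmd === 'setReportInterval' && m.ms > 0) {
    settings.reportInterval = m.ms
    return true
  }
  if (m.cmd === 'pause') {
    settings.accepting = false
    return true
  }
  return false
}

if (cluster.isWorker) {
  // In a worker, messages from the master arrive on the process object
  process.on('message', handleMasterMessage)
}

// In the master, the counterpart would look like this:
//   var worker = cluster.fork()
//   worker.send({cmd: 'setReportInterval', ms: 5000})
```

Keeping the command handling in a plain function means it can be exercised on its own, without starting any workers.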

There are other things we can do with message passing that we can't do from outside Node. Since Node relies on an event loop to do its work, there is the danger that the callback of an event in the loop could run for a long time. This means that other users of the process are not going to get their requests met until that long-running event's callback has concluded. Since the master process has a connection to each worker, we can tell it to expect an "all ok" notification periodically. This lets us confirm that the event loop is turning over at an appropriate rate and hasn't become stuck on one callback. Sadly, identifying a long-running callback doesn't allow us to make a callback for termination. Any notification we could send to the process would get added to the event queue, so it would have to wait for the long-running callback to finish. This means that while using the master process allows us to identify zombie workers, our only remedy is to kill the worker and lose all the tasks it was doing.

Some preparation can give you the capability to kill an individual worker that threatens to take over its processor.

Example 3.14. Killing zombie workers

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

var rssWarn = (50 * 1024 * 1024)
  , heapWarn = (50 * 1024 * 1024)

var workers = {}

if(cluster.isMaster) {
  for(var i=0; i<numCPUs; i++) {
    createWorker()
  }
  
  setInterval(function() {
    var time = new Date().getTime()
    for(var pid in workers) {
      if(workers.hasOwnProperty(pid) &&
         workers[pid].lastCb + 5000 < time) {

        console.log('Long running worker ' + pid + ' killed')
        workers[pid].worker.kill()
        delete workers[pid]
        createWorker()
      }
    }
  }, 1000)
} else {
  //Server
  http.Server(function(req,res) {
    //mess up 1 in 200 reqs
    if (Math.floor(Math.random() * 200) === 4) {
      console.log('Stopped ' + process.pid + ' from ever finishing')
      while(true) { continue }
    }
    res.writeHead(200);
    res.end('hello world from '  + process.pid + '\n')
  }).listen(8000)
  //Report stats once a second
  setInterval(function report(){
    process.send({cmd: "reportMem", memory: process.memoryUsage(), process: process.pid})
  }, 1000)
}

function createWorker() {
  var worker = cluster.fork()
  console.log('Created worker: ' + worker.pid)
  workers[worker.pid] = {worker:worker, lastCb: new Date().getTime() - 1000} //allow boot time
  worker.on('message', function(m) {
    if(m.cmd === "reportMem") {
      workers[m.process].lastCb = new Date().getTime()
      if(m.memory.rss > rssWarn) {
        console.log('Worker ' + m.process + ' using too much memory.')
      }
    }
  })
}

        

In this script we've added an interval to the master as well as the workers. Now whenever a worker sends a report to the master process, the master stores the time of the report. Every second or so, the master process looks at all its workers to check whether any of them haven't responded in longer than 5 seconds (written as 5000 in the code, because times are in milliseconds). If that is the case, it kills the stuck worker and restarts it. In order to make this process effective, I moved the creation of workers into a small function. This allows me to do the various pieces of setup in a single place, whether I am creating a new worker or restarting a dead one.
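The staleness check at the heart of the master's interval can be pulled out into a function of its own and tried with made-up data. This is a sketch: findStuckWorkers is a hypothetical helper name, and the sample pids and timestamps are invented. It uses the same pid-to-record shape and the same comparison as the example above.

```javascript
// Return the pids of workers whose last report is older than timeoutMs.
// `workers` maps pid -> {lastCb: time of last report in milliseconds}.
function findStuckWorkers(workers, now, timeoutMs) {
  var stuck = []
  for (var pid in workers) {
    if (workers.hasOwnProperty(pid) &&
        workers[pid].lastCb + timeoutMs < now) {
      stuck.push(pid)
    }
  }
  return stuck
}

var now = 20000
var sample = {
  '101': {lastCb: 19500},  // reported half a second ago
  '102': {lastCb: 14000},  // silent for six seconds
  '103': {lastCb: 15000}   // exactly five seconds: not yet over the limit
}

console.log(findStuckWorkers(sample, now, 5000))  // prints [ '102' ]
```

Only worker 102 is reported, because the comparison is strict: a worker has to be silent for more than the timeout, not exactly the timeout, before it is killed.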

I also made a small change to the HTTP server in order to give each request a 1 in 200 chance of failing, so you can run the script and see what it's like to get failures. If you do a bunch of parallel requests from several sources, you'll see the way this works. These are all entirely separate Node programs that interact via message passing. This means that no matter what happens, the master process can check on the other processes, because the master is a really small program that won't get jammed.
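The examples in this section listen for the cluster module's death event and read worker.pid, which matches the Node release they were written for. Later Node releases deliver the same notification as an exit event and expose the process ID as worker.process.pid. The sketch below shows the restart logic in that form. The function name makeExitHandler is hypothetical, and fork and log are passed in as parameters (an assumption made so that the handler can be tried without starting any workers).

```javascript
// Build the handler for the cluster 'exit' event.
// fork: a function that starts a replacement worker
// log:  a function that records a line of text
function makeExitHandler(fork, log) {
  return function(worker, code, signal) {
    // Newer releases pass the exit code and, if the worker was killed,
    // the name of the signal that ended it
    log('worker ' + worker.process.pid + ' died (' + (signal || code) + ')')
    fork()
  }
}

// Wiring for a master process on a newer Node release. It is left commented
// out so that running this file on its own does not start any workers:
//   var cluster = require('cluster')
//   cluster.on('exit', makeExitHandler(function() { cluster.fork() }, console.log))
```

If you are following along on a newer release and the death handler never fires, this naming difference is the first thing to check.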



[6] This term was coined by Tim Caswell

Site last updated on: February 29, 2012 at 06:00:28 PM PST