Chapter 4. Core APIs
There are a lot of APIs in Node but some of them are more important than others. These core APIs will form the backbone of any Node app. You'll find yourself using them again and again.
Events
The first API we are going to look at is the Events API. This is because, while abstract, it is a fundamental piece of making every other API work. By having a good grip on this API, you'll be able to use all the other APIs effectively.
If you've ever programmed JavaScript in the browser you'll have used events before. However, the event model used in the browser comes from the DOM rather than JavaScript itself. A lot of the concepts in the DOM don't necessarily make sense out of that context. Let's look at the DOM model of events and compare it to the implementation in Node.
The DOM has a user-driven event model based on user interaction with a set of interface elements arranged in a tree structure (HTML, XML, etc). This means when a user interacts with a particular part of the interface there is an event and a context, which is the HTML/XML element on which the click or other activity took place. That context has a parent and potentially children. Since the context is within a tree, the model includes the concepts of bubbling and capturing. This allows elements either up or down the tree to receive the event that was called.
For example, in an HTML list, a click event on an <li> bubbles up, so it can be handled by a listener on the <ul> that is its parent. Conversely, during the capture phase a listener on the <ul> can intercept the click on its way down to the <li>. Since JavaScript objects don't have this kind of tree structure, the model in Node is much simpler.
EventEmitter
Because the event model is tied to the DOM in browsers, Node created the EventEmitter class in order to provide some basic event functionality. All event functionality in Node revolves around EventEmitter, because it is also designed to be an interface class for other classes to extend. It would be unusual to call an EventEmitter instance directly.
EventEmitter has a handful of methods, the main two being on and emit. The class provides these methods to other classes to use. The on method creates an event listener for an event:
Example 4.1. Listening for an event with the on method
server.on('event', function(a, b, c) {
  //do things
});
The on method takes two parameters: the name of the event to listen for and the function to call when that event is emitted. Since EventEmitter is an interface pseudo-class, the class that inherits from EventEmitter is expected to be invoked with the new keyword. Let's look at how we create a new class as a listener.
Example 4.2. Creating a new class that supports events with EventEmitter
var util = require('util'),
    EventEmitter = require('events').EventEmitter;
var Server = function() {
  console.log('init');
};
util.inherits(Server, EventEmitter);
var s = new Server();
s.on('abc', function() {
  console.log('abc');
});
This example begins by including the util module so we can use the inherits method. We are going to look at util in detail later in the chapter, so don't worry too much about how inherits works right now.
We then include the events module. However, we want to access just the specific EventEmitter class inside that module. Note how EventEmitter is capitalized to show it is a class. We didn't use a createEventEmitter method because we aren't planning to use an EventEmitter directly. We simply want to attach its methods to the Server class we are going to make.
Once we have included the modules we need, the next step is to create our basic Server class. This offers just one simple function, which logs a message when it is initialized. In a real implementation, we would decorate the Server class prototype with the functions that the class would use. For the sake of simplicity, we've skipped that. The important step is to use util.inherits to add EventEmitter as a superclass of our Server class.
When we want to use the Server class, we instantiate it with new Server(). This instance of Server will have access to the methods in the superclass (EventEmitter). That means we can add a listener to our instance using the on method.
Right now, however, the event listener we added will never be called, because the abc event isn't fired. We can fix this by adding some code to emit the event.
Example 4.3. Emitting an event
s.emit('abc');
Firing the event listener is as simple as calling the emit method that the Server instance inherited from EventEmitter. It's important to note that these events are instance based. There are no global events. When you call the on method, you attach to a specific EventEmitter-based object. Even the various instances of the Server class don't share events. The instance s from the code sample will not share the same events as another Server instance, such as one created by var z = new Server();.
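To make the instance-based nature of events concrete, here is a minimal sketch. The Server class and the calls array are illustrative names of my own, and calling EventEmitter.call(this) in the constructor is my assumption about good practice rather than something the earlier examples do.

```javascript
var EventEmitter = require('events').EventEmitter;
var util = require('util');

// Hypothetical Server class, modelled on the one in the text.
function Server() {
  EventEmitter.call(this); // let EventEmitter set up its own state
}
util.inherits(Server, EventEmitter);

var s = new Server();
var z = new Server();
var calls = []; // records which listeners actually ran

s.on('abc', function() { calls.push('s heard abc'); });
z.on('abc', function() { calls.push('z heard abc'); });

// Only the listener attached to s runs; z never sees this event.
s.emit('abc');
console.log(calls);
```

Emitting abc on z afterwards would run only the listener attached to z.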
Callback syntax
An important part of using events is dealing with callbacks. Chapter 3 looks at best practices in much more depth, but we'll look here at the mechanics of callbacks in Node. They use a few standard patterns, but first let's discuss what is possible.
Example 4.4. Passing parameters when emitting an event
s.emit('abc', a, b, c);
When calling emit, in addition to the event name, you can also pass an arbitrary list of parameters. The previous example included three such parameters. These will be passed to the function listening to the event. When you receive a request event from the http Server, for example, you receive two parameters: req and res. When the request event was emitted, those parameters were passed as the second and third arguments to the emit.
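As a small sketch of how those extra arguments travel from emit to the listener (the variable names and the values passed are made up for illustration):

```javascript
var EventEmitter = require('events').EventEmitter;

var emitter = new EventEmitter();
var received = null;

// The listener's parameters line up with the arguments after the event name.
emitter.on('abc', function(a, b, c) {
  received = [a, b, c];
});

// emit returns true when at least one listener was called.
var hadListeners = emitter.emit('abc', 1, 'two', { three: 3 });
// No listener is registered for this name, so emit returns false.
var nobodyListening = emitter.emit('unheard', 1);

console.log(received, hadListeners, nobodyListening);
```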
It is important to understand how Node calls the event listeners, because it will affect your programming style. When emit() is called with arguments, the following code is used to call each event listener.
Example 4.5. Calling event listeners from emit
if (arguments.length <= 3) {
  // fast case
  handler.call(this, arguments[1], arguments[2]);
} else {
  // slower
  var args = Array.prototype.slice.call(arguments, 1);
  handler.apply(this, args);
}
This code uses both of the JavaScript methods for calling a function from code. If emit() is passed three or fewer arguments, the method takes a shortcut and uses call. Otherwise it uses the slower apply to pass all the arguments as an array. The important thing to recognize here, though, is that Node makes both of these calls using the this argument directly. This means that the context in which the event listeners are called is the context of EventEmitter, not their original context. Using the Node REPL, you can see what is happening when things get called by EventEmitter.
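If call and apply are unfamiliar, the following standalone sketch may help. It is not Node's implementation; handler, context, and emitLike are hypothetical names used only to show that both methods invoke the same function with an explicit this.

```javascript
// A plain function that depends on its `this` value.
function handler(a, b) {
  return this.name + ' got ' + a + ' and ' + b;
}

var context = { name: 'emitter' };

// call takes the arguments one by one...
var viaCall = handler.call(context, 'x', 'y');
// ...while apply takes them as a single array.
var viaApply = handler.apply(context, ['x', 'y']);

// Hypothetical emit-like wrapper: drop the event name, forward the rest.
function emitLike(eventName) {
  var args = Array.prototype.slice.call(arguments, 1);
  return handler.apply(context, args);
}
var viaWrapper = emitLike('abc', 'x', 'y');

console.log(viaCall, viaApply, viaWrapper);
```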
Example 4.6. The changes in context caused by EventEmitter
> var EventEmitter = require('events').EventEmitter,
... util = require('util');
>
> var Server = function() {};
> util.inherits(Server, EventEmitter);
> Server.prototype.outputThis= function(output) {
... console.log(this);
... console.log(output);
... };
[Function]
>
> Server.prototype.emitOutput = function(input) {
... this.emit('output', input);
... };
[Function]
>
> Server.prototype.callEmitOutput = function() {
... this.emitOutput('innerEmitOutput');
... };
[Function]
>
> var s = new Server();
> s.on('output', s.outputThis);
{ _events: { output: [Function] } }
> s.emitOutput('outerEmitOutput');
{ _events: { output: [Function] } }
outerEmitOutput
> s.callEmitOutput();
{ _events: { output: [Function] } }
innerEmitOutput
> s.emit('output', 'Direct');
{ _events: { output: [Function] } }
Direct
true
>
The sample output first sets up a Server class. It includes functions to emit the output event. The outputThis method is attached to the output event as an event listener. When we emit the output event from various contexts, we stay within the scope of the EventEmitter object, so the value of this that s.outputThis has access to is the one belonging to the EventEmitter. This means that if we wish to create event listeners that make use of a different this, it must be passed in as a parameter and assigned to a variable.
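Here is a short sketch of that consequence and of one way around it. The logger object and its record method are hypothetical, and using Function.prototype.bind is my own suggestion; a closure that captures the object would work just as well.

```javascript
var EventEmitter = require('events').EventEmitter;

var emitter = new EventEmitter();

// Hypothetical object whose method relies on its own `this`.
var logger = {
  prefix: 'log: ',
  lines: [],
  record: function(message) {
    this.lines.push(this.prefix + message);
  }
};

var seenThis;
// An ordinary function listener is called with `this` set to the emitter.
emitter.on('output', function() {
  seenThis = this;
});
// bind fixes `this` to logger, so the method still works as a listener.
emitter.on('output', logger.record.bind(logger));

emitter.emit('output', 'hello');
console.log(seenThis === emitter, logger.lines);
```

Passing logger.record without bind would run the method with this pointing at the emitter, which has no lines array to push to.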
HTTP
One of the core tasks of Node.js is to act as a web server. This is such a key part of the system that when Ryan Dahl started the project, he rewrote the HTTP stack for V8 to make it non-blocking. Although both the API and the internals for the original HTTP implementation have morphed a lot since it was created, the core activities are still the same. The Node implementation of HTTP is non-blocking and fast. Much of the code has moved from C into JavaScript.
HTTP uses a pattern that is common in Node. Pseudo-class[7] factories provide an easy way to create a new server. The http.createServer() method provides us with a new instance of the HTTP Server class. It is this Server class that we use to define the actions taken when Node receives incoming HTTP requests. There are a few other main pieces of the HTTP module, and other Node modules in general. These are the events the Server class fires and the data structures that are passed to the callbacks. Knowing about these three types of class allows you to use the HTTP module well.
HTTP Servers
Acting as an HTTP server is probably the commonest current use case for Node. In the previous chapter, we set up an HTTP server and used it to serve a very simple request. However, HTTP is a lot more multifaceted than that. The server component of the HTTP module provides the raw tools to build complex and comprehensive Web servers. In this chapter, we are going to explore the mechanics of dealing with requests and issuing responses. Even if you end up using a higher level server such as Express, many of the concepts it uses are extensions of those defined here.
As we've already seen, the first step in using HTTP servers is to create a new server using the http.createServer() method. This returns a new instance of the Server class. This class has only a few methods, because most of the functionality is going to be provided using events. The http Server class has six events and three methods. The other thing to notice is how most of the methods are used to initialize the server, whereas events are used during its operation.
Let's start by creating the smallest basic HTTP server code we can.
Example 4.7. A simple, but very short, HTTP server
require('http').createServer(function(req,res){res.writeHead(200, {}); res.end('hello world');}).listen(8125);
This example is not good code. However, it illustrates some important points. We'll fix the style shortly. The first thing we do is require the http module. Notice how I can chain methods in order to access the module without first assigning it to a variable. Many things in Node return a function,[8] which allows us to invoke those functions immediately. From the included http module we call createServer. This doesn't have to take any arguments, but we pass it a function to attach to the request event. Finally, we tell the server created with createServer to listen on port 8125.
I hope you never write code like this in real situations, but it does show the flexibility of the syntax and the potential brevity of the language. Let's be a lot more explicit about our code. The following rewrite should make it a lot easier to understand and maintain.
Example 4.8. A simple, but more descriptive, HTTP server
var http = require('http');
var server = http.createServer();
var handleReq = function(req, res) {
  res.writeHead(200, {});
  res.end('hello world');
};
server.on('request', handleReq);
server.listen(8125);
This example implements the minimal web server again. However, we've started assigning things to named variables. This not only makes the code easier to read than when it's chained, but also means you can reuse it. For example, it's not uncommon to use http more than once in a file. You might want to have both an HTTP server and an HTTP client, so reusing the module object is really helpful. While JavaScript doesn't force you to think about memory, it doesn't mean you should thoughtlessly litter unnecessary objects everywhere. So rather than use an anonymous callback, I've named the function that handles the request event. This is less about memory usage and more about readability. I'm not saying you shouldn't use anonymous functions, but if you can lay out your code so it's easy to find, that helps a lot when maintaining it.
Note
Remember to look at Part I of the book for more help with programming style. Chapters 3 and 4 deal with programming style in particular.
Because we didn't pass the request event listener as part of the factory method for the http Server object, we need to add an event listener explicitly. Calling the on method from EventEmitter does this. Finally, as with the previous example, we call the listen method with the port we want to listen on. The http Class provides other functions, but this example illustrates the most important ones.
The http server supports a number of events, which are associated with either the TCP or HTTP connection to the client. The connection and close events indicate the build-up or tear-down of a TCP connection to a client. It's important to remember that some clients will be using HTTP 1.1, which supports keep-alive. This means that their TCP connections may remain open across multiple HTTP requests.
The request, checkContinue, upgrade, and clientError events are associated with HTTP requests. We've already used the request event, which signals a new HTTP request.
The checkContinue event indicates a special case. It allows you to take more direct control of an HTTP request in which the client streams chunks of data to the server. It is emitted when a client sends a request with an Expect: 100-continue header, asking whether it may go ahead and send the request body. If an event handler is created for this event, the request event will not be emitted.
The upgrade event is emitted when a client asks for a protocol upgrade. The http server will deny HTTP upgrade requests unless there is an event handler for this event.
Finally, the clientError event passes on error events sent by the client.
The HTTP server can emit a few events. The most common one is request, but you can also get events associated with the TCP connection for the request as well as other parts of the request life-cycle.
When a new TCP stream is created for a request, a connection event is emitted. This event passes the TCP stream for the request as a parameter. The stream is also available as a request.connection variable for each request that happens through it. However, only one connection event will be emitted for each stream. This means that many requests can happen from a client with only one connection event.
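To watch these events fire in order, here is a hedged sketch that starts a server on the loopback interface, makes one request against it, and shuts down. Listening on port 0 (so the operating system picks a free port), the /ping path, and the events array are choices of mine for the demonstration.

```javascript
var http = require('http');

var events = []; // the order in which server events were observed

var server = http.createServer();

server.on('connection', function(socket) {
  events.push('connection');
});
server.on('request', function(req, res) {
  events.push('request ' + req.url);
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('hello world');
});
server.on('close', function() {
  events.push('close');
  console.log(events);
});

// Port 0 asks the operating system for any free port.
server.listen(0, '127.0.0.1', function() {
  var port = server.address().port;
  // agent: false avoids keeping the connection open after the response.
  http.get({ host: '127.0.0.1', port: port, path: '/ping', agent: false }, function(res) {
    res.resume(); // discard the body so that 'end' fires
    res.on('end', function() {
      server.close();
    });
  });
});
```

With a keep-alive client, several request events could arrive between a single connection event and the eventual close.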
HTTP Clients
Node is also great when you want to make outgoing HTTP connections. This is useful in many contexts, such as using Web Services, connecting to document store databases, or just scraping web sites. You can use the same http module when doing HTTP requests, but should use the http.ClientRequest class. There are two factory methods for this class: a general purpose one and a convenience method. Let's take a look at the general purpose case:
Example 4.9. Creating an HTTP request
var http = require('http');
var opts = {
  host: 'www.google.com',
  port: 80,
  path: '/',
  method: 'GET'
};
var req = http.request(opts, function(res) {
  console.log(res);
  res.on('data', function(data) {
    console.log(data);
  });
});
req.end();
The first thing you can see is that an options object defines a lot of the functionality of the request. We must provide the host name (although an IP address is also acceptable), the port, and the path. The method is optional and defaults to a value of GET if none is specified. In essence, the example is specifying that the request should be an HTTP GET request to http://www.google.com/ on port 80.
The next thing we do is use the options object to construct an instance of http.ClientRequest using the factory method http.request(). This method takes an options object and an optional callback argument. The passed callback listens to the response event. When a response event is received, we can process the results of the request. In the previous example, we simply output the response object to the console. However, it's important to notice that the body of the HTTP response is actually received via a stream in the response object. As such, you can subscribe to the data event of the response object to get the data as it becomes available.
The final important point to notice is that we had to end() the request. Since this was a GET request, we didn't write any data to the server, but for other HTTP methods, such as PUT or POST, you may need to. Until we call the end() method, request won't initiate the HTTP request because it doesn't know whether it should still be waiting for us to send data.
Making HTTP GET requests
Since GET is such a common HTTP use case, there is a special factory method to support it in a more convenient way:
Example 4.10. Simple HTTP GET requests
var http = require('http');
var opts = {
  host: 'www.google.com',
  port: 80,
  path: '/'
};
var req = http.get(opts, function(res) {
  console.log(res);
  res.on('data', function(data) {
    console.log(data);
  });
});
This example of http.get() does exactly the same thing as the previous example, but it's slightly more concise. We've lost the method attribute of the config object and we didn't have to call request.end() because it's implied.
If you run the previous two examples, you are going to get back raw Buffer objects. As described later in this chapter, a Buffer is a special class defined in Node to support the storage of arbitrary, binary data. Although it's certainly possible to work with these, you often want a specific encoding, such as UTF-8 (an encoding for Unicode characters). You can specify this with the response.setEncoding() method:
Example 4.11. Comparing raw Buffer output to output with a specified encoding
> var http = require('http');
> var req = http.get({host:'www.google.com', port:80, path:'/'}, function(res) { console.log(res); res.on('data', function(c) { console.log(c); }); });
> <Buffer 3c 21 64 6f 63 74 79 70
...
65 2e 73 74>
<Buffer 61 72 74 54 69
...
69 70 74 3e>
>
> var req = http.get({host:'www.google.com', port:80, path:'/'}, function(res) { res.setEncoding('utf8'); res.on('data', function(c) { console.log(c); }); });
> <!doctype html><html><head><meta http-equiv="content-type
...
load.t.prt=(f=(new Date).getTime());
})();
</script>
>
In the first case, we do not call ClientResponse.setEncoding(), and we get chunks of data in Buffers. Although the output is abridged in the previous print-out, you can see that it isn't just a single Buffer, but that several Buffers have been returned with data. In the second example, the data is returned as UTF-8 because we specified res.setEncoding('utf8'). The chunks of data returned from the server are still the same, but are given to the program as strings in the correct encoding rather than as raw Buffers. Although the print-out may not make this clear, there is one string for each of the original Buffers.
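If you'd rather not depend on an external site, the same comparison can be sketched against a throwaway local server. Everything here other than http.get and setEncoding is an assumption for the demo: the fetchChunks helper, the results object, and the use of port 0 to get a free port.

```javascript
var http = require('http');

var results = {};

// Local stand-in for the remote site.
var server = http.createServer(function(req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain; charset=utf-8' });
  res.end('hello world');
});

// Hypothetical helper: fetch '/' and collect the chunks exactly as delivered.
function fetchChunks(port, encoding, done) {
  http.get({ host: '127.0.0.1', port: port, path: '/', agent: false }, function(res) {
    if (encoding) {
      res.setEncoding(encoding);
    }
    var chunks = [];
    res.on('data', function(chunk) { chunks.push(chunk); });
    res.on('end', function() { done(chunks); });
  });
}

server.listen(0, '127.0.0.1', function() {
  var port = server.address().port;
  // First without an encoding: chunks arrive as Buffers.
  fetchChunks(port, null, function(raw) {
    results.rawIsBuffer = Buffer.isBuffer(raw[0]);
    // Then with utf8: the same data arrives as strings.
    fetchChunks(port, 'utf8', function(decoded) {
      results.decodedType = typeof decoded[0];
      results.text = decoded.join('');
      console.log(results);
      server.close();
    });
  });
});
```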
Uploading Data for HTTP POST and PUT
Not all HTTP is GET. You might also need to call POST, PUT, and other HTTP methods that alter data on the other end. This is functionally the same as making a GET request, except you are going to write some data upstream:
Example 4.12. Writing data to an upstream service
var http = require('http');
var options = {
  host: 'www.example.com',
  port: 80,
  path: '/submit',
  method: 'POST'
};
var req = http.request(options, function(res) {
  res.setEncoding('utf8');
  res.on('data', function (chunk) {
    console.log('BODY: ' + chunk);
  });
});
req.write("my data");
req.write("more of my data");
req.end();
This example is very similar to the previous example, but uses the http.ClientRequest.write() method. This method allows you to send data upstream. And as explained earlier, it requires you to explicitly call http.ClientRequest.end() to indicate you're finished sending data. Whenever ClientRequest.write() is called, the data is sent upstream (it isn't buffered), but the server will not respond until ClientRequest.end() is called.
You can stream data to a server using ClientRequest.write() by coupling the writes to the data event of a Stream. This is ideal if you need to, for example, send a file from disk to a remote server over HTTP.
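As a rough sketch of that idea, the following code uploads a small file by forwarding each data event to ClientRequest.write(). The temporary file, the local echo-style server standing in for the remote service, and the receivedBody variable are all assumptions made so the example is self-contained.

```javascript
var fs = require('fs');
var http = require('http');
var os = require('os');
var path = require('path');

// Hypothetical temp file standing in for "a file from disk".
var filePath = path.join(os.tmpdir(), 'upload-example-' + process.pid + '.txt');
fs.writeFileSync(filePath, 'line one\nline two\n');

var receivedBody = '';

// Local stand-in for the remote server: it collects the request body.
var server = http.createServer(function(req, res) {
  req.setEncoding('utf8');
  req.on('data', function(chunk) { receivedBody += chunk; });
  req.on('end', function() {
    res.writeHead(200, {});
    res.end('ok');
  });
});

server.listen(0, '127.0.0.1', function() {
  var req = http.request({
    host: '127.0.0.1',
    port: server.address().port,
    path: '/submit',
    method: 'POST',
    agent: false
  }, function(res) {
    res.resume(); // discard the response body
    res.on('end', function() {
      fs.unlinkSync(filePath); // clean up the temp file
      server.close();
    });
  });

  // Couple the file's data events to writes on the request.
  var file = fs.createReadStream(filePath);
  file.on('data', function(chunk) { req.write(chunk); });
  file.on('end', function() { req.end(); });
});
```

This sketch ignores back-pressure. For large files, file.pipe(req) is probably the better choice, because pipe pauses the reader when the request can't keep up.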
The ClientResponse Object
The ClientResponse object stores a variety of information about the response. In general, it is pretty intuitive. Obvious things that are often useful include statusCode (which contains the HTTP status) and headers (which is the response header object). Also hung off of ClientResponse are various streams and properties, which you may or may not want to interact with directly.
URL
The URL module provides tools for easily parsing and dealing with URL strings. It's extremely useful if you have to deal with URLs. The module offers three methods: parse, format, and resolve. Let's start by looking at an example of parse using Node REPL.
Example 4.13. Parsing a URL using the URL module
> var URL = require('url');
> var myUrl = "http://www.nodejs.org/some/url/?with=query&param=that&are=awesome#alsoahash";
> myUrl
'http://www.nodejs.org/some/url/?with=query&param=that&are=awesome#alsoahash'
> parsedUrl = URL.parse(myUrl);
{ href: 'http://www.nodejs.org/some/url/?with=query&param=that&are=awesome#alsoahash'
, protocol: 'http:'
, slashes: true
, host: 'www.nodejs.org'
, hostname: 'www.nodejs.org'
, hash: '#alsoahash'
, search: '?with=query&param=that&are=awesome'
, query: 'with=query&param=that&are=awesome'
, pathname: '/some/url/'
}
> parsedUrl = URL.parse(myUrl, true);
{ href: 'http://www.nodejs.org/some/url/?with=query&param=that&are=awesome#alsoahash'
, protocol: 'http:'
, slashes: true
, host: 'www.nodejs.org'
, hostname: 'www.nodejs.org'
, hash: '#alsoahash'
, search: '?with=query&param=that&are=awesome'
, query:
{ with: 'query'
, param: 'that'
, are: 'awesome'
}
, pathname: '/some/url/'
}
>
The first thing we do, of course, is to require the URL module. Note that the names of modules are always lowercase. I've created a url as a string containing all the parts that will be parsed out. Parsing is really easy: we just call the parse method from the URL module on the string. It returns a data structure representing the parts of the parsed URL. The components it produces are:
href
protocol
host
auth
hostname
port
pathname
search
query
hash
The href is the full URL that was originally passed to parse. The protocol is the protocol used in the URL, including the trailing colon: e.g., http:, https:, ftp:, etc. host is the fully qualified hostname of the URL. This could be as simple as hostname for a local server, such as a print server, or a fully qualified domain name such as www.google.com. It might also include a port number, such as 8080, or username and password credentials like un:pw@ftpserver.com. The various parts of the hostname are broken down further into auth, containing just the user credentials, port containing just the port, and hostname containing the hostname portion of the URL. An important thing to know about hostname is that it is still the full hostname including the top level domain (TLD, e.g. .com, .net, etc) and the specific server. If the URL was http://sports.yahoo.com/nhl, hostname would not give you just the domain (yahoo.com) or just the host (sports) but the entire hostname sports.yahoo.com. The url module doesn't have the capability to split the hostname down into its components such as domain or TLD.
The next set of components of the URL relate to everything after the host. The pathname is the entire file path after the host. In http://sports.yahoo.com/nhl, it would be /nhl. The next component is the search component. It stores the HTTP GET parameters in the URL. For example, if the URL was http://mydomain.com/?foo=bar&baz=qux, the search component would be ?foo=bar&baz=qux. Note the inclusion of the ?. The query parameter is similar to the search component. It contains one of two things, depending how parse was called.
parse takes two arguments: the url string and an optional Boolean that determines whether the queryString should be parsed using the querystring module, discussed in the next section. If the second argument is false, query will just contain a string similar to that of search but without the leading ?. If you don't pass anything for the second argument, it defaults to false.
The final component is the fragment portion of the URL. This is the part of the URL after the #. Commonly, this is used to refer to named anchors in HTML pages. For instance, http://abook.com/#chapter2 might refer to the second chapter on a web page hosting a whole book. The hash component in this case would contain #chapter2. Again note the included # in the string. Some sites, such as http://twitter.com, use more complex fragments for AJAX applications, but the same rules apply. So the URL for the Twitter mentions account, http://twitter.com/#!/mentions would have a pathname of / but a hash of #!/mentions.
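To tie the components together, here is a sketch that parses one URL containing every part, then briefly exercises resolve and format. The address is invented for the example. I'm assuming a reasonably recent Node, where credentials are reported in auth rather than as part of host, and where url.parse is considered a legacy API next to the newer URL class.

```javascript
var url = require('url');

// Hypothetical address that exercises every component.
var address = 'http://un:pw@ftpserver.com:8080/some/path?foo=bar&baz=qux#chapter2';

// Without the second argument, query stays a string.
var plain = url.parse(address);
// With true, query is parsed into an object by the querystring module.
var withQuery = url.parse(address, true);

console.log(plain.protocol);  // the scheme, with its trailing colon
console.log(plain.auth, plain.hostname, plain.port);
console.log(plain.pathname, plain.search, plain.hash);
console.log(plain.query);         // 'foo=bar&baz=qux'
console.log(withQuery.query.foo); // 'bar'

// resolve works like a browser resolving a link against a base URL.
var resolved = url.resolve('http://example.com/one', '/two');

// format goes the other way, from components to a string.
var formatted = url.format({
  protocol: 'https',
  hostname: 'example.com',
  pathname: '/some/path',
  query: { page: 1 }
});

console.log(resolved, formatted);
```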
querystring
The querystring module is a very simple helper module to deal with query strings. As discussed in the previous section, query strings are the parameters encoded at the end of a URL. However, when reported back as just a JavaScript string, the parameters are fiddly to deal with. The querystring module provides an easy way to create objects from the query strings. The main methods it offers are parse and stringify (also available under the aliases decode and encode), but some internal helper functions such as escape, unescape, and unescapeBuffer are also exposed. If you have a query string, you can use parse to turn it into an object:
Example 4.14. Parsing a query string with the querystring module in Node-REPL
> var qs = require('querystring');
> qs.parse('a=1&b=2&c=d');
{ a: '1', b: '2', c: 'd' }
>
Here, the module's parse function turns the query string into an object in which the properties are the keys and the values correspond to the ones in the query string. You should notice a few things, though. First, the numbers are returned as strings, not numbers. Since JavaScript is loosely typed and will coerce a string into a number in a numerical operation, this works pretty well. However, it's worth bearing in mind for those times when that coercion doesn't work.
Another thing it's important to be aware of is that you must pass the query string without the leading ? that demarks it in the URL. A typical URL might look like http://www.bobsdiscount.com/?item=304&location=san+francisco. The querystring starts with a ? to indicate where the filepath ends, but if you include the ? in the string you pass to parse, the first key will start with a ?, which is almost certainly not what you want.
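Both caveats are easy to see in a few lines. This is only a sketch; the variable names are mine, and the query string is the one from the example URL above.

```javascript
var qs = require('querystring');

// Leaving the ? in place pollutes the first key.
var withMark = qs.parse('?item=304&location=san+francisco');
// Stripping it gives the keys you expect; + is decoded as a space.
var withoutMark = qs.parse('item=304&location=san+francisco');

console.log(Object.keys(withMark));  // first key is '?item'
console.log(withoutMark.location);   // 'san francisco'

// Values are strings, so + concatenates instead of adding...
var concatenated = withoutMark.item + 1;
// ...unless you convert explicitly first.
var added = Number(withoutMark.item) + 1;

console.log(concatenated, added);
```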
This library is really useful in a bunch of contexts. Query strings are used outside of URLs. When you get content from an HTTP POST that is form-encoded (application/x-www-form-urlencoded), it will also be in query string form. All the browser manufacturers have standardized around this approach. By default, forms in HTML will send data to the server in this way also.
The querystring module is also used as a helper module to the url module. Specifically when decoding URLs you can ask URL to turn the querystring into an object for you rather than just a string. That's described in more detail in the previous section, but the parsing that is done is using the parse method from querystring.
Another important part of querystring is encode. This function takes an object of query string key/value pairs and stringifies it. This is really useful when working with HTTP requests, especially POST data. It makes it easy to work with a JavaScript object until you need to send the data over the wire, and then simply encode it at that point. Any JavaScript object can be used; however, you should ideally use an object that contains only the data you want, because the encode method will add all properties of the object. If a property value isn't a string, Boolean, or number, it won't be serialized and the key will just be included with an empty value.
Example 4.15. Encoding an object into a querystring
> var myObj = {'a':1, 'b':5, 'c':'cats', 'func': function(){console.log('dogs')}}
> qs.encode(myObj);
'a=1&b=5&c=cats&func='
>
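As a sketch of how this fits into preparing POST data, the following builds a form-encoded body and the headers that would typically accompany it. The field names and the headers object are illustrative assumptions; nothing is actually sent.

```javascript
var qs = require('querystring');

// Hypothetical form data held as a plain object until it is needed.
var form = { item: 304, location: 'san francisco', inStock: true };

// stringify is the same function as encode.
var body = qs.stringify(form);

// Headers a form-encoded POST would usually carry.
var headers = {
  'Content-Type': 'application/x-www-form-urlencoded',
  'Content-Length': Buffer.byteLength(body)
};

console.log(body);
console.log(headers);

// Parsing the body again recovers the values, now all as strings.
var roundTrip = qs.parse(body);
console.log(roundTrip.item, roundTrip.location, roundTrip.inStock);
```

The body and headers could then be handed to http.request() and ClientRequest.write() as in the earlier POST example.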
I/O
Streams
Many components in Node provide continuous output or can process continuous input. To make these components act in a consistent way, the stream API provides an abstract interface for them. This API provides common methods and properties that are available in specific implementations of streams. Streams can be readable, writable, or both. All streams are EventEmitter instances, allowing them to emit events.
Readable Streams
The readable stream API is a set of methods and events that provides access to chunks of data as they are sent by an underlying data source. Fundamentally, readable streams are about emitting data events. These events represent the stream of data as a stream of events. In order to make this manageable, streams have a number of features that allow you to configure how much data you get and how fast.
Example 4.16. Creating a readable file stream
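var fs = require('fs');
// createReadStream returns a readable stream that emits the file in chunks
var filehandle = fs.createReadStream('data.txt');
filehandle.on('data', function(data) {
  console.log(data);
});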
The basic stream in the example simply reads data from a file in chunks. Every time a new chunk is made available, it is exposed to a callback in the variable called data. In this example, we simply log the data to the console. However, in real use cases you might either stream the data somewhere else or spool it into bigger pieces before you work on it. In essence, the data event simply provides access to the data, while you have to figure out what to do with each chunk.
Let's look in more detail at one of the common patterns used in dealing with streams. The spooling pattern is used when we need an entire resource available before we deal with it. We know it's important not to block the event loop for Node to perform well, so even though we don't want to perform the next action on this data until we've received all of it, we don't want to block the event loop. In this scenario we use a stream to get the data but use the data only when enough is available. Typically this means when the stream ends, but it could be another event or condition.
Example 4.17. Using the spooling pattern to read a complete stream
//abstract stream
var spool = "";
stream.on('data', function(data) {
  spool += data;
});
stream.on('end', function() {
  console.log(spool);
});
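The abstract example above can be turned into something runnable. In this sketch, spoolStream is a hypothetical helper of my own, and a PassThrough stream stands in for whatever real source you have. Collecting the chunks and joining them at the end, rather than appending each one to a string, is my own variation; it avoids decoding a chunk that ends partway through a multi-byte character.

```javascript
var stream = require('stream');

// Hypothetical helper: gather every chunk, then hand over the whole result.
function spoolStream(readable, callback) {
  var chunks = [];
  readable.on('data', function(chunk) {
    chunks.push(chunk);
  });
  readable.on('error', function(err) {
    callback(err);
  });
  readable.on('end', function() {
    callback(null, Buffer.concat(chunks).toString('utf8'));
  });
}

// A PassThrough stream stands in for a file, socket, or HTTP response.
var source = new stream.PassThrough();

spoolStream(source, function(err, text) {
  if (err) { throw err; }
  console.log(text); // runs only once the stream has ended
});

source.write('first chunk, ');
source.end('second chunk');
```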
FileSystem
Filesystem is obviously a very helpful module because you need it in order to access files on disk. It closely mimics the POSIX style of file I/O. It is a somewhat unique module in that all of the methods have both asynchronous and synchronous versions. However, I strongly recommend that you use the asynchronous methods unless you are building command line scripts with Node. Even then, it is often much better to use the async versions, even though doing so adds a little extra code, so that you can access multiple files in parallel and reduce the running time of your script.
The main issue that people face while dealing with asynchronous calls is ordering, and this is especially true with file I/O. It's common to want to do a number of moves, renames, copies, reads, or writes at one time. However, if one of the operations depends on another, this can create issues because return order is not guaranteed. This means that the first operation in the code could complete after the second operation in the code. Patterns exist to make ordering easy. They are talked about in detail in chapter 4 but we'll recap here as well.
An example of this might be reading and then deleting a file. If the delete (unlink) happens before the read, it will be impossible to read the content of the file.
Example 4.18. Reading and deleting a file asynchronously, but all wrong
var fs = require('fs');
fs.readFile('warandpeace.txt', function(e, data) {
  console.log('War and Peace: ' + data);
});
fs.unlink('warandpeace.txt');
Notice that we are using the asynchronous methods, and while we have created callbacks we haven't written any code that defines which order they get called in. This often becomes a problem for programmers not used to programming in event loops. This code looks OK on the surface and sometimes it will work at runtime, but sometimes it won't. Instead, we need to use a pattern in which we specify the ordering we want for the calls. There are a few approaches. One common approach is to use nested callbacks. In the following example the asynchronous call to delete the file is nested within the callback to the asynchronous function that reads the file.
Example 4.19. Reading and deleting a file asynchronously using nested callbacks
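var fs = require('fs');
fs.readFile('warandpeace.txt', function(e, data) {
  if (e) throw e;
  console.log('War and Peace: ' + data);
  // Delete the file only after the read has completed
  fs.unlink('warandpeace.txt', function(e) {
    if (e) throw e;
  });
});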
This approach is often a very effective one for discrete sets of operations. In our example with just two operations, it's easy to read and understand. This pattern can get out of control though.
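To see how the nesting grows, here is a sketch that adds a third dependent step: the file's contents are written elsewhere before the original is deleted. The function name backupThenDelete and its arguments are hypothetical, chosen only for this illustration.

```javascript
var fs = require('fs');

// Hypothetical helper: read src, write its contents to dest, then delete src.
// Each step starts only inside the callback of the step before it.
function backupThenDelete(src, dest, done) {
  fs.readFile(src, function(e, data) {
    if (e) return done(e);
    fs.writeFile(dest, data, function(e) {
      if (e) return done(e);
      fs.unlink(src, function(e) {
        done(e);
      });
    });
  });
}
```

Every additional dependent operation adds another level of indentation and another error check, which is how a short sequence turns into a deep pyramid of callbacks.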
Buffers
Although Node is JavaScript, it is JavaScript outside its usual environment. For instance, the browser requires JavaScript to perform many functions, but manipulating binary data is rarely one of them. While JavaScript does support bitwise operations, it doesn't have a native representation of binary data. This is especially troublesome when one also considers the limitations of the number type system in JavaScript, which might otherwise lend itself to binary representation. Node introduces the Buffer class to make up for this shortfall, because working with binary data is often essential.
Buffers are an extension to the V8 engine, which means that they have their own set of pitfalls. Buffers are actually a direct allocation of memory, which may mean a little or a lot depending on your experience with lower level computer languages. Unlike the data types in JavaScript, which abstract some of the ugliness of storing data, Buffer provides direct memory access, warts and all. Once a Buffer is created, it is a fixed size. If you want to add more data, you must clone the Buffer into a larger Buffer. Although some of these features may seem frustrating, they allow Buffer to perform at the speed necessary for many data operations on the server. This was a conscious design choice, to trade off some programmer convenience for performance.
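As a rough sketch of what growing a Buffer involves, the following copies a full Buffer into a larger one. It uses Buffer.from() and Buffer.alloc(), which current Node releases provide in place of the older new Buffer() constructor; the variable names are illustrative.

```javascript
// A Buffer's size is fixed at creation: 3 bytes here
var small = Buffer.from([1, 2, 3]);

// To hold more data, allocate a bigger (zero-filled) Buffer
// and copy the old bytes into the start of it
var bigger = Buffer.alloc(small.length + 2);
small.copy(bigger, 0);

// The extra room can now be filled in
bigger[3] = 4;
bigger[4] = 5;

console.log(small.length);  // 3
console.log(bigger);        // <Buffer 01 02 03 04 05>
```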
A quick primer on binary
I thought it was important to include this quick primer on working with binary data for those who haven't done much of it, or as a refresher for those of us who haven't in a long time (which was true for me when I started working with Node). Computers, as almost everyone knows, work by manipulating states of 'on' and 'off'. We call this a binary state, because there are only two possibilities. Everything in computers is built on top of this, which means that working directly with binary can often be the fastest method on the computer. In order to do more complex things, we collect "bits" (each representing a single binary state) into groups of eight, often called an octet or more commonly a byte[9]. This allows us to represent bigger numbers than just 0 or 1.
By creating sets of 8 bits we are able to represent any number from 0 to 255. The rightmost bit represents one, but then we double the value of the number represented by each bit as we move left. To find out what number a byte represents, we simply sum the column headers that have a 1 beneath them (Figure 4.1, “Representing 0 through 255 in a byte”).
Figure 4.1. Representing 0 through 255 in a byte
128 64 32 16 8 4 2 1
--- -- -- -- - - - -
0 0 0 0 0 0 0 0 = 0
128 64 32 16 8 4 2 1
--- -- -- -- - - - -
1 1 1 1 1 1 1 1 = 255
128 64 32 16 8 4 2 1
--- -- -- -- - - - -
1 0 0 1 0 1 0 1 = 149
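The column arithmetic in the figure can be sketched in a few lines of JavaScript. The byteValue function is a made-up helper for illustration; parseInt() and toString() with a radix of 2 are the built-in equivalents.

```javascript
// Place values of the 8 bits in a byte, leftmost first
var weights = [128, 64, 32, 16, 8, 4, 2, 1];

// Sum the weights of every position that holds a 1
function byteValue(bits) {
  var total = 0;
  for (var i = 0; i < 8; i++) {
    if (bits.charAt(i) === '1') total += weights[i];
  }
  return total;
}

console.log(byteValue('10010101'));   // 149 (128 + 16 + 4 + 1)
console.log(parseInt('10010101', 2)); // 149, using the built-in base 2 parser
console.log((149).toString(2));       // '10010101'
```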
You'll also see the use of hexadecimal notation or "hex" a lot. Since bytes need to be easily described and a string of eight 0s and 1s isn't very convenient, hex notation has become popular. Binary notation is base 2, in that there are only two possible states per digit (0 or 1). Hex uses base 16, and each digit in hex can have a value from 0 to F, where the letters A through F (or their lowercase equivalents) stand for 10 through 15 respectively. What's very convenient about hex is that with two digits we can represent a whole byte. The right digit represents 1s and the left digit represents 16s. If we wanted to represent decimal 149 it is (9 x 16) + (5 x 1) or the hex 95.
Figure 4.2. Representing 0 through 255 with hex notation
Hex to Decimal:
0 1 2 3 4 5 6 7 8 9 A B C D E F
- - - - - - - - - - -- -- -- -- -- --
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Counting in hex:
16 1
-- -
0 0 = 0
16 1
-- -
F F = 255
16 1
-- -
9 5 = 149
In JavaScript, you can create a number from a hex value using the notation 0x in front of the hex value. For instance, 0x95 is decimal 149. More commonly, in Node, you'll see Buffers represented by hex values in console.log() output or Node-REPL:
Example 4.20. Creating a 3-byte Buffer from an array of octets
> new Buffer([255,0,149]);
<Buffer ff 00 95>
>
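A few one-liners show how the hex and decimal forms relate. This sketch uses Buffer.from(), the current replacement for the new Buffer() constructor, and the variable name is illustrative.

```javascript
console.log(0x95);                  // 149
console.log(9 * 16 + 5 * 1);        // 149
console.log((149).toString(16));    // '95'
console.log(parseInt('ff', 16));    // 255

// The same three octets, shown as one hex string
var bytes = Buffer.from([255, 0, 149]);
console.log(bytes.toString('hex')); // 'ff0095'
```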
So how does binary relate to other kinds of data? Well we've seen how binary can represent numbers. In network protocols, it's common to specify a certain number of bytes to convey some information, using particular bits in fixed places to indicate specific things. For example, when constructing a DNS request, the first 2 bytes are used as a number for a transaction ID, while the next byte is treated as individual bits, each used to indicate whether a specific feature of DNS is being used in this request.
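Here is a small sketch of picking such fields out of a Buffer. The sample bytes are made up, and the two bit positions are intended to follow the DNS header layout in RFC 1035; treat them as an assumption and check the specification before relying on them.

```javascript
// First three bytes of a made-up DNS request:
// a 2-byte transaction ID followed by one byte of flags
var header = Buffer.from([0x1a, 0x2b, 0x01]);

// Bytes 0 and 1 are read together as one big-endian 16-bit number
var transactionId = header.readUInt16BE(0);

// Byte 2 is treated as individual bits, picked out with bitwise AND masks
var flags = header[2];
var isResponse = (flags & 0x80) !== 0;       // top bit: query (0) or response (1)
var recursionDesired = (flags & 0x01) !== 0; // lowest bit: recursion desired

console.log(transactionId.toString(16)); // '1a2b'
console.log(isResponse);                 // false
console.log(recursionDesired);           // true
```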
The other extremely common use of binary is to represent strings. The two most common 'encoding' formats for strings are ASCII and UTF (typically UTF-8). These encodings define how the bits should be converted into characters. I'm not going to go into too much of the gory detail. Essentially, encodings work by having a look-up table of the character to a specific number represented in bytes. In order to decode a string, the computer simply converts from the number to the character by looking it up in a conversion table.
ASCII characters (some of which are non-visible "control characters", such as Return) are always exactly seven bits each, so they can be represented by the values from 0 to 127. The eighth bit in a byte is often used to extend the character set to represent various choices of international characters (such as ȳ or ȱ).
UTF is a little more complex. Its character set has a lot more characters, including many international ones. Each character in UTF-8 is represented by at least 1 byte, but sometimes up to 4. Essentially, the first 128 values are good old ASCII, whereas the others are pushed further down in the map and represented by higher numbers. When a less common character is referenced, the first byte uses a number that tells the computer to check out the next byte to find the real address of the character starting on the second sheet of its map. If the character isn't on the second sheet of the map, the second byte tells the computer to look at the third, and so on. This means that in UTF-8, the length of a string measured in characters isn't necessarily the same as its length in bytes, whereas in ASCII the two are always the same.
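To make the variable width concrete, this sketch encodes four single characters and prints how many bytes each one takes. The sample characters are my own choice, and Buffer.from() is used as the current replacement for new Buffer().

```javascript
// Four single characters: 'a', 'é', the euro sign, and an emoji (U+1F600)
var samples = ['a', '\u00e9', '\u20ac', '\ud83d\ude00'];

samples.forEach(function(ch) {
  var encoded = Buffer.from(ch, 'utf8');
  console.log(encoded.length + ' byte(s): ' + encoded.toString('hex'));
});
// 1 byte(s): 61
// 2 byte(s): c3a9
// 3 byte(s): e282ac
// 4 byte(s): f09f9880
```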
Binary and Strings
It is important to remember that once you copy things to a Buffer, they will be stored as their binary representations. You can always convert the binary representation in the buffer back into other things such as strings later. So a Buffer is defined only by its size, not by the encoding or any other indication of its meaning.
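One way to see that a Buffer carries no encoding of its own is to decode the same bytes in more than one way. This is a minimal sketch using Buffer.from() (the current replacement for new Buffer()) and the 'latin1' encoding name found in current Node releases.

```javascript
// Copy a string into a Buffer: only its bytes are stored
var stored = Buffer.from('\u00e9', 'utf8'); // 'é'
console.log(stored);                        // <Buffer c3 a9>

// The Buffer does not remember the encoding; you pick one when converting back
console.log(stored.toString('utf8'));   // 'é'
console.log(stored.toString('latin1')); // 'Ã©' (each byte read as its own character)
console.log(stored.toString('hex'));    // 'c3a9'
```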
Given that Buffer is opaque, how big does it need to be in order to store a particular string of input? As we've said, a UTF character can occupy up to 4 bytes, so to be safe, you should define a Buffer to be 4 times the size of the largest input you can accept, measured in UTF characters. There may be ways you can reduce this burden; for instance, if you limit your input to the letters of European languages, you'll know there will be at most 2 bytes per character (although some symbols, such as the euro sign, take 3).
Using Buffers
Buffers can be created using three possible parameters: the length of the buffer in bytes, an array of bytes to copy into the buffer, or a string to copy into the buffer. The first and last methods are by far the most common. There aren't too many instances where you are likely to have a JavaScript array of bytes.[10]
Creating a Buffer of a particular size is a very common scenario and easy to deal with. Simply put, you specify the number of bytes as your argument when creating the Buffer.
Example 4.21. Creating a Buffer using byte length
> new Buffer(10);
<Buffer e1 43 17 05 01 00 00 00 41 90>
>
As you can see from the previous example, when we create a Buffer we get a matching number of bytes. However, since the Buffer is just getting an allocation of memory directly, it is uninitialized and the contents are left over from whatever happened to occupy them before. This is unlike all the native JavaScript types, which initialize all memory so that when you create a new primitive or object it doesn't assign whatever was already in the memory space to the primitive or object you just created. Here is a good way to think about it. If you go to a busy cafe and you want a table, the fastest way to get one would be to sit down as soon as some other people vacate one. However, while it's fast, you are left with all their dirty dishes and the detritus from their meals. You might prefer to wait for one of the staff to clear the table and wipe it down before you sit. This is a lot like Buffers versus native types. Buffers do very little to make things easy for you, but they do give you direct access to memory, fast. If you want to have a nicely zeroed set of bits, you'll need to do it yourself (or find a helper library).
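Zeroing a Buffer yourself takes one call. This sketch uses Buffer.allocUnsafe(), which in current Node releases is the explicit way to ask for uninitialized memory, together with the fill() method to wipe it.

```javascript
// Uninitialized memory: the contents are whatever was there before
var dirty = Buffer.allocUnsafe(10);

// Wipe the table ourselves by writing 0 into every byte
dirty.fill(0);
console.log(dirty); // <Buffer 00 00 00 00 00 00 00 00 00 00>

// Buffer.alloc() does the allocation and the zeroing in one step
var clean = Buffer.alloc(10);
console.log(clean.equals(dirty)); // true
```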
Creating a Buffer using byte length is most common when you are working with things like network transport protocols that have very specifically defined structures. When you know exactly how big the data is going to be or you know exactly how big it could be and you want to allocate and reuse a Buffer for performance reasons, this is the way to go.
Probably the most common way to use a Buffer is to create it with a string of either ASCII or UTF-8 characters. Although a Buffer can hold any data, it is particularly useful for I/O with character data, because the constraints we've already seen on Buffer can make their operations much faster than operations on regular strings. So when you are building really highly scalable apps it's often worth using Buffers to hold strings. This is especially true if you are just shunting the strings around the application without modifying them. Therefore, even though strings exist as primitives in JavaScript, it's still very common to keep strings in Buffers in Node.
Example 4.22. Creating Buffers using strings
> new Buffer('foobarbaz');
<Buffer 66 6f 6f 62 61 72 62 61 7a>
> new Buffer('foobarbaz', 'ascii');
<Buffer 66 6f 6f 62 61 72 62 61 7a>
> new Buffer('foobarbaz', 'utf8');
<Buffer 66 6f 6f 62 61 72 62 61 7a>
> new Buffer('é');
<Buffer c3 a9>
> new Buffer('é', 'utf8');
<Buffer c3 a9>
> new Buffer('é', 'ascii');
<Buffer e9>
>
When we create a Buffer with a string, it defaults to UTF-8. That is, if you don't specify an encoding, it will be considered a UTF-8 string. That is not to say Buffer pads the string to fit any Unicode character (blindly allocating 4 bytes per character), rather that it will not truncate characters. In the previous example, we can see that when taking a string with just lowercase alpha characters, the Buffer uses the same byte structure whatever the encoding because they all fall in the same range. However, when we have an é it's encoded as 2 bytes in the default UTF-8 case or if we specify UTF-8 explicitly. If we specify ASCII, the character gets truncated to a single byte.
Working with Strings
Node offers a number of operations to simplify working with strings and Buffers. First, you don't need to compute the length of a string before creating a Buffer to hold it; just pass the string as the argument when creating the Buffer. Alternatively, you can use the Buffer.byteLength() method. This method takes a string and an encoding and returns the length in bytes of that string, rather than the length in characters as String.length does.
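As a short sketch of the difference, the following sizes a Buffer exactly with Buffer.byteLength() instead of guessing. The sample string is arbitrary, and Buffer.alloc() stands in for the older new Buffer(size) form.

```javascript
var text = 'caf\u00e9'; // 'café'

console.log(text.length);                     // 4 characters
console.log(Buffer.byteLength(text, 'utf8')); // 5 bytes, because 'é' needs 2

// Allocate exactly the number of bytes the string needs, then write it
var exact = Buffer.alloc(Buffer.byteLength(text, 'utf8'));
var written = exact.write(text, 0, 'utf8');
console.log(written); // 5
console.log(exact);   // <Buffer 63 61 66 c3 a9>
```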
You can also write a string to an existing Buffer. The Buffer.write() method writes a string to a specific index of a Buffer. If there is room in the Buffer starting from the specified offset, the entire string will be written. Otherwise, characters are truncated from the end of the string to fit the buffer. In either case, Buffer.write() will return the number of bytes that were written. In the case of UTF-8 strings, if a whole character can't be written to the Buffer, none of the bytes for that character will be written. In the following example, because the buffer is too small for even one non-ASCII character, it ends up empty.
Example 4.23. Buffer.write() and partial characters
> var b = new Buffer(1);
> b
<Buffer 00>
> b.write('a');
1
> b
<Buffer 61>
> b.write('é');
0
> b
<Buffer 61>
>
In a single-byte Buffer it's possible to write an 'a' character, and doing so returns 1, indicating that 1 byte was written. However, trying to write an 'é' character fails because it requires 2 bytes, and the method returns 0 because nothing was written.
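The same rule applies when only part of a longer string fits. This is a minimal sketch, assuming a current Node release and using Buffer.alloc() to get zero-filled Buffers so that untouched bytes are easy to spot.

```javascript
// Room for 4 bytes, but the string needs 6 (three 2-byte characters)
var target = Buffer.alloc(4);
var count = target.write('\u00e9\u00e9\u00e9', 0, 'utf8'); // 'ééé'

console.log(count);                             // 4: only two whole characters fit
console.log(target.toString('utf8', 0, count)); // 'éé'

// With 3 bytes of room, the second 'é' would be split, so it is left out entirely
var tight = Buffer.alloc(3);
console.log(tight.write('\u00e9\u00e9', 0, 'utf8')); // 2
console.log(tight);                                  // <Buffer c3 a9 00>
```

Because the return value counts bytes actually written, it is the safest way to know how much of the Buffer holds valid data.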
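There is a little more complexity to Buffer.write(), though. If possible, when writing UTF-8, Buffer.write() will terminate the character string with a NUL character.[11] This is much more significant when writing into the middle of a larger Buffer. Note that this behavior belongs to early releases of Node; current releases write only the bytes of the string itself, so don't rely on the terminator being present.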
Example 4.24. Writing a string into a Buffer including a terminator
> var b = new Buffer(5);
> b.write('fffff');
5
> b
<Buffer 66 66 66 66 66>
> b.write('ab', 1);
2
> b
<Buffer 66 61 62 00 66>
>
After creating a Buffer 5 bytes long (which could have been done directly using the string), we write the character f to the entire Buffer. f is the character code 0x66 (102 in decimal). This makes it easy to see what happens when we write the characters 'ab' to the Buffer starting with an offset of 1. The zeroth character is left as f. At positions 1 and 2, the characters themselves are written, 61 followed by 62. Then Buffer.write() inserts a terminator, in this case a null character of 0x00.
Console.log
Borrowed from the Firebug debugger in Firefox, this simple command allows you to easily output to STDOUT without using any modules. It also does some pretty printing to help enumerate objects.
Example 4.25. Outputting with console.log
> foo = {};
{}
> foo.bar = function() {1+1};
[Function]
> console.log(foo);
{ bar: [Function] }
>
[7] When I talk about Pseudo-class I am referring to the definition found in Douglas Crockford's [JavaScript: The Good Parts] (O'Reilly). From now on I will use Class to refer to Pseudo-class.
[8] This works in JavaScript because it supports first class functions. See Appendix B for more information.
[9] There is no "standard" size of byte, but the de facto size that virtually everyone uses nowadays is 8 bits. As such, octets and bytes are equivalent, and I'll be using the more common term byte to mean specifically an octet.
[10] It's very memory-inefficient, for one thing. If you store each byte as a number, for instance, you are using a 64-bit memory space to represent 8 bits.
[11] This generally just means a binary 0.




