Streams for every day

December 03, 2015

“Streams in node are one of the rare occasions when doing something the fast way is actually easier. SO USE THEM. not since bash has streaming been introduced into a high level language as nicely as it is in node.” @dominictarr at high level node style guide.

TL;DR

Streams emit events: they build on EventEmitter, the native observer pattern of Node.js.
At the moment there are 3 iterations of the Stream implementation, and which one you get depends on your version of node/iojs.
Instead of using the native API (which depends on your node version), it is better to use readable-stream or through2. Both are backward compatible and work fine in browser builds. (The latter is more lightweight because it just exposes a Transform stream, which is a kind of Duplex stream.)
.pipe() is just a function that takes a readable source stream and hooks its output to a destination writable stream (like a pipe between UNIX commands):

tweetStream.pipe(process.stdout)

Using .pipe() has other benefits too, like handling backpressure automatically so that node won’t buffer chunks into memory needlessly when the remote client is on a really slow or high-latency connection.
We have 4 types of Streams: Duplex, Readable, Transform and Writable.
A good library that collects stream utilities is mississippi.
You can implement a Stream using inheritance or composition.
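
As a minimal sketch of that last point, the same Readable can be written both ways with only the core stream module. The names Counter, createCounter and collect are made up for this example, and the options-based constructor assumes a node version (or readable-stream) that supports it.

```javascript
const { Readable } = require('stream')

// Inheritance: subclass Readable and implement _read.
class Counter extends Readable {
  constructor (limit) {
    super({ objectMode: true })
    this.limit = limit
    this.current = 0
  }

  _read () {
    this.current += 1
    // push(null) signals that there is no more data.
    this.push(this.current <= this.limit ? this.current : null)
  }
}

// Composition: pass the read implementation as an option.
const createCounter = limit => {
  let current = 0
  return new Readable({
    objectMode: true,
    read () {
      current += 1
      this.push(current <= limit ? current : null)
    }
  })
}

// Hypothetical helper: collect every chunk and hand the list to a callback.
const collect = (stream, cb) => {
  const chunks = []
  stream.on('data', chunk => chunks.push(chunk))
  stream.on('error', cb)
  stream.on('end', () => cb(null, chunks))
}

collect(new Counter(3), (err, chunks) => console.log('inheritance', chunks))
collect(createCounter(3), (err, chunks) => console.log('composition', chunks))
```

Both versions behave the same from the outside; composition just avoids declaring a class when the stream is small.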

Readable

Streams from which data can be read (e.g fs.createReadStream()).

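const { Readable } = require('stream')

// Wrap a single value in a Readable stream.
const toReadableStream = input => (
  new Readable({
    read () {
      this.push(input) // emit the value as the only chunk
      this.push(null) // signal that there is no more data
    }
  })
)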

Readable streams produce data that can be fed into a writable, transform, or duplex stream by calling .pipe()
To emit chunks of data you need to create an object that implements the ._read method.
It emits a data event each time it gets a chunk of data. From the implementation side, this corresponds to calling this.push(data).
It emits end when it has no more data, which the implementation signals with this.push(null). In other words, the end event is triggered once the last chunk of data has been delivered, signifying that there is no more data after it.
When you are using a Readable Stream you can use resume() and pause() methods to control the data flow of the stream.
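
From the consumer side, the points above could look like the sketch below. It assumes a node version that ships Readable.from (node 12 or later); the source and seen names are only for illustration.

```javascript
const { Readable } = require('stream')

// Readable.from turns an iterable into an object-mode Readable stream.
const source = Readable.from(['a', 'b', 'c'])
const seen = []

source.on('data', chunk => {
  seen.push(chunk)
  // Stop the flow, then let it continue a little later.
  source.pause()
  setTimeout(() => source.resume(), 10)
})

// end fires once, after the last chunk has been delivered.
source.on('end', () => console.log('done', seen))
```

While the stream is paused no data events are emitted, so the consumer decides how fast the chunks arrive.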

Writable

Streams to which data can be written (e.g fs.createWriteStream()).

A writable stream is a stream you can .pipe() to but not from.
To receive chunks of data you need to create an object that implements the ._write method.
Call .end() to close the stream; you can also pass it a last chunk, which is written before the stream closes.
Provide a callback to .write() only if you want to wait until the chunk has been handled; the order of successive calls is guaranteed either way.
The finish event is triggered when all the data has been processed (after .end() has been called and every pending write has been handled).
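
A minimal sketch of a Writable covering those points, using only the core stream module. The sink and received names are made up for this example.

```javascript
const { Writable } = require('stream')

const received = []

const sink = new Writable({
  // Called once per chunk; call done() when the chunk has been handled.
  write (chunk, encoding, done) {
    received.push(chunk.toString())
    done()
  }
})

sink.on('finish', () => console.log('finished with', received))

sink.write('first')
// The optional callback runs once this chunk has been handled.
sink.write('second', () => console.log('second chunk handled'))
// end() accepts an optional last chunk before closing the stream.
sink.end('last')
```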

Duplex

Streams that are both Readable and Writable. The two sides are independent and each has its own internal buffer (e.g. net.Socket).

                             Duplex Stream
                          ------------------|
                    Read  <-----               External Source
            You           ------------------|   
                    Write ----->               External Sink
                          ------------------|
            You don't get what you write. It is sent to another source.

It was implemented in the most recent node versions, but you can use through2 on older ones.
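
To make the diagram concrete, here is a small sketch of a Duplex whose two sides are unrelated. The incoming and written arrays are stand-ins for the external source and sink, made up for this example.

```javascript
const { Duplex } = require('stream')

// Stand-ins for the external source and the external sink.
const incoming = ['from', 'source']
const written = []

const duplex = new Duplex({
  read () {
    // Readable side: deliver whatever the external source has left.
    this.push(incoming.length ? incoming.shift() : null)
  },
  write (chunk, encoding, done) {
    // Writable side: hand the chunk over to the external sink.
    written.push(chunk.toString())
    done()
  }
})

const readBack = []
duplex.on('data', chunk => readBack.push(chunk.toString()))
duplex.on('end', () => console.log('read', readBack, 'written', written))

duplex.write('to sink')
duplex.end()
```

What is written never shows up on the readable side, which is the difference from a Transform.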

Transform

Duplex streams where the output is in some way related to the input (e.g zlib streams).

                                 Transform Stream
                           --------------|--------------
            You     Write  ---->                   ---->  Read  You
                           --------------|--------------
            You write something, it is transformed, then you read something.
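
A minimal sketch of that idea with the core Transform class. The upperCase stream is a made-up example, not something from zlib.

```javascript
const { Transform } = require('stream')

const upperCase = new Transform({
  transform (chunk, encoding, done) {
    // The second argument of done() is pushed to the readable side.
    done(null, chunk.toString().toUpperCase())
  }
})

let output = ''
upperCase.on('data', chunk => { output += chunk })
upperCase.on('end', () => console.log(output))

upperCase.write('hello ')
upperCase.end('streams')
```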

File System/Descriptor Streams

They are subclasses of Readable/Writable streams that interact with the filesystem and emit some special events of their own.

For example, fs.ReadStream and fs.WriteStream emit an open event when the underlying file has been opened, which lets you track the state of the file.
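
The sketch below records the events seen while writing a file and reading it back. The traceFileEvents helper and the temp file name are assumptions made for this example.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Write a file, read it back, and record the events seen on the way.
const traceFileEvents = (file, cb) => {
  const events = []
  const writer = fs.createWriteStream(file)

  writer.on('open', () => events.push('write:open'))
  writer.on('close', () => {
    const reader = fs.createReadStream(file, { encoding: 'utf8' })
    reader.on('open', () => events.push('read:open'))
    reader.on('data', chunk => events.push(`read:data ${chunk}`))
    reader.on('close', () => {
      fs.unlinkSync(file) // remove the temp file again
      cb(null, events)
    })
  })

  writer.end('hello')
}

// Hypothetical temp file name, only used for this example.
const file = path.join(os.tmpdir(), `streams-example-${process.pid}.txt`)
traceFileEvents(file, (err, events) => console.log(events))
```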

Child Process

A child process is also a special case: it is not a stream itself, but it exposes streams (stdin, stdout and stderr). In particular, it fires an exit event, which is different from close.
It uses the stdio option to set up the stream communication between the child_process and the place its output is written to or read from (by default stdin, stdout and stderr, which align with the UNIX standard streams).
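
A small sketch of both points, spawning a second node process through process.execPath. The inline script and the events array are made up for this example.

```javascript
const { spawn } = require('child_process')

// Run another node process; process.execPath is the current node binary.
const child = spawn(process.execPath, ['-e', 'console.log("hi from child")'], {
  // stdin is ignored, stdout is piped to us, stderr goes to our own stderr.
  stdio: ['ignore', 'pipe', 'inherit']
})

const events = []
let output = ''

// child.stdout is a regular Readable stream.
child.stdout.on('data', chunk => { output += chunk })

// exit: the process has ended, but its stdio streams may still be open.
child.on('exit', code => events.push(`exit ${code}`))

// close: the process has ended and all of its stdio streams are closed.
child.on('close', code => {
  events.push(`close ${code}`)
  console.log(events, output.trim())
})
```

If you need the complete output of the child, wait for close rather than exit.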

What about callbacks

You can convert any stream interface into a callback. See my stream-callback library, which makes this conversion easy.
It's also possible to turn an async callback function into a stream interface. You need to be sure to handle the backpressure of the stream correctly. For this I use from2. Check fetch-timeline or totalwind-api as examples.
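
To show the idea in both directions, here is a hand-rolled sketch with the core stream module only. It is not the API of stream-callback or from2, and fetchPage, pagesToStream and streamToCallback are hypothetical names.

```javascript
const { Readable } = require('stream')

// Hypothetical async callback API that returns one page of results per call.
const fetchPage = (page, cb) => {
  setTimeout(() => cb(null, page < 3 ? [`item-${page}`] : []), 5)
}

// Callback -> stream: fetch only when the consumer asks for data (read),
// which is what keeps backpressure under control.
const pagesToStream = () => {
  let page = 0
  return new Readable({
    objectMode: true,
    read () {
      fetchPage(page++, (err, items) => {
        if (err) return this.destroy(err)
        // An empty page means there is nothing left to read.
        this.push(items.length ? items[0] : null)
      })
    }
  })
}

// Stream -> callback: buffer every chunk and call back once at the end.
const streamToCallback = (stream, cb) => {
  const chunks = []
  stream.on('data', chunk => chunks.push(chunk))
  stream.on('error', cb)
  stream.on('end', () => cb(null, chunks))
}

streamToCallback(pagesToStream(), (err, items) => console.log(err, items))
```

Because read is not called again until the previous push has happened, only one fetchPage call is in flight at a time.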

Bonus

Interesting libraries to use with streams are:

progress-stream – Read the progress of a stream.
throughv – stream.Transform with parallel chunk processing.
emit-stream – Turn event emitters into streams and streams into event emitters.
pretty-stream – Format a stream to make it more human readable.
squeak – A tiny stream log.
hyperquest – Make streaming http requests.
multi-write-stream – A writable stream that writes to multiple other writeable streams.