Building node.js applications on Google Cloud Platform

24 May 2016 Posted Under: node.js
 

Node.js is for hats and cats

In March I had the chance to talk at GCP Next on Node.js @ Google. This is a fun little tour of what Google Cloud has to offer Node.js developers.

Timeline

  • 00:00 - Intro
  • 01:57 - Why Node.js
  • 02:57 - Node.js + Google Cloud Platform
  • 03:48 - The many engines of Google Cloud
  • 04:48 - Getting started with App Engine
  • 07:26 - Traffic splitting
  • 11:45 - Cloud Shell
  • 16:03 - Google Cloud APIs & Services
  • 17:03 - gcloud npm module
  • 20:17 - Cloud cats demo
  • 22:09 - Code review
  • 25:24 - Cloud Debugger
  • 27:30 - Cloud Trace
  • 28:47 - Enterprise Node.js at NodeSource
  • 37:05 - Node.js and IoT
  • 41:09 - Hatspin
  • 42:31 - Closing

Watch the video

Thanks!

 
 

Dependency management and Go

29 May 2015 Posted Under: Go
 

I find dependency management and package managers interesting. Each language has its own package manager, and each one has characteristics that are specific to that community. NuGet for .NET has great tooling and Visual Studio support, since that’s important to the .NET developer audience. NPM has a super flexible model, and great command line tools.

In a lot of ways, golang is a little quirky. And that’s awesome. However - I’ve really struggled to wrap my head around dependency management in Go.

"Dependency management and golang"

When dealing with dependency management, I expect a few things:

1. Repeatable builds

Given the same source code, I expect to be able to reproduce the same set of binaries. Every. Time. Every bit of information needed to complete a build, whether it be on my local dev box or on a build server, should be explicitly called out in my source code. No surprises.

2. Isolated environments

I am likely to be working on multiple projects at a time. Each project may require a different compiler version, and a different version of the same dependency. At no point should changing a dependency in one project have an effect on the dependencies of a completely separate project.

3. Consensus

Having a package management story is awesome. What’s even better is making sure everyone uses the same one :) As long as developers are inventive and curious, there will always be alternatives. But there needs to be consensus on the community accepted standard on how a package manager will work. If 5 projects use 5 different models of dependency management, we’re all out of luck.

How node.js does it

As I’ve talked about before, I like to use my experience with other languages as a way to learn about a new language (just like most people I’d assume). Let’s take a look at how NPM for node.js solves these problems.

Similar to the go get command, there is an npm install command. It looks like this:


npm install --save yelp

The big difference you’ll see is --save. This tells NPM to save the dependency - and the version I’m using - into the package.json for my project:


{
  "name": "pollster",
  "version": "2.0.0",
  "private": true,
  "scripts": {
    "start": "node server"
  },
  "dependencies": {
    "express": "~3.1.0",
    ...
    "nconf": "~0.6.7",
    "socket.io": "~0.9.13"
  }
}

package.json is stored in the top level directory of my app. It provides my isolation. If I start another project - that means another package.json, another set of dependencies. The environments are entirely isolated. The list of dependencies and their versions provides my repeatability. Every time someone clones my repository and runs npm install, they will get the same list of dependencies from a centralized source. The fact that most people use NPM provides my consensus.

Version pinning is accomplished using semver. The ~ relaxes the rules on version matching, meaning I’m ok with bringing down a different version of my dependency, as long as it is only a PATCH change - which means no API-breaking changes, only bug fixes. If you’re being super picky (on production stuff I am), you can pin a specific version by dropping the ~. For downstream dependencies (dependencies of your dependencies) you can lock those in as well using npm-shrinkwrap. On one of my projects, I got bit by the lack of shrink-wrapping when a misbehaved package author used a wildcard version for a downstream dependency, which actually broke us in production.
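
To make that concrete, here's a rough sketch of both (reusing the yelp package from above):


# pin the exact version in package.json (no ~ prefix)
npm install --save --save-exact yelp

# snapshot the entire dependency tree, downstream dependencies included
npm shrinkwrap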

The typical workflow is to check in your package.json, and then .gitignore your node_modules directory that contains the actual source code of 3rd party packages.
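
In practice that's a one-liner in .gitignore:


# keep installed packages out of source control
node_modules/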

It’s all pretty awesome.

Go out of the box

With the out of the box behavior, Go is less than ideal in repeatability, isolation, and consensus. If you follow the setup guide for golang, you’ll find yourself with a single directory where you’re supposed to keep all of your code. Inside of there, you create a /src directory, and a new directory for each project you’re going to work on. When you install a dependency using go get, it will essentially drop the source code from that repository into `$GOPATH/src`. In your source code, you just tell the compiler where it needs to go to grab the latest sources:

import "github.com/JustinBeckwith/go-yelp/yelp"
...
client := yelp.New(options)
result, err := client.DoSimpleSearch("coffee", "seattle")
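
For context, after a couple of go get runs that single workspace ends up looking roughly like this (layout is illustrative, using the repos from this post):


$GOPATH/
    src/
        github.com/JustinBeckwith/coffee/          # my app
        github.com/JustinBeckwith/go-yelp/yelp/    # dependency pulled by go get
        github.com/JustinBeckwith/oauth/           # another dependency
    pkg/    # compiled package objects
    bin/    # installed binaries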

So this is really bad. The go-yelp library I’m importing from github is pulled down at compile time (if not already available from a go get command), and built into my project. That is pointing to the master branch of my github repository. Who’s to say I won’t change my API tomorrow, breaking everyone who has imported the library in this way? As a library author, I’m left with 3 options:

  1. Never make breaking changes.
  2. Make a completely new repository on GitHub for a new version of my API that has breaking changes.
  3. Make breaking changes, and assume / hope developers are using a dependency management tool.

Without using an external tool (or one of the methods I’ll talk about below), there is no concept of version pinning in Go. You point at an import path, and that path is used to find your code during the build. For most open source projects - the out of the box behavior is broken.

My problem is that the default workflow on a Go project leads you down a path of sadness. You start with a magical go get command that installs the latest and greatest version of a dependency - but doesn’t ask you which specific version or hash of that dependency you should be using. Most web developers have been conditioned to not check their dependencies into source control, if they’re managed by a package manager (see: gem, NuGet, NPM, bower, etc). The end result is that I could easily break someone else, and I can easily be broken.

Vendoring, import rewrites, and the GOPATH

There is currently no agreed upon package manager for Go. Recently the Go team kicked up a great thread asking the community for their thoughts on a package management system. There are a few high level concepts that are helpful to understand.

Vendoring

At Google, the source code for a dependency is copied into the source tree, and checked into source control. This provides repeatability. There is never a question on where the source is downloaded from, because it is always available in the source tree. Copying the source from a dependency into your own source is referred to as “vendoring”.

Import rewriting

After you copy the code into your source tree, you need to change your import path to not point at the original source, but rather to point at a path in your tree. This is called “Import rewriting”.

After copying a library into your tree, instead of this:

import "github.com/JustinBeckwith/go-yelp/yelp"
...
client := yelp.New(options)

you would do this:

import "yourtree/third_party/github.com/JustinBeckwith/go-yelp/yelp"
...
client := yelp.New(options)


GOPATH rewriting

Vendoring and import rewriting provide our repeatable builds. But what about isolation? If project (x) relies on go-yelp#v1.0, project (y) should be able to rely on go-yelp#v2.0. They should be isolated. If you follow How to Write Go Code, you’re led down the path of a single workspace, which is driven by $GOPATH. $GOPATH is where libraries fetched by go get are installed. It controls where your own binaries are generated. It’s generally the defining variable for the root of your workspace. If you try to run multiple projects out of the same directory, it completely blows up isolation. If you want to reference different versions of the same dependency, you need to change the $GOPATH variable for each project. The act of changing the $GOPATH environment variable when switching projects is “GOPATH rewriting”.
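
In shell terms, GOPATH rewriting is nothing more exotic than this (paths are hypothetical):


# build project x against its own workspace
export GOPATH=$HOME/gopath-project-x
go build ./...

# switch projects, switch workspaces
export GOPATH=$HOME/gopath-project-y
go build ./...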

Package managers & tools

Given the lack of prescriptive guidance on how to deal with dependency management, quite a few tools have popped up. In no particular order, here are a few I found:

Given my big 3 requirements above, I checked out the most popular of the repos above, and settled on godep. The alternatives all fell into at least one of these traps:

  • Forced rewriting of the import URL, making it harder to manage dependency paths
  • Relied on a centralized service
  • Only worked on a single platform
  • Didn’t provide isolation in the $GOPATH

godep

Godep matched most of my requirements for a package manager, and is the most popular solution in the community. It solves the repeatability and isolation issues above. The workflow:

Run go get to install a dependency (nothing new here):


go get github.com/JustinBeckwith/go-yelp/yelp

When you’re done installing dependencies, use the godep save command. This will copy all of the code your project imports from the current $GOPATH into the ./Godeps directory in your project. Make sure to check this into source control.


godep save

It will also walk the dependency graph and create a ./Godeps/Godeps.json file:


{
	"ImportPath": "github.com/JustinBeckwith/coffee",
	"GoVersion": "go1.4.2",
	"Deps": [
		{
			"ImportPath": "github.com/JustinBeckwith/go-yelp/yelp",
			"Rev": "e0e1b550d545d9be0446ce324babcb16f09270f5"
		},
		{
			"ImportPath": "github.com/JustinBeckwith/oauth",
			"Rev": "a1577bd3870218dc30725a7cf4655e9917e3751b"
		},
		....
	]
}

When it’s time to build, use the godep tool instead of the standard go toolchain:


godep go build

The $GOPATH is automatically rewritten to use the local copy of dependencies, ensuring you have isolation for your project. This approach is great for a few reasons:

  1. Repeatable builds - When someone clones the repository and runs it, everything you need to build is present. There are no floating versions.
  2. No external repository needed for dependencies - with all dependencies checked into the local repository, there’s no need to worry about a centralized service. NPM will occasionally go down, as does NuGet.
  3. Isolated environment - With $GOPATH being rewritten at build time, you have complete isolation from one project to the next.
  4. No import rewriting - A few other tools operate by changing the import url from the origin repository to a rewritten local repository. This makes installing dependencies a little painful, and makes the import statement somewhat unsightly.

There are a few negatives though as well:

  1. You have to check your dependencies into source control. It’s a pain to check in thousands of source files I won’t really edit. Without a centralized package repository, this isn’t likely to be solved.
  2. You need to use a wrapped toolchain with the godep commands. There is still no real consensus.

For an example of a project that uses godep, check out coffee.

Wrapping up

While using godep is great - I’d really love to see consensus. It’s way too easy for newcomers to fall into the trap of floating dependencies, and it’s hard without much official guidance to come to any sort of consensus on the right approach. At this stage - it’s really up to each team to pick what they value in their dependency management story and choose one of the (many) options out there. Until proven otherwise, I’m sticking with godep.

Great posts on this subject

There have been a lot of great posts by others on this subject, check these out as well:

 
 

Docker, Revel, and AppEngine

08 May 2015 Posted Under: Google Cloud
 

"Revel running on Google AppEngine with Docker"

I’ve spent some time recently using Go for my side web projects. The Go standard library is minimal by design - meaning the language doesn’t come with a prescriptive web framework out of the box. The good news is that there are a ton of options:

Of course, you could decide to just not use a web framework at all. Comparing these is a topic of great debate - but that topic is for another post :) I decided to try out Revel first, as it was the closest to a full-featured, Rails-esque framework at a glance. I’ll likely give all of these a shot at some point.

After building an app on Revel, I wanted to get a feel for deploying my app to see if it posed any unique challenges. I recently started a new gig working on Google Cloud, and decided to try out AppEngine. The default runtime environment for Go in AppEngine is sandboxed. This comes with some benefits, and a few challenges. You get a lot of stuff for free, but you also are restricted in terms of file system access, network access, and library usage. Given the restrictions, I decided to use the new managed VM service. Managed VMs let you deploy your application in a docker container, while still having access to the other AppEngine features like datastore, logging, caching, etc. The advantage of using docker here is that I don’t need to write any AppEngine specific code. I can write a standard Go/Revel app, and just deploy to docker.

Starting with Revel

There’s a pretty great getting started tutorial for Revel. After getting the libraries installed, scaffold a new app with the revel new command:

go get github.com/revel/revel
go get github.com/revel/cmd/revel
revel new myapp
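
Before containerizing anything, it's worth a quick local smoke test. The run command takes the same app path, run mode, and port arguments you'll see again in the Dockerfile below:


revel run myapp dev 8080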

Using Docker

Before touching managed VMs in AppEngine, the first step is to get the app working with Docker. It took a little time and effort, but once Docker is completely set up on your machine, writing the Dockerfile is straightforward.

Here’s the Dockerfile I’m using right now:


# Use the official go docker image built on debian.
FROM golang:1.4.2

# Grab the source code and add it to the workspace.
ADD . /go/src/github.com/JustinBeckwith/revel-appengine

# Install revel and the revel CLI.
RUN go get github.com/revel/revel
RUN go get github.com/revel/cmd/revel

# Use the revel CLI to start up our application.
ENTRYPOINT revel run github.com/JustinBeckwith/revel-appengine dev 8080

# Open up the port where the app is running.
EXPOSE 8080

There are a few things to call out with this Dockerfile:

  1. I chose to use the golang docker image as my base. You could replicate the steps needed to install and configure go with a base debian/ubuntu image, but I found this easier. I could have also used the pre-configured AppEngine golang image, but I did not need the additional service account support.

  2. The ENTRYPOINT command tells Docker (and AppEngine) which process to run when the container is started. I’m using the CLI included with revel.

  3. For the ENTRYPOINT and EXPOSE directives, make sure to use port 8080 - this is a hard coded port for AppEngine.

To start using Docker with your existing Revel app, you need to install Docker and copy the Dockerfile into the root of your app. Update the Dockerfile to change the paths in the ADD and ENTRYPOINT instructions to use the import path of your Revel app instead of mine.
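
For example, if your app lives at github.com/yourname/myapp (a placeholder path), those two lines become:


ADD . /go/src/github.com/yourname/myapp
ENTRYPOINT revel run github.com/yourname/myapp dev 8080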

After you have Docker set up, build your image and try running the app:


# make sure docker is running (I'm in OSX)
boot2docker up
$(boot2docker shellinit)

# build and run the image
docker build -t revel-appengine .
docker run -it -p 8080:8080 revel-appengine

This will run docker, build the image locally, and then run it. Try hitting http://localhost:8080 in your browser. You should see the revel startup page:

"Running revel in docker"

Now we’re running revel inside of docker.

AppEngine Managed VMs

The original version of AppEngine had a bit of a funny way of managing application runtimes. There is a limited set of stacks available, and you’re left using a locked-down version of an approved runtime. Managed VMs get rid of this restriction by letting you run pretty much anything inside of a container. You just need to define a little bit of extra config in an app.yaml file that tells AppEngine how to treat your container:


runtime: custom
vm: true
api_version: go1
health_check:
  enable_health_check: False

This config lets me use AppEngine, with a custom Docker image as my runtime, running on a managed virtual machine. You can copy my app.yaml into your app directory, alongside the Dockerfile. Next, make sure you’ve signed up for a Google Cloud account, and download the Google Cloud SDK. After getting all of that set up, you’ll need to create a new project in the developer console.


# Install the Google Cloud SDK
curl https://sdk.cloud.google.com | bash

# Log into your account
gcloud auth login

# Install the preview components
gcloud components update app

# Set the project
gcloud config set project <project-id>

That covers the initial setup. After you have a project created, you can try running the app locally. This is essentially going to start up your app using the Dockerfile we defined earlier:


# Run the revel application locally
gcloud preview app run ./app.yaml

# Deploy the application
gcloud preview app deploy ./app.yaml

After deploying, you can visit your site here: http://revel-gae.appspot.com

Revel running on AppEngine

Wrapping up

So that’s it. I decided to use revel for this one, but the whole idea behind using docker for AppEngine is that you can bring pretty much any stack. If you have any questions, feel free to check out the source, or find me @JustinBeckwith.

 
 

Realtime services with io.js, redis and Azure

15 February 2015 Posted Under: azure
 

"View the demo"

A few years ago, I put together a fun little app that used node.js, service bus, cloud services, and the Instagram realtime API to build a realtime visualization of images posted to Instagram. In 2 years time, a lot has changed on the Azure platform. I decided to go back into that code, and retool it to take advantage of some new technology and platform features. And for fun.

Let’s take a look through the updates!

Resource groups

I’m using resource groups to organize the various services. Resource groups provide a nice way to visualize and manage the services that make up an app. RBAC and aggregated monitoring are two of the biggest features that make this useful.

"Using a resource group makes it easier to organize services"

Websites & Websockets

In the original version of this app, I chose to use cloud services instead of Azure web sites. One of the biggest reasons for this choice was websocket support with socket.io. At the time, Azure websites did not support websockets. Well… now it does. There are a lot of reasons to choose websites over cloud services:

  • Fast continuous deployment via Github
  • Low concept count, no special tooling needed
  • Now supports deployment slots, ssl, enterprise features

When you create your site, make sure to turn on websockets:

"setting up websockets"

io.js

io.js is a fork of node.js that provides a faster release cycle and ES6 support. It’s pretty easy to get it running on Azure, thanks to iojs-azure. Just to prove I’m running io.js instead of node.js, I added this little bit to my server.js:

logger.info(`Started wazstagram running on ${process.title} ${process.version}`);

The results:

"Console says it's io.js"

redis

In the previous version of this app, I used service bus for publishing messages from the back end process to the scaled out front end nodes. This worked great, but I’m more comfortable with redis. There are a lot of options for redis on Azure, but we recently rolled out a first class redis cache service, so I decided to give that a try. I’m really looking to use two features from redis:

  • Pub / Sub - Messages received by Instagram are published to the scaled out front end
  • Caching - I keep a cache of 100 messages around to auto-fill the page on the initial visit

You can create a new redis cache from the Gallery:

"Create a new redis cache"

After creating the cache, you have a good ol standard redis database. Nothing special/fancy/funky. You can connect to it using the standard redis-cli from the command line:

"I can connect using standard redis tools"

Note the password I’m using is actually one of the management keys provided in the portal. I also chose to disable SSL, as nothing I’m storing is sensitive data:

"Set up non-SSL connections"

I used node-redis to talk to the database, both for pub/sub and cache. First, create a new redis client:

function createRedisClient() {
    return redis.createClient(
        6379,
        nconf.get('redisHost'), 
        {
            auth_pass: nconf.get('redisKey'), 
            return_buffers: true
        }
    ).on("error", function (err) {
        logger.error("ERR:REDIS: " + err);
    });    
}

// create redis clients for the publisher and the subscriber
var redisSubClient = createRedisClient();
var redisPubClient = createRedisClient();

PROTIP: Use nconf to store secrets in json locally, and read from app settings in Azure.
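
The setup for that is tiny - something along these lines (the file name is whatever you like; Azure app settings surface as environment variables):


var nconf = require('nconf');

// prefer environment variables (Azure app settings), fall back to a local json file for dev
nconf.env().file({ file: './config.json' });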

When the Instagram API sends a new image, it’s published to a channel, and centrally cached:

logger.verbose('new pic published from: ' + message.city);
logger.verbose(message.pic);
redisPubClient.publish('pics', JSON.stringify(message));

// cache results to ensure users get an initial blast of (n) images per city
redisPubClient.lpush(message.city, message.pic);
redisPubClient.ltrim(message.city, 0, 100);
redisPubClient.lpush(universe, message.pic);
redisPubClient.ltrim(universe, 0, 100);
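
When a new client connects, replaying that cache is a single lrange call. Roughly like this - city and socket here come from the socket.io connection handler:


// send the most recent cached pics for a city down to a newly connected socket
redisPubClient.lrange(city, 0, 99, function (err, pics) {
    if (err) { return logger.error('ERR:REDIS: ' + err); }
    pics.forEach(function (pic) {
        // return_buffers is on, so convert each cached entry back to a string
        socket.emit('newPic', pic.toString());
    });
});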

The centralized cache is great, since I don’t need to use up memory in each io.js process used in my site (keep scale out in mind). Each client also connects to the pub/sub channel, ensuring every instance gets new messages:

// listen to new images from redis pub/sub
redisSubClient.on('message', function(channel, message) {
    logger.verbose('channel: ' + channel + " ; message: " + message);
    var m = JSON.parse(message.toString());
    io.sockets.in (m.city).emit('newPic', m.pic);
    io.sockets.in (universe).emit('newPic', m.pic);
}).subscribe('pics');
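
The rooms those emit calls target exist because each connecting socket joins the channel for the city it cares about - roughly like this (the event name is made up):


// each client asks for a city after connecting; join that room (or the firehose)
io.sockets.on('connection', function (socket) {
    socket.on('joinCity', function (city) {
        socket.join(city || universe);
    });
});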

After setting up the service, I was using the redis-cli to do a lot of debugging. There’s also some great monitoring/metrics/alerts available in the portal:

"monitoring and metrics"

Wrapping up

If you have any questions, feel free to check out the source, or find me @JustinBeckwith.

 
 

Please take this personally

01 February 2015 Posted Under: Product Management
 

A few weeks ago I got pulled into a meeting. There’s another team at Microsoft that’s using our SDK to build their UI, and they had a few questions. Their devs had a chance to get their hands on our SDK, and like most product guys, I was interested in getting some unfiltered feedback from an internal team. Before getting into details, someone dropped the phrase “Please don’t take this personally, but…“

The feedback that comes after that sentence is really important. We write it down. We share it with the team. We stack it up against other priorities, compare it with feedback from teams who have similar pain points, and use it to find a way to make our product better. Customer feedback (from an internal team or external customer) is so incredibly critical, that any product team has a similar pattern/process for dealing with it. So what’s the problem?

Any feedback that starts with “Don’t take this personally” really pisses me off. When you say this to someone, you’re making one of two judgments about this person:

  1. They are not personally invested in their work. They go to their job, they do whatever work is put in front of them, and then they go home. If what they’ve made is not good, it doesn’t bother them.

  2. They are personally invested in their work. They want to create something amazing, and will go to great lengths to do so. Whatever you’re about to say - despite your warning - they’re going to take it personally.

For me, that expression elicits a sort of Marty McFly “nobody calls me chicken” response.

What’s more personal to me than my product!? I work at Microsoft because I genuinely believe it’s the best place for me to build stuff that has a real tangible impact. I went through years of school so I could do *this*. I moved my family across the country. I work 50-60 hours a week (probably more than I should) because I wanted to build *this*. My product is in many ways a reflection of me. What could possibly be more personal?

Does this mean I don’t want criticism? Of course I do! Objective criticism from an informed customer who has used your product is the greatest gift a product manager can receive. It’s how we get better. Just expect me to take it personally.