To lock or not to lock

18 December 2019 Posted Under: node.js

Package Lock Files

A few years ago, npm introduced the notion of a package-lock.json. The purpose of the file is to provide a manifest that calls out the exact version of every package in your tree, the last time npm install was run. After running npm install, you’re going to see a message like this:

npm notice created a lockfile as package-lock.json. You should commit this file.

While this advice is well intentioned… it’s not always true :) Let’s talk about when you want to check this into source control, and when not to.

How package-lock.json works

The mechanics of package-lock.json are simple enough:

  • When you run npm install the first time, npm automatically creates the file for you.
  • It has the exact version of every top level and transitive dependency in your node_modules folder.
  • When you npm install some-package, the lock file is updated automatically.
  • When you update the version of a package in your package.json and run npm install, the package-lock.json file will get updated automatically.

If you peek inside one of these files, you’re going to get a bunch of gobbledygook like this:

  "name": "@google-cloud/bigtable",
  "version": "2.3.0",
  "lockfileVersion": 1,
  "requires": true,
  "dependencies": {
    "@babel/code-frame": {
      "version": "7.5.5",
      "resolved": "",
      "integrity": "sha512-27d4lZoomVyo51VegxI20xZPuSHusqbQag/ztrBC7wegWoQ1nLREPVSKSW8byhTlzTKyNE4ifaTA6lCp7JjpFw==",
      "dev": true,
      "requires": {
        "@babel/highlight": "^7.0.0"

The next time you run npm install, you’re (likely) going to get the exact copy of dependencies outlined in the package-lock.json file. If you didn’t have a lock file, npm would look at every dependency in package.json, and find the latest compatible version of every top level and transitive dependency. Having a lock file saves you from this lookup on every install.

Why lock files sound great

In a world where lock files do not exist, running npm install is largely non-deterministic and non-reproducible. Given the same set of inputs, the outputs can be different. This means that I can run npm install twice, and the results in the node_modules folder can be wildly different. Stop me if you’ve seen this happen before:

  • I just wrote some code locally, and everything works
  • I go ahead and git push my changes. For some reason CI is breaking!?
  • It’s a lint error… it looks like prettier is failing on some formatting change.

This can happen because the version of prettier you’re using locally can be different from the version of prettier you’re getting in CI. This is one of the critical problems package lock files can solve: the version you get on your computer and the version you get in CI are mostly going to be the same (there are exceptions for native modules, deleted package versions, etc.).

Having an exact record of the full transitive dependency tree can be great for a few reasons:

  • Every environment is (mostly) guaranteed to install the same set of dependencies when running npm install
  • npm install typically runs faster because some resolution steps can be skipped
  • There is a log in source control of the exact tree used to run your code and tests

This all sounds great, but guess what … I don’t use lock files, and if you’re building libraries, you probably shouldn’t use lock files either.

When lock files are not so great

While lock files sound great, they come with some real baggage, ranging from the cosmetically annoying to outright poor software practices. For context: my team at Google builds npm modules that other developers use to build their applications. In the context of a library, it does not make sense to use lock files (more on the end application use case later). We’ve broken far more customers because of our lock file usage than we’ve protected, and decided to start ignoring lock files across the ~80 or so npm modules we ship.

Generated files don’t belong in git

The first reason I don’t like lock files is largely cosmetic. I generally don’t like generated files being tracked in source control, for a few reasons:

  • They create terrible merge conflicts
  • They make code reviews more difficult
  • Things like lock file maintenance PRs muddy commit history

When dealing with PRs and trying to keep them up to date, I just find the ergonomics… clunky.

You’re not testing reality

Way more importantly: lock files provide false comfort. You’re not testing what users get. The lock file you check into source control has no material impact on the module that developers will install. package-lock.json is not packaged into your module, and is not delivered to developers when they npm install your module. It is only used during development of the library itself: for local npm install runs on developer machines, and in CI.

In other words: the transitive dependencies defined in package-lock.json may not at all reflect what developers get when they run npm install. You are probably testing against an entirely different set of dependencies than the ones developers will actually see.

Can you see how that’s a big problem? Here’s how it happens:

  1. You have a dependency on "baby-yoda": "^1.0.0" in your package.json
  2. baby-yoda@1.0.0 takes a dependency on green-lizard@1.0.0
  3. You run npm install - at this point in time, green-lizard’s latest dist-tag points at 1.0.0, and that gets written into your package-lock.json
  4. Your code, including the lock file, gets checked into git

So let’s stop, and take a breath. You installed the module baby-yoda, and you got a lock file that includes both baby-yoda@1.0.0 and green-lizard@1.0.0. All is going well! Let’s say a few weeks pass…

  1. Somewhere over the last few weeks, green-lizard version 1.0.1 is released. It has a subtle breaking change that the authors missed (we ain’t perfect).
  2. Your nightly tests run, PRs are submitted, and CI never notices there’s a problem with green-lizard@1.0.1. That’s because you have green-lizard@1.0.0 in your package-lock.json, protecting you.
  3. Consumers of your module are broken. They’re reporting errors related to the latest version of green-lizard, but you can’t see the problem.
  4. You delete your package-lock.json, run npm install, and all of a sudden you can reproduce the error.

We’ve seen this dance play out several times. We want our nightly (and presubmit) CI to test against the same set of dependencies consumers of our libraries are going to see - and lock files make that impossible.
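The dance above can be sketched with a toy resolver. This is an illustration only, not npm’s real algorithm; the registry, the packages, and the simplified semver matching are all made up to mirror the hypothetical baby-yoda/green-lizard example:

```javascript
// Toy model of the scenario: CI resolves through the lock file, while a
// fresh consumer install resolves through the registry - and the two drift
// apart once green-lizard@1.0.1 is published.

// Fake registry: package name -> version -> that version's dependencies.
const registry = {
  'baby-yoda': { '1.0.0': { 'green-lizard': '^1.0.0' } },
  'green-lizard': { '1.0.0': {}, '1.0.1': {} }, // 1.0.1 has the subtle break
};

// Newest published version sharing a major with the caret range (simplified).
function newestMatch(name, range) {
  const major = range.replace('^', '').split('.')[0];
  return Object.keys(registry[name])
    .filter(v => v.split('.')[0] === major)
    .sort()
    .pop();
}

// With a lock, return pinned versions; without one, resolve fresh each time.
function install(manifest, lock) {
  const tree = {};
  for (const [name, range] of Object.entries(manifest)) {
    const version = lock ? lock[name] : newestMatch(name, range);
    tree[name] = version;
    Object.assign(tree, install(registry[name][version], lock));
  }
  return tree;
}

const lock = { 'baby-yoda': '1.0.0', 'green-lizard': '1.0.0' };
console.log(install({ 'baby-yoda': '^1.0.0' }, lock)); // what CI tests
console.log(install({ 'baby-yoda': '^1.0.0' }, null)); // what consumers get
```

Run it and the two trees disagree on green-lizard: CI happily tests 1.0.0 forever, while every consumer install picks up 1.0.1.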

When to use lock files, and when not to

Lock files shine when you’re building user facing applications. If you’re building a web application, or a command line tool (that isn’t npm install-ed), lock files protect you from the ever-changing set of transitive dependencies that make up your application. If your binary is packaged with its node_modules folder as part of your build, using a lock file is great. If you’re building an npm module that you expect others to npm install, lock files aren’t consumed by downstream users, and create more problems than they solve (in my opinion).

How to ignore the lock file

The easiest way to opt out of the lock file is to add the following to your .npmrc file:

package-lock=false

There are a few approaches where you can find middle ground. One approach is to keep the lock file for the CI gains, but make sure to delete it as part of your nightly build workflow. This ensures CI builds are fast and consistent, but also gives you a warning when something goes awry. (soldair likes to call this the “broken every morning” approach). I personally don’t like the concept of nightly builds, but that’s a topic for another post.

Some think differently, and that’s great

There are no “right” answers here. I don’t speak for every engineer working on node.js at Google :) Others like lock files, and believe the positives outweigh the negatives. I’d love to know what you think! If you want to chat lock files, let me know at @JustinBeckwith!


Analyzing node.js on GitHub with BigQuery

13 August 2016 Posted Under: node.js

BigQuery + GitHub awesomeness

As someone who works on developer tooling - GitHub is the holy grail of data sets. There’s just so much code out there, written by so many people, for so many reasons. I’ve often wished I could just clone all of the data on GitHub, and then write scripts to process the data for various reasons:

  • What are the top 1k npm modules used with Node.js apps? We want to know this so we can test them with App Engine.

  • What percentage of people are defining their supported Ruby versions in .ruby-version files? What about Gemfile? Can we reliably use that to choose a Ruby version for the user?

  • What’s the most common way to inject configuration? Environment variables? Nconf? Etcd? Dotenv?

For each of these, we’re largely left to poke around using anecdotal observations or surveys. Having a simple way of answering these questions would be huge. Well… using the new public GitHub dataset with BigQuery, we can.

BigQuery is essentially a giant data warehouse that lets you store petabytes of data, originally built for internal use at Google. Usually querying over this much data requires a ton of infrastructure and an understanding of MapReduce… but BigQuery lets me just use SQL.

One of the fun things BigQuery offers is access to a bunch of public data sets.

Fun with Sandeep at NodeSummit

I was hanging out with Sandeep Dinesh at NodeSummit a few weeks ago, and we were chatting about some of the new data available in BigQuery from GitHub. We figured with a little bit of SQL… we could learn all kinds of cool stuff.

To get started - first, you’re going to need to visit the BigQuery console.


From here we can choose the dataset, and start taking a look at the schema. Now let’s start asking some interesting questions!

How many files are out there on GitHub?

We just need to query over the github_repos.files table, and get a count.

SELECT COUNT(*) FROM [bigquery-public-data:github_repos.files]


Wow - over 2 billion files. Next question!

How many package.json(s) are there on GitHub?

This time we’re just going to limit our files to paths ending in package.json. We can just use the RIGHT function to grab the end of the full path:

SELECT COUNT(*) FROM [bigquery-public-data:github_repos.files] WHERE RIGHT(path, 12) = "package.json"


Over 8 million! Now of course - this could include any project that has a package.json (not just node.js), so it’s probably going to be a little front-end heavy.
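A quick sanity check of the RIGHT(path, 12) trick, sketched in plain JavaScript (the sample paths are made up): it works because the string "package.json" is exactly 12 characters long, so comparing the last 12 characters of a path is just an ends-with test.

```javascript
// "package.json" is 12 characters, which is where the magic 12 comes from.
console.log('package.json'.length); // 12

// Grabbing the last 12 characters matches both root-level and nested files.
const paths = ['app/package.json', 'README.md', 'package.json'];
const matches = paths.filter(p => p.slice(-12) === 'package.json');
console.log(matches.length); // 2
```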

What’s the most popular top level npm import on GitHub?

So here’s the big one. Let’s say you want to know which npm module is most likely to be imported as a top level dependency. You could get some of this data by looking at, but that’s going to include subdependencies, and also count every install. I don’t want every install - I want to know how many apps are using which modules.

Up until this point, we’ve only been looking at the data available to us directly in the table. But in this case - we want to parse the contents of a file. This is where things start to get fun. This query will…

  • Grab all of the package.json files out there
  • Get the contents of those files
  • Run a JavaScript user-defined-function
  • Place the results in a temp table
  • Do a GROUP BY / ORDER BY to get our final count

Let’s take a look!

SELECT COUNT(*) as cnt, package
FROM JS(
    (SELECT content FROM [bigquery-public-data:github_repos.contents] WHERE id IN (
      SELECT id FROM [bigquery-public-data:github_repos.files] WHERE RIGHT(path, 12) = "package.json"
    )),
    content,
    "[{ name: 'package', type: 'string'}]",
    "function(row, emit) {
      try {
        x = JSON.parse(row.content);
        if (x.dependencies) {
          Object.keys(x.dependencies).forEach(function(dep) {
            emit({ package: dep });
          });
        }
      } catch (e) {}
    }"
)
GROUP BY package
ORDER BY cnt DESC
LIMIT 1000

So this is really freaking cool. We were able to just slam a JavaScript function in the middle of the SQL query to help us process the results. You may also notice the try/catch floating around in there - turns out that not every package.json on GitHub is valid JSON!
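Because the UDF body is plain JavaScript, you can sanity check its logic locally before paying for a multi-terabyte scan. Here is a minimal harness that mimics BigQuery’s row/emit contract; the sample rows are made up, including one invalid JSON blob to exercise the try/catch:

```javascript
// The same function that runs inside the query, extracted verbatim.
function udf(row, emit) {
  try {
    const x = JSON.parse(row.content);
    if (x.dependencies) {
      Object.keys(x.dependencies).forEach(function (dep) {
        emit({ package: dep });
      });
    }
  } catch (e) {} // not every package.json on GitHub is valid JSON
}

// Fake `content` rows, standing in for the github_repos.contents table.
const rows = [
  { content: '{"dependencies": {"express": "^4.0.0", "lodash": "^4.17.0"}}' },
  { content: 'this is not JSON at all' },
  { content: '{"name": "no-deps"}' },
];

// Mimic BigQuery: call the UDF once per row, collecting emitted packages.
const emitted = [];
rows.forEach(row => udf(row, p => emitted.push(p.package)));
console.log(emitted); // ['express', 'lodash']
```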

So let’s take a look at the results:

Package Count
express 66207
lodash 55698
debug 47499
async 40054
inherits 35782
body-parser 35644
request 31242
mkdirp 25941
chalk 25904
readable-stream 25015
glob 24497
underscore 24151
morgan 22398
minimatch 20561
cookie-parser 19957
react 19764
through2 18488
mongoose 17992
commander 17805
jade 17666
isarray 16677
minimist 16518 15675
moment 15434
graceful-fs 15198
qs 14663
object-assign 14218
jquery 13709
serve-favicon 13641
string_decoder 13597
source-map 13548
babel-runtime 13524
rimraf 13233
gulp-util 13055
express-session 13045
core-util-is 13041
bluebird 12751
semver 12722
passport 12530
q 11990
colors 11710
mime 11627
react-dom 11560
ejs 11392
xtend 11312
node-uuid 11265
optimist 11070
gulp 10934
compression 10759
once 10544
mime-types 10352

(… it keeps going for a while). At the end of this, we processed quite a bit of data.

Query complete (209.3s elapsed, 1.76 TB processed)

What other types of questions should we ask? I can think of a few that may be interesting:

  • Which npm dependencies are the most likely to be out of date?
  • How many people are using the fs npm module (the one on, not the core module)?
  • How many people are hard coding keys in their JavaScript files?

If you want to play around with the GitHub dataset, check out the getting started tutorial.

If you find the answers to these (or anything else interesting), let me know at @JustinBeckwith!


Building node.js applications on Google Cloud Platform

24 May 2016 Posted Under: node.js

Node.js is for hats and cats

In March I had the chance to talk at GCP Next on Node.js @ Google. This is a fun little tour of what Google Cloud has to offer Node.js developers.


  • 00:00 - Intro
  • 01:57 - Why Node.js
  • 02:57 - Node.js + Google Cloud Platform
  • 03:48 - The many engines of Google Cloud
  • 04:48 - Getting started with App Engine
  • 07:26 - Traffic splitting
  • 11:45 - Cloud Shell
  • 16:03 - Google Cloud APIs & Services
  • 17:03 - gcloud npm module
  • 20:17 - Cloud cats demo
  • 22:09 - Code review
  • 25:24 - Cloud Debugger
  • 27:30 - Cloud Trace
  • 28:47 - Enterprise Node.js at NodeSource
  • 37:05 - Node.js and IoT
  • 41:09 - Hatspin
  • 42:31 - Closing

Watch the video



Dependency management and Go

29 May 2015 Posted Under: Golang

I find dependency management and package managers interesting. Each language has its own package manager, and each one has characteristics that are specific to that community. NuGet for .NET has great tooling and Visual Studio support, since that’s important to the .NET developer audience. NPM has a super flexible model, and great command line tools.

In a lot of ways, golang is a little quirky. And that’s awesome. However - I’ve really struggled to wrap my head around dependency management in Go.

"Dependency management and golang"

When dealing with dependency management, I expect a few things:

1. Repeatable builds

Given the same source code, I expect to be able to reproduce the same set of binaries. Every. Time. Every bit of information needed to complete a build, whether it be on my local dev box or on a build server, should be explicitly called out in my source code. No surprises.

2. Isolated environments

I am likely to be working on multiple projects at a time. Each project may have a requirement on different compilers, and different versions of the same dependency. At no point should changing a dependency in one project have an effect on the dependencies on a completely separate project.

3. Consensus

Having a package management story is awesome. What’s even better is making sure everyone uses the same one :) As long as developers are inventive and curious, there will always be alternatives. But there needs to be consensus on the community accepted standard on how a package manager will work. If 5 projects use 5 different models of dependency management, we’re all out of luck.

How node.js does it

As I’ve talked about before, I like to use my experience with other languages as a way to learn about a new language (just like most people I’d assume). Let’s take a look at how NPM for node.js solves these problems.

Similar to the go get command, there is an npm install command. It looks like this:

npm install --save yelp

The big difference you’ll see is --save. This tells NPM to save the dependency, and the version I’m using into the package.json for my project:

  "name": "pollster",
  "version": "2.0.0",
  "private": true,
  "scripts": {
    "start": "node server"
  "dependencies": {
    "express": "~3.1.0",
    "nconf": "~0.6.7",
    "": "~0.9.13"

package.json is stored in the top level directory of my app. It provides my isolation. If I start another project, that means another package.json, and another set of dependencies. The environments are entirely isolated. The list of dependencies and their versions provides my repeatability: every time someone clones my repository and runs npm install, they will get the same list of dependencies from a centralized source. The fact that most people use NPM provides my consensus.

Version pinning is accomplished using semver. The ~ relaxes the rules on version matching, meaning I’m ok with bringing down a different version of my dependency, as long as it is only a PATCH - which means no API breaking changes, only bug fixes. If you’re being super picky (on production stuff I am), you can specify a specific version minus the ~. For downstream dependencies (dependencies of your dependencies) you can lock those in as well using npm-shrinkwrap. On one of my projects, I got bit by the lack of shrink-wrapping when a misbehaved package author used a wildcard import for a downstream dependency that actually broke us in production.
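A rough sketch of the tilde rule in plain JavaScript (simplified; the real matching lives in the semver package, which also handles prereleases, wildcards, and ranges):

```javascript
// "~3.1.0" accepts PATCH updates (3.1.x >= 3.1.0) but not MINOR or MAJOR
// bumps. This toy matcher only handles plain x.y.z versions.
function satisfiesTilde(range, version) {
  const [rMaj, rMin, rPat] = range.slice(1).split('.').map(Number);
  const [maj, min, pat] = version.split('.').map(Number);
  return maj === rMaj && min === rMin && pat >= rPat;
}

console.log(satisfiesTilde('~3.1.0', '3.1.5')); // true  - patch bump, ok
console.log(satisfiesTilde('~3.1.0', '3.2.0')); // false - minor bump, blocked
```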

The typical workflow is to check in your package.json, and then .gitignore your node_modules directory that contains the actual source code of 3rd party packages.

It’s all pretty awesome.

Go out of the box

With the out of the box behavior, Go is less than ideal in repeatability, isolation, and consensus. If you follow the setup guide for golang, you’ll find yourself with a single directory where you’re supposed to keep all of your code. Inside of there, you create a /src directory, and a new directory for each project you’re going to work on. When you install a dependency using go get, it will essentially drop the source code from that repository into $GOPATH/src. In your source code, you just tell the compiler where it needs to go to grab the latest sources:

import ""
client := yelp.New(options)
result, err := client.DoSimpleSearch("coffee", "seattle")

So this is really bad. The go-yelp library I’m importing from github is pulled down at compile time (if not already available from a go get command), and built into my project. That is pointing to the master branch of my github repository. Who’s to say I won’t change my API tomorrow, breaking everyone who has imported the library in this way? As a library author, I’m left with 3 options:

  1. Never make breaking changes.
  2. Make a completely new repository on GitHub for a new version of my API that has breaking changes.
  3. Make breaking changes, and assume / hope developers are using a dependency management tool.

Without using an external tool (or one of the methods I’ll talk about below), there is no concept of version pinning in go. You point towards a namespace, and that path is used to find your code during the build. For most open source projects - the out of the box behavior is broken.

My problem is that the default workflow on a go project leads you down a path of sadness. You start with a magical go get command that installs the latest and greatest version of a dependency - but doesn’t ask you which specific version or hash of that dependency you should be using. Most web developers have been conditioned to not check our dependencies into source control, if they’re managed by a package manager (see: gem, NuGet, NPM, bower, etc). The end result is that I could easily break someone else, and I can easily be broken.

Vendoring, import rewrites, and the GOPATH

There is currently no agreed upon package manager for Go. Recently the Go team kicked up a great thread asking the community for their thoughts on a package management system. There are a few high level concepts that are helpful to understand.


Vendoring

At Google, the source code for a dependency is copied into the source tree, and checked into source control. This provides repeatability. There is never a question of where the source is downloaded from, because it is always available in the source tree. Copying the source from a dependency into your own source is referred to as “vendoring”.

Import rewriting

After you copy the code into your source tree, you need to change your import path to not point at the original source, but rather to point at a path in your tree. This is called “Import rewriting”.

After copying a library into your tree, instead of this:

import ""
client := yelp.New(options)

you would do this:

import "yourtree/third_party/"
client := yelp.New(options)

GOPATH rewriting

Vendoring and import rewriting provide our repeatable builds. But what about isolation? If project (x) relies on go-yelp#v1.0, project (y) should be able to rely on go-yelp#v2.0. They should be isolated. If you follow How to write go code, you’re led down a path of a single workspace, which is driven by $GOPATH. $GOPATH is where libraries installed via go get will be installed. It controls where your own binaries are generated. It’s generally the defining variable for the root of your workspace. If you try to run multiple projects out of the same directory - it completely blows up isolation. If you want to be able to reference different versions of the same dependency, you need to change the $GOPATH variable for each current project. The act of changing the $GOPATH environment variable when switching projects is “GOPATH rewriting”.

Package managers & tools

Given the lack of prescriptive guidance and tools on how to deal with dependency management, quite a few tools have popped up. In no particular order, here are a few I found:

Given my big 3 requirements above, I checked out the most popular of the repos above, and settled on godep. The alternatives all fell into at least one of these traps:

  • Forced rewriting the url, making it harder to manage dependency paths
  • Relied on a centralized service
  • Only works on a single platform
  • Doesn’t provide isolation in the $GOPATH


Godep

Godep matched most of my requirements for a package manager, and is the most popular solution in the community. It solves the repeatability and isolation issues above. The workflow:

Run go get to install a dependency (nothing new here):

go get

When you’re done installing dependencies, use the godep save command. This will copy all of the referenced code imported into the project from the current $GOPATH into the ./Godeps directory in your project. Make sure to check this into source control.

godep save

It also will walk the graph of dependencies and create a ./Godeps/Godeps.json file:

 "ImportPath": "",
 "GoVersion": "go1.4.2",
 "Deps": [
   "ImportPath": "",
   "Rev": "e0e1b550d545d9be0446ce324babcb16f09270f5"
   "ImportPath": "",
   "Rev": "a1577bd3870218dc30725a7cf4655e9917e3751b"

When it’s time to build, use the godep tool instead of the standard go toolchain:

godep go build

The $GOPATH is automatically rewritten to use the local copy of dependencies, ensuring you have isolation for your project. This approach is great for a few reasons:

  1. Repeatable builds - When someone clones the repository and runs it, everything you need to build is present. There are no floating versions.
  2. No external repository needed for dependencies - with all dependencies checked into the local repository, there’s no need to worry about a centralized service. NPM will occasionally go down, as does NuGet.
  3. Isolated environment - With $GOPATH being rewritten at build time, you have complete isolation from one project to the next.
  4. No import rewriting - A few other tools operate by changing the import url from the origin repository to a rewritten local repository. This makes installing dependencies a little painful, and makes the import statement somewhat unsightly.

There are a few negatives though as well:

  1. You lose the convenience of not checking in your dependencies. It’s a pain to check in thousands of source files I won’t really edit. Without a centralized repository, this is not likely to be solved.
  2. You need to use a wrapped toolchain with the godep commands. There is still no real consensus.

For an example of a project that uses godep, check out coffee.

Wrapping up

While using godep is great - I’d really love to see consensus. It’s way too easy for newcomers to fall into the trap of floating dependencies, and it’s hard without much official guidance to come to any sort of consensus on the right approach. At this stage - it’s really up to each team to pick what they value in their dependency management story and choose one of the (many) options out there. Until proven otherwise, I’m sticking with godep.

Great posts on this subject

There have been a lot of great posts by others on this subject, check these out as well:


Docker, Revel, and App Engine

08 May 2015 Posted Under: Google Cloud

"Revel running on Google App Engine with Docker"

note: I recently updated this post to make sure all of the commands still work.

I’ve spent some time recently using the Go programming language for my side web projects. The Go standard libraries are minimal by design - meaning it doesn’t come with a prescriptive web framework out of the box. The good news is that there are a ton of options:

Of course, you could decide to just not use a web framework at all. Comparing these is a topic of great debate - but that topic is for another post :) I decided to try out Revel first, as it was the closest to a full featured rails-esque framework at a glance. I’ll likely give all of these a shot at some point.

After building an app on Revel, I wanted to get a feel for deploying my app to see if it posed any unique challenges. I recently started a new gig working on Google Cloud, and decided to try out App Engine. The default runtime environment for Go in App Engine is sandboxed. This comes with some benefits, and a few challenges. You get a lot of stuff for free, but you also are restricted in terms of file system access, network access, and library usage. Given the restrictions, I decided to use the new App Engine Flexible service. App Engine Flex lets you deploy your application in a docker container, while still having access to the other App Engine features like datastore, logging, caching, etc. The advantage of using docker here is that I don’t need to write any App Engine specific code. I can write a standard Go/Revel app, and just deploy to docker.

Starting with Revel

There’s a pretty great getting started tutorial for Revel. After getting the libraries installed, scaffold a new app with the revel new command:

go get
go get
revel new myapp

Using Docker

Before touching App Engine Flexible, the first step is to get it working with docker. It took a little time and effort, but once docker is completely set up on your machine, writing the docker file is straightforward.

Here’s the docker file I’m using right now:

# Use the official go docker image built on debian.
FROM golang:1.4.2

# Grab the source code and add it to the workspace.
ADD . /go/src/

# Install revel and the revel CLI.
RUN go get
RUN go get

# Use the revel CLI to start up our application.
ENTRYPOINT revel run dev 8080

# Open up the port where the app is running.

There are a few things to call out with this Dockerfile:

  1. I chose to use the golang docker image as my base. You could replicate the steps needed to install and configure go with a base debian/ubuntu image, but I found this easier. I could have also used the pre-configured App Engine golang image, but I did not need the additional service account support.

  2. The ENTRYPOINT command tells Docker (and App Engine) which process to run when the container is started. I’m using the CLI included with revel.

  3. For the ENTRYPOINT and EXPOSE directives, make sure to use port 8080 - this is a hard coded port for App Engine.

To start using docker with your existing revel app, you need to install docker and copy the dockerfile into the root of your app. Update the dockerfile to change the path in the ADD and ENTRYPOINT instructions to use the local path to your revel app instead of mine.

After you have docker setup, build your image and try running the app:

# build and run the image
docker build -t revel-appengine .
docker run -it -p 8080:8080 revel-appengine

This will run docker, build the image locally, and then run it. Try hitting http://localhost:8080 in your browser. You should see the revel startup page:

"Running revel in docker"

Now we’re running revel inside of docker.

App Engine Flexible

The original version of App Engine had a bit of a funny way of managing application runtimes. There is a limited set of stacks available, and you’re left using a locked down version of an approved runtime. Flex gets rid of this restriction by letting you run pretty much anything inside of a container. You just need to define a little bit of extra config in an app.yaml file that tells App Engine how to treat your container:

runtime: custom
vm: true
api_version: go1

This config lets me use App Engine, with a custom docker image as my runtime, running on a managed virtual machine. You can copy my app.yaml into your app directory, alongside the Dockerfile. Next, make sure you’ve signed up for a Google Cloud account, and download the Google Cloud SDK. After getting all of that setup, you’ll need to create a new project in the developer console.

# Install the Google Cloud SDK
curl | bash

# Log into your account
gcloud init

That covers the initial setup. After you have a project created, you can try deploying the app. This is essentially going to startup your app using the Dockerfile we defined earlier on Google Cloud:

# Deploy the application
gcloud app deploy

After deploying, you can visit your site here:

Revel running on App Engine

Wrapping up

So that’s it. I decided to use revel for this one, but the whole idea behind using docker for App Engine is that you can bring pretty much any stack. If you have any questions, feel free to check out the source, or find me @JustinBeckwith.