Win the Battle Against CM Dependency Errors


Dependencies got you down?  Find them!

Many of us end up with lots of dependencies in our configuration management code.  Whether it's community cookbooks or package management, there are plenty of reasons to leverage outside data sources when building your immutable artifacts.  We've been working with many customers who depend on external resources they don't explicitly know about (e.g. the dependency hierarchy pulled in by a community cookbook).

A common trigger for tracking these down is implementing whitelisted egress in a HITRUST-certified environment.  Suddenly, we need to know exactly where all of our resources come from.  There are obvious benefits to tightly controlling the sources for your builds, but this requirement definitely creates a forcing function.  You may also (for the same or other reasons) want to localize or privatize external dependencies, like moving a tarball to S3 or Artifactory instead of grabbing it from a mirror that may change over time as your code runs.

So, you have dependencies in your configuration management and you want to identify them? 


Using these tools:

  • Vagrant
  • VirtualBox
  • Python

If you are already using Vagrant and VirtualBox for local development of your CM, you have all the tools you need (except this blog post, maybe).  VirtualBox provides a per-NIC configuration parameter that packet captures (PCAP) the virtual interface's traffic, as described in the VirtualBox documentation.

In Vagrant, it looks like this:

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.provider "virtualbox" do |vbox, override|
    vbox.memory = 2048
    vbox.cpus = 2
    # Enable multiple guest CPUs if available
    vbox.customize ["modifyvm", :id, "--ioapic", "on"]
    # Enable PCAP output
    vbox.customize ["modifyvm", :id, "--nictrace1", "on"]
    vbox.customize ["modifyvm", :id, "--nictracefile1", "yourfilename.pcap"]
  end
end

Now when you vagrant up, all traffic will be written to yourfilename.pcap.  Note that this file will be large.  Once you have the file, we can search it for domains and URLs, but using grep as-is carries a warning.  Namely, since the PCAP file is binary, matched output could include raw bytes that your terminal interprets as control sequences.  See this warning from the GNU grep manual:


If the first few bytes of a file indicate that the file contains binary data, assume that the file is of type TYPE. By default, TYPE is binary, and grep normally outputs either a one-line message saying that a binary file matches, or no message if there is no match. If TYPE is without-match, grep assumes that a binary file does not match; this is equivalent to the -I option. If TYPE is text, grep processes a binary file as if it were text; this is equivalent to the -a option. Warning: grep --binary-files=text might output binary garbage, which can have nasty side effects if the output is a terminal and if the terminal driver interprets some of it as commands.
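As a quick illustration of that behavior (the file contents and hostname below are made up to stand in for a real capture):

```shell
# Build a tiny stand-in for a binary capture file; the NUL bytes push
# grep into binary mode, just like a real PCAP would.
printf 'GET / HTTP/1.1\r\nHost: mirror.example.com\r\n\x00\x01\x02' > sample.bin

# Default behavior: grep reports the match but prints no line content.
grep 'Host:' sample.bin
# -> Binary file sample.bin matches

# -a forces text mode; -o prints only the matched portion, which keeps
# stray control bytes away from your terminal.
grep -ao 'Host: [A-Za-z0-9.-]*' sample.bin
# -> Host: mirror.example.com
```

Restricting output with -o (or piping through strings) works for quick checks, but it still leaves you assembling the results by hand.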

To avoid that risk, we are going to use a Python script that gives us nice clean output.  Thanks to my colleague Loren for authoring the script when my ngrep queries failed to produce complete, easy-to-digest output.  The script uses the dpkt library to load the VirtualBox PCAP file.  It then compares packets against a known list of TCP and UDP ports and generates a list of URLs and domains (in the case of domains, a sorted list).  We also watch port 22 to catch git clones over SSH.  Python script:

#!/usr/bin/env python
import socket

import dpkt

f = open('./yourfilename.pcap', 'rb')
pcap = dpkt.pcap.Reader(f)

http_ports = [80, 8080]
https_ports = [443]

domains = []
urls = []
ip_list = []

for timestamp, buf in pcap:
    try:
        eth = dpkt.ethernet.Ethernet(buf)
        ip = eth.data
        tcp = ip.data
        udp = ip.data
        if tcp.__class__.__name__ == 'TCP':
            # Plaintext HTTP: pull the Host header and URI out of the request
            if tcp.dport in http_ports and len(tcp.data) > 0:
                try:
                    http = dpkt.http.Request(tcp.data)
                    urls.append(http.headers['host'] + http.uri)
                except Exception:
                    pass
            if tcp.dport not in http_ports and len(tcp.data) > 0:
                # Port 22 catches git-over-SSH clones
                if tcp.dport == 22:
                    dest_ip_addr = socket.inet_ntop(socket.AF_INET, ip.dst)
                    if dest_ip_addr not in ip_list:
                        ip_list.append(dest_ip_addr)
        if udp.__class__.__name__ == 'UDP':
            # DNS answers cover everything else, including HTTPS destinations
            try:
                dns = dpkt.dns.DNS(udp.data)
                if dns.qr != dpkt.dns.DNS_R:
                    continue
                if dns.opcode != dpkt.dns.DNS_QUERY:
                    continue
                if dns.rcode != dpkt.dns.DNS_RCODE_NOERR:
                    continue
                if len(dns.an) < 1:
                    continue
                for answer in dns.an:
                    if answer.name not in domains:
                        if answer.type == dpkt.dns.DNS_CNAME:
                            domains.append(answer.name)
                        elif answer.type == dpkt.dns.DNS_A:
                            domains.append(answer.name)
                        elif answer.type == dpkt.dns.DNS_PTR:
                            domains.append(answer.name)
            except Exception:
                pass
    except Exception:
        pass

print("\n[+] URLs extracted from PCAP file are:")
for url in urls:
    print(url)

print("\n[+] DOMAINs extracted from PCAP file are:")
for domain in sorted(set(domains)):
    print(domain)

print("\n[+] SSH destination IPs extracted from PCAP file are:")
for ip_addr in sorted(set(ip_list)):
    print(ip_addr)

Putting it all together, I modified my Vagrantfile as described above.  This Vagrantfile uses bento/ubuntu-16.04 and Berkshelf/chef-solo to:

  • Install updates on Ubuntu
  • Install nginx (to be used as a reverse proxy to NodeJS)
  • Install nodejs and npm using community cookbook
  • Install npm modules using community cookbook
  • Install git client using community cookbook
  • Install mysql client
  • Grab my nodejs app from github (git SSH)

So on the surface, without doing any packet capture, we can already determine some dependencies: apt, the npm registry, and GitHub.  But each of those resources has dependencies as well.  This is further complicated by my decision to leverage community cookbooks, which also have dependencies.  If you knew ahead of time you were deploying into an environment that required URL whitelisting, you would likely have opted not to use any community code, and to instead author everything from scratch to minimize your dependencies.  I used the above example to illustrate how this workflow can be used to audit a seemingly simple, but somewhat messy, example.
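For reference, the community cookbooks above come in through a Berksfile along these lines (cookbook names are taken from the bullet list; exact names and version pins in your setup may differ):

```ruby
# Berksfile -- community cookbooks used by this example (versions omitted)
source 'https://supermarket.chef.io'

cookbook 'nginx'
cookbook 'nodejs'
cookbook 'npm'
cookbook 'git'
cookbook 'mysql'
```

Each of these can declare its own dependencies in its metadata, which is exactly the hierarchy Berkshelf resolves for you, and exactly what makes the egress list hard to predict by hand.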

For a Berkshelf view of the Chef dependencies:


And the output of the aforementioned pcap filter python script:

[+] URLs extracted from PCAP file are:

[+] DOMAINs extracted from PCAP file are:

And then win the battle.

This gives you a good starting point for deciding where to go next: hosting your own apt mirror, say, or setting up a private npm registry.  In any case, this still provides a good audit step before configuration management leaves development.
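As a concrete (hypothetical) sketch of that localization, the two biggest sources from the lists above might come down to configuration like this, where the hostnames are placeholders for your internal mirror and registry:

```
# /etc/apt/sources.list.d/internal.list -- point apt at an internal mirror
deb http://apt-mirror.internal.example/ubuntu xenial main universe

# ~/.npmrc -- point npm at a private registry
registry=https://npm-registry.internal.example/
```

Once the domain list from the PCAP is empty of anything outside your whitelist, you know the artifact builds entirely from sources you control.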