Win the Battle Against CM Dependency Errors

Dependencies got you down?  Find them!

Many of us end up with lots of dependencies in our configuration management (CM) code.  Whether it’s community cookbooks or package management, there are plenty of reasons to leverage outside sources when building your immutable artifacts.  We’ve been working with many customers who depend on external resources they don’t explicitly know about (e.g. the dependency hierarchy pulled in by a community cookbook).

A common reason for needing to track these down is the implementation of whitelisted egress from a HITRUST-certified environment.  Suddenly, we need to know exactly where all of our resources are coming from.  There are obvious benefits to tightly controlling the sources for your builds, but this requirement definitely creates a forcing function.  You may also (for the same or other reasons) want to localize or privatize external dependencies, for example by moving a tarball to S3 or Artifactory instead of grabbing it from a mirror that may change over time.

So, you have dependencies in your configuration management and you want to identify them? 

How?

Using these tools:

  • Vagrant
  • VirtualBox
  • Python

If you are already using Vagrant and VirtualBox for local development of your CM code, you have all the tools you need (except this blog post, maybe).  VirtualBox provides a virtual network interface configuration parameter that writes a packet capture (PCAP) of the interface’s traffic, as mentioned here: https://www.virtualbox.org/wiki/Network_tips

In Vagrant, it looks like this:

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.provider "virtualbox" do |vbox, override|
    vbox.memory = 2048
    vbox.cpus = 2
    # Enable multiple guest CPUs if available
    vbox.customize ["modifyvm", :id, "--ioapic", "on"]
    # Enable PCAP output
    vbox.customize ["modifyvm", :id, "--nictrace1", "on"]
    vbox.customize ["modifyvm", :id, "--nictracefile1", "yourfilename.pcap"]
  end
...
end
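
A note on the settings above: nictrace1 and nictracefile1 apply to the VM’s first network adapter, which in a default Vagrant setup is the NAT interface that carries the guest’s outbound traffic.  If your box defines additional adapters, you can trace them as well (--nictrace2, and so on).  Consider giving the trace file an absolute path so there is no ambiguity about where VirtualBox writes it.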

Now when you vagrant up, all traffic on that interface will be written to yourfilename.pcap.  Note that this file can get large. Once you have the file, you can search it for domains and URLs, but using grep as-is carries a warning.  Namely, since the PCAP file is binary, the matched output could be interpreted by your terminal. See this warning from https://linux.die.net/man/1/grep :

--binary-files=TYPE

If the first few bytes of a file indicate that the file contains binary data, assume that the file is of type TYPE. By default, TYPE is binary, and grep normally outputs either a one-line message saying that a binary file matches, or no message if there is no match. If TYPE is without-match, grep assumes that a binary file does not match; this is equivalent to the -I option. If TYPE is text, grep processes a binary file as if it were text; this is equivalent to the -a option. Warning: grep --binary-files=text might output binary garbage, which can have nasty side effects if the output is a terminal and if the terminal driver interprets some of it as commands.

To avoid that risk, we are going to use a Python script that gives us nice clean output.  Thanks to my colleague Loren for authoring the script when my ngrep queries failed to produce a complete and easy-to-digest result.  The script uses the dpkt library to load the VirtualBox PCAP file. It then compares packets to a known list of ports over TCP and UDP and generates a list of URLs and domains (in the case of domains, a sorted list).  We added port 22 to catch git SSH clones. The Python script:

#!/usr/bin/env python
import socket
import sys

import dpkt

# PCAP file produced by the VirtualBox nictrace setting; the path can
# also be passed as the first command-line argument.
pcap_path = sys.argv[1] if len(sys.argv) > 1 else './yourfilename.pcap'

# Port 443 payloads are encrypted, so HTTPS hosts are recovered from
# the DNS answers below rather than from the HTTP parser.
http_ports = [80, 8080]

domains = []
urls = []
ip_list = []

with open(pcap_path, 'rb') as f:
    pcap = dpkt.pcap.Reader(f)
    for timestamp, buf in pcap:
        try:
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            transport = ip.data
            if isinstance(transport, dpkt.tcp.TCP):
                tcp = transport
                if len(tcp.data) == 0:
                    continue
                if tcp.dport in http_ports:
                    # Parse plain-HTTP requests to recover full URLs.
                    try:
                        http = dpkt.http.Request(tcp.data)
                        urls.append(http.headers['host'] + http.uri)
                        domains.append(http.headers['host'])
                    except Exception:
                        pass  # not a well-formed HTTP request
                elif tcp.dport == 22:
                    # git SSH clones: all we can recover is the destination IP.
                    dest_ip_addr = socket.inet_ntop(socket.AF_INET, ip.dst)
                    if dest_ip_addr not in ip_list:
                        ip_list.append(dest_ip_addr)
                        print(dest_ip_addr)
            elif isinstance(transport, dpkt.udp.UDP):
                udp = transport
                try:
                    dns = dpkt.dns.DNS(udp.data)
                    # Only successful DNS responses with answers are interesting.
                    if dns.qr != dpkt.dns.DNS_R: continue
                    if dns.opcode != dpkt.dns.DNS_QUERY: continue
                    if dns.rcode != dpkt.dns.DNS_RCODE_NOERR: continue
                    if len(dns.an) < 1: continue
                    for answer in dns.an:
                        # Track the common record types: A, CNAME, PTR.
                        if answer.type not in (dpkt.dns.DNS_A,
                                               dpkt.dns.DNS_CNAME,
                                               dpkt.dns.DNS_PTR):
                            continue
                        if answer.name not in domains:
                            domains.append(answer.name)
                except Exception:
                    pass  # not a DNS packet
        except Exception:
            pass  # malformed frame

print("\n[+] URLs extracted from PCAP file are:")
for url in urls:
    print(url)

print("\n[+] DOMAINs extracted from PCAP file are:")
for domain in sorted(set(domains)):
    print(domain)
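
To run it, install the dpkt library (pip install dpkt) and point the script at your capture file; it defaults to ./yourfilename.pcap, or you can pass a path as the first argument.  On a large capture, parsing can take a little while.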

Example

Putting it all together, I modified my Vagrantfile as described above.  This Vagrantfile uses bento/ubuntu-16.04 and Berkshelf/chef-solo to:

  • Install updates on Ubuntu
  • Install nginx (to be used as a reverse proxy to NodeJS)
  • Install nodejs and npm using a community cookbook
  • Install npm modules using a community cookbook
  • Install the git client using a community cookbook
  • Install the mysql client
  • Grab my nodejs app from GitHub (git over SSH)

So on the surface, without doing any packet capture, we can already identify some dependencies: apt, the npm registry, github.com.  But each of those resources has dependencies as well, and my decision to leverage community cookbooks (which have their own dependencies) complicates things further.  If you knew ahead of time that you were deploying into an environment requiring URL whitelisting, you would likely have opted not to use any community code, and instead authored everything from scratch to minimize your dependencies.  I used the above example to illustrate how this workflow can audit a seemingly simple, but somewhat messy, setup.

For a Berkshelf view of the Chef dependencies:

[Image: Berkshelf dependency graph]
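
(Berkshelf can render a graph like this with its berks viz command.)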

And the output of the aforementioned PCAP-filtering Python script:

[+] URLs extracted from PCAP file are:

archive.ubuntu.com/ubuntu/pool/main/libx/libxau/libxau6_1.0.8-1_amd64.deb
archive.ubuntu.com/ubuntu/pool/main/libj/libjpeg-turbo/libjpeg-turbo8_1.4.2-0ubuntu3_amd64.deb
archive.ubuntu.com/ubuntu/pool/main/libj/libjpeg8-empty/libjpeg8_8c-2ubuntu8_amd64.deb
archive.ubuntu.com/ubuntu/pool/main/t/tiff/libtiff5_4.0.6-1ubuntu0.2_amd64.deb
archive.ubuntu.com/ubuntu/pool/main/libv/libvpx/libvpx3_1.5.0-2ubuntu1_amd64.deb
archive.ubuntu.com/ubuntu/pool/main/libx/libxpm/libxpm4_3.5.11-1ubuntu0.16.04.1_amd64.deb
archive.ubuntu.com/ubuntu/pool/main/libg/libgd2/libgd3_2.1.1-4ubuntu0.16.04.8_amd64.deb
archive.ubuntu.com/ubuntu/pool/main/libx/libxslt/libxslt1.1_1.1.28-2.1ubuntu0.1_amd64.deb
archive.ubuntu.com/ubuntu/pool/main/n/nginx/nginx-common_1.10.3-0ubuntu0.16.04.2_all.deb
archive.ubuntu.com/ubuntu/pool/main/n/nginx/nginx-core_1.10.3-0ubuntu0.16.04.2_amd64.deb
archive.ubuntu.com/ubuntu/pool/main/n/nginx/nginx_1.10.3-0ubuntu0.16.04.2_all.deb
archive.ubuntu.com/ubuntu/pool/main/liba/libaio/libaio1_0.3.110-2_amd64.deb
archive.ubuntu.com/ubuntu/pool/main/m/mysql-5.7/mysql-client-core-5.7_5.7.21-0ubuntu0.16.04.1_amd64.deb
keyserver.ubuntu.com/pks/lookup?op=get&options=mr&search=0x1655A0AB68576280
security.ubuntu.com/ubuntu/dists/xenial-security/InRelease
archive.ubuntu.com/ubuntu/dists/xenial/InRelease
archive.ubuntu.com/ubuntu/dists/xenial-updates/InRelease
security.ubuntu.com/ubuntu/dists/xenial-security/main/binary-amd64/by-hash/SHA256/6d2f9d7bd117425141aa944d9ab0958319e131505debd322bb4bac2528b5a85e
archive.ubuntu.com/ubuntu/dists/xenial-backports/InRelease
archive.ubuntu.com/ubuntu/dists/xenial-updates/main/binary-amd64/by-hash/SHA256/8bd59bab518243ac32000e2233a9a58a3491c67e4f9eb07f53ed8e392ca89f61
security.ubuntu.com/ubuntu/dists/xenial-security/main/binary-i386/by-hash/SHA256/28fd4df99b96b87e6d8878dccb429ab42d53f822911f812c247610f83d7a9488
archive.ubuntu.com/ubuntu/dists/xenial-backports/main/binary-amd64/by-hash/SHA256/76b35ccd1a5487b287908c263fec8861ec8e53178d060a58cc06a1b1cd51acbd
archive.ubuntu.com/ubuntu/dists/xenial-backports/universe/binary-i386/by-hash/SHA256/b43bfab32fead59ff1e23f36ec2b8fccf1da130b37aeb13ec33c5960cf51a098

[+] DOMAINs extracted from PCAP file are:
a.sni.fastly.net
api.snapcraft.io
archive.ubuntu.com
deb.nodesource.com
f4.shared.global.fastly.net
github.com
keyserver.ubuntu.com
omnitruck-direct.chef.io
omnitruck-elb-1344766176.us-west-2.elb.amazonaws.com
packages-delivered.es.chef.io
registry.npmjs.org
security.ubuntu.com
tgz.pm2.io

And then win the battle...

This gives you a good starting point for deciding where to go next: perhaps hosting your own apt mirror and setting up a private npm registry.  In any case, this still provides a good audit step before configuration management leaves development.
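
You can also turn the script’s output into a recurring check.  Here is a minimal sketch, assuming you have saved the domain list to a file and that approved_domains.txt is your (hypothetical) whitelist; it flags any observed domain that isn’t approved:

#!/usr/bin/env python
# Minimal sketch: compare observed domains against an approved whitelist.
# File names (observed_domains.txt, approved_domains.txt) are placeholders;
# adapt them to however you store the PCAP script's output.
import sys

def load_domains(path):
    # Read one domain per line, ignoring blanks and comment lines.
    with open(path) as f:
        return {line.strip() for line in f
                if line.strip() and not line.startswith('#')}

observed = load_domains('observed_domains.txt')
approved = load_domains('approved_domains.txt')

unapproved = sorted(observed - approved)
for domain in unapproved:
    print('[!] not whitelisted: ' + domain)

# Non-zero exit makes this usable as a CI gate, failing the build
# whenever a new dependency sneaks in.
sys.exit(1 if unapproved else 0)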