Jeff Blaine

OpenSSH Session Timeouts

Jeff Blaine — Fri, 03 May 2019 21:10:43 +0000

It is often desirable to terminate SSH sessions after they have been sitting idle for a period of time. If you do any quick searching you will find that many (most?) believe that the sshd configuration settings ClientAliveInterval and ClientAliveCountMax are the place to configure this. Sadly, many security hardening guides[1,2], frameworks, benchmark documents, and tools (which are based on those documents) provide this same incorrect guidance. ClientAliveInterval and ClientAliveCountMax do not exist to terminate sessions after a mere lack of use for a period of time. These settings in OpenSSH are used to detect unresponsive clients (NOT responsive, functioning, but idle clients). They are purely a heartbeat mechanism.

Below you can see a ClientAliveInterval setting of 60 seconds and my OpenSSH session having zero input or output for 120 seconds and still remaining connected:

[jblaine@testbed1~]$ sudo grep -i client /etc/ssh/sshd_config
ClientAliveCountMax 0
ClientAliveInterval 60
[jblaine@testbed1~]$ count=0; while :; do count=$(( count + 120 )); sleep 120; echo $count seconds have passed; done
120 seconds have passed
240 seconds have passed
^C

Likewise, we see the same behavior with ClientAliveCountMax set to 1:

[jblaine@testbed1~]$ sudo grep -i client /etc/ssh/sshd_config
ClientAliveCountMax 1
ClientAliveInterval 60
[jblaine@testbed1~]$ count=0; while :; do count=$(( count + 120 )); sleep 120; echo $count seconds have passed; done
120 seconds have passed
240 seconds have passed
^C

OpenSSH has zero functionality built into it to disconnect sessions that are functional but merely idle for a certain period of time. When sshd receives no response from a client for ClientAliveCountMax * ClientAliveInterval seconds, it considers the client unresponsive network-wise and terminates the connection. See client_alive_check() in OpenSSH's serverloop.c.
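
As a rough illustration of that arithmetic, here is a minimal bash sketch that reads sshd_config-style text and prints the ClientAliveInterval * ClientAliveCountMax window in seconds. The function name client_alive_window is made up for this example, the fallback values (0 and 3) are what I understand the sshd defaults to be, and it assumes plain integer seconds and no Match blocks. The number it prints is how long an unresponsive client is tolerated, not an idle timeout.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OpenSSH): reads sshd_config-style text
# on stdin and prints ClientAliveInterval * ClientAliveCountMax in seconds.
# Assumptions: values are plain integers, there are no Match blocks, and
# the fallbacks are interval=0 and count=3.
client_alive_window() {
    awk '
        BEGIN { interval = 0; count = 3 }
        # Keywords are case-insensitive; keep only the first value seen.
        tolower($1) == "clientaliveinterval" && !seen_i { interval = $2; seen_i = 1 }
        tolower($1) == "clientalivecountmax" && !seen_c { count = $2; seen_c = 1 }
        END { print interval * count }
    '
}

# Example: the second test configuration shown earlier.
printf 'ClientAliveCountMax 1\nClientAliveInterval 60\n' | client_alive_window
```

On a real host the input would come from something like `sudo cat /etc/ssh/sshd_config | client_alive_window`.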

How did this misunderstanding come about? The usual suspects: Echo chambers and people copying and pasting without testing. But I also blame the ClientAliveInterval section of the man page for sshd which uses the overloaded term “inactive” where it should use the word “unresponsive”. As such, I’ve created a pull request.

Using test-kitchen and kitchen-vagrant behind an HTTP proxy

Jeff Blaine — Wed, 28 Oct 2015 14:58:35 +0000

Here’s what got us working with test-kitchen and kitchen-vagrant behind an HTTP proxy. Comments inline.

---
driver:
  name: vagrant
  driver_config:
    # Allows fetching boxes if required
    http_proxy: http://yourproxy.example.com:80
    https_proxy: http://yourproxy.example.com:80
  # This vagrantfiles callout is needed because of
  # https://github.com/test-kitchen/test-kitchen/issues/821
  # You should check that issue to see if this is still required.
  vagrantfiles:
    - your-site-specific-vagrantfile-block.rb
  network:
    - ["private_network", {ip: "192.168.1.2"}]

provisioner:
  name: chef_zero
  # Allows fetching Omnibus Chef installer if required
  http_proxy: http://yourproxy.example.com:80
  https_proxy: http://yourproxy.example.com:80

platforms:
  - name: centos-6.7

suites:
  - name: default
    run_list:
      - recipe[things::default]
    attributes:
      chef-client:
        # Allows your Chef run to fetch stuff it requires
        config:
          http_proxy: "http://yourproxy.example.com:80"
          https_proxy: "http://yourproxy.example.com:80"

And finally your-site-specific-vagrantfile-block.rb, your custom Vagrantfile piece that is merged in via the vagrantfiles directive in your YAML file:

unless Vagrant.has_plugin?("vagrant-proxyconf")
  raise "Missing required plugin 'vagrant-proxyconf', run `vagrant plugin install vagrant-proxyconf`"
end

Vagrant.configure(2) do |config|
  # Allows busser gem and deps to be fetched as required
  config.proxy.http     = "#{ENV['http_proxy']}"
  config.proxy.https    = "#{ENV['https_proxy']}"
  config.proxy.no_proxy = "localhost,127.0.0.1"
end
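
Note that the Vagrantfile block above reads http_proxy and https_proxy from the environment of whatever shell runs kitchen, so they need to be exported there first. A minimal sketch, assuming bash; require_proxy_env is a hypothetical helper name and the proxy URL is the same placeholder used above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail early if the proxy variables that the
# Vagrantfile block reads via ENV[...] are missing from the environment.
require_proxy_env() {
    local name missing=0
    for name in http_proxy https_proxy; do
        if [ -z "${!name:-}" ]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Placeholder values; substitute your site's proxy.
export http_proxy="http://yourproxy.example.com:80"
export https_proxy="$http_proxy"

require_proxy_env && echo "proxy environment looks sane"
```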

64 bits is Too Many Bits

Jeff Blaine — Thu, 18 Dec 2014 20:31:19 +0000

A seemingly simple move of NFS-exported home directories resulted in Apache suEXEC freaking out when referencing the new space. Here are the details and solution, because sometimes a debugging story is fun.

Friday

Setting: R&D environment of ~800 physical and virtual hosts running Windows, Linux, Solaris and using DAS, NAS, and SAN with NFS and OpenAFS. Our direct customers are scientists, researchers, and software and hardware engineers.

A small portion of our users, say 20%, have NFS-based home directories. Over the last ~18 years these have been served from varied hosts running SunOS 4.1 – 5.10. As part of our department’s ongoing weaning from Sun/Oracle hardware, it was finally time to get this NFS data moved off its host (a Sun-Fire V240). Our Isilon IQ 7200x 3-node cluster, which I’d personally had only limited interaction with (and no training on) to date, seemed the appropriate destination for the data.

After moving the data (twice[1]), changing automounter maps in LDAP, and successfully performing basic usability testing, I called it a night and ended the Planned Outage event. Because what better time to make changes than Friday at 10PM, right?

Monday

Jimbo: “Our project’s CGI scripts in Perl at http://lab.our.org/~jimbo/cgi-bin, that have been working fine for 8 years, are all giving 500 Internal Server errors. We use these scripts as part of a high-value production process.”

Let’s ignore for now everything wrong with the sentence above. I know.

I SSHed into lab.our.org and had a look through recent data in /var/log/httpd/lab-error_log. It showed nothing interesting.

Then I noticed that suexec_log had been written to recently. Having no idea what suexec_log even was, I looked through it to find that it’s the log for Apache httpd’s suEXEC mechanism. While I’m not entirely Apache httpd clueless, I don’t exactly deal with every nook of it on even a yearly basis. I read up on suEXEC’s role and functionality then returned to suexec_log:

cannot stat directory: (/home/jimbo/public_html/cgi-bin)

Okay, that explains the 500 HTTP errors. However:

lab.our.org# cd /home/jimbo/public_html/cgi-bin
lab.our.org# pwd
/home/jimbo/public_html/cgi-bin
lab.our.org#

From here, scratching my head for 30 minutes or so, I mucked around in the Isilon OneFS ACL for /home/jimbo, somewhat convinced I’d hit an ACL peculiarity like the previous one. That led nowhere, so it was time to examine just what the heck Apache httpd and its suexec helper were doing under the covers. Firing up strace -fq ... on the httpd PIDs while hitting one of Jimbo’s troubled CGI scripts, I saw this:

...
setgroups32(2, [1130, 67])  = 0
setuid32(21389)             = 0
getcwd("/home/jimbo/public_html/cgi-bin"..., 4096) = 32
chdir("/home/jimbo")        = 0
chdir("public_html")        = 0
getcwd("/home/jimbo/public_html"..., 4096) = 24
chdir("/home/jimbo/public_html/cgi-bin") = 0
lstat64("/home/jimbo/public_html/cgi-bin", {st_mode=S_IFDIR|0755, st_size=1955, ...}) = 0
write(2, "suexec policy violation: see sue"..., 57) = 57

According to the lstat() man page, a return code of 0 is success. The line following the lstat() call is writing to stderr that there was a suexec problem and to see our friend suexec_log for information. We already know how that goes. For some reason, lstat() was returning what we understand to be success and there was no further system call taking place before the code was logging an error line. Funky.

Poring over the suexec.c code, this is the block delivering our line to suexec_log:

if (((lstat(cwd, &dir_info)) != 0) || !(S_ISDIR(dir_info.st_mode))) {
    log_err("cannot stat directory: (%s)\n", cwd);
    exit(115);
}

So, given this lazy code, it’s possible the lstat part of it was working fine but the POSIX S_ISDIR macro call was returning a false value. Note, too, that our strace showed lstat64() being called, which means in theory, “largefile” support is compiled in and enabled. Anyway, I messed around with a custom one-off version of the code, running it via the command-line for a bit, before I decided it was easier to just strip things down to the basics in my own new code.
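
A shell-level analogue of the stripped-down check, as a minimal sketch: it keeps the "stat failed" case separate from the "not a directory" case, which the suexec block folds into one message. check_dir is a hypothetical helper name and the sketch assumes GNU coreutils stat, which like lstat() does not follow symlinks by default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report *why* a directory check failed instead of
# folding both failure cases into one message.
# Assumes GNU coreutils stat (the -c option).
check_dir() {
    local path=$1 kind
    if ! kind=$(stat -c '%F' -- "$path" 2>&1); then
        # stat itself failed; its message carries the errno text.
        echo "cannot stat: $kind"
        return 1
    fi
    if [ "$kind" != "directory" ]; then
        echo "not a directory: $path is a $kind"
        return 2
    fi
    echo "ok: $path"
}
```

Running `check_dir /home/jimbo/public_html/cgi-bin` on both the 32-bit and 64-bit hosts would then show which of the two conditions is actually tripping.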

Using lstat() and actually reporting what errno got set to on lstat failure led me in the final right direction. On lab.our.org, which happens to be an old 32-bit CentOS 5.11 host, my code reported this:

Path:                   /home/jimbo/public_html/cgi-bin
File Size:              0 bytes
Number of Links:        0
File inode:             1580990986
File Permissions:       ----------
ERROR: lstat errno says "Value too large for defined data type"

As soon as I saw that, I ran the same code on one of our 64-bit hosts:

Path:                   /home/jimbo/public_html/cgi-bin
File Size:              1982 bytes
Number of Links:        4
File inode:             1580990986
File Permissions:       drwxr-xr-x
Success - Exiting 0

Searching for “Value too large for defined data type” in the EMC/Isilon support portal showed me that others had hit this as well, one even mentioning Apache httpd.

Turns out Isilon OneFS serves up 64-bit file IDs by default. Unless 32-bit applications allow for compilation with “largefile” support, and are compiled with it, they don’t know how to properly handle stat()/lstat() calls against 64-bit file-id-having files.
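
A quick way to look for candidate files from an NFS client is to list inode numbers that do not fit in an unsigned 32-bit integer. This is only a sketch: inodes_over_32bit is a hypothetical helper name, the sample values are invented, and it looks only at the inode number the client reports. It says nothing about whether a given program was built with largefile support.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "inode path" pairs on stdin and print the
# ones whose inode number does not fit in an unsigned 32-bit integer.
inodes_over_32bit() {
    awk '$1 > 4294967295 { print }'
}

# Assumes GNU find (for -printf); "/home" is only an example path.
# find /home -xdev -printf '%i %p\n' | inodes_over_32bit

# Invented sample input to show the filter in isolation.
printf '%s\n' '1234 /small' '4294967296 /just-over' '9007199254740 /large' |
    inodes_over_32bit
```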

I hear you saying, “Right, but, your strace shows that the code was calling lstat64() which is supposed to handle things fine.”

I know, right? But, based on the instructions in the EMC/Isilon support portal, I forced 32-bit file IDs for the NFS export in question and everything immediately began functioning.

isibed-1# isi nfs exports modify 5 --return-32bit-file-ids=yes

Clearly something else is at play here. The lstat64() was in fact returning success (0) all along, but the fact remains that forcing 32-bit file IDs for the NFS export solved the problem. Maybe something in the POSIX S_ISDIR macro was failing, which doesn’t show in the strace output.

You could even go so far as to say that my one-off non-largefile-enabled code that showed the “Value too large…” error and led me to the EMC/Isilon support thread was a fortuitous goof that gave me a working solution. And I don’t 100% understand how it solved the issue.

Footnotes

[1] Prior to the debacle described in the main blog post, I’d used rsync -av /old/ /new to perform the NFS home directory migration the week before. Due to embarrassingly poor testing on my part, I did not notice until the next morning that writes (and only writes) to user home directories by the users were not working. Examining the problem showed that each user did not even own his/her home directory. After an hour of digging around in various strace sessions, it became evident that only a data copy command issuing an actual chown(2) call resulted in the proper ownership of the destination data. Neither rsync nor tar does this in the way OneFS wanted, but cp -rpP does.
```
# cp -rpP
chown("/newusers/testjblaine/sdfdsa", 26560, 0) = 0
chown("/newusers/testjblaine", 26560, 0) = 0

# rsync
lchown("testjblaine", 26560, 0) = 0
lchown("testjblaine/.sdfdsa.2Ba4BO", 26560, 0) = 0

# tar
fchownat(4, "testjblaine", 0x000067C0, 0x00000000, 0x00000000) = 0
fchownat(4, "sdfdsa", 0x000067C0, 0x00000000, 0x00000000) = 0
```
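
The kind of spot check that would have caught this the same night can be sketched in a few lines of bash. same_owner is a hypothetical helper name, the sketch assumes GNU coreutils stat, and the /old/testjblaine path in the usage comment is invented to pair with the test path in the trace above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if two paths have the same numeric
# owner and group. Assumes GNU coreutils stat (the -c option).
same_owner() {
    [ "$(stat -c '%u:%g' -- "$1")" = "$(stat -c '%u:%g' -- "$2")" ]
}

# Hypothetical spot check after a copy:
# same_owner /old/testjblaine /newusers/testjblaine || echo "ownership differs"
```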

When bare ‘tee | command’ won’t suffice

Jeff Blaine — Fri, 25 Apr 2014 19:23:19 +0000

In my 20 years of touching UNIX/Linux, I’d never had a need for this until today. If you search around for common search terms like ‘stdout to screen and file’, you are likely only to get basic information on how to use the UNIX/Linux tee command.

However, command | tee somefile.out loses the exit code of command if you care about checking it.

The bash solution provided by someone on my employer’s internal Linux mailing list is:

{ command; RESULT=$?; } | tee somefile.out

Though I also just learned that one can reference the PIPESTATUS array instead.

command | tee somefile.out
if [ "${PIPESTATUS[0]}" -ne 0 ]; then
    echo "Error..."
fi
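
To make the PIPESTATUS approach reusable, here is a minimal bash sketch. run_and_log is a hypothetical helper name. It relies on two details of bash: the array element has to be written with braces, as ${PIPESTATUS[0]}, and it has to be read immediately after the pipeline because the next command overwrites it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, copy its output to a log file with
# tee, and return the command's own exit status rather than tee's.
run_and_log() {
    local logfile=$1
    shift
    "$@" 2>&1 | tee "$logfile"
    # Index 0 is the first command in the pipeline above.
    return "${PIPESTATUS[0]}"
}

# Usage: "command" stands in for whatever you need to run.
# if ! run_and_log somefile.out command; then
#     echo "Error..."
# fi
```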

Syslog Output for Chef Runs

Jeff Blaine — Thu, 27 Jun 2013 16:15:37 +0000

A new blog post category was added to my blog for this post: Yak Shave. It was a small yak shave, but a table-flip inducing yak shave nonetheless.

UPDATE 6/3/2015: Per Lamont Granquist of Chef, as of Chef 12.4, the following works in client.rb without any other cookbooks needed (negating this whole blog post, finally!):

log_location Chef::Log::Syslog.new("chef-client", ::Syslog::LOG_DAEMON)

THE FOLLOWING BLOG POST IS KEPT FOR HISTORICAL PURPOSES ONLY.

If you don’t want to read a pissy rant, don’t read any further (you can jump to the answer). If you’re going to comment to suggest Puppet, CFEngine, Ansible, or SaltStack, save your typing ;)

Begin

There you are with your configuration management tool. You say to yourself:

Self, surely there is a way to configure this damn-near-Linux-centric systems management tool to syslog its output so that, based on the common sane IT pattern of syslogging to a central server, all Chef run output can be stored in a centralized location and perhaps queried via Splunk or whatever other OSS tool. Because, you know, syslog has been around for 20+ years and is the de-facto means of logging information on UNIX/Linux hosts. And CFengine 2, dating back at least 15 years, is a CM tool that has offered such incredibly basic functionality. Surely the default of logging to stdout is just the absolute safe case that the product must default to, but there is definitely an option to turn on syslog as the log output location…

Knowing that this sort of thing would be defined as a “Chef Report Handler”, and that Chef has been alive for 4 years now, you start with Google: “chef syslog handler”

Handler – chef-syslog-handler

Bingo! The first search result is chef-syslog-handler, a “Chef handler to send syslog messages”.

You visit the link which takes you to rubygems.org, click on “Homepage” and get a 404 error from github.

Maybe the author renamed the repository? Viewing all of the author’s repositories shows that to be untrue. It’s simply gone. Returning to the rubygems page, you see “August 19, 2011”

Welp.

The Old Magically Worked Ticket

Returning to our Google results, we scroll down past all of the chef-syslog-handler results and end up at an old Opscode JIRA ticket.

Yes! This must be good news. The ticket is marked as Fixed!

17/Mar/11 5:49 PM
This is done... configure chef like this:

    log_location SyslogLogger.new("chef-client")

My follow-up comment to the ticket says it all:

I went to implement this today per snippet above.

It does not work with Chef 11.4.0

Looking through the mixlib-log 1.3.0+ versions on github, and looking at 10-stable, I don’t understand how it would have ever worked there either, but surely that’s just my ignorance.

WELP.

Revise Search

This is going nowhere past 2011, fast. Let’s feed Google something more generic: “chef syslog”

Cookbook – chef-client_syslog

Described as “Send chef-client log to syslog”, this too seemed promising.

And then I see that it does its own management of /etc/chef/client.rb via a template. As I said above, any sane person is handling that via the chef-client cookbook.

SyslogLogger log_location

And finally, if you’re “lucky” (I forget how I got there), you end up at CHEF-2560 from August 11th 2011 which is still open!

Oh. It’s someone reporting, like I did, that the old Magically Worked Ticket’s solution does not work at all. It even includes code to fix it (unchecked), yet the ticket remains just dangling in limbo for the last 18 months.

Let’s follow the link to the pull request!

Here we find our old friend “pmorton” (the author of the old chef-syslog-handler gem in the first section of this blog post):

@guillermo – This is some great stuff. I seem to have located the ticket for this pull request (http://tickets.opscode.com/browse/CHEF-2560) but noticed two issues, the ticket has not been resolved and you are not part of the approved contributors list. I hope that you will take the time to get a CLA signed and sent to opscode. See http://wiki.opscode.com/display/chef/How+to+Contribute for more details.

The pull request is then closed out by Opscode due to lack of Contributor License Agreement. The original JIRA ticket CHEF-2560 remains open.

Fitna Cut Someone

This is where, per @jordansissel, I entered Hate-Driven Development mode.

I vowed to fix this once and for all, submit the code, and set aflame anyone at Opscode who rejected it.

Then I stumbled across one more thing.

Bleeding Edge chef-client Cookbook

UPDATE: THE BLOG POST IS KEPT FOR HISTORICAL PURPOSES ONLY. The solution here no longer works and I give up chasing this elusive dream.

Again, I don’t recall how I got there, but the current master branch of the chef-client community cookbook has a syslog example! That looks promising. Heh, cute, it references yet another JIRA ticket about syslog support, COOK-2326 (which links to almost everything listed above).

AND SO … if you are willing to make the leap to the heavily refactored latest chef-client cookbook that was released around June 11, 2013, you have a clean method of syslogging your output!

# our-chef-client-wrapper/files/default/syslog.rb

require 'rubygems'
require 'syslog-logger'
require 'syslog'

Logger::Syslog.class_eval do
  attr_accessor :sync, :formatter
end

log_location Logger::Syslog.new('chef-client', Syslog::LOG_DAEMON)

# our-chef-client-wrapper/recipes/whatever.rb

chef_gem 'syslog-logger'

# Drop off our chef-client customization file that
# allows for syslogging
cookbook_file '/etc/chef/client.d/syslog.rb' do
  source 'syslog.rb'
  mode 00644
  notifies :create, 'ruby_block[reload_client_config]', :immediately
end

include_recipe 'chef-client::config'

And this, my friends, is why you better know how to navigate the world of Ruby, Ruby gems, OSS, if you ever intend to have a remote chance with Chef. You will note that there was not a SINGLE THING in the experience above that was even close to being “sysadmin” friendly.

And the solution requires installing a gem at a critical point in what you want to be rock solid — the reporting of your 2000 machines’ chef runs. Behind an HTTP proxy? Hopefully you’ve run the chef-client cookbook already so that ENV['http_proxy'] is set so that the gem can fetch the right stuff.

You’d think Ruby didn’t have native syslog support in its standard library since version 1.8.6 or something.

Cause, you know… syslog is obscure. Who would ever want to use the logging mechanism used by the OS daemons since 1990… when running a daemon… that manages the OS and its daemons.

Jenkins running Test-Kitchen via Vagrant. On Windows.

Jeff Blaine — Tue, 11 Jun 2013 15:51:19 +0000

If you’re like me and for various reasons (we’ll not discuss) the physical hardware you have access to right now must run Windows, you might think you’re out of luck as far as getting Jenkins running Test-Kitchen jobs as Joshua Timberman shows in Test Kitchen and Jenkins¹. But there’s hope if you have patience. I’ll show how I got it working, and I’m looking forward to ideas from you on how to develop better solutions to some of the kludges. There’s not much original material here, but instead of mentioning just my piece of things and linking you coldly to 2 other places for the rest of the info, I figured I would write up as much of the whole experience as I felt up to.

Prerequisites

Windows Add-Ons

It is assumed that you have the following installed. If not, do so.

Ruby 1.9.3 from rubyinstaller.org
Git 1.8.x for Windows from git-scm.com/download. All commands shown below are run in a “Git Bash” session, not Windows’ cmd.exe, unless explicitly stated otherwise.
Jenkins

Ruby Gems

At this point, you should open a Git-Bash shell and ensure that which gem indicates the one found in the Ruby 1.9.3 install you did previously. For me, that was /c/Ruby193/bin/gem.

Start with: gem install bundler

Then, in the directory containing the cookbook you want to test, create a Gemfile with the following contents:

source 'https://rubygems.org'
gem 'test-kitchen', '~> 1.0.0.alpha.7'

Run: bundle install

Run: kitchen init

Run: bundle install (yes, again)

Run: kitchen help to ensure Test Kitchen can at least run.

Edit .kitchen.yml to look like the following (configure later how you want):

---
driver_plugin: vagrant
driver_config:
  require_chef_omnibus: true

platforms:
- name: ubuntu-12.04
  driver_config:
    box: opscode-ubuntu-12.04
    box_url: https://opscode-vm.s3.amazonaws.com/vagrant/opscode_ubuntu-12.04_provisionerless.box

suites:
- name: default
  run_list: []
  attributes: {}

At this point you should be able to run kitchen test and see something success-like.

Jenkins

The Windows installer for Jenkins is nice in that it configures Jenkins to run as a Windows service. So go get and install that from jenkins-ci.org/windows/latest. Once installed, you should be able to hit http://localhost:8080 and see the Jenkins web UI.

Use the web UI to install the “Git” plugin.

Configure a new Jenkins job. For my case, it looked like the following.

Source Code Management, Git Repositories, Repository URL: git://github.com/jblaine/resolver.git (use the Git read-only URL to the repo, not SSH, or you’ll have to mess with copying your SSH stuff similar to the problem section below regarding Vagrant)
Build Triggers, Poll SCM, Schedule: H 9-16/2 * * 1-5
Execute a Windows Batch Command: kitchen test

And click “Build now” at left to start the troubleshooting process below.

Problem: Jenkins can’t find Git

If you see in your job’s console output that Jenkins is not able to find Git, set the full path to the executable binary (regardless of the setting being called “Installation Directory”) as I did. This is the only thing that worked for me after trying a few other things found on the net.

Oh, and you apparently have to use the old 8dot3 naming. No spaces allowed. What is this, 1999?

Manage Jenkins, Configure System, Git, Git Installations, Installation Directory: C:\Progra~2\Git\cmd\git.exe

Problem: Vagrant can’t import Boxes

Perhaps this is the next roadblock you hit:

...
STDOUT: Bringing machine 'default' up with 'virtualbox' provider...

[default] Box 'opscode-ubuntu-12.04' was not found. Fetching box from specified URL for
the provider 'virtualbox'. Note that if the URL does not have
a box for this provider, you should interrupt Vagrant now and add
the box yourself. Otherwise Vagrant will attempt to download the
full box prior to discovering this error.

Downloading or copying the box...

...

Successfully added box 'opscode-ubuntu-12.04' with provider 'virtualbox'!

[default] Importing base box 'opscode-ubuntu-12.04'...
STDERR: There was an error while executing `VBoxManage`, a CLI used by Vagrant
for controlling VirtualBox. The command and stderr is shown below.

Command: ["import", "C:/WINDOWS/system32/config/systemprofile/.vagrant.d/boxes/opscode-ubuntu-12.04/virtualbox/box.ovf"]

Stderr: 0%...

Progress state: VBOX_E_FILE_ERROR
VBoxManage.exe: error: Appliance read failed
VBoxManage.exe: error: Could not read OVF file 'box.ovf' (VERR_PATH_NOT_FOUND)
VBoxManage.exe: error: Details: code VBOX_E_FILE_ERROR (0x80bb0004), component Appliance, interface IAppliance
VBoxManage.exe: error: Context: "int __cdecl handleImportAppliance(struct HandlerArg *)" at line 306 of file VBoxManageAppliance.cpp
---- End output of vagrant up --no-provision ----
Ran vagrant up --no-provision returned 1
>>>>>> ----------------------
>>>>>> Please see .kitchen/logs/kitchen.log for more details

Build step 'Execute Windows batch command' marked build as failure
Finished: FAILURE

What’s telling here is that Vagrant is looking for boxes in C:\Windows\System32 and not C:\Windows\SysWOW64. My wildly speculative guess is that this is because Vagrant is a 32-bit application and somehow this ties it to the different directory, even though the Jenkins service is tied to C:\Windows\SysWOW64 by nature of the LocalSystem account.

I’m pretty Windows-guts clueless as you can see. Any explanation you can share here, please do in the comments.

At any rate, we persist. Using a cmd.exe that is run as Administrator, we look around:

C:\>dir C:\Windows\System32\config\systemprofile
 Volume in drive C has no label.
 Volume Serial Number is 9424-87E5

 Directory of C:\Windows\System32\config\systemprofile

06/11/2013  11:16 AM    <DIR>          .vagrant.d
06/11/2013  11:19 AM    <DIR>          .VirtualBox
10/27/2012  12:17 PM    <DIR>          AppData
10/27/2012  12:17 PM           262,144 ntuser.dat
06/11/2013  11:19 AM    <DIR>          VirtualBox VMs
               1 File(s)        262,144 bytes
               6 Dir(s)  279,382,310,912 bytes free

C:\>dir C:\Windows\System32\config\systemprofile\.VirtualBox
 Volume in drive C has no label.
 Volume Serial Number is 9424-87E5

 Directory of C:\Windows\System32\config\systemprofile\.VirtualBox

06/11/2013  11:19 AM    <DIR>          .
06/11/2013  11:19 AM    <DIR>          ..
06/11/2013  11:19 AM            32,520 VBoxSVC.log
06/11/2013  10:54 AM             1,114 VBoxSVC.log.1
06/11/2013  10:47 AM               886 VBoxSVC.log.2
06/11/2013  11:19 AM             1,066 VirtualBox.xml
06/11/2013  11:18 AM             1,285 VirtualBox.xml-prev
               5 File(s)         36,871 bytes
               2 Dir(s)  279,382,302,720 bytes free

C:\>

Okay, so some VirtualBox stuff happened in C:\Windows\System32. But we can see that Vagrant has set up shop in C:\Windows\SysWOW64:

C:\Users\jblaine>dir C:\Windows\SysWOW64\config\systemprofile\.vagrant.d\boxes
 Volume in drive C has no label.
 Volume Serial Number is 9424-87E5

 Directory of C:\Windows\SysWOW64\config\systemprofile\.vagrant.d\boxes

06/11/2013  10:54 AM    <DIR>          .
06/11/2013  10:54 AM    <DIR>          ..
06/11/2013  10:54 AM    <DIR>          opscode-ubuntu-12.04
               0 File(s)              0 bytes
               3 Dir(s)  279,808,155,648 bytes free

C:\Users\jblaine>

Welp, I’m unaware of any environment variable that would allow me to tell Vagrant to use C:\Windows\SysWOW64\config\systemprofile\.vagrant.d, so I just copied everything over to C:\Windows\System32\config\systemprofile\.vagrant.d.

Success!

Rerunning the Jenkins build, you should now succeed, and at this point you would want to go back into your cookbook’s .kitchen.yml file to actually establish some tests to run in the suites section.

...
C:\Program Files (x86)\Jenkins\workspace\test-resolver>kitchen test 
-----> Starting Kitchen (v1.0.0.alpha.7)
...
       [default] Importing base box 'opscode-ubuntu-12.04'...
...
-----> Converging 
-----> Installing Chef Omnibus (true)
...
[2013-06-11T15:19:39+00:00] INFO: *** Chef 11.4.4 ***
...
[2013-06-11T15:19:39+00:00] INFO: Chef Run complete in 0.009065059 seconds
...
-----> Kitchen is finished. (1m29.52s)
Finished: SUCCESS

References

Test Kitchen Wiki: https://github.com/opscode/test-kitchen/wiki/Getting-Started
Test Kitchen and Jenkins: http://jtimberman.housepub.org/blog/2013/05/08/test-kitchen-and-jenkins
Jenkins Git Plugin (wiki page): https://wiki.jenkins-ci.org/display/JENKINS/Git+Plugin#GitPlugin-

Real DevOps Defined: Wonder Twins

Jeff Blaine — Thu, 06 Jun 2013 19:56:35 +0000

I had a “how to define DevOps” revelation last night: It’s the Wonder Twins! Surely everyone remembers the Wonder Twins. You weren’t born before the 1980s? Oh. Well. Anyway… citing the Wikipedia article on Wonder Twins:

If the two are out of reach of each other, they are unable to activate their powers. […] A rarely-seen aspect of their powers is the ability to break mind control. […] The Wonder Twins have a pet Space Monkey called Gleek who had a useful prehensile tail and who could act as a conduit for the twins to activate their powers should they be out of reach.

Dibs on the name Gleek for a DevOps Practices enabling tool. And your manager should have his trait mentioned above.

Can we move on now?

A Missed Thank You

Jeff Blaine — Mon, 20 May 2013 19:38:22 +0000

While brushing up on TCP today, ancient synapses fired leading me to recall my first forays into network programming in 1992. While on permanent hiatus from CS education at FSU, I took a very long cross-country train trip from Jacksonville, FL through Chicago to L.A. and back to Jacksonville, FL via Pittsburgh, PA. Ideally, the goal was to get away cheaply to spend time figuring out just what the hell the next step might be. Along the way, I stayed with people I had met on social MUDs.

Steven Augart (‘swa’ as he was known to me online), then working at the USC Information Sciences Institute, gave me an assignment. Though I don’t recall the details, it was related to TCP/IP network programming (standard sockets API stuff), as that is what I had shown interest in. We had talked about it a little bit online before the trip, and for some reason he decided to guide me a little bit and help me out. What I minimally recall was that it was just something that needed to be written sometime by someone, and he threw it my way as a pseudo vetting process of sorts. Perhaps the plan was to get me working at ISI in a low position, but I really don’t recall. On my last day staying at his house, he again went out of his way suggesting that I come to work that day with him to meet others. It was an awkward situation, as I did not have any dress clothes with me. Then, with one of his dress shirts on me (he insisted), and a few blocks down the road in his small pickup truck, he caught wind of my unease at the formality of what we were doing. After a bit of an embarrassing conversation where I acknowledged that I was really uncomfortable about the plan, he dropped me back off at his home. He was very understanding and had no problem with it. Perhaps he was disappointed, but did not show it.

Over the next few months in late 1992, it became clear that I did not really understand too well the Big Boys technical spec he had given me for the programming project. I wrote some C code. I got some parts working, but not much, and that was that. Thankfully, it was not something expected (or agreed) to be completed.

For no real reason, I don’t believe we ever really spoke again.

Steve and I were really only acquaintances, so reflecting now (20 years further in maturity), I’m finally moved by the generosity he showed. He had nothing to gain by challenging me.

As I set out to email him a note of thanks 30 minutes ago, I learned that he has passed away at the early age of 46.

I’m sorry swa. Thank you.

Memory-backed Filesystem for Temporary Storage of Whisper Data

Jeff Blaine — Tue, 14 May 2013 02:12:31 +0000

Instead of buying and installing SSDs, storing Graphite’s whisper files in a memory-backed filesystem can be a good way to go if you have the RAM to spare. Depending on your environment, you may or may not care about losing a few minutes (or hours) of metric data. I know we certainly don’t care about losing 30 minutes, so there’s no reason for our carbon-cache instances to be scrawling to persistent storage 24/7.

Here’s the internal carbon Average Update Time metric showing the switch from writing to spinning disk and then memory. No shocker here:

Likewise, it’s no shocker that CPU wait time was greatly reduced. You can see me testing 5 rsyncs from memory-backed filesystem to the old spinning disk in the right of the graph:

You have 2 commonly found implementation choices for your memory-backed filesystem: tmpfs or a standard RAM disk with a non-journaling filesystem applied.

                        tmpfs   RAM disk
Create w/o reboot       yes     no
Resize w/o reboot       yes     no
Dynamically allocated   yes     no
Can be swapped          yes     no

tmpfs

The Linux tmpfs implementation would look something like the following.

Stop your carbon-cache services, then:

cd /opt/graphite/storage
mv whisper whisper.permanent
mkdir whisper.ephemeral
ln -s ./whisper.ephemeral ./whisper

echo 0 > /proc/sys/vm/swappiness

mount -t tmpfs -o size=5g tmpfs /opt/graphite/storage/whisper.ephemeral

Add the following, modified to suit, to your /etc/rc.local or other preferred boot-time mechanism:
```
mount -t tmpfs -o size=5g tmpfs /opt/graphite/storage/whisper.ephemeral
```

One-way synchronize the persistent storage with the state of the ephemeral storage every 30 minutes:

*/30 * * * * rsync --archive /opt/graphite/storage/whisper.ephemeral/ /opt/graphite/storage/whisper.permanent

Explicit RAM disk

To implement the more predictable RAM disk setup, you would:

Add ramdisk_size=N (where N is the size of the RAM disk you want specified in KB) to the kernel line in grub.conf for your distribution. Reboot.
Create a filesystem on the RAM disk device and mount it. You pointedly do not want to waste efficiency by using a journaling filesystem here, so use an ext2 filesystem and do not reserve any “minfree” space.
```
cd /opt/graphite/storage
mv whisper whisper.permanent
mkdir whisper.ephemeral
ln -s ./whisper.ephemeral ./whisper

mkfs.ext2 -m 0 /dev/ram0
mount /dev/ram0 /opt/graphite/storage/whisper.ephemeral
```
Add the following, modified to suit, to your /etc/rc.local or other preferred boot-time mechanism:
```
mkfs.ext2 -m 0 /dev/ram0
mount /dev/ram0 /opt/graphite/storage/whisper.ephemeral
```

One-way synchronize the persistent storage with the state of the ephemeral storage every 30 minutes:

*/30 * * * * rsync --archive /opt/graphite/storage/whisper.ephemeral/ /opt/graphite/storage/whisper.permanent
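
One step that both variants leave implicit: after a reboot the memory-backed filesystem comes up empty, so the last synchronized copy has to be loaded back before carbon-cache starts. A minimal sketch of that step, assuming GNU cp. restore_whisper is a hypothetical helper name, and this is my reading of what the setup needs rather than part of the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the last persisted whisper data back into the
# freshly mounted (and therefore empty) memory-backed filesystem.
restore_whisper() {
    local permanent=$1 ephemeral=$2
    [ -d "$permanent" ] && [ -d "$ephemeral" ] || return 1
    # "-a" preserves ownership, modes, and timestamps; the trailing "/."
    # copies the directory contents rather than the directory itself.
    cp -a -- "$permanent/." "$ephemeral/"
}

# Intended use at boot, after the mount and before carbon-cache starts:
# restore_whisper /opt/graphite/storage/whisper.permanent \
#                 /opt/graphite/storage/whisper.ephemeral
```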

As always, I welcome your thoughts.

Documenting bash scripts

Jeff Blaine — Tue, 19 Mar 2013 15:52:09 +0000

For those of us still in the bash trenches now and then, I learned you can avoid the hassle of hash/octothorpe (#) commenting blocks of documentation in your scripts by using a standard “here” doc prefixed by the no-op expression (:).

Clearly this is something you’d only want to do for major blocks of text, like a large documentation block at the start of a script.

Example:

: <<'DOCUMENTATION'

My free-form documentation here.

DOCUMENTATION
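
A side benefit of this layout is that the block is easy to pull back out of the script, for instance to back a --help option. A minimal sketch, assuming the block is delimited exactly as in the example above; print_docs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
: <<'DOCUMENTATION'

My free-form documentation here.

DOCUMENTATION

# Hypothetical helper: print the text between the ": <<" opener and the
# closing DOCUMENTATION line of the script file given as $1.
print_docs() {
    awk '
        /^DOCUMENTATION$/ { inside = 0 }   # closing delimiter: stop printing
        inside            { print }        # body line: print it
        /^: <</           { inside = 1 }   # opener: start with the next line
    ' "$1"
}

# Usage from inside a script:
# print_docs "$0"
```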