Atomic Host and Kubernetes Clusters Made Easy(ish)

Recently I got to go out to visit a customer and talk about containers. Even though I call containers parlor tricks, this is (seriously) one of my favorite things to do. They had some questions about container performance tuning as well as how to run an internal registry.

So I came up with a ~2 hour workshop to have with them. I put it out on GitHub so they could access the code afterward if they wanted. I had a few realizations while I was putting this together.

  1. Atomic Host is getting really easy to configure. Back in the 7.0 days you really had to be double-jointed to configure a Kubernetes cluster. In 7.2, you edit 3 files per cluster member (master or node), and the total number of lines edited is around 8. That doesn’t include flannel or your SDN solution of choice. (A minimal sketch of these edits appears after this list.)
  2. NFS as persistent storage for a multi-node replication controller for docker-registry is way harder than it should be. There are several bugs out there (Red Hat as well as upstream) that show issues when you have a multi-container docker-registry rc and have it use NFS to store the registry data. Once I thought this through it made sense. NFS (especially NFSv4) uses client-side caching to make writes more efficient. Since both pods are in play for these writes, the registry code barfs all over itself when container A looks for data that is still sitting in the NFS write cache inside container B.

    There are work-arounds with the NFS server settings as well as the k8s service definition to tweak the kubernetes scheduler. It works for demos, but I would have mountains of fear trying this for a production environment.

  3. OMG ANSIBLE IS AWESOME. I hadn’t really had a chance to use Ansible to solve a problem before, so I used this project to start getting used to the technology a little. I watched some videos where the Ansible folks said it had become the de facto language for defining an infrastructure, and I totally see that now. I included the Ansible playbook as well as all of the templates in the GitHub repo along with the asciidoc for the workshop itself. I intentionally kept it simple, so people who hadn’t used it before could see what work was happening and where it was coming from. I can’t wait to dig deeper into Ansible.
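
For reference, here is roughly what the per-member edits from item 1 look like. This is just a sketch based on the stock files the kubernetes package drops in /etc/kubernetes; the variable names can shift a little between versions, and the addresses are made-up examples, so check the defaults on your own Atomic Host.

# /etc/kubernetes/config (edited on every member) - point everything at the master
KUBE_MASTER="--master=http://192.168.122.10:8080"

# /etc/kubernetes/apiserver (master only) - where to listen and where etcd lives
KUBE_API_ADDRESS="--address=0.0.0.0"
KUBE_ETCD_SERVERS="--etcd_servers=http://192.168.122.10:2379"

# /etc/kubernetes/kubelet (each node) - identify the node and find the API server
KUBELET_ADDRESS="--address=0.0.0.0"
KUBELET_HOSTNAME="--hostname_override=192.168.122.11"
KUBELET_API_SERVER="--api_servers=http://192.168.122.10:8080"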

 

RSS from Trello with Jenkins

Trello is a pretty nice web site. It is (sort of) a kanban board that is very useful when organizing groups of people in situations where a full agile framework would be too cumbersome. Kanban is used a lot in IT Operations. If you want a great story on it, go check out The Phoenix Project.

One thing Trello is lacking, however, is the ability to tap into an RSS-style feed for one or more of your boards. But, where there is an API, there’s a way. This took me about 30 minutes to iron out, and is heavily borrowed from the basic example in the documentation for trello-rss.

Step One – Three-legged OAuth

Trello uses OAuth. So you will need to get your developer API keys from Trello. You will also need to get permanent (or expiring whenever you want) OAuth tokens from them. This process is a little cloudy, but I found a post on StackOverflow that got me over the hump.
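
If you end up with py-trello installed (the RSS module sits on top of it), it ships a little helper that walks you through the three-legged dance from a terminal. Here is a sketch of how I'd drive it from a shell; the helper name and the environment variables it reads are based on the py-trello versions I've looked at, so treat them as assumptions and double-check your installed copy.

# the helper prints a URL to visit, then asks for the PIN Trello gives you
export TRELLO_API_KEY=your_api_key
export TRELLO_API_SECRET=your_api_secret
export TRELLO_EXPIRATION=never        # or 1hour, 1day, 30days
export TRELLO_NAME=trello-rss
python -c 'from trello.util import create_oauth_token; create_oauth_token()'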

Step Two – a little python

I created a little bit of python to handle this for me. Bear in mind it’s still VERY rough. My thought is to start to incorporate other Trello automation and time-savers into it down the road. If that happens I’ll stick it out on GitHub.

#!/usr/bin/env python
# Python 2 script (the reload/setdefaultencoding trick below does not exist in Python 3)
from trello_rss.trellorss import TrelloRSS
from optparse import OptionParser
import sys


class TrelloAutomate:
    '''
    Used for basic automation tasks with Trello,
    particularly with CI/CD platforms like Jenkins.
    Author: jduncan
    Licence: GPL2+
    Dependencies (py modules):
    - httplib2
    - oauthlib / oauth2
    '''
    def __init__(self):
        reload(sys)
        sys.setdefaultencoding('utf8')
        # fill these in with your own Trello OAuth credentials
        self.oauth_token = 'my_token'
        self.oauth_token_secret = 'my_token_secret'
        self.oauth_apikey = 'my_api_key'
        self.oauth_api_private_key = 'my_api_private_key'

    def _get_rss_data(self):
        # build a feed from the 50 most recent items on my boards
        rss = TrelloRSS(self.oauth_apikey,
                        self.oauth_api_private_key,
                        self.oauth_token,
                        channel_title="My RSS Title",
                        rss_channel_link="https://trello.com/b/XXX/board_name",
                        description="My Description")
        rss.get_all(50)
        return rss.rss

    def create_rss_file(self, filename):
        data = self._get_rss_data()
        fh = open(filename, 'w')
        for line in data:
            fh.write(line)
        fh.close()


def main():
    parser = OptionParser(usage="%prog [options]", version="%prog 0.1")
    parser.add_option("-r", "--rss",
                      action="store_true",
                      dest="rss",
                      help="create the rss feed")
    parser.add_option("-f", "--file",
                      dest="filename",
                      default="trello.xml",
                      help="output filename (default: trello.xml)",
                      metavar="FILENAME")
    (options, args) = parser.parse_args()
    trello = TrelloAutomate()
    if options.rss:
        trello.create_rss_file(options.filename)

if __name__ == '__main__':
    main()

Step Three – Jenkins Automation

At this point I could stick this little script on a web server and have it generate my feed for me with a cron job. But that would mean my web server would have to build content instead of just serving it. I don’t like that.

Instead, I will build my content on a build server (Jenkins) and then deploy it to my web server so people can access my RSS feed easily.

Put your python on your build server

Get your python script to your build server, and make sure you satisfy all of the needed dependencies. You will know if you haven’t, because your script won’t work. 🙂 For one-off scripts like this I tend to put them in /usr/local/bin/$appname/. But that’s just my take on the FHS.
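
For what it’s worth, here is roughly what that looked like for me. The pip modules come straight from the script’s docstring (plus py-trello underneath it all); the script name and the idea of dropping the trello_rss package directory next to it are my own assumptions, so adjust to taste.

# dependencies from the script's docstring
pip install httplib2 oauth2 py-trello

# park the script and the trello_rss module it imports somewhere predictable
mkdir -p /usr/local/bin/trello-rss
cp trello-rss.py /usr/local/bin/trello-rss/
cp -r trello_rss /usr/local/bin/trello-rss/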

Create your build job

This is a simple build job, especially since it’s not pulling anything out of source control. You just tell it what command to run, how often to run it, and where to put what is generated.

(screenshot: trello-rss-1)
The key at the beginning is to not keep all of these builds. If you run this frequently, you could fill up your system with old cruft from 1023483248 builds ago. I run mine every 15 minutes (you’ll see later) and keep output from the last 10.
(screenshot: trello-rss-2)
Here I tell Jenkins to run this job every 15 minutes. The syntax is sorta’ like a crontab, but not exactly. The help icon is your friend here.
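
For reference, the schedule I plug into ‘Build periodically’ is just this one line. Jenkins’ cron-ish syntax supports an H token that spreads jobs out so everything doesn’t fire on the same minute; the rest of the job is point-and-click.

# Build periodically -> Schedule: run roughly every 15 minutes
H/15 * * * *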
(screenshot: trello-rss-3)
I have previously defined where to send my web docs (see my previous post about automating documentation). If you don’t specify a filename, the script above saves the RSS feed as ‘trello.xml’. I just take the default here and send trello.xml to the root directory on my web server.
(screenshot: trello-rss-4)
And this is the actual command to run. You can see the -f and -r options I define in the script above. $WORKSPACE is a Jenkins variable that is the filesystem location for the current build workspace. I just output the file there.
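
For completeness, the whole build step boils down to one shell command. The path and script name are whatever you used in the ‘Put your python on your build server’ step (mine are shown here as an example), and $WORKSPACE is provided by Jenkins.

# Execute shell build step: write the feed into the job's workspace
python /usr/local/bin/trello-rss/trello-rss.py -r -f $WORKSPACE/trello.xml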

Summary

So using a little python and my trusty Jenkins server, I now have an RSS Feed at $mywebserver/trello.xml that is updated every 15 minutes (or however often you want).

Of course this code could get way more involved. The py-trello module that it uses is very robust and easy to use for all of your Trello needs. I highly recommend it.

If I have time to expand on this idea I’ll post a link to the GitHub repo where I upload it.

-jduncan

 

Multi-Node OpenStack on your laptop in about an hour

OpenStack is just about the hottest thing going in IT today.  Started in 2010 as a joint project between Rackspace and NASA, it is quickly maturing into the premier solution for anyone who wants to make their infrastructure more self-service without having to go behind and clean up after developers all day every day.

Its biggest hurdle is still its learning curve and the associated pain you are almost guaranteed to suffer during your first install. After approximately 243535 tries, I have a pretty solid process for standing up an OpenStack demo setup across multiple nodes on your laptop. It could also be altered (if any alterations are even needed) to deploy across multiple physical or virtual systems in any environment.

Depending on your bandwidth to patch the servers, it takes about an hour, soup to nuts.

Which Flavor?

RHEL OSP 7 comes with an awesome tool called OSP Director, which is based on TripleO. It is essentially a canned OpenStack install that you then use to deploy production OpenStack nodes. It’s called an ‘undercloud’.

For two reasons, I’m not using OSP Director for this demo.

  1. It takes more time and more resources. If I were doing this in my environment and I was an everyday engineer, I’d totally use it. But this is an exercise in tire-kicking.
  2. I haven’t had time yet to play with it very much.

Instead I’m using RDO’s Quickstart tool, which is based on Packstack.

OpenStack in 60 seconds

The goal when OpenStack was started was to engineer a FOSS alternative to Amazon Web Services. What they came up with is an ever-growing list of services that each perform a task required (or that is optionally neat) for building out a virtualized infrastructure.

The services are all federated together with RESTful APIs. Python is the language of choice.

Core Services

  • Nova – compute services. The core brains of the operation and the initial product
  • Neutron – Software-defined Networking service. Nova also has some less flexible networking components built in.
  • Cinder – provides block devices to virtual machines (instances in OpenStack parlance)
  • Glance – manages images used to create instances
  • Swift – provides object/blob storage
  • Keystone – identity and authentication services for all of the other services as well as users
  • Horizon – a Django-based web frontend that’s customizable and extensible
  • Heat – Orchestration services
  • Ceilometer – Telemetry

Optional Fun Services

  • Trove – Database-as-a-service
  • Ironic – Bare metal provisioning – treat your racked stuff like your virtual stuff
  • Sahara – Elastic Map Reduce / Big-Data-as-a-Service (?!?!)
  • $insert_awesome_project_here

All OpenStack modules that are currently canon are listed on the project roadmap.

HOWTO

Cluster Setup

The demo setup I’ll be going through was set up on my laptop (Fedora 21) using good ol’ KVM virtual machines running RHEL 7. The laptop has 8 cores and 16GB of RAM total.

  • rdo0.localdomain (192.168.122.100) – 4GB RAM, 2 VCPU
    • Controller Node (all services except Nova Compute)
  • rdo1.localdomain (192.168.122.101) – 2GB RAM, 2 VCPU
    • Nova Compute Node
  • rdo2.localdomain (192.168.122.102) – 2GB RAM, 2 VCPU
    • Nova Compute Node
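
One quiet assumption in all of this is that the three hostnames resolve from every VM. libvirt’s default network usually handles that, but if it doesn’t in your setup, an /etc/hosts entry on each box does the trick:

# /etc/hosts on each VM (only needed if these names don't already resolve)
192.168.122.100   rdo0.localdomain rdo0
192.168.122.101   rdo1.localdomain rdo1
192.168.122.102   rdo2.localdomain rdo2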

Host OS Setup

NOTE – since these are VMs, the single NIC I assigned them was designated eth0. We all know the interface naming convention has changed in RHEL 7, so substitute your device names as needed.

subscription-manager register --username=$RHN_USERNAME --password=$RHN_PASSWORD
subscription-manager attach --pool=$SUBSCRIPTION_MANAGER_POOL_ID
subscription-manager repos --disable=\* --enable=rhel-7-server-rpms --enable=rhel-7-server-optional-rpms --enable=rhel-7-server-extras-rpms
yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm -y
sudo yum install -y https://rdoproject.org/repos/rdo-release.rpm
yum install -y openstack-packstack vim-enhanced
yum update -y
systemctl stop NetworkManager
systemctl disable NetworkManager
systemctl enable network
#confirm the network setup is working
ifdown eth0 && systemctl start network && ifup eth0 
#reboot to apply any patches that require it, etc.
reboot

The above snippet will

  • register your system to Red Hat via subscription-manager
  • attach the proper subscription pool (supplied by you)
  • enable the needed channels
  • install the RDO package repository
  • install a few things (I’m a Vim guy, feel free to edit)
  • disable NetworkManager (required for OpenStack) and replace it with a legacy network service script
  • activate the new network setup

Once this is set up on each host (I did it on one and cloned it twice to create the other two VMs), you are ready to get OpenStack rolling.

Creating an answers file

On rdo0.localdomain, run the following command. It will generate a default answers file that you can then edit and keep up with over time as you deploy various OpenStack incarnations.

packstack --gen-answer-file rdo.txt

The following changes were made.

NOTE – if you create 2 answer files and diff them, you will see many other changes, as passwords are randomized each time.

# diff -u rdo.txt rdo-edited.txt 
--- rdo.txt 2015-08-23 15:41:45.041000000 -0400
+++ rdo-edited.txt 2015-08-21 20:17:05.538000000 -0400
@@ -64,7 +64,7 @@
 # Specify 'y' to install Nagios to monitor OpenStack hosts. Nagios
 # provides additional tools for monitoring the OpenStack environment.
 # ['y', 'n']
-CONFIG_NAGIOS_INSTALL=y
+CONFIG_NAGIOS_INSTALL=n
 
 # Comma-separated list of servers to be excluded from the
 # installation. This is helpful if you are running Packstack a second
@@ -84,7 +84,7 @@
 
 # List of IP addresses of the servers on which to install the Compute
 # service.
-CONFIG_COMPUTE_HOSTS=192.168.122.100
+CONFIG_COMPUTE_HOSTS=192.168.122.101,192.168.122.102
 
 # Specify 'y' to provision for demo usage and testing. ['y', 'n']
-CONFIG_PROVISION_DEMO=y
+CONFIG_PROVISION_DEMO=n

And then you run Packstack. Depending on your install targets and how much horsepower is available, this can take a while. On my laptop, it takes the better part of an hour.

packstack --answer-file=rdo.txt

Getting Networking Working

The next part of the setup borrows heavily from this RDO blog post about setting up Neutron with an existing network.

After packstack does its thing, assuming you have a ‘Success!’ sort of output on your screen, you will then have a 3-node OpenStack cluster with 2 Nova Compute nodes and 1 node doing pretty much everything else. Unfortunately, out of the box you need to make a few tweaks so you can see your new instances from your libvirt networking (or the network in your lab or whatever your use case is).

NOTE – this needs to happen on each host

Create your bridge and set up your NIC

On my VMs the only NIC is named eth0 (a benefit of using a VM), so you may need to edit this slightly to fit your setup’s naming conventions.

We want to use a bridge device to get our instances onto our network, so we create a device named br-ex. We then edit $YOUR_NIC to make it an OVS port attached to that bridge.

[root@rdo0 ~]# cat /etc/sysconfig/network-scripts/ifcfg-br-ex 
DEVICE=br-ex
DEVICETYPE=ovs
TYPE=OVSBridge
BOOTPROTO=static
IPADDR=192.168.122.100 # Old eth0 IP since we want the network restart to not 
 # kill the connection, otherwise pick something outside your dhcp range
NETMASK=255.255.255.0 # your netmask
GATEWAY=192.168.122.1 # your gateway
DNS1=192.168.122.1 # your nameserver
ONBOOT=yes

[root@rdo0 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0 
#TYPE=Ethernet
#BOOTPROTO=none
#DEFROUTE=yes
#IPV4_FAILURE_FATAL=no
#IPV6INIT=yes
#IPV6_AUTOCONF=yes
#IPV6_DEFROUTE=yes
#IPV6_FAILURE_FATAL=no
NAME=eth0
UUID=f16a4a9c-184c-403d-bfff-25092c9519b0
DEVICE=eth0
#ONBOOT=yes
#IPADDR=192.168.122.100
#PREFIX=24
#GATEWAY=192.168.122.1
#DNS1=192.168.122.1
#DOMAIN=localdomain
#IPV6_PEERDNS=yes
#IPV6_PEERROUTES=yes
#IPV6_PRIVACY=no
HWADDR=52:54:00:2f:f0:a2
TYPE=OVSPort
DEVICETYPE=ovs
OVS_BRIDGE=br-ex
ONBOOT=yes

Tell Neutron about your bridge

We then run the following to tell Neutron to use a bridge called ‘br-ex’ and to use the proper plugins:

openstack-config --set /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini ovs bridge_mappings extnet:br-ex
openstack-config --set /etc/neutron/plugin.ini ml2 type_drivers vxlan,flat,vlan
reboot

You could probably restart Neutron and be OK here, but I prefer the belts and suspenders.

Define your software network

After the reboot, you should be able to ssh back into your normal IP address. We now have a host infrastructure that is ready to serve our OpenStack instances. So let’s define our SDN components so we can get going!

NOTE – This should be done on your controller node, rdo0.localdomain in my case

Provider network

# source keystonerc_admin
# neutron net-create external_network --provider:network_type flat --provider:physical_network extnet --router:external --shared

Public subnet and router

# neutron subnet-create --name public_subnet --enable_dhcp=False --allocation-pool=start=192.168.122.10,end=192.168.122.20 \
 --gateway=192.168.122.1 external_network 192.168.122.0/24
# neutron router-create router1
# neutron router-gateway-set router1 external_network

Private subnet

# neutron net-create private_network
# neutron subnet-create --name private_subnet private_network 192.168.100.0/24

Connect the two networks with the router

# neutron router-interface-add router1 private_subnet

Wrapping Up

And that’s it! You should now have 3 nodes ready to handle your OpenStack demo loads.
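
If you want a quick sanity check before you call it done, something like this on the controller usually tells the story. I’m assuming packstack left its keystonerc_admin file in root’s home directory (its default behavior) and that the usual clients are installed.

source ~/keystonerc_admin
openstack-status      # per-service summary on this node
nova service-list     # both compute nodes should show up as enabled / up
neutron agent-list    # the OVS, DHCP, and L3 agents should all be alive
neutron net-list      # external_network and private_network from above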

My current plan is to keep evolving this setup and re-deploying to do things like

  • take advantage of Ceph
  • Further distribute loads (maybe a virtual/physical hybrid setup)
  • handle multiple NICs
  • ???

If you have grief or ideas for improvement please feel free to comment below.

Systemtap Fun #1 – looking for particular signals (e.g. kill -9)

Systemtap is amazing, plain and simple. I’ve only begun to scratch the surface myself, but I can already see its power and have even used it in a few cases to make my life easier.

I found this example in a document somewhere, and I love it. Have you ever had one of those cases where a process was being mysteriously killed off on a server and you couldn’t quite figure out why? Well in comes Systemtap. With just a few lines of code:

probe signal.send {
    if (sig_name == "SIGKILL")
        printf("%s was sent to %s (pid:%d) by %s uid:%d\n",
               sig_name, pid_name, sig_pid, execname(), uid())
}

in a Systemtap script. Execute it, and any time a SIGKILL is sent to the kernel for any reason (SIGKILL == kill -9), it outputs what was killed and which pid/process sent it.

For example:

# stap /usr/share/systemtap/tapset/signal.stp
SIGKILL was sent to saslauthd (pid :6202) by AntiCloseWait.s uid :0

In this case “AntiCloseWait.s” was “AntiCloseWait.sh”, a long-forgotten cron job; the name is truncated because execname() only sees the kernel’s short comm field.

Simple, Powerful, Flexible. One of my favorite new tools to use.

Zabbix Fun – Tracking SSL Certificate Expiration Times

One of the most important things that an IT pro has to do is make sure the SSL certs for their sites don’t expire. It’s one of those weird little things that seems to fall through the cracks way too often. Happily, Zabbix can help keep track of this and make sure we take care of it.

For the record, I heavily borrowed this idea from http://aperto.fr/cms/en/15-blog-en/15-ssl-certificate-expiration-monitoring-with-zabbix.html, keeping the vast majority of his technical approach and primarily changing how Zabbix executes the check.

Step 1 – the script:

[root@sfo-it-zabbix-prod-01 ~]# cat /etc/zabbix/scripts/ssl_check.sh 
#!/usr/bin/env bash
host=$1
port=443
# grab the cert from the server and pull out its 'Not After' date
end_date=`openssl s_client -host $host -port $port -showcerts </dev/null 2>/dev/null |
          sed -n '/BEGIN CERTIFICATE/,/END CERT/p' |
          openssl x509 -text 2>/dev/null |
          sed -n 's/ *Not After : *//p'`

# convert that date into "days from now"
if [ -n "$end_date" ]
then
    end_date_seconds=`date '+%s' --date "$end_date"`
    now_seconds=`date '+%s'`
    echo "($end_date_seconds-$now_seconds)/24/3600" | bc
fi


This script takes a hostname as input, and looks up the associated SSL certificate using openssl. Example usage is:

[root@sfo-it-zabbix-prod-01 ~]# /etc/zabbix/scripts/ssl_check.sh www.gmail.com
176


The SSL certificate for www.gmail.com expires in 176 days.


Now we add this as a custom parameter to Zabbix.


Step 2 – adding to zabbix_agentd.conf


UserParameter=cert_check[*],/etc/zabbix/scripts/ssl_check.sh $1


More information about creating custom checks in Zabbix can be found at http://www.zabbix.com/documentation/1.8/manual/config/user_parameters

Step 3 – setting up the Zabbix GUI

Since this will only change once per day, we really only care about checking it once every 24 hours, or 86400 seconds.
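
The item I attached to my template looks roughly like this; the key has to match the UserParameter above, and the hostname argument is just the one from the earlier example.

Type:             Zabbix agent
Key:              cert_check[www.gmail.com]
Update interval:  86400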

So now we’re collecting data. If you look at the overview for the box running your Zabbix server (or whatever host you put this script on and applied the template to), you should see the new item pulling in values.

And that’s cool. BUT, how do we get Zabbix to send us info if our certificates are getting close to expiring? The answer is TRIGGERS.

Information on Zabbix triggers is available at http://www.zabbix.com/documentation/1.8/manual/config/triggers. I created three alert levels (the trigger expressions behind them are sketched after this list).

  1. If the certificate is within 30 days of expiring, a standard level alert is sent out.
  2. If the certificate is within 7 days of expiring, a high level alert is sent out.
  3. If a certificate expires, a Disaster level alert is sent out.
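
Here is a sketch of the trigger expressions behind those three levels, in Zabbix 1.8 syntax. The host and key are the ones from my examples above, so substitute your own template/host and monitored site.

# standard severity: certificate expires within 30 days
{sfo-it-zabbix-prod-01:cert_check[www.gmail.com].last(0)}<30
# high severity: certificate expires within 7 days
{sfo-it-zabbix-prod-01:cert_check[www.gmail.com].last(0)}<7
# disaster severity: the certificate has already expired
{sfo-it-zabbix-prod-01:cert_check[www.gmail.com].last(0)}<1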
And there you have it. Zabbix is now keeping an eye on our SSL certificates, and it will scream at us loudly to make sure we don’t let them expire.