Troubleshooting Vagrant Libvirt Simulations

There are a lot of moving parts in a vagrant-libvirt simulation. Vagrant calls the vagrant-libvirt plugin, which controls libvirt, and libvirt in turn controls QEMU.

Vagrant → libvirt → QEMU

Our troubleshooting focuses on the libvirt component. For Vagrant itself, the solution is to correct permissions and remove the files Vagrant uses to keep state; for libvirt, we’ll use a series of virsh commands to correct issues manually.

Suggestions to keep yourself out of hot water when working with libvirt simulations:

  • Vagrant state is unique per user, so perform all actions from a single non-root user account; do not mix and match accounts

Common Troubleshooting Path:

1). Make sure your user is a member of the libvirtd group; you can check with the “id” command.

$ id
uid=1000(eric) gid=1000(eric) groups=1000(eric),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),113(lpadmin),128(sambashare),131(vboxusers),132(libvirtd)

NOTE: To add your user to this group, use the following command:

sudo usermod -a -G libvirtd userName

Log out and log back in for the group change to take effect.
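
After logging back in, it’s worth confirming the change took effect. The one-liner below just re-checks the same libvirtd membership shown in the id output above.

$ id -nG | grep -qw libvirtd && echo "libvirtd group active" || echo "not active yet - log out and back in"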

2). Change ownership of everything in that user’s .vagrant.d directory back to the user in question.

$ sudo chown [user] -Rv ~/.vagrant.d/
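
To double-check that nothing was missed, you can list anything under that directory still owned by another user (this assumes $USER is set to your login name; the command should print nothing after the chown).

$ find ~/.vagrant.d ! -user "$USER"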

3). List all domains (VMs) and their storage volumes (hard drives)

$ virsh list --all
$ virsh vol-list default

Images are stored in /var/lib/libvirt/images/
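
If you want to see the disk images directly on the filesystem, a plain directory listing of that path shows the same volumes:

$ ls -lh /var/lib/libvirt/images/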

4). Stop each VM, undefine it, and remove the virtual hard drive

$ virsh destroy vagrant_asdfasdf
$ virsh undefine vagrant_asdfasdf
$ virsh vol-delete --pool default vagrant_asdfasdf.img
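
If your simulation has many VMs, a small loop can handle the whole cleanup. This is a minimal sketch that assumes every domain name starts with the vagrant_ prefix and that each volume is named after its domain with a .img suffix, as in the example above; adjust both to match your setup.

# remove every libvirt domain starting with "vagrant_" along with its disk image
for vm in $(virsh list --all --name | grep '^vagrant_'); do
    virsh destroy "$vm" 2>/dev/null               # stop the VM if it is still running
    virsh undefine "$vm"                          # remove the domain definition
    virsh vol-delete --pool default "${vm}.img"   # delete the backing storage volume
done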


The VM list should now be empty, and the volume list should no longer contain any volumes corresponding to the VM names in your simulation.

$ virsh list --all
$ virsh vol-list default


5). Remove the hidden .vagrant directory in the simulation folder

$ rm -rfv ./.vagrant/


6). Now try your vagrant up again.

$ vagrant status
$ vagrant up --provider=libvirt


Do you have VMs that you’ve already removed still showing up in the output of “vagrant global-status”?

Remove the machine-index file as follows:

rm ~/.vagrant.d/data/machine-index/index
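
Depending on your Vagrant version, you may also be able to prune the stale entries without touching the index file by hand:

vagrant global-status --prune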

Controlling Docker from Within A Docker Container

I’ve been tinkering with a project to interact with the Docker Engine API using docker-py. The catch is that the program is running inside a Docker container.

Modify /lib/systemd/system/docker.service to bind the Docker daemon to a TCP port.

First, create a loopback IP address:

sudo ip addr add 10.254.254.254/32 dev lo
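
You can confirm the address is present on the loopback interface before moving on:

ip addr show dev lo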

By default, the Unix socket used by Docker is inaccessible from inside the container, so we need to use TCP instead.

Modify the ExecStart line to remove the Unix socket and replace it with the address of your loopback and a port of your choosing. I used 2376 because that is what Docker uses on Windows, where Unix sockets are not available.

[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network.target docker.socket firewalld.service
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H 10.254.254.254:2376
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=1048576
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
# Uncomment TasksMax if your systemd version supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process

[Install]
WantedBy=multi-user.target

Reload systemd so it picks up the changes to the unit file.

sudo systemctl daemon-reload

Start Docker with the new settings.

sudo systemctl stop docker.service
sudo systemctl start docker.service
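
Before going further, it’s worth confirming the daemon is actually listening on the new address; either of the checks below should do.

sudo ss -tlnp | grep 2376
docker -H tcp://10.254.254.254:2376 version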

Modify your environment variables so you don’t have to use the -H argument every time you call a Docker CLI command.

export DOCKER_HOST=10.254.254.254:2376
echo "export DOCKER_HOST=10.254.254.254:2376" >> ~/.bashrc

Run a new container in the new environment and install the docker Python library inside it.

docker run -itd --name=test ubuntu /bin/bash
docker exec -it test /usr/bin/apt-get update
docker exec -it test /usr/bin/apt-get install -y python python-pip
docker exec -it test /usr/bin/pip install docker
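
As a quick sanity check that the docker library installed cleanly inside the container, print its version with a one-liner:

docker exec -it test python -c "import docker; print(docker.version)"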

Create a Test Script to Try In the Container

#!/usr/bin/python
import docker
import pprint

# PrettyPrint Setup
pp = pprint.PrettyPrinter(indent=4)

# DOCKER API Setup
#  NORMAL SETUP
#client = docker.from_env()
#low_level_client=docker.APIClient(base_url='unix://var/run/docker.sock')
#  Custom SETUP
client = docker.DockerClient(base_url='tcp://10.254.254.254:2376')
low_level_client=docker.APIClient(base_url='tcp://10.254.254.254:2376')

container_list=client.containers.list()
pp.pprint(container_list)
for container in container_list:
    docker_container = low_level_client.inspect_container(container.id)
    pp.pprint(docker_container)

Run The Test Script

chmod +x ./test.py
docker cp ./test.py test:/root/test.py
docker exec -it test /root/test.py

Security: Don’t forget to secure your newly exposed port with iptables rules!

sudo iptables -t filter -A INPUT -i eth0 -p tcp -m tcp --dport 2376 -j DROP
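
If you prefer to be explicit about what is allowed, you can also insert an ACCEPT rule for traffic arriving from the Docker bridge ahead of the DROP rule (this assumes the default docker0 bridge name):

sudo iptables -t filter -I INPUT -i docker0 -p tcp -m tcp --dport 2376 -j ACCEPT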