Preparing a new RAID Drive for Insertion

Today I had the pleasure of fixing my degraded RAID array. With the new drive slotted and identified as /dev/sdc, it was time to create a GPT partition table on the drive with a single primary partition.

blackbox# parted /dev/sdc
GNU Parted 2.3
Using /dev/sdc
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel GPT
Warning: The existing disk label on /dev/sdc will be destroyed and all data on this disk will be lost. Do you want to continue?
Yes/No? Yes
(parted) mkpart primary 2048s 100%
(parted) q
Information: You may need to update /etc/fstab.
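The same partitioning can also be scripted with parted's script mode (`-s`), which never stops for the "Yes/No?" prompt. This is a minimal sketch: `prep_raid_disk` is a hypothetical helper name, and it defaults to a dry run that only prints the command, because running it for real wipes the partition table on the target disk.

```shell
# prep_raid_disk DISK: hypothetical helper wrapping the interactive parted
# session above (GPT label + one partition from sector 2048 to end of disk).
# DESTRUCTIVE when run for real: it replaces the disk's partition table.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to actually run parted.
prep_raid_disk() {
    disk="$1"
    if [ -z "$disk" ]; then
        echo "usage: prep_raid_disk /dev/sdX" >&2
        return 1
    fi
    # -s = script mode: no interactive confirmation prompts
    set -- parted -s "$disk" mklabel gpt mkpart primary 2048s 100%
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

For example, `prep_raid_disk /dev/sdc` prints the command; confirm the device name with `fdisk -l` before rerunning it with `DRY_RUN=0`.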

Unmount the filesystem, as a safety precaution.

root@blackbox:/home/eric# sudo umount /media/Manta_Array/
root@blackbox:/home/eric# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10] 
md0 : active raid6 sdh1[1] sdg1[7] sdb1[6] sda1[5]
 8790400512 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/4] [UU_UU]
unused devices: <none>

At this point I’ve already confirmed that the partition tables on all of the RAID member drives are identical by using the following commands.

parted /dev/sda print

parted /dev/sdb print

parted /dev/sdg print

parted /dev/sdh print
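Comparing four `print` outputs by eye works, but the check can also be scripted. A sketch, assuming each table is first saved with parted's machine-readable mode (`parted -m /dev/sdX unit s print`), where line 1 is a format header and line 2 describes the disk itself (model, size), which legitimately differs between drives. `same_partition_layout` is a hypothetical helper that compares everything after those two lines.

```shell
# same_partition_layout FILE...: hypothetical helper. Each FILE holds the
# output of `parted -m /dev/sdX unit s print`. Lines 1-2 (format header and
# per-disk line) are skipped; the partition lines must match the first file.
same_partition_layout() {
    ref=$(tail -n +3 "$1") || return 1
    first="$1"
    shift
    for f in "$@"; do
        if [ "$(tail -n +3 "$f")" != "$ref" ]; then
            echo "partition layout differs: $f vs $first" >&2
            return 1
        fi
    done
    echo "all partition layouts match"
}

# Usage (as root):
#   for d in sda sdb sdg sdh; do parted -m /dev/$d unit s print > /tmp/$d.parted; done
#   same_partition_layout /tmp/sda.parted /tmp/sdb.parted /tmp/sdg.parted /tmp/sdh.parted
```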

I’ve also already removed the errant drive before the last boot via

mdadm --manage /dev/md0 --remove /dev/sdc1

Now it’s time to add in the new drive.

root@blackbox:/home/eric# mdadm --manage /dev/md0 --add /dev/sdc1
mdadm: added /dev/sdc1
root@blackbox:/home/eric# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10] 
md0 : active raid6 sdc1[8] sdg1[7] sdb1[6] sda1[5] sdh1[1]
 8790400512 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/4] [UU_UU]
 [>....................] recovery = 0.0% (401324/2930133504) finish=730.0min speed=66887K/sec
unused devices: <none>

As you can see, there is a good bit of synchronization left to do (mdadm estimates about 730 minutes, a little over 12 hours) before the new drive can be put to use.
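Rather than re-reading the whole mdstat output, the progress figure can be pulled out in a script. A minimal sketch: `md_recovery_percent` is a hypothetical helper that reads mdstat-formatted text on stdin and prints the percentage from any `recovery =` or `resync =` progress line.

```shell
# md_recovery_percent: hypothetical helper. Reads /proc/mdstat-style text on
# stdin and prints the completion percentage of each recovery/resync line,
# e.g. "0.0%" for the output above. Prints nothing when no rebuild is running.
md_recovery_percent() {
    sed -nE 's/.*(recovery|resync) *= *([0-9.]+%).*/\2/p'
}

# Usage: md_recovery_percent < /proc/mdstat
```

The low-tech alternative is simply `watch -n 60 cat /proc/mdstat`.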


Installing the Newest Ifupdown2 on an OrangePi Nano

I’ve been pretty enamored with my OrangePi nanos since I got my first one. So enamored, in fact, that I now own 5 of them doing all manner of tasks. Being a network person, I wanted the best interface configuration software available installed, so naturally I wanted Ifupdown2.

The OrangePi nano runs a Debian-based distribution called Armbian, which is truly awesome software. It has been stripped down and customized for specific devices to the point that it is a work of art. Since it is Debian-based, ifupdown2 is available natively right in the repos. The only problem is that the packaged version is outdated, dating from November 2015, so I want to install the latest and greatest Ifupdown2.

From my armbian device:

# Install the newest version in the standard repo
sudo apt-get update -y
sudo apt-get install ifupdown2 -qy

# Now install the newest version of ifupdown2 directly from the debian repos
wget -O /root/ifupdown2.deb && \
dpkg -i /root/ifupdown2.deb && \
rm -rfv /root/ifupdown2.deb

sudo apt-cache policy ifupdown2 | grep Installed
echo "Output above should say: \"Installed: 1.0~git20170314-1\""
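The eyeball check above can be turned into a pass/fail test. A sketch, assuming the usual `apt-cache policy` layout where the installed version follows `Installed:`; `installed_version` is a hypothetical helper name.

```shell
# installed_version: hypothetical helper. Reads `apt-cache policy <pkg>`
# output on stdin and prints the value after "Installed:" (apt prints the
# literal "(none)" when the package is not installed).
installed_version() {
    awk '$1 == "Installed:" { print $2; exit }'
}

# Usage:
#   want="1.0~git20170314-1"
#   have=$(apt-cache policy ifupdown2 | installed_version)
#   [ "$have" = "$want" ] && echo "ifupdown2 OK" || echo "ifupdown2 is $have, expected $want"
```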

# Tell NetworkManager to leave eth0 unmanaged so that Ifupdown2 controls it.
# The [keyfile] section header and the setting must be on separate lines.
printf '[keyfile]\nunmanaged-devices=interface-name:eth0\n' | sudo tee -a /etc/NetworkManager/NetworkManager.conf

echo "### Before Change ###"
sudo nmcli dev status
sudo systemctl stop NetworkManager; sudo systemctl start NetworkManager

echo "### After Change ###"
sudo nmcli dev status
echo "Eth0 should now show as \"unmanaged\" according to the output above."
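The before/after comparison can be checked by a script as well. A sketch, assuming nmcli's terse mode (`nmcli -t -f DEVICE,STATE device status`), which prints one colon-separated `device:state` pair per line; `nm_device_state` is a hypothetical helper.

```shell
# nm_device_state DEV: hypothetical helper. Reads the output of
# `nmcli -t -f DEVICE,STATE device status` on stdin and prints DEV's state.
nm_device_state() {
    awk -F: -v dev="$1" '$1 == dev { print $2; exit }'
}

# Usage:
#   state=$(nmcli -t -f DEVICE,STATE device status | nm_device_state eth0)
#   [ "$state" = "unmanaged" ] && echo "eth0 is left to ifupdown2"
```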


Troubleshooting Linux Software RAID (MDADM)

Recently I had the pleasure of rebooting my NAS server for some standard “maintenance” activities, i.e. kernel updates. Naturally, when it came back up, my primary large-file storage RAID 6 array did not start automatically.

This is always my worst fear when it comes to rebooting that box… what if the RAID doesn’t come back up and some hard drives were limping along and I didn’t know it?!

After reading syslog, I found a number of errors that clearly indicated a read error on /dev/sdc:

Sep 22 22:25:37 blackbox kernel: [264777.620821] ata3.00: exception Emask 0x0 SAct 0x40000 SErr 0x0 action 0x0
Sep 22 22:25:37 blackbox kernel: [264777.624084] ata3.00: irq_stat 0x40000008
Sep 22 22:25:37 blackbox kernel: [264777.627338] ata3.00: failed command: READ FPDMA QUEUED
Sep 22 22:25:37 blackbox kernel: [264777.630585] ata3.00: cmd 60/08:90:08:08:00/00:00:00:00:00/40 tag 18 ncq 4096 in
Sep 22 22:25:37 blackbox kernel: [264777.630585] res 41/40:00:09:08:00/00:00:00:00:00/40 Emask 0x409 (media error) <F>
Sep 22 22:25:37 blackbox kernel: [264777.637060] ata3.00: status: { DRDY ERR }
Sep 22 22:25:37 blackbox kernel: [264777.640253] ata3.00: error: { UNC }
Sep 22 22:25:37 blackbox kernel: [264777.644587] ata3.00: configured for UDMA/133
Sep 22 22:25:37 blackbox kernel: [264777.644609] sd 2:0:0:0: [sdc] tag#18 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 22 22:25:37 blackbox kernel: [264777.644616] sd 2:0:0:0: [sdc] tag#18 Sense Key : Medium Error [current] [descriptor] 
Sep 22 22:25:37 blackbox kernel: [264777.644622] sd 2:0:0:0: [sdc] tag#18 Add. Sense: Unrecovered read error - auto reallocate failed
Sep 22 22:25:37 blackbox kernel: [264777.644628] sd 2:0:0:0: [sdc] tag#18 CDB: Read(16) 88 00 00 00 00 00 00 00 08 08 00 00 00 08 00 00
Sep 22 22:25:37 blackbox kernel: [264777.644632] blk_update_request: I/O error, dev sdc, sector 2057
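On a box with several drives, it helps to count which device the kernel is complaining about before touching any hardware. A sketch that tallies the `I/O error, dev sdX, sector ...` messages per device; it assumes that message wording (as in the last log line above) and the Debian/Ubuntu log location /var/log/syslog, and `io_errors_by_device` is a hypothetical helper.

```shell
# io_errors_by_device: hypothetical helper. Reads kernel log text on stdin and
# prints "<count> <device>" for every device named in an
# "I/O error, dev <device>, sector ..." message, most errors first.
io_errors_by_device() {
    sed -nE 's|.*I/O error, dev ([a-z0-9]+),.*|\1|p' | sort | uniq -c | sort -rn
}

# Usage: io_errors_by_device < /var/log/syslog
```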

A few sources are common for these kinds of errors, and I recommend troubleshooting all of them before blaming the drive.

  1. New SATA Cables: sometimes the old ones are of low quality, sometimes they get finicky or have dust covering the pins. Whatever the reason, SATA cables are cheap, so buy some more and try those.
  2. Bad SATA Port: the specific port on your controller could be failing. If this is a port on your motherboard and you have another one available, try that one. If not, consider buying a SATA controller card.
  3. Bad SATA Controller Card: these cards can be quite inexpensive in some cases. I’ve seen several of them fail, often with spurious read errors like this one as the first symptom of a larger failure in the card.
  4. Bad Hard Drive: this one is the most obvious, of course, but the items above should really be investigated first. While I’ve had more hard drive failures than any of the issues above, I have also received an RMA’d replacement drive that failed just like the original because one of the issues above was the real culprit.

Another way to investigate item #4 is with the SMART tools for Linux, available via `sudo apt-get install smartmontools`. smartctl (for example, `sudo smartctl -a /dev/sdc`) provides a TON of useful data from the drive’s SMART firmware and will tell you if the drive has already logged other concerning errors.

One of the most important sections is the attribute table:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
 1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 115
 3 Spin_Up_Time 0x0027 181 176 021 Pre-fail Always - 5941
 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 951
 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
 7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
 9 Power_On_Hours 0x0032 049 049 000 Old_age Always - 37920
 10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
 11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 93
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 57
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 893
194 Temperature_Celsius 0x0022 120 109 000 Old_age Always - 30
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
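To check the same attributes across several drives without reading each table, the output can be filtered. A sketch, assuming the standard `smartctl -A` column layout shown above (RAW_VALUE is the 10th field); `smart_summary` is a hypothetical helper that converts Power_On_Hours into years and warns when attribute 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) or 198 (Offline_Uncorrectable) has a non-zero raw value. Which attributes to watch is a judgment call, not a rule.

```shell
# smart_summary: hypothetical helper. Reads `smartctl -A /dev/sdX` output on
# stdin. Assumes RAW_VALUE is field 10, as in the table above.
smart_summary() {
    awk '
        # Attribute 9: Power_On_Hours -> years (hours / 24 / 365)
        $1 == 9 { printf "power_on_years=%.2f\n", $10 / 24 / 365 }
        # Attributes 5, 197, 198: any non-zero raw count deserves a closer look
        ($1 == 5 || $1 == 197 || $1 == 198) && $10 != 0 {
            printf "WARN %s raw=%s\n", $2, $10
        }
    '
}

# Usage: sudo smartctl -A /dev/sdc | smart_summary
```

Fed the table above, it prints only `power_on_years=4.33`, since attributes 5, 197 and 198 are all zero.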

My takeaway is that after 37,920 hours in operation (37920 / 24 / 365 ≈ 4.33 years), and after already checking items 1-3, it was probably finally time to let this drive go.

Making that decision is half the battle; now it’s time to recover the array.

A Reconstructive RAID Cheat Sheet

It is for moments like this that I’m writing myself this guide for later.

Since I run RAID 6 with five 3 TB drives, I can lose 2 of the 5 drives and still be OK. Since it looks like I’ve only lost 1, the array should have been able to run degraded, but for whatever reason it showed as inactive at boot time.

eric@blackbox:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : inactive sdb1[1](S) sda1[7](S) sdd1[6](S) sde1[5](S)
 11720536064 blocks super 1.2
unused devices: <none>
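An inactive array is easy to miss after a reboot. A sketch that could run from a boot-time check or a cron job: `md_inactive_arrays` (a hypothetical helper) reads mdstat-formatted text and prints the name of every array whose state column says `inactive`.

```shell
# md_inactive_arrays: hypothetical helper. Reads /proc/mdstat-style text on
# stdin; array lines look like "md0 : inactive sdb1[1](S) ...", so the state
# is the third whitespace-separated field.
md_inactive_arrays() {
    awk '$1 ~ /^md/ && $2 == ":" && $3 == "inactive" { print $1 }'
}

# Usage: md_inactive_arrays < /proc/mdstat
```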

Identify your hard disks

root@blackbox:/home/eric# sudo fdisk -l | grep "2.7 TiB"
Disk /dev/sda: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Disk /dev/sdb: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Disk /dev/sdd: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Disk /dev/sde: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Disk /dev/sdc: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
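With several identical drives installed, the odd one out can also be found by comparing the candidate disks against the members listed in /proc/mdstat. A sketch, assuming `sdX<N>`-style member names as on this box and a single array in the input; `missing_from_array` is a hypothetical helper.

```shell
# missing_from_array DISK...: hypothetical helper. Reads /proc/mdstat-style
# text on stdin and prints each DISK (e.g. sdc) that has no partition listed
# as an array member. Assumes sdX-style names and a single array in the input.
missing_from_array() {
    # Member entries look like "sdb1[1](S)": keep only the part before "[".
    members=$(grep -o 'sd[a-z]*[0-9]*\[' | cut -d'[' -f1)
    for disk in "$@"; do
        printf '%s\n' "$members" | grep -q "^${disk}[0-9]*\$" || echo "$disk"
    done
}

# Usage: missing_from_array sda sdb sdc sdd sde < /proc/mdstat
```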

/dev/sdc is my bad drive, so I’m going to remove it from the array.

root@blackbox:/home/eric# mdadm --manage /dev/md0 --remove /dev/sdc1
mdadm: hot remove failed for /dev/sdc1: No such device or address

In this case, the mdstat output above shows that /dev/sdc1 was not added to the array at boot time, so the remove operation failed. Instead, the array has to be reassembled from the remaining drives, and for that I need to stop it first.

root@blackbox:/home/eric# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
root@blackbox:/home/eric#  cat /proc/mdstat 
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] 
unused devices: <none>

Alright, let’s bring this puppy back. I had a clean power-down before the reboot, so everything should still be mostly in order, and I’ll force the assembly from the four healthy members.

root@blackbox:/home/eric# mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdd1 /dev/sde1
mdadm: /dev/md0 has been started with 4 drives (out of 5).
root@blackbox:/home/eric# cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid6 sda1[7] sdd1[6] sde1[5] sdb1[1]
 8790400512 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/4] [UU_UU]
unused devices: <none>
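The `[5/4] [UU_UU]` part of that output is the quick health read: 5 devices expected, 4 active, and the underscore marks the missing slot. A sketch that turns it into a one-line status per array; `md_health` is a hypothetical helper, and inactive arrays print nothing because they carry no such counter.

```shell
# md_health: hypothetical helper. Reads /proc/mdstat-style text on stdin and,
# for each "[expected/active]" counter, prints "<array> <ok|DEGRADED> active/expected".
md_health() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        match($0, /\[[0-9]+\/[0-9]+\]/) {
            # "[5/4]" -> n[1] = 5 (raid devices), n[2] = 4 (currently active)
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            state = (n[2] + 0 < n[1] + 0) ? "DEGRADED" : "ok"
            printf "%s %s %d/%d\n", name, state, n[2], n[1]
        }
    '
}

# Usage: md_health < /proc/mdstat
```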
root@blackbox:/home/eric# sudo mount /dev/md0 /media/Manta_Array/

And now, with the array remounted and a new drive on order, life is back to normal.

Some Additional Commands I Find Useful:

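Detect the present state and write it to your RAID configuration file (note that `>` overwrites the existing file; Debian and Ubuntu keep it at /etc/mdadm/mdadm.conf)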

mdadm --verbose --detail --scan > /etc/mdadm.conf

Assemble an existing array by UUID (optional, if possible)

mdadm --assemble --scan --uuid=f6ff12cd:86a8e3fb:89bc0f58:ad15e3e2

Learn about RAID info stored on each individual drive (Superblock)

mdadm --examine /dev/sda1
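The UUID for the assemble-by-UUID command above does not have to be copied by hand; it appears in the `--examine` output. A sketch, assuming 1.x-superblock output where that line reads `Array UUID : <uuid>`; `array_uuid` is a hypothetical helper.

```shell
# array_uuid: hypothetical helper. Reads `mdadm --examine /dev/sdX1` output on
# stdin and prints the value of the "Array UUID" line (1.x superblocks).
array_uuid() {
    awk -F' : ' '/Array UUID/ { print $2; exit }'
}

# Usage (as root):
#   uuid=$(mdadm --examine /dev/sda1 | array_uuid)
#   mdadm --assemble --scan --uuid="$uuid"
```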


Adding New Fonts in Bulk to Ubuntu 16.04

This process should also work for 12.04 and 14.04.

Create a new directory under /usr/share/fonts

sudo mkdir /usr/share/fonts/opentype/newfonts

Place all OTF or TTF files in that directory.
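For a big batch, the copy itself can be scripted. A sketch: `install_fonts` is a hypothetical helper that finds .otf/.ttf files (any letter case) anywhere under a source folder and copies them, readable by everyone, into the target directory. Run it as root when the target is under /usr/share/fonts.

```shell
# install_fonts SRC DEST: hypothetical helper. Copies every .otf/.ttf file
# found anywhere under SRC into DEST (flattening subfolders) and makes each
# copy world-readable (mode 644) so every user's applications can load it.
install_fonts() {
    src="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    find "$src" -type f \( -iname '*.otf' -o -iname '*.ttf' \) \
        -exec cp {} "$dest"/ \;
    find "$dest" -type f \( -iname '*.otf' -o -iname '*.ttf' \) \
        -exec chmod 644 {} +
}

# Usage (from a root shell):
#   install_fonts /path/to/font/folder /usr/share/fonts/opentype/newfonts
```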

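Fix the permissions so every user can read the new fonts (the capital X grants search permission on directories without marking the font files executable), then run the font-caching utility to make them available to applications immediately.

sudo chmod -R u=rwX,go=rX /usr/share/fonts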
sudo fc-cache -fv

Other Methods

There are other methods available on modern Ubuntu as well. For individual fonts you can just double-click on them and click the “install” option in the upper right. Or use a purpose-built program like font-manager.

sudo apt-get install font-manager

Using Active/Backup Bonding (mode 1) with Ifupdown2

Ifupdown2 is a very useful interface configuration utility with tons of enhancements over the stock ifupdown utility. It was built with a specific initial use case in mind: network operating systems (NOS) like Cumulus Linux. Cumulus requires LACP support as the primary bonding method. Other modes like active-backup (mode 1) were not initially fully implemented in ifupdown2. This is changing, however: CM-14985 brings support for the bond-primary keyword and will be present in the next release of Cumulus Linux and the next version of Ifupdown2.

To tide you over until then, here’s a workaround I’ve been using for active/backup bonding on my home server running Ifupdown2: writing the bond’s primary sysfs file directly (the pre-up line below) provides the same behavior.

auto lo
iface lo inet loopback

auto enp4s0
iface enp4s0
 alias Motherboard Ethernet 
 mtu 9194

auto enxf01e341f95
iface enxf01e341f95
 alias USB3 Ethernet
 mtu 9194

auto bond0
iface bond0
 alias ActiveBackup Uplink
 bond-mode active-backup
 bond-slaves enxf01e341f95 enp4s0
 mtu 9194
 pre-up echo enp4s0 > /sys/class/net/bond0/bonding/primary
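To confirm the workaround took effect, the bonding driver reports the currently active slave in /proc/net/bonding/bond0 (and `cat /sys/class/net/bond0/bonding/primary` should show enp4s0). A sketch that pulls out the active slave; `bond_active_slave` is a hypothetical helper, and it assumes the driver's `Currently Active Slave:` line format.

```shell
# bond_active_slave: hypothetical helper. Reads /proc/net/bonding/<bond>
# text on stdin and prints the interface named on "Currently Active Slave:".
bond_active_slave() {
    sed -n 's/^Currently Active Slave: *//p'
}

# Usage:
#   [ "$(bond_active_slave < /proc/net/bonding/bond0)" = enp4s0 ] && echo "primary in use"
```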

Building FRRouting for PowerPC on Debian Wheezy

I set out to do this to modernize the routing software on an older whitebox switch built on the PowerPC architecture.

One of the challenges on these platforms, aside from the PPC architecture, is limited storage. My switch did not have enough disk space to complete the build, so I used a USB stick to provide the additional space. When the build was complete, my build directory consumed ~214 MB, so plan accordingly if your switch does not have sufficient on-board space.

Assume root for all commands unless otherwise stated.

I mounted my USB stick at /mnt/USB:

mkdir /mnt/USB
# Use fdisk to confirm which device is the USB stick (/dev/sda1 here).
fdisk -l 
mount /dev/sda1 /mnt/USB

Add the sources

cat << EOT >> /etc/apt/sources.list
deb wheezy main contrib non-free
deb-src wheezy main contrib non-free

deb wheezy/updates main contrib non-free
deb-src wheezy/updates main contrib non-free

deb wheezy-updates main contrib non-free
deb-src wheezy-updates main contrib non-free

deb wheezy-backports main non-free contrib
EOT

Add the Prereq packages

apt-get install git autoconf automake libtool make gawk libreadline-dev texinfo dejagnu pkg-config libpam0g-dev bison flex python-pytest libc-ares-dev python3-dev libjson-c-dev build-essential fakeroot devscripts


Install some prerequisites that are not in the repos from source, as shown in the Ubuntu 12.04 LTS build guide

Install newer bison from Ubuntu 14.04 package source:

mkdir builddir
cd builddir
tar -jxvf bison_3.0.2.dfsg.orig.tar.bz2 
cd bison-3.0.2.dfsg/
tar xzf ../bison_3.0.2.dfsg-2.debian.tar.gz 
sudo apt-get build-dep bison
debuild -b -uc -us
cd ..
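# debuild names the packages after the build architecture (powerpc here, not amd64)
sudo dpkg -i ./libbison-dev_3.0.2.dfsg-2_$(dpkg --print-architecture).deb ./bison_3.0.2.dfsg-2_$(dpkg --print-architecture).deb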
cd ..
rm -rf builddir

Install newer version of autoconf and automake:

tar xvf autoconf-2.69.tar.gz
cd autoconf-2.69
./configure --prefix=/usr
sudo make install
cd ..

tar xvf automake-1.15.tar.gz
cd automake-1.15
./configure --prefix=/usr
sudo make install
cd ..

Add frr groups and user

sudo groupadd -g 92 frr
sudo groupadd -r -g 85 frrvty
sudo adduser --system --ingroup frr --home /var/run/frr/ \
   --gecos "FRR suite" --shell /sbin/nologin frr
sudo usermod -a -G frrvty frr

Download Source, configure and compile it

git clone frr
cd frr
./configure \
    --prefix=/usr \
    --enable-exampledir=/usr/share/doc/frr/examples/ \
    --localstatedir=/var/run/frr \
    --sbindir=/usr/lib/frr \
    --sysconfdir=/etc/frr \
    --enable-pimd \
    --enable-watchfrr \
    --enable-ospfclient=yes \
    --enable-ospfapi=yes \
    --enable-multipath=64 \
    --enable-user=frr \
    --enable-group=frr \
    --enable-vty-group=frrvty \
    --enable-configfile-mask=0640 \
    --enable-logfile-mask=0640 \
    --enable-rtadv \
    --enable-fpm \
    --with-pkg-git-version
make
make install

Most guides would end here, but there’s a bit more required to get FRR functioning.

Create empty FRR configuration files

sudo install -m 755 -o frr -g frr -d /var/log/frr
sudo install -m 775 -o frr -g frrvty -d /etc/frr
sudo install -m 640 -o frr -g frr /dev/null /etc/frr/zebra.conf
sudo install -m 640 -o frr -g frr /dev/null /etc/frr/bgpd.conf
sudo install -m 640 -o frr -g frr /dev/null /etc/frr/ospfd.conf
sudo install -m 640 -o frr -g frr /dev/null /etc/frr/ospf6d.conf
sudo install -m 640 -o frr -g frr /dev/null /etc/frr/isisd.conf
sudo install -m 640 -o frr -g frr /dev/null /etc/frr/ripd.conf
sudo install -m 640 -o frr -g frr /dev/null /etc/frr/ripngd.conf
sudo install -m 640 -o frr -g frr /dev/null /etc/frr/pimd.conf
sudo install -m 640 -o frr -g frr /dev/null /etc/frr/ldpd.conf
sudo install -m 640 -o frr -g frr /dev/null /etc/frr/nhrpd.conf
sudo install -m 640 -o frr -g frrvty /dev/null /etc/frr/vtysh.conf

Install the init.d service

sudo install -m 755 tools/frr /etc/init.d/frr
sudo install -m 644 tools/etc/frr/daemons /etc/frr/daemons
sudo install -m 644 tools/etc/frr/daemons.conf /etc/frr/daemons.conf
sudo install -m 644 -o frr -g frr tools/etc/frr/vtysh.conf /etc/frr/vtysh.conf

Enable your Routing Daemons

cat << EOT > /etc/frr/daemons
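What goes into that daemons file depends on which protocols you run. As a sketch, assuming the `name=yes`/`name=no` line format used by the daemons file of this FRR generation, a hypothetical `enable_daemon` helper can flip individual daemons on instead of rewriting the whole file.

```shell
# enable_daemon FILE DAEMON...: hypothetical helper. Rewrites "DAEMON=no" to
# "DAEMON=yes" in an FRR daemons file (assumed name=yes/no line format).
# Fails if a requested daemon has no line in the file.
enable_daemon() {
    file="$1"
    shift
    for d in "$@"; do
        if grep -q "^${d}=" "$file"; then
            sed -i "s/^${d}=no\$/${d}=yes/" "$file"
        else
            echo "${d}: no such line in $file" >&2
            return 1
        fi
    done
}

# Usage: enable_daemon /etc/frr/daemons zebra bgpd ospfd
```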

Start FRR

service frr start
service frr status

Enable FRR At boot time for subsequent reboots

sudo update-rc.d frr defaults

Fix Exit Scripts

The exit logic in /usr/lib/frr/frr runs `ip route flush proto <name>` commands; these edits replace each protocol name with its numeric protocol ID.

sed -i 's/ip route flush proto ripng/ip route flush proto 190 \# ripng/' /usr/lib/frr/frr
sed -i 's/ip route flush proto bgp/ip route flush proto 186 \# bgp/' /usr/lib/frr/frr
sed -i 's/ip route flush proto isis/ip route flush proto 187 \# isis/' /usr/lib/frr/frr
sed -i 's/ip route flush proto ospf/ip route flush proto 188 \# ospf/' /usr/lib/frr/frr
sed -i 's/ip route flush proto rip/ip route flush proto 189 \# rip/' /usr/lib/frr/frr
sed -i 's/ip route flush proto static/ip route flush proto 191 \# static/' /usr/lib/frr/frr
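The order of those substitutions matters: the rip pattern is also a prefix of the ripng line, so ripng has to be rewritten first. A sketch of the same edits as a table-driven loop that keeps that ordering explicit; `fix_flush_protos` is a hypothetical helper, the numeric IDs are copied from the commands above, and re-running it is harmless because already-rewritten lines no longer match.

```shell
# fix_flush_protos FILE: hypothetical helper applying the same six sed edits
# as above. "ripng" is listed first because "ip route flush proto rip" would
# otherwise also match (and mislabel) the ripng line.
fix_flush_protos() {
    file="$1"
    for pair in ripng:190 bgp:186 isis:187 ospf:188 rip:189 static:191; do
        name=${pair%%:*}   # text before the colon, e.g. "ripng"
        num=${pair##*:}    # text after the colon, e.g. "190"
        sed -i "s/ip route flush proto $name/ip route flush proto $num # $name/" "$file" || return 1
    done
}

# Usage: fix_flush_protos /usr/lib/frr/frr
```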

Hopefully that should do it for you. Now the next step is figuring out how to build a proper deb from the source. I’ll leave that process for next time 🙂

Troubleshooting Vagrant Libvirt Simulations

There are a lot of moving parts in a vagrant-libvirt simulation. Vagrant calls the vagrant-libvirt plugin, which controls libvirt, and libvirt in turn controls QEMU.

Vagrant → vagrant-libvirt → libvirt → QEMU

Our troubleshooting is going to focus on the libvirt component. Our solution to Vagrant issues is to correct permissions and remove all of the files that Vagrant uses to keep state. We’ll use a series of virsh commands to manually correct issues with libvirt.

Suggestions to keep yourself out of hot water when working with libvirt simulations:

  • Vagrant activities are unique per user, so perform all actions from a single non-root user account; do not mix and match.

Common Troubleshooting Path:

1). Make sure your user is added to the libvirtd group with the “id” command.

$ id
uid=1000(eric) gid=1000(eric) groups=1000(eric),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),113(lpadmin),128(sambashare),131(vboxusers),132(libvirtd)

NOTE: to add your user to this group, use this command:

sudo usermod -a -G libvirtd userName

Log out and log back in for the group change to take effect.

2). Change ownership of everything in that user’s .vagrant.d directory back to the user in question.

$ sudo chown [user] -Rv ~/.vagrant.d/

3). List all Domains (VMs) and their storage volumes (Hard Drives)

$ virsh list --all
$ virsh vol-list default

Images are stored in /var/lib/libvirt/images/

4). Stop each VM, undefine it, and remove the virtual hard drive

$ virsh destroy vagrant_asdfasdf
$ virsh undefine vagrant_asdfasdf
$ virsh vol-delete --pool default vagrant_asdfasdf.img


VM list should be empty now and volume list should not have any of the volumes that correspond to VM names in your simulation.

$ virsh list --all
$ virsh vol-list default
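With a large simulation, step 4 can be looped over every domain belonging to it. A sketch, assuming (as in the example above) that the simulation's domains share a name prefix and each disk is `<domain>.img` in the `default` pool; `cleanup_vagrant_domains` is a hypothetical helper, and since it deletes disks, check `virsh list --all` for the prefix first.

```shell
# cleanup_vagrant_domains PREFIX: hypothetical helper that runs the step-4
# commands for every libvirt domain whose name starts with PREFIX.
# Assumes each domain's disk is "<domain>.img" in the "default" pool.
cleanup_vagrant_domains() {
    prefix="$1"
    if [ -z "$prefix" ]; then
        echo "usage: cleanup_vagrant_domains PREFIX" >&2
        return 1
    fi
    # --name prints bare domain names, one per line
    virsh list --all --name | grep "^${prefix}" | while read -r dom; do
        virsh destroy "$dom" 2>/dev/null   # errors harmlessly if already shut off
        virsh undefine "$dom"
        virsh vol-delete --pool default "${dom}.img"
    done
}

# Usage: cleanup_vagrant_domains vagrant_
```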


5). Remove the hidden .vagrant directory in the simulation folder

$ rm -rfv ./.vagrant/


6). Try your Vagrant up now.

$ vagrant status
$ vagrant up --provider=libvirt


Have VMs that you’ve already removed still stuck in the output of “vagrant global-status”?

Remove the machine-index file as follows:

rm ~/.vagrant.d/data/machine-index/index