Preparing a new RAID Drive for Insertion

Today I had the pleasure of fixing my degraded RAID array. With the new drive installed and identified as /dev/sdc, it was time to create a GPT partition table on it with a single primary partition.

blackbox# parted /dev/sdc
GNU Parted 2.3
Using /dev/sdc
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel GPT
Warning: The existing disk label on /dev/sdc will be destroyed and all data on this disk will be lost. Do you want to continue?
Yes/No? Yes
(parted) mkpart primary 2048s 100%
(parted) q
Information: You may need to update /etc/fstab.
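
For repeat use, the same steps can be run non-interactively. The following is a minimal sketch, assuming parted's --script mode; the helper name prepare_raid_disk and the RUN switch are my own inventions for illustration. It only prints the command unless RUN=1 is set, because the real thing destroys the existing partition table.

```shell
#!/bin/sh
# Sketch: non-interactive equivalent of the parted session above.
# prepare_raid_disk is a hypothetical helper name, not part of parted.
# By default it only prints the command; set RUN=1 to actually execute it
# (this destroys the existing partition table on the given device).
prepare_raid_disk() {
    dev="$1"
    set -- parted --script "$dev" mklabel gpt mkpart primary 2048s 100%
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Dry run for the new drive:
#   prepare_raid_disk /dev/sdc
```

Double-check the device name before setting RUN=1; pointing this at the wrong disk is not recoverable.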

Next I unmounted the filesystem as a safety precaution and checked the state of the array.

root@blackbox:/home/eric# sudo umount /media/Manta_Array/
root@blackbox:/home/eric# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10] 
md0 : active raid6 sdh1[1] sdg1[7] sdb1[6] sda1[5]
 8790400512 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/4] [UU_UU]
 
unused devices: <none>
root@blackbox:/home/eric#

At this point I’ve already confirmed that the partition tables on all of the RAID member drives are identical by using the following commands.

parted /dev/sda print

parted /dev/sdb print

parted /dev/sdg print

parted /dev/sdh print

I had also already removed the failed drive from the array before the last boot with:

mdadm --manage /dev/md0 --remove /dev/sdc1

Now it’s time to add in the new drive.

root@blackbox:/home/eric# mdadm --manage /dev/md0 --add /dev/sdc1
mdadm: added /dev/sdc1
root@blackbox:/home/eric# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10] 
md0 : active raid6 sdc1[8] sdg1[7] sdb1[6] sda1[5] sdh1[1]
 8790400512 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/4] [UU_UU]
 [>....................] recovery = 0.0% (401324/2930133504) finish=730.0min speed=66887K/sec
 
unused devices: <none>
root@blackbox:/home/eric#

As you can see, the recovery has only just started. At the reported speed, the kernel estimates about 730 minutes (roughly 12 hours) of synchronization before the new drive can be put to use.
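
To keep an eye on the rebuild without reading the whole file each time, something like the following could work. It is a small sketch that assumes the recovery line keeps the format shown above; recovery_status is a hypothetical helper name.

```shell
# Sketch: pull the recovery percentage and ETA out of /proc/mdstat-style text.
# recovery_status is a hypothetical helper; it reads mdstat text on stdin
# and prints nothing if no recovery is in progress.
recovery_status() {
    sed -n 's/.*recovery = *\([0-9.]*\)%.*finish=\([0-9.]*\)min.*/\1% done, about \2 minutes remaining/p'
}

# Usage on a live system:
#   recovery_status < /proc/mdstat
```

Running it under watch (for example once a minute) gives a simple progress display.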


Meet Charles Clos: The Father of Clos Networks

Around Cumulus Networks we’re constantly talking about this thing called a Clos network. If you don’t know much about what those are, check out the following descriptions from Network World and Wikipedia:

https://www.networkworld.com/article/2226122/cisco-subnet/clos-networks–what-s-old-is-new-again.html

https://en.wikipedia.org/wiki/Clos_network

The concept was first envisioned in the 1950s by a gentleman working at Bell Labs named Charles Clos. Not a ton is known about Mr. Clos aside from his seminal paper, “A Study of Non-Blocking Switching Networks” (The Bell System Technical Journal, Volume 32, Issue 2, March 1953), which went on to make waves in the data center networking space many years later. As it turns out, non-blocking telephone networks have a lot to do with the pillars of modern data center design.

It bothered me that there were no images of Mr. Clos to be found anywhere on the internet, so one day I decided to get this mystery solved. I reached out to Bell Labs with an e-mail explaining my academic interest in Mr. Clos and asking for a picture. Before the end of that very day I received a response containing this image.

Internet, I would like to introduce you to Mr. Charles Clos…

10232017 - Scan - 171347
Description: Three-dimensional chart showing relationship of dial tone service marker occupancy and register occupancy in the #5 crossbar system. Left to right: Mr. R. I. Wilkinson, Mrs. R. D. Leonard, and Mr. C. Clos on the right.
Date: May 19, 1949

So there he is on the right, Charles Clos. I can sleep a little easier tonight having put a face to this much-revered gentleman of the data center networking space.

Making Realistic Tombstones

Halloween is my favorite holiday of the year, and to celebrate I try something a bit different each year. This year I had seen some interesting tutorials on making tombstones out of 4×8 foot, 2-inch-thick foam board. These days my biggest issue with any project is finding the time to complete it; knowing this, I started this one in July.

IMG_4794

The first step, which I do not have a picture of, was to cut out the profiles for the tombstones. We found that a single 4×8 sheet had enough material for about two whole tombstones. We built them in multiple layers: my spire tombstone was three layers thick, while my wife’s traditional tombstone was two layers thick. Each of the main pieces was also surrounded by a base that was two layers thick.

At this point the time had come to put them together. For this I carved a C-channel in between the layers and glued a piece of PVC pipe into the channel with construction adhesive. The PVC pipe serves as a guide for the rebar that holds these stones firmly in the ground.

Notice I left about 1/2″ of PVC pipe exposed. This was to be made flush by a bottom base layer made of wood. I used construction adhesive again to glue all of the layers together.

ProShot_2017_09_17_14_49_34

Here you can see I’ve cut a bottom sheet of MDF and used a paddle bit to make allowances for the extra 1/2 inch of PVC pipe to sit flush with the bottom edge. I used a heavy dose of construction adhesive again in between where the PVC conduit pokes through the bottom plate.

In the background of this photo you can also see the latex-based Drylock paint that was used as the primer coat for all the tombstones. I lathered this stuff on thick. You can get the white color from Home Depot for about $23/gallon and they’ll even throw some gray pigment in there too if you ask.

You can also see the 2-foot epoxy-coated rebar pieces I picked up from Home Depot. The coating should provide a little extra protection from rust.

ProShot_2017_10_14_14_56_15

At this point I used the Stanley Sur-form shaper to even out all the edges and make the layers look as one. I also used a Dremel tool with the 565-02 attachment to make some nice even cuts to carve out the epitaphs for the tombstones. You can use a program like rasterizer to make images or words large enough to cover your tombstone.

With epitaphs carved and everything looking smooth, it was time to put on the first 2 layers of Drylock. You can see the entire family enjoyed this.

ProShot_2017_10_14_11_00_45

Here is a glamour shot of the stones with the first two coats of Drylock complete. I also slathered Drylock on the underside.

ProShot_2017_10_14_14_56_50

From here I just bought a few darker colors to fill in the insets and reliefs in the tombstones to make them look a little more realistic. You can get by with a sample can from Home Depot, which should provide enough color for 1-2 tombstones.

Last step is to dig these bad boys into the ground and enjoy!

ProShot_2017_10_19_10_09_55

Installing the Newest Ifupdown2 on an OrangePi Nano

I’ve been pretty enamored with my OrangePi Nanos since I first got one. So enamored, in fact, that I now own 5 of them doing all manner of tasks. Being a network person, I wanted to make sure I had some of the best interface configuration software available installed, so naturally I wanted Ifupdown2.

The OrangePi Nano runs a Debian-based distribution called Armbian, which is truly awesome software. It has been stripped down and customized for specific devices to the point that it is a work of art. Since it is Debian-based, it has access to ifupdown2 natively, right in the repos. The only problem is that the packaged version is outdated, dating from November 2015. So I want to install the latest and greatest Ifupdown2.

From my armbian device:

# Install the newest version in the standard repo
sudo apt-get update -y
sudo apt-get install ifupdown2 -qy

# Now install a newer version of ifupdown2 directly from the Debian package pool
# (downloaded to /tmp and installed with sudo so this also works as a non-root user)
wget -O /tmp/ifupdown2.deb http://ftp.us.debian.org/debian/pool/main/i/ifupdown2/ifupdown2_1.0~git20170314-1_all.deb && \
sudo dpkg -i /tmp/ifupdown2.deb && \
rm -fv /tmp/ifupdown2.deb

sudo apt-cache policy ifupdown2 | grep Installed
echo "Output above should say: \"Installed: 1.0~git20170314-1\""

# Tell NetworkManager to leave eth0 unmanaged so that ifupdown2 can control it
# (the section header and the key must be on separate lines)
printf '[keyfile]\nunmanaged-devices=interface-name:eth0\n' | sudo tee -a /etc/NetworkManager/NetworkManager.conf

echo "### Before Change ###"
sudo nmcli dev status
sudo systemctl stop NetworkManager; sudo systemctl start NetworkManager

echo "### After Change ###"
sudo nmcli dev status
echo "Eth0 should now show as \"unmanaged\" according to the output above."
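
One caveat with the tee -a step above is that it appends again every time the script is re-run. The following is a rough sketch of an idempotent alternative; ensure_unmanaged is a hypothetical helper of my own, and it does not try to merge into a [keyfile] section that already exists, so treat it as a starting point only.

```shell
# ensure_unmanaged CONF IFACE (hypothetical helper): append the [keyfile]
# stanza only if the unmanaged-devices line for IFACE is not already present.
ensure_unmanaged() {
    conf="$1"
    iface="$2"
    line="unmanaged-devices=interface-name:$iface"
    # -x matches whole lines, -F treats the pattern as a fixed string
    if grep -qxF "$line" "$conf" 2>/dev/null; then
        return 0
    fi
    printf '\n[keyfile]\n%s\n' "$line" >> "$conf"
}

# Usage (as root), followed by a NetworkManager restart:
#   ensure_unmanaged /etc/NetworkManager/NetworkManager.conf eth0
```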

 

Troubleshooting Linux Software RAID (MDADM)

Recently I had the pleasure of rebooting my NAS server for some standard “maintenance” activities (kernel updates and the like). Naturally, my primary large-file-storage RAID 6 array did not come up automatically after the reboot.

This is always my worst fear when it comes to rebooting that box… what if the RAID doesn’t come back up and some hard drives were limping along and I didn’t know it?!

After reading syslog I found a number of errors which clearly indicated a READ error on /dev/sdc.

Sep 22 22:25:37 blackbox kernel: [264777.620821] ata3.00: exception Emask 0x0 SAct 0x40000 SErr 0x0 action 0x0
Sep 22 22:25:37 blackbox kernel: [264777.624084] ata3.00: irq_stat 0x40000008
Sep 22 22:25:37 blackbox kernel: [264777.627338] ata3.00: failed command: READ FPDMA QUEUED
Sep 22 22:25:37 blackbox kernel: [264777.630585] ata3.00: cmd 60/08:90:08:08:00/00:00:00:00:00/40 tag 18 ncq 4096 in
Sep 22 22:25:37 blackbox kernel: [264777.630585] res 41/40:00:09:08:00/00:00:00:00:00/40 Emask 0x409 (media error) <F>
Sep 22 22:25:37 blackbox kernel: [264777.637060] ata3.00: status: { DRDY ERR }
Sep 22 22:25:37 blackbox kernel: [264777.640253] ata3.00: error: { UNC }
Sep 22 22:25:37 blackbox kernel: [264777.644587] ata3.00: configured for UDMA/133
Sep 22 22:25:37 blackbox kernel: [264777.644609] sd 2:0:0:0: [sdc] tag#18 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 22 22:25:37 blackbox kernel: [264777.644616] sd 2:0:0:0: [sdc] tag#18 Sense Key : Medium Error [current] [descriptor] 
Sep 22 22:25:37 blackbox kernel: [264777.644622] sd 2:0:0:0: [sdc] tag#18 Add. Sense: Unrecovered read error - auto reallocate failed
Sep 22 22:25:37 blackbox kernel: [264777.644628] sd 2:0:0:0: [sdc] tag#18 CDB: Read(16) 88 00 00 00 00 00 00 00 08 08 00 00 00 08 00 00
Sep 22 22:25:37 blackbox kernel: [264777.644632] blk_update_request: I/O error, dev sdc, sector 2057

There are several common sources of this kind of error, and I recommend ruling out the first three before blaming the drive.

  1. Bad SATA cables: Sometimes the old ones are of low quality, and sometimes they get finicky or have dust covering the pins. Whatever the reason, SATA cables are cheap, so buy a few new ones and try those.
  2. Bad SATA port: The specific port on your controller could be failing. If this is a port on your motherboard and you have another one available, try that one. If not, consider buying a SATA controller card.
  3. Bad SATA controller card: These cards can be quite inexpensive. I’ve seen several of them fail in my lifetime, often with spurious read errors like this one as the first symptom of a larger failure in the card.
  4. Bad hard drive: This one is the most obvious, of course, but the items above should really be investigated first. While I’ve had more hard drive failures than any of the issues above, I have also received a freshly RMA’d replacement drive only to have it fail as well, because one of the issues above was the real culprit.

One other way to investigate item #4 is the smartctl utility, available on Linux via `sudo apt-get install smartmontools`. smartctl provides a TON of useful data from the SMART controller on the disk, and it will let you know if your drive has already logged other concerning errors.

One of the most important sections is the attribute table:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       115
  3 Spin_Up_Time            0x0027   181   176   021    Pre-fail  Always       -       5941
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       951
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   049   049   000    Old_age   Always       -       37920
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       93
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       57
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       893
194 Temperature_Celsius     0x0022   120   109   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

My takeaway is that after 37920 hours in operation (37920 / 24 / 365 = 4.33 years), and having already checked items 1-3, perhaps it was finally time to let this drive go.
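
For future reference, the lookup and the arithmetic can be scripted. This is just a sketch: smart_raw and hours_to_years are hypothetical helper names, and it assumes the attribute table has the ten-column layout shown above, with the raw value in the last column.

```shell
# smart_raw NAME: print the RAW_VALUE column for one attribute,
# reading `smartctl -A`-style text on stdin (hypothetical helper).
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# hours_to_years HOURS: convert power-on hours to years
# (24 hours per day, 365 days per year, same as the sum above).
hours_to_years() {
    awk -v h="$1" 'BEGIN { printf "%.2f\n", h / 24 / 365 }'
}

# Usage on a live system (assumes the drive is /dev/sdc):
#   hours=$(sudo smartctl -A /dev/sdc | smart_raw Power_On_Hours)
#   hours_to_years "$hours"
hours_to_years 37920    # prints 4.33
```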

Making that decision is half the battle; now it’s time to recover the array.

A Reconstructive RAID Cheat Sheet

It is for moments like this that I’m writing myself this guide for later.

Since I run RAID 6 with five 3TB drives, I can lose 2 of the 5 drives and still be OK. Since it looks like I’ve only lost 1, the RAID should have been able to function, but for whatever reason it showed as inactive at boot time.

eric@blackbox:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : inactive sdb1[1](S) sda1[7](S) sdd1[6](S) sde1[5](S)
 11720536064 blocks super 1.2
 
unused devices: <none>
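
As a quick sanity check on the RAID 6 arithmetic: two drives' worth of space goes to parity, so usable capacity is (n - 2) times the per-drive size. The sketch below assumes a per-member size of 2930133504 1K blocks, which is what this array uses on each partition; raid6_usable_blocks is a hypothetical helper name.

```shell
# raid6_usable_blocks DRIVES BLOCKS_PER_DRIVE (hypothetical helper):
# RAID 6 spends two drives' worth of space on parity,
# so usable = (drives - 2) * per-drive size.
raid6_usable_blocks() {
    echo $(( ($1 - 2) * $2 ))
}

raid6_usable_blocks 5 2930133504   # prints 8790400512
```

That result matches the 8790400512 blocks that /proc/mdstat reports for md0 once the array is active.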

Identify your Hard disks

root@blackbox:/home/eric# sudo fdisk -l | grep "2.7 TiB"
Disk /dev/sda: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Disk /dev/sdb: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Disk /dev/sdd: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Disk /dev/sde: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Disk /dev/sdc: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors

/dev/sdc is my bad drive, so I’m going to remove it from the array.

root@blackbox:/home/eric# mdadm --manage /dev/md0 --remove /dev/sdc1
mdadm: hot remove failed for /dev/sdc1: No such device or address

In this case, we can see from the mdstat output above that /dev/sdc1 was never added to the array at boot time, so the remove operation failed. Instead, the array needs to be reassembled from its remaining members, and for that I first need to stop it.

root@blackbox:/home/eric# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
root@blackbox:/home/eric#  cat /proc/mdstat 
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] 
unused devices: <none>

Alright, let’s reassemble this puppy. I had a clean power-down event before, so everything should still be mostly in order.

root@blackbox:/home/eric# mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdd1 /dev/sde1
mdadm: /dev/md0 has been started with 4 drives (out of 5).
root@blackbox:/home/eric# cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid6 sda1[7] sdd1[6] sde1[5] sdb1[1]
 8790400512 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/4] [UU_UU]
 
unused devices: <none>
root@blackbox:/home/eric# sudo mount /dev/md0 /media/Manta_Array/

And now, with the array remounted and a new drive on order, life is back to normal.
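
Since my worst fear is a drive limping along without my knowing, a check like the following could be run from cron. It is only a sketch: md_missing is a hypothetical helper, and it assumes the status line keeps the "[5/4] [UU_UU]" format shown above.

```shell
# md_missing: read /proc/mdstat-style text on stdin and print, for each
# array status line, how many member devices are missing (hypothetical helper).
md_missing() {
    sed -n 's|.*\[\([0-9][0-9]*\)/\([0-9][0-9]*\)\] \[[U_]*\].*|\1 \2|p' |
    while read -r total active; do
        echo $(( total - active ))
    done
}

# Usage on a live system:
#   md_missing < /proc/mdstat
```

For the degraded array above it would print 1; a healthy five-drive array ("[5/5] [UUUUU]") would give 0.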

Some Additional Commands I Find Useful:

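Detect the present state and write it to your RAID configuration file (on Debian and Ubuntu this file normally lives at /etc/mdadm/mdadm.conf, so adjust the path to suit)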

mdadm --verbose --detail --scan > /etc/mdadm.conf

Assemble an existing array by UUID (optional)

mdadm --assemble --scan --uuid=f6ff12cd:86a8e3fb:89bc0f58:ad15e3e2

Learn about RAID info stored on each individual drive (Superblock)

mdadm --examine /dev/sda1

 

Adding New Fonts in Bulk to Ubuntu 16.04

This process should also work for 12.04 and 14.04.

Create a new directory under /usr/share/fonts

sudo mkdir -p /usr/share/fonts/opentype/newfonts

Place all OTF or TTF files in that directory.

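Fix the permissions on the new fonts (directories 755, files 644), then run the font-caching utility to make them available to applications immediately.

sudo find /usr/share/fonts/opentype/newfonts -type d -exec chmod 755 {} +
sudo find /usr/share/fonts/opentype/newfonts -type f -exec chmod 644 {} +
sudo fc-cache -fv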
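
If you do this often, the copy and permission steps can be wrapped up in a small function. This is a minimal sketch; install_fonts is a hypothetical helper name, and it only matches lowercase .otf and .ttf extensions.

```shell
# install_fonts SRC DEST (hypothetical helper): copy every .otf/.ttf file
# from SRC into DEST and make it world-readable but not executable.
install_fonts() {
    src="$1"
    dest="$2"
    mkdir -p "$dest" && chmod 755 "$dest"
    for f in "$src"/*.otf "$src"/*.ttf; do
        [ -e "$f" ] || continue      # skip patterns that matched nothing
        cp "$f" "$dest"/ && chmod 644 "$dest/$(basename "$f")"
    done
}

# Usage (as root), then rebuild the font cache:
#   install_fonts ~/Downloads/fonts /usr/share/fonts/opentype/newfonts
#   fc-cache -fv
```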

Other Methods

There are other methods available on modern Ubuntu as well. For individual fonts you can just double-click on them and click the “install” option in the upper right, or you can use a purpose-built program like font-manager.

sudo apt-get install font-manager

Using Active/Backup Bonding (mode 1) with Ifupdown2

Ifupdown2 is a very useful interface configuration utility with tons of enhancements over the stock ifupdown utility. It was built with a specific initial use case in mind: network operating systems (NOS) like Cumulus Linux. Cumulus requires LACP support as the primary bonding method. Other modes like active-backup (mode 1) were not initially fully implemented in ifupdown2. This is changing, however: CM-14985 brings support for the bond-primary keyword and will be present in the next release of Cumulus Linux and the next version of Ifupdown2.

To hold you over until then, here’s a workaround I’ve been using for active/backup bonding on my home server running Ifupdown2. Writing the bonding driver’s sysfs file directly from a pre-up hook provides the same behavior.

auto lo
iface lo inet loopback

auto enp4s0
iface enp4s0
 alias Motherboard Ethernet 
 mtu 9194

auto enxf01e341f95
iface enxf01e341f95
 alias USB3 Ethernet
 mtu 9194

auto bond0
iface bond0
 alias ActiveBackup Uplink
 bond-mode active-backup
 bond-slaves enxf01e341f95 enp4s0
 address 192.168.1.10/24
 gateway 192.168.1.1
 mtu 9194
 pre-up echo enp4s0 > /sys/class/net/bond0/bonding/primary
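
To confirm the workaround took effect, the bonding driver's status file can be inspected after bringing the bond up. The helper below is a small sketch (bond_field is a hypothetical name) that assumes the usual "Field: value" layout of /proc/net/bonding/bond0.

```shell
# bond_field FIELD: print the value of the first "Field: value" line from
# /proc/net/bonding/<bond>-style text on stdin (hypothetical helper).
bond_field() {
    awk -F': ' -v key="$1" '$1 == key { print $2; exit }'
}

# Usage on a live system:
#   bond_field "Primary Slave" < /proc/net/bonding/bond0
#   bond_field "Currently Active Slave" < /proc/net/bonding/bond0
# The active interface is also exposed at:
#   /sys/class/net/bond0/bonding/active_slave
```

If the primary was set correctly, both fields should name enp4s0 whenever that link is up.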