Move a Ceph OSD Journal

Here are the steps to move a Ceph OSD journal. I previously described how to set up a journal partition here. In my case I am moving the journal for OSD 8. You can look up the correct partition UUID with: ls -l /dev/disk/by-partuuid/

Don’t forget to set (and later unset) noout so Ceph doesn’t start rebalancing when the OSD in question temporarily disappears.

# ceph osd set noout
set noout
# systemctl stop ceph-osd@8
# ceph-osd -i 8 --flush-journal
2017-04-25 23:16:38.445663 7f93bd7f1800 -1 flushed journal /var/lib/ceph/osd/ceph-8/journal for object store /var/lib/ceph/osd/ceph-8
# rm -f /var/lib/ceph/osd/ceph-8/journal
# ln -s /dev/disk/by-partuuid/5bc920ad-20ce-4e07-b879-f8d32556c65a /var/lib/ceph/osd/ceph-8/journal
# echo "5bc920ad-20ce-4e07-b879-f8d32556c65a" > /var/lib/ceph/osd/ceph-8/journal_uuid
# ceph-osd -i 8 --mkjournal
2017-04-25 23:21:06.353543 7f0eaf792800 -1 created new journal /var/lib/ceph/osd/ceph-8/journal for object store /var/lib/ceph/osd/ceph-8
# systemctl start ceph-osd@8
# ceph osd unset noout
unset noout

Issue ceph osd tree to make sure everything is up and in, and you are good to go.
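
For reference, a healthy tree looks roughly like this (the IDs, weights, and host name here are illustrative, not taken from my cluster):

# ceph osd tree
ID WEIGHT  TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 2.00000 root default
-2 2.00000     host node1
 8 1.00000         osd.8        up  1.00000          1.00000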

Docker to solve SuperMicro IPMI iKVM – JavaWS Problems

icedtea-web 1.6.2 does not seem to work with SuperMicro’s IPMI Java iKVM viewer. SuperMicro’s helpful response is to only use Oracle’s Java.

net.sourceforge.jnlp.LaunchException: Fatal: Initialization Error: Could not initialize application. The application has not been initialized, for more information execute javaws from the command line.

Even when you have the right version of Java you often have to jump through security hoops or juggle Java versions just to get it to work.

If you have Docker installed there is a great solution that avoids installing Oracle’s Java and/or tweaking any security settings. solarkennedy has created a very nice Docker container that encapsulates everything needed to access various Java-based IPMI consoles.

 docker run -p 8080:8080 solarkennedy/ipmi-kvm-docker

Now point your browser to http://localhost:8080 and voila:

You are looking at a Java enabled Firefox (and OS) through a web VNC client accessed from the Docker host. Not bad!
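
If you want the viewer available in the background rather than tied to a terminal, standard Docker flags do the trick (the container name here is my choice, not part of the project):

 docker run -d --name ipmi-kvm -p 8080:8080 solarkennedy/ipmi-kvm-docker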

MariaDB Crashing Under Docker on Google F1 Micro Instance

This website is being hosted on a Google F1 Micro Instance with 600MB of memory. A few days after enabling Jetpack I noticed the website had a DB connection error.

First I checked the running containers: docker ps

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                             PORTS                   NAMES
f1bcda68c62d        wordpress           "docker-entrypoint..."   6 hours ago         Up 6 hours                         10.128.0.2:80->80/tcp   dockerwordpress_wordpress_1
55bb57dfdc8d        mariadb             "docker-entrypoint..."   6 hours ago         Restarting (1) About an hour ago                           dockerwordpress_mariadb_1

Then I viewed the logs for the restarting container: docker logs dockerwordpress_mariadb_1

2017-04-04  8:28:46 139747895928768 [Note] mysqld (mysqld 10.1.21-MariaDB-1~jessie) starting as process 1 ...
2017-04-04  8:28:46 139747895928768 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2017-04-04  8:28:46 139747895928768 [Note] InnoDB: The InnoDB memory heap is disabled
2017-04-04  8:28:46 139747895928768 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2017-04-04  8:28:46 139747895928768 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2017-04-04  8:28:46 139747895928768 [Note] InnoDB: Compressed tables use zlib 1.2.8
2017-04-04  8:28:46 139747895928768 [Note] InnoDB: Using Linux native AIO
2017-04-04  8:28:46 139747895928768 [Note] InnoDB: Using SSE crc32 instructions
2017-04-04  8:28:46 139747895928768 [Note] InnoDB: Initializing buffer pool, size = 256.0M
InnoDB: mmap(281542656 bytes) failed; errno 12
2017-04-04  8:28:46 139747895928768 [ERROR] InnoDB: Cannot allocate memory for the buffer pool
2017-04-04  8:28:46 139747895928768 [ERROR] Plugin 'InnoDB' init function returned error.
2017-04-04  8:28:46 139747895928768 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2017-04-04  8:28:47 139747895928768 [ERROR] mysqld: Out of memory (Needed 128663552 bytes)
2017-04-04  8:28:47 139747895928768 [ERROR] mysqld: Out of memory (Needed 96485376 bytes)
2017-04-04  8:28:47 139747895928768 [ERROR] mysqld: Out of memory (Needed 72351744 bytes)
2017-04-04  8:28:47 139747895928768 [Note] Plugin 'FEEDBACK' is disabled.
2017-04-04  8:28:47 139747895928768 [ERROR] Unknown/unsupported storage engine: InnoDB
2017-04-04  8:28:47 139747895928768 [ERROR] Aborting

The MariaDB process was killed abruptly before the failed restarts; there is no log output showing a clean shutdown. The guilty memory hog on the system seems to be the WordPress container.

To stop this from happening again I made two changes:

  • First I modified the docker-compose.yml to constrain the memory used by the WordPress container by adding the mem_limit directive:
wordpress:
    image: wordpress
    restart: always
    mem_limit: 200MB
    links:
     - mariadb:mysql
    environment:
     - WORDPRESS_DB_PASSWORD=db_password
    ports:
     - "80:80"
    volumes:
     - /site_data/code:/code
     - /site_data/html:/var/www/html
mariadb:
    image: mariadb
    restart: always
    environment:
     - MYSQL_ROOT_PASSWORD=db_password
     - MYSQL_DATABASE=wordpress
    volumes:
     - /site_data/database:/var/lib/mysql

This seems to have had no major negative effects on Apache.
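
You can confirm the limit took effect with docker inspect (the 209715200 here is just 200MB expressed in bytes):

$ docker inspect -f '{{.HostConfig.Memory}}' dockerwordpress_wordpress_1
209715200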

  • Next (just to be safe) I enabled 1024MB of disk swap. By default Docker allows a container’s combined memory and swap usage to reach twice the memory limit, so in this case 400MB.
# dd if=/dev/zero of=/swap bs=1M count=1024
# mkswap /swap
# swapon /swap
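
mkswap will warn that the swap file is world-readable; tightening the permissions and adding an fstab entry (so the swap survives a reboot) is worth doing:

# chmod 0600 /swap
# echo "/swap none swap sw 0 0" >> /etc/fstab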

You can check that swap is available and working with free -m:

             total       used       free     shared    buffers     cached
Mem:           588        543         44         43         13        170
-/+ buffers/cache:        359        228
Swap:         1023         32        991

Finally, after bringing up the WordPress and MariaDB containers you can check their memory utilization with docker stats:

CONTAINER      CPU %   MEM USAGE / LIMIT       MEM %    NET I/O             BLOCK I/O           PIDS
05cee6b27a54   0.00%   177.4 MiB / 200 MiB     88.72%   6.31 MB / 1.33 MB   43 MB / 24.6 kB     11
c6658a81bd3a   0.03%   119.1 MiB / 588.5 MiB   20.23%   799 kB / 5.9 MB     50.2 MB / 89.8 MB   29

Benchmarking Ceph on a Two Node Proxmox Cluster

It is inadvisable to run Ceph on two nodes! That said, I’ve been using a two-node Ceph cluster as my primary data store for several weeks now.

In this post we look at the relative read and write performance of replicated and non-replicated Ceph pools, both via RADOS Bench and from VM guests using various storage backends. We’ll start with the results; the details of how we generated them are included after the break.

Read More…

Adding a new Ceph OSD to Proxmox

In this post I describe the process of adding a new OSD to an existing Proxmox Ceph cluster, including placing the OSD journal on a dedicated SSD. Let’s start by checking the drive’s health.

# smartctl -a /dev/sdc | grep -i _sector
  5 Reallocated_Sector_Ct   0x0033   100   100   010   Pre-fail   Always   -   0
197 Current_Pending_Sector  0x0012   100   100   000   Old_age    Always   -   0

Both raw values are zero, i.e. no reallocated or pending sectors, so the drive looks healthy.

Read More…

libvirt – adding storage pools manually

I use direct disk pass-through for several of my KVM guests. I usually use Virt-Manager to set these up, but a bug in the latest version (1.2.1) made that impossible.

Fortunately it’s pretty easy to add drives using virsh. First check the existing storage pools:

$ virsh pool-list --all
 Name        State      Autostart
-------------------------------------------
 Backup      active     yes
 BigParity   inactive   yes
 default     active     yes
 Parity      active     yes

Create a storage pool XML file; look at the existing pools in /etc/libvirt/storage/ for reference. Create the file locally:

$ cat Parity5TB.xml
<pool type='disk'>
  <name>Parity5TB</name>
  <uuid>8a4550e0-3bcf-4351-ad36-496b51737c</uuid>
  <capacity unit='bytes'>0</capacity>
  <allocation unit='bytes'>0</allocation>
  <available unit='bytes'>0</available>
  <source>
    <device path='/dev/disk/by-id/ata-TOSHIBA_MD04ACA500_55F'/>
    <format type='unknown'/>
  </source>
  <target>
    <path>/dev/disk/by-id</path>
    <permissions>
      <mode>0711</mode>
      <owner>-1</owner>
      <group>-1</group>
    </permissions>
  </target>
</pool>


Note that I use /dev/disk/by-id. You can use any /dev/disk/by-* reference, but NEVER use /dev/sd* (you’ll understand why the first time you add or remove a drive).

Assuming it’s already formatted (I find it easiest to format on the host with GParted and pass through the pre-formatted disk) you can quickly get the UUID with blkid. Then either use /dev/disk/by-uuid, or look up the symbolic links in the relevant /dev/disk/by-* directory.
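
For example (the device name and output values here are placeholders):

$ blkid /dev/sdf1
/dev/sdf1: UUID="<filesystem-uuid>" TYPE="ext4" PARTUUID="<partition-uuid>"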

Add the pool to your definitions:

$ virsh pool-define Parity5TB.xml
$ virsh pool-list --all
 Name        State      Autostart
-------------------------------------------
 Backup      active     yes
 BigParity   inactive   yes
 default     active     yes
 Parity      active     yes
 Parity5TB   active     no

That’s it. This does not autostart the pool or attach it to any guests, but you can still do both through virt-manager.
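
If you’d rather stay on the command line, the equivalent virsh commands (not part of my original workflow) are:

$ virsh pool-start Parity5TB
$ virsh pool-autostart Parity5TB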

iperf for testing HDMI balun runs

It seems reasonable that jitter will be a big factor in the performance of CAT5e/6 runs used for HDMI baluns. I ran two high-quality unshielded CAT5e cables and two high-quality shielded (though I did not ground them) CAT6 cables.

They cross over multiple power lines and, unfortunately, run parallel to power lines for about 11 feet of the ~30 foot run.

Testing Jitter:

On the server:

iperf -s -w 128k -u

This starts an iperf server with a 128KB buffer in UDP mode. UDP best reflects the nature of HDMI-over-Ethernet traffic.

On the client:

iperf -c serverip -u -b 1000m -w 128k -t 120

This starts the client, connects to serverip, and sends UDP traffic at gigabit speed for 120 seconds. I needed to run the test for at least 120 seconds to get consistent jitter results.

First I tested with a 3 foot CAT5e cable connected directly to the switch; this represents ‘ideal’ performance. The same short cable was then connected to the ends of the long CAT5e and CAT6 runs.

ID         Interval         Transfer      Bandwidth       Jitter     Lost/Total Datagrams
IDEAL      0.0-120.0 sec    9.65 GBytes   691 Mbits/sec   0.013 ms   47713/7095556 (0.67%)
STP-CAT6   0.0-120.0 sec    9.62 GBytes   689 Mbits/sec   0.015 ms   26399/7052134 (0.37%)
CAT5e      0.0-120.0 sec    9.57 GBytes   685 Mbits/sec   0.180 ms   108956/7100704 (1.5%)

So there we have it. From a raw bandwidth perspective the long CAT5e run offers essentially the same performance as the STP-CAT6 run; when it comes to jitter, however, the CAT6 cable is only marginally worse than the ideal case, while the CAT5e run is an order of magnitude worse.

Very interesting! In the end I’m sure it’s the shielding rather than the CAT6 rating that makes the STP-CAT6 superior.

Enable Built-in Wifi on Pogoplug v3/Oxnas Running Debian Squeeze

I’ve not had much luck with the Pogoplug lottery. With Arch Linux ARM EOL on the oxnas Pogoplugs, I’ve been running Debian Squeeze with the latest archlinuxarm kernel: 2.6.31.6_SMP_820.

Turns out my Pogoplug Biz had built-in wifi, but getting it to work wasn’t straightforward.

These are the steps I remember off the top of my head.

It’s worth installing:

$ sudo apt-get install pciutils iw wireless-tools

Make sure you do indeed have a PCIe wireless card:

$ lspci
00:00.0 Network controller: RaLink RT3090 Wireless 802.11n 1T/1R PCIe

In my case the card wasn’t loaded correctly (it didn’t show in the output of ifconfig).

Although the device didn’t show in ifconfig, it did show in iwconfig as ra1.

ifconfig ra1 up gave me a permissions error.

I followed these instructions:

# create a mount point and mount the stock Pogoplug firmware (ubifs) read-only
sudo mkdir -p /tmp/ce
sudo mount -t ubifs -o ro ubi0:rootfs /tmp/ce
# copy the RaLink driver's config directory out of the stock firmware
sudo mkdir /etc/Wireless
sudo /bin/cp -rfv /tmp/ce/etc/Wireless/RT2860STA /etc/Wireless
sudo nano /etc/udev/rules.d/70-persistent-net.rules

Nothing was working so I restarted out of frustration.

$ lsmod
cfg80211 85932 1 rt3390sta

Looks good. I issued:
modprobe cfg80211

Now, for the first time, I could bring the interface up without errors:
sudo ifconfig ra1 up

And I could scan for networks:

iwlist ra1 scanning

Now the most painful part: actually getting it to connect to your wireless network!

# bring the interface up and give the driver a moment to settle
ifconfig ra1 up
sleep 3
# configure the RaLink driver through its iwpriv interface
iwpriv ra1 set WpaSupport=0
iwpriv ra1 set WirelessMode=Managed
iwpriv ra1 set WirelessMode=7
iwpriv ra1 set AuthMode=WPA2PSK
iwpriv ra1 set EncrypType=AES
iwpriv ra1 set SSID="MySSID"
iwpriv ra1 set WPAPSK="MyPassword"

sleep 1
# request an IP address on the now-associated interface
dhcpcd ra1

Here the most important line was WpaSupport=0; before adding it, the interface would come up but never associate with the access point. Note that the SSID and password are quoted. I saw many places saying they should be unquoted, but quoted values worked for me.

Finally, the WirelessMode=7 refers to a mixed n/g network. I’m not sure this line is even needed.
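
To avoid retyping all of this after each reboot, one option (a sketch I haven’t battle-tested, not part of the original steps) is to save the command sequence as a script and call it from /etc/rc.local:

# save the commands above as /usr/local/sbin/wifi-up.sh, then:
chmod +x /usr/local/sbin/wifi-up.sh
# and add this line to /etc/rc.local before the final "exit 0":
/usr/local/sbin/wifi-up.sh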

Drive Performance Under KVM Using Virtio

Using KVM I was experiencing erratic disk I/O performance on my OpenMediaVault guest. Aside from the OS volume (8GB on a Vertex Plus SSD) I had six 3TB drives:

4 * Seagate ST3000DM001
2 * Toshiba DT01ACA300

I added all disks using virt-manager:

Add Storage-Pool->disk: Physical Disk Device->Source Path:/dev/disk/by-id/[DISK_ID]

The only subtle variation when adding the drives was that the two Toshibas were blank, so they were added with Format = auto and Build Pool: unchecked.

The Seagates had existing, but unwanted, partitions. An apparent bug in virt-manager meant I could not delete the pre-existing partitions, so I had to add them with Format = gpt and Build Pool: checked.

I was under the impression that in both cases the raw drive would be presented to the guest… so let’s take a look at the resulting performance.

Read benchmark (substitute the device under test for /dev/sdX):
hdparm -t --direct /dev/sdX
Write benchmark:
dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync

On Host
Model: OCZ Vertex Plus [/dev/sda], TOSHIBA DT01ACA300 [/dev/sdc], Seagate ST3000DM001 [/dev/sde]
Read:  221.93MB/s, 186.89MB/s, 169.22MB/s, 179.48MB/s
Write: 158.67MB/s, N/A, N/A

On Guest
Model: OCZ Vertex Plus [/dev/vda], TOSHIBA DT01ACA300 [/dev/sdc], Seagate ST3000DM001 [/dev/sde]
Read:  126.23MB/s, 185.86MB/s, 98.87MB/s
Write: 122MB/s, 117.67MB/s, 88.83MB/s

That’s a huge difference in performance between the Toshiba and the Seagate. Not only that, but the read/write performance on the Seagates was extremely unstable.

Let’s take a look at the Storage Volume configurations for these drives.

Slow Drive

<pool type='disk'>
  <name>Media1</name>
  <uuid>2c5a4e7b-6d61-9644-4162-c97cf11185e4</uuid>
  <capacity unit='bytes'>0</capacity>
  <allocation unit='bytes'>0</allocation>
  <available unit='bytes'>0</available>
  <source>
    <device path='/dev/disk/by-id/ata-ST3000DM001-9YN166_S1F0T0Q6'/>
    <format type='gpt'/>
  </source>
  <target>
    <path>/dev/disk/by-id</path>
    <permissions>
      <mode>0711</mode>
      <owner>-1</owner>
      <group>-1</group>
    </permissions>
  </target>
</pool>

Fast Drive

<pool type='disk'>
  <name>Backup</name>
  <uuid>b445343e-39e7-ff85-2c31-ba331ae10311</uuid>
  <capacity unit='bytes'>0</capacity>
  <allocation unit='bytes'>0</allocation>
  <available unit='bytes'>0</available>
  <source>
    <device path='/dev/disk/by-id/ata-TOSHIBA_DT01ACA300_Y3DBDBMGS'/>
    <format type='unknown'/>
  </source>
  <target>
    <path>/dev/disk/by-id</path>
    <permissions>
      <mode>0711</mode>
      <owner>-1</owner>
      <group>-1</group>
    </permissions>
  </target>
</pool>

Due to the quirky behaviour of virt-manager, it had seemingly created a nested GPT partition table inside my pre-existing partition, and this extra layer of abstraction was causing the performance issues.
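
The telling difference between the two pool definitions is <format type='gpt'/> versus <format type='unknown'/>. A plausible fix (my extrapolation, I have not re-run the benchmarks this way) is to wipe the stale partition table and redefine the pool so the raw disk is passed through:

# WARNING: this destroys the partition table on the disk, so double-check the by-id path
sgdisk --zap-all /dev/disk/by-id/ata-ST3000DM001-9YN166_S1F0T0Q6
virsh pool-destroy Media1   # stops the pool (does not delete data)
virsh pool-edit Media1      # change <format type='gpt'/> to <format type='unknown'/>
virsh pool-start Media1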