Category: Ceph

Deploy Ceph Rados Gateway on Docker Swarm for Proxmox Cluster

I want to use the features exposed by the Ceph Rados Gateway (RGW). While it is possible to install this directly on the Proxmox nodes, it is not supported.

I wondered if I could run the gateway on Docker Swarm instead. The long story is that I want to try NFS via RGW as an alternative to CephFS (which has been a bit of a pain to manage in the past). Typically you would run multiple instances of RGW, but Swarm already provides HA, so perhaps I only need one.

The first official Docker image I found for RGW (ceph/radosgw) was ancient – two years old! Not encouraging, but I tried it anyway. It choked with:

connect protocol feature mismatch, my ffffffffffffff < peer 4fddff8eea4fffb missing 400000000000000

Well, that's a clear-as-mud way of saying that my Ceph and RGW versions didn't match. It turns out that Ceph doesn't maintain the RGW image and expects people to use the all-in-one ceph/daemon image. Let's try again:

$ docker service create --name radosgw \
     --mount type=bind,src=/data/docker/ceph/etc,dst=/etc/ceph \
     --mount type=bind,src=/data/docker/ceph/lib,dst=/var/lib/ceph/ \
     -e RGW_CIVETWEB_PORT=7480 \
     -e RGW_NAME=proxceph \
     --publish 7480:7480 \
     --replicas=1 \
 ceph/daemon rgw
overall progress: 0 out of 1 tasks 
1/1: preparing 
verify: Detected task failure 

This never stabilizes. This is in the logs:

2018-06-25 05:16:14  / ERROR- /var/lib/ceph/bootstrap-rgw/ceph.keyring must exist. You can extract it from your current monitor by running 'ceph auth get client.bootstrap-rgw -o /var/lib/ceph/bootstrap-rgw/ceph.keyring',

This file was auto-generated by the PVE Ceph installation, so copy it to the path exposed to the docker service.
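On my setup that boils down to something like the following, run on a node with cluster admin access (the destination simply mirrors the /data/docker/ceph/lib bind mount from the service definition above; adjust paths for your own layout):

mkdir -p /data/docker/ceph/lib/bootstrap-rgw
ceph auth get client.bootstrap-rgw -o /data/docker/ceph/lib/bootstrap-rgw/ceph.keyring

With the keyring in place the service comes up: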

2018-06-25 05:23:37  / SUCCESS
exec: PID 197: spawning /usr/bin/radosgw --cluster ceph --setuser ceph --setgroup ceph -d -n client.rgw.proxceph -k /var/lib/ceph/radosgw/ceph-rgw.proxceph/keyring --rgw-socket-path= --rgw-zonegroup= --rgw-zone= --rgw-frontends=civetweb port=
2018-06-25 05:23:37.584 7fcccee8a8c0  0 framework: civetweb
2018-06-25 05:23:37.584 7fcccee8a8c0  0 framework conf key: port, val:
2018-06-25 05:23:37.588 7fcccee8a8c0  0 deferred set uid:gid to 167:167 (ceph:ceph)
2018-06-25 05:23:37.588 7fcccee8a8c0  0 ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable), process (unknown), pid 197
2018-06-25 05:23:49.100 7fcccee8a8c0  0 starting handler: civetweb
2018-06-25 05:23:49.116 7fcccee8a8c0  1 mgrc service_daemon_register rgw.proxceph metadata {arch=x86_64,ceph_release=mimic,ceph_version=ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable),ceph_version_short=13.2.0,cpu=Common KVM processor,distro=centos,distro_description=CentOS Linux 7 (Core),distro_version=7,frontend_config#0=civetweb port=,frontend_type#0=civetweb,hostname=e326bccb3712,kernel_description=#154-Ubuntu SMP Fri May 25 14:15:18 UTC 2018,kernel_version=4.4.0-128-generic,mem_swap_kb=4190204,mem_total_kb=4046012,num_handles=1,os=Linux,pid=197,zone_id=f2928dc9-3983-46ff-9da9-2987f3639bb6,zone_name=default,zonegroup_id=16963e86-4e7d-4152-99f0-c6e9ae4596a4,zonegroup_name=default}

It looks like RGW auto-created some pools, which was expected.

# ceph osd lspools
[..], 37 .rgw.root,38 default.rgw.control,39 default.rgw.meta,40 default.rgw.log,
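At this point a quick sanity check is to hit the published port anonymously from any swarm node (the routing mesh publishes 7480 everywhere); it should answer with an empty ListAllMyBucketsResult XML document rather than a connection error:

curl http://localhost:7480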

Now I need to create a user that can use the RGW, which means getting a shell on the container.
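That is just a docker exec against whatever task container Swarm started; the container name below is made up, so check docker ps on the node running the task first:

docker ps --filter name=radosgw
docker exec -it radosgw.1.xxxxxxxxxxxx bash

From inside the container: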

# radosgw-admin user create --uid="media" --display-name="media"
{
    "user_id": "media",
    "display_name": "media",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "auid": 0,
    "subusers": [],
    "keys": [
        {
            "user": "media",
            "access_key": "XXXXXXXX",
            "secret_key": "XXXXXXXXXXXXXXXXXXX"
        }
    ],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "temp_url_keys": [],
    "type": "rgw",
    "mfa_ids": []
}

The next step is a little beyond the scope of this "guide". I already have haproxy in place with a valid wildcard Let's Encrypt cert, and I pointed *.my.domain (i.e. bucket hostnames) at RGW. For this to work properly you need to set

          rgw dns name = my.domain 

in your ceph.conf.
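For reference, the haproxy side is roughly the sketch below; the cert path and backend names/addresses are invented for illustration (Swarm's ingress mesh publishes port 7480 on every node, so any node can serve as a backend):

frontend https-in
    bind *:443 ssl crt /etc/haproxy/certs/my.domain.pem
    # send anything under my.domain to the RGW backend
    acl host_rgw hdr_end(host) -i .my.domain
    use_backend radosgw if host_rgw

backend radosgw
    # either swarm node reaches the single RGW task via the routing mesh
    server swarm1 192.168.1.21:7480 check
    server swarm2 192.168.1.22:7480 check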

Let's test that everything works:

$ cat
import boto.s3.connection

access_key = 'JU1P3CIATBP1IK297D3H'
secret_key = 'sV0dGfVSbClQFvbCUM22YivwcXyUmyQEOBqrDsy6'
conn = boto.connect_s3(
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        host='rgw.my.domain', port=443,
        is_secure=True, calling_format=boto.s3.connection.OrdinaryCallingFormat(),
        )

bucket = conn.create_bucket('my-test-bucket')
for bucket in conn.get_all_buckets():
    print "{name} {created}".format(name=bucket.name, created=bucket.creation_date)

N.B. rgw.my.domain stands in for the real hostname, which has a valid cert and is directed to RGW on my internal network. Also note that is_secure is set to True as I do have SSL termination at haproxy.

The output:

$ python 
my-test-bucket 2018-06-25T06:01:19.258Z


Ceph, SolarFlare and Proxmox – slow requests are blocked

Are you seeing lots of `slow requests are blocked` errors during high throughput on your Ceph storage?

We were experiencing serious issues on two Supermicro nodes with IOMMU enabled (keywords: dmar, dma pte, vpfn), but even on our ASRock C2750 system things weren't behaving as they should.

We were tearing our hair out trying to figure out what was going on, especially as we had been using these Solarflare dual SFP+ 10Gb NICs for non-Ceph purposes for years.

The answer in this case was to manually install the sfc driver from Solarflare's website (kudos to Solarflare for providing actively maintained driver releases covering 5+ year old hardware, by the way).

Kernel: 4.15.17-2-pve

Check existing driver:

$ modinfo sfc
version:        4.1

Download the driver:

Install alien, kernel headers and dkms:

apt-get install alien pve-headers dkms

Extract the RPM and convert to .deb:

alien -c sfc-dkms-

Build and install:

dpkg -i sfc-dkms_4.13.1.1034-1_all.deb


Check driver was updated correctly:
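Re-running the earlier check should now report the dkms-built module rather than the in-kernel 4.1 driver; the version string should match the package (you may need to reload the sfc module or reboot before the new build is actually in use):

$ modinfo sfc
version:        4.13.1.1034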


After this we experienced no further slow request warnings or timed out file transfers even under intense sustained IO.

Move a Ceph OSD Journal

Here are the steps to move a Ceph OSD journal. I previously described how to set up a journal partition here. In my case I am moving the journal on OSD 8. You can look up the correct UUID using: ls -l /dev/disk/by-partuuid/

Don't forget to set (and unset) noout so Ceph doesn't start rebalancing when the OSD in question temporarily disappears.

# ceph osd set noout
set noout
# systemctl stop ceph-osd@8.service
# ceph-osd -i 8 --flush-journal
2017-04-25 23:16:38.445663 7f93bd7f1800 -1 flushed journal /var/lib/ceph/osd/ceph-8/journal for object store /var/lib/ceph/osd/ceph-8
# rm -f /var/lib/ceph/osd/ceph-8/journal
# ln -s /dev/disk/by-partuuid/5bc920ad-20ce-4e07-b879-f8d32556c65a /var/lib/ceph/osd/ceph-8/journal
# echo "5bc920ad-20ce-4e07-b879-f8d32556c65a" > /var/lib/ceph/osd/ceph-8/journal_uuid
# ceph-osd -i 8 --mkjournal
2017-04-25 23:21:06.353543 7f0eaf792800 -1 created new journal /var/lib/ceph/osd/ceph-8/journal for object store /var/lib/ceph/osd/ceph-8
# systemctl start ceph-osd@8.service
# ceph osd unset noout
unset noout
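It doesn't hurt to confirm the journal symlink resolves to the new partition before moving on:

ls -l /var/lib/ceph/osd/ceph-8/journal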

Issue ceph osd tree to make sure everything is up and in and you are good to go.

Benchmarking Ceph on a Two Node Proxmox Cluster

It is inadvisable to run Ceph on two nodes! That said, I've been using a two-node Ceph cluster as my primary data store for several weeks now.

In this post we look at the relative read and write performance of replicated and non-replicated Ceph pools using Rados Bench and from VM Guests using various backends. We’ll start with the results – the details of how we generated them are included after the break.

Read More

Adding a new Ceph OSD to Proxmox

In this post I describe the process to add a new OSD to an existing Promox Ceph cluster including the placement of the OSD journal on a dedicated SSD. Let’s start by checking the drive health.

# smartctl -a /dev/sdc | grep -i _sector
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0

Read More