Category: Docker

Pi-Hole on Docker Swarm (behind SSL proxy)

This is my simple config for running Pi-Hole on Docker Swarm. pfSense is configured as a DNS forwarder pulling from the three docker swarm nodes. I only run one instance of Pi-Hole (it needs to lock its SQLite DB), but docker swarm takes care of availability/resiliency.

As I hit Pi-Hole through an SSL-terminating proxy, I set ServerIP to 0.0.0.0. This resolves blocked domains to 0.0.0.0, with no major side effects.

docker service create --name pihole \
    --mount type=bind,src=/data/docker/pihole/pihole,dst=/etc/pihole \
    --mount type=bind,src=/data/docker/pihole/dnsmasq.d,dst=/etc/dnsmasq.d \
    --replicas=1 \
    -e ServerIP=0.0.0.0 \
    -e VIRTUAL_HOST=pihole.my.domain \
    -e WEBPASSWORD=myPassword \
    --publish published=9053,target=80,protocol=tcp \
    --publish published=53,target=53,protocol=tcp \
    --publish published=53,target=53,protocol=udp \
    diginc/pi-hole:debian_dev
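
With the service published through Swarm's routing mesh, any node should answer DNS queries, and blocked domains should come back as 0.0.0.0. A quick sanity check (node names are hypothetical placeholders):

# Query each swarm node directly; substitute your own node names.
for node in swarm1 swarm2 swarm3; do
    dig +short @"$node" doubleclick.net   # a blocked domain should return 0.0.0.0
    dig +short @"$node" example.com       # an allowed domain should resolve normally
done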

Deploy Ceph Rados Gateway on Docker Swarm for Proxmox Cluster

I want to use the features exposed by the Ceph Rados Gateway (RGW). While it is possible to install this directly on the Proxmox nodes, it is not supported.

I wondered if I could run the gateway on Docker Swarm. The long story is that I want to try NFS via RGW as an alternative to CephFS (which has been a bit of a pain to manage in the past). It seems that typically you run multiple instances of RGW, but in this case Swarm already provides HA, so perhaps I only need one.

The first official docker image I found for RGW (ceph/radosgw) was ancient – two years old! Not encouraging, but I tried it anyway. It choked with:

connect protocol feature mismatch, my ffffffffffffff < peer 4fddff8eea4fffb missing 400000000000000

Well, that's a clear-as-mud way of saying that my Ceph and RGW versions didn't match. It turns out that Ceph doesn't maintain the RGW image and expects people to use the all-in-one ceph/daemon image. Let's try again:

$ docker service create --name radosgw \
     --mount type=bind,src=/data/docker/ceph/etc,dst=/etc/ceph \
     --mount type=bind,src=/data/docker/ceph/lib,dst=/var/lib/ceph/ \
     -e RGW_CIVETWEB_PORT=7480 \
     -e RGW_NAME=proxceph \
     --publish 7480:7480 \
     --replicas=1 \
     ceph/daemon rgw
wzj49gor6tfffs3uv3mdyy9sd
overall progress: 0 out of 1 tasks 
1/1: preparing 
verify: Detected task failure 

This never stabilizes, and the logs show why:

2018-06-25 05:16:14  /entrypoint.sh: ERROR- /var/lib/ceph/bootstrap-rgw/ceph.keyring must exist. You can extract it from your current monitor by running 'ceph auth get client.bootstrap-rgw -o /var/lib/ceph/bootstrap-rgw/ceph.keyring',

This file was auto-generated by the PVE Ceph installation, so copy it to the path exposed to the docker service.
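
Something like this on one of the Proxmox nodes does the trick (a sketch; the destination path assumes the bind mounts used above):

# The PVE-generated keyring lives under /var/lib/ceph on the host; copy it
# into the directory that the service bind-mounts to /var/lib/ceph in the container.
mkdir -p /data/docker/ceph/lib/bootstrap-rgw
cp /var/lib/ceph/bootstrap-rgw/ceph.keyring /data/docker/ceph/lib/bootstrap-rgw/

With the keyring in place the service comes up: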

2018-06-25 05:23:37  /entrypoint.sh: SUCCESS
exec: PID 197: spawning /usr/bin/radosgw --cluster ceph --setuser ceph --setgroup ceph -d -n client.rgw.proxceph -k /var/lib/ceph/radosgw/ceph-rgw.proxceph/keyring --rgw-socket-path= --rgw-zonegroup= --rgw-zone= --rgw-frontends=civetweb port=0.0.0.0:7480
2018-06-25 05:23:37.584 7fcccee8a8c0  0 framework: civetweb
2018-06-25 05:23:37.584 7fcccee8a8c0  0 framework conf key: port, val: 0.0.0.0:7480
2018-06-25 05:23:37.588 7fcccee8a8c0  0 deferred set uid:gid to 167:167 (ceph:ceph)
2018-06-25 05:23:37.588 7fcccee8a8c0  0 ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable), process (unknown), pid 197
2018-06-25 05:23:49.100 7fcccee8a8c0  0 starting handler: civetweb
2018-06-25 05:23:49.116 7fcccee8a8c0  1 mgrc service_daemon_register rgw.proxceph metadata {arch=x86_64,ceph_release=mimic,ceph_version=ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable),ceph_version_short=13.2.0,cpu=Common KVM processor,distro=centos,distro_description=CentOS Linux 7 (Core),distro_version=7,frontend_config#0=civetweb port=0.0.0.0:7480,frontend_type#0=civetweb,hostname=e326bccb3712,kernel_description=#154-Ubuntu SMP Fri May 25 14:15:18 UTC 2018,kernel_version=4.4.0-128-generic,mem_swap_kb=4190204,mem_total_kb=4046012,num_handles=1,os=Linux,pid=197,zone_id=f2928dc9-3983-46ff-9da9-2987f3639bb6,zone_name=default,zonegroup_id=16963e86-4e7d-4152-99f0-c6e9ae4596a4,zonegroup_name=default}

It looks like RGW auto-created some pools. This was expected.

# ceph osd lspools
[..], 37 .rgw.root,38 default.rgw.control,39 default.rgw.meta,40 default.rgw.log,

Now I need to create a user that can use the RGW. Get a shell on the container:
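
One way to do that (a hedged sketch; the container ID will differ):

# On the node running the task, find the container and exec into it.
docker ps --filter name=radosgw
docker exec -it <container-id> bash

Then create the user: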

[root@e326bccb3712 /]# radosgw-admin user create --uid="media" --display-name="media"
{
    "user_id": "media",
    "display_name": "media",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "auid": 0,
    "subusers": [],
    "keys": [
        {
            "user": "media",
            "access_key": "XXXXXXXX",
            "secret_key": "XXXXXXXXXXXXXXXXXXX"
        }
    ],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "temp_url_keys": [],
    "type": "rgw",
    "mfa_ids": []
}

The next step is a little beyond the scope of this "guide". I already have haproxy in place with a valid wildcard letsencrypt cert, and I pointed all bucket.my.domain names at RGW. For this to work properly you need to set

    rgw dns name = my.domain

in your ceph.conf.

Let's test that everything works:

$ cat s3test.py
import boto.s3.connection

access_key = 'JU1P3CIATBP1IK297D3H'
secret_key = 'sV0dGfVSbClQFvbCUM22YivwcXyUmyQEOBqrDsy6'
conn = boto.connect_s3(
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        host='rgw.my.domain', port=443,
        is_secure=True, calling_format=boto.s3.connection.OrdinaryCallingFormat(),
       )

bucket = conn.create_bucket('my-test-bucket')
for bucket in conn.get_all_buckets():
    print "{name} {created}".format(
        name=bucket.name,
        created=bucket.creation_date,
    )

N.B. rgw.my.domain stands in for the real hostname, which has a valid cert and points to RGW on my internal network. Also note that is_secure is set to True, as SSL terminates at haproxy.

The output:

$ python s3test.py 
my-test-bucket 2018-06-25T06:01:19.258Z

Cool!
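
As a final hedged check, virtual-hosted-style bucket URLs should now also work through the proxy (hostname substituted as above):

# Anonymous access should yield a 403 from RGW rather than a proxy error,
# confirming that bucket.my.domain DNS-style addressing reaches the gateway.
curl -I https://my-test-bucket.my.domain/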

Attaching Docker Swarm Services to an Overlay Network

When I originally configured Prometheus with a variety of exporters I had it scraping ports on a specific docker swarm host. This is fragile: if that host goes down, the underlying service will pop back up on a different host, but Prometheus won't be able to scrape it. I considered using haproxy to round-robin across the docker swarm nodes, but Kubernetes can resolve services by service name. Is there really no way to do this in Docker Swarm?

There is, but unlike Kubernetes, services can't resolve each other by default. We must create an overlay network and attach the services to it.

Before:

/prometheus $ nslookup unifi_exporter
Server:    127.0.0.11
Address 1: 127.0.0.11

nslookup: can't resolve 'unifi_exporter'

Create overlay network:

sudo docker network create -d overlay monitoring
tb3iw12k7xaw5olz7rasdcnm0

Redeploy Prometheus on network:

docker service create --replicas 1 --name prometheus \
    --mount type=bind,source=/data/docker/prometheus/config/prometheus.yml,destination=/etc/prometheus/prometheus.yml \
    --mount type=bind,src=/data/docker/prometheus/data,dst=/prometheus \
    --publish published=9090,target=9090,protocol=tcp \
    --network monitoring \
    prom/prometheus

Redeploy our exporter, this time attached to the overlay network. Note we no longer need to publish a port.

docker service create --replicas 1 --name unifi_exporter \
    --mount type=bind,src=/data/docker/unifi-exporter/config.yml,dst=/config.yml \
    --mount type=bind,src=/etc/ssl,dst=/etc/ssl,readonly \
    --network monitoring \
    louisvernon/unifi_exporter:0.4.0-18-g85455df -config.file=/config.yml

Confirm Prometheus can resolve the exporter by service name:

/prometheus $ nslookup unifi_exporter
Server:    127.0.0.11
Address 1: 127.0.0.11

Name:      unifi_exporter
Address 1: 10.0.1.15
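
The payoff is that Prometheus targets can now reference the service name directly instead of a published port on a specific host. A sketch of the corresponding scrape config (compare the unifi_exporter post below):

  - job_name: 'unifi_exporter'
    static_configs:
      - targets: ['unifi_exporter:9130']
        labels:
          alias: unifi_exporter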

Unifi to Grafana (using Prometheus and unifi_exporter)

Documenting the process of getting this up and running. We already had Prometheus and Grafana running on our docker swarm cluster (we promise to document this all one day).

There was only one up-to-date image of unifi_exporter on Docker Hub, and it had no documentation, so we were not comfortable using it.

1) Download, build and push unifi_exporter.

$ git clone git@github.com:mdlayher/unifi_exporter.git
...
$ cd unifi_exporter
$ sudo docker build -t louisvernon/unifi_exporter:$(git describe --tags) . # yields a tag like 0.4.0-18-g85455df
$ sudo docker push louisvernon/unifi_exporter:$(git describe --tags)

2) Create a read-only admin user on the UniFi controller for the unifi_exporter service.

3) Create config.yml on storage mounted on the docker swarm nodes. In our case we have a glusterfs volume mounted across all nodes. If you are using the self-signed cert on your UniFi controller then you will need to set insecure to true.

$ cat /data/docker/unifi-exporter/config.yml
listen:
  address: :9130
  metricspath: /metrics
unifi:
  address: https://unifi.vern.space
  username: unifiexporter
  password: random_password
  site: Default 
  insecure: false
  timeout: 5s

4) Deploy to docker swarm. The docker image does not contain any trusted certs, so we mounted the host certs as readonly.

$ docker service create --replicas 1 --name unifi_exporter \
    --mount type=bind,src=/data/docker/unifi-exporter/config.yml,dst=/config.yml \
    --mount type=bind,src=/etc/ssl,dst=/etc/ssl,readonly \
    --publish 9130:9130 \
    louisvernon/unifi_exporter:0.4.0-18-g85455df -config.file=/config.yml

5) You should see something like this in the logs (we use Portainer to quickly inspect our services).

2018/06/12 01:10:47 [INFO] successfully authenticated to UniFi controller
2018/06/12 01:10:47 Starting UniFi exporter on ":9130" for site(s): Default

The first time around (before we bind-mounted /etc/ssl) we hit an x509 error due to the missing trusted certs.

6) Add unifi_exporter as a new target for prometheus.

$ cat /data/docker/prometheus/config/prometheus.yml
...
  - job_name: 'unifi_exporter'
    static_configs:
      - targets: ['dockerswarm:9130']
        labels:
          alias: unifi_exporter
...

7) Point your browser at http://dockerswarm:9130/metrics and make sure you see stats. In our case the payload was 267 lines.
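
Or from a shell (same host and port as the Prometheus target):

curl -s http://dockerswarm:9130/metrics | head    # spot-check a few metric lines
curl -s http://dockerswarm:9130/metrics | wc -l   # we saw 267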

8) Restart the prometheus service: `docker service update --force prometheus`

9) Hop on over to prometheus to make sure the new target is listed and UP: http://dockerswarm:9090/targets

10) Finally we import the dashboard into Grafana. Our options are a little sparse right now, but this dashboard gives us somewhere to start. We made some tweaks to make it multi-AP friendly, with some extra stats.

Quick GlusterFS Volume Creation Steps

Here are some quick steps to create a replicated GlusterFS volume across three drives on three nodes, for use by docker swarm. We are not using LVM for this quick test, so we lose features like snapshotting.

1) Create brick mount point on each node

mkdir -p /data/glusterfs/dockerswarm/brick1

2) Format the drives with xfs

 mkfs.xfs -f -i size=512 /dev/sd_

3) Add drives to fstab

/dev/disk/by-id/ata_ /data/glusterfs/dockerswarm/brick1  xfs rw,inode64,noatime,nouuid      1 2

4) Mount

mount /data/glusterfs/dockerswarm/brick1

5) Create volume mount point under brick mount point*

mkdir -p /data/glusterfs/dockerswarm/brick1/brick

6) Create volume

$ gluster volume create dockerswarm replica 3 transport tcp server1:/data/glusterfs/dockerswarm/brick1/brick server2:/data/glusterfs/dockerswarm/brick1/brick server3:/data/glusterfs/dockerswarm/brick1/brick

volume create: dockerswarm: success: please start the volume to access data

* The reason we create the brick directory inside the mount point is to ensure the drive is actually mounted on the host. If it is not, the brick directory will be absent and gluster will treat the brick as unavailable, rather than silently writing to the root filesystem.
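
To actually use the volume it still needs to be started and mounted on each node. A minimal sketch; the /data mount point is an assumption to match the bind mounts used elsewhere in these posts:

# Start the new volume, as the success message instructs.
gluster volume start dockerswarm

# Mount it via the GlusterFS FUSE client on each swarm node.
mkdir -p /data
mount -t glusterfs localhost:/dockerswarm /data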

Docker to solve SuperMicro IPMI iKVM – JavaWS Problems

IcedTea-Web 1.6.2 does not seem to work with SuperMicro's IPMI Java iKVM viewer. SuperMicro's helpful response is to only use Oracle's Java.

net.sourceforge.jnlp.LaunchException: Fatal: Initialization Error: Could not initialize application. The application has not been initialized, for more information execute javaws from the command line.

Even when you have the right version of Java, you often have to dance through security hoops just to get it to work.

If you have Docker installed there is a great solution that avoids installing Oracle’s Java and/or tweaking any security settings. solarkennedy has created a very nice Docker container that encapsulates everything needed to access various Java based IPMI consoles.

docker run -p 8080:8080 solarkennedy/ipmi-kvm-docker

Now point your browser to http://localhost:8080 and voila:

You are looking at a Java enabled Firefox (and OS) through a web VNC client accessed from the Docker host. Not bad!

MariaDB Crashing Under Docker on Google F1 Micro Instance

This website is hosted on a Google f1-micro instance with 600MB of memory. A few days after enabling Jetpack I noticed the website had a DB connection error.

First I checked the running containers: docker ps

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                             PORTS                   NAMES
f1bcda68c62d        wordpress           "docker-entrypoint..."   6 hours ago         Up 6 hours                         10.128.0.2:80->80/tcp   dockerwordpress_wordpress_1
55bb57dfdc8d        mariadb             "docker-entrypoint..."   6 hours ago         Restarting (1) About an hour ago                           dockerwordpress_mariadb_1

Then I viewed the logs for the restarting container: docker logs dockerwordpress_mariadb_1

2017-04-04  8:28:46 139747895928768 [Note] mysqld (mysqld 10.1.21-MariaDB-1~jessie) starting as process 1 ...
2017-04-04  8:28:46 139747895928768 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2017-04-04  8:28:46 139747895928768 [Note] InnoDB: The InnoDB memory heap is disabled
2017-04-04  8:28:46 139747895928768 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2017-04-04  8:28:46 139747895928768 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2017-04-04  8:28:46 139747895928768 [Note] InnoDB: Compressed tables use zlib 1.2.8
2017-04-04  8:28:46 139747895928768 [Note] InnoDB: Using Linux native AIO
2017-04-04  8:28:46 139747895928768 [Note] InnoDB: Using SSE crc32 instructions
2017-04-04  8:28:46 139747895928768 [Note] InnoDB: Initializing buffer pool, size = 256.0M
InnoDB: mmap(281542656 bytes) failed; errno 12
2017-04-04  8:28:46 139747895928768 [ERROR] InnoDB: Cannot allocate memory for the buffer pool
2017-04-04  8:28:46 139747895928768 [ERROR] Plugin 'InnoDB' init function returned error.
2017-04-04  8:28:46 139747895928768 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2017-04-04  8:28:47 139747895928768 [ERROR] mysqld: Out of memory (Needed 128663552 bytes)
2017-04-04  8:28:47 139747895928768 [ERROR] mysqld: Out of memory (Needed 96485376 bytes)
2017-04-04  8:28:47 139747895928768 [ERROR] mysqld: Out of memory (Needed 72351744 bytes)
2017-04-04  8:28:47 139747895928768 [Note] Plugin 'FEEDBACK' is disabled.
2017-04-04  8:28:47 139747895928768 [ERROR] Unknown/unsupported storage engine: InnoDB
2017-04-04  8:28:47 139747895928768 [ERROR] Aborting

The MariaDB process was abruptly terminated before the failed restarts, and there is no log output showing a clean shutdown (presumably the kernel OOM killer at work). The guilty memory hog on the system seems to be the dockerwordpress container.

To stop this from happening again I made two changes:

  • First I modified the docker-compose.yml to constrain the memory used by dockerwordpress by adding the mem_limit directive:
wordpress:
    image: wordpress
    restart: always
    mem_limit: 200MB
    links:
     - mariadb:mysql
    environment:
     - WORDPRESS_DB_PASSWORD=db_password
    ports:
     - "80:80"
    volumes:
     - /site_data/code:/code
     - /site_data/html:/var/www/html
mariadb:
    image: mariadb
    restart: always
    environment:
     - MYSQL_ROOT_PASSWORD=db_password
     - MYSQL_DATABASE=wordpress
    volumes:
     - /site_data/database:/var/lib/mysql

This seems to have had no major negative effects on Apache.
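
Note the new limit only applies once the container is re-created, e.g. with the stock Compose workflow:

docker-compose up -d   # re-creates any container whose configuration changed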

  • Next (just to be safe) I enabled 1024MB of disk swap. By default Docker allows a container to use memory plus swap up to twice its memory limit, so in this case 400MB in total.
# dd if=/dev/zero of=/swap bs=1M count=1024
# mkswap /swap
# swapon /swap

You can check that swap is available and working with free -m:

             total       used       free     shared    buffers     cached
Mem:           588        543         44         43         13        170
-/+ buffers/cache:        359        228
Swap:         1023         32        991
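
To keep the swap file across reboots, add it to /etc/fstab (a standard sketch, not from the original setup):

# Tighten permissions as mkswap recommends, then persist the mount.
chmod 600 /swap
echo '/swap none swap sw 0 0' >> /etc/fstab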

Finally, after bringing up the WordPress and MariaDB containers, you can check their memory utilization with docker stats:

CONTAINER           CPU %               MEM USAGE / LIMIT       MEM %               NET I/O             BLOCK I/O           PIDS
05cee6b27a54        0.00%               177.4 MiB / 200 MiB     88.72%              6.31 MB / 1.33 MB   43 MB / 24.6 kB     11
c6658a81bd3a        0.03%               119.1 MiB / 588.5 MiB   20.23%              799 kB / 5.9 MB     50.2 MB / 89.8 MB   29