Adding a new Ceph OSD to Proxmox

In this post I describe the process of adding a new OSD to an existing Proxmox Ceph cluster, including placing the OSD journal on a dedicated SSD. Let’s start by checking the health of the new drive.

root@myhost2:$ smartctl -a /dev/sdc | grep -i _sector
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
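
Both counters read zero. As an extra quick check (a suggestion beyond the original run), smartctl can also report the drive’s overall self-assessment:

root@myhost2:$ smartctl -H /dev/sdc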

The journal will go on an SSD (/dev/sdl). I need to create a new partition on the SSD, in the gap between sdl1 and sdl3.

root@myhost2:$ parted /dev/sdl
(parted) p
Number Start End Size File system Name Flags
1 1049kB 5370MB 5369MB ceph journal
3 10.7GB 16.1GB 5369MB ceph journal
4 16.1GB 21.5GB 5369MB ceph journal
5 21.5GB 26.8GB 5369MB ceph journal

(parted) mkpart "ceph journal" 5370MB 10.7GB
(parted) p

Number Start End Size File system Name Flags
1 1049kB 5370MB 5369MB ceph journal
2 5370MB 10.7GB 5369MB ceph journal
3 10.7GB 16.1GB 5369MB ceph journal
4 16.1GB 21.5GB 5369MB ceph journal
5 21.5GB 26.8GB 5369MB ceph journal
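
The same partition could have been created non-interactively (a sketch using parted’s scripted mode; the offsets must match the gap on your own disk):

root@myhost2:$ parted -s /dev/sdl mkpart "ceph journal" 5370MB 10.7GB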

Creating the partition isn’t enough; ceph:ceph must also own it:

root@myhost2:$ chown ceph:ceph /dev/sdl2
root@myhost2:$ ls -l /dev/sdl*
brw-rw---- 1 root disk 8, 176 Mar 27 17:50 /dev/sdl
brw-rw---- 1 ceph ceph 8, 177 Mar 27 18:04 /dev/sdl1
brw-rw---- 1 ceph ceph 8, 178 Mar 27 17:47 /dev/sdl2
brw-rw---- 1 ceph ceph 8, 179 Mar 27 18:04 /dev/sdl3
brw-rw---- 1 ceph ceph 8, 180 Mar 27 18:04 /dev/sdl4
brw-rw---- 1 ceph ceph 8, 181 Mar 27 18:04 /dev/sdl5

Ceph looks for journal partitions with the partition type GUID 45b0969e-9b03-4f30-b4c6-b4b80ceff106:

root@myhost2:$ blkid -o udev -p /dev/sdl2 | grep TYPE
ID_PART_ENTRY_TYPE=0fc63daf-8483-4772-8e79-3d69d8477de4

Let’s fix the partition type.

root@myhost2:$ sgdisk -t 2:45B0969E-9B03-4F30-B4C6-B4B80CEFF106 /dev/sdl
root@myhost2:$ blkid -o udev -p /dev/sdl2 | grep TYPE
ID_PART_ENTRY_TYPE=45b0969e-9b03-4f30-b4c6-b4b80ceff106
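
This GUID is also what makes the ceph:ceph ownership persistent: a plain chown on a device node does not survive a reboot, but Ceph ships a udev rule keyed on the journal type GUID that re-applies the ownership on every boot. Paraphrased from /lib/udev/rules.d/95-ceph-osd.rules (verify against your installed copy):

ACTION=="add", SUBSYSTEM=="block", ENV{DEVTYPE}=="partition", \
  ENV{ID_PART_ENTRY_TYPE}=="45b0969e-9b03-4f30-b4c6-b4b80ceff106", \
  OWNER:="ceph", GROUP:="ceph", MODE:="660"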

Now zap the existing disk. I’m not sure why ceph-disk zap prints the warnings below. I’ve also had success dd’ing /dev/zero over the start of the drive, which apparently leaves the disk in a cleaner state than ceph-disk zap.
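
The dd wipe looks something like this (a sketch: it zeroes the first 100 MiB, taking out the partition table and filesystem superblocks; triple-check the target device first):

root@myhost2:$ dd if=/dev/zero of=/dev/sdc bs=1M count=100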

root@myhost2:$ ceph-disk zap /dev/sdc
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.
Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
on the recovery & transformation menu to examine the two tables.
Warning! One or more CRCs don't match. You should repair the disk!
****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.
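
To confirm the zap left a clean table (a suggestion, not part of the original run):

root@myhost2:$ sgdisk -p /dev/sdc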

Finally, add the OSD:

root@myhost2:$ pveceph createosd /dev/sdc -journal_dev /dev/sdl2
create OSD on /dev/sdc (xfs)
using device '/dev/sdl2' for journal
Creating new GPT entries.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.
prepare_device: OSD will not be hot-swappable if journal is not the same device as the osd data
Setting name!
partNum is 0
REALLY setting name!
The operation has completed successfully.
meta-data=/dev/sdc1 isize=2048 agcount=4, agsize=183141597 blks
= sectsz=4096 attr=2, projid32bit=1
= crc=0 finobt=0
data = bsize=4096 blocks=732566385, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=0
log =internal log bsize=4096 blocks=357698, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.

Looks good! In the Proxmox GUI I can see Ceph re-balancing onto the new OSD. Let’s do some quick sanity checking.
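
The rebalance can also be watched from the CLI (a suggestion; ceph -w streams cluster events as placement groups move):

root@myhost2:$ ceph -w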

root@myhost2:$ ceph osd stat
osdmap eXXX: 12 osds: 12 up, 12 in; 307 remapped pgs
flags noout,sortbitwise,require_jewel_osds

12 of 12 OSDs are up and in. Looks good!

root@myhost2:$ ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 26.31421 root default
-2 12.69534 host myhost1
0 0.44969 osd.0 up 1.00000 1.00000
1 0.44969 osd.1 up 1.00000 1.00000
2 0.44969 osd.2 up 1.00000 1.00000
5 0.44969 osd.5 up 1.00000 1.00000
6 1.81360 osd.6 up 1.00000 1.00000
9 4.54149 osd.9 up 1.00000 1.00000
10 4.54149 osd.10 up 1.00000 1.00000
-3 13.61887 host myhost2
3 2.72279 osd.3 up 1.00000 1.00000
8 1.81360 osd.8 up 1.00000 1.00000
11 2.72279 osd.11 up 1.00000 1.00000
12 3.63199 osd.12 up 1.00000 1.00000
4 2.72769 osd.4 up 1.00000 1.00000

I just added the OSD to myhost2, and osd.4 is indeed the one that was just created. It happens to be listed last under myhost2, but I wouldn’t rely on ordering alone; you can always confirm by checking where the OSD’s data partition is mounted with df.
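
For example (the mount point is the standard FileStore layout; the output should show /dev/sdc1 mounted there):

root@myhost2:$ df -h /var/lib/ceph/osd/ceph-4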

Let’s confirm the journal is in the right place.

root@myhost2:$ ls -l /var/lib/ceph/osd/ceph-4 | grep journal
lrwxrwxrwx 1 ceph ceph 58 Mar 27 18:06 journal -> /dev/disk/by-partuuid/5ea73833-978a-423d-8230-31c479b23f78
-rw-r--r-- 1 ceph ceph 37 Mar 27 18:06 journal_uuid
root@myhost2:$ ls -l /dev/disk/by-partuuid/5ea73833-978a-423d-8230-31c479b23f78
lrwxrwxrwx 1 root root 10 Mar 27 18:06 /dev/disk/by-partuuid/5ea73833-978a-423d-8230-31c479b23f78 -> ../../sdl2
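
ceph-disk can also map data partitions to their journals in one shot (a suggestion; it should report /dev/sdc1 as ceph data for osd.4 with journal /dev/sdl2):

root@myhost2:$ ceph-disk list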

Great! Everything looks good.
