Ceph, SolarFlare and Proxmox – slow requests are blocked

Are you seeing lots of `slow requests are blocked` errors during high throughput on your Ceph storage?

We were experiencing serious issues on two Supermicro nodes with IOMMU enabled (keywords: dmar dma pte vpfn), but even on our ASRock C2750 system things weren’t behaving as they should.
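To check whether a node shows the same IOMMU symptoms, one rough approach is to search the kernel log for those keywords. This is only a sketch: the pattern is an assumption built from the keywords above, and `iommu_hits` is a made-up helper name.

```shell
# Hypothetical helper: print kernel log lines on stdin that mention DMAR
# together with PTE or vPFN (case-insensitive), based on the keywords above.
iommu_hits() {
    grep -iE 'dmar.*(pte|vpfn)'
}

# dmesg usually needs root; a missing or unreadable log is treated as empty.
if dmesg 2>/dev/null | iommu_hits; then
    echo "IOMMU/DMAR messages found in the kernel log"
else
    echo "no matching DMAR messages in the kernel log"
fi
```

Any matching lines are printed before the summary message, so they can be copied straight into a search engine.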

We were tearing our hair out trying to figure out what was going on, especially as we had been using our Solarflare dual SFP+ 10Gb NICs for non-Ceph purposes for years.

The answer in this case was to manually install the sfc driver from Solarflare’s website (kudos to Solarflare for providing active driver releases covering 5+ year-old hardware, btw).

Kernel: 4.15.17-2-pve

Check existing driver:

$ modinfo sfc
---
version:        4.1
---
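If you want to script this check, something like the following might do. It is a minimal sketch that assumes GNU `sort -V` is available; `module_version` and `version_lt` are illustrative helper names, and the target version is simply the one installed later in this post.

```shell
# Print the version field of a kernel module, or nothing if it is not found.
module_version() {
    modinfo -F version "$1" 2>/dev/null
}

# Succeed if version $1 is strictly older than version $2 (relies on GNU sort -V).
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

current="$(module_version sfc)" || current=""
target="4.13.1.1034"   # the driver version installed later in this post

if [ -z "$current" ]; then
    echo "sfc module not found"
elif version_lt "$current" "$target"; then
    echo "sfc $current is older than $target"
else
    echo "sfc $current is already at or above $target"
fi
```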

Download the driver:
https://channel.solarflare.com/index.php/component/cognidox/?file=SF-104979-LS-37_Solarflare_NET_driver_source_DKMS.zip&task=download&format=raw&id=1945

Install alien, kernel headers and dkms:

apt-get install alien pve-headers dkms
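DKMS can only build against headers that match the running kernel, so it may be worth confirming they are present before continuing. The sketch below assumes the usual `/lib/modules/<release>/build` link and that Proxmox names its versioned header packages `pve-headers-<release>`; `headers_dir` is a hypothetical helper.

```shell
# Path where the build tree for a given kernel release is normally linked.
headers_dir() {
    printf '/lib/modules/%s/build\n' "$1"
}

kernel="$(uname -r)"
if [ -e "$(headers_dir "$kernel")" ]; then
    echo "headers for $kernel look installed"
else
    # Assumption: versioned header packages are named pve-headers-<release>.
    echo "headers for $kernel not found; try: apt-get install pve-headers-$kernel"
fi
```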

Extract the RPM from the downloaded zip and convert it to a .deb:

alien -c sfc-dkms-4.13.1.1034-0.sf.1.noarch.rpm

Build and install:

dpkg -i sfc-dkms_4.13.1.1034-1_all.deb
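Before rebooting, it can help to confirm that DKMS actually built the module for the running kernel. This is a hedged sketch: the exact `dkms status` output format differs between DKMS releases, so the matching here is deliberately loose, and `dkms_installed_for` is a made-up helper name.

```shell
# Hypothetical helper: read `dkms status` output on stdin and succeed if
# module $1 is reported as installed for kernel release $2.
dkms_installed_for() {
    grep -E "^$1[,/]" | grep -F -- "$2" | grep -q 'installed'
}

if dkms status 2>/dev/null | dkms_installed_for sfc "$(uname -r)"; then
    echo "sfc is built and installed for $(uname -r)"
else
    echo "sfc is not installed via DKMS for $(uname -r)"
fi
```

If the second message appears, running `dkms status` by hand and reading its output is the next step.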

Reboot.

Check that the driver was updated correctly:

$ modinfo sfc
---
version:        4.13.1.1034
---
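Note that `modinfo` reports the module file on disk, which is not necessarily the one currently loaded. A rough way to check the running driver is to read its version from sysfs. This is a sketch that assumes the module exposes a version there; `loaded_module_version` is an illustrative name.

```shell
# Print the version of a currently loaded module, read from sysfs.
# $2 is the sysfs root (defaults to /sys); it is a parameter only so the
# function can be pointed at a test directory.
loaded_module_version() {
    cat "${2:-/sys}/module/$1/version" 2>/dev/null
}

loaded="$(loaded_module_version sfc)" || loaded=""
if [ -n "$loaded" ]; then
    echo "running sfc driver: $loaded"
else
    echo "sfc is not loaded (or exposes no version in sysfs)"
fi
```

Running `ethtool -i` against the NIC's interface name should report the same driver and version.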

After this we experienced no further slow request warnings or timed out file transfers even under intense sustained IO.
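To keep an eye on this yourself, a small check along these lines could be run during a heavy transfer. It is only a sketch: it assumes the warning text still contains "slow requests" in your Ceph release, and `count_slow_requests` is a hypothetical helper.

```shell
# Hypothetical helper: count lines on stdin that mention slow requests.
# grep -c exits non-zero when the count is 0, hence the `|| true`.
count_slow_requests() {
    grep -ci 'slow requests' || true
}

# `ceph health detail` prints the current health warnings; if the ceph CLI
# is unavailable the count is simply 0.
n="$(ceph health detail 2>/dev/null | count_slow_requests)"
echo "health lines mentioning slow requests: $n"
```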
