Cluster Management

Operational commands for managing, scaling, and maintaining your Ceph cluster. Run these from the Admin node (ceph-mgr-1).

1. Host Labels

Labels tell the orchestrator (cephadm) where to place specific daemons.

View, Add, and Remove Labels

bash
# View current host status
sudo ceph orch host ls

# Add 'mon' label to a host
sudo ceph orch host label add ceph-osd-1 mon

# Remove a label
sudo ceph orch host label rm ceph-osd-1 mon
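
When several hosts need the same label, the add command above can be wrapped in a loop. This is a minimal sketch: label_hosts is a hypothetical helper name, and the host names in the usage line are the ones used elsewhere on this page.

```bash
# Add one label to every host passed on the command line.
# Stops at the first host that fails so the problem is easy to spot.
label_hosts() {
    label=$1
    shift
    for host in "$@"; do
        sudo ceph orch host label add "$host" "$label" || return 1
    done
}

# Usage (not run here):
#   label_hosts mon ceph-osd-1 ceph-osd-2 ceph-osd-3
```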

2. Monitor Placement (MON)

Ceph monitors form a quorum by strict majority, so deploy an odd number of them (typically 3 or 5). An even count adds a daemon without adding fault tolerance.
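
The arithmetic behind the odd-number guidance can be checked in plain shell. This sketch does not talk to the cluster, and mon_failures_tolerated is a made-up name used only for illustration.

```bash
# A quorum needs a strict majority of monitors: floor(n/2) + 1.
# Whatever is left over is the number of monitors that may fail.
mon_failures_tolerated() {
    n=$1
    majority=$(( n / 2 + 1 ))
    echo $(( n - majority ))
}

for n in 3 4 5; do
    echo "$n monitors tolerate $(mon_failures_tolerated "$n") failure(s)"
done
```

Three and four monitors both survive a single failure, which is why the fourth monitor buys nothing; the next useful step up is five.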

Placement Strategy

You can either place monitors on specific hosts or use labels for auto-deployment.

bash
# Option A: Explicit placement
sudo ceph orch apply mon --placement="ceph-osd-1,ceph-osd-2,ceph-osd-3"

# Option B: Label-based placement
sudo ceph orch apply mon --placement="label:mon"

3. Storage Management (OSD)

Add OSDs

bash
# View available empty disks
sudo ceph orch device ls

# Add a specific disk
sudo ceph orch daemon add osd ceph-osd-1:/dev/sdb

Remove OSDs Safely

Data Evacuation

Never pull a disk physically before evacuating data via the orchestrator.

  1. Schedule removal: sudo ceph orch osd rm <ID>
  2. Check progress: sudo ceph orch osd rm status
  3. Physical removal: Only after the status shows "done" for that OSD or the OSD no longer appears in the output.
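
The check-progress step can be automated with a small polling loop. This is a sketch under one assumption: that the OSD ID is the first column of the `ceph orch osd rm status` output, which you should confirm on your Ceph release. The helper name wait_for_osd_removal, the default interval, and the OSD ID 7 in the usage lines are all hypothetical.

```bash
# Poll the removal queue until the given OSD ID no longer appears in it.
# ASSUMPTION: the OSD ID is the first whitespace-separated column of
# `ceph orch osd rm status`; verify against your release's output.
wait_for_osd_removal() {
    osd_id=$1
    interval=${2:-30}   # seconds between polls (hypothetical default)
    while sudo ceph orch osd rm status |
          awk -v id="$osd_id" '$1 == id { found = 1 } END { exit !found }'; do
        echo "osd.$osd_id is still in the removal queue; checking again in ${interval}s"
        sleep "$interval"
    done
    echo "osd.$osd_id is no longer in the removal queue"
}

# Usage (not run here):
#   sudo ceph orch osd rm 7
#   wait_for_osd_removal 7
```

Waiting for the entry to disappear is deliberately conservative: the disk is only pulled after the orchestrator has finished with the OSD entirely.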

4. Maintenance Mode

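Use these flags to prevent "rebalance storms" when rebooting nodes for updates. noout stops Ceph from marking down OSDs as out, while norebalance and norecover pause data movement until the nodes are back.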

bash
# Set maintenance flags
sudo ceph osd set noout
sudo ceph osd set norebalance
sudo ceph osd set norecover

# ... Reboot Nodes ...

# Return to normal operations
sudo ceph osd unset noout
sudo ceph osd unset norebalance
sudo ceph osd unset norecover
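
Because the same three flags are set and later unset, the two halves can share one helper so they never drift apart. A minimal sketch; MAINT_FLAGS and maintenance_flags are hypothetical names, and the flag list is exactly the one used above.

```bash
# The maintenance flags named in this section.
MAINT_FLAGS="noout norebalance norecover"

# Apply "set" or "unset" to every maintenance flag, stopping on the first error.
maintenance_flags() {
    action=$1   # "set" or "unset"
    for flag in $MAINT_FLAGS; do
        sudo ceph osd "$action" "$flag" || return 1
    done
}

# Usage (not run here):
#   maintenance_flags set
#   ... reboot nodes and wait for their OSDs to come back up ...
#   maintenance_flags unset
```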

5. The Nuclear Option (Purge)

Irreversible Data Loss

This obliterates the entire cluster and wipes all data. Use with extreme caution.

  1. Get Cluster ID: sudo ceph fsid
  2. Destroy Cluster (run on every host in the cluster): sudo cephadm rm-cluster --zap-osds --fsid <FSID> --force

6. Troubleshooting Placement

Stuck Removal?

If an OSD removal is stuck, you can abort it and bring the OSD back into service:

bash
sudo ceph orch osd rm stop <ID>
sudo ceph osd in <ID>
sudo ceph orch daemon start osd.<ID>

Unmanaged Services

Orchestration Logic

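If ceph orch ls shows <unmanaged> in the placement column, cephadm is not managing that service: it will not deploy, replace, or remove its daemons automatically. This is typical of daemons added with imperative commands (for example, ceph orch daemon add) rather than declarative YAML specs.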
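
For comparison, a declarative spec for the monitors might look like the sketch below. It writes a minimal service spec that mirrors the label-based placement from section 2; the helper name write_mon_spec and the file name mon-spec.yaml are arbitrary choices made for this example.

```bash
# Write a minimal cephadm service spec that places monitors on every
# host carrying the 'mon' label. The target path is the first argument.
write_mon_spec() {
    cat > "$1" <<'EOF'
service_type: mon
placement:
  label: mon
EOF
}

# Usage (not run here):
#   write_mon_spec mon-spec.yaml
#   sudo ceph orch apply -i mon-spec.yaml
```

A service applied this way stays managed, so the orchestrator keeps the running daemons in line with the spec.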

7. Pool Management

RADOS Pools are the logical partitions where Ceph stores data. Each pool defines its own replication or erasure coding strategy.

View Pools

bash
# List all pools
sudo ceph osd pool ls

# View per-pool client and recovery I/O rates
sudo ceph osd pool stats

# View per-pool usage and object counts
sudo ceph df

Create and Configure a Pool

bash
# Create a new replicated pool named 'my_pool' with pg_num and pgp_num both set to 32
sudo ceph osd pool create my_pool 32 32 replicated

# Set the replication size (number of copies)
sudo ceph osd pool set my_pool size 3
sudo ceph osd pool set my_pool min_size 2

# Enable the pool for a specific application (cephfs, rbd, or rgw)
sudo ceph osd pool application enable my_pool cephfs

Delete a Pool

Safety Lock

By default, Ceph prevents pool deletion to avoid accidental data loss. You must explicitly allow it in the monitor configuration first. If the pool is attached to a CephFS file system, you must also detach it before deletion.

bash
# 1. Detach the pool from CephFS (only needed if it is attached as a data pool)
sudo ceph fs rm_data_pool cephfs my_pool

# 2. Temporarily allow pool deletion on all monitors
#    (mon.* is quoted so the shell does not expand it as a file glob)
sudo ceph tell 'mon.*' injectargs '--mon-allow-pool-delete=true'

# 3. Delete the pool (requires typing the name twice and the confirmation flag)
sudo ceph osd pool rm my_pool my_pool --yes-i-really-really-mean-it

# 4. Re-enable the safety lock
sudo ceph tell 'mon.*' injectargs '--mon-allow-pool-delete=false'
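
One risk with the manual sequence is forgetting the final step and leaving deletion enabled. The sketch below wraps the unlock, delete, and re-lock steps in one function so the lock is restored even when the delete fails. delete_pool is a hypothetical helper name, and the CephFS detach step is left out because it only applies to attached pools.

```bash
# Delete a pool and always restore the deletion lock afterwards,
# even if the delete command itself fails.
delete_pool() {
    pool=$1
    sudo ceph tell 'mon.*' injectargs '--mon-allow-pool-delete=true' || return 1
    sudo ceph osd pool rm "$pool" "$pool" --yes-i-really-really-mean-it
    status=$?
    # Re-lock regardless of whether the delete succeeded.
    sudo ceph tell 'mon.*' injectargs '--mon-allow-pool-delete=false'
    return "$status"
}

# Usage (not run here):
#   delete_pool my_pool
```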

8. Subvolume Management

CephFS subvolumes provide isolated, quota-enforced directory trees for individual users or applications.

View Subvolumes

bash
# List all subvolumes within a specific group (e.g., 'students')
sudo ceph fs subvolume ls cephfs --group_name students

# Get the internal UUID path of a subvolume (required for client mounts)
sudo ceph fs subvolume getpath cephfs 22106050001 --group_name students

Quota Management

Quotas can be adjusted on the fly without unmounting the client.

bash
# Set or update a quota limit in bytes (e.g., 5 GiB = 5 * 1024^3 = 5368709120 bytes)
sudo ceph fs subvolume resize cephfs 22106050001 5368709120 --group_name students

# Remove a quota limit (allows usage up to total cluster capacity)
sudo ceph fs subvolume resize cephfs 22106050001 infinite --group_name students
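
Since the resize command takes the size in bytes, a small conversion helper avoids hand-typed byte counts. A sketch only; gib_to_bytes is a hypothetical name, and the subvolume and group in the usage line are the ones from the examples above.

```bash
# Convert a whole number of GiB to bytes (1 GiB = 1024^3 bytes).
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 5    # prints 5368709120

# Usage with the resize command (not run here):
#   sudo ceph fs subvolume resize cephfs 22106050001 "$(gib_to_bytes 5)" --group_name students
```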

Delete a Subvolume

Data Loss

Removing a subvolume permanently deletes the directory and all data inside it. This action cannot be undone.

bash
# Delete a specific student's subvolume
sudo ceph fs subvolume rm cephfs 22106050001 --group_name students