[LU-4277] Integrate ZFS zpool resilver status with OFD OS_STATE_DEGRADED flag Created: 20/Nov/13 Updated: 20/Mar/18 Resolved: 06/Feb/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | Lustre 2.11.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | Nathaniel Clark |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | zfs |
| Attachments: | statechange-lustre.sh |
| Issue Links: | |
| Rank (Obsolete): | 11749 |
| Description |
|
The OFD statfs() handler can optionally add an OS_STATE_DEGRADED flag to the statfs reply, which the MDS uses to help decide which OSTs to allocate new file objects from. Unless all other OSTs are also degraded, offline, or full, the DEGRADED OSTs will be skipped for newly created files. This avoids applications waiting on slow writes to a rebuilding OST long after their writes to healthy OSTs have completed, and it also prevents the new writes from interfering with the OST rebuild process, so it is a double win. This was previously implemented as a /proc tunable suitable for mdadm or a hardware-RAID utility to set from userspace, but since ZFS RAID is in the kernel it should be possible to query this status directly from the kernel when the MDS statfs() request arrives. |
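The userspace hook referred to above is the per-target "degraded" tunable. A minimal sketch of how a RAID-monitoring tool (e.g. an mdadm event handler) can drive it today; the target name lustre-OST0000 is only illustrative:

    # mark the OST degraded while the underlying array rebuilds
    lctl set_param obdfilter.lustre-OST0000.degraded=1
    # clear the flag once the rebuild has completed
    lctl set_param obdfilter.lustre-OST0000.degraded=0

The intent of this ticket is for the ZFS backend to update the same state automatically rather than relying on an external tool.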
| Comments |
| Comment by Andreas Dilger [ 23/Nov/13 ] |
|
http://review.whamcloud.com/8378 is a basic patch to fix handling in the LOD code for DEGRADED and READONLY flags. It doesn't yet fix the osd-zfs code in udmu_objset_statfs() that should be setting the flags. |
| Comment by Andreas Dilger [ 16/Sep/16 ] |
|
Don or Brian, |
| Comment by Don Brady (Inactive) [ 12/Oct/16 ] |
|
The degraded state is part of the vdev. Getting this info strictly through the spa interface would yield a ton of data (i.e. the entire config) and require nvlist parsing. A new API, like a spa_get_vdev_state(), to pull out the vdev state of the root vdev would be required to get at this state in a simple manner. We can easily set the state as it changes using a zedlet. We now have a state change event for all healthy<-->degraded vdev transitions that could be used to initiate a check of the pool state and post that state via lctl as you suggest above. |
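As a hedged illustration, once such a state-change event fires, a zedlet can check the top-level pool state from userspace without any new SPA API; the pool name ostpool is only illustrative:

    $ zpool list -H -o health ostpool
    DEGRADED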
| Comment by Don Brady (Inactive) [ 06/Feb/17 ] |
|
Attached a zedlet, statechange-lustre.sh, that will propagate degraded state changes from zfs to Lustre. |
| Comment by Andreas Dilger [ 07/Feb/17 ] |
|
It would be good to get the script in the form of a patch against the fs/lustre-release repo so that it can be reviewed properly. Some general comments first, however:
My thought is that the script would be installed as part of the ost-mount-zfs RPM in some directory (something like /etc/zfs/zed/zedlets.d/, akin to /etc/modprobe.d or /etc/logrotate.d) where zedlets can simply be dropped and will be run (at least the next time zed is started), without needing any edits from the user to specify the Lustre targets. Then it would get events from the kernel when a zpool becomes degraded, update lctl obdfilter.$target.degraded for the targets in that zpool, and do nothing for non-Lustre pools (e.g. the root pool for the OS); see the sketch below. |
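For illustration only (this is not the attached statechange-lustre.sh), a minimal zedlet along those lines might look like the following; the use of ZEVENT_POOL from the zed environment and of the osd-zfs.*.mntdev parameter to map datasets back to Lustre targets are assumptions of this sketch:

    #!/bin/bash
    # Sketch: mirror zpool health into the per-target "degraded" flag that
    # the MDS object allocator consumes. Illustrative only.

    [ -n "${ZEVENT_POOL}" ] || exit 0

    # Treat any pool state other than ONLINE as degraded.
    if [ "$(zpool list -H -o health "${ZEVENT_POOL}")" = "ONLINE" ]; then
        degraded=0
    else
        degraded=1
    fi

    # Find Lustre targets whose backing dataset lives in the affected pool
    # (osd-zfs.<target>.mntdev reports the dataset, e.g. "ostpool/ost0").
    for param in $(lctl list_param "osd-zfs.*.mntdev" 2>/dev/null); do
        dataset=$(lctl get_param -n "${param}")
        [ "${dataset%%/*}" = "${ZEVENT_POOL}" ] || continue
        target=$(echo "${param}" | cut -d. -f2)
        lctl set_param obdfilter."${target}".degraded="${degraded}"
    done

    exit 0

Because zed runs every statechange zedlet for every pool, non-Lustre pools simply fall through the dataset check and nothing is changed.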
| Comment by Don Brady (Inactive) [ 14/Feb/17 ] |
|
Thanks Andreas for the feedback. I inadvertently attached my local copy used for testing, but I can provide the generic one. I'll also address the issues and repost an update. Is there an example license block I can refer to? |
| Comment by Andreas Dilger [ 18/Apr/17 ] |
|
Don, it should then be straightforward to submit a patch (per the above process) to add your script as lustre/scripts/statechange-lustre.sh and install it into the above directory via lustre/scripts/Makefile.am when ZFS is enabled:

if ZFS_ENABLED
 sbin_SCRIPTS += zfsobj2fid
+zeddir = $(sysconfdir)/zfs/zed.d
+zed_SCRIPTS = statechange-lustre.sh
 endif
 :
 :
+EXTRA_DIST += statechange-lustre.sh

and then package it in lustre.spec.in and lustre-dkms.spec.in as part of the osd-zfs-mount RPM (Lustre userspace tools for ZFS-backed targets):

%files osd-zfs-mount
%defattr(-,root,root)
%{_libdir}/@PACKAGE@/mount_osd_zfs.so
+%{_sysconfdir}/zfs/zed.d/statechange-lustre.sh
Now, when ZFS server support is installed, your zedlet will also be installed on all the servers, and should start to handle the degraded/offline events automatically. |
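A hedged way to verify the end result on a server once the packages are installed (the lustre-osd-zfs-mount package name and the zfs-zed service name assume a standard RPM/systemd layout):

    # the zedlet should be packaged with the ZFS OSD userspace tools
    rpm -ql lustre-osd-zfs-mount | grep statechange-lustre.sh
    # zed must be running for the kernel events to reach the zedlet
    systemctl status zfs-zed
    # after a degraded/online transition the flag should follow the pool state
    lctl get_param obdfilter.*.degraded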
| Comment by John Salinas (Inactive) [ 28/Apr/17 ] |
|
The entries in /etc/zfs/zed.d are symlinks to /usr/libexec/zfs/zed.d/, for example:
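(Illustrative listing; the stock all-syslog.sh zedlet is used here only to show the linking scheme.)

    $ ls -l /etc/zfs/zed.d/all-syslog.sh
    lrwxrwxrwx ... /etc/zfs/zed.d/all-syslog.sh -> /usr/libexec/zfs/zed.d/all-syslog.sh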
|
| Comment by Peter Jones [ 15/Dec/17 ] |
|
Nathaniel, can you please see what is required to move this forward? Thanks, Peter |
| Comment by Gerrit Updater [ 17/Jan/18 ] |
|
Nathaniel Clark (nathaniel.l.clark@intel.com) uploaded a new patch: https://review.whamcloud.com/30907 |
| Comment by Gerrit Updater [ 06/Feb/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30907/ |
| Comment by Peter Jones [ 06/Feb/18 ] |
|
Landed for 2.11
|