[LU-16479] Automatically manage/control DEGRADED ZFS OSTs Created: 16/Jan/23 Updated: 17/Feb/23 Resolved: 14/Feb/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Upstream, Lustre 2.15.0 |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Akash B | Assignee: | Akash B |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre filesystem with ZFS as the backend filesystem for OSTs. |
||
| Epic/Theme: | zfs |
| Epic: | server, zfs |
| Description |
|
The obdfilter.testfs-OST000*.degraded parameter is set/unset by zedlets (/etc/zfs/zed.d/statechange-lustre.sh) depending on whether the zpool is DEGRADED or ONLINE; a minimal sketch of this mechanism is shown below. We'd like to have this behavior enabled/disabled through an option, so that I/O and new allocations can still go to DEGRADED OSTs and the net bandwidth of the filesystem is not reduced by the degraded OSTs.
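
For illustration, a simplified sketch of the mechanism described above (not the actual statechange-lustre.sh zedlet): it assumes the standard ZED environment variable ZEVENT_POOL and the lustre:svname dataset property discussed later in this ticket, and it queries the pool health directly rather than parsing the event payload.

```sh
#!/bin/sh
# Simplified illustration of what a statechange zedlet can do: mirror the
# ZFS pool health into the Lustre "degraded" tunable of every OST backed
# by the affected pool. ZEVENT_POOL is provided to zedlets by ZED.

health=$(zpool list -H -o health "${ZEVENT_POOL}") || exit 1

case "${health}" in
	DEGRADED) state=1 ;;
	ONLINE)   state=0 ;;
	*)        exit 0 ;;  # other pool states are out of scope for this sketch
esac

# Lustre tags each server dataset with a lustre:svname user property
# (e.g. testfs-OST0000); use it to find the services hosted on this pool.
zfs get -rH -s local -t filesystem -o value lustre:svname "${ZEVENT_POOL}" |
while read -r service; do
	lctl set_param "obdfilter.${service}.degraded=${state}"
done
```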
|
| Comments |
| Comment by Gerrit Updater [ 17/Jan/23 ] |
|
"Akash B <akash-b@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49660 |
| Comment by Olaf Faaland [ 30/Jan/23 ] |
|
Hi Akash, |
| Comment by Akash B [ 08/Feb/23 ] |
|
Hi Olaf,

By default for ZFS OSTs, the Lustre zedlet (statechange-lustre.sh) automatically manipulates the Lustre degraded state. Writing to multiple OSS nodes where one or two OSTs are in a degraded state ("D" shown in lfs df -h on the client side) reduces the overall bandwidth of the filesystem (performance degradation). We'd want to have this behavior enabled/disabled through an option. I talked to Brian Behlendorf about implementing this as a user property which administrators can use to enable/disable the behavior; his reply follows:

Hey Akash,

On occasion we've also wanted to be able to administratively disable this behavior, so supporting this sounds great. I agree that splitting portions of the patch between Lustre and ZFS is awkward. It'd be nice to handle it entirely on the Lustre side.

An alternate solution would be to introduce a new Lustre-specific ZFS dataset user property. As I'm sure you've noticed, Lustre already adds the following dataset properties, which are used for configuration:

```
kern3/ost1  lustre:flags          4130            local
kern3/ost1  lustre:fsname         lslide          local
kern3/ost1  lustre:version        1               local
kern3/ost1  lustre:mgsnode        7@kfi:9@kfi     local
kern3/ost1  lustre:index          0               local
kern3/ost1  lustre:failover.node  21@kfi:3@kfi    local
kern3/ost1  lustre:svname         lslide-OST0000  local
```

We could add a new lustre:autodegrade=<on|off> user property (see "User Properties" in zfsprops(7)). The statechange-lustre.sh zedlet could then check this property on the dataset to control the behavior. This has a few advantages:

1. User properties are a generic ZFS feature and won't be interpreted by ZFS itself. No ZFS changes are needed.
2. The property can be set per dataset, providing more granularity.
3. The property is persistent and will survive reboots.
4. This mechanism is already used within the zedlet to identify Lustre datasets, e.g.:
   zfs get -rH -s local -t filesystem -o name lustre:svname ${ZEVENT_POOL}
5. You can add the property at any time to an existing MDT/OST.

What do you think? If you were to implement this, I'd suggest not only updating the Lustre zedlet but also extending the mkfs.lustre utility to add this property by default when creating a new Lustre server.

Thanks,
Brian |
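
To make the suggestion concrete, hypothetical usage of the proposed property (the name lustre:autodegrade and the dataset name follow the discussion above; the exact name and semantics are defined by the merged gerrit change, so treat this as an illustration only):

```sh
# Administratively disable automatic degraded-state handling for one OST by
# setting the proposed user property on its backing dataset:
zfs set lustre:autodegrade=off kern3/ost1

# The zedlet could then gate its existing behavior on the property; an unset
# user property reads back as "-", which falls through to the default behavior.
autodegrade=$(zfs get -H -o value lustre:autodegrade kern3/ost1)
[ "${autodegrade}" = "off" ] && exit 0  # leave the Lustre degraded state alone
```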
| Comment by Gerrit Updater [ 14/Feb/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49660/ |