
Automatically manage/control DEGRADED ZFS OST's

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • Upstream, Lustre 2.15.0
    • None
    • Lustre filesystem with ZFS as backend filesystem for OST's.

    Description

      The obdfilter.testfs-OST000*.degraded value is currently set and cleared by a zedlet (/etc/zfs/zed.d/statechange-lustre.sh) according to whether the zpool is DEGRADED or ONLINE. We'd like an option to enable or disable this behavior, so that I/O and new allocations can still go to DEGRADED OSTs and the degraded OSTs do not reduce the net bandwidth of the filesystem.
       
      Introduce a new Lustre-specific ZFS dataset user property (lustre:autodegrade=on|off) for this purpose. Update the Lustre zedlet, and extend the mkfs.lustre utility to add this property by default when creating a new Lustre server (ZFS OSTs only). The default behavior remains the same (lustre:autodegrade=on): new allocations to DEGRADED OSTs are disabled.
      Creating a user property has a few advantages:

      1. User properties are a generic ZFS feature and won't be interpreted by ZFS itself. No ZFS changes are needed.
      2. The property can be set per dataset providing more granularity.
      3. The property is persistent and will survive reboots.
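
      As a rough illustration of how an administrator might toggle the proposed property, here is a minimal sketch built only on the standard `zfs set` and `zfs get` subcommands. The helper names (set_autodegrade, get_autodegrade), the ZFS override variable, and the dataset name in the usage comment are hypothetical and not part of the patch.

```shell
# Hypothetical helpers; only the property name lustre:autodegrade and its
# on|off values come from this ticket.
ZFS="${ZFS:-zfs}"

# Set lustre:autodegrade on a dataset, accepting only the two proposed values.
set_autodegrade() {
    dataset="$1"
    value="$2"
    case "$value" in
        on|off) "$ZFS" set "lustre:autodegrade=$value" "$dataset" ;;
        *) echo "usage: set_autodegrade <dataset> on|off" >&2; return 2 ;;
    esac
}

# Print the current value; zfs prints "-" when a user property is not set.
get_autodegrade() {
    "$ZFS" get -H -o value lustre:autodegrade "$1"
}

# Example (illustrative dataset name):
#   set_autodegrade kern3/ost1 off
#   get_autodegrade kern3/ost1
```

      Because this is an ordinary user property, the same commands work on an OST that already exists, and the setting is stored with the dataset.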

      Attachments

        Activity

          [LU-16479] Automatically manage/control DEGRADED ZFS OST's

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49660/
          Subject: LU-16479 utils: Add option to manage degraded ZFS OST
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: a2de6af65d21bff0d9357c30e6eb4ba049ff2059

          akash-b Akash B added a comment -

          Hi Olaf,

          By default, for ZFS OSTs the Lustre zedlet (statechange-lustre.sh) automatically manipulates the Lustre degraded state. When writing to multiple OSS nodes where one or two OSTs are in a degraded state (D seen in lfs df -h on the client side), the overall bandwidth of the filesystem is reduced (performance degradation).

          We'd like to be able to enable/disable this behavior through an option. I talked to Brian Behlendorf about making this a user property that administrators can use to enable/disable the behavior. His reply follows.

          Hey Akash,
           
          On occasion we've also wanted to be able to administratively disable this behavior, supporting this sounds great.
           
          I agree that splitting portions of the patch between Lustre and ZFS is awkward.  It'd be nice to handle it entirely on the Lustre side.  An alternate solution would be to introduce a new Lustre-specific ZFS dataset user property.  As I'm sure you've noticed, Lustre already adds the following dataset properties, which are used for configuration:
           
          kern3/ost1  lustre:flags          4130            local
          kern3/ost1  lustre:fsname         lslide          local
          kern3/ost1  lustre:version        1               local
          kern3/ost1  lustre:mgsnode        7@kfi:9@kfi     local
          kern3/ost1  lustre:index          0               local
          kern3/ost1  lustre:failover.node  21@kfi:3@kfi    local
          kern3/ost1  lustre:svname         lslide-OST0000  local
           
          We could add a new lustre:autodegrade=<on|off> user property (see "User Properties" in zfsprops(7)).  The statechange-lustre.sh zedlet could then check this property on the dataset to control the behavior.  This has a few advantages:
          1. User properties are a generic ZFS feature and won't be interpreted by ZFS itself.  No ZFS changes are needed.
          2. The property can be set per dataset providing more granularity.
          3. The property is persistent and will survive reboots.  
          4. This mechanism is already used within the zedlet to identify Lustre datasets,
          e.g.: zfs get -rH -s local -t filesystem -o name lustre:svname ${ZEVENT_POOL}
          5. You can add the property at any time to an existing MDT/OST.
          What do you think?  If you were to implement this I'd suggest not only updating the Lustre zedlet, but also extending the mkfs.lustre utility to add this property by default when creating a new Lustre server.
           
          Thanks,
          Brian
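
          The check suggested above could look roughly like the following. This is a minimal sketch, not the merged statechange-lustre.sh change: the function names (autodegrade_enabled, update_degraded) and the ZFS/LCTL override variables are hypothetical, and treating an unset property as "on" is an assumption based on the stated default. The obdfilter.<service>.degraded parameter is the one named in the description.

```shell
ZFS="${ZFS:-zfs}"
LCTL="${LCTL:-lctl}"

# Return 0 when the zedlet should manage the degraded flag for this dataset.
# Assumption: anything other than an explicit "off" (including the "-" that
# zfs prints for an unset property) keeps the default behavior.
autodegrade_enabled() {
    value=$("$ZFS" get -H -o value lustre:autodegrade "$1" 2>/dev/null)
    [ "$value" != "off" ]
}

# Mirror a pool state onto the OST's degraded parameter, unless opted out.
# $1 = dataset, $2 = Lustre service name (lustre:svname), $3 = pool state
update_degraded() {
    dataset="$1"
    service="$2"
    state="$3"
    autodegrade_enabled "$dataset" || return 0
    case "$state" in
        ONLINE)   "$LCTL" set_param "obdfilter.$service.degraded=0" ;;
        DEGRADED) "$LCTL" set_param "obdfilter.$service.degraded=1" ;;
    esac
}

# Example (names taken from the property listing above):
#   update_degraded kern3/ost1 lslide-OST0000 DEGRADED
```

          With lustre:autodegrade=off on a dataset, update_degraded returns without touching the parameter, so the OST keeps receiving new allocations while its pool is DEGRADED.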
          
          ofaaland Olaf Faaland added a comment -

          Hi Akash,
          What motivated this? Did you see disk failures during testing that affected results unnecessarily?
          thanks


          "Akash B <akash-b@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49660
          Subject: LU-16479 utils: Add option to manage degraded ZFS OST
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: ccf612d181952744e1dd13a842c02bb28ba181d1


          People

            akash-b Akash B
            akash-b Akash B