[LU-12967] sanity test 80 silently fails to get sync_on_lock_cancel parameter

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.4
    • Affects Version/s: Lustre 2.12.0, Lustre 2.13.0, Lustre 2.14.0
    • Labels: None
    • Severity: 3

    Description

      There are actually two issues with sanity test_80, as seen in the suite_log at https://testing.whamcloud.com/test_sets/92b52b08-05c5-11ea-9487-52540065bddc

      == sanity test 80: Page eviction is equally fast at high offsets too  ================================ 19:09:39 (1573585779)
      CMD: trevis-34vm3 lctl get_param -n obdfilter.*.sync_on_lock_cancel
      trevis-34vm3: error: get_param: param_path 'obdfilter/*/sync_on_lock_cancel': No such file or directory
      CMD: trevis-34vm3trevis-34vm3trevis-34vm3trevis-34vm3trevis-34vm3trevis-34vm3trevis-34vm3trevis-34vm3 lctl set_param obdfilter.*.sync_on_lock_cancel=never
      pdsh@trevis-34vm1: gethostbyname("trevis-34vm3trevis-34vm3trevis-34vm3trevis-34vm3trevis-34vm3trevis-34vm3trevis-34vm3trevis-34vm3") failed
      1+0 records in
      1+0 records out
      1048576 bytes (1.0 MB) copied, 0.00475709 s, 220 MB/s
      CMD: trevis-34vm3trevis-34vm3trevis-34vm3trevis-34vm3trevis-34vm3trevis-34vm3trevis-34vm3trevis-34vm3 lctl set_param obdfilter.*.sync_on_lock_cancel=
      pdsh@trevis-34vm1: gethostbyname("trevis-34vm3trevis-34vm3trevis-34vm3trevis-34vm3trevis-34vm3trevis-34vm3trevis-34vm3trevis-34vm3") failed
      

      The first issue is that the obdfilter.*.sync_on_lock_cancel parameter appears to no longer exist. The second is that the host name composed in the test is malformed: the OSS host name is repeated several times with no separator, so pdsh cannot resolve it.

      The test does not fail when it hits these issues, but it is most likely not working as intended.
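
One way a test could avoid failing silently here is to check the value it read before using it. The following is only a minimal sketch: `require_param_value` is a hypothetical helper name, not a function from the Lustre test framework, and the commented usage line assumes `lctl` is run directly rather than through the framework's remote-execution wrappers.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Lustre test framework): print the
# value unchanged, or report an error and return non-zero when it is empty.
require_param_value() {
	name="$1"
	value="$2"
	if [ -z "$value" ]; then
		echo "error: no value read for '$name'" >&2
		return 1
	fi
	printf '%s\n' "$value"
}

# Usage sketch (assumption: lctl is available on the node running this):
# param='obdfilter.*.sync_on_lock_cancel'
# saved=$(require_param_value "$param" "$(lctl get_param -n "$param")") || exit 1
```

With a check like this, the test would stop at the failed get_param instead of going on to run "set_param ...=" with an empty value, as the log above shows.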

          Activity

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37038/
            Subject: LU-12967 tgt: clean up sync_on_cancel references
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 1e43d3c4fc8432bebacfa7a8a320163e3b15448d

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37037/
            Subject: LU-12967 ofd: restore sync_on_lock_cancel tunable
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: cdffffb73080ad1100549afbcdbb09d3ee7a1c50

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37038
            Subject: LU-12967 tgt: clean up sync_on_cancel references
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 894d051d1161b6a2462fdac59df2ed4668315854

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37037
            Subject: LU-12967 ofd: restore sync_on_lock_cancel tunable
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 7e3ac15524e1178674e0c3e50945215b82883b0f

            pjones Peter Jones added a comment -

            Landed for 2.14

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36754/
            Subject: LU-12967 tgt: clean up sync_on_cancel references
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 52a5981be4df863088168b3ea41fac9e29ddf060

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36748/
            Subject: LU-12967 ofd: restore sync_on_lock_cancel tunable
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 7df7347b7b188e7168e094304fd6d2d985f7f274

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36754
            Subject: LU-12967 tgt: clean up sync_on_cancel references
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: bb49bfca3ea3434a72e54fc6001f2d71c52d761d

            adilger Andreas Dilger added a comment - edited

            It would make sense to write a conf-sanity test that extracts the "lctl list_param -R '*'" output on the client, MDS, and OSS, strips the components that make parameter names unique to a specific mount/fsname/configuration (e.g. the instance "-ffff012344567", the filesystem name "lustre-", NIDs such as "192.168.10.1@tcp", and target index numbers such as "OST0000"), then sorts the result, removes duplicates, and compares it with a previously-saved parameter list.

            It should not be considered an error if new parameters are added, but it should be an error to remove existing parameters (or at least we will be notified of this and can make a proper decision about it). The lists should be saved according to the version running on the current node (e.g. OSS, MDS, client) and not the version of the test script, so that this does not cause errors during interop. The list can be updated right before a release so that we don't have to update the saved parameter list continually (though that is not bad either), but we also don't accidentally lose parameters between releases.
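
The normalize-and-compare step proposed above could look roughly like the sketch below. It is an illustration under stated assumptions, not the conf-sanity test itself: the function names `normalize_params` and `removed_params`, the placeholder tokens (NID, INST, IDX, FSNAME), and the exact patterns used to recognize instances and target indices are all hypothetical choices made for this example.

```shell
#!/bin/sh
# Read parameter names on stdin, replace mount-specific components with
# fixed tokens, and print the sorted, de-duplicated result.
# $1 is the filesystem name (e.g. "lustre"). Token names are illustrative.
normalize_params() {
	fsname="$1"
	sed -E \
		-e 's/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+@[a-z0-9]+/NID/g' \
		-e 's/-[0-9a-f]{12,16}(\.|$)/-INST\1/g' \
		-e 's/(OST|MDT)[0-9a-f]{4}/\1IDX/g' \
		-e "s/(^|\\.)${fsname}-/\\1FSNAME-/g" |
		sort -u
}

# Print entries that are in the saved list ($1) but missing from the
# current list ($2). Newly added parameters are deliberately ignored.
# Exit status is 0 when at least one parameter was removed, 1 otherwise.
removed_params() {
	saved_list="$1"
	current_list="$2"
	grep -Fxv -f "$current_list" "$saved_list"
}

# Usage sketch (assumption: lctl is available and saved.txt already exists):
# lctl list_param -R '*' | normalize_params lustre > current.txt
# if removed_params saved.txt current.txt; then echo "parameters were removed" >&2; fi
```

Comparing in one direction only matches the intent that added parameters are acceptable while removed ones need a decision.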

            adilger Andreas Dilger added a comment -

            It looks like the "ofd.*.sync_on_lock_cancel" tunable was broken by patch https://review.whamcloud.com/33059 "LU-8066 ofd: migrate from proc to sysfs" due to implicit use of the function name as the parameter name. That patch landed and is part of the 2.12.0 release, so we can't just revert the tunable name to "sync_on_lock_cancel". It also isn't just a matter of restoring the old tunable name: the "mdt.*.sync_lock_cancel" name has also been in use since 2.8, and the code for the two tunables was recently consolidated in the server target code in patch https://review.whamcloud.com/34190 "LU-10496 tgt: move FMD handling from OFD to target". In the long run it is better to have a single tunable name for both.

            Instead, I think the best path forward is to keep the common "sync_lock_cancel" tunable name for both MDT and OST, add backward compatibility for "ofd.*.sync_on_lock_cancel" for a number of releases, and print a deprecation warning if the old name is used.
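
The compatibility behaviour described in this comment (accept the old name, map it to the common one, warn about it) can be illustrated from the script side. This is only a sketch of the idea and not the server-side change in the patches listed above; `resolve_param_name` is a hypothetical helper introduced for this example.

```shell
#!/bin/sh
# Hypothetical helper: map the deprecated "sync_on_lock_cancel" suffix to
# the common "sync_lock_cancel" name and warn on stderr. Any other
# parameter name is printed unchanged.
resolve_param_name() {
	case "$1" in
	*.sync_on_lock_cancel)
		new_name="${1%.sync_on_lock_cancel}.sync_lock_cancel"
		echo "warning: '$1' is deprecated, use '$new_name'" >&2
		printf '%s\n' "$new_name"
		;;
	*)
		printf '%s\n' "$1"
		;;
	esac
}

# Usage sketch:
# resolve_param_name 'obdfilter.*.sync_on_lock_cancel'
```

Writing the warning to stderr keeps the resolved name on stdout, so callers can capture it with command substitution.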

            People

              Assignee: James Nunez (Inactive)
              Reporter: James Nunez (Inactive)