sanity test 255c fails with 'Ladvise test 12, bad lock count, returned 1, actual 0'

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.14.0
    • Lustre 2.12.0
    • Ubuntu 18.04 clients
    • 3
    • 9223372036854775807

    Description

      sanity test_255c fails with 'Ladvise test 12, bad lock count, returned 1, actual 0'. This appears to affect only Ubuntu 18.04 client testing.

      The client test_log at https://testing.whamcloud.com/test_sets/d1b23f66-fdd4-11e8-a97c-52540065bddc shows that the lock_unused_count parameter cannot be found:

      == sanity test 255c: suite of ladvise lockahead tests ================================================ 19:46:58 (1544471218)
      CMD: trevis-19vm3 /usr/sbin/lctl get_param -n version 2>/dev/null ||
      				/usr/sbin/lctl lustre_build_version 2>/dev/null ||
      				/usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
      Starting test test10 at 1544471219
      Finishing test test10 at 1544471219
      CMD: trevis-19vm3 /usr/sbin/lctl get_param -n ost.OSS.ost.stats
      Starting test test11 at 1544471219
      Finishing test test11 at 1544471219
      CMD: trevis-19vm3 /usr/sbin/lctl get_param -n ost.OSS.ost.stats
      error: get_param: param_path 'ldlm/namespaces/lustre-OST0000*osc-f*/lock_unused_count': No such file or directory
      Starting test test12 at 1544471220
      Finishing test test12 at 1544471220
      error: get_param: param_path 'ldlm/namespaces/lustre-OST0000*osc-f*/lock_unused_count': No such file or directory
      

      There are several examples of this failure, all on Ubuntu, at:
      https://testing.whamcloud.com/test_sets/bd048afa-f713-11e8-b67f-52540065bddc
      https://testing.whamcloud.com/test_sets/b223ccee-f778-11e8-b67f-52540065bddc
      https://testing.whamcloud.com/test_sets/50f5970e-fa16-11e8-8a18-52540065bddc
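
When triaging logs like the one above, it can help to pull out exactly which parameter patterns `lctl get_param` could not resolve. The following is a minimal sketch, not part of the Lustre test suite: the function names `extract_missing_params` and `sample_log` are hypothetical, and the sample input is copied from the test_log excerpt in this description.

```shell
#!/bin/sh
# Sketch only: list the distinct parameter patterns that lctl get_param
# reported as missing in a test log. Function names are hypothetical.

# extract_missing_params: read a log on stdin, print each distinct
# param_path that failed with "No such file or directory".
extract_missing_params() {
    sed -n "s/.*get_param: param_path '\([^']*\)': No such file or directory.*/\1/p" |
        sort -u
}

# sample_log: a few lines copied from the test_255c log in this ticket.
sample_log() {
    cat <<'EOF'
error: get_param: param_path 'ldlm/namespaces/lustre-OST0000*osc-f*/lock_unused_count': No such file or directory
Starting test test12 at 1544471220
Finishing test test12 at 1544471220
error: get_param: param_path 'ldlm/namespaces/lustre-OST0000*osc-f*/lock_unused_count': No such file or directory
EOF
}

sample_log | extract_missing_params
```

Both error lines in the sample refer to the same pattern, so the filter reports it once. On a real log, the same pipeline would be fed from the downloaded test_log file instead of `sample_log`.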

          Activity

            [LU-11803] sanity test 255c fails with 'Ladvise test 12, bad lock count, returned 1, actual 0'

            simmonsja James A Simmons added a comment - Some more work is needed.

            gerrit Gerrit Updater added a comment -
            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33916/
            Subject: LU-11803 obd: replace class_uuid with linux kernel version.
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 604c266a175b72500ef99793652b64ed4f842b2c

            gerrit Gerrit Updater added a comment -
            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34783
            Subject: LU-11803 tests: don't assume obd device name
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 2ab9df22c74f2404ac0d8e2632073f11d167eeba

            gerrit Gerrit Updater added a comment -
            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33894/
            Subject: LU-11803 tests: don't assume obd device name
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: b8e6c8bdca9bd0e12d78cd4a06800c13f4293325

            gerrit Gerrit Updater added a comment -
            James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33916
            Subject: LU-11803 llite: use sget() to find client super block
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 9df5133ac346b807bcb671fdea0adb7ccb628a5e

            gerrit Gerrit Updater added a comment -
            James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33894
            Subject: LU-11803 tests: don't assume obd device name
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: ce075a8726fe822803f24d1b336301ad87e9ce12

            simmonsja James A Simmons added a comment - edited
            I see the issue. It's a test problem in this case. The people who wrote these tests assumed the obd device name is of the format $FSNAME-OST0000-osc-ffff*, but that is not always the case. In newer kernels the ffff is filtered out. This also impacts older kernels running on non-x86 platforms. For example, on Power8 the name is lustre-OST0000-osc-c000001e4de63000. I will push a patch to fix this.
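
To illustrate the point about the device-name assumption, here is a minimal sketch comparing a glob that hard-codes the ffff prefix with a looser one. It is not the pattern used in the actual patch: the helper names, the x86_64 device name, and the looser `osc-*` pattern are assumptions chosen for illustration. The Power8 name and the "(ptrval)" name are taken from this ticket.

```shell
#!/bin/sh
# Sketch only: show why a glob that assumes an "ffff" address prefix misses
# some client OSC device names. Helper names are hypothetical.

# matches_glob NAME PATTERN: succeed if NAME matches the shell glob PATTERN.
matches_glob() {
    # PATTERN is left unquoted on purpose so it is treated as a glob.
    case "$1" in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

# count_matches PATTERN NAME...: print how many NAMEs match PATTERN.
count_matches() {
    pattern=$1
    shift
    n=0
    for name in "$@"; do
        if matches_glob "$name" "$pattern"; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

X86_NAME='lustre-OST0000-osc-ffff8800deadb000'     # hypothetical x86_64 name
P8_NAME='lustre-OST0000-osc-c000001e4de63000'      # Power8 name from this ticket
HASHED_NAME='lustre-OST0000-osc-        (ptrval)'  # as printed in the sanity-flr log

echo "osc-ffff* matches: $(count_matches 'lustre-OST0000-osc-ffff*' \
    "$X86_NAME" "$P8_NAME" "$HASHED_NAME")"
echo "osc-*     matches: $(count_matches 'lustre-OST0000-osc-*' \
    "$X86_NAME" "$P8_NAME" "$HASHED_NAME")"
```

A pattern as loose as `osc-*` may also match names a test did not intend, so a real fix may need something narrower than what is shown here.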

            simmonsja James A Simmons added a comment - It's the UUID issue. Newer kernels no longer allow a pointer address to be exposed to userland, so we need to create a new UUID method that does not use the pointer of an internal kernel object.

            simmonsja James A Simmons added a comment - Let me look

            adilger Andreas Dilger added a comment - simmonsja this looks like fallout from moving parameters from /proc to /sys? The Ubuntu kernel is 4.15, which is definitely the newest one we have running, so may be more likely to be affected by these changes.

            jamesanunez James Nunez (Inactive) added a comment - We are also seeing sanity-flr tests 0g and 31 fail in a similar way:

            == sanity-flr test 0g: lfs mirror create flags support =============================================== 00:45:20 (1544489120)
            osc.lustre-OST0000-osc-        (ptrval).stats=clear
            osc.lustre-OST0001-osc-        (ptrval).stats=clear
            osc.lustre-OST0002-osc-        (ptrval).stats=clear
            osc.lustre-OST0003-osc-        (ptrval).stats=clear
            osc.lustre-OST0004-osc-        (ptrval).stats=clear
            osc.lustre-OST0005-osc-        (ptrval).stats=clear
            osc.lustre-OST0006-osc-        (ptrval).stats=clear
            error: get_param: param_path 'osc/lustre-OST0000-osc-ffff*/stats': No such file or directory
             sanity-flr test_0g: @@@@@@ FAIL: read was not provided by OST1

            and

            == sanity-flr test 31: make sure glimpse request can be retried ====================================== 00:46:46 (1544489206)
            fail_loc=0x1A00
            CMD: trevis-19vm3 grep -c /mnt/lustre-ost1' ' /proc/mounts || true
            Stopping /mnt/lustre-ost1 (opts:) on trevis-19vm3
            CMD: trevis-19vm3 umount -d /mnt/lustre-ost1
            CMD: trevis-19vm3 lsmod | grep lnet > /dev/null &&
            lctl dl | grep ' ST ' || true
            CMD: trevis-19vm1.trevis.whamcloud.com lctl get_param -n at_min
            can't get osc.lustre-OST0000-osc-ffff*.ost_server_uuid in 40 secs
            error: get_param: param_path 'ldlm/namespaces/lustre-OST0000-osc-ffff*/lock_count': No such file or directory
            error: get_param: param_path 'ldlm/namespaces/lustre-OST0001-osc-ffff*/lock_count': No such file or directory
             sanity-flr test_31: @@@@@@ FAIL: OST 1: no glimpse request was sent

            People

              simmonsja James A Simmons
              jamesanunez James Nunez (Inactive)