LU-10073: lnet-selftest test_smoke: lst Error found

Project: Lustre

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.11.0, Lustre 2.13.0, Lustre 2.10.7, Lustre 2.12.1, Lustre 2.12.3, Lustre 2.12.4, Lustre 2.12.5, Lustre 2.12.6
    • Environment: trevis, full, x86_64 servers, ppc clients
      servers: el7.4, ldiskfs, branch master, v2.10.53.1, b3642
      clients: el7.4, branch master, v2.10.53.1, b3642
    • Severity: 3

    Description

      https://testing.whamcloud.com/test_sets/87032fec-9d50-11e7-b778-5254006e85c2

      Seen previously in 2.9 testing (LU-6622).

      From test_log:

      Batch is stopped
      12345-10.9.0.84@tcp: [Session 0 brw errors, 30 ping errors] [RPC: 0 errors, 0 dropped, 30 expired]
      12345-10.9.0.85@tcp: [Session 0 brw errors, 30 ping errors] [RPC: 0 errors, 0 dropped, 30 expired]
      c:
      Total 2 error nodes in c
      12345-10.9.5.24@tcp: [Session 0 brw errors, 30 ping errors] [RPC: 0 errors, 0 dropped, 30 expired]
      12345-10.9.5.25@tcp: [Session 0 brw errors, 30 ping errors] [RPC: 0 errors, 0 dropped, 30 expired]
      s:
      Total 2 error nodes in s
      session is ended
      Total 2 error nodes in c
      Total 2 error nodes in s
      

      and

      Started clients trevis-77vm3.trevis.hpdd.intel.com,trevis-77vm4: 
      CMD: trevis-77vm3.trevis.hpdd.intel.com,trevis-77vm4 mount | grep /mnt/lustre' '
      10.9.5.25@tcp:/lustre on /mnt/lustre type lustre (rw,flock,user_xattr,lazystatfs)
      10.9.5.25@tcp:/lustre on /mnt/lustre type lustre (rw,flock,user_xattr,lazystatfs)
      CMD: trevis-77vm4 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh set_default_debug \"vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck\" \"all\" 4 
      trevis-77vm4: h2tcp: deprecated, use h2nettype instead
      trevis-77vm4: trevis-77vm4.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
       lnet-selftest test_smoke: @@@@@@ FAIL: lst Error found 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:5289:error()
        = /usr/lib64/lustre/tests/lnet-selftest.sh:153:check_lst_err()
        = /usr/lib64/lustre/tests/lnet-selftest.sh:179:test_smoke()
        = /usr/lib64/lustre/tests/test-framework.sh:5565:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:5604:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:5451:run_test()
        = /usr/lib64/lustre/tests/lnet-selftest.sh:182:main()
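The trace shows the failure being raised from check_lst_err(). The lst summary above reports 2 error nodes in each of groups c and s, while the passing arm64 run further down reports 0 in both. As a hedged illustration only, the following POSIX shell sketch flags a captured lst log in which any "Total N error nodes in <group>" line has a nonzero N. The function names lst_has_errors and scan_lst_log are hypothetical. This is not the actual check_lst_err() from lnet-selftest.sh, whose exact logic is not shown in this issue.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; NOT the real check_lst_err()
# from lnet-selftest.sh.

# lst_has_errors LOG
# Returns 0 if any "Total N error nodes in <group>" line in LOG has N != 0,
# and 1 otherwise. Leading spaces/tabs on the line are tolerated.
lst_has_errors() {
	awk '/^[ \t]*Total [0-9]+ error nodes in / && $2 != 0 { found = 1 }
	     END { exit (found ? 0 : 1) }' "$1"
}

# scan_lst_log LOG
# Prints a verdict. Returns 1 when errors are reported, 0 when none are,
# and 2 if the log cannot be read.
scan_lst_log() {
	if [ ! -r "$1" ]; then
		echo "cannot read $1" >&2
		return 2
	fi
	if lst_has_errors "$1"; then
		echo "lst Error found"
		return 1
	fi
	echo "no lst errors"
	return 0
}
```

Run against the failing excerpt above, scan_lst_log prints "lst Error found". Run against the arm64 run's "Total 0 error nodes" lines, it prints "no lst errors". The sketch checks for any nonzero count instead of summing the counts, because the lst output repeats each group's total (the "Total 2 error nodes in c" line appears twice above).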
      

      Attachments

        1. perf-kernel-vm1.svg (118 kB)
        2. perf-kernel-vm2.svg (131 kB)
        3. perf-kernel-vm3.svg (201 kB)
        4. perf-kernel-vm4.svg (189 kB)
        5. perf-kernel-123.svg (1.14 MB)
        6. perf-kernel-122.svg (681 kB)
        7. perf-kernel-124.svg (757 kB)
        8. perf-kernel-121.svg (945 kB)


          Activity

            Gerrit Updater (gerrit) added a comment:
            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/38857/
            Subject: LU-10073 tests: re-enable lnet selftest smoke test for PPC + ARM
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: c25a4e1fd92523247a0fa5a5c2809321765df263

            Gerrit Updater (gerrit) added a comment:
            "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/46169
            Subject: LU-10073 tests: re-enable lnet selftest smoke test for PPC + ARM
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 187eef8fd9cc5dce0014afe7db45c60caf7ee605

            James A Simmons (simmonsja) added a comment:
            We still have Power8 to test on to verify it works.

            Gerrit Updater (gerrit) added a comment:
            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46037/
            Subject: LU-10073 tests: re-enable lnet selftest smoke test 4.4+ kernels
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 24424791ac7233e393a32be189fe77102857653b

            James A Simmons (simmonsja) added a comment:
            Once PowerPC comes back, I will test on that platform. I suspect this issue is resolved, but we will have to wait and see.

            Xinliang Liu (xinliang) added a comment:

            Tested with the patch on arm64. It passes.

            $ ./auster  -vr lnet-selftest
            ...Batch is stopped
            c:
            Total 0 error nodes in c
            s:
            Total 0 error nodes in s
            session is ended
            Total 0 error nodes in c
            Total 0 error nodes in s
            lustre-aio: xxxx clients: 'lustre-aio lustre-aio mds-01'
            lustre-aio: 1xxxx clients: 'lustre-aio
            lustre-aio: mds-01'
            lustre-aio: 2xxxx clients: 2 'lustre-aio,mds-01'
            lustre-aio: lustre-aio: executing lst_cleanup
            mds-01: liuxl-mds-test-01.novalocal: executing lst_cleanup
            PASS smoke (322s) 
            Xinliang Liu (xinliang) added a comment:
            Let me check again, James. When the patch was last updated, I didn't encounter this issue. I will run this test suite on ARM again on the master branch.

            James A Simmons (simmonsja) added a comment:
            The good news is that in my testing of LNet selftest on Ubuntu, I didn't see any issues with newer kernels, and enabling it for RHEL8 in Maloo also passed. Most likely a recently resolved LNet bug fixed this issue. Because no ARM or PPC systems are available for testing, I can't confirm whether it works on those platforms.

            Gerrit Updater (gerrit) added a comment:
            "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/46037
            Subject: LU-10073 tests: re-enable lnet selftest smoke test 4.4+ kernels
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 29bb14ccb62a1bccf066d558d35658fb57ffca11

            Peter Jones (pjones) added a comment:
            James,

            It looks like this is an area that you are still looking into.

            Peter

            People

              Assignee: James A Simmons (simmonsja)
              Reporter: James Casper (jcasper) (Inactive)
              Votes: 0
              Watchers: 14
