LU-10632: recovery-small test 26a fails with ‘client not evicted from OST’

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.12.7, Lustre 2.15.0
    • Affects Version/s: Lustre 2.11.0, Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.3, Lustre 2.14.0, Lustre 2.12.5
    • Labels: None
    • Severity: 3

    Description

      recovery-small test_26a fails with the error:

      recovery-small test_26a: @@@@@@ FAIL: client not evicted from OST 
      

      The following output comes from the failure at https://testing.hpdd.intel.com/test_sets/fe065d76-0baa-11e8-a7cd-52540065bddc

      The test_log shows a problem reading the osc.*.state parameter on the client:

      CMD: trevis-6vm6.trevis.hpdd.intel.com /usr/sbin/lctl get_param osc.lustre-OST0000-osc-ffff880039c89800.state
      /usr/lib64/lustre/tests/test-framework.sh: line 8496: ((: > 1517964926: syntax error: operand expected (error token is "> 1517964926")
      

      Test 26a calls check_clients_evicted() to read the time each client OSC was evicted and compare it against the “before” timestamp. If the eviction time cannot be read, there is nothing to compare and the test fails.

      8482 # check that clients "oscs" was evicted after "before"
      8483 check_clients_evicted() {
      8484         local before=$1
      8485         shift
      8486         local oscs=${@}
      8487         local osc
      8488         local rc=0
      8489 
      8490         for osc in $oscs; do
      8491                 ((rc++))
      8492                 echo "Check state for $osc"
      8493                 local evicted=$(do_facet client $LCTL get_param osc.$osc.state |
      8494                         tail -n 3 | awk -F"[ [,]" \
      8495                         '/EVICTED ]$/ { if (mx<$5) {mx=$5;} } END { print mx }')
      8496                 if (($? == 0)) && (($evicted > $before)); then
      8497                         echo "$osc is evicted at $evicted"
      8498                         ((rc--))
      8499                 fi
      8500         done
      8501 
      8502         [ $rc -eq 0 ] || error "client not evicted from OST"
      8503 }
      

      As the listing shows, when $evicted is empty the comparison on line 8496 fails with a syntax error, rc is never decremented, and the test fails.
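
      The failure mode can be reproduced outside the test framework. The following is a minimal sketch, not the patch that landed for this ticket: latest_eviction and evicted_after are hypothetical helpers, and the sample state text is an assumed approximation of what lctl get_param osc.*.state prints. It reuses the awk field separator from the listing (without the tail -n 3 step) and checks for an empty result before doing arithmetic on it.

```shell
#!/bin/bash
# Hypothetical helper, not part of test-framework.sh: print the newest
# EVICTED timestamp found in osc state text read from stdin.
# Prints nothing at all when no EVICTED entry is present.
latest_eviction() {
	awk -F"[ [,]" \
		'/EVICTED ]$/ { if (mx < $5) mx = $5 } END { if (mx != "") print mx }'
}

# Hypothetical helper: succeed only when an eviction time was found
# and it is later than the "before" timestamp.
evicted_after() {
	local before=$1
	local evicted=$2

	# An empty value would turn the test into (( > before )), which is
	# the syntax error seen in the log, so check for it first.
	[[ -n "$evicted" ]] || return 1
	(( evicted > before ))
}

# Assumed sample of the state parameter output (illustrative only).
sample_state='current_state: FULL
state_history:
 - [ 1517964930, EVICTED ]
 - [ 1517964931, FULL ]'

evicted=$(latest_eviction <<< "$sample_state")
if evicted_after 1517964926 "$evicted"; then
	echo "evicted at $evicted"
else
	echo "no eviction recorded after 1517964926"
fi
```

      With a guard like this, a state history that contains no EVICTED entry is reported as a missing eviction rather than as a shell syntax error. The test would still fail in that case, but for a reason that is easier to read in the log.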

      There is nothing obviously wrong in the console and dmesg logs, except that the client console log shows some LustreErrors:

      [12802.505434] Lustre: DEBUG MARKER: df
      [12805.745429] Lustre: Evicted from MGS (at 10.9.4.63@tcp) after server handle changed from 0x6728fd52bd37c36a to 0x6728fd52bd37c913
      [12805.745586] LustreError: 14144:0:(file.c:4097:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000007:0x1:0x0] error: rc = -5
      [12805.746256] LustreError: 14144:0:(lmv_obd.c:1387:lmv_statfs()) can't stat MDS #0 (lustre-MDT0000-mdc-ffff880039c89800), error -108
      [12805.746260] LustreError: 14144:0:(llite_lib.c:1785:ll_statfs_internal()) md_statfs fails: rc = -108
      [12805.760747] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param osc.lustre-OST0000-osc-ffff880039c89800.state
      

      Here are logs for a few other recovery-small test 26a failures:
      review-dne-part-1: https://testing.hpdd.intel.com/test_sets/4ec62074-f473-11e7-8c43-52540065bddc
      failover: https://testing.hpdd.intel.com/test_sets/bd9510da-f6cd-11e7-bd00-52540065bddc
      failover: https://testing.hpdd.intel.com/test_sets/01049780-ffef-11e7-a6ad-52540065bddc
      failover: https://testing.hpdd.intel.com/test_sets/4b4b3144-06d6-11e8-a7cd-52540065bddc

          Activity

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43237/
            Subject: LU-10632 tests: recovery-small test_26 idle_timeout
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: ff8f84a216b8ef432891220971b3ca6d5f1df39d

            James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43237
            Subject: LU-10632 tests: recovery-small test_26 idle_timeout
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: f46f05cf20c59dce4703181bbc24928c54717798

            pjones Peter Jones added a comment - Landed for 2.15

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/42006/
            Subject: LU-10632 tests: recovery-small test_26 idle_timeout
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: b4391fcdaf392a50bd1419342eca3b730c077ed2

            adilger Andreas Dilger added a comment - Test failed 16x in the past week.

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/42006
            Subject: LU-10632 tests: recovery-small test_26 idle_timeout
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 93e6a185980954e6df072c64831d4465080469e2

            adilger Andreas Dilger added a comment - +1 on master https://testing.whamcloud.com/test_sessions/6ef53a85-b6ba-478b-b42a-2b12c33135bb and 7 more in the past week
            adilger Andreas Dilger added a comment - +1 on master https://testing.whamcloud.com/test_sets/4a1b1ac5-51a3-4ecf-87a6-b45d4e24ced0 and 10 more in the past week.
            hornc Chris Horn added a comment - +1 on master https://testing.whamcloud.com/test_sets/92b3221e-d412-484a-abc7-53e1247a2d71
            hornc Chris Horn added a comment - +1 on master https://testing.whamcloud.com/test_sessions/b2965d82-a459-4188-a035-72180920afb6

            People

              Assignee: adilger Andreas Dilger
              Reporter: jamesanunez James Nunez (Inactive)