  1. Lustre
  2. LU-10632

recovery-small test 26a fails with ‘client not evicted from OST’

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.12.7, Lustre 2.15.0
    • Affects Version/s: Lustre 2.11.0, Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.3, Lustre 2.14.0, Lustre 2.12.5
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      recovery-small test_26a fails with the error:

      recovery-small test_26a: @@@@@@ FAIL: client not evicted from OST 
      

      The following output comes from the failure at https://testing.hpdd.intel.com/test_sets/fe065d76-0baa-11e8-a7cd-52540065bddc

      The test_log shows a problem reading the osc.*.state parameter on the client:

      CMD: trevis-6vm6.trevis.hpdd.intel.com /usr/sbin/lctl get_param osc.lustre-OST0000-osc-ffff880039c89800.state
      /usr/lib64/lustre/tests/test-framework.sh: line 8496: ((: > 1517964926: syntax error: operand expected (error token is "> 1517964926")
      

      Test 26a calls the check_clients_evicted() routine to get the time of eviction and compare it against the “before” time. If the time at which the client osc was evicted cannot be obtained, there is nothing to compare against “before”, and the test fails.

      8482 # check that clients "oscs" was evicted after "before"
      8483 check_clients_evicted() {
      8484         local before=$1
      8485         shift
      8486         local oscs=${@}
      8487         local osc
      8488         local rc=0
      8489 
      8490         for osc in $oscs; do
      8491                 ((rc++))
      8492                 echo "Check state for $osc"
      8493                 local evicted=$(do_facet client $LCTL get_param osc.$osc.state |
      8494                         tail -n 3 | awk -F"[ [,]" \
      8495                         '/EVICTED ]$/ { if (mx<$5) {mx=$5;} } END { print mx }')
      8496                 if (($? == 0)) && (($evicted > $before)); then
      8497                         echo "$osc is evicted at $evicted"
      8498                         ((rc--))
      8499                 fi
      8500         done
      8501 
      8502         [ $rc -eq 0 ] || error "client not evicted from OST"
      8503 }
      

      If $evicted is empty, the arithmetic comparison on line 8496 fails with the syntax error shown above, rc is never decremented, and the test fails.
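
      The empty-value case can be illustrated outside the test framework. The following is only a sketch under stated assumptions: latest_eviction and evicted_after are hypothetical helpers that do not exist in test-framework.sh, and the sample history lines are made up, with a " - [ time, STATE ]" layout chosen so that the timestamp lands in field $5 of the awk program quoted above. It is not the fix that was applied for this ticket.

```shell
#!/bin/sh
# Hypothetical helper (not in test-framework.sh): print the newest EVICTED
# timestamp found on stdin, using the same awk program as the test.
latest_eviction() {
	awk -F"[ [,]" \
		'/EVICTED ]$/ { if (mx<$5) {mx=$5;} } END { print mx }'
}

# Hypothetical helper: succeed only if $1 is a non-empty integer greater
# than $2, so an empty value never reaches an arithmetic comparison.
evicted_after() {
	case $1 in
	'' | *[!0-9]*) return 1 ;;
	esac
	[ "$1" -gt "$2" ]
}

before=1517964926

# Made-up history lines; the " - [ time, STATE ]" layout is an assumption.
evicted=$(printf ' - [ 1517964900, FULL ]\n - [ 1517964930, EVICTED ]\n' |
	latest_eviction)
evicted_after "$evicted" "$before" && echo "evicted at $evicted"

# No EVICTED line: awk prints an empty string and the guard rejects it.
evicted=$(printf ' - [ 1517964900, FULL ]\n' | latest_eviction)
evicted_after "$evicted" "$before" || echo "no eviction time found"
```

      With a guard of this kind, a missing eviction record is reported as a plain test failure rather than as a shell syntax error, which makes the log easier to read but does not explain why no EVICTED entry was found.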

      Nothing is obviously wrong in the console and dmesg logs, except that the client console log shows some LustreErrors:

      [12802.505434] Lustre: DEBUG MARKER: df
      [12805.745429] Lustre: Evicted from MGS (at 10.9.4.63@tcp) after server handle changed from 0x6728fd52bd37c36a to 0x6728fd52bd37c913
      [12805.745586] LustreError: 14144:0:(file.c:4097:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000007:0x1:0x0] error: rc = -5
      [12805.746256] LustreError: 14144:0:(lmv_obd.c:1387:lmv_statfs()) can't stat MDS #0 (lustre-MDT0000-mdc-ffff880039c89800), error -108
      [12805.746260] LustreError: 14144:0:(llite_lib.c:1785:ll_statfs_internal()) md_statfs fails: rc = -108
      [12805.760747] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param osc.lustre-OST0000-osc-ffff880039c89800.state
      

      Here are logs for a few other recovery-small test 26a failures:
      review-dne-part-1: https://testing.hpdd.intel.com/test_sets/4ec62074-f473-11e7-8c43-52540065bddc
      failover: https://testing.hpdd.intel.com/test_sets/bd9510da-f6cd-11e7-bd00-52540065bddc
      failover: https://testing.hpdd.intel.com/test_sets/01049780-ffef-11e7-a6ad-52540065bddc
      failover: https://testing.hpdd.intel.com/test_sets/4b4b3144-06d6-11e8-a7cd-52540065bddc

            People

              Assignee: adilger Andreas Dilger
              Reporter: jamesanunez James Nunez (Inactive)
              Votes: 0
              Watchers: 9
