  Lustre / LU-1342

Test failure on sanity-quota test_29

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Lustre 2.1.4
    • Lustre 2.1.1
    • None
    • Server: rhel6.2 with lustre-2.1.1
      Client: rhel6.2 with lustre-client-2.1.1

      MDS/MGS: service360
      OSS1: service361
      OSS2: service362
      Client1: service333
      Client2: service334
    • 3
    • 4547

    Description

      This looks like a duplicate of LU-492, but my software already contains the LU-492 fix, and that patch did not help in my testing.

      The git source of our code is at https://github.com/jlan/lustre-nas/tree/nas-2.1.1

      The command I issued was:

        ONLY=29 cfg/nas.v3.sh SANITY_QUOTA

      The script files nas.v3.sh and ncli_nas.v3.sh are attached, as is the test log tarball sanity-quota-1335289931.tar.bz2.

      The failure is reproducible.

      test_29()
      {
      ...

      # actually send a RPC to make service at_current confined within at_max
      $LFS setquota -u $TSTUSR -b 0 -B $BLK_LIMIT -i 0 -I 0 $DIR || error "should succeed"
      <==== succeeded

      #define OBD_FAIL_MDS_QUOTACTL_NET 0x12e
      lustre_fail mds 0x12e
      <==== fine

      $LFS setquota -u $TSTUSR -b 0 -B $BLK_LIMIT -i 0 -I 0 $DIR & pid=$!
      <==== "setquota failed: Transport endpoint is not connected"

      echo "sleeping for 10 * 1.25 + 5 + 10 seconds"
      sleep 28
      ps -p $pid && error "lfs hadn't finished by timeout"
      <==== the process is still alive; it dies later due to the timeout.
      ...
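Rather than relying on a single fixed sleep, the backgrounded lfs process could be polled to see how long it actually stays alive on this setup. The sketch below is a hypothetical diagnostic, not part of sanity-quota.sh; the helper name wait_for_exit and its 1-second polling interval are assumptions.

```shell
#!/bin/bash
# Hypothetical diagnostic (not part of sanity-quota.sh): poll a background
# process once per second and print how many seconds it stayed alive.
# Returns 1 if the process is still running after LIMIT seconds.
wait_for_exit() {
    local pid=$1 limit=$2 waited=0
    # kill -0 sends no signal; it only checks that the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if (( waited >= limit )); then
            echo "pid $pid still running after ${limit}s" >&2
            return 1
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
    echo "$waited"
}
```

For example, right after `$LFS setquota ... & pid=$!`, running `wait_for_exit $pid 60` would print roughly how many seconds lfs took to exit. Calling it directly (not inside `$(...)`) keeps the polling in the shell that launched lfs, so the exited process is reaped promptly rather than lingering as a zombie that `kill -0` may still report as alive.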

      Is the "setquota failed: Transport endpoint is not connected" error expected? I saw it in the test log.
      Was it the result of "lustre_fail mds 0x12e", or did it mean the MDS never saw the lustre_fail request? Remote commands were sent via pdsh.
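One way to tell whether the MDS actually received the fail_loc from "lustre_fail mds 0x12e" is to read the value back on the MDS node. The sketch below assumes the MDS is reachable with `pdsh -N -w <host>` (the `REMOTE` variable is an assumption and can be overridden, e.g. with an ssh wrapper) and that `lctl get_param -n fail_loc` prints the value in decimal or 0x-prefixed hex; the helper name check_fail_loc is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read fail_loc back from a remote node and compare it
# with the value the test tried to set.
# REMOTE is an assumption; "pdsh -N" suppresses the "host:" output prefix.
REMOTE=${REMOTE:-"pdsh -N -w"}

check_fail_loc() {
    local host=$1 expected=$2 actual
    local re='^(0x[0-9a-fA-F]+|[0-9]+)$'
    actual=$($REMOTE "$host" lctl get_param -n fail_loc) || {
        echo "could not read fail_loc on $host" >&2
        return 2
    }
    # Accept either decimal or 0x-prefixed hex output.
    if ! [[ $actual =~ $re ]]; then
        echo "unexpected fail_loc output from $host: '$actual'" >&2
        return 2
    fi
    # Bash arithmetic understands both 302 and 0x12e, so compare numerically.
    if (( actual == expected )); then
        echo "$host: fail_loc=$actual (matches $expected)"
        return 0
    fi
    echo "$host: fail_loc=$actual, expected $expected" >&2
    return 1
}
```

For example, `check_fail_loc service360 0x12e` run just after `lustre_fail mds 0x12e` should report a match if the request reached the MDS; a mismatch would suggest the set_param sent over pdsh never took effect there.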

      When I tried "sleep 40" instead of "sleep 28", the lfs command timed
      out before the check and the test passed. Is the sleep formula
      "10 * 1.25 + 5 + 10 seconds" perhaps not long enough?
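For reference, the formula in the echo string works out to 10 * 1.25 + 5 + 10 = 12.5 + 5 + 10 = 27.5 seconds, which the test rounds up to "sleep 28". The excerpt does not say what each term stands for, so the sketch below (sleep_budget and its parameter names are hypothetical) only reproduces the arithmetic, using integer math because bash has no floating point.

```shell
#!/bin/bash
# Reproduces the arithmetic behind: echo "sleeping for 10 * 1.25 + 5 + 10 seconds"
# Bash arithmetic is integer-only, so the 1.25 factor is passed in hundredths
# (125) and the product is rounded up with ceiling division.
sleep_budget() {
    local base=$1 factor_hundredths=$2 extra1=$3 extra2=$4
    # ceil(base * factor / 100) + extra1 + extra2
    echo $(( (base * factor_hundredths + 99) / 100 + extra1 + extra2 ))
}

budget=$(sleep_budget 10 125 5 10)
echo "budget from the formula: ${budget}s"                 # 28, the value used by "sleep 28"
echo "extra time given by sleep 40: $(( 40 - budget ))s"   # 12
```

Since "sleep 40" lets the test pass, the backgrounded lfs apparently needs somewhere between about 28 and 40 seconds to exit in this environment, i.e. more than the budget the formula allows.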

      Attachments

        1. nas.v3.sh (3 kB)
        2. ncli_nas.v3.sh (2 kB)
        3. sanity-quota-1335289931.tar.bz2 (4.61 MB)

          People

            Assignee: Zhenyu Xu (bobijam)
            Reporter: Jay Lan (jaylan), inactive