sanity test_17o failed with 'stat file should fail'

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.10.0
    • Lustre 2.10.0
    • None
    • review-dne
    • 3
    • 9223372036854775807

    Description

      sanity test 17o is failing with the error message 'stat file should fail'.
      

      sanity test 17o touches a file, fails the MDS and then checks to see if the file exists. Here’s the code:

              local WDIR=$DIR/${tdir}o
              local mdt_index
              local rc=0

              test_mkdir -p $WDIR
              mdt_index=$($LFS getstripe -M $WDIR)
              mdt_index=$((mdt_index+1))

              touch $WDIR/$tfile

              #fail mds will wait the failover finish then set
              #following fail_loc to avoid interfer the recovery process.
              fail mds${mdt_index}

              #define OBD_FAIL_OSD_LMA_INCOMPAT 0x194
              do_facet mds${mdt_index} lctl set_param fail_loc=0x194
              ls -l $WDIR/$tfile && rc=1
              do_facet mds${mdt_index} lctl set_param fail_loc=0
              [[ $rc -ne 0 ]] && error "stat file should fail"
      

      There’s nothing interesting in the console logs to explain why the file exists.

      So far, I have only seen this failure in review-dne test sessions.

      This test failed with this error message a few times last year, stopped failing, and recently started failing again. Here are the most recent failures:
      2017-03-25 - https://testing.hpdd.intel.com/test_sets/438b1b98-116a-11e7-8920-5254006e85c2
      2017-03-25 - https://testing.hpdd.intel.com/test_sets/0053e2ca-1146-11e7-8920-5254006e85c2
      2017-03-06 - https://testing.hpdd.intel.com/test_sets/5dcc4dd2-0293-11e7-8394-5254006e85c2
      2016-10-21 - https://testing.hpdd.intel.com/test_sets/b1d77bc4-980b-11e6-9e8a-5254006e85c2
      2016-10-19 - https://testing.hpdd.intel.com/test_sets/06c986de-961f-11e6-9722-5254006e85c2
      2016-09-27 - https://testing.hpdd.intel.com/test_sets/8c0c48aa-85b1-11e6-91aa-5254006e85c2

          Activity

            [LU-9259] sanity test_17o failed with 'stat file should fail'

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26225/
            Subject: LU-9259 tests: set fail_loc on the right MDT
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a0a812d2b019b97356b0d6a1a8debd7d46fed00b

            gerrit Gerrit Updater added a comment

            Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/26225
            Subject: LU-9259 tests: set fail_loc on the right MDT
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e6a944d82642bc6994d4ee5e44ff9bc25604ce2f

            gerrit Gerrit Updater added a comment

            Judging from the logs (at least those from 2017-03-06), the fail_loc check is not being hit. I'm not sure whether that is because the file is not being created on the MDS where the fail_loc is set, or because the file attributes are cached on the client and the MDS isn't involved in the lookup.

            The test itself could be improved a bit:

                    test_mkdir -p $WDIR
                    mdt_index=$($LFS getstripe -M $WDIR)
                    mdt_index=$((mdt_index+1))
            
                    touch $WDIR/$tfile
            

            The mdt_index should be obtained from the file after it is created rather than from the directory, since the directory is striped 2 ways by default, and "getstripe -M" on a striped directory returns only the stripe0/master index, which isn't necessarily where the inode will be allocated (that depends on the filename and hash function).
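
A minimal sketch of this suggestion, assuming the variables from the test quoted above ($LFS, $WDIR, $tfile). The helper name facet_for_path is hypothetical and not part of test-framework.sh, and "-M" is simply the option spelling used in this ticket; other lfs releases may spell it differently.

```shell
# Sketch only: choose the MDS facet from the file itself, not its parent directory.
# facet_for_path is a hypothetical helper; LFS defaults to "lfs" if unset.

# Print the facet name (mds1, mds2, ...) for the MDT holding the inode of $1.
# "getstripe -M" reports a 0-based MDT index, while facet names are 1-based.
facet_for_path() {
    local path=$1
    local mdt_index

    mdt_index=$(${LFS:-lfs} getstripe -M "$path") || return 1
    echo "mds$((mdt_index + 1))"
}
```

In the test this would be called only after "touch $WDIR/$tfile", for example "fail $(facet_for_path $WDIR/$tfile)", so that the failover and the fail_loc target the MDT that actually holds the file.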

            Also, the client MDC DLM lock cache should be flushed so that the client is sure to do a lookup on the MDS.
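
One way the flush could look, as a hedged sketch: clearing the lru_size parameter of the client's MDC lock namespaces drops the unused cached locks, which is what the test framework's cancel_lru_locks helper does for a given namespace. The function name flush_mdc_locks is hypothetical, and LCTL is assumed to default to "lctl".

```shell
# Sketch only: drop the client's cached MDC DLM locks so that the next
# lookup has to go to the MDS. flush_mdc_locks is a hypothetical helper.
flush_mdc_locks() {
    # The pattern is quoted so that lctl, not the shell, expands the wildcards.
    ${LCTL:-lctl} set_param -n "ldlm.namespaces.*mdc*.lru_size=clear"
}
```

In the test this would run on the client after the file is created and before the "ls -l", so the stat cannot be satisfied from cached attributes.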

            adilger Andreas Dilger added a comment

            Hi Fan Yong,

            Can you please have a look into this issue?

            Thanks.
            Joe

            jgmitter Joseph Gmitter (Inactive) added a comment

            People

              yong.fan nasf (Inactive)
              jamesanunez James Nunez (Inactive)