[LU-5921] conf-sanity test_41c: unexpected concurent OST mounts result, rc=0 rc2=1

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version: Lustre 2.8.0
    • None
    • None
    • 3
    • 16528

    Description

      This issue was created by maloo for nasf <fan.yong@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/078fe42c-6b6b-11e4-b1b4-5254006e85c2.

      The sub-test test_41c failed with the following error:

      unexpected concurent OST mounts result, rc=0 rc2=1
      

      Please provide additional information about the failure here.

      Info required for matching: conf-sanity 41c

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.8

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17302/
            Subject: LU-5921 tests: enhance server target mount race testing
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 5023ca334069950ccece06db3c12104232b5ab71

            bfaccini Bruno Faccini (Inactive) added a comment -

            I have pushed patch #17302 to better set up a concurrent/racy situation when mounting the same server target, and also to handle all mount errors instead of only EALREADY.

            gerrit Gerrit Updater added a comment -

            Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/17302
            Subject: LU-5921 tests: enhance server target mount race testing
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: ec9a588356c09d83e9cb3d9d385fb9ef200edf1e

            adilger Andreas Dilger added a comment -

            Sorry, I meant OBD_RACE() might be better than waiting in userspace. That depends on whether you actually want the two processes to race or just to have one in the middle of mounting when the second one starts.
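
The distinction drawn above, a true rendezvous versus a delay in userspace, can be illustrated with a small user-space analogy. This is only a sketch, not Lustre code: it assumes that an OBD_RACE()-style point behaves roughly like a two-party rendezvous (the first caller waits until a second caller reaches the same point), and it models that with Python's `threading.Barrier`. The names `race_same_target` and `mount` are hypothetical, and the lock merely stands in for "the target is already being mounted".

```python
import errno
import threading


def race_same_target(timeout: float = 5.0) -> list:
    """Run two simulated mounts of one target that meet at a rendezvous point."""
    rendezvous = threading.Barrier(2)  # assumption: stand-in for an OBD_RACE()-style point
    target_busy = threading.Lock()     # stand-in for "target is already being mounted"
    results = [None, None]

    def mount(idx: int) -> None:
        try:
            # Neither simulated mount proceeds until both have arrived,
            # so the race does not depend on how slow the host is.
            rendezvous.wait(timeout=timeout)
        except threading.BrokenBarrierError:
            results[idx] = errno.ETIMEDOUT
            return
        if target_busy.acquire(blocking=False):
            results[idx] = 0  # this mount wins and keeps the target
        else:
            results[idx] = errno.EALREADY  # the loser sees the target as busy

    threads = [threading.Thread(target=mount, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(race_same_target())
```

With a fixed delay instead of a rendezvous, a slow host can let the first mount finish before the second one starts, which is the situation the rendezvous avoids. Which of the two simulated mounts wins here is not deterministic; only the "one wins, one loses" outcome is.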

            bfaccini Bruno Faccini (Inactive) added a comment -

            Hello Andreas,
            I am sorry, but I am not completely sure what you mean by "this would be better with OBD_FAIL_RACE() instead of having the test clear the fail_loc before the second mount". Do you think I should use the "OBD_RACE(OBD_FAIL_TGT_CONN_RACE);" capability at the beginning of target_handle_connect(), instead of setting/unsetting OBD_FAIL_TGT_DELAY_CONNECT to have the first mount stall for 10s and try to force a race?

            OTOH, I could also change conf-sanity/test_41c to check only that exactly one of the two concurrent mounts fails, regardless of its return code.

            adilger Andreas Dilger added a comment -

            Bruno, it isn't clear that this should be considered a test failure. I think any result where one mount succeeds and the other fails should be considered a PASS?

            Also, it seems to me that this would be better with OBD_FAIL_RACE() instead of having the test clear the fail_loc before the second mount. Since the test is running in a VM that may be quite slow, this opens up the possibility of the first mount completing and the second one simply being refused by mount.lustre because it sees that the first filesystem has finished mounting.
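
The pass criterion suggested in this comment (one mount succeeds, the other fails, whatever the error code) can be written down as a small check. This is a hedged sketch in Python rather than the actual conf-sanity shell code; the function name `classify_mount_race` is hypothetical, and `rc`/`rc2` simply mirror the two return codes printed in the failure message.

```python
import errno


def classify_mount_race(rc: int, rc2: int) -> str:
    """Classify the outcome of two concurrent mounts of the same target.

    PASS means exactly one mount returned 0; the failing mount's error
    code is deliberately not inspected.
    """
    succeeded = [rc == 0, rc2 == 0].count(True)
    if succeeded == 1:
        return "PASS"
    if succeeded == 2:
        return "FAIL: both mounts succeeded"
    return "FAIL: both mounts failed"


if __name__ == "__main__":
    # The result reported in this ticket: rc=0 rc2=1
    print(classify_mount_race(0, 1))
    # A loser that reports EALREADY is treated the same way
    print(classify_mount_race(errno.EALREADY, 0))
```

Under this rule the reported result, rc=0 rc2=1, counts as a pass, since the check no longer requires the failing mount to return EALREADY specifically.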

            People

              bfaccini Bruno Faccini (Inactive)
              maloo Maloo