[LU-6660] interop DNE2 test between 2.5/2.7 clients and DNE2 servers

Details

    • Task
    • Resolution: Fixed
    • Blocker
    • Lustre 2.8.0

    Description

      This test verifies that 2.5/2.7 clients can work with DNE2 servers.
      For the DNE2 server, this build can be used:
      https://build.hpdd.intel.com/job/lustre-reviews/32529/

      The 2.5/2.7 clients can use the most up-to-date RPMs from the build server:
      https://build.hpdd.intel.com

      For now, we can check these two things:

      1. Sanity should pass, as in other interop tests.
      2. Check whether "lfs mkdir" and "lfs mv" work in this environment.
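
As a rough illustration of the second check, the sketch below only builds the two command lines and does not run them. The mount point, directory name and MDT indices are hypothetical, and the `-i` (MDT index for `lfs mkdir`) and `-M` (target MDT for `lfs mv`) flags are assumptions about the lfs utility of this era; confirm them with `lfs help` on the client under test.

```python
import shlex


def dne_smoke_commands(mount, mdt_index=1, name="interop_dir"):
    """Build command lines for a remote-directory create and a migrate.

    Nothing is executed here; the caller decides how to run the commands.
    """
    target = f"{mount.rstrip('/')}/{name}"
    return [
        # Create a directory on a specific MDT (assumed flag: -i).
        ["lfs", "mkdir", "-i", str(mdt_index), target],
        # Migrate that directory back to MDT0 (assumed flag: -M).
        ["lfs", "mv", "-M", "0", target],
    ]


if __name__ == "__main__":
    for cmd in dne_smoke_commands("/mnt/lustre"):
        print(shlex.join(cmd))
```

Both commands would be run from the 2.5 or 2.7 client against the DNE2 servers, and the exit status of each is what the check is interested in.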

          Activity

            laisiyao Lai Siyao added a comment -

            yes, Andreas.

            I'll do it later.

            adilger Andreas Dilger added a comment - - edited

            Lai, to confirm: the test failures in your last comment are when running with patch 15323 applied, and without that patch there are more failures due to -EREMOTE being returned to the client?

            It might be useful to submit patches to b2_5_fe and b2_7_fe to skip the SOM test 132.

            laisiyao Lai Siyao added a comment -

            2.5 client failed tests:
            132: failed because SOM is removed from master.
            151, 156: failed because the lproc stats format changed; after I changed the tests to use the DNE2 format, they pass.
            154a: failed because of LU-5424, which is not in 2.5; the 2.7 test script adds a version check.

            2.7 client failed tests:
            132: failed because SOM is removed from master.
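
One way to use the list above when reading later runs is to separate the failures already explained here from new ones. The sketch below is only an illustration: the table copies the test numbers from this comment, while the function name and the idea of keying by client version are hypothetical. Note that 151 and 156 are listed for 2.5 even though they pass once the tests use the DNE2 stats format.

```python
# Failures reported in this comment, keyed by client version.
KNOWN_INTEROP_FAILURES = {
    "2.5": {"132", "151", "154a", "156"},
    "2.7": {"132"},
}


def unexpected_failures(client_version, failed_tests):
    """Return the failed tests that this comment does not already explain."""
    known = KNOWN_INTEROP_FAILURES.get(client_version, set())
    return sorted(set(failed_tests) - known)
```

Anything this returns for a run would still need its own investigation.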


            gerrit Gerrit Updater added a comment -

            Lai Siyao (lai.siyao@intel.com) uploaded a new patch: http://review.whamcloud.com/15323
            Subject: LU-6660 rename: DNE2 should return -EXDEV upon remote rename
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: b08334081bbfb07958cc51ceda785c1bff784252

            laisiyao Lai Siyao added a comment -

            Indeed, this was introduced by http://review.whamcloud.com/#/c/12282/47 for LU-3537, which adds support for cross-MDT rename: once a cross-MDT rename is detected, the DNE2 MDS returns -EREMOTE, and LMV retries with the correct MDS. I'll add interop code to address this issue.


            adilger Andreas Dilger added a comment -

            I'm not sure I understand the issue with 2.7 clients and -EREMOTE. 2.7 clients DO support striped directories, but renames across stripes should return -EXDEV so that user space tools can handle this properly. If the Async commit causes clients to return -EREMOTE to applications then "mv" and other tools will fail.
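
To illustrate why the errno matters to user space, here is a minimal sketch of the fallback a tool like "mv" performs. It is not the coreutils code, it handles regular files only, and the `rename` parameter is a hypothetical hook added so the behaviour can be exercised without a second filesystem.

```python
import errno
import os
import shutil


def move_file(src, dst, rename=os.rename):
    """Rename src to dst, falling back to copy-then-delete on EXDEV."""
    try:
        rename(src, dst)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            # Any other errno (for example EREMOTE) is not a signal to
            # fall back, so it reaches the caller as a plain failure.
            raise
        # EXDEV means "cannot rename in place": copy the data, then
        # remove the source.
        shutil.copy2(src, dst)
        os.unlink(src)
```

With this shape, a rename that fails with -EXDEV is recovered transparently, while one that fails with -EREMOTE is reported to the user as an error.
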
            laisiyao Lai Siyao added a comment -

            https://testing.hpdd.intel.com/test_sets/bd8ddb6c-0e27-11e5-a0ac-5254006e85c2 The sanity test failures 24b, 24f, 24x and 27z appear to be caused by the changed DNE2 rename semantics: for a striped directory, a 2.7 client fails with -EREMOTE. This should not be an issue since this is a striped directory. The 103b failure looks like a real issue, and I have reproduced it locally. There appears to be an in-flight OSP RPC, so the OSP device refcount is not zero at umount. I'll continue investigating.

            As for 2.5, a wrong build number was specified in Test-Parameters; I'll retrigger autotest.

            laisiyao Lai Siyao added a comment -

            Richard, it failed because a 2.5 client on el6.6 is not supported by autotest; I've changed to b2_5_fe on Yujian's advice.

            Local test results look good so far, and I'll also run autotest to make the results clearer.


            rhenwood Richard Henwood (Inactive) added a comment -

            Hi Lai,

            It looks like Autotest is currently failing early. Can you confirm that this test runs successfully on your local machine?

            gerrit Gerrit Updater added a comment -

            Lai Siyao (lai.siyao@intel.com) uploaded a new patch: http://review.whamcloud.com/15099
            Subject: LU-6660 dne: interop with 2.5 clients
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 76c6df4f81b312145f1db674d89abc8a30e0281a

            adilger Andreas Dilger added a comment -

            It would be best to run this as much as possible via Test-Parameters: in Gerrit, with alwaysuploadlogs, so we get a good record of the tests run.

            People

              laisiyao Lai Siyao
              di.wang Di Wang