[LU-6660] interop DNE2 test between 2.5/2.7 clients and DNE2 servers Created: 28/May/15  Updated: 31/Aug/15  Resolved: 31/Aug/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.8.0

Type: Task Priority: Blocker
Reporter: Di Wang Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocker
is blocking LU-6858 Demonstrate DNE2 functionality Open
Related
is related to LU-3531 DNE2: striped directory Resolved
Rank (Obsolete): 9223372036854775807

 Description   

This test is to verify 2.5/2.7 clients can work with DNE2 servers.
The DNE2 server can use this build:
https://build.hpdd.intel.com/job/lustre-reviews/32529/

The 2.5/2.7 clients can use the most up-to-date RPMs from the build server:
https://build.hpdd.intel.com

Right now, we can check these two things:

1. Sanity should pass, as in other interop tests.
2. Check whether "lfs mkdir" and "lfs mv" work in this environment.
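
As a rough illustration of item 2, the check could be scripted along the following lines. This is only a sketch: the mount point, directory name and MDT indices are hypothetical, and it assumes the client's lfs accepts "lfs mkdir -i <mdt_index>" and "lfs mv -M <mdt_index>". By default it only builds the command lines (dry run) and does not touch a filesystem.

```python
import subprocess


def dne_smoke_commands(parent, mdt_index=1, stripe_count=None):
    """Build lfs command lines for a minimal DNE check.

    All paths and indices are illustrative assumptions, not values
    taken from the ticket.
    """
    test_dir = f"{parent}/dne_interop_dir"
    mkdir_cmd = ["lfs", "mkdir", "-i", str(mdt_index)]
    if stripe_count is not None:
        # Striped directories are a DNE2 feature; an older client may reject -c.
        mkdir_cmd += ["-c", str(stripe_count)]
    mkdir_cmd.append(test_dir)
    # Migrate the directory back to MDT0 with "lfs mv".
    mv_cmd = ["lfs", "mv", "-M", "0", test_dir]
    return [mkdir_cmd, mv_cmd]


def run_smoke(parent, dry_run=True, **kwargs):
    """Return (command, return code) pairs; the return code is None in a dry run."""
    results = []
    for cmd in dne_smoke_commands(parent, **kwargs):
        rc = None if dry_run else subprocess.run(cmd).returncode
        results.append((cmd, rc))
    return results
```

Calling run_smoke("/mnt/lustre", dry_run=False) on a mounted client would execute both commands and report their exit codes; a non-zero code on either one is what this part of the test is meant to catch.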



 Comments   
Comment by Andreas Dilger [ 29/May/15 ]

It would be best to run this as much as possible via Test-Parameters: in Gerrit, with alwaysuploadlogs set, so that we get a good record of the tests run.

Comment by Gerrit Updater [ 02/Jun/15 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: http://review.whamcloud.com/15099
Subject: LU-6660 dne: interop with 2.5 clients
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 76c6df4f81b312145f1db674d89abc8a30e0281a

Comment by Richard Henwood (Inactive) [ 03/Jun/15 ]

Hi Lai;

It looks like Autotest is failing early currently. Can you confirm that this test runs successfully on your local machine?

Comment by Lai Siyao [ 04/Jun/15 ]

Richard, it failed because a 2.5 client on el6.6 is not supported by autotest. I've changed to b2_5_fe on Yujian's advice.

Local test results look good so far, and I'll also have autotest run to make the result clearer.

Comment by Lai Siyao [ 10/Jun/15 ]

https://testing.hpdd.intel.com/test_sets/bd8ddb6c-0e27-11e5-a0ac-5254006e85c2 Sanity tests 24b, 24f, 24x and 27z appear to fail because DNE2 rename changed semantics: on a striped directory, a 2.7 client now fails with -EREMOTE. This should not be an issue, since it only concerns striped directories. The 103b failure looks like a real issue, and I have reproduced it locally: there appears to be an in-flight OSP RPC, so the OSP device refcount is non-zero at umount. I'll continue investigating.

As for 2.5, a wrong build number was specified in Test-Parameters; I'll retrigger autotest.

Comment by Andreas Dilger [ 11/Jun/15 ]

I'm not sure I understand about 2.7 clients and -EREMOTE? 2.7 clients DO support striped directories, but renames across stripes should return -EXDEV so that user space tools can handle this properly. If the Async commit causes clients to return -EREMOTE to applications then "mv" and other tools will fail.
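
To illustrate why the errno matters to user space, here is a minimal sketch of the usual pattern, not the actual code of "mv": a tool tries rename(), falls back to copy-and-delete only on EXDEV, and treats any other error (EREMOTE included) as fatal. The function name and the injectable rename parameter are hypothetical, added only so the behaviour can be exercised without a Lustre mount.

```python
import errno
import os
import shutil


def move_file(src, dst, rename=os.rename):
    """Move a regular file, falling back to copy + unlink only on EXDEV."""
    try:
        rename(src, dst)
        return "renamed"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            # Any other errno, e.g. EREMOTE, is propagated as a hard failure.
            raise
    # EXDEV: the rename crossed a boundary the filesystem cannot handle
    # atomically, so copy the data and remove the source instead.
    shutil.copy2(src, dst)
    os.unlink(src)
    return "copied"
```

With this pattern, a rename that returns -EXDEV still succeeds from the user's point of view, while one that returns -EREMOTE surfaces as an error.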

Comment by Lai Siyao [ 11/Jun/15 ]

Indeed, this was introduced by http://review.whamcloud.com/#/c/12282/47 for LU-3537, which adds support for cross-MDT rename: once a cross-MDT rename is detected, the DNE2 MDS returns -EREMOTE, and LMV retries with the correct MDS. I'll add interop code to address this issue.

Comment by Gerrit Updater [ 17/Jun/15 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: http://review.whamcloud.com/15323
Subject: LU-6660 rename: DNE2 should return -EXDEV upon remote rename
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b08334081bbfb07958cc51ceda785c1bff784252

Comment by Lai Siyao [ 01/Jul/15 ]

2.5 client failed tests:
132: failed because SOM is removed from master.
151, 156: failed because the lproc stats format changed; after I switched to the DNE2 format, the tests pass.
154a: failed because of LU-5424, which is not in 2.5; the 2.7 test script adds a version check.

2.7 client failed tests:
132: failed because SOM is removed from master.

Comment by Andreas Dilger [ 02/Jul/15 ]

Lai, to confirm - the test failures in your last comment are when running with patch 15323 applied, and without that patch there are more failures due to -EREMOTE being returned to the client?

It might be useful to submit patches to b2_5_fe and b2_7_fe to skip the SOM test 132.

Comment by Lai Siyao [ 03/Jul/15 ]

yes, Andreas.

I'll do it later.

Comment by Richard Henwood (Inactive) [ 16/Jul/15 ]

Lai,

Can this ticket be closed?

Comment by Lai Siyao [ 17/Jul/15 ]

Richard, please wait a while: once all the patches have landed, I'd like to test again to verify.

Comment by Gerrit Updater [ 19/Jul/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15323/
Subject: LU-6660 rename: DNE2 should return -EXDEV upon remote rename
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 31234e608fa156761b2d8c48ff60cba875dd4832

Comment by Jessica A. Popp (Inactive) [ 03/Aug/15 ]

Lai - Have you had a chance to verify now that the patch has landed?

Comment by James A Simmons [ 03/Aug/15 ]

Don't we need to back port it to 2.7 and 2.5?

Comment by Jian Yu [ 03/Aug/15 ]

Hi James,

Lustre 2.7 and 2.5 do not contain the patch http://review.whamcloud.com/12282 for LU-3537, which introduced the issue. So, we do not need to back-port http://review.whamcloud.com/15323.

Di or Lai, could you please confirm this?

Comment by Lai Siyao [ 05/Aug/15 ]

Jian, yes, you're right.

Comment by Peter Jones [ 31/Aug/15 ]

Fix was landed for 2.8

Generated at Sat Feb 10 02:02:08 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.