[LU-5921] conf-sanity test_41c: unexpected concurent OST mounts result, rc=0 rc2=1 Created: 14/Nov/14  Updated: 21/Apr/17  Resolved: 09/Dec/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Bruno Faccini (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
Related
is related to LU-5299 osd_start() LBUG when doing parallel ... Resolved
is related to LU-8168 conf-sanity test_41c: unexpected conc... Open
is related to LU-7442 conf-sanity test_41c: @@@@@@ FAIL: un... Resolved
Severity: 3
Rank (Obsolete): 16528

 Description   

This issue was created by maloo for nasf <fan.yong@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/078fe42c-6b6b-11e4-b1b4-5254006e85c2.

The sub-test test_41c failed with the following error:

unexpected concurent OST mounts result, rc=0 rc2=1

Please provide additional information about the failure here.

Info required for matching: conf-sanity 41c



 Comments   
Comment by Andreas Dilger [ 14/Nov/14 ]

Bruno, it isn't clear that this should be considered a test failure. I think any result where one mount succeeds and the other fails should be considered a PASS?
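
A minimal sketch of the pass criterion suggested above, assuming the test has the two mount return codes available as rc and rc2 (the names used in the error message). The helper name exactly_one_mount_succeeded is hypothetical and is not taken from conf-sanity.sh:

```shell
# Hypothetical helper, not part of the Lustre test framework:
# succeed only when exactly one of the two concurrent mounts succeeded.
exactly_one_mount_succeeded() {
	local rc=$1 rc2=$2

	# first mount succeeded, second mount failed
	if [ "$rc" -eq 0 ] && [ "$rc2" -ne 0 ]; then
		return 0
	fi
	# first mount failed, second mount succeeded
	if [ "$rc" -ne 0 ] && [ "$rc2" -eq 0 ]; then
		return 0
	fi
	# both succeeded or both failed
	return 1
}
```

Under this criterion the reported result (rc=0 rc2=1) would count as a PASS, while rc=0 rc2=0 or two non-zero codes would still be reported as failures.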

Also, it seems to me that this would be better with OBD_FAIL_RACE() instead of having the test clear the fail_loc before the second mount. Since the test is running in a VM that may be quite slow, the first mount may complete before the second one starts, in which case the second one simply refuses to mount in mount.lustre because it sees that the first filesystem has finished mounting.

Comment by Bruno Faccini (Inactive) [ 14/Nov/14 ]

Hello Andreas,
I am sorry, but I am not completely sure what you mean by "this would be better with OBD_FAIL_RACE() instead of having the test clear the fail_loc before the second mount". Do you think I should use the "OBD_RACE(OBD_FAIL_TGT_CONN_RACE);" capability at the beginning of target_handle_connect(), instead of setting/unsetting OBD_FAIL_TGT_DELAY_CONNECT to make the first mount stall for 10s and try to force a race?

OTOH, I could also change conf-sanity/test_41c to check only that exactly one of the two concurrent mounts fails, regardless of its return code?

Comment by Andreas Dilger [ 15/Nov/14 ]

Sorry, I meant OBD_RACE() might be better than waiting in userspace. That depends on whether you actually want the two processes to race or just to have one in the middle of mounting when the second one starts.
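
The second case mentioned above (one mount already in progress when the second one starts) can be mimicked in plain shell. This is only an illustrative sketch: fake_mount and run_overlapping_mounts are hypothetical stand-ins, an atomic mkdir plays the role of the "target already being mounted" guard, and none of it reflects how Lustre or OBD_RACE() actually serializes mounts.

```shell
# Hypothetical stand-in for a target mount. The atomic mkdir acts as the
# guard: whoever creates the directory first "owns" the target.
fake_mount() {
	local lock=$1 hold=$2

	mkdir "$lock" 2>/dev/null || return 1	# target already taken
	sleep "$hold"				# pretend the mount takes a while
	return 0
}

run_overlapping_mounts() {
	local lock pid rc rc2 i=0

	lock="$(mktemp -d)/target.lock"

	fake_mount "$lock" 1 &			# first mount, in the background
	pid=$!

	# Poll until the first mount is demonstrably in progress, instead of
	# sleeping for a fixed time and hoping it has started (gives up after
	# roughly 5 seconds).
	while [ ! -d "$lock" ] && [ "$i" -lt 50 ]; do
		sleep 0.1
		i=$((i + 1))
	done

	fake_mount "$lock" 0			# second mount, expected to fail
	rc2=$?
	wait "$pid"				# collect the first mount's status
	rc=$?

	rm -rf "$(dirname "$lock")"
	echo "rc=$rc rc2=$rc2"
}
```

Because the second attempt only starts once the guard exists, the first stand-in mount should win and the second should be refused, which is the same shape of result as the one reported in this ticket. A true race, where either side may win, would need both attempts released at the same instant, which is what a kernel-side rendezvous is meant to provide.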

Comment by Gerrit Updater [ 20/Nov/15 ]

Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/17302
Subject: LU-5921 tests: enhance server target mount race testing
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ec9a588356c09d83e9cb3d9d385fb9ef200edf1e

Comment by Bruno Faccini (Inactive) [ 20/Nov/15 ]

I have pushed patch #17302 to better set up a concurrent/racy situation for mounts of the same server target, and also to handle all mount errors instead of only EALREADY.

Comment by Gerrit Updater [ 09/Dec/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17302/
Subject: LU-5921 tests: enhance server target mount race testing
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 5023ca334069950ccece06db3c12104232b5ab71

Comment by Peter Jones [ 09/Dec/15 ]

Landed for 2.8

Generated at Sat Feb 10 01:55:40 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.