[LU-511] Race condition loading modules causes mount to fail Created: 19/Jul/11  Updated: 03/Oct/12  Resolved: 03/Oct/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Wally Wang (Inactive) Assignee: WC Triage
Resolution: Won't Fix Votes: 0
Labels: patch

Severity: 2
Rank (Obsolete): 6581

 Description   

On servers with more than one OST, the module load on the initial boot intermittently fails with
the following errors:

> [2010-12-09 07:17:55][c0-0c0s1n2]lustre: gave up waiting for init of module lov.
> [2010-12-09 07:17:55][c0-0c0s1n2]lustre: Unknown symbol lov_stripe_lock
> [2010-12-09 07:18:25][c0-0c0s1n2]lustre: gave up waiting for init of module lov.
> [2010-12-09 07:18:25][c0-0c0s1n2]lustre: Unknown symbol lov_test_and_clear_async_rc
> [2010-12-09 07:18:55][c0-0c0s1n2]lustre: gave up waiting for init of module lov.
> [2010-12-09 07:18:55][c0-0c0s1n2]lustre: Unknown symbol lov_stripe_unlock

The kernel (in module.c:use_module) aborts the module load because it takes too long. Because the
lov module load is aborted, one OST fails to mount. The other OSTs mount successfully.
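
To check whether a node hit this failure, the console log can be scanned for the timeout message quoted above. The following is a minimal sketch, not part of the original report: the function name `find_module_init_timeouts` is hypothetical, and it assumes the log text arrives on stdin in the same format as the lines shown here.

```shell
# Hypothetical helper (not from the original report): print the name of
# each module whose init the kernel gave up waiting for, one per line.
# Assumes log lines in the format quoted in this ticket, read from stdin.
find_module_init_timeouts() {
    sed -n 's/.*gave up waiting for init of module \([^ .]*\)\..*/\1/p' | sort -u
}
```

For the log excerpt in this ticket, piping the lines through `find_module_init_timeouts` would report `lov`.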

This is Oracle bug 24464.



 Comments   
Comment by Wally Wang (Inactive) [ 19/Jul/11 ]

Patch from Oracle bug 24464, attachment 32987 is in:

http://review.whamcloud.com/#change,1117

Comment by Andreas Dilger [ 05/Apr/12 ]

Wally, is this change still needed? If not, please abandon the change and close the bug.

Comment by Wally Wang (Inactive) [ 11/Apr/12 ]

Cray has put the modprobe in Cray's own startup script to avoid the problem. Lustre without this patch could still run into the problem. So do you still want to push it or close it for now?
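
The workaround described above can be sketched roughly as follows. This is an illustration only, not Cray's actual startup script: the function name `preload_modules`, the `MODPROBE` override variable, and the choice of which modules to load are all assumptions. The idea is to load the modules serially before any OST mount starts, so that concurrent mounts do not race on module initialization.

```shell
#!/bin/sh
# Hypothetical boot-time helper (not Cray's actual script): load modules
# one at a time before any OST mount is attempted.
# MODPROBE may be overridden, e.g. MODPROBE=echo for a dry run.
MODPROBE="${MODPROBE:-modprobe}"

preload_modules() {
    # Load each named module in order; stop at the first failure.
    for mod in "$@"; do
        $MODPROBE "$mod" || {
            echo "failed to load $mod" >&2
            return 1
        }
    done
}

# Example call, to be placed before the mount commands (module name
# taken from the log above; the right list for a given server is an
# assumption):
#   preload_modules lustre
```

Because `modprobe` resolves dependencies, loading the top-level module once up front should also bring in `lov` before the mounts begin.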

Comment by Keith Mannthey (Inactive) [ 13/Sep/12 ]

It seems we have gone years without this patch. Our autotest runs plenty of multi-OST servers, and I don't think we have seen this issue directly. Perhaps we don't need this patch?

Comment by Keith Mannthey (Inactive) [ 03/Oct/12 ]

The patch has been abandoned. Please reopen if it is still needed.
