[LU-1840] ldlm_resource_get returns to user space without dropping lr_lvb_mutex Created: 05/Sep/12  Updated: 13/Mar/13  Resolved: 05/Nov/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Minor
Reporter: Peng Tao Assignee: Nathaniel Clark
Resolution: Fixed Votes: 0
Labels: patch

Severity: 3
Rank (Obsolete): 6335

 Description   

I got the following warning:

================================================
[ BUG: lock held when returning to user space! ]
3.6.0-rc3 #1 Tainted: G O
------------------------------------------------
bash/1868 is leaving the kernel with locks still held!
1 lock held by bash/1868:
 #0:  (&res->lr_lvb_mutex){......}, at: [<ffffffffa041a18c>] ldlm_resource_get+0x2cc/0x880 [ptlrpc]
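
For readers unfamiliar with this class of lockdep report, the following is a minimal userspace sketch of the general pattern: a function takes a mutex and then returns on one path without releasing it. It is not the Lustre code and not the submitted patch. The names demo_resource, demo_get_buggy, demo_get_fixed and demo_lock_is_held are hypothetical, and pthread mutexes stand in for the kernel mutex API.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a resource protected by a mutex. */
struct demo_resource {
    pthread_mutex_t lock;
    int value;
};

/* Buggy pattern: the early-return path skips the unlock. */
int demo_get_buggy(struct demo_resource *res, bool need_init)
{
    pthread_mutex_lock(&res->lock);
    if (!need_init)
        return 0;               /* BUG: returns with res->lock still held */
    res->value = 42;
    pthread_mutex_unlock(&res->lock);
    return 0;
}

/* Corrected pattern: every path goes through a single unlock. */
int demo_get_fixed(struct demo_resource *res, bool need_init)
{
    pthread_mutex_lock(&res->lock);
    if (need_init)
        res->value = 42;
    pthread_mutex_unlock(&res->lock);
    return 0;
}

/* Reports whether the mutex is still locked, probing with trylock. */
bool demo_lock_is_held(struct demo_resource *res)
{
    int rc = pthread_mutex_trylock(&res->lock);

    /* For a valid default mutex, trylock either succeeds or reports EBUSY. */
    assert(rc == 0 || rc == EBUSY);
    if (rc == 0) {
        pthread_mutex_unlock(&res->lock);
        return false;
    }
    return true;
}
```

Calling demo_get_buggy(res, false) leaves the mutex locked, which is the userspace analogue of the condition lockdep reports above. demo_get_fixed releases it on every path.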



 Comments   
Comment by Peng Tao [ 05/Sep/12 ]

patch submitted:
http://review.whamcloud.com/3883

Comment by Nathaniel Clark [ 05/Nov/12 ]

Change has been successfully cherry-picked as e6d5d5508231fca83cb306af69a877a7101de37a

Comment by Artem Blagodarenko (Inactive) [ 13/Mar/13 ]

This bug is still present in 2.1 (without the LU-1840 patch applied). This BUG occurred two days ago:

Feb 16 02:25:04 rhel6-64 kernel: Lustre: MGS: Regenerating lustre-MDTffff log by user request.
Feb 16 02:25:04 rhel6-64 kernel: Lustre: Setting parameter lustre-MDT0000-mdtlov.lov.stripesize in log lustre-MDT0000
Feb 16 02:25:04 rhel6-64 kernel: Lustre: Setting parameter lustre-MDT0000-mdtlov.lov.stripecount in log lustre-MDT0000
Feb 16 02:25:04 rhel6-64 kernel: Lustre: Skipped 1 previous similar message

=========================
[ BUG: held lock freed! ]
-------------------------
ll_mgs_02/16035 is freeing memory ffff8800a4da4dc0-ffff8800a4da4e57, with a lock still held there!
 (&res->lr_lvb_mutex){+.+...}, at: [<ffffffffa076e509>] ldlm_resource_get+0x2a9/0x790 [ptlrpc]
1 lock held by ll_mgs_02/16035:
 #0:  (&res->lr_lvb_mutex){+.+...}, at: [<ffffffffa076e509>] ldlm_resource_get+0x2a9/0x790 [ptlrpc]

stack backtrace:
Pid: 16035, comm: ll_mgs_02 Tainted: G        W  ----------------   2.6.32-131.17.1-osg #0
Call Trace:
 [<ffffffff810a89b4>] ? debug_check_no_locks_freed+0x164/0x170
 [<ffffffff810a3677>] ? debug_mutex_init+0x27/0x50
 [<ffffffff81095751>] ? __mutex_init+0x61/0x70
 [<ffffffffa076e4fe>] ? ldlm_resource_get+0x29e/0x790 [ptlrpc]
 [<ffffffff814fb880>] ? _spin_unlock_irqrestore+0x40/0x80
 [<ffffffffa0767e45>] ? ldlm_lock_create+0x55/0xaa0 [ptlrpc]
 [<ffffffffa078f194>] ? ldlm_handle_enqueue0+0x164/0xf60 [ptlrpc]
 [<ffffffffa078fff6>] ? ldlm_handle_enqueue+0x66/0x70 [ptlrpc]
 [<ffffffffa0790000>] ? ldlm_server_completion_ast+0x0/0x620 [ptlrpc]
 [<ffffffffa0790620>] ? ldlm_server_blocking_ast+0x0/0x860 [ptlrpc]
 [<ffffffffa0c10b9a>] ? mgs_handle+0x68a/0x1800 [mgs]
 [<ffffffffa04e3fe1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa04df424>] ? libcfs_id2str+0x74/0xb0 [libcfs]
 [<ffffffffa07bc6d4>] ? ptlrpc_server_handle_request+0x474/0x1050 [ptlrpc]
 [<ffffffffa04d85ae>] ? cfs_timer_arm+0xe/0x10 [libcfs]
 [<ffffffffa04e5bff>] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
 [<ffffffffa07b5ee2>] ? ptlrpc_wait_event+0xb2/0x2c0 [ptlrpc]
 [<ffffffffa07b8156>] ? ptlrpc_server_handle_req_in+0xa66/0xcc0 [ptlrpc]
 [<ffffffffa07bec74>] ? ptlrpc_main+0x5c4/0xcd0 [ptlrpc]
 [<ffffffff810a85fd>] ? trace_hardirqs_on_caller+0x14d/0x190
 [<ffffffffa07be6b0>] ? ptlrpc_main+0x0/0xcd0 [ptlrpc]
 [<ffffffff8100c28a>] ? child_rip+0xa/0x20
 [<ffffffff8100bbd0>] ? restore_args+0x0/0x30
 [<ffffffffa07be6b0>] ? ptlrpc_main+0x0/0xcd0 [ptlrpc]
 [<ffffffff8100c280>] ? child_rip+0x0/0x20
Lustre: 16047:0:(osd_handler.c:234:osd_push_ctxt()) ucred is not initialized
Lustre: 16047:0:(osd_handler.c:234:osd_push_ctxt()) ucred is not initialized
Feb 16 02:25:04 rhel6-64 kernel: