[LU-3810] client_obd_cleanup() LBUGs if called immediately after client_obd_setup() Created: 21/Aug/13  Updated: 14/Nov/13  Resolved: 14/Nov/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: Lustre 2.6.0

Type: Bug Priority: Major
Reporter: John Hammond Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: mdc, mgc, obd, osc, osp, patch

Severity: 3
Rank (Obsolete): 9843

 Description   

In osc_setup() any failure after client_obd_setup() will trigger an LBUG in client_obd_cleanup().

LustreError: 14072:0:(ldlm_lib.c:480:client_obd_cleanup()) ASSERTION( obddev->u.cli.cl_import == ((void *)0) ) failed: 
LustreError: 14072:0:(ldlm_lib.c:480:client_obd_cleanup()) LBUG
Pid: 14072, comm: llog_process_th

Call Trace:
 [<ffffffffa04ec895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa04ece97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa07e46db>] client_obd_cleanup+0x11b/0x120 [ptlrpc]
 [<ffffffffa0a9535f>] osc_setup+0x1ef/0x230 [osc]
 [<ffffffffa0aaad0a>] osc_device_alloc+0x20a/0x370 [osc]
 [<ffffffffa0680e87>] obd_setup+0x1d7/0x2e0 [obdclass]
 [<ffffffffa0681198>] class_setup+0x208/0x870 [obdclass]
 [<ffffffffa0687a6c>] class_process_config+0xc7c/0x1c30 [obdclass]
 [<ffffffffa068241b>] ? lustre_cfg_new+0x3fb/0x650 [obdclass]
 [<ffffffffa0689bbb>] class_config_llog_handler+0xaab/0x17d0 [obdclass]
 [<ffffffff81510afe>] ? mutex_lock+0x1e/0x50
 [<ffffffffa0648499>] llog_process_thread+0x8d9/0xcc0 [obdclass]
 [<ffffffffa0693ccf>] ? keys_fill+0x6f/0x190 [obdclass]
 [<ffffffffa0649325>] llog_process_thread_daemonize+0x45/0x70 [obdclass]
 [<ffffffffa06492e0>] ? llog_process_thread_daemonize+0x0/0x70 [obdclass]
 [<ffffffff81096936>] kthread+0x96/0xa0
 [<ffffffff8100c0ca>] child_rip+0xa/0x20
 [<ffffffff810968a0>] ? kthread+0x0/0xa0
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

This can be reproduced by replacing

handler = ptlrpcd_alloc_work(cli->cl_import, brw_queue_work, cli);

with

handler = ERR_PTR(-ENOMEM);

or by similar fault injection in osc_quota_setup().
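
The failure mode above can be modeled outside the kernel. The following is only a minimal userspace sketch, not Lustre code: every `mock_*` name is hypothetical, and it assumes (based solely on the assertion text in the log) that cleanup requires `cl_import` to be NULL. It shows that an error path which releases the import first satisfies that precondition, while one that does not would trip the assertion. Whether the actual patch takes this approach is not stated in this ticket.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real Lustre structures. */
struct mock_import { int refcount; };
struct mock_client { struct mock_import *cl_import; };

/* Models client_obd_setup(): on success the client owns an import. */
static int mock_client_setup(struct mock_client *cli)
{
	cli->cl_import = calloc(1, sizeof(*cli->cl_import));
	if (cli->cl_import == NULL)
		return -ENOMEM;
	cli->cl_import->refcount = 1;
	return 0;
}

/* Assumed helper: releases the import and clears the pointer. */
static void mock_import_destroy(struct mock_client *cli)
{
	free(cli->cl_import);
	cli->cl_import = NULL;
}

/* Models the precondition asserted in client_obd_cleanup(). */
static bool mock_cleanup_allowed(const struct mock_client *cli)
{
	return cli->cl_import == NULL;
}

/*
 * Models osc_setup() with a fault injected right after client setup.
 * release_import selects an error path that drops the import (true)
 * or one that leaves it in place, as in the reported bug (false).
 */
static int mock_osc_setup(struct mock_client *cli, bool inject_fault,
			  bool release_import)
{
	int rc = mock_client_setup(cli);

	if (rc != 0)
		return rc;

	if (inject_fault) {
		rc = -ENOMEM;	/* same error as the ERR_PTR(-ENOMEM) injection */
		if (release_import)
			mock_import_destroy(cli);
	}
	return rc;
}
```

Calling `mock_osc_setup(&cli, true, false)` leaves `cli.cl_import` set, so `mock_cleanup_allowed(&cli)` is false, which corresponds to the LBUG; with `release_import` set to true the check passes.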

mgc_setup() has the same issue.

mdc_setup() has the same issue, plus double frees and multiple calls to ptlrpcd_decref().
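
Double frees and repeated reference drops of this kind are commonly avoided with goto-based unwinding, where each resource is released by exactly one label. The sketch below is an illustration under stated assumptions, not the mdc code: `mock_addref()`/`mock_decref()` merely stand in for ptlrpcd_addref()/ptlrpcd_decref(), and the `rpc_lock` and `stats` fields and the `fail_step` parameter are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical counters standing in for the ptlrpcd reference count. */
static int mock_refs;		/* outstanding references */
static int mock_decrefs;	/* number of times a reference was dropped */

static int mock_addref(void) { mock_refs++; return 0; }
static void mock_decref(void) { mock_refs--; mock_decrefs++; }

/* Hypothetical device state; field names are illustrative only. */
struct mock_mdc { void *rpc_lock; void *stats; };

/*
 * fail_step injects a fault: 0 = none, 1 = first allocation fails,
 * 2 = second allocation fails, 3 = a later step fails.
 */
static int mock_mdc_setup(struct mock_mdc *mdc, int fail_step)
{
	int rc;

	rc = mock_addref();
	if (rc != 0)
		return rc;

	mdc->rpc_lock = (fail_step == 1) ? NULL : malloc(16);
	if (mdc->rpc_lock == NULL) {
		rc = -ENOMEM;
		goto err_decref;
	}

	mdc->stats = (fail_step == 2) ? NULL : malloc(16);
	if (mdc->stats == NULL) {
		rc = -ENOMEM;
		goto err_free_lock;
	}

	if (fail_step == 3) {
		rc = -ENOMEM;
		goto err_free_stats;
	}
	return 0;

	/* Labels unwind in reverse order; each resource is released once. */
err_free_stats:
	free(mdc->stats);
	mdc->stats = NULL;
err_free_lock:
	free(mdc->rpc_lock);
	mdc->rpc_lock = NULL;
err_decref:
	mock_decref();
	return rc;
}
```

Whichever step fails, the reference is dropped exactly once and the pointers are reset to NULL, so a later cleanup call cannot free them again.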

lwp_setup() would have the same issue, but nothing that could fail happens there after client_obd_setup().

osp_init0() doesn't LBUG, but we are left with some stuck devices:

9 AT osp lustre-MDT0001-osp-MDT0000 lustre-MDT0000-mdtlov_UUID 1
13 AT osp lustre-MDT0000-osp-MDT0001 lustre-MDT0001-mdtlov_UUID 1



 Comments   
Comment by Swapnil Pimpale (Inactive) [ 06/Sep/13 ]

I have submitted a patch to fix this issue: http://review.whamcloud.com/#/c/7561/

Comment by Jodi Levi (Inactive) [ 14/Nov/13 ]

Patch landed to Master. If more work is needed in this ticket let me know and I will reopen.
