Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.4.0
-
3
-
5462
Description
Running replay-single, I hit this LBUG:
[50576.766529] Lustre: DEBUG MARKER: == replay-single test 88: MDS should not assign same objid to different files == 01:10:48 (1352095848) [50576.964168] Turning device loop1 (0x700001) read-only [50576.985608] Lustre: DEBUG MARKER: ost1 REPLAY BARRIER on lustre-OST0000 [50576.989899] Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-OST0000 [50577.290548] Turning device loop0 (0x700000) read-only [50577.316807] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000 [50577.323211] Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000 [50577.541027] Lustre: lustre-OST0000-osc-MDT0000: slow creates, last=8104, next=8104, reserved=0, syn_changes=0, syn_rpc_in_progress=0, status=-19 [50577.541472] LustreError: 10305:0:(osp_precreate.c:694:osp_precreate_reserve()) LBUG [50577.541985] Pid: 10305, comm: mdt00_000 [50577.542331] [50577.542332] Call Trace: [50577.542866] [<ffffffffa0455915>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] [50577.543256] [<ffffffffa0455f27>] lbug_with_loc+0x47/0xb0 [libcfs] [50577.543618] [<ffffffffa0e4473c>] osp_precreate_reserve+0x50c/0x600 [osp] [50577.544000] [<ffffffffa0e3d7a0>] ? osp_object_alloc+0x120/0x290 [osp] [50577.544368] [<ffffffffa0e43533>] osp_declare_object_create+0x163/0x540 [osp] [50577.544785] [<ffffffffa07384bd>] lod_qos_declare_object_on+0xed/0x4c0 [lod] [50577.545173] [<ffffffffa073b65b>] lod_qos_prep_create+0x92b/0x1848 [lod] [50577.545549] [<ffffffffa073686b>] lod_declare_striped_object+0x14b/0x920 [lod] [50577.546558] [<ffffffffa0737348>] lod_declare_object_create+0x308/0x4f0 [lod] [50577.546905] [<ffffffffa0cf1c4f>] mdd_declare_object_create_internal+0xaf/0x1d0 [mdd] [50577.547452] [<ffffffffa0d02aca>] mdd_create+0x39a/0x1550 [mdd] [50577.547770] [<ffffffffa0de9d28>] mdt_reint_open+0x1078/0x1870 [mdt] [50577.548094] [<ffffffffa0d08b0e>] ? md_ucred+0x1e/0x60 [mdd] [50577.548403] [<ffffffffa0db5655>] ? mdt_ucred+0x15/0x20 [mdt] [50577.548718] [<ffffffffa0dd4c81>] mdt_reint_rec+0x41/0xe0 [mdt] [50577.549059] [<ffffffffa0dce413>] mdt_reint_internal+0x4e3/0x7e0 [mdt] [50577.549395] [<ffffffffa0dce9dd>] mdt_intent_reint+0x1ed/0x500 [mdt] [50577.549715] [<ffffffffa0dca3ae>] mdt_intent_policy+0x38e/0x770 [mdt] [50577.550077] [<ffffffffa0783dda>] ldlm_lock_enqueue+0x2ea/0x890 [ptlrpc] [50577.550424] [<ffffffffa07ab187>] ldlm_handle_enqueue0+0x4e7/0x1010 [ptlrpc] [50577.550761] [<ffffffffa0dca8e6>] mdt_enqueue+0x46/0x130 [mdt] [50577.551076] [<ffffffffa0dbe1f2>] mdt_handle_common+0x932/0x1740 [mdt] [50577.551425] [<ffffffffa0dbf0d5>] mdt_regular_handle+0x15/0x20 [mdt] [50577.551797] [<ffffffffa07d9573>] ptlrpc_server_handle_request+0x463/0xe70 [ptlrpc] [50577.552413] [<ffffffffa045666e>] ? cfs_timer_arm+0xe/0x10 [libcfs] [50577.552697] [<ffffffffa07d2241>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc] [50577.552950] [<ffffffff81051f73>] ? __wake_up+0x53/0x70 [50577.553303] [<ffffffffa07dc10a>] ptlrpc_main+0xb9a/0x1960 [ptlrpc] [50577.553709] [<ffffffffa07db570>] ? ptlrpc_main+0x0/0x1960 [ptlrpc] [50577.553927] [<ffffffff8100c14a>] child_rip+0xa/0x20 [50577.554222] [<ffffffffa07db570>] ? ptlrpc_main+0x0/0x1960 [ptlrpc] [50577.554605] [<ffffffffa07db570>] ? ptlrpc_main+0x0/0x1960 [ptlrpc] [50577.554960] [<ffffffff8100c140>] ? child_rip+0x0/0x20 [50577.555289] [50577.560016] Kernel panic - not syncing: LBUG
crashdump and modules are at /exports/crashdumps/192.168.10.210-2012-11-05-01\:10\:50
I am confused here, or I miss something probably. According to the log we saw here
sync_changes and syn_rpc_in_progress are zero as expected, and status is -19 because ost is not setup in time, so the loop in osp_precreate_reserve will
not sleep at all (why you say it was waken up so often?), i.e. it will keep running until count reach 1000 and hit LBUG, unless it reaches expire time first, which might be unlikely given that expire time is normally 100 seconds.
Another thing I do not understand is that why this OST is still being chosen to do procreate if it is not ready yet, probably another bug?
Please correct me if I misunderstand sth.