Details
- Bug
- Resolution: Fixed
- Blocker
- Lustre 2.3.0, Lustre 2.4.0, Lustre 2.1.5, Lustre 1.8.9
- None
- 3
- 8241
Description
I was not sure this was a real problem the first time I hit it, but I have now hit it a second time, so I am filing a bug.
The first time was in sanity test 182:
[78415.059119] LustreError: 17692:0:(mdt_recovery.c:390:mdt_last_rcvd_update()) Trying to overwrite bigger transno:on-disk: 25769825268, new: 25769825267 replay: 0. see LU-617.
[78415.061562] LustreError: 17692:0:(mdt_recovery.c:390:mdt_last_rcvd_update()) Skipped 8 previous similar messages
[78417.103412] LustreError: 31938:0:(pack_generic.c:1032:lustre_msg_get_opc()) incorrect message magic: a0b03b5e(msg:ffff8800aba18df0)
[78417.103916] LustreError: 31938:0:(pack_generic.c:1033:lustre_msg_get_opc()) LBUG
[78417.104332] Pid: 31938, comm: createmany
[78417.104542]
[78417.104543] Call Trace:
[78417.104925] [<ffffffffa0b028a5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[78417.105213] [<ffffffffa0b02ea7>] lbug_with_loc+0x47/0xb0 [libcfs]
[78417.105490] [<ffffffffa0d5a4ba>] lustre_msg_get_opc+0x8a/0x110 [ptlrpc]
[78417.105778] [<ffffffffa0d5d97b>] _debug_req+0x57b/0x680 [ptlrpc]
[78417.106087] [<ffffffff81047fde>] ? kernel_map_pages+0xfe/0x110
[78417.106345] [<ffffffffa0d4965e>] ptlrpc_free_committed+0x20e/0x620 [ptlrpc]
[78417.106619] [<ffffffffa0d4b313>] after_reply+0x7a3/0xd90 [ptlrpc]
[78417.106878] [<ffffffffa0d502c3>] ptlrpc_check_set+0x1093/0x1da0 [ptlrpc]
[78417.107137] [<ffffffff8107eca0>] ? process_timeout+0x0/0x10
[78417.107389] [<ffffffffa0d512fa>] ptlrpc_set_wait+0x32a/0x880 [ptlrpc]
[78417.107639] [<ffffffff8105ad10>] ? default_wake_function+0x0/0x20
[78417.107907] [<ffffffffa0d5b4c6>] ? lustre_msg_set_jobid+0xb6/0x140 [ptlrpc]
[78417.108187] [<ffffffffa0d518cf>] ptlrpc_queue_wait+0x7f/0x220 [ptlrpc]
[78417.108453] [<ffffffffa0d2d9c5>] ldlm_cli_enqueue+0x365/0x770 [ptlrpc]
[78417.119499] [<ffffffff814fc8bc>] ? __mutex_lock_slowpath+0x21c/0x2c0
[78417.119779] [<ffffffffa0d326e0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc]
[78417.120071] [<ffffffffa0f389e0>] ? ll_md_blocking_ast+0x0/0x750 [lustre]
[78417.120334] [<ffffffffa0415f55>] mdc_enqueue+0x795/0x18c0 [mdc]
[78417.120583] [<ffffffffa041727d>] mdc_intent_lock+0x1fd/0x5b4 [mdc]
[78417.120847] [<ffffffffa0f389e0>] ? ll_md_blocking_ast+0x0/0x750 [lustre]
[78417.121427] [<ffffffffa0d326e0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc]
[78417.121688] [<ffffffffa03c8667>] ? lmv_fid_alloc+0x117/0x4b0 [lmv]
[78417.121952] [<ffffffffa03e02d6>] lmv_intent_open+0x1f6/0x8c0 [lmv]
[78417.122206] [<ffffffffa0f389e0>] ? ll_md_blocking_ast+0x0/0x750 [lustre]
[78417.122464] [<ffffffff811661ca>] ? cache_alloc_debugcheck_after+0x14a/0x210
[78417.122720] [<ffffffff81168671>] ? __kmalloc+0x191/0x2a0
[78417.122958] [<ffffffffa03e0c4b>] lmv_intent_lock+0x2ab/0x370 [lmv]
[78417.123211] [<ffffffffa0f389e0>] ? ll_md_blocking_ast+0x0/0x750 [lustre]
[78417.123471] [<ffffffffa0f37a9e>] ? ll_i2gids+0x2e/0xd0 [lustre]
[78417.123718] [<ffffffffa0f1e4ba>] ? ll_prep_md_op_data+0xfa/0x3a0 [lustre]
[78417.123983] [<ffffffffa0f3d014>] ll_lookup_it+0x3a4/0xbf0 [lustre]
[78417.124273] [<ffffffffa0f389e0>] ? ll_md_blocking_ast+0x0/0x750 [lustre]
[78417.124539] [<ffffffffa0f3d8e9>] ll_lookup_nd+0x89/0x3b0 [lustre]
[78417.124784] [<ffffffff8119b456>] ? d_alloc+0x166/0x1d0
[78417.125087] [<ffffffff8118e8e2>] __lookup_hash+0x102/0x160
[78417.125322] [<ffffffff8118ef8a>] lookup_hash+0x3a/0x50
[78417.125550] [<ffffffff81192edb>] do_filp_open+0x2eb/0xe00
[78417.125790] [<ffffffff814fae88>] ? thread_return+0x4e/0x776
[78417.126034] [<ffffffff8118f37b>] ? getname+0x3b/0x250
[78417.126259] [<ffffffff8119fd7b>] ? alloc_fd+0xab/0x160
[78417.126490] [<ffffffff8117e399>] do_sys_open+0x69/0x140
[78417.126717] [<ffffffff8100c575>] ? math_state_restore+0x45/0x60
[78417.126965] [<ffffffff8117e4b0>] sys_open+0x20/0x30
[78417.127190] [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
[78417.127428]
[78417.169930] Kernel panic - not syncing: LBUG
Crashdumps and modules for this one are in:
/exports/crashdumps/192.168.10.224-2013-05-09-20\:43\:05
The other crash was like this:
[79955.565718] Lustre: DEBUG MARKER: == sanity test 71: Running dbench on lustre (don't segment fault) ====== 21:01:15 (1368493275)
[80393.197212] LustreError: 25536:0:(pack_generic.c:1032:lustre_msg_get_opc()) incorrect message magic: a0b03b5e(msg:ffff8800464ccdf0)
[80393.197711] LustreError: 25536:0:(pack_generic.c:1033:lustre_msg_get_opc()) LBUG
[80393.209355] Pid: 25536, comm: dbench
[80393.209559]
[80393.209559] Call Trace:
[80393.209964] [<ffffffffa0b028a5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[80393.210228] [<ffffffffa0b02ea7>] lbug_with_loc+0x47/0xb0 [libcfs]
[80393.210502] [<ffffffffa0d5a4ba>] lustre_msg_get_opc+0x8a/0x110 [ptlrpc]
[80393.212853] [<ffffffffa0d5d97b>] _debug_req+0x57b/0x680 [ptlrpc]
[80393.213149] [<ffffffff81047fde>] ? kernel_map_pages+0xfe/0x110
[80393.213412] [<ffffffffa0d4965e>] ptlrpc_free_committed+0x20e/0x620 [ptlrpc]
[80393.213689] [<ffffffffa0d4b313>] after_reply+0x7a3/0xd90 [ptlrpc]
[80393.213967] [<ffffffffa0d502c3>] ptlrpc_check_set+0x1093/0x1da0 [ptlrpc]
[80393.214225] [<ffffffff8107eca0>] ? process_timeout+0x0/0x10
[80393.214477] [<ffffffffa0d512fa>] ptlrpc_set_wait+0x32a/0x880 [ptlrpc]
[80393.214733] [<ffffffff8105ad10>] ? default_wake_function+0x0/0x20
[80393.214995] [<ffffffffa0d5b4c6>] ? lustre_msg_set_jobid+0xb6/0x140 [ptlrpc]
[80393.215271] [<ffffffffa0d518cf>] ptlrpc_queue_wait+0x7f/0x220 [ptlrpc]
[80393.215526] [<ffffffffa0401f59>] mdc_sync+0xb9/0x230 [mdc]
[80393.215773] [<ffffffffa03d576e>] lmv_sync+0x2ae/0x550 [lmv]
[80393.216022] [<ffffffffa0f0a6ab>] ll_fsync+0x1ab/0x860 [lustre]
[80393.216290] [<ffffffff811b2401>] vfs_fsync_range+0xa1/0xe0
[80393.216525] [<ffffffff811b24ad>] vfs_fsync+0x1d/0x20
[80393.216777] [<ffffffff811b24ee>] do_fsync+0x3e/0x60
[80393.217003] [<ffffffff811b2540>] sys_fsync+0x10/0x20
[80393.217230] [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
[80393.217471]
[80393.228801] Kernel panic - not syncing: LBUG
The vmcore for this one is at /exports/crashdumps/192.168.10.224-2013-05-13-21\:08\:35/vmcore; the modules are the same as before.
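Since the two crashes come from different tests (createmany in sanity test 182, dbench in sanity test 71), it can help to check mechanically how much of the two call traces is shared. The following is only a rough sketch for that comparison, not part of Lustre or its test suite: the helper names `reliable_frames` and `common_prefix` are made up for illustration, the regular expression assumes the frame format seen in the traces above, and the sample strings are excerpts copied from those traces. Frames marked with `?` are skipped because the kernel prints that marker for entries it could not verify as part of the call chain.

```python
import re

# One stack frame looks like:
#   [<ffffffffa0d4b313>] after_reply+0x7a3/0xd90 [ptlrpc]
# An optional "? " before the symbol marks a frame the unwinder is unsure about.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(log_text):
    """Return the function names of frames not marked '?', in trace order."""
    return [m.group(2) for m in FRAME_RE.finditer(log_text) if m.group(1) is None]


def common_prefix(a, b):
    """Return the longest run of identical leading entries of two lists."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared


# Excerpt of the sanity test 182 trace (top of the stack only).
FIRST = """
[78417.104925] [<ffffffffa0b028a5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[78417.105213] [<ffffffffa0b02ea7>] lbug_with_loc+0x47/0xb0 [libcfs]
[78417.105490] [<ffffffffa0d5a4ba>] lustre_msg_get_opc+0x8a/0x110 [ptlrpc]
[78417.105778] [<ffffffffa0d5d97b>] _debug_req+0x57b/0x680 [ptlrpc]
[78417.106087] [<ffffffff81047fde>] ? kernel_map_pages+0xfe/0x110
[78417.106345] [<ffffffffa0d4965e>] ptlrpc_free_committed+0x20e/0x620 [ptlrpc]
[78417.106619] [<ffffffffa0d4b313>] after_reply+0x7a3/0xd90 [ptlrpc]
[78417.106878] [<ffffffffa0d502c3>] ptlrpc_check_set+0x1093/0x1da0 [ptlrpc]
[78417.107137] [<ffffffff8107eca0>] ? process_timeout+0x0/0x10
[78417.107389] [<ffffffffa0d512fa>] ptlrpc_set_wait+0x32a/0x880 [ptlrpc]
[78417.108187] [<ffffffffa0d518cf>] ptlrpc_queue_wait+0x7f/0x220 [ptlrpc]
[78417.108453] [<ffffffffa0d2d9c5>] ldlm_cli_enqueue+0x365/0x770 [ptlrpc]
"""

# Excerpt of the sanity test 71 trace (top of the stack only).
SECOND = """
[80393.209964] [<ffffffffa0b028a5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[80393.210228] [<ffffffffa0b02ea7>] lbug_with_loc+0x47/0xb0 [libcfs]
[80393.210502] [<ffffffffa0d5a4ba>] lustre_msg_get_opc+0x8a/0x110 [ptlrpc]
[80393.212853] [<ffffffffa0d5d97b>] _debug_req+0x57b/0x680 [ptlrpc]
[80393.213149] [<ffffffff81047fde>] ? kernel_map_pages+0xfe/0x110
[80393.213412] [<ffffffffa0d4965e>] ptlrpc_free_committed+0x20e/0x620 [ptlrpc]
[80393.213689] [<ffffffffa0d4b313>] after_reply+0x7a3/0xd90 [ptlrpc]
[80393.213967] [<ffffffffa0d502c3>] ptlrpc_check_set+0x1093/0x1da0 [ptlrpc]
[80393.214225] [<ffffffff8107eca0>] ? process_timeout+0x0/0x10
[80393.214477] [<ffffffffa0d512fa>] ptlrpc_set_wait+0x32a/0x880 [ptlrpc]
[80393.215271] [<ffffffffa0d518cf>] ptlrpc_queue_wait+0x7f/0x220 [ptlrpc]
[80393.215526] [<ffffffffa0401f59>] mdc_sync+0xb9/0x230 [mdc]
"""

if __name__ == "__main__":
    shared = common_prefix(reliable_frames(FIRST), reliable_frames(SECOND))
    for name in shared:
        print(name)
```

Run on these excerpts, the shared run ends at ptlrpc_queue_wait; below that the first trace continues into ldlm_cli_enqueue and the second into mdc_sync. Both crashes therefore reach the LBUG through the same ptlrpc_free_committed -> _debug_req -> lustre_msg_get_opc path, regardless of which request the caller was sending.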
Issue Links
- duplicates
  - LU-4949 Master to 2.5 server - LBUG: lustre_msg_buf()) ASSERTION( 0 ) failed: incorrect message magic: 0000000d(msg:ffff88020fe3c0f8) (Closed)
  - LU-4956 Test failure on test suite recovery-small, subtest test_27 (Closed)
- is duplicated by
  - LU-3309 sanity test 182 crash (Resolved)
- is related to
  - LU-6172 Client LBUG in null_free_reqbuf (Resolved)