Details
- Bug
- Resolution: Fixed
- Minor
- Lustre 2.8.0
- 3
- 17625
Description
This issue was created by maloo for wangdi <di.wang@intel.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/b55511da-bd38-11e4-8d85-5254006e85c2.
The sub-test test_101 failed with the following error:
test failed to respond and timed out
11:19:32:Lustre: DEBUG MARKER: == replay-single test 101: Shouldn't reassign precreated objs to other files after recovery == 11:19:09 (1424863149)
11:19:32:Lustre: DEBUG MARKER: sync; sync; sync
11:19:32:Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
11:19:32:Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly
11:19:32:LustreError: 24268:0:(osd_handler.c:1462:osd_ro()) *** setting lustre-MDT0000 read-only ***
11:19:32:Turning device dm-0 (0xfd00000) read-only
11:19:32:Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
11:19:32:Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
11:19:32:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
11:19:32:Lustre: DEBUG MARKER: umount -d /mnt/mds1
11:19:32:Removing read-only on unknown block (0xfd00000)
11:19:32:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
11:19:32:Lustre: DEBUG MARKER: hostname
11:19:32:Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P1
11:19:32:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre -o abort_recovery /dev/lvm-Role_MDS/P1 /mnt/mds1
11:19:32:LDISKFS-fs (dm-0): recovery complete
11:19:32:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
11:19:32:LustreError: 24605:0:(mdt_handler.c:5797:mdt_iocontrol()) lustre-MDT0000: Aborting recovery for device
11:19:32:LustreError: 24605:0:(ldlm_lib.c:2261:target_stop_recovery_thread()) lustre-MDT0000: Aborting recovery
11:19:32:Lustre: 24684:0:(ldlm_lib.c:1822:target_recovery_overseer()) recovery is aborted, evict exports in recovery
11:19:32:Lustre: 24684:0:(ldlm_lib.c:1822:target_recovery_overseer()) Skipped 2 previous similar messages
11:19:32:Lustre: lustre-MDT0000: disconnecting 5 stale clients
11:19:32:LustreError: 24684:0:(osd_handler.c:4519:osd_object_find()) header@ffff88006060fd08[0x0, 4, [0x200028c71:0x19:0x0] hash]{
11:19:32:
11:19:32:LustreError: 24684:0:(osd_handler.c:4519:osd_object_find()) ....mdt@ffff88006060fd58mdt-object@ffff88006060fd08(ioepoch=0 flags=0x0, epochcount=0, writecount=0)
11:19:32:
11:19:32:LustreError: 24684:0:(osd_handler.c:4519:osd_object_find()) ....mdd@ffff88006d9a9f00mdd-object@ffff88006d9a9f00(open_count=0, valid=0, cltime=0, flags=0)
11:19:32:
11:19:32:LustreError: 24684:0:(osd_handler.c:4519:osd_object_find()) ....lod@ffff88006060ee48lod-object@ffff88006060ee48
11:19:32:
11:19:32:LustreError: 24684:0:(osd_handler.c:4519:osd_object_find()) ....osd-ldiskfs@ffff880061e98740osd-ldiskfs-object@ffff880061e98740(i:(null):0/0)[plain]
11:19:32:
11:19:32:LustreError: 24684:0:(osd_handler.c:4519:osd_object_find()) } header@ffff88006060fd08
11:19:32:
11:19:32:LustreError: 24684:0:(osd_handler.c:4519:osd_object_find()) lu_object does not exists [0x200028c71:0x19:0x0]
11:19:32:LustreError: 24684:0:(osd_handler.c:4672:osd_index_ea_insert()) lustre-MDT0000-osd: Can not find object [0x200028c71:0x19:0x0]32776:3382146134: rc = -2
11:19:32:LustreError: 24684:0:(osd_internal.h:979:osd_trans_exec_op()) ASSERTION( oh->ot_handle != ((void *)0) ) failed:
11:19:32:LustreError: 24684:0:(osd_internal.h:979:osd_trans_exec_op()) LBUG
11:19:32:Pid: 24684, comm: tgt_recov
11:19:32:
11:19:32:Call Trace:
11:19:32: [<ffffffffa0491895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
11:19:32: [<ffffffffa0491e97>] lbug_with_loc+0x47/0xb0 [libcfs]
11:19:32: [<ffffffffa0d0df7f>] osd_trans_exec_op+0x14f/0x2e0 [osd_ldiskfs]
11:19:32: [<ffffffffa0d18c45>] osd_index_ea_delete+0x1d5/0xd00 [osd_ldiskfs]
11:19:32: [<ffffffff8106c85a>] ? __cond_resched+0x2a/0x40
11:19:32: [<ffffffffa087fd33>] out_obj_index_delete+0x153/0x370 [ptlrpc]
11:19:32: [<ffffffffa08800fc>] out_tx_index_insert_undo+0x1c/0x20 [ptlrpc]
11:19:32: [<ffffffffa088c83c>] distribute_txn_replay_handle+0x7ec/0x940 [ptlrpc]
11:19:32: [<ffffffffa07d86a1>] target_recovery_thread+0x9e1/0x1ad0 [ptlrpc]
11:19:32: [<ffffffffa07d7cc0>] ? target_recovery_thread+0x0/0x1ad0 [ptlrpc]
11:19:32: [<ffffffff8109e66e>] kthread+0x9e/0xc0
11:19:32: [<ffffffff8100c20a>] child_rip+0xa/0x20
11:19:32: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
11:19:32: [<ffffffff8100c200>] ? child_rip+0x0/0x20
11:19:32:
11:19:32:Kernel panic - not syncing: LBUG
11:19:32:Pid: 24684, comm: tgt_recov Not tainted 2.6.32-504.8.1.el6_lustre.g0ef66b1.x86_64 #1
11:19:32:Call Trace:
11:19:32: [<ffffffff81529b76>] ? panic+0xa7/0x16f
11:19:32: [<ffffffffa0491eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
11:19:32: [<ffffffffa0d0df7f>] ? osd_trans_exec_op+0x14f/0x2e0 [osd_ldiskfs]
11:19:32: [<ffffffffa0d18c45>] ? osd_index_ea_delete+0x1d5/0xd00 [osd_ldiskfs]
11:19:32: [<ffffffff8106c85a>] ? __cond_resched+0x2a/0x40
11:19:32: [<ffffffffa087fd33>] ? out_obj_index_delete+0x153/0x370 [ptlrpc]
11:19:32: [<ffffffffa08800fc>] ? out_tx_index_insert_undo+0x1c/0x20 [ptlrpc]
11:19:32: [<ffffffffa088c83c>] ? distribute_txn_replay_handle+0x7ec/0x940 [ptlrpc]
11:19:32: [<ffffffffa07d86a1>] ? target_recovery_thread+0x9e1/0x1ad0 [ptlrpc]
11:19:32: [<ffffffffa07d7cc0>] ? target_recovery_thread+0x0/0x1ad0 [ptlrpc]
11:19:32: [<ffffffff8109e66e>] ? kthread+0x9e/0xc0
11:19:32: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
11:19:32: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
11:19:32: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Please provide additional information about the failure here.
Info required for matching: replay-single 101
Issue Links
- is related to: LU-3534 async update cross-MDTs (Resolved)