[LU-6292] replay-single test_101: osd_trans_exec_op()) ASSERTION( oh->ot_handle != ((void *)0) ) failed:

Details

Type: Bug
Resolution: Fixed
Priority: Minor
Fix Version/s: Lustre 2.8.0
Affects Version/s: Lustre 2.8.0
Labels: dne2
Severity: 3
Rank (Obsolete): 17625

Description

This issue was created by maloo for wangdi <di.wang@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/b55511da-bd38-11e4-8d85-5254006e85c2.

The sub-test test_101 failed with the following error:

test failed to respond and timed out
11:19:32:Lustre: DEBUG MARKER: == replay-single test 101: Shouldn't reassign precreated objs to other files after recovery == 11:19:09 (1424863149)
11:19:32:Lustre: DEBUG MARKER: sync; sync; sync
11:19:32:Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
11:19:32:Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly
11:19:32:LustreError: 24268:0:(osd_handler.c:1462:osd_ro()) *** setting lustre-MDT0000 read-only ***
11:19:32:Turning device dm-0 (0xfd00000) read-only
11:19:32:Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
11:19:32:Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
11:19:32:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
11:19:32:Lustre: DEBUG MARKER: umount -d /mnt/mds1
11:19:32:Removing read-only on unknown block (0xfd00000)
11:19:32:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
11:19:32:Lustre: DEBUG MARKER: hostname
11:19:32:Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P1
11:19:32:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre  -o abort_recovery 		                   /dev/lvm-Role_MDS/P1 /mnt/mds1
11:19:32:LDISKFS-fs (dm-0): recovery complete
11:19:32:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts: 
11:19:32:LustreError: 24605:0:(mdt_handler.c:5797:mdt_iocontrol()) lustre-MDT0000: Aborting recovery for device
11:19:32:LustreError: 24605:0:(ldlm_lib.c:2261:target_stop_recovery_thread()) lustre-MDT0000: Aborting recovery
11:19:32:Lustre: 24684:0:(ldlm_lib.c:1822:target_recovery_overseer()) recovery is aborted, evict exports in recovery
11:19:32:Lustre: 24684:0:(ldlm_lib.c:1822:target_recovery_overseer()) Skipped 2 previous similar messages
11:19:32:Lustre: lustre-MDT0000: disconnecting 5 stale clients
11:19:32:LustreError: 24684:0:(osd_handler.c:4519:osd_object_find()) header@ffff88006060fd08[0x0, 4, [0x200028c71:0x19:0x0] hash]{
11:19:32:
11:19:32:LustreError: 24684:0:(osd_handler.c:4519:osd_object_find()) ....mdt@ffff88006060fd58mdt-object@ffff88006060fd08(ioepoch=0 flags=0x0, epochcount=0, writecount=0)
11:19:32:
11:19:32:LustreError: 24684:0:(osd_handler.c:4519:osd_object_find()) ....mdd@ffff88006d9a9f00mdd-object@ffff88006d9a9f00(open_count=0, valid=0, cltime=0, flags=0)
11:19:32:
11:19:32:LustreError: 24684:0:(osd_handler.c:4519:osd_object_find()) ....lod@ffff88006060ee48lod-object@ffff88006060ee48
11:19:32:
11:19:32:LustreError: 24684:0:(osd_handler.c:4519:osd_object_find()) ....osd-ldiskfs@ffff880061e98740osd-ldiskfs-object@ffff880061e98740(i:(null):0/0)[plain]
11:19:32:
11:19:32:LustreError: 24684:0:(osd_handler.c:4519:osd_object_find()) } header@ffff88006060fd08
11:19:32:
11:19:32:LustreError: 24684:0:(osd_handler.c:4519:osd_object_find()) lu_object does not exists [0x200028c71:0x19:0x0]
11:19:32:LustreError: 24684:0:(osd_handler.c:4672:osd_index_ea_insert()) lustre-MDT0000-osd: Can not find object [0x200028c71:0x19:0x0]32776:3382146134: rc = -2
11:19:32:LustreError: 24684:0:(osd_internal.h:979:osd_trans_exec_op()) ASSERTION( oh->ot_handle != ((void *)0) ) failed: 
11:19:32:LustreError: 24684:0:(osd_internal.h:979:osd_trans_exec_op()) LBUG
11:19:32:Pid: 24684, comm: tgt_recov
11:19:32:
11:19:32:Call Trace:
11:19:32: [<ffffffffa0491895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
11:19:32: [<ffffffffa0491e97>] lbug_with_loc+0x47/0xb0 [libcfs]
11:19:32: [<ffffffffa0d0df7f>] osd_trans_exec_op+0x14f/0x2e0 [osd_ldiskfs]
11:19:32: [<ffffffffa0d18c45>] osd_index_ea_delete+0x1d5/0xd00 [osd_ldiskfs]
11:19:32: [<ffffffff8106c85a>] ? __cond_resched+0x2a/0x40
11:19:32: [<ffffffffa087fd33>] out_obj_index_delete+0x153/0x370 [ptlrpc]
11:19:32: [<ffffffffa08800fc>] out_tx_index_insert_undo+0x1c/0x20 [ptlrpc]
11:19:32: [<ffffffffa088c83c>] distribute_txn_replay_handle+0x7ec/0x940 [ptlrpc]
11:19:32: [<ffffffffa07d86a1>] target_recovery_thread+0x9e1/0x1ad0 [ptlrpc]
11:19:32: [<ffffffffa07d7cc0>] ? target_recovery_thread+0x0/0x1ad0 [ptlrpc]
11:19:32: [<ffffffff8109e66e>] kthread+0x9e/0xc0
11:19:32: [<ffffffff8100c20a>] child_rip+0xa/0x20
11:19:32: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
11:19:32: [<ffffffff8100c200>] ? child_rip+0x0/0x20
11:19:32:
11:19:32:Kernel panic - not syncing: LBUG
11:19:32:Pid: 24684, comm: tgt_recov Not tainted 2.6.32-504.8.1.el6_lustre.g0ef66b1.x86_64 #1
11:19:32:Call Trace:
11:19:32: [<ffffffff81529b76>] ? panic+0xa7/0x16f
11:19:32: [<ffffffffa0491eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
11:19:32: [<ffffffffa0d0df7f>] ? osd_trans_exec_op+0x14f/0x2e0 [osd_ldiskfs]
11:19:32: [<ffffffffa0d18c45>] ? osd_index_ea_delete+0x1d5/0xd00 [osd_ldiskfs]
11:19:32: [<ffffffff8106c85a>] ? __cond_resched+0x2a/0x40
11:19:32: [<ffffffffa087fd33>] ? out_obj_index_delete+0x153/0x370 [ptlrpc]
11:19:32: [<ffffffffa08800fc>] ? out_tx_index_insert_undo+0x1c/0x20 [ptlrpc]
11:19:32: [<ffffffffa088c83c>] ? distribute_txn_replay_handle+0x7ec/0x940 [ptlrpc]
11:19:32: [<ffffffffa07d86a1>] ? target_recovery_thread+0x9e1/0x1ad0 [ptlrpc]
11:19:32: [<ffffffffa07d7cc0>] ? target_recovery_thread+0x0/0x1ad0 [ptlrpc]
11:19:32: [<ffffffff8109e66e>] ? kthread+0x9e/0xc0
11:19:32: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
11:19:32: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
11:19:32: [<ffffffff8100c200>] ? child_rip+0x0/0x20
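
For triage, the failing assertion and the call path can be pulled out of a console log like the one above with a small script. The following is only an illustrative sketch, not part of Lustre or its test framework: the function name `summarize_lbug` and both regular expressions are assumptions derived solely from the line formats visible in this log.

```python
import re

# Lustre assertion line, e.g.
#   (osd_internal.h:979:osd_trans_exec_op()) ASSERTION( expr ) failed:
ASSERT_RE = re.compile(
    r"\((\w+\.[ch]):(\d+):(\w+)\(\)\)\s+ASSERTION\(\s*(.*?)\s*\) failed")

# Stack frame, e.g.
#   [<ffffffffa0d0df7f>] osd_trans_exec_op+0x14f/0x2e0 [osd_ldiskfs]
# A "? " before the symbol marks a frame the kernel could not verify.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?")


def summarize_lbug(log_text):
    """Return (assertion, frames) for the first assertion and first call trace.

    Hypothetical helper: assumes the frames directly follow "Call Trace:".
    """
    assertion = None
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if assertion is None:
            m = ASSERT_RE.search(line)
            if m:
                assertion = {"file": m.group(1), "line": int(m.group(2)),
                             "function": m.group(3), "expr": m.group(4)}
        if not in_trace:
            # Wait for the first "Call Trace:" header before reading frames.
            in_trace = "Call Trace:" in line
            continue
        f = FRAME_RE.search(line)
        if f is None:
            break  # first non-frame line ends the trace
        if f.group(1) is None:  # skip unreliable "?" frames
            frames.append((f.group(2), f.group(3)))
    return assertion, frames
```

Run against the log above, the reliable frames should include osd_index_ea_delete, out_tx_index_insert_undo and distribute_txn_replay_handle, which is the path on which the recovery thread (tgt_recov) reached the assertion.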

Info required for matching: replay-single 101

Issue Links

is related to: LU-3534 async update cross-MDTs (Resolved)

Activity

Di Wang (Inactive) added a comment - 26/Feb/15 8:39 AM

The failure seems to occur because the MDT-MDT recovery is wrongly aborted after the timeout. The fix will be included in http://review.whamcloud.com/#/c/11737/
