[LU-4897] dt_declare_delete()) ASSERTION( dt->do_index_ops ) failed (in orphan cleanup) Created: 12/Apr/14  Updated: 08/May/14  Resolved: 08/May/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.6.0

Type: Bug Priority: Blocker
Reporter: John Hammond Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: dne2, lod, mdd

Issue Links:
Related
is related to LU-3531 DNE2: striped directory Resolved
Severity: 3
Rank (Obsolete): 13533

 Description   

This occurs on remount after a crash (or reset) while running racer on 2.5.57-72-g69ddb2e, or on a checkout of http://review.whamcloud.com/#/c/9699/. Disabling migration in racer has no effect.

# export MDSCOUNT=4
# export MOUNT_2=y
# llmount.sh
...
# sh lustre/tests/racer.sh
... Wait for panic or reset while running.
... Restart the node.
# export MDSCOUNT=4
# export PTLDEBUG=+trace
# export NOFORMAT=1
# llmount.sh
Call Trace:
 [<ffffffffa02a9895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa02a9e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0ced4db>] lod_declare_object_destroy+0x55b/0x780 [lod]
 [<ffffffffa0bb3869>] __mdd_orphan_cleanup+0x7d9/0xca0 [mdd]
 [<ffffffffa0bc7cbd>] mdd_recovery_complete+0xed/0x170 [mdd]
 [<ffffffffa0bfa9c5>] mdt_postrecov+0x35/0xd0 [mdt]
 [<ffffffffa0bfbf08>] mdt_obd_postrecov+0x78/0x90 [mdt]
 [<ffffffffa0632cf4>] ? ldlm_reprocess_all_ns+0xa4/0x110 [ptlrpc]
 [<ffffffffa0648505>] target_recovery_thread+0xd25/0x19c0 [ptlrpc]
 [<ffffffffa06477e0>] ? target_recovery_thread+0x0/0x19c0 [ptlrpc]
 [<ffffffff81096a36>] kthread+0x96/0xa0
 [<ffffffff8100c0ca>] child_rip+0xa/0x20
 [<ffffffff810969a0>] ? kthread+0x0/0xa0
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

Kernel panic - not syncing: LBUG
00010000:00080000:0.0:1397318109.970223:0:3046:0:(ldlm_lib.c:1991:target_recovery_thread()) lustre-MDT0002: started recovery thread pid 3046
00010000:02000400:3.0:1397318182.861911:0:3046:0:(ldlm_lib.c:1803:target_recovery_overseer()) lustre-MDT0002: recovery is timed out, evict stale exports

...

00000100:00100000:2.0:1397318182.977710:0:3046:0:(client.c:1849:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc tgt_recov:lustre-MDT0002-mdtlov_UUID:3046:1465194230319564:0@lo:1000
00000004:00080000:2.0:1397318182.977722:0:3046:0:(mdd_orphans.c:395:orph_key_test_and_del()) Found orphan [0x380000bd0:0x170f:0x0], delete it
00000004:00040000:2.0:1397318182.977731:0:3046:0:(dt_object.h:1483:dt_declare_delete()) ASSERTION( dt->do_index_ops ) failed: 
00000004:00040000:2.0:1397318182.977735:0:3046:0:(dt_object.h:1483:dt_declare_delete()) LBUG


 Comments   
Comment by Jodi Levi (Inactive) [ 15/Apr/14 ]

Di,
Would you be able to give direction on this fix?

Comment by Di Wang [ 18/Apr/14 ]

Hmm, it seems index_try is missing before deleting orphans. I will cook up a patch.
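The diagnosis can be illustrated with a heavily simplified model. This is not Lustre code. `struct dt_object`, `index_try()`, `declare_delete()` and `cleanup_orphan_dir()` are hypothetical stand-ins. The model assumes, as the failed assertion in the log suggests, that an object's index operations (`do_index_ops`) stay NULL until an index_try step attaches them. Under that assumption, the orphan cleanup path has to run that step before it declares any index delete.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for Lustre's dt_object
 * machinery. These are NOT the real Lustre definitions. */
struct index_ops {
        int (*dio_delete)(void *obj, const char *key);
};

struct dt_object {
        int is_dir;                           /* object can act as an index */
        const struct index_ops *do_index_ops; /* NULL until index_try() runs */
        int deleted;                          /* count of entries removed */
};

/* Toy delete operation: only records that a delete happened. */
static int dir_delete(void *obj, const char *key)
{
        (void)key;
        ((struct dt_object *)obj)->deleted++;
        return 0;
}

static const struct index_ops dir_index_ops = { .dio_delete = dir_delete };

/* Models an index_try step: lazily attaches index ops to a directory. */
static int index_try(struct dt_object *obj)
{
        if (!obj->is_dir)
                return -ENOTDIR;
        if (obj->do_index_ops == NULL)
                obj->do_index_ops = &dir_index_ops;
        return 0;
}

/* Models the precondition checked in dt_declare_delete(): the index
 * ops must already be set up, otherwise the assertion fires. */
static void declare_delete(struct dt_object *obj)
{
        assert(obj->do_index_ops != NULL);
}

/* Orphan cleanup step with the fix applied: run index_try() first. */
static int cleanup_orphan_dir(struct dt_object *obj, const char *name)
{
        int rc = index_try(obj);

        if (rc != 0)
                return rc;
        declare_delete(obj);
        return obj->do_index_ops->dio_delete(obj, name);
}
```

If the `index_try()` call were dropped from `cleanup_orphan_dir()`, `declare_delete()` would fail its assertion on a freshly loaded object, analogous to the LBUG in the log (here through `assert()` rather than LBUG).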

Comment by Di Wang [ 19/Apr/14 ]

http://review.whamcloud.com/10027

Comment by Di Wang [ 08/May/14 ]

The patch has been merged into 9511 and landed on master.

Generated at Sat Feb 10 01:46:48 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.