[LU-2355] orph_index_delete()) ASSERTION(obj->mod_flags & ORPHAN_OBJ) Created: 23/Mar/12  Updated: 13/Jan/16  Resolved: 13/Jan/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Johann Lombardi (Inactive) Assignee: Li Wei (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-4011 problems with upstream lustre client ... Closed
Severity: 3
Project: Orion
Rank (Obsolete): 2952

 Description   

Hit on the orion_quota branch, which had just been rebased on orion. Nothing on the orion_quota branch itself looks like it could cause this:

14:42:59:Lustre: DEBUG MARKER: == replay-single test 22b: check orphan code race in test 22 == 14:42:59 (1332452579)
14:43:01:Turning device dm-0 (0xfd00000) read-only
14:43:01:Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
14:43:02:Removing read-only on unknown block (0xfd00000)
14:43:19:LDISKFS-fs (dm-0): warning: maximal mount count reached, running e2fsck is recommended
14:43:20:LDISKFS-fs (dm-0): recovery complete
14:43:20:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=off. Opts: 
14:44:20:Lustre: 7516:0:(ldlm_lib.c:1644:target_recovery_overseer()) recovery is timed out, evict stale exports
14:44:21:LustreError: 7516:0:(genops.c:1302:class_disconnect_stale_exports()) lustre-MDT0000: disconnect stale client 0a4ff024-85d5-c11f-ade5-2ac2049b2a25@<unknown>
14:44:21:Lustre: lustre-MDT0000: Recovery over after 1:00, of 3 clients 2 recovered and 1 was evicted.
14:44:21:Lustre: Skipped 9 previous similar messages
14:44:21:LustreError: 7516:0:(libcfs_fail.h:141:cfs_race()) cfs_race id 148 sleeping
14:44:21:LustreError: 7479:0:(libcfs_fail.h:146:cfs_race()) cfs_fail_race id 148 waking
14:44:21:LustreError: 7516:0:(libcfs_fail.h:144:cfs_race()) cfs_fail_race id 148 awake, rc=0
14:44:21:Lustre: 7516:0:(mdd_orphans.c:283:orph_key_test_and_del()) Found orphan [0x200002341:0x7:0x0]! Delete it
14:44:21:LustreError: 7516:0:(mdd_orphans.c:227:orph_index_delete()) ASSERTION(obj->mod_flags & ORPHAN_OBJ) failed
14:44:21:LustreError: 7516:0:(mdd_orphans.c:227:orph_index_delete()) LBUG
14:44:21:Pid: 7516, comm: tgt_recov
14:44:21:
14:44:21:Call Trace:
14:44:21: [<ffffffffa043a835>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
14:44:21: [<ffffffffa043ad67>] lbug_with_loc+0x47/0xb0 [libcfs]
14:44:22: [<ffffffffa044441d>] libcfs_assertion_failed+0x2d/0x30 [libcfs]
14:44:22: [<ffffffffa0984f48>] orph_index_delete+0x718/0x990 [mdd]
14:44:22: [<ffffffffa09858a7>] __mdd_orphan_cleanup+0x6e7/0xa50 [mdd]
14:44:22: [<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
14:44:22: [<ffffffffa0993043>] mdd_recovery_complete+0x73/0xf0 [mdd]
14:44:22: [<ffffffffa0a32a7e>] mdt_postrecov+0x3e/0xb0 [mdt]
14:44:22: [<ffffffffa055d0be>] ? lu_env_init+0x1e/0x30 [obdclass]
14:44:22: [<ffffffffa0a34480>] mdt_obd_postrecov+0x80/0xa0 [mdt]
14:44:22: [<ffffffffa0669950>] ? ldlm_reprocess_res+0x0/0x20 [ptlrpc]
14:44:22: [<ffffffffa0672c4b>] target_recovery_thread+0x8fb/0xcf0 [ptlrpc]
14:44:22: [<ffffffff8106cc0f>] ? release_task+0x36f/0x4e0
14:44:22: [<ffffffff81096294>] ? switch_task_namespaces+0x24/0x60
14:44:22: [<ffffffff8106eac7>] ? do_exit+0x5a7/0x860
14:44:22: [<ffffffffa0672350>] ? target_recovery_thread+0x0/0xcf0 [ptlrpc]
14:44:22: [<ffffffff8100c14a>] child_rip+0xa/0x20
14:44:22: [<ffffffffa0672350>] ? target_recovery_thread+0x0/0xcf0 [ptlrpc]
14:44:22: [<ffffffffa0672350>] ? target_recovery_thread+0x0/0xcf0 [ptlrpc]
14:44:22: [<ffffffff8100c140>] ? child_rip+0x0/0x20

https://maloo.whamcloud.com/test_sets/f3a3d4dc-74b4-11e1-bfc6-5254004bbbd3



 Comments   
Comment by Li Wei (Inactive) [ 10/Apr/12 ]

I think this is what led to the assertion failure:

tgt_recov Thread in           MDS_CLOSE Handler in
orph_key_test_and_del()       mdd_close()
--------------------------------------------------------------------------------
OBD_RACE(): Slept.
                              mdd_write_lock(): Locked.
                              OBD_RACE(): Woke up peer and went on.
                              mod_count--
Check mod_count: Zero.
Check ORPHAN_OBJ: Set.
mdd_write_lock(): Blocked.
                              orph_index_delete(): Cleared ORPHAN_OBJ.
                              mdd_write_unlock()
orph_index_delete(): LBUG().

http://review.whamcloud.com/2549
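
To make the interleaving above concrete, here is a minimal user-space sketch of the pattern. It is not the code from the review linked above, and it is not Lustre code. `struct demo_obj`, `demo_close()` and `demo_cleanup()` are hypothetical stand-ins, the `ORPHAN_OBJ` value is illustrative, and a pthread mutex plays the role of mdd_write_lock(). The sketch assumes that one way to avoid the LBUG is for the recovery path to check mod_count and ORPHAN_OBJ only after it holds the lock, so a concurrent close that already removed the orphan is seen and skipped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define ORPHAN_OBJ 0x1u /* illustrative value, not the Lustre definition */

/* Hypothetical stand-in for the MDD object; only the fields from the timeline. */
struct demo_obj {
	pthread_mutex_t lock;      /* plays the role of mdd_write_lock() */
	unsigned int    mod_flags;
	int             mod_count; /* number of open handles */
};

/* Close path: drop one reference and, on the last close of an orphan,
 * clear ORPHAN_OBJ while holding the lock. */
static void demo_close(struct demo_obj *obj)
{
	pthread_mutex_lock(&obj->lock);
	assert(obj->mod_count > 0);
	obj->mod_count--;
	if (obj->mod_count == 0 && (obj->mod_flags & ORPHAN_OBJ))
		obj->mod_flags &= ~ORPHAN_OBJ; /* orphan entry removed here */
	pthread_mutex_unlock(&obj->lock);
}

/* Recovery path: returns true if this call removed the orphan, false if
 * there was nothing left to do. Both conditions are tested under the lock,
 * so a result obtained before a concurrent close cannot be acted on later. */
static bool demo_cleanup(struct demo_obj *obj)
{
	bool deleted = false;

	pthread_mutex_lock(&obj->lock);
	if (obj->mod_count == 0 && (obj->mod_flags & ORPHAN_OBJ)) {
		obj->mod_flags &= ~ORPHAN_OBJ;
		deleted = true;
	}
	pthread_mutex_unlock(&obj->lock);
	return deleted;
}
```

With an object that starts with `mod_count == 1` and `ORPHAN_OBJ` set, calling `demo_close()` and then `demo_cleanup()` leaves the second call returning false instead of asserting, which is the outcome the timeline is missing.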

Comment by James A Simmons [ 29/Aug/15 ]

This is a really old ticket. Peter, can you close it?

Comment by James A Simmons [ 13/Jan/16 ]

Orion work has been finished for a long time.

Generated at Sat Feb 10 01:24:31 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.