[LU-7975] "(lod_object.c:700:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 )" LBUG/Assert on MDS - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Minor
Fix Version/s: Lustre 2.9.0
Affects Version/s: None
Labels:
- cea

Severity:
3
Rank (Obsolete):
9223372036854775807

Description

A site has encountered multiple crashes with the same signature. The messages and stack trace follow:

LustreError: 89879:0:(osp_precreate.c:1222:osp_object_truncate()) can't punch object: -11
Lustre: composit-OST0009-osc-MDT0000: Connection to composit-OST0009 (at 10.0.14.31@o2ib) was lost; in progress operations using this service will wait for recovery to complete
LustreError: 89879:0:(lod_object.c:700:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 ) failed: 
LustreError: 89879:0:(lod_object.c:700:lod_ah_init()) LBUG
Pid: 89879, comm: mdt01_006

Call Trace:
 [<ffffffffa057e895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa057ee97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa266c0af>] lod_ah_init+0x58f/0x5d0 [lod]
 [<ffffffffa26c7ad3>] mdd_object_make_hint+0x83/0xa0 [mdd]
 [<ffffffffa26d4502>] mdd_create_data+0x332/0x7d0 [mdd]
 [<ffffffffa25a93f0>] mdt_finish_open+0x1350/0x19a0 [mdt]
 [<ffffffffa257e5f4>] ? mdt_object_lock+0x14/0x20 [mdt]
 [<ffffffffa25a9fbd>] mdt_open_by_fid_lock+0x57d/0x910 [mdt]
 [<ffffffffa25aabac>] mdt_reint_open+0x56c/0x21a0 [mdt]
 [<ffffffffa059b14c>] ? upcall_cache_get_entry+0x29c/0x890 [libcfs]
 [<ffffffffa0983930>] ? lu_ucred+0x20/0x30 [obdclass]
 [<ffffffffa2572945>] ? mdt_ucred+0x15/0x20 [mdt]
 [<ffffffffa258f8ec>] ? mdt_root_squash+0x2c/0x410 [mdt]
 [<ffffffffa123bad6>] ? __req_capsule_get+0x166/0x710 [ptlrpc]
 [<ffffffffa2593ab1>] mdt_reint_rec+0x41/0xe0 [mdt]
 [<ffffffffa2578f83>] mdt_reint_internal+0x4c3/0x780 [mdt]
 [<ffffffffa257950e>] mdt_intent_reint+0x1ee/0x520 [mdt]
 [<ffffffffa2576cee>] mdt_intent_policy+0x3ae/0x770 [mdt]
 [<ffffffffa11ca2f5>] ldlm_lock_enqueue+0x135/0x980 [ptlrpc]
 [<ffffffffa11f43fb>] ldlm_handle_enqueue0+0x51b/0x10c0 [ptlrpc]
 [<ffffffffa25771b6>] mdt_enqueue+0x46/0xe0 [mdt]
 [<ffffffffa257c84a>] mdt_handle_common+0x52a/0x1470 [mdt]
 [<ffffffffa25b98f5>] mds_regular_handle+0x15/0x20 [mdt]
 [<ffffffffa12238d5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
 [<ffffffffa05904fa>] ? lc_watchdog_touch+0x7a/0x190 [libcfs]
 [<ffffffffa121c289>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
 [<ffffffff81057849>] ? __wake_up_common+0x59/0x90
 [<ffffffffa122605d>] ptlrpc_main+0xaed/0x1780 [ptlrpc]
 [<ffffffffa1225570>] ? ptlrpc_main+0x0/0x1780 [ptlrpc]
 [<ffffffff8109e78e>] kthread+0x9e/0xc0
 [<ffffffff8100c28a>] child_rip+0xa/0x20
 [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
 [<ffffffff8100c280>] ? child_rip+0x0/0x20

Searching existing tickets, I found that this kind of problem has already been (partially?) addressed in LU-4260, LU-4791 and LU-5346.
Since the fixes for both LU-4260 and LU-4791 are already integrated, we must be hitting a new situation during OST object pre-creation. It is likely caused by a specific file metadata pattern, which I have identified as use of the "deferred layout" feature (open(, ...|O_LOV_DELAY_CREATE|...,)) along with a non-zero truncate() to trigger object preallocation. This leads to a case similar to the one described in LU-5346, on an error return path that is still not fixed.
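
For illustration, the client-side access pattern described above might look like the sketch below. It is not a confirmed reproducer: the helper name is hypothetical, ftruncate() on the open descriptor stands in for the truncate() call, and the 0 fallback for O_LOV_DELAY_CREATE is only a placeholder so the code compiles without the Lustre user headers (with the fallback it is a plain create plus truncate).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* On a Lustre client the real flag comes from the Lustre user headers.
 * The 0 fallback is a placeholder (assumption) so the sketch builds
 * elsewhere; it does NOT defer layout creation. */
#ifndef O_LOV_DELAY_CREATE
#define O_LOV_DELAY_CREATE 0
#endif

/* Hypothetical helper: create `path` with layout creation deferred,
 * then extend it to a non-zero `size`.
 * Returns 0 on success, -1 with errno set on failure. */
static int delayed_create_then_truncate(const char *path, off_t size)
{
    assert(path != NULL && size > 0);

    int fd = open(path, O_CREAT | O_WRONLY | O_LOV_DELAY_CREATE, 0644);
    if (fd < 0)
        return -1;

    /* Non-zero truncate on a file whose layout was deferred at open. */
    int rc = ftruncate(fd, size);
    int saved_errno = errno;   /* keep ftruncate's errno across close() */

    close(fd);
    errno = saved_errno;
    return rc;
}
```

Calling delayed_create_then_truncate() with a path on a Lustre mount and a size such as 4096 exercises the open-then-truncate sequence. Per the description, the assert was only seen when an OSS had just crashed, so this pattern alone is not expected to trigger it.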

BTW, I have also determined that these MDT asserts always occur just after an OSS crash, hence the -EAGAIN/-EWOULDBLOCK error in the "(osp_precreate.c:1222:osp_object_truncate()) can't punch object: -11" message just preceding the assert.
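
As a quick check of the -11 reading, here is a small sketch that maps a kernel-style negative return code to its errno name. The helper is hypothetical (not part of Lustre) and assumes Linux errno numbering, where EAGAIN and EWOULDBLOCK share the same value.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a kernel-style return code such as -11
 * as "<rc> (<name>: <strerror text>)" into buf.
 * Returns the snprintf() result. */
static int describe_rc(int rc, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    int err = rc < 0 ? -rc : rc;   /* kernel code returns -errno */
    const char *name = "other";

    if (err == EAGAIN)             /* same value as EWOULDBLOCK on Linux */
        name = "EAGAIN/EWOULDBLOCK";

    return snprintf(buf, len, "%d (%s: %s)", rc, name, strerror(err));
}
```

Passing -11 and a buffer of about 128 bytes yields a string naming EAGAIN/EWOULDBLOCK, consistent with the "can't punch object: -11" message above.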

Issue Links

is related to

LU-4971 sanity-scrub test_2: ldlm_lock2desc()) ASSERTION( lock->l_policy_data.l_inodebits.bits == (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE | MDS_INODELOCK_LAYOUT)failed

Resolved

People

Assignee: Bruno Faccini (Inactive)

Reporter: Bruno Faccini (Inactive)

Votes: 0

Watchers: 10

Dates

Created: 01/Apr/16 3:35 PM

Updated: 14/Jun/18 9:41 PM

Resolved: 31/May/16 12:50 PM