[LU-7975] "(lod_object.c:700:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 )" LBUG/Assert on MDS Created: 01/Apr/16  Updated: 14/Jun/18  Resolved: 31/May/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Minor
Reporter: Bruno Faccini (Inactive) Assignee: Bruno Faccini (Inactive)
Resolution: Fixed Votes: 0
Labels: cea

Issue Links:
Related
is related to LU-4971 sanity-scrub test_2: ldlm_lock2desc()... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

A site has encountered multiple crashes with same signature/stack+msgs following :

LustreError: 89879:0:(osp_precreate.c:1222:osp_object_truncate()) can't punch object: -11
Lustre: composit-OST0009-osc-MDT0000: Connection to composit-OST0009 (at 10.0.14.31@o2ib) was lost; in progress operations using this service will wait for recovery to complete
LustreError: 89879:0:(lod_object.c:700:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 ) failed: 
LustreError: 89879:0:(lod_object.c:700:lod_ah_init()) LBUG
Pid: 89879, comm: mdt01_006

Call Trace:
 [<ffffffffa057e895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa057ee97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa266c0af>] lod_ah_init+0x58f/0x5d0 [lod]
 [<ffffffffa26c7ad3>] mdd_object_make_hint+0x83/0xa0 [mdd]
 [<ffffffffa26d4502>] mdd_create_data+0x332/0x7d0 [mdd]
 [<ffffffffa25a93f0>] mdt_finish_open+0x1350/0x19a0 [mdt]
 [<ffffffffa257e5f4>] ? mdt_object_lock+0x14/0x20 [mdt]
 [<ffffffffa25a9fbd>] mdt_open_by_fid_lock+0x57d/0x910 [mdt]
 [<ffffffffa25aabac>] mdt_reint_open+0x56c/0x21a0 [mdt]
 [<ffffffffa059b14c>] ? upcall_cache_get_entry+0x29c/0x890 [libcfs]
 [<ffffffffa0983930>] ? lu_ucred+0x20/0x30 [obdclass]
 [<ffffffffa2572945>] ? mdt_ucred+0x15/0x20 [mdt]
 [<ffffffffa258f8ec>] ? mdt_root_squash+0x2c/0x410 [mdt]
 [<ffffffffa123bad6>] ? __req_capsule_get+0x166/0x710 [ptlrpc]
 [<ffffffffa2593ab1>] mdt_reint_rec+0x41/0xe0 [mdt]
 [<ffffffffa2578f83>] mdt_reint_internal+0x4c3/0x780 [mdt]
 [<ffffffffa257950e>] mdt_intent_reint+0x1ee/0x520 [mdt]
 [<ffffffffa2576cee>] mdt_intent_policy+0x3ae/0x770 [mdt]
 [<ffffffffa11ca2f5>] ldlm_lock_enqueue+0x135/0x980 [ptlrpc]
 [<ffffffffa11f43fb>] ldlm_handle_enqueue0+0x51b/0x10c0 [ptlrpc]
 [<ffffffffa25771b6>] mdt_enqueue+0x46/0xe0 [mdt]
 [<ffffffffa257c84a>] mdt_handle_common+0x52a/0x1470 [mdt]
 [<ffffffffa25b98f5>] mds_regular_handle+0x15/0x20 [mdt]
 [<ffffffffa12238d5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
 [<ffffffffa05904fa>] ? lc_watchdog_touch+0x7a/0x190 [libcfs]
 [<ffffffffa121c289>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
 [<ffffffff81057849>] ? __wake_up_common+0x59/0x90
 [<ffffffffa122605d>] ptlrpc_main+0xaed/0x1780 [ptlrpc]
 [<ffffffffa1225570>] ? ptlrpc_main+0x0/0x1780 [ptlrpc]
 [<ffffffff8109e78e>] kthread+0x9e/0xc0
 [<ffffffff8100c28a>] child_rip+0xa/0x20
 [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
 [<ffffffff8100c280>] ? child_rip+0x0/0x20

According to existing tickets, I have found that this kind of problem has already (partially?) been addressed in LU-4260, LU-4791 and LU-5346 tickets.
And since both fixes for LU-4260 and LU-4791 are already integrated, this means that we encounter a new situation/problem during OST objects pre-creation, likely to be caused by some specific file meta-data pattern (I have identified as "deferred layout" feature usage with open(, ...|O_LOV_DELAY_CREATE|...,) along with a non-0 truncate() to trigger objects preallocation), leading to trigger a similar case than described in LU-5346 upon error return path that is still not fixed.

BTW, I have also determined that these MDT assert always occurs just following an OSS crash, hence the -EAGAIN/EWOULDBLOCK error in "(osp_precreate.c:1222:osp_object_truncate()) can't punch object: -11" msg just preceding the assert !



 Comments   
Comment by Alex Zhuravlev [ 01/Apr/16 ]

the good thing is that we've got a new infrastructure with DNE2 allowing for striping procedures (declaration and creation) to be per-thandle and not put intermediate state into an object. another improvement we can make with this infrastructure is to release OSP objects after creation, don't leave them in cache consuming memory.

Comment by Bruno Faccini (Inactive) [ 01/Apr/16 ]

After more crash-dump and Lustre debug logs analysis, I was able to find a reproducer for the situation unveiled at customer site and first causing MDS CERROR/msg "(osp_precreate.c:1222:osp_object_truncate()) can't punch object: -11" and a few later "(lod_object.c:700:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 ) failed" Assertion/LBUG() :

# !!! with all OSTs down !!!
mkdir /mnt/lustre/foo_dir/
lfs setstripe -c 2 /mnt/lustre/foo_dir/
# open file with O_LOV_DELAY_CREATE, then non-0 truncate
multiop /mnt/lustre/foo_dir/foo5 oO_RDWR:O_CREAT:O_LOV_DELAY_CREATE:T1050000c
# normal open
cat /etc/hosts > /mnt/lustre/foo_dir/foo5 --->>> "-bash: /mnt/lustre/foo_dir/foo5: Resource temporarily unavailable" and "(osp_precreate.c:1222:osp_object_truncate()) can't punch object: -11"
# normal open again
cat /etc/hosts > /mnt/lustre/foo_dir/foo5 --->>> "LustreError: 11915:0:(lod_object.c:700:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 ) failed:"

Then, diging more in both MDS full Lustre debug log along into the source code, I found the reason of the -EAGAIN return from osp_object_create() call, which is not sent over the wire but in turn is coded as -EWOULDBLOCK in ptlrpc_import_delay_req() !! :

/**
 * Based on the current state of the import, determine if the request
 * can be sent, is an error, or should be delayed.
 *
 * Returns true if this request should be delayed. If false, and
 * *status is set, then the request can not be sent and *status is the
 * error code.  If false and status is 0, then request can be sent.
 *
 * The imp->imp_lock must be held.
 */
static int ptlrpc_import_delay_req(struct obd_import *imp,
                                   struct ptlrpc_request *req, int *status)
{
        int delay = 0;
        ENTRY;
..............
                 } else if (req->rq_send_state != imp->imp_state) {
                /* invalidate in progress - any requests should be drop */
                if (cfs_atomic_read(&imp->imp_inval_count) != 0) {
                        DEBUG_REQ(D_ERROR, req, "invalidate in flight");
                        *status = -EIO;
                } else if (imp->imp_dlm_fake || req->rq_no_delay) {
                        *status = -EWOULDBLOCK; <<<<<<<<<<<<<<<<<<<<<<<<<<<<
.........................

when

/*
 *
 */
int osp_object_truncate(const struct lu_env *env, struct dt_object *dt,
                        __u64 size)
{
...............
        /*
         * XXX: decide how do we do here with resend
         * if we don't resend, then client may see wrong file size
         * if we do resend, then MDS thread can get stuck for quite long
         */
        req->rq_no_resend = req->rq_no_delay = 1;
................

So this raises more implications and I think resend (or any retry mechanism) may not be an option at the moment if we don't want data corruption failure (missing non-0 truncate() effect/result) when this particular scenario occurs. Because even if striping allocated things/stuff cleanup upon error return path must be fixed in MDS code, the error should not be returned to the user otherwise further open() will fail and non-0 truncate() effect/result will be lost.

Comment by Alex Zhuravlev [ 01/Apr/16 ]

as for truncate, there is a ticket to move this functionality to the client side. could be a good time, probably..

Comment by Gerrit Updater [ 01/Apr/16 ]

Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/19301
Subject: LU-7975 lod: fix delayed stripe error path & MDS resend
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 43a47da70a47d027f9ac199ed4ff6fdc4fe91614

Comment by Gerrit Updater [ 01/Apr/16 ]

Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/19302
Subject: LU-7975 lod: fix delayed stripe error path & Client resend
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: bb8a376656f6b3626a47c32cbc1855045a47d929

Comment by Bruno Faccini (Inactive) [ 01/Apr/16 ]

Patch at http://review.whamcloud.com/19301 fixes cleanup in delayed stripe error path and also implements resend mechanism from MDS side. It may keep a MDS thread busy for some time doing so.

Patch at http://review.whamcloud.com/19302 is an other way to fix, also doing delayed stripe error path necessary cleanup, but offloading resend mechanism to Client side, which may be less intrusive.

Comment by Bruno Faccini (Inactive) [ 24/May/16 ]

http://review.whamcloud.com/19301 has been abandoned in favor of http://review.whamcloud.com/19302, according to reviewers comments and choice between both solutions.

Comment by Gerrit Updater [ 31/May/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19302/
Subject: LU-7975 lod: fix delayed stripe error path & Client resend
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 047dfe489966c8816cbead1a3abbbb1564fdb7db

Comment by Peter Jones [ 31/May/16 ]

Landed for 2.9

Generated at Sat Feb 10 02:13:32 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.