Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-7975

"(lod_object.c:700:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 )" LBUG/Assert on MDS

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.9.0
    • None
    • 3
    • 9223372036854775807

    Description

      A site has encountered multiple crashes with same signature/stack+msgs following :

      LustreError: 89879:0:(osp_precreate.c:1222:osp_object_truncate()) can't punch object: -11
      Lustre: composit-OST0009-osc-MDT0000: Connection to composit-OST0009 (at 10.0.14.31@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      LustreError: 89879:0:(lod_object.c:700:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 ) failed: 
      LustreError: 89879:0:(lod_object.c:700:lod_ah_init()) LBUG
      Pid: 89879, comm: mdt01_006
      
      Call Trace:
       [<ffffffffa057e895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa057ee97>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa266c0af>] lod_ah_init+0x58f/0x5d0 [lod]
       [<ffffffffa26c7ad3>] mdd_object_make_hint+0x83/0xa0 [mdd]
       [<ffffffffa26d4502>] mdd_create_data+0x332/0x7d0 [mdd]
       [<ffffffffa25a93f0>] mdt_finish_open+0x1350/0x19a0 [mdt]
       [<ffffffffa257e5f4>] ? mdt_object_lock+0x14/0x20 [mdt]
       [<ffffffffa25a9fbd>] mdt_open_by_fid_lock+0x57d/0x910 [mdt]
       [<ffffffffa25aabac>] mdt_reint_open+0x56c/0x21a0 [mdt]
       [<ffffffffa059b14c>] ? upcall_cache_get_entry+0x29c/0x890 [libcfs]
       [<ffffffffa0983930>] ? lu_ucred+0x20/0x30 [obdclass]
       [<ffffffffa2572945>] ? mdt_ucred+0x15/0x20 [mdt]
       [<ffffffffa258f8ec>] ? mdt_root_squash+0x2c/0x410 [mdt]
       [<ffffffffa123bad6>] ? __req_capsule_get+0x166/0x710 [ptlrpc]
       [<ffffffffa2593ab1>] mdt_reint_rec+0x41/0xe0 [mdt]
       [<ffffffffa2578f83>] mdt_reint_internal+0x4c3/0x780 [mdt]
       [<ffffffffa257950e>] mdt_intent_reint+0x1ee/0x520 [mdt]
       [<ffffffffa2576cee>] mdt_intent_policy+0x3ae/0x770 [mdt]
       [<ffffffffa11ca2f5>] ldlm_lock_enqueue+0x135/0x980 [ptlrpc]
       [<ffffffffa11f43fb>] ldlm_handle_enqueue0+0x51b/0x10c0 [ptlrpc]
       [<ffffffffa25771b6>] mdt_enqueue+0x46/0xe0 [mdt]
       [<ffffffffa257c84a>] mdt_handle_common+0x52a/0x1470 [mdt]
       [<ffffffffa25b98f5>] mds_regular_handle+0x15/0x20 [mdt]
       [<ffffffffa12238d5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
       [<ffffffffa05904fa>] ? lc_watchdog_touch+0x7a/0x190 [libcfs]
       [<ffffffffa121c289>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
       [<ffffffff81057849>] ? __wake_up_common+0x59/0x90
       [<ffffffffa122605d>] ptlrpc_main+0xaed/0x1780 [ptlrpc]
       [<ffffffffa1225570>] ? ptlrpc_main+0x0/0x1780 [ptlrpc]
       [<ffffffff8109e78e>] kthread+0x9e/0xc0
       [<ffffffff8100c28a>] child_rip+0xa/0x20
       [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
       [<ffffffff8100c280>] ? child_rip+0x0/0x20
      

      According to existing tickets, I have found that this kind of problem has already (partially?) been addressed in LU-4260, LU-4791 and LU-5346 tickets.
      And since both fixes for LU-4260 and LU-4791 are already integrated, this means that we encounter a new situation/problem during OST objects pre-creation, likely to be caused by some specific file meta-data pattern (I have identified as "deferred layout" feature usage with open(, ...|O_LOV_DELAY_CREATE|...,) along with a non-0 truncate() to trigger objects preallocation), leading to trigger a similar case than described in LU-5346 upon error return path that is still not fixed.

      BTW, I have also determined that these MDT assert always occurs just following an OSS crash, hence the -EAGAIN/EWOULDBLOCK error in "(osp_precreate.c:1222:osp_object_truncate()) can't punch object: -11" msg just preceding the assert !

      Attachments

        Issue Links

          Activity

            [LU-7975] "(lod_object.c:700:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 )" LBUG/Assert on MDS
            pjones Peter Jones added a comment -

            Landed for 2.9

            pjones Peter Jones added a comment - Landed for 2.9

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19302/
            Subject: LU-7975 lod: fix delayed stripe error path & Client resend
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 047dfe489966c8816cbead1a3abbbb1564fdb7db

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19302/ Subject: LU-7975 lod: fix delayed stripe error path & Client resend Project: fs/lustre-release Branch: master Current Patch Set: Commit: 047dfe489966c8816cbead1a3abbbb1564fdb7db

            http://review.whamcloud.com/19301 has been abandoned in favor of http://review.whamcloud.com/19302, according to reviewers comments and choice between both solutions.

            bfaccini Bruno Faccini (Inactive) added a comment - http://review.whamcloud.com/19301 has been abandoned in favor of http://review.whamcloud.com/19302 , according to reviewers comments and choice between both solutions.

            Patch at http://review.whamcloud.com/19301 fixes cleanup in delayed stripe error path and also implements resend mechanism from MDS side. It may keep a MDS thread busy for some time doing so.

            Patch at http://review.whamcloud.com/19302 is an other way to fix, also doing delayed stripe error path necessary cleanup, but offloading resend mechanism to Client side, which may be less intrusive.

            bfaccini Bruno Faccini (Inactive) added a comment - Patch at http://review.whamcloud.com/19301 fixes cleanup in delayed stripe error path and also implements resend mechanism from MDS side. It may keep a MDS thread busy for some time doing so. Patch at http://review.whamcloud.com/19302 is an other way to fix, also doing delayed stripe error path necessary cleanup, but offloading resend mechanism to Client side, which may be less intrusive.

            Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/19302
            Subject: LU-7975 lod: fix delayed stripe error path & Client resend
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: bb8a376656f6b3626a47c32cbc1855045a47d929

            gerrit Gerrit Updater added a comment - Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/19302 Subject: LU-7975 lod: fix delayed stripe error path & Client resend Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: bb8a376656f6b3626a47c32cbc1855045a47d929

            Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/19301
            Subject: LU-7975 lod: fix delayed stripe error path & MDS resend
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 43a47da70a47d027f9ac199ed4ff6fdc4fe91614

            gerrit Gerrit Updater added a comment - Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/19301 Subject: LU-7975 lod: fix delayed stripe error path & MDS resend Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 43a47da70a47d027f9ac199ed4ff6fdc4fe91614

            as for truncate, there is a ticket to move this functionality to the client side. could be a good time, probably..

            bzzz Alex Zhuravlev added a comment - as for truncate, there is a ticket to move this functionality to the client side. could be a good time, probably..

            After more crash-dump and Lustre debug logs analysis, I was able to find a reproducer for the situation unveiled at customer site and first causing MDS CERROR/msg "(osp_precreate.c:1222:osp_object_truncate()) can't punch object: -11" and a few later "(lod_object.c:700:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 ) failed" Assertion/LBUG() :

            # !!! with all OSTs down !!!
            mkdir /mnt/lustre/foo_dir/
            lfs setstripe -c 2 /mnt/lustre/foo_dir/
            # open file with O_LOV_DELAY_CREATE, then non-0 truncate
            multiop /mnt/lustre/foo_dir/foo5 oO_RDWR:O_CREAT:O_LOV_DELAY_CREATE:T1050000c
            # normal open
            cat /etc/hosts > /mnt/lustre/foo_dir/foo5 --->>> "-bash: /mnt/lustre/foo_dir/foo5: Resource temporarily unavailable" and "(osp_precreate.c:1222:osp_object_truncate()) can't punch object: -11"
            # normal open again
            cat /etc/hosts > /mnt/lustre/foo_dir/foo5 --->>> "LustreError: 11915:0:(lod_object.c:700:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 ) failed:"
            

            Then, diging more in both MDS full Lustre debug log along into the source code, I found the reason of the -EAGAIN return from osp_object_create() call, which is not sent over the wire but in turn is coded as -EWOULDBLOCK in ptlrpc_import_delay_req() !! :

            /**
             * Based on the current state of the import, determine if the request
             * can be sent, is an error, or should be delayed.
             *
             * Returns true if this request should be delayed. If false, and
             * *status is set, then the request can not be sent and *status is the
             * error code.  If false and status is 0, then request can be sent.
             *
             * The imp->imp_lock must be held.
             */
            static int ptlrpc_import_delay_req(struct obd_import *imp,
                                               struct ptlrpc_request *req, int *status)
            {
                    int delay = 0;
                    ENTRY;
            ..............
                             } else if (req->rq_send_state != imp->imp_state) {
                            /* invalidate in progress - any requests should be drop */
                            if (cfs_atomic_read(&imp->imp_inval_count) != 0) {
                                    DEBUG_REQ(D_ERROR, req, "invalidate in flight");
                                    *status = -EIO;
                            } else if (imp->imp_dlm_fake || req->rq_no_delay) {
                                    *status = -EWOULDBLOCK; <<<<<<<<<<<<<<<<<<<<<<<<<<<<
            .........................
            

            when

            /*
             *
             */
            int osp_object_truncate(const struct lu_env *env, struct dt_object *dt,
                                    __u64 size)
            {
            ...............
                    /*
                     * XXX: decide how do we do here with resend
                     * if we don't resend, then client may see wrong file size
                     * if we do resend, then MDS thread can get stuck for quite long
                     */
                    req->rq_no_resend = req->rq_no_delay = 1;
            ................
            

            So this raises more implications and I think resend (or any retry mechanism) may not be an option at the moment if we don't want data corruption failure (missing non-0 truncate() effect/result) when this particular scenario occurs. Because even if striping allocated things/stuff cleanup upon error return path must be fixed in MDS code, the error should not be returned to the user otherwise further open() will fail and non-0 truncate() effect/result will be lost.

            bfaccini Bruno Faccini (Inactive) added a comment - After more crash-dump and Lustre debug logs analysis, I was able to find a reproducer for the situation unveiled at customer site and first causing MDS CERROR/msg "(osp_precreate.c:1222:osp_object_truncate()) can't punch object: -11" and a few later "(lod_object.c:700:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 ) failed" Assertion/LBUG() : # !!! with all OSTs down !!! mkdir /mnt/lustre/foo_dir/ lfs setstripe -c 2 /mnt/lustre/foo_dir/ # open file with O_LOV_DELAY_CREATE, then non-0 truncate multiop /mnt/lustre/foo_dir/foo5 oO_RDWR:O_CREAT:O_LOV_DELAY_CREATE:T1050000c # normal open cat /etc/hosts > /mnt/lustre/foo_dir/foo5 --->>> "-bash: /mnt/lustre/foo_dir/foo5: Resource temporarily unavailable" and "(osp_precreate.c:1222:osp_object_truncate()) can't punch object: -11" # normal open again cat /etc/hosts > /mnt/lustre/foo_dir/foo5 --->>> "LustreError: 11915:0:(lod_object.c:700:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 ) failed:" Then, diging more in both MDS full Lustre debug log along into the source code, I found the reason of the -EAGAIN return from osp_object_create() call, which is not sent over the wire but in turn is coded as -EWOULDBLOCK in ptlrpc_import_delay_req() !! : /** * Based on the current state of the import, determine if the request * can be sent, is an error, or should be delayed. * * Returns true if this request should be delayed. If false, and * *status is set, then the request can not be sent and *status is the * error code. If false and status is 0, then request can be sent. * * The imp->imp_lock must be held. */ static int ptlrpc_import_delay_req(struct obd_import *imp, struct ptlrpc_request *req, int *status) { int delay = 0; ENTRY; .............. } else if (req->rq_send_state != imp->imp_state) { /* invalidate in progress - any requests should be drop */ if (cfs_atomic_read(&imp->imp_inval_count) != 0) { DEBUG_REQ(D_ERROR, req, "invalidate in flight"); *status = -EIO; } else if (imp->imp_dlm_fake || req->rq_no_delay) { *status = -EWOULDBLOCK; <<<<<<<<<<<<<<<<<<<<<<<<<<<< ......................... when /* * */ int osp_object_truncate(const struct lu_env *env, struct dt_object *dt, __u64 size) { ............... /* * XXX: decide how do we do here with resend * if we don't resend, then client may see wrong file size * if we do resend, then MDS thread can get stuck for quite long */ req->rq_no_resend = req->rq_no_delay = 1; ................ So this raises more implications and I think resend (or any retry mechanism) may not be an option at the moment if we don't want data corruption failure (missing non-0 truncate() effect/result) when this particular scenario occurs. Because even if striping allocated things/stuff cleanup upon error return path must be fixed in MDS code, the error should not be returned to the user otherwise further open() will fail and non-0 truncate() effect/result will be lost.

            the good thing is that we've got a new infrastructure with DNE2 allowing for striping procedures (declaration and creation) to be per-thandle and not put intermediate state into an object. another improvement we can make with this infrastructure is to release OSP objects after creation, don't leave them in cache consuming memory.

            bzzz Alex Zhuravlev added a comment - the good thing is that we've got a new infrastructure with DNE2 allowing for striping procedures (declaration and creation) to be per-thandle and not put intermediate state into an object. another improvement we can make with this infrastructure is to release OSP objects after creation, don't leave them in cache consuming memory.

            People

              bfaccini Bruno Faccini (Inactive)
              bfaccini Bruno Faccini (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: