Lustre / LU-7903

recovery-small test_23: hang on umount

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.9.0
    • Labels: None
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/f72c79d4-efbb-11e5-8ddc-5254006e85c2.

      The sub-test test_23 failed with the following error:

      test failed to respond and timed out
      

      Please provide additional information about the failure here.

      Info required for matching: recovery-small 23

    Attachments

    Issue Links

    Activity

            [LU-7903] recovery-small test_23: hang on umount
            pjones Peter Jones added a comment -

            Landed for 2.9


            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/22696/
            Subject: LU-7903 ptlrpc: leaked rs on difficult reply
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: bd1441b92e31cd3c01cc2453aa627d139a1207f7

            niu Niu Yawei (Inactive) added a comment -

            The patch http://review.whamcloud.com/#/c/12530/ was merged to master on Jan 29, 2016. We then searched the Maloo failures: https://testing.hpdd.intel.com/test_sets/query?commit=Update+results&hosts=&page=9&query_bugs=&status%5B%5D=TIMEOUT&test_node%5Barchitecture_type_id%5D=&test_node%5Bdistribution_type_id%5D=&test_node%5Bfile_system_type_id%5D=&test_node%5Blustre_branch_id%5D=24a6947e-04a9-11e1-bb5f-52540025f9af&test_node%5Bos_type_id%5D=&test_node_network%5Bnetwork_type_id%5D=&test_session%5Bend_date%5D=&test_session%5Bquery_recent_period%5D=&test_session%5Bstart_date%5D=&test_session%5Buser_id%5D=&test_set%5Btest_set_script_id%5D=f36cabd0-32c3-11e0-a61c-52540025f9ae&utf8=%E2%9C%93&warn%5Bnotice%5D=true

            This bug showed up immediately after that time point (it was reported as LU-7172 before, so I think LU-7172 is a duplicate of this one).
            laisiyao Lai Siyao added a comment -

            Indeed, this occurs when REPLY-ACK is disabled.

            In http://review.whamcloud.com/#/c/12530/, REPLY-ACK is disabled to save local locks; however, Commit-on-Sharing doesn't take effect for local operations, so we may not be able to replay in transno order if a reply is lost. That is a separate issue, and I'll fix it in another ticket.


            Niu Yawei (yawei.niu@intel.com) uploaded a new patch: http://review.whamcloud.com/22696
            Subject: LU-7903 ptlrpc: leaked rs on difficult reply
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 802f41d1a78f588ab99ef8e1d081ac3da7cb8551


            niu Niu Yawei (Inactive) added a comment -

            It looks like there is a race that can cause an export reference leak on a difficult reply (when rs_no_ack is enabled):

            MDT umount calls server_disconnect_export() to disconnect its exports. server_disconnect_export() tries to complete all of the outstanding (difficult) replies, which ends up calling ptlrpc_handle_rs(), where the reply state is removed from the uncommitted list and LNetMDUnlink() is called to trigger an unlink event. reply_out_callback() catches the event and tries to finalize the reply state (releasing the export held by the reply state) with the following code:

                            spin_lock(&svcpt->scp_rep_lock);
                            spin_lock(&rs->rs_lock);
            
                            rs->rs_on_net = 0;
                            if (!rs->rs_no_ack ||
                                rs->rs_transno <=
                                rs->rs_export->exp_obd->obd_last_committed)
                                    ptlrpc_schedule_difficult_reply(rs);
            
                            spin_unlock(&rs->rs_lock);
                            spin_unlock(&svcpt->scp_rep_lock);
            

            We can see that if rs_no_ack is false (COS not enabled), or the transaction has been committed, the reply state is finalized here; otherwise it relies on the commit callback. However, for an export that has already been disconnected by umount (or for whatever other reason), the reply state has already been removed from the uncommitted list, so nobody will ever finalize the reply state (until the ptlrpc module is unloaded). I think adding one more check here, to see whether the reply state is still on the uncommitted list, is necessary.
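
            As a rough sketch of that extra check (assuming the uncommitted-replies linkage is rs->rs_obd_list, the list walked by the commit callback; this is only an illustration of the suggestion above, not necessarily the exact landed change):

                            spin_lock(&svcpt->scp_rep_lock);
                            spin_lock(&rs->rs_lock);

                            rs->rs_on_net = 0;
                            if (!rs->rs_no_ack ||
                                rs->rs_transno <=
                                rs->rs_export->exp_obd->obd_last_committed ||
                                /* rs already dropped from the uncommitted list
                                 * (e.g. export disconnected on umount), so no
                                 * commit callback will ever schedule it */
                                list_empty(&rs->rs_obd_list))
                                    ptlrpc_schedule_difficult_reply(rs);

                            spin_unlock(&rs->rs_lock);
                            spin_unlock(&svcpt->scp_rep_lock);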

            COS is usually not enabled, so this defect isn't triggered easily; however, with the change from LU-3538, rs_no_ack is set unintentionally, regardless of whether COS is enabled:

            @@ -2530,24 +2589,27 @@ static void mdt_save_lock(struct mdt_thread_info *info, struct lustre_handle *h,
                                    struct mdt_device *mdt = info->mti_mdt;
                                    struct ldlm_lock *lock = ldlm_handle2lock(h);
                                    struct ptlrpc_request *req = mdt_info_req(info);
            -                       int no_ack = 0;
            +                       int cos;
            +
            +                       cos = (mdt_cos_is_enabled(mdt) ||
            +                              mdt_slc_is_enabled(mdt));
            
                                    LASSERTF(lock != NULL, "no lock for cookie "LPX64"\n",
                                             h->cookie);
            +
                                    /* there is no request if mdt_object_unlock() is called
                                     * from mdt_export_cleanup()->mdt_add_dirty_flag() */
                                    if (likely(req != NULL)) {
                                            CDEBUG(D_HA, "request = %p reply state = %p"
                                                   " transno = "LPD64"\n", req,
                                                   req->rq_reply_state, req->rq_transno);
            -                               if (mdt_cos_is_enabled(mdt)) {
            -                                       no_ack = 1;
            +                               if (cos) {
                                                    ldlm_lock_downgrade(lock, LCK_COS);
                                                    mode = LCK_COS;
                                            }
            -                               ptlrpc_save_lock(req, h, mode, no_ack);
            +                               ptlrpc_save_lock(req, h, mode, cos);
                                    } else {
            -                               ldlm_lock_decref(h, mode);
            +                               mdt_fid_unlock(h, mode);
                                    }
            

            That could be the reason why it can be triggered more easily today.


            adilger Andreas Dilger added a comment -

            Is this bug a duplicate of LU-7858?

            niu Niu Yawei (Inactive) added a comment -

            Thank you for the information, James; I'm looking at this issue. The console log shows that no pending RPC, lock, or commit callback is holding the export, so I'm wondering whether it is because someone missed putting a reference.
            jamesanunez James Nunez (Inactive) added a comment - edited

            Niu,
            Here is a failure that should include your debug patch: https://testing.hpdd.intel.com/test_sets/38e4f102-79bf-11e6-b08e-5254006e85c2

            From the MDS 1 log:

            09:49:55:[26605.283675] Lustre: lustre-MDT0000-osp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
            09:49:55:[26605.288046] Lustre: Skipped 24 previous similar messages
            09:49:55:[26607.056252] LustreError: 30936:0:(client.c:1168:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff880046125200 x1545343953854000/t0(0) o13->lustre-OST0005-osc-MDT0000@10.9.4.148@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
            09:49:55:[26607.064057] LustreError: 30936:0:(client.c:1168:ptlrpc_import_delay_req()) Skipped 23 previous similar messages
            09:49:55:[26618.703121] Lustre: lustre-MDT0000 is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 5. Is it stuck?
            09:49:55:[26618.711819] Lustre: lustre-MDT0000: UNLINKED ffff8800471ee000 dbac7ff5-8b18-2c65-a249-73e04ce7d6a8 10.9.4.146@tcp 1 (0 0 0) 1 0 0 0:           (null)  4294967369 stale:0
            09:49:55:[26634.716121] Lustre: lustre-MDT0000 is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 5. Is it stuck?
            09:49:55:[26634.723386] Lustre: lustre-MDT0000: UNLINKED ffff8800471ee000 dbac7ff5-8b18-2c65-a249-73e04ce7d6a8 10.9.4.146@tcp 1 (0 0 0) 1 0 0 0:           (null)  4294967369 stale:0
            09:49:55:[26666.728048] Lustre: lustre-MDT0000 is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 5. Is it stuck?
            09:49:55:[26666.734510] Lustre: lustre-MDT0000: UNLINKED ffff8800471ee000 dbac7ff5-8b18-2c65-a249-73e04ce7d6a8 10.9.4.146@tcp 1 (0 0 0) 1 0 0 0:           (null)  4294967369 stale:0
            09:49:55:[26680.276322] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
            09:49:55:[26680.281813] Lustre: Skipped 231 previous similar messages
            09:49:55:[26730.739132] Lustre: lustre-MDT0000 is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 5. Is it stuck?
            09:49:55:[26730.745251] Lustre: lustre-MDT0000: UNLINKED ffff8800471ee000 dbac7ff5-8b18-2c65-a249-73e04ce7d6a8 10.9.4.146@tcp 1 (0 0 0) 1 0 0 0:           (null)  4294967369 stale:0
            09:49:55:[26831.719420] Lustre: lustre-MDT0000: Not available for connect from 10.9.4.146@tcp (stopping)
            09:49:55:[26831.723246] Lustre: Skipped 481 previous similar messages
            09:49:55:[26858.750122] Lustre: lustre-MDT0000 is waiting for obd_unlinked_exports more than 128 seconds. The obd refcount = 5. Is it stuck?
            09:49:55:[26858.761521] Lustre: lustre-MDT0000: UNLINKED ffff8800471ee000 dbac7ff5-8b18-2c65-a249-73e04ce7d6a8 10.9.4.146@tcp 1 (0 0 0) 1 0 0 0:           (null)  4294967369 stale:0
            09:49:55:[27000.758142] INFO: task umount:13385 blocked for more than 120 seconds.
            09:49:55:[27000.762771] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            09:49:55:[27000.765657] umount          D 000000000000f918     0 13385  13384 0x00000080
            

            Another at https://testing.hpdd.intel.com/test_sets/1dbc2ffc-7b23-11e6-8afd-5254006e85c2


            People

              Assignee: niu Niu Yawei (Inactive)
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 18
