Details

    • Bug
    • Resolution: Duplicate
    • Blocker
    • None
    • Lustre 2.5.0, Lustre 2.7.0, Lustre 2.4.3
    • None
    • nbp8-mds running lustre 2.4.3, CentoS 6.5
    • 1
    • 17570

    Description

      There appears to be some IB related congestion problems that are destabilizing out lustre servers. The various OSS and MDS systems will stop communicating over IB or the meta-data traffic on the file system slows down to where is can take minutes for an ls -l to return information. The problems come in waves, everything can be okay for a few hours, then we get a slow file system, after an hour or two it can recover and be fine for a few hours, then the same or different lustre file system has problems. This has been happening for a few days. Last night, /nobackupp8 went down. When the system was brought up and we mount the MDT, we get a lot of traffic to the MDS which locks the system up. The file system has been down since last night. We'd like to talk to a lustre engineer on this problem.

      service214: Feb 20 09:50:29 kern:err:nbp8-oss13 LNetError: 9772:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 36 seconds
      service214: Feb 20 09:50:29 kern:err:nbp8-oss13 LNetError: 9772:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Timed out RDMA with 10.151.27.60@o2ib (186): c: 0, oc: 0, rc: 8
      service221: Feb 20 10:24:11 kern:err:nbp8-oss20 LNetError: 9776:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 31 seconds
      service221: Feb 20 10:24:11 kern:err:nbp8-oss20 LNetError: 9776:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Timed out RDMA with 10.151.27.60@o2ib (181): c: 0, oc: 0, rc: 8
      service216: Feb 20 09:49:41 kern:err:nbp8-oss15 LNetError: 9816:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 43 seconds
      service216: Feb 20 09:49:41 kern:err:nbp8-oss15 LNetError: 9816:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Timed out RDMA with 10.151.27.60@o2ib (193): c: 0, oc: 0, rc: 8
      service203: Feb 20 09:49:43 kern:err:nbp8-oss2 LNetError: 9880:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 21 seconds
      service203: Feb 20 09:49:43 kern:err:nbp8-oss2 LNetError: 9880:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Timed out RDMA with 10.151.27.60@o2ib (171): c: 0, oc: 0, rc: 8
      service226: Feb 20 10:33:18 kern:warning:nbp8-oss25 Lustre: 7920:0:(client.c:1878:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1424457198/real 1424457198] req@ffff881f62464800 x1493641412487452/t0(0) o250->MGC10.151.27.60@o2ib@10.151.27.60@o2ib:26/25 lens 400/544 e 0 to 1 dl 1424457303 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      service226: Feb 20 10:43:18 kern:warning:nbp8-oss25 Lustre: 7920:0:(client.c:1878:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1424457798/real 1424457798] req@ffff881f42990000 x1493641412488128/t0(0) o38->nbp8-MDT0000-lwp-OST0080@10.151.27.60@o2ib:12/10 lens 400/544 e 0 to 1 dl 1424457903 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      service226: Feb 20 10:53:18 kern:warning:nbp8-oss25 Lustre: 7920:0:(client.c:1878:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1424458398/real 1424458398] req@ffff881f4758f400 x1493641412488828/t0(0) o38->nbp8-MDT0000-lwp-OST009a@10.151.27.60@o2ib:12/10 lens 400/544 e 0 to 1 dl 1424458503 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      service226: Feb 20 11:03:18 kern:warning:nbp8-oss25 Lustre: 7920:0:(client.c:1878:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1424458998/real 1424458998] req@ffff881f64bad000 x1493641412489500/t0(0) o38->nbp8-MDT0000-lwp-OST009a@10.151.27.60@o2ib:12/10 lens 400/544 e 0 to 1 dl 1424459103 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      service226: Feb 20 09:51:43 kern:warning:nbp8-oss25 Lustre: 7920:0:(client.c:1878:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1424454598/real 0] req@ffff881f46825400 x1493641412486068/t0(0) o250->MGC10.151.27.60@o2ib@10.151.27.60@o2ib:26/25 lens 400/544 e 0 to 1 dl 1424454703 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
      service226: Feb 20 10:02:08 kern:warning:nbp8-oss25 Lustre: 7920:0:(client.c:1878:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1424455223/real 0] req@ffff881f405d9400 x1493641412486328/t0(0) o250->MGC10.151.27.60@o2ib@10.151.27.60@o2ib:26/25 lens 400/544 e 0 to 1 dl 1424455328 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
      service226: Feb 20 10:12:33 kern:warning:nbp8-oss25 Lustre: 7920:0:(client.c:1878:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1424455848/real 0] req@ffff881f48c69c00 x1493641412486588/t0(0) o250->MGC10.151.27.60@o2ib@10.151.27.60@o2ib:26/25 lens 400/544 e 0 to 1 dl 1424455953 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
      service226: Feb 20 10:22:58 kern:warning:nbp8-oss25 Lustre: 7920:0:(client.c:1878:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1424456473/real 0] req@ffff881f4568e000 x1493641412486848/t0(0) o250->MGC10.151.27.60@o2ib@10.151.27.60@o2ib:26/25 lens 400/544 e 0 to 1 dl 1424456578 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
      service202: Feb 20 10:41:39 kern:warning:nbp8-oss1 Lustre: 8544:0:(client.c:1878:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1424457699/real 1424457699] req@ffff880ff2c3d400 x1493641414590864/t0(0) o250->MGC10.151.27.60@o2ib@10.151.27.60@o2ib:26/25 lens 400/544 e 0 to 1 dl 1424457804 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      service202: Feb 20 10:52:04 kern:warning:nbp8-oss1 Lustre: 8544:0:(client.c:1878:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1424458324/real 1424458324] req@ffff881028c58400 x1493641414591540/t0(0) o250->MGC10.151.27.60@o2ib@10.151.27.60@o2ib:26/25 lens 400/544 e 0 to 1 dl 1424458429 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      service202: Feb 20 11:02:04 kern:warning:nbp8-oss1 Lustre: 8544:0:(client.c:1878:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1424458924/real 1424458924] req@ffff880e3c2cb800 x1493641414592216/t0(0) o38->nbp8-MDT0000-lwp-OST011e@10.151.27.60@o2ib:12/10 lens 400/544 e 0 to 1 dl 1424459029 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      service202: Feb 20 10:31:19 kern:warning:nbp8-oss1 Lustre: 8544:0:(client.c:1878:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1424456974/real 0] req@ffff880e669c6000 x1493641414590140/t0(0) o250->MGC10.151.27.60@o2ib@10.151.27.60@o2ib:26/25 lens 400/544 e 0 to 1 dl 1424457079 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
      service202: Feb 20 09:49:39 kern:warning:nbp8-oss1 Lustre: 8544:0:(client.c:1878:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1424454474/real 1424454474] req@ffff880ff480b000 x1493641414589100/t0(0) o250->MGC10.151.27.60@o2ib@10.151.27.60@o2ib:26/25 lens 400/544 e 0 to 1 dl 1424454579 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      service202: Feb 20 10:00:04 kern:warning:nbp8-oss1 Lustre: 8544:0:(client.c:1878:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1424455099/real 1424455099] req@ffff880ff4584c00 x1493641414589360/t0(0) o250->MGC10.151.27.60@o2ib@10.151.27.60@o2ib:26/25 lens 400/544 e 0 to 1 dl 1424455204 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      service202: Feb 20 10:10:29 kern:warning:nbp8-oss1 Lustre: 8544:0:(client.c:1878:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1424455724/real 1424455724] req@ffff880ff310c000 x1493641414589620/t0(0) o250->MGC10.151.27.60@o2ib@10.151.27.60@o2ib:26/25 lens 400/544 e 0 to 1 dl 1424455829 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      service202: Feb 20 10:20:54 kern:warning:nbp8-oss1 Lustre: 8544:0:(client.c:1878:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1424456349/real 1424456349] req@ffff880db2b0d000 x1493641414589880/t0(0) o250->MGC10.151.27.60@o2ib@10.151.27.60@o2ib:26/25 lens 400/544 e 0 to 1 dl 1424456454 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1

      Attachments

        1. nbp8-mds1.syslog
          108 kB
        2. service200.LBUG.gz
          108 kB
        3. service200.log.gz
          440 kB
        4. service200.ps.gz
          123 kB

        Issue Links

          Activity

            [LU-6269] Unable to mount /nobackupp8
            pjones Peter Jones added a comment -

            The remaining work here will be tracked under LU-5297

            pjones Peter Jones added a comment - The remaining work here will be tracked under LU-5297

            Emoly Liu (emoly.liu@intel.com) uploaded a new patch: http://review.whamcloud.com/14806
            Subject: LU-6269 osp: decrease opd_syn_changes for an invalid rec
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 1ecf3aff71b3ab3b6c6432d033535e31243878dc

            gerrit Gerrit Updater added a comment - Emoly Liu (emoly.liu@intel.com) uploaded a new patch: http://review.whamcloud.com/14806 Subject: LU-6269 osp: decrease opd_syn_changes for an invalid rec Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 1ecf3aff71b3ab3b6c6432d033535e31243878dc

            OSP sync thread can't handle invalid record properly, see LU-5297.

            niu Niu Yawei (Inactive) added a comment - OSP sync thread can't handle invalid record properly, see LU-5297 .
            pjones Peter Jones added a comment -

            Emoly

            Could you please turn Oleg's suggestion here - https://jira.hpdd.intel.com/browse/LU-6269?focusedCommentId=107620&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-107620 into a proper patch? Oleg has also suggested that you use a failloc to trigger this condition.

            Thanks

            Peter

            pjones Peter Jones added a comment - Emoly Could you please turn Oleg's suggestion here - https://jira.hpdd.intel.com/browse/LU-6269?focusedCommentId=107620&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-107620 into a proper patch? Oleg has also suggested that you use a failloc to trigger this condition. Thanks Peter
            green Oleg Drokin added a comment -

            b2_5 version of LU-4345 http://review.whamcloud.com/11435 already includes LU05188 patch as well, so you should be fine.

            green Oleg Drokin added a comment - b2_5 version of LU-4345 http://review.whamcloud.com/11435 already includes LU05188 patch as well, so you should be fine.

            Oleg, I just updated nas-2.5.3 to up to today of b2_5. It already has LU-4345, but b2_5 does not have LU-5188. Do I need to cherry-pick LU-5188 from master to nas-2.5.3?

            jaylan Jay Lan (Inactive) added a comment - Oleg, I just updated nas-2.5.3 to up to today of b2_5. It already has LU-4345 , but b2_5 does not have LU-5188 . Do I need to cherry-pick LU-5188 from master to nas-2.5.3?
            green Oleg Drokin added a comment -

            Well. Once you have LU-4345 patchset 7, the only reason for my patch would be to combat the condition of disk corruption of an llog record - i.e. if somehow an llog record becomes corrupted in such a way that lsr_valid becomes invalid and triggers the check - you'll have a crash exactly like you had after applying the updated patch from LU-5188. Chance of that is much-much smaller than before so I imagine it's ok not to carry it until we hash out a final version.

            green Oleg Drokin added a comment - Well. Once you have LU-4345 patchset 7, the only reason for my patch would be to combat the condition of disk corruption of an llog record - i.e. if somehow an llog record becomes corrupted in such a way that lsr_valid becomes invalid and triggers the check - you'll have a crash exactly like you had after applying the updated patch from LU-5188 . Chance of that is much-much smaller than before so I imagine it's ok not to carry it until we hash out a final version.

            With patchset 7 of LU-4345 in place, do we still need the patch you provided at 21/Feb/15 7:47 PM?

            jaylan Jay Lan (Inactive) added a comment - With patchset 7 of LU-4345 in place, do we still need the patch you provided at 21/Feb/15 7:47 PM?
            green Oleg Drokin added a comment -

            you are correct that the regression was only part of patchset 1 and if you use 10462 patchset 7 then you should be fine.

            11435 is not really applicable to your tree then as it's b2_5 based, I only brought it in as the example of what things you really needed, but I did not know there was a right-sized b2_4 backport too.

            green Oleg Drokin added a comment - you are correct that the regression was only part of patchset 1 and if you use 10462 patchset 7 then you should be fine. 11435 is not really applicable to your tree then as it's b2_5 based, I only brought it in as the example of what things you really needed, but I did not know there was a right-sized b2_4 backport too.

            OK, the two code hunks you quoted above are in both patch LU-5296 (#10989) and in
            LU-4345 ( #10462/7 and #11435). By saying "this is a regression of sorts introduced by LU-4345", I bet you meant to say earlier patch set, especially patch set #1 of #10462). So, either #10462/7 or #11435 of LU-4345 should fix the regression already. Am I correct?
            If I am correct, then I do not really need LU-5296(#10989)? Or do I still need #10989 for other code fixes besides the two hunks?

            Between #10462/7 and #11435, which one do you recommend us to pick? Remember we are running code based on b2_4.

            It is kind of scary to know that we probably the only one running b2_4..., code paths the (next to) nobody have run...

            jaylan Jay Lan (Inactive) added a comment - OK, the two code hunks you quoted above are in both patch LU-5296 (#10989) and in LU-4345 ( #10462/7 and #11435). By saying "this is a regression of sorts introduced by LU-4345 ", I bet you meant to say earlier patch set, especially patch set #1 of #10462). So, either #10462/7 or #11435 of LU-4345 should fix the regression already. Am I correct? If I am correct, then I do not really need LU-5296 (#10989)? Or do I still need #10989 for other code fixes besides the two hunks? Between #10462/7 and #11435, which one do you recommend us to pick? Remember we are running code based on b2_4. It is kind of scary to know that we probably the only one running b2_4..., code paths the (next to) nobody have run...
            green Oleg Drokin added a comment -

            Well, the thing of the matter is that currently you are running with a unique set of patches that nobody else runs, so exposure is very little, as opposed to b2_5 that runs with patch 11435 that incorporates a number of fixes.
            I specifically pointed at LU-5296 because it seems to be fixing a problem you seem to be fixing, namely this part (don't apply it alone, this is just a demonstration):

            @@ -237,7 +237,9 @@ static int osp_sync_add_rec(const struct lu_env *env, struct
                            LASSERT(attr);
                            osi->osi_setattr.lsr_uid = attr->la_uid;
                            osi->osi_setattr.lsr_gid = attr->la_gid;
            -               osi->osi_setattr.lsr_valid = attr->la_valid;
            +               osi->osi_setattr.lsr_valid =
            +                       ((attr->la_valid & LA_UID) ? OBD_MD_FLUID : 0) |
            +                       ((attr->la_valid & LA_GID) ? OBD_MD_FLGID : 0);
                            break;
                    default:
                            LBUG();
            @@ -483,8 +485,9 @@ static int osp_sync_new_setattr_job(struct osp_device *d,
                    ENTRY;
                    LASSERT(h->lrh_type == MDS_SETATTR64_REC);
             
            -       /* lsr_valid can only be 0 or LA_UID/GID set */
            -       if ((rec->lsr_valid & ~(LA_UID | LA_GID)) != 0) {
            +       /* lsr_valid can only be 0 or have OBD_MD_{FLUID,FLGID} set,
            +        * so no bits other than these should be set. */
            +       if ((rec->lsr_valid & ~(OBD_MD_FLUID | OBD_MD_FLGID)) != 0) {
                            CERROR("%s: invalid setattr record, lsr_valid:"LPU64"\n",
                                   d->opd_obd->obd_name, rec->lsr_valid);
                            /* return 0 so that sync thread can continue processing
            

            First hunk starts to really enforce filtering of what's placed into lsr_valid to ensure we only place there the two valid bits and nothing else. Before the patch all sorts of unrelated stuff might be set (CTIME is the most common - also seen in your logs) that trips the check in the second hunk. - this is a regression of sorts introduced by LU-4345 and that's why you need it, without any regards of possible interactions with LU-4690 that you are not carryign anyway.

            change 10462 looks like a good one based on the list of patches in there (and some quick checks). The main risk remains that this is code that (next to) nobody have run, and even though it looks like almost identical to b2_5, there might be some other code paths not hit in b2_5 for example (there were after all changes in b2_5 compared to b2_4).

            green Oleg Drokin added a comment - Well, the thing of the matter is that currently you are running with a unique set of patches that nobody else runs, so exposure is very little, as opposed to b2_5 that runs with patch 11435 that incorporates a number of fixes. I specifically pointed at LU-5296 because it seems to be fixing a problem you seem to be fixing, namely this part (don't apply it alone, this is just a demonstration): @@ -237,7 +237,9 @@ static int osp_sync_add_rec(const struct lu_env *env, struct LASSERT(attr); osi->osi_setattr.lsr_uid = attr->la_uid; osi->osi_setattr.lsr_gid = attr->la_gid; - osi->osi_setattr.lsr_valid = attr->la_valid; + osi->osi_setattr.lsr_valid = + ((attr->la_valid & LA_UID) ? OBD_MD_FLUID : 0) | + ((attr->la_valid & LA_GID) ? OBD_MD_FLGID : 0); break; default: LBUG(); @@ -483,8 +485,9 @@ static int osp_sync_new_setattr_job(struct osp_device *d, ENTRY; LASSERT(h->lrh_type == MDS_SETATTR64_REC); - /* lsr_valid can only be 0 or LA_UID/GID set */ - if ((rec->lsr_valid & ~(LA_UID | LA_GID)) != 0) { + /* lsr_valid can only be 0 or have OBD_MD_{FLUID,FLGID} set, + * so no bits other than these should be set. */ + if ((rec->lsr_valid & ~(OBD_MD_FLUID | OBD_MD_FLGID)) != 0) { CERROR("%s: invalid setattr record, lsr_valid:"LPU64"\n", d->opd_obd->obd_name, rec->lsr_valid); /* return 0 so that sync thread can continue processing First hunk starts to really enforce filtering of what's placed into lsr_valid to ensure we only place there the two valid bits and nothing else. Before the patch all sorts of unrelated stuff might be set (CTIME is the most common - also seen in your logs) that trips the check in the second hunk. - this is a regression of sorts introduced by LU-4345 and that's why you need it, without any regards of possible interactions with LU-4690 that you are not carryign anyway. change 10462 looks like a good one based on the list of patches in there (and some quick checks). The main risk remains that this is code that (next to) nobody have run, and even though it looks like almost identical to b2_5, there might be some other code paths not hit in b2_5 for example (there were after all changes in b2_5 compared to b2_4).

            People

              emoly.liu Emoly Liu
              hyeung Herbert Yeung
              Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: