Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.7.0, Lustre 2.5.4
    • Affects Version/s: Lustre 2.4.1
    • Environment: lustre-client-modules-2.4.1-6nasC OFED3.5
      server lustre2.4.1 and 2.1.5 OFED1.5.4
    • 3
    • 13682

    Description

      After upgrading to OFED 3.5, we have started to get random mount failures during client boot. Which filesystem fails to mount varies from one boot to the next. Here is the client-side debug output.

      0000000:00000001:1.0:1398271322.986806:0:7677:0:(mgc_request.c:947:mgc_enqueue()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
      10000000:01000000:1.0:1398271322.986808:0:7677:0:(mgc_request.c:1852:mgc_process_log()) Can't get cfg lock: -5
      10000000:00000001:1.0:1398271322.986810:0:7677:0:(mgc_request.c:125:config_log_get()) Process entered
      10000000:00000001:1.0:1398271322.986811:0:7677:0:(mgc_request.c:129:config_log_get()) Process leaving (rc=0 : 0 : 0)
      10000000:00000001:1.0:1398271322.986813:0:7677:0:(mgc_request.c:1713:mgc_process_cfg_log()) Process entered
      10000000:00000001:1.0:1398271322.986815:0:7677:0:(mgc_request.c:1774:mgc_process_cfg_log()) Process leaving via out_pop (rc=18446744073709551611 : -5 : 0xfffffffffffffffb)
      10000000:00000001:1.0:1398271322.986818:0:7677:0:(mgc_request.c:1811:mgc_process_cfg_log()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
      10000000:01000000:1.0:1398271322.986819:0:7677:0:(mgc_request.c:1871:mgc_process_log()) MGC10.151.25.171@o2ib: configuration from log 'nbp3-client' failed (-5).
      10000000:00000001:1.0:1398271322.986822:0:7677:0:(mgc_request.c:1883:mgc_process_log()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
      10000000:00000001:1.0:1398271322.986824:0:7677:0:(mgc_request.c:136:config_log_put()) Process entered
      10000000:00000001:1.0:1398271322.986825:0:7677:0:(mgc_request.c:160:config_log_put()) Process leaving
      10000000:00000001:1.0:1398271322.986826:0:7677:0:(mgc_request.c:1982:mgc_process_config()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
      00000020:00000001:1.0:1398271322.986829:0:7677:0:(obd_class.h:714:obd_process_config()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
      00000020:00000001:1.0:1398271322.986830:0:7677:0:(lustre_cfg.h:214:lustre_cfg_len()) Process entered
      00000020:00000001:1.0:1398271322.986831:0:7677:0:(lustre_cfg.h:220:lustre_cfg_len()) Process leaving (rc=176 : 176 : b0)
      00000020:00000001:1.0:1398271322.986833:0:7677:0:(lustre_cfg.h:259:lustre_cfg_free()) Process leaving
      00000020:02020000:1.0:1398271322.986834:0:7677:0:(obd_mount.c:119:lustre_process_log()) 15c-8: MGC10.151.25.171@o2ib: The configuration from log 'nbp3-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      00000020:00000001:1.0:1398271323.010020:0:7677:0:(obd_mount.c:122:lustre_process_log()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
      

      The complete debug output is attached.
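
Each "Process leaving" line in the trace above prints the return code three ways: as an unsigned 64-bit integer, as a signed integer, and in hex. The following is a minimal sketch of how the unsigned value 18446744073709551611 corresponds to -5, which is EIO on Linux. The helper name `decode_rc` and the regular expression are illustrative assumptions written for this report; they are not part of Lustre or its tooling.

```python
import errno
import re

# Hypothetical helper, not part of Lustre: matches the
# "rc=<unsigned> : <signed> : <hex>" triple printed in the trace lines.
RC_PATTERN = re.compile(r"rc=(\d+) : (-?\d+) : (?:0x)?([0-9a-f]+)")


def decode_rc(line):
    """Return (signed_rc, errno_name) for a trace line, or None if no rc is present."""
    match = RC_PATTERN.search(line)
    if match is None:
        return None
    unsigned = int(match.group(1))
    # Reinterpret the unsigned 64-bit value as a two's-complement signed value.
    signed = unsigned - (1 << 64) if unsigned >= (1 << 63) else unsigned
    # Negative return codes are negated errno values; look up the symbolic name.
    name = errno.errorcode.get(-signed) if signed < 0 else None
    return signed, name


sample = ("(mgc_request.c:947:mgc_enqueue()) Process leaving "
          "(rc=18446744073709551611 : -5 : fffffffffffffffb)")
print(decode_rc(sample))
```

Applied to the `mgc_enqueue()` line, this yields -5 and the name EIO, matching the "(-5)" in the "configuration from log 'nbp3-client' failed" message.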

          Activity

            [LU-4943] Client Fails to mount filesystem

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/11765/
            Subject: LU-4943 obdclass: detach MGC dev on error
            Project: fs/lustre-release
            Branch: b2_5
            Current Patch Set:
            Commit: 8d1e9394d3a984e257e1e4b0f46f16b7ff2183cd

            haasken Ryan Haasken added a comment -

            I didn't notice that there was already a b2_5 version of this fix, so http://review.whamcloud.com/#/c/12303/ has been abandoned in favor of http://review.whamcloud.com/#/c/11765

            pjones Peter Jones added a comment -

            Landed for 2.7

            haasken Ryan Haasken added a comment -

            It looks like we may have gotten the same spurious Maloo failures on the b2_5 patch as we did on other branches. Can somebody restart Maloo?

            haasken Ryan Haasken added a comment -

            The patch for master has landed.

            This issue also exists in 2.5. Here is a port for b2_5: http://review.whamcloud.com/#/c/12303

            haasken Ryan Haasken added a comment - - edited

            Is the test failure in replay-ost-single on http://review.whamcloud.com/#/c/10129/ related to the patch? It doesn't seem like it to me, but I don't see a bug matching that failure.

            People

              Assignee: bobijam Zhenyu Xu
              Reporter: mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 12