Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.7.0, Lustre 2.5.4
    • Affects Version/s: Lustre 2.4.1
    • Environment: lustre-client-modules-2.4.1-6nasC OFED3.5
      server lustre2.4.1 and 2.1.5 OFED1.5.4
    • 3
    • 13682

    Description

      After upgrading to OFED 3.5, we have started to get random mount failures during client boot. Which filesystem fails to mount is random. Here is the client-side debug output.

      0000000:00000001:1.0:1398271322.986806:0:7677:0:(mgc_request.c:947:mgc_enqueue()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
      10000000:01000000:1.0:1398271322.986808:0:7677:0:(mgc_request.c:1852:mgc_process_log()) Can't get cfg lock: -5
      10000000:00000001:1.0:1398271322.986810:0:7677:0:(mgc_request.c:125:config_log_get()) Process entered
      10000000:00000001:1.0:1398271322.986811:0:7677:0:(mgc_request.c:129:config_log_get()) Process leaving (rc=0 : 0 : 0)
      10000000:00000001:1.0:1398271322.986813:0:7677:0:(mgc_request.c:1713:mgc_process_cfg_log()) Process entered
      10000000:00000001:1.0:1398271322.986815:0:7677:0:(mgc_request.c:1774:mgc_process_cfg_log()) Process leaving via out_pop (rc=18446744073709551611 : -5 : 0xfffffffffffffffb)
      10000000:00000001:1.0:1398271322.986818:0:7677:0:(mgc_request.c:1811:mgc_process_cfg_log()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
      10000000:01000000:1.0:1398271322.986819:0:7677:0:(mgc_request.c:1871:mgc_process_log()) MGC10.151.25.171@o2ib: configuration from log 'nbp3-client' failed (-5).
      10000000:00000001:1.0:1398271322.986822:0:7677:0:(mgc_request.c:1883:mgc_process_log()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
      10000000:00000001:1.0:1398271322.986824:0:7677:0:(mgc_request.c:136:config_log_put()) Process entered
      10000000:00000001:1.0:1398271322.986825:0:7677:0:(mgc_request.c:160:config_log_put()) Process leaving
      10000000:00000001:1.0:1398271322.986826:0:7677:0:(mgc_request.c:1982:mgc_process_config()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
      00000020:00000001:1.0:1398271322.986829:0:7677:0:(obd_class.h:714:obd_process_config()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
      00000020:00000001:1.0:1398271322.986830:0:7677:0:(lustre_cfg.h:214:lustre_cfg_len()) Process entered
      00000020:00000001:1.0:1398271322.986831:0:7677:0:(lustre_cfg.h:220:lustre_cfg_len()) Process leaving (rc=176 : 176 : b0)
      00000020:00000001:1.0:1398271322.986833:0:7677:0:(lustre_cfg.h:259:lustre_cfg_free()) Process leaving
      00000020:02020000:1.0:1398271322.986834:0:7677:0:(obd_mount.c:119:lustre_process_log()) 15c-8: MGC10.151.25.171@o2ib: The configuration from log 'nbp3-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      00000020:00000001:1.0:1398271323.010020:0:7677:0:(obd_mount.c:122:lustre_process_log()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
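
      Each "Process leaving" record above prints the return code three ways: as an unsigned 64-bit decimal, as a signed decimal, and in hex. The following is a minimal sketch of that arithmetic, not Lustre code. The helper name `to_signed64` is made up for illustration, and mapping 5 to EIO assumes the usual Linux errno numbering.

      ```python
      import errno

      def to_signed64(value: int) -> int:
          """Reinterpret an unsigned 64-bit integer as a two's-complement signed value."""
          value &= (1 << 64) - 1
          return value - (1 << 64) if value >= (1 << 63) else value

      # The rc value printed by mgc_enqueue() and its callers in the log above.
      raw_rc = 18446744073709551611
      signed_rc = to_signed64(raw_rc)          # -5
      hex_rc = format(raw_rc, "x")             # fffffffffffffffb
      # Assumption: errno 5 is EIO on this platform (true on Linux).
      errno_name = errno.errorcode.get(-signed_rc)

      print(signed_rc, hex_rc, errno_name)
      ```

      Read this way, every failing record in the trace carries the same -5 (I/O error), which first appears in mgc_enqueue() and is returned unchanged through to lustre_process_log().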
      

      The complete debug output is attached.
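
      To find where the error first appears in a full debug dump, the records can be filtered for negative return codes. The sketch below is a hedged example only. The regular expression is inferred from the lines quoted in this description, and the group names (`subsys`, `mask`, `cpu`, `stack`, `extpid`) are assumptions, not an official description of the log format.

      ```python
      import re

      # Assumed layout, inferred from the excerpt above:
      # subsys:mask:cpu:seconds.usec:stack:pid:extpid:(file:line:function()) message
      LINE_RE = re.compile(
          r"^(?P<subsys>[0-9a-f]+):(?P<mask>[0-9a-f]+):(?P<cpu>[\d.]+[A-Z]?):"
          r"(?P<time>\d+\.\d+):(?P<stack>\d+):(?P<pid>\d+):(?P<extpid>\d+):"
          r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+(?P<msg>.*)$"
      )
      # Matches the "rc=<unsigned> : <signed> : <hex>" triple and captures the signed value.
      RC_RE = re.compile(r"rc=\d+ : (?P<rc>-?\d+) : ")

      def failing_calls(lines):
          """Yield (file, line, function, rc) for records that log a negative rc."""
          for raw in lines:
              record = LINE_RE.match(raw.strip())
              if record is None:
                  continue  # not a debug record in the assumed layout
              rc = RC_RE.search(record["msg"])
              if rc and int(rc["rc"]) < 0:
                  yield record["file"], int(record["line"]), record["func"], int(rc["rc"])

      # Two records copied from the excerpt above.
      sample = [
          "10000000:00000001:1.0:1398271322.986811:0:7677:0:(mgc_request.c:129:config_log_get()) Process leaving (rc=0 : 0 : 0)",
          "10000000:00000001:1.0:1398271322.986826:0:7677:0:(mgc_request.c:1982:mgc_process_config()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)",
      ]
      result = list(failing_calls(sample))
      print(result)
      ```

      Run over the attached dump, the first tuple yielded would point at the earliest function that returned an error. In the excerpt here that is mgc_enqueue().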

          Activity

            [LU-4943] Client fails to mount filesystem
            diegom Diego Moreno (Inactive) made changes -
            Link Original: This issue is related to DDN-274 [ DDN-274 ]
            jgmitter Joseph Gmitter (Inactive) made changes -
            Link New: This issue is related to DDN-274 [ DDN-274 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to SGI-148 [ SGI-148 ]
            pjones Peter Jones made changes -
            Link Original: This issue is related to LDEV-44 [ LDEV-44 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to LDEV-45 [ LDEV-45 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to LDEV-44 [ LDEV-44 ]
            pjones Peter Jones made changes -
            Fix Version/s New: Lustre 2.5.4 [ 11190 ]
            Labels Original: mq414 patch New: patch

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/11765/
            Subject: LU-4943 obdclass: detach MGC dev on error
            Project: fs/lustre-release
            Branch: b2_5
            Current Patch Set:
            Commit: 8d1e9394d3a984e257e1e4b0f46f16b7ff2183cd

            haasken Ryan Haasken added a comment -

            I didn't notice that there was already a b2_5 version of this fix, so http://review.whamcloud.com/#/c/12303/ has been abandoned in favor of http://review.whamcloud.com/#/c/11765

            pjones Peter Jones made changes -
            Labels Original: mq414 p4n patch New: mq414 patch

            People

              Assignee: bobijam Zhenyu Xu
              Reporter: mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 12
