  1. Lustre
  2. LU-7778

mount of MDT(==MGS) failed after MDS restart

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.8.0, Lustre 2.9.0
    • Lustre 2.8.0
    • lola
      build: 2.8.50-6-gf9ca359 ; commit f9ca359284357d145819beb08b316e932f7a3060
    • 3

    Description

      The error occurred during soak testing of build '20160215' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20150215). DNE is enabled.
      The MDTs were formatted with ldiskfs, the OSTs with ZFS. The MDS nodes are configured in an active-active HA failover configuration.

      Please note that build '20160215' is a vanilla build of the master branch.
      This issue might be addressed by the changes included in build '20160210', as we did not observe it during a two-day test session with that build.

      Sequence of events:

      • 2016-02-15 16:25:21,179:fsmgmt.fsmgmt:INFO triggering fault mds_restart
      • 2016-02-15 16:31:41,282:fsmgmt.fsmgmt:INFO lola-8 is up
      • 2016-02-15 16:36:50,594:fsmgmt.fsmgmt:INFO ... soaked-MDT0001 mounted successfully on lola-8
      • 2016-02-15 16:38:20, mount of MDT0000 (== MGS) fails
        Error message reads as:
        Feb 15 16:38:20 lola-8 kernel: LustreError: 15c-8: MGC192.168.1.108@o2ib10: The configuration from log 'soaked-MDT0000' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
        Feb 15 16:38:20 lola-8 kernel: LustreError: 4538:0:(obd_mount_server.c:1309:server_start_targets()) failed to start server soaked-MDT0000: -5
        Feb 15 16:38:20 lola-8 kernel: LustreError: 4538:0:(obd_mount_server.c:1798:server_fill_super()) Unable to start targets: -5
        Feb 15 16:38:20 lola-8 kernel: LustreError: 4538:0:(obd_mount_server.c:1512:server_put_super()) no obd soaked-MDT0000
        Feb 15 16:38:20 lola-8 kernel: Lustre: server umount soaked-MDT0000 complete
        Feb 15 16:38:20 lola-8 kernel: LustreError: 4538:0:(obd_mount.c:1426:lustre_fill_super()) Unable to mount  (-5)
        
      • I checked the hardware and cluster configuration: there is no problem with the IB HCA, LNet is working, and the routers are up. The disk device file of MDT0000 can be read and accessed.

      Attached are the messages, console and manually forced debug logs of node lola-8.
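
      For triage, the LustreError lines quoted above can be broken into source file, line number, function and return code. The following is a minimal sketch, not part of the original report: the helper name parse_lustre_error is hypothetical, and the regular expressions are inferred only from the log lines shown here, so other Lustre message formats may not match. The return code -5 corresponds to EIO under the usual Linux kernel convention of returning negated errno values.

```python
import errno
import os
import re

# Assumed layout, inferred from the quoted lines only:
#   LustreError: <pid>:<n>:(<file>:<line>:<func>()) <message>
LUSTRE_ERROR = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"(?P<msg>.*)"
)
# Trailing return code, written either as ": -5" or as "(-5)".
RETURN_CODE = re.compile(r"(?::\s*|\()-(?P<code>\d+)\)?\s*$")


def parse_lustre_error(line):
    """Return a dict describing one LustreError line, or None if it does not match."""
    match = LUSTRE_ERROR.search(line)
    if match is None:
        return None
    record = match.groupdict()
    record["line"] = int(record["line"])
    rc = RETURN_CODE.search(record["msg"])
    if rc:
        code = int(rc.group("code"))
        # Kernel code returns negated errno values, so -5 maps to errno 5.
        record["errno"] = errno.errorcode.get(code, "unknown")
        record["strerror"] = os.strerror(code)
    return record


if __name__ == "__main__":
    sample = (
        "Feb 15 16:38:20 lola-8 kernel: LustreError: "
        "4538:0:(obd_mount_server.c:1309:server_start_targets()) "
        "failed to start server soaked-MDT0000: -5"
    )
    print(parse_lustre_error(sample))
```

      Lines without a trailing return code, such as the "no obd soaked-MDT0000" message, are still parsed but carry no errno field. The "15c-8" console message uses a different prefix and is deliberately left unmatched by this sketch.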

          People

            di.wang Di Wang
            heckes Frank Heckes (Inactive)