LU-5159: Lustre MGS/MDT fails to start via the initscript with 2.4.2-based packages


Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.4.2
    • 3
    • 14229

    Description

      I set up a small Lustre filesystem inside a few VMs running our TOSS 2.2 packages, and the initscript fails to mount the MGS and MDT when it runs after the MGS node reboots. I think this might be a duplicate of LU-1279; feel free to mark it as one if so.

      -bash-4.1# dmesg -c > /dev/null
      -bash-4.1# time /etc/init.d/lustre start
      Mounting stotch-mds1/mgs0 on /mnt/lustre/local/stotch-MGS0000
      Mounting stotch-mds1/mdt0 on /mnt/lustre/local/stotch-MDT0000
      mount.lustre: mount stotch-mds1/mgs0 at /mnt/lustre/local/stotch-MGS0000 failed: No such device
      Are the lustre modules loaded?
      Check /etc/modprobe.conf and /proc/filesystems
      mount.lustre: mount stotch-mds1/mdt0 at /mnt/lustre/local/stotch-MDT0000 failed: Input/output error
      Is the MGS running?
      
      real    7m34.545s
      user    0m0.427s
      sys     0m0.173s
      
      -bash-4.1# mount
      /dev/mapper/VolGroup-lv_root on / type ext4 (rw)
      proc on /proc type proc (rw)
      sysfs on /sys type sysfs (rw)
      devpts on /dev/pts type devpts (rw,gid=5,mode=620)
      tmpfs on /dev/shm type tmpfs (rw)
      /dev/vda1 on /boot type ext4 (rw)
      none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
      
      -bash-4.1# dmesg
      LNet: HW CPU cores: 4, npartitions: 1
      alg: No test for crc32 (crc32-table)
      alg: No test for adler32 (adler32-zlib)
      padlock: VIA PadLock Hash Engine not detected.
      Lustre: Lustre: Build Version: 2.4.2-11chaos-11chaos--PRISTINE-2.6.32-431.17.2.1chaos.ch5.2.x86_64
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol RQF_FLD_QUERY
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_server_pack
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_client_get
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol ptlrpc_queue_wait
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_fini
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_init
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_set
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol req_capsule_server_get
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol ptlrpc_at_set_req_timeout
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol ptlrpc_request_alloc_pack
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol RMF_FLD_OPC
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol ptlrpc_request_set_replen
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol RMF_FLD_MDFLD
      fld: gave up waiting for init of module ptlrpc.
      fld: Unknown symbol ptlrpc_req_finished
      LNet: Added LNI 192.168.2.90@tcp [8/256/0/180]
      LNet: Accept secure, port 988
      LustreError: 2927:0:(client.c:1053:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff880068beb000 x1470206796890120/t0(0) o253->MGC192.168.2.90@tcp@0@lo:26/25 lens 4768/4768 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
      LustreError: 2927:0:(obd_mount_server.c:1140:server_register_target()) stotch-MDT0000: error registering with the MGS: rc = -5 (not fatal)
      LustreError: 2927:0:(client.c:1053:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff880068beb000 x1470206796890124/t0(0) o101->MGC192.168.2.90@tcp@0@lo:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
      LustreError: 2927:0:(client.c:1053:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff880068beb000 x1470206796890128/t0(0) o101->MGC192.168.2.90@tcp@0@lo:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
      LustreError: 15c-8: MGC192.168.2.90@tcp: The configuration from log 'stotch-MDT0000' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      LustreError: 2927:0:(obd_mount_server.c:1273:server_start_targets()) failed to start server stotch-MDT0000: -5
      Lustre: stotch-MDT0000: Unable to start target: -5
      LustreError: 2927:0:(obd_mount_server.c:865:lustre_disconnect_lwp()) stotch-MDT0000-lwp-MDT0000: Can't end config log stotch-client.
      LustreError: 2927:0:(obd_mount_server.c:1442:server_put_super()) stotch-MDT0000: failed to disconnect lwp. (rc=-2)
      LustreError: 2927:0:(obd_mount_server.c:1472:server_put_super()) no obd stotch-MDT0000
      Lustre: server umount stotch-MDT0000 complete
      LustreError: 2927:0:(obd_mount.c:1290:lustre_fill_super()) Unable to mount  (-5)
      
      -bash-4.1# rpm -qa | grep lustre
      lustre-tools-llnl-1.6-1.ch5.2.x86_64
      lustre-osd-ldiskfs-2.4.2-11chaos_2.6.32_431.17.2.1chaos.ch5.2.ch5.2.x86_64
      lustre-modules-2.4.2-11chaos_2.6.32_431.17.2.1chaos.ch5.2.ch5.2.x86_64
      lustre-osd-zfs-2.4.2-11chaos_2.6.32_431.17.2.1chaos.ch5.2.ch5.2.x86_64
      lustre-debuginfo-2.4.2-11chaos_2.6.32_431.17.2.1chaos.ch5.2.ch5.2.x86_64
      lustre-2.4.2-11chaos_2.6.32_431.17.2.1chaos.ch5.2.ch5.2.x86_64
      
      -bash-4.1# cat /etc/ldev.conf 
      stotch-mds1 - stotch-MGS0000 zfs:stotch-mds1/mgs0
      stotch-mds1 - stotch-MDT0000 zfs:stotch-mds1/mdt0
      stotch-oss1 - stotch-OST0000 zfs:stotch-oss1/ost0
      stotch-oss2 - stotch-OST0001 zfs:stotch-oss2/ost0
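
When triaging a failure like this, the mount.lustre hint ("Are the lustre modules loaded? Check /etc/modprobe.conf and /proc/filesystems") and the fld "Unknown symbol" lines in dmesg can be checked mechanically. Below is a minimal POSIX sh sketch under stated assumptions: `fs_registered` and `unknown_symbols` are hypothetical helper names, not part of the lustre or lustre-tools-llnl packages, and the dmesg parsing assumes log lines without a timestamp prefix, as in the output above.

```shell
#!/bin/sh
# Triage helpers for a failed "/etc/init.d/lustre start" (hypothetical sketch,
# not shipped with Lustre). Each reads a file so saved logs can be checked too.

# fs_registered FSTYPE [FILE]
# Exit status 0 if FSTYPE is listed in FILE (default /proc/filesystems).
# Lines look like "nodev<TAB>sysfs" or "<TAB>ext4", so the name is the last field.
fs_registered() {
    fstype=$1
    file=${2:-/proc/filesystems}
    awk -v t="$fstype" '$NF == t { found = 1 } END { exit(found ? 0 : 1) }' "$file"
}

# unknown_symbols MODULE [FILE]
# Prints, sorted and de-duplicated, the symbols MODULE failed to resolve,
# taken from kernel log lines "MODULE: Unknown symbol NAME" (default: stdin).
# Assumes no timestamp prefix on the log lines.
unknown_symbols() {
    mod=$1
    file=${2:-/dev/stdin}
    sed -n "s/^$mod: Unknown symbol \([A-Za-z0-9_]*\).*/\1/p" "$file" | LC_ALL=C sort -u
}
```

For example, `dmesg | unknown_symbols fld` on the MGS above would list the 14 symbols fld could not resolve while it was giving up on waiting for ptlrpc, and `fs_registered lustre || echo 'lustre not registered'` reports whether the lustre filesystem type is registered at the moment it runs.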
      

      Is this expected behavior? I assume not.

      If I run the script a second time, everything mounts just fine (and much faster):

      -bash-4.1# time /etc/init.d/lustre start
      Mounting stotch-mds1/mgs0 on /mnt/lustre/local/stotch-MGS0000
      Mounting stotch-mds1/mdt0 on /mnt/lustre/local/stotch-MDT0000
      
      real    0m4.484s
      user    0m0.439s
      sys     0m0.228s
      
      -bash-4.1# mount
      /dev/mapper/VolGroup-lv_root on / type ext4 (rw)
      proc on /proc type proc (rw)
      sysfs on /sys type sysfs (rw)
      devpts on /dev/pts type devpts (rw,gid=5,mode=620)
      tmpfs on /dev/shm type tmpfs (rw)
      /dev/vda1 on /boot type ext4 (rw)
      none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
      stotch-mds1/mgs0 on /mnt/lustre/local/stotch-MGS0000 type lustre (rw)
      stotch-mds1/mdt0 on /mnt/lustre/local/stotch-MDT0000 type lustre (rw)
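
To confirm which of a node's targets the initscript actually mounted, one can compare /etc/ldev.conf against `mount` output. The sketch below assumes the ldev.conf column layout shown above (local host, foreign host or `-`, label, device) and the /mnt/lustre/local/LABEL mount points the initscript used in this report; `unmounted_targets` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical sketch: list the ldev.conf targets of HOST that are not mounted.
# Assumes ldev.conf lines of the form "local foreign label device" and that
# the initscript mounts each target at /mnt/lustre/local/LABEL.

# unmounted_targets HOST LDEV_CONF MOUNT_OUTPUT
# MOUNT_OUTPUT is a saved copy of `mount` output, e.g.
#   "stotch-mds1/mgs0 on /mnt/lustre/local/stotch-MGS0000 type lustre (rw)".
unmounted_targets() {
    host=$1
    ldev_conf=$2
    mount_out=$3
    awk -v host="$host" '
        # First file (mount output): record mount points of type lustre.
        FILENAME == ARGV[1] {
            if ($2 == "on" && $4 == "type" && $5 == "lustre") mounted[$3] = 1
            next
        }
        # Second file (ldev.conf): skip blank lines and comments.
        NF == 0 || $1 ~ /^#/ { next }
        # Print the label of each local target with no lustre mount.
        $1 == host {
            mp = "/mnt/lustre/local/" $3
            if (!(mp in mounted)) print $3
        }
    ' "$mount_out" "$ldev_conf"
}
```

Usage: `mount > /tmp/mounts.txt; unmounted_targets stotch-mds1 /etc/ldev.conf /tmp/mounts.txt`. Against the mount output after the first initscript run above it would print stotch-MGS0000 and stotch-MDT0000; after the second run it would print nothing. Reading the mount table from a saved file, rather than calling `mount` inside the function, keeps the check easy to test.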
      


            People

              Assignee: Hongchao Zhang
              Reporter: Prakash Surya (Inactive)
              Votes: 0
              Watchers: 5
