Details
-
Bug
-
Resolution: Duplicate
-
Minor
-
None
-
Lustre 2.4.2
-
3
-
14229
Description
I set up a small Lustre filesystem inside a few VMs running our TOSS 2.2 packages, and the initscript fails to mount the MGS and MDT when run after a reboot of the MGS. I think this might be a duplicate of LU-1279; if so, feel free to mark it as one.
-bash-4.1# dmesg -c > /dev/null
-bash-4.1# time /etc/init.d/lustre start
Mounting stotch-mds1/mgs0 on /mnt/lustre/local/stotch-MGS0000
Mounting stotch-mds1/mdt0 on /mnt/lustre/local/stotch-MDT0000
mount.lustre: mount stotch-mds1/mgs0 at /mnt/lustre/local/stotch-MGS0000 failed: No such device
Are the lustre modules loaded?
Check /etc/modprobe.conf and /proc/filesystems
mount.lustre: mount stotch-mds1/mdt0 at /mnt/lustre/local/stotch-MDT0000 failed: Input/output error
Is the MGS running?
real 7m34.545s
user 0m0.427s
sys 0m0.173s
-bash-4.1# mount
/dev/mapper/VolGroup-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/vda1 on /boot type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
-bash-4.1# dmesg
LNet: HW CPU cores: 4, npartitions: 1
alg: No test for crc32 (crc32-table)
alg: No test for adler32 (adler32-zlib)
padlock: VIA PadLock Hash Engine not detected.
Lustre: Lustre: Build Version: 2.4.2-11chaos-11chaos--PRISTINE-2.6.32-431.17.2.1chaos.ch5.2.x86_64
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol RQF_FLD_QUERY
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol req_capsule_server_pack
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol req_capsule_client_get
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol ptlrpc_queue_wait
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol req_capsule_fini
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol req_capsule_init
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol req_capsule_set
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol req_capsule_server_get
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol ptlrpc_at_set_req_timeout
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol ptlrpc_request_alloc_pack
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol RMF_FLD_OPC
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol ptlrpc_request_set_replen
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol RMF_FLD_MDFLD
fld: gave up waiting for init of module ptlrpc.
fld: Unknown symbol ptlrpc_req_finished
LNet: Added LNI 192.168.2.90@tcp [8/256/0/180]
LNet: Accept secure, port 988
LustreError: 2927:0:(client.c:1053:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff880068beb000 x1470206796890120/t0(0) o253->MGC192.168.2.90@tcp@0@lo:26/25 lens 4768/4768 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
LustreError: 2927:0:(obd_mount_server.c:1140:server_register_target()) stotch-MDT0000: error registering with the MGS: rc = -5 (not fatal)
LustreError: 2927:0:(client.c:1053:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff880068beb000 x1470206796890124/t0(0) o101->MGC192.168.2.90@tcp@0@lo:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
LustreError: 2927:0:(client.c:1053:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff880068beb000 x1470206796890128/t0(0) o101->MGC192.168.2.90@tcp@0@lo:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
LustreError: 15c-8: MGC192.168.2.90@tcp: The configuration from log 'stotch-MDT0000' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 2927:0:(obd_mount_server.c:1273:server_start_targets()) failed to start server stotch-MDT0000: -5
Lustre: stotch-MDT0000: Unable to start target: -5
LustreError: 2927:0:(obd_mount_server.c:865:lustre_disconnect_lwp()) stotch-MDT0000-lwp-MDT0000: Can't end config log stotch-client.
LustreError: 2927:0:(obd_mount_server.c:1442:server_put_super()) stotch-MDT0000: failed to disconnect lwp. (rc=-2)
LustreError: 2927:0:(obd_mount_server.c:1472:server_put_super()) no obd stotch-MDT0000
Lustre: server umount stotch-MDT0000 complete
LustreError: 2927:0:(obd_mount.c:1290:lustre_fill_super()) Unable to mount (-5)
-bash-4.1# rpm -qa | grep lustre
lustre-tools-llnl-1.6-1.ch5.2.x86_64
lustre-osd-ldiskfs-2.4.2-11chaos_2.6.32_431.17.2.1chaos.ch5.2.ch5.2.x86_64
lustre-modules-2.4.2-11chaos_2.6.32_431.17.2.1chaos.ch5.2.ch5.2.x86_64
lustre-osd-zfs-2.4.2-11chaos_2.6.32_431.17.2.1chaos.ch5.2.ch5.2.x86_64
lustre-debuginfo-2.4.2-11chaos_2.6.32_431.17.2.1chaos.ch5.2.ch5.2.x86_64
lustre-2.4.2-11chaos_2.6.32_431.17.2.1chaos.ch5.2.ch5.2.x86_64
-bash-4.1# cat /etc/ldev.conf
stotch-mds1 - stotch-MGS0000 zfs:stotch-mds1/mgs0
stotch-mds1 - stotch-MDT0000 zfs:stotch-mds1/mdt0
stotch-oss1 - stotch-OST0000 zfs:stotch-oss1/ost0
stotch-oss2 - stotch-OST0001 zfs:stotch-oss2/ost0
Is this expected behavior? I assume not.
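The first mount error in the output above asks whether the Lustre modules are loaded and points at /proc/filesystems. A minimal sketch of that check follows. The function name fs_registered and its optional second argument (a file to read instead of /proc/filesystems, handy for testing) are assumptions for illustration and are not part of the initscript.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Lustre initscript): succeed if
# filesystem type $1 is listed in /proc/filesystems, or in the file
# given as $2.
fs_registered() {
    fstype=$1
    list=${2:-/proc/filesystems}
    # Each line is "[nodev]<TAB>name"; the type name is the last field.
    awk -v t="$fstype" '$NF == t { found = 1 } END { exit !found }' "$list"
}

# Usage:
#   fs_registered lustre || echo "lustre is not registered with the kernel"
```

Running this right after the failed first start would show whether the "No such device" error coincides with the lustre filesystem type being absent.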
If I run the script a second time, everything mounts just fine (and much faster):
-bash-4.1# time /etc/init.d/lustre start
Mounting stotch-mds1/mgs0 on /mnt/lustre/local/stotch-MGS0000
Mounting stotch-mds1/mdt0 on /mnt/lustre/local/stotch-MDT0000
real 0m4.484s
user 0m0.439s
sys 0m0.228s
-bash-4.1# mount
/dev/mapper/VolGroup-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/vda1 on /boot type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
stotch-mds1/mgs0 on /mnt/lustre/local/stotch-MGS0000 type lustre (rw)
stotch-mds1/mdt0 on /mnt/lustre/local/stotch-MDT0000 type lustre (rw)
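Since the second run succeeds, a stopgap could be to retry the start. The sketch below is a generic retry wrapper. The function name retry and its arguments are hypothetical, and it assumes the initscript exits non-zero when a mount fails, which the output above does not confirm. It only papers over the symptom and does not address the module-load failure seen in dmesg.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Returns 0 on the first success, otherwise the exit
# status of the last failed attempt.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        "$@" && return 0
        rc=$?
        [ "$n" -ge "$attempts" ] && return "$rc"
        n=$((n + 1))
        sleep "$delay"
    done
}

# Usage (assumes a non-zero exit status on mount failure):
#   retry 2 5 /etc/init.d/lustre start
```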
Issue Links
- is related to LU-1279 failure trying to mount two targets at the same time after boot (Resolved)