Details
- Bug
- Resolution: Unresolved
- Critical
- None
- Lustre 2.10.0
- None
- Lustre: Build Version: 2.10.0_71_g6d59523
- 3
- 9223372036854775807
Description
While trying to mount a Lustre target for initial registration I got:
# mount --verbose -t lustre /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk3 /mnt/test-fs-MDT0000
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = /dev/sdd
arg[5] = /mnt/test-fs-MDT0000
source = /dev/sdd (/dev/sdd), target = /mnt/test-fs-MDT0000
options = rw
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Writing CONFIGS/mountdata
mounting device /dev/sdd at /mnt/test-fs-MDT0000, flags=0x1000000 options=user_xattr,errors=remount-ro,,osd=osd-ldiskfs,mgsnode=10.14.81.0@tcp:10.14.81.1@tcp,virgin,update,param=mgsnode=10.14.81.0@tcp:10.14.81.1@tcp,param=failover.node=10.14.81.0@tcp,svname=test-fs-MDT0000,device=/dev/sdd
mount.lustre: increased /sys/block/sdd/queue/max_sectors_kb from 512 to 16384
mount.lustre: mount /dev/sdd at /mnt/test-fs-MDT0000 failed: No such file or directory retries left: 0
mount.lustre: mount /dev/sdd at /mnt/test-fs-MDT0000 failed: No such file or directory
Is the MGS specification correct?
Is the filesystem name correct?
If upgrading, is the copied client log valid? (see upgrade docs)
# echo $?
2
Error messages on the MDS trying to register the target:
Sep 11 19:58:52 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdd): mounted filesystem with ordered data mode. Opts: errors=remount-ro
Sep 11 19:58:52 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdd): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: Lustre: 1771:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1505185132/real 1505185132] req@ff
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 166-1: MGC10.14.81.0@tcp: Connection to MGS (at 10.14.81.0@tcp) was lost; in progress operations using this service will fail
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 13a-8: Failed to get MGS log test-fs-MDT0000 and no local copy.
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 15c-8: MGC10.14.81.0@tcp: The configuration from log 'test-fs-MDT0000' failed (-2). This may be the result of communication errors bet
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 1771:0:(obd_mount_server.c:1373:server_start_targets()) failed to start server test-fs-MDT0000: -2
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 1771:0:(obd_mount_server.c:1866:server_fill_super()) Unable to start targets: -2
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 1771:0:(obd_mount_server.c:1576:server_put_super()) no obd test-fs-MDT0000
Sep 11 19:59:03 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: Lustre: server umount test-fs-MDT0000 complete
Sep 11 19:59:03 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: Lustre: Skipped 1 previous similar message
Sep 11 19:59:03 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 1771:0:(obd_mount.c:1505:lustre_fill_super()) Unable to mount (-2)
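For anyone cross-referencing the return codes: the -2 in the LustreError lines is a negated errno, and errno 2 is ENOENT, which matches the "No such file or directory" text printed by mount.lustre. Below is a minimal Python sketch of that mapping. The helper name decode_kernel_rc and the regular expression are illustrative assumptions, not part of any Lustre tooling, and the sample line is copied from the MDS log above.

```python
import errno
import os
import re


def decode_kernel_rc(rc: int) -> str:
    """Map a negated errno such as -2 to its symbolic name and message.

    Hypothetical helper for reading logs; not a Lustre utility.
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


# Sample line taken from the MDS log in this report.
line = (
    "LustreError: 1771:0:(obd_mount.c:1505:lustre_fill_super()) "
    "Unable to mount (-2)"
)

# Look for a parenthesised negative integer such as "(-2)".
match = re.search(r"\((-\d+)\)", line)
if match:
    print(decode_kernel_rc(int(match.group(1))))
```

The same helper can be applied to the "failed (-2)" message from the MGC, since it carries the same code.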
Error messages on the MGS:
Sep 11 19:58:44 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: errors=remount-ro
Sep 11 19:58:44 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Sep 11 19:58:44 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: Lustre: 30767:0:(osd_handler.c:7007:osd_mount()) MGS-osd: device /dev/sdc was upgraded from Lustre-1.x without enabling the dirdata feature. If you do not want to downgrade to Lustre-1.x again, you can enable it via 'tune2fs -O dirdata device'
Sep 11 19:58:44 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: Lustre: MGS: Connection restored to MGC10.14.81.0@tcp_0 (at 0@lo)
Sep 11 19:59:03 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: Lustre: MGS: Received new LWP connection from 10.14.81.1@tcp, removing former export from same NID
Sep 11 19:59:03 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: Lustre: MGS: Connection restored to 0f2304fc-a4f2-fb0d-fe61-0eb9e38e1b0a (at 10.14.81.1@tcp)
Sep 11 19:59:03 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: Lustre: Skipped 1 previous similar message
A subsequent attempt to mount the target was successful.
I have attached the Lustre debug logs from the MDS and MGS where this occurred.