[LU-9976] mount.lustre: mount /dev/sdd at /mnt/test-fs-MDT0000 failed: No such file or directory Created: 12/Sep/17 Updated: 03/Oct/17 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Brian Murrell (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: | Lustre: Build Version: 2.10.0_71_g6d59523 |
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
While trying to mount a Lustre target for initial registration I got:

# mount --verbose -t lustre /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk3 /mnt/test-fs-MDT0000
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = /dev/sdd
arg[5] = /mnt/test-fs-MDT0000
source = /dev/sdd (/dev/sdd), target = /mnt/test-fs-MDT0000
options = rw
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Writing CONFIGS/mountdata
mounting device /dev/sdd at /mnt/test-fs-MDT0000, flags=0x1000000 options=user_xattr,errors=remount-ro,,osd=osd-ldiskfs,mgsnode=10.14.81.0@tcp:10.14.81.1@tcp,virgin,update,param=mgsnode=10.14.81.0@tcp:10.14.81.1@tcp,param=failover.node=10.14.81.0@tcp,svname=test-fs-MDT0000,device=/dev/sdd
mount.lustre: increased /sys/block/sdd/queue/max_sectors_kb from 512 to 16384
mount.lustre: mount /dev/sdd at /mnt/test-fs-MDT0000 failed: No such file or directory
retries left: 0
mount.lustre: mount /dev/sdd at /mnt/test-fs-MDT0000 failed: No such file or directory
Is the MGS specification correct?
Is the filesystem name correct?
If upgrading, is the copied client log valid? (see upgrade docs)
# echo $?
2

Error messages on the MDS trying to register the target:

Sep 11 19:58:52 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdd): mounted filesystem with ordered data mode. Opts: errors=remount-ro
Sep 11 19:58:52 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdd): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: Lustre: 1771:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1505185132/real 1505185132] req@ff
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 166-1: MGC10.14.81.0@tcp: Connection to MGS (at 10.14.81.0@tcp) was lost; in progress operations using this service will fail
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 13a-8: Failed to get MGS log test-fs-MDT0000 and no local copy.
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 15c-8: MGC10.14.81.0@tcp: The configuration from log 'test-fs-MDT0000' failed (-2). This may be the result of communication errors bet
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 1771:0:(obd_mount_server.c:1373:server_start_targets()) failed to start server test-fs-MDT0000: -2
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 1771:0:(obd_mount_server.c:1866:server_fill_super()) Unable to start targets: -2
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 1771:0:(obd_mount_server.c:1576:server_put_super()) no obd test-fs-MDT0000
Sep 11 19:59:03 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: Lustre: server umount test-fs-MDT0000 complete
Sep 11 19:59:03 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: Lustre: Skipped 1 previous similar message
Sep 11 19:59:03 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 1771:0:(obd_mount.c:1505:lustre_fill_super()) Unable to mount (-2)

Error messages on the MGS:

Sep 11 19:58:44 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: errors=remount-ro
Sep 11 19:58:44 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Sep 11 19:58:44 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: Lustre: 30767:0:(osd_handler.c:7007:osd_mount()) MGS-osd: device /dev/sdc was upgraded from Lustre-1.x without enabling the dirdata feature. If you do not want to downgrade to Lustre-1.x again, you can enable it via 'tune2fs -O dirdata device'
Sep 11 19:58:44 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: Lustre: MGS: Connection restored to MGC10.14.81.0@tcp_0 (at 0@lo)
Sep 11 19:59:03 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: Lustre: MGS: Received new LWP connection from 10.14.81.1@tcp, removing former export from same NID
Sep 11 19:59:03 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: Lustre: MGS: Connection restored to 0f2304fc-a4f2-fb0d-fe61-0eb9e38e1b0a (at 10.14.81.1@tcp)
Sep 11 19:59:03 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: Lustre: Skipped 1 previous similar message

A subsequent attempt to mount the target was successful. I have attached the Lustre debug logs for the MDS and MGS where this occurred. |
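For reference, the (-2) in the server log entries is -ENOENT, the same errno that mount.lustre renders as "No such file or directory"; a quick check of that mapping:

```python
import errno
import os

# Kernel-side Lustre functions return negative errno values; the
# "failed (-2)" in the logs is -ENOENT.
assert errno.ENOENT == 2
print(os.strerror(errno.ENOENT))  # No such file or directory
```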
| Comments |
| Comment by Andreas Dilger [ 13/Sep/17 ] |
|
At a minimum, the message about the missing dirdata feature should be removed for the MGS. That is only relevant for the MDT. |
| Comment by Brian Murrell (Inactive) [ 26/Sep/17 ] |
|
Do we know what the cause of this is and what remediation should be taken when it occurs? Until we know both, we can't determine what IML should do about it, and remediating this is blocking the 4.0 RC. |
| Comment by John Hammond [ 03/Oct/17 ] |
|
In a production setting, IML should at least retry the mount. |
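A minimal sketch of such a retry loop, on the assumption that IML shells out to mount; the `retry_mount` helper, attempt count, and delay are hypothetical illustrations, not IML's actual implementation:

```python
import subprocess
import time

def retry_mount(cmd, attempts=3, delay=5):
    """Run the mount command up to `attempts` times, pausing `delay`
    seconds between failures; return True on the first success."""
    for i in range(1, attempts + 1):
        if subprocess.run(cmd).returncode == 0:
            return True
        if i < attempts:
            time.sleep(delay)
    return False

# Hypothetical usage with the device/mountpoint from this report:
# retry_mount(["mount", "-t", "lustre", "/dev/sdd", "/mnt/test-fs-MDT0000"])
```

Since the reporter's second mount attempt succeeded, even a single retry after a short pause would likely have masked this failure.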