[LU-9976] mount.lustre: mount /dev/sdd at /mnt/test-fs-MDT0000 failed: No such file or directory Created: 12/Sep/17  Updated: 03/Oct/17

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Brian Murrell (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Lustre: Build Version: 2.10.0_71_g6d59523


Attachments: lctl_debug-mds.bz2, lctl_debug-mgs.bz2
Severity: 3

Description

While trying to mount a Lustre target for initial registration, I got:

# mount --verbose -t lustre /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk3 /mnt/test-fs-MDT0000
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = /dev/sdd
arg[5] = /mnt/test-fs-MDT0000
source = /dev/sdd (/dev/sdd), target = /mnt/test-fs-MDT0000
options = rw
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Writing CONFIGS/mountdata
mounting device /dev/sdd at /mnt/test-fs-MDT0000, flags=0x1000000 options=user_xattr,errors=remount-ro,,osd=osd-ldiskfs,mgsnode=10.14.81.0@tcp:10.14.81.1@tcp,virgin,update,param=mgsnode=10.14.81.0@tcp:10.14.81.1@tcp,param=failover.node=10.14.81.0@tcp,svname=test-fs-MDT0000,device=/dev/sdd

mount.lustre: increased /sys/block/sdd/queue/max_sectors_kb from 512 to 16384
mount.lustre: mount /dev/sdd at /mnt/test-fs-MDT0000 failed: No such file or directory retries left: 0
mount.lustre: mount /dev/sdd at /mnt/test-fs-MDT0000 failed: No such file or directory
Is the MGS specification correct?
Is the filesystem name correct?
If upgrading, is the copied client log valid? (see upgrade docs)
# echo $?
2
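
One way to act on the "Is the MGS specification correct?" hint above is to verify basic LNet connectivity to the MGS before retrying; this is relevant here because the MDS log below shows the MGC connection to the MGS being lost. A minimal check from the MDS, using the MGS NIDs taken from the mgsnode= mount options above (lctl ping prints the peer's NIDs on success and fails if the peer is unreachable):

# Check that each MGS NID listed in mgsnode= is reachable over LNet
lctl ping 10.14.81.0@tcp
lctl ping 10.14.81.1@tcp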

Error messages on the MDS while trying to register the target (the -2 status in these messages is -ENOENT, matching the "No such file or directory" mount failure above):

Sep 11 19:58:52 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdd): mounted filesystem with ordered data mode. Opts: errors=remount-ro
Sep 11 19:58:52 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdd): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: Lustre: 1771:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1505185132/real 1505185132]  req@ff
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 166-1: MGC10.14.81.0@tcp: Connection to MGS (at 10.14.81.0@tcp) was lost; in progress operations using this service will fail
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 13a-8: Failed to get MGS log test-fs-MDT0000 and no local copy.
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 15c-8: MGC10.14.81.0@tcp: The configuration from log 'test-fs-MDT0000' failed (-2). This may be the result of communication errors bet
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 1771:0:(obd_mount_server.c:1373:server_start_targets()) failed to start server test-fs-MDT0000: -2
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 1771:0:(obd_mount_server.c:1866:server_fill_super()) Unable to start targets: -2
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 1771:0:(obd_mount_server.c:1576:server_put_super()) no obd test-fs-MDT0000
Sep 11 19:59:03 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: Lustre: server umount test-fs-MDT0000 complete
Sep 11 19:59:03 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: Lustre: Skipped 1 previous similar message
Sep 11 19:59:03 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 1771:0:(obd_mount.c:1505:lustre_fill_super()) Unable to mount  (-2)

Error messages on the MGS:

Sep 11 19:58:44 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: errors=remount-ro
Sep 11 19:58:44 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Sep 11 19:58:44 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: Lustre: 30767:0:(osd_handler.c:7007:osd_mount()) MGS-osd: device /dev/sdc was upgraded from Lustre-1.x without enabling the dirdata feature. If you do not want to downgrade to Lustre-1.x again, you can enable it via 'tune2fs -O dirdata device'
Sep 11 19:58:44 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: Lustre: MGS: Connection restored to MGC10.14.81.0@tcp_0 (at 0@lo)
Sep 11 19:59:03 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: Lustre: MGS: Received new LWP connection from 10.14.81.1@tcp, removing former export from same NID
Sep 11 19:59:03 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: Lustre: MGS: Connection restored to 0f2304fc-a4f2-fb0d-fe61-0eb9e38e1b0a (at 10.14.81.1@tcp)
Sep 11 19:59:03 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: Lustre: Skipped 1 previous similar message

A subsequent attempt to mount the target was successful.

I have attached the Lustre debug logs for the MDS and MGS where this occurred.



Comments
Comment by Andreas Dilger [ 13/Sep/17 ]

At a minimum, the message about the missing dirdata feature should be removed for the MGS. That is only relevant for the MDT.
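
For reference, a quick way to check whether dirdata is already set on an ldiskfs target is to list the filesystem features with tune2fs from e2fsprogs (the device path is taken from the MGS log above and is illustrative only):

# "dirdata" appears in the feature list once it has been enabled,
# e.g. via 'tune2fs -O dirdata <device>' as the console message suggests
tune2fs -l /dev/sdc | grep -i 'features'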

Comment by Brian Murrell (Inactive) [ 26/Sep/17 ]

Do we know what causes this and what remediation should be taken when it occurs?

Until we know these, we can't determine what IML should do about it, and remediating this is blocking the 4.0 RC.

Comment by John Hammond [ 03/Oct/17 ]

In a production setting, IML should at least retry the mount.
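
As an illustration of that suggestion (a sketch, not IML code), a minimal retry wrapper around the mount from this report; note the "retries left: 0" line in the mount output above, which suggests mount.lustre's built-in retry count (settable with the retry= mount option) defaults to zero:

#!/bin/sh
# Retry a transient mount failure a few times with a short backoff,
# mirroring the observation that a subsequent attempt succeeded.
DEV=/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk3
MNT=/mnt/test-fs-MDT0000

for attempt in 1 2 3 4 5; do
    mount -t lustre "$DEV" "$MNT" && exit 0
    echo "mount attempt $attempt failed, retrying in 5 seconds" >&2
    sleep 5
done
exit 1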
