
Type: Bug
Resolution: Unresolved
Priority: Critical
Fix Version/s: None
Affects Version/s: Lustre 2.10.0
Labels: None
Environment:
Lustre: Build Version: 2.10.0_71_g6d59523

Severity: 3

While trying to mount a Lustre target for initial registration, the mount failed with the following output:

# mount --verbose -t lustre /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk3 /mnt/test-fs-MDT0000
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = /dev/sdd
arg[5] = /mnt/test-fs-MDT0000
source = /dev/sdd (/dev/sdd), target = /mnt/test-fs-MDT0000
options = rw
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Writing CONFIGS/mountdata
mounting device /dev/sdd at /mnt/test-fs-MDT0000, flags=0x1000000 options=user_xattr,errors=remount-ro,,osd=osd-ldiskfs,mgsnode=10.14.81.0@tcp:10.14.81.1@tcp,virgin,update,param=mgsnode=10.14.81.0@tcp:10.14.81.1@tcp,param=failover.node=10.14.81.0@tcp,svname=test-fs-MDT0000,device=/dev/sdd

mount.lustre: increased /sys/block/sdd/queue/max_sectors_kb from 512 to 16384
mount.lustre: mount /dev/sdd at /mnt/test-fs-MDT0000 failed: No such file or directory retries left: 0
mount.lustre: mount /dev/sdd at /mnt/test-fs-MDT0000 failed: No such file or directory
Is the MGS specification correct?
Is the filesystem name correct?
If upgrading, is the copied client log valid? (see upgrade docs)
# echo $?
2

Error messages on the MDS that was trying to register the target:

Sep 11 19:58:52 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdd): mounted filesystem with ordered data mode. Opts: errors=remount-ro
Sep 11 19:58:52 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdd): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: Lustre: 1771:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1505185132/real 1505185132]  req@ff
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 166-1: MGC10.14.81.0@tcp: Connection to MGS (at 10.14.81.0@tcp) was lost; in progress operations using this service will fail
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 13a-8: Failed to get MGS log test-fs-MDT0000 and no local copy.
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 15c-8: MGC10.14.81.0@tcp: The configuration from log 'test-fs-MDT0000' failed (-2). This may be the result of communication errors bet
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 1771:0:(obd_mount_server.c:1373:server_start_targets()) failed to start server test-fs-MDT0000: -2
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 1771:0:(obd_mount_server.c:1866:server_fill_super()) Unable to start targets: -2
Sep 11 19:58:59 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 1771:0:(obd_mount_server.c:1576:server_put_super()) no obd test-fs-MDT0000
Sep 11 19:59:03 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: Lustre: server umount test-fs-MDT0000 complete
Sep 11 19:59:03 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: Lustre: Skipped 1 previous similar message
Sep 11 19:59:03 lotus-10vm6.lotus.hpdd.lab.intel.com kernel: LustreError: 1771:0:(obd_mount.c:1505:lustre_fill_super()) Unable to mount  (-2)

Error messages on the MGS:

Sep 11 19:58:44 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: errors=remount-ro
Sep 11 19:58:44 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Sep 11 19:58:44 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: Lustre: 30767:0:(osd_handler.c:7007:osd_mount()) MGS-osd: device /dev/sdc was upgraded from Lustre-1.x without enabling the dirdata feature. If you do not want to downgrade to Lustre-1.x again, you can enable it via 'tune2fs -O dirdata device'
Sep 11 19:58:44 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: Lustre: MGS: Connection restored to MGC10.14.81.0@tcp_0 (at 0@lo)
Sep 11 19:59:03 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: Lustre: MGS: Received new LWP connection from 10.14.81.1@tcp, removing former export from same NID
Sep 11 19:59:03 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: Lustre: MGS: Connection restored to 0f2304fc-a4f2-fb0d-fe61-0eb9e38e1b0a (at 10.14.81.1@tcp)
Sep 11 19:59:03 lotus-10vm5.lotus.hpdd.lab.intel.com kernel: Lustre: Skipped 1 previous similar message
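To compare the MDS and MGS excerpts above on one timeline, a small filter along the following lines may help. It is only a sketch: the function name tag_log and the file names in the usage comment are hypothetical, and it assumes both logs use the syslog prefix format shown above and cover the same day, since it discards the date.

```shell
# tag_log LABEL
# Reads syslog lines on stdin and rewrites
#   "Mon DD HH:MM:SS <host> kernel: <message>"
# as
#   "HH:MM:SS LABEL <message>"
# Lines that do not match the prefix pass through unchanged.
# Assumption: LABEL contains no '/' or other sed metacharacters.
tag_log() {
    label=$1
    sed -E "s/^[A-Z][a-z]{2} +[0-9]+ ([0-9:]{8}) [^ ]+ kernel: /\1 ${label} /"
}

# Hypothetical usage, with each excerpt saved to its own file:
#   { tag_log MDS < mds.log; tag_log MGS < mgs.log; } | sort -k1,1
```

Sorting on the first field interleaves the two hosts by time, which makes it easier to see where the MDS request timeout falls relative to the connection messages on the MGS.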

A subsequent attempt to mount the target was successful.
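Since a second attempt succeeded, a retry wrapper could serve as a stopgap while the underlying problem is investigated. The sketch below is not a fix for this bug and is not part of Lustre; retry_cmd is a hypothetical helper, and the attempt count and delay in the usage comment are arbitrary values.

```shell
# retry_cmd ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or ATTEMPTS runs have been made.
# Returns 0 on success, otherwise the exit status of the last attempt.
retry_cmd() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    rc=1
    while [ "$n" -le "$attempts" ]; do
        rc=0
        "$@" || rc=$?
        if [ "$rc" -eq 0 ]; then
            return 0
        fi
        echo "attempt $n/$attempts failed with status $rc" >&2
        # Only wait if another attempt will follow.
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return "$rc"
}

# Hypothetical usage with the device and mount point from this report:
#   retry_cmd 3 5 mount -t lustre /dev/sdd /mnt/test-fs-MDT0000
```

A wrapper like this would hide the first failure from the caller, so the per-attempt message on stderr is kept to leave a trace that a retry was needed.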

I have attached the Lustre debug logs from the MDS and the MGS where this occurred.


Attachments:
lctl_debug-mds.bz2 (6.85 MB, 12/Sep/17 1:58 PM)
lctl_debug-mgs.bz2 (7.28 MB, 12/Sep/17 1:58 PM)
