Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
- Affects Version: Lustre 2.5.3
- Environment: Lustre 2.5.3.90 w/ Bull patches, including LU-5690
Description
We are unable to mount the targets on Lustre servers when using a multirail configuration on the MGS.
LU-4334 introduced a format change of the mgsnode value on the targets.
Old format:
mgsnode=192.168.101.41@tcp,192.168.102.41@tcp1 mgsnode=192.168.101.42@tcp,192.168.102.42@tcp1
New format:
mgsnode=192.168.101.41@tcp,192.168.102.41@tcp1:192.168.101.42@tcp,192.168.102.42@tcp1
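For clarity, in the new format ':' separates the MGS node specifications that used to be separate mgsnode= parameters, while ',' still separates the NIDs of a single node (one per LNET network). A minimal userspace sketch of reading such a value, illustrative only and not Lustre's actual parser, could look like this:

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

int main(void)
{
        /* New-format value taken from the example above. */
        char value[] = "192.168.101.41@tcp,192.168.102.41@tcp1:"
                       "192.168.101.42@tcp,192.168.102.42@tcp1";
        char *save_node, *save_nid;
        int node_idx = 0;

        /* ':' separates MGS nodes ... */
        for (char *node = strtok_r(value, ":", &save_node); node != NULL;
             node = strtok_r(NULL, ":", &save_node)) {
                printf("MGS node %d:\n", node_idx++);
                /* ... ',' separates the NIDs of one node. */
                for (char *nid = strtok_r(node, ",", &save_nid); nid != NULL;
                     nid = strtok_r(NULL, ",", &save_nid))
                        printf("  NID %s\n", nid);
        }
        return 0;
}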
With patch LU-5690, we are now unable to start any target with this new format. We can see this Lustre error in the console of the OSS:
LDISKFS-fs (vdb): Unrecognized mount option "192.168.102.41@tcp1:192.168.101.42@tcp" or missing value
The debug log reports the following message while trying to mount OST 0:
00000020:01200004:0.0F:1466084531.621867:0:2966:0:(obd_mount.c:1339:lustre_fill_super()) VFS Op: sb ffff88001f583c00
00000020:01000004:0.0:1466084531.621882:0:2966:0:(obd_mount.c:830:lmd_print()) mount data:
00000020:01000004:0.0:1466084531.621883:0:2966:0:(obd_mount.c:833:lmd_print()) device: /dev/vdb
00000020:01000004:0.0:1466084531.621884:0:2966:0:(obd_mount.c:834:lmd_print()) flags: 0
00000020:01000004:0.0:1466084531.621884:0:2966:0:(obd_mount.c:837:lmd_print()) options: errors=remount-ro,192.168.102.41@tcp1:192.168.101.42@tcp,192.168.102.42@tcp1
00000020:01000004:0.0:1466084531.621885:0:2966:0:(obd_mount.c:1386:lustre_fill_super()) Mounting server from /dev/vdb
00000020:01000004:0.0:1466084531.621887:0:2966:0:(obd_mount_server.c:1627:osd_start()) Attempting to start scratch-OST0000, type=osd-ldiskfs, lsifl=200002, mountfl=0
00000020:01000004:0.0:1466084531.621925:0:2966:0:(obd_mount.c:191:lustre_start_simple()) Starting obd scratch-OST0000-osd (typ=osd-ldiskfs)
00000004:00020000:0.0:1466084531.623545:0:2966:0:(osd_handler.c:5613:osd_mount()) scratch-OST0000-osd: can't mount /dev/vdb: -22
00000020:00020000:0.0:1466084531.624487:0:2966:0:(obd_config.c:572:class_setup()) setup scratch-OST0000-osd failed (-22)
00000020:00020000:0.0:1466084531.625290:0:2966:0:(obd_mount.c:200:lustre_start_simple()) scratch-OST0000-osd setup error -22
00000020:01000000:0.0:1466084531.626153:0:2966:0:(obd_config.c:750:class_decref()) finishing cleanup of obd scratch-OST0000-osd (scratch-OST0000-osd_UUID)
00000020:00020000:0.0:1466084531.626156:0:2966:0:(obd_mount_server.c:1701:server_fill_super()) Unable to start osd on /dev/vdb: -22
00000020:01000004:0.0:1466084531.627005:0:2966:0:(obd_mount.c:653:lustre_put_lsi()) put ffff88001f583c00 1
00000020:01000004:0.0:1466084531.627007:0:2966:0:(obd_mount.c:603:lustre_free_lsi()) Freeing lsi ffff880017c67000
00000020:00020000:0.0:1466084531.627009:0:2966:0:(obd_mount.c:1405:lustre_fill_super()) Unable to mount (-22)
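The -22 (-EINVAL) appears to line up with the "options:" line in the log: the trailing NIDs of the new-format mgsnode value end up in the option string handed down to ldiskfs, and ldiskfs, like other kernel filesystems, tokenizes its mount options on ','. A small userspace sketch (plain C, not the kernel's parser) of that comma splitting produces exactly the token rejected in the console message:

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

int main(void)
{
        /* Option string copied from the lmd_print() line above. */
        char options[] = "errors=remount-ro,"
                         "192.168.102.41@tcp1:192.168.101.42@tcp,"
                         "192.168.102.42@tcp1";
        char *save;

        /* Comma-based tokenization, as a mount-option parser would do. */
        for (char *opt = strtok_r(options, ",", &save); opt != NULL;
             opt = strtok_r(NULL, ",", &save))
                printf("mount option token: \"%s\"\n", opt);
        /*
         * One of the tokens is "192.168.102.41@tcp1:192.168.101.42@tcp",
         * which is not a valid ldiskfs option -> "Unrecognized mount
         * option", matching the -22 failure reported by osd_mount().
         */
        return 0;
}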
This is easily reproducible with Lustre 2.5.3.90+LU-5690.
- tunefs.lustre --erase-params --mgsnode=192.168.101.41@tcp,192.168.102.41@tcp1 --mgsnode=192.168.101.42@tcp,192.168.102.42@tcp1 /dev/vdb
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Read previous values:
Target: scratch-OST0000
Index: 0
Lustre FS: scratch
Mount type: ldiskfs
Flags: 0x42
(OST update )
Persistent mount opts: errors=remount-ro
Parameters: mgsnode=192.168.101.41@tcp,192.168.102.41@tcp1 mgsnode=192.168.101.42@tcp,192.168.102.42@tcp1
Permanent disk data:
Target: scratch-OST0000
Index: 0
Lustre FS: scratch
Mount type: ldiskfs
Flags: 0x42
(OST update )
Persistent mount opts: errors=remount-ro
Parameters: mgsnode=192.168.101.41@tcp,192.168.102.41@tcp1:192.168.101.42@tcp,192.168.102.42@tcp1
Writing CONFIGS/mountdata
- mount -t lustre /dev/vdb /mnt/fs/scratch/ost0
mount.lustre: set /sys/block/vdb/queue/max_sectors_kb to 2147483647
mount.lustre: mount /dev/vdb at /mnt/fs/scratch/ost0 failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.
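For reference, the two Parameters lines in the tunefs.lustre output above show the rewrite introduced by LU-4334: the two --mgsnode values are merged into a single value whose node groups are joined by ':'. A trivial sketch of that joining, illustrative only and not tunefs.lustre's implementation:

#include <stdio.h>

int main(void)
{
        /* The two --mgsnode arguments from the reproduction step above. */
        const char *mgsnode1 = "192.168.101.41@tcp,192.168.102.41@tcp1";
        const char *mgsnode2 = "192.168.101.42@tcp,192.168.102.42@tcp1";
        char merged[256];

        /* Join the node groups with ':' into one mgsnode= parameter. */
        snprintf(merged, sizeof(merged), "mgsnode=%s:%s", mgsnode1, mgsnode2);
        printf("%s\n", merged); /* matches the rewritten Parameters line */
        return 0;
}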
We are running into something very similar to this - not sure if it's related or something different. There is a lot of detail in a thread on the mailing list; here is a link to one of the latest posts.
http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2017-January/014154.html
The summary of our situation is that our LFS was originally formatted using 2.8, but we have since upgraded to 2.9.51. We are using a JBOD with server pairs for failover and ZFS as the backend. All servers are dual-homed on both Ethernet and IB. MDT and OST failover works fine. MGS failover doesn't work if we have both Ethernet and IB NIDs, but does if we only have Ethernet NIDs. We have built our own Lustre server RPMs using a "git checkout 2.9.51" and zfs 0.6.5.8-1. I've verified that commit 2458067d8d55173ad68caac8c0460d46bf8106a1 is in the git log. Any help would be much appreciated.