Details
- Type: Bug
- Resolution: Duplicate
- Priority: Major
- Affects Version: Lustre 2.4.1
- Severity: 3
- Rank: 11805
Description
On one of our test clusters, installed with Lustre 2.4.1, we sometimes see the following error message in the output of the "shine" command-line tool when starting a Lustre file system; the corresponding OST is then not mounted:
mount.lustre: mount /dev/mapper/mpathj at /mnt/fs1/ost/6 failed: Input/output error
Is the MGS running?
The test file system is composed of six servers: one MDS (with one MDT), four OSSs (three with two OSTs each and one with a single OST), and a separate MGS.
Configuration (see attached config_parameters file for details):
MGS: lama5 (failover lama6)
MDS: lama6 (failover lama5)
OSS: lama7 (failover lama8, lama9 and lama10) to lama10 (failover lama7, lama8 and lama9)
When the error occurs, we have the following lustre kernel traces on MGS:
MGS: Client <client_name> seen on new nid <nid2> when existing nid <nid1> is already connected ...
@@@ MGS fail to handle opc = 250: rc = -114 ...
and on OSS:
InitRecov MGC10.3.0.10@o2ib 1/d0:i0:r0:or0:NEW ...
InitRecov MGC10.3.0.10@o2ib 1/d0:i0:r1:or0:CONNECTING ...
recovery of MGS on MGC10.3.0.10@o2ib_0 failed (-5) ...
MGS: recovery started, waiting 100000 seconds ...
MGC10.3.0.10@o2ib: Communicating with 10.4.0.10@o2ib1, operation mgs_connect failed with -114 ...
recovery of MGS on MGC10.3.0.10@o2ib_0 failed (-114)
MGS: recovery finished ...
fs1-OST0005: cannot register this server with the MGS: rc = -5. Is the MGS running? ...
Unable to start targets: -5 ...
Unable to mount (-5)
I was able to reproduce the error without shine, with only one OSS, using the script below.
The MGS (lama5) and MDS (lama6) are started/mounted, and the script is run on lama10.
If the tunefs.lustre or lustre_rmmod invocations are removed, or the first mount is run in the foreground, the error does not occur.
N=1
rm -f error stop
while true; do
    tunefs.lustre --erase-params --quiet "--mgsnode=lama5-ic1@o2ib0,lama5-ic2@o2ib1" \
        "--mgsnode=lama6-ic1@o2ib0,lama6-ic2@o2ib1" "--failnode=lama7-ic1@o2ib0" \
        "--failnode=lama8-ic1@o2ib0" "--failnode=lama9-ic1@o2ib0" \
        --network=o2ib0 --writeconf /dev/ldn.cook.ost3 > /dev/null
    tunefs.lustre --erase-params --quiet "--mgsnode=lama5-ic1@o2ib0,lama5-ic2@o2ib1" \
        "--mgsnode=lama6-ic1@o2ib0,lama6-ic2@o2ib1" "--failnode=lama7-ic2@o2ib1" \
        "--failnode=lama8-ic2@o2ib1" "--failnode=lama9-ic2@o2ib1" \
        --network=o2ib1 --writeconf /dev/ldn.cook.ost6 > /dev/null
    modprobe fsfilt_ldiskfs
    modprobe lustre
    ssh lama5 lctl clear
    dmesg -c > /dev/null
    ssh lama5 dmesg -c > /dev/null
    (/bin/mount -t lustre -o errors=panic /dev/ldn.cook.ost3 /mnt/fs1/ost/5 || touch error) &
    /bin/mount -t lustre -o errors=panic /dev/ldn.cook.ost6 /mnt/fs1/ost/6 || touch error
    wait
    if [ -f error ]; then
        lctl dk > oss.lustre.dk.bad
        ssh lama5 lctl dk > mgs.lustre.dk.bad
        dmesg > oss.dmesg.bad
        ssh lama5 dmesg > mgs.dmesg.bad
    else
        lctl dk > oss.lustre.dk.good
        ssh lama5 lctl dk > mgs.lustre.dk.good
        dmesg > oss.dmesg.good
        ssh lama5 dmesg > mgs.dmesg.good
    fi
    umount /mnt/fs1/ost/5
    umount /mnt/fs1/ost/6
    lustre_rmmod
    [ -f stop -o -f error ] && break
    [ $N -ge 25 ] && break
    echo "============================> loop $N"
    N=$((N+1))
done
I have attached a tarball containing the config parameters, the reproducer, and the files produced by the reproducer:
reproducer
config_parameters
mgs.dmesg.good, mgs.lustre.dk.good, oss.dmesg.good, oss.lustre.dk.good
mgs.dmesg.bad, mgs.lustre.dk.bad, oss.dmesg.bad, oss.lustre.dk.bad
I have tried the following patch, which skips the connection at INIT_RECOV_BACKUP if one is already in progress.
With this patch the "mount" no longer fails, but it is only a workaround: it does not solve the underlying problem of the double connection on the MGS. Some serialisation/synchronisation is probably missing.
--- a/lustre/mgc/mgc_request.c
+++ b/lustre/mgc/mgc_request.c
@@ -1029,6 +1029,7 @@ int mgc_set_info_async(const struct lu_e
ptlrpc_import_state_name(imp->imp_state));
/* Resurrect if we previously died */
if ((imp->imp_state != LUSTRE_IMP_FULL &&
+ imp->imp_state != LUSTRE_IMP_CONNECTING &&
imp->imp_state != LUSTRE_IMP_NEW) || value > 1)
ptlrpc_reconnect_import(imp);
RETURN(0);
Issue Links
- is duplicated by LU-1279: failure trying to mount two targets at the same time after boot (Resolved)