Lustre / LU-4311

Mount sometimes fails with EIO on OSS with several mounts in parallel


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: Lustre 2.4.1
    • Severity: 3

    Description

      On one of our test clusters, installed with Lustre 2.4.1, we sometimes saw the following error message in the output of the "shine" command-line tool when starting a Lustre file system; when it occurs, the corresponding OST is not mounted:

      mount.lustre: mount /dev/mapper/mpathj at /mnt/fs1/ost/6 failed: Input/output error
      Is the MGS running?
      
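      The MGS is in fact running when this happens; as a quick sanity check, LNET connectivity from the OSS to the MGS can be verified with lctl ping, using the MGS NIDs that appear in the kernel traces below:

      lctl ping 10.3.0.10@o2ib
      lctl ping 10.4.0.10@o2ib1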

      The test file system is composed of six servers: one MDS (with one MDT), four OSS (three with two OSTs each, one with a single OST), and a separate MGS.
      Configuration (see attached config_parameters file for details):
      MGS: lama5 (failover lama6)
      MDS: lama6 (failover lama5)
      OSS: lama7 (failover lama8, lama9 and lama10) to lama10 (failover lama7, lama8 and lama9)

      When the error occurs, we see the following Lustre kernel traces on the MGS (-114 is -EALREADY):

      MGS: Client <client_name> seen on new nid <nid2> when existing nid <nid1> is already connected
      ...
      @@@ MGS fail to handle opc = 250: rc = -114
      ...
      

      and on the OSS (-5 is -EIO, which mount reports as "Input/output error"):

      InitRecov MGC10.3.0.10@o2ib 1/d0:i0:r0:or0:NEW
      ...
      InitRecov MGC10.3.0.10@o2ib 1/d0:i0:r1:or0:CONNECTING
      ...
      recovery of MGS on MGC10.3.0.10@o2ib_0 failed (-5)
      ...
      MGS: recovery started, waiting 100000 seconds
      ...
      MGC10.3.0.10@o2ib: Communicating with 10.4.0.10@o2ib1, operation mgs_connect failed with -114
      ...
      recovery of MGS on MGC10.3.0.10@o2ib_0 failed (-114)
      MGS: recovery finished
      ...
      fs1-OST0005: cannot register this server with the MGS: rc = -5. Is the MGS running?
      ...
      Unable to start targets: -5
      ...
      Unable to mount  (-5)
      

      I was able to reproduce the error without shine, and with only one OSS, using the script below.
      The MGS (lama5) and MDS (lama6) are started/mounted, and the script is run on lama10.
      If the tunefs.lustre calls or the lustre_rmmod are removed, or if the first mount is run in the foreground (see the serialized variant after the script), the error does not occur.

      N=1
      rm -f error stop
      while true; do
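        # Rewrite both targets' configurations with --writeconf, so that each
        # iteration makes the OSTs re-register with the MGS from scratch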
              tunefs.lustre --erase-params --quiet "--mgsnode=lama5-ic1@o2ib0,lama5-ic2@o2ib1" \
                   "--mgsnode=lama6-ic1@o2ib0,lama6-ic2@o2ib1" "--failnode=lama7-ic1@o2ib0" \
                   "--failnode=lama8-ic1@o2ib0" "--failnode=lama9-ic1@o2ib0" \
                    --network=o2ib0 --writeconf /dev/ldn.cook.ost3 > /dev/null
      
              tunefs.lustre --erase-params --quiet "--mgsnode=lama5-ic1@o2ib0,lama5-ic2@o2ib1" \
                   "--mgsnode=lama6-ic1@o2ib0,lama6-ic2@o2ib1" "--failnode=lama7-ic2@o2ib1" \
                   "--failnode=lama8-ic2@o2ib1" "--failnode=lama9-ic2@o2ib1" \
                   --network=o2ib1 --writeconf /dev/ldn.cook.ost6 > /dev/null
      
              modprobe fsfilt_ldiskfs
              modprobe lustre
              ssh lama5 lctl clear
              dmesg -c > /dev/null
              ssh lama5 dmesg -c > /dev/null
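        # Launch the two mounts in parallel (first one in the background):
        # both registrations then race on the node's single shared MGC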
              (/bin/mount -t lustre -o errors=panic /dev/ldn.cook.ost3 /mnt/fs1/ost/5 || touch error) &
              /bin/mount -t lustre -o errors=panic /dev/ldn.cook.ost6 /mnt/fs1/ost/6 || touch error
              wait
              if [ -f error ]; then
                      lctl dk > oss.lustre.dk.bad
                      ssh lama5 lctl dk > mgs.lustre.dk.bad
                      dmesg > oss.dmesg.bad
                      ssh lama5 dmesg > mgs.dmesg.bad
              else
                      lctl dk > oss.lustre.dk.good
                      ssh lama5 lctl dk > mgs.lustre.dk.good
                      dmesg > oss.dmesg.good
                      ssh lama5 dmesg > mgs.dmesg.good
              fi
              umount /mnt/fs1/ost/5
              umount /mnt/fs1/ost/6
              lustre_rmmod
              [ -f stop -o -f error ] && break
              [ $N -ge 25 ] && break
              echo "============================> loop $N"
              N=$((N+1))
      done
      
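      As noted above, the failure does not occur when the first mount is run in the foreground. The serialized variant of the mount step looks like this (a sketch: only these two lines change, the rest of the loop is identical); the two OSTs then never register with the MGS concurrently:

      /bin/mount -t lustre -o errors=panic /dev/ldn.cook.ost3 /mnt/fs1/ost/5 || touch error
      /bin/mount -t lustre -o errors=panic /dev/ldn.cook.ost6 /mnt/fs1/ost/6 || touch error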

      I have attached a tarball containing the config parameters, the reproducer, and the files produced by the reproducer:
      reproducer
      config_parameters
      mgs.dmesg.good, mgs.lustre.dk.good, oss.dmesg.good, oss.lustre.dk.good
      mgs.dmesg.bad, mgs.lustre.dk.bad, oss.dmesg.bad, oss.lustre.dk.bad
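      The double connection can be spotted in the bad-case logs by searching for the MGS messages quoted above, for example:

      grep -E 'seen on new nid|opc = 250' mgs.dmesg.bad mgs.lustre.dk.bad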

      I have tried the following patch, which skips the forced reconnection at INIT_RECOV_BACKUP time when the import is already in the CONNECTING state, i.e. when a connection attempt is already in progress.
      With this patch the mount no longer fails, but it is only a workaround: it does not solve the underlying problem of the double connection on the MGS. Some serialisation/synchronisation is probably missing.

      --- a/lustre/mgc/mgc_request.c
      +++ b/lustre/mgc/mgc_request.c
      @@ -1029,6 +1029,7 @@ int mgc_set_info_async(const struct lu_e
                              ptlrpc_import_state_name(imp->imp_state));
                       /* Resurrect if we previously died */
                       if ((imp->imp_state != LUSTRE_IMP_FULL &&
      +                     imp->imp_state != LUSTRE_IMP_CONNECTING &&
                            imp->imp_state != LUSTRE_IMP_NEW) || value > 1)
                               ptlrpc_reconnect_import(imp);
                       RETURN(0);
      


People

    Assignee: Bruno Faccini (Inactive)
    Reporter: Patrick Valentin (Inactive)
    Votes: 0
    Watchers: 4
