Details
- Type: Bug
- Resolution: Duplicate
- Priority: Major
- Affects Version: Lustre 2.4.1
- Severity: 3
- Rank: 11805
Description
On one of our test clusters, installed with Lustre 2.4.1, we sometimes see the following error message in the output of the "shine" command-line tool when starting a Lustre file system; the corresponding OST is then not mounted:
mount.lustre: mount /dev/mapper/mpathj at /mnt/fs1/ost/6 failed: Input/output error
Is the MGS running?
The test file system is composed of six servers: one MDS (with one MDT), four OSSs (three with two OSTs each and one with a single OST), and a separate MGS.
Configuration (see attached config_parameters file for details):
MGS: lama5 (failover lama6)
MDS: lama6 (failover lama5)
OSS: lama7 (failover lama8, lama9 and lama10) to lama10 (failover lama7, lama8 and lama9)
When the error occurs, we have the following lustre kernel traces on MGS:
MGS: Client <client_name> seen on new nid <nid2> when existing nid <nid1> is already connected ...
@@@ MGS fail to handle opc = 250: rc = -114 ...
and on OSS:
InitRecov MGC10.3.0.10@o2ib 1/d0:i0:r0:or0:NEW ...
InitRecov MGC10.3.0.10@o2ib 1/d0:i0:r1:or0:CONNECTING ...
recovery of MGS on MGC10.3.0.10@o2ib_0 failed (-5) ...
MGS: recovery started, waiting 100000 seconds ...
MGC10.3.0.10@o2ib: Communicating with 10.4.0.10@o2ib1, operation mgs_connect failed with -114 ...
recovery of MGS on MGC10.3.0.10@o2ib_0 failed (-114)
MGS: recovery finished ...
fs1-OST0005: cannot register this server with the MGS: rc = -5. Is the MGS running? ...
Unable to start targets: -5 ...
Unable to mount (-5)
I was able to reproduce the error without shine, with only one OSS, using the script below.
The MGS (lama5) and MDS (lama6) are started/mounted, and the script is run on lama10.
If the tunefs.lustre or lustre_rmmod invocations are removed, or the first mount is run in the foreground, the error does not occur.
N=1
rm -f error stop
while true; do
    tunefs.lustre --erase-params --quiet "--mgsnode=lama5-ic1@o2ib0,lama5-ic2@o2ib1" \
        "--mgsnode=lama6-ic1@o2ib0,lama6-ic2@o2ib1" "--failnode=lama7-ic1@o2ib0" \
        "--failnode=lama8-ic1@o2ib0" "--failnode=lama9-ic1@o2ib0" \
        --network=o2ib0 --writeconf /dev/ldn.cook.ost3 > /dev/null
    tunefs.lustre --erase-params --quiet "--mgsnode=lama5-ic1@o2ib0,lama5-ic2@o2ib1" \
        "--mgsnode=lama6-ic1@o2ib0,lama6-ic2@o2ib1" "--failnode=lama7-ic2@o2ib1" \
        "--failnode=lama8-ic2@o2ib1" "--failnode=lama9-ic2@o2ib1" \
        --network=o2ib1 --writeconf /dev/ldn.cook.ost6 > /dev/null
    modprobe fsfilt_ldiskfs
    modprobe lustre
    ssh lama5 lctl clear
    dmesg -c > /dev/null
    ssh lama5 dmesg -c > /dev/null
    (/bin/mount -t lustre -o errors=panic /dev/ldn.cook.ost3 /mnt/fs1/ost/5 || touch error) &
    /bin/mount -t lustre -o errors=panic /dev/ldn.cook.ost6 /mnt/fs1/ost/6 || touch error
    wait
    if [ -f error ]; then
        lctl dk > oss.lustre.dk.bad
        ssh lama5 lctl dk > mgs.lustre.dk.bad
        dmesg > oss.dmesg.bad
        ssh lama5 dmesg > mgs.dmesg.bad
    else
        lctl dk > oss.lustre.dk.good
        ssh lama5 lctl dk > mgs.lustre.dk.good
        dmesg > oss.dmesg.good
        ssh lama5 dmesg > mgs.dmesg.good
    fi
    umount /mnt/fs1/ost/5
    umount /mnt/fs1/ost/6
    lustre_rmmod
    [ -f stop -o -f error ] && break
    [ $N -ge 25 ] && break
    echo "============================> loop $N"
    N=$((N+1))
done
I have attached a tarball containing the config parameters, the reproducer, and the files produced by the reproducer:
reproducer
config_parameters
mgs.dmesg.good, mgs.lustre.dk.good, oss.dmesg.good, oss.lustre.dk.good
mgs.dmesg.bad, mgs.lustre.dk.bad, oss.dmesg.bad, oss.lustre.dk.bad
I have tried the following patch, which skips the connection at INIT_RECOV_BACKUP if one is already in progress.
With this patch the "mount" no longer fails, but it is only a workaround: it does not solve the underlying problem of the double connection on the MGS. Some serialisation/synchronisation is probably missing.
--- a/lustre/mgc/mgc_request.c
+++ b/lustre/mgc/mgc_request.c
@@ -1029,6 +1029,7 @@ int mgc_set_info_async(const struct lu_e
ptlrpc_import_state_name(imp->imp_state));
/* Resurrect if we previously died */
if ((imp->imp_state != LUSTRE_IMP_FULL &&
+ imp->imp_state != LUSTRE_IMP_CONNECTING &&
imp->imp_state != LUSTRE_IMP_NEW) || value > 1)
ptlrpc_reconnect_import(imp);
RETURN(0);
Issue Links
- is duplicated by LU-1279: failure trying to mount two targets at the same time after boot (Resolved)