[LU-4475] mount command errors: "Communicating with 0@lo, operation mds_connect failed with -11" AND "Transport endpoint is not connected" Created: 11/Jan/14  Updated: 09/Jun/16

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.2
Fix Version/s: None

Type: Story Priority: Minor
Reporter: Mark Duffield Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: llnl
Environment:
  1. uname -a
    Linux es0 2.6.32-358.23.2.el6_lustre.x86_64 #1 SMP Thu Dec 19 19:57:45 PST 2013 x86_64 x86_64 x86_64 GNU/Linux
  2. rpm -qa | egrep "lustre|e2fs" | sort
    e2fsprogs-1.42.7.wc2-7.el6.x86_64
    e2fsprogs-libs-1.42.7.wc2-7.el6.x86_64
    kernel-2.6.32-358.23.2.el6_lustre.x86_64
    kernel-firmware-2.6.32-358.23.2.el6_lustre.x86_64
    lustre-2.4.2-2.6.32_358.23.2.el6_lustre.x86_64.x86_64
    lustre-ldiskfs-4.1.0-2.6.32_358.23.2.el6_lustre.x86_64.x86_64
    lustre-modules-2.4.2-2.6.32_358.23.2.el6_lustre.x86_64.x86_64
    lustre-osd-ldiskfs-2.4.2-2.6.32_358.23.2.el6_lustre.x86_64.x86_64

Issue Links:
Related
Rank (Obsolete): 12257

 Description   

I created the mgs/mdt:

mkfs.lustre --fsname=lfs1 --mgs --mdt --index=0 /dev/vg_root/es0-00

and the ost, on another node:

mkfs.lustre --fsname=lfs1 --mgsnode=172.18.54.21@tcp0 --ost --index=0 /dev/vg_root/es2-00

When mounting either target I receive a communication error.

When mounting the ost I see "Transport endpoint is not connected":

# mount -vvv -t lustre /dev/dm-3 /mnt/ost0
mount: fstab path: "/etc/fstab"
mount: mtab path:  "/etc/mtab"
mount: lock path:  "/etc/mtab~"
mount: temp path:  "/etc/mtab.tmp"
mount: UID:        0
mount: eUID:       0
mount: spec:  "/dev/mapper/vg_root-es2--00"
mount: node:  "/mnt/ost0"
mount: types: "lustre"
mount: opts:  "(null)"
final mount options: '(null)'
mount: external mount: argv[0] = "/sbin/mount.lustre"
mount: external mount: argv[1] = "/dev/mapper/vg_root-es2--00"
mount: external mount: argv[2] = "/mnt/ost0"
mount: external mount: argv[3] = "-v"
mount: external mount: argv[4] = "-o"
mount: external mount: argv[5] = "rw"
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = /dev/mapper/vg_root-es2--00
arg[5] = /mnt/ost0
source = /dev/mapper/vg_root-es2--00 (/dev/mapper/vg_root-es2--00), target = /mnt/ost0
options = rw
checking for existing Lustre data: found
Reading CONFIGS/mountdata
mounting device /dev/mapper/vg_root-es2--00 at /mnt/ost0, flags=0x1000000 options=osd=osd-ldiskfs,errors=remount-ro,mgsnode=172.18.54.21@tcp,virgin,param=mgsnode=172.18.54.21@tcp,svname=lfs1-OST0000,device=/dev/mapper/vg_root-es2--00
mount.lustre: mount /dev/mapper/vg_root-es2--00 at /mnt/ost0 failed: Transport endpoint is not connected retries left: 0
mount.lustre: mount /dev/mapper/vg_root-es2--00 at /mnt/ost0 failed: Transport endpoint is not connected

And when mounting the mgs/mdt I see "Communicating with 0@lo, operation mds_connect failed with -11":

Jan 11 11:14:30 es0 kernel: LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. quota=on. Opts: 
Jan 11 11:14:30 es0 kernel: Lustre: lfs1-MDT0000: used disk, loading
Jan 11 11:14:30 es0 kernel: LustreError: 11-0: lfs1-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
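For reference, both failure strings map to standard Linux errno values: the -11 in "mds_connect failed with -11" is EAGAIN ("Resource temporarily unavailable", typically a transient condition while a target is still starting up), and "Transport endpoint is not connected" is the strerror text for ENOTCONN. A quick way to confirm the mapping with Python's stdlib (values are as defined on Linux; other platforms may differ):

```python
import errno
import os

# Map the raw values from the kernel logs back to symbolic errno names.
print(errno.errorcode[11])          # EAGAIN
print(os.strerror(errno.EAGAIN))    # Resource temporarily unavailable
print(os.strerror(errno.ENOTCONN))  # Transport endpoint is not connected
```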

The communication looks fine between nodes:

From es0:

[root@es0 log]# lctl
lctl > ping es2
12345-0@lo
12345-172.18.54.23@tcp

From es2:

[root@es2 log]# lctl
lctl > ping es0
12345-0@lo
12345-172.18.54.21@tcp
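Note that lctl ping only exercises LNET discovery; a plain TCP reachability probe against LNET's default acceptor port (988) from each node is a useful cross-check that nothing on the wire is blocking connections. A minimal sketch (the host value is the MGS NID address from this report; adjust for your site):

```python
import socket

def tcp_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Illustrative: probe the default LNET acceptor port on the MGS node.
# Replace 172.18.54.21 with your server's address; expect True when
# the Lustre server modules are up and listening.
print(tcp_port_reachable("172.18.54.21", 988))
```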


 Comments   
Comment by Brian Behlendorf [ 05/Jun/14 ]

I'm able to consistently reproduce this with 2.4.2 using just the llmount.sh script. I haven't had a chance to investigate further yet.

FSTYPE=zfs /usr/lib64/lustre/tests/llmount.sh

dmesg output

Lustre: Lustre: Build Version: 2.4.2-7behlendorf-7behlendorf-1-PRISTINE-2.6.32-431.17.1.el6.x86_64
LNet: Added LNI 192.168.2.117@tcp [8/256/0/180]
LNet: Accept secure, port 988
Lustre: Echo OBD driver; http://www.lustre.org/
Lustre: Setting parameter lustre-MDT0000-mdtlov.lov.stripesize in log lustre-MDT0000
Lustre: Setting parameter lustre-MDT0000-mdtlov.lov.stripecount in log lustre-MDT0000
Lustre: Skipped 1 previous similar message
Lustre: ctl-lustre-MDT0000: No data found on store. Initialize space
Lustre: lustre-MDT0000: Initializing new disk
LustreError: 11-0: lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
SELinux: (dev lustre, type lustre) has no xattr support
Lustre: Failing over lustre-MDT0000
Lustre: server umount lustre-MDT0000 complete
Lustre: Setting parameter lustre-MDT0000-mdtlov.lov.stripesize in log lustre-MDT0000
Lustre: Skipped 2 previous similar messages
Lustre: ctl-lustre-MDT0000: No data found on store. Initialize space
Lustre: Skipped 1 previous similar message
Lustre: srv-lustre-MDT0000: No data found on store. Initialize space
Lustre: lustre-MDT0000: Initializing new disk
LustreError: 11-0: lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
SELinux: (dev lustre, type lustre) has no xattr support
Lustre: Failing over lustre-MDT0000
Lustre: server umount lustre-MDT0000 complete
Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre' ' /proc/mounts); if [ $running -ne 0 ] ; then echo Stopping client $(hostname) /mnt/lustre opts:; lsof /mnt/lustre || need_kill=no; if [ x != x -a x$need_kill != xno ]; then pids=$(lsof -t /mnt/lustre | sort -u); if [ -n "$p
Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre2' ' /proc/mounts); if [ $running -ne 0 ] ; then echo Stopping client $(hostname) /mnt/lustre2 opts:; lsof /mnt/lustre2 || need_kill=no; if [ x != x -a x$need_kill != xno ]; then pids=$(lsof -t /mnt/lustre2 | sort -u); if [ -n
Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
Lustre: DEBUG MARKER: ! zpool list -H lustre-mdt1 >/dev/null 2>&1 || grep -q ^lustre-mdt1/ /proc/mounts || zpool export lustre-mdt1
Lustre: DEBUG MARKER: grep -c /mnt/ost1' ' /proc/mounts
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
Lustre: DEBUG MARKER: ! zpool list -H lustre-ost1 >/dev/null 2>&1 || grep -q ^lustre-ost1/ /proc/mounts || zpool export lustre-ost1
Lustre: DEBUG MARKER: grep -c /mnt/ost2' ' /proc/mounts
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
Lustre: DEBUG MARKER: ! zpool list -H lustre-ost2 >/dev/null 2>&1 || grep -q ^lustre-ost2/ /proc/mounts || zpool export lustre-ost2
Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
Lustre: DEBUG MARKER: ! zpool list -H lustre-mdt1 >/dev/null 2>&1 || grep -q ^lustre-mdt1/ /proc/mounts || zpool export lustre-mdt1
Lustre: DEBUG MARKER: mkfs.lustre --mgs --fsname=lustre --mdt --index=0 --param=sys.timeout=20 --param=lov.stripesize=1048576 --param=lov.stripecount=0 --param=mdt.identity_upcall=/usr/sbin/l_getidentity --backfstype=zfs --device-size=200000 --reformat lustre-mdt1/mdt1 /tmp/lu
Lustre: DEBUG MARKER: zpool set cachefile=none lustre-mdt1
Lustre: DEBUG MARKER: ! zpool list -H lustre-mdt1 >/dev/null 2>&1 || grep -q ^lustre-mdt1/ /proc/mounts || zpool export lustre-mdt1
Lustre: DEBUG MARKER: grep -c /mnt/ost1' ' /proc/mounts
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
Lustre: DEBUG MARKER: ! zpool list -H lustre-ost1 >/dev/null 2>&1 || grep -q ^lustre-ost1/ /proc/mounts || zpool export lustre-ost1
Lustre: DEBUG MARKER: mkfs.lustre --mgsnode=ovirt-guest-241@tcp --fsname=lustre --ost --index=0 --param=sys.timeout=20 --backfstype=zfs --device-size=200000 --reformat lustre-ost1/ost1 /tmp/lustre-ost1
Lustre: DEBUG MARKER: zpool set cachefile=none lustre-ost1
Lustre: DEBUG MARKER: ! zpool list -H lustre-ost1 >/dev/null 2>&1 || grep -q ^lustre-ost1/ /proc/mounts || zpool export lustre-ost1
Lustre: DEBUG MARKER: grep -c /mnt/ost2' ' /proc/mounts
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
Lustre: DEBUG MARKER: ! zpool list -H lustre-ost2 >/dev/null 2>&1 || grep -q ^lustre-ost2/ /proc/mounts || zpool export lustre-ost2
Lustre: DEBUG MARKER: mkfs.lustre --mgsnode=ovirt-guest-241@tcp --fsname=lustre --ost --index=1 --param=sys.timeout=20 --backfstype=zfs --device-size=200000 --reformat lustre-ost2/ost2 /tmp/lustre-ost2
Lustre: DEBUG MARKER: zpool set cachefile=none lustre-ost2
Lustre: DEBUG MARKER: ! zpool list -H lustre-ost2 >/dev/null 2>&1 || grep -q ^lustre-ost2/ /proc/mounts || zpool export lustre-ost2
Lustre: DEBUG MARKER: running=$(grep -c /mnt/ost1' ' /proc/mounts); mpts=$(mount | grep -c /mnt/ost1' '); if [ $running -ne $mpts ]; then echo $(hostname) env are INSANE!; exit 1; fi
Lustre: DEBUG MARKER: running=$(grep -c /mnt/ost2' ' /proc/mounts); mpts=$(mount | grep -c /mnt/ost2' '); if [ $running -ne $mpts ]; then echo $(hostname) env are INSANE!; exit 1; fi
Lustre: DEBUG MARKER: running=$(grep -c /mnt/mds1' ' /proc/mounts); mpts=$(mount | grep -c /mnt/mds1' '); if [ $running -ne $mpts ]; then echo $(hostname) env are INSANE!; exit 1; fi
Lustre: DEBUG MARKER: running=$(grep -c /mnt/mds1' ' /proc/mounts); mpts=$(mount | grep -c /mnt/mds1' '); if [ $running -ne $mpts ]; then echo $(hostname) env are INSANE!; exit 1; fi
Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre' ' /proc/mounts); mpts=$(mount | grep -c /mnt/lustre' '); if [ $running -ne $mpts ]; then echo $(hostname) env are INSANE!; exit 1; fi
Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre2' ' /proc/mounts); mpts=$(mount | grep -c /mnt/lustre2' '); if [ $running -ne $mpts ]; then echo $(hostname) env are INSANE!; exit 1; fi
Lustre: DEBUG MARKER: mkdir -p /mnt/mds1
Lustre: DEBUG MARKER: zpool list -H lustre-mdt1 >/dev/null 2>&1 || zpool import -f -o cachefile=none -d /tmp lustre-mdt1
Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre lustre-mdt1/mdt1 /mnt/mds1
Lustre: Setting parameter lustre-MDT0000-mdtlov.lov.stripesize in log lustre-MDT0000
Lustre: Skipped 4 previous similar messages
Lustre: ctl-lustre-MDT0000: No data found on store. Initialize space
Lustre: lustre-MDT0000: Initializing new disk
LustreError: 11-0: lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
SELinux: (dev lustre, type lustre) has no xattr support
Lustre: Failing over lustre-MDT0000
Lustre: server umount lustre-MDT0000 complete
Generated at Sat Feb 10 01:43:03 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.