[LU-393] 1.6<->1.8 interop: 1.8.5.56 client failed to connect to 1.6.7.2 server
Created: 05/Jun/11  Updated: 25/Apr/13  Resolved: 25/Apr/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.6
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Jian Yu
Assignee: WC Triage
Resolution: Won't Fix
Votes: 0
Labels: None
Environment:

Lustre Client:
Version: 1.8.5.56
Distro/Arch: RHEL6.0/x86_64 (kernel version: 2.6.32-71.18.2.el6)
Network: IB (in-kernel OFED)
Node: client-12-ib

Lustre Servers:
Version: 1.6.7.2
Distro/Arch: CentOS5.6/x86_64 (kernel version: 2.6.18-92.1.26.el5_lustre.1.6.7.2smp)
Network: IB (in-kernel OFED)
Nodes: fat-amd-1-ib (MDS), fat-amd-[2,3]-ib (OSSs)


Severity: 3
Rank (Obsolete): 7877

Description

While mounting Lustre on the 1.8.5.56 client node client-12-ib, the client failed to connect to the Lustre 1.6.7.2 MDS (fat-amd-1-ib):

[root@client-12-ib ~]# mount -t lustre -o user_xattr,acl,flock fat-amd-1-ib@o2ib:/lustre /mnt/lustre
mount.lustre: mount fat-amd-1-ib@o2ib:/lustre at /mnt/lustre failed: Cannot send after transport endpoint shutdown
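
The error text printed by mount.lustre is the standard Linux message for errno ESHUTDOWN, whose value on Linux is 108; the same code appears as -108 in the client kernel log. A minimal sketch for decoding such kernel-style negative return codes follows. The helper name describe_rc is hypothetical, and errno numbering is platform-specific, so the 108 mapping is assumed to hold only on Linux.

```python
import errno
import os
from typing import Tuple


def describe_rc(rc: int) -> Tuple[str, str]:
    """Map a kernel-style negative return code to (symbolic name, message).

    The numeric value is looked up in the local platform's errno table,
    so results are only meaningful on the same OS family as the log source.
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # Use the symbolic constant rather than a hard-coded 108 so the lookup
    # stays correct on platforms that number ESHUTDOWN differently.
    print(describe_rc(-errno.ESHUTDOWN))
```

On a Linux host, describe_rc(-108) should yield the ESHUTDOWN name together with the same message that mount.lustre printed.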

dmesg on client-12-ib showed:

[root@client-12-ib ~]# dmesg
Lustre: OBD class driver, http://www.lustre.org/
Lustre:     Lustre Version: 1.8.5.56
Lustre:     Build Version: 1.8.5.56-20110528075626-PRISTINE-2.6.32-71.18.2.el6.x86_64
Lustre: Listener bound to ib0:192.168.4.12:987:mlx4_0
Lustre: Register global MR array, MR size: 0xffffffffffffffff, array size: 1
Lustre: Added LNI 192.168.4.12@o2ib [8/64/0/180]
Lustre: Lustre Client File System; http://www.lustre.org/
LustreError: 152-6: Ignoring deprecated mount option 'acl'.
Lustre: 27162:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1370764878020609 sent from MGC192.168.4.132@o2ib to NID 192.168.4.132@o2ib 5s ago has timed out (5s prior to deadline).
  req@ffff88031fb81400 x1370764878020609/t0 o250->MGS@MGC192.168.4.132@o2ib_0:26/25 lens 368/584 e 0 to 1 dl 1307263263 ref 2 fl Rpc:N/0/0 rc 0/0
LustreError: 27150:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID  req@ffff88031fb81000 x1370764878020611/t0 o501->MGS@MGC192.168.4.132@o2ib_0:26/25 lens 264/432 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 15c-8: MGC192.168.4.132@o2ib: The configuration from log 'lustre-client' failed (-108). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 27150:0:(llite_lib.c:1090:ll_fill_super()) Unable to process log: -108
Lustre: client ffff880329446800 umount complete
LustreError: 27150:0:(obd_mount.c:2065:lustre_fill_super()) Unable to mount  (-108)
[root@client-12-ib ~]# ping fat-amd-1-ib
PING fat-amd-1-ib (192.168.4.132) 56(84) bytes of data.
64 bytes from fat-amd-1-ib (192.168.4.132): icmp_seq=1 ttl=64 time=0.730 ms
64 bytes from fat-amd-1-ib (192.168.4.132): icmp_seq=2 ttl=64 time=0.119 ms
64 bytes from fat-amd-1-ib (192.168.4.132): icmp_seq=3 ttl=64 time=0.122 ms
64 bytes from fat-amd-1-ib (192.168.4.132): icmp_seq=4 ttl=64 time=0.117 ms
^C
--- fat-amd-1-ib ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3917ms
rtt min/avg/max/mdev = 0.117/0.272/0.730/0.264 ms

[root@client-12-ib ~]# lctl ping fat-amd-1-ib@o2ib
Can't parse process id "fat-amd-1-ib@o2ib"

[root@fat-amd-1-ib ~]# lctl list_nids
192.168.4.132@o2ib

[root@client-12-ib ~]# lctl ping 192.168.4.132@o2ib
failed to ping 192.168.4.132@o2ib: Input/output error

[root@client-12-ib ~]# lctl list_nids
192.168.4.12@o2ib
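
In the transcript above, lctl ping rejected the hostname form fat-amd-1-ib@o2ib but accepted the numeric form 192.168.4.132@o2ib as a target. The sketch below splits a NID of the shape address@network and checks whether the address part is a literal IPv4 address. The function names split_nid and has_numeric_address are hypothetical, and the check only mirrors the behaviour seen in this report; it is not a statement of what every lctl version accepts.

```python
import ipaddress
from typing import Tuple


def split_nid(nid: str) -> Tuple[str, str]:
    """Split 'address@network' into (address, network)."""
    addr, sep, net = nid.partition("@")
    if not sep or not addr or not net:
        raise ValueError(f"not a NID: {nid!r}")
    return addr, net


def has_numeric_address(nid: str) -> bool:
    """Return True when the address part is a literal IPv4 address."""
    addr, _ = split_nid(nid)
    try:
        ipaddress.IPv4Address(addr)
    except ipaddress.AddressValueError:
        return False
    return True


if __name__ == "__main__":
    for nid in ("fat-amd-1-ib@o2ib", "192.168.4.132@o2ib"):
        print(nid, has_numeric_address(nid))
```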

[root@fat-amd-1-ib ~]# ping client-12-ib
PING client-12-ib (192.168.4.12) 56(84) bytes of data.
64 bytes from client-12-ib (192.168.4.12): icmp_seq=1 ttl=64 time=1.66 ms
64 bytes from client-12-ib (192.168.4.12): icmp_seq=2 ttl=64 time=0.119 ms
64 bytes from client-12-ib (192.168.4.12): icmp_seq=3 ttl=64 time=0.124 ms

--- client-12-ib ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 0.119/0.635/1.662/0.726 ms
 
[root@fat-amd-1-ib ~]# lctl ping 192.168.4.12@o2ib
failed to ping 192.168.4.12@o2ib: Protocol error

[root@fat-amd-1-ib ~]# dmesg
LustreError: 14950:0:(api-ni.c:1732:lnet_ping()) 12345-192.168.4.12@o2ib: Unexpected version 0x2
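
The req@ lines in the client dmesg pack several fields into one string. As a rough aid for triage, the sketch below extracts a few of them with a regular expression. Reading x as the request XID, o as the RPC opcode, and dl as an epoch-seconds deadline is an assumption based on the surrounding log text ("Request x1370764878020609 ... prior to deadline"); the name parse_req and the regex are illustrative and were written only against the two lines in this report.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Pattern fitted to the req@ lines in this report; other Lustre versions
# may print the fields differently.
REQ_RE = re.compile(
    r"req@(?P<addr>[0-9a-f]+)\s+"
    r"x(?P<xid>\d+)/t(?P<transno>\d+)\s+"
    r"o(?P<opcode>\d+)->(?P<target>\S+)\s+"
    r"lens\s+\d+/\d+\s+"
    r"e\s+\d+\s+to\s+\d+\s+"
    r"dl\s+(?P<deadline>\d+)"
)


def parse_req(line: str) -> Optional[dict]:
    """Extract xid, opcode, target and deadline from a req@ log line."""
    m = REQ_RE.search(line)
    if m is None:
        return None
    dl = int(m.group("deadline"))
    return {
        "xid": int(m.group("xid")),
        "opcode": int(m.group("opcode")),
        "target": m.group("target"),
        # A deadline of 0 is treated as "not set" (assumption).
        "deadline": datetime.fromtimestamp(dl, tz=timezone.utc) if dl else None,
    }


if __name__ == "__main__":
    sample = (
        "req@ffff88031fb81400 x1370764878020609/t0 "
        "o250->MGS@MGC192.168.4.132@o2ib_0:26/25 lens 368/584 "
        "e 0 to 1 dl 1307263263 ref 2 fl Rpc:N/0/0 rc 0/0"
    )
    print(parse_req(sample))
```

Converting the dl value to UTC makes it easy to line a timed-out request up against server-side logs from the same window.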



Comments
Comment by Andreas Dilger [ 25/Apr/13 ]

Not fixing 1.6 interop issues at this point.
