[LU-8935] Lustre mount re-export via NFS long timeouts, not working with autofs

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.10.1, Lustre 2.11.0
    • Affects Version/s: Lustre 2.8.0
    • Labels: None
    • Environment: CentOS 7.2
      Linux dmds1 3.10.0-327.3.1.el7_lustre.x86_64 #1 SMP Thu Feb 18 10:53:23 PST 2016 x86_64 x86_64 x86_64 GNU/Linux
    • Severity: 2

    Description

      We have two lustre clusters (all machines CentOS 7.2). Each of the clusters has a lustre filesystem mounted on a CentOS 7.2 server which exports the filesystem via NFS.

      The start of our issue may have coincided with a system downtime for hardware maintenance on the 'delta' cluster.

      The 'delta' cluster lustre filesystem will mount properly on the 'vanlustre4' NFS server, but clients intermittently cannot access the export via autofs mounts, and when they can, it is only after long delays. Manually mounting the NFS export on a client machine works, but with an incredibly long delay (~1-5 minutes), which I assume is why autofs has trouble mounting.

      We also attempted to mount the 'delta' lustre filesystem on the 'vanlustre3' server, which exports a lustre filesystem from the 'echo' cluster, but we encounter the same long delays there, so it seems reasonable that this is not an NFS server issue.

      In the kernel logs on the 'delta' cluster MDS server 'dmds1' I see reconnecting messages relating to the NFS server vanlustre4 (10.23.22.114), as well as LustreError messages from ldlm_lib.c:3169:target_bulk_io(). I've attached the 'Lustre' logs from syslog.

      I've also attached an strace of a client machine manually mounting the 'delta' lustre filesystem export from 'vanlustre4'.

      On 'vanlustre4', the messages log shows multiple authenticated mount requests from my client during the above manual mount, which did eventually succeed:
      Dec 13 10:19:59 vanlustre4 rpc.mountd[2807]: authenticated mount request from 10.23.32.109:877 for /user_data (/user_data)
      Dec 13 10:20:04 vanlustre4 rpc.mountd[2805]: authenticated mount request from 10.23.32.109:877 for /user_data (/user_data)
      Dec 13 10:20:14 vanlustre4 rpc.mountd[2812]: authenticated mount request from 10.23.32.109:877 for /user_data (/user_data)
      Dec 13 10:20:34 vanlustre4 rpc.mountd[2808]: authenticated mount request from 10.23.32.109:766 for /user_data (/user_data)
      Dec 13 10:20:58 vanlustre4 rpc.mountd[2809]: authenticated mount request from 10.23.32.109:766 for /user_data (/user_data)
      Dec 13 10:20:58 vanlustre4 rpc.mountd[2809]: authenticated mount request from 10.23.32.109:766 for /user_data (/user_data)

      Any suggestions on how to further troubleshoot this issue would be appreciated.

      Attachments

        1. 2016-12-20-syslog.log.vanlustre4
          474 kB
        2. client.nfs.mount.strace
          60 kB
        3. client.nfs.mount.strace.timestamps
          122 kB
        4. dmds1.lustre.messages
          24 kB
        5. lustre-debug-files.tar.gz
          11.36 MB
        6. nfs.server.logs
          129 kB
        7. nfs-server-strace-exportfs-a
          10 kB

        Activity

          [LU-8935] Lustre mount re-export via NFS long timeouts, not working with autofs

          gerrit Gerrit Updater added a comment -

          John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28321/
          Subject: LU-8935 ptlrpc: missing barrier before wake_up
          Project: fs/lustre-release
          Branch: b2_10
          Current Patch Set:
          Commit: 0fe9e8f802f873b89283de34ab22909759235dff

          gerrit Gerrit Updater added a comment -

          Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28321
          Subject: LU-8935 ptlrpc: missing barrier before wake_up
          Project: fs/lustre-release
          Branch: b2_10
          Current Patch Set: 1
          Commit: a40516411152326101a7f19ac81b8d2b2429c509

          pjones Peter Jones added a comment -

          Landed for 2.11


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26583/
          Subject: LU-8935 ptlrpc: missing barrier before wake_up
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 33033c27aae361069877d56d44714097a208aa76

          laisiyao Lai Siyao added a comment -

          It looks like ptlrpc_client_wake_req() misses a memory barrier, which may cause ptlrpc_resend_req() to wake up ptlrpc_send_rpc() -> ptlrpc_register_bulk() while the latter doesn't see rq_resend set.

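          To illustrate the ordering problem Lai describes: the waker stores rq_resend and then wakes the sleeper, but without a full memory barrier between the store and the wakeup, the woken thread can re-check its wait condition and still see the old value. Below is a minimal, self-contained userspace C sketch of that pattern; it is not the actual ptlrpc code, and the names (fake_request, rq_wakeup, waiter/waker) are purely illustrative. The waiter-side fence stands in for the barrier that the kernel's sleep/wake machinery already provides on the sleeping side.

          /* Build with: gcc -pthread barrier_sketch.c */
          #include <pthread.h>
          #include <stdatomic.h>
          #include <stdio.h>
          #include <unistd.h>

          /* Illustrative stand-in for a ptlrpc request; not the real structure. */
          struct fake_request {
              atomic_int rq_resend;   /* flag set by the resend path */
              atomic_int rq_wakeup;   /* stands in for the wait-queue wakeup */
          };

          static struct fake_request req;

          /* Waiter: models a lockless wait_event(wq, cond) loop that re-checks
           * the condition without holding any lock. */
          static void *waiter(void *arg)
          {
              (void)arg;
              while (!atomic_load_explicit(&req.rq_wakeup, memory_order_relaxed))
                  usleep(1000);                /* "sleep on the wait queue" */

              /* The kernel's wait machinery supplies a barrier on this side;
               * model it with an acquire fence. */
              atomic_thread_fence(memory_order_acquire);

              printf("woken, rq_resend=%d\n",
                     atomic_load_explicit(&req.rq_resend, memory_order_relaxed));
              return NULL;
          }

          /* Waker: models the resend path setting rq_resend and then waking the
           * request. The fence is the "missing barrier before wake_up": it ensures
           * a waiter that observes the wakeup (and runs its own acquire fence)
           * also observes rq_resend == 1; the kernel equivalent is smp_mb(). */
          static void *waker(void *arg)
          {
              (void)arg;
              atomic_store_explicit(&req.rq_resend, 1, memory_order_relaxed);
              atomic_thread_fence(memory_order_seq_cst);
              atomic_store_explicit(&req.rq_wakeup, 1, memory_order_relaxed);
              return NULL;
          }

          int main(void)
          {
              pthread_t t1, t2;

              pthread_create(&t1, NULL, waiter, NULL);
              pthread_create(&t2, NULL, waker, NULL);
              pthread_join(t1, NULL);
              pthread_join(t2, NULL);
              return 0;
          }

          In the kernel, the analogous fix (per the patch subject above) is a memory barrier in ptlrpc_client_wake_req() between setting the request state and calling wake_up(), so that the resend flag is visible to the woken ptlrpc_register_bulk() path.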

          gerrit Gerrit Updater added a comment -

          Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/26583
          Subject: LU-8935 ptlrpc: missing barrier before wake_up
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 1965158cd0d0e4a0d6c1f82474e99332d963ec26

          green Oleg Drokin added a comment -

          So only 55 seconds since the request was sent.
          Can you increase your at_min setting to, say, 80 seconds and see if the frequent reconnects go away, along with the "Reconnect on bulk READ" messages on the MDS?
          You need to do this on all the nodes. The downside is that detection of dead nodes will take longer, but I just want to confirm whether it's a slow MDS or something in the network that is causing this.
          You don't need any special extra debugging enabled when doing this.

          Also, did the lazystatfs option help with the delay?


          sdai Steve Dainard (Inactive) added a comment -

          Not sure about a specific rsync job crashing; I believe there are a few that crash. There are quite a few (hundreds of) clients backing up over rsync to the NFS server.

          On NFS server (PST timestamps):

          Dec 22 02:03:05 vanlustre4 kernel: Lustre: 20662:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1482400977/real 1482400977]  req@ffff880280affb00 x1554362320709380/t0(0) o36->delta-MDT0000-mdc-ffff880fe594d800@10.23.22.90@tcp:12/10 lens 608/1016 e 0 to 1 dl 1482400984 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
          Dec 22 02:03:05 vanlustre4 kernel: Lustre: 20662:0:(client.c:2063:ptlrpc_expire_one_request()) Skipped 82 previous similar messages
          

          On MDS dmds1 (PST):

          Dec 22 00:01:59 dmds1 kernel: Lustre: delta-MDT0000: Connection restored to 5d0ddfdb-65fb-bb1d-85f6-2509c2d8bc16 (at 10.23.22.114@tcp)
          Dec 22 00:02:06 dmds1 kernel: Lustre: delta-MDT0000: Client 89fa1c40-896d-2b3e-6ccd-c237c9d4f145 (at 10.23.22.114@tcp) reconnecting
          Dec 22 00:02:06 dmds1 kernel: LustreError: 22492:0:(ldlm_lib.c:3169:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff881c9e367200 x1554361449174764/t0(0) o37->89fa1c40-896d-2b3e-6ccd-c237c9d4f145@10.23.22.114@tcp:307/0 lens 568/440 e 0 to 0 dl 1482393732 ref 1 fl Interpret:/0/0 rc 0/0
          Dec 22 00:02:06 dmds1 kernel: Lustre: delta-MDT0000: Connection restored to 5d0ddfdb-65fb-bb1d-85f6-2509c2d8bc16 (at 10.23.22.114@tcp)
          Dec 22 00:02:13 dmds1 kernel: Lustre: delta-MDT0000: Client 89fa1c40-896d-2b3e-6ccd-c237c9d4f145 (at 10.23.22.114@tcp) reconnecting
          Dec 22 00:02:13 dmds1 kernel: Lustre: delta-MDT0000: Connection restored to 5d0ddfdb-65fb-bb1d-85f6-2509c2d8bc16 (at 10.23.22.114@tcp)
          

          People

            Assignee: green Oleg Drokin
            Reporter: sdai Steve Dainard (Inactive)
            Votes: 0
            Watchers: 8

            Dates

              Created:
              Updated:
              Resolved: