Lustre / LU-8935

Lustre mount re-export via NFS long timeouts, not working with autofs

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.10.1, Lustre 2.11.0
    • Affects Version/s: Lustre 2.8.0
    • Labels: None
    • Environment: CentOS 7.2
      Linux dmds1 3.10.0-327.3.1.el7_lustre.x86_64 #1 SMP Thu Feb 18 10:53:23 PST 2016 x86_64 x86_64 x86_64 GNU/Linux

    • Severity: 2

    Description

      We have two Lustre clusters (all machines run CentOS 7.2). On each cluster, the Lustre filesystem is mounted on a CentOS 7.2 server that re-exports it via NFS.

      The start of our issue may have coincided with a system downtime for hardware maintenance on the 'delta' cluster.

      The 'delta' cluster Lustre filesystem mounts properly on the 'vanlustre4' NFS server, but clients intermittently cannot access the export through autofs, and when they can, it is only after long delays. Manually mounting the NFS export on a client machine works, but takes roughly 1-5 minutes, which I assume is why autofs has trouble mounting it.

      We also attempted to mount the 'delta' Lustre filesystem on the 'vanlustre3' server, which exports a Lustre filesystem from the 'echo' cluster, but encountered the same long delays, so it seems reasonable to conclude that this is not an NFS server issue.

      In the kernel logs on 'dmds1', the MDS for the 'delta' cluster, I see reconnecting messages relating to the NFS server vanlustre4 (10.23.22.114), as well as LustreError ldlm_lib.c:3169:target_bulk_io(). I've attached the 'Lustre' entries from syslog.

      I've also attached an strace of a client machine manually mounting the 'delta' lustre filesystem export from 'vanlustre4'.

      On 'vanlustre4', the messages log shows multiple authenticated mount requests from my client during the manual mount above, which did eventually succeed:
      Dec 13 10:19:59 vanlustre4 rpc.mountd[2807]: authenticated mount request from 10.23.32.109:877 for /user_data (/user_data)
      Dec 13 10:20:04 vanlustre4 rpc.mountd[2805]: authenticated mount request from 10.23.32.109:877 for /user_data (/user_data)
      Dec 13 10:20:14 vanlustre4 rpc.mountd[2812]: authenticated mount request from 10.23.32.109:877 for /user_data (/user_data)
      Dec 13 10:20:34 vanlustre4 rpc.mountd[2808]: authenticated mount request from 10.23.32.109:766 for /user_data (/user_data)
      Dec 13 10:20:58 vanlustre4 rpc.mountd[2809]: authenticated mount request from 10.23.32.109:766 for /user_data (/user_data)
      Dec 13 10:20:58 vanlustre4 rpc.mountd[2809]: authenticated mount request from 10.23.32.109:766 for /user_data (/user_data)
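
      To make the retry pattern in these rpc.mountd lines easier to see, here is a minimal Python sketch that computes the gap in seconds between consecutive mount requests. The function name mount_request_gaps and the year 2016 are assumptions for illustration (syslog timestamps carry no year); the log text is copied from the excerpt above.

```python
import re
from datetime import datetime

# Log lines copied verbatim from the vanlustre4 messages excerpt.
LOG = """\
Dec 13 10:19:59 vanlustre4 rpc.mountd[2807]: authenticated mount request from 10.23.32.109:877 for /user_data (/user_data)
Dec 13 10:20:04 vanlustre4 rpc.mountd[2805]: authenticated mount request from 10.23.32.109:877 for /user_data (/user_data)
Dec 13 10:20:14 vanlustre4 rpc.mountd[2812]: authenticated mount request from 10.23.32.109:877 for /user_data (/user_data)
Dec 13 10:20:34 vanlustre4 rpc.mountd[2808]: authenticated mount request from 10.23.32.109:766 for /user_data (/user_data)
Dec 13 10:20:58 vanlustre4 rpc.mountd[2809]: authenticated mount request from 10.23.32.109:766 for /user_data (/user_data)
Dec 13 10:20:58 vanlustre4 rpc.mountd[2809]: authenticated mount request from 10.23.32.109:766 for /user_data (/user_data)
"""

# Matches the syslog timestamp of an rpc.mountd "authenticated mount request" line.
PATTERN = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+rpc\.mountd\[\d+\]:\s+"
    r"authenticated mount request from (?P<client>[\d.]+):(?P<port>\d+)"
)


def mount_request_gaps(text, year=2016):
    """Return the seconds between consecutive mount requests in `text`.

    `year` is an assumption: syslog lines do not record one.
    """
    stamps = []
    for line in text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            stamps.append(
                datetime.strptime(f"{year} {match.group('ts')}", "%Y %b %d %H:%M:%S")
            )
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(stamps, stamps[1:])]


gaps = mount_request_gaps(LOG)
print(gaps)       # [5, 10, 20, 24, 0]
print(sum(gaps))  # 59 seconds from first to last request
```

      The gaps of 5, 10, and 20 seconds look like the client retrying with increasing delays, although that reading is an interpretation rather than something the log states. The excerpt spans about a minute from the first request to the last.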

      Any suggestions on how to further troubleshoot this issue would be appreciated.

      Attachments

        1. 2016-12-20-syslog.log.vanlustre4
          474 kB
          Steve Dainard
        2. client.nfs.mount.strace
          60 kB
          Steve Dainard
        3. client.nfs.mount.strace.timestamps
          122 kB
          Steve Dainard
        4. dmds1.lustre.messages
          24 kB
          Steve Dainard
        5. lustre-debug-files.tar.gz
          11.36 MB
          Steve Dainard
        6. nfs.server.logs
          129 kB
          Steve Dainard
        7. nfs-server-strace-exportfs-a
          10 kB
          Steve Dainard

          People

            green Oleg Drokin
            sdai Steve Dainard (Inactive)