  1. Lustre
  2. LU-4567

NFS exports - The mds_getattr operation failed with -43

Details

    • Bug
    • Resolution: Won't Fix
    • Minor
    • None
    • Lustre 1.8.9
    • None
    • 3
    • 12462

    Description

      Hi,

      We are seeing a lot of "timeouts" to the MDS on two round-robin Lustre clients that act as NFS exporters. The issue seems to be unique to access via NFS. I am aware that the "-43" error is related to UID/GID mismatches, but I am almost certain that these are configured identically everywhere. Even so, should the Lustre client essentially disconnect and then return I/O errors to the NFS clients for a short period if it can't match a UID/GID?

      Jan 24 01:36:26 lustre1 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr_lock operation failed with -43
      Jan 24 01:36:26 lustre1 kernel: LustreError: Skipped 527 previous similar messages
      Jan 24 01:36:26 lustre1 kernel: LustreError: 3174:0:(llite_nfs.c:276:ll_get_parent()) failure -43 inode 2524321369 get parent
      Jan 24 01:36:26 lustre1 kernel: LustreError: 3174:0:(llite_nfs.c:276:ll_get_parent()) Skipped 58 previous similar messages
      Jan 24 01:36:26 lustre1 kernel: nfsd: non-standard errno: -43
      
      Jan 24 01:36:26 mds kernel: LustreError: 5942:0:(ldlm_lib.c:1921:target_send_reply_msg()) @@@ processing error (-43)  req@ffff8104ce78b000 x1447826783625469/t0 o34->73d957f1-091b-9ffc-a5db-3402eba274ff@NET_0x200000a151615_UUID:0/0 lens 424/192 e 0 to 0 dl 1390527406 ref 1 fl Interpret:/0/0 rc -43/0
      Jan 24 01:36:26 mds kernel: LustreError: 5942:0:(ldlm_lib.c:1921:target_send_reply_msg()) Skipped 319 previous similar messages
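
For reference, the numeric return code in the messages above can be decoded with Python's standard `errno` module. This is only a minimal sketch: the helper name `describe_errno` is hypothetical, and the name and message it reports depend on the platform it runs on (on Linux, 43 is normally EIDRM, "Identifier removed"). It says nothing about why Lustre returns that code.

```python
import errno
import os


def describe_errno(rc: int) -> str:
    """Return 'NAME (message)' for a kernel-style negative return code.

    Hypothetical helper: kernel logs print errors as negative numbers,
    so the sign is dropped before the lookup.
    """
    code = abs(rc)
    # errno.errorcode maps numeric codes to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


# The code reported by both the client and the MDS in the logs above.
print(describe_errno(-43))
```

Running this on the affected Lustre client gives the local kernel's name for the code, which is the one that matters when reading its logs.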
      

      They occur at fairly regular intervals because we have an application that scans various directories every 5 minutes over NFS. Here are the occurrences on both Lustre clients/NFS exporters:

      lustre1 /root # tail -f /var/log/messages | grep "10.21.22.10"
      Jan 30 12:01:01 lustre1 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr_lock operation failed with -43
      Jan 30 12:06:01 lustre1 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr_lock operation failed with -43
      Jan 30 12:11:00 lustre1 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr_lock operation failed with -43
      Jan 30 12:16:02 lustre1 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr_lock operation failed with -43
      Jan 30 12:21:01 lustre1 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr_lock operation failed with -43
      Jan 30 12:31:32 lustre1 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr_lock operation failed with -43
      Jan 30 12:51:14 lustre1 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr_lock operation failed with -43
      Jan 30 12:56:13 lustre1 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr_lock operation failed with -43
      
      lustre2 /root # tail -f /var/log/messages | grep "10.21.22.10"
      Jan 30 12:01:06 lustre2 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr operation failed with -43
      Jan 30 12:01:06 lustre2 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr operation failed with -43
      Jan 30 12:01:07 lustre2 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr operation failed with -43
      Jan 30 12:01:08 lustre2 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr operation failed with -43
      Jan 30 12:01:26 lustre2 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr operation failed with -43
      Jan 30 12:11:25 lustre2 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr operation failed with -43
      Jan 30 12:11:42 lustre2 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr operation failed with -43
      Jan 30 12:21:01 lustre2 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr operation failed with -43
      Jan 30 12:21:52 lustre2 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr operation failed with -43
      Jan 30 12:37:03 lustre2 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr operation failed with -43
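
To check how closely the errors follow the 5-minute scan, the gaps between consecutive log timestamps can be computed. The following is a rough sketch, not part of the original report: the function name `error_gaps` is hypothetical, the year (2014) is an assumption because syslog timestamps omit it, and the sample reuses the first six lustre1 timestamps above with the message text abbreviated.

```python
from datetime import datetime


def error_gaps(lines, year=2014):
    """Return the gaps, in seconds, between consecutive syslog lines.

    Assumes each line starts with a classic 15-character syslog
    timestamp such as 'Jan 30 12:01:01'. The year is an assumption,
    since syslog does not record it.
    """
    stamps = [
        datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        for line in lines
    ]
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


# Timestamps taken from the lustre1 log above; message text abbreviated.
sample = [
    "Jan 30 12:01:01 lustre1 kernel: LustreError: 11-0",
    "Jan 30 12:06:01 lustre1 kernel: LustreError: 11-0",
    "Jan 30 12:11:00 lustre1 kernel: LustreError: 11-0",
    "Jan 30 12:16:02 lustre1 kernel: LustreError: 11-0",
    "Jan 30 12:21:01 lustre1 kernel: LustreError: 11-0",
    "Jan 30 12:31:32 lustre1 kernel: LustreError: 11-0",
]

print(error_gaps(sample))  # [300, 299, 302, 299, 631]
```

Gaps close to 300 seconds line up with the 5-minute scan. The longer last gap shows that not every scan produced a logged error on this client.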
      

      I am reasonably sure that both the network layer and the UID/GIDs are fine.

      Regards,

      Daire

          People

            pjones Peter Jones
            daire Daire Byrne (Inactive)