Details
Bug
Resolution: Won't Fix
Minor
None
Lustre 1.8.9
None
3
12462
Description
Hi,
We are seeing a lot of "timeouts" to the MDS on two round-robin clients/NFS exporters. The issue seems to be unique to access via NFS. I am aware that the "-43" error is related to UID/GID mismatches, but I am almost certain that these are configured identically everywhere. Even so, should the Lustre client essentially disconnect and then return I/O errors to the NFS clients for a short period if it cannot match a UID/GID?
Jan 24 01:36:26 lustre1 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr_lock operation failed with -43
Jan 24 01:36:26 lustre1 kernel: LustreError: Skipped 527 previous similar messages
Jan 24 01:36:26 lustre1 kernel: LustreError: 3174:0:(llite_nfs.c:276:ll_get_parent()) failure -43 inode 2524321369 get parent
Jan 24 01:36:26 lustre1 kernel: LustreError: 3174:0:(llite_nfs.c:276:ll_get_parent()) Skipped 58 previous similar messages
Jan 24 01:36:26 lustre1 kernel: nfsd: non-standard errno: -43
Jan 24 01:36:26 mds kernel: LustreError: 5942:0:(ldlm_lib.c:1921:target_send_reply_msg()) @@@ processing error (-43) req@ffff8104ce78b000 x1447826783625469/t0 o34->73d957f1-091b-9ffc-a5db-3402eba274ff@NET_0x200000a151615_UUID:0/0 lens 424/192 e 0 to 0 dl 1390527406 ref 1 fl Interpret:/0/0 rc -43/0
Jan 24 01:36:26 mds kernel: LustreError: 5942:0:(ldlm_lib.c:1921:target_send_reply_msg()) Skipped 319 previous similar messages
The errors occur at fairly regular intervals because we have an application that scans various directories every 5 minutes over NFS. Looking at the occurrences across both Lustre clients/NFS exporters:
lustre1 /root # tail -f /var/log/messages | grep "10.21.22.10"
Jan 30 12:01:01 lustre1 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr_lock operation failed with -43
Jan 30 12:06:01 lustre1 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr_lock operation failed with -43
Jan 30 12:11:00 lustre1 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr_lock operation failed with -43
Jan 30 12:16:02 lustre1 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr_lock operation failed with -43
Jan 30 12:21:01 lustre1 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr_lock operation failed with -43
Jan 30 12:31:32 lustre1 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr_lock operation failed with -43
Jan 30 12:51:14 lustre1 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr_lock operation failed with -43
Jan 30 12:56:13 lustre1 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr_lock operation failed with -43
lustre2 /root # tail -f /var/log/messages | grep "10.21.22.10"
Jan 30 12:01:06 lustre2 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr operation failed with -43
Jan 30 12:01:06 lustre2 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr operation failed with -43
Jan 30 12:01:07 lustre2 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr operation failed with -43
Jan 30 12:01:08 lustre2 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr operation failed with -43
Jan 30 12:01:26 lustre2 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr operation failed with -43
Jan 30 12:11:25 lustre2 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr operation failed with -43
Jan 30 12:11:42 lustre2 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr operation failed with -43
Jan 30 12:21:01 lustre2 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr operation failed with -43
Jan 30 12:21:52 lustre2 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr operation failed with -43
Jan 30 12:37:03 lustre2 kernel: LustreError: 11-0: an error occurred while communicating with 10.21.22.10@tcp. The mds_getattr operation failed with -43
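To check how closely the errors above track the 5-minute scan, the gaps between consecutive "11-0" messages can be computed per host. The following is only a rough sketch, not something that was run on these systems: the function name `error_gaps`, the regular expression, and the default year (syslog timestamps carry none, so 2014 is an assumption) are all illustrative.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches the "LustreError: 11-0" client messages quoted in this ticket.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: "
    r"LustreError: 11-0: .*The (?P<op>\w+) operation failed with (?P<rc>-\d+)"
)


def error_gaps(lines, year=2014):
    """Return {host: [seconds between consecutive matching errors]}.

    The year is an assumption because syslog lines do not include one.
    Lines that do not match LINE_RE are ignored.
    """
    stamps = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        stamps[m["host"]].append(ts)
    return {
        host: [int((b - a).total_seconds()) for a, b in zip(ts, ts[1:])]
        for host, ts in stamps.items()
    }


# First three lustre1 entries from the excerpt above.
SAMPLE = [
    "Jan 30 12:01:01 lustre1 kernel: LustreError: 11-0: an error occurred while "
    "communicating with 10.21.22.10@tcp. The mds_getattr_lock operation failed with -43",
    "Jan 30 12:06:01 lustre1 kernel: LustreError: 11-0: an error occurred while "
    "communicating with 10.21.22.10@tcp. The mds_getattr_lock operation failed with -43",
    "Jan 30 12:11:00 lustre1 kernel: LustreError: 11-0: an error occurred while "
    "communicating with 10.21.22.10@tcp. The mds_getattr_lock operation failed with -43",
]

if __name__ == "__main__":
    print(error_gaps(SAMPLE))  # {'lustre1': [300, 299]}
```

Feeding it the full `/var/log/messages` from each exporter would show whether the gaps cluster around 300 seconds, which would point at the periodic scan rather than a network fault.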
I am reasonably sure that both the network layer and the UID/GIDs are fine.
Regards,
Daire