Details
Type: Bug
Resolution: Fixed
Priority: Blocker
None
None
Client: lustre-modules-2.1.1-13chaos_2.6.32_220.17.1.3chaos.ch5.x86_64.x86_64
Server: lustre-modules-2.1.1-4chaos_2.6.32_220.7.1.7chaos.ch5.x86_64.x86_64
3
3982
Description
We're currently seeing a user's reads and writes fail with -13 (-EACCES) errors. The errors come from a set of clients in a single cluster, but they affect multiple filesystems. From what I can tell, the -EACCES originates in this part of the server code:
filter_capa.c:
    138         if (capa == NULL) {
    139                 if (fid)
    140                         CERROR("seq/fid/opc "LPU64"/"DFID"/"LPX64
    141                                ": no capability has been passed\n",
    142                                seq, PFID(fid), opc);
    143                 else
    144                         CERROR("seq/opc "LPU64"/"LPX64
    145                                ": no capability has been passed\n",
    146                                seq, opc);
    147                 RETURN(-EACCES);
    148         }
The message on the client is:
Jul 3 13:26:50 ansel242 kernel: LustreError: 11-0: lsc-OST00b4-osc-ffff8806244c3800: Communicating with 172.19.1.113@o2ib100, operation ost_read failed with -13.
Jul 3 13:26:50 ansel242 kernel: LustreError: Skipped 3495061 previous similar messages
And there are corresponding messages on the server:
Jul 3 13:26:51 sumom13 kernel: LustreError: 24607:0:(filter_capa.c:146:filter_auth_capa()) seq/opc 0/0x40: no capability has been passed
Jul 3 13:26:51 sumom13 kernel: LustreError: 24607:0:(filter_capa.c:146:filter_auth_capa()) Skipped 3495057 previous similar messages
It appears that for each "ost_{read|write} failed" message on the client, there is a corresponding "no capability" message on the server.
I'm unsure why the client isn't passing a capability, but the missing capability appears to be what causes the server to return -EACCES to the clients.
Issue Links
is related to: LU-1621 Disable lustre capa by force (Resolved)