Details
-
Bug
-
Resolution: Duplicate
-
Minor
-
None
-
None
-
3
-
15405
Description
When we simulate message drops on portal 17 (LDLM_CANCEL_REQUEST_PORTAL) and portal 18 (LDLM_CANCEL_REPLY_PORTAL), the following failure appears on the client and the application fails.
LustreError: 13507:0:(layout.c:2042:__req_capsule_get()) @@@ Wrong buffer for field `dlm_rep' (1 of 1) in format `LDLM_INTENT_GETATTR': 0 vs. 112 (server) req@ffff880ecfc8c000 x1476486707242176/t0(0) o101->soaked-MDT0000-mdc-ffff881029559c00@192.168.1.108@o2ib:12/10 lens 576/192 e 0 to 0 dl 1408692442 ref 1 fl Complete:R/2/0 rc 0/0
LustreError: 13507:0:(file.c:3238:ll_inode_revalidate_fini()) soaked: revalidate FID [0x3800004c4:0x79:0x0] error: rc = -71
LustreError: 11-0: soaked-MDT0000-mdc-ffff881029559c00: Communicating with 192.168.1.108@o2ib, operation mds_reint failed with -107.
Lustre: soaked-MDT0000-mdc-ffff881029559c00: Connection to soaked-MDT0000 (at 192.168.1.108@o2ib) was lost; in progress operations using this service will wait for recovery to complete
LustreError: 167-0: soaked-MDT0000-mdc-ffff881029559c00: This client was evicted by soaked-MDT0000; in progress operations using this service will fail.
LustreError: 13506:0:(llite_lib.c:1522:ll_md_setattr()) md_setattr fails: rc = -5
LustreError: 13508:0:(llite_lib.c:1522:ll_md_setattr()) md_setattr fails: rc = -108
LustreError: 13509:0:(llite_lib.c:1522:ll_md_setattr()) md_setattr fails: rc = -108
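The return codes in the log above are negated Linux errno values. The helper below is a minimal sketch for decoding them when reading such logs. The function name `decode_rc` and the lookup table are illustrative and not part of Lustre; the table hardcodes the generic Linux errno numbering and covers only the codes that appear in this log.

```python
import re

# Generic Linux errno numbering (hardcoded so the result does not depend on
# the platform running this script). Only codes seen in the log are listed.
LINUX_ERRNO = {
    5: ("EIO", "I/O error"),
    71: ("EPROTO", "Protocol error"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
}

# Matches the two spellings used in the log: "rc = -71" and "failed with -107".
RC_PATTERN = re.compile(r"(?:rc = |failed with )-(\d+)")


def decode_rc(log_text):
    """Return (code, symbolic name) pairs for each negative rc in log_text."""
    result = []
    for match in RC_PATTERN.finditer(log_text):
        code = int(match.group(1))
        name, _description = LINUX_ERRNO.get(code, ("UNKNOWN", ""))
        result.append((code, name))
    return result
```

Applied to the excerpt, this reads the sequence as a protocol error on revalidate (-71, EPROTO), then a lost connection (-107, ENOTCONN), then I/O and shutdown errors (-5, -108) after the eviction.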
The application is quite simple: an MPI program that repeatedly calls fstat and fchmod on two nodes.
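The source of the test program is not attached here, so the following is only a single-process approximation of the fstat/fchmod loop, without MPI. The function name `fstat_fchmod_loop`, the iteration count, and the two permission modes are assumptions made for illustration. The ticket also does not say whether both nodes operate on the same file.

```python
import os
import stat


def fstat_fchmod_loop(path, iterations=1000):
    """Alternate fchmod and fstat on one open file; return the final mode.

    Hypothetical stand-in for the MPI test program described above.
    """
    modes = (0o644, 0o600)  # assumed values; the original modes are unknown
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        for i in range(iterations):
            wanted = modes[i % 2]
            os.fchmod(fd, wanted)   # setattr path on the client
            st = os.fstat(fd)       # getattr / revalidate path on the client
            if stat.S_IMODE(st.st_mode) != wanted:
                raise RuntimeError(
                    "mode mismatch at iteration %d: %o" % (i, st.st_mode)
                )
        return stat.S_IMODE(os.fstat(fd).st_mode)
    finally:
        os.close(fd)
```

To approximate the two-node setup, the same function could be run at the same time from two clients against a path on the Lustre mount. This assumes both ranks in the original program target the same file.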