[LU-8255] LustreError: 38237:0:(file.c:3165:ll_inode_revalidate_fini()) nbp6: revalidate FID [0x20007200e:0x90d8:0x0] error: rc = -71 Created: 09/Jun/16 Updated: 29/Jun/17 Resolved: 29/Jun/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Mahmoud Hanafi | Assignee: | nasf (Inactive) |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Environment: |
The client is running Lustre 2.7.1 |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
For one particular user, the clients are logging many of these errors:
Jun 9 15:23:12 r221i4s0 kernel: [1465510992.386829] LustreError: 11-0: nbp6-MDT0000-mdc-ffff8803058bb000: operation ldlm_enqueue to node 10.151.26.79@o2ib failed: rc = -71
Jun 9 15:23:12 r221i4s0 kernel: [1465510992.398829] LustreError: Skipped 5 previous similar messages
Jun 9 15:23:12 r221i4s0 kernel: [1465510992.406829] LustreError: 74346:0:(file.c:3165:ll_inode_revalidate_fini()) nbp6: revalidate FID [0x200071fef:0x1fd19:0x0] error: rc = -71
Jun 9 15:23:12 r221i4s0 kernel: [1465510992.406829] LustreError: 74346:0:(file.c:3165:ll_inode_revalidate_fini()) Skipped 5 previous similar messages
Jun 9 15:23:13 r154i0n0 kernel: [1465510993.479567] LustreError: 11-0: nbp6-MDT0000-mdc-ffff880302239800: operation ldlm_enqueue to node 10.151.26.79@o2ib failed: rc = -71
Jun 9 15:23:13 r154i0n0 kernel: [1465510993.491567] LustreError: Skipped 2 previous similar messages
Jun 9 15:23:13 r154i0n0 kernel: [1465510993.495567] LustreError: 68877:0:(file.c:3165:ll_inode_revalidate_fini()) nbp6: revalidate FID [0x200072005:0x11ea5:0x0] error: rc = -71
Jun 9 15:23:13 r154i0n0 kernel: [1465510993.495567] LustreError: 68877:0:(file.c:3165:ll_inode_revalidate_fini()) Skipped 2 previous similar messages
Jun 9 15:23:16 r221i3n1 kernel: [1465510996.818948] LustreError: 11-0: nbp6-MDT0000-mdc-ffff880302157000: operation ldlm_enqueue to node 10.151.26.79@o2ib failed: rc = -71
Jun 9 15:23:16 r221i3n1 kernel: [1465510996.830948] LustreError: 68219:0:(file.c:3165:ll_inode_revalidate_fini()) nbp6: revalidate FID [0x200071ef9:0x1dfce:0x0] er
I will upload the MDS-side debug logs to the FTP site. |
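To see which clients and FIDs are affected, the revalidate failures in the excerpt can be tallied with a short script. The following is a minimal sketch in Python, not part of the original report: the regular expression, the `summarize` helper, and the `ERRNO_NAMES` table are illustrative assumptions, and the sample lines are copied from the description above. It relies on Lustre reporting negative Linux errno values, where 71 is EPROTO (protocol error).

```python
import re

# Linux errno names; only the value seen in this ticket is listed (assumption:
# Lustre return codes are negative Linux errnos, so rc = -71 is EPROTO).
ERRNO_NAMES = {71: "EPROTO"}

# Matches the ll_inode_revalidate_fini() lines quoted in the description.
# "Skipped N previous similar messages" lines and truncated lines do not match.
REVALIDATE_RE = re.compile(
    r"(?P<host>\S+) kernel: \[\d+\.\d+\] LustreError: \d+:\d+:"
    r"\(file\.c:\d+:ll_inode_revalidate_fini\(\)\) "
    r"(?P<fs>\S+): revalidate FID \[(?P<fid>[0-9a-fx:]+)\] "
    r"error: rc = (?P<rc>-?\d+)"
)


def summarize(lines):
    """Return (host, fid, errno name) for every complete revalidate failure."""
    failures = []
    for line in lines:
        match = REVALIDATE_RE.search(line)
        if match is None:
            continue
        rc = int(match.group("rc"))
        name = ERRNO_NAMES.get(-rc, f"errno {-rc}")
        failures.append((match.group("host"), match.group("fid"), name))
    return failures


# Sample lines taken verbatim from the ticket description.
SAMPLE = [
    "Jun 9 15:23:12 r221i4s0 kernel: [1465510992.406829] LustreError: "
    "74346:0:(file.c:3165:ll_inode_revalidate_fini()) nbp6: revalidate FID "
    "[0x200071fef:0x1fd19:0x0] error: rc = -71",
    "Jun 9 15:23:12 r221i4s0 kernel: [1465510992.406829] LustreError: "
    "74346:0:(file.c:3165:ll_inode_revalidate_fini()) Skipped 5 previous "
    "similar messages",
    "Jun 9 15:23:13 r154i0n0 kernel: [1465510993.495567] LustreError: "
    "68877:0:(file.c:3165:ll_inode_revalidate_fini()) nbp6: revalidate FID "
    "[0x200072005:0x11ea5:0x0] error: rc = -71",
]

for host, fid, name in summarize(SAMPLE):
    print(host, fid, name)
```

The last log line in the description is cut off before its return code, so a parser like this one would skip it rather than guess the value.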
| Comments |
| Comment by Mahmoud Hanafi [ 09/Jun/16 ] |
|
uploaded logs to /uploads/ |
| Comment by Mahmoud Hanafi [ 09/Jun/16 ] |
|
Looks like the user's job is creating and deleting lots of files and directories. |
| Comment by Peter Jones [ 10/Jun/16 ] |
|
Fan Yong, could you please advise? Thanks, Peter |
| Comment by nasf (Inactive) [ 12/Jun/16 ] |
|
The log is quite large (1.2 GB), but contains only Lustre-level debug messages like the following:
That means the server returned a protocol error while handling the ldlm enqueue RPC from the client. Without detailed logs, we cannot pinpoint what went wrong. I tried to reproduce the interoperability problem locally (client b2_7, server b2_5), but could not. Please enable -1 level debugging on both the client and the MDS for a short time, retry the failed operation, then collect the Lustre debug logs from both the client and the MDS and attach them directly to this Jira ticket. Thanks! (Note: to keep the debug logs small, please run "lctl clear" on both the client and the MDS before the new attempt.) |
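The collection procedure requested above can be sketched as a small Python wrapper around `lctl`. This is an illustration under stated assumptions, not a command sequence taken from the ticket: the helper names and the dump path `/tmp/lustre-debug.log` are hypothetical, and the script defaults to a dry run that only prints the commands. It uses `lctl set_param debug=-1` to enable all debug flags, `lctl clear` to empty the kernel debug buffer, and `lctl dk <file>` to dump the buffer to a file. It would need to be run as root on both the client and the MDS.

```python
import shlex
import subprocess


def prepare_commands():
    """Commands to run before retrying the failed operation."""
    return [
        ["lctl", "set_param", "debug=-1"],  # enable all debug flags
        ["lctl", "clear"],                  # empty the kernel debug buffer
    ]


def collect_commands(dump_path):
    """Commands to run after the failure has been reproduced."""
    return [["lctl", "dk", dump_path]]      # dump the debug buffer to a file


def run(commands, dry_run=True):
    """Print each command; execute it as well unless dry_run is set."""
    rendered = []
    for cmd in commands:
        line = shlex.join(cmd)
        rendered.append(line)
        print(line)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return rendered


# Hypothetical output location; choose a path with enough free space.
DUMP_PATH = "/tmp/lustre-debug.log"

run(prepare_commands())
# ... retry the failing operation here ...
run(collect_commands(DUMP_PATH))
```

Keeping the window between `lctl clear` and `lctl dk` short is what keeps the resulting logs small, as the note in the comment asks.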
| Comment by nasf (Inactive) [ 14/Jul/16 ] |
|
Any feedback? Thanks! |
| Comment by Mahmoud Hanafi [ 29/Jun/17 ] |
|
We can close this case |
| Comment by Peter Jones [ 29/Jun/17 ] |
|
ok Mahmoud |