[LU-6712] Hyperion IO error revalidate FID [0x20000040c:0x1f:0x0] error: rc = -5 Created: 11/Jun/15 Updated: 10/Oct/21 Resolved: 10/Oct/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Cliff White (Inactive) | Assignee: | Di Wang |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Hyperion SWL test |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Attempting to set up for the SWL test. Two of the clients have this issue:
# ls /p/l_wham/white215/
ls: cannot access /p/l_wham/white215/mybob: Input/output error
ls: cannot access /p/l_wham/white215/SWL: Input/output error
SWL foo mybob
Client errors:
LustreError: 47371:0:(file.c:3081:ll_inode_revalidate_fini()) lustre: revalidate FID [0x20000040c:0x1f:0x0] error: rc = -5
LustreError: 47376:0:(file.c:3081:ll_inode_revalidate_fini()) lustre: revalidate FID [0x20000040c:0x1f:0x0] error: rc = -5
LustreError: 47376:0:(file.c:3081:ll_inode_revalidate_fini()) Skipped 1 previous similar message
Lustre dump from one client attached. There do not appear to be any errors on the MDS. |
| Comments |
| Comment by Di Wang [ 12/Jun/15 ] |
|
It looks like the FLD cache on the client has a problem.
The client allocates the FID at MDT000a:
40000000:00000040:3.0:1434056723.090873:0:55019:0:(fid_request.c:382:seq_client_alloc_fid()) cli-cli-lustre-MDT000a-mdc-ffff88085864d400: Allocated FID [0x114000040e:0x14:0x0]
Then, in the following line, it looks up this FID in the FLD cache and gets MDT000b:
00800000:00000002:3.0:1434056723.090888:0:55019:0:(lmv_fld.c:79:lmv_fld_lookup()) FLD lookup got mds #b for fid=[0x114000040e:0x14:0x0]
Cliff: Could you please run lctl get_param fld.*MDT0000.fldb on MDT0000 and post the result here? Thanks. |
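The mismatch described in this comment can be checked mechanically. The following is a minimal Python sketch, not part of Lustre: it takes the two debug-log lines quoted above, extracts the MDT named in the allocating mdc device and the MDS returned by the FLD lookup, and reports when they differ for the same FID. The regular expressions and the helper names `parse_alloc` and `parse_lookup` are hypothetical, and the sketch assumes both indices are printed in hexadecimal, which is how `#a` and `#b` appear in these logs.

```python
import re

# The two lines quoted in the comment above, copied from the client debug log.
ALLOC_LINE = (
    "40000000:00000040:3.0:1434056723.090873:0:55019:0:"
    "(fid_request.c:382:seq_client_alloc_fid()) "
    "cli-cli-lustre-MDT000a-mdc-ffff88085864d400: "
    "Allocated FID [0x114000040e:0x14:0x0]"
)
LOOKUP_LINE = (
    "00800000:00000002:3.0:1434056723.090888:0:55019:0:"
    "(lmv_fld.c:79:lmv_fld_lookup()) "
    "FLD lookup got mds #b for fid=[0x114000040e:0x14:0x0]"
)

# A FID is printed as [sequence:object id:version], all in hex.
FID = r"\[(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\]"
# Assumption: the MDT index in the device name and after "mds #" is hex.
ALLOC_RE = re.compile(r"MDT([0-9a-fA-F]{4})-mdc-[0-9a-fA-F]+: Allocated FID " + FID)
LOOKUP_RE = re.compile(r"FLD lookup got mds #([0-9a-fA-F]+) for fid=" + FID)


def _parse(pattern, line):
    """Return (mdt_index, (seq, oid, ver)) or None if the line does not match."""
    m = pattern.search(line)
    if m is None:
        return None
    index = int(m.group(1), 16)
    fid = tuple(int(m.group(i), 16) for i in (2, 3, 4))
    return index, fid


def parse_alloc(line):
    """Parse a seq_client_alloc_fid() line (hypothetical helper)."""
    return _parse(ALLOC_RE, line)


def parse_lookup(line):
    """Parse an lmv_fld_lookup() line (hypothetical helper)."""
    return _parse(LOOKUP_RE, line)


alloc_mdt, alloc_fid = parse_alloc(ALLOC_LINE)
lookup_mds, lookup_fid = parse_lookup(LOOKUP_LINE)

if alloc_fid == lookup_fid and alloc_mdt != lookup_mds:
    print(
        f"FID seq {alloc_fid[0]:#x}: allocated via MDT{alloc_mdt:04x}, "
        f"but FLD lookup returned mds #{lookup_mds:x}"
    )
```

Run over a full debug log, the same two parsers could be used to pair every allocation with its later lookup; the sketch only handles the single pair quoted in this ticket.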
| Comment by Cliff White (Inactive) [ 12/Jun/15 ] |
|
Results of "lctl get_param fld.*MDT0000.fldb" on Hyperion MDT0 |
| Comment by Di Wang [ 13/Jun/15 ] |
|
According to the debug log and the FLDB (MDT0), the FLDB cache on the client side is clearly corrupted.
Client side cache:
iwc34.fini.txt:00800000:00000002:3.0:1434056723.090390:0:55019:0:(lmv_fld.c:79:lmv_fld_lookup()) FLD lookup got mds #0 for fid=[0x200000400:0x84f:0x0]
iwc34.fini.txt:00800000:00000002:3.0:1434056723.090402:0:55019:0:(lmv_fld.c:79:lmv_fld_lookup()) FLD lookup got mds #1 for fid=[0x1180000409:0x84f:0x0]
iwc34.fini.txt:00800000:00000002:3.0:1434056723.090412:0:55019:0:(lmv_fld.c:79:lmv_fld_lookup()) FLD lookup got mds #3 for fid=[0x1040000403:0x84f:0x0]
iwc34.fini.txt:00800000:00000002:3.0:1434056723.090421:0:55019:0:(lmv_fld.c:79:lmv_fld_lookup()) FLD lookup got mds #4 for fid=[0xf80000406:0x84f:0x0]
iwc34.fini.txt:00800000:00000002:3.0:1434056723.090430:0:55019:0:(lmv_fld.c:79:lmv_fld_lookup()) FLD lookup got mds #5 for fid=[0x1100000409:0x84f:0x0]
iwc34.fini.txt:00800000:00000002:3.0:1434056723.090440:0:55019:0:(lmv_fld.c:79:lmv_fld_lookup()) FLD lookup got mds #a for fid=[0xfc0000405:0x84f:0x0]
iwc34.fini.txt:00800000:00000002:3.0:1434056723.090449:0:55019:0:(lmv_fld.c:79:lmv_fld_lookup()) FLD lookup got mds #6 for fid=[0xf40000401:0x84f:0x0]
iwc34.fini.txt:00800000:00000002:3.0:1434056723.090458:0:55019:0:(lmv_fld.c:79:lmv_fld_lookup()) FLD lookup got mds #7 for fid=[0x11c0000402:0x84f:0x0]
iwc34.fini.txt:00800000:00000002:3.0:1434056723.090467:0:55019:0:(lmv_fld.c:79:lmv_fld_lookup()) FLD lookup got mds #9 for fid=[0x10c0000406:0x84f:0x0]
iwc34.fini.txt:00800000:00000002:3.0:1434056723.090476:0:55019:0:(lmv_fld.c:79:lmv_fld_lookup()) FLD lookup got mds #2 for fid=[0x1000000404:0x84f:0x0]
iwc34.fini.txt:00800000:00000002:3.0:1434056723.090485:0:55019:0:(lmv_fld.c:79:lmv_fld_lookup()) FLD lookup got mds #b for fid=[0x1140000407:0x84f:0x0]
iwc34.fini.txt:00800000:00000002:3.0:1434056723.090494:0:55019:0:(lmv_fld.c:79:lmv_fld_lookup()) FLD lookup got mds #8 for fid=[0x1080000402:0x84f:0x0]
iwc34.fini.txt:00800000:00000002:3.0:1434056723.090888:0:55019:0:(lmv_fld.c:79:lmv_fld_lookup()) FLD lookup got mds #b for fid=[0x114000040e:0x14:0x0]
On the server side:
[0x0000000f40000400-0x0000000f80000400):6:mdt
[0x0000000f80000400-0x0000000fc0000400):1:mdt
[0x0000000fc0000400-0x0000001000000400):7:mdt
[0x0000001000000400-0x0000001040000400):2:mdt
[0x0000001040000400-0x0000001080000400):8:mdt
[0x0000001080000400-0x00000010c0000400):3:mdt
[0x00000010c0000400-0x0000001100000400):9:mdt
[0x0000001100000400-0x0000001140000400):4:mdt
[0x0000001140000400-0x0000001180000400):a:mdt
[0x0000001180000400-0x00000011c0000400):5:mdt
[0x00000011c0000400-0x0000001200000400):b:mdt
Unfortunately, I cannot find the problem by checking the code. Cliff: Could you please tell me |
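The client-versus-server comparison in this comment can be reproduced with a short script. This is a sketch under stated assumptions, not Lustre code: it reads each server FLDB entry as a half-open sequence range followed by a hexadecimal MDT index (an interpretation of the `[start-end):index:mdt` lines as quoted in this ticket), looks up the sequence of each FID from the client log, and flags disagreements. The names `parse_fldb` and `lookup` are hypothetical, and the quoted FLDB is only an excerpt, so a sequence outside it (such as 0x200000400) is reported as not covered rather than as a mismatch.

```python
import re
from bisect import bisect_right

# Server-side FLDB excerpt as quoted in the comment above.
FLDB_TEXT = """
[0x0000000f40000400-0x0000000f80000400):6:mdt
[0x0000000f80000400-0x0000000fc0000400):1:mdt
[0x0000000fc0000400-0x0000001000000400):7:mdt
[0x0000001000000400-0x0000001040000400):2:mdt
[0x0000001040000400-0x0000001080000400):8:mdt
[0x0000001080000400-0x00000010c0000400):3:mdt
[0x00000010c0000400-0x0000001100000400):9:mdt
[0x0000001100000400-0x0000001140000400):4:mdt
[0x0000001140000400-0x0000001180000400):a:mdt
[0x0000001180000400-0x00000011c0000400):5:mdt
[0x00000011c0000400-0x0000001200000400):b:mdt
"""

# (FID sequence, MDS index returned by the client's FLD lookup), from the log.
CLIENT_LOOKUPS = [
    (0x200000400, 0x0), (0x1180000409, 0x1), (0x1040000403, 0x3),
    (0xF80000406, 0x4), (0x1100000409, 0x5), (0xFC0000405, 0xA),
    (0xF40000401, 0x6), (0x11C0000402, 0x7), (0x10C0000406, 0x9),
    (0x1000000404, 0x2), (0x1140000407, 0xB), (0x1080000402, 0x8),
    (0x114000040E, 0xB),
]

# Assumption: [start-end) is a half-open range and the index is hex.
RANGE_RE = re.compile(r"\[(0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)\):([0-9a-fA-F]+):mdt")


def parse_fldb(text):
    """Return a sorted list of (start, end, mdt_index) tuples."""
    ranges = [
        (int(m.group(1), 16), int(m.group(2), 16), int(m.group(3), 16))
        for m in RANGE_RE.finditer(text)
    ]
    ranges.sort()
    return ranges


def lookup(ranges, seq):
    """Return the MDT index whose range contains seq, or None if not covered."""
    starts = [r[0] for r in ranges]
    i = bisect_right(starts, seq) - 1
    if i >= 0 and ranges[i][0] <= seq < ranges[i][1]:
        return ranges[i][2]
    return None


ranges = parse_fldb(FLDB_TEXT)
for seq, client_mds in CLIENT_LOOKUPS:
    server_mdt = lookup(ranges, seq)
    if server_mdt is None:
        status = "not covered by the quoted FLDB excerpt"
    elif server_mdt == client_mds:
        status = f"agrees (mdt {server_mdt:x})"
    else:
        status = f"MISMATCH: client says {client_mds:x}, server says {server_mdt:x}"
    print(f"seq {seq:#x}: {status}")
```

For the FID from the original allocation, sequence 0x114000040e, the server range maps to MDT index a, which matches the MDT000a device that allocated it and disagrees with the `mds #b` the client lookup returned.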
| Comment by Cliff White (Inactive) [ 15/Jun/15 ] |
|
Same data from all MDS |