[LU-17170] Likely at unlink: many LustreError: mdt_open.c:1217:mdt_cross_open() fsname-MDTxxxx: [FID] doesn't exist!: rc = -14 Created: 05/Oct/23 Updated: 24/Oct/23 Resolved: 06/Oct/23 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Stephane Thiell | Assignee: | WC Triage |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Environment: |
CentOS 7.9 kernel 3.10.0-1160.90.1.el7_lustre.pl1.x86_64 |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
With 2.15.3 on Sherlock's scratch filesystem (Fir), we are seeing a LOT of the following messages on all four MDTs when files are being purged by Robinhood:

# clush -w @mds -L "journalctl -n 10 -k | grep LustreError"
fir-md1-s1: Oct 05 15:30:56 fir-md1-s1 kernel: LustreError: 32843:0:(mdt_open.c:1570:mdt_reint_open()) fir-MDT0000: name '[0x20005b5ae:0x14198:0x0]' present, but FID [0x20005b5ae:0x14198:0x0] is invalid
fir-md1-s1: Oct 05 15:31:45 fir-md1-s1 kernel: LustreError: 51313:0:(mdt_open.c:1570:mdt_reint_open()) fir-MDT0000: name '[0x20005b5b5:0x1dd34:0x0]' present, but FID [0x20005b5b5:0x1dd34:0x0] is invalid
fir-md1-s1: Oct 05 15:33:22 fir-md1-s1 kernel: LustreError: 32959:0:(mdt_open.c:1570:mdt_reint_open()) fir-MDT0000: name '[0x20005b5cf:0xff79:0x0]' present, but FID [0x20005b5cf:0xff79:0x0] is invalid
fir-md1-s2: Oct 05 15:35:57 fir-md1-s2 kernel: LustreError: 125135:0:(mdt_open.c:1217:mdt_cross_open()) fir-MDT0001: [0x24007e440:0x83be:0x0] doesn't exist!: rc = -14
fir-md1-s2: Oct 05 15:35:57 fir-md1-s2 kernel: LustreError: 125135:0:(mdt_open.c:1217:mdt_cross_open()) Skipped 605 previous similar messages
fir-md1-s2: Oct 05 15:36:06 fir-md1-s2 kernel: LustreError: 125409:0:(mdt_open.c:1217:mdt_cross_open()) fir-MDT0001: [0x24007e440:0x88ad:0x0] doesn't exist!: rc = -14
fir-md1-s2: Oct 05 15:36:06 fir-md1-s2 kernel: LustreError: 125409:0:(mdt_open.c:1217:mdt_cross_open()) Skipped 1256 previous similar messages
fir-md1-s2: Oct 05 15:36:25 fir-md1-s2 kernel: LustreError: 125341:0:(mdt_open.c:1217:mdt_cross_open()) fir-MDT0001: [0x24007e440:0x92bd:0x0] doesn't exist!: rc = -14
fir-md1-s2: Oct 05 15:36:25 fir-md1-s2 kernel: LustreError: 125341:0:(mdt_open.c:1217:mdt_cross_open()) Skipped 3743 previous similar messages
fir-md1-s2: Oct 05 15:37:03 fir-md1-s2 kernel: LustreError: 125341:0:(mdt_open.c:1217:mdt_cross_open()) fir-MDT0001: [0x24007e50d:0x15e22:0x0] doesn't exist!: rc = -14
fir-md1-s2: Oct 05 15:37:03 fir-md1-s2 kernel: LustreError: 125341:0:(mdt_open.c:1217:mdt_cross_open()) Skipped 8438 previous similar messages
fir-md1-s2: Oct 05 15:38:18 fir-md1-s2 kernel: LustreError: 125341:0:(mdt_open.c:1217:mdt_cross_open()) fir-MDT0001: [0x24007e50e:0x13804:0x0] doesn't exist!: rc = -14
fir-md1-s2: Oct 05 15:38:18 fir-md1-s2 kernel: LustreError: 125341:0:(mdt_open.c:1217:mdt_cross_open()) Skipped 16783 previous similar messages
fir-md1-s3: Oct 05 15:01:52 fir-md1-s3 kernel: LustreError: 14993:0:(mdt_open.c:1217:mdt_cross_open()) fir-MDT0002: [0x2c006c67d:0x2a0b:0x0] doesn't exist!: rc = -14
fir-md1-s3: Oct 05 15:01:52 fir-md1-s3 kernel: LustreError: 14993:0:(mdt_open.c:1217:mdt_cross_open()) Skipped 18907 previous similar messages
fir-md1-s3: Oct 05 15:17:31 fir-md1-s3 kernel: LustreError: 12198:0:(mdt_open.c:1217:mdt_cross_open()) fir-MDT0002: [0x2c006c67d:0x2950:0x0] doesn't exist!: rc = -14
fir-md1-s3: Oct 05 15:17:31 fir-md1-s3 kernel: LustreError: 12198:0:(mdt_open.c:1217:mdt_cross_open()) Skipped 19208 previous similar messages
fir-md1-s3: Oct 05 15:46:14 fir-md1-s3 kernel: LustreError: 65665:0:(mdt_open.c:1217:mdt_cross_open()) fir-MDT0002: [0x2c006c606:0x524d:0x0] doesn't exist!: rc = -14
fir-md1-s3: Oct 05 15:46:14 fir-md1-s3 kernel: LustreError: 65665:0:(mdt_open.c:1217:mdt_cross_open()) Skipped 49094 previous similar messages
fir-md1-s3: Oct 05 15:47:29 fir-md1-s3 kernel: LustreError: 12352:0:(mdt_open.c:1217:mdt_cross_open()) fir-MDT0002: [0x2c006c65c:0x145df:0x0] doesn't exist!: rc = -14
fir-md1-s3: Oct 05 15:47:29 fir-md1-s3 kernel: LustreError: 12352:0:(mdt_open.c:1217:mdt_cross_open()) Skipped 12772 previous similar messages
fir-md1-s3: Oct 05 15:49:59 fir-md1-s3 kernel: LustreError: 14987:0:(mdt_open.c:1217:mdt_cross_open()) fir-MDT0002: [0x2c006c710:0x15304:0x0] doesn't exist!: rc = -14
fir-md1-s3: Oct 05 15:49:59 fir-md1-s3 kernel: LustreError: 14987:0:(mdt_open.c:1217:mdt_cross_open()) Skipped 32807 previous similar messages
fir-md1-s4: Oct 05 15:39:54 fir-md1-s4 kernel: LustreError: 23103:0:(mdt_open.c:1217:mdt_cross_open()) fir-MDT0003: [0x280067e5f:0x1c1ab:0x0] doesn't exist!: rc = -14
fir-md1-s4: Oct 05 15:39:54 fir-md1-s4 kernel: LustreError: 23103:0:(mdt_open.c:1217:mdt_cross_open()) Skipped 19686 previous similar messages
fir-md1-s4: Oct 05 15:40:10 fir-md1-s4 kernel: LustreError: 23395:0:(mdt_open.c:1217:mdt_cross_open()) fir-MDT0003: [0x28006d889:0x18767:0x0] doesn't exist!: rc = -14
fir-md1-s4: Oct 05 15:40:10 fir-md1-s4 kernel: LustreError: 23395:0:(mdt_open.c:1217:mdt_cross_open()) Skipped 2687 previous similar messages
fir-md1-s4: Oct 05 15:40:42 fir-md1-s4 kernel: LustreError: 23445:0:(mdt_open.c:1217:mdt_cross_open()) fir-MDT0003: [0x28006d889:0x195c0:0x0] doesn't exist!: rc = -14
fir-md1-s4: Oct 05 15:40:42 fir-md1-s4 kernel: LustreError: 23445:0:(mdt_open.c:1217:mdt_cross_open()) Skipped 6453 previous similar messages
fir-md1-s4: Oct 05 15:41:46 fir-md1-s4 kernel: LustreError: 23017:0:(mdt_open.c:1217:mdt_cross_open()) fir-MDT0003: [0x28006d889:0x1cf16:0x0] doesn't exist!: rc = -14
fir-md1-s4: Oct 05 15:41:46 fir-md1-s4 kernel: LustreError: 23017:0:(mdt_open.c:1217:mdt_cross_open()) Skipped 15651 previous similar messages
fir-md1-s4: Oct 05 15:43:54 fir-md1-s4 kernel: LustreError: 23367:0:(mdt_open.c:1217:mdt_cross_open()) fir-MDT0003: [0x28006daa0:0xd855:0x0] doesn't exist!: rc = -14
fir-md1-s4: Oct 05 15:43:54 fir-md1-s4 kernel: LustreError: 23367:0:(mdt_open.c:1217:mdt_cross_open()) Skipped 23918 previous similar messages

However, these errors seem to be harmless; at least, we have not been able to find any problem so far. We have verified that those FIDs belong to files being automatically unlinked by Robinhood (we purge after 90 days), and the LustreError messages occur in the same second as the unlink.
|
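To get a rough sense of the message volume behind the "Skipped N previous similar messages" lines, the counts can be totalled per MDS host. The following is only a sketch: the function name `skipped_per_host` and the regular expression are illustrative and assume the journalctl line layout shown in the description.

```python
import re
from collections import Counter

# Assumed line layout (taken from the log excerpt in the description):
#   <host> kernel: LustreError: <pid>:<n>:(mdt_open.c:<line>:mdt_cross_open()) Skipped <N> previous similar messages
SKIPPED_RE = re.compile(
    r"(?P<host>\S+)\s+kernel:\s+LustreError:\s+\d+:\d+:"
    r"\(mdt_open\.c:\d+:mdt_cross_open\(\)\)\s+"
    r"Skipped\s+(?P<count>\d+)\s+previous similar messages"
)


def skipped_per_host(log_text: str) -> Counter:
    """Sum the 'Skipped N' counters of mdt_cross_open() messages per host."""
    totals = Counter()
    for match in SKIPPED_RE.finditer(log_text):
        totals[match.group("host")] += int(match.group("count"))
    return totals


# Two lines copied from the description, used here as sample input.
sample = (
    "fir-md1-s2: Oct 05 15:35:57 fir-md1-s2 kernel: LustreError: "
    "125135:0:(mdt_open.c:1217:mdt_cross_open()) Skipped 605 previous similar messages\n"
    "fir-md1-s2: Oct 05 15:36:06 fir-md1-s2 kernel: LustreError: "
    "125409:0:(mdt_open.c:1217:mdt_cross_open()) Skipped 1256 previous similar messages\n"
)
print(skipped_per_host(sample))
```

The pattern is matched with `finditer` over the whole text rather than line by line, so it also works when the log output has been pasted as a single wrapped block.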
| Comments |
| Comment by Stephane Thiell [ 06/Oct/23 ] |
|
I am going to close this, as it is not a Lustre issue. We had a misconfiguration where multiple Robinhood instances were not distributed correctly and were deleting the same set of files at the same time (at a very high rate). Lustre was a bit verbose in that case, but it reported useful information. Accessing a deleted file by FID as root (the program that we use with Robinhood does that) returns "Bad address" (-14) and not "No such file or directory" (-2):

[root@fir-rbh06 robinhood]# cat '/fir/.lustre/fid/[0x28006db3c:0x9bbb:0x0]'
cat: /fir/.lustre/fid/[0x28006db3c:0x9bbb:0x0]: Bad address |
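For readers mapping the `rc = -14` in the log to the "Bad address" message above, here is a minimal sketch of decoding the return codes and of how a purge tool might classify them. The names `describe_rc`, `is_already_gone`, and `ALREADY_GONE` are hypothetical, and treating EFAULT as "already deleted" is an assumption based only on the behaviour observed in this ticket, not documented Lustre semantics.

```python
import errno
import os

# Assumption drawn from this ticket: on a .lustre/fid path, both codes were
# seen to mean "the object has already been unlinked".
ALREADY_GONE = {errno.ENOENT, errno.EFAULT}


def describe_rc(rc: int) -> str:
    """Render a kernel-style negative return code, e.g. -14, as 'EFAULT (<message>)'."""
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


def is_already_gone(exc: OSError) -> bool:
    """Return True if the error suggests the file was deleted by someone else."""
    return exc.errno in ALREADY_GONE


print(describe_rc(-14))
print(describe_rc(-2))
```

A tool that opens or unlinks files by FID could use a check like `is_already_gone` to skip such entries quietly instead of reporting them as failures.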
| Comment by Gerrit Updater [ 11/Oct/23 ] |
|
"Sergey Cheremencev <scherementsev@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52630 |
| Comment by Sergey Cheremencev [ 24/Oct/23 ] |
Placed here accidentally. The patch is intended for LU-17179. |