[LU-11857] repeated "could not delete orphan [0x200060151:0x38a8:0x0]: rc = -2" messages Created: 14/Jan/19 Updated: 16/Feb/19 Resolved: 16/Feb/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Jeff Johnson (Inactive) | Assignee: | Alex Zhuravlev |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Attachments: | orph_100pct_201901142137.txt.bz2 |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
After upgrading from 2.9 to 2.12, the MDS syslog is flooded with "lfs-MDD0000: could not delete orphan" error messages, and on the active MDS the orph_lfs-MDD00 thread is pegged at 100% CPU:

[Sat Jan 12 23:21:18 2019] LustreError: 14125:0:(mdd_orphans.c:327:mdd_orphan_destroy()) lfs-MDD0000: could not delete orphan [0x200060151:0x38a8:0x0]: rc = -2
[Sat Jan 12 23:21:18 2019] LustreError: 14125:0:(mdd_orphans.c:327:mdd_orphan_destroy()) Skipped 8067628 previous similar messages
[Sat Jan 12 23:31:18 2019] LustreError: 14125:0:(mdd_orphans.c:327:mdd_orphan_destroy()) lfs-MDD0000: could not delete orphan [0x200060151:0x38a8:0x0]: rc = -2
[Sat Jan 12 23:31:18 2019] LustreError: 14125:0:(mdd_orphans.c:327:mdd_orphan_destroy()) Skipped 7958773 previous similar messages
[Sat Jan 12 23:41:18 2019] LustreError: 14125:0:(mdd_orphans.c:327:mdd_orphan_destroy()) lfs-MDD0000: could not delete orphan [0x200060151:0x38a8:0x0]: rc = -2 |
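For context, rc = -2 is -ENOENT: the orphan object is already gone, yet the cleanup thread keeps retrying the same FID. A minimal standalone C sketch of the suspected failure mode (illustrative only, not the actual mdd_orphans.c code; the fid layout and function name here are hypothetical):

#include <errno.h>
#include <stdio.h>

struct fid { unsigned long seq, oid, ver; };	/* hypothetical FID triple */

/* Hypothetical destroy: the stale entry always fails with -ENOENT. */
static int orphan_destroy(const struct fid *f)
{
	(void)f;
	return -ENOENT;		/* object already removed: rc = -2 */
}

int main(void)
{
	struct fid stale = { 0x200060151UL, 0x38a8UL, 0UL };
	int attempts = 0;

	/* The cursor never advances past the failing entry, so the same
	 * FID is retried in a tight loop, pegging the thread at 100% CPU.
	 * Capped at 3 iterations here so the sketch terminates. */
	while (orphan_destroy(&stale) == -ENOENT && ++attempts <= 3)
		fprintf(stderr,
			"could not delete orphan [0x%lx:0x%lx:0x%lx]: rc = %d\n",
			stale.seq, stale.oid, stale.ver, -ENOENT);
	return 0;
}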
| Comments |
| Comment by Peter Jones [ 14/Jan/19 ] |
|
Alex, could you please investigate? Thanks, Peter |
| Comment by Jeff Johnson (Inactive) [ 14/Jan/19 ] |
|
Update: Performed a full shutdown of the file system and all server systems, then rebooted and performed an orderly start of the file system. The orph_lfs-MDD00 thread is still at 100% CPU and syslog entries of

LustreError: 109872:0:(mdd_orphans.c:327:mdd_orphan_destroy()) ls15-MDD0000: could not delete orphan [0x200060151:0x38a8:0x0]

continue to occur. After 15 minutes of wall time with the file system targets mounted, the errors persist. Let me know if you want me to upload another dump of the debug buffer. |
| Comment by Jeff Johnson (Inactive) [ 14/Jan/19 ] |
|
In 13 minutes the mdd_orphans.c:327:mdd_orphan_destroy() message count has grown from 1979094 to 8056841 (6077747 syslog repeats in 13 minutes, roughly 7800 messages per second).
|
| Comment by Jeff Johnson (Inactive) [ 15/Jan/19 ] |
|
After seven hours the orph_lfs-MDD00 thread is still pegged at 100% CPU. Attaching orph_100pct_201901142137.txt.bz2 |
| Comment by Alex Zhuravlev [ 15/Jan/19 ] |
|
I think https://review.whamcloud.com/#/c/33661/ should help |
| Comment by Alex Zhuravlev [ 16/Jan/19 ] |
|
aeonjeffj can you please try https://review.whamcloud.com/#/c/33661/ ? |
| Comment by Jeff Johnson (Inactive) [ 16/Jan/19 ] |
|
I can. This is a production LFS. Given that there is data in place, should I? Not arguing, just applying caution and respect for the end users' data. |
| Comment by Andreas Dilger [ 17/Jan/19 ] |
|
The patch has passed review and testing and is scheduled to land in 2.13 shortly. This should avoid the repeated attempts to destroy the same object. |
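To illustrate the idea Andreas describes, here is a sketch only, under the assumption that the fix makes the cleanup loop advance past an entry whose destroy fails; this is not the actual change from https://review.whamcloud.com/#/c/33661/, and the names are hypothetical:

#include <errno.h>
#include <stdio.h>

/* Hypothetical orphan list: entry 1 is stale and cannot be destroyed. */
static int destroy_entry(int i)
{
	return (i == 1) ? -ENOENT : 0;
}

int main(void)
{
	/* Key idea: on failure, log once and move to the next entry
	 * instead of retrying the same orphan indefinitely. */
	for (int i = 0; i < 4; i++) {
		int rc = destroy_entry(i);

		if (rc)
			fprintf(stderr, "skipping orphan %d: rc = %d\n", i, rc);
	}
	return 0;
}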
| Comment by Peter Jones [ 27/Jan/19 ] |
|
So, it seems like this is believed to be a duplicate of the recently landed |
| Comment by Peter Jones [ 16/Feb/19 ] |
|
No objections it seems |