[LU-11260] Kernel NULL pointer: osd_iam_lfix.c:190:iam_lfix_init()) Wrong magic in node Created: 16/Aug/18  Updated: 01/Apr/20  Resolved: 01/Apr/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Cliff White (Inactive) Assignee: Jian Yu
Resolution: Incomplete Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Per Andreas, the MDS crashed while attempting to mount an MDT with a corrupt OI file:

It looks like patch 32927 worked as intended, and the problematic LASSERT was replaced with an error message:

[  243.128917] LustreError: 2994:0:(llog_osd.c:792:llog_osd_next_block()) proj-MDT0000-osd: invalid llog tail at log id 0x14:1/0 offset 16384
[  243.142871] LustreError: 2994:0:(osp_sync.c:1287:osp_sync_thread()) proj-OST000a-osc-MDT0000: llog process with osp_sync_process_queues failed: -22
[  243.157690] LustreError: 2994:0:(osp_sync.c:1287:osp_sync_thread()) Skipped 1 previous similar message
[  243.181463] LustreError: 3016:0:(llog_osd.c:780:llog_osd_next_block()) proj-MDT0000-osd: invalid llog tail at log id 0x2a:1/0 offset 16384 last_rec idx 4294936591 tail idx 0

Now the MDS is crashing in OI Scrub because of a corrupt block in the OI file:

[  306.547356] LustreError: 2900:0:(osd_iam_lfix.c:190:iam_lfix_init()) Wrong magic in node 173391004 (#21): 0x0 != 0x1976 or wrong count: 0 (170)
[  306.561797] LustreError: 2900:0:(osd_iam_lfix.c:190:iam_lfix_init()) Skipped 3 previous similar messages
[  307.088114] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  307.096929] IP: [<ffffffffc1182dd0>] __iam_path_lookup+0x70/0x240 [osd_ldiskfs]
[  307.110928] Oops: 0002 [#1] SMP 
[  307.388833] Call Trace:
[  307.393153]  [<ffffffffc118304f>] __iam_it_get+0xaf/0x1b0 [osd_ldiskfs]
[  307.402139]  [<ffffffffc1183bda>] iam_it_get+0x2a/0x160 [osd_ldiskfs]
[  307.410927]  [<ffffffffc117c713>] __osd_oi_lookup+0x113/0x390 [osd_ldiskfs]
[  307.420296]  [<ffffffffc117ed54>] osd_oi_lookup+0x94/0x170 [osd_ldiskfs]
[  307.429354]  [<ffffffffc1194662>] osd_scrub_check_update+0x112/0x12f0 [osd_ldiskfs]
[  307.448551]  [<ffffffffc11973d5>] osd_scrub_exec+0x65/0x5f0 [osd_ldiskfs]
[  307.467661]  [<ffffffffc1198e81>] osd_inode_iteration+0x571/0xd80 [osd_ldiskfs]
[  307.496163]  [<ffffffffc119a100>] osd_scrub_main+0xa70/0x1070 [osd_ldiskfs]

The __iam_path_lookup() crash should probably be split into a separate LU ticket so that its cause (likely incorrect error handling) can be identified and fixed. We can't really repair the corruption in place, but the problematic OI file(s) could be removed and OI Scrub can then rebuild them.
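One way a corrupt node can turn into a NULL pointer oops is a lookup that ignores a failed node setup and keeps descending with a pointer that was never set. The C sketch below shows the error-propagation shape the comment asks for: on a bad node, the lookup returns `-EIO` to its caller instead of dereferencing anything. It is a hypothetical illustration only: `struct node`, `node_init()` and `path_lookup()` stand in for the real IAM structures and are not the osd_ldiskfs code, and whether the actual bug follows this pattern is exactly what the separate ticket would need to establish.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical index node; "valid" stands in for a passed magic/count check. */
struct node {
    int valid;
    int key;              /* payload, meaningful only in a leaf */
    struct node *child;   /* next level down; NULL marks a leaf */
};

/* Hypothetical per-node setup: on success points *entries at the node,
 * on corruption returns -EIO and leaves *entries NULL. */
static int node_init(struct node *n, struct node **entries)
{
    *entries = NULL;
    if (!n->valid)
        return -EIO;
    *entries = n;
    return 0;
}

/* Descend from root to a leaf. Ignoring node_init()'s result here and
 * then reading entries->child would dereference NULL on a corrupt node. */
static int path_lookup(struct node *root, int *key)
{
    struct node *n = root;

    assert(key != NULL);
    while (n != NULL) {
        struct node *entries;
        int rc = node_init(n, &entries);

        if (rc != 0)
            return rc;          /* propagate the error to the caller */
        if (entries->child == NULL) {
            *key = entries->key;
            return 0;
        }
        n = entries->child;
    }
    return -ENOENT;             /* empty tree */
}
```

With this shape, a caller such as OI Scrub would receive an error for the affected lookup rather than triggering an oops, and could skip or report that object.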



 Comments   
Comment by Peter Jones [ 16/Aug/18 ]

Jian

Could you please investigate?

Thanks

Peter

Comment by Cliff White (Inactive) [ 01/Apr/20 ]

This issue has been dead for over a year; it can be closed.

Generated at Sat Feb 10 02:42:20 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.