[LU-14474] Oops in llog_cat_prep_log() in sanity-quota / recovery-small Created: 27/Feb/21 Updated: 15/Dec/22 Resolved: 10/Oct/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Alex Zhuravlev | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | zfs | ||
| Issue Links: |
|
||||||||||||||||||||
| Severity: | 3 | ||||||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||||||
| Description |
|
https://testing.whamcloud.com/test_sets/fd169bf5-481b-4906-8891-b5fc6738d14d [28530.564627] general protection fault: 0000 [#1] SMP PTI [28530.565704] CPU: 0 PID: 1366058 Comm: mdt00_002 Kdump: loaded Tainted: P OE --------- - - 4.18.0-240.1.1.el8_lustre.x86_64 #1 [28530.567756] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [28530.568756] RIP: 0010:up_write+0x5/0x30 [28530.569406] Code: f8 02 74 01 c3 48 8b 47 20 a8 04 74 06 f0 48 83 67 20 fb e9 3d ff ff ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> c7 47 20 00 00 00 00 48 c7 c0 ff ff ff ff f0 48 0f c1 07 a8 02 [28530.572467] RSP: 0018:ffffab6000b93a38 EFLAGS: 00010212 [28530.573333] RAX: 00000000ffffffea RBX: ffff97eb10f06a00 RCX: 000000000293cb3b [28530.574512] RDX: 000000000293cb3a RSI: 0000000000000002 RDI: 5f32303030646f6c [28530.575677] RBP: ffff97eb10f06ac8 R08: 000000000002f0c0 R09: ffffffffc1a13f21 [28530.576846] R10: 00000000ffffffea R11: 0000000000000001 R12: 0000000000000001 [28530.578021] R13: ffff97eb0f0ee140 R14: ffff97eb2d2f7cb8 R15: 00000000ffffffea [28530.579186] FS: 0000000000000000(0000) GS:ffff97eb3fc00000(0000) knlGS:0000000000000000 [28530.580510] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [28530.581457] CR2: 00007fcd6fb50000 CR3: 0000000042a0a005 CR4: 00000000000606f0 [28530.582640] Call Trace: [28530.583369] llog_cat_prep_log+0xb0/0x3c0 [obdclass] [28530.584256] llog_cat_declare_add_rec+0xbe/0x220 [obdclass] [28530.585211] llog_declare_add+0x187/0x1d0 [obdclass] [28530.586470] top_trans_start+0x212/0x940 [ptlrpc] [28530.587364] mdd_create+0xba4/0x1850 [mdd] [28530.588225] mdt_create+0x7b1/0xc50 [mdt] [28530.588936] mdt_reint_create+0x30b/0x3c0 [mdt] [28530.589721] mdt_reint_rec+0x11f/0x250 [mdt] [28530.590454] mdt_reint_internal+0x498/0x780 [mdt] [28530.591283] mdt_reint+0x5e/0x100 [mdt] [28530.591981] tgt_request_handle+0xc78/0x1910 [ptlrpc] |
| Comments |
| Comment by Andreas Dilger [ 21/Aug/21 ] |
|
+1 on master in recovery-small unmount https://testing.whamcloud.com/test_sets/e3bb9ef5-9a74-4037-9fd5-bd9c1e60e4e2 |
| Comment by Andreas Dilger [ 31/Aug/21 ] |
|
+1 on master in recovery-small after the test had finished: Lustre: DEBUG MARKER: == recovery-small test complete, duration 6212 sec ======= 13:58:23 (1629640703) BUG: unable to handle kernel paging request at 000000000bd00c23 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 2618 Comm: mdt00_001 Kdump: loaded 4.18.0-240.22.1.el8_lustre.x86_64 #1 RIP: 0010:llog_exist+0xd9/0x180 [obdclass] Call Trace: llog_cat_prep_log+0x4f/0x3c0 [obdclass] llog_cat_declare_add_rec+0x56/0x220 [obdclass] llog_declare_add+0x187/0x1d0 [obdclass] top_trans_start+0x212/0x940 [ptlrpc] mdd_unlink+0x4a0/0xb30 [mdd] mdt_reint_unlink+0xb0c/0x1290 [mdt] mdt_reint_rec+0x11f/0x250 [mdt] mdt_reint_internal+0x498/0x780 [mdt] mdt_reint+0x5e/0x100 [mdt] tgt_request_handle+0xc90/0x1940 [ptlrpc] ptlrpc_server_handle_request+0x323/0xbc0 [ptlrpc] ptlrpc_main+0xba2/0x1490 [ptlrpc] |
| Comment by Alex Zhuravlev [ 03/Sep/21 ] |
|
a similar one: https://testing.whamcloud.com/test_sets/a3d67622-925a-4122-abf5-f599b59970ba |
| Comment by Alex Zhuravlev [ 14/Sep/21 ] |
|
https://testing.whamcloud.com/test_sessions/66dd0fd7-ab82-4af5-9178-0a12fe5302d5 |
| Comment by Alex Zhuravlev [ 14/Sep/21 ] |
[ 857.600721] LustreError: 167-0: lustre-MDT0001-osp-MDT0002: This client was evicted by lustre-MDT0001; in progress operations using this service will fail. [ 857.857771] LustreError: 64216:0:(llog.c:472:llog_verify_record()) lustre-MDT0001-osp-MDT0000: record is too large: 0 > 32768 [ 857.859905] LustreError: 64216:0:(llog.c:656:llog_process_thread()) lustre-MDT0001-osp-MDT0000: invalid record in llog [0x1:0x40001b70:0x2] record for index 0/1: rc = -22 |
| Comment by Gerrit Updater [ 14/Sep/21 ] |
|
"Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/44910 |
| Comment by Alex Zhuravlev [ 21/Sep/21 ] |
|
the immediate reason is that lod_sub_recovery_thread() destroys just scanned llog which is stored in chd_next_log, but I haven't understood yet how the log destroyed got into chd_next_log. |
| Comment by Gerrit Updater [ 21/Sep/21 ] |
|
"Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/44998 |
| Comment by Gerrit Updater [ 10/Oct/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44998/ |
| Comment by Peter Jones [ 10/Oct/21 ] |
|
Landed for 2.15 |
| Comment by Gerrit Updater [ 23/Dec/21 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/45928 |