[LU-12469] sanity test_230b crash: LBUG of lu_buf_free in mdd_iterate_xattrs Created: 25/Jun/19 Updated: 04/Jan/20 Resolved: 14/Dec/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.14.0, Lustre 2.12.4 |
| Type: | Bug | Priority: | Major |
| Reporter: | Li Xi | Assignee: | Sebastien Buisson |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Description |
[ 8755.816950] LustreError: 26096:0:(lu_object.c:2399:lu_buf_free()) ASSERTION( buf->lb_len > 0 ) failed:
[ 8755.819785] LustreError: 26096:0:(lu_object.c:2399:lu_buf_free()) LBUG
[ 8755.821791] Pid: 26096, comm: mdt00_002 3.10.0-862.14.4.el7_lustre.2.12.54_52_g948202a.x86_64 #1 SMP Thu Jun 13 11:34:55 CST 2019
[ 8755.826429] Call Trace:
[ 8755.827263] [<ffffffffc06d77cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 8755.829385] [<ffffffffc06d787c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 8755.831108] [<ffffffffc08b9b93>] lu_buf_free+0x213/0x220 [obdclass]
[ 8755.832160] [<ffffffffc18227cf>] mdd_iterate_xattrs+0x57f/0x8d0 [mdd]
[ 8755.833215] [<ffffffffc184845b>] mdd_declare_migrate_create+0x2bb/0x3b6 [mdd]
[ 8755.834365] [<ffffffffc182d810>] mdd_migrate+0xea0/0x1810 [mdd]
[ 8755.835344] [<ffffffffc16ec60f>] mdt_reint_migrate+0x1def/0x2400 [mdt]
[ 8755.836434] [<ffffffffc16ecca3>] mdt_reint_rec+0x83/0x210 [mdt]
[ 8755.837417] [<ffffffffc16c8470>] mdt_reint_internal+0x780/0xb90 [mdt]
[ 8755.838474] [<ffffffffc16d3877>] mdt_reint+0x67/0x140 [mdt]
[ 8755.839408] [<ffffffffc0f6d60a>] tgt_request_handle+0x91a/0x15c0 [ptlrpc]
[ 8755.840575] [<ffffffffc0f117ee>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
[ 8755.841810] [<ffffffffc0f152dc>] ptlrpc_main+0xbac/0x1550 [ptlrpc]
[ 8755.842865] [<ffffffff8c8bdf21>] kthread+0xd1/0xe0
[ 8755.843706] [<ffffffff8cf255f7>] ret_from_fork_nospec_end+0x0/0x39
[ 8755.844664] [<ffffffffffffffff>] 0xffffffffffffffff
[ 8755.845501] Kernel panic - not syncing: LBUG |
| Comments |
| Comment by Li Xi [ 25/Jun/19 ] |
|
Not sure whether it is related to this LBUG, but "xbuf.lb_len" shouldn't be changed, otherwise lu_buf_free() might trigger a false memory leak detection. |
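To illustrate the concern in the comment above, here is a minimal userspace sketch of why overwriting the length field of an allocated (pointer, length) buffer is risky. It is not Lustre code: `struct sketch_buf`, `sketch_buf_alloc()`, `sketch_buf_free()`, `risky_reuse()` and `safe_reuse()` are hypothetical names, and the free routine returns -1 where the kernel assertion quoted in the description (`buf->lb_len > 0`) would fire. Whether this is the exact sequence behind the crash is an assumption, not something the ticket confirms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a (pointer, length) buffer; not struct lu_buf. */
struct sketch_buf {
	void   *sb_buf;
	size_t  sb_len;	/* allocated size; must stay in sync with sb_buf */
};

/* Allocate len bytes. Returns 0 on success, -1 on failure or len == 0. */
static int sketch_buf_alloc(struct sketch_buf *buf, size_t len)
{
	buf->sb_buf = len ? calloc(1, len) : NULL;
	buf->sb_len = buf->sb_buf ? len : 0;
	return buf->sb_buf ? 0 : -1;
}

/*
 * Free the buffer. Returns -1 (without freeing) when the pointer is set
 * but the recorded length is 0, the state a "len > 0" assertion rejects.
 */
static int sketch_buf_free(struct sketch_buf *buf)
{
	if (buf->sb_buf == NULL)
		return 0;
	if (buf->sb_len == 0)
		return -1;
	free(buf->sb_buf);
	buf->sb_buf = NULL;
	buf->sb_len = 0;
	return 0;
}

/* Risky: store the size of a value read into the buffer in sb_len. */
static int risky_reuse(size_t value_len)
{
	struct sketch_buf buf;
	size_t alloc_len = 64;
	int rc;

	if (sketch_buf_alloc(&buf, alloc_len) != 0)
		return -2;
	buf.sb_len = value_len;		/* clobbers the allocation size */
	rc = sketch_buf_free(&buf);	/* fails when value_len == 0 */
	if (rc != 0) {
		buf.sb_len = alloc_len;	/* repair so the sketch does not leak */
		sketch_buf_free(&buf);
	}
	return rc;
}

/* Safer: describe the value with a separate view; the owner is untouched. */
static int safe_reuse(size_t value_len)
{
	struct sketch_buf buf;
	struct sketch_buf view;

	if (sketch_buf_alloc(&buf, 64) != 0)
		return -2;
	view.sb_buf = buf.sb_buf;
	view.sb_len = value_len;	/* only the view carries the value size */
	(void)view;
	return sketch_buf_free(&buf);
}
```

In this sketch a zero-length value makes `risky_reuse()` trip the length check, while a non-zero value passes the check but leaves the recorded size different from the allocated size, which is the kind of mismatch that could confuse size-based allocation accounting.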
| Comment by Li Xi [ 25/Jun/19 ] |
|
The code was introduced in " |
| Comment by Sebastien Buisson [ 01/Nov/19 ] |
|
Hit during Maloo testing: |
| Comment by Andreas Dilger [ 01/Nov/19 ] |
|
Sebastien, I think the failures on Hmm, unless the |
| Comment by James Nunez (Inactive) [ 05/Nov/19 ] |
|
It looks like we are hitting the same assertion for sanity test 230j; https://testing.whamcloud.com/test_sets/40fb7b16-fff6-11e9-8e77-52540065bddc. The call trace looks a bit different:
[10218.643731] Lustre: DEBUG MARKER: == sanity test 230j: DoM file data not changed after dir migration =================================== 13:23:34 (1572960214)
[10218.980145] LustreError: 16021:0:(lu_object.c:2479:lu_buf_free()) ASSERTION( buf->lb_len > 0 ) failed:
[10218.981175] LustreError: 16021:0:(lu_object.c:2479:lu_buf_free()) LBUG
[10218.981820] Pid: 16021, comm: mdt00_000 3.10.0-957.27.2.el7_lustre.x86_64 #1 SMP Mon Sep 30 22:09:27 UTC 2019
[10218.982797] Call Trace:
[10218.983094] [<ffffffffc06a78ac>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[10218.983793] [<ffffffffc06a795c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[10218.984441] [<ffffffffc08e4f33>] lu_buf_free+0x213/0x220 [obdclass]
[10218.985293] [<ffffffffc10cfa20>] mdd_iterate_xattrs+0x5a0/0x8f0 [mdd]
[10218.986171] [<ffffffffc10f5251>] mdd_migrate_create+0x280/0x44b [mdd]
[10218.986872] [<ffffffffc10da840>] mdd_migrate+0x1290/0x17e0 [mdd]
[10218.987518] [<ffffffffc0f67434>] mdt_reint_migrate+0x1c34/0x2230 [mdt]
[10218.988285] [<ffffffffc0f67ab3>] mdt_reint_rec+0x83/0x210 [mdt]
[10218.988944] [<ffffffffc0f419e0>] mdt_reint_internal+0x7b0/0xba0 [mdt]
[10218.989680] [<ffffffffc0f4d6d7>] mdt_reint+0x67/0x140 [mdt]
[10218.990319] [<ffffffffc0c2eeea>] tgt_request_handle+0x97a/0x1620 [ptlrpc]
[10218.991328] [<ffffffffc0bd5a66>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[10218.992220] [<ffffffffc0bd959c>] ptlrpc_main+0xbac/0x1540 [ptlrpc]
[10218.992926] [<ffffffffb12c2e81>] kthread+0xd1/0xe0
[10218.993521] [<ffffffffb1977c37>] ret_from_fork_nospec_end+0x0/0x39
[10218.994190] [<ffffffffffffffff>] 0xffffffffffffffff
[10218.994836] Kernel panic - not syncing: LBUG
[10218.995275] CPU: 0 PID: 16021 Comm: mdt00_000 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.x86_64 #1
[10218.996449] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[10218.997013] Call Trace:
[10218.997283] [<ffffffffb1965147>] dump_stack+0x19/0x1b
[10218.997799] [<ffffffffb195e850>] panic+0xe8/0x21f
[10218.998278] [<ffffffffc06a79ab>] lbug_with_loc+0x9b/0xa0 [libcfs]
[10218.998899] [<ffffffffc08e4f33>] lu_buf_free+0x213/0x220 [obdclass]
[10218.999525] [<ffffffffc10cfa20>] mdd_iterate_xattrs+0x5a0/0x8f0 [mdd]
[10219.000168] [<ffffffffc10cb700>] ? mdo_ref_del+0x180/0x180 [mdd]
[10219.000790] [<ffffffffc10f5251>] mdd_migrate_create+0x280/0x44b [mdd]
[10219.001442] [<ffffffffc10da840>] mdd_migrate+0x1290/0x17e0 [mdd]
[10219.002058] [<ffffffffc0f67434>] mdt_reint_migrate+0x1c34/0x2230 [mdt]
[10219.002712] [<ffffffffc0f67ab3>] mdt_reint_rec+0x83/0x210 [mdt]
[10219.003329] [<ffffffffc0f419e0>] mdt_reint_internal+0x7b0/0xba0 [mdt]
[10219.003988] [<ffffffffc0f4af67>] ? mdt_thread_info_init+0xa7/0x1e0 [mdt]
[10219.004663] [<ffffffffc0f4d6d7>] mdt_reint+0x67/0x140 [mdt]
[10219.005257] [<ffffffffc0c2eeea>] tgt_request_handle+0x97a/0x1620 [ptlrpc]
[10219.005989] [<ffffffffc07f0f3e>] ? libcfs_nid2str_r+0xfe/0x130 [lnet]
[10219.006661] [<ffffffffc0bd5a66>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[10219.007423] [<ffffffffb12cfeb4>] ? __wake_up+0x44/0x50
[10219.007968] [<ffffffffc0bd959c>] ptlrpc_main+0xbac/0x1540 [ptlrpc]
[10219.008613] [<ffffffffc0bd89f0>] ? ptlrpc_register_service+0xf90/0xf90 [ptlrpc]
[10219.009346] [<ffffffffb12c2e81>] kthread+0xd1/0xe0
[10219.009825] [<ffffffffb12c2db0>] ? insert_kthread_work+0x40/0x40
[10219.010429] [<ffffffffb1977c37>] ret_from_fork_nospec_begin+0x21/0x21
[10219.011077] [<ffffffffb12c2db0>] ? insert_kthread_work+0x40/0x40 |
| Comment by Gerrit Updater [ 05/Nov/19 ] |
|
James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36675 |
| Comment by Gerrit Updater [ 06/Nov/19 ] |
|
Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/36684 |
| Comment by Sebastien Buisson [ 07/Nov/19 ] |
|
The stack trace reported for the crash in sanity test_230j is quite similar to the one for the crash in test_230b, and it is most likely happening because of SELinux. As can be seen in the test results for patch https://review.whamcloud.com/36684, all sanity tests in the 230 family are passing now. So I do not think patch https://review.whamcloud.com/36675, which adds sanity 230j to the exception list, is necessary. |
| Comment by Bruno Faccini (Inactive) [ 12/Nov/19 ] |
|
+1 during sanity/test_230e with latest master @https://testing.whamcloud.com/test_sets/f7c46c1e-055e-11ea-b934-52540065bddc |
| Comment by Bruno Faccini (Inactive) [ 12/Nov/19 ] |
|
+1 during sanity/test_230l with latest master @https://testing.whamcloud.com/test_sets/af8d3b8c-0569-11ea-b934-52540065bddc |
| Comment by Gerrit Updater [ 14/Dec/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36684/ |
| Comment by Peter Jones [ 14/Dec/19 ] |
|
Landed for 2.14 |
| Comment by Gerrit Updater [ 16/Dec/19 ] |
|
Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/37031 |
| Comment by Gerrit Updater [ 03/Jan/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37031/ |