[LU-12469] sanity test_230b crash: LBUG of lu_buf_free in mdd_iterate_xattrs Created: 25/Jun/19  Updated: 04/Jan/20  Resolved: 14/Dec/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0, Lustre 2.12.4

Type: Bug Priority: Major
Reporter: Li Xi Assignee: Sebastien Buisson
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11956 conf-sanity test_32a failed with 1 Resolved

 Description   
[ 8755.816950] LustreError: 26096:0:(lu_object.c:2399:lu_buf_free()) ASSERTION( buf->lb_len > 0 ) failed:
[ 8755.819785] LustreError: 26096:0:(lu_object.c:2399:lu_buf_free()) LBUG
[ 8755.821791] Pid: 26096, comm: mdt00_002 3.10.0-862.14.4.el7_lustre.2.12.54_52_g948202a.x86_64 #1 SMP Thu Jun 13 11:34:55 CST 2019
[ 8755.826429] Call Trace:
[ 8755.827263]  [<ffffffffc06d77cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 8755.829385]  [<ffffffffc06d787c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 8755.831108]  [<ffffffffc08b9b93>] lu_buf_free+0x213/0x220 [obdclass]
[ 8755.832160]  [<ffffffffc18227cf>] mdd_iterate_xattrs+0x57f/0x8d0 [mdd]
[ 8755.833215]  [<ffffffffc184845b>] mdd_declare_migrate_create+0x2bb/0x3b6 [mdd]
[ 8755.834365]  [<ffffffffc182d810>] mdd_migrate+0xea0/0x1810 [mdd]
[ 8755.835344]  [<ffffffffc16ec60f>] mdt_reint_migrate+0x1def/0x2400 [mdt]
[ 8755.836434]  [<ffffffffc16ecca3>] mdt_reint_rec+0x83/0x210 [mdt]
[ 8755.837417]  [<ffffffffc16c8470>] mdt_reint_internal+0x780/0xb90 [mdt]
[ 8755.838474]  [<ffffffffc16d3877>] mdt_reint+0x67/0x140 [mdt]
[ 8755.839408]  [<ffffffffc0f6d60a>] tgt_request_handle+0x91a/0x15c0 [ptlrpc]
[ 8755.840575]  [<ffffffffc0f117ee>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
[ 8755.841810]  [<ffffffffc0f152dc>] ptlrpc_main+0xbac/0x1550 [ptlrpc]
[ 8755.842865]  [<ffffffff8c8bdf21>] kthread+0xd1/0xe0
[ 8755.843706]  [<ffffffff8cf255f7>] ret_from_fork_nospec_end+0x0/0x39
[ 8755.844664]  [<ffffffffffffffff>] 0xffffffffffffffff
[ 8755.845501] Kernel panic - not syncing: LBUG


 Comments   
Comment by Li Xi [ 25/Jun/19 ]

Not sure whether it is related to this LBUG, but "xbuf.lb_len" shouldn't be changed; otherwise lu_buf_free() might trigger false memory leak detection.
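To illustrate the concern above, here is a minimal userspace sketch. It is not the Lustre code: `struct buf_model`, `buf_model_alloc()`, `buf_model_free()`, `bytes_outstanding` and both `read_then_free_*()` helpers are hypothetical names, and only the `lb_buf`/`lb_len` field names are borrowed from the assertion message. It assumes the free path trusts `lb_len` as the allocation size, so overwriting that field with the size of a value read into the buffer either trips the `lb_len > 0` check (value size 0) or under-accounts the free (value size smaller than the allocation).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a length-tracked buffer. */
struct buf_model {
	void   *lb_buf;
	size_t  lb_len;	/* allocation size; the free path trusts this value */
};

/* Bytes accounted as still allocated, mimicking leak detection. */
size_t bytes_outstanding;

int buf_model_alloc(struct buf_model *b, size_t len)
{
	assert(len > 0);
	b->lb_buf = malloc(len);
	if (b->lb_buf == NULL)
		return -1;
	b->lb_len = len;
	bytes_outstanding += len;
	return 0;
}

/*
 * Returns 0 on success, or -1 in the situation where an
 * "lb_len > 0" assertion would fire: a live buffer whose
 * recorded length is zero.
 */
int buf_model_free(struct buf_model *b)
{
	if (b->lb_buf == NULL)
		return 0;
	if (b->lb_len == 0)
		return -1;
	free(b->lb_buf);
	bytes_outstanding -= b->lb_len;
	b->lb_buf = NULL;
	b->lb_len = 0;
	return 0;
}

/* Risky pattern: reuse lb_len to hold the size of the value just read. */
int read_then_free_clobbering(struct buf_model *b, size_t value_size)
{
	b->lb_len = value_size;	/* the allocation size is lost here */
	return buf_model_free(b);
}

/* Safer pattern: keep the value size in a separate variable. */
int read_then_free_preserving(struct buf_model *b, size_t value_size,
			      size_t *out_size)
{
	*out_size = value_size;	/* lb_len is left untouched */
	return buf_model_free(b);
}
```

In this model, a zero-sized value with the clobbering helper makes `buf_model_free()` fail, and a 16-byte value in a 64-byte buffer leaves 48 bytes reported as outstanding even though the memory was released. The preserving helper avoids both outcomes.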

Comment by Li Xi [ 25/Jun/19 ]

The code was introduced in "LU-8569 linkea: linkEA size limitation".

Comment by Sebastien Buisson [ 01/Nov/19 ]

Hit during Maloo testing:
https://testing.whamcloud.com/test_sets/a9ae171e-fc30-11e9-8e77-52540065bddc

Comment by Andreas Dilger [ 01/Nov/19 ]

Sebastien, I think the failures on LU-12895 testing are related to those patches, since it hasn't been failing much in other cases (except a bunch of failures for LU-12624 tests that could be attributed to that patch). I don't think it relates to security testing in general, since we are running a lot of that and not seeing this failure.

Hmm, unless the LU-12895 crash in test_185a is preventing testing from getting to test_230b with the security test config? Is it possible the security config is doing something strange with the buffers and triggering this LBUG() itself?

Comment by James Nunez (Inactive) [ 05/Nov/19 ]

It looks like we are hitting the same assertion for sanity test 230j; https://testing.whamcloud.com/test_sets/40fb7b16-fff6-11e9-8e77-52540065bddc.

The call trace looks a bit different:

[10218.643731] Lustre: DEBUG MARKER: == sanity test 230j: DoM file data not changed after dir migration =================================== 13:23:34 (1572960214)
[10218.980145] LustreError: 16021:0:(lu_object.c:2479:lu_buf_free()) ASSERTION( buf->lb_len > 0 ) failed: 
[10218.981175] LustreError: 16021:0:(lu_object.c:2479:lu_buf_free()) LBUG
[10218.981820] Pid: 16021, comm: mdt00_000 3.10.0-957.27.2.el7_lustre.x86_64 #1 SMP Mon Sep 30 22:09:27 UTC 2019
[10218.982797] Call Trace:
[10218.983094]  [<ffffffffc06a78ac>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[10218.983793]  [<ffffffffc06a795c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[10218.984441]  [<ffffffffc08e4f33>] lu_buf_free+0x213/0x220 [obdclass]
[10218.985293]  [<ffffffffc10cfa20>] mdd_iterate_xattrs+0x5a0/0x8f0 [mdd]
[10218.986171]  [<ffffffffc10f5251>] mdd_migrate_create+0x280/0x44b [mdd]
[10218.986872]  [<ffffffffc10da840>] mdd_migrate+0x1290/0x17e0 [mdd]
[10218.987518]  [<ffffffffc0f67434>] mdt_reint_migrate+0x1c34/0x2230 [mdt]
[10218.988285]  [<ffffffffc0f67ab3>] mdt_reint_rec+0x83/0x210 [mdt]
[10218.988944]  [<ffffffffc0f419e0>] mdt_reint_internal+0x7b0/0xba0 [mdt]
[10218.989680]  [<ffffffffc0f4d6d7>] mdt_reint+0x67/0x140 [mdt]
[10218.990319]  [<ffffffffc0c2eeea>] tgt_request_handle+0x97a/0x1620 [ptlrpc]
[10218.991328]  [<ffffffffc0bd5a66>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[10218.992220]  [<ffffffffc0bd959c>] ptlrpc_main+0xbac/0x1540 [ptlrpc]
[10218.992926]  [<ffffffffb12c2e81>] kthread+0xd1/0xe0
[10218.993521]  [<ffffffffb1977c37>] ret_from_fork_nospec_end+0x0/0x39
[10218.994190]  [<ffffffffffffffff>] 0xffffffffffffffff
[10218.994836] Kernel panic - not syncing: LBUG
[10218.995275] CPU: 0 PID: 16021 Comm: mdt00_000 Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.27.2.el7_lustre.x86_64 #1
[10218.996449] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[10218.997013] Call Trace:
[10218.997283]  [<ffffffffb1965147>] dump_stack+0x19/0x1b
[10218.997799]  [<ffffffffb195e850>] panic+0xe8/0x21f
[10218.998278]  [<ffffffffc06a79ab>] lbug_with_loc+0x9b/0xa0 [libcfs]
[10218.998899]  [<ffffffffc08e4f33>] lu_buf_free+0x213/0x220 [obdclass]
[10218.999525]  [<ffffffffc10cfa20>] mdd_iterate_xattrs+0x5a0/0x8f0 [mdd]
[10219.000168]  [<ffffffffc10cb700>] ? mdo_ref_del+0x180/0x180 [mdd]
[10219.000790]  [<ffffffffc10f5251>] mdd_migrate_create+0x280/0x44b [mdd]
[10219.001442]  [<ffffffffc10da840>] mdd_migrate+0x1290/0x17e0 [mdd]
[10219.002058]  [<ffffffffc0f67434>] mdt_reint_migrate+0x1c34/0x2230 [mdt]
[10219.002712]  [<ffffffffc0f67ab3>] mdt_reint_rec+0x83/0x210 [mdt]
[10219.003329]  [<ffffffffc0f419e0>] mdt_reint_internal+0x7b0/0xba0 [mdt]
[10219.003988]  [<ffffffffc0f4af67>] ? mdt_thread_info_init+0xa7/0x1e0 [mdt]
[10219.004663]  [<ffffffffc0f4d6d7>] mdt_reint+0x67/0x140 [mdt]
[10219.005257]  [<ffffffffc0c2eeea>] tgt_request_handle+0x97a/0x1620 [ptlrpc]
[10219.005989]  [<ffffffffc07f0f3e>] ? libcfs_nid2str_r+0xfe/0x130 [lnet]
[10219.006661]  [<ffffffffc0bd5a66>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[10219.007423]  [<ffffffffb12cfeb4>] ? __wake_up+0x44/0x50
[10219.007968]  [<ffffffffc0bd959c>] ptlrpc_main+0xbac/0x1540 [ptlrpc]
[10219.008613]  [<ffffffffc0bd89f0>] ? ptlrpc_register_service+0xf90/0xf90 [ptlrpc]
[10219.009346]  [<ffffffffb12c2e81>] kthread+0xd1/0xe0
[10219.009825]  [<ffffffffb12c2db0>] ? insert_kthread_work+0x40/0x40
[10219.010429]  [<ffffffffb1977c37>] ret_from_fork_nospec_begin+0x21/0x21
[10219.011077]  [<ffffffffb12c2db0>] ? insert_kthread_work+0x40/0x40

Comment by Gerrit Updater [ 05/Nov/19 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36675
Subject: LU-12469 tests: stop running sanity test 230j
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d8e6320e599012bcdebae46124006a8ee6ed3a76

Comment by Gerrit Updater [ 06/Nov/19 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/36684
Subject: LU-12469 mdd: handle migrate case with SELinux
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2192fcce2f7469c4d78b398ecde9ffa1faf1995f

Comment by Sebastien Buisson [ 07/Nov/19 ]

The stack trace reported for the crash in sanity test_230j is quite similar to the one for the crash in test_230b, and is most likely happening because of SELinux.

As can be seen in test results for patch https://review.whamcloud.com/36684, all sanity tests in the 230 family are passing now. So I do not think patch https://review.whamcloud.com/36675 to add sanity 230j to the exception list is necessary.

Comment by Bruno Faccini (Inactive) [ 12/Nov/19 ]

+1 during sanity/test_230e with latest master @https://testing.whamcloud.com/test_sets/f7c46c1e-055e-11ea-b934-52540065bddc

Comment by Bruno Faccini (Inactive) [ 12/Nov/19 ]

+1 during sanity/test_230l with latest master @https://testing.whamcloud.com/test_sets/af8d3b8c-0569-11ea-b934-52540065bddc

Comment by Gerrit Updater [ 14/Dec/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36684/
Subject: LU-12469 mdd: handle migrate case with SELinux
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 8a60fa2e2fcd28c2772d90e76d36430d30b01905

Comment by Peter Jones [ 14/Dec/19 ]

Landed for 2.14

Comment by Gerrit Updater [ 16/Dec/19 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/37031
Subject: LU-12469 mdd: handle migrate case with SELinux
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: a977e99219688feb4e1e60ea687f02ef2008fab3

Comment by Gerrit Updater [ 03/Jan/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37031/
Subject: LU-12469 mdd: handle migrate case with SELinux
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: f9340d5eb4c8e74909a1aec206844dc4d0f30ec1
