[LU-11851] kernel BUG at ...ldiskfs/htree_lock.c:665 Created: 10/Jan/19  Updated: 07/Jul/20  Resolved: 25/May/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0, Lustre 2.12.3

Type: Bug Priority: Critical
Reporter: Oleg Drokin Assignee: Yang Sheng
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
Related
is related to LU-13054 MDS kernel BUG at ldiskfs/htree_lock.... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

It looks like something broke in our pdirops support on the ldiskfs side. There are several customer reports and I am seeing this in my testing as well.

The crash looks like this:

[11832.408445] ------------[ cut here ]------------
[11832.452194] kernel BUG at /home/green/git/lustre-release/ldiskfs/htree_lock.c:665!
[11832.452194] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[11832.452194] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey dm_mod loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) jbd2 mbcache crc_t10dif crct10dif_generic crct10dif_common pcspkr virtio_console virtio_balloon i2c_piix4 ip_tables rpcsec_gss_krb5 ata_generic pata_acpi drm_kms_helper ttm drm drm_panel_orientation_quirks ata_piix i2c_core serio_raw virtio_blk libata floppy [last unloaded: libcfs]
[11832.452194] CPU: 5 PID: 3350 Comm: mdt02_004 Kdump: loaded Tainted: P           OE  ------------   3.10.0-7.6-debug #1
[11832.452194] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[11832.452194] task: ffff8803237c87c0 ti: ffff88007b130000 task.ti: ffff88007b130000
[11832.452194] RIP: 0010:[<ffffffffa0bae653>]  [<ffffffffa0bae653>] htree_unlock_internal.isra.8+0x133/0x140 [ldiskfs]
[11832.452194] RSP: 0000:ffff88007b1339c0  EFLAGS: 00010246
[11832.452194] RAX: ffff8802f2c68744 RBX: ffff88008e984e00 RCX: 0000000000000000
[11832.452194] RDX: 0000000000000000 RSI: ffff88008e984e1c RDI: ffff8802f2c68740
[11832.452194] RBP: ffff88007b1339f8 R08: 0000000000000058 R09: ffff88029b9686e0
[11832.452194] R10: 0000000000000000 R11: ffff88029b9686d8 R12: 0000000000000000
[11832.502820] R13: ffff8802f2c68740 R14: ffff88008e984e00 R15: ffff880073e54000
[11832.502820] FS:  0000000000000000(0000) GS:ffff88033db40000(0000) knlGS:0000000000000000
[11832.502820] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11832.502820] CR2: 00007f4ccab8bb78 CR3: 00000000ae87a000 CR4: 00000000000006e0
[11832.502820] Call Trace:
[11832.502820]  [<ffffffffa0bafa22>] htree_unlock+0x52/0xc0 [ldiskfs]
[11832.502820]  [<ffffffffa0ac93d9>] osd_index_ea_lookup+0x3b9/0xec0 [osd_ldiskfs]
[11832.502820]  [<ffffffffa03e07de>] ? lu_ucred+0x1e/0x30 [obdclass]
[11832.502820]  [<ffffffffa0d95ad5>] lod_lookup+0x25/0x30 [lod]
[11832.502820]  [<ffffffffa0c51d72>] __mdd_lookup.isra.18+0x2b2/0x460 [mdd]
[11832.502820]  [<ffffffffa0c51fcf>] mdd_lookup+0xaf/0x170 [mdd]
[11832.502820]  [<ffffffffa0cd1cdf>] mdt_lookup_version_check+0x6f/0x2c0 [mdt]
[11832.502820]  [<ffffffffa0cd68f7>] mdt_reint_unlink+0x217/0x14d0 [mdt]
[11832.502820]  [<ffffffffa0cdd8d0>] mdt_reint_rec+0x80/0x210 [mdt]
[11832.502820]  [<ffffffffa0cba723>] mdt_reint_internal+0x6e3/0xab0 [mdt]
[11832.502820]  [<ffffffffa0cc2924>] ? mdt_thread_info_init+0xa4/0x1e0 [mdt]
[11832.502820]  [<ffffffffa0cc58e7>] mdt_reint+0x67/0x140 [mdt]
[11832.502820]  [<ffffffffa0663e65>] tgt_request_handle+0xaf5/0x1590 [ptlrpc]
[11832.502820]  [<ffffffffa0236fa7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[11832.502820]  [<ffffffffa0607f06>] ptlrpc_server_handle_request+0x256/0xad0 [ptlrpc]
[11832.502820]  [<ffffffff810bfbd8>] ? __wake_up_common+0x58/0x90
[11832.544484]  [<ffffffff813fb7bb>] ? do_raw_spin_unlock+0x4b/0x90
[11832.545188]  [<ffffffffa060bdf9>] ptlrpc_main+0xa99/0x1f60 [ptlrpc]
[11832.545188]  [<ffffffff810c32ed>] ? finish_task_switch+0x5d/0x1b0
[11832.545188]  [<ffffffff817b6cd0>] ? __schedule+0x410/0xa00
[11832.545188]  [<ffffffffa060b360>] ? ptlrpc_register_service+0xfb0/0xfb0 [ptlrpc]
[11832.545188]  [<ffffffff810b4ed4>] kthread+0xe4/0xf0
[11832.545188]  [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140
[11832.545188]  [<ffffffff817c4c77>] ret_from_fork_nospec_begin+0x21/0x21
[11832.545188]  [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140
[11832.545188] Code: 67 20 e8 f1 c3 51 e0 4c 3b 65 c8 49 8b 47 20 8b 4d d4 48 8d 70 e0 75 80 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 0f 0b <0f> 0b 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8d 
[11832.545188] RIP  [<ffffffffa0bae653>] htree_unlock_internal.isra.8+0x133/0x140 [ldiskfs]

The crash location is:

htree_unlock_internal(struct htree_lock *lck)
{
        struct htree_lock_head *lhead = lck->lk_head;
        struct htree_lock *tmp;
        struct htree_lock *tmp2;
        int granted = 0;
        int i;

        BUG_ON(lhead->lh_ngranted[lck->lk_mode] == 0);   <=========== HERE

so it's pretty clear we are trying to unlock a lock that has not been locked in this particular mode we are trying to unlock.

I did a debug-fixing patch that does not seem to help here: https://review.whamcloud.com/#/c/33068/
the suspicion was we were waking the waiter wrongly and it always assumed the wake up means the lock is granted.

I have some crashdumps but there's nothing clear there once the situation happens. The problem is first seen in my testing on Aug 16th and comes in waves for some reason.



 Comments   
Comment by Gerrit Updater [ 01/Feb/19 ]

Yang Sheng (ys@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34160
Subject: LU-11851 ldiskfs: debug for htree_lock
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6e37ea565b0ceab65f736da42620765493df1c59

Comment by James A Simmons [ 24/Apr/19 ]

 Is this a RHEL7 only thing?

Comment by Yang Sheng [ 24/Apr/19 ]

Yes, Looks like only saw it on RHEL7.

Comment by Oleg Drokin [ 13/May/19 ]

Hm, so I hit the crash with this patch with master-next, obviously it's different. have crashdump.

Of course it could be the reason is some other patch, but just wanted to note it anyway (there's a number of other crashes in this master-next iteration)

[ 1045.759027] Thread 6412(mdt04_001) was waken wrongly, dump last_wakee
[ 1045.765133] Pid: 1802201963, comm: kkkkkkkkkkkkkkkkkkkk 3.10.0-7.6-debug #1 SMP Wed Nov 7 21:55:08 EST 2018
[ 1045.769041] Call Trace:
[ 1045.770736] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 1045.771705] Modules linked in: loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) jbd2 mbcache lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) dm_flakey dm_mod libcfs(OE) crc_t10dif crct10dif_generic sb_edac edac_core iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr virtio_balloon virtio_console i2c_piix4 ip_tables rpcsec_gss_krb5 drm_kms_helper ata_generic ttm pata_acpi drm crct10dif_pclmul drm_panel_orientation_quirks crct10dif_common ata_piix crc32c_intel serio_raw virtio_blk i2c_core libata floppy
[ 1045.797106] CPU: 9 PID: 6412 Comm: mdt04_001 Kdump: loaded Tainted: P           OE  ------------   3.10.0-7.6-debug #1
[ 1045.814710] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 1045.814710] task: ffff8802faff6d00 ti: ffff8802fa9a0000 task.ti: ffff8802fa9a0000
[ 1045.814710] RIP: 0010:[<ffffffff81068264>]  [<ffffffff81068264>] __unwind_start+0x124/0x180
[ 1045.814710] RSP: 0018:ffff8802fa9a3700  EFLAGS: 00010287
[ 1045.814710] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8802fa9a3728 RCX: 0000000000000000
[ 1045.814710] RDX: 0000000000000000 RSI: ffff880084b82dc0 RDI: ffff8802fa9a3718
[ 1045.814710] RBP: ffff8802fa9a3718 R08: 0000000000000000 R09: 00000000000002b8
[ 1045.814710] R10: 0000000000000198 R11: 0000000000000198 R12: 0000000000000001
[ 1045.814710] R13: 6b6b6b6b6b6b6b6b R14: ffff880084b82dc0 R15: ffff8802faff6d00
[ 1045.814710] FS:  0000000000000000(0000) GS:ffff88033dc40000(0000) knlGS:0000000000000000
[ 1045.869183] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1045.869183] CR2: 00007fdba3d339b1 CR3: 0000000001c10000 CR4: 00000000001607e0
[ 1045.869183] Call Trace:
[ 1045.869183]  [<ffffffff8103c03e>] __save_stack_trace+0x5e/0x100
[ 1045.869183]  [<ffffffff8103c13c>] save_stack_trace_tsk+0x2c/0x40
[ 1045.869183]  [<ffffffffa02c77dc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 1045.869183]  [<ffffffffa02c7823>] libcfs_debug_dumpstack+0x13/0x30 [libcfs]
[ 1045.869183]  [<ffffffffa0b3c335>] htree_lock_internal+0x165/0x190 [ldiskfs]
[ 1045.869183]  [<ffffffffa0b3c3b4>] htree_lock_try+0x54/0xc0 [ldiskfs]
[ 1045.869183]  [<ffffffffa0b01a96>] ldiskfs_htree_lock+0x66/0xd0 [ldiskfs]
[ 1045.869183]  [<ffffffffa0bbbe0f>] osd_index_ea_insert+0x45f/0x12f0 [osd_ldiskfs]
[ 1045.869183]  [<ffffffffa0df4e8e>] lod_sub_insert+0x1be/0x340 [lod]
[ 1045.869183]  [<ffffffffa0de804c>] ? lod_xattr_set+0x41c/0xc60 [lod]
[ 1045.869183]  [<ffffffffa0dcf5f4>] lod_insert+0x24/0x30 [lod]
[ 1045.869183]  [<ffffffffa0c859d9>] __mdd_index_insert_only+0x1c9/0x280 [mdd]
[ 1045.869183]  [<ffffffffa0c96828>] ? mdd_write_unlock+0x38/0xd0 [mdd]
[ 1045.869183]  [<ffffffffa0cac9ca>] __mdd_index_insert+0xa5/0x11b [mdd]
[ 1045.869183]  [<ffffffffa0c91b04>] mdd_create+0xeb4/0x14e0 [mdd]
[ 1045.869183]  [<ffffffffa0d0e0ff>] mdt_create+0xb4f/0x1080 [mdt]
[ 1045.869183]  [<ffffffffa0456d01>] ? lprocfs_job_stats_log+0xd1/0x600 [obdclass]
[ 1045.869183]  [<ffffffffa0d0e79b>] mdt_reint_create+0x16b/0x360 [mdt]
[ 1045.869183]  [<ffffffffa0d138a0>] mdt_reint_rec+0x80/0x210 [mdt]
[ 1045.869183]  [<ffffffffa0cefb20>] mdt_reint_internal+0x780/0xb50 [mdt]
[ 1045.869183]  [<ffffffffa0cf7b07>] ? mdt_thread_info_init+0xa7/0x1e0 [mdt]
[ 1045.869183]  [<ffffffffa0cfab87>] mdt_reint+0x67/0x140 [mdt]
[ 1045.869183]  [<ffffffffa06e0b05>] tgt_request_handle+0x915/0x15c0 [ptlrpc]
[ 1045.869183]  [<ffffffffa02ce0f7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[ 1045.869183]  [<ffffffffa06832b9>] ptlrpc_server_handle_request+0x259/0xad0 [ptlrpc]
[ 1045.869183]  [<ffffffffa06872bc>] ptlrpc_main+0xb6c/0x20b0 [ptlrpc]
[ 1045.869183]  [<ffffffffa0686750>] ? ptlrpc_register_service+0xfe0/0xfe0 [ptlrpc]
[ 1045.869183]  [<ffffffff810b4ed4>] kthread+0xe4/0xf0
[ 1045.869183]  [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140
[ 1045.869183]  [<ffffffff817c4c5d>] ret_from_fork_nospec_begin+0x7/0x21
[ 1045.869183]  [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140
[ 1045.869183] Code: 03 00 00 00 00 5b 41 5c 41 5d 5d c3 0f 1f 00 65 48 8b 04 25 00 0f 01 00 48 39 c6 48 89 ef 0f 84 68 ff ff ff 48 8b 86 d0 06 00 00 <48> 8b 38 e9 59 ff ff ff 0f 1f 40 00 48 8d 7b 01 c6 03 00 41 b0 

crashdump and modules re in /export/crashdumps/192.168.123.135-2019-05-12-08:59:01/ on onyx-68 if you want to take a look.

Comment by Yang Sheng [ 13/May/19 ]

We cannot dump last_wakee in there, since the task already was released. The comm "kkkkkkkk" means memory has been poisoned by 0x6b. Looks like a process really waken up the 'mdt04_001' and then exited immediately. So it should be a user process? Then I think my previous patch is necessary in such case. I'll update it without dump_trace.

Comment by Gerrit Updater [ 25/May/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34160/
Subject: LU-11851 ldiskfs: reschedule for htree thread.
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 892f62c7a20d87954744c6c8937960379a870992

Comment by Peter Jones [ 25/May/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 30/May/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35006
Subject: LU-11851 ldiskfs: reschedule for htree thread.
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: a9c5021410e9f4e982fbc59dd669b9dc713a3838

Comment by Gerrit Updater [ 08/Jun/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35006/
Subject: LU-11851 ldiskfs: reschedule for htree thread.
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 4367d3d41dce8507cabcd76fb9bba8466f5518df

Generated at Sat Feb 10 02:47:29 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.