[LU-12003] Access to invalid semaphore in osd_trunc_unlock_all (ldiskfs)
Created: 25/Feb/19  Updated: 22/Sep/21  Resolved: 20/May/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: Lustre 2.14.0, Lustre 2.12.7

Type: Bug
Priority: Major
Reporter: Oleg Drokin
Assignee: Alex Zhuravlev
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

I am seeing these crashes mostly in racer, but sometimes in other tests too, where all of a sudden the transaction unlock steps on an invalid memory pointer and explodes.

Here's a sample from racer:

[ 1956.129169] BUG: unable to handle kernel paging request at ffff88031ba1ce50
[ 1956.161700] IP: [<ffffffff810ba263>] up_read+0x13/0x30
[ 1956.161700] PGD 241b067 PUD 241e067 PMD 33ff04067 PTE 800000031ba1c060
[ 1956.169565] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[ 1956.169565] Modules linked in: loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) jbd2 mbcache lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) dm_flakey dm_mod libcfs(OE) crc_t10dif crct10dif_generic crct10dif_common i2c_piix4 virtio_console virtio_balloon pcspkr ip_tables rpcsec_gss_krb5 ata_generic pata_acpi drm_kms_helper ttm drm drm_panel_orientation_quirks ata_piix serio_raw i2c_core virtio_blk libata floppy
[ 1956.179867] CPU: 5 PID: 16415 Comm: mdt02_005 Kdump: loaded Tainted: P           OE  ------------   3.10.0-7.6-debug #1
[ 1956.179867] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 1956.179867] task: ffff880285c8c8c0 ti: ffff880285c90000 task.ti: ffff880285c90000
[ 1956.193466] RIP: 0010:[<ffffffff810ba263>]  [<ffffffff810ba263>] up_read+0x13/0x30
[ 1956.193466] RSP: 0018:ffff880285c938d8  EFLAGS: 00010202
[ 1956.193466] RAX: ffff88031ba1ce50 RBX: ffff880226d79200 RCX: 0000000000000000
[ 1956.193466] RDX: ffffffffffffffff RSI: 000000000000006b RDI: ffff88031ba1ce50
[ 1956.193466] RBP: ffff880285c938d8 R08: ffff880226d79e80 R09: ffff880226d79e80
[ 1956.193466] R10: ffff8802d8880000 R11: ffff8802d88806a8 R12: ffff88021943be40
[ 1956.193466] R13: ffff880226d79200 R14: ffff880285c93948 R15: ffff8802d88806b0
[ 1956.193466] FS:  0000000000000000(0000) GS:ffff88033db40000(0000) knlGS:0000000000000000
[ 1956.193466] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1956.225299] CR2: ffff88031ba1ce50 CR3: 00000002210e4000 CR4: 00000000000006e0
[ 1956.225299] Call Trace:
[ 1956.225299]  [<ffffffffa0ae6fb5>] osd_trunc_unlock_all+0x35/0x150 [osd_ldiskfs]
[ 1956.225299]  [<ffffffffa0accda5>] osd_trans_stop+0x205/0x820 [osd_ldiskfs]
[ 1956.316681]  [<ffffffffa060bf23>] dt_trans_stop+0x13/0x30 [ptlrpc]
[ 1956.317199]  [<ffffffffa060f82d>] top_trans_stop+0x30d/0xa10 [ptlrpc]
[ 1956.317199]  [<ffffffffa0ce9b3c>] lod_trans_stop+0x25c/0x340 [lod]
[ 1956.317199]  [<ffffffffa060e6ba>] ? top_trans_start+0x34a/0x960 [ptlrpc]
[ 1956.317199]  [<ffffffffa0bdf108>] mdd_trans_stop+0x28/0x16e [mdd]
[ 1956.317199]  [<ffffffffa0bd34a6>] mdd_attr_set+0x5e6/0xcf0 [mdd]
[ 1956.317199]  [<ffffffffa0594032>] ? lustre_msg_get_versions+0x22/0xf0 [ptlrpc]
[ 1956.317199]  [<ffffffffa0c41e44>] mdt_reint_setattr+0xad4/0x1510 [mdt]
[ 1956.317199]  [<ffffffffa0c32c71>] ? mdt_root_squash+0x21/0x430 [mdt]
[ 1956.317199]  [<ffffffffa0c325a2>] ? ucred_set_audit_enabled.isra.13+0x22/0x60 [mdt]
[ 1956.369343]  [<ffffffffa0c45c80>] mdt_reint_rec+0x80/0x210 [mdt]
[ 1956.369343]  [<ffffffffa0c22890>] mdt_reint_internal+0x790/0xb30 [mdt]
[ 1956.369343]  [<ffffffffa0c2a9e7>] ? mdt_thread_info_init+0xa7/0x1e0 [mdt]
[ 1956.369343]  [<ffffffffa0c2d9b7>] mdt_reint+0x67/0x140 [mdt]
[ 1956.369343]  [<ffffffffa05fc2a5>] tgt_request_handle+0x915/0x1610 [ptlrpc]
[ 1956.369343]  [<ffffffffa01a1fa7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[ 1956.369343]  [<ffffffffa05a13d9>] ptlrpc_server_handle_request+0x259/0xad0 [ptlrpc]
[ 1956.369343]  [<ffffffff810bfbd8>] ? __wake_up_common+0x58/0x90
[ 1956.369343]  [<ffffffff813fb7bb>] ? do_raw_spin_unlock+0x4b/0x90
[ 1956.369343]  [<ffffffffa05a53bc>] ptlrpc_main+0xb7c/0x22c0 [ptlrpc]
[ 1956.369343]  [<ffffffff813fb7bb>] ? do_raw_spin_unlock+0x4b/0x90
[ 1956.369343]  [<ffffffff817b99fe>] ? _raw_spin_unlock_irq+0xe/0x30
[ 1956.369343]  [<ffffffff813fb7bb>] ? do_raw_spin_unlock+0x4b/0x90
[ 1956.369343]  [<ffffffffa05a4840>] ? ptlrpc_register_service+0xfb0/0xfb0 [ptlrpc]
[ 1956.369343]  [<ffffffff810b4ed4>] kthread+0xe4/0xf0
[ 1956.369343]  [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140
[ 1956.369343]  [<ffffffff817c4c77>] ret_from_fork_nospec_begin+0x21/0x21
[ 1956.369343]  [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140

Here's a sample from sanity:

[13322.028331] BUG: unable to handle kernel paging request at ffff880243adce50
[13322.028331] IP: [<ffffffff810ba263>] up_read+0x13/0x30
[13322.028331] PGD 241b067 PUD 33edfb067 PMD 33eddd067 PTE 8000000243adc060
[13322.028331] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[13322.028331] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey dm_mod brd ext4 loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) jbd2 mbcache crc_t10dif crct10dif_generic crct10dif_common pcspkr virtio_balloon virtio_console i2c_piix4 ip_tables rpcsec_gss_krb5 ata_generic pata_acpi drm_kms_helper ttm drm drm_panel_orientation_quirks ata_piix i2c_core serio_raw virtio_blk libata floppy [last unloaded: libcfs]
[13322.028331] CPU: 8 PID: 17372 Comm: mdt04_001 Kdump: loaded Tainted: P        W  OE  ------------   3.10.0-7.6-debug #1
[13322.028331] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[13322.028331] task: ffff88027d992500 ti: ffff88024c3ec000 task.ti: ffff88024c3ec000
[13322.028331] RIP: 0010:[<ffffffff810ba263>]  [<ffffffff810ba263>] up_read+0x13/0x30
[13322.028331] RSP: 0018:ffff88024c3ef8d8  EFLAGS: 00010202
[13322.028331] RAX: ffff880243adce50 RBX: ffff88016832b640 RCX: 0000000000000000
[13322.028331] RDX: ffffffffffffffff RSI: 000000000000006b RDI: ffff880243adce50
[13322.028331] RBP: ffff88024c3ef8d8 R08: ffff88029bb13f98 R09: ffff880295b13e60
[13322.028331] R10: ffff8802d9c12000 R11: ffff8802d9c12228 R12: ffff8800196687a0
[13322.028331] R13: ffff88016832b640 R14: ffff88024c3ef948 R15: ffff8802d9c12230
[13322.028331] FS:  0000000000000000(0000) GS:ffff88033dc00000(0000) knlGS:0000000000000000
[13322.028331] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13322.028331] CR2: ffff880243adce50 CR3: 000000006bb86000 CR4: 00000000000006e0
[13322.028331] Call Trace:
[13322.028331]  [<ffffffffa0c1bfb5>] osd_trunc_unlock_all+0x35/0x150 [osd_ldiskfs]
[13322.028331]  [<ffffffffa0c01da5>] osd_trans_stop+0x205/0x820 [osd_ldiskfs]
[13322.028331]  [<ffffffffa0b6c300>] ? ldiskfs_get_acl+0x400/0x410 [ldiskfs]
[13322.028331]  [<ffffffffa0626f23>] dt_trans_stop+0x13/0x30 [ptlrpc]
[13322.028331]  [<ffffffffa062a82d>] top_trans_stop+0x30d/0xa10 [ptlrpc]
[13322.028331]  [<ffffffffa0d99b3c>] lod_trans_stop+0x25c/0x340 [lod]
[13322.028331]  [<ffffffffa0938108>] mdd_trans_stop+0x28/0x16e [mdd]
[13322.028331]  [<ffffffffa092c4a6>] mdd_attr_set+0x5e6/0xcf0 [mdd]
[13322.028331]  [<ffffffffa05af032>] ? lustre_msg_get_versions+0x22/0xf0 [ptlrpc]
[13322.028331]  [<ffffffffa0cf1e44>] mdt_reint_setattr+0xad4/0x1510 [mdt]
[13322.028331]  [<ffffffffa0ce2c71>] ? mdt_root_squash+0x21/0x430 [mdt]
[13322.028331]  [<ffffffffa0ce25a2>] ? ucred_set_audit_enabled.isra.13+0x22/0x60 [mdt]
[13322.028331]  [<ffffffffa0cf5c80>] mdt_reint_rec+0x80/0x210 [mdt]
[13322.028331]  [<ffffffffa0cd2890>] mdt_reint_internal+0x790/0xb30 [mdt]
[13322.028331]  [<ffffffffa0cda9e7>] ? mdt_thread_info_init+0xa7/0x1e0 [mdt]
[13322.028331]  [<ffffffffa0cdd9b7>] mdt_reint+0x67/0x140 [mdt]
[13322.028331]  [<ffffffffa06172a5>] tgt_request_handle+0x915/0x1610 [ptlrpc]
[13322.028331]  [<ffffffffa0214fa7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[13322.028331]  [<ffffffffa05bc3d9>] ptlrpc_server_handle_request+0x259/0xad0 [ptlrpc]
[13322.028331]  [<ffffffff810bfbd8>] ? __wake_up_common+0x58/0x90
[13322.028331]  [<ffffffff813fb7bb>] ? do_raw_spin_unlock+0x4b/0x90
[13322.028331]  [<ffffffffa05c03bc>] ptlrpc_main+0xb7c/0x22c0 [ptlrpc]
[13322.028331]  [<ffffffff813fb7bb>] ? do_raw_spin_unlock+0x4b/0x90
[13322.028331]  [<ffffffffa05bf840>] ? ptlrpc_register_service+0xfb0/0xfb0 [ptlrpc]
[13322.028331]  [<ffffffff810b4ed4>] kthread+0xe4/0xf0
[13322.028331]  [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140
[13322.028331]  [<ffffffff817c4c77>] ret_from_fork_nospec_begin+0x21/0x21
[13322.028331]  [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140
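
Both traces report "Oops: 0002" on a kernel built with DEBUG_PAGEALLOC, and in both the PTE value has its present bit (bit 0) clear. As a reading aid, here is a minimal sketch that decodes the low bits of an x86 page-fault error code. The struct and function names (pf_error, pf_decode, pf_print) are made up for this illustration and are not kernel or Lustre APIs.

```c
#include <stdbool.h>
#include <stdio.h>

/* Decoded view of the low bits of an x86 page-fault error code.
 * Hypothetical helper type for illustration only. */
struct pf_error {
    bool protection;  /* bit 0: 1 = protection violation, 0 = page not present */
    bool write;       /* bit 1: 1 = write access, 0 = read access */
    bool user;        /* bit 2: 1 = fault in user mode, 0 = kernel mode */
    bool reserved;    /* bit 3: 1 = reserved bit set in a paging entry */
    bool instr_fetch; /* bit 4: 1 = fault on an instruction fetch */
};

/* Split the error code printed after "Oops:" into its flag bits. */
struct pf_error pf_decode(unsigned long code)
{
    struct pf_error e = {
        .protection  = (code & 0x01) != 0,
        .write       = (code & 0x02) != 0,
        .user        = (code & 0x04) != 0,
        .reserved    = (code & 0x08) != 0,
        .instr_fetch = (code & 0x10) != 0,
    };
    return e;
}

/* Print a one-line summary of the decoded error code. */
void pf_print(unsigned long code)
{
    struct pf_error e = pf_decode(code);

    printf("%04lx: %s, %s access, %s mode\n", code,
           e.protection ? "protection violation" : "page not present",
           e.write ? "write" : "read",
           e.user ? "user" : "kernel");
}
```

Calling pf_print(0x0002) reports a write to a not-present page in kernel mode. Since DEBUG_PAGEALLOC unmaps pages when they are freed, that is consistent with up_read() touching a semaphore inside memory that had already been freed, although the traces alone do not prove it.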


 Comments   
Comment by Gerrit Updater [ 09/Jan/20 ]

Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37170
Subject: LU-12003 osd: take reference to object in osd_trunc_lock()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7e10de74bb3db04d34913cb26c1917ab6f9a0d26
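
The patch subject describes the fix as taking a reference to the object when the lock is taken. The general pattern can be sketched in plain userspace C as below. This is not the Lustre code: every name here (obj, txn, txn_lock, txn_unlock_all, trunc_ref_demo) is hypothetical, the counter is not atomic, and the "lock" is a plain integer standing in for a read-write semaphore embedded in the object.

```c
#include <stdlib.h>

/* Hypothetical object that embeds the lock a transaction will hold. */
struct obj {
    int refcount; /* holders keeping the object alive (not atomic: sketch only) */
    int readers;  /* stand-in for an embedded read-write semaphore */
};

/* Hypothetical record of one lock held by a transaction. */
struct held_lock {
    struct obj *obj;
    struct held_lock *next;
};

struct txn {
    struct held_lock *locks; /* locks to release when the transaction stops */
};

static int objects_freed; /* counts frees so the demo can observe them */

static struct obj *obj_alloc(void)
{
    struct obj *o = calloc(1, sizeof(*o));

    if (o)
        o->refcount = 1; /* reference owned by the caller */
    return o;
}

static void obj_get(struct obj *o)
{
    o->refcount++;
}

static void obj_put(struct obj *o)
{
    if (--o->refcount == 0) {
        free(o);
        objects_freed++;
    }
}

/* Take the lock and pin the object so the lock memory outlives the caller. */
static int txn_lock(struct txn *t, struct obj *o)
{
    struct held_lock *h = malloc(sizeof(*h));

    if (!h)
        return -1;
    obj_get(o);   /* the reference this sketch is about */
    o->readers++; /* stand-in for down_read() */
    h->obj = o;
    h->next = t->locks;
    t->locks = h;
    return 0;
}

/* Release every recorded lock, then drop the reference taken in txn_lock(). */
static void txn_unlock_all(struct txn *t)
{
    while (t->locks) {
        struct held_lock *h = t->locks;

        t->locks = h->next;
        h->obj->readers--; /* stand-in for up_read(); object is still pinned */
        obj_put(h->obj);
        free(h);
    }
}

/* Returns 0 if the object survives until the transaction unlocks it. */
int trunc_ref_demo(void)
{
    struct txn t = { NULL };
    struct obj *o = obj_alloc();

    if (!o)
        return -1;
    if (txn_lock(&t, o) != 0) {
        obj_put(o);
        return -1;
    }
    obj_put(o); /* caller drops its reference before the transaction stops */
    if (objects_freed != 0)
        return -1; /* would mean the lock now lives in freed memory */
    txn_unlock_all(&t);
    return objects_freed == 1 ? 0 : -1;
}
```

Without the obj_get() call in txn_lock(), the caller's obj_put() would free the object while the transaction still lists its lock, and the later unlock would write into freed memory, which is the kind of access the oopses in the description show.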

Comment by Gerrit Updater [ 20/May/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37170/
Subject: LU-12003 osd: take reference to object in osd_trunc_lock()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 4fcb9081378f6ad0b7d3cf4105cf5fb2d506966f

Comment by Peter Jones [ 20/May/20 ]

Landed for 2.14

Comment by Gerrit Updater [ 05/Nov/20 ]

James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/40547
Subject: LU-12003 osd: take reference to object in osd_trunc_lock()
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 16e48ae7bbc41db6f9cfc053844bc4e6f1323cdf

Comment by Gerrit Updater [ 04/Mar/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40547/
Subject: LU-12003 osd: take reference to object in osd_trunc_lock()
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: d5ac004ba18d290b8a8625c169bd57a8742b0a73
