[LU-12491] large-scale: crash in lu_env_remove() RIP: memcmp+0x9/0x50
Created: 29/Jun/19
Updated: 26/Jul/19
Resolved: 12/Jul/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: Lustre 2.13.0, Lustre 2.12.3

Type: Bug
Priority: Major
Reporter: Maloo
Assignee: Alex Zhuravlev
Resolution: Fixed
Votes: 0
Labels: None

Attachments: c7-08-dmesg.txt, c7-10-dmesg.txt, c8-01-dmesg.txt
Issue Links:
Duplicate
duplicates LU-12497 sanity-lfsck test_35: crash in lu_env... Resolved
Related
is related to LU-12034 env allocation in ptlrpc_set_wait() c... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for jianyu <yujian@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/0f90891c-9a35-11e9-b26a-52540065bddc

[20823.002620] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds4
[20823.428725] general protection fault: 0000 [#1] SMP 
[20823.429775] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc dm_mod ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr virtio_balloon parport_pc parport i2c_piix4 ip_tables ext4 mbcache jbd2 virtio_blk ata_generic pata_acpi crct10dif_pclmul crct10dif_common crc32c_intel serio_raw floppy ata_piix 8139too
[20823.443544]  libata virtio_pci virtio_ring virtio 8139cp mii [last unloaded: dm_flakey]
[20823.444878] CPU: 1 PID: 19396 Comm: mdt_out00_003 Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.21.3.el7_lustre.x86_64 #1
[20823.446885] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[20823.447816] task: ffff966f9f1d6180 ti: ffff966f89258000 task.ti: ffff966f89258000
[20823.449023] RIP: 0010:[<ffffffffad380299>]  [<ffffffffad380299>] memcmp+0x9/0x50
[20823.450254] RSP: 0018:ffff966f8925bd98  EFLAGS: 00010202
[20823.451115] RAX: 00000000ffffffe0 RBX: 5a5a5a5a5a5a5a5a RCX: 0000000000000008
[20823.452265] RDX: 0000000000000008 RSI: ffff966f8925bdb0 RDI: 5a5a5a5a5a5a5a52
[20823.453414] RBP: ffff966f8925bd98 R08: ffffffffadd589e0 R09: 0000000000000000
[20823.454568] R10: ffff966fbfb80f20 R11: ffffffffffffffec R12: ffff966fb7845000
[20823.455711] R13: fffffffffffffff8 R14: 5a5a5a5a5a5a5a52 R15: 0000000000000000
[20823.456859] FS:  0000000000000000(0000) GS:ffff966fbfd00000(0000) knlGS:0000000000000000
[20823.458155] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20823.459081] CR2: 00007fd5f82ab630 CR3: 000000005c520000 CR4: 00000000000606e0
[20823.460224] Call Trace:
[20823.460790]  [<ffffffffc0a4eee7>] lu_env_remove+0x147/0x380 [obdclass]
[20823.462141]  [<ffffffffc0d3cb10>] ptlrpc_main+0x5f0/0x1560 [ptlrpc]
[20823.463174]  [<ffffffffad0d09f0>] ? finish_task_switch+0x50/0x1c0
[20823.464199]  [<ffffffffc0d3c520>] ? ptlrpc_register_service+0xfa0/0xfa0 [ptlrpc]
[20823.465426]  [<ffffffffad0c1da1>] kthread+0xd1/0xe0
[20823.466226]  [<ffffffffad0c1cd0>] ? insert_kthread_work+0x40/0x40
[20823.467216]  [<ffffffffad775c37>] ret_from_fork_nospec_begin+0x21/0x21
[20823.468271]  [<ffffffffad0c1cd0>] ? insert_kthread_work+0x40/0x40
[20823.469259] Code: 66 90 55 31 c9 48 85 d2 48 89 f8 48 89 e5 74 0f 66 90 48 89 34 c8 48 83 c1 01 48 39 d1 75 f3 5d c3 90 55 48 85 d2 48 89 e5 74 3c <0f> b6 07 0f b6 0e 29 c8 75 27 48 83 ea 01 31 c9 eb 1a 0f 1f 44 
[20823.474428] RIP  [<ffffffffad380299>] memcmp+0x9/0x50
[20823.475292]  RSP <ffff966f8925bd98>
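
The register dump above looks like a use-after-free: RBX holds 5a5a5a5a5a5a5a5a, and RDI and R14 hold that value minus 8 (R13 is -8), so memcmp() appears to have been handed a pointer read from memory filled with 0x5a. The Python sketch below flags such registers. It assumes 0x5a is the byte Lustre writes over freed allocations, which this ticket does not state. POISON_BYTE, MAX_OFFSET and the function names are illustrative and are not part of Lustre or any kernel tool.

```python
import re

# Register lines copied from the oops above (timestamps stripped).
OOPS_REGS = """\
RAX: 00000000ffffffe0 RBX: 5a5a5a5a5a5a5a5a RCX: 0000000000000008
RDX: 0000000000000008 RSI: ffff966f8925bdb0 RDI: 5a5a5a5a5a5a5a52
RBP: ffff966f8925bd98 R08: ffffffffadd589e0 R09: 0000000000000000
R10: ffff966fbfb80f20 R11: ffffffffffffffec R12: ffff966fb7845000
R13: fffffffffffffff8 R14: 5a5a5a5a5a5a5a52 R15: 0000000000000000
"""

POISON_BYTE = 0x5A  # assumption: fill byte written over freed memory
MAX_OFFSET = 4096   # assumption: largest field offset applied to a pointer

# A 64-bit pointer read from poisoned memory is the fill byte repeated 8 times.
POISON_WORD = int.from_bytes(bytes([POISON_BYTE]) * 8, "big")


def parse_registers(text):
    """Return {register name: value} for every 'Rxx: <16 hex digits>' pair."""
    pairs = re.findall(r"\b(R[A-Z0-9]{2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def poisoned_registers(regs):
    """Return {name: offset} for registers within MAX_OFFSET of the poison word."""
    return {
        name: value - POISON_WORD
        for name, value in regs.items()
        if abs(value - POISON_WORD) <= MAX_OFFSET
    }


if __name__ == "__main__":
    for name, offset in poisoned_registers(parse_registers(OOPS_REGS)).items():
        print(f"{name}: poison word {offset:+d}")
```

A hit on the register that carries the first memcmp() argument (RDI on x86_64) is a hint, not proof, that the compared object had already been freed.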


 Comments   
Comment by Alexey Lyashkov [ 04/Jul/19 ]

This is a regression introduced by LU-12034, aka https://review.whamcloud.com/#/c/34566/.

I think we should not make a release with regressions, especially ones that cause a panic.

Comment by Peter Jones [ 04/Jul/19 ]

Alex

Does this seem related to your recent change?

Peter

Comment by Alex Zhuravlev [ 04/Jul/19 ]

Peter, yes, it is. My bad; I've already submitted a patch.

Comment by Peter Jones [ 04/Jul/19 ]

ah I see - https://review.whamcloud.com/#/c/35038/ . I wonder why JIRA did not have a comment about that...

Comment by Gerrit Updater [ 08/Jul/19 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/35447
Subject: LU-12491 obdclass: add comment for rcu handling in lu_env_remove
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4cb6fbab52bd7ce0b34098c8e7dddf4f0778ebd6

Comment by Alex Zhuravlev [ 11/Jul/19 ]

Shaun, please clarify: did you hit this with the latest version of the patch or with the previous one?
Thanks for testing!

Comment by Gerrit Updater [ 12/Jul/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35038/
Subject: LU-12491 obdclass: use RCU to release lu_env_item
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 87306c22e4b977356f4857d5f750447639d89c26

Comment by Gerrit Updater [ 12/Jul/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35447/
Subject: LU-12491 obdclass: add comment for rcu handling in lu_env_remove
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 709fbe6ee54aa2e601237a6981db3d42a8a719cd

Comment by Peter Jones [ 12/Jul/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 12/Jul/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35487
Subject: LU-12491 obdclass: use RCU to release lu_env_item
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: a0363369b2c25ef0146a40c13e7aeaafe2b060e3

Comment by Gerrit Updater [ 12/Jul/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35488
Subject: LU-12491 obdclass: add comment for rcu handling in lu_env_remove
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: c2b43c331e747e97e235e63f4f619ff24284972f

Comment by Gerrit Updater [ 26/Jul/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35487/
Subject: LU-12491 obdclass: use RCU to release lu_env_item
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: cf2b060ccad73359151867ad419e36f746c9f241

Comment by Gerrit Updater [ 26/Jul/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35488/
Subject: LU-12491 obdclass: add comment for rcu handling in lu_env_remove
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: c579e6f6c28d0480bd4a815d9ad9134d3c913343
