[LU-13388] racer crash: general protection fault: 0000 RIP: lod_obj_for_each_stripe+0xf7/0x2d0 [lod] Created: 25/Mar/20  Updated: 24/Jan/22  Resolved: 01/May/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: Improvement Priority: Minor
Reporter: Vitaly Fertman Assignee: Vitaly Fertman
Resolution: Fixed Votes: 0
Labels: None

Rank (Obsolete): 9223372036854775807

 Description   
[1646646.422771] RIP: 0010:[<ffffffffc139b847>]  [<ffffffffc139b847>] lod_obj_for_each_stripe+0xf7/0x2d0 [lod]
[1646646.435082] RSP: 0018:ffff880f9abe3a10  EFLAGS: 00010206
[1646646.442587] RAX: 5a5a5a5a5a5a5a5a RBX: 0000000000000001 RCX: 0000000000000003
[1646646.452116] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8807b7128308
[1646646.461625] RBP: ffff880f9abe3a50 R08: 0000000000000000 R09: 0000000000000000
[1646646.471118] R10: 000000000000000e R11: ffff880ff2b35800 R12: 0000000000000000
[1646646.480600] R13: ffff88077e4d8a18 R14: ffff880f9abd8800 R15: ffff880f9abe3a68
[1646646.490067] FS:  0000000000000000(0000) GS:ffff88101d980000(0000) knlGS:0000000000000000
[1646646.500607] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1646646.508522] CR2: 0000000000838024 CR3: 0000000001a0a000 CR4: 00000000000607e0
[1646646.517983] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1646646.527407] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[1646646.536812] Call Trace:
[1646646.540971]  [<ffffffffc139d93f>] lod_declare_attr_set+0x3af/0x690 [lod]
[1646646.549874]  [<ffffffffc138c520>] ? lod_gen_component_id+0x1b0/0x1b0 [lod]
[1646646.558963]  [<ffffffffc1420d2b>] mdd_attr_set+0x70b/0xda0 [mdd]
[1646646.567096]  [<ffffffffc0e2e532>] ? lustre_msg_get_versions+0x22/0xf0 [ptlrpc]
[1646646.576549]  [<ffffffffc1286000>] ? mdt_fail_write+0x180/0x190 [mdt]
[1646646.585006]  [<ffffffffc128e9fa>] mdt_attr_set+0x1ea/0x770 [mdt]
[1646646.593098]  [<ffffffffc0eba9a3>] ? tgt_fmd_put+0x23/0x130 [ptlrpc]
[1646646.601430]  [<ffffffffc128f879>] mdt_reint_setattr+0x679/0xe90 [mdt]
[1646646.609937]  [<ffffffffc127f302>] ? ucred_set_audit_enabled.isra.14+0x22/0x60 [mdt]
[1646646.619786]  [<ffffffffc1293ba3>] mdt_reint_rec+0x83/0x210 [mdt]
[1646646.627770]  [<ffffffffc126f3d3>] mdt_reint_internal+0x703/0xae0 [mdt]
[1646646.636319]  [<ffffffffc127a637>] mdt_reint+0x67/0x140 [mdt]
[1646646.643904]  [<ffffffffc0e9beaa>] tgt_request_handle+0x97a/0x15d0 [ptlrpc]
[1646646.652814]  [<ffffffffc09d3fd7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[1646646.661438]  [<ffffffffc0e3d996>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[1646646.671219]  [<ffffffff810c0bd4>] ? __wake_up+0x44/0x50
[1646646.678282]  [<ffffffffc0e4156c>] ptlrpc_main+0xbcc/0x1510 [ptlrpc]
[1646646.686498]  [<ffffffffc0e409a0>] ? ptlrpc_register_service+0x1010/0x1010 [ptlrpc]
[1646646.696123]  [<ffffffff810b4031>] kthread+0xd1/0xe0
[1646646.702722]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
[1646646.710671]  [<ffffffff816c455d>] ret_from_fork+0x5d/0xb0
[1646646.717822]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
[1646646.725727] Code: 0f 85 60 01 00 00 49 83 7f 08 00 74 57 66 41 83 7e 24 00 0f 84 b7 01 00 00 31 db 66 0f 1f 84 00 00 00 00 00 49 8b 46 68 48 63 d3 <48> 8b 14 d0 48 85 d2 74 22 4c 89 3c 24 41 89 d9 49 8b 47 08 45 
[1646646.749510] RIP  [<ffffffffc139b847>] lod_obj_for_each_stripe+0xf7/0x2d0 [lod]


 Comments   
Comment by Vitaly Fertman [ 25/Mar/20 ]

One thread is doing setattr and takes the UPDATE ibit lock (optionally together with the LOOKUP and PERM bits).
Another thread is doing layout_change and takes the LAYOUT ibit lock.
With SEL (Self-Extending Layouts), the layout change may change the number of components in the object, so the first thread's access to the component entries is no longer protected.
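The reason the two threads are not serialized can be sketched with a simplified model of inodebits lock compatibility. This is an illustration under stated assumptions, not Lustre's ldlm code:

- Both requests are treated as exclusive, so two locks on the same object conflict only when their bit masks overlap.
- The IBIT_* names and values are hypothetical, loosely modeled on Lustre's MDS_INODELOCK_* flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inodebits; names and values are illustrative only,
 * loosely modeled on Lustre's MDS_INODELOCK_* flags. */
enum {
	IBIT_LOOKUP = 0x01,
	IBIT_UPDATE = 0x02,
	IBIT_LAYOUT = 0x08,
	IBIT_PERM   = 0x10,
};

/* setattr: UPDATE, optionally with LOOKUP and PERM. */
#define SETATTR_BITS       (IBIT_UPDATE | IBIT_LOOKUP | IBIT_PERM)
/* layout_change as described in the report: LAYOUT only. */
#define LAYOUT_CHANGE_BITS (IBIT_LAYOUT)

/*
 * Simplified rule (assumption): both requests are exclusive, so two
 * inodebits locks on one object conflict iff their masks overlap.
 */
static bool ibits_conflict(unsigned int held, unsigned int wanted)
{
	assert(held != 0 && wanted != 0); /* an empty request locks nothing */
	return (held & wanted) != 0;
}
```

In this model, ibits_conflict(SETATTR_BITS, LAYOUT_CHANGE_BITS) returns false, so nothing stops layout_change from altering the component entries while setattr walks them. Adding IBIT_UPDATE to the layout_change mask makes the call return true.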
Existing operations that modify components, such as component add/delete (mdt_reint_setxattr), take the UPDATE ibit lock, so they did not have this problem.
The layout_change path needs to take the UPDATE ibit lock as well, where applicable.
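The effect of having layout_change also take the UPDATE lock can be illustrated with a userspace analogy. This is a hypothetical sketch, not the actual patch:

- A pthread mutex stands in for the UPDATE ibit lock.
- add_component() plays the role of a layout change that reallocates the component array.
- walk_components() plays the role of setattr iterating over the components.

Because both sides take the same lock, the walker never follows a pointer to an array that has already been freed. All struct and function names are invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for an object's composite layout. */
struct layout {
	pthread_mutex_t update_lock; /* analogue of the UPDATE ibit lock */
	int comp_cnt;                /* number of component entries */
	int *comps;                  /* component array, reallocated on resize */
};

/* setattr analogue: walk every component while holding update_lock. */
static long walk_components(struct layout *lo)
{
	long sum = 0;

	pthread_mutex_lock(&lo->update_lock);
	for (int i = 0; i < lo->comp_cnt; i++) {
		assert(lo->comps[i] == i); /* count and array stay consistent */
		sum += lo->comps[i];
	}
	pthread_mutex_unlock(&lo->update_lock);
	return sum;
}

/* layout_change analogue: grow the array by one entry under the same lock. */
static void add_component(struct layout *lo)
{
	pthread_mutex_lock(&lo->update_lock);
	int *fresh = malloc((size_t)(lo->comp_cnt + 1) * sizeof(*fresh));
	assert(fresh != NULL);
	for (int i = 0; i < lo->comp_cnt; i++)
		fresh[i] = lo->comps[i];
	fresh[lo->comp_cnt] = lo->comp_cnt;
	free(lo->comps); /* safe: no walker can still hold the old pointer */
	lo->comps = fresh;
	lo->comp_cnt++;
	pthread_mutex_unlock(&lo->update_lock);
}

static void *grower(void *arg)
{
	for (int i = 0; i < 1000; i++)
		add_component(arg);
	return NULL;
}

/* Run one resizing thread against repeated walks; return the final count. */
static int run_race(void)
{
	struct layout lo = { .comp_cnt = 0, .comps = NULL };
	pthread_t tid;
	int rc;

	pthread_mutex_init(&lo.update_lock, NULL);
	rc = pthread_create(&tid, NULL, grower, &lo);
	assert(rc == 0);
	for (int i = 0; i < 1000; i++)
		walk_components(&lo);
	pthread_join(tid, NULL);

	int cnt = lo.comp_cnt;
	free(lo.comps);
	pthread_mutex_destroy(&lo.update_lock);
	return cnt;
}
```

If the lock were dropped from add_component() only, mirroring layout_change holding just the LAYOUT bit, the walker could read comps after it was freed. That is the kind of access the oops above suggests.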

Comment by Gerrit Updater [ 25/Mar/20 ]

Vitaly Fertman (vitaly.fertman@hpe.com) uploaded a new patch: https://review.whamcloud.com/38069
Subject: LU-13388 lod: unprotected access to component entries
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4045b4490c753659ad779ed772a9807b3d201291

Comment by Gerrit Updater [ 01/May/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38069/
Subject: LU-13388 lod: unprotected access to component entries
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 6e2c8ef7267694125d87cfc8232dc36034de8b01

Comment by Peter Jones [ 01/May/20 ]

Landed for 2.14

Comment by Gerrit Updater [ 24/Jan/22 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/46280
Subject: LU-13388 lod: unprotected access to component entries
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 8941a00c81daf7a2436ec2ecb802fb3ad7463876

Generated at Sat Feb 10 03:00:51 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.