[LU-7853] Fixes bitfield in lod qos code Created: 07/Mar/16 Updated: 15/Jul/21 Resolved: 26/Feb/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Rahul Deshmukh (Inactive) | Assignee: | Jian Yu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
Replaced bitfields viz. lq_dirty, lq_same_space, and lq_reset to named bits and use atomic functions. As those are more safe. |
| Comments |
| Comment by Gerrit Updater [ 07/Mar/16 ] |
|
Rahul Deshmukh (rahul.deshmukh@seagate.com) uploaded a new patch: http://review.whamcloud.com/18812 |
| Comment by Oleg Drokin [ 07/Mar/16 ] |
|
So were tyou hittign any real problems with the code that you feel you needed the atomic bit operations? |
| Comment by Rahul Deshmukh (Inactive) [ 08/Mar/16 ] |
|
Yes, problem was hit on older code i.e. version 2.5 and it was seen for lq_dirty. So as a fixed other bitfield with those. Actually lq_dirty is not present in latest code but still to be safer side ported this patch. |
| Comment by Andreas Dilger [ 16/Jan/17 ] |
|
I agree with Oleg that "more safe" doesn't provide justification for landing the patch. I can imagine that there is some problem with these bitfields in the code, though it seems lq_rw_sem is held when lq_dirty and lq_same_space are accessed and modified, except in lod_qos_priofree_seq_write() and lod_qos_thresholdrr_seq_write(). Since there is already a lock protecting all of these accesses, adding atomic bit operations is just overhead unless there is a reason to do so. Please provide more details about the original bug so there is justification for the patch to be landed. |
| Comment by Alexander Zarochentsev [ 22/Jan/19 ] |
|
Andreas, there was an old crash with "ASSERTION( lov->lov_qos.lq_statfs_in_progress )" failure: Oct 11 01:24:25 lus207n03 kernel: LustreError: 140684:0:(lov_qos.c:1136:qos_statfs_done()) ASSERTION( lov->lov_qos.lq_statfs_in_progress ) failed: Oct 11 01:24:25 lus207n03 kernel: LustreError: 140684:0:(lov_qos.c:1136:qos_statfs_done()) LBUG Oct 11 01:24:25 lus207n03 kernel: Pid: 140684, comm: ptlrpcd Oct 11 01:24:25 lus207n03 kernel: Oct 11 01:24:25 lus207n03 kernel: Call Trace: Oct 11 01:24:25 lus207n03 kernel: [<ffffffffa042c825>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] Oct 11 01:24:25 lus207n03 kernel: [<ffffffffa042ce27>] lbug_with_loc+0x47/0xb0 [libcfs] Oct 11 01:24:25 lus207n03 kernel: [<ffffffffa0a56829>] qos_statfs_done+0x89/0x90 [lov] Oct 11 01:24:25 lus207n03 kernel: [<ffffffffa0a4a166>] cb_statfs_update+0x346/0x580 [lov] Oct 11 01:24:25 lus207n03 kernel: [<ffffffffa09cdcb6>] osc_statfs_interpret+0xd6/0x400 [osc] Oct 11 01:24:25 lus207n03 kernel: [<ffffffffa073171b>] ptlrpc_check_set+0x29b/0x1d50 [ptlrpc] Oct 11 01:24:25 lus207n03 kernel: [<ffffffff814eaf6a>] ? schedule_timeout+0x19a/0x2e0 Oct 11 01:24:25 lus207n03 kernel: [<ffffffff8107c180>] ? process_timeout+0x0/0x10 Oct 11 01:24:25 lus207n03 kernel: [<ffffffffa07636b0>] ptlrpcd_check+0x1a0/0x230 [ptlrpc] Oct 11 01:24:25 lus207n03 kernel: [<ffffffffa0763a43>] ptlrpcd+0x303/0x370 [ptlrpc] Oct 11 01:24:25 lus207n03 kernel: [<ffffffff8105f9f0>] ? default_wake_function+0x0/0x20 Oct 11 01:24:25 lus207n03 kernel: [<ffffffffa0763740>] ? ptlrpcd+0x0/0x370 [ptlrpc] Oct 11 01:24:25 lus207n03 kernel: [<ffffffff81090806>] kthread+0x96/0xa0 Oct 11 01:24:25 lus207n03 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20 Oct 11 01:24:25 lus207n03 kernel: [<ffffffff81090770>] ? kthread+0x0/0xa0 however the assertion doesn't exist in recent Lustre code. a wrong update of bitfield flags still possible due to an unprotected access through lod_qos_thresholdrr_seq_write() , though w/o a crash, just some updates to lq_reset/lq_same_space can be lost. |
| Comment by Andreas Dilger [ 22/Jan/19 ] |
|
It would be good to include such explanation into the commit message of the patch, so that it is clear why it should land. |
| Comment by Gerrit Updater [ 26/Feb/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/18812/ |
| Comment by Peter Jones [ 26/Feb/21 ] |
|
Landed for 2.15 |