[LU-17089] Bug in the barrier code could cause barrier freeze to fail every time Created: 05/Sep/23 Updated: 28/Sep/23 Resolved: 28/Sep/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Tim Day | Assignee: | Tim Day |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The barrier code has a bug that can cause barrier freeze to fail every time. A barrier freeze is issued before attempting a filesystem backup, but it fails repeatedly because of an issue in the mdd_trans_create() function. Barrier entry increments the global counter barrier_writer, but the counter is not decremented if mdd_child_ops() returns an error. The freeze can only proceed once barrier_writer drops to 0, so an increment leaked on this error path keeps the counter above 0 and blocks the freeze. |
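
The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not the Lustre source: the helper names (barrier_entry, barrier_exit, child_ops_lookup, trans_create_leaky, trans_create_fixed, barrier_freeze) are hypothetical stand-ins, only the barrier_writer counter name comes from the description, and the sketch collapses the whole transaction lifetime into one function for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the global writer count named in the description. */
static int barrier_writer;

/* Hypothetical helpers: entering and leaving the barrier adjust the count. */
static void barrier_entry(void)
{
    barrier_writer++;
}

static void barrier_exit(void)
{
    assert(barrier_writer > 0);
    barrier_writer--;
}

/* Hypothetical stand-in for a lookup step that can fail. */
static int child_ops_lookup(bool fail)
{
    return fail ? -ENODEV : 0;
}

/* Leaky pattern: the error path returns without undoing barrier_entry(). */
static int trans_create_leaky(bool fail)
{
    int rc;

    barrier_entry();
    rc = child_ops_lookup(fail);
    if (rc != 0)
        return rc;      /* barrier_writer stays incremented */

    /* ... the transaction would run here (assumed) ... */
    barrier_exit();
    return 0;
}

/* Balanced pattern: every exit path pairs the entry with an exit. */
static int trans_create_fixed(bool fail)
{
    int rc;

    barrier_entry();
    rc = child_ops_lookup(fail);
    if (rc != 0) {
        barrier_exit(); /* undo the increment before reporting the error */
        return rc;
    }

    /* ... the transaction would run here (assumed) ... */
    barrier_exit();
    return 0;
}

/* Hypothetical freeze check: succeeds only when no writers remain. */
static int barrier_freeze(void)
{
    return barrier_writer == 0 ? 0 : -EBUSY;
}
```

In this sketch, one failed call to trans_create_leaky() leaves barrier_writer at 1, so barrier_freeze() keeps returning -EBUSY until the counter is reset. The same failure through trans_create_fixed() leaves the counter at 0 and the freeze can proceed.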
| Comments |
| Comment by Gerrit Updater [ 05/Sep/23 ] |
|
"Timothy Day <timday@amazon.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52275 |
| Comment by Gerrit Updater [ 28/Sep/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52275/ |
| Comment by Peter Jones [ 28/Sep/23 ] |
|
Landed for 2.16 |