[LU-15114] ASSERTION( atomic_read(&d->opd_sync_changes) > 0 Created: 15/Oct/21 Updated: 20/Feb/22 Resolved: 30/Nov/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Alexander Zarochentsev | Assignee: | Alexander Zarochentsev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The customer sees repeatable MDS crashes with ASSERTION( atomic_read(&d->opd_sync_changes) > 0 ). The vmcore shows that sync_changes did overflow and wrap to a negative number, which triggered the assertion failure:
crash> osp_device ffff9a6f5ff4e000
struct osp_device {
opd_dt_dev = {
dd_lu_dev = {
ld_ref = {
counter = 371136
},
ld_type = 0xffffffffc14d1620 <osp_device_type>,
ld_ops = 0xffffffffc14c74c0 <osp_lu_ops>,
ld_site = 0xffff9a6fac302138,
...
crash> osp_device.opd_sync_changes ffff9a6f5ff4e000
opd_sync_changes = {
counter = -2147477073
}
crash>
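The negative value in the dump can be decoded with a little arithmetic. The following is a minimal user-space sketch, not kernel code: the helper names `wrapping_inc` and `increments_past_int_max` are hypothetical, and it assumes the counter is a 32-bit two's-complement integer that wrapped exactly once from the positive side.

```c
#include <stdint.h>

/*
 * Increment a 32-bit counter with two's-complement wraparound.
 * The addition is done in unsigned arithmetic because signed overflow
 * is undefined behaviour in plain C; the conversion back to int32_t
 * relies on the two's-complement behaviour of common compilers.
 */
static int32_t wrapping_inc(int32_t v)
{
    return (int32_t)((uint32_t)v + 1u);
}

/*
 * Given a counter that wrapped once, return how many increments
 * beyond INT32_MAX it has received (assumption: exactly one wrap).
 */
static int64_t increments_past_int_max(int32_t wrapped)
{
    /* Reinterpret the bit pattern as unsigned to recover the real count. */
    uint32_t real_count = (uint32_t)wrapped;

    return (int64_t)real_count - (int64_t)INT32_MAX;
}
```

Under these assumptions, `increments_past_int_max(-2147477073)` evaluates to 6576, i.e. the dumped counter corresponds to 2147490223 outstanding changes, just past the signed 32-bit limit of 2147483647.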
The whole OSP-sync system, which adds unlink/setattr llog records to per-OST llog files (two tiers: an llog catalog plus plain llogs), has no mechanism to prevent the llog catalogs and files from growing, so the sync_changes counter can eventually overflow. The counter is a signed 32-bit integer, so exceeding 2^31 - 1 (about 2.1 billion) turns it into a negative number. The llog catalog plus llog files also have a limited capacity to store llog records (approximately 64k * 64k at most). On a slow system, I can reproduce unbounded growth of sync_changes by running a simple program that changes the uid of an open file in an endless loop:
[root@cslmo2302 ~]# while sleep 10; do lctl get_param osp.*.sync_changes ; done
lctl set_param osp.*.max_rpcs_in_progress=4096 |
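The reproducer program itself is not included in the ticket. A minimal sketch of what such a program might look like is given below; the function name `toggle_owner`, its parameters, and the choice of alternating between two uids are assumptions for illustration, not the reporter's actual code. It relies only on the POSIX `fchown()` call.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Alternate the owner of an already-open file between two uids.
 * Each successful fchown() on a Lustre file is a setattr, which is the
 * kind of operation the description says generates OSP-sync llog records.
 *
 * iterations < 0 loops forever; otherwise the loop runs that many times.
 * Returns 0 on success, -1 on the first fchown() failure.
 */
static int toggle_owner(int fd, uid_t uid_a, uid_t uid_b, long iterations)
{
    /* Unsigned index: wraparound is well defined in the endless case. */
    for (unsigned long i = 0;
         iterations < 0 || i < (unsigned long)iterations; i++) {
        uid_t uid = (i & 1ul) ? uid_b : uid_a;

        /* (gid_t)-1 leaves the group unchanged. */
        if (fchown(fd, uid, (gid_t)-1) != 0)
            return -1;
    }
    return 0;
}
```

A caller would open a file on the Lustre mount and invoke something like `toggle_owner(fd, 0, 1, -1)` as root, while watching `osp.*.sync_changes` from another shell as shown above.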
| Comments |
| Comment by Gerrit Updater [ 15/Oct/21 ] |
|
"Alexander Zarochentsev <alexander.zarochentsev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/45265 |
| Comment by Gerrit Updater [ 30/Nov/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45265/ |
| Comment by Peter Jones [ 30/Nov/21 ] |
|
Landed for 2.15 |