[LU-14606] llog_changelog_cancel_cb returns ENOENT(-2) Created: 12/Apr/21  Updated: 20/Dec/22  Resolved: 05/May/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.12.9, Lustre 2.15.0

Type: Bug Priority: Major
Reporter: Alexander Boyko Assignee: Alexander Boyko
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Gantt End to Start
Related
is related to LU-14705 ASSERTION( llog_osd_exist(loghandle) ... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Llog allows records to be processed in parallel, and a record can be cancelled while it is being processed. For a changelog, two threads can process and cancel records at the same time, so a race can occur when both handle the same record: the first thread cancels it, and the second gets ENOENT. Since this is a valid outcome rather than a real failure, Lustre should hide the error from the caller.
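
The race described above can be sketched in a few lines of C. This is only an illustration of the idea of hiding ENOENT, not the code from the patch: `struct demo_log`, `demo_cancel_rec()` and `demo_cancel_cb()` are hypothetical names, and the log is modelled as a plain array of flags rather than a real llog.

```c
#include <errno.h>
#include <stdbool.h>

#define DEMO_NRECS 8

/* Hypothetical stand-in for a plain llog: one "live" flag per record. */
struct demo_log {
    bool live[DEMO_NRECS];
};

/* Mark every record as present (not yet cancelled). */
static void demo_log_init(struct demo_log *log)
{
    for (int i = 0; i < DEMO_NRECS; i++)
        log->live[i] = true;
}

/*
 * Cancel one record.
 * Returns 0 on success, -EINVAL for an index outside the log,
 * and -ENOENT if the record was already cancelled by someone else.
 */
static int demo_cancel_rec(struct demo_log *log, int idx)
{
    if (idx < 0 || idx >= DEMO_NRECS)
        return -EINVAL;
    if (!log->live[idx])
        return -ENOENT;
    log->live[idx] = false;
    return 0;
}

/*
 * Cancel callback in the spirit of the fix: losing the race to another
 * canceller is not a failure, so -ENOENT is reported to the caller as 0.
 * Any other error is passed through unchanged.
 */
static int demo_cancel_cb(struct demo_log *log, int idx)
{
    int rc = demo_cancel_rec(log, idx);

    if (rc == -ENOENT)
        rc = 0;
    return rc;
}
```

If one caller cancels index 5 through `demo_cancel_rec()` and a second caller then cancels the same index, the raw call returns -ENOENT while `demo_cancel_cb()` returns 0, so a caller iterating over the log would not stop at that record.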

The following log shows the race: two threads (28074 and 11741) cancel records at the same time, and both process record 35285. One thread cancels it and the other gets -2 (ENOENT).

00000004:00000001:5.0:1614693066.498334:0:28074:0:(mdd_device.c:312:llog_changelog_cancel_cb()) Process entered
00000040:00100000:5.0:1614693066.498336:0:28074:0:(llog.c:220:llog_cancel_arr_rec()) Canceling 1 records, first 35284 in log [0x645e:0x1:0x0]
00000040:00001000:5.0:1614693066.498359:0:28074:0:(llog_osd.c:401:llog_osd_write_rec()) new record 10645539 to [0x1:0x645e:0x0]
00000004:00000001:5.0:1614693066.498365:0:28074:0:(mdd_device.c:348:llog_changelog_cancel_cb()) Process leaving (rc=0 : 0 : 0)
00000004:00000001:5.0:1614693066.498368:0:28074:0:(mdd_device.c:312:llog_changelog_cancel_cb()) Process entered
00000040:00100000:5.0:1614693066.498369:0:28074:0:(llog.c:220:llog_cancel_arr_rec()) Canceling 1 records, first 35285 in log [0x645e:0x1:0x0]
00000004:00000001:3.0:1614693066.498383:0:11741:0:(mdd_device.c:312:llog_changelog_cancel_cb()) Process entered
00000040:00100000:3.0:1614693066.498385:0:11741:0:(llog.c:220:llog_cancel_arr_rec()) Canceling 1 records, first 35285 in log [0x645e:0x1:0x0]
00000040:00001000:5.0:1614693066.498393:0:28074:0:(llog_osd.c:401:llog_osd_write_rec()) new record 10645539 to [0x1:0x645e:0x0]
00000004:00000001:5.0:1614693066.498398:0:28074:0:(mdd_device.c:348:llog_changelog_cancel_cb()) Process leaving (rc=0 : 0 : 0)
00000004:00000001:5.0:1614693066.498401:0:28074:0:(mdd_device.c:312:llog_changelog_cancel_cb()) Process entered
00000040:00100000:5.0:1614693066.498403:0:28074:0:(llog.c:220:llog_cancel_arr_rec()) Canceling 1 records, first 35286 in log [0x645e:0x1:0x0]
00000004:00000001:3.0:1614693066.498422:0:11741:0:(mdd_device.c:348:llog_changelog_cancel_cb()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
00000040:00080000:3.0:1614693066.498423:0:11741:0:(llog.c:699:llog_process_thread()) stop processing plain 0x645e:1:0 index 35285 count 28959
00000040:00001000:5.0:1614693066.498433:0:28074:0:(llog_osd.c:401:llog_osd_write_rec()) new record 10645539 to [0x1:0x645e:0x0]
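
The `Process leaving` line for thread 11741 in the log above prints the return code three times: `18446744073709551614 : -2 : fffffffffffffffe`. These look like one 64-bit value shown as unsigned decimal, signed decimal, and hexadecimal. The sketch below checks that reading. The helper names are hypothetical, and the format string is an assumption chosen to reproduce the text in the log, not a quote from the Lustre source.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reinterpret the unsigned 64-bit value from the trace line as signed. */
static int64_t demo_rc_signed(uint64_t raw)
{
    int64_t rc;

    memcpy(&rc, &raw, sizeof(rc));
    return rc;
}

/*
 * Format a return code as "unsigned : signed : hex", matching the
 * layout of the three values in the trace line (assumed format).
 */
static int demo_format_rc(char *buf, size_t len, int64_t rc)
{
    return snprintf(buf, len, "%" PRIu64 " : %" PRId64 " : %" PRIx64,
                    (uint64_t)rc, rc, (uint64_t)rc);
}
```

On Linux, where ENOENT is 2, `demo_rc_signed(18446744073709551614ULL)` equals `-ENOENT`, and `demo_format_rc()` applied to `-ENOENT` reproduces the three values seen in the log.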


 Comments   
Comment by Gerrit Updater [ 12/Apr/21 ]

Alexander Boyko (alexander.boyko@hpe.com) uploaded a new patch: https://review.whamcloud.com/43264
Subject: LU-14606 llog: hide ENOENT for cancelling record
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2daf9f62ba5214ca1a8851349ca33be16fcacb14

Comment by Gerrit Updater [ 05/May/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43264/
Subject: LU-14606 llog: hide ENOENT for cancelling record
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0b60647c0382426e3b4105d82d04862d2e4831cb

Comment by Peter Jones [ 05/May/21 ]

Landed for 2.15

Comment by Gerrit Updater [ 06/May/21 ]

Etienne AUJAMES (eaujames@ddn.com) uploaded a new patch: https://review.whamcloud.com/43572
Subject: LU-14606 llog: hide ENOENT for cancelling record
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: c1909f74ba169ef8b5eacccab5de032da190f6d8

Comment by Gerrit Updater [ 30/Jan/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/43572/
Subject: LU-14606 llog: hide ENOENT for cancelling record
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 73b9f32af9287c37f053ba6b072c5c1a329104d7
