Lustre / LU-14606

llog_changelog_cancel_cb returns ENOENT(-2)

    Description

      Llog allows records to be processed in parallel, and a record can be canceled while it is being processed. For a changelog, two threads can process and cancel records at the same time, so a race can occur when both handle the same record: the first thread cancels it and the second gets ENOENT. Since this outcome is expected, Lustre should hide the error from the caller.
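
      The intended behavior can be sketched in a few lines of C. This is only an illustration, not the actual Lustre code or fix: struct toy_llog, toy_cancel_rec() and toy_cancel_cb() are hypothetical names, and the race is modeled as two sequential cancel calls on the same record rather than with real threads.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a plain llog: one "live" flag per record. */
#define TOY_LOG_RECORDS 8

struct toy_llog {
	bool live[TOY_LOG_RECORDS];
};

/* Mark every record as present (not yet canceled). */
static void toy_llog_init(struct toy_llog *log)
{
	assert(log != NULL);
	for (size_t i = 0; i < TOY_LOG_RECORDS; i++)
		log->live[i] = true;
}

/* Cancel one record; returns -ENOENT if it has already been canceled. */
static int toy_cancel_rec(struct toy_llog *log, size_t idx)
{
	if (idx >= TOY_LOG_RECORDS)
		return -EINVAL;
	if (!log->live[idx])
		return -ENOENT;
	log->live[idx] = false;
	return 0;
}

/*
 * Callback-style wrapper: if another canceler got there first, the
 * record is gone, which is what the caller wanted, so report success.
 * Any other error is passed through unchanged.
 */
static int toy_cancel_cb(struct toy_llog *log, size_t idx)
{
	int rc = toy_cancel_rec(log, idx);

	if (rc == -ENOENT)
		rc = 0;
	return rc;
}
```

      Calling toy_cancel_rec() twice on the same index returns 0 and then -ENOENT, which mirrors the two threads in the report. Calling toy_cancel_cb() twice returns 0 both times, while unrelated errors such as -EINVAL still reach the caller.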

      The following log shows the race: two threads (28074 and 11741) try to cancel record 35285 at the same time. One thread cancels it and the other gets -2 (ENOENT).

      00000004:00000001:5.0:1614693066.498334:0:28074:0:(mdd_device.c:312:llog_changelog_cancel_cb()) Process entered
      00000040:00100000:5.0:1614693066.498336:0:28074:0:(llog.c:220:llog_cancel_arr_rec()) Canceling 1 records, first 35284 in log [0x645e:0x1:0x0]
      00000040:00001000:5.0:1614693066.498359:0:28074:0:(llog_osd.c:401:llog_osd_write_rec()) new record 10645539 to [0x1:0x645e:0x0]
      00000004:00000001:5.0:1614693066.498365:0:28074:0:(mdd_device.c:348:llog_changelog_cancel_cb()) Process leaving (rc=0 : 0 : 0)
      00000004:00000001:5.0:1614693066.498368:0:28074:0:(mdd_device.c:312:llog_changelog_cancel_cb()) Process entered
      00000040:00100000:5.0:1614693066.498369:0:28074:0:(llog.c:220:llog_cancel_arr_rec()) Canceling 1 records, first 35285 in log [0x645e:0x1:0x0]
      00000004:00000001:3.0:1614693066.498383:0:11741:0:(mdd_device.c:312:llog_changelog_cancel_cb()) Process entered
      00000040:00100000:3.0:1614693066.498385:0:11741:0:(llog.c:220:llog_cancel_arr_rec()) Canceling 1 records, first 35285 in log [0x645e:0x1:0x0]
      00000040:00001000:5.0:1614693066.498393:0:28074:0:(llog_osd.c:401:llog_osd_write_rec()) new record 10645539 to [0x1:0x645e:0x0]
      00000004:00000001:5.0:1614693066.498398:0:28074:0:(mdd_device.c:348:llog_changelog_cancel_cb()) Process leaving (rc=0 : 0 : 0)
      00000004:00000001:5.0:1614693066.498401:0:28074:0:(mdd_device.c:312:llog_changelog_cancel_cb()) Process entered
      00000040:00100000:5.0:1614693066.498403:0:28074:0:(llog.c:220:llog_cancel_arr_rec()) Canceling 1 records, first 35286 in log [0x645e:0x1:0x0]
      00000004:00000001:3.0:1614693066.498422:0:11741:0:(mdd_device.c:348:llog_changelog_cancel_cb()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
      00000040:00080000:3.0:1614693066.498423:0:11741:0:(llog.c:699:llog_process_thread()) stop processing plain 0x645e:1:0 index 35285 count 28959
      00000040:00001000:5.0:1614693066.498433:0:28074:0:(llog_osd.c:401:llog_osd_write_rec()) new record 10645539 to [0x1:0x645e:0x0]
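
      The "Process leaving" lines appear to print the same return value three ways: unsigned decimal, signed decimal, and hex. The sketch below reproduces that layout to show that 18446744073709551614 and fffffffffffffffe are just the 64-bit two's-complement form of -2. The helper names format_rc() and rc_from_unsigned() are hypothetical and are not Lustre functions.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write rc as "unsigned : signed : hex"; returns the snprintf() result. */
static int format_rc(char *buf, size_t len, int64_t rc)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%" PRIu64 " : %" PRId64 " : %" PRIx64,
			(uint64_t)rc, rc, (uint64_t)rc);
}

/* Reinterpret the unsigned field from a log line as a signed rc. */
static int64_t rc_from_unsigned(uint64_t raw)
{
	int64_t rc;

	/* memcpy copies the bit pattern without a value conversion */
	memcpy(&rc, &raw, sizeof(rc));
	return rc;
}
```

      On Linux, where ENOENT is 2, rc_from_unsigned(UINT64_C(18446744073709551614)) equals -ENOENT, matching the value thread 11741 returned.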
      

            People

              aboyko Alexander Boyko