Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-3581

Recurrence of LU-3020: Lustre returns EINTR during writes when SA_RESTART is set

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.5.0
    • Lustre 2.4.0
    • 3
    • 9074

    Description

      This is the same issue as described in LU-3020, where EINTR is returned instead of ERESTARTSYS during writes. This issue is caught by the same reproducer as for LU-3020, but the cause is different.

      As I did not hit this issue while testing the fix for LU-3020, I suspect this has been introduced by some subsequent patch. We are seeing this against 2.4 release branch.

      This issue is easy to hit without debugging enabled, and very hard to hit with debugging enabled.

      Here is the relevant portion of the trace logs:

      00000008:00000001:4.0:1372452012.457494:0:13003:0:(osc_cache.c:2206:osc_queue_async_io()) Process entered
      00000008:00000001:4.0:1372452012.457495:0:13003:0:(osc_cache.c:543:osc_extent_release()) Process entered
      00000008:00000001:4.0:1372452012.457496:0:13003:0:(osc_cache.c:240:osc_extent_sanity_check0()) Process leaving via out (rc=0 : 0 : 0x0)
      00000008:00000001:4.0:1372452012.457498:0:13003:0:(osc_cache.c:1616:osc_makes_rpc()) Process entered
      00000008:00000001:4.0:1372452012.457499:0:13003:0:(osc_cache.c:1662:osc_makes_rpc()) Process leaving (rc=0 : 0 : 0)
      00000008:00000001:4.0:1372452012.457500:0:13003:0:(osc_cache.c:1616:osc_makes_rpc()) Process entered
      00000008:00000001:4.0:1372452012.457501:0:13003:0:(osc_cache.c:1652:osc_makes_rpc()) Process leaving (rc=0 : 0 : 0)
      00000008:00000001:4.0:1372452012.457502:0:13003:0:(osc_cache.c:575:osc_extent_release()) Process leaving (rc=0 : 0 : 0)
      00000008:00000001:4.0:1372452012.457503:0:13003:0:(osc_cache.c:1506:osc_enter_cache()) Process entered
      00000100:00000001:1.0F:1372452012.457511:0:5940:0:(ptlrpcd.c:293:ptlrpcd_check()) Process entered
      00000008:00000001:4.0:1372452012.457512:0:13003:0:(osc_cache.c:1549:osc_enter_cache()) Process leaving via out (rc=18446744073709551612 : -4 : 0xfffffffffffffffc)
      00000100:00000001:0.0F:1372452012.457512:0:5941:0:(ptlrpcd.c:293:ptlrpcd_check()) Process entered
      00000100:00000001:1.0:1372452012.457513:0:5940:0:(client.c:1486:ptlrpc_check_set()) Process entered
      00000100:00000001:1.0:1372452012.457513:0:5940:0:(client.c:1561:ptlrpc_check_set()) Process leaving via interpret (rc=0 : 0 : 0x0)
      00000008:00000001:4.0:1372452012.457514:0:13003:0:(osc_cache.c:1564:osc_enter_cache()) Process leaving (rc=18446744073709551612 : -4 : fffffffffffffffc)
      00000100:00000001:0.0:1372452012.457514:0:5941:0:(ptlrpcd.c:395:ptlrpcd_check()) Process leaving (rc=0 : 0 : 0)
      00000008:00000001:4.0:1372452012.457515:0:13003:0:(osc_cache.c:2352:osc_queue_async_io()) Process leaving (rc=18446744073709551612 : -4 : fffffffffffffffc)

      This is hit during writes, specifically during ll_commit_write. I will be attaching the full log.

      This is happening due to a signal arriving during the following l_wait_event call, in osc_enter_cache:
      CDEBUG(D_CACHE, "%s: sleeping for cache space @ %p for %p\n",
      cli->cl_import->imp_obd->obd_name, &ocw, oap);

      rc = l_wait_event(ocw.ocw_waitq, ocw_granted(cli, &ocw), &lwi);

      client_obd_list_lock(&cli->cl_loi_list_lock);

      /* l_wait_event is interrupted by signal */
      if (rc < 0)

      { cfs_list_del_init(&ocw.ocw_entry); GOTO(out, rc); }

      I will attach full trace logs. Search for -4 in the log to find the EINTR.

      The question is: Is it safe to return ERESTARTSYS here, instead of EINTR?

      More generally, Lustre's default behavior in l_wait_event is to return EINTR. Should we consider changing this to ERESTARTSYS and making EINTR the exceptional case? (This may be a terrible idea - I'm just floating it out of curiositiy.)

      Attachments

        1. eintrlog.tail
          1.10 MB
        2. new_eintr_test.c
          4 kB
        3. test.sh
          0.4 kB

        Issue Links

          Activity

            People

              pjones Peter Jones
              paf Patrick Farrell
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: