Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-3581

Recurrence of LU-3020: Lustre returns EINTR during writes when SA_RESTART is set

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.5.0
    • Lustre 2.4.0
    • 3
    • 9074

    Description

      This is the same issue as described in LU-3020, where EINTR is returned instead of ERESTARTSYS during writes. This issue is caught by the same reproducer as for LU-3020, but the cause is different.

      As I did not hit this issue while testing the fix for LU-3020, I suspect this has been introduced by some subsequent patch. We are seeing this against 2.4 release branch.

      This issue is easy to hit without debugging enabled, and very hard to hit with debugging enabled.

      Here is the relevant portion of the trace logs:

      00000008:00000001:4.0:1372452012.457494:0:13003:0:(osc_cache.c:2206:osc_queue_async_io()) Process entered
      00000008:00000001:4.0:1372452012.457495:0:13003:0:(osc_cache.c:543:osc_extent_release()) Process entered
      00000008:00000001:4.0:1372452012.457496:0:13003:0:(osc_cache.c:240:osc_extent_sanity_check0()) Process leaving via out (rc=0 : 0 : 0x0)
      00000008:00000001:4.0:1372452012.457498:0:13003:0:(osc_cache.c:1616:osc_makes_rpc()) Process entered
      00000008:00000001:4.0:1372452012.457499:0:13003:0:(osc_cache.c:1662:osc_makes_rpc()) Process leaving (rc=0 : 0 : 0)
      00000008:00000001:4.0:1372452012.457500:0:13003:0:(osc_cache.c:1616:osc_makes_rpc()) Process entered
      00000008:00000001:4.0:1372452012.457501:0:13003:0:(osc_cache.c:1652:osc_makes_rpc()) Process leaving (rc=0 : 0 : 0)
      00000008:00000001:4.0:1372452012.457502:0:13003:0:(osc_cache.c:575:osc_extent_release()) Process leaving (rc=0 : 0 : 0)
      00000008:00000001:4.0:1372452012.457503:0:13003:0:(osc_cache.c:1506:osc_enter_cache()) Process entered
      00000100:00000001:1.0F:1372452012.457511:0:5940:0:(ptlrpcd.c:293:ptlrpcd_check()) Process entered
      00000008:00000001:4.0:1372452012.457512:0:13003:0:(osc_cache.c:1549:osc_enter_cache()) Process leaving via out (rc=18446744073709551612 : -4 : 0xfffffffffffffffc)
      00000100:00000001:0.0F:1372452012.457512:0:5941:0:(ptlrpcd.c:293:ptlrpcd_check()) Process entered
      00000100:00000001:1.0:1372452012.457513:0:5940:0:(client.c:1486:ptlrpc_check_set()) Process entered
      00000100:00000001:1.0:1372452012.457513:0:5940:0:(client.c:1561:ptlrpc_check_set()) Process leaving via interpret (rc=0 : 0 : 0x0)
      00000008:00000001:4.0:1372452012.457514:0:13003:0:(osc_cache.c:1564:osc_enter_cache()) Process leaving (rc=18446744073709551612 : -4 : fffffffffffffffc)
      00000100:00000001:0.0:1372452012.457514:0:5941:0:(ptlrpcd.c:395:ptlrpcd_check()) Process leaving (rc=0 : 0 : 0)
      00000008:00000001:4.0:1372452012.457515:0:13003:0:(osc_cache.c:2352:osc_queue_async_io()) Process leaving (rc=18446744073709551612 : -4 : fffffffffffffffc)

      This is hit during writes, specifically during ll_commit_write. I will be attaching the full log.

      This is happening due to a signal arriving during the following l_wait_event call, in osc_enter_cache:
      CDEBUG(D_CACHE, "%s: sleeping for cache space @ %p for %p\n",
      cli->cl_import->imp_obd->obd_name, &ocw, oap);

      rc = l_wait_event(ocw.ocw_waitq, ocw_granted(cli, &ocw), &lwi);

      client_obd_list_lock(&cli->cl_loi_list_lock);

      /* l_wait_event is interrupted by signal */
      if (rc < 0)

      { cfs_list_del_init(&ocw.ocw_entry); GOTO(out, rc); }

      I will attach full trace logs. Search for -4 in the log to find the EINTR.

      The question is: Is it safe to return ERESTARTSYS here, instead of EINTR?

      More generally, Lustre's default behavior in l_wait_event is to return EINTR. Should we consider changing this to ERESTARTSYS and making EINTR the exceptional case? (This may be a terrible idea - I'm just floating it out of curiositiy.)

      Attachments

        1. eintrlog.tail
          1.10 MB
        2. new_eintr_test.c
          4 kB
        3. test.sh
          0.4 kB

        Issue Links

          Activity

            [LU-3581] Recurrence of LU-3020: Lustre returns EINTR during writes when SA_RESTART is set
            pjones Peter Jones added a comment -

            Landed for 2.5

            pjones Peter Jones added a comment - Landed for 2.5

            Niu,

            I've added you as a reviewer; could you also re-start the testing process? The test failure doesn't appear to be related to the patch.

            Thanks,
            paf

            paf Patrick Farrell (Inactive) added a comment - Niu, I've added you as a reviewer; could you also re-start the testing process? The test failure doesn't appear to be related to the patch. Thanks, paf

            Testing with the reproducer indicates this change resolves the issue.

            Thank you, niu.

            paf Patrick Farrell (Inactive) added a comment - Testing with the reproducer indicates this change resolves the issue. Thank you, niu.

            I've put a patch with that in it up here:
            http://review.whamcloud.com/7002

            I'll be testing whether it fixes my reproducer soon and will report back.

            paf Patrick Farrell (Inactive) added a comment - - edited I've put a patch with that in it up here: http://review.whamcloud.com/7002 I'll be testing whether it fixes my reproducer soon and will report back.

            Patrick, yes, that's exactly what I meant.

            niu Niu Yawei (Inactive) added a comment - Patrick, yes, that's exactly what I meant.

            Niu,

            I'll put the patch up shortly. You said in the read/write path, so would you like it changed in osc_enter_cache? Or somewhere else in the path?

            This is what I was thinking:

             /* l_wait_event is interrupted by signal */
                            if (rc < 0) {
                                    /* Ensures restartability - LU-3581 */
                                    if(rc == -EINTR)
                                            rc = -ERESTARTSYS;
                                    cfs_list_del_init(&ocw.ocw_entry);
                                    GOTO(out, rc);
                            }
            
            paf Patrick Farrell (Inactive) added a comment - Niu, I'll put the patch up shortly. You said in the read/write path, so would you like it changed in osc_enter_cache? Or somewhere else in the path? This is what I was thinking: /* l_wait_event is interrupted by signal */ if (rc < 0) { /* Ensures restartability - LU-3581 */ if (rc == -EINTR) rc = -ERESTARTSYS; cfs_list_del_init(&ocw.ocw_entry); GOTO(out, rc); }

            Yes, seems we'd return -ERESTARTSYS in such case too (interrupted by signal in osc_enter_cache()). I think we should return -ERESTARTSYS only in read/write path but not in l_wait_event(). Patrick, would you post a patch for this? Thanks.

            niu Niu Yawei (Inactive) added a comment - Yes, seems we'd return -ERESTARTSYS in such case too (interrupted by signal in osc_enter_cache()). I think we should return -ERESTARTSYS only in read/write path but not in l_wait_event(). Patrick, would you post a patch for this? Thanks.
            pjones Peter Jones added a comment -

            Niu

            Could you please look into this one?

            Thanks

            Peter

            pjones Peter Jones added a comment - Niu Could you please look into this one? Thanks Peter

            Attached is the reproducer for this. This is a slightly improved version of the reproducer for LU-3020, which randomizes the timing of the operations slightly.

            The shell script just helps to attempt to gather debug information.

            With debug set to 0, the reproducer hits the problem most of the time. With debugging set to +trace, it is very difficult to hit and may need to be looped for an hour or more.

            paf Patrick Farrell (Inactive) added a comment - Attached is the reproducer for this. This is a slightly improved version of the reproducer for LU-3020 , which randomizes the timing of the operations slightly. The shell script just helps to attempt to gather debug information. With debug set to 0, the reproducer hits the problem most of the time. With debugging set to +trace, it is very difficult to hit and may need to be looped for an hour or more.

            People

              pjones Peter Jones
              paf Patrick Farrell (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: