Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-3020

Lustre returns EINTR during writes when SA_RESTART is set

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.4.0
    • Lustre 2.3.0, Lustre 2.4.0
    • 3
    • 7344

    Description

      Lustre is returning EINTR when it should probably be returning ERESTARTSYS.

      The POSIX spec specifies that write system calls must react as follows to signals:
      If a signal arrives before any data is written, the return is -1 and ERRNO is -EINTR.
      If a signal arrives when some data has been written, the return is the amount of data written.

      When SA_RESTART is set, the write call should retry until it completes, and should not return EINTR.

      As I understand it, this is usually handled by returning ERESTARTSYS instead of EINTR - When this is returned, the kernel checks for SA_RESTART on the handler, if any. If SA_RESTART is set, ERESTARTSYS is left alone and the system call is restarted. If SA_RESTART is not set, ERESTARTSYS is changed to EINTR. (See handle_signal in /arch/x86/kernel/signal.c)

      When using the attached reproducer, we consistently see -EINTR returned from cl_lock_state_wait in lustre/obdclass/cl_lock.c. (In brief, the reproducer spawns a child, then starts a series of looped writes in the parent while the child sends SIGALRMS to the parent. Once a write fails with EINTR, the parent and child print out and exit.)

      Turning on full tracing will let you see this return value (-4) coming from cl_lock_state_wait in the logs. (I can attach logs if requested.)

      Moving on to implications and possible fixes:
      The norm in most file system code appears to be to return ERESTARTSYS, except where a system call specifically cannot be restarted. This snippet is from /fs/aio.c in the kernel:

      /*

      • There's no easy way to restart the syscall since other AIO's
      • may be already running. Just fail this IO with EINTR.
        */
        if (unlikely(ret == -ERESTARTSYS || ret == -ERESTARTNOINTR ||
        ret == -ERESTARTNOHAND || ret == -ERESTART_RESTARTBLOCK))
        ret = -EINTR;

      However, a quick search through the Lustre code shows Lustre returns EINTR far more often than ERESTARTSYS. This is presumably because Lustre is more complex and therefore less restartable than most file systems.

      I've made a patch I'll be submitting shortly which changes this specific instance of EINTR to ERESTARTSYS. Some simple testing shows it fixes the problem described above and does not appear to cause other issues.

      I'll add a link to that patch here once it's submitted.

      One other note. This problem occurs at exactly the same location during read system calls. The same patch appears to resolve it.

      Two questions:
      1) Is it most likely OK to return ERESTARTSYS at this particular location (cl_lock_state_wait)? IE, can sys calls be safely restarted from here?

      2) The reproducer is a contrived test case, but the actual application that revealed this issue sends SIGALRM periodically during its normal operation. We're concerned about the general reaction of Lustre to SIGALRM when a handler is set and whether or not we'll fix this spot, but see the problem return from another location.

      Attachments

        Issue Links

          Activity

            [LU-3020] Lustre returns EINTR during writes when SA_RESTART is set
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-8494 [ LU-8494 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-3581 [ LU-3581 ]
            jlevi Jodi Levi (Inactive) made changes -
            Link New: This issue is related to LU-3433 [ LU-3433 ]
            pjones Peter Jones made changes -
            Fix Version/s New: Lustre 2.4.0 [ 10154 ]
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            pjones Peter Jones added a comment -

            Landed for 2.4

            pjones Peter Jones added a comment - Landed for 2.4
            spitzcor Cory Spitz added a comment -

            Can this land now? Or must we wait until after b2_4 is branched?

            spitzcor Cory Spitz added a comment - Can this land now? Or must we wait until after b2_4 is branched?
            keith Keith Mannthey (Inactive) made changes -
            Assignee Original: WC Triage [ wc-triage ] New: Keith Mannthey [ keith ]
            utopiabound Nathaniel Clark made changes -
            Affects Version/s New: Lustre 2.4.0 [ 10154 ]
            pjones Peter Jones made changes -
            Labels New: patch

            Updates from the patch review:
            Failed test54c. I'm not sure if that's related to my patch or not, but I noticed a problem with the patch in any case.

            This is my review comment. I've pushed a patch with the new code below as of a few minutes ago.

            Upon further reflection, I'm wondering about something in the block of code I modified: If the if statement is not entered, then ERESTARTSYS will be returned.

            Here's the current (modified) code:

            result = -ERESTARTSYS;
            if (likely(!OBD_FAIL_CHECK(OBD_FAIL_LOCK_STATE_WAIT_INTR))) {
                cfs_waitq_wait(&waiter, CFS_TASK_INTERRUPTIBLE);
                if (!cfs_signal_pending())
                    result = 0;
            }
            

            I'm wondering if this might not be better, since we only want to return ERESTARTSYS when a signal is involved.. (Since ERESTARTSYS should never be returned to user space, and is processed by the signal handling code in the kernel [whether or not a user handler is set]):

            result = -EINTR;
            if (likely(!OBD_FAIL_CHECK(OBD_FAIL_LOCK_STATE_WAIT_INTR))) {
                cfs_waitq_wait(&waiter, CFS_TASK_INTERRUPTIBLE);
                if (!cfs_signal_pending()) {
                    result = 0;
                }
                else {
                    /* Returning ERESTARTSYS if a signal is found so
                    * system calls can be restarted if the signal handler
                    * calls for it. */
                    result = -ERESTARTSYS;
                }
            }
            

            paf Patrick Farrell (Inactive) added a comment - - edited Updates from the patch review: Failed test54c. I'm not sure if that's related to my patch or not, but I noticed a problem with the patch in any case. This is my review comment. I've pushed a patch with the new code below as of a few minutes ago. — Upon further reflection, I'm wondering about something in the block of code I modified: If the if statement is not entered, then ERESTARTSYS will be returned. Here's the current (modified) code: — result = -ERESTARTSYS; if (likely(!OBD_FAIL_CHECK(OBD_FAIL_LOCK_STATE_WAIT_INTR))) { cfs_waitq_wait(&waiter, CFS_TASK_INTERRUPTIBLE); if (!cfs_signal_pending()) result = 0; } — I'm wondering if this might not be better, since we only want to return ERESTARTSYS when a signal is involved.. (Since ERESTARTSYS should never be returned to user space, and is processed by the signal handling code in the kernel [whether or not a user handler is set] ): — result = -EINTR; if (likely(!OBD_FAIL_CHECK(OBD_FAIL_LOCK_STATE_WAIT_INTR))) { cfs_waitq_wait(&waiter, CFS_TASK_INTERRUPTIBLE); if (!cfs_signal_pending()) { result = 0; } else { /* Returning ERESTARTSYS if a signal is found so * system calls can be restarted if the signal handler * calls for it. */ result = -ERESTARTSYS; } } —

            People

              keith Keith Mannthey (Inactive)
              paf Patrick Farrell (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: