Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-3020

Lustre returns EINTR during writes when SA_RESTART is set

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.4.0
    • Lustre 2.3.0, Lustre 2.4.0
    • 3
    • 7344

    Description

      Lustre is returning EINTR when it should probably be returning ERESTARTSYS.

      The POSIX spec specifies that write system calls must react as follows to signals:
      If a signal arrives before any data is written, the return is -1 and ERRNO is -EINTR.
      If a signal arrives when some data has been written, the return is the amount of data written.

      When SA_RESTART is set, the write call should retry until it completes, and should not return EINTR.

      As I understand it, this is usually handled by returning ERESTARTSYS instead of EINTR - When this is returned, the kernel checks for SA_RESTART on the handler, if any. If SA_RESTART is set, ERESTARTSYS is left alone and the system call is restarted. If SA_RESTART is not set, ERESTARTSYS is changed to EINTR. (See handle_signal in /arch/x86/kernel/signal.c)

      When using the attached reproducer, we consistently see -EINTR returned from cl_lock_state_wait in lustre/obdclass/cl_lock.c. (In brief, the reproducer spawns a child, then starts a series of looped writes in the parent while the child sends SIGALRMS to the parent. Once a write fails with EINTR, the parent and child print out and exit.)

      Turning on full tracing will let you see this return value (-4) coming from cl_lock_state_wait in the logs. (I can attach logs if requested.)

      Moving on to implications and possible fixes:
      The norm in most file system code appears to be to return ERESTARTSYS, except where a system call specifically cannot be restarted. This snippet is from /fs/aio.c in the kernel:

      /*

      • There's no easy way to restart the syscall since other AIO's
      • may be already running. Just fail this IO with EINTR.
        */
        if (unlikely(ret == -ERESTARTSYS || ret == -ERESTARTNOINTR ||
        ret == -ERESTARTNOHAND || ret == -ERESTART_RESTARTBLOCK))
        ret = -EINTR;

      However, a quick search through the Lustre code shows Lustre returns EINTR far more often than ERESTARTSYS. This is presumably because Lustre is more complex and therefore less restartable than most file systems.

      I've made a patch I'll be submitting shortly which changes this specific instance of EINTR to ERESTARTSYS. Some simple testing shows it fixes the problem described above and does not appear to cause other issues.

      I'll add a link to that patch here once it's submitted.

      One other note. This problem occurs at exactly the same location during read system calls. The same patch appears to resolve it.

      Two questions:
      1) Is it most likely OK to return ERESTARTSYS at this particular location (cl_lock_state_wait)? IE, can sys calls be safely restarted from here?

      2) The reproducer is a contrived test case, but the actual application that revealed this issue sends SIGALRM periodically during its normal operation. We're concerned about the general reaction of Lustre to SIGALRM when a handler is set and whether or not we'll fix this spot, but see the problem return from another location.

      Attachments

        Issue Links

          Activity

            People

              keith Keith Mannthey (Inactive)
              paf Patrick Farrell
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: