[LU-305] utime() fails with EINTR : not conform to POSIX standard

Details

    • Bug
    • Resolution: Won't Fix
    • Minor
    • Lustre 2.0.0, Lustre 2.1.0
    • Lustre 2.0.0
    • None
    • RHEL 6.0 GA, Lustre 2.0.0.1

    Description

      When extracting a compressed archive into a Lustre filesystem on a client node, the tar command fails. The failure comes from the utime() system call, which returns EINTR while the modification time of the extracted file is being updated. However, EINTR is not listed as a possible error code for utime() in the POSIX standard.

      The problem is fairly easy to reproduce on the client node. However, it does not reproduce when the Lustre debug logs are enabled (echo "-1" > /proc/sys/lnet/debug).

      $ pwd
      /scratch_lustre/xtmp

      $ tar xvfoz netcdf-3.6.1.tar.gz netcdf-3.6.1/src/win32/NET/examples/Form1.resX
      netcdf-3.6.1/src/win32/NET/examples/Form1.resX
      tar: netcdf-3.6.1/src/win32/NET/examples/Form1.resX: Cannot utime: Interrupted system call
      tar: Exiting with failure status due to previous errors

      Here is the strace output for the same command:
      $ strace -f tar xvfoz netcdf-3.6.1.tar.gz netcdf-3.6.1/src/win32/NET/examples/Form1.resX
      ...
      [pid 3086] open("netcdf-3.6.1/src/win32/NET/examples/Form1.resX", O_WRONLY|O_CREAT|O_EXCL, 0755) = -1 EEXIST (File exists)
      [pid 3086] unlink("netcdf-3.6.1/src/win32/NET/examples/Form1.resX") = 0
      [pid 3086] open("netcdf-3.6.1/src/win32/NET/examples/Form1.resX", O_WRONLY|O_CREAT|O_EXCL, 0755) = 4
      [pid 3086] write(4, "<?xml version=\"1.0\" encoding=\"ut"..., 4608) = 4608
      [pid 3086] read(3, "System.Resources.ResXResourceWri"..., 10240) = 10240
      [pid 3087] <... write resumed> ) = 32768
      [pid 3086] write(4, "System.Resources.ResXResourceWri"..., 2289 <unfinished ...>
      [pid 3087] write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096 <unfinished ...>
      [pid 3086] <... write resumed> ) = 2289
      [pid 3087] <... write resumed> ) = 4096
      [pid 3086] close(4 <unfinished ...>
      [pid 3087] close(0 <unfinished ...>
      [pid 3086] <... close resumed> ) = 0
      [pid 3086] utimensat(AT_FDCWD, "netcdf-3.6.1/src/win32/NET/examples/Form1.resX", {{1303910452, 237591535}, {1085494499, 0}}, 0 <unfinished ...>
      [pid 3087] <... close resumed> ) = 0
      [pid 3087] close(1) = 0
      [pid 3087] close(2) = 0
      [pid 3087] exit_group(0) = ?
      Process 3087 detached
      <... utimensat resumed> ) = -1 EINTR (Interrupted system call)
      --- SIGCHLD (Child exited) @ 0 (0) ---
      ...

      The tar command forks a child process to decompress the archive (gunzip), while the parent process creates the extracted files, writes their data and restores the original file attributes (modification time).

      When the child process exits, the parent process receives a SIGCHLD signal. Note that the tar command sets the SIGCHLD handler to SIG_DFL (whose default action for this signal is 'Ignore'). The signal may nevertheless interrupt the utime() implementation in Lustre.

      I have been able to reproduce a similar EINTR with a test program on one of our test clusters, with the Lustre logs enabled. The error occurs during a write() system call (where POSIX allows EINTR) and comes from the cl_lock_state_wait() routine in lustre/obdclass/cl_lock.c. This routine makes the thread wait on a wait queue and, when the thread wakes up, checks the thread's pending signals with cfs_signal_pending().

      Is the cl_lock_state_wait() routine part of the call path of the utime() system call on Lustre?
      Are there other places where EINTR might be returned in this call path?

      Maybe Lustre should avoid any interruptible wait in the utime() call path?

      Attached are:

      • the test program I wrote to reproduce the issue independently of tar,
      • the lctl debug_kernel log captured when the error was reproduced in the write() system call

          Activity

            old ticket for unsupported version

            simmonsja James A Simmons added a comment

            We still seem to be hitting this on:

            lustre: 2.1.1
            kernel: patchless_client
            build: RC4--PRISTINE-2.6.32-220.el6_lustre.g4554b65.x86_64

            The userspace code that is triggering the error is:

            if(utime(pathname, &times) == -1) {

            And as has been mentioned before, utime() is not supposed to be a blocking call and is therefore not allowed to return -EINTR.

            brian Brian Murrell (Inactive) added a comment
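For user-space code like the utime() call quoted above, one possible stopgap on an affected client is to retry the call when it fails with EINTR. The sketch below is an assumption, not something proposed in this ticket; the helper name utime_retry and the max_retries parameter are hypothetical.

```c
#include <errno.h>
#include <utime.h>

/*
 * Hypothetical helper: call utime() and repeat it while it fails with
 * EINTR, at most `max_retries` additional times.
 * Returns 0 on success, or -1 with errno set by the last attempt.
 */
int utime_retry(const char *pathname, const struct utimbuf *times,
                int max_retries)
{
    int rc;

    do {
        rc = utime(pathname, times);
    } while (rc == -1 && errno == EINTR && max_retries-- > 0);

    return rc;
}
```

The quoted check would then read `if (utime_retry(pathname, &times, 3) == -1) {`, with the retry bound chosen arbitrarily here.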

            Patch landed in master.

            niu Niu Yawei (Inactive) added a comment

            Hi,

            I have tested patch set 4 with my test program (utime_sigchild) and it fixes the problem.

            Thanks.

            pichong Gregoire Pichon added a comment

            Integrated in lustre-master » x86_64,server,el5,ofa #141
            LU-305 Temporarily block 'non-fatal' signals

            Oleg Drokin : cdb698a1a036870b6c9d8e51f69809c558d4823a
            Files :

            • libcfs/libcfs/darwin/darwin-prim.c
            • lustre/obdclass/cl_lock.c
            • libcfs/libcfs/winnt/winnt-prim.c
            • libcfs/include/libcfs/libcfs.h
            • lustre/obdclass/obd_mount.c
            • lustre/include/lustre_lib.h
            • lustre/include/linux/lustre_lib.h
            • lustre/include/darwin/lustre_lib.h
            • libcfs/libcfs/user-prim.c
            • libcfs/libcfs/linux/linux-prim.c
            hudson Build Master (Inactive) added a comment

            Integrated in lustre-master » i686,server,el5,ofa #141
            LU-305 Temporarily block 'non-fatal' signals

            Oleg Drokin : cdb698a1a036870b6c9d8e51f69809c558d4823a
            Files :

            • lustre/include/linux/lustre_lib.h
            • libcfs/libcfs/linux/linux-prim.c
            • libcfs/libcfs/darwin/darwin-prim.c
            • lustre/include/darwin/lustre_lib.h
            • lustre/obdclass/cl_lock.c
            • libcfs/libcfs/user-prim.c
            • lustre/obdclass/obd_mount.c
            • lustre/include/lustre_lib.h
            • libcfs/libcfs/winnt/winnt-prim.c
            • libcfs/include/libcfs/libcfs.h
            hudson Build Master (Inactive) added a comment

            Integrated in lustre-master » x86_64,client,el5,ofa #141
            LU-305 Temporarily block 'non-fatal' signals

            Oleg Drokin : cdb698a1a036870b6c9d8e51f69809c558d4823a
            Files :

            • lustre/include/lustre_lib.h
            • libcfs/include/libcfs/libcfs.h
            • libcfs/libcfs/linux/linux-prim.c
            • lustre/obdclass/obd_mount.c
            • libcfs/libcfs/user-prim.c
            • libcfs/libcfs/winnt/winnt-prim.c
            • lustre/obdclass/cl_lock.c
            • lustre/include/darwin/lustre_lib.h
            • libcfs/libcfs/darwin/darwin-prim.c
            • lustre/include/linux/lustre_lib.h
            hudson Build Master (Inactive) added a comment

            Integrated in lustre-master » i686,server,el6,inkernel #141
            LU-305 Temporarily block 'non-fatal' signals

            Oleg Drokin : cdb698a1a036870b6c9d8e51f69809c558d4823a
            Files :

            • libcfs/include/libcfs/libcfs.h
            • libcfs/libcfs/user-prim.c
            • lustre/obdclass/obd_mount.c
            • libcfs/libcfs/linux/linux-prim.c
            • libcfs/libcfs/darwin/darwin-prim.c
            • lustre/include/lustre_lib.h
            • lustre/include/darwin/lustre_lib.h
            • lustre/obdclass/cl_lock.c
            • libcfs/libcfs/winnt/winnt-prim.c
            • lustre/include/linux/lustre_lib.h
            hudson Build Master (Inactive) added a comment

            Integrated in lustre-master » i686,server,el5,inkernel #141
            LU-305 Temporarily block 'non-fatal' signals

            Oleg Drokin : cdb698a1a036870b6c9d8e51f69809c558d4823a
            Files :

            • lustre/include/linux/lustre_lib.h
            • libcfs/libcfs/user-prim.c
            • libcfs/libcfs/winnt/winnt-prim.c
            • libcfs/libcfs/linux/linux-prim.c
            • libcfs/include/libcfs/libcfs.h
            • libcfs/libcfs/darwin/darwin-prim.c
            • lustre/obdclass/cl_lock.c
            • lustre/include/lustre_lib.h
            • lustre/obdclass/obd_mount.c
            • lustre/include/darwin/lustre_lib.h
            hudson Build Master (Inactive) added a comment

            Integrated in lustre-master » x86_64,client,ubuntu1004,ofa #141
            LU-305 Temporarily block 'non-fatal' signals

            Oleg Drokin : cdb698a1a036870b6c9d8e51f69809c558d4823a
            Files :

            • lustre/include/lustre_lib.h
            • libcfs/libcfs/winnt/winnt-prim.c
            • libcfs/libcfs/linux/linux-prim.c
            • lustre/obdclass/cl_lock.c
            • lustre/include/linux/lustre_lib.h
            • libcfs/libcfs/darwin/darwin-prim.c
            • lustre/obdclass/obd_mount.c
            • libcfs/include/libcfs/libcfs.h
            • libcfs/libcfs/user-prim.c
            • lustre/include/darwin/lustre_lib.h
            hudson Build Master (Inactive) added a comment

            People

              niu Niu Yawei (Inactive)
              pichong Gregoire Pichon