[LU-305] utime() fails with EINTR : not conform to POSIX standard Created: 11/May/11  Updated: 16/Aug/16  Resolved: 16/Aug/16

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.0.0
Fix Version/s: Lustre 2.0.0, Lustre 2.1.0

Type: Bug Priority: Minor
Reporter: Gregoire Pichon Assignee: Niu Yawei (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

RHEL 6.0 GA, Lustre 2.0.0.1


Attachments: File utime_sigchild.c     Text File utime_sigchild.dk.log    
Issue Links:
Related
is related to LU-2440 git repositories get corrupted Resolved
Severity: 3
Epic: EINTR, SIGCHLD, signal, tar, utime
Rank (Obsolete): 5016

 Description   

When uncompressing an archive in a Lustre filesystem on a client node, the tar command fails. The error comes from the failure of utime() system call with EINTR when the modification time of the extracted file is updated. However, EINTR is not mentionned as a possible error code for utime() in POSIX standard.

The problem can be quite easily reproduced on the client node. But, it does not reproduce with lustre logs enabled (echo "-1" > /proc/sys/lnet/debug).

$ pwd
/scratch_lustre/xtmp

$ tar xvfoz netcdf-3.6.1.tar.gz netcdf-3.6.1/src/win32/NET/examples/Form1.resX
netcdf-3.6.1/src/win32/NET/examples/Form1.resX
tar: netcdf-3.6.1/src/win32/NET/examples/Form1.resX: Cannot utime: Interrupted system call
tar: Exiting with failure status due to previous errors

Here is the output of 'strace' with the same command.
$ strace -f tar xvfoz netcdf-3.6.1.tar.gz netcdf-3.6.1/src/win32/NET/examples/Form1.resX
...
[pid 3086] open("netcdf-3.6.1/src/win32/NET/examples/Form1.resX", O_WRONLY|O_CREAT|O_EXCL, 0755) = -1 EEXIST (File exists)
[pid 3086] unlink("netcdf-3.6.1/src/win32/NET/examples/Form1.resX") = 0
[pid 3086] open("netcdf-3.6.1/src/win32/NET/examples/Form1.resX", O_WRONLY|O_CREAT|O_EXCL, 0755) = 4
[pid 3086] write(4, "<?xml version=\"1.0\" encoding=\"ut"..., 4608) = 4608
[pid 3086] read(3, "System.Resources.ResXResourceWri"..., 10240) = 10240
[pid 3087] <... write resumed> ) = 32768
[pid 3086] write(4, "System.Resources.ResXResourceWri"..., 2289 <unfinished ...>
[pid 3087] write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096 <unfinished ...>
[pid 3086] <... write resumed> ) = 2289
[pid 3087] <... write resumed> ) = 4096
[pid 3086] close(4 <unfinished ...>
[pid 3087] close(0 <unfinished ...>
[pid 3086] <... close resumed> ) = 0
[pid 3086] utimensat(AT_FDCWD, "netcdf-3.6.1/src/win32/NET/examples/Form1.resX", 1303910452, 237591535}, {1085494499, 0, 0 <unfinished ...>
[pid 3087] <... close resumed> ) = 0
[pid 3087] close(1) = 0
[pid 3087] close(2) = 0
[pid 3087] exit_group(0) = ?
Process 3087 detached
<... utimensat resumed> ) = -1 EINTR (Interrupted system call)
— SIGCHLD (Child exited) @ 0 (0) —
...

The tar command forks a child process to perform the uncompression of the archive (gunzip) while the parent process creates the extracted files, writes data and restores initial file attribute (modification time).

When the child process exits, the parent process receives a SIGCHLD signal. Note that the tar command sets the signal handler of SIGCHLD to SIG_DFL (which is 'Ignore'). The signal may lead to the interruption of the utime() implementation in Lustre.

I have been able to reproduce a similar EINTR with a test-program on one of our test cluster, with the lustre logs enabled. The error occurs during a write system call (which is allowed in POSIX) and comes from the cl_lock_state_wait() routine in lustre/obdclass/cl_lock.c. This routine make the thread wait on a wait queue and when the thread wakes-up, the routine checks the thread pending signals: cfs_signal_pending().

Is the cl_lock_state_wait() routine part of the utime() call path of the utime() system call on Lustre ?
Are there other places where EINTR might be returned in this call path ?

Maybe Lustre should avoid any interruptible wait during the utime() call path ?

In attachment are

  • the test-program I wrote to reproduce the issue independently of tar,
  • the lctl debug_kernel log when the error reproduced in the write() system call


 Comments   
Comment by Peter Jones [ 11/May/11 ]

Niu

Could you please look into this one?

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 11/May/11 ]

It looks reasonable to me that cl_lock_state_wait() return -EINTR when there is signal pending. As Gregoire mentioned, the sig action of SIGCHLD has been explicitly set as SIG_DFL (in fact, the default sig action of SIGCHLD is SIG_DFL already), then I think SIGCHLD should not be delivered to the tar process, and lustre should not interrupted by this signal, unless the tar process is traced.

Hi, Gregoire
How did you reproduce it with utime_sigchild? I tried several times on my system, but failed to reproduce. Is there any special signal configuration on your system?

BTW: since utime() calls ll_setattr_ost() to set mtime on OSTs, the cl_lock_state_wait() could be called on the code path of utime() either.

Comment by Gregoire Pichon [ 12/May/11 ]

Thanks for looking.

The default handler of SIGCHLD is Ignore. This does not mean the signal is blocked, it means there is no routine handler to execute when the signal is delivered to the process.

To reproduce with utime_sigchild, I first executed the test-program with strace and several values of the size parameter to see what size value makes the SIGCHLD signal be delivered as close as possible to the utime() call.
Then I loop on utime_sigchild calls with this size value until it hopefully return an error.

Comment by Niu Yawei (Inactive) [ 12/May/11 ]

Hi, Gregoire

It looks to me that signal whose default handler is ignored will not be delivered, instead, it will be dropped. See prepare_signal() -> sig_ignored() (kernel 2.6.32-71.18.1.el6), that's why I think the SIGCHLD should not affect lustre in your case. What's the size did you usually choose when you reproduce it? And how long it will usually take? Thanks.

Comment by Gregoire Pichon [ 19/May/11 ]

Hi,

To reproduce the issue, I tune the program values: size (number of write() calls), NLOOPS (number of utime() calls), and the child sleep duration. It depends on your cluster configuration: network speed, stripe size, etc... I currently reproduce on a cluster made of 1 server VM and 1 client VM, with tcp network, using size=1, NLOOPS=10, usleep=40ms.

I have investigated the problem using system tap and it appears the SIGCHLD is not ignored. Look at sig_ignored(), the signal is not ignored when it is blocked (see the comment). At the time prepare_signal() is called, the parent process has its signals all blocked.

[root@c2]# while ./utime_sigchild /fs1/file1 1; do : ; done
utime failed: Interrupted system call. parent pid 14071, child pid 14072

[root@c2]# stap -g eintr.stp
14072:prepare_signal() SIGCHLD not ignored:
blocked: ffffffffffffffff <-- All signals are blocked
real_blocked: 0000000000000000
SIGCHLD handler: 0 <-- SIG_DFL

Returning from: 0xffffffffa13a0260 : cl_lock_state_wait+0x0/0x260 [obdclass]
Returning to : 0xffffffffa13a24d5 : cl_enqueue_locked+0x85/0x1e0 [obdclass]
0xffffffffa13a30ce : cl_lock_request+0x7e/0x1f0 [obdclass] (inexact)
0xffffffffa13a844a : cl_io_lock+0x3ba/0x5f0 [obdclass] (inexact)
0xffffffffa13a501d : cl_io_iter_init+0x5d/0x140 [obdclass] (inexact)
0xffffffffa13a877a : cl_io_loop+0xfa/0x1d0 [obdclass] (inexact)
0xffffffffa13a72e4 : cl_io_init+0x64/0xc0 [obdclass] (inexact)
0xffffffff81176fc0 : mntput_no_expire+0x30/0x110 [kernel] (inexact)
0xffffffff81173fd8 : notify_change+0x168/0x340 [kernel] (inexact)
0xffffffff81164bb5 : putname+0x35/0x50 [kernel] (inexact)
0xffffffff8118730c : utimes_common+0xdc/0x1b0 [kernel] (inexact)
0xffffffff811874be : do_utimes+0xde/0xf0 [kernel] (inexact)
0xffffffff81187653 : sys_utime+0x23/0x90 [kernel] (inexact)
0xffffffff8100c172 : system_call_fastpath+0x16/0x1b [kernel] (inexact)
14071:cl_lock_state_wait() EINTR signal info:
pending: 0000000000000000
shared_pending: 0000000000010000 <-- SIGCHLD is pending
blocked: 0000000000000000
real_blocked: 0000000000000000
TIF_SIGPENDING: 1 (entry 1) <-- TIF_SIGPENDING is set at exit of cl_lock_state_wait() and was also at routine entry.

Comment by Niu Yawei (Inactive) [ 19/May/11 ]

Thank you, Gregoire, that explains why the SIGCHLD is delivered.

I just don't quite see why all signals were blocked, look at the l_wait_event(), we do call l_w_e_set_sigs(0) to block all signals before go to sleep, I suspect that the SIGCHLD was generated at that time, so it's delivered.

So far, I don't quite understand why we have to block all signals in l_wait_event(), maybe it's just for preventing SIGSTOP from waking up the process over and over (bz 977)? But I think bocking only SIGSTOP is enough for that case.

Anyway, no matter why the SIGCHLD is delivered (user program might block all signals too), I think we have to deal with the signal in cl_lock_state_wait(), maybe we should go back to sleep when the pending signal is not critical, such as SIGCHLD.

Hi, Oleg, Xiong, what's your opinion? Thanks.

Comment by Jinshan Xiong (Inactive) [ 20/May/11 ]

Niu: l_wait_even() is uninterpretable so it should be woken up by signals.

Let's return -ERESTARTSYS instead of -EINTR in cl_lock_state_wait().

Hi Gregoire,
Can you please try this patch?

diff --git a/lustre/obdclass/cl_lock.c b/lustre/obdclass/cl_lock.c
index de487f6..3b69a53 100644
--- a/lustre/obdclass/cl_lock.c
+++ b/lustre/obdclass/cl_lock.c
@@ -969,7 +969,7 @@ int cl_lock_state_wait(const struct lu_env *env, struct cl_lock *lock)
                 cl_lock_mutex_get(env, lock);
                 cfs_set_current_state(CFS_TASK_RUNNING);
                 cfs_waitq_del(&lock->cll_wq, &waiter);
-                result = cfs_signal_pending() ? -EINTR : 0;
+                result = cfs_signal_pending() ? -ERESTARTSYS : 0;
         }
         RETURN(result);
 }
Comment by Niu Yawei (Inactive) [ 23/May/11 ]

Hi, Xiong

Sometimes user application want to be interruptted (get -EINTR), so we alaways return -ERESTARTSYS to restart syscall here will break such applications, further more, sys_write() should return written bytes but not -ERESTARTSYS when it's interrupted by SIGHLD.

I think we can just temporarily block the non-LUSTRE_FATAL_SIGS sginals in the cl_lock_state_wait() (like l_wait_event does), then we will not be interrupted by the 'unimportant' signals anymore, though this way does change the user application behaviour on signal handling a little bit.

Will post a patch for review soon. Thank you.

Comment by Niu Yawei (Inactive) [ 23/May/11 ]

Hi, Gregoire

Could you try the patch at http://review.whamcloud.com/591 ? Thank you.

Comment by Build Master (Inactive) [ 27/May/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #141
LU-305 Temporarily block 'non-fatal' signals

Oleg Drokin : cdb698a1a036870b6c9d8e51f69809c558d4823a
Files :

  • lustre/include/lustre_lib.h
  • lustre/include/darwin/lustre_lib.h
  • libcfs/libcfs/user-prim.c
  • lustre/obdclass/obd_mount.c
  • libcfs/include/libcfs/libcfs.h
  • libcfs/libcfs/darwin/darwin-prim.c
  • libcfs/libcfs/winnt/winnt-prim.c
  • lustre/obdclass/cl_lock.c
  • lustre/include/linux/lustre_lib.h
  • libcfs/libcfs/linux/linux-prim.c
Comment by Build Master (Inactive) [ 27/May/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #141
LU-305 Temporarily block 'non-fatal' signals

Oleg Drokin : cdb698a1a036870b6c9d8e51f69809c558d4823a
Files :

  • lustre/include/linux/lustre_lib.h
  • libcfs/libcfs/user-prim.c
  • libcfs/libcfs/winnt/winnt-prim.c
  • lustre/include/lustre_lib.h
  • lustre/obdclass/obd_mount.c
  • lustre/include/darwin/lustre_lib.h
  • libcfs/libcfs/darwin/darwin-prim.c
  • libcfs/include/libcfs/libcfs.h
  • libcfs/libcfs/linux/linux-prim.c
  • lustre/obdclass/cl_lock.c
Comment by Build Master (Inactive) [ 27/May/11 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #141
LU-305 Temporarily block 'non-fatal' signals

Oleg Drokin : cdb698a1a036870b6c9d8e51f69809c558d4823a
Files :

  • libcfs/libcfs/user-prim.c
  • lustre/include/darwin/lustre_lib.h
  • lustre/include/linux/lustre_lib.h
  • libcfs/libcfs/darwin/darwin-prim.c
  • libcfs/include/libcfs/libcfs.h
  • lustre/include/lustre_lib.h
  • libcfs/libcfs/linux/linux-prim.c
  • lustre/obdclass/obd_mount.c
  • libcfs/libcfs/winnt/winnt-prim.c
  • lustre/obdclass/cl_lock.c
Comment by Build Master (Inactive) [ 27/May/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #141
LU-305 Temporarily block 'non-fatal' signals

Oleg Drokin : cdb698a1a036870b6c9d8e51f69809c558d4823a
Files :

  • libcfs/include/libcfs/libcfs.h
  • lustre/include/lustre_lib.h
  • libcfs/libcfs/darwin/darwin-prim.c
  • lustre/obdclass/obd_mount.c
  • libcfs/libcfs/winnt/winnt-prim.c
  • lustre/include/darwin/lustre_lib.h
  • libcfs/libcfs/user-prim.c
  • libcfs/libcfs/linux/linux-prim.c
  • lustre/obdclass/cl_lock.c
  • lustre/include/linux/lustre_lib.h
Comment by Build Master (Inactive) [ 27/May/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #141
LU-305 Temporarily block 'non-fatal' signals

Oleg Drokin : cdb698a1a036870b6c9d8e51f69809c558d4823a
Files :

  • libcfs/libcfs/user-prim.c
  • lustre/obdclass/cl_lock.c
  • libcfs/libcfs/linux/linux-prim.c
  • lustre/include/lustre_lib.h
  • lustre/include/darwin/lustre_lib.h
  • lustre/obdclass/obd_mount.c
  • libcfs/libcfs/winnt/winnt-prim.c
  • libcfs/include/libcfs/libcfs.h
  • lustre/include/linux/lustre_lib.h
  • libcfs/libcfs/darwin/darwin-prim.c
Comment by Build Master (Inactive) [ 27/May/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #141
LU-305 Temporarily block 'non-fatal' signals

Oleg Drokin : cdb698a1a036870b6c9d8e51f69809c558d4823a
Files :

  • lustre/obdclass/cl_lock.c
  • lustre/include/lustre_lib.h
  • lustre/include/darwin/lustre_lib.h
  • libcfs/libcfs/winnt/winnt-prim.c
  • libcfs/libcfs/linux/linux-prim.c
  • lustre/obdclass/obd_mount.c
  • lustre/include/linux/lustre_lib.h
  • libcfs/include/libcfs/libcfs.h
  • libcfs/libcfs/user-prim.c
  • libcfs/libcfs/darwin/darwin-prim.c
Comment by Build Master (Inactive) [ 27/May/11 ]

Integrated in lustre-master » i686,client,el5,ofa #141
LU-305 Temporarily block 'non-fatal' signals

Oleg Drokin : cdb698a1a036870b6c9d8e51f69809c558d4823a
Files :

  • lustre/include/lustre_lib.h
  • lustre/include/linux/lustre_lib.h
  • libcfs/libcfs/winnt/winnt-prim.c
  • libcfs/include/libcfs/libcfs.h
  • libcfs/libcfs/linux/linux-prim.c
  • lustre/obdclass/obd_mount.c
  • lustre/obdclass/cl_lock.c
  • libcfs/libcfs/darwin/darwin-prim.c
  • libcfs/libcfs/user-prim.c
  • lustre/include/darwin/lustre_lib.h
Comment by Build Master (Inactive) [ 27/May/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #141
LU-305 Temporarily block 'non-fatal' signals

Oleg Drokin : cdb698a1a036870b6c9d8e51f69809c558d4823a
Files :

  • lustre/obdclass/cl_lock.c
  • libcfs/libcfs/winnt/winnt-prim.c
  • libcfs/libcfs/user-prim.c
  • libcfs/libcfs/darwin/darwin-prim.c
  • libcfs/include/libcfs/libcfs.h
  • lustre/obdclass/obd_mount.c
  • libcfs/libcfs/linux/linux-prim.c
  • lustre/include/lustre_lib.h
  • lustre/include/linux/lustre_lib.h
  • lustre/include/darwin/lustre_lib.h
Comment by Build Master (Inactive) [ 27/May/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #141
LU-305 Temporarily block 'non-fatal' signals

Oleg Drokin : cdb698a1a036870b6c9d8e51f69809c558d4823a
Files :

  • lustre/obdclass/cl_lock.c
  • libcfs/libcfs/darwin/darwin-prim.c
  • lustre/include/linux/lustre_lib.h
  • lustre/obdclass/obd_mount.c
  • lustre/include/lustre_lib.h
  • libcfs/libcfs/user-prim.c
  • libcfs/include/libcfs/libcfs.h
  • libcfs/libcfs/linux/linux-prim.c
  • lustre/include/darwin/lustre_lib.h
  • libcfs/libcfs/winnt/winnt-prim.c
Comment by Build Master (Inactive) [ 27/May/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,ofa #141
LU-305 Temporarily block 'non-fatal' signals

Oleg Drokin : cdb698a1a036870b6c9d8e51f69809c558d4823a
Files :

  • lustre/include/lustre_lib.h
  • libcfs/libcfs/winnt/winnt-prim.c
  • libcfs/libcfs/linux/linux-prim.c
  • lustre/obdclass/cl_lock.c
  • lustre/include/linux/lustre_lib.h
  • libcfs/libcfs/darwin/darwin-prim.c
  • lustre/obdclass/obd_mount.c
  • libcfs/include/libcfs/libcfs.h
  • libcfs/libcfs/user-prim.c
  • lustre/include/darwin/lustre_lib.h
Comment by Build Master (Inactive) [ 27/May/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #141
LU-305 Temporarily block 'non-fatal' signals

Oleg Drokin : cdb698a1a036870b6c9d8e51f69809c558d4823a
Files :

  • lustre/include/linux/lustre_lib.h
  • libcfs/libcfs/user-prim.c
  • libcfs/libcfs/winnt/winnt-prim.c
  • libcfs/libcfs/linux/linux-prim.c
  • libcfs/include/libcfs/libcfs.h
  • libcfs/libcfs/darwin/darwin-prim.c
  • lustre/obdclass/cl_lock.c
  • lustre/include/lustre_lib.h
  • lustre/obdclass/obd_mount.c
  • lustre/include/darwin/lustre_lib.h
Comment by Build Master (Inactive) [ 27/May/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #141
LU-305 Temporarily block 'non-fatal' signals

Oleg Drokin : cdb698a1a036870b6c9d8e51f69809c558d4823a
Files :

  • libcfs/include/libcfs/libcfs.h
  • libcfs/libcfs/user-prim.c
  • lustre/obdclass/obd_mount.c
  • libcfs/libcfs/linux/linux-prim.c
  • libcfs/libcfs/darwin/darwin-prim.c
  • lustre/include/lustre_lib.h
  • lustre/include/darwin/lustre_lib.h
  • lustre/obdclass/cl_lock.c
  • libcfs/libcfs/winnt/winnt-prim.c
  • lustre/include/linux/lustre_lib.h
Comment by Build Master (Inactive) [ 27/May/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #141
LU-305 Temporarily block 'non-fatal' signals

Oleg Drokin : cdb698a1a036870b6c9d8e51f69809c558d4823a
Files :

  • lustre/include/lustre_lib.h
  • libcfs/include/libcfs/libcfs.h
  • libcfs/libcfs/linux/linux-prim.c
  • lustre/obdclass/obd_mount.c
  • libcfs/libcfs/user-prim.c
  • libcfs/libcfs/winnt/winnt-prim.c
  • lustre/obdclass/cl_lock.c
  • lustre/include/darwin/lustre_lib.h
  • libcfs/libcfs/darwin/darwin-prim.c
  • lustre/include/linux/lustre_lib.h
Comment by Build Master (Inactive) [ 27/May/11 ]

Integrated in lustre-master » i686,server,el5,ofa #141
LU-305 Temporarily block 'non-fatal' signals

Oleg Drokin : cdb698a1a036870b6c9d8e51f69809c558d4823a
Files :

  • lustre/include/linux/lustre_lib.h
  • libcfs/libcfs/linux/linux-prim.c
  • libcfs/libcfs/darwin/darwin-prim.c
  • lustre/include/darwin/lustre_lib.h
  • lustre/obdclass/cl_lock.c
  • libcfs/libcfs/user-prim.c
  • lustre/obdclass/obd_mount.c
  • lustre/include/lustre_lib.h
  • libcfs/libcfs/winnt/winnt-prim.c
  • libcfs/include/libcfs/libcfs.h
Comment by Build Master (Inactive) [ 27/May/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #141
LU-305 Temporarily block 'non-fatal' signals

Oleg Drokin : cdb698a1a036870b6c9d8e51f69809c558d4823a
Files :

  • libcfs/libcfs/darwin/darwin-prim.c
  • lustre/obdclass/cl_lock.c
  • libcfs/libcfs/winnt/winnt-prim.c
  • libcfs/include/libcfs/libcfs.h
  • lustre/obdclass/obd_mount.c
  • lustre/include/lustre_lib.h
  • lustre/include/linux/lustre_lib.h
  • lustre/include/darwin/lustre_lib.h
  • libcfs/libcfs/user-prim.c
  • libcfs/libcfs/linux/linux-prim.c
Comment by Gregoire Pichon [ 30/May/11 ]

Hi,

I have tested the patch set 4 with my test program (utime_sigchild) and it fixes the probem.

Thanks.

Comment by Niu Yawei (Inactive) [ 30/May/11 ]

Patch landed in master.

Comment by Brian Murrell (Inactive) [ 05/Jun/12 ]

We still seem to be hitting this on:

lustre: 2.1.1
kernel: patchless_client
build: RC4--PRISTINE-2.6.32-220.el6_lustre.g4554b65.x86_64

The userspace code that is triggering the error is:

if(utime(pathname, &times) == -1) {

And as has been mentioned before, utime() is not supposed to be considered a blocking call and therefore allowed to return -EINTR.

Comment by James A Simmons [ 16/Aug/16 ]

old ticket for unsupported version

Generated at Sat Feb 10 01:05:44 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.