[LU-305] utime() fails with EINTR : not conform to POSIX standard Created: 11/May/11 Updated: 16/Aug/16 Resolved: 16/Aug/16 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.0.0 |
| Fix Version/s: | Lustre 2.0.0, Lustre 2.1.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Gregoire Pichon | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None |
| Environment: |
RHEL 6.0 GA, Lustre 2.0.0.1 |
||
| Severity: | 3 |
| Epic: | EINTR, SIGCHLD, signal, tar, utime |
| Rank (Obsolete): | 5016 |
| Description |
|
When uncompressing an archive in a Lustre filesystem on a client node, the tar command fails. The error comes from the utime() system call failing with EINTR when the modification time of the extracted file is updated. However, EINTR is not mentioned as a possible error code for utime() in the POSIX standard.

The problem can be reproduced quite easily on the client node, but it does not reproduce with Lustre logs enabled (echo "-1" > /proc/sys/lnet/debug).

$ pwd
$ tar xvfoz netcdf-3.6.1.tar.gz
netcdf-3.6.1/src/win32/NET/examples/Form1.resX

Here is the output of 'strace' with the same command.

The tar command forks a child process to uncompress the archive (gunzip), while the parent process creates the extracted files, writes the data and restores the initial file attributes (modification time). When the child process exits, the parent process receives a SIGCHLD signal. Note that the tar command sets the signal handler of SIGCHLD to SIG_DFL (which, for SIGCHLD, means 'Ignore'). The signal may lead to the interruption of the utime() implementation in Lustre.

I have been able to reproduce a similar EINTR with a test program on one of our test clusters, with the Lustre logs enabled. The error occurs during a write() system call (which POSIX allows) and comes from the cl_lock_state_wait() routine in lustre/obdclass/cl_lock.c. This routine makes the thread wait on a wait queue, and when the thread wakes up, the routine checks the thread's pending signals with cfs_signal_pending().

Is the cl_lock_state_wait() routine part of the call path of the utime() system call on Lustre? Maybe Lustre should avoid any interruptible wait in the utime() call path?

In attachment are
|
| Comments |
| Comment by Peter Jones [ 11/May/11 ] |
|
Niu, could you please look into this one? Thanks, Peter |
| Comment by Niu Yawei (Inactive) [ 11/May/11 ] |
|
It looks reasonable to me that cl_lock_state_wait() returns -EINTR when there is a signal pending. As Gregoire mentioned, the signal action of SIGCHLD has been explicitly set to SIG_DFL (in fact, the default signal action of SIGCHLD is already SIG_DFL), so I think SIGCHLD should not be delivered to the tar process, and Lustre should not be interrupted by this signal, unless the tar process is being traced. Hi, Gregoire BTW: since utime() calls ll_setattr_ost() to set mtime on the OSTs, cl_lock_state_wait() could be called on the utime() code path as well. |
| Comment by Gregoire Pichon [ 12/May/11 ] |
|
Thanks for looking. The default action for SIGCHLD is Ignore. This does not mean the signal is blocked; it means there is no handler routine to execute when the signal is delivered to the process. To reproduce with utime_sigchild, I first ran the test program under strace with several values of the size parameter, to see which value makes the SIGCHLD signal arrive as close as possible to the utime() call. |
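The reproduction approach described above can be sketched roughly as follows. This is not the attached utime_sigchild program, whose source is not shown in this ticket; the function name count_utime_eintr, its parameters and the 4096-byte write size are assumptions made for illustration. The child sleeps and then exits, so that SIGCHLD reaches the parent while it is writing to the file and then calling utime() in a loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>
#include <utime.h>

/*
 * Hypothetical stand-in for the attached reproducer (names and parameters
 * are assumptions). Returns the number of utime() calls that failed with
 * EINTR, or -1 if any other error occurred.
 */
int count_utime_eintr(const char *path, int nwrites, int nloops,
                      long child_sleep_ms)
{
    static const char block[4096];   /* zero-filled payload */
    int failed = 0;
    int eintr_count = 0;

    assert(path != NULL && nwrites >= 0 && nloops >= 0 && child_sleep_ms >= 0);

    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd == -1)
        return -1;

    pid_t child = fork();
    if (child == -1) {
        close(fd);
        return -1;
    }
    if (child == 0) {
        /* Child: sleep, then exit so that the parent gets SIGCHLD. */
        struct timespec delay = { child_sleep_ms / 1000,
                                  (child_sleep_ms % 1000) * 1000000L };
        nanosleep(&delay, NULL);
        _exit(0);
    }

    /* Parent: write() calls first (POSIX allows EINTR here). */
    for (int i = 0; i < nwrites; i++) {
        if (write(fd, block, sizeof(block)) == -1 && errno != EINTR)
            failed = 1;
    }

    /* Then the utime() loop; a NULL buffer means "set to the current time". */
    for (int i = 0; i < nloops; i++) {
        if (utime(path, NULL) == -1) {
            if (errno == EINTR)
                eintr_count++;
            else
                failed = 1;
        }
    }

    close(fd);

    /* Reap the child, retrying if waitpid() itself is interrupted. */
    while (waitpid(child, NULL, 0) == -1 && errno == EINTR)
        ;

    return failed ? -1 : eintr_count;
}
```

A caller would pass a path on the filesystem under test and tune the three numeric parameters until the child's exit coincides with the utime() loop. On a filesystem that never interrupts utime(), the function should simply return 0.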
| Comment by Niu Yawei (Inactive) [ 12/May/11 ] |
|
Hi, Gregoire It looks to me that a signal whose default action is Ignore will not be delivered; instead, it will be dropped. See prepare_signal() -> sig_ignored() (kernel 2.6.32-71.18.1.el6). That is why I think SIGCHLD should not affect Lustre in your case. What size do you usually choose when you reproduce it? And how long does it usually take? Thanks. |
| Comment by Gregoire Pichon [ 19/May/11 ] |
|
Hi, To reproduce the issue, I tune the program parameters: size (number of write() calls), NLOOPS (number of utime() calls), and the child sleep duration. The right values depend on your cluster configuration: network speed, stripe size, etc. I currently reproduce on a cluster made of 1 server VM and 1 client VM, with a TCP network, using size=1, NLOOPS=10, usleep=40ms.

I have investigated the problem using SystemTap and it appears that SIGCHLD is not ignored. Look at sig_ignored(): the signal is not ignored when it is blocked (see the comment). At the time prepare_signal() is called, the parent process has all its signals blocked.

[root@c2]# while ./utime_sigchild /fs1/file1 1; do : ; done

[root@c2]# stap -g eintr.stp
Returning from: 0xffffffffa13a0260 : cl_lock_state_wait+0x0/0x260 [obdclass] |
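The point about sig_ignored() can be illustrated from user space. The sketch below is an assumption-laden illustration, not code from the ticket: the function name is hypothetical, and it relies on Linux behaviour (a blocked signal is queued even when its disposition would ignore it, and waitpid() does not clear a pending SIGCHLD). It forks a child that exits immediately and reports whether SIGCHLD is left pending in the parent.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper. Returns 1 if SIGCHLD is pending after a child has
 * exited, 0 if it is not, and -1 on error. block_first selects whether
 * SIGCHLD is blocked (non-zero) or unblocked (zero) while the child exits.
 */
int sigchld_pending_after_child_exit(int block_first)
{
    sigset_t chld, old_mask, pending;
    struct sigaction dfl, old_action;

    /* Force the default disposition, which for SIGCHLD is "ignore". */
    dfl.sa_handler = SIG_DFL;
    dfl.sa_flags = 0;
    sigemptyset(&dfl.sa_mask);
    if (sigaction(SIGCHLD, &dfl, &old_action) == -1)
        return -1;

    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    if (sigprocmask(block_first ? SIG_BLOCK : SIG_UNBLOCK,
                    &chld, &old_mask) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1) {
        sigprocmask(SIG_SETMASK, &old_mask, NULL);
        sigaction(SIGCHLD, &old_action, NULL);
        return -1;
    }
    if (child == 0)
        _exit(0);

    /* Once waitpid() returns, SIGCHLD has already been generated. */
    while (waitpid(child, NULL, 0) == -1 && errno == EINTR)
        ;

    int result = -1;
    if (sigpending(&pending) == 0)
        result = sigismember(&pending, SIGCHLD);

    /* Restoring the mask lets any queued SIGCHLD be discarded (SIG_DFL). */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_action, NULL);
    return result;
}
```

On Linux, calling the helper with block_first set is expected to find SIGCHLD pending, while calling it with SIGCHLD unblocked is expected to find nothing pending, because the signal is dropped at generation time. That matches the distinction made above between a signal that is ignored and one that is blocked.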
| Comment by Niu Yawei (Inactive) [ 19/May/11 ] |
|
Thank you, Gregoire, that explains why SIGCHLD is delivered. I just don't quite see why all signals were blocked. Looking at l_wait_event(), we do call l_w_e_set_sigs(0) to block all signals before going to sleep; I suspect that SIGCHLD was generated at that time, so it was delivered. So far, I don't quite understand why we have to block all signals in l_wait_event(); maybe it is just to prevent SIGSTOP from waking up the process over and over (bz 977)? But I think blocking only SIGSTOP is enough for that case. Anyway, no matter why SIGCHLD is delivered (a user program might block all signals too), I think we have to deal with the signal in cl_lock_state_wait(); maybe we should go back to sleep when the pending signal is not critical, such as SIGCHLD. Hi, Oleg, Xiong, what's your opinion? Thanks. |
| Comment by Jinshan Xiong (Inactive) [ 20/May/11 ] |
|
Niu: l_wait_event() is uninterruptible so it should be woken up by signals. Let's return -ERESTARTSYS instead of -EINTR in cl_lock_state_wait(). Hi Gregoire,

diff --git a/lustre/obdclass/cl_lock.c b/lustre/obdclass/cl_lock.c
index de487f6..3b69a53 100644
--- a/lustre/obdclass/cl_lock.c
+++ b/lustre/obdclass/cl_lock.c
@@ -969,7 +969,7 @@ int cl_lock_state_wait(const struct lu_env *env, struct cl_lock *lock)
                 cl_lock_mutex_get(env, lock);
                 cfs_set_current_state(CFS_TASK_RUNNING);
                 cfs_waitq_del(&lock->cll_wq, &waiter);
-                result = cfs_signal_pending() ? -EINTR : 0;
+                result = cfs_signal_pending() ? -ERESTARTSYS : 0;
         }
         RETURN(result);
 } |
| Comment by Niu Yawei (Inactive) [ 23/May/11 ] |
|
Hi, Xiong Sometimes a user application wants to be interrupted (get -EINTR), so always returning -ERESTARTSYS to restart the syscall here will break such applications. Furthermore, sys_write() should return the number of bytes written, not -ERESTARTSYS, when it is interrupted by SIGCHLD. I think we can just temporarily block the non-LUSTRE_FATAL_SIGS signals in cl_lock_state_wait() (as l_wait_event() does); then we will no longer be interrupted by the 'unimportant' signals, though this does change the user application's signal-handling behaviour a little. I will post a patch for review soon. Thank you. |
| Comment by Niu Yawei (Inactive) [ 23/May/11 ] |
|
Hi, Gregoire Could you try the patch at http://review.whamcloud.com/591 ? Thank you. |
| Comment by Build Master (Inactive) [ 27/May/11 ] |
|
Integrated in Oleg Drokin : cdb698a1a036870b6c9d8e51f69809c558d4823a
|
| Comment by Gregoire Pichon [ 30/May/11 ] |
|
Hi, I have tested patch set 4 with my test program (utime_sigchild) and it fixes the problem. Thanks. |
| Comment by Niu Yawei (Inactive) [ 30/May/11 ] |
|
Patch landed in master. |
| Comment by Brian Murrell (Inactive) [ 05/Jun/12 ] |
|
We still seem to be hitting this on:

lustre: 2.1.1

The userspace code that is triggering the error is:

if(utime(pathname, &times) == -1) {

And as has been mentioned before, utime() is not supposed to be considered a blocking call, and is therefore not allowed to return -EINTR. |
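For applications that hit this on an affected client, one possible userspace workaround is to retry the call when it fails with EINTR. The sketch below is only an illustration and is not a fix proposed anywhere in this ticket; the helper name utime_retry_eintr and the max_retries parameter are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <utime.h>

/*
 * Hypothetical helper, not part of any library. Calls utime() and retries
 * up to max_retries times while it fails with EINTR. Returns the result of
 * the last utime() call, with errno set by that call on failure.
 */
int utime_retry_eintr(const char *path, const struct utimbuf *times,
                      int max_retries)
{
    int rc;
    int retries = 0;

    assert(path != NULL && max_retries >= 0);

    do {
        rc = utime(path, times);
    } while (rc == -1 && errno == EINTR && retries++ < max_retries);

    return rc;
}
```

The failing call quoted above would then read utime_retry_eintr(pathname, &times, 3) or similar. Errors other than EINTR are returned to the caller unchanged, so existing error handling keeps working.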
| Comment by James A Simmons [ 16/Aug/16 ] |
|
old ticket for unsupported version |