[LU-3020] Lustre returns EINTR during writes when SA_RESTART is set Created: 22/Mar/13 Updated: 14/Aug/16 Resolved: 23/Apr/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0, Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Patrick Farrell (Inactive) | Assignee: | Keith Mannthey (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Attachments: |
|
||||||||||||||||
| Issue Links: |
|
||||||||||||||||
| Severity: | 3 | ||||||||||||||||
| Rank (Obsolete): | 7344 | ||||||||||||||||
| Description |
|
Lustre is returning EINTR when it should probably be returning ERESTARTSYS. The POSIX spec specifies that write system calls must react as follows to signals: When SA_RESTART is set, the write call should retry until it completes, and should not return EINTR. As I understand it, this is usually handled by returning ERESTARTSYS instead of EINTR - When this is returned, the kernel checks for SA_RESTART on the handler, if any. If SA_RESTART is set, ERESTARTSYS is left alone and the system call is restarted. If SA_RESTART is not set, ERESTARTSYS is changed to EINTR. (See handle_signal in /arch/x86/kernel/signal.c) When using the attached reproducer, we consistently see -EINTR returned from cl_lock_state_wait in lustre/obdclass/cl_lock.c. (In brief, the reproducer spawns a child, then starts a series of looped writes in the parent while the child sends SIGALRMS to the parent. Once a write fails with EINTR, the parent and child print out and exit.) Turning on full tracing will let you see this return value (-4) coming from cl_lock_state_wait in the logs. (I can attach logs if requested.) Moving on to implications and possible fixes:
However, a quick search through the Lustre code shows Lustre returns EINTR far more often than ERESTARTSYS. This is presumably because Lustre is more complex and therefore less restartable than most file systems. I've made a patch I'll be submitting shortly which changes this specific instance of EINTR to ERESTARTSYS. Some simple testing shows it fixes the problem described above and does not appear to cause other issues. I'll add a link to that patch here once it's submitted. One other note. This problem occurs at exactly the same location during read system calls. The same patch appears to resolve it. Two questions: 2) The reproducer is a contrived test case, but the actual application that revealed this issue sends SIGALRM periodically during its normal operation. We're concerned about the general reaction of Lustre to SIGALRM when a handler is set and whether or not we'll fix this spot, but see the problem return from another location. |
| Comments |
| Comment by Patrick Farrell (Inactive) [ 22/Mar/13 ] |
| Comment by Patrick Farrell (Inactive) [ 25/Mar/13 ] |
|
Updates from the patch review: This is my review comment. I've pushed a patch with the new code below as of a few minutes ago. Here's the current (modified) code: result = -ERESTARTSYS; if (likely(!OBD_FAIL_CHECK(OBD_FAIL_LOCK_STATE_WAIT_INTR))) { cfs_waitq_wait(&waiter, CFS_TASK_INTERRUPTIBLE); if (!cfs_signal_pending()) result = 0; } — I'm wondering if this might not be better, since we only want to return ERESTARTSYS when a signal is involved.. (Since ERESTARTSYS should never be returned to user space, and is processed by the signal handling code in the kernel [whether or not a user handler is set]): result = -EINTR; if (likely(!OBD_FAIL_CHECK(OBD_FAIL_LOCK_STATE_WAIT_INTR))) { cfs_waitq_wait(&waiter, CFS_TASK_INTERRUPTIBLE); if (!cfs_signal_pending()) { result = 0; } else { /* Returning ERESTARTSYS if a signal is found so * system calls can be restarted if the signal handler * calls for it. */ result = -ERESTARTSYS; } } — |
| Comment by Cory Spitz [ 19/Apr/13 ] |
|
Can this land now? Or must we wait until after b2_4 is branched? |
| Comment by Peter Jones [ 23/Apr/13 ] |
|
Landed for 2.4 |