[LU-3581] Recurrence of LU-3020: Lustre returns EINTR during writes when SA_RESTART is set Created: 12/Jul/13  Updated: 23/Dec/13  Resolved: 03/Sep/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.5.0

Type: Bug Priority: Major
Reporter: Patrick Farrell (Inactive) Assignee: Peter Jones
Resolution: Fixed Votes: 0
Labels: mn4, yuc2

Attachments: File eintrlog.tail     File new_eintr_test.c     File test.sh    
Issue Links:
Related
is related to LU-3020 Lustre returns EINTR during writes wh... Resolved
Severity: 3
Rank (Obsolete): 9074

 Description   

This is the same issue as described in LU-3020, where EINTR is returned instead of ERESTARTSYS during writes. This issue is caught by the same reproducer as for LU-3020, but the cause is different.

As I did not hit this issue while testing the fix for LU-3020, I suspect this has been introduced by some subsequent patch. We are seeing this against 2.4 release branch.

This issue is easy to hit without debugging enabled, and very hard to hit with debugging enabled.

Here is the relevant portion of the trace logs:

00000008:00000001:4.0:1372452012.457494:0:13003:0:(osc_cache.c:2206:osc_queue_async_io()) Process entered
00000008:00000001:4.0:1372452012.457495:0:13003:0:(osc_cache.c:543:osc_extent_release()) Process entered
00000008:00000001:4.0:1372452012.457496:0:13003:0:(osc_cache.c:240:osc_extent_sanity_check0()) Process leaving via out (rc=0 : 0 : 0x0)
00000008:00000001:4.0:1372452012.457498:0:13003:0:(osc_cache.c:1616:osc_makes_rpc()) Process entered
00000008:00000001:4.0:1372452012.457499:0:13003:0:(osc_cache.c:1662:osc_makes_rpc()) Process leaving (rc=0 : 0 : 0)
00000008:00000001:4.0:1372452012.457500:0:13003:0:(osc_cache.c:1616:osc_makes_rpc()) Process entered
00000008:00000001:4.0:1372452012.457501:0:13003:0:(osc_cache.c:1652:osc_makes_rpc()) Process leaving (rc=0 : 0 : 0)
00000008:00000001:4.0:1372452012.457502:0:13003:0:(osc_cache.c:575:osc_extent_release()) Process leaving (rc=0 : 0 : 0)
00000008:00000001:4.0:1372452012.457503:0:13003:0:(osc_cache.c:1506:osc_enter_cache()) Process entered
00000100:00000001:1.0F:1372452012.457511:0:5940:0:(ptlrpcd.c:293:ptlrpcd_check()) Process entered
00000008:00000001:4.0:1372452012.457512:0:13003:0:(osc_cache.c:1549:osc_enter_cache()) Process leaving via out (rc=18446744073709551612 : -4 : 0xfffffffffffffffc)
00000100:00000001:0.0F:1372452012.457512:0:5941:0:(ptlrpcd.c:293:ptlrpcd_check()) Process entered
00000100:00000001:1.0:1372452012.457513:0:5940:0:(client.c:1486:ptlrpc_check_set()) Process entered
00000100:00000001:1.0:1372452012.457513:0:5940:0:(client.c:1561:ptlrpc_check_set()) Process leaving via interpret (rc=0 : 0 : 0x0)
00000008:00000001:4.0:1372452012.457514:0:13003:0:(osc_cache.c:1564:osc_enter_cache()) Process leaving (rc=18446744073709551612 : -4 : fffffffffffffffc)
00000100:00000001:0.0:1372452012.457514:0:5941:0:(ptlrpcd.c:395:ptlrpcd_check()) Process leaving (rc=0 : 0 : 0)
00000008:00000001:4.0:1372452012.457515:0:13003:0:(osc_cache.c:2352:osc_queue_async_io()) Process leaving (rc=18446744073709551612 : -4 : fffffffffffffffc)

This is hit during writes, specifically during ll_commit_write. I will be attaching the full log.

This is happening due to a signal arriving during the following l_wait_event call, in osc_enter_cache:
CDEBUG(D_CACHE, "%s: sleeping for cache space @ %p for %p\n",
cli->cl_import->imp_obd->obd_name, &ocw, oap);

rc = l_wait_event(ocw.ocw_waitq, ocw_granted(cli, &ocw), &lwi);

client_obd_list_lock(&cli->cl_loi_list_lock);

/* l_wait_event is interrupted by signal */
if (rc < 0)

{ cfs_list_del_init(&ocw.ocw_entry); GOTO(out, rc); }

I will attach full trace logs. Search for -4 in the log to find the EINTR.

The question is: Is it safe to return ERESTARTSYS here, instead of EINTR?

More generally, Lustre's default behavior in l_wait_event is to return EINTR. Should we consider changing this to ERESTARTSYS and making EINTR the exceptional case? (This may be a terrible idea - I'm just floating it out of curiositiy.)



 Comments   
Comment by Patrick Farrell (Inactive) [ 15/Jul/13 ]

Attached is the reproducer for this. This is a slightly improved version of the reproducer for LU-3020, which randomizes the timing of the operations slightly.

The shell script just helps to attempt to gather debug information.

With debug set to 0, the reproducer hits the problem most of the time. With debugging set to +trace, it is very difficult to hit and may need to be looped for an hour or more.

Comment by Peter Jones [ 15/Jul/13 ]

Niu

Could you please look into this one?

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 16/Jul/13 ]

Yes, seems we'd return -ERESTARTSYS in such case too (interrupted by signal in osc_enter_cache()). I think we should return -ERESTARTSYS only in read/write path but not in l_wait_event(). Patrick, would you post a patch for this? Thanks.

Comment by Patrick Farrell (Inactive) [ 16/Jul/13 ]

Niu,

I'll put the patch up shortly. You said in the read/write path, so would you like it changed in osc_enter_cache? Or somewhere else in the path?

This is what I was thinking:

 /* l_wait_event is interrupted by signal */
                if (rc < 0) {
                        /* Ensures restartability - LU-3581 */
                        if(rc == -EINTR)
                                rc = -ERESTARTSYS;
                        cfs_list_del_init(&ocw.ocw_entry);
                        GOTO(out, rc);
                }
Comment by Niu Yawei (Inactive) [ 16/Jul/13 ]

Patrick, yes, that's exactly what I meant.

Comment by Patrick Farrell (Inactive) [ 16/Jul/13 ]

I've put a patch with that in it up here:
http://review.whamcloud.com/7002

I'll be testing whether it fixes my reproducer soon and will report back.

Comment by Patrick Farrell (Inactive) [ 16/Jul/13 ]

Testing with the reproducer indicates this change resolves the issue.

Thank you, niu.

Comment by Patrick Farrell (Inactive) [ 24/Jul/13 ]

Niu,

I've added you as a reviewer; could you also re-start the testing process? The test failure doesn't appear to be related to the patch.

Thanks,
paf

Comment by Peter Jones [ 03/Sep/13 ]

Landed for 2.5

Generated at Sat Feb 10 01:35:08 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.