Details
- Bug
- Resolution: Fixed
- Major
- Lustre 2.4.0
- 3
- 9074
Description
This is the same issue as described in LU-3020, where EINTR is returned instead of ERESTARTSYS during writes. This issue is caught by the same reproducer as for LU-3020, but the cause is different.
As I did not hit this issue while testing the fix for LU-3020, I suspect it was introduced by a subsequent patch. We are seeing this on the 2.4 release branch.
This issue is easy to hit without debugging enabled, and very hard to hit with debugging enabled.
Here is the relevant portion of the trace logs:
—
00000008:00000001:4.0:1372452012.457494:0:13003:0:(osc_cache.c:2206:osc_queue_async_io()) Process entered
00000008:00000001:4.0:1372452012.457495:0:13003:0:(osc_cache.c:543:osc_extent_release()) Process entered
00000008:00000001:4.0:1372452012.457496:0:13003:0:(osc_cache.c:240:osc_extent_sanity_check0()) Process leaving via out (rc=0 : 0 : 0x0)
00000008:00000001:4.0:1372452012.457498:0:13003:0:(osc_cache.c:1616:osc_makes_rpc()) Process entered
00000008:00000001:4.0:1372452012.457499:0:13003:0:(osc_cache.c:1662:osc_makes_rpc()) Process leaving (rc=0 : 0 : 0)
00000008:00000001:4.0:1372452012.457500:0:13003:0:(osc_cache.c:1616:osc_makes_rpc()) Process entered
00000008:00000001:4.0:1372452012.457501:0:13003:0:(osc_cache.c:1652:osc_makes_rpc()) Process leaving (rc=0 : 0 : 0)
00000008:00000001:4.0:1372452012.457502:0:13003:0:(osc_cache.c:575:osc_extent_release()) Process leaving (rc=0 : 0 : 0)
00000008:00000001:4.0:1372452012.457503:0:13003:0:(osc_cache.c:1506:osc_enter_cache()) Process entered
00000100:00000001:1.0F:1372452012.457511:0:5940:0:(ptlrpcd.c:293:ptlrpcd_check()) Process entered
00000008:00000001:4.0:1372452012.457512:0:13003:0:(osc_cache.c:1549:osc_enter_cache()) Process leaving via out (rc=18446744073709551612 : -4 : 0xfffffffffffffffc)
00000100:00000001:0.0F:1372452012.457512:0:5941:0:(ptlrpcd.c:293:ptlrpcd_check()) Process entered
00000100:00000001:1.0:1372452012.457513:0:5940:0:(client.c:1486:ptlrpc_check_set()) Process entered
00000100:00000001:1.0:1372452012.457513:0:5940:0:(client.c:1561:ptlrpc_check_set()) Process leaving via interpret (rc=0 : 0 : 0x0)
00000008:00000001:4.0:1372452012.457514:0:13003:0:(osc_cache.c:1564:osc_enter_cache()) Process leaving (rc=18446744073709551612 : -4 : fffffffffffffffc)
00000100:00000001:0.0:1372452012.457514:0:5941:0:(ptlrpcd.c:395:ptlrpcd_check()) Process leaving (rc=0 : 0 : 0)
00000008:00000001:4.0:1372452012.457515:0:13003:0:(osc_cache.c:2352:osc_queue_async_io()) Process leaving (rc=18446744073709551612 : -4 : fffffffffffffffc)
—
This is hit during writes, specifically during ll_commit_write. I will be attaching the full log.
This happens when a signal arrives during the following l_wait_event call in osc_enter_cache:
CDEBUG(D_CACHE, "%s: sleeping for cache space @ %p for %p\n",
       cli->cl_import->imp_obd->obd_name, &ocw, oap);

rc = l_wait_event(ocw.ocw_waitq, ocw_granted(cli, &ocw), &lwi);

client_obd_list_lock(&cli->cl_loi_list_lock);

/* l_wait_event is interrupted by signal */
if (rc < 0)
—
In the full trace log, search for -4 to find the EINTR.
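Each "Process leaving" line in the trace prints the same return code three ways: as an unsigned 64-bit decimal, as a signed decimal, and in hex. The small sketch below shows why 18446744073709551612 and 0xfffffffffffffffc are both -4, that is, -EINTR. The helper names are hypothetical, and the check assumes a platform where EINTR is 4, as it is on Linux.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/*
 * Reinterpret the unsigned 64-bit value printed in the trace as the
 * signed return code it represents (two's complement).
 */
int64_t trace_rc_to_signed(uint64_t raw)
{
    int64_t rc;

    memcpy(&rc, &raw, sizeof(rc));
    return rc;
}

/* Nonzero if the raw trace value encodes -EINTR. */
int trace_rc_is_eintr(uint64_t raw)
{
    return trace_rc_to_signed(raw) == -EINTR;
}
```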
The question is: is it safe to return ERESTARTSYS here instead of EINTR?
More generally, Lustre's default behavior in l_wait_event is to return EINTR. Should we consider changing this to ERESTARTSYS and making EINTR the exceptional case? (This may be a terrible idea - I'm just floating it out of curiosity.)
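For context on why the two codes behave differently, here is a simplified user-space model of what the Linux signal-delivery path does with a syscall return value. It is a sketch for illustration, not kernel or Lustre code: the type and function names are hypothetical, and the constant mirrors the kernel-internal value of ERESTARTSYS (512), which is never supposed to reach user space. A return of -EINTR is passed straight to the caller regardless of SA_RESTART, whereas -ERESTARTSYS is either turned into a restart (handler installed with SA_RESTART) or converted to -EINTR (handler without it).

```c
#include <errno.h>

/* Kernel-internal value on Linux; defined here only for the model. */
#define HYPOTHETICAL_ERESTARTSYS 512

enum syscall_outcome {
    OUTCOME_RETURN_TO_USER, /* user space sees user_rc */
    OUTCOME_RESTART_SYSCALL /* syscall is re-executed after the handler */
};

struct syscall_result {
    enum syscall_outcome outcome;
    long user_rc; /* meaningful only for OUTCOME_RETURN_TO_USER */
};

/*
 * Model of the decision made when a signal handler runs after a syscall
 * returned kernel_rc. Only -ERESTARTSYS is affected by SA_RESTART; any
 * other value, including -EINTR, is returned to user space unchanged.
 */
struct syscall_result model_signal_return(long kernel_rc,
                                          int handler_has_sa_restart)
{
    struct syscall_result res = { OUTCOME_RETURN_TO_USER, kernel_rc };

    if (kernel_rc != -HYPOTHETICAL_ERESTARTSYS)
        return res;

    if (handler_has_sa_restart) {
        res.outcome = OUTCOME_RESTART_SYSCALL;
        res.user_rc = 0;
    } else {
        res.user_rc = -EINTR;
    }
    return res;
}
```

Under this model, returning -EINTR from osc_enter_cache is what makes the error visible to an application that asked for SA_RESTART. Whether a restart is safe at this particular point in the write path is the open question above, and the model does not answer it.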
Attachments
Issue Links
- is related to: LU-3020 Lustre returns EINTR during writes when SA_RESTART is set (Resolved)