Details
- Bug
- Resolution: Fixed
- Critical
Description
On current master (2.8.56), a newly created file system fails to mount the OST because of a nodemap log error.
From the log message:
00000100:00000001:3.0:1470938259.243549:0:9524:0:(client.c:1052:ptlrpc_set_destroy()) Process leaving
00000100:00000001:3.0:1470938259.243549:0:9524:0:(client.c:2896:ptlrpc_queue_wait()) Process leaving (rc=0 : 0 : 0)
10000000:00000001:3.0:1470938259.243551:0:9524:0:(mgc_request.c:1716:mgc_process_recover_nodemap_log()) Process leaving via out (rc=18446744073709551594 : -22 : 0xffffffffffffffea)
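For reference, the three values after rc= in the last trace line are the same 64-bit number shown as unsigned, signed, and hexadecimal, so the error is -22, which is -EINVAL on Linux. A minimal sketch of that formatting, assuming a 64-bit return code; format_rc is a hypothetical helper written for illustration, not a Lustre function:

```c
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of Lustre): render a return code as
 * "unsigned : signed : hex", the layout seen in the trace line above.
 */
static int format_rc(char *buf, size_t len, int64_t rc)
{
    return snprintf(buf, len,
                    "%" PRIu64 " : %" PRId64 " : 0x%" PRIx64,
                    (uint64_t)rc, rc, (uint64_t)rc);
}
```

Calling format_rc(buf, sizeof(buf), -EINVAL) on Linux should give the same triple as the mgc_process_recover_nodemap_log() line.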
It looks like the corresponding log has zero size, which triggered this error:
if (ealen == 0) { /* no logs transferred */
#ifdef HAVE_SERVER_SUPPORT
        /* config changed since first read RPC */
        if (cld_is_nodemap(cld) && config_read_offset == 0) {
                recent_nodemap = NULL;
                nodemap_config_dealloc(new_config);
                new_config = NULL;
                CDEBUG(D_INFO, "nodemap config changed in transit, retrying\n");
                /* setting eof to false, we request config again */
                eof = false;
                GOTO(out, rc = 0);
        }
#endif
        if (!eof)
                rc = -EINVAL;
        GOTO(out, rc);
}
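To make the branches easier to follow, here is a standalone sketch of the same decision with server support assumed to be compiled in. The struct, enum, and function names are hypothetical stand-ins for illustration; only the branch order and the -EINVAL result are taken from the snippet above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state the quoted check inspects. */
struct read_state {
    size_t ealen;              /* bytes of log data in the reply */
    bool eof;                  /* end of config reached */
    bool is_nodemap;           /* stands in for cld_is_nodemap(cld) */
    size_t config_read_offset; /* offset used for this read RPC */
};

enum read_action { READ_CONTINUE, READ_RETRY, READ_DONE, READ_FAIL };

/* Mirrors the branch order of the quoted zero-length check. */
static enum read_action classify_reply(struct read_state *st, int *rc)
{
    *rc = 0;
    if (st->ealen != 0)
        return READ_CONTINUE; /* data arrived; the check does not apply */

    if (st->is_nodemap && st->config_read_offset == 0) {
        st->eof = false;      /* request the config again */
        return READ_RETRY;
    }
    if (!st->eof) {
        *rc = -EINVAL;        /* empty reply before eof */
        return READ_FAIL;
    }
    return READ_DONE;
}
```

In this model, -EINVAL is only reachable from this check when the reply is empty, eof is not set, and the nodemap retry branch is not taken. Whether that is the exact path hit in the trace is an assumption.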
We have a debug log and will attach it soon.
Issue Links
- is related to LU-3291 IU UID/GID Mapping Feature (Resolved)
Hi John,
Thanks for the logs. I took a quick look, but there's nothing obvious. The MDS says it's sending over a 1MB config RPC, so I'm not sure why the MGC thinks it's not getting anything. I'll take a closer look tomorrow.
Can you confirm you are just running straight master, no patches? FWIW line 1716 doesn't correspond to a GOTO statement on the tip of master for me (hash 6fad3ab).
As a workaround, you could try changing the return code in that eof check from -EINVAL to 0. Receiving a zero-length RPC shouldn't cause any problems if you aren't using nodemap, but as far as I understand it, it also shouldn't happen. Here's the eof check I mean:
if (!eof)
        rc = 0;
What does your test setup look like? Is there any way to reproduce the failure in Maloo?
When you say "since this feature landed I have been unable to use master" can you clarify which feature you mean? There have been a number of patches related to nodemap config transfer that have landed in the past couple of months. If you could specify the last version (commit hash) that worked, and the first version that didn't, that would be helpful.
Thanks,
Kit