[LU-3302] ll_fill_super() Unable to process log: -2 Created: 09/May/13 Updated: 17/May/13 Resolved: 10/May/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Ned Bass | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | LB | ||
| Environment: |
PPC client |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 8173 | ||||||||
| Description |
|
We updated a client to 2.3.64-4chaos and tried to mount a 2.3.63-6chaos server. The mount fails with

LustreError: 15c-8: MGC172.20.20.201@o2ib500: The configuration from log 'fsv-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 14351:0:(llite_lib.c:1043:ll_fill_super()) Unable to process log: -2
Lustre: Unmounted fsv-client
LustreError: 14351:0:(obd_mount.c:1265:lustre_fill_super()) Unable to mount (-2)

Using git bisect I found the mount failure was introduced with this patch:

http://review.whamcloud.com/#change,5820 LU-2684 fid: unify ostid and FID

The critical questions at this point are:
LLNL-bug-id: TOSS-2060 |
| Comments |
| Comment by Ned Bass [ 09/May/13 ] |
|
As an editorial comment, while we understand that interoperability issues are inevitable in a pre-release branch, we wish such changes would be advertised more prominently. Clear statements about compatibility between tags would really help us plan our update process. At a minimum, patches that introduce incompatibilities should say so clearly in the commit message. |
| Comment by Ned Bass [ 09/May/13 ] |
|
Di, can you advise us on this? Thanks |
| Comment by Andreas Dilger [ 09/May/13 ] |
|
Ned, can you please attach a -1 debug log from the 2.3.64 client, and ideally also from the MGS. I agree that the |
| Comment by Ned Bass [ 09/May/13 ] |
|
Andreas, yes I'll grab the logs. Note the above error was a 2.3.64 client talking to a 2.3.63 server. Do you mean that patch 6044 fixed LLOG handling on the client, or is it needed on the server as well? |
| Comment by Ned Bass [ 09/May/13 ] |
|
Attaching -1 debug logs for client and MDS. Note these were not captured from the same mount attempt. The NID of the client is 172.20.16.10@o2ib500. |
| Comment by Ned Bass [ 09/May/13 ] |
|
I did notice the mgs got ENOENT handling opcodes LLOG_ORIGIN_HANDLE_CREATE and LLOG_ORIGIN_HANDLE_READ_HEADER:

20000000:01000000:6.0:1368040430.786604:0:18265:0:(mgs_handler.c:757:mgs_handle()) @@@ MGS fail to handle opc = 501: rc = -2 req@ffff881019ecb050 x1434494006460492/t0(0) o501->2e89e428-68d9-71a1-75f0-147bc1963566@172.20.16.10@o2ib500:0/0 lens 296/0 e 0 to 0 dl 1368040491 ref 1 fl Interpret:/0/ffffffff rc 0/-1
...
20000000:01000000:6.0:1368040430.788063:0:18265:0:(mgs_handler.c:757:mgs_handle()) @@@ MGS fail to handle opc = 503: rc = -2 req@ffff881019f14850 x1434494006460504/t0(0) o503->2e89e428-68d9-71a1-75f0-147bc1963566@172.20.16.10@o2ib500:0/0 lens 272/0 e 0 to 0 dl 1368040491 ref 1 fl Interpret:/0/ffffffff rc 0/-1 |
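For anyone scanning a large debug log for the same failures, here is a minimal sketch that pulls the opcode and return code out of the "MGS fail to handle" lines and decodes them. The opcode-to-name pairing is taken from the order of the names and numbers in the comment above, so treat it as an assumption. The helper name `decode_failure` is hypothetical and not part of any Lustre tool.

```python
import errno
import os
import re

# Assumed pairing, inferred from the comment above (501 and 503 appear in
# the log lines in the same order as the two opcode names are mentioned).
LLOG_OPCODES = {
    501: "LLOG_ORIGIN_HANDLE_CREATE",
    503: "LLOG_ORIGIN_HANDLE_READ_HEADER",
}

# Matches the message text printed by mgs_handle() in the log lines above.
FAIL_RE = re.compile(r"MGS fail to handle opc = (\d+): rc = (-?\d+)")


def decode_failure(line):
    """Return the opcode name and errno name for one log line, or None."""
    match = FAIL_RE.search(line)
    if match is None:
        return None
    opc, rc = int(match.group(1)), int(match.group(2))
    return {
        "opcode": LLOG_OPCODES.get(opc, f"unknown({opc})"),
        # Kernel-style return codes are negative errno values.
        "errno": errno.errorcode.get(-rc, f"unknown({-rc})"),
        "message": os.strerror(-rc),
    }


sample = (
    "(mgs_handler.c:757:mgs_handle()) @@@ MGS fail to handle "
    "opc = 501: rc = -2 req@ffff881019ecb050"
)
print(decode_failure(sample))
```

On Linux, rc = -2 decodes to ENOENT, which matches the "-2" reported by ll_fill_super() on the client.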
| Comment by Andreas Dilger [ 09/May/13 ] |
|
The 6044/ |
| Comment by Peter Jones [ 09/May/13 ] |
|
Di Could you please comment on this? Thanks Peter |
| Comment by Andreas Dilger [ 09/May/13 ] |
|
Ned, is this a PPC client? It would be useful to include this information in the "Environment" section when filing a bug. |
| Comment by Ned Bass [ 09/May/13 ] |
|
Yes it is. Sorry for the omission. |
| Comment by Andreas Dilger [ 09/May/13 ] |
|
John, it was mentioned to me that you have already found some endian issues with the FID-on-OST code. Could you please point out where they are? They might be the source of the problem seen here, since we didn't see any interoperability problems with our x86_64 clients. |
| Comment by John Hammond [ 09/May/13 ] |
|
Possibly. Please see Ned, it would be interesting to know what happens when you create a new 2.3.65 FS on ppc, unmount, and then remount it. |
| Comment by Ned Bass [ 09/May/13 ] |
|
John, okay, we're getting a test environment set up where I should be able to do that test. |
| Comment by Ned Bass [ 09/May/13 ] |
|
Haven't tried 2.3.65 yet, but initial testing suggests updating the server to 2.3.64 lets the mount succeed. Here's what I did:
1. Tried to mount a 2.3.62 server from a 2.3.64 PPC client. Fails with "ll_fill_super() Unable to process log: -2" |
| Comment by Di Wang [ 09/May/13 ] |
|
Ned, I just checked the debug log. It seems the client gets the correct log ID after the swab. Here is the client log:

00000040:00000001:5.0:1368040600.989913:5152:8187:0:(llog_swab.c:86:lustre_swab_llogd_body()) Process entered
00000040:00001000:5.0:1368040600.989914:5328:8187:0:(llog_swab.c:53:print_llogd_body()) llogd body: c000000f50e9a100
00000040:00001000:5.0:1368040600.989915:5328:8187:0:(llog_swab.c:55:print_llogd_body()) lgd_logid.lgl_oi: 0x6400000000000000:16777216
00000040:00001000:5.0:1368040600.989915:5328:8187:0:(llog_swab.c:56:print_llogd_body()) lgd_logid.lgl_ogen: 0x0
00000040:00001000:5.0:1368040600.989916:5328:8187:0:(llog_swab.c:57:print_llogd_body()) lgd_ctxt_idx: 0x0
00000040:00001000:5.0:1368040600.989917:5328:8187:0:(llog_swab.c:58:print_llogd_body()) lgd_llh_flags: 0x0
00000040:00001000:5.0:1368040600.989917:5328:8187:0:(llog_swab.c:59:print_llogd_body()) lgd_index: 0x0
00000040:00001000:5.0:1368040600.989918:5328:8187:0:(llog_swab.c:60:print_llogd_body()) lgd_saved_index: 0x0
00000040:00001000:5.0:1368040600.989918:5328:8187:0:(llog_swab.c:61:print_llogd_body()) lgd_len: 0x0
00000040:00001000:5.0:1368040600.989919:5328:8187:0:(llog_swab.c:62:print_llogd_body()) lgd_cur_offset: 0x0
00000040:00001000:5.0:1368040600.989920:5328:8187:0:(llog_swab.c:53:print_llogd_body()) llogd body: c000000f50e9a100
00000040:00001000:5.0:1368040600.989920:5328:8187:0:(llog_swab.c:55:print_llogd_body()) lgd_logid.lgl_oi: 0x64:1
00000040:00001000:5.0:1368040600.989921:5328:8187:0:(llog_swab.c:56:print_llogd_body()) lgd_logid.lgl_ogen: 0x0
00000040:00001000:5.0:1368040600.989921:5328:8187:0:(llog_swab.c:57:print_llogd_body()) lgd_ctxt_idx: 0x0
00000040:00001000:5.0:1368040600.989922:5328:8187:0:(llog_swab.c:58:print_llogd_body()) lgd_llh_flags: 0x0
00000040:00001000:5.0:1368040600.989923:5328:8187:0:(llog_swab.c:59:print_llogd_body()) lgd_index: 0x0
00000040:00001000:5.0:1368040600.989923:5328:8187:0:(llog_swab.c:60:print_llogd_body()) lgd_saved_index: 0x0
00000040:00001000:5.0:1368040600.989924:5328:8187:0:(llog_swab.c:61:print_llogd_body()) lgd_len: 0x0
00000040:00001000:5.0:1368040600.989924:5328:8187:0:(llog_swab.c:62:print_llogd_body()) lgd_cur_offset: 0x0
00000040:00000001:5.0:1368040600.989925:5152:8187:0:(llog_swab.c:97:lustre_swab_llogd_body()) Process leaving

But somehow the server cannot find the log object by this ID. Unfortunately, I cannot find the corresponding MGS handling information in the MDS debug log. Could you please redo the test and update the debug log? In the meantime, I do see there are some problems during the logid swab (John also pointed out one in |
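The before and after values of lgd_logid.lgl_oi in the log above are consistent with a plain byte-order reversal. The sketch below reproduces that arithmetic. It assumes the first field is swabbed as a 64-bit value and the second as a 32-bit value, which is inferred only from the printed numbers, not from the Lustre source. `swab64` and `swab32` are illustrative helpers, not the kernel functions.

```python
import struct


def swab64(value):
    """Reverse the byte order of an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack(">Q", value))[0]


def swab32(value):
    """Reverse the byte order of an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


# Values printed before the swab in the client log above.
before_first, before_second = 0x6400000000000000, 16777216

# Field widths (64-bit, then 32-bit) are an assumption; see the lead-in.
after_first = swab64(before_first)
after_second = swab32(before_second)

print(f"{after_first:#x}:{after_second}")  # prints 0x64:1
```

The result matches the 0x64:1 printed after the swab, which agrees with the observation that the client ends up with a sensible log ID.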
| Comment by Di Wang [ 09/May/13 ] |
| Comment by John Hammond [ 09/May/13 ] |
|
Ned, would you confirm that these are x86_64 servers and ppc/ppc64 clients? In that case it's unlikely that you're affected by |
| Comment by Ned Bass [ 09/May/13 ] |
|
Yes these are x86_64 servers and ppc64 clients. Also, if it is an unfixed swabbing bug, I would expect the mount to also fail with 2.3.64 servers. |
| Comment by Jodi Levi (Inactive) [ 10/May/13 ] |
|
Now that this patch has landed, can we get confirmation that this is fixed? |
| Comment by Ned Bass [ 10/May/13 ] |
|
With the patch, a 2.3.64 PPC client can mount from a 2.3.63 server. So this appears to be fixed. Thanks |
| Comment by Jodi Levi (Inactive) [ 10/May/13 ] |
|
Based on latest comments, this patch landed and has fixed the issue. Closing ticket. |