[LU-3302] ll_fill_super() Unable to process log: -2

Project: Lustre

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Affects Version/s: Lustre 2.4.0
    • Fix Version/s: Lustre 2.4.0
    • Environment: PPC client
    • Severity: 3
    • Rank: 8173

    Description

      We updated a client to 2.3.64-4chaos and tried to mount a filesystem served by 2.3.63-6chaos servers. The mount fails with the following errors:

      LustreError: 15c-8: MGC172.20.20.201@o2ib500: The configuration from log 'fsv-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      LustreError: 14351:0:(llite_lib.c:1043:ll_fill_super()) Unable to process log: -2
      Lustre: Unmounted fsv-client
      LustreError: 14351:0:(obd_mount.c:1265:lustre_fill_super()) Unable to mount  (-2)
      

      Using git bisect, I found that the mount failure was introduced by this patch:

      http://review.whamcloud.com/#change,5820

      LU-2684 fid: unify ostid and FID
      

      The critical questions at this point are:

      • Can we solve this problem by updating both server and client to 2.3.64-4chaos?
      • Can we safely upgrade the server, or does the above patch introduce on-disk format incompatibilities?
      • Will we be able to safely revert the server to 2.3.63 in case we find problems, or will it write new objects in an incompatible format?

      LLNL-bug-id: TOSS-2060

          Activity

            nedbass Ned Bass (Inactive) added a comment -

            Yes, it is. Sorry for the omission.

            adilger Andreas Dilger added a comment -

            Ned, is this a PPC client? It would be useful to include this information in the "Environment" section when filing a bug.
            pjones Peter Jones added a comment -

            Di

            Could you please comment on this?

            Thanks

            Peter

            adilger Andreas Dilger added a comment -

            The LU-2888 patch (change 6044) fixed the handling on the server, but the original problematic patch from LU-2684 wasn't in 2.3.63, so it shouldn't be relevant.

            nedbass Ned Bass (Inactive) added a comment -

            I did notice that the MGS got ENOENT when handling opcodes LLOG_ORIGIN_HANDLE_CREATE (501) and LLOG_ORIGIN_HANDLE_READ_HEADER (503):

            20000000:01000000:6.0:1368040430.786604:0:18265:0:(mgs_handler.c:757:mgs_handle()) @@@ MGS fail to handle opc = 501: rc = -2
              req@ffff881019ecb050 x1434494006460492/t0(0) o501->2e89e428-68d9-71a1-75f0-147bc1963566@172.20.16.10@o2ib500:0/0 lens 296/0 e 0 to 0 dl 1368040491 ref 1 fl Interpret:/0/ffffffff rc 0/-1
            ...
            20000000:01000000:6.0:1368040430.788063:0:18265:0:(mgs_handler.c:757:mgs_handle()) @@@ MGS fail to handle opc = 503: rc = -2
              req@ffff881019f14850 x1434494006460504/t0(0) o503->2e89e428-68d9-71a1-75f0-147bc1963566@172.20.16.10@o2ib500:0/0 lens 272/0 e 0 to 0 dl 1368040491 ref 1 fl Interpret:/0/ffffffff rc 0/-1
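            As a hypothetical illustration (not part of Lustre's own tooling), the opc/rc pair in these mgs_handle() lines can be decoded mechanically. This minimal Python sketch assumes the "opc = N: rc = E" wording shown in the two log lines above and maps only the two opcode numbers named in this comment; decode_mgs_failure and LLOG_OPCODES are illustrative names, and the errno lookup assumes the usual negated-errno convention for rc.

```python
import errno
import re

# Only the two opcode numbers identified in this comment are mapped;
# other LLOG opcodes are intentionally left out rather than guessed.
LLOG_OPCODES = {
    501: "LLOG_ORIGIN_HANDLE_CREATE",
    503: "LLOG_ORIGIN_HANDLE_READ_HEADER",
}

# Assumed shape of the failure text, taken from the log lines above:
# "... MGS fail to handle opc = <N>: rc = <negative errno>"
_OPC_RC = re.compile(r"opc = (\d+): rc = (-?\d+)")


def decode_mgs_failure(line):
    """Return (opcode name, errno name) for an mgs_handle() failure line.

    Returns None if the line has no "opc = N: rc = E" fragment.
    Unknown opcodes or errno values fall back to their raw numbers.
    """
    match = _OPC_RC.search(line)
    if match is None:
        return None
    opc = int(match.group(1))
    rc = int(match.group(2))
    op_name = LLOG_OPCODES.get(opc, "opc %d" % opc)
    # Assumption: rc is a negated errno value, so rc = -2 maps to ENOENT.
    err_name = errno.errorcode.get(-rc, str(rc)) if rc < 0 else str(rc)
    return op_name, err_name


if __name__ == "__main__":
    sample = (
        "20000000:01000000:6.0:1368040430.786604:0:18265:0:"
        "(mgs_handler.c:757:mgs_handle()) @@@ MGS fail to handle "
        "opc = 501: rc = -2"
    )
    print(decode_mgs_failure(sample))
```

            Run on the first line above, it prints ('LLOG_ORIGIN_HANDLE_CREATE', 'ENOENT'), which is consistent with the client-side "Unable to process log: -2".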

            nedbass Ned Bass (Inactive) added a comment -

            Attaching -1 debug logs for the client and MDS. Note that these were not captured from the same mount attempt.

            The NID of the client is 172.20.16.10@o2ib500.

            nedbass Ned Bass (Inactive) added a comment -

            Andreas, yes, I'll grab the logs.

            Note that the above error was from a 2.3.64 client talking to a 2.3.63 server. Do you mean that patch 6044 fixed LLOG handling on the client, or is it needed on the server as well?

            adilger Andreas Dilger added a comment -

            Ned, can you please attach a -1 debug log from the 2.3.64 client and, ideally, also from the MGS.

            I agree that the LU-2684 change was problematic, and it was intended only to change the network protocol between clients and OSTs when running DNE. The LU-2888 patch http://review.whamcloud.com/6044 (which was already included in 2.3.64) should have fixed the LLOG handling, so I'm not sure what the exact cause of your problem is. AFAIK, the current master code interoperates properly with 2.1.5 and 2.3.0, but there might be something specific to your setup that is causing grief.

            nedbass Ned Bass (Inactive) added a comment -

            Di, can you advise us on this? Thanks.

            nedbass Ned Bass (Inactive) added a comment -

            As an editorial comment, while we understand that interoperability issues are inevitable in a pre-release branch, we wish such changes would be advertised more prominently. Clear statements about compatibility between tags would really help us plan our update process. At a minimum, patches that introduce incompatibilities should say so clearly in the commit message.

            People

              Assignee: di.wang Di Wang
              Reporter: nedbass Ned Bass (Inactive)
              Votes: 0
              Watchers: 6
