
Improve error message handling when mirrored files are accessed by older clients

Details

    • Improvement
    • Resolution: Won't Fix
    • Minor
    • None
    • Lustre 2.11.0

    Description

      Issue extracted from the testing captured in LU-10286:

      I have a system configured with 2.11 servers, one 2.11 client, and one 2.9.0 client.
      1. On the 2.11 client, create one PFL file, one FLR file with a plain layout, and one FLR file with a composite layout.
      2. On the 2.9 client, I got these messages when trying to access these files and when running "ls -al":

      [root@onyx-77 lustre]# ls
      foo-ext  foo-flr  foo-pfl  foo-plain-2.9
      [root@onyx-77 lustre]# ls -al
      [329391.090438] LustreError: 57728:0:(lov_internal.h:100:lsm_op_find()) unrecognized lsm_magic 0bd60bd0
      [329391.102999] LustreError: 57728:0:(lov_internal.h:100:lsm_op_find()) Skipped 3 previous similar messages
      [329391.115668] LustreError: 57728:0:(lov_pack.c:213:lov_verify_lmm()) bad disk LOV MAGIC: 0x0BD60BD0; dumping LMM (size=552):
      [329391.130044] LustreError: 57728:0:(lov_pack.c:213:lov_verify_lmm()) Skipped 3 previous similar messages
      [329391.142376] LustreError: 57728:0:(lov_pack.c:222:lov_verify_lmm()) FF0BFF0B2802000003000000010005000200000000000000000000000000000001000100100000000000000000000000FFFFFFFFFFFFFFFF10010000380000000000000000000000000000000000000001000200100000000000000000000000000010000000000048010000380000000000000000000000000000000000000002000200000000000000100000000000FFFFFFFFFFFFFFFFFF0100003800000000000000000000000000000000000000010003001000000000000000000000000000100000000000FF010000380000000000000000000000000000000000000002000300000000000000100000000000FFFFFFFFFFFFFFFFFF0100003800000000000000000000000000000000000000FF0BFF0B01000000030000000000000001040000020000000000100001000000040000000000000000000000000000000000000000000000FF0BFF0B01000000030000000000000001040000020000000000100001000000040000000000000000000000000000000000000001000000FF0BFF0B0100000003000000000000000104000002000000000010000200FFFF0000000000000000000000000000000000000000FFFFFFFFFF0BFF0B0100000003000000000000000104000002000000000010
      [329391.251564] LustreError: 57728:0:(lov_pack.c:222:lov_verify_lmm()) Skipped 3 previous similar messages
      [329391.266288] LustreError: 57728:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000401:0x3:0x0]: -22
      [329391.283577] LustreError: 57728:0:(lcommon_cl.c:181:cl_file_inode_init()) Skipped 3 previous similar messages
      [329391.296622] LustreError: 57728:0:(llite_lib.c:2300:ll_prep_inode()) new_inode -fatal: rc -22
      [329391.307933] LustreError: 57728:0:(llite_lib.c:2300:ll_prep_inode()) Skipped 1 previous similar message
      ls: cannot access foo-ext: Invalid argument
      ls: cannot access foo-pfl: Invalid argument
      ls: cannot access foo-flr: Invalid argument
      total 8
      drwxr-xr-x  3 root root 4096 Dec 22 15:56 .
      drwxr-xr-x. 3 root root 4096 Dec 18 20:52 ..
      -?????????? ? ?    ?       ?            ? foo-ext
      -?????????? ? ?    ?       ?            ? foo-flr
      -?????????? ? ?    ?       ?            ? foo-pfl
      -rw-r--r--  1 root root    0 Dec 22 15:56 foo-plain-2.9
      [root@onyx-77 lustre]# 
      

      Andreas' thoughts: It probably makes sense to improve these error messages to consolidate them to at most one message per unknown magic, or similar. It probably isn't useful to dump the long hex string to the console.
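
The "at most one message per unknown magic" idea can be sketched as follows. This is only a Python model of the policy, not Lustre client code: the class name, method names, and message text are hypothetical, and a real change would live in the client's C code.

```python
from typing import Optional, Set


class UnknownMagicLog:
    """Model of a console log that reports each unknown magic only once."""

    def __init__(self) -> None:
        self._seen: Set[int] = set()   # magics already reported
        self.suppressed = 0            # repeats that were not printed

    def report(self, magic: int) -> Optional[str]:
        """Return the console message for a new magic, or None for a repeat."""
        if magic in self._seen:
            self.suppressed += 1
            return None
        self._seen.add(magic)
        return f"unrecognized layout magic {magic:#010x}"


if __name__ == "__main__":
    log = UnknownMagicLog()
    # Three files with the same unknown magic, as in the "ls -al" run above.
    for _ in range(3):
        message = log.report(0x0BD60BD0)
        if message is not None:
            print(message)
    print(f"suppressed {log.suppressed} similar messages")
```

With this policy, the three files from the listing above would produce a single console line for magic 0x0bd60bd0 instead of a repeated block of messages per file.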

          Activity

            [LU-10535] Improve error message handling when mirrored files are accessed by older clients
            yujian Jian Yu added a comment - - edited

            Hi Andreas,

            I found that lov_verify_lmm() was already removed in Lustre 2.10 by patch https://review.whamcloud.com/24849 for LU-8998. So on Lustre 2.10 and master clients, no LMM dump messages are printed. The FID information is already printed by cl_file_inode_init().

            Could you please advise whether we still need to work on this ticket?

            yujian Jian Yu added a comment -

            Thank you for the suggestion, Andreas.
            I found that lov_verify_lmm() was already removed in Lustre 2.10 by patch https://review.whamcloud.com/24849 for LU-8998. So on Lustre 2.10 and master clients, no LMM dump messages are printed.
            The FID information is already printed by cl_file_inode_init().

            adilger Andreas Dilger added a comment -

            I don't think we need to fix 2.9 clients, but this message could still be improved in master for future cases when a new layout type is hit. For example, including the FID and pathname (if possible) in the message, moving the LMM dump to D_RPCTRACE (or another non-console error level), and reducing the number of similar messages that appear on the console (e.g. we don't need all of lsm_op_find(), lov_verify_lmm(), cl_file_inode_init(), and ll_prep_inode() to print messages for this file).
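
A rough sketch of the reporting split described in this comment: the lower layer sends the raw dump to a debug-only log and returns an error code, and only the top layer prints a single console error that includes the FID. This is a Python model under stated assumptions, not Lustre code. The logger names, function names, message format, and the set of "known" magics are all hypothetical.

```python
import errno
import logging

# Hypothetical loggers standing in for the console and the debug log.
console = logging.getLogger("sketch.console")
debug_log = logging.getLogger("sketch.debug")

# Assumption: the magics this client understands (illustrative values only).
KNOWN_MAGICS = frozenset({0x0BD10BD0, 0x0BD30BD0})


def verify_layout(magic: int, raw: bytes) -> int:
    """Lower layer: dump to the debug log only and return an error code."""
    if magic in KNOWN_MAGICS:
        return 0
    debug_log.debug("layout dump (size=%d): %s", len(raw), raw.hex().upper())
    return -errno.EINVAL


def prep_inode(fid: str, magic: int, raw: bytes) -> int:
    """Top layer: the only place that prints a console error."""
    rc = verify_layout(magic, raw)
    if rc != 0:
        console.error("%s: unsupported layout magic %#010x: rc = %d",
                      fid, magic, rc)
    return rc
```

The design choice here is that each layer passes the error code upward instead of logging it again, so one failed lookup yields one console line while the full dump stays available to anyone who enables debug logging.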
            yujian Jian Yu added a comment -

            Hi Andreas,

            The following error messages appeared only on the Lustre 2.9 client while accessing a PFL or mirrored file. Do you think we need to make changes to the Lustre 2.9 client code?

            [162026.379112] LustreError: 12141:0:(lov_internal.h:100:lsm_op_find()) unrecognized lsm_magic 0bd60bd0
            [162026.380180] LustreError: 12141:0:(lov_internal.h:100:lsm_op_find()) Skipped 1 previous similar message
            [162026.381154] LustreError: 12141:0:(lov_pack.c:213:lov_verify_lmm()) bad disk LOV MAGIC: 0x0BD60BD0; dumping LMM (size=416):
            [162026.382278] LustreError: 12141:0:(lov_pack.c:213:lov_verify_lmm()) Skipped 1 previous similar message
            [162026.383244] LustreError: 12141:0:(lov_pack.c:222:lov_verify_lmm()) FF0BFF0BFF01000005000000010002000100000000000000000000000000000001000100100000000000000000000000FFFFFFFFFFFFFFFFFF000000600000000000000000000000000000000000000002000200100000000000000000000000FFFFFFFFFFFFFFFFFF000000FF00000000000000000000000000000000000000FF0BFF0B010000001E0000000000000001040000020000000000400002000000666C61736800000000000000000000000B00000000000000000000000000000000000000000000000B0000000000000000000000000000000000000001000000FF0BFF0B010000001E0000000000000001040000020000000000400006000000617263686976650000000000000000000A00000000000000000000000000000000000000060000000A00000000000000000000000000000000000000070000000900000000000000000000000000000000000000020000000900000000000000000000000000000000000000030000000B00000000000000000000000000000000000000040000000B0000000000000000000000000000000000000005000000
            [162026.391274] LustreError: 12141:0:(lov_pack.c:222:lov_verify_lmm()) Skipped 1 previous similar message
            [162026.392302] LustreError: 12141:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000401:0x1e:0x0]: -22
            [162026.393595] LustreError: 12141:0:(lcommon_cl.c:181:cl_file_inode_init()) Skipped 1 previous similar message
            [162026.394669] LustreError: 12141:0:(llite_lib.c:2300:ll_prep_inode()) new_inode -fatal: rc -22
            

             

            yujian Jian Yu added a comment -

            Has there been a test with 2.10 clients accessing FLR files?

            Yes. On a Lustre 2.10 client, accessing and opening a mirrored file gave the following:

            # ls -l /mnt/testfs/mirrored_file 
            -rw-r--r-- 1 root root 575221 Mar 11 01:11 /mnt/testfs/mirrored_file
            
            # cat /mnt/testfs/mirrored_file 
            cat: /mnt/testfs/mirrored_file: Operation not supported
            
            adilger Andreas Dilger added a comment -

            Has there been a test with 2.10 clients accessing FLR files?

            People

              yujian Jian Yu
              jgmitter Joseph Gmitter (Inactive)