[LU-10535] Improve error message handling when mirrored files are accessed by older clients Created: 19/Jan/18 Updated: 09/Aug/18 Resolved: 09/Aug/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Joseph Gmitter (Inactive) | Assignee: | Jian Yu |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | FLR2 | ||
| Issue Links: |
|
| Description |
|
Issue extracted from the testing captured in
I have a system configured with 2.11 servers, one 2.11 client, and one 2.9.0 client:
[root@onyx-77 lustre]# ls
foo-ext foo-flr foo-pfl foo-plain-2.9
[root@onyx-77 lustre]# ls -al
[329391.090438] LustreError: 57728:0:(lov_internal.h:100:lsm_op_find()) unrecognized lsm_magic 0bd60bd0
[329391.102999] LustreError: 57728:0:(lov_internal.h:100:lsm_op_find()) Skipped 3 previous similar messages
[329391.115668] LustreError: 57728:0:(lov_pack.c:213:lov_verify_lmm()) bad disk LOV MAGIC: 0x0BD60BD0; dumping LMM (size=552):
[329391.130044] LustreError: 57728:0:(lov_pack.c:213:lov_verify_lmm()) Skipped 3 previous similar messages
[329391.142376] LustreError: 57728:0:(lov_pack.c:222:lov_verify_lmm()) FF0BFF0B2802000003000000010005000200000000000000000000000000000001000100100000000000000000000000FFFFFFFFFFFFFFFF10010000380000000000000000000000000000000000000001000200100000000000000000000000000010000000000048010000380000000000000000000000000000000000000002000200000000000000100000000000FFFFFFFFFFFFFFFFFF0100003800000000000000000000000000000000000000010003001000000000000000000000000000100000000000FF010000380000000000000000000000000000000000000002000300000000000000100000000000FFFFFFFFFFFFFFFFFF0100003800000000000000000000000000000000000000FF0BFF0B01000000030000000000000001040000020000000000100001000000040000000000000000000000000000000000000000000000FF0BFF0B01000000030000000000000001040000020000000000100001000000040000000000000000000000000000000000000001000000FF0BFF0B0100000003000000000000000104000002000000000010000200FFFF0000000000000000000000000000000000000000FFFFFFFFFF0BFF0B0100000003000000000000000104000002000000000010
[329391.251564] LustreError: 57728:0:(lov_pack.c:222:lov_verify_lmm()) Skipped 3 previous similar messages
[329391.266288] LustreError: 57728:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000401:0x3:0x0]: -22
[329391.283577] LustreError: 57728:0:(lcommon_cl.c:181:cl_file_inode_init()) Skipped 3 previous similar messages
[329391.296622] LustreError: 57728:0:(llite_lib.c:2300:ll_prep_inode()) new_inode -fatal: rc -22
[329391.307933] LustreError: 57728:0:(llite_lib.c:2300:ll_prep_inode()) Skipped 1 previous similar message
ls: cannot access foo-ext: Invalid argument
ls: cannot access foo-pfl: Invalid argument
ls: cannot access foo-flr: Invalid argument
total 8
drwxr-xr-x 3 root root 4096 Dec 22 15:56 .
drwxr-xr-x. 3 root root 4096 Dec 18 20:52 ..
-?????????? ? ? ? ? ? foo-ext
-?????????? ? ? ? ? ? foo-flr
-?????????? ? ? ? ? ? foo-pfl
-rw-r--r-- 1 root root 0 Dec 22 15:56 foo-plain-2.9
[root@onyx-77 lustre]#

Andreas' thoughts: It probably makes sense to improve these error messages and consolidate them to at most one message per unknown magic, or similar. It probably isn't useful to dump the long hex string to the console. |
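Andreas' suggestion of at most one console message per unknown magic can be sketched in user-space C as follows. This is a hypothetical illustration, not Lustre code: the names `unknown_magic_first_seen()` and `report_unknown_magic()`, and the fixed table size, are assumptions, and a kernel version would need a lock (or an atomic bitmap) around the shared table.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: how many distinct unknown magics we remember. */
#define MAX_TRACKED_MAGICS 8

/* Not thread-safe; kernel code would need a lock or atomic bitmap. */
static uint32_t seen_magics[MAX_TRACKED_MAGICS];
static size_t nr_seen;

/* Returns true only the first time a given magic is passed in. */
static bool unknown_magic_first_seen(uint32_t magic)
{
	size_t i;

	assert(nr_seen <= MAX_TRACKED_MAGICS);
	for (i = 0; i < nr_seen; i++)
		if (seen_magics[i] == magic)
			return false;

	/* Table full: stay quiet rather than flood the console. */
	if (nr_seen == MAX_TRACKED_MAGICS)
		return false;

	seen_magics[nr_seen++] = magic;
	return true;
}

/* Print one line per distinct unknown magic, then stay silent for it. */
void report_unknown_magic(uint32_t magic)
{
	if (unknown_magic_first_seen(magic))
		fprintf(stderr,
			"unrecognized layout magic 0x%08" PRIX32
			"; further files with this layout will not be reported\n",
			magic);
}
```

The trade-off is that once the table is full, further unknown magics are dropped silently; a real fix might instead fall back to the rate limiting that already produces the "Skipped N previous similar messages" lines above.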
| Comments |
| Comment by Andreas Dilger [ 20/Jan/18 ] |
|
Has there been a test with 2.10 clients accessing FLR files? |
| Comment by Jian Yu [ 11/Mar/18 ] |
Yes. On a Lustre 2.10 client, listing a mirrored file works, but opening it fails:
# ls -l /mnt/testfs/mirrored_file
-rw-r--r-- 1 root root 575221 Mar 11 01:11 /mnt/testfs/mirrored_file
# cat /mnt/testfs/mirrored_file
cat: /mnt/testfs/mirrored_file: Operation not supported |
| Comment by Jian Yu [ 12/Mar/18 ] |
|
Hi Andreas,
The following error messages only appeared on the Lustre 2.9 client while accessing a PFL or mirrored file. Do you think we need to make changes to the Lustre 2.9 client code?
[162026.379112] LustreError: 12141:0:(lov_internal.h:100:lsm_op_find()) unrecognized lsm_magic 0bd60bd0
[162026.380180] LustreError: 12141:0:(lov_internal.h:100:lsm_op_find()) Skipped 1 previous similar message
[162026.381154] LustreError: 12141:0:(lov_pack.c:213:lov_verify_lmm()) bad disk LOV MAGIC: 0x0BD60BD0; dumping LMM (size=416):
[162026.382278] LustreError: 12141:0:(lov_pack.c:213:lov_verify_lmm()) Skipped 1 previous similar message
[162026.383244] LustreError: 12141:0:(lov_pack.c:222:lov_verify_lmm()) FF0BFF0BFF01000005000000010002000100000000000000000000000000000001000100100000000000000000000000FFFFFFFFFFFFFFFFFF000000600000000000000000000000000000000000000002000200100000000000000000000000FFFFFFFFFFFFFFFFFF000000FF00000000000000000000000000000000000000FF0BFF0B010000001E0000000000000001040000020000000000400002000000666C61736800000000000000000000000B00000000000000000000000000000000000000000000000B0000000000000000000000000000000000000001000000FF0BFF0B010000001E0000000000000001040000020000000000400006000000617263686976650000000000000000000A00000000000000000000000000000000000000060000000A00000000000000000000000000000000000000070000000900000000000000000000000000000000000000020000000900000000000000000000000000000000000000030000000B00000000000000000000000000000000000000040000000B0000000000000000000000000000000000000005000000
[162026.391274] LustreError: 12141:0:(lov_pack.c:222:lov_verify_lmm()) Skipped 1 previous similar message
[162026.392302] LustreError: 12141:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000401:0x1e:0x0]: -22
[162026.393595] LustreError: 12141:0:(lcommon_cl.c:181:cl_file_inode_init()) Skipped 1 previous similar message
[162026.394669] LustreError: 12141:0:(llite_lib.c:2300:ll_prep_inode()) new_inode -fatal: rc -22
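Both console logs report the magic 0x0BD60BD0. To show what that value denotes, the sketch below maps layout magics to readable names. The three constants are the values we believe `lustre_user.h` defines for `LOV_MAGIC_V1`, `LOV_MAGIC_V3` and `LOV_MAGIC_COMP_V1` (check them against your tree); the helper `lov_magic_name()` is hypothetical. Composite layouts are used by both PFL, added in Lustre 2.10, and FLR, which is consistent with the 2.9 client rejecting both kinds of file.

```c
#include <stdint.h>

/* Layout magic values as we understand them from lustre_user.h;
 * verify against the headers of the release you are working with. */
#define LOV_MAGIC_V1      0x0BD10BD0u
#define LOV_MAGIC_V3      0x0BD30BD0u
#define LOV_MAGIC_COMP_V1 0x0BD60BD0u

/* Hypothetical helper: name the layout so a console message can say
 * what the client failed to understand, instead of dumping raw bytes. */
static const char *lov_magic_name(uint32_t magic)
{
	switch (magic) {
	case LOV_MAGIC_V1:
		return "plain layout (LOV_MAGIC_V1)";
	case LOV_MAGIC_V3:
		return "plain layout with pool (LOV_MAGIC_V3)";
	case LOV_MAGIC_COMP_V1:
		return "composite layout, PFL or FLR (LOV_MAGIC_COMP_V1)";
	default:
		return "unknown layout";
	}
}
```

An older client would of course only know the magics that existed when it was released; the point is that a newer client could print the name rather than a hex dump.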
|
| Comment by Andreas Dilger [ 12/Mar/18 ] |
|
I don't think we need to fix 2.9 clients, but this message could still be improved in master for future cases where a client hits a new layout type. For example: include the FID and pathname (if possible) in the message, move the LMM dump to D_RPCTRACE (or another non-console debug mask), and reduce the number of similar messages that appear on the console (e.g. we don't need all of lsm_op_find(), lov_verify_lmm(), cl_file_inode_init(), and ll_prep_inode() to print messages for this file). |
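A minimal sketch of the consolidated message described above: one console line carrying the FID, the pathname when available, and the magic, with the raw LMM dump emitted only when debugging is enabled. `struct demo_fid`, `format_layout_error()`, `dump_lmm_if_debug()` and the `debug` flag (standing in for a `D_RPCTRACE`-style debug mask) are all hypothetical; the FID formatting mirrors the `[0x200000401:0x3:0x0]` style seen in the logs.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a Lustre FID: sequence, object id, version. */
struct demo_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

/* Build the single console line; path may be NULL if it is unknown.
 * Returns the snprintf() result (length that would have been written). */
static int format_layout_error(char *buf, size_t len,
			       const struct demo_fid *fid,
			       const char *path, uint32_t magic)
{
	assert(buf != NULL && len > 0 && fid != NULL);
	return snprintf(buf, len,
			"[0x%" PRIx64 ":0x%" PRIx32 ":0x%" PRIx32 "]%s%s: "
			"unsupported layout magic 0x%08" PRIX32
			", client may be too old",
			fid->seq, fid->oid, fid->ver,
			path != NULL ? " " : "", path != NULL ? path : "",
			magic);
}

/* Emit the raw LMM hex dump only when debugging is enabled.
 * Bytes are unsigned so they print as two hex digits each.
 * Returns the number of bytes dumped. */
static size_t dump_lmm_if_debug(FILE *out, const unsigned char *lmm,
				size_t size, int debug)
{
	size_t i;

	if (!debug)
		return 0;
	for (i = 0; i < size; i++)
		fprintf(out, "%02X", lmm[i]);
	fputc('\n', out);
	return size;
}
```

For example, the FID from the 2.11 log with the illustrative path `dir/file` yields `[0x200000401:0x3:0x0] dir/file: unsupported layout magic 0x0BD60BD0, client may be too old`. If only the lowest layer that detects the problem printed this line, the other layers could return the error silently.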
| Comment by Jian Yu [ 12/Mar/18 ] |
|
Thank you for the suggestion, Andreas. |
| Comment by Jian Yu [ 08/Aug/18 ] |
|
Hi Andreas,
Could you please let me know whether we still need to work on this ticket? |