[LU-10535] Improve error message handling when mirrored files are accessed by older clients Created: 19/Jan/18  Updated: 09/Aug/18  Resolved: 09/Aug/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Joseph Gmitter (Inactive) Assignee: Jian Yu
Resolution: Won't Fix Votes: 0
Labels: FLR2

Issue Links:
Related
is related to LU-10286 Verify the behaviors when mirrored fi... Resolved
Rank (Obsolete): 9223372036854775807

 Description   

Issue extracted from the testing captured in LU-10286:

I have a system configured as 2.11 servers, one 2.11 client and one 2.9.0 client
1. on the 2.11 client, create 1 pfl file, 1 flr file with plain layout, and 1 flr file with composite layout
2. on the 2.9 client, the following messages appeared when I tried to access these files and when I ran "ls -al":

[root@onyx-77 lustre]# ls
foo-ext  foo-flr  foo-pfl  foo-plain-2.9
[root@onyx-77 lustre]# ls -al
[329391.090438] LustreError: 57728:0:(lov_internal.h:100:lsm_op_find()) unrecognized lsm_magic 0bd60bd0
[329391.102999] LustreError: 57728:0:(lov_internal.h:100:lsm_op_find()) Skipped 3 previous similar messages
[329391.115668] LustreError: 57728:0:(lov_pack.c:213:lov_verify_lmm()) bad disk LOV MAGIC: 0x0BD60BD0; dumping LMM (size=552):
[329391.130044] LustreError: 57728:0:(lov_pack.c:213:lov_verify_lmm()) Skipped 3 previous similar messages
[329391.142376] LustreError: 57728:0:(lov_pack.c:222:lov_verify_lmmustreError: 57728:0:(lov_pack.c:222:lov_verify_lmm()) Skipped 3 previous similar messages
[329391.266288] LustreError: 57728:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000401:0x3:0x0]: -22
[329391.283577] LustreError: 57728:0:(lcommon_cl.c:181:cl_file_inode_init()) Skipped 3 previous similar messages
[329391.296622] LustreError: 57728:0:(llite_lib.c:2300:ll_prep_inode()) new_inode -fatal: rc -22
[329391.307933] LustreError: 57728:0:(llite_lib.c:2300:ll_prep_inode()) Skipped 1 previous similar message
ls: cannot access foo-ext: Invalid argument
ls: cannot access foo-pfl: Invalid argument
ls: cannot access foo-flr: Invalid argument
total 8
drwxr-xr-x  3 root root 4096 Dec 22 15:56 .
drwxr-xr-x. 3 root root 4096 Dec 18 20:52 ..
-?????????? ? ?    ?       ?            ? foo-ext
-?????????? ? ?    ?       ?            ? foo-flr
-?????????? ? ?    ?       ?            ? foo-pfl
-rw-r--r--  1 root root    0 Dec 22 15:56 foo-plain-2.9
[root@onyx-77 lustre]# 

Andreas' thoughts: It probably makes sense to improve these error messages to consolidate them to at most one message per unknown magic, or similar. It probably isn't useful to dump the long hex string to the console.
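
The "at most one message per unknown magic" idea can be sketched in plain C as follows. This is only an illustration under stated assumptions, not Lustre code: `struct seen_magics`, `magic_first_seen()`, `report_unknown_magic()` and the table size are hypothetical names chosen for the example, and the sketch does no locking, which a real client would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: small fixed-size set of layout magics already reported. */
#define SEEN_MAGIC_MAX 8

struct seen_magics {
	uint32_t magic[SEEN_MAGIC_MAX];
	size_t   count;
};

/*
 * Return true the first time a given magic is seen, false afterwards.
 * Once the table is full, new magics are not reported either, which
 * keeps the amount of console output bounded.
 */
static bool magic_first_seen(struct seen_magics *seen, uint32_t magic)
{
	assert(seen != NULL);

	for (size_t i = 0; i < seen->count; i++)
		if (seen->magic[i] == magic)
			return false;

	if (seen->count == SEEN_MAGIC_MAX)
		return false;

	seen->magic[seen->count++] = magic;
	return true;
}

/*
 * Format a message for an unrecognized magic at most once per magic.
 * Returns the snprintf() result when a message was written, 0 otherwise.
 */
static int report_unknown_magic(struct seen_magics *seen, uint32_t magic,
				char *buf, size_t len)
{
	if (!magic_first_seen(seen, magic))
		return 0;

	return snprintf(buf, len, "unrecognized layout magic 0x%08X",
			(unsigned int)magic);
}
```

With this shape, a second "ls -al" over files that all carry magic 0x0BD60BD0 would produce no further message, instead of a repeated set of "Skipped N previous similar messages" lines.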



 Comments   
Comment by Andreas Dilger [ 20/Jan/18 ]

Has there been a test with 2.10 clients accessing FLR files?

Comment by Jian Yu [ 11/Mar/18 ]

Has there been a test with 2.10 clients accessing FLR files?

Yes. On a Lustre 2.10 client, accessing and opening a mirrored file gave the following:

# ls -l /mnt/testfs/mirrored_file 
-rw-r--r-- 1 root root 575221 Mar 11 01:11 /mnt/testfs/mirrored_file

# cat /mnt/testfs/mirrored_file 
cat: /mnt/testfs/mirrored_file: Operation not supported

Comment by Jian Yu [ 12/Mar/18 ]

Hi Andreas,

The following error messages appeared only on the Lustre 2.9 client while accessing a PFL or mirrored file. Do you think we need to make changes to the Lustre 2.9 client code?

[162026.379112] LustreError: 12141:0:(lov_internal.h:100:lsm_op_find()) unrecognized lsm_magic 0bd60bd0
[162026.380180] LustreError: 12141:0:(lov_internal.h:100:lsm_op_find()) Skipped 1 previous similar message
[162026.381154] LustreError: 12141:0:(lov_pack.c:213:lov_verify_lmm()) bad disk LOV MAGIC: 0x0BD60BD0; dumping LMM (size=416):
[162026.382278] LustreError: 12141:0:(lov_pack.c:213:lov_verify_lmm()) Skipped 1 previous similar message
[162026.383244] LustreError: 12141:0:(lov_pack.c:222:lov_verify_lmm
[162026.391274] LustreError: 12141:0:(lov_pack.c:222:lov_verify_lmm()) Skipped 1 previous similar message
[162026.392302] LustreError: 12141:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000401:0x1e:0x0]: -22
[162026.393595] LustreError: 12141:0:(lcommon_cl.c:181:cl_file_inode_init()) Skipped 1 previous similar message
[162026.394669] LustreError: 12141:0:(llite_lib.c:2300:ll_prep_inode()) new_inode -fatal: rc -22

 

Comment by Andreas Dilger [ 12/Mar/18 ]

I don't think we need to fix 2.9 clients, but this message could still be improved in master for future cases when a new layout type is hit. For example, including the FID and pathname (if possible) in the message, moving the LMM dump to D_RPCTRACE (or other non-console error) and reducing the number of similar messages that appear on the console (e.g. we don't need all of lsm_op_find(), lov_verify_lmm(), cl_file_inode_init(), and ll_prep_inode() to print messages for this file).
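
A minimal sketch of what this comment describes, in plain C: one consolidated message that carries the FID, the magic, and the return code, with the hex dump produced only when a debug switch is on. All names here (`struct example_fid`, `format_layout_error()`, `dump_layout_hex()`, the `debug_enabled` flag) are hypothetical and are not Lustre APIs; the message wording is an assumption, and only the FID layout and the values used in the usage note are taken from the console log in the description.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a FID, mirroring the three fields in the log. */
struct example_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

/* Format a single message per file instead of one message per call layer. */
static int format_layout_error(char *buf, size_t len,
			       const struct example_fid *fid,
			       uint32_t magic, int rc)
{
	return snprintf(buf, len,
			"[0x%llx:0x%x:0x%x]: unrecognized layout magic 0x%08X: rc = %d",
			(unsigned long long)fid->seq, (unsigned int)fid->oid,
			(unsigned int)fid->ver, (unsigned int)magic, rc);
}

/*
 * Write a hex dump of the layout into buf only when debugging is enabled.
 * Returns the number of hex characters written (0 when disabled).
 */
static size_t dump_layout_hex(char *buf, size_t len,
			      const unsigned char *data, size_t size,
			      bool debug_enabled)
{
	size_t off = 0;

	if (len > 0)
		buf[0] = '\0';
	if (!debug_enabled)
		return 0;

	/* Each byte needs two hex characters plus room for the trailing NUL. */
	for (size_t i = 0; i < size && off + 3 <= len; i++) {
		snprintf(buf + off, len - off, "%02x", (unsigned int)data[i]);
		off += 2;
	}
	return off;
}
```

Called with the FID [0x200000401:0x3:0x0], magic 0x0BD60BD0 and rc -22 from the log above, `format_layout_error()` yields one line identifying the file and the reason, and the dump stays off the console unless debugging is explicitly enabled.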

Comment by Jian Yu [ 12/Mar/18 ]

Thank you for the suggestion, Andreas.
I found that lov_verify_lmm() was already removed in Lustre 2.10 by patch https://review.whamcloud.com/24849 for LU-8998. So, on Lustre 2.10 and master clients, the LMM dump messages no longer appear.
As for the FID, it is already printed by cl_file_inode_init().

Comment by Jian Yu [ 08/Aug/18 ]

Hi Andreas,

I found that lov_verify_lmm() was already removed in Lustre 2.10 by patch https://review.whamcloud.com/24849 for LU-8998. So, on Lustre 2.10 and master clients, the LMM dump messages no longer appear. As for the FID, it is already printed by cl_file_inode_init().

Could you please advise whether we still need to work on this ticket?

Generated at Sat Feb 10 02:35:57 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.