[LU-11985] Lustre 2.12.0 client compatibility question Created: 20/Feb/19  Updated: 29/Jun/22  Resolved: 21/Feb/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.5
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Haisong Cai (Inactive) Assignee: Patrick Farrell (Inactive)
Resolution: Not a Bug Votes: 0
Labels: interop
Environment:

server side:
Linux aeon-eval-nvme-xeon 3.10.0-957.5.1.el7.x86_64 #1 SMP Fri Feb 1 14:54:57 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
lustre-2.12.0-1.el7.x86_64
ZFS 0.7.9

client side:
2.10.5
2.11.0
2.12.0


Attachments: HTML File client_log, HTML File comet-26-20_history, File dmesg.11445, HTML File server_log
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

 

We are getting errors when accessing a Lustre 2.12.0 filesystem with client version 2.10.5 or 2.11.0.

  • we can make directories, no problem
  • for files, we get errors like:
    ls: cannot access file55: Invalid argument
    ls: cannot access file22: Invalid argument
    ls: cannot access file74: Invalid argument
    ls: cannot access file90: Invalid argument
    ls: cannot access file46: Invalid argument
    ls: cannot access file31: Invalid argument
  • -bash-4.1$ ls -las
    ls: cannot access one: Invalid argument
    total 50
    25 drwxr-xr-x 2 manu1729 csd102 25600 Feb 20 09:30 .
    25 drwxr-xr-x 6 manu1729 csd102 25600 Feb 20 09:30 ..
    ? -????????? ? ? ? ? ? one
  • using client 2.12, all looks fine.
  • 2.12.0 ChangeLog says: Clients & Servers: Latest 2.10.X and Latest 2.11.X

Am I missing something here?



 Comments   
Comment by Patrick Farrell (Inactive) [ 20/Feb/19 ]

Cai,

Can you run lfs getstripe (from a 2.12 client) on one of these files?  You may be using a layout feature not supported in the earlier version.  (Though you should generally get -EOPNOTSUPP, not -EINVAL.)

Comment by Haisong Cai (Inactive) [ 20/Feb/19 ]

 

[root@comet-26-02 dir1]# ls -las

...

13 -rw-r--r-- 1 manu1729 csd102 1024 Feb 20 09:24 file97
13 -rw-r--r-- 1 manu1729 csd102 1024 Feb 20 09:24 file98
13 -rw-r--r-- 1 manu1729 csd102 1024 Feb 20 09:24 file99
? -????????? ? ? ? ? ? one

[root@comet-26-02 dir1]# lfs getstripe one
lfs getstripe: error opening one: Invalid argument (22)
one
lcm_layout_gen: 2
lcm_mirror_count: 1
lcm_entry_count: 2
lcme_id: 1
lcme_mirror_id: 0
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: 131072
lmm_stripe_count: 0
lmm_stripe_size: 131072
lmm_pattern: mdt
lmm_layout_gen: 0
lmm_stripe_offset: 0

lcme_id: 2
lcme_mirror_id: 0
lcme_flags: 0
lcme_extent.e_start: 131072
lcme_extent.e_end: EOF
lmm_stripe_count: -1
lmm_stripe_size: 4194304
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: -1

 

dmesg:

LustreError: 2054:0:(lcommon_cl.c:181:cl_file_inode_init()) Skipped 1 previous similar message
LustreError: 2054:0:(llite_lib.c:2328:ll_prep_inode()) new_inode -fatal: rc -22
LustreError: 2054:0:(llite_lib.c:2328:ll_prep_inode()) Skipped 11 previous similar messages
LustreError: 2078:0:(llite_lib.c:2328:ll_prep_inode()) new_inode -fatal: rc -22
LustreError: 2078:0:(llite_lib.c:2328:ll_prep_inode()) Skipped 4 previous similar messages

Comment by Patrick Farrell (Inactive) [ 20/Feb/19 ]

Cai,

Is this from a 2.12 client?  I thought you said the 2.12 client didn't get these errors?  If this is not from a 2.12 client, can you try again from a 2.12 client?

But since you gave a portion of it, can you share all of dmesg from comet-26-02?  (Which I assume is not running 2.12?)

Comment by Patrick Farrell (Inactive) [ 20/Feb/19 ]

Also, it looks like you've got a data-on-MDT component in this file.  That is not going to work with a 2.10 client, because it lacks the feature entirely.
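
A minimal sketch of how the data-on-MDT component could be spotted mechanically, assuming the `lfs getstripe` output format shown earlier in this ticket (an `lmm_pattern: mdt` line for the DoM component). The helper name `has_dom_component` is hypothetical and is not part of Lustre.

```shell
#!/bin/sh
# Hypothetical helper (not a Lustre tool): reads `lfs getstripe` output on
# stdin and succeeds if any component reports the "mdt" pattern, which is
# how the data-on-MDT component appears in the output quoted in this ticket.
has_dom_component() {
    grep -Eq '^[[:space:]]*lmm_pattern:[[:space:]]*mdt[[:space:]]*$'
}
```

For example, `lfs getstripe one | has_dom_component && echo "one has a DoM component"` would flag the file from this ticket; a plain raid0 file would produce no message.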

Comment by Haisong Cai (Inactive) [ 20/Feb/19 ]

 

All of the above messages come from comet-26-02, which is running Lustre 2.11.0.

The full dmesg from the same client will follow.

 

Comment by Patrick Farrell (Inactive) [ 20/Feb/19 ]

Interesting, OK.

Let's get some debug, from this client and from the MDS.

# Save the current debug buffer size so it can be restored afterwards
DEBUGMB=$(lctl get_param -n debug_mb)
# Enable all debug flags and enlarge the debug buffer
lctl set_param *debug=-1 debug_mb=10000
# Clear the debug log and mark the start of the test
lctl clear
lctl mark "before"

# do the ls -la command on one file

lctl mark "after"
# Write out the log
lctl dk > /tmp/log

# Set debug back to defaults
lctl set_param debug="super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck"
lctl set_param debug_mb="$DEBUGMB"

Please gather the debug log from the client & the MDS and post those here.
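
A minimal sketch for trimming the dumped log to the window between the two marks before posting it. It assumes that `lctl mark` writes lines containing `DEBUG MARKER: <text>` into the debug log; that format is an assumption here, not something shown in this ticket, and the helper name `between_marks` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print only the log lines that fall strictly between
# the "DEBUG MARKER: <start>" and "DEBUG MARKER: <stop>" lines on stdin.
# Assumption: `lctl mark TEXT` appears in the log as "DEBUG MARKER: TEXT".
between_marks() {
    awk -v start="DEBUG MARKER: $1" -v stop="DEBUG MARKER: $2" '
        index($0, stop)  { inside = 0 }   # stop marker closes the window
        inside           { print }        # emit lines inside the window
        index($0, start) { inside = 1 }   # start marker opens the window
    '
}
```

Usage with the file written above would be `between_marks before after < /tmp/log > /tmp/log.trimmed`; the output path is only an example.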

Comment by Haisong Cai (Inactive) [ 20/Feb/19 ]

Logs uploaded.

 

Haisong

Comment by Patrick Farrell (Inactive) [ 20/Feb/19 ]

The client this dklog is from is running 2.10.X (I believe 2.10.5?), and it is rejecting a DoM component for having no stripes.  This is expected behavior.  You cannot use DoM files with a 2.10 client.

Can you please double check your versions and interop issues with this in mind?

 

Comment by Patrick Farrell (Inactive) [ 20/Feb/19 ]

Note also in the dmesg you attached:
Lustre: Lustre: Build Version: 2.10.5

Not 2.11.
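
A minimal sketch for pulling the loaded client version out of kernel log text, assuming the `Build Version:` line format quoted above. The function name `lustre_build_version` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print the version
# from the last "Lustre: Build Version: X" line, i.e. the most recent load
# of the Lustre modules recorded in that log.
lustre_build_version() {
    sed -n 's/.*Lustre: Build Version: *\([^ ]*\).*/\1/p' | tail -n 1
}
```

Running `dmesg | lustre_build_version` on a client is one way to confirm which modules are actually loaded, independent of which packages are installed.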

Comment by Jeff Johnson (Inactive) [ 21/Feb/19 ]

Also, FWIW the MDS is running spl/zfs 0.7.12. 

Comment by Patrick Farrell (Inactive) [ 21/Feb/19 ]

aeonjeffj good to know, but please check the client versions, etc - The logs I have been shown don't show any bugs.

A 2.10 client can't use DOM.  This is what it looks like when you try (I understand it's not the best representation of that incompatibility, sorry).

If you're having issues with a 2.11 client and DOM files - or issues with non-DOM or FLR files and a 2.10 client - let us know.
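
A minimal sketch of the version rule discussed in this thread, assuming that 2.11.0 is the first client version able to use DOM files (the thread only states that 2.10 cannot). The function names are hypothetical, and `sort -V` requires GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the compatibility rule from this thread.

# Succeeds if version $1 is greater than or equal to version $2,
# using version-aware ordering (GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: DOM needs a client at 2.11.0 or newer.
client_supports_dom() {
    version_ge "$1" "2.11.0"
}
```

For example, `client_supports_dom 2.10.5 || echo "client too old for DOM files"` prints the warning, while the same check with 2.11.0 or 2.12.0 stays silent.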

Comment by Haisong Cai (Inactive) [ 21/Feb/19 ]

Hi Patrick,

After seeing your message yesterday afternoon, I unloaded Lustre (lustre_rmmod) and reloaded it; this time the permission errors appear to have gone away. I am attaching a command history to show that I did see the errors on 2.11 clients after first loading Lustre and mounting the Lustre f/s.

 

The history file was taken on one of the 2 clients (not the one I took debug_kernel on). Lines 1-30 are the first attempt at mounting the f/s with 2.10.5. Lines 31-109 are where I uninstalled 2.10.5 and installed 2.11. Lines 110 to the end are the unload and reload I did yesterday afternoon.

So far we have run a small set of tests and haven't seen compatibility issues. Please keep this ticket open for a couple of days, as we are about to ramp up the tests.

 

Thanks for the help,

Haisong

Comment by Patrick Farrell (Inactive) [ 21/Feb/19 ]

Haisong,

Sure, glad to help.

Given the description of this ticket is specific to an issue that is now resolved, I'm actually going to ask you to open a new ticket for the next issue.  This is just so we can keep the ticket closely aligned to the problem being discussed.  We'll still help you out.

 

Thanks!

Comment by Patrick Farrell (Inactive) [ 21/Feb/19 ]

Some confusion over exact versions live on the nodes, now resolved.
