[LU-9052] lod_verify_md_striping()) lsh-MDT0000-mdtlov: invalid lmv_user_md: magic = cd20cd0 Created: 25/Jan/17  Updated: 02/Feb/18  Resolved: 22/Nov/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0, Lustre 2.9.0
Fix Version/s: Lustre 2.11.0, Lustre 2.10.4

Type: Bug Priority: Minor
Reporter: Ned Bass Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: llnl
Environment:

ssh://review.whamcloud.com/fs/lustre-release-fe-llnl


Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We just unleashed users on our new Lustre 2.8 DNE1 filesystem. Some users are migrating data from older Lustre 2.5 filesystems. I started seeing messages like these on an MDT:

Jan 25 12:56:45 zinc1 kernel: LustreError: 98053:0:(lod_object.c:1370:lod_verify_md_striping()) lsh-MDT0000-mdtlov: invalid lmv_user_md: magic = cd20cd0, stripe_offset = 4, stripe_count = 1: rc = -22
Jan 25 12:56:45 zinc1 kernel: LustreError: 98053:0:(lod_object.c:1370:lod_verify_md_striping()) Skipped 6 previous similar messages

I haven't yet tracked down which client-side process is producing these. I notice that the magic value reported in the message (0x0CD20CD0) is LMV_MAGIC_V1, and that the message is produced when lum->lum_magic != LMV_USER_MAGIC (0x0CD30CD0). My hunch is that this is related to users copying files whose striping layouts are stored in v1 format from the older 2.5 filesystems. However, I haven't been able to reproduce it that way myself.



 Comments   
Comment by Di Wang [ 25/Jan/17 ]

This bad magic number should not come from the old data, i.e. it has to come from a client. I assume all clients are 2.8 now? Is it possible that those applications still use the old Lustre user library? If so, they need to rebuild their applications.

Comment by Ned Bass [ 25/Jan/17 ]

I assume all clients are 2.8 now?

Hi Di. No, there are also some 2.5 clients. Our data transfer nodes are running Lustre 2.5 and mount both the new 2.8 filesystem and the older 2.5 filesystems. That is where users can move their data between filesystems.

Is it possible that those applications still use the old Lustre user library? If so, they need to rebuild their applications.

I tried creating a file on the new filesystem with the 2.5 version of liblustreapi but it doesn't reproduce the error.

Comment by Di Wang [ 25/Jan/17 ]

I tried creating a file on the new filesystem with the 2.5 version of liblustreapi but it doesn't reproduce the error.

Do you mean creating a striped directory from a 2.5 client on 2.8 MDSs? Hmm, 2.5 still uses #define LMV_USER_MAGIC 0x0CD20CD0 /* default lmv magic */, so if you try to create a striped directory with the old liblustreapi, it may cause this failure.

Comment by Ned Bass [ 26/Jan/17 ]

Do you mean creating a striped directory from a 2.5 client on 2.8 MDSs?

No, I just used the example program from the llapi_file_open() man page to create a file in an existing directory. Note that we're using DNE1, so all our remote directories have a stripe count of 1.

Comment by Ned Bass [ 26/Jan/17 ]

Ah, you're right. This seems to be coming from lfs mkdir on the 2.5 client. We have an automated process that creates user directories when new users are added. Despite the error, the user directory is still created with the requested MDT index. Would it be safe for lod_verify_md_striping() to accept the V1 magic so we don't see this error all the time?

Comment by Di Wang [ 26/Jan/17 ]

Ned, ah yes, this check is only used in the striped-directory creation path (lod_ah_init()), which is a void function. That is why lfs mkdir did not fail with -EINVAL. Yes, it should be safe to also accept the V1 user magic. I will prepare the patch. Thanks.

Comment by Gerrit Updater [ 26/Jan/17 ]

wangdi (di.wang@intel.com) uploaded a new patch: https://review.whamcloud.com/25091
Subject: LU-9052 lod: accept lfs mkdir from old client
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7740f547ad9cce8bc2552ca7f1499ccd4266a2bd

Comment by Gerrit Updater [ 22/Nov/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/25091/
Subject: LU-9052 lod: accept lfs mkdir from old client
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 069f593572509d6ee285ba6ea8950101ccb62d72

Comment by Peter Jones [ 22/Nov/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 14/Dec/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30542
Subject: LU-9052 lod: accept lfs mkdir from old client
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 58b021215cdd6be263a608faf60a18f7819b63f9

Comment by Gerrit Updater [ 02/Feb/18 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30542/
Subject: LU-9052 lod: accept lfs mkdir from old client
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 8a760d5ef83a5fb697a61ff38f7182f8a50968d8

Generated at Sat Feb 10 02:22:50 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.