[LU-9052] lod_verify_md_striping()) lsh-MDT0000-mdtlov: invalid lmv_user_md: magic = cd20cd0 Created: 25/Jan/17 Updated: 02/Feb/18 Resolved: 22/Nov/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0, Lustre 2.9.0 |
| Fix Version/s: | Lustre 2.11.0, Lustre 2.10.4 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Ned Bass | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | llnl | ||
| Environment: |
ssh://review.whamcloud.com/fs/lustre-release-fe-llnl |
||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
We just unleashed users on our new Lustre 2.8 DNE1 filesystem. Some users are migrating data from older Lustre 2.5 filesystems. I started seeing messages like these on an MDT:

Jan 25 12:56:45 zinc1 kernel: LustreError: 98053:0:(lod_object.c:1370:lod_verify_md_striping()) lsh-MDT0000-mdtlov: invalid lmv_user_md: magic = cd20cd0, stripe_offset = 4, stripe_count = 1: rc = -22
Jan 25 12:56:45 zinc1 kernel: LustreError: 98053:0:(lod_object.c:1370:lod_verify_md_striping()) Skipped 6 previous similar messages

I haven't yet tracked down what client-side process is producing these. I notice the magic value reported in the message (0x0CD20CD0) is LMV_MAGIC_V1 and that the message is produced when lum->lum_magic != LMV_USER_MAGIC (0x0CD30CD0).

My hunch is that this is related to users copying files that have their striping layouts stored in v1 format from the older 2.5 filesystems. However, I haven't been able to reproduce it that way myself. |
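For readers following along, the comparison described above can be sketched as below. This is only an illustration built from the two constants quoted in this ticket; the function name check_lum_magic_strict() is hypothetical and this is not the actual body of lod_verify_md_striping().

```c
#include <errno.h>
#include <stdint.h>

/* Magic values as quoted in this ticket. */
#define LMV_MAGIC_V1   0x0CD20CD0U /* value seen in the error message */
#define LMV_USER_MAGIC 0x0CD30CD0U /* value the 2.8 MDT compares against */

/*
 * Hypothetical stand-in for the magic test: anything other than
 * LMV_USER_MAGIC is rejected with -EINVAL, which shows up in the
 * log line as "rc = -22" on Linux.
 */
static int check_lum_magic_strict(uint32_t lum_magic)
{
	if (lum_magic != LMV_USER_MAGIC)
		return -EINVAL;
	return 0;
}
```

Under this sketch, passing the logged value 0x0CD20CD0 yields -EINVAL, which matches the rc = -22 in the message.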
| Comments |
| Comment by Di Wang [ 25/Jan/17 ] |
|
This bad magic number should not come from the old data, i.e. it has to come from a client. I assume all clients are 2.8 now? Is it possible that those applications still use the old Lustre user library? If so, they need to rebuild their applications. |
| Comment by Ned Bass [ 25/Jan/17 ] |
Hi Di. No, there are also some 2.5 clients. Our data transfer nodes are running Lustre 2.5 and mount both the new 2.8 filesystem and the older 2.5 filesystems. That is where users can move their data between filesystems.
I tried creating a file on the new filesystem with the 2.5 version of liblustreapi but it doesn't reproduce the error. |
| Comment by Di Wang [ 25/Jan/17 ] |
"I tried creating a file on the new filesystem with the 2.5 version of liblustreapi but it doesn't reproduce the error."
You mean creating a striped directory from a 2.5 client on the 2.8 MDSs? Hmm, 2.5 still uses #define LMV_USER_MAGIC 0x0CD20CD0 /* default lmv magic */, so if you tried to create a striped directory with the old liblustreapi, it may cause this failure. |
| Comment by Ned Bass [ 26/Jan/17 ] |
No, I just used the example program from the llapi_file_open() man page to create a file in an existing directory. Note that we're using DNE1, so all our remote directories have a stripe count of 1. |
| Comment by Ned Bass [ 26/Jan/17 ] |
|
Ah, you're right. This seems to be coming from lfs mkdir on the 2.5 client. We have an automated process that creates user directories when new users are added. Despite the error, the user directory still gets created with the requested MDT index. Would it be safe for lod_verify_md_striping() to accept the V1 magic so we don't see this error all the time? |
| Comment by Di Wang [ 26/Jan/17 ] |
|
Ned, ah, yes, this check is only used in the striped directory creation path (lod_ah_init()), which is a void function. That is why lfs mkdir did not fail with an invalid-argument error. Yes, it should be safe to also accept the V1 user magic. I will cook up a patch. Thanks. |
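A rough sketch of the two points in this comment, for illustration only: a void caller has no way to hand the error back, and a relaxed check would accept both magic values. The names check_lum_magic_relaxed() and init_striping_hint() are hypothetical, and this is not the patch uploaded to Gerrit.

```c
#include <errno.h>
#include <stdint.h>

/* Magic values as quoted in this ticket. */
#define LMV_MAGIC_V1   0x0CD20CD0U /* sent by the 2.5 client's lfs mkdir */
#define LMV_USER_MAGIC 0x0CD30CD0U /* expected by the 2.8 MDT */

/* Hypothetical relaxed check: accept the current and the V1 magic. */
static int check_lum_magic_relaxed(uint32_t lum_magic)
{
	if (lum_magic != LMV_USER_MAGIC && lum_magic != LMV_MAGIC_V1)
		return -EINVAL;
	return 0;
}

/*
 * Hypothetical void caller, standing in for the role lod_ah_init()
 * plays in the discussion: it can note a failure (counted here in
 * *logged_errors) but cannot return rc, so the operation that
 * invoked it does not fail.
 */
static void init_striping_hint(uint32_t lum_magic, int *logged_errors)
{
	int rc = check_lum_magic_relaxed(lum_magic);

	if (rc != 0)
		(*logged_errors)++; /* error is logged, not propagated */
}
```

With the relaxed check, a request carrying 0x0CD20CD0 no longer increments the error count, while an unrelated magic value still does.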
| Comment by Gerrit Updater [ 26/Jan/17 ] |
|
wangdi (di.wang@intel.com) uploaded a new patch: https://review.whamcloud.com/25091 |
| Comment by Gerrit Updater [ 22/Nov/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/25091/ |
| Comment by Peter Jones [ 22/Nov/17 ] |
|
Landed for 2.11 |
| Comment by Gerrit Updater [ 14/Dec/17 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30542 |
| Comment by Gerrit Updater [ 02/Feb/18 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30542/ |