[LU-9248] conf-sanity: test_55 fails with lov_objid size has to be 8192, not 8192 Created: 23/Mar/17  Updated: 19/Mar/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0, Lustre 2.9.0, Lustre 2.10.0, Lustre 2.10.4
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: tests

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Joe Gmitter <joseph.gmitter@intel.com>

conf-sanity: test_55 fails with lov_objid size has to be 8192, not 8192

Starting client: trevis-33vm1.trevis.hpdd.intel.com: -o user_xattr,flock trevis-33vm7@tcp:/lustre /mnt/lustre
CMD: trevis-33vm1.trevis.hpdd.intel.com mkdir -p /mnt/lustre
CMD: trevis-33vm1.trevis.hpdd.intel.com mount -t lustre -o user_xattr,flock trevis-33vm7@tcp:/lustre /mnt/lustre
checking size of lov_objid for ost index 1023
CMD: trevis-33vm7 debugfs -R 'stat lov_objid' /dev/lvm-Role_MDS/P1 2>/dev/null
conf-sanity test_55: @@@@@@ FAIL: lov_objid size has to be 8192, not 8192

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/3e251b4e-0f04-11e7-9053-5254006e85c2.



 Comments   
Comment by James Casper [ 24/May/17 ]

2.9.57, b3575:
https://testing.hpdd.intel.com/test_sessions/0800ff00-8d4c-4627-878e-566e8a697c01

Comment by Andreas Dilger [ 01/Dec/17 ]

This looks like some kind of bash comparison bug in the test that could be easily fixed:

                LOV_OBJID_SIZE=$(do_facet mds1 "$DEBUGFS -R 'stat lov_objid' $mdsdev 2>/dev/null" |
                                 grep ^User | awk -F 'Size: ' '{print $2}')
                if [ "$LOV_OBJID_SIZE" != $(lov_objid_size $i) ]; then
                        error "lov_objid size has to be $(lov_objid_size $i), not $LOV_OBJID_SIZE"

It might be that the quoted "$LOV_OBJID_SIZE" doesn't compare nicely with the unquoted $(lov_objid_size $i)? Try removing the double quotes and using if [[ ... ]] instead? It would also be useful to add single quotes around the values in the error message, in case there are spaces around the values (which would also cause problems with the quoted value).
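One hypothesis that fits an identical-looking "8192, not 8192" message, not confirmed anywhere in this ticket, is that the parsed value carries an invisible character such as a trailing space or carriage return. The sketch below shows how the plain [ ] test treats such a value and how a whitespace-trimmed [[ ]] comparison with single-quoted values in the message behaves. lov_objid_size here is a hypothetical stand-in for the test-framework helper: it assumes lov_objid holds one 8-byte entry per OST index, which gives 8192 for index 1023 as in the log. check_lov_objid_size is also hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the test-framework helper: assumes
# lov_objid holds one 8-byte entry per OST index, so index 1023
# needs (1023 + 1) * 8 = 8192 bytes.
lov_objid_size() {
	echo $(( ($1 + 1) * 8 ))
}

# Hypothetical helper: compare a parsed size with the expected one.
# Leading/trailing whitespace (including a carriage return) is
# stripped first, and both values are single-quoted in the message
# so any remaining stray characters become visible.
check_lov_objid_size() {
	local actual=$1 expected
	expected=$(lov_objid_size "$2")
	# Strip leading whitespace, then trailing whitespace.
	actual=${actual#"${actual%%[![:space:]]*}"}
	actual=${actual%"${actual##*[![:space:]]}"}
	# Quote the right-hand side so [[ ]] compares literally, not as a glob.
	if [[ $actual != "$expected" ]]; then
		echo "lov_objid size has to be '$expected', not '$actual'" >&2
		return 1
	fi
	return 0
}

# With the plain [ ] test, a trailing carriage return makes two values
# that print identically compare as different.
raw=$'8192\r'
[ "$raw" != "$(lov_objid_size 1023)" ] && echo "[ ] sees a mismatch"
check_lov_objid_size "$raw" 1023 && echo "trimmed comparison passes"
```

Trimming only hides the symptom; quoting the values in the message is what would reveal the actual stray character in a future failure log.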

Comment by James Nunez (Inactive) [ 13/Apr/18 ]

We haven’t seen conf-sanity test 55 fail with

lov_objid size has to be 8192, not 8192

for over a year. I’ve gone back to January of 2017 and don’t see this error message for this test.

What we do see frequently is the error

lov_objid size has to be 8192, not 0

which is not the same issue. We see this error message only during interop testing, when a client with version 2.9.56 (actually 2.9.55.36) or earlier runs against a server with version 2.9.57 (actually ~2.9.55.38) or later. For example, we see this failure with the following Lustre client/server combinations:

2.9.0 clients and 2.11.50.52 servers
2.9.0 clients and 2.10.52.75 servers
2.5.5-RC2 clients and 2.10.52.97 servers
2.7.3 (2_7_fe) clients and 2.9.55.41 servers
2.9.0 clients and 2.9.56.11 servers

The issue with interop testing is that the patch for LU-4017, commit 91fbc94f3eabe9a, changed the following in conf-sanity test 55:

                echo checking size of lov_objid for ost index $i
-               LOV_OBJID_SIZE=$(do_facet mds1 "$DEBUGFS -R 'stat lov_objid' $mdsdev 2>/dev/null" | grep ^User | awk '{print $6}')
+               LOV_OBJID_SIZE=$(do_facet mds1 "$DEBUGFS -R 'stat lov_objid' $mdsdev 2>/dev/null" |
+                                grep ^User | awk -F 'Size: ' '{print $2}')
                if [ "$LOV_OBJID_SIZE" != $(lov_objid_size $i) ]; then
                        error "lov_objid size has to be $(lov_objid_size $i), not $LOV_OBJID_SIZE"
                else

Looking at a master (2.11.50) MDS on a running system, we see:

# debugfs -R 'stat lov_objid' /dev/vda3 | grep ^User
debugfs 1.42.13.wc6 (05-Feb-2017)
User:     0   Group:     0   Project:     0   Size: 32

Using the “old” (pre-2.9.56) grep/awk command, which prints field $6, we get:

# debugfs -R 'stat lov_objid' /dev/vda3 | grep ^User | awk '{print $6}'
debugfs 1.42.13.wc6 (05-Feb-2017)
0

which explains the output we see with interop testing.
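The field numbering can be reproduced without a Lustre system. In the sketch below, NEW_LINE is copied from the 2.11.50 MDS output above; OLD_LINE is an assumed layout for a debugfs that does not print the "Project:" field, so the size lands in field 6. old_parse and new_parse are hypothetical names for the pre- and post-LU-4017 extraction commands.

```bash
#!/usr/bin/env bash
# NEW_LINE: copied from the 2.11.50 MDS debugfs output above.
# OLD_LINE: assumed layout without the "Project:" field.
NEW_LINE='User:     0   Group:     0   Project:     0   Size: 32'
OLD_LINE='User:     0   Group:     0   Size: 32'

# Pre-LU-4017 extraction: the 6th whitespace-separated field.
old_parse() { echo "$1" | awk '{print $6}'; }

# Post-LU-4017 extraction: everything after the "Size: " separator.
new_parse() { echo "$1" | awk -F 'Size: ' '{print $2}'; }

for line in "$OLD_LINE" "$NEW_LINE"; do
	printf '%-50s old=%s new=%s\n' "$line" "$(old_parse "$line")" "$(new_parse "$line")"
done
```

With the extra "Project:" label and value, field 6 becomes the project ID (0) instead of the size, which is the "not 0" seen when an old client's test script runs against a newer server.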

Thus, to "fix" this issue, we would need to change which field is printed, based on the server version, in the test scripts of all clients from 2.9.0 and earlier, which seems unlikely.

 

Comment by Andreas Dilger [ 13/Apr/18 ]

It would be better to do something like:

                LOV_OBJID_SIZE=$(do_facet mds1 "$DEBUGFS -R 'stat lov_objid' $mdsdev 2>/dev/null" |
                                 grep "^User" | sed -e 's/.*Size: //' -e 's/ [A-Z].*//')

That will drop everything before and including "Size: ", and then (just in case this changes again in the future) drop anything after the actual size. That should work with both old and new debugfs output, and be flexible in the future. We might even consider replacing "^User" with "Size: " so that the match is triggered on the actual data that we want rather than on an unrelated value that just happens to exist on the same line.
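A minimal sketch of this approach as a reusable filter follows; parse_lov_objid_size is a hypothetical name. It differs from the sed expression above in two deliberate ways. First, it selects the line on both "^User:" and "Size: ", because some debugfs versions may also print a separate "Fragment: ... Size: 0" line, so matching "Size: " alone could return two values. Second, it uses 's/[^0-9].*$//' to drop everything from the first non-digit, which also removes trailing spaces or a carriage return, whereas 's/ [A-Z].*//' only removes a following capitalized field. The sample outputs are abbreviated; only the first "User" line comes from the MDS output earlier in this ticket, and the rest are assumed layouts for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical filter: read debugfs 'stat' output on stdin and print
# only the numeric value that follows "Size: " on the "User:" line.
parse_lov_objid_size() {
	grep '^User:.*Size: ' | sed -e 's/.*Size: //' -e 's/[^0-9].*$//'
}

# Abbreviated sample outputs (all but the first User line are assumed).
new_out='User:     0   Group:     0   Project:     0   Size: 32
Fragment:  Address: 0    Number: 0    Size: 0'
old_out='User:     0   Group:     0   Size: 8192'
future_out='User:     0   Group:     0   Project:     0   Size: 8192   Flags: 0x0'

for out in "$new_out" "$old_out" "$future_out"; do
	printf '%s\n' "$out" | parse_lov_objid_size
done
```

In the test itself, the filter would sit at the end of the existing do_facet ... debugfs pipeline in place of the grep/awk pair.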

Comment by Saurabh Tandan (Inactive) [ 08/May/18 ]

+1 on 2.10.3 https://testing.hpdd.intel.com/test_sets/61ca2062-5067-11e8-abc3-52540065bddc

Generated at Sat Feb 10 02:24:30 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.