Lustre / LU-9248

conf-sanity: test_55 fails with lov_objid size has to be 8192, not 8192

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.7.0, Lustre 2.9.0, Lustre 2.10.0, Lustre 2.10.4
    • Severity: 3

    Description

      This issue was created by maloo for Joe Gmitter <joseph.gmitter@intel.com>

      conf-sanity: test_55 fails with lov_objid size has to be 8192, not 8192

      Starting client: trevis-33vm1.trevis.hpdd.intel.com: -o user_xattr,flock trevis-33vm7@tcp:/lustre /mnt/lustre
      CMD: trevis-33vm1.trevis.hpdd.intel.com mkdir -p /mnt/lustre
      CMD: trevis-33vm1.trevis.hpdd.intel.com mount -t lustre -o user_xattr,flock trevis-33vm7@tcp:/lustre /mnt/lustre
      checking size of lov_objid for ost index 1023
      CMD: trevis-33vm7 debugfs -R 'stat lov_objid' /dev/lvm-Role_MDS/P1 2>/dev/null
      conf-sanity test_55: @@@@@@ FAIL: lov_objid size has to be 8192, not 8192
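The expected size in this message follows from the OST index being checked. A minimal sketch of that arithmetic, assuming lov_objid stores one 8-byte object ID per OST index from 0 up to the index under test; expected_lov_objid_size is a hypothetical stand-in for the test's own lov_objid_size helper, not a copy of it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: expected lov_objid size for a given maximum OST index.
# Assumption: one 8-byte (64-bit) object ID is stored per OST index 0..max.
expected_lov_objid_size() {
	local max_ost_index=$1

	echo $(( (max_ost_index + 1) * 8 ))
}

expected_lov_objid_size 1023	# prints 8192 (1024 entries * 8 bytes)
```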

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/3e251b4e-0f04-11e7-9053-5254006e85c2.

          Activity

            Saurabh Tandan (Inactive) added a comment - +1 on 2.10.3: https://testing.hpdd.intel.com/test_sets/61ca2062-5067-11e8-abc3-52540065bddc

            Andreas Dilger added a comment - It would be better to do something like:

            LOV_OBJID_SIZE=$(do_facet mds1 "$DEBUGFS -R 'stat lov_objid' $mdsdev 2>/dev/null" |
                             grep "^User" | sed -e 's/.*Size: //' -e 's/ [A-Z].*//')
            

            That will drop everything up to and including "Size: ", and then (in case the format changes again in the future) drop anything after the actual size. That should work with both the old and new debugfs output and remain flexible in the future. We might even consider replacing "^User" with "Size: " in the grep, so that the match is triggered by the data we actually want rather than by an unrelated value that just happens to be on the same line.
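A minimal sketch of the suggested extraction, run against sample "User:" lines instead of a live MDT device. Assumptions: the old-format line (no "Project:" field) is inferred from the earlier awk '{print $6}' parsing, the new-format line matches the 2.11.50 debugfs output quoted in this ticket, and extract_lov_objid_size is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch of the suggested size extraction applied to sample debugfs lines.
# Assumption: older debugfs prints "User: ... Group: ... Size: N"; newer
# debugfs inserts a "Project: ..." field before "Size:".
extract_lov_objid_size() {
	# Keep only the "User:" line, drop everything up to and including
	# "Size: ", then drop anything from a space followed by a capital letter.
	grep '^User' | sed -e 's/.*Size: //' -e 's/ [A-Z].*//'
}

old_line='User:     0   Group:     0   Size: 8192'
new_line='User:     0   Group:     0   Project:     0   Size: 32'

echo "$old_line" | extract_lov_objid_size	# prints 8192
echo "$new_line" | extract_lov_objid_size	# prints 32
```

The second sed expression removes text starting at the first space that is directly followed by a capital letter, so if a future format separated the size from a following field with several spaces, the extra spaces would survive; a pattern such as 's/ *[A-Z].*//' would strip them as well.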

            James Nunez (Inactive) added a comment - We haven’t seen conf-sanity test 55 fail with

            lov_objid size has to be 8192, not 8192
            

            for over a year. I’ve gone back to January 2017 and don’t see this error message for this test.

            What we do see frequently is the error

            lov_objid size has to be 8192, not 0
            

            which is not the same issue. We see this error message only during interop testing, when a client with version 2.9.56 (actually 2.9.55.36) or earlier runs against a server with version 2.9.57 (actually ~2.9.55.38) or later. For example, we see this failure with the following Lustre client/server combinations:

            2.9.0 clients and 2.11.50.52 servers
            2.9.0 clients and 2.10.52.75 servers
            2.5.5-RC2 clients and 2.10.52.97 servers
            2.7.3 (2_7_fe) clients and 2.9.55.41 servers
            2.9.0 clients and 2.9.56.11 servers

            The problem with interop testing is that the patch for LU-4017 (commit 91fbc94f3eabe9a) made the following change to conf-sanity test 55:

                            echo checking size of lov_objid for ost index $i
            -               LOV_OBJID_SIZE=$(do_facet mds1 "$DEBUGFS -R 'stat lov_objid' $mdsdev 2>/dev/null" | grep ^User | awk '{print $6}')
            +               LOV_OBJID_SIZE=$(do_facet mds1 "$DEBUGFS -R 'stat lov_objid' $mdsdev 2>/dev/null" |
            +                                grep ^User | awk -F 'Size: ' '{print $2}')
                            if [ "$LOV_OBJID_SIZE" != $(lov_objid_size $i) ]; then
                                    error "lov_objid size has to be $(lov_objid_size $i), not $LOV_OBJID_SIZE"
                            else
            

            Looking at a master (2.11.50) MDS on a running system, we see:

            # debugfs -R 'stat lov_objid' /dev/vda3 | grep ^User
            debugfs 1.42.13.wc6 (05-Feb-2017)
            User:     0   Group:     0   Project:     0   Size: 32
            

            Using the “old”, pre-2.9.56 grep/awk commands that print $6, we get:

            # debugfs -R 'stat lov_objid' /dev/vda3 | grep ^User | awk '{print $6}'
            debugfs 1.42.13.wc6 (05-Feb-2017)
            0
            

            which explains the output we see in interop testing: with the added "Project:" field, the sixth whitespace-separated field is now the project ID (0) rather than the size.

            Thus, if we want to "fix" this issue, we would need to change which field is printed, based on the server version, in the test scripts of all clients from 2.9.0 and earlier, which seems unlikely.
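To make the interop failure concrete, here is a small sketch that runs the old (awk '{print $6}') and new (awk -F 'Size: ') parsing from the LU-4017 diff against two sample "User:" lines. The new-format line is copied from the 2.11.50 output shown in the comment; the old-format line (without "Project:") is an assumption about what pre-Project debugfs printed, inferred from $6 having been the size. old_parse and new_parse are hypothetical names:

```shell
#!/usr/bin/env bash
# Compare the pre-LU-4017 and post-LU-4017 parsing of the debugfs "User:" line.
# Assumption: these sample lines mirror old (no Project field) and new
# (with Project field) debugfs "stat" output; real output comes from the MDT.
old_line='User:     0   Group:     0   Size: 8192'
new_line='User:     0   Group:     0   Project:     0   Size: 32'

# Old test code: take the sixth whitespace-separated field.
old_parse() { awk '{print $6}'; }
# New test code: take whatever follows "Size: ".
new_parse() { awk -F 'Size: ' '{print $2}'; }

echo "old parse, old output: $(echo "$old_line" | old_parse)"	# 8192
echo "old parse, new output: $(echo "$new_line" | old_parse)"	# 0 (the project ID)
echo "new parse, old output: $(echo "$old_line" | new_parse)"	# 8192
echo "new parse, new output: $(echo "$new_line" | new_parse)"	# 32
```

Only the "old parse, new output" combination, which is what an old client test script sees against a newer server, yields 0.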

            Andreas Dilger added a comment - This looks like some kind of bash comparison bug in the test that could easily be fixed:

                            LOV_OBJID_SIZE=$(do_facet mds1 "$DEBUGFS -R 'stat lov_objid' $mdsdev 2>/dev/null" |
                                             grep ^User | awk -F 'Size: ' '{print $2}')
                            if [ "$LOV_OBJID_SIZE" != $(lov_objid_size $i) ]; then
                                    error "lov_objid size has to be $(lov_objid_size $i), not $LOV_OBJID_SIZE"
            

            Perhaps the quoted "$LOV_OBJID_SIZE" doesn't compare cleanly with the unquoted $(lov_objid_size $i)? Try removing the double quotes and using if [[ ... ]] instead. It would also be useful to add single quotes around the values in the error message, in case there are spaces around the values (which would also cause problems with the quoted value).

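One hypothesis consistent with a message that prints the same number twice is that the extracted value carries invisible whitespace (for example a trailing space or carriage return), which a quoted string comparison treats as a difference. This is an assumption, not a confirmed root cause, and expected, actual and normalize_size are hypothetical names. The sketch shows the symptom, how single quotes in the message would expose it, and one way to normalize the value before comparing:

```shell
#!/usr/bin/env bash
# Hypothetical reproduction: the extracted size has a trailing space.
expected=8192
actual='8192 '

if [ "$actual" != "$expected" ]; then
	# Output: size has to be 8192, not 8192 (plus an invisible trailing space)
	echo "size has to be $expected, not $actual"
	# Output: size has to be '8192', not '8192 ' (the quotes expose it)
	echo "size has to be '$expected', not '$actual'"
fi

# Hypothetical helper: strip all whitespace (spaces, tabs, CR, LF).
normalize_size() {
	local value=$1

	echo "${value//[[:space:]]/}"
}

if [ "$(normalize_size "$actual")" = "$expected" ]; then
	echo "sizes match after normalization"
fi
```

Note that inside [[ ... ]] variable expansions are not word-split, so dropping the quotes there would not by itself hide a trailing space, whereas an unquoted operand inside [ ... ] is split on spaces and tabs but not on a carriage return.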
            James Casper (Inactive) added a comment - 2.9.57, b3575: https://testing.hpdd.intel.com/test_sessions/0800ff00-8d4c-4627-878e-566e8a697c01

            People

              Assignee: WC Triage
              Reporter: Maloo
              Votes: 0
              Watchers: 5
