
Janitor Testing Fails to copy latest obdfilter-survey (Uses old obdfilter-survey)

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • None
    • None
    • Client: 4.18.0-372.9.1.el8(8.5)
      Server: 4.18.0-425.3.1.el8(8.7)
    • 3
    • 9223372036854775807

    Description

      While testing/fixing LU-16827, it was observed that the run was failing in janitor testing but passing in maloo. The required changes to lustre-iokit/obdfilter-survey/obdfilter-survey were not being picked up by janitor. As a result, janitor always failed, while the maloo run passed because it used the latest (modified) script. It looks like janitor always uses an old copy of the script (at least for obdfilter-survey).

      Here are a few relevant logs. Let me know if more information is required. All logs are under https://review.whamcloud.com/c/fs/lustre-release/+/51035

      CASE 1: The test is modified to use a specific path for obdfilter-survey so that janitor passes.

      From logs : https://testing.whamcloud.com/gerrit-janitor/31629/testresults/obdfilter-survey-ldiskfs-DNE-centos7_x86_64-centos7_x86_64/obdfilter-survey.test_1a.test_log.oleg368-client.log

      This is the obdfilter-survey that janitor always uses:

      ls -ali /usr/bin/obdfilter-survey
      920857 -rwxr-xr-x. 1 root root 16279 Jun 4 2016 /usr/bin/obdfilter-survey
      

      This is the obdfilter-survey that it is supposed to use. (There are changes within this code, which maloo correctly picks up.) Please note the file size and date stamp of both.

       
      ls -ali /home/green/git/lustre-release/lustre/../lustre-iokit/obdfilter-survey
      total 42
      ...
      2278 -rwxr-xr-x 1 green green 15632 May 31 07:47 obdfilter-survey
      ...
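
Rather than comparing size and date stamp by eye, the two copies can be compared byte for byte. This is a minimal sketch, not part of the Lustre test framework: `same_script` is a hypothetical helper name, and the two paths are simply the ones quoted in the listings above.

```shell
# Hypothetical helper (not part of the Lustre test framework): succeed only
# when the two paths hold byte-identical files.
same_script() {
	# cmp -s is silent and returns 0 only for identical content.
	cmp -s "$1" "$2" 2>/dev/null
}

# The two paths below are the ones shown in the listings above.
installed=/usr/bin/obdfilter-survey
intree=/home/green/git/lustre-release/lustre/../lustre-iokit/obdfilter-survey/obdfilter-survey
if same_script "$installed" "$intree"; then
	echo "installed copy matches the in-tree script"
else
	echo "installed copy differs from the in-tree script (or one of them is missing)"
fi
```

A non-matching result on a janitor node would be consistent with the stale /usr/bin copy described here.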
      

       

      test_1a in obdfilter-survey.sh was modified to use a specific OBDSURVEY instead of the generic system path version (which is /usr/bin/obdfilter-survey). When this is done, the test passes in janitor. Otherwise it fails.

      export PATH=$PATH:/home/green/git/lustre-release/lustre/../lustre-iokit/obdfilter-survey
      OBDSURVEY=/home/green/git/lustre-release/lustre/../lustre-iokit/obdfilter-survey/obdfilter-survey
      obdflter_survey_run disk
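
The override above hard-codes one developer's path. A more general form of the same idea is to prefer an in-tree copy when one exists and fall back to whatever is on PATH otherwise. The sketch below only illustrates that idea: `pick_obdsurvey` is a hypothetical helper, and this is not a description of how test-framework.sh actually selects the script.

```shell
# Hypothetical helper: print the path of the obdfilter-survey script to run.
# $1 is the directory expected to hold the in-tree copy (an assumption made
# for this sketch).
pick_obdsurvey() {
	if [ -x "$1/obdfilter-survey" ]; then
		# Prefer the in-tree script when it exists and is executable.
		printf '%s\n' "$1/obdfilter-survey"
	else
		# Otherwise fall back to the first match on PATH; this returns
		# non-zero and prints nothing if no such command exists.
		command -v obdfilter-survey
	fi
}
```

With the path from this report, a call would look like `OBDSURVEY=$(pick_obdsurvey /home/green/git/lustre-release/lustre/../lustre-iokit/obdfilter-survey)`.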

      CASE 2: The failing case

      This uses the old /usr/bin/obdfilter-survey, so the new changes are not reflected.

       

      <snip>
      + eval NETTYPE=tcp thrlo=2 nobjhi=1 thrhi=4 size=1024 case=disk rslt_loc=/tmp 'targets="192.168.203.104:lustre-OST0000' '192.168.203.104:lustre-OST0001"' /usr/bin/obdfilter-survey
      ++ NETTYPE=tcp 
      ++ thrlo=2 
      ++ nobjhi=1 
      ++ thrhi=4 
      ++ size=1024 
      ++ case=disk 
      ++ rslt_loc=/tmp
      ++ targets='192.168.203.104:lustre-OST0000 192.168.203.104:lustre-OST0001' 
      ++ /usr/bin/obdfilter-survey
      Warning: Permanently added '192.168.203.104' (ECDSA) to the list of known hosts. 
      bash: lctl: command not found 
      /usr/bin/obdfilter-survey: line 242: ( << 16) | ( << 8) | : syntax error: operand expected (error token is "<< 16) | ( << 8) | ") 
      /usr/bin/obdfilter-survey: line 254: [: -lt: unary operator expected 
      bash: lctl: command not found 
      bash: lctl: command not found 
      OST lustre-OST0000 not setup
      <snip>
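
The "( << 16) | ( << 8) |" message at line 242 is what a shell prints when an arithmetic expansion is evaluated with empty operands, which would fit the preceding "lctl: command not found" (no version string to parse). The sketch below shows that failure mode and one way to guard against it. It is an assumption about the cause, and `version_code` here is a stand-in written for this note, not the code in /usr/bin/obdfilter-survey.

```shell
# Stand-in for a "version string to integer" helper. The body runs in a
# subshell (note the parentheses) so the IFS change does not leak out.
version_code() (
	# Split "2.15.3" on dots into the positional parameters 2 15 3.
	IFS=.
	set -- $1
	# Refuse missing fields instead of letting $(( )) see empty operands.
	if [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then
		echo "version_code: empty or incomplete version string" >&2
		exit 1
	fi
	# Refuse non-numeric fields as well.
	case "$1$2$3" in
	*[!0-9]*)
		echo "version_code: non-numeric version field" >&2
		exit 1
		;;
	esac
	echo $(( ($1 << 16) | ($2 << 8) | $3 ))
)

version_code 2.15.3   # prints 134915
```

Called with an empty string, as would happen when the version lookup itself fails, this variant returns a non-zero status with a readable message instead of a syntax error.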
      

       

          Activity

            [LU-16861] Janitor Testing Fails to copy latest obdfilter-survey (Uses old obdfilter-survey)
            pjones Peter Jones added a comment -

            Landed for 2.16


            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53620/
            Subject: LU-16861 obdfilter: Exclude quotes when getting NIDs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: c265e1c7b045bf1f9e5b2919c282b63086929ab6

            arshad512 Arshad Hussain added a comment -

            Andreas Dilger wrote: "This is failing 100% of runs on master. It looks like something wrong with the quoting of the targets (note the extra double quotes before each of the targets):"

            Andreas, thanks for the hint/pointer. When picking up the NID, the script was including the quotes, which was causing the failure.

            gerrit Gerrit Updater added a comment -

            "Arshad Hussain <arshad.hussain@aeoncomputing.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53620
            Subject: LU-16861 obdfilter: Exclude quotes when getting NID's
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: cf525604b66c6d2a5f0e5158727f319d48c6f076

            arshad512 Arshad Hussain added a comment -

            Looks like the failing line is:

            lctl get_param osc.lustre-OST0000-osc-ffff94bac34eb800.import | awk '/current_connection:/ {sub(/@.*/,""); print $2}'
            "192.168.50.95

            I am getting the patch ready.
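
The output above keeps the leading double quote because the awk program only strips the "@..." suffix. One possible way to drop the quotes is an extra gsub before printing. This is a sketch only: it is not necessarily what the merged patch does, `nid_addr` is a hypothetical name, and the sample input line is an assumed shape of the import output, inferred from the result shown above.

```shell
# Hypothetical filter: read "lctl get_param ...import" style text on stdin
# and print the address part of the current connection without quotes.
nid_addr() {
	awk '/current_connection:/ {
		sub(/@.*/, "")    # drop the "@<network>" suffix and anything after it
		gsub(/"/, "")     # drop any double quotes left on the line
		print $2          # fields are re-split after $0 is modified
	}'
}

# Assumed sample line, standing in for real lctl output:
printf '    current_connection: "192.168.50.95@tcp"\n' | nid_addr   # prints 192.168.50.95
```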

            arshad512 Arshad Hussain added a comment -

            I am checking.

            adilger Andreas Dilger added a comment -

            This is failing 100% of runs on master. It looks like something is wrong with the quoting of the targets (note the extra double quotes before each of the targets):

            + NETTYPE=tcp thrlo=2 nobjhi=1 thrhi=4 size=1024 case=disk rslt_loc=/tmp targets=""10.240.26.105:lustre-OST0000 "10.240.26.105:lustre-OST0001 "10.240.26.105:lustre-OST0002 "10.240.26.105:lustre-OST0003 "10.240.26.105:lustre-OST0004 "10.240.26.105:lustre-OST0005 "10.240.26.105:lustre-OST0006 "10.240.26.105:lustre-OST0007" /usr/bin/obdfilter-survey
            /usr/lib64/lustre/tests/obdfilter-survey.sh: line 77: 10.240.26.105:lustre-OST0001 10.240.26.105:lustre-OST0002: command not found
            cat: '/tmp/obdfilter_survey*': No such file or directory
             obdfilter-survey test_1a: @@@@@@ FAIL: /usr/bin/obdfilter-survey failed: 127 
              Trace dump:
              = /usr/lib64/lustre/tests/test-framework.sh:6947:error()
              = /usr/lib64/lustre/tests/obdfilter-survey.sh:81:obdflter_survey_run()
              = /usr/lib64/lustre/tests/obdfilter-survey.sh:85:test_1a()
              = /usr/lib64/lustre/tests/test-framework.sh:7287:run_one()
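
The "command not found" with exit status 127 follows from how the shell pairs up the stray quotes: the assignment ends at the first space, and the next quoted chunk is taken as the command name. The sketch below reproduces that with a shortened target list and shows that stripping the quotes avoids it. The helper names are hypothetical, and `true` stands in for /usr/bin/obdfilter-survey so the sketch runs anywhere.

```shell
# Hypothetical helper: drop every double quote from a target list.
strip_quotes() {
	printf '%s\n' "$1" | tr -d '"'
}

# Run a stand-in command the way the trace shows it: VAR="..." command,
# assembled as a string and passed to eval.
run_with_targets() {
	eval "targets=\"$1\" true" 2>/dev/null
}

# Shortened version of the list in the trace, with the stray leading quotes.
bad='"10.240.26.105:lustre-OST0000 "10.240.26.105:lustre-OST0001'
good=$(strip_quotes "$bad")

rc=0; run_with_targets "$bad" || rc=$?
echo "with stray quotes: exit status $rc"      # 127, command not found
rc=0; run_with_targets "$good" || rc=$?
echo "after stripping quotes: exit status $rc" # 0
```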

            People

              arshad512 Arshad Hussain
              arshad512 Arshad Hussain