Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.11.0, Lustre 2.10.4
    • Lustre 2.10.3

    Description

      When I run obdfilter-survey, it only runs one test at a time. After some debugging I traced the issue to the destroy_objects function. Here are the errors:

      remote_shell localhost lctl --device 17 destroy 444 1
      error: destroy: invalid objid '444'
      destroy OST object <objid> [num [verbose]]
      usage: destroy <num> objects, starting at objid <objid>
      run <command> after connecting to device <devno>
      --device <devno> <command [args ...]>
      remote_shell localhost lctl --device 18 destroy 444 1
      error: destroy: invalid objid '444'
      destroy OST object <objid> [num [verbose]]
      usage: destroy <num> objects, starting at objid <objid>
      run <command> after connecting to device <devno>
      --device <devno> <command [args ...]>
      remote_shell localhost lctl --device 19 destroy 444 1
      error: destroy: invalid objid '444'
      destroy OST object <objid> [num [verbose]]
      usage: destroy <num> objects, starting at objid <objid>
      run <command> after connecting to device <devno>
      --device <devno> <command [args ...]>
      remote_shell localhost lctl --device 20 destroy 444 1
      error: destroy: invalid objid '444'
      destroy OST object <objid> [num [verbose]]
      usage: destroy <num> objects, starting at objid <objid>
      run <command> after connecting to device <devno>
      --device <devno> <command [args ...]>
      remote_shell localhost lctl --device 21 destroy 444 1
      error: destroy: invalid objid '444'
      destroy OST object <objid> [num [verbose]]
      usage: destroy <num> objects, starting at objid <objid>
      run <command> after connecting to device <devno>
      --device <devno> <command [args ...]>

       

      Any ideas what could be causing this?

          Activity

            [LU-10663] obdfilter-survey

            gerrit Gerrit Updater added a comment -

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/31430/
            Subject: LU-10663 utils: clear errno before check
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: d140cab6f9bbef3d7f77b91628fe8202517fa185
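
            The patch subject, "clear errno before check", points at a common C pitfall: strtoull() does not reset errno on success, so a leftover errno from an earlier call can make a valid number such as '444' look invalid. The following is a minimal sketch of that pitfall and its fix, not the actual lctl source; the function names parse_objid_unsafe and parse_objid are hypothetical and used only for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper, NOT the real lctl code.
 * Parses an object id but trusts whatever value errno already holds.
 * If an earlier, unrelated call left errno non-zero, a perfectly valid
 * argument is rejected.
 */
static bool parse_objid_unsafe(const char *arg, unsigned long long *objid)
{
	char *end;

	*objid = strtoull(arg, &end, 0);
	/* BUG: errno may be stale; strtoull() leaves it untouched on success. */
	return errno == 0 && end != arg && *end == '\0';
}

/*
 * Hypothetical fixed variant: clear errno immediately before the
 * conversion, so a non-zero errno afterwards can only come from strtoull().
 */
static bool parse_objid(const char *arg, unsigned long long *objid)
{
	char *end;

	errno = 0;
	*objid = strtoull(arg, &end, 0);
	/* Reject overflow (ERANGE), empty input, and trailing garbage. */
	return errno == 0 && end != arg && *end == '\0';
}
```

            With errno left at, say, ENOENT by a previous call, parse_objid_unsafe("444", &id) reports failure even though the string is a valid number, while parse_objid("444", &id) succeeds and stores 444.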

            gerrit Gerrit Updater added a comment -

            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31430
            Subject: LU-10663 utils: clear errno before check
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 800da3bd685aa2ad4c4ed730ac86d79f7693bfc1

            pjones Peter Jones added a comment -

            Landed for 2.11


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31305/
            Subject: LU-10663 utils: clear errno before check
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 9e488fe9413184e61dcf405c9c87ca348dd6824a

            gerrit Gerrit Updater added a comment -

            John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/31305
            Subject: LU-10663 utils: clear errno before check
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 70e9e7bcf4505cba8853117e2eeb92a01e399eec

            mhanafi Mahmoud Hanafi added a comment -

            I copied lctl from a 2.10.1 server to a 2.10.3 server, and obdfilter-survey works. Attaching debug output.

            mhanafi Mahmoud Hanafi added a comment -

            This started when I updated from 2.10.1 to 2.10.3. I had run obdfilter-survey both local (OSS) and netdisk (2 OSSes), and I had interrupted it before in 2.10.1.

            I checked the files in /tmp and didn't find anything.

            Command line:

            rszlo=1024 rszhi=4096 size=5000 obdfilter-survey

            When I run obdfilter-survey on a new OST running 2.10.3, the test only runs once. I copied obdfilter-survey and iokit-libecho from a 2.10.1 server and still had the issue.

            I downgraded the OSS back to 2.10.1, and obdfilter-survey runs without errors.

            nbp7-mds2 ~ # rszlo=1024 rszhi=4096 size=10000 tests_str="write" obdfilter-survey

            Wed Feb 14 09:32:41 PST 2018 Obdfilter-survey for case=disk from nbp7-mds2
            ost 3 sz 30720000K rsz 1024K obj 3 thr 3 write 1096.05 [ 335.99, 448.97]
            ost 3 sz 30720000K rsz 1024K obj 3 thr 6 write 1163.57 [ 377.99, 754.98]
            ost 3 sz 30720000K rsz 1024K obj 3 thr 12 write 1163.78 [ 383.00, 776.98]
            ost 3 sz 30720000K rsz 1024K obj 3 thr 24 write 1164.07 [ 385.99, 778.96]
            ost 3 sz 30720000K rsz 1024K obj 3 thr 48 write 1164.01 [ 384.99, 777.39]
            ....

            So the bug is in the 2.10.3 lctl!

            Can I get the priority of this case increased by one level?

            bfaccini Bruno Faccini (Inactive) added a comment -

            Hello Mahmoud,
            Did you interrupt the obdfilter_survey script before getting these errors?
            I am asking because I am currently reworking the obdfilter_survey framework (as part of LU-9730) to strengthen it, particularly its automatic cleanup in both normal and interrupted cases.

            Can you also detail the command line and parameters you used, as well as the configuration (single-node setup? direct run on the OSS? ...)?

            Did you also check for any leftover "/tmp//obdfilter_survey_*" files?

            People

              jhammond John Hammond
              mhanafi Mahmoud Hanafi