Lustre / LU-8062

recovery-small test_115b: @@@@@@ FAIL: dd success

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.10.1, Lustre 2.11.0
    • Affects Version/s: Lustre 2.8.0, Lustre 2.9.0, Lustre 2.10.0
    • Labels: None
    • Severity: 3

    Description

      == recovery-small test 115b: write: late REQ MDunlink and no bulk == 21:12:09 (1461384729)
      Filesystem 1K-blocks Used Available Use% Mounted on
      onyx-38vm7@tcp:/lustre
      74157152 309236 69890576 1% /mnt/lustre
      fail_loc=0x8000051b
      fail_val=4
      Filesystem 1K-blocks Used Available Use% Mounted on
      onyx-38vm7@tcp:/lustre
      74157152 309236 69890576 1% /mnt/lustre
      CMD: onyx-38vm8 lctl set_param fail_val=0 fail_loc=0x80000215
      fail_val=0
      fail_loc=0x80000215
      1+0 records in
      1+0 records out
      4096 bytes (4.1 kB) copied, 2.13538 s, 1.9 kB/s
      recovery-small test_115b: @@@@@@ FAIL: dd success
      Trace dump:
      = /usr/lib64/lustre/tests/test-framework.sh:4764:error()
      = /usr/lib64/lustre/tests/recovery-small.sh:2161:test_115_write()
      = /usr/lib64/lustre/tests/recovery-small.sh:2181:test_115b()
      = /usr/lib64/lustre/tests/test-framework.sh:5028:run_one()
      = /usr/lib64/lustre/tests/test-framework.sh:5067:run_one_logged()
      = /usr/lib64/lustre/tests/test-framework.sh:4914:run_test()
      = /usr/lib64/lustre/tests/recovery-small.sh:2183:main()
      Dumping lctl log to /logdir/test_logs/2016-04-22/lustre-reviews-el6_7-x86_64-review-dne-part-1-1_6_1_38438_-70130481106820-100010/recovery-small.test_115b.*.1461384732.log
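
      For reference, the fail_loc values above follow the usual libcfs encoding: the low bits select a failure site and the 0x80000000 bit makes the injection fire only once. The stand-alone C sketch below is not Lustre source; the flag and mask constants are re-declared here as assumptions, purely to decode the two values from this log.

      #include <stdio.h>

      #define FAIL_ONCE_FLAG 0x80000000u  /* assumed one-shot ("fail once") flag */
      #define FAIL_SITE_MASK 0x0000ffffu  /* assumed failure-site mask */

      static void decode(unsigned int fail_loc)
      {
              printf("fail_loc=0x%08x -> site 0x%x%s\n", fail_loc,
                     fail_loc & FAIL_SITE_MASK,
                     (fail_loc & FAIL_ONCE_FLAG) ? " (one-shot)" : "");
      }

      int main(void)
      {
              decode(0x8000051b);  /* the "late REQ unlink" injection set in this log */
              decode(0x80000215);  /* OBD_FAIL_OST_ENOSPC, per the comments below */
              return 0;
      }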


          Activity


            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28181/
            Subject: LU-8062 ptlrpc: increase sleep time in ptlrpc_request_bufs_pack()
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: 39c090bdb9beacc0837cf921d87a451308364131


            James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/28181
            Subject: LU-8062 ptlrpc: increase sleep time in ptlrpc_request_bufs_pack()
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: b13e61f2e689efb8de1b558e9499c497921187e1


            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26815/
            Subject: LU-8062 ptlrpc: increase sleep time in ptlrpc_request_bufs_pack()
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e9e744ea7352ea0d1a5d9b2bd05e0e7c19f08596


            Vitaly Fertman (vitaly.fertman@seagate.com) uploaded a new patch: https://review.whamcloud.com/26815
            Subject: LU-8062 libcfs: schedule_timeout fix
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: fbc35bf832d17afa3aa7c7c9e4178986e7cc0458

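            The "schedule_timeout fix" subject is consistent with a well-known kernel pitfall: calling schedule_timeout() while the task is still in TASK_RUNNING state returns almost immediately, so the intended delay never happens. The sketch below only illustrates that general pattern under that assumption; it is not the actual patch, and the function names are made up.

            #include <linux/sched.h>
            #include <linux/jiffies.h>

            static void two_second_delay_broken(void)
            {
                    /* Task state is still TASK_RUNNING, so this returns
                     * almost immediately instead of sleeping ~2 seconds. */
                    schedule_timeout(2 * HZ);
            }

            static void two_second_delay_fixed(void)
            {
                    /* Mark the task as sleeping before scheduling away, or use
                     * schedule_timeout_uninterruptible(), which does this internally. */
                    set_current_state(TASK_UNINTERRUPTIBLE);
                    schedule_timeout(2 * HZ);
            }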

            Vitaly Fertman added a comment (quoting Andreas Dilger's earlier comment, below):
            As Bhagyesh Dudhediya mentioned in his first comment, this looks to be a problem with the test itself rather than with the code. There is a race condition because fail_loc=0x215 (OBD_FAIL_OST_ENOSPC) is not specific enough to fail only the write; it also trips for unrelated OST_STATFS RPCs on that OST (e.g. from the MDS).

            Can you clarify how that happens? There is a protection based on $OSTCOUNT: the fail check is triggered only if fail_val matches the OST index, so this is not supposed to happen.

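            For illustration only: the stand-alone model below shows one way the quoted race can play out even with a fail_val/OST-index guard, because the injection is one-shot (the 0x80000000 bit), so whichever RPC on the targeted OST checks it first (for example a statfs from the MDS) consumes it, after which the write proceeds and dd succeeds. This is not Lustre code; only the one-shot semantics and the fail_val comparison are meant to mirror libcfs behaviour, and all other names are illustrative.

            #include <stdio.h>
            #include <stdbool.h>

            #define FAIL_ONCE   0x80000000u
            #define SITE_ENOSPC 0x00000215u            /* failure site from this ticket */

            static unsigned int fail_loc = FAIL_ONCE | SITE_ENOSPC;
            static unsigned int fail_val;              /* OST index to target (0 here) */

            /* One-shot check: fires once when both the site and the OST index match. */
            static bool fail_check(unsigned int site, unsigned int ost_idx)
            {
                    if ((fail_loc & 0xffffu) != site || fail_val != ost_idx)
                            return false;
                    if (fail_loc & FAIL_ONCE)
                            fail_loc = 0;              /* consumed; later callers see nothing */
                    return true;
            }

            static int ost_statfs(unsigned int ost_idx)
            {
                    if (fail_check(SITE_ENOSPC, ost_idx))
                            printf("statfs on OST %u hit the injection first\n", ost_idx);
                    return 0;
            }

            static int ost_write(unsigned int ost_idx)
            {
                    if (fail_check(SITE_ENOSPC, ost_idx))
                            return -28;                /* -ENOSPC: the failure the test expects */
                    return 0;                          /* injection already consumed */
            }

            int main(void)
            {
                    ost_statfs(0);                     /* e.g. a periodic statfs from the MDS */
                    printf("write rc = %d\n", ost_write(0));   /* prints 0: "dd success" */
                    return 0;
            }
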
            Emoly Liu added a comment: +1 on master: https://testing.hpdd.intel.com/test_sets/41cb5880-23e0-11e7-b742-5254006e85c2

            James Casper (Inactive) added a comment:
            Saw a very similar "write not blocked" issue with master b3541:

            fail_loc=0x720
            1+0 records in
            1+0 records out
            4096 bytes (4.1 kB) copied, 0.0340116 s, 120 kB/s
             sanity test_313: @@@@@@ FAIL: write should failed
            

            https://testing.hpdd.intel.com/test_sets/1c10446a-0a05-11e7-9053-5254006e85c2


            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/25279
            Subject: LU-8062 tests: fix recovery-small test_115b fail_loc
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 8b7d81d7c6437ce085870d0c434b7b44c3d6601f


            Andreas Dilger added a comment:
            As Bhagyesh Dudhediya mentioned in his first comment, this looks to be a problem with the test itself rather than with the code. There is a race condition because fail_loc=0x215 (OBD_FAIL_OST_ENOSPC) is not specific enough to fail only the write; it also trips for unrelated OST_STATFS RPCs on that OST (e.g. from the MDS).

            Sebastien Buisson (Inactive) added a comment: +1 on master: https://testing.hpdd.intel.com/test_sets/904c2bbc-ea9f-11e6-b844-5254006e85c2

            People

              Assignee: WC Triage
              Reporter: Bhagyesh Dudhediya (Inactive)
              Votes: 0
              Watchers: 16

              Dates

                Created:
                Updated:
                Resolved: