Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-2547

test: recovery-small test_24a, test_24b: multiop didn't fail fsync: rc 0

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.9.0
    • Lustre 2.3.0, Lustre 2.4.0
    • Server: 2.1.3-1nasS, centos 6.3, 2.6.32_279.2.1.el6
      Client: 2.3.0-1nasC, sles11sp2, 3.0.42_0.7.3
      1 mds, 2 oss, 2 clients.
      Test was run from service331.
    • 3
    • 5971

    Description

      == recovery-small test 24a: fsync error (should return error) ======================================== 23:15:59 (1356678959)
      multiop /mnt/nbp0-1/d0.recovery-small/d24/f.recovery-small.24a vOwy_wyc
      TMPPIPE=/tmp/multiop_open_wait_pipe.9556
      fail_loc=0x0
      Connected clients:
      service331
      service331
      service332
      recovery-small test_24a: @@@@@@ IGNORE (bz5494): multiop didn't fail fsync: rc 0

      test_logs tarball is attached: recovery-small.24a.tgz

      Attachments

        Activity

          [LU-2547] test: recovery-small test_24a, test_24b: multiop didn't fail fsync: rc 0
          pjones Peter Jones added a comment -

          Landed for 2.9

          pjones Peter Jones added a comment - Landed for 2.9

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/22020/
          Subject: LU-2547 test: re-enable 24a/b of recovery-small
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 6912c6f8b1bce483915b3cb386d4695969d29407

          gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/22020/ Subject: LU-2547 test: re-enable 24a/b of recovery-small Project: fs/lustre-release Branch: master Current Patch Set: Commit: 6912c6f8b1bce483915b3cb386d4695969d29407

          Niu Yawei (yawei.niu@intel.com) uploaded a new patch: http://review.whamcloud.com/22020
          Subject: LU-2547 test: re-enable 24a/b of recovery-small
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: d8c6cdf073a7ae7bd0d9aafb4af288726f4d3bfe

          gerrit Gerrit Updater added a comment - Niu Yawei (yawei.niu@intel.com) uploaded a new patch: http://review.whamcloud.com/22020 Subject: LU-2547 test: re-enable 24a/b of recovery-small Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: d8c6cdf073a7ae7bd0d9aafb4af288726f4d3bfe

          Lustre fsync was semantically wrong before the fix of LU-4388, it didn't sync data on fsync at all (it get the semantic of fsync and fdatasync wrong), so it's not surprise this test fails often.

          00000080:00000001:4.0:1356678959.907382:0:21907:0:(obd_class.h:2061:md_sync()) Process leaving (rc=0 : 0 : 0)
          00000100:00000001:4.0:1356678959.907382:0:21907:0:(client.c:2323:__ptlrpc_req_finished()) Process entered
          00000100:00000040:4.0:1356678959.907383:0:21907:0:(client.c:2335:__ptlrpc_req_finished()) @@@ refcount now 0  req@ffff8805bdf93800 x1422578972363063/t0(0) o44->lustre-MDT0000-mdc-ffff8806f504e800@10.151.25.187@o2ib:12/10 lens 448/408 e 0 to 0 dl 1356678997 ref 1 fl Complete:R/0/0 rc 0/0
          00000100:00000001:4.0:1356678959.907386:0:21907:0:(client.c:2245:__ptlrpc_free_req()) Process entered
          02000000:00000001:4.0:1356678959.907388:0:21907:0:(sec.c:1697:sptlrpc_cli_free_repbuf()) Process entered
          02000000:00000010:4.0:1356678959.907388:0:21907:0:(sec_null.c:231:null_free_repbuf()) kfreed 'req->rq_repbuf': 1024 at ffff8805d7a85000.
          02000000:00000001:4.0:1356678959.907389:0:21907:0:(sec.c:1711:sptlrpc_cli_free_repbuf()) Process leaving
          00000020:00000001:4.0:1356678959.907390:0:21907:0:(genops.c:963:class_import_put()) Process entered
          00000020:00000040:4.0:1356678959.907390:0:21907:0:(genops.c:970:class_import_put()) import ffff880379630800 refcount=10 obd=lustre-MDT0000-mdc-ffff8806f504e800
          00000020:00000001:4.0:1356678959.907391:0:21907:0:(genops.c:979:class_import_put()) Process leaving
          02000000:00000010:4.0:1356678959.907392:0:21907:0:(sec_null.c:201:null_free_reqbuf()) kfreed 'req->rq_reqbuf': 512 at ffff8805d8599600.
          02000000:00000001:4.0:1356678959.907394:0:21907:0:(sec.c:437:sptlrpc_req_put_ctx()) Process entered
          02000000:00000001:4.0:1356678959.907394:0:21907:0:(sec.c:453:sptlrpc_req_put_ctx()) Process leaving
          00000100:00000010:4.0:1356678959.907395:0:21907:0:(client.c:2299:__ptlrpc_free_req()) kfreed 'request': 928 at ffff8805bdf93800.
          00000100:00000001:4.0:1356678959.907396:0:21907:0:(client.c:2300:__ptlrpc_free_req()) Process leaving
          00000100:00000001:4.0:1356678959.907396:0:21907:0:(client.c:2339:__ptlrpc_req_finished()) Process leaving (rc=1 : 1 : 1)
          00020000:00000002:4.0:1356678959.907397:0:21907:0:(lov_object.c:787:lov_lsm_addref()) lsm ffff8805b90ed640 addref 2 by ffff880382188280.
          00020000:00000002:4.0:1356678959.907398:0:21907:0:(lov_object.c:799:lov_lsm_decref()) lsm ffff8805b90ed640 decref 2 by ffff880382188280.
          00000080:00000001:4.0:1356678959.907399:0:21907:0:(file.c:2121:ll_fsync()) Process leaving (rc=0 : 0 : 0)
          

          Above log shows fsync synced meta data only.

          Given fsync has been fixed since 2.6 (by LU-4388), I think we can simply re-enable these two tests to see if it will go well.

          niu Niu Yawei (Inactive) added a comment - Lustre fsync was semantically wrong before the fix of LU-4388 , it didn't sync data on fsync at all (it get the semantic of fsync and fdatasync wrong), so it's not surprise this test fails often. 00000080:00000001:4.0:1356678959.907382:0:21907:0:(obd_class.h:2061:md_sync()) Process leaving (rc=0 : 0 : 0) 00000100:00000001:4.0:1356678959.907382:0:21907:0:(client.c:2323:__ptlrpc_req_finished()) Process entered 00000100:00000040:4.0:1356678959.907383:0:21907:0:(client.c:2335:__ptlrpc_req_finished()) @@@ refcount now 0 req@ffff8805bdf93800 x1422578972363063/t0(0) o44->lustre-MDT0000-mdc-ffff8806f504e800@10.151.25.187@o2ib:12/10 lens 448/408 e 0 to 0 dl 1356678997 ref 1 fl Complete:R/0/0 rc 0/0 00000100:00000001:4.0:1356678959.907386:0:21907:0:(client.c:2245:__ptlrpc_free_req()) Process entered 02000000:00000001:4.0:1356678959.907388:0:21907:0:(sec.c:1697:sptlrpc_cli_free_repbuf()) Process entered 02000000:00000010:4.0:1356678959.907388:0:21907:0:(sec_null.c:231:null_free_repbuf()) kfreed 'req->rq_repbuf' : 1024 at ffff8805d7a85000. 02000000:00000001:4.0:1356678959.907389:0:21907:0:(sec.c:1711:sptlrpc_cli_free_repbuf()) Process leaving 00000020:00000001:4.0:1356678959.907390:0:21907:0:(genops.c:963:class_import_put()) Process entered 00000020:00000040:4.0:1356678959.907390:0:21907:0:(genops.c:970:class_import_put()) import ffff880379630800 refcount=10 obd=lustre-MDT0000-mdc-ffff8806f504e800 00000020:00000001:4.0:1356678959.907391:0:21907:0:(genops.c:979:class_import_put()) Process leaving 02000000:00000010:4.0:1356678959.907392:0:21907:0:(sec_null.c:201:null_free_reqbuf()) kfreed 'req->rq_reqbuf' : 512 at ffff8805d8599600. 02000000:00000001:4.0:1356678959.907394:0:21907:0:(sec.c:437:sptlrpc_req_put_ctx()) Process entered 02000000:00000001:4.0:1356678959.907394:0:21907:0:(sec.c:453:sptlrpc_req_put_ctx()) Process leaving 00000100:00000010:4.0:1356678959.907395:0:21907:0:(client.c:2299:__ptlrpc_free_req()) kfreed 'request' : 928 at ffff8805bdf93800. 00000100:00000001:4.0:1356678959.907396:0:21907:0:(client.c:2300:__ptlrpc_free_req()) Process leaving 00000100:00000001:4.0:1356678959.907396:0:21907:0:(client.c:2339:__ptlrpc_req_finished()) Process leaving (rc=1 : 1 : 1) 00020000:00000002:4.0:1356678959.907397:0:21907:0:(lov_object.c:787:lov_lsm_addref()) lsm ffff8805b90ed640 addref 2 by ffff880382188280. 00020000:00000002:4.0:1356678959.907398:0:21907:0:(lov_object.c:799:lov_lsm_decref()) lsm ffff8805b90ed640 decref 2 by ffff880382188280. 00000080:00000001:4.0:1356678959.907399:0:21907:0:(file.c:2121:ll_fsync()) Process leaving (rc=0 : 0 : 0) Above log shows fsync synced meta data only. Given fsync has been fixed since 2.6 (by LU-4388 ), I think we can simply re-enable these two tests to see if it will go well.
          pjones Peter Jones added a comment -

          Niu

          Can you please check to see what needs to happen to get this test re-enabled?

          Thanks

          Peter

          pjones Peter Jones added a comment - Niu Can you please check to see what needs to happen to get this test re-enabled? Thanks Peter
          adilger Andreas Dilger added a comment - - edited

          The whole point of the always_except label is that it means this test is being skipped, but the original bug has not actually been fixed. This ticket shouldn't be closed until the original problem is fixed (lack of error return to userspace on fsync) and the test is removed from the ALWAYS_EXCEPT list in recovery-small.sh. I verified that this test is still being skipped, but only for ZFS MDT.

          adilger Andreas Dilger added a comment - - edited The whole point of the always_except label is that it means this test is being skipped, but the original bug has not actually been fixed. This ticket shouldn't be closed until the original problem is fixed (lack of error return to userspace on fsync) and the test is removed from the ALWAYS_EXCEPT list in recovery-small.sh. I verified that this test is still being skipped, but only for ZFS MDT.
          pjones Peter Jones added a comment -

          ok Jay

          pjones Peter Jones added a comment - ok Jay

          Please close this ticket since the test was marked "always_except".

          jaylan Jay Lan (Inactive) added a comment - Please close this ticket since the test was marked "always_except".

          recovery-small test_24a and test_24b are being skipped, the problem was not actually fixed.

          adilger Andreas Dilger added a comment - recovery-small test_24a and test_24b are being skipped, the problem was not actually fixed.
          utopiabound Nathaniel Clark added a comment - Patch for b2_4 http://review.whamcloud.com/7424

          People

            niu Niu Yawei (Inactive)
            jaylan Jay Lan (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: