Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-2547

test: recovery-small test_24a, test_24b: multiop didn't fail fsync: rc 0

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.9.0
    • Lustre 2.3.0, Lustre 2.4.0
    • Server: 2.1.3-1nasS, centos 6.3, 2.6.32_279.2.1.el6
      Client: 2.3.0-1nasC, sles11sp2, 3.0.42_0.7.3
      1 mds, 2 oss, 2 clients.
      Test was run from service331.
    • 3
    • 5971

    Description

      == recovery-small test 24a: fsync error (should return error) ======================================== 23:15:59 (1356678959)
      multiop /mnt/nbp0-1/d0.recovery-small/d24/f.recovery-small.24a vOwy_wyc
      TMPPIPE=/tmp/multiop_open_wait_pipe.9556
      fail_loc=0x0
      Connected clients:
      service331
      service331
      service332
      recovery-small test_24a: @@@@@@ IGNORE (bz5494): multiop didn't fail fsync: rc 0

      test_logs tarball is attached: recovery-small.24a.tgz

      Attachments

        Activity

          [LU-2547] test: recovery-small test_24a, test_24b: multiop didn't fail fsync: rc 0

          Lustre fsync was semantically wrong before the fix of LU-4388, it didn't sync data on fsync at all (it get the semantic of fsync and fdatasync wrong), so it's not surprise this test fails often.

          00000080:00000001:4.0:1356678959.907382:0:21907:0:(obd_class.h:2061:md_sync()) Process leaving (rc=0 : 0 : 0)
          00000100:00000001:4.0:1356678959.907382:0:21907:0:(client.c:2323:__ptlrpc_req_finished()) Process entered
          00000100:00000040:4.0:1356678959.907383:0:21907:0:(client.c:2335:__ptlrpc_req_finished()) @@@ refcount now 0  req@ffff8805bdf93800 x1422578972363063/t0(0) o44->lustre-MDT0000-mdc-ffff8806f504e800@10.151.25.187@o2ib:12/10 lens 448/408 e 0 to 0 dl 1356678997 ref 1 fl Complete:R/0/0 rc 0/0
          00000100:00000001:4.0:1356678959.907386:0:21907:0:(client.c:2245:__ptlrpc_free_req()) Process entered
          02000000:00000001:4.0:1356678959.907388:0:21907:0:(sec.c:1697:sptlrpc_cli_free_repbuf()) Process entered
          02000000:00000010:4.0:1356678959.907388:0:21907:0:(sec_null.c:231:null_free_repbuf()) kfreed 'req->rq_repbuf': 1024 at ffff8805d7a85000.
          02000000:00000001:4.0:1356678959.907389:0:21907:0:(sec.c:1711:sptlrpc_cli_free_repbuf()) Process leaving
          00000020:00000001:4.0:1356678959.907390:0:21907:0:(genops.c:963:class_import_put()) Process entered
          00000020:00000040:4.0:1356678959.907390:0:21907:0:(genops.c:970:class_import_put()) import ffff880379630800 refcount=10 obd=lustre-MDT0000-mdc-ffff8806f504e800
          00000020:00000001:4.0:1356678959.907391:0:21907:0:(genops.c:979:class_import_put()) Process leaving
          02000000:00000010:4.0:1356678959.907392:0:21907:0:(sec_null.c:201:null_free_reqbuf()) kfreed 'req->rq_reqbuf': 512 at ffff8805d8599600.
          02000000:00000001:4.0:1356678959.907394:0:21907:0:(sec.c:437:sptlrpc_req_put_ctx()) Process entered
          02000000:00000001:4.0:1356678959.907394:0:21907:0:(sec.c:453:sptlrpc_req_put_ctx()) Process leaving
          00000100:00000010:4.0:1356678959.907395:0:21907:0:(client.c:2299:__ptlrpc_free_req()) kfreed 'request': 928 at ffff8805bdf93800.
          00000100:00000001:4.0:1356678959.907396:0:21907:0:(client.c:2300:__ptlrpc_free_req()) Process leaving
          00000100:00000001:4.0:1356678959.907396:0:21907:0:(client.c:2339:__ptlrpc_req_finished()) Process leaving (rc=1 : 1 : 1)
          00020000:00000002:4.0:1356678959.907397:0:21907:0:(lov_object.c:787:lov_lsm_addref()) lsm ffff8805b90ed640 addref 2 by ffff880382188280.
          00020000:00000002:4.0:1356678959.907398:0:21907:0:(lov_object.c:799:lov_lsm_decref()) lsm ffff8805b90ed640 decref 2 by ffff880382188280.
          00000080:00000001:4.0:1356678959.907399:0:21907:0:(file.c:2121:ll_fsync()) Process leaving (rc=0 : 0 : 0)
          

          Above log shows fsync synced meta data only.

          Given fsync has been fixed since 2.6 (by LU-4388), I think we can simply re-enable these two tests to see if it will go well.

          niu Niu Yawei (Inactive) added a comment - Lustre fsync was semantically wrong before the fix of LU-4388 , it didn't sync data on fsync at all (it get the semantic of fsync and fdatasync wrong), so it's not surprise this test fails often. 00000080:00000001:4.0:1356678959.907382:0:21907:0:(obd_class.h:2061:md_sync()) Process leaving (rc=0 : 0 : 0) 00000100:00000001:4.0:1356678959.907382:0:21907:0:(client.c:2323:__ptlrpc_req_finished()) Process entered 00000100:00000040:4.0:1356678959.907383:0:21907:0:(client.c:2335:__ptlrpc_req_finished()) @@@ refcount now 0 req@ffff8805bdf93800 x1422578972363063/t0(0) o44->lustre-MDT0000-mdc-ffff8806f504e800@10.151.25.187@o2ib:12/10 lens 448/408 e 0 to 0 dl 1356678997 ref 1 fl Complete:R/0/0 rc 0/0 00000100:00000001:4.0:1356678959.907386:0:21907:0:(client.c:2245:__ptlrpc_free_req()) Process entered 02000000:00000001:4.0:1356678959.907388:0:21907:0:(sec.c:1697:sptlrpc_cli_free_repbuf()) Process entered 02000000:00000010:4.0:1356678959.907388:0:21907:0:(sec_null.c:231:null_free_repbuf()) kfreed 'req->rq_repbuf' : 1024 at ffff8805d7a85000. 02000000:00000001:4.0:1356678959.907389:0:21907:0:(sec.c:1711:sptlrpc_cli_free_repbuf()) Process leaving 00000020:00000001:4.0:1356678959.907390:0:21907:0:(genops.c:963:class_import_put()) Process entered 00000020:00000040:4.0:1356678959.907390:0:21907:0:(genops.c:970:class_import_put()) import ffff880379630800 refcount=10 obd=lustre-MDT0000-mdc-ffff8806f504e800 00000020:00000001:4.0:1356678959.907391:0:21907:0:(genops.c:979:class_import_put()) Process leaving 02000000:00000010:4.0:1356678959.907392:0:21907:0:(sec_null.c:201:null_free_reqbuf()) kfreed 'req->rq_reqbuf' : 512 at ffff8805d8599600. 02000000:00000001:4.0:1356678959.907394:0:21907:0:(sec.c:437:sptlrpc_req_put_ctx()) Process entered 02000000:00000001:4.0:1356678959.907394:0:21907:0:(sec.c:453:sptlrpc_req_put_ctx()) Process leaving 00000100:00000010:4.0:1356678959.907395:0:21907:0:(client.c:2299:__ptlrpc_free_req()) kfreed 'request' : 928 at ffff8805bdf93800. 00000100:00000001:4.0:1356678959.907396:0:21907:0:(client.c:2300:__ptlrpc_free_req()) Process leaving 00000100:00000001:4.0:1356678959.907396:0:21907:0:(client.c:2339:__ptlrpc_req_finished()) Process leaving (rc=1 : 1 : 1) 00020000:00000002:4.0:1356678959.907397:0:21907:0:(lov_object.c:787:lov_lsm_addref()) lsm ffff8805b90ed640 addref 2 by ffff880382188280. 00020000:00000002:4.0:1356678959.907398:0:21907:0:(lov_object.c:799:lov_lsm_decref()) lsm ffff8805b90ed640 decref 2 by ffff880382188280. 00000080:00000001:4.0:1356678959.907399:0:21907:0:(file.c:2121:ll_fsync()) Process leaving (rc=0 : 0 : 0) Above log shows fsync synced meta data only. Given fsync has been fixed since 2.6 (by LU-4388 ), I think we can simply re-enable these two tests to see if it will go well.
          pjones Peter Jones added a comment -

          Niu

          Can you please check to see what needs to happen to get this test re-enabled?

          Thanks

          Peter

          pjones Peter Jones added a comment - Niu Can you please check to see what needs to happen to get this test re-enabled? Thanks Peter
          adilger Andreas Dilger added a comment - - edited

          The whole point of the always_except label is that it means this test is being skipped, but the original bug has not actually been fixed. This ticket shouldn't be closed until the original problem is fixed (lack of error return to userspace on fsync) and the test is removed from the ALWAYS_EXCEPT list in recovery-small.sh. I verified that this test is still being skipped, but only for ZFS MDT.

          adilger Andreas Dilger added a comment - - edited The whole point of the always_except label is that it means this test is being skipped, but the original bug has not actually been fixed. This ticket shouldn't be closed until the original problem is fixed (lack of error return to userspace on fsync) and the test is removed from the ALWAYS_EXCEPT list in recovery-small.sh. I verified that this test is still being skipped, but only for ZFS MDT.
          pjones Peter Jones added a comment -

          ok Jay

          pjones Peter Jones added a comment - ok Jay

          Please close this ticket since the test was marked "always_except".

          jaylan Jay Lan (Inactive) added a comment - Please close this ticket since the test was marked "always_except".

          recovery-small test_24a and test_24b are being skipped, the problem was not actually fixed.

          adilger Andreas Dilger added a comment - recovery-small test_24a and test_24b are being skipped, the problem was not actually fixed.
          utopiabound Nathaniel Clark added a comment - Patch for b2_4 http://review.whamcloud.com/7424
          pjones Peter Jones added a comment -

          Landed for 2.5

          pjones Peter Jones added a comment - Landed for 2.5
          bfaccini Bruno Faccini (Inactive) added a comment - Got an occurrence with recovery-small/test_24b during https://maloo.whamcloud.com/test_sets/f43c1ffc-e4ad-11e2-a950-52540035b04c . When will http://review.whamcloud.com/6587 land ?
          utopiabound Nathaniel Clark added a comment - - edited

          Patch to EXCEPT 24b also for ZFS
          http://review.whamcloud.com/6587

          utopiabound Nathaniel Clark added a comment - - edited Patch to EXCEPT 24b also for ZFS http://review.whamcloud.com/6587

          It looked like a patch was landed but I saw a zfs fail today that looked exactly like this.

          https://maloo.whamcloud.com/test_sets/f425e1a6-bc12-11e2-b013-52540035b04c

          keith Keith Mannthey (Inactive) added a comment - It looked like a patch was landed but I saw a zfs fail today that looked exactly like this. https://maloo.whamcloud.com/test_sets/f425e1a6-bc12-11e2-b013-52540035b04c

          People

            niu Niu Yawei (Inactive)
            jaylan Jay Lan (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: