[LU-10729] replay-dual test_23d: FAIL: Remote creation failed 1 : mkdir: cannot create directory': File exists

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.11.0, Lustre 2.10.4, Lustre 2.10.5, Lustre 2.10.7, Lustre 2.12.1, Lustre 2.14.0, Lustre 2.12.6
    • Labels: None
    • Severity: 3

    Description

      The test fails on a failover configuration, i.e. when hosts != failoverhosts.

      == replay-dual test 23d: c1 rmdir d1, M0 drop update reply and fail M0/M1, c2 mkdir d1 =============== 19:53:39 (1519674819)
      fail_loc=0x1701
      fail_loc=0
      Failing mds1 on fre909
      Stopping /mnt/lustre-mds1 (opts:) on fre909
      pdsh@fre913: fre909: ssh exited with exit code 1
      Failing mds2 on fre910
      Stopping /mnt/lustre-mds2 (opts:) on fre910
      pdsh@fre913: fre910: ssh exited with exit code 1
      reboot facets: mds1
      Failover mds1 to fre910
      ...
      fre915: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
      UUID                   1K-blocks        Used   Available Use% Mounted on
      lustre-MDT0000_UUID      1168028        6688     1062924   1% /mnt/lustre[MDT:0]
      lustre-MDT0001_UUID      1168028        6272     1063340   1% /mnt/lustre[MDT:1]
      lustre-OST0000_UUID      4607356       34096     4306876   1% /mnt/lustre[OST:0]
      lustre-OST0001_UUID      4607356       34096     4306876   1% /mnt/lustre[OST:1]
      lustre-OST0002_UUID      4607356       34100     4306872   1% /mnt/lustre[OST:2]
      lustre-OST0003_UUID      4607356       34100     4306872   1% /mnt/lustre[OST:3]
      
      filesystem_summary:     18429424      136392    17227496   1% /mnt/lustre
      
      fre914: mkdir: cannot create directory '/mnt/lustre2/d23d.replay-dual/remote_dir': File exists
      pdsh@fre913: fre914: ssh exited with exit code 1
       replay-dual test_23d: @@@@@@ FAIL: Remote creation failed 1 
      

          Activity

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52343/
            Subject: LU-10729 tests: replay-dual/22d to wait
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: c0973b9fd64036adb3991615e76a97c6aa0b384e

            gerrit Gerrit Updater added a comment -

            "Xing Huang <hxing@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52343
            Subject: LU-10729 tests: replay-dual/22d to wait
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: bfa32d30813269f74c721eef5aca8930f40230e8

            adilger Andreas Dilger added a comment -

            I also see this fail intermittently on replay-dual test_22d.

            pjones Peter Jones added a comment -

            Landed for 2.16

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/45623/
            Subject: LU-10729 tests: replay-dual/23d to wait
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 63a19f6f666b9d18fede66ce8bcd2d799b5e0fa7

            gerrit Gerrit Updater added a comment -

            "Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45623
            Subject: LU-10729 tests: replay-dual/23d to wait
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f06970ed6d6e839b82b9623c1ae007164e3d5d50

            bzzz Alex Zhuravlev added a comment -

            The llog cancel for a previous distributed transaction races with the update from rmdir ../remote_dir. If that llog write wins, rmdir's update is put on hold until the reply for the llog write arrives.
            I think the easiest fix is just to wait a bit before rmdir, so that the cancel write gets processed first.
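
            The "wait before rmdir" idea above can be sketched as a bounded polling loop. This is only an illustration, not the landed patch: wait_until and cancel_processed are hypothetical names, and a marker file in a temporary directory stands in for the pending llog cancel so that the sketch runs without a Lustre mount.

```shell
#!/bin/bash
# Hypothetical helper (not part of the Lustre test-framework): run the given
# command once per second until it succeeds or the timeout (seconds) expires.
wait_until() {
	local timeout=$1; shift
	local waited=0

	until "$@"; do
		(( waited >= timeout )) && return 1
		sleep 1
		(( waited += 1 ))
	done
	return 0
}

# Assumption: a marker file simulates the outstanding llog cancel. A real
# test would need its own way to tell that the cancel has been processed.
workdir=$(mktemp -d)
cancel_processed() { [ ! -e "$workdir/cancel.pending" ]; }

mkdir "$workdir/remote_dir"
touch "$workdir/cancel.pending"
( sleep 2; rm -f "$workdir/cancel.pending" ) &	# the simulated cancel completes

# Only issue rmdir once the competing write is out of the way.
if wait_until 10 cancel_processed; then
	rmdir "$workdir/remote_dir" && result=removed
else
	result=timeout
fi
wait
echo "$result"
rm -rf "$workdir"
```

            A bounded poll is used here rather than a fixed sleep so that the delay ends as soon as the condition holds; a plain sleep would also match the "wait a bit" suggestion.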

            jamesanunez James Nunez (Inactive) added a comment -

            When this test fails, several test suites that run after replay-dual also fail because they cannot clean up this subtest's directory: either their tests never start, or the suite fails during cleanup after all tests have run.

            The end of the replay-vbr test_log at https://testing.whamcloud.com/test_sets/95dc7668-98ef-11e8-a9f7-52540065bddc shows:

            rm: cannot remove '/mnt/lustre/d23d.replay-dual': Directory not empty
              Trace dump:
              = /usr/lib64/lustre/tests/replay-vbr.sh:31:main()
            replay-vbr: FAIL: test-framework exiting on error

            I'm going to attribute these failures to this ticket.

            mdiep Minh Diep added a comment - +1 on b2_10 https://testing.hpdd.intel.com/test_sets/9a12c54c-2eab-11e8-b3c6-52540065bddc

            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: egryaznova Elena Gryaznova
              Votes: 0
              Watchers: 9
