Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.15.0

    Description

      The test case in patch https://review.whamcloud.com/35991 "LU-11549 tests: link succeded to an ophan remote object" does not currently work. It needs to be updated so that the OBD_RACE() condition is hit reliably.
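As background for why the race point has to be "hit reliably": as commonly described for the libcfs race helper, the first thread to reach an armed race point sleeps, and the next thread to reach the same point wakes it and continues. The sketch below is only a userspace model of that rendezvous in bash, not Lustre code. The names `race_point`, `wake`, and `first` are hypothetical, and the `sleep 1` stands in for whatever ordering the real test has to arrange.

```shell
#!/bin/bash
# Userspace model of a two-party race point (assumption: this mirrors the
# "first arrival sleeps, second arrival wakes it" behaviour; not Lustre code).
dir=$(mktemp -d)
mkfifo "$dir/wake"

race_point() {
    # mkdir is atomic, so it succeeds only for the first arrival.
    if mkdir "$dir/first" 2>/dev/null; then
        echo "$1: first arrival, sleeping" >> "$dir/log"
        read -r _ < "$dir/wake"          # blocks until the second arrival writes
        echo "$1: woken" >> "$dir/log"
    else
        echo "$1: second arrival, waking the first" >> "$dir/log"
        echo go > "$dir/wake"
    fi
}

race_point link &      # first operation reaches the race point and parks there
sleep 1                # give it time to arrive first
race_point rename      # second operation arrives and releases the first
wait

log=$(cat "$dir/log")
rm -rf "$dir"
printf '%s\n' "$log"
```

If the second operation never reaches the race point, the first one simply stays parked and the two operations are never interleaved, which is one way a test built on such a point can fail to exercise the race it was written for.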

          Activity

            [LU-12848] Add test case for LU-11549
            pjones Peter Jones added a comment -

            Landed for 2.15

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/35991/
            Subject: LU-12848 tests: link succeded to an ophan remote object
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: dcad0502b5682ab76ce4456573dc7060bcce7da0

            zam Alexander Zarochentsev added a comment -

            sanityn test 105, which illustrates the problem: https://review.whamcloud.com/#/c/35991/

            zam Alexander Zarochentsev added a comment -

            The test reveals a problem with the ZFS backend (it is test #104 in our branch and test #105 in the patch for master):

            == sanityn test 104: rename to an open file and link race should not cause fs corruption ============= 15:06:58 (1574089618)
            fail_loc=0x8000018a
            /usr/lib64/lustre/tests/sanityn.sh: line 4633: 17769 Terminated              $MULTIOP $DIR2/$tdir/mdt0dir/foodir/file2 Ow4096_c
            rm: cannot remove '/mnt/lustre/d104.sanityn/mdt1dir/file2x': No such file or directory
             sanityn test_104: @@@@@@ FAIL: Removing test dir failed
              Trace dump:
              = /usr/lib64/lustre/tests/../tests/test-framework.sh:5988:error()
              = /usr/lib64/lustre/tests/sanityn.sh:4634:test_104()
              = /usr/lib64/lustre/tests/../tests/test-framework.sh:6272:run_one()
              = /usr/lib64/lustre/tests/../tests/test-framework.sh:6311:run_one_logged()
              = /usr/lib64/lustre/tests/../tests/test-framework.sh:6107:run_test()
              = /usr/lib64/lustre/tests/sanityn.sh:4636:main()
            Dumping lctl log to /tmp/test_logs/1574089606/sanityn.test_104.*.1574089621.log
            Resetting fail_loc on all nodes...done.
            FAIL 104 (5s)
            sanityn: FAIL: test_104 Removing test dir failed
            Dumping lctl log to /tmp/test_logs/1574089606/sanityn..*.1574089624.log
            Resetting fail_loc on all nodes...done.

            The same failure is seen in Oleg's testing (http://testing.linuxhacker.ru:3333/lustre-reports/4579/results.html):

            == sanityn test 105: A racy rename/link an open file should not cause fs corruption ================== 13:15:42 (1574273742)
            fail_loc=0x8000018a
            /home/green/git/lustre-release/lustre/tests/sanityn.sh: line 4905: 10460 Terminated              $MULTIOP $DIR2/$tdir/mdt0dir/foodir/file2 Ow4096_c
            rm: cannot remove '/mnt/lustre/d105.sanityn/mdt1dir/file2x': No such file or directory
             sanityn test_105: @@@@@@ FAIL: Removing test dir failed
              Trace dump:
              = /home/green/git/lustre-release/lustre/tests/test-framework.sh:6108:error()
              = /home/green/git/lustre-release/lustre/tests/sanityn.sh:4906:test_105()
              = /home/green/git/lustre-release/lustre/tests/test-framework.sh:6410:run_one()
              = /home/green/git/lustre-release/lustre/tests/test-framework.sh:6449:run_one_logged()
              = /home/green/git/lustre-release/lustre/tests/test-framework.sh:6280:run_test()
              = /home/green/git/lustre-release/lustre/tests/sanityn.sh:4908:main()
            Dumping lctl log to /tmp/testlogs//sanityn.test_105.*.1574273747.log
            oleg256-server: Warning: Permanently added 'oleg256-client.virtnet' (ECDSA) to the list of known hosts.
            oleg256-server: rsync: chown "/tmp/testlogs/.sanityn.test_105.debug_log.oleg256-server.1574273747.log.Gd6d36" failed: Operation not permitted (1)
            oleg256-server: rsync: chown "/tmp/testlogs/.sanityn.test_105.dmesg.oleg256-server.1574273747.log.uwPdAz" failed: Operation not permitted (1)
            oleg256-server: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1178) [sender=3.1.2]
            pdsh@oleg256-client: oleg256-server: ssh exited with exit code 23
            Resetting fail_loc on all nodes...done.
            FAIL 105 (6s)
            cleanup: ======================================================
            == sanityn test complete, duration 21 sec ============================================================ 13:15:51 (1574273751)
            sanityn: FAIL: test_105 Removing test dir failed
            rm: cannot remove '/mnt/lustre/d105.sanityn/mdt1dir': Directory not empty
             sanityn test_105: @@@@@@ FAIL: remove sub-test dirs failed
              Trace dump:
              = /home/green/git/lustre-release/lustre/tests/test-framework.sh:6108:error()
              = /home/green/git/lustre-release/lustre/tests/test-framework.sh:5593:check_and_cleanup_lustre()
              = /home/green/git/lustre-release/lustre/tests/sanityn.sh:4920:main()
            Dumping lctl log to /tmp/testlogs//sanityn.test_105.*.1574273752.log
            oleg256-server: rsync: chown "/tmp/testlogs/.sanityn.test_105.debug_log.oleg256-server.1574273752.log.65aqHO" failed: Operation not permitted (1)
            oleg256-server: rsync: chown "/tmp/testlogs/.sanityn.test_105.dmesg.oleg256-server.1574273752.log.rhRRav" failed: Operation not permitted (1)
            oleg256-server: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1178) [sender=3.1.2]
            pdsh@oleg256-client: oleg256-server: ssh exited with exit code 23

            I believe this indicates filesystem corruption, but ZFS has no tool to check for it.
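For readers of the logs above, the `fail_loc=0x8000018a` value can be split into its parts. This is a small sketch that assumes the usual libcfs layout, where bit 31 (0x80000000) is the "fail once" flag and the low 16 bits identify the fail site. The symbolic name of site 0x18a is not given in this ticket, so it is left numeric here.

```shell
#!/bin/bash
# Decode the fail_loc value seen in the logs.
# Assumption: bit 31 is the fail-once flag and the low 16 bits are the site id.
fail_loc=0x8000018a

once=$(( (fail_loc >> 31) & 1 ))                 # 1 if the fail-once flag is set
site=$(printf '0x%x' $(( fail_loc & 0xffff )))   # fail site identifier

echo "once=$once site=$site"
```

Under that assumption the race point is armed for a single hit, so a run in which the two operations do not both reach it on the first attempt does not get a second chance within the same test.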

            People

              zam Alexander Zarochentsev
              adilger Andreas Dilger
              Votes: 0
              Watchers: 4
