replay-single test_70d: cannot touch /mnt/lustre/d70d.replay-single/test1/a: No such file or directory

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.9.0, Lustre 2.10.0, Lustre 2.11.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      This issue was created by maloo for Joe Gmitter <joseph.gmitter@intel.com>

      The sub-test test_70d failed with the following error:

      /usr/lib64/lustre/tests/replay-single.sh: line 2172: kill: (25681) - No such process
      

      suite_log excerpt:

      onyx-61vm2: CMD: onyx-61vm2.onyx.hpdd.intel.com lctl get_param -n at_max
      onyx-61vm1: CMD: onyx-61vm1.onyx.hpdd.intel.com lctl get_param -n at_max
      touch: cannot touch `/mnt/lustre/d70d.replay-single/test1/a': No such file or directory
      touch fails
      onyx-61vm2: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 5 sec
      onyx-61vm1: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 5 sec
      /usr/lib64/lustre/tests/replay-single.sh: line 2114: kill: (25681) - No such process
       replay-single test_70d: @@@@@@ FAIL: 25681 stopped 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:4670:error_noexit()
        = /usr/lib64/lustre/tests/test-framework.sh:4704:error()
        = /usr/lib64/lustre/tests/replay-single.sh:2115:random_fail_mdt()
        = /usr/lib64/lustre/tests/replay-single.sh:2235:test_70d()
        = /usr/lib64/lustre/tests/test-framework.sh:4951:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:4988:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:4853:run_test()
        = /usr/lib64/lustre/tests/replay-single.sh:2241:main()
      Dumping lctl log to /logdir/test_logs/2016-02-11/lustre-reviews-el6_7-x86_64--review-dne-part-2--1_7_1__37315__-70264984240460-122237/replay-single.test_70d.*.1455227814.log
      CMD: onyx-61vm1.onyx.hpdd.intel.com,onyx-61vm2,onyx-61vm3,onyx-61vm7,onyx-61vm8 /usr/sbin/lctl dk > /logdir/test_logs/2016-02-11/lustre-reviews-el6_7-x86_64--review-dne-part-2--1_7_1__37315__-70264984240460-122237/replay-single.test_70d.debug_log.\$(hostname -s).1455227814.log;
               dmesg > /logdir/test_logs/2016-02-11/lustre-reviews-el6_7-x86_64--review-dne-part-2--1_7_1__37315__-70264984240460-122237/replay-single.test_70d.dmesg.\$(hostname -s).1455227814.log
      Resetting fail_loc on all nodes...CMD: onyx-61vm1.onyx.hpdd.intel.com,onyx-61vm2,onyx-61vm3,onyx-61vm7,onyx-61vm8 lctl set_param -n fail_loc=0 	    fail_val=0 2>/dev/null || true
      done.
      /usr/lib64/lustre/tests/replay-single.sh: line 2172: kill: (25681) - No such process
      FAIL 70d (160s)
      
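      The excerpt above suggests that the background workload had already exited on its own error ("touch fails") by the time the test tried to signal it, which would account for the "kill: ... No such process" message. The following is a minimal sketch of that pattern, not the code from replay-single.sh: the names workdir, worker_pid, and status are hypothetical, and a local temporary directory stands in for the Lustre mount point.

```shell
#!/bin/bash
# Illustrative sketch only; hypothetical names, not taken from replay-single.sh.

workdir=$(mktemp -d)

# Background worker: repeats a file operation and leaves the loop on the
# first error, in the same spirit as the "touch fails" line in the log.
(
    while true; do
        # "$workdir/test1" is never created, so touch fails on the first pass.
        touch "$workdir/test1/a" 2>/dev/null || {
            echo "touch fails"
            break
        }
    done
) &
worker_pid=$!
echo "Started $worker_pid"

# Let the worker finish; in the real test a long failover loop runs here.
wait "$worker_pid"

# kill -0 delivers no signal; it only reports whether the PID can still be
# signalled. Once the worker has exited, the check fails.
if kill -0 "$worker_pid" 2>/dev/null; then
    status="running"
else
    status="stopped"
fi
echo "worker $worker_pid $status"

rm -rf "$workdir"
```

      In this sketch the liveness check fails only because the worker stopped early, so the "stopped" result is a symptom; the first error printed by the worker is the line to investigate.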

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/a8e09b5e-d146-11e5-a17d-5254006e85c2.

          Activity

            [LU-7775] replay-single test_70d: cannot touch /mnt/lustre/d70d.replay-single/test1/a: No such file or directory

            jcasper James Casper (Inactive) added a comment - https://testing.hpdd.intel.com/test_sessions/30cc75b6-594f-4255-accf-24fe11bdd565

            Very similar failure in 2.9.56 (b3565): replay-single, test_70d: 17427 stopped

            The process tries to start before the failure, but cannot:

            Started  17427
            rm: cannot remove '/mnt/lustre/d70d.replay-single/test1': Directory not empty
            rmdir fails

            Then, when the test later tries to kill it, the process no longer exists:

            /usr/lib64/lustre/tests/replay-single.sh: line 2108: kill: (17427) - No such process
             replay-single test_70d: @@@@@@ FAIL: 17427 stopped 
              Trace dump:
              = /usr/lib64/lustre/tests/test-framework.sh:4931:error()
              = /usr/lib64/lustre/tests/replay-single.sh:2109:random_fail_mdt()
              = /usr/lib64/lustre/tests/replay-single.sh:2232:test_70d()
              = /usr/lib64/lustre/tests/test-framework.sh:5207:run_one()
              = /usr/lib64/lustre/tests/test-framework.sh:5246:run_one_logged()
              = /usr/lib64/lustre/tests/test-framework.sh:5093:run_test()
              = /usr/lib64/lustre/tests/replay-single.sh:2238:main()

            standan Saurabh Tandan (Inactive) added a comment - Another instance: https://testing.hpdd.intel.com/test_sets/be6215a4-1b9a-11e6-9e5d-5254006e85c2
            sbuisson Sebastien Buisson (Inactive) added a comment - Hit here: https://testing.hpdd.intel.com/test_sets/c4249be6-e26b-11e5-abf3-5254006e85c2

            People

              Assignee: wc-triage WC Triage
              Reporter: maloo Maloo