
sanity 27C: error: getstripe failed for f.sanity.27C0

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.0

    Description

      This issue was created by maloo for Li Wei <liwei@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/fb85b3e0-904d-11e2-8311-52540035b04c.

      The sub-test test_27C failed with the following error:

      Info required for matching: sanity 27C

      == sanity test 27C: check full striping across all OSTs == 13:26:01 (1363638361)
      error: getstripe failed for f.sanity.27C0.
      
      /usr/lib64/lustre/tests/sanity.sh: line 1828: [: -eq: unary operator expected
       sanity test_27C: @@@@@@ FAIL:  
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:3977:error_noexit()
        = /usr/lib64/lustre/tests/test-framework.sh:4000:error()
        = /usr/lib64/lustre/tests/sanity.sh:1828:test_27C()
        = /usr/lib64/lustre/tests/test-framework.sh:4255:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:4288:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:4143:run_test()
        = /usr/lib64/lustre/tests/sanity.sh:1832:main()
      Dumping lctl log to /logdir/test_logs/2013-03-18/lustre-reviews-el6-x86_64--review--1_1_1__14094__-70104848885040-152208/sanity.test_27C.*.1363638362.log
      CMD: c01,c02,c03,c04,c05,c06,c08,c09 /usr/sbin/lctl dk > /logdir/test_logs/2013-03-18/lustre-reviews-el6-x86_64--review--1_1_1__14094__-70104848885040-152208/sanity.test_27C.debug_log.\$(hostname -s).1363638362.log;
               dmesg > /logdir/test_logs/2013-03-18/lustre-reviews-el6-x86_64--review--1_1_1__14094__-70104848885040-152208/sanity.test_27C.dmesg.\$(hostname -s).1363638362.log
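
      The "line 1828: [: -eq: unary operator expected" message above is the usual
      symptom of an unquoted empty variable in a shell test expression: once
      getstripe fails, the variable holding its result is empty, so a comparison
      like [ $count -eq $OSTCOUNT ] degenerates into [ -eq N ]. A minimal sketch
      of the failure mode, using hypothetical variable names rather than the
      actual sanity.sh code:

          # hypothetical sketch, not the literal test_27C body
          tfile=$DIR/f.sanity.27C0
          count=$($LFS getstripe -c $tfile)   # empty string when getstripe fails
          [ $count -eq $OSTCOUNT ] ||         # with count="", the shell sees "[ -eq N ]",
              error "stripe count mismatch"   # hence "unary operator expected"

          # quoting with a default keeps the comparison valid and lets the test
          # report the real failure instead of a shell syntax error:
          [ "${count:-0}" -eq "$OSTCOUNT" ] || error "stripe count mismatch"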
      


          Activity


            adilger Andreas Dilger added a comment - Going to use LU-7550 for this instead of reopening this ancient bug.
            yong.fan nasf (Inactive) added a comment - It seems to be reproduced on latest master: https://testing.hpdd.intel.com/test_sets/65e48472-a104-11e5-85ed-5254006e85c2
            emoly.liu Emoly Liu added a comment - edited

            1MDS+2MDT: I tried to use a test patch (http://review.whamcloud.com/#change,5983) to reproduce this failure with "Test-Parameters: fortestonly mdtcount=2 testlist=sanity", but could not reproduce it. The maloo report is at https://maloo.whamcloud.com/test_sessions/5c970e94-a139-11e2-b1c3-52540035b04c
            test_27C output is:

            09:14:24:== sanity test 27C: check full striping across all OSTs == 09:14:14 (1365524054)
            09:14:24:0 1 2 3 4 5 6
            09:14:24:1 2 3 4 5 6 0
            09:14:24:2 3 4 5 6 0 1
            09:14:24:3 4 5 6 0 1 2
            09:14:24:4 5 6 0 1 2 3
            09:14:24:5 6 0 1 2 3 4
            09:14:24:6 0 1 2 3 4 5
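
            Each row in the matrix above is the list of OST indices (the obdidx
            column of "lfs getstripe") backing one fully striped file: with 7
            OSTs, every row holds all of 0-6, and the starting index rotating by
            one per file reflects the round-robin allocator. A rough sketch of
            that loop, illustrative only and not the literal sanity.sh code:

                # illustrative sketch, assuming the usual $LFS/$DIR/$OSTCOUNT
                # conventions of the Lustre test framework
                for i in $(seq 0 $((OSTCOUNT - 1))); do
                    file=$DIR/f.sanity.27C$i
                    $LFS setstripe -c -1 $file   # -c -1: stripe over every OST
                    # print the obdidx column as one row of the matrix above
                    $LFS getstripe $file |
                        awk '/^[[:space:]]+[0-9]/ { printf "%s ", $1 } END { print "" }'
                done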
            

            2MDS+2MDT+4OST: Since the only valid value for mdscount in Test-Parameters is 1, I used 3 VMs to reproduce it instead: two VMs ran the two MDSes and the other VM ran the OSTs and the client.
            The output is:

            [root@centos6-2 tests]# mgs_HOST=centos6-1 mds1_HOST=centos6-1 mds2_HOST=centos6-3 ost_HOST=centos6-2 MDSCOUNT=2 PDSH="pdsh -S -Rrsh -w" ONLY=27C sh sanity.sh
            centos6-3: centos6-1: centos6-2: Logging to local directory: /tmp/test_logs/1365579025
            centos6-2: Checking config lustre mounted on /mnt/lustre
            Checking servers environments
            Checking clients centos6-2 environments
            Using TIMEOUT=100
            centos6-1: centos6-2: seting jobstats to procname_uid
            Setting lustre.sys.jobid_var from disable to procname_uid
            Waiting 90 secs for update
            Updated after 8s: wanted 'procname_uid' got 'procname_uid'
            disable quota as required
            centos6-3: centos6-1: running as uid/gid/euid/egid 500/500/500/500, groups:
             [touch] [/mnt/lustre/d0_runas_test/f2554]
            only running test 27C
            centos6-2: centos6-1: excepting tests: 76 42a 42b 42c 42d 45 51d 68b
            centos6-1: centos6-2: skipping tests SLOW=no: 24o 27m 64b 68 71 77f 78 115 124b
            centos6-1: centos6-2: preparing for tests involving mounts
            mke2fs 1.42.6.wc2 (10-Dec-2012)
            
            debug=-1
            
            
            == sanity test 27C: check full striping across all OSTs == 15:30:35 (1365579035)
            centos6-2: centos6-1: mkdir 1 for /mnt/lustre/d0.sanity/d27
            0 1 2 3
            1 2 3 0
            2 3 0 1
            3 0 1 2
            Resetting fail_loc on all nodes...centos6-2: centos6-1: done.
            centos6-1:
            PASS 27C (0s)
            resend_count is set to 4 4 4 4
            resend_count is set to 4 4 4 4
            resend_count is set to 4 4 4 4
            resend_count is set to 4 4 4 4
            resend_count is set to 4 4 4 4
            == sanity test complete, duration 11 sec == 15:30:36 (1365579036)
            

            I also ran the test_27 series several times but still couldn't reproduce this failure. I'd like to close this ticket; we can reopen it if we hit this again in the future.

            emoly.liu Emoly Liu added a comment -

            https://maloo.whamcloud.com/test_sets/97e8851c-9cf8-11e2-a280-52540035b04c

            Richard, it seems the maloo report above is not related to this failure.

            rhenwood Richard Henwood (Inactive) added a comment - Here is, apparently, another: https://maloo.whamcloud.com/test_sets/97e8851c-9cf8-11e2-a280-52540035b04c
            sarah Sarah Liu added a comment - another failure hit with 1MDS/2MDTs https://maloo.whamcloud.com/test_sets/7043259c-9656-11e2-9abb-52540035b04c
            emoly.liu Emoly Liu added a comment -

            I can't reproduce this failure with 2 MDSes on 2 VMs either.

            According to the Maloo reports above, I suspect this failure was probably caused by an earlier test failure in test_17k/17n/27u. I will verify that.

            emoly.liu Emoly Liu added a comment -

            I can't reproduce it with 1 MDS + 2 MDTs. I will try 2 MDSes later.


            People

              Assignee: emoly.liu Emoly Liu
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 7
