[LU-3001] sanity 27C: error: getstripe failed for f.sanity.27C0 Created: 21/Mar/13  Updated: 14/Dec/15  Resolved: 14/Dec/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Emoly Liu
Resolution: Cannot Reproduce Votes: 0
Labels: dne

Issue Links:
Duplicate
duplicates LU-7550 sanity test_27C: FAIL: Can not find 5... Resolved
Severity: 3
Rank (Obsolete): 7314

Description

This issue was created by maloo for Li Wei <liwei@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/fb85b3e0-904d-11e2-8311-52540035b04c.

The sub-test test_27C failed with the following error:

Info required for matching: sanity 27C

== sanity test 27C: check full striping across all OSTs == 13:26:01 (1363638361)
error: getstripe failed for f.sanity.27C0.

/usr/lib64/lustre/tests/sanity.sh: line 1828: [: -eq: unary operator expected
 sanity test_27C: @@@@@@ FAIL:  
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:3977:error_noexit()
  = /usr/lib64/lustre/tests/test-framework.sh:4000:error()
  = /usr/lib64/lustre/tests/sanity.sh:1828:test_27C()
  = /usr/lib64/lustre/tests/test-framework.sh:4255:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:4288:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4143:run_test()
  = /usr/lib64/lustre/tests/sanity.sh:1832:main()
Dumping lctl log to /logdir/test_logs/2013-03-18/lustre-reviews-el6-x86_64--review--1_1_1__14094__-70104848885040-152208/sanity.test_27C.*.1363638362.log
CMD: c01,c02,c03,c04,c05,c06,c08,c09 /usr/sbin/lctl dk > /logdir/test_logs/2013-03-18/lustre-reviews-el6-x86_64--review--1_1_1__14094__-70104848885040-152208/sanity.test_27C.debug_log.\$(hostname -s).1363638362.log;
         dmesg > /logdir/test_logs/2013-03-18/lustre-reviews-el6-x86_64--review--1_1_1__14094__-70104848885040-152208/sanity.test_27C.dmesg.\$(hostname -s).1363638362.log
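For context, the secondary shell error at sanity.sh line 1828 is the usual symptom of comparing an empty, unquoted variable with -eq: once lfs getstripe fails, the variable holding the stripe count is empty, so the test operator sees only "[ -eq N ]". A minimal sketch (not the actual sanity.sh code; the file path and OST count are assumed):

#!/bin/bash
# Sketch only: reproduces the "[: -eq: unary operator expected" pattern
# seen once getstripe fails.  Path and OST count are hypothetical.
file=/mnt/lustre/d27/f.sanity.27C0
OSTCOUNT=7

# If getstripe fails, $count ends up empty.
count=$(lfs getstripe -c "$file" 2>/dev/null)

# Unquoted empty variable: the shell evaluates "[ -eq 7 ]", which is
# exactly the unary-operator error reported at sanity.sh line 1828.
[ $count -eq $OSTCOUNT ] && echo "fully striped"

# Defensive variant that reports the real failure instead of a shell error:
if [ "${count:-0}" -ne "$OSTCOUNT" ]; then
    echo "getstripe failed or wrong stripe count ('$count') for $file" >&2
fi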


Comments
Comment by Emoly Liu [ 27/Mar/13 ]

I can't reproduce it with 1 MDS + 2 MDTs. I will try with 2 MDSes later.

Comment by Emoly Liu [ 27/Mar/13 ]

I can't reproduce this failure with 2 MDSes on 2 VMs either.

According to the Maloo report above, I suspect this failure was probably caused by an earlier test failure in test_17k/17n/27u. I will verify this.

Comment by Sarah Liu [ 28/Mar/13 ]

Another failure was hit with 1 MDS / 2 MDTs:

https://maloo.whamcloud.com/test_sets/7043259c-9656-11e2-9abb-52540035b04c

Comment by Richard Henwood (Inactive) [ 04/Apr/13 ]

Here is, apparently, another:

https://maloo.whamcloud.com/test_sets/97e8851c-9cf8-11e2-a280-52540035b04c

Comment by Emoly Liu [ 08/Apr/13 ]

https://maloo.whamcloud.com/test_sets/97e8851c-9cf8-11e2-a280-52540035b04c

Richard, it seems the Maloo report above is not related to this failure.

Comment by Emoly Liu [ 10/Apr/13 ]

1 MDS + 2 MDTs: I tried to use a test patch (http://review.whamcloud.com/#change,5983) to reproduce this failure with "Test-Parameters: fortestonly mdtcount=2 testlist=sanity", but could not reproduce it. The Maloo report is at https://maloo.whamcloud.com/test_sessions/5c970e94-a139-11e2-b1c3-52540035b04c
The test_27C output is:

09:14:24:== sanity test 27C: check full striping across all OSTs == 09:14:14 (1365524054)
09:14:24:0 1 2 3 4 5 6
09:14:24:1 2 3 4 5 6 0
09:14:24:2 3 4 5 6 0 1
09:14:24:3 4 5 6 0 1 2
09:14:24:4 5 6 0 1 2 3
09:14:24:5 6 0 1 2 3 4
09:14:24:6 0 1 2 3 4 5

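Each row above appears to be the per-stripe OST indices of one test file: with full striping every file covers all of the OSTs, only the starting index rotates. A by-hand version of the same check might look like the following sketch (mount point and file name are assumed):

#!/bin/bash
# Sketch of the check test_27C performs: a file created with stripe
# count -1 should be striped across every available OST.
MNT=/mnt/lustre                           # assumed client mount point
FILE=$MNT/f.fullstripe                    # hypothetical file name

OSTCOUNT=$(lfs df "$MNT" | grep -c OST)   # number of active OSTs

lfs setstripe -c -1 "$FILE"               # -1 means stripe over all OSTs
COUNT=$(lfs getstripe -c "$FILE")         # stripe count of the resulting layout

if [ "${COUNT:-0}" -eq "$OSTCOUNT" ]; then
    echo "fully striped across $OSTCOUNT OSTs"
else
    echo "expected $OSTCOUNT stripes, got '$COUNT'" >&2
fi

lfs getstripe "$FILE"                     # full layout, including each stripe's obdidx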
2 MDSes + 2 MDTs + 4 OSTs: Since the only valid value for mdscount in Test-Parameters is 1, I tried to use 3 VMs to reproduce it. Two VMs ran the two MDSes and the third VM ran the OSTs and the client.
The output is:

[root@centos6-2 tests]# mgs_HOST=centos6-1 mds1_HOST=centos6-1 mds2_HOST=centos6-3 ost_HOST=centos6-2 MDSCOUNT=2 PDSH="pdsh -S -Rrsh -w" ONLY=27C sh sanity.sh
centos6-3: centos6-1: centos6-2: Logging to local directory: /tmp/test_logs/1365579025
centos6-2: Checking config lustre mounted on /mnt/lustre
Checking servers environments
Checking clients centos6-2 environments
Using TIMEOUT=100
centos6-1: centos6-2: seting jobstats to procname_uid
Setting lustre.sys.jobid_var from disable to procname_uid
Waiting 90 secs for update
Updated after 8s: wanted 'procname_uid' got 'procname_uid'
disable quota as required
centos6-3: centos6-1: running as uid/gid/euid/egid 500/500/500/500, groups:
 [touch] [/mnt/lustre/d0_runas_test/f2554]
only running test 27C
centos6-2: centos6-1: excepting tests: 76 42a 42b 42c 42d 45 51d 68b
centos6-1: centos6-2: skipping tests SLOW=no: 24o 27m 64b 68 71 77f 78 115 124b
centos6-1: centos6-2: preparing for tests involving mounts
mke2fs 1.42.6.wc2 (10-Dec-2012)

debug=-1


== sanity test 27C: check full striping across all OSTs == 15:30:35 (1365579035)
centos6-2: centos6-1: mkdir 1 for /mnt/lustre/d0.sanity/d27
0 1 2 3
1 2 3 0
2 3 0 1
3 0 1 2
Resetting fail_loc on all nodes...centos6-2: centos6-1: done.
centos6-1:
PASS 27C (0s)
resend_count is set to 4 4 4 4
resend_count is set to 4 4 4 4
resend_count is set to 4 4 4 4
resend_count is set to 4 4 4 4
resend_count is set to 4 4 4 4
== sanity test complete, duration 11 sec == 15:30:36 (1365579036)

I also ran the test_27 series several times but still couldn't reproduce this failure. I'd like to close this ticket and reopen it if we hit this again in the future.

Comment by nasf (Inactive) [ 13/Dec/15 ]

It seems to have been reproduced on the latest master:
https://testing.hpdd.intel.com/test_sets/65e48472-a104-11e5-85ed-5254006e85c2

Comment by Andreas Dilger [ 14/Dec/15 ]

I'm going to use LU-7550 for this instead of reopening this ancient bug.
