[LU-2502] Test failure on test suite ost-pools, subtest test_23a Created: 17/Dec/12  Updated: 18/Nov/16  Resolved: 18/Nov/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0, Lustre 2.1.4, Lustre 2.4.1
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Maloo Assignee: Niu Yawei (Inactive)
Resolution: Won't Fix Votes: 0
Labels: quota, test, yuc2

Severity: 3
Rank (Obsolete): 5864

 Description   

This issue was created by maloo for liuying <emoly.liu@intel.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/71f17b88-4826-11e2-8cdc-52540035b04c.

The sub-test test_23a failed with the following error:

test failed to respond and timed out

Client-1 console log showed:

23:05:49:Lustre: DEBUG MARKER: == ost-pools test 23a: OST pools and quota =========================================================== 23:05:40 (1355727940)
23:05:49:Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.testpool 2>/dev/null || echo foo
23:05:49:Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.testpool | sort -u | tr '\n' ' ' 
23:06:01:Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.testpool | sort -u | tr '\n' ' ' 
23:06:01:LustreError: 19086:0:(quota_ctl.c:328:client_quota_ctl()) ptlrpc_queue_wait failed, rc: -114
23:06:01:LustreError: 19087:0:(lmv_obd.c:855:lmv_iocontrol()) error: iocontrol MDC lustre-MDT0000_UUID on MDTidx 0 cmd 800866a1: err = -22
23:06:01:LustreError: 19087:0:(lmv_obd.c:855:lmv_iocontrol()) error: iocontrol MDC lustre-MDT0000_UUID on MDTidx 0 cmd 800866a1: err = -22
23:06:12:LustreError: 19087:0:(lmv_obd.c:855:lmv_iocontrol()) error: iocontrol MDC lustre-MDT0000_UUID on MDTidx 0 cmd 800866a1: err = -22
...

MDS dmesg log showed:

Lustre: DEBUG MARKER: == ost-pools test 23a: OST pools and quota =========================================================== 23:05:40 (1355727940)
Lustre: DEBUG MARKER: lctl pool_new lustre.testpool
Lustre: DEBUG MARKER: lctl pool_add lustre.testpool lustre-OST[0000-0006/3]
Lustre: 4861:0:(quota_master.c:793:close_quota_files()) quota[0] is off already
Lustre: 4861:0:(quota_master.c:793:close_quota_files()) Skipped 1 previous similar message
LustreError: 12634:0:(fsfilt-ldiskfs.c:2181:fsfilt_ldiskfs_quotacheck()) quotacheck failed: rc = -22
LustreError: 12634:0:(quota_check.c:112:target_quotacheck_thread()) mdd_obd-lustre-MDT0000: fsfilt_quotacheck: -22
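
For reference, the negative return codes in these logs are standard Linux errno values: -114 is EALREADY ("Operation already in progress") and -22 is EINVAL ("Invalid argument"). A quick way to double-check them from a shell, as a sketch (any host with python3 will do, since the values come from the kernel's errno definitions):

$ python3 -c 'import errno,os; [print(-e, errno.errorcode[e], "-", os.strerror(e)) for e in (114, 22)]'
-114 EALREADY - Operation already in progress
-22 EINVAL - Invalid argument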


 Comments   
Comment by Johann Lombardi (Inactive) [ 17/Dec/12 ]

Was this interop testing or pure 2.1 testing?

Comment by Johann Lombardi (Inactive) [ 17/Dec/12 ]

cc Emoly.

Comment by Peter Jones [ 17/Dec/12 ]

Johann

I believe that this issue was hit on a review run for this patch to b2_1: http://review.whamcloud.com/#change,4831

Peter

Comment by Niu Yawei (Inactive) [ 18/Dec/12 ]

Maybe it's caused by some change in the new kernel (2.6.32-279.14.1.el6_lustre.g9963a82.x86_64)?

Comment by Jian Yu [ 18/Dec/12 ]

Searching the historical reports on Maloo showed that this issue did not occur on the b2_1 branch before, nor on 2.1.4 RC1:
http://tinyurl.com/cjevqxt

Comment by Niu Yawei (Inactive) [ 18/Dec/12 ]

Searching the historical reports on Maloo showed that this issue did not occur on the b2_1 branch before, nor on 2.1.4 RC1:
http://tinyurl.com/cjevqxt

Yes, then it doesn't seem to be a new kernel issue (some tests with the new kernel passed).

Comment by Niu Yawei (Inactive) [ 26/Dec/12 ]

Looks like it is no longer being reproduced in b2_1 tests.

Comment by Niu Yawei (Inactive) [ 17/Feb/13 ]

Can't be reproduced.

Comment by Sarah Liu [ 26/Feb/13 ]

I think the following error, found in interop testing between a 2.1.4 server and a 2.4 client, is the same one, so I am reopening this ticket.

https://maloo.whamcloud.com/test_sets/3539abfa-7d84-11e2-85d0-52540035b04c

MDS console shows:

10:46:09:Lustre: DEBUG MARKER: == ost-pools test 23a: OST pools and quota == 10:46:09 (1361558769)
10:46:09:Lustre: DEBUG MARKER: lctl pool_new lustre.testpool
10:46:20:Lustre: DEBUG MARKER: lctl pool_add lustre.testpool lustre-OST[0000-0006/3]
10:46:31:Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version
10:46:31:Lustre: 3818:0:(quota_master.c:795:close_quota_files()) quota[0] is off already
10:46:31:Lustre: 3818:0:(quota_master.c:795:close_quota_files()) quota[1] is off already
10:46:31:LustreError: 13718:0:(fsfilt-ldiskfs.c:2181:fsfilt_ldiskfs_quotacheck()) quotacheck failed: rc = -22
10:46:31:LustreError: 13718:0:(quota_check.c:114:target_quotacheck_thread()) mdd_obd-lustre-MDT0000: fsfilt_quotacheck: -22

Comment by Niu Yawei (Inactive) [ 01/Apr/13 ]

Hi Sarah, is it reproducible? Could you try to capture the log on the MDS with D_TRACE enabled, so we can see how fsfilt_ldiskfs_quotacheck() failed with -22? Thank you.
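
A sketch of the capture steps on the MDS, in case it helps (the dump file path here is an arbitrary example):

mds# lctl set_param debug=+trace          # add D_TRACE to the current debug mask
mds# lctl clear                           # empty the kernel debug buffer first
# ... rerun ost-pools test_23a ...
mds# lctl dk > /tmp/mds-quotacheck.log    # dump the debug buffer to a file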

Comment by Sarah Liu [ 01/Apr/13 ]

OK, I will let you know when I get the results.

Comment by Sarah Liu [ 10/Apr/13 ]

I reran the test 3 times and cannot reproduce it with the server running 2.1.5 and the client running 2.4.

Comment by Jian Yu [ 11/Sep/13 ]

Lustre client: http://build.whamcloud.com/job/lustre-b2_4/45/ (2.4.1 RC2)
Lustre server: http://build.whamcloud.com/job/lustre-b2_1/215/ (2.1.6)

ost-pools test 23a hit the same failure:
https://maloo.whamcloud.com/test_sets/2f0438ae-1abe-11e3-bf23-52540035b04c

Comment by nasf (Inactive) [ 25/Mar/15 ]

Another failure instance on master:
https://testing.hpdd.intel.com/test_sets/314d4b36-d288-11e4-a0e2-5254006e85c2

Comment by Niu Yawei (Inactive) [ 18/Nov/16 ]

This was an interop issue between 2.4 clients and 2.1 servers (see the combinations reported above). I think it is no longer relevant. The last instance on master reported by nasf is a different issue.
