[LU-4419] Test failure on test suite recovery-small, subtest test_110a Created: 29/Dec/13  Updated: 22/Jul/18  Resolved: 22/Jul/18

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Mikhail Pershin
Resolution: Duplicate Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 12134

 Description   

This issue was created by maloo for nasf <fan.yong@intel.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/e0c22550-6db7-11e3-a191-52540035b04c.

Slave MDT cannot allocate super-sequence:

07:41:38:Lustre: DEBUG MARKER: == recovery-small test 110a: create remote directory: drop client req == 07:40:37 (1387986037)
07:41:38:LustreError: 31316:0:(fid_handler.c:285:__seq_server_alloc_meta()) srv-lustre-MDT0001: Can't allocate super-sequence, rc -5
07:41:38:Lustre: DEBUG MARKER: /usr/sbin/lctl mark recovery-small test_110a: @@@@@@ FAIL: lfs mkdir failed
07:41:38:Lustre: DEBUG MARKER: recovery-small test_110a: @@@@@@ FAIL: lfs mkdir failed



 Comments   
Comment by nasf (Inactive) [ 29/Dec/13 ]

We have hit the failure several times:

https://maloo.whamcloud.com/test_sets/e0c22550-6db7-11e3-a191-52540035b04c
https://maloo.whamcloud.com/test_sets/e553b5f8-6f2d-11e3-ad93-52540035b04c
https://maloo.whamcloud.com/test_sets/fe06d80c-6ebd-11e3-ad93-52540035b04c

Comment by Di Wang [ 29/Dec/13 ]

Hmm, I am not sure whether this is a new bug or exists only in your or Mike's patch series, since I have not seen it in anyone else's patches. Please correct me if I am wrong. Do your patches still depend on Mike's?

Comment by nasf (Inactive) [ 29/Dec/13 ]

I cannot search the test results history because of Maloo issues. The first known failure instance was found in Mike's patch, but I do not think it is specific to that patch; it looks more like a general master bug, because his patch does not touch the MDT/FID stack. Currently, the LFSCK patches still depend on Mike's patch.

Comment by Di Wang [ 30/Dec/13 ]

Oh, it is not about the MDT/FID stack. The failure occurs because the connection from the other MDTs/OSTs to MDT0 is somehow broken, so these targets cannot allocate a new FID sequence from MDT0. Hmm, if your patch still depends on Mike's patch, it is probably a problem in Mike's patch, since I have never seen this problem on current master or in runs of other people's patches.

Comment by Sarah Liu [ 30/Dec/13 ]

another instance:

https://maloo.whamcloud.com/test_sets/1e071d22-706e-11e3-9fe0-52540035b04c

Comment by Mikhail Pershin [ 05/Jan/14 ]

I have probably found the source of the problem; let's wait for the latest patch test results: http://review.whamcloud.com/#/c/7383/

Comment by Mikhail Pershin [ 07/Jan/14 ]

https://maloo.whamcloud.com/test_sessions/ca3dea64-7751-11e3-943d-52540035b04c

Now it works as expected. The problem was a lost chunk of code with OBD_FAIL_CHECK that is needed for the tests.

Comment by John Hammond [ 01/Aug/14 ]

Another instance https://testing.hpdd.intel.com/test_sets/2e799af4-1942-11e4-8c4a-5254006e85c2.

22:16:47:Lustre: DEBUG MARKER: == recovery-small test 110a: create remote directory: drop client req == 22:14:17 (1406870057)
22:16:47:Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
22:16:47:Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
22:16:47:Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
22:16:47:LustreError: 167-0: lustre-MDT0000-lwp-MDT0001: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
22:16:47:LustreError: Skipped 2 previous similar messages
22:16:47:LustreError: 16796:0:(fid_handler.c:284:__seq_server_alloc_meta()) srv-lustre-MDT0001: Can't allocate super-sequence, rc -5
22:16:47:Lustre: DEBUG MARKER: /usr/sbin/lctl mark recovery-small test_110a: @@@@@@ FAIL: lfs mkdir failed
22:16:47:Lustre: DEBUG MARKER: recovery-small test_110a: @@@@@@ FAIL: lfs mkdir failed
22:16:47:Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /logdir/test_logs/2014-07-31/lustre-reviews-el6-x86_64-review-dne-part-1-2_7_1_25543_-70280407083100-154359/recovery-small.test_110a.debug_log.$(hostname -s).1406870060.log;
22:16:47: dmesg > /logdir/test_logs/2014-07-31/lustre-revi
22:16:47:Lustre: DEBUG MARKER: /usr/sbin/lctl mark == recovery-small test 110b: create remote directory: drop Master rep == 22:14:24 (1406870064)
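
The log above shows an eviction of the lwp connection to MDT0000 followed by the super-sequence allocation failure on MDT0001. A minimal sketch for checking other console logs for the same ordering; the helper name `correlate` and both regular expressions are hypothetical and only match the message formats quoted in this ticket:

```python
import re

# Assumed message formats, copied from the console log quoted in this ticket.
EVICT_RE = re.compile(r"(\S+): This client was evicted by (\S+);")
SEQ_RE = re.compile(r"(srv-\S+): Can't allocate super-sequence, rc (-?\d+)")


def correlate(lines):
    """Pair each super-sequence failure with the most recent eviction seen before it."""
    last_evict = None
    hits = []
    for line in lines:
        m = EVICT_RE.search(line)
        if m:
            # (evicted import, evicting target)
            last_evict = (m.group(1), m.group(2))
            continue
        m = SEQ_RE.search(line)
        if m:
            hits.append({
                "server": m.group(1),
                "rc": int(m.group(2)),
                "eviction": last_evict,  # None if no eviction preceded the failure
            })
    return hits


sample = [
    "22:16:47:LustreError: 167-0: lustre-MDT0000-lwp-MDT0001: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.",
    "22:16:47:LustreError: 16796:0:(fid_handler.c:284:__seq_server_alloc_meta()) srv-lustre-MDT0001: Can't allocate super-sequence, rc -5",
]
for hit in correlate(sample):
    print(hit)
```

A failure reported with `eviction` set to `None` would suggest a different cause than the broken MDT-to-MDT0 connection discussed earlier in this ticket.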

Comment by James Nunez (Inactive) [ 02/Mar/15 ]

Another instance on 2.7.0-RC2. Logs at https://testing.hpdd.intel.com/test_sets/20cb4134-bf80-11e4-881f-5254006e85c2

Client test log:

== recovery-small test 110a: create remote directory: drop client req == 08:02:47 (1425139367)
CMD: onyx-41vm3 lctl set_param fail_loc=0x123
fail_loc=0x123
CMD: onyx-41vm6.onyx.hpdd.intel.com /usr/bin/lfs mkdir -i 1 -c2 /mnt/lustre/d110a.recovery-small/remote_dir
error on LL_IOC_LMV_SETSTRIPE '/mnt/lustre/d110a.recovery-small/remote_dir' (3): Input/output error
error: mkdir: create stripe dir '/mnt/lustre/d110a.recovery-small/remote_dir' failed
CMD: onyx-41vm3 lctl set_param fail_loc=0

The MDT log is the same as the one John posted above.
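
The `rc -5` in the MDT error and the client-side `Input/output error` are consistent with each other: kernel code returns negative errno values, and errno 5 is EIO. A minimal sketch for decoding the return code from such a log line; `decode_rc` and its regular expression are hypothetical helpers, not part of the Lustre test framework:

```python
import errno
import os
import re

# Assumed message format, copied from the MDT console log in this ticket.
RC_RE = re.compile(r"Can't allocate super-sequence, rc (-?\d+)")


def decode_rc(line):
    """Return (rc, errno name, description) for a matching log line, else None."""
    m = RC_RE.search(line)
    if m is None:
        return None
    rc = int(m.group(1))
    code = abs(rc)  # kernel return codes are negative errno values
    return rc, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


line = ("22:16:47:LustreError: 16796:0:(fid_handler.c:284:__seq_server_alloc_meta()) "
        "srv-lustre-MDT0001: Can't allocate super-sequence, rc -5")
print(decode_rc(line))
```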

Comment by James Nunez (Inactive) [ 26/May/15 ]

This test is still failing occasionally. Recent failures are:
2015-04-27 23:46:54 - https://testing.hpdd.intel.com/sub_tests/3da2d940-ed41-11e4-b3fc-5254006e85c2
2015-05-06 15:04:26 - https://testing.hpdd.intel.com/sub_tests/73c503ac-f40a-11e4-b108-5254006e85c2
2015-05-14 09:21:11 - https://testing.hpdd.intel.com/sub_tests/77b99d6e-fa23-11e4-8c8b-5254006e85c2
2015-05-17 00:07:52 - https://testing.hpdd.intel.com/sub_tests/fcaf8172-fc31-11e4-a658-5254006e85c2
2015-05-26 10:10:02 - https://testing.hpdd.intel.com/sub_tests/283e0a20-0399-11e5-a102-5254006e85c2

Comment by James Nunez (Inactive) [ 26/May/15 ]

Mike,
Would you please look at these recent test failures?
Thank you.

Comment by Mikhail Pershin [ 22/Jul/18 ]

The related test issue was fixed, and the remaining problems are tracked in LU-7612.

Generated at Sat Feb 10 01:42:33 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.