[LU-7102] replay-dual test_26: FAIL: set default dirstripe failed Created: 04/Sep/15  Updated: 19/Sep/15  Resolved: 19/Sep/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Major
Reporter: Maloo Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None
Environment:

client and server: lustre-master build # 3167 RHEL7.1


Issue Links:
Related
is related to LU-3534 async update cross-MDTs Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/2f3d009a-5288-11e5-920d-5254006e85c2.

The sub-test test_26 failed with the following error:

tar 31000 stopped

client dmesg

[44291.931908] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-dual test 26: dbench and tar with mds failover ============================================= 02:08:37 \(1441271317\)
[44292.061471] Lustre: DEBUG MARKER: == replay-dual test 26: dbench and tar with mds failover ============================================= 02:08:37 (1441271317)
[44292.170528] Lustre: DEBUG MARKER: 
running=$(mount | grep -c /mnt/lustre' ');
rc=0;
if [ $running -eq 0 ] ; then
    mkdir -p /mnt/lustre;
    mount -t lustre  -o user_xattr,flock onyx-30vm3@tcp:/lustre /mnt/lustre;
    rc=$?;
fi;
exit $rc
[44292.648835] Lustre: DEBUG MARKER: mount | grep /mnt/lustre' '
[44292.968947] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
[44293.519953] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  replay-dual test_26: @@@@@@ FAIL: set default dirstripe failed 
[44293.519957] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  replay-dual test_26: @@@@@@ FAIL: set default dirstripe failed 
[44293.652115] Lustre: DEBUG MARKER: replay-dual test_26: @@@@@@ FAIL: set default dirstripe failed
[44293.654275] Lustre: DEBUG MARKER: replay-dual test_26: @@@@@@ FAIL: set default dirstripe failed
[44293.813286] Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /logdir/test_logs/2015-09-02/lustre-master-el7-x86_64--full--1_5_1__3167__-70064653655700-192636/replay-dual.test_26.debug_log.$(hostname -s).1441271319.log;
         dmesg > /logdir/test_logs/2015-09-02/lustre-master-el7-x86_64--full-
[44293.815705] Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /logdir/test_logs/2015-09-02/lustre-master-el7-x86_64--full--1_5_1__3167__-70064653655700-192636/replay-dual.test_26.debug_log.$(hostname -s).1441271319.log;
         dmesg > /logdir/test_logs/2015-09-02/lustre-master-el7-x86_64--full-
[44295.679860] LustreError: 14887:0:(ldlm_resource.c:835:ldlm_resource_complain()) MGC10.2.4.95@tcp: namespace resource [0x65727473756c:0x2:0x0].0 (ffff880079216780) refcount nonzero (1) after lock cleanup; forcing cleanup.
[44295.686436] LustreError: 14887:0:(ldlm_resource.c:1448:ldlm_resource_dump()) --- Resource: [0x65727473756c:0x2:0x0].0 (ffff880079216780) refcount = 2
[44295.690994] LustreError: 14887:0:(ldlm_resource.c:1469:ldlm_resource_dump()) Waiting locks:
[44295.693004] LustreError: 14887:0:(ldlm_resource.c:1471:ldlm_resource_dump()) ### ### ns: MGC10.2.4.95@tcp lock: ffff88005e1f1400/0xef7bd0e848d38cb0 lrc: 4/1,0 mode: --/CR res: [0x65727473756c:0x2:0x0].0 rrc: 2 type: PLN flags: 0x1106400000000 nid: local remote: 0x554c7be31dea8d91 expref: -99 pid: 10544 timeout: 0 lvb_type: 0
[44296.218483] Lustre: DEBUG MARKER: mcreate /mnt/lustre/fsa-$(hostname); rm /mnt/lustre/fsa-$(hostname)
[44296.450737] Lustre: DEBUG MARKER: if [ -d /mnt/lustre2 ]; then mcreate /mnt/lustre2/fsa-$(hostname); rm /mnt/lustre2/fsa-$(hostname); fi



 Comments   
Comment by Andreas Dilger [ 04/Sep/15 ]

This first started failing on 2015-08-29 10:41:56, so it is likely related to a patch that landed just before that time.

Comment by Gerrit Updater [ 14/Sep/15 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/16414
Subject: LU-7102 tests: fix replay-dual.sh test_26 for MDSCOUNT=1
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d8cbe1db42fdef114009c1c13c50e941c9859aa1

Comment by Andreas Dilger [ 14/Sep/15 ]

This test has been broken since the original patch http://review.whamcloud.com/15163 "LU-3534 tests: a few tests cases for async update." first landed on 2015-08-28 09:07:07; it could never pass with MDSCOUNT=1. It seems that "full" test runs on master use only a single MDS.
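
For illustration only, the kind of guard this implies might look like the sketch below. It is not the content of patch 16414, and the helper name maybe_set_default_dirstripe is hypothetical. MDSCOUNT and LFS follow the variable names used by the Lustre test framework, and the defaults given here are assumptions so the snippet can be read on its own.

```shell
#!/bin/bash
# Hypothetical helper, not taken from replay-dual.sh or from patch 16414.
# MDSCOUNT and LFS mirror test-framework variable names; the defaults
# below are assumptions for standalone use.
MDSCOUNT=${MDSCOUNT:-1}
LFS=${LFS:-lfs}

maybe_set_default_dirstripe() {
	local dir=$1

	# Striping new directories across several MDTs only makes sense when
	# more than one is configured, so skip the step on single-MDS setups
	# instead of letting the command fail the test.
	if [ "$MDSCOUNT" -lt 2 ]; then
		echo "MDSCOUNT=$MDSCOUNT, skipping default dirstripe on $dir"
		return 0
	fi

	# -D sets the default layout inherited by new subdirectories,
	# -c sets the stripe count.
	$LFS setdirstripe -D -c "$MDSCOUNT" "$dir"
}
```

With a guard like this, a run with MDSCOUNT=1 would skip the setdirstripe call and continue with the dbench and tar workload, while a multi-MDS run would still exercise the striped-directory path.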

Comment by Gerrit Updater [ 19/Sep/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16414/
Subject: LU-7102 tests: fix replay-dual.sh test_26 for MDSCOUNT=1
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3b75ac83e4266ced141dd53fee0a09c5ed329f4d

Comment by Peter Jones [ 19/Sep/15 ]

Landed for 2.8
