[LU-8982] replay-vbr test_7g: @@@@@@ FAIL: Test 7g.2 failed; FAIL: Test 7g.1 failed Created: 30/Dec/16  Updated: 18/Apr/17  Resolved: 18/Apr/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Hongchao Zhang Assignee: Hongchao Zhang
Resolution: Duplicate Votes: 0
Labels: None
Environment:

Release : 191_3.10.0_327.13.1.x3.0.86.x86_64_g8e08a98
Client 2.7.14.x8 Server 2.7.14.x8
4 node DNE SingleMDS - KVM setup


Issue Links:
Duplicate
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

stdout

== replay-vbr test 7g: rename, {lost}, create ======================================================== 14:24:28 (1475850268)
Starting client: fre0132:  -o user_xattr,flock fre0129@tcp:/lustre /mnt/lustre2
fre0132: mount.lustre: according to /etc/mtab fre0129@tcp:/lustre is already mounted on /mnt/lustre2
pdsh@fre0131: fre0132: ssh exited with exit code 17
start cycle: test_7g.1
mdd.lustre-MDT0000.sync_permission=0
mdt.lustre-MDT0000.commit_on_sharing=0
Filesystem                 1K-blocks  Used Available Use% Mounted on
192.168.101.29@tcp:/lustre   1345184 35424   1209144   3% /mnt/lustre
test_7g.1 first: createmany -o /mnt/lustre/d7g.replay-vbr/f7g.replay-vbr- 1; mv /mnt/lustre/d7g.replay-vbr/f7g.replay-vbr-0 /mnt/lustre/d7g.replay-vbr/f7g.replay-vbr-1
total: 1 creates in 0.00 seconds: 435.91 creates/second
test_7g.1 lost: mkdir /mnt/lustre2/d7g.replay-vbr/f7g.replay-vbr-0;rmdir /mnt/lustre2/d7g.replay-vbr/f7g.replay-vbr-0
test_7g.1 last: createmany -o /mnt/lustre/d7g.replay-vbr/f7g.replay-vbr- 1
total: 1 creates in 0.00 seconds: 880.23 creates/second
Stopping client fre0132 /mnt/lustre2 (opts:)
pdsh@fre0131: fre0132: ssh exited with exit code 1
Failing mds1 on fre0129
Stopping /mnt/mds1 (opts:) on fre0129
reboot facets: mds1
Failover mds1 to fre0129
14:24:41 (1475850281) waiting for fre0129 network 900 secs ...
14:24:41 (1475850281) network interface is UP
mount facets: mds1
Starting mds1: -o rw,user_xattr  /dev/vdc /mnt/mds1
Started lustre-MDT0000
affected facets: mds1
fre0129: *.lustre-MDT0000.recovery_status status: COMPLETE
Waiting for orphan cleanup...
osp.lustre-OST0000-osc-MDT0000.old_sync_processed
osp.lustre-OST0000-osc-MDT0001.old_sync_processed
osp.lustre-OST0001-osc-MDT0000.old_sync_processed
osp.lustre-OST0001-osc-MDT0001.old_sync_processed
wait 40 secs maximumly for fre0129 mds-ost sync done.
Starting client: fre0132:  -o user_xattr,flock fre0129@tcp:/lustre /mnt/lustre2
start cycle: test_7g.2
mdd.lustre-MDT0000.sync_permission=0
mdt.lustre-MDT0000.commit_on_sharing=0
Filesystem                 1K-blocks  Used Available Use% Mounted on
192.168.101.29@tcp:/lustre   1345184 35424   1209144   3% /mnt/lustre
test_7g.2 first: createmany -o /mnt/lustre/d7g.replay-vbr/f7g.replay-vbr- 2; mv /mnt/lustre/d7g.replay-vbr/f7g.replay-vbr-0 /mnt/lustre/d7g.replay-vbr/f7g.replay-vbr-1
total: 2 creates in 0.00 seconds: 739.61 creates/second
test_7g.2 lost: createmany -o /mnt/lustre2/d7g.replay-vbr/f7g.replay-vbr- 1; rm /mnt/lustre2/d7g.replay-vbr/f7g.replay-vbr-0
total: 1 creates in 0.00 seconds: 392.76 creates/second
test_7g.2 last: mkdir /mnt/lustre/d7g.replay-vbr/f7g.replay-vbr-0
Stopping client fre0132 /mnt/lustre2 (opts:)
pdsh@fre0131: fre0132: ssh exited with exit code 1
Failing mds1 on fre0129
Stopping /mnt/mds1 (opts:) on fre0129
reboot facets: mds1
Failover mds1 to fre0129
14:26:07 (1475850367) waiting for fre0129 network 900 secs ...
14:26:07 (1475850367) network interface is UP
mount facets: mds1
Starting mds1: -o rw,user_xattr  /dev/vdc /mnt/mds1
Started lustre-MDT0000
fre0131: stat: cannot read file system information for ‘/mnt/lustre’: Input/output error
pdsh@fre0131: fre0131: ssh exited with exit code 1
affected facets: mds1
fre0129: *.lustre-MDT0000.recovery_status status: COMPLETE
Waiting for orphan cleanup...
osp.lustre-OST0000-osc-MDT0000.old_sync_processed
osp.lustre-OST0000-osc-MDT0001.old_sync_processed
osp.lustre-OST0001-osc-MDT0000.old_sync_processed
osp.lustre-OST0001-osc-MDT0001.old_sync_processed
wait 40 secs maximumly for fre0129 mds-ost sync done.
 replay-vbr test_7g: @@@@@@ FAIL: Test 7g.2 failed 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4863:error()
  = /usr/lib64/lustre/tests/replay-vbr.sh:891:test_7g()
  = /usr/lib64/lustre/tests/test-framework.sh:5123:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:5161:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4965:run_test()
  = /usr/lib64/lustre/tests/replay-vbr.sh:906:main()
Dumping lctl log to /tmp/test_logs/1475850252/replay-vbr.test_7g.*.1475850473.log
fre0130: Warning: Permanently added 'fre0131,192.168.101.31' (ECDSA) to the list of known hosts.

fre0129: Warning: Permanently added 'fre0131,192.168.101.31' (ECDSA) to the list of known hosts.

fre0132: Warning: Permanently added 'fre0131,192.168.101.31' (ECDSA) to the list of known hosts.

fre0130: error: set_param: setting debug=: Invalid argument
pdsh@fre0131: fre0130: ssh exited with exit code 22
fre0129: error: set_param: setting debug=: Invalid argument
pdsh@fre0131: fre0129: ssh exited with exit code 22
Resetting fail_loc on all nodes...done.
FAIL 7g (208s)
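
For context, the failing cycle follows replay-vbr's first/lost/last pattern visible in the log above. Below is a condensed sketch of what test_7g.2 exercises — the paths and client-side commands are copied from the log, while the surrounding failover/verification steps are paraphrased in comments; this is not the exact test-framework code and it requires a running Lustre cluster with two client mounts.

```shell
# Sketch of the test_7g.2 cycle (condensed from the log; assumes a live
# Lustre setup with /mnt/lustre and /mnt/lustre2 mounted on two clients).

D=/mnt/lustre/d7g.replay-vbr     # client 1 mount: ops that must be replayed
D2=/mnt/lustre2/d7g.replay-vbr   # client 2 mount: ops that will be "lost"

# "first": uncommitted operations on client 1
createmany -o $D/f7g.replay-vbr- 2
mv $D/f7g.replay-vbr-0 $D/f7g.replay-vbr-1

# "lost": operations on client 2; that client is then stopped, so these
# updates are absent from replay
createmany -o $D2/f7g.replay-vbr- 1
rm $D2/f7g.replay-vbr-0

# "last": back on client 1; must replay correctly despite the gap
mkdir $D/f7g.replay-vbr-0

# then: fail over mds1 and wait for *.lustre-MDT0000.recovery_status to
# reach COMPLETE — version-based recovery (VBR) should let client 1's
# transactions replay even though client 2's intervening ones are missing
```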

cmd

SLOW=YES NAME=ncli mgs_HOST=fre0129 MGSDEV=/dev/vdb NETTYPE=tcp \
  mds1_HOST=fre0129 MDSDEV1=/dev/vdc mds_HOST=fre0129 MDSDEV=/dev/vdc \
  mds2_HOST=fre0129 MDSDEV2=/dev/vdd MDSCOUNT=2 \
  ost1_HOST=fre0130 OSTDEV1=/dev/vdb ost2_HOST=fre0130 OSTDEV2=/dev/vdc OSTCOUNT=2 \
  CLIENTS=fre0131 RCLIENTS="fre0132" PDSH="/usr/bin/pdsh -R ssh -S -w " \
  ONLY=7g MDS_MOUNT_OPTS="-o rw,user_xattr" OST_MOUNT_OPTS="-o user_xattr" \
  MDSSIZE=0 OSTSIZE=0 MDSJOURNALSIZE="22" ENABLE_QUOTA="yes"


Comments
Comment by Gerrit Updater [ 30/Dec/16 ]

Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/24541
Subject: LU-8982 ldlm: limit recovery timer to allow VBR
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 50a59a4dc35590cc54382e5489283ab6c7e605d3

Comment by Hongchao Zhang [ 17/Apr/17 ]

This issue has been fixed by patch https://review.whamcloud.com/#/c/23716/ in LU-8826.

Comment by Peter Jones [ 18/Apr/17 ]

IIUC this is a duplicate

Generated at Sat Feb 10 02:22:13 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.