[LU-276] ost-pools: test-18 Degradation with missing pool is 26.07 % (> 15 %) Created: 04/May/11  Updated: 07/Jan/16  Resolved: 07/Jan/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0, Lustre 2.2.0, Lustre 2.1.1
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Jinshan Xiong (Inactive) Assignee: Hongchao Zhang
Resolution: Incomplete Votes: 0
Labels: None

Severity: 3
Bugzilla ID: 23408
Rank (Obsolete): 5036

 Description   

We've seen this bug again; it was supposed to be fixed in bug 23408. We need to figure out why it regressed.

Also, there is an alignment problem in the code of ost-pools test_18 that needs to be fixed as well.



 Comments   
Comment by Johann Lombardi (Inactive) [ 04/May/11 ]

FWIW, Wangdi landed a patch to master to fix the problem of qos_remedy_create() which could allocate objects outside the pool.
That was bug 21379 & commit eb7c28ff977f4e0a280558aa74e23f2a9ab0ea0c.
This patch adds a call to lov_find_pool() in qos_remedy_create() which is going to print another CWARN(). Maybe that's the reason why we see this failure now and not before (and not on b1_8 where this patch is not landed).

Comment by Peter Jones [ 04/May/11 ]

HongChao

This is causing failures in some of the automated test runs, so could you please look into this as a priority?

Thanks

Peter

Comment by Hongchao Zhang [ 05/May/11 ]

I have taken this and started working on it.

Comment by Hongchao Zhang [ 10/May/11 ]

Several factors can degrade performance when a pool is used:
1. the pool lookup in the hash table
2. the extra "CWARN" debug message if the specified pool doesn't exist

However, local tests show no big difference between the 3 cases (without pool, wide pool, missing pool).
The difference tends to decrease as the number of created files increases, and disappears once the file count is larger than 20000.

The current file count of 9877 is a little small. How about increasing it to a larger value to lessen the effect of
random conditions that could lead to a false result?
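
One way to read this reasoning: if the pool case adds a roughly fixed one-off cost on top of a per-file cost, its relative weight shrinks as the file count grows. The sketch below uses made-up numbers purely for illustration; the cost model and the values of `per_file` and `fixed_overhead` are assumptions, not measurements from this ticket.

```python
def relative_overhead_pct(num_files, per_file, fixed_overhead):
    """Percentage by which a fixed extra cost inflates the total create time."""
    baseline = num_files * per_file
    return fixed_overhead * 100 / baseline

# Hypothetical values: 1.6 ms per create, 2 s of one-off overhead.
for n in (9877, 20000, 30000):
    pct = relative_overhead_pct(n, per_file=0.0016, fixed_overhead=2.0)
    print(f"{n} files: {pct:.2f} % overhead")
```

Under this assumed model the percentage falls in inverse proportion to the file count, which is the effect a larger file count is meant to exploit.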

Comment by Hongchao Zhang [ 12/May/11 ]

The tests on Toro show the same result. I will create a patch to increase the number of
files created in test_18 to see whether that fixes the issue.

Comment by Build Master (Inactive) [ 17/May/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #121
LU-276 increase file count to mitigate affect of random condition

Oleg Drokin : 016c5a0f6e7307a2a3e05eafa8a36ac16b209643
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 17/May/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #121
LU-276 increase file count to mitigate affect of random condition

Oleg Drokin : 016c5a0f6e7307a2a3e05eafa8a36ac16b209643
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 17/May/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #121
LU-276 increase file count to mitigate affect of random condition

Oleg Drokin : 016c5a0f6e7307a2a3e05eafa8a36ac16b209643
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 17/May/11 ]

Integrated in lustre-master » i686,client,el5,ofa #121
LU-276 increase file count to mitigate affect of random condition

Oleg Drokin : 016c5a0f6e7307a2a3e05eafa8a36ac16b209643
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 17/May/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #121
LU-276 increase file count to mitigate affect of random condition

Oleg Drokin : 016c5a0f6e7307a2a3e05eafa8a36ac16b209643
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 17/May/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #121
LU-276 increase file count to mitigate affect of random condition

Oleg Drokin : 016c5a0f6e7307a2a3e05eafa8a36ac16b209643
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 17/May/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #121
LU-276 increase file count to mitigate affect of random condition

Oleg Drokin : 016c5a0f6e7307a2a3e05eafa8a36ac16b209643
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 17/May/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #121
LU-276 increase file count to mitigate affect of random condition

Oleg Drokin : 016c5a0f6e7307a2a3e05eafa8a36ac16b209643
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 17/May/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #121
LU-276 increase file count to mitigate affect of random condition

Oleg Drokin : 016c5a0f6e7307a2a3e05eafa8a36ac16b209643
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 17/May/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,ofa #121
LU-276 increase file count to mitigate affect of random condition

Oleg Drokin : 016c5a0f6e7307a2a3e05eafa8a36ac16b209643
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 17/May/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #121
LU-276 increase file count to mitigate affect of random condition

Oleg Drokin : 016c5a0f6e7307a2a3e05eafa8a36ac16b209643
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 17/May/11 ]

Integrated in lustre-master » i686,server,el5,ofa #121
LU-276 increase file count to mitigate affect of random condition

Oleg Drokin : 016c5a0f6e7307a2a3e05eafa8a36ac16b209643
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 17/May/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #121
LU-276 increase file count to mitigate affect of random condition

Oleg Drokin : 016c5a0f6e7307a2a3e05eafa8a36ac16b209643
Files :

  • lustre/tests/ost-pools.sh
Comment by Build Master (Inactive) [ 17/May/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #121
LU-276 increase file count to mitigate affect of random condition

Oleg Drokin : 016c5a0f6e7307a2a3e05eafa8a36ac16b209643
Files :

  • lustre/tests/ost-pools.sh
Comment by Peter Jones [ 17/May/11 ]

Fix landed for 2.1. Please reopen if the issue reoccurs or more work is still required.

Comment by Jian Yu [ 28/Jul/11 ]

Lustre Clients:
Tag: 1.8.6-wc1
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32_131.2.1.el6)
Build: http://newbuild.whamcloud.com/job/lustre-b1_8/100/arch=x86_64,build_type=client,distro=el6,ib_stack=inkernel/
Network: IB (inkernel OFED)
ENABLE_QUOTA=yes

Lustre Servers:
Tag: v2_0_66_0
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32-131.2.1.el6_lustre)
Build: http://newbuild.whamcloud.com/job/lustre-master/228/arch=x86_64,build_type=server,distro=el6,ib_stack=inkernel/
Network: IB (inkernel OFED)

ost-pools test 18 failed with a similar issue:

Avg time taken for 9877 creates without pool: 16.32
Avg time taken for 9877 creates with pool: 19.20
Avg time taken for 9877 creates with missing pool: 19.40
No pool to wide pool: 17.64 %.
 ost-pools test_18: @@@@@@ IGNORE (bz23408): Degradation with wide pool is 17.64 % (> 15 %)

Maloo report: https://maloo.whamcloud.com/test_sets/02ffc662-b91b-11e0-8bdf-52540025f9af

Comment by Hongchao Zhang [ 29/Jul/11 ]

The new occurrence still used 9877 files because the patch was only landed on master, but the test was run on b1_8.
Yujian will try to re-test it with 30000 files soon.

Comment by Jian Yu [ 29/Jul/11 ]

With "numfiles=30000", the test 18 still failed:

== test 18: File create in a directory which references a deleted pool == 00:50:15
Create performance, iteration 1, 30000 files x 3
total: 30000 creates in 50.11 seconds: 598.69 creates/second
iter 1: 30000 creates without pool: 50.11
fat-amd-1-ib: Pool lustre.pool1 created
fat-amd-1-ib: OST lustre-OST0000_UUID added to pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0001_UUID added to pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0002_UUID added to pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0003_UUID added to pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0004_UUID added to pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0005_UUID added to pool lustre.pool1
total: 30000 creates in 68.71 seconds: 436.59 creates/second
iter 1: 30000 creates with pool: 68.71
fat-amd-1-ib: OST lustre-OST0000_UUID removed from pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0001_UUID removed from pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0002_UUID removed from pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0003_UUID removed from pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0004_UUID removed from pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0005_UUID removed from pool lustre.pool1
fat-amd-1-ib: Pool lustre.pool1 destroyed
total: 30000 creates in 64.46 seconds: 465.39 creates/second
iter 1: 30000 creates with missing pool: 64.46

Create performance, iteration 2, 30000 files x 3
total: 30000 creates in 49.84 seconds: 601.93 creates/second
iter 2: 30000 creates without pool: 49.84
fat-amd-1-ib: Pool lustre.pool1 created
fat-amd-1-ib: OST lustre-OST0000_UUID added to pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0001_UUID added to pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0002_UUID added to pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0003_UUID added to pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0004_UUID added to pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0005_UUID added to pool lustre.pool1
total: 30000 creates in 61.67 seconds: 486.42 creates/second
iter 2: 30000 creates with pool: 61.67
fat-amd-1-ib: OST lustre-OST0000_UUID removed from pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0001_UUID removed from pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0002_UUID removed from pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0003_UUID removed from pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0004_UUID removed from pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0005_UUID removed from pool lustre.pool1
fat-amd-1-ib: Pool lustre.pool1 destroyed
total: 30000 creates in 69.00 seconds: 434.80 creates/second
iter 2: 30000 creates with missing pool: 69.00

Create performance, iteration 3, 30000 files x 3
total: 30000 creates in 50.25 seconds: 597.06 creates/second
iter 3: 30000 creates without pool: 50.25
fat-amd-1-ib: Pool lustre.pool1 created
fat-amd-1-ib: OST lustre-OST0000_UUID added to pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0001_UUID added to pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0002_UUID added to pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0003_UUID added to pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0004_UUID added to pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0005_UUID added to pool lustre.pool1
total: 30000 creates in 69.04 seconds: 434.50 creates/second
iter 3: 30000 creates with pool: 69.04
fat-amd-1-ib: OST lustre-OST0000_UUID removed from pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0001_UUID removed from pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0002_UUID removed from pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0003_UUID removed from pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0004_UUID removed from pool lustre.pool1
fat-amd-1-ib: OST lustre-OST0005_UUID removed from pool lustre.pool1
fat-amd-1-ib: Pool lustre.pool1 destroyed
total: 30000 creates in 63.49 seconds: 472.49 creates/second
iter 3: 30000 creates with missing pool: 63.49

Avg time taken for 30000 creates without pool: 50.06
Avg time taken for 30000 creates with pool: 66.47
Avg time taken for 30000 creates with missing pool: 65.65
No pool to wide pool: 32.78 %.
 ost-pools test_18: @@@@@@ IGNORE (bz23408): Degradation with wide pool is 32.78 % (> 15 %) 
Dumping lctl log to /home/yujian/test_logs/2011-07-29/004914/ost-pools.test_18.*.1311927141.log
No pool to missing pool: 31.14 %.
 ost-pools test_18: @@@@@@ IGNORE (bz23408): Degradation with missing pool is 31.14 % (> 30 %) 
Dumping lctl log to /home/yujian/test_logs/2011-07-29/004914/ost-pools.test_18.*.1311927144.log
Resetting fail_loc on all nodes...done.

Maloo report: https://maloo.whamcloud.com/test_sets/3f75d49c-b9bb-11e0-8bdf-52540025f9af
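
The averages and percentages in the log above can be reproduced from the per-iteration timings. The following is a minimal sketch in Python, not the actual logic of ost-pools.sh; the function names are hypothetical, and truncation to two decimal places is an assumption inferred from the logged values (for example, 150.20 / 3 is logged as 50.06).

```python
from decimal import Decimal, ROUND_DOWN

CENT = Decimal("0.01")

def truncated_avg(times):
    """Average the timings, truncated (not rounded) to two decimals."""
    total = sum(Decimal(t) for t in times)
    return (total / len(times)).quantize(CENT, rounding=ROUND_DOWN)

def degradation_pct(baseline, measured):
    """Percentage slowdown of `measured` relative to `baseline`, truncated."""
    pct = (measured - baseline) * 100 / baseline
    return pct.quantize(CENT, rounding=ROUND_DOWN)

# Per-iteration timings (seconds) copied from the log above.
no_pool = truncated_avg(["50.11", "49.84", "50.25"])
wide_pool = truncated_avg(["68.71", "61.67", "69.04"])
missing_pool = truncated_avg(["64.46", "69.00", "63.49"])

wide = degradation_pct(no_pool, wide_pool)
missing = degradation_pct(no_pool, missing_pool)

# Limits as reported in the log: 15 % for the wide pool, 30 % for the missing pool.
print(f"No pool to wide pool: {wide} % (limit 15 %, exceeded: {wide > 15})")
print(f"No pool to missing pool: {missing} % (limit 30 %, exceeded: {missing > 30})")
```

With the same truncation assumption, `degradation_pct(Decimal("16.32"), Decimal("19.20"))` gives the 17.64 % reported for the earlier 9877-file run.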

Comment by Hongchao Zhang [ 29/Jul/11 ]

It's related to 1.8<->2.1 interop: the creation performance is almost the same for 2.1<->2.1, but shows a large difference for 1.8<->2.1.

Comment by Peter Jones [ 01/Aug/11 ]

So, is the action here to land this fix on the 1.8.x branch?

Comment by Jian Yu [ 01/Aug/11 ]

So, is the action here to land this fix on the 1.8.x branch

Unfortunately, the fix does not resolve the issue under the 1.8<->2.1 interop configuration. I think Hongchao is still investigating.

Comment by Jian Yu [ 04/Aug/11 ]

Hi Hongchao,
Per your suggestion, I ran the test on the latest b1_8 clients with the latest master servers; the issue still exists:
https://maloo.whamcloud.com/test_sets/869c0eea-be6b-11e0-8bdf-52540025f9af

Here is the configuration:

Lustre Clients:
Branch: b1_8
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32_131.2.1.el6)
Build: http://newbuild.whamcloud.com/job/lustre-b1_8/119/arch=x86_64,build_type=client,distro=el6,ib_stack=inkernel/

Lustre Servers:
Branch: master
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32-131.6.1.el6_lustre)
Build: http://newbuild.whamcloud.com/job/lustre-master/240/arch=x86_64,build_type=server,distro=el6,ib_stack=inkernel/

Network: IB (inkernel OFED)
ENABLE_QUOTA=yes
Comment by Jian Yu [ 28/Aug/11 ]

Lustre Clients:
Tag: 1.8.6-wc1
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32_131.2.1.el6)
Build: http://newbuild.whamcloud.com/job/lustre-b1_8/100/arch=x86_64,build_type=client,distro=el6,ib_stack=inkernel/
Network: IB (inkernel OFED)
ENABLE_QUOTA=yes

Lustre Servers:
Tag: v2_1_0_0_RC1
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32-131.6.1.el6_lustre)
Build: http://newbuild.whamcloud.com/job/lustre-master/271/arch=x86_64,build_type=server,distro=el6,ib_stack=inkernel/
Network: IB (inkernel OFED)

The same issue occurred: https://maloo.whamcloud.com/test_sets/9207414a-cf7e-11e0-8d02-52540025f9af

Comment by Sarah Liu [ 30/Jan/12 ]

Hit the same issue when running an interop test between 1.8.7-wc1 and 2.1.55:
https://maloo.whamcloud.com/test_sets/fca92dd6-4b0e-11e1-915b-5254004bbbd3

Comment by Peter Jones [ 08/Feb/12 ]

I believe Andreas is fixing this under LU-1042

Comment by Hongchao Zhang [ 09/Feb/12 ]

This issue could be related to the config being reprocessed after the config change caused by the pool operations.
It would be better to skip the first creation operations in the "with-pool" and "with-missing-pool" cases. I will submit
a patch to verify whether that is the case.
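
The idea of excluding the first creates from the measurement can be illustrated as follows. This is only a sketch of the general technique (discard a warm-up phase, then time the rest), not the patch itself; `timed_creates`, `create_one`, and the `warmup` parameter are hypothetical names, and the files are created in a local temporary directory rather than on Lustre.

```python
import os
import tempfile
import time

def create_one(directory, index):
    """Create one empty file; stands in for a single create operation."""
    path = os.path.join(directory, f"f{index}")
    with open(path, "w"):
        pass
    return path

def timed_creates(directory, count, warmup):
    """Create `warmup` + `count` files, but time only the last `count`.

    The warm-up creates absorb any one-off cost (in this comment's hypothesis,
    config reprocessing after a pool change) so it does not skew the timing.
    """
    for i in range(warmup):
        create_one(directory, i)
    start = time.monotonic()
    for i in range(warmup, warmup + count):
        create_one(directory, i)
    return time.monotonic() - start

with tempfile.TemporaryDirectory() as tmp:
    elapsed = timed_creates(tmp, count=1000, warmup=100)
    created = len(os.listdir(tmp))
    print(f"{created} files created, {elapsed:.2f} s measured for the last 1000")
```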

Comment by Hongchao Zhang [ 12/Feb/12 ]

the patch is tracked at http://review.whamcloud.com/#change,2136

Comment by Jian Yu [ 16/Feb/12 ]

Lustre Clients:
Tag: 1.8.7-wc1
Distro/Arch: RHEL5/x86_64 (kernel version: 2.6.18-274.3.1.el5)
Build: http://build.whamcloud.com/job/lustre-b1_8/171/
Network: TCP (1GigE)
ENABLE_QUOTA=yes

Lustre Servers:
Tag: v2_1_1_0_RC2
Distro/Arch: RHEL5/x86_64 (kernel version: 2.6.18-274.12.1.el5_lustre.g4554b65)
Build: http://build.whamcloud.com/job/lustre-b2_1/41/
Network: TCP (1GigE)

The same issue occurred: https://maloo.whamcloud.com/test_sets/32ec951e-587e-11e1-a226-5254004bbbd3

Comment by John Fuchs-Chesney (Inactive) [ 07/Jan/16 ]

Marking this as resolved/incomplete, given the length of time since it was last updated.

If anyone disagrees, or would like this ticket re-opened, please let us know.
Thanks,
~ jfc.
