[LU-1668] Test failure on test suite conf-sanity 53a Created: 24/Jul/12  Updated: 16/Feb/14  Resolved: 16/Feb/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: Liang Zhen (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Environment:

lustre-master-tag-2.2.91 OFED build


Severity: 3
Rank (Obsolete): 4486

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/5b019fd0-d5c0-11e1-b078-52540035b04c.

== conf-sanity test 53a: check OSS thread count params == 13:54:59 (1343076899)
Loading modules from /usr/lib64/lustre
detected 4 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
../libcfs/libcfs/libcfs options: 'cpu_npartitions=2'
WARNING: Deprecated config file /etc/modprobe.conf, all config files belong into /etc/modprobe.d/.
debug=-1
subsystem_debug=0xffb7efff
../lnet/lnet/lnet options: 'accept=all networks="o2ib0(ib0),tcp0(eth0)" accept_port=7988'
gss/krb5 is not supported
start mds service on client-3-ib
Starting mds1: -o user_xattr,acl  /dev/sda3 /mnt/mds1
Started lustre-MDT0000
start ost1 service on fat-amd-4-ib
Starting ost1:   /dev/sdb1 /mnt/ost1
Started lustre-OST0000
mount lustre on /mnt/lustre.....
Starting client: client-4.lab.whamcloud.com: -o flock,user_xattr,acl client-3-ib@o2ib:/lustre /mnt/lustre
setup single mount lustre success
checking (($tstarted && $tmin && $tmax)) (((14 && 12 && 384)))...
checking (($tstarted >= $tmin && $tstarted <= $tmax )) (((14 >= 12 && 14 <= 384 )))...
ost.OSS.ost.threads_min=14
ost.OSS.ost.threads_max=382
checking (($tmin2 == ($tmin + $ncpts) && $tmax2 == ($tmax - $ncpts))) (((12 == (12 + 2) && 380 == (384 - 2))))...
 conf-sanity test_53a: @@@@@@ FAIL: Assertion 25 failed: (($tmin2 == ($tmin + $ncpts) && $tmax2 == ($tmax - $ncpts))) (expanded: ((12 == (12 + 2) && 380 == (384 - 2))))
Insane OST thread counts 


 Comments   
Comment by Liang Zhen (Inactive) [ 10/Aug/12 ]
thread_sanity() {
        local modname=$1
        local facet=$2
        local parampat=$3
        local opts=$4
        local tmin
        local tmin2
        local tmax
        local tmax2
        local tstarted
        local paramp
        local msg="Insane $modname thread counts"
        local ncpts=$(check_cpt_number)

check_cpt_number checks ncpts on localhost, but it really should be "do_facet $facet check_cpt_number". I will post a patch soon.
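
The values in the failing run are consistent with the server having more CPU partitions than the node running the test. A minimal sketch of that arithmetic follows. It assumes (this is not confirmed anywhere in this ticket) that the service rounds a requested thread count down to a multiple of its own partition count; round_to_cpts is a hypothetical helper, not part of test-framework.sh, and the server partition counts tried are guesses.

```shell
#!/bin/bash
# Hypothetical helper: round a requested thread count down to a
# multiple of the CPT count (assumed server behaviour, not confirmed).
round_to_cpts() {
    local requested=$1
    local ncpts=$2
    echo $(( requested / ncpts * ncpts ))
}

# Values taken from the failing log in the description.
tmin=12
tmax=384
local_ncpts=2   # what check_cpt_number reported on the test node

# The test asks for tmin + ncpts and tmax - ncpts using the local count.
want_min=$(( tmin + local_ncpts ))
want_max=$(( tmax - local_ncpts ))

# Compare what a server with 2 or 4 partitions would end up with.
for server_ncpts in 2 4; do
    echo "server ncpts=$server_ncpts:" \
         "threads_min=$(round_to_cpts "$want_min" "$server_ncpts")" \
         "threads_max=$(round_to_cpts "$want_max" "$server_ncpts")"
done
```

Under that assumption, a server with 4 partitions yields 12 and 380, the values the assertion read back, while 2 partitions yields the 14 and 382 the test expected.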

Comment by Liang Zhen (Inactive) [ 10/Aug/12 ]

Patch is here: http://review.whamcloud.com/3595

Comment by James A Simmons [ 10/Aug/12 ]

My testing also has run into this bug. I'm still failing with the patch.

53b (Side note - I'm running with 1024 mds threads by default)

setup single mount lustre success
checking (($tstarted && $tmin && $tmax)) (((1024 && 1024 && 1024)))...
checking (($tstarted >= $tmin && $tstarted <= $tmax )) (((1024 >= 1024 && 1024 <= 1024 )))...
barry-mds1: error: set_param: writing to file /proc/fs/lustre/mdt/lustre-MDT0000/mdt//threads_min: Numerical result out of range
pdsh@spoon01: barry-mds1: ssh exited with exit code 1
mdt.lustre-MDT0000.mdt..threads_min=1026
barry-mds1: error: set_param: writing to file /proc/fs/lustre/mdt/lustre-MDT0000/mdt//threads_max: Numerical result out of range
pdsh@spoon01: barry-mds1: ssh exited with exit code 1
mdt.lustre-MDT0000.mdt..threads_max=1022
checking (($tmin2 == ($tmin + $ncpts) && $tmax2 == ($tmax - $ncpts))) (((1024 == (1024 + 2) && 1024 == (1024 - 2))))...
conf-sanity test_53b: @@@@@@ FAIL: Assertion 25 failed: (($tmin2 == ($tmin + $ncpts) && $tmax2 == ($tmax - $ncpts))) (expanded: ((1024 == (1024 + 2) && 1024 == (1024 - 2))))
Insane MDT thread counts

For 53a I get

mount lustre on /lustre/barry.....
Starting client: spoon01: -o user_xattr,flock 10.37.248.67@o2ib1:/lustre /lustre/barry
setup single mount lustre success
checking (($tstarted == $tmin && $tstarted == $tmax )) (((9 == 6 && 9 == 192 )))...
conf-sanity test_53a: @@@@@@ FAIL: Assertion 28 failed: (($tstarted == $tmin && $tstarted == $tmax )) (expanded: ((9 == 6 && 9 == 192 )))
Insane OST thread counts
Trace dump:
= ./../tests/test-framework.sh:3683:error_noexit()
= ./../tests/test-framework.sh:3705:error()
= ./../tests/functions.sh:120:lassert()
= ./conf-sanity.sh:2620:thread_sanity()
= ./conf-sanity.sh:2632:test_53a()

Comment by Sarah Liu [ 14/Aug/12 ]

Another failure; the client is running kernel 2.6.38-fc15.

https://maloo.whamcloud.com/test_sets/b0206b50-e5c3-11e1-ae4e-52540035b04c

Comment by James A Simmons [ 16/Aug/12 ]

Looking at the data for 53a I'm getting

tmin = 6
tmax = 192
tstarted = 8

Which causes the ASSERT to fail
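
As a small illustration, the two forms of the check can be evaluated with the values above. This is only a sketch that mirrors the arithmetic expressions printed in the logs; it does not call lassert or any other test-framework function, and the variable names range_check and equal_check are made up for the example.

```shell
#!/bin/bash
# Values reported in this comment.
tmin=6
tmax=192
tstarted=8

# Range form of the check (as printed in the description's log).
if (( tstarted >= tmin && tstarted <= tmax )); then
    range_check=pass
else
    range_check=fail
fi

# Equality form of the check (the one failing as Assertion 28).
if (( tstarted == tmin && tstarted == tmax )); then
    equal_check=pass
else
    equal_check=fail
fi

echo "range check: $range_check, equality check: $equal_check"
```

The started count lies inside the [tmin, tmax] range, but the equality form can only hold when all three values are identical.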

Comment by James A Simmons [ 17/Aug/12 ]

I found the reason for the failure in my testing. In /etc/modprobe.d/lustre.conf I have

options mdt mds_num_threads=1024

When I comment it out with a pound sign (#), both tests pass.
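
A quick way to spot this kind of conflict before running the tests might look like the sketch below. has_mds_num_threads is a hypothetical helper, not something from test-framework.sh, and the default path is simply the file mentioned above.

```shell
#!/bin/bash
# Hypothetical helper: succeed if an uncommented "options mdt" line in
# the given modprobe config file sets mds_num_threads.
has_mds_num_threads() {
    local conf=$1
    grep -Eq '^[[:space:]]*options[[:space:]]+mdt[[:space:]].*mds_num_threads=' "$conf"
}

# Default to the file from this comment; pass another path as $1.
conf=${1:-/etc/modprobe.d/lustre.conf}
if [ -r "$conf" ] && has_mds_num_threads "$conf"; then
    echo "$conf sets mds_num_threads; conf-sanity 53b may conflict with it"
fi
```

Lines commented out with # are ignored, so the check stays quiet once the option has been disabled.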

Comment by Liang Zhen (Inactive) [ 17/Aug/12 ]

Hi James, could you then change the review flag? Or do you want me to make more changes to the patch so that it passes even with this parameter set? I think this parameter conflicts with conf-sanity, because conf-sanity needs to set it as well.

Comment by James A Simmons [ 17/Aug/12 ]

Yes, could you arrange for the test to pass even when this parameter is set?

Comment by James A Simmons [ 17/Aug/12 ]

Okay, I can change my review, since your current patch covers most cases whereas my problem is a specific corner case. If I change my review, can you keep this ticket open until my problem is resolved?

Comment by Peter Jones [ 20/Aug/12 ]

Dropping priority, as a patch covering most cases has landed on master for 2.3.

Comment by Sarah Liu [ 26/Dec/12 ]

Another failure, seen in interop testing between a 2.1.3 server and a 2.4 client: https://maloo.whamcloud.com/test_sets/351a6028-4a86-11e2-8a7b-52540035b04c

Comment by Jian Yu [ 19/Dec/13 ]

Lustre client: http://build.whamcloud.com/job/lustre-b2_3/41/ (2.3.0)
Lustre server: http://build.whamcloud.com/job/lustre-b2_4/69/ (2.4.2 RC1)

conf-sanity test 53b failed:
https://maloo.whamcloud.com/test_sets/dbed7af0-685f-11e3-a16f-52540035b04c

Comment by Liang Zhen (Inactive) [ 16/Feb/14 ]

I think this has already been fixed
