[LU-7442] conf-sanity test_41c: @@@@@@ FAIL: unexpected concurent MDT mounts rc=17 rc2=0 Created: 17/Nov/15  Updated: 28/Nov/16  Resolved: 06/Aug/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Major
Reporter: hemaharish Assignee: Bruno Faccini (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

single node setup


Issue Links:
Related
is related to LU-5921 conf-sanity test_41c: unexpected conc... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

modules unloaded.
error: set_param: /proc/{fs,sys}/{lnet,lustre}/fail_loc: Found no match
Starting mds1: -o rw,user_xattr /dev/vdb /mnt/mds1
mount.lustre: set /sys/block/vdb/queue/max_sectors_kb to 2147483647
error: set_param: /proc/{fs,sys}/{lnet,lustre}/fail_loc: Found no match
Starting mds1: -o rw,user_xattr /dev/vdb /mnt/mds1
mount.lustre: set /sys/block/vdb/queue/max_sectors_kb to 2147483647

mount.lustre: mount /dev/vdb at /mnt/mds1 failed: File exists
Start of /dev/vdb on mds1 failed 17
Started lustre-MDT0000
Stopping /mnt/mds1 (opts:-f) on fre819
conf-sanity test_41c: @@@@@@ FAIL: unexpected concurent MDT mounts result, rc=17 rc2=0
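
For context, the expected behaviour of this subtest is visible in the passing log further down in the comments: one of the two racing mounts must succeed and the other must fail with EALREADY (114). Here fail_loc could not be set at all ("Found no match", because the Lustre modules had just been unloaded), so the second mount instead failed with EEXIST (17), which the test reports as rc=17 rc2=0. A rough, paraphrased sketch of the check done inside test_41c() is shown below; the helper names (do_facet, start, error, mdsdevname) are the standard Lustre test-framework ones, and this is not the verbatim script:

    # paraphrased fragment of the test_41c() body, not the actual script
    do_facet mds1 "$LCTL set_param fail_loc=0x703"   # injection point used by the test (per the log above)
    start mds1 $(mdsdevname 1) $MDS_MOUNT_OPTS &     # first mount, raced in the background
    pid=$!
    do_facet mds1 "$LCTL set_param fail_loc=0x0"
    start mds1 $(mdsdevname 1) $MDS_MOUNT_OPTS       # second, concurrent mount
    rc2=$?
    wait $pid
    rc=$?
    # exactly one mount may succeed; the loser must get EALREADY (114)
    if [ $rc -eq 0 ] && [ $rc2 -eq 114 ]; then
        echo "1st MDT start succeed"
        echo "2nd MDT start failed with EALREADY"
    elif [ $rc2 -eq 0 ] && [ $rc -eq 114 ]; then
        echo "2nd MDT start succeed"
        echo "1st MDT start failed with EALREADY"
    else
        error "unexpected concurent MDT mounts result, rc=$rc rc2=$rc2"
    fi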



 Comments   
Comment by Bruno Faccini (Inactive) [ 17/Nov/15 ]

Looks like conf-sanity/test_41c needs some fixes/cleanup, as does LU-5921, which is already assigned to me.
In this particular ticket's case, it seems that the module unload prevented the fail_loc setting for the test ...
I will cook a patch soon.
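
A minimal sketch of the kind of fix this suggests, assuming the standard load_modules test-framework helper (hypothetical placement, not an actual patch hunk):

    # hypothetical placement inside test_41c(); load_modules is the
    # test-framework helper that re-inserts the Lustre kernel modules
    test_41c() {
        cleanup
        load_modules    # without this, "lctl set_param fail_loc=..." fails
                        # with "Found no match" after the modules were unloaded
        # ... concurrent MDT/OST mount checks follow as before
    }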

Comment by hemaharish [ 20/Nov/15 ]

Hi,
We worked on this patch. A call to "load_modules" fixed the issue and the test case now passes; we will land the patch.

== conf-sanity test 41c: concurrent mounts of MDT/OST should all fail but one == 10:42:24 (1447996344)
umount lustre on /mnt/lustre.....
stop ost1 service on centos6.6-Upstream-landing
stop mds service on centos6.6-Upstream-landing
modules unloaded.
Loading modules from /home/hema/xyratex/code/lustre-wc-rel/lustre/tests/..
detected 1 online CPUs by sysfs
libcfs will create CPU partition based on online CPUs
debug=-1
subsystem_debug=all -lnet -lnd -pinger
gss/krb5 is not supported
quota/lquota options: 'hash_lqs_cur_bits=3'
fail_loc=0x703
Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/mds1
fail_loc=0x0
Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/mds1
mount.lustre: mount /dev/loop1 at /mnt/mds1 failed: Operation already in progress
The target service is already running. (/dev/loop1)
Start of /tmp/lustre-mdt1 on mds1 failed 114
Started lustre-MDT0000
1st MDT start succeed
2nd MDT start failed with EALREADY
fail_loc=0x703
Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/ost1
fail_loc=0x0
Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/ost1
mount.lustre: mount /dev/loop2 at /mnt/ost1 failed: Operation already in progress
The target service is already running. (/dev/loop2)
Start of /tmp/lustre-ost1 on ost1 failed 114
Started lustre-OST0000
1st OST start succeed
2nd OST start failed with EALREADY
stop mds service on centos6.6-Upstream-landing
Stopping /mnt/mds1 (opts:-f) on centos6.6-Upstream-landing
Stopping /mnt/ost1 (opts:-f) on centos6.6-Upstream-landing
start mds service on centos6.6-Upstream-landing
Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/mds1
Started lustre-MDT0000
start ost1 service on centos6.6-Upstream-landing
Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/ost1
Started lustre-OST0000
mount lustre on /mnt/lustre.....
Starting client: centos6.6-Upstream-landing:  -o user_xattr,flock centos6.6-Upstream-landing@tcp:/lustre /mnt/lustre
setup single mount lustre success
umount lustre on /mnt/lustre.....
Stopping client centos6.6-Upstream-landing /mnt/lustre (opts:)
stop ost1 service on centos6.6-Upstream-landing
Stopping /mnt/ost1 (opts:-f) on centos6.6-Upstream-landing
stop mds service on centos6.6-Upstream-landing
Stopping /mnt/mds1 (opts:-f) on centos6.6-Upstream-landing
modules unloaded.
Resetting fail_loc on all nodes...done.
PASS 41c (78s)

Comment by Gerrit Updater [ 20/Nov/15 ]

HemaHarish (hema.yarramilli@seagate.com) uploaded a new patch: http://review.whamcloud.com/17301
Subject: LU-7442 test: Unexpected concurent MDT mounts in conf-sanity 41c
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3ac88c6a184ca0db1fda8368ec1e4590cd446ffe

Comment by Bruno Faccini (Inactive) [ 20/Nov/15 ]

The reason for the failure (in fact, for the non-permanent failure!) is still a bit mysterious to me, but the patch's re-loading of modules after cleanup is harmless and will clear any special cases...
Just for my information, was this failure permanent during your testing?

Comment by hemaharish [ 23/Nov/15 ]

Yes, the failure was permanent on a single-node setup without the patch.

Comment by Bruno Faccini (Inactive) [ 14/Jul/16 ]

hemaharish,
Sorry to be late on this, but am I right in thinking that you hit this problem reliably (I mean the missing load_modules) when you run conf-sanity/test_41c as a single test run, and not as part of the full conf-sanity test suite?
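
For reference, a standalone run of a single subtest is typically done with the test framework's ONLY variable, along these lines (exact invocation depends on the local test setup):

    # run only subtest 41c of conf-sanity instead of the whole suite
    cd lustre/tests
    ONLY=41c sh conf-sanity.sh

Run this way, 41c is not preceded by the earlier conf-sanity subtests, so the module/cleanup state it starts from differs from a full-suite run.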

Comment by nasf (Inactive) [ 15/Jul/16 ]

Thanks Bruno. I have rebased my patch against http://review.whamcloud.com/#/c/17427 to resolve the conf-sanity test_41c failure.

Comment by Gerrit Updater [ 06/Aug/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17301/
Subject: LU-7442 tests: Load modules on MDS/OSS in conf-sanity test_41c
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3973c51b0ba246fb9904235206e6b9269d670a51

Comment by Peter Jones [ 06/Aug/16 ]

Landed for 2.9
