[LU-7082] conf-sanity test_90b: MDT start failed Created: 01/Sep/15  Updated: 30/Oct/15  Resolved: 07/Oct/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: James Nunez (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-5319 Support multiple slots per client in ... Resolved
is related to LU-7359 Interop 2.5.5<->master: conf-sanity t... Closed
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/ed8e54ea-50c6-11e5-9da9-5254006e85c2.

The sub-test test_90b failed with the following error:

MDT start failed

This looks like LU-5319, but that issue is marked Fixed.

Info required for matching: conf-sanity 90b



 Comments   
Comment by Andreas Dilger [ 02/Sep/15 ]

This test was just added via http://review.whamcloud.com/14861.

Comment by Gregoire Pichon [ 03/Sep/15 ]

Looking at the MDS1 debug_log, it appears that in part 2 of test_90a the fail_loc was reset to zero before the chmod commands that should consume the mod RPC slots were launched.

$ grep -E "fail_loc=|cfs_fail_loc" conf_sanity_test90a_mds_debug_log.txt
00000001:02000400:0.0:1441121327.961865:0:19916:0:(debug.c:335:libcfs_debug_mark_buffer()) DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x159
00000020:02000000:0.0:1441121328.125150:0:19480:0:(libcfs_fail.h:96:cfs_fail_check_set()) *** cfs_fail_loc=159, val=0***
00000020:02000000:1.0:1441121328.126229:0:19887:0:(libcfs_fail.h:96:cfs_fail_check_set()) *** cfs_fail_loc=159, val=0***
00000020:02000000:0.0:1441121328.130553:0:19480:0:(libcfs_fail.h:96:cfs_fail_check_set()) *** cfs_fail_loc=159, val=0***
00000020:02000000:0.0:1441121328.132688:0:19480:0:(libcfs_fail.h:96:cfs_fail_check_set()) *** cfs_fail_loc=159, val=0***
00000020:02000000:1.0:1441121328.135003:0:19887:0:(libcfs_fail.h:96:cfs_fail_check_set()) *** cfs_fail_loc=159, val=0***
00000020:02000000:0.0:1441121328.138574:0:19480:0:(libcfs_fail.h:96:cfs_fail_check_set()) *** cfs_fail_loc=159, val=0***
00000001:02000400:0.0:1441121328.236111:0:19971:0:(debug.c:335:libcfs_debug_mark_buffer()) DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0
00000001:02000400:1.0:1441121335.298931:0:20027:0:(debug.c:335:libcfs_debug_mark_buffer()) DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x159
00000020:02000000:0.0:1441121335.438032:0:19478:0:(libcfs_fail.h:96:cfs_fail_check_set()) *** cfs_fail_loc=159, val=0***
00000001:02000400:0.0:1441121335.564351:0:20082:0:(debug.c:335:libcfs_debug_mark_buffer()) DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0

I am going to add a small delay to ensure serialization occurs as expected.

The MDT start failure in test_90b and other test cases is probably due to poor handling of the error path. I will fix this as well.
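
As a rough aid for reading the grep output above, the sketch below summarizes each fail_loc window in a debug log: it counts the cfs_fail_check_set() hits logged between a "set_param fail_loc=<nonzero>" marker and the next "set_param fail_loc=0" marker, and reports how long the window stayed open. The function name summarize_fail_windows is hypothetical and not part of the Lustre test framework, and the sketch assumes the colon-separated debug log layout shown above, with the timestamp in the fourth field.

```shell
# Hypothetical helper (not part of the Lustre test framework).
# Reads a Lustre debug log on stdin and prints one line per fail_loc window.
# Assumes the colon-separated layout shown above: field 4 is the timestamp.
summarize_fail_windows() {
    awk -F: '
        # Window closes when fail_loc is reset to zero.
        /set_param fail_loc=0$/ {
            if (active)
                printf "window %d: hits=%d duration=%.3fs\n", n, hits, $4 - start
            active = 0
            next
        }
        # Window opens when fail_loc is set to any other value.
        /set_param fail_loc=/ {
            n++; active = 1; hits = 0; start = $4
            next
        }
        # Count fail-check hits only while a window is open.
        /cfs_fail_check_set/ && active { hits++ }
    '
}
```

Running it as `summarize_fail_windows < conf_sanity_test90a_mds_debug_log.txt` on a log containing only the lines quoted above would report 6 hits for the first window and 1 hit for the second, each window lasting roughly 0.27 seconds.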

Comment by Gerrit Updater [ 03/Sep/15 ]

Grégoire Pichon (gregoire.pichon@bull.net) uploaded a new patch: http://review.whamcloud.com/16215
Subject: LU-7082 test: fix synchronization of conf_sanity test_90
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 21c4377cdb7162b58aa4df767c5535728a56e248

Comment by Gerrit Updater [ 07/Oct/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16215/
Subject: LU-7082 test: fix synchronization of conf_sanity test_90
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 9937eb5fa44c66af36fae39767468a76714a2207

Comment by Joseph Gmitter (Inactive) [ 07/Oct/15 ]

Landed for 2.8

Generated at Sat Feb 10 02:05:50 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.