[LU-5857] catastrophe cannot be found: “error: get_param: /proc/{fs,sys}/{lnet,lustre}/catastrophe: Found no match” Created: 04/Nov/14  Updated: 05/Dec/14  Resolved: 05/Dec/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: Lustre 2.7.0

Type: Bug Priority: Major
Reporter: James Nunez (Inactive) Assignee: Jian Yu
Resolution: Fixed Votes: 0
Labels: None
Environment: Autotest

Severity: 3
Rank (Obsolete): 16396

 Description   

In conf-sanity, each test is called from a wrapper function, run_test(). When the test completes, run_test() eventually calls check_catastrophe(). For some tests, check_catastrophe() does not complete correctly:

02:31:40:CMD: onyx-38vm3,onyx-38vm4,onyx-38vm5 rc=\$(lctl get_param -n catastrophe);
02:31:40:		if [ \$rc -ne 0 ]; then echo \$(hostname): \$rc; fi
02:31:40:		exit \$rc
02:31:40:onyx-38vm5: error: get_param: /proc/{fs,sys}/{lnet,lustre}/catastrophe: Found no match
02:31:40:onyx-38vm5: sh: line 1: [: -ne: unary operator expected
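The "unary operator expected" message follows from the first error: when get_param prints nothing on stdout, rc is empty and the unquoted test collapses to `[ -ne 0 ]`. A minimal sketch of that behaviour, assuming a hypothetical fake_get_param function that stands in for `lctl get_param -n catastrophe` on a node where the parameter does not exist (it is not part of Lustre):

```shell
# Hypothetical stand-in for `lctl get_param -n catastrophe` when the
# parameter is missing: message on stderr, nothing on stdout, non-zero exit.
fake_get_param() {
    echo "error: get_param: catastrophe: Found no match" >&2
    return 2
}

lctl_status=0
rc=$(fake_get_param 2>/dev/null) || lctl_status=$?   # rc is now empty

# Unquoted, as in the log above: expands to `[ -ne 0 ]`, which is a syntax
# error for the test builtin (exit status greater than 1).
[ $rc -ne 0 ] 2>/dev/null && unquoted_status=0 || unquoted_status=$?

# Quoted with a default value: a well-formed comparison of 0 against 0.
[ "${rc:-0}" -ne 0 ] && guarded_status=0 || guarded_status=$?

echo "get_param exit status:     $lctl_status"
echo "unquoted test exit status: $unquoted_status"
echo "guarded test exit status:  $guarded_status"
```

The guarded form only silences the shell error; it does not by itself decide whether a missing parameter should count as a failure.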

For example, conf-sanity tests 53a, 53b, 56, 57a, 58, 61, 62, 66, 67 and others at https://testing.hpdd.intel.com/test_sets/5634cf00-6375-11e4-b5da-5254006e85c2 show these errors during test cleanup, in the call to check_catastrophe().

More of these error messages in conf-sanity can be found at:
https://testing.hpdd.intel.com/test_sets/20be82f6-637c-11e4-80e1-5254006e85c2

The test is correctly marked as pass or fail regardless of whether check_catastrophe() succeeds.



 Comments   
Comment by Jodi Levi (Inactive) [ 04/Nov/14 ]

Yu Jian,
Could you please look into this one?
Thank you!

Comment by Andreas Dilger [ 04/Nov/14 ]

Is this failing because the filesystem is unmounted and the modules removed when get_param is called, or is there some problem with this file in /proc?

Comment by Jian Yu [ 04/Nov/14 ]

unload_modules() was called at the end of those sub-tests before running check_catastrophe().

Comment by Andreas Dilger [ 07/Nov/14 ]

In that case, this check shouldn't even be run if the libcfs module isn't loaded, since it isn't possible to hit an LBUG and then unload the modules.

Comment by Jian Yu [ 07/Nov/14 ]

I'll upload a patch to improve check_catastrophe() accordingly.
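One possible shape for such a check, as a rough sketch only and not the change in the actual patch: treat a failed or empty read of the parameter as "nothing to check", and compare the value only when one was returned. The function check_catastrophe_sketch and the three reader_* helpers are hypothetical names introduced for illustration; in a real test framework the reader would be the `lctl get_param -n catastrophe` call.

```shell
# Hypothetical readers that mimic the three situations of interest.
reader_unloaded() { echo "error: get_param: catastrophe: Found no match" >&2; return 2; }
reader_healthy()  { echo 0; }
reader_lbug()     { echo 1; }

# $1: command that prints the catastrophe value (hypothetical parameter).
# Returns 0 if the value is 0 or cannot be read, 1 if it is non-zero.
check_catastrophe_sketch() {
    # A failed read means the parameter is not there (e.g. modules unloaded),
    # so there is nothing to report.
    value=$("$1" 2>/dev/null) || return 0
    [ -n "$value" ] || return 0

    if [ "$value" -ne 0 ]; then
        echo "$(hostname): $value"
        return 1
    fi
    return 0
}

for reader in reader_unloaded reader_healthy reader_lbug; do
    if check_catastrophe_sketch "$reader"; then
        echo "$reader: ok"
    else
        echo "$reader: catastrophe flagged"
    fi
done
```

Skipping the check on a failed read relies on the reasoning above that modules cannot be unloaded after an LBUG; if that assumption did not hold, a missing parameter would need to be handled differently.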

Comment by Jian Yu [ 10/Nov/14 ]

Patch for master branch: http://review.whamcloud.com/12640

Comment by Gerrit Updater [ 04/Dec/14 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12640/
Subject: LU-5857 tests: check lctl return value in check_catastrophe()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 43d19e90c719402a8d73e559ce4368aa55a4f16b

Comment by Jodi Levi (Inactive) [ 05/Dec/14 ]

Patch landed to Master.
