[LU-4441] sanity test_60a: Module llog_test is in use Created: 06/Jan/14  Updated: 16/May/17  Resolved: 28/Nov/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0, Lustre 2.5.3
Fix Version/s: Lustre 2.7.0, Lustre 2.5.4

Type: Bug Priority: Major
Reporter: Maloo Assignee: Emoly Liu
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 12185

Description

This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/b44af13a-761f-11e3-a7a8-52540035b04c.

Even though Maloo reports a 0% failure rate, a search on subtest 60a of sanity shows a number of FAILs.

The sub-test test_60a failed with the following error:

wtm-20vm3: ERROR: Module llog_test is in use
test_60a failed with 3

Info required for matching: sanity 60a



Comments
Comment by Andreas Dilger [ 09/Jan/14 ]

Bob, can this be correlated to some particular patch landing? For example, are the failures starting on a particular patch (maybe intermittently), and then moving to master after that patch landed? Or do the failures start with a common parent patch?

Comment by Andreas Dilger [ 10/Feb/14 ]

This still seems to be failing on a regular basis - more than once a day.

Comment by Bob Glossman (Inactive) [ 20/Feb/14 ]

another instance:
https://maloo.whamcloud.com/test_sessions/92085f0c-9a4d-11e3-baa9-52540035b04c

Comment by James Nunez (Inactive) [ 02/Mar/14 ]

Another failure at https://maloo.whamcloud.com/test_sessions/0936d8b4-a17a-11e3-a08f-52540035b04c

Comment by Bob Glossman (Inactive) [ 04/Mar/14 ]

another
https://maloo.whamcloud.com/test_sessions/a2060ea0-a3b1-11e3-b259-52540035b04c

Comment by John Hammond [ 13/Mar/14 ]

https://maloo.whamcloud.com/test_sets/c58bc76e-aa61-11e3-a4db-52540035b04c

Comment by Andreas Dilger [ 20/Mar/14 ]

This is still being hit about twice a day on average over the past four weeks.

Comment by Bob Glossman (Inactive) [ 03/Apr/14 ]

another
https://maloo.whamcloud.com/test_sets/596ef978-bb1b-11e3-b43b-52540035b04c

Comment by James Nunez (Inactive) [ 03/Apr/14 ]

Hit this one again: https://maloo.whamcloud.com/test_sessions/f1d4efe4-bafb-11e3-8ec1-52540035b04c

Comment by Emoly Liu [ 24/Apr/14 ]

This error has happened only occasionally recently, and it is hard to reproduce locally.

I created a patch, http://review.whamcloud.com/9966/, to improve run-llog.sh so that it prints more information when this error happens again.
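rmmod reports "Module llog_test is in use" when the module's reference count is still nonzero at unload time. The sketch below shows the kind of information that helps diagnose this. It is a minimal illustration, not the contents of patch 9966. The helper names `parse_proc_modules` and `explain_in_use` are hypothetical.

The sketch parses the standard Linux /proc/modules line format (name, size, refcount, comma-separated users, state, address) and reports who holds a given module. It assumes the usual "-" placeholders for an empty user list or an unavailable refcount.

```python
def parse_proc_modules(text):
    """Map module name -> (refcount or None, list of modules using it).

    Assumes the /proc/modules layout: name size refcount users state addr,
    where users is a comma-separated list with a trailing comma, or "-".
    """
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, _size, refcnt, users = fields[:4]
        # A "-" refcount means the kernel was built without module unloading.
        count = None if refcnt == "-" else int(refcnt)
        used_by = [] if users == "-" else [u for u in users.split(",") if u]
        modules[name] = (count, used_by)
    return modules


def explain_in_use(name, text):
    """Return a one-line explanation of why `name` may refuse to unload."""
    modules = parse_proc_modules(text)
    if name not in modules:
        return f"{name}: not loaded"
    count, used_by = modules[name]
    if count is None:
        return f"{name}: refcount unavailable"
    if count == 0:
        return f"{name}: not in use"
    if not used_by:
        # Nonzero refcount but no dependent module: something else holds it.
        return f"{name}: refcount {count}, no module users (reference held elsewhere)"
    return f"{name}: refcount {count}, used by {', '.join(used_by)}"
```

On a node where the unload fails, this could be called as `explain_in_use("llog_test", open("/proc/modules").read())`. A nonzero refcount with no module users suggests that something other than a dependent module is holding the reference. One example would be a device that was not cleaned up. That points the investigation away from module dependencies.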

Comment by nasf (Inactive) [ 25/Apr/14 ]

Another failure instance:

https://maloo.whamcloud.com/test_sets/307cd854-cc56-11e3-8180-52540035b04c

Comment by Andreas Dilger [ 05/May/14 ]

This was hit several times in the past week: 13 failures out of 377 test runs, or about 3.4%. Not fatal, but enough to keep an eye on.

Comment by Andreas Dilger [ 13/May/14 ]

Adding this to the 2.6.0 tracking list, since this test is causing quite a lot of failures.

Emoly, are you actually working on this, or should it be assigned back to HPDD Triage?

Comment by Emoly Liu [ 14/May/14 ]

Andreas, the patch http://review.whamcloud.com/9966/ is waiting for a second review. However, it is not a fix, only a debugging patch that prints more information.

I don't mind if this ticket is reassigned to someone else who can fix the issue sooner.

Comment by Peter Jones [ 05/Jun/14 ]

Landed for 2.6

Comment by Peter Jones [ 05/Jun/14 ]

Bob correctly pointed out that only the debug patch landed. I have reopened the ticket and moved it out of scope for 2.6, as we still do not understand the scope of this issue.

Comment by Jian Yu [ 24/Aug/14 ]

Lustre Build: https://build.hpdd.intel.com/job/lustre-b2_5/84/
Distro/Arch: RHEL6.5/x86_64
MDSCOUNT=2

The same failure occurred:
https://testing.hpdd.intel.com/test_sets/b4faa3be-2a8d-11e4-b21b-5254006e85c2

Comment by Emoly Liu [ 25/Aug/14 ]

Hi Yujian, the patch http://review.whamcloud.com/9966/ has not been backported to b2_5. Although it is only a debug patch, the problem has not been seen on master since it landed.

So, do you mind if I backport it to b2_5?

Comment by Jian Yu [ 25/Aug/14 ]

No, of course not. Thank you, Emoly.

Comment by Emoly Liu [ 25/Aug/14 ]

Backport to b2_5: http://review.whamcloud.com/11573

Comment by Peter Jones [ 28/Nov/14 ]

Per Andreas, there is a change that may have fixed this issue, so I am marking it as fixed in 2.7.

Comment by Gerrit Updater [ 04/Dec/14 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/11573/
Subject: LU-4441 test: improve run-llog.sh to print more information
Project: fs/lustre-release
Branch: b2_5
Current Patch Set:
Commit: 623efd2fb272269d3b82b73bbdf759c8d3b20ae0

Generated at Sat Feb 10 01:42:46 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.