[LU-2776] Test failure: sanityn, subtest test_51a "multiop is still there" Created: 07/Feb/13  Updated: 22/Dec/17  Resolved: 10/Sep/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0, Lustre 2.5.0
Fix Version/s: Lustre 2.4.0, Lustre 2.11.0

Type: Bug Priority: Major
Reporter: Maloo Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed Votes: 0
Labels: zfs

Issue Links:
Related
Severity: 3
Rank (Obsolete): 6729

 Description   

This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/5c072382-70f7-11e2-9241-52540035b04c.

The sub-test test_51a failed with the following error:

multiop is still there

Info required for matching: sanityn 51a



 Comments   
Comment by Andreas Dilger [ 07/Feb/13 ]

My first guess is that this test might be racy. There is only a 0.1s margin for multiop to {fork, exec, sleep, wake, read} while the "dd" is ongoing, so this could fail on occasion. Could you try bumping this margin to 0.5s? Hopefully the race will then disappear.

Jinshan, could you please comment on the intent of this test? Will increasing the margin ruin the test?
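For reference, a minimal sketch of the fixed-margin pattern being described (this is not the actual sanityn.sh test_51a code; the paths and the multiop command string are hypothetical):

    # Start multiop in the background; it only gets a short fixed margin to
    # fork/exec, sleep, wake and issue its read before the check below.
    multiop /mnt/lustre/f51a or_c &          # hypothetical multiop arguments
    MULTIOP_PID=$!
    sleep 0.1                                # the 0.1s margin under discussion

    # Concurrent write from the second mount point while multiop is
    # (hopefully) already past its read.
    dd if=/dev/zero of=/mnt/lustre2/f51a bs=1M count=1

    # If multiop did not manage to start and finish within the margin, it is
    # still running here and the test reports "multiop is still there".
    kill -0 $MULTIOP_PID 2>/dev/null && echo "multiop is still there"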

Comment by Nathaniel Clark [ 07/Feb/13 ]

I was going to say that this test seemed racy; it does pass on my local VMs.

Comment by Jinshan Xiong (Inactive) [ 07/Feb/13 ]

Hi Andreas, it turns out your guess was right. From the log, the read did not start until 2 seconds had passed. I will fix it.

Comment by Jinshan Xiong (Inactive) [ 11/Feb/13 ]

Patch is at: http://review.whamcloud.com/5321
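Judging from the later comment about "the new sleep 1s from the patch", the change presumably enlarges the margin roughly along these lines (a hedged sketch, not the actual diff from http://review.whamcloud.com/5321):

    # Give multiop a larger window to fork/exec and reach its read before the
    # test checks on it (hedged sketch; previously only 0.1s).
    sleep 1    # was: sleep 0.1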

Comment by Peter Jones [ 05/Mar/13 ]

Landed for 2.4

Comment by Bob Glossman (Inactive) [ 08/Apr/13 ]

Even though this bug is closed, I think I see a fresh instance in https://maloo.whamcloud.com/test_sets/7548a8bc-9fcc-11e2-86dc-52540035b04c

Is this the same bug, or should I open a fresh bug report?

Comment by Jinshan Xiong (Inactive) [ 08/Apr/13 ]

Let's reopen it.

Comment by Bruno Faccini (Inactive) [ 26/Jun/13 ]

Just hit one more occurrence in https://maloo.whamcloud.com/test_sets/aa92a7ca-de1a-11e2-b04c-52540035b04c, even with the new 1s sleep from the patch.

I am wondering if multiop is stuck somewhere. Is there a way to get a full backtrace dump from this specific test, like we do for timeouts?
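One generic way to capture backtraces of every task on the node at the point the failure is detected (independent of whatever the test framework may already collect, and assuming the magic SysRq interface is enabled):

    echo 1 > /proc/sys/kernel/sysrq    # enable the magic SysRq interface
    echo t > /proc/sysrq-trigger       # dump stack traces of all tasks to the kernel log
    dmesg | tail -n 300                # or gather the console/kernel logs afterwards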

Comment by Keith Mannthey (Inactive) [ 01/Jul/13 ]

Another occurrence:

https://maloo.whamcloud.com/test_sets/3c2b25fe-e08c-11e2-b3fd-52540035b04c

test_51a

    Error: 'multiop is still there'
    Failure Rate: 28.00% of last 100 executions [all branches]

This seems to be hitting fairly often.

Comment by Gerrit Updater [ 15/Jun/17 ]

Yang Sheng (yang.sheng@intel.com) uploaded a new patch: https://review.whamcloud.com/27662
Subject: LU-2776 tests: waiting multiop finished in sanityn:51a
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2426f20903fda471a7dfb33b66df4da4e9de88cb
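The patch subject suggests waiting explicitly for multiop to finish rather than relying on a fixed sleep. A minimal sketch of that approach (hypothetical helper name and timeout; not the actual change in https://review.whamcloud.com/27662):

    # Poll the background multiop PID with a bounded timeout instead of a
    # fixed sleep margin. Helper name and timeout value are hypothetical.
    wait_for_multiop() {
        local pid=$1
        local timeout=${2:-10}    # seconds
        local i
        for ((i = 0; i < timeout * 10; i++)); do
            kill -0 "$pid" 2>/dev/null || return 0    # multiop has exited
            sleep 0.1
        done
        return 1                                      # still running after the timeout
    }

    wait_for_multiop $MULTIOP_PID 10 || echo "multiop is still there"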

Comment by Gerrit Updater [ 10/Sep/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27662/
Subject: LU-2776 tests: waiting multiop finished in sanityn:51a
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 9b63762c47732b8d3cb17935d528e0c0fad814db

Comment by Peter Jones [ 10/Sep/17 ]

Landed for 2.11
