[LU-3205] Interop 2.1.5<->2.4 failure on test suite sanity test_24u Created: 23/Apr/13  Updated: 27/Apr/13  Resolved: 27/Apr/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed Votes: 0
Labels: LB
Environment:

Client: master 2.3.64--PRISTINE-2.6.32-358.2.1.el6.x86_64 (commit c59dee14f57cd0c1df114d1cafa2148d1bfeb9d1, build http://build.whamcloud.com/job/lustre-master/1411)
Server: 2.1.5 RC1--PRISTINE-2.6.32-279.19.1.el6_lustre.x86_64 (commit 643e972a0ed9ad5abb2d4cf5783ffa0886035d10, build http://build.whamcloud.com/job/lustre-b2_1/191)


Severity: 3
Rank (Obsolete): 7825

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/ec0b91f2-a6bc-11e2-9b48-52540035b04c.

The sub-test test_24u failed with the following error:

test failed to respond and timed out

Info required for matching: sanity 24u



 Comments   
Comment by Andreas Dilger [ 23/Apr/13 ]

It looks like multiop got stuck on the client during the write while trying to acquire the cl_lock mutex:

Apr 15 03:04:25 client-24vm2 kernel: multiop       R  running task        0 14143  13996 0x00000080
Apr 15 03:04:25 client-24vm2 kernel: ffff88006fbe3c58 0000000000000086 0000000000000052 0000000100000020
Apr 15 03:04:25 client-24vm2 kernel: 516bd0a700000000 00000000000d14d1 0000373f00000000 000005f400000000
Apr 15 03:04:25 client-24vm2 kernel: ffff88007c3b25f8 ffff88006fbe3fd8 000000000000fb88 ffff88007c3b2600
Apr 15 03:04:25 client-24vm2 kernel: Call Trace:
Apr 15 03:04:25 client-24vm2 kernel: [<ffffffff81064d6a>] __cond_resched+0x2a/0x40
Apr 15 03:04:25 client-24vm2 kernel: [<ffffffff8150e320>] _cond_resched+0x30/0x40
Apr 15 03:04:25 client-24vm2 kernel: [<ffffffff8150ee4e>] mutex_lock+0x1e/0x50
Apr 15 03:04:25 client-24vm2 kernel: [<ffffffffa0590e6f>] cl_lock_mutex_get+0x6f/0xd0 [obdclass]
Apr 15 03:04:25 client-24vm2 kernel: [<ffffffffa05937a9>] cl_wait+0x39/0x250 [obdclass]
Apr 15 03:04:25 client-24vm2 kernel: [<ffffffffa0599cc5>] cl_io_lock+0x485/0x560 [obdclass]
Apr 15 03:04:25 client-24vm2 kernel: [<ffffffffa0599e42>] cl_io_loop+0xa2/0x1b0 [obdclass]
Apr 15 03:04:25 client-24vm2 kernel: [<ffffffffa0a6f7f0>] ll_file_io_generic+0x450/0x600 [lustre]
Apr 15 03:04:25 client-24vm2 kernel: [<ffffffffa0a70c12>] ll_file_aio_write+0x142/0x2c0 [lustre]
Apr 15 03:04:25 client-24vm2 kernel: [<ffffffffa0a70efc>] ll_file_write+0x16c/0x2a0 [lustre]
Apr 15 03:04:25 client-24vm2 kernel: [<ffffffff81181078>] vfs_write+0xb8/0x1a0
Apr 15 03:04:25 client-24vm2 kernel: [<ffffffff81181971>] sys_write+0x51/0x90

Sarah, can you please submit another test run with this config to see if this problem will repeat?

Comment by Jinshan Xiong (Inactive) [ 23/Apr/13 ]

This problem is due to a compatibility check. I'll work out a fix.

Comment by Jinshan Xiong (Inactive) [ 24/Apr/13 ]

Patch is at: http://review.whamcloud.com/6137

Comment by Peter Jones [ 27/Apr/13 ]

Landed for 2.4

Generated at Sat Feb 10 01:31:52 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.