[LU-1942] 2.1.3<->2.3 Test failure on test suite sanity-benchmark, subtest test_fsx Created: 14/Sep/12  Updated: 17/Apr/17  Resolved: 17/Apr/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Maloo Assignee: Jinshan Xiong (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

server: 2.1.3 RHEL6
client: 2.3 tag-2.2.95 RHEL6


Issue Links:
Related
is related to LU-2246 failure on sanity.sh test_132: ASSERT... Resolved
Severity: 3
Rank (Obsolete): 5300

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/6274e3b4-fe45-11e1-a707-52540035b04c.

The sub-test test_fsx failed with the following error:

test failed to respond and timed out

 fsx           S 0000000000000000     0 29817  29664 0x00000080
 ffff8800371ab8c8 0000000000000086 ffff8800371ab938 ffffffffa0977b76
 ffff8800371ab8a8 0000000000000001 ffff8800371ab888 ffffffff81039678
 ffff880077cc45f8 ffff8800371abfd8 000000000000fb88 ffff880077cc45f8
Call Trace:
 [<ffffffffa0977b76>] ? __osc_extent_remove+0xa6/0x440 [osc]
 [<ffffffff81039678>] ? pvclock_clocksource_read+0x58/0xd0
 [<ffffffffa043677e>] cfs_waitq_wait+0xe/0x10 [libcfs]
 [<ffffffffa098013f>] osc_extent_wait+0xef/0x2a0 [osc]
 [<ffffffff81060250>] ? default_wake_function+0x0/0x20
 [<ffffffffa0982987>] osc_cache_truncate_start+0x1cf7/0x20b0 [osc]
 [<ffffffffa0436be0>] ? cfs_alloc+0x30/0x60 [libcfs]
 [<ffffffffa07a301f>] ? null_alloc_reqbuf+0x1cf/0x440 [ptlrpc]
 [<ffffffffa0788727>] ? ptlrpcd_add_req+0x187/0x2e0 [ptlrpc]
 [<ffffffffa074e88c>] ? ptlrpc_request_bufs_pack+0x5c/0x80 [ptlrpc]
 [<ffffffffa0764f30>] ? lustre_swab_ost_body+0x0/0x10 [ptlrpc]
 [<ffffffffa0971dc4>] osc_io_setattr_start+0x294/0x4c0 [osc]
 [<ffffffffa061a230>] ? cl_io_start+0x0/0x140 [obdclass]
 [<ffffffffa061a29a>] cl_io_start+0x6a/0x140 [obdclass]
 [<ffffffffa0a0076e>] lov_io_call+0x8e/0x130 [lov]
 [<ffffffffa0a03a5c>] lov_io_start+0x10c/0x190 [lov]
 [<ffffffffa061a29a>] cl_io_start+0x6a/0x140 [obdclass]
 [<ffffffffa061ebb4>] cl_io_loop+0xb4/0x1b0 [obdclass]
 [<ffffffffa0abfa18>] cl_setattr_ost+0x208/0x2d0 [lustre]
 [<ffffffffa0a8edc2>] ll_setattr_raw+0x752/0xfd0 [lustre]
 [<ffffffffa0a8f69b>] ll_setattr+0x5b/0xf0 [lustre]
 [<ffffffff81197368>] notify_change+0x168/0x340
 [<ffffffff811799a4>] do_truncate+0x64/0xa0
 [<ffffffff81179c70>] sys_ftruncate+0xf0/0x100
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
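fsx mixes reads, writes and truncates at random. The hang above is in the ftruncate() path, and the dbench trace in the comments is in the buffered-write path. Below is a minimal user-space sketch of that write-then-truncate pattern. It is an illustration only, not a confirmed reproducer. The helper name write_truncate_loop, the 4 KiB write size, and truncating halfway into the page just written are all hypothetical choices. The file should sit on the Lustre client mount.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: loosely mimics the write/ftruncate mix that fsx
 * performs. Each iteration writes one 4 KiB block past the current end,
 * then truncates back to the middle of that block.
 * Returns the final file size, or -1 on any error. */
static long write_truncate_loop(const char *path, int iterations)
{
    char buf[4096];
    memset(buf, 'x', sizeof(buf));

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    for (int i = 0; i < iterations; i++) {
        off_t off = (off_t)i * (off_t)sizeof(buf);

        /* Buffered write: on a Lustre client this dirties cached pages
         * (the write path in the dbench trace goes through
         * osc_queue_async_io). */
        if (pwrite(fd, buf, sizeof(buf), off) != (ssize_t)sizeof(buf)) {
            close(fd);
            return -1;
        }

        /* Truncate into the block just written. Per the traces, the client
         * truncate path (osc_cache_truncate_start) waits on overlapping
         * cached extents in osc_extent_wait. */
        if (ftruncate(fd, off + (off_t)sizeof(buf) / 2) != 0) {
            close(fd);
            return -1;
        }
    }

    struct stat st;
    long size = -1;
    if (fstat(fd, &st) == 0)
        size = (long)st.st_size;
    close(fd);
    return size;
}
```

After n iterations the expected size is (n - 1) * 4096 + 2048 bytes, or 0 when n is 0. On a healthy client each call returns promptly. A process that instead sits in ftruncate() would look like the trace in the description.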


 Comments   
Comment by Alex Zhuravlev [ 26/Oct/12 ]

It's likely the same issue; I see this quite often:

PID: 5716 TASK: ffff8800550a31c0 CPU: 0 COMMAND: "dbench"
#0 [ffff88004b62d4e8] schedule at ffffffff813d31c0
#1 [ffff88004b62d650] cfs_waitq_wait at ffffffffa01e1619 [libcfs]
#2 [ffff88004b62d660] osc_enter_cache at ffffffffa00f3447 [osc]
#3 [ffff88004b62d790] osc_queue_async_io at ffffffffa00ec85a [osc]
#4 [ffff88004b62d900] osc_page_cache_add at ffffffffa00d8bad [osc]
#5 [ffff88004b62d940] cl_page_cache_add at ffffffffa037e956 [obdclass]
#6 [ffff88004b62d990] lov_page_cache_add at ffffffffa017020d [lov]
#7 [ffff88004b62d9c0] cl_page_cache_add at ffffffffa037e956 [obdclass]
#8 [ffff88004b62da10] vvp_io_commit_write at ffffffffa0a6baf9 [lustre]
#9 [ffff88004b62da80] cl_io_commit_write at ffffffffa038e7f5 [obdclass]
#10 [ffff88004b62dad0] ll_commit_write at ffffffffa0a3f0a8 [lustre]
#11 [ffff88004b62db30] ll_write_end at ffffffffa0a57e78 [lustre]
#12 [ffff88004b62db60] generic_file_buffered_write at ffffffff8108bd58
#13 [ffff88004b62dc40] __generic_file_aio_write at ffffffff8108c25a
#14 [ffff88004b62dcf0] generic_file_aio_write at ffffffff8108c4d7
#15 [ffff88004b62dd30] vvp_io_write_start at ffffffffa0a6d394 [lustre]
#16 [ffff88004b62dd70] cl_io_start at ffffffffa038aafd [obdclass]
#17 [ffff88004b62dda0] cl_io_loop at ffffffffa038f275 [obdclass]
#18 [ffff88004b62ddd0] ll_file_io_generic at ffffffffa0a17fce [lustre]
#19 [ffff88004b62de40] ll_file_aio_write at ffffffffa0a183ca [lustre]
#20 [ffff88004b62de90] ll_file_write at ffffffffa0a185f5 [lustre]
#21 [ffff88004b62def0] vfs_write at ffffffff810be79c

Comment by Alex Zhuravlev [ 26/Oct/12 ]

Please have a look.

Comment by Alex Zhuravlev [ 26/Oct/12 ]

This time with fsx:

PID: 12878 TASK: ffff880051d7a8c0 CPU: 3 COMMAND: "fsx"
#0 [ffff8800296cf7c8] schedule at ffffffff813d31c0
#1 [ffff8800296cf930] cfs_waitq_wait at ffffffffa0af0619 [libcfs]
#2 [ffff8800296cf940] osc_extent_wait at ffffffffa012f2b9 [osc]
#3 [ffff8800296cf9b0] osc_cache_truncate_start at ffffffffa0135f5e [osc]
#4 [ffff8800296cfb90] osc_io_setattr_start at ffffffffa01241af [osc]
#5 [ffff8800296cfc80] cl_io_start at ffffffffa0bcfafd [obdclass]
#6 [ffff8800296cfcb0] lov_io_call.isra.10 at ffffffffa01bbbf1 [lov]
#7 [ffff8800296cfce0] lov_io_start at ffffffffa01bbdaa [lov]
#8 [ffff8800296cfd00] cl_io_start at ffffffffa0bcfafd [obdclass]
#9 [ffff8800296cfd30] cl_io_loop at ffffffffa0bd4275 [obdclass]
#10 [ffff8800296cfd60] cl_setattr_ost at ffffffffa080ee20 [lustre]
#11 [ffff8800296cfdc0] ll_setattr_raw at ffffffffa07e3891 [lustre]
#12 [ffff8800296cfe40] ll_setattr at ffffffffa07e3dc4 [lustre]
#13 [ffff8800296cfe50] notify_change at ffffffff810d5f47
#14 [ffff8800296cfec0] do_truncate at ffffffff810bd040
#15 [ffff8800296cff40] sys_ftruncate at ffffffff810bd345

Comment by Jinshan Xiong (Inactive) [ 13/Nov/12 ]

This looks similar to LU-2286, at least in part.

Comment by Jinshan Xiong (Inactive) [ 15/Nov/12 ]

The issue reported at 26/Oct/12 3:43 AM is probably a different one: it looks very likely that the client ran out of grant. The other two occurrences are duplicates of LU-2286.

There is not enough information here, so I can't do anything more except wait for another occurrence.

I suggest lowering the priority of this bug.

Comment by Andreas Dilger [ 20/Nov/12 ]

Reducing the priority of this bug. Only a single failure has been triaged as LU-1942, and all of the recent test runs of sanity-benchmark.sh have passed. However, while there have been several 2.2/master interop test runs in the past few weeks, none of them were 2.1/master interop runs, so the bug may still be present.

Comment by Andreas Dilger [ 17/Apr/17 ]

Closing old issue.

Generated at Sat Feb 10 01:21:02 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.