[LU-10084] replay-ost-single test_3: test failed to respond and timed out Created: 05/Oct/17 Updated: 19/Mar/19 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0, Lustre 2.10.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Casper | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
trevis, full, x86_64 servers, ppc clients |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
https://testing.hpdd.intel.com/test_sessions/ba995751-659c-4e63-9b5b-fbf101137b78

From client dmesg:

[ 600.102679] sync D 00003fff9cdd6448 0 16435 16247 0x00000080
[ 600.102719] Call Trace:
[ 600.102734] [c000000079b835d0] [c00000007ffdbb80] 0xc00000007ffdbb80 (unreliable)
[ 600.102780] [c000000079b837a0] [c000000000019634] .__switch_to+0x254/0x460
[ 600.102818] [c000000079b83850] [c0000000009a9c1c] .__schedule+0x43c/0xb00
[ 600.102858] [c000000079b83980] [c0000000009a5e78] .schedule_timeout+0x398/0x460
[ 600.102903] [c000000079b83a90] [c0000000009aa7c8] .wait_for_completion+0x148/0x1d0
[ 600.102955] [c000000079b83b60] [c0000000003688b4] .sync_inodes_sb+0xc4/0x260
[ 600.102994] [c000000079b83c70] [c000000000371c9c] .sync_inodes_one_sb+0x1c/0x30
[ 600.103039] [c000000079b83ce0] [c0000000003243cc] .iterate_supers+0x22c/0x2f0
[ 600.103078] [c000000079b83da0] [c000000000371fb8] .sys_sync+0x48/0xd0
[ 600.103117] [c000000079b83e30] [c00000000000a184] system_call+0x38/0xb4
|
| Comments |
| Comment by James Nunez (Inactive) [ 19/Mar/19 ] |
|
We continue to see replay-ost-single test 3 hang. A recent example is for 2.10.7 RC1, https://testing.whamcloud.com/test_sets/42434dfe-4332-11e9-92fe-52540065bddc with the following in the client (vm1) dmesg:

[ 480.120291] INFO: task tee:18534 blocked for more than 120 seconds.
[ 480.120391] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 480.120446] tee D 00003fff8b241588 0 18534 18350 0x00000080
[ 480.120510] Call Trace:
[ 480.120546] [c000000077ddabe0] [0000000000003d15] 0x3d15 (unreliable)
[ 480.120635] [c000000077ddadb0] [c00000000001b76c] .__switch_to+0x25c/0x470
[ 480.120707] [c000000077ddae60] [c000000000aa40fc] .__schedule+0x42c/0xae0
[ 480.120766] [c000000077ddaf90] [c000000000a9fea4] .schedule_timeout+0x394/0x470
[ 480.120830] [c000000077ddb090] [c000000000aa3c1c] .io_schedule+0xcc/0x180
[ 480.120885] [c000000077ddb120] [c000000000aa01a0] .bit_wait_io+0x20/0x80
[ 480.120939] [c000000077ddb1a0] [c000000000aa043c] .__wait_on_bit+0x17c/0x210
[ 480.121000] [c000000077ddb250] [c000000000291d50] .wait_on_page_bit+0x100/0x120
[ 480.121094] [c000000077ddb310] [d000000003d70848] .vvp_page_assume+0x48/0xe0 [lustre]
[ 480.121202] [c000000077ddb390] [d000000002a46ad0] .cl_page_assume+0xf0/0x490 [obdclass]
[ 480.121278] [c000000077ddb450] [d000000003d59c68] .ll_write_begin+0x198/0xb60 [lustre]
[ 480.121342] [c000000077ddb550] [c00000000028f924] .generic_file_buffered_write+0x134/0x320
[ 480.121408] [c000000077ddb670] [c0000000002919c0] .__generic_file_aio_write+0x320/0x4a0
[ 480.121490] [c000000077ddb740] [d000000003d76948] .vvp_io_write_start+0x378/0x1210 [lustre]
[ 480.121575] [c000000077ddb870] [d000000002a4b828] .cl_io_start+0xc8/0x240 [obdclass]
[ 480.121657] [c000000077ddb910] [d000000002a51f78] .cl_io_loop+0x948/0x1180 [obdclass]
[ 480.121731] [c000000077ddba40] [d000000003cea04c] .ll_file_io_generic+0x27c/0x1020 [lustre]
[ 480.121803] [c000000077ddbbd0] [d000000003ceb2dc] .ll_file_aio_write+0x20c/0x320 [lustre]
[ 480.121876] [c000000077ddbca0] [d000000003ceb508] .ll_file_write+0x118/0x310 [lustre]
[ 480.121945] [c000000077ddbd80] [c000000000371a64] .SyS_write+0x164/0x440
[ 480.122002] [c000000077ddbe30] [c00000000000a284] system_call+0x38/0xfc
|