[LU-2851] Interop 2.3.0<->2.4 failure on test suite runtests: timeout when doing cp
Created: 22/Feb/13  Updated: 23/Nov/17  Resolved: 23/Nov/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Environment:

server: lustre-2.3.0
client: lustre master build# 1256


Attachments: client_debug.log.tar.bz2
Severity: 3
Rank (Obsolete): 6902

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/88f6b4e2-7571-11e2-93d9-52540035b04c.

The sub-test runtests failed with the following error:

test failed to respond and timed out

CMD: client-27vm7 lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
CMD: client-27vm2.lab.whamcloud.com lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
enable jobstats, set job scheduler as procname_uid
CMD: client-27vm7 /usr/sbin/lctl conf_param lustre.sys.jobid_var=procname_uid
CMD: client-27vm2.lab.whamcloud.com /usr/sbin/lctl get_param -n jobid_var
enable quota as required
CMD: client-27vm7 /usr/sbin/lctl get_param -n version
CMD: client-27vm1,client-27vm7,client-27vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/sbin:/usr/sbin:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin: NAME=autotest_config sh rpc.sh set_default_debug \"0x33f0404\" \" 0xffb7e3ff\" 32 
touching /mnt/lustre at Mon Feb 11 15:50:22 PST 2013
create an empty file /mnt/lustre/hosts.12675
copying /etc/hosts to /mnt/lustre/hosts.12675
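
The last three log lines above are the point at which the test stopped responding. As a minimal sketch of the same steps (touch the mount point, create an empty file, copy into it), the helper below runs the copy in the background and reports if it has not finished within a time limit. The function name `try_copy`, its arguments, and the 60-second default are illustrative assumptions and are not taken from runtests. The copy is polled through a status file rather than killed, since a process that is stuck inside the kernel may not react to signals.

```shell
#!/bin/sh
# try_copy SRC MNT [LIMIT]: hypothetical helper, not part of runtests.
# Copies SRC into the directory MNT and returns 124 if the copy has not
# finished after LIMIT seconds (default 60).
try_copy() {
    src=$1 mnt=$2 limit=${3:-60}
    dst="$mnt/$(basename "$src").$$"

    touch "$mnt" || return 1            # "touching /mnt/lustre"
    : > "$dst" || return 1              # "create an empty file"
    status=$(mktemp) || return 1        # empty until the copy finishes

    # "copying ...": run in the background and record cp's exit code.
    ( cp "$src" "$dst"; echo "$?" > "$status" ) &

    waited=0
    until [ -s "$status" ]; do
        if [ "$waited" -ge "$limit" ]; then
            echo "cp to $dst still running after ${limit}s" >&2
            return 124                  # status file is left for later inspection
        fi
        sleep 1
        waited=$((waited + 1))
    done

    rc=$(cat "$status")
    rm -f "$status"
    return "$rc"
}
```

For example, `try_copy /etc/hosts /mnt/lustre 120` would mirror the copy in the log with a two-minute limit.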


 Comments   
Comment by Jodi Levi (Inactive) [ 22/Feb/13 ]

Sarah,
We need to see the logs before we can determine a fix.
Can you attach the logs to this ticket?
Thank you!

Comment by Sarah Liu [ 25/Feb/13 ]

Jodi, I cannot find any logs beyond those above. Here is another instance, found with a 2.1.4 server and a 2.4 client, which also has no useful logs. I will try to run the test manually to get more information.

https://maloo.whamcloud.com/test_sets/e246fd9e-7d7e-11e2-85d0-52540035b04c

Comment by Jodi Levi (Inactive) [ 05/Mar/13 ]

Sarah,
Have you had a chance to reproduce this manually?

Comment by Sarah Liu [ 12/Mar/13 ]

I can reproduce it manually. Here is the client trace:

cp            R  running task        0  7510   5875 0x00000080
 ffff8803241b42a0 ffffffffa050e355 ffff8803245b4e78 ffff8803245b4cb8
 ffff88031e9f84e0 0000000000000010 ffff8803234d8f08 ffffffffa052e086
 ffff88031e9f84e0 ffff88031eb8ed60 0000000000000000 ffffffffa0511da5
Call Trace:
 [<ffffffffa0505995>] ? cl_env_info+0x15/0x20 [obdclass]
 [<ffffffffa094aafa>] ? lov_io_rw_iter_init+0x19a/0x2f0 [lov]
 [<ffffffffa05193c5>] ? cl_io_lock+0x485/0x560 [obdclass]
 [<ffffffffa0519542>] ? cl_io_loop+0xa2/0x1b0 [obdclass]
 [<ffffffffa0a14528>] ? ll_file_io_generic+0x428/0x570 [lustre]
 [<ffffffffa0a158e2>] ? ll_file_aio_write+0x142/0x2c0 [lustre]
 [<ffffffffa0a15bcc>] ? ll_file_write+0x16c/0x2a0 [lustre]
 [<ffffffff81176588>] ? vfs_write+0xb8/0x1a0
 [<ffffffff81176e81>] ? sys_write+0x51/0x90
 [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
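
To read a trace like the one above at a glance, each frame can be reduced to its module and symbol. The following is a minimal sketch that assumes the frame format shown above (`[<address>] ? symbol+0xoffset/0xsize [module]`); the function name `trace_frames` is hypothetical, and frames without a module tag are labelled `kernel` purely as a convention of this sketch.

```shell
#!/bin/sh
# trace_frames: hypothetical helper that reads a kernel call trace on stdin
# and prints one "module symbol" pair per stack frame.
trace_frames() {
    awk '/^ *\[<[0-9a-f]+>\]/ {
        sym = ""; mod = "kernel"        # no [module] tag: core kernel symbol
        for (i = 1; i <= NF; i++) {
            if ($i ~ /\+0x/) {          # symbol+0xoffset/0xsize
                sym = $i
                sub(/\+0x.*/, "", sym)
            } else if ($i ~ /^\[[a-z0-9_]+\]$/) {
                mod = substr($i, 2, length($i) - 2)
            }
        }
        if (sym != "") print mod, sym
    }'
}
```

Fed the first frame of the trace above, it prints `obdclass cl_env_info`.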

Comment by Sarah Liu [ 12/Mar/13 ]

Client debug log attached: client_debug.log.tar.bz2
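
For reference, one way such a client debug log could be collected is sketched below. It is not necessarily how the attached log was produced. The function name `capture_debug_log`, the `LCTL` override variable, and the fixed delay are assumptions made for illustration; `set_param`, `clear`, and `debug_kernel` are existing `lctl` subcommands. Setting `debug=-1` enables all debug flags, whereas the test run in the description used a specific mask. The reproducer runs in the background so that the log can still be dumped if it never returns.

```shell
#!/bin/sh
# LCTL can be overridden (for example for a dry run); defaults to lctl.
LCTL=${LCTL:-lctl}

# capture_debug_log OUT DELAY CMD [ARGS...]: hypothetical helper.
# Enables full debugging, clears the kernel debug buffer, starts CMD in the
# background, waits DELAY seconds, then dumps the debug buffer to OUT.
capture_debug_log() {
    out=$1 delay=$2
    shift 2
    "$LCTL" set_param debug=-1 || return 1   # enable all debug flags
    "$LCTL" clear || return 1                # empty the debug buffer
    "$@" &                                   # reproducer; may never finish
    sleep "$delay"
    "$LCTL" debug_kernel "$out"              # dump the buffer to a file
}
```

A possible invocation would be `capture_debug_log /tmp/client_debug.log 30 cp /etc/hosts /mnt/lustre/hosts.test`, where the output path, delay, and target file name are placeholders.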

Comment by Keith Mannthey (Inactive) [ 13/Aug/13 ]

A simple patch for runtests has been applied to master. It lets logging work, so you can see what actually happened: http://review.whamcloud.com/7014

Comment by Andreas Dilger [ 23/Nov/17 ]

Close old test issues that haven't been seen recently.
