[LU-2105] 2.3<->2.1 interop: Test failure on test suite sanityn, subtest test_33a Created: 08/Oct/12  Updated: 29/May/17  Resolved: 29/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0, Lustre 2.1.3
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 4393

 Description   

This issue was created by maloo for yujian <yujian@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/11d06cfe-0e3e-11e2-91a3-52540035b04c.

The sub-test test_33a failed with the following error:

=== START createmany old: 0 transaction
CMD: client-28vm6.lab.whamcloud.com,client-28vm5 createmany -o /mnt/lustre/d0.sanityn/d33-\$(hostname)-3/f- -r /mnt/lustre2/d0.sanityn/d33-\$(hostname)-3/f- 10000 > /dev/null 2>&1
test failed to respond and timed out

Info required for matching: sanityn 33a

Lustre Client Build: http://build.whamcloud.com/job/lustre-b2_3/28
Lustre Server Build: http://build.whamcloud.com/job/lustre-b2_1/121
Distro/Arch: RHEL6.3/x86_64
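
For reference, a minimal sketch of how the failing sub-test can typically be re-run in isolation with the Lustre test framework (the test directory, config file, and environment are assumptions, not taken from this run):

# Sketch only: re-run just sanityn sub-test 33a from the lustre/tests directory
# on an already configured cluster (e.g. with cfg/local.sh describing the nodes
# and both client mount points).
cd /usr/lib64/lustre/tests
ONLY=33a sh sanityn.sh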

Console log on MDS (client-28vm3, 10.10.4.166) showed the following:

Lustre: DEBUG MARKER: lctl get_param -n osd*.lustre-MDT0000.mntdev
Lustre: DEBUG MARKER: procfile=/proc/fs/jbd/lvm--MDS-P1/info;
[ -f $procfile ] || procfile=/proc/fs/jbd2/lvm--MDS-P1/info;
[ -f $procfile ] || procfile=/proc/fs/jbd2/lvm--MDS-P1\:\*/info;
cat $procfile | head -1;
Lustre: 2698:0:(client.c:1780:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1349281914/real 1349281914]  req@ffff88005e2b7800 x1414805216339338/t0(0) o400->lustre-OST0000-osc-MDT0000@10.10.4.167@tcp:28/4 lens 192/192 e 0 to 1 dl 1349281921 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 2698:0:(client.c:1780:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
Lustre: lustre-OST0000-osc-MDT0000: Connection to lustre-OST0000 (at 10.10.4.167@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 2 previous similar messages
Lustre: 2698:0:(client.c:1780:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1349281919/real 1349281919]  req@ffff880054d9a400 x1414805216339346/t0(0) o400->lustre-OST0000-osc-MDT0000@10.10.4.167@tcp:28/4 lens 192/192 e 0 to 1 dl 1349281926 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 2698:0:(client.c:1780:ptlrpc_expire_one_request()) Skipped 6 previous similar messages

<ConMan> Console [client-28vm3] disconnected from <client-28:6002> at 10-03 09:32.

<ConMan> Console [client-28vm3] connected to <client-28:6002> at 10-03 09:32.
Press any key to continue.
Press any key to continue.
Press any key to continue.
Press any key to continue.
Press any key to continue.
    GNU GRUB  version 0.97  (617K lower / 2094860K upper memory)

Console log on OSS (client-28vm4, 10.10.4.167) showed the following:

Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanityn test 33a: commit on sharing, cross crete\/delete, 2 clients, benchmark == 08:29:53 \(1349278193\)
Lustre: DEBUG MARKER: == sanityn test 33a: commit on sharing, cross crete/delete, 2 clients, benchmark == 08:29:53 (1349278193)
Lustre: lustre-OST0000: already connected client lustre-MDT0000-mdtlov_UUID (at 10.10.4.166@tcp) with handle 0xec210ade624773e8. Rejecting client with the same UUID trying to reconnect with handle 0x332bdfaf3e4636f

<ConMan> Console [client-28vm4] disconnected from <client-28:6003> at 10-03 09:31.

<ConMan> Console [client-28vm4] connected to <client-28:6003> at 10-03 09:32.
Press any key to continue.
Press any key to continue.
Press any key to continue.
Press any key to continue.
Press any key to continue.
    GNU GRUB  version 0.97  (617K lower / 2094860K upper memory)
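
Where the logs above show the MDS losing its connection to lustre-OST0000 and the OSS rejecting the reconnect, the import state can usually be inspected from the MDS with lctl; this is a sketch only, with the device name taken from the log above (output format and parameter availability vary between releases):

# Sketch: show the MDS-side import state for the OST that timed out
lctl get_param osc.lustre-OST0000-osc-MDT0000.import
# List all local devices and their current state
lctl dl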


 Comments   
Comment by Jian Yu [ 10/Oct/12 ]

Lustre Server Build: http://build.whamcloud.com/job/lustre-b2_1/121
Lustre Client Build: http://build.whamcloud.com/job/lustre-b2_3/32

The same issue occurred again: https://maloo.whamcloud.com/test_sets/1c159394-12a6-11e2-a23c-52540035b04c

Comment by Andreas Dilger [ 29/May/17 ]

Close old ticket.
