[LU-3923] Interop 2.3.0<->2.5 failure on test suite sanity-quota test_18c Created: 11/Sep/13  Updated: 22/Dec/17  Resolved: 22/Dec/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

server: 2.3.0
client: lustre-master build # 1652


Issue Links:
Related
is related to LU-3924 Interop 2.3.0<->2.5 failure on test s... Resolved
is related to LU-3925 Interop 2.3.0<->2.5 failure on test s... Resolved
is related to LU-3927 Interop 2.3.0<->2.5 failure on test s... Resolved
Severity: 3
Rank (Obsolete): 10368

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/66dda284-19fa-11e3-8fec-52540035b04c.

The sub-test test_18c failed with the following error:

post-failover df: 1

client console:

14:01:38:Lustre: 6763:0:(client.c:1896:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
14:01:38:INFO: task tee:11217 blocked for more than 120 seconds.
14:01:39:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
14:01:39:tee           D 0000000000000000     0 11217   8082 0x00000080
14:01:40: ffff88003e9719c8 0000000000000082 00000000ffffffff 00005a151837ff05
14:01:41: ffff880063084080 ffff880037e04210 000000000174696e ffffffffadd267a6
14:01:44: ffff880063084638 ffff88003e971fd8 000000000000fb88 ffff880063084638
14:01:44:Call Trace:
14:01:45: [<ffffffff810a2431>] ? ktime_get_ts+0xb1/0xf0
14:01:46: [<ffffffffa0369440>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
14:01:47: [<ffffffff8150e8c3>] io_schedule+0x73/0xc0
14:01:47: [<ffffffffa036944e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
14:01:48: [<ffffffff8150f27f>] __wait_on_bit+0x5f/0x90
14:01:49: [<ffffffffa0369440>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
14:01:49: [<ffffffff8150f328>] out_of_line_wait_on_bit+0x78/0x90
14:01:50: [<ffffffff81096de0>] ? wake_bit_function+0x0/0x50
14:01:51: [<ffffffff81119c1e>] ? find_get_page+0x1e/0xa0
14:01:51: [<ffffffffa036942f>] nfs_wait_on_request+0x2f/0x40 [nfs]
14:01:52: [<ffffffffa036fe2a>] nfs_updatepage+0x20a/0x4e0 [nfs]
14:01:53: [<ffffffffa035e552>] nfs_write_end+0x152/0x2b0 [nfs]
14:01:53: [<ffffffff81119582>] ? iov_iter_copy_from_user_atomic+0x92/0x130
14:01:54: [<ffffffff8111a81a>] generic_file_buffered_write+0x18a/0x2e0
14:01:54: [<ffffffff8106327e>] ? try_to_wake_up+0x24e/0x3e0
14:01:55: [<ffffffff8111c210>] __generic_file_aio_write+0x260/0x490
14:01:55: [<ffffffff8111c4c8>] generic_file_aio_write+0x88/0x100
14:01:56: [<ffffffffa035df8e>] nfs_file_write+0xde/0x1f0 [nfs]
14:01:56: [<ffffffff8118106a>] do_sync_write+0xfa/0x140
14:01:57: [<ffffffff81096da0>] ? autoremove_wake_function+0x0/0x40
14:02:00: [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
14:02:00: [<ffffffff8121bed6>] ? security_file_permission+0x16/0x20
14:02:01: [<ffffffff81181368>] vfs_write+0xb8/0x1a0
14:02:02: [<ffffffff8118266b>] ? fget_light+0x3b/0x90
14:02:02: [<ffffffff81181c61>] sys_write+0x51/0x90
14:02:02: [<ffffffff810dc685>] ? __audit_syscall_exit+0x265/0x290
14:02:03: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
14:02:04:INFO: task tee:20761 blocked for more than 120 seconds.
14:02:04:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
14:02:04:tee           D 0000000000000000     0 20761  11534 0x00000080
14:02:04: ffff880063a8b9c8 0000000000000086 00000000ffffffff 00005a151837ff05
14:02:04: ffff88007cfa4aa0 ffff88004a0dea60 000000000174696c ffffffffadd267a6
14:02:06: ffff88007cfa5058 ffff880063a8bfd8 000000000000fb88 ffff88007cfa5058


 Comments   
Comment by Oleg Drokin [ 13/Sep/13 ]

The stack trace does not appear to be related to Lustre, but rather to NFS.

Comment by Niu Yawei (Inactive) [ 16/Sep/13 ]
directio on /mnt/lustre/d0.sanity-quota/d18/f.sanity-quota.18c for 100x1048576 bytes 
PASS
pdsh@client-32vm2: gethostbyname("client-32vm1") failed
 sanity-quota test_18c: @@@@@@ FAIL: post-failover df: 1 

I didn't find anything abnormal in the dmesg; is there something wrong with the test environment? It looks like pdsh failed to resolve the hostname of client-32vm1 (which is another client). Can this bug be reproduced?
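(A hedged sketch of the kind of check described above: before suspecting Lustre, confirm that each node can resolve its peers, since `pdsh` relies on name resolution for its fan-out. The host names `client-32vm1` and `client-32vm2` are taken from the log; substitute the actual cluster nodes.)

```shell
#!/bin/sh
# Check that peer hostnames resolve; an unresolvable peer would explain
# the pdsh "gethostbyname(...) failed" error seen in the test log.
for host in client-32vm1 client-32vm2; do
    if getent hosts "$host" >/dev/null; then
        echo "$host: resolves"
    else
        echo "$host: NOT resolvable - pdsh/ssh fan-out will fail"
    fi
done
```

If a name fails to resolve here, the `post-failover df` failure is an environment problem (DNS or /etc/hosts) rather than a Lustre bug.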

Comment by Andreas Dilger [ 22/Dec/17 ]

Close old bug that has not been seen in a long time.
