[LU-15015] test timeout with "tee" hung in nfs_updatepage() Created: 17/Sep/21  Updated: 16/Dec/21

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-14005 Various tests hang on tee/”nfs: serve... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for S Buisson <sbuisson@ddn.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/de86ad94-a713-4683-84d0-368a514164fa

test_13c failed with the following error:

Timeout occurred after 123 mins, last suite running was sanity-pcc

The timeout occurred after MDT0 was unmounted.

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-pcc test_13c - Timeout occurred after 123 mins, last suite running was sanity-pcc



 Comments   
Comment by James Nunez (Inactive) [ 17/Sep/21 ]

In the client1 console log, I see tee hung in NFS:

[ 4178.294408] INFO: task tee:104241 blocked for more than 120 seconds.
[ 4178.295637]       Tainted: G           OE    --------- -  - 4.18.0-240.22.1.el8_3.x86_64 #1
[ 4178.297034] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4178.298287] tee             D    0 104241  28015 0x00000080
[ 4178.299201] Call Trace:
[ 4178.299655]  __schedule+0x2c4/0x700
[ 4178.301745]  schedule+0x38/0xa0
[ 4178.302291]  io_schedule+0x12/0x40
[ 4178.302879]  bit_wait_io+0xd/0x50
[ 4178.303464]  __wait_on_bit+0x6c/0x80
[ 4178.304068]  out_of_line_wait_on_bit+0x91/0xb0
[ 4178.305557]  nfs_lock_and_join_requests+0x3d8/0x530 [nfs]
[ 4178.307685]  nfs_updatepage+0x2d8/0x950 [nfs]
[ 4178.308430]  nfs_write_end+0x63/0x4d0 [nfs]
[ 4178.310016]  generic_perform_write+0x138/0x1b0
[ 4178.310778]  nfs_file_write+0xf6/0x270 [nfs]
[ 4178.311511]  new_sync_write+0x124/0x170
[ 4178.312160]  vfs_write+0xa5/0x1a0
[ 4178.312742]  ksys_write+0x4f/0xb0
[ 4178.313315]  do_syscall_64+0x5b/0x1a0
[ 4178.313944]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 4178.314788] RIP: 0033:0x7f4d1f5cd8a8
[ 4178.315417] Code: Bad RIP value.
Comment by Sergey Cheremencev [ 18/Nov/21 ]

I've hit the same hang, but in sanity test_27v: https://testing.whamcloud.com/test_sets/0ef31cd1-1dfc-4480-8ad0-d682184bd91f

It might be better to rename the ticket to something like: "TIMEOUT due to hang in NFS (nfs_updatepage)".

Comment by Sergey Cheremencev [ 16/Dec/21 ]

The same hang occurred in conf-sanity test_28a: https://testing.whamcloud.com/test_sets/872319dd-a037-49d8-8ed4-91356bf476e4

Generated at Sat Feb 10 03:14:43 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.