[LU-15660] parallel-scale-nfsv4 test_racer_on_nfs: statahead doesn't stop Created: 18/Mar/22  Updated: 19/Oct/23  Resolved: 04/Apr/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0, Lustre 2.15.4

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Yang Sheng
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-14883 umount blocked due to remaining stata... Open
is related to LU-17092 umount stuck Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Yang Sheng <ys@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/c082c323-0510-49bd-98f5-bcd04c073100

test_racer_on_nfs failed with the following error:

............
[Tue Mar 15 01:19:39 2022] umount          D    0 55980  55979 0x00004080
[Tue Mar 15 01:19:39 2022] Call Trace:
[Tue Mar 15 01:19:39 2022]  __schedule+0x2c4/0x700
[Tue Mar 15 01:19:39 2022]  schedule+0x38/0xa0
[Tue Mar 15 01:19:39 2022]  schedule_timeout+0x193/0x2f0
[Tue Mar 15 01:19:39 2022]  ? __next_timer_interrupt+0xf0/0xf0
[Tue Mar 15 01:19:39 2022]  ll_kill_super+0x63/0x130 [lustre]
[Tue Mar 15 01:19:39 2022]  lustre_kill_super+0x28/0x40 [lustre]
[Tue Mar 15 01:19:39 2022]  deactivate_locked_super+0x34/0x70
[Tue Mar 15 01:19:39 2022]  cleanup_mnt+0x3b/0x70
[Tue Mar 15 01:19:39 2022]  task_work_run+0x8a/0xb0
[Tue Mar 15 01:19:39 2022]  exit_to_usermode_loop+0xeb/0xf0
[Tue Mar 15 01:19:39 2022]  do_syscall_64+0x198/0x1a0
[Tue Mar 15 01:19:39 2022]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[Tue Mar 15 01:19:39 2022] RIP: 0033:0x7fd680bf7f5b
[Tue Mar 15 01:19:39 2022] Code: Bad RIP value.
[Tue Mar 15 01:19:39 2022] RSP: 002b:00007ffd07bf7bd8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
[Tue Mar 15 01:19:39 2022] RAX: 0000000000000000 RBX: 000055711ddf1400 RCX: 00007fd680bf7f5b
[Tue Mar 15 01:19:39 2022] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 000055711de013d0
[Tue Mar 15 01:19:39 2022] RBP: 0000000000000001 R08: 000055711de00ce0 R09: 000055711ddee010
[Tue Mar 15 01:19:39 2022] R10: 00007fd680eb8ba0 R11: 0000000000000202 R12: 000055711de013d0
[Tue Mar 15 01:19:39 2022] R13: 00007fd6819a3184 R14: 000055711ddff020 R15: 00000000ffffffff

............

[Tue Mar 15 01:19:39 2022] ll_sa_31092     I    0 40130      2 0x80004080
[Tue Mar 15 01:19:39 2022] Call Trace:
[Tue Mar 15 01:19:39 2022]  __schedule+0x2c4/0x700
[Tue Mar 15 01:19:39 2022]  schedule+0x38/0xa0
[Tue Mar 15 01:19:39 2022]  ll_statahead_thread+0x32c/0xc60 [lustre]
[Tue Mar 15 01:19:39 2022]  ? start_statahead_thread+0x12f0/0x12f0 [lustre]
[Tue Mar 15 01:19:39 2022]  kthread+0x112/0x130
[Tue Mar 15 01:19:39 2022]  ? kthread_flush_work_fn+0x10/0x10
[Tue Mar 15 01:19:39 2022]  ret_from_fork+0x35/0x40
[Tue Mar 15 01:19:39 2022] ll_agl_31092    I    0 40131      2 0x80004080
[Tue Mar 15 01:19:39 2022] Call Trace:
[Tue Mar 15 01:19:39 2022]  __schedule+0x2c4/0x700
[Tue Mar 15 01:19:39 2022]  ? ll_agl_trigger+0x4a0/0x4a0 [lustre]
[Tue Mar 15 01:19:39 2022]  schedule+0x38/0xa0
[Tue Mar 15 01:19:39 2022]  ll_agl_thread+0xf2/0x210 [lustre]
[Tue Mar 15 01:19:39 2022]  kthread+0x112/0x130
[Tue Mar 15 01:19:39 2022]  ? kthread_flush_work_fn+0x10/0x10
[Tue Mar 15 01:19:39 2022]  ret_from_fork+0x35/0x40
...............

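The umount was blocked by the statahead thread: umount is sleeping in schedule_timeout() called from ll_kill_super(), while the statahead thread ll_sa_31092 (and its AGL thread ll_agl_31092) is still alive and sleeping in ll_statahead_thread() instead of exiting.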
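A minimal userspace model of this dependency, assuming the unmount path polls a count of running statahead threads until it reaches zero (the names sa_model, sa_thread, model_start and model_umount are hypothetical; this is a sketch of the hang pattern, not Lustre's code). If the stop request never reaches the worker, the count never drops and the poll loop never finishes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model state; not a Lustre structure. */
struct sa_model {
	pthread_mutex_t lock;
	pthread_cond_t wake;
	bool stop;           /* set by the unmount side */
	atomic_int running;  /* number of live statahead-like threads */
};

/* Worker: idles until a stop request arrives (real work omitted), then
 * drops the running count so the unmount side can proceed. */
static void *sa_thread(void *arg)
{
	struct sa_model *m = arg;

	pthread_mutex_lock(&m->lock);
	while (!m->stop)
		pthread_cond_wait(&m->wake, &m->lock);
	pthread_mutex_unlock(&m->lock);

	atomic_fetch_sub(&m->running, 1);
	return NULL;
}

/* Initialize the model and start one worker thread. */
static void model_start(struct sa_model *m, pthread_t *tid)
{
	int rc;

	pthread_mutex_init(&m->lock, NULL);
	pthread_cond_init(&m->wake, NULL);
	m->stop = false;
	atomic_init(&m->running, 1);
	rc = pthread_create(tid, NULL, sa_thread, m);
	assert(rc == 0);
	(void)rc;
}

/* Unmount side: optionally request stop and wake the worker, then poll the
 * running count every 10 ms (assumed shape of the wait in ll_kill_super()).
 * Returns false if the worker is still running after max_polls polls. */
static bool model_umount(struct sa_model *m, bool send_stop, int max_polls)
{
	struct timespec tick = { 0, 10 * 1000 * 1000 };

	if (send_stop) {
		pthread_mutex_lock(&m->lock);
		m->stop = true;
		pthread_cond_broadcast(&m->wake);
		pthread_mutex_unlock(&m->lock);
	}
	for (int i = 0; i < max_polls; i++) {
		if (atomic_load(&m->running) == 0)
			return true;
		nanosleep(&tick, NULL);
	}
	return atomic_load(&m->running) == 0;
}
```

Without a stop request, model_umount() returns false however many polls it is given, which is the shape of the hang in the trace above; once the stop flag is set and the worker is woken, it drops the count and the poll loop completes.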

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
parallel-scale-nfsv4 test_racer_on_nfs -



 Comments   
Comment by Yang Sheng [ 18/Mar/22 ]

Source location where the statahead thread is sleeping (ll_statahead_thread+0x32c):

(gdb) l *(ll_statahead_thread+0x32c)
0x65a6c is in ll_statahead_thread (/root/git/rw/lustre/llite/statahead.c:169).
164	 * (2) consecutive miss more than 8
165	 * then means low hit.
166	 */
167	static inline int sa_low_hit(struct ll_statahead_info *sai)
168	{
169		return ((sai->sai_hit > 7 && sai->sai_hit < 4 * sai->sai_miss) ||
170			(sai->sai_consecutive_miss > 8));
171	}
172
.......
1079				break;
1080			}
1081
1082			dp = page_address(page);
1083			for (ent = lu_dirent_start(dp);
1084			     ent != NULL && sai->sai_task &&
1085			     !sa_low_hit(sai);
1086			     ent = lu_dirent_next(ent)) {
1087				__u64 hash;
1088				int namelen;

Comment by Gerrit Updater [ 17/Jun/22 ]

"Yang Sheng <ys@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47673
Subject: LU-15660 statahead: statahead thread doesn't stop
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e8e006a2311dbf8cefac93529e8f96eeb8a03932

Comment by Gerrit Updater [ 04/Apr/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/47673/
Subject: LU-15660 statahead: statahead thread doesn't stop
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b977caa2dc7dddcec9e20d393ee79dfa9fe31c0d

Comment by Peter Jones [ 04/Apr/23 ]

Landed for 2.16

Comment by Gerrit Updater [ 07/Sep/23 ]

"Xing Huang <hxing@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52300
Subject: LU-15660 statahead: statahead thread doesn't stop
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: eb1d28ed57a6316452db6c8423360ec12e8c4e31

Comment by Gerrit Updater [ 19/Oct/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52300/
Subject: LU-15660 statahead: statahead thread doesn't stop
Project: fs/lustre-release
Branch: b2_15
Current Patch Set:
Commit: f7438d13b2aadbdf8e90e5b0b7732eeb20f3d475

Generated at Sat Feb 10 03:20:14 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.