[LU-2585] some dd threads can not be stopped after racer Created: 07/Jan/13  Updated: 09/Jan/20  Resolved: 09/Jan/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Di Wang Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 6026

 Description   

After racer finished, I saw that some dd threads could not be stopped:

17:10:58:Stopping client client-32vm1.lab.whamcloud.com /mnt/lustre2 opts:
17:10:59:COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
17:10:59:dd 1756 root 1w REG 1273,181606 49091584 144115205306079415 /mnt/lustre2/racer/19
17:11:01:dd 9219 root 1w REG 1273,181606 8193024 144115205306056725 /mnt/lustre/racer/15
17:11:01:dd 9224 root 1w REG 1273,181606 8193024 144115205306056725 /mnt/lustre2/racer/15
17:11:02:dd 9225 root 1w REG 1273,181606 8193024 144115205306056725 /mnt/lustre2/racer/15
17:11:02:dd 11097 root 1w REG 1273,181606 245867520 144115205255725671 /mnt/lustre/racer/13
17:11:02:Stopping client client-32vm2.lab.whamcloud.com /mnt/lustre2 opts:
17:11:02:/mnt/lustre2 is still busy, wait one second
17:11:02:COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
17:11:02:dd 12954 root 1w REG 1273,181606 119452672 144115205255740618 /mnt/lustre/racer/11
17:11:03:dd 13442 root 1w REG 1273,181606 116897792 144115205289288929 /mnt/lustre2/racer/0
17:11:03:dd 18335 root 1w REG 1273,181606 245867520 144115205255725671 /mnt/lustre/racer/10
17:11:03:dd 22305 root 1w REG 1273,181606 65881088 144115205272518234 /mnt/lustre2/racer/15 (deleted)
17:11:03:/mnt/lustre2 is still busy, wait one second
17:11:03:/mnt/lustre2 is still busy, wait one second
17:11:03:/mnt/lustre2 is still busy, wait one second
17:11:03:/mnt/lustre2 is still busy, wait one second
17:11:05:/mnt/lustre2 is still busy, wait one second
17:11:05:/mnt/lustre2 is still busy, wait one second
17:11:05:/mnt/lustre2 is still busy, wait one second
17:11:05:/mnt/lustre2 is still busy, wait one second
17:11:05:/mnt/lustre2 is still busy, wait one second
17:11:05:/mnt/lustre2 is still busy, wait one second

.....

When I investigated the logs for this earlier, these threads appeared to be doing single-page RPCs to flush dirty data to the server. Unfortunately, I do not have a debug log right now.
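
For triage, the lsof-style listing above can be grouped by mount point to see which dd PIDs are keeping each client mount busy. The following is a minimal sketch only, not part of the Lustre test framework: the function name `parse_busy_lines` is hypothetical, and it assumes each console line carries an `HH:MM:SS:` prefix followed by the nine lsof columns, and that the mount point is the first two path components (for example `/mnt/lustre2`).

```python
import re
from collections import defaultdict

# Assumed layout: "HH:MM:SS:" prefix, then the nine lsof columns
# COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}):"
    r"(?P<command>\S+)\s+(?P<pid>\d+)\s+(?P<user>\S+)\s+(?P<fd>\S+)\s+"
    r"(?P<type>\S+)\s+(?P<device>\S+)\s+(?P<size>\d+)\s+(?P<node>\d+)\s+"
    r"(?P<name>.+)$"
)

DELETED_SUFFIX = " (deleted)"


def parse_busy_lines(lines):
    """Group open-file entries by mount point.

    Returns {mount_point: [(pid, path, deleted), ...]}. Lines that do not
    look like lsof entries (column headers, "still busy" messages) are skipped.
    """
    holders = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        name = match.group("name")
        deleted = name.endswith(DELETED_SUFFIX)
        path = name[: -len(DELETED_SUFFIX)] if deleted else name
        # Assumption: mount point is "/<first>/<second>", e.g. /mnt/lustre2.
        mount = "/".join(path.split("/")[:3])
        holders[mount].append((int(match.group("pid")), path, deleted))
    return dict(holders)
```

Fed the console lines captured in this ticket, it would place PIDs such as 1756 and 22305 under `/mnt/lustre2`, with the latter flagged as holding a deleted file.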



 Comments   
Comment by Andreas Dilger [ 09/Jan/13 ]

Di, what version was running here, and is this possibly related to any patches on the DNE series? Seems unlikely I guess, since you don't change the IO code. Anything special about the test config?

Comment by Andreas Dilger [ 09/Jan/20 ]

Close old ticket.
