[LU-13470] sysfs ping write creates a flood-ping situation that could not be normally stopped Created: 21/Apr/20  Updated: 07/May/20  Resolved: 07/May/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0, Lustre 2.12.5
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Minor
Reporter: Oleg Drokin Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-8066 Move lustre procfs handling to sysfs ... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

It looks like when the ping file was migrated to sysfs, a bug was unfortunately introduced in the write path:

ssize_t ping_store(struct kobject *kobj, struct attribute *attr,
                   const char *buffer, size_t count)
{
        return ping_show(kobj, attr, (char *)buffer);
}

What it really should be doing is returning count; otherwise the outer logic thinks it is a short write that needs to be retried (errno = 0) and enters a loop that you cannot really break, short of disconnecting from the server:

[root@centos6-16 ~]# cat /sys/fs/lustre/mdc/lustre-MDT0000-mdc-ffff880387d67800/ping
[root@centos6-16 ~]# echo blahblah > /sys/fs/lustre/mdc/lustre-MDT0000-mdc-ffff880387d67800/ping

^Z
^C

We can see the CPU being consumed by all the retries and pings:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2529 root      20   0       0      0      0 R  27.8  0.0  20:07.59 socknal_sd+
12488 root      20   0  115568   2124   1612 S  27.8  0.0  12:48.58 bash       
 2530 root      20   0       0      0      0 S  27.5  0.0  20:09.36 socknal_sd+
 3861 root      20   0       0      0      0 S  14.2  0.0   4:11.05 mdt03_002  
16784 root      20   0       0      0      0 S  10.6  0.0   4:04.16 mdt03_004  
 4410 root      20   0       0      0      0 S   5.0  0.0   4:08.74 mdt03_003  
 3859 root      20   0       0      0      0 S   2.6  0.0   4:11.23 mdt03_000  
   55 root      20   0       0      0      0 S   0.3  0.0   0:22.51 rcuos/6    
 3860 root      20   0       0      0      0 S   0.3  0.0   3:58.34 mdt03_001  
15467 green     20   0  162104   2408   1524 R   0.3  0.0   0:00.05 top        
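
The retry behaviour described above can be reproduced in user space. What follows is only a minimal sketch, not the patch that landed: stub_ping_show(), fake_ping_rc, ping_calls, ping_store_buggy(), ping_store_fixed() and write_all() are all hypothetical names, the kobject/attribute arguments are dropped, and it assumes ping_show() returns 0 on success and a negative errno on failure. It shows why a store handler that returns 0 is retried indefinitely, while one that returns count is called once.

```c
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical stand-in for ping_show(): assumed to return 0 on success
 * and a negative errno on failure. */
static int fake_ping_rc;   /* value the stub returns */
static int ping_calls;     /* number of pings "sent" */

static ssize_t stub_ping_show(char *buf)
{
	(void)buf;
	ping_calls++;
	return fake_ping_rc;
}

/* Shape of the reported bug: the show() result is returned as the number
 * of bytes written, so a successful ping reports 0 bytes consumed. */
static ssize_t ping_store_buggy(const char *buffer, size_t count)
{
	(void)count;
	return stub_ping_show((char *)buffer);
}

/* Sketch of the suggested behaviour: pass errors through, otherwise
 * report the whole buffer as consumed. */
static ssize_t ping_store_fixed(const char *buffer, size_t count)
{
	ssize_t rc = stub_ping_show((char *)buffer);

	return rc < 0 ? rc : (ssize_t)count;
}

/* Simulated caller that retries short writes, as the shell does.
 * max_tries caps the loop so the buggy case terminates here.
 * Returns the number of store() calls made. */
static int write_all(ssize_t (*store)(const char *, size_t),
		     const char *buf, size_t len, int max_tries)
{
	size_t done = 0;
	int tries = 0;

	while (done < len && tries < max_tries) {
		ssize_t n = store(buf + done, len - done);

		tries++;
		if (n < 0)
			break;          /* real error: stop retrying */
		done += (size_t)n;      /* n == 0 makes no progress */
	}
	return tries;
}
```

With fake_ping_rc left at 0, write_all(ping_store_buggy, ...) runs until the max_tries cap, sending one ping per attempt, whereas write_all(ping_store_fixed, ...) finishes after a single call.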


 Comments   
Comment by Gerrit Updater [ 21/Apr/20 ]

Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38304
Subject: LU-13470 ptlrpc: return proper write count from ping_store
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2b9b5146f31778aaf0e2a1fa4b1b7f7d06245c05

Comment by Gerrit Updater [ 07/May/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38304/
Subject: LU-13470 ptlrpc: return proper write count from ping_store
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 16d62976d212d94f7a3a5e61817b5ce98a4be3fd

Comment by Peter Jones [ 07/May/20 ]

Landed for 2.14
