[LU-13470] sysfs ping write creates a flood-ping situation that could not be normally stopped - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Minor
Fix Version/s: Lustre 2.14.0
Affects Version/s: Lustre 2.14.0, Lustre 2.12.5
Labels:
None

Severity:
3
Rank (Obsolete):
9223372036854775807

Description

It looks like when the ping file was migrated to sysfs, it unfortunately introduced a bug when doing a write:

ssize_t ping_store(struct kobject *kobj, struct attribute *attr,
                   const char *buffer, size_t count)
{
        return ping_show(kobj, attr, (char *)buffer);
}

What it really should be doing is returning count; otherwise the outer logic thinks it's a short write that needs to be retried (errno = 0) and enters a loop that you cannot really break, short of disconnecting from the server:

[root@centos6-16 ~]# cat /sys/fs/lustre/mdc/lustre-MDT0000-mdc-ffff880387d67800/ping
[root@centos6-16 ~]# echo blahblah > /sys/fs/lustre/mdc/lustre-MDT0000-mdc-ffff880387d67800/ping

^Z
^C

We can see how the CPU is eaten by all the retries and pings:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2529 root      20   0       0      0      0 R  27.8  0.0  20:07.59 socknal_sd+
12488 root      20   0  115568   2124   1612 S  27.8  0.0  12:48.58 bash       
 2530 root      20   0       0      0      0 S  27.5  0.0  20:09.36 socknal_sd+
 3861 root      20   0       0      0      0 S  14.2  0.0   4:11.05 mdt03_002  
16784 root      20   0       0      0      0 S  10.6  0.0   4:04.16 mdt03_004  
 4410 root      20   0       0      0      0 S   5.0  0.0   4:08.74 mdt03_003  
 3859 root      20   0       0      0      0 S   2.6  0.0   4:11.23 mdt03_000  
   55 root      20   0       0      0      0 S   0.3  0.0   0:22.51 rcuos/6    
 3860 root      20   0       0      0      0 S   0.3  0.0   3:58.34 mdt03_001  
15467 green     20   0  162104   2408   1524 R   0.3  0.0   0:00.05 top
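
The retry behaviour described above can be sketched in userspace. This is only an illustration, not the kernel code: `mock_ping_show`, `ping_store_buggy`, `ping_store_fixed`, `write_all` and the `max_calls` cap are hypothetical names introduced here, and the sketch assumes ping_show returns 0 on a successful ping and a negative value on error. The fixed variant shows one plausible shape of "return count"; the merged patch may differ in detail.

```c
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical stand-in for a sysfs store handler (not the kernel signature). */
typedef ssize_t (*store_fn)(const char *buf, size_t count);

int pings_sent;          /* counts how many "pings" the mock has issued */

/* Stand-in for ping_show(): ASSUMED to return 0 when the ping succeeds. */
ssize_t mock_ping_show(void)
{
        pings_sent++;
        return 0;
}

/* Mirrors the reported bug: the ping result (0) is returned as the
 * number of bytes written. */
ssize_t ping_store_buggy(const char *buf, size_t count)
{
        (void)buf;
        (void)count;
        return mock_ping_show();
}

/* Sketch of the fix: propagate an error, otherwise report that the
 * whole buffer was consumed. */
ssize_t ping_store_fixed(const char *buf, size_t count)
{
        ssize_t rc;

        (void)buf;
        rc = mock_ping_show();
        return rc < 0 ? rc : (ssize_t)count;
}

/* Models a caller that retries short writes until everything is written.
 * Returns the number of store calls made, or -1 on error or when
 * max_calls is reached (the cap exists only so the demo terminates). */
int write_all(store_fn store, const char *buf, size_t count, int max_calls)
{
        size_t done = 0;
        int calls = 0;

        while (done < count) {
                ssize_t n;

                if (calls == max_calls)
                        return -1;
                n = store(buf + done, count - done);
                calls++;
                if (n < 0)
                        return -1;
                done += (size_t)n;   /* n == 0 makes no progress at all */
        }
        return calls;
}
```

With the fixed handler, `write_all(ping_store_fixed, "blahblah\n", 9, 1000)` finishes after a single call and a single ping. With the buggy handler the same call never makes progress and issues one ping per retry until the cap is hit, which matches the flood of pings seen in the top output.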

Issue Links

is related to

LU-8066 Move lustre procfs handling to sysfs and debugfs.

Open

Activity

Peter Jones added a comment - 07/May/20 1:34 PM

Landed for 2.14

Gerrit Updater added a comment - 07/May/20 5:41 AM

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38304/
Subject: LU-13470 ptlrpc: return proper write count from ping_store
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 16d62976d212d94f7a3a5e61817b5ce98a4be3fd

Gerrit Updater added a comment - 21/Apr/20 10:30 PM

Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38304
Subject: LU-13470 ptlrpc: return proper write count from ping_store
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2b9b5146f31778aaf0e2a1fa4b1b7f7d06245c05

People

Assignee: WC Triage

Reporter: Oleg Drokin

Votes: 0

Watchers: 4

Dates

Created: 21/Apr/20 10:06 PM

Updated: 07/May/20 1:34 PM

Resolved: 07/May/20 1:34 PM