Details
Bug
Resolution: Fixed
Minor
Lustre 2.14.0, Lustre 2.12.5
None
3
9223372036854775807
Description
It looks like when the ping file was migrated to sysfs, a bug was unfortunately introduced in the write path:
    ssize_t ping_store(struct kobject *kobj, struct attribute *attr,
                       const char *buffer, size_t count)
    {
            return ping_show(kobj, attr, (char *)buffer);
    }
What it really should be doing is returning count. Otherwise the outer logic thinks it got a short write that needs to be retried (with errno = 0), and it enters a loop that you cannot really break short of disconnecting from the server:
[root@centos6-16 ~]# cat /sys/fs/lustre/mdc/lustre-MDT0000-mdc-ffff880387d67800/ping
[root@centos6-16 ~]# echo blahblah > /sys/fs/lustre/mdc/lustre-MDT0000-mdc-ffff880387d67800/ping
^Z
^C
We can now see the CPU being eaten by all the retries and pings:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2529 root      20   0       0      0      0 R  27.8  0.0  20:07.59 socknal_sd+
12488 root      20   0  115568   2124   1612 S  27.8  0.0  12:48.58 bash
 2530 root      20   0       0      0      0 S  27.5  0.0  20:09.36 socknal_sd+
 3861 root      20   0       0      0      0 S  14.2  0.0   4:11.05 mdt03_002
16784 root      20   0       0      0      0 S  10.6  0.0   4:04.16 mdt03_004
 4410 root      20   0       0      0      0 S   5.0  0.0   4:08.74 mdt03_003
 3859 root      20   0       0      0      0 S   2.6  0.0   4:11.23 mdt03_000
   55 root      20   0       0      0      0 S   0.3  0.0   0:22.51 rcuos/6
 3860 root      20   0       0      0      0 S   0.3  0.0   3:58.34 mdt03_001
15467 green     20   0  162104   2408   1524 R   0.3  0.0   0:00.05 top
Issue Links
- is related to: LU-8066 Move lustre procfs handling to sysfs and debugfs. (Open)