[LU-3112] Cannot get stats from MDS and OSS after enable ptlrpc module parameter suppress_pings Created: 05/Apr/13  Updated: 08/Oct/21  Resolved: 08/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Sarah Liu Assignee: Li Wei (Inactive)
Resolution: Won't Do Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 7563

 Description   

After enabling the ptlrpc module parameter suppress_pings, the stats files on the MDS and OSS read back empty. With suppress_pings disabled, they work fine.

MDS:

[root@fat-amd-1 ~]# cat /sys/module/ptlrpc/parameters/suppress_pings 
1
[root@fat-amd-1 ~]# cat /proc/fs/lustre/mdt/lustre-MDT0000/stats 
[root@fat-amd-1 ~]# 

OSS:

[root@fat-amd-3 ~]# cat /proc/fs/lustre/obdfilter/lustre-OST0000/stats 
[root@fat-amd-3 ~]# 


 Comments   
Comment by Li Wei (Inactive) [ 08/Apr/13 ]

I can't reproduce this on the OSS:

[root@linux tests]# PTLDEBUG=-1 DEBUG_SIZE=64 MODOPTS_PTLRPC="suppress_pings=1" MGSDEV=/tmp/lustre-mgs ./llmount.sh 
[...]
disable quota as required       
[root@linux tests]# cat /proc/fs/lustre/obdfilter/lustre-OST0000/stats 
snapshot_time             1365393105.574018 secs.usecs
get_info                  2 samples [reqs]
set_info_async            2 samples [reqs]
connect                   2 samples [reqs]
statfs                    9 samples [reqs]
create                    2 samples [reqs]

As for the MDS, this procfs entry should be used instead:

[root@linux tests]# cat /proc/fs/lustre/mds/MDS/mdt/stats 
snapshot_time             1365393296.454917 secs.usecs
req_waittime              17 samples [usec] 46 259 1483 193635
req_qdepth                17 samples [reqs] 0 0 0 0
req_active                17 samples [reqs] 1 1 17 17
req_timeout               17 samples [sec] 1 10 35 215
reqbuf_avail              40 samples [bufs] 64 64 2560 163840
ldlm_ibits_enqueue        3 samples [reqs] 1 1 3 3
mds_getattr               1 samples [usec] 144 144 144 20736
mds_connect               6 samples [usec] 102 291 1116 232976
mds_getstatus             1 samples [usec] 77 77 77 5929
mds_statfs                4 samples [usec] 80 239 524 84678
obd_ping                  1 samples [usec] 126 126 126 15876
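
For reading the timing lines above, a minimal sketch that prints the mean per counter. It assumes the four numbers after the unit are min, max, sum and sum of squares, which matches the single-sample lines (for example mds_getattr: 144 144 144 20736). The function name stats_mean is hypothetical and not part of Lustre.

```shell
# stats_mean: summarize Lustre-style stats lines read from stdin or files.
# Assumed field layout: name count "samples" [unit] min max sum sumsq
# Lines without min/max/sum (e.g. "get_info 2 samples [reqs]") are skipped.
stats_mean() {
    awk '$3 == "samples" && NF >= 7 && $2 > 0 {
        printf "%s %s mean=%.1f min=%s max=%s\n", $1, $4, $7 / $2, $5, $6
    }' "$@"
}
```

Usage: stats_mean /proc/fs/lustre/mds/MDS/mdt/stats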
Comment by Li Wei (Inactive) [ 08/Apr/13 ]

P.S. I also tried "suppress_pings=0". The lustre-MDT0000 stats file was empty in that case as well:

[root@linux tests]# PTLDEBUG=-1 DEBUG_SIZE=64 MODOPTS_PTLRPC="suppress_pings=0" MGSDEV=/tmp/lustre-mgs ./llmount.sh 
[...]

disable quota as required
[root@linux tests]# cat /proc/fs/lustre/mds/MDS/mdt/stats 
snapshot_time             1365393695.697075 secs.usecs
req_waittime              23 samples [usec] 47 1264 10682 8456720
req_qdepth                23 samples [reqs] 0 0 0 0
req_active                23 samples [reqs] 1 1 23 23
req_timeout               23 samples [sec] 1 10 41 221
reqbuf_avail              50 samples [bufs] 64 64 3200 204800
mds_getattr               1 samples [usec] 162 162 162 26244
mds_connect               6 samples [usec] 102 507 1460 479340
mds_getstatus             1 samples [usec] 102 102 102 10404
mds_statfs                2 samples [usec] 115 926 1041 870701
obd_ping                  13 samples [usec] 80 117 1264 124682
[root@linux tests]# cat /proc/fs/lustre/mdt/lustre-MDT0000/stats 
[root@linux tests]# 
Comment by Sarah Liu [ 09/Apr/13 ]

Have you tried on multiple nodes? I can also get the OSS stats when everything runs on one node, but I still get nothing when the servers and clients are on separate nodes:

[root@client-5 tests]# PDSH="pdsh -t 300 -S -w" NAME=ncli mgs_HOST=client-18 MGSDEV=/dev/sda3 SINGLEMDS=mds mds_HOST=fat-amd-1  ost_HOST=fat-amd-3 ost1_HOST=$ost_HOST MDSDEV=/dev/sdb1 MDSDEV1=$MDSDEV MDSSIZE=2000000 OSTCOUNT=1 OSTSIZE=25000000 OSTDEV1=/dev/sdc1 MODOPTS_PTLRPC="suppress_pings=1" REFORMAT="--reformat" DEBUG_SIZE=48 SLOW=yes  bash llmount.sh

[root@fat-amd-3 ~]# cat /proc/fs/lustre/obdfilter/lustre-OST0000/stats
[root@fat-amd-3 ~]#
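
To compare nodes quickly, a minimal sketch that reports whether each stats file mentioned in this ticket is missing, empty, or populated. The helper name check_stats is hypothetical. The content is read with cat rather than tested with `[ -s ]`, because procfs files normally report a size of zero even when they have content.

```shell
# check_stats: print "<path>: missing", "<path>: empty" or "<path>: N lines".
check_stats() {
    path=$1
    if [ ! -e "$path" ]; then
        echo "$path: missing"
        return
    fi
    # Read once; procfs content can change between reads.
    content=$(cat "$path" 2>/dev/null)
    if [ -z "$content" ]; then
        echo "$path: empty"
    else
        echo "$path: $(printf '%s\n' "$content" | wc -l | tr -d ' ') lines"
    fi
}

# Paths taken from the comments in this ticket.
for f in /proc/fs/lustre/obdfilter/lustre-OST0000/stats \
         /proc/fs/lustre/mdt/lustre-MDT0000/stats \
         /proc/fs/lustre/mds/MDS/mdt/stats; do
    check_stats "$f"
done
```

Run it on each server node (for example through pdsh, as in the llmount.sh invocation above) to see which entries are empty on which node.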