|
Niu,
Could you please advise on this one?
Thanks,
Peter
|
|
This might be related to LU-5726? There were a couple of other MDS memory issues (LU-5079, LU-5727), but those only affected 2.5 and later, unless you have backported patches into your 2.4.2 release?
|
|
Andreas,
It does look very similar.
thanks,
Haisong
|
|
Right now I have an MDS server that looks like it is heading toward a memory problem.
== Here is the "top" output:
Tasks: 820 total, 3 running, 817 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 7.3%sy, 0.0%ni, 91.4%id, 1.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 24730000k total, 19445820k used, 5284180k free, 16473344k buffers
Swap: 1020116k total, 16056k used, 1004060k free, 12956k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3305 root 20 0 0 0 0 R 99.9 0.0 276:03.80 socknal_sd00_00
3314 root 20 0 0 0 0 S 4.0 0.0 98:28.48 socknal_sd03_00
3960 root 20 0 0 0 0 S 1.7 0.0 4:25.62 mdt_rdpg03_003
18062 root 20 0 0 0 0 S 1.3 0.0 0:47.64 mdt03_018
3428 root 20 0 0 0 0 S 1.0 0.0 9:56.10 mdt03_001
3429 root 20 0 0 0 0 S 1.0 0.0 11:26.84 mdt03_002
3708 root 20 0 0 0 0 S 1.0 0.0 11:32.89 mdt03_005
6209 root 20 0 0 0 0 S 1.0 0.0 10:31.10 mdt03_007
16559 root 20 0 0 0 0 S 1.0 0.0 3:36.86 mdt03_013
16746 root 20 0 0 0 0 S 1.0 0.0 3:35.11 mdt03_014
3427 root 20 0 0 0 0 S 0.7 0.0 11:14.18 mdt03_000
3641 root 20 0 0 0 0 S 0.7 0.0 10:59.76 mdt03_003
3703 root 20 0 0 0 0 S 0.7 0.0 9:50.36 mdt03_004
7181 root 20 0 0 0 0 S 0.7 0.0 9:41.32 mdt03_009
8921 root 20 0 0 0 0 S 0.7 0.0 7:57.07 mdt03_012
18061 root 20 0 0 0 0 S 0.7 0.0 0:52.57 mdt03_017
18405 root 20 0 15560 1832 940 R 0.7 0.0 0:00.13 top
234 root 39 19 0 0 0 S 0.3 0.0 9:27.27 kipmi0
3187 root 20 0 0 0 0 S 0.3 0.0 92:39.94 md0_raid10
3306 root 20 0 0 0 0 S 0.3 0.0 48:36.17 socknal_sd00_01
3309 root 20 0 0 0 0 S 0.3 0.0 17:21.89 socknal_sd01_01
3339 root 20 0 0 0 0 S 0.3 0.0 1:30.66 ptlrpcd_15
6214 root 20 0 0 0 0 S 0.3 0.0 14:40.83 mdt01_011
...
== 2 socknal_sd* processes are hanging:
[root@meerkat-mds-10-1 tmp]# ps -ef | grep 3314
root 3314 2 3 Nov01 ? 01:38:28 [socknal_sd03_00]
[root@meerkat-mds-10-1 tmp]# ps -ef | grep 3305
root 3305 2 8 Nov01 ? 04:36:15 [socknal_sd00_00]
== dmesg of MDS shows clients, as well as OSS servers, timing out:
LustreError: 138-a: meerkat-MDT0000: A client on nid 192.168.230.53@tcp was evicted due to a lock blocking callback time out: rc -107
LustreError: 3630:0:(ldlm_lockd.c:2348:ldlm_cancel_handler()) ldlm_cancel from 192.168.230.53@tcp arrived at 1415059768 with bad export cookie 459840027438824761
Lustre: meerkat-MDT0000: Client e016f72b-cc4a-cee3-5faa-cdb0f5a24764 (at 10.7.102.192@o2ib) reconnecting
Lustre: Skipped 11 previous similar messages
Lustre: meerkat-MDT0000: Client d1fdcea8-c2ec-897d-75dc-7b3fe95da5a3 (at 10.7.102.119@o2ib) refused reconnection, still busy with 1 active RPCs
Lustre: meerkat-MDT0000: Client 677d7d3a-37ea-920c-c096-20a623186fa9 (at 10.7.103.114@o2ib) reconnecting
Lustre: Skipped 21 previous similar messages
LustreError: 13042:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 192.168.230.53@tcp ns: mdt-meerkat-MDT0000_UUID lock: ffff8800acc2f480/0x661ae11b70ceedf lrc: 3/0,0 mode: PR/PR res: [0x2000060bc:0x5508:0x0].0 bits 0x2 rrc: 2 type: IBT flags: 0x20 nid: 192.168.230.53@tcp remote: 0x7185dbd0be13beea expref: 36 pid: 3419 timeout: 4486388246 lvb_type: 0
LustreError: 13042:0:(ldlm_lib.c:2730:target_bulk_io()) @@@ bulk PUT failed: rc 107 req@ffff8800a3513800 x1474469876758132/t0(0) o37>453cd0d9-c8e3-0e50-da9e-7953a9c89205@192.168.230.53@tcp:0/0 lens 448/440 e 0 to 0 dl 1415060499 ref 1 fl Interpret:/0/0 rc 0/0
Lustre: 3328:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1415060429/real 1415060430] req@ffff88031321c800 x1483597813199484/t0(0) o13->meerkat-OST001c-osc@172.25.32.115@tcp:7/4 lens 224/368 e 0 to 1 dl 1415060438 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Lustre: meerkat-OST001c-osc: Connection to meerkat-OST001c (at 172.25.32.115@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 2 previous similar messages
Lustre: 3325:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1415060429/real 1415060430] req@ffff880636df9400 x1483597813199476/t0(0) o13->meerkat-OST0026-osc@172.25.32.243@tcp:7/4 lens 224/368 e 0 to 1 dl 1415060438 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Lustre: meerkat-OST0026-osc: Connection to meerkat-OST0026 (at 172.25.32.243@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: 3336:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1415060432/real 0] req@ffff8800b8211400 x1483597813201928/t0(0) o6->meerkat-OST0034-osc@172.25.32.115@tcp:28/4 lens 664/432 e 0 to 1 dl 1415060440 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Lustre: meerkat-OST0034-osc: Connection to meerkat-OST0034 (at 172.25.32.115@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: MGS: Client 340f1fa1-9370-bc71-a6e3-834f520374a2 (at 10.7.103.181@o2ib) reconnecting
Lustre: Skipped 7 previous similar messages
Lustre: meerkat-OST0014-osc: Connection to meerkat-OST0014 (at 172.25.32.115@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 5 previous similar messages
Lustre: 3325:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1415060432/real 0] req@ffff8800aa611400 x1483597813202612/t0(0) o6->meerkat-OST000e-osc@172.25.32.243@tcp:28/4 lens 664/432 e 0 to 1 dl 1415060443 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Lustre: 3325:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 82 previous similar messages
Lustre: 3319:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1415060440/real 0] req@ffff8801efc24800 x1483597813206052/t0(0) o8->meerkat-OST0036-osc@172.25.32.243@tcp:28/4 lens 400/544 e 0 to 1 dl 1415060447 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 3319:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 52 previous similar messages
Lustre: meerkat-MDT0000: Client 06bc4379-10ca-76ad-cd98-1d1013f1b911 (at 10.7.103.252@o2ib) refused reconnection, still busy with 1 active RPCs
Lustre: 18052:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1415060449/real 1415060451] req@ffff8802ebaad800 x1483597813206540/t0(0) o104->meerkat-MDT0000@10.7.103.252@o2ib:15/16 lens 296/224 e 0 to 1 dl 1415060456 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 18052:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 18 previous similar messages
Lustre: meerkat-OST0004-osc: Connection restored to meerkat-OST0004 (at 172.25.32.115@tcp)
Lustre: Skipped 2 previous similar messages
Lustre: meerkat-OST0006-osc: Connection restored to meerkat-OST0006 (at 172.25.32.243@tcp)
Lustre: meerkat-OST000c-osc: Connection restored to meerkat-OST000c (at 172.25.32.115@tcp)
Lustre: meerkat-MDT0000: Client 7a7ab9a5-c8e6-abb6-2f14-1ebe9b1fdab3 (at 10.7.104.32@o2ib) reconnecting
Lustre: Skipped 218 previous similar messages
Lustre: 3325:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1415060911/real 1415060921] req@ffff8800a7a9fc00 x1483597814687272/t0(0) o6->meerkat-OST000c-osc@172.25.32.115@tcp:28/4 lens 664/432 e 0 to 1 dl 1415060925 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Lustre: meerkat-OST000c-osc: Connection to meerkat-OST000c (at 172.25.32.115@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 7 previous similar messages
Lustre: meerkat-OST003c-osc: Connection to meerkat-OST003c (at 172.25.32.115@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: meerkat-OST000c-osc: Connection restored to meerkat-OST000c (at 172.25.32.115@tcp)
Lustre: Skipped 13 previous similar messages
LustreError: 11-0: meerkat-OST0006-osc: Communicating with 172.25.32.243@tcp, operation ost_connect failed with -16.
LustreError: Skipped 2 previous similar messages
Lustre: meerkat-OST0024-osc: Connection restored to meerkat-OST0024 (at 172.25.32.115@tcp)
Lustre: Skipped 6 previous similar messages
Lustre: meerkat-MDT0000: Client 22670471-b57b-0d1a-cd38-f4f39735b005 (at 10.7.103.146@o2ib) reconnecting
Lustre: Skipped 34 previous similar messages
== kill -9 of processes 3305 and 3314 (socknal_sd00_00 & socknal_sd03_00) fails
|
|
Hi Cai,
What is the total memory of the MDS, and what is the value of min_free_kbytes? Could you try increasing min_free_kbytes as I suggested in LU-5726 to see if it helps? Thanks.
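(For reference, a minimal sketch of raising min_free_kbytes to roughly 5% of RAM; the 5% target and the /etc/sysctl.conf persistence line are illustrative assumptions, not an exact prescription from LU-5726.)
# compute ~5% of MemTotal (in kB) and apply it at runtime
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
sysctl -w vm.min_free_kbytes=$(( total_kb / 20 ))
# persist across reboots
echo "vm.min_free_kbytes = $(( total_kb / 20 ))" >> /etc/sysctl.conf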
|
|
Hi Yawei,
Here is the min_free_kbytes setting, about 5% of total RAM as you suggested:
[root@meerkat-mds-10-1 ~]# sysctl -a | grep free_kbytes
vm.min_free_kbytes = 1228800
vm.extra_free_kbytes = 0
[root@meerkat-mds-10-1 ~]# free
total used free shared buffers cached
Mem: 24730000 22800600 1929400 0 20016692 15880
-/+ buffers/cache: 2768028 21961972
Swap: 1020116 24828 995288
Haisong
|
|
Overnight into this morning, the MDS has dumped more errors, including some messages I haven't seen before.
I am including the dmesg output here for debugging purposes.
Haisong
|
|
The MDS came to a point where it became unresponsive: system load at 65, buffer memory at 20 GB out of the 24 GB total and not being released.
I attempted to unmount the MDT and reboot the MDS, at which point the server kernel panicked.
Screen dump attached here.
|
|
Do you have vm.zone_reclaim_mode=0 set on your MDS server? I ran into issues with sluggish MDS server performance earlier this year that were fixed by setting that parameter.
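(For anyone following along, a minimal sketch of checking and disabling zone reclaim; the /etc/sysctl.conf line is the usual way to persist it, but adjust to your own configuration management.)
# check the current setting
cat /proc/sys/vm/zone_reclaim_mode
# disable zone reclaim at runtime
sysctl -w vm.zone_reclaim_mode=0
# persist across reboots
echo "vm.zone_reclaim_mode = 0" >> /etc/sysctl.conf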
|
|
Rick,
Thank you for the note. I saw your comments in LU-5726 today and have disabled vm.zone_reclaim_mode.
In LU-5726, you commented that after disabling vm.zone_reclaim_mode it "... just took longer for the same underlying problem to become evident again". Has the problem recurred on your MDS?
thanks,
Haisong
|
|
Hi Haisong,
The log and stack trace show that the server ran into an OOM situation at the end, and the initial cause is an unstable network. We can see lots of client reconnects and bulk I/O timeout errors on the MDT at the beginning; could you check whether your network is healthy?
The last crash, in lu_context_key_degister(), is a dup of LU-3806, I think.
|
|
Hi Yawei,
Typical symptoms of this problem, in our case at least, have been hanging processes, whether LNET, MDT, or MGC. Not only do processes hang; a lot of the time the MDS OS itself hangs for a few minutes at a time. What you are seeing, I believe, are the results of some hanging LNET or Lustre network processes, followed by disconnections from the OSS/OST servers and clients.
We have implemented the suggestion from Rick Mohr by disabling vm.zone_reclaim_mode on the MDS. So far the MDS has been behaving. We will continue monitoring.
thanks,
Haisong
|
|
Haisong, to clarify, you are now running your MDS with vm.zone_reclaim_mode=0 and that has resolved, or at least reduced the memory problems?
We should consider setting this tunable by default on MDS nodes via mount.lustre, as we do with other tunables. There is some concern that this would go against the administrator's own tunings, and I'm not sure how best to handle that...
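(One hypothetical approach, purely as a sketch and not how mount.lustre behaves today: have the mount path set the tunable only when it still holds a non-zero value, and log the override so an administrator who set it deliberately can notice and change it back.)
# hypothetical pre-mount hook, illustration only
current=$(cat /proc/sys/vm/zone_reclaim_mode)
if [ "$current" -ne 0 ]; then
    echo "mount.lustre: overriding vm.zone_reclaim_mode=$current with 0 for MDS"
    sysctl -w vm.zone_reclaim_mode=0
fi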
|
|
Andreas,
Indeed, we have had vm.zone_reclaim_mode=0 set on our MDS servers since last Wednesday. From observation
using "collectl -sM", there are two noticeable changes:
1) buffer memory doesn't grow like it used to, and
2) used memory is balanced between the two NUMA nodes, where before one was 2 or 3 times higher than the other.
Here is a sample output I got just now:
[root@meerkat-mds-10-2 ~]# collectl -sM -i 10
waiting for 10 second sample...
- MEMORY STATISTICS
- Node Total Used Free Slab Mapped Anon Locked Inact Hit%
0 12279M 10422M 1856M 2458M 3140K 41836K 0 3528M 100.00
1 12288M 9831M 2456M 3529M 2988K 33116K 0 2768M 100.00
0 12279M 10422M 1856M 2458M 3140K 41840K 0 3528M 100.00
1 12288M 9832M 2455M 3529M 2988K 33112K 0 2767M 100.00
0 12279M 10422M 1856M 2457M 3048K 41836K 0 3528M 100.00
1 12288M 9833M 2454M 3530M 2988K 33004K 0 2767M 100.00
0 12279M 10423M 1855M 2458M 3140K 41844K 0 3528M 100.00
1 12288M 9835M 2452M 3532M 2988K 33108K 0 2768M 100.00
Haisong
|
|
Haisong,
Disabling zone_reclaim_mode seemed to fix our original issue with sluggish MDS performance, although I really don't know whether this is in any way directly related to LU-5726.
|
|
Could you check whether the fix for LU-5726 resolves your problem as well? Thanks.
|
|
Dup of LU-5726.
|