[LU-179] lustre client lockup when under memory pressure Created: 30/Mar/11  Updated: 26/Feb/16  Resolved: 06/Jul/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 1.8.6

Type: Bug Priority: Minor
Reporter: Shuichi Ihara (Inactive) Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: None
Environment:

Client is running 2.6.27.45-lustre-1.8.3.ddn3.3. Connectivity is 10GigE


Attachments: File kern.log.gz    
Severity: 3
Rank (Obsolete): 10103

 Description   

A customer is seeing a problem where a client loses access to Lustre when the node is subjected to memory pressure from an errant application.

Lustre starts reporting -113 (No route to host) errors for certain NIDs in the filesystem despite the TCP/IP network being functional. The Lustre errors persist even after the memory pressure is relieved. I am currently collecting logs.

From the customer report:

LNet is reporting no-route-to-host for a significant number of OSSes/MDSes (client log attached).

Mar 29 09:23:27 cgp-bigmem kernel: [589295.826095] LustreError: 4980:0:(events.c:66:request_out_callback()) @@@ type 4, status 113 req@ffff881d2e995400 x1363985318437337/t0 o8->lus03-OST0000_UUID@172.17.128.130@tcp:28/4 lens 368/584 e 0 to 1 dl 1301387122 ref 2 fl Rpc:N/0/0 rc 0/0

but from user-space on the client, all those nodes are pingable:

cgp-bigmem:/var/log# ping 172.17.128.130
PING 172.17.128.130 (172.17.128.130) 56(84) bytes of data.
64 bytes from 172.17.128.130: icmp_seq=1 ttl=62 time=0.102 ms
64 bytes from 172.17.128.130: icmp_seq=2 ttl=62 time=0.091 ms
64 bytes from 172.17.128.130: icmp_seq=3 ttl=62 time=0.091 ms
64 bytes from 172.17.128.130: icmp_seq=4 ttl=62 time=0.090 ms

However, an LNet ping hangs:
cgp-bigmem:~# lctl ping 172.17.128.130@tcp

From another client, the ping works as expected:

farm2-head1:# lctl ping 172.17.128.130@tcp
12345-0@lo
12345-172.17.128.130@tcp

cgp-bigmem:~# lfs check servers | grep -v active
error: check 'lus01-OST0007-osc-ffff88205bd52000' Resource temporarily unavailable
error: check 'lus01-OST0008-osc-ffff88205bd52000' Resource temporarily unavailable
error: check 'lus01-OST0009-osc-ffff88205bd52000' Resource temporarily unavailable
error: check 'lus01-OST000a-osc-ffff88205bd52000' Resource temporarily unavailable
error: check 'lus01-OST000b-osc-ffff88205bd52000' Resource temporarily unavailable
error: check 'lus01-OST000c-osc-ffff88205bd52000' Resource temporarily unavailable
error: check 'lus01-OST000d-osc-ffff88205bd52000' Resource temporarily unavailable
error: check 'lus01-OST000e-osc-ffff88205bd52000' Resource temporarily unavailable
error: check 'lus02-MDT0000-mdc-ffff8880735ea000' Resource temporarily unavailable
error: check 'lus03-OST0000-osc-ffff8840730a1400' Resource temporarily unavailable
error: check 'lus03-OST0001-osc-ffff8840730a1400' Resource temporarily unavailable
error: check 'lus03-OST0002-osc-ffff8840730a1400' Resource temporarily unavailable
error: check 'lus03-OST0003-osc-ffff8840730a1400' Resource temporarily unavailable
error: check 'lus03-OST0004-osc-ffff8840730a1400' Resource temporarily unavailable
error: check 'lus03-OST0005-osc-ffff8840730a1400' Resource temporarily unavailable
error: check 'lus03-OST0006-osc-ffff8840730a1400' Resource temporarily unavailable
error: check 'lus03-OST0007-osc-ffff8840730a1400' Resource temporarily unavailable
error: check 'lus03-OST0008-osc-ffff8840730a1400' Resource temporarily unavailable
error: check 'lus03-OST0009-osc-ffff8840730a1400' Resource temporarily unavailable
error: check 'lus03-OST000a-osc-ffff8840730a1400' Resource temporarily unavailable
error: check 'lus03-OST000b-osc-ffff8840730a1400' Resource temporarily unavailable
error: check 'lus03-OST000c-osc-ffff8840730a1400' Resource temporarily unavailable
error: check 'lus03-OST0019-osc-ffff8840730a1400' Resource temporarily unavailable
error: check 'lus03-OST001a-osc-ffff8840730a1400' Resource temporarily unavailable
error: check 'lus05-OST0010-osc-ffff886070dab800' Resource temporarily unavailable
error: check 'lus05-OST0012-osc-ffff886070dab800' Resource temporarily unavailable
error: check 'lus05-OST0014-osc-ffff886070dab800' Resource temporarily unavailable
error: check 'lus05-OST0016-osc-ffff886070dab800' Resource temporarily unavailable
error: check 'lus05-OST0018-osc-ffff886070dab800' Resource temporarily unavailable
error: check 'lus05-OST001a-osc-ffff886070dab800' Resource temporarily unavailable
error: check 'lus05-OST001c-osc-ffff886070dab800' Resource temporarily unavailable
error: check 'lus05-OST000f-osc-ffff886070dab800' Resource temporarily unavailable
error: check 'lus05-OST0011-osc-ffff886070dab800' Resource temporarily unavailable
error: check 'lus05-OST0013-osc-ffff886070dab800' Resource temporarily unavailable
error: check 'lus05-OST0015-osc-ffff886070dab800' Resource temporarily unavailable
error: check 'lus05-OST0017-osc-ffff886070dab800' Resource temporarily unavailable
error: check 'lus05-OST0019-osc-ffff886070dab800' Resource temporarily unavailable
error: check 'lus05-OST001b-osc-ffff886070dab800' Resource temporarily unavailable
error: check 'lus05-OST001d-osc-ffff886070dab800' Resource temporarily unavailable
error: check 'lus04-OST0001-osc-ffff88806e9d8c00' Resource temporarily unavailable
error: check 'lus04-OST0003-osc-ffff88806e9d8c00' Resource temporarily unavailable
error: check 'lus04-OST0005-osc-ffff88806e9d8c00' Resource temporarily unavailable
error: check 'lus04-OST0007-osc-ffff88806e9d8c00' Resource temporarily unavailable
error: check 'lus04-OST0009-osc-ffff88806e9d8c00' Resource temporarily unavailable
error: check 'lus04-OST000b-osc-ffff88806e9d8c00' Resource temporarily unavailable
error: check 'lus04-OST000d-osc-ffff88806e9d8c00' Resource temporarily unavailable



 Comments   
Comment by Peter Jones [ 30/Mar/11 ]

Bobijam

Could you please look into this one?

Thanks

Peter

Comment by Zhenyu Xu [ 01/Apr/11 ]

Can we get more logs?

Comment by Liang Zhen (Inactive) [ 01/Apr/11 ]

I think this could be fixed by the patch on bug 21776; just FYI.
Liang

Comment by Zhenyu Xu [ 01/Apr/11 ]

The patch at https://bugzilla.lustre.org/attachment.cgi?id=29521&action=edit for 1.8.1 can also be applied to v1_8_3.

We'd still like to check the logs to see whether this issue matches the symptoms of bz 21776.

Comment by Ashley Pittman (Inactive) [ 13/Apr/11 ]

The customer is satisfied that it's the above issue and is preparing to update. Unless the problem recurs, they are going to wait for 1.8.6 and go straight to that.

We'll reopen if we observe the issue again once the patch is applied.

Comment by Zhenyu Xu [ 13/Apr/11 ]

dup of bug 21776

Comment by Cory Spitz [ 16/Apr/11 ]

If it is a dup of 21776, note that the o2iblnd does not use PF_MEMALLOC with the patches that landed for 21776. Also, we (Cray) are chasing a similar instance down the LNetMDAttach() path, so other paths might need more help.

Comment by Cory Spitz [ 19/Apr/11 ]

Regarding the issues with LNetMDAttach(): ptl_send_rpc() calls libcfs_memory_pressure_get_and_set(), but not until after ptlrpc_register_bulk(), which calls LNetMDAttach(). We have seen LNetMDAttach() fail to allocate in that case, which causes the writeback to hang, and deadlock ensues. However, I'm wary of creeping the scope in which PF_MEMALLOC is used to bail out ptlrpc. Comments?
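
In other words, the problematic call order on 1.8 is roughly the following (a simplified sketch based on the description above, not verbatim source):

    ptl_send_rpc()
        ptlrpc_register_bulk()                  /* calls LNetMDAttach(); the allocation
                                                   can fail under memory pressure */
        libcfs_memory_pressure_get_and_set()    /* too late: PF_MEMALLOC is only set
                                                   after bulk registration */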

Comment by Liang Zhen (Inactive) [ 19/Apr/11 ]

o2iblnd always tries to preallocate memory descriptors, so we don't have a patch for it.
After a quick look at the patch on BZ, I think the 1.8 version of the patch is not correct. In ptl_send_rpc():

  • 2.0: these two lines come before the call to ptlrpc_register_bulk()
  • 1.8: they come after ptlrpc_register_bulk(), which is wrong:

if (request->rq_memalloc)
        mpflag = libcfs_memory_pressure_get_and_set();

So I think we can fix it just by moving these two lines earlier.
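
For illustration, a minimal sketch of the reordering in ptl_send_rpc() (lustre/ptlrpc/niobuf.c). This is an assumption-laden sketch, not verbatim 1.8 source: the function and field names are taken from the comments above, libcfs_memory_pressure_restore() is assumed to mirror the setter, and error handling and unrelated setup are elided:

    int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
    {
            int mpflag = 0;
            int rc = 0;

            /* The fix: set PF_MEMALLOC *before* registering the bulk
             * descriptor, so the LNetMDAttach() call inside
             * ptlrpc_register_bulk() can allocate from the memory
             * reserves while the client is under memory pressure. */
            if (request->rq_memalloc)
                    mpflag = libcfs_memory_pressure_get_and_set();

            if (request->rq_bulk != NULL) {
                    rc = ptlrpc_register_bulk(request);
                    if (rc != 0)
                            goto out;
            }

            /* ... build the request and hand it to LNet ... */

    out:
            /* Assumed restore call; the name mirrors the setter above. */
            if (request->rq_memalloc)
                    libcfs_memory_pressure_restore(mpflag);
            return rc;
    }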

Comment by Zhenyu Xu [ 19/Apr/11 ]

Patch tracked at http://review.whamcloud.com/439

Comment by Zhenyu Xu [ 19/Apr/11 ]

1.8.6 needs a minor fix.

Comment by Build Master (Inactive) [ 20/Apr/11 ]

Integrated in lustre-b1_8 #17 across all build configurations (x86_64/i686, client/server, el5/el6/ubuntu1004, inkernel/ofa):
LU-179 Should let PF_MEMALLOC cover ptlrpc_register_bulk()

Johann Lombardi : 134461a981b134de896d9aa9cc6ec2d816cfa044
Files :

  • lustre/ptlrpc/niobuf.c
Comment by Zhenyu Xu [ 20/Apr/11 ]

The minor fix has landed on b1_8.

Comment by Peter Jones [ 21/Apr/11 ]

Does this fix need to land on master?

Comment by Zhenyu Xu [ 21/Apr/11 ]

No, it is already in master.

Comment by Cory Spitz [ 27/Apr/11 ]

Hi. I agree with the fix to relocate the bulk registration. Are you planning to submit it upstream (to Oracle)?

Comment by Build Master (Inactive) [ 08/Jun/11 ]

Integrated in lustre-b1_8 #71 across all build configurations (x86_64/i686, client/server, el5/el6/ubuntu1004, inkernel/ofa):
Remove changelog entry for LU-179

Johann Lombardi : 08b76cd92b2a4b6854ce3910a07531996449a9fd
Files :

  • lustre/ChangeLog
Comment by Guy Coates [ 10/Jun/11 ]

Client log attached.

Comment by Guy Coates [ 10/Jun/11 ]

We've just had a recurrence of this problem running 1.8.5.56 (as tagged in git).
The client starts logging problems at Jun 9 14:49:13.

Comment by Guy Coates [ 13/Jun/11 ]

I was able to get output from top during the last client lockup; pdflush is sitting at 100% CPU.

top - 09:10:36 up 2 days, 20:08, 2 users, load average: 801.64, 799.78, 796.51
Tasks: 2891 total, 36 running, 2855 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 25.1%sy, 0.0%ni, 70.8%id, 4.1%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 528386840k total, 70774068k used, 457612772k free, 112k buffers
Swap: 4192924k total, 0k used, 4192924k free, 81176k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13691 cgppipe 39 19 23.6g 23g 9992 S 201 4.6 7646:35 java
5640 cgppipe 39 19 3122m 2.7g 744 S 100 0.5 5717:22 bwa
18662 root 0 -20 4 4 0 R 100 0.0 3756:26 elim.uptime
153 root 20 0 0 0 0 R 100 0.0 3759:05 pdflush
5528 root 20 0 13992 1528 900 R 100 0.0 3761:22 pim
1809 root 20 0 56440 7628 2240 R 3 0.0 0:04.24 top
4612 root 20 0 8832 532 404 S 0 0.0 2:30.10 irqbalance

Comment by Ashley Pittman (Inactive) [ 20/Jun/11 ]

As above, the customer was still observing this problem using the latest code on 10th Jun; could you reopen this bug accordingly?

Comment by Zhenyu Xu [ 20/Jun/11 ]

Is it the same pattern, i.e. lctl ping hangs while a regular ping works OK?

Comment by Ashley Pittman (Inactive) [ 20/Jun/11 ]

Yes.

The system is a NUMA system, currently with 512 GB of RAM. The problem seems to happen during memory pressure; a figure of 70% usage has been quoted, but it's worth noting that the application is single-threaded, so it's quite likely that some NUMA nodes are experiencing 100% memory usage.

One thing I've suggested is pinning the application to a different NUMA node from the Lustre kernel threads (if that is even possible), so the application wouldn't starve Lustre of memory so easily.
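
For instance, something along these lines with numactl (the node number and application name are illustrative only):

cgp-bigmem:~# numactl --cpunodebind=1 --membind=1 <application>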

Comment by Zhenyu Xu [ 20/Jun/11 ]

Can you help check what the Lustre threads were doing during this hang? (Thread stacks would be ideal.)
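
If sysrq is enabled on the client (sysctl kernel.sysrq=1), one common way to capture them is to dump all task stacks to the kernel log:

cgp-bigmem:~# echo t > /proc/sysrq-trigger
cgp-bigmem:~# dmesg > /tmp/task-stacks.txt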

Comment by Guy Coates [ 06/Jul/11 ]

We upgraded this machine from a 2.6.27/SLES11 kernel + 1.8.5.56 Lustre client to a 2.6.32 kernel + 1.8.5.56 Lustre client, and the problem seems to have stopped.

You can close this issue.

Thanks,

Guy

Comment by Zhenyu Xu [ 06/Jul/11 ]

Closing the ticket per Guy Coates' update.
