
Client eviction on lock callback timeout

Details


    Description

      Our testing has revealed that Lustre 2.1 is far more likely than 1.8 to return short reads and writes (the return code indicates fewer bytes read or written than requested).

      So far, the most frequent reproducer is IOR on a shared single file, with a transfer size of 128MB, a block size of 256MB, 32 client nodes, and 512 tasks spread evenly over the clients.

      The file is only striped over 2 OSTs.

      When the read() or write() return value is less than the requested amount, the size is, in every instance that I have seen thus far, a multiple of 1MB.

      I suspect that other loads will show the same problem. I think that our more common large-transfer-request workloads come from our file-per-process apps, though, so we'll run some tests to see if it is easy to reproduce there as well.
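      The attachment list below includes a reproducer.c; since its contents are not reproduced here, the following is only a hypothetical sketch of the kind of check involved: issue one large write()/read() against a file on the Lustre mount and report whenever the syscall returns fewer bytes than requested (and whether the short count is a multiple of 1MB). The path and transfer size are illustrative only.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define XFER_SIZE (128UL << 20)   /* 128MB, as in the IOR runs above */

int main(int argc, char **argv)
{
    /* Hypothetical test path; any file on the Lustre mount would do. */
    const char *path = argc > 1 ? argv[1] : "/mnt/lustre/short_rw_test";
    char *buf = malloc(XFER_SIZE);
    ssize_t rc;
    int fd;

    if (buf == NULL)
        return 1;
    memset(buf, 0xab, XFER_SIZE);

    fd = open(path, O_CREAT | O_RDWR, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* An unexpected short count is the symptom being chased. */
    rc = write(fd, buf, XFER_SIZE);
    if (rc >= 0 && (size_t)rc < XFER_SIZE)
        printf("short write: asked %lu, got %zd (remainder mod 1MB: %zd)\n",
               XFER_SIZE, rc, rc % (1 << 20));

    if (lseek(fd, 0, SEEK_SET) == 0) {
        rc = read(fd, buf, XFER_SIZE);
        if (rc >= 0 && (size_t)rc < XFER_SIZE)
            printf("short read: asked %lu, got %zd\n", XFER_SIZE, rc);
    }

    close(fd);
    free(buf);
    return 0;
}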

      Attachments

        1. 874pdf.pdf
          35 kB
        2. 874pdf2.pdf
          88 kB
        3. 874pdf2.pdf
          88 kB
        4. lc3-OST001_brw_stats.txt
          8 kB
        5. LU-874.lustre-log.oss.1322741854.6037.gz
          4.58 MB
        6. reproducer.c
          1 kB
        7. zwicky3_brw_stats.txt
          22 kB


          Activity

            [LU-874] Client eviction on lock callback timeout

            jay Jinshan Xiong (Inactive) added a comment -

            Thank you Bruno, let's start the landing process then.

            bfaccini Bruno Faccini (Inactive) added a comment -

            Jinshan, we now have 2 more days (more than 5 in total!) of testing this latest patch on 2 different clusters running multiple configurations, with heavy reproducer exposure, but still no reproduction of the hang.

            bfaccini Bruno Faccini (Inactive) added a comment -

            Not reproduced over the weekend; I keep trying with different client/server scenarios and configurations.

            jay Jinshan Xiong (Inactive) added a comment -

            Hi Bruno Faccini, can you please check whether this works: http://review.whamcloud.com/5208

            jay Jinshan Xiong (Inactive) added a comment -

            I will try to make a patch tomorrow.
            green Oleg Drokin added a comment -

            I am not working on this issue myself; I had a discussion with Jinshan yesterday, and I believe we now agree that the reason for this particular problem is understood.
            Jinshan is now thinking about how to properly address it, and patch 4306 might be a move in the right direction, but I'll defer to him.

            The old rule that "ptlrpcd should not block" is still very important: when it is violated, all async ptlrpcd-handled RPCs are at risk of being stuck because their replies cannot be handled.
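            A hypothetical userspace sketch (pthreads, not actual Lustre code) of the shape this rule pushes toward: a completion thread that must keep draining replies takes locks with trylock and requeues the work instead of sleeping, so it never stops servicing other RPCs. All names below are illustrative.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t obj_mutex = PTHREAD_MUTEX_INITIALIZER;

/* One queued reply; in real code this would sit on a reply list. */
struct reply {
    int id;
    int done;
};

/*
 * Non-blocking handling: if the object lock is busy, report "requeue"
 * instead of sleeping, so the completion thread can go on servicing
 * other RPC replies.
 */
static int handle_reply_nonblocking(struct reply *r)
{
    if (pthread_mutex_trylock(&obj_mutex) != 0) {
        printf("reply %d: lock busy, requeue for later\n", r->id);
        return -1;                /* caller keeps it on the queue */
    }
    r->done = 1;                  /* deliver the reply under the lock */
    pthread_mutex_unlock(&obj_mutex);
    printf("reply %d: handled\n", r->id);
    return 0;
}

int main(void)
{
    struct reply r = { .id = 1, .done = 0 };

    /* Simulate contention: some other context holds the object lock. */
    pthread_mutex_lock(&obj_mutex);
    handle_reply_nonblocking(&r);     /* -> requeued, thread stays live */
    pthread_mutex_unlock(&obj_mutex);

    handle_reply_nonblocking(&r);     /* -> handled */
    return 0;
}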


            bfaccini Bruno Faccini (Inactive) added a comment -

            You are right, I had to be quick and did not save the necessary symbol files; at the time, the dumps were not expected to be needed more than a month later, and I also thought everything could easily be retrieved from their build versions (saved in the job_hang.txt file). But if that turns out not to be possible, you are right that it is a good lesson for next time.

            If you want, I can reproduce it again on Toro?

            And according to your analysis, should we pursue http://review.whamcloud.com/4306?
            green Oleg Drokin added a comment -

            Bruno, it would have been great if, together with the vmcore files, you had included the Lustre modules RPM and the kernel-debuginfo so that they would be useful for everybody.

            From my look, it seems Bobijam's analysis from Oct 16 is pretty much what I would think.

            There's an extra piece here that explains it all. The reply for the RPC is not processed because ptlrpcd is locked up trying to get a mutex; as a result, the RPC (whose reply has arrived, based on the flags) is stuck on the sending list and cannot be taken out (also a job for the ptlrpcd that is locked up).
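            To make the cycle concrete, here is a hypothetical, self-contained pthread sketch (illustrative only, not Lustre code): the only thread that can deliver the reply ("ptlrpcd") blocks on a mutex, while the mutex holder is waiting for that very reply, so the RPC effectively stays stuck until a timeout here, or an eviction on a real client.

#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t obj_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t reply_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  reply_cv  = PTHREAD_COND_INITIALIZER;
static int reply_delivered;

/* Stands in for ptlrpcd: the only context that can deliver RPC replies. */
static void *ptlrpcd_thread(void *arg)
{
    (void)arg;
    sleep(1);                         /* the reply has "arrived based on flags" */
    pthread_mutex_lock(&obj_mutex);   /* blocks: the caller holds it */
    pthread_mutex_lock(&reply_mtx);
    reply_delivered = 1;              /* never reached while the caller waits */
    pthread_cond_signal(&reply_cv);
    pthread_mutex_unlock(&reply_mtx);
    pthread_mutex_unlock(&obj_mutex);
    return NULL;
}

int main(void)
{
    pthread_t tid;

    pthread_create(&tid, NULL, ptlrpcd_thread, NULL);

    /*
     * The caller takes the object mutex and then waits for its reply.
     * The reply can only be delivered by ptlrpcd, which is stuck on the
     * same mutex, so the wait never completes and the RPC stays stuck.
     */
    pthread_mutex_lock(&obj_mutex);
    pthread_mutex_lock(&reply_mtx);
    while (!reply_delivered) {
        struct timespec ts = { .tv_sec = time(NULL) + 2 };

        printf("waiting for reply: ptlrpcd is stuck on obj_mutex\n");
        if (pthread_cond_timedwait(&reply_cv, &reply_mtx, &ts) != 0)
            break;                    /* timed out: the cycle never resolves */
    }
    pthread_mutex_unlock(&reply_mtx);
    pthread_mutex_unlock(&obj_mutex); /* only now can ptlrpcd run again */

    pthread_join(tid, NULL);
    return 0;
}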


            jay Jinshan Xiong (Inactive) added a comment -

            Thank you Bruno.

            bfaccini Bruno Faccini (Inactive) added a comment -

            Jinshan,

            Since I have held the Toro reservations for this problem/reproducer for quite a long time, I now need to free them for higher-priority tests on the cluster. So I decided to freeze the nodes' state by first taking a "dump live" image of the 4 client-12vm[1-4] VMs with virsh, and then crashing all nodes+VMs with the Alt+SysRq sequence. All this information is now available under brent:~bruno/LU-874/new_hang_101212.

            bfaccini Bruno Faccini (Inactive) added a comment -

            Sorry Doug, you may have been waiting for some feedback here: yes, these are the messages that we see when we reproduce the hung situation, along with the stack of the currently hung thread on the client.

            People

              bobijam Zhenyu Xu
              morrone Christopher Morrone (Inactive)
              Votes:
              4
              Watchers:
              40

              Dates

                Created:
                Updated:
                Resolved: