[LU-6389] read()/write() returning less than available bytes intermittently - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: Lustre 2.8.0
Affects Version/s: Lustre 2.5.2, Lustre 2.5.3
Labels:
- llnl
Environment:
CentOS 6.5 2.6.32-431.17.1.el6.x86_64. 2.5.2 client. 2.5.3 server.

Severity:
4
Rank (Obsolete):
9223372036854775807

Description

Since March 10, 2015, we have been tracking an increasing number of user reports of intermittent I/O problems with our largest Lustre filesystem on Stampede (SCRATCH). This is affecting dozens of users, with multiple jobs per user. The problem was first detected in Fortran programs and reduced to a 10-line reproducer (test_break.f); we have now also written a C reproducer (rwb.c) that does not depend on a specific Fortran runtime library. This case was designed to mimic the underlying libc calls that the Fortran case was making, without interference from the runtime library. The attached case fails with either icc or gcc on our system.

The basic case involves a long sequence of ~4MB read() or write() calls which should eventually read or write all of a large file. Intermittently, but reproducibly, one of these calls comes back short before reaching the last block of the file; for example, a 4MB read may return only 2.5MB somewhere in the middle of the file. The number of bytes read on the short call and its position in the sequence are apparently random. The issue does not occur if the file has only 1 stripe, but does consistently occur with 2 or more stripes. The problem does not occur on either of our other Lustre filesystems on Stampede, and nothing appears to have changed around the time the problems started.
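The failure mode described above can be detected with a sequential scan like the one below. This is a minimal sketch, not the attached rwb.c (whose source is not reproduced here): the helper name `count_short_reads` is hypothetical, and it assumes the descriptor is open for reading on a file whose size does not change during the scan. A read is flagged only when it returns fewer bytes than requested while the offset is still short of the file size, because a short final block is normal.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not the attached rwb.c): read `fd` sequentially
 * from its current offset in chunks of `chunk` bytes and count reads that
 * return fewer bytes than requested before the end of the file.
 * Returns the number of such short reads, or -1 on error. */
static long count_short_reads(int fd, size_t chunk)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;

    off_t offset = lseek(fd, 0, SEEK_CUR);
    if (offset < 0)
        return -1;

    char *buf = malloc(chunk);
    if (buf == NULL)
        return -1;

    long short_reads = 0;
    for (;;) {
        ssize_t n = read(fd, buf, chunk);
        if (n < 0) {                /* genuine error: give up */
            free(buf);
            return -1;
        }
        if (n == 0)                 /* end of file */
            break;
        /* Only the last block of the file may legitimately be short. */
        if ((size_t)n < chunk && offset + n < st.st_size) {
            fprintf(stderr, "short read at offset %lld: got %zd of %zu bytes\n",
                    (long long)offset, n, chunk);
            short_reads++;
        }
        offset += n;
    }
    free(buf);
    return short_reads;
}
```

Run against a file striped over two or more OSTs (for example, one created with `lfs setstripe -c 2`), a nonzero return would indicate the behaviour reported in this ticket.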

When running the C code, the short read/write does not report an error, and subsequent reads continue as normal. Writing behaves identically. Some codes, including the Intel Fortran runtime, do not tolerate short reads (though they potentially could), and such codes abort (including the attached one). No codes that I know of are designed to tolerate shorter-than-requested writes in general. We can find no client or server error messages associated with these short read/write events.
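Because read() and write() may return short counts in some situations (signal interruption, pipes and sockets, a full device), a defensive application can wrap them in a retry loop. The sketch below shows that common idiom; `read_full` and `write_full` are hypothetical names, not a Lustre or libc API, and such a loop only works around the symptom. As discussed later in this ticket, Lustre's intent is not to return short reads or writes except on fatal errors.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read() until `count` bytes arrive,
 * EOF is reached, or a real error occurs.
 * Returns bytes read (fewer than `count` only at EOF) or -1 on error. */
static ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;
    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;                  /* EOF: return what we have */
        done += (size_t)n;          /* short read: ask for the remainder */
    }
    return (ssize_t)done;
}

/* Hypothetical helper: keep calling write() until all `count` bytes are
 * written or a real error occurs. Returns `count` or -1 on error. */
static ssize_t write_full(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    size_t done = 0;
    while (done < count) {
        ssize_t n = write(fd, p + done, count - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0) {               /* defensive: avoid spinning forever */
            errno = EIO;
            return -1;
        }
        done += (size_t)n;          /* short write: write the remainder */
    }
    return (ssize_t)done;
}
```

`read_full` treats only a return of 0 as EOF, so a spurious short read in the middle of a file is simply followed by another read() for the remaining bytes.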

We would be happy to provide access to Stampede for testing and verification.

Attachments

lustre_bbarth.log.bz2
0.2 kB
20/Mar/15 8:12 PM
lustre_bbarth.log.bz2
0.2 kB
20/Mar/15 1:59 PM
short_io_bug.tar.gz
2 kB
19/Mar/15 11:22 PM

Issue Links

is duplicated by

LU-6392 short read/write with stripe count > 1

Resolved

is related to

LU-6392 short read/write with stripe count > 1

Resolved

LU-6545 MPIIO short reads

Resolved

Activity

Gerrit Updater added a comment - 20/Feb/16 5:40 AM

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18275/
Subject: LU-6389 utils: fix lustre_rsync read retry
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7d165f5fe357010c3b41abf1163aacb09a88816f

Gerrit Updater added a comment - 03/Feb/16 11:07 AM

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/18275
Subject: LU-6389 utils: fix lustre_rsync read retry
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b0c49e274f75afc4a4e2fcacfc7df9bbf88d5487

Peter Jones added a comment - 18/May/15 2:23 PM

Landed for 2.8

Gerrit Updater added a comment - 17/May/15 10:50 PM

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14123/
Subject: LU-6389 llite: restart short read/write for normal IO
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 8badb39913b5e1c614d2fe410ef7200391099855

Jinshan Xiong (Inactive) added a comment - 13/Apr/15 3:42 AM

A layout lock can be lost in any of the following situations: false sharing, LRU, or memory pressure. The tricky part is that the client cannot tell whether the layout is still the same after it re-enqueues the lock and gets the new layout.

Patrick Farrell (Inactive) added a comment - 31/Mar/15 9:38 PM

Stepping back, is there any intent to address the underlying layout lock revocation? Is that even addressable, or is it a permanent part of the design? I'm curious to try to better understand the underlying cause.

Aurelien - I don't think I understand this comment:
"Layout lock should be canceled by ldlm_bl thread during I/O, exactly between 2 calls to vvp_io_write_start(). If LL is dropped between 2 writes, it will be enqueued again before doing the 2nd write and it will be OK. If your I/O does not cover several stripes, it is also fine. "

What's the actual race condition here? You say if it is dropped between the two writes, it is re-enqueued and all is well. So when exactly does it need to be dropped for this to be a problem? (And shouldn't the code that needs it ensure the lock is taken, rather than return?)

Bill Barth (Inactive) added a comment - 27/Mar/15 12:56 AM - edited

I was slightly wrong. We applied patches from LU-5062 and LU-5726 on Feb 10, 2015. Those are the only changes in our setup recently. Do you think they're related to this issue?

Also, we have been testing patchset 4 from the most recent 2.5 patch with good success. Our current plan is to deploy to production on 3/31/15.

Zhenyu Xu added a comment - 26/Mar/15 6:31 AM

Yes, the patch tries to restart the I/O from the point it has already reached and continues until it reaches EOF or encounters an error.
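The restart behaviour described above can be illustrated with a small user-space model. This is a hypothetical sketch, not the code from patch 14123: `sim_pass` stands in for a single I/O pass that may be cut short (for instance, when the layout lock is lost partway through a multi-stripe request), and `read_with_restart` reissues the remainder of the request from the position already reached until the request is satisfied, EOF is reached, or a pass makes no progress (standing in for an error). All names and the `limit` parameter are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not Lustre code: result of one I/O pass. */
struct io_pass_result {
    size_t done;        /* bytes transferred in this pass */
    int need_restart;   /* pass stopped early without an error */
};

/* Simulated pass over an in-memory "file" of `size` bytes, starting at
 * `pos`. It transfers at most `limit` bytes, mimicking a pass cut short. */
static struct io_pass_result sim_pass(char *dst, const char *file,
                                      size_t size, size_t pos,
                                      size_t count, size_t limit)
{
    struct io_pass_result r = { 0, 0 };
    size_t left = pos < size ? size - pos : 0;  /* bytes before EOF */
    size_t want = count < left ? count : left;  /* bytes this request can get */
    size_t n = want < limit ? want : limit;     /* pass may stop early */
    if (n > 0)
        memcpy(dst, file + pos, n);
    r.done = n;
    r.need_restart = (n < want);                /* cut short before EOF */
    return r;
}

/* Outer loop: restart from where the previous pass stopped until the
 * request completes, EOF is hit, or a pass makes no progress. */
static size_t read_with_restart(char *dst, const char *file, size_t size,
                                size_t pos, size_t count, size_t limit)
{
    size_t total = 0;
    for (;;) {
        struct io_pass_result r = sim_pass(dst + total, file, size,
                                           pos + total, count - total, limit);
        total += r.done;
        if (!r.need_restart || r.done == 0)
            break;      /* finished, at EOF, or stuck: stop restarting */
    }
    return total;
}
```

For a 10-byte file, reading 6 bytes from offset 2 with a per-pass limit of 4 takes two passes and returns all 6 bytes, whereas a single pass alone returns only 4, which corresponds to the short read applications saw before the fix.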

Gerrit Updater added a comment - 26/Mar/15 6:31 AM

Bobi Jam (bobijam@hotmail.com) uploaded a new patch: http://review.whamcloud.com/14190
Subject: LU-6389 llite: restart short read/write for normal IO
Project: fs/lustre-release
Branch: b2_4
Current Patch Set: 1
Commit: 7f12166c9fedd6d8aba3e59042142935d285d70e

Jay Lan (Inactive) added a comment - 25/Mar/15 10:42 PM

NASA Ames hit the same problem in production.

Can I take it that the patch would do what Christopher Morrone said: "Our design choice in Lustre has been (for probably well over a decade), that Lustre must not return short reads or writes, except in the cases of a fatal error"?

People

Assignee: Zhenyu Xu

Reporter: Bill Barth (Inactive)

Votes: 0

Watchers: 30

Dates

Created: 19/Mar/15 11:22 PM

Updated: 19/Jun/20 2:52 PM

Resolved: 18/May/15 2:23 PM