read()/write() returning less than available bytes intermittently

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.8.0
    • Affects Version/s: Lustre 2.5.2, Lustre 2.5.3
    • Environment: CentOS 6.5, kernel 2.6.32-431.17.1.el6.x86_64. 2.5.2 client. 2.5.3 server.
    • 4
    • 9223372036854775807

    Description

      Since March 10, 2015, we have been tracking an increasing number of user reports of intermittent I/O problems with our largest Lustre filesystem on Stampede (SCRATCH). The problem affects dozens of users, each across multiple jobs. It was first detected in Fortran programs and reduced to a 10-line reproducer (test_break.f). We have now also written a C reproducer (rwb.c) that does not depend on a specific Fortran runtime library. It was designed to mimic the underlying libc calls that the Fortran case makes, without interference from the runtime library. The attached case fails with either icc or gcc on our system.

      The basic case involves a long sequence of ~4MB read() or write() calls which should eventually read or write all of a large file. Intermittently, but reproducibly, one of these calls comes back short before reaching the last block of the file. For example, a 4MB read may return only 2.5MB somewhere in the middle of the file. The number of bytes read on the short call and the position of that call in the sequence are apparently random. The issue does not occur if the file has only 1 stripe, but does consistently occur with 2 or more stripes. The problem does not occur on either of our other Lustre filesystems on Stampede, and we have found no change that correlates in time with the start of the problems.
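      For illustration only (this is not the attached rwb.c): a minimal sketch of how such a check might look. It assumes the descriptor is positioned at the start of a regular file, uses fstat() to learn the file size, and counts every read() that returns fewer bytes than requested while data still remains. The function name count_short_reads is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from rwb.c.
 * Reads the file behind fd sequentially in chunk-sized read() calls and
 * counts the calls that come back short before end of file.
 * Assumes fd is positioned at offset 0 of a regular file.
 * Returns the number of short reads, or -1 on error.
 */
long count_short_reads(int fd, size_t chunk)
{
    struct stat st;
    char *buf;
    off_t pos = 0;
    long short_reads = 0;

    assert(chunk > 0);
    if (fstat(fd, &st) != 0)
        return -1;
    buf = malloc(chunk);
    if (buf == NULL)
        return -1;

    while (pos < st.st_size) {
        ssize_t n = read(fd, buf, chunk);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted before any data: retry */
            short_reads = -1;    /* real I/O error */
            break;
        }
        if (n == 0)
            break;               /* unexpected EOF (file shrank) */
        pos += n;
        /* Short only counts if more data was still available. */
        if ((size_t)n < chunk && pos < st.st_size) {
            fprintf(stderr,
                    "short read: got %zd of %zu bytes, %lld of %lld consumed\n",
                    n, chunk, (long long)pos, (long long)st.st_size);
            short_reads++;
        }
    }
    free(buf);
    return short_reads;
}
```

      Calling this with a 4MB chunk on a file striped across 2 or more OSTs would be one way to observe the behavior described above; the final, naturally shorter block at end of file is deliberately not counted.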

      The short read/write does not report an error when running the C code, and subsequent reads continue as normal. Writing behaves identically. Some codes, including the Intel Fortran runtime, do not tolerate short reads (though they potentially could), and they abort (as does the attached one). No codes that I know of are generally designed to tolerate writes that are shorter than requested. We can find no client or server error messages associated with these short read/write events.
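      As a sketch of what tolerating short transfers could look like in application code, the loops below keep calling read() or write() until the full count has been transferred, EOF is reached, or an error occurs. The names read_full and write_full are hypothetical, and this is a generic POSIX pattern rather than anything taken from the attached reproducers or from the Intel Fortran runtime.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read exactly count bytes unless EOF or an error
 * intervenes. Returns the number of bytes read (less than count only at
 * EOF), or -1 on error with errno set by read().
 */
ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;

    assert(buf != NULL || count == 0);
    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted: retry the same request */
            return -1;
        }
        if (n == 0)
            break;              /* EOF */
        done += (size_t)n;      /* short read: continue where it stopped */
    }
    return (ssize_t)done;
}

/*
 * Hypothetical helper: write exactly count bytes or fail.
 * Returns count on success, or -1 on error.
 */
ssize_t write_full(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    size_t done = 0;

    assert(buf != NULL || count == 0);
    while (done < count) {
        ssize_t n = write(fd, p + done, count - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0) {
            /* Defensive choice: treat no progress as an error
             * instead of looping forever. */
            errno = EIO;
            return -1;
        }
        done += (size_t)n;      /* short write: continue where it stopped */
    }
    return (ssize_t)done;
}
```

      A caller would replace each direct read(fd, buf, len) with read_full(fd, buf, len) and treat a return value smaller than len as end of file.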

      We would be happy to provide access to Stampede for testing and verification.

          Activity

            [LU-6389] read()/write() returning less than available bytes intermittently

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18275/
            Subject: LU-6389 utils: fix lustre_rsync read retry
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 7d165f5fe357010c3b41abf1163aacb09a88816f

            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/18275
            Subject: LU-6389 utils: fix lustre_rsync read retry
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: b0c49e274f75afc4a4e2fcacfc7df9bbf88d5487

            pjones Peter Jones added a comment -

            Landed for 2.8

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14123/
            Subject: LU-6389 llite: restart short read/write for normal IO
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 8badb39913b5e1c614d2fe410ef7200391099855

            jay Jinshan Xiong (Inactive) added a comment -

            Layout lock can be lost in any of the following situations: false sharing, LRU, or memory pressure; and the tricky thing is the client doesn't know if the layout is still the same after it re-enqueues and gets new layout.

            paf Patrick Farrell (Inactive) added a comment -

            Stepping back, is there any intent to address the underlying layout lock revocation? Is that even addressable, or is it a permanent part of the design? I'm curious to try to better understand the underlying cause.

            Aurelien - I don't think I understand this comment:
            "Layout lock should be canceled by ldlm_bl thread during I/O, exactly between 2 calls to vvp_io_write_start(). If LL is dropped between 2 writes, it will be enqueued again before doing the 2nd write and it will be OK. If your I/O does not cover several stripes, it is also fine. "

            What's the actual race condition here? You say if it is dropped between the two writes, it is re-enqueued and all is well. So when exactly does it need to be dropped for this to be a problem? (And shouldn't the code that needs it ensure the lock is taken, rather than return?)

            bbarth Bill Barth (Inactive) added a comment - - edited

            I was slightly wrong. We applied patches from LU-5062 and LU-5726 on Feb 10, 2015. Those are the only changes in our setup recently. Do you think they're related to this issue?

            Also, we have been testing patchset 4 from the most recent 2.5 patch with good success. Our current plan is to deploy to production on 3/31/15.

            bobijam Zhenyu Xu added a comment -

            Yes, the patch restarts the I/O from the point it had reached and continues until EOF or an error is encountered.

            Bobi Jam (bobijam@hotmail.com) uploaded a new patch: http://review.whamcloud.com/14190
            Subject: LU-6389 llite: restart short read/write for normal IO
            Project: fs/lustre-release
            Branch: b2_4
            Current Patch Set: 1
            Commit: 7f12166c9fedd6d8aba3e59042142935d285d70e

            jaylan Jay Lan (Inactive) added a comment -

            NASA Ames hit the same problem in production.

            Can I take it that the patch does what Christopher Morrone said: "Our design choice in Lustre has been (for probably well over a decade), that Lustre must not return short reads or writes, except in the cases of a fatal error"?

            People

              Assignee: bobijam Zhenyu Xu
              Reporter: bbarth Bill Barth (Inactive)
              Votes: 0
              Watchers: 30