  1. Lustre
  2. LU-1974

File corruption when running with the LU-1442 patch; the LU-1703 patch is also required.

Details

    • Bug
    • Resolution: Fixed
    • Major
    • None
    • None
    • None
    • 3
    • 6320

    Description

      This bug is opened mainly for information, to make the community aware, just in case ...

      Here is the story: running our (Bull) Lustre 2.1.2 version, a customer started to report file corruption. A Client can successfully create, write and re-read a file until another Client tries to access the same file. At that point the file content becomes corrupted for all Clients (missing blocks of data, or zero file size).

      The corruption has been identified to occur only on OSTs/OSCs where the file-creating Client had no grant left ("/proc/fs/lustre/osc/<OST-import>/cur_{dirty|grant|lost_grant}_bytes" are all zero). These OSCs also never seem to recover automatically, only when we run a small program doing O_DIRECT writes to these OSTs.
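The check and the workaround described above can be sketched in C roughly as follows. This is a minimal sketch, not the program we actually ran: the function names (read_counter, osc_has_no_grant, direct_write_once) are hypothetical, the OSC directory is passed in by the caller, and the sketch assumes each counter file holds a single decimal number. How the target file ends up striped on the affected OST is left out.

```c
#define _GNU_SOURCE /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Read one decimal counter from a proc-style file.
 * Returns 0 and stores the value in *out, or -1 on any error. */
int read_counter(const char *path, long long *out)
{
    assert(path != NULL && out != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int n = fscanf(f, "%lld", out);
    fclose(f);
    return n == 1 ? 0 : -1;
}

/* osc_dir is the caller-supplied OSC directory (hypothetical helper).
 * Returns 1 if all three counters are zero, 0 if any is non-zero,
 * -1 if a counter could not be read. */
int osc_has_no_grant(const char *osc_dir)
{
    static const char *names[] = {
        "cur_dirty_bytes", "cur_grant_bytes", "cur_lost_grant_bytes"
    };
    for (size_t i = 0; i < 3; i++) {
        char path[4096];
        long long value;
        int len = snprintf(path, sizeof(path), "%s/%s", osc_dir, names[i]);
        if (len < 0 || (size_t)len >= sizeof(path))
            return -1;
        if (read_counter(path, &value) != 0)
            return -1;
        if (value != 0)
            return 0;
    }
    return 1;
}

/* Write one aligned block with O_DIRECT, bypassing the page cache.
 * Returns the number of bytes written, or a negative errno value. */
long direct_write_once(const char *path, size_t size, size_t align)
{
    void *buf = NULL;
    int rc = posix_memalign(&buf, align, size); /* O_DIRECT needs aligned I/O */
    if (rc != 0)
        return -rc;
    memset(buf, 0xA5, size);

    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) {
        long err = -errno;
        free(buf);
        return err;
    }
    ssize_t n = write(fd, buf, size);
    long result = (n < 0) ? -errno : (long)n;
    close(fd);
    free(buf);
    return result;
}
```

A caller would first test osc_has_no_grant() on the OSC directory of interest and, if it returns 1, call something like direct_write_once(file, 1 << 20, 4096) on a file located on that OST. The 1 MiB size and 4096-byte alignment are illustrative values only.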

      Finally, by taking a full Lustre trace of a program/command on a Client writing to these "zero-grant" OSTs, we found that -EDQUOT was returned in the standard cached-write path. The direct-IO path was then attempted, but it ended with -EALREADY, which was finally replaced by a successful 0 return value, with no page written/flushed to the Server at all and no new grant recovered.

      Looking at the source code concerned, this can only occur if the written page(s) were not set Dirty.

      Finally we found that this bug (a missing set_page_dirty() in vvp_io_commit_write() when cl_page_add_cache() returns -EDQUOT and the code implicitly switches to the direct-IO path, vvp_page_sync_io()) is not in the Lustre v2.1.2 base but was introduced by the patch from LU-1442, which our R&D included because it was rated highly critical.
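To make the failure sequence easier to follow, here is a toy user-space model of it in C. It is not Lustre code: every type and function (toy_page, toy_osc, toy_add_cache, toy_sync_io, toy_commit_write) is a hypothetical stand-in, and it only reproduces the control flow described in this report, namely -EDQUOT from the cached path, a switch to the synchronous path, and -EALREADY turned into 0 when the page was never marked dirty.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are real Lustre types or functions. */
struct toy_page {
    bool dirty;     /* set by the commit path, consumed by the sync path */
    bool on_server; /* true once the data has been sent synchronously */
};

struct toy_osc {
    long grant_bytes; /* cached-write credit left for this OST */
};

#define TOY_PAGE_SIZE 4096L

/* Cached-write path: consumes grant, or reports -EDQUOT when none is left.
 * With grant available the page is only cached (sent later), so
 * on_server stays false here. */
static int toy_add_cache(struct toy_osc *osc, struct toy_page *pg)
{
    if (osc->grant_bytes < TOY_PAGE_SIZE)
        return -EDQUOT;
    osc->grant_bytes -= TOY_PAGE_SIZE;
    pg->dirty = true;
    return 0;
}

/* Synchronous path: only sends pages that are marked dirty. */
static int toy_sync_io(struct toy_page *pg)
{
    if (!pg->dirty)
        return -EALREADY; /* nothing considered pending for this page */
    pg->on_server = true;
    pg->dirty = false;
    return 0;
}

/* mark_dirty_on_edquot == false models the behavior described in this
 * report; true models the page being set dirty before the sync path. */
int toy_commit_write(struct toy_osc *osc, struct toy_page *pg,
                     bool mark_dirty_on_edquot)
{
    assert(osc != NULL && pg != NULL);
    int rc = toy_add_cache(osc, pg);
    if (rc == -EDQUOT) {
        if (mark_dirty_on_edquot)
            pg->dirty = true;
        rc = toy_sync_io(pg);
        if (rc == -EALREADY)
            rc = 0; /* reported as success although nothing was sent */
    }
    return rc;
}
```

With grant_bytes at 0, both variants return 0 to the caller, but only the variant that marks the page dirty leaves on_server set. That matches what we saw: the write looks successful on the Client while nothing reaches the Server.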

      This bug has since been fixed by LU-1703, which we (I mean Bull R&D) need to integrate as soon as possible.

      Also, I think an explicit link/comment should be added to LU-1442 to document its runtime dependency on the LU-1703 patch.

          People

            jay Jinshan Xiong (Inactive)
            louveta Alexandre Louvet