LU-860: Lustre quota inconsistencies after multiple usages of LU-601 work-around

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.0.0
    • Labels: None
    • Environment: linux-2.6.32-71.24.1
    • Severity: 3
    • Rank (Obsolete): 6521

    Description

      Hi,

      Some users at the CEA site are complaining about inconsistencies between the "lfs quota -u" and "du -s" reports.

      After a long investigation, on-site support finally found that the lost file system space is consumed by orphaned objids on the OSTs, and that this is a consequence of the LU-601 work-around.
      When it was impossible to restart the MDS (it systematically asserted in "tgt_recov"), the only solution was to mount the volume in ldiskfs mode and rename the PENDING subdirectory.
      Now there are several old "PENDING*" directories, and a lot of orphaned objids belonging to FIDs in these directories.

      In order to recover all this lost space, support is asking whether it is safe to run "lfsck", or whether they have to build their own tool to parse all OSTs offline and remove all objids that belong to FIDs in the PENDING* directories.

      Perhaps the PENDING directory was sometimes removed instead of renamed. In this case, is the recovery identical, or is there something else to do?

      TIA
      Patrick

      Below is the support report, and I have also attached the files containing the traces of the commands executed on Client, MDT and OST.

      #context: 
      
      Some time ago, a few users started to report Lustre quota inconsistencies between the "lfs quota -u"
      report and "du -s" over their full hierarchy/sub-tree. "lfs quotacheck" did not fix the inconsistencies.
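      The comparison described above can be sketched as follows. This is only an illustration, not the procedure support actually used: the function names are hypothetical, the quota figure is assumed to be read by hand from the "kbytes" column of "lfs quota -u", and "du -sk" is assumed instead of plain "du -s" so that both values are in kilobytes.

```python
def parse_du_kb(du_line: str) -> int:
    """Return the size in kB from one line of `du -sk` output ("<kB><tab><path>")."""
    return int(du_line.split()[0])


def quota_gap(quota_kb: int, du_kb: int) -> dict:
    """Compare the usage charged by quota with the usage visible through the namespace.

    quota_kb: value read from the "kbytes" column of `lfs quota -u` (assumption).
    du_kb:    value reported by `du -sk` over the user's whole sub-tree.
    """
    gap_kb = quota_kb - du_kb
    # Share of the charged space that cannot be found by walking the tree.
    gap_pct = round(100.0 * gap_kb / quota_kb, 1) if quota_kb else 0.0
    return {"gap_kb": gap_kb, "gap_pct": gap_pct}
```

      A large positive gap only means that quota charges the user for more blocks than "du" can reach. It does not say where those blocks are, which is what the checks in the details below address.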
      
      #consequences: 
      Quotas are unusable and inaccurate for these users, and (a lot of ??) filesystem space is consumed by orphaned objids on
      the OSTs.
      
      #details:
      
      The 1st check was to establish that the inconsistencies are due to real (and orphaned) filesystem space/block consumption
      and not just a bad quota value.
      
      The 2nd step was to identify that the orphaned objids belong to FIDs in the multiple PENDING* directories on the MDS that
      have been moved as part of the LU-601 work-around.
      
      See [Client,MDT,OST]_side files showing the details.
      
      So what can we do now to recover all the space/blocks used by the orphaned objids? Can we safely run "lfsck",
      or do we need to build our own tool to parse all OSTs offline and remove all objids that belong to FIDs in the PENDING*
      directories?
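      To make the "own tool" option concrete, here is a minimal sketch of its core matching step. It is not an existing Lustre utility and it deletes nothing. The function names and the two input mappings are hypothetical, and how the inputs are collected from the MDT and OSTs mounted in ldiskfs mode is left out. The object path layout (O/<group>/d<objid % 32>/<objid>) is an assumption that should be verified on the actual OSTs before acting on any result.

```python
from typing import Dict, Set


def orphan_candidates(pending_refs: Dict[int, Set[int]],
                      ost_objids: Dict[int, Set[int]]) -> Dict[int, Set[int]]:
    """Return, per OST index, the objids that exist on the OST and are
    referenced by files found in the PENDING* directories of the MDT.

    pending_refs: OST index -> objids referenced from PENDING* (hypothetical input).
    ost_objids:   OST index -> objids actually present on that OST (hypothetical input).
    """
    result: Dict[int, Set[int]] = {}
    for ost_idx, present in ost_objids.items():
        hits = present & pending_refs.get(ost_idx, set())
        if hits:
            result[ost_idx] = hits
    return result


def object_path(objid: int, group: int = 0) -> str:
    """Assumed ldiskfs location of an OST object, relative to the OST root."""
    return f"O/{group}/d{objid % 32}/{objid}"
```

      The output is a list of candidates to review, not a deletion list. Whether removing them offline is safe, and whether "lfsck" would handle them instead, is exactly the question asked above.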
      
      

      Attachments

        1. Client_side
          14 kB
        2. MDT_side
          5 kB
        3. OST_side
          2 kB

          People

            Assignee: bobijam Zhenyu Xu
            Reporter: patrick.valentin Patrick Valentin (Inactive)
