Lustre / LU-13535

Files truncated/corruption due to lfsck

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.5
    • Affects Version/s: Lustre 2.12.4
    • Labels: None
    • Environment: CentOS 7.6
    • Severity: 2

    Description

      Following several server crashes (e.g. LU-13511) when running lfs migrate, we decided to run lfsck on Fir (Lustre 2.12.4). Today, users are reporting that some of their files have been truncated to 128MB (strangely, this matches the size of the first component in our new default PFL layout).

      What led to this situation is likely the following scenario:

      • files were created originally using DoM + PFL (default setup)
      • we changed our default layout to PFL with the first OST component set to 128MB (stripe count 1) to avoid new DoM files
      • because of issues with DoM, we have restriped most of the existing DoM files using lfs migrate -c 1 (DoM/PFL to plain layout); this was done several months ago without problems
      • two days ago, we started to run lfsck namespace + layout
      • today, users are reporting truncated files, only the ones with plain layout > 128MB

      I'm wondering if this could be related to LU-13426. We consider this issue Sev 2 at least as lfsck is likely corrupting files that have been migrated to plain layout.

      More information below.

      Example with file:

      /fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
      
      [root@fir-rbh01 ~]# stat /fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
        File: ‘/fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa’
        Size: 134217728 	Blocks: 262152     IO Block: 4194304 regular file
      Device: e64e03a8h/3863872424d	Inode: 144119811155193635  Links: 1
      Access: (0644/-rw-r--r--)  Uid: (65488/ mgebala)   Gid: (52067/astraigh)
      Access: 2020-05-07 11:18:32.000000000 -0700
      Modify: 2020-04-08 23:24:19.000000000 -0700
      Change: 2020-04-29 11:26:53.000000000 -0700
       Birth: -
      
      [root@fir-rbh01 ~]# lfs getstripe /fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
      /fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
      lmm_stripe_count:  1
      lmm_stripe_size:   4194304
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 80
      	obdidx		 objid		 objid		 group
      	    80	      17475505	    0x10aa7b1	  0x1700000402
      

      FID is: [0x200043465:0x6f23:0x0]

      [root@fir-rbh01 ~]# lfs path2fid /fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
      [0x200043465:0x6f23:0x0]
      

      Thanks to Robinhood, we know that the file size was ~132MB and not 128MB.

      MariaDB [robinhood_fir]> select * from ENTRIES where id='0x200043465:0x6f23:0x0';
      +------------------------+---------+----------+-----------+--------+---------------+-------------+------------+---------------+------+------+-------+------------+---------+-----------+--------------+--------------+----------------+--------------+---------------+----------------+----------------+------------------------+
      | id                     | uid     | gid      | size      | blocks | creation_time | last_access | last_mod   | last_mdchange | type | mode | nlink | md_update  | invalid | fileclass | class_update | alert_status | checkdv_status | alert_lstchk | alert_lstalrt | checkdv_lstchk | checkdv_lstsuc | checkdv_out            |
      +------------------------+---------+----------+-----------+--------+---------------+-------------+------------+---------------+------+------+-------+------------+---------+-----------+--------------+--------------+----------------+--------------+---------------+----------------+----------------+------------------------+
      | 0x200043465:0x6f23:0x0 | mgebala | astraigh | 138323718 | 270176 |    1586607743 |  1586413465 | 1586413459 |    1588184813 | file |  420 |     1 | 1588185083 |       0 | +groups+  |   1588185083 |              | ok             |            0 |             0 |     1588184813 |     1588184813 | 60239190574:1586607743 |
      +------------------------+---------+----------+-----------+--------+---------------+-------------+------------+---------------+------+------+-------+------------+---------+-----------+--------------+--------------+----------------+--------------+---------------+----------------+----------------+------------------------+
      1 row in set (0.00 sec)
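To put the two sizes side by side, here is a minimal sketch that uses only the byte counts from the stat output and the Robinhood row above. The variable names are illustrative.

```python
# Sizes in bytes, copied from the stat output and the Robinhood ENTRIES row.
MIB = 1024 * 1024

size_now = 134217728        # current size reported by stat
size_robinhood = 138323718  # size recorded by Robinhood before the lfsck run

# The current size is exactly 128 MiB.
is_exactly_128_mib = (size_now == 128 * MIB)

# Original size expressed in MiB, rounded to two decimals.
original_mib = round(size_robinhood / MIB, 2)

# Bytes that are no longer visible through the file.
missing_bytes = size_robinhood - size_now

print(is_exactly_128_mib, original_mib, missing_bytes)
```

For this file, about 4 MB at the end of the file is no longer reachable.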
      

      Also the original data_version was 60239190574 but now it's:

      [root@fir-rbh01 ~]# lfs data_version /fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
      30120416758
      

      This file is on MDT0 and lfsck logs show that something was fixed for this FID 0x200043465:0x6f23:0x0:

      [root@fir-rbh01 ~]# grep 0x200043465:0x6f23:0x0 lfsck.fir-md1-s1.log 
      00100000:10000000:24.0:1588797550.743684:0:126810:0:(lfsck_layout.c:4033:lfsck_layout_repair_owner()) fir-MDT0000-osd: layout LFSCK assistant repaired inconsistent file owner for: parent [0x200043465:0x6f23:0x0], child [0x1340000401:0x10bc4c3:0x0], OST-index 65, stripe-index 1, old owner 0/0, new owner 65488/52067: rc = 1
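When going through many such records, it may help to pull out the fields of a repair line programmatically. The sketch below parses the exact line quoted above; the regular expression and the `parse_repair` helper are illustrative and only assume that other "repaired inconsistent file owner" lines follow the same shape.

```python
import re

# The log line quoted above, split across string literals for readability.
LINE = (
    "00100000:10000000:24.0:1588797550.743684:0:126810:0:"
    "(lfsck_layout.c:4033:lfsck_layout_repair_owner()) fir-MDT0000-osd: "
    "layout LFSCK assistant repaired inconsistent file owner for: "
    "parent [0x200043465:0x6f23:0x0], child [0x1340000401:0x10bc4c3:0x0], "
    "OST-index 65, stripe-index 1, old owner 0/0, new owner 65488/52067: rc = 1"
)

# Hypothetical pattern, derived only from the line above.
PATTERN = re.compile(
    r"parent \[(?P<parent>[^\]]+)\], child \[(?P<child>[^\]]+)\], "
    r"OST-index (?P<ost>\d+), stripe-index (?P<stripe>\d+), "
    r"old owner (?P<old>\d+/\d+), new owner (?P<new>\d+/\d+)"
)


def parse_repair(line):
    """Return the repair fields as a dict, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["ost"] = int(fields["ost"])
    fields["stripe"] = int(fields["stripe"])
    return fields


record = parse_repair(LINE)
print(record)
```

Note that the record names OST-index 65 at stripe-index 1, whereas the lfs getstripe output above lists a single object on obdidx 80.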
      

      Robinhood also shows that the file was previously striped on two OSTs, but Robinhood doesn't support DoM or migration, so that is from the original striping info:

      MariaDB [robinhood_fir]> select * from STRIPE_ITEMS where id='0x200043465:0x6f23:0x0';
      +------------------------+--------------+--------+----------------------+
      | id                     | stripe_index | ostidx | details              |
      +------------------------+--------------+--------+----------------------+
      | 0x200043465:0x6f23:0x0 |            0 |     64 | (binary)             |
      | 0x200043465:0x6f23:0x0 |            1 |     65 | (binary)             |
      +------------------------+--------------+--------+----------------------+
      2 rows in set (0.00 sec)
      

      LFSCK layout has fixed many files like that:

      [root@fir-hn01 sthiell.root]# clush -w@mds -R exec -bL 'tgt=$(printf fir-MDT%%04x %n); ssh %h lctl get_param -n mdd.$tgt.lfsck_layout' | grep status
      fir-md1-s[1-4]: status: completed
      [root@fir-hn01 sthiell.root]# clush -w@mds -R exec -bL 'tgt=$(printf fir-MDT%%04x %n); ssh %h lctl get_param -n mdd.$tgt.lfsck_layout' | grep repaired
      fir-md1-s[1,4]: repaired_dangling: 0
      fir-md1-s[2-3]: repaired_dangling: 1
      fir-md1-s[1-4]: repaired_unmatched_pair: 0
      fir-md1-s[1-4]: repaired_multiple_referenced: 0
      fir-md1-s[1-4]: repaired_orphan: 0
      fir-md1-s1: repaired_inconsistent_owner: 10494922
      fir-md1-s2: repaired_inconsistent_owner: 26336224
      fir-md1-s3: repaired_inconsistent_owner: 36300505
      fir-md1-s4: repaired_inconsistent_owner: 15102845
      fir-md1-s1: repaired_others: 429814
      fir-md1-s2: repaired_others: 46955127
      fir-md1-s3: repaired_others: 0
      fir-md1-s4: repaired_others: 1716650
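For a sense of scale, the per-MDS counters above can be totalled. This is a small sketch over the lines that clush printed per node; the folded `fir-md1-s[1-4]` lines are left out on purpose, since one folded line stands for several nodes. The `totals` helper is illustrative.

```python
import re

# Per-node counter lines copied from the clush output above.
OUTPUT = """\
fir-md1-s1: repaired_inconsistent_owner: 10494922
fir-md1-s2: repaired_inconsistent_owner: 26336224
fir-md1-s3: repaired_inconsistent_owner: 36300505
fir-md1-s4: repaired_inconsistent_owner: 15102845
fir-md1-s1: repaired_others: 429814
fir-md1-s2: repaired_others: 46955127
fir-md1-s3: repaired_others: 0
fir-md1-s4: repaired_others: 1716650
"""


def totals(text):
    """Sum each counter over all 'node: counter: value' lines."""
    sums = {}
    for line in text.splitlines():
        match = re.match(r"^\S+: (\w+): (\d+)$", line.strip())
        if match:
            name, value = match.group(1), int(match.group(2))
            sums[name] = sums.get(name, 0) + value
    return sums


summary = totals(OUTPUT)
print(summary)
```

Across the four MDTs this adds up to roughly 88 million owner repairs and 49 million "others" repairs.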
      

      Can you confirm whether this could be due to LFSCK? I'm not sure why "inconsistent file owner" would corrupt files, but this is the only pointer that we have now. If that's the case, do you think there is a way to repair what LFSCK has "fixed"?

      Attachments

        1. dk.fir-md1-s1.log.gz
          9.94 MB
          Stephane Thiell
        2. dk.fir-md1-s2.log.gz
          8.99 MB
          Stephane Thiell
        3. fir_lfsck_trunc_getstripe_all.log.gz
          6.50 MB
          Stephane Thiell


          Activity


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38585/
            Subject: LU-13535 lfsck: fix possible PFL layout corruption
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 775ce1c26c843d9ef9e6919f85e5284828762095

            pjones Peter Jones added a comment -

            Landed for 2.14

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38584/
            Subject: LU-13535 lfsck: fix possible PFL layout corruption
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: be009cb4a73b3bef7302083bec7d1d6289d515b7

            sthiell Stephane Thiell added a comment -

            We have been able to recover the composite layout of all our truncated files thanks to Mike! Feel free to close this ticket once the lfsck patch has landed (in 2.12 also please!!).

            sthiell Stephane Thiell added a comment -

            Prior to the lfsck layout incident, these files were likely all using the PFL layout defined by their parent directories. We have set up the following PFL layout on all directories, and then it's inherited for new directories (as I don't think we can set a PFL layout as FS-default):

            lfs setstripe -E 128M -c 1 -S 4M -E 128G -c 2 -S 4M -E -1 -c 4 -S 4M /fir

            The FS-default is still plain layout of 1 stripe, as we haven't modified it. With tunefs.lustre on MDT0 I can see:

            lov.stripecount=1 lov.stripesize=1048576
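The extents of the PFL layout in the lfs setstripe command above can be modelled in a few lines. This is only a sketch of the extent arithmetic, not of how Lustre instantiates or stores components; it assumes that `128M` and `128G` are binary units, which is consistent with the truncated size of 134217728 bytes in the description. The `COMPONENTS` table and helper names are illustrative.

```python
MIB = 1 << 20
GIB = 1 << 30

# (extent_start, extent_end, stripe_count) for each -E ... -c ... group.
# An extent_end of None stands for "-E -1" (up to end of file).
COMPONENTS = [
    (0, 128 * MIB, 1),
    (128 * MIB, 128 * GIB, 2),
    (128 * GIB, None, 4),
]


def component_index(offset):
    """Index of the component whose extent contains the byte at `offset`."""
    for index, (start, end, _stripe_count) in enumerate(COMPONENTS):
        if offset >= start and (end is None or offset < end):
            return index
    raise ValueError("offset must be non-negative")


def components_needed(size):
    """Indices of all components that hold data for a file of `size` bytes."""
    if size <= 0:
        return []
    return list(range(component_index(size - 1) + 1))


print(components_needed(138323718))  # size recorded by Robinhood
print(components_needed(134217728))  # size after truncation
```

Under these assumptions, a file of the size Robinhood recorded needs both the first and the second component, while a file of exactly 128 MiB fits entirely in the first one.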

            tappro Mikhail Pershin added a comment -

            Stephane, did these files use the FS-default layout, so that they all have the same one, or are there many different cases?

            sthiell Stephane Thiell added a comment -

            Hi Mike – This is really awesome that you seem to have found the source of the problem! Congrats and many thanks for that! For our truncated files, is there any chance that the old composite layout is still around somewhere? We have been working today with our users and hopefully most of the truncated files are scratch files that can be regenerated, but still, unfortunately, a few of them were not transferred from this filesystem to longer-term storage and we would like to know if they could somehow still be "fixed". Thx.

            gerrit Gerrit Updater added a comment -

            Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38585
            Subject: LU-13535 lfsck: fix possible PFL layout corruption
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: a29eecf94a2fb0642256e600074173428ccf5304

            gerrit Gerrit Updater added a comment -

            Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38584
            Subject: LU-13535 lfsck: fix possible PFL layout corruption
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: db1aa8f1880162e186467a9a52da21fb319cb1b2

            sthiell Stephane Thiell added a comment -

            Thanks Mike, this is great news that you were able to reproduce it yourself! Let us know if you find a way to reattach the lost stripes; we have moved/quarantined the files into directories using the same project IDs, so the FIDs should be the same.

            I'm attaching the output of lfs getstripe -R -v on all affected files (the truncated ones only) as fir_lfsck_trunc_getstripe_all.log.gz

            As for lfsck, we started it with lctl lfsck_start -M fir-MDTxxx -t namespace first on all 4 MDTs, and then, once that was done, I ran lctl lfsck_start -M fir-MDTxxx -t layout on all 4 MDTs.
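For anyone trying to reproduce the sequence described in this comment, the two passes can be spelled out per MDT. The sketch below only builds and prints the command strings and runs nothing; it assumes the four MDTs are indexed 0 to 3 and named with the same `fir-MDT%04x` pattern used in the clush commands in the description. The helper names are illustrative.

```python
def mdt_name(fsname, index):
    """Target name in the fsname-MDTxxxx form, with a 4-digit hex index."""
    return f"{fsname}-MDT{index:04x}"


def lfsck_commands(fsname, mdt_count, lfsck_type):
    """Command strings for one lfsck pass over MDT indices 0..mdt_count-1."""
    return [
        f"lctl lfsck_start -M {mdt_name(fsname, index)} -t {lfsck_type}"
        for index in range(mdt_count)
    ]


# Namespace pass first on all MDTs, then the layout pass once it has completed.
namespace_pass = lfsck_commands("fir", 4, "namespace")
layout_pass = lfsck_commands("fir", 4, "layout")

for command in namespace_pass + layout_pass:
    print(command)  # printed only, nothing is executed
```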

            People

              Assignee: tappro Mikhail Pershin
              Reporter: sthiell Stephane Thiell
              Votes: 0
              Watchers: 7
