
Files truncated/corruption due to lfsck

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.5
    • Affects Version/s: Lustre 2.12.4
    • Environment: CentOS 7.6
    • Severity: 2

    Description

      Following several server crashes (e.g. LU-13511) when running lfs migrate, we decided to run lfsck on Fir (Lustre 2.12.4). Today, users are reporting that some of their files have been truncated to 128MB (strangely, this matches the size of the first component of our new default PFL layout).

      What led to this situation is likely the following scenario:

      • files were created originally using DoM + PFL (default setup)
      • we changed our default layout to PFL with the first OST component set to 128MB (stripe count 1) to avoid new DoM files
      • because of issues with DoM, we have restriped most of the existing DoM files using lfs migrate -c 1 (DoM/PFL to plain layout); this was done several months ago without problems
      • two days ago, we started to run lfsck namespace + layout
      • today, users are reporting truncated files, but only ones with a plain layout that were larger than 128MB

      I'm wondering if this could be related to LU-13426. We consider this issue Sev 2 at least, as lfsck is likely corrupting files that have been migrated to plain layout.
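
      For reference, a minimal sketch of the restriping step described above (the path is hypothetical; the commands are the standard lfs tools):

      lfs getstripe -v /fir/path/to/file     # inspect the current DoM/PFL layout
      lfs migrate -c 1 /fir/path/to/file     # rewrite the data into a plain 1-stripe layout
      lfs getstripe -v /fir/path/to/file     # confirm the new plain layout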

      More information below.

      Example with file:

      /fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
      
      [root@fir-rbh01 ~]# stat /fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
        File: ‘/fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa’
        Size: 134217728 	Blocks: 262152     IO Block: 4194304 regular file
      Device: e64e03a8h/3863872424d	Inode: 144119811155193635  Links: 1
      Access: (0644/-rw-r--r--)  Uid: (65488/ mgebala)   Gid: (52067/astraigh)
      Access: 2020-05-07 11:18:32.000000000 -0700
      Modify: 2020-04-08 23:24:19.000000000 -0700
      Change: 2020-04-29 11:26:53.000000000 -0700
       Birth: -
      
      [root@fir-rbh01 ~]# lfs getstripe /fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
      /fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
      lmm_stripe_count:  1
      lmm_stripe_size:   4194304
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 80
      	obdidx		 objid		 objid		 group
      	    80	      17475505	    0x10aa7b1	  0x1700000402
      

      FID is: [0x200043465:0x6f23:0x0]

      [root@fir-rbh01 ~]# lfs path2fid /fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
      [0x200043465:0x6f23:0x0]
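
      For cross-checking, the reverse mapping (FID back to path) can be done with lfs fid2path; a minimal sketch, assuming /fir is the client mount point:

      lfs fid2path /fir "[0x200043465:0x6f23:0x0]"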
      

      Thanks to Robinhood, we know that the file size was ~132MB and not 128MB.

      MariaDB [robinhood_fir]> select * from ENTRIES where id='0x200043465:0x6f23:0x0';
      +------------------------+---------+----------+-----------+--------+---------------+-------------+------------+---------------+------+------+-------+------------+---------+-----------+--------------+--------------+----------------+--------------+---------------+----------------+----------------+------------------------+
      | id                     | uid     | gid      | size      | blocks | creation_time | last_access | last_mod   | last_mdchange | type | mode | nlink | md_update  | invalid | fileclass | class_update | alert_status | checkdv_status | alert_lstchk | alert_lstalrt | checkdv_lstchk | checkdv_lstsuc | checkdv_out            |
      +------------------------+---------+----------+-----------+--------+---------------+-------------+------------+---------------+------+------+-------+------------+---------+-----------+--------------+--------------+----------------+--------------+---------------+----------------+----------------+------------------------+
      | 0x200043465:0x6f23:0x0 | mgebala | astraigh | 138323718 | 270176 |    1586607743 |  1586413465 | 1586413459 |    1588184813 | file |  420 |     1 | 1588185083 |       0 | +groups+  |   1588185083 |              | ok             |            0 |             0 |     1588184813 |     1588184813 | 60239190574:1586607743 |
      +------------------------+---------+----------+-----------+--------+---------------+-------------+------------+---------------+------+------+-------+------------+---------+-----------+--------------+--------------+----------------+--------------+---------------+----------------+----------------+------------------------+
      1 row in set (0.00 sec)
      

      Also, the original data_version was 60239190574, but now it is:

      [root@fir-rbh01 ~]# lfs data_version /fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
      30120416758
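
      As a side note, lfs data_version changes whenever the file data is modified, so recording it around an operation is a cheap way to detect unexpected data changes; a minimal sketch (hypothetical path):

      before=$(lfs data_version /fir/path/to/file)
      # ... run the operation, e.g. lfs migrate ...
      after=$(lfs data_version /fir/path/to/file)
      [ "$before" = "$after" ] || echo "data changed: $before -> $after"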
      

      This file is on MDT0 and lfsck logs show that something was fixed for this FID 0x200043465:0x6f23:0x0:

      [root@fir-rbh01 ~]# grep 0x200043465:0x6f23:0x0 lfsck.fir-md1-s1.log 
      00100000:10000000:24.0:1588797550.743684:0:126810:0:(lfsck_layout.c:4033:lfsck_layout_repair_owner()) fir-MDT0000-osd: layout LFSCK assistant repaired inconsistent file owner for: parent [0x200043465:0x6f23:0x0], child [0x1340000401:0x10bc4c3:0x0], OST-index 65, stripe-index 1, old owner 0/0, new owner 65488/52067: rc = 1
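
      A minimal sketch for pulling every parent FID that the layout LFSCK reports repairing out of such a log and mapping it back to a path (log file name and mount point as above; the grep/sed patterns assume messages formatted like the one shown):

      grep 'lfsck_layout_repair' lfsck.fir-md1-s1.log \
        | sed -n 's/.*parent \(\[0x[^]]*\]\).*/\1/p' \
        | sort -u \
        | while read fid; do lfs fid2path /fir "$fid"; done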
      

      Robinhood also shows that the file was previously striped over two OSTs, but Robinhood doesn't support DoM or migration, so this is from the original striping info:

      MariaDB [robinhood_fir]> select * from STRIPE_ITEMS where id='0x200043465:0x6f23:0x0';
      +------------------------+--------------+--------+----------------------+
      | id                     | stripe_index | ostidx | details              |
      +------------------------+--------------+--------+----------------------+
      | 0x200043465:0x6f23:0x0 |            0 |     64 | (binary data)        |
      | 0x200043465:0x6f23:0x0 |            1 |     65 | (binary data)        |
      +------------------------+--------------+--------+----------------------+
      2 rows in set (0.00 sec)
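
      A minimal sketch for building a candidate list from Robinhood, i.e. entries whose recorded size exceeds the 128 MiB first component (database and column names as shown above; credentials omitted):

      mysql robinhood_fir -e \
        "SELECT id, size FROM ENTRIES WHERE type='file' AND size > 134217728;"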
      

      LFSCK layout has fixed many files like that:

      [root@fir-hn01 sthiell.root]# clush -w@mds -R exec -bL 'tgt=$(printf fir-MDT%%04x %n); ssh %h lctl get_param -n mdd.$tgt.lfsck_layout' | grep status
      fir-md1-s[1-4]: status: completed
      [root@fir-hn01 sthiell.root]# clush -w@mds -R exec -bL 'tgt=$(printf fir-MDT%%04x %n); ssh %h lctl get_param -n mdd.$tgt.lfsck_layout' | grep repaired
      fir-md1-s[1,4]: repaired_dangling: 0
      fir-md1-s[2-3]: repaired_dangling: 1
      fir-md1-s[1-4]: repaired_unmatched_pair: 0
      fir-md1-s[1-4]: repaired_multiple_referenced: 0
      fir-md1-s[1-4]: repaired_orphan: 0
      fir-md1-s1: repaired_inconsistent_owner: 10494922
      fir-md1-s2: repaired_inconsistent_owner: 26336224
      fir-md1-s3: repaired_inconsistent_owner: 36300505
      fir-md1-s4: repaired_inconsistent_owner: 15102845
      fir-md1-s1: repaired_others: 429814
      fir-md1-s2: repaired_others: 46955127
      fir-md1-s3: repaired_others: 0
      fir-md1-s4: repaired_others: 1716650
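
      The same counters can also be read directly on whichever MDS serves a given target, without clush; a minimal sketch for fir-MDT0000:

      lctl get_param -n mdd.fir-MDT0000.lfsck_layout | grep -E 'status|repaired'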
      

      Can you confirm that this could be due to LFSCK? I'm not sure why repairing an "inconsistent file owner" would corrupt files, but this is the only pointer we have right now. If that's the case, do you think there is a way to repair what LFSCK has "fixed"?


          Activity


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38584/
            Subject: LU-13535 lfsck: fix possible PFL layout corruption
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: be009cb4a73b3bef7302083bec7d1d6289d515b7

            sthiell Stephane Thiell added a comment -

            We have been able to recover the composite layout of all our truncated files thanks to Mike! Feel free to close this ticket once the lfsck patch has landed (in 2.12 also please!!).

            sthiell Stephane Thiell added a comment -

            Prior to the lfsck layout incident, these files were likely all using the PFL layout defined by their parent directories. We have set up the following PFL layout on all directories, and it is then inherited by new directories (as I don't think we can set a PFL layout as the FS default):

            lfs setstripe -E 128M -c 1 -S 4M -E 128G -c 2 -S 4M -E -1 -c 4 -S 4M /fir

            The FS default is still a plain layout of 1 stripe, as we haven't modified it. With tunefs.lustre on MDT0 I can see:

            lov.stripecount=1 lov.stripesize=1048576
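
            As a side note, the default layout a directory will hand out to new files can be checked with lfs getstripe -d; a minimal sketch (the group directory path is hypothetical):

            lfs getstripe -d /fir/groups/somegroup    # inherited PFL default of a directory
            lfs getstripe -d /fir                     # filesystem root default, for comparison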

            tappro Mikhail Pershin added a comment -

            Stephane, was the layout of these files the FS default, so that they all have the same one, or are there many different cases?

            sthiell Stephane Thiell added a comment -

            Hi Mike – This is really awesome that you seem to have found the source of the problem! Congrats and many thanks for that! For our truncated files, is there any chance that the old composite layout is still around somewhere? We have been working today with our users, and hopefully most of the truncated files are scratch files that can be regenerated. Still, unfortunately, a few of them were not transferred from this filesystem to longer-term storage, and we would like to know if they could somehow still be "fixed". Thx.

            gerrit Gerrit Updater added a comment -

            Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38585
            Subject: LU-13535 lfsck: fix possible PFL layout corruption
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: a29eecf94a2fb0642256e600074173428ccf5304

            gerrit Gerrit Updater added a comment -

            Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38584
            Subject: LU-13535 lfsck: fix possible PFL layout corruption
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: db1aa8f1880162e186467a9a52da21fb319cb1b2

            sthiell Stephane Thiell added a comment -

            Thanks Mike, this is great news that you were able to reproduce it yourself! Let us know if you find a way to reattach the lost stripes; we have moved/quarantined the files into directories using the same project IDs, so the FIDs should be the same.

            I'm attaching the output of lfs getstripe -R -v on all affected files (the truncated ones only) as fir_lfsck_trunc_getstripe_all.log.gz.

            As for lfsck, we started it with lctl lfsck_start -M fir-MDTxxx -t namespace first on all 4 MDTs, and then, once that was done, I ran lctl lfsck_start -M fir-MDTxxx -t layout on all 4 MDTs.
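
            A minimal sketch of that sequence for a single MDT (fir-MDT0000 as an example; status is monitored with the same get_param shown earlier):

            lctl lfsck_start -M fir-MDT0000 -t namespace
            lctl get_param -n mdd.fir-MDT0000.lfsck_namespace | grep ^status
            lctl lfsck_start -M fir-MDT0000 -t layout
            lctl get_param -n mdd.fir-MDT0000.lfsck_layout | grep ^status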

            tappro Mikhail Pershin added a comment -

            I managed to reproduce that bug and have found why it happens; a fix for lfsck is on the way. I am now trying to figure out what can be done for the lost stripes.
            tappro Mikhail Pershin added a comment - edited

            Stephane, could you please get the extended output of the striping info from the affected files via lfs getstripe -R -v? It will show all fields in the layout and may give some clues. I am trying to reproduce that behavior locally and am also inspecting the lfsck code in 2.12.4 right now.

            Also, please provide the exact lfsck command used.
            pjones Peter Jones added a comment -

            Details from email

            "I’m contacting you regarding https://jira.whamcloud.com/browse/LU-13535 "Files truncated/corruption due to lfsck”

            We wanted to send you this email to provide updated information about our specific situation.

            A single LFSCK run has changed the layout of millions of files on Fir to a plain layout of 1 OST, with a 128 MiB maximum. Larger files have been truncated, so today we know that lfsck has corrupted the content of about 215k files. At this point, it looks like it has only happened in directories that had previously been migrated to another MDT (lfs migrate -m 1), directories which worked fine until we ran LFSCK namespace + layout.

            We have indeed scanned the filesystem (Fir on Sherlock) for files that are 128MiB in size with a plain layout of 1 OST, which is our way to detect when a file has lost its default PFL layout and very, very likely been truncated by LFSCK last week. During the weekend, we have been running lfs find -size 128M -c 1 on the whole filesystem (665M inodes) and it has completed: we have found 214,695 files total that have been truncated to 128MiB after this LFSCK run. All files I checked manually have indeed been truncated/corrupted. Also, users are reporting that their quota is still showing the previous volume used, so we think there could be a chance that the objects are still somehow on the OSTs. Some users have lost tens of TB of scratch research data due to very large files being truncated.

            Thanks for assigning Mike to this ticket. Any insights would be appreciated as soon as possible so we can adjust the communication to our users. My guess is that the layouts are lost, but perhaps you will find a way to reattach the component to these files?"

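
            A minimal sketch of the detection scan described in that email (mount point and criteria as given; the output file name is hypothetical):

            # 1-stripe plain-layout files of exactly 128 MiB: the signature of a file
            # truncated to its first PFL component
            lfs find /fir -type f -size 128M -c 1 > suspected_truncated.list
            wc -l suspected_truncated.list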

            People

              Assignee: tappro Mikhail Pershin
              Reporter: sthiell Stephane Thiell
              Votes: 0
              Watchers: 7
