[LU-13535] Files truncated/corruption due to lfsck Created: 07/May/20 Updated: 09/Jul/21 Resolved: 27/May/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.4 |
| Fix Version/s: | Lustre 2.14.0, Lustre 2.12.5 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Stephane Thiell | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: | CentOS 7.6 |
| Severity: | 2 |
| Description |
|
Following several server crashes (eg. What led to this situation is likely the following scenario:
I'm wondering if this could be related to More information below.

Example with file: /fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa

[root@fir-rbh01 ~]# stat /fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
File: ‘/fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa’
Size: 134217728 Blocks: 262152 IO Block: 4194304 regular file
Device: e64e03a8h/3863872424d Inode: 144119811155193635 Links: 1
Access: (0644/-rw-r--r--) Uid: (65488/ mgebala) Gid: (52067/astraigh)
Access: 2020-05-07 11:18:32.000000000 -0700
Modify: 2020-04-08 23:24:19.000000000 -0700
Change: 2020-04-29 11:26:53.000000000 -0700
Birth: -

[root@fir-rbh01 ~]# lfs getstripe /fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
/fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
lmm_stripe_count: 1
lmm_stripe_size: 4194304
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 80
obdidx objid objid group
80 17475505 0x10aa7b1 0x1700000402

FID is: [0x200043465:0x6f23:0x0]

[root@fir-rbh01 ~]# lfs path2fid /fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
[0x200043465:0x6f23:0x0]

Thanks to Robinhood, we know that the file size was ~132MB and not 128MB.

MariaDB [robinhood_fir]> select * from ENTRIES where id='0x200043465:0x6f23:0x0';
+------------------------+---------+----------+-----------+--------+---------------+-------------+------------+---------------+------+------+-------+------------+---------+-----------+--------------+--------------+----------------+--------------+---------------+----------------+----------------+------------------------+
| id | uid | gid | size | blocks | creation_time | last_access | last_mod | last_mdchange | type | mode | nlink | md_update | invalid | fileclass | class_update | alert_status | checkdv_status | alert_lstchk | alert_lstalrt | checkdv_lstchk | checkdv_lstsuc | checkdv_out |
+------------------------+---------+----------+-----------+--------+---------------+-------------+------------+---------------+------+------+-------+------------+---------+-----------+--------------+--------------+----------------+--------------+---------------+----------------+----------------+------------------------+
| 0x200043465:0x6f23:0x0 | mgebala | astraigh | 138323718 | 270176 | 1586607743 | 1586413465 | 1586413459 | 1588184813 | file | 420 | 1 | 1588185083 | 0 | +groups+ | 1588185083 | | ok | 0 | 0 | 1588184813 | 1588184813 | 60239190574:1586607743 |
+------------------------+---------+----------+-----------+--------+---------------+-------------+------------+---------------+------+------+-------+------------+---------+-----------+--------------+--------------+----------------+--------------+---------------+----------------+----------------+------------------------+
1 row in set (0.00 sec)

Also the original data_version was 60239190574 but now it's:

[root@fir-rbh01 ~]# lfs data_version /fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
30120416758

This file is on MDT0 and lfsck logs show that something was fixed for this FID 0x200043465:0x6f23:0x0:

[root@fir-rbh01 ~]# grep 0x200043465:0x6f23:0x0 lfsck.fir-md1-s1.log
00100000:10000000:24.0:1588797550.743684:0:126810:0:(lfsck_layout.c:4033:lfsck_layout_repair_owner()) fir-MDT0000-osd: layout LFSCK assistant repaired inconsistent file owner for: parent [0x200043465:0x6f23:0x0], child [0x1340000401:0x10bc4c3:0x0], OST-index 65, stripe-index 1, old owner 0/0, new owner 65488/52067: rc = 1

Robinhood also shows that the file was previously striped on two OSTs, but Robinhood doesn't support DoM or migration, so that is from the original striping info:

MariaDB [robinhood_fir]> select * from STRIPE_ITEMS where id='0x200043465:0x6f23:0x0';
+------------------------+--------------+--------+----------------------+
| id | stripe_index | ostidx | details |
+------------------------+--------------+--------+----------------------+
| 0x200043465:0x6f23:0x0 | 0 | 64 | ?? |
| 0x200043465:0x6f23:0x0 | 1 | 65 | @ ?? |
+------------------------+--------------+--------+----------------------+
2 rows in set (0.00 sec)
LFSCK layout has fixed many files like that:

[root@fir-hn01 sthiell.root]# clush -w@mds -R exec -bL 'tgt=$(printf fir-MDT%%04x %n); ssh %h lctl get_param -n mdd.$tgt.lfsck_layout' | grep status
fir-md1-s[1-4]: status: completed

[root@fir-hn01 sthiell.root]# clush -w@mds -R exec -bL 'tgt=$(printf fir-MDT%%04x %n); ssh %h lctl get_param -n mdd.$tgt.lfsck_layout' | grep repaired
fir-md1-s[1,4]: repaired_dangling: 0
fir-md1-s[2-3]: repaired_dangling: 1
fir-md1-s[1-4]: repaired_unmatched_pair: 0
fir-md1-s[1-4]: repaired_multiple_referenced: 0
fir-md1-s[1-4]: repaired_orphan: 0
fir-md1-s1: repaired_inconsistent_owner: 10494922
fir-md1-s2: repaired_inconsistent_owner: 26336224
fir-md1-s3: repaired_inconsistent_owner: 36300505
fir-md1-s4: repaired_inconsistent_owner: 15102845
fir-md1-s1: repaired_others: 429814
fir-md1-s2: repaired_others: 46955127
fir-md1-s3: repaired_others: 0
fir-md1-s4: repaired_others: 1716650

Can you confirm whether this could be due to LFSCK? I'm not sure why "inconsistent file owner" would corrupt files, but this is the only pointer that we have now. If that's the case, do you think there is a way to repair what LFSCK has "fixed"? |
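To get a sense of the scale of the repairs reported above, the per-MDT counters can be added up. The following is only a rough sketch: the helper names `count_nodes` and `sum_repaired` are hypothetical, and it assumes the folded `clush -bL` output has exactly the `nodeset: counter: value` shape shown in this ticket, with simple bracket ranges such as `[1,4]` or `[2-3]`.

```python
import re
from collections import defaultdict


def count_nodes(nodeset):
    """Count hosts in a simple folded nodeset like 'fir-md1-s[1,4]' or 'fir-md1-s[2-3]'."""
    match = re.search(r"\[([^\]]+)\]", nodeset)
    if match is None:
        return 1  # a single host name without brackets
    total = 0
    for part in match.group(1).split(","):
        if "-" in part:
            low, high = part.split("-")
            total += int(high) - int(low) + 1
        else:
            total += 1
    return total


def sum_repaired(lines):
    """Sum every repaired_* counter, weighting folded lines by their host count."""
    totals = defaultdict(int)
    for line in lines:
        fields = [field.strip() for field in line.split(":")]
        if len(fields) != 3 or not fields[1].startswith("repaired_"):
            continue  # skip anything that is not 'nodeset: repaired_x: value'
        nodeset, counter, value = fields
        totals[counter] += int(value) * count_nodes(nodeset)
    return dict(totals)


if __name__ == "__main__":
    # Lines copied from the clush output in this ticket.
    sample = [
        "fir-md1-s[1,4]: repaired_dangling: 0",
        "fir-md1-s[2-3]: repaired_dangling: 1",
        "fir-md1-s1: repaired_inconsistent_owner: 10494922",
        "fir-md1-s2: repaired_inconsistent_owner: 26336224",
        "fir-md1-s3: repaired_inconsistent_owner: 36300505",
        "fir-md1-s4: repaired_inconsistent_owner: 15102845",
    ]
    for counter, total in sum_repaired(sample).items():
        print(counter, total)
```

With the figures above, the four MDTs together report 88,234,496 "inconsistent owner" repairs, which gives an upper bound on how many objects this repair path touched.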
| Comments |
| Comment by Stephane Thiell [ 07/May/20 ] |
|
Attached debug logs with lfsck for fir-MDT0000 as dk.fir-md1-s1.log.gz |
| Comment by Stephane Thiell [ 08/May/20 ] |
|
We're actually no longer sure that all of these files were originally created with DoM. More and more users are reporting truncated files. Users are even reporting truncated files that were created recently with the non-DoM PFL config, where only the first stripe (128MiB) remains after LFSCK was run. A common cause could be that the parent directories of these files have recently been migrated to another MDT (MDT1). Could an lfs migrate -m followed by a later lfsck_layout truncate PFL files like that to a plain layout (with the first component only)?

Our default PFL config:

[root@fir-rbh01 ~]# lfs getstripe -d /fir
lcm_layout_gen: 0
lcm_mirror_count: 1
lcm_entry_count: 3
lcme_id: N/A
lcme_mirror_id: N/A
lcme_flags: 0
lcme_extent.e_start: 0
lcme_extent.e_end: 134217728
stripe_count: 1 stripe_size: 4194304 pattern: raid0 stripe_offset: -1
lcme_id: N/A
lcme_mirror_id: N/A
lcme_flags: 0
lcme_extent.e_start: 134217728
lcme_extent.e_end: 137438953472
stripe_count: 2 stripe_size: 4194304 pattern: raid0 stripe_offset: -1
lcme_id: N/A
lcme_mirror_id: N/A
lcme_flags: 0
lcme_extent.e_start: 137438953472
lcme_extent.e_end: EOF
stripe_count: 4 stripe_size: 4194304 pattern: raid0 stripe_offset: -1
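The arithmetic behind the observed truncation can be sketched from the extents listed above. This is a minimal illustration, not Lustre code: the component table is transcribed from the `lfs getstripe -d /fir` output, and the function names `components_spanned` and `bytes_beyond_first_component` are made up for this example.

```python
MIB = 1 << 20
GIB = 1 << 30

# (extent start, extent end, stripe_count); end=None stands for EOF.
# Values transcribed from the default PFL config shown in this comment.
PFL_COMPONENTS = [
    (0, 128 * MIB, 1),          # 0 .. 134217728
    (128 * MIB, 128 * GIB, 2),  # 134217728 .. 137438953472
    (128 * GIB, None, 4),       # 137438953472 .. EOF
]


def components_spanned(size):
    """Number of PFL components that hold data for a file of `size` bytes."""
    return sum(1 for start, _end, _count in PFL_COMPONENTS if start < size)


def bytes_beyond_first_component(size):
    """Bytes that become unreachable if only the first component remains."""
    first_end = PFL_COMPONENTS[0][1]
    return max(0, size - first_end)


if __name__ == "__main__":
    recorded_size = 138323718  # size recorded by Robinhood for the example file
    print(components_spanned(recorded_size))
    print(bytes_beyond_first_component(recorded_size))
```

For the example file in the description, the Robinhood size of 138323718 bytes spans two components, and 4105990 bytes sit past the 128MiB boundary, which matches a file that now reports exactly 134217728 bytes.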
|
| Comment by Peter Jones [ 08/May/20 ] |
|
Mike, could you please advise? Thanks, Peter |
| Comment by Mikhail Pershin [ 11/May/20 ] |
|
Yes, I am working on that. |
| Comment by Stephane Thiell [ 11/May/20 ] |
|
Thanks, Mike. This seems pretty bad. It looks like all files in some MDT-migrated directories have lost their PFL layout after running LFSCK. They just seem to have a plain layout now. Files smaller than 128MiB (the size of our first PFL component) are not truncated and are still usable, although they now have a plain layout, but the larger files are truncated.

Example of a previously PFL'ed small file that now has a plain layout:

[root@fir-rbh01 job034]# lfs getstripe /fir/users/alpays/hongli-backup/GCGR/relion_gcgr_vpp_20180212_tem4/Class2D/job034/run_it025_optimiser.star
/fir/users/alpays/hongli-backup/GCGR/relion_gcgr_vpp_20180212_tem4/Class2D/job034/run_it025_optimiser.star
lmm_stripe_count: 1
lmm_stripe_size: 4194304
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 57
obdidx objid objid group
57 13579585 0xcf3541 0x1140000400

Its directory still has the PFL config:

[root@fir-rbh01 job034]# lfs getstripe -d /fir/users/alpays/hongli-backup/GCGR/relion_gcgr_vpp_20180212_tem4/Class2D/job034
lcm_layout_gen: 0
lcm_mirror_count: 1
lcm_entry_count: 3
lcme_id: N/A
lcme_mirror_id: N/A
lcme_flags: 0
lcme_extent.e_start: 0
lcme_extent.e_end: 134217728
stripe_count: 1 stripe_size: 4194304 pattern: raid0 stripe_offset: -1
lcme_id: N/A
lcme_mirror_id: N/A
lcme_flags: 0
lcme_extent.e_start: 134217728
lcme_extent.e_end: 137438953472
stripe_count: 2 stripe_size: 4194304 pattern: raid0 stripe_offset: -1
lcme_id: N/A
lcme_mirror_id: N/A
lcme_flags: 0
lcme_extent.e_start: 137438953472
lcme_extent.e_end: EOF
stripe_count: 4 stripe_size: 4194304 pattern: raid0 stripe_offset: -1
How is that possible? |
| Comment by Peter Jones [ 12/May/20 ] |
|
Details from email

"I’m contacting you regarding https://jira.whamcloud.com/browse/LU-13535 "Files truncated/corruption due to lfsck”

We wanted to send you this email to provide updated information about our specific situation. A single LFSCK run has changed the layout of millions of files on Fir to a plain layout of 1 OST, with 128 MiB maximum. Larger files have been truncated, so today we know that lfsck has corrupted the content of about 215k files. At this stage, it looks like it has only happened in directories which had previously been migrated to another MDT (lfs migrate -m 1), directories which worked fine until we ran LFSCK namespace + layout.

We have scanned the filesystem (Fir on Sherlock) for files that are 128MiB in size with a plain layout of 1 OST, which is our way to detect when a file has lost its default PFL layout and has very, very likely been truncated by LFSCK last week. During the weekend, we ran lfs find -size 128M -c 1 on the whole filesystem (665M inodes) and it has completed: we have found 214,695 files in total that have been truncated to 128MiB after this LFSCK run. All files I checked manually have indeed been truncated/corrupted.

Also, users are reporting that their quota is still showing the previous volume used, so we think there could be a chance that the objects are still somehow on the OSTs. Some users have lost tens of TB of scratch research data due to very large files being truncated.

Thanks for assigning Mike to this ticket. Any insights would be appreciated as soon as possible so we can adjust the communication to our users. My guess is that the layouts are lost, but perhaps you will find a way to reattach the component to these files?" |
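The detection rule described in the email (size of exactly 128MiB and a single-stripe plain layout) can be expressed as a small filter. This is a sketch only: it works on already-collected `(path, size, stripe_count)` tuples rather than calling `lfs find`, the function name `suspect_truncated` and the sample paths are hypothetical, and treating "128M" as exactly 134217728 bytes is an assumption.

```python
MIB = 1 << 20
FIRST_COMPONENT_END = 128 * MIB  # end of the first PFL component in this ticket


def suspect_truncated(entries):
    """Return paths whose size is exactly 128 MiB and whose layout has one stripe.

    `entries` is an iterable of (path, size_in_bytes, stripe_count) tuples,
    however they were gathered (hypothetical input format).
    """
    return [
        path
        for path, size, stripe_count in entries
        if size == FIRST_COMPONENT_END and stripe_count == 1
    ]


if __name__ == "__main__":
    # Made-up entries for illustration; only the sizes mirror the ticket.
    entries = [
        ("/fir/example/truncated.fa", 134217728, 1),
        ("/fir/example/small.star", 4096, 1),
        ("/fir/example/large_pfl.dat", 138323718, 2),
    ]
    for path in suspect_truncated(entries):
        print(path)
```

A file that legitimately happens to be exactly 128MiB would also match, so hits from such a filter are candidates to verify, not proof of truncation.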
| Comment by Mikhail Pershin [ 12/May/20 ] |
|
Stephane, could you please get the extended striping info from the affected files via lfs getstripe -R -v? It will show all fields in the layout and can give some clues. I am trying to reproduce that behavior locally and am also inspecting the lfsck code in 2.12.4 right now. Also, please provide the exact lfsck command used. |
| Comment by Mikhail Pershin [ 12/May/20 ] |
|
I managed to reproduce that bug and have found why it happens; a fix for lfsck is on the way. I am now trying to figure out what can be done for the lost stripes. |
| Comment by Stephane Thiell [ 12/May/20 ] |
|
Thanks Mike, it is great news that you were able to reproduce it yourself! Let us know if you find a way to reattach the lost stripes; we have moved/quarantined the files into directories using the same project IDs, so the FIDs should be the same. I'm attaching the output of lfs getstripe -R -v on all affected files (the truncated ones only) as fir_lfsck_trunc_getstripe_all.log.gz. As for lfsck, we started it with lctl lfsck_start -M fir-MDTxxx -t namespace first on all 4 MDTs, and then, once that was done, I ran lctl lfsck_start -M fir-MDTxxx -t layout on all 4 MDTs. |
| Comment by Gerrit Updater [ 12/May/20 ] |
|
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38584 |
| Comment by Gerrit Updater [ 12/May/20 ] |
|
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38585 |
| Comment by Stephane Thiell [ 13/May/20 ] |
|
Hi Mike – This is really awesome that you seem to have found the source of the problem! Congrats and many thanks for that! For our truncated files, is there any chance that the old composite layout is still around somewhere? We have been working today with our users and hopefully most of the truncated files are scratch files that can be regenerated, but still, unfortunately, a few of them were not transferred from this filesystem to longer-term storage and we would like to know if they could somehow still be "fixed". Thx. |
| Comment by Mikhail Pershin [ 13/May/20 ] |
|
Stephane, did all of these files use the FS-default layout, so that they all had the same one, or are there many different cases? |
| Comment by Stephane Thiell [ 13/May/20 ] |
|
Prior to the lfsck layout incident, these files were likely all using the PFL layout defined by their parent directories. We have set up the following PFL layout on all directories, and it is then inherited by new directories (as I don't think we can set a PFL layout as FS-default):

lfs setstripe -E 128M -c 1 -S 4M -E 128G -c 2 -S 4M -E -1 -c 4 -S 4M /fir

The FS-default is still a plain layout of 1 stripe, as we haven't modified it. With tunefs.lustre on MDT0 I can see:

lov.stripecount=1
lov.stripesize=1048576
|
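To show how the setstripe command above corresponds to the extents printed by lfs getstripe -d earlier in this ticket, here is a small parsing sketch. It is not part of Lustre: `parse_size` and `parse_pfl` are hypothetical helpers, only the -E, -c and -S options used in this command are handled, and size suffixes are assumed to be binary (M = 2^20, G = 2^30), which is consistent with 128M appearing as 134217728 in the getstripe output.

```python
import shlex

UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_size(text):
    """Convert '128M' or '4M' to bytes; '-1' means EOF and is returned as None."""
    if text == "-1":
        return None
    suffix = text[-1].upper()
    if suffix in UNITS:
        return int(text[:-1]) * UNITS[suffix]
    return int(text)


def parse_pfl(command):
    """Turn an 'lfs setstripe -E ... -c ... -S ...' command into component dicts."""
    args = shlex.split(command)
    components = []
    start = 0
    i = 0
    while i < len(args):
        if args[i] == "-E":
            # Each -E opens a new component that starts where the previous one ended.
            end = parse_size(args[i + 1])
            components.append({"start": start, "end": end})
            start = end
            i += 2
        elif args[i] == "-c" and components:
            components[-1]["stripe_count"] = int(args[i + 1])
            i += 2
        elif args[i] == "-S" and components:
            components[-1]["stripe_size"] = parse_size(args[i + 1])
            i += 2
        else:
            i += 1  # command name, target path, or an option this sketch ignores
    return components


if __name__ == "__main__":
    cmd = "lfs setstripe -E 128M -c 1 -S 4M -E 128G -c 2 -S 4M -E -1 -c 4 -S 4M /fir"
    for component in parse_pfl(cmd):
        print(component)
```

The three parsed components end at 134217728, 137438953472 and EOF, with stripe counts 1, 2 and 4, the same values as in the directory layout listings above.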
| Comment by Stephane Thiell [ 19/May/20 ] |
|
We have been able to recover the composite layout of all our truncated files thanks to Mike! Feel free to close this ticket once the lfsck patch has landed (in 2.12 also please!!). |
| Comment by Gerrit Updater [ 27/May/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38584/ |
| Comment by Peter Jones [ 27/May/20 ] |
|
Landed for 2.14 |
| Comment by Gerrit Updater [ 27/May/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38585/ |