[LU-7114] ldiskfs: corrupted bitmaps handling patches Created: 08/Sep/15  Updated: 27/Apr/17  Resolved: 05/Jan/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.8.0

Type: Improvement Priority: Minor
Reporter: Wang Shilong (Inactive) Assignee: Yang Sheng
Resolution: Done Votes: 0
Labels: patch

Issue Links:
Duplicate
is duplicated by LU-9410 on-disk bitmap corrupted Resolved
Related
is related to LU-8462 OSS keeps dropping into KDB Resolved
is related to LU-1026 ldiskfs_mb_check_ondisk_bitmap: on-di... Resolved
is related to LU-8252 MDS kernel panic after aborting journal Resolved

 Description   

Currently, we may run into corrupted inode or block bitmaps in two ways:

1. Sanity checks fail, for example because bits for system-reserved blocks are found freed in the bitmap. This may be caused by an as-yet unknown kernel bug.
2. Hardware errors. We did hit such errors in our corruption tests.
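The first kind of failure can be illustrated with a minimal userspace sketch of such a sanity check. It is not ldiskfs/ext4 code: the names test_bit_u8() and bitmap_reserved_ok() are hypothetical, and it assumes a simplified layout where one bit per block is set when the block is in use and the group's metadata blocks occupy one contiguous range. If any of those blocks reads as free, the bitmap cannot be trusted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per block in a group, 1 = in use. */
static bool test_bit_u8(const uint8_t *bitmap, unsigned int nr)
{
    return (bitmap[nr / 8] >> (nr % 8)) & 1u;
}

/*
 * Returns true if every block in [first_meta, first_meta + nr_meta) is
 * marked in use. Blocks reserved for filesystem metadata must never appear
 * free; if one does, the bitmap is treated as corrupt.
 * (Simplification: assumes the reserved blocks form one contiguous range.)
 */
static bool bitmap_reserved_ok(const uint8_t *bitmap,
                               unsigned int first_meta, unsigned int nr_meta)
{
    for (unsigned int i = first_meta; i < first_meta + nr_meta; i++) {
        if (!test_bit_u8(bitmap, i))
            return false; /* a reserved block shows up as free */
    }
    return true;
}
```

For example, with the first four blocks reserved, a bitmap byte of 0x0F passes the check, while 0x0B (bit 2 cleared) fails it.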

Either way, the filesystem is remounted read-only by default and becomes unusable. See the corresponding bug reports in LU-1026.

Here are suggestions from Andreas Dilger:

I seem to recall something similar in the upstream kernel. It looks like patches with a similar goal were already pushed for the 3.12 kernel, so you might consider backporting those instead:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit?id=48d9eb97dc74d2446bcc3630c8e51d2afc9b951d
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit?id=dbde0abed8c6e9e938c2194675ce63f5769b0d37
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit?id=163a203ddb36c36d4a1c942aececda0cc8d06aa7
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit?id=87a39389be3e3b007d341be510a7e4a0542bdf05
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit?id=bdfb6ff4a255dcebeb09a901250e13a97eff75af
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit?id=2746f7a17062d3526116f7ae7f91d88b19c2464e
These patches don't prevent the filesystem from being marked read-only, however, so you may still want to change the ext4_error() calls to ext4_warning(). There is also an important fix in the first patch for the caller of this function, to ensure that it doesn't continue to use the bad bitmap if there is an error. The last patch is also important because it avoids freeing blocks in this group that might get reallocated later.
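The three points above (warn instead of erroring, stop using a bad bitmap, and avoid freeing blocks into a corrupt group) can be sketched in userspace C. This is only an illustration of the idea under stated assumptions, not the upstream kernel code: struct grp_state, mark_bbitmap_corrupt(), find_alloc_group() and free_blocks_in_group() are hypothetical names, and the per-group state is reduced to a free-block count plus a corruption flag.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-group state; not the real ext4/ldiskfs structure. */
struct grp_state {
    unsigned int free_blocks;
    bool bbitmap_corrupt; /* set once the block bitmap fails validation */
};

/*
 * Instead of a fatal error (which would remount the filesystem read-only),
 * log a warning once and fence off only this group.
 */
static void mark_bbitmap_corrupt(struct grp_state *g, unsigned int group)
{
    if (!g->bbitmap_corrupt) {
        fprintf(stderr, "warning: group %u: corrupt block bitmap, "
                "group disabled\n", group);
        g->bbitmap_corrupt = true;
    }
}

/*
 * Return the first group with at least `want` free blocks, skipping groups
 * whose bitmap is known to be bad. Returns -1 if no group qualifies.
 */
static int find_alloc_group(const struct grp_state *groups,
                            unsigned int ngroups, unsigned int want)
{
    for (unsigned int i = 0; i < ngroups; i++) {
        if (groups[i].bbitmap_corrupt)
            continue; /* never allocate from an untrusted bitmap */
        if (groups[i].free_blocks >= want)
            return (int)i;
    }
    return -1;
}

/*
 * Refuse to free blocks into a corrupt group: marking them free there could
 * let them be handed out again while they are still in use.
 */
static bool free_blocks_in_group(struct grp_state *g, unsigned int count)
{
    if (g->bbitmap_corrupt)
        return false;
    g->free_blocks += count;
    return true;
}
```

With three groups holding 100, 500 and 500 free blocks, marking group 1 corrupt makes a 200-block request land in group 2, and a free into group 1 is refused while the rest of the filesystem stays writable.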



 Comments   
Comment by Wang Shilong (Inactive) [ 08/Sep/15 ]

https://bugzilla.redhat.com/show_bug.cgi?id=1260831

Comment by Gerrit Updater [ 08/Sep/15 ]

Wang Shilong (wshilong@ddn.com) uploaded a new patch: http://review.whamcloud.com/16312
Subject: LU-7114 ldiskfs: corrupted bitmaps handling patches
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 469ee207c523acb958aa9d5eda4b73406214d15f

Comment by Peter Jones [ 08/Sep/15 ]

Yang Sheng

Could you please take care of this patch?

Thanks

Peter

Comment by Gerrit Updater [ 16/Sep/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16312/
Subject: LU-7114 ldiskfs: corrupted bitmaps handling patches
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2963f3d09eb3a0817f87386c0bd7be7ce086809d

Comment by James A Simmons [ 16/Sep/15 ]

Don't close this ticket yet. More work needs to be done for SLES12 and RHEL 6.6.

Comment by Gerrit Updater [ 27/Nov/15 ]

Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/17374
Subject: LU-7114 ldiskfs: corrupted bitmaps handling patches
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f4c276c20f8395b96482037cf45c50a017f34fea

Comment by Gerrit Updater [ 05/Jan/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17374/
Subject: LU-7114 ldiskfs: corrupted bitmaps handling patches
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3122e8fc97de097279ba7190c63347991bf350c7

Comment by John Fuchs-Chesney (Inactive) [ 05/Jan/16 ]

Hello James,

Can you clarify for us what additional work is required?

Thanks,
~ jfc.

Comment by James A Simmons [ 05/Jan/16 ]

The last patch landed was specifically for SLES12 so this ticket can be closed. Thanks Yang for doing the SLES12 work.

Comment by John Fuchs-Chesney (Inactive) [ 05/Jan/16 ]

Thanks James.
~ jfc.

Generated at Sat Feb 10 02:06:06 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.