Details
Bug
Resolution: Fixed
Major
Lustre 2.8.0
Running Lustre in RHEL6.
3
Description
In our production system, when a Lustre server crashes, e2fsck usually reports quota accounting mismatches, and sometimes the differences are huge.
The problem is that the current Lustre quota code relies on ldiskfs quota accounting,
and if the ldiskfs quota is wrong, we need to run e2fsck or disable and re-enable quota to fix the accounting.
This encouraged me to look at the quota implementation for ldiskfs. While looking at the code,
I found a big problem with the RHEL6 quota code: quota updates are
not properly journaled.
Every time ext4_mark_dquot_dirty() is called, we skip journaling and only add the quota update
to the dirty list. As a result, quota updates are journaled only in ext4_quota_write(), which is reached through 'sync_file' and therefore only happens
during a sync call or umount.
This is a big problem if we hit a crash: we will lose many quota updates.
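
To make the failure mode concrete, here is a minimal toy model in Python. It is only a sketch of the behaviour described above, not the ldiskfs or ext4 code: the ToyQuota class, its fields, and usage_after_crash() are hypothetical names invented for illustration, and the "journal" is reduced to a single number that survives a crash.

```python
class ToyQuota:
    """Toy model of one quota entry; not a kernel data structure."""

    def __init__(self, journal_every_update):
        self.journal_every_update = journal_every_update
        self.in_memory = 0   # usage tracked in RAM, lost on crash
        self.journaled = 0   # usage recorded in the journal, survives a crash
        self.dirty = False   # True when in_memory is ahead of journaled

    def charge(self, blocks):
        """Account for newly allocated blocks."""
        self.in_memory += blocks
        if self.journal_every_update:
            # Assumed "fixed" behaviour: each update is journaled right away.
            self.journaled = self.in_memory
        else:
            # Behaviour described in this report: only queued as dirty.
            self.dirty = True

    def sync(self):
        """Model a sync call or umount: flush the dirty entry."""
        if self.dirty:
            self.journaled = self.in_memory
            self.dirty = False

    def crash(self):
        """Model a crash: only the journaled usage is recovered."""
        self.in_memory = self.journaled
        self.dirty = False
        return self.in_memory


def usage_after_crash(journal_every_update, charges, sync_after=None):
    """Apply charges, optionally sync after the Nth one, then crash."""
    quota = ToyQuota(journal_every_update)
    for count, blocks in enumerate(charges, start=1):
        quota.charge(blocks)
        if sync_after == count:
            quota.sync()
    return quota.crash()


charges = [10, 20, 30, 40]  # 100 blocks really allocated
deferred = usage_after_crash(False, charges, sync_after=2)
journaled = usage_after_crash(True, charges)
print("deferred:", deferred, "journaled:", journaled)
```

In this model the deferred case recovers only the 30 blocks that were charged before the last sync, while the journaled case recovers all 100. The 70 missing blocks correspond to the kind of mismatch e2fsck reports after a crash.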