[LU-8714] too many update logs during soak-test. - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: Lustre 2.10.0
Labels:
None

Severity:
3

Description

In the last DNE soak test, recovery was stuck for a very long time (> 4 hours). It looks like too many update log records were left over during recovery.

Each MDT has around 80k-100k records, which seems like too many:

wangdi-mac01:~ wangdi$ grep -r "mdt_index 1" /tmp/records  | wc
   76104  913248 15353624
wangdi-mac01:~ wangdi$ grep -r "mdt_index 0" /tmp/records  | wc
   91589 1099068 19376239
wangdi-mac01:~ wangdi$ grep -r "mdt_index 2" /tmp/records  | wc
  102798 1233576 21763151
wangdi-mac01:~ wangdi$ grep -r "mdt_index 3" /tmp/records  | wc
   98332 1179984 20821847

Unfortunately, there are not enough logs to help me understand why so many update records were left.

But it seems we can make cancellation smarter. In the current implementation, when a batchid is committed, only the update records for that batchid are cancelled. We could instead cancel all update records whose batchid < the currently committed batchid. Then even if some update records are left behind for some reason, they can still be deleted when a later batchid is committed.
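The difference between the two cancellation policies can be sketched with a toy model. This is only an illustration, not Lustre code: the `UpdateLog` class and its `add`, `cancel_exact`, and `cancel_up_to` methods are hypothetical names, and the sketch assumes that once a batchid has committed, every record with a smaller batchid is safe to cancel.

```python
class UpdateLog:
    """Toy stand-in for one MDT's update log (hypothetical, not a Lustre API)."""

    def __init__(self):
        # Maps batchid -> list of update records written under that batchid.
        self.records = {}

    def add(self, batchid, record):
        self.records.setdefault(batchid, []).append(record)

    def cancel_exact(self, committed):
        """Current behaviour as described: cancel only the committed batchid."""
        return len(self.records.pop(committed, []))

    def cancel_up_to(self, committed):
        """Proposed behaviour: also cancel every record with a smaller batchid."""
        stale = [b for b in self.records if b <= committed]
        return sum(len(self.records.pop(b)) for b in stale)


log = UpdateLog()
for batchid in (1, 2, 3, 4):
    log.add(batchid, f"rec-{batchid}")

# Only batchid 2 is cancelled; batchid 1 stays behind if its own
# cancellation never happens.
log.cancel_exact(2)
leftover_after_exact = sorted(log.records)

# A later commit of batchid 4 sweeps up everything at or below it.
log.cancel_up_to(4)
leftover_after_sweep = sorted(log.records)
```

With the exact-match policy, a record whose cancellation is missed stays in the log indefinitely. With the sweeping policy, the next committed batchid removes it.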

Issue Links

is related to

LU-8250 MDT recovery stalled on secondary node

Resolved

LU-8794 update_log_dir consuming 1.1TB on MDT0000

Resolved

People

Assignee: Di Wang (Inactive)

Reporter: Di Wang (Inactive)

Votes: 1

Watchers: 5

Dates

Created: 17/Oct/16 8:40 PM

Updated: 30/Jan/22 9:49 AM