
LU-7426: DNE3: improve llog format for remote update llog

Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.8.0

    Description

      The current llog bitmap header and record structure is not well suited to recording DNE update llogs, especially remote update llogs.

      In DNE, update records are written via top_trans_start()->sub_updates_write()->llog_osd_write_rec()->osp_md_write(), where the write buffer (header bitmap + record) is packed into the RPC buffer; the RPC itself is only sent during transaction stop.
      To avoid the bitmap being overwritten, these RPCs need to be serialized and sent in a fixed order.

      There are still other problems. llog_osd_write_rec() packs header information (bitmap, lgh_index, offset, etc.) into the RPC buffer, but the RPC is not sent until transaction stop. By the time it is sent, that information may already be stale, especially if a previous llog update RPC has failed. Since these RPCs are executed on a remote MDT, applying a stale header will definitely corrupt the update llog.
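      To make the failure mode concrete, here is a rough sketch of the kind of state that is effectively frozen into the deferred RPC; the struct and field names below are purely illustrative, not actual Lustre symbols:

      #include <stdint.h>

      /* Illustration only: llog header state that gets snapshotted into the
       * update RPC when the record is packed.  By the time the RPC is finally
       * sent at transaction stop, any of these values may no longer match the
       * llog on the remote MDT, for example if an earlier llog update RPC
       * failed. */
      struct llog_write_snapshot {
              uint32_t idx;         /* record index chosen at pack time */
              uint64_t offset;      /* file offset derived from that index */
              uint8_t  bitmap[8];   /* bitmap chunk with the bit already set */
              uint32_t count;       /* record count the header would be set to */
      };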

      There are a few options to fix the problem:

      1. Once any llog update RPC fails, all of the following RPCs in the sending list fail as well; OSP then refreshes the header, and new RPCs are packed with the updated header. This is the 2.8 approach in patch http://review.whamcloud.com/16969 "LU-7039 llog: update llog header and size"; it is easy to implement but not very nice.

      2. Add a special llog API to the OSP->OUT interface, so that OUT can handle these llog update records on its own instead of being driven only through the dumb write API used by llog_osd_write_rec(). For example, a single RPC could carry the following llog update steps (see the sketch after this list):
      1. Append the record.
      2. Get the llog index.
      3. Update the record with the index.
      4. Set the bitmap bit for that index.

      3. Use a different format for the update llog.
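
      As a rough illustration of option 2, the sketch below shows what a dedicated "llog append" handler on the OUT side might look like. All names here (out_llog_append(), the request struct, the two helpers) are hypothetical and do not exist in Lustre today; the point is only that index allocation, record stamping and the bitmap update all happen on the target, inside one local transaction, so the client never has to guess them in advance.

      /* Hypothetical OUT-side handler for a dedicated llog-append update op.
       * Types such as llog_logid, llog_rec_hdr, llog_handle and thandle are
       * real Lustre types; everything else is an illustrative sketch. */

      struct out_llog_append_req {
              struct llog_logid    ola_logid;   /* plain llog to append to */
              __u32                ola_reclen;  /* length of the record body */
              char                 ola_rec[0];  /* record, index not yet set */
      };

      static int out_llog_append(const struct lu_env *env,
                                 struct llog_handle *lgh,
                                 struct out_llog_append_req *req,
                                 struct thandle *th)
      {
              struct llog_rec_hdr *rec = (struct llog_rec_hdr *)req->ola_rec;
              int idx;

              /* steps 1+2: pick the next free index under the local llog lock */
              idx = llog_next_free_index(lgh);               /* hypothetical helper */
              if (idx < 0)
                      return idx;

              /* step 3: stamp the record with the index chosen on the target */
              rec->lrh_index = idx;

              /* step 4: set the bitmap bit and write header + record under the
               * same transaction handle */
              return llog_write_indexed_rec(env, lgh, rec, idx, th); /* hypothetical */
      }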

      Any thoughts?


          Activity

            [LU-7426] DNE3: improve llog format for remote update llog

            gerrit Gerrit Updater added a comment -

            "Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57261
            Subject: LU-7426 obdclass: indexed llog
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f35ad4ef85c96432424e1625c50158d82305ae69

            di.wang Di Wang added a comment -

            I just measured the changelog performance with mdtest.
            1. mdtest -i 1 -n 5000 -F -u -a POSIX, 20 clients, 60 threads per client.
            2. Default changelog mask (MARK).
            3. sudo lctl set_param mdd.fszeta0-MDT0000.changelog_deniednext=1200
            4. Lustre 2.15.5

            Without changelog enabled:
            SUMMARY rate (in ops/sec): (of 3 iterations)
            Operation          Max          Min         Mean    Std Dev
            ---------          ---          ---         ----    -------
            File creation   120085.547   109367.150   113125.584   6033.933
            File stat       431459.420   420643.784   425807.279   5424.351
            File read       166476.390   161662.221   164014.216   2408.975
            File removal    102514.932   101913.583   102251.922    307.671
            Tree creation       16.867        9.483       14.099      4.024
            Tree removal         9.211        8.388        8.672      0.467
            -- finished at 08/29/2024 21:53:12 --

            With changelog enabled:
            SUMMARY rate (in ops/sec): (of 1 iterations)
            Operation          Max          Min         Mean    Std Dev
            ---------          ---          ---         ----    -------
            File creation    69770.840    69770.840    69770.840      0.000
            File stat       423102.635   423102.635   423102.635      0.000
            File read       160163.030   160163.030   160163.030      0.000
            File removal     62689.775    62689.775    62689.775      0.000
            Tree creation       20.055       20.055       20.055      0.000
            Tree removal         7.858        7.858        7.858      0.000
            -- finished at 09/04/2024 04:09:01 --

            So open/create drops by around 50%, and unlink drops by around 40%.

            Here is the perf trace during open/create (see attachment).

            It looks like llog_cat_add_rec()->down() is the culprit, so I think Alex's proposal might fix this performance issue.


            bzzz Alex Zhuravlev added a comment -

            no, thank you a lot for spending time on this!
            di.wang Di Wang added a comment -

            Oh, yes, a unique offset. We only need the record size in this case. Sorry for being dumb.


            bzzz Alex Zhuravlev added a comment -

            no need to handle failure, as this offset is unique and we don't care about gaps. nothing special, and no extra locks. just a unique offset and a unique key.
            di.wang Di Wang added a comment - - edited

            yes, write at the offset you want (can be object's size as now or can be some virtual counter we maintain in memory, doesn't matter), then insert with a value containing this offset. dt_write() + dt_insert - should be trivial?

            Hmm, I do not think maintaining this virtual counter over the wire is trivial; you still need to handle the failure of the previous transaction, i.e. this does not eliminate the dependency between records. Maybe adding "special" stuff to the write RPC, or adding an extra special smart RPC (append + insert), is not a bad idea.

            Another idea might be to use a normal lock to protect the update RPC the whole time, so even though the content of the object is already inside the RPC, it is still covered by the lock.


            bzzz Alex Zhuravlev added a comment -

            yes, write at the offset you want (can be object's size as now or can be some virtual counter we maintain in memory, doesn't matter), then insert with a value containing this offset. dt_write() + dt_insert - should be trivial?

            personally, I'm not a big fan of introducing some special things like setbit... I think we have everything (or almost) in place to implement that.
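
            A minimal sketch of the write-then-insert pattern described above, assuming the existing dt_record_write()/dt_insert() helpers (declare phases, credits and error paths are omitted, and exact dt_* signatures vary between Lustre versions). The value struct, the key choice and unique_offset_for() are illustrative assumptions, not existing code:

            /* Sketch: append the record blob at a per-log-unique offset, then
             * insert key -> {offset, length} into a regular index, both in the
             * same transaction.  No bitmap, no dependency on the previous record. */

            struct llog_idx_val {                  /* hypothetical index value */
                    __u64   liv_offset;            /* where the record body starts */
                    __u32   liv_len;               /* record body length */
            };

            static int llog_append_indexed(const struct lu_env *env,
                                           struct dt_object *blob,
                                           struct dt_object *idx,
                                           __u64 recno, const struct lu_buf *rec,
                                           struct thandle *th)
            {
                    struct llog_idx_val val;
                    loff_t pos = unique_offset_for(blob, recno); /* hypothetical:
                                                                  * object size or an
                                                                  * in-memory counter */
                    int rc;

                    val.liv_offset = pos;
                    val.liv_len = rec->lb_len;

                    /* write the record body at an offset that is never reused */
                    rc = dt_record_write(env, blob, rec, &pos, th);
                    if (rc != 0)
                            return rc;

                    /* same transaction: key = record number, value = offset/length */
                    return dt_insert(env, idx, (const struct dt_rec *)&val,
                                     (const struct dt_key *)&recno, th);
            }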
            di.wang Di Wang added a comment -

            Oh, I do not mean transaction consistency. I mean the consistency between the index and the blob. I assume you need to write first, get the offset (or something similar), and then insert the offset into the index file. How do you do that with the current OSP API?

            Another idea might be to put the llog format below the OSD (like IAM), so that setbit can actually become an index insert API over OSP.

            bzzz Alex Zhuravlev added a comment - - edited

            how can it be inconsistent if the insert and the write are in the same transaction? the benefit of an index is that all highly concurrent operations (tree modifications) are done locally, and logically there is zero concurrency.

            one of the problems with the current llog is that we can't modify a single bit (sorry, introducing bit-wise operations doesn't sound exciting to me). with an index this isn't an issue at all.

            another problem is gaps. the current llog can't work properly with gaps. again, with the index and the blob in different files this isn't a problem, as a record itself may contain an offset into the blob.

            of course, I mean the regular indices we already have, nothing special. binary keys (already supported by the API and by ldiskfs/zfs) may be useful; we need to check all the use cases.

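            To illustrate the point about gaps, here is a hedged sketch of record cancellation in such a scheme, using the existing dt_delete() helper; llog_cancel_indexed() and the key layout are assumed names, not existing code:

            /* Sketch only: cancelling a record is just deleting its key from the
             * index.  There is no bitmap bit to clear, no neighbouring records are
             * touched, and the stale bytes in the blob file are simply ignored by
             * readers, which iterate the index rather than the blob. */
            static int llog_cancel_indexed(const struct lu_env *env,
                                           struct dt_object *idx, __u64 recno,
                                           struct thandle *th)
            {
                    return dt_delete(env, idx, (const struct dt_key *)&recno, th);
            }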
            di.wang Di Wang added a comment - - edited

            Hmm, I am not sure what you mean by a "non-concurrent" way. For me, the problem is that the current llog format requires strong consistency (bitmap vs. record) and dependencies (the current record vs. the next record), which makes things even more complicated over the wire, i.e. we need something more independent for adding/deleting records. That is why I am curious about your index file proposal. Sorry, I still do not understand how it removes the dependency.

            For example, when adding one record, how do you make sure the index insert is consistent with the blob? By consistency I mean it is atomic and the key in the index file matches the blob location. Note: I mean building this with the normal index insert and write APIs, no special stuff. Otherwise I do not see how it is better than Andreas' proposal. Please correct me if I missed something. Thanks.


            People

              Assignee: wc-triage WC Triage
              Reporter: di.wang Di Wang
              Votes: 0
              Watchers: 7
