[LU-8320] :(llog_osd.c:338:llog_osd_write_rec()) ASSERTION( llh ) failed: Created: 23/Jun/16  Updated: 14/Jun/18  Resolved: 01/Aug/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Critical
Reporter: Mahmoud Hanafi Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: None

Attachments: File 12744.save    
Issue Links:
Related
Severity: 1
Rank (Obsolete): 9223372036854775807

 Description   

MDS crash with LBUG.

<0>LustreError: 39313:0:(llog_osd.c:338:llog_osd_write_rec()) ASSERTION( llh ) failed: 
<0>LustreError: 39313:0:(llog_osd.c:338:llog_osd_write_rec()) LBUG
<4>Pid: 39313, comm: mdt02_049
<4>
<4>Call Trace:
<4> [<ffffffffa048b895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa048be97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa05bed55>] llog_osd_write_rec+0xfb5/0x1370 [obdclass]
<4> [<ffffffffa0d46ecb>] ? dynlock_unlock+0x16b/0x1d0 [osd_ldiskfs]
<4> [<ffffffffa0d2e5d2>] ? iam_path_release+0x42/0x70 [osd_ldiskfs]
<4> [<ffffffffa0590438>] llog_write_rec+0xc8/0x290 [obdclass]
<4> [<ffffffffa059910d>] llog_cat_add_rec+0xad/0x480 [obdclass]
<4> [<ffffffffa0590231>] llog_add+0x91/0x1d0 [obdclass]
<4> [<ffffffffa0fd04f7>] osp_sync_add_rec+0x247/0xad0 [osp]
<4> [<ffffffffa0fd0e2b>] osp_sync_add+0x7b/0x80 [osp]
<4> [<ffffffffa0fc27d6>] osp_object_destroy+0x106/0x150 [osp]
<4> [<ffffffffa0f068e7>] lod_object_destroy+0x1a7/0x350 [lod]
<4> [<ffffffffa0f74880>] mdd_finish_unlink+0x210/0x3d0 [mdd]
<4> [<ffffffffa0f65d35>] ? mdd_attr_check_set_internal+0x275/0x2c0 [mdd]
<4> [<ffffffffa0f75306>] mdd_unlink+0x8c6/0xca0 [mdd]
<4> [<ffffffffa0e37788>] mdo_unlink+0x18/0x50 [mdt]
<4> [<ffffffffa0e3b005>] mdt_reint_unlink+0x835/0x1030 [mdt]
<4> [<ffffffffa0e37571>] mdt_reint_rec+0x41/0xe0 [mdt]
<4> [<ffffffffa0e1ced3>] mdt_reint_internal+0x4c3/0x780 [mdt]
<4> [<ffffffffa0e1d1d4>] mdt_reint+0x44/0xe0 [mdt]
<4> [<ffffffffa0e1fada>] mdt_handle_common+0x52a/0x1470 [mdt]
<4> [<ffffffffa0e5c5f5>] mds_regular_handle+0x15/0x20 [mdt]
<4> [<ffffffffa07750c5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
<4> [<ffffffffa048c5ae>] ? cfs_timer_arm+0xe/0x10 [libcfs]
<4> [<ffffffffa049d8d5>] ? lc_watchdog_touch+0x65/0x170 [libcfs]
<4> [<ffffffffa076da69>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
<4> [<ffffffff81057779>] ? __wake_up_common+0x59/0x90
<4> [<ffffffffa077789d>] ptlrpc_main+0xafd/0x1780 [ptlrpc]
<4> [<ffffffff8100c28a>] child_rip+0xa/0x20
<4> [<ffffffffa0776da0>] ? ptlrpc_main+0x0/0x1780 [ptlrpc]
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
<4>
<0>Kernel panic - not syncing: LBUG
<4>Pid: 39313, comm: mdt02_049 Tainted: G           ---------------  T 2.6.32-504.30.3.el6.20151008.x86_64.lustre253 #1
<4>Call Trace:
<4> [<ffffffff81564fb9>] ? panic+0xa7/0x190
<4> [<ffffffffa048beeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
<4> [<ffffffffa05bed55>] ? llog_osd_write_rec+0xfb5/0x1370 [obdclass]
<4> [<ffffffffa0d46ecb>] ? dynlock_unlock+0x16b/0x1d0 [osd_ldiskfs]
<4> [<ffffffffa0d2e5d2>] ? iam_path_release+0x42/0x70 [osd_ldiskfs]
<4> [<ffffffffa0590438>] ? llog_write_rec+0xc8/0x290 [obdclass]
<4> [<ffffffffa059910d>] ? llog_cat_add_rec+0xad/0x480 [obdclass]
<4> [<ffffffffa0590231>] ? llog_add+0x91/0x1d0 [obdclass]
<4> [<ffffffffa0fd04f7>] ? osp_sync_add_rec+0x247/0xad0 [osp]
<4> [<ffffffffa0fd0e2b>] ? osp_sync_add+0x7b/0x80 [osp]
<4> [<ffffffffa0fc27d6>] ? osp_object_destroy+0x106/0x150 [osp]
<4> [<ffffffffa0f068e7>] ? lod_object_destroy+0x1a7/0x350 [lod]
<4> [<ffffffffa0f74880>] ? mdd_finish_unlink+0x210/0x3d0 [mdd]
<4> [<ffffffffa0f65d35>] ? mdd_attr_check_set_internal+0x275/0x2c0 [mdd]
<4> [<ffffffffa0f75306>] ? mdd_unlink+0x8c6/0xca0 [mdd]
<4> [<ffffffffa0e37788>] ? mdo_unlink+0x18/0x50 [mdt]
<4> [<ffffffffa0e3b005>] ? mdt_reint_unlink+0x835/0x1030 [mdt]
<4> [<ffffffffa0e37571>] ? mdt_reint_rec+0x41/0xe0 [mdt]
<4> [<ffffffffa0e1ced3>] ? mdt_reint_internal+0x4c3/0x780 [mdt]
<4> [<ffffffffa0e1d1d4>] ? mdt_reint+0x44/0xe0 [mdt]
<4> [<ffffffffa0e1fada>] ? mdt_handle_common+0x52a/0x1470 [mdt]
<4> [<ffffffffa0e5c5f5>] ? mds_regular_handle+0x15/0x20 [mdt]
<4> [<ffffffffa07750c5>] ? ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
<4> [<ffffffffa048c5ae>] ? cfs_timer_arm+0xe/0x10 [libcfs]
<4> [<ffffffffa049d8d5>] ? lc_watchdog_touch+0x65/0x170 [libcfs]
<4> [<ffffffffa076da69>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
<4> [<ffffffff81057779>] ? __wake_up_common+0x59/0x90
<4> [<ffffffffa077789d>] ? ptlrpc_main+0xafd/0x1780 [ptlrpc]
<4> [<ffffffff8100c28a>] ? child_rip+0xa/0x20
<4> [<ffffffffa0776da0>] ? ptlrpc_main+0x0/0x1780 [ptlrpc]
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20


 Comments   
Comment by Mahmoud Hanafi [ 23/Jun/16 ]

After the reboot and remount we hit the same LBUG!

Comment by Mahmoud Hanafi [ 23/Jun/16 ]

This is lustre 2.5.3

Comment by Peter Jones [ 23/Jun/16 ]

Looking into details supplied

Comment by Oleg Drokin [ 23/Jun/16 ]

any other messages before this?

Comment by Mahmoud Hanafi [ 23/Jun/16 ]

no.

[-- MARK -- Thu Jun 23 09:00:00 2016]
format at ldlm_pool.c:628:ldlm_pool_recalc doesn't end in newline
LustreError: 39313:0:(llog_osd.c:338:llog_osd_write_rec()) ASSERTION( llh ) failed: 
LustreError: 39313:0:(llog_osd.c:338:llog_osd_write_rec()) LBUG
Pid: 39313, comm: mdt02_049

Call Trace:
 [<ffffffffa048b895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa048be97>] lbug_with_loc+0x47/0xb0 [libcfs]

Entering kdb (current=0xffff8834a41b8ab0, pid 39313) on processor 16 Oops: (null)
due to oops @ 0x0
kdba_dumpregs: pt_regs not available, use bt* or pid to select a different task
Comment by Jay Lan (Inactive) [ 23/Jun/16 ]

Our git repo for 2.5.3 is at
https://github.com/NASAEarthExchange/lustre-nas-fe/commits/nas-2.5.3

Comment by Mahmoud Hanafi [ 23/Jun/16 ]

Do you require any additional info?

Comment by Mahmoud Hanafi [ 23/Jun/16 ]

Any ideas on how we can at least get the mdt mounted and bring up the filesystem?

Comment by Oleg Drokin [ 23/Jun/16 ]

Can you see local variables easily?
We wonder what the loghandle->lgh_id value is.
The current suspicion is that you have a corrupted llog file somehow, and in order to analyze and repair it we need to know the id so that we don't have to nuke all of them.

Comment by Mahmoud Hanafi [ 23/Jun/16 ]

What was the LU that the tool was part of?

Comment by Oleg Drokin [ 23/Jun/16 ]

LU-6696

Comment by Mahmoud Hanafi [ 23/Jun/16 ]

I am a bit confused about how exactly to use the tool. You need me to run it on files in /O/?

Comment by Oleg Drokin [ 23/Jun/16 ]

Yes, pretty much.
If you can get the llog id from the variable - on that one.
Otherwise - I guess on all of them (do you have many?) in some reverse date order, I imagine until it fixes something (it should print a corresponding message).

Comment by Mahmoud Hanafi [ 23/Jun/16 ]

Most of the files are returning this

nbp2-mds /mnt/lustre/nbp2-mdt/O/1/d2 # llog_catfix 12226
Header size : 8192
Time : Thu Jun 23 09:04:29 2016
Number of records: 1544
-----------------------
Llog is not catalog, llh_size: 0, need 64
Comment by Oleg Drokin [ 23/Jun/16 ]

hm, yes, the tool only fixes catalog files.
Anyway it can tell them apart.
Do you have too many to go through all of them?

Comment by Mahmoud Hanafi [ 23/Jun/16 ]

Yes, there are 11k files under /O.

Can we just delete them all? Will they not be regenerated?

Comment by Oleg Drokin [ 23/Jun/16 ]

You can delete them, but that would lead to llog records not being replayed - in other words, space leakage on your OSTs.

The alternative is to find the llog id of the log that's bad - either from the crash, or, since it always crashes, you can just insert a printk there, rebuild just for the MDS, and run it again to get it?
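
As an illustration, here is a minimal sketch of such a printk (this is not the shipped code; the field names follow the systemtap dump later in this ticket and may differ between releases), placed just before the failing assertion in llog_osd_write_rec():

/* Hedged sketch only: name the plain llog that has no header so the next
 * crash (or the console log) identifies the bad file on disk. */
if (loghandle->lgh_hdr == NULL)
        CERROR("llog seq %llu id %llu (last_idx %d) has no header\n",
               (unsigned long long)loghandle->lgh_id.lgl_oi.oi.oi_seq,
               (unsigned long long)loghandle->lgh_id.lgl_oi.oi.oi_id,
               loghandle->lgh_last_idx);
LASSERT(loghandle->lgh_hdr != NULL);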

Comment by Nathan Dauchy (Inactive) [ 23/Jun/16 ]

Can we assume it is one of the files modified in the last day or so that is the problem, and hit them all with something like this?

find /mnt/lustre/nbp2-mdt/O/1 -type f -mtime -2 | grep -v LAST_ID | xargs -t -I {} llog_catfix {} 2>&1 | tee /tmp/llog_catfix.out
Comment by Oleg Drokin [ 23/Jun/16 ]

I think that's reasonable to assume, but there's no guarantee. The records might have been there since last reboot too.

In the worst case you can always remove the CATALOGS file I imagine and it'll forget all about it (with the leaked space problem of course - so save that file too if you do that).
Best of all is to find the llog id of the problematic llog and treat just that.

Comment by Oleg Drokin [ 23/Jun/16 ]

so that lgh_id=

{...}

- what's inside?

Actually, this must be a wrong one? lgh_hdr is not NULL here.

Also, surprisingly the log record is a configuration one somehow:
OBD_CFG_REC = LLOG_OP_MAGIC | 0x20000,

Comment by Mahmoud Hanafi [ 23/Jun/16 ]

I used systemtap to trace the function, and this is from when we hit the LBUG:

 0 llog_process_th(9252):->llog_osd_write_rec env={.le_ctx={...}, .le_ses=0xffff881c7aca7ba0} loghandle={.lgh_lock={...}, .lgh_hdr_lock={...}, .lgh_id={...}, .lgh_hdr=0x0, .lgh_obj=0xffff883e7ccb9d40, .lgh_last_idx=0, .lgh_cur_idx=0, .lgh_cur_offset=0, .lgh_ctxt=0xffff883e7d6e8640, .u={...}, .lgh_name="<unknown>", .private_data=0xffff883e8410a340, .lgh_logops=0xffffffffa1013a80, .lgh_refcount={...}} rec={.lrh_len=64, .lrh_index=0, .lrh_type=274989056, .lrh_id=0} reccookie={.lgc_lgl={...}, .lgc_subsys=0, .lgc_index=0, .lgc_padding=0} cookiecount=1 buf=

Comment by Oleg Drokin [ 23/Jun/16 ]

Can you get the .lgh_id= content please? That's what should tell us the filename

Comment by Mahmoud Hanafi [ 23/Jun/16 ]
0 llog_process_th(9553):->llog_osd_write_rec env={.le_ctx={.lc_tags=268435457, .lc_state=2, .lc_thread=0x0, .lc_value=0xffff881d24328600, .lc_remember={.next=0xffff883e8f777b78, .prev=0xffff883e8f777b78}, .lc_version=37, .lc_cookie=0}, .le_ses=0xffff883e8f777ba0} loghandle={.lgh_lock={.count=-4294967295, .wait_lock={.raw_lock={.slock=0}}, .wait_list={.next=0xffff881c6e34fbd0, .prev=0xffff881c6e34fbd0}}, .lgh_hdr_lock={.raw_lock={.slock=0}}, .lgh_id={.lgl_oi={<union>={.oi={.oi_id=12744, .oi_seq=1}, .oi_fid={.f_seq=12744, .f_oid=1, .f_ver=0}}}, .lgl_og


Comment by Oleg Drokin [ 23/Jun/16 ]

So the file we are looking at is named either 12744 or 31c8 (the same id in hex).

Try the fixing program on it, and if it does not help, just move it away (please retain it for further analysis).

Comment by Mahmoud Hanafi [ 23/Jun/16 ]

found this file

nbp2-mds /mnt/lustre/nbp2-mdt/O # find . -name "12744"
./1/d8/12744
 llog_catfix ./1/d8/12744
Header size : 8192
Time : Tue Feb 23 14:44:48 2016
Number of records: 8
-----------------------
Llog is not catalog, llh_size: 0, need 64

Removed the file and mounted without the LBUG, so far. osp-sync threads are running.

Comment by Mahmoud Hanafi [ 24/Jun/16 ]

Filesystem is back up. Would you like to get a copy of the bad file?

Comment by Nathan Dauchy (Inactive) [ 24/Jun/16 ]

Since we have seen a form of this issue a couple of times now, can we get the work that was started in LU-6696 and LU-7011 updated and landed?

Comment by Oleg Drokin [ 24/Jun/16 ]

Yes, please attach the file here

Comment by Nathan Dauchy (Inactive) [ 24/Jun/16 ]

FYI, it looks like we hit the same bug again:

<0>LustreError: 37590:0:(llog_osd.c:338:llog_osd_write_rec()) ASSERTION( llh ) failed: 
<0>LustreError: 37590:0:(llog_osd.c:338:llog_osd_write_rec()) LBUG
<4>Pid: 37590, comm: mdt03_120
<4>
<4>Call Trace:
<4> [<ffffffffa0487895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa0487e97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa05bad55>] llog_osd_write_rec+0xfb5/0x1370 [obdclass]
<4> [<ffffffffa0d49ecb>] ? dynlock_unlock+0x16b/0x1d0 [osd_ldiskfs]
<4> [<ffffffffa0d315d2>] ? iam_path_release+0x42/0x70 [osd_ldiskfs]
<4> [<ffffffffa058c438>] llog_write_rec+0xc8/0x290 [obdclass]
<4> [<ffffffffa059510d>] llog_cat_add_rec+0xad/0x480 [obdclass]
<4> [<ffffffffa058c231>] llog_add+0x91/0x1d0 [obdclass]
<4> [<ffffffffa0fd34f7>] osp_sync_add_rec+0x247/0xad0 [osp]
<4> [<ffffffffa0fd3e2b>] osp_sync_add+0x7b/0x80 [osp]
<4> [<ffffffffa0fc57d6>] osp_object_destroy+0x106/0x150 [osp]
<4> [<ffffffffa0f098e7>] lod_object_destroy+0x1a7/0x350 [lod]
<4> [<ffffffffa0f77880>] mdd_finish_unlink+0x210/0x3d0 [mdd]
<4> [<ffffffffa0f68d35>] ? mdd_attr_check_set_internal+0x275/0x2c0 [mdd]
<4> [<ffffffffa0f78306>] mdd_unlink+0x8c6/0xca0 [mdd]
<4> [<ffffffffa0e3a788>] mdo_unlink+0x18/0x50 [mdt]
<4> [<ffffffffa0e3e005>] mdt_reint_unlink+0x835/0x1030 [mdt]
<4> [<ffffffffa0e3a571>] mdt_reint_rec+0x41/0xe0 [mdt]
<4> [<ffffffffa0e1fed3>] mdt_reint_internal+0x4c3/0x780 [mdt]
<4> [<ffffffffa0e201d4>] mdt_reint+0x44/0xe0 [mdt]
<4> [<ffffffffa0e22ada>] mdt_handle_common+0x52a/0x1470 [mdt]
<4> [<ffffffffa0e5f5f5>] mds_regular_handle+0x15/0x20 [mdt]
<4> [<ffffffffa07710c5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
<4> [<ffffffffa04998d5>] ? lc_watchdog_touch+0x65/0x170 [libcfs]
<4> [<ffffffffa0769a69>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
<4> [<ffffffffa077389d>] ptlrpc_main+0xafd/0x1780 [ptlrpc]
<4> [<ffffffff8100c28a>] child_rip+0xa/0x20
<4> [<ffffffffa0772da0>] ? ptlrpc_main+0x0/0x1780 [ptlrpc]
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
<4>
<0>Kernel panic - not syncing: LBUG

We will try to use yesterday's process to identify and fix a bad llog.

Please stand by for more info and let us know if there are additional debug steps we should take.

Comment by Mahmoud Hanafi [ 24/Jun/16 ]

This is the file with bad record causing the LBUG.

Comment by Mahmoud Hanafi [ 24/Jun/16 ]

Today's crash was also from a file dated in Feb. Since we can't tell which files are bad, I deleted all files from Feb.

Comment by Mahmoud Hanafi [ 24/Jun/16 ]

I think we must have more of these corrupted files. Is there a quick method for finding them?

Comment by Peter Jones [ 24/Jun/16 ]

Mike

Please can you advise as to what is not handled in the attached llog file and how this could be better handled?

Thanks

Peter

Comment by Mahmoud Hanafi [ 24/Jun/16 ]

Would it make sense to handle this assertion gracefully rather than just hitting an LBUG?

Comment by Mahmoud Hanafi [ 24/Jun/16 ]

If llog_catfix can't read the file and errors out with

Llog is not catalog, llh_size: 0, need 64

does that mean the file is corrupted and can potentially trigger this LBUG?

Comment by Mikhail Pershin [ 26/Jun/16 ]

First of all, llog_init_handle() should handle llog_read_header() errors correctly, not allowing further use of such a llog_handle. This will prevent similar LBUGs; that mishandling is the origin of this issue.
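
To illustrate the direction (an assumption-laden sketch, not the actual patch - the real change is in the Gerrit links below), llog_init_handle() would refuse to hand back a usable handle when the header cannot be read:

/* Illustrative sketch only: fail llog_init_handle() when the header cannot
 * be read, so callers never reach llog_osd_write_rec() with
 * loghandle->lgh_hdr == NULL. */
rc = llog_read_header(env, handle, uuid);
if (rc != 0) {
        CERROR("llog: cannot read header, rc = %d; refusing to use this handle\n", rc);
        RETURN(rc);     /* leave lgh_hdr unset instead of continuing */
}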

Comment by Mahmoud Hanafi [ 26/Jun/16 ]

Can we expect a patch? Any idea what is causing the corruption?

Comment by Mikhail Pershin [ 26/Jun/16 ]

Mahmoud, the message about 'Llog is not catalog' is not an error or corruption; it is just saying that the llog file is not a catalog, which is what the llog_catfix tool expects. I am checking 12744.save now. It looks like a plain llog with unlink records in it; I am still looking through it.

Yes, I am working on a patch, at least to handle llog_read_header() errors gracefully.

Comment by Mahmoud Hanafi [ 28/Jun/16 ]

Any updates?

Comment by Mikhail Pershin [ 29/Jun/16 ]

Hello, am I right that you are using 2.5.3 Lustre with additional patches? Are there any patches related to the llog subsystem? Could you make a list of them?

I have found one suspicious thing in the corrupted llog and am checking the code now to understand how that may happen.

Comment by Peter Jones [ 29/Jun/16 ]

Mike is setting up a github account. I will provide the details to Jay so he can grant the appropriate access to see NASA's tree

Comment by Mahmoud Hanafi [ 29/Jun/16 ]

FYI, we have a mix of 2.5.3, 2.7.1, and 2.7.2. We are moving all of them to 2.7.2, but that will take some time, so we would need patches for 2.5.3 and 2.7.1. Jay will give you access to the repo.

Comment by Mikhail Pershin [ 30/Jun/16 ]

Mahmoud, in your very first comment with systemtap info, the llog_process_th is 9252, but in the next one it is 9553. Could it be that this is another thread, and the related llog file is different from the one causing the assertion? Do you still have any data related to that or a similar LBUG?

Comment by Mahmoud Hanafi [ 30/Jun/16 ]

The 9252 was for a previous crash where I didn't print out the full data structure. The second time (9553) I printed the correct structures.
We did have a crash the next day where I used the systemtap script to find the file and delete it, but I didn't save a copy.

Comment by Mikhail Pershin [ 01/Jul/16 ]

OK, got it. Another question then - there should be a CATALOGS file with just a list of catalog llog ids in it; could you look in it and provide those catalog llogs for inspection? No need to remove them, I just want to check how big they are and what is inside.

Comment by Gerrit Updater [ 01/Jul/16 ]

Mike Pershin (mike.pershin@intel.com) uploaded a new patch: http://review.whamcloud.com/21128
Subject: LU-8320 llog: prevent llog ID re-use.
Project: fs/lustre-release
Branch: b2_5
Current Patch Set: 1
Commit: a051f8c40337d9a577bee88dab1b50477d84061d

Comment by Mikhail Pershin [ 01/Jul/16 ]

I think I've found the reason for this issue: it looks like the newly generated llog ID may cycle through zero and match some very old llog files. In that case it is considered a new llog (not yet initialized), but since it already exists on disk, several checks pass when they shouldn't, and we hit that assertion. That scenario fits well with what I see in the llog file provided and with the assertion on llh == NULL, as well as with your comment that the problem files are very old.
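
To make the failure mode concrete, here is a small, self-contained toy program (every name and size in it is invented for illustration; the real id counter and the on-disk lookup live in the MDS llog code). It shows how a wrapping id counter can hand out the id of a stale llog file that is still on disk, and how skipping ids that still exist - the direction of the "prevent llog ID re-use" patches in this ticket - avoids the collision:

#include <stdio.h>
#include <stdbool.h>

#define ID_SPACE 8                      /* tiny id space so the wrap is visible */
static bool llog_file_exists[ID_SPACE]; /* stand-in for "a plain llog with this id is still on disk" */

/* Allocator that trusts the counter alone: after cycling through zero it can
 * return an id whose old llog file still exists (the crash scenario above). */
static unsigned alloc_trusting_counter(unsigned *ctr)
{
        *ctr = (*ctr + 1) % ID_SPACE;
        return *ctr;
}

/* Allocator in the spirit of the fix: skip ids that still name a file. */
static unsigned alloc_skipping_existing(unsigned *ctr)
{
        do {
                *ctr = (*ctr + 1) % ID_SPACE;
        } while (llog_file_exists[*ctr]);
        return *ctr;
}

int main(void)
{
        unsigned ctr, i, id;

        llog_file_exists[2] = true;     /* a stale llog left behind long ago */

        ctr = ID_SPACE - 3;             /* counter is about to cycle through zero */
        for (i = 0; i < 7; i++) {
                id = alloc_trusting_counter(&ctr);
                printf("trusting counter -> id %u%s\n", id,
                       llog_file_exists[id] ? "   <-- collides with stale llog" : "");
        }

        ctr = ID_SPACE - 3;
        for (i = 0; i < 7; i++)
                printf("skipping existing -> id %u\n", alloc_skipping_existing(&ctr));
        return 0;
}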

Comment by Mahmoud Hanafi [ 01/Jul/16 ]

Great! Do you still need the CATALOGS file? Is this an issue with 2.7? If so, we will need a patch for that as well.

Why would we have old llog files still around?

Comment by Mikhail Pershin [ 01/Jul/16 ]

Also, I'd recommend applying the patch from LU-5297 in addition to LU-7079 (already in your tree), which might help to avoid having very old llogs.
Consider also the patch from LU-4528 (http://review.whamcloud.com/#/c/11751/), which wasn't merged to 2.5 - it helps to avoid several types of corruption we had in the past. It is already in 2.7, so maybe this is not so critical if you are moving to 2.7.

Comment by Mikhail Pershin [ 01/Jul/16 ]

Mahmoud, there was a bug in OSP which caused some llogs to stop being processed after a certain point, so they stay around forever. This was fixed by LU-7079, which was merged into your tree on May 25; could it be that the failed node did not have that patch applied? Also, the patch from LU-5297 solved a similar problem which may cause stuck llog files.

I don't need the CATALOGS file right now, but if you have time, I'd like to look at the llog files listed in it to check how many plain llogs are in use.

Comment by Gerrit Updater [ 01/Jul/16 ]

Mike Pershin (mike.pershin@intel.com) uploaded a new patch: http://review.whamcloud.com/21130
Subject: LU-8320 llog: prevent llog ID re-use.
Project: fs/lustre-release
Branch: b2_7
Current Patch Set: 1
Commit: d5e1cfbd9bd5bfdcd6d5c6b029e4cf578c9e75b8

Comment by Jay Lan (Inactive) [ 01/Jul/16 ]

The 2.5.3 Lustre server is still running 2.5.3-6nasS, and the LU-7079 patch was included in 2.5.3-6.1nasS.

Comment by Jay Lan (Inactive) [ 01/Jul/16 ]

The LU-5297 patch has not landed to b2_7_fe yet. I cherry-picked it to our nas-2.7.1 and nas-2.7.2 anyway (locally, not pushed to github yet). OTOH, the LU-5297 patch caused conflicts in nas-2.5.3. Since there is a workaround (i.e., find the offending file and remove it), I think we do not need LU-5297 on nas-2.5.3.

So, we have LU-4528, LU-7079, LU-6696, and LU-5297 on nas-2.7.1 and nas-2.7.2. I tried the LU-8320 patch and it applied cleanly on nas-2.7.x, but I will wait until it gets reviewed.

Comment by Gerrit Updater [ 04/Jul/16 ]

Mike Pershin (mike.pershin@intel.com) uploaded a new patch: http://review.whamcloud.com/21144
Subject: LU-8320 llog: prevent llog ID re-use.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 722f308635f118d00a5c4a44fa72d18986ccdac9

Comment by Gerrit Updater [ 01/Aug/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21144/
Subject: LU-8320 llog: prevent llog ID re-use.
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a93ede18ababa3fe1ae8f4a5f92e868589a58cb6

Comment by Peter Jones [ 01/Aug/16 ]

Landed for 2.9

Comment by Nathan Dauchy (Inactive) [ 29/Aug/16 ]

Please re-open until the backport patch lands to 2.7 FE.
