[LU-5859] Running lfs changelog with no registered user results in LBUG
Created: 04/Nov/14  Updated: 05/Jun/15  Resolved: 25/Jan/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: Lustre 2.7.0

Type: Bug Priority: Blocker
Reporter: Michael MacDonald (Inactive) Assignee: Bruno Faccini (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

Seen on master b6a3222. Robert confirmed he saw it on a master build from the previous day.


Attachments: File LU-5859.sh    
Issue Links:
Duplicate
duplicates LU-5877 lfs changelog hangs after lfs changel... Closed
Related
is related to LU-5038 Mount hangs for hours processing some... Resolved
is related to LU-5877 lfs changelog hangs after lfs changel... Closed
Severity: 3
Rank (Obsolete): 16400

 Description   

See the attached reproducer script for details.

In short, running lfs changelog after deregistering the last registered changelog user results in an LBUG when the client is unmounted. Running lfs changelog with no registered users is clearly a user error, but it should not trigger an LBUG.



 Comments   
Comment by Michael MacDonald (Inactive) [ 04/Nov/14 ]

Running the attached script results in the following:

...
+ lctl --device LU-5859-MDT0000 changelog_deregister cl1
LU-5859-MDT0000: Deregistered changelog user 'cl1'
+ sleep 1
+ lfs changelog LU-5859-MDT0000
+ kill -9 5798
+ umount /tmp/LU-5859/client

Message from syslogd@test1 at Nov  4 12:32:42 ...
 kernel:LustreError: 5800:0:(llog.c:854:llog_write()) ASSERTION( loghandle->lgh_obj != ((void *)0) ) failed:

Message from syslogd@test1 at Nov  4 12:32:42 ...
 kernel:LustreError: 5800:0:(llog.c:854:llog_write()) LBUG

Comment by Andreas Dilger [ 05/Nov/14 ]

Mike, can you please paste the whole stack into the bug?

Comment by Henri Doreau (Inactive) [ 05/Nov/14 ]

I was bisecting this to determine whether LU-1996 was at fault. It appears to have been introduced by LU-5038 instead.

Here is a stack trace:
crash> bt
PID: 3610 TASK: ffff8801fea80ae0 CPU: 5 COMMAND: "mdc_clg_send_th"
#0 [ffff88020be279f8] machine_kexec at ffffffff8103900b
#1 [ffff88020be27a58] crash_kexec at ffffffff810c62c2
#2 [ffff88020be27b28] panic at ffffffff8152896e
#3 [ffff88020be27ba8] lbug_with_loc at ffffffffa0297eeb [libcfs]
#4 [ffff88020be27bc8] llog_write at ffffffffa03d6ae4 [obdclass]
#5 [ffff88020be27c18] llog_cancel_rec at ffffffffa03d6c8f [obdclass]
#6 [ffff88020be27c68] llog_cat_cleanup at ffffffffa03db89c [obdclass]
#7 [ffff88020be27c98] llog_cat_process_cb at ffffffffa03dc7fd [obdclass]
#8 [ffff88020be27cf8] llog_process_thread at ffffffffa03d7b1f [obdclass]
#9 [ffff88020be27da8] llog_process_or_fork at ffffffffa03d9817 [obdclass]
#10 [ffff88020be27df8] llog_cat_process_or_fork at ffffffffa03dab7d [obdclass]
#11 [ffff88020be27e88] llog_cat_process at ffffffffa03dace9 [obdclass]
#12 [ffff88020be27ea8] mdc_changelog_send_thread at ffffffffa08db54b [mdc]
#13 [ffff88020be27ee8] kthread at ffffffff8109af86
#14 [ffff88020be27f48] kernel_thread at ffffffff8100c20a
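Read bottom-up, the backtrace shows the client-side changelog thread (mdc_clg_send_th, running mdc_changelog_send_thread()) walking the catalog with llog_cat_process(). For one entry, the callback llog_cat_process_cb() calls llog_cat_cleanup(), which cancels the catalog record through llog_cancel_rec() and llog_write(), and llog_write() asserts that the handle has a local backing object (loghandle->lgh_obj). The following is a minimal user-space model of that chain, not Lustre code. The struct, all model_* names and the plain_log_missing flag are hypothetical. Two assumptions go beyond what this ticket states: that a catalog the client reads through the MDC has no local object (lgh_obj == NULL), and that the cleanup is triggered by a catalog entry whose plain log no longer exists. Where the kernel would LBUG, the model returns -1 instead.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for a Lustre llog handle (not the real struct). */
struct model_llog_handle {
    void *lgh_obj; /* local backing object; assumed NULL for a catalog read remotely */
};

/* Models llog_write(): the real code asserts lgh_obj != NULL and LBUGs. */
static int model_llog_write(struct model_llog_handle *h)
{
    if (h->lgh_obj == NULL) {
        fprintf(stderr, "ASSERTION( loghandle->lgh_obj != NULL ) would fail here\n");
        return -1; /* the kernel panics instead of returning */
    }
    return 0;
}

/* Models llog_cancel_rec(), which ends up in llog_write() per the backtrace. */
static int model_llog_cancel_rec(struct model_llog_handle *cat)
{
    return model_llog_write(cat);
}

/* Models llog_cat_cleanup(): cancel the catalog entry of a missing plain log. */
static int model_llog_cat_cleanup(struct model_llog_handle *cat)
{
    return model_llog_cancel_rec(cat);
}

/* Models llog_cat_process_cb(), called once per catalog entry.
 * plain_log_missing is a hypothetical flag for an orphaned entry. */
static int model_llog_cat_process_cb(struct model_llog_handle *cat,
                                     int plain_log_missing)
{
    if (plain_log_missing)
        return model_llog_cat_cleanup(cat);
    return 0;
}

/* Usage: only the handle without a backing object reaches the failure. */
static void model_demo(void)
{
    int backing = 0;
    struct model_llog_handle remote = { .lgh_obj = NULL };
    struct model_llog_handle local = { .lgh_obj = &backing };

    assert(model_llog_cat_process_cb(&remote, 1) == -1);
    assert(model_llog_cat_process_cb(&local, 1) == 0);
}
```

Calling model_demo() from a main() exercises both paths: an orphaned entry in a catalog with no local object reaches the branch where the kernel would hit the assertion, while the same entry in a locally backed catalog is cancelled normally.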

Comment by Robert Read (Inactive) [ 05/Nov/14 ]

I'm also seeing an LBUG when unmounting the client after testing changelogs.

Comment by Michael MacDonald (Inactive) [ 06/Nov/14 ]

I can confirm that LU-5038 appears to be the culprit: with that commit reverted in a local build, my reproducer script no longer causes an LBUG.

Comment by Andreas Dilger [ 02/Dec/14 ]

Alex, could you please offer some advice on how to fix this problem? It was apparently introduced by your patch http://review.whamcloud.com/10308

Comment by Michael MacDonald (Inactive) [ 14/Jan/15 ]

Hi, just looking for a status update on this ticket. It seems like it should be a blocker. Any input, bzzz?

Comment by Alex Zhuravlev [ 15/Jan/15 ]

Michael, looking at it now. Thanks for the reminder.

Comment by Gerrit Updater [ 15/Jan/15 ]

Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: http://review.whamcloud.com/13414
Subject: LU-5859 llog: do not cleanup orphans in remote catalogs
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c418baa20e78da556916f48d5dc407fa6e4aab50

Comment by Gerrit Updater [ 25/Jan/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13414/
Subject: LU-5859 llog: do not cleanup orphans in remote catalogs
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 97df2f4cae374130c057cbf1168ad1427c96cbc5
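The landed fix is titled "llog: do not cleanup orphans in remote catalogs", but the diff is not reproduced in this ticket, so the sketch below only illustrates that idea under stated assumptions. It uses a hypothetical model handle: orphaned entries are cancelled only when the catalog has a local backing object, and a remotely read catalog, such as the one processed by the client's changelog thread, skips them. The names model_llog_handle, model_cat_is_remote(), model_process_orphan() and model_fix_demo() are all hypothetical, and treating lgh_obj == NULL as the "remote" test is an assumption, not necessarily the check the real patch performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Lustre llog handle (not the real struct). */
struct model_llog_handle {
    void *lgh_obj;  /* local backing object; assumed NULL when read remotely */
    int cancelled;  /* counts catalog records cancelled in this model */
};

/* Assumption: a catalog without a local object is a remote (client-side) one. */
static bool model_cat_is_remote(const struct model_llog_handle *cat)
{
    return cat->lgh_obj == NULL;
}

/* Handles a catalog entry whose plain log no longer exists (an orphan). */
static int model_process_orphan(struct model_llog_handle *cat)
{
    /* Modelled fix: never cancel records in a catalog with no local
     * object; skip the orphan instead of reaching the write path. */
    if (model_cat_is_remote(cat))
        return 0;
    /* Local catalog: cancelling is safe because lgh_obj exists. */
    cat->cancelled++;
    return 0;
}

/* Usage: the remote catalog is left untouched, the local one is cleaned. */
static void model_fix_demo(void)
{
    int backing = 0;
    struct model_llog_handle remote = { .lgh_obj = NULL, .cancelled = 0 };
    struct model_llog_handle local = { .lgh_obj = &backing, .cancelled = 0 };

    assert(model_process_orphan(&remote) == 0 && remote.cancelled == 0);
    assert(model_process_orphan(&local) == 0 && local.cancelled == 1);
}
```

Skipping rather than failing makes the client's walk of the changelog catalog harmless; in this model, cancelling orphaned records is left to whichever side holds the catalog locally.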

Comment by Peter Jones [ 25/Jan/15 ]

Landed for 2.7

Generated at Sat Feb 10 01:55:08 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.