[LU-38] Kernel panic in ldiskfs on OST unmount Created: 10/Jan/11  Updated: 28/Jun/11  Resolved: 13/Jun/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.6
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Christopher Morrone Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None
Environment:

lustre-1.8.3.0-6chaos_2.6.18_93chaos.ch4.3


Severity: 3
Rank (Obsolete): 10407

 Description   

When the sysadmins attempted to unmount the OSTs on a number of OSSs to shut them down for scheduled maintenance, six of the nodes hit a kernel panic.

They all say:

Kernel BUG at fs/ldiskfs/mballoc.c:3714

The RIP (for what it's worth) is the same on each node: :ldiskfs:ldiskfs_mb_release_inode_pa

The stacks are also the same:

sync_buffer
out_of_line_wait_on_bit
__wait_on_buffer
:ldiskfs:bh_submit_read
:ldiskfs:ldiskfs_mb_discard_inode_preallocations
:ldiskfs:ldiskfs_discard_reservation
:ldiskfs:ldiskfs_clear_node
clear_inode
dispose_list
invalidate_inodes
generic_shutdown_super
kill_block_super
deactivate_super
mntput_no_expire
:obdclass:unlock_mntput
:obdclass:server_put_super
invalidate_inodes
generic_shutdown_super
kill_anon_super
:obdclass:lustre_kill_super
deactivate_super
mntput_no_expire
path_release_on_umount
sys_umount
sys_newstat
system_call

I apologize for any typos. That had to be copied by hand.



 Comments   
Comment by Dan Ferber (Inactive) [ 10/Jan/11 ]

Assigned to Alex for initial analysis.

Comment by Alex Zhuravlev [ 10/Jan/11 ]

Christopher, can you confirm from the logs that the devices were turned read-only? This looks like an instance of bug 24214 in Bugzilla.

Comment by Christopher Morrone [ 11/Jan/11 ]

It looks like "umount /dev/<ostdev>" does indeed turn the devices read-only as part of the shutdown process.

Comment by Lai Siyao [ 13/Jan/11 ]

Bug 16680 explains the cause of this crash, and the patch from bug 22299 should fix it. The fix has landed in 1.8.4 and 2.0. Since you are running 1.8.3, I suggest you apply it and verify.

Comment by Christopher Morrone [ 16/Feb/11 ]

I applied the patch to our copy of ldiskfs.

Comment by Peter Jones [ 13/Jun/11 ]

Please reopen if this recurs with the patches applied.

Generated at Sat Feb 10 01:03:07 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.