[LU-6988]  MDS and OST mount crashes with kernel panic Created: 12/Aug/15  Updated: 14/Sep/15  Resolved: 14/Sep/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Aditya Pandit (Inactive) Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None

Attachments: serial_console_fre1221.log, serial_console_fre1222.log, serial_console_fre1223.log, serial_console_fre1224.log
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

Panic on MDS while mounting lustre.

DISKFS-fs (vdb): mounted filesystem with ordered data mode. quota=on. Opts: 
Lustre: Setting parameter lustre-MDT0000-mdtlov.lov.stripesize in log lustre-MDT0000
Lustre: ctl-lustre-MDT0000: No data found on store. Initialize space
Lustre: lustre-MDT0000: new disk, initializing
Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400):0:mdt
------------[ cut here ]------------
kernel BUG at block/blk-core.c:2627!
invalid opcode: 0000 [#1] SMP 
Pid: 6022, comm: ldiskfslazyinit Not tainted 2.6.32-431.29.2.el6_lustreb_neo_stable_6698_6 #1 Red Hat KVM
RIP: 0010:[<ffffffff812698c2>]  [<ffffffff812698c2>] __blk_end_request_all+0x32/0x60
Process ldiskfslazyinit (pid: 6022, threadinfo ffff88011bb80000, task ffff88011933f500)
Call Trace:
 <IRQ> 
 [<ffffffffa005a22a>] blk_done+0x4a/0x110 [virtio_blk]
 [<ffffffff810ecb14>] ? __rcu_process_callbacks+0x54/0x350
 [<ffffffffa004e2ac>] vring_interrupt+0x3c/0xd0 [virtio_ring]
 [<ffffffff810e7090>] handle_IRQ_event+0x60/0x170
 [<ffffffff8107a64f>] ? __do_softirq+0x11f/0x1e0
 [<ffffffff810e99ee>] handle_edge_irq+0xde/0x180
 [<ffffffff8100faf9>] handle_irq+0x49/0xa0
 [<ffffffff81532dbc>] do_IRQ+0x6c/0xf0
 [<ffffffff8100b9d3>] ret_from_intr+0x0/0x11
 <EOI> 
 [<ffffffff811222e5>] mempool_alloc_slab+0x15/0x20
 [<ffffffff81122483>] mempool_alloc+0x63/0x140
 [<ffffffff811c4ed2>] bvec_alloc_bs+0xe2/0x110
 [<ffffffff811c4fb2>] bio_alloc_bioset+0xb2/0xf0
 [<ffffffff811c5095>] bio_alloc+0x15/0x30
 [<ffffffff812705a8>] blkdev_issue_zeroout+0x88/0x180
 [<ffffffffa02b1c64>] ldiskfs_init_inode_table+0x154/0x290 [ldiskfs]
 [<ffffffffa02dbdcb>] ldiskfs_lazyinit_thread+0x15b/0x2f0 [ldiskfs]
 [<ffffffff8109abf6>] kthread+0x96/0xa0


Comments
Comment by Andreas Dilger [ 13/Aug/15 ]

It looks like this BUG might be the BUG_ON(pending) at the end of __blk_end_request_all(), which fires when __blk_end_bidi_request() reports that the request was not fully completed:

void __blk_end_request_all(struct request *rq, int error)
{
        bool pending;
        unsigned int bidi_bytes = 0;

        if (unlikely(blk_bidi_rq(rq)))
                bidi_bytes = blk_rq_bytes(rq->next_rq);

        pending = __blk_end_bidi_request(rq, error, blk_rq_bytes(rq), bidi_bytes);
        BUG_ON(pending);
}
EXPORT_SYMBOL(__blk_end_request_all);
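
To confirm that blk-core.c:2627 in this errata kernel really is the BUG_ON() above, the reported line can be checked against the matching kernel sources (a quick sketch; it assumes the 2.6.32-431.29.2.el6 source tree is already unpacked locally, which is not shown in this ticket):

# print the lines around the reported BUG; run from the root of the kernel source tree
sed -n '2615,2630p' block/blk-core.c
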
Comment by Andreas Dilger [ 13/Aug/15 ]

The bug is happening during lazy inode table initialization after the initial filesystem format. Could you see if this oops is avoided by formatting the filesystem with mkfs.lustre --mkfsoptions="-E lazy_itable_init=0"? Are you using any non-default options for formatting or mounting your MDT or OST filesystems? What version of e2fsprogs are you using?
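
For example, a minimal reformat along these lines (a sketch only; the combined MGS/MDT layout, index, and /dev/vdb device are assumptions based on the console log above, not a verified command):

# disable lazy inode table init at format time so the ldiskfslazyinit thread never runs
mkfs.lustre --reformat --mgs --mdt --fsname=lustre --index=0 --mkfsoptions="-E lazy_itable_init=0" /dev/vdb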

Comment by Andreas Dilger [ 13/Aug/15 ]

What kernel is this? RHEL6.3? Does it happen with the stock Lustre RHEL6.6 kernel? Are there any other kernel or ldiskfs patches applied?

Comment by Aditya Pandit (Inactive) [ 14/Aug/15 ]

e2fsprogs: e2fsprogs-1.42.7.x1.mrp.128-8.el6.src.rpm

We have not applied any extra patches beyond the standard Lustre kernel patch series:
cat lustre/kernel_patches/series/2.6-rhel6.series
mpt-fusion-max-sge-rhel6.patch
raid5-mmp-unplug-dev-rhel6.patch
dev_read_only-2.6.32-rhel6.patch
blkdev_tunables-2.6-rhel6.patch
bh_lru_size_increase.patch
quota-replace-dqptr-sem.patch
quota-avoid-dqget-calls.patch
jbd2-log_wait_for_space-2.6-rhel6.patch
module-load-deadlock-rhel6.patch

We haven't used any non-standard options for formatting or mounting.

It is Scientific Linux release 6.5 (Carbon)

There are no extra kernel or ldiskfs patches applied.

Will try with RHEL 6.6 and the stock kernel and let you know the results.

Comment by Aditya Pandit (Inactive) [ 27/Aug/15 ]

I tried it on the stock kernel with the Lustre patches, and it crashed there as well.

kernel BUG at block/blk-core.c:2627!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/online
CPU 1
Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgs(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic sha256_generic libcfs(U) ldiskfs(U) nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipt_REJECT ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 virtio_balloon virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]
Pid: 0, comm: swapper Not tainted 2.6.32-431.29.2.el6_lustremaster_master__57 #1 Red Hat KVM

I have not seen this crash on Oracle VirtualBox.

Comment by Aditya Pandit (Inactive) [ 14/Sep/15 ]

This bug is a duplicate of https://jira.hpdd.intel.com/browse/LU-6974.

Comment by Peter Jones [ 14/Sep/15 ]

OK, I will close the ticket as a duplicate. Thanks for letting us know.
