[LU-13436] a virtio bug could crash lustre kernel Created: 08/Apr/20  Updated: 30/Oct/20  Resolved: 30/Oct/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Wang Shilong (Inactive) Assignee: Wang Shilong (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The OSS crashed several times with the trace below. I hit the same crash at mkfs time; at first it appeared only during formatting, but it also happened when a huge number of files were deleted.

[ 3960.737559] ------------[ cut here ]------------
[ 3960.738188] kernel BUG at drivers/virtio/virtio_ring.c:278!
[ 3960.738774] invalid opcode: 0000 [#1] SMP 
[ 3960.739356] Modules linked in: binfmt_misc osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) ksocklnd(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) virtio_scsi(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) mlx4_en(OE) mlx4_ib(OE) ib_uverbs(OE) ib_core(OE) mlx4_core(OE) sunrpc iTCO_wdt iTCO_vendor_support ppdev nfit libnvdimm iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev lpc_ich i2c_i801 pcspkr parport_pc sg i6300esb parport ip_tables ext4 mbcache jbd2 sr_mod sd_mod cdrom crc_t10dif crct10dif_generic bochs_drm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_net virtio_blk ahci drm libahci mlx5_core(OE) libata
[ 3960.743320]  crct10dif_pclmul mlxfw(OE) crct10dif_common ptp crc32c_intel pps_core iavf devlink serio_raw virtio_pci virtio_ring mlx_compat(OE) virtio drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod [last unloaded: virtio_scsi]
[ 3960.745431] CPU: 18 PID: 2480 Comm: kworker/u40:1 Kdump: loaded Tainted: G           OE  ------------ T 3.10.0-1062.1.1.el7_lustre.ddn3.x86_64 #1
[ 3960.746900] Hardware name: DDN SFA400NVXE, BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 3960.747677] Workqueue: writeback bdi_writeback_workfn (flush-253:4)
[ 3960.748445] task: ffff978a23a2c1c0 ti: ffff976c25680000 task.ti: ffff976c25680000
[ 3960.749226] RIP: 0010:[<ffffffffc02533e2>]  [<ffffffffc02533e2>] virtqueue_add+0x4a2/0x4d0 [virtio_ring]
[ 3960.750017] RSP: 0018:ffff976c256834f0  EFLAGS: 00010097
[ 3960.750789] RAX: 0000000000001001 RBX: ffff978a60cc0000 RCX: 0000000000000003
[ 3960.751582] RDX: 0000000000001001 RSI: ffff976c25683610 RDI: ffff978a60cc0000
[ 3960.752374] RBP: ffff976c25683558 R08: 0000000000000001 R09: ffff976ffa9a4300
[ 3960.753169] R10: ffff9789f427b400 R11: ffff978a519a8e80 R12: ffff976c25683630
[ 3960.753950] R13: ffff976c25683630 R14: ffff976c25683610 R15: 0000000000000001
[ 3960.754734] FS:  0000000000000000(0000) GS:ffff978a69480000(0000) knlGS:0000000000000000
[ 3960.755535] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3960.756337] CR2: 00007f8446e64e70 CR3: 000000251ea9c000 CR4: 0000000000760fe0
[ 3960.757148] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3960.757930] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3960.758703] PKRU: 00000000
[ 3960.759459] Call Trace:
[ 3960.760227]  [<ffffffffc0253497>] virtqueue_add_sgs+0x87/0xa0 [virtio_ring]
[ 3960.761010]  [<ffffffffc04956aa>] virtscsi_add_cmd+0x17a/0x270 [virtio_scsi]
[ 3960.761779]  [<ffffffff873bef3e>] ? mempool_alloc+0x6e/0x170
[ 3960.762549]  [<ffffffffc04957d8>] virtscsi_kick_cmd+0x38/0xa0 [virtio_scsi]
[ 3960.763328]  [<ffffffffc0496c3d>] virtscsi_queuecommand+0x15d/0x370 [virtio_scsi]
[ 3960.764106]  [<ffffffffc0496efe>] virtscsi_queuecommand_multi+0x6e/0xd8 [virtio_scsi]
[ 3960.764878]  [<ffffffff876dcc90>] scsi_dispatch_cmd+0xb0/0x240
[ 3960.765657]  [<ffffffff876e59b5>] scsi_queue_rq+0x595/0x6e0
[ 3960.766435]  [<ffffffff8755ad05>] __blk_mq_try_issue_directly+0x135/0x1a0
[ 3960.767218]  [<ffffffff8755ad9d>] blk_mq_try_issue_directly+0x2d/0xb0
[ 3960.767994]  [<ffffffff8755b256>] blk_mq_make_request+0x436/0x630
[ 3960.768762]  [<ffffffff8754eff7>] generic_make_request+0x147/0x380
[ 3960.769535]  [<ffffffff8754f2a0>] submit_bio+0x70/0x150
[ 3960.770304]  [<ffffffff87486335>] ? bio_alloc_bioset+0x115/0x310
[ 3960.771069]  [<ffffffff87481d57>] _submit_bh+0x127/0x160
[ 3960.771828]  [<ffffffff87481fd2>] __block_write_full_page+0x162/0x390
[ 3960.772592]  [<ffffffff87487b10>] ? set_init_blocksize+0x90/0x90
[ 3960.773357]  [<ffffffff87487b10>] ? set_init_blocksize+0x90/0x90
[ 3960.774103]  [<ffffffff874823e8>] block_write_full_page+0xd8/0x100
[ 3960.774838]  [<ffffffff87488358>] blkdev_writepage+0x18/0x20
[ 3960.775561]  [<ffffffff873c6fd9>] __writepage+0x19/0x50
[ 3960.776273]  [<ffffffff873c7cec>] write_cache_pages+0x21c/0x470
[ 3960.776960]  [<ffffffff873c6fc0>] ? global_dirtyable_memory+0x70/0x70
[ 3960.777632]  [<ffffffff873c7f8d>] generic_writepages+0x4d/0x80
[ 3960.778293]  [<ffffffff8748831e>] blkdev_writepages+0xe/0x10
[ 3960.778929]  [<ffffffff873c8d31>] do_writepages+0x21/0x50
[ 3960.779544]  [<ffffffff874776d0>] __writeback_single_inode+0x40/0x260
[ 3960.780155]  [<ffffffff872c6185>] ? wake_up_bit+0x25/0x30
[ 3960.780735]  [<ffffffff87478264>] writeback_sb_inodes+0x1c4/0x430
[ 3960.781312]  [<ffffffff8747856f>] __writeback_inodes_wb+0x9f/0xd0
[ 3960.781863]  [<ffffffff87478a53>] wb_writeback+0x263/0x2f0
[ 3960.782404]  [<ffffffff8747954c>] bdi_writeback_workfn+0x1cc/0x460
[ 3960.782932]  [<ffffffff872bd0ff>] process_one_work+0x17f/0x440
[ 3960.783454]  [<ffffffff872be216>] worker_thread+0x126/0x3c0
[ 3960.783960]  [<ffffffff872be0f0>] ? manage_workers.isra.26+0x2a0/0x2a0
[ 3960.784461]  [<ffffffff872c50d1>] kthread+0xd1/0xe0
[ 3960.784951]  [<ffffffff872c5000>] ? insert_kthread_work+0x40/0x40
[ 3960.785450]  [<ffffffff8798cd37>] ret_from_fork_nospec_begin+0x21/0x21
[ 3960.785942]  [<ffffffff872c5000>] ? insert_kthread_work+0x40/0x40
[ 3960.786433] Code: ff e9 06 fd ff ff 48 89 d9 44 89 f2 48 c7 c6 3c 43 25 c0 48 c7 c7 78 50 25 c0 31 c0 e8 28 91 35 c7 8b 43 60 e9 19 ff ff ff 0f 0b <0f> 0b e8 ea 07 00 00 8b 55 ac 48 c7 c6 88 44 25 c0 48 c7 c7 a0 
[ 3960.787529] RIP  [<ffffffffc02533e2>] virtqueue_add+0x4a2/0x4d0 [virtio_ring]
[ 3960.788055]  RSP <ffff976c256834f0>
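
For triage, the faulting source location and symbol can be pulled out of a trace like the one above with a short script. This is a minimal sketch and not part of the original report: the helper name `summarize_oops` and both regular expressions are assumptions written against the 3.10-style oops format shown here, and other kernel versions may print these lines differently.

```python
import re

# Hypothetical triage helper; the names and patterns below are illustrative
# assumptions matched against the oops format in this ticket.
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
# Matches the final "RIP  [<addr>] symbol+0xoff/0xsize [module]" line.
RIP_RE = re.compile(
    r"RIP\s+\[<[0-9a-f]+>\]\s+(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

# Two lines copied from the trace in this ticket.
SAMPLE = (
    "[ 3960.738188] kernel BUG at drivers/virtio/virtio_ring.c:278!\n"
    "[ 3960.787529] RIP  [<ffffffffc02533e2>] virtqueue_add+0x4a2/0x4d0 [virtio_ring]\n"
)


def summarize_oops(text):
    """Return the BUG location and faulting symbol found in an oops dump.

    Fields that cannot be found are returned as None.
    """
    bug = BUG_RE.search(text)
    rip = RIP_RE.search(text)
    return {
        "file": bug.group("file") if bug else None,
        "line": int(bug.group("line")) if bug else None,
        "symbol": rip.group("symbol") if rip else None,
        "module": rip.group("module") if rip else None,
    }


summary = summarize_oops(SAMPLE)
```

Calling `summarize_oops()` on the full console log instead of `SAMPLE` works the same way, since both patterns are searched anywhere in the text.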


 Comments   
Comment by Gerrit Updater [ 08/Apr/20 ]

Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/38178
Subject: LU-13436 kernel: backport a virtio bug to lustre kernel
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ac0802d687a2e6fe8e64ef1f0008e0d9400ddbdb
