Details
-
Bug
-
Resolution: Fixed
-
Critical
-
Lustre 2.10.2
-
None
-
3
-
9223372036854775807
Description
A customer reported hitting a soft lockup when removing a changelog, as follows:
[Mon Sep 25 13:32:58 2017] NMI watchdog: BUG: soft lockup - CPU#38 stuck for 22s! [mdt07_006:16965]
[Mon Sep 25 13:32:58 2017] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_ssse3 sha512_generic crypto_null libcfs(OE) bonding intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul iTCO_wdt ghash_clmulni_intel iTCO_vendor_support cryptd sb_edac dm_round_robin edac_core lpc_ich i2c_i801 pcspkr ses mei_me mei enclosure ipmi_devintf sg ipmi_si ipmi_msghandler wmi shpchp acpi_power_meter acpi_pad zfs(POE) dm_multipath nfsd dm_mod zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) auth_rpcgss nfs_acl lockd spl(OE) grace zlib_deflate sunrpc ip_tables xfs libcrc32c raid1 sd_mod crc_t10dif crct10dif_generic ast drm_kms_helper crct10dif_pclmul
[Mon Sep 25 13:32:58 2017] crct10dif_common syscopyarea crc32c_intel sysfillrect sysimgblt fb_sys_fops ttm drm ixgbe igb ahci mpt3sas libahci mdio ptp libata i2c_algo_bit raid_class pps_core i2c_core scsi_transport_sas dca fjes
[Mon Sep 25 13:32:58 2017] CPU: 38 PID: 16965 Comm: mdt07_006 Tainted: P OEL ------------ 3.10.0-514.2.2.el7_lustre.x86_64 #1
[Mon Sep 25 13:32:58 2017] Hardware name: Supermicro SSG-2028R-DE2CR24L/X10DRS-2U, BIOS 2.1 11/04/2016
[Mon Sep 25 13:32:58 2017] task: ffff881ffe106dd0 ti: ffff881fec96c000 task.ti: ffff881fec96c000
[Mon Sep 25 13:32:58 2017] RIP: 0010:[<ffffffffa0b5669d>] [<ffffffffa0b5669d>] changelog_block_trim_ext.part.24.constprop.25+0x3d/0xf0 [obdclass]
[Mon Sep 25 13:32:58 2017] RSP: 0018:ffff881fec96f710 EFLAGS: 00000246
[Mon Sep 25 13:32:58 2017] RAX: 0000000000000000 RBX: ffff881d143afaa0 RCX: 0000000000000000
[Mon Sep 25 13:32:58 2017] RDX: 0000000000000000 RSI: ffff881d12949268 RDI: ffff881d12948000
[Mon Sep 25 13:32:58 2017] RBP: ffff881fec96f728 R08: 0000000000000000 R09: 0000000180200012
[Mon Sep 25 13:32:58 2017] R10: 0000000000002000 R11: ffff880fc8f02a50 R12: 0000000000000000
[Mon Sep 25 13:32:58 2017] R13: ffffffff00000000 R14: ffff881fec96f698 R15: ffffffffa05a17d8
[Mon Sep 25 13:32:58 2017] FS: 0000000000000000(0000) GS:ffff88203f480000(0000) knlGS:0000000000000000
[Mon Sep 25 13:32:58 2017] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Sep 25 13:32:58 2017] CR2: 00007f3fdd845072 CR3: 00000000019ba000 CR4: 00000000003407e0
[Mon Sep 25 13:32:58 2017] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Mon Sep 25 13:32:58 2017] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Mon Sep 25 13:32:58 2017] Stack:
[Mon Sep 25 13:32:58 2017] ffff881d12948000 ffff881fd1fa7300 ffff881fcee29740 ffff881fec96f7a8
[Mon Sep 25 13:32:58 2017] ffffffffa0b5afde ffff881fec96fc28 ffff881d12949f30 0000000000000000
[Mon Sep 25 13:32:58 2017] ffff881fec96f7a8 000000000000d0a3 ffff881fec96fc28 ffff881fd1fa7370
[Mon Sep 25 13:32:58 2017] Call Trace:
[Mon Sep 25 13:32:58 2017] [<ffffffffa0b5afde>] llog_osd_next_block+0x43e/0xb50 [obdclass]
[Mon Sep 25 13:32:58 2017] [<ffffffffa0b4d045>] llog_process_thread+0x305/0x1130 [obdclass]
[Mon Sep 25 13:32:58 2017] [<ffffffffa141aed0>] ? mdd_key_fini+0x350/0x350 [mdd]
[Mon Sep 25 13:32:58 2017] [<ffffffffa0b4df4c>] llog_process_or_fork+0xdc/0x590 [obdclass]
[Mon Sep 25 13:32:58 2017] [<ffffffffa0b5282d>] llog_cat_process_cb+0x4fd/0x600 [obdclass]
[Mon Sep 25 13:32:58 2017] [<ffffffffa0b4d2e5>] llog_process_thread+0x5a5/0x1130 [obdclass]
[Mon Sep 25 13:32:58 2017] [<ffffffffa0b52330>] ? llog_cat_cancel_records+0x330/0x330 [obdclass]
[Mon Sep 25 13:32:58 2017] [<ffffffffa0b4df4c>] llog_process_or_fork+0xdc/0x590 [obdclass]
[Mon Sep 25 13:32:58 2017] [<ffffffffa0b5180a>] llog_cat_process_or_fork+0x13a/0x2f0 [obdclass]
[Mon Sep 25 13:32:58 2017] [<ffffffffa141aed0>] ? mdd_key_fini+0x350/0x350 [mdd]
[Mon Sep 25 13:32:58 2017] [<ffffffffa0b519d9>] llog_cat_process+0x19/0x20 [obdclass]
[Mon Sep 25 13:32:58 2017] [<ffffffffa141a098>] llog_changelog_cancel+0x58/0x230 [mdd]
[Mon Sep 25 13:32:58 2017] [<ffffffffa08ba0c7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[Mon Sep 25 13:32:58 2017] [<ffffffffa0b539e4>] llog_cancel+0x54/0x240 [obdclass]
[Mon Sep 25 13:32:58 2017] [<ffffffffa141c662>] mdd_changelog_user_purge+0x472/0x7a0 [mdd]
[Mon Sep 25 13:32:58 2017] [<ffffffffa141f21b>] mdd_iocontrol+0x75b/0xc40 [mdd]
[Mon Sep 25 13:32:58 2017] [<ffffffffa12cb859>] mdt_ioc_child.isra.61+0xf9/0x1e0 [mdt]
[Mon Sep 25 13:32:58 2017] [<ffffffffa12e2ffc>] mdt_iocontrol+0x69c/0xa20 [mdt]
[Mon Sep 25 13:32:58 2017] [<ffffffffa0e0019f>] ? lustre_pack_reply_v2+0x14f/0x280 [ptlrpc]
[Mon Sep 25 13:32:58 2017] [<ffffffffa12e37bc>] mdt_set_info+0x43c/0x450 [mdt]
[Mon Sep 25 13:32:58 2017] [<ffffffffa0e68adb>] tgt_request_handle+0x8fb/0x11f0 [ptlrpc]
[Mon Sep 25 13:32:58 2017] [<ffffffffa0e0cb8b>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
[Mon Sep 25 13:32:58 2017] [<ffffffffa08bbce8>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
[Mon Sep 25 13:32:58 2017] [<ffffffffa0e09c58>] ? ptlrpc_wait_event+0x98/0x330 [ptlrpc]
[Mon Sep 25 13:32:58 2017] [<ffffffff810ba238>] ? __wake_up_common+0x58/0x90
[Mon Sep 25 13:32:58 2017] [<ffffffffa0e104b0>] ptlrpc_main+0xc00/0x1f60 [ptlrpc]
[Mon Sep 25 13:32:58 2017] [<ffffffffa0e0f8b0>] ? ptlrpc_register_service+0x1070/0x1070 [ptlrpc]
[Mon Sep 25 13:32:58 2017] [<ffffffff810b064f>] kthread+0xcf/0xe0
[Mon Sep 25 13:32:58 2017] [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
[Mon Sep 25 13:32:58 2017] [<ffffffff81696898>] ret_from_fork+0x58/0x90
[Mon Sep 25 13:32:58 2017] [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
[Mon Sep 25 13:32:58 2017] Code: 89 fb eb 35 66 2e 0f 1f 84 00 00 00 00 00 48 8d 7b 70 e8 c7 04 7d e0 0f b7 43 12 f6 c4 20 74 7e 66 25 ff 0f 44 09 e8 66 89 43 12 <8b> 03 48 01 c3 4c 39 e3 0f 87 95 00 00 00 0f b7 4b 12 48 8d 73