[LU-8354] soft lockup in ldlm_plain_compat_queue Created: 30/Jun/16 Updated: 22/Aug/16 Resolved: 26/Jul/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Andriy Skulysh | Assignee: | John Hammond |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Severity: | 3 | ||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||
| Description |
<6>[1058680.630618] Lustre: Setting parameter snx11001-MDT0000.mdd.changelog_mask in log snx11001-MDT0000
<0>[1058752.944434] BUG: soft lockup - CPU#9 stuck for 67s! [lctl:79108]
<4>[1058753.055094] CPU 9
<4>[1058753.057343] Modules linked in: ost(U) osd_ldiskfs(U) ldiskfs(U) mdt(U) mdd(U) lfsck(U) mgs(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) lquota(U) ko2iblnd(U) ptlrpc(U) obdclass(U) ksocklnd(U) lnet(U) sha512_generic sha256_generic crc32c_intel libcfs(U) raid1 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx ext4 jbd2 mbcache ib_ipoib(U) rdma_ucm(U) ib_ucm(U) ib_uverbs(U) ib_umad(U) rdma_cm(U) ib_cm(U) iw_cm(U) mlx4_ib(U) ib_sa(U) ib_mad(U) ib_core(U) ib_addr(U) nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack xt_multiport iptable_filter xt_NOTRACK nf_conntrack iptable_raw ip_tables ipmi_devintf acpi_cpufreq freq_table mperf dm_mod sg ses enclosure sd_mod crc_t10dif wmi iTCO_wdt iTCO_vendor_support isci libsas mpt2sas scsi_transport_sas raid_class sb_edac edac_core ahci i2c_i801 lpc_ich mfd_core shpchp nfs lockd fscache auth_rpcgss nfs_acl sunrpc igb dca i2c_algo_bit i2c_core mlx4_en(U) ptp pps_core mlx4_core(U) compat(U) bonding ipv6 8021q garp stp llc [last unloaded: ib_core]
<4>[1058753.161188]
<4>[1058753.163132] Pid: 79108, comm: lctl Not tainted 2.6.32-431.17.1.x2.0.76.x86_64 #1 Intel Corporation S2600JF/S2600JF
<4>[1058753.175129] RIP: 0010:[<ffffffffa0897750>] [<ffffffffa0897750>] ldlm_add_ast_work_item+0x30/0x150 [ptlrpc]
<4>[1058753.186440] RSP: 0018:ffff880f4540da48 EFLAGS: 00000246
<4>[1058753.192658] RAX: ffff880fc0039e40 RBX: ffff880f4540da68 RCX: 00000000000013cf
<4>[1058753.201000] RDX: ffff880f4540daa8 RSI: ffff880e610c7340 RDI: ffff880e3a3ddd00
<4>[1058753.209342] RBP: ffffffff8100bb8e R08: ffff880fb713fd50 R09: ffff880e610c7340
<4>[1058753.217678] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880fb713fd50
<4>[1058753.226014] R13: ffff880e610c7340 R14: 0000000000000000 R15: 0000000000000000
<4>[1058753.234349] FS: 00007f17bd91c700(0000) GS:ffff880060720000(0000) knlGS:0000000000000000
<4>[1058753.243757] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>[1058753.250465] CR2: 00007fd267c5f000 CR3: 0000000eea234000 CR4: 00000000000407e0
<4>[1058753.258809] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[1058753.267146] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[1058753.275483] Process lctl (pid: 79108, threadinfo ffff880f4540c000, task ffff881033667540)
<4>[1058753.284983] Stack:
<4>[1058753.287507] ffffffffa08b0948 0000000000000010 ffff880fb9d7fac0 ffff880f4540daa8
<4>[1058753.295905] <d> ffff880f4540dae8 ffffffffa08b0958 ffff880f4540dc40 ffff880e610c73a0
<4>[1058753.304897] <d> ffff880fc0039e58 ffff880fc0039e80 ffff880fc0039e40 0000000100000001
<4>[1058753.314187] Call Trace:
<4>[1058753.317222] [<ffffffffa08b0948>] ? ldlm_process_plain_lock+0x1b8/0xa80 [ptlrpc]
<4>[1058753.325871] [<ffffffffa08b0958>] ? ldlm_process_plain_lock+0x1c8/0xa80 [ptlrpc]
<4>[1058753.334520] [<ffffffffa089bbab>] ? ldlm_lock_enqueue+0x48b/0xa60 [ptlrpc]
<4>[1058753.342508] [<ffffffffa08bbac1>] ? ldlm_cli_enqueue_local+0x1b1/0x810 [ptlrpc]
<4>[1058753.351055] [<ffffffffa0d5d650>] ? mgs_completion_ast_config+0x0/0x20 [mgs]
<4>[1058753.359319] [<ffffffffa08ba880>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
<4>[1058753.367097] [<ffffffffa0d5d30b>] ? mgs_revoke_lock+0x1fb/0x350 [mgs]
<4>[1058753.374599] [<ffffffffa08ba880>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
<4>[1058753.382377] [<ffffffffa0d5d650>] ? mgs_completion_ast_config+0x0/0x20 [mgs]
<4>[1058753.390629] [<ffffffffa0d7b32f>] ? mgs_setparam+0xe6f/0x10f0 [mgs]
<4>[1058753.397924] [<ffffffffa0d63712>] ? mgs_iocontrol+0x15b2/0x18e0 [mgs]
<4>[1058753.405456] [<ffffffffa0661ed5>] ? obd_ioctl_getdata+0x145/0x1150 [obdclass]
<4>[1058753.413811] [<ffffffffa067b2be>] ? class_handle_ioctl+0x16fe/0x2270 [obdclass]
<4>[1058753.422346] [<ffffffffa06612ab>] ? obd_class_ioctl+0x4b/0x190 [obdclass]
<4>[1058753.430224] [<ffffffff8119e0e2>] ? vfs_ioctl+0x22/0xa0
<4>[1058753.436345] [<ffffffff8119e284>] ? do_vfs_ioctl+0x84/0x580
<4>[1058753.442861] [<ffffffff8119e801>] ? sys_ioctl+0x81/0xa0
<4>[1058753.448989] [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
<4>[1058753.456180] Code: 54 53 48 83 ec 10 0f 1f 44 00 00 f6 05 0d b2 cb ff 01 48 89 fb 49 89 f4 74 0d f6 05 fc b1 cb ff 01 0f 85 9c 00 00 00 48 8b 43 48 <8b> 40 18 89 c1 c1 f9 10 66 39 c1 0f 84 ff 00 00 00 4d 85 e4 0f

The lock iteration in ldlm_plain_compat_queue() was previously optimized to skip locks of the same type, but this optimization was broken by patch http://review.whamcloud.com/10945 |
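To illustrate why losing such a skip can matter on a long queue, here is a minimal standalone sketch of a queue scan that jumps over a run of same-mode locks once the first lock in the run is found compatible. This is not the Lustre code: the `toy_*` names, the three modes, the compatibility rule, and the `group_last` index are all hypothetical, chosen only to show the idea of visiting one entry per run instead of every entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock modes for illustration; not the Lustre mode values. */
enum toy_mode { TOY_CR, TOY_PR, TOY_EX };

struct toy_lock {
    enum toy_mode mode;
    size_t group_last; /* index of the last lock in this lock's same-mode run */
};

/* Toy rule (an assumption): EX conflicts with everything, CR and PR coexist. */
static int toy_compatible(enum toy_mode a, enum toy_mode b)
{
    return a != TOY_EX && b != TOY_EX;
}

/* Fill in group_last for a queue whose entries are already grouped by mode. */
static void toy_link_groups(struct toy_lock *q, size_t n)
{
    size_t i = 0;

    while (i < n) {
        size_t j = i;

        while (j + 1 < n && q[j + 1].mode == q[i].mode)
            j++;
        for (size_t k = i; k <= j; k++)
            q[k].group_last = j;
        i = j + 1;
    }
}

/*
 * Count the locks in q that conflict with a request of mode req.
 * *visited reports how many queue entries were inspected. With use_skip
 * set, a compatible lock lets the scan jump to the end of its run.
 */
static size_t toy_scan(const struct toy_lock *q, size_t n, enum toy_mode req,
                       int use_skip, size_t *visited)
{
    size_t conflicts = 0;

    *visited = 0;
    for (size_t i = 0; i < n; i++) {
        (*visited)++;
        if (toy_compatible(q[i].mode, req)) {
            if (use_skip) {
                assert(q[i].group_last >= i && q[i].group_last < n);
                i = q[i].group_last; /* the rest of the run is compatible too */
            }
            continue;
        }
        conflicts++;
    }
    return conflicts;
}
```

In this toy model, a queue of five PR locks, one EX lock, and three CR locks scanned for a PR request inspects all nine entries without the skip and only three with it, while reporting the same single conflict either way. A scan that degrades from per-run to per-lock cost on a queue with thousands of entries would be consistent with the soft lockup in the trace above, though the sketch does not reproduce the actual defect.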
| Comments |
| Comment by Gerrit Updater [ 30/Jun/16 ] |
|
Andriy Skulysh (andriy.skulysh@seagate.com) uploaded a new patch: http://review.whamcloud.com/21093 |
| Comment by Gerrit Updater [ 20/Jul/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21093/ |
| Comment by Joseph Gmitter (Inactive) [ 26/Jul/16 ] |
|
Landed to master for 2.9.0 |