[LU-9488] soft lockup in osd_inode_iteration() Created: 10/May/17 Updated: 06/Sep/17 Resolved: 10/Jun/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0 |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Cliff White (Inactive) | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Soak test cluster |
||
| Attachments: |
|
||||||||||||||||
| Issue Links: |
|
||||||||||||||||
| Severity: | 3 | ||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||
| Description |
|
Restarted testing with latest master, 3577 2017-05-09 20:08:27 ,327:fsmgmt.fsmgmt:INFO oss_failover completed, running lfsck MDS reported a single error: May 9 20:08:39 soak-8 kernel: LustreError: 5550:0:(lfsck_lib.c:2680:lfsck_load_one_trace_file()) soaked-MDT0000-osd: unlink lfsck sub trace file lfsck_namespace_00: rc = 0 Before soak hits timeout, MDS has wedged: May 9 20:12:41 soak-8 kernel: NMI watchdog: BUG: soft lockup - CPU#6 stuck for 23s! [OI_scrub:5551]
May 9 20:12:41 soak-8 kernel: Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate 8021q garp mrp stp llc rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd dm_round_robin ipmi_ssif sb_edac ipmi_devintf ntb sg iTCO_wdt ioatdma shpchp edac_core mei_me iTCO_vendor_support mei lpc_ich ipmi_si pcspkr i2c_i801
May 9 20:12:41 soak-8 kernel: ipmi_msghandler wmi nfsd dm_multipath dm_mod nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx4_en mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops isci igb ttm ahci crct10dif_pclmul crct10dif_common ptp libsas crc32c_intel libahci pps_core drm mlx4_core mpt2sas libata dca raid_class i2c_algo_bit scsi_transport_sas devlink i2c_core fjes
May 9 20:12:41 soak-8 kernel: CPU: 6 PID: 5551 Comm: OI_scrub Tainted: P OE ------------ 3.10.0-514.16.1.el7_lustre.x86_64 #1
May 9 20:12:41 soak-8 kernel: Hardware name: Intel Corporation SandyBridge Platform/To be filled by O.E.M., BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
May 9 20:12:41 soak-8 kernel: task: ffff88083fde6dd0 ti: ffff880703600000 task.ti: ffff880703600000
May 9 20:12:41 soak-8 kernel: RIP: 0010:[<ffffffffa121d1d9>] [<ffffffffa121d1d9>] osd_inode_iteration+0x489/0xcc0 [osd_ldiskfs]
May 9 20:12:41 soak-8 kernel: RSP: 0018:ffff880703603d18 EFLAGS: 00000293
May 9 20:12:41 soak-8 kernel: RAX: 0000000000000004 RBX: 0000000023f30a01 RCX: 0000000000000000
May 9 20:12:41 soak-8 kernel: RDX: ffff880703603d78 RSI: ffff8800b2a36000 RDI: ffff8803162f6000
May 9 20:12:41 soak-8 kernel: RBP: ffff880703603df0 R08: ffff880703603d57 R09: 0000000000000004
May 9 20:12:41 soak-8 kernel: R10: 0000000023f30a01 R11: ffffea000c8fcc00 R12: 0000000023f30a01
May 9 20:12:41 soak-8 kernel: R13: ffff880703603d08 R14: 0000000023f30a01 R15: ffff880703603d08
May 9 20:12:41 soak-8 kernel: FS: 0000000000000000(0000) GS:ffff88042e180000(0000) knlGS:0000000000000000
May 9 20:12:41 soak-8 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 9 20:12:41 soak-8 kernel: CR2: 00007f64d55202e0 CR3: 00000000019be000 CR4: 00000000000407e0
May 9 20:12:41 soak-8 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 9 20:12:41 soak-8 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 9 20:12:41 soak-8 kernel: Stack:
May 9 20:12:41 soak-8 kernel: ffffffffa121b990 ffffffffa1217a20 ffff8800b2a36000 00000000810d354f
May 9 20:12:41 soak-8 kernel: ffff8803162f6000 ffff8800b2a37468 0000000020000000 010000000000000c
May 9 20:12:41 soak-8 kernel: 0000000000000000 0000000000000000 ffff8800b2a36000 0000000000000000
May 9 20:12:41 soak-8 kernel: Call Trace:
May 9 20:12:41 soak-8 kernel: [<ffffffffa121b990>] ? osd_ios_ROOT_scan+0x300/0x300 [osd_ldiskfs]
May 9 20:12:41 soak-8 kernel: [<ffffffffa1217a20>] ? osd_preload_next+0xb0/0xb0 [osd_ldiskfs]
May 9 20:12:41 soak-8 kernel: [<ffffffffa121e370>] osd_scrub_main+0x960/0xf30 [osd_ldiskfs]
May 9 20:12:41 soak-8 kernel: [<ffffffff810c54c0>] ? wake_up_state+0x20/0x20
May 9 20:12:41 soak-8 kernel: [<ffffffffa121da10>] ? osd_inode_iteration+0xcc0/0xcc0 [osd_ldiskfs]
May 9 20:12:41 soak-8 kernel: [<ffffffff810b0a4f>] kthread+0xcf/0xe0
May 9 20:12:41 soak-8 kernel: [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
May 9 20:12:41 soak-8 kernel: [<ffffffff81697318>] ret_from_fork+0x58/0x90
May 9 20:12:41 soak-8 kernel: [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
May 9 20:12:41 soak-8 kernel: Code: 00 e8 7c eb 97 ff e9 0f fc ff ff 0f 1f 80 00 00 00 00 45 89 e9 4c 8d 85 67 ff ff ff 48 8b 4d a8 48 8d 55 88 48 8b b5 38 ff ff ff <48> 8b bd 48 ff ff ff 48 8b 85 28 ff ff ff ff d0 85 c0 41 89 c5
lfsck hit 600 second timeout, abort attempted: 2017-05-09 20:18:46,982:fsmgmt.fsmgmt:ERROR lfsck still running after 600s, aborting 2017-05-09 20:18:46,983:fsmgmt.fsmgmt:INFO executing cmd: lctl lfsck_stop -M soaked-MDT0000 One the MDS hits the lockup, it hits it over and over and within minutes is doing nothing other that hitting the lockup. |
| Comments |
| Comment by Peter Jones [ 11/May/17 ] |
|
Fan Yong Could you please assist with this one? Thanks Peter |
| Comment by Gerrit Updater [ 22/May/17 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/27228 |
| Comment by nasf (Inactive) [ 22/May/17 ] |
|
Cliff, I hope that the patch https://review.whamcloud.com/27228 can fix the OI scrub soft lockup trouble. But if was not, please dump the oi_scrub via proc interface. The patch adds more information the /proc/fs/lustre/osd-ldiskfs/xxx/oi_scrub output, they are helpful to understand the OI scrub status. Thanks! |
| Comment by Gerrit Updater [ 10/Jun/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27228/ |
| Comment by Peter Jones [ 10/Jun/17 ] |
|
Landed for 2.10 |