Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Fix Version/s: Lustre 1.8.6
- Affects Version/s: None
- Environment: 1.8.4-ddn2 system with a 9900 and async journaling
- Severity: 2
- Rank (Obsolete): 4874
Description
We're running into a problem at SLAC that we could use some help on, and I'm not quite sure where to file the ticket. It's a 1.8.4-ddn2 system with a 9900 and async journaling, and we've been seeing a lot of slow IO messages from Lustre. The strange thing is that the actual IO wait times on the sd devices are pretty low, so the bottleneck seems to sit above the disk layer, before the requests ever reach the drives. We started getting reports of this as the OSTs approached 7-8TB used out of 16TB total.
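For what it's worth, this is roughly how we are checking the device-level wait times (just a sketch, assuming sysstat's iostat is installed; the grep pattern is only there to pick out the sd devices):

# extended per-device stats, three 5-second samples; await/svctm on the sd devices stay low
iostat -x 5 3 | egrep 'Device|^sd'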
There have been some watchdogs whose stack traces point to the mballoc functions taking a very long time. Because these are 16TB LUNs, they were formatted by default with a very large number of inodes. On new installations we have begun tuning that down, and this system was largely the impetus for doing so. Mounting the OSTs used to take 30 minutes when we were on 1.8.3; the mballoc optimizations that landed since seem to have helped quite a bit. Still, the OSTs are 50% full while only 1% of the inodes are used, which is definitely far too many inodes.
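For reference, here is roughly how we have been tuning the inode count down on newer installs (a sketch only, not what was done on this system; the fsname, MGS NID, mount point, and device path are made up, and the bytes-per-inode value is just illustrative):

# confirm how oversized the inode table is on an existing OST
df -i /mnt/ost0
# format a new 16TB OST with roughly one inode per 1 MiB of space
# instead of the much denser ldiskfs default
mkfs.lustre --ost --fsname=testfs --mgsnode=10.0.0.1@tcp \
    --mkfsoptions="-i 1048576" /dev/mapper/ost0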
We are starting to get a lot of heat from the customer because their users are unable to run their full workload with the slow IO going on. Are there any stats or other diagnostics we can gather to help? Would more of the call traces be useful?
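In the meantime, here is what we can already pull from the OSS side if any of it is useful (a sketch; the obdfilter parameter names are the standard 1.8 ones, and mb_history may not be exposed on every ldiskfs build):

# per-OST bulk read/write size and latency histograms
lctl get_param obdfilter.*.brw_stats
# general per-OST operation counters
lctl get_param obdfilter.*.stats
# mballoc allocation history, if this ldiskfs exposes it
cat /proc/fs/ldiskfs/*/mb_history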
Here are some example traces:
Nov 17 20:32:44 psanaoss104 kernel: [16961.336408] [<ffffffff80066e15>] getnstimeofday+0x15/0x40
Nov 17 20:32:44 psanaoss104 kernel: [16961.336424] [<ffffffff80070c29>] io_schedule+0x49/0x80
Nov 17 20:32:44 psanaoss104 kernel: [16961.336432] [<ffffffff800171eb>] sync_buffer+0x3b/0x50
Nov 17 20:32:44 psanaoss104 kernel: [16961.336438] [<ffffffff80070ee5>] __wait_on_bit+0x45/0x80
Nov 17 20:32:44 psanaoss104 kernel: [16961.336442] [<ffffffff800171b0>] sync_buffer+0x0/0x50
Nov 17 20:32:44 psanaoss104 kernel: [16961.336449] [<ffffffff800171b0>] sync_buffer+0x0/0x50
Nov 17 20:32:44 psanaoss104 kernel: [16961.336455] [<ffffffff80070f98>] out_of_line_wait_on_bit+0x78/0x90
Nov 17 20:32:44 psanaoss104 kernel: [16961.336465] [<ffffffff800b95a0>] wake_bit_function+0x0/0x40
Nov 17 20:32:44 psanaoss104 kernel: [16961.336478] [<ffffffff80053b90>] __wait_on_buffer+0x20/0x30
Nov 17 20:32:44 psanaoss104 kernel: [16961.336500] [<ffffffff88b430fa>] ldiskfs_mb_init_cache+0x32a/0x930 [ldiskfs]
Nov 17 20:32:44 psanaoss104 kernel: [16961.336539] [<ffffffff88b437e0>] ldiskfs_mb_init_group+0xe0/0x2a0 [ldiskfs]
Nov 17 20:32:44 psanaoss104 kernel: [16961.336548] [<ffffffff8002569b>] __up_read+0x8b/0xa0
Nov 17 20:32:44 psanaoss104 kernel: [16961.336572] [<ffffffff88b43a8b>] ldiskfs_mb_load_buddy+0xeb/0x400 [ldiskfs]
Nov 17 20:32:44 psanaoss104 kernel: [16961.336599] [<ffffffff88b44b5a>] ldiskfs_mb_regular_allocator+0x1aa/0x570 [ldiskfs]
Nov 17 20:32:44 psanaoss104 kernel: [16961.336638] [<ffffffff88b47689>] ldiskfs_mb_new_blocks+0x219/0x4d0 [ldiskfs]
Nov 17 20:32:44 psanaoss104 kernel: [16961.336649] [<ffffffff8018c31a>] __next_cpu+0x1a/0x30
Nov 17 20:32:44 psanaoss104 kernel: [16961.336654] [<ffffffff80071cb7>] __down_write_nested+0x17/0xb0
Nov 17 20:32:44 psanaoss104 kernel: [16961.336677] [<ffffffff88bc50c7>] ldiskfs_ext_new_extent_cb+0x367/0x680 [fsfilt_ldiskfs]
Nov 17 20:32:44 psanaoss104 kernel: [16961.336714] [<ffffffff88b3183e>] ldiskfs_ext_find_extent+0x28e/0x2d0 [ldiskfs]
Nov 17 20:32:44 psanaoss104 kernel: [16961.336739] [<ffffffff88b31a69>] ldiskfs_ext_walk_space+0x1e9/0x270 [ldiskfs]
Nov 17 20:32:44 psanaoss104 kernel: [16961.336749] [<ffffffff88bc4d60>] ldiskfs_ext_new_extent_cb+0x0/0x680 [fsfilt_ldiskfs]
Nov 17 20:32:44 psanaoss104 kernel: [16961.336771] [<ffffffff88bc1449>] fsfilt_map_nblocks+0xe9/0x120 [fsfilt_ldiskfs]
Nov 17 20:32:44 psanaoss104 kernel: [16961.336777] [<ffffffff800e7dc2>] __rmqueue+0x72/0x120
Nov 17 20:32:44 psanaoss104 kernel: [16961.336786] [<ffffffff800ed2e3>] zone_statistics+0x83/0xa0
Nov 17 20:32:44 psanaoss104 kernel: [16961.336808] [<ffffffff88bc167a>] fsfilt_ldiskfs_map_ext_inode_pages+0x1fa/0x220 [fsfilt_ldiskfs]
Nov 17 20:32:44 psanaoss104 kernel: [16961.336815] [<ffffffff8002569b>] __up_read+0x8b/0xa0
Nov 17 20:32:44 psanaoss104 kernel: [16961.336830] [<ffffffff88b13fb9>] start_this_handle+0x389/0x450 [jbd2]
Nov 17 20:32:44 psanaoss104 kernel: [16961.336853] [<ffffffff88bc16e1>] fsfilt_ldiskfs_map_inode_pages+0x41/0xb0 [fsfilt_ldiskfs]
Nov 17 20:32:44 psanaoss104 kernel: [16961.336875] [<ffffffff88bfe78a>] filter_direct_io+0x46a/0xd50 [obdfilter]
Nov 17 20:32:44 psanaoss104 kernel: [16961.336892] [<ffffffff889d7b10>] filter_quota_acquire+0x0/0x120 [lquota]
Nov 17 20:32:44 psanaoss104 kernel: [16961.336917] [<ffffffff88c014e2>] filter_commitrw_write+0x17a2/0x2b30 [obdfilter]
Nov 17 20:32:44 psanaoss104 kernel: [16961.336926] [<ffffffff800a0348>] find_busiest_group+0x278/0x700
Nov 17 20:32:44 psanaoss104 kernel: [16961.343328] [<ffffffff80071dec>] __down_read+0x8c/0xa9
Nov 17 20:32:44 psanaoss104 kernel: [16961.343340] [<ffffffff800bbd59>] down_read+0x19/0x20
Nov 17 20:32:44 psanaoss104 kernel: [16961.343363] [<ffffffff88b3a606>] ldiskfs_get_blocks+0x56/0x2b0 [ldiskfs]
Nov 17 20:32:44 psanaoss104 kernel: [16961.343393] [<ffffffff88b3b408>] ldiskfs_get_block+0xb8/0x100 [ldiskfs]
Nov 17 20:32:44 psanaoss104 kernel: [16961.343407] [<ffffffff8003969a>] generic_block_bmap+0x3a/0x40
Nov 17 20:32:44 psanaoss104 kernel: [16961.343419] [<ffffffff88976c35>] ksocknal_alloc_tx+0x1f5/0x290 [ksocklnd]
Nov 17 20:32:44 psanaoss104 kernel: [16961.343429] [<ffffffff8003a3dc>] mapping_tagged+0x4c/0x70
Nov 17 20:32:44 psanaoss104 kernel: [16961.343451] [<ffffffff88b38142>] ldiskfs_bmap+0xc2/0xe0 [ldiskfs]
Nov 17 20:32:44 psanaoss104 kernel: [16961.343470] [<ffffffff88b38080>] ldiskfs_bmap+0x0/0xe0 [ldiskfs]
Nov 17 20:32:44 psanaoss104 kernel: [16961.343484] [<ffffffff88c000c9>] filter_commitrw_write+0x389/0x2b30 [obdfilter]
Nov 17 20:32:44 psanaoss104 kernel: [16961.343511] [<ffffffff80068b65>] cache_alloc_refill+0x95/0x1f0
Nov 17 20:32:44 psanaoss104 kernel: [16961.343520] [<ffffffff800a1243>] dequeue_task+0x23/0x50
Nov 17 20:32:44 psanaoss104 kernel: [16961.343557] [<ffffffff88bf90d8>] filter_commitrw+0x58/0x2a0 [obdfilter]
Nov 17 20:32:44 psanaoss104 kernel: [16961.360298] Lustre: Service thread pid 16192 was inactive for 226.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Nov 17 20:32:44 psanaoss104 kernel: [16961.360306] Lustre: Skipped 12 previous similar messages
Thanks,
Kit
FYI, here is the ddn2 patch list:
async_journals_by_default.patch
disable_lru_resize.patch
disable statahead
updated qla2xxx driver to latest
bug23549: allow crc32c hardware support
bug21456: lctl-ping-fix
bug21581: tarball name length fix
bug22729: use %z for size_t
bug22882: mmp might sleep negative time
bug23074: improve autogen
bug23157: change LBUG assertion to error for network problem
bug22117: limit mmp check interval to something rational
bug23064: umount without sync may cause data lost on 2.6.32 kernel
bug21745: fix obdfilter-survey LBUG
bug22820: too many default ACLs break directory access
bug22906: mkfs.lustre failed due to invalid blocks count on a 16TB target
bug20433: mdsrate fails to write after 1.3+M files opened
bug21760: application hung in direct IO
bug22891: objects not getting deleted for files which have been removed
bug23216: memory leak in obdecho
bug23335: limit echo object IDs to fit in f_oid
bug23781: memory leak with obdo
bug20394: fix bad assertion
bug21587: evict instead of LBUG
bug24003: fix cosmetic bug for lfs getstripe
revert bug22656
Attachments
Issue Links
- is related to:
  - LU-918 ensure that BRW requests prevent lock timeout (Closed)
  - LU-10967 MDT page cache management improvements (Open)