Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-15

strange slow IO messages and bad performance

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Lustre 1.8.6
    • Fix Version/s: Lustre 1.8.6
    • Labels:
      None
    • Environment:
      1.8.4-ddn2 system with a 9900 and async journaling
    • Severity:
      2
    • Rank (Obsolete):
      4874

      Description

      We're running into a problem at SLAC that we could use some help on, and I'm not quite sure where to put the ticket. It's a 1.8.4-ddn2 system with a 9900 and async journaling and we've been seeing a lot of slow IO messages from Lustre. The strange thing is that the actual IO wait times on the sd devices is pretty low, so the problem seems to exist before it does disk access. We started getting reports of this as the OSTs started approaching 7-8TB used, out of 16TB total.

      There have been some watchdogs which point to the mballoc functions as taking a very long time. As these are 16TB LUNs, there are by default a very large amount of inodes. In new installations, we have begun tuning these down, but this system was sort of the impetuous for that. Mounting the OSTs used to take 30 minutes when we were on 1.8.3, but the mballoc optimizations that landed seem to have helped that quite a bit. Still the OSTs are 50% full, but the inodes are 1% used - definitely way too many inodes.

      We are starting to get a lot of heat from the customer as their users are unable to do their full workload due to the slow IO going on. Are there any stats or anything we can get to help? Would more of the call traces help?

      Here are some example traces:
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336408] [<ffffffff80066e15>] getnstimeofday+0x15/0x40
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336424] [<ffffffff80070c29>] io_schedule+0x49/0x80
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336432] [<ffffffff800171eb>] sync_buffer+0x3b/0x50
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336438] [<ffffffff80070ee5>] __wait_on_bit+0x45/0x80
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336442] [<ffffffff800171b0>] sync_buffer+0x0/0x50
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336449] [<ffffffff800171b0>] sync_buffer+0x0/0x50
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336455] [<ffffffff80070f98>] out_of_line_wait_on_bit+0x78/0x90
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336465] [<ffffffff800b95a0>] wake_bit_function+0x0/0x40
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336478] [<ffffffff80053b90>] __wait_on_buffer+0x20/0x30
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336500] [<ffffffff88b430fa>] ldiskfs_mb_init_cache+0x32a/0x930 [ldiskfs]
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336539] [<ffffffff88b437e0>] ldiskfs_mb_init_group+0xe0/0x2a0 [ldiskfs]
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336548] [<ffffffff8002569b>] __up_read+0x8b/0xa0
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336572] [<ffffffff88b43a8b>] ldiskfs_mb_load_buddy+0xeb/0x400 [ldiskfs]
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336599] [<ffffffff88b44b5a>] ldiskfs_mb_regular_allocator+0x1aa/0x570 [ldiskfs]
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336638] [<ffffffff88b47689>] ldiskfs_mb_new_blocks+0x219/0x4d0 [ldiskfs]
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336649] [<ffffffff8018c31a>] __next_cpu+0x1a/0x30
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336654] [<ffffffff80071cb7>] __down_write_nested+0x17/0xb0
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336677] [<ffffffff88bc50c7>] ldiskfs_ext_new_extent_cb+0x367/0x680 [fsfilt_ldiskfs]
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336714] [<ffffffff88b3183e>] ldiskfs_ext_find_extent+0x28e/0x2d0 [ldiskfs]
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336739] [<ffffffff88b31a69>] ldiskfs_ext_walk_space+0x1e9/0x270 [ldiskfs]
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336749] [<ffffffff88bc4d60>] ldiskfs_ext_new_extent_cb+0x0/0x680 [fsfilt_ldiskfs]
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336771] [<ffffffff88bc1449>] fsfilt_map_nblocks+0xe9/0x120 [fsfilt_ldiskfs]
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336777] [<ffffffff800e7dc2>] __rmqueue+0x72/0x120
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336786] [<ffffffff800ed2e3>] zone_statistics+0x83/0xa0
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336808] [<ffffffff88bc167a>] fsfilt_ldiskfs_map_ext_inode_pages+0x1fa/0x220 [fsfilt_ldiskfs]
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336815] [<ffffffff8002569b>] __up_read+0x8b/0xa0
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336830] [<ffffffff88b13fb9>] start_this_handle+0x389/0x450 [jbd2]
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336853] [<ffffffff88bc16e1>] fsfilt_ldiskfs_map_inode_pages+0x41/0xb0 [fsfilt_ldiskfs]
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336875] [<ffffffff88bfe78a>] filter_direct_io+0x46a/0xd50 [obdfilter]
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336892] [<ffffffff889d7b10>] filter_quota_acquire+0x0/0x120 [lquota]
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336917] [<ffffffff88c014e2>] filter_commitrw_write+0x17a2/0x2b30 [obdfilter]
      Nov 17 20:32:44 psanaoss104 kernel: [16961.336926] [<ffffffff800a0348>] find_busiest_group+0x278/0x700

      Nov 17 20:32:44 psanaoss104 kernel: [16961.343328] [<ffffffff80071dec>] __down_read+0x8c/0xa9
      Nov 17 20:32:44 psanaoss104 kernel: [16961.343340] [<ffffffff800bbd59>] down_read+0x19/0x20
      Nov 17 20:32:44 psanaoss104 kernel: [16961.343363] [<ffffffff88b3a606>] ldiskfs_get_blocks+0x56/0x2b0 [ldiskfs]
      Nov 17 20:32:44 psanaoss104 kernel: [16961.343393] [<ffffffff88b3b408>] ldiskfs_get_block+0xb8/0x100 [ldiskfs]
      Nov 17 20:32:44 psanaoss104 kernel: [16961.343407] [<ffffffff8003969a>] generic_block_bmap+0x3a/0x40
      Nov 17 20:32:44 psanaoss104 kernel: [16961.343419] [<ffffffff88976c35>] ksocknal_alloc_tx+0x1f5/0x290 [ksocklnd]
      Nov 17 20:32:44 psanaoss104 kernel: [16961.343429] [<ffffffff8003a3dc>] mapping_tagged+0x4c/0x70
      Nov 17 20:32:44 psanaoss104 kernel: [16961.343451] [<ffffffff88b38142>] ldiskfs_bmap+0xc2/0xe0 [ldiskfs]
      Nov 17 20:32:44 psanaoss104 kernel: [16961.343470] [<ffffffff88b38080>] ldiskfs_bmap+0x0/0xe0 [ldiskfs]
      Nov 17 20:32:44 psanaoss104 kernel: [16961.343484] [<ffffffff88c000c9>] filter_commitrw_write+0x389/0x2b30 [obdfilter]
      Nov 17 20:32:44 psanaoss104 kernel: [16961.343511] [<ffffffff80068b65>] cache_alloc_refill+0x95/0x1f0
      Nov 17 20:32:44 psanaoss104 kernel: [16961.343520] [<ffffffff800a1243>] dequeue_task+0x23/0x50
      Nov 17 20:32:44 psanaoss104 kernel: [16961.343557] [<ffffffff88bf90d8>] filter_commitrw+0x58/0x2a0 [obdfilter]

      Nov 17 20:32:44 psanaoss104 kernel: [16961.360298] Lustre: Service thread pid 16192 was inactive for 226.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
      Nov 17 20:32:44 psanaoss104 kernel: [16961.360306] Lustre: Skipped 12 previous similar messages

      Thanks,
      Kit

      FYI, here is the ddn2 patch list:

      async_journals_by_default.patch
      disable_lru_resize.patch
      disable statahead
      updated qla2xxx driver to latest
      bug23549: allow crc32c hardware support
      bug21456: lctl-ping-fix
      bug21581: tarball name length fix
      bug22729: use %z for size_t
      bug22882: mmp might sleep negative time
      bug23074: improve autogen
      bug23157: change LBUG assertion to error for network problem
      bug22117: limit mmp check interval to something rational
      bug23064: umount without sync may cause data lost on 2.6.32 kernel
      bug21745: fix obdfilter-survey LBUG
      bug22820: too many default ACLs break directory access
      bug22906: mkfs.lustre failed due to invalid blocks count on a 16TB target
      bug20433: mdsrate fails to write after 1.3+M files opened
      bug21760: application hung in direct IO
      bug22891: objects not getting deleted for files which have been removed
      bug23216: memory leak in obdecho
      bug23335: limit echo object IDs to fit in f_oid
      bug23781: memory leak with odbo
      bug20394: fix bad assertion
      bug21587: evict instead of LBUG
      bug24003: fix cosmetic bug for lfs getstripe
      revert bug22656

        Attachments

        1. LU-15.tgz
          4.97 MB
        2. messages
          760 kB
        3. patch.aligned_1M
          4 kB
        4. patch.aligned_1M
          1 kB
        5. patch.diff
          9 kB
        6. patch.ext4
          6 kB
        7. readahead_align_1M.patch
          8 kB

          Issue Links

            Activity

              People

              • Assignee:
                ys Yang Sheng
                Reporter:
                kitwestneat Kit Westneat (Inactive)
              • Votes:
                1 Vote for this issue
                Watchers:
                19 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: