
LU-4716: replay-ost-single test_5: stuck in dbuf_read->zio_wait

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • None
    • Lustre 2.6.0
    • Environment: client and server: lustre-master build # 1911 RHEL6 zfs
    • 3
    • 12962

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/42573266-9f17-11e3-934b-52540035b04c.

      The sub-test test_5 failed with the following error:

      test failed to respond and timed out

      Cannot find useful logs; there are some processes in D state in the client 2 console log, but they are not Lustre processes.
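
      A generic way to spot such hung tasks on a node (a minimal sketch; these commands are not taken from the test logs):

      # List tasks in uninterruptible sleep (D state) and the kernel function
      # they are waiting in (wchan):
      ps axo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'
      # Dump the stacks of all blocked tasks to the console log via sysrq:
      echo w > /proc/sysrq-trigger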


          Activity


            Closing old issue, since we haven't seen this in a long time (suspect update to 0.6.3 fixed it).

            adilger Andreas Dilger added a comment

            Looking further at the logs today, the busy txg_sync thread was blocked because IO couldn't complete:
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa0142e7c>] cv_wait_common+0x8c/0x100 [spl]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa0142f08>] __cv_wait_io+0x18/0x20 [spl]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa02864ab>] zio_wait+0xfb/0x1b0 [zfs]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa021e3a4>] dsl_pool_sync+0xf4/0x540 [zfs]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa023725e>] spa_sync+0x40e/0xa00 [zfs]

            This could be the known ZFS IO starvation issue with the Linux cfq scheduler. Although ZFS automatically sets the IO scheduler to noop on whole disks, the host OS could still be using cfq for the disks backing the guest OS disks. It appears the ZFS pools on the OSS were set up with:
            zpool import -f -o cachefile=none -d /dev/lvm-Role_OSS lustre-ost3

            It was not clear to me how the devices under /dev/lvm-Role_OSS were set up and used. I think it makes sense to make sure that our test systems:
            1. Use whole disks for the ZFS pools on the guest VMs.
            2. Use the noop IO scheduler for the corresponding disks on the host OS (a minimal sketch follows this list).
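
            A minimal sketch of both checks (device names such as vdb are placeholders, not taken from the actual test configuration):

            # On the host OS: check which scheduler the device backing the
            # guest's OST disk is using, and switch it to noop if it is cfq:
            cat /sys/block/vdb/queue/scheduler        # e.g. "noop deadline [cfq]"
            echo noop > /sys/block/vdb/queue/scheduler
            # In the guest: give ZFS the whole disk so it can manage the scheduler itself:
            zpool create -f -o cachefile=none lustre-ost3 /dev/vdb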

            isaac Isaac Huang (Inactive) added a comment

            Created ticket TEI-1729 to ask for ZFS debugging data to be included in test reports.

            isaac Isaac Huang (Inactive) added a comment

            On the OSS, there seemed to be 7 ZFS pools, one for each OST, as there were 7 txg_sync threads. Six of them were sleeping, waiting for work to do, but one was blocked on IO in spa_sync(), writing out the syncing txg. Four ll_ost threads were blocked in dmu_tx_assign()->txg_wait_open() - it looks like the open txg couldn't accept new transactions due to write throttling, and it couldn't move to the quiescing state because the syncing txg couldn't complete.

            I agree with Andreas that there's little value in chasing down problems in 0.6.2, as the ZFS write throttle code has been completely reworked post 0.6.2.

            But there are two more points:

            1. Maybe I've missed something, but I couldn't find any ZFS debugging data in the Maloo test report. It'd be very useful to have a tarball of /proc/spl/ when FSTYPE=zfs - lots of useful data can be found under that directory, e.g. the dmu_tx_assign delay histogram (see the sketch after this list).

            2. Among the 7 ZFS pools, only one was busy writing out data at the time. I wonder why all the load hit a single OST when there were 7 of them.
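
            A minimal sketch of collecting that data (variable and file names here are placeholders, not existing test-framework code):

            # Files under /proc report a size of 0, so cat them rather than
            # tarring the directory directly:
            if [ "$FSTYPE" = "zfs" ]; then
                out=/tmp/spl-debug-$(hostname).txt
                find /proc/spl -type f 2>/dev/null | while read -r f; do
                    echo "===== $f ====="
                    cat "$f" 2>/dev/null
                done > "$out"
            fi
            # Sampling per-pool write activity would also show which OST pool is busy:
            zpool iostat -v 1 5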

            isaac Isaac Huang (Inactive) added a comment

            I think it makes sense to upgrade our ZFS code to 0.6.3, or the tip of the master branch if that is not available yet. There isn't any value in chasing bugs that may already be fixed in the latest ZFS code (which we can hope will be released before Lustre 2.6.0).
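
            A quick way to confirm which ZFS/SPL version a test node is actually running (generic checks, not taken from the logs):

            cat /sys/module/zfs/version /sys/module/spl/version 2>/dev/null
            # or, if the sysfs version files are not present:
            modinfo zfs | grep -i '^version'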

            adilger Andreas Dilger added a comment

            There are some ll_ost threads blocked in ZFS:

            Feb 26 02:27:43 client-16vm4 kernel: ll_ost_io00_0
            Feb 26 02:27:43 client-16vm4 kernel: Call Trace:
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa0142edd>] cv_wait_common+0xed/0x100 [spl]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa0142f45>] __cv_wait+0x15/0x20 [spl]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa0241e9b>] txg_wait_open+0x7b/0xa0 [zfs]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa0206a5d>] dmu_tx_wait+0xed/0xf0 [zfs]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa0206aee>] dmu_tx_assign+0x8e/0x4e0 [zfs]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa11b456c>] osd_trans_start+0x9c/0x410 [osd_zfs]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa15cf54c>] ofd_trans_start+0x7c/0x100 [ofd]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa15d48b3>] ofd_commitrw_write+0x523/0xfd0 [ofd]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa15d588a>] ofd_commitrw+0x52a/0x8c0 [ofd]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa12e971d>] obd_commitrw.clone.0+0x11d/0x390 [ptlrpc]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa12f093e>] tgt_brw_write+0xc7e/0x1530 [ptlrpc]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa12ef43c>] tgt_request_handle+0x23c/0xac0 [ptlrpc]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa129e6ea>] ptlrpc_main+0xd1a/0x1980 [ptlrpc]
            Feb 26 02:27:43 client-16vm4 kernel: ll_ost_io00_0
            Feb 26 02:27:43 client-16vm4 kernel: Call Trace:
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffff81528823>] io_schedule+0x73/0xc0
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa0142e7c>] cv_wait_common+0x8c/0x100 [spl]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa0142f08>] __cv_wait_io+0x18/0x20 [spl]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa02864ab>] zio_wait+0xfb/0x1b0 [zfs]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa01f2fdd>] dbuf_read+0x3fd/0x750 [zfs]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa01f4d84>] dmu_buf_will_dirty+0xa4/0x100 [zfs]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa01fb930>] dmu_write+0x90/0x160 [zfs]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa11c28f7>] osd_write_commit+0x417/0x570 [osd_zfs]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa15d4994>] ofd_commitrw_write+0x604/0xfd0 [ofd]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa15d588a>] ofd_commitrw+0x52a/0x8c0 [ofd]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa12e971d>] obd_commitrw.clone.0+0x11d/0x390 [ptlrpc]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa12f093e>] tgt_brw_write+0xc7e/0x1530 [ptlrpc]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa12ef43c>] tgt_request_handle+0x23c/0xac0 [ptlrpc]
            Feb 26 02:27:43 client-16vm4 kernel: [<ffffffffa129e6ea>] ptlrpc_main+0xd1a/0x1980 [ptlrpc]
            
            adilger Andreas Dilger added a comment

            People

              Assignee: isaac Isaac Huang (Inactive)
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 5
