[LU-6649] obdfilter-survey test_1a: lctl in D state

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.8.0, Lustre 2.10.0, Lustre 2.11.0, Lustre 2.10.4, Lustre 2.10.5, Lustre 2.10.7
    • Environment: lustre-master build #3029
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/71df9008-fe72-11e4-a865-5254006e85c2.

      The sub-test test_1a failed with the following error:

      test failed to respond and timed out
      

      Similar to LU-5775.
      OST console:

      12:57:43:Lustre: DEBUG MARKER: == obdfilter-survey test 1a: Object Storage Targets survey == 12:00:21 (1432036821)
      12:57:43:Lustre: DEBUG MARKER: lctl dl | grep obdfilter
      12:57:43:Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep tcp | cut -f 1 -d '@'
      12:57:43:INFO: task lctl:13285 blocked for more than 120 seconds.
      12:57:43:      Tainted: P           ---------------    2.6.32-504.16.2.el6_lustre.x86_64 #1
      12:57:43:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      12:57:43:lctl          D 0000000000000000     0 13285  13277 0x00000080
      12:57:43: ffff880070fe3768 0000000000000086 0000000000000000 ffffffff81064a2e
      12:57:43: ffff8800532a8b10 ffffffff00000000 000014516be02fbb 0000000000000001
      12:57:43: ffff880070fe3738 0000000101504d82 ffff8800541bc5f8 ffff880070fe3fd8
      12:57:43:Call Trace:
      12:57:43: [<ffffffff81064a2e>] ? try_to_wake_up+0x24e/0x3e0
      12:57:43: [<ffffffff8109edfe>] ? prepare_to_wait_exclusive+0x4e/0x80
      12:57:43: [<ffffffffa019e78d>] cv_wait_common+0x11d/0x130 [spl]
      12:57:43: [<ffffffff8109ebb0>] ? autoremove_wake_function+0x0/0x40
      12:57:43: [<ffffffffa019e7f5>] __cv_wait+0x15/0x20 [spl]
      12:57:43: [<ffffffffa02556db>] txg_wait_open+0x8b/0xd0 [zfs]
      12:57:43: [<ffffffffa0213f27>] dmu_tx_wait+0x3f7/0x400 [zfs]
      12:57:43: [<ffffffffa02285da>] ? dsl_dir_tempreserve_space+0xca/0x190 [zfs]
      12:57:43: [<ffffffffa0214121>] dmu_tx_assign+0xa1/0x570 [zfs]
      12:57:43: [<ffffffffa1c51b3d>] osd_trans_start+0xed/0x430 [osd_zfs]
      12:57:43: [<ffffffffa1af3f0c>] ofd_trans_start+0x7c/0x100 [ofd]
      12:57:43: [<ffffffffa1afb7a3>] ofd_commitrw_write+0x543/0x1050 [ofd]
      12:57:43: [<ffffffffa1afc862>] ofd_commitrw+0x5b2/0xb00 [ofd]
      12:57:43: [<ffffffffa177211f>] echo_client_brw_ioctl+0xccf/0x1430 [obdecho]
      12:57:43: [<ffffffffa177472b>] echo_client_iocontrol+0x64b/0x29e0 [obdecho]
      12:57:43: [<ffffffff810b2a3d>] ? get_futex_key+0x18d/0x2d0
      12:57:43: [<ffffffff81174f6c>] ? __kmalloc+0x21c/0x230
      12:57:43: [<ffffffffa119ef91>] ? obd_ioctl_getdata+0xe1/0x1140 [obdclass]
      12:57:43: [<ffffffffa11b703c>] class_handle_ioctl+0x163c/0x21c0 [obdclass]
      12:57:43: [<ffffffff810b4d60>] ? do_futex+0x100/0xae0
      12:57:43: [<ffffffffa119e2ab>] obd_class_ioctl+0x4b/0x190 [obdclass]
      12:57:43: [<ffffffff811a3ed2>] vfs_ioctl+0x22/0xa0
      12:57:43: [<ffffffff811a4074>] do_vfs_ioctl+0x84/0x580
      12:57:43: [<ffffffff810b57bb>] ? sys_futex+0x7b/0x170
      12:57:43: [<ffffffff811a45f1>] sys_ioctl+0x81/0xa0
      12:57:43: [<ffffffff810e5f9e>] ? __audit_syscall_exit+0x25e/0x290
      12:57:43: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      12:57:43:INFO: task lctl:13286 blocked for more than 120 seconds.
      12:57:43:      Tainted: P           ---------------    2.6.32-504.16.2.el6_lustre.x86_64 #1
      12:57:43:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      12:57:43:lctl          D 0000000000000001     0 13286  13277 0x00000080
      12:57:43: ffff8800477a5768 0000000000000086 0000000000000000 ffffffff81064a2e
      12:57:43: ffff8800532a8b10 ffffffff00000000 0000146709f18046 0000000000000001
      12:57:43: ffff8800477a5738 000000010151b82d ffff88006bee1ad8 ffff8800477a5fd8
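When triaging hung-task reports like the one above, it can help to drop the speculative frames (those marked with `?`) and keep only the reliable call chain for each blocked task. The following is a minimal sketch of that idea, not part of the Lustre test framework: the function name `blocked_call_chains`, the regular expressions, and the `SAMPLE` excerpt (copied from the console output above) are illustrative assumptions. It is written to accept both timestamp styles that appear in this ticket (`12:57:43:` and `[37200.345574]`).

```python
import re

# A stack frame line, e.g.
#   "12:57:43: [<ffffffffa02556db>] txg_wait_open+0x8b/0xd0 [zfs]"
#   "[37200.352006]  [<ffffffffc042c023>] zio_wait+0x113/0x1c0 [zfs]"
# A "?" before the symbol marks a frame the unwinder is not sure about.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

# The header the kernel hung-task detector prints for each blocked task.
TASK_RE = re.compile(r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than")


def blocked_call_chains(console_text):
    """Return {pid: [(symbol, module), ...]} with '?' frames removed.

    Frames are listed innermost first, in the order they appear in the log.
    Frames without a module tag are attributed to "kernel".
    """
    chains = {}
    current_pid = None
    for line in console_text.splitlines():
        task = TASK_RE.search(line)
        if task:
            current_pid = int(task.group("pid"))
            chains[current_pid] = []
            continue
        frame = FRAME_RE.search(line)
        if frame and current_pid is not None and not frame.group("unreliable"):
            module = frame.group("module") or "kernel"
            chains[current_pid].append((frame.group("symbol"), module))
    return chains


# Excerpt of the OST console output quoted in this ticket.
SAMPLE = """\
12:57:43:INFO: task lctl:13285 blocked for more than 120 seconds.
12:57:43:Call Trace:
12:57:43: [<ffffffff81064a2e>] ? try_to_wake_up+0x24e/0x3e0
12:57:43: [<ffffffffa019e78d>] cv_wait_common+0x11d/0x130 [spl]
12:57:43: [<ffffffffa019e7f5>] __cv_wait+0x15/0x20 [spl]
12:57:43: [<ffffffffa02556db>] txg_wait_open+0x8b/0xd0 [zfs]
12:57:43: [<ffffffffa0213f27>] dmu_tx_wait+0x3f7/0x400 [zfs]
12:57:43: [<ffffffffa0214121>] dmu_tx_assign+0xa1/0x570 [zfs]
12:57:43: [<ffffffffa1c51b3d>] osd_trans_start+0xed/0x430 [osd_zfs]
"""

if __name__ == "__main__":
    for pid, chain in blocked_call_chains(SAMPLE).items():
        print(pid, " <- ".join(f"{sym} [{mod}]" for sym, mod in chain))
```

Reducing each report to its reliable frames makes it easier to compare occurrences across builds, for example to check whether a task is waiting in `txg_wait_open` or elsewhere in ZFS.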
      

          Activity

            jamesanunez James Nunez (Inactive) added a comment -

            An updated stack trace for 2.10.5 RC1 is at https://testing.whamcloud.com/test_sets/0c4797ee-9bb9-11e8-8ee3-52540065bddc. The OSS console shows:

            [35623.218415] Lustre: Echo OBD driver; http://www.lustre.org/
            [37200.342923] INFO: task lctl:28554 blocked for more than 120 seconds.
            [37200.343656] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            [37200.344422] lctl            D ffff8f117c520000     0 28554  28547 0x00000080
            [37200.345237] Call Trace:
            [37200.345574]  [<ffffffffb8314029>] schedule+0x29/0x70
            [37200.346111]  [<ffffffffb8311999>] schedule_timeout+0x239/0x2c0
            [37200.346763]  [<ffffffffb7c6814e>] ? kvm_clock_get_cycles+0x1e/0x20
            [37200.347409]  [<ffffffffb7cf7ed2>] ? ktime_get_ts64+0x52/0xf0
            [37200.347992]  [<ffffffffb831353d>] io_schedule_timeout+0xad/0x130
            [37200.348625]  [<ffffffffb7cbc1c6>] ? prepare_to_wait_exclusive+0x56/0x90
            [37200.349268]  [<ffffffffb83135d8>] io_schedule+0x18/0x20
            [37200.350017]  [<ffffffffc026b192>] cv_wait_common+0xb2/0x150 [spl]
            [37200.350591]  [<ffffffffb7cbc610>] ? wake_up_atomic_t+0x30/0x30
            [37200.351167]  [<ffffffffc026b268>] __cv_wait_io+0x18/0x20 [spl]
            [37200.352006]  [<ffffffffc042c023>] zio_wait+0x113/0x1c0 [zfs]
            [37200.352559]  [<ffffffffc03771f4>] dmu_buf_hold_array_by_dnode+0x154/0x4a0 [zfs]
            [37200.353317]  [<ffffffffc03775a9>] dmu_buf_hold_array_by_bonus+0x69/0x90 [zfs]
            [37200.354207]  [<ffffffffc10144f2>] osd_bufs_get+0x412/0xc60 [osd_zfs]
            [37200.354857]  [<ffffffffc11517fb>] ofd_preprw+0x6bb/0x1170 [ofd]
            [37200.355505]  [<ffffffffb7d9934e>] ? __get_free_pages+0xe/0x40
            [37200.356074]  [<ffffffffb7df4f9e>] ? kmalloc_order_trace+0x2e/0xa0
            [37200.356764]  [<ffffffffb7df8b41>] ? __kmalloc+0x211/0x230
            [37200.357300]  [<ffffffffc122217a>] echo_client_prep_commit.isra.49+0x33a/0xc30 [obdecho]
            [37200.358088]  [<ffffffffc1229ebf>] echo_client_iocontrol+0x95f/0x1be0 [obdecho]
            [37200.359298]  [<ffffffffc0b7f7b9>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
            [37200.360060]  [<ffffffffc0b6a619>] class_handle_ioctl+0x1939/0x1dd0 [obdclass]
            [37200.360728]  [<ffffffffb7dc7c3d>] ? handle_mm_fault+0x39d/0x9b0
            [37200.361369]  [<ffffffffb7ed0b1e>] ? security_capable+0x1e/0x20
            [37200.361938]  [<ffffffffc0b4f5d2>] obd_class_ioctl+0xd2/0x170 [obdclass]
            [37200.362631]  [<ffffffffb7e30350>] do_vfs_ioctl+0x350/0x560
            [37200.363176]  [<ffffffffb831b56c>] ? __do_page_fault+0x1bc/0x4f0
            [37200.363843]  [<ffffffffb7e30601>] SyS_ioctl+0xa1/0xc0
            [37200.364326]  [<ffffffffb83206d5>] ? system_call_after_swapgs+0xa2/0x146
            [37200.364949]  [<ffffffffb8320795>] system_call_fastpath+0x1c/0x21
            [37200.365593]  [<ffffffffb83206e1>] ? system_call_after_swapgs+0xae/0x146
            [37200.366232] INFO: task lctl:28556 blocked for more than 120 seconds.
            [37200.366905] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

            sarah Sarah Liu added a comment - +1 on b2_10 https://testing.hpdd.intel.com/test_sets/bea30518-5c17-11e8-b303-52540065bddc
            sarah Sarah Liu added a comment -

            I suspect the error found on master is the same as LU-9247.

            jcasper James Casper (Inactive) added a comment - 2.9.57, b3575: https://testing.hpdd.intel.com/test_sessions/edde2a3e-9ae8-434a-8170-b64e9e85529c

            niu Niu Yawei (Inactive) added a comment -

            I think the root cause is the same as LU-5242.

            niu Niu Yawei (Inactive) added a comment - Hit on master, this time failing on test_1c: https://testing.hpdd.intel.com/test_sets/b809a044-99cd-11e6-a018-5254006e85c2

            standan Saurabh Tandan (Inactive) added a comment - edited

            Another instance found for Full tag 2.7.66 - EL6.7 Server/EL6.7 Client - ZFS, build #3314
            https://testing.hpdd.intel.com/test_sets/a6829740-cb47-11e5-a59a-5254006e85c2

            Another instance found for Full tag 2.7.66 - EL7.1 Server/EL7.1 Client - ZFS, build #3314
            https://testing.hpdd.intel.com/test_sets/e76d64e2-cb88-11e5-b49e-5254006e85c2

            Another instance found for Full tag 2.7.66 - EL6.7 Server/SLES11 SP3 Client, build #3316
            https://testing.hpdd.intel.com/test_sets/fd4a8d5a-cce9-11e5-8b0e-5254006e85c2

            standan Saurabh Tandan (Inactive) added a comment - edited

            Another instance for FULL - EL6.7 Server/EL6.7 Client - ZFS, master, build #3314
            https://testing.hpdd.intel.com/test_sets/a6829740-cb47-11e5-a59a-5254006e85c2

            Another instance on master for FULL - EL7.1 Server/EL7.1 Client - ZFS, build #3314
            https://testing.hpdd.intel.com/test_sets/e76d64e2-cb88-11e5-b49e-5254006e85c2


            standan Saurabh Tandan (Inactive) added a comment - edited

            Another instance found for interop: 2.5.5 Server/EL6.7 Client
            Server: 2.5.5, b2_5_fe/62
            Client: master, build #3303, RHEL 6.7
            https://testing.hpdd.intel.com/test_sets/1676bc94-bb25-11e5-861c-5254006e85c2


            standan Saurabh Tandan (Inactive) added a comment -

            Another instance for EL6.7 Server/EL6.7 Client - ZFS
            master, build #3270
            https://testing.hpdd.intel.com/test_sets/a16f9ef6-a275-11e5-bdef-5254006e85c2


            People

              Assignee: wc-triage WC Triage
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 6
