LU-6649: obdfilter-survey test_1a: lctl in D state

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.8.0, Lustre 2.10.0, Lustre 2.11.0, Lustre 2.10.4, Lustre 2.10.5, Lustre 2.10.7
    • lustre-master build #3029
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/71df9008-fe72-11e4-a865-5254006e85c2.

      The sub-test test_1a failed with the following error:

      test failed to respond and timed out
      

      Similar to LU-5775.
      OST console:

      12:57:43:Lustre: DEBUG MARKER: == obdfilter-survey test 1a: Object Storage Targets survey == 12:00:21 (1432036821)
      12:57:43:Lustre: DEBUG MARKER: lctl dl | grep obdfilter
      12:57:43:Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep tcp | cut -f 1 -d '@'
      12:57:43:INFO: task lctl:13285 blocked for more than 120 seconds.
      12:57:43:      Tainted: P           ---------------    2.6.32-504.16.2.el6_lustre.x86_64 #1
      12:57:43:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      12:57:43:lctl          D 0000000000000000     0 13285  13277 0x00000080
      12:57:43: ffff880070fe3768 0000000000000086 0000000000000000 ffffffff81064a2e
      12:57:43: ffff8800532a8b10 ffffffff00000000 000014516be02fbb 0000000000000001
      12:57:43: ffff880070fe3738 0000000101504d82 ffff8800541bc5f8 ffff880070fe3fd8
      12:57:43:Call Trace:
      12:57:43: [<ffffffff81064a2e>] ? try_to_wake_up+0x24e/0x3e0
      12:57:43: [<ffffffff8109edfe>] ? prepare_to_wait_exclusive+0x4e/0x80
      12:57:43: [<ffffffffa019e78d>] cv_wait_common+0x11d/0x130 [spl]
      12:57:43: [<ffffffff8109ebb0>] ? autoremove_wake_function+0x0/0x40
      12:57:43: [<ffffffffa019e7f5>] __cv_wait+0x15/0x20 [spl]
      12:57:43: [<ffffffffa02556db>] txg_wait_open+0x8b/0xd0 [zfs]
      12:57:43: [<ffffffffa0213f27>] dmu_tx_wait+0x3f7/0x400 [zfs]
      12:57:43: [<ffffffffa02285da>] ? dsl_dir_tempreserve_space+0xca/0x190 [zfs]
      12:57:43: [<ffffffffa0214121>] dmu_tx_assign+0xa1/0x570 [zfs]
      12:57:43: [<ffffffffa1c51b3d>] osd_trans_start+0xed/0x430 [osd_zfs]
      12:57:43: [<ffffffffa1af3f0c>] ofd_trans_start+0x7c/0x100 [ofd]
      12:57:43: [<ffffffffa1afb7a3>] ofd_commitrw_write+0x543/0x1050 [ofd]
      12:57:43: [<ffffffffa1afc862>] ofd_commitrw+0x5b2/0xb00 [ofd]
      12:57:43: [<ffffffffa177211f>] echo_client_brw_ioctl+0xccf/0x1430 [obdecho]
      12:57:43: [<ffffffffa177472b>] echo_client_iocontrol+0x64b/0x29e0 [obdecho]
      12:57:43: [<ffffffff810b2a3d>] ? get_futex_key+0x18d/0x2d0
      12:57:43: [<ffffffff81174f6c>] ? __kmalloc+0x21c/0x230
      12:57:43: [<ffffffffa119ef91>] ? obd_ioctl_getdata+0xe1/0x1140 [obdclass]
      12:57:43: [<ffffffffa11b703c>] class_handle_ioctl+0x163c/0x21c0 [obdclass]
      12:57:43: [<ffffffff810b4d60>] ? do_futex+0x100/0xae0
      12:57:43: [<ffffffffa119e2ab>] obd_class_ioctl+0x4b/0x190 [obdclass]
      12:57:43: [<ffffffff811a3ed2>] vfs_ioctl+0x22/0xa0
      12:57:43: [<ffffffff811a4074>] do_vfs_ioctl+0x84/0x580
      12:57:43: [<ffffffff810b57bb>] ? sys_futex+0x7b/0x170
      12:57:43: [<ffffffff811a45f1>] sys_ioctl+0x81/0xa0
      12:57:43: [<ffffffff810e5f9e>] ? __audit_syscall_exit+0x25e/0x290
      12:57:43: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      12:57:43:INFO: task lctl:13286 blocked for more than 120 seconds.
      12:57:43:      Tainted: P           ---------------    2.6.32-504.16.2.el6_lustre.x86_64 #1
      12:57:43:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      12:57:43:lctl          D 0000000000000001     0 13286  13277 0x00000080
      12:57:43: ffff8800477a5768 0000000000000086 0000000000000000 ffffffff81064a2e
      12:57:43: ffff8800532a8b10 ffffffff00000000 0000146709f18046 0000000000000001
      12:57:43: ffff8800477a5738 000000010151b82d ffff88006bee1ad8 ffff8800477a5fd8
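
A minimal sketch of how the hung-task reports in a console log like the one above could be summarized, so that the blocked `lctl` PIDs and the reliable frames of each call trace can be compared across runs. The names `parse_hung_tasks`, `HUNG_RE`, `FRAME_RE`, and `SAMPLE` are hypothetical and are not part of Lustre or the Maloo tooling; the regular expressions assume the log format shown in this ticket and may need adjusting for other kernels.

```python
import re

# Hung-task header, e.g.
# "12:57:43:INFO: task lctl:13285 blocked for more than 120 seconds."
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")

# Stack frame, e.g. "[<ffffffffa02556db>] txg_wait_open+0x8b/0xd0 [zfs]".
# A leading "?" marks a frame the kernel unwinder considers unreliable.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def parse_hung_tasks(lines):
    """Return one dict per hung-task report, keeping only reliable frames."""
    tasks = []
    current = None
    for line in lines:
        header = HUNG_RE.search(line)
        if header:
            current = {
                "comm": header.group(1),
                "pid": int(header.group(2)),
                "seconds": int(header.group(3)),
                "frames": [],
            }
            tasks.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None and not frame.group(1):
            # Frames without a "[module]" suffix belong to the core kernel.
            current["frames"].append((frame.group(2), frame.group(3) or "kernel"))
    return tasks


# A few lines copied from the OST console output in this ticket.
SAMPLE = [
    "12:57:43:INFO: task lctl:13285 blocked for more than 120 seconds.",
    "12:57:43:Call Trace:",
    "12:57:43: [<ffffffff81064a2e>] ? try_to_wake_up+0x24e/0x3e0",
    "12:57:43: [<ffffffffa019e78d>] cv_wait_common+0x11d/0x130 [spl]",
    "12:57:43: [<ffffffffa02556db>] txg_wait_open+0x8b/0xd0 [zfs]",
    "12:57:43: [<ffffffff811a3ed2>] vfs_ioctl+0x22/0xa0",
]

if __name__ == "__main__":
    for task in parse_hung_tasks(SAMPLE):
        chain = " <- ".join(f"{fn} [{mod}]" for fn, mod in task["frames"])
        print(f'{task["comm"]}:{task["pid"]} blocked {task["seconds"]}s: {chain}')
```

Dropping the `?` frames keeps the summary focused on the path the task actually took, which in this log runs from the `obdecho` ioctl through `ofd` and `osd_zfs` into `txg_wait_open` in `zfs`.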
      

          Activity

            jcasper James Casper (Inactive) added a comment - 2.9.57, b3575: https://testing.hpdd.intel.com/test_sessions/edde2a3e-9ae8-434a-8170-b64e9e85529c

            niu Niu Yawei (Inactive) added a comment - I think the root cause is the same as in LU-5242.

            niu Niu Yawei (Inactive) added a comment - Hit on master: https://testing.hpdd.intel.com/test_sets/b809a044-99cd-11e6-a018-5254006e85c2
            It failed on test_1c this time.

            standan Saurabh Tandan (Inactive) added a comment - edited

            Another instance found for Full tag 2.7.66 - EL6.7 Server/EL6.7 Client - ZFS, build# 3314
            https://testing.hpdd.intel.com/test_sets/a6829740-cb47-11e5-a59a-5254006e85c2

            Another instance found for Full tag 2.7.66 - EL7.1 Server/EL7.1 Client - ZFS, build# 3314
            https://testing.hpdd.intel.com/test_sets/e76d64e2-cb88-11e5-b49e-5254006e85c2

            Another instance found for Full tag 2.7.66 - EL6.7 Server/SLES11 SP3 Client, build# 3316
            https://testing.hpdd.intel.com/test_sets/fd4a8d5a-cce9-11e5-8b0e-5254006e85c2
            standan Saurabh Tandan (Inactive) added a comment - edited

            Another instance for FULL - EL6.7 Server/EL6.7 Client - ZFS, master, build# 3314
            https://testing.hpdd.intel.com/test_sets/a6829740-cb47-11e5-a59a-5254006e85c2

            Another instance on master for FULL - EL7.1 Server/EL7.1 Client - ZFS, build# 3314
            https://testing.hpdd.intel.com/test_sets/e76d64e2-cb88-11e5-b49e-5254006e85c2

            standan Saurabh Tandan (Inactive) added a comment - edited

            Another instance found for interop: 2.5.5 Server/EL6.7 Client
            Server: 2.5.5, b2_5_fe/62
            Client: master, build# 3303, RHEL 6.7
            https://testing.hpdd.intel.com/test_sets/1676bc94-bb25-11e5-861c-5254006e85c2

            standan Saurabh Tandan (Inactive) added a comment

            Another instance for EL6.7 Server/EL6.7 Client - ZFS
            Master, build# 3270
            https://testing.hpdd.intel.com/test_sets/a16f9ef6-a275-11e5-bdef-5254006e85c2

            People

              wc-triage WC Triage
              maloo Maloo