Test failure on test suite obdfilter-survey, subtest test_1a

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • None
    • Lustre 2.4.0, Lustre 2.4.1
    • 3
    • 5125

    Description

      This issue was created by maloo for Li Wei <liwei@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/584999d6-1207-11e2-a663-52540035b04c.

      The sub-test test_1a failed with the following error:

      test failed to respond and timed out

      Info required for matching: obdfilter-survey 1a

          Activity

            [LU-2124] Test failure on test suite obdfilter-survey, subtest test_1a
            yujian Jian Yu added a comment -

            Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/47/
            FSTYPE=zfs

            With OSTCOUNT=2, obdfilter-survey test 1a passed:
            https://maloo.whamcloud.com/test_sets/a488f632-4453-11e3-8472-52540035b04c

            Let's close this ticket.

            yujian Jian Yu added a comment -

            Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/46/
            FSTYPE=zfs

            The same failure occurred:
            https://maloo.whamcloud.com/test_sets/42ba0f84-3064-11e3-b28a-52540035b04c

            We'll see whether the timeout failure disappears after TEI-790 is resolved.

            yujian Jian Yu added a comment -

            Lustre build: http://build.whamcloud.com/job/lustre-b2_4/45/ (2.4.1 RC2)
            Distro/Arch: RHEL6.4/x86_64
            FSTYPE=zfs

            obdfilter-survey test 1a hung as follows:

            == obdfilter-survey test 1a: Object Storage Targets survey == 23:44:44 (1378622684)
            CMD: client-24vm4 lctl dl | grep obdfilter
            CMD: client-24vm4 /usr/sbin/lctl list_nids | grep tcp | cut -f 1 -d '@'
            + NETTYPE=tcp thrlo=8 nobjhi=1 thrhi=16 size=1024 case=disk rslt_loc=/tmp targets="10.10.4.119:lustre-OST0000 10.10.4.119:lustre-OST0001 10.10.4.119:lustre-OST0002 10.10.4.119:lustre-OST0003 10.10.4.119:lustre-OST0004 10.10.4.119:lustre-OST0005 10.10.4.119:lustre-OST0006" /usr/bin/obdfilter-survey
            Warning: Permanently added '10.10.4.119' (RSA) to the list of known hosts.
            Sat Sep  7 23:44:51 PDT 2013 Obdfilter-survey for case=disk from client-24vm2.lab.whamcloud.com
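
The traced command line above passes the survey parameters as environment-style assignments in front of /usr/bin/obdfilter-survey. As a rough sketch (not part of the test framework; the helper name parse_survey_invocation is made up for illustration), the line can be split in Python to confirm which parameters were set and how many OST targets the hung run was given:

```python
import shlex

# Command line copied from the test log above; the leading "+" is the
# shell trace marker, not part of the command.
LOG_LINE = (
    '+ NETTYPE=tcp thrlo=8 nobjhi=1 thrhi=16 size=1024 case=disk rslt_loc=/tmp '
    'targets="10.10.4.119:lustre-OST0000 10.10.4.119:lustre-OST0001 '
    '10.10.4.119:lustre-OST0002 10.10.4.119:lustre-OST0003 '
    '10.10.4.119:lustre-OST0004 10.10.4.119:lustre-OST0005 '
    '10.10.4.119:lustre-OST0006" /usr/bin/obdfilter-survey'
)


def parse_survey_invocation(line):
    """Split a traced command line into (assignments, command words).

    Hypothetical helper: leading NAME=value tokens are treated as
    assignments, everything from the first other token on as the command.
    """
    tokens = shlex.split(line.lstrip("+ "))
    env = {}
    command = []
    for token in tokens:
        if not command and "=" in token:
            key, value = token.split("=", 1)
            env[key] = value
        else:
            command.append(token)
    return env, command


env, command = parse_survey_invocation(LOG_LINE)
targets = env["targets"].split()
print(command[0], "was started with", len(targets), "targets")
for target in targets:
    print("  ", target)
```

All targets in this run sit on the same OSS address, 10.10.4.119.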
            

            Dmesg on OSS node client-24vm4 showed that:

            lctl          D 0000000000000000     0 19552  19496 0x00000080
             ffff88001bf65748 0000000000000086 ffff8800ffffffff 0000126bad99a78e
             ffff880061356070 ffff8800618efec0 00000000003e8684 ffffffffadd3ec96
             ffff88001fefdaf8 ffff88001bf65fd8 000000000000fb88 ffff88001fefdaf8
            Call Trace:
             [<ffffffff810a2431>] ? ktime_get_ts+0xb1/0xf0
             [<ffffffff8150ed03>] io_schedule+0x73/0xc0
             [<ffffffffa03e6d4c>] cv_wait_common+0x8c/0x100 [spl]
             [<ffffffff81096da0>] ? autoremove_wake_function+0x0/0x40
             [<ffffffffa03e6dd8>] __cv_wait_io+0x18/0x20 [spl]
             [<ffffffffa052939b>] zio_wait+0xfb/0x190 [zfs]
             [<ffffffffa049f07d>] dmu_buf_hold_array_by_dnode+0x1dd/0x560 [zfs]
             [<ffffffffa049ff88>] dmu_buf_hold_array_by_bonus+0x68/0x90 [zfs]
             [<ffffffffa0dc1b33>] osd_bufs_get+0x493/0xa30 [osd_zfs]
             [<ffffffffa0e609cb>] ofd_preprw_read+0x14b/0x7f0 [ofd]
             [<ffffffffa0e617ea>] ofd_preprw+0x77a/0x1480 [ofd]
             [<ffffffffa05a7473>] echo_client_iocontrol+0x2003/0x3b40 [obdecho]
             [<ffffffff81281826>] ? vsnprintf+0x336/0x5e0
             [<ffffffffa071049f>] class_handle_ioctl+0x12ff/0x1ec0 [obdclass]
             [<ffffffffa06f82ab>] obd_class_ioctl+0x4b/0x190 [obdclass]
             [<ffffffff81195352>] vfs_ioctl+0x22/0xa0
             [<ffffffff8103c7d8>] ? pvclock_clocksource_read+0x58/0xd0
             [<ffffffff811954f4>] do_vfs_ioctl+0x84/0x580
             [<ffffffff81195a71>] sys_ioctl+0x81/0xa0
             [<ffffffff810dc685>] ? __audit_syscall_exit+0x265/0x290
             [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
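
Read from the bottom up, the trace shows the lctl ioctl entering the echo client (obdecho), going through ofd and osd_zfs, and then sleeping uninterruptibly (state D) in zio_wait while waiting for ZFS read I/O. The sketch below is only an illustration of how to condense such a trace. It drops frames marked with "?", which the kernel prints for addresses found on the stack but not confirmed by the unwinder, and lists the modules from the innermost frame outward. The regular expression and the helper names are assumptions, not an existing tool.

```python
import re

# Call trace copied from the dmesg output above.
TRACE = """\
 [<ffffffff810a2431>] ? ktime_get_ts+0xb1/0xf0
 [<ffffffff8150ed03>] io_schedule+0x73/0xc0
 [<ffffffffa03e6d4c>] cv_wait_common+0x8c/0x100 [spl]
 [<ffffffff81096da0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa03e6dd8>] __cv_wait_io+0x18/0x20 [spl]
 [<ffffffffa052939b>] zio_wait+0xfb/0x190 [zfs]
 [<ffffffffa049f07d>] dmu_buf_hold_array_by_dnode+0x1dd/0x560 [zfs]
 [<ffffffffa049ff88>] dmu_buf_hold_array_by_bonus+0x68/0x90 [zfs]
 [<ffffffffa0dc1b33>] osd_bufs_get+0x493/0xa30 [osd_zfs]
 [<ffffffffa0e609cb>] ofd_preprw_read+0x14b/0x7f0 [ofd]
 [<ffffffffa0e617ea>] ofd_preprw+0x77a/0x1480 [ofd]
 [<ffffffffa05a7473>] echo_client_iocontrol+0x2003/0x3b40 [obdecho]
 [<ffffffff81281826>] ? vsnprintf+0x336/0x5e0
 [<ffffffffa071049f>] class_handle_ioctl+0x12ff/0x1ec0 [obdclass]
 [<ffffffffa06f82ab>] obd_class_ioctl+0x4b/0x190 [obdclass]
 [<ffffffff81195352>] vfs_ioctl+0x22/0xa0
 [<ffffffff8103c7d8>] ? pvclock_clocksource_read+0x58/0xd0
 [<ffffffff811954f4>] do_vfs_ioctl+0x84/0x580
 [<ffffffff81195a71>] sys_ioctl+0x81/0xa0
 [<ffffffff810dc685>] ? __audit_syscall_exit+0x265/0x290
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
"""

# address, optional "?" marker, symbol+offset/size, optional [module]
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def reliable_frames(trace):
    """Return (function, module) pairs, skipping frames marked with '?'."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.search(line)
        if match is None or match.group(1):
            continue
        # Frames without a [module] suffix belong to the core kernel.
        frames.append((match.group(2), match.group(3) or "kernel"))
    return frames


def module_chain(frames):
    """Collapse consecutive frames from the same module into one entry."""
    chain = []
    for _, module in frames:
        if not chain or chain[-1] != module:
            chain.append(module)
    return chain


frames = reliable_frames(TRACE)
chain = module_chain(frames)
print("innermost to outermost:", " <- ".join(chain))
```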
            

            Maloo report: https://maloo.whamcloud.com/test_sets/f9d6f946-18ab-11e3-aa54-52540035b04c

            The same failure also occurred on previous Lustre b2_4 builds:
            https://maloo.whamcloud.com/test_sets/0ce085aa-169c-11e3-aa2a-52540035b04c
            https://maloo.whamcloud.com/test_sets/92b17690-16b4-11e3-8c83-52540035b04c
            https://maloo.whamcloud.com/test_sets/f1befe08-1657-11e3-aa2a-52540035b04c
            https://maloo.whamcloud.com/test_sets/ecb6f352-1409-11e3-980d-52540035b04c
            https://maloo.whamcloud.com/test_sets/e10c9e46-13f3-11e3-9e61-52540035b04c


            utopiabound Nathaniel Clark added a comment -

            OST console log:

            21:57:00:Lustre: DEBUG MARKER: == obdfilter-survey test 1a: Object Storage Targets survey =========================================== 21:56:49 (1349758609)
            21:57:00:Lustre: DEBUG MARKER: lctl dl | grep obdfilter
            21:57:00:Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep tcp | cut -f 1 -d '@'
            22:48:59:hrtimer: interrupt took 55369 ns
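
The console timestamps in the OST log show how long the node stayed silent after the survey started. A minimal sketch, assuming every console line begins with an HH:MM:SS prefix and the log does not cross midnight (the helper names are made up for illustration), finds the largest gap between consecutive lines:

```python
from datetime import datetime

# Last three lines of the OST console log quoted above.
CONSOLE_LINES = [
    "21:57:00:Lustre: DEBUG MARKER: lctl dl | grep obdfilter",
    "21:57:00:Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep tcp | cut -f 1 -d '@'",
    "22:48:59:hrtimer: interrupt took 55369 ns",
]


def stamp(line):
    """Parse the HH:MM:SS prefix of a console line (the date is not logged)."""
    return datetime.strptime(line[:8], "%H:%M:%S")


def largest_gap(lines):
    """Return the longest interval between two consecutive console lines."""
    stamps = [stamp(line) for line in lines]
    return max(later - earlier for earlier, later in zip(stamps, stamps[1:]))


gap = largest_gap(CONSOLE_LINES)
print("longest silence on the console:", gap)
```

For these lines the gap is 51 minutes 59 seconds, which fits the reported symptom that the test stopped responding and timed out.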

            People

              wc-triage WC Triage
              maloo Maloo