[LU-2124] Test failure on test suite obdfilter-survey, subtest test_1a Created: 09/Oct/12 Updated: 04/Nov/13 Resolved: 04/Nov/13 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0, Lustre 2.4.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | zfs | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 5125 | ||||||||
| Description |
|
This issue was created by maloo for Li Wei <liwei@whamcloud.com>
This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/584999d6-1207-11e2-a663-52540035b04c.
The sub-test test_1a failed with the following error:
Info required for matching: obdfilter-survey 1a |
| Comments |
| Comment by Nathaniel Clark [ 23/Jul/13 ] |
|
OST console log:
21:57:00:Lustre: DEBUG MARKER: == obdfilter-survey test 1a: Object Storage Targets survey =========================================== 21:56:49 (1349758609)
21:57:00:Lustre: DEBUG MARKER: lctl dl | grep obdfilter
21:57:00:Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep tcp | cut -f 1 -d '@'
22:48:59:hrtimer: interrupt took 55369 ns |
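For readers unfamiliar with the pipeline in the last DEBUG MARKER line, here is a minimal Python sketch of what `lctl list_nids | grep tcp | cut -f 1 -d '@'` computes: the address part of every NID on the `tcp` network. The function name `nid_hosts` and the sample input are hypothetical; the sample reuses the OSS address that appears later in this ticket and is not captured `lctl` output.

```python
def nid_hosts(list_nids_output: str, nettype: str = "tcp") -> list[str]:
    """Mimic `grep <nettype> | cut -f 1 -d '@'` on NID lines (addr@network)."""
    hosts = []
    for line in list_nids_output.splitlines():
        line = line.strip()
        if line and nettype in line:             # grep: plain substring match
            hosts.append(line.split("@", 1)[0])  # cut: keep the field before '@'
    return hosts


# Hypothetical sample input, one NID per line.
sample = "10.10.4.119@tcp\n192.168.1.1@o2ib\n"
print(nid_hosts(sample))
```

Like `grep tcp`, the substring match also accepts networks such as `tcp1`, so the sketch keeps that behaviour rather than tightening it.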
| Comment by Jian Yu [ 09/Sep/13 ] |
|
Lustre build: http://build.whamcloud.com/job/lustre-b2_4/45/ (2.4.1 RC2)
obdfilter-survey test 1a hung as follows:
== obdfilter-survey test 1a: Object Storage Targets survey == 23:44:44 (1378622684)
CMD: client-24vm4 lctl dl | grep obdfilter
CMD: client-24vm4 /usr/sbin/lctl list_nids | grep tcp | cut -f 1 -d '@'
+ NETTYPE=tcp thrlo=8 nobjhi=1 thrhi=16 size=1024 case=disk rslt_loc=/tmp targets="10.10.4.119:lustre-OST0000 10.10.4.119:lustre-OST0001 10.10.4.119:lustre-OST0002 10.10.4.119:lustre-OST0003 10.10.4.119:lustre-OST0004 10.10.4.119:lustre-OST0005 10.10.4.119:lustre-OST0006" /usr/bin/obdfilter-survey
Warning: Permanently added '10.10.4.119' (RSA) to the list of known hosts.
Sat Sep 7 23:44:51 PDT 2013 Obdfilter-survey for case=disk from client-24vm2.lab.whamcloud.com
Dmesg on OSS node client-24vm4 showed that:
lctl D 0000000000000000 0 19552 19496 0x00000080
ffff88001bf65748 0000000000000086 ffff8800ffffffff 0000126bad99a78e
ffff880061356070 ffff8800618efec0 00000000003e8684 ffffffffadd3ec96
ffff88001fefdaf8 ffff88001bf65fd8 000000000000fb88 ffff88001fefdaf8
Call Trace:
[<ffffffff810a2431>] ? ktime_get_ts+0xb1/0xf0
[<ffffffff8150ed03>] io_schedule+0x73/0xc0
[<ffffffffa03e6d4c>] cv_wait_common+0x8c/0x100 [spl]
[<ffffffff81096da0>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa03e6dd8>] __cv_wait_io+0x18/0x20 [spl]
[<ffffffffa052939b>] zio_wait+0xfb/0x190 [zfs]
[<ffffffffa049f07d>] dmu_buf_hold_array_by_dnode+0x1dd/0x560 [zfs]
[<ffffffffa049ff88>] dmu_buf_hold_array_by_bonus+0x68/0x90 [zfs]
[<ffffffffa0dc1b33>] osd_bufs_get+0x493/0xa30 [osd_zfs]
[<ffffffffa0e609cb>] ofd_preprw_read+0x14b/0x7f0 [ofd]
[<ffffffffa0e617ea>] ofd_preprw+0x77a/0x1480 [ofd]
[<ffffffffa05a7473>] echo_client_iocontrol+0x2003/0x3b40 [obdecho]
[<ffffffff81281826>] ? vsnprintf+0x336/0x5e0
[<ffffffffa071049f>] class_handle_ioctl+0x12ff/0x1ec0 [obdclass]
[<ffffffffa06f82ab>] obd_class_ioctl+0x4b/0x190 [obdclass]
[<ffffffff81195352>] vfs_ioctl+0x22/0xa0
[<ffffffff8103c7d8>] ? pvclock_clocksource_read+0x58/0xd0
[<ffffffff811954f4>] do_vfs_ioctl+0x84/0x580
[<ffffffff81195a71>] sys_ioctl+0x81/0xa0
[<ffffffff810dc685>] ? __audit_syscall_exit+0x265/0x290
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Maloo report: https://maloo.whamcloud.com/test_sets/f9d6f946-18ab-11e3-aa54-52540035b04c
The same failure also occurred on previous Lustre b2_4 builds: |
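The dmesg output above shows the `lctl` task in state D, sleeping in `io_schedule` under `zio_wait`, reached from `osd_bufs_get` and `ofd_preprw_read` via the echo client ioctl. A rough Python sketch for pulling the function and module names out of such a trace follows. The regular expression, `parse_frames`, and `module_path` are illustrative helpers and not part of any Lustre or kernel tool; frames without a `[module]` suffix are labelled `kernel` by assumption, and frames marked `?` are treated as unreliable. The sample is an excerpt of the trace in this comment.

```python
import re

# One frame looks like: [<addr>] [? ]func+0xoff/0xsize [module]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(trace: str):
    """Return (function, module, reliable) tuples in trace order."""
    return [
        (m.group("func"), m.group("module") or "kernel",
         m.group("unreliable") is None)
        for m in FRAME_RE.finditer(trace)
    ]


def module_path(frames):
    """Ordered, de-duplicated modules of the reliable frames."""
    seen = []
    for _func, module, reliable in frames:
        if reliable and module not in seen:
            seen.append(module)
    return seen


# Excerpt copied from the call trace in this comment.
sample = """
[<ffffffff810a2431>] ? ktime_get_ts+0xb1/0xf0
[<ffffffff8150ed03>] io_schedule+0x73/0xc0
[<ffffffffa03e6d4c>] cv_wait_common+0x8c/0x100 [spl]
[<ffffffffa052939b>] zio_wait+0xfb/0x190 [zfs]
[<ffffffffa0dc1b33>] osd_bufs_get+0x493/0xa30 [osd_zfs]
[<ffffffffa0e609cb>] ofd_preprw_read+0x14b/0x7f0 [ofd]
[<ffffffffa05a7473>] echo_client_iocontrol+0x2003/0x3b40 [obdecho]
"""

frames = parse_frames(sample)
print(module_path(frames))
```

Reading the module list from innermost to outermost gives a quick view of which layer the thread is blocked in, here the ZFS I/O wait beneath the osd_zfs and ofd read path.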
| Comment by Jian Yu [ 01/Nov/13 ] |
|
Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/46/
The same failure occurred:
We'll see whether the timeout failure disappears after TEI-790 is resolved. |
| Comment by Jian Yu [ 04/Nov/13 ] |
|
Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/47/
With OSTCOUNT=2, obdfilter-survey test 1a passed:
Let's close this ticket. |
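To illustrate how the OST count changes the survey invocation, the sketch below builds the `targets` value in the `host:fsname-OSTxxxx` form seen in the 2.4.1 RC2 run earlier in this ticket. The helper name `survey_targets` is hypothetical, and the four-digit hexadecimal index is an assumption based on the usual Lustre target naming; the ticket itself only shows indices 0000 through 0006.

```python
def survey_targets(host: str, fsname: str, ost_count: int) -> str:
    """Build a space-separated targets string, one entry per OST.

    Assumption: OST indices are zero-based and printed as 4 hex digits.
    """
    return " ".join(
        f"{host}:{fsname}-OST{index:04x}" for index in range(ost_count)
    )


# Seven OSTs reproduces the targets string from the hung run;
# two OSTs corresponds to the passing run with OSTCOUNT=2.
print(survey_targets("10.10.4.119", "lustre", 7))
print(survey_targets("10.10.4.119", "lustre", 2))
```

With two OSTs the survey touches only `lustre-OST0000` and `lustre-OST0001`, which is a much smaller workload than the seven-target run that hung.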