[LU-7741] sanity-benchmark test_bonnie:test failed to respond and timed out
Created: 04/Feb/16  Updated: 21/Feb/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0, Lustre 2.9.0, Lustre 2.10.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None
Environment:
EL6.7 Server/EL6.7 Client - ZFS
Master, build# 3314

Severity: 3

Description

This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/72d94f9c-cb47-11e5-a59a-5254006e85c2.

The sub-test test_bonnie failed with the following error:

test failed to respond and timed out

test log:

== sanity-benchmark test bonnie: bonnie++ == 01:44:57 (1454377497)
min OST has 1963520kB available, using 3844384kB file size
debug=0
running as uid/gid/euid/egid 500/500/500/500, groups:
 [touch] [/mnt/lustre/d0_runas_test/f32403]

mds dmesg:

Lustre: DEBUG MARKER: lctl --device %lustre-OST0005-osc-MDT0000 deactivate
Lustre: setting import lustre-OST0005_UUID INACTIVE by administrator request
Lustre: Skipped 1 previous similar message
Lustre: DEBUG MARKER: lctl get_param -n lov.*md*.target_obd
Lustre: DEBUG MARKER: lctl get_param -n lov.*md*.target_obd
Lustre: DEBUG MARKER: lctl get_param -n lov.*md*.target_obd
Lustre: DEBUG MARKER: lctl get_param -n lov.*md*.target_obd
Lustre: DEBUG MARKER: lctl get_param -n lov.*md*.target_obd
Lustre: DEBUG MARKER: lctl get_param -n lov.*md*.target_obd
Lustre: DEBUG MARKER: lctl get_param -n lov.*md*.target_obd
Lustre: DEBUG MARKER: lctl --device %lustre-OST0005-osc-MDT0000 activate
LustreError: 167-0: lustre-OST0005-osc-MDT0000: This client was evicted by lustre-OST0005; in progress operations using this service will fail.
LustreError: Skipped 1 previous similar message
Lustre: DEBUG MARKER: lctl --device %lustre-OST0006-osc-MDT0000 deactivate
Lustre: DEBUG MARKER: lctl get_param -n lov.*md*.target_obd
Lustre: DEBUG MARKER: lctl get_param -n lov.*md*.target_obd
Lustre: DEBUG MARKER: lctl get_param -n lov.*md*.target_obd
Lustre: DEBUG MARKER: lctl get_param -n lov.*md*.target_obd
Lustre: DEBUG MARKER: lctl get_param -n lov.*md*.target_obd
Lustre: DEBUG MARKER: lctl get_param -n lov.*md*.target_obd
Lustre: DEBUG MARKER: lctl get_param -n lov.*md*.target_obd
Lustre: DEBUG MARKER: lctl --device %lustre-OST0006-osc-MDT0000 activate
Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 	    fail_val=0 2>/dev/null || true
Lustre: DEBUG MARKER: rc=0;
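
To make the sequence in the MDS dmesg above easier to follow across runs, the following is a minimal Python sketch that reduces such a log to an ordered list of OSC deactivate, activate, and eviction events. The function name `summarize_osc_events` and both regular expressions are illustrative assumptions based only on the line formats quoted in this ticket; they are not part of the Lustre test framework.

```python
import re

# Assumed format, taken from the debug markers quoted in this ticket:
#   Lustre: DEBUG MARKER: lctl --device %<osc-device> deactivate|activate
MARKER_RE = re.compile(
    r"DEBUG MARKER: lctl --device %(?P<device>\S+) (?P<action>deactivate|activate)$"
)

# Assumed format, taken from the LustreError 167-0 line quoted in this ticket.
EVICT_RE = re.compile(
    r"LustreError: 167-0: (?P<device>\S+): This client was evicted by (?P<target>\S+);"
)


def summarize_osc_events(lines):
    """Return (event, device) tuples in log order.

    event is one of "deactivate", "activate", or "evicted".
    Lines that match neither pattern (for example get_param markers) are skipped.
    """
    events = []
    for line in lines:
        line = line.strip()
        marker = MARKER_RE.search(line)
        if marker:
            events.append((marker.group("action"), marker.group("device")))
            continue
        evicted = EVICT_RE.search(line)
        if evicted:
            events.append(("evicted", evicted.group("device")))
    return events
```

Applied to the excerpt above, this would list the OST0005 import as deactivated, activated, and then evicted, followed by a deactivate/activate pair for OST0006 with no eviction logged.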


Comments
Comment by Saurabh Tandan (Inactive) [ 08/Sep/16 ]

Another instance on master build# 3431:
https://testing.hpdd.intel.com/test_sets/89c04b0c-7317-11e6-b08e-5254006e85c2

 Out of memory: Kill process 229 (systemd-udevd) score 9 or sacrifice child
21:37:04:[   20.172084] Killed process 229 (systemd-udevd) total-vm:35160kB, anon-rss:420kB, file-rss:928kB
21:37:04:[   20.473792] sd 0:0:3:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
21:37:04:[   20.484949] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
21:37:04:[  242.959024] INFO: task systemd-udevd:229 blocked for more than 120 seconds.
21:37:04:[  242.967826] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:37:04:[  242.977614] systemd-udevd   D ffffffffa02f3540     0   229    227 0x00100004
21:37:04:[  242.986556]  ffff880031bd7d10 0000000000000082 ffff880031a9d080 ffff880031bd7fd8
21:37:04:[  242.995924]  ffff880031bd7fd8 ffff880031bd7fd8 ffff880031a9d080 ffffffffffffffff
21:37:04:[  243.005284]  0000000000000000 0000000000000001 ffff880031bd7ef0 ffffffffa02f3540
21:37:04:[  243.014657] Call Trace:
21:37:04:[  243.018435]  [<ffffffff8163be89>] schedule+0x29/0x70
21:37:04:[  243.025037]  [<ffffffff810ad155>] async_synchronize_cookie_domain+0x85/0x150
21:37:04:[  243.033976]  [<ffffffff810a6b20>] ? wake_up_atomic_t+0x30/0x30
21:37:04:[  243.041541]  [<ffffffff810ad257>] async_synchronize_full+0x17/0x20
21:37:04:[  243.049491]  [<ffffffff810ed769>] load_module+0x13a9/0x1b50
21:37:04:[  243.056748]  [<ffffffff813173a0>] ? ddebug_proc_write+0xf0/0xf0
21:37:04:[  243.064362]  [<ffffffff810e99a3>] ? copy_module_from_fd.isra.42+0x53/0x150
21:37:04:[  243.073029]  [<ffffffff810ee0c6>] SyS_finit_module+0xa6/0xd0
21:37:04:[  243.080305]  [<ffffffff81646e89>] system_call_fastpath+0x16/0x1b
21:37:04:         Mountin[  310.823319] sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
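
For comparing this instance with other runs, here is a minimal Python sketch that pulls hung-task reports and their call-trace symbols out of a console log like the one above. The function name `parse_hung_tasks` and the regular expressions are assumptions derived from the line formats in this excerpt only. Frames marked with `?` are skipped because the kernel prints that marker for stack entries it could not verify as part of the call chain.

```python
import re

# Assumed format: "INFO: task <comm>:<pid> blocked for more than <N> seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

# Assumed format: " [<address>] [? ]symbol+0xoff/0xsize"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (?P<unreliable>\? )?(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_hung_tasks(lines):
    """Return one dict per hung-task report with its verified stack symbols."""
    reports = []
    current = None      # report currently being filled in, if any
    in_trace = False    # True once "Call Trace:" has been seen for it
    for line in lines:
        hung = HUNG_RE.search(line)
        if hung:
            current = {
                "comm": hung.group("comm"),
                "pid": int(hung.group("pid")),
                "timeout_secs": int(hung.group("secs")),
                "frames": [],
            }
            reports.append(current)
            in_trace = False
            continue
        if current is None:
            continue
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            frame = FRAME_RE.search(line)
            if frame is None:
                # First non-frame line ends the trace for this report.
                current = None
                in_trace = False
            elif not frame.group("unreliable"):
                current["frames"].append(frame.group("symbol"))
    return reports
```

On the log in this comment, the single report would be for `systemd-udevd` (pid 229) with a 120 second timeout, and its verified frames run from `schedule` through `load_module` to `system_call_fastpath`.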