[LU-7872] conf-sanity: test_50i 'test failed to respond and timed out' Created: 14/Mar/16  Updated: 22/Oct/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11556 conf-sanity test 32b crashes on MDT u... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Canary patch failed during 'review-dne-part-1'.

This issue was created by maloo for Richard Henwood <richard.henwood@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/5e327dde-e86b-11e5-be76-5254006e85c2.

The test log looks normal until the striped directory creation fails with 'No such device':

...
CMD: trevis-45vm1.trevis.hpdd.intel.com /usr/sbin/lctl get_param -n mdc.lustre-MDT0001-mdc-[!M]*.active
CMD: trevis-45vm1.trevis.hpdd.intel.com /usr/sbin/lctl get_param -n mdc.lustre-MDT0001-mdc-[!M]*.active
Updated after 7s: wanted '0' got '0'
error on LL_IOC_LMV_SETSTRIPE '/mnt/lustre/d50i.conf-sanity/2' (3): No such device
error: mkdir: create stripe dir '/mnt/lustre/d50i.conf-sanity/2' failed
umount lustre on /mnt/lustre.....
CMD: trevis-45vm1.trevis.hpdd.intel.com grep -c /mnt/lustre' ' /proc/mounts
Stopping client trevis-45vm1.trevis.hpdd.intel.com /mnt/lustre (opts:)
CMD: trevis-45vm1.trevis.hpdd.intel.com lsof -t /mnt/lustre
CMD: trevis-45vm1.trevis.hpdd.intel.com umount  /mnt/lustre 2>&1
stop mds service on trevis-45vm7
CMD: trevis-45vm7 grep -c /mnt/mds1' ' /proc/mounts
...


 Comments   
Comment by James Nunez (Inactive) [ 14/Mar/16 ]

The MDS1 and MDS3 console logs show two LBUGs on the device refcount assertion, followed by a kernel panic in umount:

13:37:35:LustreError: 18333:0:(osp_dev.c:1259:osp_device_free()) } header@ffff8800451c2b40
13:37:35:
13:37:35:LustreError: 18333:0:(osp_dev.c:1259:osp_device_free()) header@ffff880040837b00[0x1, 1, [0x200000001:0x1017:0x0] hash exist]{
13:37:35:
13:37:35:LustreError: 18333:0:(osp_dev.c:1259:osp_device_free()) ....local_storage@ffff880040837b50
13:37:35:
13:37:35:LustreError: 18333:0:(osp_dev.c:1259:osp_device_free()) ....osd-ldiskfs@ffff8800451e5480osd-ldiskfs-object@ffff8800451e5480(i:ffff88007bb7f6e0:25001/3959569064)[plain]
13:37:35:
13:37:35:LustreError: 18333:0:(osp_dev.c:1259:osp_device_free()) } header@ffff880040837b00
13:37:35:
13:37:35:LustreError: 18333:0:(lu_object.c:1224:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
13:37:35:LustreError: 18333:0:(lu_object.c:1224:lu_device_fini()) LBUG
13:37:35:Pid: 18333, comm: obd_zombid
13:37:35:
13:37:35:Call Trace:
13:37:35: [<ffffffffa06b6875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
13:37:35: [<ffffffffa06b6e77>] lbug_with_loc+0x47/0xb0 [libcfs]
13:37:35: [<ffffffffa0fadd38>] lu_device_fini+0xb8/0xc0 [obdclass]
13:37:35: [<ffffffffa0fb36ce>] dt_device_fini+0xe/0x10 [obdclass]
13:37:35: [<ffffffffa185f196>] osp_device_free+0x96/0x180 [osp]
13:37:35: [<ffffffffa0f98a2d>] class_decref+0x3dd/0x4c0 [obdclass]
13:37:35: [<ffffffffa0f84b21>] obd_zombie_impexp_cull+0x611/0x970 [obdclass]
13:37:35: [<ffffffffa0f84ee5>] obd_zombie_impexp_thread+0x65/0x190 [obdclass]
13:37:35: [<ffffffff810672b0>] ? default_wake_function+0x0/0x20
13:37:35: [<ffffffffa0f84e80>] ? obd_zombie_impexp_thread+0x0/0x190 [obdclass]
13:37:35: [<ffffffff810a0fce>] kthread+0x9e/0xc0
13:37:35: [<ffffffff8100c28a>] child_rip+0xa/0x20
13:37:35: [<ffffffff810a0f30>] ? kthread+0x0/0xc0
13:37:35: [<ffffffff8100c280>] ? child_rip+0x0/0x20
13:37:35:
13:37:35:LustreError: 4510:0:(mdt_handler.c:4395:mdt_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: 
13:37:35:LustreError: 4510:0:(mdt_handler.c:4395:mdt_fini()) LBUG
13:37:35:Pid: 4510, comm: umount
13:37:35:
13:37:35:Call Trace:
13:37:35: [<ffffffffa06b6875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
13:37:35: [<ffffffffa06b6e77>] lbug_with_loc+0x47/0xb0 [libcfs]
13:37:35: [<ffffffffa17141ba>] mdt_device_fini+0x121a/0x12e0 [mdt]
13:37:35: [<ffffffffa0f85b1d>] ? class_disconnect_exports+0x17d/0x2f0 [obdclass]
13:37:35: [<ffffffffa0f9e302>] class_cleanup+0x572/0xd20 [obdclass]
13:37:35: [<ffffffffa0f81336>] ? class_name2dev+0x56/0xe0 [obdclass]
13:37:35: [<ffffffffa0fa0616>] class_process_config+0x1b66/0x24c0 [obdclass]
13:37:35: [<ffffffffa06c1cf1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
13:37:35: [<ffffffffa0fa142f>] class_manual_cleanup+0x4bf/0xc90 [obdclass]
13:37:35: [<ffffffffa0f81336>] ? class_name2dev+0x56/0xe0 [obdclass]
13:37:35: [<ffffffffa0fd29ec>] server_put_super+0x8bc/0xcd0 [obdclass]
13:37:35: [<ffffffff811946eb>] generic_shutdown_super+0x5b/0xe0
13:37:35: [<ffffffff811947d6>] kill_anon_super+0x16/0x60
13:37:35: [<ffffffffa0fa4616>] lustre_kill_super+0x36/0x60 [obdclass]
13:37:35: [<ffffffff81194f77>] deactivate_super+0x57/0x80
13:37:35: [<ffffffff811b4f5f>] mntput_no_expire+0xbf/0x110
13:37:35: [<ffffffff811b5aab>] sys_umount+0x7b/0x3a0
13:37:35: [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
13:37:35:
13:37:35:Kernel panic - not syncing: LBUG
13:37:35:Pid: 4510, comm: umount Not tainted 2.6.32-573.18.1.el6_lustre.ge5f28dc.x86_64 #1
13:37:35:Call Trace:
13:37:35: [<ffffffff81539011>] ? panic+0xa7/0x16f
13:37:35: [<ffffffffa06b6ecb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
13:37:35: [<ffffffffa17141ba>] ? mdt_device_fini+0x121a/0x12e0 [mdt]
13:37:35: [<ffffffffa0f85b1d>] ? class_disconnect_exports+0x17d/0x2f0 [obdclass]
13:37:35: [<ffffffffa0f9e302>] ? class_cleanup+0x572/0xd20 [obdclass]
13:37:35: [<ffffffffa0f81336>] ? class_name2dev+0x56/0xe0 [obdclass]
13:37:35: [<ffffffffa0fa0616>] ? class_process_config+0x1b66/0x24c0 [obdclass]
13:37:35: [<ffffffffa06c1cf1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
13:37:35: [<ffffffffa0fa142f>] ? class_manual_cleanup+0x4bf/0xc90 [obdclass]
13:37:35: [<ffffffffa0f81336>] ? class_name2dev+0x56/0xe0 [obdclass]
13:37:35: [<ffffffffa0fd29ec>] ? server_put_super+0x8bc/0xcd0 [obdclass]
13:37:35: [<ffffffff811946eb>] ? generic_shutdown_super+0x5b/0xe0
13:37:35: [<ffffffff811947d6>] ? kill_anon_super+0x16/0x60
13:37:35: [<ffffffffa0fa4616>] ? lustre_kill_super+0x36/0x60 [obdclass]
13:37:35: [<ffffffff81194f77>] ? deactivate_super+0x57/0x80
13:37:35: [<ffffffff811b4f5f>] ? mntput_no_expire+0xbf/0x110
13:37:35: [<ffffffff811b5aab>] ? sys_umount+0x7b/0x3a0
13:37:35: [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
13:37:35:Initializing cgroup subsys cpuset
13:37:35:Initializing cgroup subsys cpu
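
For triage, it can help to pull only the failed ASSERTION lines out of a console log like the one above. The following is a minimal sketch, not part of the test framework: the function name find_assertions and the regular expression are illustrative assumptions based only on the line format visible in this log, and other Lustre versions may format these lines differently.

```python
import re

# Assumed line shape, taken from the console log quoted in this ticket:
# <time>:LustreError: <pid>:<n>:(<file>:<line>:<func>()) ASSERTION( <expr> ) failed: <msg>
ASSERT_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"ASSERTION\( (?P<expr>.*?) \) failed: ?(?P<msg>.*)$"
)


def find_assertions(lines):
    """Return one dict per failed ASSERTION found in the given log lines."""
    hits = []
    for raw in lines:
        match = ASSERT_RE.search(raw.rstrip())
        if match:
            hit = match.groupdict()
            hit["pid"] = int(hit["pid"])
            hit["line"] = int(hit["line"])
            hits.append(hit)
    return hits


# Three lines copied from the console log in this ticket.
sample = [
    "13:37:35:LustreError: 18333:0:(lu_object.c:1224:lu_device_fini()) "
    "ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1",
    "13:37:35:LustreError: 18333:0:(lu_object.c:1224:lu_device_fini()) LBUG",
    "13:37:35:LustreError: 4510:0:(mdt_handler.c:4395:mdt_fini()) "
    "ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: ",
]

for hit in find_assertions(sample):
    print(hit["pid"], hit["file"], hit["line"], hit["func"], hit["msg"] or "(no message)")
```

On the sample lines this reports the two assertion sites (lu_device_fini in pid 18333 and mdt_fini in pid 4510) and skips the bare LBUG line, which makes it easier to see that both failures are on the same ld_ref check.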
Comment by nasf (Inactive) [ 17/Mar/16 ]

Another failure instance on master:
https://testing.hpdd.intel.com/test_sessions/fdcbeeb6-ebb8-11e5-93cc-5254006e85c2