[LU-6739] EL7 mds-survey test_1: mds-survey failed Created: 17/Jun/15  Updated: 23/Jun/15  Resolved: 18/Jun/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None
Environment:

server and client: lustre-master build # 3071 EL7


Issue Links:
Duplicate
duplicates LU-6722 sanity-lfsck test_1a: FAIL: (3) Fail ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/ea948bde-135d-11e5-b4b0-5254006e85c2.

The sub-test test_1 failed with the following error:

mds-survey failed

There is no log from the MDS at all, and this failure blocks all of the following tests from being run. The test log shows:

====> Destroy 4 directories on onyx-42vm3:lustre-MDT0000_ecc
ssh_exchange_identification: Connection closed by remote host
Mon Jun 15 05:04:43 PDT 2015 /usr/bin/mds-survey from onyx-42vm6.onyx.hpdd.intel.com
mdt 1 file  103011 dir    4 thr    4 create 18976.80 [ 11998.67, 23975.59] lookup 397752.68 [ 397752.68, 397752.68] md_getattr 294970.99 [ 294970.99, 294970.99] setxattr 1069.79 [    0.00, 7999.06] destroy             ERROR 
mdt 1 file  103011 dir    4 thr    8 create             ERROR lookup             ERROR md_getattr             ERROR setxattr             ERROR destroy             ERROR 
starting run for config:  test: create  file: 103011 threads: 4  directories: 4
starting run for config:  test: lookup  file: 103011 threads: 4  directories: 4
starting run for config:  test: md_getattr  file: 103011 threads: 4  directories: 4
starting run for config:  test: setxattr  file: 103011 threads: 4  directories: 4
starting run for config:  test: destroy  file: 103011 threads: 4  directories: 4
starting run for config:  test: create  file: 103011 threads: 8  directories: 4
starting run for config:  test: lookup  file: 103011 threads: 8  directories: 4
starting run for config:  test: md_getattr  file: 103011 threads: 8  directories: 4
starting run for config:  test: setxattr  file: 103011 threads: 8  directories: 4
starting run for config:  test: destroy  file: 103011 threads: 8  directories: 4
 mds-survey test_1: @@@@@@ FAIL: mds-survey failed 
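For reference, the run above corresponds to mds-survey sweeping 103011 files over 4 directories at 4 and 8 threads. A minimal reproduction sketch, run directly on the MDS node; the environment variable names assume the lustre-iokit mds-survey defaults and may differ between versions, while the values and script path are taken from the log above:

# Hypothetical reproduction sketch on the MDS node.  Variable names
# assume the lustre-iokit mds-survey defaults; values come from the log.
thrlo=4 thrhi=8 file_count=103011 dir_count=4 \
    tests_str="create lookup md_getattr setxattr destroy" \
    targets=lustre-MDT0000 \
    /usr/bin/mds-survey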


Comments
Comment by Andreas Dilger [ 18/Jun/15 ]

The console logs from before mds-survey ran are under "lustre-provisioning":

12:06:51:[ 1721.271559] WARNING: at lustre-2.7.55/ldiskfs/ext4_jbd2.c:260 __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]()
12:06:51:[ 1721.294943] CPU: 1 PID: 5479 Comm: lctl Tainted: GF          O--------------   3.10.0-229.4.2.el7_lustre.x86_64 #1
12:06:51:[ 1721.297354] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
12:06:52:[ 1721.306555] Call Trace:
12:06:52:[ 1721.308459]  [<ffffffff816050da>] dump_stack+0x19/0x1b
12:06:52:[ 1721.310520]  [<ffffffff8106e34b>] warn_slowpath_common+0x6b/0xb0
12:06:52:[ 1721.312659]  [<ffffffff8106e49a>] warn_slowpath_null+0x1a/0x20
12:06:52:[ 1721.314897]  [<ffffffffa05616b2>] __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]
12:06:52:[ 1721.319529]  [<ffffffffa0584659>] ldiskfs_free_blocks+0x5c9/0xb90 [ldiskfs]
12:06:53:[ 1721.321814]  [<ffffffffa0578f75>] ldiskfs_xattr_release_block+0x275/0x330 [ldiskfs]
12:06:53:[ 1721.324060]  [<ffffffffa057c1ab>] ldiskfs_xattr_delete_inode+0x2bb/0x300 [ldiskfs]
12:06:53:[ 1721.326316]  [<ffffffffa0576ad5>] ldiskfs_evict_inode+0x1b5/0x610 [ldiskfs]
12:06:53:[ 1721.328683]  [<ffffffff811e23d7>] evict+0xa7/0x170
12:06:53:[ 1721.330790]  [<ffffffff811e2c15>] iput+0xf5/0x180
12:06:53:[ 1721.332864]  [<ffffffffa0ba3e73>] osd_object_delete+0x1d3/0x300 [osd_ldiskfs]
12:06:53:[ 1721.335175]  [<ffffffffa07586ad>] lu_object_free.isra.30+0x9d/0x1a0 [obdclass]
12:06:53:[ 1721.337494]  [<ffffffffa0758872>] lu_object_put+0xc2/0x320 [obdclass]
12:06:54:[ 1721.339735]  [<ffffffffa0f2a6d7>] echo_md_destroy_internal+0xe7/0x520 [obdecho]
12:06:54:[ 1721.342007]  [<ffffffffa0f3217a>] echo_md_handler.isra.43+0x191a/0x2250 [obdecho]
12:06:54:[ 1721.348581]  [<ffffffffa0f34766>] echo_client_iocontrol+0x1146/0x1d10 [obdecho]
12:06:54:[ 1721.354898]  [<ffffffffa0724d1c>] class_handle_ioctl+0x1b3c/0x22b0 [obdclass]
12:06:54:[ 1721.358813]  [<ffffffffa070a5e2>] obd_class_ioctl+0xd2/0x170 [obdclass]
12:06:54:[ 1721.360799]  [<ffffffff811da2c5>] do_vfs_ioctl+0x2e5/0x4c0
12:06:54:[ 1721.364564]  [<ffffffff811da541>] SyS_ioctl+0xa1/0xc0
12:06:55:[ 1721.366357]  [<ffffffff81615029>] system_call_fastpath+0x16/0x1b
12:06:55:[ 1721.368204] ---[ end trace aed93badbc88e370 ]---
12:06:55:[ 1721.370058] LDISKFS-fs: ldiskfs_free_blocks:5107: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata
12:06:55:[ 1721.372395] LDISKFS: jbd2_journal_dirty_metadata failed: handle type 5 started at line 240, credits 3/0, errcode -28
12:06:55:[ 1721.385026] LDISKFS-fs error (device dm-0) in ldiskfs_free_blocks:5123: error 28

Looks like the same journal size problem as LU-6722.
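For reference, error 28 is ENOSPC: the handle in __ldiskfs_handle_dirty_metadata ran out of journal credits/space while freeing the xattr blocks during destroy, which fits an undersized MDT journal. A minimal sketch of how the journal could be inspected and enlarged offline, assuming an ldiskfs MDT on /dev/dm-0 (the device named in the error) and standard e2fsprogs; the mount point and the 1024 MB size are illustrative assumptions only:

# Sketch only: inspect and enlarge the MDT journal offline.
# /mnt/mds1 and the 1024 MB size are assumptions; /dev/dm-0 is from the error.
umount /mnt/mds1                             # MDT must be offline
dumpe2fs -h /dev/dm-0 | grep -i journal      # current journal parameters
tune2fs -O ^has_journal /dev/dm-0            # drop the existing journal
e2fsck -f /dev/dm-0                          # clean up before recreating it
tune2fs -j -J size=1024 /dev/dm-0            # recreate it at 1024 MB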
