
Details

Type: Bug
Resolution: Not a Bug
Priority: Blocker
Fix Version/s: None
Affects Version/s: None
Labels:
- soak
Environment:
lola
build: 2.8.50-6-gf9ca359 ;commit f9ca359284357d145819beb08b316e932f7a3060

Severity:
3
Rank (Obsolete):
9223372036854775807

Description

The error occurred during soak testing of build '20160218' (see: https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160218). DNE is enabled.
MDTs have been formatted using ldiskfs, OSTs using zfs.

Sequence of events:

2016-02-18 18:24:30,824:fsmgmt.fsmgmt:INFO executing cmd pm -h powerman -c lola-5 (restart of OSS)

The boot process hung with several errors (see line 25015 in console-lola-5.log, after timestamp 'Feb 18, 18:20:01'):

  25105 WARNING: Pool 'soaked-ost11' has encountered an uncorrectable I/O failure and has been suspended.
  25106 
  25107 INFO: task zpool:5003 blocked for more than 120 seconds.
  25108       Tainted: P           ---------------    2.6.32-504.30.3.el6_lustre.gf9ca359.x86_64 #1
  25109 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  25110 zpool         D 0000000000000011     0  5003   4993 0x00000000
  25111  ffff880830f7bbe8 0000000000000086 0000000000000000 ffffffff81064a6e
  25112  ffff880830f7bba8 0000000000000019 0000000d6e7b4a08 0000000000000001
  25113  ffff880830f7bb68 00000000fffc4649 ffff8808317c5068 ffff880830f7bfd8
  25114 Call Trace:
  25115  [<ffffffff81064a6e>] ? try_to_wake_up+0x24e/0x3e0
  25116  [<ffffffffa02e178d>] cv_wait_common+0x11d/0x130 [spl]
  25117  [<ffffffff8109ec20>] ? autoremove_wake_function+0x0/0x40
  25118  [<ffffffffa02e17f5>] __cv_wait+0x15/0x20 [spl]
  25119  [<ffffffffa039884b>] txg_wait_synced+0x8b/0xd0 [zfs]
  25120  [<ffffffffa039038c>] spa_config_update+0xcc/0x120 [zfs]
  25121  [<ffffffffa038de8a>] spa_import+0x56a/0x730 [zfs]
  25122  [<ffffffffa02fe454>] ? nvlist_lookup_common+0x84/0xd0 [znvpair]
  25123  [<ffffffffa03c0134>] zfs_ioc_pool_import+0xe4/0x120 [zfs]
  25124  [<ffffffffa03c2955>] zfsdev_ioctl+0x495/0x4d0 [zfs]
  25125  [<ffffffff811a3ff2>] vfs_ioctl+0x22/0xa0
  25126  [<ffffffff811a4194>] do_vfs_ioctl+0x84/0x580
  25127  [<ffffffff81190101>] ? __fput+0x1a1/0x210
  25128  [<ffffffff811a4711>] sys_ioctl+0x81/0xa0
  25129  [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b

After power-cycling the node, the zpool soaked-ost3 fails to mount with the following error:

LustreError: 11505:0:(llog_obd.c:209:llog_setup()) MGC192.168.1.108@o2ib10: ctxt 0 lop_setup=ffffffffa06da310 failed: rc = -5
LustreError: 11505:0:(obd_mount_server.c:308:server_mgc_set_fs()) can't set_fs -5
LustreError: 11505:0:(obd_mount_server.c:1798:server_fill_super()) Unable to start targets: -5
LustreError: 11505:0:(obd_mount_server.c:1512:server_put_super()) no obd soaked-OST0003
LustreError: 11505:0:(obd_mount_server.c:140:server_deregister_mount()) soaked-OST0003 not registered

The MGS is available and the IB fabric is operational.

Trying to mount zpool soaked-ost7 leads to a kernel panic:

LustreError: 11938:0:(obd_mount_server.c:140:server_deregister_mount()) soaked-OST0007 not registered
VERIFY3(0 == dmu_buf_hold_array(os, object, offset, size, 0, ((char *)__func__), &numbufs, &dbp)) failed (0 == 5)
PANIC at dmu.c:819:dmu_write()
Showing stack for process 9182
Pid: 9182, comm: txg_sync Tainted: P           ---------------    2.6.32-504.30.3.el6_lustre.gf9ca359.x86_64 #1
Call Trace:
 [<ffffffffa02df7cd>] ? spl_dumpstack+0x3d/0x40 [spl]
 [<ffffffffa02df9c2>] ? spl_panic+0xc2/0xe0 [spl]
 [<ffffffffa0349c51>] ? dmu_buf_hold_array_by_dnode+0x231/0x560 [zfs]
 [<ffffffffa035a8b4>] ? dnode_rele_and_unlock+0x64/0xb0 [zfs]
 [<ffffffffa035a943>] ? dnode_rele+0x43/0x50 [zfs]
 [<ffffffffa034a79b>] ? dmu_write+0x19b/0x1a0 [zfs]
 [<ffffffffa0342af2>] ? dmu_buf_will_dirty+0xb2/0x100 [zfs]
 [<ffffffffa0397421>] ? space_map_write+0x361/0x5f0 [zfs]
 [<ffffffffa037b01b>] ? metaslab_sync+0x11b/0x760 [zfs]
 [<ffffffffa0373cf4>] ? dsl_scan_sync+0x54/0xb80 [zfs]
 [<ffffffff8152b83e>] ? mutex_lock+0x1e/0x50
 [<ffffffffa039be3f>] ? vdev_sync+0x6f/0x140 [zfs]
 [<ffffffffa03839bb>] ? spa_sync+0x4bb/0xb90 [zfs]
 [<ffffffff81057849>] ? __wake_up_common+0x59/0x90
 [<ffffffff8105bd83>] ? __wake_up+0x53/0x70
 [<ffffffff81014a29>] ? read_tsc+0x9/0x20
 [<ffffffffa0399079>] ? txg_sync_thread+0x389/0x5f0 [zfs]
 [<ffffffffa0398cf0>] ? txg_sync_thread+0x0/0x5f0 [zfs]
 [<ffffffffa0398cf0>] ? txg_sync_thread+0x0/0x5f0 [zfs]
 [<ffffffffa02dcfb8>] ? thread_generic_wrapper+0x68/0x80 [spl]
 [<ffffffffa02dcf50>] ? thread_generic_wrapper+0x0/0x80 [spl]
 [<ffffffff8109e78e>] ? kthread+0x9e/0xc0
 [<ffffffff8100c28a>] ? child_rip+0xa/0x20
 [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
 [<ffffffff8100c280>] ? child_rip+0x0/0x20
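
For triage, the return codes in the two failures above can be decoded mechanically. The following is a minimal sketch, not part of the original report: it assumes that the `-5` in the Lustre messages is a negated Linux errno and that the `5` in the VERIFY3 message is a positive errno returned by `dmu_buf_hold_array`. The helper name `decode` and the regular expressions are hypothetical and only cover the message shapes quoted in this ticket.

```python
import errno
import re

# Sample lines copied verbatim from the log excerpts in this ticket.
LINES = [
    "LustreError: 11505:0:(llog_obd.c:209:llog_setup()) MGC192.168.1.108@o2ib10: ctxt 0 lop_setup=ffffffffa06da310 failed: rc = -5",
    "LustreError: 11505:0:(obd_mount_server.c:1798:server_fill_super()) Unable to start targets: -5",
    "VERIFY3(0 == dmu_buf_hold_array(os, object, offset, size, 0, ((char *)__func__), &numbufs, &dbp)) failed (0 == 5)",
]

# Hypothetical patterns: each captures the numeric error value without its sign.
RC_PATTERNS = [
    re.compile(r"rc = -(\d+)"),            # Lustre "rc = -N"
    re.compile(r": -(\d+)$"),              # Lustre message ending in ": -N"
    re.compile(r"failed \(0 == (\d+)\)"),  # SPL VERIFY3 "(0 == N)"
]


def decode(line):
    """Return (code, symbolic errno name) for a log line, or None if no code is found."""
    for pattern in RC_PATTERNS:
        match = pattern.search(line)
        if match:
            code = int(match.group(1))
            return code, errno.errorcode.get(code, "UNKNOWN")
    return None


if __name__ == "__main__":
    for line in LINES:
        print(decode(line), "<-", line[:60])
```

Under these assumptions, all three messages decode to errno 5 (EIO), which would be consistent with the "uncorrectable I/O failure" reported for pool 'soaked-ost11' during boot.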

Both OSTs were mounted and operational before the restart, and both errors can be reproduced consistently.

The messages and console log files of lola-5 are attached.

Attachments

console-lola-5.log.bz2
19/Feb/16 3:44 PM
246 kB
Frank Heckes
messages-lola-5.log.bz2
19/Feb/16 3:44 PM
200 kB
Frank Heckes

Issue Links

is related to

LU-7798 ll_prep_inode()) ASSERTION( fid_is_sane(&md.body->mbo_fid1) ) failed:

Resolved

LU-7585 Implement OI Scrub for ZFS

Resolved

LU-7134 Ensure ZFS hostid protection if servicenode/failover options given to mkfs.lustre

Resolved

Can't mount zpools after OSS restart
