Oct 7 20:31:25 oss74 kernel: [2458757.006251] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) Oct 7 20:31:25 oss74 kernel: [2458757.014978] sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Oct 7 20:31:25 oss74 kernel: [2458757.022822] sd 0:0:2:0: [sdc] Sense Key : Illegal Request [current] Oct 7 20:31:25 oss74 kernel: [2458757.029848] Info fld=0x29000f00 Oct 7 20:31:25 oss74 kernel: [2458757.033414] sd 0:0:2:0: [sdc] Add. Sense: Logical block address out of range Oct 7 20:31:25 oss74 kernel: [2458757.041265] sd 0:0:2:0: [sdc] CDB: Write(10): 2a 00 29 00 0f 00 00 00 08 00 Oct 7 20:31:25 oss74 kernel: [2458757.050476] md/raid:md14: Disk failure on sdc, disabling device. Oct 7 20:31:25 oss74 kernel: [2458757.050478] md/raid:md14: Operation continuing on 9 devices. Oct 7 20:31:25 oss74 kernel: [2458757.337553] md: unbind Oct 7 20:31:25 oss74 kernel: [2458757.348410] md: export_rdev(sdaf) Oct 7 20:31:26 oss74 kernel: [2458757.400096] md: bind Oct 7 20:31:26 oss74 kernel: [2458757.425235] md: recovery of RAID array md14 Oct 7 20:31:26 oss74 kernel: [2458757.429850] md: minimum _guaranteed_ speed: 2000 KB/sec/disk. Oct 7 20:31:26 oss74 kernel: [2458757.436618] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery. Oct 7 20:31:26 oss74 kernel: [2458757.446961] md: using 128k window, over a total of 2930265408k. 
Oct 7 20:31:26 oss74 kernel: [2458757.453400] ------------[ cut here ]------------ Oct 7 20:31:26 oss74 kernel: [2458757.458497] WARNING: at drivers/md/raid5.c:3628 handle_stripe+0x2969/0x2980 [raid456]() (Not tainted) Oct 7 20:31:26 oss74 kernel: [2458757.468469] Hardware name: DCS8200 Oct 7 20:31:26 oss74 kernel: [2458757.472394] Modules linked in: ofd(U) lfsck(U) ost(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) lquota(U) ldiskfs(U) jbd2 lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic crc32c_intel libcfs(U) nfs lockd fscache auth_rpcgss nfs_acl sunrpc acpi_cpufreq freq_table mperf rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ib_sa(U) ipv6 ib_uverbs(U) ib_umad(U) mlx4_ib(U) ib_mad(U) ib_core(U) mlx4_en(U) mlx4_core(U) iTCO_wdt iTCO_vendor_support dcdbas microcode sb_edac edac_core i2c_i801 shpchp lpc_ich mfd_core ioatdma ses enclosure sg igb dca i2c_algo_bit i2c_core ptp pps_core ext3 jbd mbcache raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 sd_mod crc_t10dif ahci mpt2sas(U) scsi_transport_sas raid_class wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] Oct 7 20:31:26 oss74 kernel: [2458757.569892] Pid: 28229, comm: md14_resync Not tainted 2.6.32-431.17.1.el6_lustre.x86_64 #1 Oct 7 20:31:26 oss74 kernel: [2458757.578870] Call Trace: Oct 7 20:31:26 oss74 kernel: [2458757.581823] [] ? warn_slowpath_common+0x87/0xc0 Oct 7 20:31:26 oss74 kernel: [2458757.588469] [] ? warn_slowpath_null+0x1a/0x20 Oct 7 20:31:26 oss74 kernel: [2458757.595285] [] ?
handle_stripe+0x2969/0x2980 [raid456] Oct 7 20:31:26 oss74 kernel: [2458757.602535] [] ? get_active_stripe+0x5ab/0x830 [raid456] Oct 7 20:31:26 oss74 kernel: [2458757.610280] [] ? default_wake_function+0x0/0x20 Oct 7 20:31:26 oss74 kernel: [2458757.617044] [] ? sync_request+0x102/0x3a0 [raid456] Oct 7 20:31:26 oss74 kernel: [2458757.624090] [] ? __wake_up+0x53/0x70 Oct 7 20:31:26 oss74 kernel: [2458757.629936] [] ? md_do_sync+0x8ac/0xcc0 Oct 7 20:31:26 oss74 kernel: [2458757.636146] [] ? md_thread+0x115/0x150 Oct 7 20:31:26 oss74 kernel: [2458757.642006] [] ? md_thread+0x0/0x150 Oct 7 20:31:26 oss74 kernel: [2458757.647848] [] ? kthread+0x96/0xa0 Oct 7 20:31:26 oss74 kernel: [2458757.653621] [] ? child_rip+0xa/0x20 Oct 7 20:31:26 oss74 kernel: [2458757.659326] [] ? kthread+0x0/0xa0 Oct 7 20:31:26 oss74 kernel: [2458757.664998] [] ? child_rip+0x0/0x20 Oct 7 20:31:26 oss74 kernel: [2458757.670711] ---[ end trace 5cbb014d6eeb09b2 ]--- Oct 7 20:31:26 oss74 kernel: [2458757.675870] ------------[ cut here ]------------ Oct 8 06:51:23 oss74 kernel: [2495968.518762] Pid: 28229, comm: md14_resync Tainted: G W --------------- 2.6.32-431.17.1.el6_lustre.x86_64 #1 Oct 8 06:51:23 oss74 kernel: [2495968.530926] Call Trace: Oct 8 06:51:23 oss74 kernel: [2495968.533837] [] ? warn_slowpath_common+0x87/0xc0 Oct 8 06:51:23 oss74 kernel: [2495968.540634] [] ? warn_slowpath_null+0x1a/0x20 Oct 8 06:51:23 oss74 kernel: [2495968.547230] [] ? handle_stripe+0x2969/0x2980 [raid456] Oct 8 06:51:23 oss74 kernel: [2495968.554523] [] ? get_active_stripe+0x5ab/0x830 [raid456] Oct 8 06:51:23 oss74 kernel: [2495968.562146] [] ? sync_request+0x102/0x3a0 [raid456] Oct 8 06:51:23 oss74 kernel: [2495968.569170] [] ? md_do_sync+0x8ac/0xcc0 Oct 8 06:51:23 oss74 kernel: [2495968.575039] [] ? autoremove_wake_function+0x0/0x40 Oct 8 06:51:23 oss74 kernel: [2495968.582041] [] ? md_thread+0x115/0x150 Oct 8 06:51:23 oss74 kernel: [2495968.588049] [] ? 
md_thread+0x0/0x150 Oct 8 06:51:23 oss74 kernel: [2495968.594084] [] ? kthread+0x96/0xa0 Oct 8 06:51:23 oss74 kernel: [2495968.599603] [] ? child_rip+0xa/0x20 Oct 8 06:51:23 oss74 kernel: [2495968.605277] [] ? kthread+0x0/0xa0 Oct 8 06:51:23 oss74 kernel: [2495968.610824] [] ? child_rip+0x0/0x20 Oct 8 06:51:23 oss74 kernel: [2495968.616782] ---[ end trace 5cbb014d6f132f8c ]--- Oct 8 06:51:23 oss74 kernel: [2495968.656032] md: export_rdev(sdc) Oct 8 06:51:23 oss74 abrt-dump-oops: Reported 8 kernel oopses to Abrt Oct 8 07:56:26 oss74 kernel: [2499873.442457] VFS: Quota for id 820202 referenced but not present. Oct 8 07:56:26 oss74 kernel: [2499873.449262] VFS: Can't read quota structure for id 820202. Oct 8 08:04:29 oss74 kernel: [2500356.210736] VFS: Quota for id 820202 referenced but not present. Oct 8 08:04:29 oss74 kernel: [2500356.217297] VFS: Can't read quota structure for id 820202. Oct 8 08:44:33 oss74 kernel: [2502761.169140] VFS: Quota for id 820202 referenced but not present. Oct 8 08:44:33 oss74 kernel: [2502761.175826] VFS: Can't read quota structure for id 820202. Oct 8 08:46:25 oss74 kernel: [2502873.045535] VFS: Quota for id 821882 referenced but not present. Oct 8 08:46:25 oss74 kernel: [2502873.052011] VFS: Can't read quota structure for id 821882. Oct 8 09:09:51 oss74 kernel: [2504279.563570] VFS: Quota for id 820202 referenced but not present. Oct 8 09:09:51 oss74 kernel: [2504279.570193] VFS: Can't read quota structure for id 820202. Oct 8 09:17:31 oss74 kernel: [2504740.361515] VFS: Quota for id 821720 referenced but not present. Oct 8 09:17:31 oss74 kernel: [2504740.368107] VFS: Can't read quota structure for id 821720. Oct 8 09:17:31 oss74 kernel: [2504740.374250] VFS: Quota for id 821720 referenced but not present. Oct 8 09:17:31 oss74 kernel: [2504740.380848] VFS: Can't read quota structure for id 821720. 
Oct 8 09:51:11 oss74 kernel: [2506760.257763] Lustre: scratch-OST0010: haven't heard from client ae4186f0-161a-2ad5-cf1c-b2c2fbd59a44 (at 192.168.90.184@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8803251de400, cur 1412779871 expire 1412779721 last 1412779644 Oct 8 09:51:11 oss74 kernel: [2506760.285233] Lustre: Skipped 5 previous similar messages Oct 8 09:58:17 oss74 kernel: [2507186.419152] Lustre: scratch-OST0011: haven't heard from client fe55a4fa-7991-926d-96cf-4722d9fe9626 (at 192.168.90.202@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8804e5e8f400, cur 1412780297 expire 1412780147 last 1412780070 Oct 8 09:58:17 oss74 kernel: [2507186.446917] Lustre: Skipped 5 previous similar messages Oct 8 09:58:41 oss74 kernel: [2507211.073615] LNetError: 18328:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Oct 8 09:58:41 oss74 kernel: [2507211.086101] LNetError: 18328:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Skipped 1 previous similar message Oct 8 09:58:41 oss74 kernel: [2507211.099148] LNetError: 18328:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Timed out RDMA with 192.168.90.202@o2ib (151): c: 6, oc: 0, rc: 8 Oct 8 09:58:41 oss74 kernel: [2507211.113639] LNetError: 18328:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Skipped 1 previous similar message Oct 8 10:31:31 oss74 kernel: [2509181.163079] Lustre: scratch-OST000f: haven't heard from client 587326fb-47d6-d2f7-b33b-2b6b17479b14 (at 192.168.91.8@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff88054d848400, cur 1412782291 expire 1412782141 last 1412782064 Oct 8 10:31:31 oss74 kernel: [2509181.189502] Lustre: Skipped 5 previous similar messages Oct 8 10:32:47 oss74 kernel: [2509257.192771] Lustre: scratch-OST000d: haven't heard from client scratch-MDT0000-mdtlov_UUID (at 192.168.65.5@o2ib) in 209 seconds. I think it's dead, and I am evicting it. 
exp ffff880346d6a800, cur 1412782367 expire 1412782217 last 1412782158 Oct 8 10:32:47 oss74 kernel: [2509257.219661] Lustre: Skipped 5 previous similar messages Oct 8 10:40:43 oss74 kernel: [2509734.014516] LNetError: 18328:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Oct 8 10:40:43 oss74 kernel: [2509734.026275] LNetError: 18328:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Timed out RDMA with 192.168.67.5@o2ib (155): c: 7, oc: 0, rc: 8 Oct 8 10:41:54 oss74 kernel: [2509804.399866] Lustre: scratch-OST000c: haven't heard from client 75a198bf-8d61-9292-0efb-4d0608cdb07c (at 192.168.67.5@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8805f73c0c00, cur 1412782914 expire 1412782764 last 1412782687 Oct 8 11:13:25 oss74 kernel: [2511696.746304] LNetError: 18328:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Oct 8 11:13:25 oss74 kernel: [2511696.757239] LNetError: 18328:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Timed out RDMA with 192.168.66.102@o2ib (150): c: 6, oc: 0, rc: 8 Oct 8 11:14:42 oss74 kernel: [2511773.127927] Lustre: scratch-OST0011: haven't heard from client 944b9210-53e5-6352-e3e8-64c590ae80f4 (at 192.168.66.102@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff880160fd2c00, cur 1412784882 expire 1412784732 last 1412784655 Oct 8 11:14:42 oss74 kernel: [2511773.151601] Lustre: Skipped 5 previous similar messages Oct 8 11:14:59 oss74 kernel: [2511790.781364] LNetError: 18328:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Oct 8 11:14:59 oss74 kernel: [2511790.793992] LNetError: 18328:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Timed out RDMA with 192.168.90.61@o2ib (155): c: 7, oc: 0, rc: 8 Oct 8 11:40:23 oss74 kernel: [2513314.928845] VFS: Quota for id 820202 referenced but not present. Oct 8 11:40:23 oss74 kernel: [2513314.935479] VFS: Can't read quota structure for id 820202. 
Oct 8 11:40:33 oss74 kernel: [2513325.588567] VFS: Quota for id 820202 referenced but not present. Oct 8 11:40:33 oss74 kernel: [2513325.595612] VFS: Can't read quota structure for id 820202. Oct 8 11:46:46 oss74 kernel: [2513698.479797] VFS: Quota for id 820202 referenced but not present. Oct 8 11:46:46 oss74 kernel: [2513698.486070] VFS: Can't read quota structure for id 820202. Oct 8 11:52:04 oss74 kernel: [2514016.151238] VFS: Quota for id 821152 referenced but not present. Oct 8 11:52:04 oss74 kernel: [2514016.158573] VFS: Can't read quota structure for id 821152. Oct 8 11:55:06 oss74 kernel: [2514198.886824] VFS: Quota for id 820322 referenced but not present. Oct 8 11:55:06 oss74 kernel: [2514198.895195] VFS: Can't read quota structure for id 820322. Oct 8 11:55:25 oss74 kernel: [2514217.055750] VFS: Quota for id 820322 referenced but not present. Oct 8 11:55:25 oss74 kernel: [2514217.063438] VFS: Can't read quota structure for id 820322. Oct 8 11:55:25 oss74 kernel: [2514217.658657] VFS: Quota for id 820322 referenced but not present. Oct 8 11:55:25 oss74 kernel: [2514217.666665] VFS: Can't read quota structure for id 820322. Oct 8 11:55:28 oss74 kernel: [2514220.408139] VFS: Quota for id 820322 referenced but not present. Oct 8 11:55:28 oss74 kernel: [2514220.415227] VFS: Can't read quota structure for id 820322. Oct 8 12:22:05 oss74 kernel: [2515818.253075] Lustre: scratch-OST000d: deleting orphan objects from 0x0:58240019 to 0x0:58240188 Oct 8 12:26:11 oss74 kernel: [2516063.727479] Lustre: scratch-OST0010: haven't heard from client scratch-MDT0000-mdtlov_UUID (at 192.168.65.5@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8801dd478800, cur 1412789171 expire 1412789021 last 1412788944 Oct 8 12:26:11 oss74 kernel: [2516063.753623] Lustre: Skipped 5 previous similar messages Oct 8 13:19:18 oss74 kernel: [2519252.164039] Lustre: scratch-OST000c: Client 09f62232-ae86-0e6f-26a7-c8e1db93a466 (at 192.168.67.185@o2ib) reconnecting Oct 8 13:19:18 oss74 kernel: [2519252.177552] Lustre: Skipped 1 previous similar message Oct 8 13:19:19 oss74 kernel: [2519253.493683] Lustre: scratch-OST000e: Client 09f62232-ae86-0e6f-26a7-c8e1db93a466 (at 192.168.67.185@o2ib) reconnecting Oct 8 13:19:19 oss74 kernel: [2519253.506673] Lustre: Skipped 3 previous similar messages Oct 8 13:19:37 oss74 kernel: [2519271.215718] Lustre: scratch-OST000c: Client 09f62232-ae86-0e6f-26a7-c8e1db93a466 (at 192.168.67.185@o2ib) reconnecting Oct 8 13:19:37 oss74 kernel: [2519271.228765] Lustre: Skipped 5 previous similar messages Oct 8 13:35:20 oss74 kernel: [2520214.284870] Lustre: scratch-OST000d: haven't heard from client 6b971760-3ad2-0583-782b-3104348844c4 (at 192.168.72.147@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8805a4c35000, cur 1412793320 expire 1412793170 last 1412793093 Oct 8 16:46:17 oss74 kernel: [2531675.555699] Lustre: scratch-OST0010: haven't heard from client 8e629f94-0e89-e47c-2455-27939e3c7eac (at 192.168.72.202@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff880710467800, cur 1412804777 expire 1412804627 last 1412804550 Oct 8 16:46:17 oss74 kernel: [2531675.583084] Lustre: Skipped 5 previous similar messages Oct 8 16:48:36 oss74 kernel: [2531815.249671] LNetError: 18328:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds Oct 8 16:48:36 oss74 kernel: [2531815.262630] LNetError: 18328:0:(o2iblnd_cb.c:3012:kiblnd_check_txs_locked()) Skipped 1 previous similar message Oct 8 16:48:36 oss74 kernel: [2531815.275606] LNetError: 18328:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Timed out RDMA with 192.168.73.18@o2ib (160): c: 7, oc: 0, rc: 8 Oct 8 16:48:36 oss74 kernel: [2531815.290174] LNetError: 18328:0:(o2iblnd_cb.c:3075:kiblnd_check_conns()) Skipped 1 previous similar message Oct 8 16:49:45 oss74 kernel: [2531883.906922] LDISKFS-fs error (device md14): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 466corrupted: 15062 blocks free in bitmap, 5184 - in gd Oct 8 16:49:45 oss74 kernel: [2531883.922193] Oct 8 16:49:45 oss74 kernel: [2531883.923976] Aborting journal on device md24. 
Oct 8 16:49:45 oss74 kernel: [2531883.945587] LDISKFS-fs (md14): Remounting filesystem read-only Oct 8 16:49:45 oss74 kernel: [2531883.978721] LDISKFS-fs error (device md14) in ldiskfs_free_blocks: IO failure Oct 8 16:49:45 oss74 kernel: [2531883.986867] LDISKFS-fs error (device md14) in ldiskfs_reserve_inode_write: Journal has aborted Oct 8 16:49:45 oss74 kernel: [2531883.996498] LDISKFS-fs error (device md14) in ldiskfs_reserve_inode_write: Journal has aborted Oct 8 16:49:45 oss74 kernel: [2531884.005800] LDISKFS-fs error (device md14) in ldiskfs_ext_remove_space: Journal has aborted Oct 8 16:49:45 oss74 kernel: [2531884.014731] LDISKFS-fs error (device md14) in ldiskfs_reserve_inode_write: Journal has aborted Oct 8 16:49:45 oss74 kernel: [2531884.024042] LDISKFS-fs error (device md14) in ldiskfs_orphan_del: Journal has aborted Oct 8 16:49:45 oss74 kernel: [2531884.032511] LDISKFS-fs error (device md14) in ldiskfs_reserve_inode_write: Journal has aborted Oct 8 16:49:45 oss74 kernel: [2531884.041780] LDISKFS-fs error (device md14) in ldiskfs_ext_truncate: Journal has aborted Oct 8 16:49:45 oss74 kernel: [2531884.050298] LustreError: 20410:0:(osd_io.c:1173:osd_ldiskfs_write_record()) journal_get_write_access() returned error -30 Oct 8 16:49:45 oss74 kernel: [2531884.061797] LustreError: 20410:0:(osd_handler.c:1056:osd_trans_stop()) Failure in transaction hook: -30 Oct 8 16:49:45 oss74 kernel: [2531884.071928] LustreError: 20410:0:(osd_handler.c:1065:osd_trans_stop()) Failure to stop transaction: -30 Oct 8 16:49:45 oss74 kernel: [2531884.071972] LustreError: 18522:0:(osd_handler.c:863:osd_trans_commit_cb()) transaction @0xffff880710d601c0 commit error: 2 Oct 8 16:51:54 oss74 kernel: [2532013.585182] Lustre: Failing over scratch-OST0010