[LU-12258] sanity test_101d timeout when doing rolling upgrade OSS from 2.10.7 to 2.12.1 with ZFS Created: 01/May/19 Updated: 20/May/19 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sarah Liu | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Attachments: |
|
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
1. Set up a system with 2.10.7: 1 MDS (ZFS), 2 OSTs (ZFS), 1 client

Console log from the OSS during sanity test_101d:

[ 3268.291244] Lustre: DEBUG MARKER: == sanity test 101d: file read with and without read-ahead enabled =================================== 00:21:09 (1556670069)
[ 3280.980154] WARNING: MMP writes to pool 'lustre-ost1' have not succeeded in over 5s; suspending pool
[ 3280.981448] WARNING: Pool 'lustre-ost1' has encountered an uncorrectable I/O failure and has been suspended.
[ 3281.091886] WARNING: MMP writes to pool 'lustre-ost2' have not succeeded in over 5s; suspending pool
[ 3281.092868] WARNING: Pool 'lustre-ost2' has encountered an uncorrectable I/O failure and has been suspended.
[ 3474.405076] LNet: Service thread pid 30189 was inactive for 200.40s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[ 3474.409289] Pid: 30189, comm: ll_ost_io00_000 3.10.0-957.10.1.el7_lustre.x86_64 #1 SMP Mon Apr 22 22:25:47 UTC 2019
[ 3474.410361] Call Trace:
[ 3474.410694] [<ffffffffc07922d5>] cv_wait_common+0x125/0x150 [spl]
[ 3474.411403] [<ffffffffc0792315>] __cv_wait+0x15/0x20 [spl]
[ 3474.412004] [<ffffffffc08d32bf>] txg_wait_synced+0xef/0x140 [zfs]
[ 3474.412817] [<ffffffffc0888c95>] dmu_tx_wait+0x275/0x3c0 [zfs]
[ 3474.413488] [<ffffffffc0888e72>] dmu_tx_assign+0x92/0x490 [zfs]
[ 3474.414163] [<ffffffffc11f6009>] osd_trans_start+0x199/0x440 [osd_zfs]
[ 3474.414896] [<ffffffffc131cc85>] ofd_trans_start+0x75/0xf0 [ofd]
[ 3474.415596] [<ffffffffc1323881>] ofd_commitrw_write+0xa31/0x1d40 [ofd]
[ 3474.416312] [<ffffffffc1327c6c>] ofd_commitrw+0x48c/0x9e0 [ofd]
[ 3474.416962] [<ffffffffc102947c>] tgt_brw_write+0x10cc/0x1cf0 [ptlrpc]
[ 3474.417923] [<ffffffffc10251da>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[ 3474.418699] [<ffffffffc0fca80b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[ 3474.419550] [<ffffffffc0fce13c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[ 3474.420261] [<ffffffffa0cc1c71>] kthread+0xd1/0xe0
[ 3474.420817] [<ffffffffa1375c37>] ret_from_fork_nospec_end+0x0/0x39
[ 3474.421507] [<ffffffffffffffff>] 0xffffffffffffffff |
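As a back-of-the-envelope check of the "have not succeeded in over 5s" message above: ZFS suspends a pool when MMP (multihost) writes fail for zfs_multihost_fail_intervals consecutive intervals of zfs_multihost_interval milliseconds. The parameter names are the real ZFS module parameters; the default values below are an assumption based on the ZFS 0.7.x series that Lustre 2.10/2.12 shipped with, so verify them on the actual OSS.

```python
# Sketch: compute the MMP suspension window from the (assumed) ZFS 0.7.x
# defaults, to show where the "over 5s" threshold in the log comes from.

ZFS_MULTIHOST_INTERVAL_MS = 1000   # zfs_multihost_interval: ms between MMP writes (assumed default)
ZFS_MULTIHOST_FAIL_INTERVALS = 5   # zfs_multihost_fail_intervals (assumed default)

def mmp_suspend_window_s(interval_ms=ZFS_MULTIHOST_INTERVAL_MS,
                         fail_intervals=ZFS_MULTIHOST_FAIL_INTERVALS):
    """Seconds of failed MMP writes after which ZFS suspends the pool."""
    return fail_intervals * interval_ms / 1000.0

if __name__ == "__main__":
    print(mmp_suspend_window_s())  # 5.0, matching the "over 5s" in the log
```

If the OSS was briefly unable to write to the pool during the rolling upgrade (e.g. an import race or slow devices), a 5 s window would explain both pools suspending almost simultaneously, after which txg_wait_synced() blocks forever and the ll_ost_io thread hangs as shown in the stack trace.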
| Comments |
| Comment by James Nunez (Inactive) [ 20/May/19 ] |
|
I see a very similar issue in 2.10.8 RC1 full testing with ZFS in sanity-benchmark test bonnie with logs at https://testing.whamcloud.com/test_sets/0f1e6704-7606-11e9-92d8-52540065bddc. |