Details
Bug
Resolution: Fixed
Major
None
Lustre 2.8.0
Hyperion /SWL 2.7.61 review build 35536 (patch http://review.whamcloud.com/17053 - Revert "LU-4865 zfs: grow block size by write pattern")
3
Description
While running SWL, the OSS hits repeated service thread timeouts:
Nov 5 15:23:57 iws9 kernel: LNet: Service thread pid 23042 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Nov 5 15:23:57 iws9 kernel: Pid: 23042, comm: ll_ost00_004
Nov 5 15:23:57 iws9 kernel:
Nov 5 15:23:57 iws9 kernel: Call Trace:
Nov 5 15:23:57 iws9 kernel: [<ffffffffa067c380>] ? vdev_mirror_child_done+0x0/0x30 [zfs]
Nov 5 15:23:57 iws9 kernel: [<ffffffff815395c3>] io_schedule+0x73/0xc0
Nov 5 15:23:57 iws9 kernel: [<ffffffffa05b2f8f>] cv_wait_common+0xaf/0x130 [spl]
Nov 5 15:23:57 iws9 kernel: [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40
Nov 5 15:23:57 iws9 kernel: [<ffffffffa05b3028>] __cv_wait_io+0x18/0x20 [spl]
Nov 5 15:23:57 iws9 kernel: [<ffffffffa06bd2eb>] zio_wait+0x10b/0x1e0 [zfs]
Nov 5 15:23:57 iws9 kernel: [<ffffffffa0614939>] dbuf_read+0x439/0x850 [zfs]
Nov 5 15:23:57 iws9 kernel: [<ffffffffa0614ef1>] __dbuf_hold_impl+0x1a1/0x4f0 [zfs]
Nov 5 15:23:57 iws9 kernel: [<ffffffffa06152bd>] dbuf_hold_impl+0x7d/0xb0 [zfs]
Nov 5 15:23:57 iws9 kernel: [<ffffffffa0616790>] dbuf_hold+0x20/0x30 [zfs]
Nov 5 15:23:57 iws9 kernel: [<ffffffffa061d0d7>] dmu_buf_hold_noread+0x87/0x140 [zfs]
Nov 5 15:23:57 iws9 kernel: [<ffffffffa061d1cb>] dmu_buf_hold+0x3b/0x90 [zfs]
Nov 5 15:23:57 iws9 kernel: [<ffffffffa0612fb8>] ? dbuf_rele_and_unlock+0x268/0x400 [zfs]
Nov 5 15:23:57 iws9 kernel: [<ffffffffa0686e5a>] zap_lockdir+0x5a/0x770 [zfs]
Nov 5 15:23:57 iws9 kernel: [<ffffffff81178fcd>] ? kmem_cache_alloc_node_trace+0x1cd/0x200
Nov 5 15:23:57 iws9 kernel: [<ffffffffa06889ca>] zap_lookup_norm+0x4a/0x190 [zfs]
Nov 5 15:23:57 iws9 kernel: [<ffffffffa0688ba3>] zap_lookup+0x33/0x40 [zfs]
Nov 5 15:23:57 iws9 kernel: [<ffffffffa062cc76>] dmu_tx_hold_zap+0x146/0x210 [zfs]
Nov 5 15:23:57 iws9 kernel: [<ffffffffa1034255>] osd_declare_object_create+0x2a5/0x440 [osd_zfs]
Nov 5 15:23:57 iws9 kernel: [<ffffffffa11738e4>] ofd_precreate_objects+0x4e4/0x19d0 [ofd]
Nov 5 15:23:57 iws9 kernel: [<ffffffffa04b4b61>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
Nov 5 15:23:57 iws9 kernel: [<ffffffffa1180a9b>] ? ofd_grant_create+0x23b/0x3e0 [ofd]
Nov 5 15:23:57 iws9 kernel: [<ffffffffa116384e>] ofd_create_hdl+0x56e/0x2640 [ofd]
Nov 5 15:23:57 iws9 kernel: [<ffffffffa0c28e80>] ? lustre_pack_reply_v2+0x220/0x280 [ptlrpc]
Nov 5 15:23:57 iws9 kernel: [<ffffffffa0c930ec>] tgt_request_handle+0x8bc/0x12e0 [ptlrpc]
Nov 5 15:23:57 iws9 kernel: [<ffffffffa0c3a9e1>] ptlrpc_main+0xe41/0x1910 [ptlrpc]
Nov 5 15:23:57 iws9 kernel: [<ffffffffa0c39ba0>] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
Nov 5 15:23:57 iws9 kernel: [<ffffffff810a0fce>] kthread+0x9e/0xc0
Nov 5 15:23:57 iws9 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Nov 5 15:23:57 iws9 kernel: [<ffffffff810a0f30>] ? kthread+0x0/0xc0
Nov 5 15:23:57 iws9 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
Lustre-log dump attached
Issue Links
- is duplicated by
  - LU-7602 Repeated timeouts with ZFS 0.6.5.2 (Resolved)
- is related to
  - LU-6750 missing stop callback in osd-zfs (Resolved)
  - LU-7987 Lustre 2.8 OSS with zfs 0.6.5 backend hitting most schedule_timeout (Closed)
- is related to
  - LU-7153 Update ZFS/SPL version to 0.6.5.2 (Resolved)
  - LU-4865 osd-zfs: increase object block size dynamically as object grows (Resolved)
Shouldn't ZFS limit the TXG size based on the speed of the underlying storage? I'd thought that was the main feature of dynamic TXG sizing: record how quickly the data could be flushed to disk in the previous TXG, then use that rate to limit the size of the next TXG based on the desired TXG commit interval.
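To make the sizing idea in the comment concrete, here is a minimal C sketch. After each TXG syncs, it measures the flush rate, then caps the dirty data admitted to the next TXG at rate times the target commit interval, clamped between a floor and a ceiling. Everything here is hypothetical and only illustrates the idea. This is not the actual ZFS write-throttle code. The names `txg_sizer`, `txg_sizer_update`, `txg_sizer_admit`, and the `min_bytes`/`max_bytes` clamps are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical per-pool state for a throughput-based TXG size limit.
 * Illustrates the idea only; NOT the actual ZFS implementation.
 */
struct txg_sizer {
	uint64_t min_bytes;   /* floor, so one slow sync cannot starve writers */
	uint64_t max_bytes;   /* ceiling on dirty data admitted to one TXG */
	uint64_t target_ms;   /* desired TXG commit interval */
	uint64_t limit_bytes; /* current dirty-data limit for the open TXG */
};

/*
 * Called after a TXG finishes syncing: take the rate at which the
 * previous TXG was flushed and size the next one so it should sync
 * in about target_ms at that rate.
 * Assumes synced_bytes * target_ms fits in 64 bits.
 */
static void txg_sizer_update(struct txg_sizer *s, uint64_t synced_bytes,
			     uint64_t sync_ms)
{
	uint64_t limit;

	assert(s->min_bytes <= s->max_bytes);
	if (sync_ms == 0)
		sync_ms = 1; /* treat a sub-millisecond sync as 1 ms */

	/* observed rate (bytes per ms) times the target interval (ms) */
	limit = synced_bytes * s->target_ms / sync_ms;

	if (limit < s->min_bytes)
		limit = s->min_bytes;
	if (limit > s->max_bytes)
		limit = s->max_bytes;
	s->limit_bytes = limit;
}

/*
 * Admission check for a new write of len bytes when `dirty` bytes are
 * already assigned to the open TXG. A false return means the caller
 * should wait for the next TXG instead of growing this one.
 */
static bool txg_sizer_admit(const struct txg_sizer *s, uint64_t dirty,
			    uint64_t len)
{
	return dirty + len <= s->limit_bytes;
}
```

Here is an example. Suppose the previous TXG flushed 1 GiB in 10 s and the target interval is 5 s. The next TXG is then capped at 512 MiB. If storage slows down, the next update shrinks the cap, but never below `min_bytes`. This gives the storage-speed-based back-pressure the comment expects.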