Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.10.0, Lustre 2.11.0, Lustre 2.12.0
-
3
-
9223372036854775807
Description
This issue was created by maloo for sarah_lw <wei3.liu@intel.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/c25bf48e-53d1-11e7-a749-5254006e85c2.
The sub-test test_53a failed with the following error:
test failed to respond and timed out
env: tag-2.9.59 #3603 ZFS
MDS console
05:24:43:[40506.686671] Lustre: MGS: Connection restored to 1a7ddb58-f96d-6d7b-dd19-c36909e007a9 (at 0@lo) 05:24:43:[40506.826682] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 60-180 05:24:43:[40506.829961] LustreError: 16342:0:(osd_oi.c:503:osd_oid()) lustre-MDT0000-osd: unsupported quota oid: 0x16 05:24:43:[40507.031480] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check 05:24:43:[40507.336736] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lust 05:24:43:[40507.892722] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-46vm7.trevis.hpdd.intel.com: executing set_default_debug -1 all 4 05:24:43:[40507.893599] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-46vm7.trevis.hpdd.intel.com: executing set_default_debug -1 all 4 05:24:43:[40508.120605] Lustre: DEBUG MARKER: trevis-46vm7.trevis.hpdd.intel.com: executing set_default_debug -1 all 4 05:24:43:[40508.128355] Lustre: DEBUG MARKER: trevis-46vm7.trevis.hpdd.intel.com: executing set_default_debug -1 all 4 05:24:43:[40508.289854] Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1 05:24:43:[40508.591867] Lustre: DEBUG MARKER: zfs get -H -o value lustre:svname lustre-mdt1/mdt1 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' 05:24:43:[40508.901939] Lustre: DEBUG MARKER: zfs get -H -o value lustre:svname lustre-mdt1/mdt1 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' 05:24:43:[40509.212232] Lustre: DEBUG MARKER: zfs get -H -o value lustre:svname lustre-mdt1/mdt1 2>/dev/null 05:24:43:[40509.520874] Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1 05:24:43:[40511.168162] Lustre: MGS: Connection restored to 00faafbe-57f8-e637-4480-1f8b3bfa5137 (at 10.9.6.35@tcp) 05:24:43:[40511.800039] Lustre: 15064:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1497677067/real 1497677067] req@ffff8800363c0300 x1570428225061024/t0(0) o8->lustre-OST0000-osc-MDT0000@10.9.6.35@tcp:28/4 lens 520/544 e 0 to 1 dl 1497677072 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 05:24:43:[40512.608633] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-46vm8.trevis.hpdd.intel.com: executing set_default_debug -1 all 4 05:25:17:[40512.800854] Lustre: DEBUG MARKER: trevis-46vm8.trevis.hpdd.intel.com: executing set_default_debug -1 all 4 05:25:17:[40513.793184] Lustre: MGS: Connection restored to 7310674c-545b-174e-1751-630f6aaa5a8a (at 10.9.6.28@tcp) 05:25:17:[40513.796051] Lustre: Skipped 2 previous similar messages 05:25:17:[40516.831820] LustreError: 11-0: lustre-OST0000-osc-MDT0000: operation ost_statfs to node 10.9.6.35@tcp failed: rc = -107 05:25:17:[40516.835465] Lustre: lustre-OST0000-osc-MDT0000: Connection to lustre-OST0000 (at 10.9.6.35@tcp) was lost; in progress operations using this service will wait for recovery to complete 05:25:17:[40518.877430] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts 05:25:17:[40519.195771] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 05:25:17:[40544.062002] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [tx_commit_cb:16414] 05:25:17: 05:25:17:[40544.062002] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) osd_zfs(OE) lquota(OE) lustre(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate dm_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel ppdev lrw gf128mul glue_helper ablk_helper cryptd pcspkr virtio_balloon i2c_piix4 parport_pc parport nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk cirrus crct10dif_pclmul crct10dif_common drm_kms_helper syscopyarea sysfillrect crc32c_intel 8139too serio_raw sysimgblt virtio_pci fb_sys_fops virtio_ring ttm virtio ata_piix 8139cp mii drm libata i2c_core floppy [last unloaded: libcfs] 05:25:17:[40544.062002] CPU: 1 PID: 16414 Comm: tx_commit_cb Tainted: P OE ------------ 3.10.0-514.21.1.el7_lustre.x86_64 #1 05:25:17:[40544.062002] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 05:25:17:[40544.062002] task: ffff88005d50bec0 ti: ffff88004361c000 task.ti: ffff88004361c000 05:25:17:[40544.062002] RIP: 0010:[<ffffffffa08c4fe4>] [<ffffffffa08c4fe4>] dt_txn_hook_commit+0x34/0x60 [obdclass] 05:25:17:[40544.062002] RSP: 0018:ffff88004361fd78 EFLAGS: 00000297 05:25:17:[40544.062002] RAX: 0000000000000000 RBX: ffff8800694b94a0 RCX: ffff880069efc000 05:25:17:[40544.062002] RDX: ffff880069efc048 RSI: ffffffffa1287600 RDI: ffff88005e026e00 05:25:17:[40544.062002] RBP: ffff88004361fd88 R08: 20737365636f7250 R09: 0a64657265746e65 05:25:17:[40544.062002] R10: ffff880014dc2535 R11: 0a64657265746e65 R12: 0000000000000000 05:25:17:[40544.062002] R13: ffffffff810ce9ac R14: ffff88004361fd20 R15: 00000000a57bc182 05:25:17:[40544.062002] FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 05:25:17:[40544.062002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 05:25:17:[40544.062002] CR2: 00007f23d5a9e000 CR3: 000000007b0d7000 CR4: 00000000000406e0 05:25:17:[40544.062002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 05:25:17:[40544.062002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 05:25:17:[40544.062002] Stack: 05:25:17:[40544.062002] ffff8800694b94a0 ffff88005cbd0600 ffff88004361fdd0 ffffffffa0f676a8 05:25:17:[40544.062002] ffff880069efc000 ffff88007c0c0000 ffff8800694b94a0 ffff8800694b90f0 05:25:17:[40544.062002] 0000000000000000 ffff8800694b90e0 ffff880079f78800 ffff88004361fe00 05:25:17:[40544.062002] Call Trace: 05:25:17:[40544.062002] [<ffffffffa0f676a8>] osd_trans_commit_cb+0xe8/0x490 [osd_zfs] 05:25:17:[40544.062002] [<ffffffffa17d05e4>] dmu_tx_do_callbacks+0x44/0x70 [zfs] 05:25:17:[40544.062002] [<ffffffffa1814c74>] txg_do_callbacks+0x14/0x30 [zfs] 05:25:17:[40544.062002] [<ffffffffa06d36de>] taskq_thread+0x22e/0x440 [spl] 05:25:17:[40544.062002] [<ffffffff810c54e0>] ? wake_up_state+0x20/0x20 05:25:17:[40544.062002] [<ffffffffa06d34b0>] ? taskq_thread_spawn+0x60/0x60 [spl] 05:25:17:[40544.062002] [<ffffffff810b0a4f>] kthread+0xcf/0xe0 05:25:17:[40544.062002] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140 05:25:17:[40544.062002] [<ffffffff81697798>] ret_from_fork+0x58/0x90 05:25:17:[40544.062002] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140 05:25:17:[40544.062002] Code: 41 54 49 89 fc 53 f6 47 48 02 75 42 48 8b 0f 48 8b 41 48 48 8d 51 48 48 39 d0 48 8d 58 d8 74 2e 0f 1f 80 00 00 00 00 48 8b 43 10 <48> 85 c0 74 0d 48 8b 73 18 4c 89 e7 ff d0 49 8b 0c 24 48 8b 43 05:25:17:[40544.062002] Kernel panic - not syncing: softlockup: hung tasks 05:25:17:[40544.062002] CPU: 1 PID: 16414 Comm: tx_commit_cb Tainted: P OEL ------------ 3.10.0-514.21.1.el7_lustre.x86_64 #1 05:25:17:[40544.062002] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 05:25:17:[40544.062002] ffffffff818da7c9 00000000a57bc182 ffff88007fd03e18 ffffffff8168717f 05:25:17:[40544.062002] ffff88007fd03e98 ffffffff816805aa 0000000000000008 ffff88007fd03ea8 05:25:17:[40544.062002] ffff88007fd03e48 00000000a57bc182 ffff88007fd03e67 0000000000000000 05:25:17:[40544.062002] Call Trace: 05:25:17:[40544.062002] <IRQ> [<ffffffff8168717f>] dump_stack+0x19/0x1b 05:25:17:[40544.062002] [<ffffffff816805aa>] panic+0xe3/0x1f2 05:25:17:[40544.062002] [<ffffffff8112f444>] watchdog_timer_fn+0x234/0x240 05:25:17:[40544.062002] [<ffffffff8112f210>] ? watchdog+0x50/0x50 05:25:17:[40544.062002] [<ffffffff810b4d72>] __hrtimer_run_queues+0xd2/0x260 05:25:17:[40544.062002] [<ffffffff810b5310>] hrtimer_interrupt+0xb0/0x1e0 05:25:17:[40544.062002] [<ffffffff816992dc>] ? call_softirq+0x1c/0x30 05:25:17:[40544.062002] [<ffffffff81050ff7>] local_apic_timer_interrupt+0x37/0x60 05:25:17:[40544.062002] [<ffffffff81699f4f>] smp_apic_timer_interrupt+0x3f/0x60 05:25:17:[40544.062002] [<ffffffff8169849d>] apic_timer_interrupt+0x6d/0x80 05:25:17:[40544.062002] <EOI> [<ffffffffa08c4fe4>] ? dt_txn_hook_commit+0x34/0x60 [obdclass] 05:25:17:[40544.062002] [<ffffffffa08c4ff2>] ? dt_txn_hook_commit+0x42/0x60 [obdclass] 05:25:17:[40544.062002] [<ffffffffa0f676a8>] osd_trans_commit_cb+0xe8/0x490 [osd_zfs] 05:25:17:[ 0.000000] Initializing cgroup subsys cpuset 05:25:17:[ 0.000000] Initializing cgroup subsys cpu 05:25:17:[ 0.000000] Initializing cgroup subsys cpuacct 05:25:17:[ 0.000000] Linux version 3.10.0-514.21.1.el7_lustre.x86_64 (jenkins@trevis-310-el7
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34496/
Subject:
LU-9706dt: remove dt_txn_hook_commit()Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 6c2ecb0c72510cad32210ec327122c2b140eb6cc