Details
-
Bug
-
Resolution: Cannot Reproduce
-
Minor
-
None
-
Lustre 2.9.0
-
None
-
autotest review-dne
-
3
-
9223372036854775807
Description
sanity-hsm hangs before any tests are run. From the suite_stdout, we see that there is a problem with the OST 4 setup:
06:11:37:Starting ost4: /dev/lvm-Role_OSS/P4 /mnt/lustre-ost4 06:11:37:CMD: onyx-34vm8 mkdir -p /mnt/lustre-ost4; mount -t lustre /dev/lvm-Role_OSS/P4 /mnt/lustre-ost4 06:11:37:CMD: onyx-34vm8 /usr/sbin/lctl get_param -n health_check 06:11:37:CMD: onyx-34vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests//usr/lib64/lustre/tests/utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/usr/sbin:/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh set_default_debug \"vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck\" \"all -lnet -lnd -pinger\" 4 06:11:37:CMD: onyx-34vm8 e2label /dev/lvm-Role_OSS/P4 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' 06:11:37:CMD: onyx-34vm8 e2label /dev/lvm-Role_OSS/P4 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' 06:11:37:CMD: onyx-34vm8 sync; sync; sync 07:11:20:********** Timeout by autotest system **********
On OST4 (vm8), we see
06:12:07:[16441.962399] Lustre: DEBUG MARKER: e2label /dev/lvm-Role_OSS/P4 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' 06:12:07:[16442.192158] Lustre: DEBUG MARKER: sync; sync; sync 06:12:07:[16456.204737] Lustre: lustre-OST0003: Export ffff88005981e800 already connecting from 10.2.4.131@tcp 06:12:07:[16461.204570] Lustre: lustre-OST0003: Export ffff88005981e800 already connecting from 10.2.4.131@tcp 06:12:07:[16461.207146] Lustre: Skipped 1 previous similar message 06:12:07:[16466.204701] Lustre: lustre-OST0003: Export ffff88005981e800 already connecting from 10.2.4.131@tcp 06:12:07:[16466.207151] Lustre: Skipped 1 previous similar message 06:12:07:[16468.053003] BUG: soft lockup - CPU#0 stuck for 23s! [ll_ost00_003:29743] 06:12:07:[16468.053003] Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_generic crypto_null libcfs(OE) ldiskfs(OE) dm_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic crct10dif_common ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ppdev parport_pc pcspkr virtio_balloon i2c_piix4 parport nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper ttm virtio_blk 8139too drm serio_raw ata_piix virtio_pci virtio_ring virtio i2c_core 8139cp mii floppy libata 06:12:07:[16468.053003] CPU: 0 PID: 29743 Comm: ll_ost00_003 Tainted: G OE ------------ 3.10.0-327.18.2.el7_lustre.x86_64 #1 06:12:07:[16468.053003] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 06:12:07:[16468.053003] task: ffff88004b1a5c00 ti: ffff88004cd40000 task.ti: ffff88004cd40000 06:12:07:[16468.053003] RIP: 0010:[<ffffffff8163d5f2>] [<ffffffff8163d5f2>] _raw_spin_lock+0x32/0x50 06:12:07:[16468.053003] RSP: 0018:ffff88004cd435d0 EFLAGS: 00000287 06:12:07:[16468.053003] RAX: 00000000000003e0 RBX: ffff88007b87e680 RCX: 0000000000000b42 06:12:07:[16468.053003] RDX: 0000000000000b38 RSI: 0000000000000b38 RDI: ffff880077768ba0 06:12:07:[16468.053003] RBP: ffff88004cd435d0 R08: c010000000000000 R09: 0035f32b60080000 06:12:07:[16468.053003] R10: ffac0ce21cc2d802 R11: 0000000000004000 R12: ffff880035f32958 06:12:07:[16468.053003] R13: ffff880035f32b60 R14: ffffffff8121255b R15: ffff88004cd435e0 06:12:07:[16468.053003] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 06:12:07:[16468.053003] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b 06:12:07:[16468.053003] CR2: 00007f7f78dc54a9 CR3: 000000000194a000 CR4: 00000000000006f0 06:12:07:[16468.053003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 06:12:07:[16468.053003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 06:12:07:[16468.053003] Stack: 06:12:07:[16468.053003] ffff88004cd43658 ffffffffa0155bfc ffff88004cd43600 ffffffff81212e6c 06:12:07:[16468.053003] 00000000811c11da ffff880077768800 0000000000000000 0000000000000025 06:12:07:[16468.053003] 0000000000000004 0000000000000c35 ffff88007b8ee4b0 000000007608c8d8 06:12:07:[16468.053003] Call Trace: 06:12:07:[16468.053003] [<ffffffffa0155bfc>] do_get_write_access+0x32c/0x4e0 [jbd2] 06:12:07:[16468.053003] [<ffffffff81212e6c>] ? __find_get_block+0xbc/0x120 06:12:07:[16468.053003] [<ffffffffa0155dd7>] jbd2_journal_get_write_access+0x27/0x40 [jbd2] 06:12:07:[16468.053003] [<ffffffffa05e6c0b>] __ldiskfs_journal_get_write_access+0x3b/0x80 [ldiskfs] 06:12:07:[16468.053003] [<ffffffffa0620ec7>] __ldiskfs_new_inode+0x447/0x1300 [ldiskfs] 06:12:07:[16468.053003] [<ffffffffa05f7038>] ldiskfs_mkdir+0x148/0x280 [ldiskfs] 06:12:07:[16468.053003] [<ffffffff811ea557>] vfs_mkdir+0xb7/0x160 06:12:07:[16468.053003] [<ffffffffa0c3a5c9>] simple_mkdir.isra.17.constprop.26+0x429/0x4c0 [osd_ldiskfs] 06:12:07:[16468.053003] [<ffffffffa0c4ebfe>] osd_seq_load_locked.isra.19+0x19a/0x6a0 [osd_ldiskfs] 06:12:07:[16468.053003] [<ffffffff811c13be>] ? kmem_cache_alloc_trace+0x1ce/0x1f0 06:12:07:[16468.053003] [<ffffffffa0c3a9c1>] osd_seq_load+0x361/0x520 [osd_ldiskfs] 06:12:07:[16468.053003] [<ffffffffa0c3eb96>] osd_obj_spec_lookup+0x66/0x300 [osd_ldiskfs] 06:12:07:[16468.053003] [<ffffffffa0c2bf07>] osd_oi_lookup+0x47/0x150 [osd_ldiskfs] 06:12:07:[16468.053003] [<ffffffffa0c2886b>] osd_fid_lookup+0x92b/0x1780 [osd_ldiskfs] 06:12:07:[16468.053003] [<ffffffff8163d59b>] ? _raw_spin_unlock_bh+0x1b/0x40 06:12:07:[16468.053003] [<ffffffffa096f0c2>] ? ksocknal_queue_tx_locked+0x132/0x4d0 [ksocklnd] 06:12:07:[16468.053003] [<ffffffff812fc6a2>] ? put_dec+0x72/0x90 06:12:07:[16468.053003] [<ffffffff811c13be>] ? kmem_cache_alloc_trace+0x1ce/0x1f0 06:12:07:[16468.053003] [<ffffffffa0c29715>] osd_object_init+0x55/0xf0 [osd_ldiskfs] 06:12:07:[16468.053003] [<ffffffffa07e42df>] lu_object_alloc+0xdf/0x310 [obdclass] 06:12:07:[16468.053003] [<ffffffffa07e46dc>] lu_object_find_try+0x16c/0x2b0 [obdclass] 06:12:07:[16468.053003] [<ffffffffa07e48cc>] lu_object_find_at+0xac/0xe0 [obdclass] 06:12:07:[16468.053003] [<ffffffffa0a41b06>] ? null_alloc_rs+0x176/0x330 [ptlrpc] 06:12:07:[16468.053003] [<ffffffffa07e5c58>] dt_locate_at+0x18/0xb0 [obdclass] 06:12:07:[16468.053003] [<ffffffffa07e7c65>] dt_find_or_create+0x55/0x8d0 [obdclass] 06:12:07:[16468.053003] [<ffffffff811c13be>] ? kmem_cache_alloc_trace+0x1ce/0x1f0 06:12:07:[16468.053003] [<ffffffffa0d69ebc>] ofd_seq_load+0x2ac/0x9c0 [ofd] 06:12:07:[16468.053003] [<ffffffffa0d61afa>] ofd_get_info_hdl+0x76a/0x14e0 [ofd] 06:12:07:[16468.053003] [<ffffffffa0a66f25>] tgt_request_handle+0x915/0x1320 [ptlrpc] 06:12:07:[16468.053003] [<ffffffffa0a134bb>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc] 06:12:07:[16468.053003] [<ffffffffa0a11078>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc] 06:12:07:[16468.053003] [<ffffffff810b88b2>] ? default_wake_function+0x12/0x20 06:12:07:[16468.053003] [<ffffffff810af018>] ? __wake_up_common+0x58/0x90 06:12:07:[16468.053003] [<ffffffffa0a17570>] ptlrpc_main+0xaa0/0x1dd0 [ptlrpc] 06:12:07:[16468.053003] [<ffffffffa0a16ad0>] ? ptlrpc_register_service+0xe40/0xe40 [ptlrpc] 06:12:07:[16468.053003] [<ffffffff810a5acf>] kthread+0xcf/0xe0 06:12:07:[16468.053003] [<ffffffff810b47e6>] ? finish_task_switch+0x56/0x170 06:12:07:[16468.053003] [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140 06:12:07:[16468.053003] [<ffffffff81646318>] ret_from_fork+0x58/0x90 06:12:07:[16468.053003] [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
Logs for this failure are at:
https://testing.hpdd.intel.com/test_sets/e6bdb3d2-3bc0-11e6-a0ce-5254006e85c2