[LU-6249] mds-survey test_1: test failed to respond and timed out Created: 14/Feb/15  Updated: 12/Aug/22  Resolved: 12/Aug/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0, Lustre 2.11.0, Lustre 2.10.4
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Sarah Liu
Assignee: WC Triage
Resolution: Cannot Reproduce
Votes: 0
Labels: dne, zfs
Environment:

client and server: lustre-master build# 2856


Issue Links:
Duplicate
is duplicated by LU-10421 mds-survey test 1: Timeout occurred a... Resolved
Severity: 3
Rank (Obsolete): 17499

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/a383e9ac-b2e0-11e4-b42d-5254006e85c2.

The sub-test test_1 failed with the following error:

test failed to respond and timed out

No error message could be found; the test simply timed out. It is unclear whether this is a duplicate of LU-2600.

Info required for matching: mds-survey 1



 Comments   
Comment by Oleg Drokin [ 17/Feb/15 ]

The MDS crashed, and the console log is nowhere to be found, so we cannot see why.
We need to find the MDS crash dump and extract the cause of the failure from it.

Comment by Minh Diep [ 06/Feb/18 ]

MDS crashed: https://testing.hpdd.intel.com/test_logs/1ebcd592-0b54-11e8-a7cd-52540065bddc/show_text

[15349.075441] LustreError: 21742:0:(echo_client.c:1795:echo_md_lookup()) Skipped 1 previous similar message
[15349.078482] LustreError: 21742:0:(echo_client.c:2027:echo_md_destroy_internal()) Can't find child MDT0002-tests: rc = -2
[15349.081864] LustreError: 21742:0:(echo_client.c:2027:echo_md_destroy_internal()) Skipped 1 previous similar message
[15499.480571] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[15499.481150] IP: [<ffffffffc0ca9f53>] lu_object_alloc+0x73/0x310 [obdclass]
[15499.481150] PGD 800000004f885067 PUD 1d0e4067 PMD 0 
[15499.481150] Oops: 0002 [#1] SMP 
[15499.481150] Modules linked in: obdecho(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) zfs(POE) zunicode(POE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel dm_mod ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr joydev virtio_balloon i2c_piix4 parport_pc parport nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi cirrus drm_kms_helper virtio_blk ata_piix syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm libata crct10dif_pclmul crct10dif_common 8139too crc32c_intel serio_raw virtio_pci 8139cp virtio_ring virtio mii i2c_core floppy
[15499.496961] CPU: 1 PID: 16365 Comm: lctl Tainted: P           OE  ------------   3.10.0-693.17.1.el7_lustre.x86_64 #1
[15499.496961] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[15499.496961] task: ffff88005fb58000 ti: ffff880020888000 task.ti: ffff880020888000
[15499.496961] RIP: 0010:[<ffffffffc0ca9f53>]  [<ffffffffc0ca9f53>] lu_object_alloc+0x73/0x310 [obdclass]
[15499.496961] RSP: 0018:ffff88002088baf0  EFLAGS: 00010246
[15499.496961] RAX: 0000000240000bd0 RBX: ffff88005d39e180 RCX: 0000000000000000
[15499.496961] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88003520c7e0
[15499.496961] RBP: ffff88002088bb38 R08: 000000000001b920 R09: 0000000000000000
[15499.496961] R10: ffff88003520c7e0 R11: 0000000000000fff R12: ffff88003520c7e0
[15499.496961] R13: ffff88002088bbd8 R14: ffff88004b3e6228 R15: 0000000000000000
[15499.496961] FS:  00007f036cfa5740(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
[15499.496961] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15499.496961] CR2: 0000000000000008 CR3: 000000003b72a000 CR4: 00000000000606e0
[15499.496961] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15499.496961] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[15499.496961] Call Trace:
[15499.496961]  [<ffffffffc0ca7833>] ? htable_lookup+0x153/0x170 [obdclass]
[15499.496961]  [<ffffffffc0caa3bc>] lu_object_find_at+0x16c/0x290 [obdclass]
[15499.496961]  [<ffffffffc13237de>] echo_md_dir_stripe_choose.isra.43+0x26e/0x680 [obdecho]
[15499.496961]  [<ffffffffc1324b87>] echo_md_handler.isra.45+0xf97/0x2c20 [obdecho]
[15499.496961]  [<ffffffff816add34>] ? _raw_read_lock+0x14/0x20
[15499.496961]  [<ffffffffc13278a1>] echo_client_iocontrol+0x1091/0x1ba0 [obdecho]
[15499.496961]  [<ffffffffc0c8aa59>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[15499.496961]  [<ffffffffc0c7586d>] class_handle_ioctl+0x18ed/0x1df0 [obdclass]
[15499.496961]  [<ffffffff811af746>] ? do_read_fault.isra.44+0xe6/0x130
[15499.496961]  [<ffffffff812b3ea8>] ? security_capable+0x18/0x20
[15499.496961]  [<ffffffffc0c5a602>] obd_class_ioctl+0xd2/0x170 [obdclass]
[15499.496961]  [<ffffffff8121730d>] do_vfs_ioctl+0x33d/0x540
[15499.496961]  [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
[15499.496961]  [<ffffffff810ec7ba>] ? __getnstimeofday64+0x3a/0xd0
[15499.496961]  [<ffffffff812175b1>] SyS_ioctl+0xa1/0xc0
[15499.496961]  [<ffffffff816b8930>] ? system_call_after_swapgs+0x15d/0x214
[15499.496961]  [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b
[15499.496961]  [<ffffffff816b889d>] ? system_call_after_swapgs+0xca/0x214
[15499.496961] Code: 48 8b 42 10 ff 10 48 85 c0 49 89 c4 0f 84 3c 02 00 00 48 3d 00 f0 ff ff 0f 87 6f 02 00 00 48 8b 08 49 8b 57 08 49 8b 07 45 31 ff <48> 89 51 08 48 89 01 49 8b 04 24 4c 8d 70 40 48 89 44 24 08 48 
[15499.496961] RIP  [<ffffffffc0ca9f53>] lu_object_alloc+0x73/0x310 [obdclass]
[15499.496961]  RSP <ffff88002088baf0>
[15499.496961] CR2: 0000000000000008
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.10.0-693.17.1.el7_lustre.x86_64 (jenkins@trevis-307-el7-x8664-1.trevis.hpdd.intel.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Jan 26 13:49:52 UTC 2018
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.17.1.el7_lustre.x86_64 root=UUID=36a4fa8e-8395-4c4c-9d40-93a0779cd2bb ro console=tty0 LANG=en_US.UTF-8 console=ttyS0,115200 net.ifnames=0 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never disable_cpu_apicid=0 elfcorehdr=867708K
[    0.000000] Disabled fast string operations
[    0.000000] e820: BIOS-provided physical RAM

https://testing.hpdd.intel.com/test_sets/1eb3e98c-0b54-11e8-a7cd-52540065bddc
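
For triage, the header of the oops above can be summarized mechanically. The following is a minimal sketch, assuming the standard x86 page-fault error-code bit layout (bit 0 = protection violation, bit 1 = write, bit 2 = user mode, bit 3 = reserved bit, bit 4 = instruction fetch). The helper names `decode_oops_code` and `summarize_oops` are hypothetical and are not part of Lustre or any crash tooling; the regular expressions only target the line shapes seen in this console log.

```python
import re
from typing import Dict, List

# x86 page-fault error-code bits as printed in the "Oops: XXXX" line.
# Each entry: (mask, meaning if set, meaning if clear or None to omit).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]

# Line shapes taken from the console log in this ticket (assumption:
# other kernel versions may format these lines differently).
FAULT_RE = re.compile(r"NULL pointer dereference at ([0-9a-f]+)")
IP_RE = re.compile(
    r"IP: \[<([0-9a-f]+)>\]\s+([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)\s+\[(\w+)\]"
)
OOPS_RE = re.compile(r"Oops:\s+([0-9a-f]{4})")


def decode_oops_code(code: int) -> List[str]:
    """Translate a page-fault error code into human-readable flags."""
    flags = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return flags


def summarize_oops(log: str) -> Dict[str, object]:
    """Extract fault address, faulting symbol, and access type from an oops."""
    summary: Dict[str, object] = {}
    fault = FAULT_RE.search(log)
    if fault:
        summary["fault_address"] = int(fault.group(1), 16)
    ip = IP_RE.search(log)
    if ip:
        summary["function"] = ip.group(2)
        summary["offset"] = int(ip.group(3), 16)
        summary["module"] = ip.group(5)
    oops = OOPS_RE.search(log)
    if oops:
        summary["access"] = decode_oops_code(int(oops.group(1), 16))
    return summary


if __name__ == "__main__":
    sample = (
        "[15499.480571] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008\n"
        "[15499.481150] IP: [<ffffffffc0ca9f53>] lu_object_alloc+0x73/0x310 [obdclass]\n"
        "[15499.481150] Oops: 0002 [#1] SMP \n"
    )
    print(summarize_oops(sample))
```

Applied to the three header lines of this log, the sketch reports a kernel-mode write to a not-present page at address 0x8, inside `lu_object_alloc` in `obdclass`, which matches the zero RCX value in the register dump.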
