Details
-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
Lustre 2.10.3
-
None
-
3
-
9223372036854775807
Description
parallel-scale test_compilebench - Timeout occurred after 1194 mins, last suite running was parallel-scale, restarting cluster to continue tests
^^^^^^^^^^^^^ DO NOT REMOVE LINE ABOVE ^^^^^^^^^^^^^
This issue was created by maloo for sarah_lw <wei3.liu@intel.com>
This issue relates to the following test suite run:
https://testing.hpdd.intel.com/test_sets/ba628f5c-f572-11e7-a169-52540065bddc
test_compilebench failed with the following error:
Timeout occurred after 1194 mins, last suite running was parallel-scale, restarting cluster to continue tests
client and server: 2.10.3 RC1 SLES12sp3
client console
[57326.246102] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == parallel-scale test compilebench: compilebench ==================================================== 05:29:45 \(1515504585\) [57326.294735] Lustre: DEBUG MARKER: == parallel-scale test compilebench: compilebench ==================================================== 05:29:45 (1515504585) [57326.393164] Lustre: DEBUG MARKER: /usr/sbin/lctl mark .\/compilebench -D \/mnt\/lustre\/d0.compilebench.16591 -i 2 -r 2 --makej [57326.442340] Lustre: DEBUG MARKER: ./compilebench -D /mnt/lustre/d0.compilebench.16591 -i 2 -r 2 --makej [57609.756062] Lustre: 17431:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1515504859/real 1515504859] req@ffff88003bc056c0 x1589121722362864/t0(0) o4->lustre-OST0000-osc-ffff8800645e9800@10.9.4.78@tcp:6/4 lens 608/448 e 0 to 1 dl 1515504868 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 [57609.756084] Lustre: lustre-OST0000-osc-ffff8800645e9800: Connection to lustre-OST0000 (at 10.9.4.78@tcp) was lost; in progress operations using this service will wait for recovery to complete [57610.366210] Lustre: lustre-OST0000-osc-ffff8800645e9800: Connection restored to 10.9.4.78@tcp (at 10.9.4.78@tcp) [58151.162493] python (19843): drop_caches: 3 [58213.476016] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ptlrpcd_00_01:17432] [58213.480015] Modules linked in: lustre(OEN) obdecho(OEN) mgc(OEN) lov(OEN) osc(OEN) mdc(OEN) lmv(OEN) fid(OEN) fld(OEN) ptlrpc_gss(OEN) ptlrpc(OEN) obdclass(OEN) ksocklnd(OEN) lnet(OEN) libcfs(OEN) brd loop rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache af_packet iscsi_boot_sysfs rpcrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm configfs ib_cm iw_cm ib_core crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel jitterentropy_rng drbg ansi_cprng ppdev aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd 8139too pcspkr 8139cp joydev mii virtio_balloon i2c_piix4 parport_pc parport pvpanic processor button ata_generic ext4 crc16 jbd2 mbcache virtio_blk floppy ata_piix ahci libahci cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm uhci_hcd drm serio_raw ehci_hcd usbcore usb_common libata virtio_pci virtio_ring virtio sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4 [last unloaded: libcfs] [58214.120024] Supported: No, Unsupported modules are loaded [58214.996031] CPU: 1 PID: 17432 Comm: ptlrpcd_00_01 Tainted: G W OE N 4.4.103-6.38-default #1 [58215.000035] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 [58215.004038] task: ffff880036869300 ti: ffff880079ab4000 task.ti: ffff880079ab4000 [58215.008032] RIP: 0010:[<ffffffff813304c3>] [<ffffffff813304c3>] memset_orig+0x33/0xb0 [58215.904020] RSP: 0018:ffff880079ab7b90 EFLAGS: 00010216 [58215.908041] RAX: 0000000000000000 RBX: 000000000000036e RCX: 000000000000003f [58215.912032] RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff88007787f000 [58215.912032] RBP: ffff88006415c180 R08: ffff88005c798000 R09: 0000000000000000 [58215.916032] R10: ffff88007787f000 R11: 0000000000000000 R12: ffff8800632b34e0 [58215.916032] R13: ffff88005c798000 R14: ffff880028b69c40 R15: 0000000000000093 [58215.920032] FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 [58215.920032] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [58215.924035] CR2: 00007f2676683000 CR3: 000000007b288000 CR4: 0000000000060670 [58215.928031] Stack: [58215.928031] ffffffffa0d49102 0000000000000000 0000000000000000 0000000000000000 [58215.932032] 0000000000000000 0000000000000000 0000000000000000 0000000000000006 [58215.932032] 0000000000000000 0000000000000000 0000000000728835 ffff88006415c000 [58215.936031] Call Trace: [58224.408020] Inexact backtrace: [58224.408020] [58224.416020] [<ffffffffa0d49102>] ? osc_brw_fini_request+0x942/0x12a0 [osc] [58224.420021] [<ffffffffa0b9c3f5>] ? at_measured+0xf5/0x310 [ptlrpc] [58224.424021] [<ffffffffa0b9c3f5>] ? at_measured+0xf5/0x310 [ptlrpc] [58224.428024] [<ffffffffa0d4bb75>] ? brw_interpret+0x45/0xdd0 [osc] [58224.432021] [<ffffffffa0b758cd>] ? after_reply+0x8ad/0xe40 [ptlrpc] [58224.432021] [<ffffffffa0b7624f>] ? ptlrpc_check_set.part.23+0x3ef/0x1d70 [ptlrpc] [58225.700023] [<ffffffff810f2c05>] ? try_to_del_timer_sync+0x45/0x60 [58225.704016] [<ffffffffa0ba126a>] ? ptlrpcd_check+0x39a/0x580 [ptlrpc] [58225.704016] [<ffffffffa0ba1688>] ? ptlrpcd+0x238/0x490 [ptlrpc] [58226.220016] [<ffffffff810a9b40>] ? wake_up_q+0x80/0x80 [58226.224017] [<ffffffffa0ba1450>] ? ptlrpcd_check+0x580/0x580 [ptlrpc] [58226.224017] [<ffffffff8109d517>] ? kthread+0xc7/0xe0 [58226.228065] [<ffffffff8109d450>] ? kthread_park+0x50/0x50 [58226.228065] [<ffffffff8160c5df>] ? ret_from_fork+0x3f/0x70 [58226.232034] [<ffffffff8109d450>] ? kthread_park+0x50/0x50 [58226.232034] Code: b8 01 01 01 01 01 01 01 01 48 0f af c1 41 89 f9 41 83 e1 07 75 70 48 89 d1 48 c1 e9 06 74 39 66 0f 1f 84 00 00 00 00 00 48 ff c9 <48> 89 07 48 89 47 08 48 89 47 10 48 89 47 18 48 89 47 20 48 89 [58226.236032] Kernel panic - not syncing: softlockup: hung tasks [58226.240037] CPU: 1 PID: 17432 Comm: ptlrpcd_00_01 Tainted: G W OEL N 4.4.103-6.38-default #1 [58226.240037] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 [58226.244032] 0000000000000000 ffffffff813228f0 ffffffff81a22fba ffff88007fd03ed0 [58226.244032] ffffffff8118d8d1 0000000000000008 ffff88007fd03ee0 ffff88007fd03e80 [58226.248033] 0000000000000000 0000000000000000 0000000000000000 0000000000000006 [58226.248033] Call Trace: [58226.280016] [<ffffffff81019b19>] dump_trace+0x59/0x310 [58226.280016] [<ffffffff81019eba>] show_stack_log_lvl+0xea/0x170 [58226.284037] [<ffffffff8101ac41>] show_stack+0x21/0x40 [58226.284037] [<ffffffff813228f0>] dump_stack+0x5c/0x7c [58226.732016] [<ffffffff8118d8d1>] panic+0xd2/0x220 [58226.944038] [<ffffffff81140ff9>] watchdog_timer_fn+0x1d9/0x1e0 [58227.300030] [<ffffffff810f580a>] __hrtimer_run_queues+0xea/0x260 [58227.340030] [<ffffffff810f5c49>] hrtimer_interrupt+0x99/0x190 [58227.340030] [<ffffffff8160f959>] smp_apic_timer_interrupt+0x39/0x50 [58227.340030] [<ffffffff8160d199>] apic_timer_interrupt+0xc9/0xd0 [58227.344029] DWARF2 unwinder stuck at apic_timer_interrupt+0xc9/0xd0 [58227.344029] [58227.348028] Leftover inexact backtrace: [58227.348028] [58227.348028] <IRQ> <EOI> [<ffffffff813304c3>] ? memset_orig+0x33/0xb0 [58227.352032] [<ffffffffa0d49102>] ? osc_brw_fini_request+0x942/0x12a0 [osc] [58227.352032] [<ffffffffa0b9c3f5>] ? at_measured+0xf5/0x310 [ptlrpc] [58227.352032] [<ffffffffa0b9c3f5>] ? at_measured+0xf5/0x310 [ptlrpc] [58227.356029] [<ffffffffa0d4bb75>] ? brw_interpret+0x45/0xdd0 [osc] [58227.356029] [<ffffffffa0b758cd>] ? after_reply+0x8ad/0xe40 [ptlrpc] [58227.360029] [<ffffffffa0b7624f>] ? ptlrpc_check_set.part.23+0x3ef/0x1d70 [ptlrpc] [58227.360029] [<ffffffff810f2c05>] ? try_to_del_timer_sync+0x45/0x60 [58227.360029] [<ffffffffa0ba126a>] ? ptlrpcd_check+0x39a/0x580 [ptlrpc] [58227.364029] [<ffffffffa0ba1688>] ? ptlrpcd+0x238/0x490 [ptlrpc] [58227.364029] [<ffffffff810a9b40>] ? wake_up_q+0x80/0x80 [58227.364029] [<ffffffffa0ba1450>] ? ptlrpcd_check+0x580/0x580 [ptlrpc] [58227.368028] [<ffffffff8109d517>] ? kthread+0xc7/0xe0 [58227.368028] [<ffffffff8109d450>] ? kthread_park+0x50/0x50 [58227.368028] [<ffffffff8160c5df>] ? ret_from_fork+0x3f/0x70 [58227.372028] [<ffffffff8109d450>] ? kthread_park+0x50/0x50 [ 0.004000] do_IRQ: 0.113 No irq handler for vector [ 1.952054] RPC: Registered named UNIX socket transport module. [ 1.952618] RPC: Registered udp transport module. [ 1.953002] RPC: Registered tcp transport module. [ 1.953367] RPC: Registered tcp NFSv4.1 backchannel transport module. [ 2.282228] virtio-pci 0000:00:04.0: virtio_pci: leaving for legacy driver [ 2.295427] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10 [ 2.295913] virtio-pci 0000:00:05.0: virtio_pci: leaving for legacy driver [ 2.298732] 8139cp: 8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004) [ 2.299499] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11 [ 2.302057] 8139cp 0000:00:03.0 eth0: RTL-8139C+ at 0xffffc90000162000, 00:16:3e:09:52:62, IRQ 11 [ 2.311616] [drm] Initialized drm 1.1.0 20060810 [ 2.314046] ACPI: bus type USB registered [ 2.314398] usbcore: registered new interface driver usbfs [ 2.314841] usbcore: registered new interface driver hub [ 2.315266] usbcore: registered new device driver usb [ 2.341844] 8139too: 8139too Fast Ethernet driver 0.9.28 [ 2.344051] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver [ 2.345725] uhci_hcd: USB Universal Host Controller Interface driver [ 2.346424] uhci_hcd 0000:00:01.2: UHCI Host Controller [ 2.346827] uhci_hcd 0000:00:01.2: new USB bus registered, assigned bus number 1 [ 2.347397] uhci_hcd 0000:00:01.2: detected 2 ports [ 2.347838] uhci_hcd 0000:00:01.2: irq 11, io base 0x0000c020 [ 2.385237] usb usb1: New USB device found, idVendor=1d6b, idProduct=0001 [ 2.385862] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [ 2.386457] usb usb1: Product: UHCI Host Controller [ 2.386879] usb usb1: Manufacturer: Linux 4.4.103-6.38-default uhci_hcd [ 2.387428] usb usb1: SerialNumber: 0000:00:01.2 [ 2.388659] tsc: Refined TSC clocksource calibration: 2693.388 MHz [ 2.389149] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x26d2d831a2d, max_idle_ns: 440795227913 ns [ 2.409136] hub 1-0:1.0: USB hub found [ 2.409509] hub 1-0:1.0: 2 ports detected [ 2.428809] [TTM] Zone kernel: Available graphics memory: 54494 kiB [ 2.429320] [TTM] Initializing pool allocator [ 2.429742] [TTM] Initializing DMA pool allocator [ 2.430300] [drm] fb mappable at 0xF0000000 [ 2.430704] [drm] vram aper at 0xF0000000 [ 2.431011] [drm] size 33554432 [ 2.431261] [drm] fb depth is 24 [ 2.431553] [drm] pitch is 3072 [ 2.432285] fbcon: cirrusdrmfb (fb0) is primary device [ 2.470856] Console: switching to colour frame buffer device 128x48 [ 2.580649] FDC 0 is a S82078B [ 2.612840] cirrus 0000:00:02.0: fb0: cirrusdrmfb frame buffer device [ 2.612849] [drm] Initialized cirrus 1.0.0 20110418 for 0000:00:02.0 on minor 0 [ 2.630852] vda: vda1 vda2 vda4 [ 2.646992] scsi host0: ata_piix [ 2.656024] scsi host1: ata_piix [ 2.656384] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14 [ 2.656964] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15 [ 2.854238] 8139cp 0000:00:03.0 eth0: link up, 100Mbps, full-duplex, lpa 0x05E1 [ 2.862901] dracut-initqueue[282]: Notice: eth0: Request to acquire DHCPv4 lease with UUID dac7545a-dec2-0200-6c01-000001000000 [ 3.328071] NET: Registered protocol family 17 [ 3.931434] dracut-initqueue[282]: Notice: eth0: Committed DHCPv4 lease with address 10.9.4.76 (lease time 21600 sec, renew in 10800 sec, rebind in 18900 sec) [ 3.935582] dracut-initqueue[282]: RTNETLINK answers: File exists [ 3.938367] dracut-initqueue[282]: Error: inet prefix is expected rather than "0.255.255.0[0]/0.255.255.0[1]". [ 3.940590] dracut-initqueue[282]: Error: inet prefix is expected rather than "9.0.0.0[0]/9.0.0.0[1]". [ 4.969557] EXT4-fs (vda1): mounting ext3 file system using the ext4 subsystem [ 4.974510] EXT4-fs (vda1): mounted filesystem with ordered data mode. Opts: acl,user_xattr Unable to ioctl(KDSETLED) -- are you not on the console? (Inappropriate ioctl for device) Deletion of old dump only on local disk. [ 5.015788] FS-Cache: Loaded [ 5.026295] FS-Cache: Netfs 'nfs' registered for caching [ 5.030385] Key type dns_resolver registered [ 5.042219] NFS: Registering the id_resolver key type [ 5.042665] Key type id_resolver registered [ 5.043012] Key type id_legacy registered Extracting dmesg
Attachments
Issue Links
- mentioned in
-
Page Loading...