Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-10489

parallel-scale test_compilebench: Kernel panic - not syncing: softlockup: hung tasks

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.10.3
    • None
    • 3
    • 9223372036854775807

    Description

      parallel-scale test_compilebench - Timeout occurred after 1194 mins, last suite running was parallel-scale, restarting cluster to continue tests
      ^^^^^^^^^^^^^ DO NOT REMOVE LINE ABOVE ^^^^^^^^^^^^^

      This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

      This issue relates to the following test suite run:
      https://testing.hpdd.intel.com/test_sets/ba628f5c-f572-11e7-a169-52540065bddc

      test_compilebench failed with the following error:

      Timeout occurred after 1194 mins, last suite running was parallel-scale, restarting cluster to continue tests
      

      client and server: 2.10.3 RC1 SLES12sp3
      client console

      [57326.246102] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == parallel-scale test compilebench: compilebench ==================================================== 05:29:45 \(1515504585\)
      [57326.294735] Lustre: DEBUG MARKER: == parallel-scale test compilebench: compilebench ==================================================== 05:29:45 (1515504585)
      [57326.393164] Lustre: DEBUG MARKER: /usr/sbin/lctl mark .\/compilebench -D \/mnt\/lustre\/d0.compilebench.16591 -i 2         -r 2 --makej
      [57326.442340] Lustre: DEBUG MARKER: ./compilebench -D /mnt/lustre/d0.compilebench.16591 -i 2 -r 2 --makej
      [57609.756062] Lustre: 17431:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1515504859/real 1515504859]  req@ffff88003bc056c0 x1589121722362864/t0(0) o4->lustre-OST0000-osc-ffff8800645e9800@10.9.4.78@tcp:6/4 lens 608/448 e 0 to 1 dl 1515504868 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
      [57609.756084] Lustre: lustre-OST0000-osc-ffff8800645e9800: Connection to lustre-OST0000 (at 10.9.4.78@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [57610.366210] Lustre: lustre-OST0000-osc-ffff8800645e9800: Connection restored to 10.9.4.78@tcp (at 10.9.4.78@tcp)
      [58151.162493] python (19843): drop_caches: 3
      [58213.476016] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [ptlrpcd_00_01:17432]
      [58213.480015] Modules linked in: lustre(OEN) obdecho(OEN) mgc(OEN) lov(OEN) osc(OEN) mdc(OEN) lmv(OEN) fid(OEN) fld(OEN) ptlrpc_gss(OEN) ptlrpc(OEN) obdclass(OEN) ksocklnd(OEN) lnet(OEN) libcfs(OEN) brd loop rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache af_packet iscsi_boot_sysfs rpcrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm configfs ib_cm iw_cm ib_core crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel jitterentropy_rng drbg ansi_cprng ppdev aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd 8139too pcspkr 8139cp joydev mii virtio_balloon i2c_piix4 parport_pc parport pvpanic processor button ata_generic ext4 crc16 jbd2 mbcache virtio_blk floppy ata_piix ahci libahci cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm uhci_hcd drm serio_raw ehci_hcd usbcore usb_common libata virtio_pci virtio_ring virtio sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4 [last unloaded: libcfs]
      [58214.120024] Supported: No, Unsupported modules are loaded
      [58214.996031] CPU: 1 PID: 17432 Comm: ptlrpcd_00_01 Tainted: G        W  OE   N  4.4.103-6.38-default #1
      [58215.000035] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
      [58215.004038] task: ffff880036869300 ti: ffff880079ab4000 task.ti: ffff880079ab4000
      [58215.008032] RIP: 0010:[<ffffffff813304c3>]  [<ffffffff813304c3>] memset_orig+0x33/0xb0
      [58215.904020] RSP: 0018:ffff880079ab7b90  EFLAGS: 00010216
      [58215.908041] RAX: 0000000000000000 RBX: 000000000000036e RCX: 000000000000003f
      [58215.912032] RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff88007787f000
      [58215.912032] RBP: ffff88006415c180 R08: ffff88005c798000 R09: 0000000000000000
      [58215.916032] R10: ffff88007787f000 R11: 0000000000000000 R12: ffff8800632b34e0
      [58215.916032] R13: ffff88005c798000 R14: ffff880028b69c40 R15: 0000000000000093
      [58215.920032] FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
      [58215.920032] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [58215.924035] CR2: 00007f2676683000 CR3: 000000007b288000 CR4: 0000000000060670
      [58215.928031] Stack:
      [58215.928031]  ffffffffa0d49102 0000000000000000 0000000000000000 0000000000000000
      [58215.932032]  0000000000000000 0000000000000000 0000000000000000 0000000000000006
      [58215.932032]  0000000000000000 0000000000000000 0000000000728835 ffff88006415c000
      [58215.936031] Call Trace:
      [58224.408020] Inexact backtrace:
      [58224.408020] 
      [58224.416020]  [<ffffffffa0d49102>] ? osc_brw_fini_request+0x942/0x12a0 [osc]
      [58224.420021]  [<ffffffffa0b9c3f5>] ? at_measured+0xf5/0x310 [ptlrpc]
      [58224.424021]  [<ffffffffa0b9c3f5>] ? at_measured+0xf5/0x310 [ptlrpc]
      [58224.428024]  [<ffffffffa0d4bb75>] ? brw_interpret+0x45/0xdd0 [osc]
      [58224.432021]  [<ffffffffa0b758cd>] ? after_reply+0x8ad/0xe40 [ptlrpc]
      [58224.432021]  [<ffffffffa0b7624f>] ? ptlrpc_check_set.part.23+0x3ef/0x1d70 [ptlrpc]
      [58225.700023]  [<ffffffff810f2c05>] ? try_to_del_timer_sync+0x45/0x60
      [58225.704016]  [<ffffffffa0ba126a>] ? ptlrpcd_check+0x39a/0x580 [ptlrpc]
      [58225.704016]  [<ffffffffa0ba1688>] ? ptlrpcd+0x238/0x490 [ptlrpc]
      [58226.220016]  [<ffffffff810a9b40>] ? wake_up_q+0x80/0x80
      [58226.224017]  [<ffffffffa0ba1450>] ? ptlrpcd_check+0x580/0x580 [ptlrpc]
      [58226.224017]  [<ffffffff8109d517>] ? kthread+0xc7/0xe0
      [58226.228065]  [<ffffffff8109d450>] ? kthread_park+0x50/0x50
      [58226.228065]  [<ffffffff8160c5df>] ? ret_from_fork+0x3f/0x70
      [58226.232034]  [<ffffffff8109d450>] ? kthread_park+0x50/0x50
      [58226.232034] Code: b8 01 01 01 01 01 01 01 01 48 0f af c1 41 89 f9 41 83 e1 07 75 70 48 89 d1 48 c1 e9 06 74 39 66 0f 1f 84 00 00 00 00 00 48 ff c9 <48> 89 07 48 89 47 08 48 89 47 10 48 89 47 18 48 89 47 20 48 89 
      [58226.236032] Kernel panic - not syncing: softlockup: hung tasks
      [58226.240037] CPU: 1 PID: 17432 Comm: ptlrpcd_00_01 Tainted: G        W  OEL  N  4.4.103-6.38-default #1
      [58226.240037] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
      [58226.244032]  0000000000000000 ffffffff813228f0 ffffffff81a22fba ffff88007fd03ed0
      [58226.244032]  ffffffff8118d8d1 0000000000000008 ffff88007fd03ee0 ffff88007fd03e80
      [58226.248033]  0000000000000000 0000000000000000 0000000000000000 0000000000000006
      [58226.248033] Call Trace:
      [58226.280016]  [<ffffffff81019b19>] dump_trace+0x59/0x310
      [58226.280016]  [<ffffffff81019eba>] show_stack_log_lvl+0xea/0x170
      [58226.284037]  [<ffffffff8101ac41>] show_stack+0x21/0x40
      [58226.284037]  [<ffffffff813228f0>] dump_stack+0x5c/0x7c
      [58226.732016]  [<ffffffff8118d8d1>] panic+0xd2/0x220
      [58226.944038]  [<ffffffff81140ff9>] watchdog_timer_fn+0x1d9/0x1e0
      [58227.300030]  [<ffffffff810f580a>] __hrtimer_run_queues+0xea/0x260
      [58227.340030]  [<ffffffff810f5c49>] hrtimer_interrupt+0x99/0x190
      [58227.340030]  [<ffffffff8160f959>] smp_apic_timer_interrupt+0x39/0x50
      [58227.340030]  [<ffffffff8160d199>] apic_timer_interrupt+0xc9/0xd0
      [58227.344029] DWARF2 unwinder stuck at apic_timer_interrupt+0xc9/0xd0
      [58227.344029] 
      [58227.348028] Leftover inexact backtrace:
      [58227.348028] 
      [58227.348028]  <IRQ>  <EOI>  [<ffffffff813304c3>] ? memset_orig+0x33/0xb0
      [58227.352032]  [<ffffffffa0d49102>] ? osc_brw_fini_request+0x942/0x12a0 [osc]
      [58227.352032]  [<ffffffffa0b9c3f5>] ? at_measured+0xf5/0x310 [ptlrpc]
      [58227.352032]  [<ffffffffa0b9c3f5>] ? at_measured+0xf5/0x310 [ptlrpc]
      [58227.356029]  [<ffffffffa0d4bb75>] ? brw_interpret+0x45/0xdd0 [osc]
      [58227.356029]  [<ffffffffa0b758cd>] ? after_reply+0x8ad/0xe40 [ptlrpc]
      [58227.360029]  [<ffffffffa0b7624f>] ? ptlrpc_check_set.part.23+0x3ef/0x1d70 [ptlrpc]
      [58227.360029]  [<ffffffff810f2c05>] ? try_to_del_timer_sync+0x45/0x60
      [58227.360029]  [<ffffffffa0ba126a>] ? ptlrpcd_check+0x39a/0x580 [ptlrpc]
      [58227.364029]  [<ffffffffa0ba1688>] ? ptlrpcd+0x238/0x490 [ptlrpc]
      [58227.364029]  [<ffffffff810a9b40>] ? wake_up_q+0x80/0x80
      [58227.364029]  [<ffffffffa0ba1450>] ? ptlrpcd_check+0x580/0x580 [ptlrpc]
      [58227.368028]  [<ffffffff8109d517>] ? kthread+0xc7/0xe0
      [58227.368028]  [<ffffffff8109d450>] ? kthread_park+0x50/0x50
      [58227.368028]  [<ffffffff8160c5df>] ? ret_from_fork+0x3f/0x70
      [58227.372028]  [<ffffffff8109d450>] ? kthread_park+0x50/0x50
      [    0.004000] do_IRQ: 0.113 No irq handler for vector
      [    1.952054] RPC: Registered named UNIX socket transport module.
      [    1.952618] RPC: Registered udp transport module.
      [    1.953002] RPC: Registered tcp transport module.
      [    1.953367] RPC: Registered tcp NFSv4.1 backchannel transport module.
      [    2.282228] virtio-pci 0000:00:04.0: virtio_pci: leaving for legacy driver
      [    2.295427] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
      [    2.295913] virtio-pci 0000:00:05.0: virtio_pci: leaving for legacy driver
      [    2.298732] 8139cp: 8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004)
      [    2.299499] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
      [    2.302057] 8139cp 0000:00:03.0 eth0: RTL-8139C+ at 0xffffc90000162000, 00:16:3e:09:52:62, IRQ 11
      [    2.311616] [drm] Initialized drm 1.1.0 20060810
      [    2.314046] ACPI: bus type USB registered
      [    2.314398] usbcore: registered new interface driver usbfs
      [    2.314841] usbcore: registered new interface driver hub
      [    2.315266] usbcore: registered new device driver usb
      [    2.341844] 8139too: 8139too Fast Ethernet driver 0.9.28
      [    2.344051] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
      [    2.345725] uhci_hcd: USB Universal Host Controller Interface driver
      [    2.346424] uhci_hcd 0000:00:01.2: UHCI Host Controller
      [    2.346827] uhci_hcd 0000:00:01.2: new USB bus registered, assigned bus number 1
      [    2.347397] uhci_hcd 0000:00:01.2: detected 2 ports
      [    2.347838] uhci_hcd 0000:00:01.2: irq 11, io base 0x0000c020
      [    2.385237] usb usb1: New USB device found, idVendor=1d6b, idProduct=0001
      [    2.385862] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
      [    2.386457] usb usb1: Product: UHCI Host Controller
      [    2.386879] usb usb1: Manufacturer: Linux 4.4.103-6.38-default uhci_hcd
      [    2.387428] usb usb1: SerialNumber: 0000:00:01.2
      [    2.388659] tsc: Refined TSC clocksource calibration: 2693.388 MHz
      [    2.389149] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x26d2d831a2d, max_idle_ns: 440795227913 ns
      [    2.409136] hub 1-0:1.0: USB hub found
      [    2.409509] hub 1-0:1.0: 2 ports detected
      [    2.428809] [TTM] Zone  kernel: Available graphics memory: 54494 kiB
      [    2.429320] [TTM] Initializing pool allocator
      [    2.429742] [TTM] Initializing DMA pool allocator
      [    2.430300] [drm] fb mappable at 0xF0000000
      [    2.430704] [drm] vram aper at 0xF0000000
      [    2.431011] [drm] size 33554432
      [    2.431261] [drm] fb depth is 24
      [    2.431553] [drm]    pitch is 3072
      [    2.432285] fbcon: cirrusdrmfb (fb0) is primary device
      [    2.470856] Console: switching to colour frame buffer device 128x48
      [    2.580649] FDC 0 is a S82078B
      [    2.612840] cirrus 0000:00:02.0: fb0: cirrusdrmfb frame buffer device
      [    2.612849] [drm] Initialized cirrus 1.0.0 20110418 for 0000:00:02.0 on minor 0
      [    2.630852]  vda: vda1 vda2 vda4
      [    2.646992] scsi host0: ata_piix
      [    2.656024] scsi host1: ata_piix
      [    2.656384] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14
      [    2.656964] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15
      [    2.854238] 8139cp 0000:00:03.0 eth0: link up, 100Mbps, full-duplex, lpa 0x05E1
      [    2.862901] dracut-initqueue[282]: Notice: eth0: Request to acquire DHCPv4 lease with UUID dac7545a-dec2-0200-6c01-000001000000
      [    3.328071] NET: Registered protocol family 17
      [    3.931434] dracut-initqueue[282]: Notice: eth0: Committed DHCPv4 lease with address 10.9.4.76 (lease time 21600 sec, renew in 10800 sec, rebind in 18900 sec)
      [    3.935582] dracut-initqueue[282]: RTNETLINK answers: File exists
      [    3.938367] dracut-initqueue[282]: Error: inet prefix is expected rather than "0.255.255.0[0]/0.255.255.0[1]".
      [    3.940590] dracut-initqueue[282]: Error: inet prefix is expected rather than "9.0.0.0[0]/9.0.0.0[1]".
      [    4.969557] EXT4-fs (vda1): mounting ext3 file system using the ext4 subsystem
      [    4.974510] EXT4-fs (vda1): mounted filesystem with ordered data mode. Opts: acl,user_xattr
      Unable to ioctl(KDSETLED) -- are you not on the console? (Inappropriate ioctl for device)
      Deletion of old dump only on local disk.
      [    5.015788] FS-Cache: Loaded
      [    5.026295] FS-Cache: Netfs 'nfs' registered for caching
      [    5.030385] Key type dns_resolver registered
      [    5.042219] NFS: Registering the id_resolver key type
      [    5.042665] Key type id_resolver registered
      [    5.043012] Key type id_legacy registered
      Extracting dmesg
      

      Attachments

        Issue Links

          Activity

            People

              wc-triage WC Triage
              maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated: