[LU-11439] recovery-random-scale test_fail_client_mds: Kernel panic - not syncing: softlockup: hung tasks
Created: 27/Sep/18 Updated: 15/Oct/19
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0, Lustre 2.12.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/141663f0-b8a7-11e8-9df3-52540065bddc

test_fail_client_mds failed with the following error:

trevis-44vm3 crashed during recovery-random-scale test_fail_client_mds

server: RHEL7 tat-2.11.55

[ 106.405377] Lustre: DEBUG MARKER: ps auxwww | grep -v grep | grep -q run_dd.sh
[ 106.511253] Lustre: DEBUG MARKER: /usr/sbin/lctl mark ==== Checking the clients loads AFTER failed client reintegrated -- failure NOT OK
[ 106.705497] Lustre: DEBUG MARKER: ==== Checking the clients loads AFTER failed client reintegrated -- failure NOT OK
[ 107.045491] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Number of failovers:
mds1 failed over 1 times and counting...
[ 107.239916] Lustre: DEBUG MARKER: Number of failovers:
[ 169.421912] Lustre: 1786:0:(client.c:2126:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1536959605/real 1536959606] req@ffff880055a04380 x1611619009321216/t0(0) o4->lustre-OST0004-osc-ffff88007ac80000@10.9.5.53@tcp:6/4 lens 608/448 e 1 to 1 dl 1536959646 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
[ 169.421946] Lustre: lustre-OST0004-osc-ffff88007ac80000: Connection to lustre-OST0004 (at 10.9.5.53@tcp) was lost; in progress operations using this service will wait for recovery to complete
[ 170.420094] Lustre: lustre-OST0004-osc-ffff88007ac80000: Connection restored to 10.9.5.53@tcp (at 10.9.5.53@tcp)
[ 192.121860] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:5:411]
[ 192.122615] Modules linked in: mgc(OEN) lustre(OEN) lmv(OEN) mdc(OEN) fid(OEN) osc(OEN) lov(OEN) fld(OEN) ksocklnd(OEN) ptlrpc(OEN) obdclass(OEN) lnet(OEN) libcfs(OEN) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache af_packet iscsi_ibft iscsi_boot_sysfs rpcrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm configfs ib_cm iw_cm ib_core crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel jitterentropy_rng drbg ansi_cprng aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper 8139too cryptd 8139cp pcspkr joydev virtio_balloon mii i2c_piix4 processor button ext4 crc16 jbd2 mbcache ata_generic virtio_blk ata_piix ahci libahci serio_raw floppy uhci_hcd
[ 192.134149] virtio_pci ehci_hcd virtio_ring virtio usbcore usb_common libata sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
[ 192.136328] Supported: No, Unsupported modules are loaded
[ 192.136848] CPU: 1 PID: 411 Comm: kworker/1:5 Tainted: G OE N 4.4.143-94.47-default #1
[ 192.137670] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 192.138378] Workqueue: ptlrpc_pinger ptlrpc_pinger_main [ptlrpc]
[ 192.139016] task: ffff88003745c140 ti: ffff88007b05c000 task.ti: ffff88007b05c000
[ 192.139703] RIP: 0010:[<ffffffff816183cc>] [<ffffffff816183cc>] mutex_lock+0xc/0x22
[ 192.140507] RSP: 0018:ffff88007b05fdc0 EFLAGS: 00000246
[ 192.141015] RAX: 00000000000000c0 RBX: ffffffffa0b5d5e0 RCX: 0000000000000002
[ 192.141677] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffa0b5d5e0
[ 192.142336] RBP: 0000000000000009 R08: 0000000000000000 R09: 0000000000000000
[ 192.142989] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 192.143640] R13: ffff8800707772d0 R14: 0000000000000000 R15: ffff880070777000
[ 192.144297] FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
[ 192.145035] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 192.145573] CR2: 00007fdd6e776780 CR3: 000000006fba2000 CR4: 0000000000060670
[ 192.146238] Stack:
[ 192.146475] ffffffffffffffda ffffffffa0ab83c9 ffff880000000001 0000000000000000
[ 192.147428] ffffffff00000000 ffff880000000001 0000000000000000 0000000000000095
[ 192.148382] 0000000100000000 ffffffffa0b31a20 ffff880037075040 ffff88007fd19d00
[ 192.149336] Call Trace:
[ 192.149664] [<ffffffffa0ab83c9>] ptlrpc_pinger_main+0x29/0x840 [ptlrpc]
[ 192.150339] [<ffffffff810996c4>] process_one_work+0x154/0x410
[ 192.150902] [<ffffffff8109a2a6>] worker_thread+0x116/0x4a0
[ 192.151438] [<ffffffff8109f7c9>] kthread+0xc9/0xe0
[ 192.151913] [<ffffffff8161acc5>] ret_from_fork+0x55/0x80
[ 192.155061] DWARF2 unwinder stuck at ret_from_fork+0x55/0x80
[ 192.155597]
[ 192.155790] Leftover inexact backtrace:
[ 192.156318] [<ffffffff8109f700>] ? kthread_park+0x50/0x50
[ 192.156840] Code: 41 5e 41 5f 5d c3 31 c0 87 03 83 f8 01 0f 85 55 ff ff ff eb cb 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 53 48 89 fb f0 ff 0f <79> 05 e8 dd fe ff ff 65 48 8b 04 25 80 25 01 00 48 89 43 18 5b
[ 192.162845] Kernel panic - not syncing: softlockup: hung tasks
[ 192.163403] CPU: 1 PID: 411 Comm: kworker/1:5 Tainted: G OEL N 4.4.143-94.47-default #1
[ 192.164219] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 192.164780] Workqueue: ptlrpc_pinger ptlrpc_pinger_main [ptlrpc]
[ 192.165416] 0000000000000000 ffffffff8132ad80 ffffffff81a25f19 ffff88007fd03ed0
[ 192.166363] ffffffff81191f31 0000000000000008 ffff88007fd03ee0 ffff88007fd03e80
[ 192.167313] 0000000000000000 0000000000000000 0000000000000000 0000000000000006
[ 192.168266] Call Trace:
[ 192.168557] [<ffffffff81019ac9>] dump_trace+0x59/0x340
[ 192.169069] [<ffffffff81019e9a>] show_stack_log_lvl+0xea/0x170
[ 192.169630] [<ffffffff8101ac71>] show_stack+0x21/0x40
[ 192.170144] [<ffffffff8132ad80>] dump_stack+0x5c/0x7c
[ 192.170653] [<ffffffff81191f31>] panic+0xd2/0x232
[ 192.171136] [<ffffffff811446a9>] watchdog_timer_fn+0x1d9/0x1e0
[ 192.171708] [<ffffffff810f878c>] __hrtimer_run_queues+0xec/0x260
[ 192.172293] [<ffffffff810f8bc9>] hrtimer_interrupt+0x99/0x190
[ 192.172845] [<ffffffff8161e67f>] smp_apic_timer_interrupt+0x3f/0x60
[ 192.173445] [<ffffffff8161b96b>] apic_timer_interrupt+0xeb/0xf0
[ 192.176111] DWARF2 unwinder stuck at apic_timer_interrupt+0xeb/0xf0
[ 192.176706]
[ 192.176900] Leftover inexact backtrace:
[ 192.177425] <IRQ> <EOI> [<ffffffff816183cc>] ? mutex_lock+0xc/0x22
[ 192.178194] [<ffffffffa0ab83c9>] ? ptlrpc_pinger_main+0x29/0x840 [ptlrpc]
[ 192.178835] [<ffffffff810996c4>] ? process_one_work+0x154/0x410
[ 192.179406] [<ffffffff8109a2a6>] ? worker_thread+0x116/0x4a0
[ 192.179954] [<ffffffff8109a190>] ? rescuer_thread+0x320/0x320
[ 192.180509] [<ffffffff8109a190>] ? rescuer_thread+0x320/0x320
[ 192.181068] [<ffffffff8109f7c9>] ? kthread+0xc9/0xe0
[ 192.181552] [<ffffffff8109f700>] ? kthread_park+0x50/0x50
[ 192.182080] [<ffffffff8161acc5>] ? ret_from_fork+0x55/0x80
[ 192.182608] [<ffffffff8109f700>] ? kthread_park+0x50/0x50
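
For triage across similar console logs, the soft lockup signature above (stuck CPU, task, workqueue, and RIP symbol) can be pulled out mechanically. The following is a minimal sketch, not part of the Maloo report: the function name `find_soft_lockups`, the dictionary keys, and the regular expressions are assumptions written against the exact line shapes seen in this log, and other kernel versions may format these lines differently.

```python
import re

# Watchdog line, e.g.
# "[ 192.121860] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:5:411]"
SOFT_LOCKUP = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NMI watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)
# "RIP: 0010:[<addr>] [<addr>] mutex_lock+0xc/0x22" -> symbol before "+0x"
RIP = re.compile(r"RIP: .*\]\s+(?P<sym>[\w.]+)\+0x")
# "Workqueue: ptlrpc_pinger ptlrpc_pinger_main [ptlrpc]"
WORKQUEUE = re.compile(r"Workqueue: (?P<wq>\S+) (?P<fn>\S+)")


def find_soft_lockups(log_text):
    """Return one dict per soft lockup report found in a console log.

    Hypothetical helper: Workqueue and RIP lines are attached to the most
    recent soft lockup line, and only the first occurrence of each is kept.
    """
    events = []
    for line in log_text.splitlines():
        m = SOFT_LOCKUP.search(line)
        if m:
            events.append({
                "time": float(m.group("ts")),
                "cpu": int(m.group("cpu")),
                "stuck_secs": int(m.group("secs")),
                "comm": m.group("comm"),
                "pid": int(m.group("pid")),
            })
            continue
        if not events:
            continue  # ignore everything before the first lockup report
        wq = WORKQUEUE.search(line)
        if wq:
            events[-1].setdefault("workqueue", wq.group("wq"))
            events[-1].setdefault("work_fn", wq.group("fn"))
        rip = RIP.search(line)
        if rip:
            events[-1].setdefault("rip", rip.group("sym"))
    return events


# Three lines copied from the console log in this ticket.
SAMPLE = """\
[ 192.121860] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:5:411]
[ 192.138378] Workqueue: ptlrpc_pinger ptlrpc_pinger_main [ptlrpc]
[ 192.139703] RIP: 0010:[<ffffffff816183cc>] [<ffffffff816183cc>] mutex_lock+0xc/0x22
"""

if __name__ == "__main__":
    for event in find_soft_lockups(SAMPLE):
        print(event)
```

Run against the full console log, this would report a single event for `kworker/1:5` (PID 411) on CPU 1, with `ptlrpc_pinger_main` as the work function and `mutex_lock` as the RIP symbol, which is the combination to search for when looking for duplicate tickets.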
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV