[LU-11439] recovery-random-scale test_fail_client_mds: Kernel panic - not syncing: softlockup: hung tasks
Created: 27/Sep/18  Updated: 15/Oct/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0, Lustre 2.12.3
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

Description

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/141663f0-b8a7-11e8-9df3-52540065bddc

test_fail_client_mds failed with the following error:

trevis-44vm3 crashed during recovery-random-scale test_fail_client_mds

server: RHEL7 tag-2.11.55
client: SLES12sp3

[  106.405377] Lustre: DEBUG MARKER: ps auxwww | grep -v grep | grep -q run_dd.sh
[  106.511253] Lustre: DEBUG MARKER: /usr/sbin/lctl mark ==== Checking the clients loads AFTER failed client reintegrated -- failure NOT OK
[  106.705497] Lustre: DEBUG MARKER: ==== Checking the clients loads AFTER failed client reintegrated -- failure NOT OK
[  107.045491] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Number of failovers:
               mds1 failed over 1 times                and counting...
[  107.239916] Lustre: DEBUG MARKER: Number of failovers:
[  169.421912] Lustre: 1786:0:(client.c:2126:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1536959605/real 1536959606]  req@ffff880055a04380 x1611619009321216/t0(0) o4->lustre-OST0004-osc-ffff88007ac80000@10.9.5.53@tcp:6/4 lens 608/448 e 1 to 1 dl 1536959646 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
[  169.421946] Lustre: lustre-OST0004-osc-ffff88007ac80000: Connection to lustre-OST0004 (at 10.9.5.53@tcp) was lost; in progress operations using this service will wait for recovery to complete
[  170.420094] Lustre: lustre-OST0004-osc-ffff88007ac80000: Connection restored to 10.9.5.53@tcp (at 10.9.5.53@tcp)
[  192.121860] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:5:411]
[  192.122615] Modules linked in: mgc(OEN) lustre(OEN) lmv(OEN) mdc(OEN) fid(OEN) osc(OEN) lov(OEN) fld(OEN) ksocklnd(OEN) ptlrpc(OEN) obdclass(OEN) lnet(OEN) libcfs(OEN) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache af_packet iscsi_ibft iscsi_boot_sysfs rpcrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm configfs ib_cm iw_cm ib_core crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel jitterentropy_rng drbg ansi_cprng aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper 8139too cryptd 8139cp pcspkr joydev virtio_balloon mii i2c_piix4 processor button ext4 crc16 jbd2 mbcache ata_generic virtio_blk ata_piix ahci libahci serio_raw floppy uhci_hcd
[  192.134149]  virtio_pci ehci_hcd virtio_ring virtio usbcore usb_common libata sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
[  192.136328] Supported: No, Unsupported modules are loaded
[  192.136848] CPU: 1 PID: 411 Comm: kworker/1:5 Tainted: G           OE   N  4.4.143-94.47-default #1
[  192.137670] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  192.138378] Workqueue: ptlrpc_pinger ptlrpc_pinger_main [ptlrpc]
[  192.139016] task: ffff88003745c140 ti: ffff88007b05c000 task.ti: ffff88007b05c000
[  192.139703] RIP: 0010:[<ffffffff816183cc>]  [<ffffffff816183cc>] mutex_lock+0xc/0x22
[  192.140507] RSP: 0018:ffff88007b05fdc0  EFLAGS: 00000246
[  192.141015] RAX: 00000000000000c0 RBX: ffffffffa0b5d5e0 RCX: 0000000000000002
[  192.141677] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffa0b5d5e0
[  192.142336] RBP: 0000000000000009 R08: 0000000000000000 R09: 0000000000000000
[  192.142989] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[  192.143640] R13: ffff8800707772d0 R14: 0000000000000000 R15: ffff880070777000
[  192.144297] FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
[  192.145035] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  192.145573] CR2: 00007fdd6e776780 CR3: 000000006fba2000 CR4: 0000000000060670
[  192.146238] Stack:
[  192.146475]  ffffffffffffffda ffffffffa0ab83c9 ffff880000000001 0000000000000000
[  192.147428]  ffffffff00000000 ffff880000000001 0000000000000000 0000000000000095
[  192.148382]  0000000100000000 ffffffffa0b31a20 ffff880037075040 ffff88007fd19d00
[  192.149336] Call Trace:
[  192.149664]  [<ffffffffa0ab83c9>] ptlrpc_pinger_main+0x29/0x840 [ptlrpc]
[  192.150339]  [<ffffffff810996c4>] process_one_work+0x154/0x410
[  192.150902]  [<ffffffff8109a2a6>] worker_thread+0x116/0x4a0
[  192.151438]  [<ffffffff8109f7c9>] kthread+0xc9/0xe0
[  192.151913]  [<ffffffff8161acc5>] ret_from_fork+0x55/0x80
[  192.155061] DWARF2 unwinder stuck at ret_from_fork+0x55/0x80
[  192.155597] 
[  192.155790] Leftover inexact backtrace:
               
[  192.156318]  [<ffffffff8109f700>] ? kthread_park+0x50/0x50
[  192.156840] Code: 41 5e 41 5f 5d c3 31 c0 87 03 83 f8 01 0f 85 55 ff ff ff eb cb 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 53 48 89 fb f0 ff 0f <79> 05 e8 dd fe ff ff 65 48 8b 04 25 80 25 01 00 48 89 43 18 5b 
[  192.162845] Kernel panic - not syncing: softlockup: hung tasks
[  192.163403] CPU: 1 PID: 411 Comm: kworker/1:5 Tainted: G           OEL  N  4.4.143-94.47-default #1
[  192.164219] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  192.164780] Workqueue: ptlrpc_pinger ptlrpc_pinger_main [ptlrpc]
[  192.165416]  0000000000000000 ffffffff8132ad80 ffffffff81a25f19 ffff88007fd03ed0
[  192.166363]  ffffffff81191f31 0000000000000008 ffff88007fd03ee0 ffff88007fd03e80
[  192.167313]  0000000000000000 0000000000000000 0000000000000000 0000000000000006
[  192.168266] Call Trace:
[  192.168557]  [<ffffffff81019ac9>] dump_trace+0x59/0x340
[  192.169069]  [<ffffffff81019e9a>] show_stack_log_lvl+0xea/0x170
[  192.169630]  [<ffffffff8101ac71>] show_stack+0x21/0x40
[  192.170144]  [<ffffffff8132ad80>] dump_stack+0x5c/0x7c
[  192.170653]  [<ffffffff81191f31>] panic+0xd2/0x232
[  192.171136]  [<ffffffff811446a9>] watchdog_timer_fn+0x1d9/0x1e0
[  192.171708]  [<ffffffff810f878c>] __hrtimer_run_queues+0xec/0x260
[  192.172293]  [<ffffffff810f8bc9>] hrtimer_interrupt+0x99/0x190
[  192.172845]  [<ffffffff8161e67f>] smp_apic_timer_interrupt+0x3f/0x60
[  192.173445]  [<ffffffff8161b96b>] apic_timer_interrupt+0xeb/0xf0
[  192.176111] DWARF2 unwinder stuck at apic_timer_interrupt+0xeb/0xf0
[  192.176706] 
[  192.176900] Leftover inexact backtrace:
               
[  192.177425]  <IRQ>  <EOI>  [<ffffffff816183cc>] ? mutex_lock+0xc/0x22
[  192.178194]  [<ffffffffa0ab83c9>] ? ptlrpc_pinger_main+0x29/0x840 [ptlrpc]
[  192.178835]  [<ffffffff810996c4>] ? process_one_work+0x154/0x410
[  192.179406]  [<ffffffff8109a2a6>] ? worker_thread+0x116/0x4a0
[  192.179954]  [<ffffffff8109a190>] ? rescuer_thread+0x320/0x320
[  192.180509]  [<ffffffff8109a190>] ? rescuer_thread+0x320/0x320
[  192.181068]  [<ffffffff8109f7c9>] ? kthread+0xc9/0xe0
[  192.181552]  [<ffffffff8109f700>] ? kthread_park+0x50/0x50
[  192.182080]  [<ffffffff8161acc5>] ? ret_from_fork+0x55/0x80
[  192.182608]  [<ffffffff8109f700>] ? kthread_park+0x50/0x50
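
To help triage similar crashes, the soft lockup and panic lines in the console log above can be extracted with a short script. This is only a sketch: the function name `summarize` and the two regular expressions are illustrative assumptions modelled on the lines quoted in this ticket. They are not part of Lustre or Maloo and may not match console output from other kernel versions.

```python
import re

# Patterns modelled on the console lines quoted in this ticket (assumption:
# other kernels may word these messages differently).
SOFT_LOCKUP = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NMI watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! \[(?P<task>.+):(?P<pid>\d+)\]"
)
PANIC = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+Kernel panic - not syncing: (?P<reason>.*)"
)


def summarize(log_text):
    """Return soft-lockup and panic events found in a console log, in order."""
    events = []
    for line in log_text.splitlines():
        m = SOFT_LOCKUP.search(line)
        if m:
            events.append((
                "soft_lockup",
                float(m["ts"]),   # kernel timestamp in seconds
                int(m["cpu"]),    # CPU reported as stuck
                int(m["secs"]),   # how long the watchdog says it was stuck
                m["task"],        # task name, e.g. a kworker
                int(m["pid"]),
            ))
            continue
        m = PANIC.search(line)
        if m:
            events.append(("panic", float(m["ts"]), m["reason"].strip()))
    return events
```

Run against the log in this ticket, it should report one soft lockup on CPU 1 for task kworker/1:5 (PID 411), followed by the panic.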

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
recovery-random-scale test_fail_client_mds - trevis-44vm3 crashed during recovery-random-scale test_fail_client_mds

