[LU-9900] conf-sanity test_46a: soft lockup on MDS when umount
Created: 22/Aug/17  Updated: 21/Mar/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for nasf <fan.yong@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/af4f08f4-866c-11e7-b3ca-5254006e85c2.

The MDS console shows a soft lockup in the umount process (PID 6400), followed by a kernel panic:

09:30:03:[19169.155819] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds4
09:30:03:[19228.288937] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 21s! [umount:6400]
09:30:03:[19228.289745] Modules linked in: lustre(OE) obdecho(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw ppdev virtio_balloon gf128mul glue_helper ablk_helper cryptd joydev pcspkr parport_pc i2c_piix4 parport nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_blk crct10dif_pclmul crct10dif_common drm 8139too crc32c_intel ata_piix serio_raw libata virtio_pci 8139cp virtio_ring virtio mii i2c_core floppy
09:30:03:[19228.289745] CPU: 1 PID: 6400 Comm: umount Tainted: G           OE  ------------   3.10.0-693.el7_lustre.x86_64 #1
09:30:03:[19228.289745] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
09:30:03:[19228.289745] task: ffff8800600fcf10 ti: ffff88007c05c000 task.ti: ffff88007c05c000
09:30:03:[19228.289745] RIP: 0010:[<ffffffff81331ba3>]  [<ffffffff81331ba3>] memset+0x33/0xb0
09:30:03:[19228.289745] RSP: 0018:ffff88007c05fb30  EFLAGS: 00010216
09:30:03:[19228.289745] RAX: 5a5a5a5a5a5a5a5a RBX: ffff88007aa60600 RCX: 00000000000023cf
09:30:03:[19228.289745] RDX: 00000000000fa400 RSI: 000000000000005a RDI: ffff88003e96b000
09:30:03:[19228.289745] RBP: ffff88007c05fb70 R08: 000000000000000a R09: 0000000000000000
09:30:03:[19228.289745] R10: ffff88003e900000 R11: 000000000000000f R12: ffff8800205d2c00
09:30:03:[19228.289745] R13: 0000000100100007 R14: ffff88001fccfc00 R15: 0000000000000008
09:30:03:[19228.289745] FS:  00007f2579d75880(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
09:30:03:[19228.289745] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
09:30:03:[19228.289745] CR2: 00007f74b5103000 CR3: 000000005faa9000 CR4: 00000000000406e0
09:30:03:[19228.289745] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
09:30:03:[19228.289745] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
09:30:03:[19228.289745] Stack:
09:30:03:[19228.289745]  ffffffffc0ade2a8 0000000000000000 ffff8800205d2c48 0000000000000001
09:30:03:[19228.289745]  ffff88007aa60600 ffffffffc0ec0ac0 ffff8800642b2fe0 ffff88003cdd47c0
09:30:03:[19228.289745]  ffff88007c05fba0 ffffffffc0adfb17 ffff88007b292e40 ffff88007b292ec0
09:30:03:[19228.289745] Call Trace:
09:30:03:[19228.289745]  [<ffffffffc0ade2a8>] ? ptlrpc_service_purge_all+0x498/0x920 [ptlrpc]
09:30:03:[19228.289745]  [<ffffffffc0adfb17>] ptlrpc_unregister_service+0xe7/0x6f0 [ptlrpc]
09:30:03:[19228.289745]  [<ffffffffc0e8643e>] mds_stop_ptlrpc_service+0x6e/0x1b0 [mdt]
09:30:03:[19228.289745]  [<ffffffffc0e865ad>] mds_device_fini+0x2d/0xe0 [mdt]
09:30:03:[19228.289745]  [<ffffffffc0898711>] class_cleanup+0x971/0xcd0 [obdclass]
09:30:03:[19228.289745]  [<ffffffffc089aaad>] class_process_config+0x19cd/0x23b0 [obdclass]
09:30:03:[19228.289745]  [<ffffffffc0693ba7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
09:30:03:[19228.289745]  [<ffffffffc089b656>] class_manual_cleanup+0x1c6/0x710 [obdclass]
09:30:03:[19228.289745]  [<ffffffffc08c3d45>] server_stop_servers+0xd5/0x160 [obdclass]
09:30:03:[19228.289745]  [<ffffffffc08c98c6>] server_put_super+0x126/0xcd0 [obdclass]
09:30:03:[19228.289745]  [<ffffffff81203712>] generic_shutdown_super+0x72/0x100
09:30:03:[19228.289745]  [<ffffffff81203ae2>] kill_anon_super+0x12/0x20
09:30:03:[19228.289745]  [<ffffffffc089df52>] lustre_kill_super+0x32/0x50 [obdclass]
09:30:03:[19228.289745]  [<ffffffff81203e99>] deactivate_locked_super+0x49/0x60
09:30:03:[19228.289745]  [<ffffffff81204606>] deactivate_super+0x46/0x60
09:30:03:[19228.289745]  [<ffffffff812216bf>] cleanup_mnt+0x3f/0x80
09:30:03:[19228.289745]  [<ffffffff81221752>] __cleanup_mnt+0x12/0x20
09:30:03:[19228.289745]  [<ffffffff810ad265>] task_work_run+0xc5/0xf0
09:30:03:[19228.289745]  [<ffffffff8102ab62>] do_notify_resume+0x92/0xb0
09:30:03:[19228.289745]  [<ffffffff816b527d>] int_signal+0x12/0x17
09:30:03:[19228.289745] Code: b8 01 01 01 01 01 01 01 01 48 0f af c1 41 89 f9 41 83 e1 07 75 70 48 89 d1 48 c1 e9 06 74 39 66 0f 1f 84 00 00 00 00 00 48 ff c9 <48> 89 07 48 89 47 08 48 89 47 10 48 89 47 18 48 89 47 20 48 89 
09:30:03:[19228.289745] Kernel panic - not syncing: softlockup: hung tasks
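
For reading the register dump above, a minimal sketch that parses the RAX..RDI lines and prints the values memset appears to be working with. It assumes the x86-64 System V calling convention (fill value in RSI, length in RDX) and that these registers still hold the original arguments at memset+0x33; neither is confirmed by the log. The `parse_registers` helper is hypothetical and not part of any Lustre or kernel tooling.

```python
import re

# Register lines copied from the console log above (timestamps stripped).
REGISTER_DUMP = """\
RAX: 5a5a5a5a5a5a5a5a RBX: ffff88007aa60600 RCX: 00000000000023cf
RDX: 00000000000fa400 RSI: 000000000000005a RDI: ffff88003e96b000
"""


def parse_registers(dump):
    """Map register names to integer values from oops-style 'REG: hex' pairs."""
    pairs = re.findall(r"\b(R[A-Z0-9]{1,2}):\s*([0-9a-f]{16})\b", dump)
    return {name: int(value, 16) for name, value in pairs}


regs = parse_registers(REGISTER_DUMP)

# Assumption: RSI/RDX still carry memset's 2nd and 3rd arguments.
fill_byte = regs["RSI"] & 0xFF
length = regs["RDX"]

print(f"fill byte: {fill_byte:#04x}")
print(f"length:    {length} bytes ({length / 1024:.0f} KiB)")
print(f"RAX:       {regs['RAX']:#018x}")
```

Under those assumptions the dump reads as a fill of roughly 1 MB with the byte 0x5a, and RAX holds that byte replicated across all eight bytes. Why a fill of that size would hold CPU#1 for 21s is not established here.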


 Comments   
Comment by nasf (Inactive) [ 14/Feb/18 ]

+1 on master:
https://testing.hpdd.intel.com/test_sets/ecc57286-1177-11e8-a10a-52540065bddc

Comment by Bruno Faccini (Inactive) [ 21/Mar/18 ]

+1 on master:

https://testing.hpdd.intel.com/test_sets/fb1ba636-2c5f-11e8-b3c6-52540065bddc

 

Generated at Sat Feb 10 02:30:18 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.