  1. Lustre
  2. LU-12180

Crash when routerstat is running during lustre_rmmod

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.1
    • Labels: None
    • Environment: Several multi-rail systems running lnet self test
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      I've seen this crash several times when "routerstat 1" is running in one window while I run lustre_rmmod.  It could be coincidence, but I didn't see it when running 2.10.7.  I'm running commit 4e737a6a8a0f75425255c21eb95e43d9a950193b as the head of the b2_12 branch.

       

      This is in a multi-rail environment.

       

      [ 1518.289981] BUG: unable to handle kernel paging request at fffffffffffffff0
       [ 1518.298117] IP: [<ffffffffc0c1c5c6>] cfs_percpt_number+0x6/0x10 [libcfs]
       [ 1518.305955] PGD 459c814067 PUD 459c816067 PMD 0 
       [ 1518.311665] Oops: 0000 [#1] SMP 
       [ 1518.315955] Modules linked in: lnet_selftest(OE-) ksocklnd(OE) ko2iblnd(OE) obdclass(OE) lnet(OE) libcfs(OE) ib_ipoib xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_mod target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ucm rpcrdma sunrpc rdma_ucm ib_umad ib_uverbs ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_cm skx_edac nfit libnvdimm intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel ipmi_ssif aesni_intel lrw gf128mul
       [ 1518.394242] glue_helper ablk_helper cryptd ses enclosure pcspkr sg ipmi_si ipmi_devintf ipmi_msghandler hpilo mlx5_ib hpwdt ib_core mei_me lpc_ich mei wmi ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic uas usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops serio_raw ttm crct10dif_pclmul crct10dif_common crc32c_intel mlx5_core drm smartpqi scsi_transport_sas mlxfw drm_panel_orientation_quirks devlink tg3 ptp pps_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: lnet_selftest]
       [ 1518.450540] CPU: 19 PID: 50662 Comm: routerstat Kdump: loaded Tainted: G OE ------------ 3.10.0-957.5.1.el7.x86_64 #1
       [ 1518.465721] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 10/02/2018
       [ 1518.475984] task: ffff9bc19df2d140 ti: ffff9bc11bed0000 task.ti: ffff9bc11bed0000
       [ 1518.485188] RIP: 0010:[<ffffffffc0c1c5c6>] [<ffffffffc0c1c5c6>] cfs_percpt_number+0x6/0x10 [libcfs]
       [ 1518.496089] RSP: 0018:ffff9bc11bed3db0 EFLAGS: 00010296
       [ 1518.503102] RAX: 0000000000000004 RBX: ffff9ba85e34eb00 RCX: 0000000000000000
       [ 1518.511930] RDX: 0000000000000001 RSI: 00000000ffffffff RDI: 0000000000000000
       [ 1518.520751] RBP: ffff9bc11bed3dd0 R08: 000000000001f120 R09: ffff9b633fc03700
       [ 1518.529564] R10: ffffffffc0c999cb R11: 0000000000000246 R12: 0000000000000000
       [ 1518.538349] R13: 0000000000000000 R14: 0000000000000300 R15: ffff9ba85e34eb00
       [ 1518.547109] FS: 00007f1f47adc740(0000) GS:ffff9ba99fac0000(0000) knlGS:0000000000000000
       [ 1518.556828] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [ 1518.564166] CR2: fffffffffffffff0 CR3: 00000046a18d0000 CR4: 00000000007607e0
       [ 1518.572891] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       [ 1518.581607] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       [ 1518.590306] PKRU: 55555554
       [ 1518.594523] Call Trace:
       [ 1518.598455] [<ffffffffc0c6c96b>] ? lnet_counters_get_common+0xeb/0x150 [lnet]
       [ 1518.607204] [<ffffffffc0c6ca3c>] lnet_counters_get+0x6c/0x150 [lnet]
       [ 1518.615162] [<ffffffffc0c99a0b>] __proc_lnet_stats+0xfb/0x810 [lnet]
       [ 1518.623081] [<ffffffffc0c09602>] lprocfs_call_handler+0x22/0x50 [libcfs]
       [ 1518.631332] [<ffffffffc0c98ee5>] proc_lnet_stats+0x25/0x30 [lnet]
       [ 1518.638962] [<ffffffffc0c0965d>] lnet_debugfs_read+0x2d/0x40 [libcfs]
       [ 1518.646929] [<ffffffffa46414bf>] vfs_read+0x9f/0x170
       [ 1518.653382] [<ffffffffa464237f>] SyS_read+0x7f/0xf0
       [ 1518.659717] [<ffffffffa4b74ddb>] system_call_fastpath+0x22/0x27
       [ 1518.667084] Code: 85 03 00 50 4b c5 c0 c7 05 cc 85 03 00 00 00 02 00 e8 ff f8 fe ff 8b 1d 89 67 01 00 e9 e8 f9 ff ff 0f 1f 40 00 0f 1f 44 00 00 55 <8b> 47 f0 48 89 e5 5d c3 66 90 0f 1f 44 00 00 55 48 89 e5 41 57 
       [ 1518.689553] RIP [<ffffffffc0c1c5c6>] cfs_percpt_number+0x6/0x10 [libcfs]
       [ 1518.697778] RSP <ffff9bc11bed3db0>
       [ 1518.702661] CR2: fffffffffffffff0
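
      A note on reading the trace, offered as an interpretation rather than a confirmed diagnosis: the marked instruction bytes in the Code line, `<8b> 47 f0`, decode to `mov -0x10(%rdi),%eax`, and RDI is 0 in the register dump. A zero pointer minus 0x10 wraps to the CR2 value, which would be consistent with cfs_percpt_number being handed a NULL pointer, possibly because the counters were already freed by lustre_rmmod. The sketch below only checks that arithmetic against the values in this report; the helper names `parse_registers` and `faulting_address` are hypothetical and not part of any Lustre tool.

```python
import re

# Two lines copied verbatim from the oops in this report.
OOPS = """\
[ 1518.511930] RDX: 0000000000000001 RSI: 00000000ffffffff RDI: 0000000000000000
[ 1518.564166] CR2: fffffffffffffff0 CR3: 00000046a18d0000 CR4: 00000000007607e0
"""


def parse_registers(text):
    """Return {register name: value} for every 'NAME: <16 hex digits>' pair."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]+): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def faulting_address(base, disp8):
    """Address touched by 'mov disp8(%reg),%eax' on x86_64.

    disp8 is the raw displacement byte from the instruction encoding
    (0xf0 here); it is sign-extended, then added to the base register
    with 64-bit wraparound.
    """
    signed = disp8 - 0x100 if disp8 >= 0x80 else disp8
    return (base + signed) % 2**64


regs = parse_registers(OOPS)
addr = faulting_address(regs["RDI"], 0xF0)
print(f"{addr:016x}", addr == regs["CR2"])  # fffffffffffffff0 True
```

      The match between the computed address and CR2 says only that the pointer argument was NULL at the time of the read; it does not by itself show which teardown step cleared it.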
      

          People

            Assignee: wc-triage WC Triage
            Reporter: jfilizetti Jeremy Filizetti
