Lustre / LU-10390

MGS crashes in ldlm_reprocess_queue() when stopping

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.2
    • Labels: None
    • Environment: CentOS 7.4
    • Severity: 3

    Description

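      This was never seen before, but it has already happened twice with the latest version, 2.10.2: the MGS crashes while being stopped, with a NULL pointer dereference in ldlm_process_plain_lock() called from ldlm_reprocess_queue():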

      [77225.855547] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
      [77225.864304] IP: [<ffffffffc0ba48ce>] ldlm_process_plain_lock+0x6e/0xb30 [ptlrpc]
      [77225.872614] PGD 0 
      [77225.874864] Oops: 0000 [#1] SMP 
      [77225.878480] Modules linked in: mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) vfat fat uas usb_storage mpt2sas mptctl mptbase dell_rbu rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core sb_edac edac_core intel_powerclamp dm_service_time coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper iTCO_wdt ablk_helper iTCO_vendor_support cryptd dm_round_robin pcspkr mxm_wmi dcdbas sg ipmi_si ipmi_devintf ipmi_msghandler mei_me mei lpc_ich shpchp acpi_power_meter wmi nfsd auth_rpcgss nfs_acl lockd grace dm_multipath sunrpc dm_mod ip_tables ext4 mbcache
      [77225.957975]  jbd2 sd_mod crc_t10dif crct10dif_generic mlx4_en i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx4_core drm tg3 ahci libahci crct10dif_pclmul mpt3sas crct10dif_common raid_class ptp crc32c_intel libata megaraid_sas i2c_core devlink scsi_transport_sas pps_core
      [77225.986851] CPU: 23 PID: 27105 Comm: ldlm_bl_14 Tainted: G           OE  ------------   3.10.0-693.2.2.el7_lustre.pl1.x86_64 #1
      [77225.999661] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 2.3.4 11/08/2016
      [77226.008010] task: ffff88103323cf10 ti: ffff881012ab8000 task.ti: ffff881012ab8000
      [77226.016359] RIP: 0010:[<ffffffffc0ba48ce>]  [<ffffffffc0ba48ce>] ldlm_process_plain_lock+0x6e/0xb30 [ptlrpc]
      [77226.027355] RSP: 0018:ffff881012abbbe0  EFLAGS: 00010287
      [77226.033280] RAX: 0000000000000000 RBX: ffff881011f7d400 RCX: ffff881012abbc7c
      [77226.041240] RDX: 0000000000000002 RSI: ffff881012abbc80 RDI: ffff881011f7d400
      [77226.049201] RBP: ffff881012abbc58 R08: ffff881012abbcd0 R09: ffff88103d0d7880
      [77226.057162] R10: ffff881011f7d400 R11: 7fffffffffffffff R12: ffff880168287540
      [77226.065123] R13: 0000000000000002 R14: ffff881012abbcd0 R15: ffff881011f7d460
      [77226.073085] FS:  0000000000000000(0000) GS:ffff88203c8c0000(0000) knlGS:0000000000000000
      [77226.082111] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [77226.088521] CR2: 000000000000001c CR3: 00000000019f2000 CR4: 00000000001407e0
      [77226.096482] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [77226.104443] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [77226.112403] Stack:
      [77226.114642]  ffff881012abbc7c ffff881012abbcd0 ffff881012abbc80 0000000000000000
      [77226.122930]  ffff880168287520 0000001000000001 ffff880100000010 ffff881012abbc18
      [77226.131219]  ffff881012abbc18 00000000bd734aeb 0000000000000002 ffff880168287540
      [77226.139507] Call Trace:
      [77226.142252]  [<ffffffffc0ba4860>] ? ldlm_errno2error+0x60/0x60 [ptlrpc]
      [77226.149649]  [<ffffffffc0b8f9db>] ldlm_reprocess_queue+0x13b/0x2a0 [ptlrpc]
      [77226.157434]  [<ffffffffc0b9057d>] __ldlm_reprocess_all+0x14d/0x3a0 [ptlrpc]
      [77226.165220]  [<ffffffffc0b90b30>] ldlm_reprocess_res+0x20/0x30 [ptlrpc]
      [77226.172611]  [<ffffffffc0866bef>] cfs_hash_for_each_relax+0x21f/0x400 [libcfs]
      [77226.180687]  [<ffffffffc0b90b10>] ? ldlm_lock_downgrade+0x320/0x320 [ptlrpc]
      [77226.188571]  [<ffffffffc0b90b10>] ? ldlm_lock_downgrade+0x320/0x320 [ptlrpc]
      [77226.196441]  [<ffffffffc0869d95>] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
      [77226.204518]  [<ffffffffc0b90b7c>] ldlm_reprocess_recovery_done+0x3c/0x110 [ptlrpc]
      [77226.212983]  [<ffffffffc0b917bc>] ldlm_export_cancel_locks+0x11c/0x130 [ptlrpc]
      [77226.221162]  [<ffffffffc0bbada8>] ldlm_bl_thread_main+0x4c8/0x700 [ptlrpc]
      [77226.228836]  [<ffffffff816a8fad>] ? __schedule+0x39d/0x8b0
      [77226.234977]  [<ffffffffc0bba8e0>] ? ldlm_handle_bl_callback+0x410/0x410 [ptlrpc]
      [77226.243232]  [<ffffffff810b098f>] kthread+0xcf/0xe0
      [77226.248672]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
      [77226.255472]  [<ffffffff816b4f58>] ret_from_fork+0x58/0x90
      [77226.261494]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
      [77226.268292] Code: 89 45 a0 74 0d f6 05 b3 ac cd ff 01 0f 85 34 06 00 00 8b 83 98 00 00 00 39 83 9c 00 00 00 89 45 b8 0f 84 57 09 00 00 48 8b 45 a0 <8b> 40 1c 85 c0 0f 84 7a 09 00 00 48 8b 4d a0 48 89 c8 48 83 c0 
      [77226.289871] RIP  [<ffffffffc0ba48ce>] ldlm_process_plain_lock+0x6e/0xb30 [ptlrpc]
      [77226.298248]  RSP <ffff881012abbbe0>
      [77226.302135] CR2: 000000000000001c
      

      Best,
      Stephane
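
For triage, one way to sanity-check the oops above is to decode the faulting instruction from the "Code:" line and compare it with the register dump. The Python sketch below is illustrative only: the helper names `parse_oops` and `decode_mov_disp8` are hypothetical, the decoder handles only the single instruction form that appears here (opcode 8b with a base register plus 8-bit displacement), and the input is just the three relevant lines copied from the log. It shows that the fault address 0x1c equals RAX (0) plus the displacement, which is consistent with reading a field at offset 0x1c through a NULL pointer. It does not identify which pointer in ldlm_process_plain_lock() was NULL.

```python
import re

# Three lines copied verbatim from the oops in the description.
OOPS = """\
[77225.855547] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
[77226.033280] RAX: 0000000000000000 RBX: ffff881011f7d400 RCX: ffff881012abbc7c
[77226.268292] Code: 89 45 a0 74 0d f6 05 b3 ac cd ff 01 0f 85 34 06 00 00 8b 83 98 00 00 00 39 83 9c 00 00 00 89 45 b8 0f 84 57 09 00 00 48 8b 45 a0 <8b> 40 1c 85 c0 0f 84 7a 09 00 00 48 8b 4d a0 48 89 c8 48 83 c0
"""

REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def parse_oops(text):
    """Return (fault_address, rax, first three bytes at the faulting RIP)."""
    fault = int(re.search(r"NULL pointer dereference at ([0-9a-f]+)", text).group(1), 16)
    rax = int(re.search(r"\bRAX: ([0-9a-f]+)", text).group(1), 16)
    code = re.search(r"Code: (.*)", text).group(1).split()
    # The kernel wraps the byte located at RIP in angle brackets.
    idx = next(i for i, byte in enumerate(code) if byte.startswith("<"))
    insn = [int(byte.strip("<>"), 16) for byte in code[idx:idx + 3]]
    return fault, rax, insn


def decode_mov_disp8(insn):
    """Decode only `8b /r` with mod=01 and no SIB byte: mov disp8(%base),%dst."""
    opcode, modrm, disp = insn
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x8B or mod != 1 or rm == 4:
        raise ValueError("not a simple 32-bit load from base register + disp8")
    if disp >= 0x80:  # disp8 is a signed byte
        disp -= 0x100
    return REGS32[reg], REGS64[rm], disp


fault, rax, insn = parse_oops(OOPS)
dst, base, disp = decode_mov_disp8(insn)
print(f"faulting insn: mov {disp:#x}(%{base}),%{dst}")
print(f"{base.upper()}={rax:#x}, {base}+disp={rax + disp:#x}, fault address={fault:#x}")
```

The decoder is deliberately narrow; for anything beyond this one instruction, a real disassembler run against the matching ptlrpc module with debug information would be the more reliable route.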

            People

              Assignee: Emoly Liu (emoly.liu)
              Reporter: Stephane Thiell (sthiell)
              Votes: 1
              Watchers: 14
