Lustre / LU-4056

oops in lprocfs_remove_nolock from hsm_cdt_procfs_fini

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.5.0
    • Affects Version/s: Lustre 2.5.0
    • Severity: 3
    • Rank: 10873

    Description

      It seems the recent HSM proc landings introduced an oops on MDS shutdown.

      Running sh llmount.sh ; sleep 10 ; sh llmountcleanup.sh triggers it 90% of the time for me (the other 10% is a lockup somewhere):

      <1>[  178.700818] BUG: unable to handle kernel paging request at ffff8800984f6f78
      <1>[  178.702152] IP: [<ffffffffa0572ccc>] lprocfs_remove_nolock+0x2c/0x110 [obdclass]
      <4>[  178.704018] PGD 1a26063 PUD 501067 PMD 5c4067 PTE 80000000984f6060
      <4>[  178.704261] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      <4>[  178.704261] last sysfs file: /sys/devices/system/cpu/possible
      <4>[  178.704261] CPU 1 
      <4>[  178.704261] Modules linked in: lustre ofd osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs exportfs mdd mgs lquota lfsck jbd obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet sha512_generic sha256_generic libcfs ext4 mbcache jbd2 virtio_balloon virtio_console i2c_piix4 i2c_core virtio_net virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: speedstep_lib]
      <4>[  178.704261] 
      <4>[  178.704261] Pid: 3651, comm: umount Not tainted 2.6.32-rhe6.4-debug #2 Red Hat KVM
      <4>[  178.704261] RIP: 0010:[<ffffffffa0572ccc>]  [<ffffffffa0572ccc>] lprocfs_remove_nolock+0x2c/0x110 [obdclass]
      <4>[  178.704261] RSP: 0018:ffff88008d4abb08  EFLAGS: 00010287
      <4>[  178.704261] RAX: ffffffffa05f4d40 RBX: ffff8800984f6f30 RCX: 0000000000000000
      <4>[  178.704261] RDX: 0000000000000000 RSI: 0000000000000030 RDI: ffff88009abb6b28
      <4>[  178.704261] RBP: ffff88008d4abb38 R08: 0000000000000001 R09: ffff880000000000
      <4>[  178.704261] R10: ffff8800984e4000 R11: 0000000087654321 R12: ffff88009abb6000
      <4>[  178.704261] R13: ffff8800b351fdf0 R14: ffff88009073dd48 R15: 0000000000000038
      <4>[  178.704261] FS:  00007f76bf2af740(0000) GS:ffff880006240000(0000) knlGS:0000000000000000
      <4>[  178.704261] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      <4>[  178.704261] CR2: ffff8800984f6f78 CR3: 00000000b085c000 CR4: 00000000000006e0
      <4>[  178.704261] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>[  178.704261] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>[  178.704261] Process umount (pid: 3651, threadinfo ffff88008d4aa000, task ffff8800b06b40c0)
      <4>[  178.704261] Stack:
      <4>[  178.704261]  ffff88009abbd0e8 0000000000000246 ffff88009abb6b28 ffff88009abb6000
      <4>[  178.704261] <d> ffff8800b351fdf0 ffff88009073dd48 ffff88008d4abb58 ffffffffa0572f05
      <4>[  178.704261] <d> ffff88008d4abb68 ffff88009abbc180 ffff88008d4abb68 ffffffffa0d0e409
      <4>[  178.704261] Call Trace:
      <4>[  178.704261]  [<ffffffffa0572f05>] lprocfs_remove+0x25/0x40 [obdclass]
      <4>[  178.704261]  [<ffffffffa0d0e409>] hsm_cdt_procfs_fini+0x29/0x60 [mdt]
      <4>[  178.704261]  [<ffffffffa0cfc8ae>] mdt_procfs_fini+0x4e/0x80 [mdt]
      <4>[  178.704261]  [<ffffffffa0cc8a55>] mdt_device_fini+0x385/0xd80 [mdt]
      <4>[  178.704261]  [<ffffffffa05919c3>] class_cleanup+0x583/0xd40 [obdclass]
      <4>[  178.704261]  [<ffffffffa056922c>] ? class_name2dev+0x7c/0xe0 [obdclass]
      <4>[  178.704261]  [<ffffffffa05936ea>] class_process_config+0x156a/0x1ad0 [obdclass]
      <4>[  178.704261]  [<ffffffffa058c89c>] ? lustre_cfg_new+0x16c/0x6e0 [obdclass]
      <4>[  178.704261]  [<ffffffffa058ca03>] ? lustre_cfg_new+0x2d3/0x6e0 [obdclass]
      <4>[  178.704261]  [<ffffffffa0593dc9>] class_manual_cleanup+0x179/0x6e0 [obdclass]
      <4>[  178.704261]  [<ffffffffa056922c>] ? class_name2dev+0x7c/0xe0 [obdclass]
      <4>[  178.704261]  [<ffffffffa05cde94>] server_put_super+0x5c4/0xed0 [obdclass]
      <4>[  178.704261]  [<ffffffff81183a4b>] generic_shutdown_super+0x5b/0xe0
      <4>[  178.704261]  [<ffffffff81183b36>] kill_anon_super+0x16/0x60
      <4>[  178.704261]  [<ffffffffa0595c16>] lustre_kill_super+0x36/0x60 [obdclass]
      <4>[  178.704261]  [<ffffffff811842d7>] deactivate_super+0x57/0x80
      <4>[  178.704261]  [<ffffffff811a237f>] mntput_no_expire+0xbf/0x110
      <4>[  178.704261]  [<ffffffff811a2dfb>] sys_umount+0x7b/0x3a0
      <4>[  178.704261]  [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
      <4>[  178.704261] Code: 48 89 e5 41 56 41 55 41 54 53 48 83 ec 10 0f 1f 44 00 00 48 8b 1f 48 c7 07 00 00 00 00 48 85 db 74 52 48 81 fb 00 f0 ff ff 77 49 <4c> 8b 73 48 4d 85 f6 75 0e e9 99 00 00 00 66 0f 1f 44 00 00 4c 
      <1>[  178.704261] RIP  [<ffffffffa0572ccc>] lprocfs_remove_nolock+0x2c/0x110 [obdclass]
      <4>[  178.704261]  RSP <ffff88008d4abb08>
      <4>[  178.704261] CR2: ffff8800984f6f78
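
As a sanity check on the trace above, the faulting address can be reconciled with the register dump. The bytes at the fault marker in the Code: line, <4c> 8b 73 48, decode to mov 0x48(%rbx),%r14, and the instructions just before it load RBX from (%rdi), zero that slot, and skip the dereference only if RBX is NULL or an error pointer. The PTE value also has its present bit clear. A minimal sketch of that arithmetic, using only values copied from the log (the variable names are illustrative, and this does not by itself establish the root cause):

```python
# Values copied from the oops above.
RBX = 0xffff8800984f6f30  # proc entry pointer loaded from (%rdi)
CR2 = 0xffff8800984f6f78  # faulting address reported by the kernel
PTE = 0x80000000984f6060  # page table entry from the PGD/PUD/PMD/PTE line

# "<4c> 8b 73 48" is mov 0x48(%rbx),%r14: an 8-byte read at RBX + 0x48.
DISPLACEMENT = 0x48

fault_address = RBX + DISPLACEMENT
pte_present = bool(PTE & 0x1)  # bit 0 of an x86-64 PTE is the present bit

print(hex(fault_address))
print(fault_address == CR2)
print(pte_present)
```

If the computed address matches CR2 and the page is not present, the fault is a read through the pointer that lprocfs_remove_nolock fetched, landing on an unmapped page. With DEBUG_PAGEALLOC enabled that is consistent with the proc entry having already been freed, although the log alone does not show where.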
      

          People

            Assignee: jhammond John Hammond
            Reporter: green Oleg Drokin