Lustre / LU-4285

kernel OOPs in __dquot_initialize()

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.1
    • Affects Version/s: Lustre 2.5.0
    • Labels: None
    • Environment: SLES 11 SP2 and SP3 servers
    • Severity: 3
    • Rank: 11760

    Description

      One of the base kernel patches for SLES11 has a serious flaw: it leaves local variables in __dquot_initialize() uninitialized on some execution paths. This can cause a kernel oops during calls to dquot_initialize() made from very common low-level ldiskfs primitives. It is often seen during a user-level mount or mkfs.lustre of Lustre devices. Example seen during a mount:

      Nov 20 11:56:25 suse2 kernel: [222946.909459] BUG: unable to handle kernel
      paging request at 00000000000b012b
      Nov 20 11:56:25 suse2 kernel: [222946.909767] IP: [<ffffffff811a8e4b>]
      dqput+0x16b/0x1e0
      Nov 20 11:56:25 suse2 kernel: [222946.911543] PGD 1eafc067 PUD 5ae3067 PMD 0 
      Nov 20 11:56:25 suse2 kernel: [222946.911548] Oops: 0000 [#1] SMP 
      Nov 20 11:56:25 suse2 kernel: [222946.913328] CPU 0 
      Nov 20 11:56:25 suse2 kernel: [222946.913330] Modules linked in:
      osd_ldiskfs(N) lquota(N) ldiskfs(N) jbd2 lustre(N) lov(N) osc(N) mdc(N) fid(N)
      fld(N) ksocklnd(N) ptlrpc(N) obdclass(N) lnet(N) sha512_generic sha256_generic
      sha1_generic md5 crc32c libcfs(N) lp snd_pcm_oss snd_mixer_oss snd_seq_midi
      snd_seq_midi_event snd_seq edd mperf vmhgfs(X) vsock(X) acpiphp microcode fuse
      loop dm_mod btusb bluetooth rfkill ppdev crc16 parport_pc parport floppy ipv6
      ipv6_lib snd_ens1371 gameport snd_rawmidi snd_seq_device acpi_memhotplug
      snd_ac97_codec vmw_balloon(X) ac97_bus snd_pcm pciehp pcspkr snd_timer snd ac
      soundcore e1000 sr_mod snd_page_alloc rtc_cmos cdrom sg container button
      i2c_piix4 mptctl intel_agp vmci(X) i2c_core shpchp intel_gtt pci_hotplug ext3
      jbd mbcache usbhid hid sd_mod crc_t10dif processor thermal_sys hwmon uhci_hcd
      ehci_hcd usbcore usb_common scsi_dh_emc scsi_dh_rdac scsi_dh_hp_sw
      scsi_dh_alua scsi_dh vmxnet(X) vmw_pvscsi vmxnet3 ata_generic ata_piix libata
      mptspi mptscsih mptbase scsi_transport_spi sc
      Nov 20 11:56:25 suse2 kernel: si_mod
      Nov 20 11:56:25 suse2 kernel: [222946.913391] Supported: No, Unsupported
      modules are loaded
      Nov 20 11:56:25 suse2 kernel: [222946.913393] 
      Nov 20 11:56:25 suse2 kernel: [222946.913708] Pid: 37457, comm: mount.lustre
      Tainted: G           NX 3.0.93-0.5_lustre.g8744d8b-default #1 VMware, Inc.
      VMware Virtual Platform/440BX Desktop Reference Platform
      Nov 20 11:56:25 suse2 kernel: [222946.913714] RIP: 0010:[<ffffffff811a8e4b>] 
      [<ffffffff811a8e4b>] dqput+0x16b/0x1e0
      Nov 20 11:56:25 suse2 kernel: [222946.913741] RSP: 0018:ffff880004265858 
      EFLAGS: 00010216
      Nov 20 11:56:25 suse2 kernel: [222946.913744] RAX: 00000000000b000b RBX:
      ffff88003e021498 RCX: 00000000000147f8
      Nov 20 11:56:25 suse2 kernel: [222946.913746] RDX: 000000000000000e RSI:
      0000000000000001 RDI: ffffffff81a02900
      Nov 20 11:56:25 suse2 kernel: [222946.913748] RBP: ffff88003e021400 R08:
      0000000000000000 R09: 0000000000000001
      Nov 20 11:56:25 suse2 kernel: [222946.913750] R10: 0000000000000004 R11:
      ffff8800042658c8 R12: ffff88003e021460
      Nov 20 11:56:25 suse2 kernel: [222946.913752] R13: ffff88003e021430 R14:
      00000000ffffffff R15: ffff8800035d2cd0
      Nov 20 11:56:25 suse2 kernel: [222946.913754] FS:  00007fbf5d0a7700(0000)
      GS:ffff88003f600000(0000) knlGS:0000000000000000
      Nov 20 11:56:25 suse2 kernel: [222946.913756] CS:  0010 DS: 0000 ES: 0000 CR0:
      000000008005003b
      Nov 20 11:56:25 suse2 kernel: [222946.913758] CR2: 00000000000b012b CR3:
      00000000398ea000 CR4: 00000000001406f0
      Nov 20 11:56:25 suse2 kernel: [222946.913779] DR0: 0000000000000000 DR1:
      0000000000000000 DR2: 0000000000000000
      Nov 20 11:56:25 suse2 kernel: [222946.913794] DR3: 0000000000000000 DR6:
      00000000ffff0ff0 DR7: 0000000000000400
      Nov 20 11:56:25 suse2 kernel: [222946.913797] Process mount.lustre (pid:
      37457, threadinfo ffff880004264000, task ffff8800187cc1c0)
      Nov 20 11:56:25 suse2 kernel: [222946.913799] Stack:
      Nov 20 11:56:25 suse2 kernel: [222946.913800]  ffff88000f116000
      ffff8800035d2cb0 0000000000000010 0000000000000002
      Nov 20 11:56:25 suse2 kernel: [222946.913804]  0000000000000006
      ffffffff811a9c74 0000000000000000 ffff88003e0213e0
      Nov 20 11:56:25 suse2 kernel: [222946.913806]  ffff8800035d2cb0
      0000000200000301 0000000000000000 000000003e090240
      Nov 20 11:56:25 suse2 kernel: [222946.914290] Call Trace:
      Nov 20 11:56:25 suse2 kernel: [222946.916006]  [<ffffffff811a9c74>]
      __dquot_initialize+0x224/0x2a0
      Nov 20 11:56:25 suse2 kernel: [222946.916020]  [<ffffffffa0cc9b60>]
      osd_ea_fid_set+0x80/0x390 [osd_ldiskfs]
      Nov 20 11:56:25 suse2 kernel: [222946.916050]  [<ffffffffa0cf6ff0>]
      osd_scrub_setup+0x1e0/0x600 [osd_ldiskfs]
      Nov 20 11:56:25 suse2 kernel: [222946.916070]  [<ffffffffa0ccad19>]
      osd_device_init0+0x3d9/0x5c0 [osd_ldiskfs]
      Nov 20 11:56:25 suse2 kernel: [222946.916080]  [<ffffffffa0ccb066>]
      osd_device_alloc+0x166/0x2c0 [osd_ldiskfs]
      Nov 20 11:56:25 suse2 kernel: [222946.916111]  [<ffffffffa067e1bb>]
      class_setup+0x61b/0xad0 [obdclass]
      Nov 20 11:56:25 suse2 kernel: [222946.916141]  [<ffffffffa0685be5>]
      class_process_config+0xc95/0x18f0 [obdclass]
      Nov 20 11:56:25 suse2 kernel: [222946.916170]  [<ffffffffa068ac32>]
      do_lcfg+0x142/0x460 [obdclass]
      Nov 20 11:56:25 suse2 kernel: [222946.916199]  [<ffffffffa068afe4>]
      lustre_start_simple+0x94/0x210 [obdclass]
      Nov 20 11:56:25 suse2 kernel: [222946.916232]  [<ffffffffa06b8e3a>]
      osd_start+0x4fa/0x7c0 [obdclass]
      Nov 20 11:56:25 suse2 kernel: [222946.916275]  [<ffffffffa06c2bad>]
      server_fill_super+0xfd/0x550 [obdclass]
      Nov 20 11:56:25 suse2 kernel: [222946.916316]  [<ffffffffa06907a8>]
      lustre_fill_super+0x178/0x530 [obdclass]
      Nov 20 11:56:25 suse2 kernel: [222946.916749]  [<ffffffff811556e3>]
      mount_nodev+0x83/0xc0
      Nov 20 11:56:25 suse2 kernel: [222946.917387]  [<ffffffffa0688670>]
      lustre_mount+0x20/0x30 [obdclass]
      Nov 20 11:56:25 suse2 kernel: [222946.917408]  [<ffffffff811551ee>]
      mount_fs+0x4e/0x1a0
      Nov 20 11:56:25 suse2 kernel: [222946.917454]  [<ffffffff811703f5>]
      vfs_kern_mount+0x65/0xd0
      Nov 20 11:56:25 suse2 kernel: [222946.917477]  [<ffffffff811704e3>]
      do_kern_mount+0x53/0x110
      Nov 20 11:56:25 suse2 kernel: [222946.917481]  [<ffffffff81171e2d>]
      do_mount+0x21d/0x260
      Nov 20 11:56:25 suse2 kernel: [222946.917483]  [<ffffffff81171f30>]
      sys_mount+0xc0/0xf0
      Nov 20 11:56:25 suse2 kernel: [222946.918465]  [<ffffffff81452192>]
      system_call_fastpath+0x16/0x1b
      Nov 20 11:56:25 suse2 kernel: [222946.918903]  [<00007fbf5ca2347a>]
      0x7fbf5ca23479
      Nov 20 11:56:25 suse2 kernel: [222946.918905] Code: 00 29 a0 81 e8 b7 14 2a 00
      3e 0f ba 33 00 19 c0 85 c0 75 6c 66 ff 05 c5 9a 85 00 e9 e0 fe ff ff 3e ff 4d
      60 48 8b 85 80 00 00 00 <8b> 90 20 01 00 00 0f bf 85 a0 00 00 00 8d 0c 40 b8
      01 00 00 00 
      Nov 20 11:56:25 suse2 kernel: [222946.918923] RIP  [<ffffffff811a8e4b>]
      dqput+0x16b/0x1e0
      Nov 20 11:56:25 suse2 kernel: [222946.918927]  RSP <ffff880004265858>
      Nov 20 11:56:25 suse2 kernel: [222946.918929] CR2: 00000000000b012b
      Nov 20 11:56:25 suse2 kernel: [222946.919070] ---[ end trace 64d4ae18af89a329
      ]-
      

      I think this was due to the SLES version of the offending patch being too close a copy of the RHEL version, not taking into account slight differences in context between the relevant RHEL & SLES kernel code.

      I will push a change to the offending base kernel patch to fix it.
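
      For readers unfamiliar with this class of bug, the following is a minimal user-space sketch of the pattern described above. It is not the kernel code or the actual patch: `fake_dquot`, `fake_dqget()`, `fake_dqput()`, `quota_table` and `init_quota_slots()` are hypothetical stand-ins invented for illustration. It assumes the failure mode is a local array of quota pointers in which some slots are skipped on certain paths and later handed to a release routine, which matches the dqput() frame under __dquot_initialize() in the trace.

```c
#include <stddef.h>

#define MAXQUOTAS 2 /* user and group quota types */

/* Hypothetical stand-in for the kernel's struct dquot. */
struct fake_dquot {
    int refcount;
};

static struct fake_dquot quota_table[MAXQUOTAS];

/* Stand-in for dqget(): take a reference on the quota entry for a type. */
static struct fake_dquot *fake_dqget(int type)
{
    quota_table[type].refcount++;
    return &quota_table[type];
}

/* Stand-in for dqput(): drop a reference; a NULL pointer is ignored.
 * Returns 1 if a reference was actually released, 0 otherwise. */
static int fake_dqput(struct fake_dquot *dq)
{
    if (dq == NULL)
        return 0;
    dq->refcount--;
    return 1;
}

/* Sketch of the acquire-then-release shape. type == -1 means "all types".
 * Returns the number of references released during cleanup. */
static int init_quota_slots(int type)
{
    /* The fix: every slot starts out NULL. Without this initializer,
     * slots skipped by the 'continue' below would hold stack garbage,
     * and the cleanup loop would pass that garbage to the release
     * routine, which would then dereference a bogus pointer. */
    struct fake_dquot *got[MAXQUOTAS] = { NULL };
    int cnt;
    int released = 0;

    for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
        if (type != -1 && cnt != type)
            continue; /* this slot is never assigned on this path */
        got[cnt] = fake_dqget(cnt);
    }

    for (cnt = 0; cnt < MAXQUOTAS; cnt++)
        released += fake_dqput(got[cnt]);

    return released;
}
```

      With the initializer in place, calling `init_quota_slots(0)` releases only the one reference it took, and the skipped slot is safely ignored. The faulting address in the oops (a small non-pointer value) is consistent with a garbage pointer of this kind, though the sketch does not claim to reproduce the exact kernel path.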

          People

            Assignee: bogl Bob Glossman (Inactive)
            Reporter: bogl Bob Glossman (Inactive)