Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-11626

mdc: obd might go away while referenced by code in mdc_changelog

Details

    • 3
    • 9223372036854775807

    Description

      In lustre/mdc/mdc_changelog.c:

      /**
       * Find the OBD device associated to a changelog character device.
       * @param[in]  cdev  character device instance descriptor
       * @return corresponding OBD device or NULL if none was found.
       */
      static struct obd_device *chlg_obd_get(dev_t cdev)
      {
      	int minor = MINOR(cdev);
      	struct obd_device *obd = NULL;
      	struct chlg_registered_dev *curr;
      
      	mutex_lock(&chlg_registered_dev_lock);
      	list_for_each_entry(curr, &chlg_registered_devices, ced_link) {
      		if (curr->ced_misc.minor == minor) {
      			/* take the first available OBD device attached */
      			obd = list_first_entry(&curr->ced_obds,
      					       struct obd_device,
      					       u.cli.cl_chg_dev_linkage);
      			break;
      		}
      	}
      	mutex_unlock(&chlg_registered_dev_lock);
      	return obd;
      }
      

      The "take the first available OBD device attached" approach is broken as that OBD might go away while the changelog is being read.

      Here is an example of how things can go wrong:

      • mount lustre at /mnt/lustre;
      • mount the same lustre filesystem at /mnt/lustre2;
      • open /dev/changelog-lustre-MDT0000 in writing mode;
      • unmount /mnt/lustre;
      • write 'clear:cl0:0' to the opened chardevice.

      That triggers an LBUG.

      Attachments

        Issue Links

          Activity

            [LU-11626] mdc: obd might go away while referenced by code in mdc_changelog

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36338/
            Subject: LU-11626 mdc: hold obd while processing changelog
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 9d88030b8a532de3bec3f75d01ec108bfba97082

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36338/ Subject: LU-11626 mdc: hold obd while processing changelog Project: fs/lustre-release Branch: b2_12 Current Patch Set: Commit: 9d88030b8a532de3bec3f75d01ec108bfba97082
            pjones Peter Jones added a comment -

            James

            We tagged RC1 this morning so, while there's still a chance, it is looking unlikely as things stand

            Peter

            pjones Peter Jones added a comment - James We tagged RC1 this morning so, while there's still a chance, it is looking unlikely as things stand Peter

            Peter, any chance this can land for the 2.12.3 release

            simmonsja James A Simmons added a comment - Peter, any chance this can land for the 2.12.3 release

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36338
            Subject: LU-11626 mdc: hold obd while processing changelog
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 24935d83103d44dee84a16a7e304bb13180ec22d

            gerrit Gerrit Updater added a comment - Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36338 Subject: LU-11626 mdc: hold obd while processing changelog Project: fs/lustre-release Branch: b2_12 Current Patch Set: 1 Commit: 24935d83103d44dee84a16a7e304bb13180ec22d
            pjones Peter Jones added a comment -

            Landed for 2.13

            pjones Peter Jones added a comment - Landed for 2.13

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35784/
            Subject: LU-11626 mdc: hold obd while processing changelog
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: d7bb6647cd4dd26949bceb6a099cd606623aff2b

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35784/ Subject: LU-11626 mdc: hold obd while processing changelog Project: fs/lustre-release Branch: master Current Patch Set: Commit: d7bb6647cd4dd26949bceb6a099cd606623aff2b

            Hi James,
            The issue still exists on master branch with the patch https://review.whamcloud.com/35668.

            [  106.920737] general protection fault: 0000 [#1] SMP 
            [  106.927417] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) intel_powerclamp iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg parport_pc pcspkr parport i2c_piix4 video i2c_core ip_tables xfs libcrc32c sr_mod cdrom ata_generic sd_mod crc_t10dif crct10dif_generic pata_acpi ahci libahci ata_piix libata e1000 crct10dif_pclmul crct10dif_common crc32c_intel serio_raw dm_mirror dm_region_hash dm_log dm_mod
            [  106.941934] CPU: 0 PID: 1287 Comm: bash Kdump: loaded Tainted: P           OE  ------------   3.10.0-862.9.1.el7.x86_64 #1
            [  106.944006] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
            [  106.945492] task: ffff8f61898b5ee0 ti: ffff8f61803e4000 task.ti: ffff8f61803e4000
            [  106.946817] RIP: 0010:[<ffffffffc09ed510>]  [<ffffffffc09ed510>] chlg_write+0x450/0x6c0 [mdc]
            [  106.948338] RSP: 0018:ffff8f61803e7e70  EFLAGS: 00010206
            [  106.949300] RAX: ffff8f6047e741e0 RBX: ffff8f6171eeac00 RCX: 0000000000000000
            [  106.950582] RDX: 0000000000000000 RSI: ffffffffc09f8d74 RDI: 0000000000000000
            [  106.951766] RBP: ffff8f61803e7ec0 R08: ffff8f6171eeac0a R09: 0000000000000000
            [  106.953063] R10: 000000000000000a R11: f000000000000000 R12: 000000000000000b
            [  106.954279] R13: 00007f9075b49000 R14: 5a5a5a5a5a5a5a5a R15: 0000000000000000
            [  106.955473] FS:  00007f9075b43740(0000) GS:ffff8f619fc00000(0000) knlGS:0000000000000000
            [  106.956802] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
            [  106.957723] CR2: 00005619854aa528 CR3: 0000000216e52000 CR4: 00000000000606f0
            [  106.958892] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
            [  106.960067] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
            [  106.961216] Call Trace:
            [  106.961613]  [<ffffffffac41b490>] vfs_write+0xc0/0x1f0
            [  106.962440]  [<ffffffffac9206e1>] ? system_call_after_swapgs+0xae/0x146
            [  106.963499]  [<ffffffffac41c2bf>] SyS_write+0x7f/0xf0
            [  106.964300]  [<ffffffffac9206e1>] ? system_call_after_swapgs+0xae/0x146
            [  106.965349]  [<ffffffffac920795>] system_call_fastpath+0x1c/0x21
            [  106.966311]  [<ffffffffac9206e1>] ? system_call_after_swapgs+0xae/0x146
            [  106.967362] Code: 27 7e 01 00 ef 01 00 00 c7 05 21 7e 01 00 01 00 00 00 48 c7 05 1e 7e 01 00 00 00 00 00 e8 89 d2 90 ff 4d 85 f6 0f 84 33 01 00 00 <49> 8b 86 f8 00 00 00 48 85 c0 0f 84 a3 00 00 00 48 8b 10 48 85 
            [  106.970783] RIP  [<ffffffffc09ed510>] chlg_write+0x450/0x6c0 [mdc]
            [  106.971791]  RSP <ffff8f61803e7e70>
            [    0.794662] mce: Unable to init device /dev/mcelog (rc: -5)
            [    0.800919] Failed to execute /init
            [    0.801546] Kernel panic - not syncing: No init found.  Try passing init= option to kernel. See Linux Documentation/init.txt for guidance.
            [    0.803450] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-862.9.1.el7.x86_64 #1
            [    0.804651] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
            [    0.805965] Call Trace:
            [    0.806399]  [<ffffffff8170e84e>] dump_stack+0x19/0x1b
            [    0.807277]  [<ffffffff81708b50>] panic+0xe8/0x21f
            [    0.808041]  [<ffffffff8122ccdd>] ? putname+0x3d/0x60
            [    0.808799]  [<ffffffff816fd780>] ? rest_init+0x80/0x80
            [    0.809627]  [<ffffffff816fd862>] kernel_init+0xe2/0xf0
            [    0.810741]  [<ffffffff817205f7>] ret_from_fork_nospec_begin+0x21/0x21
            [    0.811744]  [<ffffffff816fd780>] ? rest_init+0x80/0x80
            
            hongchao.zhang Hongchao Zhang added a comment - Hi James, The issue still exists on master branch with the patch https://review.whamcloud.com/35668 . [ 106.920737] general protection fault: 0000 [#1] SMP [ 106.927417] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) intel_powerclamp iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg parport_pc pcspkr parport i2c_piix4 video i2c_core ip_tables xfs libcrc32c sr_mod cdrom ata_generic sd_mod crc_t10dif crct10dif_generic pata_acpi ahci libahci ata_piix libata e1000 crct10dif_pclmul crct10dif_common crc32c_intel serio_raw dm_mirror dm_region_hash dm_log dm_mod [ 106.941934] CPU: 0 PID: 1287 Comm: bash Kdump: loaded Tainted: P OE ------------ 3.10.0-862.9.1.el7.x86_64 #1 [ 106.944006] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 106.945492] task: ffff8f61898b5ee0 ti: ffff8f61803e4000 task.ti: ffff8f61803e4000 [ 106.946817] RIP: 0010:[<ffffffffc09ed510>] [<ffffffffc09ed510>] chlg_write+0x450/0x6c0 [mdc] [ 106.948338] RSP: 0018:ffff8f61803e7e70 EFLAGS: 00010206 [ 106.949300] RAX: ffff8f6047e741e0 RBX: ffff8f6171eeac00 RCX: 0000000000000000 [ 106.950582] RDX: 0000000000000000 RSI: ffffffffc09f8d74 RDI: 0000000000000000 [ 106.951766] RBP: ffff8f61803e7ec0 R08: ffff8f6171eeac0a R09: 0000000000000000 [ 106.953063] R10: 000000000000000a R11: f000000000000000 R12: 000000000000000b [ 106.954279] R13: 00007f9075b49000 R14: 5a5a5a5a5a5a5a5a R15: 0000000000000000 [ 106.955473] FS: 00007f9075b43740(0000) GS:ffff8f619fc00000(0000) knlGS:0000000000000000 [ 106.956802] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 106.957723] CR2: 00005619854aa528 CR3: 0000000216e52000 CR4: 00000000000606f0 [ 106.958892] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 106.960067] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 106.961216] Call Trace: [ 106.961613] [<ffffffffac41b490>] vfs_write+0xc0/0x1f0 [ 106.962440] [<ffffffffac9206e1>] ? system_call_after_swapgs+0xae/0x146 [ 106.963499] [<ffffffffac41c2bf>] SyS_write+0x7f/0xf0 [ 106.964300] [<ffffffffac9206e1>] ? system_call_after_swapgs+0xae/0x146 [ 106.965349] [<ffffffffac920795>] system_call_fastpath+0x1c/0x21 [ 106.966311] [<ffffffffac9206e1>] ? system_call_after_swapgs+0xae/0x146 [ 106.967362] Code: 27 7e 01 00 ef 01 00 00 c7 05 21 7e 01 00 01 00 00 00 48 c7 05 1e 7e 01 00 00 00 00 00 e8 89 d2 90 ff 4d 85 f6 0f 84 33 01 00 00 <49> 8b 86 f8 00 00 00 48 85 c0 0f 84 a3 00 00 00 48 8b 10 48 85 [ 106.970783] RIP [<ffffffffc09ed510>] chlg_write+0x450/0x6c0 [mdc] [ 106.971791] RSP <ffff8f61803e7e70> [ 0.794662] mce: Unable to init device /dev/mcelog (rc: -5) [ 0.800919] Failed to execute /init [ 0.801546] Kernel panic - not syncing: No init found. Try passing init= option to kernel. See Linux Documentation/init.txt for guidance. [ 0.803450] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-862.9.1.el7.x86_64 #1 [ 0.804651] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 0.805965] Call Trace: [ 0.806399] [<ffffffff8170e84e>] dump_stack+0x19/0x1b [ 0.807277] [<ffffffff81708b50>] panic+0xe8/0x21f [ 0.808041] [<ffffffff8122ccdd>] ? putname+0x3d/0x60 [ 0.808799] [<ffffffff816fd780>] ? rest_init+0x80/0x80 [ 0.809627] [<ffffffff816fd862>] kernel_init+0xe2/0xf0 [ 0.810741] [<ffffffff817205f7>] ret_from_fork_nospec_begin+0x21/0x21 [ 0.811744] [<ffffffff816fd780>] ? rest_init+0x80/0x80
            simmonsja James A Simmons added a comment - - edited

            Patch 

            https://review.whamcloud.com/35668 which landed resolved this. It would be nice to push a reproducer to the test suite from the information in this ticket. That was missing in the patch.

            simmonsja James A Simmons added a comment - - edited Patch  https://review.whamcloud.com/35668 which landed resolved this. It would be nice to push a reproducer to the test suite from the information in this ticket. That was missing in the patch.

            Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35784
            Subject: LU-11626 mdc: hold obd while processing changelog
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f866eac40cc761b960cc92907d1d5546b6e24e38

            gerrit Gerrit Updater added a comment - Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35784 Subject: LU-11626 mdc: hold obd while processing changelog Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: f866eac40cc761b960cc92907d1d5546b6e24e38

            It might make sense to also allow writing to the open file descriptor like "mdt:mdt_idx" to set the specific MDT to be used. That allows userspace scripts to potentially read and manage the changelog file, instead of having to find some way to call ioctl().

            adilger Andreas Dilger added a comment - It might make sense to also allow writing to the open file descriptor like " mdt:mdt_idx " to set the specific MDT to be used. That allows userspace scripts to potentially read and manage the changelog file, instead of having to find some way to call ioctl() .

            People

              hongchao.zhang Hongchao Zhang
              cealustre CEA
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: