[LU-10607] uninitialized spinlock in osd_zfs Created: 06/Feb/18 Updated: 09/May/20 Resolved: 06/Mar/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.11.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Oleg Drokin | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||||||
| Severity: | 3 | ||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||
| Description |
|
I missed this originally, but it looks like https://review.whamcloud.com/#/c/30909/2 introduced a regression into master. Every time I mount Lustre on ZFS, I get this:
[ 54.320874] ZFS: Loaded module v0.7.1-1, ZFS pool version 5000, ZFS filesystem version 5
[ 82.005278] BUG: spinlock bad magic on CPU#1, mount.lustre/5874
[ 82.007911] lock: 0xffff88030066e688, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
[ 82.008146] CPU: 1 PID: 5874 Comm: mount.lustre Tainted: P OE ------------ 3.10.0-debug #2
[ 82.008381] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 82.008526] 0000000000000000 00000000c03a9b87 ffff8800b00ff798 ffffffff816fd3e4
[ 82.010041] ffff8800b00ff7b8 ffffffff816fd472 ffff88030066e688 ffffffff81aa1a78
[ 82.013275] ffff8800b00ff7d8 ffffffff816fd498 ffff88030066e688 0000000000000001
[ 82.014974] Call Trace:
[ 82.015815] [<ffffffff816fd3e4>] dump_stack+0x19/0x1b
[ 82.016589] [<ffffffff816fd472>] spin_dump+0x8c/0x91
[ 82.017340] [<ffffffff816fd498>] spin_bug+0x21/0x26
[ 82.018086] [<ffffffff8138ec10>] do_raw_spin_lock+0x130/0x150
[ 82.018858] [<ffffffff81706249>] _raw_spin_lock+0x49/0x50
[ 82.019714] [<ffffffffa032ebf7>] ? lprocfs_oh_tally+0x17/0x40 [obdclass]
[ 82.032661] [<ffffffffa032ebf7>] lprocfs_oh_tally+0x17/0x40 [obdclass]
[ 82.036879] [<ffffffffa113781b>] record_start_io.part.13+0x2b/0x40 [osd_zfs]
[ 82.037792] [<ffffffffa1137f1d>] osd_read+0xfd/0x2b0 [osd_zfs]
[ 82.038385] [<ffffffffa0356244>] dt_read+0x14/0x50 [obdclass]
[ 82.039012] [<ffffffffa037a52f>] scrub_file_load+0x4f/0x410 [obdclass]
[ 82.039792] [<ffffffffa115290a>] osd_scrub_setup+0x3ea/0xb50 [osd_zfs]
[ 82.040618] [<ffffffffa112c26a>] osd_mount+0xdfa/0x1170 [osd_zfs]
[ 82.041425] [<ffffffff81706717>] ? _raw_read_unlock+0x27/0x40
[ 82.042127] [<ffffffffa112cc1d>] osd_device_alloc+0x34d/0x400 [osd_zfs]
[ 82.043068] [<ffffffffa0341084>] obd_setup+0x114/0x2a0 [obdclass]
[ 82.043668] [<ffffffffa03414b6>] class_setup+0x2a6/0x840 [obdclass]
[ 82.044592] [<ffffffffa0345032>] class_process_config+0x1b62/0x28a0 [obdclass]
[ 82.046302] [<ffffffff811cd4f9>] ? __kmalloc+0x649/0x660
[ 82.047316] [<ffffffffa01f3f47>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[ 82.048126] [<ffffffffa0348ae8>] do_lcfg+0x258/0x4b0 [obdclass]
[ 82.048907] [<ffffffffa034cb48>] lustre_start_simple+0x88/0x210 [obdclass]
[ 82.049714] [<ffffffffa0379154>] server_fill_super+0xf34/0x1860 [obdclass]
[ 82.050487] [<ffffffffa01f3f47>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[ 82.062653] [<ffffffffa0350680>] lustre_fill_super+0x3d0/0x8b0 [obdclass]
[ 82.063465] [<ffffffffa03502b0>] ? lustre_common_put_super+0xb50/0xb50 [obdclass]
[ 82.065632] [<ffffffff811f0ebd>] mount_nodev+0x4d/0xb0
[ 82.069485] [<ffffffffa0348818>] lustre_mount+0x38/0x60 [obdclass]
[ 82.070379] [<ffffffff811f1899>] mount_fs+0x39/0x1b0
[ 82.071309] [<ffffffff8120e843>] vfs_kern_mount+0x63/0xf0
[ 82.072876] [<ffffffff81210ebe>] do_mount+0x24e/0xa40
[ 82.073642] [<ffffffff811767fe>] ? __get_free_pages+0xe/0x50
[ 82.074377] [<ffffffff81211746>] SyS_mount+0x96/0xf0
[ 82.075246] [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b |
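The `.magic: 00000000` field in the report above is what a debug-enabled kernel prints when a spinlock lives in zeroed memory and was never passed through an init call. A minimal user-space sketch of that check follows. It is an illustration only, not Lustre or kernel code: `toy_lock`, `toy_hist`, and every function name here are hypothetical, and the magic constant is chosen to match the value the Linux debug spinlock code uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a debug spinlock; not the kernel type. */
#define TOY_LOCK_MAGIC 0xdead4eadu

struct toy_lock {
	uint32_t magic;	/* written only by toy_lock_init() */
	int held;
};

/* Hypothetical stand-in for a lock-protected I/O histogram. */
struct toy_hist {
	struct toy_lock lock;
	unsigned long buckets[8];
};

static void toy_lock_init(struct toy_lock *l)
{
	l->magic = TOY_LOCK_MAGIC;
	l->held = 0;
}

/* Returns 0 on success, -1 when the lock was never initialized. */
static int toy_lock_acquire(struct toy_lock *l)
{
	if (l->magic != TOY_LOCK_MAGIC)
		return -1;	/* the "bad magic" case */
	l->held = 1;
	return 0;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->held = 0;
}

/* Count one sample in a bucket while holding the histogram lock. */
static int toy_hist_tally(struct toy_hist *h, unsigned int bucket)
{
	if (bucket >= 8)
		return -2;
	if (toy_lock_acquire(&h->lock) != 0)
		return -1;
	h->buckets[bucket]++;
	toy_lock_release(&h->lock);
	return 0;
}
```

Calling `toy_hist_tally()` on a zero-filled `struct toy_hist` returns -1, and the same call succeeds once `toy_lock_init(&h.lock)` has run, which is the shape of the failure shown in the trace.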
| Comments |
| Comment by Peter Jones [ 06/Feb/18 ] |
|
Fan Yong, can you please advise? Thanks, Peter |
| Comment by Gerrit Updater [ 06/Feb/18 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/31180 |
| Comment by Gerrit Updater [ 06/Mar/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31180/ |
| Comment by Peter Jones [ 06/Mar/18 ] |
|
Landed for 2.11 |
| Comment by Andrew Perepechko [ 27/Nov/19 ] |
|
This fix seems incomplete. Even with it applied, Lustre still crashes a little further along, from the same root cause (incomplete osd stats initialization):
[ 5696.621248] [<ffffffffc169719b>] record_start_io.part.14+0x2b/0x40 [osd_zfs]
[ 5696.623630] [<ffffffffc1698322>] osd_read+0xa2/0x180 [osd_zfs]
[ 5696.625807] [<ffffffffc1167dee>] dt_record_read+0x1e/0x70 [obdclass]
[ 5696.628063] [<ffffffffc1190997>] lustre_index_restore+0x527/0x1720 [obdclass]
[ 5696.637015] [<ffffffffc16b2564>] osd_initial_OI_scrub+0xa34/0xd50 [osd_zfs]
[ 5696.639357] [<ffffffffc16b34fd>] osd_scrub_setup+0x9ed/0xb90 [osd_zfs]
[ 5696.641585] [<ffffffffc168a97b>] osd_mount+0xf4b/0x1380 [osd_zfs] |
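Both traces in this ticket reach `osd_read` from `osd_scrub_setup` during `osd_mount`, so the mount path issues reads before the statistics are fully set up. One way to picture the ordering problem is the sketch below. It is an assumption-laden model, not the landed patch: the `sk_` types and functions are hypothetical, and the real change in https://review.whamcloud.com/31180 may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { SK_HIST_READ, SK_HIST_WRITE, SK_HIST_COUNT };

/* Hypothetical per-histogram state; lock_ready models "spinlock initialized". */
struct sk_hist {
	bool lock_ready;
	unsigned long count;
};

/* Hypothetical device holding one histogram per I/O direction. */
struct sk_dev {
	struct sk_hist hist[SK_HIST_COUNT];
};

/* Initialize every histogram in one place so none is missed. */
static void sk_stats_init(struct sk_dev *d)
{
	for (int i = 0; i < SK_HIST_COUNT; i++) {
		d->hist[i].lock_ready = true;
		d->hist[i].count = 0;
	}
}

/* A read tallies into the read histogram; fails if its lock is not ready. */
static int sk_read(struct sk_dev *d)
{
	if (!d->hist[SK_HIST_READ].lock_ready)
		return -1;
	d->hist[SK_HIST_READ].count++;
	return 0;
}

/* Scrub setup loads on-disk state, which issues a read during mount. */
static int sk_scrub_setup(struct sk_dev *d)
{
	return sk_read(d);
}

/* stats_first selects whether stats are initialized before scrub setup. */
static int sk_mount(struct sk_dev *d, bool stats_first)
{
	int rc;

	if (stats_first)
		sk_stats_init(d);
	rc = sk_scrub_setup(d);
	if (!stats_first)
		sk_stats_init(d);
	return rc;
}
```

In this model `sk_mount(&dev, false)` fails on a zero-filled device while `sk_mount(&dev, true)` succeeds, which suggests why any early read path needs all stats state initialized ahead of it, not only the piece the first trace happened to hit.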