[LU-7297] BUG: spinlock bad magic, probably on oh->oh_lock Created: 14/Oct/15 Updated: 04/Dec/15 Resolved: 04/Dec/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Olaf Faaland | Assignee: | Emoly Liu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | llnl, patch | ||
| Environment: |
Linux kernel 2.6.32-504.16.2.1chaos.ch5.3.x86_64.debug |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
After building lustre, attempted to run in-tree on a single node. Ran FSTYPE=zfs llmount.sh mkfs.lustre completed without errors. Lustre: Lustre: Build Version: v2_7_54_0-g99fd511-CHANGED-2.6.32-504.16.2.1chaos.ch5.3.x86_64.debug LNet: Added LNI 192.168.122.12@tcp [8/256/0/180] LNet: Accept secure, port 988 Lustre: Echo OBD driver; http://www.lustre.org/ zpool used greatest stack depth: 2928 bytes left txg_sync used greatest stack depth: 2872 bytes left BUG: spinlock bad magic on CPU#0, mount.lustre/4152 (Not tainted) lock: ffff88002713a770, .magic: 00000000, .owner: mount.lustre/4152, .owner_cpu: 0 Pid: 4152, comm: mount.lustre Not tainted 2.6.32-504.16.2.1chaos.ch5.3.x86_64.debug #1 Call Trace: [<ffffffff812c3d2a>] ? spin_bug+0xaa/0x100 [<ffffffff812c3de5>] ? _raw_spin_unlock+0x65/0xa0 [<ffffffff81561a4b>] ? _spin_unlock+0x2b/0x40 [<ffffffffa0a8d32c>] ? lprocfs_oh_tally+0x3c/0x50 [obdclass] [<ffffffffa0e54319>] ? record_start_io+0x39/0x90 [osd_zfs] [<ffffffffa0e5601d>] ? osd_write+0x1ad/0x3a0 [osd_zfs] [<ffffffffa0ab747d>] ? dt_record_write+0x3d/0x130 [obdclass] [<ffffffffa0a97895>] ? local_oid_storage_init+0xe55/0x1410 [obdclass] [<ffffffffa11226a4>] ? mgs_fs_setup+0xa4/0x4b0 [mgs] [<ffffffff8156190b>] ? _read_unlock+0x2b/0x40 [<ffffffffa1121aaf>] ? mgs_init0+0xeff/0x17c0 [mgs] [<ffffffff8118f215>] ? kmem_cache_alloc_trace+0x1c5/0x2e0 [<ffffffff81545b30>] ? kmemleak_alloc+0x20/0xd0 [<ffffffffa111a399>] ? mgs_type_start+0x19/0x20 [mgs] [<ffffffffa1122480>] ? mgs_device_alloc+0x110/0x1f0 [mgs] [<ffffffffa0a9d19f>] ? obd_setup+0x1bf/0x290 [obdclass] [<ffffffffa0a9d477>] ? class_setup+0x207/0x870 [obdclass] [<ffffffffa0aa4bfc>] ? class_process_config+0x113c/0x2710 [obdclass] [<ffffffff8118c983>] ? cache_alloc_debugcheck_after+0xf3/0x230 [<ffffffff81545b30>] ? kmemleak_alloc+0x20/0xd0 [<ffffffff8118ffdb>] ? __kmalloc+0x21b/0x330 [<ffffffffa0aaaf98>] ? do_lcfg+0x198/0x9c0 [obdclass] [<ffffffffa0aab422>] ? do_lcfg+0x622/0x9c0 [obdclass] [<ffffffffa0aab854>] ? lustre_start_simple+0x94/0x200 [obdclass] [<ffffffffa0ae0ae1>] ? server_fill_super+0x1161/0x1690 [obdclass] [<ffffffffa0ab0c58>] ? lustre_fill_super+0x5d8/0xa80 [obdclass] [<ffffffffa0ab0680>] ? lustre_fill_super+0x0/0xa80 [obdclass] [<ffffffff811af06f>] ? get_sb_nodev+0x5f/0xa0 [<ffffffffa0aa8345>] ? lustre_get_sb+0x25/0x30 [obdclass] [<ffffffff811ae61b>] ? vfs_kern_mount+0x7b/0x1b0 [<ffffffff811ae7c2>] ? do_kern_mount+0x52/0x130 [<ffffffff811d0b0b>] ? do_mount+0x2fb/0x920 [<ffffffff811d11c0>] ? sys_mount+0x90/0xe0 [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b Lustre: Setting parameter lustre-MDT0000.mdt.identity_upcall in log lustre-MDT0000 Lustre: ctl-lustre-MDT0000: No data found on store. Initialize space Lustre: lustre-MDT0000: new disk, initializing mount.lustre used greatest stack depth: 2536 bytes left Code that triggered the BUG: void lprocfs_oh_tally(struct obd_histogram *oh, unsigned int value)
{
if (value >= OBD_HIST_MAX)
value = OBD_HIST_MAX - 1;
spin_lock(&oh->oh_lock);
oh->oh_buckets[value]++;
spin_unlock(&oh->oh_lock);
}
|
| Comments |
| Comment by Olaf Faaland [ 14/Oct/15 ] |
|
Lustre involved is: * 99fd511 LU-6765 obdecho: initialize cs_lu.ls_purge_mutex * 4a604d2 LU-6816 utils: remove libzfs_load_module() call * 4dbb82b LU-6747 osd-zfs: initialize obd_statfs in osd_statfs() * d20d17e New tag 2.7.54 * 2b5ebbb LU-6599 header: Change erroneous GPLv3 header to GPLv2 * c6aab2c LU-6068 misc: update old URLs to hpdd.intel.com * 8badb39 LU-6389 llite: restart short read/write for normal IO I will check with current master and report back in a comment whether the issue has already been resolved and ticket can be closed, or not. |
| Comment by Olaf Faaland [ 14/Oct/15 ] |
|
Running under a kernel with lock debug features enabled. |
| Comment by Olaf Faaland [ 14/Oct/15 ] |
|
Rebuild from current master and problem does not occur. False alarm, sorry. Ticket should be closed. |
| Comment by Joseph Gmitter (Inactive) [ 14/Oct/15 ] |
|
Thanks Olaf, we will close the ticket. |
| Comment by Olaf Faaland [ 14/Oct/15 ] |
|
I find it does occur with current master. Kernel doesn't report it every time. The BUG report that occurs with current master Lustre is BUG: spinlock bad magic on CPU#0, mount.lustre/4407 (Not tainted) lock: ffff880028996770, .magic: 00000000, .owner: mount.lustre/4407, .owner_cpu: 0 Pid: 4407, comm: mount.lustre Not tainted 2.6.32-504.16.2.1chaos.ch5.3.x86_64.debug #1 Call Trace: [<ffffffff812c3d2a>] ? spin_bug+0xaa/0x100 [<ffffffff812c3de5>] ? _raw_spin_unlock+0x65/0xa0 [<ffffffff81561a4b>] ? _spin_unlock+0x2b/0x40 [<ffffffffa0a8f41c>] ? lprocfs_oh_tally+0x3c/0x50 [obdclass] [<ffffffffa0e8e869>] ? record_start_io+0x39/0x90 [osd_zfs] [<ffffffffa0e908ad>] ? osd_write+0x1ad/0x3a0 [osd_zfs] [<ffffffffa0aba6bd>] ? dt_record_write+0x3d/0x130 [obdclass] [<ffffffffa0a99f34>] ? local_oid_storage_init+0xeb4/0x14a0 [obdclass] [<ffffffffa11495c4>] ? mgs_fs_setup+0xa4/0x4b0 [mgs] [<ffffffff8156190b>] ? _read_unlock+0x2b/0x40 [<ffffffffa1148a8f>] ? mgs_init0+0xecf/0x1790 [mgs] [<ffffffff81545b30>] ? kmemleak_alloc+0x20/0xd0 [<ffffffffa11416b9>] ? mgs_type_start+0x19/0x20 [mgs] [<ffffffffa11493e8>] ? mgs_device_alloc+0x98/0x140 [mgs] [<ffffffffa0a9f42f>] ? obd_setup+0x1bf/0x290 [obdclass] [<ffffffffa0a9f758>] ? class_setup+0x258/0x930 [obdclass] [<ffffffffa0aa6011>] ? class_process_config+0x1151/0x26d0 [obdclass] [<ffffffff8118c983>] ? cache_alloc_debugcheck_after+0xf3/0x230 [<ffffffff81545b30>] ? kmemleak_alloc+0x20/0xd0 [<ffffffff8118ffdb>] ? __kmalloc+0x21b/0x330 [<ffffffffa0ab1198>] ? do_lcfg+0x198/0xb60 [obdclass] [<ffffffffa0ab12cb>] ? do_lcfg+0x2cb/0xb60 [obdclass] [<ffffffffa0ab1bf4>] ? lustre_start_simple+0x94/0x200 [obdclass] [<ffffffffa0ae206e>] ? server_fill_super+0x115e/0x1688 [obdclass] [<ffffffffa0ab3e28>] ? lustre_fill_super+0x338/0x990 [obdclass] [<ffffffffa0ab3af0>] ? lustre_fill_super+0x0/0x990 [obdclass] [<ffffffff811af06f>] ? get_sb_nodev+0x5f/0xa0 [<ffffffffa0aab195>] ? lustre_get_sb+0x25/0x30 [obdclass] [<ffffffff811ae61b>] ? vfs_kern_mount+0x7b/0x1b0 [<ffffffff811ae7c2>] ? do_kern_mount+0x52/0x130 [<ffffffff811d0b0b>] ? do_mount+0x2fb/0x920 [<ffffffff811d11c0>] ? sys_mount+0x90/0xe0 [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b record_start_io() calls lprocfs_oh_tally with osd->od_brw_stats.hist. [faaland1@hefe branch:follow_master lustre] $git grep 'spin_lock_init.*oh_lock' lustre/ldlm/ldlm_lib.c: spin_lock_init(&cli->cl_read_rpc_hist.oh_lock); lustre/ldlm/ldlm_lib.c: spin_lock_init(&cli->cl_write_rpc_hist.oh_lock); lustre/ldlm/ldlm_lib.c: spin_lock_init(&cli->cl_read_page_hist.oh_lock); lustre/ldlm/ldlm_lib.c: spin_lock_init(&cli->cl_write_page_hist.oh_lock); lustre/ldlm/ldlm_lib.c: spin_lock_init(&cli->cl_read_offset_hist.oh_lock); lustre/ldlm/ldlm_lib.c: spin_lock_init(&cli->cl_write_offset_hist.oh_lock); lustre/ldlm/ldlm_lib.c: spin_lock_init(&cli->cl_mod_rpcs_hist.oh_lock); lustre/mdt/mdt_lproc.c: spin_lock_init(&mdt->mdt_rename_stats.hist[i].oh_lock); lustre/osd-ldiskfs/osd_lproc.c: spin_lock_init(&osd->od_brw_stats.hist[i].oh_lock); The ldiskfs OSD initializes the locks via But the zfs OSD returns from osd_device_init() immediately without doing anything. It's not clear to me where it should be initialized, if that's not the place. |
| Comment by Joseph Gmitter (Inactive) [ 15/Oct/15 ] |
|
Hi Emoly, |
| Comment by Gerrit Updater [ 22/Oct/15 ] |
|
Olaf Faaland-LLNL (faaland1@llnl.gov) uploaded a new patch: http://review.whamcloud.com/16919 |
| Comment by Olaf Faaland [ 22/Oct/15 ] |
|
Somehow I'd overlooked osd-zfs/osd_lproc.c:osd_stats_init(). Patch submitted. |
| Comment by Gerrit Updater [ 04/Dec/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16919/ |
| Comment by Joseph Gmitter (Inactive) [ 04/Dec/15 ] |
|
Landed for 2.8 |