[LU-3915] After upgrade from 2.4.0 to 2.5, can not mount OST, (osd_handler.c:2668:osd_object_ref_del()) LBUG Pid: 9537, comm: mount.lustre Created: 09/Sep/13 Updated: 15/Apr/14 Resolved: 24/Sep/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.5.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Sarah Liu | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HB, mn4 | ||
| Environment: |
before upgrade, server and client: 2.4.0 |
||
| Severity: | 3 |
| Rank (Obsolete): | 10328 |
| Description |
|
After upgrading the server and client to 2.5, mounting the OST hit the following error:

Lustre: DEBUG MARKER: == upgrade-downgrade End == 12:43:46 (1378755826)
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts:
LustreError: 13a-8: Failed to get MGS log params and no local copy.
LustreError: 9537:0:(fld_handler.c:150:fld_server_lookup()) srv-lustre-OST0000: lookup 0x54, but not connects to MDT0yet: rc = -5.
LustreError: 9537:0:(osd_handler.c:2125:osd_fld_lookup()) lustre-OST0000-osd: cannot find FLD range for 0x54: rc = -5
LustreError: 9537:0:(osd_handler.c:3344:osd_mdt_seq_exists()) lustre-OST0000-osd: Can not lookup fld for 0x54
LustreError: 9537:0:(osd_handler.c:2668:osd_object_ref_del()) ASSERTION( inode->i_nlink > 0 ) failed:
LustreError: 9537:0:(osd_handler.c:2668:osd_object_ref_del()) LBUG
Pid: 9537, comm: mount.lustre

Call Trace:
[<ffffffffa044c895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa044ce97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0cdd1d7>] osd_object_ref_del+0x1e7/0x220 [osd_ldiskfs]
[<ffffffffa0586c0e>] llog_osd_destroy+0x48e/0xb20 [obdclass]
[<ffffffffa0558d51>] llog_destroy+0x51/0x170 [obdclass]
[<ffffffffa055d8e4>] llog_erase+0x1c4/0x1e0 [obdclass]
[<ffffffffa055e141>] llog_backup+0x231/0x500 [obdclass]
[<ffffffff81281b10>] ? sprintf+0x40/0x50
[<ffffffffa0d69b79>] mgc_process_log+0x1629/0x18e0 [mgc]
[<ffffffffa0d63350>] ? mgc_blocking_ast+0x0/0x800 [mgc]
[<ffffffffa070fb90>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
[<ffffffffa0d6b3c2>] mgc_process_config+0x5f2/0x1120 [mgc]
[<ffffffffa05a3016>] lustre_process_log+0x256/0xa60 [obdclass]
[<ffffffffa0572872>] ? class_name2dev+0x42/0xe0 [obdclass]
[<ffffffff81168013>] ? kmem_cache_alloc_trace+0x1a3/0x1b0
[<ffffffffa057291e>] ? class_name2obd+0xe/0x30 [obdclass]
[<ffffffffa05d5b67>] server_start_targets+0x1c57/0x1e10 [obdclass]
[<ffffffffa05a660b>] ? lustre_start_mgc+0x48b/0x1e10 [obdclass]
[<ffffffffa059e5b0>] ? class_config_llog_handler+0x0/0x1880 [obdclass]
[<ffffffffa05dab6c>] server_fill_super+0xbac/0x19e4 [obdclass]
[<ffffffffa05a8168>] lustre_fill_super+0x1d8/0x530 [obdclass]
[<ffffffffa05a7f90>] ? lustre_fill_super+0x0/0x530 [obdclass]
[<ffffffff811845af>] get_sb_nodev+0x5f/0xa0
[<ffffffffa059ff35>] lustre_get_sb+0x25/0x30 [obdclass]
[<ffffffff81183beb>] vfs_kern_mount+0x7b/0x1b0
[<ffffffff81183d92>] do_kern_mount+0x52/0x130
[<ffffffff811a3f52>] do_mount+0x2d2/0x8d0
[<ffffffff811a45e0>] sys_mount+0x90/0xe0
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

Message from syslogd@wtm-88 at Sep 9 12:43:57 ...
kernel:LustreError: 9537:0:(osd_handler.c:2668:osd_object_ref_del()) ASSERTION( inode->i_nlink > 0 ) failed:

Kernel panic - not syncing: LBUG
Pid: 9537, comm: mount.lustre Not tainted 2.6.32-358.18.1.el6_lustre.x86_64 #1
Call Trace:
[<ffffffff8150de58>] ? panic+0xa7/0x16f
[<ffffffffa044ceeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
[<ffffffffa0cdd1d7>] ? osd_object_ref_del+0x1e7/0x220 [osd_ldiskfs]
[<ffffffffa0586c0e>] ? llog_osd_destroy+0x48e/0xb20 [obdclass]
[<ffffffffa0558d51>] ? llog_destroy+0x51/0x170 [obdclass]
[<ffffffffa055d8e4>] ? llog_erase+0x1c4/0x1e0 [obdclass]
[<ffffffffa055e141>] ? llog_backup+0x231/0x500 [obdclass]
[<ffffffff81281b10>] ? sprintf+0x40/0x50
[<ffffffffa0d69b79>] ? mgc_process_log+0x1629/0x18e0 [mgc]
[<ffffffffa0d63350>] ? mgc_blocking_ast+0x0/0x800 [mgc]
[<ffffffffa070fb90>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
[<ffffffffa0d6b3c2>] ? mgc_process_config+0x5f2/0x1120 [mgc]
[<ffffffffa05a3016>] ? lustre_process_log+0x256/0xa60 [obdclass]
[<ffffffffa0572872>] ? class_name2dev+0x42/0xe0 [obdclass]
[<ffffffff81168013>] ? kmem_cache_alloc_trace+0x1a3/0x1b0
[<ffffffffa057291e>] ? class_name2obd+0xe/0x30 [obdclass]
[<ffffffffa05d5b67>] ? server_start_targets+0x1c57/0x1e10 [obdclass]
[<ffffffffa05a660b>] ? lustre_start_mgc+0x48b/0x1e10 [obdclass]
[<ffffffffa059e5b0>] ? class_config_llog_handler+0x0/0x1880 [obdclass]
[<ffffffffa05dab6c>] ? server_fill_super+0xbac/0x19e4 [obdclass]
[<ffffffffa05a8168>] ? lustre_fill_super+0x1d8/0x530 [obdclass]
[<ffffffffa05a7f90>] ? lustre_fill_super+0x0/0x530 [obdclass]
[<ffffffff811845af>] ? get_sb_nodev+0x5f/0xa0
[<ffffffffa059ff35>] ? lustre_get_sb+0x25/0x30 [obdclass]
[<ffffffff81183beb>] ? vfs_kern_mount+0x7b/0x1b0
[<ffffffff81183d92>] ? do_kern_mount+0x52/0x130
[<ffffffff811a3f52>] ? do_mount+0x2d2/0x8d0
[<ffffffff811a45e0>] ? sys_mount+0x90/0xe0
[<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b

Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.32-358.18.1.el6_lustre.x86_64 (jenkins@builder-4-sde1-el6-x8664.lab.whamcloud.com) (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Tue Sep 3 04:11:46 PDT 2013
Command line: ro root=UUID=64a1c9cd-640f-46ab-919e-9f5419f9e521 rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD console=ttyS0,115200 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off memmap=exactmap memmap=574K@4K memmap=134574K@49726K elfcorehdr=184300K memmap=4K$0K memmap=62K$578K memmap=128K$896K memmap=488K#3031140K memmap=68K$3061492K memmap=36K$3061864K memmap=12K$3061924K memmap=8K$3062120K memmap=12K$3062168K memmap=52K$3062372K memmap=16K$3062680K memmap=60K$3063816K memmap=3076K$3064340K memmap=160K$3067448K memmap=2048K$3106676K memmap=1184K$3109628K memmap=656K#3110812K memmap=4K#3111468K memmap=476K#3111472K memmap=4K#3111948K memmap=8K#3111952K memmap=4K#3111960K memmap=4K#3111964K memmap=108K#3111968K memmap=564K#3112076K memmap=294912K$3112960K memmap=4K$4173824K memmap=4K$4174948K memmap=16K$4174960K memmap=4K$4175872K memmap=6016K$4188288K
KERNEL supported cpus:
Intel GenuineIntel
AMD AuthenticAMD
Centaur CentaurHauls
BIOS-provided physical RAM map: |
| Comments |
| Comment by Andreas Dilger [ 10/Sep/13 ] |
|
|
| Comment by Jodi Levi (Inactive) [ 10/Sep/13 ] |
|
Mike, |
| Comment by Jodi Levi (Inactive) [ 10/Sep/13 ] |
|
Sarah, |
| Comment by Sarah Liu [ 12/Sep/13 ] |
|
Here is the patch that reverts 5049: http://review.whamcloud.com/#/c/7624/ |
| Comment by James A Simmons [ 12/Sep/13 ] |
|
The revert will work; I ran into this problem before. However, I would recommend not reverting the patch for 2.5, since that breaks support for ZFS on the MGS. The problem is that the config log on the MGS is now in the OSD format instead of using fsfilt, but a 2.4 MGT will still be in the old config format. If we had a way to convert the format, this wouldn't be a problem. The other solution is to rebuild the config log. |
| Comment by Andreas Dilger [ 13/Sep/13 ] |
|
James, could you please clarify what you mean by "config log is now in OSD format" vs. "old config format"? Is this a change in the FID for the llog object, or how the config logs are named, or the name of the devices inside the config logs or what? |
| Comment by Mikhail Pershin [ 16/Sep/13 ] |
|
We could replace the assertion with a simple check for nlink == 0 and assume this is a possible case after an upgrade. But I wonder: shouldn't lfsck repair such files on start, or does this happen too early to be fixed by lfsck? |
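As an illustration of the idea in this comment only, here is a minimal userspace sketch of replacing the `i_nlink > 0` assertion with a check for nlink == 0. It is not the change at http://review.whamcloud.com/7673 and not Lustre code: `struct mock_inode` and `mock_ref_del()` are hypothetical names invented for this example, and returning 0 for the zero-link case is an assumption about how such a case might be tolerated.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel inode; only the link count matters here. */
struct mock_inode {
	unsigned int i_nlink;
};

/*
 * Drop one link from the inode.
 *
 * Instead of asserting that i_nlink > 0, a zero link count is treated as a
 * tolerated on-disk state (for example an object left over from an upgrade):
 * a message is logged, the decrement is skipped, and 0 is returned.
 */
int mock_ref_del(struct mock_inode *inode)
{
	assert(inode != NULL);

	if (inode->i_nlink == 0) {
		fprintf(stderr, "ref_del: nlink == 0, skipping decrement\n");
		return 0;
	}

	inode->i_nlink--;
	return 0;
}
```

With this shape, a caller that destroys an object whose link count is already zero gets a logged message rather than an LBUG, and the count never underflows. Whether such an object should later be repaired (for example by lfsck, as asked above) is a separate question that the sketch does not address.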
| Comment by Mikhail Pershin [ 16/Sep/13 ] |
|
http://review.whamcloud.com/7673 |
| Comment by Sarah Liu [ 16/Sep/13 ] |
|
Upgraded from 2.4.1 to build http://build.whamcloud.com/job/lustre-reviews/18150/, which reverts patch 5049; the test passed. |
| Comment by James A Simmons [ 17/Sep/13 ] |
|
In the past, Andreas, I would delete the config logs on the MGS to get around the issue of not being able to mount a file system built before
| Comment by Peter Jones [ 24/Sep/13 ] |
|
Landed for 2.5.0 |