[LU-4878] fld_server_lookup() ASSERTION( fld->lsf_control_exp ) failed Created: 10/Apr/14 Updated: 15/May/14 Resolved: 15/May/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Gregoire Pichon | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | mn4 | ||
| Issue Links: |
|
||||||||||||
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 13490 | ||||||||||||
| Description |
|
The following LBUG appeared at a customer site during the mount process on all OSSs running Lustre 2.4.3.

LustreError: 12838:0:(fld_handler.c:172:fld_server_lookup()) ASSERTION(fld->lsf_control_exp ) failed:
LustreError: 12838:0:(fld_handler.c:172:fld_server_lookup()) LBUG
Pid: 12838, comm: mount.lustre
Call Trace:
libcfs_debug_dumpstack+0x55/0x80 [libcfs]
lbug_with_loc+0x47/0xb0 [libcfs]
fld_server_lookup+0x2f7/0x3d0 [fld]
osd_fld_lookup+0x71/0x1d0 [osd_ldiskfs]
osd_remote_fid+0x9a/0x280 [osd_ldiskfs]
osd_index_ea_lookup+0521/0x850 [osd_ldiskfs]
dt_lookup_dir+0x6f/0x130 [obdclass]
llog_osd_open+0x485/0xc00 [obdclass]
llog_open+0xba/0x2c0 [obdclass]
mgc_process_log [mgc]
mgc_process_config [mgc]
lustre_process_log [obdclass]
server_start_targets [obdclass]
server_fill_super [obdclass]
lustre_fill_super[obdclass]
get_sb_nodev
lustre_get_sb
vfs_kern_mount
do_kern_mount
do_mount
sys_mount
system_call_fastpath

This issue seems the same as |
| Comments |
| Comment by Bruno Faccini (Inactive) [ 10/Apr/14 ] |
|
Hello Gregoire, |
| Comment by Bob Glossman (Inactive) [ 10/Apr/14 ] |
|
|
| Comment by Antoine Percher [ 10/Apr/14 ] |
|
Yes, we hit the same LBUG on all OSSs, with or without abort-recov. |
| Comment by Gregoire Pichon [ 14/Apr/14 ] |
|
I have tested the patch #9929 posted in gerrit. Unfortunately the OSS still crashes when mounting OST. Here is the stack:

<3>LustreError: 6577:0:(fld_handler.c:174:fld_server_lookup()) srv-fs_pv-OST0000: lookup 0x7d, but not connects to MDT0yet: rc = -5.
<3>LustreError: 6577:0:(osd_handler.c:2135:osd_fld_lookup()) fs_pv-OST0000-osd: cannot find FLD range for 0x7d: rc = -5
<3>LustreError: 6577:0:(osd_handler.c:3344:osd_mdt_seq_exists()) fs_pv-OST0000-osd: Can not lookup fld for 0x7d
<0>LustreError: 6577:0:(osd_handler.c:2651:osd_object_ref_del()) ASSERTION( inode->i_nlink > 0 ) failed:
<0>LustreError: 6577:0:(osd_handler.c:2651:osd_object_ref_del()) LBUG
<4>Pid: 6577, comm: mount.lustre
<4>
<4>Call Trace:
<4> [<ffffffffa0d57895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa0d57e97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa04197e7>] osd_object_ref_del+0x1e7/0x220 [osd_ldiskfs]
<4> [<ffffffffa0ec1fee>] llog_osd_destroy+0x48e/0xb20 [obdclass]
<4> [<ffffffffa0e91d61>] llog_destroy+0x51/0x170 [obdclass]
<4> [<ffffffffa0e96b34>] llog_erase+0x1c4/0x1e0 [obdclass]
<4> [<ffffffffa0e97401>] llog_backup+0x231/0x500 [obdclass]
<4> [<ffffffffa049ad66>] mgc_process_log+0x1636/0x18f0 [mgc]
<4> [<ffffffffa049c514>] mgc_process_config+0x594/0xed0 [mgc]
<4> [<ffffffffa0ede64c>] lustre_process_log+0x25c/0xaa0 [obdclass]
<4> [<ffffffffa0f126d3>] server_start_targets+0x1833/0x19c0 [obdclass]
<4> [<ffffffffa0f1340c>] server_fill_super+0xbac/0x1660 [obdclass]
<4> [<ffffffffa0ee3d68>] lustre_fill_super+0x1d8/0x530 [obdclass]
<4> [<ffffffff8118c7df>] get_sb_nodev+0x5f/0xa0
<4> [<ffffffffa0edb3b5>] lustre_get_sb+0x25/0x30 [obdclass]
<4> [<ffffffff8118be3b>] vfs_kern_mount+0x7b/0x1b0
<4> [<ffffffff8118bfe2>] do_kern_mount+0x52/0x130
<4> [<ffffffff811acfeb>] do_mount+0x2fb/0x930
<4> [<ffffffff811ad6b0>] sys_mount+0x90/0xe0
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4> |
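Editorial sketch: judging only from the log in the comment above, with patch #9929 applied fld_server_lookup() appears to report rc = -5 (-EIO on Linux) when the OST is not yet connected to MDT0, where the unpatched code asserted on fld->lsf_control_exp. The user-space C fragment below illustrates that general pattern (return an error instead of asserting). It is not the actual patch: struct fld_sketch and fld_lookup_sketch() are hypothetical names, and only the field name lsf_control_exp is taken from the ticket.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the server FLD state. Only the field name
 * lsf_control_exp comes from the assertion quoted in the ticket. */
struct fld_sketch {
    const char *lsf_name;        /* target name used in messages */
    void       *lsf_control_exp; /* NULL until the link to MDT0 exists */
};

/* Return -EIO to the caller instead of asserting when the control
 * export is missing, so the caller can decide how to handle it. */
int fld_lookup_sketch(const struct fld_sketch *fld, unsigned long long seq)
{
    if (fld->lsf_control_exp == NULL) {
        fprintf(stderr, "%s: lookup %#llx, not connected to MDT0 yet: rc = %d\n",
                fld->lsf_name, seq, -EIO);
        return -EIO;
    }

    /* A real lookup would query MDT0 here; the sketch reports success. */
    return 0;
}
```

Calling fld_lookup_sketch() on a structure whose lsf_control_exp is NULL returns -EIO rather than aborting. As the second LBUG in the log shows, every caller on the path then has to cope with that error.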
| Comment by Bruno Faccini (Inactive) [ 15/Apr/14 ] |
|
Hello Gregoire, |
| Comment by Gregoire Pichon [ 15/Apr/14 ] |
|
The LBUG I hit when testing patch #9929 has the same stack trace as In my case, I was upgrading from 2.4.2 to 2.4.3 with a few additional patches including #5049 " Patch #7673 " |
| Comment by Bruno Faccini (Inactive) [ 15/Apr/14 ] |
|
You may have missed my previous update that already confirmed what you finally found! |
| Comment by Gregoire Pichon [ 15/Apr/14 ] |
|
Thanks for the backport, Bruno. Our comments interleaved! I have tested Lustre 2.4.3 with both additional patches
The OSS is able to start without any problem. The filesystem is operational. I am now waiting for these patches to be fully approved and Maloo tested so they can be delivered to the customer. |
| Comment by Bruno Faccini (Inactive) [ 16/Apr/14 ] |
|
Hello Gregoire, |
| Comment by Gregoire Pichon [ 16/Apr/14 ] |
|
Hello Bruno, Actually the patch #5049 " These problems occurred in the Lustre 2.4.x release and need to be addressed. |
| Comment by Bruno Faccini (Inactive) [ 16/Apr/14 ] |
|
Gregoire, don't misunderstand me: I did not mean that you added patches without good reason, only that by doing so you fall outside our regression/interop testing process. |
| Comment by Bruno Faccini (Inactive) [ 05/May/14 ] |
|
Hello Gregoire, |
| Comment by Gregoire Pichon [ 06/May/14 ] |
|
Hi Bruno, Yes, this ticket can be closed, since our tests have shown the issue is fixed with patches #9929 and #9958. |
| Comment by Peter Jones [ 15/May/14 ] |
|
Yes this would be under consideration for 2.4.4. |