[LU-3126] conf-sanity test_41b: fld_server_lookup()) ASSERTION( fld->lsf_control_exp ) failed Created: 08/Apr/13  Updated: 15/Apr/14  Resolved: 26/Aug/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: Lustre 2.5.0

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: mn4, yuc2

Issue Links:
Related
is related to LU-4878 fld_server_lookup() ASSERTION( fld->l... Resolved
is related to LU-3582 Runtests failed: old and new files ar... Resolved
Severity: 3
Rank (Obsolete): 7596

 Description   

This issue was created by maloo for bfaccini <bruno.faccini@intel.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/5963ea2c-9e4a-11e2-9d68-52540035b04c.

The sub-test test_41b failed with the following error:

test failed to respond and timed out

Info required for matching: conf-sanity 41b

An LBUG occurred on the OSS side:

09:23:10:LustreError: 19905:0:(fld_handler.c:173:fld_server_lookup()) ASSERTION( fld->lsf_control_exp ) failed: 
09:23:10:LustreError: 19905:0:(fld_handler.c:173:fld_server_lookup()) LBUG
09:23:10:Pid: 19905, comm: ll_ost00_001
09:23:10:
09:23:10:Call Trace:
09:23:13: [<ffffffffa04d7895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
09:23:13: [<ffffffffa04d7e97>] lbug_with_loc+0x47/0xb0 [libcfs]
09:23:13: [<ffffffffa09bfe0f>] fld_server_lookup+0x2df/0x3b0 [fld]
09:23:13: [<ffffffffa0d3b53e>] osd_fld_lookup+0xae/0x1e0 [osd_ldiskfs]
09:23:13: [<ffffffffa0d4e902>] fid_is_on_ost+0x102/0x3b0 [osd_ldiskfs]
09:23:13: [<ffffffffa0d5083a>] osd_oi_lookup+0xca/0x150 [osd_ldiskfs]
09:23:13: [<ffffffffa0d4c810>] osd_object_init+0x4c0/0xa40 [osd_ldiskfs]
09:23:13: [<ffffffffa066557d>] lu_object_alloc+0xcd/0x300 [obdclass]
09:23:13: [<ffffffffa06658f9>] ? htable_lookup+0x119/0x1c0 [obdclass]
09:23:13: [<ffffffffa06660e5>] lu_object_find_at+0x205/0x360 [obdclass]
09:23:13: [<ffffffffa0666256>] lu_object_find+0x16/0x20 [obdclass]
09:23:13: [<ffffffffa0e25f15>] ofd_object_find+0x35/0xf0 [ofd]
09:23:13: [<ffffffffa0e27423>] ofd_precreate_objects+0x1d3/0x1360 [ofd]
09:23:13: [<ffffffffa04e2d88>] ? libcfs_log_return+0x28/0x40 [libcfs]
09:23:14: [<ffffffffa0e1cf72>] ofd_create+0x322/0x1470 [ofd]
09:23:14: [<ffffffffa07e7ff5>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
09:23:14: [<ffffffffa0df247c>] ost_handle+0x356c/0x46f0 [ost]
09:23:14: [<ffffffffa04e40e4>] ? libcfs_id2str+0x74/0xb0 [libcfs]
09:23:14: [<ffffffffa07f91dc>] ptlrpc_server_handle_request+0x41c/0xdf0 [ptlrpc]
09:23:14: [<ffffffffa04d85de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
09:23:14: [<ffffffffa07f0819>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
09:23:15: [<ffffffffa04e82c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
09:23:15: [<ffffffff81052223>] ? __wake_up+0x53/0x70
09:23:16: [<ffffffffa07fa725>] ptlrpc_main+0xb75/0x1870 [ptlrpc]
09:23:16: [<ffffffffa07f9bb0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
09:23:16: [<ffffffff8100c0ca>] child_rip+0xa/0x20
09:23:16: [<ffffffffa07f9bb0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
09:23:16: [<ffffffffa07f9bb0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
09:23:17: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
09:23:18:
09:23:18:Kernel panic - not syncing: LBUG
09:23:18:Pid: 19905, comm: ll_ost00_001 Not tainted 2.6.32-279.19.1.el6_lustre.gc4681d8.x86_64 #1
09:23:18:Call Trace:
09:23:18: [<ffffffff814e9811>] ? panic+0xa0/0x168
09:23:18: [<ffffffffa04d7eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
09:23:18: [<ffffffffa09bfe0f>] ? fld_server_lookup+0x2df/0x3b0 [fld]
09:23:18: [<ffffffffa0d3b53e>] ? osd_fld_lookup+0xae/0x1e0 [osd_ldiskfs]
09:23:18: [<ffffffffa0d4e902>] ? fid_is_on_ost+0x102/0x3b0 [osd_ldiskfs]
09:23:18: [<ffffffffa0d5083a>] ? osd_oi_lookup+0xca/0x150 [osd_ldiskfs]
09:23:18: [<ffffffffa0d4c810>] ? osd_object_init+0x4c0/0xa40 [osd_ldiskfs]
09:23:18: [<ffffffffa066557d>] ? lu_object_alloc+0xcd/0x300 [obdclass]
09:23:18: [<ffffffffa06658f9>] ? htable_lookup+0x119/0x1c0 [obdclass]
09:23:19: [<ffffffffa06660e5>] ? lu_object_find_at+0x205/0x360 [obdclass]
09:23:19: [<ffffffffa0666256>] ? lu_object_find+0x16/0x20 [obdclass]
09:23:19: [<ffffffffa0e25f15>] ? ofd_object_find+0x35/0xf0 [ofd]
09:23:20: [<ffffffffa0e27423>] ? ofd_precreate_objects+0x1d3/0x1360 [ofd]
09:23:20: [<ffffffffa04e2d88>] ? libcfs_log_return+0x28/0x40 [libcfs]
09:23:20: [<ffffffffa0e1cf72>] ? ofd_create+0x322/0x1470 [ofd]
09:23:21: [<ffffffffa07e7ff5>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
09:23:21: [<ffffffffa0df247c>] ? ost_handle+0x356c/0x46f0 [ost]
09:23:21: [<ffffffffa04e40e4>] ? libcfs_id2str+0x74/0xb0 [libcfs]
09:23:22: [<ffffffffa07f91dc>] ? ptlrpc_server_handle_request+0x41c/0xdf0 [ptlrpc]
09:23:23: [<ffffffffa04d85de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
09:23:23: [<ffffffffa07f0819>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
09:23:23: [<ffffffffa04e82c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
09:23:23: [<ffffffff81052223>] ? __wake_up+0x53/0x70
09:23:23: [<ffffffffa07fa725>] ? ptlrpc_main+0xb75/0x1870 [ptlrpc]
09:23:23: [<ffffffffa07f9bb0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
09:23:23: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
09:23:23: [<ffffffffa07f9bb0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
09:23:23: [<ffffffffa07f9bb0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
09:23:23: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
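
The trace shows an OST precreate request walking osd_object_init() -> fid_is_on_ost() -> osd_fld_lookup() -> fld_server_lookup(), which LBUGs because the server FLD has neither a local index object nor a control export to fall back on. As a minimal, self-contained sketch (not the Lustre source; all types and names below are stand-ins modelled on the identifiers in the trace), the defensive pattern for this class of crash is to return an error to the caller instead of asserting when lsf_control_exp was never set up; the actual fix is tracked in the patch linked in the comments below.

#include <stdio.h>
#include <errno.h>
#include <stddef.h>

struct obd_export;                          /* opaque stand-in */

struct lu_server_fld {
        void              *lsf_obj;         /* local FLD index object, if any */
        struct obd_export *lsf_control_exp; /* export used for remote FLD RPCs */
};

/* Hypothetical lookup: fail with -EIO instead of panicking when no control
 * export exists (e.g. on an OST that never connected one). */
static int fld_server_lookup_sketch(struct lu_server_fld *fld,
                                    unsigned long long seq)
{
        if (fld->lsf_obj != NULL)
                return 0;                   /* local index lookup would go here */

        if (fld->lsf_control_exp == NULL) {
                fprintf(stderr, "no control export for seq %#llx\n", seq);
                return -EIO;                /* graceful failure, no LBUG */
        }

        return 0;                           /* remote FLD_LOOKUP RPC would go here */
}

int main(void)
{
        struct lu_server_fld fld = { NULL, NULL };
        printf("lookup rc = %d\n", fld_server_lookup_sketch(&fld, 0x200000401ULL));
        return 0;
}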


 Comments   
Comment by Li Wei (Inactive) [ 15/Apr/13 ]

https://maloo.whamcloud.com/test_sets/d29c0fe6-a18d-11e2-8fc0-52540035b04c

Comment by John Hammond [ 19/Jul/13 ]

https://maloo.whamcloud.com/test_logs/e7567198-f024-11e2-b957-52540035b04c

Comment by Sarah Liu [ 31/Jul/13 ]

Hit this issue when running conf-sanity test_32a with the 2.4 ldiskfs image:

https://maloo.whamcloud.com/test_sets/8feeb8cc-f9c3-11e2-aee1-52540035b04c

Comment by Sarah Liu [ 01/Aug/13 ]

Hit this error when upgrading from 2.4.0 to 2.5:

Lustre: DEBUG MARKER: == upgrade-downgrade End == 15:07:57 (1375394877)
LDISKFS-fs (sdc1): mounted filesystem with ordered data mode. quota=on. Opts: 
LustreError: 7916:0:(fld_handler.c:147:fld_server_lookup()) ASSERTION( fld->lsf_control_exp ) failed: 
LustreError: 7916:0:(fld_handler.c:147:fld_server_lookup()) LBUG
Pid: 7916, comm: mount.lustre

Call Trace:
 [<ffffffffa0375895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0375e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa083fe0f>] fld_server_lookup+0x2ef/0x3d0 [fld]
 [<ffffffff8119c88e>] ? generic_detach_inode+0x18e/0x1f0
 [<ffffffffa0c693d1>] osd_fld_lookup+0x71/0x1d0 [osd_ldiskfs]
 [<ffffffff8119c6f2>] ? iput+0x62/0x70
 [<ffffffffa0c695ca>] osd_remote_fid+0x9a/0x280 [osd_ldiskfs]
 [<ffffffffa0c75621>] osd_index_ea_lookup+0x521/0x850 [osd_ldiskfs]
 [<ffffffffa04d282f>] dt_lookup_dir+0x6f/0x130 [obdclass]
 [<ffffffffa04b0fb5>] llog_osd_open+0x475/0xbb0 [obdclass]
 [<ffffffffa047d31a>] llog_open+0xba/0x2c0 [obdclass]
 [<ffffffffa0480f71>] llog_backup+0x61/0x500 [obdclass]
 [<ffffffff81281860>] ? sprintf+0x40/0x50
 [<ffffffffa0cf9702>] mgc_process_log+0x1192/0x18e0 [mgc]
 [<ffffffffa0cf3370>] ? mgc_blocking_ast+0x0/0x800 [mgc]
 [<ffffffffa0633c40>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
 [<ffffffffa0cfb2e4>] mgc_process_config+0x594/0xed0 [mgc]
 [<ffffffffa04c6776>] lustre_process_log+0x256/0xaa0 [obdclass]
 [<ffffffffa0495972>] ? class_name2dev+0x42/0xe0 [obdclass]
 [<ffffffff81167d83>] ? kmem_cache_alloc_trace+0x1a3/0x1b0
 [<ffffffffa0495a1e>] ? class_name2obd+0xe/0x30 [obdclass]
 [<ffffffffa04fa641>] server_start_targets+0x1821/0x1a40 [obdclass]
 [<ffffffffa04c9db3>] ? lustre_start_mgc+0x493/0x1e90 [obdclass]
 [<ffffffffa04c1ca0>] ? class_config_llog_handler+0x0/0x1880 [obdclass]
 [<ffffffffa04fe1fc>] server_fill_super+0xbbc/0x1a24 [obdclass]
 [<ffffffffa04cb988>] lustre_fill_super+0x1d8/0x530 [obdclass]
 [<ffffffffa04cb7b0>] ? lustre_fill_super+0x0/0x530 [obdclass]
 [<ffffffff8118431f>] get_sb_nodev+0x5f/0xa0
 [<ffffffffa04c3625>] lustre_get_sb+0x25/0x30 [obdclass]
 [<ffffffff8118395b>] vfs_kern_mount+0x7b/0x1b0
 [<ffffffff81183b02>] do_kern_mount+0x52/0x130
 [<ffffffff811a3d32>] do_mount+0x2d2/0x8d0
 [<ffffffff811a43c0>] sys_mount+0x90/0xe0
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

Kernel panic - not syncing: LBUG
Pid: 7916, comm: mount.lustre Not tainted 2.6.32-358.11.1.el6_lustre.g55605c6.x86_64 #1
Call Trace:
 [<ffffffff8150d938>] ? panic+0xa7/0x16f
 [<ffffffffa0375eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
 [<ffffffffa083fe0f>] ? fld_server_lookup+0x2ef/0x3d0 [fld]
 [<ffffffff8119c88e>] ? generic_detach_inode+0x18e/0x1f0
 [<ffffffffa0c693d1>] ? osd_fld_lookup+0x71/0x1d0 [osd_ldiskfs]
 [<ffffffff8119c6f2>] ? iput+0x62/0x70
 [<ffffffffa0c695ca>] ? osd_remote_fid+0x9a/0x280 [osd_ldiskfs]
 [<ffffffffa0c75621>] ? osd_index_ea_lookup+0x521/0x850 [osd_ldiskfs]
 [<ffffffffa04d282f>] ? dt_lookup_dir+0x6f/0x130 [obdclass]
 [<ffffffffa04b0fb5>] ? llog_osd_open+0x475/0xbb0 [obdclass]
 [<ffffffffa047d31a>] ? llog_open+0xba/0x2c0 [obdclass]
 [<ffffffffa0480f71>] ? llog_backup+0x61/0x500 [obdclass]
 [<ffffffff81281860>] ? sprintf+0x40/0x50
 [<ffffffffa0cf9702>] ? mgc_process_log+0x1192/0x18e0 [mgc]
 [<ffffffffa0cf3370>] ? mgc_blocking_ast+0x0/0x800 [mgc]
 [<ffffffffa0633c40>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
 [<ffffffffa0cfb2e4>] ? mgc_process_config+0x594/0xed0 [mgc]
 [<ffffffffa04c6776>] ? lustre_process_log+0x256/0xaa0 [obdclass]
 [<ffffffffa0495972>] ? class_name2dev+0x42/0xe0 [obdclass]
 [<ffffffff81167d83>] ? kmem_cache_alloc_trace+0x1a3/0x1b0
 [<ffffffffa0495a1e>] ? class_name2obd+0xe/0x30 [obdclass]
 [<ffffffffa04fa641>] ? server_start_targets+0x1821/0x1a40 [obdclass]
 [<ffffffffa04c9db3>] ? lustre_start_mgc+0x493/0x1e90 [obdclass]
 [<ffffffffa04c1ca0>] ? class_config_llog_handler+0x0/0x1880 [obdclass]
 [<ffffffffa04fe1fc>] ? server_fill_super+0xbbc/0x1a24 [obdclass]
 [<ffffffffa04cb988>] ? lustre_fill_super+0x1d8/0x530 [obdclass]
 [<ffffffffa04cb7b0>] ? lustre_fill_super+0x0/0x530 [obdclass]
 [<ffffffff8118431f>] ? get_sb_nodev+0x5f/0xa0
 [<ffffffffa04c3625>] ? lustre_get_sb+0x25/0x30 [obdclass]
 [<ffffffff8118395b>] ? vfs_kern_mount+0x7b/0x1b0
 [<ffffffff81183b02>] ? do_kern_mount+0x52/0x130
 [<ffffffff811a3d32>] ? do_mount+0x2d2/0x8d0
 [<ffffffff811a43c0>] ? sys_mount+0x90/0xe0
 [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Comment by Cliff White (Inactive) [ 07/Aug/13 ]

Hit this error when attempting to mount a new filesystem on an OSS on Hyperion:

2013-08-06 13:50:42 Lustre: Lustre: Build Version: jenkins-arch=x86_64,build_type=server,distro=el6,ib_stack=inkernel-1594-gbdf591f-PRISTINE-2.6.32-358.11.1.el6_lustre.gbdf591f.x86_64
2013-08-06 13:50:43 LDISKFS-fs warning (device sdc): ldiskfs_multi_mount_protect: MMP interval 42 higher than expected, please wait.
2013-08-06 13:50:43
2013-08-06 13:51:40 LDISKFS-fs (sdc): recovery complete
2013-08-06 13:51:40 LDISKFS-fs (sdc): mounted filesystem with ordered data mode. quota=on. Opts:
2013-08-06 13:51:41 LustreError: 4864:0:(fld_handler.c:147:fld_server_lookup()) ASSERTION( fld->lsf_control_exp ) failed:
2013-08-06 13:51:41 LustreError: 4864:0:(fld_handler.c:147:fld_server_lookup()) LBUG
2013-08-06 13:51:41 Pid: 4864, comm: mount.lustre
2013-08-06 13:51:41
2013-08-06 13:51:41 Call Trace:
2013-08-06 13:51:41  [<ffffffffa04d0895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
2013-08-06 13:51:41  [<ffffffffa04d0e97>] lbug_with_loc+0x47/0xb0 [libcfs]
2013-08-06 13:51:41  [<ffffffffa0b4ee0f>] fld_server_lookup+0x2ef/0x3d0 [fld]
2013-08-06 13:51:41  [<ffffffff8119c88e>] ? generic_detach_inode+0x18e/0x1f0
2013-08-06 13:51:41  [<ffffffffa0f7a3d1>] osd_fld_lookup+0x71/0x1d0 [osd_ldiskfs]
2013-08-06 13:51:41  [<ffffffff8119c6f2>] ? iput+0x62/0x70
2013-08-06 13:51:41  [<ffffffffa0f7a5ca>] osd_remote_fid+0x9a/0x280 [osd_ldiskfs]
2013-08-06 13:51:41  [<ffffffffa0f86621>] osd_index_ea_lookup+0x521/0x850 [osd_ldiskfs]
2013-08-06 13:51:41  [<ffffffffa081982f>] dt_lookup_dir+0x6f/0x130 [obdclass]
2013-08-06 13:51:41  [<ffffffffa07f7fb5>] llog_osd_open+0x475/0xbb0 [obdclass]
2013-08-06 13:51:41  [<ffffffffa07c431a>] llog_open+0xba/0x2c0 [obdclass]
2013-08-06 13:51:41  [<ffffffffa07c7f71>] llog_backup+0x61/0x500 [obdclass]
2013-08-06 13:51:41  [<ffffffff81281860>] ? sprintf+0x40/0x50
2013-08-06 13:51:41  [<ffffffffa1005702>] mgc_process_log+0x1192/0x18e0 [mgc]
2013-08-06 13:51:41  [<ffffffffa0fff370>] ? mgc_blocking_ast+0x0/0x800 [mgc]
2013-08-06 13:51:41  [<ffffffffa097ac40>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
2013-08-06 13:51:41  [<ffffffffa10072e4>] mgc_process_config+0x594/0xed0 [mgc]
2013-08-06 13:51:41  [<ffffffffa080d776>] lustre_process_log+0x256/0xaa0 [obdclass]
2013-08-06 13:51:41  [<ffffffffa07dc972>] ? class_name2dev+0x42/0xe0 [obdclass]
2013-08-06 13:51:41  [<ffffffff81167d83>] ? kmem_cache_alloc_trace+0x1a3/0x1b0
2013-08-06 13:51:41  [<ffffffffa07dca1e>] ? class_name2obd+0xe/0x30 [obdclass]
2013-08-06 13:51:41  [<ffffffffa0841641>] server_start_targets+0x1821/0x1a40 [obdclass]
2013-08-06 13:51:41  [<ffffffffa0810db3>] ? lustre_start_mgc+0x493/0x1e90 [obdclass]
2013-08-06 13:51:41  [<ffffffffa0808ca0>] ? class_config_llog_handler+0x0/0x1880 [obdclass]
2013-08-06 13:51:41  [<ffffffffa08451fc>] server_fill_super+0xbbc/0x1a24 [obdclass]
2013-08-06 13:51:41  [<ffffffffa0812988>] lustre_fill_super+0x1d8/0x530 [obdclass]
2013-08-06 13:51:41  [<ffffffffa08127b0>] ? lustre_fill_super+0x0/0x530 [obdclass]
2013-08-06 13:51:41  [<ffffffff8118431f>] get_sb_nodev+0x5f/0xa0
2013-08-06 13:51:41  [<ffffffffa080a625>] lustre_get_sb+0x25/0x30 [obdclass]
2013-08-06 13:51:41  [<ffffffff8118395b>] vfs_kern_mount+0x7b/0x1b0
2013-08-06 13:51:41  [<ffffffff81183b02>] do_kern_mount+0x52/0x130
2013-08-06 13:51:41  [<ffffffff811a3d32>] do_mount+0x2d2/0x8d0
2013-08-06 13:51:41  [<ffffffff811a43c0>] sys_mount+0x90/0xe0
2013-08-06 13:51:41  [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
2013-08-06 13:51:41
2013-08-06 13:51:41 Aug  6 13:51:41 Kernel panic - not syncing: LBUG

Version is 2.4.53

Comment by Di Wang [ 07/Aug/13 ]

http://review.whamcloud.com/7266

Comment by Andreas Dilger [ 23/Aug/13 ]

We probably also need to land this patch for 2.4.1.

Comment by Peter Jones [ 26/Aug/13 ]

Landed for 2.5. Will track landing on b2_4 separately.

Comment by Gregoire Pichon [ 19/Sep/13 ]

Peter,
Has this been landed in 2.4? What is the Gerrit patch?

Thanks.

Comment by Peter Jones [ 19/Sep/13 ]

Hi Gregoire

There is no Gerrit patch at present. I would expect this to be taken care of when we start working on 2.4.2 in the coming weeks, but it could always be expedited if need be.

Peter
