[LU-1554] LBUG when doing system cleanup after clean upgrade from 1.8.8 to 2.3 Created: 21/Jun/12  Updated: 29/Jun/12  Resolved: 29/Jun/12

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Sarah Liu Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: None
Environment:

Server: one OST and one MDS, upgraded from 1.8.8-RHEL5 to lustre-master-tag-2.2.57-RHEL6
Client: 1.8.8-RHEL5 upgraded to lustre-master-tag-2.2.57-RHEL5
1.8.8-RHEL6 upgraded to lustre-master-tag-2.2.57-RHEL6


Issue Links:
Duplicate
duplicates LU-1534 Test failure on test suite lfsck Resolved
Severity: 3
Rank (Obsolete): 6375

 Description   

The clean upgrade from 1.8.8 to 2.3 completed successfully, and the system checks passed (quota, pools, data verification). During the subsequent system cleanup, the MDS hit an LBUG and restarted. Here is the console message:

Lustre: DEBUG MARKER: ===== Pass ==================================================================
Lustre: DEBUG MARKER: Using TIMEOUT=20
LNet: 11093:0:(debug.c:324:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
LustreError: 8010:0:(osd_internal.h:665:osd_fid2oi()) ASSERTION( !fid_is_igif(fid) ) failed:
LustreError: 8010:0:(osd_internal.h:665:osd_fid2oi()) LBUG
Pid: 8010, comm: mdt_02

Call Trace:
[<ffffffffa03a3905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa03a3f17>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0e2d505>] osd_oi_delete+0x2e5/0x470 [osd_ldiskfs]
[<ffffffffa0e26154>] osd_object_destroy+0x234/0x420 [osd_ldiskfs]
[<ffffffffa0cf9e80>] mdd_object_kill+0xb0/0x290 [mdd]
[<ffffffffa0d106c9>] mdd_finish_unlink+0x1f9/0x2f0 [mdd]
[<ffffffffa0d16609>] mdd_unlink+0xa09/0xd60 [mdd]
[<ffffffffa064e8f0>] ? ldlm_completion_ast+0x0/0x730 [ptlrpc]
[<ffffffffa0d77a30>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
[<ffffffffa0678294>] ? lustre_msg_get_versions+0xa4/0x120 [ptlrpc]
[<ffffffffa0e7f027>] cml_unlink+0x97/0x200 [cmm]
[<ffffffffa0d93b2f>] ? mdt_version_get_save+0x8f/0xd0 [mdt]
[<ffffffffa0d957b4>] mdt_reint_unlink+0x634/0x9e0 [mdt]
[<ffffffffa0d92b51>] mdt_reint_rec+0x41/0xe0 [mdt]
[<ffffffffa0d8c3aa>] mdt_reint_internal+0x50a/0x810 [mdt]
[<ffffffffa0d8c6f4>] mdt_reint+0x44/0xe0 [mdt]
[<ffffffffa0d7e2a2>] mdt_handle_common+0x922/0x1740 [mdt]
[<ffffffffa0d7f195>] mdt_regular_handle+0x15/0x20 [mdt]
[<ffffffffa06858a2>] ptlrpc_server_handle_request+0x412/0xeb0 [ptlrpc]
[<ffffffffa03a465e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa03b4daf>] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
[<ffffffffa067e6d2>] ? ptlrpc_wait_event+0xb2/0x2c0 [ptlrpc]
[<ffffffff81051ba3>] ? __wake_up+0x53/0x70
[<ffffffffa0686b17>] ptlrpc_main+0x7d7/0x1610 [ptlrpc]
[<ffffffffa0686340>] ? ptlrpc_main+0x0/0x1610 [ptlrpc]
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffffa0686340>] ? ptlrpc_main+0x0/0x1610 [ptlrpc]
[<ffffffffa0686340>] ? ptlrpc_main+0x0/0x1610 [ptlrpc]
[<ffffffff8100c140>] ? child_rip+0x0/0x20

Kernel panic - not syncing: LBUG
Pid: 8010, comm: mdt_02 Not tainted 2.6.32-220.17.1.el6_lustre.x86_64 #1
Call Trace:
[<ffffffff814eccea>] ? panic+0x78/0x143
[<ffffffffa03a3f6b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
[<ffffffffa0e2d505>] ? osd_oi_delete+0x2e5/0x470 [osd_ldiskfs]
[<ffffffffa0e26154>] ? osd_object_destroy+0x234/0x420 [osd_ldiskfs]
[<ffffffffa0cf9e80>] ? mdd_object_kill+0xb0/0x290 [mdd]
[<ffffffffa0d106c9>] ? mdd_finish_unlink+0x1f9/0x2f0 [mdd]
[<ffffffffa0d16609>] ? mdd_unlink+0xa09/0xd60 [mdd]
[<ffffffffa064e8f0>] ? ldlm_completion_ast+0x0/0x730 [ptlrpc]
[<ffffffffa0d77a30>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
[<ffffffffa0678294>] ? lustre_msg_get_versions+0xa4/0x120 [ptlrpc]
[<ffffffffa0e7f027>] ? cml_unlink+0x97/0x200 [cmm]
[<ffffffffa0d93b2f>] ? mdt_version_get_save+0x8f/0xd0 [mdt]
[<ffffffffa0d957b4>] ? mdt_reint_unlink+0x634/0x9e0 [mdt]
[<ffffffffa0d92b51>] ? mdt_reint_rec+0x41/0xe0 [mdt]
[<ffffffffa0d8c3aa>] ? mdt_reint_internal+0x50a/0x810 [mdt]
[<ffffffffa0d8c6f4>] ? mdt_reint+0x44/0xe0 [mdt]
[<ffffffffa0d7e2a2>] ? mdt_handle_common+0x922/0x1740 [mdt]
[<ffffffffa0d7f195>] ? mdt_regular_handle+0x15/0x20 [mdt]
[<ffffffffa06858a2>] ? ptlrpc_server_handle_request+0x412/0xeb0 [ptlrpc]
[<ffffffffa03a465e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa03b4daf>] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
[<ffffffffa067e6d2>] ? ptlrpc_wait_event+0xb2/0x2c0 [ptlrpc]
[<ffffffff81051ba3>] ? __wake_up+0x53/0x70
[<ffffffffa0686b17>] ? ptlrpc_main+0x7d7/0x1610 [ptlrpc]
[<ffffffffa0686340>] ? ptlrpc_main+0x0/0x1610 [ptlrpc]
[<ffffffff8100c14a>] ? child_rip+0xa/0x20
[<ffffffffa0686340>] ? ptlrpc_main+0x0/0x1610 [ptlrpc]
[<ffffffffa0686340>] ? ptlrpc_main+0x0/0x1610 [ptlrpc]
[<ffffffff8100c140>] ? child_rip+0x0/0x20
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu



 Comments   
Comment by Andreas Dilger [ 22/Jun/12 ]

The "osd_fid2oi() ASSERTION( !fid_is_igif(fid)) failed" error is already being fixed via LU-1534.
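For readers unfamiliar with the assertion, the following is a minimal sketch of what `fid_is_igif()` is understood to check, assuming the usual Lustre convention that files created under 1.8 keep an IGIF FID (inode number in the sequence field, inode generation in the object id) after an upgrade. The struct, the `sketch_` names, and the range constants are illustrative stand-ins written from memory of the Lustre headers, not copied from the tree, and the sketch says nothing about how LU-1534 resolves the problem.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for Lustre's struct lu_fid (hypothetical copy,
 * not the kernel header). */
struct sketch_fid {
	uint64_t f_seq;	/* sequence; for an IGIF FID, the 1.8 inode number */
	uint32_t f_oid;	/* object id; for an IGIF FID, the inode generation */
	uint32_t f_ver;	/* version */
};

/* Assumed IGIF sequence range; values are an assumption, check them
 * against lustre_idl.h before relying on them. */
#define SKETCH_SEQ_IGIF		12ULL
#define SKETCH_SEQ_IGIF_MAX	0xffffffffULL

/* Returns true when the FID's sequence falls in the assumed IGIF range,
 * i.e. the object most likely predates the upgrade to 2.x. */
static bool sketch_fid_is_igif(const struct sketch_fid *fid)
{
	return fid->f_seq >= SKETCH_SEQ_IGIF &&
	       fid->f_seq <= SKETCH_SEQ_IGIF_MAX;
}
```

Under these assumptions, unlinking a file that was created before the upgrade passes an IGIF FID down through `osd_object_destroy()` to `osd_oi_delete()`, which is consistent with the call trace in the description.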

Comment by Sarah Liu [ 29/Jun/12 ]

Verified on tag-2.2.58; the issue is fixed.
