[LU-211] lu_ref revisited on 2.1 Created: 13/Apr/11  Updated: 05/May/11  Resolved: 05/May/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: Lustre 2.1.0

Type: Bug Priority: Minor
Reporter: Josephine Palencia Assignee: Robert Read (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

First encountered on
-lustre 2.0
-release: 1.10.0.32
-kernel: 2.6.18-164.1.11 x86_64
-all physical machines
-krb5p/null

Crashed the OSS.
Patched with lu_ref_fixes.patch and lock_errors_fix.patch.
Workable and stable after the patches were installed.

Reference: bugzilla.lustre.org : 23428, 24403

Encountered again
-lustre 2.1
-release 2.0.59
-kernel: 2.6.18-194.3.1 (mds, oss)
-kernel: 2.6.18-194.32.1.el5xen x86_64 (VM client)
-krb5p/null

Crashes only the VM client, when running mkdir on the file system.


Severity: 3
Rank (Obsolete): 10352

 Description   

Lustre: 2089:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import MGC128.182.112.60@tcp88->MGC128.182.112.60@tcp88_0 netid 20058: select flavor null
Lustre: MGC128.182.112.60@tcp88: Reactivating import
Lustre: 2089:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import jwan2-MDT0000-mdc-ffff88003d085c00->128.182.112.60@tcp88 netid 20058: select flavor null
Lustre: Server jwan2-OST0000_UUID version (1.10.0.32) is much older. Consider upgrading this client (2.0.59)
Lustre: Client jwan2-client has started

01:35:43:root@goldeneye: ~]# LustreError: 11-0: an error occurred while communicating with 128.182.112.60@tcp88. The mds_getxattr operation failed with -95
LustreError: 2119:0:(lu_ref.c:116:lu_ref_print()) lu_ref: ffff88003059c1b8 27 0 ldlm_resource_new:994
LustreError: 2119:0:(lu_ref.c:118:lu_ref_print()) link: ldlm_res_hop_get_locked ffff88003f83b040
LustreError: 2119:0:(lu_ref.c:160:lu_ref_fini()) ASSERTION(0) failed
LustreError: 2119:0:(lu_ref.c:160:lu_ref_fini()) LBUG
Pid: 2119, comm: mkdir

Call Trace:
[<ffffffff8854a641>] libcfs_debug_dumpstack+0x51/0x60 [libcfs]
[<ffffffff8854ab7a>] lbug_with_loc+0x7a/0xd0 [libcfs]
[<ffffffff88555810>] cfs_tracefile_init+0x0/0x10a [libcfs]
[<ffffffff8864fffd>] lu_ref_fini+0x3d/0xc0 [obdclass]
[<ffffffff886c8810>] __ldlm_resource_putref_final+0xc0/0x100 [ptlrpc]
[<ffffffff886c8bc3>] ldlm_resource_putref+0x143/0x240 [ptlrpc]
[<ffffffff80207112>] kmem_cache_free+0x80/0xd3
[<ffffffff886c3bc8>] ldlm_lock_put+0x1b8/0x420 [ptlrpc]
[<ffffffff886dc5af>] ldlm_cli_cancel_list+0x27f/0x370 [ptlrpc]
[<ffffffff886f7958>] ptlrpc_request_bufs_pack+0x58/0x80 [ptlrpc]
[<ffffffff886dd8d4>] ldlm_prep_elc_req+0x3e4/0x530 [ptlrpc]
[<ffffffff802d0b9b>] __kmalloc+0x8f/0x9f
[<ffffffff887312c9>] req_capsule_init+0x99/0x100 [ptlrpc]
[<ffffffff888320dc>] mdc_prep_elc_req+0x1c/0x30 [mdc]
[<ffffffff8883291f>] mdc_create+0x3bf/0x5d0 [mdc]
[<ffffffff88a03c22>] lmv_create+0x742/0xa70 [lmv]
[<ffffffff8854e03d>] cfs_curproc_cap_pack+0x1d/0x30 [libcfs]
[<ffffffff88961dc6>] ll_prep_md_op_data+0x376/0x3d0 [lustre]
[<ffffffff8897e393>] ll_new_node+0x4d3/0x790 [lustre]
[<ffffffff8897e773>] ll_mkdir+0x123/0x1b0 [lustre]
[<ffffffff802dabac>] vfs_mkdir+0xe1/0x150
[<ffffffff802db079>] sys_mkdirat+0xa3/0xe4
[<ffffffff80260295>] tracesys+0x47/0xb6
[<ffffffff802602f9>] tracesys+0xab/0xb6

Kernel panic - not syncing: LBUG
<1>LustreError: dumping log to /tmp/lustre-log.1302673005.2119
BUG: warning at arch/x86_64/kernel/genapic_xen.c:92/xen_send_IPI_mask() (Tainted: G )

Call Trace:
[<ffffffff80274cc2>] xen_send_IPI_mask+0x51/0xaa
[<ffffffff802745ed>] smp_send_reschedule+0x4b/0x50
[<ffffffff802887df>] enqueue_task+0x41/0x56
[<ffffffff80248bc5>] try_to_wake_up+0x309/0x3a4
[<ffffffff80263845>] __wait_on_bit+0x60/0x6e
[<ffffffff8029c498>] autoremove_wake_function+0x9/0x2e
[<ffffffff8028767b>] __wake_up_common+0x3e/0x68
[<ffffffff8022f10c>] __wake_up+0x38/0x4f
[<ffffffff8803447a>] :jbd:journal_commit_transaction+0x104c/0x106a
[<ffffffff80264931>] _spin_lock_irqsave+0x9/0x14
[<ffffffff8023f7d2>] lock_timer_base+0x1b/0x3c
[<ffffffff880375a3>] :jbd:kjournald+0xc1/0x213
[<ffffffff8029c48f>] autoremove_wake_function+0x0/0x2e
[<ffffffff8029c277>] keventd_create_kthread+0x0/0xc4
[<ffffffff880374e2>] :jbd:kjournald+0x0/0x213
[<ffffffff8029c277>] keventd_create_kthread+0x0/0xc4
[<ffffffff80233c56>] kthread+0xfe/0x132
[<ffffffff80260b2c>] child_rip+0xa/0x12
[<ffffffff8029c277>] keventd_create_kthread+0x0/0xc4
[<ffffffff80233b58>] kthread+0x0/0x132
[<ffffffff80260b22>] child_rip+0x0/0x12
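For readers unfamiliar with lu_ref: as we understand it, it is a debugging aid in Lustre's obdclass module that records, for each tracked object, who holds a reference (a "scope" string plus a "source" pointer), and lu_ref_fini() asserts that no such links remain when the object is torn down. The lu_ref_print() lines in the log above fit that reading: the resource created at ldlm_resource_new:994 still had a link from ldlm_res_hop_get_locked when __ldlm_resource_putref_final() finalized it, which tripped ASSERTION(0) and the LBUG. The C sketch below is a hypothetical, heavily simplified userspace model of that teardown check. The names (ref_tracker, ref_add, ref_del, ref_leaked, ref_fini) and the fixed-size link table are illustrative assumptions, not the real obdclass API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified model of per-object reference tracking in the
 * spirit of Lustre's lu_ref debugging facility. Names are illustrative and
 * are NOT the real obdclass API. */

#define REF_MAX_LINKS 64

struct ref_link {
    const char *scope;   /* who took the reference, e.g. a function name */
    const void *source;  /* object on whose behalf it was taken */
    int         in_use;  /* nonzero while this slot holds a live link */
};

struct ref_tracker {
    const char     *init_func;  /* where the tracked object was created */
    int             init_line;
    int             refs;       /* references currently outstanding */
    struct ref_link links[REF_MAX_LINKS];
};

void ref_init(struct ref_tracker *t, const char *func, int line)
{
    t->init_func = func;
    t->init_line = line;
    t->refs = 0;
    for (int i = 0; i < REF_MAX_LINKS; i++)
        t->links[i].in_use = 0;
}

/* Record a new reference; returns the slot index, or -1 if the table is full. */
int ref_add(struct ref_tracker *t, const char *scope, const void *source)
{
    for (int i = 0; i < REF_MAX_LINKS; i++) {
        if (!t->links[i].in_use) {
            t->links[i] = (struct ref_link){ scope, source, 1 };
            t->refs++;
            return i;
        }
    }
    return -1;
}

/* Drop a reference recorded with the same scope and source.
 * Returns 0 on success, -1 if no matching link exists (unbalanced release). */
int ref_del(struct ref_tracker *t, const char *scope, const void *source)
{
    for (int i = 0; i < REF_MAX_LINKS; i++) {
        struct ref_link *l = &t->links[i];
        if (l->in_use && l->source == source && strcmp(l->scope, scope) == 0) {
            l->in_use = 0;
            t->refs--;
            return 0;
        }
    }
    return -1;
}

/* Dump the tracker and every link still held, similar in shape to the
 * "lu_ref: ..." and "link: ..." lines in the console log. */
void ref_print(const struct ref_tracker *t)
{
    fprintf(stderr, "ref: %p %d %s:%d\n",
            (const void *)t, t->refs, t->init_func, t->init_line);
    for (int i = 0; i < REF_MAX_LINKS; i++)
        if (t->links[i].in_use)
            fprintf(stderr, "    link: %s %p\n",
                    t->links[i].scope, t->links[i].source);
}

/* Number of references still outstanding (0 means a clean teardown). */
int ref_leaked(const struct ref_tracker *t)
{
    return t->refs;
}

/* Teardown check: any surviving link is a bug, so dump state and abort,
 * analogous to the ASSERTION(0) / LBUG sequence in the log. */
void ref_fini(struct ref_tracker *t)
{
    if (t->refs != 0) {
        ref_print(t);
        abort();
    }
}
```

In this model a leak shows up as a nonzero ref_leaked() count, and ref_fini() then prints the remaining links and aborts, much as lu_ref_fini() printed the outstanding ldlm_res_hop_get_locked link before the LBUG. A fixed array keeps the sketch dependency-free.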



 Comments   
Comment by Peter Jones [ 19/Apr/11 ]

Josephine

Do we understand correctly that this only occurs with Kerberos options enabled?

Peter

Comment by Josephine Palencia [ 19/Apr/11 ]

Peter,

As per the logs ("select flavor null"), the Kerberos option was set to null, so
Kerberos functionality was disabled.

But if you're asking whether --enable-gss was used during the build, then yes.

josephine

Comment by Josephine Palencia [ 03/May/11 ]

I upgraded to kernel 2.6.18-194.11.3 and it has been stable so far (ext4, Kerberos).

Comment by Peter Jones [ 05/May/11 ]

Great. As we are continually moving to newer kernels, this will certainly be a non-issue by the time we have full Kerberos support.
