[LU-1017] MDS oops when running racer test Created: 20/Jan/12  Updated: 04/May/12  Resolved: 13/Feb/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.2.0
Fix Version/s: Lustre 2.2.0, Lustre 2.1.2

Type: Bug Priority: Blocker
Reporter: James A Simmons Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: File barry-all.sh    
Severity: 3
Rank (Obsolete): 4739

 Description   

Each time the racer test is run against the newest Lustre code from master, the MDS eventually oopses.



 Comments   
Comment by James A Simmons [ 20/Jan/12 ]

Here is the oops

barry-mds1: kernel: Oops 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
CPU 7
Modules linked in: ipmi_si cmm(U) osd_ldiskfs(U) mdt(U) mdd(U) mds(U) fsfilt_ldiskfs(U) exportfs mgc(U) ldiskfs(U) jbd2 crc16 lustre(U) lquota(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) mpt2sas mptctl dell_rbu autofs4 ib_srp(U) ipmi_devintf ipmi_msghandler ipt_REJECT xt_tcpudp xt_state ip_conntrack nfnetlink iptable_filter ip_tables x_tables be2iscsi iscsi_tcp bnx2i cnic uio libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ib_sa(U) ipv6 xfrm_nalgo crypto_api ib_uverbs(U) ib_umad(U) iw_nes(U) iw_cxgb3(U) cxgb3(U) ib_qib(U) dca mlx4_ib(U) mlx4_en(U) mlx4_core(U) dm_mirror dm_log dm_multipath scsi_dh dm_mod raid0 raid1 video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sd_mod sr_mod cdrom sg joydev mptsas ib_mthca(U) ata_piix mptscsih i5000_edac pcspkr libata mptbase ib_mad(U) shpchp edac_mc scsi_transport_sas ib_core(U) scsi_mod tpm_tis uhci_hcd tpm ehci_hcd serio_raw tpm_bios nfs nfs_acl lockd fscache sunrpc bnx2
Pid: 8111, comm: mdt_372 Tainted: G 2.6.18-238.19.1.el5.head #1
RIP: 0010:[<ffffffff889b8ce9>] [<ffffffff889b8ce9>] :obdclass:lu_object_find_at+0x139/0x450
RSP: 0018:ffff8103eb4d9630 EFLAGS: 00010286
RAX: fffffffffffffff5 RBX: ffff8103de825080 RCX: 000000000000000e
RDX: 0000000000000007 RSI: 0000000000000004 RDI: ffff8103d9415b40
RBP: ffff8103eb4d9710 R08: ffffc20036cb8170 R09: ffff8103eb4d96c0
R10: 00000000000000fe R11: 0000000200000430 R12: ffff8103de825a80
R13: fffffffffffffff5 R14: ffffc20036cb8170 R15: ffff8103dbd87000
FS: 00002ac79ed186e0(0000) GS:ffff81043fc59740(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: fffffffffffffff5 CR3: 00000004171d6000 CR4: 00000000000006e0
Process mdt_372 (pid: 8111, threadinfo ffff8103eb4d8000, task ffff8103ebcf5040)
Stack: ffff8103eb4d9640 ffff8103eb4d9680 000004e5eb4d97b0 ffffffff88eec336
ffffffff88ef0988 0000000200000000 0000000000000000 0000000000000000
ffff8103eb4d9ed0 ffff8103dcbf43a4 0000000000000000 ffff8103ebcf5040
Call Trace:
[<ffffffff80093cea>] default_wake_function+0x0/0xf
[<ffffffff88a88970>] :ptlrpc:lustre_msg_buf+0x0/0x80
[<ffffffff889b9011>] :obdclass:lu_object_find+0x11/0x20
[<ffffffff88eb2d80>] :mdt:mdt_object_find+0xf0/0x180
[<ffffffff88eb4d1a>] :mdt:mdt_object_find_lock+0x3a/0x140
[<ffffffff88a87eb3>] :ptlrpc:lustre_msg_get_flags+0x23/0x80
[<ffffffff88edf2a0>] :mdt:mdt_reint_open+0x1b80/0x3250
[<ffffffff88e82702>] :mdd:md_ucred+0x42/0x50
[<ffffffff88e82702>] :mdd:md_ucred+0x42/0x50
[<ffffffff88ecaf53>] :mdt:mdt_reint_rec+0x83/0x100
[<ffffffff88eb13a5>] :mdt:mdt_reint_internal+0x8e5/0x990
[<ffffffff88ebef55>] :mdt:mdt_intent_reint+0x1f5/0x5e0
[<ffffffff88ebf7a7>] :mdt:mdt_intent_policy+0x467/0x650
[<ffffffff88a46299>] :ptlrpc:ldlm_lock_enqueue+0x179/0xa00
[<ffffffff88a62880>] :ptlrpc:ldlm_export_lock_get+0x10/0x20
[<ffffffff888d8511>] :libcfs:cfs_hash_bd_add_locked+0x71/0x80
[<ffffffff888dbecd>] :libcfs:cfs_hash_add+0x17d/0x190
[<ffffffff88a64d4b>] :ptlrpc:ldlm_handle_enqueue0+0x9eb/0xfe0
[<ffffffff88eb33ee>] :mdt:mdt_unpack_req_pack_rep+0x5de/0x6a0
[<ffffffff88ebec02>] :mdt:mdt_enqueue+0x72/0x100
[<ffffffff88eb4696>] :mdt:mdt_handle_common+0x11e6/0x1750
[<ffffffff88eb4cd0>] :mdt:mdt_regular_handle+0x10/0x20
[<ffffffff88a943a7>] :ptlrpc:ptlrpc_server_handle_request+0x897/0xe90
[<ffffffff8004abb1>] mod_timer+0x2b/0x2d
[<ffffffff888cb099>] :libcfs:cfs_timer_arm+0x9/0x10
[<ffffffff888d6fb3>] :libcfs:lc_watchdog_touch+0xc3/0x120
[<ffffffff88a971d8>] :ptlrpc:ptlrpc_wait_event+0x258/0x360
[<ffffffff8002f7b2>] __wake_up+0x43/0x50
[<ffffffff88a98699>] :ptlrpc:ptlrpc_main+0x13b9/0x15a0
[<ffffffff88a972e0>] :ptlrpc:ptlrpc_main+0x0/0x15a0
[<ffffffff80061fb1>] child_rip+0xa/0x11
[<ffffffff88a972e0>] :ptlrpc:ptlrpc_main+0x0/0x15a0
[<ffffffff88a972e0>] :ptlrpc:ptlrpc_main+0x0/0x15a0
[<ffffffff80061fa7>] child_rip+0x0/0x11

Code: 48 8b 10 48 83 c2 38 48 8b 0a 48 39 d1 0f 84 d4 02 00 00 48
RIP [<ffffffff889b8ce9>] :obdclass:lu_object_find_at+0x139/0x450
RSP <ffff8103eb4d9630>
CR2: fffffffffffffff5
<0>Kernel panic - not syncing: Fatal exception

Comment by Peter Jones [ 20/Jan/12 ]

Cliff

Does this appear to be a duplicate of the issue that you ran into on Hyperion?

Peter

Comment by Cliff White (Inactive) [ 20/Jan/12 ]

This does not appear to be the same. We have hit an LBUG/assert, not an oops. I will see if I can replicate on Hyperion.

Comment by James A Simmons [ 24/Jan/12 ]

Finished git bisecting to track down the commit that is causing it. The racer problem shows up after commit 22464d1230ed58461f51d881f512d5e16644a735, which is the patch for LU-909. Cliff, can you test the branch with that patch reverted to see if you still see your LBUG?

Comment by Cliff White (Inactive) [ 24/Jan/12 ]

I am traveling this week, so it may be difficult. I will see what I can do. Good that you identified the commit.

Comment by James A Simmons [ 27/Jan/12 ]

I uploaded our kernel module at an engineer's request. It's on your FTP site at uploads/LU-1017/obdclass.ko

Comment by Alex Zhuravlev [ 30/Jan/12 ]

000000000004fccc <lu_object_find_at+0x12c> callq 000000000004f4f0 <htable_lookup>
000000000004fcd1 <lu_object_find_at+0x131> test %rax,%rax
000000000004fcd4 <lu_object_find_at+0x134> mov %rax,%r13
000000000004fcd7 <lu_object_find_at+0x137> je 000000000004fd03 <lu_object_find_at+0x163>
000000000004fcd9 <lu_object_find_at+0x139> mov (%rax),%rdx

RAX: fffffffffffffff5 = -11
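The register value can be decoded mechanically. The following is a minimal sketch, assuming the Linux convention that encoded error pointers occupy the top 4095 addresses and that EAGAIN has the value 11 on Linux. The names `reg_to_errno` and `reg_to_errno_example` are illustrative only and are not Lustre or kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ERRNO 4095	/* assumed size of the error-pointer range */

/*
 * Interpret a raw 64-bit register value as an encoded error.
 * Returns the positive errno if the value lies in the top MAX_ERRNO
 * addresses, or 0 if it looks like an ordinary pointer.
 */
static int reg_to_errno(uint64_t reg)
{
	if (reg >= (uint64_t)-MAX_ERRNO)
		return (int)(0 - reg);	/* two's-complement negation */
	return 0;
}

/* Usage sketch with values taken from the register dump in this ticket. */
static void reg_to_errno_example(void)
{
	/* RAX and CR2 in the oops: decodes to 11, i.e. -EAGAIN on Linux */
	assert(reg_to_errno(0xfffffffffffffff5ULL) == 11);
	/* RBX in the oops: an ordinary kernel address, not an error */
	assert(reg_to_errno(0xffff8103de825080ULL) == 0);
}
```

The same value appearing in CR2 is consistent with the faulting instruction `mov (%rax),%rdx` dereferencing the encoded error directly.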

shadow = htable_lookup(s, &bd, f, waiter, &version);
if (likely(shadow == NULL)) {

in turn...

htable_lookup() can return -EAGAIN:

/*
 * Lookup found an object being destroyed this object cannot be
 * returned (to assure that references to dying objects are eventually
 * drained), and moreover, lookup has to wait until object is freed.
 */
cfs_atomic_dec(&h->loh_ref);

cfs_waitlink_init(waiter);
cfs_waitq_add(&bkt->lsb_marche_funebre, waiter);
cfs_set_current_state(CFS_TASK_UNINT);
lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_DEATH_RACE);
return ERR_PTR(-EAGAIN);
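To illustrate why a NULL-only check is not enough here, the following is a userspace sketch of the error-pointer convention. It assumes stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers; `err_ptr`, `is_err`, `ptr_err`, `demo_lookup` and `demo_use` are hypothetical names for illustration and are not the Lustre code or the eventual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO   4095
#define DEMO_EAGAIN 11	/* value of EAGAIN on Linux (assumption) */

/* Userspace stand-ins for the kernel's error-pointer helpers. */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static int is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }

struct demo_obj { int id; };	/* ids are positive in this sketch */

enum lookup_state { NOT_FOUND, FOUND, DYING };

/* Hypothetical lookup: NULL on a miss, an error pointer for a dying object. */
static struct demo_obj *demo_lookup(enum lookup_state st, struct demo_obj *o)
{
	switch (st) {
	case FOUND:
		return o;
	case DYING:
		return err_ptr(-DEMO_EAGAIN);
	default:
		return NULL;
	}
}

/*
 * Returns the object id on a hit, 0 on a miss, or a negative errno.
 * The is_err() test is what keeps the error pointer from being
 * dereferenced as if it were a real object.
 */
static long demo_use(enum lookup_state st, struct demo_obj *o)
{
	struct demo_obj *shadow = demo_lookup(st, o);

	if (shadow == NULL)
		return 0;
	if (is_err(shadow))
		return ptr_err(shadow);
	return shadow->id;
}
```

Without the `is_err()` branch, the DYING case would fall through to `shadow->id`, which is the same class of access as the faulting instruction in the trace.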

Comment by Peter Jones [ 30/Jan/12 ]

Niu

Could you please create a patch for this?

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 30/Jan/12 ]

Hi, Alex

Do you mean that the oops is triggered when htable_lookup() returns -EAGAIN? I don't see the reason from the code. Could you explain it more? Thanks.

Comment by Alex Zhuravlev [ 30/Jan/12 ]

htable_lookup() found an object being destroyed and returned -EAGAIN
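Since the quoted comment says the lookup has to wait until the dying object is freed, one plausible shape for a caller is a retry loop around the lookup. The sketch below is a userspace mock under that assumption: `demo_table`, `demo_find_try` and `demo_find` are hypothetical names, the mock simply counts retries where a kernel caller would sleep on the wait queue, and nothing here is taken from the patch under review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_EAGAIN 11	/* value of EAGAIN on Linux (assumption) */

struct demo_obj { int id; };

/* Hypothetical table: the lookup reports "dying" this many more times. */
struct demo_table {
	int dying_rounds;
	struct demo_obj obj;
};

/* One lookup attempt: an encoded -EAGAIN while the old object is dying. */
static struct demo_obj *demo_find_try(struct demo_table *t)
{
	if (t->dying_rounds > 0) {
		t->dying_rounds--;
		return (struct demo_obj *)(intptr_t)-DEMO_EAGAIN;
	}
	return &t->obj;
}

/* Retry until the lookup stops reporting -EAGAIN; count the retries. */
static struct demo_obj *demo_find(struct demo_table *t, int *retries)
{
	*retries = 0;
	for (;;) {
		struct demo_obj *o = demo_find_try(t);

		if ((intptr_t)o != -DEMO_EAGAIN)
			return o;
		/* A kernel caller would block here until the object is freed. */
		(*retries)++;
	}
}
```

The key point is that the -EAGAIN result is compared and consumed inside the loop, so it never reaches code that treats the return value as an object.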

Comment by Niu Yawei (Inactive) [ 31/Jan/12 ]

http://review.whamcloud.com/2066

Comment by James A Simmons [ 31/Jan/12 ]

No more oops.

Comment by James A Simmons [ 31/Jan/12 ]

Spoke too soon. The test completes, but once I ran llmountcleanup.sh I got the following oops:

Jan 31 14:13:43 spoon02 kernel: Lustre: DEBUG MARKER: == racer racer.sh test complete, duration 659 sec ==================================================== 14:13:43 (1328037223)
Jan 31 14:29:31 spoon02 kernel: Lustre: setting import lustre-MDT0000_UUID INACTIVE by administrator request
Jan 31 14:29:31 spoon02 kernel: LustreError: 25339:0:(file.c:157:ll_close_inode_openhandle()) inode 144115205255725063 mdc close failed: rc = -4
Jan 31 14:29:31 spoon02 kernel: Lustre: setting import lustre-OST0000_UUID INACTIVE by administrator request
Jan 31 14:29:31 spoon02 kernel: LustreError: 25321:0:(file.c:157:ll_close_inode_openhandle()) inode 144115205255725059 mdc close failed: rc = -108
Jan 31 14:29:31 spoon02 kernel: LustreError: 25321:0:(file.c:157:ll_close_inode_openhandle()) Skipped 1 previous similar message
Jan 31 14:29:31 spoon02 kernel: LustreError: 25321:0:(cl_lock.c:2082:cl_locks_prune()) ASSERTION(lock->cll_users == 0) failed
Jan 31 14:29:31 spoon02 kernel: LustreError: 25321:0:(cl_lock.c:2082:cl_locks_prune()) LBUG
Jan 31 14:29:31 spoon02 kernel: Pid: 25321, comm: dd
Jan 31 14:29:31 spoon02 kernel:
Jan 31 14:29:31 spoon02 kernel: Call Trace:
Jan 31 14:29:31 spoon02 kernel: [<ffffffff8882372f>] libcfs_debug_dumpstack+0x5f/0x80 [libcfs]
Jan 31 14:29:31 spoon02 kernel: [<ffffffff88823c5f>] lbug_with_loc+0x7f/0xd0 [libcfs]
Jan 31 14:29:31 spoon02 kernel: [<ffffffff8882f191>] libcfs_assertion_failed+0x61/0x70 [libcfs]
Jan 31 14:29:31 spoon02 kernel: [<ffffffff88954e7c>] cl_locks_prune+0x14c/0x210 [obdclass]
Jan 31 14:29:31 spoon02 kernel: [<ffffffff88831aec>] cfs_hash_bd_from_key+0x3c/0xc0 [libcfs]
Jan 31 14:29:31 spoon02 kernel: [<ffffffff8894a992>] cl_object_kill+0x82/0x90 [obdclass]
Jan 31 14:29:31 spoon02 kernel: [<ffffffff88bba781>] lov_delete_raid0+0x141/0x300 [lov]
Jan 31 14:29:31 spoon02 kernel: [<ffffffff88943fa1>] lu_obj_hop_hash+0x131/0x240 [obdclass]
Jan 31 14:29:31 spoon02 kernel: [<ffffffff88bbb7df>] lov_object_delete+0xcf/0x150 [lov]
Jan 31 14:29:31 spoon02 kernel: [<ffffffff88942709>] lu_object_free+0x89/0x190 [obdclass]
Jan 31 14:29:31 spoon02 kernel: [<ffffffff88833c8d>] cfs_hash_hd_hnode_del+0xd/0x50 [libcfs]
Jan 31 14:29:31 spoon02 kernel: [<ffffffff88943613>] lu_object_put+0x1c3/0x1e0 [obdclass]
Jan 31 14:29:31 spoon02 kernel: [<ffffffff88949629>] cl_object_put+0x9/0x10 [obdclass]
Jan 31 14:29:31 spoon02 kernel: [<ffffffff88c8b133>] cl_inode_fini+0x1d3/0x240 [lustre]
Jan 31 14:29:31 spoon02 kernel: [<ffffffff88cf30cf>] lmv_change_cbdata+0x55f/0x580 [lmv]
Jan 31 14:29:31 spoon02 kernel: [<ffffffff88c53a60>] null_if_equal+0x0/0x50 [lustre]
Jan 31 14:29:31 spoon02 kernel: [<ffffffff88c63248>] ll_clear_inode+0x838/0xc70 [lustre]
Jan 31 14:29:31 spoon02 kernel: [<ffffffff88c56c10>] ll_delete_inode+0x0/0x600 [lustre]
Jan 31 14:29:31 spoon02 kernel: [<ffffffff80023be7>] clear_inode+0xda/0x12d
Jan 31 14:29:31 spoon02 kernel: [<ffffffff88c5718a>] ll_delete_inode+0x57a/0x600 [lustre]
Jan 31 14:29:31 spoon02 kernel: [<ffffffff88c56c10>] ll_delete_inode+0x0/0x600 [lustre]
Jan 31 14:29:31 spoon02 kernel: [<ffffffff800309d8>] generic_delete_inode+0xc9/0x147
Jan 31 14:29:31 spoon02 kernel: [<ffffffff8003b777>] generic_drop_inode+0x15/0x15f
Jan 31 14:29:31 spoon02 kernel: [<ffffffff8002c25c>] iput+0x85/0x8a
Jan 31 14:29:31 spoon02 kernel: [<ffffffff88c289ab>] ll_d_iput+0x4b/0x60 [lustre]
Jan 31 14:29:31 spoon02 kernel: [<ffffffff80036aa8>] dentry_iput+0x8c/0xae
Jan 31 14:29:31 spoon02 kernel: [<ffffffff8000d890>] dput+0xf7/0x115
Jan 31 14:29:31 spoon02 kernel: [<ffffffff800131c5>] __fput+0x19c/0x1bd
Jan 31 14:29:31 spoon02 kernel: [<ffffffff8002e001>] fput+0x14/0x16
Jan 31 14:29:31 spoon02 kernel: [<ffffffff800249d3>] filp_close+0x65/0x70
Jan 31 14:29:31 spoon02 kernel: [<ffffffff8003aa87>] put_files_struct+0x6b/0xb3
Jan 31 14:29:31 spoon02 kernel: [<ffffffff80015fb2>] do_exit+0x38c/0x9c0
Jan 31 14:29:31 spoon02 kernel: [<ffffffff8004b974>] cpuset_exit+0x0/0x8f
Jan 31 14:29:31 spoon02 kernel: [<ffffffff8002c6d6>] get_signal_to_deliver+0x475/0x4a7
Jan 31 14:29:31 spoon02 kernel: [<ffffffff8006119f>] sysret_signal+0x1c/0x27
Jan 31 14:29:31 spoon02 kernel: [<ffffffff8005e0d0>] do_notify_resume+0xa4/0x84e
Jan 31 14:29:31 spoon02 kernel: [<ffffffff8016685c>] list_add+0xc/0xe
Jan 31 14:29:31 spoon02 kernel: [<ffffffff88948d68>] cl_env_put+0x288/0x2c0 [obdclass]
Jan 31 14:29:31 spoon02 kernel: [<ffffffff80017267>] vfs_write+0xcf/0x175
Jan 31 14:29:31 spoon02 kernel: [<ffffffff8006119f>] sysret_signal+0x1c/0x27
Jan 31 14:29:31 spoon02 kernel: [<ffffffff80061427>] ptregscall_common+0x67/0xac
Jan 31 14:29:31 spoon02 kernel:

Comment by Niu Yawei (Inactive) [ 31/Jan/12 ]

This should be a separate defect. It looks like the I/O process exited on SIGKILL and we missed a lock unuse somewhere in this code path, but I haven't figured it out from the code yet. I've asked Jinshan to look into it too.

Comment by Jinshan Xiong (Inactive) [ 01/Feb/12 ]

The most recent significant change in that area is fanyong's LU-925, commit 6f5813d36102a19f314c9aab409972e8a9f1112b.

James, can you please share the parameters you use to run racer? I want to reproduce it in our lab.

Comment by nasf (Inactive) [ 01/Feb/12 ]

> Jan 31 14:29:31 spoon02 kernel: LustreError: 25321:0:(file.c:157:ll_close_inode_openhandle()) Skipped 1 previous similar message
> Jan 31 14:29:31 spoon02 kernel: LustreError: 25321:0:(cl_lock.c:2082:cl_locks_prune()) ASSERTION(lock->cll_users == 0) failed
> Jan 31 14:29:31 spoon02 kernel: LustreError: 25321:0:(cl_lock.c:2082:cl_locks_prune()) LBUG

It may be caused by AGL (async glimpse lock), which holds a user count for processing the AGL RPC reply later. Since it is not the same as the original issue, I will fix it in a new ticket, LU-1061.

Comment by Niu Yawei (Inactive) [ 01/Feb/12 ]

It is probably caused by AGL (async glimpse lock), which doesn't grab the inode. I'll open a separate ticket for it.

Comment by James A Simmons [ 01/Feb/12 ]

Here is the cfg file I use to run racer in the test suite.

Comment by Peter Jones [ 13/Feb/12 ]

Landed for 2.2

Comment by Build Master (Inactive) [ 13/Feb/12 ]

Integrated in lustre-master » x86_64,client,el5,ofa #468
LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

Result = SUCCESS
Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
Files :

  • lustre/obdclass/lu_object.c
Comment by Build Master (Inactive) [ 13/Feb/12 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #468
LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

Result = SUCCESS
Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
Files :

  • lustre/obdclass/lu_object.c
Comment by Build Master (Inactive) [ 13/Feb/12 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #468
LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

Result = SUCCESS
Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
Files :

  • lustre/obdclass/lu_object.c
Comment by Build Master (Inactive) [ 13/Feb/12 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #468
LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

Result = SUCCESS
Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
Files :

  • lustre/obdclass/lu_object.c
Comment by Build Master (Inactive) [ 13/Feb/12 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #468
LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

Result = SUCCESS
Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
Files :

  • lustre/obdclass/lu_object.c
Comment by Build Master (Inactive) [ 13/Feb/12 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #468
LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

Result = SUCCESS
Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
Files :

  • lustre/obdclass/lu_object.c
Comment by Build Master (Inactive) [ 13/Feb/12 ]

Integrated in lustre-master » x86_64,server,el5,ofa #468
LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

Result = SUCCESS
Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
Files :

  • lustre/obdclass/lu_object.c
Comment by Build Master (Inactive) [ 13/Feb/12 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #468
LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

Result = SUCCESS
Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
Files :

  • lustre/obdclass/lu_object.c
Comment by Build Master (Inactive) [ 13/Feb/12 ]

Integrated in lustre-master » i686,client,el6,inkernel #468
LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

Result = SUCCESS
Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
Files :

  • lustre/obdclass/lu_object.c
Comment by Build Master (Inactive) [ 13/Feb/12 ]

Integrated in lustre-master » i686,server,el6,inkernel #468
LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

Result = SUCCESS
Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
Files :

  • lustre/obdclass/lu_object.c
Comment by Build Master (Inactive) [ 13/Feb/12 ]

Integrated in lustre-master » i686,client,el5,inkernel #468
LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

Result = SUCCESS
Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
Files :

  • lustre/obdclass/lu_object.c
Comment by Build Master (Inactive) [ 13/Feb/12 ]

Integrated in lustre-master » i686,server,el5,inkernel #468
LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

Result = SUCCESS
Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
Files :

  • lustre/obdclass/lu_object.c
Comment by Build Master (Inactive) [ 13/Feb/12 ]

Integrated in lustre-master » i686,server,el5,ofa #468
LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

Result = SUCCESS
Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
Files :

  • lustre/obdclass/lu_object.c
Comment by Build Master (Inactive) [ 13/Feb/12 ]

Integrated in lustre-master » i686,client,el5,ofa #468
LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

Result = SUCCESS
Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
Files :

  • lustre/obdclass/lu_object.c
Comment by Mikhail Pershin [ 14/Feb/12 ]

This patch overlaps with the previous commit b9ccecd1453c5c76fe135048c39f149c241650c6, LU-1013 obdclass: lu_object_find miss to unlink object from LRU.

It seems Fan Yong was trying to solve the same issue, but his change covers only a single case and does not handle the -EAGAIN case. This is his change:

@@ -627,12 +627,14 @@ static struct lu_object *lu_object_find_try(const struct lu_env *env,
                 bkt->lsb_busy++;
                 cfs_hash_bd_unlock(hs, &bd, 1);
                 return o;
+        } else {
+                if (!cfs_list_empty(&shadow->lo_header->loh_lru))
+                        cfs_list_del_init(&shadow->lo_header->loh_lru);
+                lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_RACE);
+                cfs_hash_bd_unlock(hs, &bd, 1);
+                lu_object_free(env, o);
+                return shadow;
         }
-
-        lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_RACE);
-        cfs_hash_bd_unlock(hs, &bd, 1);
-        lu_object_free(env, o);
-        return shadow;
 }

Now we have other code using the result of htable_lookup() without checking for -EAGAIN. Meanwhile, this commit is not needed at all after the LU-1017 fix, because the latter does list_del_init inside htable_lookup, so we can just revert b9ccecd1453c5c76fe135048c39f149c241650c6 to solve this issue.
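The argument rests on list_del_init leaving the entry in a self-linked "empty" state, so a later `cfs_list_empty()` check on the same entry finds nothing left to unlink. The following userspace sketch reimplements the usual circular doubly linked list primitives to show that property. The names `list_node`, `list_init`, `list_is_empty`, `list_add_after` and `list_del_and_init` are illustrative stand-ins, not the libcfs or kernel definitions.

```c
#include <assert.h>

/* Minimal circular doubly linked list node, modelled on kernel-style lists. */
struct list_node {
	struct list_node *next, *prev;
};

/* An initialised node points at itself, which also means "empty". */
static void list_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

static int list_is_empty(const struct list_node *n)
{
	return n->next == n;
}

/* Insert n immediately after head. */
static void list_add_after(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink n from whatever list it is on and re-initialise it. */
static void list_del_and_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Usage sketch: once unlinked, the entry reads as empty and stays that way. */
static void list_del_and_init_example(void)
{
	struct list_node lru, entry;

	list_init(&lru);
	list_init(&entry);
	list_add_after(&entry, &lru);
	assert(!list_is_empty(&entry));

	list_del_and_init(&entry);	/* first unlink, e.g. inside the lookup */
	assert(list_is_empty(&entry) && list_is_empty(&lru));

	list_del_and_init(&entry);	/* a second unlink is a harmless no-op */
	assert(list_is_empty(&entry) && list_is_empty(&lru));
}
```

Under these assumptions, an unlink guarded by an emptiness check in the caller has nothing to do once the lookup has already unlinked the entry, which matches the reasoning for reverting the earlier commit.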

Comment by Build Master (Inactive) [ 17/Feb/12 ]

Integrated in lustre-master » x86_64,server,el6,ofa #480
LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

Result = FAILURE
Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
Files :

  • lustre/obdclass/lu_object.c
Comment by Build Master (Inactive) [ 17/Feb/12 ]

Integrated in lustre-master » x86_64,client,el6,ofa #480
LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

Result = FAILURE
Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
Files :

  • lustre/obdclass/lu_object.c
Comment by Build Master (Inactive) [ 17/Feb/12 ]

Integrated in lustre-master » i686,client,el6,ofa #480
LU-1017 handle -EAGAIN properly in lu_object_find_try() (Revision 4ce426ab33196ff04cc7016a2dc717e53d1b4a75)

Result = ABORTED
Oleg Drokin : 4ce426ab33196ff04cc7016a2dc717e53d1b4a75
Files :

  • lustre/obdclass/lu_object.c
Comment by Bob Glossman (Inactive) [ 02/May/12 ]

http://review.whamcloud.com/#change,2629
Backport to b2_1

Generated at Sat Feb 10 01:12:42 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.