[LU-2199] LBUG triggered in brw_interpret: "obdo already freed"
Created: 16/Oct/12  Updated: 13/Nov/12  Resolved: 13/Nov/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug
Priority: Blocker
Reporter: Prakash Surya (Inactive)
Assignee: Jinshan Xiong (Inactive)
Resolution: Cannot Reproduce
Votes: 0
Labels: topsequoia
Environment: orion-2_3_49_54_2-75chaos


Severity: 3
Rank (Obsolete): 5243

 Description   

Hit this LBUG on one of our Sequoia IO nodes running the old Orion code base orion-2_3_49_54_2-75chaos:

LustreError: 3216:0:(osc_request.c:1859:brw_interpret()) @@@ obdo already freed  req@c0000003c24cc800 x1415959349900530/t12885333672(12885333672) o4->ls1-OST00e3-osc-c0000003c6904c00@172.20.2.27@o2ib500:6/4 lens 456/416 e 0 to 0 dl 1350423562 ref 1 fl Interpret:R/4/0 rc 0/0
LustreError: 3216:0:(osc_request.c:1860:brw_interpret()) LBUG
Call Trace:
[c0000003e9343870] [c000000000008190] .show_stack+0x7c/0x184 (unreliable)
[c0000003e9343920] [80000000009f0c1c] .libcfs_debug_dumpstack+0x9c/0xe0 [libcfs]
[c0000003e93439c0] [80000000009f1260] .lbug_with_loc+0x50/0xc0 [libcfs]
[c0000003e9343a50] [8000000004445918] .brw_interpret+0xd28/0xec0 [osc]
[c0000003e9343b70] [800000000383b8a4] .ptlrpc_check_set+0x384/0x3c40 [ptlrpc]
[c0000003e9343d10] [800000000387d86c] .ptlrpcd_check+0x5bc/0x760 [ptlrpc]
[c0000003e9343e30] [800000000387dd28] .ptlrpcd+0x318/0x4d0 [ptlrpc]
[c0000003e9343f90] [c00000000001a9e0] .kernel_thread+0x54/0x70
^GMessage from syslogd@(none) at Oct 16 14:37:37 ...
 kernel:LustreError: 3216:0:(osc_request.c:1860:brw_interpret()) LBUG
Kernel panic - not syncing: LBUG
Call Trace:
[c0000003e9343880] [c000000000008190] .show_stack+0x7c/0x184 (unreliable)
[c0000003e9343930] [c000000000432c0c] .panic+0x80/0x1a8
[c0000003e93439c0] [80000000009f12c0] .lbug_with_loc+0xb0/0xc0 [libcfs]
[c0000003e9343a50] [8000000004445918] .brw_interpret+0xd28/0xec0 [osc]
[c0000003e9343b70] [800000000383b8a4] .ptlrpc_check_set+0x384/0x3c40 [ptlrpc]
[c0000003e9343d10] [800000000387d86c] .ptlrpcd_check+0x5bc/0x760 [ptlrpc]
[c0000003e9343e30] [800000000387dd28] .ptlrpcd+0x318/0x4d0 [ptlrpc]
[c0000003e9343f90] [c00000000001a9e0] .kernel_thread+0x54/0x70
LustreError: dumping log to /tmp/lustre-log.1350423457.3216


 Comments   
Comment by Peter Jones [ 17/Oct/12 ]

Alex

Could someone please look into this one?

Thanks

Peter

Comment by Christopher Morrone [ 01/Nov/12 ]

Hit again.

2012-11-01 14:23:35.708036 {DefaultControlEventListener} [mmcs]{103}.7.1: LustreError: 3336:0:(osc_request.c:1859:brw_interpret()) @@@ obdo already freed  req@c0000002e0f58c00 x1417459081295033/t21475646365(21475646365) o4->ls1-OST00d5-osc-c0000003ec619800@172.20.2.13@o2ib500:6/4 lens 456/416 e 0 to 0 dl 1351805120 ref 1 fl Interpret:R/4/0 rc 0/0
2012-11-01 14:23:35.747969 {DefaultControlEventListener} [mmcs]{103}.7.1: LustreError: 3336:0:(osc_request.c:1860:brw_interpret()) LBUG
2012-11-01 14:23:35.787805 {DefaultControlEventListener} [mmcs]{103}.7.1: Call Trace:
2012-11-01 14:23:35.827832 {DefaultControlEventListener} [mmcs]{103}.7.1: [c0000003e90b3870] [c000000000008160] .show_stack+0x7c/0x184 (unreliable)
2012-11-01 14:23:35.868181 {DefaultControlEventListener} [mmcs]{103}.7.1: [c0000003e90b3920] [80000000009f0c1c] .libcfs_debug_dumpstack+0x9c/0xe0 [libcfs]
2012-11-01 14:23:35.908596 {DefaultControlEventListener} [mmcs]{103}.7.1: [c0000003e90b39c0] [80000000009f1260] .lbug_with_loc+0x50/0xc0 [libcfs]
2012-11-01 14:23:35.947814 {DefaultControlEventListener} [mmcs]{103}.7.1: [c0000003e90b3a50] [8000000004445918] .brw_interpret+0xd28/0xec0 [osc]
2012-11-01 14:23:35.987861 {DefaultControlEventListener} [mmcs]{103}.7.1: [c0000003e90b3b70] [800000000383b8a4] .ptlrpc_check_set+0x384/0x3c40 [ptlrpc]
2012-11-01 14:23:36.027871 {DefaultControlEventListener} [mmcs]{103}.7.1: [c0000003e90b3d10] [800000000387d86c] .ptlrpcd_check+0x5bc/0x760 [ptlrpc]
2012-11-01 14:23:36.068110 {DefaultControlEventListener} [mmcs]{103}.7.1: [c0000003e90b3e30] [800000000387dd28] .ptlrpcd+0x318/0x4d0 [ptlrpc]
2012-11-01 14:23:36.107803 {DefaultControlEventListener} [mmcs]{103}.7.1: [c0000003e90b3f90] [c00000000001a9e0] .kernel_thread+0x54/0x70
2012-11-01 14:23:36.198902 {DefaultControlEventListener} [mmcs]{103}.13.1: ^GMessage from syslogd@(none) at Nov  1 14:23:35 ...
2012-11-01 14:23:36.237809 {DefaultControlEventListener} [mmcs]{103}.13.1:  kernel:LustreError: 3336:0:(osc_request.c:1860:brw_interpret()) LBUG
2012-11-01 14:23:36.288623 {DefaultControlEventListener} [mmcs]{103}.7.2: Kernel panic - not syncing: LBUG
2012-11-01 14:23:36.328575 {DefaultControlEventListener} [mmcs]{103}.7.2: Call Trace:
2012-11-01 14:23:36.368579 {DefaultControlEventListener} [mmcs]{103}.7.2: [c0000003e90b3880] [c000000000008160] .show_stack+0x7c/0x184 (unreliable)
2012-11-01 14:23:36.407824 {DefaultControlEventListener} [mmcs]{103}.7.2: [c0000003e90b3930] [c000000000432c0c] .panic+0x80/0x1a8
2012-11-01 14:23:36.448613 {DefaultControlEventListener} [mmcs]{103}.7.2: [c0000003e90b39c0] [80000000009f12c0] .lbug_with_loc+0xb0/0xc0 [libcfs]
2012-11-01 14:23:36.488034 {DefaultControlEventListener} [mmcs]{103}.7.2: [c0000003e90b3a50] [8000000004445918] .brw_interpret+0xd28/0xec0 [osc]
2012-11-01 14:23:36.528883 {DefaultControlEventListener} [mmcs]{103}.7.2: [c0000003e90b3b70] [800000000383b8a4] .ptlrpc_check_set+0x384/0x3c40 [ptlrpc]
2012-11-01 14:23:36.568987 {DefaultControlEventListener} [mmcs]{103}.7.2: [c0000003e90b3d10] [800000000387d86c] .ptlrpcd_check+0x5bc/0x760 [ptlrpc]
2012-11-01 14:23:36.608557 {DefaultControlEventListener} [mmcs]{103}.7.2: [c0000003e90b3e30] [800000000387dd28] .ptlrpcd+0x318/0x4d0 [ptlrpc]
2012-11-01 14:23:36.649014 {DefaultControlEventListener} [mmcs]{103}.7.2: [c0000003e90b3f90] [c00000000001a9e0] .kernel_thread+0x54/0x70
Comment by Christopher Morrone [ 01/Nov/12 ]

I hit this pretty reliably today while we were running 2.3.49.54-75chaos, but after installing 2.3.54-2chaos I haven't hit it yet. It may be fixed...or perhaps just harder to hit now.

Comment by Jinshan Xiong (Inactive) [ 08/Nov/12 ]

Hi Chris, where can I find the code base you're using?

Comment by Christopher Morrone [ 08/Nov/12 ]

I updated the tags on our github site.

The tag for 2.3.49.54-75chaos is orion-2_3_49_54_2-75chaos.
The tag for 2.3.54-2chaos is 2.3.54-2chaos.

Comment by Jinshan Xiong (Inactive) [ 09/Nov/12 ]

Now that the problem is no longer seen, let's lower the priority and leave this ticket open.

Comment by Jinshan Xiong (Inactive) [ 13/Nov/12 ]

Please reopen this ticket if the problem is seen again.
