[LU-2199] LBUG triggered in brw_interpret: "obdo already freed" Created: 16/Oct/12 Updated: 13/Nov/12 Resolved: 13/Nov/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Prakash Surya (Inactive) | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | topsequoia | ||
| Environment: |
orion-2_3_49_54_2-75chaos |
||
| Severity: | 3 |
| Rank (Obsolete): | 5243 |
| Description |
|
Hit this LBUG on one of our Sequoia IO nodes running the old Orion code base orion-2_3_49_54_2-75chaos:
LustreError: 3216:0:(osc_request.c:1859:brw_interpret()) @@@ obdo already freed req@c0000003c24cc800 x1415959349900530/t12885333672(12885333672) o4->ls1-OST00e3-osc-c0000003c6904c00@172.20.2.27@o2ib500:6/4 lens 456/416 e 0 to 0 dl 1350423562 ref 1 fl Interpret:R/4/0 rc 0/0
LustreError: 3216:0:(osc_request.c:1860:brw_interpret()) LBUG
Call Trace:
[c0000003e9343870] [c000000000008190] .show_stack+0x7c/0x184 (unreliable)
[c0000003e9343920] [80000000009f0c1c] .libcfs_debug_dumpstack+0x9c/0xe0 [libcfs]
[c0000003e93439c0] [80000000009f1260] .lbug_with_loc+0x50/0xc0 [libcfs]
[c0000003e9343a50] [8000000004445918] .brw_interpret+0xd28/0xec0 [osc]
[c0000003e9343b70] [800000000383b8a4] .ptlrpc_check_set+0x384/0x3c40 [ptlrpc]
[c0000003e9343d10] [800000000387d86c] .ptlrpcd_check+0x5bc/0x760 [ptlrpc]
[c0000003e9343e30] [800000000387dd28] .ptlrpcd+0x318/0x4d0 [ptlrpc]
[c0000003e9343f90] [c00000000001a9e0] .kernel_thread+0x54/0x70
^GMessage from syslogd@(none) at Oct 16 14:37:37 ...
kernel:LustreError: 3216:0:(osc_request.c:1860:brw_interpret()) LBUG
Kernel panic - not syncing: LBUG
Call Trace:
[c0000003e9343880] [c000000000008190] .show_stack+0x7c/0x184 (unreliable)
[c0000003e9343930] [c000000000432c0c] .panic+0x80/0x1a8
[c0000003e93439c0] [80000000009f12c0] .lbug_with_loc+0xb0/0xc0 [libcfs]
[c0000003e9343a50] [8000000004445918] .brw_interpret+0xd28/0xec0 [osc]
[c0000003e9343b70] [800000000383b8a4] .ptlrpc_check_set+0x384/0x3c40 [ptlrpc]
[c0000003e9343d10] [800000000387d86c] .ptlrpcd_check+0x5bc/0x760 [ptlrpc]
[c0000003e9343e30] [800000000387dd28] .ptlrpcd+0x318/0x4d0 [ptlrpc]
[c0000003e9343f90] [c00000000001a9e0] .kernel_thread+0x54/0x70
LustreError: dumping log to /tmp/lustre-log.1350423457.3216
|
| Comments |
| Comment by Peter Jones [ 17/Oct/12 ] |
|
Alex, could someone please look into this one? Thanks, Peter |
| Comment by Christopher Morrone [ 01/Nov/12 ] |
|
Hit again.
2012-11-01 14:23:35.708036 {DefaultControlEventListener} [mmcs]{103}.7.1: LustreError: 3336:0:(osc_request.c:1859:brw_interpret()) @@@ obdo already freed req@c0000002e0f58c00 x1417459081295033/t21475646365(21475646365) o4->ls1-OST00d5-osc-c0000003ec619800@172.20.2.13@o2ib500:6/4 lens 456/416 e 0 to 0 dl 1351805120 ref 1 fl Interpret:R/4/0 rc 0/0
2012-11-01 14:23:35.747969 {DefaultControlEventListener} [mmcs]{103}.7.1: LustreError: 3336:0:(osc_request.c:1860:brw_interpret()) LBUG
2012-11-01 14:23:35.787805 {DefaultControlEventListener} [mmcs]{103}.7.1: Call Trace:
2012-11-01 14:23:35.827832 {DefaultControlEventListener} [mmcs]{103}.7.1: [c0000003e90b3870] [c000000000008160] .show_stack+0x7c/0x184 (unreliable)
2012-11-01 14:23:35.868181 {DefaultControlEventListener} [mmcs]{103}.7.1: [c0000003e90b3920] [80000000009f0c1c] .libcfs_debug_dumpstack+0x9c/0xe0 [libcfs]
2012-11-01 14:23:35.908596 {DefaultControlEventListener} [mmcs]{103}.7.1: [c0000003e90b39c0] [80000000009f1260] .lbug_with_loc+0x50/0xc0 [libcfs]
2012-11-01 14:23:35.947814 {DefaultControlEventListener} [mmcs]{103}.7.1: [c0000003e90b3a50] [8000000004445918] .brw_interpret+0xd28/0xec0 [osc]
2012-11-01 14:23:35.987861 {DefaultControlEventListener} [mmcs]{103}.7.1: [c0000003e90b3b70] [800000000383b8a4] .ptlrpc_check_set+0x384/0x3c40 [ptlrpc]
2012-11-01 14:23:36.027871 {DefaultControlEventListener} [mmcs]{103}.7.1: [c0000003e90b3d10] [800000000387d86c] .ptlrpcd_check+0x5bc/0x760 [ptlrpc]
2012-11-01 14:23:36.068110 {DefaultControlEventListener} [mmcs]{103}.7.1: [c0000003e90b3e30] [800000000387dd28] .ptlrpcd+0x318/0x4d0 [ptlrpc]
2012-11-01 14:23:36.107803 {DefaultControlEventListener} [mmcs]{103}.7.1: [c0000003e90b3f90] [c00000000001a9e0] .kernel_thread+0x54/0x70
2012-11-01 14:23:36.198902 {DefaultControlEventListener} [mmcs]{103}.13.1: ^GMessage from syslogd@(none) at Nov 1 14:23:35 ...
2012-11-01 14:23:36.237809 {DefaultControlEventListener} [mmcs]{103}.13.1: kernel:LustreError: 3336:0:(osc_request.c:1860:brw_interpret()) LBUG
2012-11-01 14:23:36.288623 {DefaultControlEventListener} [mmcs]{103}.7.2: Kernel panic - not syncing: LBUG
2012-11-01 14:23:36.328575 {DefaultControlEventListener} [mmcs]{103}.7.2: Call Trace:
2012-11-01 14:23:36.368579 {DefaultControlEventListener} [mmcs]{103}.7.2: [c0000003e90b3880] [c000000000008160] .show_stack+0x7c/0x184 (unreliable)
2012-11-01 14:23:36.407824 {DefaultControlEventListener} [mmcs]{103}.7.2: [c0000003e90b3930] [c000000000432c0c] .panic+0x80/0x1a8
2012-11-01 14:23:36.448613 {DefaultControlEventListener} [mmcs]{103}.7.2: [c0000003e90b39c0] [80000000009f12c0] .lbug_with_loc+0xb0/0xc0 [libcfs]
2012-11-01 14:23:36.488034 {DefaultControlEventListener} [mmcs]{103}.7.2: [c0000003e90b3a50] [8000000004445918] .brw_interpret+0xd28/0xec0 [osc]
2012-11-01 14:23:36.528883 {DefaultControlEventListener} [mmcs]{103}.7.2: [c0000003e90b3b70] [800000000383b8a4] .ptlrpc_check_set+0x384/0x3c40 [ptlrpc]
2012-11-01 14:23:36.568987 {DefaultControlEventListener} [mmcs]{103}.7.2: [c0000003e90b3d10] [800000000387d86c] .ptlrpcd_check+0x5bc/0x760 [ptlrpc]
2012-11-01 14:23:36.608557 {DefaultControlEventListener} [mmcs]{103}.7.2: [c0000003e90b3e30] [800000000387dd28] .ptlrpcd+0x318/0x4d0 [ptlrpc]
2012-11-01 14:23:36.649014 {DefaultControlEventListener} [mmcs]{103}.7.2: [c0000003e90b3f90] [c00000000001a9e0] .kernel_thread+0x54/0x70
|
| Comment by Christopher Morrone [ 01/Nov/12 ] |
|
I hit this pretty reliably today while we were running 2.3.49.54-75chaos, but after installing 2.3.54-2chaos I haven't hit it yet. It may be fixed...or perhaps just harder to hit now. |
| Comment by Jinshan Xiong (Inactive) [ 08/Nov/12 ] |
|
Hi Chris, where can I find the code base you're using? |
| Comment by Christopher Morrone [ 08/Nov/12 ] |
|
I updated the tags on our github site. The tag for 2.3.49.54-75chaos is orion-2_3_49_54_2-75chaos. |
| Comment by Jinshan Xiong (Inactive) [ 09/Nov/12 ] |
|
Now that it can no longer be seen, let's lower the priority and leave this ticket open. |
| Comment by Jinshan Xiong (Inactive) [ 13/Nov/12 ] |
|
Please reopen it if this problem is seen again.