Details
-
Bug
-
Resolution: Fixed
-
Minor
-
None
-
lustre-modules-1.8.5.0-5chaos_2.6.18_107chaos.ch4.4
lustre-1.8.5.0-5chaos_2.6.18_107chaos.ch4.4
lustre-tools-llnl-1.2-6.ch4.4
chaos 4.4-3
-
3
-
22,723
-
9755
Description
This is the same bug discussed in Bugzilla ticket 22723: https://bugzilla.lustre.org/show_bug.cgi?id=22723
We have hit this issue in the past, and we recently hit it again on one of our machines running chaos 4.4-3 with Lustre 1.8.5 installed.
This time we had a patch from Liang (from the above-mentioned Bugzilla ticket) installed, which prints additional debugging information to the console:
18632:0:(o2iblnd_cb.c:980:kiblnd_tx_complete()) ASSERTION(tx->tx_sending > 0) failed: TX: ffffc200003da540, type: IMMEDIATE, magic: deadbeef, sending: 0, waiting: 0, queued: 0, cookie: 9361598, comps: 1, status: 0
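For comparing this crash against earlier ones, the fields printed by the debug patch can be pulled out of the assertion line mechanically. The following is only a minimal sketch: the helper name `parse_tx_debug` is hypothetical, and the set of integer fields is an assumption based on the single sample line above, not on the patch source.

```python
import re

# Sample line copied from the console output above.
LINE = ("18632:0:(o2iblnd_cb.c:980:kiblnd_tx_complete()) "
        "ASSERTION(tx->tx_sending > 0) failed: TX: ffffc200003da540, "
        "type: IMMEDIATE, magic: deadbeef, sending: 0, waiting: 0, "
        "queued: 0, cookie: 9361598, comps: 1, status: 0")

# Fields that appear as decimal integers in the sample (an assumption).
INT_FIELDS = {"sending", "waiting", "queued", "cookie", "comps", "status"}


def parse_tx_debug(line):
    """Return the 'key: value' pairs that follow 'failed: ' as a dict.

    Hypothetical helper; returns an empty dict if the marker is absent.
    """
    _, sep, tail = line.partition("failed: ")
    if not sep:
        return {}
    fields = {}
    for key, value in re.findall(r"(\w+): ([^,\s]+)", tail):
        fields[key] = int(value) if key in INT_FIELDS else value
    return fields


tx = parse_tx_debug(LINE)
print(tx["type"], tx["sending"], tx["comps"])
```

In the sample, `sending` is 0 at the moment a completion is processed, which is exactly the condition the assertion `tx->tx_sending > 0` rejects.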
Here is additional console output from this crash:
2011-07-15 19:44:40 LustreError: 18632:0:(o2iblnd_cb.c:980:kiblnd_tx_complete()) ASSERTION(tx->tx_sending > 0) failed: TX: ffffc200003da540, type: IMMEDIATE, magic: deadbeef, sending: 0, waiting: 0, queued: 0, cookie: 9361598, comps: 1, status: 0
2011-07-15 19:44:40 ib_mthca 0000:07:00.0: SQ 000413 full (292641 head, 292873 tail, 4096 max, 0 nreq)
2011-07-15 19:44:40 LustreError: 19245:0:(o2iblnd_cb.c:886:kiblnd_post_tx_locked()) Error -12 posting transmit to 192.168.123.110@o2ib1
2011-07-15 19:44:40 Lustre: lsa-OST0036-osc-ffff8100cc104000: Request obd_ping sent 0s ago to 172.16.68.23@tcp has failed due to network error (limit 105s)
2011-07-15 19:44:40 Lustre: Skipped 1 previous similar message
2011-07-15 19:44:40 Lustre: lsa-OST0036-osc-ffff8100cc104000: Connection to lsa-OST0036 (at 172.16.68.23@tcp) was lost; in progress operations using this service will wait for recovery to complete
2011-07-15 19:44:40 LustreError: 18632:0:(o2iblnd_cb.c:980:kiblnd_tx_complete()) LBUG
2011-07-15 19:44:40 Pid: 18632, comm: kiblnd_sd_03
2011-07-15 19:44:40
2011-07-15 19:44:40 Call Trace:
2011-07-15 19:44:40 [<ffffffff885d478f>] libcfs_debug_dumpstack+0x5f/0x80 [libcfs]
2011-07-15 19:44:40 [<ffffffff885d4cbf>] lbug_with_loc+0x7f/0xd0 [libcfs]
2011-07-15 19:44:40 [<ffffffff88657965>] kiblnd_tx_complete+0x155/0x460 [ko2iblnd]
2011-07-15 19:44:40 [<ffffffff80091f8c>] __wake_up_common+0x3e/0x68
2011-07-15 19:44:40 [<ffffffff88658f1c>] kiblnd_complete+0xbc/0xe0 [ko2iblnd]
2011-07-15 19:44:40 [<ffffffff8865eeee>] kiblnd_scheduler+0x50e/0x6b0 [ko2iblnd]
2011-07-15 19:44:40 [<ffffffff80093b5a>] default_wake_function+0x0/0xf
2011-07-15 19:44:40 [<ffffffff8006101d>] child_rip+0xa/0x11
2011-07-15 19:44:40 [<ffffffff80061013>] child_rip+0x0/0x11
2011-07-15 19:44:40 [<ffffffff8865e9e0>] kiblnd_scheduler+0x0/0x6b0 [ko2iblnd]
2011-07-15 19:44:40 [<ffffffff80061013>] child_rip+0x0/0x11
2011-07-15 19:44:40
2011-07-15 19:44:40 ib_mthca 00Linux version 2.6.18-107chaos (mockbuild@chaos4-builder1) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Thu Jun 23 14:36:14 PDT 2011
2011-07-15 19:44:41 Command line: initrd=initrd console=ttyS0,115200n8 elevator=deadline swiotlb=65536 selinux=0 BOOT_IMAGE=vmlinuz BOOTIF=01-00-30-48-57-9b-24 irqpoll maxcpus=1 reset_devices memmap=exactmap memmap=640K@0K memmap=5312K@16384K memmap=125104K@22336K elfcorehdr=147440K memmap=56K#3391360K memmap=69K#3391416K memmap=4K$3391484K memmap=4K$4173824K memmap=1024K$4175872K memmap=9216K$4185088K
2011-07-15 19:44:41 BIOS-provided physical RAM map:
Issue Links
- Trackbacks
-
Lustre 1.8.x known issues tracker
While testing against Lustre b18 branch, we would hit known bugs which were already reported in Lustre Bugzilla https://bugzilla.lustre.org/. In order to move away from relying on Bugzilla, we would create a JIRA