Details
- Bug
- Resolution: Cannot Reproduce
- Major
- None
- Lustre 1.8.x (1.8.0 - 1.8.5)
- None
- 3
- 4048
Description
In the past week, three separate OSSs have panicked with the message below. Two of the incidents occurred with Chaos 4.4-2, lustre 1.8.5.0.3. We had already scheduled an update to the 4.4-4.1 release for this week, and when we hit the second panic on Monday we moved the update up to that day. However, we have just encountered the same panic with the newer operating environment.
We are in the process of trying to identify the codes/users that were running at the time of the panics but I wanted to get this posted immediately. Unfortunately the lustre.log file was lost because of our diskless environment. I am working to correct that so that logs are sent to an NFS mount in the future.
LDISKFS-fs error (device dm-0): ldiskfs_valid_block_bitmap: Invalid block bitmap - block_group = 38734, block = 126923574
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at fs/ldiskfs/mballoc.c:3714
invalid opcode: 0000 [1] SMP
last sysfs file: /class/scsi_host/host0/local_ib_port
CPU 13
Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) lustre(U) ldiskfs(U) jbd(U) lov(U) crc16(U) mdc(U) lquotr
mlx4_en(U) joydev(U) mlx4_core(U) shpchp(U) pcspkr(U) cxgb4(U) serio_raw(U) i7core_edac(U) edac_mc(U) ehci_hcd(U) 8021)
Pid: 8957, comm: ll_ost_io_175 Tainted: G 2.6.18-108chaos #1
RIP: 0010:[<ffffffff888ddf66>] [<ffffffff888ddf66>] :ldiskfs:ldiskfs_mb_release_inode_pa+0x1eb/0x218
RSP: 0018:ffff8102d2beb200 EFLAGS: 00010206
RAX: 00000000000000d6 RBX: 00000000000001d6 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8032dd80
RBP: ffff8102d2beb330 R08: ffffffff8032dd88 R09: 0000000000000020
R10: ffffffff8049981c R11: 0000000000000000 R12: ffff8101693abdc8
R13: 0000000000003fd6 R14: 0000000000003fd6 R15: 00000000000001d6
FS: 00002ae81dc466e0(0000) GS:ffff81033fdd6d40(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000046bc68 CR3: 0000000000201000 CR4: 00000000000006e0
Process ll_ost_io_175 (pid: 8957, threadinfo ffff8102d2bea000, task ffff8102d2aaa080)
Stack: 000000000000984f ffff8101a34293d0 ffff8102d2beb340 ffff8105fbdc8000
ffff810198036568 ffff81063bb91c00 0000000000003a00 ffff8101269084f8
0000000000000000 0000000000000001 07eee5c3000022fd 00000000002b00c9
Call Trace:
[<ffffffff8001a470>] __getblk+0x28/0x248
[<ffffffff888e0712>] :ldiskfs:ldiskfs_mb_discard_inode_preallocations+0x209/0x2b0
[<ffffffff889cc2af>] :fsfilt_ldiskfs:ldiskfs_ext_new_extent_cb+0x38f/0x620
[<ffffffff888d9bd9>] :ldiskfs:ldiskfs_ext_walk_space+0x182/0x1fa
[<ffffffff889cbf20>] :fsfilt_ldiskfs:ldiskfs_ext_new_extent_cb+0x0/0x620
[<ffffffff889c8849>] :fsfilt_ldiskfs:fsfilt_map_nblocks+0xe9/0x120
[<ffffffff80066c8c>] __mutex_unlock_slowpath+0x31/0x38
[<ffffffff889c8a7a>] :fsfilt_ldiskfs:fsfilt_ldiskfs_map_ext_inode_pages+0x1fa/0x220
[<ffffffff8002efc1>] __wake_up+0x43/0x50
[<ffffffff888a911e>] :jbd:start_this_handle+0x321/0x3c7
[<ffffffff889c8ae1>] :fsfilt_ldiskfs:fsfilt_ldiskfs_map_inode_pages+0x41/0xb0
[<ffffffff88a057fa>] :obdfilter:filter_direct_io+0x46a/0xd50
[<ffffffff887f09e0>] :lquota:filter_quota_acquire+0x0/0x120
[<ffffffff88a08552>] :obdfilter:filter_commitrw_write+0x17a2/0x2b30
[<ffffffff88a00038>] :obdfilter:filter_commitrw+0x58/0x2a0
[<ffffffff889aac3f>] :ost:ost_brw_write+0x1c2f/0x2410
[<ffffffff8872dea0>] :ptlrpc:lustre_msg_check_version_v2+0x10/0x30
[<ffffffff80093b5a>] default_wake_function+0x0/0xf
[<ffffffff889ae053>] :ost:ost_handle+0x2c33/0x5690
[<ffffffff8015f9f8>] __next_cpu+0x19/0x28
[<ffffffff8007a442>] smp_send_reschedule+0x4a/0x50
[<ffffffff8872dce5>] :ptlrpc:lustre_msg_get_opc+0x35/0xf0
[<ffffffff8873d41e>] :ptlrpc:ptlrpc_server_handle_request+0x96e/0xdc0
[<ffffffff8873db7a>] :ptlrpc:ptlrpc_wait_event+0x30a/0x320
[<ffffffff8873eaf6>] :ptlrpc:ptlrpc_main+0xf66/0x1110
[<ffffffff8006101d>] child_rip+0xa/0x11
[<ffffffff8873db90>] :ptlrpc:ptlrpc_main+0x0/0x1110
[<ffffffff80061013>] child_rip+0x0/0x11
Code: 0f 0b 68 69 98 8e 88 c2 82 0e 48 8b 85 e8 fe ff ff f0 44 01
RIP [<ffffffff888ddf66>] :ldiskfs:ldiskfs_mb_release_inode_pa+0x1eb/0x218
RSP <ffff8102d2beb200>
LustreError: dumping log to /var/dumps/lustre-log.1340395410.9106
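When comparing the traces from the three panics, it may help to reduce each one to its module and function frames. The following is a minimal sketch and not part of the original report: the names `FRAME_RE`, `parse_frames`, and `modules_in_order` are hypothetical, and the regular expression assumes the frame format shown in the trace above (`[<address>] :module:symbol+0xoff/0xsize`, with the module part absent for core kernel symbols).

```python
import re

# Matches frames such as
#   [<ffffffff888e0712>] :ldiskfs:ldiskfs_mb_discard_inode_preallocations+0x209/0x2b0
#   [<ffffffff8001a470>] __getblk+0x28/0x248
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(log_text):
    """Return (module, symbol) pairs; core kernel frames get module 'kernel'."""
    return [
        (m.group("module") or "kernel", m.group("symbol"))
        for m in FRAME_RE.finditer(log_text)
    ]


def modules_in_order(frames):
    """Return each module once, in order of first appearance in the trace."""
    seen = []
    for module, _symbol in frames:
        if module not in seen:
            seen.append(module)
    return seen


# A few frames copied from the trace above, used only as sample input.
sample = """\
[<ffffffff8001a470>] __getblk+0x28/0x248
[<ffffffff888e0712>] :ldiskfs:ldiskfs_mb_discard_inode_preallocations+0x209/0x2b0
[<ffffffff889cc2af>] :fsfilt_ldiskfs:ldiskfs_ext_new_extent_cb+0x38f/0x620
[<ffffffff88a057fa>] :obdfilter:filter_direct_io+0x46a/0xd50
"""

frames = parse_frames(sample)
print(modules_in_order(frames))
```

Running `parse_frames` over the console output saved from each panic and comparing the resulting lists would show whether all three incidents reach `ldiskfs_mb_release_inode_pa` through the same write path.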
Issue Links
- Trackbacks
- Lustre 1.8.x known issues tracker
While testing against Lustre b18 branch, we would hit known bugs which were already reported in Lustre Bugzilla https://bugzilla.lustre.org/. In order to move away from relying on Bugzilla, we would create a JIRA