Details
-
Bug
-
Resolution: Cannot Reproduce
-
Critical
-
None
-
Lustre 2.4.0
-
None
-
1 Lustre Client Server
2 MDS Server, config failover
2 OSS Server,config failover
-
3
-
12054
Description
In the process of deleting a large number of small files(close to 800,000), simulate one MDS crash. During the othe MDS takeover the crashed one, the normal MDS print call trace and crashed.
After a while , the client server crashed too.
Call trace info as follow:
BUG: unable to handle kernel paging request at 0000000000001008
IP: [<ffffffffa08394b2>] cfs_tage_to_tail+0x32/0xa0 [libcfs]
PGD 26ca0b067 PUD 26e2b2067 PMD 0
Oops: 0002 1 SMP
last sysfs file: /sys/class/nt_neg/info
CPU 1
Modules linked in: osp(U) lod(U) mdt(U) mgs(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) ldiskfs(U) lquota(U) mdd(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic libcfs(U) device(F)(U) tcmfc(F)(U) libpmcfc(U) lundev(F)(U) iscsi_target_mod(F)(U) target_core_mod(F)(U) replication(F)(U) snapshot(F)(U) hotcache(F)(U) cache(F)(U) mmgt(F)(U) copy(F)(U) raid5(F)(U) raid10(F)(U) raid1(F)(U) raid0(F)(U) data_record(F)(U) raid_common(F)(U) ACpoweroff(F)(U) pm80xx(F)(U) sys_disk(U) disk_noop(F)(U) disk_deadline(F)(U) disk_elevator(F)(U) lib_sas(F)(U) scsi_transport_sas(F)(U) lib_ata(F)(U) odsp_sas(F)(U) disk_adapter(F)(U) disk_vault(F)(U) disk_error(F)(U) odsp_scsi(F)(U) disk_err_stub(U) disk_arch(F)(U) cell(F)(U) ld_worker(F)(U) comm_duplock(F)(U) iodir(F)(U) pthread(F)(U) kmempool(F)(U) common(U) comm(F)(U) dma(F)(U) bugon(F)(U) driver_adapter(F)(U) event(F)(U) debug(U) nt_memcpy(U) drv_dma(F)(U) plxdma(F)(U) async_tx(U) xor(U) nt_neg_resume(F)(U) Plx8000(U) msgpio(U) bonding ipv6 sunrpc i2c_ismt(U) i2c_dev 8021q garp stp llc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_mod e1000e sg ixgbe hwmon dca mdio iTCO_wdt iTCO_vendor_support i2c_i801 i2c_core ext4 mbcache jbd2 sd_mod crc_t10dif ahci [last unloaded: scsi_wait_scan]
Pid: 23499, comm: mdt00_003 Tainted: GF W --------------- 2.6.32-358.6.2.l2.08 #2 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
RIP: 0010:[<ffffffffa08394b2>] [<ffffffffa08394b2>] cfs_tage_to_tail+0x32/0xa0 [libcfs]
RSP: 0000:ffff880288c119b0 EFLAGS: 00010282
RAX: ffff880290eecd20 RBX: ffff880290c79720 RCX: 0000000000001000
RDX: ffff880290c79728 RSI: ffff880290eecd20 RDI: ffff880290c79720
RBP: ffff880288c119c0 R08: 000000000000004c R09: 0000000000000000
R10: 0000000000000001 R11: ffff88026d710d60 R12: ffff880290eecd20
R13: 0000000000001010 R14: ffff880288c11ae0 R15: 00000000000000a1
FS: 00007fe20c76a700(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000001008 CR3: 000000026d0bd000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mdt00_003 (pid: 23499, threadinfo ffff880288c10000, task ffff880293bc0aa0)
Stack:
ffff880290eecd00 0000000000000400 ffff880288c11a30 ffffffffa083aab6
<d> 0000000052b01764 ffff880290c79720 0000000000000000 ffff880288c11ab0
<d> ffff880288c11a30 ffffffffa082927b 0000000052b01764 0000000000009880
Call Trace:
[<ffffffffa083aab6>] cfs_trace_get_tage+0x1c6/0x300 [libcfs]
[<ffffffffa082927b>] ? cfs_set_ptldebug_header+0x2b/0xc0 [libcfs]
[<ffffffffa083ada2>] libcfs_debug_vmsg2+0x1b2/0x890 [libcfs]
[<ffffffffa083b4c1>] libcfs_debug_msg+0x41/0x50 [libcfs]
[<ffffffffa093fea7>] ? class_handle2object+0x97/0x170 [obdclass]
[<ffffffffa1029b01>] mdt_save_lock+0x201/0x300 [mdt]
[<ffffffffa1029c5c>] mdt_object_unlock+0x5c/0x160 [mdt]
[<ffffffffa104a52b>] mdt_reint_unlink+0xa0b/0x14e0 [mdt]
[<ffffffffa10457e1>] mdt_reint_rec+0x41/0xe0 [mdt]
[<ffffffffa1027063>] mdt_reint_internal+0x4d3/0x7a0 [mdt]
[<ffffffffa1027374>] mdt_reint+0x44/0xe0 [mdt]
[<ffffffffa102ca08>] mdt_handle_common+0x648/0x1660 [mdt]
[<ffffffffa106a7e5>] mds_regular_handle+0x15/0x20 [mdt]
[<ffffffffa0abfa59>] ptlrpc_server_handle_request+0x3c9/0xc70 [ptlrpc]
[<ffffffffa082a64e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa083c0ff>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
[<ffffffffa0abc94f>] ? ptlrpc_wait_event+0xaf/0x2d0 [ptlrpc]
[<ffffffff8103fec3>] ? __wake_up+0x53/0x70
[<ffffffffa0ac341e>] ptlrpc_main+0xafe/0x1940 [ptlrpc]
[<ffffffffa0ac2920>] ? ptlrpc_main+0x0/0x1940 [ptlrpc]
[<ffffffff81003f1a>] child_rip+0xa/0x20
[<ffffffffa0ac2920>] ? ptlrpc_main+0x0/0x1940 [ptlrpc]
[<ffffffffa0ac2920>] ? ptlrpc_main+0x0/0x1940 [ptlrpc]
[<ffffffff81003f10>] ? child_rip+0x0/0x20
Code: 10 48 89 1c 24 4c 89 64 24 08 0f 1f 44 00 00 48 85 ff 48 89 fb 49 89 f4 74 38 4d 85 e4 74 52 48 8b 43 10 48 8b 4b 08 48 8d 53 08 <48> 89 41 08 48 89 08 49 8b 44 24 08 49 89 54 24 08 4c 89 63 08
RIP [<ffffffffa08394b2>] cfs_tage_to_tail+0x32/0xa0 [libcfs]
RSP <ffff880288c119b0>
CR2: 0000000000001008
--[ end trace 026f77cb820e9e09 ]--
Kernel panic - not syncing: Fatal exception
Pid: 23499, comm: mdt00_003 Tainted: GF D W --------------- 2.6.32-358.6.2.l2.08 #2
Call Trace:
[<ffffffff81476fa7>] ? panic+0xa1/0x163
[<ffffffff8147b2fc>] ? oops_end+0xdc/0xf0
[<ffffffff81031b3b>] ? no_context+0xfb/0x260
[<ffffffff81075ec7>] ? bit_waitqueue+0x17/0xd0
[<ffffffff81031db5>] ? __bad_area_nosemaphore+0x115/0x1d0
[<ffffffffa00248e2>] ? do_get_write_access+0x3d2/0x550 [jbd2]
[<ffffffff81031e83>] ? bad_area_nosemaphore+0x13/0x20
[<ffffffff8147cdd1>] ? do_page_fault+0x351/0x540
[<ffffffffa0ece15b>] ? __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
[<ffffffffa0edee18>] ? ldiskfs_bread+0x18/0x90 [ldiskfs]
[<ffffffff8147a68f>] ? page_fault+0x1f/0x30
[<ffffffffa08394b2>] ? cfs_tage_to_tail+0x32/0xa0 [libcfs]
[<ffffffffa083aab6>] ? cfs_trace_get_tage+0x1c6/0x300 [libcfs]
[<ffffffffa082927b>] ? cfs_set_ptldebug_header+0x2b/0xc0 [libcfs]
[<ffffffffa083ada2>] ? libcfs_debug_vmsg2+0x1b2/0x890 [libcfs]
[<ffffffffa083b4c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[<ffffffffa093fea7>] ? class_handle2object+0x97/0x170 [obdclass]
[<ffffffffa1029b01>] ? mdt_save_lock+0x201/0x300 [mdt]
[<ffffffffa1029c5c>] ? mdt_object_unlock+0x5c/0x160 [mdt]
[<ffffffffa104a52b>] ? mdt_reint_unlink+0xa0b/0x14e0 [mdt]
[<ffffffffa10457e1>] ? mdt_reint_rec+0x41/0xe0 [mdt]
[<ffffffffa1027063>] ? mdt_reint_internal+0x4d3/0x7a0 [mdt]
[<ffffffffa1027374>] ? mdt_reint+0x44/0xe0 [mdt]
[<ffffffffa102ca08>] ? mdt_handle_common+0x648/0x1660 [mdt]
[<ffffffffa106a7e5>] ? mds_regular_handle+0x15/0x20 [mdt]
[<ffffffffa0abfa59>] ? ptlrpc_server_handle_request+0x3c9/0xc70 [ptlrpc]
[<ffffffffa082a64e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa083c0ff>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
[<ffffffffa0abc94f>] ? ptlrpc_wait_event+0xaf/0x2d0 [ptlrpc]
[<ffffffff8103fec3>] ? __wake_up+0x53/0x70
[<ffffffffa0ac341e>] ? ptlrpc_main+0xafe/0x1940 [ptlrpc]
[<ffffffffa0ac2920>] ? ptlrpc_main+0x0/0x1940 [ptlrpc]
[<ffffffff81003f1a>] ? child_rip+0xa/0x20
[<ffffffffa0ac2920>] ? ptlrpc_main+0x0/0x1940 [ptlrpc]
[<ffffffffa0ac2920>] ? ptlrpc_main+0x0/0x1940 [ptlrpc]
[<ffffffff81003f10>] ? child_rip+0x0/0x20
Attachments
Issue Links
- is related to
-
LU-4391 vsnprintf+0x218/0x5e0
-
- Resolved
-