[LU-2255] mds-survey falls over after ~30 consecutive runs Created: 31/Oct/12 Updated: 08/Nov/18 Resolved: 08/Nov/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major |
| Reporter: | Richard Henwood (Inactive) | Assignee: | Di Wang |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Rank (Obsolete): | 5391 |
| Description |
|
I'm running mds-survey using:

for i in {0..100}; do echo $i; thrlo=16 thrhi=16 file_count=16 dir_count=16 /usr/bin/mds-survey >> ./100_runs_latestbuild.txt; sleep 30; done

After 30 runs or so, the machine reboots. I'll try to capture more logs.

After the reboot, I mount the device and this is what I find:

# mount -t ldiskfs /dev/sdb /mnt
[root@fat-amd-4 ~]# ls /mnt
capa_keys          oi.16.0   oi.16.18  oi.16.27  oi.16.36  oi.16.45  oi.16.54  oi.16.63              seq_ctl
changelog_catalog  oi.16.1   oi.16.19  oi.16.28  oi.16.37  oi.16.46  oi.16.55  oi.16.7               seq_srv
changelog_users    oi.16.10  oi.16.2   oi.16.29  oi.16.38  oi.16.47  oi.16.56  oi.16.8
CONFIGS            oi.16.11  oi.16.20  oi.16.3   oi.16.39  oi.16.48  oi.16.57  oi.16.9
fld                oi.16.12  oi.16.21  oi.16.30  oi.16.4   oi.16.49  oi.16.58  OI_scrub
last_rcvd          oi.16.13  oi.16.22  oi.16.31  oi.16.40  oi.16.5   oi.16.59  PENDING
lfsck_bookmark     oi.16.14  oi.16.23  oi.16.32  oi.16.41  oi.16.50  oi.16.6   quota_master
lost+found         oi.16.15  oi.16.24  oi.16.33  oi.16.42  oi.16.51  oi.16.60  quota_slave
NIDTBL_VERSIONS    oi.16.16  oi.16.25  oi.16.34  oi.16.43  oi.16.52  oi.16.61  ROOT
O                  oi.16.17  oi.16.26  oi.16.35  oi.16.44  oi.16.53  oi.16.62  seq-200000003-lastid |
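A minimal Python sketch of the same reproduction loop that also records, durably, which run was in flight, so that after an unexpected reboot a progress file names the iteration that crashed rather than "30 runs or so". Only the environment variables, the output file, and the 30-second pause come from the shell loop above; the function names, the progress-file path, and the assumption that a failing mds-survey run exits with a non-zero status are hypothetical. Unlike the shell loop, this sketch stops at the first failing run and also captures stderr.

```python
import os
import subprocess
import time

# Environment taken from the shell loop in the description.
SURVEY_ENV = {"thrlo": "16", "thrhi": "16", "file_count": "16", "dir_count": "16"}


def record_progress(path: str, run: int) -> None:
    """Overwrite `path` with the run number and fsync it before the run starts."""
    with open(path, "w") as f:
        f.write(f"{run}\n")
        f.flush()
        os.fsync(f.fileno())


def run_surveys(cmd: list[str], runs: int, log_path: str, progress_path: str,
                pause: float = 30.0) -> int:
    """Run `cmd` up to `runs` times, appending its output to `log_path`.

    Assumption: a failing run exits non-zero; the loop stops at the first one.
    Returns the number of runs that completed successfully.
    """
    env = dict(os.environ, **SURVEY_ENV)
    completed = 0
    for run in range(runs):
        record_progress(progress_path, run)  # hypothetical progress marker
        with open(log_path, "a") as log:
            # stderr is merged into the log (the shell loop only redirected stdout)
            result = subprocess.run(cmd, env=env, stdout=log, stderr=subprocess.STDOUT)
            os.fsync(log.fileno())  # make this run's output durable as well
        if result.returncode != 0:
            break
        completed += 1
        if run + 1 < runs:
            time.sleep(pause)  # same pause as the shell loop, skipped after the last run
    return completed
```

For example, `run_surveys(["/usr/bin/mds-survey"], 101, "./100_runs_latestbuild.txt", "./mds_survey_progress")` mirrors `{0..100}` (101 iterations); after a crash, the progress file holds the index of the run that was executing.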
| Comments |
| Comment by Richard Henwood (Inactive) [ 31/Oct/12 ] |
|
It looks like I missed the crash, but I caught this from conman:
SRAT: PXM 0 -> APIC 19 -> Node 0
SRAT: PXM 1 -> APIC 20 -> Node 1
SRAT: PXM 1 -> APIC 21 -> Node 1
SRAT: PXM 1 -> APIC 22 -> Node 1
SRAT: PXM 1 -> APIC 23 -> Node 1
SRAT: PXM 2 -> APIC 36 -> Node 2
SRAT: PXM 2 -> APIC 37 -> Node 2
SRAT: PXM 2 -> APIC 38 -> Node 2
SRAT: PXM 2 -> APIC 39 -> Node 2
SRAT: PXM 3 -> APIC 32 -> Node 3
SRAT: PXM 3 -> APIC 33 -> Node 3
SRAT: PXM 3 -> APIC 34 -> Node 3
SRAT: PXM 3 -> APIC 35 -> Node 3
SRAT: Node 0 PXM 0 0-a0000
SRAT: Node 0 PXM 0 100000-e0000000
SRAT: Node 0 PXM 0 100000000-120000000
SRAT: Node 1 PXM 1 120000000-220000000
SRAT: Node 2 PXM 2 220000000-320000000
SRAT: Node 3 PXM 3 320000000-420000000
Bootmem setup node 0 0000000000000000-000000000b1fb000
NODE_DATA [0000000000019480 - 000000000004d47f]
bootmap [000000000004e000 - 000000000004f63f] pages 2
(9 early reservations) ==> bootmem [0000000000 - 000b1fb000]
#0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
#1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000]
#2 [0004000000 - 0005012024] TEXT DATA BSS ==> [0004000000 - 0005012024]
#3 [000ad56000 - 000b1ee56e] RAMDISK ==> [000ad56000 - 000b1ee56e]
#4 [0000081400 - 0000100000] BIOS reserved ==> [0000081400 - 0000100000]
#5 [0005013000 - 0005013277] BRK ==> [0005013000 - 0005013277]
#6 [0000010000 - 0000011000] PGTABLE ==> [0000010000 - 0000011000]
#7 [0000011000 - 000001103c] ACPI SLIT ==> [0000011000 - 000001103c]
#8 [0000011040 - 0000019480] MEMNODEMAP ==> [0000011040 - 0000019480]
found SMP MP-table at [ffff8800000ff780] ff780
Zone PFN ranges:
DMA 0x00000010 -> 0x00001000
DMA32 0x00001000 -> 0x00100000
Normal 0x00100000 -> 0x00100000
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
0: 0x00000010 -> 0x00000096
0: 0x00003087 -> 0x0000b1fb
ACPI: PM-Timer IO Port: 0x808
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x10] enabled)
ACPI: NR_CPUS/possible_cpus limit of 1 almost reached. Keeping one slot for boot cpu. Processor 0/0x10 ignored.
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x11] enabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x12] enabled)
ACPI: NR_CPUS/possible_cpus limit of 1 reached. Processor 2/0x12 ignored.
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x13] enabled)
ACPI: NR_CPUS/possible_cpus limit of 1 reached. Processor 3/0x13 ignored.
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x14] enabled)
ACPI: NR_CPUS/possible_cpus limit of 1 reached. Processor 4/0x14 ignored.
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x15] enabled)
ACPI: NR_CPUS/possible_cpus limit of 1 reached. Processor 5/0x15 ignored.
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x16] enabled)
ACPI: NR_CPUS/possible_cpus limit of 1 reached. Processor 6/0x16 ignored.
ACPI: LAPIC (acpi_id[0x08] lapic_id[0x17] enabled)
ACPI: NR_CPUS/possible_cpus limit of 1 reached. Processor 7/0x17 ignored.
ACPI: LAPIC (acpi_id[0x09] lapic_id[0x20] enabled)
ACPI: NR_CPUS/possible_cpus limit of 1 reached. Processor 8/0x20 ignored.
ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x21] enabled)
ACPI: NR_CPUS/possible_cpus limit of 1 reached. Processor 9/0x21 ignored.
ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x22] enabled)
ACPI: NR_CPUS/possible_cpus limit of 1 reached. Processor 10/0x22 ignored.
ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x23] enabled)
ACPI: NR_CPUS/possible_cpus limit of 1 reached. Processor 11/0x23 ignored.
ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x24] enabled)
ACPI: NR_CPUS/possible_cpus limit of 1 reached. Processor 12/0x24 ignored.
ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x25] enabled)
ACPI: NR_CPUS/possible_cpus limit of 1 reached. Processor 13/0x25 ignored.
ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x26] enabled)
ACPI: NR_CPUS/possible_cpus limit of 1 reached. Processor 14/0x26 ignored.
ACPI: LAPIC (acpi_id[0x10] lapic_id[0x27] enabled)
ACPI: NR_CPUS/possible_cpus limit of 1 reached. Processor 15/0x27 ignored.
ACPI: LAPIC (acpi_id[0x11] lapic_id[0x90] disabled)
ACPI: LAPIC (acpi_id[0x12] lapic_id[0x91] disabled)
ACPI: LAPIC (acpi_id[0x13] lapic_id[0x92] disabled)
ACPI: LAPIC (acpi_id[0x14] lapic_id[0x93] disabled)
ACPI: LAPIC (acpi_id[0x15] lapic_id[0x94] disabled)
ACPI: LAPIC (acpi_id[0x16] lapic_id[0x95] disabled)
ACPI: LAPIC (acpi_id[0x17] lapic_id[0x96] disabled)
ACPI: LAPIC (acpi_id[0x18] lapic_id[0x97] disabled)
ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 0, version 33, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
Using ACPI (MADT) for SMP configuration information
ACPI: HPET id: 0x8300 base: 0xfed00000
24 Processors exceeds NR_CPUS limit of 1
SMP: Allowing 1 CPUs, 0 hotplug CPUs
PM: Registered nosave memory: 0000000000096000 - 0000000000097000
PM: Registered nosave memory: 0000000000097000 - 00000000000a0000
PM: Registered nosave memory: 00000000000a0000 - 00000000000e6000
PM: Registered nosave memory: 00000000000e6000 - 0000000000100000
PM: Registered nosave memory: 0000000000100000 - 0000000003087000
Allocating PCI resources starting at b1fb000 (gap: b1fb000:d4ca3000)
Booting paravirtualized kernel on bare hardware
NR_CPUS:4096 nr_cpumask_bits:1 nr_cpu_ids:1 nr_node_ids:4
PERCPU: Embedded 31 pages/cpu @ffff880003200000 s94488 r8192 d24296 u2097152
pcpu-alloc: s94488 r8192 d24296 u2097152 alloc=1*2097152
pcpu-alloc: [0] 0
Built 1 zonelists in Node order, mobility grouping on. Total pages: 32651
Policy zone: DMA32
Kernel command line: ro root=UUID=80a1042f-9504-42f4-ae9f-195b313d906d rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD console=ttyS0,115200 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off memmap=exactmap memmap=538K@64K memmap=132562K@49690K elfcorehdr=182252K memmap=64K$0K memmap=38K$602K memmap=104K$920K memmap=8K$3668600K memmap=72K#3668608K memmap=184K#3668680K memmap=263296K$3668864K memmap=2048K$4192256K
Misrouted IRQ fixup and polling support enabled
This may significantly impact system performance
Disabling memory control group subsystem
PID hash table entries: 512 (order: 0, 4096 bytes)
Checking aperture...
No AGP bridge found
Node 0: aperture @ 20000000 size 64 MB
Node 1: aperture @ 20000000 size 64 MB
Node 2: aperture @ 20000000 size 64 MB
Node 3: aperture @ 20000000 size 64 MB
AMD-Vi disabled by default: pass amd_iommu=on to enable
Memory: 107120k/182252k available (5153k kernel code, 49156k absent, 25976k reserved, 7165k data, 1260k init)
Hierarchical RCU implementation.
NR_IRQS:33024 nr_irqs:256
Extended CMOS year: 2000
Spurious LAPIC timer interrupt on cpu 0
do_IRQ: 0.76 No irq handler for vector (irq -1)
Console: colour VGA+ 80x25
console [ttyS0] enabled
Fast TSC calibration using PIT
Detected 2000.204 MHz processor.
Calibrating delay loop (skipped), value calculated using timer frequency.. 4000.40 BogoMIPS (lpj=2000204)
pid_max: default: 32768 minimum: 301
Security Framework initialized
SELinux: Initializing.
Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
Mount-cache hash table entries: 256
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
Initializing cgroup subsys net_cls
Initializing cgroup subsys blkio
Initializing cgroup subsys perf_event
Initializing cgroup subsys net_prio
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
SMP alternatives: switching to UP code
Freeing SMP alternatives: 34k freed
ACPI: Core revision 20090903
ftrace: converting mcount calls to 0f 1f 44 00 00
ftrace: allocating 21019 entries in 83 pages
Setting APIC routing to flat
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: AMD Opteron(tm) Processor 6128 stepping 01
Performance Events: Broken BIOS detected, complain to your hardware vendor.
[Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010000 is 530076)
AMD PMU driver.
... version: 0
... bit width: 48
... generic registers: 4
... value mask: 0000ffffffffffff
... max period: 00007fffffffffff
... fixed-purpose events: 0
... event mask: 000000000000000f
NMI watchdog enabled, takes one hw-pmu counter.
Brought up 1 CPUs
Total of 1 processors activated (4000.40 BogoMIPS).
devtmpfs: initialized
regulator: core version 0.5
NET: Registered protocol family 16
TOM: 00000000e0000000 aka 3584M
TOM2: 0000000420000000 aka 16896M
ACPI: bus type pci registered
PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 255
PCI: MCFG area at e0000000 reserved in E820
PCI: Using MMCONFIG at e0000000 - efffffff
PCI: Using configuration type 1 for base access
bio: create slab <bio-0> at 0
ACPI Warning for \_SB_._OSC: Return type mismatch - found Integer, expected Buffer (20090903/nspredef-1006)
ACPI: Executed 2 blocks of module-level executable AML code
ACPI: Interpreter enabled
ACPI: (supports S0 S1 S4 S5)
ACPI: Using IOAPIC for interrupt routing
ACPI Warning: Incorrect checksum in table [OEMB] - B4, should be B1 (20090903/tbutils-314)
ACPI: No dock devices found.
HEST: Table parsing has been initialized.
PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
pci_root PNP0A08:00: host bridge window [io 0x0000-0x0cf7]
pci_root PNP0A08:00: host bridge window [io 0x0d00-0xffff]
pci_root PNP0A08:00: host bridge window [mem 0x000a0000-0x000bffff]
pci_root PNP0A08:00: host bridge window [mem 0x000d0000-0x000dffff]
pci_root PNP0A08:00: host bridge window [mem 0xf0000000-0xfebfffff]
pci 0000:00:04.0: PME# supported from D0 D3hot D3cold
pci 0000:00:04.0: PME# disabled
pci 0000:00:0b.0: PME# supported from D0 D3hot D3cold
pci 0000:00:0b.0: PME# disabled
pci 0000:00:12.2: PME# supported from D0 D1 D2 D3hot
pci 0000:00:12.2: PME# disabled
pci 0000:00:13.2: PME# supported from D0 D1 D2 D3hot
pci 0000:00:13.2: PME# disabled
pci 0000:03:00.0: PME# supported from D0 D3hot D3cold
pci 0000:03:00.0: PME# disabled
pci 0000:03:00.1: PME# supported from D0 D3hot D3cold
pci 0000:03:00.1: PME# disabled
pci 0000:00:04.0: PCI bridge to [bus 03-03]
pci 0000:00:0b.0: PCI bridge to [bus 02-02]
pci 0000:00:14.4: PCI bridge to [bus 01-01] (subtractive decode)
pci0000:00: Requesting ACPI _OSC control (0x1d)
Unable to assume _OSC PCIe control. Disabling ASPM
ACPI: PCI Interrupt Link [LNKA] (IRQs 4 7 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKB] (IRQs 4 7 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKC] (IRQs 4 7 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKD] (IRQs 4 7 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKE] (IRQs 4 7 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 4 7 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKG] (IRQs 4 7 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKH] (IRQs 4 7 10 11 12 14 15) *0, disabled.
vgaarb: device added: PCI:0000:01:04.0,decodes=io+mem,owns=io+mem,locks=none
vgaarb: loaded
vgaarb: bridge control possible 0000:01:04.0
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
NetLabel: Initializing
NetLabel: domain hash size = 128
NetLabel: protocols = UNLABELED CIPSOv4
NetLabel: unlabeled traffic allowed by default
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0
hpet0: 4 comparators, 32-bit 14.318180 MHz counter
Switching to clocksource hpet
pnp: PnP ACPI init
ACPI: bus type pnp registered
pnp: PnP ACPI: found 15 devices
ACPI: ACPI bus type pnp unregistered
system 00:07: [io 0x0a10-0x0a1f] has been reserved
system 00:09: [mem 0xfec00000-0xfec00fff] could not be reserved
system 00:09: [mem 0xfee00000-0xfee00fff] has been reserved
system 00:0a: [io 0x0ca2-0x0ca3] has been reserved
system 00:0a: [io 0x0550-0x0551] has been reserved
system 00:0a: [io 0x04d0-0x04d1] has been reserved
system 00:0a: [io 0x040b] has been reserved
system 00:0a: [io 0x04d6] has been reserved
system 00:0a: [io 0x0c00-0x0c01] has been reserved
system 00:0a: [io 0x0c14] has been reserved
system 00:0a: [io 0x0c50-0x0c51] has been reserved
system 00:0a: [io 0x0c52] has been reserved
system 00:0a: [io 0x0c6c] has been reserved
system 00:0a: [io 0x0c6f] has been reserved
system 00:0a: [io 0x0cd0-0x0cd1] has been reserved
system 00:0a: [io 0x0cd2-0x0cd3] has been reserved
system 00:0a: [io 0x0cd4-0x0cd5] has been reserved
system 00:0a: [io 0x0cd6-0x0cd7] has been reserved
system 00:0a: [io 0x0cd8-0x0cdf] has been reserved
system 00:0a: [io 0x0800-0x089f] has been reserved
system 00:0a: [io 0x0b00-0x0b0f] has been reserved
system 00:0a: [io 0x0b20-0x0b3f] has been reserved
system 00:0a: [io 0x0900-0x090f] has been reserved
system 00:0a: [io 0x0910-0x091f] has been reserved
system 00:0a: [io 0xfe00-0xfefe] has been reserved
system 00:0a: [mem 0xffb80000-0xffbfffff] has been reserved
system 00:0a: [mem 0xfec10000-0xfec1001f] has been reserved
system 00:0d: [mem 0xe0000000-0xefffffff] has been reserved
system 00:0e: [mem 0x00000000-0x0009ffff] could not be reserved
system 00:0e: [mem 0x000c0000-0x000cffff] could not be reserved
system 00:0e: [mem 0x000e0000-0x000fffff] could not be reserved
system 00:0e: [mem 0x00100000-0xdfffffff] could not be reserved
system 00:0e: [mem 0xfec00000-0xffffffff] could not be reserved
pci 0000:03:00.0: BAR 7: can't assign mem (size 0x20000)
pci 0000:03:00.0: BAR 10: can't assign mem (size 0x20000)
pci 0000:03:00.1: BAR 7: can't assign mem (size 0x20000)
pci 0000:03:00.1: BAR 10: can't assign mem (size 0x20000)
pci 0000:00:04.0: PCI bridge to [bus 03-03]
pci 0000:00:04.0: PCI bridge to [bus 03-03]
pci 0000:00:04.0: bridge window [io 0xe000-0xefff]
pci 0000:00:04.0: bridge window [mem 0xfeb00000-0xfebfffff]
pci 0000:00:04.0: bridge window [mem pref disabled]
pci 0000:00:0b.0: PCI bridge to [bus 02-02]
pci 0000:00:0b.0: PCI bridge to [bus 02-02]
pci 0000:00:0b.0: bridge window [io disabled]
pci 0000:00:0b.0: bridge window [mem 0xfea00000-0xfeafffff]
pci 0000:00:0b.0: bridge window [mem 0xfc800000-0xfcffffff 64bit pref]
pci 0000:00:14.4: PCI bridge to [bus 01-01]
pci 0000:00:14.4: PCI bridge to [bus 01-01]
pci 0000:00:14.4: bridge window [io disabled]
pci 0000:00:14.4: bridge window [mem 0xfdf00000-0xfe7fffff]
pci 0000:00:14.4: bridge window [mem 0xfb000000-0xfbffffff pref]
pci 0000:00:04.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
pci 0000:00:0b.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
NET: Registered protocol family 2
IP route cache hash table entries: 1024 (order: 1, 8192 bytes)
TCP established hash table entries: 4096 (order: 4, 65536 bytes)
TCP bind hash table entries: 4096 (order: 4, 65536 bytes)
TCP: Hash tables configured (established 4096 bind 4096)
TCP reno registered
NET: Registered protocol family 1
pci 0000:00:12.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
pci 0000:00:12.0: PCI INT A disabled
pci 0000:00:12.1: PCI INT A -> GSI 16 (level, low) -> IRQ 16
pci 0000:00:12.1: PCI INT A disabled
pci 0000:00:12.2: PCI INT B -> GSI 17 (level, low) -> IRQ 17
pci 0000:00:12.2: PCI INT B disabled
pci 0000:00:13.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
pci 0000:00:13.0: PCI INT A disabled
pci 0000:00:13.1: PCI INT A -> GSI 18 (level, low) -> IRQ 18
pci 0000:00:13.1: PCI INT A disabled
pci 0000:00:13.2: PCI INT B -> GSI 19 (level, low) -> IRQ 19
pci 0000:00:13.2: PCI INT B disabled
pci 0000:00:14.5: PCI INT C -> GSI 18 (level, low) -> IRQ 18
pci 0000:00:14.5: PCI INT C disabled
Trying to unpack rootfs image as initramfs...
Freeing initrd memory: 4705k freed
audit: initializing netlink socket (disabled)
type=2000 audit(1351695845.993:1): initialized
HugeTLB registered 2 MB page size, pre-allocated 0 pages
VFS: Disk quotas dquot_6.5.2
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
msgmni has been set to 218
alg: No test for stdrng (krng)
ksign: Installing public key data
Loading keyring
- Added public key 634A076ED50DEA42
- User ID: Red Hat, Inc. (Kernel Module GPG key)
- Added public key D4A26C9CCD09BEDA
- User ID: Red Hat Enterprise Linux Driver Update Program <secalert@redhat.com>
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered (default)
io scheduler cfq registered
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
pciehp: PCI Express Hot Plug Controller Driver version: 0.4
acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
ACPI: Power Button [PWRB]
input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input1
ACPI: Power Button [PWRF]
BIOS reported wrong ACPI id for the processor
APEI: Can not request iomem region <00000000dfec60ea-00000000dfec60ec> for GARs.
[Firmware Warn]: GHES: Poll interval is 0 for generic hardware error source: 1, disabled.
GHES: APEI firmware first mode is enabled by WHEA _OSC.
Non-volatile memory driver v1.3
Linux agpgart interface v0.103
crash memory driver: version 1.1
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
00:0b: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:0c: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
brd: module loaded
loop: module loaded
input: Macintosh mouse button emulation as /devices/virtual/input/input2
Fixed MDIO Bus: probed
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ehci_hcd 0000:00:12.2: PCI INT B -> GSI 17 (level, low) -> IRQ 17
ehci_hcd 0000:00:12.2: EHCI Host Controller
ehci_hcd 0000:00:12.2: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:12.2: applying AMD SB700/SB800/Hudson-2/3 EHCI dummy qh workaround
ehci_hcd 0000:00:12.2: debug port 1
ehci_hcd 0000:00:12.2: irq 17, io mem 0xfe9fa800
ehci_hcd 0000:00:12.2: USB 2.0 started, EHCI 1.00
usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb1: Product: EHCI Host Controller
usb usb1: Manufacturer: Linux 2.6.32-279.5.1.el6_lustre.x86_64 ehci_hcd
usb usb1: SerialNumber: 0000:00:12.2
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 6 ports detected
ehci_hcd 0000:00:13.2: PCI INT B -> GSI 19 (level, low) -> IRQ 19
ehci_hcd 0000:00:13.2: EHCI Host Controller
ehci_hcd 0000:00:13.2: new USB bus registered, assigned bus number 2
ehci_hcd 0000:00:13.2: applying AMD SB700/SB800/Hudson-2/3 EHCI dummy qh workaround
ehci_hcd 0000:00:13.2: debug port 1
ehci_hcd 0000:00:13.2: irq 19, io mem 0xfe9fac00
ehci_hcd 0000:00:13.2: USB 2.0 started, EHCI 1.00
usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: EHCI Host Controller
usb usb2: Manufacturer: Linux 2.6.32-279.5.1.el6_lustre.x86_64 ehci_hcd
usb usb2: SerialNumber: 0000:00:13.2
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 6 ports detected
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
ohci_hcd 0000:00:12.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
ohci_hcd 0000:00:12.0: OHCI Host Controller
ohci_hcd 0000:00:12.0: new USB bus registered, assigned bus number 3
ohci_hcd 0000:00:12.0: irq 16, io mem 0xfe9f6000
|
| Comment by Keith Mannthey (Inactive) [ 31/Oct/12 ] |
|
Is this with master? What sort of configuration are you using? Are you getting no output at all on the serial line during the reboot, or have you just not caught it in the logs yet? |
| Comment by Richard Henwood (Inactive) [ 31/Oct/12 ] |
|
This was with master. I'm now running on 2.3 without a similar issue. I haven't caught the logs yet. |
| Comment by Keith Mannthey (Inactive) [ 31/Oct/12 ] |
|
Also, a hard crash is arguably more than a minor bug. |
| Comment by Jodi Levi (Inactive) [ 06/Nov/12 ] |
|
Can you please have a look at this one? |
| Comment by Richard Henwood (Inactive) [ 06/Nov/12 ] |
|
I've finally got around to capturing the dmesg:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffffa028e93e>] seq_server_alloc_meta+0x51e/0x700 [fid]
PGD 74532067 PUD 5d11a067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/possible
CPU 1
Modules linked in: lustre(U) ofd(U) osp(U) lod(U) ost(U) mdt(U) osd_ldiskfs(U) fsfilt_ldiskfs(U) ldiskfs(U) mdd(U) mds(U) mgs(U) lquota(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) exportfs jbd sha512_generic sha256_generic floppy iptable_filter ip_tables netconsole configfs ipt_REJECT ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 virtio_balloon virtio_console virtio_net snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 i2c_core sg ext4 mbcache jbd2 virtio_blk sr_mod cdrom virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
Pid: 18381, comm: lctl Not tainted 2.6.32-279.5.1.el6_lustre.x86_64 #1 Bochs Bochs
RIP: 0010:[<ffffffffa028e93e>] [<ffffffffa028e93e>] seq_server_alloc_meta+0x51e/0x700 [fid]
RSP: 0018:ffff8800703399f8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000002000007e8 RCX: 0000000200000bd0
RDX: 00000000000003e8 RSI: ffff88005e1be7c0 RDI: ffff8800754ccd10
RBP: ffff880070339a38 R08: ffff880070339d40 R09: ffff880078040318
R10: ffff880070242378 R11: ffff8800702b0018 R12: ffff88007a4901e8
R13: ffff88005e1be830 R14: ffff88005e1be7c0 R15: ffff8800754ccd10
FS: 00007f88a448c700(0000) GS:ffff880002300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 000000007027f000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process lctl (pid: 18381, threadinfo ffff880070338000, task ffff880076520040)
Stack:
 ffff880070339a28 ffffffffa0df11c4 ffff880070339a38 ffff88007a4901c0
<d> ffff880070339d40 ffff8800754ccd10 ffff88007a4901c8 ffff880070339a88
<d> ffff880070339a78 ffffffffa029180b ffff880070339a78 ffffffffa0290683
Call Trace:
 [<ffffffffa0df11c4>] ? cfs_hash_dual_bd_unlock+0x34/0x60 [libcfs]
 [<ffffffffa029180b>] seq_client_alloc_seq+0x1ab/0x470 [fid]
 [<ffffffffa0290683>] ? seq_fid_alloc_prep+0x43/0xc0 [fid]
 [<ffffffffa0291b3d>] seq_client_get_seq+0x6d/0x1e0 [fid]
 [<ffffffff81060250>] ? default_wake_function+0x0/0x20
 [<ffffffffa050d405>] ? cl_env_get+0x195/0x370 [obdclass]
 [<ffffffffa06a07d9>] echo_client_iocontrol+0x1c79/0x39d0 [obdecho]
 [<ffffffff813200ca>] ? misc_open+0x1ca/0x320
 [<ffffffffa0ddcbe0>] ? cfs_alloc+0x30/0x60 [libcfs]
 [<ffffffffa04b576f>] class_handle_ioctl+0x137f/0x1f50 [obdclass]
 [<ffffffff81178d24>] ? nameidata_to_filp+0x54/0x70
 [<ffffffffa049f2ab>] obd_class_ioctl+0x4b/0x190 [obdclass]
 [<ffffffff8118dff2>] vfs_ioctl+0x22/0xa0
 [<ffffffff81039678>] ? pvclock_clocksource_read+0x58/0xd0
 [<ffffffff8118e194>] do_vfs_ioctl+0x84/0x580
 [<ffffffff8118e711>] sys_ioctl+0x81/0xa0
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Code: d1 fc ff ff 66 0f 1f 84 00 00 00 00 00 49 8b 86 f8 00 00 00 49 8b 96 e8 00 00 00 4c 89 f6 49 8b 5e 30 49 8b 0e 4c 89 ff 48 8b 00 <48> 8b 40 10 48 8b 40 28 48 63 80 40 01 00 00 49 89 5e 18 49 0f
RIP [<ffffffffa028e93e>] seq_server_alloc_meta+0x51e/0x700 [fid]
 RSP <ffff8800703399f8>
CR2: 0000000000000010
---[ end trace 331a78dbb891d2b6 ]---
Kernel panic - not syncing: Fatal exception
Pid: 18381, comm: lctl Tainted: G D --------------- 2.6.32-279.5.1.el6_lustre.x86_64 #1
Call Trace:
 [<ffffffff814fd58a>] ? panic+0xa0/0x168
 [<ffffffff81501724>] ? oops_end+0xe4/0x100
 [<ffffffff81043bab>] ? no_context+0xfb/0x260
 [<ffffffff81043e35>] ? __bad_area_nosemaphore+0x125/0x1e0
 [<ffffffff81043f5e>] ? bad_area+0x4e/0x60
 [<ffffffff81044710>] ? __do_page_fault+0x3d0/0x480
 [<ffffffff815036de>] ? do_page_fault+0x3e/0xa0
 [<ffffffff81500a95>] ? page_fault+0x25/0x30
 [<ffffffffa028e93e>] ? seq_server_alloc_meta+0x51e/0x700 [fid]
 [<ffffffffa028e461>] ? seq_server_alloc_meta+0x41/0x700 [fid]
 [<ffffffffa0df11c4>] ? cfs_hash_dual_bd_unlock+0x34/0x60 [libcfs]
 [<ffffffffa029180b>] ? seq_client_alloc_seq+0x1ab/0x470 [fid]
 [<ffffffffa0290683>] ? seq_fid_alloc_prep+0x43/0xc0 [fid]
 [<ffffffffa0291b3d>] ? seq_client_get_seq+0x6d/0x1e0 [fid]
 [<ffffffff81060250>] ? default_wake_function+0x0/0x20
 [<ffffffffa050d405>] ? cl_env_get+0x195/0x370 [obdclass]
 [<ffffffffa06a07d9>] ? echo_client_iocontrol+0x1c79/0x39d0 [obdecho]
 [<ffffffff813200ca>] ? misc_open+0x1ca/0x320
 [<ffffffffa0ddcbe0>] ? cfs_alloc+0x30/0x60 [libcfs]
 [<ffffffffa04b576f>] ? class_handle_ioctl+0x137f/0x1f50 [obdclass]
 [<ffffffff81178d24>] ? nameidata_to_filp+0x54/0x70
 [<ffffffffa049f2ab>] ? obd_class_ioctl+0x4b/0x190 [obdclass]
 [<ffffffff8118dff2>] ? vfs_ioctl+0x22/0xa0
 [<ffffffff81039678>] ? pvclock_clocksource_read+0x58/0xd0
 [<ffffffff8118e194>] ? do_vfs_ioctl+0x84/0x580
 [<ffffffff8118e711>] ? sys_ioctl+0x81/0xa0
 [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b |
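The `Code:` line of the oops marks the faulting instruction with `<..>`. As a hedged illustration of how that line relates to the rest of the dump (not a root-cause analysis), the sketch below decodes just that one instruction: `48 8b 40 10` is `mov 0x10(%rax),%rax`, and with RAX = 0 in the register dump the address it loads from is 0x10, which matches CR2 and the "NULL pointer dereference at 0000000000000010" line. The helpers `faulting_bytes` and `decode_mov_load` are hypothetical and handle only the REX.W `8B /r` form with an 8-bit displacement; for general oopses, the kernel source tree's `scripts/decodecode`, which disassembles the `Code:` bytes with objdump, is the usual tool.

```python
# x86-64 general-purpose registers in ModRM/REX numbering order (0-15).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def faulting_bytes(code_line: str) -> list[int]:
    """Return the bytes of an oops 'Code:' line starting at the <..>-marked one."""
    tokens = code_line.split("Code:", 1)[1].split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            return [int(tok[1:-1], 16)] + [int(t, 16) for t in tokens[i + 1:]]
    raise ValueError("no <..> marker in Code: line")


def decode_mov_load(b: list[int]):
    """Decode only 'REX.W 8B /r' with mod=01 (base register + disp8).

    Returns (dest, base, disp), or None for any other encoding.
    """
    if len(b) < 4 or (b[0] & 0xF8) != 0x48 or b[1] != 0x8B:
        return None
    rex, modrm = b[0], b[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:  # rm=100 needs a SIB byte, which this sketch skips
        return None
    disp = b[3] - 256 if b[3] >= 0x80 else b[3]  # disp8 is sign-extended
    dest = REGS[reg + 8 * ((rex >> 2) & 1)]      # REX.R extends the reg field
    base = REGS[rm + 8 * (rex & 1)]              # REX.B extends the rm field
    return dest, base, disp


# Values copied from the oops above.
code = ("Code: d1 fc ff ff 66 0f 1f 84 00 00 00 00 00 49 8b 86 f8 00 00 00 "
        "49 8b 96 e8 00 00 00 4c 89 f6 49 8b 5e 30 49 8b 0e 4c 89 ff 48 8b 00 "
        "<48> 8b 40 10 48 8b 40 28 48 63 80 40 01 00 00 49 89 5e 18 49 0f")
regs = {"rax": 0x0000000000000000}  # RAX from the register dump
cr2 = 0x0000000000000010            # CR2: the address that faulted

dest, base, disp = decode_mov_load(faulting_bytes(code))
print(f"mov {disp:#x}(%{base}),%{dest}")  # mov 0x10(%rax),%rax
print(hex(regs[base] + disp), hex(cr2))   # 0x10 0x10
```

The decoder is deliberately narrow: it confirms one instruction against the register dump and says nothing about why the pointer in RAX was NULL inside `seq_server_alloc_meta`.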
| Comment by Andreas Dilger [ 08/Nov/18 ] |
|
Close old bug. |