When page alloc debugging is not enabled, we essentially accept any
virtual address for linear kernel TLB misses. But with kgdb, kernel
address probing, and other facilities we can try to access arbitrary
crap.
So, make sure the address we miss on will translate to physical memory
that actually exists.
In order to make this work we have to embed the valid address bitmap
into the kernel image. And in order to make that less expensive we
make an adjustment, in that the max physical memory address is
decreased to "1 << 41", even on the chips that support a 42-bit
physical address space. We can do this because bit 41 indicates
"I/O space" and thus covers non-memory ranges.
The result of this is that:
1) kpte_linear_bitmap shrinks from 2K to 1K in size
2) we need 64K more for the valid address bitmap
We can't let the valid address bitmap be dynamically allocated
once we start using it to validate TLB misses, otherwise we have
crazy issues to deal with wrt. recursive TLB misses and such.
If we're in a TLB miss it could be the deepest trap level that's legal
inside of the cpu. So if we TLB miss referencing the bitmap, the cpu
will be out of trap levels and enter RED state.
To guard against out-of-range accesses to the bitmap, we have to check
to make sure no bits in the physical address above bit 40 are set. We
could export and use last_valid_pfn for this check, but that's just an
unnecessary extra memory reference.
On the plus side of all this, since we load all of these translations
into the special 4MB mapping TSB, and we check the TSB first for TLB
misses, there should be absolutely no real cost for these new checks
in the TLB miss path.
Reported-by: heyongli@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Normally, srmmu uses different trap table register values to allow
determination of the cpu we're on. All of the trap tables have
identical content, they just sit at different offsets from the first
trap table, and the offset shifted down and masked out determines
the cpu we are on.
The code tries to free them up when they aren't actually used
(don't have all 4 cpus, we're on sun4d, etc.) but that causes
problems.
For one thing it triggers false positives in the DMA debugging
code. And fixing that up while preserving this relative offset
thing isn't trivial.
So just kill the freeing code, it costs us at most 3 pages, big
deal...
Signed-off-by: David S. Miller <davem@davemloft.net>
I think arch/sparc/kernel/sys32.S has an incorrect splice definition:
SIGN2(sys32_splice, sys_splice, %o0, %o1)
The splice() prototype looks like :
long splice(int fd_in, loff_t *off_in, int fd_out,
loff_t *off_out, size_t len, unsigned int flags);
So I think we should have :
SIGN2(sys32_splice, sys_splice, %o0, %o2)
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
percpu code has been assuming num_possible_cpus() == nr_cpu_ids which
is incorrect if cpu_possible_map contains holes. This causes percpu
code to access beyond allocated memories and vmalloc areas. On a
sparc64 machine with cpus 0 and 2 (u60), this triggers the following
warning or fails boot.
WARNING: at /devel/tj/os/work/mm/vmalloc.c:106 vmap_page_range_noflush+0x1f0/0x240()
Modules linked in:
Call Trace:
[00000000004b17d0] vmap_page_range_noflush+0x1f0/0x240
[00000000004b1840] map_vm_area+0x20/0x60
[00000000004b1950] __vmalloc_area_node+0xd0/0x160
[0000000000593434] deflate_init+0x14/0xe0
[0000000000583b94] __crypto_alloc_tfm+0xd4/0x1e0
[00000000005844f0] crypto_alloc_base+0x50/0xa0
[000000000058b898] alg_test_comp+0x18/0x80
[000000000058dad4] alg_test+0x54/0x180
[000000000058af00] cryptomgr_test+0x40/0x60
[0000000000473098] kthread+0x58/0x80
[000000000042b590] kernel_thread+0x30/0x60
[0000000000472fd0] kthreadd+0xf0/0x160
---[ end trace 429b268a213317ba ]---
This patch fixes generic percpu functions and sparc64
setup_per_cpu_areas() so that they handle sparse cpu_possible_map
properly.
Please note that on x86, cpu_possible_map() doesn't contain holes and
thus num_possible_cpus() == nr_cpu_ids and this patch doesn't cause
any behavior difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
The first thing sys_truncate() and sys_ftruncate() do is sign extend
the unsigned length arg to a signed type.
Thanks to Benjamin Herrenschmidt for the tip.
Signed-off-by: David S. Miller <davem@davemloft.net>
The name size limit is gone from the driver-core, the BUS_ID_SIZE
value will be removed.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT
This will make hardirq.h inclusion cheaper for every PREEMPT=n config
(which includes allmodconfig/allyesconfig, BTW)
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The page allocator and SLAB are available at this point now,
and if we still try to use bootmem allocations here the kernel
spits out warnings.
Signed-off-by: David S. Miller <davem@davemloft.net>
This function was only used by pci_claim_resource(), and the last commit
deleted that use.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* akpm: (182 commits)
fbdev: bf54x-lq043fb: use kzalloc over kmalloc/memset
fbdev: *bfin*: fix __dev{init,exit} markings
fbdev: *bfin*: drop unnecessary calls to memset
fbdev: bfin-t350mcqb-fb: drop unused local variables
fbdev: blackfin has __raw I/O accessors, so use them in fb.h
fbdev: s1d13xxxfb: add accelerated bitblt functions
tcx: use standard fields for framebuffer physical address and length
fbdev: add support for handoff from firmware to hw framebuffers
intelfb: fix a bug when changing video timing
fbdev: use framebuffer_release() for freeing fb_info structures
radeon: P2G2CLK_ALWAYS_ONb tested twice, should 2nd be P2G2CLK_DAC_ALWAYS_ONb?
s3c-fb: CPUFREQ frequency scaling support
s3c-fb: fix resource releasing on error during probing
carminefb: fix possible access beyond end of carmine_modedb[]
acornfb: remove fb_mmap function
mb862xxfb: use CONFIG_OF instead of CONFIG_PPC_OF
mb862xxfb: restrict compliation of platform driver to PPC
Samsung SoC Framebuffer driver: add Alpha Channel support
atmel-lcdc: fix pixclock upper bound detection
offb: use framebuffer_alloc() to allocate fb_info struct
...
Manually fix up conflicts due to kmemcheck in mm/slab.c
* create mm/init-mm.c, move init_mm there
* remove INIT_MM, initialize init_mm with C99 initializer
* unexport init_mm on all arches:
init_mm is already unexported on x86.
One strange place is some OMAP driver (drivers/video/omap/) which
won't build modular, but it's already wants get_vm_area() export.
Somebody should look there.
[akpm@linux-foundation.org: add missing #includes]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CPU_MASK_ALL is the (deprecated) "all bits set" cpumask, defined as so:
#define CPU_MASK_ALL (cpumask_t) { { ... } }
Taking the address of such a temporary is questionable at best,
unfortunately 321a8e9d (cpumask: add CPU_MASK_ALL_PTR macro) added
CPU_MASK_ALL_PTR:
#define CPU_MASK_ALL_PTR (&CPU_MASK_ALL)
Which formalizes this practice. One day gcc could bite us over this
usage (though we seem to have gotten away with it so far).
[Description by Rusty Russell]
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves code common to of_device_32.c and of_device_64.c into
of_device_common.h and of_device_common.c.
The only functional difference is in sparc32 where of_bus_default_map is
used in place of of_bus_sbus_map because they are equivelent.
There is still room for further code consolidation with some minor
refactoring.
Boot tested on sparc32 and compile tested on sparc64.
Signed-off-by: Robert Reif <reif@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This modifies SPARC32 to use struct dma_map ops. It means that we can
remove dma-mapping_{32|64}.h.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Tested-by: Robert Reif <reif@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts dma_map_single and dma_unmap_single to use
map_page and unmap_page respectively and removes unnecessary
map_single and unmap_single. map_page can be used to implement
map_single but the opposite is impossible. Having only dma_map_page in
struct dma_ops is enough.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Tested-by: Robert Reif <reif@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Tested-by: Robert Reif <reif@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
irq_choose_cpu() should compare the affinity mask against cpu_online_map
rather than CPU_MASK_ALL, since irq_select_affinity() sets the interrupt's
affinity mask to cpu_online_map "and" CPU_MASK_ALL (which ends up being
just cpu_online_map). The mask comparison in irq_choose_cpu() will always
fail since the two masks are not the same. So the CPU chosen is the first CPU
in the intersection of cpu_online_map and CPU_MASK_ALL, which is always CPU0.
That means all interrupts are reassigned to CPU0...
Distributing interrupts to CPUs in a linearly increasing round robin fashion
is not optimal for the UltraSPARC T1/T2. Also, the irq_rover in
irq_choose_cpu() causes an interrupt to be assigned to a different
processor each time the interrupt is allocated and released. This may lead
to an unbalanced distribution over time.
A static mapping of interrupts to processors is done to optimize and balance
interrupt distribution. For the T1/T2, interrupts are spread to different
cores first, and then to strands within a core.
The following is some benchmarks showing the effects of interrupt
distribution on a T2. The test was done with iperf using a pair of T5220
boxes, each with a 10GBe NIU (XAUI) connected back to back.
TCP | Stock Linear RR IRQ Optimized IRQ
Streams | 2.6.30-rc5 Distribution Distribution
| GBits/sec GBits/sec GBits/sec
--------+-----------------------------------------
1 0.839 0.862 0.868
8 1.16 4.96 5.88
16 1.15 6.40 8.04
100 1.09 7.28 8.68
Signed-off-by: Hong H. Pham <hong.pham@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This gets us real close to the generic implementation of
setup_per_cpu_areas() except:
1) We store the per-cpu offset into the trap_block[], whereas
the generic code has it's own static array.
2) We have to initialize the %g5 register to hold the boot cpu's
per-cpu area offset.
3) The OBP/MDESC cpu info scan is performed at the end.
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we defer the cpu_data() initializations to the end of per-cpu
setup, we can get rid of this local hack we had to setup the per-cpu
areas eary.
This is a necessary step in order to support HAVE_DYNAMIC_PER_CPU_AREA
since the per-cpu setup must run when page structs are available.
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to split up the cpu present mask setup from the cpu_data
initialization, and this is a first step towards that.
Signed-off-by: David S. Miller <davem@davemloft.net>
This really isn't necessary at all, a local variable suits the
job just fine.
This frees up 8 bytes in the trap_block[] that we can use later
to store the per-cpu base addresses.
Signed-off-by: David S. Miller <davem@davemloft.net>
Everyone cut and paste this comment from my original one. We now do
it generically, so cut the comments.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Amerigo Wang <amwang@redhat.com>
Merge reason: both topics modify the APIC code but were able to do it in
parallel so far. An upcoming patch generates a conflict so
merge them to avoid the conflict.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
according to Ingo, change set_affinity() in irq_chip should return int,
because that way we can handle failure cases in a much cleaner way, in
the genirq layer.
v2: fix two typos
[ Impact: extend API ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-arch@vger.kernel.org
LKML-Reference: <49F654E9.4070809@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The section .text.init.refok is deprecated and __REF (.ref.text)
should be used in assembly files instead. This patch cleans up a few
uses of .text.init.refok in the sparc architecture.
Also fix a reference to .text.init in a comment that wasn't updated to
.init.text.
Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This has the consequence of changing the section name use for head
code from ".text.head" to ".head.text". Since this commit changes all
users in the architecture, this change should be harmless.
Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc: Fix bus type probing for ESP and LE devices.
sparc32: Update defconfig.
sparc64: Update defconfig.
If there is a dummy "espdma" or "ledma" parent device above ESP scsi
or LE ethernet device nodes, we have to match the bus as SBUS.
Otherwise the address and size cell counts are wrong and we don't
calculate the final physical device resource values correctly at all.
Commit 5280267c1d ("sparc: Fix handling
of LANCE and ESP parent nodes in of_device.c") was meant to fix this
problem, but that only influences the inner loop of
build_device_resources(). We need this logic to also kick in at the
beginning of build_device_resources() as well, when we make the first
attempt to determine the device's immediate parent bus type for 'reg'
property element extraction.
Based almost entirely upon a patch by Friedrich Oslage.
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass clocksource pointer to the read() callback for clocksources. This
allows us to share the callback between multiple instances.
[hugh@veritas.com: fix powerpc build of clocksource pass clocksource mods]
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a version incorporating Christoph's suggestion.
Separate out common *fstatat functionality into a single function
instead of duplicating it all over the code.
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Remove some pointless conditionals before kfree().
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Interrupts must be disabled when taking the IPI lock.
Caught by lockdep.
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (28 commits)
trivial: Update my email address
trivial: NULL noise: drivers/mtd/tests/mtd_*test.c
trivial: NULL noise: drivers/media/dvb/frontends/drx397xD_fw.h
trivial: Fix misspelling of "Celsius".
trivial: remove unused variable 'path' in alloc_file()
trivial: fix a pdlfush -> pdflush typo in comment
trivial: jbd header comment typo fix for JBD_PARANOID_IOFAIL
trivial: wusb: Storage class should be before const qualifier
trivial: drivers/char/bsr.c: Storage class should be before const qualifier
trivial: h8300: Storage class should be before const qualifier
trivial: fix where cgroup documentation is not correctly referred to
trivial: Give the right path in Documentation example
trivial: MTD: remove EOL from MODULE_DESCRIPTION
trivial: Fix typo in bio_split()'s documentation
trivial: PWM: fix of #endif comment
trivial: fix typos/grammar errors in Kconfig texts
trivial: Fix misspelling of firmware
trivial: cgroups: documentation typo and spelling corrections
trivial: Update contact info for Jochen Hein
trivial: fix typo "resgister" -> "register"
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc64: Fix reset hangs on Niagara systems.
cpumask: use mm_cpumask() wrapper: sparc
cpumask: remove dangerous CPU_MASK_ALL_PTR, &CPU_MASK_ALL.: sparc
cpumask: remove the now-obsoleted pcibus_to_cpumask(): sparc
cpumask: remove cpu_coregroup_map: sparc
cpumask: prepare for iterators to only go to nr_cpu_ids/nr_cpumask_bits.: sparc
cpumask: prepare for iterators to only go to nr_cpu_ids/nr_cpumask_bits.: sparc64
cpumask: Use accessors code.: sparc64
cpumask: Use accessors code: sparc
cpumask: arch_send_call_function_ipi_mask: sparc
cpumask: Use smp_call_function_many(): sparc64