While being at it make every occurrence of 'do_irq_select_affinity'
have the same signature in terms of signedness of the first argument.
Fix this sparse warning:
kernel/irq/manage.c:112:5: warning: symbol 'do_irq_select_affinity' was not declared. Should it be static?
Also rename do_irq_select_affinity() to setup_affinity() - shorter name
and clearer naming.
Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Acked-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Simplify and make init_kstat_irqs etc more type proof, suggested by
Andrew.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: get correct kstat_irqs [/proc/interrupts] for msi/msi-x etc
need to call clear_kstat_irqs(), so when we reuse that irq_desc,
we get correct kstat in /proc/interrupts.
This makes /proc/interrupts not have <NULL> entries.
Don't need to worry about arch that doesn't support genirq, because they
will not call dynamic_irq_cleanup().
v2: simplify and make clear_kstat_irqs more robust
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Eric Paris reported:
> I have an hp dl785g5 which is unable to successfully run
> 2.6.29-0.66.rc3.fc11.x86_64 or 2.6.29-rc2-next-20090126. During bootup
> (early in userspace daemons starting) I get the below BUG, which quickly
> renders the machine dead. I assume it is because sparse_irq_lock never
> gets released when the BUG kills that task.
Adjust lock sequence when migrating a descriptor with
CONFIG_NUMA_MIGRATE_IRQ_DESC enabled.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move the initialization of irq_default_affinity to early_irq_init as
core_initcall is too late.
irq_default_affinity can be used in init_IRQ and potentially timer and
SMP init as well. All of these happen before core_initcall. Moving
the initialization to early_irq_init ensures that it is initialized
before it is used.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Acked-by: Mike Travis <travis@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In interrupt.h these functions are declared only if
CONFIG_GENERIC_HARDIRQS is set. We should define them under identical
conditions.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide a shared interrupt debug facility under CONFIG_DEBUG_SHIRQ:
it uses the existing irqpoll facilities to iterate through all
registered interrupt handlers and call those which can handle shared
IRQ lines.
This can be handy for suspend/resume debugging: if we call this function
early during resume we can trigger crashes in those drivers which have
incorrect assumptions about when exactly their ISRs will be called
during suspend/resume.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar wrote:
> All non-x86 architectures fail to build:
>
> In file included from /home/mingo/tip/include/linux/random.h:11,
> from /home/mingo/tip/include/linux/stackprotector.h:6,
> from /home/mingo/tip/init/main.c:17:
> /home/mingo/tip/include/linux/irqnr.h:26:63: error: asm/irq_vectors.h: No such file or directory
Do not include asm/irq_vectors.h in generic code - it's not available
on all architectures.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: reduce memory usage.
Allocate kstat_irqs_legacy based on nr_cpu_ids to deal with this
memory usage bump when NR_CPUS bumped from 128 to 4096:
8192 +253952 262144 +3100% kstat_irqs_legacy(.bss)
This is only when CONFIG_SPARSE_IRQS=y.
Signed-off-by: Mike Travis <travis@sgi.com>
Impact: Reduce memory usage.
This is the second half of the changes to make the irq_desc_ptrs be
variable sized based on nr_cpu_ids. This is done by adding a new
"max_nr_irqs" macro to irq_vectors.h (and a dummy in irqnr.h) to
return a max NR_IRQS value based on NR_CPUS or nr_cpu_ids.
This necessitated moving the define of MAX_IO_APICS to a separate
file (asm/apicnum.h) so it could be included without the baggage
of the other asm/apicdef.h declarations.
Signed-off-by: Mike Travis <travis@sgi.com>
Impact: allocate irq_desc_ptrs in preparation for making it variable-sized.
This addresses this memory usage bump when NR_CPUS bumped from 128 to 4096:
34816 +229376 264192 +658% irq_desc_ptrs(.data.read_mostly)
The patch is split into two parts, the first simply allocates the
irq_desc_ptrs array. Then next will deal with making it variable.
This is only when CONFIG_SPARSE_IRQS=y.
Signed-off-by: Mike Travis <travis@sgi.com>
Impact: cleanup WARN msg.
Ingo requested:
> While at it, could you please also convert this to a WARN() construct
> instead? (in a separate commit)
... and it shall be done. ;-)
Signed-off-by: Mike Travis <travis@sgi.com>
Impact: preparation, cleanup, add KERN_INFO printk
Modify references from NR_IRQS to nr_irqs as the later will become
variable-sized based on nr_cpu_ids when CONFIG_SPARSE_IRQS=y.
Signed-off-by: Mike Travis <travis@sgi.com>
Impact: fix bug where new irq_desc uses old cpumask pointers which are freed.
As Yinghai pointed out, init_copy_one_irq_desc() copies the old desc to
the new desc overwriting the cpumask pointers. Since the old_desc and
the cpumask pointers are freed, then memory corruption will occur if
these old pointers are used.
Move the allocation of these pointers to after the copy.
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Impact: reduce memory usage, use new cpumask API.
Replace the affinity and pending_masks with cpumask_var_t's. This adds
to the significant size reduction done with the SPARSE_IRQS changes.
The added functions (init_alloc_desc_masks & init_copy_desc_masks) are
in the include file so they can be inlined (and optimized out for the
!CONFIG_CPUMASKS_OFFSTACK case.) [Naming chosen to be consistent with
the other init*irq functions, as well as the backwards arg declaration
of "from, to" instead of the more common "to, from" standard.]
Includes a slight change to the declaration of struct irq_desc to embed
the pending_mask within ifdef(CONFIG_SMP) to be consistent with other
references, and some small changes to Xen.
Tested: sparse/non-sparse/cpumask_offstack/non-cpumask_offstack/nonuma/nosmp on x86_64
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: virtualization@lists.osdl.org
Cc: xen-devel@lists.xensource.com
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Impact: clean up sparseirq fallout on random.c
Ingo suggested to change some ifdef from SPARSE_IRQ to GENERIC_HARDIRQS
so we could some #ifdef later if all arch support genirq
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Right now, most of the kernel boot is strictly synchronous, such that
various hardware delays are done sequentially.
In order to make the kernel boot faster, this patch introduces
infrastructure to allow doing some of the initialization steps
asynchronously, which will hide significant portions of the hardware delays
in practice.
In order to not change device order and other similar observables, this
patch does NOT do full parallel initialization.
Rather, it operates more in the way an out of order CPU does; the work may
be done out of order and asynchronous, but the observable effects
(instruction retiring for the CPU) are still done in the original sequence.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
* 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (77 commits)
x86: setup_per_cpu_areas() cleanup
cpumask: fix compile error when CONFIG_NR_CPUS is not defined
cpumask: use alloc_cpumask_var_node where appropriate
cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t
x86: use cpumask_var_t in acpi/boot.c
x86: cleanup some remaining usages of NR_CPUS where s/b nr_cpu_ids
sched: put back some stack hog changes that were undone in kernel/sched.c
x86: enable cpus display of kernel_max and offlined cpus
ia64: cpumask fix for is_affinity_mask_valid()
cpumask: convert RCU implementations, fix
xtensa: define __fls
mn10300: define __fls
m32r: define __fls
h8300: define __fls
frv: define __fls
cris: define __fls
cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS
cpumask: zero extra bits in alloc_cpumask_var_node
cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/
cpumask: convert mm/
...
Impact: build fix on ia64
ia64's default_affinity_write() still had old cpumask_t usage:
/home/mingo/tip/kernel/irq/proc.c: In function `default_affinity_write':
/home/mingo/tip/kernel/irq/proc.c:114: error: incompatible type for argument 1 of `is_affinity_mask_valid'
make[3]: *** [kernel/irq/proc.o] Error 1
make[3]: *** Waiting for unfinished jobs....
update it to cpumask_var_t.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
x86: export vector_used_by_percpu_irq
x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
sched: nominate preferred wakeup cpu, fix
x86: fix lguest used_vectors breakage, -v2
x86: fix warning in arch/x86/kernel/io_apic.c
sched: fix warning in kernel/sched.c
sched: move test_sd_parent() to an SMP section of sched.h
sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
sched: activate active load balancing in new idle cpus
sched: bias task wakeups to preferred semi-idle packages
sched: nominate preferred wakeup cpu
sched: favour lower logical cpu number for sched_mc balance
sched: framework for sched_mc/smt_power_savings=N
sched: convert BALANCE_FOR_xx_POWER to inline functions
x86: use possible_cpus=NUM to extend the possible cpus allowed
x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
x86: update io_apic.c to the new cpumask code
x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
x86: xen: use smp_call_function_many()
x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
...
Fixed up trivial conflict in kernel/time/tick-sched.c manually
Impact: Reduce stack usage, use new cpumask API. ALPHA mod!
Main change is that irq_default_affinity becomes a cpumask_var_t, so
treat it as a pointer (this effects alpha).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* 'irq-fixes-for-linus-4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sparseirq: move __weak symbols into separate compilation unit
sparseirq: work around __weak alias bug
sparseirq: fix hang with !SPARSE_IRQ
sparseirq: set lock_class for legacy irq when sparse_irq is selected
sparseirq: work around compiler optimizing away __weak functions
sparseirq: fix desc->lock init
sparseirq: do not printk when migrating IRQ descriptors
sparseirq: remove duplicated arch_early_irq_init()
irq: simplify for_each_irq_desc() usage
proc: remove ifdef CONFIG_SPARSE_IRQ from stat.c
irq: for_each_irq_desc() move to irqnr.h
hrtimer: remove #include <linux/irq.h>
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, sparseirq: clean up Kconfig entry
x86: turn CONFIG_SPARSE_IRQ off by default
sparseirq: fix numa_migrate_irq_desc dependency and comments
sparseirq: add kernel-doc notation for new member in irq_desc, -v2
locking, irq: enclose irq_desc_lock_class in CONFIG_LOCKDEP
sparseirq, xen: make sure irq_desc is allocated for interrupts
sparseirq: fix !SMP building, #2
x86, sparseirq: move irq_desc according to smp_affinity, v7
proc: enclose desc variable of show_stat() in CONFIG_SPARSE_IRQ
sparse irqs: add irqnr.h to the user headers list
sparse irqs: handle !GENIRQ platforms
sparseirq: fix !SMP && !PCI_MSI && !HT_IRQ build
sparseirq: fix Alpha build failure
sparseirq: fix typo in !CONFIG_IO_APIC case
x86, MSI: pass irq_cfg and irq_desc
x86: MSI start irq numbering from nr_irqs_gsi
x86: use NR_IRQS_LEGACY
sparse irq_desc[] array: core kernel and x86 changes
genirq: record IRQ_LEVEL in irq_desc[]
irq.h: remove padding from irq_desc on 64bits
Impact: fix theoretical NULL dereference
The generic irq layer doesn't know whether irq_chip has ack routine on some
architectures or not. Upon that, before calling chip->ack, we should check
that it's not NULL.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
GCC has a bug with __weak alias functions: if the functions are in
the same compilation unit as their call site, GCC can decide to
inline them - and thus rob the linker of the opportunity to override
the weak alias with the real thing.
So move all the IRQ handling related __weak symbols to kernel/irq/chip.c.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix boot crash if the kernel is built with certain GCC versions
GCC has a bug with __weak alias functions: if the functions are in
the same compilation unit as their call site, GCC can decide to
inline them - and thus rob the linker of the opportunity to override
the weak alias with the real thing.
This can lead to the boot crash reported by Kamalesh Babulal:
ACPI: Core revision 20080926
Setting APIC routing to flat
BUG: unable to handle kernel NULL pointer dereference at
0000000000000000
IP: [<ffffffff8021f9a8>] add_pin_to_irq_cpu+0x14/0x74
PGD 0
Oops: 0000 [#1] SMP
[...]
So move the arch_init_chip_data() function from handle.c to manage.c.
Reported-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix hang
Suresh report his two sockets system only works with SPARSE_IRQ enable
it turns out we miss the setting desc->irq
so provide early_irq_init() even !SPARSE_IRQ to set desc->irq
Reported-by: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: add lockdep annotation to legacy IRQ descs
Warnings resulting out of this were not seen in practice, but it's prudent
to initialize the legacy descriptors to the lock class as well, symmetric
to how we do it with other descriptors.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix panic on null pointer with sparseirq
Some GCC versions seem to inline the weak global function,
when that function is empty.
Work it around, by making the functions return a (dummy) integer.
Signed-off-by: Yinghai <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
init_one_irq_desc() does not initialize the desc->lock properly -
you cannot init a lock by memcpying some other lock on it.
This happens to work right now (because irq_desc_init is never in use),
but it's a dangerous construct nevertheless, so fix it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: reduce printk noise
There were a couple of leftover KERN_DEBUG debugging printks, remove
them. Also clarify an error message.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: clean up
We already have a weak copy of this function in init/main.c
Signed-off-by: Yinghai <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
all for_each_irq_desc() usage point have !desc check.
then its check can move into for_each_irq_desc() macro.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
before CONFIG_SPARSE_IRQ age, for_each_irq_desc() sat in irqnr.h and
could be called from generic code.
CONFIG_SPARSE_IRQ breaks this assumption, but SPARSE_IRQ version
for_each_irq_desc() also can move into irqnr.h easily.
Also, this patch unifies CONFIG_SPARSE_IRQ and !CONFIG_SPARSE_IRQ
for_each_irq_desc().
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: reduce kconfig variable scope and clean up
Bartlomiej pointed out that the config dependencies and comments are not right.
update it depend to NUMA, and fix some comments
Reported-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: simplify code
commit "08678b0: generic: sparse irqs: use irq_desc() [...]" introduced
the irq_desc_lock_class variable.
But it is used only if CONFIG_SPARSE_IRQ=Y or CONFIG_TRACE_IRQFLAGS=Y.
Otherwise, following warnings happen:
CC kernel/irq/handle.o
kernel/irq/handle.c:26: warning: 'irq_desc_lock_class' defined but not used
Actually, current early_init_irq_lock_class has a bit strange and messy ifdef.
In addition, it is not valueable.
1. this function is protected by !CONFIG_SPARSE_IRQ, but that is not necessary.
if CONFIG_SPARSE_IRQ=Y, desc of all irq number are initialized by NULL
at first - then this function calling is safe.
2. this function protected by CONFIG_TRACE_IRQFLAGS too. but it is not
necessary either, because lockdep_set_class() doesn't have bad side
effect even if CONFIG_TRACE_IRQFLAGS=n.
This patch bloat kernel size a bit on CONFIG_TRACE_IRQFLAGS=n and
CONFIG_SPARSE_IRQ=Y - but that's ok. early_init_irq_lock_class() is not
a fastpatch at all.
To avoid messy ifdefs is more important than a few bytes diet.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: improve NUMA handling by migrating irq_desc on smp_affinity changes
if CONFIG_NUMA_MIGRATE_IRQ_DESC is set:
- make irq_desc to go with affinity aka irq_desc moving etc
- call move_irq_desc in irq_complete_move()
- legacy irq_desc is not moved, because they are allocated via static array
for logical apic mode, need to add move_desc_in_progress_in_same_domain,
otherwise it will not be moved ==> also could need two phases to get
irq_desc moved.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: change existing irq_chip API
Not much point with gentle transition here: the struct irq_chip's
setaffinity method signature needs to change.
Fortunately, not widely used code, but hits a few architectures.
Note: In irq_select_affinity() I save a temporary in by mangling
irq_desc[irq].affinity directly. Ingo, does this break anything?
(Folded in fix from KOSAKI Motohiro)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Acked-by: Ingo Molnar <mingo@redhat.com>
Cc: ralf@linux-mips.org
Cc: grundler@parisc-linux.org
Cc: jeremy@xensource.com
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Impact: change calling convention of existing cpumask APIs
Most cpumask functions started with cpus_: these have been replaced by
cpumask_ ones which take struct cpumask pointers as expected.
These four functions don't have good replacement names; fortunately
they're rarely used, so we just change them over.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: mingo@redhat.com
Cc: tony.luck@intel.com
Cc: ralf@linux-mips.org
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: cl@linux-foundation.org
Cc: srostedt@redhat.com
Impact: new feature
Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with
NR_CPUS set to large values. The goal is to be able to scale up to much
larger NR_IRQS value without impacting the (important) common case.
To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of
irq_desc pointers.
When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node to get irq_desc,
this also makes the IRQ descriptors NUMA-local (to the site that calls
request_irq()).
This gets rid of the irq_cfg[] static array on x86 as well: irq_cfg now
uses desc->chip_data for x86 to store irq_cfg.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: emit new warning
We periodically waste time tracking down problems from the genirq
framework not respecting IRQF_DISABLED for some shared IRQ cases. Linus
views this as "will not fix", but we're still left with the bugs caused by
this misbehavior.
This patch adds a nag message in request_irq(), so that drivers can fix
their IRQ handlers to avoid this problem.
Note that developers will never see the relevant bugs when they run with
LOCKDEP, so it's no wonder these bugs are hard to find. (That also means
LOCKDEP is overlooking some IRQ-related bugs involving IRQ handlers that
don't set IRQF_DISABLED...)
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix __irq_set_trigger() for IRQ_LEVEL
When recording the irq trigger type, let's also make sure
that IRQ_LEVEL gets set correctly.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit 0c5d1eb77a (genirq: record trigger
type) caused powerpc platforms that had no set_type() function in their
struct irq_chip to spew out warnings about "No set_type function for
IRQ...". This warning isn't necessarily justified though because the
generic powerpc platform code calls set_irq_type() (which in turn calls
__irq_set_trigger) with information from the device tree to establish
the interrupt mappings, regardless of whether the PIC can actually set
a type.
A platform's irq_chip might not have a set_type function for a variety
of reasons, for example: the platform may have the type essentially
hard-coded, or as in the case for Cell interrupts are just messages
past around that have no real concept of type, or the platform
could even have a virtual PIC as on the PS3.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The affinity setting in setup irq is called before the NO_BALANCING
flag is checked and might therefore override affinity settings from the
calling code with the default setting.
Move the NO_BALANCING flag check before the call to the affinity
setting.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: preserve user-modified affinities on interrupts
Kumar Galak noticed that commit
1840475676 (genirq: Expose default irq
affinity mask (take 3))
overrides an already set affinity setting across a free /
request_irq(). Happens e.g. with ifdown/ifup of a network device.
Change the logic to mark the affinities as set and keep them
intact. This also fixes the unlocked access to irq_desc in
irq_select_affinity() when called from irq_affinity_proc_write()
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This variable is only used in the source file, so make it static.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If the member 'name' of the irq_desc structure happens to point to a
character string that is resident within a kernel module, problems ensue
if that module is rmmod'd (at which time dynamic_irq_cleanup() is called)
and then later show_interrupts() is called by someone.
It is also not a good thing if the character string resided in kmalloc'd
space that has been kfree'd (after having called dynamic_irq_cleanup()).
dynamic_irq_cleanup() fails to NULL the 'name' member and
show_interrupts() references it on a few architectures (like h8300, sh and
x86).
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix boot hang on a G5
In set_irq_type() we want to pass the type rather than the current
interrupt state.
Signed-off-by: Chris Friesen <cfriesen@nortel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
probe_irq_off() is disfunctional as the local nr_irqs is referenced
instead of the global one for the for_each_irq_desc() iterator.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This code is not ready, but we need to rip it out instead of rebasing
as we would lose the APIC/IO_APIC unification otherwise.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
when SPARSE_IRQ is not used, should still use irq_desc->lock
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Noonan reported a boot hang when using irqpoll and
CONFIG_HAVE_SPARSE_IRQ=y.
The irqpoll loop needs to be updated to not iterate from 1 to nr_irqs
but to iterate via for_each_irq_desc(). (in the former case desc can
be NULL which crashes the box)
Reported-by: Steven Noonan <steven@uplinklabs.net>
Tested-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Change the IRQ affinity in the process context when the IRQ is disabled.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Extraneous call to irq_to_desc().
Signed-off-by: Dean Nelson <dcn@sgi.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Suresh Siddha noticed that we should have a spinlock around it.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This has been deprecated for years, the user space irqbalanced utility
works better with numa, has configurable policies, etc...
Signed-off-by: Yinghai Lu <yhlu.kernel@gmai.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
change names:
irq_desc() ==> irq_desc_alloc
__irq_desc() ==> irq_desc
Also split a few of the uses in lowlevel x86 code.
v2: need to check if desc is null in smp_irq_move_cleanup
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
So we could remove some duplicated calling to irq_desc
v2: make sure irq_desc in init/main.c is not used without generic_hardirqs
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove irq limit checks - nr_irqs is dynamic and we expand anytime.
v2: fix checking about result irq_cfg_without_new, so could use msi again
v3: use irq_desc_without_new to check irq is valid
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There are a handful of loops that go from 0 to nr_irqs and use
get_irq_desc() on them. These would allocate all the irq_desc
entries, regardless of the need for them.
Use the smarter for_each_irq_desc() iterator that will only iterate
over the present ones.
v2: make sure arch without GENERIC_HARDIRQS work too
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
add an irq_desc accessor that will not allocate any sparse entry
but returns failure if there's no entry present.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
based on Eric's patch ...
together mold it with dyn_array for irq_desc, will allcate kstat_irqs for
nr_irq_desc alltogether if needed. -- at that point nr_cpus is known already.
v2: make sure system without generic_hardirqs works they don't have irq_desc
v3: fix merging
v4: [mingo@elte.hu] fix typo
[ mingo@elte.hu ] irq: build fix
fix:
arch/x86/xen/spinlock.c: In function 'xen_spin_lock_slow':
arch/x86/xen/spinlock.c:90: error: 'struct kernel_stat' has no member named 'irqs'
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
add CONFIG_HAVE_SPARSE_IRQ to for use condensed array.
Get rid of irq_desc[] array assumptions.
Preallocate 32 irq_desc, and irq_desc() will try to get more.
( No change in functionality is expected anywhere, except the odd build
failure where we missed a code site or where a crossing commit itroduces
new irq_desc[] usage. )
v2: according to Eric, change get_irq_desc() to irq_desc()
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
at this point nr_irqs is equal NR_IRQS
convert a few easy users from NR_IRQS to dynamic nr_irqs.
v2: according to Eric, we need to take care of arch without generic_hardirqs
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Genirq hasn't previously recorded the trigger type used by any given IRQ,
although some irq_chip support has done so. That data can be useful when
troubleshooting. This patch records it in the relevant irq_desc.status
bits, and improves consistency between the two driver-visible calls
affected:
- Make set_irq_type() usage match request_irq() usage:
* IRQ_TYPE_NONE should be a NOP; succeed, so irq_chip methods
won't have to handle that case any more (many do it wrong).
* IRQ_TYPE_PROBE is ignored; any buggy out-of-tree callers
might need to switch over to the real IRQ probing code.
* emit the same diagnostics (from shared utility code)
- Their kerneldoc now reflects usage:
* request_irq() flags include IRQF_TRIGGER_* to specify
active edge(s)/level ... docs previously omitted that
* set_irq_type() is declared in <linux/irq.h> so callers
should use the (bit-equivalent) IRQ_TYPE_* symbols there
Also: adds a warning about shared IRQs that don't end up using the
requested trigger mode; and fix an unrelated "sparse" warning.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch clarifies usage of irq_chip->startup() callback:
1. The "if (startup) startup(); else enabled();" code in setup_irq()
is unnecessary, as startup() falls back to enabled() via
default callbacks, set by irq_chip_set_defaults().
2. When using set_irq_chained_handler() the startup() was never called,
which is not good at all... Fixed. And again - when startup() is not
defined the call will fall back to enable() than to unmask() via
default callbacks.
Signed-off-by: Pawel Moll <pawel.moll@st.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When DEBUG_SHIRQ is selected, a spurious IRQ is issued before
the setup_irq() initializes the desc->depth. An IRQ handler may
call disable_irq_nosync(), but then setup_irq() will overwrite
desc->depth, and upon enable_irq() we'll catch this WARN:
------------[ cut here ]------------
Badness at kernel/irq/manage.c:180
NIP: c0061ab8 LR: c0061f10 CTR: 00000000
REGS: cf83be50 TRAP: 0700 Not tainted (2.6.27-rc3-23450-g74919b0)
MSR: 00021032 <ME,IR,DR> CR: 22042022 XER: 20000000
TASK = cf829100[5] 'events/0' THREAD: cf83a000
GPR00: c0061f10 cf83bf00 cf829100 c038e674 00000016 00000000 cf83bef8 00000038
GPR08: c0298910 00000000 c0310d28 cf83a000 00000c9c 1001a1a8 0fffe000 00800000
GPR16: ffffffff 00000000 007fff00 00000000 007ffeb0 c03320a0 c031095c c0310924
GPR24: cf8292ec cf807190 cf83a000 00009032 c038e6a4 c038e674 cf99b1cc c038e674
NIP [c0061ab8] __enable_irq+0x20/0x80
LR [c0061f10] enable_irq+0x50/0x70
Call Trace:
[cf83bf00] [c038e674] irq_desc+0x630/0x9000 (unreliable)
[cf83bf10] [c0061f10] enable_irq+0x50/0x70
[cf83bf30] [c01abe94] phy_change+0x68/0x108
[cf83bf50] [c0046394] run_workqueue+0xc4/0x16c
[cf83bf90] [c0046834] worker_thread+0x74/0xd4
[cf83bfd0] [c004ab7c] kthread+0x48/0x84
[cf83bff0] [c00135e0] kernel_thread+0x44/0x60
Instruction dump:
4e800020 3d20c031 38a94214 4bffffcc 9421fff0 7c0802a6 93e1000c 7c7f1b78
90010014 8123001c 2f890000 409e001c <0fe00000> 80010014 83e1000c 38210010
That trace corresponds to this line:
WARN(1, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
The patch fixes the problem by moving the SHIRQ code below the
setup_irq().
Unfortunately we can't easily move the SHIRQ code inside the
setup_irq(), since it grabs a spinlock, so to prvent a 'real'
IRQ from interfere us we should disable that IRQ.
p.s. The driver in question is drivers/net/phy/phy.c.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Switch /proc/irq/*/smp_affinity , /proc/irq/default_smp_affinity to
seq_files.
cat(1) reads with 1024 chunks by default, with high enough NR_CPUS, there
will be -EINVAL.
As side effect, there are now two less users of the ->read_proc interface.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While I'm glad to finally see the hole fixed whereby passing an invalid
IRQ trigger type to request_irq() would be ignored, the current diagnostic
isn't quite useful. Fixed by also listing the trigger type which was
rejected.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use WARN() instead of a printk+WARN_ON() pair; this way the message
becomes part of the warning section for better reporting/collection.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace a printk+WARN_ON() by a WARN(); this increases the chance of the
string making it into the bugreport (ie: it goes inside the
---[ cut here ]--- section)
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace a printk+WARN_ON() by a WARN(); this increases the chance of the
string making it into the bugreport (ie: it goes inside the
---[ cut here ]--- section)
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
set_type returns an int indicating success or failure, but up to now
setup_irq ignores that.
In my case this resulted in a machine hang:
gpio-keys requested IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING, but
arm/ns9xxx can only trigger on one direction so set_type didn't touch
the configuration which happens do default to a level sensitiveness and
returned -EINVAL. setup_irq ignored that and unmasked the irq. This
resulted in an endless triggering of the gpio-key interrupt service
routine which effectively killed the machine.
With this patch applied setup_irq propagates the error to the caller.
Note that before in the case
chip && !chip->set_type && !chip->name
a NULL pointer was feed to printk. This is fixed, too.
Signed-off-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since 15a647eba9 set_irq_wake returned -ENXIO
if another device had it already enabled. Zero is the right value to
return in this case. Moreover the change to desc->status was not reverted
if desc->chip->set_wake returned an error.
Signed-off-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we disable a screaming irq we never see it again. If the irq
line is shared or if the driver half works this is a real pain. So
periodically poll the handlers for screaming interrupts.
I use a timer instead of the classic irq poll technique of working off
the timer interrupt because when we use the local apic timers
note_interrupt is never called (bug?). Further on a system with
dynamic ticks the timer interrupt might not even fire unless there is
a timer telling it it needs to.
I forced this case on my test system with an e1000 nic and my ssh
session remained responsive despite the interrupt handler only being
called every 10th of a second.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In http://bugzilla.kernel.org/show_bug.cgi?id=9580 it was pointed out
that the desc->chip checks are extraneous. In fact these are left
overs from early development and can be removed safely.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Current IRQ affinity interface does not provide a way to set affinity
for the IRQs that will be allocated/activated in the future.
This patch creates /proc/irq/default_smp_affinity that lets users set
default affinity mask for the newly allocated IRQs. Changing the default
does not affect affinity masks for the currently active IRQs, they
have to be changed explicitly.
Updated based on Paul J's comments and added some more documentation.
Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
Cc: pj@sgi.com
Cc: a.p.zijlstra@chello.nl
Cc: tglx@linutronix.de
Cc: rdunlap@xenotime.net
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Uwe Kleine-Koenig has some strange hardware where one of the shared
interrupts can be asserted during boot before the appropriate driver
loads. Requesting the shared irq line from another driver result in a
spurious interrupt storm which finally disables the interrupt line.
I have seen similar behaviour on resume before (the hardware does not
work anymore so I can not verify).
Change the spurious disable logic to increment the disable depth and
mark the interrupt with an extra flag which allows us to reenable the
interrupt when a new driver arrives which requests the same irq
line. In the worst case this will disable the irq again via the
spurious trap, but there is a decent chance that the new driver is the
one which can handle the already asserted interrupt and makes the box
usable again.
Eric Biederman said further: This case also happens on a regular basis
in kdump kernels where we deliberately don't shutdown the hardware
before starting the new kernel. This patch should reduce the need for
using irqpoll in that situation by a small amount.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-and-Acked-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
Following an experimental deletion of the unnecessary directive
#include <linux/slab.h>
from the header file <linux/percpu.h>, these files under kernel/ were exposed
as needing to include one of <linux/slab.h> or <linux/gfp.h>, so explicit
includes were added where necessary.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Replace usages of CPU_MASK_NONE, CPU_MASK_ALL, NODE_MASK_NONE,
NODE_MASK_ALL to reduce stack requirements for large NR_CPUS
and MAXNODES counts.
* In some cases, the cpumask variable was initialized but then overwritten
with another value. This is the case for changes like this:
- cpumask_t oldmask = CPU_MASK_ALL;
+ cpumask_t oldmask;
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The default_disable() function was changed in commit:
76d2160147
genirq: do not mask interrupts by default
It removed the mask function in favour of the default delayed
interrupt disabling. Unfortunately this also broke the shutdown in
free_irq() when the last handler is removed from the interrupt for
those architectures which rely on the default implementations. Now we
can end up with a enabled interrupt line after the last handler was
removed, which can result in spurious interrupts.
Fix this by adding a default_shutdown function, which is only
installed, when the irqchip implementation does provide neither a
shutdown nor a disable function.
[@stable: affected versions: .21 - .24 ]
Pointed-out-by: Michael Hennerich <Michael.Hennerich@analog.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: stable@kernel.org
Tested-by: Michael Hennerich <Michael.Hennerich@analog.com>
The functions time_before, time_before_eq, time_after, and
time_after_eq are more robust for comparing jiffies against other
values.
So following patch implements usage of the time_after() macro, defined
at linux/jiffies.h, which deals with wrapping correctly
Signed-off-by: S.Caglar Onur <caglar@pardus.org.tr>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Probing non-ISA interrupts using the handle_percpu_irq as their handle_irq
method may crash the system because handle_percpu_irq does not check
IRQ_WAITING. This for example hits the MIPS Qemu configuration.
This patch provides two helper functions set_irq_noprobe and set_irq_probe to
set rsp. clear the IRQ_NOPROBE flag. The only current caller is MIPS code
but this really belongs into generic code.
As an aside, interrupt probing these days has become a mostly obsolete if not
dangerous art. I think Linux interrupts should be changed to default to
non-probing but that's subject of this patch.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Acked-and-tested-by: Rob Landley <rob@landley.net>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
these bugs are harder to find than they seem, a stackdump helps.
make it dependent on CONFIG_DEBUG_SHIRQ so that people can turn it off
if it annoys them.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is useful to debug problems with interrupt handlers that return
sometimes IRQ_NONE.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This allows to change them at runtime using sysfs. No need to
reboot to set them.
I only added aliases (kernel.noirqdebug etc.) so the old options
still work.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In commit 76d2160147 lazy irq disabling
was implemented, and the simple irq handler had a masking set to it.
Remy Bohmer discovered that some devices in the ARM architecture
would trigger the mask, but never unmask it. His patch to do the
unmasking was questioned by Russell King about masking simple irqs
to begin with. Looking further, it was discovered that the problems
Remy was seeing was due to improper use of the simple handler by
devices, and he later submitted patches to fix those. But the issue
that was uncovered was that the simple handler should never mask.
This patch reverts the masking in the simple handler.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
In __do_IRQ(), the normal case is that IRQ_DISABLED is checked and if set
the handler (handle_IRQ_event()) is not called.
Earlier in __do_IRQ(), if IRQ_PER_CPU is set the code does not check
IRQ_DISABLED and calls the handler even though IRQ_DISABLED is set. This
behavior seems unintentional.
One user encountering this behavior is the CPE handler (in
arch/ia64/kernel/mca.c). When the CPE handler encounters too many CPEs
(such as a solid single bit error), it sets up a polling timer and disables
the CPE interrupt (to avoid excessive overhead logging the stream of single
bit errors). disable_irq_nosync() is called which sets IRQ_DISABLED. The
IRQ_PER_CPU flag was previously set (in ia64_mca_late_init()). The net
result is the CPE handler gets called even though it is marked disabled.
If the behavior of not checking IRQ_DISABLED when IRQ_PER_CPU is set is
intentional, it would be worthy of a comment describing the intended
behavior. disable_irq_nosync() does call chip->disable() to provide a
chipset specifiec interface for disabling the interrupt, which avoids this
issue when used.
Signed-off-by: Russ Anderson <rja@sgi.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As it is some callers of synchronize_irq rely on memory barriers
to provide synchronisation against the IRQ handlers. For example,
the tg3 driver does
tp->irq_sync = 1;
smp_mb();
synchronize_irq();
and then in the IRQ handler:
if (!tp->irq_sync)
netif_rx_schedule(dev, &tp->napi);
Unfortunately memory barriers only work well when they come in
pairs. Because we don't actually have memory barriers on the
IRQ path, the memory barrier before the synchronize_irq() doesn't
actually protect us.
In particular, synchronize_irq() may return followed by the
result of netif_rx_schedule being made visible.
This patch (mostly written by Linus) fixes this by using spin
locks instead of memory barries on the synchronize_irq() path.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Compiling handle_percpu_irq only on uniprocessor generates an artificial
special case so a typical use like:
set_irq_chip_and_handler(irq, &some_irq_type, handle_percpu_irq);
needs to be conditionally compiled only on SMP systems as well and an
alternative UP construct is usually needed - for no good reason.
This fixes uniprocessor configurations for some MIPS SMP systems.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Gospodarek pointed out that because we return in the middle of the
free_irq() function, we never actually do call the IRQ handler that just
got deregistered. This should fix it, although I expect Andrew will want
to convert those 'return's to 'break'. That's a separate change though.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Fernando Luis Vzquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mariusz Kozlowski reported lockdep's warning:
> =================================
> [ INFO: inconsistent lock state ]
> 2.6.23-rc2-mm1 #7
> ---------------------------------
> inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
> ifconfig/5492 [HC0[0]:SC0[0]:HE1:SE1] takes:
> (&tp->lock){+...}, at: [<de8706e0>] rtl8139_interrupt+0x27/0x46b [8139too]
> {in-hardirq-W} state was registered at:
> [<c0138eeb>] __lock_acquire+0x949/0x11ac
> [<c01397e7>] lock_acquire+0x99/0xb2
> [<c0452ff3>] _spin_lock+0x35/0x42
> [<de8706e0>] rtl8139_interrupt+0x27/0x46b [8139too]
> [<c0147a5d>] handle_IRQ_event+0x28/0x59
> [<c01493ca>] handle_level_irq+0xad/0x10b
> [<c0105a13>] do_IRQ+0x93/0xd0
> [<c010441e>] common_interrupt+0x2e/0x34
...
> other info that might help us debug this:
> 1 lock held by ifconfig/5492:
> #0: (rtnl_mutex){--..}, at: [<c0451778>] mutex_lock+0x1c/0x1f
>
> stack backtrace:
...
> [<c0452ff3>] _spin_lock+0x35/0x42
> [<de8706e0>] rtl8139_interrupt+0x27/0x46b [8139too]
> [<c01480fd>] free_irq+0x11b/0x146
> [<de871d59>] rtl8139_close+0x8a/0x14a [8139too]
> [<c03bde63>] dev_close+0x57/0x74
...
This shows that a driver's irq handler was running both in hard interrupt
and process contexts with irqs enabled. The latter was done during
free_irq() call and was possible only with CONFIG_DEBUG_SHIRQ enabled.
This was fixed by another patch.
But similar problem is possible with request_irq(): any locks taken from
irq handler could be vulnerable - especially with soft interrupts. This
patch fixes it by disabling local interrupts during handler's run. (It
seems, disabling softirqs should be enough, but it needs more checking
on possible races or other special cases).
Reported-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If we're going to run the handler from free_irq() then we must do it with
local irq's disabled. Otherwise lockdep complains that the handler is taking
irq-safe spinlocks in a non-irq-safe fashion.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Level type interrupts are resent by the interrupt hardware when they are
still active at irq_enable().
Suppress the resend mechanism for interrupts marked as level.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 5a43a066b1: "genirq: Allow fasteoi
handler to retrigger disabled interrupts" was erroneously applied to
handle_level_irq(). This added the irq retrigger / resend functionality
to the level irq handler.
Revert the offending bits.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 0fc4969b86. It was
always meant to be temporary, but it's generating more useless noise
than anything else, and we probably should never have done it in the
generic kernel (only had the people involved test it on their own).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcin Slusarz reported a ne2k-pci "hung network interface" regression.
delayed disable relies on the ability to re-trigger the interrupt in the
case that a real interrupt happens after the software disable was set.
In this case we actually disable the interrupt on the hardware level
_after_ it occurred.
On enable_irq, we need to re-trigger the interrupt. On i386 this relies
on a hardware resend mechanism (send_IPI_self()).
Actually we only need the resend for edge type interrupts. Level type
interrupts come back once enable_irq() re-enables the interrupt line.
I assume that the interrupt in question is level triggered because it is
shared and above the legacy irqs 0-15:
17: 12 IO-APIC-fasteoi eth1, eth0
Looking into the IO_APIC code, the resend via send_IPI_self() happens
unconditionally. So the resend is done for level and edge interrupts.
This makes the problem more mysterious.
The code in question lib8390.c does
disable_irq();
fiddle_with_the_network_card_hardware()
enable_irq();
The fiddle_with_the_network_card_hardware() might cause interrupts,
which are cleared in the same code path again,
Marcin found that when he disables the irq line on the hardware level
(removing the delayed disable) the card is kept alive.
So the difference is that we can get a resend on enable_irq, when an
interrupt happens during the time, where we are in the disabled region.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Otherwise smp_affinity would only update after the next interrupt
on x86 systems.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we handle spurious IRQ activity based upon seeing a lot of
invalid interrupts, and we clear things back on the base of lots of valid
interrupts.
Unfortunately in some cases you get legitimate invalid interrupts caused by
timing asynchronicity between the PCI bus and the APIC bus when disabling
interrupts and pulling other tricks. In this case although the spurious
IRQs are not a problem our unhandled counters didn't clear and they act as
a slow running timebomb. (This is effectively what the serial port/tty
problem that was fixed by clearing counters when registering a handler
showed up)
It's easy enough to add a second parameter - time. This means that if we
see a regular stream of harmless spurious interrupts which are not harming
processing we don't go off and do something stupid like disable the IRQ
after a month of running. OTOH lockups and performance killers show up a
lot more than 10/second
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With irqpoll enabled, trying to test the IRQF_IRQPOLL flag in the
actions would cause a NULL pointer dereference if no action was
installed (for example, the driver might have been unloaded with
interrupts still pending).
So be a bit more careful about testing the flag by making sure to test
for that case.
(The actual _change_ is trivial, the patch is more than a one-liner
because I rewrote the testing to also be much more readable.
Original (discarded) bugfix by Bernhard Walle.
Cc: Bernhard Walle <bwalle@suse.de>
Tested-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On SN, only allow one bit to be set in the smp_affinty mask when
redirecting an interrupt. Currently setting multiple bits is allowed, but
only the first bit is used in determining the CPU to redirect to. This has
caused confusion among some customers.
[akpm@linux-foundation.org: fixes]
Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
A linuxdoc comment had fallen out of date - it refers to an argument which no
longer exists.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
irqpoll is broken on some architectures that don't use the IRQ 0 for the timer
interrupt like IA64. This patch adds a IRQF_IRQPOLL flag.
Each architecture is handled in a separate pach. As I left the irq == 0 as
condition, this should not break existing architectures that use timer_irq ==
0 and that I did't address with that patch (because I don't know).
This patch:
This patch adds a IRQF_IRQPOLL flag that the interrupt registration code could
use for the interrupt it wants to use for IRQ polling.
Because this must not be the timer interrupt, an additional flag was added
instead of re-using the IRQF_TIMER constant. Until all architectures will
have an IRQF_IRQPOLL interrupt, irq == 0 will stay as alternative as it should
not break anything.
Also, note_interrupt() is called on CPU-specific interrupts to be used as
interrupt source for IRQ polling.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Grant Grundler <grundler@google.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We noticed a drop in n/w performance due to the irq_desc being cacheline
aligned rather than internode aligned. We see 50% of expected performance
when two e1000 nics local to two different nodes have consecutive irq
descriptors allocated, due to false sharing.
Note that this patch does away with cacheline padding for the UP case, as
it does not seem useful for UP configurations.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
An irqaction structure won't be added to an IRQ descriptor irqaction list if
it doesn't agree with other irqactions on the IRQF_PERCPU flag. Don't check
for this flag to change IRQ descriptor `status' for every irqaction added to
the list, Doing the check only for the first irqaction added is enough.
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
setup_irq() releases a desc->lock before calling register_handler_proc(), so
the iteration over the IRQ action list is not protected.
(akpm: the check itself is still racy, but at least it probably won't oops
now).
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
set_irq_msi() currently connects an irq_desc to an msi_desc. The archs call
it at some point in their setup routine, and then the generic code sets up the
reverse mapping from the msi_desc back to the irq.
set_irq_msi() should do both connections, making it the one and only call
required to connect an irq with it's MSI desc and vice versa.
The arch code MUST call set_irq_msi(), and it must do so only once it's sure
it's not going to fail the irq allocation.
Given that there's no need for the arch to return the irq anymore, the return
value from the arch setup routine just becomes 0 for success and anything else
for failure.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
devres should be deallocated with devres_free() not kfree(). This bug
corrupts slab on IRQ request failure. Fix it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg KH <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
move_native_irqs tries to do the right thing when migrating irqs
by disabling them. However disabling them is a software logical
thing, not a hardware thing. This has always been a little flaky
and after Ingo's latest round of changes it is guaranteed to not
mask the apic.
So this patch fixes move_native_irq to directly call the mask and
unmask chip methods to guarantee that we mask the irq when we
are migrating it. We must do this as it is required by
all code that call into the path.
Since we don't know the masked status when IRQ_DISABLED is
set so we will not be able to restore it. The patch makes the code
just give up and trying again the next time this routing is called.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use mask_ack_irq() where possible.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Never mask interrupts immediately upon request. Disabling interrupts in
high-performance codepaths is rare, and on the other hand this change could
recover lost edges (or even other types of lost interrupts) by conservatively
only masking interrupts after they happen. (NOTE: with this change the
highlevel irq-disable code still soft-disables this IRQ line - and if such an
interrupt happens then the IRQ flow handler keeps the IRQ masked.)
Mark i8529A controllers as 'never loses an edge'.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide funtions to:
- check, whether an interrupt can set the affinity
- pin the interrupt to a given cpu
Necessary for the ability to setup clocksources more flexible (e.g. use the
different HPET channels per CPU)
[akpm@osdl.org: alpha build fix]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a flag so we can prevent the irq balancing of an interrupt. Move the
bits, so we have room for more :)
Necessary for the ability to setup clocksources more flexible (e.g. use the
different HPET channels per CPU)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The problem is various drivers legally validly and sensibly try to claim
IRQs but the kernel insists on vomiting forth a giant irrelevant debugging
spew when the types clash.
Edit kernel/irq/manage.c go down to mismatch: in setup_irq() and ifdef out
the if clause that checks for mismatches. It'll then just do the right
thing and work sanely.
For the current -mm kernel this will do the trick (and moves it into shared
irq debugging as in debug mode the info spew is useful). I've had a
variant of this in my private tree for some time as I got fed up on the
mess on boxes where old legacy IRQs get reused.
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Drivers registering IRQ handlers with SA_SHIRQ really ought to be able to
handle an interrupt happening before request_irq() returns. They also
ought to be able to handle an interrupt happening during the start of their
call to free_irq(). Let's test that hypothesis....
[bunk@stusta.de: Kconfig fixes]
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Split the implementation-agnostic stuff in separate files.
* Make sure that targets using non-default request_irq() pull
kernel/irq/devres.o
* Introduce new symbols (HAS_IOPORT and HAS_IOMEM) defaulting to positive;
allow architectures to turn them off (we needed these symbols anyway for
dependencies of quite a few drivers).
* protect the ioport-related parts of lib/devres.o with CONFIG_HAS_IOPORT.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: pnx8550 code creates directory but resets ->nlink to 1.
create_proc_entry() et al will correctly set ->nlink for you.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Greg KH <greg@kroah.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement device resource management, in short, devres. A device
driver can allocate arbirary size of devres data which is associated
with a release function. On driver detach, release function is
invoked on the devres data, then, devres data is freed.
devreses are typed by associated release functions. Some devreses are
better represented by single instance of the type while others need
multiple instances sharing the same release function. Both usages are
supported.
devreses can be grouped using devres group such that a device driver
can easily release acquired resources halfway through initialization
or selectively release resources (e.g. resources for port 1 out of 4
ports).
This patch adds devres core including documentation and the following
managed interfaces.
* alloc/free : devm_kzalloc(), devm_kzfree()
* IO region : devm_request_region(), devm_release_region()
* IRQ : devm_request_irq(), devm_free_irq()
* DMA : dmam_alloc_coherent(), dmam_free_coherent(),
dmam_declare_coherent_memory(), dmam_pool_create(),
dmam_pool_destroy()
* PCI : pcim_enable_device(), pcim_pin_device(), pci_is_managed()
* iomap : devm_ioport_map(), devm_ioport_unmap(), devm_ioremap(),
devm_ioremap_nocache(), devm_iounmap(), pcim_iomap_table(),
pcim_iomap(), pcim_iounmap()
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
We need to be able to get from an irq number to a struct msi_desc.
The msi_desc array in msi.c had several short comings the big one was
that it could not be used outside of msi.c. Using irq_data in struct
irq_desc almost worked except on some architectures irq_data needs to
be used for something else.
So this patch adds a msi_desc pointer to irq_desc, adds the appropriate
wrappers and changes all of the msi code to use them.
The dynamic_irq_init/cleanup code was tweaked to ensure the new
field is left in a well defined state.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Any newly added irq handler may obviously make any old spurious irq
status invalid, since the new handler may well be the thing that is
supposed to handle any interrupts that came in.
So just clear the statistics when adding handlers.
Pointed-out-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
o noirqdebug_setup() is __init but it is being called by
quirk_intel_irqbalance() which if of type __devinit. If CONFIG_HOTPLUG=y,
quirk_intel_irqbalance() is put into text section and it is wrong to
call a function in __init section.
o MODPOST flags this on i386 if CONFIG_RELOCATABLE=y
WARNING: vmlinux - Section mismatch: reference to .init.text:noirqdebug_setup from .text between 'quirk_intel_irqbalance' (at offset 0xc010969e) and 'i8237A_suspend'
o Make noirqdebug_setup() non-init.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
The sanity check for no_irq_chip in __set_irq_hander() is unconditional on
both install and uninstall of an handler. This triggers false warnings and
replaces no_irq_chip by dummy_irq_chip in the uninstall case.
Check only, when a real handler is installed.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Sylvain Munaut <tnt@246tNt.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While running my MCA test (hardware error injection) on 2.6.19,
I got some warning like following:
> BUG: warning at kernel/irq/migration.c:27/move_masked_irq()
>
> Call Trace:
> [<a000000100013d20>] show_stack+0x40/0xa0
> sp=e00000006b2578d0 bsp=e00000006b2510b0
> [<a000000100013db0>] dump_stack+0x30/0x60
> sp=e00000006b257aa0 bsp=e00000006b251098
> [<a0000001000de430>] move_masked_irq+0xb0/0x240
> sp=e00000006b257aa0 bsp=e00000006b251070
> [<a0000001000de6a0>] move_native_irq+0xe0/0x180
> sp=e00000006b257aa0 bsp=e00000006b251040
> [<a00000010004ff50>] iosapic_end_level_irq+0x30/0xe0
> sp=e00000006b257aa0 bsp=e00000006b251020
> [<a0000001000d94d0>] __do_IRQ+0x170/0x400
> sp=e00000006b257aa0 bsp=e00000006b250fd8
> [<a0000001000116f0>] ia64_handle_irq+0x1b0/0x260
> sp=e00000006b257aa0 bsp=e00000006b250fa8
> [<a00000010000c3a0>] ia64_leave_kernel+0x0/0x280
> sp=e00000006b257aa0 bsp=e00000006b250fa8
> [<a000000100690cf0>] _spin_unlock_irqrestore+0x30/0x60
> sp=e00000006b257c70 bsp=e00000006b250f90
It comes from:
[kernel/irq/migration.c]
26 if (CHECK_IRQ_PER_CPU(desc->status)) {
27 WARN_ON(1);
28 return;
29 }
By putting some printk in kernel, I found that irqbalance is trying to
move CPEI which is handled as PER_CPU irq. That's why.
CPEI(Corrected Platform Error Interrupt) is ia64 specific irq, is
allowed to pin to particular processor which selected by the platform, and
even it is PER_CPU but it has set_affinity handler (=iosapic_set_affinity)
as same as other IO-SAPIC-level interrupts. (I don't know why, but
I guess that there would be typical situation where the handler for
migration is needed, such as hotplug - the processor going to be
offline/hot-removed.)
To shut up this warning, there are 2 way at least:
a) fix CPEI stuff
b) prohibit setting affinity to PER_CPU irq
I'm not sure what stuff of CPEI need to be fixed, but I think that
returning error to attempting move PER_CPU irq is useful for all
applications since it will never work.
Following small patch takes b) style.
It works, the warning disappeared and irqbalance still runs well.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Name some of the remaning 'old_style_spin_init' locks
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This reverts commit f72fa70760, and solves
the problem that it tried to fix by simply making "__do_IRQ()" call the
note_interrupt() function without the lock held, the way everybody else
does.
It should be noted that all interrupt handling code must never allow the
descriptor actors to be entered "recursively" (that's why we do all the
magic IRQ_PENDING stuff in the first place), so there actually is
exclusion at that much higher level, even in the absense of locking.
Acked-by: Vivek Goyal <vgoyal@in.ibm.com>
Acked-by:Pavel Emelianov <xemul@openvz.org>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When we get a mismatch between handlers on the same IRQ, all we get is "IRQ
handler type mismatch for IRQ n". Let's print the name of the
presently-registered handler with which we got the mismatch.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While testing kernel on machine with "irqpoll" option I've caught such a
lockup:
__do_IRQ()
spin_lock(&desc->lock);
desc->chip->ack(); /* IRQ is ACKed */
note_interrupt()
misrouted_irq()
handle_IRQ_event()
if (...)
local_irq_enable_in_hardirq();
/* interrupts are enabled from now */
...
__do_IRQ() /* same IRQ we've started from */
spin_lock(&desc->lock); /* LOCKUP */
Looking at misrouted_irq() code I've found that a potential deadlock like
this can also take place:
1CPU:
__do_IRQ()
spin_lock(&desc->lock); /* irq = A */
misrouted_irq()
for (i = 1; i < NR_IRQS; i++) {
spin_lock(&desc->lock); /* irq = B */
if (desc->status & IRQ_INPROGRESS) {
2CPU:
__do_IRQ()
spin_lock(&desc->lock); /* irq = B */
misrouted_irq()
for (i = 1; i < NR_IRQS; i++) {
spin_lock(&desc->lock); /* irq = A */
if (desc->status & IRQ_INPROGRESS) {
As the second lock on both CPUs is taken before checking that this irq is
being handled in another processor this may cause a deadlock. This issue
is only theoretical.
I propose the attached patch to fix booth problems: when trying to handle
misrouted IRQ active desc->lock may be unlocked.
Acked-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce desc->name and eliminate the handle_irq_name() hack. Add
set_irq_chip_and_handler_name() to set the flow type and name at once.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
lib/bitmap.c:bitmap_parse() is a library function that received as input a
user buffer. This seemed to have originated from the way the write_proc
function of the /proc filesystem operates.
This has been reworked to not use kmalloc and eliminates a lot of
get_user() overhead by performing one access_ok before using __get_user().
We need to test if we are in kernel or user space (is_user) and access the
buffer differently. We cannot use __get_user() to access kernel addresses
in all cases, for example in architectures with separate address space for
kernel and user.
This function will be useful for other uses as well; for example, taking
input for /sysfs instead of /proc, so it was changed to accept kernel
buffers. We have this use for the Linux UWB project, as part as the
upcoming bandwidth allocator code.
Only a few routines used this function and they were changed too.
Signed-off-by: Reinette Chatre <reinette.chatre@linux.intel.com>
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Joe Korty <joe.korty@ccur.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With the following patch, the ixp4xxdefconfig builds correctly. I'll
test some more configs if I get some time.
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
of passing regs around manually through all ~1800 interrupt handlers in the
Linux kernel.
The regs pointer is used in few places, but it potentially costs both stack
space and code to pass it around. On the FRV arch, removing the regs parameter
from all the genirq function results in a 20% speed up of the IRQ exit path
(ie: from leaving timer_interrupt() to leaving do_IRQ()).
Where appropriate, an arch may override the generic storage facility and do
something different with the variable. On FRV, for instance, the address is
maintained in GR28 at all times inside the kernel as part of general exception
handling.
Having looked over the code, it appears that the parameter may be handed down
through up to twenty or so layers of functions. Consider a USB character
device attached to a USB hub, attached to a USB controller that posts its
interrupts through a cascaded auxiliary interrupt controller. A character
device driver may want to pass regs to the sysrq handler through the input
layer which adds another few layers of parameter passing.
I've build this code with allyesconfig for x86_64 and i386. I've runtested the
main part of the code on FRV and i386, though I can't test most of the drivers.
I've also done partial conversion for powerpc and MIPS - these at least compile
with minimal configurations.
This will affect all archs. Mostly the changes should be relatively easy.
Take do_IRQ(), store the regs pointer at the beginning, saving the old one:
struct pt_regs *old_regs = set_irq_regs(regs);
And put the old one back at the end:
set_irq_regs(old_regs);
Don't pass regs through to generic_handle_irq() or __do_IRQ().
In timer_interrupt(), this sort of change will be necessary:
- update_process_times(user_mode(regs));
- profile_tick(CPU_PROFILING, regs);
+ update_process_times(user_mode(get_irq_regs()));
+ profile_tick(CPU_PROFILING);
I'd like to move update_process_times()'s use of get_irq_regs() into itself,
except that i386, alone of the archs, uses something other than user_mode().
Some notes on the interrupt handling in the drivers:
(*) input_dev() is now gone entirely. The regs pointer is no longer stored in
the input_dev struct.
(*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does
something different depending on whether it's been supplied with a regs
pointer or not.
(*) Various IRQ handler function pointers have been moved to type
irq_handler_t.
Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
Typedef the IRQ handler function type.
Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 1356d1e5fd256997e3d3dce0777ab787d0515c7a commit)
Typedef the IRQ flow handler function type.
Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 8e973fbdf5716b93a0a8c0365be33a31ca0fa351 commit)
Currently msi.c is doing sanity checks that make certain before an irq is
destroyed it has no more users.
By adding irq_has_action I can perform the test is a generic way, instead of
relying on a msi specific data structure.
By performing the core check in dynamic_irq_cleanup I ensure every user of
dynamic irqs has a test present and we don't free resources that are in use.
In msi.c this allows me to kill the attrib.state member of msi_desc and all of
the assciated code to maintain it.
To keep from freeing data structures when irq cleanup code is called to soon
changing dyanamic_irq_cleanup is insufficient because there are msi specific
data structures that are also not safe to free.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <greg@kroah.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With the msi support comes a new concept in irq handling, irqs that are
created dynamically at run time.
Currently the msi code allocates irqs backwards. First it allocates a
platform dependent routing value for an interrupt the ``vector'' and then it
figures out from the vector which irq you are on.
This msi backwards allocator suffers from two basic problems. The allocator
suffers because it is trying to do something that is architecture specific in
a generic way making it brittle, inflexible, and tied to tightly to the
architecture implementation. The alloctor also suffers from it's very
backwards nature as it has tied things together that should have no
dependencies.
To solve the basic dynamic irq allocation problem two new architecture
specific functions are added: create_irq and destroy_irq.
create_irq takes no input and returns an unused irq number, that won't be
reused until it is returned to the free poll with destroy_irq. The irq then
can be used for any purpose although the only initial consumer is the msi
code.
destroy_irq takes an irq number allocated with create_irq and returns it to
the free pool.
Making this functionality per architecture increases the simplicity of the irq
allocation code and increases it's flexibility.
dynamic_irq_init() and dynamic_irq_cleanup() are added to automate the
irq_desc initializtion that should happen for dynamic irqs.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently move_native_irq disables and renables the irq we are migrating to
ensure we don't take that irq when we are actually doing the migration
operation. Disabling the irq needs to happen but sometimes doing the work is
move_native_irq is too late.
On x86 with ioapics the irq move sequences needs to be:
edge_triggered:
mask irq.
move irq.
unmask irq.
ack irq.
level_triggered:
mask irq.
ack irq.
move irq.
unmask irq.
We can easily perform the edge triggered sequence, with the current defintion
of move_native_irq. However the level triggered case does not map well. For
that I have added move_masked_irq, to allow me to disable the irqs around both
the ack and the move.
Q: Why have we not seen this problem earlier?
A: The only symptom I have been able to reproduce is that if we change
the vector before acknowleding an irq the wrong irq is acknowledged.
Since we currently are not reprogramming the irq vector during
migration no problems show up.
We have to mask the irq before we acknowledge the irq or else we could
hit a window where an irq is asserted just before we acknowledge it.
Edge triggered irqs do not have this problem because acknowledgements
do not propogate in the same way.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The primary aim of this patchset is to remove maintenances problems caused by
the irq infrastructure. The two big issues I address are an artificially
small cap on the number of irqs, and that MSI assumes vector == irq. My
primary focus is on x86_64 but I have touched other architectures where
necessary to keep them from breaking.
- To increase the number of irqs I modify the code to look at the (cpu,
vector) pair instead of just looking at the vector.
With a large number of irqs available systems with a large irq count no
longer need to compress their irq numbers to fit. Removing a lot of brittle
special cases.
For acpi guys the result is that irq == gsi.
- Addressing the fact that MSI assumes irq == vector takes a few more
patches. But suffice it to say when I am done none of the generic irq code
even knows what a vector is.
In quick testing on a large Unisys x86_64 machine we stumbled over at least
one driver that assumed that NR_IRQS could always fit into an 8 bit number.
This driver is clearly buggy today. But this has become a class of bugs that
it is now much easier to hit.
This patch:
This is a minor space optimization. In practice I don't think this has any
affect because of our alignment constraints and the other fields but there is
not point in chewing up an uncessary word and since we already read the flag
field this should improve the cache hit ratio of the irq handler.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Permit __do_IRQ() to be dispensed with based on a configuration option.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
while porting the -rt tree to 2.6.18-rc7 i noticed the following
screaming-IRQ scenario on an SMP system:
2274 0Dn.:1 0.001ms: do_IRQ+0xc/0x103 <= (ret_from_intr+0x0/0xf)
2274 0Dn.:1 0.010ms: do_IRQ+0xc/0x103 <= (ret_from_intr+0x0/0xf)
2274 0Dn.:1 0.020ms: do_IRQ+0xc/0x103 <= (ret_from_intr+0x0/0xf)
2274 0Dn.:1 0.029ms: do_IRQ+0xc/0x103 <= (ret_from_intr+0x0/0xf)
2274 0Dn.:1 0.039ms: do_IRQ+0xc/0x103 <= (ret_from_intr+0x0/0xf)
2274 0Dn.:1 0.048ms: do_IRQ+0xc/0x103 <= (ret_from_intr+0x0/0xf)
2274 0Dn.:1 0.058ms: do_IRQ+0xc/0x103 <= (ret_from_intr+0x0/0xf)
2274 0Dn.:1 0.068ms: do_IRQ+0xc/0x103 <= (ret_from_intr+0x0/0xf)
2274 0Dn.:1 0.077ms: do_IRQ+0xc/0x103 <= (ret_from_intr+0x0/0xf)
2274 0Dn.:1 0.087ms: do_IRQ+0xc/0x103 <= (ret_from_intr+0x0/0xf)
2274 0Dn.:1 0.097ms: do_IRQ+0xc/0x103 <= (ret_from_intr+0x0/0xf)
as it turns out, the bug is caused by handle_level_irq(), which if it
races with another CPU already handling this IRQ, it _unmasks_ the IRQ
line on the way out. This is not how 2.6.17 works, and we introduced
this bug in one of the early genirq cleanups right before it went into
-mm. (the bug was not in the genirq patchset for a long time, and we
didnt notice the bug due to the lack of -rt rebase to the new genirq
code. -rt, and hardirq-preemption in particular opens up such races much
wider than anything else.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a bug where the IRQ_PENDING flag is never cleared and the ISR is called
endlessly without an actual interrupt.
Signed-off-by: Imre Deak <imre.deak@solidboot.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adds the description of the parameters from handle_bad_irq().
Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
IRQs need refcounting and a state flag to track whether the the IRQ should
be enabled or disabled as a "normal IRQ" source after a series of calls to
{en,dis}able_irq(). For shared IRQs, the IRQ must be enabled so long as at
least one driver needs it active.
Likewise, IRQs need the same support to track whether the IRQ should be
enabled or disabled as a "wakeup event" source after a series of calls to
{en,dis}able_irq_wake(). For shared IRQs, the IRQ must be enabled as a
wakeup source during sleep so long as at least one driver needs it. But
right now they _don't have_ that refcounting ... which means sharing a
wakeup-capable IRQ can't work correctly in some configurations.
This patch adds the refcount and flag mechanisms to set_irq_wake() -- which
is what {en,dis}able_irq_wake() call -- and minimal documentation of what
the irq wake mechanism does.
Drivers relying on the older (broken) "toggle" semantics will trigger a
warning; that'll be a handful of drivers on ARM systems.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
powerpc: add defconfig for Freescale MPC8349E-mITX board
powerpc: Add base support for the Freescale MPC8349E-mITX eval board
Documentation: correct values in MPC8548E SEC example node
[POWERPC] Actually copy over i8259.c to arch/ppc/syslib this time
[POWERPC] Add new interrupt mapping core and change platforms to use it
[POWERPC] Copy i8259 code back to arch/ppc
[POWERPC] New device-tree interrupt parsing code
[POWERPC] Use the genirq framework
[PATCH] genirq: Allow fasteoi handler to retrigger disabled interrupts
[POWERPC] Update the SWIM3 (powermac) floppy driver
[POWERPC] Fix error handling in detecting legacy serial ports
[POWERPC] Fix booting on Momentum "Apache" board (a Maple derivative)
[POWERPC] Fix various offb and BootX-related issues
[POWERPC] Add a default config for 32-bit CHRP machines
[POWERPC] fix implicit declaration on cell.
[POWERPC] change get_property to return void *
Make use of local_irq_enable_in_hardirq() API to annotate places that enable
hardirqs in hardirq context.
Has no effect on non-lockdep kernels.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Teach special (recursive) locking code to the lock validator. Has no effect
on non-lockdep kernels.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Do 'make oldconfig' and accept all the defaults for new config options -
reboot into the kernel and if everything goes well it should boot up fine and
you should have /proc/lockdep and /proc/lockdep_stats files.
Typically if the lock validator finds some problem it will print out
voluminous debug output that begins with "BUG: ..." and which syslog output
can be used by kernel developers to figure out the precise locking scenario.
What does the lock validator do? It "observes" and maps all locking rules as
they occur dynamically (as triggered by the kernel's natural use of spinlocks,
rwlocks, mutexes and rwsems). Whenever the lock validator subsystem detects a
new locking scenario, it validates this new rule against the existing set of
rules. If this new rule is consistent with the existing set of rules then the
new rule is added transparently and the kernel continues as normal. If the
new rule could create a deadlock scenario then this condition is printed out.
When determining validity of locking, all possible "deadlock scenarios" are
considered: assuming arbitrary number of CPUs, arbitrary irq context and task
context constellations, running arbitrary combinations of all the existing
locking scenarios. In a typical system this means millions of separate
scenarios. This is why we call it a "locking correctness" validator - for all
rules that are observed the lock validator proves it with mathematical
certainty that a deadlock could not occur (assuming that the lock validator
implementation itself is correct and its internal data structures are not
corrupted by some other kernel subsystem). [see more details and conditionals
of this statement in include/linux/lockdep.h and
Documentation/lockdep-design.txt]
Furthermore, this "all possible scenarios" property of the validator also
enables the finding of complex, highly unlikely multi-CPU multi-context races
via single single-context rules, increasing the likelyhood of finding bugs
drastically. In practical terms: the lock validator already found a bug in
the upstream kernel that could only occur on systems with 3 or more CPUs, and
which needed 3 very unlikely code sequences to occur at once on the 3 CPUs.
That bug was found and reported on a single-CPU system (!). So in essence a
race will be found "piecemail-wise", triggering all the necessary components
for the race, without having to reproduce the race scenario itself! In its
short existence the lock validator found and reported many bugs before they
actually caused a real deadlock.
To further increase the efficiency of the validator, the mapping is not per
"lock instance", but per "lock-class". For example, all struct inode objects
in the kernel have inode->inotify_mutex. If there are 10,000 inodes cached,
then there are 10,000 lock objects. But ->inotify_mutex is a single "lock
type", and all locking activities that occur against ->inotify_mutex are
"unified" into this single lock-class. The advantage of the lock-class
approach is that all historical ->inotify_mutex uses are mapped into a single
(and as narrow as possible) set of locking rules - regardless of how many
different tasks or inode structures it took to build this set of rules. The
set of rules persist during the lifetime of the kernel.
To see the rough magnitude of checking that the lock validator does, here's a
portion of /proc/lockdep_stats, fresh after bootup:
lock-classes: 694 [max: 2048]
direct dependencies: 1598 [max: 8192]
indirect dependencies: 17896
all direct dependencies: 16206
dependency chains: 1910 [max: 8192]
in-hardirq chains: 17
in-softirq chains: 105
in-process chains: 1065
stack-trace entries: 38761 [max: 131072]
combined max dependencies: 2033928
hardirq-safe locks: 24
hardirq-unsafe locks: 176
softirq-safe locks: 53
softirq-unsafe locks: 137
irq-safe locks: 59
irq-unsafe locks: 176
The lock validator has observed 1598 actual single-thread locking patterns,
and has validated all possible 2033928 distinct locking scenarios.
More details about the design of the lock validator can be found in
Documentation/lockdep-design.txt, which can also found at:
http://redhat.com/~mingo/lockdep-patches/lockdep-design.txt
[bunk@stusta.de: cleanups]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make the fasteoi handler mark disabled interrupts as pending if they
happen anyway. This allow implementation of a delayed disable scheme
with the fasteoi handler.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The irqflags consolidation converted SA_PERCPU_IRQ to IRQF_PERCPU but
did not define the new constant.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus: "The hacks in kernel/irq/handle.c are really horrid. REALLY
horrid."
They are indeed. Move the dyntick quirks to ARM where they belong.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch from Thomas Gleixner
From: Thomas Gleixner <tglx@linutronix.de>
ARM has a couple of really dumb interrupt controllers.
Implement a generic one and fixup the ARM migration. ARM reused
the no_irq_chip for this purpose, but this does not work out
for platforms which are not converted to the new interrupt
type handling model.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Thomas Gleixner
From: Thomas Gleixner <tglx@linutronix.de>
Make the ARM dyntick implementation work with the generic
irq code. This hopefully goes away once we consolidated the
dyntick implementations.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>