Commit graph

137 commits

Author SHA1 Message Date
Jan Glauber
9a4da8a5b1 s390/pci: PCI adapter interrupts for MSI/MSI-X
Support PCI adapter interrupts using the Single-IRQ-mode. Single-IRQ-mode
disables an adapter IRQ automatically after delivering it until the SIC
instruction enables it again. This is used to reduce the number of IRQs
for streaming workloads.

Up to 64 MSI handlers can be registered per PCI function.
A hash table is used to map interrupt numbers to MSI descriptors.
The interrupt vector is scanned using the flogr instruction.
Only MSI/MSI-X interrupts are supported, no legacy INTs.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-11-30 17:47:21 +01:00
Konrad Rzeszutek Wilk
76ccc29701 x86/PCI: Expand the x86_msi_ops to have a restore MSIs.
The MSI restore function will become a function pointer in an
x86_msi_ops struct. It defaults to the implementation in the
io_apic.c and msi.c. We piggyback on the indirection mechanism
introduced by "x86: Introduce x86_msi_ops".

Cc: x86@kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2012-01-06 14:02:26 -08:00
Neil Horman
424eb39159 PCI: msi: fix imbalanced refcount of msi irq sysfs objects
This warning was recently reported to me:

------------[ cut here ]------------
WARNING: at lib/kobject.c:595 kobject_put+0x50/0x60()
Hardware name: VMware Virtual Platform
kobject: '(null)' (ffff880027b0df40): is not initialized, yet kobject_put() is
being called.
Modules linked in: vmxnet3(+) vmw_balloon i2c_piix4 i2c_core shpchp raid10
vmw_pvscsi
Pid: 630, comm: modprobe Tainted: G        W   3.1.6-1.fc16.x86_64 #1
Call Trace:
 [<ffffffff8106b73f>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff8106b836>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff810da293>] ? free_desc+0x63/0x70
 [<ffffffff812a9aa0>] kobject_put+0x50/0x60
 [<ffffffff812e4c25>] free_msi_irqs+0xd5/0x120
 [<ffffffff812e524c>] pci_enable_msi_block+0x24c/0x2c0
 [<ffffffffa017c273>] vmxnet3_alloc_intr_resources+0x173/0x240 [vmxnet3]
 [<ffffffffa0182e94>] vmxnet3_probe_device+0x615/0x834 [vmxnet3]
 [<ffffffff812d141c>] local_pci_probe+0x5c/0xd0
 [<ffffffff812d2cb9>] pci_device_probe+0x109/0x130
 [<ffffffff8138ba2c>] driver_probe_device+0x9c/0x2b0
 [<ffffffff8138bceb>] __driver_attach+0xab/0xb0
 [<ffffffff8138bc40>] ? driver_probe_device+0x2b0/0x2b0
 [<ffffffff8138bc40>] ? driver_probe_device+0x2b0/0x2b0
 [<ffffffff8138a8ac>] bus_for_each_dev+0x5c/0x90
 [<ffffffff8138b63e>] driver_attach+0x1e/0x20
 [<ffffffff8138b240>] bus_add_driver+0x1b0/0x2a0
 [<ffffffffa0188000>] ? 0xffffffffa0187fff
 [<ffffffff8138c246>] driver_register+0x76/0x140
 [<ffffffff815ca414>] ? printk+0x51/0x53
 [<ffffffffa0188000>] ? 0xffffffffa0187fff
 [<ffffffff812d2996>] __pci_register_driver+0x56/0xd0
 [<ffffffffa018803a>] vmxnet3_init_module+0x3a/0x3c [vmxnet3]
 [<ffffffff81002042>] do_one_initcall+0x42/0x180
 [<ffffffff810aad71>] sys_init_module+0x91/0x200
 [<ffffffff815dccc2>] system_call_fastpath+0x16/0x1b
---[ end trace 44593438a59a9558 ]---
Using INTx interrupt, #Rx queues: 1.

It occurs when populate_msi_sysfs fails, which in turn causes free_msi_irqs to
be called.  Because populate_msi_sysfs fails, we never registered any of the
msi irq sysfs objects, but free_msi_irqs still calls kobject_del and kobject_put
on each of them, which gets flagged in the above stack trace.

The fix is pretty straightforward.  We can key of the parent pointer in the
kobject.  It is only set if the kobject_init_and_add succededs in
populate_msi_sysfs.  If anything fails there, each kobject has its parent reset
to NULL

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Bjorn Helgaas <bhelgaas@google.com>
CC: Greg Kroah-Hartman <gregkh@suse.de>
CC: linux-pci@vger.kernel.org
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2012-01-06 12:11:17 -08:00
Eric W. Biederman
d5dea7d95c PCI: msi: Disable msi interrupts when we initialize a pci device
I traced a nasty kexec on panic boot failure to the fact that we had
screaming msi interrupts and we were not disabling the msi messages at
kernel startup.  The booting kernel had not enabled those interupts so
was not prepared to handle them.

I can see no reason why we would ever want to leave the msi interrupts
enabled at boot if something else has enabled those interrupts.  The pci
spec specifies that msi interrupts should be off by default.  Drivers
are expected to enable the msi interrupts if they want to use them.  Our
interrupt handling code reprograms the interrupt handlers at boot and
will not be be able to do anything useful with an unexpected interrupt.

This patch applies cleanly all of the way back to 2.6.32 where I noticed
the problem.

Cc: stable@kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2012-01-06 12:10:29 -08:00
Neil Horman
da8d1c8ba4 PCI/sysfs: add per pci device msi[x] irq listing (v5)
This patch adds a per-pci-device subdirectory in sysfs called:
/sys/bus/pci/devices/<device>/msi_irqs

This sub-directory exports the set of msi vectors allocated by a given
pci device, by creating a numbered sub-directory for each vector beneath
msi_irqs.  For each vector various attributes can be exported.
Currently the only attribute is called mode, which tracks the
operational mode of that vector (msi vs. msix)

Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2012-01-06 12:10:25 -08:00
Paul Gortmaker
363c75db1d pci: Fix files needing export.h for EXPORT_SYMBOL/THIS_MODULE
They were implicitly getting it from device.h --> module.h but
we want to clean that up.  So add the minimal header for these
macros.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:22 -04:00
Thomas Gleixner
dced35aeb0 drivers: Final irq namespace conversion
Scripted with coccinelle.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-29 14:48:19 +02:00
Sheng Yang
8d80528696 PCI: Add mask bit definition for MSI-X table
Then we can use it instead of magic number 1.

Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-12-23 12:53:08 -08:00
Thomas Gleixner
1525bf0d8f msi: Introduce default_[teardown|setup]_msi_irqs with fallback.
Introduce an override for the arch_[teardown|setup]_msi_irqs
that can be utilized to fallback to the default arch_* code.

If a platform wants to utilize the code paths defined
in driver/pci/msi.c it has to define HAVE_DEFAULT_MSI_TEARDOWN_IRQS
or HAVE_DEFAULT_MSI_SETUP_IRQS. Otherwise the old mechanism
of over-ridding the arch_* works fine.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: x86@kernel.org
2010-10-18 10:49:33 -04:00
Thomas Gleixner
39431acb1a pci: Cleanup the irq_desc mess in msi
Handing down irq_desc to msi just so that msi can access
irq_desc.irq_data.msi_desc is a pretty stupid idea. The calling code
can hand down a pointer to msi_desc so msi code does not need to know
about the irq descriptor at all.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-12 16:53:34 +02:00
Thomas Gleixner
1c9db52534 pci: Convert msi to new irq_chip functions
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Russell King <linux@arm.linux.org.uk>
2010-10-12 16:53:34 +02:00
Ben Hutchings
30da552428 PCI: MSI: Restore read_msi_msg_desc(); add get_cached_msi_msg_desc()
commit 2ca1af9aa3285c6a5f103ed31ad09f7399fc65d7 "PCI: MSI: Remove
unsafe and unnecessary hardware access" changed read_msi_msg_desc() to
return the last MSI message written instead of reading it from the
device, since it may be called while the device is in a reduced
power state.

However, the pSeries platform code really does need to read messages
from the device, since they are initially written by firmware.
Therefore:
- Restore the previous behaviour of read_msi_msg_desc()
- Add new functions get_cached_msi_msg{,_desc}() which return the
  last MSI message written
- Use the new functions where appropriate

Acked-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-07-30 09:41:39 -07:00
Ben Hutchings
fcd097f31a PCI: MSI: Remove unsafe and unnecessary hardware access
During suspend on an SMP system, {read,write}_msi_msg_desc() may be
called to mask and unmask interrupts on a device that is already in a
reduced power state.  At this point memory-mapped registers including
MSI-X tables are not accessible, and config space may not be fully
functional either.

While a device is in a reduced power state its interrupts are
effectively masked and its MSI(-X) state will be restored when it is
brought back to D0.  Therefore these functions can simply read and
write msi_desc::msg for devices not in D0.

Further, read_msi_msg_desc() should only ever be used to update a
previously written message, so it can always read msi_desc::msg
and never needs to touch the hardware.

Tested-by: "Michael Chan" <mchan@broadcom.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-07-30 09:29:34 -07:00
Kenji Kaneshige
4302e0fb7f PCI: fix wrong memory address handling in MSI-X
Use resource_size_t for MMIO address instead of unsigned long. Otherwise,
higher 32-bits of MMIO address are cleared unexpectedly in x86-32 PAE.

Acked-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-07-30 09:29:14 -07:00
Tejun Heo
5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Hidetoshi Seto
500559a92d PCI MSI: Style cleanups
Cleanups (nearly based on checkpatch).

Before: total: 11 errors, 2 warnings, 0 checks, 842 lines checked
After:  total:  0 errors, 0 warnings, 0 checks, 842 lines checked

v2: fix it's/its mistakes in comment

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:29:35 -07:00
Hidetoshi Seto
d9d7070e61 PCI MSI: MSI-X cleanup, msix_setup_entries()
Cleanup based on the prototype from Matthew Milcox.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:29:34 -07:00
Hidetoshi Seto
75cb342687 PCI MSI: MSI-X cleanup, msix_program_entries()
Cleanup based on the prototype from Matthew Milcox.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:29:33 -07:00
Hidetoshi Seto
5a05a9d819 PCI MSI: MSI-X cleanup, msix_map_region()
Cleanup based on the prototype from Matthew Milcox.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:29:33 -07:00
Hidetoshi Seto
583871d436 PCI MSI: Relocate error path in init_msix_capability()
Move it from the middle of the function to the end.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:29:32 -07:00
Hidetoshi Seto
f56e448132 PCI MSI: Unify msi_free_irqs() and msix_free_all_irqs()
Unify msi_free_irqs() and msix_free_all_irqs(), and rename it to a
common void function free_msi_irqs().

And relocate the common function to where the prototype is located now.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:29:31 -07:00
Hidetoshi Seto
9cc8d54815 PCI MSI: Use list_first_entry()
use list_first_entry() instead of list_entry().

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:29:30 -07:00
Hidetoshi Seto
c901851fdd PCI MSI: Remove attribute check from pci_disable_msi()
The msi_list never have MSI-X's msi_desc while MSI is enabled,
and also it never have MSI's msi_desc while MSI-X is enabled.

This patch remove check for MSI-X entry from the pci_disable_msi(),
referring that pci_disable_msix() does not have any check for MSI
entry.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:29:29 -07:00
Hidetoshi Seto
12abb8ba84 PCI MSI: Fix restoration of MSI/MSI-X mask states in suspend/resume
There are 2 problems on mask states in suspend/resume.

[1]:
It is better to restore the mask states of MSI/MSI-X to initial states
(MSI is unmasked, MSI-X is masked) when we release the device.
The pci_msi_shutdown() does the restoration of mask states for MSI,
while the msi_free_irqs() does it for MSI-X.  In other words, in the
"disable" path both of MSI and MSI-X are handled, but in the "shutdown"
path only MSI is handled.

MSI:
   pci_disable_msi()
      => pci_msi_shutdown()
         [ mask states for MSI restored ]
         => msi_set_enable(dev, pos, 0);
      => msi_free_irqs()

MSI-X:
   pci_disable_msix()
      => pci_msix_shutdown()
         => msix_set_enable(dev, 0);
      => msix_free_all_irqs
         => msi_free_irqs()
            [ mask states for MSI-X restored ]

This patch moves the masking for MSI-X from msi_free_irqs() to
pci_msix_shutdown().

This change has some positive side effects:
 - It prevents OS from touching mask states before reading preserved
   bits in the register, which can be happen if msi_free_irqs() is
   called from error path in msix_capability_init().
 - It also prevents touching the register after turning off MSI-X in
   "disable" path, which can be a problem on some devices.

[2]:
We have cache of the mask state in msi_desc, which is automatically
updated when msi/msix_mask_irq() is called.  This cached states are
used for the resume.

But since what need to be restored in the resume is the states before
the shutdown on the suspend, calling msi/msix_mask_irq() from
pci_msi/msix_shutdown() is not appropriate.

This patch introduces __msi/msix_mask_irq() that do mask as same
as msi/msix_mask_irq() but does not update cached state, for use
in pci_msi/msix_shutdown().

[updated: get rid of msi/msix_mask_irq_nocache() (proposed by Matthew Wilcox)]

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-06-29 12:18:13 -07:00
Hidetoshi Seto
7ba1930db0 PCI MSI: Unmask MSI if setup failed
The initial state of mask register of MSI is unmasked.  We set it
masked before calling arch_setup_msi_irqs().  If arch_setup_msi_irq()
fails, it is better to restore the state of the mask register.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-06-29 12:16:19 -07:00
Hidetoshi Seto
2c21fd4b33 PCI MSI: shorten PCI_MSIX_ENTRY_* symbol names
These names are too long!  Drop _OFFSET to save some bytes/lines.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-06-29 12:15:19 -07:00
Hidetoshi Seto
0d07348931 PCI MSI: Return if alloc_msi_entry for MSI-X failed
In current code it continues setup even if alloc_msi_entry() for MSI-X
is failed due to lack of memory.  It means arch_setup_msi_irqs() might
be called with msi_desc entries less than its argument nvec.

At least x86's arch_setup_msi_irqs() uses list_for_each_entry() for
dev->msi_list that suspected to have entries same numbers as nvec, and
it doesn't check the number of allocated vectors and passed arg nvec.
Therefore it will result in success of pci_enable_msix(), with less
vectors allocated than requested.

This patch fixes the error route to return -ENOMEM, instead of continuing
the setup (proposed by Matthew Wilcox).

Note that there is no iounmap in msi_free_irqs() if no msi_disc is
allocated.

Reviewed-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-06-29 12:10:10 -07:00
Hidetoshi Seto
2af5066f66 PCI: make msi_free_irqs() to use msix_mask_irq() instead of open coded write
Use msix_mask_irq() instead of direct use of writel, so as not to clear
preserved bits in the Vector Control register [31:1].

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-06-19 15:11:45 -07:00
Matthew Wilcox
f598282f51 PCI: Fix the NIU MSI-X problem in a better way
The previous MSI-X fix (8d18101853) had
three bugs.  First, it didn't move the write that disabled the vector.
This led to writing garbage to the MSI-X vector (spotted by Michael
Ellerman).  It didn't fix the PCI resume case, and it had a race window
where the device could generate an interrupt before the MSI-X registers
were programmed (leading to a DMA to random addresses).

Fortunately, the MSI-X capability has a bit to mask all the vectors.
By setting this bit instead of clearing the enable bit, we can ensure
the device will not generate spurious interrupts.  Since the capability
is now enabled, the NIU device will not have a problem with the reads
and writes to the MSI-X registers being in the original order in the code.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-06-19 15:11:39 -07:00
Matthew Wilcox
110828c9cd PCI: remove redundant __msi_set_enable()
We have the 'pos' of the MSI capability at all locations which call
msi_set_enable(), so pass it to msi_set_enable() instead of making it
find the capability every time.

Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-06-18 13:57:24 -07:00
Kenji Kaneshige
ab7de999a2 PCI: remove invalid comment of msi_mask_irq()
Remove invalid comment of msi_mask_irq().

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-06-16 14:30:18 -07:00
Michael S. Tsirkin
57fbf52c86 PCI MSI: let drivers retry when not enough vectors
pci_enable_msix currently returns -EINVAL if you ask
for more vectors than supported by the device, which would
typically cause fallback to regular interrupts.

It's better to return the table size, making the driver retry
MSI-X with less vectors.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-06-11 12:04:18 -07:00
Hidetoshi Seto
67b5db6502 PCI MSI: Define PCI_MSI_MASK_32/64
Impact: cleanup, improve readability

Define PCI_MSI_MASK_32/64 for 32/64bit devices, instead of using
implicit offset (-4), "PCI_MSI_MASK_BIT - 4" and "PCI_MSI_MASK_BIT".

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-06-11 12:04:06 -07:00
Matthew Wilcox
8d18101853 PCI MSI: Fix MSI-X with NIU cards
The NIU device refuses to allow accesses to MSI-X registers before MSI-X
is enabled.  This patch fixes the problem by moving the read of the mask
register to after MSI-X is enabled.

Reported-by: David S. Miller <davem@davemloft.net>
Tested-by: David S. Miller <davem@davemloft.net>
Reviewed-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-05-11 17:02:27 -07:00
Matthew Wilcox
1c8d7b0a56 PCI MSI: Add support for multiple MSI
Add the new API pci_enable_msi_block() to allow drivers to
request multiple MSI and reimplement pci_enable_msi in terms of
pci_enable_msi_block.  Ensure that the architecture back ends don't
have to know about multiple MSI.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-03-20 10:48:14 -07:00
Matthew Wilcox
f2440d9acb PCI MSI: Refactor interrupt masking code
Since most of the callers already know whether they have an MSI or
an MSI-X capability, split msi_set_mask_bits() into msi_mask_irq()
and msix_mask_irq().  The only callers which don't (mask_msi_irq()
and unmask_msi_irq()) can share code in msi_set_mask_bit().  This then
becomes the only caller of msix_flush_writes(), so we can inline it.
The flushing read can be to any address that belongs to the device,
so we can eliminate the calculation too.

We can also get rid of maskbits_mask from struct msi_desc and simply
recalculate it on the rare occasion that we need it.  The single-bit
'masked' element is replaced by a copy of the 32-bit 'masked' register,
so this patch does not affect the size of msi_desc.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-03-20 10:48:13 -07:00
Matthew Wilcox
264d9caaa1 PCI MSI: Use mask_pos instead of mask_base when appropriate
MSI interrupts have a mask_pos where MSI-X have a mask_base.  Use a
transparent union to get rid of some ugly casts.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-03-20 10:48:13 -07:00
Matthew Wilcox
379f5327a8 PCI MSI: msi_desc->dev is always initialised
By passing the pci_dev into alloc_msi_entry() we can be sure that
the ->dev entry is always assigned and so we don't need to check it.
Also, we used kzalloc() so we don't need to initialise ->irq to 0.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-03-20 10:48:12 -07:00
Matthew Wilcox
24d2755339 PCI MSI: Replace 'type' with 'is_msix'
By changing from a 5-bit field to a 1-bit field, we free up some bits
that can be used by a later patch.  Also rearrange the fields for better
packing on 64-bit platforms (reducing the size of msi_desc from 72 bytes
to 64 bytes).

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-03-20 10:48:12 -07:00
Michael Ellerman
b5fbf53324 PCI/MSI: Allow arch code to return the number of MSI-X available
There is code in msix_capability_init() which, when the requested number
of MSI-X couldn't be allocated, calculates how many MSI-X /could/ be
allocated and returns that to the driver. That allows the driver to then
make a second request, with a number of MSIs that should succeed.

The current code requires the arch code to setup as many msi_descs as it
can, and then return to the generic code. On some platforms the arch
code may already know how many MSI-X it can allocate, before it sets up
any of the msi_descs.

So change the logic such that if the arch code returns a positive error
code, that is taken to be the number of MSI-X that could be allocated.
If the error code is negative we still calculate the number available
using the old method.

Because it's a little subtle, make sure the error return code from
arch_setup_msi_irq() is always negative. That way only implementations
of arch_setup_msi_irqs() need to be careful about returning a positive
error code.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-03-19 19:29:34 -07:00
Michael Ellerman
11df1f0551 PCI/MSI: Use #ifdefs instead of weak functions
Weak functions aren't all they're cracked up to be. They lead to
incorrect binaries with some toolchains, they require us to have empty
functions we otherwise wouldn't, and the unused code is not elided
(as of gcc 4.3.2 anyway).

So replace the weak MSI arch hooks with the #define foo foo idiom. We no
longer need empty versions of arch_setup/teardown_msi_irq().

This is less source (by 1 line!), and results in smaller binaries too:

   text	   data	    bss	    dec	    hex	filename
9354300	1693916	 678424	11726640 b2ef30	build/powerpc/vmlinux-before
9354052	1693852	 678424	11726328 b2edf8	build/powerpc/vmlinux-after

Also smaller on x86_64 and arm (iop13xx).

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-03-19 19:29:26 -07:00
Rafael J. Wysocki
a52e2e3513 PCI/MSI: Introduce pci_msix_table_size()
Introduce new function pci_msix_table_size() returning the size of
the MSI-X table of given PCI device or 0 if the device doesn't
support MSI-X.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-03-19 19:29:25 -07:00
Matthew Wilcox
0b49ec37a2 PCI/MSI: fix msi_mask() shift fix
Hidetoshi Seto points out that commit
bffac3c593 has wrong values in the array.
Rather than correct the array, we can just use a bounds check and
perform the calculation specified in the comment.  As a bonus, this will
not run off the end of the array if the device specifies an illegal
value in the MSI capability.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-02-13 11:59:03 -08:00
Matthew Wilcox
bffac3c593 PCI MSI: Fix undefined shift by 32
Add an msi_mask() function which returns the correct bitmask for the
number of MSI interrupts you have.  This fixes an undefined bug in
msi_capability_init().

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-27 09:53:25 -08:00
Hidetoshi Seto
0db29af1e7 PCI/MSI: bugfix/utilize for msi_capability_init()
This patch fix a following bug and does a cleanup.

bug:
	commit 5993760f7f
	had a wrong change (since is_64 is boolean[0|1]):

-               pci_write_config_dword(dev,
-                       msi_mask_bits_reg(pos, is_64bit_address(control)),
-                       maskbits);
+               pci_write_config_dword(dev, entry->msi_attrib.is_64, maskbits);

utilize:
	Unify separated if (entry->msi_attrib.maskbit) statements.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: "Jike Song" <albcamus@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-16 12:35:25 -08:00
Andrew Patterson
07ae95f988 ACPI/PCI: PCI MSI _OSC support capabilities called when root bridge added
The _OSC capability OSC_MSI_SUPPORT is set when the root bridge is added
with pci_acpi_osc_support(), so we no longer need to do it in the PCI
MSI driver.  Also adds the function pci_msi_enabled, which returns true
if pci=nomsi is not on the kernel command-line.

Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:31 -08:00
Yinghai Lu
3145e941fc x86, MSI: pass irq_cfg and irq_desc
Impact: simplify code

Pass irq_desc and cfg around, instead of raw IRQ numbers - this way
we dont have to look it up again and again.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 14:31:59 +01:00
Taku Izumi
d389fec6a2 ACPI/PCI: Set support bit for MSI in support field of _OSC
Currently linux doesn't have any code to set the "MSI supported" bit in
Support Fireld of _OSC. This patch adds the code for that.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-10-22 16:42:35 -07:00
Jike Song
5993760f7f PCI: utilize calculated results when detecting MSI features
In msi_capability_init, we can make use of the calculated results
instead of calling is_mask_bit_support and is_64bit_address twice.

Signed-off-by: Jike Song <albcamus@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-10-20 10:53:50 -07:00
Jesse Barnes
abad2ec98f PCI: fully restore MSI state at resume time
With the recent change to avoid masking MSIs using the MSI enable bit, devices
without an MSI mask bit will have their MSI capability always enabled when MSI
is in use, so we need to restore it regardless of the mask bit state.

Fixes kernel bz 11178.

Acked-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-08-07 08:52:37 -07:00