This patch allows memory resources to be assigned with a specified
alignment at boot-time or run-time. The patch is useful when we use PCI
pass-through, because page-aligned memory resources are required to
securely share PCI resources with guest drivers.
If you want to assign the resource at boot time, please set
"pci=resource_alignment=" boot parameter.
This is format of "pci=resource_alignment=" boot parameter:
[<order of align>@][<domain>:]<bus>:<slot>.<func>[; ...]
Specifies alignment and device to reassign
aligned memory resources.
If <order of align> is not specified, PAGE_SIZE is
used as alignment.
PCI-PCI bridge can be specified, if resource
windows need to be expanded.
This is example:
pci=resource_alignment=20@07:00.0;18@0f:00.0;00:1d.7
If you want to assign the resource at run-time, please set
"/sys/bus/pci/resource_alignment" file, and hot-remove the device and
hot-add the device. For this purpose, fakephp or PCI hotplug interfaces
can be used.
The format of "/sys/bus/pci/resource_alignment" file is the same with
boot parameter. You can use "," instead of ";".
For example:
# cd /sys/bus/pci
# echo -n 20@12:00.0 > resource_alignment
# echo 1 > devices/0000:12:00.0/remove
# echo 1 > rescan
Reviewed-by: Alex Chiang <achiang@hp.com>
Reviewed-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Yuji Shimada <shimada-yxb@necst.nec.co.jp>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Add the new API pci_enable_msi_block() to allow drivers to
request multiple MSI and reimplement pci_enable_msi in terms of
pci_enable_msi_block. Ensure that the architecture back ends don't
have to know about multiple MSI.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Since most of the callers already know whether they have an MSI or
an MSI-X capability, split msi_set_mask_bits() into msi_mask_irq()
and msix_mask_irq(). The only callers which don't (mask_msi_irq()
and unmask_msi_irq()) can share code in msi_set_mask_bit(). This then
becomes the only caller of msix_flush_writes(), so we can inline it.
The flushing read can be to any address that belongs to the device,
so we can eliminate the calculation too.
We can also get rid of maskbits_mask from struct msi_desc and simply
recalculate it on the rare occasion that we need it. The single-bit
'masked' element is replaced by a copy of the 32-bit 'masked' register,
so this patch does not affect the size of msi_desc.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
MSI interrupts have a mask_pos where MSI-X have a mask_base. Use a
transparent union to get rid of some ugly casts.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
By passing the pci_dev into alloc_msi_entry() we can be sure that
the ->dev entry is always assigned and so we don't need to check it.
Also, we used kzalloc() so we don't need to initialise ->irq to 0.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
By changing from a 5-bit field to a 1-bit field, we free up some bits
that can be used by a later patch. Also rearrange the fields for better
packing on 64-bit platforms (reducing the size of msi_desc from 72 bytes
to 64 bytes).
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This adds a remove_id sysfs entry to allow users of new_id to later
remove the added dynid. One use case is management tools that want to
dynamically bind/unbind devices to pci-stub driver while devices are
assigned to KVM guests. Rather than having to track which driver was
originally bound to the driver, a mangement tool can simply:
Guest uses device
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Current pci_common_swizzle() seems to have a assumption that
pci_bus->self is NULL on the pci root bus. But it might not be true on
some platforms. Because of this wrong assumption, pci_common_swizzle()
might cause endless loop. We must check pci_bus->parent instead.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Current pci_get_interrupt_pin() seems to have an assumption that
pci_bus->self is NULL on the root pci bus. But it might not be true on
some platforms. Because of this wrong assumption, current
pci_get_interrupt_pin() might cause endless loop. We must check
pci_bus->parent instead.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Current pci_read_bridge_bases() has an assumption that pci_bus->self
is NULL on the pci root bus (It checks pci_bus->self to see if the pci
bus is root bus). But is might not true on some platforms. We must
check pci_bus->parent instead.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Current pci_find_upstream_pcie_bridge() has a wrong assumption that
pci_bus->self is NULL on the root pci bus. But it might not true on
some platforms. Because of this wrong assumption, current
pci_find_upstream_pcie_bridge() might cause endless loop. We must
check pci_bus->parent instead.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Current acpi_get_hp_hw_control_from_firmware() has a assumption that
pci_bus->self is NULL on a PCI root bus. But it might not be true on
some platforms. Because of this wrong assumption, current
acpi_get_hp_hw_control_from_firmware() might cause endless loop. We
must check pci_bus->parent instead.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Current acpi_get_hp_params_from_firmware() has a assumption that
pci_bus->self is NULL on the root pci bus. But it might not true on
some platforms. Because of this wrong assumption, current
acpi_get_hp_params_from_firmware() might cause endless loop. We must
check pci_bus->parent instead.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Implement pm object for the PCI Express port driver in order to use
the new power management framework and reduce the code size.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
pcie_port_device_remove currently calls the remove method of port
drivers twice. Ouch!
We are calling device_for_each_child multiple times for no apparent
reason.
So make it simple. Place put_device and device_unregister into
remove_iter, and throw out the rest. Only call device_for_each_child
once.
The code is simpler and actually works!
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This closes http://bugzilla.kernel.org/show_bug.cgi?id=10893
which is a showstopper for X development on alpha.
The generic HAVE_PCI_MMAP code (drivers/pci-sysfs.c) is not
very useful since we have to deal with three different types
of MMIO address spaces: sparse and dense mappings for old
ev4/ev5 machines and "normal" 1:1 MMIO space (bwx) for ev56 and
later.
Also "write combine" mappings are meaningless on alpha - roughly
speaking, alpha does write combining, IO reordering and other
optimizations by default, unless user splits IO accesses
with memory barriers.
I think the cleanest way to deal with resource files on alpha
is to convert the default no-op pci_create_resource_files() and
pci_remove_resource_files() for !HAVE_PCI_MMAP case into __weak
functions and override them with alpha specific ones.
Another alpha hook is needed for "legacy_" resource files
to handle sparse addressing (pci_adjust_legacy_attr).
With the "standard" resourceN files on ev56/ev6 libpciaccess
works "out of the box". Handling of resourceN_sparse/resourceN_dense
files on older machines obviously requires some userland work.
Sparse/dense stuff has been tested on sx164 (pca56/pyxis, normally
uses bwx IO) with the kernel hacked into "cia compatible" mode.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
drivers/pci/hotplug/fakephp.c: In function 'pci_rescan_bus':
drivers/pci/hotplug/fakephp.c:271: warning: passing argument 1 of 'pci_bus_assign_resources' discards qualifiers from pointer target type
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
There is code in msix_capability_init() which, when the requested number
of MSI-X couldn't be allocated, calculates how many MSI-X /could/ be
allocated and returns that to the driver. That allows the driver to then
make a second request, with a number of MSIs that should succeed.
The current code requires the arch code to setup as many msi_descs as it
can, and then return to the generic code. On some platforms the arch
code may already know how many MSI-X it can allocate, before it sets up
any of the msi_descs.
So change the logic such that if the arch code returns a positive error
code, that is taken to be the number of MSI-X that could be allocated.
If the error code is negative we still calculate the number available
using the old method.
Because it's a little subtle, make sure the error return code from
arch_setup_msi_irq() is always negative. That way only implementations
of arch_setup_msi_irqs() need to be careful about returning a positive
error code.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
With for (busnr = 0; busnr <= end; busnr++) { ... } busnr reaches end + 1
after the loop. So fix the "no busses available" check to look for just
busnr > end rather than >=.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
- Rename pci_osc_control_set() to acpi_pci_osc_control_set() according
to the other API names in drivers/acpi/pci_root.c.
- Move _OSC related definitions to include/linux/acpi.h because _OSC
related API is implemented in drivers/acpi/pci_root.c now.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Reviewed-by: Andrew Patterson <andrew.patterson@hp.com>
Tested-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Move PCI _OSC management code from drivers/pci/pci-acpi.c to
drivers/acpi/pci_root.c. The benefits are
- We no longer need struct osc_data and its management code (contents
are moved to struct acpi_pci_root). This simplify the code, and we
no longer care about kmalloc() failure.
- We can make pci_acpi_osc_support() be a static function, which is
called only from drivers/acpi/pci_root.c.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Reviewed-by: Andrew Patterson <andrew.patterson@hp.com>
Tested-by: Andrew Patterson <andrew.patterson@hp.com>
Acked-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
For all devices need to do function level reset, currently we need wait for
at least 200ms, which can be too long if we have lots of devices...
The patch checked pending bit before msleep() to skip some unnecessary
sleeping interval.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Let it stay as serial, since it doesn't have subdevice in the form of 0x00PS.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Convert usages of pr_debug to dev_dbg and add physical slot name.
Note that we use dev_dbg on the struct pci_bus and still manually
print out the PCI slot number (instead of calling dev_dbg on a
pci_dev) because a struct pci_bus with empty physical slots will
not have any pci_devs.
Reviewed-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The cmd_busy field in struct controller takes only two values 0 or
1. So it should be one bit.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Current pciehp disables software notification of adapter presence
changed event and MRL changed event when slot is turned off. Because
of this, there is no way to detect those events on empty slots in the
current pciehp implementation.
According to the past discussion(*), this behavior was introduced to
prevent endless loop that could happen if pcie_isr() runs after power
fault is detected on a certain platform whose stickey power-fault bit
remains on till the slot is powered on again.
(*) http://sourceforge.net/mailarchive/message.php?msg_id=20051130135409.A14918%40unix-os.sc.intel.com
I think this endless loop can be avoided using one bit flag that
indicates power fault had been detected, instead of disabling software
notification of adapter present changed event and MRL changed event.
With this patch, we can enable software notification mechanism of
presence changed and MRL changed event on the empty slots again.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Fix possible endless loop in pcie_isr.
Currently, pcie_isr() (interrupt service routine of pciehp) can end up in an
endless loop if the Slot Status register is set again immediately after being
cleared. According to the past discussion (see below URL) this case can happen
if the power fault detected bit is set during handling.
http://sourceforge.net/mailarchive/message.php?msg_id=20051130135409.A14918%40unix-os.sc.intel.com
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Error handling code following a kmalloc should free the allocated data.
Since the subsequent code that could provoke an error does not use the
allocated data, the allocation is just moved below it.
The semantic match that finds the problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@r exists@
local idexpression x;
statement S;
expression E;
identifier f,l;
position p1,p2;
expression *ptr != NULL;
@@
(
if ((x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...)) == NULL) S
|
x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...);
...
if (x == NULL) S
)
<... when != x
when != if (...) { <+...x...+> }
x->f = E
...>
(
return \(0\|<+...x...+>\|ptr\);
|
return@p2 ...;
)
@script:python@
p1 << r.p1;
p2 << r.p2;
@@
print "* file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
According to kerneljanitors todo list all printk calls (beginning
a new line) should have an according KERN_* constant.
Those are the missing pieces here for the pci subsystem.
Signed-off-by: Frank Seidel <frank@f-seidel.de>
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Tested-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
When removing a bus, 'is_added' should be checked to make sure the
bus has been successfully added by pci_bus_add_child() who will sets
'is_added'.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Weak functions aren't all they're cracked up to be. They lead to
incorrect binaries with some toolchains, they require us to have empty
functions we otherwise wouldn't, and the unused code is not elided
(as of gcc 4.3.2 anyway).
So replace the weak MSI arch hooks with the #define foo foo idiom. We no
longer need empty versions of arch_setup/teardown_msi_irq().
This is less source (by 1 line!), and results in smaller binaries too:
text data bss dec hex filename
9354300 1693916 678424 11726640 b2ef30 build/powerpc/vmlinux-before
9354052 1693852 678424 11726328 b2edf8 build/powerpc/vmlinux-after
Also smaller on x86_64 and arm (iop13xx).
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
If MSI-X interrupt mode is used by the PCI Express port driver, too
many vectors are allocated and it is not ensured that the right
vectors will be used for the right services. Namely, the PCI Express
specification states that both PCI Express native PME and PCI Express
hotplug will always use the same MSI or MSI-X message for signalling
interrupts, which implies that the same vector will be used by both
of them. Also, the VC service does not use interrupts at all.
Moreover, is not clear which of the vectors allocated by
pci_enable_msix() in the current code will be used for PME and
hotplug and which of them will be used for AER if all of these
services are configured.
For these reasons, rework the allocation of interrupts for PCI
Express ports so that if MSI-X are enabled, the right vectors will be
used for the right purposes.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Introduce new function pci_msix_table_size() returning the size of
the MSI-X table of given PCI device or 0 if the device doesn't
support MSI-X.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The PCI Express port driver uses 'struct pcie_port_service_id' for
matching port service devices and drivers, but this structure
contains fields that duplicate information from the port device
itself (vendor, device, subvendor, subdevice) and fields that are not
used by any existing port service driver (class, class_mask,
drvier_data). Also, both existing port service drivers (AER and
PCIe HP) don't even use the vendor and device fields for device
matching. Therefore 'struct pcie_port_service_id' can be removed
altogether and the only useful members of it (port_type, service) can
be introduced directly into the port service device and port service
driver structures. That simplifies the code quite a bit and reduces
its size.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The second argument of the ->probe() callback in
struct pcie_port_service_driver is unnecessary and never used.
Remove it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The function pcie_portdrv_save_config() in portdrv_pci.c is not
necessary. Remove it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The PCI Express port driver calls pci_enable_device() before setting
up interrupts, which is wrong, because if there is an interrupt pin
configured for the port, pci_enable_device() will likely set up an
interrupt link for it. However, this shouldn't be done if either
MSI or MSI-X interrupt mode is chosen for the port.
The solution is to call pci_enable_device() after setting up
interrupts, because in that case the interrupt link won't be set up
if MSI or MSI-X are enabled.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The PCI Express port driver should not attempt to register service
devices that require the ability to generate interrupts if generating
interrupts is not possible. Namely, if the port has no interrupt pin
configured and we cannot set up MSI or MSI-X for it, there is no way
it can generate interrupts and in such a case the port services that
rely on interrupts (PME, PCIe HP, AER) should not be enabled for it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
PCI Express port driver extension, as defined by struct
pcie_port_device_ext in portdrv.h, is allocated and initialized, but
never used (it also is never freed). Extend it to hold the PCI Express
port type as well as the port interrupt mode, change its name and use it
to simplify the code in portdrv_core.c .
Additionally, remove the redundant interrupt_mode member of struct
pcie_device defined in include/linux/pcieport_if.h .
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The PCIe port driver calls pci_enable_device() during probe but
never calls pci_disable_device() during remove.
Cc: stable@kernel.org
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Prakash's system needs MSI disabled on some bridges, but not all.
This seems to be the minimal fix for 2.6.29, but should be replaced
during 2.6.30.
Signed-off-by: Prakash Punnoor <prakash@punnoor.de>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
The RPA PCI hotplug driver calls EEH routines, so should depend on
EEH. Also PPC_PSERIES implies PPC64, so remove that.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Commit 47a8b0cc (Enable PCIe AER only after checking firmware
support) wants to walk the PCI bus in the remove path to disable
AER, and calls pci_walk_bus for downstream bridges.
Unfortunately, in the remove path, we remove devices and bridges
in a depth-first manner, starting with the furthest downstream
bridge and working our way backwards.
The furthest downstream bridges will not have a dev->subordinate,
and we hit a NULL deref in pci_walk_bus.
Check for dev->subordinate first before attempting to walk the
PCI hierarchy below us.
Acked-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
This patch is intended to disable L0s ASPM link state for 82598 (ixgbe)
parts due to the fact that it is possible to corrupt TX data when coming
back out of L0s on some systems. The workaround had been added for 82575
(igb) previously, but did not use the ASPM api. This quirk uses the ASPM
api to prevent the ASPM subsystem from re-enabling the L0s state.
Instead of adding the fix in igb to the ixgbe driver as well it was
decided to move it into a pci quirk. It is necessary to move the fix out
of the driver and into a pci quirk in order to prevent the issue from
occuring prior to driver load to handle the possibility of the device being
passed to a VM via direct assignment.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI: AMD 813x B2 devices do not need boot interrupt quirk
PCI: Enable PCIe AER only after checking firmware support
PCI: pciehp: Handle interrupts that happen during initialization.
PCI: don't enable too many HT MSI mappings
PCI: add some sysfs ABI docs
PCI quirk: enable MSI on 8132