* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
m68k: rename global variable vmalloc_end to m68k_vmalloc_end
percpu: add missing per_cpu_ptr_to_phys() definition for UP
percpu: Fix kdump failure if booted with percpu_alloc=page
percpu: make misc percpu symbols unique
percpu: make percpu symbols in ia64 unique
percpu: make percpu symbols in powerpc unique
percpu: make percpu symbols in x86 unique
percpu: make percpu symbols in xen unique
percpu: make percpu symbols in cpufreq unique
percpu: make percpu symbols in oprofile unique
percpu: make percpu symbols in tracer unique
percpu: make percpu symbols under kernel/ and mm/ unique
percpu: remove some sparse warnings
percpu: make alloc_percpu() handle array types
vmalloc: fix use of non-existent percpu variable in put_cpu_var()
this_cpu: Use this_cpu_xx in trace_functions_graph.c
this_cpu: Use this_cpu_xx for ftrace
this_cpu: Use this_cpu_xx in nmi handling
this_cpu: Use this_cpu operations in RCU
this_cpu: Use this_cpu ops for VM statistics
...
Fix up trivial (famous last words) global per-cpu naming conflicts in
arch/x86/kvm/svm.c
mm/slab.c
This patch adds new version of the PPC440SPe ADMA driver.
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The size of the requested and ioremaped memory is off by 1.
Use resource_size() to get the correct value.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Transfers and the test buffer have to be at least align bytes long.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The SH DMA driver by default uses 32-byte transfers, in this mode buffers and
sizes have to be 32-byte aligned. Specifying this requirement also fixes Oopses
with dmatest.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
1/ Error handling code following a kzalloc should free the allocated data.
2/ Report an error when no platform data is detected
Both problems fixed by moving the platform data check before the allocation,
and allows a goto to be killed.
Reported-by: Julia Lawall <julia@diku.dk>
Acked-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch adds support for the ST-Ericsson COH 901 318 DMA block,
found in the U300 series platforms. It registers a DMA slave for
device I/O and also a memcpy slave for memcpy.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The completion of a pq operation is notified with a null descriptor
appended to the end of the chain. This descriptor needs to be visible
to dma clients otherwise the client is precluded from ensuring all
operations are quiesced before freeing channel resources, i.e. due to
descriptor polling it may get the completion notification ahead of the
interrupt delivered by the null descriptor.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
ioat3.2 does not support asynchronous error notifications which makes
the driver experience latencies when non-zero pq validate results are
expected. Provide a mechanism for turning off async_xor_val and
async_syndrome_val via Kconfig. This approach is generally useful for
any driver that specifies ASYNC_TX_DISABLE_CHANNEL_SWITCH and would like
to force the async_tx api to fall back to the synchronous path for
certain operations.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Error interrupts and error completions may cause channel hangs, so
poll the channel status register after a timeout.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
RAID operations cause a system hang on platforms with DCA
(Direct-Cache-Access) enabled. So turn off RAID capabilities in this
case.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add at91sam9g45 dependency to drivers/dma/Kconfig
Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
There are cases where we can use this_cpu_ptr and as the result
of using this_cpu_ptr() we no longer need to determine the
currently executing cpu.
In those places no get/put_cpu combination is needed anymore.
The local cpu variable can be eliminated.
Preemption still needs to be disabled and enabled since the
modifications of the per cpu variables is not atomic. There may
be multiple per cpu variables modified and those must all
be from the same processor.
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
cc: Eric Biederman <ebiederm@aristanetworks.com>
cc: Stephen Hemminger <shemminger@vyatta.com>
cc: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
drivers/dma/ioat/dma_v3.c: In function 'ioat3_prep_memset_lock':
drivers/dma/ioat/dma_v3.c:439: warning: 'fill' may be used uninitialized in this function
drivers/dma/ioat/dma_v3.c:437: warning: 'desc' may be used uninitialized in this function
drivers/dma/ioat/dma_v3.c: In function '__ioat3_prep_xor_lock':
drivers/dma/ioat/dma_v3.c:489: warning: 'xor' may be used uninitialized in this function
drivers/dma/ioat/dma_v3.c:486: warning: 'desc' may be used uninitialized in this function
drivers/dma/ioat/dma_v3.c: In function '__ioat3_prep_pq_lock':
drivers/dma/ioat/dma_v3.c:631: warning: 'pq' may be used uninitialized in this function
drivers/dma/ioat/dma_v3.c:628: warning: 'desc' may be used uninitialized in this function
gcc-4.0, unlike gcc-4.3, does not see that these variables are
initialized before use. Convert the descriptor loops to do-while make
this initialization apparent.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
drivers/dma/ioat/dma_v2.c: In function 'ioat2_dma_prep_memcpy_lock':
drivers/dma/ioat/dma_v2.c:680: warning: 'hw' may be used uninitialized in this function
drivers/dma/ioat/dma_v2.c:681: warning: 'desc' may be used uninitialized in this function
Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
With the addition of ioat_max_alloc_order it is not clear what the
maximum allocation order is, so document that in the modinfo. Also take
an opportunity to kill a stray semicolon.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch reworks platform driver power management code
for at_hdmac from legacy late/early callbacks to dev_pm_ops.
The callbacks are converted for CONFIG_SUSPEND like this:
suspend_late() -> suspend_noirq()
resume_early() -> resume_noirq()
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
A new ring implementation and the addition of raid functionality
constitutes a bump in the driver major version number.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch enables DCA support on multiple-IOH/multiple-IIO architectures.
It modifies dca module by replacing single dca_providers list
with dca_domains list, each domain containing separate list of providers.
This approach lets dca driver manage multiple domains, i.e. sets of providers
and requesters mapped back to the same PCI root complex device.
The driver takes care to register each requester to a provider
from the same domain.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
This restriction prevented ASYNC_TX_DMA from being enabled on platform
configurations where DMA address conversion could not be performed in
place on the stack. Since commit 04ce9ab3 ("async_xor: permit callers
to pass in a 'dma/page scribble' region") the async_tx api now either
uses a caller provided 'scribble' buffer, or performs the conversion in
place when sizeof(dma_addr_t) <= sizeof(struct page *).
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This supported all DMA channels, and it was tested in SH7722,
SH7780, SH7785 and SH7763.
This can not use with SH DMA API.
Signed-off-by: Nobuhiro Iwamatsu <iwamatsu.nobuhiro@renesas.com>
Reviewed-by: Matt Fleming <matt@console-pimps.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams wrote:
... DMA-slave clients request specific channels and know the hardware
details at a low level, so it should not be too high an expectation to
push dma mapping responsibility to the client.
Also this patch includes DMA_COMPL_{SRC,DEST}_UNMAP_SINGLE support for
dw_dmac driver.
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Use the DMA_SLAVE capability of the DMAEngine API to copy/from a
scatterlist into an arbitrary list of hardware address/length pairs.
This allows a single DMA transaction to copy data from several different
devices into a scatterlist at the same time.
This also adds support to enable some controller-specific features such as
external start and external pause for a DMA transaction.
[dan.j.williams@intel.com: rebased on tx_list movement]
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Acked-by: Li Yang <leoli@freescale.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When using the Freescale DMA controller in external control mode, both the
request count and external pause bits need to be setup correctly. This was
being done with the same function.
The 83xx controller lacks the external pause feature, but has a similar
feature called external start. This feature requires that the request count
bits be setup correctly.
Split the function into two parts, to make it possible to use the external
start feature on the 83xx controller.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
All the necessary fields for handling an ioat2,3 ring entry can fit into
one cacheline. Move ->len prior to ->txd in struct ioat_ring_ent, and
move allocation of these entries to a hw-cache-aligned kmem cache to
reduce the number of cachelines dirtied for descriptor management.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The tx_list attribute of struct dma_async_tx_descriptor is common to
most, but not all dma driver implementations. None of the upper level
code (dmaengine/async_tx) uses it, so allow drivers to implement it
locally if they need it. This saves sizeof(struct list_head) bytes for
drivers that do not manage descriptors with a linked list (e.g.: ioatdma
v2,3).
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Drop txx9dmac's use of tx_list from struct dma_async_tx_descriptor in
preparation for removal of this field.
Cc: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Drop at_hdmac's use of tx_list from struct dma_async_tx_descriptor in
preparation for removal of this field.
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Drop mv_xor's use of tx_list from struct dma_async_tx_descriptor in
preparation for removal of this field.
Cc: Saeed Bishara <saeed@marvell.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Drop ioatdma's use of tx_list from struct dma_async_tx_descriptor in
preparation for removal of this field.
Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Drop iop-adma's use of tx_list from struct dma_async_tx_descriptor in
preparation for removal of this field.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Drop fsldma's use of tx_list from struct dma_async_tx_descriptor in
preparation for removal of this field.
Cc: Li Yang <leoli@freescale.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Drop dw_dmac's use of tx_list from struct dma_async_tx_descriptor in
preparation for removal of this field.
Cc: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Trivial cleanup to make the PCI ID table easier to read.
[dan.j.williams@intel.com: extended to v3.2 devices]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The ioatdma module is missing aliases for the PCI devices it supports,
so it is not autoloaded on boot. Add a MODULE_DEVICE_TABLE() to get
these aliases.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The cleanup routine for the raid cases imposes extra checks for handling
raid descriptors and extended descriptors. If the channel does not
support raid it can avoid this extra overhead by using the ioat2 cleanup
path.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Jasper Forest introduces raid offload support via ioat3.2 support. When
raid offload is enabled two (out of 8 channels) will report raid5/raid6
offload capabilities. The remaining channels will only report ioat3.0
capabilities (memcpy).
Signed-off-by: Tom Picard <tom.s.picard@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The async_tx api uses the DMA_INTERRUPT operation type to terminate a
chain of issued operations with a callback routine.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If a platform advertises pq capabilities, but not xor, then use
ioat3_prep_pqxor and ioat3_prep_pqxor_val to simulate xor support.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
ioat3.2 adds support for raid6 syndrome generation (xor sum of galois
field multiplication products) using up to 8 sources. It can also
perform an pq-zero-sum operation to validate whether the syndrome for a
given set of sources matches a previously computed syndrome.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This adds a hardware specific self test to be called from ioat_probe.
In the ioat3 case we will have tests for all the different raid
operations, while ioat1 and ioat2 will continue to just test memcpy.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
ioat3.2 adds xor offload support for up to 8 sources. It can also
perform an xor-zero-sum operation to validate whether all given sources
sum to zero, without writing to a destination. Xor descriptors differ
from memcpy in that one operation may require multiple descriptors
depending on the number of sources. When the number of sources exceeds
5 an extended descriptor is needed. These descriptors need to be
accounted for when updating the DMA_COUNT register.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Tag completion writes for direct cache access to reduce the latency of
checking for descriptor completions.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Export driver attributes for diagnostic purposes:
'ring_size': total number of descriptors available to the engine
'ring_active': number of descriptors in-flight
'capabilities': supported operation types for this channel
'version': Intel(R) QuickData specfication revision
This also allows some chattiness to be removed from the driver startup
as this information is now available via sysfs.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Up until this point the driver for Intel(R) QuickData Technology
engines, specification versions 2 and 3, were mostly identical save for
a few quirks. Version 3.2 hardware adds many new capabilities (like
raid offload support) requiring some infrastructure that is not relevant
for v2. For better code organization of the new funcionality move v3
and v3.2 support to its own file dma_v3.c, and export some routines from
the base files (dma.c and dma_v2.c) that can be reused directly.
The first new capability included in this code reorganization is support
for v3.2 memset operations.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
ioat3.2 adds raid5 and raid6 offload capabilities.
Signed-off-by: Tom Picard <tom.s.picard@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In preparation for adding more operation types to the ioat3 path the
driver needs to honor the DMA_PREP_FENCE flag. For example the async_tx api
will hand xor->memcpy->xor chains to the driver with the 'fence' flag set on
the first xor and the memcpy operation. This flag in turn sets the 'fence'
flag in the descriptor control field telling the hardware that future
descriptors in the chain depend on the result of the current descriptor, so
wait for all writes to complete before starting the next operation.
Note that ioat1 does not prefetch the descriptor chain, so does not
require/support fenced operations.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Some engines have transfer size and address alignment restrictions. Add
a per-operation alignment property to struct dma_device that the async
routines and dmatest can use to check alignment capabilities.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Channel switching is problematic for some dmaengine drivers as the
architecture precludes separating the ->prep from ->submit. In these
cases the driver can select ASYNC_TX_DISABLE_CHANNEL_SWITCH to modify
the async_tx allocator to only return channels that support all of the
required asynchronous operations.
For example MD_RAID456=y selects support for asynchronous xor, xor
validate, pq, pq validate, and memcpy. When
ASYNC_TX_DISABLE_CHANNEL_SWITCH=y any channel with all these
capabilities is marked DMA_ASYNC_TX allowing async_tx_find_channel() to
quickly locate compatible channels with the guarantee that dependency
chains will remain on one channel. When
ASYNC_TX_DISABLE_CHANNEL_SWITCH=n async_tx_find_channel() may select
channels that lead to operation chains that need to cross channel
boundaries using the async_tx channel switch capability.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Handle descriptor allocation failures by polling for a descriptor. The
driver will force forward progress when polled. In the best case this
polling interval will be the time it takes for one dma memcpy
transaction to complete. In the worst case, channel hang, we will need
to wait 100ms for the cleanup watchdog to fire (ioatdma driver).
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Increment the allocation order of the descriptor ring every time we run
out of descriptors up to a maximum of allocation order specified by the
module parameter 'ioat_max_alloc_order'. After each idle period
decrement the allocation order to a minimum order of
'ioat_ring_alloc_order' (i.e. the default ring size, tunable as a module
parameter).
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In order to support dynamic resizing of the descriptor ring or polling
for a descriptor in the presence of a hung channel the reset handler
needs to make progress while in a non-preemptible context. The current
workqueue implementation precludes polling channel reset completion
under spin_lock().
This conversion also allows us to return to opportunistic cleanup in the
ioat2 case as the timer implementation guarantees at least one cleanup
after every descriptor is submitted. This means the worst case
completion latency becomes the timer frequency (for exceptional
circumstances), but with the benefit of avoiding busy waiting when the
lock is contended.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Save 4 bytes per software descriptor by transmitting tx_cnt in an unused
portion of the hardware descriptor.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Mark all single use initialization routines with __devinit.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The register write in ioat_dma_cleanup_tasklet is unfortunate in two
ways:
1/ It clears the extra 'enable' bits that we set at alloc_chan_resources time
2/ It gives the impression that it disables interrupts when it is in
fact re-arming interrupts
[ Impact: fix, persist the value of the chanctrl register when re-arming ]
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Don't trust that the reserved bits are always zero, also sanity check
the returned value.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The cleanup path makes an effort to only perform an atomic read of the
64-bit completion address. However in the 32-bit case it does not
matter if we read the upper-32 and lower-32 non-atomically because the
upper-32 will always be zero.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Provide some output for debugging the driver.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The unified ioat1/ioat2 ioat_dma_unmap() implementation derives the
source and dest addresses from the unmap descriptor. There is no longer
a need to track this information in struct ioat_desc_sw.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Replace the current linked list munged into a ring with a native ring
buffer implementation. The benefit of this approach is reduced overhead
as many parameters can be derived from ring position with simple pointer
comparisons and descriptor allocation/freeing becomes just a
manipulation of head/tail pointers.
It requires a contiguous allocation for the software descriptor
information.
Since this arrangement is significantly different from the ioat1 chain,
move ioat2,3 support into its own file and header. Common routines are
exported from driver/dma/ioat/dma.[ch].
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Prepare the code for the conversion of the ioat2 linked-list-ring into a
native ring buffer. After this conversion ioat2 channels will share
less of the ioat1 infrastructure, but there will still be places where
sharing is possible. struct ioat_chan_common is created to house the
channel attributes that will remain common between ioat1 and ioat2
channels.
For every routine that accesses both common and hardware specific fields
the old unified 'ioat_chan' pointer is split into an 'ioat' and 'chan'
pointer. Where 'chan' references common fields and 'ioat' the
hardware/version specific.
[ Impact: pure structure member movement/variable renames, no logic changes ]
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If a callback is to be attached to a descriptor the channel needs to
know at ->prep time so it can set the interrupt enable bit. This is in
preparation for moving descriptor ioat2 descriptor preparation from
->submit to ->prep.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The async_tx api assumes that after a successful ->prep a subsequent
->submit will not fail due to a lack of resources.
This also fixes a bug in the allocation failure case. Previously the
descriptors allocated prior to the allocation failure would not be
returned to the free list.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This cleans up a mess of and'ing and or'ing bit definitions, and allows
simple assignments from the specified dma_ctrl_flags parameter.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
->dmacount tracks the sequence number of active descriptors. It is
written to the DMACOUNT register to update the channel's view of pending
descriptors in the chain. The register is 16-bits so ->dmacount should
be unsigned and 16-bit as well. Also modify ->desccount to maintain
alignment.
This was never a problem in practice because we never compared dmacount
values, but this is a bug waiting to happen.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Towards the removal of ioatdma_device.version split the initialization
path into distinct versions. This conversion:
1/ moves version specific probe code to version specific routines
2/ removes the need for ioat_device
3/ turns off the ioat1 msi quirk if the device is reinitialized for intx
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The only .c files that utilize these protected prototypes depend on
CONFIG_INTEL_IOATDMA=y, so there is no value gained in providing empty
prototypes.
[ Impact: pure cleanup ]
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* reduce device->common. to dma-> in ioat_dma_{probe,remove,selftest}
* ioat_lookup_chan_by_index to ioat_chan_by_index
* multi-line function definitions
* ioat_desc_sw.async_tx to ioat_desc_sw.txd
* desc->txd. to tx-> in cleanup routine
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The driver currently duplicates much of what these routines offer, so
just use the common code. For example ->irq_mode tracks what interrupt
mode was initialized, which duplicates the ->msix_enabled and
->msi_enabled handling in pcim_release.
This also adds a check to the return value of dma_async_device_register,
which can fail.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Some of these defines may be useful outside of dma.c and the header is
private so there are no namespace pollution concerns.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Even though the intent is to extend dmatest with P+Q tests there is
still value in having an always-on sanity check to prevent an
unintentionally broken driver from registering.
This depends on raid6_pq.ko for verification, the side effect being that
PQ capable channels will fail to register when raid6 is disabled.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
iop33x support is not included because that engine is a bit more awkward
to handle in that it can either be in xor mode or pq mode. The
dmaengine/async_tx layers currently only comprehend static capabilities.
Note iop13xx does not support hardware PQ continuation so the driver
must handle the DMA_PREP_CONTINUE flag for operations across > 16
sources. From the comment for dma_maxpq:
/* When an engine does not support native continuation we need 3 extra
* source slots to reuse P and Q with the following coefficients:
* 1/ {00} * P : remove P from Q', but use it as a source for P'
* 2/ {01} * Q : use Q to continue Q' calculation
* 3/ {00} * Q : subtract Q from P' to cancel (2)
*/
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
lockdep correctly identifies a potential recursive locking case for
iop_chan->lock, but in the dependency submission case we expect that the same
class will be acquired for both the parent dependency and the child channel.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Test raid6 p+q operations with a simple "always multiply by 1" q
calculation to fit into dmatest's current destination verification
scheme.
Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
[ Based on an original patch by Yuri Tikhonov ]
This adds support for doing asynchronous GF multiplication by adding
two additional functions to the async_tx API:
async_gen_syndrome() does simultaneous XOR and Galois field
multiplication of sources.
async_syndrome_val() validates the given source buffers against known P
and Q values.
When a request is made to run async_pq against more than the hardware
maximum number of supported sources we need to reuse the previous
generated P and Q values as sources into the next operation. Care must
be taken to remove Q from P' and P from Q'. For example to perform a 5
source pq op with hardware that only supports 4 sources at a time the
following approach is taken:
p, q = PQ(src0, src1, src2, src3, COEF({01}, {02}, {04}, {08}))
p', q' = PQ(p, q, q, src4, COEF({00}, {01}, {00}, {10}))
p' = p + q + q + src4 = p + src4
q' = {00}*p + {01}*q + {00}*q + {10}*src4 = q + {10}*src4
Note: 4 is the minimum acceptable maxpq otherwise we punt to
synchronous-software path.
The DMA_PREP_CONTINUE flag indicates to the driver to reuse p and q as
sources (in the above manner) and fill the remaining slots up to maxpq
with the new sources/coefficients.
Note1: Some devices have native support for P+Q continuation and can skip
this extra work. Devices with this capability can advertise it with
dma_set_maxpq. It is up to each driver how to handle the
DMA_PREP_CONTINUE flag.
Note2: The api supports disabling the generation of P when generating Q,
this is ignored by the synchronous path but is implemented by some dma
devices to save unnecessary writes. In this case the continuation
algorithm is simplified to only reuse Q as a source.
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
We currently walk the parent chain when waiting for a given tx to
complete however this walk may race with the driver cleanup routine.
The routines in async_raid6_recov.c may fall back to the synchronous
path at any point so we need to be prepared to call async_tx_quiesce()
(which calls dma_wait_for_async_tx). To remove the ->parent walk we
guarantee that every time a dependency is attached ->issue_pending() is
invoked, then we can simply poll the initial descriptor until
completion.
This also allows for a lighter weight 'issue pending' implementation as
there is no longer a requirement to iterate through all the channels'
->issue_pending() routines as long as operations have been submitted in
an ordered chain. async_tx_issue_pending() is added for this case.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
dmaengine: at_hdmac: add DMA slave transfers
dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller
dmaengine: dmatest: correct thread_count while using multiple thread per channel
dmaengine: dmatest: add a maximum number of test iterations
drivers/dma: Remove unnecessary semicolons
drivers/dma/fsldma.c: Remove unnecessary semicolons
dmaengine: move HIGHMEM64G restriction to ASYNC_TX_DMA
fsldma: do not clear bandwidth control bits on the 83xx controller
fsldma: enable external start for the 83xx controller
fsldma: use PCI Read Multiple command
When first created the ioat driver was the only inhabitant of
drivers/dma/. Now, it is the only multi-file (more than a .c and a .h)
driver in the directory. Moving it to an ioat/ subdirectory allows the
naming convention to be cleaned up, and allows for future splitting of
the source files by hardware version (v1, v2, and v3).
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch for at_hdmac adds the slave transfers capability to the Atmel DMA
controller available on some AT91 SOCs. This allow peripheral to memory and
memory to peripheral transfers with hardware handshaking.
Slave structure for controller specific information is passed through channel
private data. This at_dma_slave structure is defined in at_hdmac.h header file
and relative hardware definition are moved to this file from at_hdmac_regs.h.
Doing this we allow the channel configuration from platform definition code.
This work is intensively based on dw_dmac and several slave implementations.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This AHB DMA Controller (aka HDMA or DMAC on AT91 systems) is availlable on
at91sam9rl chip. It will be used on other products in the future.
This first release covers only the memory-to-memory tranfer type. This is the
only tranfer type supported by this chip. On other products, it will be used
also for peripheral DMA transfer (slave API support to come).
I used dmatest client without problem in different configurations to test it.
Full documentation for this controller can be found in the SAM9RL datasheet:
http://www.atmel.com/dyn/products/product_card.asp?part_id=4243
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
It seems that thread_count is not properly calculated in dmatest.
In fact the thread count number that is returned from dmatest_add_threads() is
not correctly added to the thread_count and thus not properly printed.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>