Commit graph

201 commits

Author SHA1 Message Date
Denys Vlasenko
1ce73e8d6d [CRYPTO] camellia: Code cleanup
Optimize GETU32 to use 4-byte memcpy (modern gcc will convert
such memcpy to single move instruction on i386).
Original GETU32 did four byte fetches, and shifted/XORed those.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Acked-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2008-01-11 08:16:07 +11:00
Denys Vlasenko
3a5e5f8108 [CRYPTO] camellia: Code cleanup
Rename some macros to shorter names: CAMELLIA_RR8 -> ROR8,
making it easier to understand that it is just a right rotation,
nothing camellia-specific in it.
CAMELLIA_SUBKEY_L() -> SUBKEY_L() - just shorter.

Move be32 <-> cpu conversions out of en/decrypt128/256 and into
camellia_en/decrypt - no reason to have that code duplicated twice.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Acked-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2008-01-11 08:16:07 +11:00
Denys Vlasenko
1721a81256 [CRYPTO] camellia: Code cleanup
Move code blocks around so that related pieces are closer together:
e.g. CAMELLIA_ROUNDSM macro does not need to be separated
from the rest of the code by huge array of constants.

Remove unused macros (COPY4WORD, SWAP4WORD, XOR4WORD[2])

Drop SUBL(), SUBR() macros which only obscure things.
Same for CAMELLIA_SP1110() macro and KEY_TABLE_TYPE typedef.

Remove useless comments:
/* encryption */ -- well it's obvious enough already!
void camellia_encrypt128(...)

Combine swap with copying at the beginning/end of encrypt/decrypt.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Acked-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2008-01-11 08:16:06 +11:00
Denys Vlasenko
e2b21b5002 [CRYPTO] twofish: Do not unroll big stuff in twofish key setup
Currently twofish cipher key setup code
has unrolled loops - approximately 70-100
instructions are repeated 40 times.

As a result, twofish module is the biggest module
in crypto/*.

Unrolling produces x2.5 more code (+18k on i386), and speeds up key
setup by 7%:

	unrolled: twofish_setkey/sec: 41128
	    loop: twofish_setkey/sec: 38148
	CALC_K256: ~100 insns each
	CALC_K192: ~90 insns
	   CALC_K: ~70 insns

Attached patch removes this unrolling.

$ size */twofish_common.o
   text    data     bss     dec     hex filename
  37920       0       0   37920    9420 crypto.org/twofish_common.o
  13209       0       0   13209    3399 crypto/twofish_common.o

Run tested (modprobe tcrypt reports ok). Please apply.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2008-01-11 08:16:06 +11:00
Sebastian Siewior
89e1265431 [CRYPTO] aes: Move common defines into a header file
This three defines are used in all AES related hardware.

Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2008-01-11 08:16:04 +11:00
Evgeniy Polyakov
c3041f9c93 [CRYPTO] hifn_795x: Detect weak keys
HIFN driver update to use DES weak key checks (exported in this patch).

Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2008-01-11 08:16:03 +11:00
Evgeniy Polyakov
16d004a2ed [CRYPTO] des: Create header file for common macros
This patch creates include/crypto/des.h for common macros shared between
DES implementations.

Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2008-01-11 08:16:02 +11:00
Joy Latten
23e353c8a6 [CRYPTO] ctr: Add CTR (Counter) block cipher mode
This patch implements CTR mode for IPsec.
It is based off of RFC 3686.

Please note:
1. CTR turns a block cipher into a stream cipher.
Encryption is done in blocks, however the last block
may be a partial block.

A "counter block" is encrypted, creating a keystream
that is xor'ed with the plaintext. The counter portion
of the counter block is incremented after each block
of plaintext is encrypted.
Decryption is performed in same manner.

2. The CTR counterblock is composed of,
        nonce + IV + counter

The size of the counterblock is equivalent to the
blocksize of the cipher.
        sizeof(nonce) + sizeof(IV) + sizeof(counter) = blocksize

The CTR template requires the name of the cipher
algorithm, the sizeof the nonce, and the sizeof the iv.
        ctr(cipher,sizeof_nonce,sizeof_iv)

So for example,
        ctr(aes,4,8)
specifies the counterblock will be composed of 4 bytes
from a nonce, 8 bytes from the iv, and 4 bytes for counter
since aes has a blocksize of 16 bytes.

3. The counter portion of the counter block is stored
in big endian for conformance to rfc 3686.

Signed-off-by: Joy Latten <latten@austin.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2008-01-11 08:16:01 +11:00
Herbert Xu
38cb2419f5 [CRYPTO] api: Fix potential race in crypto_remove_spawn
As it is crypto_remove_spawn may try to unregister an instance which is
yet to be registered.  This patch fixes this by checking whether the
instance has been registered before attempting to remove it.

It also removes a bogus cra_destroy check in crypto_register_instance as
1) it's outside the mutex;
2) we have a check in __crypto_register_alg already.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-11-23 19:32:09 +08:00
Herbert Xu
f347c4facf [CRYPTO] authenc: Move initialisations up to shut up gcc
It seems that newer versions of gcc have regressed in their abilities to
analyse initialisations.  This patch moves the initialisations up to avoid
the warnings.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-11-23 19:32:09 +08:00
Adrian Bunk
87ae9afdca cleanup asm/scatterlist.h includes
Not architecture specific code should not #include <asm/scatterlist.h>.

This patch therefore either replaces them with
#include <linux/scatterlist.h> or simply removes them if they were
unused.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-11-02 08:47:06 +01:00
Herbert Xu
a5a613a429 [CRYPTO] tcrypt: Move sg_init_table out of timing loops
This patch moves the sg_init_table out of the timing loops for hash
algorithms so that it doesn't impact on the speed test results.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-27 00:51:21 -07:00
David S. Miller
b733588559 [CRYPTO]: Initialize TCRYPT on-stack scatterlist objects correctly.
Use sg_init_one() and sg_init_table() as needed.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-26 00:38:10 -07:00
David S. Miller
a6767721a5 [CRYPTO]: HMAC needs some more scatterlist fixups.
hmac_setkey(), hmac_init(), and hmac_final() have
a singular on-stack scatterlist.  Initialit is
using sg_init_one() instead of using sg_set_buf().

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-26 00:37:12 -07:00
Vlad Yasevich
41fb285430 [CRYPTO]: Fix hmac_digest from the SG breakage.
Crypto now uses SG helper functions.  Fix hmac_digest to use those
functions correctly and fix the oops associated with it.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-25 18:46:26 -07:00
Jens Axboe
642f149031 SG: Change sg_set_page() to take length and offset argument
Most drivers need to set length and offset as well, so may as well fold
those three lines into one.

Add sg_assign_page() for those two locations that only needed to set
the page, where the offset/length is set outside of the function context.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-24 11:20:47 +02:00
Jens Axboe
78c2f0b8c2 [SG] Update crypto/ to sg helpers
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-22 19:40:16 +02:00
John Anthony Kazos Jr
991d17403c crypto: convert "crypto" subdirectory to UTF-8
Convert the subdirectory "crypto" to UTF-8. The files changed are
<crypto/fcrypt.c> and <crypto/api.c>.

Signed-off-by: John Anthony Kazos Jr. <jakj@j-a-k-j.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-19 23:06:17 +02:00
Jens Axboe
ab83407e9e crypto: don't pollute the global namespace with sg_next()
It's a subsystem function, prefix it as such.

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:07:09 +02:00
Jan Glauber
5265eeb2b0 [CRYPTO] sha: Add header file for SHA definitions
There are currently several SHA implementations that all define their own
initialization vectors and size values. Since this values are idential
move them to a header file under include/crypto.

Signed-off-by: Jan Glauber <jang@de.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-10-10 16:55:50 -07:00
Sebastian Siewior
ad5d27899f [CRYPTO] sha: Load the SHA[1|256] module by an alias
Loading the crypto algorithm by the alias instead of by module directly
has the advantage that all possible implementations of this algorithm
are loaded automatically and the crypto API can choose the best one
depending on its priority.

Additionally it ensures that the generic implementation as well as the
HW driver (if available) is loaded in case the HW driver needs the
generic version as fallback in corner cases.

Also remove the probe for sha1 in padlock's init code.

Quote from Herbert:
  The probe is actually pointless since we can always probe when
  the algorithm is actually used which does not lead to dead-locks
  like this.

Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-10-10 16:55:50 -07:00
Sebastian Siewior
f8246af005 [CRYPTO] aes: Rename aes to aes-generic
Loading the crypto algorithm by the alias instead of by module directly
has the advantage that all possible implementations of this algorithm
are loaded automatically and the crypto API can choose the best one
depending on its priority.

Additionally it ensures that the generic implementation as well as the
HW driver (if available) is loaded in case the HW driver needs the
generic version as fallback in corner cases.

Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-10-10 16:55:49 -07:00
Sebastian Siewior
c5a511f1cd [CRYPTO] des: Rename des to des-generic
Loading the crypto algorithm by the alias instead of by module directly
has the advantage that all possible implementations of this algorithm
are loaded automatically and the crypto API can choose the best one
depending on its priority.

Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-10-10 16:55:49 -07:00
Herbert Xu
7607bd8ff0 [CRYPTO] blkcipher: Added blkcipher_walk_virt_block
This patch adds the helper blkcipher_walk_virt_block which is similar to
blkcipher_walk_virt but uses a supplied block size instead of the block
size of the block cipher.  This is useful for CTR where the block size is
1 but we still want to walk by the block size of the underlying cipher.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-10-10 16:55:48 -07:00
Herbert Xu
2614de1b9a [CRYPTO] blkcipher: Increase kmalloc amount to aligned block size
Now that the block size is no longer a multiple of the alignment, we need to
increase the kmalloc amount in blkcipher_next_slow to use the aligned block
size.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-10-10 16:55:48 -07:00
Herbert Xu
d8058480b3 [CRYPTO] api: Explain the comparison on larval cra_name
This patch adds a comment to explain why we compare the cra_driver_name of
the algorithm being registered against the cra_name of a larval as opposed
to the cra_driver_name of the larval.

In fact larvals have only one name, cra_name which is the name that was
requested by the user.  The test here is simply trying to find out whether
the algorithm being registered can or can not satisfy the larval.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-10-10 16:55:47 -07:00
Herbert Xu
70613783fc [CRYPTO] blkcipher: Remove alignment restriction on block size
Previously we assumed for convenience that the block size is a multiple of
the algorithm's required alignment.  With the pending addition of CTR this
will no longer be the case as the block size will be 1 due to it being a
stream cipher.  However, the alignment requirement will be that of the
underlying implementation which will most likely be greater than 1.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-10-10 16:55:46 -07:00
Herbert Xu
e4c5c6c9b0 [CRYPTO] authenc: Kill spaces in algorithm names
We do not allow spaces in algorithm names or parameters.  Thanks to Joy Latten
for pointing this out.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-10-10 16:55:46 -07:00
Herbert Xu
720a650f8a [CRYPTO] cryptomgr: Fix parsing of recursive algorithms
As Joy Latten points out, inner algorithm parameters will miss the closing
bracket which will also cause the outer algorithm to terminate prematurely.

This patch fixes that also kills the WARN_ON if the number of parameters
exceed the maximum as that is a user error.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-10-10 16:55:45 -07:00
Rik Snel
f19f5111c9 [CRYPTO] xts: XTS blockcipher mode implementation without partial blocks
XTS currently considered to be the successor of the LRW mode by the IEEE1619
workgroup. LRW was discarded, because it was not secure if the encyption key
itself is encrypted with LRW.

XTS does not have this problem. The implementation is pretty straightforward,
a new function was added to gf128mul to handle GF(128) elements in ble format.
Four testvectors from the specification
	http://grouper.ieee.org/groups/1619/email/pdf00086.pdf
were added, and they verify on my system.

Signed-off-by: Rik Snel <rsnel@cube.dyndns.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-10-10 16:55:45 -07:00
Ingo Oeser
5aaff0c8f7 [CRYPTO] blkcipher: Use max() in blkcipher_get_spot() to state the intention
Use max in blkcipher_get_spot() instead of open coding it.

Signed-off-by: Ingo Oeser <ioe-lkml@rameria.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-10-10 16:55:44 -07:00
Herbert Xu
70dec235d8 [CRYPTO] api: Kill crypto_km_types
When scatterwalk is built as a module digest.c was broken because it
requires the crypto_km_types structure which is in scatterwalk.  This
patch removes the crypto_km_types structure by encoding the logic into
crypto_kmap_type directly.

In fact, this even saves a few bytes of code (not to mention the data
structure itself) on i386 which is about the only place where it's
needed.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-10-10 16:55:44 -07:00
Herbert Xu
3c09f17c3d [CRYPTO] aead: Add authenc
This patch adds the authenc algorithm which constructs an AEAD algorithm
from an asynchronous block cipher and a hash.  The construction is done
by concatenating the encrypted result from the cipher with the output
from the hash, as is used by the IPsec ESP protocol.

The authenc algorithm exists as a template with four parameters:

	authenc(auth, authsize, enc, enckeylen).

The authentication algorithm, the authentication size (i.e., truncating
the output of the authentication algorithm), the encryption algorithm,
and the encryption key length.  Both the size field and the key length
field are in bytes.  For example, AES-128 with SHA1-HMAC would be
represented by

	authenc(hmac(sha1), 12, cbc(aes), 16)

The key for the authenc algorithm is the concatenation of the keys for
the authentication algorithm with the encryption algorithm.  For the
above example, if a key of length 36 bytes is given, then hmac(sha1)
would receive the first 20 bytes while the last 16 would be given to
cbc(aes).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-10-10 16:55:43 -07:00
Herbert Xu
5fa0fea274 [CRYPTO] scatterwalk: Add scatterwalk_map_and_copy
This patch adds the function scatterwalk_map_and_copy which reads or
writes a chunk of data from a scatterlist at a given offset.  It will
be used by authenc which would read/write the authentication data at
the end of the cipher/plain text.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-10-10 16:55:42 -07:00
Herbert Xu
e962a653f3 [CRYPTO] api: Move scatterwalk into algapi
The scatterwalk code is only used by algorithms that can be built as
a module.  Therefore we can move it into algapi.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-10-10 16:55:41 -07:00
Herbert Xu
2de98e7544 [CRYPTO] ablkcipher: Remove queue pointer from common alg object
Since not everyone needs a queue pointer and those who need it can
always get it from the context anyway the queue pointer in the
common alg object is redundant.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-10-10 16:55:41 -07:00
Herbert Xu
791b4d5f73 [CRYPTO] api: Add missing headers for setkey_unaligned
This patch ensures that kernel.h and slab.h are included for
the setkey_unaligned function.  It also breaks a couple of
long lines.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-10-10 16:55:40 -07:00
Herbert Xu
39e1ee011f [CRYPTO] api: Add support for multiple template parameters
This patch adds support for having multiple parameters to
a template, separated by a comma.  It also adds support
for integer parameters in addition to the current algorithm
parameter type.

This will be used by the authenc template which will have
four parameters: the authentication algorithm, the encryption
algorithm, the authentication size and the encryption key
length.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-10-10 16:55:40 -07:00
Herbert Xu
1ae978208e [CRYPTO] api: Add aead crypto type
This patch adds crypto_aead which is the interface for AEAD
(Authenticated Encryption with Associated Data) algorithms.

AEAD algorithms perform authentication and encryption in one
step.  Traditionally users (such as IPsec) would use two
different crypto algorithms to perform these.  With AEAD
this comes down to one algorithm and one operation.

Of course if traditional algorithms were used we'd still
be doing two operations underneath.  However, real AEAD
algorithms may allow the underlying operations to be
optimised as well.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-10-10 16:55:39 -07:00
Hye-Shik Chang
e2ee95b8c6 [CRYPTO] seed: New cipher algorithm
This patch adds support for the SEED cipher (RFC4269).

This patch have been used in few VPN appliance vendors in Korea for
several years.  And it was verified by KISA, who developed the
algorithm itself.

As its importance in Korean banking industry, it would be great
if linux incorporates the support.

Signed-off-by: Hye-Shik Chang <perky@FreeBSD.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-10-10 16:55:38 -07:00
Adrian Bunk
a349365e5e [CRYPTO] Kconfig: Remove "default m"s
Other options requiring specific block cipher algorithms already have
the appropriate select's.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-10-10 16:55:36 -07:00
Dan Williams
6247cdc2cd async_tx: fix dma_wait_for_async_tx
Fix dma_wait_for_async_tx to not loop forever in the case where a
dependency chain is longer than two entries.  This condition will not
happen with current in-kernel drivers, but fix it for future drivers.

Found-by: Saeed Bishara <saeed.bishara@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2007-09-24 10:26:26 -07:00
Herbert Xu
32528d0fbd [CRYPTO] blkcipher: Fix inverted test in blkcipher_get_spot
The previous patch had the conditional inverted.  This patch fixes it
so that we return the original position if it does not straddle a page.

Thanks to Bob Gilligan for spotting this.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-09-10 15:51:11 +08:00
Herbert Xu
e4630f9fd8 [CRYPTO] blkcipher: Fix handling of kmalloc page straddling
The function blkcipher_get_spot tries to return a buffer of
the specified length that does not straddle a page.  It has
an off-by-one bug so it may advance a page unnecessarily.

What's worse, one of its callers doesn't provide a buffer
that's sufficiently long for this operation.

This patch fixes both problems.  Thanks to Bob Gilligan for
diagnosing this problem and providing a fix.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-09-09 08:45:21 +01:00
Sebastian Siewior
0681717678 [CRYPTO] api: fix writting into unallocated memory in setkey_aligned
setkey_unaligned() commited in ca7c39385c
overwrites unallocated memory in the following memset() because
I used the wrong buffer length.

Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-08-06 15:33:56 +08:00
Dan Williams
eb0645a8b1 async_tx: fix kmap_atomic usage in async_memcpy
Andrew Morton:
	[async_memcpy] is very wrong if both ASYNC_TX_KMAP_DST and
	ASYNC_TX_KMAP_SRC can ever be set.  We'll end up using the same kmap
	slot for both src add dest and we get either corrupted data or a BUG.

Evgeniy Polyakov:
	Btw, shouldn't it always be kmap_atomic() even if flag is not set.
	That pages are usual one returned by alloc_page().

So fix the usage of kmap_atomic and kill the ASYNC_TX_KMAP_DST and
ASYNC_TX_KMAP_SRC flags.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-20 08:44:19 -07:00
Pavel Emelianov
13d31894b3 Make crypto API use seq_list_xxx helpers
Simple and stupid - just use the same code from another place in the kernel.

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:42 -07:00
David S. Miller
d09f51b699 Merge master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6
Conflicts:

	crypto/Kconfig
2007-07-14 23:47:04 -07:00
Dan Williams
9bc89cd82d async_tx: add the async_tx api
The async_tx api provides methods for describing a chain of asynchronous
bulk memory transfers/transforms with support for inter-transactional
dependencies.  It is implemented as a dmaengine client that smooths over
the details of different hardware offload engine implementations.  Code
that is written to the api can optimize for asynchronous operation and the
api will fit the chain of operations to the available offload resources. 
 
	I imagine that any piece of ADMA hardware would register with the
	'async_*' subsystem, and a call to async_X would be routed as
	appropriate, or be run in-line. - Neil Brown

async_tx exploits the capabilities of struct dma_async_tx_descriptor to
provide an api of the following general format:

struct dma_async_tx_descriptor *
async_<operation>(..., struct dma_async_tx_descriptor *depend_tx,
			dma_async_tx_callback cb_fn, void *cb_param)
{
	struct dma_chan *chan = async_tx_find_channel(depend_tx, <operation>);
	struct dma_device *device = chan ? chan->device : NULL;
	int int_en = cb_fn ? 1 : 0;
	struct dma_async_tx_descriptor *tx = device ?
		device->device_prep_dma_<operation>(chan, len, int_en) : NULL;

	if (tx) { /* run <operation> asynchronously */
		...
		tx->tx_set_dest(addr, tx, index);
		...
		tx->tx_set_src(addr, tx, index);
		...
		async_tx_submit(chan, tx, flags, depend_tx, cb_fn, cb_param);
	} else { /* run <operation> synchronously */
		...
		<operation>
		...
		async_tx_sync_epilog(flags, depend_tx, cb_fn, cb_param);
	}

	return tx;
}

async_tx_find_channel() returns a capable channel from its pool.  The
channel pool is organized as a per-cpu array of channel pointers.  The
async_tx_rebalance() routine is tasked with managing these arrays.  In the
uniprocessor case async_tx_rebalance() tries to spread responsibility
evenly over channels of similar capabilities.  For example if there are two
copy+xor channels, one will handle copy operations and the other will
handle xor.  In the SMP case async_tx_rebalance() attempts to spread the
operations evenly over the cpus, e.g. cpu0 gets copy channel0 and xor
channel0 while cpu1 gets copy channel 1 and xor channel 1.  When a
dependency is specified async_tx_find_channel defaults to keeping the
operation on the same channel.  A xor->copy->xor chain will stay on one
channel if it supports both operation types, otherwise the transaction will
transition between a copy and a xor resource.

Currently the raid5 implementation in the MD raid456 driver has been
converted to the async_tx api.  A driver for the offload engines on the
Intel Xscale series of I/O processors, iop-adma, is provided in a later
commit.  With the iop-adma driver and async_tx, raid456 is able to offload
copy, xor, and xor-zero-sum operations to hardware engines.
 
On iop342 tiobench showed higher throughput for sequential writes (20 - 30%
improvement) and sequential reads to a degraded array (40 - 55%
improvement).  For the other cases performance was roughly equal, +/- a few
percentage points.  On a x86-smp platform the performance of the async_tx
implementation (in synchronous mode) was also +/- a few percentage points
of the original implementation.  According to 'top' on iop342 CPU
utilization drops from ~50% to ~15% during a 'resync' while the speed
according to /proc/mdstat doubles from ~25 MB/s to ~50 MB/s.
 
The tiobench command line used for testing was: tiobench --size 2048
--block 4096 --block 131072 --dir /mnt/raid --numruns 5
* iop342 had 1GB of memory available

Details:
* if CONFIG_DMA_ENGINE=n the asynchronous path is compiled away by making
  async_tx_find_channel a static inline routine that always returns NULL
* when a callback is specified for a given transaction an interrupt will
  fire at operation completion time and the callback will occur in a
  tasklet.  if the the channel does not support interrupts then a live
  polling wait will be performed
* the api is written as a dmaengine client that requests all available
  channels
* In support of dependencies the api implicitly schedules channel-switch
  interrupts.  The interrupt triggers the cleanup tasklet which causes
  pending operations to be scheduled on the next channel
* Xor engines treat an xor destination address differently than a software
  xor routine.  To the software routine the destination address is an implied
  source, whereas engines treat it as a write-only destination.  This patch
  modifies the xor_blocks routine to take a an explicit destination address
  to mirror the hardware.

Changelog:
* fixed a leftover debug print
* don't allow callbacks in async_interrupt_cond
* fixed xor_block changes
* fixed usage of ASYNC_TX_XOR_DROP_DEST
* drop dma mapping methods, suggested by Chris Leech
* printk warning fixups from Andrew Morton
* don't use inline in C files, Adrian Bunk
* select the API when MD is enabled
* BUG_ON xor source counts <= 1
* implicitly handle hardware concerns like channel switching and
  interrupts, Neil Brown
* remove the per operation type list, and distribute operation capabilities
  evenly amongst the available channels
* simplify async_tx_find_channel to optimize the fast path
* introduce the channel_table_initialized flag to prevent early calls to
  the api
* reorganize the code to mimic crypto
* include mm.h as not all archs include it in dma-mapping.h
* make the Kconfig options non-user visible, Adrian Bunk
* move async_tx under crypto since it is meant as 'core' functionality, and
  the two may share algorithms in the future
* move large inline functions into c files
* checkpatch.pl fixes
* gpl v2 only correction

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-By: NeilBrown <neilb@suse.de>
2007-07-13 08:06:14 -07:00
Dan Williams
685784aaf3 xor: make 'xor_blocks' a library routine for use with async_tx
The async_tx api tries to use a dma engine for an operation, but will fall
back to an optimized software routine otherwise.  Xor support is
implemented using the raid5 xor routines.  For organizational purposes this
routine is moved to a common area.

The following fixes are also made:
* rename xor_block => xor_blocks, suggested by Adrian Bunk
* ensure that xor.o initializes before md.o in the built-in case
* checkpatch.pl fixes
* mark calibrate_xor_blocks __init, Adrian Bunk

Cc: Adrian Bunk <bunk@stusta.de>
Cc: NeilBrown <neilb@suse.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2007-07-13 08:06:14 -07:00