This patch adds unaligned buffer tests for hashes.
The first new test is with one byte offset and the second test checks if
cra_alignmask for driver is big enough; for example, for testing a case
where cra_alignmask is set to 7, but driver really needs buffers to be
aligned to 16 bytes.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds unaligned buffer tests for AEADs.
The first new test is with one byte offset and the second test checks if
cra_alignmask for driver is big enough; for example, for testing a case
where cra_alignmask is set to 7, but driver really needs buffers to be
aligned to 16 bytes.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds unaligned buffer tests for blkciphers.
The first new test is with one byte offset and the second test checks if
cra_alignmask for driver is big enough; for example, for testing a case
where cra_alignmask is set to 7, but driver really needs buffers to be
aligned to 16 bytes.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Patch adds check for alg_test_descs list order, so that accidentically
misplaced entries are found quicker. Duplicate entries are also checked for.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This reverts commit cf1521a1a5.
Instruction (vpgatherdd) that this implementation relied on turned out to be
slow performer on real hardware (i5-4570). The previous 8-way twofish/AVX
implementation is therefore faster and this implementation should be removed.
Converting this implementation to use the same method as in twofish/AVX for
table look-ups would give additional ~3% speed up vs twofish/AVX, but would
hardly be worth of the added code and binary size.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This reverts commit 6048801070.
Instruction (vpgatherdd) that this implementation relied on turned out to be
slow performer on real hardware (i5-4570). The previous 4-way blowfish
implementation is therefore faster and this implementation should be removed.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add implementation tuned for more performance on real hardware. Changes are
mostly around the part mixing 128-bit extract and insert instructions and
AES-NI instructions. Also 'vpbroadcastb' instructions have been change to
'vpshufb with zero mask'.
Tests on Intel Core i5-4570:
tcrypt ECB results, old-AVX2 vs new-AVX2:
size 128bit key 256bit key
enc dec enc dec
256 1.00x 1.00x 1.00x 1.00x
1k 1.08x 1.09x 1.05x 1.06x
8k 1.06x 1.06x 1.06x 1.06x
tcrypt ECB results, AVX vs new-AVX2:
size 128bit key 256bit key
enc dec enc dec
256 1.00x 1.00x 1.00x 1.00x
1k 1.51x 1.50x 1.52x 1.50x
8k 1.47x 1.48x 1.48x 1.48x
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The MODULE_LICENSE macro invocation must use either "GPL" or "GPL v2",
but not "GPLv2" in order to be detected by the module loader.
This fixes the allmodconfig build error:
FATAL: modpost: GPL-incompatible module bcm2835-rng.ko uses GPL-only symbol 'platform_driver_unregister'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Dom Cobley <popcornmix@gmail.com>
Cc: Lubomir Rintel <lkundrak@v3.sk>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: linux-rpi-kernel@lists.infradead.org
Acked-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The new XTS code for aesni_intel uses input buffers directly as memory operands
for pxor instructions, which causes crash if those buffers are not aligned to
16 bytes.
Patch changes XTS code to handle unaligned memory correctly, by loading memory
with movdqu instead.
Reported-by: Dave Jones <davej@redhat.com>
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The Nomadik HW RNG driver has seen some rust and is not preparing
the clock before use. Fix this up so we get rid of runtime
complaints from the clock subsystem.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The usage of strict_strtoul() is not preferred, because
strict_strtoul() is obsolete. Thus, kstrtoul() should be
used.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
These symbols are referenced only in this file and hence
should be static.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Tested-by: Tobias Rauter <tobiasrauter@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use NULL instead of 0 for pointer variables.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Tested-by: Tobias Rauter <tobiasrauter@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
devm_* APIs are device managed and make cleanup and exit
code simpler.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Tested-by: Tobias Rauter <tobiasrauter@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commit 0998d06310 (device-core: Ensure drvdata = NULL when no
driver is bound) removes the need to set driver data field to
NULL.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Tested-by: Tobias Rauter <tobiasrauter@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use the wrapper functions for getting and setting the driver data using
platform_device instead of using dev_{get,set}_drvdata() with &pdev->dev,
so we can directly pass a struct platform_device.
Also, unnecessary dev_set_drvdata() is removed, because the driver core
clears the driver data to NULL after device_release or on probe failure.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Calling runtime PM API for every block causes serious perf hit to
crypto operations that are done on a long buffer.
As crypto is performed on a page boundary, encrypting large buffers can
cause a series of crypto operations divided by page. The runtime PM API
is also called those many times.
We call runtime_pm_get_sync only at beginning on the session (cra_init)
and runtime_pm_put at the end. This result in upto a 50% speedup as below.
This doesn't make the driver to keep the system awake as runtime get/put
is only called during a crypto session which completes usually quickly.
Before:
root@beagleboard:~# time -v openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 13310 aes-128-cbc's in 0.01s
Doing aes-128-cbc for 3s on 64 size blocks: 13040 aes-128-cbc's in 0.04s
Doing aes-128-cbc for 3s on 256 size blocks: 9134 aes-128-cbc's in 0.03s
Doing aes-128-cbc for 3s on 1024 size blocks: 8939 aes-128-cbc's in 0.01s
Doing aes-128-cbc for 3s on 8192 size blocks: 4299 aes-128-cbc's in 0.00s
After:
root@beagleboard:~# time -v openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 18911 aes-128-cbc's in 0.02s
Doing aes-128-cbc for 3s on 64 size blocks: 18878 aes-128-cbc's in 0.02s
Doing aes-128-cbc for 3s on 256 size blocks: 11878 aes-128-cbc's in 0.10s
Doing aes-128-cbc for 3s on 1024 size blocks: 11538 aes-128-cbc's in 0.05s
Doing aes-128-cbc for 3s on 8192 size blocks: 4857 aes-128-cbc's in 0.03s
While at it, also drop enter and exit pr_debugs, in related code. tracers
can be used for that.
Tested on a Beaglebone (AM335x SoC) board.
Signed-off-by: Joel A Fernandes <joelagnel@ti.com>
Acked-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The sahara crypto driver has an incorrect MODULE_DEVICE_TABLE, which
prevents us from actually building this driver as a loadable module.
sahara_dt_ids is a of_device_id array, so we have to use
MODULE_DEVICE_TABLE(of, ...).
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Javier Martin <javier.martin@vista-silicon.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
It appears that the performance of 'vpgatherdd' is suboptimal for this kind of
workload (tested on Core i5-4570) and causes blowfish-avx2 to be significantly
slower than blowfish-amd64. So disable the AVX2 implementation to avoid
performance regressions.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
It appears that the performance of 'vpgatherdd' is suboptimal for this kind of
workload (tested on Core i5-4570) and causes twofish_avx2 to be significantly
slower than twofish_avx. So disable the AVX2 implementation to avoid
performance regressions.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
lib/crc-t10dif.c:42:1-3: WARNING: PTR_RET can be used
Use PTR_RET rather than if(IS_ERR(...)) + PTR_ERR
Generated by: coccinelle/api/ptr_ret.cocci
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add sha224 implementation to sha256_ssse3 module.
This also fixes sha256_ssse3 module autoloading issue when 'sha224' is used
before 'sha256'. Previously in such case, just sha256_generic was loaded and
not sha256_ssse3 (since it did not provide sha224). Now if 'sha256' was used
after 'sha224' usage, sha256_ssse3 would remain unloaded.
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add sha384 implementation to sha512_ssse3 module.
This also fixes sha512_ssse3 module autoloading issue when 'sha384' is used
before 'sha512'. Previously in such case, just sha512_generic was loaded and
not sha512_ssse3 (since it did not provide sha384). Now if 'sha512' was used
after 'sha384' usage, sha512_ssse3 would remain unloaded. For example, this
happens with tcrypt testing module since it tests 'sha384' before 'sha512'.
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
'sha512_generic' should set driver name now that there is alternative sha512
provider (sha512_ssse3).
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
free_irq() expects the same pointer that was passed to request_irq(), otherwise
the IRQ is not freed.
The issue was found using the following coccinelle script:
<smpl>
@r1@
type T;
T devid;
@@
request_irq(..., devid)
@r2@
type r1.T;
T devid;
position p;
@@
free_irq@p(..., devid)
@@
position p != r2.p;
@@
*free_irq@p(...)
</smpl>
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch enables the DCP crypto functionality on imx28.
Currently, only aes-128-cbc is supported.
Moreover, the dcpboot misc-device, which is used by Freescale's
SDK tools and uses a non-software-readable OTP-key, is added.
Changes of v2:
- ring buffer for hardware-descriptors
- use of ablkcipher walk
- OTP key encryption/decryption via misc-device
(compatible to Freescale-SDK)
- overall cleanup
The DCP is also capable of sha1/sha256 but I won't be able to add
that anytime soon.
Tested with built-in runtime-self-test, tcrypt and openssl via
cryptodev 1.6 on imx28-evk and a custom built imx28-board.
Signed-off-by: Tobias Rauter <tobias.rauter@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add Class Context SRC / DEST flags for the LOAD & STORE commands
Signed-off-by: Andrei Varvara <andrei.varvara@freescale.com>
Reviewed-by: Phillips Kim-R1AAHA <Kim.Phillips@freescale.com>
Reviewed-by: Fleming Andrew-AFLEMING <AFLEMING@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add define for "Adjust Output Frame Length" in order to
set the AOFL bit in the IPsec ESP Decapsulation PDB.
Signed-off-by: Anca-Jeanina Floarea <anca.floarea@freescale.com>
Signed-off-by: Andrei Varvara <andrei.varvara@freescale.com>
Reviewed-by: Phillips Kim-R1AAHA <Kim.Phillips@freescale.com>
Reviewed-by: Fleming Andrew-AFLEMING <AFLEMING@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Store command has options to overwrite the Job Desc, Shared Desc or
the entire Descriptor in memory, using the address from
which the Descriptor was fetched.
Signed-off-by: Andrei Varvara <andrei.varvara@freescale.com>
Reviewed-by: Phillips Kim-R1AAHA <Kim.Phillips@freescale.com>
Reviewed-by: Fleming Andrew-AFLEMING <AFLEMING@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
added all supported math funtion on 8 byte boundary with
immediate flag bit set automatically
added MATH_SRC0_DPOVRD & MATH_SRC1_DPOVRD
The function/defines above are needed for creating descriptors
longer than 64 words
Signed-off-by: Andrei Varvara <andrei.varvara@freescale.com>
Reviewed-by: Phillips Kim-R1AAHA <Kim.Phillips@freescale.com>
Reviewed-by: Fleming Andrew-AFLEMING <AFLEMING@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Perform 32-bit left shift of DEST and concatenate with
left 32 bits of SRC1. {DEST[31:0],SRC1[63:32]}
Signed-off-by: Andrei Varvara <andrei.varvara@freescale.com>
Acked-by: Mihai Serb <mihai.serb@freescale.com>
Reviewed-by: Phillips Kim-R1AAHA <Kim.Phillips@freescale.com>
Reviewed-by: Fleming Andrew-AFLEMING <AFLEMING@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In case Store command is used with overwrite Shared Descriptor
feature there is no need for pointer, it is using the
address from which the Shared Descriptor was fetched.
Signed-off-by: Andrei Varvara <andrei.varvara@freescale.com>
Reviewed-by: Phillips Kim-R1AAHA <Kim.Phillips@freescale.com>
Reviewed-by: Fleming Andrew-AFLEMING <AFLEMING@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
SEQ IN PTR command does not require pointer if RTO or PRE bit is set
Updated desc_constr.h accordingly.
Signed-off-by: Andrei Varvara <andrei.varvara@freescale.com>
Reviewed-by: Phillips Kim-R1AAHA <Kim.Phillips@freescale.com>
Reviewed-by: Fleming Andrew-AFLEMING <AFLEMING@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The _XFER stack element size was set too small, 8 bytes, when it needs to be
16 bytes. As _XFER is the last stack element used by these implementations,
the 16 byte stores with 'movdqa' corrupt the stack where the value of register
%r12 is temporarily stored. As these implementations align the stack pointer
to 16 bytes, this corruption did not happen every time.
Patch corrects this issue.
Reported-by: Julian Wollrath <jwollrath@web.de>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Tested-by: Julian Wollrath <jwollrath@web.de>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The Kconfig symbol EXPERIMENTAL was removed in v3.9. So this dependency
makes it impossible to set CRYPTO_DEV_SAHARA. It's unlikely that this is
what is intended, so let's remove this dependency.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The driver core clears the driver data to NULL after device_release
or on probe failure, since commit 0998d06310
(device-core: Ensure drvdata = NULL when no driver is bound).
Thus, it is not needed to manually clear the device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The driver core clears the driver data to NULL after device_release
or on probe failure, since commit 0998d06310
(device-core: Ensure drvdata = NULL when no driver is bound).
Thus, it is not needed to manually clear the device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The driver core clears the driver data to NULL after device_release
or on probe failure, since commit 0998d06310
(device-core: Ensure drvdata = NULL when no driver is bound).
Thus, it is not needed to manually clear the device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The driver core clears the driver data to NULL after device_release
or on probe failure, since commit 0998d06310
(device-core: Ensure drvdata = NULL when no driver is bound).
Thus, it is not needed to manually clear the device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commit 0998d06310 (device-core: Ensure drvdata = NULL when no
driver is bound) removes the need to set driver data field to
NULL.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Vladimir Zapolskiy <vzapolskiy@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commit 0998d06310 (device-core: Ensure drvdata = NULL when no
driver is bound) removes the need to set driver data field to
NULL.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Replace calls to deprecated devm_request_and_ioremap by devm_ioremap_resource.
Found with coccicheck and this semantic patch:
scripts/coccinelle/api/devm_request_and_ioremap.cocci.
Signed-off-by: Laurent Navet <laurent.navet@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
These are simple tests to do sanity check of CRC T10 DIF hash. The
correctness of the transform can be checked with the command
modprobe tcrypt mode=47
The speed of the transform can be evaluated with the command
modprobe tcrypt mode=320
Set the cpu frequency to constant and turn turbo off when running the
speed test so the frequency governor will not tweak the frequency and
affects the measurements.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Glue code that plugs the PCLMULQDQ accelerated CRC T10 DIF hash into the
crypto framework. The config CRYPTO_CRCT10DIF_PCLMUL should be turned
on to enable the feature. The crc_t10dif crypto library function will
use this faster algorithm when crct10dif_pclmul module is loaded.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When CRC T10 DIF is calculated using the crypto transform framework, we
wrap the crc_t10dif function call to utilize it. This allows us to
take advantage of any accelerated CRC T10 DIF transform that is
plugged into the crypto framework.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>