Add edac_device_ctl_info struct members causing ABI diffs. This
patch skips sysfs changes and ignores the deferrable flag from the
original patchset.
Bug: 152456530
Test: build and boot
Signed-off-by: Stepan Moskovchenko <stepanm@codeaurora.org>
Signed-off-by: Venkata Narendra Kumar Gutta <vnkgutta@codeaurora.org>
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
[surenb: Squashed the following commits and kept only
edac_device_ctl_info struct changes:
117ab60064 edac: Allow panic on correctable errors (CE).
1b9a704454 edac: Allow the option of creating a deferrable work for polling]
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: If301dd0134c819167c81004468082a68735d9bd7
Merged-In: If301dd0134c819167c81004468082a68735d9bd7
[ Upstream commit 466503d6b1b33be46ab87c6090f0ade6c6011cbc ]
The following commit introduced a warning on error reports without a
non-zero grain value.
3724ace582d9 ("EDAC/mc: Fix grain_bits calculation")
The amd64_edac_mod module does not provide a value, so the warning will
be given on the first reported memory error.
Set the grain per DIMM to cacheline size (64 bytes). This is the current
recommendation.
Fixes: 3724ace582d9 ("EDAC/mc: Fix grain_bits calculation")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191022203448.13962-7-Yazen.Ghannam@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 29a0c843973bc385918158c6976e4dbe891df969 ]
The function should return NULL in case no device is found, but it
always returns the last checked mc device from the list even if the
index did not match. Fix that.
I did some analysis why this did not raise any issues for about 3 years
and the reason is that edac_mc_find() is mostly used to search for
existing devices. Thus, the bug is not triggered.
[ bp: Drop the if (mci->mc_idx > idx) test in favor of readability. ]
Fixes: c73e8833be ("EDAC, mc: Fix locking around mc_devices list")
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lkml.kernel.org/r/20190514104838.15065-1-rrichter@marvell.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7088e29e0423d3195e09079b4f849ec4837e5a75 ]
The current code to convert a physical address mask to a grain
(defined as granularity in bytes) is:
e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
This is broken in several ways:
1) It calculates to wrong grain values. E.g., a physical address mask
of ~0xfff should give a grain of 0x1000. Without considering
PAGE_MASK, there is an off-by-one. Things are worse when also
filtering it with ~PAGE_MASK. This will calculate to a grain with the
upper bits set. In the example it even calculates to ~0.
2) The grain does not depend on and is unrelated to the kernel's
page-size. The page-size only matters when unmapping memory in
memory_failure(). Smaller grains are wrongly rounded up to the
page-size, on architectures with a configurable page-size (e.g. arm64)
this could round up to the even bigger page-size of the hypervisor.
Fix this with:
e->grain = ~mem_err->physical_addr_mask + 1;
The grain_bits are defined as:
grain = 1 << grain_bits;
Change also the grain_bits calculation accordingly, it is the same
formula as in edac_mc.c now and the code can be unified.
The value in ->physical_addr_mask coming from firmware is assumed to
be contiguous, but this is not sanity-checked. However, in case the
mask is non-contiguous, a conversion to grain_bits effectively
converts the grain bit mask to a power of 2 by rounding it up.
Suggested-by: James Morse <james.morse@arm.com>
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106093239.25517-11-rrichter@marvell.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6f6da136046294a1e8d2944336eb97412751f653 ]
The {i3200|i7core|sb|skx}_edac drivers show DIMM capacity using the
wrong unit symbol: 'Mb' - megabit. Fix them by replacing 'Mb' with
'MiB' - mebibyte.
[Tony: These are all "edac_dbg()" messages, so this won't break scripts
that parse console logs.]
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/20180919003433.16475-1-tony.luck@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dcc960b225ceb2bd66c45e0845d03e577f7010f9 ]
Users of the mce_register_decode_chain() are called for every logged
error. EDAC drivers should check:
1) Is this a memory error? [bit 7 in status register]
2) Is there a valid address? [bit 58 in status register]
3) Is the address a system address? [bitfield 8:6 in misc register]
The sb_edac driver performed test "1" twice. Waited far too long to
perform check "2". Didn't do check "3" at all.
Fix it by moving the test for valid address from
sbridge_mce_output_error() into sbridge_mce_check_error() and add a test
for the type immediately after. Delete the redundant check for the type
of the error from sbridge_mce_output_error().
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180907230828.13901-2-tony.luck@intel.com
[ Re-word commit message. ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 1e72e673b9d102ff2e8333e74b3308d012ddf75b upstream.
ghes_edac models a single logical memory controller, and uses a global
ghes_init variable to ensure only the first ghes_edac_register() will
do anything.
ghes_edac is registered the first time a GHES entry in the HEST is
probed. There may be multiple entries, so subsequent attempts to
register ghes_edac are silently ignored as the work has already been
done.
When a GHES entry is unregistered, it calls ghes_edac_unregister(),
which free()s the memory behind the global variables in ghes_edac.
But there may be multiple GHES entries, the next call to
ghes_edac_unregister() will dereference the free()d memory, and attempt
to free it a second time.
This may also be triggered on a platform with one GHES entry, if the
driver is unbound/re-bound and unbound. The re-bind step will do
nothing because of ghes_init, the second unbind will then do the same
work as the first.
Doing the unregister work on the first call is unsafe, as another
CPU may be processing a notification in ghes_edac_report_mem_error(),
using the memory we are about to free.
ghes_init is already half of the reference counting. We only need
to do the register work for the first call, and the unregister work
for the last. Add the unregister check.
This means we no longer free ghes_edac's memory while there are
GHES entries that may receive a notification.
This was detected by KASAN and DEBUG_TEST_DRIVER_REMOVE.
[ bp: merge into a single patch. ]
Fixes: 0fe5f281f7 ("EDAC, ghes: Model a single, logical memory controller")
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20191014171919.85044-2-james.morse@arm.com
Link: https://lkml.kernel.org/r/304df85b-8b56-b77e-1a11-aa23769f2e7c@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8a2eaab7daf03b23ac902481218034ae2fae5e16 ]
AMD Family 17h systems currently require address translation in order to
report the system address of a DRAM ECC error. This is currently done
before decoding the syndrome information. The syndrome information does
not depend on the address translation, so the proper EDAC csrow/channel
reporting can function without the address. However, the syndrome
information will not be decoded if the address translation fails.
Decode the syndrome information before doing the address translation.
The syndrome information is architecturally defined in MCA_SYND and can
be considered robust. The address translation is system-specific and may
fail on newer systems without proper updates to the translation
algorithm.
Fixes: 713ad54675 ("EDAC, amd64: Define and register UMC error decode function")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190821235938.118710-6-Yazen.Ghannam@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f8be8e5680225ac9caf07d4545f8529b7395327f ]
AMD Family 17h systems support x4 and x16 DRAM devices. However, the
device type is not checked when setting mci.edac_ctl_cap.
Set the appropriate capability flag based on the device type.
Default to x8 DRAM device when neither the x4 or x16 bits are set.
[ bp: reverse cpk_en check to save an indentation level. ]
Fixes: 2d09d8f301 ("EDAC, amd64: Determine EDAC MC capabilities on Fam17h")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190821235938.118710-3-Yazen.Ghannam@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 29a3388bfcce7a6d087051376ea02bf8326a957b ]
Depending on how BIOS has marked the reserved region containing the 32KB
MCHBAR you can get warnings like:
resource sanity check: requesting [mem 0xfed10000-0xfed1ffff], which spans more than reserved [mem 0xfed10000-0xfed17fff]
caller dnv_rd_reg+0xc8/0x240 [pnd2_edac] mapping multiple BARs
Not all of the mmio regions used in dnv_rd_reg() are the same size. The
MCHBAR window is 32KB and the sideband ports are 64KB. Pass the correct
size to ioremap() depending on which resource we're reading from.
Signed-off-by: Stephen Douthit <stephend@silicom-usa.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8faa1cf6ed82f33009f63986c3776cc48af1b7b2 ]
Smatch complains about the cast of a u32 pointer to unsigned long:
drivers/edac/altera_edac.c:1878 altr_edac_a10_irq_handler()
warn: passing casted pointer '&irq_status' to 'find_first_bit()'
This code wouldn't work on a 64 bit big endian system because it would
read past the end of &irq_status.
[ bp: massage. ]
Fixes: 13ab8448d2 ("EDAC, altera: Add ECC Manager IRQ controller support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: James Morse <james.morse@arm.com>
Cc: kernel-janitors@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190624134717.GA1754@mwanda
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3724ace582d9f675134985727fd5e9811f23c059 ]
The grain in EDAC is defined as "minimum granularity for an error
report, in bytes". The following calculation of the grain_bits in
edac_mc is wrong:
grain_bits = fls_long(e->grain) + 1;
Where grain_bits is defined as:
grain = 1 << grain_bits
Example:
grain = 8 # 64 bit (8 bytes)
grain_bits = fls_long(8) + 1
grain_bits = 4 + 1 = 5
grain = 1 << grain_bits
grain = 1 << 5 = 32
Replace it with the correct calculation:
grain_bits = fls_long(e->grain - 1);
The example gives now:
grain_bits = fls_long(8 - 1)
grain_bits = fls_long(7)
grain_bits = 3
grain = 1 << 3 = 8
Also, check if the hardware reports a reasonable grain != 0 and fallback
with a warning to 1 byte granularity otherwise.
[ bp: massage a bit. ]
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190624150758.6695-2-rrichter@marvell.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d8655e7630dafa88bc37f101640e39c736399771 ]
Commit 9da21b1509 ("EDAC: Poll timeout cannot be zero, p2") assumes
edac_mc_poll_msec to be unsigned long, but the type of the variable still
remained as int. Setting edac_mc_poll_msec can trigger out-of-bounds
write.
Reproducer:
# echo 1001 > /sys/module/edac_core/parameters/edac_mc_poll_msec
KASAN report:
BUG: KASAN: global-out-of-bounds in edac_set_poll_msec+0x140/0x150
Write of size 8 at addr ffffffffb91b2d00 by task bash/1996
CPU: 1 PID: 1996 Comm: bash Not tainted 5.2.0-rc6+ #23
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
Call Trace:
dump_stack+0xca/0x13e
print_address_description.cold+0x5/0x246
__kasan_report.cold+0x75/0x9a
? edac_set_poll_msec+0x140/0x150
kasan_report+0xe/0x20
edac_set_poll_msec+0x140/0x150
? dimmdev_location_show+0x30/0x30
? vfs_lock_file+0xe0/0xe0
? _raw_spin_lock+0x87/0xe0
param_attr_store+0x1b5/0x310
? param_array_set+0x4f0/0x4f0
module_attr_store+0x58/0x80
? module_attr_show+0x80/0x80
sysfs_kf_write+0x13d/0x1a0
kernfs_fop_write+0x2bc/0x460
? sysfs_kf_bin_read+0x270/0x270
? kernfs_notify+0x1f0/0x1f0
__vfs_write+0x81/0x100
vfs_write+0x1e1/0x560
ksys_write+0x126/0x250
? __ia32_sys_read+0xb0/0xb0
? do_syscall_64+0x1f/0x390
do_syscall_64+0xc1/0x390
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fa7caa5e970
Code: 73 01 c3 48 8b 0d 28 d5 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 99 2d 2c 00 00 75 10 b8 01 00 00 00 04
RSP: 002b:00007fff6acfdfe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fa7caa5e970
RDX: 0000000000000005 RSI: 0000000000e95c08 RDI: 0000000000000001
RBP: 0000000000e95c08 R08: 00007fa7cad1e760 R09: 00007fa7cb36a700
R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000005
R13: 0000000000000001 R14: 00007fa7cad1d600 R15: 0000000000000005
The buggy address belongs to the variable:
edac_mc_poll_msec+0x0/0x40
Memory state around the buggy address:
ffffffffb91b2c00: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa
ffffffffb91b2c80: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa
>ffffffffb91b2d00: 04 fa fa fa fa fa fa fa 04 fa fa fa fa fa fa fa
^
ffffffffb91b2d80: 04 fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
ffffffffb91b2e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fix it by changing the type of edac_mc_poll_msec to unsigned int.
The reason why this patch adopts unsigned int rather than unsigned long
is msecs_to_jiffies() assumes arg to be unsigned int. We can avoid
integer conversion bugs and unsigned int will be large enough for
edac_mc_poll_msec.
Reviewed-by: James Morse <james.morse@arm.com>
Fixes: 9da21b1509 ("EDAC: Poll timeout cannot be zero, p2")
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 585fb3d93d32dbe89e718b85009f9c322cc554cd ]
In edac_create_csrow_object(), the reference to the object is not
released when adding the device to the device hierarchy fails
(device_add()). This may result in a memory leak.
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/1555554438-103953-1-git-send-email-bianpan2016@163.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2b8358a951b1e2a534a54924cd8245e58a1c5fb8 ]
The mpc85xx EDAC driver can be configured as a module but then fails to
build because it uses two unexported symbols:
ERROR: ".pci_find_hose_for_OF_device" [drivers/edac/mpc85xx_edac_mod.ko] undefined!
ERROR: ".early_find_capability" [drivers/edac/mpc85xx_edac_mod.ko] undefined!
We don't want to export those symbols just for this driver, so make the
driver only configurable as a built-in.
This seems to have been broken since at least
c92132f598 ("edac/85xx: Add PCIe error interrupt edac support")
(Nov 2013).
[ bp: make it depend on EDAC=y so that the EDAC core doesn't get built
as a module. ]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@ozlabs.org
Cc: morbidrsa@gmail.com
Link: https://lkml.kernel.org/r/20190502141941.12927-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 432de7fd7630c84ad24f1c2acd1e3bb4ce3741ca upstream.
The count of errors is picked up from bits 52:38 of the machine check
bank status register. But this is the count of *corrected* errors. If an
uncorrected error is being logged, the h/w sets this field to 0. Which
means that when edac_mc_handle_error() is called, the EDAC core will
carefully add zero to the appropriate uncorrected error counts.
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180928213934.19890-1-tony.luck@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8960de4a5ca7980ed1e19e7ca5a774d3b7a55c38 upstream.
Add new device IDs for family 17h, models 10h-2fh.
This is required by amd64_edac_mod in order to properly detect PCI
device functions 0 and 6.
Signed-off-by: Michael Jin <mikhail.jin@gmail.com>
Reviewed-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180816192840.31166-1-mikhail.jin@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Extend the driver to check whether segment number and bus number matches
when deciding how to group memory controller PCI devices to CPU sockets.
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180724190213.26359-1-msys.mizuma@gmail.com
[ Cleanup commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
In the quest to remove all stack VLA usage from the kernel[1], switch to
using a kmalloc-allocated buffer instead of stack space. This should be
fine since the existing routine is allocating memory too.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Jan Glauber <jglauber@cavium.com>
Cc: David Daney <david.daney@cavium.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180629184850.GA37464@beast
Link: https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com [1]
Signed-off-by: Borislav Petkov <bp@suse.de>
Make sure to free and deregister the addrmatch and chancounts devices
allocated during probe in all error paths. Also fix use-after-free in a
probe error path and in the remove success path where the devices were
being put before before deregistration.
Signed-off-by: Johan Hovold <johan@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 356f0a3086 ("i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy")
Link: http://lkml.kernel.org/r/20180612124335.6420-2-johan@kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
Make sure to use put_device() to free the initialised struct device so
that resources managed by driver core also gets released in the event of
a registration failure.
Signed-off-by: Johan Hovold <johan@kernel.org>
Cc: Denis Kirjanov <kirjanov@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 2d56b109e3 ("EDAC: Handle error path in edac_mc_sysfs_init() properly")
Link: http://lkml.kernel.org/r/20180612124335.6420-1-johan@kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
If regmap_write() fails, we should release some resources as done in all
the other error handling paths of the function.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180610174532.22071-1-christophe.jaillet@wanadoo.fr
Fixes: e9918d7faf ("EDAC, altera: Handle SDRAM Uncorrectable Errors on Stratix10")
Signed-off-by: Borislav Petkov <bp@suse.de>
ARM machines all have DMI tables so if they request hw error reporting
through GHES, then the driver should be able to detect DIMMs and report
errors successfully (famous last words :)).
Make the platform-based list x86-specific so that ghes_edac can load on
ARM.
Reported-by: Qiang Zheng <zhengqiang10@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Tested-by: Qiang Zheng <zhengqiang10@huawei.com>
Link: https://lkml.kernel.org/r/1526039543-180996-1-git-send-email-zhengqiang10@huawei.com
The kbuild test robot reported the following warning:
drivers/edac/altera_edac.c: In function 'ocram_free_mem':
drivers/edac/altera_edac.c:1410:42: warning: cast from pointer to integer
of different size [-Wpointer-to-int-cast]
gen_pool_free((struct gen_pool *)other, (u32)p, size);
^
After adding support for ARM64 architectures, the unsigned long
parameter is 64 bits and causes a build warning on 64-bit configs. Fix
by casting to the correct size (unsigned long) instead of u32.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: c3eea1942a ("EDAC, altera: Add Altera L2 cache and OCRAM support")
Link: http://lkml.kernel.org/r/1526317441-4996-1-git-send-email-thor.thayer@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Prevent build error when CONFIG_ACPI_NFIT=m and CONFIG_EDAC_SKX=y by
limiting EDAC_SKX based on how ACPI_NFIT is set.
Fixes this build error:
drivers/edac/skx_edac.o: In function `get_nvdimm_info':
../drivers/edac/skx_edac.c:399: undefined reference to `nfit_get_smbios_id'
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 58ca9ac146 ("EDAC, skx_edac: Detect non-volatile DIMMs")
Link: http://lkml.kernel.org/r/3af91354-8e19-d2af-1bba-ced8dce053f1@infradead.org
Signed-off-by: Borislav Petkov <bp@suse.de>
... for improved readability. Also, add a local mask variable for the
same reason.
No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
The ghes_edac driver obtains memory type from SMBIOS type 17, but it
does not recognize DDR4 and NVDIMM types.
Add support of DDR4 and NVDIMM types. NVDIMM type is denoted by memory
type DDR3/4 and non-volatile.
Reported-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180509222030.9299-1-toshi.kani@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
On Stratix10, uncorrectable errors are routed to the SError exception
instead of the IRQ exceptions. In Stratix10, uncorrectable SErrors
must be treated as fatal and will cause a panic. Older Altera/Intel
parts printed out a message for UE so do that here using the notifier
framework.
Record the UE in sticky registers that retain the state through a reset.
Check these registers on probe and printout the error on startup.
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mark.rutland@arm.com
Cc: mchehab@kernel.org
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1526079610-5527-1-git-send-email-thor.thayer@linux.intel.com
[ Remove unused var in s10_edac_dberr_handler(), reorder args. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
The use of the @ghes argument was removed in a previous commit, but
function signature was not updated to reflect this.
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Acked-by: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180430213358.8319-1-mr.nuke.me@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Add a null check for ghes_pvt, before dereferencing it. The pointer
could still be null in case the return path is taken before initialising
ghes_pvt in the registration function.
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Sughosh Ganu <sughosh.ganu@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Link: http://lkml.kernel.org/r/1524737809-24475-1-git-send-email-sughosh.ganu@arm.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Tony reported seeing
"Internal error: Can't find EDAC structure"
when injecting correctable errors due to the fact that ghes_edac would
still load even if the whitelist won't hit. Drop the pr_err() in
ghes_edac_report_mem_error() for now due to the hacky way how ghes_edac
depends on ghes.c.
While at it, make ghes_edac_register() return an error if it doesn't hit
in the whitelist as it is the only sensible thing to do in that
situation.
Furthermore, move the call to it to happen last in ghes_probe() so that
GHES initializing properly does not depend on ghes_edac init at all
as latter is only reporting errors and not required for GHES's proper
functioning.
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Tested-by: Sughosh Ganu <sughosh.ganu@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20180420182015.zao3olss4tvvlxki@agluck-desk
* misc fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAlrDZ6EACgkQEsHwGGHe
VUpWWA//RG9WqqdXxPa05TxiRFWMiNSH/HSGJQZsc3K7xm2TEGVdDv5thYsxXCaW
kA0oiUJA3zd62HeUIWMeGypm+hLM8T73836aIKcfxMduxQ6DR9nk51IV4RmHsLvT
rd4bUsrjCov6spapqS265LOxZVnpKvxZv0NVTj6pIk0pkeghWMZh63vZtrxVkLsS
6dHsdIDBr0fMhcFBVQ67SEDpMIyMAqziPOP9sv6NNElG4Q93RnZr6KYvQdE32Iev
Z6lIUU/sjVJEuhgp6FjJrhvjc+FmGJGERsFqo3Z8RT9P26Cj3wq0ov8IY/KSBazG
mfJFGbL+O3lppGKjjuAwMGr4R3fZMLQpgLxdYe7VNvOHB9ea7732q9MDGiNkN/u7
z8o+dtvmPWr0Y1ir8te+7LnSQXIAwxMkKP/lMStzzuFUETQDd0RWr7GAKlbrgPOU
+DhiQRPBFS7qpslA7BdqV6Ybr3nhDVZciDYWsvNpDMwdem05TxlFwC5l/BzP9NtM
FmQ3B0rQ5we6gib6UZWUjW3BH9wUgFUpu/WXKJl1unMHwYNe3CvdM8yD5W7mUylh
6SVYAmkZyuJ1Du40+YMeimroe77XV60IqoUFNEzMSFeB4/GxMG6+u2gUqNuTpHpZ
aMmCRj9DxWGxZyDsTc49An0X/JbVyIE/Aq6heMTYScgBvCaD1VE=
=K89j
-----END PGP SIGNATURE-----
Merge tag 'edac_for_4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
"Noteworthy is the NVDIMM support:
- NVDIMM support to EDAC (Tony Luck)
- misc fixes"
* tag 'edac_for_4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC, sb_edac: Remove variable length array usage
EDAC, skx_edac: Detect non-volatile DIMMs
firmware, DMI: Add function to look up a handle and return DIMM size
acpi, nfit: Add function to look up nvdimm device and provide SMBIOS handle
EDAC: Add new memory type for non-volatile DIMMs
EDAC: Drop duplicated array of strings for memory type names
EDAC, layerscape: Allow building for LS1021A
This removes the entire architecture code for blackfin, cris, frv, m32r,
metag, mn10300, score, and tile, including the associated device drivers.
I have been working with the (former) maintainers for each one to ensure
that my interpretation was right and the code is definitely unused in
mainline kernels. Many had fond memories of working on the respective
ports to start with and getting them included in upstream, but also saw
no point in keeping the port alive without any users.
In the end, it seems that while the eight architectures are extremely
different, they all suffered the same fate: There was one company
in charge of an SoC line, a CPU microarchitecture and a software
ecosystem, which was more costly than licensing newer off-the-shelf
CPU cores from a third party (typically ARM, MIPS, or RISC-V). It seems
that all the SoC product lines are still around, but have not used the
custom CPU architectures for several years at this point. In contrast,
CPU instruction sets that remain popular and have actively maintained
kernel ports tend to all be used across multiple licensees.
The removal came out of a discussion that is now documented at
https://lwn.net/Articles/748074/. Unlike the original plans, I'm not
marking any ports as deprecated but remove them all at once after I made
sure that they are all unused. Some architectures (notably tile, mn10300,
and blackfin) are still being shipped in products with old kernels,
but those products will never be updated to newer kernel releases.
After this series, we still have a few architectures without mainline
gcc support:
- unicore32 and hexagon both have very outdated gcc releases, but the
maintainers promised to work on providing something newer. At least
in case of hexagon, this will only be llvm, not gcc.
- openrisc, risc-v and nds32 are still in the process of finishing their
support or getting it added to mainline gcc in the first place.
They all have patched gcc-7.3 ports that work to some degree, but
complete upstream support won't happen before gcc-8.1. Csky posted
their first kernel patch set last week, their situation will be similar.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJawdL2AAoJEGCrR//JCVInuH0P/RJAZh1nTD+TR34ZhJq2TBoo
PgygwDU7Z2+tQVU+EZ453Gywz9/NMRFk1RWAZqrLix4ZtyIMvC6A1qfT2yH1Y7Fb
Qh6tccQeLe4ezq5u4S/46R/fQXu3Txr92yVwzJJUuPyU0arF9rv5MmI8e6p7L1en
yb74kSEaCe+/eMlsEj1Cc1dgthDNXGKIURHkRsILoweysCpesjiTg4qDcL+yTibV
FP2wjVbniKESMKS6qL71tiT5sexvLsLwMNcGiHPj94qCIQuI7DLhLdBVsL5Su6gI
sbtgv0dsq4auRYAbQdMaH1hFvu6WptsuttIbOMnz2Yegi2z28H8uVXkbk2WVLbqG
ZESUwutGh8MzOL2RJ4jyyQq5sfo++CRGlfKjr6ImZRv03dv0pe/W85062cK5cKNs
cgDDJjGRorOXW7dyU6jG2gRqODOQBObIv3w5efdq5OgzOWlbI4EC+Y5u1Z0JF/76
pSwtGXA6YhwC+9LLAlnVTHG+yOwuLmAICgoKcTbzTVDKA2YQZG/cYuQfI5S1wD8e
X6urPx3Md2GCwLXQ9mzKBzKZUpu/Tuhx0NvwF4qVxy6x1PELjn68zuP7abDHr46r
57/09ooVN+iXXnEGMtQVS/OPvYHSa2NgTSZz6Y86lCRbZmUOOlK31RDNlMvYNA+s
3iIVHovno/JuJnTOE8LY
=fQ8z
-----END PGP SIGNATURE-----
Merge tag 'arch-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pul removal of obsolete architecture ports from Arnd Bergmann:
"This removes the entire architecture code for blackfin, cris, frv,
m32r, metag, mn10300, score, and tile, including the associated device
drivers.
I have been working with the (former) maintainers for each one to
ensure that my interpretation was right and the code is definitely
unused in mainline kernels. Many had fond memories of working on the
respective ports to start with and getting them included in upstream,
but also saw no point in keeping the port alive without any users.
In the end, it seems that while the eight architectures are extremely
different, they all suffered the same fate: There was one company in
charge of an SoC line, a CPU microarchitecture and a software
ecosystem, which was more costly than licensing newer off-the-shelf
CPU cores from a third party (typically ARM, MIPS, or RISC-V). It
seems that all the SoC product lines are still around, but have not
used the custom CPU architectures for several years at this point. In
contrast, CPU instruction sets that remain popular and have actively
maintained kernel ports tend to all be used across multiple licensees.
[ See the new nds32 port merged in the previous commit for the next
generation of "one company in charge of an SoC line, a CPU
microarchitecture and a software ecosystem" - Linus ]
The removal came out of a discussion that is now documented at
https://lwn.net/Articles/748074/. Unlike the original plans, I'm not
marking any ports as deprecated but remove them all at once after I
made sure that they are all unused. Some architectures (notably tile,
mn10300, and blackfin) are still being shipped in products with old
kernels, but those products will never be updated to newer kernel
releases.
After this series, we still have a few architectures without mainline
gcc support:
- unicore32 and hexagon both have very outdated gcc releases, but the
maintainers promised to work on providing something newer. At least
in case of hexagon, this will only be llvm, not gcc.
- openrisc, risc-v and nds32 are still in the process of finishing
their support or getting it added to mainline gcc in the first
place. They all have patched gcc-7.3 ports that work to some
degree, but complete upstream support won't happen before gcc-8.1.
Csky posted their first kernel patch set last week, their situation
will be similar
[ Palmer Dabbelt points out that RISC-V support is in mainline gcc
since gcc-7, although gcc-7.3.0 is the recommended minimum - Linus ]"
This really says it all:
2498 files changed, 95 insertions(+), 467668 deletions(-)
* tag 'arch-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (74 commits)
MAINTAINERS: UNICORE32: Change email account
staging: iio: remove iio-trig-bfin-timer driver
tty: hvc: remove tile driver
tty: remove bfin_jtag_comm and hvc_bfin_jtag drivers
serial: remove tile uart driver
serial: remove m32r_sio driver
serial: remove blackfin drivers
serial: remove cris/etrax uart drivers
usb: Remove Blackfin references in USB support
usb: isp1362: remove blackfin arch glue
usb: musb: remove blackfin port
usb: host: remove tilegx platform glue
pwm: remove pwm-bfin driver
i2c: remove bfin-twi driver
spi: remove blackfin related host drivers
watchdog: remove bfin_wdt driver
can: remove bfin_can driver
mmc: remove bfin_sdh driver
input: misc: remove blackfin rotary driver
input: keyboard: remove bf54x driver
...
The Tile architecture is obsolete and getting removed from the kernel,
this driver appears to only be used there, and not on the ARM based
successors (Tile-Mx, BlueField), so we should remove it as well.
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
In preparation for enabling -Wvla, remove VLA and replace it with a
fixed-length array instead.
Also, remove max_interleave as it is no longer needed.
Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180314182131.GA25259@embeddedgus
Signed-off-by: Borislav Petkov <bp@suse.de>
This just covers the topology function of the EDAC driver. We locate
which DIMM slots are populated with NVDIMMs and query the NFIT and
SMBIOS tables to get the size.
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/20180312182430.10335-6-tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
There are now non-volatile versions of DIMMs. Add a new entry to "enum
mem_type" and a new string in edac_mem_types[].
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/20180312182430.10335-3-tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Somehow we ended up with two separate arrays of strings to describe the
"enum mem_type" values.
In edac_mc.c we have an exported list edac_mem_types[] that is used
by a couple of drivers in debug messaged.
In edac_mc_sysfs.c we have a private list that is used to display
values in:
/sys/devices/system/edac/mc/mc*/dimm*/dimm_mem_type
/sys/devices/system/edac/mc/mc*/csrow*/mem_type
This list was missing a value for MEM_LRDDR3.
The string values in the two lists were different :-(
Combining the lists, I kept the values so that the sysfs output
will be unchanged as some scripts may depend on that.
Reported-by: Borislav Petkov <bp@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/20180312182430.10335-2-tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
The LS1021A has a memory controller supported by this driver. It builds
just fine, and I've done some rudimentary testing using the error
injection facility, which suggests that it is indeed working.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Acked-by: York Sun <york.sun@nxp.com>
Cc: Alexander Stein <alexander.stein@systec-electronic.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180220150912.2954-1-rasmus.villemoes@prevas.dk
Signed-off-by: Borislav Petkov <bp@suse.de>
Commit
3286d3eb90 ("EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4")
decreased NUM_CHANNELS from 8 to 4, but this is not enough for Knights
Landing which supports up to 6 channels.
This caused out-of-bounds writes to pvt->mirror_mode and pvt->tolm
variables which don't pay critical role on KNL code path, so the memory
corruption wasn't causing any visible driver failures.
The easiest way of fixing it is to change NUM_CHANNELS to 6. Do that.
An alternative solution would be to restructure the KNL part of the
driver to 2MC/3channel representation.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Anna Karbownik <anna.karbownik@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: jim.m.snow@intel.com
Cc: krzysztof.paliswiat@intel.com
Cc: lukasz.odzioba@intel.com
Cc: qiuxu.zhuo@intel.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Fixes: 3286d3eb90 ("EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4")
Link: http://lkml.kernel.org/r/1519312693-4789-1-git-send-email-anna.karbownik@intel.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Currently, bank 4 is reserved on Fam17h, so we chose not to initialize
bank 4 in the smca_banks array. This means that when we check if a bank
is initialized, like during boot or resume, we will see that bank 4 is
not initialized and try to initialize it.
This will cause a call trace, when resuming from suspend, due to
rdmsr_*on_cpu() calls in the init path. The rdmsr_*on_cpu() calls issue
an IPI but we're running with interrupts disabled. This triggers:
WARNING: CPU: 0 PID: 11523 at kernel/smp.c:291 smp_call_function_single+0xdc/0xe0
...
Reserved banks will be read-as-zero, so their MCA_IPID register will be
zero. So, like the smca_banks array, the threshold_banks array will not
have an entry for a reserved bank since all its MCA_MISC* registers will
be zero.
Enumerate a "Reserved" bank type that matches on a HWID_MCATYPE of 0,0.
Use the "Reserved" type when checking if a bank is reserved. It's
possible that other bank numbers may be reserved on future systems.
Don't try to find the block address on reserved banks.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.14.x
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180221101900.10326-7-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 PTI and Spectre related fixes and updates from Ingo Molnar:
"Here's the latest set of Spectre and PTI related fixes and updates:
Spectre:
- Add entry code register clearing to reduce the Spectre attack
surface
- Update the Spectre microcode blacklist
- Inline the KVM Spectre helpers to get close to v4.14 performance
again.
- Fix indirect_branch_prediction_barrier()
- Fix/improve Spectre related kernel messages
- Fix array_index_nospec_mask() asm constraint
- KVM: fix two MSR handling bugs
PTI:
- Fix a paranoid entry PTI CR3 handling bug
- Fix comments
objtool:
- Fix paranoid_entry() frame pointer warning
- Annotate WARN()-related UD2 as reachable
- Various fixes
- Add Add Peter Zijlstra as objtool co-maintainer
Misc:
- Various x86 entry code self-test fixes
- Improve/simplify entry code stack frame generation and handling
after recent heavy-handed PTI and Spectre changes. (There's two
more WIP improvements expected here.)
- Type fix for cache entries
There's also some low risk non-fix changes I've included in this
branch to reduce backporting conflicts:
- rename a confusing x86_cpu field name
- de-obfuscate the naming of single-TLB flushing primitives"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
x86/entry/64: Fix CR3 restore in paranoid_exit()
x86/cpu: Change type of x86_cache_size variable to unsigned int
x86/spectre: Fix an error message
x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping
selftests/x86/mpx: Fix incorrect bounds with old _sigfault
x86/mm: Rename flush_tlb_single() and flush_tlb_one() to __flush_tlb_one_[user|kernel]()
x86/speculation: Add <asm/msr-index.h> dependency
nospec: Move array_index_nospec() parameter checking into separate macro
x86/speculation: Fix up array_index_nospec_mask() asm constraint
x86/debug: Use UD2 for WARN()
x86/debug, objtool: Annotate WARN()-related UD2 as reachable
objtool: Fix segfault in ignore_unreachable_insn()
selftests/x86: Disable tests requiring 32-bit support on pure 64-bit systems
selftests/x86: Do not rely on "int $0x80" in single_step_syscall.c
selftests/x86: Do not rely on "int $0x80" in test_mremap_vdso.c
selftests/x86: Fix build bug caused by the 5lvl test which has been moved to the VM directory
selftests/x86/pkeys: Remove unused functions
selftests/x86: Clean up and document sscanf() usage
selftests/x86: Fix vDSO selftest segfault for vsyscall=none
x86/entry/64: Remove the unused 'icebp' macro
...