This is the MCE error code from the MCi_STATUS banks, bits [15:0] which
describe what type of error was encountered: GART TLB, Memory or Bus
error. The semantics of those bits are identical across all MCE banks so
decode those separately, irrespectively of MCE type.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
The MCi_STATUS registers have most field definitions in common so decode
them in the general path. Do not pass ecc_type along and compute it in
__amd64_decode_bus_error instead.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Move NB decoder along with required defines to EDAC MCE core. Add
registration routines for further decoding of the MCE info in the AMD64
EDAC module.
CC: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This is in preparation of adding AMD-specific MCE decoding functionality
to the EDAC core. The error decoding macros originate from the AMD64
EDAC driver albeit in a simplified and cleaned up version here.
While at it, add macros to generate the error description strings and
use them in the error type decoders directly which removes a bunch of
code and makes the decoding functions much more readable. Also, fix
strings and shorten macro names.
Remove superfluous htlink_msgs.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
On the good path of BIOS enabled ECC and no override, the value returned
is 1 by omission and thus is deemed failing by the probe-function.
Allow proper module initialization by clearing the retval explicitly.
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
amd64_check_ecc_enabled() returns non-zero status when ECC
checking/correcting is disabled and this fails further loading of the
driver even when 'ecc_enable_override' boot param is used.
Fix that by clearing return status in that case.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Checking whether the machine is using ECC enabled DRAM is done through
testing the DimmEccEn bit in the DRAM Cfg Low register (F2x[1,0]90). Do
that instead of testing all bits from the DimmEccEn upwards.
Also, remove mci->edac_cap assignment and use value returned from
amd64_determine_edac_cap().
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Also, link into Kbuild by adding Kconfig and Makefile entries.
Borislav:
- Kconfig/Makefile splitting
- use zero-sized arrays for the sysfs attrs if not enabled
- rename sysfs attrs to more conform values
- shorten CONFIG_ names
- make multiple structure members assignment vertically aligned
- fix/cleanup comments
- fix function return value patterns
- fix err labels
- fix a memleak bug caught by Ingo
- remove the NUMA dependency and use num_k8_northbrides for initializing
a driver instance per NB.
- do not copy the pvt contents into the mci struct in
amd64_init_2nd_stage() and save it in the mci->pvt_info void ptr
instead.
- cleanup debug calls
- simplify amd64_setup_pci_device()
Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Borislav:
- convert to the new {rd|wr}msr_on_cpus interfaces.
- convert pvt->old_mcgctl to a bitmask thus saving some bytes
- fix/cleanup comments
- fix function return value patterns
- add a proper bugfix found by Doug to amd64_check_ecc_enabled where we
missed checking for the ECC enabled bit in NB CFG.
- cleanup debug calls
Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Borislav:
- compute dct_sel_base_off in f10_match_to_this_node() correctly since
it cannot be assumed that the Reserved bits are zero and they have to be
masked out instead.
- cleanup, remove StinkyIdentifiers, simplify logic
- fix function return value patterns
- cleanup debug calls
Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Borislav:
- fix a wrong negation in f10_determine_base_addr_offset()
- fix a wrong mask in f10_determine_base_addr_offset() which should
select DctSelBaseAddr[31:11] and not [31:16] as it was before
- remove StinkyIdentifiers, trivially simplify code.
- fix/cleanup comments
- fix function return value patterns
Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Borislav:
Fail f10_early_channel_count() if error encountered while reading a NB
register since those cached register contents are accessed afterwards.
- fix/cleanup comments
- fix function return value patterns
- cleanup debug calls
Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>