Plug a memory leak in dwarf_unwinder_dump() where we didn't free the
memory that we had previously allocated for the DWARF frames and DWARF
registers.
Now is also a opportune time to implement our own mempool and kmem
cache. It's a good idea to have a certain number of frame and register
objects in reserve at all times, so that we are guaranteed to have our
allocation satisfied even when memory is scarce. Since we have pools to
allocate from we can implement the registers for each frame as a linked
list as opposed to a sparsely populated array. Whilst it's true that the
lookup time for a linked list is larger than for arrays, there's only
usually a maximum of 8 registers per frame. So the overhead isn't that
much of a concern.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
This moves the initialization over to an early_initcall(). This fixes up
some lockdep interaction issues. At the same time, kill off some
superfluous locking in the init path.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Also, remove the "fix" to DW_CFA_def_cfa_register where we reset the
frame's cfa_offset to 0. This action is incorrect when handling
DW_CFA_def_cfa_register as the DWARF spec specifically states that the
previous contents of cfa_offset should be used with the new
register. The reason that I thought cfa_offset should be reset to 0 was
because it was being assigned a bogus value prior to executing the
DW_CFA_def_cfa_register op. It turns out that the bogus cfa_offset value
came from interpreting .cfi_escape pseudo-ops (those used by the GNU
extensions) as CFA_DW_def_cfa ops.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
The previous hack for calculating the return address for the first frame
we unwind (dwarf_unwinder_dump) didn't always work. The problem was that
it assumed once it read the rule for calculating the return address,
there would be no new rules for calculating it. This isn't true because
the way in which the CFA is calculated can change as you progress
through a function and the return address is figured out using the
CFA. Therefore, the way to calculate the return address can change.
So, instead of using some offset from the beginning of
dwarf_unwind_stack which is just a flakey approach, and instead of
executing instructions from the FDE until the return address is setup,
we now figure out the pc in dwarf_unwind_stack() just before we call
dwarf_cfa_execute_insns().
Signed-off-by: Matt Fleming <matt@console-pimps.org>
The way that the CFA is calculated can change as we progress through a
function. If we see a DW_CFA_def_cfa_register op we need to reset the
frame's cfa_offset value which may have been previously setup.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
save_stack_trace_tsk() and friends can be called from atomic context (as
triggered by latencytop), and subsequently hit two problematic allocation
points that were using GFP_KERNEL (these were dwarf_unwind_stack() and
dwarf_frame_alloc_regs()). Convert these over to GFP_ATOMIC and get
latencytop working with the DWARF unwinder.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Trying to figure out the best value for DWARF_ARCH_UNWIND_OFFSET is
tricky at best. Various things can change the size (and offset from the
beginning of the function) of the prologue. Notably, turning on ftrace
adds calls to mcount at the beginning of functions, thereby pushing the
prologue further into the function.
So replace DWARF_ARCH_UNWIND_OFFSET with some code that continues to
execute CFA instructions until the value of return address register is
defined. This is safe to do because we know that the return address must
have been pushed onto the frame before our first function call; we just
can't figure out where at compile-time.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
The destination address might be unaligned, so set it with
put_unaligned() for safety. This restores the previous behaviour, albeit
through the proper API.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This was using internal symbols for unaligned accesses, bypassing the
exposed interface for variable sized safe accesses. This converts all of
the __get_unaligned_cpuXX() users over to get_unaligned() directly,
relying on the cast to select the proper internal routine.
Additionally, the __put_unaligned_cpuXX() case is superfluous given that
the destination address is aligned in all of the current cases, so just
drop that outright.
Furthermore, this switches to the asm/unaligned.h header instead of the
asm-generic version, which was silently bypassing the SH-4A optimized
unaligned ops.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This is a first cut at a generic DWARF unwinder for the kernel. It's
still lacking DWARF64 support and the DWARF expression support hasn't
been tested very well but it is generating proper stacktraces on SH for
WARN_ON() and NULL dereferences.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>