Commit graph

171 commits

Author SHA1 Message Date
Vineet Gupta
5a45da02cf ARC: Adjustments for gcc 4.8
* DWARF unwinder related
  + Force DWARF2 compliant .debug_frame (gcc 4.8 defaults to DWARF4
    which kernel unwinder can't grok).
  + Discard the additional .eh_frame generated
  + Discard the dwarf4 debug info generated by -gdwarf-2 for normal
    no debug case

* 4.8 already uses arc600 multilibs for -mno-mpy

* switch to using uclibc compiler (to get -mmedium-calls and -mno-sdata)
  and also since buildroot can only use 1 toolchain

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-27 14:35:32 +05:30
Vineet Gupta
05b016ecf5 ARC: Setup Vector Table Base in early boot
Otherwise early boot exceptions such as instructions errors due to
configuration mismatch between kernel and hardware go off to la-la land,
as opposed to hitting the handler and panic()'ing properly.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-26 15:30:51 +05:30
Vineet Gupta
38a9ff6d24 ARC: Remove explicit passing around of ECR
With ECR now part of pt_regs

* No need to propagate from lowest asm handlers as arg
* No need to save it in tsk->thread.cause_code
* Avoid bit chopping to access the bit-fields

More code consolidation, cleanup

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-26 15:30:50 +05:30
Vineet Gupta
502a0c775c ARC: pt_regs update #5: Use real ECR for pt_regs->event vs. synth values
pt_regs->event was set with artificial values to identify the low level
system event (syscall trap / breakpoint trap / exceptions / interrupts)

With r8 saving out of the way, the full word can be used to save real
ECR (Exception Cause Register) which helps idenify the event naturally,
including additional info such as cause code, param.
Only for Interrupts, where ECR is not applicable, do we resort to
synthetic non ECR values.

SAVE_ALL_TRAP/EXCEPTIONS can now be merged as they both use ECR with
different runtime values.

The ptrace helpers now use the sub-fields of ECR to distinguish the
events (e.g. vector 0x25 is trap, param 0 is syscall...)

The following benefits will follow:

(1) This centralizes the location of where ECR is saved and will allow
    the cleanup of task->thread.cause_code ECR placeholder which is set
    in non-uniform way. Then ARC VM code can safely rely on it being
    there for purpose of finer grained VM_EXEC dcache flush (based on
    exec fault: I-TLB Miss)

(2) Further, ECR being passed around from low level handlers as arg can
    be eliminated as it is part of standard reg-file in pt_regs

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-26 14:04:48 +05:30
Vineet Gupta
352c1d95e3 ARC: stop using pt_regs->orig_r8
Historically, pt_regs have had orig_r8, an overloaded container for
  (1) backup copy of r8 (syscall number Trap Exceptions)
  (2) additional system state: (syscall/Exception/Interrupt)

There is no point in keeping (1) since syscall number is never clobbered
in-place, in pt_regs, unlike r0 which duals as first syscall arg as well
as syscall return value and in case of syscall restart, the orig arg0
needs restoring (from orig_r0)  after having been updated in-place with
syscall ret value.

This further paves way to convert (2) to contain ECR itself (rather than
current madeup values)

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 19:23:26 +05:30
Vineet Gupta
359105bdb0 ARC: pt_regs update #4: r25 saved/restored unconditionally
(This is a VERY IMP change for low level interrupt/exception handling)

-----------------------------------------------------------------------
WHAT
-----------------------------------------------------------------------
* User 25 now saved in pt_regs->user_r25 (vs. tsk->thread_info.user_r25)

* This allows Low level interrupt code to unconditionally save r25
  (vs. the prev version which would only do it for U->K transition).
  Ofcourse for nested interrupts, only the pt_regs->user_r25 of
  bottom-most frame is useful.

* simplifies the interrupt prologue/epilogue

* Needed for ARCv2 ISA code and done here to keep design similar with
  ARCompact event handling

-----------------------------------------------------------------------
WHY
-------------------------------------------------------------------------
With CONFIG_ARC_CURR_IN_REG, r25 is used to cache "current" task pointer
in kernel mode. So when entering kernel mode from User Mode
- user r25 is specially safe-kept (it being a callee reg is NOT part of
  pt_regs which are saved by default on each interrupt/trap/exception)
- r25 loaded with current task pointer.

Further, if interrupt was taken in kernel mode, this is skipped since we
know that r25 already has valid "current" pointer.

With 2 level of interrupts in ARCompact ISA, detecting this is difficult
but still possible, since we could be in kernel mode but r25 not already saved
(in fact the stack itself might not have been switched).

A. User mode
B. L1 IRQ taken
C. L2 IRQ taken (while on 1st line of L1 ISR)

So in #C, although in kernel mode, r25 not saved (infact SP not
switched at all)

Given that ARcompact has manual stack switching, we could use a bit of
trickey - The low level code would make sure that SP is only set to kernel
mode value at the very end (after saving r25). So a non kernel mode SP,
even if in kernel mode, meant r25 was NOT saved.

The same paradigm won't work in ARCv2 ISA since SP is auto-switched so
it's setting can't be delayed/constrained.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 19:23:25 +05:30
Vineet Gupta
ba3558c772 ARC: K/U SP saved from one location in stack switching macro
This paves way for further simplifications.

There's an overhead of 1 insn for the non-common case of interrupt taken
from kernel mode.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 19:23:25 +05:30
Vineet Gupta
147aece29b ARC: Entry Handler tweaks: Simplify branch for in-kernel preemption
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 19:23:24 +05:30
Vineet Gupta
1898a959b7 ARC: Entry Handler tweaks: Avoid hardcoded LIMMS for ECR values
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 19:23:23 +05:30
Vineet Gupta
3ebedbb2fd ARC: Increase readability of entry handlers
* use artificial PUSH/POP contructs for CORE Reg save/restore to stack
* use artificial PUSHAX/POPAX contructs for Auxiliary Space regs
* macro'ize multiple copies of callee-reg-save/restore (SAVE_R13_TO_R24)
* use BIC insn for inverse-and operation

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 19:23:23 +05:30
Vineet Gupta
16f9afe651 ARC: pt_regs update #3: Remove unused gutter at start of callee_regs
This is trickier than prev two:

* context switching code saves kernel mode callee regs in the format of
  struct callee_regs thus needs adjustment. This also reduces the height
  of topmost kernel stack frame by 1 word.

* Since kernel stack unwinder is sensitive to height of topmost kernel
  stack frame, that needs a word of adjustment too.

ptrace needs a bit of updating since pt_regs now diverges from
user_regs_struct.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 19:23:22 +05:30
Vineet Gupta
2fa919045b ARC: pt_regs update #2: Remove unused gutter at start of pt_regs
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 19:23:22 +05:30
Vineet Gupta
283237a04f ARC: pt_regs update #1: Align pt_regs end with end of kernel stack page
Historically, pt_regs would end at offset of 1 word from end of stack
page.

        -----------------  -> START of page (task->stack)
        |               |
        | thread_info   |
        -----------------
        |               |
   ^    ~               ~
   |    ~               ~
   |    |               |
   |    |               | <---- pt_regs used to END here
        -----------------
        | 1 word GUTTER |
        ----------------- -> End of page (START of kernel stack)

This required special "one-off" considerations in low level code.

The root cause is very likely assumption of "empty" SP by the original
ARC kernel hackers, despite ARC700 always been "full" SP.

So finally RIP one word gutter !

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 19:23:21 +05:30
Vineet Gupta
bed30976e7 ARC: pt_regs update #0: remove kernel stack canary
This stack slot is going to be used in subsequent commits

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 19:23:21 +05:30
Vineet Gupta
3e1ae44188 ARC: [mm] Remove @write argument to do_page_fault()
This can be ascertained within do_page_fault() since it gets the full
ECR (Exception Cause Register).

Further, for both the callers of do_page_fault(): Prot-V / D-TLB-Miss,
the cause sub-fields in ECR are same for same type of access, making the
code much more simpler.

D-TLB-Miss [LD] 0x00_21_01_00
Prot-V     [LD] 0x00_23_01_00
                        ^^
D-TLB-Miss [ST] 0x00_21_02_00
Prot-V     [ST] 0x00_23_02_00
                        ^^
D-TLB-Miss [EX] 0x00_21_03_00
Prot-V     [EX] 0x00_23_03_00
                        ^^

This helps code consolidation, which is even better when moving code from
assembler to "C".

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 19:23:20 +05:30
Vineet Gupta
3abc944802 ARC: [mm] Make stack/heap Non-executable by default
1. For VM_EXEC based delayed dcache/icache flush, reduces the number of
   flushes.

2. Makes this security feature ON by default rather than OFF before.

3. Applications can use mprotect() to selectively override this.

4. ELF binaries have a GNU_STACK segment which can easily override the
   kernel default permissions.
   For nested-functions/trampolines, gcc already auto-enables executable
   stack in elf. Others needing this can use -Wl,-z,execstack option.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 19:23:20 +05:30
Vineet Gupta
2ed21dae02 ARC: [mm] Assume pagecache page dirty by default
Similar to ARM/SH

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 19:23:19 +05:30
Vineet Gupta
fedf5b9baf ARC: [mm] optimise VIPT dcache aliasing 2/x
Non-congruent SRC page in copy_user_page() is dcache clean in the end -
so record that fact, to avoid a subsequent extraneous flush.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 19:23:19 +05:30
Vineet Gupta
5971bc719d ARC: [mm] optimise VIPT dcache aliasing 1/x
flush_cache_page() - kills icache only if page is executable

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 19:23:18 +05:30
Vineet Gupta
29b93c68bf ARC: [mm] Zero page optimization
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 19:23:18 +05:30
Alexey Brodkin
2f9e99618f ARC: make dcache VIPT aliasing support dependant on dcache
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 19:23:17 +05:30
Vineet Gupta
336e199e9c ARC: No-op full icache flush if !CONFIG_ARC_HAS_ICACHE
Also remove extraneous irq disabling in flush_cache_all() callstack

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 19:22:42 +05:30
Vineet Gupta
3049918660 ARC: cache detection code bitrot
* Number of (i|d)cache ways can be retrieved from BCRs and hence no need
  to cross check with with built-in constants
* Use of IS_ENABLED() to check for a Kconfig option
* is_not_cache_aligned() not used anymore

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 13:46:43 +05:30
Vineet Gupta
6546415226 ARC: Reduce Code for ECR printing
Cause codes are same for D-TLB-Miss and Prot-V

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 13:46:42 +05:30
Vineet Gupta
da1677b02d ARC: Disintegrate arcregs.h
* Move the various sub-system defines/types into relevant files/functions
  (reduces compilation time)

* move CPU specific stuff out of asm/tlb.h into asm/mmu.h

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 13:46:42 +05:30
Vineet Gupta
18437347b9 ARC: More code beautification with IS_ENABLED()
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 13:46:42 +05:30
Vineet Gupta
8235703e10 ARC: Use kconfig helper IS_ENABLED() to get rid of defines.h
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 13:46:42 +05:30
Mischa Jonker
ba5afadb11 ARC: [plat-arcfpga] Fix build breakage when !CONFIG_ARC_SERIAL
This fixes the following:
- CONFIG_ARC_SERIAL_BAUD is only defined when CONFIG_SERIAL_ARC is defined.
  Make sure that it isn't referenced otherwise.
- There is no use for initializing arc_uart_info[] when CONFIG_SERIAL_ARC is
  not defined.

[vgupta: tweaked changelog title, used IS_ENABLED() kconfig helper]
Signed-off-by: Mischa Jonker <mjonker@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 13:46:41 +05:30
Vineet Gupta
7bb66f6e6e ARC: lazy dcache flush broke gdb in non-aliasing configs
gdbserver inserting a breakpoint ends up calling copy_user_page() for a
code page. The generic version of which (non-aliasing config) didn't set
the PG_arch_1 bit hence update_mmu_cache() didn't sync dcache/icache for
corresponding dynamic loader code page - causing garbade to be executed.

So now aliasing versions of copy_user_highpage()/clear_page() are made
default. There is no significant overhead since all of special alias
handling code is compiled out for non-aliasing build

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-25 14:15:55 +05:30
Vineet Gupta
006dfb3c9c ARC: Use enough bits for determining page's cache color
The current code uses 2 bits for determining page's dcache color, thus
sorting pages into 4 bins, whereas the aliasing dcache really has 2 bins
(8k page, 64k dcache - 4 way-set-assoc).
This can cause extraneous flushes - e.g. color 0 and 2.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-23 14:25:09 +05:30
Vineet Gupta
3e87974dec ARC: Brown paper bag bug in macro for checking cache color
The VM_EXEC check in update_mmu_cache() was getting optimized away
because of a stupid error in definition of macro addr_not_cache_congruent()

The intention was to have the equivalent of following:

	if (a || (1 ? b : 0))

but we ended up with following:

	if (a || 1 ? b : 0)

And because precedence of '||' is more that that of '?', gcc was optimizing
away evaluation of <a>

Nasty Repercussions:
1. For non-aliasing configs it would mean some extraneous dcache flushes
   for non-code pages if U/K mappings were not congruent.
2. For aliasing config, some needed dcache flush for code pages might
   be missed if U/K mappings were congruent.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-23 14:24:52 +05:30
Vineet Gupta
a950549c67 ARC: copy_(to|from)_user() to honor usermode-access permissions
This manifested as grep failing psuedo-randomly:

-------------->8---------------------
[ARCLinux]$ ip address show lo | grep inet
[ARCLinux]$ ip address show lo | grep inet
[ARCLinux]$ ip address show lo | grep inet
[ARCLinux]$
[ARCLinux]$ ip address show lo | grep inet
    inet 127.0.0.1/8 scope host lo
-------------->8---------------------

ARC700 MMU provides fully orthogonal permission bits per page:
Ur, Uw, Ux, Kr, Kw, Kx

The user mode page permission templates used to have all Kernel mode
access bits enabled.
This caused a tricky race condition observed with uClibc buffered file
read and UNIX pipes.

1. Read access to an anon mapped page in libc .bss: write-protected
   zero_page mapped: TLB Entry installed with Ur + K[rwx]

2. grep calls libc:getc() -> buffered read layer calls read(2) with the
   internal read buffer in same .bss page.
   The read() call is on STDIN which has been redirected to a pipe.
   read(2) => sys_read() => pipe_read() => copy_to_user()

3. Since page has Kernel-write permission (despite being user-mode
   write-protected), copy_to_user() suceeds w/o taking a MMU TLB-Miss
   Exception (page-fault for ARC). core-MM is unaware that kernel
   erroneously wrote to the reserved read-only zero-page (BUG #1)

4. Control returns to userspace which now does a write to same .bss page
   Since Linux MM is not aware that page has been modified by kernel, it
   simply reassigns a new writable zero-init page to mapping, loosing the
   prior write by kernel - effectively zero'ing out the libc read buffer
   under the hood - hence grep doesn't see right data (BUG #2)

The fix is to make all kernel-mode access permissions mirror the
user-mode ones. Note that the kernel still has full access to pages,
when accessed directly (w/o MMU) - this fix ensures that kernel-mode
access in copy_to_from() path uses the same faulting access model as for
pure user accesses to keep MM fully aware of page state.

The issue is peudo-random because it only shows up if the TLB entry
installed in #1 is present at the time of #3. If it is evicted out, due
to TLB pressure or some-such, then copy_to_user() does take a TLB Miss
Exception, with a routine write-to-anon COW processing installing a
fresh page for kernel writes and also usable as it is in userspace.

Further the issue was dormant for so long as it depends on where the
libc internal read buffer (in .bss) is mapped at runtime.
If it happens to reside in file-backed data mapping of libc (in the
page-aligned slack space trailing the file backed data), loader zero
padding the slack space, does the early cow page replacement, setting
things up at the very beginning itself.

With gcc 4.8 based builds, the libc buffer got pushed out to a real
anon mapping which triggers the issue.

Reported-by: Anton Kolesov <akolesov@synopsys.com>
Cc: <stable@vger.kernel.org> # 3.9
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-23 10:33:03 +05:30
Vineet Gupta
f538881cc6 ARC: [mm] Prevent stray dcache lines after__sync_icache_dcach()
Flush and INVALIDATE the dcache page.

This helper is only used for writeback of CODE pages to memory. So
there's no value in keeping the dcache lines around. Infact it is risky
as a writeback on natural eviction under pressure can cause un-needed
writeback with weird issues on aliasing dcache configurations.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-23 10:26:33 +05:30
Christian Ruppert
7d19273cd0 ARC: [TB10x] Remove redundant abilis,simple-pinctrl mechanism
The TB10x platform port includes a custom mechanism using to set up
default pin controller configurations using abilis,simple-default
pin configurations of nodes compatible with abilis,simple-pinctrl. This
mechanism is redundant with the Linux standard "default" pin
configuration, see commit ab78029ecc
"drivers/pinctrl: grab default handles from device core".
This patch removes the TB10x custom mechanism in favour of the Linux
standard.

Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com>
Reviewed-by: Stephen Warren <swarren@nvidia.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-15 10:12:03 +05:30
Linus Torvalds
6019958d14 Aliasing VIPT dcache support for ARC
I'm satisified with testing, specially with fuse which has historically given
 grief to VIPT arches (ARM/PARISC...)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRjPVlAAoJEGnX8d3iisJeF60P/36JELOrdTOJB8GDn/i22yu4
 zxFrroH4EpXCobe4JzrmhXU0vdMKmyhaRsdmQt2pV474LX1ajQeoRi6n7xuiYymr
 p0H8/jde7ZDHGxJRP9OIuIIU57N+sVwQvrXvfnz/dc4c92MD9EWEwvoytnCVuKSx
 k/YIQ5oDj6hFH13V68eXXs+KBrXyPiYO9UIWYmwRZv/0Bm6P1EpXQSMWzeCP0X/y
 FHVEc4i92bIkqpOadUnhCBHGZqjkrhwFL2FCI35/12TgSwjPU1Q6IJCtH7Lg9ygI
 oFqut5wN+dhkxlfiyvgTXErmnvhDOHLbMQJYC9svHin8TtfznQhJDAWYeSH/6Mde
 /hE/bXV55ucdJ48ZT6JRvxHeJQgTAvbXDIIQYe/wAJEvXidWEo4Y/qRTIXP03Ixm
 Ie69d9WSmUlx4FmYxBQodDQFK9slnc+0UfsWcCG+OrT0wwrKBRr3To2WYEFowQer
 fpIk6/68+LiQK+IzFAhPtBppSgcQoEBsXAD/MDKsQcRk84W1nKPt6SiT/wCAzOZ7
 ezTL1eSlfzWXNbtohdm8xN8frt/4fZLZTaZkqtnMPBi0iIC5DAsJy27YlBQDaRd/
 Hzd0y6bJQDcsjjWmZuUkFZVbCyYlE0CIW6sbHC91i51egKReYHy7T6Ef/n+qn+ho
 jxohq5/JvG+0Vha0Q2h6
 =r4lw
 -----END PGP SIGNATURE-----

Merge tag 'arc-v3.10-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull second set of arc arch updates from Vineet Gupta:
 "Aliasing VIPT dcache support for ARC

  I'm satisified with testing, specially with fuse which has
  historically given grief to VIPT arches (ARM/PARISC...)"

* tag 'arc-v3.10-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: [TB10x] Remove GENERIC_GPIO
  ARC: [mm] Aliasing VIPT dcache support 4/4
  ARC: [mm] Aliasing VIPT dcache support 3/4
  ARC: [mm] Aliasing VIPT dcache support 2/4
  ARC: [mm] Aliasing VIPT dcache support 1/4
  ARC: [mm] refactor the core (i|d)cache line ops loops
  ARC: [mm] serious bug in vaddr based icache flush
2013-05-10 07:24:14 -07:00
Vineet Gupta
e7d5bab5ef ARC: [TB10x] Remove GENERIC_GPIO
This tracks Alexandre Courbot's mainline GPIO rework

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Alexandre Courbot <acourbot@nvidia.com>
2013-05-10 09:50:33 +05:30
Linus Torvalds
e30f419245 ARC port updates for Linux 3.10 (part 1)
* Support for two new platforms based on ARC700
  - Abilis TB10x SoC [Chritisian/Pierrick]
  - Simulator only System-C Model [Mischa]
 
 * ARC specific MM improvements
  - Avoid full TLB flush (ASID increment) on munmap (even single page)
  - VIPT Cache Flushing improvements
    + Delayed dcache flush for non-aliasing dcache (big performance boost)
    + icache flush aliasing agnostic (no need to kill all possible aliases)
 
 * Others
  - Avoid needless rebuild of DTB files for every kernel build
  - Remove builtin cmdline as that is already provided by DeviceTree/bootargs
  - Fixing unaligned access emulation corner case
  - checkpatch fixes [Sachin]
  - Various fixlets [Noam]
  - Minor build failures/cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRiydEAAoJEGnX8d3iisJewBkQAJ/cvIjrIuMMdeDo0bokzZN2
 bfsG8U+V7S0CKjqLyUD1bMRXo7rTgus8hp/klVORRXoAwSKiWhkj0p6dqJjCGUGc
 LqtEPlxaLR1X+pe5wKtpB3j6kTpRwictVhFUqkFACUxGZx1GbYFxdeL8+3oDsepW
 lwidBYgoya+Q4puRfmY/sCTtVhJlwTUi6g0lmpaEPWc3T2s83/u1GnlVBYd9B7yA
 fCYkUceC2ZnuRMfH00sRsjQ3USyyYptGpY8U1nv9STFJ3fC+EestBPppAAwAzcm0
 iiRgo3s314EzfeNRgzjjHrbULnVUJWkrdFXPisWepyPyVjsgnVr84YkO3kiEN6l7
 JM1cUCvJUbKZaOb2iUwrlPOqISNjCyY9QZWwqW6PCG3TBpfQsJM2mnXKnPLvV2v2
 5Ttba4+ETZ3rn4pPIuTeC6REr0aV/Zl1LKeWuk8PXz4aljWrlOrifBN6QkXWr4HU
 4z6X3j2nw4T2LzqwbNYF+xLaDaZZpV8UdvQwOkliCxZ04myx135ImvgOim7Hh9j+
 Ow1Jp9mRZha/44qM3jXdZ6Cv55pEOR4oQJsl6OBNUXgb5bnDcFHz1UpZtzoZ2V+s
 RPozsfnleNXxDIJlCGK96+PG5qVTRQvklowsz6aTP8r+EEbQnrRlnrhi82cQWqnM
 sMxzN320zrt/MWXxec89
 =WeDp
 -----END PGP SIGNATURE-----

Merge tag 'arc-v3.10-rc1-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC port updates from Vineet Gupta:
 "Support for two new platforms based on ARC700:
   - Abilis TB10x SoC [Chritisian/Pierrick]
   - Simulator only System-C Model [Mischa]

  ARC specific MM improvements:
   - Avoid full TLB flush (ASID increment) on munmap (even single page)
   - VIPT Cache Flushing improvements
     + Delayed dcache flush for non-aliasing dcache (big performance boost)
     + icache flush aliasing agnostic (no need to kill all possible aliases)

  Others:
   - Avoid needless rebuild of DTB files for every kernel build
   - Remove builtin cmdline as that is already provided by DeviceTree/bootargs
   - Fixing unaligned access emulation corner case
   - checkpatch fixes [Sachin]
   - Various fixlets [Noam]
   - Minor build failures/cleanups"

* tag 'arc-v3.10-rc1-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (35 commits)
  ARC: [mm] Lazy D-cache flush (non aliasing VIPT)
  ARC: [mm] micro-optimize page size icache invalidate
  ARC: [mm] remove the pessimistic all-alias-invalidate icache helpers
  ARC: [mm] consolidate icache/dcache sync code
  ARC: [mm] optimise icache flush for kernel mappings
  ARC: [mm] optimise icache flush for user mappings
  ARC: [mm] optimize needless full mm TLB flush on munmap
  ARC: Add support for nSIM OSCI System C model
  ARC: [TB10x] Adapt device tree to new compatible string
  ARC: [TB10x] Add support for TB10x platform
  ARC: [TB10x] Device tree of TB100 and TB101 Development Kits
  ARC: Prepare interrupt code for external controllers
  ARC: Allow embedded arc-intc to be properly placed in DT intc hierarchy
  ARC: [cmdline] Don't overwrite u-boot provided bootargs
  ARC: [cmdline] Remove CONFIG_CMDLINE
  ARC: [plat-arcfpga] defconfig update
  ARC: unaligned access emulation broken if callee-reg dest of LD/ST
  ARC: unaligned access emulation error handling consolidation
  ARC: Debug/crash-printing Improvements
  ARC: fix typo with clock speed
  ...
2013-05-09 14:36:27 -07:00
Vineet Gupta
5bba49f539 ARC: [mm] Aliasing VIPT dcache support 4/4
Enforce congruency of userspace shared mappings

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-09 22:00:57 +05:30
Vineet Gupta
de2a852cc0 ARC: [mm] Aliasing VIPT dcache support 3/4
Fix the one zillion warnings

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-09 22:00:57 +05:30
Vineet Gupta
4102b53392 ARC: [mm] Aliasing VIPT dcache support 2/4
This is the meat of the series which prevents any dcache alias creation
by always keeping the U and K mapping of a page congruent.
If a mapping already exists, and other tries to access the page, prev
one is flushed to physical page (wback+inv)

Essentially flush_dcache_page()/copy_user_highpage() create K-mapping
of a page, but try to defer flushing, unless U-mapping exist.
When page is actually mapped to userspace, update_mmu_cache() flushes
the K-mapping (in certain cases this can be optimised out)

Additonally flush_cache_mm(), flush_cache_range(), flush_cache_page()
handle the puring of stale userspace mappings on exit/munmap...

flush_anon_page() handles the existing U-mapping for anon page before
kernel reads it via the GUP path.

Note that while not complete, this is enough to boot a simple
dynamically linked Busybox based rootfs

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-09 21:59:46 +05:30
Vineet Gupta
6ec18a81b2 ARC: [mm] Aliasing VIPT dcache support 1/4
This preps the low level dcache flush helpers to take vaddr argument in
addition to the existing paddr to properly flush the VIPT dcache

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-09 21:53:16 +05:30
Vineet Gupta
a690984d60 ARC: [mm] refactor the core (i|d)cache line ops loops
Nothing semantical
* simplify the alignement code by using & operation only
* rename variables clearly as paddr

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-09 14:18:50 +05:30
Vineet Gupta
c917a36f5f ARC: [mm] serious bug in vaddr based icache flush
vaddr used to index the cache was clipped from the wrong end, and thus
would potentially fail to flush the correct lines.

The problem was dorment for so long because up until the recent
optimizations it was only used for ptrace break-point only flushes.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-09 13:45:12 +05:30
Vineet Gupta
eacd0e950d ARC: [mm] Lazy D-cache flush (non aliasing VIPT)
flush_dcache_page( ) is MM hook to ensure that a page has consistent
views between kernel and userspace. Thus it is called when

* kernel writes to a page which at some later point could get mapped to
  userspace (so kernel mapping needs to be flushed-n-inv)
* kernel is about to read from a page with possible userspace mappings
  (so userspace mappings needs to be made coherent with kernel ones)

However for Non aliasing VIPT dcache, any userspace mapping will always
be congruent to kernel mapping. Thus d-cache need need not be flushed at
all (or delayed indefinitely).

The only reason it does need to be flushed is when mapping code pages.
Since icache doesn't snoop dcache, those dirty dcache lines need to be
written back to memory and icache line invalidated so that icache lines
fetch will get the right data.

Decent gains on LMBench fork/exec/sh and File I/O micro-benchmarks.

(1) FPGA @ 80 MHZ

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
3.9-rc6-a Linux 3.9.0-r   80 4.79 8.72 66.7 116. 239. 8.39 30.4 4798 14.K 34.K
3.9-rc6-b Linux 3.9.0-r   80 4.79 8.62 65.4 111. 239. 8.35 29.0 3995 12.K 30.K
3.9-rc7-c Linux 3.9.0-r   80 4.79 9.00 66.1 106. 239. 8.61 30.4 2858 10.K 24.K
                                                                ^^^^ ^^^^ ^^^

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS   0K File      10K File     Mmap    Prot   Page 100fd
                        Create Delete Create Delete Latency Fault  Fault selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
3.9-rc6-a Linux 3.9.0-r  317.8  204.2 1122.3  375.1 3522.0 4.288     20.7 126.8
3.9-rc6-b Linux 3.9.0-r  298.7  223.0 1141.6  367.8 3531.0 4.866     20.9 126.4
3.9-rc7-c Linux 3.9.0-r  278.4  179.2  862.1  339.3 3705.0 3.223     20.3 126.6
                         ^^^^^  ^^^^^  ^^^^^  ^^^^

(2) Customer Silicon @ 500 MHz (166 MHz mem)

------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
abilis-ba Linux 3.9.0-r  497 0.71 1.38 4.58 12.0 35.5 1.40 3.89 2070 5525 13.K
abilis-ca Linux 3.9.0-r  497 0.71 1.40 4.61 11.8 35.6 1.37 3.92 1411 4317 10.K
                                                                ^^^^ ^^^^ ^^^

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07 19:08:15 +05:30
Vineet Gupta
764531cc5a ARC: [mm] micro-optimize page size icache invalidate
start address is already page aligned and size is const PAGE_SIZE,
thus fixups for alignment not needed in generated code.

bloat-o-meter vmlinux-mm5 vmlinux
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-32 (-32)
function                                     old     new   delta
__inv_icache_page                             82      50     -32

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07 19:08:14 +05:30
Vineet Gupta
7f250a0fa1 ARC: [mm] remove the pessimistic all-alias-invalidate icache helpers
No users of this code anymore - so RIP !

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07 19:08:13 +05:30
Vineet Gupta
94bad1afee ARC: [mm] consolidate icache/dcache sync code
Now that we have same helper used for all icache invalidates (i.e.
vaddr+paddr based exact line invalidate), consolidate the open coded
calls into one place.

Also rename flush_icache_range_vaddr => __sync_icache_dcache

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07 19:08:13 +05:30
Vineet Gupta
7586bf7286 ARC: [mm] optimise icache flush for kernel mappings
This change continues the theme from prev commit - this time icache
handling for kernel's own code modification (vmalloc: loadable modules,
breakpoints for kprobes/kgdb...)

flush_icache_range() calls the CDU icache helper with vaddr to enable
exact line invalidate.

For a true kernel-virtual mapping, the vaddr is actually virtual hence
valid as index into cache. For kprobes breakpoint however, the vaddr arg
is actually paddr - since that's how normal kernel is mapped in ARC
memory map.  This implies that CDU will use the same addr for
indexing as for tag match - which is fine since kernel code would only
have that "implicit" mapping and none other.

This should speed up module loading significantly - specially on default
ARC700 icache configurations (32k) which alias.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07 19:08:12 +05:30
Vineet Gupta
24603fdd19 ARC: [mm] optimise icache flush for user mappings
ARC icache doesn't snoop dcache thus executable pages need to be made
coherent before mapping into userspace in flush_icache_page().

However ARC700 CDU (hardware cache flush module) requires both vaddr
(index in cache) as well as paddr (tag match) to correctly identify a
line in the VIPT cache. A typical ARC700 SoC has aliasing icache, thus
the paddr only based flush_icache_page() API couldn't be implemented
efficiently. It had to loop thru all possible alias indexes and perform
the invalidate operation (ofcourse the cache op would only succeed at
the index(es) where tag matches - typically only 1, but the cost of
visiting all the cache-bins needs to paid nevertheless).

Turns out however that the vaddr (along with paddr) is available in
update_mmu_cache() hence better suits ARC icache flush semantics.
With both vaddr+paddr, exactly one flush operation per line is done.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07 19:08:12 +05:30
Vineet Gupta
8d56bec2f2 ARC: [mm] optimize needless full mm TLB flush on munmap
munmap ends up calling tlb_flush() which for ARC was flushing the entire
TLB unconditionally (by moving the MMU to a new ASID)

do_munmap
  unmap_region
    unmap_vmas
      unmap_single_vma
         unmap_page_range
            tlb_start_vma
            zap_pud_range
            tlb_end_vma()
  tlb_finish_mmu
    tlb_flush()  ---> unconditional flush_tlb_mm()

So even a single page munmap, a frequent operation when uClibc dynamic
linker (ldso) is loading the dependent shared libraries, would move the
the ASID multiple times - needlessly invalidating the pre-faulted TLB
entries (and increasing the rate of ASID wraparound + full TLB flush).

This is now optimised to only be called if tlb->full_mm (which means
for exit/execve) cases only. And for those cases, flush_tlb_mm() is
already optimised to be a no-op for mm->mm_users == 0.

So essentially there are no mmore full mm flushes - except for fork which
anyhow needs it for properly COW'ing parent address space.

munmap now needs to do TLB range flush, which is implemented with
tlb_end_vma()

Results
-------
1. ASID now consistenly moves by 4 during a simple ls (as opposed to 5 or
   7 before).

2. LMBench microbenchmark also shows improvements

Basic system parameters
------------------------------------------------------------------------------
Host                 OS Description              Mhz  tlb  cache  mem scal
                                                     pages line   par load
                                                           bytes
--------- ------------- ----------------------- ---- ----- ----- ------ ----
3.9-rc5-0 Linux 3.9.0-r 3.9-rc5-0404-gcc-4.4-ba   80     8    64 1.1000 1
3.9-rc5-0 Linux 3.9.0-r 3.9-rc5-0405-avoid-full   80     8    64 1.1200 1

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
3.9-rc5-0 Linux 3.9.0-r   80 4.81 8.69 68.6 118. 239. 8.53 31.6 4839 13.K 34.K
3.9-rc5-0 Linux 3.9.0-r   80 4.46 8.36 53.8 91.3 223. 8.12 24.2 4725 13.K 33.K

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS   0K File      10K File     Mmap    Prot   Page 100fd
                        Create Delete Create Delete Latency Fault  Fault selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
3.9-rc5-0 Linux 3.9.0-r  314.7  223.2 1054.9  390.2  3615.0 1.590 20.1 126.6
3.9-rc5-0 Linux 3.9.0-r  265.8  183.8 1014.2  314.1  3193.0 6.910 18.8 110.4

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07 13:44:00 +05:30