A few wakelocks tend to get stuck for no reason. Blocking them
isn't necessary and sometimes blocking them breaks basic
functionality.
Wakelocks like "tx_swr_ctrl" tend to get stuck if earphones are kept
connected, which drains the battery massively.
Test: Keep earphones plugged in and leave the device for a few hours
Expected result: "tx_swr_ctrl" does not get stuck.
Actual result: Patch is working as expected.
Change-Id: I5296990a84ab44cf6e449d6535b8b99408c415c8
Signed-off-by: Panchajanya1999 <panchajanya@azure-dev.live>
We need kgsl_worker_thread to preempt all userspace surfaceflinger
threads to avoid a possible deadlock.
This will prevent the SF threads from "stealing" cputime from
kgsl_worker_thread.
This is important, since kgsl_worker_thread executes work which blocks
SF from proceeding.
Signed-off-by: Alex Naidis <alex.naidis@linux.com>
It does not make sense to run kgsl on high and devfreq on regular
priority.
Change-Id: Ie5e6c9353a4e1324a6a49278e5ad3638462f551c
Signed-off-by: flar2 <asegaert@gmail.com>
These wakelocks have a very high wakeup count (rmnet_ipa had 496 wakeups
in just 20 minutes, IPA_CLIENT_ had 329 wakeups in ~1hr). Block them so
that they stop causing "unnecessary" (this might vary) wakeups.
Signed-off-by: Panchajanya1999 <panchajanya@azure-dev.live>
Signed-off-by: Sudeep Duhoon <rishi.gothic@gmail.com>
Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>
There are now two lists:
- the previously existing list of user defined wakelocks to block
- a new list called "wakelock_blocker_default" which comes prepopulated with the most common
and safe wakelocks to block:
qcom_rx_wakelock;wlan;wlan_wow_wl;wlan_extscan_wl;netmgr_wl;NETLINK
The union of both wakelock lists is what ultimately gets blocked.
Based on ideas of FranciscoFranco's non-generic driver.
Sysfs nodes:
/sys/class/misc/boeffla_wakelock_blocker/wakelock_blocker
- list of wakelocks to be blocked, separated by semicolons
/sys/class/misc/boeffla_wakelock_blocker/debug
- write: 0/1 to switch off and on debug logging into dmesg
- read: get current driver internals
/sys/class/misc/boeffla_wakelock_blocker/version
- show driver version
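The way the two lists combine can be sketched in userspace as follows. This is only an illustration in Python, not the driver's code: the function names are hypothetical, and exact-name matching is an assumption (the kernel driver may match names differently).

```python
# Hypothetical illustration; not the driver's actual implementation.
DEFAULT_LIST = "qcom_rx_wakelock;wlan;wlan_wow_wl;wlan_extscan_wl;netmgr_wl;NETLINK"


def parse_list(raw):
    """Split a semicolon-separated wakelock list, dropping empty entries."""
    return {name.strip() for name in raw.split(";") if name.strip()}


def effective_blocklist(user_list, default_list=DEFAULT_LIST):
    """Return the union of the user-defined list and the default list."""
    return parse_list(user_list) | parse_list(default_list)


def is_blocked(name, user_list, default_list=DEFAULT_LIST):
    """Assumption: a wakelock is blocked only on an exact name match."""
    return name in effective_blocklist(user_list, default_list)
```

With an empty user list, only the six default entries are blocked; adding "tx_swr_ctrl" to the user list blocks it in addition to the defaults.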
Signed-off-by: andip71 <andreasp@gmx.de>
Generating a sync fence name by allocating memory dynamically and using
scnprintf in a hot path results in excessive CPU time wasted on unneeded
debug info. Remove the name generation entirely to cut down CPU waste in
the GPU's rendering hot path.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
POPP constantly attempts to lower the GPU's frequency behind the
governor's back in order to save power; however, the GPU governor in use
(msm-adreno-tz) is very good at determining the GPU's load and selecting
an appropriate frequency to run the GPU at.
POPP was created long ago, perhaps when msm-adreno-tz didn't exist or
didn't work so well, so it is clearly deprecated. Remove it.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Waking the GPU upon touch wastes power when the screen is being touched
in a way that does not induce animation or any actual need for GPU usage.
Instead of preemptively waking the GPU on touch input, wake it up upon
receiving an IOCTL_KGSL_GPU_COMMAND ioctl since it is a sign that the GPU
will soon be needed.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
-O3 is much more stable with modern compilers these days than it was a
decade ago. Using -O3 on the kernel results in significantly improved
hackbench performance, which is a sign that overall performance in the
kernel is improved. It works especially well in conjunction with LTO.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
ARM erratum 1188873 can cause very severe issues if left unmitigated on
affected cores. However, if unrelated ARM arch timer mitigations are
disabled (e.g. Freescale, Hisilicon, and Cortex-A73 errata), the
mitigation for 1188873 will no longer be applied even if the config
option is on. This is because this mitigation is implemented as a
generic OOL erratum workaround, and disabling other mitigations will
cause the OOL workaround framework to stop being compiled.
Add an explicit dependency on the workaround framework to fix this, and
also add a dependency on ARM_ARCH_TIMER while we're at it.
On Android, leaving this erratum unmitigated causes the AlarmManager
thread in system_server to use 100% CPU in the background after 20
minutes of screen-off idle. The CPU hogging never stops until system_server is
killed and sometimes also causes system_server crashes due to invalid
negative durations in batterystats resulting from the corrupted timer
values.
Test: Let device idle for 20 minutes and check AlarmManager CPU usage
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
The fix for ARM erratum 1188873 is the only configurable one in use.
Disable the rest.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Gagan Malvi <malvigagan@gmail.com>
A lot of CPU time is wasted on allocating, populating, and copying
debug names back and forth with userspace when they're not actually
needed. We can't just remove the name buffers from the various sync data
structures though because we must preserve ABI compatibility with
userspace, but instead we can just pretend the name fields of the
user-shared structs aren't there. This massively reduces the amount of
memory allocated for these data structures and the amount of data passed
to and from userspace, and eliminates a kzalloc() entirely from
sync_file_ioctl_fence_info(), thus improving graphics performance.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
A measurably significant amount of CPU time is spent on logging events
for debugging purposes in lpm_cpuidle_enter. Kill the useless logging to
reduce overhead.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
The synchronize_rcu() in namespace_unlock() is called every time
a filesystem is unmounted. If a great many filesystems are mounted,
this can cause a noticeable slow-down in, for example, system shutdown.
The sequence:
mkdir -p /tmp/Mtest/{0..5000}
time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
time umount /tmp/Mtest/*
on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
100 seconds to unmount them.
Boot the same VM with 1 CPU and it takes 18 seconds to mount the
tmpfs filesystems, but only 36 to unmount.
If we change the synchronize_rcu() to synchronize_rcu_expedited()
the umount time on a 4-cpu VM drops to 0.6 seconds.
I think this 200-fold speed up is worth the slightly higher system
impact of using synchronize_rcu_expedited().
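As a quick check, the speed-up can be computed directly from the umount timings quoted above. The variable and function names are illustrative only. The quoted figures work out to roughly 167x, the same order of magnitude as the 200-fold figure.

```python
# Unmount timings quoted in the text, in seconds (4-CPU VM).
umount_before = 100.0  # with synchronize_rcu()
umount_after = 0.6     # with synchronize_rcu_expedited()


def speedup(before, after):
    """Return how many times faster 'after' is compared to 'before'."""
    return before / after


ratio = speedup(umount_before, umount_after)
print(f"umount speed-up: {ratio:.0f}x")
```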
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (from general rcu perspective)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
On a Kryo 485 CPU (semi-custom Cortex-A76 derivative) in a Snapdragon
855 (SM8150) SoC, switching from traditional LL/SC atomics to LSE
causes LKDTM's ATOMIC_TIMING test to regress by 2x:
LL/SC ATOMIC_TIMING: 34.14s 34.08s
LSE ATOMIC_TIMING: 70.84s 71.06s
Prefetching the target operands fixes the regression and makes LSE
perform better than LL/SC as expected:
LSE+prfm ATOMIC_TIMING: 21.36s 21.21s
"dd if=/dev/zero of=/dev/null count=10000000" also runs faster:
LL/SC: 3.3 3.2 3.3 s
LSE: 3.1 3.2 3.2 s
LSE+p: 2.3 2.3 2.3 s
Commit 0ea366f5e1 applied the same change
to LL/SC atomics, but it was never ported to LSE.
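The relative cost of each variant can be worked out from the ATOMIC_TIMING numbers above. The sketch below is only arithmetic on the quoted run times; the dictionary layout and function name are illustrative.

```python
from statistics import mean

# ATOMIC_TIMING run times in seconds, as quoted above (two runs each).
runs = {
    "LL/SC": [34.14, 34.08],
    "LSE": [70.84, 71.06],
    "LSE+prfm": [21.36, 21.21],
}


def relative_to(baseline, samples):
    """Return each variant's mean run time as a multiple of the baseline's."""
    base = mean(samples[baseline])
    return {name: mean(times) / base for name, times in samples.items()}


ratios = relative_to("LL/SC", runs)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the LL/SC run time")
```

On these figures plain LSE takes about 2.08x as long as LL/SC, while LSE with prefetching takes about 0.62x as long.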
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
This adds support to arm64 for fast refcount checking, as contributed
by Kees for x86 based on the implementation by grsecurity/PaX.
The general approach is identical: the existing atomic_t helpers are
cloned for refcount_t, with the arithmetic instruction modified to set
the PSTATE flags, and one or two branch instructions added that jump to
an out of line handler if overflow, decrement to zero or increment from
zero are detected.
One complication that we have to deal with on arm64 is the fact that
it has two atomics implementations: the original LL/SC implementation
using load/store exclusive loops, and the newer LSE one that does mostly
the same in a single instruction. So we need to clone some parts of
both for the refcount handlers, but we also need to deal with the way
LSE builds fall back to LL/SC at runtime if the hardware does not
support it.
As is the case with the x86 version, the performance gain is substantial
(ThunderX2 @ 2.2 GHz, using LSE), even though the arm64 implementation
incorporates an add-from-zero check as well:
perf stat -B -- echo ATOMIC_TIMING >/sys/kernel/debug/provoke-crash/DIRECT
116252672661 cycles # 2.207 GHz
52.689793525 seconds time elapsed
perf stat -B -- echo REFCOUNT_TIMING >/sys/kernel/debug/provoke-crash/DIRECT
127060259162 cycles # 2.207 GHz
57.243690077 seconds time elapsed
For comparison, the numbers below were captured using CONFIG_REFCOUNT_FULL,
which uses the validation routines implemented in C using cmpxchg():
perf stat -B -- echo REFCOUNT_TIMING >/sys/kernel/debug/provoke-crash/DIRECT
Performance counter stats for 'cat /dev/fd/63':
191057942484 cycles # 2.207 GHz
86.568269904 seconds time elapsed
As a bonus, this code has been found to perform significantly better on
systems with many CPUs, due to the fact that it no longer relies on the
load/compare-and-swap combo performed in a tight loop, which is what we
emit for cmpxchg() on arm64.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jan Glauber <jglauber@cavium.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[kdrag0n]
- Backported to k4.14 from:
https://www.spinics.net/lists/arm-kernel/msg735992.html
- Forward-ported to k4.19
- Benchmarked on sm8150 using perf and LKDTM REFCOUNT_TIMING:
https://docs.google.com/spreadsheets/d/14CctCmWzQAGhOmpHrBJfXQy_HuNFTpEkMEYSUGKOZR8/edit
         | Fast checking      | Generic checking
---------+--------------------+-----------------------
Cycles   | 79235532616        | 102554062037
         | 79391767237        | 99625955749
Time     | 32.99879212 sec    | 42.5354029 sec
         | 32.97133254 sec    | 41.31902045 sec

Average:
Cycles   | 79313649927        | 101090008893
Time     | 33 sec             | 42 sec
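The averages in the table can be reproduced from the individual runs, which also gives the relative cost of generic checking. This is a sketch of the arithmetic only; the dictionary keys and function name are illustrative.

```python
from statistics import mean

# REFCOUNT_TIMING samples on sm8150, copied from the table above.
cycles = {
    "fast": [79235532616, 79391767237],
    "generic": [102554062037, 99625955749],
}
seconds = {
    "fast": [32.99879212, 32.97133254],
    "generic": [42.5354029, 41.31902045],
}


def averages(samples):
    """Return the mean of each variant's samples."""
    return {name: mean(values) for name, values in samples.items()}


avg_cycles = averages(cycles)
avg_seconds = averages(seconds)

# How many times more cycles generic checking needs than fast checking.
cycle_ratio = avg_cycles["generic"] / avg_cycles["fast"]
print(f"generic / fast cycles: {cycle_ratio:.2f}")
```

On these samples generic checking costs about 1.27x the cycles of fast checking.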
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Mixing kernel and user debug hooks together is highly error-prone as it
relies on all of the hooks to figure out whether the exception came from
kernel or user, and then to act accordingly.
Make our debug hook code a little more robust by maintaining separate
hook lists for user and kernel, with separate registration functions
to force callers to be explicit about the exception levels that they
care about.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
[kdrag0n: Ported to android-4.19 with adaptations for KASAN hook]
Signed-off-by: Danny Lin <danny@kdrag0n.dev>