Certain L3 workloads are latency sensitive to cache snoop traffic, and
this traffic is not directly quantified in the existing memlat scheme.
Use the L2 writeback percentage as a metric for identifying snoop
traffic, so that memlat can account for it.
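
A minimal sketch of the idea (the helper name, inputs, and threshold
are hypothetical, not the driver's actual interface; the percentage is
assumed to be writebacks as a share of L2 accesses):

  /* Hypothetical: treat a high L2 writeback share as snoop traffic. */
  static bool l2wb_indicates_snoop(unsigned long l2_wb,
                                   unsigned long l2_access,
                                   unsigned int wb_pct_thres)
  {
          unsigned int wb_pct;

          if (!l2_access)
                  return false;

          /* Percentage of L2 accesses that were writebacks. */
          wb_pct = (100 * l2_wb) / l2_access;

          return wb_pct >= wb_pct_thres;
  }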
Change-Id: I9d43375d96de5a199c6a87c55e5c1079549b23ce
Signed-off-by: Santosh Mardi <gsantosh@codeaurora.org>
Currently, each arm-memlat-mon instance separately reads from a few
shared PMUs, which leads to unnecessary IPI overhead. Tweak the
arm-memlat-mon design to allow multiple monitors to share certain event
counts, lowering overhead.
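
A rough sketch of the sharing scheme (the structure and helper names
are illustrative, not the actual driver code): the shared counters are
read once per sample window into a per-CPU snapshot, and every monitor
consumes that snapshot instead of triggering its own cross-CPU counter
read (and hence its own IPI).

  #include <linux/percpu.h>
  #include <linux/types.h>

  /* Illustrative: one snapshot of the shared events per CPU. */
  struct shared_evs {
          u64 cyc;
          u64 inst;
  };

  static DEFINE_PER_CPU(struct shared_evs, shared_snap);

  /*
   * Runs on the target CPU once per sample window; read_cyc() and
   * read_inst() stand in for the real PMU reads. Every monitor then
   * works from the snapshot, so smp_call_function_single() fires once
   * per CPU per window rather than once per monitor.
   */
  static void update_shared_evs(void *unused)
  {
          struct shared_evs *snap = this_cpu_ptr(&shared_snap);

          snap->cyc = read_cyc();
          snap->inst = read_inst();
  }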
Change-Id: Ic3358afd1853efd84a4c3d5f3d66c3a86df6e5e7
Signed-off-by: Jonathan Avila <avilaj@codeaurora.org>
Signed-off-by: Amir Vajid <avajid@codeaurora.org>
Some targets support different DDR types. Detect the DDR type and
populate the frequency table accordingly. The OPP framework supports an
opp-supported-hw bitmap, which allows only the frequencies that pass a
hardware version check to be added to the OPP table. Use the same
mechanism in the bandwidth monitor device so that the right frequencies
are added to the OPP table.
This patch also adds checks to the latency-based device so that it
detects the DDR type at runtime and adds the corresponding frequency
map.
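
A hedged sketch of the OPP hookup (read_ddr_type() is a hypothetical
detection helper; dev_pm_opp_set_supported_hw() and
dev_pm_opp_of_add_table() are the OPP core APIs in recent kernels): the
detected DDR type selects the supported-hw version bit, so only the
matching opp-v2 entries are added.

  #include <linux/bits.h>
  #include <linux/err.h>
  #include <linux/pm_opp.h>

  /* Sketch: gate the OPP table on the detected DDR type. */
  static int init_opp_for_ddr(struct device *dev)
  {
          struct opp_table *tbl;
          u32 version = BIT(read_ddr_type());     /* hypothetical */

          /* Only OPPs whose opp-supported-hw bitmap matches are added. */
          tbl = dev_pm_opp_set_supported_hw(dev, &version, 1);
          if (IS_ERR(tbl))
                  return PTR_ERR(tbl);

          return dev_pm_opp_of_add_table(dev);
  }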
Change-Id: Ice5a0b14da67b3f2f07e98bc7349220da7d4efdb
Signed-off-by: Santosh Mardi <gsantosh@codeaurora.org>
Signed-off-by: Rama Aparna Mallavarapu <aparnam@codeaurora.org>
Currently, an entirely separate governor is used to handle compute-bound
cases. Roll this functionality into the existing memlat governor and
allow devices to choose which logic to target by specifying different
drivers in the 'compatible' field.
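
One plausible shape for the compatible-based selection (the compatible
strings, the ops structures, and common_probe() are examples, not the
actual bindings): the of_device_id table carries per-variant data that
a common probe path uses to pick the memlat or compute logic.

  #include <linux/of_device.h>
  #include <linux/platform_device.h>

  /* Example variants; the real compatible strings may differ. */
  static const struct of_device_id memlat_match_table[] = {
          { .compatible = "qcom,arm-memlat-mon", .data = &memlat_ops },
          { .compatible = "qcom,arm-compute-mon", .data = &compute_ops },
          {}
  };

  static int memlat_probe(struct platform_device *pdev)
  {
          const struct of_device_id *id =
                  of_match_device(memlat_match_table, &pdev->dev);

          if (!id)
                  return -ENODEV;

          /* id->data carries the memlat- or compute-specific logic. */
          return common_probe(pdev, id->data);
  }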
Change-Id: I8dc41bf0474309c3564dec8e6c813511b8a7fc02
Signed-off-by: Jonathan Avila <avilaj@codeaurora.org>
Signed-off-by: Rama Aparna Mallavarapu <aparnam@codeaurora.org>
Some workloads that access memory can appear memory latency bound even
though they are not. This can happen when the core running the workload
is highly parallel or capable of out-of-order execution, so not every
memory access actually stalls the core.
It can also happen when the memory access monitoring capabilities are
not ideal and end up counting more kinds of memory accesses than would
be ideal. In that case, the IPM ratio is lower than it would be with
ideal monitoring capabilities.
To account for these errors, if the core has stall cycle counting
capability, check for a minimum stall% before the workload is considered
memory latency bound. This helps reduce the inaccuracies, but is not a
replacement for the IPM ratio scheme: the stall% method doesn't allow us
to detect which level of memory the workload is latency bound on, while
the IPM ratio does (based on which memory accesses are used to calculate
the ratio).
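
A simplified sketch of the combined check (the names, units, and
thresholds are illustrative): the IPM ratio must be at or below its
ceiling and, when stall cycle counting is available, the stall% must
also clear a floor.

  /* Simplified: gate the memlat decision on both IPM and stall%. */
  static bool mem_lat_bound(unsigned long inst, unsigned long mem_acc,
                            unsigned long stall, unsigned long cyc,
                            unsigned int ipm_ceil, unsigned int stall_floor)
  {
          /* High IPM: too little memory traffic to be latency bound. */
          if (!mem_acc || inst / mem_acc > ipm_ceil)
                  return false;

          /* No stall counting capability: fall back to IPM alone. */
          if (!cyc)
                  return true;

          /* Require a minimum percentage of stalled cycles. */
          return (100 * stall) / cyc >= stall_floor;
  }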
Change-Id: I4363d7848584e5562f6683b5ad6b0f99017ec71b
Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rama Aparna Mallavarapu <aparnam@codeaurora.org>
Use performance counters to detect the memory latency sensitivity
of CPU workloads and vote for higher DDR frequency if required.
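
At a high level, something like the following (a conceptual sketch;
the names and the direct CPU-to-DDR mapping are illustrative, since
the actual driver maps through per-target tunables): when the IPM
ratio indicates latency sensitivity, the DDR vote scales with the CPU
frequency.

  /* Conceptual: derive a DDR frequency vote from the IPM ratio. */
  static unsigned long memlat_target_freq(unsigned long inst,
                                          unsigned long mem_acc,
                                          unsigned long cpu_khz,
                                          unsigned int ratio_ceil)
  {
          /* Few memory accesses per instruction: no extra vote. */
          if (!mem_acc || inst / mem_acc > ratio_ceil)
                  return 0;

          /* Latency bound: scale the DDR vote with the CPU frequency. */
          return cpu_khz;
  }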
Change-Id: Ie77a3523bc5713fc0315bd0abc3913f485a96e0e
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
Signed-off-by: Rama Aparna Mallavarapu <aparnam@codeaurora.org>