5745954a5a
The L2 cache perf driver is named 'l2cache_counters' and can be used
with the perf tool to profile the following L2 cache events:

=> DDR read (Read-Shared, Read-Unique, Read-Clean and
   Read-Not-Shared-Dirty transactions on the GNOC interface)
=> DDR write (Write-Back, Write-Clean and Write-Evict transactions on
   the GNOC interface)
=> SNOOP read (Read-Once, Read-Shared, Read-Unique, Read-Clean and
   Read-Not-Shared-Dirty transactions from GNOC to the cluster
   interface)
=> ACP write (Write-Back, Write-Clean and Write-Evict transactions to
   the ACP port of the collapsed cluster)
=> Tenure counter (counts the tenure of the L2 low-power mode, in
   cycles of the 19.2 MHz XO clock)
=> Low/Mid/High occurrence counters: the current tenure count is
   compared against the thresholds set for the low and mid tenure
   counters, and the occurrence counter of the matching category is
   incremented, e.g.:
     1. 0 < current tenure <= low-tenure threshold: Low-Tenure
     2. Low-tenure threshold < current tenure <= mid-tenure threshold: Mid-Tenure
     3. Mid-tenure threshold < current tenure: High-Tenure

Change-Id: I9f8aedd21a92cbd6908deb5a8e4c7e32220bea74
Signed-off-by: Mukesh Ojha <mojha@codeaurora.org>
63 lines
2.2 KiB
Text
Qualcomm Technologies, Inc. L2 Cache counters
=============================================

This driver supports the L2 cache cluster counters found in
Qualcomm Technologies, Inc. SoCs.

There are multiple physical L2 cache clusters, each with its
own counters. Each cluster has one or more CPUs associated with it.

There is one logical L2 PMU exposed, which aggregates the results from
the physical PMUs (counters).

The driver provides a description of its available events and configuration
options in sysfs, see /sys/devices/l2cache_counters.

The "format" directory describes the format of the events.

The event format is of the form 0xXXX, where:

  1 bit (lsb) for the group (the group is either the txn or the tenure counter).
  4 bits for the serial number of the counter, from 0 to 8.
  5 bits for the bit position of the counter enable bit in a register.

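The bit fields listed above can be sketched as a small shell helper. This is only an illustration: it assumes the fields are packed in the order listed, starting from the least significant bit (bit 0 for the group, bits 1-4 for the serial number, bits 5-9 for the enable-bit position), which the text does not state explicitly. The function name `l2_event_config` is hypothetical; the "format" directory in sysfs remains the authoritative source for the field positions.

```shell
# Hypothetical helper: pack the three fields into a 0xXXX value.
# Assumed layout (not confirmed by the driver documentation):
#   bit 0      group
#   bits 1-4   counter serial number
#   bits 5-9   bit position of the counter enable bit
l2_event_config() {
    group="$1"; serial="$2"; enable_bit="$3"
    # Mask each field to its width, then shift it into place.
    printf '0x%03x\n' $(( ((enable_bit & 0x1f) << 5) | ((serial & 0xf) << 1) | (group & 0x1) ))
}

# Example values chosen arbitrarily: group 1, serial number 2, enable bit 3.
l2_event_config 1 2 3    # prints 0x065 under the assumed layout
```

Ten bits in total fit in three hex digits, which matches the 0xXXX form given above.
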
The driver provides a "cpumask" sysfs attribute which contains a mask
consisting of one CPU per cluster which will be used to handle all the PMU
events on that cluster.

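As a rough sketch of how the "cpumask" attribute might be consumed from a script, the helper below expands a CPU list into individual CPU numbers. It assumes the attribute is printed in the kernel's cpulist form (for example "0,4" or "0-1,4"), which is common for PMU cpumask attributes but is not stated in this document. The sysfs path is inferred from the other paths used in this document, and `expand_cpulist` is a hypothetical name.

```shell
# Path assumed from the sysfs examples in this document.
CPUMASK=/sys/bus/event_source/devices/l2cache_counters/cpumask

# Expand a cpulist such as "0,4" or "0-1,4" into "0 4" or "0 1 4".
expand_cpulist() {
    list="$1"; out=""
    old_ifs="$IFS"; IFS=','
    for part in $list; do
        case "$part" in
            *-*) first="${part%-*}"; last="${part#*-}" ;;   # range "a-b"
            *)   first="$part"; last="$part" ;;             # single CPU
        esac
        cpu="$first"
        while [ "$cpu" -le "$last" ]; do
            out="${out:+$out }$cpu"
            cpu=$((cpu + 1))
        done
    done
    IFS="$old_ifs"
    printf '%s\n' "$out"
}

# Only touch sysfs when the attribute is actually present.
if [ -r "$CPUMASK" ]; then
    for cpu in $(expand_cpulist "$(cat "$CPUMASK")"); do
        echo "cluster reader CPU: $cpu"
    done
fi
```

Each CPU printed this way is a candidate value for the -C option of perf stat when a single cluster is of interest.
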
Examples for use with perf:

  perf stat -e l2cache_counters/ddr_read/,l2cache_counters/ddr_write/ -a sleep 1

  perf stat -e l2cache_counters/cycles/ -C 2 sleep 1

Limitation: The driver does not support sampling, therefore "perf record" will
not work. Per-task perf sessions are not supported.

Transaction counters need no configuration before monitoring.

For the tenure counter use case, the low and mid range thresholds of the
occurrence counters must first be set in sysfs for the cluster being
monitored (the occurrence counters exist for each cluster).

  echo 1 > /sys/bus/event_source/devices/l2cache_counters/which_cluster_tenure
  echo X > /sys/bus/event_source/devices/l2cache_counters/low_tenure_threshold
  echo Y > /sys/bus/event_source/devices/l2cache_counters/mid_tenure_threshold

Here, X < Y.

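The three writes and the X < Y constraint can be wrapped in one function, sketched below. The function name `set_tenure_thresholds` and the threshold values in the usage comment are hypothetical; only the three attribute names come from this document. The sysfs directory is passed as an argument so the function can be dry-run against a scratch directory.

```shell
# Hypothetical wrapper around the three sysfs writes.
# $1: sysfs directory of the PMU
# $2: cluster number, $3: low threshold (X), $4: mid threshold (Y)
set_tenure_thresholds() {
    dir="$1"; cluster="$2"; low="$3"; mid="$4"
    # The documentation requires X < Y, so reject anything else up front.
    if [ "$low" -ge "$mid" ]; then
        echo "low threshold ($low) must be smaller than mid threshold ($mid)" >&2
        return 1
    fi
    echo "$cluster" > "$dir/which_cluster_tenure" &&
    echo "$low" > "$dir/low_tenure_threshold" &&
    echo "$mid" > "$dir/mid_tenure_threshold"
}

# Example (needs root on a target with the driver loaded; values are arbitrary):
#   set_tenure_thresholds /sys/bus/event_source/devices/l2cache_counters 1 100 1000
```

Checking the ordering before any write avoids leaving the cluster selected with only one of the two thresholds updated.
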
e.g.:

  perf stat -e l2cache_counters/low_range_occur/ \
            -e l2cache_counters/mid_range_occur/ \
            -e l2cache_counters/high_range_occur/ -C 4 sleep 10

  Performance counter stats for 'CPU(s) 4':

         7      l2cache_counters/low_range_occur/
         5      l2cache_counters/mid_range_occur/
         7      l2cache_counters/high_range_occur/

  10.204140400 seconds time elapsed

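To help read these three counts, the following sketch models how a single tenure value would be sorted into the low, mid, or high category according to the driver description (0 < tenure <= low threshold is low, low threshold < tenure <= mid threshold is mid, anything larger is high). The comparison is done by the hardware counters, not by software; `classify_tenure` is a hypothetical name, the thresholds 100 and 1000 are arbitrary, and treating a zero tenure as uncounted is an assumption drawn from the "0 <" bound.

```shell
# Hypothetical model of the categorisation rule.
# $1: tenure count, $2: low threshold, $3: mid threshold
classify_tenure() {
    tenure="$1"; low="$2"; mid="$3"
    if [ "$tenure" -le 0 ]; then
        echo "none"    # assumption: a zero tenure falls in no category
    elif [ "$tenure" -le "$low" ]; then
        echo "low"
    elif [ "$tenure" -le "$mid" ]; then
        echo "mid"
    else
        echo "high"
    fi
}

# Boundary values around the two arbitrary thresholds.
for t in 50 100 101 1000 1001; do
    echo "$t -> $(classify_tenure "$t" 100 1000)"
done
```

With these thresholds the loop reports 50 and 100 as low, 101 and 1000 as mid, and 1001 as high, which shows that each threshold value itself belongs to the lower category.
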