# Android platform profiling

## Table of Contents
- [Android platform profiling](#android-platform-profiling)
  - [Table of Contents](#table-of-contents)
  - [General Tips](#general-tips)
  - [Start simpleperf from system_server process](#start-simpleperf-from-system_server-process)
  - [Hardware PMU counter limit](#hardware-pmu-counter-limit)

## General Tips

Here are some tips for Android platform developers who build and flash system images on rooted
devices:
1. After running `adb root`, simpleperf can be used to profile any process or the whole system.
2. If you are not working on the current main branch, it is recommended to use the latest
simpleperf from AOSP main. Scripts are in `system/extras/simpleperf/scripts`, and binaries are in
`system/extras/simpleperf/scripts/bin/android`.
3. It is recommended to use `app_profiler.py` for recording and `report_html.py` for reporting.
Below is an example.

```sh
# Record the surfaceflinger process for 10 seconds with a DWARF-based call graph. More examples
# are in the scripts reference doc.
$ python app_profiler.py -np surfaceflinger -r "-g --duration 10"

# Generate an HTML report.
$ python report_html.py
```

4. Since Android O, system libraries on the device have symbols, so we don't need the unstripped
binaries in `$ANDROID_PRODUCT_OUT/symbols` to report call graphs. However, they are needed to add
source code and disassembly (with line numbers) to the report. Below is an example.

```sh
# Record with app_profiler.py (or with simpleperf on device), and get perf.data on the host.
$ python app_profiler.py -np surfaceflinger -r "--call-graph fp --duration 10"

# Collect unstripped binaries from $ANDROID_PRODUCT_OUT/symbols into binary_cache/.
$ python binary_cache_builder.py -lib $ANDROID_PRODUCT_OUT/symbols

# Report source code and disassembly. Disassembling all binaries is slow, so it's better to add the
# --binary_filter option to only disassemble selected binaries.
$ python report_html.py --add_source_code --source_dirs $ANDROID_BUILD_TOP --add_disassembly \
  --binary_filter surfaceflinger.so
```

## Start simpleperf from system_server process

Sometimes we want to profile a process or the whole system when a special situation happens. In
this case, we can add code that starts simpleperf at the point where the situation is detected.

1. Disable SELinux with `adb shell setenforce 0` (this requires `adb root`). This is needed because
   SELinux policy only allows simpleperf to run in the shell or in debuggable/profileable apps.

2. Add the code below at the point where the special situation is detected.

```java
try {
    // Raise CAP_SYS_PTRACE as an ambient capability, so the simpleperf process started below
    // inherits it and passes simpleperf's capability check.
    Os.prctl(OsConstants.PR_CAP_AMBIENT, OsConstants.PR_CAP_AMBIENT_RAISE,
            OsConstants.CAP_SYS_PTRACE, 0, 0);
    // Write to /data instead of /data/local/tmp, because /data is writable by the system user.
    Runtime.getRuntime().exec("/system/bin/simpleperf record -g -p " + Process.myPid()
            + " -o /data/perf.data --duration 30 --log-to-android-buffer --log verbose");
} catch (Exception e) {
    Slog.e(TAG, "error while running simpleperf", e);
}
```

## Hardware PMU counter limit

When monitoring instruction- and cache-related perf events (events in the hw, cache, raw and pmu
categories of the `list` command), these events are mapped to PMU counters on each CPU core. But
each core only has a limited number of PMU counters. If the number of events exceeds the number of
PMU counters, the counters are multiplexed among the events. Each event is then counted for only
part of the time, which probably isn't what we want.

On Pixel devices, each core usually has 7 PMU counters, of which 4 are used by the kernel to
monitor memory latency. So only 3 counters are available, and it's fine to monitor up to 3 PMU
events at the same time. To monitor more than 3 events, the `--use-devfreq-counters` option can
be used to borrow the counters used by the kernel.
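The Java sketch below makes the cost of multiplexing concrete. It is not part of simpleperf: the class `PmuCounterBudget` and its methods are hypothetical names used only for illustration. The sketch assumes that the kernel rotates events evenly across the free counters and that all counters are interchangeable. Real PMUs may break that assumption, for example when a core has a dedicated cycle counter. `scaledCount()` applies the `time_enabled / time_running` extrapolation that is commonly used for multiplexed Linux perf_event counts.

```java
// Hypothetical helper, not part of simpleperf: estimates how multiplexing
// affects events when there are more events than free PMU counters.
public final class PmuCounterBudget {
    private PmuCounterBudget() {}

    /** Counters left for user events after the kernel reserves some of them. */
    public static int availableCounters(int countersPerCore, int reservedByKernel) {
        return Math.max(0, countersPerCore - reservedByKernel);
    }

    /**
     * Approximate fraction of the run during which each event is actually counted.
     * Assumes even round-robin rotation over interchangeable counters.
     */
    public static double runningFraction(int numEvents, int freeCounters) {
        if (numEvents <= 0) {
            throw new IllegalArgumentException("numEvents must be > 0");
        }
        if (freeCounters <= 0) {
            return 0.0; // No counter is ever free, so nothing is counted.
        }
        return Math.min(1.0, (double) freeCounters / numEvents);
    }

    /** Extrapolates a multiplexed count: rawCount * timeEnabled / timeRunning. */
    public static long scaledCount(long rawCount, long timeEnabledNs, long timeRunningNs) {
        if (timeRunningNs <= 0) {
            return 0; // The event never ran, so there is nothing to scale.
        }
        return Math.round((double) rawCount * timeEnabledNs / timeRunningNs);
    }

    public static void main(String[] args) {
        // Pixel numbers from the text: 7 counters per core, 4 used by the kernel.
        int free = availableCounters(7, 4);
        System.out.println("free counters: " + free);
        for (int events = 1; events <= 6; events++) {
            System.out.printf("%d events -> each counted ~%.0f%% of the time%n",
                    events, 100 * runningFraction(events, free));
        }
    }
}
```

With the Pixel numbers above, monitoring 6 events means each event is counted only about half the time, and its reported value is an extrapolation rather than a direct measurement. Keeping to 3 events, or borrowing the kernel's counters with `--use-devfreq-counters`, avoids this.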