Building a Distributed Android Remote Testing Platform - An Attempt to Make Real GUI Testing Affordable & Comprehensive

23.11.2020 — android, testing, grpc — 9 min read

There are currently ~2.5 billion Android devices, spanning ~1,300 distinct brands and ~24,000 unique device models. I'm exploring how to tap into this latent resource pool to make automated testing affordable & unlock more comprehensive configuration coverage.

The goal of DART (Distributed Android Remote Testing) is to enable any Android device to run automated tests without ADB or complex setup and maintenance. A commercial product could pay people per hour of device use, letting anyone put idle device time to work. Together, the participating devices form a distributed testbed.

DART is orthogonal to crowdsourced testing — the latter requires active human participation.

DART seems to be the best solution to utilize device idle time — the best way to verify that an app works on "Samsung S9" is by executing it on "Samsung S9". Contrast this to other procedures where phones are usually a suboptimal option: server hosting, remote computation, etc.

In the demo below, I ran a few test cases on Mozilla Focus Privacy Browser without cables/ADB.

1.1. 💪 Motivation

The following problems motivated the development of DART.

i. 🧩 Android Device Fragmentation

The top device farms barely cover 1% of the total number of Android device models. Nondistributed device farms favor depth instead of breadth — maintenance cost increases as more diverse devices are added.

Testing on flagship devices alone doesn't always suffice.

Anecdote: I once spent an entire day debugging an ANR that affected a small fraction of our users. I discovered that it only occurred on phones purchased in specific regions. I was never able to reproduce the issue on the same phone model bought in the USA.

The Android version distribution exacerbates this problem.

Headspin aims to alleviate some of these issues. They claim to have a global infrastructure of more than 22,000 SIM-enabled (similar) devices in 150 locations. This is a big plus, as traditional services (like Firebase Test Lab) can't address geographical concerns. At Headspin's core is the DeviceFarmer STF open-source library. But like its predecessors, Headspin only supports a limited set of device models.

ii. 💸 Expensive Physical Devices

Real device test farms have an upfront cost and require periodic maintenance, which is why they are more expensive than virtual device farms.

Running 10 hours of tests daily on real devices costs about $1,000/month (assuming 20 working days of equal levels of productivity). That is 200 device-hours, or roughly $5 per device-hour.

1.2. 📱 Product

The key 🔑: Uber averts enormous maintenance costs by not owning all the cars on its global network. Similarly, with a distributed network of test devices, extensive device coverage is attainable with minimal maintenance costs. For one or two co-located devices, maintenance isn't burdensome.

From the DeviceFarmer STF FAQ: "Aside from a single early failure we had within only a few months, all of our devices were doing fine for about two years. However, having reached the 2-3 year mark, several devices have started to experience visibly expanded batteries [...] In our experience, the system runs just fine most of the time, and any issues are mostly USB-related. You'll usually have to do something about once a week."

Organizations can decide which types of tests are ideal for DART and whether the network is made up of single nodes or clusters of nodes. The simple taxonomy of Android testing below should be helpful. [1]

The network can be internal to an organization. For example, X devices are stored at HQ, and each employee can remotely contribute their test device to the system.

It can also be external to an organization. Imagine paying people $1.5/hour of device time. Assuming a device runs tests overnight (thanks to timezone differences), the estimated monthly earnings are $240 ($1.5 * 8 hours * 20 working days).

The median income in some countries is ~$400/month.

Below is a demo of the proof-of-concept.

Currently, a simple dashboard renders the test summary.

1.3. 🛡️ Security

There are two perspectives here:

i. 🧐 End-user Perspective

The end-user is concerned about malicious binaries & privacy.

Security checks will guard devices from malicious binaries. The system will be closed: only verified publishers are allowed. Google Play is leveraged to enforce package name and signature correspondence.

A sandboxed test environment guarantees that private data is inaccessible to apps under test.

ii. 👩🏽‍💻 Publisher Perspective

The publisher cares about test fraud and product leaks.

Proof of work disincentivizes test fraud. The server can use device logs, screenshots, videos with UUID, and past executions to validate submitted work results. As this isn't watertight, it is best combined with other strategies: user verifications, device attestation, randomization, and device throughput throttling.

Preventing product leaks is hard — if it runs on a phone, any sufficiently motivated (& knowledgeable) person can figure out a way to access objects. Three good-enough solutions are:

a. Running headless tests.

b. Using a private network of vetted devices.

c. Adopting DART only for binaries late in the release pipeline.

1.4. 🎃 Going Headless

It is possible to obscure the application under test using a simple overlay. The video below shows the execution of the test cases of the Google IO 2019 app. On the right, the app interactions are invisible. Depending on the strategy, concealing actions does not affect screenshots or videos.

With framework UI component wrappers, it is possible to borrow some ideas from Layoutlib to offer a fully headless solution. Layoutlib is a custom version of the Android View framework designed to run inside Eclipse. The goal of the library is to provide layout rendering in Eclipse that is very close to the rendering on devices.

1.5. 📦 Containerization

Tests need to run in an isolated environment. A plugin framework like VirtualAPK can be built to offer a layer of isolation between apps and the OS. This works using the DexClassLoader and some reflection hacks to modify framework components. With this approach, apps can be seamlessly loaded & unloaded without any user interaction.

The Google Play dynamic feature delivery system uses some of these reflection techniques to support legacy devices (pre-Android 7.0) 🙈. For instance, addAssetPath is called via reflection to make resources not bundled in an APK available for use.

This plugin approach adds cruft that is nonexistent on a regular device. An alternative is using the Android work profile. Work profile creates a separate, self-contained profile on Android devices that isolates corporate data from personal apps and data. [3]

Work profiles also provide always-on VPN configuration, control of runtime permissions, extra security, etc. Work profiles are a great fit, only deviating slightly from the operating mode used by the average user.

Android allows apps signed with the same key to run in the same process, if the apps so request, so that the system treats them as a single application. [3]

1.6. 💔 Flakiness

The system should eliminate most causes of flakiness that result from running tests on real people's devices. I have run over 5,000 tests so far without any issues 🚀. In an external network, it would be naive to assume perfection, as many more variables are introduced. As the project evolves, these issues will be identified and mitigated.

A few researchers manually examined 423 projects featuring the Espresso automated GUI testing tool. They derived a set of 27 different causes of modifications and grouped them into nine macro-categories. Category percentages were then computed based on the frequency of modification causes. [2]

1.7. 🚧 Code Repository

There are four primary directories in the GitHub repo:

i. Apollo: Android project for the client (worker).

ii. WorkServer: Kotlin project for the gRPC work server.

iii. WorkSpecs: Protobuf definitions shared by both the server & Android client.

iv. Dashboard: React project for the web dashboard.

All projects are still a work in progress and lack adequate documentation.

2. 🏗️ System Overview

As shown above, multiple components must work in tandem to run tests on a device. The sequence diagram below shows the component interactions at a high level.

sequenceDiagram
    participant Worker as Worker (Device)
    participant WorkRunner
    participant TestServer
    participant TestClient
    participant Telemetry
    participant InstrumentationTests
    participant AppUnderTest
    participant UiAutomator
    activate Worker
    Worker->>WorkRunner: Run Work-X
    activate WorkRunner
    WorkRunner->>WorkRunner: Download & unpack payload
    WorkRunner->>WorkRunner: Perform redundant cleanup
    WorkRunner->>UiAutomator: Connect UiAutomator
    activate UiAutomator
    UiAutomator-->>WorkRunner: Connected!
    WorkRunner->>UiAutomator: Install AppUnderTest.apk
    UiAutomator-->>WorkRunner: Installed!
    WorkRunner->>UiAutomator: Install Test.apk
    UiAutomator-->>WorkRunner: Installed!
    WorkRunner->>Worker: Kill other apps
    activate Worker
    Worker-->>WorkRunner: Killed!
    deactivate Worker
    WorkRunner->>Worker: Dismiss system dialogs
    activate Worker
    Worker-->>WorkRunner: Dismissed!
    deactivate Worker
    WorkRunner->>Worker: Activate DND
    activate Worker
    Worker-->>WorkRunner: Activated!
    deactivate Worker
    WorkRunner->>TestServer: Start test server
    activate TestServer
    TestServer->>TestServer: Start timeout timer
    TestServer->>TestServer: Prepare test config
    TestServer->>TestClient: Start Instrumentation(config)
    activate TestClient
    TestClient-->>TestServer: Started!
    TestClient->>TestServer: Establish server connection
    TestServer-->>TestClient: Established!
    TestServer-xTestClient: Connect Activity bootstrapper
    TestClient->>UiAutomator: Acquire runtime permissions
    UiAutomator-->>TestClient: Acquired!
    TestClient->>TestClient: Register components monitor
    TestClient->>Telemetry: Start telemetry
    activate Telemetry
    Telemetry-->>TestClient: Started!
    par Telemetry
        loop every x milliseconds
            Telemetry->>Telemetry: Collect performance, UI, & device health data
            Telemetry-xTestClient: Store performance, UI, & device health data
            TestClient-xTestServer: Store performance, UI, & device health data
        end
    and Test Execution
        TestClient-xInstrumentationTests: Run all instrumentation tests
        activate InstrumentationTests
        activate AppUnderTest
        loop for each test in config
            InstrumentationTests-xTestClient: On test-i run started
            TestClient-xTestServer: On test-i run started
            InstrumentationTests->>InstrumentationTests: Run test-i
            InstrumentationTests-xAppUnderTest: Interact with components
            alt is test-i run successful
                InstrumentationTests-xTestClient: On test-i run successful!
                TestClient-xTestServer: On test-i run successful!
            else is test-i run failed
                InstrumentationTests-xTestClient: On test-i run failed!
                TestClient-xTestServer: On test-i run failed!
            end
            opt is test-i run ignored
                InstrumentationTests-xTestClient: On test-i run ignored!
                TestClient-xTestServer: On test-i run ignored!
            end
        end
        deactivate AppUnderTest
        deactivate InstrumentationTests
        TestClient-xTestServer: On instrumentation test completed!
        TestClient-xTestServer: Store artifacts
        TestClient-xTelemetry: Stop telemetry
        deactivate Telemetry
    end
    TestClient->>TestClient: Unregister components monitor
    TestClient-xTestServer: Disconnect from server
    TestClient->>TestClient: Cleanup & terminate
    deactivate TestClient
    TestServer->>TestServer: Redundantly terminate client
    TestServer->>TestServer: Stop timeout timer
    TestServer->>TestServer: Assemble work report
    TestServer-xWorkRunner: On work completed(report)
    WorkRunner-xTestServer: Stop test server
    deactivate TestServer
    WorkRunner-->>Worker: Pack & upload Work-X report
    Worker->>WorkRunner: Finish
    WorkRunner->>UiAutomator: Uninstall AppUnderTest.apk
    UiAutomator-->>WorkRunner: Success!
    WorkRunner->>UiAutomator: Uninstall Test.apk
    UiAutomator-->>WorkRunner: Success!
    WorkRunner-xUiAutomator: Disconnect UiAutomator
    deactivate UiAutomator
    WorkRunner->>Worker: Deactivate DND
    activate Worker
    Worker-->>WorkRunner: Deactivated!
    deactivate Worker
    WorkRunner->>WorkRunner: Delete work data
    WorkRunner->>WorkRunner: Perform cleanup
    WorkRunner->>Worker: Runner finished!
    deactivate WorkRunner
    deactivate Worker

Note: Lines with an "x" represent asynchronous messages (issue in Mermaid).

The diagram below shows the finite system states and their respective transitions.

stateDiagram-v2
    [*] --> FindingWork: Start button clicked
    FindingWork --> [*]: Stop button clicked
    FindingWork --> DownloadingPayload: Work found
    DownloadingPayload --> FindingWork: Payload download failed
    DownloadingPayload --> UnpackingPayload: Payload downloaded
    UnpackingPayload --> FindingWork: Payload unpack failed
    UnpackingPayload --> InstallingWork: Payload unpacked
    InstallingWork --> FindingWork: Install failed
    InstallingWork --> PreparingRun: Payload installed
    PreparingRun --> FindingWork: Run preparation failed
    PreparingRun --> RunningWork: Run preparation successful
    RunningWork --> FindingWork: Run failed
    state RunningWork {
        [*] --> RunningTest
        RunningTest --> RunningTest: Retry tests
        RunningTest --> [*]
    }
    RunningWork --> FinalizingRun: Run completed
    FinalizingRun --> PackingResults: Run finalized
    PackingResults --> FindingWork: Result packing failed
    PackingResults --> UploadingResults: Result packed
    UploadingResults --> FindingWork: Upload result failed
    UploadingResults --> CleaningUpWork: Result uploaded
    CleaningUpWork --> FindingWork: Cleanup completed
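The transitions in the state diagram can be captured in a small lookup. The sketch below is only an illustration: WorkState, onSuccess, and onFailure are hypothetical names, not taken from the DART repository, and it leaves out the Start/Stop button edges and the nested RunningTest retry loop.

```kotlin
// Illustrative only: WorkState and its functions are hypothetical names,
// not the types used in the DART repository.
enum class WorkState(private val canFail: Boolean) {
    FINDING_WORK(false),
    DOWNLOADING_PAYLOAD(true),
    UNPACKING_PAYLOAD(true),
    INSTALLING_WORK(true),
    PREPARING_RUN(true),
    RUNNING_WORK(true),
    FINALIZING_RUN(false),
    PACKING_RESULTS(true),
    UPLOADING_RESULTS(true),
    CLEANING_UP_WORK(false);

    /** Forward edge of the diagram: the state entered when this step succeeds. */
    fun onSuccess(): WorkState = when (this) {
        FINDING_WORK -> DOWNLOADING_PAYLOAD
        DOWNLOADING_PAYLOAD -> UNPACKING_PAYLOAD
        UNPACKING_PAYLOAD -> INSTALLING_WORK
        INSTALLING_WORK -> PREPARING_RUN
        PREPARING_RUN -> RUNNING_WORK
        RUNNING_WORK -> FINALIZING_RUN
        FINALIZING_RUN -> PACKING_RESULTS
        PACKING_RESULTS -> UPLOADING_RESULTS
        UPLOADING_RESULTS -> CLEANING_UP_WORK
        CLEANING_UP_WORK -> FINDING_WORK
    }

    /** Failure edge: every step that can fail sends the worker back to finding work. */
    fun onFailure(): WorkState {
        check(canFail) { "The diagram has no failure transition out of $name" }
        return FINDING_WORK
    }
}
```

For example, WorkState.UPLOADING_RESULTS.onFailure() returns FINDING_WORK, while calling onFailure() on FINALIZING_RUN throws, because the diagram draws no failure edge out of that state.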

The next few sections cover individual system components. For the sake of brevity, I exclude code that is irrelevant to the topic at hand. For each component, I only focus on some key ideas. You can always explore the GitHub repo for the full source code.

3. 🤖 UiAutomator

The framework's UiAutomation can't be used when tests aren't run traditionally. Fortunately, the functionality of UiAutomation can be replicated using the platform Accessibility APIs and some other functionally equivalent alternatives. As evident from the documentation, UiAutomation can be viewed as an AccessibilityService with extras.

android.app.UiAutomation.java

/**
 * Class for interacting with the device's UI by simulation user actions and
 * introspection of the screen content. It relies on the platform accessibility
 * APIs to introspect the screen and to perform some actions on the remote view
 * tree. It also allows injecting of arbitrary raw input events simulating user
 * interaction with keyboards and touch devices. One can think of a UiAutomation
 * as a special type of {@link android.accessibilityservice.AccessibilityService}
 * which does not provide hooks for the service life cycle and exposes other
 * APIs that are useful for UI test automation.
 * <p>
 * The APIs exposed by this class are low-level to maximize flexibility when
 * developing UI test automation tools and libraries. Generally, a UiAutomation
 * client should be using a higher-level library or implement high-level functions.
 * For example, performing a tap on the screen requires construction and injecting
 * of a touch down and up events which have to be delivered to the system by a
 * call to {@link #injectInputEvent(InputEvent, boolean)}.
 * </p>
 * <p>
 * The APIs exposed by this class operate across applications enabling a client
 * to write tests that cover use cases spanning over multiple applications. For
 * example, going to the settings application to change a setting and then
 * interacting with another application whose behavior depends on that setting.
 * </p>
 */
public final class UiAutomation {
}

The alternative UiAutomation (internal) server can be implemented like this:

UiAutomationServer.kt

class UiAutomationServer(
    private val screenRotator: ScreenRotator,
    private val screenViewer: ScreenViewer,
    private val appPermissioner: AppPermissioner
): AutomationServer.Stub() {

    var lastAccessibilityEvent: AccessibilityEvent? = null
    var accessibilityEventListener: OnAccessibilityEventListener? = null

    // Called by the AccessibilityService for every event it receives
    fun onAccessibilityEvent(event: AccessibilityEvent) {
        lastAccessibilityEvent = event

        val uiEvent = addUiEvent(event)
        accessibilityEventListener?.onAccessibilityEvent(uiEvent)
    }

    override fun setOnAccessibilityEventListener(listener: OnAccessibilityEventListener?) {
        accessibilityEventListener = listener
    }

    override fun findFocus(focus: Int) =
        runWithOriginalIdentity { AutomationService.INSTANCE?.findFocus(focus) }

    override fun performGlobalAction(action: Int) =
        runWithOriginalIdentity { AutomationService.INSTANCE?.performGlobalAction(action) ?: false }

    override fun getLastEvent() = lastAccessibilityEvent

    override fun setRotation(rotation: Int) =
        runWithOriginalIdentity { screenRotator.setRotation(rotation) }

    override fun unfreezeCurrentRotation() =
        runWithOriginalIdentity { screenRotator.unfreezeCurrentRotation() }

    override fun freezeCurrentRotation() =
        runWithOriginalIdentity { screenRotator.freezeCurrentRotation() }

    override fun restoreInitialRotation() =
        runWithOriginalIdentity { screenRotator.restoreInitialSettings() }

    override fun getWindows(): List<AccessibilityWindowInfo> =
        runWithOriginalIdentity { AutomationService.INSTANCE?.windows ?: emptyList() }

    override fun findAccessibilityNodeInfosByText(nodeInfo: UiNodeInfo, text: String): List<UiNodeInfo> {
        return try {
            nodeInfo.accessibilityNodeInfo.findAccessibilityNodeInfosByText(text).map {
                addUiNodeInfo(it)
            }
        } catch (e: Exception) {
            Timber.e(e)
            emptyList()
        }
    }

    override fun findAccessibilityNodeInfosByViewId(nodeInfo: UiNodeInfo, viewId: String): List<UiNodeInfo> {
        return try {
            nodeInfo.accessibilityNodeInfo.findAccessibilityNodeInfosByViewId(viewId).map {
                addUiNodeInfo(it)
            }
        } catch (e: Exception) {
            Timber.e(e)
            emptyList()
        }
    }

    override fun performNodeAction(nodeInfo: UiNodeInfo, action: Int): Boolean {
        return nodeInfo.accessibilityNodeInfo.performAction(action)
    }

    override fun getUiEventSource(event: UiEvent): UiNodeInfo? {
        return event.accessibilityEvent.source?.let { addUiNodeInfo(it) }
    }

    override fun getRootInActiveWindow() =
        runWithOriginalIdentity { AutomationService.INSTANCE?.rootInActiveWindow }

    override fun getServiceInfo() =
        runWithOriginalIdentity { AutomationService.INSTANCE?.serviceInfo }

    override fun setServiceInfo(serviceInfo: AccessibilityServiceInfo): Boolean {
        return runWithOriginalIdentity {
            AutomationService.INSTANCE?.let {
                // TODO: Security on ServiceInfo from clients. This maybe should be validated/restricted.
                it.serviceInfo = serviceInfo
                return@runWithOriginalIdentity true
            }

            return@runWithOriginalIdentity false
        }
    }

    override fun takeScreenshot(): Bitmap? {
        return try {
            screenViewer.capture()
        } catch (e: Exception) {
            Timber.e(e, "Failed to take screenshot")
            null
        }
    }
}

// Runs the action with this process's own identity instead of the Binder caller's,
// then restores the caller's identity.
inline fun <R> runWithOriginalIdentity(action: () -> R): R {
    // Clear exactly once: a second call would return a token for our own identity,
    // and restoring that token would never bring back the original caller.
    val identity = Binder.clearCallingIdentity()
    try {
        return action()
    } finally {
        Binder.restoreCallingIdentity(identity)
    }
}

Not all functionality is directly available through the platform Accessibility APIs. For instance, DevicePolicyManager is used to grant runtime permissions & MediaProjection for device-wide screenshots.

class AppPermissioner(
    private val appContext: Context,
    private val devicePolicyManager: DevicePolicyManager,
    private val adminComponent: ComponentName
) {

    @TargetApi(Build.VERSION_CODES.M)
    fun grantRuntimePermissionAsUser(packageName: String, permission: String, userHandle: UserHandle) {
        devicePolicyManager.setPermissionGrantState(
            adminComponent,
            packageName,
            permission,
            DevicePolicyManager.PERMISSION_GRANT_STATE_GRANTED
        )
    }

    @TargetApi(Build.VERSION_CODES.M)
    fun revokeRuntimePermissionAsUser(packageName: String, permission: String, userHandle: UserHandle) {
        // Resetting to the default state hands the decision back to the user
        devicePolicyManager.setPermissionGrantState(
            adminComponent,
            packageName,
            permission,
            DevicePolicyManager.PERMISSION_GRANT_STATE_DEFAULT
        )
    }
}

Screenshot.kt

private const val DISPLAY_NAME_SCREENSHOT = "ScreenyScreenshotDisplay"
// Max images is set to 2 to allow [ImageReader.acquireLatestImage] to do its thing
private const val MAX_IMAGES = 2
private const val SCREENSHOT_TIMEOUT_MILLIS = 10_000L
private const val LOG_TAG = "Screenshot"

internal class Screenshot(
    private val mediaProjection: MediaProjection,
    private val backgroundHandler: Handler
) {

    fun capture(windowManager: WindowManager): Bitmap? {
        Log.i(LOG_TAG, "Capturing Screenshot!")
        val countDownLatch = CountDownLatch(1)
        val displayProps = windowManager.getDefaultDisplayProps()
        val imageReader = ImageReader.newInstance(displayProps.width, displayProps.height, PixelFormat.RGBA_8888, MAX_IMAGES)
        var bitmap: Bitmap? = null

        val virtualDisplay = mediaProjection.createVirtualDisplay(
            DISPLAY_NAME_SCREENSHOT,
            displayProps,
            imageReader.surface,
            backgroundHandler
        ) {
            countDownLatch.countDown()
        }

        imageReader.setOnImageAvailableListener({
            // Only the first frame is needed, so stop listening right away
            imageReader.setOnImageAvailableListener(null, null)
            val image = imageReader.acquireLatestImage()

            try {
                // Read the pixels before closing the image; a closed Image can no longer be accessed
                bitmap = image.getBitmap(displayProps)
                Log.i(LOG_TAG, "Screenshot captured!!")
            } catch (e: Exception) {
                Log.e(LOG_TAG, "Failed to get bitmap from image", e)
            } finally {
                image.closeCatching { Log.e(LOG_TAG, "Error closing Image", it) }
                imageReader.closeCatching { Log.e(LOG_TAG, "Error closing ImageReader", it) }
                virtualDisplay.surface = null
                countDownLatch.countDown()
            }
        }, backgroundHandler)

        try {
            // Block the caller until the frame is processed or the timeout elapses
            countDownLatch.await(SCREENSHOT_TIMEOUT_MILLIS, TimeUnit.MILLISECONDS)
        } finally {
            virtualDisplay.release()
        }

        return bitmap
    }
}

@WorkerThread
private fun Image.getBitmap(displayProps: DisplayProperties): Bitmap {
    val buffer: ByteBuffer = planes.first().buffer
    val pixelStride = planes.first().pixelStride
    val rowStride = planes.first().rowStride
    // Bytes of padding at the end of each row of the image buffer
    val rowPadding = rowStride - pixelStride * displayProps.width

    val bitmap = Bitmap.createBitmap(
        displayProps.width + (rowPadding.toFloat() / pixelStride.toFloat()).toInt(),
        displayProps.height,
        Bitmap.Config.ARGB_8888
    )

    bitmap.copyPixelsFromBuffer(buffer)
    return bitmap
}
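The width arithmetic in getBitmap is easier to follow in isolation. Here is a minimal sketch of the same calculation as a pure function; paddedBitmapWidth is a hypothetical helper, not something from the repository.

```kotlin
/**
 * Width, in pixels, of a bitmap that can hold one full row of an image buffer,
 * row padding included. Hypothetical helper mirroring the arithmetic in getBitmap.
 *
 * @param displayWidth visible width of the screen in pixels
 * @param pixelStride bytes between two adjacent pixels in a row
 * @param rowStride bytes between the starts of two adjacent rows
 */
fun paddedBitmapWidth(displayWidth: Int, pixelStride: Int, rowStride: Int): Int {
    require(pixelStride > 0) { "pixelStride must be positive" }
    // Bytes at the end of each row that carry no visible pixels
    val rowPaddingBytes = rowStride - pixelStride * displayWidth
    require(rowPaddingBytes >= 0) { "rowStride is too small for the given width" }
    // Convert the padding from bytes to whole pixels and widen the bitmap by that amount
    return displayWidth + rowPaddingBytes / pixelStride
}
```

With a 1080-pixel-wide display, a pixel stride of 4 bytes, and a row stride of 4352 bytes, each row carries 32 bytes of padding, so the bitmap has to be 1088 pixels wide. The extra 8 columns hold no screen content and can be cropped away afterwards if the exact display size matters.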

A custom class with the exact set of functions is used as a drop-in replacement for UiAutomation. Client test code either has to use this custom class directly or use a Gradle plugin that changes the import using bytecode manipulation.

The AccessibilityService runs in a dedicated process to isolate it from the work runner app. Also, we use a proxy service to communicate with the AccessibilityService because onBind is final.

4. 🕹️ TestServer

Once the AppUnderTest.apk and Test.apk have been installed, the TestServer calls the Context.startInstrumentation function to run the test cases.

android.content.Context

    /**
     * Start executing an {@link android.app.Instrumentation} class.  The given
     * Instrumentation component will be run by killing its target application
     * (if currently running), starting the target process, instantiating the
     * instrumentation component, and then letting it drive the application.
     *
     * <p>This function is not synchronous -- it returns as soon as the
     * instrumentation has started and while it is running.
     *
     * <p>Instrumentation is normally only allowed to run against a package
     * that is either unsigned or signed with a signature that the
     * the instrumentation package is also signed with (ensuring the target
     * trusts the instrumentation).
     *
     * @param className Name of the Instrumentation component to be run.
     * @param profileFile Optional path to write profiling data as the
     * instrumentation runs, or null for no profiling.
     * @param arguments Additional optional arguments to pass to the
     * instrumentation, or null.
     *
     * @return {@code true} if the instrumentation was successfully started,
     * else {@code false} if it could not be found.
     */
    public abstract boolean startInstrumentation(@NonNull ComponentName className,
            @Nullable String profileFile, @Nullable Bundle arguments);

For the arguments parameter, a Bundle is populated with configuration details from the server.

TestConfigs.kt

class TestConfigs(private val arguments: Bundle) {

    fun isObscureWindowEnabled() = arguments.getBoolean(ARG_OBSCURE_WINDOW_ENABLED)

    fun shouldRetrieveTestFiles() = arguments.getBoolean(ARG_RETRIEVE_TEST_FILES)

    fun shouldRetrieveAppFiles() = arguments.getBoolean(ARG_RETRIEVE_APP_FILES)

    fun getTestsCount() = arguments.getInt(ARG_TESTS_COUNT)

    fun getProfilerSampleFrequency() = arguments.getInt(ARG_PROFILER_SAMPLE_FREQUENCY)

    fun isClearDataEnabled() = arguments.getBoolean(ARG_CLEAR_DATA)

    fun isAutoScreenShotEnabled() = arguments.getBoolean(ARG_AUTO_SCREEN_SHOT_ENABLED)

    fun getAutoScreenShotFps() = arguments.getInt(ARG_AUTO_SCREEN_SHOT_FPS)

    fun getScreenShotQuality() = arguments.getInt(ARG_AUTO_SCREEN_QUALITY)
}

TestCallback.aidl is defined for TestClient <> TestServer communication.

TestCallback.aidl

interface TestCallback {

    void onTestRunStarted(in TestDescription description);

    void onTestRunFinished(in TestResult result);

    void onTestStarted(
        in TestDescription description,
        String logFileName,
        String profilerFileName,
        String autoScreenShotNamePrefix
    );

    void onTestFinished(in TestDescription description);

    void onTestFailure(in TestFailure failure);

    void onTestAssumptionFailure(in TestFailure failure);

    void onTestIgnored(in TestDescription description);

    void onProcessCrashed(in TestDescription failure, String stackTrace);

    void onClientConnected(Finisher finisher);

    void onInterrupted(int reasonId);

    void sendString(String message);
}

4.1. 💾 Remote Storage

Logs and other artifacts need to be "streamed" to the TestServer. This data needs to be stored in the server's private directory but the Android security model doesn't allow foreign apps to write directly into an app's private directories. This limitation can be circumvented using a ContentProvider and FileDescriptors.

RemoteStorageConstants.kt

object RemoteStorageConstants {

    const val PREFIX_CONTENT = "content://"
    const val AUTHORITY = "com.fluentbuild.apollo.runtime.remotestorage"
    const val BASE_URI = "${PREFIX_CONTENT}${AUTHORITY}/"

    const val MODE_READ = "r"
    const val MODE_WRITE = "w"
    const val MODE_APPEND = "wa"
}

RemoteStorageProvider.kt

private const val REMOTE_STORAGE_DIR = "stash"

class RemoteStorageProvider: ContentProvider() {

    override fun openFile(uri: Uri, mode: String): ParcelFileDescriptor? {
        // Everything after the base URI is treated as a path relative to the storage directory
        val child = uri.toString().substringAfter(RemoteStorageConstants.BASE_URI)
        if(child.isBlank()) return null

        return try {
            File(getDir(context!!), child).run {
                Timber.i("Opening file: %s with mode: %s", absolutePath, mode)
                parentFile?.mkdirs()
                openParcelFileDescriptor(mode)
            }
        } catch (e: Exception) {
            // Log instead of silently swallowing the failure
            Timber.e(e, "Failed to open file for uri: %s", uri)
            null
        }
    }

    companion object {

        fun getDir(context: Context) = File(context.filesDir, REMOTE_STORAGE_DIR).apply { mkdirs() }
    }
}
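Since the child path comes straight from a URI supplied by another app, it may be worth checking that it can't climb out of the storage directory with "..". A minimal sketch of such a guard, assuming a hypothetical resolveInside helper that openFile could call before opening the descriptor:

```kotlin
import java.io.File

/**
 * Resolves [child] against [baseDir] and returns the resulting file, or null when
 * the path is blank or would end up outside [baseDir].
 * Hypothetical helper for illustration; it is not part of the DART repository.
 */
fun resolveInside(baseDir: File, child: String): File? {
    if (child.isBlank()) return null

    // Canonical paths have "." and ".." segments (and symlinks) resolved
    val base = baseDir.canonicalFile
    val target = File(base, child).canonicalFile

    // Path.startsWith compares whole path segments, so "stash2" does not match "stash"
    val staysInside = target.toPath().startsWith(base.toPath()) && target != base
    return if (staysInside) target else null
}
```

A request for "logs/test-1.txt" resolves to a file under the storage directory, while "../escape.txt" is rejected because its canonical path lands in the parent directory.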

5. 🧫 TestClient

The TestClient is included in the Test.apk and controls test execution. It is part of the "com.fluentbuild.apollo:client:version" artifact.

TestClient.kt

private const val LOG_TAG = "TestClient"
private const val RUNNER_PACKAGE = "com.fluentbuild.apollo"
private const val RUNNER_SERVICE = "com.fluentbuild.apollo.RunnerService"
private const val SERVICE_CONNECTION_TIMEOUT_MILLIS = 10_000L

/**
 * For the current instrumentation to communicate information back to the RuntimeService.
 *
 */
class TestClient(
    private val instrumentation: Instrumentation,
    private val clientFinalizer: ClientFinalizer,
    private val logWrapper: LogWrapper
): WorkInterruptCallback {

    private val connectionLatch = CountDownLatch(1)

    @Volatile
    private lateinit var testCallback: TestCallback

    private val serviceConnection = object : ServiceConnection {

        override fun onServiceConnected(className: ComponentName, service: IBinder) {
            logWrapper.i(LOG_TAG, "TestClient connected to runner service")
            testCallback = TestCallback.Stub.asInterface(service)
            connectionLatch.countDown()
        }

        override fun onServiceDisconnected(className: ComponentName) {
            logWrapper.e(LOG_TAG, "TestClient is disconnected from runner service")
            instrumentation.finishInstrumentation(Activity.RESULT_CANCELED)
        }
    }

    // Called on the test thread
    fun connect() {
        logWrapper.i(LOG_TAG, "Connecting to runner service")

        val intent = Intent()
        intent.setClassName(RUNNER_PACKAGE, RUNNER_SERVICE)

        instrumentation.context.requireServiceBind(intent, serviceConnection)

        if(!connectionLatch.await(SERVICE_CONNECTION_TIMEOUT_MILLIS, TimeUnit.MILLISECONDS)) {
            unbindService()
            throw TimeoutException("Couldn't connect to runner service")
        }

        onClientConnected()
    }

    private fun onClientConnected() {
        try {
            testCallback.onClientConnected(object : Finisher.Stub() {

                override fun finish(resultCode: Int) {
                    logWrapper.i(LOG_TAG, "Instrumentation finish requested")
                    clientFinalizer.finalize()
                    instrumentation.finishInstrumentation(resultCode)
                }
            })
        } catch (e: RemoteException) {
            handleRemoteFailure("Unable to notify runner service of connection!", e)
        }
    }

    @MainThread
    fun disconnect() {
        unbindService()
    }

    // Just simple test callbacks below
}
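The connect function turns an asynchronous callback into a blocking call with a timeout. The same pattern, stripped of the Android pieces, looks roughly like this. AsyncConnector is a hypothetical class for illustration; in TestClient its role is played by the ServiceConnection together with the latch.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeoutException

/**
 * Hypothetical stand-in for an asynchronous bind: one thread waits in
 * [awaitConnection] while another delivers the result through [onConnected].
 */
class AsyncConnector<T : Any>(private val timeoutMillis: Long) {

    private val latch = CountDownLatch(1)

    @Volatile
    private var connection: T? = null

    /** Called from the callback thread once the connection is available. */
    fun onConnected(value: T) {
        connection = value
        latch.countDown()
    }

    /** Blocks until [onConnected] has been called, or throws once the timeout elapses. */
    fun awaitConnection(): T {
        if (!latch.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw TimeoutException("Couldn't connect within $timeoutMillis ms")
        }
        return checkNotNull(connection) { "Latch released without a connection" }
    }
}
```

The wait has to happen on a different thread from the one that delivers the callback, otherwise the latch can never be released. That matters here because Android delivers ServiceConnection callbacks on the process's main thread, which is presumably why connect is called on the test thread.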

TestClient works in tandem with an instance of Instrumentation. A custom instance is required to be able to intercept Instrumentation callbacks. In the "com.fluentbuild.apollo:client:version" artifact, an implementation that subclasses AndroidJUnitRunner is provided. In this implementation, I used a reflection hack to hook into the runner's InstrumentationResultPrinter, in effect registering an org.junit.runner.notification.RunListener in AndroidJUnitRunner. The printer is replaced with the TestObserver, which captures test states and eventually calls into the real printer.

private fun init(testConfigs: TestConfigs) {
    val printerField = getPrinterField()
    // Keep the original printer so the observer can delegate to it
    val printer = printerField.get(runner) as InstrumentationResultPrinter
    initializer.init(testConfigs, printer)
    printerField.set(runner, initializer.getTestObserver())
}

private fun getPrinterField(): Field {
    return AndroidJUnitRunner::class.java.getDeclaredField("instrumentationResultPrinter")
        .apply { isAccessible = true }
}
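The same swap-and-delegate trick can be tried on plain JVM classes. In this sketch, Printer, Runner and ObservingPrinter are hypothetical stand-ins for InstrumentationResultPrinter, AndroidJUnitRunner and TestObserver; only the reflection calls are real API.

```kotlin
// Stand-ins for InstrumentationResultPrinter and AndroidJUnitRunner (hypothetical names).
open class Printer {
    open fun testStarted(name: String): String = "printed $name"
}

class Runner {
    private var printer: Printer = Printer()
    fun runTest(name: String): String = printer.testStarted(name)
}

// Wraps the original printer: records the event, then delegates to the real one.
class ObservingPrinter(
    private val wrapped: Printer,
    private val seen: MutableList<String>
) : Printer() {
    override fun testStarted(name: String): String {
        seen += name
        return wrapped.testStarted(name)
    }
}

fun installObserver(runner: Runner, seen: MutableList<String>) {
    // A private Kotlin property compiles to a private JVM field of the same name.
    val field = Runner::class.java.getDeclaredField("printer").apply { isAccessible = true }
    val original = field.get(runner) as Printer
    field.set(runner, ObservingPrinter(original, seen))
}
```

The obvious weakness is that the field name is an implementation detail, so a library update that renames it would break the hook at runtime rather than at compile time.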

Publishers that don't use AndroidJUnitRunner can still use the client artifact by calling the relevant functions directly.

5.2. 🔐 Runtime Permissions

UiAutomation can be used to grant runtime permissions. An extra layer, similar to the one in the androidx.test.runner.permission package, can be created to simplify this flow.

PermissionGranter.kt

/**
 * Requests runtime permissions on devices running Android M (API 23) and above.
 *
 * This class is usually used to grant runtime permissions so that the permission dialog does not
 * show up and block the app's UI. This is especially helpful for UI testing, to avoid losing
 * control over the application under test.
 *
 * The requested permissions will be granted for all test methods in the test class. Use
 * [addPermissions] to add permissions to the permission list. To request all
 * permissions, use the [requestPermissions] method.
 */
interface PermissionGranter {

    /**
     * Adds permissions to the list of permissions which will be requested when
     * [requestPermissions] is called.
     *
     * Precondition: This method does nothing when called on an API level lower than
     * [Build.VERSION_CODES.M].
     *
     * @param permissions one or more Android runtime permissions.
     */
    fun addPermissions(vararg permissions: String)

    /**
     * Requests all permissions previously added using [addPermissions].
     *
     * Precondition: This method does nothing when called on an API level lower than
     * [Build.VERSION_CODES.M].
     */
    fun requestPermissions()
}
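One way this interface could be implemented is by issuing pm grant shell commands. The sketch below is an assumption, not DART's implementation: ShellPermissionGranter is a hypothetical class, and runShell stands in for a wrapper around UiAutomation.executeShellCommand().

```kotlin
interface PermissionGranter {
    fun addPermissions(vararg permissions: String)
    fun requestPermissions()
}

// Hypothetical implementation: `runShell` stands in for a wrapper around
// UiAutomation.executeShellCommand(); it is not part of DART.
class ShellPermissionGranter(
    private val packageName: String,
    private val runShell: (String) -> Unit
) : PermissionGranter {

    // A set, so that adding the same permission twice grants it only once.
    private val pending = linkedSetOf<String>()

    override fun addPermissions(vararg permissions: String) {
        pending.addAll(permissions)
    }

    override fun requestPermissions() {
        // `pm grant <package> <permission>` grants a runtime permission without a dialog.
        pending.forEach { runShell("pm grant $packageName $it") }
        pending.clear()
    }
}
```

Injecting the shell executor keeps the granter testable off-device; a real version would also need the API-level guard described in the KDoc.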

5.3. 🔬 Components Monitor

The components of the AppUnderTest are closely monitored to power some telemetry functionalities. For each type of component, we create an implementation of the Monitor interface.

Monitor.kt

abstract class Monitor<CallbackT> {

    // Weak references, so a registered callback never keeps its owner alive
    protected val callbacks = mutableListOf<WeakReference<CallbackT>>()

    internal fun registerCallback(callback: CallbackT) {
        if (callbacks.none { it.get() == callback }) {
            callbacks += WeakReference(callback)
        }
    }

    internal fun unregisterCallback(callback: CallbackT) {
        callbacks.removeAll { it.get() == callback }
    }
}
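To see how a weak-reference registry behaves, here is a standalone variant that can run on any JVM. WeakCallbacks is a hypothetical name, and the pruning step in dispatch is my addition for illustration rather than something Monitor does.

```kotlin
import java.lang.ref.WeakReference

// Simplified, standalone variant of Monitor (the pruning step is an addition, not DART's code).
class WeakCallbacks<T : Any> {
    private val refs = mutableListOf<WeakReference<T>>()

    val size: Int get() = refs.size

    fun register(callback: T) {
        if (refs.none { it.get() === callback }) refs += WeakReference(callback)
    }

    fun unregister(callback: T) {
        refs.removeAll { it.get() === callback }
    }

    // Invokes `action` on every live callback and drops references the GC has cleared.
    fun dispatch(action: (T) -> Unit) {
        refs.removeAll { it.get() == null }
        refs.toList().forEach { it.get()?.let(action) }
    }
}
```

One consequence of this design is that the caller must hold a strong reference to its callback; an anonymous object registered and then dropped may silently stop receiving events after a garbage collection.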

AppMonitor monitors the Application lifecycle.

AppMonitor.kt

class AppMonitor: Monitor<AppMonitor.Callback>() {

    private var appRef: WeakReference<Application>? = null

    private fun signalLifecycleChange(application: Application, stage: ApplicationStats.Stage) {
        callbacks.forEach { it.get()?.onStageChanged(application, stage) }
    }

    // `action` performs the real Application.onCreate() call
    fun onCallApplicationOnCreate(app: Application, action: () -> Unit) {
        appRef = WeakReference(app)
        signalLifecycleChange(app, ApplicationStats.Stage.PRE_ON_CREATE)
        action()
        signalLifecycleChange(app, ApplicationStats.Stage.CREATED)
    }

    fun getActiveApp(): Application? = appRef?.get()

    interface Callback {
        fun onStageChanged(app: Application, stage: ApplicationStats.Stage)
    }
}

ActivityMonitor monitors the lifecycle of all activities in the AppUnderTest. The action function lets the monitor control when the caller (the Instrumentation) passes the event downstream.

ActivityMonitor.kt

class ActivityMonitor: Monitor<ActivityMonitor.Callback>() {

    // Weak keys, so a leaked entry never keeps a destroyed Activity alive
    private val activeActivities = WeakHashMap<Activity, ActivityStats.Stage>()

    private fun signalLifecycleChange(activity: Activity, stage: ActivityStats.Stage) {
        activeActivities[activity] = stage
        callbacks.forEach {
            it.get()?.onStageChanged(activity, stage)
        }
    }

    fun onCallActivityOnDestroy(activity: Activity, action: () -> Unit) {
        signalLifecycleChange(activity, ActivityStats.Stage.DESTROYED)
        action()
        activeActivities.remove(activity)
    }

    fun onCallActivityOnRestart(activity: Activity, action: () -> Unit) {
        action()
        signalLifecycleChange(activity, ActivityStats.Stage.RESTARTED)
    }

    fun onCallActivityOnCreate(activity: Activity, bundle: Bundle?, action: () -> Unit) {
        signalLifecycleChange(activity, ActivityStats.Stage.PRE_ON_CREATE)
        action()
        signalLifecycleChange(activity, ActivityStats.Stage.CREATED)
    }

    fun onCallActivityOnCreate(
        activity: Activity,
        bundle: Bundle?,
        persistentState: PersistableBundle?, // the platform may pass null here
        action: () -> Unit
    ) {
        signalLifecycleChange(activity, ActivityStats.Stage.PRE_ON_CREATE)
        action()
        signalLifecycleChange(activity, ActivityStats.Stage.CREATED)
    }

    fun onCallActivityOnStart(activity: Activity, action: () -> Unit) {
        action()
        signalLifecycleChange(activity, ActivityStats.Stage.STARTED)
    }

    fun onCallActivityOnStop(activity: Activity, action: () -> Unit) {
        action()
        signalLifecycleChange(activity, ActivityStats.Stage.STOPPED)
    }

    fun onCallActivityOnResume(activity: Activity, action: () -> Unit) {
        action()
        signalLifecycleChange(activity, ActivityStats.Stage.RESUMED)
    }

    fun onCallActivityOnPause(activity: Activity, action: () -> Unit) {
        action()
        signalLifecycleChange(activity, ActivityStats.Stage.PAUSED)
    }

    fun getActiveActivities(): Set<Activity> {
        return activeActivities.keys
    }

    interface Callback {
        fun onStageChanged(activity: Activity, stage: ActivityStats.Stage)
    }
}
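The ordering guarantee that the action parameter provides can be modelled without Android. In this sketch, Stage and CreateMonitor are hypothetical stand-ins; in a real Instrumentation subclass, action would presumably be the super.callActivityOnCreate(...) call.

```kotlin
enum class Stage { PRE_ON_CREATE, CREATED }

// Standalone model of the pattern: `action` stands in for the
// super.callActivityOnCreate(...) call made by the Instrumentation.
class CreateMonitor(private val onStage: (Stage) -> Unit) {

    fun onCallCreate(action: () -> Unit) {
        onStage(Stage.PRE_ON_CREATE) // observers run before the real onCreate()
        action()                     // the real lifecycle call happens here
        onStage(Stage.CREATED)       // observers run after onCreate() has returned
    }
}
```

Passing the lifecycle call in as a lambda is what lets the monitor emit one event before it and one after it; a plain notification method could only do one or the other.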

5.4. 🔂 JUnit

Custom classes must implement Parcelable to be passed in AIDL IPC calls. Because of this constraint, the following model classes were created to pass notifications from JUnit to the server.

Models.kt

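// Assumption: the Parcelable implementation is generated by the kotlin-parcelize
// Gradle plugin via @Parcelize; the original listing omits it.
import android.os.Parcelable
import kotlinx.parcelize.Parcelize

@Parcelize
data class TestDescription(
    val className: String,
    val methodName: String?,
    val displayName: String
): Parcelable

@Parcelize
data class TestFailure(
    val description: TestDescription,
    val trace: String
): Parcelable

@Parcelize
data class TestResult(
    val runtimeMillis: Long,
    val ignoreCount: Int,
    val failures: List<TestFailure>
): Parcelable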

The JUnit models are mapped to the model classes defined above.

ModelsMapper.kt

private const val MAX_TRACE_SIZE = 64 * 1024

internal fun Description.createTestModel(): TestDescription {
    return TestDescription(className, methodName, displayName)
}

internal fun Failure.createTestModel(): TestFailure {
    var stackTrace = trace

    if (stackTrace.length > MAX_TRACE_SIZE) {
        // Since we report failures back to the runtime via a binder IPC, we need to make sure that
        // we don't exceed the Binder transaction buffer, which is 1MB shared by all in-flight
        // transactions of a process.
        Log.w(LOG_TAG, "Stack trace too long, trimmed to first $MAX_TRACE_SIZE characters.")
        stackTrace = trace.substring(0, MAX_TRACE_SIZE) + "\n"
    }

    return TestFailure(description.createTestModel(), stackTrace)
}

internal fun Result.createTestModel(): TestResult {
    return TestResult(runTime, ignoreCount, failures.map { it.createTestModel() })
}
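The trimming rule is easy to check in isolation. This is a small sketch with a hypothetical trimTrace helper that mirrors the logic in Failure.createTestModel(); the default limit matches MAX_TRACE_SIZE.

```kotlin
// Hypothetical helper mirroring the trimming logic in Failure.createTestModel().
fun trimTrace(trace: String, maxChars: Int = 64 * 1024): String =
    if (trace.length > maxChars) {
        // Keep only the head of the trace; the first frames are usually the most useful.
        trace.substring(0, maxChars) + "\n"
    } else {
        trace
    }
```

As a rough estimate, assuming strings are parcelled as UTF-16 (two bytes per character), a 64K-character trace takes about 128 KB, which leaves plenty of headroom in the 1MB buffer even with several failures in flight.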

The TestObserver is an instance of org.junit.runner.notification.RunListener and is notified of events that occur during a test run. These events are passed to the TestClient, which then passes them to the TestServer.

TestObserver.kt

internal class TestObserver(
    private val testClient: TestClient,
    private val wrappedPrinter: InstrumentationResultPrinter,
    private val clientFinalizer: ClientFinalizer,
    private val collatorsManager: CollatorsManager
): InstrumentationResultPrinter() {

    private var startedCount = 0
    private var lastStartedTest: Description? = null

    // Every callback forwards to the TestClient first, then to the real printer

    override fun testRunStarted(description: Description) {
        testClient.testRunStarted(description)
        wrappedPrinter.testRunStarted(description)
    }

    override fun testStarted(description: Description) {
        lastStartedTest = description
        startedCount++
        testClient.testStarted(description, collatorsManager.getInfo())
        wrappedPrinter.testStarted(description)
    }

    override fun testAssumptionFailure(failure: Failure) {
        testClient.testAssumptionFailure(failure)
        restartMeasurement()
        wrappedPrinter.testAssumptionFailure(failure)
    }

    override fun testRunFinished(result: Result) {
        testClient.testRunFinished(result)
        wrappedPrinter.testRunFinished(result)
    }

    override fun sendString(msg: String) {
        testClient.sendString(msg)
        wrappedPrinter.sendString(msg)
    }

    override fun instrumentationRunFinished(
        summaryWriter: PrintStream,
        resultBundle: Bundle,
        junitResults: Result
    ) {
        clientFinalizer.finalize()
        wrappedPrinter.instrumentationRunFinished(summaryWriter, resultBundle, junitResults)
    }

    override fun testFailure(failure: Failure) {
        testClient.testFailure(failure)
        restartMeasurement()
        wrappedPrinter.testFailure(failure)
    }

    override fun testFinished(description: Description) {
        testClient.testFinished(description)
        restartMeasurement()
        wrappedPrinter.testFinished(description)
    }

    override fun testIgnored(description: Description) {
        testClient.testIgnored(description)
        restartMeasurement()
        wrappedPrinter.testIgnored(description)
    }

    override fun reportProcessCrash(throwable: Throwable) {
        testClient.processCrashed(Failure(lastStartedTest, throwable))
        restartMeasurement()
        wrappedPrinter.reportProcessCrash(throwable)
    }

    private fun restartMeasurement() {
        collatorsManager.restart()
    }
}

5.5. 👀 Obscuring Screen

OverlayView is a simple custom fullscreen opaque view used to obscure Activities when requested. The overlay is attached to the window immediately after an Activity is created. I haven't noticed any interference yet between the overlay and test UI interactions.

WindowOverlay.kt

internal class WindowOverlay(activity: Activity) {

    private val overlayView = OverlayView(activity)

    init {
        activity.window.addContentView(overlayView.rootView, getWindowParams())
    }

    fun updateLabel(labelText: String) {
        overlayView.updateLabel(labelText)
    }

    fun getRoot() = overlayView.rootView

    private fun getWindowParams(): WindowManager.LayoutParams {
        val type = if (AndroidVersion.isAtLeastOreo()) {
            WindowManager.LayoutParams.TYPE_APPLICATION_OVERLAY
        } else {
            @Suppress("DEPRECATION")
            WindowManager.LayoutParams.TYPE_PHONE
        }

        // The overlay must not swallow touches or focus, so test input reaches the Activity
        val flags = WindowManager.LayoutParams.FLAG_NOT_TOUCHABLE or
                WindowManager.LayoutParams.FLAG_NOT_FOCUSABLE or
                WindowManager.LayoutParams.FLAG_KEEP_SCREEN_ON or
                WindowManager.LayoutParams.FLAG_FULLSCREEN

        return WindowManager.LayoutParams(
            WindowManager.LayoutParams.MATCH_PARENT,
            WindowManager.LayoutParams.MATCH_PARENT,
            type,
            flags,
            PixelFormat.TRANSPARENT
        )
    }
}

5.6. 🤏 User Interrupts

Any external interaction should immediately stop test execution to prevent interference. The NavigationInteractionObserver detects when a user presses the Home or Recents button.

NavigationInteractionObserver.kt

private const val SYSTEM_DIALOG_REASON_KEY = "reason"
private const val SYSTEM_DIALOG_REASON_GLOBAL_ACTIONS = "globalactions"
private const val SYSTEM_DIALOG_REASON_RECENT_APPS = "recentapps"
private const val SYSTEM_DIALOG_REASON_HOME_KEY = "homekey"

internal class NavigationInteractionObserver(private val callback: Callback): BroadcastReceiver() {

    override fun onReceive(context: Context, intent: Intent) {
        // The "reason" extra tells which system action closed the dialogs
        val reason = intent.getStringExtra(SYSTEM_DIALOG_REASON_KEY)

        if (reason == SYSTEM_DIALOG_REASON_HOME_KEY) {
            callback.onHomePressed()
        } else if (reason == SYSTEM_DIALOG_REASON_RECENT_APPS) {
            callback.onRecentAppsPressed()
        }
    }

    fun start(context: Context) {
        context.registerReceiver(this, IntentFilter(Intent.ACTION_CLOSE_SYSTEM_DIALOGS))
    }

    fun stop(context: Context) {
        context.unregisterReceiver(this)
    }

    interface Callback {

        fun onHomePressed()

        fun onRecentAppsPressed()
    }
}

The WindowInteractionObserver detects when a user taps on the screen.

WindowInteractionObserver.kt

internal class WindowInteractionObserver(private val callback: Callback): ActivityMonitor.Callback {

    // evaluateUserMotion() and evaluateUserKeyInput() are not shown here

    override fun onStageChanged(activity: Activity, stage: ActivityStats.Stage) {
        if (stage == ActivityStats.Stage.CREATED) {
            // Wrap the existing window callback so every event still reaches the Activity
            activity.window.callback = object: WindowCallbackWrapper(activity.window.callback) {

                override fun dispatchTouchEvent(event: MotionEvent): Boolean {
                    evaluateUserMotion(event)
                    return super.dispatchTouchEvent(event)
                }

                override fun dispatchKeyEvent(event: KeyEvent): Boolean {
                    evaluateUserKeyInput(event)
                    return super.dispatchKeyEvent(event)
                }
            }
        }
    }

    interface Callback {

        fun onWindowTouchedByUser()

        fun onKeyPressedByUser()
    }
}

5.7. 💾 Remote Storage

The client can utilize the server's ContentProvider to save files in the server's private directory.

FileDescriptorProvider.kt

class FileDescriptorProvider(private val context: Context) {

    @Throws(Exception::class)
    internal fun getReadableDescriptor(fileName: String) = getDescriptor(fileName, MODE_READ)

    @Throws(Exception::class)
    internal fun getWritableDescriptor(fileName: String) = getDescriptor(fileName, MODE_WRITE)

    @Throws(Exception::class)
    internal fun getAppendableDescriptor(fileName: String) = getDescriptor(fileName, MODE_APPEND)

    // Opens the file through the server's ContentProvider and fails fast on a broken descriptor
    @Throws(Exception::class)
    private fun getDescriptor(fileName: String, mode: String): ParcelFileDescriptor =
        context.contentResolver.openFileDescriptor(getFileUri(fileName), mode)!!.apply { checkError() }

    private fun getFileUri(filePath: String) = Uri.parse("${BASE_URI}${filePath}")
}

5.8. 🧹 Finalizing

After running all tests or when the AppUnderTest crashes, all miscellaneous artifacts are moved to the TestServer.

ArtifactsCopier.kt

class ArtifactsCopier(
    private val targetContext: Context,
    private val instrumentationContext: Context,
    private val testConfigs: TestConfigs,
) {

    fun copy() {
        // Files written by the test APK itself
        if (testConfigs.shouldRetrieveTestFiles()) {
            copyDir(destinationDir = "test_files", sourceDir = instrumentationContext.filesDir)
        }

        // Files written by the app under test
        if (testConfigs.shouldRetrieveAppFiles()) {
            copyDir(destinationDir = "app_files", sourceDir = targetContext.filesDir)
        }
    }
}
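The copyDir helper is not shown above. A rough sketch of what such a helper might look like follows; the openSink parameter is my assumption, standing in for something like FileDescriptorProvider.getWritableDescriptor() wrapped in an output stream.

```kotlin
import java.io.File
import java.io.OutputStream

// Hypothetical stand-in for the elided copyDir(): `openSink` would wrap the
// server-side file descriptor in the real client. Returns the number of files copied.
fun copyDir(sourceDir: File, destinationDir: String, openSink: (String) -> OutputStream): Int {
    var copied = 0
    sourceDir.walkTopDown().filter { it.isFile }.forEach { file ->
        // Keep the path relative to the source root so the tree is preserved remotely.
        val relative = file.relativeTo(sourceDir).invariantSeparatorsPath
        openSink("$destinationDir/$relative").use { sink ->
            file.inputStream().use { input -> input.copyTo(sink) }
        }
        copied++
    }
    return copied
}
```

Streaming each file through copyTo avoids loading whole artifacts into memory, which matters for large databases or media files produced during a test run.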

ClearDataTask.kt

internal class ClearDataTask(private val targetContext: Context): Task {

    override fun run() {
        targetContext.externalMediaDirs.forEach { it?.deleteRecursively() }
        targetContext.noBackupFilesDir.deleteRecursively()

        targetContext.cacheDir.deleteRecursively()
        targetContext.codeCacheDir.deleteRecursively()
        targetContext.externalCacheDir?.deleteRecursively()

        targetContext.filesDir.deleteRecursively()
        // Context.dataDir only exists from API 24; fall back to the parent of filesDir
        if (AndroidVersion.isAtLeastNougat()) {
            targetContext.dataDir.deleteRecursively()
        } else {
            targetContext.filesDir.parentFile?.deleteRecursively()
        }
    }
}

5.9. ❌ Cancelling

When an error occurs transmitting test state, or when the server connection is broken, the client automatically cancels all executions. The server can also cancel the client using the Finisher interface.

Finisher.aidl

interface Finisher {
    void finish(int resultCode);
}

When the connection to the client is severed, the server uses the CancelSignalObserver in a last-ditch effort to terminate the client. The CancelSignalObserver is a BroadcastReceiver that is tied to the lifecycle of the test process.

CancelSignalObserver.kt

class CancelSignalObserver(private val instrumentation: Instrumentation): BroadcastReceiver() {

    override fun onReceive(context: Context, intent: Intent) {
        if (intent.action == ACTION_CANCEL_SIGNAL) {
            instrumentation.finishInstrumentation(Activity.RESULT_CANCELED)
        }
    }

    fun start() {
        instrumentation.targetContext.registerReceiver(
            this,
            IntentFilter(ACTION_CANCEL_SIGNAL)
        )
    }

    fun stop() {
        instrumentation.targetContext.unregisterReceiver(this)
    }
}

6. ⏱️ Telemetry

While tests are running, performance & device health data is periodically collated and stored. Protocol buffers are used for data serialization.

MemoryStats.proto

message MemoryStats {

    /** The proportional set size for dalvik heap.  (Doesn't include other Dalvik overhead.) */
    int32 appDalvikPss = 1;

    /** The private dirty pages used by dalvik heap. */
    int32 appDalvikPrivateDirty = 2;

    /** The shared dirty pages used by dalvik heap. */
    int32 appDalvikSharedDirty = 3;

    /** The proportional set size for the native heap. */
    int32 appNativePss = 4;

    /** The private dirty pages used by the native heap. */
    int32 appNativePrivateDirty = 5;

    /** The shared dirty pages used by the native heap. */
    int32 appNativeSharedDirty = 6;

    /** The proportional set size for everything else. */
    int32 appOtherPss = 7;

    /** The private dirty pages used by everything else. */
    int32 appOtherPrivateDirty = 8;

    /** The shared dirty pages used by everything else. */
    int32 appOtherSharedDirty = 9;

    /**
     * The total memory accessible by the kernel.  This is basically the
     * RAM size of the device, not including below-kernel fixed allocations
     * like DMA buffers, RAM for the baseband CPU, etc.
     */
    int64 systemTotalSizeBytes = 10;

    /**
     * The available memory on the system.  This number should not
     * be considered absolute: due to the nature of the kernel, a significant
     * portion of this memory is actually in use and needed for the overall
     * system to run well.
     */
    int64 systemAvailableSizeBytes = 11;

    /**
     * The threshold of available memory at which we consider memory to be
     * low and start killing background services and other non-extraneous
     * processes.
     */
    int64 systemThresholdSizeBytes = 12;

    int32 relativeTime = 13;
}

NetworkStats.proto

message NetworkStats {

    /**
     * Return number of packets received by the given UID since device boot.
     * Counts packets across all network interfaces, and always increases
     * monotonically since device boot. Statistics are measured at the network
     * layer, so they include both TCP and UDP usage.
     * <p>
     * Before {@link android.os.Build.VERSION_CODES#JELLY_BEAN_MR2}, this may return
     * {@link #UNSUPPORTED} on devices where statistics aren't available.
     * <p>
     * Starting in {@link android.os.Build.VERSION_CODES#N} this will only
     * report traffic statistics for the calling UID. It will return
     * {@link #UNSUPPORTED} for all other UIDs for privacy reasons. To access
     * historical network statistics belonging to other UIDs, use
     * {@link NetworkStatsManager}.
     *
     * @see android.os.Process#myUid()
     * @see android.content.pm.ApplicationInfo#uid
     */
    int64 rxPackets = 1;

    /**
     * Return number of packets transmitted by the given UID since device boot.
     * Counts packets across all network interfaces, and always increases
     * monotonically since device boot. Statistics are measured at the network
     * layer, so they include both TCP and UDP usage.
     * <p>
     * Before {@link android.os.Build.VERSION_CODES#JELLY_BEAN_MR2}, this may return
     * {@link #UNSUPPORTED} on devices where statistics aren't available.
     * <p>
     * Starting in {@link android.os.Build.VERSION_CODES#N} this will only
     * report traffic statistics for the calling UID. It will return
     * {@link #UNSUPPORTED} for all other UIDs for privacy reasons. To access
     * historical network statistics belonging to other UIDs, use
     * {@link NetworkStatsManager}.
     *
     * @see android.os.Process#myUid()
     * @see android.content.pm.ApplicationInfo#uid
     */
    int64 txPackets = 2;

    /**
     * Return number of bytes transmitted by the given UID since device boot.
     * Counts packets across all network interfaces, and always increases
     * monotonically since device boot. Statistics are measured at the network
     * layer, so they include both TCP and UDP usage.
     * <p>
     * Before {@link android.os.Build.VERSION_CODES#JELLY_BEAN_MR2}, this may
     * return {@link #UNSUPPORTED} on devices where statistics aren't available.
     * <p>
     * Starting in {@link android.os.Build.VERSION_CODES#N} this will only
     * report traffic statistics for the calling UID. It will return
     * {@link #UNSUPPORTED} for all other UIDs for privacy reasons. To access
     * historical network statistics belonging to other UIDs, use
     * {@link NetworkStatsManager}.
     *
     * @see android.os.Process#myUid()
     * @see android.content.pm.ApplicationInfo#uid
     */
    int64 txBytes = 3;

    /**
     * Return number of bytes received by the given UID since device boot.
     * Counts packets across all network interfaces, and always increases
     * monotonically since device boot. Statistics are measured at the network
     * layer, so they include both TCP and UDP usage.
     * <p>
     * Before {@link android.os.Build.VERSION_CODES#JELLY_BEAN_MR2}, this may return
     * {@link #UNSUPPORTED} on devices where statistics aren't available.
     * <p>
     * Starting in {@link android.os.Build.VERSION_CODES#N} this will only
     * report traffic statistics for the calling UID. It will return
     * {@link #UNSUPPORTED} for all other UIDs for privacy reasons. To access
     * historical network statistics belonging to other UIDs, use
     * {@link NetworkStatsManager}.
     *
     * @see android.os.Process#myUid()
     * @see android.content.pm.ApplicationInfo#uid
     */
    int64 rxBytes = 4;

    int32 relativeTime = 5;
}

ResourceUsageStats.proto

message ResourceUsageStats {

    int32 audioCount = 1;
    int64 audioTimeMillis = 2;
    int32 videoCount = 3;
    int64 videoTimeMillis = 4;
    int32 vibratorCount = 5;
    int64 vibratorTimeMillis = 6;
    int32 gpsSensorCount = 7;
    int64 gpsSensorTimeMillis = 8;
    int32 bluetoothCount = 9;
    int64 bluetoothTimeMillis = 10;
    int32 cameraCount = 11;
    int64 cameraTimeMillis = 12;
    int32 flashlightCount = 13;
    int64 flashlightTimeMillis = 14;
    int32 wifiScanCount = 15;
    int64 wifiScanTimeMillis = 16;
    int32 mobileRadioActiveCount = 17;
    int64 mobileRadioActiveTimeMillis = 18;

    int64 wifiMultiCastMillis = 19;
    int64 bluetoothRxBytes = 20;
    int64 bluetoothTxBytes = 21;
    int64 bluetoothRxPackets = 22;
    int64 bluetoothTxPackets = 23;

    /**
     * Key for a measurement of number of milliseconds the wifi controller was
     * idle but turned on on behalf of this uid.
     */
    int64 wifiIdleMillis = 31;

    /**
     * Key for a measurement of number of milliseconds the bluetooth controller was
     * idle but turned on on behalf of this uid.
     */
    int64 bluetoothIdleMillis = 33;

    /**
     * Key for a measurement of number of milliseconds the mobile radio controller was
     * idle but turned on on behalf of this uid.
     */
    int64 mobileIdleMillis = 35;

    /**
     * Key for a measurement of the estimated number of mA*ms used by this uid
     * for wifi, that is to say the number of milliseconds of wifi activity
     * times the mA current during that period.
     */
    int64 wifiPowerMams = 32;

    /**
     * Key for a measurement of the estimated number of mA*ms used by this uid
     * for bluetooth, that is to say the number of milliseconds of activity
     * times the mA current during that period.
     */
    int64 bluetoothPowerMams = 34;

    /**
     * Key for a measurement of the estimated number of mA*ms used by this uid
     * for mobile data, that is to say the number of milliseconds of activity
     * times the mA current during that period.
     */
    int64 mobilePowerMams = 36;

    /**
     * Key for a measurement of number of milliseconds the wifi controller was
     * active on behalf of this uid.
     */
    int64 wifiRunningMs = 37;

    /**
     * Key for a measurement of number of milliseconds that this uid held a full wifi lock.
     */
    int64 wifiFullLockMs = 38;

    map<string, Timer> jobs = 24;
    map<string, Timer> sensors = 25;
    map<string, Timer> syncs = 26;
    map<string, Timer> wakeLocksDraw = 27;
    map<string, Timer> wakeLocksFull = 28;
    map<string, Timer> wakeLocksPartial = 29;
    map<string, Timer> wakeLocksWindow = 30;

    int32 relativeTime = 39;

    message Timer {
        int32 count = 1;
        int64 timeMillis = 2;
    }
}

StorageStats.proto

message StorageStats {

    Info internalStorage = 1;

    repeated Info externalStorage = 2;

    message Info {
        int64 totalSizeBytes = 1;
        int64 availableSizeBytes = 2;
    }
}
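The two fields of Info map directly onto what java.io.File exposes for a volume. The following is a sketch under that assumption; StorageInfo and readStorageInfo are hypothetical names, and which directories DART actually samples is not stated in this post.

```kotlin
import java.io.File

// Hypothetical mirror of the StorageStats.Info message.
data class StorageInfo(val totalSizeBytes: Long, val availableSizeBytes: Long)

// Reads capacity figures for the volume that contains `dir`.
fun readStorageInfo(dir: File): StorageInfo =
    StorageInfo(
        totalSizeBytes = dir.totalSpace,     // size of the whole partition
        availableSizeBytes = dir.usableSpace // bytes this process can actually write
    )
```

On a device, the internal entry could plausibly come from Environment.getDataDirectory() and the external entries from Context.getExternalFilesDirs(null), though that choice is a guess on my part.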

ThreadStats.proto

message ThreadStats {

    repeated ThreadInfo threadsInfo = 2;

    int32 relativeTime = 3;

    message ThreadInfo {

        int64 id = 1;

        string name = 2;

        int32 priority = 3;

        bool isInterrupted = 4;

        bool isAlive = 5;

        bool isDaemon = 6;

        State state = 7;
    }

    /**
     * A thread state.  A thread can be in one of the following states:
     * <ul>
     * <li>{@link #NEW}<br>
     *     A thread that has not yet started is in this state.
     *     </li>
     * <li>{@link #RUNNABLE}<br>
     *     A thread executing in the Java virtual machine is in this state.
     *     </li>
     * <li>{@link #BLOCKED}<br>
     *     A thread that is blocked waiting for a monitor lock
     *     is in this state.
     *     </li>
     * <li>{@link #WAITING}<br>
     *     A thread that is waiting indefinitely for another thread to
     *     perform a particular action is in this state.
     *     </li>
     * <li>{@link #TIMED_WAITING}<br>
     *     A thread that is waiting for another thread to perform an action
     *     for up to a specified waiting time is in this state.
     *     </li>
     * <li>{@link #TERMINATED}<br>
     *     A thread that has exited is in this state.
     *     </li>
     * </ul>
     *
     * <p>
     * A thread can be in only one state at a given point in time.
     * These states are virtual machine states which do not reflect
     * any operating system thread states.
     *
     * @since   1.5
     * @see #getState
     */
    enum State {

        /**
         * Thread state for a thread which has not yet started.
         */
        NEW = 0;

        /**
         * Thread state for a runnable thread.  A thread in the runnable
         * state is executing in the Java virtual machine but it may
         * be waiting for other resources from the operating system
         * such as processor.
         */
        RUNNABLE = 1;

        /**
         * Thread state for a thread blocked waiting for a monitor lock.
         * A thread in the blocked state is waiting for a monitor lock
         * to enter a synchronized block/method or
         * reenter a synchronized block/method after calling
         * {@link Object#wait() Object.wait}.
         */
        BLOCKED = 2;

        /**
         * Thread state for a waiting thread.
         * A thread is in the waiting state due to calling one of the
         * following methods:
         * <ul>
         *   <li>{@link Object#wait() Object.wait} with no timeout</li>
         *   <li>{@link #join() Thread.join} with no timeout</li>
         *   <li>{@link LockSupport#park() LockSupport.park}</li>
         * </ul>
         *
         * <p>A thread in the waiting state is waiting for another thread to
         * perform a particular action.
         *
         * For example, a thread that has called <tt>Object.wait()</tt>
         * on an object is waiting for another thread to call
         * <tt>Object.notify()</tt> or <tt>Object.notifyAll()</tt> on
         * that object. A thread that has called <tt>Thread.join()</tt>
         * is waiting for a specified thread to terminate.
         */
        WAITING = 3;

        /**
         * Thread state for a waiting thread with a specified waiting time.
         * A thread is in the timed waiting state due to calling one of
         * the following methods with a specified positive waiting time:
         * <ul>
         *   <li>{@link #sleep Thread.sleep}</li>
         *   <li>{@link Object#wait(long) Object.wait} with timeout</li>
         *   <li>{@link #join(long) Thread.join} with timeout</li>
         *   <li>{@link LockSupport#parkNanos LockSupport.parkNanos}</li>
         *   <li>{@link LockSupport#parkUntil LockSupport.parkUntil}</li>
         * </ul>
         */
        TIMED_WAITING = 4;

        /**
         * Thread state for a terminated thread.
         * The thread has completed execution.
         */
        TERMINATED = 5;
    }
}
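Every field of ThreadInfo is available from java.lang.Thread, so a snapshot can be taken with the standard library alone. This sketch shows one way to do it; the ThreadInfo data class and snapshotThreads are hypothetical names, not DART's collator.

```kotlin
// Hypothetical mirror of the ThreadStats.ThreadInfo message.
data class ThreadInfo(
    val id: Long,
    val name: String,
    val priority: Int,
    val isInterrupted: Boolean,
    val isAlive: Boolean,
    val isDaemon: Boolean,
    val state: Thread.State
)

// Snapshots every live thread in the current process.
fun snapshotThreads(): List<ThreadInfo> =
    Thread.getAllStackTraces().keys.map { t ->
        ThreadInfo(t.id, t.name, t.priority, t.isInterrupted, t.isAlive, t.isDaemon, t.state)
    }
```

The proto enum values follow the declaration order of java.lang.Thread.State, so state.ordinal can be written to the wire without a lookup table.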

BinderStats.proto

message BinderStats {

    int32 deathObjectCount = 1;

    int32 localObjectCount = 2;

    int32 proxyObjectCount = 3;

    int32 receivedTransactions = 4;

    int32 sentTransactions = 5;

    int32 relativeTime = 6;
}

FileIoStats.proto

message FileIoStats {

    /* characters read
    * The number of bytes which this task has caused to be
    * read from storage.  This is simply the sum of bytes
    * which this process passed to read(2) and similar system
    * calls.  It includes things such as terminal I/O and is
    * unaffected by whether or not actual physical disk I/O
    * was required (the read might have been satisfied from
    * pagecache)
    */
    int64 charsReadBytes = 1;

    /* characters written
    * The number of bytes which this task has caused, or
    * shall cause to be written to disk.  Similar caveats
    * apply here as with rchar.
    */
    int64 charsWriteBytes = 2;

    /* read syscalls
    * Attempt to count the number of read I/O operations,
    * that is, system calls such as read(2) and pread(2).
    */
    int64 numSysReadCalls = 3;

    /* write syscalls
    * Attempt to count the number of write I/O operations,
    * that is, system calls such as write(2) and pwrite(2).
    */
    int64 numSysWriteCalls = 4;

    /* bytes read
    * Attempt to count the number of bytes which this process
    * really did cause to be fetched from the storage layer.
    * This is accurate for block-backed filesystems.
    */
    int64 readBytes = 5;

    /* bytes written
    * Attempt to count the number of bytes which this process
    * caused to be sent to the storage layer.
    */
    int64 writeBytes = 6;

    /* cancelled write bytes
    * The big inaccuracy here is truncate.  If a process
    * writes 1MB to a file and then deletes the file, it will
    * in fact perform no writeout.  But it will have been
    * accounted as having caused 1MB of write.  In other
    * words: this field represents the number of bytes which
    * this process caused to not happen, by truncating
    * pagecache.  A task can cause "negative" I/O too.  If this
    * task truncates some dirty pagecache, some I/O which
    * another task has been accounted for (in its
    * write_bytes) will not be happening.
    */
    int64 cancelledWriteBytes = 7;

    int32 relativeTime = 8;
}
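
The field comments above match the counters that Linux exposes in /proc/[pid]/io, as documented in proc(5). The collector itself isn't shown here, so the following is only a minimal sketch, assuming the stats come from reading that file. `IoSnapshot` and `parseProcIo` are hypothetical names, not part of DART.

```kotlin
// Hypothetical holder mirroring the FileIoStats fields (illustration only).
data class IoSnapshot(
    val charsReadBytes: Long,
    val charsWriteBytes: Long,
    val numSysReadCalls: Long,
    val numSysWriteCalls: Long,
    val readBytes: Long,
    val writeBytes: Long,
    val cancelledWriteBytes: Long
)

// Parses the "key: value" lines of /proc/<pid>/io.
// Keys that are missing or have a non-numeric value default to 0.
fun parseProcIo(text: String): IoSnapshot {
    val values: Map<String, Long> = text.lineSequence()
        .mapNotNull { line ->
            val parts = line.split(":", limit = 2)
            if (parts.size != 2) return@mapNotNull null
            val value = parts[1].trim().toLongOrNull() ?: return@mapNotNull null
            parts[0].trim() to value
        }
        .toMap()

    fun counter(key: String): Long = values[key] ?: 0L

    return IoSnapshot(
        charsReadBytes = counter("rchar"),
        charsWriteBytes = counter("wchar"),
        numSysReadCalls = counter("syscr"),
        numSysWriteCalls = counter("syscw"),
        readBytes = counter("read_bytes"),
        writeBytes = counter("write_bytes"),
        cancelledWriteBytes = counter("cancelled_write_bytes")
    )
}
```

A process can usually read its own counters from /proc/self/io, so `parseProcIo(File("/proc/self/io").readText())` would be one way to feed it. Whether that file is readable on a given device is an assumption to verify.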

FrameStats.proto

message FrameStats {

    string activityName = 1;

    int64 animationDuration = 2;

    int64 commandIssueDuration = 3;

    int64 drawDuration = 4;

    bool firstDrawFrame = 5;

    int64 inputHandlingDuration = 6;

    int64 layoutMeasureDuration = 7;

    int64 swapBuffersDuration = 8;

    int64 syncDuration = 9;

    int64 totalDuration = 10;

    int64 unknownDelayDuration = 11;

    int64 intendedVSyncTimestamp = 12;

    int64 vSyncTimestamp = 13;

    int32 relativeTime = 14;
}

GcStats.proto

message GcStats {

    /** The number of garbage collection runs. */
    int32 runCount = 1;

    /** The total duration of garbage collection runs in ms. */
    int32 runTotalDuration = 2;

    /** The total number of bytes that the application allocated. */
    int64 totalBytesAllocated = 3;

    /** The total number of bytes that garbage collection reclaimed. */
    int64 totalBytesFreed = 4;

    /** The number of blocking garbage collection runs. */
    int32 blockingRunCount = 5;

    /** The total duration of blocking garbage collection runs in ms. */
    int32 blockingRunTotalDuration = 6;

    int32 relativeTime = 9;
}

UnixProcessStats.proto

message UnixProcessStats {

    /**
     *
     * One of the following characters, indicating process state:
     *
     *
     *  * 'R'  Running
     *  * 'S'  Sleeping in an interruptible wait
     *  * 'D'  Waiting in uninterruptible disk sleep
     *  * 'Z'  Zombie
     *  * 'T'  Stopped (on a signal) or (before Linux 2.6.33) trace stopped
     *  * 't'  Tracing stop (Linux 2.6.33 onward)
     *  * 'W'  Paging (only before Linux 2.6.0)
     *  * 'X'  Dead (from Linux 2.6.0 onward)
     *  * 'x'  Dead (Linux 2.6.33 to 3.13 only)
     *  * 'K'  Wakekill (Linux 2.6.33 to 3.13 only)
     *  * 'W'  Waking (Linux 2.6.33 to 3.13 only)
     *  * 'P'  Parked (Linux 3.9 to 3.13 only)
     *
     */
    string state = 1;

    /**
    * The number of minor faults the process has made which have not required loading a memory
    * page from disk.
    */
    int64 numMinorFaults = 2;

    /**
    * The number of minor faults that the process's waited-for children have made.
    */
    int64 numChildMinorFaults = 3;

    /**
     * The number of major faults the process has made which have required loading a memory page
     * from disk.
     */
    int64 numMajorFaults = 4;

    /**
     * The number of major faults that the process's waited-for children have made.
     */
    int64 numChildMajorFaults = 5;

    /**
    * Amount of time that this process has been scheduled in user mode, measured in clock ticks
    * (divide by sysconf(_SC_CLK_TCK)).  This includes guest time, guest_time (time spent running
    * a virtual CPU, see below), so that applications that are not aware of the guest time field
    * do not lose that time from their calculations.
    */
    int64 userTime = 6;

    /**
    * Amount of time that this process has been scheduled in kernel mode, measured in clock ticks
    * (divide by sysconf(_SC_CLK_TCK)).
    */
    int64 systemTime = 7;

    /**
    * Amount of time that this process's waited-for children have been scheduled in user mode,
    * measured in clock ticks (divide by sysconf(_SC_CLK_TCK)). (See also times(2).)  This
    * includes guest time, cguest_time (time spent running a virtual CPU, see below).
    */
    int64 childUserTime = 8;

    /**
    * Amount of time that this process's waited-for children have been scheduled in kernel mode,
    * measured in clock ticks (divide by sysconf(_SC_CLK_TCK)).
    */
    int64 childSystemTime = 9;

    /**
     * Virtual memory size in bytes.
     */
    int64 virtualMemorySize = 10;

    /**
    * Resident Set Size: number of pages the process has in real memory.  This is just the pages
    * which count toward text, data, or stack space.  This does not include pages which have not
    * been demand-loaded in, or which are swapped out.
    */
    int64 rss = 11;

    /**
    * (since Linux 2.2.8)
    * CPU number last executed on.
    */
    int32 lastCpuExecutedNumber = 12;

    /**
    * (since Linux 2.6.18)
    * Aggregated block I/O delays, measured in clock ticks (centiseconds).
    */
    int64 aggregatedBlockIoDelaysInTicks = 13;

    int32 relativeTime = 14;
}
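
Several of these fields are in kernel units rather than wall-clock or byte units: the time fields are clock ticks and `rss` is a page count. The sketch below shows the conversions a consumer of these stats would likely need. The function names are hypothetical, and the default values of 100 ticks per second and 4096-byte pages are common on Linux but are assumptions, not guarantees.

```kotlin
// Converts a tick count (userTime, systemTime, ...) to milliseconds.
// ticksPerSecond corresponds to sysconf(_SC_CLK_TCK); 100 is an assumed default.
fun ticksToMillis(ticks: Long, ticksPerSecond: Long = 100): Long {
    require(ticksPerSecond > 0) { "ticksPerSecond must be positive" }
    return ticks * 1000 / ticksPerSecond
}

// Converts a resident set size expressed in pages to bytes.
// pageSizeBytes corresponds to sysconf(_SC_PAGESIZE); 4096 is an assumed default.
fun rssPagesToBytes(pages: Long, pageSizeBytes: Long = 4096): Long {
    require(pageSizeBytes > 0) { "pageSizeBytes must be positive" }
    return pages * pageSizeBytes
}
```

On Android, the real values can be queried with `android.system.Os.sysconf(OsConstants._SC_CLK_TCK)` and `OsConstants._SC_PAGESIZE` instead of relying on the defaults.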

6.1. 🌴 Logs

The logcat process is used to capture logs. Its output (exposed to the app as the process's input stream) is piped to a file on the TestServer's remote storage. Although logcat can write directly to a file, it can't write to the TestServer's private directory.

LogTunnel.kt

import java.lang.Runtime.getRuntime

private const val CMD_CLEAR_LOGCAT_BUFFERS = "logcat -b all -c"
private const val CMD_START_LOGCAT = "logcat -b all -v threadtime,epoch,printable --dividers"

class LogTunnel {

    // fileProvider and the pipe() extension are project helpers that are not shown here.
    fun start(sinkFileName: String) {
        Thread {
            // Clear stale entries and wait for the command to finish before streaming.
            getRuntime().exec(CMD_CLEAR_LOGCAT_BUFFERS).waitFor()
            val logcat = getRuntime().exec(CMD_START_LOGCAT)

            val sinkFileDescriptor = fileProvider.getAppendableFileDescriptor(sinkFileName)
            logcat.inputStream.pipe(sinkFileDescriptor)
        }.apply {
            name = "LoggerThread"
            start()
        }
    }
}

This approach might be problematic if there is ever a need for logs from other apps. Since Jelly Bean (Android 4.1), an app can only read its own log entries.
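
The piping step can be sketched as follows. This is not DART's `pipe` helper, which isn't shown. It assumes the sink is a plain `java.io.File`, which may differ from how DART reaches the TestServer's storage, and `pipeToFile` is a hypothetical name.

```kotlin
import java.io.File
import java.io.FileOutputStream
import java.io.InputStream

// Copies everything from source into sink, appending to any existing content.
// Blocks until the source reaches end of stream, so it belongs on a background thread.
// Returns the number of bytes copied. Both streams are closed when it returns.
fun pipeToFile(source: InputStream, sink: File): Long =
    source.use { input ->
        FileOutputStream(sink, /* append = */ true).use { output ->
            input.copyTo(output)
        }
    }
```

With a running logcat process, the call would look like `pipeToFile(logcat.inputStream, sinkFile)`. Since logcat keeps streaming until the process is destroyed, the call only returns once that happens.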

6.2. 📷 Screenshots

Three strategies are employed to take a screenshot of the app:

PixelCopy, which is only used on Oreo and later.
Drawing the window's decor view to a canvas. This strategy can't capture dialogs or other windows.
Using reflection to get all relevant windows and drawing them to a canvas. This captures dialogs and other elements.

The best available strategy is picked at runtime.

ScreenShotterFactory.kt

class ScreenShotterFactory(
    private val screenshotHandler: Handler
) {

    fun createOrderedBestShotters(): Set<ScreenShotter> {
        val shotters = mutableSetOf<ScreenShotter>()

        if (AndroidVersion.isAtLeastOreo()) {
            shotters += WindowScreenShotter(screenshotHandler)
        }

        shotters += ReflectionScreenShotter(screenshotHandler)
        shotters += RootViewScreenShotter(screenshotHandler)
        return shotters
    }

    fun createBestShotter() = createOrderedBestShotters().first()
}
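
Because the factory returns the strategies in order of preference, one plausible use of that ordered set is a fallback chain: try the best strategy first and move on if it fails. Whether DART actually falls back this way is an assumption. `Shotter` below is a simplified, hypothetical stand-in for `ScreenShotter`, whose real interface isn't shown.

```kotlin
// Hypothetical, simplified stand-in for DART's ScreenShotter; returns encoded image bytes.
fun interface Shotter {
    fun capture(): ByteArray
}

// Tries each strategy in iteration order and returns the first successful capture.
// Returns null if the collection is empty or every strategy throws.
fun captureWithFallback(shotters: Iterable<Shotter>): ByteArray? {
    for (shotter in shotters) {
        try {
            return shotter.capture()
        } catch (e: Exception) {
            // This strategy failed; fall through to the next one.
        }
    }
    return null
}
```

The order of the input matters here, which is why a set that preserves insertion order (as `mutableSetOf` does) suits this pattern.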

7. 🏃 Worker & WorkRunner

The WorkRunner sits at the heart of DART. Once the user clicks the start button, the Worker notifies the gRPC server that it is available for work. The Worker includes its current hardware and software state in the request. This information allows the gRPC server to match a worker to the right job at the right time. For instance, a worker won't receive work while its device is running hot.
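
To make the matching idea concrete, here is a minimal sketch of a server-side eligibility check. The server code isn't shown, so every name, field, and threshold below is hypothetical and chosen only for illustration.

```kotlin
// Hypothetical snapshot of the state a worker reports with its availability request.
data class WorkerState(
    val sdkInt: Int,
    val batteryTemperatureCelsius: Float
)

// Hypothetical requirements attached to a job.
data class JobRequirements(val minSdkInt: Int)

// Assumed cut-off above which a device is considered too hot to take work.
const val MAX_TEMPERATURE_CELSIUS = 40f

// A worker is eligible when it is cool enough and meets the job's minimum SDK level.
fun isEligible(state: WorkerState, job: JobRequirements): Boolean =
    state.batteryTemperatureCelsius < MAX_TEMPERATURE_CELSIUS &&
        state.sdkInt >= job.minSdkInt
```

A real matcher would probably weigh more signals, but the shape stays the same: the worker reports its state and the server filters on it before assigning work.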

The work object contains details about the work to run and what configurations to use.

Work.proto

message Work {
    /**
     * Unique key of the given work.
     */
    string key = 1;
    string packageName = 18;
    string testPackageName = 19;
    /**
     * Type of the test being performed.
     */
    TestType type = 2;
    /**
     * The APK under test.
     */
    RemoteFile payload = 3;
    /**
     * The max time this test execution can run before it is cancelled (default: 15m).
     * It does not include any time necessary to prepare and clean up the target device.
     * The timeout unit is seconds. The maximum possible testing time is 1800 seconds.
     */
    int32 timeout = 4;
    /**
     * The locale is a two-letter (ISO 639-1) or three-letter (ISO 639-3) representation of the language.
     */
    string locale = 5;
    /**
     * The default orientation of the device.
     */
    ScreenOrientation orientation = 6;

    /**
     * TODO: Update the comment
     * A comma-separated, key=value map of environment variables and their desired values. The environment variables are mirrored as extra options to the am instrument -e KEY1 VALUE1 … command and passed to your test runner (typically AndroidJUnitRunner). Examples:
     * Enable code coverage and provide a directory to store the coverage results when using Android Test Orchestrator (--use-orchestrator):
     *
     * --environment-variables clearPackageData=true,coverage=true,coverageFilePath=/sdcard/
     * Enable code coverage and provide a file path to store the coverage results when not using Android Test Orchestrator (--no-use-orchestrator):
     *
     * --environment-variables coverage=true,coverageFile=/sdcard/coverage.ec
     */
    map<string, string> environmentVariables = 8;

    /**
     * Specifies the number of times a test execution should be reattempted if one or more of its test cases fail for any reason. An execution that initially fails but succeeds on any reattempt is reported as FLAKY.
     * The maximum number of reruns allowed is 10. (Default: 1, which implies one rerun.)
     */
    int32 numRetriesPerDevice = 9;
    int32 numTestRetries = 27;

    /**
     * Monitor and record performance metrics: CPU, memory, and network usage. Enabled by default.
     */
    bool isPerformanceMonitoringEnabled = 10;
    stats.SampleFrequency sampleFrequency = 24;

    /**
     * Enable video recording during the test. Enabled by default.
     */
    bool isVideoRecordingEnabled = 11;
    /**
     * The fully-qualified Java class name of the instrumentation test runner (default: the last name extracted from the APK manifest).
     */
    string testRunnerClassName = 12;
    /**
     * TODO: Update the comment and think of implementation
     * A list of one or more test target filters to apply (default: run all test targets).
     * Each target filter must be fully qualified with the package name, class name, or test annotation desired.
     * Any test filter supported by am instrument -e … is supported. See https://developer.android.com/reference/android/support/test/runner/AndroidJUnitRunner for more information.
     * Examples:
     * --test-targets "package com.my.package.name"
     * --test-targets "notPackage com.package.to.skip"
     * --test-targets "class com.foo.ClassName"
     * --test-targets "notClass com.foo.ClassName#testMethodToSkip"
     * --test-targets "annotation com.foo.AnnotationToRun"
     * --test-targets "size large notAnnotation com.foo.AnnotationToSkip"
     */
    repeated string testTargets = 13;
    repeated tests.AtomicTest tests = 14;

    /**
     * Whether each test runs in its own Instrumentation instance with the Android Test Orchestrator (default: Orchestrator is not used, same as specifying --no-use-orchestrator).
     * Orchestrator is only compatible with AndroidJUnitRunner v1.0 or higher.
     * See https://developer.android.com/training/testing/junit-runner.html#using-android-test-orchestrator for more information about Android Test Orchestrator.
     */
    bool isIsolated = 15;
    bool shouldClearPackageData = 16;
    bool obscureScreen = 17;
    bool takeWindowAutoShots = 20;
    bool retrieveAppFiles = 21;
    bool retrieveTestFiles = 22;
    bool useSystemProfiler = 23;

    int32 autoScreenShotFps = 25;
    int32 autoScreenShotQuality = 26;
}

message RemoteFile {
    string url = 1;
    int64 sizeBytes = 2;
    int64 lastModified = 3;
}

enum TestType {
    INSTRUMENTATION = 0;
}

enum ScreenOrientation {
    PORTRAIT = 0;
    LANDSCAPE = 1;
}
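
The comments on `timeout` and `numRetriesPerDevice` imply some normalization before a job runs: a default of 15 minutes, a cap of 1800 seconds, and at most 10 reruns. The sketch below shows one way a worker might apply those limits. It is an assumption, not DART's code, and the function and constant names are hypothetical. The unlabeled fields suggest proto3, where an unset `int32` reads as 0, so 0 is treated here as "use the default" for the timeout.

```kotlin
const val DEFAULT_TIMEOUT_SECONDS = 900 // 15 minutes, per the Work.timeout comment
const val MAX_TIMEOUT_SECONDS = 1800
const val MAX_RETRIES = 10

// Returns the timeout to enforce: the default when unset (0 or negative),
// otherwise the requested value capped at the documented maximum.
fun effectiveTimeoutSeconds(requested: Int): Int =
    if (requested <= 0) DEFAULT_TIMEOUT_SECONDS else minOf(requested, MAX_TIMEOUT_SECONDS)

// Clamps the retry count into the documented range of 0 to 10.
// No default is applied because an unset field and an explicit 0 look identical.
fun effectiveRetries(requested: Int): Int = requested.coerceIn(0, MAX_RETRIES)
```

Note the limitation in the retry case: the comment documents a default of 1, but a plain `int32` can't distinguish "not set" from "zero retries", so applying that default would need a wrapper type or an `optional` field.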

8. ✍️ Addendum

The gRPC server is still in early development. Feel free to check out the GitHub repo.

Hopefully, I was able to pique your interest in this topic. Please drop a comment if you have any questions.