In any system that exchanges buffers between independent buffer Producers and buffer Consumers, there is a need for a policy to control buffer lifetimes (allocation/deallocation) and a policy to control access to the buffer memory (read/write).
A third entity, the buffer Allocator, is in charge of providing access to the system memory and of maintaining buffer lifetimes (a “dead” buffer cannot be accessed by any entity except the Allocator, while a “live” buffer may be used by entities other than the Allocator). The C standard library functions malloc/free are an example of an Allocator. In a way, the buffer lifetime control policy is really another form of buffer access control.
The buffer access control policy determines whether the Producer or the Consumer(s) may access the buffer at a given time, in a mutually exclusive manner.
The Android Fence abstraction is a mechanism that implements
a particular buffer access control policy, and does not deal with buffer
lifetime control (allocation/deallocation).
It supports both a 1:1 Producer:Consumer relationship and a 1:many Producer:Consumers relationship. Fences are external to buffers (i.e. they are not part of the buffer structure) and synchronize the hand-off of buffer ownership (access control) from the Producer to the Consumer(s) and back.
It is particularly important to understand that in situations where Android mandates the use of Fences, it is not sufficient for a Consumer to hold a pointer to the buffer memory, even when that pointer was explicitly provided by the Producer. The Fence must also permit the Consumer to access the buffer memory (that is, the Fence must have signaled), for either read or write access, depending on the situation.
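To make the access-control point concrete, here is a minimal toy model in C. It is not Android code. All the names (toy_fence, toy_buffer, producer_publish and so on) are hypothetical, and a fence is reduced to a boolean flag, whereas a real Android fence is a kernel object referenced through a file descriptor. The sketch models the 1:many case with one fence from the Producer to all Consumers and one fence per Consumer back to the Producer. In this sketch I call these the acquire and release fences.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_CONSUMERS 4

/* Hypothetical stand-in for a fence: only a "signaled" flag. */
struct toy_fence {
    bool signaled;
};

/* Hypothetical buffer shared by one Producer and up to MAX_CONSUMERS Consumers. */
struct toy_buffer {
    char data[64];
    struct toy_fence acquire;                 /* Producer -> Consumers: contents ready */
    struct toy_fence release[MAX_CONSUMERS];  /* Consumer i -> Producer: done with it */
    size_t num_consumers;
};

/* Producer: fill the buffer, arm one release fence per Consumer,
 * then signal the acquire fence to hand the buffer over. */
void producer_publish(struct toy_buffer *buf, const char *msg, size_t n)
{
    assert(!buf->acquire.signaled);           /* Producer must currently own the buffer */
    assert(n >= 1 && n <= MAX_CONSUMERS);
    snprintf(buf->data, sizeof buf->data, "%s", msg);
    buf->num_consumers = n;
    for (size_t i = 0; i < n; i++)
        buf->release[i].signaled = false;
    buf->acquire.signaled = true;
}

/* Consumer: holding buf (and therefore buf->data) is not enough;
 * the acquire fence must have signaled before the memory may be read. */
bool consumer_read(const struct toy_buffer *buf, char *out, size_t out_len)
{
    if (!buf->acquire.signaled)
        return false;
    snprintf(out, out_len, "%s", buf->data);
    return true;
}

/* Consumer i is finished with the buffer. */
void consumer_release(struct toy_buffer *buf, size_t i)
{
    assert(i < buf->num_consumers);
    buf->release[i].signaled = true;
}

/* Producer: reclaim write access only after every Consumer has released
 * the buffer. On success the acquire fence is reset, so Consumers lose access. */
bool producer_reclaim(struct toy_buffer *buf)
{
    for (size_t i = 0; i < buf->num_consumers; i++)
        if (!buf->release[i].signaled)
            return false;
    buf->acquire.signaled = false;
    return true;
}
```

Note that consumer_read() refuses access until the acquire fence has signaled, even though buf->data is a perfectly valid pointer the whole time.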
Timelines, Synchronization Points and Fences
To fully understand Android fences beyond their use in the Camera subsystem, you need to become familiar with Timelines and Synchronization Points. The kernel documentation (linux/kernel/Documentation/sync.txt) is the only source of information on these concepts that I could find, so instead of paraphrasing it, I reproduce it here in full:
Motivation:
In complicated DMA pipelines such as graphics (multimedia, camera, gpu, display) a consumer of a buffer needs to know when the producer has finished producing it. Likewise the producer needs to know when the consumer is finished with the buffer so it can reuse it. A particular buffer may be consumed by multiple consumers which will retain the buffer for different amounts of time. In addition, a consumer may consume multiple buffers atomically.
The sync framework adds an API which allows synchronization between the producers and consumers in a generic way while also allowing platforms which have shared hardware synchronization primitives to exploit them.
Goals:
* provide a generic API for expressing synchronization dependencies
* allow drivers to exploit hardware synchronization between hardware blocks
* provide a userspace API that allows a compositor to manage dependencies.
* provide rich telemetry data to allow debugging slowdowns and stalls of the graphics pipeline.
Objects:
* sync_timeline
* sync_pt
* sync_fence
sync_timeline:
A sync_timeline is an abstract monotonically increasing counter. In general, each driver/hardware block context will have one of these. They can be backed by the appropriate hardware or rely on the generic sw_sync implementation. Timelines are only ever created through their specific implementations (i.e. sw_sync.)
sync_pt:
A sync_pt is an abstract value which marks a point on a sync_timeline. Sync_pts have a single timeline parent. They have 3 states: active, signaled, and error. They start in active state and transition, once, to either signaled (when the timeline counter advances beyond the sync_pt’s value) or error state.
sync_fence:
Sync_fences are the primary primitives used by drivers to coordinate synchronization of their buffers. They are a collection of sync_pts which may or may not have the same timeline parent. A sync_pt can only exist in one fence and the fence's list of sync_pts is immutable once created. Fences can be waited on synchronously or asynchronously. Two fences can also be merged to create a third fence containing a copy of the two fences’ sync_pts. Fences are backed by file descriptors to allow userspace to coordinate the display pipeline dependencies.
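The following toy model of a sync_fence is my own sketch with hypothetical names and is not kernel code. The fence holds points that may come from different timelines. Merging builds a new fence from copies of both inputs' points and leaves the inputs untouched, which reflects the rule that a sync_pt lives in only one fence. I am assuming the usual semantics: a fence has signaled only once all of its points have signaled, and it is in error if any point is. A fixed-size array stands in for the kernel's list.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PTS 8

enum toy_pt_state { PT_ACTIVE, PT_SIGNALED, PT_ERROR };

/* Hypothetical sync_pt: a value on a timeline identified by an integer id. */
struct toy_pt {
    int timeline_id;
    unsigned value;
    enum toy_pt_state state;
};

/* Hypothetical sync_fence: a list of points, never edited after creation
 * except for the points' own state changes. */
struct toy_fence {
    size_t count;
    struct toy_pt pts[TOY_MAX_PTS];
};

/* Build a new fence holding copies of a's and b's points.
 * Returns -1 if the result would not fit in this fixed-size model. */
int toy_fence_merge(const struct toy_fence *a, const struct toy_fence *b,
                    struct toy_fence *out)
{
    if (a->count + b->count > TOY_MAX_PTS)
        return -1;
    out->count = 0;
    for (size_t i = 0; i < a->count; i++)
        out->pts[out->count++] = a->pts[i];
    for (size_t i = 0; i < b->count; i++)
        out->pts[out->count++] = b->pts[i];
    return 0;
}

/* Notify the fence that timeline `id` has reached `value`. */
void toy_fence_timeline_advanced(struct toy_fence *f, int id, unsigned value)
{
    for (size_t i = 0; i < f->count; i++) {
        struct toy_pt *pt = &f->pts[i];
        if (pt->timeline_id == id && pt->state == PT_ACTIVE && value >= pt->value)
            pt->state = PT_SIGNALED;
    }
}

/* Error if any point is in error; signaled once every point has signaled. */
enum toy_pt_state toy_fence_status(const struct toy_fence *f)
{
    enum toy_pt_state status = PT_SIGNALED;
    for (size_t i = 0; i < f->count; i++) {
        if (f->pts[i].state == PT_ERROR)
            return PT_ERROR;
        if (f->pts[i].state == PT_ACTIVE)
            status = PT_ACTIVE;
    }
    return status;
}
```

In this model the copies are independent snapshots. In the kernel, a merged fence's points are themselves attached to their timelines, so they signal along with the originals.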
Use:
A driver implementing sync support should have a work submission function which:
* takes a fence argument specifying when to begin work
* asynchronously queues that work to kick off when the fence is signaled
* returns a fence to indicate when its work will be done.
* signals the returned fence once the work is completed.
Consider an imaginary display driver that has the following API:
/*
 * assumes buf is ready to be displayed.
 * blocks until the buffer is on screen.
 */
void display_buffer(struct dma_buf *buf);
The new API will become:
/*
 * will display buf when fence is signaled.
 * returns immediately with a fence that will signal when buf
 * is no longer displayed.
 */
struct sync_fence* display_buffer(struct dma_buf *buf,
                                  struct sync_fence *fence);
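The work-submission pattern and the new display_buffer() contract can be simulated in user space. The sketch below is hypothetical throughout: toy_display_buffer, toy_display_refresh and the other names are mine, and a fence is just a flag. Submission returns at once with a fence. The simulated display picks up a buffer only after that buffer's input fence has signaled. The fence returned for a buffer signals when the next buffer replaces it on screen, which matches the "no longer displayed" semantics above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fence: only a "signaled" flag. */
struct toy_fence { bool signaled; };

#define MAX_JOBS 8

struct toy_job {
    int buffer_id;
    const struct toy_fence *acquire;  /* display must not read the buffer before this signals */
    struct toy_fence release;         /* signals when the buffer leaves the screen */
};

struct toy_display {
    struct toy_job jobs[MAX_JOBS];
    size_t queued;   /* number of jobs submitted so far */
    size_t next;     /* index of the next job to put on screen */
    int on_screen;   /* index of the job currently on screen, -1 if none */
};

void toy_display_init(struct toy_display *d)
{
    d->queued = 0;
    d->next = 0;
    d->on_screen = -1;
}

/* Non-blocking submission: queue the buffer behind its input fence and
 * return a fence that will signal once the buffer is no longer displayed. */
struct toy_fence *toy_display_buffer(struct toy_display *d, int buffer_id,
                                     const struct toy_fence *acquire)
{
    assert(d->queued < MAX_JOBS);     /* fixed-size model: slots are never reused */
    struct toy_job *job = &d->jobs[d->queued++];
    job->buffer_id = buffer_id;
    job->acquire = acquire;
    job->release.signaled = false;
    return &job->release;
}

/* Simulated screen refresh: if the next buffer's input fence has signaled,
 * show it and signal the release fence of the buffer it replaces. */
void toy_display_refresh(struct toy_display *d)
{
    if (d->next == d->queued || !d->jobs[d->next].acquire->signaled)
        return;                       /* nothing ready: keep showing the current buffer */
    if (d->on_screen >= 0)
        d->jobs[d->on_screen].release.signaled = true;
    d->on_screen = (int)d->next++;
}

/* Buffer id currently on screen, or -1 if nothing has been shown yet. */
int toy_display_current(const struct toy_display *d)
{
    return d->on_screen < 0 ? -1 : d->jobs[d->on_screen].buffer_id;
}
```

Buffers are shown strictly in submission order in this sketch. A ready buffer therefore waits behind an earlier one whose fence has not yet signaled.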
The relationships between the objects described above are depicted in the diagram below.
Android Fence Implementation Details
User-space code can choose between a C++ fence implementation (the Fence class) and a C library implementation (libsync). The C++ implementation is just a lean wrapper around the C library, and the C library does little more than issue ioctl system calls on a kernel device that implements the synchronization API.
The Android kernel includes the ‘sync’ module, also known as
the synchronization framework, which implements the Timeline, Fence, and
Synchronization Point infrastructure.
This module can be leveraged by hardware device drivers which choose to
implement the synchronization API.
The kernel also includes a software timeline device driver (/dev/sw_sync) that implements a software-based timeline not tied to any specific hardware module. The SW timeline device driver is built on the kernel’s Synchronization framework.
Understanding the Synchronization API
The first step in using the Synchronization API in user-space is creating a timeline handle (file descriptor). The sample call flow below shows how the userspace C library creates a handle to an instance of the generic software timeline (sw_sync) using the function sw_sync_timeline_create.
After the timeline is created, the user can arbitrarily increase the timeline counter (sw_sync_timeline_inc) or create fence handles (sw_sync_fence_create). Each fence initially contains a single synchronization point on the timeline. If the user needs two or more synchronization points attached to a fence, they create more fences and then merge them together (sync_merge).
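// libsync header (which header declares the sw_sync_*() helpers
// has varied between Android releases)
#include <sync/sync.h>
#include <unistd.h>

int main(void)
{
    // Create a generic sw_sync timeline
    int sw_timeline = sw_sync_timeline_create();
    if (sw_timeline < 0)
        return 1;

    // Create two fences on the sw_sync timeline, at sync points 2 and 5
    int sw_fence1 = sw_sync_fence_create(sw_timeline, "fence1", 2);
    int sw_fence2 = sw_sync_fence_create(sw_timeline, "fence2", 5);

    // Merge sw_fence1 and sw_fence2 to create a single fence containing
    // the two sync points
    int sw_fence3 = sync_merge("fence3", sw_fence1, sw_fence2);

    // Timelines and fences are file descriptors: close them when done
    close(sw_fence3);
    close(sw_fence2);
    close(sw_fence1);
    close(sw_timeline);
    return 0;
}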
The kernel Synchronization API (for in-kernel modules) is
similar, but synchronization points need to be created explicitly:
#include <linux/file.h>   // get_unused_fd()
#include "sync.h"         // drivers/staging/android
#include "sw_sync.h"

// Create a generic sw_sync timeline
struct sw_sync_timeline *timeline = sw_sync_timeline_create("some_name");
// Create a sync_pt on that timeline, here at the (arbitrary) value 1
struct sync_pt *pt = sw_sync_pt_create(timeline, 1);
// Create a fence attached to the sync_pt
struct sync_fence *fence = sync_fence_create("some_other_name", pt);
// Attach a file descriptor to the fence
// (newer kernels provide only get_unused_fd_flags())
int fd = get_unused_fd();
sync_fence_install(fence, fd);
Using Fences for Synchronization
Recall that the timeline abstraction represents a monotonically increasing counter, and synchronization points represent specific future values of this counter (points on the timeline). How a timeline advances (its clock rate, so to speak) is timeline-specific. A GPU, for example, may use an internal counter interrupt to increase its timeline counter. The generic sw_sync timeline is advanced manually by the Synchronization API client when it invokes sw_sync_timeline_inc. The meaning of the synchronization point values, and the method used to compare two synchronization points, are also timeline-specific. The sw_sync device models simple points on a line. Whenever the Synchronization framework is notified that a timeline counter has increased, it tests whether the counter has reached (or passed) the value of each existing synchronization point on that timeline, and triggers wake-up events on the relevant fences.
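The sketch below illustrates that signaling step for a sw_sync-like timeline. The wrap-aware comparison reflects my understanding of how the kernel's sw_sync timeline compares 32-bit values: it takes the difference and reads it as a signed 32-bit number. The types and function names are hypothetical. The loop stands in for the framework's wake-up of fence waiters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns <0, 0 or >0 as a is before, equal to, or after b, treating the
 * 32-bit counter as circular. The unsigned subtraction wraps; converting it
 * to int32_t is implementation-defined but two's complement on mainstream
 * compilers. */
int toy_sw_sync_cmp(uint32_t a, uint32_t b)
{
    if (a == b)
        return 0;
    return (int32_t)(a - b) < 0 ? -1 : 1;
}

/* Hypothetical synchronization point on the timeline. */
struct toy_pt {
    uint32_t value;
    bool signaled;
};

/* Hypothetical sw_sync-like timeline with the points attached to it. */
struct toy_sw_timeline {
    uint32_t value;
    struct toy_pt *pts;
    size_t num_pts;
};

/* Counterpart of sw_sync_timeline_inc: advance the counter, then signal every
 * point the counter has reached or passed. Returns how many points signaled. */
size_t toy_sw_timeline_inc(struct toy_sw_timeline *tl, uint32_t inc)
{
    size_t newly_signaled = 0;
    tl->value += inc;                  /* may wrap around 2^32 */
    for (size_t i = 0; i < tl->num_pts; i++) {
        struct toy_pt *pt = &tl->pts[i];
        if (!pt->signaled && toy_sw_sync_cmp(tl->value, pt->value) >= 0) {
            pt->signaled = true;       /* the framework would wake fence waiters here */
            newly_signaled++;
        }
    }
    return newly_signaled;
}
```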
Userspace clients of the Synchronization framework that want to be notified (signaled) of a fence state change use the sync_wait API. Kernel clients of the Synchronization framework have a similar API, but also have an API for asynchronous notification of fence state changes (via callback registration).
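A userspace wait on a fence descriptor can be sketched with poll(). This illustration rests on assumptions. To my understanding, later libsync versions implement sync_wait by polling the fence fd for POLLIN, while the older library described above issued an ioctl instead. toy_fence_wait is a hypothetical name. A pipe stands in for a fence fd, so the example runs on any POSIX system: "signaling" the pipe means writing a byte into it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms (-1 = forever) for fd to become readable.
 * Returns 0 once readable ("signaled"), or -1 with errno set to ETIME
 * on timeout or EINVAL for a bad or errored descriptor. */
int toy_fence_wait(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret;

    if (fd < 0) {
        errno = EINVAL;
        return -1;
    }
    do {
        ret = poll(&pfd, 1, timeout_ms);   /* retried (full timeout) if interrupted */
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    if (ret > 0) {
        if (pfd.revents & (POLLERR | POLLNVAL)) {
            errno = EINVAL;
            return -1;
        }
        return 0;
    }
    if (ret == 0)
        errno = ETIME;                     /* still not signaled when time ran out */
    return -1;
}
```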
When userspace closes a valid sync_timeline handle, the
Synchronization framework checks if it needs to signal any active fences which
have synchronization points on that timeline.
Closing a fence handle does not signal the fence: it just removes the
fence’s synchronization points from their respective timelines.
Userspace C++ Fence Wrapper
- ./frameworks/native/libs/ui/Fence.cpp
- ./frameworks/native/include/ui/Fence.h
Userspace C Library
- ./system/core/libsync/sync.c
Kernel Software Timeline
- ./linux/kernel/drivers/staging/android/sw_sync.h
- ./linux/kernel/drivers/staging/android/sw_sync.c
- ./external/kernel-headers/original/linux/sw_sync.h
Kernel Fence Framework
- ./external/kernel-headers/original/linux/sync.h
- ./linux/kernel/drivers/staging/android/sync.h
- ./linux/kernel/drivers/staging/android/sync.c