Saturday, October 19, 2013

Android Synchronization Fences – An Introduction

In any system that exchanges buffers between independent buffer Producers and buffer Consumers, two policies are needed: one to control buffer lifetimes (allocation/deallocation) and one to control access to the buffer memory (read/write).  A third entity, the buffer Allocator, is in charge of providing access to the system memory and of maintaining buffer lifetimes (a “dead” buffer cannot be accessed by any entity except the Allocator, while a “live” buffer may be used by entities other than the Allocator).  The C library's malloc/free functions are an example of an Allocator.  In a way, the buffer lifetime control policy is really another form of buffer access control. The buffer access control policy determines whether the Producer or the Consumer may access the buffer at a given time, in a mutually exclusive manner.



The Android Fence abstraction is a mechanism that implements a particular buffer access control policy, and does not deal with buffer lifetime control (allocation/deallocation).  It allows for situations where there is a 1:1 relationship between Producer:Consumer and a 1:many relationship between Producer:Consumers.  Fences are external to buffers (i.e. they are not part of the buffer structure) and synchronize the exchange of buffer ownership (access control) between Producer and Consumer(s) or vice versa. 
It is of particular importance to understand that in situations where Android mandates the use of Fences, it is not sufficient for a Consumer to have a pointer to buffer memory - even when it is explicitly provided by the Producer.  The Fence must also permit the Consumer to access the buffer memory, for either read or write access, depending on the situation.

Timelines, Synchronization Points and Fences


To fully understand Android fences, beyond their use in the Camera subsystem, you need to become familiar with Timelines and Synchronization Points.  The kernel documentation (linux/kernel/Documentation/sync.txt) is the only source of information on these concepts that I could find, and rather than rephrase this documentation, I reproduce it here in full:
Motivation:

In complicated DMA pipelines such as graphics (multimedia, camera, gpu, display)
a consumer of a buffer needs to know when the producer has finished producing
it.  Likewise the producer needs to know when the consumer is finished with the
buffer so it can reuse it.  A particular buffer may be consumed by multiple consumers which will retain the buffer for different amounts of time.  In addition, a consumer may consume multiple buffers atomically.
The sync framework adds an API which allows synchronization between the
producers and consumers in a generic way while also allowing platforms which
have shared hardware synchronization primitives to exploit them.

Goals:
                * provide a generic API for expressing synchronization dependencies
                * allow drivers to exploit hardware synchronization between hardware
                  blocks
                * provide a userspace API that allows a compositor to manage
                  dependencies.
                * provide rich telemetry data to allow debugging slowdowns and stalls of
                   the graphics pipeline.

Objects:
                * sync_timeline
                * sync_pt
                * sync_fence

sync_timeline:

A sync_timeline is an abstract monotonically increasing counter. In general, each driver/hardware block context will have one of these.  They can be backed by the appropriate hardware or rely on the generic sw_sync implementation.
Timelines are only ever created through their specific implementations
(i.e. sw_sync.)

sync_pt:

A sync_pt is an abstract value which marks a point on a sync_timeline. Sync_pts have a single timeline parent.  They have 3 states: active, signaled, and error.
They start in active state and transition, once, to either signaled (when the timeline counter advances beyond the sync_pt’s value) or error state.

sync_fence:

Sync_fences are the primary primitives used by drivers to coordinate synchronization of their buffers.  They are a collection of sync_pts which may or may not have the same timeline parent.  A sync_pt can only exist in one fence and the fence's list of sync_pts is immutable once created.  Fences can be waited on synchronously or asynchronously.  Two fences can also be merged to create a third fence containing a copy of the two fences' sync_pts.  Fences are backed by file descriptors to allow userspace to coordinate the display pipeline dependencies.

Use:

A driver implementing sync support should have a work submission function which:
     * takes a fence argument specifying when to begin work
     * asynchronously queues that work to kick off when the fence is signaled 
     * returns a fence to indicate when its work will be done.
     * signals the returned fence once the work is completed.

Consider an imaginary display driver that has the following API:
/*
 * assumes buf is ready to be displayed.
 * blocks until the buffer is on screen.
 */
    void display_buffer(struct dma_buf *buf);

The new API will become:
/*
 * will display buf when fence is signaled.
 * returns immediately with a fence that will signal when buf
 * is no longer displayed.
 */
struct sync_fence* display_buffer(struct dma_buf *buf,
                                 struct sync_fence *fence);


The relationships between the objects described above are depicted in the diagram below.
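
The same relationships can also be written down as a small user-space model in C. This is only a sketch: the toy_* names are invented for illustration and are not the kernel's structures. It assumes that a point counts as signaled once its parent timeline's counter reaches or passes the point's value, and that a fence is signaled only when all of its points are.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model; these types are hypothetical, not the kernel's. */
enum toy_state { TOY_ACTIVE, TOY_SIGNALED, TOY_ERROR };

struct toy_timeline {
    unsigned counter;                    /* monotonically increasing */
};

struct toy_pt {
    const struct toy_timeline *parent;   /* exactly one timeline parent */
    unsigned value;                      /* a point on the parent timeline */
    bool error;                          /* set if the point failed */
};

#define TOY_FENCE_MAX_PTS 4

struct toy_fence {
    struct toy_pt pts[TOY_FENCE_MAX_PTS]; /* copied in once, never changed */
    size_t num_pts;
};

/* A point is signaled once its parent's counter reaches or passes its value. */
enum toy_state toy_pt_state(const struct toy_pt *pt)
{
    if (pt->error)
        return TOY_ERROR;
    return pt->parent->counter >= pt->value ? TOY_SIGNALED : TOY_ACTIVE;
}

/* Build a fence from a list of points, possibly on different timelines. */
struct toy_fence toy_fence_create(const struct toy_pt *pts, size_t n)
{
    struct toy_fence f = { .num_pts = n };

    assert(n <= TOY_FENCE_MAX_PTS);
    for (size_t i = 0; i < n; i++)
        f.pts[i] = pts[i];
    return f;
}

/* Error if any point is in error; signaled only when every point is. */
enum toy_state toy_fence_state(const struct toy_fence *f)
{
    bool all_signaled = true;

    for (size_t i = 0; i < f->num_pts; i++) {
        enum toy_state s = toy_pt_state(&f->pts[i]);

        if (s == TOY_ERROR)
            return TOY_ERROR;
        if (s == TOY_ACTIVE)
            all_signaled = false;
    }
    return all_signaled ? TOY_SIGNALED : TOY_ACTIVE;
}
```

In this model, a fence holding one point on a "GPU" timeline and one on a "display" timeline stays active until both counters have reached their respective points, which is how a single fence can tie two independent timelines together.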


Android Fence Implementation Details 

User-space code can choose between a C++ fence implementation (using the Fence class) and a C code library implementation.  The C++ implementation is just a lean wrapper around the sync C library code, and the C library does little more than invoke ioctl system calls on a kernel device implementing the synchronization API.

The Android kernel includes the ‘sync’ module, also known as the synchronization framework, which implements the Timeline, Fence, and Synchronization Point infrastructure.  This module can be leveraged by hardware device drivers which choose to implement the synchronization API. 

The kernel also includes a software timeline device driver (/dev/sw_sync), which implements a software-based timeline that does not reference a specific hardware module.  The SW timeline device driver uses the kernel’s Synchronization framework.

Understanding the Synchronization API

The first step in using the Synchronization API in user-space is creating a timeline handle (file descriptor).  The sample call flow below shows how the userspace C library creates a handle to an instance of the generic software timeline (sw_sync) using function sw_sync_timeline_create.


After the timeline is created, the user can arbitrarily increase the timeline counter (sw_sync_timeline_inc) or create fence handles (sw_sync_fence_create).  Each fence initially contains one synchronization point on the timeline.



If the user needs two or more synchronization points attached to a fence, they create more fences and then merge them together (sync_merge).

// Create a generic sw_sync timeline
int sw_timeline = sw_sync_timeline_create();

// Create two fences on the sw_sync timeline; at sync points 2 and 5
int sw_fence1 = sw_sync_fence_create(sw_timeline, "fence1", 2);
int sw_fence2 = sw_sync_fence_create(sw_timeline, "fence2", 5);

// Merge sw_fence1 and sw_fence2 to create a single fence containing
// the two sync points
int sw_fence3 = sync_merge("fence3", sw_fence1, sw_fence2);
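
To make the effect of the merge concrete, here is a minimal simulation in plain C that does not touch the sync driver at all. The sim_* names are hypothetical, and the model simply assumes that a merged fence holds copies of both inputs' points and is signaled only once the timeline counter has reached all of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_PTS 8

/* Hypothetical stand-in for a fence whose points sit on one timeline. */
struct sim_fence {
    unsigned pts[SIM_MAX_PTS];   /* sync point values */
    size_t n;                    /* number of points in use */
};

/* A new fence starts out with exactly one sync point. */
struct sim_fence sim_fence_create(unsigned value)
{
    struct sim_fence f = { .pts = { value }, .n = 1 };
    return f;
}

/* Mirrors the idea of a merge: the result copies the points of a and b. */
struct sim_fence sim_merge(const struct sim_fence *a, const struct sim_fence *b)
{
    struct sim_fence out = { .n = 0 };

    assert(a->n + b->n <= SIM_MAX_PTS);
    for (size_t i = 0; i < a->n; i++)
        out.pts[out.n++] = a->pts[i];
    for (size_t i = 0; i < b->n; i++)
        out.pts[out.n++] = b->pts[i];
    return out;
}

/* Signaled once the counter has reached every point in the fence. */
bool sim_signaled(const struct sim_fence *f, unsigned counter)
{
    for (size_t i = 0; i < f->n; i++)
        if (counter < f->pts[i])
            return false;
    return true;
}
```

With points at 2 and 5 as in the snippet above, the first fence is signaled when the counter reaches 2, while the merged fence stays unsignaled until the counter reaches 5.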

 

The kernel Synchronization API (for in-kernel modules) is similar, but synchronization points need to be created explicitly:

// Create a generic sw_sync timeline
struct sw_sync_timeline *timeline = sw_sync_timeline_create("some_name");

// Create a sync_pt at counter value 1 (an arbitrary example value)
struct sync_pt *pt = sw_sync_pt_create(timeline, 1);

// Create a fence attached to the sync_pt
struct sync_fence *fence = sync_fence_create("some_other_name", pt);

// Attach a file descriptor to the fence
int fd = get_unused_fd();
sync_fence_install(fence, fd);

Using Fences for Synchronization

Recall that the timeline abstraction represents a monotonically increasing counter, and synchronization points represent specific future values of this counter (points on the timeline).  How a timeline increases (its clock rate, so to speak) is timeline specific.  A GPU, for example, may use an internal counter interrupt to increase its timeline counter.  The generic sw_sync timeline is increased manually by the Synchronization API client when it invokes sw_sync_timeline_inc.  The meaning of the synchronization point values, and the method by which two synchronization points are compared to one another, are also timeline specific.  The sw_sync device models simple points on a line.  Whenever the Synchronization framework is notified of a timeline counter increase, it tests whether the counter has reached (or passed) the value of each existing synchronization point on the timeline and triggers wake-up events on the relevant fences.
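
The "reached (or passed)" test can be sketched in a few lines of C. This is an illustrative model, not the framework's code: counter_cmp, sim_pt and timeline_inc are made-up names, and the signed-difference comparison is just one common way to compare 32-bit counters so that the test keeps working when the counter wraps around. Whether a given timeline driver compares its points this way is driver specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compare two 32-bit counter values: negative if a is before b, zero if
 * equal, positive if a is after b. Tolerates wraparound as long as the two
 * values are less than 2^31 apart. Relies on the usual two's-complement
 * conversion from uint32_t to int32_t.
 */
int counter_cmp(uint32_t a, uint32_t b)
{
    if (a == b)
        return 0;
    return (int32_t)(a - b) < 0 ? -1 : 1;
}

/* Hypothetical sync point: a target value plus a one-way signaled flag. */
struct sim_pt {
    uint32_t value;
    int signaled;
};

/*
 * Advance the counter by inc, then mark every point the counter has now
 * reached or passed. Returns the number of points that became signaled in
 * this call; each of those would trigger a wake-up on its fence.
 */
size_t timeline_inc(uint32_t *counter, uint32_t inc,
                    struct sim_pt *pts, size_t n)
{
    size_t woken = 0;

    *counter += inc;
    for (size_t i = 0; i < n; i++) {
        if (!pts[i].signaled && counter_cmp(*counter, pts[i].value) >= 0) {
            pts[i].signaled = 1;
            woken++;
        }
    }
    return woken;
}
```

Because a point is only ever marked once, repeated increments past an already signaled point produce no further wake-ups.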

Userspace clients of the Synchronization framework that want to be notified (signaled) of a fence state change use the sync_wait API.  Kernel clients of the Synchronization framework have a similar API, and additionally have an API for asynchronous notification of fence state changes (via callback registration).
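
The blocking-wait-with-timeout pattern can be tried out without a sync driver by letting an ordinary pipe stand in for the fence file descriptor. The sketch below is an analogy only: wait_for_signal is a hypothetical helper, "readable" plays the role of "signaled", and the return convention (0 on success, -1 with errno set to ETIME on timeout) is an assumption chosen for this example rather than a statement about how the library is implemented.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: block until fd becomes readable (our stand-in for
 * "the fence has signaled") or until timeout_ms elapses.
 * Returns 0 when signaled, -1 with errno = ETIME on timeout, and -1 with
 * another errno value on failure.
 */
int wait_for_signal(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret;

    /* Retry if the wait is interrupted by a signal handler. */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret == 0) {
        errno = ETIME;
        return -1;
    }
    if (ret < 0)
        return -1;
    if (pfd.revents & (POLLERR | POLLNVAL)) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}
```

A consumer would call such a helper before touching the buffer and treat a timeout as "the producer is not done yet", rather than proceeding to read.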

When userspace closes a valid sync_timeline handle, the Synchronization framework checks if it needs to signal any active fences which have synchronization points on that timeline.  Closing a fence handle does not signal the fence: it just removes the fence’s synchronization points from their respective timelines.


Userspace C++ Fence Wrapper

  • ./frameworks/native/libs/ui/Fence.cpp
  • ./frameworks/native/include/ui/Fence.h

Userspace C Library

  • ./system/core/libsync/sync.c

Kernel Software Timeline

  • ./linux/kernel/drivers/staging/android/sw_sync.h
  • ./linux/kernel/drivers/staging/android/sw_sync.c
  • ./external/kernel-headers/original/linux/sw_sync.h

Kernel Fence Framework

  • ./external/kernel-headers/original/linux/sync.h
  • ./linux/kernel/drivers/staging/android/sync.h
  • ./linux/kernel/drivers/staging/android/sync.c

9 comments:

  1. Can you please explain with an example? For example: there is a GPU and a Display driver at work. So, the GPU will create one timeline and the Display driver will create one timeline.
    Now how it works is: the GPU composes a frame and gives it to the Display driver to display on a panel, and then waits for the Display to signal.
    How are the two different timelines synchronized here?

  2. Bollocks design. A single monotonically-progressing timeline for a GPU, what is this, the 70s ?

    Replies
    1. Hi,
      I'm not sure which part of the design you find anachronistic. If you provide more information, perhaps I can clear things up.

  3. If you really want to point out a problem, don't be Anonymous, and say what is wrong. This article is better written than your comment.

  4. For

    int sw_fence3 = ("fence3", sw_fence1, sw_fence2);

    I assume you mean

    int sw_fence3 = sync_merge("fence3", sw_fence1, sw_fence2);

    Replies
    1. Yes, good catch!
      Thanks for pointing this out - I've fixed the post.
      Neta

  6. Hi Neta,

    Thanks for the article. Could you please answer this question, asked by a previous reader:

    The GPU composes a frame and gives it to the Display driver to display on a panel, and then waits for the Display to signal. How are the two different timelines synchronized here?
