Sunday, October 27, 2013

The US Constitution and Meyer's Open/Closed Principle

While trying to explain Meyer's Open/Closed Principle to a friend, I scratched my head trying to find a real-world example which illustrates the principle.  An example which will be hard to dispute and easy to grasp.

On my way home from work the news reported on the NSA's latest shenanigans (this time it was spying on German Chancellor Angela Merkel).  My thoughts drifted and I contemplated the US Constitution.

Some facts on the Constitution of the United States (source):
  • It went into effect on March 4, 1789
  • It has been amended twenty-seven times
  • The Bill of Rights (the first 10 amendments) was ratified on December 15, 1791
  • The list of all 27 amendments is worth reviewing and of particular interest are amendments 18 and 21 ('git revert', anyone?)
Imagine that! 224 years: from 13 states to 50; one Civil War, two World Wars, and countless other wars; the invention of the light bulb; radio and television; labor laws; the civil rights movement; the Great Depression; the lunar landing; Roe v. Wade; 9/11.  And on it goes - with only 27 amendments.
Damn!  Tell me that ain't cool.

The US Constitution is perhaps the ultimate, time-tested example of Meyer's Open/Closed Principle: open for extension, but closed for modification.
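For readers who prefer code to civics, here is a minimal sketch of the same idea in C.  All the names (register_handler, dispatch, twice, negate) are made up for illustration.  The "constitution" is the small registry and dispatcher, which never need editing; the "amendments" are handlers added from the outside.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HANDLERS 8

/* Extension point: a named behavior supplied from outside the core. */
struct handler {
    const char *name;
    int (*apply)(int input);
};

static struct handler registry[MAX_HANDLERS];
static size_t registry_len;

/* Closed core: adding a new behavior never requires changing this code. */
int register_handler(const char *name, int (*apply)(int))
{
    assert(name != NULL && apply != NULL);
    if (registry_len == MAX_HANDLERS)
        return -1;                      /* registry is full */
    registry[registry_len].name = name;
    registry[registry_len].apply = apply;
    registry_len++;
    return 0;
}

/* Closed core: look up a handler by name and run it. */
int dispatch(const char *name, int input, int *result)
{
    for (size_t i = 0; i < registry_len; i++) {
        if (strcmp(registry[i].name, name) == 0) {
            *result = registry[i].apply(input);
            return 0;
        }
    }
    return -1;                          /* no such handler */
}

/* Extensions: new behavior arrives as new code, much like an amendment. */
static int twice(int x)  { return 2 * x; }
static int negate(int x) { return -x; }
```

Registering twice and negate and then calling dispatch("twice", 21, &r) runs the new behavior without a single change to the dispatcher.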

It is also worthwhile to reflect on the procedures for amending the Constitution:

Before an amendment can take effect, it must be proposed to the states by a two-thirds vote of both houses of Congress or by a convention (known as an Article V convention) called by two-thirds of the states, and ratified by three-fourths of the states or by three-fourths of conventions thereof, the method of ratification being determined by Congress at the time of proposal. To date, no convention for proposing amendments has been called by the states, and only once—in 1933 for the ratification of the twenty-first amendment—has the convention method of ratification been employed.


As software architects and designers, perhaps we should build similar protections against perpetual refactoring of production quality code.  No, I didn't mean that in the literal sense, but I do advocate investing the time to excavate an existing architecture to uncover its governing principles, and understanding how it can be extended while preserving those principles.
Maybe we'll end up with software as durable as the US Constitution.

Saturday, October 19, 2013

Android Synchronization Fences – An Introduction

In any system in which independent buffer Producers and buffer Consumers exchange buffers, there is a need for a policy to control buffer lifetimes (allocation/deallocation) and a policy to control access to the buffer memory (read/write).  A third entity, the buffer Allocator, is in charge of providing access to the system memory and implementing the buffer lifetime maintenance (a “dead” buffer cannot be accessed by any entity except the Allocator, while a “live” buffer may be used by entities other than the Allocator).  The C library's malloc/free functions are an example of an Allocator.  In a way, the buffer lifetime control policy is really another form of buffer access control.  The buffer access control policy determines whether the Producer or the Consumer may access the buffer at any given time, in a mutually exclusive manner.



The Android Fence abstraction is a mechanism that implements a particular buffer access control policy, and does not deal with buffer lifetime control (allocation/deallocation).  It allows for situations where there is a 1:1 relationship between Producer:Consumer and a 1:many relationship between Producer:Consumers.  Fences are external to buffers (i.e. they are not part of the buffer structure) and synchronize the exchange of buffer ownership (access control) between Producer and Consumer(s) or vice versa. 
It is of particular importance to understand that in situations where Android mandates the use of Fences, it is not sufficient for a Consumer to have a pointer to buffer memory - even when it is explicitly provided by the Producer.  The Fence must also permit the Consumer to access the buffer memory, for either read or write access, depending on the situation.

Timelines, Synchronization Points and Fences


To fully understand Android fences, beyond their use in the Camera subsystem, you need to get familiar with Timelines and Synchronization Points.  The kernel documentation (linux/kernel/Documentation/sync.txt) is the only source of information on these concepts that I could find, and instead of rephrasing this documentation, I bring it here in full:
Motivation:

In complicated DMA pipelines such as graphics (multimedia, camera, gpu, display) a consumer of a buffer needs to know when the producer has finished producing it.  Likewise the producer needs to know when the consumer is finished with the buffer so it can reuse it.  A particular buffer may be consumed by multiple consumers which will retain the buffer for different amounts of time.  In addition, a consumer may consume multiple buffers atomically.

The sync framework adds an API which allows synchronization between the producers and consumers in a generic way while also allowing platforms which have shared hardware synchronization primitives to exploit them.

Goals:
                * provide a generic API for expressing synchronization dependencies
                * allow drivers to exploit hardware synchronization between hardware
                  blocks
                * provide a userspace API that allows a compositor to manage
                  dependencies.
                * provide rich telemetry data to allow debugging slowdowns and stalls of
                   the graphics pipeline.

Objects:
                * sync_timeline
                * sync_pt
                * sync_fence

sync_timeline:

A sync_timeline is an abstract monotonically increasing counter. In general, each driver/hardware block context will have one of these.  They can be backed by the appropriate hardware or rely on the generic sw_sync implementation.
Timelines are only ever created through their specific implementations
(i.e. sw_sync.)

sync_pt:

A sync_pt is an abstract value which marks a point on a sync_timeline. Sync_pts have a single timeline parent.  They have 3 states: active, signaled, and error.
They start in active state and transition, once, to either signaled (when the timeline counter advances beyond the sync_pt’s value) or error state.

sync_fence:

Sync_fences are the primary primitives used by drivers to coordinate synchronization of their buffers.  They are a collection of sync_pts which may or may not have the same timeline parent.  A sync_pt can only exist in one fence and the fence's list of sync_pts is immutable once created.  Fences can be waited on synchronously or asynchronously.  Two fences can also be merged to create a third fence containing a copy of the two fences' sync_pts.  Fences are backed by file descriptors to allow userspace to coordinate the display pipeline dependencies.

Use:

A driver implementing sync support should have a work submission function which:
     * takes a fence argument specifying when to begin work
     * asynchronously queues that work to kick off when the fence is signaled 
     * returns a fence to indicate when its work will be done.
     * signals the returned fence once the work is completed.

Consider an imaginary display driver that has the following API:
/*
 * assumes buf is ready to be displayed.
 * blocks until the buffer is on screen.
 */
void display_buffer(struct dma_buf *buf);

The new API will become:
/*
 * will display buf when fence is signaled.
 * returns immediately with a fence that will signal when buf
 * is no longer displayed.
 */
struct sync_fence* display_buffer(struct dma_buf *buf,
                                 struct sync_fence *fence);


The relationships between the objects described above are depicted in the diagram below.


Android Fence Implementation Details 

User-space code can choose between a C++ fence implementation (using the Fence class) and a C code library implementation.  The C++ implementation is just a lean wrapper around the sync C library code, and the C library does little more than invoke ioctl system calls on a kernel device implementing the synchronization API.

The Android kernel includes the ‘sync’ module, also known as the synchronization framework, which implements the Timeline, Fence, and Synchronization Point infrastructure.  This module can be leveraged by hardware device drivers which choose to implement the synchronization API. 

The kernel also includes a software timeline device driver (/dev/sw_sync) which implements a software-based timeline that does not reference a specific hardware module.  The SW timeline device driver uses the kernel’s Synchronization framework.

Understanding the Synchronization API

The first step in using the Synchronization API in user-space is creating a timeline handle (file descriptor).  The sample call flow below shows how the userspace C library creates a handle to an instance of the generic software timeline (sw_sync) using function sw_sync_timeline_create.


After the timeline is created, the user can arbitrarily increase the timeline counter (sw_sync_timeline_inc) or create fence handles (sw_sync_fence_create).  Each fence initially contains one synchronization point on the timeline.



If the user needs two or more synchronization points attached to a fence, they create more fences and then merge them together (sync_merge).

// Create a generic sw_sync timeline
int sw_timeline = sw_sync_timeline_create();

// Create two fences on the sw_sync timeline; at sync points 2 and 5
int sw_fence1 = sw_sync_fence_create(sw_timeline, "fence1", 2);
int sw_fence2 = sw_sync_fence_create(sw_timeline, "fence2", 5);

// Merge sw_fence1 and sw_fence2 to create a single fence containing
// the two sync points
int sw_fence3 = sync_merge("fence3", sw_fence1, sw_fence2);
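
To make the effect of the merge concrete, here is a toy userspace model of it.  It is only a sketch: the toy_* names are mine, the structs are not the kernel's, and no attempt is made to mirror how the kernel actually stores merged points.  It shows just the observable behavior: a merged fence is signaled only once every one of its points has been reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PTS 4

/* Toy stand-ins for illustration; these are NOT the kernel's structs. */
struct toy_timeline { unsigned value; };             /* monotonic counter */
struct toy_pt { const struct toy_timeline *tl; unsigned value; };
struct toy_fence { struct toy_pt pts[TOY_MAX_PTS]; size_t n; };

/* A fence holding a single sync point (the role of sw_sync_fence_create). */
struct toy_fence toy_fence_create(const struct toy_timeline *tl, unsigned value)
{
    struct toy_fence f = { .n = 1 };
    f.pts[0].tl = tl;
    f.pts[0].value = value;
    return f;
}

/* A new fence holding copies of both fences' points (the role of sync_merge). */
struct toy_fence toy_fence_merge(const struct toy_fence *a,
                                 const struct toy_fence *b)
{
    struct toy_fence f = { .n = 0 };
    assert(a->n + b->n <= TOY_MAX_PTS);
    for (size_t i = 0; i < a->n; i++)
        f.pts[f.n++] = a->pts[i];
    for (size_t i = 0; i < b->n; i++)
        f.pts[f.n++] = b->pts[i];
    return f;
}

/* A fence is signaled only when all of its points have been reached. */
bool toy_fence_signaled(const struct toy_fence *f)
{
    for (size_t i = 0; i < f->n; i++)
        if (f->pts[i].tl->value < f->pts[i].value)
            return false;
    return true;
}
```

With points at 2 and 5 on the same timeline, as in the snippet above, the merged fence stays unsignaled when the counter reaches 2 and becomes signaled only when it reaches 5.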

 

The kernel Synchronization API (for in-kernel modules) is similar, but synchronization points need to be created explicitly:

// Create a generic sw_sync timeline
struct sw_sync_timeline *timeline = sw_sync_timeline_create("some_name");

// Create a sync_pt at a chosen counter value on that timeline
// (the value 1 is only a placeholder)
struct sync_pt *pt = sw_sync_pt_create(timeline, 1);

// Create a fence attached to the sync_pt
struct sync_fence *fence = sync_fence_create("some_other_name", pt);

// Attach a file descriptor to the fence
int fd = get_unused_fd();
sync_fence_install(fence, fd);

Using Fences for Synchronization

Recall that the timeline abstraction represents a monotonically increasing counter, and synchronization points represent specific future values of this counter (points on the timeline).  How a timeline increases (its clock rate, so to speak) is timeline specific.  A GPU, for example, may use an internal counter interrupt to increase its timeline counter.  The generic sw_sync timeline is manually increased by the Synchronization API client when it invokes sw_sync_timeline_inc.  The meaning of the synchronization point values and the way two synchronization points are compared to one another are also timeline specific.  The sw_sync device models simple points on a line.  Whenever the Synchronization framework is notified of a timeline counter increase, it tests whether the counter has reached (or passed) the value of each existing synchronization point on the timeline and triggers wake-up events on the relevant fences.
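
The counter-and-points behavior described above can be sketched in a few lines of plain C.  This is a toy model under my own assumptions, not the kernel code: the toy_* names are hypothetical, the counter is a plain unsigned integer, wraparound is ignored, and a "wake-up event" is reduced to a state change that the function counts.

```c
#include <assert.h>
#include <stddef.h>

enum toy_state { TOY_ACTIVE, TOY_SIGNALED };

struct toy_sync_pt {
    unsigned value;           /* counter value this point waits for */
    enum toy_state state;     /* changes once: active -> signaled */
};

struct toy_sw_timeline {
    unsigned counter;         /* only ever increases */
    struct toy_sync_pt *pts;  /* points placed on this timeline */
    size_t n_pts;
};

/*
 * Advance the counter by inc, then signal every still-active point whose
 * value the counter has reached or passed.  Returns the number of points
 * that were newly signaled by this call.
 */
size_t toy_sw_timeline_inc(struct toy_sw_timeline *tl, unsigned inc)
{
    size_t woken = 0;

    assert(tl != NULL);
    tl->counter += inc;
    for (size_t i = 0; i < tl->n_pts; i++) {
        struct toy_sync_pt *pt = &tl->pts[i];
        if (pt->state == TOY_ACTIVE && tl->counter >= pt->value) {
            pt->state = TOY_SIGNALED;
            woken++;
        }
    }
    return woken;
}
```

Note that a single large increment can signal several points at once, and that a point, once signaled, is never reported again.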

Userspace clients of the Synchronization framework that want to be notified (signaled) about fence state change use the sync_wait API.  Kernel clients of the Synchronization framework have a similar API, but also have an API for asynchronous fence state change notification (via callback registration).

When userspace closes a valid sync_timeline handle, the Synchronization framework checks if it needs to signal any active fences which have synchronization points on that timeline.  Closing a fence handle does not signal the fence: it just removes the fence’s synchronization points from their respective timelines.


Userspace C++ Fence Wrapper

  • ./frameworks/native/libs/ui/Fence.cpp
  • ./frameworks/native/include/ui/Fence.h

Userspace C Library

  • ./system/core/libsync/sync.c

Kernel Software Timeline

  • ./linux/kernel/drivers/staging/android/sw_sync.h
  • ./linux/kernel/drivers/staging/android/sw_sync.c
  • ./external/kernel-headers/original/linux/sw_sync.h

Kernel Fence Framework

  • ./external/kernel-headers/original/linux/sync.h
  • ./linux/kernel/drivers/staging/android/sync.h
  • ./linux/kernel/drivers/staging/android/sync.c