Saturday, April 18, 2015

Android's Graphics Buffer Management System (Part II: BufferQueue)

In the first post on Android's graphics buffer management, I discussed gralloc, which is Android's graphics buffer allocation HAL.  In this post I'll describe how graphics buffers flow through Android, with special attention to class BufferQueue, which plays a central role in graphics buffer management.

Introduction

Before I dive in, I want to discuss buffers in general.  There is a surprising number of details and aspects involved in designing buffer systems, and I think it is best to examine what was done in Android only after we've assumed a wide and generic perspective.
Data buffers, and specifically image and graphics data buffers, exist as part of a specific subsystem, such as the camera subsystem, but can also span multiple subsystems, such as buffers shared between the camera and video subsystems.  Buffers provide a means to temporarily store data, allowing us to separate the production of data from the consumption of data - in both time and space. That is, we can produce (or collect) data at one moment and use it at a different moment.  This decouples producer and consumer, and also allows producer and consumer to be asynchronous to one another.

Many times in an event-based system the data producer and the data consumer are triggered (clocked) by different time sources.  For example, the camera on your mobile phone produces image frames at some arbitrary frame rate (e.g. 30 frames per second, or FPS) while the display panel (showing the preview) can operate at a different refresh rate (e.g. 60 Hz).  Moreover, even if the devices were guaranteed to operate at the same frequency (or if one frequency were a harmonic of the other), they are unlikely to have the same phase offset, since the display operation starts when we turn on the screen while the camera operation starts at some other arbitrary time, when we launch the camera application. And of course there are drift and jitter that contribute to the asynchronous nature of the two subsystems.

There may also be several consumers, or several producers.  SurfaceFlinger, for example, uses buffers from multiple sources and composes them into a single output buffer.

Buffers also allow us to move data from one part of our system to another.  Inevitably, buffers follow some paths within our system, and these are commonly referred to as the "data paths".  A path can start at a buffer provider which allocates new memory or provides a buffer from a pre-allocated pool. The buffers are considered empty at this stage; that is, they do not contain consumable data or metadata. A source entity provides the initial data by attaching it to a buffer (reference-holding buffers) or copying the data into the buffer's memory.  Somehow, a buffer makes its way along a path of buffer handlers until it arrives at the content consumer, which uses the data and discards the buffer. A buffer handler may be passive (e.g. a monitor or logger), or it may be active: filtering (dropping), altering, augmenting, extracting, or otherwise manipulating the contents.  These paths can be either dynamic or static.  There are many design patterns which define how a data path is defined and controlled (pipes and filters, layering, pipeline, software bus messaging, direct addressing, broadcasting, observing, and so forth) and I will not cover them here, as that would really be diverging from our topic.

Buffer systems are either closed-loop or open-loop.  In closed-loop paths there is a buffer path from the consumer back to the producer.  Sometimes this is made explicit, and sometimes it is implicit. For example, if the producer and consumer use a shared memory pool, they implicitly form a closed loop.  One can argue that using a shared buffer pool is not really a closed loop, but I contend that as long as the system is designed with explicit knowledge of the shared buffer memory, then it is closed. That is, if the consumer can starve or delay the producer because it controls the flow of buffers available to the producer, then this is a closed-loop system.
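To make the closed-loop point concrete, here is a minimal single-process sketch (all names are invented for this illustration - this is not Android code) of a shared fixed-size pool in which the consumer, simply by holding on to buffers, controls the producer's ability to obtain new ones:

```cpp
#include <cassert>
#include <queue>

// Toy illustration of an implicit closed loop: a fixed pool shared by
// producer and consumer.  When the consumer holds every buffer, the
// producer starves until the consumer releases one.
class SharedPool {
public:
    explicit SharedPool(int nBuffers) {
        for (int i = 0; i < nBuffers; ++i) free_.push(i);
    }
    // Producer side: returns -1 when no buffer is available, i.e. the
    // consumer is back-pressuring the producer.
    int obtain() {
        if (free_.empty()) return -1;   // producer starves here
        int id = free_.front();
        free_.pop();
        return id;
    }
    // Consumer side: returning a buffer unblocks the producer.
    void release(int id) { free_.push(id); }

private:
    std::queue<int> free_;
};
```

Even though producer and consumer never talk to each other directly, the pool closes the loop between them.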

Ah, and there is the question of what we mean by buffer.  Much of the time when people say "buffer" they are referring to the actual backing memory storing the content, but in real systems it is quite rare to see raw data moving around the system.  It is much more common to see buffer objects which contain metadata describing the data content.  What this metadata contains is implementation-specific and depends on the problem domain and context, but I'm sure we can agree that one piece of information we need is the amount of data stored in the buffer.  And there is the question of pointer-to-data (by reference) vs. embedded data (by value).  Obviously zero-copy buffer handling is preferred, but it requires us to be exact about buffer memory lifetime management.  Lifetime management, access management and synchronization are other related aspects which I discussed in the previous post, so I'll cut things short right here.
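As a toy illustration of the by-reference approach (invented types - not Android's actual GraphicBuffer or BufferItem), a buffer object can pair a small metadata struct with shared ownership of the backing storage, so that handing it around is zero-copy:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical buffer object: small, cheap-to-copy metadata plus a
// reference to the (possibly large) backing storage.
struct BufferMeta {
    std::size_t capacity;     // total bytes of backing storage
    std::size_t filled;       // amount of consumable data currently held
    long long   timestampNs;  // e.g. capture time of a camera frame
};

struct BufferRef {
    BufferMeta meta;
    // Shared ownership gives us zero-copy hand-off between producer and
    // consumer, at the cost of explicit lifetime management.
    std::shared_ptr<std::vector<uint8_t>> data;
};

inline BufferRef makeBuffer(std::size_t capacity) {
    return BufferRef{ BufferMeta{capacity, 0, 0},
                      std::make_shared<std::vector<uint8_t>>(capacity) };
}
```

Copying a BufferRef copies a few words of metadata while both copies alias the same backing bytes - exactly the behavior we want on a data path, provided we track who may read or write and when the memory may be freed.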

BufferQueue

After this generic discussion of data buffers, we can finally dive into the Android details. I'll start with class BufferQueue because it is at the center of graphics buffer movement in Android.  It abstracts a queue of graphics buffers, uses gralloc to allocate buffers, and provides the means to connect buffer producers and consumers which reside in different process address spaces.
Code for class BufferQueue and many of the cooperating classes that I'll be discussing can be found in directory /frameworks/native/libs/gui/ with the header files in /frameworks/native/include/gui.


Class BufferQueue has a static factory method, BufferQueue::createBufferQueue, which is used to create BufferQueue instances.

    // BufferQueue manages a pool of gralloc memory slots to be used by
    // producers and consumers. allocator is used to allocate all the
    // needed gralloc buffers.
    static void createBufferQueue(sp<IGraphicBufferProducer>* outProducer,
            sp<IGraphicBufferConsumer>* outConsumer,
            const sp<IGraphicBufferAlloc>& allocator = NULL);

A quick glance at the implementation reveals that class BufferQueue is only a thin facade to class BufferQueueCore, which contains the actual implementation logic.  For simplicity of this discussion, I will not make a distinction between these classes.

Working with BufferQueue is pretty straightforward.  First, producers and consumers connect to the BufferQueue.  Then the buffer cycle proceeds as follows:
1. The producer takes an “empty” buffer from the BufferQueue (dequeueBuffer)
2. The producer (e.g. camera) copies image or graphics data into the buffer
3. The producer returns the “filled” buffer to the BufferQueue (queueBuffer)
4. The consumer receives an indication (via callback) of the presence of a “filled” buffer
5. The consumer removes this buffer from the BufferQueue (acquireBuffer)
6. When the consumer is done consuming, the buffer is returned to the BufferQueue (releaseBuffer)
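The cycle above can be sketched as a small slot state machine.  This is a hypothetical single-process simplification with invented names; the real implementation lives in BufferQueueProducer/BufferQueueConsumer and crosses process boundaries via Binder:

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Toy model of the BufferQueue hand-shake: each slot cycles through
// FREE -> DEQUEUED -> QUEUED -> ACQUIRED -> FREE.
class TinyBufferQueue {
public:
    enum class SlotState { FREE, DEQUEUED, QUEUED, ACQUIRED };

    explicit TinyBufferQueue(int nSlots) : slots_(nSlots, SlotState::FREE) {}

    // 1. Producer takes an "empty" slot.
    int dequeueBuffer() {
        for (int i = 0; i < (int)slots_.size(); ++i) {
            if (slots_[i] == SlotState::FREE) {
                slots_[i] = SlotState::DEQUEUED;
                return i;
            }
        }
        return -1;  // no free slot: producer must wait for releaseBuffer
    }
    // 3. Producer returns the slot, now holding data.
    void queueBuffer(int slot) {
        slots_[slot] = SlotState::QUEUED;
        queued_.push_back(slot);  // 4. the real code would fire onFrameAvailable here
    }
    // 5. Consumer removes the oldest "filled" slot.
    int acquireBuffer() {
        if (queued_.empty()) return -1;
        int slot = queued_.front();
        queued_.pop_front();
        slots_[slot] = SlotState::ACQUIRED;
        return slot;
    }
    // 6. Consumer is done; the slot becomes reusable by the producer.
    void releaseBuffer(int slot) { slots_[slot] = SlotState::FREE; }

    SlotState state(int slot) const { return slots_[slot]; }

private:
    std::vector<SlotState> slots_;
    std::deque<int> queued_;
};
```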

The following diagram shows a simplified interaction between the camera (image buffer producer) and the display (image buffer consumer).

Figure 1: Simplified data path between the camera subsystem and the GPU
Producers and consumers may reside in different processes; as always in Android, this is accomplished using Binder.

BufferQueueProducer is the workhorse behind IGraphicBufferProducer.  BufferQueueProducer maintains an intimate relationship with BufferQueueCore and directly accesses its member variables, including mutexes, conditions and other significant members (such as its pointer to IGraphicBufferAlloc).  Personally, I don't like this - it is confusing and fragile. 
When a Producer is requested to provide an empty buffer using dequeueBuffer, it tries to fetch one from BufferQueueCore, which maintains an array of buffer slots and their states (DEQUEUED, QUEUED, ACQUIRED, FREE).  If a free slot is found in the buffer array but it doesn't contain a buffer, or if the Producer was explicitly asked to reallocate the buffer, then BufferQueueProducer uses BufferQueueCore's IGraphicBufferAlloc to allocate a new buffer.

Initially, every invocation of dequeueBuffer results in the allocation of a new buffer.  But because this is a closed-loop system, in which the buffer Consumer returns buffers once it has consumed their contents (by calling releaseBuffer), the system should reach equilibrium after a very short while.  Be aware that although BufferQueueCore can maintain an array of variable-sized GraphicBuffer objects, it is wise to make all buffers the same size.  Otherwise, each invocation of dequeueBuffer may require the allocation of a new GraphicBuffer instance.
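The lazy-allocation behavior described above can be sketched as follows (a hypothetical simplification with invented names, not BufferQueueCore's actual logic): a slot allocates backing storage on first use and reallocates only when the requested size no longer matches what the slot already holds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Each slot lazily owns a buffer (a stand-in for GraphicBuffer here).
struct Slot {
    std::shared_ptr<std::vector<uint8_t>> buffer;
    bool free = true;
};

class LazyPool {
public:
    explicit LazyPool(int nSlots) : slots_(nSlots) {}

    // Hands out a free slot, allocating its buffer only on first use or
    // when the requested size differs from the existing buffer's size.
    int dequeue(std::size_t size) {
        for (int i = 0; i < (int)slots_.size(); ++i) {
            Slot& s = slots_[i];
            if (!s.free) continue;
            if (!s.buffer || s.buffer->size() != size) {
                s.buffer = std::make_shared<std::vector<uint8_t>>(size);
                ++allocations_;  // size mismatch forces a fresh allocation
            }
            s.free = false;
            return i;
        }
        return -1;
    }
    void release(int i) { slots_[i].free = true; }
    int allocations() const { return allocations_; }

private:
    std::vector<Slot> slots_;
    int allocations_ = 0;
};
```

With uniform buffer sizes the allocation count stops growing once every slot has been populated - the equilibrium mentioned above.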

Figure 2: The main classes related to BufferQueue
The GraphicBuffer allocation is performed by an implementation of IGraphicBufferAlloc, which is provided to BufferQueueCore when it is constructed.  The default implementation of IGraphicBufferAlloc is provided by SurfaceFlinger (the system object in charge of composing all surfaces) and uses gralloc to allocate buffers.  In the previous post I discussed why a central graphics buffer allocator is well-advised when dealing with various hardware SoC modules.
Class BufferQueueCore doesn't directly store GraphicBuffer objects - it uses class BufferItem, which contains a pointer to a GraphicBuffer instance along with various other metadata (see frameworks/native/include/gui/BufferItem.h).
Figure 3: Class diagram showing the main classes related to graphics buffer allocation

Asynchronous notification interfaces IConsumerListener and IProducerListener are used to alert listeners to events such as a buffer being ready for consumption (IConsumerListener::onFrameAvailable) or the availability of an empty buffer (IProducerListener::onBufferReleased).  These callback interfaces also use Binder and can cross process boundaries.  Check out frameworks/native/include/gui/IConsumerListener.h for further details.
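The callback pattern can be sketched like this (a toy, non-Binder simplification; the method names mirror the real onFrameAvailable/onBufferReleased, but the classes themselves are invented):

```cpp
#include <cassert>

// Hypothetical listener interfaces, modeled loosely on
// IConsumerListener and IProducerListener.
struct ConsumerListener {
    virtual ~ConsumerListener() = default;
    virtual void onFrameAvailable() = 0;   // a "filled" buffer was queued
};
struct ProducerListener {
    virtual ~ProducerListener() = default;
    virtual void onBufferReleased() = 0;   // an "empty" buffer was returned
};

// A queue that notifies the appropriate side on each hand-off.
class NotifyingQueue {
public:
    void setConsumerListener(ConsumerListener* l) { consumer_ = l; }
    void setProducerListener(ProducerListener* l) { producer_ = l; }
    void queueBuffer()   { if (consumer_) consumer_->onFrameAvailable(); }
    void releaseBuffer() { if (producer_) producer_->onBufferReleased(); }

private:
    ConsumerListener* consumer_ = nullptr;
    ProducerListener* producer_ = nullptr;
};
```

In the real system these notifications are what let the producer and consumer run asynchronously without polling the queue.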

The best source of information I found on Android’s graphics system, aside from the code itself of course, is here.

Consumers

Figure: Some consumer classes

BufferQueue Creation

Figure: Top to bottom BufferQueue creation flow