Saturday, April 18, 2015

Android's Graphics Buffer Management System (Part II: BufferQueue)

In the first post on Android's graphics buffer management, I discussed gralloc, which is Android's graphics buffer allocation HAL.  In this post I'll describe graphics buffer flows in Android, with special attention to class BufferQueue, which plays a central role in graphics buffer management.

Introduction

Before I dive in, I want to discuss buffers in general.  There is a surprising number of details and aspects involved in designing buffer systems and I think it is best to examine what was done in Android once we've assumed a wide and generic perspective.
Data buffers, and specifically image and graphics data buffers, exist as part of a specific subsystem, such as the camera subsystem, but can also span multiple subsystems, such as buffers shared between the camera and video subsystems.  Buffers provide a means to temporarily store data to allow us to separate the production of data from the consumption of data - in both time and space. That is, we can produce (or collect) data at one moment, and use it at a different moment.  This decouples producer and consumer, and also allows producer and consumer to be asynchronous to one another.  Many times in an event-based system the data producer and the data consumer are triggered (clocked) by different time sources.  For example, the camera on your mobile phone produces image frames at some arbitrary frame-rate (e.g. 30 frames per second, or FPS) while the display panel (showing the preview) can operate at a different refresh-rate (e.g. 60 Hz).  Moreover, even if the devices were guaranteed to operate at the same frequency (or if one frequency is a harmonic of the other), they are unlikely to have the same phase offset since the display operation starts when we turn on the screen, while the camera operation starts at some other arbitrary time when we start the camera application. And of course there is drift and jitter that contribute to the asynchronous nature of the two subsystems. There may also be several consumers, or several producers.  SurfaceFlinger, for example, uses buffers from multiple sources and composes them into a single output buffer.

Buffers also allow us to move data from one part of our system to another.  Inevitably, buffers follow some paths within our system and these are commonly referred to as the "data paths".  A path can start at a buffer provider which allocates new memory or provides a buffer from a pre-allocated pool. The buffers are considered empty at this stage.  That is, they do not contain consumable data or metadata. A source entity provides the initial data by attaching it to a buffer (reference holding buffers) or copying the data to the buffer's memory.  Somehow, a buffer makes its way along a path of buffer handlers until it arrives at the content consumer which uses the data and discards the buffer. A buffer handler may be passive (e.g. monitor or logger), or it may be active: filtering (drop), altering, augmenting, extracting, or otherwise manipulating the contents.  These paths can be either dynamic or static.  There are many design patterns which define how a data path is defined and controlled (pipes and filters, layering, pipeline, software bus messaging, direct addressing, broadcasting, observing, and so forth) and I will not cover them here as that would really be diverging from our topic.

Buffer systems are either closed-loop or open-loop.  In closed-loop paths there is a buffer path from the consumer back to the producer.  Sometimes this is made explicit, and sometimes implicit. For example, if the producer and consumer use a shared memory pool they implicitly form a closed loop.  One can argue that using a shared buffer pool is not really a closed loop, but I contend that as long as the system is designed using explicit knowledge of shared buffer memory, then it is closed. That is, if the consumer can starve or delay the producer because it controls the flow of buffers available to the producer, then this is a closed-loop system.
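
To make the starvation argument concrete, here is a minimal sketch of an implicit closed loop. It is not Android code: SharedPool, take, giveBack and framesProduced are hypothetical names, and the "consumer" is reduced to a loop that returns one buffer every few ticks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size pool shared by one producer and one consumer.
class SharedPool {
public:
    explicit SharedPool(std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) free_.push_back(static_cast<int>(i));
    }
    // Producer side: take an empty buffer; returns false if the pool is drained.
    bool take(int* outId) {
        if (free_.empty()) return false;
        *outId = free_.back();
        free_.pop_back();
        return true;
    }
    // Consumer side: hand a buffer back, which closes the loop.
    void giveBack(int id) { free_.push_back(id); }
private:
    std::vector<int> free_;
};

// Counts how many frames the producer emits in `ticks` attempts when the
// consumer returns one buffer only on every `returnEvery`-th tick.
int framesProduced(std::size_t poolSize, int ticks, int returnEvery) {
    assert(returnEvery > 0);
    SharedPool pool(poolSize);
    std::vector<int> heldByConsumer;
    int produced = 0;
    for (int tick = 1; tick <= ticks; ++tick) {
        int id = 0;
        if (pool.take(&id)) {            // producer stalls when this fails
            heldByConsumer.push_back(id);
            ++produced;
        }
        if (tick % returnEvery == 0 && !heldByConsumer.empty()) {
            pool.giveBack(heldByConsumer.back());
            heldByConsumer.pop_back();
        }
    }
    return produced;
}
```

With a pool of two buffers, framesProduced(2, 10, 1) yields 10 frames, while framesProduced(2, 10, 5) yields only 3: the producer never changed, but the slow consumer throttled it through the pool.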

Ah, and there is the question of what we mean by buffer.  A lot of the time when people say "buffer" they are referring to the actual backend memory storing the content, but in real systems it is quite rare to see raw data moving around the system.  It is much more common to see buffer objects which contain metadata describing the data content.  What is contained in this metadata is implementation-specific and depends on the problem domain and context, but I'm sure we can agree that one piece of information we need to know is the amount of data stored in the buffer.  And there is the question of pointer-to-data (by reference) vs embedded data (by value).  Obviously zero-copy buffer handling is preferred, but requires us to be exact about buffer memory life-time management.  Life-time management, access management and synchronization are other related aspects which I've discussed in the previous post so I'll cut things short right here.
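
As a rough illustration of a by-reference buffer object, the sketch below pairs a little metadata with shared backing storage. FrameBuffer, makeFrame and ownersAfterHandOff are made-up names for this example, and std::shared_ptr merely stands in for whatever life-time scheme a real system would use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical buffer object: metadata plus a reference to the backing memory.
struct FrameBuffer {
    std::shared_ptr<std::vector<std::uint8_t>> storage;  // by reference, never deep-copied
    std::size_t bytesUsed = 0;      // amount of valid data in storage
    std::int64_t timestampNs = 0;   // example of domain-specific metadata
};

// Allocates backing memory and records how much of it holds valid data.
FrameBuffer makeFrame(std::size_t capacity, std::size_t payloadBytes,
                      std::int64_t timestampNs) {
    assert(payloadBytes <= capacity);
    FrameBuffer fb;
    fb.storage = std::make_shared<std::vector<std::uint8_t>>(capacity);
    fb.bytesUsed = payloadBytes;
    fb.timestampNs = timestampNs;
    return fb;
}

// Handing the object to another handler copies only the metadata; the
// backing memory gains a second owner while the copy is alive.
long ownersAfterHandOff(const FrameBuffer& fb) {
    FrameBuffer handlerCopy = fb;   // zero-copy with respect to the pixel data
    return handlerCopy.storage.use_count();
}
```

The reference count is what makes zero-copy safe here: the memory is freed only when the last handler drops its FrameBuffer.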

BufferQueue

After this generic discussion of data buffers, we can finally dive into the Android details. I'll start with class BufferQueue because it is at the center of graphics buffer movement in Android.  It abstracts a queue of graphics buffers, uses gralloc to allocate buffers, and has means to connect buffer producers and consumers which reside in different process address spaces.
Code for class BufferQueue and many of the cooperating classes that I'll be discussing can be found in directory /frameworks/native/libs/gui/ with the header files in /frameworks/native/include/gui.


Class BufferQueue has a static factory method, BufferQueue::createBufferQueue, which is used to create BufferQueue instances.

    // BufferQueue manages a pool of gralloc memory slots to be used by
    // producers and consumers. allocator is used to allocate all the
    // needed gralloc buffers.
    static void createBufferQueue(sp<IGraphicBufferProducer>* outProducer,
            sp<IGraphicBufferConsumer>* outConsumer,
            const sp<IGraphicBufferAlloc>& allocator = NULL);

A quick glance at the implementation reveals that class BufferQueue is only a thin facade to class BufferQueueCore, which contains the actual implementation logic.  For simplicity of this discussion, I will not make a distinction between these classes.

Working with BufferQueue is pretty straightforward.  First, producers and consumers connect to the BufferQueue.  Then each buffer goes through the following cycle:
1. The producer takes an “empty” buffer from the BufferQueue (dequeueBuffer)
2. The producer (e.g. camera) copies image or graphics data into the buffer
3. The producer returns the “filled” buffer to the BufferQueue (queueBuffer)
4. The consumer receives an indication (via callback) of the presence of a “filled” buffer
5. The consumer removes this buffer from the BufferQueue (acquireBuffer)
6. When the consumer is done consuming, the buffer is returned to the BufferQueue (releaseBuffer)
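
The six steps can be sketched with a single-process toy model. This is not the real BufferQueue: ToyBufferQueue, payload and runOneFrame are hypothetical, there is no Binder, no gralloc and no locking, and an int stands in for the image data. Only the four method names mirror the ones listed above.

```cpp
#include <array>
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

constexpr int kSlotCount = 3;   // hypothetical queue depth
constexpr int kNoSlot = -1;     // returned when nothing is available

enum class SlotState { FREE, DEQUEUED, QUEUED, ACQUIRED };

class ToyBufferQueue {
public:
    ToyBufferQueue() { states_.fill(SlotState::FREE); payloads_.fill(0); }

    // The consumer registers how it wants to be told about filled buffers.
    void setFrameAvailableCallback(std::function<void()> cb) {
        onFrameAvailable_ = std::move(cb);
    }
    // Step 1: the producer takes an empty buffer.
    int dequeueBuffer() {
        for (int i = 0; i < kSlotCount; ++i) {
            if (states_[i] == SlotState::FREE) {
                states_[i] = SlotState::DEQUEUED;
                return i;
            }
        }
        return kNoSlot;
    }
    // Step 2 writes through this accessor; an int stands in for pixels.
    int& payload(int slot) { return payloads_[slot]; }
    // Steps 3 and 4: return the filled buffer and notify the consumer.
    void queueBuffer(int slot) {
        assert(states_[slot] == SlotState::DEQUEUED);
        states_[slot] = SlotState::QUEUED;
        queued_.push_back(slot);
        if (onFrameAvailable_) onFrameAvailable_();
    }
    // Step 5: the consumer removes the oldest filled buffer.
    int acquireBuffer() {
        if (queued_.empty()) return kNoSlot;
        int slot = queued_.front();
        queued_.pop_front();
        states_[slot] = SlotState::ACQUIRED;
        return slot;
    }
    // Step 6: the consumer gives the buffer back.
    void releaseBuffer(int slot) {
        assert(states_[slot] == SlotState::ACQUIRED);
        states_[slot] = SlotState::FREE;
    }
private:
    std::array<SlotState, kSlotCount> states_;
    std::array<int, kSlotCount> payloads_;
    std::deque<int> queued_;
    std::function<void()> onFrameAvailable_;
};

// Pushes one frame through all six steps; returns what the consumer read.
int runOneFrame(ToyBufferQueue& q, int frameValue) {
    int seen = -1;
    q.setFrameAvailableCallback([&q, &seen] {
        int slot = q.acquireBuffer();      // step 5
        assert(slot != kNoSlot);
        seen = q.payload(slot);            // consume the contents
        q.releaseBuffer(slot);             // step 6
    });
    int slot = q.dequeueBuffer();          // step 1
    if (slot == kNoSlot) return -1;
    q.payload(slot) = frameValue;          // step 2
    q.queueBuffer(slot);                   // steps 3 and 4
    q.setFrameAvailableCallback(nullptr);  // drop the reference to `seen`
    return seen;
}
```

In this toy the consumer acquires and releases inside the callback, which keeps the example short; a real consumer is free to defer that work.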

The following diagram shows a simplified interaction between the camera (image buffer producer) and the display (image buffer consumer).

Figure 1: Simplified data path between the camera subsystem and the GPU
Producers and Consumers may reside in different processes and this is accomplished using Binder, as always.

BufferQueueProducer is the workhorse behind IGraphicBufferProducer.  BufferQueueProducer maintains an intimate relationship with BufferQueueCore and directly accesses its member variables, including mutexes, conditions and other significant members (such as its pointer to IGraphicBufferAlloc).  Personally, I don't like this - it is confusing and fragile. 
When a Producer is requested to provide an empty buffer using dequeueBuffer, it tries to fetch one from BufferQueueCore which maintains an array of buffers and their states (DEQUEUED, QUEUED, ACQUIRED, FREE).  If a free slot is found in the buffer array but it doesn’t contain a buffer, or if the Producer was explicitly asked to reallocate the buffer, then BufferQueueProducer uses BufferQueueCore’s allocator (its IGraphicBufferAlloc) to allocate a new buffer.

Initially, all invocations of dequeueBuffer result in the allocation of new buffers.  But because this is a closed-loop system, where the buffer Consumer returns buffers once it has consumed their contents (by calling releaseBuffer), we should see the system reaching equilibrium after a very short while.  Be aware that although BufferQueueCore can maintain an array of variable-sized GraphicBuffer objects, it is wise to make all buffers of the same size.  Otherwise, each invocation of dequeueBuffer may require the allocation of a new GraphicBuffer instance.
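
A small model shows both effects: the allocation count levelling off when sizes are uniform, and growing forever when they are not. LazySlots, ToyGraphicBuffer and allocationsAfter are hypothetical names, the slot array has only two entries, and the queue and acquire steps are skipped so that only the allocation decision remains.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Stand-in for a graphics buffer: just dimensions and some bytes.
struct ToyGraphicBuffer {
    int width;
    int height;
    std::vector<std::uint8_t> bytes;
};

// Two slots that allocate lazily, loosely modelled on the behaviour above.
class LazySlots {
private:
    struct Slot {
        std::unique_ptr<ToyGraphicBuffer> buffer;  // empty until first use
        bool free = true;
    };
public:
    // Finds a free slot; (re)allocates only if the slot is empty or the
    // buffer it holds has the wrong dimensions.
    int dequeueBuffer(int width, int height) {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            Slot& s = slots_[i];
            if (!s.free) continue;
            bool mismatch = !s.buffer || s.buffer->width != width ||
                            s.buffer->height != height;
            if (mismatch) {
                s.buffer.reset(new ToyGraphicBuffer{width, height,
                        std::vector<std::uint8_t>(
                                static_cast<std::size_t>(width) * height)});
                ++allocations_;
            }
            s.free = false;
            return static_cast<int>(i);
        }
        return -1;
    }
    void releaseBuffer(int slot) { slots_[slot].free = true; }
    int allocations() const { return allocations_; }
private:
    std::array<Slot, 2> slots_;
    int allocations_ = 0;
};

// Runs `frames` dequeues, cycling through `widths`, with the consumer
// always one frame behind; returns the total number of allocations.
int allocationsAfter(int frames, const std::vector<int>& widths) {
    assert(!widths.empty());
    LazySlots slots;
    int previous = -1;
    for (int f = 0; f < frames; ++f) {
        int width = widths[static_cast<std::size_t>(f) % widths.size()];
        int slot = slots.dequeueBuffer(width, 480);
        assert(slot != -1);
        if (previous != -1) slots.releaseBuffer(previous);
        previous = slot;
    }
    return slots.allocations();
}
```

With a single width, allocationsAfter(6, {640}) settles at 2 allocations, one per slot. Cycling through three widths, allocationsAfter(6, {640, 320, 160}) allocates on every one of the 6 frames.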

Figure 2: The main classes related to BufferQueue
The GraphicBuffer allocation is performed using an implementation of IGraphicBufferAlloc which is provided to BufferQueueCore when it is constructed.  The default implementation of IGraphicBufferAlloc is provided by SurfaceFlinger (the system object in charge of composing all surfaces) and uses gralloc to allocate buffers.  In the previous post I discussed why a central graphics buffer allocator is well-advised when dealing with various hardware SoC modules.
Class BufferQueueCore doesn’t directly store GraphicBuffer – it uses class BufferItem which contains a pointer to a GraphicBuffer instance, along with various other metadata (see frameworks/native/include/gui/BufferItem.h).
Figure 3: Class diagram showing the main classes related to graphics buffer allocation

Asynchronous notification interfaces IConsumerListener and IProducerListener are used to alert listeners about events such as a buffer being ready for consumption (IConsumerListener::onFrameAvailable); or the availability of an empty buffer (IProducerListener::onBufferReleased).  These callback interfaces also use Binder and can cross process boundaries.  Check out further details in frameworks/native/include/gui/IConsumerListener.h.
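
The direction of the two notifications is easy to mix up, so here is a rough in-process sketch. ToyConsumerListener, ToyProducerListener, NotifyingQueue, EventLog and traceOneFrame are hypothetical stand-ins that borrow only the two callback names; the real interfaces go through Binder and carry more information than shown here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-process stand-ins for the two callback interfaces.
struct ToyConsumerListener {
    virtual ~ToyConsumerListener() {}
    virtual void onFrameAvailable() = 0;   // a filled buffer is waiting
};
struct ToyProducerListener {
    virtual ~ToyProducerListener() {}
    virtual void onBufferReleased() = 0;   // an empty buffer is available again
};

// Toy queue that only counts filled buffers and fires the notifications.
class NotifyingQueue {
public:
    NotifyingQueue(ToyProducerListener* producer, ToyConsumerListener* consumer)
        : producerListener_(producer), consumerListener_(consumer) {}
    // Called by the producer; the consumer is the one that gets notified.
    void queueBuffer() {
        ++filled_;
        if (consumerListener_) consumerListener_->onFrameAvailable();
    }
    // Called by the consumer; the producer is the one that gets notified.
    void releaseBuffer() {
        assert(filled_ > 0);
        --filled_;
        if (producerListener_) producerListener_->onBufferReleased();
    }
private:
    ToyProducerListener* producerListener_;
    ToyConsumerListener* consumerListener_;
    int filled_ = 0;
};

// Listener that records the order in which events arrive.
struct EventLog : ToyConsumerListener, ToyProducerListener {
    std::vector<std::string> events;
    void onFrameAvailable() override { events.push_back("onFrameAvailable"); }
    void onBufferReleased() override { events.push_back("onBufferReleased"); }
};

// Queues and releases one buffer, returning the notifications observed.
std::vector<std::string> traceOneFrame() {
    EventLog log;
    NotifyingQueue queue(&log, &log);
    queue.queueBuffer();
    queue.releaseBuffer();
    return log.events;
}
```

For one frame the log reads onFrameAvailable followed by onBufferReleased: each side's action wakes up the other side, which is what keeps the loop turning without polling.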

The best source of information I found on Android’s graphics system, aside from the code itself of course, is here.

Consumers

Figure: Some consumer classes

BufferQueue Creation

Figure: Top to bottom BufferQueue creation flow



8 comments:

  1. Hey, thanks for these past two blog posts and this blog in general. I've been trying to make an app that records from the camera to an opengl surface that filters video then records it to a media codec or media recorder with a medium amount of luck, and like all good explanations this post answers some questions and raises others.

    1. I noticed that when I forgot to drain my media codec that had a surface as input then the device would basically freeze after a few seconds but without ever detecting an ANR. After stressing that I may have gotten some opengl arcana wrong I realized it was just the EGL14.eglSwapBuffers backing up and that draining it would fix the problem. But it was alarming to see that there was no protection or detection of surface backpressure. I'm a little worried that if I try and do some hefty opengl work on an old phone it might freeze up without warning when a surface somewhere backs up. Is there any way to detect this?

    2. Is the surface format guaranteed to not have things like keyframes or control frames like a video codec, but always real frames?

    3. I've seen you mention multiple consumers of bufferqueues, but until the very recent camera 2 api and the addTarget method I've never seen this done at the application layer. Is it feasible to have multiple consumers, or will there always be timing problems with consumers taking too long to process or copy data?

    4. Where does threading fit into all this? I assume this is all done on the main thread. Is it possible/a good idea to specify a proxy consumer that can copy data to real consumers on different threads for processing/io?

    5. A bit off topic, but where does a surface lockCanvas and unlockCanvasAndPost fit into all this wrt a surfaceview? Will writing to a canvas composite the canvas on top of a surfaceview or completely overwrite it?

    1. Hi Scott,

      I'll try to answer these questions here, but I need to think if some of these deserve to be answered in a separate blog entry. I've written similar code (except using MediaRecorder instead of MediaCodec and OpenVX for filtering instead of OGL) and maybe I'll be able to share it.

      1. Detecting back pressure is an interesting issue. The first thing that jumps to mind is that you can manually detect back-pressure by using a timer mechanism on the consumer side. It requires you to know the expected interval between buffer arrivals. You configure a timer to a duration that is larger than this interval and if the timer expires then you know that something is not happening correctly. Clunky. Another option is to check the queue level by draining buffers on the consumer side whenever the consumer is alerted about a new buffer being ready. For maximum reactivity, you can use a dedicated thread to listen to onFrameAvailable. This thread then uses some policy to discard buffers if the queue level is too deep. But I don't have a clear answer at the moment and I'll have to think about this.

      2. Yes, the formats are graphic formats defined in graphics.h (https://github.com/android/platform_system_core/blob/master/include/system/graphics.h). One caveat though: the buffer format is determined by gralloc after looking at the consumer and producer, and any hints it gets. So it is perfectly possible that a camera will use a proprietary format when passing buffers to the display (e.g. COLOR_QCOM_FormatYUV420SemiPlanar). There's an interesting, though somewhat old, question on this in one of Google's Android boards: https://groups.google.com/forum/#!topic/android-platform/p_MoSk0JPNM.

      3. Not sure exactly which quote of mine you are referring to. However, you can look at frameworks/native/include/gui/StreamSplitter.h: "StreamSplitter is an autonomous class that manages one input BufferQueue and multiple output BufferQueues." Of course, there's a closed loop here, too. So the slowest consumer is the potential bottleneck of this setup. I haven't used the Splitter, but it can be useful.

      4. Threading is an orthogonal design decision. In the app I mentioned at the top, for example, I use a thread-pool to process produced buffers. Every frame produced by the camera is consumed by one of the threads. Of course, I have to make sure that I arrange the buffers in the correct order before displaying or recording them.

      5. Associating a Canvas with a Surface is of course only relevant to Surfaces that are accessible to the Host CPU (i.e. not Surfaces that are used only by HW). Locking a Canvas on a SurfaceView locks the Surface Canvas, which is implemented in function nativeLockCanvas at frameworks/base/core/jni/android_view_Surface.cpp. This locks a specific Surface buffer (or a region of it) so that it is not accessed by hardware at this time:

      ANativeWindow_Buffer outBuffer;
      status_t err = surface->lock(&outBuffer, dirtyRectPtr);

      Writing to the Surface buffer using a Canvas writes directly to the buffer memory. There's no compositing done here. To perform compositing would require two Layers (i.e. two buffers), but here we are dealing with a single buffer (outBuffer).

      Good questions!
      Thanks,
      Neta

    2. BTW, I noticed that Google recently updated their graphics documentation. Regarding question #1 above, there is interesting information regarding the 3 modes of BufferQueue here: http://source.android.com/devices/graphics/index.html#bufferqueue

  2. Looking at the HAL spec, a few more questions occurred to me. https://source.android.com/devices/camera/camera3_requests_hal.html

    What is happening internally when the camera preview is set to a surface but you can also retrieve the frame data with onPreviewFrame? Is this the same data or is there a copy somewhere? Are there two copies of the byte data? I noticed that setDisplayOrientation does not affect the orientation of onPreviewFrame. So is the data going to the preview surface being rotated before or after it gets to the surface?

    The camera2 api has a addTarget(Surface) method so one producer can go to two consumers. What is happening internally here in terms of memory allocation and blocking? Does it use the normal producer-consumer as an intermediary? What happens if there's backpressure on one surface but not another?

    There were a lot of questions in there so I hope I was clear. Basically, it seems like there's some internal method android is using to send graphical data to multiple destinations, as we first saw with the camera sending data to a preview display, a mediacodec/recorder, and onPreviewFrame. Recently we see this again in the Camera2 API addTarget method. What's going on here?

    Thanks for all your help down this rabbit hole.

  3. onPreviewFrame is part of the android.hardware.Camera package (i.e. the legacy Camera APIs which were deprecated by android.hardware.camera2). In this older API, there is a dedicated Surface used for the preview display. The Surface is provided to the Camera HAL via setPreviewDisplay(SurfaceHolder). However, for onPreviewFrame there is no Surface infrastructure and instead there is an explicit buffer copy. This asymmetry in handling of image streams was part of the design fix introduced with HALv3 and android.hardware.camera2. Short answer: yes, there are two copies in the legacy API.

    Regarding setDisplayOrientation: the topic of image rotation and mirroring (front camera) and the relationship between the camera orientation and display orientation is confusing and needs a few diagrams to explain (perhaps in a future blog if there is interest). In any case, the application instructs the framework (CameraClient) how to display the preview correctly by calling setDisplayOrientation. This sets a transform attribute in the buffers (native_window_set_buffers_transform) which is used when the Surface is finally composed to the display. Short answer: rotation is not performed on the buffers themselves.

    Regarding addTarget(Surface): this is also an involved topic, but I'll try to answer. I will give the short answer first :-) There is actually a Producer for every Surface provided in addTarget. I know this is confusing, but Surface represents the producer side of the BufferQueue (in fact a Surface is a proxy for IGraphicBufferProducer). Invoking addTarget(Surface) is akin to calling addTarget(IGraphicBufferProducer). In other words, the "real producer" (in this case the camera) is given an IGraphicBufferProducer to which it should queue Buffers. The other tidbit is that there is something that I call "the Stream-Surface" duality: there is a 1:1 relationship between every HALv3 Stream and some Surface. For every Surface in the target list, the Camera HAL device (v3, of course) assigns a Camera3OutputStream. The Camera3OutputStream queues buffers to the Surface. And in fact, the camera hardware produces distinct and different data for each of the streams. This is because each Camera3OutputStream (i.e. each Surface) has a distinct resolution (scale and aspect ratio), format (YUV subsampling formats, RGB formats, etc) and memory layout (planar, semi-planar, tiled, etc).
    In summary, if there are two target Surfaces, then there are two HAL Streams and each of the two Surfaces will have a distinct Consumer.
    I hope this helps. In order to explain this well I need to first explain about the HALv3 interface abstractions.
    Regarding the back-pressure: for this, too, you need to understand the Camera HALv3 interface. But without getting into the details: the camera and the buffer consumers form a buffer closed-loop via the BufferQueue. So if a camera CaptureRequest requires 2 buffers, each from a different BufferQueue, it is enough for one of the BufferQueues to be "empty" in order to stall the entire camera pipeline (for that camera device, at least).

    Did this help or did I cause more confusion?
    Neta

    1. Yes, that actually clears things up nicely, it's just that the topics themselves are very confusing and changed over time.
