Sunday, December 21, 2014

Several Ways to Express a Convolution in Halide

There are at least two ways to express a convolution operation in Halide; more if the kernel is separable.  I'll review these in this post.

Let's examine a simple Gaussian 3x3 lowpass (smoothing) filter (also known as a Gaussian Blur):



\frac{1}{16}
\left[ {\begin{array}{ccc}
 1 & 2 & 1  \\
 2 & 4 & 2  \\
 1 & 2 & 1  \\
 \end{array} } \right]

The straightforward method sums the pixel neighborhood using the weights of the convolution kernel.

Halide::Func gaussian_3x3_1(Halide::Func input) {
    Halide::Func gaussian;
    Halide::Var x,y,c;

    gaussian(x,y,c) = input(x-1, y-1, c) * 1 + input(x, y-1, c) * 2 + input(x+1, y-1, c) * 1 +
                      input(x-1, y, c)   * 2 + input(x, y, c)   * 4 + input(x+1, y, c)   * 2 +
                      input(x-1, y+1, c) * 1 + input(x, y+1, c) * 2 + input(x+1, y+1, c) * 1;
    gaussian(x,y,c) /= 16;
    return gaussian;
}

We have to watch out not to overflow the summation of the pixel values.  In the gaussian_3x3_1 function above, the type of the summation (gaussian(x,y,c)) is deduced from the type of input(x,y,c), and if that is an 8-bit type, for example, the summation will most likely overflow and the output will be wrong without any error being emitted.  One way to handle this is to make the kernel weights explicit floats, but this will most likely hurt performance because it forces a cast and floating-point arithmetic:


gaussian(x,y,c) = input(x-1, y-1, c) * 1.f + input(x, y-1, c) * 2.f + input(x+1, y-1, c) * 1.f +
                  input(x-1, y, c) * 2.f   + input(x, y, c) * 4.f   + input(x+1, y, c) * 2.f +
                  input(x-1, y+1, c) * 1.f + input(x, y+1, c) * 2.f + input(x+1, y+1, c) * 1.f;


I prefer to cast the input type so that the caller of the function has control over when to cast and when not to cast.  Here's an example which loads a 24-bit RGB image (8 bits per color channel), clamps the access coordinates to the image boundaries, and casts the pixel values from uint8_t to int32_t.

Halide::Image<uint8_t> input = load<uint8_t>("images/rgb.png");
Halide::Var x,y,c;
Halide::Func padded, padded32;
padded(x,y,c) = input(clamp(x, 0, input.width()-1), clamp(y, 0, input.height()-1), c);
padded32(x,y,c) = Halide::cast<int32_t>(padded(x,y,c));
Halide::Func gaussian_3x3_fn = gaussian_3x3_1(padded32);


Another method to perform that convolution uses a 2-dimensional reduction domain for the convolution kernel.  We define a 3x3 RDom which spans from -1 to +1 in both width and height. When the RDom in the code below is evaluated, r.x takes the values {-1, 0, 1} and r.y similarly takes the values {-1, 0, 1}. Therefore, x+r.x takes the values {x-1, x, x+1}.  

Halide::Func gaussian_3x3_2(Halide::Func input) {
    Halide::Func k, gaussian;
    Halide::RDom r(-1,3,-1,3);
    Halide::Var x,y,c;
    
    k(x,y) = 0;
    k(-1,-1) = 1;    k(0,-1) = 2;    k(1,-1) = 1;
    k(-1, 0) = 2;    k(0, 0) = 4;    k(1, 0) = 2;
    k(-1, 1) = 1;    k(0, 1) = 2;    k(1, 1) = 1;

    gaussian(x,y,c) = sum(input(x+r.x, y+r.y, c) * k(r.x, r.y));
    gaussian(x,y,c) /= 16;

    return gaussian;
}

Because a Gaussian kernel is separable (that is, it can be expressed as the outer product of two vectors), we can express it in yet another way:



h[m,n] = h_1[m] \, h_2[n]
\qquad
\frac{1}{16}
\left[ {\begin{array}{ccc}
 1 & 2 & 1  \\
 2 & 4 & 2  \\
 1 & 2 & 1  \\
 \end{array} } \right]
=
\frac{1}{4}
\left[ {\begin{array}{c}
 1  \\
 2  \\
 1  \\
 \end{array} } \right]
\frac{1}{4}
\left[ {\begin{array}{ccc}
 1 & 2 & 1  \\
 \end{array} } \right]

Halide::Func gaussian_3x3_3(Halide::Func input) {
    Halide::Func gaussian_x, gaussian_y;
    Halide::Var x,y,c;
    
    gaussian_x(x,y,c) = (input(x-1,y,c) + input(x,y,c) * 2 + input(x+1,y,c)) / 4;
    gaussian_y(x,y,c) = (gaussian_x(x,y-1,c)  + gaussian_x(x,y,c) * 2 + gaussian_x(x,y+1,c) ) / 4;

    return gaussian_y;
}

Of course, we can also use a reduction domain here.  In this case we need a 1-dimensional RDom spanning {-1, 0, 1}:

Halide::Func gaussian_3x3_4(Halide::Func input) {
    Halide::Func k, gaussian_x, gaussian_y;
    Halide::Var x,y,c;
    Halide::RDom r(-1,3);

    k(x) = 0;
    k(-1) = 1;    k(0) = 2;    k(1) = 1;
    gaussian_x(x,y,c) = sum(input(x+r.x, y, c) * k(r)) / 4;
    gaussian_y(x,y,c) = sum(gaussian_x(x, y+r, c) * k(r)) / 4;

    return gaussian_y;
}

And we can also play a bit with the syntax, replacing the Halide::sum function with explicit summation over the reduction domain:

Halide::Func gaussian_3x3_5(Halide::Func input) {
    Halide::Func k, gaussian_x, gaussian_y;
    Halide::Var x,y,c;
    Halide::RDom r(-1,3);

    k(x) = 0;
    k(-1) = 1;    k(0) = 2;    k(1) = 1;
    gaussian_x(x,y,c) += input(x+r.x, y, c) * k(r);
    gaussian_x(x,y,c) /= 4;
    gaussian_y(x,y,c) += gaussian_x(x, y+r, c) * k(r);
    gaussian_y(x,y,c) /= 4;

    return gaussian_y;
}

So does it matter how we specify the algorithm?  The premise of Halide is 'no': write the algorithm once, then experiment with different schedules until you get the best performance.  Intuitively, gaussian_3x3_2 should beat gaussian_3x3_1 because Halide's compiler can optimize the RDom, and gaussian_3x3_3 should beat gaussian_3x3_2 because the separable formulation adds another degree of freedom to the schedule.  But that is just intuition, and what we care about is actual performance measurements.
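To make the scheduling experiments concrete, here is the kind of schedule one might try on the first variant (a sketch only: the split factor and vector width below are arbitrary placeholders, not tuned or measured values):

Halide::Func gaussian_3x3_scheduled(Halide::Func input) {
    Halide::Func gaussian;
    Halide::Var x, y, c, yi;

    // Same algorithm as gaussian_3x3_1
    gaussian(x,y,c) = (input(x-1, y-1, c) + input(x, y-1, c) * 2 + input(x+1, y-1, c) +
                       input(x-1, y, c) * 2 + input(x, y, c) * 4 + input(x+1, y, c) * 2 +
                       input(x-1, y+1, c) + input(x, y+1, c) * 2 + input(x+1, y+1, c)) / 16;

    // Only the schedule changes between experiments:
    gaussian.split(y, y, yi, 32)   // process the image in strips of 32 rows
            .parallel(y)           // distribute the strips across cores
            .vectorize(x, 8);      // 8-wide vectors along x
    return gaussian;
}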

I haven't measured this yet, so I owe you the results soon...  ;-)
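In the meantime, the measurement harness I have in mind is nothing fancier than the sketch below (it assumes the functions above, JIT-compiles outside the timed loop, and takes the best of ten runs; Halide's realize allocates the output buffer on every call, which is fine for a rough comparison):

#include <algorithm>
#include <chrono>
#include "Halide.h"

double time_gaussian(Halide::Func gaussian, int width, int height, int channels) {
    gaussian.compile_jit();        // keep JIT compilation out of the measurement
    double best = 1e30;
    for (int i = 0; i < 10; i++) {
        auto t0 = std::chrono::high_resolution_clock::now();
        gaussian.realize(width, height, channels);
        auto t1 = std::chrono::high_resolution_clock::now();
        best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
    }
    return best;                   // best-of-10 wall-clock seconds
}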


Friday, December 12, 2014

Halide Excursions

I finally took the time to start a Halide-based github project, called halide-excursions.
I'm new to computer imaging and vision, and the algorithms and applications of this domain are a new frontier for my curiosity.  The halide-excursions project is an attempt to create a large, open source repository of Halide computer-vision, computational-photography, and image processing functions.  Anyone and everyone is more than welcome to join.

Halide is a new language for image processing and computer vision.  It was developed by several PhD students at MIT and it is actively maintained. There's a developer mailing list with less than a handful of messages a day (so it is easy to lazily eavesdrop on the conversation) where you can communicate directly with the Halide developers.  The response time is very short and the guys on the other end are very nice and eager to help.

Halide's succinct, functional syntax is very appealing and perhaps this is what drew me in.  If, like me, you are starting with little knowledge of the domain, the word 'kernel' makes you think of the Linux kernel, and 'convolution' is a long-forgotten concept from university, then it might take a little more energy to get to smooth sailing.  But that's part of the fun.  Moving from Wikipedia to implementation is always a nice feeling, and I find Halide a great platform for doing this.  Here's the gradient magnitude of the Sobel operator (source code in halide-excursions):
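For reference, the operator itself can be sketched in Halide roughly as follows (an illustration assuming a single-channel floating-point input, not the exact code from the repository):

Halide::Func sobel_magnitude(Halide::Func input) {
    Halide::Func gx, gy, mag;
    Halide::Var x, y;

    // Horizontal and vertical Sobel derivatives
    gx(x,y) = input(x+1,y-1) - input(x-1,y-1) +
              (input(x+1,y) - input(x-1,y)) * 2 +
              input(x+1,y+1) - input(x-1,y+1);
    gy(x,y) = input(x-1,y+1) - input(x-1,y-1) +
              (input(x,y+1) - input(x,y-1)) * 2 +
              input(x+1,y+1) - input(x+1,y-1);

    // Gradient magnitude
    mag(x,y) = Halide::sqrt(gx(x,y) * gx(x,y) + gy(x,y) * gy(x,y));
    return mag;
}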



The source code comes with a bunch of tutorials, sample "applications" and a large set of unit tests that can serve as a springboard to start learning and messing around.  There's no user guide or official language specification, but there is doxygen-generated documentation.  All in all, I think there are plenty of resources to get you started.

Halide is supported on several OSs and processor targets (incl. GPUs) and promises the same performance as hand-optimized native code, with fewer lines of code, less mess, and with portability.  Hand-optimized code - using vectorization and parallelization intrinsics and a host of other tricks - is hard to read, hard to maintain, hard or impossible to port, and makes exploring the scheduling optimization space very slow.  Halide's ability to separate the algorithm from the scheduling policy is very appealing and works well most of the time.  For example, when implementing the Viola-Jones face detection algorithm, I found that the classifier-cascade phase cannot be implemented optimally in Halide because of Halide's poor support for control code.  In a future post I'll provide more examples showing where Halide shines, and where a hybrid native-Halide solution works better.
Until then, I hope you join the project.




Friday, September 5, 2014

Google's Depth Map (Part II)

In the previous post I described Google's Lens Blur feature and how we can use code to extract the depth map stored in the image file output by Lens Blur.  Lens Blur performs a series of image frame captures, calculates the depth of each pixel (based on a user-configurable focus locus) and produces a new image having a bokeh effect.  In other words, the scene background is blurred.  Lens Blur stores the new (bokeh) image, the original image, and the depth-map in a single JPEG file.

In this blog post we'll pick up where we left off last time, right after we extracted the original image and the depth map from Lens Blur's output file and stored each in its own file.  This time we'll go in the reverse direction: starting with the original image and the depth map, we'll create a depth-blurred image - an image with the bokeh effect.  I'm going to show that with very little sophistication we can achieve some pretty good results in replicating the output of Lens Blur.  This is all going to be kinda crude, but I think the RoI (return on investment) is quite satisfactory.

The original image as extracted to a PNG file
My simpleton's approach to depth blurring is this: I start by finding the mean value of the depth map.  As you probably recall, the grey values in the depth map correspond to the depth values calculated by Lens Blur.  The mean depth value will serve as a threshold - every pixel above the threshold will be blurred while all other pixels will be left untouched.  This sounds rather crude, not to say dirty, but it works surprisingly well.  At the outset I thought I would need to use several threshold values, each with a differently sized blurring kernel (larger blurring kernels use more neighboring pixels and therefore increase the blurring effect).  But alas, a simple Boolean threshold filter works well enough.
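Stripped to its essence, and ignoring color channels and the choice of kernel, the idea looks something like this sketch (hypothetical, single-channel code with a trivial 3x3 box blur standing in for the real kernels; the full listing appears later in the post):

#include <cstddef>
#include <cstdint>

void threshold_blur(const uint8_t *in, const uint8_t *depth, uint8_t *out,
                    size_t width, size_t height, uint8_t mean_depth) {
    for (size_t row = 0; row < height; row++) {
        for (size_t col = 0; col < width; col++) {
            size_t i = row * width + col;
            bool border = (row == 0 || col == 0 || row == height-1 || col == width-1);
            if (border || depth[i] <= mean_depth) {   // foreground (or border): keep sharp
                out[i] = in[i];
                continue;
            }
            unsigned sum = 0;                          // background: 3x3 box blur
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    sum += in[(row + dy) * width + (col + dx)];
            out[i] = (uint8_t)(sum / 9);
        }
    }
}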

The depth map as calculated by Lens Blur after extraction and storage as a PNG file.
The grey value of every pixel in this image corresponds to the depth of the pixel in
the original image.  Darker pixels are closer to the camera and have a smaller value.
The diagram below shows the Boolean threshold filter in action: we traverse all the pixels in the original image, and every pixel whose depth is above the threshold is zeroed (black).

The result of thresholding the original image using the mean value of the depth-map.
You can see that the results are not too shabby.  Cool, right?

It is interesting to ask whether we should expect this thresholding technique to work for every Lens Blur depth-map, and what the optimal threshold is: why not threshold using the median, or mean-C, or some other calculated value?
Let's get back to the image above: along the edges of the t-shirt and arm there is (as expected) a sharp gradient in the depth value.  This is due to Google's depth extraction algorithm, which performs a non-gradual separation between foreground and background.  If we looked at the depth-map as a 3D terrain map, we would see a large, fast-rising "mountain" where the foreground objects are (arm, cherry box).  We expect this "raised" group of pixels to be a closed and connected convex pixel set.  That is, we don't expect multiple "mountains" or sprinkles of "tall" pixels.  Another way to look at the depth-map is through its histogram.  Unlike intensity histograms, which tell the story of the image illumination, the data in our histogram conveys depth information.

The x-axis of the histogram depicted here (produced by IrfanView) is a value between 0-255 and corresponds to the height value assigned by Lens Blur (after normalizing to the 8-bit sample size space).  The y-axis indicates the number of pixels in the image which have the corresponding x values.  The red line is the threshold; here I located it at x=172 which is the mean value in the depth-map.  All pixels above the threshold (i.e. to the right of the red line) are background pixels and all pixels below the threshold are foreground.  This histogram looks like a classic bimodal histogram with two modes of distribution; one corresponding to the foreground pixels and the other corresponding to the background pixels.  Under the assumptions I laid above on how the Lens Blur depth algorithm works, this bimodal histogram is what we should expect.
It is now clear how the thresholding technique separates these two groups of pixels and how the choice of threshold value affects the results.  Obviously the threshold value needs to be somewhere between the foreground and the background modes.  Exactly where?  Now that's a bit tougher.  In his book on image processing, Alan Bovik suggests applying probability theory to determine the optimal threshold (see pages 73-74). Under our assumption of only two modes (background and foreground), and if we assume binomial probability density functions for the two modes, then Bovik's method is likely to work.  But I think the gain is small for the purpose of this proof-of-concept.  If you download the source code you can play around with different values for the threshold.
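As an aside, if one did want a principled threshold without working through Bovik's derivation, Otsu's classic method targets exactly this bimodal situation by choosing the threshold that maximizes the between-class variance; here is a self-contained sketch that operates on a 256-bin histogram like the one shown above:

#include <cstddef>

size_t otsu_threshold(const size_t histogram[256]) {
    size_t total = 0;
    double sum_all = 0.0;
    for (size_t i = 0; i < 256; i++) {
        total += histogram[i];
        sum_all += (double)i * histogram[i];
    }
    double sum_bg = 0.0, best_var = -1.0;
    size_t w_bg = 0, best_t = 0;
    for (size_t t = 0; t < 256; t++) {
        w_bg += histogram[t];                 // pixels at or below the candidate threshold
        if (w_bg == 0) continue;
        size_t w_fg = total - w_bg;           // pixels above it
        if (w_fg == 0) break;
        sum_bg += (double)t * histogram[t];
        double mean_bg = sum_bg / w_bg;
        double mean_fg = (sum_all - sum_bg) / w_fg;
        double between = (double)w_bg * (double)w_fg *
                         (mean_bg - mean_fg) * (mean_bg - mean_fg);
        if (between > best_var) { best_var = between; best_t = t; }
    }
    return best_t;
}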

The next step is not much more complicated.  Instead of zeroing (blackening) pixels above the threshold, we use some kind of blurring (smoothing) algorithm, as in image denoising.  The general idea is to use each input pixel's neighboring pixels in order to calculate the value of the corresponding output pixel.  That is, we use convolution to apply a low-pass filter on the pixels which pass the threshold test condition.

Application of a filter on the input image to generate the output image.

As you can see in the code, I've used several different smoothing kernels (mean, median, Gaussian) with several different sizes.  There are a few details that are worth mentioning.
The convolution filter uses pixels from the surrounding environment of the input pixel, and sometimes we don't have such pixels available.  Let's take, for example, the first pixel, at the upper left corner (x,y) = (0,0), and a box kernel of size 5x5.  Clearly we are missing the 2 upper rows (y=-1, y=-2) and 2 left columns (x=-1, x=-2).  There are several strategies to deal with this, such as duplicating rows and columns or using a fixed value to replace the missing data.  Another strategy is to create an output image that is smaller than the input image.  For example, using the 5x5 kernel, we would ignore the first and last two rows and the first and last two columns and produce an output image that is 4 columns and 4 rows smaller than the input frame.  You can also change the filter kernel such that instead of using the center of the kernel as the pixel we convolve around, we use one of the corners.  This doesn't bring us out of the woods, but it lets us "preserve" two of the sides of the image.  And you can do what I did in the code, which is literally "cutting some corners": for all pixels in rows and columns that fall outside of a full kernel, I simply leave them untouched.  You can really see this when using larger kernels - the frame is sharp and the internal part of the image is blurry.  Ouch - I wouldn't use that in "real" life ;-)
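The first strategy mentioned above, duplicating border rows and columns, boils down to clamping the sampling coordinates into the valid range; a hypothetical helper for it (not part of the listing below) is as simple as:

#include <cstddef>

// Clamp a (possibly negative) kernel-sampling coordinate into [0, dim-1],
// which effectively replicates the border rows/columns.
inline size_t clamp_coord(long v, size_t dim) {
    if (v < 0) return 0;
    if (v >= (long)dim) return dim - 1;
    return (size_t)v;
}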
Next, there's the issue of the kernel size.  As mentioned above, larger kernel sizes achieve more blurring, but the transition between blurred and non-blurred pixels (along the depth threshold contour lines) is more noticeable.  One possible solution is to use a reasonably sized kernel and perform the smoothing pass more than once.  If your filter is non-linear (e.g. a median filter) then the result might be a bit hairy.
In the output of Google's Lens Blur you can also easily detect artifacts along the depth threshold contour lines, but because they change the "strength" of the blurring as a function of the pixel depth (instead of a binary threshold condition as in my implementation) they can achieve a smoother transition at the edges of the foreground object.

Gaussian smoothing with kernel size 9x9

Mean filter with 7x7 kernel, 2 passes
Mean filter with 11x11 kernel, 1 pass
Mean filter with 15x15 kernel, 1 pass

Overall, this was quite a simple little experiment, although the quality of the output image is not as good as Lens Blur's.  I guess you do get what you pay for ;-)


/*
 *  The implementation is naive and non-optimized to make this code easier to read
 *  To use this code you need to download LodePNG(lodepng.h and lodepng.cpp) from 
 *  http://lodev.org/lodepng/.
 *  Thanks goes to Lode Vandevenne for a great PNG utility!
 *
 */
#include "stdafx.h"
#include "lodepng.h"
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>
#include <algorithm>

struct image_t {
    image_t() : width(0), height(0), channels(0), buf(0) {}
    image_t(size_t w, size_t h, size_t c, uint8_t *buf ) : 
        width(w), height(h), channels(c), buf(buf) {}
    size_t width;
    size_t height;
    size_t channels;
    uint8_t *buf;
};

struct box_t {
    box_t(size_t w, size_t h) : w(w), h(h) {}
    size_t w;
    size_t h;
};

struct image_stats {
    image_stats() : mean(0) {
        memset(histogram, 0, sizeof(histogram));
    }
    size_t histogram[256];
    size_t mean;
};

void calc_stats(image_t img, image_stats &stats) {
    uint64_t sum = 0;  // 64 bit sum to prevent overflow

    // assume the image is grayscale and calc the stats for only the first channel
    for (size_t row=0; row<img.height; row++) {
        for (size_t col=0; col<img.width; col++) {
            uint8_t p = img.buf[row*img.width*img.channels + col*img.channels];
            stats.histogram[p]++;
            sum += p;
        }
    }
    stats.mean = sum / (img.width * img.height);
}

// Median filter: picks the median value of the kernel footprint.
// (The Filter base class and the MeanBlur/Constant filters referenced in
//  do_blur below are not shown in this excerpt.)
class MedianBlur : public Filter {
public:
    MedianBlur(const image_t &input, const image_t &output, box_t size) :
        Filter(input, output, size) {}
private:
    size_t convolve(size_t row, size_t col, size_t color) const {
        std::vector<uint8_t> v;
        for (size_t y=row-size.h/2; y<=row+size.h/2; y++) {
            for (size_t x=col-size.w/2; x<=col+size.w/2; x++) {
                v.push_back( 
                   input.buf[y*input.width*input.channels + x*input.channels + color]);
            }
        }
        std::nth_element( v.begin(), v.begin()+(v.size()/2),v.end() );
        return v[v.size()/2];
    }
};


class Gaussian_9x9 : public Filter {
public:
    Gaussian_9x9(const image_t &input, const image_t &output) : 
        Filter(input, output, box_t(9,9)) {}
private:
    size_t convolve(size_t row, size_t col, size_t color) const {
        static const 
        uint8_t kernel[9][9] = {{0, 0, 1,  1,  1,  1, 1, 0, 0}, 
                                {0, 1, 2,  3,  3,  3, 2, 1, 0}, 
                                {1, 2, 3,  6,  7,  6, 3, 2, 1}, 
                                {1, 3, 6,  9, 11,  9, 6, 3, 1}, 
                                {1, 3, 7, 11, 12, 11, 7, 3, 1}, 
                                {1, 3, 6,  9, 11,  9, 6, 3, 1}, 
                                {1, 2, 3,  6,  7,  6, 3, 2, 1}, 
                                {0, 1, 2,  3,  3,  3, 2, 1, 0}, 
                                {0, 0, 1,  1,  1,  1, 1, 0, 0}};
        static const size_t kernel_sum = 256;
        size_t total = 0;
        for (size_t y=row-size.h/2; y<=row+size.h/2; y++) {
            for (size_t x=col-size.w/2; x<=col+size.w/2; x++) {
                total += input.buf[y*input.width*input.channels + x*input.channels + 
                                   color] * 
                         kernel[y-row+size.h/2][x-col+size.w/2];
            }
        }
         return total/kernel_sum;
    }
};

class Gaussian_5x5 : public Filter {
public:
    Gaussian_5x5(const image_t &input, const image_t &output) : 
        Filter(input, output, box_t(5,5)) {}
protected:
    size_t convolve(size_t row, size_t col, size_t color) const {
        static const 
        uint8_t kernel[5][5] = {{ 1,  4,  7,  4,  1},
                                { 4, 16, 26, 16,  4},
                                { 7, 26, 41, 26,  7},
                                { 4, 16, 26, 16,  4},
                                { 1,  4,  7,  4,  1}};
        static const size_t kernel_sum = 273;
        size_t total = 0;
        for (size_t y=row-size.h/2; y<=row+size.h/2; y++) {
            for (size_t x=col-size.w/2; x<=col+size.w/2; x++) {
                // convolve
                total += input.buf[y*input.width*input.channels + x*input.channels + 
                                   color] * 
                          kernel[y-row+size.h/2][x-col+size.w/2];
            }
        }
        return total/kernel_sum;
    }
};

void blur_image(const image_t &input_img, const image_t &output_img, 
                const image_t &depth_img, const BlurConfig &cfg) {
  size_t width = input_img.width;
  size_t height = input_img.height;
  size_t channels = input_img.channels;

  for (size_t pass=cfg.num_passes; pass>0; pass--) {
      for (size_t row=0; row<height; row++) {
        for (size_t col=0; col<width; col++) {
            for (size_t color=0; color<channels; color++) {
                if (depth_img.buf[row*width*channels + col*channels + color] > cfg.threshold) {
                    size_t new_pixel = cfg.filter.execute(row, col, color);    
                    output_img.buf[row*width*channels+col*channels+color] = new_pixel;
                } else {
                    output_img.buf[row*width*channels + col*channels + color] = 
                        input_img.buf[row*width*channels + col*channels + color];
                }
            }
        }
      }
      // going for another pass: the input for the next pass will be the output 
      // of this pass
      if ( pass > 1 ) 
        memcpy(input_img.buf, output_img.buf, height*width*channels);
  }
 }

void do_blur() {
    const std::string wdir("");
    const std::string inimage(wdir + "gimage_image.png");
    const std::string outimage(wdir + "gimage_image.blur.png");
    const std::string depthfile(wdir + "gimage_depth.png");

    image_t depth_img;
    depth_img.channels = 3;
    unsigned dw = 0, dh = 0;   // lodepng wants 'unsigned'; image_t uses size_t
    unsigned error = lodepng_decode24_file(&depth_img.buf, &dw, &dh, depthfile.c_str());
    depth_img.width = dw;
    depth_img.height = dh;
    if(error) { 
        printf("[%s] decoder error %u: %s\n", depthfile.c_str(), error, 
                lodepng_error_text(error));
        return;
    }

    image_t input_img;
    input_img.channels = 3;
    unsigned iw = 0, ih = 0;
    error = lodepng_decode24_file(&input_img.buf, &iw, &ih, inimage.c_str());
    input_img.width = iw;
    input_img.height = ih;
    if(error) { 
        printf("[%s] decoder error %u: %s\n", depthfile.c_str(), error, 
                lodepng_error_text(error));
        return;
    }

    image_t output_img(input_img.width, 
                       input_img.height, 
                       input_img.channels, 
                       (uint8_t *) 
                       malloc(input_img.width*input_img.height*input_img.channels));
    
    image_stats depth_stats;
    calc_stats(depth_img, depth_stats);
    // Choose one of these filters or add your own
    // Set the filter configuration: filter algo and size, number of passes, threshold
    BlurConfig cfg(MeanBlur(input_img, output_img, box_t(7,7)), 3, depth_stats.mean);
    /*
    BlurConfig cfg(MeanBlur(input_img, output_img, box_t(11,11)), 2, depth_stats.mean);
    BlurConfig cfg(MedianBlur(input_img, output_img, box_t(7,7)), 1, depth_stats.mean);
    BlurConfig cfg(Constant(input_img, output_img), 1, depth_stats.mean);
    BlurConfig cfg(Gaussian_9x9(input_img, output_img), 1, depth_stats.mean);
    BlurConfig cfg(Gaussian_5x5(input_img, output_img), 5, depth_stats.mean);
    */

    blur_image(input_img, output_img, depth_img, cfg);

    error = lodepng_encode24_file(outimage.c_str(), output_img.buf, 
                                  output_img.width, output_img.height);
    if (error) 
        printf("[%s] encoder error %u: %s\n", outimage.c_str(), error, 
                lodepng_error_text(error));

    free(depth_img.buf);
    free(input_img.buf);
    free(output_img.buf);
}

int main(int argc, char* argv[])
{
    do_blur();
    return 0;
}

Saturday, June 7, 2014

Google's Depth Map


In my previous post I reported on Android's (presumed) new camera Java API and I briefly mentioned that its purpose is to provide the application developer more control over the camera, therefore allowing innovation in the camera application space. Google's recent updates to the stock Android camera application includes a feature called Lens Blur, which I suspect uses the new camera API to capture the series of frames required for the depth-map calculation (I am pretty sure that Lens Blur is only available on Nexus phones, BTW). In this post I want to examine the image files generated by Lens Blur.

Google uses XMP extended JPEG for storing Lens Blur picture files. The beauty of XMP is that arbitrary metadata can be added to a file without causing any problems for existing image viewing applications. Google's XMP-based depth-map storage format is described by Google on their developer pages, but not all metadata fields are actually used by Lens Blur; and not all metadata used by Lens Blur are described on the developer pages. To look closer at this depth XMP format, you can copy a Lens Blur image (JPEG) from your Android phone to your PC and open the file using a text editor. You should see XMP metadata similar to the pasted data below:

<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.1.0-jc003">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:GFocus="http://ns.google.com/photos/1.0/focus/"
        xmlns:GImage="http://ns.google.com/photos/1.0/image/"
        xmlns:GDepth="http://ns.google.com/photos/1.0/depthmap/"
        xmlns:xmpNote="http://ns.adobe.com/xmp/note/"
      GFocus:BlurAtInfinity="0.0083850715"
      GFocus:FocalDistance="18.49026"
      GFocus:FocalPointX="0.5078125"
      GFocus:FocalPointY="0.30208334"
      GImage:Mime="image/jpeg"
      GDepth:Format="RangeInverse"
      GDepth:Near="11.851094245910645"
      GDepth:Far="51.39698028564453"
      GDepth:Mime="image/png"
      xmpNote:HasExtendedXMP="7CAF4BA13EEBAC578997926C2A696679"/>
  </rdf:RDF>
</x:xmpmeta>


<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.1.0-jc003">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:GImage="http://ns.google.com/photos/1.0/image/"
        xmlns:GDepth="http://ns.google.com/photos/1.0/depthmap/"
        GImage:Data="/9j/4AAQSkZJRQABAAD/2wBDAAUDBAQEAwUEBAQFBQUGBwwIBw...."
         GDepth:Data="iVBORw0KGgoAAAANSUhEUgAABAAAAAMACAYAAAC6uh......"
  </rdf:RDF>
</x:xmpmeta>

Two fields, GImage:Data and GDepth:Data, are particularly interesting. The former stores the original image, which I suppose is one of the series of images captured by the application. The latter stores the depth map as described by Google and as annotated by the metadata in the first RDF structure. The binary JPEG data that follows is the image that is actually displayed by the viewer, and it is not necessarily the same picture that is stored in GImage:Data because it may be the product of a Lens Blur transformation. Storing the original picture data along with the depth-map and the "blurred" image takes a lot of room, but gives you the freedom to continuously alter the same picture. It is quite a nice feature.

Figure 1: Lens Blur output
Figure 2: image data stored in GImage:Data.
Notice that the background is very sharp compared to figure 1.

Figure 3: Depth map as extracted from GDepth:Data
GImage:Data and GDepth:Data are XML text fields so they must be encoded textually somehow, and Google chose to use Base64 for the encoding. When I decoded these fields I found that the image data (GImage:Data) stores a JPEG image, and the depth-map (GDepth:Data) is stored in a PNG image.

The following code extracts GImage:Data and GDepth:Data into two separate files (having JPEG and PNG formats, respectively). It starts by opening the Lens Blur file and searching for either GDepth:Data= or GImage:Data=. It then proceeds to decode the Base64 data and spits out the decoded data into new files. It is quite straightforward except for a small caveat: interspersed within the GDepth:Data and GImage:Data is some junk that Google inserted in the form of a name-space URL descriptor (http://ns.adobe.com/xmp/extension/), a hash value, and some binary-valued bytes. I remove these simply by skipping 79 bytes once I detect a 0xFF byte.

#include <cassert>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>

// Naive O(n) string matcher
// It is naive because it always moves the "cursor" forward - even when a match fails.
// This is a correct assumption that we can make in the context of this program.
bool match(std::ifstream &image, const std::string &to_match) {
    size_t matched = 0;
    while (!image.eof()) {
        char c;
        image.get(c);
        if (image.bad())
            return false;
        if (c == to_match[matched]) {
            matched++;
            if (matched==to_match.size())
                return true;
        }
        else {
            matched = 0;
        }
    }
    return false;
}

class Base64Decoder {
public:
    Base64Decoder() : base64_idx(0) {}
    bool add(char c);
    size_t decode(char binary[3]);
private:
    static int32_t decode(char c);
    char base64[4];
    size_t base64_idx;
};


bool Base64Decoder::add(char c) {
    int32_t val = decode(c);
    if (val < 0)
        return false;

    base64[base64_idx] = c;
    base64_idx = (base64_idx + 1) % 4;
    if (base64_idx == 0) {
        return true;
    }
    return false;
}

inline
size_t Base64Decoder::decode(char binary[3]) {
    if (base64[3] == '=')  {
        if (base64[2] == '=') {
            int32_t tmp = decode(base64[0]) << 18 |
                          decode(base64[1]) << 12;
            binary[2] = binary[1] = 0;
            binary[0] = (tmp>>16) & 0xff;
            return 1;
        } else {
            int32_t tmp = decode(base64[0]) << 18 |
                          decode(base64[1]) << 12;
                         
            binary[2] = 0;
            binary[1] = (tmp>>8) & 0xff;
            binary[0] = (tmp>>16) & 0xff;
            return 2;
        }
    }

    int32_t tmp = decode(base64[0]) << 18 |
                  decode(base64[1]) << 12 |
                  decode(base64[2]) << 6  |
                  decode(base64[3]);

    binary[2] = (tmp & 0xff);
    binary[1] = (tmp>>8) & 0xff;
    binary[0] = (tmp>>16) & 0xff;
    return 3;
}


// Decoding can be alternatively performed by a lookup table
inline
int32_t Base64Decoder::decode(char c) {
    if (c>= 'A' && c<='Z')
        return (c-'A');
    if (c>='a' && c<='z')
        return (26+c-'a');
    if (c>='0' && c<='9')
        return (52+c-'0');
    if (c=='+')
        return 62;
    if (c=='/')
        return 63;
     
    return -1;
}

bool decode_and_save(char *buf, size_t buflen, Base64Decoder &decoder, std::ofstream &depth_map) {
    size_t i = 0;
    while (i < buflen) {
         // end of depth data
        if (buf[i] == '\"')
            return true;

        if (buf[i] == (char)0xff) {
            // this is Google junk which we need to skip
            i += 79; // this is the length of the junk
            assert(i < buflen);
        }

        if (decoder.add(buf[i])) {
            char binary[3];
            size_t bin_len = decoder.decode(binary);
            depth_map.write(binary, bin_len);
        }
        i++;
    }
    return false;
}

void extract_depth_map(const std::string &infile, const std::string &outfile, bool extract_depth) {
    std::ifstream blur_image;
    blur_image.open (infile, std::ios::binary | std::ios::in);
    if (!blur_image.is_open()) {
        std::cout << "oops - file " << infile << " did not open" << std::endl;
        return;
    }

    bool b = false;
    if (extract_depth)
        b = match(blur_image, "GDepth:Data=\"");
    else
        b = match(blur_image, "GImage:Data=\"");
    if (!b) {
        std::cout << "oops - file " << infile << " does not contain depth/image info" << std::endl;
        return;
    }

    std::ofstream depth_map;
    depth_map.open (outfile, std::ios::binary | std::ios::out);
    if (!depth_map.is_open()) {
        std::cout << "oops - file " << outfile << " did not open" << std::endl;
        return;
    }
   
    // Consume the data, decode from base64, and write out to file.
    char buf[10 * 1024];
    bool done = false;
    Base64Decoder decoder;
    while (!blur_image.eof() && !done) {
        blur_image.read(buf, sizeof(buf));
        done = decode_and_save(buf, sizeof(buf), decoder, depth_map);
    }

    blur_image.close();
    depth_map.close();
}

int main() {
    const std::string wdir(""); // put here the path to your files
    const std::string infile(wdir + "gimage_original.jpg");
    const std::string imagefile(wdir + "gimage_image.jpg");
    const std::string depthfile(wdir + "gimage_depth.png");

   extract_depth_map(infile, depthfile, true);
   extract_depth_map(infile, imagefile, false);
}

If you want to use the depth-map and image data algorithmically (e.g. to generate your own blurred image), don't forget to decompress the JPEG and PNG files, otherwise you will be accessing compressed pixel data. I used IrfanView to generate raw RGB files, which I then manipulated and converted back to BMP files. I didn't include this code because it is not particularly interesting. Some other time I might describe how to use Halide ("a language for image processing and computational photography") to process the depth-map to create new images.

Sunday, June 1, 2014

Android's Hidden (and Future) Camera APIs

With the release of the KitKat AOSP source code, Google also exposed its plans for a new camera API - package android.hardware.camera2.  The new interfaces and classes are marked as @hide and sit quietly in /frameworks/base/core/java/android/hardware/camera2.  The @hide attribute excludes these classes from the automatic documentation generation and from the SDK.  The code is hidden because the API is not final and not yet committed to by Google, but the final version is likely to be quite similar in semantics, if not syntactically equivalent.  Anyone who's been watching the camera HAL changes in the last few Android releases will tell you that this new API is expected to become official any time now - most likely in the L Android release.
The new API is inspired by the FCAM project and aims to give the application developer precise and flexible control over all aspects of the camera.  The old point & shoot paradigm, which limits the camera to three dominant use cases (preview, video, stills), is replaced by an API that abstracts the camera as a black box producing streams of image frames in different formats and resolutions.  The camera "black box" is configured and controlled via an abstract, canonical model of the camera processing pipeline's controls and properties.  To understand the philosophy, motivation and details of this API, I think it is important to review the camera HAL (v3) documentation and to read the FCAM papers.

If you're an Android camera application developer, then you would be wise to study the new API.  

Figure 1: the android.hardware.camera2 package

Besides studying the android.hardware.camera2 package, I looked for test and sample code in the AOSP to see how the API is used.
  • ./frameworks/base/tests/Camera2Tests/SmartCamera/SimpleCamera/src/androidx/media/filterfw/samples/simplecamera/Camera2Source.java
  • ./cts/tests/tests/hardware/src/android/hardware/camera2/cts/
    • CameraCharacteristicsTest.java
    • CameraDeviceTest.java
    • CameraManagerTest.java
    • CameraCaptureResultTest.java
    • ImageReaderTest.java

The diagrams below depict a single, simple use case: setting up camera preview and issuing a few JPEG still-capture requests.  They are mostly self-explanatory, so I've included only a little text to describe them.  If this is insufficient, you can write your questions in the comments section below.



The process of discovering the cameras attached to the mobile device is depicted above.  Note the AvailabilityListener, which monitors the dynamic connection and disconnection of cameras to the device.  I think it also monitors the use of camera objects by other applications.  Neither feature exists in the current (soon to be "legacy") API.


Figure 3: Preparing the surfaces

Before doing anything with the camera, you need to configure the output surfaces - that's where the camera will render the image frames.  Note the use of an ImageReader to obtain a Surface object to buffer JPEG formatted images.

Figure 4: Preview request

Preview is created by generating a repeating-request with a SurfaceView as the frame consumer.

Figure 5: Still capture request

Finally, when the user presses on the shutter button, the application issues a single capture request for a JPEG surface.  Buffers are held by the ImageReader until the application acquires them individually.

Summary
That was a very brief introduction to the android.hardware.camera2 package which I think will be officially included in the L Android release. I'm sure the legacy API will continue existing for a long time in order to support current camera applications.  However, you should consider learning the new API for the greater (and finer) camera control it provides.

Friday, March 21, 2014

Android, QEMU and the Camera - Emulating the Camera Hardware in Android (Part III)

This third post in the series about the Android QEMU camera discusses the camera service (as noted in part two, this should not be confused with the Android framework's Camera Service, which I refer to in capital letters).
The camera service code can be found in directory /external/qemu/android/camera.  The camera service is initialized from main() in /external/qemu/vl-android.c.  The main() function is interesting if you want to understand the emulator boot process, but be warned that this function is composed of 2000 lines of code!  In any case, main() invokes android_camera_service_init which is part of the narrow public interface of the camera service.  The initialization control flow is summarized below:

android_camera_service_init
    ==> _camera_service_init
        ==>  enumerate_camera_devices
    ==> qemud_service_register(SERVICE_NAME == "camera", _camera_service_connect)

Function _camera_service_init uses a structure of type AndroidHwConfig (defined in /external/qemu/android/avd/hw-config.h).  This structure contains "the hardware configuration for this specific virtual device", and more specifically it contains a description of the camera type (webcam, emulated, none) connected to the back and the front of the device.  This structure is basically a reflection of the AVD configuration file ($HOME/.android/avd/<avd name>.avd/hardware-qemu.ini on Linux) or the AVD description parameters passed to the emulator on the command line.
Function enumerate_camera_devices performs a basic discovery and interrogation of the camera devices on the host machine.  There is an implementation for Linux hosts (camera-capture-linux.c), for Windows hosts (camera-capture-windows.c), and for Mac hosts (camera-capture-mac.m).  In fact, all of the low-level camera access code is segregated into these three files.  The Linux code uses the V4L2 API, of course, and its enumerate_camera_devices implementation opens a video device and enumerates the available frame pixel formats (skipping compressed formats), looking for a match to the requested formats.
Finally, function qemud_service_register registers the camera service with the hw_qemud (see previous post) under the service name "camera" and passes a callback which hw_qemud should invoke when camera service clients attempt to connect to the service.
Examining function _camera_service_connect reveals that the camera service supports two types of clients:

  • An emulated camera factory client; and
  • An emulated camera client

And this brings us almost full circle: class EmulatedCameraFactory  (discussed in the previous post) uses an emulated camera factory client (of class FactoryQemuClient) and class EmulatedQemuCameraDevice uses an emulated camera client (of class CameraQemuClient).

And now we can tie some loose ends from the previous post and take a deeper look at the control flow of loading the emulated Camera HAL module and creating an emulated camera device.  This is a good opportunity to remind ourselves that these posts only examined the emulated (web) cameras, not the emulated "fake" cameras and so there are a couple of shortcuts that I took in the flows below to prevent further confusion.

Invoked when camera.goldfish.so is loaded to memory and gEmulatedCameraFactory is instantiated:
   
EmulatedCameraFactory::EmulatedCameraFactory
    ==> FactoryQemuClient::connectClient
        ==> qemu_pipe_open("qemud:camera")
    ==> EmulatedCameraFactory::createQemuCameras
        ==> FactoryQemuClient::listCameras
            for each camera:
                ==> create EmulatedQemuCamera
                ==> EmulatedQemuCamera::Initialize
                    ==> EmulatedQemuCameraDevice::Initialize
                        ==> CameraQemuClient::connectClient
                            ==> qemu_pipe_open("qemud:camera:??")
                        ==> EmulatedCameraDevice::Initialize()

Invoked when the emulated HAL module is asked to open a camera HAL device:

hw_module_methods_t.open = EmulatedCameraFactory::device_open
    ==> EmulatedCameraFactory::cameraDeviceOpen
        ==> EmulatedCamera::connectCamera
            ==> EmulatedQemuCameraDevice::connectDevice
                ==> CameraQemuClient::queryConnect
                    ==> QemuClient::doQuery
                        ==> QemuClient::sendMessage
                            ==> qemud_fd_write

Once the camera devices are open and the communication path between the HAL camera device and the emulated web camera is established, communication continues to be facilitated via the CameraQemuClient using the query mechanism that we saw in the call flow above.  The query itself is a string composed of a query-identification-string (to identify what we are asking for: connect, disconnect, start, stop, frame) and a list of name-value strings (which depend on the query type).  This string is then written to the /dev/qemu_pipe device, and from there it makes its way through the goldfish_pipe driver, then to the hw_qemud service, and finally to the camera service.  There the query is parsed, acted upon (e.g. on a Linux host, V4L2 commands are sent to the host kernel to drive the USB webcam connected to the host), and a reply is sent.  The sender unblocks from the /dev/qemu_pipe read operation and completes its work.
     
Reference code:

  • device/generic/goldfish/ - Goldfish device
  • device/generic/goldfish/Camera/ - Goldfish camera
  • hardware/libhardware/include/hardware/qemu_pipe.h
  • linux/kernel/drivers/platform/goldfish/goldfish_pipe.c
  • external/qemu/hw/goldfish_pipe.c
  • external/qemu/android/hw-qemud.c
  • external/qemu/android/camera/
  • external/qemu/android/camera/camera-service.c
  • external/qemu/docs/ANDROID-QEMU-PIPE.TXT
  • hardware/libhardware/hardware.c

Thursday, February 27, 2014

Android, QEMU and the Camera - Emulating the Camera Hardware in Android (Part II)

In my previous post I reviewed the "administrative" background related to camera emulation on Android.
Now let's trace the code backwards, from the point of loading the Camera HAL module until we open an android.hardware.Camera instance in the application.  The diagrams below show the top-down control flow of loading a camera HALv1 module and initializing a camera HALv1 device.  But this is all fairly standard and we are interested in getting some insight into the particulars of the emulated camera, so let's start at the end: the loading of the HAL module.




The generic Android emulated device created by the AVD tool is called goldfish and, following the HAL naming convention,  the dynamically linked (shared object) library containing the goldfish camera HAL  is located in /system/lib/hw/camera.goldfish.so.  You can 'adb shell' into an emulated device instance and search for camera.goldfish.so in the device file system just like you would do on a real device: 

nzmora@~/Dev/aosp:$ adb -s emulator-5554 shell
root@generic_x86:/ #  ls -la /system/lib/hw/
-rw-r--r-- root     root            9464  2014-03-08  16:20  audio.primary.default.so
-rw-r--r-- root     root          13560  2014-03-08  16:20  audio.primary.goldfish.so
-rw-r--r-- root     root        144868  2014-03-08  16:21  audio_policy.default.so
-rw-r--r-- root     root      2309860  2014-03-08  16:21  bluetooth.default.so
-rw-r--r-- root   root            5204  2014-03-08  16:23  camera.goldfish.jpeg.so
-rw-r--r-- root   root        288668  2014-03-08  16:22  camera.goldfish.so
-rw-r--r-- root     root          13652  2014-03-08  16:21  gps.goldfish.so
-rw-r--r-- root     root          13872  2014-03-08  16:20  gralloc.default.so
-rw-r--r-- root     root          21836  2014-03-08  16:21  gralloc.goldfish.so
-rw-r--r-- root     root            5360  2014-03-08  16:21  keystore.default.so
-rw-r--r-- root     root            9456  2014-03-08  16:20  lights.goldfish.so
-rw-r--r-- root     root            5364  2014-03-08  16:20  local_time.default.so
-rw-r--r-- root     root            5412  2014-03-08  16:17  power.default.so
-rw-r--r-- root     root          13660  2014-03-08  16:20  sensors.goldfish.so
-rw-r--r-- root     root            5360  2014-03-08  16:17  vibrator.default.so

-rw-r--r-- root     root            5364  2014-03-08  16:21  vibrator.goldfish.so

The code itself is found in the Android source tree, at /device/generic/goldfish/camera so let's turn our attention there.

When approaching new code, to get a quick high-level understanding of what's what, I like to start with the makefile, Android.mk, and give it a quick scan: examining the input files, flags and output files.  The Android makefile format is fairly self-describing and easier to grasp than "true" make files because it hides most of the gory details.  In any case, in this particular makefile the LOCAL_SRC_FILES variable (which contains the list of files to compile) is listed in a sort of hierarchy - and this gives us a first clue as to how the source files relate to one another.

LOCAL_SRC_FILES := \
        EmulatedCameraHal.cpp \
        EmulatedCameraFactory.cpp \
        EmulatedCameraHotplugThread.cpp \
        EmulatedBaseCamera.cpp \
        EmulatedCamera.cpp \
                EmulatedCameraDevice.cpp \
                EmulatedQemuCamera.cpp \
                EmulatedQemuCameraDevice.cpp \
                EmulatedFakeCamera.cpp \
                EmulatedFakeCameraDevice.cpp \
                Converters.cpp \
                PreviewWindow.cpp \
                CallbackNotifier.cpp \
                QemuClient.cpp \
                JpegCompressor.cpp \
        EmulatedCamera2.cpp \
                EmulatedFakeCamera2.cpp \
                EmulatedQemuCamera2.cpp \
                fake-pipeline2/Scene.cpp \
                fake-pipeline2/Sensor.cpp \
                fake-pipeline2/JpegCompressor.cpp \
        EmulatedCamera3.cpp \
                EmulatedFakeCamera3.cpp

In the first file, EmulatedCameraHal.cpp, I find the definition of the HAL module structure: HAL_MODULE_INFO_SYM. This is the symbol that /hardware/libhardware/hardware.c loads when CameraService is first referenced (see flow diagram above), and therefore it is the entry way into the HAL.

camera_module_t HAL_MODULE_INFO_SYM = {
    common: {
         tag:                HARDWARE_MODULE_TAG,
         module_api_version: CAMERA_MODULE_API_VERSION_2_1,
         hal_api_version:    HARDWARE_HAL_API_VERSION,
         id:                 CAMERA_HARDWARE_MODULE_ID,
         name:               "Emulated Camera Module",
         author:             "The Android Open Source Project",
         methods:            &android::EmulatedCameraFactory::mCameraModuleMethods,
         dso:                NULL,
         reserved:           {0},
    },
    get_number_of_cameras: android::EmulatedCameraFactory::get_number_of_cameras,
    get_camera_info:       android::EmulatedCameraFactory::get_camera_info,
    set_callbacks:         android::EmulatedCameraFactory::set_callbacks,
};

The first thing that we learn is that this is a HAL module v2.1.  There are 4 function pointers listed in this structure and cgrep'ing these functions leads us to file EmulatedCameraFactory.cpp, where these functions are defined.  We quickly learn from the code documentation in EmulatedCameraFactory.cpp that "A global instance of EmulatedCameraFactory is statically instantiated and initialized when camera emulation HAL is loaded".
When the CameraService invokes camera_module_t::get_camera_info, it actually performs a call to gEmulatedCameraFactory.getCameraInfo.  In other words, the three function pointers in camera_module_t just forward the work to gEmulatedCameraFactory (the global singleton factory instance I mentioned above):

int EmulatedCameraFactory::get_camera_info(int camera_id, struct camera_info* info)
{
    return gEmulatedCameraFactory.getCameraInfo(camera_id, info);
}

Let's refocus our attention on where the action is: the constructor of EmulatedCameraFactory.  The first thing the EmulatedCameraFactory constructor (device/generic/goldfish/camera/EmulatedCameraFactory.cpp) does is connect to the camera service in the Android emulator.  Please notice that this is not the CameraService of the Android framework!  This is a very important distinction.
I will describe the emulator's camera service in the third post in this series.

The code documentation does a very good job at explaining the EmulatedCameraFactory class responsibility:

    /* Class EmulatedCameraFactory - Manages cameras available for the emulation.
     *
     * When the global static instance of this class is created on the module load,
     * it enumerates cameras available for the emulation by connecting to the
     * emulator's 'camera' service. For every camera found out there it creates an
     * instance of an appropriate class, and stores it an in array of emulated
     * cameras. In addition to the cameras reported by the emulator, a fake camera
     * emulator is always created, so there is always at least one camera that is
     * available.
     *
     * Instance of this class is also used as the entry point for the camera HAL API,
     * including:
     *  - hw_module_methods_t::open entry point
     *  - camera_module_t::get_number_of_cameras entry point
     *  - camera_module_t::get_camera_info entry point
     *
     */

Usually, when I'm trying to quickly get familiar with new code, I either draw some call flows for myself, or I write them down.  This helps me understand the code, and it is also a quick way to re-familiarize myself with the code if I put it away for a prolonged time and then need to reference it.  In the case of the EmulatedCameraFactory constructor I used another technique, one I use less often: a stripped-down, syntax-incomplete version of the code.  This technique is useful when there's a method like the EmulatedCameraFactory constructor which packs a lot of action.  This particular code is self-explanatory, except for the call to mQemuClient.connectClient, but I'll return to that later - for now I choose to do a breadth-wise scan of the code.

EmulatedCameraFactory::EmulatedCameraFactory()
{
    /* Connect to the factory service in the emulator, and create Qemu cameras. */
    if (mQemuClient.connectClient(NULL) == NO_ERROR) {
        /* Connection has succeeded. Create emulated cameras for each camera
         * device, reported by the service. */
        createQemuCameras();
    }
     
    if (isBackFakeCameraEmulationOn()) {
        switch (getBackCameraHalVersion()):
            1: new EmulatedFakeCamera(camera_id, true, &HAL_MODULE_INFO_SYM.common);
            2: new EmulatedFakeCamera2(camera_id, true, &HAL_MODULE_INFO_SYM.common);
            3: new EmulatedFakeCamera3(camera_id, true, &HAL_MODULE_INFO_SYM.common);
        mEmulatedCameras[camera_id]->Initialize()
    }
    if (isFrontFakeCameraEmulationOn()) {
        switch (getBackCameraHalVersion()):
            1: new EmulatedFakeCamera(camera_id, true, &HAL_MODULE_INFO_SYM.common);
            2: new EmulatedFakeCamera2(camera_id, true, &HAL_MODULE_INFO_SYM.common);
            3: new EmulatedFakeCamera3(camera_id, true, &HAL_MODULE_INFO_SYM.common);
        mEmulatedCameras[camera_id]->Initialize()
    }
 
    mHotplugThread = new EmulatedCameraHotplugThread(&cameraIdVector[0], mEmulatedCameraNum);
    mHotplugThread->run();
  }
 
When you review the pseudo-code above, remember that FakeCamera refers to a camera device fully emulated in SW, while QemuCamera refers to a real web-camera that is wrapped by the emulation code.

After I got a bit of understanding of the initialization dynamics, I turn to study the structure of rest of the classes. When there are many classes involved (46 files in this case), I find that a class diagram of the overall structure can help me identify the more important classes (I have no intention of going over the code of all these classes). I extract the class hierarchy structure by scanning the .h files looking for class relationships and key value members. Usually I hand sketch some UML on paper - this doesn't have to be complete since I am just trying to get a quick grasp of things.




There are a few structural points to note:
  • The class structure is similar to the structure of the class listing in the makefile so whoever wrote the makefile was nice enough to be professional all the way (adding pseudo documentation to the makefile, if you will).
  • EmulatedFakeCamera types are classes that actually simulate the camera frames and behavior, along with a simulated sensor and processing pipeline.  Their implementation is interesting in and of itself and I'll return to it in a different post.
  • EmulatedQemuCamera types are a gateway to actual cameras connected to the host - i.e. webcams connected to your workstation or built into your laptop.  I visually differentiate between the Qemu and Fake cameras by giving them different colors.
  • There are Camera types and CameraDevice types.  CameraDevice types are more important as they contain more code.
  • The EmulatedCameraFactory represents the camera HAL module and contains handles to EmulatedCameras.
  • There are two classes which abstract the connection to the QEMU emulator.  You can see that the EmulatedQemuCameraDevice holds a reference to CameraQemuClient and clearly this is required for communicating with the webcam on the emulator (more on this later).  There are three related classes: EmulatedCamera, EmulatedCamera2, and EmulatedCamera3, which represent cameras exposed through HALv1, HALv2, and HALv3, respectively.  Obviously HALv2 is of no significance by now because Android does not support it.  HALv3 does not exist for the webcam, most likely because the new HALv3 does not add any new features to a simple point & shoot webcam.
Now back to the dynamic view (my discovery process is a back & forth dance to discover new components and interactions) - when an application calls android.hardware.Camera.open(cameraId) this call is propagated through the code layers and ends with a call to camera_module_t::methods.open(cameraId) which is actually a call to EmulatedCameraFactory::device_open.  You can trace this flow in the first diagram in this blog post.

EmulatedCameraFactory::device_open
    ==> gEmulatedCameraFactory.cameraDeviceOpen
        ==> mEmulatedCameras[camera_id]->connectCamera(hw_device_t** device)

There's a whole lot of interesting stuff going on in the emulation code, especially in the emulated fake camera code, but in this blog post I want to look at the emulated QEMU camera code (EmulatedQemuCamera) and the communication with QEMU.

So, on to QemuClient.  This class "encapsulates a connection to the 'camera' service in the emulator via qemu pipe". The pipe connection is established by invoking qemu_pipe_open(pipename) which is implemented in /hardware/libhardware/include/hardware/qemu_pipe.h: First a device of type /dev/qemu_pipe is opened, and then the concatenation of the strings "pipe:" and pipename is written to the device.  In the kernel, we find the other side of this pipe (i.e. the qemu_pipe device driver) in /drivers/platform/goldfish/goldfish_pipe.c.  The header of the driver does an excellent job of describing this driver, so I bring it forth without much ado:

    /* This source file contains the implementation of a special device driver
     * that intends to provide a *very* fast communication channel between the
     * guest system and the QEMU emulator.
     *
     * Usage from the guest is simply the following (error handling simplified):
     *
     *    int  fd = open("/dev/qemu_pipe",O_RDWR);
     *    .... write() or read() through the pipe.
     *
     * This driver doesn't deal with the exact protocol used during the session.
     * It is intended to be as simple as something like:
     *
     *    // do this _just_ after opening the fd to connect to a specific
     *    // emulator service.
     *    const char*  msg = "<pipename>";
     *    if (write(fd, msg, strlen(msg)+1) < 0) {
     *       ... could not connect to service
     *       close(fd);
     *    }
     *
     *    // after this, simply read() and write() to communicate with the
     *    // service. Exact protocol details left as an exercise to the reader.
     *
     * This driver is very fast because it doesn't copy any data through
     * intermediate buffers, since the emulator is capable of translating
     * guest user addresses into host ones.
     *
     * Note that we must however ensure that each user page involved in the
     * exchange is properly mapped during a transfer.
     */

QEMU pipes are further described in external/qemu/docs/ANDROID-QEMU-PIPE.TXT.

QemuClient::sendMessage and QemuClient::receiveMessage are wrappers for the pipe operations qemud_fd_write and qemud_fd_read, respectively.  QemuClient::doQuery is slightly more involved and I'll get to it in the third and final blog post in this series.

To recap, so far I've shown that the camera HAL of the emulated goldfish device contains classes that abstract an emulated QEMU camera (EmulatedQemuCameraDevice) which holds a reference to an instance of CameraQemuClient which it uses to communicate with a device named /dev/qemu_pipe.  This character device represents a virtual device with (emulated) MMIO register space and IRQ line, and it belongs to the emulated goldfish platform.  On the "other side" of the pipe device is the QEMU emulator and more specifically the goldfish pipe which is implemented in /external/qemu/hw/goldfish_pipe.c
You can think of this pipe as the conduit for communication between the Android Guest kernel and the emulator code.  In the Android QEMU codebase, file /external/qemu/android/hw-qemud.c implements a sort of bridge between various Android QEMU services and the goldfish pipe device.  One of these Android QEMU services is the camera-service that I briefly mentioned earlier.  This camera-service is the topic of the third blog post.  I'll wrap up with a diagram showing the relationships between the various components.