Saturday, June 7, 2014

Google's Depth Map


In my previous post I reported on Android's (presumed) new camera Java API and briefly mentioned that its purpose is to give the application developer more control over the camera, thereby allowing innovation in the camera application space. Google's recent updates to the stock Android camera application include a feature called Lens Blur, which I suspect uses the new camera API to capture the series of frames required for the depth-map calculation (I am pretty sure that Lens Blur is only available on Nexus phones, BTW). In this post I want to examine the image files generated by Lens Blur.

Google uses XMP-extended JPEG for storing Lens Blur picture files. The beauty of XMP is that arbitrary metadata can be added to a file without causing any problems for existing image viewing applications. Google's XMP-based depth-map storage format is described on its developer pages, but not all of the metadata fields described there are actually used by Lens Blur, and not all of the metadata used by Lens Blur is described on the developer pages. To look closer at this depth XMP format, copy a Lens Blur image (JPEG) from your Android phone to your PC and open the file with a text editor. You should see XMP metadata similar to the data pasted below:

<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.1.0-jc003">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:GFocus="http://ns.google.com/photos/1.0/focus/"
        xmlns:GImage="http://ns.google.com/photos/1.0/image/"
        xmlns:GDepth="http://ns.google.com/photos/1.0/depthmap/"
        xmlns:xmpNote="http://ns.adobe.com/xmp/note/"
      GFocus:BlurAtInfinity="0.0083850715"
      GFocus:FocalDistance="18.49026"
      GFocus:FocalPointX="0.5078125"
      GFocus:FocalPointY="0.30208334"
      GImage:Mime="image/jpeg"
      GDepth:Format="RangeInverse"
      GDepth:Near="11.851094245910645"
      GDepth:Far="51.39698028564453"
      GDepth:Mime="image/png"
      xmpNote:HasExtendedXMP="7CAF4BA13EEBAC578997926C2A696679"/>
  </rdf:RDF>
</x:xmpmeta>


<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.1.0-jc003">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:GImage="http://ns.google.com/photos/1.0/image/"
        xmlns:GDepth="http://ns.google.com/photos/1.0/depthmap/"
        GImage:Data="/9j/4AAQSkZJRQABAAD/2wBDAAUDBAQEAwUEBAQFBQUGBwwIBw...."
        GDepth:Data="iVBORw0KGgoAAAANSUhEUgAABAAAAAMACAYAAAC6uh......"/>
  </rdf:RDF>
</x:xmpmeta>

Two fields, GImage:Data and GDepth:Data, are particularly interesting. The former stores the original image, which I suppose is one of the series of images captured by the application. The latter stores the depth map, as described by Google and as annotated by the metadata in the first RDF structure. The binary JPEG data that follows the XMP sections is the image actually displayed by a viewer, and it is not necessarily the same picture stored in GImage:Data, because it may be the product of a Lens Blur transformation. Storing the original picture together with the depth map and the "blurred" image takes a lot of room, but it gives you the freedom to keep re-editing the same picture. It is quite a nice feature.

Figure 1: Lens Blur output
Figure 2: image data stored in GImage:Data.
Notice that the background is very sharp compared to figure 1.

Figure 3: Depth map as extracted from GDepth:Data
GImage:Data and GDepth:Data are XML text fields, so the binary payloads must be encoded textually somehow; Google chose Base64 for the encoding. When I decoded these fields I found that the image data (GImage:Data) is stored as a JPEG image, and the depth map (GDepth:Data) is stored as a PNG image.
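
As a quick sanity check you can verify the two decoded payloads by their magic bytes: JPEG data starts with FF D8, and PNG data starts with 89 50 4E 47. Here is a minimal sketch of such a check; the file names are the ones produced by the extraction program further down.

// Minimal sketch: verify decoded payloads by their magic bytes.
// JPEG starts with FF D8, PNG with 89 50 4E 47 ("\x89PNG").
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

static bool starts_with(const std::string &path, const std::vector<unsigned char> &magic) {
    std::ifstream f(path, std::ios::binary);
    std::vector<unsigned char> head(magic.size());
    f.read(reinterpret_cast<char *>(head.data()), head.size());
    return f && head == magic;
}

int main() {
    std::printf("image is JPEG: %d\n", starts_with("gimage_image.jpg", {0xFF, 0xD8}));
    std::printf("depth is PNG:  %d\n", starts_with("gimage_depth.png", {0x89, 'P', 'N', 'G'}));
    return 0;
}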

The following code extracts GImage:Data and GDepth:Data into two separate files (JPEG and PNG, respectively). It starts by opening the Lens Blur file and searching for either GDepth:Data= or GImage:Data=. It then decodes the Base64 data and writes the decoded bytes out to new files. It is quite straightforward except for a small caveat: interspersed within the GDepth:Data and GImage:Data payloads is some junk that Google inserted in the form of a namespace URL descriptor (http://ns.adobe.com/xmp/extension/), a hash value, and a few binary-valued bytes. I remove these simply by skipping 79 bytes whenever I detect a 0xFF byte.

#include <cassert>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>

// Naive O(n) string matcher.
// It is naive because it never backs up the read cursor when a partial match
// fails - which is good enough for the patterns searched for in this program.
bool match(std::ifstream &image, const std::string &to_match) {
    size_t matched = 0;
    char c;
    while (image.get(c)) {
        if (c == to_match[matched]) {
            matched++;
            if (matched == to_match.size())
                return true;
        }
        else {
            matched = 0;
        }
    }
    return false;
}

class Base64Decoder {
public:
    Base64Decoder() : base64_idx(0) {}
    bool add(char c);
    size_t decode(char binary[3]);
private:
    static int32_t decode(char c);
    char base64[4];
    size_t base64_idx;
};


bool Base64Decoder::add(char c) {
    // '=' is the padding character; it carries no data but completes a quadruplet
    if (c != '=' && decode(c) < 0)
        return false;

    base64[base64_idx] = c;
    base64_idx = (base64_idx + 1) % 4;
    // return true when a full quadruplet of four characters is ready to decode
    return base64_idx == 0;
}

inline
size_t Base64Decoder::decode(char binary[3]) {
    if (base64[3] == '=') {
        if (base64[2] == '=') {
            // "xx==" - the quadruplet encodes a single byte
            int32_t tmp = decode(base64[0]) << 18 |
                          decode(base64[1]) << 12;

            binary[2] = binary[1] = 0;
            binary[0] = (tmp>>16) & 0xff;
            return 1;
        } else {
            // "xxx=" - the quadruplet encodes two bytes
            int32_t tmp = decode(base64[0]) << 18 |
                          decode(base64[1]) << 12 |
                          decode(base64[2]) << 6;

            binary[2] = 0;
            binary[1] = (tmp>>8) & 0xff;
            binary[0] = (tmp>>16) & 0xff;
            return 2;
        }
    }

    // no padding - the quadruplet encodes three bytes
    int32_t tmp = decode(base64[0]) << 18 |
                  decode(base64[1]) << 12 |
                  decode(base64[2]) << 6  |
                  decode(base64[3]);

    binary[2] = (tmp & 0xff);
    binary[1] = (tmp>>8) & 0xff;
    binary[0] = (tmp>>16) & 0xff;
    return 3;
}


// Decoding can be alternatively performed by a lookup table
inline
int32_t Base64Decoder::decode(char c) {
    if (c>= 'A' && c<='Z')
        return (c-'A');
    if (c>='a' && c<='z')
        return (26+c-'a');
    if (c>='0' && c<='9')
        return (52+c-'0');
    if (c=='+')
        return 62;
    if (c=='/')
        return 63;
     
    return -1;
}

bool decode_and_save(char *buf, size_t buflen, Base64Decoder &decoder, std::ofstream &depth_map) {
    size_t i = 0;
    while (i < buflen) {
         // end of depth data
        if (buf[i] == '\"')
            return true;

        if (buf[i] == (char)0xff) {
            // this is Google junk which we need to skip
            i += 79; // this is the length of the junk
            assert(i < buflen);
        }

        if (decoder.add(buf[i])) {
            char binary[3];
            size_t bin_len = decoder.decode(binary);
            depth_map.write(binary, bin_len);
        }
        i++;
    }
    return false;
}

void extract_depth_map(const std::string &infile, const std::string &outfile, bool extract_depth) {
    std::ifstream blur_image;
    blur_image.open (infile, std::ios::binary | std::ios::in);
    if (!blur_image.is_open()) {
        std::cout << "oops - file " << infile << " did not open" << std::endl;
        return;
    }

    bool b = false;
    if (extract_depth)
        b = match(blur_image, "GDepth:Data=\"");
    else
        b = match(blur_image, "GImage:Data=\"");
    if (!b) {
        std::cout << "oops - file " << infile << " does not contain depth/image info" << std::endl;
        return;
    }

    std::ofstream depth_map;
    depth_map.open (outfile, std::ios::binary | std::ios::out);
    if (!depth_map.is_open()) {
        std::cout << "oops - file " << outfile << " did not open" << std::endl;
        return;
    }
   
    // Consume the data, decode from base64, and write out to file.
    char buf[10 * 1024];
    bool done = false;
    Base64Decoder decoder;
    while (!blur_image.eof() && !done) {
        blur_image.read(buf, sizeof(buf));
        // the last chunk may be shorter than the buffer, so use gcount()
        done = decode_and_save(buf, static_cast<size_t>(blur_image.gcount()), decoder, depth_map);
    }

    blur_image.close();
    depth_map.close();
}

int main() {
    const std::string wdir(""); // put here the path to your files
    const std::string infile(wdir + "gimage_original.jpg");
    const std::string imagefile(wdir + "gimage_image.jpg");
    const std::string depthfile(wdir + "gimage_depth.png");

    extract_depth_map(infile, depthfile, true);
    extract_depth_map(infile, imagefile, false);
    return 0;
}

If you want to use the depth-map and image data algorithmically (e.g. to generate your own blurred image), don't forget to decompress the JPEG and PNG files, otherwise you will be accessing compressed pixel data. I used IrfanView to generate raw RGB files, which I then manipulated and converted back to BMP files. I didn't include this code because it is not particularly interesting. Some other time I might describe how to use Halide ("a language for image processing and computational photography") to process the depth-map to create new images.
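
In case you want to interpret the depth-map pixels themselves: GDepth:Format is "RangeInverse", and my reading of Google's format description is that each 8-bit depth pixel is normalized to [0,1] and then mapped back to a depth value between GDepth:Near and GDepth:Far. Below is a small sketch under that assumption; the Near/Far constants are the ones from the XMP dump above.

// A minimal sketch of decoding a "RangeInverse" depth value, assuming my
// reading of Google's description is correct: an 8-bit pixel v is normalized
// to [0,1] and mapped back to a depth between Near and Far.
#include <cstdio>

double range_inverse_depth(unsigned char v, double d_near, double d_far) {
    double d_n = v / 255.0;  // normalized depth in [0,1]
    return (d_far * d_near) / (d_far - d_n * (d_far - d_near));
}

int main() {
    // GDepth:Near and GDepth:Far taken from the XMP dump above
    const double d_near = 11.851094245910645;
    const double d_far  = 51.39698028564453;
    std::printf("v=0   -> %f\n", range_inverse_depth(0, d_near, d_far));    // equals Near
    std::printf("v=255 -> %f\n", range_inverse_depth(255, d_near, d_far));  // equals Far
    return 0;
}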

Sunday, June 1, 2014

Android's Hidden (and Future) Camera APIs

With the release of the KitKat AOSP source code, Google also exposed its plans for a new camera API - package android.hardware.camera2. The new interfaces and classes are marked as @hide and sit quietly in /frameworks/base/core/java/android/hardware/camera2. The @hide attribute excludes these classes from the automatic documentation generation and from the SDK. The code is hidden because the API is not yet final and committed to by Google, but the final version is most likely to be quite similar in semantics, if not syntactically equivalent. Anyone who's been watching the camera HAL changes over the last few Android releases will tell you that this new API is expected to become official any time now - most likely in the Android L release.
The new API is inspired by the FCAM project and aims to give the application developer precise and flexible control over all aspects of the camera. The old point-and-shoot paradigm, which limits the camera to three dominant use cases (preview, video, stills), is replaced by an API that abstracts the camera as a black box producing streams of image frames in different formats and resolutions. The camera "black box" is configured and controlled via an abstract, canonical model of the camera processing pipeline's controls and properties. To understand the philosophy, motivation and details of this API, I think it is important to review the camera HAL (v3) documentation and to read the FCAM papers.

If you're an Android camera application developer, then you would be wise to study the new API.  

Figure 1: the android.hardware.camera2 package

Besides studying the android.hardware.camera2 package, I looked for test and sample code in the AOSP to see how the API is used.
  • ./frameworks/base/tests/Camera2Tests/SmartCamera/SimpleCamera/src/androidx/media/filterfw/samples/simplecamera/Camera2Source.java
  • ./cts/tests/tests/hardware/src/android/hardware/camera2/cts/
    • CameraCharacteristicsTest.java
    • CameraDeviceTest.java
    • CameraManagerTest.java
    • CameraCaptureResultTest.java
    • ImageReaderTest.java

The diagrams below depict a single, simple use case: setting up camera preview and issuing a few JPEG still-capture requests. They are mostly self-explanatory, so I've included only a little text to describe them. If this is insufficient, you can write your questions in the comments section below.



Figure 2: Discovering the cameras attached to the device

The process of discovering the cameras attached to the mobile device is depicted above. Note the AvailabilityListener, which monitors the dynamic connection and disconnection of cameras to the device; I think it also monitors the use of camera objects by other applications. Neither feature exists in the current (soon to be "legacy") API.


Figure 3: Preparing the surfaces

Before doing anything with the camera, you need to configure the output surfaces - that's where the camera will render the image frames. Note the use of an ImageReader to obtain a Surface object for buffering JPEG-formatted images.

Figure 4: Preview request

Preview is created by issuing a repeating request with a SurfaceView as the frame consumer.

Figure 5: Still capture request

Finally, when the user presses the shutter button, the application issues a single capture request targeting the JPEG surface. Buffers are held by the ImageReader until the application acquires them individually.

Summary
That was a very brief introduction to the android.hardware.camera2 package, which I think will be officially included in the Android L release. I'm sure the legacy API will continue to exist for a long time in order to support current camera applications. However, you should consider learning the new API for the greater (and finer) camera control it provides.