Saturday, June 7, 2014

Google's Depth Map


In my previous post I reported on Android's (presumed) new camera Java API, and I briefly mentioned that its purpose is to give application developers more control over the camera, thereby allowing innovation in the camera application space. Google's recent update to the stock Android camera application includes a feature called Lens Blur, which I suspect uses the new camera API to capture the series of frames required for the depth-map calculation (I am pretty sure that Lens Blur is only available on Nexus phones, BTW). In this post I want to examine the image files generated by Lens Blur.

Google stores Lens Blur pictures as JPEG files extended with XMP metadata. The beauty of XMP is that arbitrary metadata can be added to a file without causing any problems for existing image viewing applications. Google's XMP-based depth-map storage format is described by Google on their developer pages, but not all of the metadata fields described there are actually used by Lens Blur, and not all of the metadata used by Lens Blur is described on the developer pages. To take a closer look at this depth XMP format, copy a Lens Blur image (JPEG) from your Android phone to your PC and open the file in a text editor. You should see XMP metadata similar to the data pasted below:

<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.1.0-jc003">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:GFocus="http://ns.google.com/photos/1.0/focus/"
        xmlns:GImage="http://ns.google.com/photos/1.0/image/"
        xmlns:GDepth="http://ns.google.com/photos/1.0/depthmap/"
        xmlns:xmpNote="http://ns.adobe.com/xmp/note/"
      GFocus:BlurAtInfinity="0.0083850715"
      GFocus:FocalDistance="18.49026"
      GFocus:FocalPointX="0.5078125"
      GFocus:FocalPointY="0.30208334"
      GImage:Mime="image/jpeg"
      GDepth:Format="RangeInverse"
      GDepth:Near="11.851094245910645"
      GDepth:Far="51.39698028564453"
      GDepth:Mime="image/png"
      xmpNote:HasExtendedXMP="7CAF4BA13EEBAC578997926C2A696679"/>
  </rdf:RDF>
</x:xmpmeta>


<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.1.0-jc003">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:GImage="http://ns.google.com/photos/1.0/image/"
        xmlns:GDepth="http://ns.google.com/photos/1.0/depthmap/"
      GImage:Data="/9j/4AAQSkZJRQABAAD/2wBDAAUDBAQEAwUEBAQFBQUGBwwIBw...."
      GDepth:Data="iVBORw0KGgoAAAANSUhEUgAABAAAAAMACAYAAAC6uh......"/>
  </rdf:RDF>
</x:xmpmeta>
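Both packets live in JPEG APP1 marker segments: the first in the standard XMP segment (signature "http://ns.adobe.com/xap/1.0/"), the second split across one or more ExtendedXMP segments (signature "http://ns.adobe.com/xmp/extension/"). If you want to see this structure rather than eyeball it in a text editor, here is a minimal sketch of a segment walker. It assumes the whole file has already been read into memory (e.g. with a binary std::ifstream) and that there are no fill bytes between segments; Segment and list_segments are hypothetical names of my own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One JPEG marker segment (hypothetical helper type for this sketch).
struct Segment {
    uint8_t marker;         // second marker byte, e.g. 0xE1 for APP1
    size_t length;          // stored length: counts its own 2 bytes, not the marker
    std::string signature;  // leading NUL-terminated string of APPn payloads
};

// Walks the marker segments of an in-memory JPEG from SOI (FF D8) up to and
// including SOS (FF DA), after which entropy-coded image data follows.
// Assumes no fill bytes between segments; stops early on truncated input.
std::vector<Segment> list_segments(const std::vector<uint8_t> &jpg) {
    std::vector<Segment> segments;
    if (jpg.size() < 2 || jpg[0] != 0xFF || jpg[1] != 0xD8)
        return segments;  // not a JPEG
    size_t pos = 2;
    while (pos + 4 <= jpg.size() && jpg[pos] == 0xFF) {
        const uint8_t marker = jpg[pos + 1];
        // segment lengths are big-endian
        const size_t length = (static_cast<size_t>(jpg[pos + 2]) << 8) | jpg[pos + 3];
        if (length < 2 || pos + 2 + length > jpg.size())
            break;  // corrupt or truncated segment
        std::string sig;
        if (marker >= 0xE0 && marker <= 0xEF) {  // APP0..APP15
            const uint8_t *payload = jpg.data() + pos + 4;
            for (size_t i = 0; i < length - 2 && i < 64 && payload[i] != 0; ++i)
                sig += static_cast<char>(payload[i]);
        }
        segments.push_back({marker, length, sig});
        if (marker == 0xDA)
            break;  // SOS: compressed image data follows
        pos += 2 + length;
    }
    return segments;
}
```

On a Lens Blur file I would expect one APP1 entry with the standard XMP signature followed by several with the extension signature, since a single JPEG segment cannot hold more than 64 KB.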

Two fields, GImage:Data and GDepth:Data, are particularly interesting. The former stores the original image, which I suppose is one of the series of images captured by the application. The latter stores the depth map, as described by Google and as annotated by the metadata in the first RDF structure. The binary JPEG data that follows the XMP is the image that is actually displayed by the viewer, and it is not necessarily the same picture that is stored in GImage:Data, because it may be the product of a Lens Blur transformation. Storing the original picture together with the depth map and the "blurred" image takes a lot of room, but it gives you the freedom to alter the same picture again and again. It is quite a nice feature.
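The metadata says GDepth:Format="RangeInverse", with GDepth:Near and GDepth:Far giving the depth range. As I read Google's depth-map description, RangeInverse normalizes a distance d as d_n = far * (d - near) / (d * (far - near)) and then quantizes d_n to the PNG's bit depth. The sketch below assumes an 8-bit PNG (d_n = v / 255) and simply inverts that formula; the function names are hypothetical, and the result is in whatever units Lens Blur uses for Near and Far.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of decoding an 8-bit "RangeInverse" depth sample, assuming the
// normalization d_n = zfar * (d - znear) / (d * (zfar - znear)) with
// d_n = v / 255, where znear/zfar are the GDepth:Near/GDepth:Far values.
double range_inverse_to_distance(uint8_t v, double znear, double zfar) {
    const double dn = v / 255.0;
    // invert d_n = zfar*(d - znear) / (d*(zfar - znear)) for d
    return (zfar * znear) / (zfar - dn * (zfar - znear));
}

// Forward direction (distance -> normalized value), handy for round-trip checks.
double distance_to_range_inverse(double d, double znear, double zfar) {
    return zfar * (d - znear) / (d * (zfar - znear));
}
```

With the Near/Far values above (about 11.85 and 51.40), a mid-gray sample of 128 maps to roughly 19.3 rather than the arithmetic midpoint of about 31.6: the inverse encoding spends more gray levels on nearby depths, where blur changes fastest.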

Figure 1: Lens Blur output
Figure 2: Image data stored in GImage:Data. Notice that the background is very sharp compared to Figure 1.
Figure 3: Depth map as extracted from GDepth:Data

GImage:Data and GDepth:Data are XML text fields so they must be encoded textually somehow, and Google chose to use Base64 for the encoding. When I decoded these fields I found that the image data (GImage:Data) stores a JPEG image, and the depth-map (GDepth:Data) is stored in a PNG image.
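You can actually tell the formats apart before decoding anything: the Base64 prefix "/9j/" decodes to the bytes FF D8 FF, the start of every JPEG file, and "iVBORw0KGgo" decodes to 89 50 4E 47 0D 0A 1A 0A, the 8-byte PNG signature. A small sketch that checks these magic numbers on decoded data (sniff_format is a hypothetical helper, and returning MIME strings is just my choice, mirroring GImage:Mime and GDepth:Mime):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Guesses the container format of a decoded GImage:Data / GDepth:Data blob
// from its first bytes ("magic numbers"): JPEG files start with FF D8 FF and
// PNG files with the 8-byte signature 89 50 4E 47 0D 0A 1A 0A.
std::string sniff_format(const std::vector<uint8_t> &data) {
    static const uint8_t png_sig[8] = {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    if (data.size() >= 3 && data[0] == 0xFF && data[1] == 0xD8 && data[2] == 0xFF)
        return "image/jpeg";
    if (data.size() >= 8) {
        bool is_png = true;
        for (size_t i = 0; i < 8; ++i)
            is_png = is_png && data[i] == png_sig[i];
        if (is_png)
            return "image/png";
    }
    return "unknown";
}
```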

The following code extracts GImage:Data and GDepth:Data into two separate files (in JPEG and PNG format, respectively). It starts by opening the Lens Blur file and searching for either GDepth:Data=" or GImage:Data=". It then decodes the Base64 data and writes the decoded bytes to a new file. It is quite straightforward except for a small caveat: interspersed within the GDepth:Data and GImage:Data values is some junk that Google inserted, in the form of a namespace URL (http://ns.adobe.com/xmp/extension/), a hash value, and some binary-valued bytes. I remove these simply by skipping 79 bytes whenever I detect a 0xFF byte. (As a reader explains in the comments below, these 79 bytes are really the header of the next JPEG APP1 segment: a JPEG segment holds at most 64 KB, so the extended XMP is split across several segments.)

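#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>

// Naive O(n) string matcher.
// It is naive because it always moves the "cursor" forward, even when a match fails.
// On a mismatch it only re-tests the current character against the first character
// of the pattern. That is exact for patterns whose first character does not recur
// in the pattern (true for "GDepth:Data=\"" and "GImage:Data=\""), which is an
// assumption we can make in the context of this program.
bool match(std::ifstream &image, const std::string &to_match) {
    size_t matched = 0;
    char c;
    // get() fails at end of file (or on error), so we never test an unread character
    while (image.get(c)) {
        if (c == to_match[matched]) {
            matched++;
            if (matched == to_match.size())
                return true;
        }
        else {
            // restart, letting the current character begin a new match
            matched = (c == to_match[0]) ? 1 : 0;
        }
    }
    return false;
}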

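// Accumulates Base64 characters four at a time and turns each complete
// quartet into up to three bytes.
class Base64Decoder {
public:
    Base64Decoder() : base64_idx(0) {}
    bool add(char c);                 // true when a full quartet is ready
    size_t decode(char binary[3]);    // decodes the quartet, returns the byte count
private:
    static int32_t decode(char c);    // value of one Base64 digit, or -1
    char base64[4];
    size_t base64_idx;
};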


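bool Base64Decoder::add(char c) {
    // Ignore anything that is neither a Base64 digit nor the '=' padding
    // character (e.g. line breaks inside the attribute value).
    if (c != '=' && decode(c) < 0)
        return false;

    base64[base64_idx] = c;
    base64_idx = (base64_idx + 1) % 4;
    // a full quartet has been collected when the index wraps back to 0
    return base64_idx == 0;
}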

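// Decodes the four buffered characters. Each Base64 digit carries 6 bits, so a
// full quartet yields 3 bytes; trailing '=' padding means only 1 or 2 bytes.
inline
size_t Base64Decoder::decode(char binary[3]) {
    if (base64[3] == '=') {
        if (base64[2] == '=') {
            // "xx==": only the first byte is meaningful
            int32_t tmp = decode(base64[0]) << 18 |
                          decode(base64[1]) << 12;

            binary[2] = binary[1] = 0;
            binary[0] = (tmp>>16) & 0xff;
            return 1;
        } else {
            // "xxx=": the first two bytes are meaningful
            int32_t tmp = decode(base64[0]) << 18 |
                          decode(base64[1]) << 12 |
                          decode(base64[2]) << 6;

            binary[2] = 0;
            binary[1] = (tmp>>8) & 0xff;
            binary[0] = (tmp>>16) & 0xff;
            return 2;
        }
    }

    int32_t tmp = decode(base64[0]) << 18 |
                  decode(base64[1]) << 12 |
                  decode(base64[2]) << 6  |
                  decode(base64[3]);

    binary[2] = (tmp & 0xff);
    binary[1] = (tmp>>8) & 0xff;
    binary[0] = (tmp>>16) & 0xff;
    return 3;
}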


// Decoding can be alternatively performed by a lookup table
inline
int32_t Base64Decoder::decode(char c) {
    if (c>= 'A' && c<='Z')
        return (c-'A');
    if (c>='a' && c<='z')
        return (26+c-'a');
    if (c>='0' && c<='9')
        return (52+c-'0');
    if (c=='+')
        return 62;
    if (c=='/')
        return 63;
     
    return -1;
}

// Decodes one buffer's worth of Base64 text and appends the binary result to out.
// pending_skip carries the part of a 79-byte junk header that did not fit in the
// previous buffer. Returns true once the closing quote of the attribute is reached.
bool decode_and_save(const char *buf, size_t buflen, Base64Decoder &decoder,
                     size_t &pending_skip, std::ofstream &out) {
    size_t i = 0;
    while (i < buflen) {
        if (pending_skip > 0) {
            // still inside the junk: skip as much of it as this buffer holds
            size_t n = (pending_skip < buflen - i) ? pending_skip : buflen - i;
            i += n;
            pending_skip -= n;
            continue;
        }

        // closing quote: end of the depth (or image) data
        if (buf[i] == '\"')
            return true;

        if (buf[i] == static_cast<char>(0xff)) {
            // this is Google junk (the header of the next APP1 segment) which we
            // need to skip; 79 is the length of the junk, starting at this 0xFF
            pending_skip = 79;
            continue;
        }

        if (decoder.add(buf[i])) {
            char binary[3];
            size_t bin_len = decoder.decode(binary);
            out.write(binary, bin_len);
        }
        i++;
    }
    return false;
}

void extract_depth_map(const std::string &infile, const std::string &outfile, bool extract_depth) {
    std::ifstream blur_image;
    blur_image.open(infile, std::ios::binary | std::ios::in);
    if (!blur_image.is_open()) {
        std::cout << "oops - file " << infile << " did not open" << std::endl;
        return;
    }

    bool b = false;
    if (extract_depth)
        b = match(blur_image, "GDepth:Data=\"");
    else
        b = match(blur_image, "GImage:Data=\"");
    if (!b) {
        std::cout << "oops - file " << infile << " does not contain depth/image info" << std::endl;
        return;
    }

    std::ofstream depth_map;
    depth_map.open(outfile, std::ios::binary | std::ios::out);
    if (!depth_map.is_open()) {
        std::cout << "oops - file " << outfile << " did not open" << std::endl;
        return;
    }

    // Consume the data, decode from base64, and write out to file.
    char buf[10 * 1024];
    bool done = false;
    size_t pending_skip = 0;
    Base64Decoder decoder;
    while (!done) {
        blur_image.read(buf, sizeof(buf));
        // the last read is usually short, so use the number of bytes actually read
        std::streamsize n = blur_image.gcount();
        if (n <= 0)
            break;
        done = decode_and_save(buf, static_cast<size_t>(n), decoder, pending_skip, depth_map);
    }

    blur_image.close();
    depth_map.close();
}

int main() {
    const std::string wdir(""); // put here the path to your files
    const std::string infile(wdir + "gimage_original.jpg");
    const std::string imagefile(wdir + "gimage_image.jpg");
    const std::string depthfile(wdir + "gimage_depth.png");

    extract_depth_map(infile, depthfile, true);
    extract_depth_map(infile, imagefile, false);
    return 0;
}
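The program above reacts to any 0xFF byte by blindly skipping 79 bytes. A more defensive variant could parse those bytes and check them first. Below is a minimal sketch, assuming the layout described in the comments at the end of this post (which matches Adobe's ExtendedXMP layout for JPEG): FF E1, a 2-byte big-endian length, the 35-byte NUL-terminated signature, a 32-character GUID, and two big-endian 32-bit values (full length and chunk offset). ExtendedXmpHeader and parse_extended_xmp_header are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Fields of the 79-byte header that precedes each ExtendedXMP chunk.
struct ExtendedXmpHeader {
    size_t segment_length;   // APP1 length field (includes its own 2 bytes)
    std::string guid;        // should equal the xmpNote:HasExtendedXMP value
    uint32_t full_length;    // size of the whole ExtendedXMP serialization
    uint32_t offset;         // position of this chunk's payload in that serialization
};

constexpr size_t kExtHeaderSize = 2 + 2 + 35 + 32 + 4 + 4;  // = 79

static uint32_t read_be32(const uint8_t *p) {
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

// Parses the header starting at p, which must point at the 0xFF byte.
// Returns false if fewer than 79 bytes are available or if the marker or
// signature does not match, i.e. when skipping 79 bytes would be wrong.
bool parse_extended_xmp_header(const uint8_t *p, size_t avail, ExtendedXmpHeader &out) {
    static const char kSig[] = "http://ns.adobe.com/xmp/extension/";  // 34 chars + NUL = 35
    if (avail < kExtHeaderSize || p[0] != 0xFF || p[1] != 0xE1)
        return false;
    for (size_t i = 0; i < sizeof(kSig); ++i)  // also compares the trailing NUL
        if (p[4 + i] != static_cast<uint8_t>(kSig[i]))
            return false;
    out.segment_length = (size_t(p[2]) << 8) | p[3];
    out.guid.assign(reinterpret_cast<const char *>(p + 4 + 35), 32);
    out.full_length = read_be32(p + 4 + 35 + 32);
    out.offset = read_be32(p + 4 + 35 + 32 + 4);
    return true;
}
```

Besides rejecting stray 0xFF bytes, the offset field tells you where each chunk belongs in the reassembled serialization, so a reader built on it does not have to rely on the chunks being stored in order.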

If you want to use the depth map and image data algorithmically (e.g. to generate your own blurred image), don't forget to decompress the JPEG and PNG files first; otherwise you will be accessing compressed data rather than pixels. I used IrfanView to generate raw RGB files, which I then manipulated and converted back to BMP files. I didn't include this code because it is not particularly interesting. Some other time I might describe how to use Halide ("a language for image processing and computational photography") to process the depth map to create new images.
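For the curious, here is a toy sketch of a depth-dependent blur on already-decompressed pixels. It is not Google's Lens Blur algorithm: it assumes an 8-bit grayscale image and an 8-bit depth map of the same size (row-major), treats the depth values linearly (ignoring the RangeInverse encoding), and uses a plain box filter whose radius grows with the distance from a chosen in-focus depth. toy_depth_blur, focus_depth and max_radius are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Toy depth-of-field effect: each output pixel is the mean of a (2r+1)x(2r+1)
// box around it, where r grows with |depth - focus_depth| up to max_radius.
// Pixels whose depth equals focus_depth are copied unchanged (r = 0).
std::vector<uint8_t> toy_depth_blur(const std::vector<uint8_t> &gray,
                                    const std::vector<uint8_t> &depth,
                                    int width, int height,
                                    uint8_t focus_depth, int max_radius) {
    std::vector<uint8_t> out(gray.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const size_t idx = size_t(y) * width + x;
            const int diff = std::abs(int(depth[idx]) - int(focus_depth));
            const int r = diff * max_radius / 255;  // 0 on the focal plane
            int sum = 0, count = 0;
            for (int dy = -r; dy <= r; ++dy) {
                for (int dx = -r; dx <= r; ++dx) {
                    const int sx = x + dx, sy = y + dy;
                    if (sx < 0 || sy < 0 || sx >= width || sy >= height)
                        continue;  // ignore samples outside the image
                    sum += gray[size_t(sy) * width + sx];
                    ++count;
                }
            }
            out[idx] = static_cast<uint8_t>(sum / count);  // count >= 1 (center pixel)
        }
    }
    return out;
}
```

A real implementation would use a disc-shaped kernel and handle occlusion edges, which is exactly where something like Halide would help with performance.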

1 comment:

  1. good job... about that skipping 79 bytes, starting from 0xff

    1) 0xff, 0xe1 : 2 bytes for jpeg APP1 marker
    2) HI, LO : 2 bytes for this APP1 size = (HI<<8)|LO
    3) A null-terminated signature string of "http://ns.adobe.com/xmp/extension/" : 35 bytes
    4) A 128-bit GUID stored as a 32 bytes ASCII hex string : 32 bytes
    5) The full length of the ExtendedXMP serialization as a 32-bit unsigned int : 4 bytes
    6) The offset of this portion as a 32-bit unsigned int : 4 bytes
    2 + 2 + 35 + 32 + 4 + 4 = 79

    Happy coding~
