Running down a dream
Yeah runnin' down a dream
That never would come to me
Workin' on a mystery, goin' wherever it leads
Runnin' down a dream

<h3>
Confused about Caffe’s Pooling layer input region behavior?</h3>
<div class="MsoNormal">
Caffe’s formulas for calculating the input region for
Convolution and Pooling layers are, surprisingly, not the same. They are only slightly different, but this
difference can cause the output sizes of Convolution and Pooling layers to be
different, even if they are both parameterized with the same input size, receptive-field,
padding and stride. This unexpected behavior
seems to <a href="https://github.com/BVLC/caffe/issues/1318">confuse many</a>, and I’m the
latest victim.<o:p></o:p></div>
<div class="MsoNormal">
In this post I’ll explain what’s going on, why, and where
you can see it in the code. <o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
I’m working on a small <a href="https://github.com/netaz/prototxt2csv">set of Python scripts</a> to discover
structure in Caffe networks. I won’t go
into the details, since I’m just starting and my ideas are not mature yet, but
in essence I want to look at the “engineering underbelly” of the network:
memory allocation and movement, number and types of layers, and other such odd
information ;-). The scripts parse a given Caffe network
prototxt file, recreate the abstract network structure in memory (using a DAG),
and then analyze it. <o:p></o:p></div>
<div class="MsoNormal">
I was coding the calculation of the output sizes of Convolution
and Pooling layers, when I noticed that I wasn’t getting the correct values. As one of my test inputs I used the <a href="https://github.com/BVLC/caffe/blob/master/models/bvlc_googlenet/deploy.prototxt">original GoogLeNet network</a> and compared the layers' output BLOB sizes I was calculating to those
published in the <a href="http://arxiv.org/abs/1409.4842">GoogLeNet paper</a> - and I was getting the wrong results. Why?<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Given a Pooling or Convolution layer’s receptive-field (F), stride (S) and padding (P) parameters, and an input feature-map of size C*W*H,
you can calculate the size of the output feature-map (OFM width) using this formula:<br />
<o:p></o:p></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEivz_44wIu71Yhf53_dMLR80Us5k8CKJE2zY7PNYuAz3ojOOFa1Ohcom0PZiO548NTLUuOkmcngdcB_2_Zj7JRjFyVVoge5SDblgyMwzv-yyXdBc6Xv01-tKEM6kEqxTA936nerUL5Vd6se/s1600/formula.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEivz_44wIu71Yhf53_dMLR80Us5k8CKJE2zY7PNYuAz3ojOOFa1Ohcom0PZiO548NTLUuOkmcngdcB_2_Zj7JRjFyVVoge5SDblgyMwzv-yyXdBc6Xv01-tKEM6kEqxTA936nerUL5Vd6se/s1600/formula.png" /></a></div>
<div class="MsoNormal">
Note that this assumes that the receptive-field and stride
are square (same value for height and width dimensions), which is true for the
networks I’m aware of, so this assumption is valid. </div>
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
<o:p>If W != H, you can calculate the OFM height simply by replacing W with H in the above formula.</o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Here’s a toy example to illustrate this: <o:p></o:p></div>
<div class="MsoNormal">
</div>
<ul>
<li>Input W, H (IFM width, height) = 10</li>
<li>F (receptive field size) = 3</li>
<li>S (stride height and width) = 2</li>
<li><span style="color: red;">P (padding height and width) = 1</span></li>
</ul>
<br />
<ul></ul>
<div class="MsoNormal">
</div>
<div>
Which leads to output (OFM width, height) size of (10 - 3 + 2) / 2 + 1 = 5 pixels.</div>
<br />
<div class="MsoNormal">
In the image below, the green and gray input pixels compose the IFM (10x10 pixels), where the green pixels represent the centers of the receptive fields as the filter window slides across and down the IFM. The zero-padded pixels are yellow.</div>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZt2kNN8-URRE9VYZ5npozLak2PP0TYqst3nFCujleWeG8Kap9BD3t0yTguT2zuYufxAcBa29BdHyOxoTEdQTuLPnN8n_Gt_-FHq_0Y_Fk-KOtaDwhrtO7a-IBQ6yo7zCbo9ZhjOwv_ykY/s1600/valid.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZt2kNN8-URRE9VYZ5npozLak2PP0TYqst3nFCujleWeG8Kap9BD3t0yTguT2zuYufxAcBa29BdHyOxoTEdQTuLPnN8n_Gt_-FHq_0Y_Fk-KOtaDwhrtO7a-IBQ6yo7zCbo9ZhjOwv_ykY/s1600/valid.png" /></a></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Andrej Karpathy <a href="http://cs231n.github.io/convolutional-networks/#conv">calls this a “valid” configuration</a> because “the neurons “fit” neatly and
symmetrically across the input.” In other
words, all of the input pixels that we want to pool or convolve can be used, because their receptive field (3x3) fits entirely in the input feature-map.<o:p></o:p></div>
<div class="MsoNormal">
<br />
Now let's contrast this with a different configuration:<o:p></o:p></div>
<ul>
<li>Input W,H = 10</li>
<li>F = 3</li>
<li>S = 2</li>
<li><span style="color: red;">P = 0 (i.e. no padding)</span></li>
</ul>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiEX1c_6ry-j-LuobR2aSBw_e1zO4zbuMzQ3qSoa3uGcc_jm5eNN6mUdU7WuqyDLjc6S8lwbXiSSldTkzExJm5FoH4DNS0w9XTF6wqiI53YsrhSjpD20kqlrP_FG_zVSDRP1czzNqcTqhVB/s1600/invalid.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiEX1c_6ry-j-LuobR2aSBw_e1zO4zbuMzQ3qSoa3uGcc_jm5eNN6mUdU7WuqyDLjc6S8lwbXiSSldTkzExJm5FoH4DNS0w9XTF6wqiI53YsrhSjpD20kqlrP_FG_zVSDRP1czzNqcTqhVB/s1600/invalid.png" /></a></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Notice what happens to the pixels on the right and bottom (painted blue):
we want to use them because they are part of the IFM, but the receptive fields of the bottom and right-most pixels extend beyond the IFM borders - and therefore they can't be used :-(<o:p></o:p></div>
<div class="MsoNormal">
When we plug the parameters in our OFM formula, we find that the size of the output is (10 - 3) / 2 + 1 = 4.5 which is not an integer. Karpathy calls this configuration non-valid and it's clear why after we look at the above image. The configuration leads to a seemingly impossible situation where the "blue" pixels need to participate in the computations, but simply can't...</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
These kinds of Convolution and Pooling configurations might be non-valid, but they do appear in real networks. For an example, take a look at the first Convolution layer (conv1/7x7_s2) of the <a href="https://github.com/BVLC/caffe/blob/master/models/bvlc_googlenet/deploy.prototxt">GoogLeNet network</a> I mentioned above. It has this configuration:</div>
<ul>
<li>Input W,H = 224</li>
<li>F = 7</li>
<li>S = 2</li>
<li>P = 3</li>
</ul>
<div class="MsoNormal">
and the size of the output is (224 - 7 + 6) / 2 + 1 = 112.5, which is not an integer and therefore not valid. The correct OFM value, as gleaned from the <a href="http://arxiv.org/abs/1409.4842">GoogLeNet paper</a>, is 112. So the 112.5 result we calculated using the OFM formula has been rounded down (floor operation).</div>
<div class="MsoNormal">
A bit later in this network, the configuration of a Pooling layer (layer pool1/3x3_s2) is:</div>
<ul>
<li>Input W,H = 112</li>
<li>F = 3</li>
<li>S = 2</li>
<li>P = 0</li>
</ul>
<div class="MsoNormal">
and the size of the output is (112 - 3) / 2 + 1 = 55.5. Here again, we see a non-valid configuration. And if you check the <a href="http://arxiv.org/abs/1409.4842">GoogLeNet paper</a>, you'll see that it gives a value of 56 pixels, which is a rounding-up operation (ceiling) of the 55.5 which we calculated.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
By way of a short digression, here's the Caffe configuration for this layer:</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">layer {</span></div>
<div class="MsoNormal">
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> name: "pool1/3x3_s2"</span></div>
<div class="MsoNormal">
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> type: "Pooling"</span></div>
<div class="MsoNormal">
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> bottom: "conv1/7x7_s2"</span></div>
<div class="MsoNormal">
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> top: "pool1/3x3_s2"</span></div>
<div class="MsoNormal">
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> pooling_param {</span></div>
<div class="MsoNormal">
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> pool: MAX</span></div>
<div class="MsoNormal">
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> kernel_size: 3</span></div>
<div class="MsoNormal">
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> stride: 2</span></div>
<div class="MsoNormal">
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> }</span></div>
<div class="MsoNormal">
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">}</span></div>
<div>
<br /></div>
<div>
You might notice that the padding configuration is missing, but it defaults to zero, as we can see in <a href="https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto">caffe.proto</a>:</div>
<div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">message PoolingParameter {</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> enum PoolMethod {</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> MAX = 0;</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> AVE = 1;</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> STOCHASTIC = 2;</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> }</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> optional PoolMethod pool = 1 [default = MAX]; // The pooling method</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> // Pad, kernel size, and stride are all given as a single value for equal</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> // dimensions in height and width or as Y, X pairs.</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> optional uint32 <span style="background-color: yellow;">pad</span> = 4 [<span style="background-color: yellow;">default = 0</span>]; // The padding size (equal in Y, X)</span></div>
</div>
<div>
<br /></div>
<div class="MsoNormal">
Back to our topic: when we look at the Caffe code for <a href="https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cpp">Convolution </a>and <a href="https://github.com/BVLC/caffe/blob/master/src/caffe/layers/pooling_layer.cpp">Pooling</a> we can see that Caffe rounds down for Convolution layers and rounds up for Pooling layers. But what does this mean?</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
If we turn to the toy example from above, we can see graphically how a non-valid Convolution configuration is treated by Caffe:</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiT3Uq6f2j6Ygl0Wh4ehJqlMpUji00EnbhHJE01bxdMoRKw8tkW6q8E9-qv51gxm8wTrL_D3EHmtJHhc3sfKCNJxNM6is8zOFY-_ejGYu5VGJhZyKVNK1zNj4e_zcCbQyRtrsWacLj3R9sP/s1600/caffe_conv.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiT3Uq6f2j6Ygl0Wh4ehJqlMpUji00EnbhHJE01bxdMoRKw8tkW6q8E9-qv51gxm8wTrL_D3EHmtJHhc3sfKCNJxNM6is8zOFY-_ejGYu5VGJhZyKVNK1zNj4e_zcCbQyRtrsWacLj3R9sP/s1600/caffe_conv.png" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="text-align: start;">floor((10 - 3) / 2 + 1) = 4</span></div>
<div class="MsoNormal">
<br />
The rounding-down (floor) operation essentially eliminates a row and a column of pixels from the input. The smaller the IFM, the more impact this rounding decision has.<br />
<br />
Now here's how Caffe treats the same non-valid configuration, but this time for a Pooling layer:</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3H_8O1THEmF-oZzrm3QY-3zOexJioqal3O6pzfj9JXddVDJZ3c2dDkLgehHSFJbD2c743gEjYyc6CaPtaStvyQOles5XX4H5brASrb5Yw035zkPxMOETYwewr0l9deoklr6k-JUR59jxo/s1600/caffe_pooling.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3H_8O1THEmF-oZzrm3QY-3zOexJioqal3O6pzfj9JXddVDJZ3c2dDkLgehHSFJbD2c743gEjYyc6CaPtaStvyQOles5XX4H5brASrb5Yw035zkPxMOETYwewr0l9deoklr6k-JUR59jxo/s1600/caffe_pooling.png" /></a></div>
<div class="MsoNormal" style="text-align: center;">
<span style="text-align: start;">ceiling((10 - 3) / 2 + 1) = 5</span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
As the image above shows, we can't round up unless we add zero-padded pixels on the top and left borders of the IFM (or right and bottom borders). Caffe implicitly performs this padding. For Max-Pooling at least, this makes sense: we get to pool some more IFM pixels, and we don't affect the output value because max(x,0) = x.</div>
<div class="MsoNormal">
<br /></div>
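<div class="MsoNormal">
To make the two rounding modes concrete, here is a small, self-contained C++ sketch of that arithmetic (toy code of mine, not Caffe's implementation): the floor path mirrors what the Convolution layer computes, the ceil path mirrors the Pooling layer, and the two GoogLeNet layers discussed above serve as the test cases.</div>
<pre>#include &lt;cmath&gt;
#include &lt;cstdio&gt;

// Output size of a convolution/pooling sweep over one spatial dimension.
// w: input width (or height), f: receptive field, p: padding, s: stride.
// Caffe rounds the result down for Convolution layers and up for Pooling layers.
static int output_size(int w, int f, int p, int s, bool round_up) {
    double exact = static_cast&lt;double&gt;(w - f + 2 * p) / s + 1;
    return static_cast&lt;int&gt;(round_up ? std::ceil(exact) : std::floor(exact));
}

int main() {
    // GoogLeNet conv1/7x7_s2: (224 - 7 + 6) / 2 + 1 = 112.5 -&gt; floor -&gt; 112
    std::printf("conv1/7x7_s2 : %d\n", output_size(224, 7, 3, 2, /*round_up=*/false));
    // GoogLeNet pool1/3x3_s2: (112 - 3 + 0) / 2 + 1 = 55.5  -&gt; ceil  -&gt; 56
    std::printf("pool1/3x3_s2 : %d\n", output_size(112, 3, 0, 2, /*round_up=*/true));
    return 0;
}</pre>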
<div class="MsoNormal">
By now, I think we've cleared up the confusion about what was happening to my calculations. I think that it helps to be aware of this somewhat odd behavior.<br />
I'm not a Torch user, but according to <a href="https://github.com/BVLC/caffe/pull/3057">this suggestion</a> to "Add parameter for pooling layer to specify <i>ceil</i> or <i>floor</i>" in Caffe, Torch has means to explicitly specify how to handle non-valid configurations (here's the <a href="https://github.com/BVLC/caffe/pull/3057/commits/aed992edc67d8bdf57e3ac836559d55b4a3aecdc">commit code</a> for adding this feature to Caffe). I'm not sure why this hasn't been merged, but maybe this will be added one day and the confusion will end ;-)</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<br />
<h3>
DMA Buffer Sharing</h3>
The need to share DMA buffers between drivers and applications is common in multimedia platforms. Android's gralloc and <a href="http://lxr.free-electrons.com/source/drivers/staging/android/ion/ion.c">Ion driver</a> provide this and some other <a href="http://netaz.blogspot.co.il/2015/03/androids-graphics-buffer-management.html">goodies</a>, but Linaro's dmabuf buffer sharing <a href="http://lwn.net/Articles/474819/">driver</a> provides a lighter-weight alternative which is plenty good for many situations. <a href="https://lwn.net/Articles/480055/">Here's </a>a good comparison of Ion and dmabuf.<br />
<br />
I'm a visual person and relate to diagrams more than I do to textual descriptions. I use diagrams to quickly create a mental model of a subject-matter I'm learning, and use text to understand the fine details if I decide to dive in. Towards this end I created the sequence diagrams below, to complement the <a href="https://www.kernel.org/doc/Documentation/dma-buf-sharing.txt">dmabuf documentation</a> and help me follow the interactions between the importer, exporter and application. The kernel documentation is clear and concise, so I'm not adding further explanations, lest I detract more than I add. Hopefully these diagrams will help you, too.<br />
<br />
I've used <a href="https://sites.google.com/site/mscgen2393/">MSC Generator</a> to generate the diagrams and I'm providing the source for that as well. Enjoy ;-)<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSGvp-sVv_o_l53EV-MVLulu-ZEGC4fDkuh5h6m44RoFgZ-djMZW-wDrESws7L0pkJv3c5iugqUSBcAow7LityXIXPV3uWCERq5Emc-plsPj5PcwvniiUgeCwYyOwkIJj7kY6b2SELVobB/s1600/dmabuf.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSGvp-sVv_o_l53EV-MVLulu-ZEGC4fDkuh5h6m44RoFgZ-djMZW-wDrESws7L0pkJv3c5iugqUSBcAow7LityXIXPV3uWCERq5Emc-plsPj5PcwvniiUgeCwYyOwkIJj7kY6b2SELVobB/s640/dmabuf.png" width="568" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">dma-buf operations for device dma only</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQI70aTCYacs5tVNheGKd2r0DaV-wh8vlT1d2-yoQkmjEa-qUnZ5gPaA9vwpbeGHlG-mWPxYtuKW0F5TkyOBNCv49epQeg8Kom3edsShnT5te7Je3my2nS0GSr6lpN3bCZnIvroUa54j5C/s1600/dmabuf.kernel.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQI70aTCYacs5tVNheGKd2r0DaV-wh8vlT1d2-yoQkmjEa-qUnZ5gPaA9vwpbeGHlG-mWPxYtuKW0F5TkyOBNCv49epQeg8Kom3edsShnT5te7Je3my2nS0GSr6lpN3bCZnIvroUa54j5C/s640/dmabuf.kernel.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Kernel cpu access to a dma-buf buffer object</td></tr>
</tbody></table>
<br />
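If you prefer to see the same flow as code, here is a heavily condensed, kernel-side sketch of the importer path shown in the first diagram above. It is C driver code, so it only compiles inside a kernel module; the calls are the ones described in the dma-buf documentation linked above, and the error handling and surrounding driver plumbing are omitted:<br />
<pre>#include &lt;linux/dma-buf.h&gt;
#include &lt;linux/dma-direction.h&gt;
#include &lt;linux/device.h&gt;

/* Importer-side sketch (error handling omitted): given a dma-buf fd that the
 * exporter shared with us, attach our device and map the buffer for DMA.
 * 'dev' is assumed to be the importing driver's struct device. */
static void import_and_use(struct device *dev, int fd)
{
        struct dma_buf *dmabuf = dma_buf_get(fd);          /* take a reference on the buffer */
        struct dma_buf_attachment *att = dma_buf_attach(dmabuf, dev);
        struct sg_table *sgt = dma_buf_map_attachment(att, DMA_BIDIRECTIONAL);

        /* ... program the DMA engine with the scatter-gather list in sgt ... */

        dma_buf_unmap_attachment(att, sgt, DMA_BIDIRECTIONAL);
        dma_buf_detach(dmabuf, att);
        dma_buf_put(dmabuf);                               /* drop our reference */
}</pre>
<br />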
<div class="separator" style="clear: both; text-align: center;">
</div>
<script src="https://gist.github.com/netaz/145b0169992ea0ca35d7.js"></script><br />
<a href="https://gist.github.com/145b0169992ea0ca35d7.git">git clone</a><br />
<script src="https://gist.github.com/netaz/d62a5a5b7339e572dd1a.js"></script><a href="https://gist.github.com/d62a5a5b7339e572dd1a.git">git clone</a>netazhttp://www.blogger.com/profile/13820189991503080577noreply@blogger.com0tag:blogger.com,1999:blog-182549309027052933.post-23499592804852332722015-10-04T03:32:00.003+03:002015-10-13T22:16:47.145+03:00The great experiment: teaching my son to program (part II)<a href="http://netaz.blogspot.co.il/2015/10/the-great-experiment-teaching-my-son-to.html">Last time</a> I described why I'm trying to teach my son how to program, and my two first failed attempts. The third attempt is going to be a game. <br />
<br />
My experience with the basic command-line project I did with my son taught me that he is not as averse to text-mode (AKA "no-graphics") as I had previously thought. I also learned that text-mode graphics is an easy and intuitive way to do I/O. It's also quite powerful - just think back to some of the early games <a href="https://www.youtube.com/watch?v=XNRx5hc4gYc">:-)</a>. It is really important for me to show my son that there is no magic in programming. That he can create a game with bare-bones tools. That he understands how everything is done. No physics engines, no sprites, no magic.<br />
<br />
I decided that the next project will be a text-mode tank fighting game: two tanks fighting each other in a 2D terrain. Something like <a href="https://www.youtube.com/watch?v=63Y6UjECAC4">this</a>, only using text-mode graphics.<br />
There are plenty of features we can develop, so this gives a long runway for this project alone. I wrote down some of the features we might develop:<br />
<br />
<ul>
<li>A 2D map which might be larger than the screen</li>
<li>Animation of shooting a tank shell</li>
<li>Animation of moving a tank around the screen</li>
<li>Different types of walls: some impenetrable, some not</li>
<li>Different types of ammunition</li>
<li>Game levels with different maps and different challenges</li>
<li>Animation when moving to the next level</li>
<li>Blast/explosion animation</li>
<li>Colors</li>
<li>Sound</li>
<li>Damage level and replenishment</li>
<li>Ammunition level and replenishment, through picking up "ammunition presents"</li>
<li>Hidden walls, keys and other tricks that add surprises</li>
<li>Multi-player mode</li>
<li>Multi-player mode - over the network</li>
<li>Control from an Android application</li>
<li>Score display</li>
<li>Opening screen</li>
<li>Console interaction</li>
<li>Sprites, threads, collision detection</li>
</ul>
<br />
I can go on and on, but the point is that this is going to be interesting.<br />
After making the list, I shared it with my son. I wanted to get him even more excited than I was. Everything is designed to pique his interest: the theme, the game, and even the project name.<br />
I decided to stick to my bottom-up approach. That is, I decided to teach through "doing" instead of teaching him in a structured manner. Instead of teaching about classes, objects, control structures, variables, and the lot, we will just dive in. This is how I learned to program and it was plenty good :-).<br />
<br />
<div>
Each sit-down session is going to be self-contained. That is, I don't want to quit in the middle of a feature. The application should always work. For this to work, we need to program in small increments. Another principle is that we both program: I show him how I do something and then ask him to implement something similar. And, as before, keep things lean and bare-bones. For example, if I can refrain from using functions for a few sessions, then I'd rather have one long 'main' function than confuse him with functions for the sake of correctness.</div>
<div>
<br /></div>
<div>
We are using github to manage the source control and you can follow our progress <a href="https://github.com/netaz/textmode-tanks">there</a>.<br />
The first session was around the basic input/output:<br />
<br />
<ul>
<li>We used Console.WriteLine to create a 2D map. To add some excitement, we added a title to the screen and played with the colors of the foreground/background.</li>
<li>The main loop waits for user key presses and interprets them. This introduced some basic control statements ('if', 'while') and their syntax. Again, I introduced these nonchalantly: "we need to loop on the key press and this is how we do this. Do you understand why we call this a loop? Can you explain why we loop forever?"</li>
<li>My son understands the Cartesian coordinate system from school, so introducing the Console cursor location attributes was pretty straight-forward.</li>
<li>I first showed him how to handle a right-arrow key and then asked him to write the code for the other three arrow keys. Of course, I sat next to him while he was doing this. Some guidance was still required here and there. </li>
<li>After he finished coding the basic loop, we played with the "tank" a bit and I showed him that nothing stops the tank from going off the map (or even the screen). We discussed how to fix this (boundary checking) and I again added the code to handle the case of moving to the right, and then asked him to add the code for the other cases.</li>
<li>Finally, we added the option to quit the game. </li>
<li>He told me that today's games display "GG" (good game) when exiting, so I showed him how we can add some ASCII art. </li>
<li>Of course, because the exit is so quick, we didn't see the GG displayed so we worked out a 2-second delay.</li>
</ul>
<div>
And that was it. A good first session. I ended by committing the code to a <a href="https://github.com/netaz/textmode-tanks">new repository</a> in Github.<br />
I felt my son enjoyed himself, so I might have done it right this time :-)</div>
</div>
<h3>
The great experiment: teaching my son to program (part I)</h3>
There are many reasons why I want to teach my kids to program. For starters, it's just great fun. What could be a better present than to teach your children a new way to enjoy themselves? When I program I sometimes feel like a novelist creating new worlds; sometimes like a carpenter using old techniques with new tools; sometimes like an architect free to implement my vision under the constraints of time, budget, physics and the almighty customer; sometimes like a detective collecting data to uncover who killed the pointer. And all the time I'm an engineer - a problem solver. In 2015, I believe, everyone should know something about programming. Lawyers and artists; doctors and carpenters; pilots and salesmen. Everywhere around us are systems that someone else programmed, and understanding something about how these "things" work is empowering. Improving on these "things" can be life-changing. <br />
Joy comes from quenching one's thirst to understand, to imagine and to create. And so this skill is more than a utilitarian skill - it can be a lifelong hobby through which the kids will express their artistic sides.<br />
Or not :-)<br />
I don't know how this "experiment" will end: will my bubble burst, or will I help them open the gates of a new world? At least, I hope, we will bond. If only just a bit. <br />
<br />
My daughter wasn't interested in this "project" of mine.<br />
I couldn't entice her with promises of discovering bright new worlds. Bummer. It took me a couple of weeks to recover from this arrow to the heart. <br />
Alas, my son, who is younger and spends enough hours of the day playing games on the net, was more amenable. So off we went. <br />
To his room, that is, to crank up the old IDE.<br />
<br />
Roll back a few weeks earlier: I was in the fantasizing stage at that time. First thing I did was choose a language. I wanted a high-level language to abstract away the complexities of the underlying hardware. A language that has great development tools, and a large set of libraries pre-integrated in the development environment. In other words, it should be intuitive and with minimum friction so that we can get off and going with no resistance. C# was the obvious choice for me - it meets all my criteria and has a <a href="https://www.visualstudio.com/en-us/products/visual-studio-express-vs.aspx">free world-class IDE</a> provided by Microsoft. <br />
<br />
I searched for online C# learning resources, but quickly gave up on that direction. None of the sites I found provided an environment that is hands-on, educational, and captivating to a 13-year-old with a short attention span. <br />
So it is up to me. I have one chance, I know. Get it wrong from the outset, and I will lose his interest for a long time.<br />
My son is a youtub'er and an online gamer fond of gadgets and is not easily impressed by simple graphics. It took me a few drives to work and back to come up with an idea (these long drives are good for catching up on podcasts, but also for reflection). But before I hit on this idea I had a couple of failures. I first tried interesting him in building a site on <a href="http://www.wix.com/">Wix</a>. He thought it was "cool", but it didn't stick. Mostly it was my fault: I didn't plan ahead and I thought that once I show him the tools, he will pick it up from there - motivated by his endless curiosity. But, no, that didn't happen. He was much more "curious" about watching some youtube video :-( And this was OK by me, because really, Wix is not programming. Next I tried interesting him in replicating the Mine Craft console interface. I often see my son using the Mine Craft console and he commands it better than I know the Linux Bash shell. We hacked up, in a single session, a simple command-line console application which accepted user input and executed simple commands such as a 'log-in' command with (fake) user and password, a 'quit' command to exit the application, and error messages. I showed him how to use the Console object for basic I/O and taught him the basics of the "if" statement and string comparison. I stuck to a bare-bones approach: use only what you need and use it "leanly" - no fluffy stuff to add confusion. So, for example, when we used Console.WriteLine - well, it was just a way to print characters to the screen. No explanation about classes, objects, or methods. <br />
I never use C# myself, which is fortunate for this experiment. Each time I wasn't sure how to do something, we googled it. Using the experiences and advice of others through google is a great lesson.<br />
<br />
And then came the flop.<br />
My son's attention was wavering and we had achieved enough for this first session. So I called it quits and gave him some ideas on how he could make a few incremental changes. And then I left him, hoping that I had done enough to pique his curiosity and that he would continue alone later. That "later" never came. Over the next couple of weeks I tried to get him to continue alone a few times, but youtube and Mine Craft were more interesting. But I saw the "spark" of curiosity and some excitement, and that was enough to keep me motivated. He said he now understands how the Mine Craft command line works. <br />
<br />
I learned two things from that experience. First, this is going to be "our" project. There are too many distractions luring him, and I'll have to sit with him through several sessions before trying again to invite him to work by himself. And this is fine because I enjoyed our session. Second, I really have to choose an interesting project, which also has a lot of room for new features.<br />
<br />
The next project will be a game.<br />
<br />
<br />
<h3>
Installing Caffe on Ubuntu 12.04</h3>
I have recently installed <a href="http://caffe.berkeleyvision.org/">Caffe</a>, the deep learning framework, on an Ubuntu 12.04 workstation and found a problem with the <a href="http://caffe.berkeleyvision.org/install_apt.html">Ubuntu installation instructions</a>.<br />
The instructions point us to gitorious, to clone the LMDB (Lightning Memory-Mapped Database)<br />
repository:<br />
<br />
<pre style="background: rgb(253, 254, 251); border-radius: 4px; border: 1px solid rgb(215, 216, 200); font-size: 12px; font-stretch: inherit; line-height: 14px; margin-bottom: 16px; overflow: auto; padding: 6px 12px; vertical-align: baseline; white-space: pre-wrap;"><code style="border: 0px; font-stretch: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; margin: 0px; padding: 0px; vertical-align: baseline;">git clone https://gitorious.org/mdb/mdb.git</code></pre>
However, trying to clone this repository results in a connection timeout. According to the official <a href="http://blog.gitorious.org/">gitorious blog</a>, gitorious was acquired and is migrating their repositories to another location:<br />
<br />
<div style="background-color: white; color: dimgrey; font-family: 'Helvetica Neue', Helvetica, sans-serif; font-size: 14px; line-height: 20px; margin-bottom: 16px; padding: 0px;">
<i>As you may know, <a href="https://about.gitlab.com/2015/03/03/gitlab-acquires-gitorious/" rel="nofollow" style="color: #0088cc; margin: 0px; padding: 0px; text-decoration: none;">Gitorious was acquired by GitLab</a> about a month ago, and we announced that Gitorious.org would be shutting down at the end of May, 2015.</i></div>
<div style="background-color: white; color: dimgrey; font-family: 'Helvetica Neue', Helvetica, sans-serif; font-size: 14px; line-height: 20px; margin-bottom: 16px; padding: 0px;">
<i>After the announcement we talked to the <a href="http://archiveteam.org/" rel="nofollow" style="color: #0088cc; margin: 0px; padding: 0px; text-decoration: none;">Archive Team</a> about how to preserve Gitorious.org and its data for the future. A member of the Archive Team graciously offered to host gitorious.org as a read-only archive on Gitorious.org and GitLab agreed to allow to use the Gitorious.org domain name for this.</i></div>
As of today, at least, the above repo link is <i>not</i> working for me. To bypass this issue, I downloaded and installed the following two packages:<br />
<br />
<pre style="background: rgb(253, 254, 251); border-radius: 4px; border: 1px solid rgb(215, 216, 200); font-size: 12px; font-stretch: inherit; line-height: 14px; margin-bottom: 16px; overflow: auto; padding: 6px 12px; vertical-align: baseline; white-space: pre-wrap;"><div style="font-family: Calibri; font-size: 11pt; margin: 0in;">
$ wget <a href="http://launchpadlibrarian.net/188228946/liblmdb0_0.9.14-1_amd64.deb">http://launchpadlibrarian.net/188228946/liblmdb0_0.9.14-1_amd64.deb</a></div>
<div style="font-family: Calibri; font-size: 11pt; margin: 0in;">
$ wget <a href="http://launchpadlibrarian.net/188228947/liblmdb-dev_0.9.14-1_amd64.deb">http://launchpadlibrarian.net/188228947/liblmdb-dev_0.9.14-1_amd64.deb</a></div>
<div style="font-family: Calibri; font-size: 11pt; margin: 0in;">
$ sudo dpkg -i liblmdb0_0.9.14-1_amd64.deb</div>
<div style="font-family: Calibri; font-size: 11pt; margin: 0in;">
$ sudo apt-get install -f</div>
<div style="font-family: Calibri; font-size: 11pt; margin: 0in;">
$ sudo dpkg -i liblmdb-dev_0.9.14-1_amd64.deb</div>
<div style="font-family: Calibri; font-size: 11pt; margin: 0in;">
$ sudo apt-get install -f</div>
</pre>
I hope this information will help any of you trying to install Caffe on Ubuntu 12.04.<br />
<br />
<h3>
Android BitTube</h3>
BitTube is a tiny AOSP (Android Open Source Project) class that I came upon while scouring the SensorService code. It first piqued my interest because of its name, which I really like for some reason (do geeks really need reasons to like class names?). But although the class is small, it felt like there was something interesting going on here, and that it would be worthwhile to do some digging.<br />
<br />
If I had come upon the BitTube class outside of the Android context, it would have been quite unremarkable and forgettable. The BitTube implementation is pretty obvious and straight-forward: it is a "parcel-able" wrapper to a pair of sockets. A <i>socketpair </i>to be exact. And that's the eyebrow-raising tidbit: a socketpair is a Linux/Unix IPC (inter-process communication) mechanism very similar to a pipe. What is a Linux IPC doing at the heart of AOSP when Binder is used almost everywhere else (another outlier is the RIL to rild - radio interface daemon - socket IPC)? <br />
<br />
A socketpair sets up a two-way communication pipe with a socket attached to each end. With file descriptor duplication (dup/dup2), you can pass the socket handle to another process, duplicate it and start communicating. BitTube uses Unix sockets with <a href="http://urchin.earth.li/~twic/Sequenced_Packets_Over_Ordinary_TCP.html">sequenced packets</a> (SOCK_SEQPACKET) which, like datagrams, only deliver whole records, and like SOCK_STREAM, are connection-oriented and guarantee in-order delivery. Although socketpair is a two-way communication pipe, BitTube uses it as a one-way pipe and assigns one end of the pipe to a writer and another to a reader (more on that later on). The send and receive buffers are set to a default limit of 4KB each. There's an interface for writing and reading a sequence of same-size "objects" (sendObjects, recvObjects). <br />
<br />
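If you want to play with the underlying mechanism outside of Android, here is a minimal, self-contained C++ sketch of what BitTube sets up (plain POSIX calls, not the actual BitTube code): an AF_UNIX SOCK_SEQPACKET socketpair with 4KB send/receive buffers, one end used only for writing and the other only for reading:<br />
<pre>#include &lt;sys/socket.h&gt;
#include &lt;unistd.h&gt;
#include &lt;cstdio&gt;

int main() {
    int fds[2];
    // A connected pair of AF_UNIX sequenced-packet sockets.
    // BitTube uses it as a one-way pipe: fds[0] for reading, fds[1] for writing.
    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, fds) != 0) {
        perror("socketpair");
        return 1;
    }

    // BitTube caps each direction at a 4KB default via SO_SNDBUF / SO_RCVBUF.
    const int kBufSize = 4096;
    setsockopt(fds[1], SOL_SOCKET, SO_SNDBUF, &amp;kBufSize, sizeof(kBufSize));
    setsockopt(fds[0], SOL_SOCKET, SO_RCVBUF, &amp;kBufSize, sizeof(kBufSize));

    // Whole records go in, whole records come out (like sendObjects/recvObjects).
    struct Event { int type; float value; } out = { 1, 9.81f }, in = {};
    send(fds[1], &amp;out, sizeof(out), 0);
    ssize_t n = recv(fds[0], &amp;in, sizeof(in), 0);
    std::printf("received %zd bytes, type=%d value=%f\n", n, in.type, in.value);

    close(fds[0]);
    close(fds[1]);
    return 0;
}</pre>
<br />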
A short look around AOSP reveals that BitTube is used by the Display subsystem and by the Sensors subsystem, so let's look at how it is used in the Sensors subsystem. I'll provide a very brief recap of the <a href="http://developer.android.com/reference/android/hardware/SensorManager.html">Sensors Java API </a>to level-set, in case you are not familiar with this.<br />
An application uses the SensorManager system service to access (virtual and physical) device sensors. It registers to receive sensor events via two callbacks, which report an accuracy change or the availability of a sensor reading sample (event).<br />
<br />
public class SensorActivity extends Activity implements SensorEventListener {<br />
private final SensorManager mSensorManager;<br />
private final Sensor mAccelerometer;<br />
<br />
public SensorActivity() {<br />
<span style="color: blue;">mSensorManager = (SensorManager)getSystemService(SENSOR_SERVICE);</span><br />
<span style="color: blue;"> mAccelerometer = mSensorManager.getDefaultSensor(Sensor.TYPE_ACCELEROMETER);</span><br />
}<br />
<br />
protected void onResume() {<br />
super.onResume();<br />
<span style="color: blue;">mSensorManager.registerListener(this, mAccelerometer, SensorManager.SENSOR_DELAY_NORMAL);</span><br />
}<br />
<br />
protected void onPause() {<br />
super.onPause();<br />
mSensorManager.unregisterListener(this);<br />
}<br />
<br />
public void onAccuracyChanged(Sensor sensor, int accuracy) {<br />
}<br />
<br />
public void onSensorChanged(SensorEvent event) {<br />
}<br />
}<br />
<br />
There's a lot of work performed behind the scenes in order to implement the SensorManager.registerListener. First, SensorManager delegates the request to SystemSensorManager which is the real workhorse. I've copy-pasted the Lollipop code after removing some of the less-important, yet distracting code:<br />
<br />
/** @hide */<br />
@Override<br />
protected boolean registerListenerImpl(SensorEventListener listener, Sensor sensor,<br />
int delayUs, Handler handler, int maxBatchReportLatencyUs, int reservedFlags) {<br />
<br />
// Invariants to preserve:<br />
// - one Looper per SensorEventListener<br />
// - one Looper per SensorEventQueue<br />
// We map SensorEventListener to a SensorEventQueue, which holds the looper<br />
synchronized (mSensorListeners) {<br />
SensorEventQueue queue = mSensorListeners.get(listener);<br />
if (queue == null) {<br />
Looper looper = (handler != null) ? handler.getLooper() : mMainLooper;<br />
queue = new SensorEventQueue(listener, looper, this);<br />
if (!queue.addSensor(sensor, delayUs, maxBatchReportLatencyUs, reservedFlags)) {<br />
queue.dispose();<br />
return false;<br />
}<br />
mSensorListeners.put(listener, queue);<br />
return true;<br />
} else {<br />
return queue.addSensor(sensor, delayUs, maxBatchReportLatencyUs, reservedFlags);<br />
}<br />
}<br />
}<br />
<div>
<br /></div>
<div>
As you can see, a SensorEventQueue and Looper are created per registered SensorEventListener.</div>
<div>
The SensorEventQueue is the object which eventually delivers sensor events to the application. This class diagram can give you a high-level grasp of the Java and native class hierarchy.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCzWK5fCS7SHRdDd5GDiXR79G5IICZIxctc35CjYK7pogohF8433OYPWVSoHPImxd_W1IaVEGhyguEp7POvXG1mNJBQ-_RoLCZlQ7brsiMfGOLlYbZGwTY08YNhr-BkOtwnxx-lD9JriwN/s1600/sensors-class-diagram.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="473" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCzWK5fCS7SHRdDd5GDiXR79G5IICZIxctc35CjYK7pogohF8433OYPWVSoHPImxd_W1IaVEGhyguEp7POvXG1mNJBQ-_RoLCZlQ7brsiMfGOLlYbZGwTY08YNhr-BkOtwnxx-lD9JriwN/s640/sensors-class-diagram.png" width="640" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
Because this blog entry is about BitTube and not about the Sensor subsystem, I'll jump over many details: eventually a native SensorEventQueue is created.</div>
<div>
The native SensorEventQueue uses a SensorEventConnection to bridge the process address-space gap and communicate with the native SensorService. The BnSensorEventConnection (this is the server-side of the IPC) creates a BitTube and with it a socketpair. One of the socket handles is dup'ed ('dup' system call) by the BpSensorEventConnection and, voila: we have a communication pipe between the two processes, as depicted below.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhsmuNHARV6wkFwvd-yYSGjzUifCdSlD3asFYU7usCHvrEQGtGLGedkA0kY-AKvbYDvRW9pA8lw_RH9dGh1d3BzRgIa5D4hASMux7nTR5r431Zur6eH5iDHTA3fI7r-m1tNfAWxnHylpDnI/s1600/sensors-deployment.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="548" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhsmuNHARV6wkFwvd-yYSGjzUifCdSlD3asFYU7usCHvrEQGtGLGedkA0kY-AKvbYDvRW9pA8lw_RH9dGh1d3BzRgIa5D4hASMux7nTR5r431Zur6eH5iDHTA3fI7r-m1tNfAWxnHylpDnI/s640/sensors-deployment.png" width="640" /></a></div>
<br />
<br />
As I mentioned above, the BitTube is used as a one-way pipe: events are written on one side and read by the SensorEventQueue, on the other side. <br />
<br />
During the construction of the socketpair 2 x 4KB (default size) buffers are allocated by the kernel (one for the send-side buffer and the other for the receive-side buffer) using the SO_SNDBUF and SO_RCVBUF socket options. Remember that this is done per SensorEventListener. And there's also a Looper thread per SensorEventListener. Quite a lot of overhead.<br />
So the question remains: what's gained by using this "new" IPC? At first I thought that this was some legacy design from the early days of Android, or perhaps from some module that was integrated some time ago into the code-base. But this wouldn't explain why BitTube is also used by DisplayEventReceiver for what looks like a similar setup. <br />
Maybe it provides extra low latency? BitTube can deliver several events in one write/read, but that can also be done with Binder without introducing any complications. They both incur about the same number of context switches, buffer copies, and system calls.<br />
Is simplicity the motivation? No, BitTube is about as complex as using Binder.<br />
This leaves me with throughput as the only other reason I can think of. But sensors are <a href="https://source.android.com/devices/sensors/index.html">defined </a>as low bandwidth components:<br />
<br />
<i>Not included in the list of physical devices providing data are camera, fingerprint sensor, microphone, and touch screen. These devices have their own reporting mechanism; the separation is arbitrary, but in general, Android sensors provide lower bandwidth data. For example, “100hz x 3 channels” for an accelerometer versus “25hz x 8 MP x 3 channels” for a camera or “44kHz x 1 channel” for a microphone</i>.<br />
<br />
For me, the mystery remains. If you have some thoughts on this, please comment - I'd love to learn.<br />
In any case, BitTube provides another tool in our AOSP tool chest - although I'm hesitant about using it until I understand what extra powers it gives me :-)
<h3>
OpenVX for Android - ovx4android</h3>
OpenVX is a new Khronos specification for an API for hardware-accelerated computer vision. <br />
The <a href="https://www.khronos.org/openvx/">Khronos OpenVX homepage</a> describes it such:<br />
<br />
<span style="background-color: white; color: #222222; font-family: "lucida grande" , "arial" , "helvetica" , sans-serif; font-size: 12px; line-height: 19.2000007629395px;"><i>OpenVX is an open, royalty-free standard for cross platform acceleration of computer vision applications. OpenVX enables performance and power-optimized computer vision processing, especially important in embedded and real-time uses cases such as face, body and gesture tracking, smart video surveillance, advanced driver assistance systems (ADAS), object and scene reconstruction, augmented reality, visual inspection, robotics and more.</i></span><br />
<br />
The OpenVX specification and sample code for download are available <a href="https://www.khronos.org/registry/vx/">here</a>.<br />
Unfortunately, Khronos didn't bother testing their release on Android and it doesn't even compile. I went ahead and made the necessary changes to compile the code with NDK (I tested with NDKr9d) and I've made it available on <a href="https://github.com/netaz/ovx4android">github</a>.<br />
<br />
In a future post I'll describe how to integrate this with android.hardware.camera2.<br />
I hope you enjoy experimenting with this. <br />
Neta
<h3>
Android's Graphics Buffer Management System (Part II: BufferQueue)</h3>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
In the <a href="http://netaz.blogspot.co.il/2015/03/androids-graphics-buffer-management.html">first post on Android's graphics buffer management</a>, I discussed gralloc, which is Android's graphics buffer allocation HAL. In this post I'll describe graphics buffers flows in Android, with special attention to class BufferQueue which plays a central role in graphics buffer management.<br />
<br />
<h4>
Introduction</h4>
Before I dive in, I want to discuss buffers in general. There is a surprising number of details and aspects involved in designing buffer systems and I think it is best to examine what was done in Android once we've assumed a wide and generic perspective.<br />
Data buffers, and specifically image and graphics data buffers, exist as part of a specific subsystem, such as the camera subsystem, but can also span multiple subsystems, such as buffers shared between the camera and video subsystems. Buffers provide a means to temporarily store data to allow us to separate the production of data from the consumption of data - in both time and space. That is, we can produce (or collect) data at one moment, and use it at a different moment. This decouples producer and consumer, and also allows producer and consumer to be asynchronous to one another. Many times in an event-based system the data producer and the data consumer are triggered (clocked) by different time sources. For example, the camera on your mobile phone produces image frames at some arbitrary frame-rate (e.g. 30 frames per second, or FPS) while the display panel (showing the preview) can operate at a different refresh-rate (e.g. 60 Hz). Moreover, even if the devices were guaranteed to operate at the same frequency (or if one frequency is a harmonic of the other), they are unlikely to have the same phase offset since the display operation starts when we turn on the screen, while the camera operation starts at some other arbitrary time when we start the camera application. And of course there is drift and jitter that contribute to the asynchronous nature of the two subsystems. There may also be several consumers, or several producers. SurfaceFlinger, for example, uses buffers from multiple sources and composes them into a single output buffer.<br />
<br />
Buffers also allow us to move data from one part of our system to another. Inevitably, buffers follow some paths within our system and these are commonly referred to as the "data paths". A path can start at a buffer provider which allocates new memory or provides a buffer from a pre-allocated pool. The buffers are considered empty at this stage. That is, they do not contain consumable data or metadata. A source entity provides the initial data by attaching it to a buffer (reference holding buffers) or copying the data to the buffer's memory. Somehow, a buffer makes its way along a path of buffer handlers until it arrives at the content consumer which uses the data and discards the buffer. A buffer handler may be passive (e.g. monitor or logger), or it may be active: filtering (drop), altering, augmenting, extracting, or otherwise manipulating the contents. These paths can be either dynamic or static. There are many design patterns which define how a data path is defined and controlled (pipes and filters, layering, pipeline, software bus messaging, direct addressing, broadcasting, observing, and so forth) and I will not cover them here as that would really be diverging from our topic.<br />
<br />
<div>
Buffer systems are either closed-loop or open-loop. In closed-loop paths there is a buffer path from the consumer back to the producer. Sometimes this is made explicit, and sometimes implicit. For example, if the producer and consumer use a shared memory pool they implicitly form a closed-loop. One can argue that using a shared buffer pool is not really a closed-loop, but I contend that as long as the system is designed using explicit knowledge of shared buffer memory, then it is closed. That is, if the consumer can starve or delay the producer because it controls the flow of buffers available to the producer, then this is a closed-loop system.
<br />
Ah, and there is the question of what we mean by buffer. A lot of the time, when people say "buffer" they are referring to the actual backend memory storing the content, but in real systems it is quite rare to see raw data moving around the system. It is much more common to see buffer objects which contain metadata describing the data content. What is contained in this metadata is implementation-specific and depends on the problem domain and context, but I'm sure we can agree that one piece of information we need to know is the amount of data stored in the buffer. And there is the question of pointer-to-data (by reference) vs embedded data (by value). Obviously zero-copy buffer handling is preferred, but requires us to be exact about buffer memory lifetime management. Lifetime management, access management and synchronization are other related aspects which I've discussed in the <a href="http://netaz.blogspot.co.il/2015/03/androids-graphics-buffer-management.html">previous post</a> so I'll cut things short right here. </div>
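<div class="MsoNormal">
To make the "buffer object vs. raw memory" distinction concrete, here is a toy C++ illustration (not an Android class): the metadata travels by value, while the payload is only referenced, so handing the object around is cheap and zero-copy.</div>
<pre>#include &lt;cstddef&gt;
#include &lt;cstdint&gt;
#include &lt;memory&gt;
#include &lt;vector&gt;

// Toy buffer object: metadata lives in the object, the pixel data is only
// referenced, so copying Buffer shares the backing memory instead of the payload.
struct Buffer {
    std::shared_ptr&lt;std::vector&lt;uint8_t&gt;&gt; data;  // backing memory (by reference)
    size_t valid_bytes = 0;                      // how much of it holds real content
    uint32_t width = 0, height = 0, format = 0;  // image geometry and pixel format
    int64_t timestamp_ns = 0;                    // when the content was produced
};</pre>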
<h4>
BufferQueue</h4>
After this generic discussion of data buffers, we can finally dive into the Android details. I'll start with class BufferQueue because it is at the center of graphics buffer movement in Android. It abstracts a queue of graphics buffers, uses gralloc to allocate buffers, and has means to connect buffer producers and consumers which reside in different process address spaces.<br />
Code for class BufferQueue and many of the cooperating classes that I'll be discussing can be found in the directory /frameworks/native/libs/gui/, with the header files in /frameworks/native/include/gui.
<br />
<br />
<div class="MsoNormal">
Class BufferQueue has a static factory method,
BufferQueue::createBufferQueue, which is used to create BufferQueue instances.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
// BufferQueue
manages a pool of gralloc memory slots to be used by<o:p></o:p></div>
<div class="MsoNormal">
// producers and
consumers. allocator is used to allocate all the<o:p></o:p></div>
<div class="MsoNormal">
// needed gralloc
buffers.<o:p></o:p></div>
<div class="MsoNormal">
static void
createBufferQueue(sp<igraphicbufferproducer>* outProducer,<o:p></o:p></igraphicbufferproducer></div>
<div class="MsoNormal">
sp<igraphicbufferconsumer>* outConsumer,<o:p></o:p></igraphicbufferconsumer></div>
<div class="MsoNormal">
const
sp<igraphicbufferalloc>& allocator = NULL);<o:p></o:p></igraphicbufferalloc></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
A quick glance at the implementation reveals that class
BufferQueue is only a thin facade to class BufferQueueCore, which contains the
actual implementation logic. For
simplicity of this discussion, I will not make a distinction between these classes.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Working with BufferQueue is pretty straight-forward. First, producers and consumers connect to the
BufferQueue.<o:p></o:p></div>
<div class="MsoNormal">
1. The producer takes an “empty” buffer from the BufferQueue
(dequeueBuffer)<o:p></o:p></div>
<div class="MsoNormal">
2. The producer (e.g. camera) copies image or graphics data
into the buffer<o:p></o:p></div>
<div class="MsoNormal">
3. The producer returns the “filled” buffer to the
BufferQueue (queueBuffer)<o:p></o:p></div>
<div class="MsoNormal">
4. The consumer receives an indication (via callback) of the
presence of a “filled” buffer<o:p></o:p></div>
<div class="MsoNormal">
5. The consumer removes this buffer from the BufferQueue
(acquireBuffer)<o:p></o:p></div>
<div class="MsoNormal">
6. When the consumer is done consuming the buffer is
returned to the BufferQueue (releaseBuffer)<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
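<div class="MsoNormal">
The toy C++ sketch below (deliberately not the real AOSP classes or their signatures) captures this dequeue/queue/acquire/release cycle with the gralloc allocation and Binder plumbing stripped away; it only shows the closed-loop slot state machine.</div>
<pre>#include &lt;cstdio&gt;
#include &lt;mutex&gt;
#include &lt;queue&gt;
#include &lt;vector&gt;

// Minimal stand-in for a BufferQueue-like object: a fixed pool of "slots"
// cycling through FREE -&gt; DEQUEUED -&gt; QUEUED -&gt; ACQUIRED -&gt; FREE.
class ToyBufferQueue {
public:
    explicit ToyBufferQueue(size_t slots) : buffers_(slots) {
        for (size_t i = 0; i &lt; slots; ++i) free_.push(i);
    }
    // Producer side.
    bool dequeueBuffer(size_t* slot) {              // take an empty buffer
        std::lock_guard&lt;std::mutex&gt; lk(m_);
        if (free_.empty()) return false;            // producer must wait
        *slot = free_.front(); free_.pop();
        return true;
    }
    void queueBuffer(size_t slot, int frame) {      // hand a filled buffer to the consumer
        std::lock_guard&lt;std::mutex&gt; lk(m_);
        buffers_[slot] = frame;
        queued_.push(slot);
    }
    // Consumer side.
    bool acquireBuffer(size_t* slot, int* frame) {  // take a filled buffer
        std::lock_guard&lt;std::mutex&gt; lk(m_);
        if (queued_.empty()) return false;
        *slot = queued_.front(); queued_.pop();
        *frame = buffers_[*slot];
        return true;
    }
    void releaseBuffer(size_t slot) {               // return the slot to the pool
        std::lock_guard&lt;std::mutex&gt; lk(m_);
        free_.push(slot);
    }
private:
    std::mutex m_;
    std::vector&lt;int&gt; buffers_;       // stand-in for GraphicBuffer contents
    std::queue&lt;size_t&gt; free_, queued_;
};

int main() {
    ToyBufferQueue bq(2);            // closed loop: only 2 buffers circulate
    for (int frame = 0; frame &lt; 5; ++frame) {
        size_t slot;
        if (bq.dequeueBuffer(&amp;slot)) {            // producer (e.g. camera)
            bq.queueBuffer(slot, frame);
        }
        int contents;
        if (bq.acquireBuffer(&amp;slot, &amp;contents)) { // consumer (e.g. display)
            std::printf("consumed frame %d from slot %zu\n", contents, slot);
            bq.releaseBuffer(slot);
        }
    }
    return 0;
}</pre>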
<div class="MsoNormal">
The following diagram shows a simplified interaction
between the camera (image buffer producer) and the display (image buffer
consumer). <o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfJV_ncfBbsgSbt-vHGKmKCyVngSWbYePibHufWWL-P4QmPJj6ye6mOiYw0vUWxleL67twWuC9BlpYDz1U8q_a9GaCWCgNCfhSod1w8K8KP8pcmfyTzpm1sgr0Qgwbx9IN3UNb_37lfj6O/s1600/gfx-sample_flow.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="241" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfJV_ncfBbsgSbt-vHGKmKCyVngSWbYePibHufWWL-P4QmPJj6ye6mOiYw0vUWxleL67twWuC9BlpYDz1U8q_a9GaCWCgNCfhSod1w8K8KP8pcmfyTzpm1sgr0Qgwbx9IN3UNb_37lfj6O/s1600/gfx-sample_flow.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Figure 1: Simplified data path between the camera subsystem and the GPU</td></tr>
</tbody></table>
<div class="MsoNormal">
Producers and Consumers may reside in different processes
and this is accomplished using Binder, as always.</div>
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
BufferQueueProducer is the workhorse behind
IGraphicBufferProducer.
BufferQueueProducer maintains an intimate relationship with
BufferQueueCore and directly accesses its member variables, including mutexes,
conditions and other significant members (such as its pointer to
IGraphicBufferAlloc). Personally, I
don't like this - it is confusing and fragile. </div>
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
When a Producer is requested to provide an empty buffer
using dequeueBuffer, it tries to fetch one from BufferQueueCore which maintains
an array of buffers and their states (DEQUEUED, QUEUED, ACQUIRED, FREE). If a free slot is found in the buffer array
but it doesn’t contain a buffer, or if the Producer was explicitly asked to
reallocate the buffer, then BufferQueueProducer uses BufferQueueCore’s
IGraphicBufferAlloc to allocate a new buffer. </div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjHrswrtFCjVrrYR_lHZqPyc7KMIf3C-ll8uhScCacKhoNx610Gl3t_hsLqs6y96x6elaU5OgsAHydMM5ofDEgwAG4kCjhMb1upYMyZ8hp64rxTGCRu-EqWXYbGnWGCvRmYgsV23maJzAIi/s1600/gfx-allocator_flow.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="155" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjHrswrtFCjVrrYR_lHZqPyc7KMIf3C-ll8uhScCacKhoNx610Gl3t_hsLqs6y96x6elaU5OgsAHydMM5ofDEgwAG4kCjhMb1upYMyZ8hp64rxTGCRu-EqWXYbGnWGCvRmYgsV23maJzAIi/s1600/gfx-allocator_flow.png" width="640" /></a></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Initially, all
invocations of dequeueBuffer result in the allocation of new buffers. But because this is a closed-loop system,
where the buffer Consumer returns buffers once it has consumed their contents
(by calling releaseBuffer), we should see the system reaching equilibrium after
a very short while. Be aware that
although BufferQueueCore can maintain an array of variable-sized GraphicBuffer objects,
it is wise to make all buffers of the same size. Otherwise, each invocation of dequeueBuffer
may require the allocation of a new GraphicBuffer instance.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3YhekwKPndGFrMRUSvh38eXhGdLsNXZ7sVBM121xZZOETnMHu47DQ43O0hN6Y9wbJQLVCxxa5TyG79xo5_86-kBW0jIkkiKmk7YILQc52hsyEtlWoH555sZpM45cWZODJex6oH3fumnoX/s1600/gfx-buffer_queue_classes.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="530" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3YhekwKPndGFrMRUSvh38eXhGdLsNXZ7sVBM121xZZOETnMHu47DQ43O0hN6Y9wbJQLVCxxa5TyG79xo5_86-kBW0jIkkiKmk7YILQc52hsyEtlWoH555sZpM45cWZODJex6oH3fumnoX/s1600/gfx-buffer_queue_classes.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Figure 2: The main classes related to BufferQueue</td></tr>
</tbody></table>
<div class="MsoNormal">
<o:p> </o:p>The GraphicBuffer allocation is performed using an
implementation of IGraphicBufferAlloc which is provided to BufferQueueCore when
it is constructed. The default
implementation of IGraphicBufferAlloc is provided by SurfaceFlinger (the system
object in charge of composing all surfaces) and uses gralloc to allocate
buffers. In the previous post I
discussed why a central graphics buffers allocator is well-advised when dealing
with various hardware SoC modules.</div>
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
Class BufferQueueCore doesn’t directly store GraphicBuffer objects –
it uses class BufferItem, which contains a pointer to a GraphicBuffer instance along with
various other metadata (see frameworks/native/include/gui/BufferItem.h).<o:p></o:p></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg93F_2y9JNbXWCqgDpbtzMO4RYfQcdTvYc6dqmW4r2vS4Hypn2_nV-oq3qTBx09hnQNEsWI_6xDOIdlPE5YayBPRlI4shkq45dcx0OCovWsj964dLfuENf-uPpm0IHeRV0y8a_WJrxGqwD/s1600/gfx-allocator_classes.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg93F_2y9JNbXWCqgDpbtzMO4RYfQcdTvYc6dqmW4r2vS4Hypn2_nV-oq3qTBx09hnQNEsWI_6xDOIdlPE5YayBPRlI4shkq45dcx0OCovWsj964dLfuENf-uPpm0IHeRV0y8a_WJrxGqwD/s1600/gfx-allocator_classes.png" width="306" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Figure 3: Class diagram showing the main classes related to graphics buffer allocation</td></tr>
</tbody></table>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Asynchronous notification interfaces IConsumerListener and
IProducerListener are used to alert listeners about events such as a buffer
being ready for consumption (IConsumerListener::onFrameAvailable); or the availability
of an empty buffer (IProducerListener::onBufferReleased). These callback interfaces also use Binder and
can cross process boundaries. Further
details can be found in frameworks/native/include/gui/IConsumerListener.h.</div>
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
</div>
<div class="MsoNormal">
The best source of information I found on Android’s graphics
system, aside from the code itself of course, is <a href="https://source.android.com/devices/graphics/architecture.html">here</a>.<o:p></o:p></div>
<br />
<h4>
Consumers</h4>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGASCOQRmRujkmyOoLQ1Rsbvm8YqBFPuSauAE6uSVTRynKxJfapPMDHQjnyW9qRQ8tWS22g9VvtZ0wa2GZo0oJmfZjjho-IYyPW4D25XfdydnZ4OMXDkrPTDMNsfO8yScqYtFEaxiDQOuF/s1600/gfx-consumer_classes.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="305" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGASCOQRmRujkmyOoLQ1Rsbvm8YqBFPuSauAE6uSVTRynKxJfapPMDHQjnyW9qRQ8tWS22g9VvtZ0wa2GZo0oJmfZjjho-IYyPW4D25XfdydnZ4OMXDkrPTDMNsfO8yScqYtFEaxiDQOuF/s1600/gfx-consumer_classes.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Figure: Some consumer classes</td></tr>
</tbody></table>
<h4>
BufferQueue Creation</h4>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhROrNkwn_DEb2SKKatLWiU-9i5Ugb6jhDZJlsq6ulmq8jSGv_NoN_e-Tlwo1ehLQr-EKveCTarefG8LvhRBKdKVyxbAiiX2zH2Lcj0UJrS2hGQcocPoNFMRP4xsyRzb5XVcpuvo4Uzwj0S/s1600/gfx-bq_creation_flow.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhROrNkwn_DEb2SKKatLWiU-9i5Ugb6jhDZJlsq6ulmq8jSGv_NoN_e-Tlwo1ehLQr-EKveCTarefG8LvhRBKdKVyxbAiiX2zH2Lcj0UJrS2hGQcocPoNFMRP4xsyRzb5XVcpuvo4Uzwj0S/s1600/gfx-bq_creation_flow.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Figure: Top to bottom BufferQueue creation flow</td></tr>
</tbody></table>
<div>
<br /></div>
<br />
<br />netazhttp://www.blogger.com/profile/13820189991503080577noreply@blogger.com8tag:blogger.com,1999:blog-182549309027052933.post-10203103165201164192015-03-21T02:43:00.001+02:002015-05-14T12:20:46.426+03:00Android's Graphics Buffer Management System (Part I: gralloc)<a href="http://www.codeproject.com" rel="tag" style="display:none">CodeProject</a>
In this post series I'll do a deep dive into Android's graphics buffer management system. I'll cover how buffers produced by the camera use the generic BufferQueue abstraction to flow to different parts of the system, how buffers are shared between different hardware modules, and how they traverse process boundaries.<br />
But I will start at buffer allocation, and before I describe what triggers buffer allocation and when, let's look at the low-level graphics buffer allocator, a.k.a. gralloc.<br />
<br />
<h2>
gralloc: Buffer Allocation</h2>
The gralloc is part of the HAL (Hardware Abstraction Layer) which means that the implementation is platform-specific. You can find the interface definitions in hardware/libhardware/include/hardware/gralloc.h. As expected from a HAL component, the interface is divided into a <i>module</i> interface (<i>gralloc_module_t</i>) and a <i> device</i> interface (<i>alloc_device_t</i>). Loading the gralloc module is performed as for all HAL modules, so I won't go into these details because they can be easily googled. But I will mention that the entry point into a newly loaded HAL module is via the <i>open</i> method of the structure <i>hw_module_methods </i>which is referenced by the structure <i>hw_module_t.</i> Structure <i>hw_module_t</i> acts as a mandatory "base class" (not quite since this is "C" code) of all HAL modules including <i>gralloc_module_t.</i><br />
Both the module and the device interfaces are versioned. The current module version is 0.3 and the device version is 0.1. Only Google knows why these interfaces have these sub-1.0 interface versions. :-)<br />
<br />
As I said above, gralloc implementations are platform-specific and for reference you can look at the goldfish device's implementation (device/generic/goldfish/opengl/system/gralloc/gralloc.c). Goldfish is the code name for the Android emulation platform device. <br />
The sole responsibility of the <i>device</i> (<i>alloc_device_t</i>) is the allocation (and subsequent release) of buffer memory, so it has a straightforward signature:<br />
<br />
typedef struct alloc_device_t {<br />
struct hw_device_t common;<br />
<br />
/*<br />
* (*alloc)() Allocates a buffer in graphic memory with the requested<br />
* parameters and returns a buffer_handle_t and the stride in pixels to<br />
* allow the implementation to satisfy hardware constraints on the width<br />
* of a pixmap (eg: it may have to be multiple of 8 pixels).<br />
* The CALLER TAKES OWNERSHIP of the buffer_handle_t.<br />
*<br />
* If format is HAL_PIXEL_FORMAT_YCbCr_420_888, the returned stride must be<br />
* 0, since the actual strides are available from the android_ycbcr<br />
* structure.<br />
*<br />
* Returns 0 on success or -errno on error.<br />
*/<br />
<br />
int (*alloc)(struct alloc_device_t* dev,<br />
int w, int h, int format, int usage,<br />
buffer_handle_t* handle, int* stride);<br />
/*<br />
* (*free)() Frees a previously allocated buffer.<br />
* Behavior is undefined if the buffer is still mapped in any process,<br />
* but shall not result in termination of the program or security breaches<br />
* (allowing a process to get access to another process' buffers).<br />
* THIS FUNCTION TAKES OWNERSHIP of the buffer_handle_t which becomes<br />
* invalid after the call.<br />
*<br />
* Returns 0 on success or -errno on error.<br />
*/<br />
<br />
int (*free)(struct alloc_device_t* dev,<br />
buffer_handle_t handle);<br />
<br />
/* This hook is OPTIONAL.<br />
*<br />
* If non NULL it will be caused by SurfaceFlinger on dumpsys<br />
*/<br />
void (*dump)(struct alloc_device_t *dev, char *buff, int buff_len);<br />
void* reserved_proc[7];<br />
} alloc_device_t;<br />
<br />
Let's examine the parameters of the alloc() function. The first parameter (dev) is of course the <b>instance handle</b>.<br />
<br />
The next two parameters (w, h) provide the requested <b>width and height</b> of the buffer. When describing the dimensions of a graphics buffer there are two points to watch for. First, we need to understand the units of the dimensions. If the dimensions are expressed in pixels, as is the case for gralloc, then we need to understand how to translate pixels to bits. And for this we need to know the color encoding format. <br />
<br />
The requested<b> color format</b> is the fourth parameter. The color formats that Android supports are defined in &lt;android&gt;/system/core/include/system/graphics.h. Color format HAL_PIXEL_FORMAT_RGBA_8888 uses 32 bits for each pixel (8 bits for each of the pixel components: red, green, blue and alpha-blending), while HAL_PIXEL_FORMAT_RGB_565 uses 16 bits for each pixel (5 bits for red and blue, and 6 bits for green).<br />
<br />
The second important factor affecting the physical dimensions of the graphics buffer is its stride. <b>Stride </b>is the last parameter to alloc and it is also an <i>out</i> parameter. To understand stride (a.k.a. pitch), it is easiest to refer to a diagram:<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbVMGi8uc61tNWKzJfM1N8dlhkPVm-pmIaACq5NcbEhYlzk8Pc0u74STXn0rDhJV30-I2j7HXQYnucIELA4SVbckcN12EjwAr1MexBC3T-r_vrCjOPwU8tNI_ZLjCTkYxPPfvzfyE8t5hr/s1600/stride.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbVMGi8uc61tNWKzJfM1N8dlhkPVm-pmIaACq5NcbEhYlzk8Pc0u74STXn0rDhJV30-I2j7HXQYnucIELA4SVbckcN12EjwAr1MexBC3T-r_vrCjOPwU8tNI_ZLjCTkYxPPfvzfyE8t5hr/s1600/stride.png" height="255" width="400" /></a></div>
<br />
<br />
We can think of memory buffers as matrices arranged in rows and columns of pixels. A row is usually referred to as a <i>line</i>. Stride is defined as the number of pixels (or bytes, depending on your units!) from the beginning of one buffer line to the beginning of the next. As the diagram above shows, the stride is necessarily at least equal to the width of the buffer, but can very well be larger. The difference between the stride and the width (stride-width) is just wasted memory, and one takeaway from this is that the memory used to store an image may not be contiguous. So where does the stride come from? Due to hardware implementation complexity, memory bandwidth optimizations, and other constraints, the hardware accessing the graphics memory may require each line to be a multiple of some number of bytes. For example, if for a particular hardware module the line addresses need to align to 64 bytes, then line lengths need to be multiples of 64 bytes. If this constraint results in longer lines than requested, then the buffer stride is different from the width. Another motivation for stride is buffer reuse: imagine that you want to refer to a cropped image within another image. In this case, the cropped (internal) image has a stride different from its width.<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiynA2qMImIhmqy4ezcq4rcuHqYThByeot5zQz1Ji2WesONKbcY_B-WZmN6Mjb4qMHzheubxHkwznRzSgmdyZYyZbQTdZkuuGb1QihOLByFgGOD1pQfjSrnVsRYWe5IKfxHgHyYYBh2TU2P/s1600/cropped-stride.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiynA2qMImIhmqy4ezcq4rcuHqYThByeot5zQz1Ji2WesONKbcY_B-WZmN6Mjb4qMHzheubxHkwznRzSgmdyZYyZbQTdZkuuGb1QihOLByFgGOD1pQfjSrnVsRYWe5IKfxHgHyYYBh2TU2P/s1600/cropped-stride.png" height="230" width="400" /></a></div>
<br />
<br />
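To see how the pixel format and an alignment constraint combine into the stride and the final buffer size, here is a small self-contained sketch. The 64-byte line alignment is just the hypothetical constraint from the example above, not a value mandated by any particular SoC.<br />
<br />
#include &lt;cstdint&gt;<br />
#include &lt;cstdio&gt;<br />
<br />
// Round a byte count up to the next multiple of 'alignment' (a power of two).<br />
constexpr uint32_t alignUp(uint32_t value, uint32_t alignment) {<br />
  return (value + alignment - 1) &amp; ~(alignment - 1);<br />
}<br />
<br />
int main() {<br />
  const uint32_t bppRGBA8888 = 4;  // 8 bits x 4 channels<br />
  const uint32_t width = 1080, height = 1920;  // requested size, in pixels<br />
  const uint32_t rowBytes = width * bppRGBA8888;  // 4320 bytes<br />
  const uint32_t strideBytes = alignUp(rowBytes, 64);  // 4352 bytes<br />
  const uint32_t stridePixels = strideBytes / bppRGBA8888;  // 1088: what alloc() would report<br />
  std::printf("stride = %u px, buffer = %u bytes, wasted = %u bytes\n",<br />
              stridePixels, strideBytes * height, (strideBytes - rowBytes) * height);<br />
  return 0;<br />
}<br />
<br />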
Allocated buffer memory can be written to, or read from, by user-space code of course, but first and foremost it is written to, or read from, by different hardware modules such as the GPU (graphics processing unit), camera, composition engine, DMA engine, display controller, etc. On a typical SoC these hardware modules come from different vendors and have different constraints on the buffer memory which all need to be reconciled if they are to share buffers. For example, a buffer written by the GPU should be readable by the display controller. The different constraints on the buffers are not necessarily the result of heterogeneous component vendors, but also because of different optimization points. In any case, gralloc needs to ensure that the image format and memory layout is agreeable to both image producer and consumer. This is where the usage parameter comes into play.<br />
<br />
The <b>usage </b>flags are defined in file gralloc.h. The first four least significant bits (bits 0-3) describe how the software reads the buffer (never, rarely, often); and the next four bits (bits 4-7) describe how the software writes the buffer (never, rarely, often). The next twelve bits describe how the hardware uses the buffer: as an OpenGL ES texture or OpenGL ES render target; by the 2D hardware blitter, HWComposer, framebuffer device, or HW video encoder; written or read by the HW camera pipeline; used as part of zero-shutter-lag camera queue; used as a RenderScript Allocation; displayed full-screen on an external display; or used as a cursor.<br />
Obviously there may be some coupling between the color format and the usage flag. For example, if the usage parameter indicates that the buffer is written by the camera and read by the video encoder, then the format must be agreeable to both HW modules.<br />
If software needs to access the buffer contents, either for read or write, then gralloc needs to make sure that there is a mapping from the physical address space to the CPU's virtual address space and that the cache is kept coherent.<br />
For a sample implementation, you can examine the goldfish device implementation at &lt;android&gt;/device/generic/goldfish/opengl/system/gralloc/gralloc.cpp.<br />
<br />
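Putting the pieces together, native on-device code typically opens the gralloc HAL and allocates a buffer roughly as sketched below. This is only an illustration (it builds only inside an Android source tree), but the module id, the gralloc_open() helper, the usage flags and the alloc() hook are the ones declared in hardware.h and in the gralloc.h excerpt above.<br />
<br />
#include &lt;hardware/gralloc.h&gt;<br />
#include &lt;hardware/hardware.h&gt;<br />
<br />
// Sketch: allocate one RGBA_8888 buffer that the CPU writes and the GPU samples as a texture.<br />
int allocate_example(buffer_handle_t* outHandle, int* outStride) {<br />
  const hw_module_t* module = nullptr;<br />
  alloc_device_t* dev = nullptr;<br />
  if (hw_get_module(GRALLOC_HARDWARE_MODULE_ID, &amp;module) != 0) return -1;<br />
  if (gralloc_open(module, &amp;dev) != 0) return -1;  // opens the alloc_device_t<br />
  const int usage = GRALLOC_USAGE_SW_WRITE_OFTEN | GRALLOC_USAGE_HW_TEXTURE;<br />
  int err = dev-&gt;alloc(dev, 1080, 1920, HAL_PIXEL_FORMAT_RGBA_8888, usage,<br />
                       outHandle, outStride);  // the stride is returned in pixels<br />
  // ... later: dev-&gt;free(dev, *outHandle); gralloc_close(dev);<br />
  return err;<br />
}<br />
<br />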
<h3>
Other factors affecting buffer memory</h3>
There are other factors affecting how graphic and image memory is allocated and how images are stored (memory layout) and accessed which we should briefly review:<br />
<b>Alignment</b><br />
Once again, different hardware may impose hard or soft memory alignment requirements. Not complying with a hard requirement will result in the failure of the hardware to perform its function, while not complying with a soft requirement will result in sub-optimal use of the hardware (usually expressed in power, thermal and performance costs).<br />
<b><br /></b>
<b>Color Space, Formats and Memory Layout</b><br />
There are several <a href="http://en.wikipedia.org/wiki/Color_space">color spaces</a> of which the most familiar ones are YCbCr (images) and RGB (graphics). Within each color space information may be encoded differently. Some sample RGB encodings include RGB565 (16 bits; 5 bits for red and blue and 6 bits for green), RGB888 (24 bits) or ARGB8888 (32 bits; with the alpha blending channel). YCbCr encoding formats usually employ <a href="http://en.wikipedia.org/wiki/Chroma_subsampling">chroma subsampling</a>.<br />
Because our eyes are less sensitive to color than to gray levels, the chroma channels can have a lower sampling rate compared to the luma channel with little loss of perceptual quality. The subsampling scheme used does not necessarily dictate the memory layout. For example, for 4:2:0 subsampling formats NV12 and YV12 there are two very different memory layouts, as depicted in the diagram below.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbkfGcBwXzv-GiCA-JPtwXlOGFYg8zLAuJ9h9mlFrY0CHJel4VclK37j3qKLH99qftxVvP8UrRBmJfCeOctuS64-bFUw4L7pN4LkDcoddbo37J7dhOiPhJQvo6btCj7GZIJdYVuDKaQS-8/s1600/YV12-stride.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbkfGcBwXzv-GiCA-JPtwXlOGFYg8zLAuJ9h9mlFrY0CHJel4VclK37j3qKLH99qftxVvP8UrRBmJfCeOctuS64-bFUw4L7pN4LkDcoddbo37J7dhOiPhJQvo6btCj7GZIJdYVuDKaQS-8/s1600/YV12-stride.png" height="400" width="390" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">YV12 color - format memory layout (planar)</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjrg9eJI8B8c4BQ4mP73YZWVxx2IDf9p-L06oHPuu0RwGarsXsrL9v43U0_2q_rk1-oGP6cbedE8bkiylQDpBexGx-3tQaUM-T4Vf1AV6SRQHLZ4Y3u5ylQg5cTNdwncPSzWzFkzEMM2dHs/s1600/NV12-stride.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjrg9eJI8B8c4BQ4mP73YZWVxx2IDf9p-L06oHPuu0RwGarsXsrL9v43U0_2q_rk1-oGP6cbedE8bkiylQDpBexGx-3tQaUM-T4Vf1AV6SRQHLZ4Y3u5ylQg5cTNdwncPSzWzFkzEMM2dHs/s1600/NV12-stride.png" height="370" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8000001907349px;">NV12 color - format memory layout (packed)</span></td></tr>
</tbody></table>
There are two YUV layout families: semi-planar (sometimes loosely called packed) formats and planar formats. NV12 is an example of a semi-planar format, and YV12 is an example of a planar format. In a semi-planar format the Y samples are stored in one plane, and the U and V samples are interleaved together in a second plane. In a planar format, the Y, U, and V components are stored as three separate planes.<br />
In the YV12 diagram above the Y (luma) plane has a size equal to width * height, and each of the chroma planes (U, V) has a size equal to width/2 * height/2. This means that both width and height must be even integers. <a href="http://www.fourcc.org/yuv.php#YV12">YV12</a> also <a href="http://developer.android.com/reference/android/graphics/ImageFormat.html#YV12">stipulates</a> that the line stride must be a multiple of 16 pixels. Because both NV12 and YV12 are 4:2:0 subsampled, for each 2x2 group of pixels there are 4 Y samples, 1 U sample and 1 V sample.<br />
<br />
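A short sketch of the resulting YV12 plane arithmetic, assuming the plane order (Y, then Cr, then Cb) and the 16-pixel stride alignment described in the Android ImageFormat documentation linked above:<br />
<br />
#include &lt;cstdint&gt;<br />
#include &lt;cstdio&gt;<br />
<br />
constexpr uint32_t align16(uint32_t v) { return (v + 15) &amp; ~15u; }<br />
<br />
int main() {<br />
  const uint32_t width = 640, height = 480;  // both must be even<br />
  const uint32_t yStride = align16(width);  // 640<br />
  const uint32_t cStride = align16(yStride / 2);  // 320<br />
  const uint32_t ySize = yStride * height;  // 307200 bytes<br />
  const uint32_t cSize = cStride * height / 2;  // 76800 bytes per chroma plane<br />
  std::printf("total = %u bytes, Cr offset = %u, Cb offset = %u\n",<br />
              ySize + 2 * cSize, ySize, ySize + cSize);<br />
  return 0;<br />
}<br />
<br />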
<b>Tiling</b><br />
If the SoC hardware uses algorithms which mostly access blocks of neighboring pixels, then it is probably more efficient to arrange the image's memory layout such that neighboring pixels are laid out consecutively in memory, instead of in their usual raster order. This is called tiling (a small address-mapping sketch follows the figure below).<br />
Some graphics/imaging hardware use more elaborate tiling, such as supporting two tile sizes: a group of small tiles might be arranged in some scan order inside a larger tile.<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhvfF1XIVqToSQnkN2fEK9o0N_yP6y8AFxLnef_Sk5vr41F6N4tSZmmRh5OOwNdINDdyCC2hP7DQZTF4lvU6xUjdqPFVazqYXnCYLUs0rvmvrM2sAuJUskSW2tXObANRFNrIGJh6ffyjrFw/s1600/tiling.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhvfF1XIVqToSQnkN2fEK9o0N_yP6y8AFxLnef_Sk5vr41F6N4tSZmmRh5OOwNdINDdyCC2hP7DQZTF4lvU6xUjdqPFVazqYXnCYLUs0rvmvrM2sAuJUskSW2tXObANRFNrIGJh6ffyjrFw/s1600/tiling.png" height="284" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Tiling: one the left is the image with the pixels in their natural order. The green frame defines the 4x4 tile size and the red arrow shows the scan order. On the right is the same image, but now with pixels arranged in the tile scan order.</td></tr>
</tbody></table>
<br />
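Below is a sketch of the address mapping implied by the figure: 4x4 tiles, with the tiles and the pixels inside each tile both scanned in row-major order. Real hardware tiling layouts vary and are often proprietary, so treat this purely as an illustration of the idea.<br />
<br />
#include &lt;cstdint&gt;<br />
#include &lt;cstdio&gt;<br />
<br />
constexpr uint32_t kTile = 4;<br />
<br />
// Pixel index of (x, y) in a linear layout vs. in the 4x4-tiled layout.<br />
uint32_t linearIndex(uint32_t x, uint32_t y, uint32_t width) {<br />
  return y * width + x;<br />
}<br />
<br />
uint32_t tiledIndex(uint32_t x, uint32_t y, uint32_t width) {<br />
  const uint32_t tilesPerRow = width / kTile;  // assumes width % 4 == 0<br />
  const uint32_t tile = (y / kTile) * tilesPerRow + (x / kTile);  // which tile<br />
  const uint32_t inTile = (y % kTile) * kTile + (x % kTile);  // position inside the tile<br />
  return tile * kTile * kTile + inTile;<br />
}<br />
<br />
int main() {<br />
  std::printf("pixel (5,2) of a 16-pixel-wide image: linear index %u, tiled index %u\n",<br />
              linearIndex(5, 2, 16), tiledIndex(5, 2, 16));<br />
  return 0;<br />
}<br />
<br />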
<b>Compression</b><br />
If both producer and consumer are hardware components on the same SoC, then they may write and read a common, proprietary compressed data format and decompress the data on-the-fly (i.e. using on-chip memory, usually SRAM, just before processing the pixel data).<br />
<br />
<b>Memory Contiguity</b><br />
Some older imaging hardware modules (cameras, display, etc) don't have an MMU or don't support scatter-gather DMA. In this case the device DMA is programmed using physical addresses which point to contiguous memory. This does not affect the memory layout, but it is certainly the kind of platform-specific constraint that gralloc needs to be aware of when it allocates memory.<br />
<br />
<h2>
gralloc: Buffer Ownership Management</h2>
Memory is a shared resource. It is either shared between the graphics hardware module and the CPU; or between two graphics modules. If the CPU is rendering to a graphics buffer, we have to make sure that the display controller waits for the CPU to complete writing, before it begins reading the buffer memory. This is done using system-level synchronization which I'll discuss in a later blog entry. But this synchronization is not sufficient to ensure that the display controller will be accessing a coherent view of the memory. In the above example, the final updates to the buffer that the CPU writes may not have been flushed from the cache to the system memory. If this happens, the display might show an incorrect view of the graphics buffer. Therefore, we need some kind of low-level atomic synchronization mechanism to explicitly manage the transfer of memory buffer ownership which verifies that the memory "owner" sees a consistent view of the memory.<br />
<br />
Access to buffer memory (both read and write, for both hardware and software) is explicitly managed by gralloc users (this can be done synchronously or asynchronously). This is done by locking and unlocking a buffer memory patch. There can be many threads with a read-lock concurrently, but only one thread can hold a write lock.<br />
<br />
/*<br />
* The (*lock)() method is called before a buffer is accessed for the<br />
* specified usage. This call may block, for instance if the h/w needs<br />
* to finish rendering or if CPU caches need to be synchronized.<br />
*<br />
* The caller promises to modify only pixels in the area specified<br />
* by (l,t,w,h).<br />
*<br />
* The content of the buffer outside of the specified area is NOT modified<br />
* by this call.<br />
*<br />
* If usage specifies GRALLOC_USAGE_SW_*, vaddr is filled with the address<br />
* of the buffer in virtual memory.<br />
*<br />
* Note calling (*lock)() on HAL_PIXEL_FORMAT_YCbCr_*_888 buffers will fail<br />
* and return -EINVAL. These buffers must be locked with (*lock_ycbcr)()<br />
* instead.<br />
*<br />
* THREADING CONSIDERATIONS:<br />
*<br />
* It is legal for several different threads to lock a buffer from<br />
* read access, none of the threads are blocked.<br />
*<br />
* However, locking a buffer simultaneously for write or read/write is<br />
* undefined, but:<br />
* - shall not result in termination of the process<br />
* - shall not block the caller<br />
* It is acceptable to return an error or to leave the buffer's content<br />
* into an indeterminate state.<br />
*<br />
* If the buffer was created with a usage mask incompatible with the<br />
* requested usage flags here, -EINVAL is returned.<br />
*<br />
*/<br />
<br />
int (*lock)(struct gralloc_module_t const* module,<br />
buffer_handle_t handle, int usage,<br />
int l, int t, int w, int h,<br />
void** vaddr);<br />
/*<br />
* The (*lockAsync)() method is like the (*lock)() method except<br />
* that the buffer's sync fence object is passed into the lock<br />
* call instead of requiring the caller to wait for completion.<br />
*<br />
* The gralloc implementation takes ownership of the fenceFd and<br />
* is responsible for closing it when no longer needed.<br />
*/<br />
int (*lockAsync)(struct gralloc_module_t const* module,<br />
buffer_handle_t handle, int usage,<br />
int l, int t, int w, int h,<br />
void** vaddr, int fenceFd);<br />
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span style="white-space: normal;"><br /></span></span></div>
<div>
<br /></div>
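For illustration, CPU write access to a previously allocated RGBA_8888 buffer might look roughly like the sketch below. The module pointer is the gralloc_module_t obtained from hw_get_module(), and the handle, dimensions and stride come from the earlier alloc() call; the 4-bytes-per-pixel assumption is specific to RGBA_8888.<br />
<br />
#include &lt;cstdint&gt;<br />
#include &lt;cstring&gt;<br />
#include &lt;hardware/gralloc.h&gt;<br />
<br />
int clear_rgba_buffer(const gralloc_module_t* module, buffer_handle_t handle,<br />
                      int width, int height, int stridePixels) {<br />
  void* vaddr = nullptr;<br />
  int err = module-&gt;lock(module, handle, GRALLOC_USAGE_SW_WRITE_OFTEN,<br />
                         0, 0, width, height, &amp;vaddr);  // may block for h/w or cache sync<br />
  if (err != 0) return err;<br />
  uint8_t* row = static_cast&lt;uint8_t*&gt;(vaddr);<br />
  for (int y = 0; y &lt; height; ++y, row += stridePixels * 4)<br />
    std::memset(row, 0, width * 4);  // touch only the pixels we promised to modify<br />
  return module-&gt;unlock(module, handle);  // hand ownership back<br />
}<br />
<br />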
<b>Cache Coherence</b><br />
If software needs to access a graphics buffer, then the correct data needs to be accessible to the CPU for reading and/or writing. Keeping the cache coherent is one of the responsibilities of gralloc. Needlessly flushing the cache, or enabling bus snooping on some SoCs, to keep the memory view consistent across graphics hardware and CPU wastes power and can add latency. Therefore, here too, gralloc needs to employ platform-specific mechanisms.<br />
<br />
<b>Locking Pages in RAM</b><br />
Another aspect of sharing memory between CPU and graphics hardware is making sure that memory pages are not flushed to the swap file when they are used by the hardware. I can't remember seeing Android on a device configured with a swap file, but it is certainly feasible, and lock() should literally lock the memory pages in RAM.<br />
A related issue is page remapping, which happens when a virtual page that is mapped to one physical page is dynamically reassigned to a different physical page (page migration). One reason the kernel might choose to do this is to prevent fragmentation by rearranging the physical memory allocation. From the CPU's point of view this is fine, as long as the new physical page contains the correct content. But from the point of view of a graphics hardware module that may be holding the old physical address, this is pulling the rug out from under its feet. Pages shared with hardware should be designated non-movable.<br />
<br />
<br />netazhttp://www.blogger.com/profile/13820189991503080577noreply@blogger.com4tag:blogger.com,1999:blog-182549309027052933.post-432616912081710132015-01-23T15:44:00.001+02:002015-05-14T12:24:51.968+03:00Revisiting the Active Object Pattern - with C++11 Closures<a href="http://www.codeproject.com" rel="tag" style="display:none">CodeProject</a>
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="orphans: 2; text-align: -webkit-auto; widows: 2;">
<span style="font-family: inherit;">I have a confession: with the never ending things going on (you know: life ;-) I missed the (fairly) recent changes in the C++ language. C++ was my first OO language and it probably remains my favorite. I can't help but love the mix of high level abstractions with metal grinding pointer arithmetic - it's like a cool sports car with manual transmission. Beauty and power. You have to be more alert, more involved. That's part of the fun - <i>you</i> are taking control. <span style="text-align: -webkit-auto;">But for some time now C++ felt old, tired, disconnected from the endless stream of new languages. </span><span style="text-align: -webkit-auto;">Until C++11 came along. </span></span></div>
<div style="orphans: 2; text-align: -webkit-auto; widows: 2;">
<br /></div>
<div style="orphans: 2; text-align: -webkit-auto; widows: 2;">
<a href="http://en.wikipedia.org/wiki/C%2B%2B11" style="font-family: inherit; text-align: -webkit-auto;" target="_blank">Wikipedia describes C++11</a><span style="font-family: inherit; text-align: -webkit-auto;"> as follows: </span></div>
<blockquote class="tr_bq" style="orphans: 2; text-align: -webkit-auto; widows: 2;">
<span style="font-family: inherit; text-align: -webkit-auto;">C++11 (formerly known as C++0x) is the most recent version of the standard of the C++ programming language. It was approved by ISO on 12 August 2011. C++11 includes several additions to the core language and extends the C++ standard library, incorporating most of the C++ Technical Report 1 (TR1) libraries.</span></blockquote>
<div style="orphans: 2; text-align: -webkit-auto; widows: 2;">
<span style="font-family: inherit;"><span style="text-align: -webkit-auto;"><a href="http://www.stroustrup.com/C++11FAQ.html#think" target="_blank">Bjarne Stroustrup</a> wrote that:</span></span></div>
<blockquote class="tr_bq" style="orphans: 2; text-align: -webkit-auto; widows: 2;">
<span style="font-family: inherit;"><span style="text-align: -webkit-auto;">Surprisingly, C++11 feels like a new language: The pieces just fit together better than they used to and I find a higher-level style of programming more natural than before and as efficient as ever.</span></span></blockquote>
<div style="orphans: 2; text-align: -webkit-auto; widows: 2;">
<span style="font-family: inherit;"><span style="text-align: -webkit-auto;">And it does feel like a new language. And this is exciting for geeks like me. In this blog post I discuss how I implemented Schmidt's <a href="http://www.cs.wustl.edu/~schmidt/PDF/Act-Obj.pdf%E2%80%8E" target="_blank">Active Object pattern</a> in a novel way using C++11 Closures.</span></span></div>
<div style="orphans: 2; text-align: -webkit-auto; widows: 2;">
<span style="text-align: -webkit-auto;"><span style="font-family: inherit;"><br /></span></span></div>
<div style="orphans: 2; text-align: -webkit-auto; widows: 2;">
<span style="font-family: inherit;">First, another confession: for a long time I've suffered from Node envy. Node.js envy, to be precise. Look at this "Hello World" Javascript code:</span><br />
<span style="text-align: -webkit-auto;"><span style="font-family: inherit;">
</span></span>
<br />
<div style="text-align: -webkit-auto;">
<span style="text-align: -webkit-auto;"><script src="https://gist.github.com/netaz/a98186a340e1d2c89225.js"></script></span><br />
<span style="text-align: -webkit-auto;"><br /></span></div>
<span style="text-align: -webkit-auto;"><span style="font-family: inherit;">
</span></span></div>
<div style="orphans: 2; widows: 2;">
<div>
<span style="font-family: inherit;">What I "envy" is not the use of asynchronous I/O operation with callbacks ("the callback pattern"), but the compelling beauty of </span><a href="http://en.wikipedia.org/wiki/Anonymous_function" style="font-family: inherit;" target="_blank">Lambda functions</a><span style="font-family: inherit;">. Lambda functions simplify asynchronous programming because they allow us to write code that is seemingly synchronous. The code that is executed by the lambda function is </span>temporally<span style="font-family: inherit;"> disjointed from the code that </span>precedes it, and yet both parts are spatially co-located. And the outcome is smaller, tighter code that feels more natural and is easier to read and maintain. And this can be done in C++11.</div>
<div>
<br /></div>
<div>
I won't discuss C++11 lambda functions because others have done this better than I can. <a href="http://www.cprogramming.com/c++11/c++11-lambda-closures.html" target="_blank">This</a> article is an example of some of the great coverage you can find on the net (Alex Allain has lots of interesting material to read). But I do want to touch on the difference between Lambda functions and Closures, since my implementation below uses Closures. Lambda functions are anonymous functions that don't need to be bound to a name and can be specified as lambda expressions. A <a href="http://en.wikipedia.org/wiki/Closure_(computer_programming)" target="_blank">Closure</a> is an example of a lambda function which "closes" over the environment in which it was specified (meaning that it can access the variables available in the referencing environment). Alex Allain's article (which I referenced above) doesn't make a big distinction between lambdas and closures and simply treats closures as lambdas with "variable capture". Syntactically in C++ lambdas and closures are almost identical, so the distinction is there and it is slight, yet I think it is important to note the semantic difference.</div>
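<div>
<br /></div>
<div>
A two-line illustration of that distinction (my own toy snippet, not code from the article):</div>
<div>
<br /></div>
#include &lt;cstdio&gt;<br />
<br />
int main() {<br />
  int base = 10;  // part of the referencing environment<br />
  auto lambda  = [](int x)     { return x * 2; };  // plain lambda: no captured state<br />
  auto closure = [base](int x) { return x + base; };  // closure: captures 'base' by value<br />
  std::printf("%d %d\n", lambda(5), closure(5));  // prints "10 15"<br />
  return 0;<br />
}<br />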
<div>
<br />
On to Active Object.<br />
<br /></div>
<div>
Douglas Schmidt <a href="http://materias.fi.uba.ar/7562/2007/POSA2.pdf" target="_blank">describes</a> the Active Object design pattern in <i>Pattern Oriented Software Architecture (Volume 2: Patterns for Concurrent and Networked Objects)</i>:</div>
<div>
<blockquote class="tr_bq">
The Active Object design pattern decouples method execution from method invocation to enhance concurrency and simplify synchronized access to objects that reside in their own threads of control. </blockquote>
Once again, I don't want to paraphrase the work of others, so I assume that you are knowledgeable about the details of the Active Object pattern. If not, you should probably familiarize yourself with the pattern before reading on.<br />
To illustrate my ideas, I will only concentrate on one variation of the Active Object pattern. In this variation the Client and Proxy are "folded" into the same object and the Scheduler and ActivationList implement a simple message queue policy (this is reminiscent of Schmidt's <a href="http://www.cs.utexas.edu/users/lavender/papers/active-object.pdf" target="_blank">original AO paper</a>, which he later expanded on). I think this is probably the most prevalent variation of the pattern - in which we want to serialize access to an object, and use an in-order queue (FIFO) to "bounce" the method invocation from one thread to another.<br />
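Before getting to that code, here is my own minimal sketch of this FIFO variation, using C++11 closures as the queued messages. It is a toy illustration only (the gists further below are the actual implementations discussed): the public methods merely enqueue a closure, and a single worker thread dequeues and executes the closures in order, so the object's state is touched by one thread only.<br />
<br />
#include &lt;condition_variable&gt;<br />
#include &lt;cstdio&gt;<br />
#include &lt;deque&gt;<br />
#include &lt;functional&gt;<br />
#include &lt;mutex&gt;<br />
#include &lt;thread&gt;<br />
<br />
class ActiveCounter {<br />
public:<br />
  ActiveCounter() : worker_([this] { run(); }) {}<br />
  ~ActiveCounter() {<br />
    post([this] { done_ = true; });  // "poison pill", executed in FIFO order<br />
    worker_.join();<br />
  }<br />
  void increment(int by) { post([this, by] { count_ += by; }); }<br />
  void print()           { post([this] { std::printf("count = %d\n", count_); }); }<br />
<br />
private:<br />
  void post(std::function&lt;void()&gt; task) {<br />
    { std::lock_guard&lt;std::mutex&gt; l(m_); q_.push_back(std::move(task)); }<br />
    cv_.notify_one();<br />
  }<br />
  void run() {  // the worker thread: dequeue and execute, in order<br />
    while (!done_) {<br />
      std::function&lt;void()&gt; task;<br />
      {<br />
        std::unique_lock&lt;std::mutex&gt; l(m_);<br />
        cv_.wait(l, [this] { return !q_.empty(); });<br />
        task = std::move(q_.front()); q_.pop_front();<br />
      }<br />
      task();<br />
    }<br />
  }<br />
  std::mutex m_;<br />
  std::condition_variable cv_;<br />
  std::deque&lt;std::function&lt;void()&gt;&gt; q_;<br />
  bool done_ = false;  // touched only by the worker thread<br />
  int count_ = 0;      // touched only by the worker thread<br />
  std::thread worker_; // declared last so the other members exist when it starts<br />
};<br />
<br />
int main() {<br />
  ActiveCounter c;<br />
  c.increment(2);<br />
  c.increment(3);<br />
  c.print();  // eventually prints "count = 5" on the worker thread<br />
  return 0;<br />
}<br />
<br />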
Let's look at the example code from the <a href="http://en.wikipedia.org/wiki/Active_object">Wikipedia entry on Active Object</a>. The Wikipedia code is implemented in Java and I went and implemented it using C++11. I placed the comments in the code to explain the logic.<br />
<script src="https://gist.github.com/netaz/7a9e2a58b2d0bf1bf825.js"></script><br />
The more "traditional" method of implementing ActiveObject in C++ involves defining two sets of interfaces: a public interface and a private interface. Every method in the public interface also appears in the private interface. The public interface is used by clients to invoke methods on the object, and they create a message indicating the request and its parameters and enqueue the message. The private interface is used by the dispatcher which dequeues messages and invokes the private method. This works well enough but creates big classes that have a lot of extraneous code that is there just to get all this mechanics to work. Every change to the interface requires a series of changes (public and private interface; message definition). <br />
<script src="https://gist.github.com/netaz/1007142ad9780bce76e8.js"></script><br />
A somewhat more sophisticated implementation uses functors. We no longer need the code which does the switching on the message type when we grab a message from the FIFO and dispatch it. But the sophistication of the code probably only adds a layer of obfuscation if you are not familiar with the underlying idiom. We gain too little from this to be worthwhile.<br />
<script src="https://gist.github.com/netaz/c6f521bd143e6b54630d.js"></script><br />
Now let's come full circle and return to the Closure implementation of ActiveObject and add a few features to it.<br />
<script src="https://gist.github.com/netaz/04c2547820320d9dc1ca.js"></script><br />
<div class="MsoNormal" style="background: white; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt;">
</div>
</div>
</div>
</div>netazhttp://www.blogger.com/profile/13820189991503080577noreply@blogger.com0tag:blogger.com,1999:blog-182549309027052933.post-32082192632828033722015-01-02T10:57:00.002+02:002015-04-23T21:14:13.808+03:00Measuring the Performance of Halide ConvolutionsIn the previous post I detailed five ways to express a 3x3 Gaussian smoothing filter in Halide. I was curious to understand if the choice of the algorithm expression will have any impact of its performance. After all, I interpret Halide's promise to "write the algorithm once and then search for the optimal schedule" (not a direct quote) as telling us that, all things being equal, the algorithm implementation is not very important as long as it is functionally correct. So off I went to do some experimenting and data collection.<br />
<br />
To perform the tests, I used rgb.png (from the Halide distribution) as the input image. This image has dimensions 768x1280 and a 24-bit RGB format, which means that each one of the three color channels (red, green, and blue) is represented by 8 bits.<br />
My workstation uses an Intel i7-3770 CPU which supports SSE, SSE2, SSE3, SSE4.1, SSE 4.2 and AVX. On a Linux machine you can learn about your CPU by invoking:<br />
$ cat /proc/cpuinfo<br />
Interestingly, the cpuid program did not list all of the vectorization features supported by the CPU so I double checked <a href="http://www.cpu-world.com/CPUs/Core_i7/Intel-Core%20i7-3770.html">here</a>.<br />
I selected a set of schedules for the separable Gaussian implementations (those expressed using two Halide functions) and a different set of schedules for the non-separable implementations. I ran each of the schedules 50 times in a loop and calculated the mean value after removing the two smallest and two largest time samples. It takes Halide a couple of rounds to "warm up", which I found a bit strange since I invoked the JIT compiler before starting each 50-run loop. Perhaps some code gets mapped to the instruction cache. I also calculated the standard deviation of each 50-run loop. Naturally, schedules using parallelization show more jitter.<br />
<br />
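A simplified sketch of how such a measurement loop can be written is shown below; run_once stands for a call that realizes the already JIT-compiled pipeline into a preallocated output buffer (the helper and its name are mine, not a Halide API).<br />
<br />
#include &lt;algorithm&gt;<br />
#include &lt;chrono&gt;<br />
#include &lt;cmath&gt;<br />
#include &lt;numeric&gt;<br />
#include &lt;utility&gt;<br />
#include &lt;vector&gt;<br />
<br />
// Run a schedule kRuns times, drop the two fastest and two slowest samples,<br />
// and return {mean, standard deviation} of the remaining times in milliseconds.<br />
template &lt;typename F&gt;<br />
std::pair&lt;double, double&gt; benchmark(F run_once, int kRuns = 50) {<br />
  std::vector&lt;double&gt; ms;<br />
  for (int i = 0; i &lt; kRuns; ++i) {<br />
    auto t0 = std::chrono::high_resolution_clock::now();<br />
    run_once();<br />
    auto t1 = std::chrono::high_resolution_clock::now();<br />
    ms.push_back(std::chrono::duration&lt;double, std::milli&gt;(t1 - t0).count());<br />
  }<br />
  std::sort(ms.begin(), ms.end());<br />
  std::vector&lt;double&gt; kept(ms.begin() + 2, ms.end() - 2);  // trim the outliers<br />
  double mean = std::accumulate(kept.begin(), kept.end(), 0.0) / kept.size();<br />
  double var = 0.0;<br />
  for (double v : kept) var += (v - mean) * (v - mean);<br />
  return std::make_pair(mean, std::sqrt(var / kept.size()));<br />
}<br />
<br />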
<div class="MsoNormal">
Before I show the results, I want to discuss some interesting things I observed along the way.<br />
<br />
<h4>
Simple update steps impact performance</h4>
Update steps are separately scheduled, but I never expected that a simple update such as the one highlighted below could impact performance. I expected that the default inline schedule would be used, and since the data is readily available in the cache, the update would be painless. I pasted below two filter implementations, the first with an extra update step and the second without the update step:</div>
<div class="MsoNormal">
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Halide::Func gaussian_3x3_1(Halide::Func input) {</div>
<div class="MsoNormal">
Halide::Func k, gaussian;</div>
<div class="MsoNormal">
Halide::Var x,y,c;</div>
<div class="MsoNormal">
</div>
<div class="MsoNormal">
gaussian(x,y,c) = input(x-1, y-1, c) * 1 + input(x, y-1, c) * 2 + input(x+1, y-1, c) * 1 +</div>
<div class="MsoNormal">
input(x-1, y, c) * 2 + input(x, y, c) * 4 + input(x+1, y, c) * 2 +</div>
<div class="MsoNormal">
input(x-1, y+1, c) * 1 + input(x, y+1, c) * 2 + input(x+1, y+1, c) * 1;</div>
<div class="MsoNormal">
<span style="background-color: yellow;">gaussian(x,y,c) /= 16;</span></div>
<div class="MsoNormal">
return gaussian;</div>
<div class="MsoNormal">
}</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<div class="MsoNormal">
Halide::Func gaussian_3x3_1(Halide::Func input) {</div>
<div class="MsoNormal">
Halide::Func k, gaussian;</div>
<div class="MsoNormal">
Halide::Var x,y,c;</div>
<div class="MsoNormal">
</div>
<div class="MsoNormal">
gaussian(x,y,c) = (input(x-1, y-1, c) * 1 + input(x, y-1, c) * 2 + input(x+1, y-1, c) * 1 +</div>
<div class="MsoNormal">
input(x-1, y, c) * 2 + input(x, y, c) * 4 + input(x+1, y, c) * 2 +</div>
<div class="MsoNormal">
input(x-1, y+1, c) * 1 + input(x, y+1, c) * 2 + input(x+1, y+1, c) * 1)<span style="background-color: yellow;"> /16</span>;</div>
<div class="MsoNormal">
return gaussian;</div>
<div class="MsoNormal">
}</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
How much does this affect the performance? This really depends on the algorithm and the schedule, in ways that I cannot explain. I ran the second Gaussian function (gaussian_3x3_2) with and without an update and sometimes I got better results, sometimes worse. This is shown in the second and third results rows of the table below (2-update and 2-no-update). In the best performing schedules of gaussian_3x3_2, the no-update implementation provided the best results.</div>
<div class="MsoNormal">
<br /></div>
<h4>
Casting operations also impact performance</h4>
<div>
In the previous post I discussed why we need to cast the 8-bit pixel channel inputs before calling the Gaussian filter. If we want to realize the results of the Gaussian filter to a 24-bit PNG image, then we also need to cast the results back to uint8_t. I found that if I perform the cast as part of the Gaussian filter I get the best results. But if I insist on performing the cast on the results of the filter (i.e. after exiting the filter function), then that cast is a legitimate part of the Halide schedule and should be optimized like all other parts. </div>
<div>
<div class="MsoNormal">
Halide::Var
x,y,xi,yi,c;<o:p></o:p></div>
<div class="MsoNormal">
Halide::Func
padded, padded32, test2;<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
padded(x,y,c) =
input(clamp(x, 0, input.width()-1), clamp(y, 0, input.height()-1), c);<o:p></o:p></div>
<div class="MsoNormal">
padded32(x,y,c) =
Halide::cast&lt;int32_t&gt;(padded(x,y,c));<o:p></o:p></div>
<div class="MsoNormal">
Halide::Func test
= gaussian_3x3_5(padded32, Separable2dConvolutionSched(7));<o:p></o:p></div>
<div class="MsoNormal">
test.compute_root();<o:p></o:p></div>
<div class="MsoNormal">
test2(x,y,c) =
Halide::cast&lt;uint8_t&gt;(test(x,y,c));<o:p></o:p></div>
<div class="MsoNormal">
test2.vectorize(x,
4).parallel(y);<o:p></o:p><br />
<br /></div>
</div>
<h4>
The choice of schedule varies widely with the implementation of the algorithm</h4>
<div class="MsoNormal">
In the table below you can see seven schedules applied to six different algorithm implementations of Gaussian 3x3. For each pair of {algorithm, schedule} I provide the mean and standard deviation. The first three implementations (1, 2-update, 2-no-update) are single-function implementations while the latter three implementations use two functions (separable convolution kernels). That means that the seven schedules of the first three algorithms are different from the schedules of the latter three algorithms. Nonetheless, it is clear that the choice of schedule varies widely with the implementation of the algorithm, as expected.</div>
<div class="MsoNormal">
<br />
<h4>
Separable kernels are faster</h4>
</div>
<div class="MsoNormal">
<div class="MsoListParagraph" style="mso-list: l0 level1 lfo1; text-indent: -18.0pt;">
<o:p></o:p></div>
Gaussian 3x3 is a separable filter, which means that it can be expressed as the outer product of two vectors. The number of computations in the non-separated Gaussian is roughly (MxNx3x3) when applied to an input image of size MxN pixels. For the separated version it is (MxN)x(3+3). So fewer computations, and we would expect it to run faster. And indeed the results show that the separated implementation is the fastest (of course, I could be wrong; it is possible that I have not found the optimal schedules). This is expected, but disheartening. It means that we should be thinking about optimizing our algorithm, not just the schedule.</div>
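<div class="MsoNormal">
For the 768x1280 test image this works out to roughly 768*1280*9 ≈ 8.8 million multiply-accumulates per color channel for the non-separated kernel, versus 768*1280*6 ≈ 5.9 million for the separated version: about a 1.5x reduction in arithmetic before any scheduling is applied.</div>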
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<h4>
Inline reductions perform best</h4>
<span style="text-indent: -18pt;">Functions gaussian_3x3_4 and
gaussian_3x3_5 are the same except for how they do the accumulation. Function </span><span style="text-indent: -24px;">gaussian_3x3_4 uses the Halide:sum inline reduction while </span><span style="text-indent: -24px;">gaussian_3x3_5 accumulates over the domain using an update step. The results speak for themselves: using the Halide::sum inline reduction provides almost 14-fold better results compared to using an update step (look at the best results in the rows for functions 4 and 5 in the table below).</span><br />
<span style="text-indent: -24px;"><br /></span>
<br />
<div class="MsoNormal">
Halide::Func gaussian_3x3_4(Halide::Func input, const
Scheduler &s) {<o:p></o:p></div>
<div class="MsoNormal">
Halide::Func k,
gaussian_x, gaussian_y;<o:p></o:p></div>
<div class="MsoNormal">
Halide::Var
x,y,xi,yi,c;<o:p></o:p></div>
<div class="MsoNormal">
Halide::RDom
r(-1,3);<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
k(x) = 0;<o:p></o:p></div>
<div class="MsoNormal">
k(-1) = 1; k(0) = 2;
k(1) = 1;<o:p></o:p></div>
<div class="MsoNormal">
gaussian_x(x,y,c)
= sum(input(x+r.x, y, c) * k(r)) / 4;<o:p></o:p></div>
<div class="MsoNormal">
gaussian_y(x,y,c)
= sum(gaussian_x(x, y+r, c) * k(r)) / 4;<o:p></o:p></div>
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
s.schedule(gaussian_x, gaussian_y, x, y);<o:p></o:p></div>
<div class="MsoNormal">
return gaussian_y;<o:p></o:p></div>
<div class="MsoNormal">
}<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Halide::Func gaussian_3x3_5(Halide::Func input, const
Scheduler &s) {<o:p></o:p></div>
<div class="MsoNormal">
Halide::Func k,
gaussian_x, gaussian_y;<o:p></o:p></div>
<div class="MsoNormal">
Halide::Var
x,y,xi,yi,c;<o:p></o:p></div>
<div class="MsoNormal">
Halide::RDom
r(-1,3);<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
k(x) = 0;<o:p></o:p></div>
<div class="MsoNormal">
k(-1) = 1; k(0) = 2;
k(1) = 1;<o:p></o:p></div>
<div class="MsoNormal">
gaussian_x(x,y,c)
+= input(x+r.x, y, c) * k(r) / 4;<o:p></o:p></div>
<div class="MsoNormal">
gaussian_y(x,y,c)
+= gaussian_x(x, y+r, c) * k(r) / 4;<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
s.schedule(gaussian_x, gaussian_y, x, y);<o:p></o:p></div>
<div class="MsoNormal">
return gaussian_y;<o:p></o:p></div>
<br />
<span style="text-indent: -24px;">}</span><br />
<span style="text-indent: -24px;"><br /></span>
<br />
<h4>
<span style="text-indent: -24px;">Larger vector sizes perform better</span></h4>
<span style="text-indent: -24px;">OK, this one was predictable, but it's nice to see the empirical results. And make sure your machine supports the vectorization you're trying out.</span><br />
<span style="text-indent: -24px;"><br /></span>
<br />
<h4>
A sample set consisting of 50 measurements is usually too small</h4>
The data I present in the table below consists of 50 samples per test, but I noticed that sometimes there was variation in the results (the average measurement of the 50 samples) between two test runs (of 50 samples each) which can't be explained by the standard deviation. When I increased the sample set to 500 samples I got very stable results (I didn't try anything between 50 and 500, laziness).<br />
<br />
<h4>
Really bad schedules have a really high sample variance</h4>
I need to understand why this happens. I would expect the high variance to appear when thread parallelization is used at a fine granularity (frequent context switches), but it seems to work the other way around. I am missing something.<br />
<br />
<h4>
Choosing tile size is a delicate act</h4>
My workstation has 4x32KB 8-way set associative L1 data caches and 4x256KB 8-way set associative L2 caches. Plenty of memory, no doubt. The largest tile size with which I managed to achieve good performance has size 256x32x4=32KB. Remember that each 8-bit pixel channel value is expanded to 32 bits to prevent overflow, and that is why I multiplied the tile size by 4. This limits the vector sizes and also the tile sizes. Now, if we also parallelize the tile computation, then Halide allocates several instances of the temporary tile buffer and I suspect this is why I saw the optimized tile size peak at 256x32. <br />
Finally, pay attention to the relationship between your tile (or split) sizes and the vector dimensions. The dimension which you choose to vectorize should not be smaller than the vector size that you choose. Once you set a size for that dimension, you can try several values for the second dimension, while keeping the size of the entire tile within the cache bounds. <br />
<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZJpdMQ3SIRQRhZIcFNW8h7Lr0mfr7wsUeETQHG-IzWQy3F27fWWP-HN4pzpJNNIXnWL-BRXTZIT0y9VwykEEFgBX33Es5eBMJ0DnVbMrapDM9v2FtEpTmPxphQBawE8acUUK4rJUz5neO/s1600/guassian_results.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZJpdMQ3SIRQRhZIcFNW8h7Lr0mfr7wsUeETQHG-IzWQy3F27fWWP-HN4pzpJNNIXnWL-BRXTZIT0y9VwykEEFgBX33Es5eBMJ0DnVbMrapDM9v2FtEpTmPxphQBawE8acUUK4rJUz5neO/s1600/guassian_results.png" height="96" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Results of running different Gaussian 3x3 implementations with different schedules</td></tr>
</tbody></table>
<div class="MsoListParagraph" style="mso-list: l0 level1 lfo1; text-indent: -18.0pt;">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-QkpXOrkoE602OiS5Fwe8liZhO-lw3E0AVPehdAlTiKmIgaOxC3Vm4Fccc6vIMCj9D9VAyuOAVoL3uoW9dCS81eZz5J5nw7B2Cj-RSo4aZvk3IUm4gGu3LAjbduDjnQOVhiyUWA63N75h/s1600/guassian_schedules.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-QkpXOrkoE602OiS5Fwe8liZhO-lw3E0AVPehdAlTiKmIgaOxC3Vm4Fccc6vIMCj9D9VAyuOAVoL3uoW9dCS81eZz5J5nw7B2Cj-RSo4aZvk3IUm4gGu3LAjbduDjnQOVhiyUWA63N75h/s1600/guassian_schedules.png" height="367" width="640" /></a></div>
<br /></div>
</div>
</div>
</div>
netazhttp://www.blogger.com/profile/13820189991503080577noreply@blogger.com0tag:blogger.com,1999:blog-182549309027052933.post-89471203796735077112014-12-21T01:46:00.002+02:002015-04-23T21:14:02.086+03:00Several Ways to Express a Convolution in Halide<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">There are at least two ways to express a convolution operation in
Halide; more if the kernel is separable. I'll review these in this post.<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">Let's examine a simple Gaussian 3x3 lowpass (smoothing) filter
(also known as a Gaussian Blur):<o:p></o:p></span></div>
<br />
<div class="MsoNormal">
<br /></div>
<div>
<br /></div>
<div>
</div>
<div>
<div style="text-align: center;">
<img alt="\frac{1}{16}
\left[ {\begin{array}{ccc}
1 & 2 & 1 \\
2 & 4 & 2 \\
1 & 2 & 1 \\
\end{array} } \right]" src="http://mathforum.org/mathimages/imgUpload/math/8/f/4/8f4eb18c8c35a55c00629e7dde0480f9.png" /></div>
<div style="text-align: left;">
<div>
<br />
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">The straight-forward method sums up the pixel neighborhood, using
the weights in the convolution kernel.<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">Halide::Func gaussian_3x3_1(Halide::Func input) {<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> Halide::Func k, gaussian;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> Halide::Var x,y,c;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> <o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> gaussian(x,y,c) = input(x-1, y-1, c) * 1 +
input(x, y-1, c) * 2 + input(x+1, y-1, c) * 1 +<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">
input(x-1, y, c) * 2
+ input(x, y, c) * 4 + input(x+1, y, c) * 2 +<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">
input(x-1, y+1, c) * 1
+ input(x, y+1, c) * 2 + input(x+1, y+1, c) * 1;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> gaussian(x,y,c) /= 16;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> return gaussian;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">}<o:p></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt;">
<br /></div>
<div style="margin: 0cm 0cm 0.0001pt;">
<span style="font-size: 13.5pt;">We have to watch out not to overflow the
summation of the pixel values. In the gaussian_3x3_1 function below, the
type of the summation (gaussian(x,y,c)) is deduced from the type of
input(x,y,c) and if this is an 8-bit type for example, then it will most likely
overflow and the output will be wrong without emitting any errors. One
way to handle this is to explicitly set the kernel weights to floats, but this
will most likely hurt performance because it will require a cast and arithmetic
operations on the float type.<o:p></o:p></span></div>
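<div style="margin: 0cm 0cm 0.0001pt;">
<br /></div>
<div style="margin: 0cm 0cm 0.0001pt;">
<span style="font-size: 13.5pt;">For example, with 8-bit inputs the worst-case weighted sum is 255 * (1+2+1+2+4+2+1+2+1) = 255 * 16 = 4080, far above the uint8_t maximum of 255, so the result silently wraps around.<o:p></o:p></span></div>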
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">gaussian(x,y,c) = input(x-1, y-1, c) * 1.f + input(x, y-1, c) *
2.f + input(x+1, y-1, c) * 1.f +<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">
input(x-1, y, c) * 2.f + input(x, y, c) * 4.f + input(x+1, y, c)
* 2.f +<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">
input(x-1, y+1, c) * 1.f + input(x, y+1, c) * 2.f + input(x+1, y+1, c) * 1.f;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">I prefer to cast the input type so that the caller of the function
has the control on when to cast and when not to cast. Here's an example
which loads a 24-bit RGB image (8 bit per color channel), clamps the image
values and converts from uint8_t to int32_t.<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">Halide::Image<span class="apple-converted-space"><uint8_t> </uint8_t></span>input
= load<uint8_t>("images/rgb.png");<o:p></o:p></uint8_t></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">Halide::Func padded, padded32;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">padded(x,y,c) = input(clamp(x, 0, input.width()-1), clamp(y, 0,
input.height()-1), c);<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">padded32(x,y,c) = Halide::cast<int32_t>(padded(x,y,c));<o:p></o:p></int32_t></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">Halide::Func gaussian_3x3_fn = gaussian_3x3_1(padded32);<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">Another method to perform that convolution uses a 2-dimensional
reduction domain for the convolution kernel. We define a 3x3 RDom which
spans from -1 to +1 in both width and height. When the RDom in the code below
is evaluated, r.x takes the values {-1, 0, 1} and r.y similarly takes the
values {-1, 0, 1}. Therefore, x+r.x takes the values {x-1, x, x+1}. <o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">Halide::Func gaussian_3x3_2(Halide::Func input) {<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> Halide::Func k, gaussian;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> Halide::RDom r(-1,3,-1,3);<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> Halide::Var x,y,c;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> <o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> k(x,y) = 0;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> k(-1,-1) = 1; k(0,-1) = 2;
k(1,-1) = 1;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> k(-1, 0) = 2; k(0, 0) = 4;
k(1, 0) = 2;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> k(-1, 1) = 1; k(0, 1) = 2;
k(1, 1) = 1;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> gaussian(x,y,c) = sum(input(x+r.x, y+r.y, c) *
k(r.x, r.y));<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> gaussian(x,y,c) /= 16;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> return gaussian;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">}<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">Because a Gaussian kernel is separable (that is, it can be
expressed as the outer product of two vectors), we can express it in yet
another way:<o:p></o:p></span></div>
<br />
<div class="MsoNormal">
<br /></div>
<br /></div>
</div>
</div>
<div style="text-align: center;">
<img alt="h[m,n]" src="http://www.songho.ca/dsp/convolution/files/conv2d_eq22.gif" /></div>
<div>
<br /></div>
<div>
<div>
<code></code></div>
<div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">Halide::Func gaussian_3x3_3(Halide::Func input) {<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> Halide::Func gaussian_x, gaussian_y;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> Halide::Var x,y,c;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> <o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> gaussian_x(x,y,c) = (input(x-1,y,c) + input(x,y,c) *
2 + input(x+1,y,c)) / 4;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> gaussian_y(x,y,c) = (gaussian_x(x,y-1,c) +
gaussian_x(x,y,c) * 2 + gaussian_x(x,y+1,c) ) / 4;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> return gaussian_y;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">}<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">Of course, we can also use a reduction domain here. In this
case we need a 1-dimensional RDom spanning {-1, 0, 1}:<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">Halide::Func gaussian_3x3_4(Halide::Func input) {<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> Halide::Func k, gaussian_x, gaussian_y;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> Halide::Var x,y,c;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> Halide::RDom r(-1,3);<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> k(x) = 0;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> k(-1) = 1; k(0) = 2; k(1)
= 1;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> gaussian_x(x,y,c) = sum(input(x+r.x, y, c) * k(r)) /
4;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> gaussian_y(x,y,c) = sum(gaussian_x(x, y+r, c) *
k(r)) / 4;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> return gaussian_y;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">}<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">And we can also play a bit with the syntax, replacing the
Halide::Sum function with explicit summation over the reduction domain:<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">Halide::Func gaussian_3x3_5(Halide::Func input) {<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> Halide::Func k, gaussian_x, gaussian_y;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> Halide::Var x,y,c;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> Halide::RDom r(-1,3);<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> k(x) = 0;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> k(-1) = 1; k(0) = 2; k(1)
= 1;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> gaussian_x(x,y,c) += input(x+r.x, y, c) * k(r) / 4;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> gaussian_y(x,y,c) += gaussian_x(x, y+r, c) * k(r) /
4;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;"> return gaussian_y;<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">}<o:p></o:p></span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">So does it matter how we specify the algorithm? The premise
of Halide says 'no': write the algorithm once, and then experiment with
different schedules until you get the best performance. Intuitively,
gaussian_3x3_2 is better than gaussian_3x3_1 because the Halide::RDom should
have been optimized by Halide's compiler. And gaussian_3x3_3 should
perform better than gaussian_3x3_2 because it provides another degree of
freedom when scheduling. However, this is intuition and what we care
about is actual performance measurements. <o:p></o:p></span></div>
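<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">To make the "same algorithm, different schedule" idea concrete, here is a sketch
of one possible schedule for the separable variant (just my own illustration of the scheduling
directives, not a tuned schedule): the algorithm lines are identical to gaussian_3x3_3, and only
the two schedule lines at the bottom change how it is executed.</span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<pre>Halide::Func gaussian_3x3_3_scheduled(Halide::Func input) {
    Halide::Func gaussian_x, gaussian_y;
    Halide::Var x, y, c, yi;

    // the algorithm: identical to gaussian_3x3_3
    gaussian_x(x,y,c) = (input(x-1,y,c) + input(x,y,c) * 2 + input(x+1,y,c)) / 4;
    gaussian_y(x,y,c) = (gaussian_x(x,y-1,c) + gaussian_x(x,y,c) * 2 + gaussian_x(x,y+1,c)) / 4;

    // the schedule: process strips of 8 rows in parallel, vectorize across x,
    // and compute the horizontal pass as needed within each strip
    gaussian_y.split(y, y, yi, 8).parallel(y).vectorize(x, 8);
    gaussian_x.compute_at(gaussian_y, y).vectorize(x, 8);
    return gaussian_y;
}
</pre>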
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">I haven't measured this yet, so I owe you the results soon...
;-)<o:p></o:p></span></div>
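<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<span style="font-size: 13.5pt;">In the meantime, a measurement could look something like the sketch below (my own
harness, nothing measured yet): JIT-compile each variant once, realize it a few times, and keep
the best wall-clock time, so the comparison isn't polluted by compilation or by a cold first run.</span></div>
<div style="margin-bottom: .0001pt; margin: 0cm;">
<br /></div>
<pre>#include <chrono>

// time one of the gaussian_3x3_* variants; 'make' is e.g. gaussian_3x3_1
double best_time_ms(Halide::Func (*make)(Halide::Func),
                    Halide::Func padded32, int w, int h, int ch) {
    Halide::Func f = make(padded32);
    f.compile_jit();                     // keep JIT compilation out of the timing
    double best = 1e30;
    for (int i = 0; i < 5; i++) {
        auto t0 = std::chrono::high_resolution_clock::now();
        f.realize(w, h, ch);
        auto t1 = std::chrono::high_resolution_clock::now();
        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (ms < best) best = ms;
    }
    return best;
}
</pre>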
<br />
<div class="MsoNormal">
<br /></div>
</div>
</div>
netazhttp://www.blogger.com/profile/13820189991503080577noreply@blogger.com0tag:blogger.com,1999:blog-182549309027052933.post-88040760135063448212014-12-12T20:13:00.000+02:002015-04-23T21:07:20.552+03:00Halide ExcursionsI finally took the time to start a Halide-based github project, called <a href="https://github.com/netaz/halide-excursions">halide-excursions</a>.<br />
I'm new to computer imaging and vision, and the algorithms and applications of this domain are a new frontier for my curiosity. The <a href="https://github.com/netaz/halide-excursions">halide-excursions</a> project is an attempt to create a large, open source repository of Halide computer-vision, computational-photography, and image processing functions. Anyone and everyone is more than welcome to join.<br />
<br />
<a href="http://halide-lang.org/">Halide</a> is a new language for image processing and computer vision. It was developed by several PhD students at MIT and it is actively maintained. There's a <a href="https://lists.csail.mit.edu/mailman/listinfo/halide-dev">developer mailing list</a> with less than a handful of messages a day (so it is easy to lazily eavesdrop on the conversation) where you can communicate directly with the Halide developers. The response time is very short and the guys on the other end are very nice and eager to help.<br />
<br />
Halide's succinct, functional syntax is very appealing, and perhaps this is what drew me in. If, like me, you are starting with little knowledge of the domain, where the word 'kernel' makes you think of the Linux kernel and 'convolution' is a long-forgotten concept from university, then it might take a little more energy to get to smooth sailing. But that's part of the fun. Moving from Wikipedia to implementation is always a nice feeling, and I find Halide a great platform for doing so. Here's the gradient magnitude of the Sobel operator (source code in <a href="https://github.com/netaz/halide-excursions">halide-excursions</a>):<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPZQiNtyYybapj0cSYVQSs8Ryoxw1G8XnqZodxNo97Un1Tdncru14Z4G6Oum9T4GjsquridP-aP9OzVwMY-BDxtcGOu3Df9epKM5JEZq69K5bmtP0U_Nzkr1vv7cDw1l1KBtgJjv2KofCR/s1600/sobel_mag.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPZQiNtyYybapj0cSYVQSs8Ryoxw1G8XnqZodxNo97Un1Tdncru14Z4G6Oum9T4GjsquridP-aP9OzVwMY-BDxtcGOu3Df9epKM5JEZq69K5bmtP0U_Nzkr1vv7cDw1l1KBtgJjv2KofCR/s1600/sobel_mag.png" height="300" width="400" /></a></div>
<br />
<br />
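For a taste of what this looks like in Halide, here is a rough sketch of the gradient-magnitude computation (my own illustration, not the code from the halide-excursions repository; the input is assumed to be a single-channel Func already clamped at the borders and cast to a floating-point type):<br />
<br />
<pre>Halide::Func sobel_magnitude(Halide::Func input) {
    Halide::Func gx, gy, mag;
    Halide::Var x, y;

    // horizontal and vertical Sobel responses
    gx(x,y) = -input(x-1,y-1) + input(x+1,y-1)
              - 2*input(x-1,y) + 2*input(x+1,y)
              - input(x-1,y+1) + input(x+1,y+1);
    gy(x,y) = -input(x-1,y-1) - 2*input(x,y-1) - input(x+1,y-1)
              + input(x-1,y+1) + 2*input(x,y+1) + input(x+1,y+1);

    // gradient magnitude
    mag(x,y) = Halide::sqrt(gx(x,y)*gx(x,y) + gy(x,y)*gy(x,y));
    return mag;
}
</pre>
<br />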
The <a href="https://github.com/halide/Halide">source code</a> comes with a bunch of tutorials, sample "applications" and a large set of unit tests that can serve as a jumping board to start learning and messing around. There's no user guide or official language specification, but there is <a href="http://halide-lang.org/docs/index.html">doxygen-generated documentation</a>. All in all, I think there's plenty of resources to get you started. <br />
<br />
Halide is supported on several OSs and processors (incl. GPUs) and promises the same performance as hand-optimized native code, with fewer lines of code, less mess and with portability. Hand-optimized code, which relies on vectorization and parallelization intrinsics and an assortment of other tricks, is hard to read, hard to maintain, hard or impossible to port, and makes exploring the scheduling optimization space very slow. Halide's ability to separate the algorithm from the scheduling policy is very appealing and works well most of the time. For example, when implementing the <a href="http://www.cs.ubc.ca/~lowe/425/slides/13-ViolaJones.pdf">Viola-Jones face detection</a> algorithm, I found that the classifier cascade phase cannot be expressed optimally in Halide because of its poor support for control-heavy code. In a future post I'll provide more examples showing where Halide shines, and where a hybrid native-Halide solution works better. <br />
Until then, I hope you join the project.<br />
<br />
<br />
<br />
<br />netazhttp://www.blogger.com/profile/13820189991503080577noreply@blogger.com0tag:blogger.com,1999:blog-182549309027052933.post-27502445232066608412014-09-05T18:04:00.003+03:002015-05-14T12:28:28.304+03:00Google's Depth Map (Part II)<a href="http://www.codeproject.com" rel="tag" style="display:none">CodeProject</a>
<div class="MsoNormal">
In the <a href="http://netaz.blogspot.co.il/2014/06/googles-depth-map.html">previous post</a> I described Google's Lens Blur
feature and how we can use code to extract the depth map embedded in the image file that Lens Blur writes. Lens Blur performs a series of frame captures, calculates the depth of each pixel (based on a user-configurable focus locus) and produces a new image with a <a href="http://en.wikipedia.org/wiki/Bokeh">bokeh effect</a>. In other words, the scene background is blurred. Lens Blur stores the new (bokeh) image, the original image, and the depth-map in a single file. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
In this blog post we'll pick up where we left off last time,
right after we extracted the original image and the depth map from Lens Blur's output file and stored each of them in a separate PNG file. This time
we'll go in the reverse direction: starting with the original image and
the depth map, we'll create a depth-blurred image - an image with the bokeh effect. I'm going to show that
with very little sophistication we can achieve some pretty good results in
replicating the output of Lens Blur. This is all going to be kinda crude, but I think
the RoI (return-on-investment) is quite satisfactory.</div>
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOa_GUdix_WLx5aj3krIxR9vBJKYitPYJv2hyUtmsMkc-b4gEcs74UCZAvXX1IVxgasnU8Pf4U4UPdL614C6ToLycDVonx0g2T8sIEBlGm0A48nYPtif0fM7FkgInhoNy_f1P4TrZqPPMB/s1600/gimage_image.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOa_GUdix_WLx5aj3krIxR9vBJKYitPYJv2hyUtmsMkc-b4gEcs74UCZAvXX1IVxgasnU8Pf4U4UPdL614C6ToLycDVonx0g2T8sIEBlGm0A48nYPtif0fM7FkgInhoNy_f1P4TrZqPPMB/s1600/gimage_image.png" height="480" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">The original image as extracted to a PNG file</td></tr>
</tbody></table>
<div class="MsoNormal">
My simpleton's approach to depth blurring is this: I start by finding the mean value of the depth map. As you probably recall, the grey values in the depth map correspond to the depth values calculated by Lens Blur. The mean depth value will serve as a threshold: every pixel above the threshold will be blurred, while all other pixels will be left untouched. This sounds rather crude, not to say dirty, but it works surprisingly well. At the outset I thought I would need to use several threshold values, each with a differently sized blurring kernel. Larger blurring kernels use more neighboring pixels and therefore increase the blurring effect. But alas, a simple Boolean threshold filter works well enough.</div>
<div class="MsoNormal">
<br /></div>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEim9Ie22XKLCFUDI9s4SGv9QMl6Q0QX-l4sc4FcuP68l1eB5BMgt5U2gR6AR0CopGX6olxDEpM54umJTaYJTS2JHdG-6RDU58UCkTM0zFfd7bHlmjy-i0f2iLlHGAtAWDBlt-bM31D2YRYl/s1600/gimage_depth.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEim9Ie22XKLCFUDI9s4SGv9QMl6Q0QX-l4sc4FcuP68l1eB5BMgt5U2gR6AR0CopGX6olxDEpM54umJTaYJTS2JHdG-6RDU58UCkTM0zFfd7bHlmjy-i0f2iLlHGAtAWDBlt-bM31D2YRYl/s1600/gimage_depth.png" height="480" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">The depth map as calculated by Lens Blur after extraction and storage as a PNG file.<br />
The grey value of every pixel in this image corresponds to the depth of the pixel in<br />
the original image. Darker pixels are closer to the camera and have a smaller value.</td></tr>
</tbody></table>
<div class="MsoNormal">
The image below shows the Boolean threshold filter in
action: we traverse all pixels in the original image, and every pixel whose depth is above the
threshold is zeroed (black).</div>
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEip79SZOZ0zBSW5ABRoF4Fb1-DLQat265kUEfKQas3LVA-kEFvE_wAAOwdec2zhrr_Hc4tTOTv1ZxM9IMnU101BTIkbZJ-GshVonGDbanYsBDFY3G3oiYPAoghJfI2OSTLHWMvtZEMMI43b/s1600/gimage_image.threshold.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEip79SZOZ0zBSW5ABRoF4Fb1-DLQat265kUEfKQas3LVA-kEFvE_wAAOwdec2zhrr_Hc4tTOTv1ZxM9IMnU101BTIkbZJ-GshVonGDbanYsBDFY3G3oiYPAoghJfI2OSTLHWMvtZEMMI43b/s1600/gimage_image.threshold.png" height="480" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">The result of thresholding the original image using the mean value of the depth-map.</td></tr>
</tbody></table>
<div class="MsoNormal">
You can see that the results are not too shabby. Cool, right?</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
I think it is interesting to ask whether we can expect this thresholding technique to work for every Lens Blur depth-map. And what is the optimal threshold? I mean, why not threshold using the median, or mean-C, or some other calculated value?</div>
<div class="MsoNormal">
Let's get back to the image above: along the edges of the t-shirt and arm there is (as expected) a sharp gradient in the depth value. Of course, this is due to Google's depth extraction algorithm, which performs a non-gradual separation between foreground and background. If we looked at the depth-map as a 3D terrain map, we would see a large, fast-rising "mountain" where the foreground objects are (arm, cherry box). We expect this "raised" group of pixels to be a closed and connected convex pixel set. That is, we don't expect multiple "mountains" or sprinkles of "tall" pixels. Another way to look at the depth-map is through its histogram. Unlike <a href="http://homepages.inf.ed.ac.uk/rbf/HIPR2/histgram.htm">intensity histograms</a>, which tell the story of the image illumination, the data in our histogram conveys depth information.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhynsHfmaJodb4ttT6WXgqRYFifR31m7_C16T17TGu75i-D7gE0qZj50MsyjgeuEC0dJIEP32q5slzCdlFtG2TRtN1mgdmLLsdwoJn9LiZ3N4ZtEuAu5N7hJzMFTsQPc_u7SDo9OhN0cMZH/s1600/depth_histogram.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhynsHfmaJodb4ttT6WXgqRYFifR31m7_C16T17TGu75i-D7gE0qZj50MsyjgeuEC0dJIEP32q5slzCdlFtG2TRtN1mgdmLLsdwoJn9LiZ3N4ZtEuAu5N7hJzMFTsQPc_u7SDo9OhN0cMZH/s1600/depth_histogram.png" height="400" width="360" /></a></div>
<div class="MsoNormal">
The x-axis of the histogram depicted here (produced by <a href="http://www.irfanview.com/">IrfanView</a>) ranges from 0 to 255 and corresponds to the depth value assigned by Lens Blur (after normalization to the 8-bit sample space). The y-axis indicates the number of pixels in the image which have the corresponding x value. The red line is the threshold; here I located it at x=172, which is the mean value of the depth-map. All pixels above the threshold (i.e. to the right of the red line) are background pixels and all pixels below the threshold are foreground. This looks like a classic <a href="http://www.brighthubpm.com/software-reviews-tips/62274-explaining-bimodal-histograms/">bimodal histogram</a> with two modes of distribution: one corresponding to the foreground pixels and the other corresponding to the background pixels. Under the assumptions I laid out above about how the Lens Blur depth algorithm works, this bimodal histogram is what we should expect.<br />
It is now clear how the thresholding technique separates these two groups of pixels and how the choice of threshold value affects the results. Obviously the threshold value needs to be somewhere between the foreground and the background modes. Exactly where? Now that's a bit tougher. In his <a href="http://www.amazon.com/The-Essential-Guide-Image-Processing/dp/0123744571">book on image processing</a>, Alan Bovik suggests applying probability theory to determine the optimal threshold (see pages 73-74). Under our assumption of only two modes (background and foreground), and if we can model the probability density functions of the two modes, then Bovik's method is likely to work. But I think the gain is small for the purpose of this proof-of-concept. If you download the source code you can play around with different values for the threshold.</div>
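<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
If you want something more principled than eyeballing the histogram, one classic option is Otsu's method, which picks the threshold that maximizes the between-class variance of the two modes. The sketch below is only an illustration of that idea (it is not what Lens Blur does, and it is not used in the code at the end of this post); it operates on the 256-bin histogram computed by calc_stats().</div>
<div class="MsoNormal">
<br /></div>
<div style="background: rgb(255, 255, 255); border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre>size_t otsu_threshold(const size_t histogram[256], size_t num_pixels) {
    double sum_all = 0;
    for (int i = 0; i < 256; i++)
        sum_all += i * (double)histogram[i];

    double sum_bg = 0, w_bg = 0, best_sigma = -1.0;
    size_t best_t = 0;
    for (int t = 0; t < 256; t++) {
        w_bg += histogram[t];                     // pixels at or below t (one "mode")
        if (w_bg == 0) continue;
        double w_fg = (double)num_pixels - w_bg;  // pixels above t (the other "mode")
        if (w_fg == 0) break;
        sum_bg += t * (double)histogram[t];
        double mean_bg = sum_bg / w_bg;
        double mean_fg = (sum_all - sum_bg) / w_fg;
        double sigma = w_bg * w_fg * (mean_bg - mean_fg) * (mean_bg - mean_fg);
        if (sigma > best_sigma) { best_sigma = sigma; best_t = t; }
    }
    return best_t;
}
</pre>
</div>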
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The next step is not much more complicated. Instead of
zeroing (blackening) pixels above the threshold, we use some kind of blurring (smoothing) algorithm, as in image denoising. The general idea is to use each input pixel's neighboring pixels in order to calculate the value of the corresponding output pixel. That is, we use <a href="http://homepages.inf.ed.ac.uk/rbf/HIPR2/convolve.htm">convolution </a>to apply a <a href="http://homepages.inf.ed.ac.uk/rbf/HIPR2/filtops.htm">low-pass filter</a> on the pixels which pass the threshold test condition.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOjIh46ftJW4ZvjuefhwPHEYAb9R9pu_tgBEw0NlN5EqcO99rnjonsUPCuvA69stXtdCCV64ukkiFqP4fiRjRL-yks-7rqKX2BWQRcYIOC4ueHWSCMan19fHCQ4upa4fv9WsT_lcmoVuQD/s1600/convolution.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOjIh46ftJW4ZvjuefhwPHEYAb9R9pu_tgBEw0NlN5EqcO99rnjonsUPCuvA69stXtdCCV64ukkiFqP4fiRjRL-yks-7rqKX2BWQRcYIOC4ueHWSCMan19fHCQ4upa4fv9WsT_lcmoVuQD/s1600/convolution.png" height="632" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Application of a filter on the input image to generate the output image.</td></tr>
</tbody></table>
<br />
As you can see in the code, I've used several different smoothing kernels
(mean, median, Gaussian) with several different sizes. There are a few details that are worth mentioning.<br />
The convolution filter uses pixels from the surrounding environment of the input pixel, and sometimes we don't have such pixels available. Take, for example, the first pixel, at the upper left corner (x,y) = (0,0), and a box kernel of size 5x5. Clearly we are missing the 2 upper rows (y=-1, y=-2) and 2 left columns (x=-1, x=-2). There are several strategies to deal with this, such as duplicating rows and columns (sketched below) or using a fixed value to replace the missing data. Another strategy is to create an output image that is smaller than the input image. For example, using the 5x5 kernel, we would ignore the first and last two rows and the first and last two columns and produce an output image that is 4 columns and 4 rows smaller than the input frame. You can also change the filter kernel such that instead of convolving around the center of the kernel, we use one of its corners. This doesn't bring us out of the woods, but it lets us "preserve" two of the sides of the image. And you can do what I did in the code, which is literally "cutting some corners": all pixels in rows and columns that fall outside of a full kernel are simply left untouched. You can really see this when using larger kernels - the frame is sharp and the internal part of the image is blurry. Ouch - I wouldn't use that in "real" life ;-)<br />
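<br />
As an illustration of the first strategy (duplicating rows and columns), a clamped pixel accessor could look like the hypothetical helper below; it is not part of the listing at the end of this post, which takes the "leave the border untouched" shortcut instead.<br />
<br />
<div style="background: rgb(255, 255, 255); border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre>// clamp-to-edge addressing: out-of-bounds coordinates are replaced by the
// nearest valid row/column, so the kernel always has a full neighborhood
static inline uint8_t pixel_clamped(const image_t &img, long row, long col, size_t color) {
    if (row < 0) row = 0;
    if (col < 0) col = 0;
    if (row >= (long)img.height) row = img.height - 1;
    if (col >= (long)img.width)  col = img.width - 1;
    return img.buf[row*img.width*img.channels + col*img.channels + color];
}
</pre>
</div>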
Next, there's the issue of the kernel size. As mentioned above,
larger kernel sizes achieve more blurring, but the transition
between blurred pixels and non-blurred pixels (along the depth threshold
contour lines) is more noticeable. One possible solution is to use a reasonably sized kernel and perform the smoothing pass more than once. If your filter is non-linear (e.g. a median filter), repeated passes might give a somewhat hairy result.<br />
In the output of Google's Lens Blur you
can also easily detect artifacts along the depth threshold contour lines, but because they change the "strength" of the blurring as a function of the pixel depth (instead of a binary threshold condition as in my implementation) they can achieve a smoother transition at the edges of the foreground object.<br />
<br /></div>
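<div class="MsoNormal">
To see why that helps, here is a hypothetical sketch of such a depth-weighted blend (my own guess at the general scheme, not Google's actual algorithm): instead of a binary decision, the blurred and original pixels are mixed with a weight that grows with distance from the focal plane, so the transition along the contour lines becomes gradual.</div>
<div class="MsoNormal">
<br /></div>
<div style="background: rgb(255, 255, 255); border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre>// blend the blurred and original pixel by a blur strength derived from depth;
// 'focal_depth' and 'range' are hypothetical tuning parameters
uint8_t blend_by_depth(uint8_t original, uint8_t blurred,
                       uint8_t depth, uint8_t focal_depth, uint8_t range) {
    if (depth <= focal_depth)
        return original;                                 // in focus: keep as-is
    float alpha = (float)(depth - focal_depth) / range;  // 0..1 blur strength
    if (alpha > 1.0f) alpha = 1.0f;
    return (uint8_t)(alpha * blurred + (1.0f - alpha) * original);
}
</pre>
</div>
<div class="MsoNormal">
<br /></div>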
<div class="MsoNormal">
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhTd7fql_7zxt64BUXLxFNf9AGc4MUGDoabz-h_C-vgcdsrES7j2q-g-4jb78G846TO52P6sqxH5n6opGWMUJrdpyqy-imR2hr7exhsL5h_dQ7ovpoXJ6roKiOGZ5rURPOJ19NM5CnQP-TX/s1600/gimage_image.blur.gauss.9x9.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhTd7fql_7zxt64BUXLxFNf9AGc4MUGDoabz-h_C-vgcdsrES7j2q-g-4jb78G846TO52P6sqxH5n6opGWMUJrdpyqy-imR2hr7exhsL5h_dQ7ovpoXJ6roKiOGZ5rURPOJ19NM5CnQP-TX/s1600/gimage_image.blur.gauss.9x9.png" height="480" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Gaussian smoothing with kernel size 9x9</td></tr>
</tbody></table>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj_VWlf690W0eWvdvPcl079tdkO-NTx52wxJx19YN00sH77JWMEJ7ETt291kDKvhHAk9vhhTFssoGHagvVEZouBJq2gVIHKsZijS7Vo_6ZcQo-ghGCxgFOiiDiiomx8vmYrpygekjPu0Ydt/s1600/gimage_image.blur.mean.2x2.2-passes.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj_VWlf690W0eWvdvPcl079tdkO-NTx52wxJx19YN00sH77JWMEJ7ETt291kDKvhHAk9vhhTFssoGHagvVEZouBJq2gVIHKsZijS7Vo_6ZcQo-ghGCxgFOiiDiiomx8vmYrpygekjPu0Ydt/s1600/gimage_image.blur.mean.2x2.2-passes.png" height="480" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Mean.filter with 7x7 kernel size., 2 passes</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwV7nDPRTdJfSS3W7zZ1C3TXPY-uHZK5KdUjv84APP0LbfsSmwYO__obRRLUkKEynQtkjI8q5fzc-VUlsz2TB8dalrR_HKa9YYrWn41E2v4cODwHSjyE__2NxbXoh0b-hCV7FPgZRkPja7/s1600/gimage_image.blur.mean.11x11.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwV7nDPRTdJfSS3W7zZ1C3TXPY-uHZK5KdUjv84APP0LbfsSmwYO__obRRLUkKEynQtkjI8q5fzc-VUlsz2TB8dalrR_HKa9YYrWn41E2v4cODwHSjyE__2NxbXoh0b-hCV7FPgZRkPja7/s1600/gimage_image.blur.mean.11x11.png" height="480" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Mean.filter with 11x11 kernel size., 1 pass</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZjcx8hP9v_yI8JMscAzBFdm1fojS65ZUrJPl2hSVhgrxiP1RusC9hINpEMPDbXdyo1FBO3BAdetMbTR28cDFZfExLYxZbwR83fwRZ5tHOHDCxyaPniChoRN-5_QRHpy0L7CSFHDJ57Ucr/s1600/gimage_image.blur.mean.15x15.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZjcx8hP9v_yI8JMscAzBFdm1fojS65ZUrJPl2hSVhgrxiP1RusC9hINpEMPDbXdyo1FBO3BAdetMbTR28cDFZfExLYxZbwR83fwRZ5tHOHDCxyaPniChoRN-5_QRHpy0L7CSFHDJ57Ucr/s1600/gimage_image.blur.mean.15x15.png" height="480" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Mean.filter with 15x15 kernel size., 1 pass</td></tr>
</tbody></table>
<br /></div>
Overall, this was quite a simple little experiment, although the quality of the output image is not as
good as Lens Blur's. I guess you do get what you pay for ;-)<br />
<br />
<br />
<div style="background: rgb(255, 255, 255); border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;">
<pre><span style="color: navy;"><span style="line-height: 16.25px;">/*
* The implementation is naive and non-optimized to make this code easier to read
* To use this code you need to download LodePNG(lodepng.h and lodepng.cpp) from
* http://lodev.org/lodepng/.
* Thanks goes to Lode Vandevenne for a great PNG utility!
*
*/
#include "stdafx.h"
#include "lodepng.h"
#include <string>
#include <vector>
#include <algorithm>
#include <assert.h>
#include <stdint.h>
struct image_t {
image_t() : width(0), height(0), channels(0), buf(0) {}
image_t(size_t w, size_t h, size_t c, uint8_t *buf ) :
width(w), height(h), channels(c), buf(buf) {}
size_t width;
size_t height;
size_t channels;
uint8_t *buf;
};
struct box_t {
box_t(size_t w, size_t h) : w(w), h(h) {}
size_t w;
size_t h;
};
struct image_stats {
image_stats() : mean(0) {
memset(histogram, 0, sizeof(histogram));
}
size_t histogram[256];
size_t mean;
};
void calc_stats(image_t img, image_stats &stats) {
uint64_t sum = 0; // 64 bit sum to prevent overflow
// assume the image is grayscale and calc the stats for only the first channel
 for (size_t row=0; row < img.height; row++) {
  for (size_t col=0; col < img.width; col++) {
   uint8_t pixel = img.buf[row*img.width*img.channels + col*img.channels];
   stats.histogram[pixel]++;
   sum += pixel;
  }
 }
 stats.mean = sum / (img.width * img.height);
}
// NOTE: the middle of this listing was garbled by the blog's HTML escaping;
// the Filter hierarchy below is a reconstruction inferred from how the classes
// are used by Gaussian_9x9/Gaussian_5x5, blur_image() and do_blur() further down.
class Filter {
public:
 Filter(const image_t &input, const image_t &output, box_t size) :
  input(input), output(output), size(size) {
  // restricting the filter to odd dimensions so the kernel has a center pixel
  assert(size.w % 2 && size.h % 2);
 }
 // execute the filter on a single pixel; pixels so close to the border that
 // a full kernel will not fit are returned untouched (lazy, but simple)
 size_t execute(size_t row, size_t col, size_t color) const {
  if (row < size.h/2 || row+size.h/2 >= input.height ||
   col < size.w/2 || col+size.w/2 >= input.width)
   return input.buf[row*input.width*input.channels + col*input.channels + color];
  return convolve(row, col, color);
 }
protected:
 virtual size_t convolve(size_t row, size_t col, size_t color) const = 0;
 const image_t &input;
 const image_t &output;
 box_t size;
};
struct BlurConfig {
 BlurConfig(const Filter &filter, size_t num_passes, size_t threshold) :
  filter(filter), num_passes(num_passes), threshold(threshold) {}
 const Filter &filter;
 size_t num_passes;
 size_t threshold;
};
// replace thresholded pixels with a constant (black) value
class Constant : public Filter {
public:
 Constant(const image_t &input, const image_t &output) :
  Filter(input, output, box_t(1,1)) {}
protected:
 size_t convolve(size_t row, size_t col, size_t color) const {
  return 0;
 }
};
class MeanBlur : public Filter {
public:
 MeanBlur(const image_t &input, const image_t &output, box_t size) :
  Filter(input, output, size) {}
protected:
 size_t convolve(size_t row, size_t col, size_t color) const {
  size_t total = 0;
  for (size_t y=row-size.h/2; y<=row+size.h/2; y++) {
   for (size_t x=col-size.w/2; x<=col+size.w/2; x++) {
    total += input.buf[y*input.width*input.channels + x*input.channels + color];
   }
  }
  return total / (size.w*size.h);
 }
};
class MedianBlur : public Filter {
public:
 MedianBlur(const image_t &input, const image_t &output, box_t size) :
  Filter(input, output, size) {}
protected:
 size_t convolve(size_t row, size_t col, size_t color) const {
  std::vector<uint8_t> v;
  for (size_t y=row-size.h/2; y<=row+size.h/2; y++) {
   for (size_t x=col-size.w/2; x<=col+size.w/2; x++) {
    v.push_back(
     input.buf[y*input.width*input.channels + x*input.channels + color]);
   }
  }
  std::nth_element( v.begin(), v.begin()+(v.size()/2), v.end() );
  return v[v.size()/2];
 }
};
class Gaussian_9x9 : public Filter {
public:
Gaussian_9x9(const image_t &input, const image_t &output) :
Filter(input, output, box_t(9,9)) {}
private:
size_t convolve(size_t row, size_t col, size_t color) const {
static const
uint8_t kernel[9][9] = {{0, 0, 1, 1, 1, 1, 1, 0, 0},
{0, 1, 2, 3, 3, 3, 2, 1, 0},
{1, 2, 3, 6, 7, 6, 3, 2, 1},
{1, 3, 6, 9, 11, 9, 6, 3, 1},
{1, 3, 7, 11, 12, 11, 7, 3, 1},
{1, 3, 6, 9, 11, 9, 6, 3, 1},
{1, 2, 3, 6, 7, 6, 3, 2, 1},
{0, 1, 2, 3, 3, 3, 2, 1, 0},
{0, 0, 1, 1, 1, 1, 1, 0, 0}};
static const size_t kernel_sum = 256;
size_t total = 0;
for (size_t y=row-size.h/2; y<=row+size.h/2; y++) {
for (size_t x=col-size.w/2; x<=col+size.w/2; x++) {
total += input.buf[y*input.width*input.channels + x*input.channels + </span></span></pre>
<pre><span style="color: navy;"><span style="line-height: 16.25px;"> color] *
kernel[y-row+size.h/2][x-col+size.w/2];
}
}
return total/kernel_sum;
}
};
class Gaussian_5x5 : public Filter {
public:
Gaussian_5x5(const image_t &input, const image_t &output) : </span></span></pre>
<pre><span style="color: navy;"><span style="line-height: 16.25px;"> Filter(input, output, box_t(5,5)) {}
protected:
size_t convolve(size_t row, size_t col, size_t color) const {
static const
uint8_t kernel[5][5] = {{ 1, 4, 7, 4, 1},
{ 4, 16, 26, 16, 4},
{ 7, 26, 41, 26, 7},
{ 4, 16, 26, 16, 4},
{ 1, 4, 7, 4, 1}};
static const size_t kernel_sum = 273;
size_t total = 0;
for (size_t y=row-size.h/2; y<=row+size.h/2; y++) {
for (size_t x=col-size.w/2; x<=col+size.w/2; x++) {
// convolve
total += input.buf[y*input.width*input.channels + x*input.channels + </span></span></pre>
<pre><span style="color: navy;"><span style="line-height: 16.25px;"> color] *
kernel[y-row+size.h/2][x-col+size.w/2];
}
}
return total/kernel_sum;
}
};
void blur_image(const image_t &input_img, const image_t &output_img,
const image_t &depth_img, const BlurConfig &cfg) {
size_t width = input_img.width;
size_t height = input_img.height;
size_t channels = input_img.channels;
for (size_t pass=cfg.num_passes; pass>0; pass--) {
  for (size_t row=0; row < height; row++) {
   for (size_t col=0; col < width; col++) {
    for (size_t color=0; color < channels; color++) {
     // crude thresholding: only pixels whose depth exceeds the threshold get blurred
     if (depth_img.buf[row*width*channels + col*channels + color] > cfg.threshold) {
size_t new_pixel = cfg.filter.execute(row, col, color);
output_img.buf[row*width*channels+col*channels+color] = new_pixel;
} else {
output_img.buf[row*width*channels + col*channels + color] =
input_img.buf[row*width*channels + col*channels + color];
}
}
}
 }</span></span></pre>
<pre><span style="color: navy;"><span style="line-height: 16.25px;"> </span></span><span style="color: navy; line-height: 16.25px;">// going for another pass: the input for the next pass will be the output </span></pre>
<pre><span style="color: navy; line-height: 16.25px;"> // of this pass</span><span style="color: navy;"><span style="line-height: 16.25px;">
</span></span><span style="color: navy;"><span style="line-height: 16.25px;"> if ( pass > 1 )
memcpy(input_img.buf, output_img.buf, height*width*channels);
}
}
void do_blur() {
const std::string wdir("");
const std::string inimage(wdir + "gimage_image.png");
const std::string outimage(wdir + "gimage_image.blur.png");
const std::string depthfile(wdir + "gimage_depth.png");
image_t depth_img;
depth_img.channels = 3;
unsigned error = lodepng_decode24_file(&depth_img.buf, &depth_img.width,
&depth_img.height, depthfile.c_str());
if(error) {
printf("[%s] decoder error %u: %s\n", depthfile.c_str(), error, </span></span></pre>
<pre><span style="color: navy;"><span style="line-height: 16.25px;"> lodepng_error_text(error));
return;
}
image_t input_img;
input_img.channels = 3;
error = lodepng_decode24_file(&input_img.buf, &input_img.width,
&input_img.height, inimage.c_str());
if(error) {
printf("[%s] decoder error %u: %s\n", depthfile.c_str(), error, </span></span></pre>
<pre><span style="color: navy;"><span style="line-height: 16.25px;"> lodepng_error_text(error));
return;
}
image_t output_img(input_img.width,
input_img.height,
input_img.channels,
(uint8_t *) </span></span></pre>
<pre><span style="color: navy; line-height: 16.25px;"> malloc(input_img.width*input_img.height*input_img.channels));</span></pre>
<pre><span style="color: navy;"><span style="line-height: 16.25px;">
image_stats depth_stats;
calc_stats(depth_img, depth_stats);
</span></span></pre>
<pre><span style="color: navy;"><span style="line-height: 16.25px;"> // Choose one of these filters or add your own
 // Set the filter configuration: filter algorithm and size, number of passes, threshold
BlurConfig cfg(MeanBlur(input_img, output_img, box_t(7,7)), 3, depth_stats.mean);</span></span></pre>
<pre><span style="color: navy;"><span style="line-height: 16.25px;"> /*
BlurConfig cfg(MeanBlur(input_img, output_img, box_t(11,11)), 2, depth_stats.mean);
BlurConfig cfg(MedianBlur(input_img, output_img, box_t(7,7)), 1, depth_stats.mean);
BlurConfig cfg(Constant(input_img, output_img), 1, depth_stats.mean);
BlurConfig cfg(Gaussian_9x9(input_img, output_img), 1, depth_stats.mean);
BlurConfig cfg(Gaussian_5x5(input_img, output_img), 5, depth_stats.mean);</span></span></pre>
<pre><span style="color: navy;"><span style="line-height: 16.25px;"> */
blur_image(input_img, output_img, depth_img, cfg);
error = lodepng_encode24_file(outimage.c_str(), output_img.buf,
output_img.width, output_img.height);
if (error)
printf("[%s] encoder error %u: %s\n", outimage.c_str(), error, </span></span></pre>
<pre><span style="color: navy;"><span style="line-height: 16.25px;"> lodepng_error_text(error));
free(depth_img.buf);
free(input_img.buf);
free(output_img.buf);
}
int main(int argc, char* argv[])
{
do_blur();
return 0;
}</span></span><span style="line-height: 16.25px;">
</span></pre>
</div>
<br />netazhttp://www.blogger.com/profile/13820189991503080577noreply@blogger.com0tag:blogger.com,1999:blog-182549309027052933.post-1373093498124847002014-06-07T01:48:00.000+03:002015-05-14T12:26:28.285+03:00Google's Depth Map<a href="http://www.codeproject.com" rel="tag" style="display:none">CodeProject</a>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div style="text-align: justify;">
In my <a href="http://netaz.blogspot.com/2014/06/androids-hidden-and-future-camera-apis.html" target="_blank">previous post</a> I reported on Android's (presumed) new camera Java API and I briefly mentioned that its purpose is to provide the application developer more control over the camera, therefore allowing innovation in the camera application space. Google's recent updates to the stock Android camera application includes a feature called <a href="http://googleresearch.blogspot.co.il/2014/04/lens-blur-in-new-google-camera-app.html" target="_blank">Lens Blur</a>, which I suspect uses the new camera API to capture the series of frames required for the depth-map calculation (I am pretty sure that Lens Blur is only available on Nexus phones, BTW). In this post I want to examine the image files generated by Lens Blur.
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Google uses <a href="http://en.wikipedia.org/wiki/Extensible_Metadata_Platform" target="_blank">XMP</a> extended JPEG for storing Lens Blur picture files. The beauty of XMP is that arbitrary metadata can be added to a file without causing any problems for existing image viewing applications. Google's XMP's based depth-map storage format is described by Google on their <a href="https://developers.google.com/depthmap-metadata/" target="_blank">developer pages</a> but not all metadata fields are actually used by Lens Blur; and not all metadata used by Lens Blur are described on the developer pages. To look closer at this depth XMP format, you can copy a Len Blur image (JPEG) from your Android phone to your PC and open the file using a text editor. You should see the XMP metadata similar to the pasted data below:
</div>
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: navy;"><x:xmpmeta</span> <span style="color: teal;">xmlns:x=</span><span style="color: #bb8844;">"adobe:ns:meta/"</span> <span style="color: teal;">x:xmptk=</span><span style="color: #bb8844;">"Adobe XMP Core 5.1.0-jc003"</span><span style="color: navy;">></span>
<span style="color: navy;"><rdf:RDF</span> <span style="color: teal;">xmlns:rdf=</span><span style="color: #bb8844;">"http://www.w3.org/1999/02/22-rdf-syntax-ns#"</span><span style="color: navy;">></span>
<span style="color: navy;"><rdf:Description</span> <span style="color: teal;">rdf:about=</span><span style="color: #bb8844;">""</span>
<span style="color: teal;">xmlns:GFocus=</span><span style="color: #bb8844;">"http://ns.google.com/photos/1.0/focus/"</span>
<span style="color: teal;">xmlns:GImage=</span><span style="color: #bb8844;">"http://ns.google.com/photos/1.0/image/"</span>
<span style="color: teal;">xmlns:GDepth=</span><span style="color: #bb8844;">"http://ns.google.com/photos/1.0/depthmap/"</span>
<span style="color: teal;">xmlns:xmpNote=</span><span style="color: #bb8844;">"http://ns.adobe.com/xmp/note/"</span>
<span style="color: teal;">GFocus:BlurAtInfinity=</span><span style="color: #bb8844;">"0.0083850715"</span>
<span style="color: teal;">GFocus:FocalDistance=</span><span style="color: #bb8844;">"18.49026"</span>
<span style="color: teal;">GFocus:FocalPointX=</span><span style="color: #bb8844;">"0.5078125"</span>
<span style="color: teal;">GFocus:FocalPointY=</span><span style="color: #bb8844;">"0.30208334"</span>
<span style="color: teal;">GImage:Mime=</span><span style="color: #bb8844;">"image/jpeg"</span>
<span style="color: teal;">GDepth:Format=</span><span style="color: #bb8844;">"RangeInverse"</span>
<span style="color: teal;">GDepth:Near=</span><span style="color: #bb8844;">"11.851094245910645"</span>
<span style="color: teal;">GDepth:Far=</span><span style="color: #bb8844;">"51.39698028564453"</span>
<span style="color: teal;">GDepth:Mime=</span><span style="color: #bb8844;">"image/png"</span>
<span style="color: teal;">xmpNote:HasExtendedXMP=</span><span style="color: #bb8844;">"7CAF4BA13EEBAC578997926C2A696679"</span><span style="color: navy;">/></span>
<span style="color: navy;"></rdf:RDF></span>
<span style="color: navy;"></x:xmpmeta></span>
</pre>
</div>
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: navy;"><x:xmpmeta</span> <span style="color: teal;">xmlns:x=</span><span style="color: #bb8844;">"adobe:ns:meta/"</span> <span style="color: teal;">x:xmptk=</span><span style="color: #bb8844;">"Adobe XMP Core 5.1.0-jc003"</span><span style="color: navy;">></span>
<span style="color: navy;"><rdf:RDF</span> <span style="color: teal;">xmlns:rdf=</span><span style="color: #bb8844;">"http://www.w3.org/1999/02/22-rdf-syntax-ns#"</span><span style="color: navy;">></span>
<span style="color: navy;"><rdf:Description</span> <span style="color: teal;">rdf:about=</span><span style="color: #bb8844;">""</span>
<span style="color: teal;">xmlns:GImage=</span><span style="color: #bb8844;">"http://ns.google.com/photos/1.0/image/"</span>
<span style="color: teal;">xmlns:GDepth=</span><span style="color: #bb8844;">"http://ns.google.com/photos/1.0/depthmap/"</span>
<span style="color: teal;">GImage:Data=</span><span style="color: #bb8844;">"/9j/4AAQSkZJRQABAAD/2wBDAAUDBAQEAwUEBAQFBQUGBwwIBw...."</span>
<span style="color: teal;">GDepth:Data=</span><span style="color: #bb8844;">"iVBORw0KGgoAAAANSUhEUgAABAAAAAMACAYAAAC6uh......"</span>
<span style="background-color: #e3d2d2; color: #a61717;"></rdf:RDF</span><span style="color: navy;">></span>
<span style="color: navy;"></x:xmpmeta></span>
</pre>
</div>
<br />
Two fields, GImage:Data and GDepth:Data, are particularly interesting. The former stores the original image, which I suppose is one of the series of images captured by the application. The latter stores the depth map as <a href="https://developers.google.com/depthmap-metadata/encoding" target="_blank">described by Google</a> and as annotated by the metadata in the first RDF structure. The binary JPEG data that follows is the image that is actually displayed by the viewer; it is not necessarily the same picture that is stored in GImage:Data, because it may be the product of a Lens Blur transformation. Storing the original picture data together with the depth-map and the "blurred" image takes a lot of room, but it gives you the freedom to continuously alter the same picture. It is quite a nice feature.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><br /></td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiETjKqQZndFj0bjQPTnua2G4FKXvc0RchuWVzdsZ4WzUlr8vCu5fNeGE3pTA2SMwjbs-k27oM9-KwZf8S_CQu_PsJk8rsuVVJ2MQZMwb3KOI-CuxO1n2jASrJWr4x2Ui03Bs4nKk0SnTGc/s1600/lens_blur.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiETjKqQZndFj0bjQPTnua2G4FKXvc0RchuWVzdsZ4WzUlr8vCu5fNeGE3pTA2SMwjbs-k27oM9-KwZf8S_CQu_PsJk8rsuVVJ2MQZMwb3KOI-CuxO1n2jASrJWr4x2Ui03Bs4nKk0SnTGc/s1600/lens_blur.jpg" height="300" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Figure 1: Lens Blur output</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3xqt0IUNSlqByvx5vfe6-IBGzGb-EcL4O9tguJrMfnxzBleBTtc4RdAxP3pialj9Rf-Als2pO3HqGCx0wx4shK-Y-KkzZMZ30n_V2pzVH2Ncuv0RH5RQl4gTNu-gUg1-azJbMVjlHi4ad/s1600/gimage_image_data.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3xqt0IUNSlqByvx5vfe6-IBGzGb-EcL4O9tguJrMfnxzBleBTtc4RdAxP3pialj9Rf-Als2pO3HqGCx0wx4shK-Y-KkzZMZ30n_V2pzVH2Ncuv0RH5RQl4gTNu-gUg1-azJbMVjlHi4ad/s1600/gimage_image_data.jpg" height="300" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="font-size: 13px;">Figure 2: image data stored in GImage:Data.<br />
Notice that the background is very sharp compared to figure 1.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://www.blogger.com/blogger.g?blogID=182549309027052933&pli=1" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="" /></a></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEij32meZAReFtPBLc_LiQ-VA8ydctKwfx-8iWlhvSg7CyJBotU4Ye1UUdd-lsR3oMsATwEWzRZ-2GeoDVwpAyTRuzFAJK51ShkvZDfIiUJ62vuQXK5YnpcQe7j8PovZh7ffP95vPP3hsi2r/s1600/gimage_depth.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEij32meZAReFtPBLc_LiQ-VA8ydctKwfx-8iWlhvSg7CyJBotU4Ye1UUdd-lsR3oMsATwEWzRZ-2GeoDVwpAyTRuzFAJK51ShkvZDfIiUJ62vuQXK5YnpcQe7j8PovZh7ffP95vPP3hsi2r/s1600/gimage_depth.jpg" height="300" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="font-size: 13px;">Figure 3: Depth map as extracted from GDepth:Data</td></tr>
</tbody></table>
</td></tr>
</tbody></table>
GImage:Data and GDepth:Data are XML text fields, so their content must be encoded textually somehow, and Google chose Base64 for the encoding. When I decoded these fields I found that the image data (GImage:Data) is a JPEG image, and the depth-map (GDepth:Data) is a PNG image.<br />
<br />
The following code extracts GImage:Data and GDepth:Data into two separate files (JPEG and PNG formats, respectively). It starts by opening the Lens Blur file and searching for either GDepth:Data= or GImage:Data=. It then decodes the Base64 data and writes the decoded data into new files. It is quite straightforward except for a small caveat: interspersed within the GDepth:Data and GImage:Data is some junk that Google inserted in the form of a name-space URL descriptor (http://ns.adobe.com/xmp/extension/), a hash value, and some binary-valued bytes. I remove these simply by skipping 79 bytes once I detect a 0xFF byte.
<br />
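To make that caveat concrete, here is a small sketch of the skipping idea (my own illustration of the sentence above, not necessarily the exact loop used in the full listing):<br />
<br />
<div style="background: #ffffff; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre>// Read the next usable base64 character; when a 0xFF marker byte is found,
// skip the 79 bytes of extension-header junk (namespace URL, hash, etc.)
bool next_base64_char(std::ifstream &image, char &c) {
    while (image.get(c)) {
        if ((unsigned char)c == 0xFF) {   // start of an inserted XMP extension header
            image.ignore(79);             // skip the junk and keep reading
            continue;
        }
        return true;
    }
    return false;
}
</pre>
</div>
<br />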
<!-- HTML generated using hilite.me --><br />
<div style="background: #ffffff; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #999988; font-style: italic;">// Naive O(n) string matcher</span>
<span style="color: #999988; font-style: italic;">// It is naive because it always moves the "cursor" forward - even when a match fails.</span>
<span style="color: #999988; font-style: italic;">// This is a correct assumption that we can make in the context of this program.</span>
<span style="color: #445588; font-weight: bold;">bool</span> match(std<span style="font-weight: bold;">::</span>ifstream <span style="font-weight: bold;">&</span>image, <span style="font-weight: bold;">const</span> std<span style="font-weight: bold;">::</span>string <span style="font-weight: bold;">&</span>to_match) {
<span style="color: #445588; font-weight: bold;">size_t</span> matched <span style="font-weight: bold;">=</span> <span style="color: #009999;">0</span>;
<span style="font-weight: bold;">while</span> (<span style="font-weight: bold;">!</span>image.eof()) {
<span style="color: #445588; font-weight: bold;">char</span> c;
image.get(c);
<span style="font-weight: bold;">if</span> (image.bad())
<span style="font-weight: bold;">return</span> <span style="color: #999999;">false</span>;
<span style="font-weight: bold;">if</span> (c <span style="font-weight: bold;">==</span> to_match[matched]) {
matched<span style="font-weight: bold;">++</span>;
<span style="font-weight: bold;">if</span> (matched<span style="font-weight: bold;">==</span>to_match.size())
<span style="font-weight: bold;">return</span> <span style="color: #999999;">true</span>;
}
<span style="font-weight: bold;">else</span> {
matched <span style="font-weight: bold;">=</span> <span style="color: #009999;">0</span>;
}
}
<span style="font-weight: bold;">return</span> <span style="color: #999999;">false</span>;
}
<span style="font-weight: bold;">class</span> <span style="color: #445588; font-weight: bold;">Base64Decoder</span> {
public:
Base64Decoder() <span style="font-weight: bold;">:</span> base64_idx(<span style="color: #009999;">0</span>) {}
<span style="color: #445588; font-weight: bold;">bool</span> add(<span style="color: #445588; font-weight: bold;">char</span> c);
<span style="color: #445588; font-weight: bold;">size_t</span> <span style="color: #990000; font-weight: bold;">decode</span>(<span style="color: #445588; font-weight: bold;">char</span> binary[<span style="color: #009999;">3</span>]);
private:
<span style="font-weight: bold;">static</span> <span style="color: #445588; font-weight: bold;">int32_t</span> <span style="color: #990000; font-weight: bold;">decode</span>(<span style="color: #445588; font-weight: bold;">char</span> c);
<span style="color: #445588; font-weight: bold;">char</span> base64[<span style="color: #009999;">4</span>];
<span style="color: #445588; font-weight: bold;">size_t</span> base64_idx;
};
<span style="color: #445588; font-weight: bold;">bool</span> Base64Decoder<span style="font-weight: bold;">::</span>add(<span style="color: #445588; font-weight: bold;">char</span> c) {
<span style="color: #445588; font-weight: bold;">int32_t</span> val <span style="font-weight: bold;">=</span> decode(c);
<span style="font-weight: bold;">if</span> (val <span style="font-weight: bold;"><</span> <span style="color: #009999;">0</span>)
<span style="font-weight: bold;">return</span> <span style="color: #999999;">false</span>;
base64[base64_idx <span style="font-weight: bold;">%</span> <span style="color: #009999;">4</span>] <span style="font-weight: bold;">=</span> c;
base64_idx <span style="font-weight: bold;">=</span> <span style="font-weight: bold;">++</span>base64_idx <span style="font-weight: bold;">%</span> <span style="color: #009999;">4</span>;
<span style="font-weight: bold;">if</span> (base64_idx <span style="font-weight: bold;">%</span> <span style="color: #009999;">4</span> <span style="font-weight: bold;">==</span> <span style="color: #009999;">0</span>) {
<span style="font-weight: bold;">return</span> <span style="color: #999999;">true</span>;
}
<span style="font-weight: bold;">return</span> <span style="color: #999999;">false</span>;
}
<span style="font-weight: bold;">inline</span>
<span style="color: #445588; font-weight: bold;">size_t</span> Base64Decoder<span style="font-weight: bold;">::</span>decode(<span style="color: #445588; font-weight: bold;">char</span> binary[<span style="color: #009999;">3</span>]) {
<span style="font-weight: bold;">if</span> (base64[<span style="color: #009999;">3</span>] <span style="font-weight: bold;">==</span> <span style="color: #bb8844;">'='</span>) {
<span style="font-weight: bold;">if</span> (base64[<span style="color: #009999;">3</span>] <span style="font-weight: bold;">==</span> <span style="color: #bb8844;">'='</span>) {
<span style="color: #445588; font-weight: bold;">int32_t</span> tmp <span style="font-weight: bold;">=</span> decode(base64[<span style="color: #009999;">0</span>]) <span style="font-weight: bold;"><<</span> <span style="color: #009999;">18</span>;
binary[<span style="color: #009999;">2</span>] <span style="font-weight: bold;">=</span> binary[<span style="color: #009999;">1</span>] <span style="font-weight: bold;">=</span> <span style="color: #009999;">0</span>;
binary[<span style="color: #009999;">0</span>] <span style="font-weight: bold;">=</span> (tmp<span style="font-weight: bold;">>></span><span style="color: #009999;">16</span>) <span style="font-weight: bold;">&</span> <span style="color: #009999;">0xff</span>;
<span style="font-weight: bold;">return</span> <span style="color: #009999;">1</span>;
} <span style="font-weight: bold;">else</span> {
<span style="color: #445588; font-weight: bold;">int32_t</span> tmp <span style="font-weight: bold;">=</span> decode(base64[<span style="color: #009999;">0</span>]) <span style="font-weight: bold;"><<</span> <span style="color: #009999;">18</span> <span style="font-weight: bold;">|</span>
decode(base64[<span style="color: #009999;">1</span>]) <span style="font-weight: bold;"><<</span> <span style="color: #009999;">12</span>;
binary[<span style="color: #009999;">2</span>] <span style="font-weight: bold;">=</span> <span style="color: #009999;">0</span>;
binary[<span style="color: #009999;">1</span>] <span style="font-weight: bold;">=</span> (tmp<span style="font-weight: bold;">>></span><span style="color: #009999;">8</span>) <span style="font-weight: bold;">&</span> <span style="color: #009999;">0xff</span>;
binary[<span style="color: #009999;">0</span>] <span style="font-weight: bold;">=</span> (tmp<span style="font-weight: bold;">>></span><span style="color: #009999;">16</span>) <span style="font-weight: bold;">&</span> <span style="color: #009999;">0xff</span>;
<span style="font-weight: bold;">return</span> <span style="color: #009999;">2</span>;
}
}
<span style="color: #445588; font-weight: bold;">int32_t</span> tmp <span style="font-weight: bold;">=</span> decode(base64[<span style="color: #009999;">0</span>]) <span style="font-weight: bold;"><<</span> <span style="color: #009999;">18</span> <span style="font-weight: bold;">|</span>
decode(base64[<span style="color: #009999;">1</span>]) <span style="font-weight: bold;"><<</span> <span style="color: #009999;">12</span> <span style="font-weight: bold;">|</span>
decode(base64[<span style="color: #009999;">2</span>]) <span style="font-weight: bold;"><<</span> <span style="color: #009999;">6</span> <span style="font-weight: bold;">|</span>
decode(base64[<span style="color: #009999;">3</span>]);
binary[<span style="color: #009999;">2</span>] <span style="font-weight: bold;">=</span> (tmp <span style="font-weight: bold;">&</span> <span style="color: #009999;">0xff</span>);
binary[<span style="color: #009999;">1</span>] <span style="font-weight: bold;">=</span> (tmp<span style="font-weight: bold;">>></span><span style="color: #009999;">8</span>) <span style="font-weight: bold;">&</span> <span style="color: #009999;">0xff</span>;
binary[<span style="color: #009999;">0</span>] <span style="font-weight: bold;">=</span> (tmp<span style="font-weight: bold;">>></span><span style="color: #009999;">16</span>) <span style="font-weight: bold;">&</span> <span style="color: #009999;">0xff</span>;
<span style="font-weight: bold;">return</span> <span style="color: #009999;">3</span>;
}
<span style="color: #999988; font-style: italic;">// Decoding can be alternatively performed by a lookup table</span>
<span style="font-weight: bold;">inline</span>
<span style="color: #445588; font-weight: bold;">int32_t</span> Base64Decoder<span style="font-weight: bold;">::</span>decode(<span style="color: #445588; font-weight: bold;">char</span> c) {
<span style="font-weight: bold;">if</span> (c<span style="font-weight: bold;">>=</span> <span style="color: #bb8844;">'A'</span> <span style="font-weight: bold;">&&</span> c<span style="font-weight: bold;"><=</span><span style="color: #bb8844;">'Z'</span>)
<span style="font-weight: bold;">return</span> (c<span style="font-weight: bold;">-</span><span style="color: #bb8844;">'A'</span>);
<span style="font-weight: bold;">if</span> (c<span style="font-weight: bold;">>=</span><span style="color: #bb8844;">'a'</span> <span style="font-weight: bold;">&&</span> c<span style="font-weight: bold;"><=</span><span style="color: #bb8844;">'z'</span>)
<span style="font-weight: bold;">return</span> (<span style="color: #009999;">26</span><span style="font-weight: bold;">+</span>c<span style="font-weight: bold;">-</span><span style="color: #bb8844;">'a'</span>);
<span style="font-weight: bold;">if</span> (c<span style="font-weight: bold;">>=</span><span style="color: #bb8844;">'0'</span> <span style="font-weight: bold;">&&</span> c<span style="font-weight: bold;"><=</span><span style="color: #bb8844;">'9'</span>)
<span style="font-weight: bold;">return</span> (<span style="color: #009999;">52</span><span style="font-weight: bold;">+</span>c<span style="font-weight: bold;">-</span><span style="color: #bb8844;">'0'</span>);
<span style="font-weight: bold;">if</span> (c<span style="font-weight: bold;">==</span><span style="color: #bb8844;">'+'</span>)
<span style="font-weight: bold;">return</span> <span style="color: #009999;">62</span>;
<span style="font-weight: bold;">if</span> (c<span style="font-weight: bold;">==</span><span style="color: #bb8844;">'/'</span>)
<span style="font-weight: bold;">return</span> <span style="color: #009999;">63</span>;
<span style="font-weight: bold;">return</span> <span style="font-weight: bold;">-</span><span style="color: #009999;">1</span>;
}
<span style="color: #445588; font-weight: bold;">bool</span> decode_and_save(<span style="color: #445588; font-weight: bold;">char</span> <span style="font-weight: bold;">*</span>buf, <span style="color: #445588; font-weight: bold;">size_t</span> buflen, Base64Decoder <span style="font-weight: bold;">&</span>decoder, std<span style="font-weight: bold;">::</span>ofstream <span style="font-weight: bold;">&</span>depth_map) {
<span style="color: #445588; font-weight: bold;">size_t</span> i <span style="font-weight: bold;">=</span> <span style="color: #009999;">0</span>;
<span style="font-weight: bold;">while</span> (i <span style="font-weight: bold;"><</span> buflen) {
<span style="color: #999988; font-style: italic;">// end of depth data</span>
<span style="font-weight: bold;">if</span> (buf[i] <span style="font-weight: bold;">==</span> <span style="color: #bb8844;">'\"'</span>)
<span style="font-weight: bold;">return</span> <span style="color: #999999;">true</span>;
<span style="font-weight: bold;">if</span> (buf[i] <span style="font-weight: bold;">==</span> (<span style="color: #445588; font-weight: bold;">char</span>)<span style="color: #009999;">0xff</span>) {
<span style="color: #999988; font-style: italic;">// this is Google junk which we need to skip</span>
i <span style="font-weight: bold;">+=</span> <span style="color: #009999;">79</span>; <span style="color: #999988; font-style: italic;">// this is the length of the junk</span>
assert(i <span style="font-weight: bold;"><</span> buflen);
}
<span style="font-weight: bold;">if</span> (decoder.add(buf[i])) {
<span style="color: #445588; font-weight: bold;">char</span> binary[<span style="color: #009999;">3</span>];
<span style="color: #445588; font-weight: bold;">size_t</span> bin_len <span style="font-weight: bold;">=</span> decoder.decode(binary);
depth_map.write(binary, bin_len);
}
i<span style="font-weight: bold;">++</span>;
}
<span style="font-weight: bold;">return</span> <span style="color: #999999;">false</span>;
}
<span style="color: #445588; font-weight: bold;">void</span> extract_depth_map(<span style="font-weight: bold;">const</span> std<span style="font-weight: bold;">::</span>string <span style="font-weight: bold;">&</span>infile, <span style="font-weight: bold;">const</span> std<span style="font-weight: bold;">::</span>string <span style="font-weight: bold;">&</span>outfile, <span style="color: #445588; font-weight: bold;">bool</span> extract_depth) {
std<span style="font-weight: bold;">::</span>ifstream blur_image;
blur_image.open (infile, std<span style="font-weight: bold;">::</span>ios<span style="font-weight: bold;">::</span>binary <span style="font-weight: bold;">|</span> std<span style="font-weight: bold;">::</span>ios<span style="font-weight: bold;">::</span>in);
<span style="font-weight: bold;">if</span> (<span style="font-weight: bold;">!</span>blur_image.is_open()) {
std<span style="font-weight: bold;">::</span>cout <span style="font-weight: bold;"><<</span> <span style="color: #bb8844;">"oops - file "</span> <span style="font-weight: bold;"><<</span> infile <span style="font-weight: bold;"><<</span> <span style="color: #bb8844;">" did not open"</span> <span style="font-weight: bold;"><<</span> std<span style="font-weight: bold;">::</span>endl;
<span style="font-weight: bold;">return</span>;
}
<span style="color: #445588; font-weight: bold;">bool</span> b <span style="font-weight: bold;">=</span> <span style="color: #999999;">false</span>;
<span style="font-weight: bold;">if</span> (extract_depth)
b <span style="font-weight: bold;">=</span> match(blur_image, <span style="color: #bb8844;">"GDepth:Data=\""</span>);
<span style="font-weight: bold;">else</span>
b <span style="font-weight: bold;">=</span> match(blur_image, <span style="color: #bb8844;">"GImage:Data=\""</span>);
<span style="font-weight: bold;">if</span> (<span style="font-weight: bold;">!</span>b) {
std<span style="font-weight: bold;">::</span>cout <span style="font-weight: bold;"><<</span> <span style="color: #bb8844;">"oops - file "</span> <span style="font-weight: bold;"><<</span> infile <span style="font-weight: bold;"><<</span> <span style="color: #bb8844;">" does not contain depth/image info"</span> <span style="font-weight: bold;"><<</span> std<span style="font-weight: bold;">::</span>endl;
<span style="font-weight: bold;">return</span>;
}
std<span style="font-weight: bold;">::</span>ofstream depth_map;
depth_map.open (outfile, std<span style="font-weight: bold;">::</span>ios<span style="font-weight: bold;">::</span>binary <span style="font-weight: bold;">|</span> std<span style="font-weight: bold;">::</span>ios<span style="font-weight: bold;">::</span>out);
<span style="font-weight: bold;">if</span> (<span style="font-weight: bold;">!</span>depth_map.is_open()) {
std<span style="font-weight: bold;">::</span>cout <span style="font-weight: bold;"><<</span> <span style="color: #bb8844;">"oops - file "</span> <span style="font-weight: bold;"><<</span> outfile <span style="font-weight: bold;"><<</span> <span style="color: #bb8844;">" did not open"</span> <span style="font-weight: bold;"><<</span> std<span style="font-weight: bold;">::</span>endl;
<span style="font-weight: bold;">return</span>;
}
<span style="color: #999988; font-style: italic;">// Consume the data, decode from base64, and write out to file.</span>
<span style="color: #445588; font-weight: bold;">char</span> buf[<span style="color: #009999;">10</span> <span style="font-weight: bold;">*</span> <span style="color: #009999;">1024</span>];
<span style="color: #445588; font-weight: bold;">bool</span> done <span style="font-weight: bold;">=</span> <span style="color: #999999;">false</span>;
Base64Decoder decoder;
<span style="font-weight: bold;">while</span> (<span style="font-weight: bold;">!</span>blur_image.eof() <span style="font-weight: bold;">&&</span> <span style="font-weight: bold;">!</span>done) {
blur_image.read(buf, <span style="font-weight: bold;">sizeof</span>(buf));
done <span style="font-weight: bold;">=</span> decode_and_save(buf, blur_image.gcount(), decoder, depth_map);
}
blur_image.close();
depth_map.close();
}
<span style="color: #445588; font-weight: bold;">void</span> main() {
<span style="font-weight: bold;">const</span> std<span style="font-weight: bold;">::</span>string wdir(<span style="color: #bb8844;">""</span>); <span style="color: #999988; font-style: italic;">// put here the path to your files</span>
<span style="font-weight: bold;">const</span> std<span style="font-weight: bold;">::</span>string infile(wdir <span style="font-weight: bold;">+</span> <span style="color: #bb8844;">"gimage_original.jpg"</span>);
<span style="font-weight: bold;">const</span> std<span style="font-weight: bold;">::</span>string imagefile(wdir <span style="font-weight: bold;">+</span> <span style="color: #bb8844;">"gimage_image.jpg"</span>);
<span style="font-weight: bold;">const</span> std<span style="font-weight: bold;">::</span>string depthfile(wdir <span style="font-weight: bold;">+</span> <span style="color: #bb8844;">"gimage_depth.png"</span>);
extract_depth_map(infile, depthfile, <span style="color: #999999;">true</span>);
extract_depth_map(infile, imagefile, <span style="color: #999999;">false</span>);
}
</pre>
</div>
<br />
If you want to use the depth-map and image data algorithmically (e.g. to generate your own blurred image), don't forget to decompress the JPEG and PNG files, otherwise you will be accessing compressed pixel data. I used IrfanView to generate raw RGB files, which I then manipulated and converted back to BMP files. I didn't include this code because it is not particularly interesting. Some other time I might describe how to use Halide ("a language for image processing and computational photography") to process the depth-map to create new images.netazhttp://www.blogger.com/profile/13820189991503080577noreply@blogger.com1tag:blogger.com,1999:blog-182549309027052933.post-9252188250577551872014-06-01T01:31:00.001+03:002014-06-07T01:51:52.783+03:00Android's Hidden (and Future) Camera APIs<div style="text-align: left;">
With the release of the KitKat AOSP source code, Google also exposed its plans for a new API for the camera - <span style="text-align: center;"> </span><span style="text-align: center;">package </span><span style="text-align: center;">android.hardware.camera2</span>. The new interfaces and classes are marked as @hide and sit quietly in <a href="http://androidxref.com/4.4.2_r2/xref/frameworks/base/core/java/android/hardware/camera2/"><span style="background-color: white; font-family: monospace;">/</span>frameworks<span style="background-color: white; font-family: monospace;">/</span>base<span style="background-color: white; font-family: monospace;">/</span>core<span style="background-color: white; font-family: monospace;">/</span>java<span style="background-color: white; font-family: monospace;">/</span>android<span style="background-color: white; font-family: monospace;">/</span>hardware<span style="background-color: white; font-family: monospace;">/</span>camera2</a><span style="background-color: white; font-family: monospace;">. </span>The @hide attribute excludes these classes from the automatic documentation generation and from the SDK. The code is hidden because the API is not yet final and Google has not committed to it, but the final version is most likely to be quite similar in semantics, if not syntactically equivalent. Anyone who's been watching the camera HAL changes in the last few Android releases will tell you that this new API is expected to become official anytime now - most likely in the L Android release.</div>
<div style="text-align: left;">
The new API is inspired by the <a href="http://fcam.garage.maemo.org/" target="_blank">FCAM</a> project and aims to give the application developer precise and flexible control over all aspects of the camera. The old point & shoot paradigm, which limits the camera to three dominant use cases (preview, video, stills), is replaced by an API that abstracts the camera as a black box that produces streams of image frames in different formats and resolutions. The camera "black box" is configured and controlled via an abstract canonical model of the camera processing pipeline's controls and properties. To understand the philosophy, motivation and details of this API, I think it is important to review the camera HAL (v3) <a href="http://source.android.com/devices/camera/camera3.html" target="_blank">documentation</a> and to read the FCAM <a href="http://graphics.stanford.edu/papers/fcam/" target="_blank">papers</a>. </div>
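<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
To make this per-frame, FCAM-style control concrete, here is a small hedged sketch. I am using the class and key names as they eventually shipped in the public camera2 API (the hidden KitKat version differs in detail); camera is assumed to be an already-open CameraDevice, and the checked CameraAccessException handling is omitted. Every capture request carries its own complete set of pipeline settings, so consecutive frames can use different sensor controls:</div>
<div style="text-align: left;">
<br /></div>
CaptureRequest.Builder builder = camera.createCaptureRequest(CameraDevice.TEMPLATE_STILL_CAPTURE);<br />
// Turn the auto-exposure machinery off and set the sensor controls directly,<br />
// per request - something the legacy point & shoot API cannot express.<br />
builder.set(CaptureRequest.CONTROL_AE_MODE, CameraMetadata.CONTROL_AE_MODE_OFF);<br />
builder.set(CaptureRequest.SENSOR_EXPOSURE_TIME, 10000000L); // 10 ms, in nanoseconds<br />
builder.set(CaptureRequest.SENSOR_SENSITIVITY, 400); // ISO 400<br />
<div style="text-align: left;">
<br /></div>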
<div>
<br /></div>
<div style="text-align: left;">
If you're an Android camera application developer, then you would be wise to study the new API. </div>
<div style="text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQz8nLIZiy2AH5YXE5Pgj726MW8-NE-Z5z-x77HLD8oKHGHqHvaSV6nwI8w9bAVzFz2Jjsm-SiUKj-tXfRN-lAouHlG6-SIvlcgfErmIDnPoPTB30qH2CCR_LFKKqB2jepjcHH6ZxnTUa6/s1600/android.hardware.camera2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQz8nLIZiy2AH5YXE5Pgj726MW8-NE-Z5z-x77HLD8oKHGHqHvaSV6nwI8w9bAVzFz2Jjsm-SiUKj-tXfRN-lAouHlG6-SIvlcgfErmIDnPoPTB30qH2CCR_LFKKqB2jepjcHH6ZxnTUa6/s1600/android.hardware.camera2.png" height="412" width="640" /></a></div>
<div style="text-align: center;">
Figure 1: the android.hardware.camera2 package</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
Besides studying the <span style="text-align: center;">android.hardware.camera2 package, I looked for test and sample code in the AOSP to see how the API is used.</span></div>
<div style="text-align: left;">
</div>
<ul>
<li>./frameworks/base/tests/Camera2Tests/SmartCamera/SimpleCamera/src/androidx/media/filterfw/samples/simplecamera/Camera2Source.java</li>
<li>./cts/tests/tests/hardware/src/android/hardware/camera2/cts/</li>
<ul>
<li>CameraCharacteristicsTest.java</li>
<li>CameraDeviceTest.java</li>
<li>CameraManagerTest.java</li>
<li>CameraCaptureResultTest.java</li>
<li>ImageReaderTest.java</li>
</ul>
</ul>
<div>
<br /></div>
<div>
The diagrams below depict a single, simple use-case: setting up camera preview and issuing a few JPEG still-capture requests. They are mostly self-explanatory, so I've included only a little text to describe them. If this is insufficient, you can write your questions in the comments section below.</div>
<div>
<br /></div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhEaQhvPrfBS93NuaStENKlnyooCuh91wHlQO95kF7TRR9yrXMZqZyMlTESaa1tWv2zqTa_-fLpaaHLIYjlhJNmJIFJA3jIDDLCwNlBYpsgEF9zHafrYcgdWjf7sQIeDhFKd_jC0USOHKhX/s1600/android.hardware.camera2-camera_discovery.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhEaQhvPrfBS93NuaStENKlnyooCuh91wHlQO95kF7TRR9yrXMZqZyMlTESaa1tWv2zqTa_-fLpaaHLIYjlhJNmJIFJA3jIDDLCwNlBYpsgEF9zHafrYcgdWjf7sQIeDhFKd_jC0USOHKhX/s1600/android.hardware.camera2-camera_discovery.png" height="390" width="640" /></a></div>
<div style="text-align: left;">
<span style="text-align: center;"><br /></span></div>
<div style="text-align: left;">
The process of discovering the cameras attached to the mobile device is depicted above. Note the AvailabilityListener, which monitors the dynamic connection and disconnection of cameras to the device. I think it also monitors the use of camera objects by other applications. Neither feature exists in the current (soon-to-be "legacy") API.</div>
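<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
For illustration, here is a minimal discovery sketch. I am using the class and method names as they eventually shipped in the public camera2 API (for example, the released equivalent of the AvailabilityListener mentioned above is CameraManager.AvailabilityCallback), and I assume a Context, a Handler and a log TAG are available:</div>
<div style="text-align: left;">
<br /></div>
CameraManager manager = (CameraManager) context.getSystemService(Context.CAMERA_SERVICE);<br />
try {<br />
    // Enumerate the cameras currently attached to the device.<br />
    for (String id : manager.getCameraIdList()) {<br />
        CameraCharacteristics props = manager.getCameraCharacteristics(id);<br />
        Log.d(TAG, "camera " + id + " facing " + props.get(CameraCharacteristics.LENS_FACING));<br />
    }<br />
} catch (CameraAccessException e) {<br />
    Log.e(TAG, "camera enumeration failed", e);<br />
}<br />
// Monitor cameras being connected to, and disconnected from, the device.<br />
manager.registerAvailabilityCallback(new CameraManager.AvailabilityCallback() {<br />
    @Override public void onCameraAvailable(String id) { /* free to open */ }<br />
    @Override public void onCameraUnavailable(String id) { /* removed or in use */ }<br />
}, handler);<br />
<div style="text-align: left;">
<br /></div>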
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjDKPBpmZDgQ0iESItJc6q9YqdsxKY_X3Tg6aXhQ5qadj98Fu91mQwBUSOyB6dNLzjmyEFORTqb4CteZqvhvDrpAM3mIhTXLIiYxbeCWvhiH0hzhcf4oH0gmXiGw4f3MHrEEZ6XiZWpmW_U/s1600/android.hardware.camera2-preparing_stills_capture.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjDKPBpmZDgQ0iESItJc6q9YqdsxKY_X3Tg6aXhQ5qadj98Fu91mQwBUSOyB6dNLzjmyEFORTqb4CteZqvhvDrpAM3mIhTXLIiYxbeCWvhiH0hzhcf4oH0gmXiGw4f3MHrEEZ6XiZWpmW_U/s1600/android.hardware.camera2-preparing_stills_capture.png" height="421" width="640" /></a></div>
<div style="text-align: center;">
Figure 3: Preparing the surfaces</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
Before doing anything with the camera, you need to configure the output surfaces - that's where the camera will render the image frames. Note the use of an ImageReader to obtain a Surface object to buffer JPEG formatted images.</div>
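<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
A hedged sketch of this step, under the same assumptions as the snippet above (released API names; camera is a CameraDevice obtained from CameraManager.openCamera, surfaceView is the preview SurfaceView, and exception handling is omitted):</div>
<div style="text-align: left;">
<br /></div>
// An ImageReader buffers the JPEG output and hands us a Surface for it.<br />
ImageReader jpegReader = ImageReader.newInstance(1920, 1080, ImageFormat.JPEG, 2);<br />
Surface jpegSurface = jpegReader.getSurface();<br />
Surface previewSurface = surfaceView.getHolder().getSurface();<br />
<br />
// Hand both output surfaces to the camera device; the callback fires once<br />
// the pipeline is configured and a capture session is ready for requests.<br />
camera.createCaptureSession(Arrays.asList(previewSurface, jpegSurface),<br />
    new CameraCaptureSession.StateCallback() {<br />
        @Override public void onConfigured(CameraCaptureSession session) { /* ready */ }<br />
        @Override public void onConfigureFailed(CameraCaptureSession session) { /* error */ }<br />
    }, handler);<br />
<div style="text-align: left;">
<br /></div>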
<div style="text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh_hQGUEEsQdfsZcyYpFg0hb0alML8Sg5qpaXPNxCuHCYIZTXexSMIf_QBfv9jbyj2gvKXGp536JwH-MQcKtvYTO8gQxj0mr_0cSHKaGCyflZvjXbNdqtR__SBL4RCT-JteMq_P6UvdeHWA/s1600/android.hardware.camera2-preview_request.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh_hQGUEEsQdfsZcyYpFg0hb0alML8Sg5qpaXPNxCuHCYIZTXexSMIf_QBfv9jbyj2gvKXGp536JwH-MQcKtvYTO8gQxj0mr_0cSHKaGCyflZvjXbNdqtR__SBL4RCT-JteMq_P6UvdeHWA/s1600/android.hardware.camera2-preview_request.png" height="226" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
Figure 4: Preview request</div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
Preview is created by generating a repeating-request with a SurfaceView as the frame consumer.</div>
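<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
Continuing the same sketch (session is the CameraCaptureSession delivered to onConfigured above, and exception handling is again omitted):</div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
// Build a preview request that targets the SurfaceView, and keep it repeating.<br />
CaptureRequest.Builder previewBuilder = camera.createCaptureRequest(CameraDevice.TEMPLATE_PREVIEW);<br />
previewBuilder.addTarget(previewSurface);<br />
session.setRepeatingRequest(previewBuilder.build(), null, handler);<br />
<div class="separator" style="clear: both; text-align: left;">
<br /></div>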
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9jGMW89Brt0F5hAaQUNUJ0kYdf4f7u8hRzrGll-0Y7mraoEllKA2rkP-KFQl8kCyh8vv7SC9fOYE5nk-9abfqPDf1bfUIAEmurFV6n5atT7Q8lr17W9FzjFtsMiYyURbbRBDBz6Xj0buB/s1600/android.hardware.camera2-stills_request.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9jGMW89Brt0F5hAaQUNUJ0kYdf4f7u8hRzrGll-0Y7mraoEllKA2rkP-KFQl8kCyh8vv7SC9fOYE5nk-9abfqPDf1bfUIAEmurFV6n5atT7Q8lr17W9FzjFtsMiYyURbbRBDBz6Xj0buB/s1600/android.hardware.camera2-stills_request.png" height="440" width="640" /></a></div>
<div style="text-align: center;">
Figure 5: Still capture request</div>
<div style="text-align: center;">
<br /></div>
<div style="text-align: left;">
Finally, when the user presses the shutter button, the application issues a single capture request for a JPEG surface. Buffers are held by the ImageReader until the application acquires them individually.</div>
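<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
And the corresponding sketch, with the same caveats as the previous snippets:</div>
<div style="text-align: left;">
<br /></div>
// On shutter press: a single capture request that targets the JPEG surface.<br />
CaptureRequest.Builder stillBuilder = camera.createCaptureRequest(CameraDevice.TEMPLATE_STILL_CAPTURE);<br />
stillBuilder.addTarget(jpegSurface);<br />
session.capture(stillBuilder.build(), null, handler);<br />
<br />
// The ImageReader holds each JPEG buffer until the application acquires it.<br />
jpegReader.setOnImageAvailableListener(new ImageReader.OnImageAvailableListener() {<br />
    @Override public void onImageAvailable(ImageReader reader) {<br />
        Image image = reader.acquireNextImage();<br />
        // ... write image.getPlanes()[0].getBuffer() out as a .jpg file ...<br />
        image.close();<br />
    }<br />
}, handler);<br />
<div style="text-align: left;">
<br /></div>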
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<b>Summary</b></div>
<div style="text-align: left;">
That was a very brief introduction to the android.hardware.camera2 package, which I think will be officially included in the L Android release. I'm sure the legacy API will continue to exist for a long time in order to support current camera applications. However, you should consider learning the new API for the greater (and finer) camera control it provides.</div>
netazhttp://www.blogger.com/profile/13820189991503080577noreply@blogger.com4tag:blogger.com,1999:blog-182549309027052933.post-73668054967140229762014-03-21T04:21:00.000+02:002014-03-21T04:21:01.146+02:00Android, QEMU and the Camera - Emulating the Camera Hardware in Android (Part III)<div dir="ltr" style="text-align: left;" trbidi="on">
This third post in the series about the Android QEMU camera discusses the camera service (as noted in part two, this should not be confused with the Android framework's Camera Service, which I refer to in capital letters). <br />
The camera service code can be found in directory<b> <android>/external/qemu/android/camera</android></b>. The camera service is initialized from main() in <b><android>/external/qemu/vl-android.c</android></b>. The main() function is interesting if you want to understand the emulator boot process, but be warned that this function is composed of 2000 lines of code! In any case, main() invokes android_camera_service_init which is part of the narrow public interface of the camera service. The initialization control flow is summarized below:<br />
<br />
android_camera_service_init<br />
==> _camera_service_init<br />
==> enumerate_camera_devices<br />
==> qemud_service_register(SERVICE_NAME == "camera", _camera_service_connect)<br />
<br />
Function _camera_service_init uses a structure of type AndroidHwConfig (defined in <b><android>/external/qemu/android/avd/hw-config.h</android></b>). This structure contains "the hardware configuration for this specific virtual device", and more specifically it contains a description of the camera type (webcam<n>, emulated, none) connected to the back and the front of the device. This structure is basically a reflection of the AVD configuration file (<b>$HOME/.android/avd/<your-avd>.avd/hardware-qemu.ini</your-avd></b>, on Linux) or the AVD description parameters passed to the emulator through the command line parameters. </n><br />
Function enumerate_camera_devices performs a basic discovery and interrogation of the camera devices on the host machine. There is an implementation for Linux host machines (camera-capture-linux.c), for Windows host machines (camera-capture-windows.c), and for Mac host machines (camera-capture-mac.m). In fact, all of the low-level camera access code is segregated into these three files. The Linux code uses the V4L2 API, of course, and its enumerate_camera_devices implementation opens a video device and enumerates the available frame pixel formats (skipping compressed formats), looking for a match to the requested formats.<br />
Finally, function qemud_service_register registers the camera service with the hw_qemud (see <a href="http://netaz.blogspot.com/2014/02/android-qemu-and-camera-emulating.html" target="_blank">previous post</a>) under the service name "camera" and passes a callback which hw_qemud should invoke when camera service clients attempt to connect to the service.<br />
Examining function _camera_service_connect reveals that the camera service supports two types of clients:<br />
<br />
<ul style="text-align: left;">
<li>An emulated camera factory client; and</li>
<li>An emulated camera client</li>
</ul>
<br />
And this brings us almost full circle: class EmulatedCameraFactory (discussed in the <a href="http://netaz.blogspot.com/2014/02/android-qemu-and-camera-emulating.html" target="_blank">previous post</a>) uses an emulated camera factory client (of class FactoryQemuClient) and class EmulatedQemuCameraDevice uses an emulated camera client (of class CameraQemuClient).<br />
<br />
And now we can tie some loose ends from the previous post and take a deeper look at the control flow of loading the emulated Camera HAL module and creating an emulated camera device. This is a good opportunity to remind ourselves that these posts only examined the emulated (web) cameras, not the emulated "fake" cameras and so there are a couple of shortcuts that I took in the flows below to prevent further confusion.<br />
<br />
<u>Invoked when camera.goldfish.so is loaded to memory and gEmulatedCameraFactory is instantiated:</u><br />
<br />
EmulatedCameraFactory::EmulatedCameraFactory<br />
==> FactoryQemuClient::connectClient<br />
==> qemu_pipe_open("qemud:camera")<br />
==> EmulatedCameraFactory::createQemuCameras<br />
==> FactoryQemuClient::listCameras<br />
for each camera:<br />
==> create EmulatedQemuCamera<br />
==> EmulatedQemuCamera::Initialize<br />
==> EmulatedQemuCameraDevice::Initialize<br />
==> CameraQemuClient::connectClient<br />
==> qemu_pipe_open("qemud:camera:??")<br />
==> EmulatedCameraDevice::Initialize()<br />
<br />
<u>Invoked when the emulated HAL module is asked to open a camera HAL device:</u><br />
<br />
hw_module_methods_t.open = EmulatedCameraFactory::device_open<br />
==> EmulatedCameraFactory::cameraDeviceOpen<br />
==> EmulatedCamera::connectCamera<br />
==> EmulatedQemuCameraDevice::connectDevice<br />
==> CameraQemuClient::queryConnect<br />
==> QemuClient::doQuery<br />
==> QemuClient::sendMessage<br />
==> qemud_fd_write<br />
<br />
Once the camera devices are open and the communication path between the HAL camera device and the emulated web camera is established, communication continues to be facilitated via the CameraQemuClient using the <i>query </i>mechanism that we saw in the call flow above. The query itself is a string composed of a query-identification-string (to identify what we are asking for: connect, disconnect, start, stop, frame) and a list of name-value strings (which depend on the query type). This string is then written to the /dev/qemu_pipe device, and from there it makes its way through the goldfish_pipe, then to the hw_qemud service, and finally to the camera service. There the query is parsed, acted upon (e.g. on a Linux host, V4L2 commands are sent to the host kernel to drive the USB webcam connected to the host), and a reply is sent. The sender unblocks from the /dev/qemu_pipe read operation and completes its work.<br />
<br />
<b><u>Reference code:</u></b><br />
<br />
<ul style="text-align: left;">
<li>device/generic/goldfish/ - Goldfish device</li>
<li>device/generic/goldfish/Camera/ - Goldfish camera</li>
<li>hardware/libhardware/include/hardware/qemu_pipe.h</li>
<li>linux/kernel/drivers/platform/goldfish/goldfish_pipe.c</li>
<li>external/qemu/hw/goldfish_pipe.c</li>
<li>external/qemu/android/hw-qemud.c</li>
<li>external/qemu/android/camera/</li>
<li>external/qemu/android/camera/camera-service.c</li>
<li>external/qemu/docs/ANDROID-QEMU-PIPE.TXT</li>
<li>hardware/libhardware/hardware.c</li>
</ul>
</div>
netazhttp://www.blogger.com/profile/13820189991503080577noreply@blogger.com6tag:blogger.com,1999:blog-182549309027052933.post-11460265097691248312014-02-27T02:07:00.002+02:002014-03-21T03:03:20.503+02:00Android, QEMU and the Camera - Emulating the Camera Hardware in Android (Part II)<div dir="ltr" style="text-align: left;" trbidi="on">
In my <a href="http://netaz.blogspot.co.il/2014/02/android-qemu-and-camera-emulating_25.html" target="_blank">previous post</a> I reviewed the "administrative" background related to camera emulation on Android.<br />
<span style="font-family: inherit;">Now let's trace the code backwards, from the point of loading the Camera HAL module and until we open an android.hardware.Camera instance in the application. The diagrams below shows the top-down control flow of loading a camera HALv1 module and initializing a camera HALv1 device. But this is all fairly standard and we are interested in getting some insight into the particulars of the emulated camera, so let's I start at the end: the loading of the HAL module.</span><br />
<span style="font-family: inherit;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjG_s02GjRIs9M-O9IbxbZeF36Gq600BK4PnpKe81r-UXq6tN7mrT-4__5mhVGCZDE4D4Ntn1_o0UXtKtYv3eU20FRYZtyBcrzvn7DbOGxU5ha-fvyuMmFFO087i5TXOCH_wppvngKjS0BP/s1600/startup01.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjG_s02GjRIs9M-O9IbxbZeF36Gq600BK4PnpKe81r-UXq6tN7mrT-4__5mhVGCZDE4D4Ntn1_o0UXtKtYv3eU20FRYZtyBcrzvn7DbOGxU5ha-fvyuMmFFO087i5TXOCH_wppvngKjS0BP/s1600/startup01.png" height="302" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEip47zH3X1hlCHlO5lzINqb9aILAZIrmH4R-nrRLMxfwfDUXaZCXFstZtDsRhUP9bcwL4zxGSlAWI45uFu213aMc5R1eHic4CyTAH9mhTOMJu3npIRuvMq32T_nLZxqlFXRtqutoRj74_FY/s1600/startup02.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEip47zH3X1hlCHlO5lzINqb9aILAZIrmH4R-nrRLMxfwfDUXaZCXFstZtDsRhUP9bcwL4zxGSlAWI45uFu213aMc5R1eHic4CyTAH9mhTOMJu3npIRuvMq32T_nLZxqlFXRtqutoRj74_FY/s1600/startup02.png" height="344" width="640" /></a></div>
<br />
<span style="font-family: inherit;">The generic Android emulated device created by the AVD tool is called <i>goldfish </i>and, following the HAL naming convention, the </span>dynamically linked (shared object) library containing the <span style="font-family: inherit;">goldfish camera HAL is located in <b>/system/lib/hw/camera.goldfish.so</b>. You can 'adb shell' into an emulated device instance and search for camera.goldfish.so in the device file system just like you would do on a real device: </span><br />
<span style="font-family: inherit;"><br /></span>
nzmora@~/Dev/aosp:$ adb -s emulator-5554 shell<br />
root@generic_x86:/ # ls -la /system/lib/hw/<br />
-rw-r--r-- root root 9464 2014-03-08 16:20 audio.primary.default.so<br />
-rw-r--r-- root root 13560 2014-03-08 16:20 audio.primary.goldfish.so<br />
-rw-r--r-- root root 144868 2014-03-08 16:21 audio_policy.default.so<br />
-rw-r--r-- root root 2309860 2014-03-08 16:21 bluetooth.default.so<br />
<b>-rw-r--r-- root root 5204 2014-03-08 16:23 camera.goldfish.jpeg.so</b><br />
<b>-rw-r--r-- root root 288668 2014-03-08 16:22 camera.goldfish.so</b><br />
-rw-r--r-- root root 13652 2014-03-08 16:21 gps.goldfish.so<br />
-rw-r--r-- root root 13872 2014-03-08 16:20 gralloc.default.so<br />
-rw-r--r-- root root 21836 2014-03-08 16:21 gralloc.goldfish.so<br />
-rw-r--r-- root root 5360 2014-03-08 16:21 keystore.default.so<br />
-rw-r--r-- root root 9456 2014-03-08 16:20 lights.goldfish.so<br />
-rw-r--r-- root root 5364 2014-03-08 16:20 local_time.default.so<br />
-rw-r--r-- root root 5412 2014-03-08 16:17 power.default.so<br />
-rw-r--r-- root root 13660 2014-03-08 16:20 sensors.goldfish.so<br />
-rw-r--r-- root root 5360 2014-03-08 16:17 vibrator.default.so<br />
<br />
-rw-r--r-- root root 5364 2014-03-08 16:21 vibrator.goldfish.so<br />
<div>
<br /></div>
<div>
<span style="font-family: inherit;">The code itself is found in the Android source tree, at <b><android></android></b></span><android style="font-family: inherit;"><b>/device/generic/goldfish/camera</b> so let's turn our attention there.</android></div>
<span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">When approaching new code, to get a quick high-level understanding of what's what, I like to start with the makefile, Android.mk, and give it a quick scan: examining the input files, flags and output files. The Android makefile format is fairly self describing and easier to grasp than "true" make files because it hides most of the gory details. In any case, in this particular makefile the LOCAL_SOURCE_FILES variable (which contains the list of files to compile) is listed in a sort of hierarchy - and this gives us a first clue as to how the source files relate to one another. </span><br />
<span style="font-family: inherit;"><br /></span>
LOCAL_SRC_FILES := \<br />
EmulatedCameraHal.cpp \<br />
EmulatedCameraFactory.cpp \<br />
EmulatedCameraHotplugThread.cpp \<br />
EmulatedBaseCamera.cpp \<br />
EmulatedCamera.cpp \<br />
EmulatedCameraDevice.cpp \<br />
EmulatedQemuCamera.cpp \<br />
EmulatedQemuCameraDevice.cpp \<br />
EmulatedFakeCamera.cpp \<br />
EmulatedFakeCameraDevice.cpp \<br />
Converters.cpp \<br />
PreviewWindow.cpp \<br />
CallbackNotifier.cpp \<br />
QemuClient.cpp \<br />
JpegCompressor.cpp \<br />
EmulatedCamera2.cpp \<br />
EmulatedFakeCamera2.cpp \<br />
EmulatedQemuCamera2.cpp \<br />
fake-pipeline2/Scene.cpp \<br />
fake-pipeline2/Sensor.cpp \<br />
fake-pipeline2/JpegCompressor.cpp \<br />
EmulatedCamera3.cpp \<br />
EmulatedFakeCamera3.cpp<br />
<div>
<br /></div>
<span style="font-family: inherit;">In the first file, EmulatedCameraHal.cpp, I find the definition of the HAL module structure: HAL_MODULE_INFO_SYM. This is the symbol that <android></android></span><b>/hardware/libhardware/hardware.c</b><span style="font-family: inherit;"> loads when CameraService is first referenced (see flow diagram above), and therefore it is the entry way into the HAL.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">camera_module_t HAL_MODULE_INFO_SYM = {</span><br />
<span style="font-family: inherit;"> common: {</span><br />
<span style="font-family: inherit;"> tag: HARDWARE_MODULE_TAG,</span><br />
<span style="font-family: inherit;"> module_api_version: CAMERA_MODULE_API_VERSION_2_1,</span><br />
<span style="font-family: inherit;"> hal_api_version: HARDWARE_HAL_API_VERSION,</span><br />
<span style="font-family: inherit;"> id: CAMERA_HARDWARE_MODULE_ID,</span><br />
<span style="font-family: inherit;"> name: "Emulated Camera Module",</span><br />
<span style="font-family: inherit;"> author: "The Android Open Source Project",</span><br />
<span style="font-family: inherit;"> methods: &<b>android::EmulatedCameraFactory::mCameraModuleMethods</b>,</span><br />
<span style="font-family: inherit;"> dso: NULL,</span><br />
<span style="font-family: inherit;"> reserved: {0},</span><br />
<span style="font-family: inherit;"> },</span><br />
<span style="font-family: inherit;"> get_number_of_cameras: <b>android::EmulatedCameraFactory::get_number_of_cameras</b>,</span><br />
<span style="font-family: inherit;"> get_camera_info: <b>android::EmulatedCameraFactory::get_camera_info</b>,</span><br />
<span style="font-family: inherit;"> set_callbacks: <b>android::EmulatedCameraFactory::set_callbacks</b>,</span><br />
<span style="font-family: inherit;">};</span><br />
<div>
<span style="font-family: inherit;"><br /></span></div>
<div>
The first thing that we learn is that this is a HAL <b>module</b> v2.1. There are 4 function pointers listed in this structure and cgrep'ing these functions leads us to file EmulatedCameraFactory.cpp, where these functions are defined. We quickly learn from the code documentation in EmulatedCameraFactory.cpp that "A global instance of EmulatedCameraFactory is statically instantiated and initialized when camera emulation HAL is loaded". <br />
When the CameraService invokes camera_module_t::get_camera_info, it actually performs a call to gEmulatedCameraFactory.getCameraInfo. In other words, the three function pointers in camera_module_t just forward the work to gEmulatedCameraFactory (the global singleton factory instance I mentioned above):<br />
<br />
int EmulatedCameraFactory::get_camera_info(int camera_id, struct camera_info* info)<br />
{<br />
return gEmulatedCameraFactory.getCameraInfo(camera_id, info);<br />
}<br />
<br />
Let's refocus our attention at where the action is: the constructor of EmulatedCameraFactory. The first thing the EmulatedCameraFactory constructor (device/generic/goldfish/camera/EmulatedCameraFactory.cpp) does is connect to the camera service in the Android emulator. Please notice that this is <b>not </b>the CameraService of the Android framework! This is a very important distinction.<br />
I will describe the emulator's camera service in the third post in this series.<br />
<br />
The code documentation does a very good job at explaining the EmulatedCameraFactory class responsibility:<br />
<br />
/* Class EmulatedCameraFactory - Manages cameras available for the emulation.<br />
*<br />
* When the global static instance of this class is created on the module load,<br />
* it enumerates cameras available for the emulation by connecting to the<br />
* emulator's 'camera' service. For every camera found out there it creates an<br />
* instance of an appropriate class, and stores it an in array of emulated<br />
* cameras. In addition to the cameras reported by the emulator, a fake camera<br />
* emulator is always created, so there is always at least one camera that is<br />
* available.<br />
*<br />
* Instance of this class is also used as the entry point for the camera HAL API,<br />
* including:<br />
* - hw_module_methods_t::open entry point<br />
* - camera_module_t::get_number_of_cameras entry point<br />
* - camera_module_t::get_camera_info entry point<br />
*<br />
*/<br />
<br />
Usually, when I'm trying to quickly get familiar with new code I either draw for myself some call flows, or I write them down. This helps me understand the code, and this is also a quick way to re-familiarize myself with the code if I put it away for a prolonged time and then need to reference it. In the case of the EmulatedCameraFactory constructor I used another technique, which I usually use less often. It is a stripped-down syntax-incomplete version of the code. This technique is useful when there's a method like the EmulatedCameraFactory constructor which packs a lot of action. This particular code is self-explanatory, except for the call to mQemuClient.connectClient, but I'll return to that later - for now I choose to do a breadth-wise scanning of the code.<br />
<br />
EmulatedCameraFactory::EmulatedCameraFactory()<br />
{<br />
/* Connect to the factory service in the emulator, and create Qemu cameras. */<br />
if (mQemuClient.connectClient(NULL) == NO_ERROR) {<br />
/* Connection has succeeded. Create emulated cameras for each camera<br />
* device, reported by the service. */<br />
createQemuCameras();<br />
}<br />
<br />
if (isBackFakeCameraEmulationOn()) {<br />
switch (getBackCameraHalVersion()):<br />
1: new EmulatedFakeCamera(camera_id, true, &HAL_MODULE_INFO_SYM.common);<br />
2: new EmulatedFakeCamera2(camera_id, true, &HAL_MODULE_INFO_SYM.common);<br />
3: new EmulatedFakeCamera3(camera_id, true, &HAL_MODULE_INFO_SYM.common);<br />
mEmulatedCameras[camera_id]->Initialize()<br />
}<br />
if (isFrontFakeCameraEmulationOn()) {<br />
switch (getBackCameraHalVersion()):<br />
1: new EmulatedFakeCamera(camera_id, true, &HAL_MODULE_INFO_SYM.common);<br />
2: new EmulatedFakeCamera2(camera_id, true, &HAL_MODULE_INFO_SYM.common);<br />
3: new EmulatedFakeCamera3(camera_id, true, &HAL_MODULE_INFO_SYM.common);<br />
mEmulatedCameras[camera_id]->Initialize()<br />
}<br />
<br />
mHotplugThread = new EmulatedCameraHotplugThread(&cameraIdVector[0], mEmulatedCameraNum);<br />
mHotplugThread->run();<br />
}<br />
<br />
When you review the pseudo-code above, remember that FakeCamera refers to a camera device fully emulated in SW, while QemuCamera refers to a real web-camera that is wrapped by the emulation code.<br />
<br />
After I got a bit of an understanding of the initialization dynamics, I turned to study the structure of the rest of the classes. When there are many classes involved (46 files in this case), I find that a class diagram of the overall structure can help me identify the more important classes (I have no intention of going over the code of all these classes). I extract the class hierarchy structure by scanning the .h files, looking for class relationships and key member variables. Usually I hand-sketch some UML on paper - this doesn't have to be complete since I am just trying to get a quick grasp of things.<br />
<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhF5Wf-n4FW4B7aC8bwIONXw3yO6hHj0wSr2lTfoWuyTbnLq3eHQiEWYBX2O48ljeRilm8wk5fPN9Qs3isuyJyMMZzx2LndMKlhcVcR34qWxY3EbcKGkmAqszdK_mtjp91EVbaQI2mP-oGO/s1600/startup.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhF5Wf-n4FW4B7aC8bwIONXw3yO6hHj0wSr2lTfoWuyTbnLq3eHQiEWYBX2O48ljeRilm8wk5fPN9Qs3isuyJyMMZzx2LndMKlhcVcR34qWxY3EbcKGkmAqszdK_mtjp91EVbaQI2mP-oGO/s1600/startup.png" height="280" width="640" /></a></div>
<br />
There are a few structural points to note:<br />
<ul style="text-align: left;">
<li>The class structure is similar to the structure of the class listing in the makefile so whoever wrote the makefile was nice enough to be professional all the way (adding pseudo documentation to the makefile, if you will).</li>
<li>EmulatedFakeCamera types are classes that actually simulate the camera frames and behavior, along with a simulated sensor and processing pipeline. Their implementation is interesting in and of itself, and I'll return to it in a different post. </li>
<li>EmulatedQemuCamera types are a gateway to actual cameras connected to the host - i.e. webcams connected to your workstation or built into your laptop. I visually differentiate between the Qemu and Fake cameras by giving them different colors.</li>
<li>There are Camera types and CameraDevice types. CameraDevice types are more important as they contain more code.</li>
<li>The EmulatedCameraFactory represents the camera HAL <i>module</i> and contains handles to EmulatedCameras.</li>
<li>There are two classes which abstract the connection to the QEMU emulator. You can see that the EmulatedQemuCameraDevice holds a reference to CameraQemuClient and clearly this is required for communicating with the webcam on the emulator (more on this later). There are three related classes: EmulatedCamera, EmulatedCamera2, and EmulatedCamera3, which represent cameras that are exposed through HALv1, HALv2, and HALv3, respectively. Obviously HALv2 is of no significance by now because Android does not support it. HALv3 does not exist for the webcam, most likely because the new HALv3 does not add any new features to a simple point & shoot webcam.</li>
</ul>
<div>
Now back to the dynamic view (my discovery process is a back & forth dance to discover new components and interactions) - when an application calls android.hardware.Camera.open(cameraId) this call is propagated through the code layers and ends with a call to camera_module_t::methods.open(cameraId) which is actually a call to EmulatedCameraFactory::device_open. You can trace this flow in the first diagram in this blog post.</div>
<br />
EmulatedCameraFactory::device_open<br />
==> gEmulatedCameraFactory.cameraDeviceOpen<br />
==> mEmulatedCameras[camera_id]->connectCamera(hw_device_t** device)<br />
<br />
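For reference, the application-side code that ultimately triggers this flow is just the familiar legacy API sequence (a minimal sketch: surfaceHolder is assumed to come from a preview SurfaceView, and exception handling is omitted):<br />
<br />
Camera camera = Camera.open(0); // ends up in EmulatedCameraFactory::device_open<br />
camera.setPreviewDisplay(surfaceHolder);<br />
camera.startPreview();<br />
// ... and when the application is done with the camera ...<br />
camera.stopPreview();<br />
camera.release();<br />
<br />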
There's a whole lot of interesting stuff going on in the emulation code, especially in the emulated <u><i>fake</i></u> camera code, but in this blog post I want to look at the emulated QEMU camera code (EmulatedQemuCamera) and the communication with QEMU.<br />
<br />
So, on to QemuClient. This class "<i>encapsulates a connection to the 'camera' service in the emulator via qemu pipe". </i>The pipe connection is established by invoking qemu_pipe_open(pipename) which is implemented in <b><android>/hardware/libhardware/include/hardware/qemu_pipe.h: </android></b>First a device of type /dev/qemu_pipe is opened, and then the concatenation of the strings "pipe:" and pipename is written to the device. In the kernel, we find the other side of this pipe (i.e. the qemu_pipe device driver) in <b><kernel>/drivers/platform/goldfish/goldfish_pipe.c</kernel></b>. The header of the driver does an excellent job of describing this driver, so I bring it forth without much ado:<br />
<br />
/* This source file contains the implementation of a special device driver<br />
* that intends to provide a *very* fast communication channel between the<br />
* guest system and the QEMU emulator.<br />
*<br />
* Usage from the guest is simply the following (error handling simplified):<br />
*<br />
* int fd = open("/dev/qemu_pipe",O_RDWR);<br />
* .... write() or read() through the pipe.<br />
*<br />
* This driver doesn't deal with the exact protocol used during the session.<br />
* It is intended to be as simple as something like:<br />
*<br />
* // do this _just_ after opening the fd to connect to a specific<br />
* // emulator service.<br />
* const char* msg = "<pipename>";</pipename><br />
* if (write(fd, msg, strlen(msg)+1) < 0) {<br />
* ... could not connect to <pipename> service</pipename><br />
* close(fd);<br />
* }<br />
*<br />
* // after this, simply read() and write() to communicate with the<br />
* // service. Exact protocol details left as an exercise to the reader.<br />
*<br />
* This driver is very fast because it doesn't copy any data through<br />
* intermediate buffers, since the emulator is capable of translating<br />
* guest user addresses into host ones.<br />
*<br />
* Note that we must however ensure that each user page involved in the<br />
* exchange is properly mapped during a transfer.<br />
*/<br />
<div>
<br />
QEMU pipes are further described in <b>external/qemu/docs/ANDROID-QEMU-PIPE.TXT</b>.<br />
<br /></div>
QemuClient::sendMessage and QemuClient::receiveMessage are wrappers for the pipe operations qemud_fd_write and qemud_fd_read, respectively. QemuClient::doQuery is slightly more involved and I'll get to it in the third and final blog post in this series. <br />
<br />
To recap, so far I've shown that the camera HAL of the emulated goldfish device contains classes that abstract an emulated QEMU camera (EmulatedQemuCameraDevice) which holds a reference to an instance of CameraQemuClient which it uses to communicate with a device named /dev/qemu_pipe. This character device represents a virtual device with (emulated) MMIO register space and IRQ line, and it belongs to the emulated goldfish platform. On the "other side" of the pipe device is the QEMU emulator and more specifically the goldfish pipe which is implemented in <b><android><android>/</android></android></b><b><android>external/qemu/hw/goldfish_pipe.c</android></b><br />
You can think of this pipe as the conduit for communication between the Android Guest kernel and the emulator code. In the Android QEMU codebase, file <b><android><android>/external/qemu/android/hw-qemud.c</android></android></b> implements a sort of bridge between various Android QEMU services and the goldfish pipe device. One of these Android QEMU services is the camera-service that I briefly mentioned earlier. This camera-service is the topic of the third blog post. I'll wrap up with a diagram showing the relationships between the various components.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixQ6U-Wr4Uqs2xuvREFJz9zx4DE0jRbrT9Zv4q-FbXiczOhqXZYR0H4D5MWBZeMVAGUJVJnMJvpdE1E0WYFUW_1LdSOxn7HGZ8LTJEAR2uJWb7mLilqmwtdSfVR241bfkgia4SzHpsv2KH/s1600/camera-service.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixQ6U-Wr4Uqs2xuvREFJz9zx4DE0jRbrT9Zv4q-FbXiczOhqXZYR0H4D5MWBZeMVAGUJVJnMJvpdE1E0WYFUW_1LdSOxn7HGZ8LTJEAR2uJWb7mLilqmwtdSfVR241bfkgia4SzHpsv2KH/s1600/camera-service.png" height="583" width="640" /></a></div>
</div>
</div>
netazhttp://www.blogger.com/profile/13820189991503080577noreply@blogger.com0tag:blogger.com,1999:blog-182549309027052933.post-36130009563787528352014-02-26T01:35:00.000+02:002014-02-27T00:42:22.547+02:00Android, QEMU and the Camera - Emulating the Camera Hardware in Android (Part I)<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: inherit;">I was playing around with a simple Android application that makes use of the camera, in order to prove a point about the rotation and mirroring of camera preview frames. I tested the code on my Nexus 4 and Nexus 7 and after making sure that the Android API performed as I expected, I decided to give it a go on an Android virtual device (AVD) on Eclipse. I quickly configured a Nexus 4 AVD with Webcam0 as the front camera, but the emulator did not behave as I expected: the preview picture was rotated 270 degrees! I figured this was due to the unknown scanning order of the webcam on my laptop and therefore this is the resulting quirk. Still, when I switched the front camera to use the emulated camera, I get the same strange results from android.hardware.Camera.CameraInfo.orientation - a 270 degree orientation which is unexpected and not like the behavior of the camera on the actual Nexus 4 device that I own.</span><br />
<span style="font-family: inherit;"><br />OK, so this obviously called for an investigation of how the camera is emulated and why I am getting these strange orientation values. It turned out that the camera orientation is hard coded - but by the time I discovered that code, my </span>curiosity about how the camera works in emulation mode led me to dig deeper.<br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">This three-part blog post is an attempt to review the AVD camera emulation code and the process that I usually use for reverse </span>engineering<span style="font-family: inherit;"> code. Along the way I added to the Android emulator code to emulate Intel's Atom SoC camera and some other neat (geek talk) stuff. The first blog post will lay all the infrastructure details before I get to the interesting implementation details in the second post.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">First thing, first: when you create a new AVD you configure the camera emulation for both the front and back cameras of the emulated device. There are three choice, None, Emulated, and Webcam0. As we will see later, Emulated refers to a completely fake emulated camera which uses synthetically generated image frames. Webcam0 refers to the webcam attached or built-in to your PC or laptop.</span><br />
<span style="font-family: inherit;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9oSo0RLo3BfQkJ6LPg03rw9NXupic5dqByWjC1a1b-_D4pE1tJ-CL2dl5Y9VUm-S5faEy104dNs2Egyfrb2xjqzZguzcHnvhrACJuBNPjsPfdY2nNPVstZBFxv3npyYSZaHEi_CaJ7llc/s1600/AVD_config.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9oSo0RLo3BfQkJ6LPg03rw9NXupic5dqByWjC1a1b-_D4pE1tJ-CL2dl5Y9VUm-S5faEy104dNs2Egyfrb2xjqzZguzcHnvhrACJuBNPjsPfdY2nNPVstZBFxv3npyYSZaHEi_CaJ7llc/s1600/AVD_config.png" height="400" width="322" /></a></div>
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">The AVD configuration is saved in a file on your workstation (on Windows it is located <span style="font-family: inherit;">in </span><span style="background-color: white;"><span style="color: #550000; font-size: 14px;">%USERPROFILE%</span>\</span>.android\avd\AtomAVD.avd\hardware-qemu.ini</span>) and I'll refer to it later when I describe how the emulator chooses which emulated camera to create. Here's the snippet from my AVD's camera configuration:<br />
hw.camera.back = emulated<br />
hw.camera.front = webcam0<br />
<br />
I'll show later that this configuration can be extended with private variables if you need to configure your proprietary emulator.<br />
<br />
To wrap up this post I want to give a very high-level view of the emulation environment. As with all virtual environments, we have three interacting components: the Host OS, the Guest OS, and the emulator. The Google SDK AVD uses the QEMU emulator in "computer emulation mode". To quote Wikipedia:<br />
<br />
<i>QEMU (short for "Quick EMUlator") is a free and open-source hosted hypervisor that performs hardware virtualization.</i><br />
<i>QEMU is a hosted virtual machine monitor: It emulates central processing units through dynamic binary translation and provides a set of device models, enabling it to run a variety of unmodified guest operating systems. It also provides an accelerated mode for supporting a mixture of binary translation (for kernel code) and native execution (for user code), in the same fashion as VMware Workstation and VirtualBox do. QEMU can also be used purely for CPU emulation for user-level processes, allowing applications compiled for one architecture to be run on another.</i><br />
<br />
The diagram below depicts QEMU in full system emulation mode running in parallel to a few other Host applications. QEMU runs as a Host OS process and emulates a full Android mobile device, including an Atom x86 or ARM processor, SoC IPs and various peripherals. On a Linux Host OS, the KVM (Kernel Virtual Machine) driver can be used for virtualizing the CPU and memory. On an x86 Host machine, HAXM <span style="font-family: inherit;">(<span style="background-color: white; line-height: 16.1200008392334px;">Hardware Accelerated Execution Manager</span>)</span> can be used for further acceleration.<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXWmi75i6F9Rx1SHWam9JK8tIRtcAaI-MBxl6tQLthlI_v_lozv1hJdefMEKrHpC69rF0i2bMUg8JMY1k4cUbqlqMTXk3Q3sIAunusfH4qNGu64x9_Hk4-G4poX3bSO7GW0LdTZ6RQ1Ust/s1600/emulation-sw-layers.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXWmi75i6F9Rx1SHWam9JK8tIRtcAaI-MBxl6tQLthlI_v_lozv1hJdefMEKrHpC69rF0i2bMUg8JMY1k4cUbqlqMTXk3Q3sIAunusfH4qNGu64x9_Hk4-G4poX3bSO7GW0LdTZ6RQ1Ust/s1600/emulation-sw-layers.png" height="561" width="640" /></a><br />
<br />
There is a very good description with many further details on <a href="https://wiki.diebin.at/Under_the_hood_of_Android_Emulator_(appcert)" target="_blank">this site about the Android Emulator under the hood</a>.</div>
netazhttp://www.blogger.com/profile/13820189991503080577noreply@blogger.com1tag:blogger.com,1999:blog-182549309027052933.post-62852403874887677812013-10-27T22:49:00.001+02:002015-06-05T10:21:22.646+03:00The US Constitution and Meyer's Open/Closed Principle<div dir="ltr" style="text-align: left;" trbidi="on">
While trying to explain <a href="http://en.wikipedia.org/wiki/Open/closed_principle#Meyer.27s_open.2Fclosed_principle">Meyer's Open/Closed Principle</a> to a friend, I scratched my head trying to find a real-world example that illustrates the principle: one that is hard to dispute and easy to grasp.<br />
<br />
On my way home from work the news reported on the NSA's latest shenanigans (this time it was spying on German Chancellor Angela Merkel). My thoughts drifted and I contemplated the US Constitution.<br />
<br />
Some facts on the Constitution of the United States (<a href="http://en.wikipedia.org/wiki/United_States_Constitution" target="_blank">source</a>):<br />
<ul>
<li>It went into effect on March 4, 1789</li>
<li>It has been amended twenty-seven times</li>
<li>The Bill of Rights (the first 10 amendments) was ratified on December 15, 1791</li>
<li>The <a href="http://en.wikipedia.org/wiki/List_of_amendments_to_the_United_States_Constitution" target="_blank">list of all 27 amendments</a> is worth reviewing and of particular interest are amendments 18 and 21 ('git revert', anyone?)</li>
</ul>
Imagine that! <a href="http://en.wikipedia.org/wiki/Timeline_of_United_States_history" target="_blank">224 years</a>: from 13 states to 50; one Civil war, two World wars, and countless other wars; the invention of the light bulb; radio and television; labor laws; civil rights movement; the Great Depression; the lunar landing; Roe vs. Wade; 9/11. And on it goes - with only 27 amendments.<br />
Damn! Tell me that ain't cool.<br />
<br />
The US Constitution is perhaps the ultimate, time-tested example of Meyer's Open/Closed Principle: open for extension, but closed for modification.<br />
<br />
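In code, the same idea looks roughly like the minimal C++ sketch below (my own illustrative example, not one taken from Meyer's book). The Shape interface and the totalArea() function are "ratified" and stay closed for modification; support for a new shape is added purely as new code.<br />
<pre>
#include <cstdio>

// Closed for modification: existing clients depend only on this interface.
struct Shape {
    virtual ~Shape() {}
    virtual double area() const = 0;
};

struct Circle : Shape {
    explicit Circle(double radius) : r(radius) {}
    double area() const { return 3.14159265 * r * r; }
    double r;
};

// Open for extension: Square is new code; Shape, Circle and totalArea()
// did not have to be reopened to support it.
struct Square : Shape {
    explicit Square(double s) : side(s) {}
    double area() const { return side * side; }
    double side;
};

// This function never changes when new shapes are "ratified".
double totalArea(const Shape* const shapes[], int count) {
    double sum = 0.0;
    for (int i = 0; i != count; ++i)
        sum += shapes[i]->area();
    return sum;
}

int main() {
    Circle c(1.0);
    Square s(2.0);
    const Shape* shapes[] = { &c, &s };
    std::printf("total area: %f\n", totalArea(shapes, 2));
    return 0;
}
</pre>
Adding Square is the software analogue of ratifying an amendment: the original text is extended, never rewritten.<br />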
It is also worthwhile to reflect on the procedures for amending the constitution:<br />
<br />
<i>Before an amendment can take effect, it must be proposed to the states by a two-thirds vote of both houses of Congress or by a convention (known as an Article V convention) called by two-thirds of the states, and ratified by three-fourths of the states or by three-fourths of conventions thereof, the method of ratification being determined by Congress at the time of proposal. To date, no convention for proposing amendments has been called by the states, and only once—in 1933 for the ratification of the twenty-first amendment—has the convention method of ratification been employed.</i><br />
<i><br /></i>
<br />
As software architects and designers, perhaps we should build similar protections against perpetual refactoring of production-quality code. No, I don't mean that in the literal sense, but I do advocate investing the time to excavate an existing architecture, uncover its governing principles, and understand how it can be extended while preserving those principles. <br />
Maybe we'll end up with software as durable as the US Constitution.</div>
netazhttp://www.blogger.com/profile/13820189991503080577noreply@blogger.com0tag:blogger.com,1999:blog-182549309027052933.post-84878706415567686362013-10-19T22:41:00.000+03:002015-12-23T21:19:21.848+02:00Android Synchronization Fences – An Introduction<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="MsoNormal">
In any system that employs the exchange of buffers between
independent buffer Producers and buffer Consumers, there is a need for a policy
to control buffer lifetimes (allocation/deallocation) and a policy to control
access to the buffer memory (read/write).
A third entity, the buffer Allocator, is in charge of providing access
to the system memory and implementing the buffer lifetime maintenance (a
“dead” buffer cannot be accessed by any entity except the Allocator, while a
“live” buffer may be used by entities other than the Allocator). The “C” language malloc/free library calls are
an example of an Allocator. In a way,
the buffer lifetime control policy is really just another form of buffer access control.
The buffer access control policy determines which of the
Producer or the Consumer may access the buffer at any given time, in a mutually exclusive manner.</div>
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlLhNpBvfYeIrL81g55UUSzzWPTaSG5jXxWovlgcufAFHJtSNIWgfR_YNEX0EcVu-EaT5pBN8HAOthRADlqpuVK-fIa-uCj5kdWIxO04S9wcz2M0IXBsOsmcrrETrgPAO19THABzp8sJJK/s1600/strategies.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="255" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlLhNpBvfYeIrL81g55UUSzzWPTaSG5jXxWovlgcufAFHJtSNIWgfR_YNEX0EcVu-EaT5pBN8HAOthRADlqpuVK-fIa-uCj5kdWIxO04S9wcz2M0IXBsOsmcrrETrgPAO19THABzp8sJJK/s1600/strategies.jpg" width="400" /></a></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The Android Fence abstraction is a mechanism that implements
a particular buffer access control policy; it does not deal with buffer
lifetime control (allocation/deallocation).
It supports both a 1:1 Producer:Consumer relationship
and a 1:many Producer:Consumers relationship. Fences are external to buffers (i.e. they are
not part of the buffer structure) and <i>synchronize</i> the exchange of buffer
ownership (access control) between the Producer and the Consumer(s), or vice versa. <o:p></o:p></div>
<div class="MsoNormal">
</div>
<div class="MsoNormal">
It is of particular importance to understand that in
situations where Android mandates the use of Fences, it is not sufficient for a
Consumer to have a pointer to buffer memory - even when it is explicitly
provided by the Producer. The Fence must
also permit the Consumer to access the buffer memory, for either read or write
access, depending on the situation.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<h3 style="text-align: left;">
Timelines, Synchronization Points and Fences</h3>
<h2>
<o:p></o:p></h2>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
To fully understand Android fences, beyond their use in
the Camera subsystem, you need to get familiar with Timelines and
Synchronization Points. The kernel
documentation (linux/kernel/Documentation/sync.txt) is the only source of
information on these concepts that I could find, and instead of rephrasing that
documentation, I reproduce it here in full:<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
Motivation:<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
In complicated DMA pipelines such as graphics (multimedia,
camera, gpu, display)<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
a consumer of a buffer needs to know when the producer has
finished producing<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
it. Likewise the
producer needs to know when the consumer is finished with the<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
buffer so it can reuse it.
A particular buffer may be consumed by multiple consumers which will
retain the buffer for different amounts of time. In addition, a consumer may consume multiple
buffers atomically.<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
The sync framework adds an API which allows
synchronization between the<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
producers and consumers in a generic way while also allowing
platforms which<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
have shared hardware synchronization primitives to exploit
them.<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
Goals:<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
*
provide a generic API for expressing synchronization dependencies<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
*
allow drivers to exploit hardware synchronization between hardware<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
blocks<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
*
provide a userspace API that allows a compositor to manage<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
dependencies.<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
*
provide rich telemetry data to allow debugging slowdowns and stalls of<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
the graphics pipeline.<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
Objects:<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
*
sync_timeline<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
*
sync_pt<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
*
sync_fence<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
sync_timeline:<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
A sync_timeline is an abstract monotonically increasing
counter. In general, each driver/hardware block context will have one of
these. They can be backed by the
appropriate hardware or rely on the generic sw_sync implementation.<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
Timelines are only ever created through their specific
implementations<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
(i.e. sw_sync.)<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
sync_pt:<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
A sync_pt is an abstract value which marks a point on a
sync_timeline. Sync_pts have a single timeline parent. They have 3 states: active, signaled, and
error.<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
They start in active state and transition, once, to either
signaled (when the timeline counter advances beyond the sync_pt’s value) or
error state.<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
sync_fence:<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
Sync_fences are the primary primitives used by drivers to
coordinate synchronization of their buffers.
They are a collection of sync_pts which may or may not have the same timeline parent. A sync_pt can only exist in one fence and the
fence's list of sync_pts is immutable once created. Fences can be waited on synchronously or
asynchronously. Two fences can also be
merged to create a third fence containing a copy of the two fences' sync_pts. Fences are backed
by file descriptors to allow userspace to coordinate the display pipeline dependencies.<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
Use:<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
A driver implementing sync support should have a work
submission function which:<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
* takes a fence
argument specifying when to begin work<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
*
asynchronously queues that work to kick off when the fence is signaled<span class="MsoCommentReference"><span style="font-size: 8.0pt;"> </span></span><o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
* returns a fence to indicate when its work
will be done.<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
*
signals the returned fence once the work is completed.<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
Consider an imaginary display driver that has the
following API:<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
/*<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
* assumes buf is
ready to be displayed.<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
* blocks until the
buffer is on screen.<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
*/<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
void
display_buffer(struct dma_buf *buf);<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
The new API will become:<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
/*<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
* will display buf when fence is signaled.<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
*
returns immediately with a fence that will signal when buf<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
* is
no longer displayed.<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
*/<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt; mso-layout-grid-align: none; text-autospace: none;">
struct sync_fence* display_buffer(struct dma_buf *buf,<o:p></o:p></div>
<div class="MsoNormal" style="margin-left: 36.0pt;">
struct
sync_fence *fence);<o:p></o:p><br />
<br /></div>
<div class="MsoNormal">
</div>
<div>
<div style="text-align: left;">
<span style="color: windowtext; font-size: 11pt; font-weight: normal;"><span style="font-family: Times, Times New Roman, serif;">The relationships between the
objects described above is depicted in the diagram below.</span></span></div>
<div style="text-align: left;">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFmMBAz8mjcRkCVeFAWDGqY1kv0AJrLNwrL7SS-7TSJonFnd2wGxKCvPhGGw77VX0pEWzg7VGWor21BbTksOLpYTcYRKPBpS45j2YwZEl05i3P0dlMUQT6vlC3G5X4iC2koNcNoflvteZY/s1600/sync_components.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="454" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFmMBAz8mjcRkCVeFAWDGqY1kv0AJrLNwrL7SS-7TSJonFnd2wGxKCvPhGGw77VX0pEWzg7VGWor21BbTksOLpYTcYRKPBpS45j2YwZEl05i3P0dlMUQT6vlC3G5X4iC2koNcNoflvteZY/s1600/sync_components.jpg" width="640" /></a></div>
<span style="color: windowtext; font-size: 11pt; font-weight: normal;"><br /></span></div>
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div>
<h2 style="text-align: left;">
Android Fence Implementation Details </h2>
<div class="MsoNormal">
User-space code can choose between a C++ fence implementation
(using the Fence class) and a C code library implementation. The C++ implementation is just a lean wrapper
around the sync C library code, and the C library does little more than invoke
ioctl system calls on a kernel device implementing the synchronization API.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The Android kernel includes the ‘sync’ module, also known as
the synchronization framework, which implements the Timeline, Fence, and
Synchronization Point infrastructure.
This module can be leveraged by hardware device drivers which choose to
implement the synchronization API. <o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The kernel also includes a software timeline device driver (/dev/sw_sync)
which implements a software-based timeline that is not tied to any specific
hardware module. The SW timeline device
driver uses the kernel’s Synchronization framework.<br />
<br /></div>
<h3 style="text-align: left;">
Understanding the Synchronization API</h3>
<h2>
<o:p></o:p></h2>
<div class="MsoNormal">
The first step in using the Synchronization API in
user-space is creating a timeline handle (file descriptor). The sample call flow below shows how the
userspace C library creates a handle to an instance of the generic software
timeline (sw_sync) using function <b>sw_sync_timeline_create.</b><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzHf12xCa2wB2X9KYppOoRypHdya_mYjF9m4ZRLhxDpfWQx4SuUXVkHK6__MUfhoR1s5Ke6Do8Jia2zKKfePzENXn8hFVbPCP22Ge4Nf3B4blfl6cRHmei7XatrsoSUdkcUZjNaLgWpR6H/s1600/timeline_create.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="212" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzHf12xCa2wB2X9KYppOoRypHdya_mYjF9m4ZRLhxDpfWQx4SuUXVkHK6__MUfhoR1s5Ke6Do8Jia2zKKfePzENXn8hFVbPCP22Ge4Nf3B4blfl6cRHmei7XatrsoSUdkcUZjNaLgWpR6H/s1600/timeline_create.jpg" width="640" /></a></div>
<b><br /></b></div>
<div class="MsoNormal">
<b><br /></b></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="MsoNormal">
After the timeline is created, the user can arbitrarily
increase the timeline counter (<b>sw_sync_timeline_inc</b>) or create fence handles
(<b>sw_sync_fence_create</b>). Each
fence initially contains one synchronization point on the timeline. </div>
<div class="MsoNormal">
<o:p></o:p><br />
<br /></div>
<div class="MsoNormal">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="MsoNormal">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgynZhs_Z4TKW27l9ig44AzvxFcASOQHOIVLepIt2sYDoMI61BD6GJM2kltNcJk2C3bcEdlPmn8ku5Bk-vmtrOLTSWZQpCDuL_0_xdQ2jUcf9gY9Wq-gsQBITa8jIN5r3oiFISfJV79OoSE/s1600/fence_create.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="302" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgynZhs_Z4TKW27l9ig44AzvxFcASOQHOIVLepIt2sYDoMI61BD6GJM2kltNcJk2C3bcEdlPmn8ku5Bk-vmtrOLTSWZQpCDuL_0_xdQ2jUcf9gY9Wq-gsQBITa8jIN5r3oiFISfJV79OoSE/s1600/fence_create.jpg" width="640" /></a></div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
If the user needs two or more synchronization points
attached to a fence, he creates more fences and then merges them together (<b>sync_merge</b>).<br />
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "Courier New"; font-size: 9.0pt;">//
Create a generic sw_sync timeline <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: "Courier New"; font-size: 9.0pt;">int
sw_timeline = sw_sync_timeline_create();<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "Courier New"; font-size: 9.0pt;">//
Create two fences on the sw_sync timeline; at sync points 2 and 5<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: "Courier New"; font-size: 9.0pt;">int
sw_fence1 = sw_sync_fence_create(sw_timeline, "fence1", 2);<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: "Courier New"; font-size: 9.0pt;">int
sw_fence2 = sw_sync_fence_create(sw_timeline, "fence2", 5);<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "Courier New"; font-size: 9.0pt;">//
Merge sw_fence1 and sw_fence2 to create a single fence containing<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: "Courier New"; font-size: 9.0pt;">//
the two sync points<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: "Courier New"; font-size: 9.0pt;">int
sw_fence3 = sync_merge("fence3", sw_fence1, sw_fence2);<o:p></o:p></span></div>
<h3>
<o:p> </o:p></h3>
<div class="MsoNormal">
The kernel Synchronization API (for in-kernel modules) is
similar, but synchronization points need to be created explicitly:<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "Courier New"; font-size: 9.0pt;">//
Create a generic sw_sync timeline <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: "Courier New"; font-size: 9.0pt;">struct
sync_timeline* timeline = sw_sync_timeline_create(“some_name”);<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "Courier New"; font-size: 9.0pt;">//
Create a sync_pt<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: "Courier New"; font-size: 9.0pt;">struct
sync_pt *pt = sw_sync_pt_create(sfb->timeline, sfb->timeline_max);<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "Courier New"; font-size: 9.0pt;">//
Create a fence attached to a sync_pt<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: "Courier New"; font-size: 9.0pt;">struct
sync_fence *fence = sync_fence_create("some_other_name", pt);<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-family: "Courier New"; font-size: 9.0pt;">//
Attach a file descriptor to the fence<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: "Courier New"; font-size: 9.0pt;">int
fd = get_unused_fd();<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="font-family: "Courier New"; font-size: 9.0pt;">sync_fence_install(fence,
fd);<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
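<div class="MsoNormal">
For completeness, here is a sketch of how that kernel-side snippet could continue once the submitted work finishes. This is my own illustrative continuation based on the staging sw_sync/sync headers (sw_sync.h, sync.h); verify the exact signatures against your kernel version:</div>
<pre>
/* 'timeline', 'pt' and 'fence' are the objects created in the snippet above. */

/* When the submitted work completes, advance the timeline. Every sync_pt
 * whose value has now been reached (including 'pt') is signaled, and any
 * waiter on a fence containing those sync_pts is woken up. */
sw_sync_timeline_inc(timeline, 1);

/* A kernel client can also block on a fence directly
 * (timeout in milliseconds; a negative timeout waits forever). */
int err = sync_fence_wait(fence, 1000);
</pre>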
<h3 style="text-align: left;">
Using Fences for Synchronization<o:p></o:p></h3>
<div class="MsoNormal">
Recall that the timeline abstraction represents a
monotonically increasing counter, and synchronization points represent specific
future values of this counter (points on the timeline). How a timeline increases (its clock rate, so
to say) is timeline specific. A GPU, for
example, may use an internal counter interrupt to increase its timeline
counter. The generic sw_sync timeline is
manually increased by the Synchronization API client when it invokes <b>sw_sync_timeline_inc</b>. The meaning of the synchronization point values
and the method of how two synchronization points are compared to one another
are timeline specific. The sw_sync
device models simple points on a line. Whenever the Synchronization framework is
notified of a timeline counter increase, it tests whether the counter has reached (or
passed) the value of existing synchronization points on that timeline and
triggers wake-up events on the relevant fences.</div>
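<div class="MsoNormal">
To make this concrete, here is a minimal user-space sketch using the same libsync helpers shown earlier (my own example; error handling omitted, and the header that declares these helpers differs between Android releases):</div>
<pre>
// The timeline counter starts at 0; create a fence whose single sync point is at 3.
int timeline = sw_sync_timeline_create();
int fence    = sw_sync_fence_create(timeline, "frame_done", 3);

sw_sync_timeline_inc(timeline, 2);   // counter = 2: the fence is still active
sw_sync_timeline_inc(timeline, 1);   // counter = 3: the fence is now signaled
</pre>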
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghD2n0PYBkwBSbKHjqvxd_hAiST47mq-5KV6NNJDhZl-XO0kuORTBkuDYsIiPRThWiziK5RvyC-W1YkVw_a-nevCNeu50ACKI-kpWjjmj6CjrqdpJsHIiUKoQv9hx8ilqORjDd_qV14dAi/s1600/timeline_inc.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghD2n0PYBkwBSbKHjqvxd_hAiST47mq-5KV6NNJDhZl-XO0kuORTBkuDYsIiPRThWiziK5RvyC-W1YkVw_a-nevCNeu50ACKI-kpWjjmj6CjrqdpJsHIiUKoQv9hx8ilqORjDd_qV14dAi/s1600/timeline_inc.jpg" width="640" /></a></div>
<br />
Userspace clients of the Synchronization framework that want
to be notified (signaled) about fence state change use the <b>sync_wait</b><b><span style="font-family: 'Courier New'; font-size: 10pt;"> </span></b>API. Kernel<b><span style="font-family: 'Courier New'; font-size: 10pt;"> </span></b>clients of the
Synchronization framework have a similar API, but also have an API for
asynchronous fence state change notification (via callback registration).</div>
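<div class="MsoNormal">
A minimal sketch of the user-space consumer side, assuming the libsync <b>sync_wait</b> call (the timeout is in milliseconds; -1 waits forever):</div>
<pre>
// Block until every sync point in 'fence' has signaled, or 100 ms elapse.
if (sync_wait(fence, 100) < 0) {
    // Timed out or failed: the producer has not released the buffer yet.
} else {
    // All sync points have signaled: it is now safe to access the buffer.
}
</pre>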
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4W6OsTXaKqnnFokM5uSmVEKIrvOV5FBexTmdVCSR4bKJtiWzEEQktE_Xgmg_6oNtiKk5zS8vEVP9qOU6YOBX8rjYRJ6CuqNvZ4ory97QhyqIRDDF-fSqSE3hwLstMmudRaRPgHugmwP7Y/s1600/sync_wait.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="186" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4W6OsTXaKqnnFokM5uSmVEKIrvOV5FBexTmdVCSR4bKJtiWzEEQktE_Xgmg_6oNtiKk5zS8vEVP9qOU6YOBX8rjYRJ6CuqNvZ4ory97QhyqIRDDF-fSqSE3hwLstMmudRaRPgHugmwP7Y/s1600/sync_wait.jpg" width="640" /></a></div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="MsoNormal">
When userspace closes a valid sync_timeline handle, the
Synchronization framework checks if it needs to signal any active fences which
have synchronization points on that timeline.
Closing a fence handle does not signal the fence: it just removes the
fence’s synchronization points from their respective timelines.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiPn_2ZzPV6MKbBq0PH2xHnZgQlnHQHazvZkSd1s4LIQv5GpXJiwFDts7LcCiA2ZgVQMFgtSlfFghnA5j5y8g2TkYOO057vS0F_qDmZehdk1LKc-5RIVdPsSZFPvh9IQd-QM5ZgL-AtEYB/s1600/close.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="484" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiPn_2ZzPV6MKbBq0PH2xHnZgQlnHQHazvZkSd1s4LIQv5GpXJiwFDts7LcCiA2ZgVQMFgtSlfFghnA5j5y8g2TkYOO057vS0F_qDmZehdk1LKc-5RIVdPsSZFPvh9IQd-QM5ZgL-AtEYB/s1600/close.jpg" width="640" /></a></div>
<br /></div>
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="MsoNormal">
<b>Userspace C++ Fence Wrapper</b></div>
<h3>
<o:p></o:p></h3>
<div class="MsoNormal">
<ul>
<li>./frameworks/native/libs/ui/Fence.cpp</li>
<li>./frameworks/native/include/ui/Fence.h </li>
</ul>
<o:p></o:p></div>
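<div class="MsoNormal">
A rough usage sketch of the C++ wrapper, based on my reading of Fence.h (the exact method signatures vary slightly between Android versions):</div>
<pre>
#include <ui/Fence.h>

using namespace android;

// 'fenceFd' is assumed to be an acquire-fence file descriptor received from
// a producer (for example through a BufferQueue); Fence takes ownership of it.
void consumeBuffer(int fenceFd) {
    sp<Fence> acquireFence(new Fence(fenceFd));

    // Block until the fence signals before touching the buffer contents.
    acquireFence->waitForever("MyConsumer::consumeBuffer");

    // dup() returns a new file descriptor, e.g. to hand the fence to another process.
    int dupFd = acquireFence->dup();
    (void)dupFd;
}
</pre>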
<div class="MsoNormal">
<o:p></o:p></div>
<div style="text-align: left;">
<b>
Userspace C Library</b></div>
<h3>
<o:p></o:p></h3>
<div class="MsoNormal">
<ul>
<li>./system/core/libsync/sync.c </li>
</ul>
</div>
<div style="text-align: left;">
<b>
Kernel Software Timeline</b></div>
<h3>
<o:p></o:p></h3>
<div class="MsoNormal">
<ul>
<li>./linux/kernel/drivers/staging/android/sw_sync.h</li>
<li>./linux/kernel/drivers/staging/android/sw_sync.c</li>
<li>./external/kernel-headers/original/linux/sw_sync.h </li>
</ul>
<o:p></o:p></div>
<div class="MsoNormal">
<o:p></o:p></div>
<div style="text-align: left;">
<b>
Kernel Fence Framework</b></div>
<h3>
<o:p></o:p></h3>
<div class="MsoNormal">
<ul>
<li>./external/kernel-headers/original/linux/sync.h</li>
<li>./linux/kernel/drivers/staging/android/sync.h</li>
<li>./linux/kernel/drivers/staging/android/sync.c</li>
</ul>
<o:p></o:p></div>
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
<o:p></o:p></div>
<div class="MsoNormal">
</div>
<div class="MsoNormal">
<br /></div>
</div>
</div>
netazhttp://www.blogger.com/profile/13820189991503080577noreply@blogger.com9tag:blogger.com,1999:blog-182549309027052933.post-65217140748073954992013-04-13T11:58:00.004+03:002013-10-18T13:49:09.308+03:00Broken windows will turn your code to spaghetti<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
Software engineers don't need to read "<a href="http://www.amazon.com/AntiPatterns-Refactoring-Software-Architectures-Projects/dp/0471197130" target="_blank">AntiPatterns: Refactoring Software, Architectures, and Projects in Crisis</a>" to know what "spaghetti code" means. But what does it have to do with broken windows?<br />
<br />
Enough <a href="http://www.rtuin.nl/2012/08/software-development-and-the-broken-windows-theory/" target="_blank">has been written</a> on the topic of the Broken Windows Theory and its relation to software development so there is no reason for me to repeat this yet again. If this is your first introduction to the topic, just Google "Broken Windows Software" and you'll get plenty of background information and opinions on the topic.<br />
I started thinking about the connection between the Broken Windows Theory and spaghetti code some time after hearing former New York mayor Rudy Giuliani talk about the cleaning up of NYC's streets. I was a software team leader at the time, and I formed my private theory based on empirical observations I made on my team and others around the company. I could see the spaghetti start cooking whenever we loosened our standards of practice, either because of pressure to deliver or just plain sloth. It takes discipline, time, and energy to fix every broken piece of code, design, or document and this is never easy. Methodologies are easier to follow if you start from a place of conviction, and so I began sharing and discussing my "theory" with the team. Not everyone bought into the story, but years after I still believe (and practice) constant refactoring of the code-base is essential in order to prevent "chaos creep" and eventual spaghetti code. However, it is interesting to note that today there are <a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&ved=0CD0QFjAB&url=http%3A%2F%2Fwww.forbes.com%2Fsites%2Fmarkbergen%2F2012%2F05%2F16%2Frudy-giuliani-still-egregiously-wrong-on-crime%2F&ei=U_toUaiKIuWX1AWfloGgBg&usg=AFQjCNFD2rub9JfMm5uHblu77PTnOoxtng&sig2=p41R9l3gdKWTSAR0yrhNmg" target="_blank">many who believe Giuliani was wrong</a> and that Broken Windows had nothing to do with the reduction of the crime rate in NYC during the '90s.<br />
<br />
It is the engineer's job to fix broken windows, but whose responsibility is it to make sure that broken windows are fixed? If you're a manager, then it's your responsibility. In everything you do: how you prioritize tasks, how you treat documentation, how you treat your own broken windows, and, not least, how you reward engineers with a knack for clean, shiny windows - you determine what kind of code base you end up with. Having guidelines, coding standards, designs and architectures to follow is essential to discern broken from fixed, but it is not sufficient.<br />
<br />
And the higher up you are in the management ladder, the broader the consequences of your values, your attitude and your actions - on everything your engineers do; and that includes fixing broken windows. So I try to follow these guidelines:<br />
<ul>
<li>Tell the team that a clean code-base is important to me. Again. And again. And again. In fact, most messages to your team need to be repeated over and again; and this is especially true when you want to change team behavior patterns until they form new habits.</li>
<li>Show my team that a clean code-base is important to me by rewarding engineers who exhibit extra care for the code base. "Rewarding" has many manifestations, but if this is done in a "public" manner I increase the impact of my message because the entire team is made aware. Backing my code-base "gate keepers" against internal and external "sources of entropy" is very important as it solidifies trust and mutual respect.</li>
<li>Point out broken windows as I review code, git commit messages, documentation, presentations and designs.</li>
</ul>
<div>
Many times when a broken window is brought to my attention as a manager, I cannot afford to halt development to immediately attend to the fix. There are customers waiting at the end of the rainbow and they have schedules and priorities. But ignoring or dismissing a broken window brought to my attention will have consequences on how the development team perceives my values, so I try to:</div>
<ul>
<li>Never ignore an engineer who reports a broken window. I either add it to the team's task burn-down list or file a bug report, as a minimum first stage of showing intention. It is important to make the engineers part of the discussion of how and when to do the fix, since ultimately I want them to be accountable for the quality of the code base (shifting accountability from the team leader to the team members deserves its own post which I hope to write some time).</li>
<li>Schedule a percentage of the team's time to specifically handle broken windows. Adding this to the work plan helps fight the temptation to cave under the daily work pressure. This also shows the engineers that broken windows are not buried and ignored in some list or database and gives credibility to the act of deferring fixes to a later stage.</li>
</ul>
<div>
Of course, not all code bases are alike and if your code base is short lived because you are in an exploratory phase, a start-up in its seed phase, or producing a one-time demo, then your energy should be focused elsewhere. This is where the saying "First things first, second things never" is most applicable. But when building long-lasting code-bases these "second things" can bring havoc if ignored for too long. Fixing every small issue that threatens the integrity of your code is hard work for your engineers, but it starts, and continues, from hard work that you - the manager - puts in.</div>
<br />
<ul style="text-align: left;">
</ul>
</div>
netazhttp://www.blogger.com/profile/13820189991503080577noreply@blogger.com0tag:blogger.com,1999:blog-182549309027052933.post-70263204119825360252013-03-24T00:45:00.003+02:002013-04-13T12:11:56.292+03:00The Innovator's Dilemma?<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<i><a href="http://www.amazon.com/Innovators-Dilemma-Revolutionary-Change-Business/dp/0062060244" target="_blank">The Innovator's Dilemma</a></i> - I watch and live the dilemma happen every day, as the company I work for tries to recover from the disruption that took us by a surprise reserved only for the disrupted. A blind spot that plagues the best of companies, as this book describes very well.</div>
<div style="text-align: left;">
As I leafed through the pages of <i>The Innovator's Dilemma </i>today, a book that I've read before it all became so personal, I asked myself these questions:</div>
<ul style="text-align: left;">
<li>Anecdotal evidence shows that change is happening faster than ever and adaptation to these changes needs to be at least as fast. If we failed to innovate we should at least have the capacity to identify innovative trends and quickly realign to them. It is interesting to study what structural, psychological, and cultural qualities and attributes characterize R&D groups that rise to the challenge of responsive innovation while catering to the needs of existing customers and products. <br />
</li>
<li>As the excerpt below so honestly describes, when innovation is not in your DNA (or company mission), R&D middle management plays a crucial role in unknowingly supporting or hampering innovation. How do we align the low-level decisions with the strategic decision to keep innovation a priority?</li>
<li>As we manage our own professional career, aren't we exposed to the same forces and circumstances which can "cause great <strike>firms</strike> employees to fail" (if I may paraphrase the title)? Is our private disruption around the corner? Are we correctly spotting our personal disruptive threats and opportunities?</li>
</ul>
<div style="text-align: left;">
<i>"As we saw in chapter 4, resource allocation is not simply a matter of top-down decision making f</i><i>ollowed by implementation. Typically, senior managers are asked to decide whether to fund a project </i><i>only after many others at lower levels in the organization have already decided which types of project </i><i>proposals they want to package and send on to senior management for approval and which they don’t </i><i>think are worth the effort. Senior managers typically see only a well-screened subset of the innovative </i><i>ideas generated.</i></div>
<div style="text-align: left;">
<i>And even after senior management has endorsed funding for a particular project, it is rarely a “done </i><i>deal.” Many crucial resource allocation decisions are made after project approval—indeed, after </i><i>product launch—by mid-level managers who set priorities when multiple projects and products </i><i>compete for the time of the same people, equipment, and vendors. As management scholar Chester </i><i>Barnard has noted: </i></div>
<div style="text-align: left;">
<i>From the point of view of the relative importance of specific decisions, those of executives properly </i><i>call for first attention. But from the point of view of aggregate importance, it is not decisions of </i><i>executives, but of non-executive participants in organizations which should enlist major interest. </i></div>
<div style="text-align: left;">
<i>So how do non-executive participants make their resource allocation decisions? They decide which </i><i>projects they will propose to senior management and which they will give priority to, based upon their </i><i>understanding of what types of customers and products are most profitable to the company. Tightly </i><i>coupled with this is their view of how their sponsorship of different proposals will affect their own </i><i>career trajectories within the company, a view that is formed heavily by their understanding of what </i><i>customers want and what types of products the company needs to sell more of in order to be more </i><i>profitable. Individuals’ career trajectories can soar when they sponsor highly profitable innovation </i><i>programs. It is through these mechanisms of seeking corporate profit and personal success, therefore, </i><i>that customers exert a profound influence on the process of resource allocation, and hence on the </i><i>patterns of innovation, in most companies."</i></div>
</div>
netazhttp://www.blogger.com/profile/13820189991503080577noreply@blogger.com0tag:blogger.com,1999:blog-182549309027052933.post-82123662509275293642011-10-08T00:16:00.000+02:002011-10-08T00:20:34.624+02:00Be Afraid<span style="font-family: arial;">From </span><a style="font-family: arial;" href="http://www.economist.com/node/21530986">The Economist</a><span style="font-family: arial;">: amusing...</span><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSjEa7fieDqBSv0sC7mdnRE9mY6dZLY3JsEKlEgkXW_lbL8O5cnX0CE2ILek3EVWF-cV17nZpy-QVwCzzPyEYeVCSNSzNUvT1y51gMlISMyV-UsSSPXRNZ-EL04QrVMndBT52xmRP7O3IT/s1600/be_afraid.jpg"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 225px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSjEa7fieDqBSv0sC7mdnRE9mY6dZLY3JsEKlEgkXW_lbL8O5cnX0CE2ILek3EVWF-cV17nZpy-QVwCzzPyEYeVCSNSzNUvT1y51gMlISMyV-UsSSPXRNZ-EL04QrVMndBT52xmRP7O3IT/s400/be_afraid.jpg" alt="" id="BLOGGER_PHOTO_ID_5660878361855224018" border="0" /></a>netazhttp://www.blogger.com/profile/13820189991503080577noreply@blogger.com0tag:blogger.com,1999:blog-182549309027052933.post-62068215815668440832009-11-15T00:54:00.000+02:002009-11-15T01:34:47.484+02:00Why is Google holding out information about Chrome OS??Google has not released a single document, let alone word, about it's Chrome OS, since the introductory Google Blog (http://googleblog.blogspot.com/2009/07/introducing-google-chrome-os.html).<br /><br />What is stopping them from answering the most fundamental questions people have about this OS.<br /><ul><li>What hardware and processors will Chrome support?</li><li>Will the Chome OS support ARM processors in its first release (ARM processors are not prevalent in the notebook market, which Google is aiming the OS for)? </li><li>Will it require a hard-drive?</li><li>Will it use the Android kernel or will Chrome OS introduce another kernel?</li><li>Will the promised fast boot feature require special kernel or hardware support?<br /></li><li>Will it support local applications? Will these only be HTML/CSS/JavaScript applications or will binary applications be able to interact with the Windowing System?</li><li>What cloud applications will Chrome OS support? What will be the limitations of these applications?<br /></li><li>Will Flash 10 be supported?</li><li>How will the OS interface acceleration hardware for video and graphics?</li><li>Will the system require OpenGL ES 2.0, or will it settle for OGL ES 1.1 (Flash 10 is rumored to use OGL ES 2.0 in its rendering engine)?<br /></li><li>Will various sensors, such as an accelerometer, be supported?</li><li>What license will be Chrome OS be released under?</li><li>Will Gears be used for offline access and it be Open Source?</li><li>What security features be required for a secure platform and will Chrome OS allow for security hardware integration?<br /></li></ul>netazhttp://www.blogger.com/profile/13820189991503080577noreply@blogger.com0