To verify that the performance of Smart Beholder is at least comparable to that of commercial smart lenses, we consider two commercial products: Altek Cubic and Sony DSC-QX100. Both offer in-house mobile apps on Android devices. Figure 1 shows the two smart lenses, whose retail prices are approximately US$500 for Sony QX100 and US$100 for Altek Cubic. In comparison, the Raspberry Pi board we use in the prototype implementation costs around US$80. One major difference between Smart Beholder and the commercial products is that Smart Beholder is fully configurable and supports different preview resolutions. The two commercial products support only fixed preview resolutions, which are not disclosed to users.

Figure 1: Photos of the considered smart lenses: Sony QX100 (left) and Altek Cubic (right).


8.1  Viewfinder Resolutions of Commercial Smart Lenses

Since the actual resolution of the real-time preview on the viewfinder is unknown for the commercial products, we need to conduct some tests to find out their resolutions in order to have a fair comparison. We use a PIMA/ISO 12233 Resolution Test Chart to infer the preview resolutions. The experiment setup is presented in Figure 2. We place the smart lens about 30 centimeters away from the test chart, so that the test chart on the viewfinder approximately spans the viewable area, and stream preview videos to the tablet on the right. We use only the lower-middle part of the test chart, where 10 numbered blocks with vertical lines of different densities (and widths) are arranged in a row. The block numbers indicate how dense these vertical lines are: from 1 (fewest, thickest lines) to 10 (most, thinnest lines), and we refer to the block number as the line density. Counting the lines in blocks with higher line densities requires video previews with higher resolutions. We define the viewable line ratio as the fraction of distinguishable vertical lines over all vertical lines in the test chart.


Figure 2: Experiment setup for inferring the preview resolutions of commercial smart lenses.

We then run the following experiments with Altek Cubic, Sony QX100, and Smart Beholder (the latter under 5 different resolutions). For each smart lens (and resolution), we take screenshots of the live preview of the resolution test chart and crop the line-density testing region, as shown in Figure 3. We then convert the cropped screenshots to binary (black and white) images using a threshold of 128 (with gray levels ranging from 0 to 255) and programmatically count the number of vertical lines in each block (from 1 to 10) of the video previews in order to calculate the viewable line ratios. We plot the results in Figure 4. This figure shows that Altek Cubic and Sony QX100 achieve viewable line ratios over different line densities that are very similar to those of Smart Beholder at 320x240 and 640x480 resolutions, respectively. Hence, we conclude that the preview resolutions of these two commercial products are approximately 320x240 and 640x480, respectively. Even though the quality of the camera lenses may differ considerably, we believe that the (relatively low) resolution of the preview video dominates how distinguishable the thin lines are.
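The binarize-and-count step described above can be sketched as follows. This is only an illustration under stated assumptions, not the tool used in the experiments: the function names are hypothetical, a cropped block is assumed to be available as rows of gray levels, pixels below the threshold are treated as dark, the median over scanlines is an assumed heuristic for robustness, and the ratio is computed per block and capped at 1.

```python
THRESHOLD = 128  # gray levels range from 0 to 255, as stated in the text


def binarize(gray_rows, threshold=THRESHOLD):
    """Map each gray level to 1 (dark, part of a line) or 0 (light background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray_rows]


def count_lines_in_row(binary_row):
    """Count runs of dark pixels along one horizontal scanline."""
    count, previous = 0, 0
    for px in binary_row:
        if px == 1 and previous == 0:  # a light-to-dark transition starts a new line
            count += 1
        previous = px
    return count


def count_vertical_lines(gray_rows):
    """Use the median scanline count as the block's line count (assumed heuristic)."""
    counts = sorted(count_lines_in_row(row) for row in binarize(gray_rows))
    return counts[len(counts) // 2]


def viewable_line_ratio(counted, expected):
    """Per-block ratio of distinguishable lines to lines on the chart, capped at 1."""
    return min(counted, expected) / expected


# A sharp block: three dark lines separated by light gaps.
sharp_block = [[255, 255, 0, 0, 255, 255, 0, 0, 255, 255, 0, 0, 255]] * 3
# A blurred block: the first two lines merge because the gap falls below the threshold.
merged_block = [[255, 0, 100, 0, 255, 0, 255]] * 3

sharp_count = count_vertical_lines(sharp_block)
merged_count = count_vertical_lines(merged_block)
```

In this toy input the sharp block keeps all three lines, while the blurred block loses one because two neighboring lines merge into a single dark run, which is the effect that lowers the viewable line ratio at higher line densities.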


Figure 3: Live preview screenshots of the line-density testing region on the resolution test chart.


Figure 4: Viewable line ratios of different smart lenses under diverse line densities.

8.2  Experiment Setup

We create two video/image datasets for objective and subjective performance metrics. For objective metrics, we use a Canon EOS 600D camera to capture eight segments of 25-sec videos at 720p. Half of the segments are taken indoors and the other half outdoors, and all segments are taken under typical smart lens usage scenarios. We concatenate the eight segments into a 216-sec video, inserting a 2-sec white screen between any two consecutive segments to reset the video codecs and minimize interference across segments. This dataset represents typical viewfinder previews and is suitable for objective metrics. It is, however, less suitable for evaluating subjective preview quality due to its relatively low resolution. For subjective metrics, we collect 9 popular high-resolution (1080p) Creative Commons (CC) photos from Flickr. We play each photo for 10 seconds and record the viewfinder previews using different smart lenses.



Figure 5: Experimental setup: (top) testbed and (bottom) procedures.

Figure 5 (top) shows the testbed used in our lab. We play the videos on the video source display on the right, put a Smart Beholder (or other smart lens) server in front of the video source display, and send the previews to the corresponding smart lens client running on a tablet (Sony Xperia). The server and the mobile client are placed 1 meter apart. The tablet is connected to an external monitor on the left. Finally, we use a Canon EOS 600D camera to capture videos of the two side-by-side displays at 60 fps. The captured video is then used to derive the performance results. We also run tcpdump on the tablet to capture the traffic and calculate the transmitted bitrate. Figure 5 (bottom) summarizes the measurement procedure.
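As a rough illustration of how a bitrate can be derived from a packet capture, the sketch below assumes the tcpdump trace has already been reduced to a list of (timestamp in seconds, packet length in bytes) pairs for server-to-client traffic. The parsing step, the function names, and the sample values are hypothetical and are not taken from the measurement scripts used in the experiments.

```python
def average_bitrate_bps(packets):
    """Average bitrate over the capture; packets is a list of (timestamp_s, length_bytes)."""
    if len(packets) < 2:
        raise ValueError("need at least two packets to measure a duration")
    timestamps = [t for t, _ in packets]
    duration = max(timestamps) - min(timestamps)
    if duration <= 0:
        raise ValueError("capture duration must be positive")
    total_bits = 8 * sum(length for _, length in packets)
    return total_bits / duration


def per_second_bitrates_bps(packets):
    """Bits received in each one-second bin, counted from the first packet."""
    start = min(t for t, _ in packets)
    bins = {}
    for t, length in packets:
        index = int(t - start)  # bin 0 covers [start, start + 1), and so on
        bins[index] = bins.get(index, 0) + 8 * length
    return [bins.get(i, 0) for i in range(max(bins) + 1)]


# Hypothetical capture: four 1000-byte packets over two seconds.
sample_packets = [(0.0, 1000), (0.5, 1000), (1.0, 1000), (2.0, 1000)]
average = average_bitrate_bps(sample_packets)
per_second = per_second_bitrates_bps(sample_packets)
```

The per-second bins are useful for seeing how bursty the stream is, whereas the average is the single number that would be compared against a configured target bitrate.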

For a subjective evaluation of preview quality, we conduct a crowdsourcing-based user study over the Internet via a web interface. We present the original images (from Flickr) on the left half of the web page, and the degraded images (extracted from the viewfinder previews) on the right half. For each comparison, a subject gives a DMOS (Differential Mean Opinion Score) between 0 (un-degraded) and 6 (seriously degraded and unacceptable). We convert the DMOS score to an MOS score by MOS = 7 − DMOS and use the resulting MOS score as the image quality metric. We recruit 52 subjects and perform 117 sessions with a total of 14,410 comparison rounds. The total study duration is 30 hours, where each session lasts for 15 minutes on average.
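The score conversion above is simple enough to state in code. The following is a minimal sketch with hypothetical function names and made-up ratings; it only encodes MOS = 7 − DMOS on the 0 to 6 DMOS scale described in the text, which yields MOS values between 1 and 7.

```python
def dmos_to_mos(dmos):
    """Convert a DMOS rating in [0, 6] to an MOS in [1, 7] via MOS = 7 - DMOS."""
    if not 0 <= dmos <= 6:
        raise ValueError("DMOS must lie between 0 and 6")
    return 7 - dmos


def mean_mos(dmos_ratings):
    """Average MOS over a collection of DMOS ratings for one lens or configuration."""
    scores = [dmos_to_mos(rating) for rating in dmos_ratings]
    return sum(scores) / len(scores)


# Made-up ratings for illustration only.
example_mean = mean_mos([0, 1, 2])
```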

We consider the following performance metrics:

  • Bitrate, which is the number of bits sent from the server to mobile client every second.
  • Latency, which is the delay between capturing and displaying a video frame on mobile client.
  • Frame rate, which is the number of frames rendered at mobile client every second.
  • Preview quality, which is the MOS score. We focus on previews, because final images/videos are either saved on smart lenses, or transferred to mobile clients in batches at very high quality.
The first three metrics are objective and the last one is subjective. We report mean results with 95% confidence intervals whenever applicable.
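For readers who want to reproduce the reporting style, one common way to compute a mean with a 95% confidence interval is sketched below. It assumes a normal approximation (z close to 1.96) and independent samples; the text does not state which interval construction was used, so this is an assumption, and the function name and sample values are hypothetical.

```python
import math
import statistics


def mean_with_ci95(samples):
    """Return (mean, lower, upper) using a normal-approximation 95% interval."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate variability")
    mean = statistics.fmean(samples)
    standard_error = statistics.stdev(samples) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.975)  # two-sided 95% quantile, about 1.96
    half_width = z * standard_error
    return mean, mean - half_width, mean + half_width


# Made-up measurements for illustration only.
mean, lower, upper = mean_with_ci95([4.0, 5.0, 6.0])
```

With small sample counts, a Student's t quantile would give a wider and more conservative interval than the normal approximation used here.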

8.3  Evaluation Results

The Smart Beholder results presented here are obtained with a static configuration of 3 Mbps bitrate, 24 fps, and 640x480 resolution. The smart parameter adaptation algorithm is disabled to keep the comparison simple, as the two commercial products do not support adaptation mechanisms.

Smart Beholder vs. Commercial Smart Lens

We plot the overall performance in Figure 6. Figure 6(a) shows that Smart Beholder consumes as little as half of the bitrate of the commercial smart lenses. This conforms to our expectation, as we configure Smart Beholder to use an average bitrate of 3 Mbps. Figure 6(b) reveals that Smart Beholder achieves a latency that is at least 50 ms shorter, which in turn leads to a more responsive user experience. Figure 6(c) shows that Smart Beholder achieves a comparable, in fact slightly higher, frame rate than the two commercial smart lenses. In summary, Smart Beholder outperforms the two commercial products in all considered objective performance metrics. We report the preview image quality in Figure 6(d), which shows that Smart Beholder achieves better MOS scores than the two commercial smart lenses. More importantly, this higher preview quality does not come at the cost of higher network or system load, as Figures 6(a) and 6(b) show.


Figure 6: Overall performance comparisons among smart lenses: (a) bitrate, (b) latency, (c) frame rate, and (d) preview quality.

Configurability of Smart Beholder

Smart Beholder is highly reconfigurable: frame rate, resolution, and bitrate, among many other parameters, can be set according to different needs. With each configuration, we measure the system performance using the 216-sec preview video. Two sets of sample results are given below. First, we adjust the target encoding bitrate (with 640x480 resolution and 24 fps frame rate) and present the results in Figure 7. This figure shows that when the target bitrate is increased, the achieved bitrate (Figure 7(a)) and the latency (Figure 7(b)) increase, while the frame rate (Figure 7(c)) slightly decreases. We believe the slightly increased latency and slightly decreased frame rate are due to the higher complexity and workload of decoding the preview videos at the client.


Figure 7: Beholder performance with different target bitrates: (a) achieved bitrate, (b) latency, and (c) frame rate.


Figure 8: Beholder performance under different target frame rates: (a) bitrate, (b) latency, and (c) frame rate.

Next, we adjust the target frame rate and give the results in Figure 8. Figure 8(c) shows that Smart Beholder always achieves the target frame rates, which reveals the efficiency of its implementation. Figures 8(a) and 8(b) show that higher target frame rates lead to higher bitrate and lower latency, which is consistent with our intuition, as a 6 fps live preview incurs at least 1000 ms÷6 ≈ 167 ms of latency. In summary, Figures 7 and 8 demonstrate the configurability of the Smart Beholder implementation.
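The lower bound quoted above is simply the interval between consecutive frames. A small sketch, with a hypothetical function name, makes the arithmetic explicit for the two frame rates mentioned in this section (6 fps and 24 fps).

```python
def frame_interval_ms(fps):
    """Time between consecutive frames in milliseconds, a lower bound on preview latency."""
    if fps <= 0:
        raise ValueError("frame rate must be positive")
    return 1000.0 / fps


interval_at_6_fps = frame_interval_ms(6)
interval_at_24_fps = frame_interval_ms(24)
```

At 6 fps the interval is about 167 ms, while at 24 fps it drops to about 42 ms, which matches the observation that higher target frame rates reduce latency.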