Figure 1: A Smart Beholder prototype on Raspberry Pi.

Smart lenses are detachable lenses connected to mobile devices via wireless networks. They are not constrained by the small form factor of mobile devices and have the potential to deliver better photo (video) quality. However, the viewfinder previews of smart lenses on mobile devices are difficult to optimize, due to the strict resource constraints on smart lenses and fluctuating wireless network conditions.

We design, implement, and evaluate an open-source smart lens, called Smart Beholder. The term "beholder" normally refers to a fictional flying orb with a large eye (commonly seen in AD&D games), in analogy to very portable smart lenses with powerful (but potentially bulky) optical lenses. We carefully design and implement Smart Beholder for cost effectiveness, low interaction delay, and high viewfinder preview quality. This is done by (i) selecting an embedded system board that is just powerful enough, (ii) minimizing per-component latency, and (iii) dynamically adapting the video coding parameters to maximize Quality of Experience (QoE), respectively. Several optimization techniques, such as an anti-drifting mechanism for video frames and a QoE-driven resolution/frame rate adaptation algorithm, are employed.

Our extensive measurement study indicates that Smart Beholder outperforms a number of commercial products, with lower bitrate, lower latency, slightly higher frame rate, and better preview quality. We also demonstrate that Smart Beholder adapts to network dynamics. Smart Beholder is provided to the public as an experimental platform for researchers and developers working on optimized smart lenses and other embedded real-time video streaming systems.

Figure 2: A working Smart Beholder prototype with an Android-based client connecting to it.

Smart Beholder is built to be an open platform for researchers and developers to evaluate different design alternatives, so as to make educated, if not optimal, design decisions. A complete Smart Beholder platform is illustrated in Figure 2. In this picture, the Smart Beholder server runs on a Raspberry Pi board on the left; the Smart Beholder client runs on an Android tablet showing what the webcam on the server points to. Our evaluation results show the practicality and efficiency of the Beholder platform over the considered commercial products.


The popularity of smartphones has grown dramatically in the past few years, and the growth rate shows no sign of slowing down. For example, a recent report indicates that more than 1 billion smartphones were shipped in 2013, a 38.4% increase compared to 2012. Smartphones come with cameras, and casual photographers use them to replace their digital cameras.

In fact, we observe clear drops in the number of shipped digital cameras starting from 2012, and the projected shipment volume of digital cameras is only 54 million in 2014. This replacement effect may be attributed to the convenience of smartphones and the shrinking performance gap between smartphone cameras and digital cameras. Nevertheless, there are still well-known reasons that differentiate digital cameras from smartphone cameras:

  1. Smartphones must be compact in order to fit into users' pockets. Therefore, most smartphone cameras are not equipped with bulky optical zoom lenses. The users have to resort to suboptimal digital zooms.
  2. Smartphones often come with smaller optical sensors due to the space concerns, which lead to inferior photo quality especially under low-light conditions.
  3. Smartphones do not support interchangeable lenses, and cannot adopt long-focus, macro, fish-eye, and wide-angle lenses for high-quality and more extreme needs.
  4. Smartphone cameras are not normally adjustable in terms of, e.g., ISO, aperture, and shutter speed, which limits their flexibility.

These limitations prevent mobile photographers from producing high-quality photos using their smartphones.

Smart lenses, such as Sony DSC-QX100, Kodak SL10, and Altek Cubic, are detachable lenses connected to mobile devices via wireless networks. Mobile photographers use smartphones (or tablets) to access the smart lenses for: (i) previewing photos (or videos) in live viewfinders, (ii) adjusting various lens configurations, (iii) capturing photo (or video) shots, and (iv) applying digital effects. Since smart lenses are no longer embedded in smartphones, the form factor of smart lenses is not limited by that of smartphones. Therefore, smart lenses are capable of addressing the aforementioned limitations, closing the equipment gap between mobile and professional photographers. Moreover, some special shooting angles, such as low-angle and close-up shots, are easier to take with smart lenses as they are detached from viewfinders on smartphones.

While smart lenses offer such new opportunities to mobile photographers, delivering a good photo-taking experience is not an easy task because the smart lenses are connected to smartphones via wireless networks, which are sensitive to fading, shadowing, and interference. Furthermore, mobile photographers demand two conflicting properties of viewfinder previews: (i) low interaction latency and (ii) high video quality. Achieving both at the same time is quite challenging and leaves a huge design space for researchers and developers to explore. However, existing commercial smart lenses are proprietary and closed, and do not permit the customization and parameter tuning needed to explore that design space.


Design Objectives

Smart Beholder aims to provide an open platform for researchers and developers to study and build real-time mobile photography applications. The design objectives of the proposed Smart Beholder platform include:

  1. Cost effectiveness. To allow more researchers and developers to use Smart Beholder, we carefully select more cost-effective hardware components. Users are, however, free to swap in/out any of the components to meet their needs.
  2. Low latency. Users will be annoyed by long latency when using mobile photography applications, especially in viewfinder previews. To provide a better photo-taking experience, we strive to minimize the latencies of all involved components.
  3. High quality. A fundamental requirement for photography is maximizing the photo (video) quality. The photo (video) quality largely depends on the hardware specifications of camera components, which are out of the scope of this work. On the other hand, sending live previews (for both photo and video shots) incurs a huge amount of real-time traffic, and thus leaves more room for optimization. Indeed, the quality of live previews significantly affects how a photographer sees and composes photos, especially for portraits and close-up shots. Hence, for higher QoE, we endeavor to maximize the preview video quality.

We emphasize that concurrently achieving all design goals is no easy task. For example, we have to optimize individual components in the video processing pipeline to minimize the system-wide latency. In addition, we need to consider multiple user-perceived quality metrics, such as graphics quality and interactivity, which further complicates the design of Smart Beholder.
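
To make the relation between per-component and system-wide latency concrete, the following minimal Python sketch models the video processing pipeline as a chain of sequential stages. It assumes the stages do not overlap in time, so their latencies simply add; the stage names and all numbers are placeholders chosen for illustration, not measurements of Smart Beholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    latency_ms: float  # hypothetical latency of this stage, in milliseconds


def total_latency_ms(stages):
    """System-wide latency, assuming strictly sequential, non-overlapping stages."""
    return sum(s.latency_ms for s in stages)


def bottleneck(stages):
    """Return the stage that contributes the most latency."""
    return max(stages, key=lambda s: s.latency_ms)


# Placeholder numbers for illustration only; they are not measurements.
pipeline = [
    Stage("capture", 30.0),
    Stage("encode", 20.0),
    Stage("network", 10.0),
    Stage("decode", 15.0),
    Stage("render", 5.0),
]

print(total_latency_ms(pipeline))  # 80.0
print(bottleneck(pipeline).name)   # capture
```

Under this additive assumption, reducing the latency of the largest stage yields the largest reduction in end-to-end latency, which is one reason to profile each component separately.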

System Architecture

Figure 3: The server and client architecture of Smart Beholder.

The server and client architecture of Smart Beholder is given in Figure 3. Smart Beholder is inspired by cloud gaming and screen sharing platforms, but concentrates on solving the unique challenges of smart lenses, including:

  1. resource constraints of embedded system boards,
  2. uncertainty of single-hop short-range networks, and
  3. high overhead of external camera modules.

The Beholder server runs on an embedded system board, and consists of three software components: AP (Access Point) service, DHCP (Dynamic Host Configuration Protocol) server, and video streamer. The AP service turns the server into an access point, allowing Beholder clients to connect to the server via Wi-Fi (or other wireless networks). The DHCP server assigns IP addresses to connected mobile clients. Meanwhile, the video streamer: (i) captures videos using a camera, (ii) encodes videos using software/hardware codecs, and (iii) streams encoded videos via the RTSP (Real-Time Streaming Protocol) and RTP (Real-time Transport Protocol) servers.
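
The three steps of the video streamer (capture, encode, stream) can be sketched as a chain of Python generators. This is only a structural illustration under stated assumptions: the frames are synthetic, `zlib` stands in for a real video codec, and `packetize` is a simplified placeholder that does not implement RTP. The function names and the `MTU_PAYLOAD` value are hypothetical and do not come from the Smart Beholder code base.

```python
import zlib
from typing import Iterable, Iterator, Tuple

MTU_PAYLOAD = 1200  # assumed maximum payload per packet, in bytes


def capture(num_frames: int, frame_size: int = 4096) -> Iterator[bytes]:
    """Stand-in for the camera: yields synthetic raw frames."""
    for i in range(num_frames):
        yield bytes([i % 256]) * frame_size


def encode(frames: Iterable[bytes]) -> Iterator[bytes]:
    """Stand-in for the codec; zlib is not a video codec, it only marks the stage."""
    for frame in frames:
        yield zlib.compress(frame)


def packetize(encoded: Iterable[bytes]) -> Iterator[Tuple[int, int, bytes]]:
    """Split each encoded frame into (sequence number, frame index, payload)."""
    seq = 0
    for frame_idx, data in enumerate(encoded):
        for off in range(0, len(data), MTU_PAYLOAD):
            yield seq, frame_idx, data[off:off + MTU_PAYLOAD]
            seq += 1


# Frames flow through the stages lazily, one at a time.
packets = list(packetize(encode(capture(num_frames=3))))
for seq, frame_idx, payload in packets:
    print(seq, frame_idx, len(payload))
```

Because each stage is a generator, a frame is encoded and packetized as soon as it is captured rather than after the whole sequence is available, which mirrors the streaming nature of a live viewfinder.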

The Beholder client runs on mobile devices and consists of two components: UI (User Interface) and video streamer. The UI component is composed of the viewfinder and the camera controller. The viewfinder renders the live videos received from the Beholder server, and the camera controller sends camera control commands to the server. Possible camera control commands include taking a photo, recording a video, setting white balance, applying image effects, configuring exposure, and tuning sensitivity. The video streamer contains hardware/software decoders, a controller client, and an RTSP/RTP client.
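
As an illustration of how the camera controller might exchange commands with the server, the sketch below serializes the command types listed above as JSON messages. The wire format is not specified in this section, so the JSON layout, the `Command` names, and the `iso` parameter are all assumptions made for the example.

```python
import json
from enum import Enum


class Command(Enum):
    # Command types taken from the list above; the string values are hypothetical.
    TAKE_PHOTO = "take_photo"
    RECORD_VIDEO = "record_video"
    SET_WHITE_BALANCE = "set_white_balance"
    APPLY_IMAGE_EFFECT = "apply_image_effect"
    CONFIGURE_EXPOSURE = "configure_exposure"
    TUNE_SENSITIVITY = "tune_sensitivity"


def encode_command(command: Command, **params) -> bytes:
    """Client side: serialize a command and its parameters as UTF-8 JSON."""
    message = {"command": command.value, "params": params}
    return json.dumps(message, sort_keys=True).encode("utf-8")


def decode_command(payload: bytes):
    """Server side: parse a payload back into (Command, params).

    Raises ValueError if the command name is unknown.
    """
    message = json.loads(payload.decode("utf-8"))
    return Command(message["command"]), message["params"]


payload = encode_command(Command.TUNE_SENSITIVITY, iso=400)
command, params = decode_command(payload)
print(command.name, params)  # TUNE_SENSITIVITY {'iso': 400}
```

Using an enumeration keeps the set of accepted commands explicit, so the server rejects unknown command names instead of silently ignoring them.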