Mobile Web browsing advancements have largely been driven by the single-minded pursuit of performance. In this paper, we take a different stance and examine the energy efficiency of mobile Web browsing. Focusing on both the network and CPU, we find that generational advancements in network technology have reached a point where the mobile CPU is starting to have noticeable impact on Web browsing performance and energy consumption. Achieving energy-efficient mobile Web browsing requires considering both CPU and network capabilities (i.e., latency). We show that under different network latency conditions, achieving energy-efficient mobile Web browsing mandates different CPU performance capabilities. It is important to understand and leverage such interactions between the network and CPU in order to deliver high mobile Web performance while maintaining a low energy footprint, which current OS DVFS governors fail to achieve. Our findings suggest that designing future high-performance and energy-efficient mobile Web clients implies looking beyond individual components and taking a full system perspective.
Web browsing on mobile devices is undoubtedly the future. However, with the increasing complexity of webpages, the mobile device's computation capability and energy consumption become major pitfalls for a satisfactory user experience. In this paper, we propose a mechanism to effectively leverage processor frequency scaling in order to balance the performance and energy consumption of mobile web browsing. This mechanism explores the performance and energy tradeoff in webpage loading, and schedules webpage loading according to the webpages' characteristics, using the different frequencies. The proposed solution achieves 20.3% energy saving compared to the performance mode, and improves webpage loading performance by 37.1% compared to the battery saving mode.
In this article, we developed a massively parallel gate-level logical simulator to address the ever-increasing computing demand for VLSI verification. To the best of the authors' knowledge, this work is the first one to leverage the power of modern GPUs to successfully unleash the massive parallelism of a conservative discrete event-driven algorithm, CMB algorithm. A novel data-parallel strategy is proposed to manipulate the fine-grain message passing mechanism required by the CMB protocol. To support robust and complete simulation for real VLSI designs, we establish both a memory paging mechanism and an adaptive issuing strategy to efficiently utilize the GPU memory with a limited capacity. A set of GPU architecture-specific optimizations are performed to further enhance the overall simulation performance. On average, our simulator outperforms a CPU baseline event-driven simulator by a factor of 47.4X. This work proves that the CMB algorithm can be efficiently and effectively deployed on modern GPUs without the performance overhead that had hindered its successful applications on previous parallel architectures.
Web computing is gradually shifting toward mobile devices, in which the energy budget is severely constrained. As a result, Web developers must be conscious of energy efficiency. However, current Web languages provide developers little control over energy consumption. In this paper, we take a first step toward language-level research to enable energy-efficient Web computing. Our key motivation is that mobile systems can wisely budget energy usage if informed with user quality-of-service (QoS) constraints. To do this, programmers need new abstractions. We propose two language abstractions, QoS type and QoS target, to capture two fundamental aspects of user QoS experience. We then present GreenWeb, a set of language extensions that empower developers to easily express the QoS abstractions as program annotations. As a proof of concept, we develop a GreenWeb runtime, which intelligently determines how to deliver specified user QoS expectation while minimizing energy consumption. Overall, GreenWeb shows significant energy savings (29.2% ~ 66.0%) over Android’s default Interactive governor with few QoS violations. Our work demonstrates a promising first step toward language innovations for energy-efficient Web computing.
In this paper, we assess the past, present, and future of mobile CPU design. We study how mobile CPU designs trends have impacted the end-user, hardware design, and the holistic mobile device. We analyze the evolution of ten cutting-edge mobile CPU designs released over the past seven years. Specifically, we report measured performance, power, energy and user satisfaction trends across mobile CPU generations.
Our results quantitatively demonstrate that CPUs play a crucial role in modern mobile system-on-chips (SoCs). Over the last seven years, both single- and multicore performance improvements have contributed to end-user satisfaction by reducing user-critical application response latencies. Mobile CPUs aggressively adopted many power-hungry desktop-oriented design techniques to reach these performance levels. Unlike other smartphone components (e.g. display and radio) whose peak power consumption has decreased over time, the mobile CPU’s peak power consumption has steadily increased.
As the limits of technology scaling restrict the ability of desktop-like scaling to continue for mobile CPUs, specialized accelerators appear to be a promising alternative that can help sustain the power, performance, and energy improvements that mobile computing necessitates. Such a paradigm shift will redefine the role of the CPU within future SoCs, which merit several design considerations based on our findings.
Mobile Web applications have become an integral part of our society. They pose a high demand for application quality of service (QoS). However, the energy-constrained nature of mobile devices makes optimizing for QoS difficult. Prior art on energy efficiency optimizations has only focused on the trade-off between raw performance and energy consumption, ignoring the application QoS characteristics. In this paper, we propose the concept of energy-efficient QoS (eQoS) to capture the trade-off between QoS and energy consumption. Given the fundamental event-driven nature of mobile Web applications, we further propose event-based scheduling as an optimization framework for eQoS. The event-based scheduling automatically reasons about users’ QoS requirements, and accurately slacks the events’ execution time to save energy without violating end users’ experience. We demonstrate a working prototype using the Google Chromium and V8 framework on the Samsung Exynos 5410 SoC (used in the Galaxy S4 smartphone). Based on real hardware and software measurements, we achieve 41.2% energy saving with only 0.4% of QoS violations perceptible to end users.
In contrast to traditional computing systems, such as desktops and servers, that are programmed to perform “compute-bound” and “run-to-completion” tasks, mobile applications are designed for user interactivity. Factoring user interactivity into computer system design and evaluation is important, yet possesses many challenges. In particular, systematically studying interactive mobile applications across the diverse set of mobile devices available today is difficult due to the mobile device fragmentation problem. At the time of writing, there are 18,796 distinct Android mobile devices on the market and will only continue to increase in the future. Differences in screen sizes, resolutions and operating systems impose different interactivity requirements, making it difficult to uniformly study these systems.
We present Mosaic, a cross-platform, timing-accurate record and replay tool for Android-based mobile devices. Mosaic overcomes device fragmentation through a novel virtual screen abstraction. User interactions are translated from a physical device into a platform-agnostic intermediate representation before translation to a target system. The intermediate representation is human-readable, which allows Mosaic users to modify previously recorded traces or even synthesize their own user interactive sessions from scratch. We demonstrate that Mosaic allows user interaction traces to be recorded on emulators, smartphones, tablets, and development boards and replayed on other devices. Using Mosaic we were able to replay 45 different Google Play applications across multiple devices, and also show that we can perform cross-platform performance comparisons between two different processors under identical user interactions.
The Web browser is undoubtedly the single most important application in the mobile ecosystem. An average user spends 72 minutes each day using the mobile Web browser. Web browser internal engines (e.g., WebKit) are also growing in importance because they provide a common substrate for developing various mobile Web applications. In a user-driven, interactive, and latency-sensitive environment, the browser’s performance is crucial. However, the battery-constrained nature of mobile devices limits the performance that we can deliver for mobile Web browsing. As traditional general-purpose techniques to improve performance and energy efficiency fall short, we must employ domain-specific knowledge while still maintaining general-purpose flexibility.
In this paper, we first perform design-space exploration to identify appropriate general-purpose architectures that uniquely fit the characteristics of a popular Web browsing engine. Despite our best effort, we discover sources of energy inefficiency in these customized general-purpose architectures. To mitigate these inefficiencies, we propose, synthesize, and evaluate two new domain-specific specializations, called the Style Resolution Unit and the Browser Engine Cache. Our optimizations boost energy efficiency and at the same time improve mobile Web browsing performance. As emerging mobile work- loads increasingly rely more on Web browser technologies, the type of optimizations we propose will become important in the future and are likely to have lasting widespread impact.
This work proposes an empirical Bias Temperature Instability (BTI) stress-relaxation model based on the superposition property. The model was used to study the instantaneous frequency fluctuation in a fast Dynamic Voltage and Frequency Scaling (DVFS) environment. VDD and operating frequency information for this study were collected from an ARM Cortex A15 processor based development board running an Android operating system. Simulation results show that the frequency peaks and dips are functions of mainly two parameters: (1) the amount of stress or recovery experienced by the circuit prior to the VDD switching and (2) the frequency sensitivity to device aging after the VDD switching.
Internet web browsing has reached a critical tipping point. Increasingly, users rely more on mobile web browsers to access the Internet than desktop browsers. Meanwhile, webpages over the past decade have grown in complexity by more than tenfold. The fast penetration of mobile browsing and ever-richer webpages implies a growing need for high-performance mobile devices in the future to ensure continued end-user browsing experience. Failing to deliver webpages meeting hard cut-off constraints could directly translate to webpage abandonment or, for e-commerce websites, great revenue loss. However, mobile devices' limited battery capacity limits the degree of performance that mobile web browsing can achieve. In this paper, we demonstrate the benefits of heterogeneous systems with big/little cores each with different frequencies to achieve the ideal trade-off between high performance and energy efficiency. Through detailed characterizations of different webpage primitives based on the hottest 5,000 webpages, we build statistical inference models that estimate webpage load time and energy consumption. We show that leveraging such predictive models lets us identify and schedule webpages using the ideal core and frequency configuration that minimizes energy consumption while still meeting stringent cut-off constraints. Real hardware and software evaluations show that our scheduling scheme achieves 83.0% energy savings, while only violating the cut-off latency for 4.1% more webpages as compared with a performance-oriented hardware strategy. Against a more intelligent, OS-driven, dynamic voltage and frequency scaling scheme, it achieves 8.6% energy savings and 4.0% performance improvement simultaneously.
With the constantly increasing Internet traffic and fast changing network protocols, future routers have to simultaneously satisfy the requirements for throughput, QoS, flexibility, and scalability. In this work, we propose a novel integrated CPU/GPU microarchitecture, Hermes, for QoS-aware high speed routing. We also develop a new thread scheduling mechanism, which significantly improves all QoS metrics.
Logical simulation is the primary method to verify the correctness of IC designs. However, today's complex VLSI designs pose ever higher demand for the throughput of logic simulators. In this work, a parallel logic simulator was developed by leveraging the computing power of modern graphics processing units (GPUs). To expose more parallelism, we implemented a conservative parallel simulation approach, the CMB algorithm, on NVidia GPUs. The simulation processing is mapped to GPU hardware at the finest granularity. With carefully designed data structures and data flow organizations, our GPU based simulator could overcome many problems that hindered efficient implementations of the CMB algorithm on traditional parallel computers. In order to efficiently use the relatively limited capacity of GPU memory, a novel memory management mechanism was proposed to dynamically allocate and recycle GPU memory during simulation. We also introduced a CPU/GPU co-processing strategy for the best usage of computing resources. Experimental results showed that our GPU based simulator could outperform a CPU baseline event driven simulator by a factor of 29.2.