KEYWORDS: Field programmable gate arrays, Image filtering, Digital signal processing, Video, System on a chip, Scalable video coding, Performance modeling, Logic, Human-machine interfaces, Quantization
This paper describes key concepts in the design and implementation of a deblocking filter (DF) for an H.264/SVC video decoder. The DF supports QCIF and CIF video formats with temporal and spatial scalability. The design flow starts from a SystemC functional model, which is refined to an RTL microarchitecture using a high-level synthesis methodology. The process is guided by performance measurements (latency, cycle time, power, resource utilization) with the objective of assuring the quality of results of the final system. The functional model of the DF is created incrementally from the AVC DF model, using the OpenSVC source code as a reference. The design flow continues with logic synthesis and implementation on the FPGA using various strategies; the final implementation is chosen from among those that meet the timing constraints. The DF is capable of running at 100 MHz, and macroblocks are processed in 6,500 clock cycles, for a throughput of 130 fps in QCIF format and 37 fps in CIF format. The proposed architecture for the complete H.264/SVC decoder is composed of an OMAP 3530 SoC (ARM Cortex-A8 GPP + DSP) and a Virtex-5 FPGA acting as a coprocessor for the DF implementation. The DF is connected to the OMAP SoC through the GPMC interface. A validation platform has been developed using the PowerPC processor embedded in the FPGA, composing an SoC that integrates frame generation and visualization on a TFT screen. The FPGA implements both the DF core and a GPMC slave core; both cores are connected to the embedded PowerPC440 processor through LocalLink interfaces. The FPGA also contains a local memory capable of storing the information necessary to filter a complete frame and a decoded picture frame. The complete system is implemented on a Virtex-5 FX70T device.
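As a back-of-envelope sanity check on the reported figures, an upper bound on the frame rate follows from the clock frequency, the cycles per macroblock and the macroblocks per frame. The sketch below assumes standard 16×16 macroblocks and ignores memory and interface overhead, which is presumably why the reported 130 fps (QCIF) and 37 fps (CIF) sit somewhat below these ideal bounds.

```python
# Ideal deblocking-filter throughput from the figures given in the abstract:
# 100 MHz clock, 6,500 cycles per macroblock. Frame sizes in macroblocks
# follow from the standard QCIF/CIF resolutions and 16x16 macroblocks.
CLOCK_HZ = 100e6
CYCLES_PER_MB = 6500

MB_PER_FRAME = {
    "QCIF": (176 // 16) * (144 // 16),  # 11 x 9  = 99 macroblocks
    "CIF":  (352 // 16) * (288 // 16),  # 22 x 18 = 396 macroblocks
}

for fmt, mbs in MB_PER_FRAME.items():
    fps = CLOCK_HZ / (CYCLES_PER_MB * mbs)
    print(f"{fmt}: {fps:.1f} fps upper bound (ignoring memory/interface overhead)")
```

This yields roughly 155 fps for QCIF and 39 fps for CIF as ideal bounds, consistent with the measured 130 fps and 37 fps once per-frame overhead is taken into account.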
The additional detail extracted by super-resolution image reconstruction (SRIR) algorithms greatly
improves the results of spatial image upscaling, leading, where possible, to a significant objective
image-quality enhancement expressed as an increase in peak signal-to-noise ratio (PSNR). Nevertheless,
hardware implementations of fusion SRIR algorithms capable of producing satisfactory output quality with
real-time performance remain a challenge: making them feasible requires a number of trade-offs
that compromise output quality.
In this work we tackle the problem of high resource requirements by using a non-iterative algorithm that facilitates
hardware implementation. The algorithm's execution flow is presented and described, and its output quality is
measured and compared against competitive solutions, including interpolation and iterative SRIR implementations. The
tested iterative algorithms use frame-level motion estimation (ME), whereas the proposed algorithm relies on
block-matching ME, which performs better. The comparison shows that the proposed non-iterative algorithm offers
superior output quality for all tested sequences, while promising an efficient hardware implementation able to match,
at least, the software implementations in terms of output quality.
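The block-matching ME the proposed algorithm relies on can be illustrated with a minimal full-search sketch. The block size, search range and SAD cost below are illustrative assumptions, not details taken from the paper, which would use a heavily optimized hardware realization.

```python
# Minimal full-search block-matching motion estimation. For a block of the
# current frame, every candidate displacement within +/- `search` pixels is
# scored with the sum of absolute differences (SAD) against the reference
# frame, and the lowest-cost displacement is returned as the motion vector.

def sad(ref, cur, rx, ry, cx, cy, bs):
    """SAD between the block of `cur` at (cx, cy) and the candidate block
    of `ref` at (rx, ry), both of size bs x bs."""
    return sum(
        abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
        for j in range(bs) for i in range(bs)
    )

def block_match(ref, cur, cx, cy, bs=8, search=4):
    """Return the (dx, dy) displacement minimizing SAD within the window."""
    h, w = len(ref), len(ref[0])
    best, best_mv = float("inf"), (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx and rx + bs <= w and 0 <= ry and ry + bs <= h:
                cost = sad(ref, cur, rx, ry, cx, cy, bs)
                if cost < best:
                    best, best_mv = cost, (dx, dy)
    return best_mv
```

For a current frame that is a pure translation of the reference, the search recovers the translation exactly, which is the property SR fusion depends on.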
KEYWORDS: C++, Data modeling, Interfaces, Logic, Field programmable gate arrays, Data storage, Clocks, Signal processing, Signal detection, Optimization (mathematics)
This paper describes an Electronic System Level (ESL) design methodology established and employed
in the creation of an H.264/AVC baseline decoder. The methodology involves synthesizing the algorithmic
descriptions of the functional blocks that comprise the decoder using a high-level synthesis tool.
Optimization and design-space exploration are carried out at the algorithmic level before performing
logic synthesis. Final post-place-and-route implementation results show that the decoder can operate
at the target frequency of 100 MHz and meet real-time requirements for QCIF frames.
KEYWORDS: Detection and tracking algorithms, Chemical elements, Video processing, Optical tracking, Computing systems, Algorithm development, Systems modeling, Video, Modeling, Data processing
In this paper, we present the modelling of a real-time tracking system on a Multi-Processor System on Chip (MPSoC).
Our final goal is to build a more complex computer vision system (CVS) by integrating several applications in a modular
way, each performing a different kind of data processing but sharing a common platform; in this way, a single
architecture offers a solution for a set of applications rather than just one. In our current work, a visual
tracking system with real-time behaviour (25 frames/s) is used as the reference application and as a guideline for
the development of future CVS applications. Our algorithm, written in C++, is based on a correlation technique and a
dynamic threshold-update approach. After an initial computational-complexity analysis, a task graph was generated from
the tracking algorithm. In parallel with this functional-correctness analysis, a generic model of a multi-processor
platform was developed. Finally, the performance of the tracking system mapped onto the proposed architecture and its
shared-resource usage were analyzed to determine the real capacity of the architecture and to find possible bottlenecks,
in order to propose solutions that allow more applications to be mapped onto the platform template in the future.
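The correlation technique at the heart of such a tracker can be sketched as normalized cross-correlation (NCC) template matching over a small search window. This is an illustrative sketch, not the paper's C++ implementation; in particular, the dynamic threshold-update scheme the abstract mentions is simplified here to a fixed `threshold` parameter that an adaptive policy would update over time.

```python
import math

def ncc(frame, tmpl, ox, oy):
    """Normalized cross-correlation of template `tmpl` against `frame`
    at offset (ox, oy); returns a value in [-1, 1]."""
    th, tw = len(tmpl), len(tmpl[0])
    n = th * tw
    fm = sum(frame[oy + j][ox + i] for j in range(th) for i in range(tw)) / n
    tm = sum(sum(row) for row in tmpl) / n
    num = sf = st = 0.0
    for j in range(th):
        for i in range(tw):
            df = frame[oy + j][ox + i] - fm
            dt = tmpl[j][i] - tm
            num += df * dt
            sf += df * df
            st += dt * dt
    denom = math.sqrt(sf * st)
    return num / denom if denom else 0.0

def track(frame, tmpl, prev, search=3, threshold=0.7):
    """Search around the previous position `prev`; report a match only when
    the best correlation exceeds `threshold`."""
    px, py = prev
    h, w = len(frame), len(frame[0])
    th, tw = len(tmpl), len(tmpl[0])
    best, pos = -1.0, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = px + dx, py + dy
            if 0 <= x and x + tw <= w and 0 <= y and y + th <= h:
                c = ncc(frame, tmpl, x, y)
                if c > best:
                    best, pos = c, (x, y)
    return (pos, best) if best >= threshold else (None, best)
```

The per-frame cost of this exhaustive search is what makes the task-graph and shared-resource analysis on the MPSoC platform necessary for 25 frames/s operation.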
KEYWORDS: Video, Field programmable gate arrays, Telecommunications, Multimedia, Super resolution, Video compression, Computer programming, Computer simulations, Standards development, Control systems
In this paper we present a novel methodology to accelerate an MPEG-4 video decoder using software/hardware co-design
for wireless DAB/DMB networks. Software support includes the services provided by the embedded kernel
μC/OS-II and the application tasks mapped to software. Hardware support includes several custom co-processors and a
communication architecture with bridges to the main system bus and a dual-port SRAM. Synchronization among
tasks is achieved at two levels: by a hardware protocol and by kernel-level scheduling services. Our reference application
is an MPEG-4 video decoder composed of several software functions and written using a special C++ library named
CASSE. Profiling and design-space exploration techniques were previously applied to the Advanced Simple Profile (ASP)
MPEG-4 decoder to determine the best HW/SW partition, which is developed here. This research is part of the ARTEMI project,
whose main goals are the establishment of methodologies for the design of real-time complex digital systems using
programmable logic devices with embedded microprocessors as the target technology, and the design of multimedia systems for broadcasting networks as the reference application.
KEYWORDS: Video, Receivers, Image segmentation, Principal component analysis, Super resolution, Image processing, Lawrencium, System on a chip, Volume rendering, Image quality
This paper presents a system for real-time video reception in low-power mobile devices using Digital Audio Broadcast
(DAB) technology for transmission. A demo receiver terminal is designed on an FPGA platform using the Advanced
Simple Profile (ASP) of the MPEG-4 standard for video decoding. In order to meet the demanding DAB requirements, the
bandwidth of the encoded sequence must be drastically reduced. To this end, a pre-processing stage is performed prior
to the MPEG-4 coding stage. It consists, first, of a segmentation phase according to motion and texture, based on
Principal Component Analysis (PCA) of the input video sequence, and, second, of a down-sampling phase that
depends on the segmentation results. The segmentation task yields a set of texture and motion maps.
These motion and texture maps are also included into the bit-stream as user data side-information and are therefore
known to the receiver. For all bit-rates, the whole encoder/decoder system proposed in this paper exhibits higher image
visual quality than the alternative encoding/decoding method, assuming equal image sizes. A complete analysis of both
techniques has also been performed to provide the optimum motion and texture maps for the global system, which has
been validated for a variety of video sequences. Additionally, an optimal HW/SW partition for the MPEG-4
decoder has been studied and implemented on a Programmable Logic Device with an embedded ARM9 processor.
Simulation results show that a throughput of 15 QCIF frames per second can be achieved with low area and low power
implementation.
KEYWORDS: Clocks, Interfaces, Computer architecture, System on a chip, Network on a chip, Telecommunications, Chemical elements, Video, Systems modeling, Data modeling
This paper presents a simple environment for the verification of AMBA 3 AXI systems and for Verification IP (VIP)
production, called VIPACES (Verification Interface Primitives for the development of AXI Compliant Elements and
Systems). These primitives are delivered as a non-compiled library written in SystemC, with interfaces as the core of
the library. Defining interfaces instead of generic modules lets users construct custom modules, reducing the
resources spent during the verification phase and making it easy to adapt their modules to the AMBA 3 AXI protocol;
this is the central design decision of the VIPACES library. The paper focuses on comparing and contrasting the main
interconnection schemes for AMBA 3 AXI as modeled by VIPACES. To assess the results we propose a validation
scenario with a particular architecture from the domain of MPEG-4 video decoding, composed of an
AXI bus connecting an IDCT core and other processing resources.
Trends in multimedia consumer electronics, digital video and audio aim to reach users through low-cost mobile devices connected to data-broadcasting networks with limited bandwidth. An emerging broadcasting network is the Digital Audio Broadcasting (DAB) network, which provides CD-quality audio transmission together with robustness and efficiency techniques that allow good-quality reception while in motion. This paper focuses on the system-level evaluation of different architectural options to allow low-bandwidth digital video reception over DAB, based on video compression techniques. Profiling and design-space exploration techniques are applied to the ASP MPEG-4 decoder in order to find the best HW/SW partition given the application and platform constraints. An innovative SystemC-based system-level design tool, called CASSE, is used for modelling, exploration and evaluation of different ASP MPEG-4 decoder HW/SW partitions. System-level trade-offs and quantitative data derived from this analysis are also presented in this work.
This paper addresses practical considerations for the implementation of algorithms developed to increase the image resolution of a video sequence using techniques known in the specialized literature as super-resolution (SR). In order to achieve a low-cost implementation, the algorithms have been mapped onto a previously developed video encoder architecture. By re-using this architecture and performing only slight modifications to it, the need for specific, and usually high-cost, SR hardware is avoided. The modified encoder can be used either in native compression mode or in SR mode, where SR can increase the image resolution beyond the sensor limits or serve as a smart way to perform electronic zoom, avoiding the use of high-power-demanding mechanical parts. Two SR algorithms are presented and compared in terms of execution time, memory usage and quality, and their features are analyzed from a real-time implementation perspective. The first algorithm follows an iterative scheme, while the second is a modified version in which the iterative behaviour has been removed. The video encoder together with the new SR features constitutes an IP block inside Philips Research, upon which several System-on-Chip (SoC) platforms are being developed.
This paper addresses the enhancement of the spatial resolution of a
video sequence from a low resolution video sequence in real time.
The technique used, known as super-resolution reconstruction,
exploits the relative motion from frame to frame that produces
sub-pixel shifts. The algorithm, based on a previous version mapped
onto a video encoder architecture, is oriented towards a hardware
implementation and requires resource optimization. In order to
achieve a good resolution improvement, the motion estimation
algorithm must produce motion vectors as close to the real ones as
possible. At the same time, this motion estimation must match real
time requirements. Therefore, an exhaustive technique is applied in
combination with a simple segmentation of each frame for a motion
prediction refinement. Experimental results have been obtained for a
set of video sequences exhibiting different motion
characteristics.
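The abstract does not spell out the reconstruction step itself; a common low-cost way to exploit the sub-pixel shifts described above is shift-and-add fusion, sketched below under the assumption that per-frame motion vectors are already expressed in high-resolution-grid (half-pixel) units relative to the reference frame.

```python
# Illustrative shift-and-add super-resolution fusion: low-resolution pixels
# are placed on a 2x upsampled grid according to their motion vectors and
# averaged. Positions never hit by any LR sample are left as None (a real
# implementation would interpolate them).

def fuse(lr_frames, motion, scale=2):
    """lr_frames: list of HxW low-resolution frames (lists of lists);
    motion: per-frame (dx, dy) shifts in HR-grid pixels."""
    h, w = len(lr_frames[0]), len(lr_frames[0][0])
    H, W = h * scale, w * scale
    acc = [[0.0] * W for _ in range(H)]
    cnt = [[0] * W for _ in range(H)]
    for frame, (dx, dy) in zip(lr_frames, motion):
        for y in range(h):
            for x in range(w):
                X, Y = x * scale + dx, y * scale + dy
                if 0 <= X < W and 0 <= Y < H:
                    acc[Y][X] += frame[y][x]
                    cnt[Y][X] += 1
    return [[acc[Y][X] / cnt[Y][X] if cnt[Y][X] else None
             for X in range(W)] for Y in range(H)]
```

This is why the motion vectors must be as close to the real ones as possible: a wrong vector deposits a pixel at the wrong HR position and blurs rather than sharpens the result.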
KEYWORDS: Image processing, Computer programming, Video, Super resolution, Image quality, Motion estimation, Algorithm development, System on a chip, Video compression, Image compression
This paper focuses on the mapping of low-cost, real-time super-resolution (SR) algorithms onto SoC (System-on-Chip) platforms in order to achieve high-quality image improvements. Low-cost constraints are met by re-using a video encoder architecture, avoiding the need for specific SR hardware. Only small modifications are needed to the motion estimator, the motion compensator, the image loop memory, etc. This encoder can be used either in compression mode or in SR mode. The video encoder together with the new SR features constitutes an IP block inside Philips Research, upon which several SoC platforms are being developed. Although this SR algorithm has been implemented on an encoder architecture developed by Philips Research, it can easily be mapped onto other hybrid video encoder platforms. The results show important improvements in image quality. Based on these results, some generalizations can be made about the impact of the sampling process on the quality of the super-resolution image.
Integrated inductors are key components in Radio-Frequency Integrated Circuits (RFICs) because they are needed in several building blocks, such as voltage-controlled oscillators (VCOs), low-noise amplifiers (LNAs), mixers and filters. The cost reduction achieved in circuit assembly makes them preferable to surface-mounted devices, despite the various sources of loss that limit their use: losses associated with the semiconductor substrate and losses in the metals. In this work we report our research on modeling integrated inductors, particularly the losses in the metals. The model is derived from measurements taken on integrated spiral inductors designed and fabricated in a standard silicon process. The measurements reveal that the widely accepted lumped equivalent model does not properly predict integrated inductor behavior at frequencies above 3 GHz for our technology. We propose a simple modification to the lumped equivalent circuit model: the introduction of an empirical resistor in the port-1-to-port-2 branch of the equivalent circuit. As a result, the integrated inductor behavior is adequately predicted over a wider frequency range than with the conventional model. We also report a new methodology for characterizing integrated inductors that includes the new resistor. In addition, the new model is used to build up a library of optimized integrated inductors.
This work analyses the DC response of an InGaAs-channel PHFET under varying temperature. An analytic model for the drain current is derived from previous work, incorporating the extrinsic resistances. Experimental output characteristics at different temperatures are compared with those given by the resulting model and by numerical simulations. The DC drain current is obtained by introducing the external voltages applied to the HFET terminals into an intrinsic model. The temperature range considered in this paper is 300 to 400 K; in this range, the temperature dependence of the intrinsic electrical parameters is included in the model. For the temperature dependence of the extrinsic resistances, the HFET is numerically simulated with MINIMOS-NT. As far as we know, the influence of electron transport through the AlGaAs/InGaAs heterojunction on the extrinsic resistances has not previously been established. In our case, a thermionic-field-emission (TFE) model is used to simulate this effect (without TFE, not only is the drain current underestimated, but the predicted temperature dependence is opposite to the actual one).
As a result, the extrinsic source resistance is nearly constant (7.5 Ω), while higher values are obtained for the extrinsic drain resistance, which has a linear, positive temperature dependence, rising as the transistor operates in the saturation region. When the drain voltage diminishes, the influence of the TFE model on the extrinsic resistances vanishes, and RD tends to RS. The drain current predicted by the model, in the linear and saturation regions, shows a relative error between measured and modeled values smaller than 10%.
KEYWORDS: Data modeling, Telecommunications, Process modeling, Very large scale integration, Modeling, Network architectures, Associative arrays, Data communications
This paper discusses and compares solutions to the issue of signalling and synchronization in the heterogeneous-architecture multiprocessor paradigm. The on-chip interconnect infrastructure is split conceptually into a data-transport network and a signalling network. The paper presents a SystemC-based technique for modelling the communication architecture with different topologies for the synchronization (signalling) network. Each topology is parameterised for several communication requirements that define a point in the communication space. A high-abstraction model leads to an experimental set-up that eases the analysis of the quantitative and qualitative behaviour of the networks at representative points in the communication space of the system design. The SystemC simulation models developed allow us to obtain the total simulation time, the processing time spent by the coprocessors, the data-transport (read/write) time used by the coprocessors (including arbitration time), and the synchronization time spent by the coprocessors and the network; another important metric is the coprocessor usage percentage. Results show that splitting the data and signalling networks brings an additional improvement to system performance. The model applies well when mapping to architectural platforms application processes expressed in abstract computational models such as Kahn process networks (KPN), synchronous data flow (SDF) and generalized communicating sequential processes (CSP).
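The Kahn-process-network model mentioned above can be illustrated with a tiny sketch: processes that communicate only through unbounded FIFO channels, with blocking reads and non-blocking writes, which is what makes the network's observable behaviour deterministic regardless of scheduling. The two-stage pipeline below is an illustrative example, not a model from the paper.

```python
# A minimal KPN-style two-process pipeline: a producer feeds a scaler over
# a FIFO channel; a None token marks end-of-stream. Reads block, writes do
# not (the FIFOs are unbounded), matching KPN semantics.
import queue
import threading

def producer(out_ch, n):
    for i in range(n):
        out_ch.put(i)          # non-blocking write to an unbounded FIFO
    out_ch.put(None)           # end-of-stream token

def scaler(in_ch, out_ch, k):
    while True:
        v = in_ch.get()        # blocking read
        if v is None:
            out_ch.put(None)
            return
        out_ch.put(k * v)

def run_kpn(n=5, k=3):
    a, b = queue.Queue(), queue.Queue()
    results = []
    threads = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=scaler, args=(a, b, k)),
    ]
    for t in threads:
        t.start()
    while (v := b.get()) is not None:
        results.append(v)
    for t in threads:
        t.join()
    return results
```

Because each process reads and writes deterministically, the output stream is the same for every interleaving of the two threads, which is the property that makes KPN applications easy to remap across platform topologies.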
KEYWORDS: Logic, Transistors, Clocks, System on a chip, Computer aided design, Interference (communication), Semiconducting wafers, Process engineering, Modeling, Instrument modeling
At 0.25 µm and 0.18 µm processes and beyond, important process variations occur not only from one fab to another but also among batches. Moreover, as we approach the realm of deep-submicron design, process variations even across a single die are predicted to become a major source of spread. Reduced signal levels, noise margins and timing windows all contribute to make previously minor variations in geometry and technological parameters a big issue for circuit design. Worse still, new mechanisms appear that cause important variations not only in transistors but also in interconnect, and some of those mechanisms show greater variation across a single die than across similar structures on different dice from a wafer. Thus the chip designer must expect significant, and not necessarily predictable, differences between transistors and between interconnect resistances on a single die. Given this scenario, widely recognised by process engineers, and given the additional spread built into the process of mapping a soft-IP design to a hard-IP block, if the designer could know certain performance parameters of the final hard cores without performing successive syntheses, integration of the blocks into the system would be easier, more predictable and more accurate. In this sense, pre-characterised, trustworthy soft-IP blocks would be the preferred candidates. We have explored ways of quantifying and analysing the synthesis-to-layout spread so that, instead of modelling the spread in devices and interconnects, we model and quantify, at a higher abstraction level, the technology-mapping process as a whole, for a set of seed designs that give bounds and guidelines for the behaviour of other designs mapped to the same technology. For that purpose, only the best-, typical- and worst-case and other process-variation corners need to be known.
The analysis is based on the actual measured spread of reference seed designs as they pass from soft to hard designs.
KEYWORDS: System on a chip, Bridges, Clocks, Virtual colonoscopy, Microcontrollers, Signal processing, Modeling, Data modeling, Control systems, Data communications
Advances in fabrication and design technologies have made it possible to integrate a complete system on a chip. A system-on-chip (SoC) is generally composed of a microprocessor core, on-chip memory and one or more specific coprocessor IPs. One of the major drawbacks of this approach is the difference in the interfaces that each virtual component (VC) of the SoC presents. A common bus infrastructure smooths system integration and has been adopted as a design solution for SoC architectures. This paper reviews different alternatives for SoC buses and summarizes some experiences with their use. ARM has proposed AMBA (Advanced Microcontroller Bus Architecture) as an open specification that serves as a framework for SoC design. AMBA is a multilayer bus architecture for high-performance SoC designs; it supports multi-master configurations, in which a bus arbiter must be included, while AHB-Lite is a simpler alternative when only one master is used. IBM uses the CoreConnect bus architecture as its SoC bus solution. CoreConnect shares some similarities with AMBA, because both use a multilayer bus to accommodate different speeds in the system: AHB can be compared with PLB, and APB with OPB. Other alternatives exist. Wishbone is an open bus specification from OpenCores.org that tries to solve the problem of IP integration by specifying a common interface between cores to accelerate the development of virtual components. VSIA has proposed the Virtual Component Interface (VCI) as a solution to the problem of virtual-component integration; VCI specifies three protocols of increasing complexity: Peripheral, Basic and Advanced VCI. Developing IPs compatible with any of the SoC buses presented above is a complex problem.
One solution is the use of wrappers that adapt the interface of the virtual component to the protocol supported by the SoC bus; the two main requirements for these wrappers are that the increases in latency and area be as low as possible. The second solution is to design the IP with the final environment in mind.
KEYWORDS: Discrete wavelet transforms, Gallium arsenide, Image compression, Image filtering, Linear filtering, Wavelets, Computer architecture, Very large scale integration, Video compression, Video
In this paper, the implementation and results of a Gallium Arsenide (GaAs) multiplierless filter bank with applications to the Two-Dimensional Discrete Wavelet Transform (2D-DWT) are presented. Among the benefits offered by this architecture are its configurable characteristics, which allow it to handle input images of different sizes, and its ability to compute up to 10 levels of sub-band decomposition. Different types of filters have been studied in order to select the one that best matches the targeted applications. This selection is based on a compromise among the compactness of relevant image information in the LL sub-band, the compression algorithms and VLSI simplicity. As a result, a filter running at 250 MHz with 3.2 W of power dissipation is obtained, allowing CCIR applications.
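The "multiplierless" property means every filter tap can be realized with additions, subtractions and shifts. The sketch below illustrates one decomposition level of a 2-D DWT with the simplest multiplierless filter pair, the Haar wavelet; the paper's actual filters are a design choice not given in the abstract, so Haar stands in purely as an example.

```python
# One level of a 2-D Haar wavelet decomposition using only adds, subtracts
# and a right shift (no multiplications), producing the LL/LH/HL/HH
# sub-band layout in a single matrix.

def haar_1d(row):
    """1-D Haar step: pairwise averages (approx, via shift) followed by
    pairwise differences (detail)."""
    approx = [(row[2 * i] + row[2 * i + 1]) >> 1 for i in range(len(row) // 2)]
    detail = [row[2 * i] - row[2 * i + 1] for i in range(len(row) // 2)]
    return approx + detail

def dwt2d_level(img):
    """Apply the 1-D transform to every row, then to every column; the
    top-left quadrant of the result is the LL sub-band."""
    rows = [haar_1d(r) for r in img]
    cols = [haar_1d([rows[y][x] for y in range(len(rows))])
            for x in range(len(rows[0]))]
    # transpose back to row-major order
    return [[cols[x][y] for x in range(len(cols))] for y in range(len(rows))]
```

Iterating `dwt2d_level` on the LL quadrant yields the multi-level sub-band decomposition the architecture computes, up to 10 levels in the reported design.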