Highly-Efficient Number-Crunching-Performance SoC Macros for Smart Biosensors
Upasna Vishnoi*
Electrical Engineering and Computer Systems, RWTH Aachen University, Germany
*Corresponding author: Upasna Vishnoi, Electrical Engineering and Computer Systems, RWTH Aachen University, Germany. Tel: +4916509468949; Email: vishnoiupasna@gmail.com
Received Date: 24 January, 2018; Accepted Date: 14 March, 2018; Published Date: 20 March, 2018
Citation: Vishnoi U (2018) Highly-Efficient Number-Crunching-Performance SoC Macros for Smart Biosensors. Biosens Bioelectron Open Acc: BBOA-115. DOI: 10.29011/BBOA-115.100015
Keywords: Coordinate Rotation Digital Computer; Design-Space Exploration; Early Cost Estimation; Pareto Optimization; QR-Decomposition; Smart Biosensors
1. Research
Depending on the specifications, a huge design space featuring up to thousands of possible implementations is available from the QRD-architecture template published in [1]; these QR-Decomposition (QRD) methods can be used efficiently in smart biosensors. To support design-space exploration, the parameterized cost model as well as routines for pruning (e.g. according to maximum latency), Pareto optimization, etc. are implemented in a MATLAB-based optimization environment [2]. The execution time for a whole design-space exploration (one set of specification parameters) is only in the order of a few minutes. Cost breakdown tables and figures can be generated automatically to give detailed insights into the cost contributions of the individual building blocks, in order to identify bottlenecks and to guide optimization. Constraints can be set on the maximum clock frequency, e.g. to avoid unreasonably high clock frequencies due to O-fold Coordinate Rotation Digital Computer (CORDIC) multiplexing, or on the selected clock frequencies available on an SoC. Arbitrarily high throughput rates can be achieved at almost unchanged ATE complexity (the product of silicon area A, period T and energy E per QRD) and latency by multiplexing parallel QRD blocks [1]. Therefore, area and energy optimization for less challenging throughput specifications is an especially valuable capability of this optimization environment [1]. Finally, the environment can be applied in early cost estimation to support system conception and design.
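Why multiplexing parallel QRD blocks leaves the ATE complexity almost unchanged can be seen from a first-order estimate. The sketch below assumes P identical blocks operated in an interleaved fashion, each with area A, period T (time between successive QRD results) and energy E per QRD, and it neglects the distribution and multiplexing overhead, which is why the gain is only "almost" free:

```latex
A_P \approx P\,A, \qquad T_P = \frac{T}{P}, \qquad E_P \approx E
\quad\Longrightarrow\quad
A_P\,T_P\,E_P \approx P\,A \cdot \frac{T}{P} \cdot E = A\,T\,E
```

Each QRD still passes through exactly one block, so its latency is unchanged as well, while the throughput 1/T_P grows linearly with P.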
To give an idea of the capabilities of the optimization environment, Figure 1a shows exemplary AT- and ATE-design spaces for a complex-valued integer full ([R] and [Q]) QRD with matrix size N=12, iteration count M=16 and word length W=16. The whole design space of 1,440 possible implementations is shown pruned in Figure 1b. The execution time for this example is only 1 minute 42 seconds.
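Pruning a design space by constraints and then extracting its Pareto-optimal front can be sketched in a few lines. The environment described above is MATLAB-based [2]; the Python below is only a hypothetical illustration, and the `Implementation` record, its fields and the four example points are made up for the sketch rather than taken from the tool or from Figure 1:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Implementation:
    """One design-space point (hypothetical record, not the format of the MATLAB tool [2])."""
    name: str
    area: float     # silicon area A
    period: float   # period T = time per QRD result
    energy: float   # energy E per QRD
    latency: float  # latency of one QRD
    f_clk: float    # required clock frequency


def prune(impls: List[Implementation],
          max_latency: Optional[float] = None,
          max_f_clk: Optional[float] = None) -> List[Implementation]:
    """Drop implementations that violate a latency or clock-frequency constraint."""
    return [i for i in impls
            if (max_latency is None or i.latency <= max_latency)
            and (max_f_clk is None or i.f_clk <= max_f_clk)]


def pareto_front(impls: List[Implementation]) -> List[Implementation]:
    """Return the implementations that are Pareto-optimal in (area, period).

    After sorting by period (ties broken by area), every point already seen is
    at least as fast, so a point survives only if its area is strictly smaller
    than all of theirs; otherwise an earlier point dominates it.
    Exact duplicates are kept once.
    """
    front: List[Implementation] = []
    best_area = float("inf")
    for impl in sorted(impls, key=lambda i: (i.period, i.area)):
        if impl.area < best_area:
            front.append(impl)
            best_area = impl.area
    return front


designs = [
    Implementation("a", area=4.0, period=1.0, energy=2.0, latency=10.0, f_clk=500.0),
    Implementation("b", area=2.0, period=2.0, energy=1.5, latency=20.0, f_clk=400.0),
    Implementation("c", area=3.0, period=2.5, energy=1.0, latency=25.0, f_clk=300.0),  # dominated by b
    Implementation("d", area=1.0, period=4.0, energy=1.2, latency=40.0, f_clk=200.0),
]
print([i.name for i in pareto_front(prune(designs, max_latency=30.0))])  # ['a', 'b']
```

Sorting once makes the front extraction O(n log n), which is negligible even for design spaces with thousands of points.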
Figure 2 shows the AT as well as the ATE (insets) design spaces for different QRD specifications (applying extra delay stages in the processing elements, PEs). The technology used is 40-nm CMOS. For delay and output slopes, features of the SS-technology and slow-application (temperature) corner were used, while for energy and power, features derived from the FF-technology and fast-application corner were applied. Even though no fabricated silicon die would exhibit this combination of cost figures, it is nevertheless an adequate worst-case approach to ensure that the specified figures are met. The back-biasing experiments were conducted assuming a reverse back-biasing voltage of VBS=-0.5 V to reduce leakage power in the FF corner, and a forward back-biasing voltage of VBS=-0.5 V to reduce critical-path delay in the SS corner. The supply voltage is 0.8 V; word length and CORDIC iteration count are specified to be W=M=16. Energy figures are given for the case that no clock or power gating is applied.
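The CORDIC iteration count M fixes how many shift-and-add micro-rotations each CORDIC performs. As an illustration only, the sketch below shows a textbook integer CORDIC in vectoring mode, the mode a CORDIC-based Givens rotation uses to zero a matrix element during QRD. It assumes x >= 0, ignores word-length limits and gain compensation, and the function names are hypothetical rather than taken from the architecture template of [1]:

```python
import math
from typing import List, Tuple


def cordic_vectoring(x: int, y: int, m: int = 16) -> Tuple[int, int, List[int]]:
    """Rotate the integer vector (x, y) onto the positive x-axis with m micro-rotations.

    Assumes x >= 0 (real designs pre-rotate by pi otherwise). Returns the final x
    (about K * sqrt(x^2 + y^2)), the residual y (close to 0) and the rotation
    directions, which a rotation-mode CORDIC could replay on the rest of a row.
    """
    directions: List[int] = []
    for i in range(m):
        d = -1 if y > 0 else 1  # rotate so that y is driven towards zero
        # Micro-rotation by d * atan(2^-i) using only shifts and adds;
        # Python's >> on negative ints is an arithmetic (flooring) shift.
        x, y = x - d * (y >> i), y + d * (x >> i)
        directions.append(d)
    return x, y, directions


def cordic_gain(m: int = 16) -> float:
    """Magnitude growth of m micro-rotations: K = prod_i sqrt(1 + 2^(-2i))."""
    return math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(m))


x_m, y_m, dirs = cordic_vectoring(12000, 9000, m=16)
print(x_m, y_m, round(cordic_gain(16) * math.hypot(12000, 9000)))  # 24704 1 24701
```

The small gap between x_m and K times the exact magnitude comes from truncating the shifted operands in every iteration; with M=16 the residual angle is bounded by roughly atan(2^-15).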
In Figures 2a, 2b and 2c, the variation with matrix size from N=16 to N=18 for a real-valued, integer-data-format QRD is shown. The corresponding total numbers of possible implementations are 960, 1,200 and 1,440. The AT-design spaces feature hyperbola-like Pareto-optimal fronts, offering trade-offs between throughput and silicon area. The ATE-design spaces are pruned to implementations whose ATE does not exceed ten times the optimum ATE (i.e., whose ATE efficiency is at least one tenth of the optimum). Filled markers depict carry-ripple and unfilled markers carry-select adder-based implementations. Latency-optimal implementations are depicted by red-filled circles; squares mark latencies up to two times and diamonds up to five times the minimum latency. For this word length, the ATE-optimal implementations in the lower left corner are based solely on carry-ripple adders.
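The ATE pruning rule and the latency marker scheme just described can be stated compactly in code. The sketch assumes ATE = A·T·E and uses a hypothetical `Point` record with made-up numbers; it is not taken from the optimization environment [2]:

```python
from typing import List, NamedTuple, Optional


class Point(NamedTuple):
    """Hypothetical design-space point (units arbitrary)."""
    area: float     # silicon area A
    period: float   # period T per QRD result
    energy: float   # energy E per QRD
    latency: float  # latency of one QRD

    @property
    def ate(self) -> float:
        """ATE complexity, assumed here to be the product A * T * E."""
        return self.area * self.period * self.energy


def prune_by_ate(points: List[Point], factor: float = 10.0) -> List[Point]:
    """Keep points whose ATE is at most `factor` times the optimum (smallest) ATE."""
    best = min(p.ate for p in points)
    return [p for p in points if p.ate <= factor * best]


def latency_marker(p: Point, min_latency: float) -> Optional[str]:
    """Marker scheme of Figure 2; None means the point is not highlighted."""
    if p.latency <= min_latency:
        return "circle"    # latency-optimal (red-filled circle)
    if p.latency <= 2 * min_latency:
        return "square"    # up to two times the minimum latency
    if p.latency <= 5 * min_latency:
        return "diamond"   # up to five times the minimum latency
    return None


pts = [Point(1.0, 1.0, 1.0, 10.0), Point(2.0, 2.0, 2.0, 15.0),
       Point(3.0, 3.0, 3.0, 40.0), Point(4.0, 4.0, 4.0, 60.0)]
min_lat = min(p.latency for p in pts)
print([p.ate for p in prune_by_ate(pts)])          # [1.0, 8.0]
print([latency_marker(p, min_lat) for p in pts])   # ['circle', 'square', 'diamond', None]
```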
Figures 2d-2f show the results for matrix size N=16 for the complex-valued integer, real-valued floating-point and complex-valued floating-point QRD; the mantissa word length is Wmantissa=16 and the exponent word length is Wexp=6. As can be seen from comparing the ATE-optimal implementations, the overhead for the floating-point data format extensions is rather small (in the order of 26 % for AT and 14 % for E). In contrast, the overhead for the extensions for complex-valued matrix processing is quite high: both area and period costs are increased by a factor of more than two, resulting in a 4.9 times larger AT complexity. Energy is also increased by a factor of more than 2.3.
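The reported factors can be combined into a rough ATE overhead. Assuming, as the comparison suggests, that the AT and E figures refer to the same pair of ATE-optimal implementations and that ATE is the product of AT and E, the factors simply multiply; the results below are a first-order estimate, not reported measurements:

```latex
\frac{(ATE)_{\mathrm{float}}}{(ATE)_{\mathrm{int}}}
  = \frac{(AT)_{\mathrm{float}}}{(AT)_{\mathrm{int}}} \cdot \frac{E_{\mathrm{float}}}{E_{\mathrm{int}}}
  \approx 1.26 \cdot 1.14 \approx 1.44,
\qquad
\frac{(ATE)_{\mathrm{complex}}}{(ATE)_{\mathrm{real}}}
  = \underbrace{\frac{A_{\mathrm{complex}}}{A_{\mathrm{real}}} \cdot \frac{T_{\mathrm{complex}}}{T_{\mathrm{real}}}}_{\approx 4.9\ (\text{each factor} > 2)}
  \cdot \underbrace{\frac{E_{\mathrm{complex}}}{E_{\mathrm{real}}}}_{> 2.3}
  \gtrsim 4.9 \cdot 2.3 \approx 11.3
```

The two area and period factors each exceeding two is consistent with an AT factor above four, as reported.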
2. Acknowledgement
This research work was done at the Chair of Electrical Engineering and Computer Systems, RWTH Aachen University, Germany, under the able guidance of Professor Dr.-Ing. Tobias G. Noll.
Figures 1(a, b): Examples of AT- and ATE-design spaces for complex-valued integer full ([R] and [Q]) QRD of matrix size N=12.
Figures 2(a-f): AT- and ATE-design spaces for a)-c) real-valued integer QRD with matrix size a) N=14, b) N=16, c) N=18, and d) complex-valued integer N=14, e) real-valued float N=14, f) complex-valued float N=14; all figures for ([R] and [Q]), 40-nm CMOS worst case, integer word length W=16 bit, floating-point word lengths Wmantissa=16 bit and Wexp=6 bit, CORDIC iteration count M=16.