Projects

Levi Camera and Wrangler Camera

The Levi camera is a bi-directional time delay integration (TDI) CMOS camera. It was designed at the Research and Development Center at Teledyne.

Lightweight and CCA2-Secure Hardware Implementation of Binary Ring-LWE

With the growing number of devices connected to IoT networks, providing end-to-end security is essential. Lattice-based cryptography (LBC) is a promising approach for IoT because it provides reasonable security against both classical and quantum attacks. Binary Ring-LWE is a type of LBC that is well suited to IoT devices. However, a reliable cryptosystem must also be secure against side-channel and fault attacks, such as power analysis and fault injection. In this project, a fault-resilient hardware implementation of an optimized Binary Ring-LWE design for resource-constrained IoT devices is presented. The design was implemented on an FPGA platform. On Virtex-7, the maximum frequency is 210.5 MHz and the design occupies 423 slices, only 1% of the total available slices.
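Part of why Binary Ring-LWE suits small devices is that one operand of its polynomial multiplication is binary, so every coefficient product is either 0 or a copy of the other coefficient and no hardware multipliers are needed. The Python sketch below illustrates that idea, assuming the usual negacyclic ring Z_q[x]/(x^n + 1); the function name and the toy parameters (n = 4, q = 256) are hypothetical and are not taken from the project's design.

```python
def binary_poly_mul(a: list[int], s: list[int], q: int) -> list[int]:
    """Compute a * s mod (x^n + 1, q) where s has coefficients in {0, 1}.

    Every partial product is either 0 or a[j], so only modular additions
    and subtractions are needed (no multipliers). Illustrative sketch only.
    """
    n = len(a)
    assert len(s) == n and all(bit in (0, 1) for bit in s)
    c = [0] * n
    for i, bit in enumerate(s):
        if not bit:
            continue  # a zero coefficient of s contributes nothing
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[j]) % q          # x^k term, no wrap-around
            else:
                c[k - n] = (c[k - n] - a[j]) % q  # x^n = -1 flips the sign
    return c


if __name__ == "__main__":
    # Toy parameters (hypothetical): n = 4, q = 256, s = 1 + x^2
    print(binary_poly_mul([10, 20, 30, 40], [1, 0, 1, 0], 256))  # [236, 236, 40, 60]
```

The wrapped terms are subtracted rather than added because x^n is congruent to -1 in this ring; in hardware that amounts to choosing between an adder and a subtractor per coefficient.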

Area and Power Efficient Post-Quantum Cryptosystem for IoT Resource-Constrained Devices

The Internet of Things (IoT) connects a myriad of small devices over a huge network, spanning many different applications and environments. As the IoT network continues to grow, providing end-to-end security over IoT is becoming a paramount issue. To mitigate existing and future security risks within IoT, two important factors should be considered. First, some resource-constrained edge devices have insufficient area to accommodate a security module. Second, the advent of quantum computers threatens the security of current public-key cryptography algorithms. In response to these challenges, lattice-based cryptography (LBC) has emerged as a promising technique for IoT security in the quantum era, and previous research has demonstrated the feasibility of integrating LBC into resource-constrained devices. Multiplication is the main operation in Ring-BinLWE, a type of LBC. In this project, a new multiplication method, In-place Rot-Col-Mul, is proposed, and a new Ring-BinLWE architecture is designed around it. In-place Rot-Col-Mul performs a column-based multiplication in which one rotation is executed per cycle. The design was implemented in TSMC 65-nm technology and on FPGA platforms. ASIC implementation results show improvements of 48.42% in power and 57.8% in area over the state-of-the-art design.
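One possible reading of the rotate-per-cycle, column-based multiplication described above is sketched below in Python: a single register holds a, is rotated negacyclically (multiplied by x modulo x^n + 1) once per cycle, and is added column-wise into an accumulator whenever the current bit of the binary operand s is 1. The function names and the cycle model are hypothetical; this illustrates the general idea and is not the published In-place Rot-Col-Mul architecture.

```python
def negacyclic_rotate(reg: list[int], q: int) -> list[int]:
    """Multiply reg by x modulo (x^n + 1): shift every coefficient up one
    position and move the top coefficient to position 0 with its sign flipped."""
    return [(-reg[-1]) % q] + reg[:-1]


def rot_col_mul(a: list[int], s: list[int], q: int) -> list[int]:
    """Hypothetical software model of a rotate-per-cycle multiplier.

    In cycle i the register holds x^i * a; if s[i] == 1, all n columns of
    the register are added to the accumulator in parallel. This model builds
    a new list each cycle, whereas hardware would shift the register in place.
    """
    n = len(a)
    reg = [x % q for x in a]       # the single rotating register
    acc = [0] * n                  # column accumulators
    for i in range(n):             # one "clock cycle" per bit of s
        if s[i]:
            acc = [(c + r) % q for c, r in zip(acc, reg)]  # column-wise add
        reg = negacyclic_rotate(reg, q)  # exactly one rotation per cycle
    return acc


if __name__ == "__main__":
    print(rot_col_mul([10, 20, 30, 40], [1, 0, 1, 0], 256))  # [236, 236, 40, 60]
```

The result matches schoolbook negacyclic multiplication, but each cycle needs only n parallel adders plus a rotation, which is wiring rather than logic in hardware.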

Area-Efficient Nano-AES Implementation for Internet-of-Things End-node Devices

Due to the fast-growing number of tiny devices connected to the Internet of Things (IoT), providing end-to-end security is vital, so cryptosystems must be designed around the requirements of resource-constrained IoT devices. In this project, a lightweight implementation of the Advanced Encryption Standard (AES), a highly secure symmetric cryptography algorithm, was developed on FPGA and in 65-nm technology for resource-constrained IoT devices. The proposed architecture includes an 8-bit data-path and five main blocks. We designed two dedicated register banks, Key-Register and State-Register, for storing the plaintext, keys, and intermediate data. To reduce area, Shift-Rows is embedded inside the State-Register. To adapt Mix-Columns to the 8-bit data-path, we designed an optimized 8-bit Mix-Columns block with four internal registers, which accepts and returns data 8 bits at a time. A single optimized Sub-Bytes block is shared between the key expansion and encryption phases; to optimize it, we merged and simplified some of its parts. To reduce power consumption, we applied clock gating to the design. Application-specific integrated circuit (ASIC) implementation results show area improvements ranging from 2.4% to 35% over previous similar works. Based on these results, the proposed design is a suitable cryptosystem for tiny IoT devices.
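To show how Mix-Columns can work on an 8-bit data-path, the Python sketch below models a unit with four byte registers: one byte of a column is shifted in per cycle, and once the column is complete one output byte is produced per cycle using the standard AES formula 2·a[r] ⊕ 3·a[r+1] ⊕ a[r+2] ⊕ a[r+3] in GF(2^8). The class and method names are hypothetical, and the load-then-emit schedule is an assumption for illustration, not the project's actual timing.

```python
def xtime(b: int) -> int:
    """Multiply a byte by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return (b ^ 0x11B) if b & 0x100 else b


class ByteSerialMixColumn:
    """Hypothetical model of an 8-bit Mix-Columns unit with four internal
    byte registers: four load cycles, then four emit cycles."""

    def __init__(self) -> None:
        self.regs = [0, 0, 0, 0]  # the four internal 8-bit registers
        self.loaded = 0

    def load(self, byte: int) -> None:
        # Shift the new byte in; after four loads regs == [a0, a1, a2, a3].
        self.regs = self.regs[1:] + [byte & 0xFF]
        self.loaded += 1

    def emit(self, row: int) -> int:
        # Output byte for this row: 2*a[r] ^ 3*a[r+1] ^ a[r+2] ^ a[r+3].
        assert self.loaded >= 4, "load a full column first"
        a0, a1, a2, a3 = (self.regs[(row + k) % 4] for k in range(4))
        return xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3


def mix_column_bytewise(column: list[int]) -> list[int]:
    unit = ByteSerialMixColumn()
    for byte in column:                       # 4 cycles in
        unit.load(byte)
    return [unit.emit(r) for r in range(4)]   # 4 cycles out


if __name__ == "__main__":
    out = mix_column_bytewise([0xDB, 0x13, 0x53, 0x45])
    print([hex(b) for b in out])  # ['0x8e', '0x4d', '0xa1', '0xbc']
```

Only xtime (a shift plus a conditional XOR) and XORs are needed, so the unit stays small; multiplication by 3 is computed as xtime(x) ⊕ x.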

High throughput and area-efficient FPGA implementation of AES for high-traffic applications

In this project, a high-throughput field-programmable gate array (FPGA) implementation of the Advanced Encryption Standard-128 (AES-128) was developed. AES is a well-known symmetric-key encryption algorithm that offers high security against various attacks and is widely used in many applications. The main goal of this study is to design a cryptosystem with high throughput and high FPGA efficiency (FPGA-Eff) for high-traffic applications. To achieve high throughput, loop unrolling and inner and outer pipelining are employed. In AES, substitution bytes (Sub-Bytes) is one of the most costly functions: it occupies a large amount of resources and has a large delay. To reduce the area of Sub-Bytes, a new affine transformation, which combines the inverse isomorphic transformation and the affine transformation, is proposed and employed. In addition, the AES round structure has been modified to fit the proposed architecture. For the first nine rounds, Shift-Rows and Sub-Bytes have been exchanged, and Shift-Rows is merged with Add-Round-Key. To balance the latency between pipeline stages, Mix-Columns is divided into two stages. AES is implemented in counter mode on Xilinx Virtex-5 using VHDL. The proposed implementation achieves a throughput of 79.7 Gbps, an FPGA-Eff of 13.3 Mbps/slice, and a frequency of 622.4 MHz. Compared to the state-of-the-art work, the proposed design improves throughput by 8.02% and FPGA-Eff by 22.63%.
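Exchanging Shift-Rows and Sub-Bytes is legal because Sub-Bytes acts on each byte independently while Shift-Rows only permutes byte positions. The Python sketch below builds the standard AES S-box (field inversion followed by the affine transformation) and checks that the two steps commute; the helper names are hypothetical. It does not model the project's composite-field Sub-Bytes or its merged "new affine transformation", only the standard S-box that such a design must reproduce.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def gf_inv(a: int) -> int:
    """Multiplicative inverse computed as a^254 by square-and-multiply (0 -> 0)."""
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result if a else 0


def rotl8(x: int, r: int) -> int:
    return ((x << r) | (x >> (8 - r))) & 0xFF


def sub_byte(x: int) -> int:
    """Standard AES S-box entry: inversion, then the affine transformation."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX = [sub_byte(x) for x in range(256)]


def sub_bytes(state: list[int]) -> list[int]:
    return [SBOX[b] for b in state]


def shift_rows(state: list[int]) -> list[int]:
    """State is 16 bytes in column-major order: byte (r, c) sits at r + 4*c.
    Row r is rotated left by r positions."""
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


if __name__ == "__main__":
    state = list(range(16))
    print(shift_rows(state)[:4])  # [0, 5, 10, 15]
    # Byte-wise substitution commutes with a pure byte permutation.
    print(shift_rows(sub_bytes(state)) == sub_bytes(shift_rows(state)))  # True
```

As a sanity check on the reported figures, a fully unrolled and pipelined core that accepts one 128-bit block per clock at 622.4 MHz delivers 128 × 622.4 MHz ≈ 79.67 Gbps, consistent with the stated 79.7 Gbps.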

Efficient spiking neural network training and inference with reduced precision memory and computing

In this project, reduced-precision operations are investigated to improve the speed and energy efficiency of spiking neural network (SNN) implementations. Instead of the 32-bit single-precision floating-point format, smaller floating-point and fixed-point formats are used to represent SNN parameters and to perform SNN operations. The analyses are performed on the training and inference of an SNN based on the leaky integrate-and-fire model, which is trained to classify the handwritten digits in the MNIST database. The results show that for SNN inference, a floating-point format with a 4-bit exponent and a 3-bit mantissa, or a fixed-point format with a 6-bit integer part and a 7-bit fraction, can be used without any accuracy degradation. For training, a floating-point format with a 5-bit exponent and a 3-bit mantissa, or a fixed-point format with a 6-bit integer part and a 10-bit fraction, achieves full accuracy. The proposed reduced-precision formats can be used in SNN hardware accelerator design, and the choice between floating-point and fixed-point can be made according to the design requirements. A case study of SNN implementation on a field-programmable gate array (FPGA) device is performed. With reduced-precision numerical formats, memory footprint, computing speed, and resource utilisation all improve, and as a result the energy efficiency of the SNN implementation improves as well.
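The Python sketch below shows one way to emulate the inference formats named above in software: a signed fixed-point quantizer and a small floating-point quantizer, applied to every intermediate result of a leaky integrate-and-fire (LIF) update. Several details are assumptions for illustration, not the project's exact encodings: the integer bits include the sign bit, the minifloat uses an IEEE 754 style bias with subnormals and saturates instead of producing infinities, and the leak factor (0.9) and threshold (1.0) are hypothetical.

```python
import math
from functools import partial
from typing import Callable


def quantize_fixed(x: float, int_bits: int, frac_bits: int) -> float:
    """Round x to signed fixed point with int_bits integer bits (sign bit
    included, an assumption) and frac_bits fraction bits; saturate on overflow."""
    scale = 1 << frac_bits
    lo = -float(1 << (int_bits - 1))
    hi = (1 << (int_bits - 1)) - 1.0 / scale
    return min(max(round(x * scale) / scale, lo), hi)


def quantize_float(x: float, exp_bits: int, man_bits: int) -> float:
    """Round x to a small IEEE 754 style float (bias 2^(e-1)-1, subnormals,
    top exponent code reserved) and saturate instead of overflowing to inf."""
    if x == 0.0:
        return 0.0
    bias = (1 << (exp_bits - 1)) - 1
    e_min = 1 - bias
    e_max = (1 << exp_bits) - 2 - bias
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** e_max
    _, e = math.frexp(abs(x))          # abs(x) = m * 2**e with 0.5 <= m < 1
    exponent = max(e - 1, e_min)       # clamp into the subnormal range
    step = 2.0 ** (exponent - man_bits)
    q = round(abs(x) / step) * step
    return math.copysign(min(q, max_val), x)


def lif_step(v: float, i_in: float, q: Callable[[float], float],
             leak: float = 0.9, v_th: float = 1.0) -> tuple[float, bool]:
    """One LIF update with every intermediate value rounded by q.
    leak and v_th are hypothetical example values."""
    v = q(q(v * leak) + q(i_in))
    if v >= v_th:
        return 0.0, True   # fire and reset the membrane potential to zero
    return v, False


if __name__ == "__main__":
    q_fixed = partial(quantize_fixed, int_bits=6, frac_bits=7)  # 6-bit int, 7-bit frac
    q_float = partial(quantize_float, exp_bits=4, man_bits=3)   # 4-bit exp, 3-bit mantissa
    print(lif_step(0.5, 0.3, q_fixed))  # (0.75, False)
    print(lif_step(0.9, 0.3, q_float))  # (0.0, True)
```

Running a full network with these quantizers in place of float32 arithmetic is a cheap way to check accuracy for a candidate format before committing it to an FPGA data-path.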

This project was supported by the ICT R&D program of MSIT/IITP (2018-0-00197, Development of ultra-low power intelligent edge SoC technology based on lightweight RISC-V processor).

In this project, my main contribution was designing the hardware structure for the SNN and implementing it on FPGA.

Design and implementation of an ASIP-based cryptography processor for AES, IDEA, and MD5

In this project, a new 32-bit crypto-processor for AES, IDEA, and MD5 based on an Application-Specific Instruction-set Processor (ASIP) is designed. The instruction set consists of both general-purpose instructions and instructions specific to these cryptographic algorithms. The proposed architecture has nine functional units and two data buses. It also has two 32-bit instruction formats for executing Memory Reference (M.R.), Register Reference (R.R.), and Input/Output Reference (I/O R.) instructions. The maximum achieved frequency is 166.916 MHz. Encrypting a 128-bit input block takes 122, 146, and 170 clock cycles for AES-128, AES-192, and AES-256, respectively. Encrypting or decrypting a 64-bit input block with IDEA takes 95 clock cycles. Finally, the MD5 hash algorithm requires 469 clock cycles to process a 512-bit block. The proposed processor is compared with previous and state-of-the-art implementations in terms of speed, latency, throughput, and flexibility.
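The cycle counts above translate into throughput with the usual formula, throughput = block size × clock frequency / cycles per block. The Python sketch below applies it to the reported figures; the resulting Mbps values are derived here under the assumption that blocks are processed back to back with no overlap, and they are not published results. It also shows that the AES counts grow by 24 cycles for every two extra rounds (10, 12, 14 rounds), which suggests roughly 12 cycles per round.

```python
# Reported clock frequency and cycle counts; block sizes are fixed by each algorithm.
F_MAX_MHZ = 166.916

WORKLOADS = {
    # name: (block size in bits, clock cycles per block)
    "AES-128": (128, 122),
    "AES-192": (128, 146),
    "AES-256": (128, 170),
    "IDEA": (64, 95),
    "MD5": (512, 469),
}


def throughput_mbps(block_bits: int, cycles: int, f_mhz: float = F_MAX_MHZ) -> float:
    """Throughput in Mbps, assuming one block at a time with no overlap
    between consecutive blocks (an assumption about the processor)."""
    return block_bits * f_mhz / cycles


if __name__ == "__main__":
    for name, (bits, cycles) in WORKLOADS.items():
        print(f"{name:8s} {throughput_mbps(bits, cycles):8.2f} Mbps")
    # AES-128 uses 10 rounds and AES-256 uses 14 rounds.
    per_round = (WORKLOADS["AES-256"][1] - WORKLOADS["AES-128"][1]) / (14 - 10)
    print(f"about {per_round:.0f} cycles per AES round")
```

At 166.916 MHz this gives about 175.1 Mbps for AES-128, about 112.4 Mbps for IDEA, and about 182.2 Mbps for MD5.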
