← Back to News List

Defense: Nguyen on Fast Modular Exponentiation Using Residue Domain Representation, Noon 11/5

from http://upload.wikimedia.org/wikipedia/en/2/26/COPACOBANA_FPGA_BOARD.jpg

M.S. Thesis Defense
Computer Science and Electrical Engineering
University of Maryland, Baltimore County

Fast Modular Exponentiation Using Residue Domain Representation:
A Hardware Reference Implementation and Analysis

Christopher D. Nguyen

12:00–2:00pm, Tuesday, 5 November 2013, ITE 228, UMBC

Using field-programmable gate arrays (FPGAs) we engineered and analyzed the first hardware implementation of Phatak’s reduced-precision residue number system (RP-RNS) to perform modular exponentiation.

Residue number systems (RNSs) provide an alternative representation to the binary system for performing integer arithmetic used in applications such as public-key cryptography and digital signal processing. They offer full parallel computation for addition, subtraction, and multiplication increasing performance from O(K) to O(lg K) for a K-bit number. However, base extension, division, and sign detection become harder operations.

RP-RNS is a new set of algorithms that uses approximation and a time-memory trade-off to address the hard operations. The partial reconstruction (PR) algorithm addresses base extension and the quotient-first scaling (QFS) algorithm addresses scaling. RP-RNS modular exponentiation uses the PR and QFS algorithms. RP-RNS improves performance of modular multiplication in an RNS with range [0, M-1] from the O((lg n)^2) delay of current systems (e.g. Cox-Rower) to a theoretical O(lg n) delay where n is the word-length of M.

Our implementation is based on Phatak’s description and recommended architecture diagrams. We found even low-end FPGAs can store over 30 channels of logic. Following the recommendation of parallel look-up table (LUT) access, we distributed the LUTs to be local to each channel. We found this recommendation applied to QFS exceeds the capacity of today’s high-capacity FPGAs (e.g. Xilinx Virtex-7) for modest 2,000-bit divisors. We propose several improvements to increase feasibility; one is to store the LUTs external to the FPGA, which would introduce a performance penalty per look-up.

Committee: Alan Sherman (chair), Dhananjay Phatak, Chintan Patel and
Ryan Robucci

Posted: November 2, 2013, 10:41 AM