Defense: Nguyen on Fast Modular Exponentiation Using Residue Domain Representation, Noon 11/5
M.S. Thesis Defense
Computer Science and Electrical Engineering
University of Maryland, Baltimore County
Fast Modular Exponentiation Using Residue Domain Representation:
A Hardware Reference Implementation and Analysis
Christopher D. Nguyen
12:00–2:00pm, Tuesday, 5 November 2013, ITE 228, UMBC
Using field-programmable gate arrays (FPGAs) we engineered and analyzed the first hardware implementation of Phatak’s reduced-precision residue number system (RP-RNS) to perform modular exponentiation.
Residue number systems (RNSs) provide an alternative representation to the binary system for performing integer arithmetic used in applications such as public-key cryptography and digital signal processing. They offer full parallel computation for addition, subtraction, and multiplication increasing performance from O(K) to O(lg K) for a K-bit number. However, base extension, division, and sign detection become harder operations.
RP-RNS is a new set of algorithms that uses approximation and a time-memory trade-off to address the hard operations. The partial reconstruction (PR) algorithm addresses base extension and the quotient-first scaling (QFS) algorithm addresses scaling. RP-RNS modular exponentiation uses the PR and QFS algorithms. RP-RNS improves performance of modular multiplication in an RNS with range [0, M-1] from the O((lg n)^2) delay of current systems (e.g. Cox-Rower) to a theoretical O(lg n) delay where n is the word-length of M.
Our implementation is based on Phatak’s description and recommended architecture diagrams. We found even low-end FPGAs can store over 30 channels of logic. Following the recommendation of parallel look-up table (LUT) access, we distributed the LUTs to be local to each channel. We found this recommendation applied to QFS exceeds the capacity of today’s high-capacity FPGAs (e.g. Xilinx Virtex-7) for modest 2,000-bit divisors. We propose several improvements to increase feasibility; one is to store the LUTs external to the FPGA, which would introduce a performance penalty per look-up.
Committee: Alan Sherman (chair), Dhananjay Phatak, Chintan Patel and
Ryan Robucci
Posted: November 2, 2013, 10:41 AM