CMU 15-418/618 Final Project: Checkpoint Writeup

Checkpoint Report:

CUDA RSA

Checkpoint Overview

This page explains the progress and updates made to date on the CUDA RSA project. Beside that which is on this page, updates have also been made to the main page schedule, giving a brief overview of progress with respect to the goals set forth by the project proposal and the new goals given the work done so far (as explained below). Small edits and clarifications have also been made to the project proposal.

Progress to Date

The emphasis of our project is on creating a CUDA based factorizer, so most of our effort so far has been put to researching, understanding, and evaluating factorization algorithms. However, in order to create a meaningful large number factorizer, we also require an API for large (greater than 64 bit) integers. Consequently our initial focus has been split between understanding and implementing basic factorization algorithms and creating a solid GPU multiple precision integer library on which we will rely.

Implementing Factoriztion

Because of the reported success of Pollard's p - 1 algorithm given by "Efficient Paralles RSA Decryption Algorithm for Many-Core GPUs with CUDA," we decided to first implement this algorithm completely before dabbling in any others. This way, we can increase the chance that we will be able to end up with a complete project. We first implemented a single threaded version of the algorithm to ensure that we understand it and are able to replicate it. Since we do not yet have multiple precision resources on the GPU we decided to use longs in our CPU version, so that the code would be nearly directly transferrable to the GPU. After finishing and testing the CPU version, we began work on the GPU version. At the moment, the GPU version is nearly complete, but uses a poor method for determining which value of k to use, which makes sense in a single threaded application but does not work as well on the GPU. The next step to be taken is to use a large global prime table as recommended in the above paper. Once that is complete, we must still connect the algorithm to our multiple precision library.

Multiple Precision Library

Work on the Multiple Precision Integer library for CUDA has been progressing well. We have a partially completed and well tested library with the following functionality:

Initialization of arbitrary length integers (positive and negative)
Addition, Subtraction, Multiplication
Comparison

We have created the library such that it can be compiled by both g++ and nvcc for testing on the CPU and GPU respectively. In addition we have a robust unit test suite to ensure correctness.

The remaining work in the multiple precision library is to implement division, power, and modulus.

Updated Goals and Deliverables

We have been keeping good pace with our original schedule and we think we can hit most of the goals we originally set. Here is what we think we'll have at the end of the project:

A working multiple precision library for CUDA
A CUDA implementation of Pollard's p-1 algorithm built on top of the MP library
Analysis (graphical and written) of the performance of our implementation against various CPU based implementations.

And as a stretch goal we hope to have:

Code that interfaces with RSA so we can demo the power of being able to factor large numbers quickly.

Our demo will depend on whether or not we can implement the RSA portion of the project. Obviously, it is the most flashy and would give a good demo. We would start the demo off by encrypting a message with a small key (like a 64-bit key). Then we would give a short presentation about our project and findings. Then at the end of the presentation our code will have factored the public key, decrypted the message, and output the message to the screen, thus demonstrating the power of our implementation. It may be the case that writing code to interface with RSA is too difficult or tedious, in which case this presentation won't be possible. In this case, we can still demonstrate our program's factoring on large numbers. We could use a number likely to be used in RSA (large number with two, large, prime factors).

Preliminary Results

Our only preliminary results are that we can factor small numbers on the CPU using Pollard's p-1 algorithm, and that our MP library is passing all tests for what is currently implemented.

Issues / Concerns

We are making good progress and do not have any major concerns at this time. Finishing the rest of the project should hopefully just be a matter of coding and puzzling out how to best parallelize Pollard's p-1 Algorithm. Our biggest challenges are yet to come but we think we will be able to have a working project in time. The only concern at this time is getting RSA working, but that is a stretch goal.

Updated Proposed Schedule

What We Plan To Do

Week First Half Second Half

Apr 22-28 -- Finish GPU based implementation of Pollard's algorithm, using longs. Finish implementing and testing the MP library

Apr 29-May 5 Adapt GPU code to use MP library (if not ready, investigate alternative algorithms) Profile code, create test suite and begin comparing to standards

May 6-11 Consider optimizations, reworks for specific GPU architecture Implement RSA cracking, using factorization algorithm, evaluate viability of stretch goals

	What We Plan To Do
Week	First Half	Second Half
Apr 22-28	--	Finish GPU based implementation of Pollard's algorithm, using longs. Finish implementing and testing the MP library
Apr 29-May 5	Adapt GPU code to use MP library (if not ready, investigate alternative algorithms)	Profile code, create test suite and begin comparing to standards
May 6-11	Consider optimizations, reworks for specific GPU architecture	Implement RSA cracking, using factorization algorithm, evaluate viability of stretch goals