The first two prime numbers articles I wrote (Prime Numbers Part 1 - Introduction and Prime Numbers Part 2 - Generating Primes) were written in a different country, on a different PC (with a CRT monitor!) in an old version of C#, and were uploaded to my website onto a different CMS (The publishing dates on the articles are actually wrong by several years!). It was a long time ago, and many things have changed; it’s probably about time I took another look at the subject.
Where Were We?
In the previous articles I wrote a C# class EratosthenesGenerator
, implementing the interface IPrimeGenerator
and being created by the PrimeGeneratorFactory
class. The code worked, and generated prime numbers, but there were areas that could be optimized, and I didn’t quite get around to doing that.
Where Should We Go Next?
In the original articles I made the argument that we should concentrate on making code work, and then making it fast. If you can do both at the same time that’s great, but in that case I didn’t, This time I’ll be aiming to make the code clearer and faster.
Let’s start by looking more closely at what we want to do to the code from Prime Numbers Part 2 - Generating Primes to start with:
- A: We want a method to generate prime numbers up to a maximum value. This needs to run through all of the integers, although we can do some filtering to speed things up, and return every number that is tested as prime.
- B: We also want a method to test a given number to see whether it is prime. The simplest way to do this is to run through all of the numbers less than the square root of the given number, and see whether any of them is a factor. The fastest way is to filter this list down to prime numbers. To do this we need to generate a sequence of prime numbers.
Where Do We Begin?
Given the amount of time that’s passed since those articles were written they’re starting to show their age a little and looking at the code some changes spring to mind. Starting with something mentioned back in the original articles:
If you use the code to generate one series of prime numbers, and then want to generate another series, it recalculates everything because there's no caching built in. We should probably add some.
Adding caching gives us some extra options as well:
As I mentioned, if we want to generate more than one series of prime numbers, then any that are already cached can simply be returned rather than recalculated. If subsequent series are longer than the original series, we can simply add their contents to the cache.
If we want to test a number for primeness, there’s a chance it could be in the cache. If it is, we can just return it and, as with series that are already cached, this is much quicker than calculating whether it’s prime or not. Once again, any intermediate primes we generate can be cached.
Initially we’ll start with the cache as a member variable:
private List<long> _primes = new List<long>();
This will allow us to implement all of the caching features mentioned above. While we’re doing this, let’s look at making the whole thing run faster.
Some Optimizations
There’s always someone who says that “premature optimization is the root of all evil,”, but I’ve already discussed that in System Performance – Part 1 – Outrun The Bear so we can ignore them! Anyway, I implemented the following optimizations, albeit with some basic timing code:
Accessing Cached Items
In Prime Numbers Part 2 - Generating Primes I implemented the loop that runs through the cached prime numbers using a normal for loop like this:
for (int i = 0; (i < primes.Count) && (primes[i] * primes[i] <= currentValue) && isPrime; ++i)
{
if (currentValue % primes[i] == 0)
{
isPrime = false;
}
}
This works, but it makes repeated access to the Count
property and the Item[]
method to retrieve the number at position i
. Count and Item both have O(1) access times, but that still adds up when you call them repeatedly. I updated the loop to foreach
, as follows:
foreach (var factor in _primes)
{
if (currentValue % factor == 0L)
{
isPrime = false;
break;
}
if (factor * factor >= currentValue)
{
break;
}
}
According to the (pretty basic) timing I’ve got built into the project, this change alone reduced the time to generate primes by around 25%.
Add A Cache Check To IsPrime()
Rather than check every possible factor of each number, adding a check to see whether the number passed is already in the cache should save time. If the highest number stored in the cache is greater than the number passed to IsPrime()
, then we know that the number if prime if it is in the cache, otherwise it isn’t.
// If the number is less than the maximum value in the cache,
// check to see whether it is in the cache.
if ((_primes.Count > 0) &&
(_primes[_primes.Count - 1] >= number))
{
// If the number is in the cache, it's prime.
return (_primes.BinarySearch(number) >= 0);
}
Adding this check to IsPrime()
reduces its execution time by 90%!
There’s another check that can be added to IsPrime()
. If the number to be checked is not originally in the cache, but is then added during the loop to find prime factors, it’s possible for the Primes()
enumerator to actually return the number itself if it is prime. In this case, the check that the modulo operator on the newly-cached and original numbers returns 0 will be true, but the number itself is prime.
We can change the original modulo check from:
if ((number % prime == 0) && (number != prime))
{
return false;
}
to:
if (number % prime == 0)
{
return (number == prime);
}
This change makes the code run approximately 10% faster because it returns at this point regardless of whether number is prime or not.
With all of these changes, we end up with the following code
The interface to the IPrimeGenerator
is slightly expanded to Include an IsPrime()
method:
using System.Collections.Generic;
namespace SoftwarePragmatism
{
public interface IPrimeGenerator
{
IEnumerable<long> Primes(long maximumValue);
bool IsPrime(long number);
}
}
And with all of the changes above, the EratosthenesGenerator
classes now looks like this:
using System;
using System.Collections.Generic;
using System.Linq;
namespace SoftwarePragmatism
{
public class EratosthenesGenerator : IPrimeGenerator
{
private List<long> _primes = new List<long>();
public EratosthenesGenerator()
{
}
#region IPrimeGenerator[System.Int64] implementation
public long cacheSize { get { return _primes.Count; } }
public IEnumerable<long> cache { get { return _primes; } }
public IEnumerable<long> Primes(long maximumValue)
{
if (maximumValue >= 2L)
{
yield return 2L;
bool isPrime = true;
long maximumCachedValue = _primes.Count == 0 ? 3L : _primes[_primes.Count - 1];
foreach (var cachedValue in _primes)
{
if (cachedValue <= maximumValue)
{
yield return cachedValue;
}
else
{
break;
}
}
for (long currentValue = maximumCachedValue; currentValue <= maximumValue; currentValue += 2L)
{
isPrime = true;
foreach (var factor in _primes)
{
if (currentValue % factor == 0L)
{
isPrime = false;
break;
}
if (factor * factor >= currentValue)
{
break;
}
}
if (isPrime)
{
if ((_primes.Count == 0) || (_primes[_primes.Count - 1] < currentValue))
{
_primes.Add(currentValue);
}
yield return currentValue;
}
}
}
}
public bool IsPrime(long number)
{
if (number < 2)
{
return false;
}
// If the number is less than the maximum value in the cache,
// check to see whether it is in the cache.
if ((_primes.Count > 0) &&
(_primes[_primes.Count - 1] >= number))
{
// If the number is in the cache, it's prime.
return (_primes.BinarySearch(number) >= 0);
}
foreach (var prime in Primes(number))
{
// For low numbers it's possible for the number to
// not already be in the cache, but to have been
// newly added by the Primes collection.
if (number % prime == 0)
{
return (number == prime);
}
if (prime * prime > number)
{
break;
}
}
return true;
}
#endregion
}
}
Possible Further Optimizations
As with any piece of code, it’s always possible to keep optimizing and changing it. Some of these changes are worthwhile, some aren’t, and it’s usually experience that helps you to tell the difference.
One optimization that could be applied to this code is in the IsPrime()
method. At the moment, checking whether a number n is prime results in the prime numbers between 2 and the square rot of n being in the cache, but if n is prime, it doesn’t get cached. This means repeated calls to IsPrime()
don’t use caching as efficiently as they could. By adding n to the cache if it turns out to be prime, we could check for this for each call, but we could no longer assume that the cache contains all prime numbers up to and including the last value. The cache would become a sparse collection, and we could no longer rely on the absence of a number from it meaning that a number was not prime – we’d need to check it.
This looks like it could actually reduce the efficiency of the IsPrime()
method overall, so I haven’t implemented it. There’s no reason not to try it though, and profile the results.
Where Are We Now?
After a reasonable amount of work, we now have a fairly optimized prime number generator and checker class, using the same principles as the Sieve of Eratosthenes. I probably won’t do much more optimization for the moment, although there are more ways to generate prime numbers, so that could be worth investigating.