Prime Trees — A Minor Obsession

Warning!  If you aren't into math or number theory, you may want to skip this particular article.  I'm no mathematician, but I am something of a hobbyist when it comes to math.  This article provides an overview of the approximately four months I have put into exploring a forest of prime trees.

What's a prime tree? Well, that requires a little background…

Much study has been devoted throughout history to the prime numbers, that is, those numbers evenly divisibly by only themselves and 1. These numbers begin with 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31… and extend onward to inifinity. People have examined the distribution of primes, the behaviors of primes, and have devoted massive quantities of computer time to find larger and larger primes. As of this writing the largest known prime has 6,320,430 digits.

People have researched clusters, patterns, related lists of primes. The most basic of these are arithmetic progressions of primes, i.e. for some starting prime P and some increment I: P+I is prime, P+I+I is prime, P+I+I+I is prime and so forth. Currently the longest known arithmetic progressions have a length of 22.

Another type of related prime list is called a Cunningham Chain. A Cunningham Chain is a list of primes such that the next prime is the result of some simple formula applied to the previous prime. For example 2, 5, 11, 23, 47 is such a chain of length 5, where each subsequent term is 2p+1 where p is the previous term. 5=2×2+1, 11=2×5+1, 23=2×11+1 etc. Cunningham Chains are associated specifically with the formula 2p+1 or 2p-1. For 2p-1 one of the longest starts at 69257563144280941 and has a length of 15. For 2p+1, one of the longest starts at 95405042230542329 and has a length of 14.

A prime tree is a name I came up with for a connected series of primes that are generated with a simple formula just as in a Cunningham Chain, but where the offset can be either added OR subtracted. So for example, the formula of interest might be 3p+-2 starting at 5. Note: the terminology I tend to use to describe these three values are as follows: 3 is the coefficient (c), 2 is the offset (d), and 5 is the root (r).

Letting p=5, we would apply both the 3p-2 and 3p+2 formulas, and keep track of the results. As it turns out both of the results (13 and 17) are prime, so we would add these to our tree as the “children” of 5 like so:

 [5]
   - = [13]
   + = [17]

The “- =” line shows the child when you multiply and deduct the offset, the “+ =” line shows the child when you multiply and add the offset. Then we would continue the process by letting p=13 and then 17, and adding the children of these numbers to the tree if the children are prime. 3×13-2 is 37 and 3×13+2 is 41. Both of these are prime. 3×17-2 is 49 which is not prime, but 3×17+2 is 53 which is prime:

 [5]
   - = [13]
   .   - = [37]
   .   + = [41]
   + = [17]
       + = [53]

Pushing onward we find that 37 has 2 prime children (109 and 113), 41 has none, and 53 has one (3×53-2 = 157):

 [5]
   - = [13]
   .   - = [37]
   .   .   - = [109]
   .   .   + = [113]
   .   + = [41]
   + = [17]
       + = [53]
           - = [157]

At this point the 109 and 157 branches die, no further primes are generated, but 3×113-2 yields 337 which is prime, so a new generation is added to the tree:

 [5]
   - = [13]
   .   - = [37]
   .   .   - = [109]
   .   .   + = [113]
   .   .       - = [337]
   .   + = [41]
   + = [17]
       + = [53]
           - = [157]

Continuing this process with 337 and pushing on until no more prime children are generated gives us the completed 3p+/-2 at 5 tree:

 [5]
   - = [13]
   .   - = [37]
   .   .   - = [109]
   .   .   + = [113]
   .   .       - = [337]
   .   .           - = [1009]
   .   .           + = [1013]
   .   .               - = [3037]
   .   .               .   - = [9109]
   .   .               .       + = [27329]
   .   .               + = [3041]
   .   + = [41]
   + = [17]
       + = [53]
           - = [157]

Note that at 16 primes, this tree contains more primes than the longest known Cunningham Chain. Of course this takes into account the width of the tree. The maximal depth of the tree is measured in generations, of which there are 9. The longest path through this tree is 5, 13, 37, 113, 337, 1013, 3037, 9109, 27329 which is 9 primes long. It stands to reason though that because each prime in the tree has 2 possible choices for continuing the analysis, that there must be prime trees out there that are deeper than the longest known Cunningham Chains, and perhaps even deeper than the largest arithmetic progression.

I am trying to find such trees.  My ultimate goal is to find one 23 generations deep, for such a tree would contain a connected series of primes longer than both the world-record Cunningham Chain and the world-record Arithmetic Progression.

Using a high-end computational tool called Maple I have spent perhaps the last four months on this project.  The problem space is infinite in three directions (coefficient, offset, and root prime) and therefore tracking down large trees has become increasingly difficult.

The essential problem is that as prime numbers get bigger, so do the spaces between them  There are far more prime numbers between 100 and 200, then there are between 100,000,100 and 100,000,200.  Since the growth of a tree constantly moves towards a more sparsely populated region of primes, your chances of continuing to produce primes on any particular branch drop and eventually the branch peters out.

One day while taking my typical shotgun approach to the search, I noticed something interesting that many very large trees had in common… they grew backward for awhile.   How do you make a tree grow backward?  Simple, just make the offset so large that after you multiply and subtract that offset, you get a child prime which is smaller than the root prime.  If you can get a tree to grow backward for a few generations, you are adding depth to the tree but moving toward a denser prime space!  I called trees that did this “inverted”, and it occurred to me that if I focussed specifically on these types of trees, I might find my record breakers faster.

I devised an algorithm to try, for any root prime, to build a tree with every prior prime as the first child, for a few different coefficients.  After running this algorithm for a few weeks and reaching primes up around 170,000 I had succeeded in finding a few trees 21 generations deep!  However the time required to run the computations was getting longer every day, and my little computer simply didn't have enough power to move through the problem space any faster.

So, during a weekend when I was sick with a stomach virus, I built a pair of applications designed to distribute the inverted prime tree search across multiple computers.  One application, called “the dispatcher” (left image), runs on a single machine.  The other application “the client” (right image) is installed on any machines that are going to assist in the search. The Dispatcher doles out small pieces of the problem to each client, and collates each client's results into a single datafile.  In this way, a much larger “distributed computer” can be put to work on the problem.  Collectively this pair of apps is called “ShareCal”.  Armed with ShareCal, I entreated people where I work to let me make use of their unused computer time.  I built the apps to run in the background, and share the CPU with any active applications.  These apps were slower than Maple, being written in VB, but if I could get enough clients running, them, I figured the collective computing power would process the problem space much faster than Maple.

That was about 3 weeks ago.  Right now I have 25 client computers working on the problem.  The combined processing power of all of these boxes is over 20 gHz.  That's pretty frickin' cool!

In three weeks time, we've gone from primes around 170,000 to primes around 966,000.  In the next few days, clients will begin working on primes over 1,000,000.  At that time, the distributed computer will have generated and examined approximately 13 BILLION trees.

The world record for a connected series of primes is not yet broken, but it has been tied!  At about 4 in the morning on the Sunday before last, a client running on a workstation owned by my colleague Dan Royalty discovered a prime tree with a depth of 22 generations.  Hopefully the D23 tree will be found before I get tired of working on the problem.

I have arbitrarily assigned trees into three different categories based on their depth and/or population (the number of primes in the tree).

  • Trees which have a depth less than 10 and a population less than 18 are “small” and therefore uninteresting.
  • Trees which have a depth of 10 to 15 or a population of 18 to 29 are “big”.  Big trees are fairly common.  With 25 clients, ShareCal identifies approximately 22,000 big trees in a 12-hour period.  I've found well over 300,000 big trees thus far.
  • Trees with a depth of 16 or higher, or a population of 30 or higher are called “monsters”.  To date, only 1,646 monsters have been found in over 200 trillion trees, so they are pretty rare.

Here's the current world-record prime tree for depth (22 generations):

2p+-569415 at 384187 [Depth: 22; Population: 29]:
[384187] - = [198959] + = [967333] - = [1365251] . - = [2161087] . - = [3752759] . + = [8074933] . + = [16719281] . - = [32869147] . + = [34007977] . - = [67446539] . . - = [134323663] . . . - = [268077911] . . . - = [535586407] . . . + = [1071742229] . . . - = [2142915043] . . . - = [4285260671] . . . - = [8569951927] . . . + = [17140473269] . . . + = [34281515953] . . . - = [68562462491] . . . + = [137125494397] . . . + = [274251558209] . . + = [135462493] . + = [68585369] . + = [137740153] + = [2504081] + = [5577577] - = [10585739]

And here's the enormous world-record prime tree for population (71 primes):

2p+-15 at 13 [Depth: 15; Population: 71]:
[13] - = [11] . - = [7] . . + = [29] . . - = [43] . . . - = [71] . . . . - = [127] . . . . . - = [239] . . . . . . - = [463] . . . . . . - = [911] . . . . . . + = [941] . . . . . . - = [1867] . . . . . . - = [3719] . . . . . + = [269] . . . . . - = [523] . . . . . - = [1031] . . . . . + = [1061] . . . . . + = [2137] . . . . . - = [4259] . . . . . + = [4289] . . . . . - = [8563] . . . . + = [157] . . . + = [101] . . + = [73] . . - = [131] . . + = [277] . . + = [569] . . - = [1123] . . + = [1153] . + = [37] . - = [59] . . - = [103] . . - = [191] . . - = [367] . . . - = [719] . . . - = [1423] . . . . + = [2861] . . . . + = [5737] . . . . + = [11489] . . . . - = [22963] . . . . + = [22993] . . . . - = [45971] . . . . + = [91957] . . . + = [1453] . . + = [397] . . + = [809] . + = [89] . - = [163] . . - = [311] . . - = [607] . . + = [1229] . . + = [2473] . . - = [4931] . + = [193] . + = [401] . - = [787] . - = [1559] + = [41] - = [67] . + = [149] . - = [283] . + = [313] . + = [641] . + = [1297] . - = [2579] . + = [2609] . + = [5233] + = [97] - = [179] + = [373] + = [761]

So there you have it. How a person consumes a massive amount of computer time for a useless endeavor and convinces over 20 other people to help.  But I guess as long as it is fun to do, it isn't a waste.  It wouldn't be possible to have gotten so far so fast without the IT department at Foliage Software Systems permitting me to run this experiment on their hardware.  I'd like to offer my thanks to them and everyone else who has helped me on this problem.

Wish me luck!


EDIT: Originally, the number of trees tested was erroneously reported at 242 trillion instead of 13 billion.  For reasons that will likely not be interesting, the number of trees tested to reach the N'th prime is the (N-1)'th triangular number.  I was mistakenly triangulating twice, i.e. if the (N-1)'th triangular number is T, I was computing the T'th triangular number.


11 thoughts on “Prime Trees — A Minor Obsession

  1. 2p+-7402395 at 6089581 [Depth: 24; Population: 27]:

    [6089581]
    - = [4776767]
      - = [2151139]
        + = [11704673]
          - = [16006951]
            - = [24611507]
            . + = [56625409]
            .   - = [105848423]
            .     - = [204294451]
            .       - = [401186507]
            .       . - = [794970619]
            .       .   - = [1582538843]
            .       .     - = [3157675291]
            .       .       + = [6322752977]
            .       .         - = [12638103559]
            .       .           + = [25283609513]
            .       .             + = [50574621421]
            .       .               - = [101141840447]
            .       .                 + = [202291083289]
            .       .                   + = [404589568973]
            .       .                     + = [809186540341]
            .       .                       + = [1618380483077]
            .       .                         - = [3236753563759]
            .       .                           + = [6473514529913]
            .       + = [415991297]
            + = [39416297]
              - = [71430199]
    

    Found by Jens Kruse Andersen with a C program using the GMP library.

  2. Congratulations Jens, I have confirmed the accuracy of the above tree. You have effectively ended a search I began over 2 years ago and upon which I have devoted many thousands of hours of processor time. Excellent work, hats off to you sir.
    I have to ask was your approach brute force or did you have a way to narrow down the problem space? How long have you been looking? (This is the part where you tell me it was a matter of hours or days and I feel very stupid for not having coded my app in C in the first place!)
    I long considered rewriting my search program in C, with a big-math library, but my knowledge of such libraries was limited and it appeared to me that there weren't any that were both (a) free and (b) within my range of understanding.

  3. The search only tested 2p +- k*(3*5*7*11*13*17) for odd k.
    2p was chosen to make numbers grow slowly with larger primality chance.
    +- k*(3*5*7*11*13*17) was chosen to ensure no number had a factor < =17.
    (If p>17 is prime then 2p has no small odd factor).
    The D24 tree was found after around 5 minutes on a 1.33 GHz Athlon.
    That was very lucky. After an hour there were no other above D21.
    If you want to search above D24 with the program then mail me.
    The GMP library was only used for probable prime testing. It is free.

  4. Interesting. So the offset would range from 1*255255 up to the last value such that k*255255 was less than 2p? That allows a staggeringly small number of steps. In 50 steps the offset reaches 25,270,245, which means that one can test all root primes from 127,637 up to 12,635,137 with 50 or fewer tests on each prime. That explains the extreme speed.
    My tests simply focused on testing every previous prime as a potential child of the root. So if the root is 7, I test the childrend 2,3, and 5, against whatever coefficients I want to test, computing the necessary offsets to reach the first child I am testing. You instead focused on the offset, producing a child that may or may not be prime, but testing a far smaller number of potential children. At the 100,000th prime (1299709) ShareCal performs 99,999 tests for each coefficient (2,3,4) for a total of 299,997 tests. By comparison, your software does 5 tests with offsets of 255555, 765765, 1276275, 1786785, 2297295.
    I had noticed that offsets which seemed to be the product of contiguous small primes tend to be the most productive, and did some testing in that area back in November of 2004 (http://www.unbecominglevity.com/blog/_archives/2004/11/7/176810.html). But I didn't pursue it very far because I was afraid of missing a tree that does not follow that pattern, such as 4p +/- 9,165,525 at 2,710,373 (D16, P20) (factorization of 9,165,525 is 3*5*5*122207).

    +- k*(3*5*7*11*13*17) was chosen to ensure no number had a factor < =17. (If p>17 is prime then 2p has no small odd factor).

    Well for any prime doubled, the factors are only 2 and that prime. Are you saying that by subtracting out an offset which includes all primes < = 17 you are guaranteed to produce another number which includes no prime factor less than 19? Interesting. Yes, of course that's true. if (N mod (AxBxC)) is nonzero then N - (AxBxC) will be nonzero. Further it is also true that N - (AxBxCxD) will be nonzero as long as N mod D is also nonzero, and since N is 2xp, then N mod D is nonzero as long as D is neither 2 or p. Excellent!
    Out of curiosity, why test using odd k instead of odd prime k, (i.e. k=3,5,7,11,13,17,19… instead of k=1,3,5,7,9,11,13,15,17,19…)? Wouldn't that produce the same results with fewer tests?

  5. Jens, here's something annoying: I have long noted that the biggest trees tend to be inverted (the first child is LESS than the root), and therefore ShareCal has only looked at inverted trees. Using your basic idea above I coded up a search in Maple 6, and did not restrict it to inverted trees. I used a smaller base offset (3*5*7*11) and found the following tree in about 10 minutes search time:

    2n+/-148995 at 113209 (Depth: 23, Population: 32)
    +--1-----2---3---4---5---6---7---8---9---10--11--12--13--14--15--16--17--18--19--20--21--22--23
    |  [113209]
    |    + = [375413]
    |        - = [601831]
    |            - = [1054667]
    |            .   + = [2258329]
    |            .       + = [4665653]
    |            + = [1352657]
    |                + = [2854309]
    |                    - = [5559623]
    |                        - = [10970251]
    |                            - = [21791507]
    |                            .   + = [43732009]
    |                            .       + = [87613013]
    |                            .           + = [175375021]
    |                            .               + = [350899037]
    |                            .                   - = [701649079]
    |                            .                       + = [1403447153]
    |                            .                           + = [2807043301]
    |                            .                               + = [5614235597]
    |                            .                                   - = [11228322199]
    |                            .                                   .   - = [22456495403]
    |                            .                                   .   .   - = [44912841811]
    |                            .                                   .   .       + = [89825832617]
    |                            .                                   .   .           + = [179651814229]
    |                            .                                   .   .               - = [359303479463]
    |                            .                                   .   .                   + = [718607107921]
    |                            .                                   .   + = [22456793393]
    |                            .                                   + = [11228620189]
    |                            + = [22089497]
    |                                + = [44327989]
    |                                    + = [88804973]
    |                                        - = [177460951]
    +--1-----2---3---4---5---6---7---8---9---10--11--12--13--14--15--16--17--18--19--20--21--22--23

    Note that it is not an inverted tree, and therefore ShareCal never found it, even though ShareCal is currently testing root primes in the vicinity of 4.6 million. So basically I spent 2+ years searching for a D23 tree, and it took a real mathematician only 5 minutes to find a D24 tree, and advice from a real mathematician for me to find a D23 tree in only 10 mintues.
    AAAARRRRGGGHHHH!

  6. Composite k should be included. Your D23 has k = 129 = 3*43.
    My program is experimental and messy. The version finding the D24
    was put together in 30 minutes and used some existing pieces.
    I later found a bug which meant the search was inexhaustive.
    After some optimizations the D24 now takes 15s after initialization.
    Your D23 takes 2s.
    I use a bitmap of primes below 10^9 instead of small primality tests.
    It will slow down a lot when the starting numbers become near 10^9.
    A large ram can raise this limit. More programming could also help.
    Another D24: 2p+-61005945 at 31319033.
    Some definitions include negative primes: -p is prime iff p is prime.
    That would give this D25: 2p+-195195 at 143699.

  7. What exactly is a “bitmap of primes”? You use on-off pixels in a 50,000 x 20,000 bitmap to signify primality? Interesting.

  8. I'm not talking about a bitmap image. Bitmaps can be used for different things.
    My bitmap of primes uses 1 bit per odd number.
    It is implemented as an array of 62500000 bytes kept in ram.
    Primality below 10^9 is determined by a simple lookup: 1=prime, 0=composite.
    I extended it to 2.5*10^9 when the computer was not used for other things.
    I now have 6 D24 with 2p +- k*(3*5*7*11*13*17):
    2p+-7402395 at 6089581 [Depth: 24; Population: 27]
    2p+-61005945 at 31319033 [Depth: 24; Population: 31]:
    2p+-111035925 at 110625679 [Depth: 24; Population: 35]
    2p+-179954775 at 198042617 [Depth: 24; Population: 31]
    2p+-39564525 at 207584491 [Depth: 24; Population: 34]
    2p+-463798335 at 260967659 [Depth: 24; Population: 33]
    The 3rd gives smaller primes in the first 6 steps.

  9. That's terrific Jens, excellent work. Brilliant!
    I would love to see your source code so that I could improve on my own algorithm, but what you've told me thus far has allowed me to accelerate it greatly. Thanks for sharing your findings and advice. I learned a lot.

  10. A switch to 2p +- k*(3*5*7*11*13) gave a D25.
    2p+-37102065 at 73176139 [Depth: 25; Population: 36]:

    [73176139]
    - = [109250213]
      - = [181398361]
        - = [325694657]
        . - = [614287249]
        . . - = [1191472433]
        . . . + = [2420046931]
        . . .   - = [4802991797]
        . . .   . + = [9643085659]
        . . .   + = [4877195927]
        . . .     - = [9717289789]
        . . .       + = [19471681643]
        . . .         + = [38980465351]
        . . .           - = [77923828637]
        . . .           . + = [155884759339]
        . . .           .   + = [311806620743]
        . . .           .     + = [623650343551]
        . . .           .       + = [1247337789167]
        . . .           .         - = [2494638476269]
        . . .           .           + = [4989314054603]
        . . .           .             - = [9978591007141]
        . . .           .             . - = [19957144912217]
        . . .           .             .   - = [39914252722369]
        . . .           .             .     + = [79828542546803]
        . . .           .             .       - = [159657047991541]
        . . .           .             .         - = [319314058881017]
        . . .           .             .           + = [638628154864099]
        . . .           .             + = [9978665211271]
        . . .           .               - = [19957293320477]
        . . .           + = [77998032767]
        . . .             + = [156033167599]
        . . + = [1265676563]
        . .   + = [2568455191]
        . + = [688491379]
        + = [399898787]
          - = [762695509]
    

    I will clean up my source code a little and mail it.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>