WP34S NQueens Benchmark Based on Patrice Torchet's BITMAP Solution

10232014, 05:20 PM
Post: #21




RE: WP34S NQueens Benchmark Based on Patrice Torchet's BITMAP Solution
8 Queens on steroids
Breaking news: a little teak dropped the runtime to 1.65s (runs 10 times in 165 ticks) which is about 4 times faster than the non bitmap program for the wp34s. Code: 148 [f] LBL D PS: there is still a few drops to squeeze by switching from LBL/GTO to BACK/SKIP. Patrice “Everything should be made as simple as possible, but no simpler.” Albert Einstein 

10232014, 10:11 PM
Post: #22




RE: WP34S NQueens Benchmark Based on Patrice Torchet's BITMAP Solution
(10222014 01:10 PM)Claudio L. Wrote: 30% seems a lot for anybody. When I watched your presentation, I understood that 30% was also including all your other optimizations. But each model will have a different impact. I'm with you, if the difference is 30%, it's not minor by any means. But on other models I don't think that it will be a big impact. If I remember correctly for the C/Regvars, the change in speed was less than 10%, and that included disabling all compiler optimizations, not just that indirection. Quoting myself for reference here. I did a few tweaks and created 50g RPL versions of the problem with the following conditions: 1) Original code 2) Stack code, using the values in the stack rather than a list 3) Original code with double indirection removed 4) Stack code, with double indirection removed 5) Bitmap code I'll download and post the code for all cases over the weekend. Right now let's go straight to the point. These results are on a 50g at normal speed: 1) Original code = 90 seconds 2) Stack code = 60.6 seconds 3) Original w/no indirection = 67.6 seconds 4) Stack code w/no indirection = 57.7 seconds 5) Bitmap code = 54.3 seconds A few observations: 3 vs. 1: When using (slow) lists, removing that indirection has a big impact. The advantage comes not much from the read indirection, but mainly from not writing the value back to a list (which forces creation of an entire list every time) until the end of the inner loop. 2 vs 1: As I mentioned before: lists are slow. Using the stack provides a boost. 2 vs 3: When you remove the list creation from the inner loop, the effect of the lists is not as bad, so the difference is smaller. 4 vs 2: With lists out of the picture, the penalty for the indirection within the stack is minor (as it should be). 5 vs 2, or 5 vs 4: The Saturn CPU isn't particularly good at bit manipulations, so this technique is not such a massive improvement over the normal solution. Still edges faster. Conclusions: a) The effect of the indirection is large when using lists (PUT much more than GET). b) The effect of the indirection removal is small when using the stack c) The bitmap code is still faster than the normal approach, but not by much in this model. Claudio 

10242014, 11:32 AM
Post: #23




RE: WP34S NQueens Benchmark Based on Patrice Torchet's BITMAP Solution
To Claudio: I can't wait to read about your work.
To Xerxes: After studying a few programs from your benchmark, I see that the programs fall into 3 categories: 1) programs that can find all the solutions AND quit nicely. 2) programs that can find all the solutions but fail after the last solution. 3) programs that can find only solutions with the first queen at position 8. My bitmap programs fall into category 2. The HP15C program fall into category 3 which means the programmer used the knowledge that the first queen is 8 and don't need to be moved to find the first solution. Obviously, a category 3 program will be simpler and faster. Is it in the spirit of the benchmark ? Patrice “Everything should be made as simple as possible, but no simpler.” Albert Einstein 

10242014, 01:00 PM
(This post was last modified: 10242014 05:33 PM by xerxes.)
Post: #24




RE: WP34S NQueens Benchmark Based on Patrice Torchet's BITMAP Solution
(10242014 11:32 AM)patrice Wrote: To Xerxes: After studying a few programs from your benchmark, I see that the programs fall into 3 categories: The original program was designed to find the first solution only. Later I was forced to make a structured version, due to the lack of GOTO in some languages, but it's the same way to the solution. The fact, that the unstructured version is also able to find all solutions, is a side effect and was not intended. I've tried to comprehend your observation about category 3 using the FX603P, but I'm not able to see what you mean exactly. The algorithm starts with putting the first queen at position 8, without the knowledge that it's not necessary to move it later for the first solution only. Even if I put the queen at position 8 and start calculating at the second queen, it doesn't change the execution time significantly. I've also tried to start the calculation with the first queen in other positions without failing, but of course different execution times. Calculator Benchmark 

10262014, 01:55 AM
Post: #25




RE: WP34S NQueens Benchmark Based on Patrice Torchet's BITMAP Solution
(10232014 10:11 PM)Claudio L. Wrote: These results are on a 50g at normal speed: Add to that: 6) New instack recursive algorithm = < 30 seconds Still working on cleaning up the sources for publication. I messed up and inadvertently overwrote a subroutine of my bitmap solution (was trying to improve it and STOred in the wrong name), so I have to see if I can write it back from scratch now. But then I thought of starting an algorithm from scratch that uses the stack. The new algorithm uses the fact that queens can't share the same column, so it removes it from the possible locations of the next queens, being significantly faster than the bitmap solution. The idea is: for the first queen, you have 8 possible locations (1 thru 8), which are numbers left in the stack. The first queen takes a number, leaving the other 7 positions in the stack, and recursively calls for the next queen to do the same. Of course, each queen takes a number from the list and verifies that doesn't conflict with any diagonals of previous queens (only the diagonals need to be checked since the column is guaranteed not to be occupied by another queen). If there's no conflict with the diagonals, recurses into the next queen, passing the remainder positions in the stack. If there is a conflict, it returns that position to the list (to the top) and pulls the next one. For example, the second queen receives 1 2 3 4 5 6 7 (after the first queen takes the 8). Pulls a 7, checks that it's in the diagonal of the previous queen (in the 8), so it returns the 7 to the top of the list, leaving now 7 1 2 3 4 5 6, and now pulls the 6 and does the same. This list of available positions keeps "rotating" in the stack. If none of the N positions is good, then it returns failure to the previous queen. The previous queen will then rotate her own list and proceed to the next position. For this to work, each queen needs to make her own duplicate of the list, which is kept also in the stack. So when you reach the last queen, the stack contains: 8 positions (1st queen) + 7 positions (2nd) + 6 positions (3rd) + 5 + 4 + 3 + 2 + 1 =36 numbers in the stack, plus the 8 selected positions of the queens = 44 stack levels needed (actually, 45 if you need to measure the speed, I leave the TICKS there too). As is evident, the algorithm makes good use (and requires) the infinite stack from the 48/49/50 series (even the 28 should work). Why so fast? Because even in the bitmap solution, each queen scans through the 8 positions, checking for a conflict in the bitmap. In this case, the last queen only has one possible position (all others were taken from the list) for which only has to check the diagonals. The second last only checks 2 possible positions, etc. This saves significant time. I have to clean it up as well, but this one clocked at 29.8 seconds, a third of the original algorithm on the list, and half the time vs. the same algorithm using the stack. Of course, this is not the same algorithm, so it's not directly comparable and cannot go in the list (although it follows the same solution path and explores the same nodes, producing the exact same solution). It's just to see how far we can go with this. Claudio 

10262014, 06:18 AM
Post: #26




RE: WP34S NQueens Benchmark Based on Patrice Torchet's BITMAP Solution
In this book a recursive backtracking algorithm is presented:
N. Wirth, Algorithms and Data Structures (1985 edition, updated for Oberon in August 2004) 3.5 The Eight Queens Problem From the preface: Quote:Recursion is shown to be a generalization of repetition (iteration), and as such it is an important and powerful concept in programming. In many programming tutorials, it is unfortunately exemplified by cases in which simple iteration would suffice. Instead, Chap. 3 concentrates on several examples of problems in which recursion allows for a most natural formulation of a solution, whereas use of iteration would lead to obscure and cumbersome programs. Kind regards Thomas 

« Next Oldest  Next Newest »

User(s) browsing this thread: 1 Guest(s)