Stefan Pohl Computer Chess

private website for chess engine tests


 

Here you find experimental testruns that are not part of my regular testwork (Stockfish development versions, Bullet ratinglist and the Endless RoundRobin tournament).

 

2016/12/09: Some weeks ago, I created my SALC opening book for engine-engine matches. In all lines (created from 10000 human games, all lines 20 plies deep, all lines checked with Komodo 10.2 (20'' per position, running on 3 cores), evaluation inside [-0.6,+0.6]), White and Black have castled to opposite sides and both queens are still on the board. The idea is to get more attacks on the kings and a lower draw rate, because the draw rate in computerchess keeps increasing as the engines get stronger and the hardware gets faster. For my Stockfish bullet testruns, I have been using 500 SALC positions since 2014, which lowered the draw rate a lot.
Now I have created another opening book, called OLIK (Open Line In front of King). In all lines (created from 10000 human games, all lines 24 plies deep, all lines checked with Komodo 10.2 (20'' per position, running on 3 cores), evaluation inside [-0.6,+0.6] and outside [-0.15,+0.15]), at least one king has no pawn of its own color in front of it, and both queens are still on the board. Example: if White played 0-0, then at least one of the f-, g- and h-files contains no white pawn. The idea is the same as for the SALC book: more attacks on the kings and a lower draw rate...
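To illustrate the OLIK rule, here is a minimal Python sketch that tests the condition on a raw FEN string for a white king castled short (my own illustration with an assumed file/column mapping; it is not the tool the book was built with):

```python
def expand_rank(rank):
    """Expand a FEN rank like 'PPPP1PBP' into a list of 8 square characters."""
    squares = []
    for ch in rank:
        if ch.isdigit():
            squares.extend('.' * int(ch))
        else:
            squares.append(ch)
    return squares

def olik_check_white_kingside(fen):
    """Rough OLIK test for a position where White has castled short:
    both queens must be on the board, and at least one of the f/g/h
    files (columns 5-7, with column 0 = a-file) must hold no white pawn."""
    ranks = [expand_rank(r) for r in fen.split()[0].split('/')]
    pieces = ''.join(''.join(r) for r in ranks)
    if 'Q' not in pieces or 'q' not in pieces:
        return False  # a queen has been exchanged
    return any(all(rank[col] != 'P' for rank in ranks) for col in (5, 6, 7))

# Example: White castled short and has no pawn on the g-file
fen = "r1bq1rk1/pppp1ppp/2n2n2/2b1p3/4P3/2N2N2/PPPP1PBP/R1BQ1RK1 w - - 0 1"
print(olik_check_white_kingside(fen))  # True
```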
For more information about the two books, how to use them and how they were created, check out the readme files and the booksetting.jpg pictures in the book folders of the download package. The download also contains a third book (SALC+OLIK), which merges the SALC and the OLIK book into one bigger book. For all three books (created for the FritzGUI and the ShredderGUI), there is a pgn-file with the games the book was built from, and an epd-file with the positions at the end of the opening lines.

To verify how much the draw rate is lowered by these new books / opening-position sets, I did three testruns of 3000 games each (= 9000 games): Stockfish 8 in selfplay, 70''+700ms thinking time, singlecore, LittleBlitzerGUI (using the 10000-position epd-files in RoundRobin mode, in which one epd-position is chosen at random for each game).


Test 1: 34700 standard 8-move opening epd. Draw rate: 83.0%
Test 2: 10000 OLIK epd. Draw rate: 71.9%
Test 3: 10000 SALC V2 epd. Draw rate: 68.2%

 

I think these results are really impressive...


2016/03/12: Testrun of 3 new Stockfish clones. Stockfish played 1000 games against each of them (LittleBlitzerGUI, singlecore, 70''+700ms, 128 MB hash, no ponder, no bases, no large pages, 500 SALC openings). None of the clones is stronger (no surprise), so don't waste your time with these "engines". The new popcount versions of DON do not run on my system (and the LittleBlitzerGUI), so I could not test DON.

 

     Program                   Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 160302 x64    : 3300    7    7  3000    51.9 %   3287   65.0 %
   2 Venom 3 x64             : 3293   12   12  1000    48.8 %   3300   66.6 %
   3 Anoa 1.1 x64            : 3285   12   12  1000    47.9 %   3300   63.6 %
   4 Sanjib 3 x64            : 3283   13   13  1000    47.8 %   3300   64.7 %

 

 


 

2015/06/05: Testrun of Stockfish 150510 repeated with Contempt=+50 and with Contempt=+15, 7000 games each. The result with Contempt=+15 was 5 Elo weaker and the result with Contempt=+50 was 10 Elo weaker, but all 7 opponents are very strong engines; against weaker opponents, the score should get a little better.

The overall draw rate was lowered from 38.1% (default) to 37.3% (Contempt=+15) and to 36.4% (Contempt=+50), which is a little disappointing. But keep in mind that I use the SALC opening positions. These opening positions already lower the draw rate a lot compared to "normal" opening positions, so contempt can hardly lower the draw rate any further.

But the number of draws up to move 50 (10-move opening position + 40 played moves) was lowered from 457 to 212 (C=+15) and to 68 (C=+50), and the number of 3fold draws was lowered from 1356 to 597 (C=+15) and to 266 (C=+50) (!!!)
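Counting such short draws from a PGN file is straightforward; here is a minimal Python sketch (assuming standard PGN tag lines and "n." move numbering; this is not the script used for these stats, and a dedicated parser would be more robust):

```python
import re

def count_short_draws(pgn_text, max_moves=50):
    """Count drawn games that finish at or before max_moves full moves.
    Minimal line-based PGN scan: tag lines start with '[', everything
    else is treated as movetext, and the highest 'n.' number seen is
    taken as the game's last full move."""
    count, is_draw, last_move = 0, False, 0
    for line in pgn_text.splitlines():
        line = line.strip()
        if line.startswith('[Result'):
            # a new game's Result tag: close out the previous game first
            if is_draw and 0 < last_move <= max_moves:
                count += 1
            is_draw = '1/2-1/2' in line
            last_move = 0
        elif line and not line.startswith('['):
            nums = re.findall(r'(\d+)\.', line)
            if nums:
                last_move = max(last_move, max(int(n) for n in nums))
    if is_draw and 0 < last_move <= max_moves:  # last game in the file
        count += 1
    return count
```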

And the average game duration rose from 199 seconds to 206 and 212 seconds.

 

 

Games Completed = 7000 of 7000 (Avg game length = 199.562 sec)
Settings = Gauntlet/128MB/70000ms+700ms
Time = 467274 sec elapsed, 0 sec remaining
 1.  Stockfish 150510 x64 4787.0/7000 3455-881-2664 (D: r=1356 i=428 f=130 s=36 a=714)
 2.  Komodo 9 x64         453.0/1000   223-317-460  (D: r=196 i=99 f=23 s=8 a=134)
 3.  Houdini 4 x64        371.0/1000   182-440-378  (D: r=123 i=96 f=22 s=9 a=128)
 4.  Gull 3 x64           307.5/1000   103-488-409  (D: r=214 i=62 f=20 s=6 a=107)
 5.  Fire 4 x64           281.0/1000   93-531-376   (D: r=210 i=48 f=9 s=6 a=103)
 6.  Equinox 3.3 x64      261.0/1000   85-563-352   (D: r=212 i=40 f=19 s=1 a=80)
 7.  Mars 3.35 x64        267.0/1000   91-557-352   (D: r=212 i=35 f=15 s=0 a=90)
 8.  Critter 1.6a x64     272.5/1000   104-559-337  (D: r=189 i=48 f=22 s=6 a=72)

 

Draw games not longer than 40 moves (+10 moves opening-position (=50 moves)): 457

 

 


Games Completed = 7000 of 7000 (Avg game length = 206.670 sec)
Settings = Gauntlet/128MB/70000ms+700ms
Time = 483831 sec elapsed, 0 sec remaining
 1.  Stockfish 150510 C=15 4746.5/7000 3441-948-2611 (D: r=597 i=810 f=163 s=12 a=1029)
 2.  Komodo 9 x64          465.5/1000    233-302-465  (D: r=91 i=194 f=24 s=5 a=151)
 3.  Houdini 4 x64         403.0/1000    216-410-374  (D: r=56 i=122 f=25 s=1 a=170)
 4.  Gull 3 x64            307.5/1000    110-495-395  (D: r=94 i=107 f=30 s=2 a=162)
 5.  Fire 4 x64            292.0/1000    101-517-382  (D: r=83 i=109 f=21 s=1 a=168)
 6.  Equinox 3.3 x64       252.5/1000    87-582-331   (D: r=99 i=90 f=18 s=0 a=124)
 7.  Mars 3.35 x64         270.0/1000    98-558-344   (D: r=96 i=84 f=26 s=2 a=136)
 8.  Critter 1.6a x64      263.0/1000    103-577-320  (D: r=78 i=104 f=19 s=1 a=118)

 

Draw games not longer than 40 moves (+10 moves opening-position (=50 moves)): 212

 

 


Games Completed = 7000 of 7000 (Avg game length = 212.451 sec)
Settings = Gauntlet/128MB/70000ms+700ms
Time = 497331 sec elapsed, 0 sec remaining
 1.  Stockfish 150510 C=50 4704.5/7000 3432-1023-2545 (D: r=266 i=885 f=152 s=12 a=1230)
 2.  Komodo 9 x64          470.0/1000    264-324-412   (D: r=27 i=176 f=31 s=3 a=175)
 3.  Houdini 4 x64         375.5/1000    203-452-345   (D: r=22 i=118 f=20 s=0 a=185)
 4.  Gull 3 x64            301.0/1000    119-517-364   (D: r=40 i=112 f=19 s=1 a=192)
 5.  Fire 4 x64            301.5/1000    104-501-395   (D: r=50 i=139 f=22 s=3 a=181)
 6.  Equinox 3.3 x64       299.0/1000    116-518-366   (D: r=43 i=124 f=20 s=0 a=179)
 7.  Mars 3.35 x64         260.5/1000    94-573-333    (D: r=42 i=108 f=19 s=3 a=161)
 8.  Critter 1.6a x64      288.0/1000    123-547-330   (D: r=42 i=108 f=21 s=2 a=157)

 

Draw games not longer than 40 moves (+10 moves opening-position (=50 moves)): 68


Draws (D:) (r=3fold draw, i=insufficient material, f=fifty-move rule, s=stalemate,
a=adjusted by GUI (120 moves played))

 


 

2015/03/09: Testrun of Stockfish 6 with 5 different contempt settings (0 (=default), +15, +25, +35, +50) against Komodo 8 (1000 games each, 70''+700ms, singlecore, my 500 old LS-ratinglist opening positions, because they are more drawish than the new SALC positions). Let's see if the contempt setting reduces the draw rate and/or the number of 3fold draws.

As you can see, the 3fold draws and early draws (up to move 40/60) are lowered a lot by a higher contempt. And the overall draw rate is lowered, too (but not as much as I expected).

The overall score of Stockfish against Komodo was not measurably affected by the contempt. All overall results lie within a +/-8 Elo interval, which is clearly inside the error bar.
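For reference, the usual score-to-Elo conversion and a rough error bar can be sketched as follows (a simplified model that treats each game score as an independent sample; dedicated tools such as BayesElo or Ordo model wins, draws and losses properly):

```python
import math

def score_to_elo(score):
    """Convert an average score (0 < score < 1) into an Elo difference
    using the standard logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_error_bar(wins, draws, losses, z=1.96):
    """Rough 95% error bar in Elo, treating each game score (1, 0.5, 0)
    as an i.i.d. sample -- a simplification, but good enough to see
    whether two results are distinguishable."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    se = math.sqrt(var / n)
    return score_to_elo(mean + z * se) - score_to_elo(mean)

# Stockfish 6 C=0 vs. Komodo 8: 570/1000 points (299 wins, 542 draws, 159 losses)
print(round(score_to_elo(0.570)))  # 49 (Elo advantage)
```

With a draw rate around 50%, the error bar for 1000 games comes out near +/-15 Elo, so score swings of a few Elo between the contempt settings are indeed noise.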

A really interesting experiment... The conclusion is that Contempt=+50 (which seems really "radical") is a good choice for tournaments and match play. Contempt=+15 is not bad either, and a quite "normal" setting.

 

 

 1.  Komodo 8 x64        2129.5/5000    862-1603-2535
 2.  Stockfish 6 C=0     570.0/1000    299-159-542 (=54.2% draws)
 3.  Stockfish 6 C=+15   576.0/1000    319-167-514 (=51.4% draws)
 4.  Stockfish 6 C=+25   580.5/1000    328-167-505 (=50.5% draws)
 5.  Stockfish 6 C=+35   565.0/1000    311-181-508 (=50.8% draws)  
 6.  Stockfish 6 C=+50   579.0/1000    346-188-466 (=46.6% draws)  

 

Draw-stats:       
 2.  Sf 6 C=0     Draws: 3fold=274 +(i=176 f=77 s=15 a=0)[early draw m40=29,m60=109]
 3.  Sf 6 C=+15   Draws: 3fold=106 +(i=328 f=71 s=9 a=0) [early draw m40=12,m60=52]
 4.  Sf 6 C=+25   Draws: 3fold=77  +(i=338 f=83 s=7 a=0) [early draw m40=3, m60=33]
 5.  Sf 6 C=+35   Draws: 3fold=69  +(i=344 f=82 s=11 a=2)[early draw m40=2, m60=35]
 6.  Sf 6 C=+50   Draws: 3fold=51  +(i=333 f=75 s=7 a=0) [early draw m40=1, m60=24]

 

(i=insufficient material, f=fifty-move rule, s=stalemate, a=adjusted by GUI (>300 moves))
m40 = draw games not longer than 40 moves (including 8 moves of the opening-pgn)
m60 = draw games not longer than 60 moves (including 8 moves of the opening-pgn)

 


2015/02/20: A little "Clone Wars" testrun of Stockfish 6 against 5 of its clone engines (70''+700ms, singlecore, SALC openings, 1000-game gauntlet). As you can see, none of the 5 clones is measurably stronger (all results within a +/-1% score interval and clearly inside the error bar).

 

 

     Program                 Elo    +    -   Games   Score   Av.Op.  Draws

   1 Pepper 150213 x64s    : 3251   13   13  1000    51.0 %   3243   60.9 %
   2 Sugar 5 x64s          : 3248   13   13  1000    50.7 %   3243   57.0 %
   3 Orka 150213 x64s      : 3248   14   14  1000    50.7 %   3243   59.2 %
   4 Salt 5 x64s           : 3247   13   13  1000    50.5 %   3243   60.1 %
   5 Stockfish 6 150128    : 3243    6    6  5000    49.4 %   3247   59.7 %
   6 Shark 150209 x64s     : 3241   13   13  1000    50.0 %   3243   61.3 %