Stefan Pohl Computer Chess

Private website for chess engine tests


 

Here you will find experimental test runs that are not part of my regular testing work.

 

2017/05/19: At the end of 2016, I released version 2.0 of my SALC opening book. The idea was to create a book that lowers the draw rate in computer chess, because the draw rate keeps increasing as the engines get stronger and the hardware gets faster. In online engine tournaments and in the TCEC tournament, the draw rates are already around 85%, so the “draw death” of computer chess is coming closer and closer. As you can see below (experimental test runs of 2016/12/09), my SALC V2.0 book lowered the draw rate a lot in a Stockfish 8 self-play test run, from 83% with a classic opening book/position set to 68.2% (!). But in the last months, some people criticized that the openings in the SALC book just give a huge advantage to one color, which lowers the number of draws. It is clear that this way of creating a book would work: if all lines of a book gave one color an advantage of +9, the draw rate would (of course...) be 0%. But on the other hand, the scores in an engine tournament using such a book would be 50% for all engines, because the advantages of the opening lines would be randomly distributed over the games, if the number of played games is high enough.
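That 50%-score argument can be illustrated with a toy simulation (a minimal sketch with made-up numbers, not data from any real test run): if every line were completely decisive and the advantaged color were distributed at random over the games, each engine would still converge to a 50% score.

```python
import random

# Toy check: engine A scores a point exactly when it happens to sit on
# the advantaged side of a completely decisive opening line, and that
# side is assigned at random for each game.
def simulated_score(games, rng):
    points = sum(1 for _ in range(games) if rng.random() < 0.5)
    return points / games

score = simulated_score(100_000, random.Random(42))   # close to 0.5
```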
But this was not the idea of the SALC book. The idea was that in all book lines, White and Black castle to opposite sides with both queens still on the board, which should lead to more attacks on the king and to more tactical, more thrilling computer chess. All book lines were checked with Komodo 10.2 (20'' per position, running on 3 cores), with evaluations inside [-0.6,+0.6]. So there are no lines with a huge advantage for White or Black in the SALC book.
If the critics were right that the SALC book lines lead to too big an advantage for one color, then using the SALC book should bring the engine scores in a tournament closer to 50%, compared to a classical opening book.
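The eval-window check described above can be sketched in a few lines (the real filtering was done with Komodo 10.2 at 20'' per position; the line names and evaluations here are invented for illustration):

```python
# Keep only candidate lines whose engine evaluation (in pawns)
# stays inside the [-0.6, +0.6] window.
def inside_window(eval_pawns, lo=-0.6, hi=0.6):
    return lo <= eval_pawns <= hi

# Hypothetical (line, evaluation) pairs, for illustration only:
candidates = [("line A", +0.25), ("line B", -1.40), ("line C", +0.58)]
kept = [name for name, ev in candidates if inside_window(ev)]
# kept -> ["line A", "line C"]
```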
To verify that this does NOT happen, I did 3 test runs with 3 different opening sets:
1) SALC V2
2) Frank Quisinsky's FEOBOS 3.0 book (beta), a new, very well engine-analyzed and balanced opening book (more information on his website www.amateurschach.de). You can download that interesting book from my website (Download & Links section), with kind permission of Frank Quisinsky.
3) the 8-move openings collection that is used in the Stockfish framework.

asmFish played 1000 games against Komodo 10.4 with each of the 3 books/opening sets (= 3000 games). Not bullet speed, but 5'+3'' (!), single core, 256 MB hash, no pondering, both engines with Contempt=+15, in the LittleBlitzerGUI (RoundRobin play mode, in which one opening position is chosen at random from an EPD openings file for each game). It took around 12 days to complete these three long test runs.
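The RoundRobin opening selection can be sketched in miniature (an illustrative sketch, not LittleBlitzer's actual code): one position is drawn at random, with replacement, from the EPD set for each game, so positions can repeat.

```python
import random
from collections import Counter

# Draw one opening at random from the EPD set for every game played.
def draw_openings(positions, games, rng):
    return [rng.choice(positions) for _ in range(games)]

rng = random.Random(0)
positions = ["pos%02d" % i for i in range(10)]   # stand-in for EPD lines
picks = draw_openings(positions, 30, rng)
usage = Counter(picks)   # some positions repeat, some may stay unused
```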

Games Completed = 1000 of 1000 (Avg game length = 944.640 sec)

Settings = RR/256MB/300000ms+3000ms/M 450cp for 4 moves, D 120 moves/EPD:C:\LittleBlitzer\SALC_V2_10moves.epd(10000)

Time = 945199 sec elapsed, 0 sec remaining

1. asmFish 170426 x64 620.5/1000 351-110-draws: 539 (L: m=0 t=0 i=0 a=110) (D: r=149 i=231 f=38 s=0 a=121) (tpm=6659.0 d=30.93 nps=2552099)

2. Komodo 10.4 x64 379.5/1000 110-351-539 (L: m=0 t=0 i=0 a=351) (D: r=149 i=231 f=38 s=0 a=121) (tpm=6920.9 d=26.71 nps=1619591)

 

Games Completed = 1000 of 1000 (Avg game length = 1049.395 sec)

Settings = RR/256MB/300000ms+3000ms/M 450cp for 4 moves, D 120 moves/EPD:C:\LittleBlitzer2\FEOBOS_v03+.epd(24085)

Time = 1039157 sec elapsed, 0 sec remaining

1. asmFish 170426 x64 601.5/1000 293-90-draws: 617 (L: m=0 t=0 i=0 a=90) (D: r=132 i=221 f=38 s=1 a=225) (tpm=6315.9 d=30.83 nps=2477078)

2. Komodo 10.4 x64 398.5/1000 90-293-617 (L: m=0 t=0 i=0 a=293) (D: r=132 i=221 f=38 s=1 a=225) (tpm=6424.5 d=26.49 nps=1583220)

 

Games Completed = 1000 of 1000 (Avg game length = 1036.164 sec)

Settings = RR/256MB/300000ms+3000ms/M 450cp for 4 moves, D 120 moves/EPD:C:\LittleBlitzer3\34700_ok.epd(32000)

Time = 1036719 sec elapsed, 0 sec remaining

1. asmFish 170426 x64 603.0/1000 286-80-draws: 634 (L: m=0 t=0 i=0 a=80) (D: r=148 i=232 f=39 s=1 a=214) (tpm=6334.2 d=31.54 nps=2570164)

2. Komodo 10.4 x64 397.0/1000 80-286-634 (L: m=0 t=2 i=0 a=284) (D: r=148 i=232 f=39 s=1 a=214) (tpm=6473.6 d=27.00 nps=1614400)


Conclusions:
1) The SALC book lowers the draw rate a lot (53.9%), compared to the FEOBOS book (61.7%) and the Stockfish framework opening set (63.4%), although the engines played with Contempt=+15.
2) With the SALC book, the engine scores do not get closer to 50%. The Elo differences do not get smaller (in fact, they get bigger!), which proves that, compared to both other books, the SALC book does not contain many lines that lead to a clear advantage (and easy wins) for White or Black.
3) The SALC book lowers the average game duration by around 10% compared to the other books. That means around 10% more games can be played in the same time, which leads to statistically more valuable results (for example, the test run using SALC ended more than one day before the FEOBOS and the Stockfish-openings test runs).
4) Although there is no doubt that the FEOBOS book is very well balanced and analyzed, and this beta version contains only lines with both queens on the board, its draw rate is only a little lower than that of the Stockfish framework opening set. The number of 3fold draws is a little lower with FEOBOS (compared to both other books), but 17 fewer 3fold draws out of 1000 games is not much (1.7%).
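Conclusions 1) and 2) can be recomputed from the raw results above. The Elo conversion uses the standard logistic model, which may differ slightly from the exact formula a given rating tool applies.

```python
import math

# Draw rate (%) and Elo difference of asmFish, per 1000-game run.
def summarize(wins, losses, draws):
    games = wins + losses + draws
    score = (wins + draws / 2) / games
    elo = 400 * math.log10(score / (1 - score))   # logistic Elo model
    return round(100 * draws / games, 1), round(elo, 1)

salc   = summarize(351, 110, 539)   # (53.9, 85.4)
feobos = summarize(293,  90, 617)   # (61.7, 71.5)
sf_set = summarize(286,  80, 634)   # (63.4, 72.6)
```

The SALC run shows both the lowest draw rate and the biggest Elo distance between the two engines.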

All three books/opening sets were created solely for playing engine tournaments/competitions.
But only the SALC V2.0 book brings clearly measurable benefits: a clearly lower draw rate, around 10% shorter games (10% more games in the same time) and the biggest Elo differences/distances in the engine results/scores. So the SALC book helps to avoid the “draw death” of computer chess in the near future, and engine tournament results using SALC are statistically more valuable than those using other opening books/sets, because the “resolution” of the Elo results is higher and more games can be played in the same time. Feel free to download the SALC V2 book/openings set and make your own tests. If the number of games is high enough (300+), I have no doubt that the results will confirm my findings, and you will see how thrilling watching modern computer chess can still be.


2016/12/09: Some weeks ago, I created my SALC opening book for engine-engine matches. In all lines (created from 10000 human games, all lines 20 plies deep, all checked with Komodo 10.2 (20'' per position, running on 3 cores), evaluations inside [-0.6,+0.6]), White and Black castled to opposite sides, with both queens still on the board. The idea is to get more attacks on the king and a lower draw rate, because the draw rate in computer chess keeps increasing as the engines get stronger and the hardware gets faster. For my Stockfish bullet test runs, I have used 500 SALC positions since 2014, which lowered the draw rate a lot.

To verify how much the draw rate is lowered by this new book/opening-positions set, I did two test runs: 3000 games each (= 6000 games), Stockfish 8 in self-play, 70''+700ms thinking time, single core, LittleBlitzerGUI (using the 10000-position EPD files, playing in RoundRobin mode, in which one EPD position is chosen at random for each game).


Test 1: 34700 standard 8-move opening epd. Draw rate: 83.0%
Test 2: 10000 SALC V2 epd. Draw rate: 68.2%

 

I think the result is really impressive...


2016/03/12: Test run of 3 new Stockfish clones. Stockfish played 1000 games against each of them (LittleBlitzerGUI, single core, 70''+700ms, 128 MB hash, no ponder, no bases, no large pages, 500 SALC openings). None of the clones is stronger (no surprise), so don't waste your time with these "engines". The new popcount versions of DON do not run on my system (and the LittleBlitzerGUI), so I could not test DON.

 

     Program                   Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 160302 x64    : 3300    7    7  3000    51.9 %   3287   65.0 %
   2 Venom 3 x64             : 3293   12   12  1000    48.8 %   3300   66.6 %
   3 Anoa 1.1 x64            : 3285   12   12  1000    47.9 %   3300   63.6 %
   4 Sanjib 3 x64            : 3283   13   13  1000    47.8 %   3300   64.7 %

 

 


 

2015/06/05: Test run of Stockfish 150510 repeated with Contempt=+50 and with Contempt=+15, 7000 games each. The result with Contempt=+15 was 5 Elo weaker and the result with Contempt=+50 was 10 Elo weaker, but all 7 opponents are very strong engines - against weaker opponents, the scores should get a little better.

The overall draw rate was lowered from 38.1% (default) to 37.3% (Contempt=+15) and to 36.4% (Contempt=+50), which is a little disappointing. But keep in mind that I use the SALC opening positions. These opening positions already lower the draw rate a lot compared to "normal" opening positions, so contempt can hardly lower the draw rate any further.

But the number of draws up to move 50 (10-move opening position + 40 played moves) was lowered from 457 to 212 (C=+15) and to 68 (C=+50), and the number of 3fold draws was lowered from 1356 to 597 (C=+15) and to 266 (C=+50) (!!!)

And the average game duration rose from 199 seconds to 206 and 212 seconds.
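Expressed as relative reductions (a quick recomputation of the counts above from the three 7000-game runs):

```python
# Percentage drop from the default-contempt run to each contempt run.
def pct_drop(before, after):
    return round(100 * (before - after) / before, 1)

early_c15 = pct_drop(457, 212)    # 53.6  (% fewer early draws, C=+15)
early_c50 = pct_drop(457, 68)     # 85.1  (% fewer early draws, C=+50)
tf_c15    = pct_drop(1356, 597)   # 56.0  (% fewer 3fold draws, C=+15)
tf_c50    = pct_drop(1356, 266)   # 80.4  (% fewer 3fold draws, C=+50)
```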

 

 

Games Completed = 7000 of 7000 (Avg game length = 199.562 sec)
Settings = Gauntlet/128MB/70000ms+700ms
Time = 467274 sec elapsed, 0 sec remaining
 1.  Stockfish 150510 x64 4787.0/7000 3455-881-2664 (D: r=1356 i=428 f=130 s=36 a=714)
 2.  Komodo 9 x64         453.0/1000   223-317-460  (D: r=196 i=99 f=23 s=8 a=134)
 3.  Houdini 4 x64        371.0/1000   182-440-378  (D: r=123 i=96 f=22 s=9 a=128)
 4.  Gull 3 x64           307.5/1000   103-488-409  (D: r=214 i=62 f=20 s=6 a=107)
 5.  Fire 4 x64           281.0/1000   93-531-376   (D: r=210 i=48 f=9 s=6 a=103)
 6.  Equinox 3.3 x64      261.0/1000   85-563-352   (D: r=212 i=40 f=19 s=1 a=80)
 7.  Mars 3.35 x64        267.0/1000   91-557-352   (D: r=212 i=35 f=15 s=0 a=90)
 8.  Critter 1.6a x64     272.5/1000   104-559-337  (D: r=189 i=48 f=22 s=6 a=72)

 

Draw games not longer than 40 moves (+10 moves opening-position (=50 moves)): 457

 

 


Games Completed = 7000 of 7000 (Avg game length = 206.670 sec)
Settings = Gauntlet/128MB/70000ms+700ms
Time = 483831 sec elapsed, 0 sec remaining
 1.  Stockfish 150510 C=15 4746.5/7000 3441-948-2611 (D: r=597 i=810 f=163 s=12 a=1029)
 2.  Komodo 9 x64          465.5/1000    233-302-465  (D: r=91 i=194 f=24 s=5 a=151)
 3.  Houdini 4 x64         403.0/1000    216-410-374  (D: r=56 i=122 f=25 s=1 a=170)
 4.  Gull 3 x64            307.5/1000    110-495-395  (D: r=94 i=107 f=30 s=2 a=162)
 5.  Fire 4 x64            292.0/1000    101-517-382  (D: r=83 i=109 f=21 s=1 a=168)
 6.  Equinox 3.3 x64       252.5/1000    87-582-331   (D: r=99 i=90 f=18 s=0 a=124)
 7.  Mars 3.35 x64         270.0/1000    98-558-344   (D: r=96 i=84 f=26 s=2 a=136)
 8.  Critter 1.6a x64      263.0/1000    103-577-320  (D: r=78 i=104 f=19 s=1 a=118)

 

Draw games not longer than 40 moves (+10 moves opening-position (=50 moves)): 212

 

 


Games Completed = 7000 of 7000 (Avg game length = 212.451 sec)
Settings = Gauntlet/128MB/70000ms+700ms
Time = 497331 sec elapsed, 0 sec remaining
 1.  Stockfish 150510 C=50 4704.5/7000 3432-1023-2545 (D: r=266 i=885 f=152 s=12 a=1230)
 2.  Komodo 9 x64          470.0/1000    264-324-412   (D: r=27 i=176 f=31 s=3 a=175)
 3.  Houdini 4 x64         375.5/1000    203-452-345   (D: r=22 i=118 f=20 s=0 a=185)
 4.  Gull 3 x64            301.0/1000    119-517-364   (D: r=40 i=112 f=19 s=1 a=192)
 5.  Fire 4 x64            301.5/1000    104-501-395   (D: r=50 i=139 f=22 s=3 a=181)
 6.  Equinox 3.3 x64       299.0/1000    116-518-366   (D: r=43 i=124 f=20 s=0 a=179)
 7.  Mars 3.35 x64         260.5/1000    94-573-333    (D: r=42 i=108 f=19 s=3 a=161)
 8.  Critter 1.6a x64      288.0/1000    123-547-330   (D: r=42 i=108 f=21 s=2 a=157)

 

Draw games not longer than 40 moves (+10 moves opening-position (=50 moves)): 68


Draws (D:) (r=3fold draw, i=insufficient material, f=fifty-move rule, s=stalemate,
a=adjudicated by GUI (120 moves played))

 


 

2015/03/09: Test run of Stockfish 6 with 5 different contempt settings (0 (= default), +15, +25, +35, +50) against Komodo 8 (1000 games each, 70''+700ms, single core, my 500 old LS-ratinglist opening positions, because they are more drawish than the new SALC positions). Let's see if the contempt setting reduces the draw rate and/or the 3fold draws.

As you can see, the 3fold draws and early draws (up to move 40/60) are reduced a lot by a higher contempt. And the overall draw rate is lowered, too (but not as much as I expected).

The overall score of Stockfish against Komodo was not measurably affected by the contempt. All overall results are inside a +/-8 Elo interval, which is clearly within the error bar.

A really interesting experiment... My conclusion is that Contempt=+50 (which seems really "radical") is a good choice for tournaments and match play. Contempt=+15 is not bad either, and a quite "normal" setting.

 

 

 1.  Komodo 8 x64        2129.5/5000    862-1603-2535
 2.  Stockfish 6 C=0     570.0/1000    299-159-542 (=54.2% draws)
 3.  Stockfish 6 C=+15   576.0/1000    319-167-514 (=51.4% draws)
 4.  Stockfish 6 C=+25   580.5/1000    328-167-505 (=50.5% draws)
 5.  Stockfish 6 C=+35   565.0/1000    311-181-508 (=50.8% draws)  
 6.  Stockfish 6 C=+50   579.0/1000    346-188-466 (=46.6% draws)  

 

Draw-stats:       
 2.  Sf 6 C=0     Draws: 3fold=274 +(i=176 f=77 s=15 a=0)[early draw m40=29,m60=109]
 3.  Sf 6 C=+15   Draws: 3fold=106 +(i=328 f=71 s=9 a=0) [early draw m40=12,m60=52]
 4.  Sf 6 C=+25   Draws: 3fold=77  +(i=338 f=83 s=7 a=0) [early draw m40=3, m60=33]
 5.  Sf 6 C=+35   Draws: 3fold=69  +(i=344 f=82 s=11 a=2)[early draw m40=2, m60=35]
 6.  Sf 6 C=+50   Draws: 3fold=51  +(i=333 f=75 s=7 a=0) [early draw m40=1, m60=24]

 

(i=insufficient material, f=fifty-move rule, s=stalemate, a=adjudicated by GUI (>300 moves))
m40 = draw games not longer than 40 moves (including 8 moves of the opening PGN)
m60 = draw games not longer than 60 moves (including 8 moves of the opening PGN)
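As a cross-check of the "+/-8 Elo" statement, the five Stockfish scores can be converted to Elo differences with the standard logistic formula (which may differ slightly from the tool used for the original rating list):

```python
import math

# Elo difference implied by a match score under the logistic model.
def elo_diff(score):
    return 400 * math.log10(score / (1 - score))

scores = {"C=0": 0.570, "C=+15": 0.576, "C=+25": 0.5805,
          "C=+35": 0.565, "C=+50": 0.579}
elos = {name: round(elo_diff(s), 1) for name, s in scores.items()}
spread = max(elos.values()) - min(elos.values())   # about 11 Elo total
```

The full spread of about 11 Elo is indeed consistent with all results lying inside a +/-8 Elo interval.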

 


2015/02/20: A little "Clone Wars" test run of Stockfish 6 against 5 of its clone engines (70''+700ms, single core, SALC openings, 1000-game gauntlet). As you can see, none of the 5 clones is measurably stronger (all results within a +/-1% score interval and clearly inside the error bar).

 

 

     Program                 Elo    +    -   Games   Score   Av.Op.  Draws

   1 Pepper 150213 x64s    : 3251   13   13  1000    51.0 %   3243   60.9 %
   2 Sugar 5 x64s          : 3248   13   13  1000    50.7 %   3243   57.0 %
   3 Orka 150213 x64s      : 3248   14   14  1000    50.7 %   3243   59.2 %
   4 Salt 5 x64s           : 3247   13   13  1000    50.5 %   3243   60.1 %
   5 Stockfish 6 150128    : 3243    6    6  5000    49.4 %   3247   59.7 %
   6 Shark 150209 x64s     : 3241   13   13  1000    50.0 %   3243   61.3 %