Stefan Pohl Computer Chess
private website for chessengine-tests
Here you will find experimental test runs that are not part of my regular testing (Stockfish development versions, the Bullet rating list and the Endless RoundRobin tournament).
2016/12/09: Some weeks ago, I created my SALC opening book for engine-engine matches. All lines were created out of 10000 human games, are 20 plies deep and were checked with Komodo 10.2 (20'' per position, running on 3 cores) to have an evaluation inside [-0.6,+0.6]. In every line, White and Black castled to opposite sides and both queens are still on the board. The idea is to get more attacks on the king and a lower draw rate, because the draw rate in computer chess keeps increasing as the engines get stronger and the hardware gets faster. For my Stockfish bullet test runs, I have used 500 SALC positions since 2014, which lowered the draw rate a lot.
To verify how much the draw rate is lowered by these new books / opening-position sets, I did three test runs of 3000 games each (= 9000 games): Stockfish 8 in self-play, 70''+700ms thinking time, single core, LittleBlitzerGUI (using the 10000-position EPD files, playing in RoundRobin mode, in which one EPD position is chosen at random for each game).
I think these results are really impressive...
2016/03/12: Test run of 3 new Stockfish clones. Stockfish played 1000 games against each of them (LittleBlitzerGUI, single core, 70''+700ms, 128 Hash, no ponder, no bases, no large pages, 500 SALC openings). None of the clones is stronger (no surprise), so don't waste your time with these "engines". The new popcount versions of DON do not run on my system (or in the LittleBlitzerGUI), so I could not test DON.
   Program                   Elo    +    -  Games   Score  Av.Op.   Draws
 1 Stockfish 160302 x64   : 3300    7    7   3000  51.9 %    3287  65.0 %
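As a rough cross-check of the row above, the Elo and Av.Op. columns can be related to the Score column with the usual logistic Elo formula. This is only a sketch: the page does not say which rating tool produced the list, and the function name `elo_diff_from_score` is made up for this illustration.

```python
import math

def elo_diff_from_score(score):
    """Elo difference implied by a score fraction (0 < score < 1), logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

# Values taken from the table row above.
stockfish_score = 0.519      # Score column: 51.9 %
average_opponent = 3287      # Av.Op. column

implied_rating = average_opponent + elo_diff_from_score(stockfish_score)
print(round(implied_rating))  # 3300, the same as the Elo column
```

A real rating tool fits all games at once, so small deviations from this one-line estimate are normal.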
2015/06/05: Test run of Stockfish 150510 repeated with Contempt=+50 and with Contempt=+15, 7000 games per test run. The result with Contempt=+15 was 5 Elo weaker and the result with Contempt=+50 was 10 Elo weaker than the default, but all 7 opponents are very strong engines. Against weaker opponents, the score should get a little better.
The overall draw rate was lowered from 38.1% (default) to 37.3% (Contempt=+15) and to 36.4% (Contempt=+50), which is a little disappointing. But keep in mind that I use the SALC opening positions. These opening positions lower the draw rate a lot compared to "normal" opening positions, so contempt can hardly lower the draw rate any further.
But the number of draws until move 50 (10 moves opening position + 40 played moves) was lowered from 457 to 212 (C=+15) and to 68 (C=+50), and the number of 3fold draws was lowered from 1356 to 597 (C=+15) and to 266 (C=+50) (!!!)
And the average game duration rose from 199 seconds to 206 (C=+15) and 212 seconds (C=+50).
Default: Games Completed = 7000 of 7000 (Avg game length = 199.562 sec)
Default: Draw games not longer than 40 moves (+10 moves opening-position (=50 moves)): 457
Contempt=+15: Draw games not longer than 40 moves (+10 moves opening-position (=50 moves)): 212
Contempt=+50: Draw games not longer than 40 moves (+10 moves opening-position (=50 moves)): 68
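To put the draw counts of this test run into proportion, the following small sketch computes the relative reduction against the default run. The numbers are the ones quoted in the text; the helper name `relative_drop` and the dictionary layout are only chosen for this illustration.

```python
def relative_drop(before, after):
    """Fraction by which a count fell, relative to its starting value."""
    return (before - after) / before

# Counts per 7000-game test run, as quoted in the text.
early_draws = {"default": 457, "Contempt=+15": 212, "Contempt=+50": 68}
threefold_draws = {"default": 1356, "Contempt=+15": 597, "Contempt=+50": 266}

for label, counts in (("draws until move 50", early_draws),
                      ("3fold draws", threefold_draws)):
    base = counts["default"]
    for setting, n in counts.items():
        if setting != "default":
            print(f"{label}, {setting}: {relative_drop(base, n):.0%} fewer")
```

With Contempt=+50, both kinds of draws fall by more than 80%, although the overall draw rate only moves from 38.1% to 36.4%.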
2015/03/09: Test run of Stockfish 6 with 5 different contempt settings (0 (=default), +15, +25, +35, +50) against Komodo 8 (1000 games each, 70''+700ms, single core, my 500 old LS-ratinglist opening positions, because they are more drawish than the new SALC positions). Let's see whether the contempt setting reduces the draw rate and/or the 3fold draws.
As you can see, the 3fold draws and early draws (until move 40/60) are lowered a lot by a higher contempt. And the overall draw rate is lowered, too (but not as much as I expected).
The overall score of Stockfish against Komodo was not measurably affected by the contempt. All overall results are in a +/-8 Elo interval, which is clearly inside the error bar.
A really interesting experiment... The conclusion is that contempt=+50 (which seems really "radical") is a good choice for tournaments and match play. Contempt=+15 is not bad either, and is a quite "normal" setting.
1. Komodo 8 x64 2129.5/5000 862-1603-2535
(i=insufficient material, f=fifty move rule, s=stalemate, a=adjusted by GUI (>300 moves))
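The Komodo 8 line above can be read as points/games followed by a three-number record. As a sketch, assuming the record is ordered wins-losses-draws (the page does not say so, but this is the only order that reproduces 2129.5 points), the totals work out as follows. The function name `parse_record` is only illustrative.

```python
def parse_record(record):
    """Split a 'wins-losses-draws' string into three integers.

    The field order is an assumption inferred from the point total.
    """
    wins, losses, draws = (int(part) for part in record.split("-"))
    return wins, losses, draws

wins, losses, draws = parse_record("862-1603-2535")
games = wins + losses + draws
points = wins + 0.5 * draws      # a win counts 1 point, a draw half a point

print(f"{points}/{games}")                    # 2129.5/5000
print(f"score     = {points / games:.1%}")
print(f"draw rate = {draws / games:.1%}")
```

Summed over all five contempt settings, roughly half of the 5000 games ended in a draw.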
2015/02/20: A little "Clone-Wars" test run of Stockfish 6 against 5 of its clone engines (70''+700ms, single core, SALC openings, 1000-game gauntlet). As you can see, none of the 5 clones is really measurably stronger (all results are in a +/-1% score interval and clearly inside the error bar).
   Program                   Elo    +    -  Games   Score  Av.Op.   Draws
 1 Pepper 150213 x64s     : 3251   13   13   1000  51.0 %    3243  60.9 %
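To see why a 51.0 % score over 1000 games is "inside the error bar", the + and - columns can be approximated from the score, the draw rate and the number of games. This is only a sketch under stated assumptions: a normal approximation with a 95% interval (z = 1.96) and the logistic Elo formula. The rating tool behind the table is not named on the page and may compute its margins differently. The function names are made up for this example.

```python
import math

def elo(score):
    """Elo difference implied by a score fraction (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_margins(score, draw_rate, games, z=1.96):
    """Approximate (minus, plus) Elo margins from per-game result variance."""
    wins = score - draw_rate / 2.0
    losses = 1.0 - wins - draw_rate
    # Variance of a single game result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draw_rate * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2)
    stderr = math.sqrt(variance / games)
    plus = elo(score + z * stderr) - elo(score)
    minus = elo(score) - elo(score - z * stderr)
    return minus, plus

# Values from the Pepper row above: 51.0 % score, 60.9 % draws, 1000 games.
minus, plus = elo_margins(0.510, 0.609, 1000)
print(f"-{minus:.1f} / +{plus:.1f} Elo")
```

The result is close to the 13 Elo shown in the + and - columns, while a 51.0 % score corresponds to only about 7 Elo, so the difference is well within the margin. A higher draw rate lowers the per-game variance and therefore narrows the margin for the same number of games.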