Architecture
Detailed information about the Chess CV model architectures, training strategies, and performance characteristics for the pieces, arrows, and snap models.
Pieces Model
Model Architecture
Chess CV uses a lightweight Convolutional Neural Network (CNN) designed for efficient inference while maintaining high accuracy on 32×32 pixel chess square images.
Network Design
Input: 32×32×3 RGB image
Conv Layer 1:
├── Conv2d(3 → 16 channels, 3×3 kernel)
├── ReLU activation
└── MaxPool2d(2×2) → 16×16×16
Conv Layer 2:
├── Conv2d(16 → 32 channels, 3×3 kernel)
├── ReLU activation
└── MaxPool2d(2×2) → 8×8×32
Conv Layer 3:
├── Conv2d(32 → 64 channels, 3×3 kernel)
├── ReLU activation
└── MaxPool2d(2×2) → 4×4×64
Flatten → 1024 features
Fully Connected 1:
├── Linear(1024 → 128)
├── ReLU activation
└── Dropout(0.5)
Fully Connected 2:
└── Linear(128 → 13) → Output logits
Softmax → 13-class probabilities
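The stack above corresponds to the following PyTorch sketch. This is an illustrative reimplementation, not the repository's actual module; padding of 1 is assumed so that each max-pool halves the spatial dimensions exactly as shown in the diagram.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Illustrative sketch of the architecture described above."""

    def __init__(self, num_classes: int = 13):
        super().__init__()
        self.features = nn.Sequential(
            # 32x32x3 -> 16x16x16
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            # 16x16x16 -> 8x8x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            # 8x8x32 -> 4x4x64
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),  # 4 * 4 * 64 = 1024 features
            nn.Linear(1024, 128), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(128, num_classes),  # logits; softmax is applied at inference
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# One 32x32 RGB square in, 13 logits out
logits = SimpleCNN()(torch.zeros(1, 3, 32, 32))
```

The arrows and snap models reuse this same stack with only `num_classes` changed.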
Model Statistics
- Total Parameters: 156,077
- Trainable Parameters: 156,077
- Model Size: ~600 KB (safetensors format)
- Input Size: 32×32×3 (RGB)
- Output Classes: 13
Class Labels
The model classifies chess squares into 13 categories:
Black Pieces (6):
- bB – Black Bishop
- bK – Black King
- bN – Black Knight
- bP – Black Pawn
- bQ – Black Queen
- bR – Black Rook
White Pieces (6):
- wB – White Bishop
- wK – White King
- wN – White Knight
- wP – White Pawn
- wQ – White Queen
- wR – White Rook
Empty (1):
- xx – Empty square
Performance Characteristics
Expected Results
With the default configuration:
- Test Accuracy: ~99.90%
- F1 Score (Macro): ~99.90%
- Training Time: ~90 minutes (varies by hardware)
- Inference Speed: ~0.05 ms per image (batch size 8192; varies by hardware)
Per-Class Performance
Actual accuracy by piece type (Test Dataset):
| Class | Accuracy | Class | Accuracy |
|---|---|---|---|
| bB | 99.90% | wB | 99.90% |
| bK | 100.00% | wK | 99.90% |
| bN | 100.00% | wN | 99.90% |
| bP | 99.81% | wP | 99.81% |
| bQ | 99.90% | wQ | 99.81% |
| bR | 100.00% | wR | 99.81% |
| xx | 100.00% | | |
Evaluation on External Datasets
The model has been evaluated on external datasets to assess generalization:
OpenBoard
- Dataset: S1M0N38/chess-cv-openboard
- Number of samples: 6,016
- Overall Accuracy: 99.30%
- F1 Score (Macro): 98.56%
Per-class performance on OpenBoard:
| Class | Accuracy | Class | Accuracy |
|---|---|---|---|
| bB | 99.11% | wB | 100.00% |
| bK | 100.00% | wK | 100.00% |
| bN | 100.00% | wN | 98.97% |
| bP | 99.81% | wP | 99.61% |
| bQ | 97.10% | wQ | 98.48% |
| bR | 99.32% | wR | 98.03% |
| xx | 99.24% | | |
ChessVision
- Dataset: S1M0N38/chess-cv-chessvision
- Number of samples: 3,186
- Overall Accuracy: 93.13%
- F1 Score (Macro): 92.28%
Per-class performance on ChessVision:
| Class | Accuracy | Class | Accuracy |
|---|---|---|---|
| bB | 100.00% | wB | 95.87% |
| bK | 92.62% | wK | 99.09% |
| bN | 100.00% | wN | 99.09% |
| bP | 90.92% | wP | 92.26% |
| bQ | 98.92% | wQ | 85.06% |
| bR | 98.92% | wR | 96.69% |
| xx | 89.17% | | |
Multi-Split Dataset
The ChessVision dataset contains multiple splits. All splits are concatenated during evaluation to produce a single comprehensive score.
Out-of-Sample Performance
The lower performance on OpenBoard (99.30% accuracy, 98.56% F1) and ChessVision (93.13% accuracy, 92.28% F1) compared to the test set (99.90% accuracy, 99.90% F1) indicates some domain gap between the synthetic training data and these external datasets. ChessVision shows significantly lower performance, particularly on specific piece types like white queens (85.06%) and empty squares (89.17%).
Dataset Characteristics
Synthetic Data Generation
The training data is synthetically generated:
Source Materials:
- 55 board styles (256×256px)
- 64 piece sets (32×32px)
- Multiple visual styles from chess.com and lichess
Generation Process:
- Render each piece onto each board style
- Extract 32×32 squares at piece locations
- Extract empty squares from light and dark squares
- Split combinations across train/val/test sets
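Extracting a 32×32 square from a 256×256 board image is simple index arithmetic, since each of the 8×8 squares spans exactly 32 pixels. A sketch of that step (the file/rank indexing and white's-perspective orientation are assumptions, not taken from the generator code):

```python
SQUARE_PX = 32  # 256 px board / 8 squares per side

def square_box(file: int, rank: int) -> tuple[int, int, int, int]:
    """Pixel box (left, top, right, bottom) of one square on a 256x256 board.

    file: 0..7 (a..h, left to right), rank: 0..7 (1..8, bottom to top),
    assuming the board is rendered from white's perspective.
    """
    left = file * SQUARE_PX
    top = (7 - rank) * SQUARE_PX  # rank 8 sits at the top of the image
    return (left, top, left + SQUARE_PX, top + SQUARE_PX)

# e.g. a1 (file=0, rank=0) is the bottom-left 32x32 crop
a1 = square_box(0, 0)
```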
Data Statistics:
- Total Combinations: 3,520 (55 boards × 64 piece sets)
- Images per Combination: 26 (12 piece classes × 2 square colors + 2 empty squares)
- Total Images: ~91,500
- Train Set: ~64,000 (70%)
- Validation Set: ~13,500 (15%)
- Test Set: ~13,500 (15%)
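These totals follow directly from the combination counts; a quick check of the arithmetic:

```python
boards, piece_sets = 55, 64
images_per_combination = 26  # 12 piece classes x 2 square colors + 2 empty squares

combinations = boards * piece_sets                     # 3,520
total_images = combinations * images_per_combination   # 91,520 (~91,500)

train = round(total_images * 0.70)  # ~64,000
val = round(total_images * 0.15)    # ~13,700
```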
Class Balance
The dataset is perfectly balanced:
- Each class has equal representation
- Each board-piece combination contributes equally
- Train/val/test splits maintain class balance
Arrows Model
Model Architecture
Overview
The arrows model uses the same SimpleCNN architecture as the pieces model, but is trained to classify arrow overlay components instead of chess pieces. This enables detection and reconstruction of arrow annotations commonly used in chess analysis interfaces.
Network Design
The network architecture is identical to the pieces model (see Pieces Model Architecture above), with the only difference being the output layer dimension.
[Same architecture as pieces model]
Fully Connected 2:
└── Linear(128 → 49) → Output logits
Softmax → 49-class probabilities
Model Statistics
- Total Parameters: 160,721 (the pieces backbone plus a larger 49-way output layer)
- Trainable Parameters: 160,721
- Model Size: ~645 KB (safetensors format)
- Input Size: 32×32×3 (RGB)
- Output Classes: 49
Class Labels
The model classifies chess squares into 49 categories representing arrow components:
Arrow Heads (16):
Directional arrow tips in the 16 compass directions (8 cardinal/ordinal directions plus 8 intermediate knight-move angles):
head-N, head-NNE, head-NE, head-ENE, head-E, head-ESE, head-SE, head-SSE, head-S, head-SSW, head-SW, head-WSW, head-W, head-WNW, head-NW, head-NNW
Arrow Tails (16):
Directional arrow tails in the same 16 compass directions:
tail-N, tail-NNE, tail-NE, tail-ENE, tail-E, tail-ESE, tail-SE, tail-SSE, tail-S, tail-SSW, tail-SW, tail-WSW, tail-W, tail-WNW, tail-NW, tail-NNW
Middle Segments (12):
Arrow shaft segments for straight, diagonal, and knight-move lines:
middle-N-S, middle-E-W, middle-NE-SW, middle-SE-NW, middle-N-ENE, middle-E-SSE, middle-S-WSW, middle-W-NNW, middle-N-WNW, middle-E-NNE, middle-S-ESE, middle-W-SSW
Corners (4):
Corner pieces for knight-move arrows (L-shaped patterns):
corner-N-E, corner-E-S, corner-S-W, corner-W-N
Empty (1):
- xx – Empty square (no arrow)
Naming Convention: N/S/E/W refer to compass directions (North/South/East/West), indicating arrow orientation on the board from white's perspective.
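When reconstructing full arrows from per-square predictions, the compass suffix of each label can be decoded into a direction vector. A hypothetical decoder (illustrative only, not library code):

```python
import math

def compass_to_vector(direction: str) -> tuple[float, float]:
    """Convert a 16-point compass name (e.g. 'NE', 'NNW') to a unit (dx, dy)
    vector in board coordinates, where +x is East and +y is North."""
    points = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
              "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]
    angle = math.radians(points.index(direction) * 22.5)  # clockwise from North
    return (round(math.sin(angle), 6), round(math.cos(angle), 6))

# e.g. a 'head-E' label points due East: (1.0, 0.0)
head_e = compass_to_vector("E")
```

Note that the compass names are nominal: a knight-move arrow drawn as a single straight line actually sits at atan(1/2) ≈ 26.6°, which the labels round to the nearest 22.5° compass point.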
Performance Characteristics
Expected Results
With the default configuration:
- Test Accuracy: ~99.99%
- F1 Score (Macro): ~99.99%
- Training Time: ~9 minutes for 20 epochs (varies by hardware)
- Inference Speed: ~0.019 ms per image (batch size 512, varies by hardware)
Per-Class Performance
The arrows model achieves near-perfect accuracy across all 49 classes on the synthetic test dataset:
Summary Statistics:
- Highest Accuracy: 100.00% (26 classes)
- Lowest Accuracy: 99.27% (tail-S)
- Mean Accuracy: 99.97%
- Classes > 99.9%: 40 out of 49
Performance by Component Type:
| Component Type | Classes | Avg Accuracy | Range |
|---|---|---|---|
| Arrow Heads | 16 | 99.97% | 99.56% - 100% |
| Arrow Tails | 16 | 99.89% | 99.27% - 100% |
| Middle Segments | 12 | 99.96% | 99.78% - 100% |
| Corners | 4 | 99.98% | 99.93% - 100% |
| Empty Square | 1 | 99.93% | - |
No External Dataset Evaluation
Unlike the pieces model, the arrows model has only been evaluated on synthetic test data. No external datasets with annotated arrow components are currently available for out-of-distribution testing.
Training Configuration
The arrows model uses different hyperparameters than the pieces model, optimized for the 49-class arrow classification task:
- Epochs: 20 (vs 200 for pieces; converges much faster)
- Batch Size: 128 (vs 64 for pieces; larger batches for more stable training)
- Learning Rate: 0.0005 (vs 0.0003 for pieces)
- Weight Decay: 0.00005 (vs 0.0003 for pieces; less regularization needed)
- Optimizer: AdamW
- Early Stopping: Disabled
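In PyTorch terms, the configuration above corresponds roughly to the following optimizer setup. This is a sketch; the `model` here is a stand-in, and the actual training script's variable names may differ.

```python
import torch

model = torch.nn.Linear(1024, 49)  # stand-in for the arrows SimpleCNN

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-4,            # 0.0005, vs 0.0003 for the pieces model
    weight_decay=5e-5,  # 0.00005, less regularization than pieces
)
```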
Dataset Characteristics
Synthetic Data Generation
The arrows training data is synthetically generated using the same board styles as the pieces model:
Source Materials:
- 55 board styles (256×256px)
- Arrow overlay images organized by component type
- Multiple visual styles from chess.com and lichess
Generation Process:
- Render arrow components onto board backgrounds
- Extract 32×32 squares at arrow locations
- Extract empty squares from light and dark squares
- Split combinations across train/val/test sets
Data Statistics:
- Total Images: ~4.5 million
- Train Set: ~3,139,633 (70%)
- Validation Set: ~672,253 (15%)
- Test Set: ~672,594 (15%)
The significantly larger dataset compared to pieces (~4.5M vs ~91K) is due to the combination of 55 boards × 49 arrow component types, with multiple arrow variants per component type.
Class Balance
The dataset maintains balanced class distribution:
- Each arrow component class has equal representation
- Each board-arrow combination contributes equally
- Train/val/test splits maintain class balance
Limitations
Single Arrow Component Per Square
The model is trained on images containing at most one arrow component per square. Classification accuracy degrades significantly when multiple arrow parts overlap in a single square, which can occur with densely annotated boards or crossing arrows.
Example failure case: If a square contains both an arrow head and a perpendicular arrow shaft, the model may only detect one component or produce incorrect predictions.
Snap Model
Model Architecture
Overview
The snap model uses the same SimpleCNN architecture as the pieces and arrows models, but is trained to classify piece centering quality instead of piece identity or arrow components. This enables automated detection of whether chess pieces are properly positioned within board squares, facilitating quality control for digital chess interfaces and automated analysis systems.
Network Design
The network architecture is identical to the pieces model (see Pieces Model Architecture above), with the only difference being the output layer dimension.
[Same architecture as pieces model]
Fully Connected 2:
└── Linear(128 → 2) → Output logits
Softmax → 2-class probabilities
Model Statistics
- Total Parameters: 154,658 (the pieces backbone with a smaller 2-way output layer)
- Trainable Parameters: 154,658
- Model Size: ~600 KB (safetensors format)
- Input Size: 32×32×3 (RGB)
- Output Classes: 2
Class Labels
The model classifies chess squares into 2 categories representing piece centering quality:
Centered (1):
- ok – Pieces that are properly centered or only slightly off-center, plus empty squares
Off-Centered (1):
- bad – Pieces that are significantly misaligned or poorly positioned within the square
Rationale: The model treats both properly centered pieces and empty squares as "ok" since both represent valid board states. Only poorly positioned pieces trigger the "bad" classification, enabling automated quality assurance.
Performance Characteristics
Expected Results
With the default configuration:
- Test Accuracy: ~99.93%
- F1 Score (Macro): ~99.93%
- Training Time: TBD (training in progress, 200 epochs)
- Inference Speed: ~0.05 ms per image (similar to the pieces model; varies by hardware)
Per-Class Performance
The snap model achieves excellent accuracy across both classes on the synthetic test dataset:
Summary Statistics:
- Highest Accuracy: 99.98% (ok)
- Lowest Accuracy: 99.88% (bad)
- Mean Accuracy: 99.93%
- Classes > 99.9%: 1 out of 2
Evaluation on External Datasets
No external dataset evaluation has been conducted yet. The model has only been evaluated on synthetic test data.
Training Configuration
The snap model uses similar hyperparameters to the pieces model, optimized for the 2-class centering classification task:
- Epochs: 200 (same as pieces model)
- Batch Size: 64 (same as pieces model)
- Learning Rate: 0.001 with warmup and cosine decay (same as pieces model)
- Weight Decay: 0.001 (same as pieces model)
- Optimizer: AdamW
- Early Stopping: Disabled
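The warmup-plus-cosine-decay schedule can be written as a small function of the epoch index. This is an illustrative formula; the warmup length here is an assumed parameter, not taken from the repository.

```python
import math

def lr_at(epoch: int, total_epochs: int = 200, base_lr: float = 0.001,
          warmup_epochs: int = 5) -> float:
    """Linear warmup to base_lr, then cosine decay toward zero.

    warmup_epochs is an assumed value for illustration.
    """
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The learning rate ramps up over the first few epochs, peaks at the base value, and decays smoothly to near zero by the final epoch.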
Dataset Characteristics
Synthetic Data Generation
The snap training data is synthetically generated using the same board styles as the pieces model:
Source Materials:
- 55 board styles (256×256px)
- 64 piece sets (32×32px)
- Multiple visual styles from chess.com and lichess
- Centered and off-centered piece positions
Generation Process:
- Render pieces with intentional positioning variations
- Extract 32×32 squares at piece locations
- Extract empty squares from light and dark squares
- Split combinations across train/val/test sets
Data Statistics:
- Total Images: ~1.4M synthetic images
- Train Set: ~980,000 (70%)
- Validation Set: ~210,000 (15%)
- Test Set: ~210,000 (15%)
The dataset is generated with 8 positional variations per piece-board combination:
- Non-empty pieces: 4 "ok" (centered/slightly off-centered) + 4 "bad" (significantly off-centered) variations
- Empty squares: 4 "ok" variations only (empty squares are always considered valid)
- These variations expose the model to centering errors across all board styles and piece sets, supporting robust centering detection
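A sketch of how such variations might be labeled during generation, using a hypothetical pixel-offset threshold (the actual tolerance used by the generator is not specified here):

```python
import random

OFFSET_THRESHOLD_PX = 4  # assumed: offsets beyond this count as "bad"

def make_variation(rng: random.Random, label: str) -> tuple[int, int, str]:
    """Draw a random (dx, dy) pixel offset matching the requested label."""
    if label == "ok":
        # centered or slightly off-center: within the tolerance
        dx = rng.randint(-OFFSET_THRESHOLD_PX, OFFSET_THRESHOLD_PX)
        dy = rng.randint(-OFFSET_THRESHOLD_PX, OFFSET_THRESHOLD_PX)
    else:
        # "bad": push the piece well outside the tolerance
        dx = rng.choice([-1, 1]) * rng.randint(OFFSET_THRESHOLD_PX + 1, 12)
        dy = rng.choice([-1, 1]) * rng.randint(OFFSET_THRESHOLD_PX + 1, 12)
    return dx, dy, label

# 8 variations per piece-board combination: 4 "ok" + 4 "bad"
variations = [make_variation(random.Random(seed), lab)
              for seed, lab in enumerate(["ok"] * 4 + ["bad"] * 4)]
```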
Class Balance
The dataset maintains balanced class distribution:
- Each centering class has equal representation
- Empty squares are included in the "ok" class
- Train/val/test splits maintain class balance
Limitations
Centering Semantics Preservation
The model is trained with conservative augmentation to preserve centering semantics. No rotation or significant geometric transformations are applied that could alter the perceived centering of pieces within squares.
Synthetic Training Data
The model is trained only on synthetically generated centering variations. Performance on real-world chess board images with natural positioning variations may vary from synthetic test results.