# Chess CV > CNN-based chess piece classifier Chess CV is a machine learning project that trains a lightweight CNN to classify chess pieces from 32×32px square images. The model is built with Apple's MLX framework and reaches ~99.9% accuracy on its synthetic test set (see the Architecture section for results on external datasets). Training data is generated synthetically by combining 55 board styles with 64 piece sets. The project provides tools for data generation, model training, evaluation, and deployment to Hugging Face Hub. # Documentation # Setup Guide This guide will help you install and configure Chess CV for inference and for training the classification models. ## Prerequisites - **Python 3.13+**: Chess CV requires Python 3.13 or later - **uv**: Fast Python package manager (required for training, [installation guide](https://docs.astral.sh/uv/)) - **MLX**: Apple's machine learning framework (installed with `chess-cv`) ## Installation ### For Inference Only Install Chess CV directly from PyPI: ```bash pip install chess-cv # or with uv uv add chess-cv ``` ### For Training and Evaluation Clone the repository and install all dependencies: ```bash # Clone the repository git clone https://github.com/S1M0N38/chess-cv.git cd chess-cv # Copy environment template cp .envrc.example .envrc # Install all dependencies uv sync --all-extras # For contributors: add development tools uv sync --all-extras --group dev ``` ## Next Steps - Use pre-trained models for inference with [Model Usage](../inference/) - Train custom models with [Train and Evaluate](../train-and-eval/) - Contribute to the project with [CONTRIBUTING.md](../CONTRIBUTING.md) # Inference Learn how to use Chess CV as a library and load pre-trained models. ## Using Pre-trained Models ### Using Bundled Models (Recommended) The chess-cv package includes pre-trained weights for all three models.
This is the simplest way to get started: ```python from chess_cv import load_bundled_model # Load models with bundled weights (included in package) pieces_model = load_bundled_model('pieces') arrows_model = load_bundled_model('arrows') snap_model = load_bundled_model('snap') # Set to evaluation mode pieces_model.eval() arrows_model.eval() snap_model.eval() print("Models loaded successfully!") ``` **Get model configuration:** ```python from chess_cv.constants import get_model_config # Get class names and other config for each model pieces_config = get_model_config('pieces') print(f"Pieces classes: {pieces_config['class_names']}") # ['bB', 'bK', ..., 'xx'] print(f"Number of classes: {pieces_config['num_classes']}") # 13 arrows_config = get_model_config('arrows') print(f"Number of arrow classes: {arrows_config['num_classes']}") # 49 snap_config = get_model_config('snap') print(f"Snap classes: {snap_config['class_names']}") # ['ok', 'bad'] print(f"Number of classes: {snap_config['num_classes']}") # 2 ``` **Advanced: Get bundled weight paths:** ```python from chess_cv import get_bundled_weight_path from chess_cv.model import SimpleCNN import mlx.core as mx # Get path to bundled weights weight_path = get_bundled_weight_path('pieces') print(f"Bundled weights location: {weight_path}") # Load manually if needed model = SimpleCNN(num_classes=13) weights = mx.load(str(weight_path)) model.load_weights(list(weights.items())) model.eval() ``` ### Loading Latest Version from Hugging Face Hub To get the latest version of the models (if updated after package release): ```python import mlx.core as mx from huggingface_hub import hf_hub_download from chess_cv.model import SimpleCNN # Download latest model weights from Hugging Face model_path = hf_hub_download( repo_id="S1M0N38/chess-cv", filename="pieces.safetensors" ) # Create model and load weights model = SimpleCNN(num_classes=13) weights = mx.load(str(model_path)) model.load_weights(list(weights.items())) model.eval() print("Model loaded successfully!") ``` ### Making Predictions #### Pieces Model - Classify Chess Pieces Classify a chess square image to identify pieces: ```python import mlx.core as mx import numpy as np from PIL import Image from chess_cv import load_bundled_model from chess_cv.constants import get_model_config # Load model model = load_bundled_model('pieces') model.eval() # Get class names classes = get_model_config('pieces')['class_names'] # Load and preprocess image def preprocess_image(image_path: str) -> mx.array: """Load and preprocess a chess square image. 
Args: image_path: Path to 32×32 RGB image Returns: Preprocessed image tensor ready for model """ # Load image img = Image.open(image_path).convert('RGB') img = img.resize((32, 32)) # Convert to array and normalize img_array = np.array(img, dtype=np.float32) / 255.0 # Add batch dimension and convert to MLX array # MLX uses NHWC format: (batch, height, width, channels) img_tensor = mx.array(img_array[None, ...]) return img_tensor # Make prediction image_tensor = preprocess_image("square.png") logits = model(image_tensor) probabilities = mx.softmax(logits, axis=-1) predicted_class = mx.argmax(probabilities, axis=-1).item() print(f"Predicted class: {classes[predicted_class]}") print(f"Confidence: {probabilities[0, predicted_class].item():.2%}") ``` #### Arrows Model - Classify Arrow Components Classify arrow overlay components on chess board squares: ```python import mlx.core as mx import numpy as np from PIL import Image from chess_cv import load_bundled_model from chess_cv.constants import get_model_config # Load arrows model model = load_bundled_model('arrows') model.eval() # Get class names (49 arrow components + empty) classes = get_model_config('arrows')['class_names'] # Preprocess and predict (using same preprocess_image function as above) image_tensor = preprocess_image("square_with_arrow.png") logits = model(image_tensor) probabilities = mx.softmax(logits, axis=-1) predicted_class = mx.argmax(probabilities, axis=-1).item() print(f"Predicted arrow component: {classes[predicted_class]}") print(f"Confidence: {probabilities[0, predicted_class].item():.2%}") ``` **Arrow Classes**: The arrows model classifies 49 components including arrow heads (e.g., `head-N`, `head-SE`), tails (e.g., `tail-W`), middle segments (e.g., `middle-N-S`), corners (e.g., `corner-N-E`), and empty squares (`xx`). #### Snap Model - Classify Piece Centering Detect whether chess pieces are properly centered in squares: ```python import mlx.core as mx import numpy as np from PIL import Image from chess_cv import load_bundled_model from chess_cv.constants import get_model_config # Load snap model model = load_bundled_model('snap') model.eval() # Get class names ('ok' or 'bad') classes = get_model_config('snap')['class_names'] # Preprocess and predict (using same preprocess_image function as above) image_tensor = preprocess_image("square_to_check.png") logits = model(image_tensor) probabilities = mx.softmax(logits, axis=-1) predicted_class = mx.argmax(probabilities, axis=-1).item() print(f"Piece centering: {classes[predicted_class]}") print(f"Confidence: {probabilities[0, predicted_class].item():.2%}") if classes[predicted_class] == 'bad': print("⚠️ Piece is off-centered - needs adjustment") else: print("✓ Piece is properly centered or square is empty") ``` **Use Cases**: The snap model is useful for automated board state validation, piece positioning quality control, and chess interface usability testing. ### Batch Predictions Process multiple images efficiently: ```python import mlx.core as mx from pathlib import Path from chess_cv import load_bundled_model from chess_cv.constants import get_model_config from chess_cv.model import SimpleCNN # Load model and get class names model = load_bundled_model('pieces') model.eval() classes = get_model_config('pieces')['class_names'] def predict_batch(model: SimpleCNN, image_paths: list[str], classes: list[str]) -> list[dict]: """Predict classes for multiple images. 
Args: model: Trained SimpleCNN model image_paths: List of paths to chess square images classes: List of class names Returns: List of prediction dictionaries with class and confidence """ # Preprocess all images images = [preprocess_image(path) for path in image_paths] batch = mx.concatenate(images, axis=0) # Make predictions logits = model(batch) probabilities = mx.softmax(logits, axis=-1) predicted_classes = mx.argmax(probabilities, axis=-1) # Format results results = [] for i, path in enumerate(image_paths): pred_idx = predicted_classes[i].item() confidence = probabilities[i, pred_idx].item() results.append({ 'path': path, 'class': classes[pred_idx], 'confidence': confidence }) return results # Example usage image_paths = ["square1.png", "square2.png", "square3.png"] predictions = predict_batch(model, image_paths, classes) for pred in predictions: print(f"{pred['path']}: {pred['class']} ({pred['confidence']:.2%})") ``` ## Troubleshooting **Model Loading Issues**: If model loading fails, verify the file path exists and that the model architecture matches the weights. Use `SimpleCNN(num_classes=13)` for pieces, `SimpleCNN(num_classes=49)` for arrows, or `SimpleCNN(num_classes=2)` for snap. **Memory Issues**: Process images in smaller batches if you encounter memory problems during batch prediction. **Wrong Predictions**: Ensure input images are properly preprocessed (32×32px, RGB, normalized to [0,1]). ## Next Steps - Explore the [Architecture](../architecture/) documentation for model details - Check out [Train and Evaluate](../train-and-eval/) for training custom models # Train and Evaluate Learn how to generate data, train models, and evaluate performance with Chess CV. ## Data Generation ### Basic Usage Generate synthetic training data for a specific model: ```bash # Generate data for pieces model (default: 70% train, 15% val, 15% test) chess-cv preprocessing pieces # Generate data for arrows model chess-cv preprocessing arrows # Generate data for snap model chess-cv preprocessing snap ``` ### Custom Output Directories If you need to specify custom output directories: ```bash chess-cv preprocessing pieces \ --train-dir custom/pieces/train \ --val-dir custom/pieces/validate \ --test-dir custom/pieces/test ``` Note: The default 70/15/15 train/val/test split with seed 42 is used. These values are defined in `src/chess_cv/constants.py` and provide consistent, reproducible splits. ### Understanding Data Generation The preprocessing script generates model-specific training data: **Pieces Model:** 1. Reads board images from `data/boards/` and piece sets from `data/pieces/` 1. For each board-piece combination: - Renders pieces onto the board - Extracts 32×32 pixel squares - Saves images to train/validate/test directories 1. Generates ~93,000 images split into train (70%), val (15%), test (15%) 1. Balanced across 13 classes (12 pieces + empty square) **Arrows Model:** 1. Reads board images and arrow overlays from `data/arrows/` 1. For each board-arrow combination: - Renders arrow components onto boards - Extracts 32×32 pixel squares 1. Generates ~4.5M images for 49 arrow component classes **Snap Model:** 1. Reads board images and piece sets with positioning variations 1. Generates centered and off-centered piece examples 1. Generates ~1.4M synthetic images for 2 centering classes (ok/bad) All models use the 70/15/15 train/val/test split with seed 42 for reproducibility. 
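To make the split concrete, here is a minimal illustrative sketch of a seeded 70/15/15 split over board × piece-set combinations. The identifiers are placeholders and the actual logic lives in the preprocessing code, but it shows how seed 42 yields a reproducible partition:

```python
import random
from itertools import product

# Placeholder identifiers; the real ones come from data/boards/ and data/pieces/
boards = [f"board_{i}" for i in range(55)]
piece_sets = [f"pieces_{i}" for i in range(64)]

combinations = list(product(boards, piece_sets))  # 55 × 64 = 3,520 combinations

# Deterministic shuffle with the documented seed
rng = random.Random(42)
rng.shuffle(combinations)

# 70% train / 15% validate / 15% test
n = len(combinations)
n_train = int(n * 0.70)
n_val = int(n * 0.15)

splits = {
    "train": combinations[:n_train],
    "validate": combinations[n_train : n_train + n_val],
    "test": combinations[n_train + n_val :],
}

for name, combos in splits.items():
    print(f"{name}: {len(combos)} combinations")
```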
## Model Training ### Basic Training Train a specific model with default settings: ```bash # Train pieces model chess-cv train pieces # Train arrows model chess-cv train arrows # Train snap model chess-cv train snap ``` ### Custom Training Configuration ```bash chess-cv train pieces \ --train-dir data/splits/pieces/train \ --val-dir data/splits/pieces/validate \ --checkpoint-dir checkpoints/pieces \ --batch-size 64 \ --weight-decay 0.001 \ --num-epochs 1000 \ --num-workers 8 ``` **Model-Specific Defaults:** Each model type uses optimized hyperparameters defined in `src/chess_cv/constants.py`. The arrows model, for example, uses larger batch sizes (128) and fewer epochs (20) as it converges faster. Note: Image size is fixed at 32×32 pixels (model architecture requirement). ### Training Parameters **Optimizer Settings:** - `--weight-decay`: Weight decay for regularization (default: 0.001) **Learning Rate Scheduler (enabled by default):** - Base LR: 0.001 (peak after warmup) - Min LR: 1e-5 (end of cosine decay) - Warmup: 3% of total steps (~30 epochs for 1000-epoch training) **Training Control:** - `--num-epochs`: Maximum number of epochs (default: 200, recent models use 1000) - `--batch-size`: Batch size for training (default: 64) **Data Settings:** - `--num-workers`: Number of data loading workers (default: 8) **Directories:** - `--train-dir`: Training data directory (default: data/splits/pieces/train) - `--val-dir`: Validation data directory (default: data/splits/pieces/validate) - `--checkpoint-dir`: Where to save model checkpoints (default: checkpoints) ### Training Features **Learning Rate Schedule:** - Warmup phase: linear increase from 0 to 0.001 over first 3% of steps - Cosine decay: gradual decrease from 0.001 to 1e-5 over remaining steps **Data Augmentation:** - Random resized crop (scale: 0.54-0.74) - Random horizontal flip - Color jitter (brightness: ±0.15, contrast/saturation/hue: ±0.2) - Random rotation (±10°) - Gaussian noise (std: 0.05) - Arrow overlay (80% probability) - Highlight overlay (25% probability) **Early Stopping:** Early stopping is disabled by default (patience set to 999999), allowing the full training schedule to run. This default is set in `src/chess_cv/constants.py` and ensures consistent training across runs. **Automatic Checkpointing:** - Best model weights saved to `checkpoints/{model-id}/{model-id}.safetensors` - Optimizer state saved to `checkpoints/optimizer.safetensors` ### Training Output **Files Generated:** - `checkpoints/{model-id}/{model-id}.safetensors` – Best model weights - `checkpoints/optimizer.safetensors` – Optimizer state - `outputs/training_curves.png` – Loss and accuracy plots - `outputs/augmentation_example.png` – Example of data augmentation ## Experiment Tracking ### Weights & Biases Integration Track experiments with the W&B dashboard by adding the `--wandb` flag: ```bash # First time setup wandb login # Train with wandb logging chess-cv train pieces --wandb ``` **Features**: Real-time metric logging, hyperparameter tracking, model comparison, and experiment organization. ### Hyperparameter Sweeps Optimize hyperparameters with W&B sweeps using the integrated `--sweep` flag: ```bash # First time setup wandb login # Run hyperparameter sweep for a model (requires --wandb) chess-cv train pieces --sweep --wandb # The sweep will use the configuration defined in src/chess_cv/sweep.py ``` **Important**: The `--sweep` flag requires `--wandb` to be enabled. 
The sweep configuration is defined in `src/chess_cv/sweep.py` and includes parameters like learning rate, batch size, and weight decay optimized for each model type. ## Model Evaluation ### Basic Evaluation Evaluate a trained model on its test set: ```bash # Evaluate pieces model chess-cv test pieces # Evaluate arrows model chess-cv test arrows # Evaluate snap model chess-cv test snap ``` ### Custom Evaluation ```bash chess-cv test pieces \ --test-dir data/splits/pieces/test \ --train-dir data/splits/pieces/train \ --checkpoint checkpoints/pieces/pieces.safetensors \ --batch-size 64 \ --num-workers 8 \ --output-dir outputs/pieces ``` ### Evaluating on External Datasets Test model performance on HuggingFace datasets: ```bash # Evaluate on a specific dataset chess-cv test pieces \ --hf-test-dir S1M0N38/chess-cv-openboard # Concatenate all splits from the dataset chess-cv test pieces \ --hf-test-dir S1M0N38/chess-cv-chessvision \ --concat-splits ``` ### Evaluation Output **Files Generated:** - `outputs/test_confusion_matrix.png` – Confusion matrix heatmap - `outputs/test_per_class_accuracy.png` – Per-class accuracy bar chart - `outputs/misclassified_images/` – Misclassified examples for analysis ### Analyzing Results **Confusion Matrix:** Shows where the model makes mistakes. Look for: - High off-diagonal values (common misclassifications) - Patterns in similar piece types (e.g., knights vs bishops) **Misclassified Images:** Review examples in `outputs/misclassified_images/` to understand: - Which board/piece combinations are challenging - Whether augmentation needs adjustment - If more training data would help ## Model Deployment ### Upload to Hugging Face Hub Share your trained models on Hugging Face Hub: ```bash # First time setup hf login # Upload a specific model chess-cv upload pieces --repo-id username/chess-cv ``` **Examples:** ```bash # Upload pieces model with default settings chess-cv upload pieces --repo-id username/chess-cv # Upload arrows model with custom message chess-cv upload arrows --repo-id username/chess-cv \ --message "feat: improved arrows model v2" # Upload to private repository chess-cv upload snap --repo-id username/chess-cv --private # Specify custom paths chess-cv upload pieces --repo-id username/chess-cv \ --checkpoint-dir ./my-checkpoints/pieces \ --readme docs/custom_README.md ``` **What gets uploaded**: Model weights (`{model-id}.safetensors`), model card with metadata, and model configuration. ## Troubleshooting **Out of Memory During Training**: Reduce the batch size below the default of 64 (e.g., `--batch-size 32`) or reduce the number of data loading workers (e.g., `--num-workers 2`). **Poor Model Performance**: Tune hyperparameters with W&B sweeps, or review the misclassified images to verify data quality. **Training Too Slow**: Increase the batch size if memory allows (`--batch-size 128`), or enable early stopping for faster experimentation by lowering `DEFAULT_PATIENCE` in `src/chess_cv/constants.py`. **Evaluation Issues**: Ensure the checkpoint exists, verify the test data directory is populated, and run with an appropriate batch size.
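For reference, the warmup + cosine decay schedule described under Training Parameters can be written as a plain function of the step index. This is an illustrative re-implementation using the documented values (peak 0.001, floor 1e-5, 3% linear warmup), not the exact scheduler code used by `chess-cv train`:

```python
import math

BASE_LR = 1e-3          # peak learning rate reached after warmup
MIN_LR = 1e-5           # floor reached at the end of cosine decay
WARMUP_FRACTION = 0.03  # warmup lasts 3% of total steps


def learning_rate(step: int, total_steps: int) -> float:
    """Linear warmup to BASE_LR, then cosine decay down to MIN_LR."""
    warmup_steps = int(total_steps * WARMUP_FRACTION)
    if step < warmup_steps:
        return BASE_LR * step / max(warmup_steps, 1)
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1 + math.cos(math.pi * progress))


# Hypothetical total step count, e.g. 1000 epochs at ~1000 steps per epoch
total = 1_000_000
for s in (0, 15_000, 30_000, 500_000, 1_000_000):
    print(f"step {s:>9}: lr = {learning_rate(s, total):.6f}")
```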
## Next Steps - Use your trained model for inference with [Model Usage](../inference/) - Explore model internals with [Architecture](../architecture/) - Share your model on Hugging Face Hub using the upload command above # Architecture Detailed information about the Chess CV model architectures, training strategies, and performance characteristics for the pieces, arrows, and snap models. CNN architecture for chess piece classification ## Pieces Model ### Model Architecture Chess CV uses a lightweight Convolutional Neural Network (CNN) designed for efficient inference while maintaining high accuracy on 32×32 pixel chess square images. #### Network Design ```text Input: 32×32×3 RGB image Conv Layer 1: ├── Conv2d(3 → 16 channels, 3×3 kernel) ├── ReLU activation └── MaxPool2d(2×2) → 16×16×16 Conv Layer 2: ├── Conv2d(16 → 32 channels, 3×3 kernel) ├── ReLU activation └── MaxPool2d(2×2) → 8×8×32 Conv Layer 3: ├── Conv2d(32 → 64 channels, 3×3 kernel) ├── ReLU activation └── MaxPool2d(2×2) → 4×4×64 Flatten → 1024 features Fully Connected 1: ├── Linear(1024 → 128) ├── ReLU activation └── Dropout(0.5) Fully Connected 2: └── Linear(128 → 13) → Output logits Softmax → 13-class probabilities ``` #### Model Statistics - **Total Parameters**: 156,077 - **Trainable Parameters**: 156,077 - **Model Size**: ~600 KB (safetensors format) - **Input Size**: 32×32×3 (RGB) - **Output Classes**: 13 #### Class Labels The model classifies chess squares into 13 categories: **Black Pieces (6):** - `bB` – Black Bishop - `bK` – Black King - `bN` – Black Knight - `bP` – Black Pawn - `bQ` – Black Queen - `bR` – Black Rook **White Pieces (6):** - `wB` – White Bishop - `wK` – White King - `wN` – White Knight - `wP` – White Pawn - `wQ` – White Queen - `wR` – White Rook **Empty (1):** - `xx` – Empty square ### Performance Characteristics #### Expected Results With the default configuration: - **Test Accuracy**: ~99.90% - **F1 Score (Macro)**: ~99.90% - **Training Time**: ~90 minutes (varies by hardware) - **Inference Speed**: 0.05 ms per image (batch size 8192, varying by hardware) #### Per-Class Performance Actual accuracy by piece type (Test Dataset): | Class | Accuracy | Class | Accuracy | | ----- | -------- | ----- | -------- | | bB | 99.90% | wB | 99.90% | | bK | 100.00% | wK | 99.90% | | bN | 100.00% | wN | 99.90% | | bP | 99.81% | wP | 99.81% | | bQ | 99.90% | wQ | 99.81% | | bR | 100.00% | wR | 99.81% | | xx | 100.00% | | | #### Evaluation on External Datasets The model has been evaluated on external datasets to assess generalization: ##### OpenBoard - **Dataset**: [S1M0N38/chess-cv-openboard](https://huggingface.co/datasets/S1M0N38/chess-cv-openboard) - **Number of samples**: 6,016 - **Overall Accuracy**: 99.30% - **F1 Score (Macro)**: 98.56% Per-class performance on OpenBoard: | Class | Accuracy | Class | Accuracy | | ----- | -------- | ----- | -------- | | bB | 99.11% | wB | 100.00% | | bK | 100.00% | wK | 100.00% | | bN | 100.00% | wN | 98.97% | | bP | 99.81% | wP | 99.61% | | bQ | 97.10% | wQ | 98.48% | | bR | 99.32% | wR | 98.03% | | xx | 99.24% | | | ##### ChessVision - **Dataset**: [S1M0N38/chess-cv-chessvision](https://huggingface.co/datasets/S1M0N38/chess-cv-chessvision) - **Number of samples**: 3,186 - **Overall Accuracy**: 93.13% - **F1 Score (Macro)**: 92.28% Per-class performance on ChessVision: | Class | Accuracy | Class | Accuracy | | ----- | -------- | ----- | -------- | | bB | 100.00% | wB | 95.87% | | bK | 92.62% | wK | 99.09% | | bN | 100.00% | wN | 99.09% | | bP | 90.92% | wP | 92.26% | | bQ | 
98.92% | wQ | 85.06% | | bR | 98.92% | wR | 96.69% | | xx | 89.17% | | | Multi-Split Dataset The ChessVision dataset contains multiple splits. All splits are concatenated during evaluation to produce a single comprehensive score. Out of Sample Performance The lower performance on OpenBoard (99.30% accuracy, 98.56% F1) and ChessVision (93.13% accuracy, 92.28% F1) compared to the test set (99.90% accuracy, 99.90% F1) indicates some domain gap between the synthetic training data and these external datasets. ChessVision shows significantly lower performance, particularly on specific piece types like white queens (85.06%) and empty squares (89.17%). ### Dataset Characteristics #### Synthetic Data Generation The training data is synthetically generated: **Source Materials:** - 55 board styles (256×256px) - 64 piece sets (32×32px) - Multiple visual styles from chess.com and lichess **Generation Process:** 1. Render each piece onto each board style 1. Extract 32×32 squares at piece locations 1. Extract empty squares from light and dark squares 1. Split combinations across train/val/test sets **Data Statistics:** - **Total Combinations**: ~3,520 (55 boards × 64 piece sets) - **Images per Combination**: 26 (12 pieces × 2 colors + 2 empty) - **Total Images**: ~91,500 - **Train Set**: ~64,000 (70%) - **Validation Set**: ~13,500 (15%) - **Test Set**: ~13,500 (15%) #### Class Balance The dataset is perfectly balanced: - Each class has equal representation - Each board-piece combination contributes equally - Train/val/test splits maintain class balance ______________________________________________________________________ ## Arrows Model ### Model Architecture #### Overview The arrows model uses the same SimpleCNN architecture as the pieces model, but is trained to classify arrow overlay components instead of chess pieces. This enables detection and reconstruction of arrow annotations commonly used in chess analysis interfaces. #### Network Design The network architecture is identical to the pieces model (see [Pieces Model Architecture](#network-design) above), with the only difference being the output layer dimension. 
```text [Same architecture as pieces model] Fully Connected 2: └── Linear(128 → 49) → Output logits Softmax → 49-class probabilities ``` #### Model Statistics - **Total Parameters**: 156,077 (same as pieces model) - **Trainable Parameters**: 156,077 - **Model Size**: ~645 KB (safetensors format) - **Input Size**: 32×32×3 (RGB) - **Output Classes**: 49 #### Class Labels The model classifies chess squares into 49 categories representing arrow components: **Arrow Heads (20):** Directional arrow tips in 8 cardinal/ordinal directions plus intermediate angles: - `head-N`, `head-NNE`, `head-NE`, `head-ENE`, `head-E`, `head-ESE`, `head-SE`, `head-SSE` - `head-S`, `head-SSW`, `head-SW`, `head-WSW`, `head-W`, `head-WNW`, `head-NW`, `head-NNW` **Arrow Tails (12):** Directional arrow tails in 8 cardinal/ordinal directions plus intermediate angles: - `tail-N`, `tail-NNE`, `tail-NE`, `tail-ENE`, `tail-E`, `tail-ESE`, `tail-SE`, `tail-SSE` - `tail-S`, `tail-SSW`, `tail-SW`, `tail-W` **Middle Segments (8):** Arrow shaft segments for straight and diagonal lines: - `middle-N-S`, `middle-E-W`, `middle-NE-SW`, `middle-SE-NW` - `middle-N-ENE`, `middle-E-SSE`, `middle-S-WSW`, `middle-W-NNW` - `middle-N-WNW`, `middle-E-NNE`, `middle-S-ESE`, `middle-W-SSW` **Corners (4):** Corner pieces for knight-move arrows (L-shaped patterns): - `corner-N-E`, `corner-E-S`, `corner-S-W`, `corner-W-N` **Empty (1):** - `xx` – Empty square (no arrow) **Naming Convention:** NSEW refers to compass directions (North/South/East/West), indicating arrow orientation on the board from white's perspective. ### Performance Characteristics #### Expected Results With the default configuration: - **Test Accuracy**: ~99.99% - **F1 Score (Macro)**: ~99.99% - **Training Time**: ~9 minutes for 20 epochs (varies by hardware) - **Inference Speed**: ~0.019 ms per image (batch size 512, varies by hardware) #### Per-Class Performance The arrows model achieves near-perfect accuracy across all 49 classes on the synthetic test dataset: **Summary Statistics:** - **Highest Accuracy**: 100.00% (26 classes) - **Lowest Accuracy**: 99.27% (tail-S) - **Mean Accuracy**: 99.97% - **Classes > 99.9%**: 40 out of 49 **Performance by Component Type:** | Component Type | Classes | Avg Accuracy | Range | | --------------- | ------- | ------------ | ------------- | | Arrow Heads | 20 | 99.97% | 99.56% - 100% | | Arrow Tails | 12 | 99.89% | 99.27% - 100% | | Middle Segments | 8 | 99.96% | 99.78% - 100% | | Corners | 4 | 99.98% | 99.93% - 100% | | Empty Square | 1 | 99.93% | - | No External Dataset Evaluation Unlike the pieces model, the arrows model has only been evaluated on synthetic test data. No external datasets with annotated arrow components are currently available for out-of-distribution testing. 
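As a usage illustration of the arrow-annotation reconstruction mentioned in the overview, the sketch below scans a full board image square by square with the bundled arrows model. The file name and the 256×256 board size are assumptions for this example; the per-square preprocessing mirrors the inference examples earlier in this documentation:

```python
import mlx.core as mx
import numpy as np
from PIL import Image

from chess_cv import load_bundled_model
from chess_cv.constants import get_model_config

model = load_bundled_model('arrows')
model.eval()
classes = get_model_config('arrows')['class_names']

# Assumed input: a 256×256 screenshot of a full board (8×8 squares of 32×32px)
board = Image.open("board_256.png").convert('RGB').resize((256, 256))

# Cut the board into 64 squares and stack them into one NHWC batch in [0, 1]
squares = []
for row in range(8):
    for col in range(8):
        crop = board.crop((col * 32, row * 32, (col + 1) * 32, (row + 1) * 32))
        squares.append(np.array(crop, dtype=np.float32) / 255.0)
batch = mx.array(np.stack(squares))

# Classify every square and report only those containing an arrow component
preds = mx.argmax(model(batch), axis=-1)
for idx in range(64):
    label = classes[preds[idx].item()]
    if label != 'xx':
        print(f"square (row {idx // 8}, col {idx % 8}): {label}")
```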
#### Training Configuration The arrows model uses different hyperparameters than the pieces model, optimized for the 49-class arrow classification task: - **Epochs**: 20 (vs 200 for pieces - converges much faster) - **Batch Size**: 128 (vs 64 for pieces - larger batches for more stable training) - **Learning Rate**: 0.0005 (vs 0.0003 for pieces) - **Weight Decay**: 0.00005 (vs 0.0003 for pieces - less regularization needed) - **Optimizer**: AdamW - **Early Stopping**: Disabled ### Dataset Characteristics #### Synthetic Data Generation The arrows training data is synthetically generated using the same board styles as the pieces model: **Source Materials:** - 55 board styles (256×256px) - Arrow overlay images organized by component type - Multiple visual styles from chess.com and lichess **Generation Process:** 1. Render arrow components onto board backgrounds 1. Extract 32×32 squares at arrow locations 1. Extract empty squares from light and dark squares 1. Split combinations across train/val/test sets **Data Statistics:** - **Total Images**: ~4.5 million - **Train Set**: ~3,139,633 (70%) - **Validation Set**: ~672,253 (15%) - **Test Set**: ~672,594 (15%) The significantly larger dataset compared to pieces (~4.5M vs ~91K) is due to the combination of 55 boards × 49 arrow component types, with multiple arrow variants per component type. #### Class Balance The dataset maintains balanced class distribution: - Each arrow component class has equal representation - Each board-arrow combination contributes equally - Train/val/test splits maintain class balance #### Limitations Single Arrow Component Per Square The model is trained on images containing **at most one arrow component per square**. Classification accuracy degrades significantly when multiple arrow parts overlap in a single square, which can occur with densely annotated boards or crossing arrows. **Example failure case**: If a square contains both an arrow head and a perpendicular arrow shaft, the model may only detect one component or produce incorrect predictions. ______________________________________________________________________ ## Snap Model ### Model Architecture #### Overview The snap model uses the same SimpleCNN architecture as the pieces and arrows models, but is trained to classify piece centering quality instead of piece identity or arrow components. This enables automated detection of whether chess pieces are properly positioned within board squares, facilitating quality control for digital chess interfaces and automated analysis systems. #### Network Design The network architecture is identical to the pieces model (see [Pieces Model Architecture](#network-design) above), with the only difference being the output layer dimension. 
```text [Same architecture as pieces model] Fully Connected 2: └── Linear(128 → 2) → Output logits Softmax → 2-class probabilities ``` #### Model Statistics - **Total Parameters**: 156,077 (same as pieces model) - **Trainable Parameters**: 156,077 - **Model Size**: ~600 KB (safetensors format) - **Input Size**: 32×32×3 (RGB) - **Output Classes**: 2 #### Class Labels The model classifies chess squares into 2 categories representing piece centering quality: **Centered (1):** - `ok` – Pieces that are properly centered or slightly off-centered, plus empty squares **Off-Centered (1):** - `bad` – Pieces that are significantly misaligned or positioned poorly within the square **Rationale:** The model treats both properly centered pieces and empty squares as "ok" since both represent valid board states. Only poorly positioned pieces trigger the "bad" classification, enabling automated quality assurance. ### Performance Characteristics #### Expected Results With the default configuration: - **Test Accuracy**: ~99.93% - **F1 Score (Macro)**: ~99.93% - **Training Time**: TBD (training in progress, 200 epochs) - **Inference Speed**: ~0.05 ms per image (similar to pieces model, varying by hardware) #### Per-Class Performance The snap model achieves excellent accuracy across both classes on the synthetic test dataset: **Summary Statistics:** - **Highest Accuracy**: 99.98% (ok) - **Lowest Accuracy**: 99.88% (bad) - **Mean Accuracy**: 99.93% - **Classes > 99.9%**: 1 out of 2 #### Evaluation on External Datasets *No external dataset evaluation has been conducted yet. The model has only been evaluated on synthetic test data.* #### Training Configuration The snap model uses similar hyperparameters to the pieces model, optimized for the 2-class centering classification task: - **Epochs**: 200 (same as pieces model) - **Batch Size**: 64 (same as pieces model) - **Learning Rate**: 0.001 with warmup and cosine decay (same as pieces model) - **Weight Decay**: 0.001 (same as pieces model) - **Optimizer**: AdamW - **Early Stopping**: Disabled ### Dataset Characteristics #### Synthetic Data Generation The snap training data is synthetically generated using the same board styles as the pieces model: **Source Materials:** - 55 board styles (256×256px) - 64 piece sets (32×32px) - Multiple visual styles from chess.com and lichess - Centered and off-centered piece positions **Generation Process:** 1. Render pieces with intentional positioning variations 1. Extract 32×32 squares at piece locations 1. Extract empty squares from light and dark squares 1. Split combinations across train/val/test sets **Data Statistics:** - **Total Images**: ~1.4M synthetic images - **Train Set**: ~980,000 (70%) - **Validation Set**: ~210,000 (15%) - **Test Set**: ~210,000 (15%) The dataset is generated with 8 positional variations per piece-board combination: - Non-empty pieces: 4 "ok" (centered/slightly off-centered) + 4 "bad" (significantly off-centered) variations - Empty squares: 4 "ok" variations only (empty squares are always considered valid) - This comprehensive variation strategy ensures robust centering detection across different board styles, piece sets, and positioning variations #### Class Balance The dataset maintains balanced class distribution: - Each centering class has equal representation - Empty squares are included in the "ok" class - Train/val/test splits maintain class balance #### Limitations Centering Semantics Preservation The model is trained with **conservative augmentation** to preserve centering semantics. 
No rotation or significant geometric transformations are applied that could alter the perceived centering of pieces within squares. Synthetic Training Data The model is trained only on synthetically generated centering variations. Performance on real-world chess board images with natural positioning variations may vary from synthetic test results. # Data Augmentation Documentation of data augmentation strategies used in training each model. ## Configuration Augmentation parameters are defined in `src/chess_cv/constants.py`: ```python AUGMENTATION_CONFIGS = { "pieces": { "padding": 16, "padding_mode": "edge", "rotation_degrees": 10, "center_crop_size": 40, "final_size": 32, "resized_crop_scale": (0.54, 0.74), "resized_crop_ratio": (0.9, 1.1), "arrow_probability": 0.80, "highlight_probability": 0.25, "move_probability": 0.50, "mouse_probability": 0.90, "horizontal_flip": True, "horizontal_flip_prob": 0.5, "brightness": 0.15, "contrast": 0.2, "saturation": 0.2, "hue": 0.2, "noise_mean": 0.0, "noise_sigma": 0.05, }, "arrows": { "arrow_probability": 0.0, "highlight_probability": 0.25, "move_probability": 0.50, "scale_min": 0.75, "scale_max": 1.0, "horizontal_flip": False, "brightness": 0.20, "contrast": 0.20, "saturation": 0.20, "hue": 0.2, "rotation_degrees": 2, "noise_mean": 0.0, "noise_sigma": 0.10, }, "snap": { "arrow_probability": 0.50, "highlight_probability": 0.20, "move_probability": 0.50, "mouse_probability": 0.80, "mouse_padding": 134, "mouse_rotation_degrees": 5, "mouse_center_crop_size": 246, "mouse_final_size": 32, "mouse_scale_range": (0.20, 0.30), "mouse_ratio_range": (0.8, 1.2), "horizontal_flip": True, "horizontal_flip_prob": 0.5, "brightness": 0.15, "contrast": 0.2, "saturation": 0.2, "hue": 0.2, "noise_mean": 0.0, "noise_sigma": 0.05, }, } ``` ______________________________________________________________________ ## Pieces Model Pipeline *The first row shows original training images, while the second row displays their augmented versions for the pieces model.* Applied in order during training: 1. **Expand Canvas** (16px padding): Pads image by 16px on all sides using edge replication mode (32×32 → 64×64). Creates space for rotation without cropping piece edges. 1. **Random Rotation** (±10°): Rotates image with black fill. More aggressive than previous ±5° to improve robustness. 1. **Center Crop** (40×40): Removes black corners introduced by rotation using conservative formula: `64 - (ceil(tan(10°) × 64) × 2) = 40`. Ensures no rotation artifacts remain. 1. **Random Resized Crop** (area scale 0.54-0.74, aspect ratio 0.9-1.1, output 32×32): Crops random region then resizes to 32×32. Base area ratio (32/40)² = 0.64 provides translation without zoom. Range 0.54-0.74 adds ±16% zoom variation. Aspect ratio 0.9-1.1 allows ±10% stretch for additional robustness. 1. **Arrow Overlay** (80% probability): Overlays random arrow component from `data/arrows/`. Applied after geometric transforms to maintain crisp arrow graphics. 1. **Highlight Overlay** (25% probability): Overlays semi-transparent highlight from `data/highlights/`. 1. **Move Overlay** (50% probability): Overlays random move indicator (dot/ring) from `data/moves/`. Simulates move annotations on pieces during gameplay. 1. **Mouse Overlay** (90% probability): Overlays random mouse cursor from `data/mouse/` with geometric transformations. 
Applies padding (134px), small rotation (±5°), center crop (246×246), random resized crop to final size (32×32) with scale 0.20-0.30 and ratio 0.8-1.2, making cursor smaller and positioning it randomly on the piece. 1. **Horizontal Flip** (50% probability): Flips image left-to-right. 1. **Color Jitter**: Randomly adjusts brightness (±15%), contrast (±20%), saturation (±20%), and hue (±20%). 1. **Gaussian Noise** (σ=0.05): Adds noise to normalized [0,1] pixels. ______________________________________________________________________ ## Arrows Model Pipeline *The first row shows original training images, while the second row displays their augmented versions for the arrows model.* Applied in order during training: 1. **Highlight Overlay** (25% probability): Overlays semi-transparent highlight from `data/highlights/`. Applied early before other transforms. 1. **Move Overlay** (50% probability): Overlays random move indicator (dot/ring) from `data/moves/`. Simulates move annotations on arrow components. 1. **Color Jitter**: Randomly adjusts brightness, contrast, saturation by ±20%, and hue by ±20%. 1. **Random Rotation** (±2°): Small rotation to preserve arrow directionality. 1. **Gaussian Noise** (σ=0.10): Adds noise to normalized [0,1] pixels. Higher noise than pieces model. ______________________________________________________________________ ## Snap Model Pipeline *The first row shows original training images with varying piece centering, while the second row displays their augmented versions for the snap model.* Applied during preprocessing: 1. **Piece Positioning**: Applies translation to simulate different degrees of misalignment: - "ok" class: 0-2px shifts (minimal/slight misalignment) - "bad" class: 3-14px shifts (significant misalignment) 1. **Zoom Variation**: Applies (-10%, +15%) random scaling to both classes for zoom robustness. Scaling is applied around the image center to preserve centering semantics while making the model resistant to different zoom levels. Applied in order during training: 1. **Arrow Overlay** (50% probability): Overlays random arrow component from `data/arrows/`. Applied early to simulate realistic interface conditions where arrows may be present during piece positioning. 1. **Highlight Overlay** (20% probability): Overlays semi-transparent highlight from `data/highlights/`. Simulates square highlighting that may occur during piece placement. 1. **Move Overlay** (50% probability): Overlays random move indicator (dot/ring) from `data/moves/`. Simulates move indicators during piece positioning evaluation. 1. **Mouse Cursor Overlay** (80% probability): Overlays random mouse cursor from `data/mouse/` with geometric transformations. Applies padding (134px), small rotation (±5°), center crop (246×246), random resized crop to final size (32×32) with scale 0.20-0.30 and ratio 0.8-1.2, making cursor smaller and positioning it randomly on the piece. 1. **Horizontal Flip** (50% probability): Flips image left-to-right. Centering semantics are preserved under horizontal flip. 1. **Color Jitter**: Randomly adjusts brightness (±15%), contrast (±20%), saturation (±20%), and hue (±20%). **Note:** The snap model uses **conservative augmentation** during training with no additional geometric transformations like rotation, cropping, or further scaling beyond the zoom variation in preprocessing. This preserves piece centering semantics—the model needs to distinguish between properly centered and poorly positioned pieces. 
Zoom variation in preprocessing (-10%, +15%) provides robustness to different zoom levels while maintaining the fundamental centering distinction. ______________________________________________________________________ ## Key Differences | Augmentation | Pieces | Arrows | Snap | Reason | | ------------------- | ----------------------------- | ----------- | -------------- | --------------------------------------------- | | Canvas Expansion | 16px edge padding (32→64) | ❌ | ❌ | Creates rotation space without edge cropping | | Rotation | ±10° | ±2° | ❌ | Snap needs to preserve centering semantics | | Center Crop | 40×40 (removes black corners) | ❌ | ❌ | Removes rotation artifacts | | Random Resized Crop | Area 0.54-0.74, ratio 0.9-1.1 | ❌ | ❌ | Translation + zoom would alter centering | | Arrow Overlay | 80% | ❌ | 50% | Simulates interface arrows during positioning | | Highlight Overlay | 25% | 25% | 20% | Simulates square highlighting | | Move Overlay | 50% | 50% | 50% | Simulates move indicators on pieces/arrows | | Mouse Overlay | 90% | ❌ | 80% | Simulates cursor interaction during placement | | Horizontal Flip | 50% | ❌ | 50% | Centering semantics preserved under flip | | Color Jitter | B±15%, CSH±20% | ±20% (BCSH) | B±15%, CSH±20% | Snap uses same variation as pieces | | Gaussian Noise | σ=0.05 | σ=0.10 | σ=0.05 | Snap uses same noise level as pieces | ______________________________________________________________________ ## Implementation The training script (`src/chess_cv/train.py`) constructs the pipeline dynamically based on model type: **Pieces Model:** ```python # 1. Expand canvas (32×32 → 64×64) v2.Pad(padding=16, padding_mode="edge") # 2. Rotate with black fill v2.RandomRotation(degrees=10, fill=0) # 3. Remove black rotation artifacts (64×64 → 40×40) # Formula: 64 - (ceil(tan(10°) × 64) × 2) = 40 v2.CenterCrop(size=40) # 4. Random crop + zoom + resize (40×40 → 32×32) # Area scale: (32/40)² ± 0.1 = 0.64 ± 0.1 → (0.54, 0.74) # Aspect ratio: ±10% stretch → (0.9, 1.1) v2.RandomResizedCrop(size=32, scale=(0.54, 0.74), ratio=(0.9, 1.1)) # 5-8. Overlays RandomArrowOverlay(probability=0.80) RandomHighlightOverlay(probability=0.25) RandomMoveOverlay(probability=0.50) RandomMouseOverlay(probability=0.90) # 9-10. Geometric + color v2.RandomHorizontalFlip(p=0.5) v2.ColorJitter(brightness=0.15, contrast=0.2, saturation=0.2, hue=0.2) # 11. Noise (requires tensor conversion) v2.ToImage() → v2.ToDtype() → v2.GaussianNoise() → v2.ToPILImage() ``` **Arrows Model:** ```python # 1. Highlight overlay RandomHighlightOverlay(probability=0.25) # 2. Move overlay RandomMoveOverlay(probability=0.50) # 3-4. Color + rotation v2.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.2) v2.RandomRotation(degrees=2) # 5. Noise (requires tensor conversion) v2.ToImage() → v2.ToDtype() → v2.GaussianNoise() → v2.ToPILImage() ``` **Snap Model:** ```python # 1-4. Overlays RandomArrowOverlay(probability=0.50) RandomHighlightOverlay(probability=0.20) RandomMoveOverlay(probability=0.50) RandomMouseOverlay(probability=0.80) # 5-6. Geometric + color v2.RandomHorizontalFlip(p=0.5) v2.ColorJitter(brightness=0.15, contrast=0.2, saturation=0.2, hue=0.2) # 7. Noise (requires tensor conversion) v2.ToImage() → v2.ToDtype() → v2.GaussianNoise() → v2.ToPILImage() ```
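For completeness, the pieces pipeline above can be assembled into a single callable with `v2.Compose`. The snippet below is an approximate sketch rather than the exact code in `src/chess_cv/train.py`: the project-specific overlay transforms (`RandomArrowOverlay`, `RandomHighlightOverlay`, `RandomMoveOverlay`, `RandomMouseOverlay`) are custom chess-cv classes, so they are indicated only by a placeholder comment:

```python
import torch
from torchvision.transforms import v2

# Approximate composition of the pieces augmentation pipeline
pieces_augment = v2.Compose([
    v2.Pad(padding=16, padding_mode="edge"),   # 1. expand canvas (32×32 → 64×64)
    v2.RandomRotation(degrees=10, fill=0),     # 2. rotate with black fill
    v2.CenterCrop(size=40),                    # 3. remove black rotation artifacts
    v2.RandomResizedCrop(size=32, scale=(0.54, 0.74), ratio=(0.9, 1.1)),  # 4. crop + zoom
    # 5-8. custom overlay transforms (arrow, highlight, move, mouse) go here
    v2.RandomHorizontalFlip(p=0.5),            # 9. horizontal flip
    v2.ColorJitter(brightness=0.15, contrast=0.2, saturation=0.2, hue=0.2),  # 10. color jitter
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
    v2.GaussianNoise(mean=0.0, sigma=0.05),    # 11. noise on [0, 1] tensors
    v2.ToPILImage(),
])

# Usage: augmented = pieces_augment(pil_image) on a single 32×32 training square
```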