Installation Guide

This guide explains how to set up the CAPE Replication Project on your system.

Prerequisites

Before installing the CAPE Replication Project, ensure you have the following:

Python 3.7 or higher
Conda or Miniconda (recommended for environment management)
Git (for version control)
At least 4GB of RAM (for data processing)
10GB of free disk space (for data and models)

Step 1: Clone the Repository

git clone https://github.com/your-username/cape-replication-project.git
cd cape-replication-project

Step 2: Set Up the Python Environment

Using Conda (Recommended)

# Create the conda environment
conda env create -f scripts/environment.yml -n chafs_b

# Activate the environment
conda activate chafs_b

Using pip (Alternative)

# Create a virtual environment
python -m venv cape_env

# Activate the environment
# On Windows:
cape_env\Scripts\activate
# On macOS/Linux:
source cape_env/bin/activate

# Install dependencies
pip install -r requirements-docs.txt

Step 3: Verify Installation

Confirm that the Python interpreter and the key packages are available:

# Check Python version
python --version

# Check if key packages are installed
python -c "import pandas, numpy, xgboost, geopandas; print('All packages installed successfully!')"
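
The one-line import check above stops at the first missing package. A slightly longer sketch can report each package separately, along with its version. The script below is illustrative and not part of the repository; the package list is simply taken from the command above.

```python
import importlib
import sys

# Packages named in the verification command above.
REQUIRED_PACKAGES = ["pandas", "numpy", "xgboost", "geopandas"]


def check_packages(names):
    """Return {package name: version string, or None if the import fails}."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None
        else:
            # Not every module defines __version__, so fall back to "unknown".
            results[name] = getattr(module, "__version__", "unknown")
    return results


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, version in check_packages(REQUIRED_PACKAGES).items():
        print(f"{name}: {version if version else 'MISSING'}")
```

Saved under a hypothetical name such as check_install.py, it can be run with the environment activated to see which package, if any, needs to be reinstalled.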

Step 4: Configure Paths

Edit the configuration file config/config.json with your data paths:

{
  "dir_data_in": "/path/to/your/input/data/",
  "dir_data_out": "/path/to/your/output/data/",
  "dir_viewer": "/path/to/your/viewer/output/",
  "fn_data_processed": "/path/to/processed/data.hdf",
  "fn_cropdata": "/path/to/crop/data.csv",
  "fn_shapefile": "/path/to/shapefile.gpkg",
  "cape_setting_file": "/path/to/cape_settings.csv",
  "fnids_info": "/path/to/fnids_info.hdf",
  "fn_viewer_csv": "/path/to/viewer_data.csv"
}
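
Before running the pipeline, it can help to confirm that the configured paths exist. The following is a minimal sketch, not a project utility: it assumes that keys starting with "dir_" name directories and that all other keys name files, which matches the example above but is not confirmed by the project. Output files such as the processed data or viewer CSV may legitimately be missing before the first run.

```python
import json
from pathlib import Path


def find_missing_paths(config_path):
    """Return the config keys whose paths do not exist on disk."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)

    missing = []
    for key, value in config.items():
        path = Path(value)
        # Assumption: "dir_*" keys name directories, all other keys name files.
        exists = path.is_dir() if key.startswith("dir_") else path.is_file()
        if not exists:
            missing.append(key)
    return missing


if __name__ == "__main__":
    for key in find_missing_paths("config/config.json"):
        print(f"Path for '{key}' not found")
```

Run it from the repository root so that the relative path config/config.json resolves.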

Step 5: Prepare Data Files

Ensure you have the required data files in your input directory:

Crop data: gscd_data_240213.csv or similar
Shapefile: gscd_shape.gpkg
FNID information: fnids_info.hdf
CAPE settings: cape_setting.csv
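
A quick way to check for these files is to scan the input directory. The sketch below assumes that all four files sit directly in one directory and that the crop data file follows a "gscd_data_*.csv" naming pattern. Both are assumptions based on the list above, so adjust the patterns to your own file names.

```python
from pathlib import Path

# File names as listed above. The crop data name may differ ("or similar"),
# so it is matched with a wildcard pattern (an assumption of this sketch).
REQUIRED_PATTERNS = {
    "Crop data": "gscd_data_*.csv",
    "Shapefile": "gscd_shape.gpkg",
    "FNID information": "fnids_info.hdf",
    "CAPE settings": "cape_setting.csv",
}


def find_missing_inputs(input_dir, patterns=REQUIRED_PATTERNS):
    """Return the labels of required files with no match in input_dir."""
    root = Path(input_dir)
    return [label for label, pattern in patterns.items()
            if not any(root.glob(pattern))]


if __name__ == "__main__":
    # Replace with the value of "dir_data_in" from config/config.json.
    for label in find_missing_inputs("/path/to/your/input/data/"):
        print(f"Missing: {label}")
```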

Step 6: Test the Installation

Run the project scripts to confirm that they start correctly:

# Test preprocessing
cd scripts/processing
python cape_preprocessing.py --start-from 3  # Skip to aggregation step

# Test development (if you have data)
cd ../development
python cape_development.py --help

Troubleshooting

Common Issues

1. Conda Environment Creation Fails

# Try updating conda first
conda update conda

# Or create environment manually
conda create -n chafs_b python=3.9
conda activate chafs_b
conda install -c conda-forge pandas numpy xgboost geopandas

2. Package Installation Errors

# Try installing packages individually
pip install pandas
pip install numpy
pip install xgboost
pip install geopandas

3. Path Configuration Issues

Ensure all paths in config/config.json are absolute paths
Check that directories exist and have proper permissions
Use forward slashes (/) even on Windows
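
The first and third checks above can be sketched with the standard pathlib module. The helper names are hypothetical. Note that pathlib drops a trailing separator, so re-append "/" to directory entries if your configuration relies on it, as the example config in Step 4 does.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_forward_slashes(windows_path):
    """Rewrite a Windows-style path with forward slashes."""
    return PureWindowsPath(windows_path).as_posix()


def is_absolute_path(path_str):
    """True if the string is absolute under POSIX or Windows rules."""
    return (PurePosixPath(path_str).is_absolute()
            or PureWindowsPath(path_str).is_absolute())


if __name__ == "__main__":
    print(to_forward_slashes(r"C:\Users\me\data"))  # C:/Users/me/data
    print(is_absolute_path("data/in"))              # False
```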

4. Memory Issues

If you encounter memory errors during processing:

# Reduce memory usage by processing the data in smaller chunks.
# This means editing the relevant scripts so that they load and
# process one batch of rows at a time instead of the full dataset.
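
The chunked processing suggested above can be sketched as follows. This is a generic illustration using only the standard library, not code from the project scripts; the function names and the default chunk size are assumptions. If the script in question reads CSV data with pandas, the chunksize parameter of pandas.read_csv gives the same pattern.

```python
import csv
from itertools import islice


def iter_chunks(csv_path, chunk_size=10_000):
    """Yield lists of at most chunk_size rows (as dicts) from a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            yield chunk


def column_sum(csv_path, column, chunk_size=10_000):
    """Sum a numeric column while holding only one chunk in memory."""
    total = 0.0
    for chunk in iter_chunks(csv_path, chunk_size):
        # Skip empty cells so missing values do not raise ValueError.
        total += sum(float(row[column]) for row in chunk if row[column] != "")
    return total
```

Peak memory then scales with chunk_size rather than with the file size, at the cost of restructuring any step that needs the whole dataset at once.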

Getting Help

If you encounter issues not covered here:

Check the Troubleshooting Guide
Search existing GitHub Issues
Create a new issue with detailed error information

Next Steps

Once installation is complete:

Read the Quick Start Guide
Review the Configuration Guide
Explore the User Guide

System Requirements

Minimum Requirements

OS: Windows 10, macOS 10.14+, or Ubuntu 18.04+
Python: 3.7+
RAM: 4GB
Storage: 10GB free space
CPU: 2 cores
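
Some of these minimums can be checked programmatically. The sketch below uses only the standard library and covers the Python version, free disk space, and CPU count. RAM is omitted because the standard library has no portable call for it. The function name and the thresholds-as-constants layout are illustrative.

```python
import os
import shutil
import sys

# Minimums taken from the list above.
MIN_PYTHON = (3, 7)
MIN_FREE_GB = 10
MIN_CPU_CORES = 2


def check_system(path="."):
    """Return {check name: (observed value, meets minimum?)}."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    cores = os.cpu_count() or 0  # cpu_count() can return None
    python_version = tuple(sys.version_info[:2])
    return {
        "python": (python_version, python_version >= MIN_PYTHON),
        "free_disk_gb": (round(free_gb, 1), free_gb >= MIN_FREE_GB),
        "cpu_cores": (cores, cores >= MIN_CPU_CORES),
    }


if __name__ == "__main__":
    for name, (value, ok) in check_system().items():
        print(f"{name}: {value} [{'ok' if ok else 'below minimum'}]")
```

Pass the directory that will hold your data as path, since free space is reported for the disk containing that directory.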

Recommended Requirements

OS: Ubuntu 20.04+ or macOS 11+
Python: 3.9+
RAM: 16GB+
Storage: 50GB+ free space
CPU: 8+ cores
GPU: NVIDIA GPU with CUDA support (optional, for faster training)

Performance Tips

Use SSD storage for faster data access
Allocate sufficient RAM to avoid swapping
Use multiple CPU cores for parallel processing
Consider GPU acceleration for large model training
Optimize data storage using appropriate formats (HDF5, Parquet)