Installation Guide

This guide will help you set up the CAPE Replication Project on your system.

Prerequisites

Before installing the CAPE Replication Project, ensure you have the following:

  • Python 3.7 or higher
  • Conda or Miniconda (recommended for environment management)
  • Git (for version control)
  • At least 4GB of RAM (for data processing)
  • 10GB of free disk space (for data and models)
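
The checks above can be scripted. The following is a minimal sketch using only the Python standard library. The function name `check_prerequisites` is hypothetical, and its thresholds are copied from the list above rather than from the project itself. RAM is not checked because the standard library has no portable way to query it.

```python
import shutil
import sys


def check_prerequisites(path=".", min_python=(3, 7), min_free_gb=10):
    """Return a dict mapping each prerequisite to True/False.

    Hypothetical helper: the defaults mirror the prerequisites list.
    """
    # Free space on the drive that contains `path`, in GiB.
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "python": sys.version_info[:2] >= min_python,
        "conda": shutil.which("conda") is not None,  # recommended, not required
        "git": shutil.which("git") is not None,
        "disk": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:8s} {'OK' if ok else 'NOT FOUND'}")
```

Run it from the directory where you plan to clone the repository so that the disk check looks at the right drive.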

Step 1: Clone the Repository

git clone https://github.com/your-username/cape-replication-project.git
cd cape-replication-project

Step 2: Set Up the Python Environment

Using Conda (Recommended)

# Create the conda environment
conda env create -f scripts/environment.yml -n chafs_b

# Activate the environment
conda activate chafs_b

Using pip (Alternative)

# Create a virtual environment
python -m venv cape_env

# Activate the environment
# On Windows:
cape_env\Scripts\activate
# On macOS/Linux:
source cape_env/bin/activate

# Install dependencies
pip install -r requirements-docs.txt

Step 3: Verify Installation

Test that everything is working correctly:

# Check Python version
python --version

# Check if key packages are installed
python -c "import pandas, numpy, xgboost, geopandas; print('All packages installed successfully!')"
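
If the one-line import fails, it does not say which package is at fault. The sketch below checks each package separately and reports its version. `report_packages` is a hypothetical helper, not part of the project, and the package list is copied from the command above.

```python
import importlib


def report_packages(names):
    """Try importing each package; map name -> version string, or None if the import fails."""
    result = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            result[name] = None
        else:
            # Not every module defines __version__, so fall back to "unknown".
            result[name] = getattr(module, "__version__", "unknown")
    return result


if __name__ == "__main__":
    packages = ["pandas", "numpy", "xgboost", "geopandas"]
    for name, version in report_packages(packages).items():
        print(f"{name:10s} {version or 'NOT INSTALLED'}")
```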

Step 4: Configure Paths

Edit the configuration file config/config.json with your data paths:

{
  "dir_data_in": "/path/to/your/input/data/",
  "dir_data_out": "/path/to/your/output/data/",
  "dir_viewer": "/path/to/your/viewer/output/",
  "fn_data_processed": "/path/to/processed/data.hdf",
  "fn_cropdata": "/path/to/crop/data.csv",
  "fn_shapefile": "/path/to/shapefile.gpkg",
  "cape_setting_file": "/path/to/cape_settings.csv",
  "fnids_info": "/path/to/fnids_info.hdf",
  "fn_viewer_csv": "/path/to/viewer_data.csv"
}
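
A quick sanity check on the configuration can catch typos before a long run. The sketch below is not part of the project. `validate_config` is a hypothetical helper that assumes the nine keys shown above are all required and that keys starting with `dir_` name directories. Files may be outputs that do not exist yet, so only directories are checked for existence.

```python
import json
import os

# Keys taken from the example configuration; treating all as required is an assumption.
REQUIRED_KEYS = [
    "dir_data_in", "dir_data_out", "dir_viewer", "fn_data_processed",
    "fn_cropdata", "fn_shapefile", "cape_setting_file", "fnids_info",
    "fn_viewer_csv",
]


def validate_config(cfg):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in cfg:
            problems.append(f"missing key: {key}")
            continue
        path = cfg[key]
        if not os.path.isabs(path):
            problems.append(f"{key}: not an absolute path: {path}")
        elif key.startswith("dir_") and not os.path.isdir(path):
            problems.append(f"{key}: directory does not exist: {path}")
    return problems


if __name__ == "__main__":
    with open("config/config.json") as f:
        for problem in validate_config(json.load(f)):
            print(problem)
```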

Step 5: Prepare Data Files

Ensure you have the required data files in your input directory:

  • Crop data: gscd_data_240213.csv or similar
  • Shapefile: gscd_shape.gpkg
  • FNID information: fnids_info.hdf
  • CAPE settings: cape_setting.csv
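
The list above can be checked with a few lines of Python. This sketch uses the file names given above. Because the crop data file is described as `gscd_data_240213.csv` "or similar", it is matched with the glob pattern `gscd_data_*.csv`, which is a guess at the naming scheme. `find_missing_inputs` is a hypothetical helper.

```python
from pathlib import Path

# Patterns based on the file names listed above (the crop data pattern is an assumption).
EXPECTED_INPUTS = {
    "Crop data": "gscd_data_*.csv",
    "Shapefile": "gscd_shape.gpkg",
    "FNID information": "fnids_info.hdf",
    "CAPE settings": "cape_setting.csv",
}


def find_missing_inputs(input_dir):
    """Return the labels of expected inputs with no matching file in input_dir."""
    root = Path(input_dir)
    return [
        label
        for label, pattern in EXPECTED_INPUTS.items()
        if not any(root.glob(pattern))
    ]
```

For example, `find_missing_inputs("/path/to/your/input/data/")` returns an empty list when every expected file is present.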

Step 6: Test the Installation

Run a quick test to ensure everything is working:

# Test preprocessing
cd scripts/processing
python cape_preprocessing.py --start-from 3  # Skip to aggregation step

# Check that the development script launches (--help only prints usage)
cd ../development
python cape_development.py --help

Troubleshooting

Common Issues

1. Conda Environment Creation Fails

# Try updating conda first
conda update conda

# Or create environment manually
conda create -n chafs_b python=3.9
conda activate chafs_b
conda install -c conda-forge pandas numpy xgboost geopandas

2. Package Installation Errors

# Try installing packages individually
pip install pandas
pip install numpy
pip install xgboost
pip install geopandas

3. Path Configuration Issues

  • Ensure all paths in config/config.json are absolute paths
  • Check that directories exist and have proper permissions
  • Use forward slashes (/) even on Windows
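
Python's `pathlib` can produce the forward-slash form of a Windows path, which avoids having to escape backslashes in JSON. A small illustration follows; the path `C:\data\cape\input` is made up.

```python
from pathlib import PureWindowsPath

# Hypothetical Windows path, written as a raw string so backslashes stay literal.
windows_path = PureWindowsPath(r"C:\data\cape\input")

# as_posix() returns the same path written with forward slashes.
print(windows_path.as_posix())      # C:/data/cape/input

# A path with both a drive and a root is absolute.
print(windows_path.is_absolute())   # True
```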

4. Memory Issues

If you encounter memory errors during processing:

# Reduce memory usage by editing the relevant scripts to read and process
# the data in smaller chunks instead of loading it all at once
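
One common way to process data in smaller chunks is the `chunksize` parameter of `pandas.read_csv`, which yields DataFrames of at most that many rows instead of loading the whole file. The sketch below is generic and not taken from the project's scripts. The function name, the column name, and the chunk size are placeholders.

```python
import pandas as pd


def chunked_column_mean(csv_path, column, chunksize=100_000):
    """Compute the mean of one column while holding only one chunk in memory."""
    total = 0.0
    count = 0
    # usecols limits parsing to the column of interest; chunksize makes
    # read_csv return an iterator of DataFrames.
    for chunk in pd.read_csv(csv_path, usecols=[column], chunksize=chunksize):
        values = chunk[column].dropna()
        total += values.sum()
        count += len(values)
    return total / count if count else float("nan")
```

This pattern works directly for aggregations that can be combined across chunks, such as sums, counts, minima, and maxima. Operations that need the whole dataset at once require a different approach.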

Getting Help

If you encounter issues not covered here:

  1. Check the Troubleshooting Guide
  2. Search existing GitHub Issues
  3. Create a new issue with detailed error information

Next Steps

Once installation is complete:

  1. Read the Quick Start Guide
  2. Review the Configuration Guide
  3. Explore the User Guide

System Requirements

Minimum Requirements

  • OS: Windows 10, macOS 10.14+, or Ubuntu 18.04+
  • Python: 3.7+
  • RAM: 4GB
  • Storage: 10GB free space
  • CPU: 2 cores

Recommended Requirements

  • OS: Ubuntu 20.04+ or macOS 11+
  • Python: 3.9+
  • RAM: 16GB+
  • Storage: 50GB+ free space
  • CPU: 8+ cores
  • GPU: NVIDIA GPU with CUDA support (optional, for faster training)

Performance Tips

  1. Use SSD storage for faster data access
  2. Allocate sufficient RAM to avoid swapping
  3. Use multiple CPU cores for parallel processing
  4. Consider GPU acceleration for large model training
  5. Optimize data storage using appropriate formats (HDF5, Parquet)