Installation Guide¶
This guide will help you set up the CAPE Replication Project on your system.
Prerequisites¶
Before installing the CAPE Replication Project, ensure you have the following:
- Python 3.7 or higher
- Conda or Miniconda (recommended for environment management)
- Git (for version control)
- At least 4GB of RAM (for data processing)
- 10GB of free disk space (for data and models)
Step 1: Clone the Repository¶
Step 2: Set Up the Python Environment¶
Using Conda (Recommended)¶
# Create the conda environment
conda env create -f scripts/environment.yml -n chafs_b
# Activate the environment
conda activate chafs_b
Using pip (Alternative)¶
# Create a virtual environment
python -m venv cape_env
# Activate the environment
# On Windows:
cape_env\Scripts\activate
# On macOS/Linux:
source cape_env/bin/activate
# Install dependencies
pip install -r requirements-docs.txt
Step 3: Verify Installation¶
Test that everything is working correctly:
# Check Python version
python --version
# Check if key packages are installed
python -c "import pandas, numpy, xgboost, geopandas; print('All packages installed successfully!')"
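If the one-line check fails, it helps to know which package is missing. The sketch below reports each of the four packages and its installed version using only the standard library. The helper name `check_packages` is hypothetical and not part of the project, and `importlib.metadata` requires Python 3.8 or later.

```python
from importlib import metadata, util

# Packages named in the one-line check above.
REQUIRED = ["pandas", "numpy", "xgboost", "geopandas"]


def check_packages(names):
    """Return {name: version string, or None if the package is not found}."""
    report = {}
    for name in names:
        if util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under this exact name.
            report[name] = "unknown"
    return report


if __name__ == "__main__":
    for name, version in check_packages(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it inside the activated environment so that it inspects the same interpreter the project scripts will use.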
Step 4: Configure Paths¶
Edit the configuration file config/config.json with your data paths:
{
  "dir_data_in": "/path/to/your/input/data/",
  "dir_data_out": "/path/to/your/output/data/",
  "dir_viewer": "/path/to/your/viewer/output/",
  "fn_data_processed": "/path/to/processed/data.hdf",
  "fn_cropdata": "/path/to/crop/data.csv",
  "fn_shapefile": "/path/to/shapefile.gpkg",
  "cape_setting_file": "/path/to/cape_settings.csv",
  "fnids_info": "/path/to/fnids_info.hdf",
  "fn_viewer_csv": "/path/to/viewer_data.csv"
}
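After editing the file, a quick sanity check can catch a missing key or a mistyped directory before a long run. The following is a minimal sketch, not part of the project: the function name `find_config_problems` is hypothetical, the key list is copied from the example above, and checking only the `dir_*` entries for existence is an assumption (some `fn_*` entries may be outputs that are created later).

```python
import json
from pathlib import Path

# Keys taken from the example config above.
EXPECTED_KEYS = [
    "dir_data_in", "dir_data_out", "dir_viewer",
    "fn_data_processed", "fn_cropdata", "fn_shapefile",
    "cape_setting_file", "fnids_info", "fn_viewer_csv",
]


def find_config_problems(config):
    """Return a list of human-readable problems found in a config dict."""
    problems = [f"missing key: {key}" for key in EXPECTED_KEYS
                if key not in config]
    # Assumption: only directory entries must already exist on disk.
    for key, value in config.items():
        if key.startswith("dir_") and not Path(value).is_dir():
            problems.append(f"{key}: directory not found: {value}")
    return problems


if __name__ == "__main__":
    config_path = Path("config/config.json")
    if not config_path.is_file():
        print(f"config file not found: {config_path}")
    else:
        with config_path.open(encoding="utf-8") as fh:
            problems = find_config_problems(json.load(fh))
        print("\n".join(problems) if problems else "config looks OK")
```

Run it from the repository root so that the relative path `config/config.json` resolves.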
Step 5: Prepare Data Files¶
Ensure you have the required data files in your input directory:
- Crop data: gscd_data_240213.csv or similar
- Shapefile: gscd_shape.gpkg
- FNID information: fnids_info.hdf
- CAPE settings: cape_setting.csv
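The file list above can be checked with a short script. This is a sketch under stated assumptions: the helper `find_data_files` is hypothetical, and because the crop data file name may vary ("or similar"), it is matched with the glob pattern `gscd_data_*.csv`, which is a guess at the naming scheme rather than a documented rule.

```python
from pathlib import Path

# File names from the list above. The crop data pattern is an assumption.
EXPECTED_PATTERNS = {
    "Crop data": "gscd_data_*.csv",
    "Shapefile": "gscd_shape.gpkg",
    "FNID information": "fnids_info.hdf",
    "CAPE settings": "cape_setting.csv",
}


def find_data_files(input_dir):
    """Map each label to the sorted list of matching file names in input_dir."""
    base = Path(input_dir)
    return {
        label: sorted(p.name for p in base.glob(pattern))
        for label, pattern in EXPECTED_PATTERNS.items()
    }


if __name__ == "__main__":
    # Replace with the value of "dir_data_in" from config/config.json.
    for label, names in find_data_files("/path/to/your/input/data/").items():
        print(f"{label}: {', '.join(names) if names else 'MISSING'}")
```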
Step 6: Test the Installation¶
Run a quick test to ensure everything is working:
# Test preprocessing
cd scripts/processing
python cape_preprocessing.py --start-from 3 # Skip to aggregation step
# Test development (if you have data)
cd ../development
python cape_development.py --help
Troubleshooting¶
Common Issues¶
1. Conda Environment Creation Fails¶
# Try updating conda first
conda update conda
# Or create environment manually
conda create -n chafs_b python=3.9
conda activate chafs_b
conda install -c conda-forge pandas numpy xgboost geopandas
2. Package Installation Errors¶
# Try installing packages individually
pip install pandas
pip install numpy
pip install xgboost
pip install geopandas
3. Path Configuration Issues¶
- Ensure all paths in config/config.json are absolute paths
- Check that directories exist and have proper permissions
- Use forward slashes (/) even on Windows
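The first and last points can be checked mechanically. The sketch below uses two hypothetical helpers (not part of the project) that convert backslashes to forward slashes and test whether a path string is absolute in either Windows or POSIX form. Note that on macOS and Linux a backslash is a legal file name character, so the conversion is only meant for Windows-style paths.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_forward_slashes(path_str):
    """Return path_str with every backslash replaced by a forward slash."""
    return path_str.replace("\\", "/")


def is_absolute_path(path_str):
    """True if path_str is absolute in either Windows or POSIX form."""
    return (PureWindowsPath(path_str).is_absolute()
            or PurePosixPath(path_str).is_absolute())


if __name__ == "__main__":
    for raw in ["C:\\data\\in\\", "/path/to/your/input/data/", "data/in"]:
        fixed = to_forward_slashes(raw)
        print(f"{fixed!r} absolute={is_absolute_path(fixed)}")
```

A plain string replacement is used instead of a `pathlib` round trip so that trailing slashes, as in the example config, are preserved.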
4. Memory Issues¶
If you encounter memory errors during processing:
# Reduce memory usage by processing smaller chunks
# Edit the relevant scripts to add memory management
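As an illustration of what processing in smaller chunks can look like, the sketch below streams a CSV file in fixed-size batches so that only one batch is held in memory at a time. It is a generic standard-library example, not code from the project scripts, and the names `iter_chunks` and `count_rows` are hypothetical. With pandas, the same pattern is available by passing `chunksize` to `pandas.read_csv`, which then returns an iterator of DataFrames.

```python
import csv
from itertools import islice


def iter_chunks(csv_path, chunk_size=10_000):
    """Yield lists of at most chunk_size rows (as dicts) from a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            yield chunk


def count_rows(csv_path, chunk_size=10_000):
    """Example aggregation: count rows while holding one chunk in memory."""
    return sum(len(chunk) for chunk in iter_chunks(csv_path, chunk_size))
```

Any per-chunk computation (filtering, partial sums, writing intermediate results) can replace the row count, as long as the results are combined after the loop rather than by concatenating all chunks.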
Getting Help¶
If you encounter issues not covered here:
- Check the Troubleshooting Guide
- Search existing GitHub Issues
- Create a new issue with detailed error information
Next Steps¶
Once installation is complete:
- Read the Quick Start Guide
- Review the Configuration Guide
- Explore the User Guide
System Requirements¶
Minimum Requirements¶
- OS: Windows 10, macOS 10.14+, or Ubuntu 18.04+
- Python: 3.7+
- RAM: 4GB
- Storage: 10GB free space
- CPU: 2 cores
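Some of these minimums can be checked from Python itself. The following sketch covers the Python version, free disk space, and CPU count using only the standard library. The function name `check_system` is hypothetical, and RAM is left out because the standard library has no portable call for it (the third-party `psutil.virtual_memory()` is one option).

```python
import os
import shutil
import sys

# Thresholds copied from the minimum requirements above.
MIN_PYTHON = (3, 7)
MIN_FREE_GB = 10
MIN_CORES = 2


def check_system(path="."):
    """Return {check name: bool} for the minimums the stdlib can measure."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "python": sys.version_info[:2] >= MIN_PYTHON,
        "disk": free_gb >= MIN_FREE_GB,
        "cpu": (os.cpu_count() or 1) >= MIN_CORES,
    }


if __name__ == "__main__":
    for name, ok in check_system().items():
        print(f"{name}: {'OK' if ok else 'below minimum'}")
```

Pass the directory that will hold the data as `path`, since free space is reported for the disk containing that directory.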
Recommended Requirements¶
- OS: Ubuntu 20.04+ or macOS 11+
- Python: 3.9+
- RAM: 16GB+
- Storage: 50GB+ free space
- CPU: 8+ cores
- GPU: NVIDIA GPU with CUDA support (optional, for faster training)
Performance Tips¶
- Use SSD storage for faster data access
- Allocate sufficient RAM to avoid swapping
- Use multiple CPU cores for parallel processing
- Consider GPU acceleration for large model training
- Optimize data storage using appropriate formats (HDF5, Parquet)