Installation Guide

Prerequisites

Clone the repository and change into the project directory:

git clone https://github.com/BartonChenTW/LLM-data-processer.git
cd LLM-data-processer

Create and activate a virtual environment:

Windows PowerShell:

python -m venv .venv
.\.venv\Scripts\Activate.ps1

Windows CMD:

python -m venv .venv
.venv\Scripts\activate.bat

Linux/Mac:

python3 -m venv .venv
source .venv/bin/activate
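To confirm that activation worked, you can run a short Python check from the same terminal. This is a minimal sketch using only the standard library; the function name `in_virtualenv` is illustrative and not part of the project. It relies on the fact that inside a `venv`, `sys.prefix` points at the environment while `sys.base_prefix` still points at the base Python install.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv.

    The standard-library venv module makes sys.prefix point at the
    environment directory, while sys.base_prefix keeps pointing at the
    base Python installation the venv was created from.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    # sys.executable should live under .venv when activation succeeded
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtualenv())
```

If this prints `False`, the activation script did not run in the current shell, and packages would be installed into the global Python instead.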

With the environment active, upgrade pip and install the dependencies:

python -m pip install --upgrade pip
pip install -r requirements.txt

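For local model support, install PyTorch. Pick the build that matches your hardware:

CPU only:

pip install torch --index-url https://download.pytorch.org/whl/cpu

GPU (CUDA):

pip install torch torchvision torchaudio

On Windows, the default PyPI build of torch is CPU-only; for a CUDA build, use the install command generated for your CUDA version on pytorch.org.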
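After installing PyTorch, a quick check can tell you which build you ended up with. This is a sketch that assumes nothing about the project itself; `describe_torch` is a hypothetical helper name. It uses `importlib.util.find_spec` so it still runs when torch is missing, and `torch.cuda.is_available()` to see whether a CUDA device is usable.

```python
import importlib.util


def describe_torch() -> str:
    """Report whether PyTorch is installed and whether it can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch  # imported lazily so the check works without torch present

    if torch.cuda.is_available():
        gpu = torch.cuda.get_device_name(0)
        return f"PyTorch {torch.__version__} with CUDA ({gpu})"
    return f"PyTorch {torch.__version__} (CPU only)"


if __name__ == "__main__":
    print(describe_torch())
```

A "CPU only" result after running the GPU command usually means a CPU wheel was installed, or no compatible NVIDIA driver was found.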

PDF support is already included in requirements.txt:

# Already installed with requirements.txt
pdfplumber>=0.10.0
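If you want to confirm that the installed pdfplumber satisfies the `>=0.10.0` pin, the sketch below reads the installed version with `importlib.metadata.version` and compares it numerically. The helpers `version_tuple` and `meets_minimum` are illustrative names; the comparison is deliberately simple (stdlib only), whereas the `packaging` library would give exact PEP 440 ordering, including pre-releases.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text: str) -> tuple[int, ...]:
    """Turn a release string like '0.11.4' into (0, 11, 4).

    Only leading numeric parts are kept, so suffixes such as 'rc1'
    are ignored; this is a rough check, not full PEP 440 ordering.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions after padding both to the same length with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        installed = version("pdfplumber")
    except PackageNotFoundError:
        print("pdfplumber is not installed; rerun pip install -r requirements.txt")
    else:
        status = "OK" if meets_minimum(installed, "0.10.0") else "too old"
        print(f"pdfplumber {installed}: {status}")
```

Padding matters because plain tuple comparison would treat `(0, 10)` as smaller than `(0, 10, 0)`.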

For document splitting and advanced RAG workflows:

pip install langchain langchain-community pypdf
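To illustrate what "document splitting" means here, the sketch below cuts text into fixed-size character chunks with a small overlap, so each chunk keeps some context from its neighbour before being embedded for RAG. This is a plain-Python illustration, not LangChain's implementation; `split_text` and its default sizes are hypothetical. LangChain's `RecursiveCharacterTextSplitter` (with `chunk_size` and `chunk_overlap` parameters) applies the same idea but prefers to break on paragraph and sentence boundaries.

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap.

    Each chunk starts (chunk_size - overlap) characters after the
    previous one, so neighbouring chunks share `overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end, to avoid a tail made only of overlap
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    print(split_text("abcdefghij", chunk_size=4, overlap=1))
```

With `chunk_size=4` and `overlap=1`, the ten-character string yields `['abcd', 'defg', 'ghij']`.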

To install as an editable package:

pip install -e .
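An editable install makes Python import the package straight from your cloned source tree. A quick way to confirm this is to ask where `llm_helper` would be imported from; with `pip install -e .` the path should point inside the `LLM-data-processer` directory. This sketch uses `importlib.util.find_spec`; `module_location` is a hypothetical helper name, and the module name `llm_helper` is taken from the verification snippet in this guide.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_location(name: str) -> Optional[Path]:
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    # Namespace packages have no single origin file, so treat them as "not found"
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)


if __name__ == "__main__":
    location = module_location("llm_helper")
    print(location or "llm_helper is not importable from this environment")
```

If the printed path points into `site-packages` rather than your clone, a regular (non-editable) install is shadowing the editable one.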

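You need at least one of the following API keys, depending on which provider you use. The commands below set a variable for the current terminal session only; to keep keys between sessions, use a .env file instead.

Hugging Face token (HF_TOKEN):

Windows PowerShell:

$env:HF_TOKEN="your_token_here"

Windows CMD:

set HF_TOKEN=your_token_here

Linux/Mac:

export HF_TOKEN="your_token_here"

Gemini API key (GEMINI_API_KEY):

Windows PowerShell:

$env:GEMINI_API_KEY="your_key_here"

Windows CMD:

set GEMINI_API_KEY=your_key_here

Linux/Mac:

export GEMINI_API_KEY="your_key_here"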
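Since you only need keys for the providers you actually use, it can help to list which ones are configured. The sketch below checks the environment variable names used in this guide (`HF_TOKEN`, `GEMINI_API_KEY`, `OPENAI_API_KEY`); the provider labels and the `configured_providers` helper are illustrative, not part of the project's API. Accepting any mapping makes it easy to test without touching the real environment.

```python
import os
from typing import Mapping

# Variable names come from this guide; the labels are just for display.
PROVIDER_KEYS = {
    "HF_TOKEN": "Hugging Face",
    "GEMINI_API_KEY": "Google Gemini",
    "OPENAI_API_KEY": "OpenAI",
}


def configured_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """List providers whose key is present and non-blank in `env`."""
    return [label for var, label in PROVIDER_KEYS.items() if env.get(var, "").strip()]


if __name__ == "__main__":
    providers = configured_providers()
    if providers:
        print("Keys found for:", ", ".join(providers))
    else:
        print("No API keys found; set at least one of:", ", ".join(PROVIDER_KEYS))
```

Because the variables are session-scoped, run this from the same terminal where you set them.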

Alternatively, create a .env file in the project root:

# Copy the example file (on Windows CMD, use: copy .env.example .env)
cp .env.example .env

Edit .env with your keys:

HF_TOKEN=your_huggingface_token_here
GEMINI_API_KEY=your_gemini_api_key_here
OPENAI_API_KEY=your_openai_api_key_here
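A common slip is leaving the placeholder values from `.env.example` in place. The sketch below parses simple `KEY=VALUE` lines and flags values that still start with `your_`. It is a minimal stdlib parser written for illustration (`read_env_file` is a hypothetical name), not the python-dotenv library; python-dotenv's `load_dotenv()` is the usual way Python projects load such a file, but whether this project loads `.env` that way is an assumption not stated here.

```python
from pathlib import Path
from typing import Union


def read_env_file(path: Union[str, Path] = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments.

    Surrounding quotes are stripped. Multi-line values, `export`
    prefixes, and variable expansion are not handled.
    """
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        for key, value in read_env_file(env_path).items():
            # Values copied unchanged from .env.example still start with "your_"
            state = "placeholder" if value.startswith("your_") else "set"
            print(f"{key}: {state}")
    else:
        print(f".env not found in {Path.cwd()}")
```

Run it from the project root so `.env` is found.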

Run this Python snippet to verify everything is installed correctly:

# Test imports
from llm_helper import AIHelper
import pandas as pd
import transformers

print("✅ All packages installed successfully!")
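The snippet above stops at the first failed import. If you prefer a full report, the sketch below tries each module in turn and records the error instead of raising. The module lists are assumptions drawn from this guide (`llm_helper`, pandas, transformers, plus the optional torch, pdfplumber, and langchain); adjust them to what you installed. `check_imports` is a hypothetical helper name.

```python
import importlib

# Module names assumed from this guide; edit to match your setup.
REQUIRED = ["llm_helper", "pandas", "transformers"]
OPTIONAL = ["torch", "pdfplumber", "langchain"]


def check_imports(names):
    """Try importing each module; map name -> None on success, else the error text."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # broad on purpose: e.g. torch DLL errors raise OSError
            results[name] = f"{type(exc).__name__}: {exc}"
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    for group, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        for name, error in check_imports(names).items():
            print(f"[{group}] {name}: {'OK' if error is None else error}")
```

Failures in the optional group only matter if you use the matching feature (local models, PDF input, or LangChain workflows).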

If the transformers import fails, install it together with PyTorch:

pip install transformers torch

If transformers warns that PyTorch was not found and models won't be available, install PyTorch to suppress the warning:

pip install torch --index-url https://download.pytorch.org/whl/cpu

Check if your API keys are set:

Windows PowerShell:

echo $env:HF_TOKEN
echo $env:GEMINI_API_KEY

Windows CMD:

echo %HF_TOKEN%
echo %GEMINI_API_KEY%

Linux/Mac:

echo $HF_TOKEN
echo $GEMINI_API_KEY
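The `echo` commands print the full secret to your terminal (and possibly to shell history or screen recordings). A Python alternative that shows only the first few characters is sketched below; the `mask` helper and the choice of four visible characters are illustrative assumptions, and the variable names are the ones used in this guide.

```python
import os


def mask(secret: str, visible: int = 4) -> str:
    """Show only the first `visible` characters of a secret."""
    if not secret:
        return "<not set>"
    if len(secret) <= visible:
        return "*" * len(secret)
    return secret[:visible] + "*" * (len(secret) - visible)


if __name__ == "__main__":
    for var in ("HF_TOKEN", "GEMINI_API_KEY", "OPENAI_API_KEY"):
        print(f"{var} = {mask(os.environ.get(var, ''))}")
```

For example, `mask("hf_abcdef")` returns `"hf_a*****"`, enough to recognise which token is loaded without exposing it.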

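To use the virtual environment in Jupyter, register it as a kernel (this needs ipykernel; run pip install ipykernel first if it is missing):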

python -m ipykernel install --user --name=llm-processer