Llama Fine-Tuned Spanish

Bilingual Reasoning Model for English-to-Spanish Mathematical Problem Solving

LoRA Fine-Tuning · 4-bit Quantization · 7B Parameters · Project Status: Active

Model Architecture Diagram

Base Model: Llama-2-7B-Chat (NousResearch), 4-bit quantized → Fine-Tuning: LoRA adapters, 200 examples, 10 epochs → Specialized Model: English → Spanish, step-by-step reasoning

Input (English): "A car travels 100 km at 50 km/h..."
Output (Spanish): "Paso 1: Calculamos el tiempo..." ("Step 1: We calculate the time...")

Training snapshot: 512 max sequence length · 2e-4 learning rate · batch size 8 · T4 GPU training

Project Overview

The Llama Fine-Tuned Spanish model is a specialized adaptation of NousResearch/llama-2-7b-chat-hf, designed specifically for solving reasoning-heavy mathematical problems in English and providing detailed, step-by-step solutions in Spanish. This bilingual approach makes it ideal for educational purposes and cross-language mathematical reasoning tasks.

Built with LoRA (Low-Rank Adaptation) fine-tuning and 4-bit quantization, the model can be trained and served in resource-constrained environments such as Kaggle notebooks while retaining strong performance on mathematical reasoning tasks, including speed/distance/time problems, geometry, and logic puzzles.

Key Features & Capabilities

  • Bilingual reasoning: Processes English mathematical problems and generates Spanish solutions
  • LoRA fine-tuning with 4-bit quantization for efficient memory usage and faster inference
  • Specialized in step-by-step mathematical problem solving with detailed explanations
  • Optimized for speed/distance/time calculations, geometry problems, and logic puzzles
  • Educational focus with structured, pedagogical approach to problem explanation
  • Resource-efficient deployment suitable for edge computing and limited GPU environments
  • Integration with Hugging Face ecosystem for easy deployment and inference
  • Systematic validation approach ensuring accuracy in multi-step calculations

Technical Implementation

The model employs LoRA (Low-Rank Adaptation) fine-tuning on top of the Llama-2-7B base model, utilizing BitsAndBytesConfig for 4-bit quantization with NF4 quantization type and float16 compute dtype. This approach significantly reduces memory requirements while maintaining model performance for the specialized reasoning tasks.

Training was conducted on Kaggle's GPU infrastructure (a single T4) using a curated dataset of 200 training examples and 20 validation examples. The run used the paged AdamW optimizer with gradient accumulation and a maximum sequence length of 512 tokens.

Results & Performance Metrics

The model demonstrates strong performance on standard mathematical reasoning problems, correctly solving speed/distance/time tasks such as calculating the average speed of a multi-segment journey. Ten epochs of training with an effective batch size of 8 converged with minimal overfitting on the validation set.
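To make the multi-segment case concrete (the numbers below are illustrative, not taken from the evaluation set), the correct approach divides total distance by total time rather than averaging the segment speeds:

segment 1: 100 km at 50 km/h  → 2.0 h
segment 2: 100 km at 100 km/h → 1.0 h
average speed = (100 + 100) km / (2.0 + 1.0) h ≈ 66.7 km/h (not 75 km/h)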

Performance evaluation shows accurate step-by-step solutions for familiar problem types, though the model may struggle with complex multi-step problems or scenarios that differ significantly from the training patterns. The 4-bit quantization preserves solution quality while enabling deployment on resource-constrained hardware, and the full fine-tuning run completes in roughly 1-2 hours on a single T4 GPU.

Training Pipeline

LoRA fine-tuning pipeline with 4-bit quantization for efficient bilingual reasoning model development

🔧 Data Preparation

Curated dataset of 200 English mathematical problems with step-by-step Spanish solutions

Speed/Distance/Time Problems
Geometry Calculations
Logic Puzzles
from datasets import load_dataset

dataset = load_dataset("math_problems_es")
train_data = dataset["train"]       # 200 examples
val_data = dataset["validation"]    # 20 examples
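Each record then needs to be flattened into a single training string. A minimal formatting sketch is shown below; the field names "problem" and "solution" and the prompt template are assumptions, since the project does not publish the dataset schema:

def format_example(example):
  # Hypothetical schema: an English problem paired with a Spanish step-by-step solution
  return (
    "### Problem (English):\n" + example["problem"] + "\n"
    "### Solution (Spanish):\n" + example["solution"]
  )

train_data = train_data.map(lambda ex: {"text": format_example(ex)})
val_data = val_data.map(lambda ex: {"text": format_example(ex)})

The resulting "text" column can then be handed to the trainer (e.g. via SFTTrainer's dataset_text_field argument).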

🏗️ Model Configuration

Load base Llama-2-7B model with 4-bit quantization and LoRA adapters

BitsAndBytesConfig
NF4 Quantization
Float16 Compute
import torch
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
  load_in_4bit=True,
  bnb_4bit_quant_type="nf4",
  bnb_4bit_compute_dtype=torch.float16
)
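With the quantization config in place, the base checkpoint and its tokenizer are loaded. A minimal sketch, assuming the standard transformers loading path and the checkpoint id given in the overview:

from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "NousResearch/llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(
  base_id,
  quantization_config=quantization_config,
  device_map="auto"                        # place the 4-bit weights on the available GPU
)
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama-2 defines no pad token by default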

🎯 LoRA Setup

Configure parameter-efficient fine-tuning (PEFT) with LoRA adapters for targeted layer updates

Rank: 16
Alpha: 32
Target: All Linear
from peft import LoraConfig

peft_config = LoraConfig(
  r=16,
  lora_alpha=32,
  target_modules="all-linear",   # string form: adapt every linear layer
  task_type="CAUSAL_LM"
)
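As an optional sanity check (SFTTrainer performs this wrapping itself when given peft_config), the adapter footprint can be inspected by wrapping the model manually:

from peft import get_peft_model

peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()   # prints trainable vs. total parameter counts;
                                          # only the LoRA adapter weights are updated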

🚀 Training Process

Fine-tune with the paged AdamW optimizer for 10 epochs, using gradient accumulation

Learning Rate: 2e-4
Batch Size: 8
Max Length: 512
Warmup Steps: 100
from trl import SFTTrainer

trainer = SFTTrainer(
  model=model,
  tokenizer=tokenizer,
  train_dataset=train_data,
  eval_dataset=val_data,
  peft_config=peft_config,
  max_seq_length=512
)
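The optimizer and schedule described above are supplied through a TrainingArguments object passed to the trainer's args parameter at construction; they are sketched here alongside the training call. The split between per-device batch size and accumulation steps is an assumption, since only the effective batch size of 8 is stated:

from transformers import TrainingArguments

training_args = TrainingArguments(
  output_dir="llama-spanish-math",
  num_train_epochs=10,
  learning_rate=2e-4,
  per_device_train_batch_size=2,    # assumed 2 x 4 accumulation steps
  gradient_accumulation_steps=4,    # = effective batch size of 8
  warmup_steps=100,
  optim="paged_adamw_32bit",        # paged AdamW
  fp16=True
)

trainer.train()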

📊 Validation & Evaluation

Assess model performance on mathematical reasoning tasks with bilingual output validation

Step-by-Step Accuracy
Language Consistency
Mathematical Correctness
import math

eval_results = trainer.evaluate()
perplexity = math.exp(eval_results["eval_loss"])   # perplexity = exp(cross-entropy loss)
accuracy = validate_math_solutions(predictions)    # predictions: generated Spanish solutions
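validate_math_solutions is not defined in the source; one possible shape for such a checker (the signature and logic here are assumptions) is to compare the final number in each generated solution against the reference answer:

import re

def validate_math_solutions(predictions, references):
  # Hypothetical checker: treat the last number in a step-by-step solution
  # as the final answer and compare it against the reference solution.
  correct = 0
  for pred, ref in zip(predictions, references):
    pred_nums = re.findall(r"-?\d+(?:\.\d+)?", pred)
    ref_nums = re.findall(r"-?\d+(?:\.\d+)?", ref)
    if pred_nums and ref_nums and pred_nums[-1] == ref_nums[-1]:
      correct += 1
  return correct / max(len(predictions), 1)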

🎁 Model Deployment

Push fine-tuned model to Hugging Face Hub for easy access and inference

Hugging Face Hub
Inference API
Resource Efficient
model.push_to_hub("llama-spanish-math")
tokenizer.push_to_hub("llama-spanish-math")
# Ready for inference!
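Once on the Hub, the adapter can be loaded for inference. A minimal sketch, assuming the repository name from the push above (the full user namespace is omitted) and the prompt template sketched in the data-preparation step:

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained("llama-spanish-math", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("llama-spanish-math")

prompt = ("### Problem (English):\n"
          "A car travels 100 km at 50 km/h. How long does the trip take?\n"
          "### Solution (Spanish):\n")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))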

Key Results & Impact

🎯

Bilingual Reasoning

Successfully processes English mathematical problems and generates detailed step-by-step solutions in Spanish

Efficient Training

LoRA fine-tuning with 4-bit quantization enables training on resource-constrained hardware in 1-2 hours

🛡️

Educational Focus

Designed specifically for educational purposes with structured, pedagogical approach to mathematical explanation

🌍

Open Source

Available on Hugging Face Hub with comprehensive documentation and easy integration for researchers and educators

Explore More

Access the model, documentation, and implementation details for bilingual mathematical reasoning