Data Setup
Learn how to properly organize your data files for The Forge's automated processing pipeline. Follow these guidelines to ensure your data is processed correctly.
File Organization
The Forge labels your dataset automatically from its folder structure, so your data must be organized in a specific way before upload. Every dataset has two standard columns: filename and label.
How It Works
- Filename Column - Contains the names of individual files as they appear in your zip file
- Label Column - Automatically populated based on folder structure
- Folder Names - Become the labels for all files stored in that folder
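The mapping above can be sketched in a few lines of Python. This is only an illustration of the folder-to-label rule, not The Forge's actual implementation: the function name `label_rows` is hypothetical, and it assumes the filename column holds the bare file name and that category folders sit at the top level of the zip.

```python
import io
import zipfile
from pathlib import PurePosixPath


def label_rows(zip_source):
    """Return (filename, label) pairs, using each file's parent folder as its label."""
    rows = []
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # folder entries carry no data of their own
            path = PurePosixPath(info.filename)
            if len(path.parts) < 2:
                continue  # a file at the top level has no folder to label it
            # Assumption: the filename column stores the bare name, not the full path.
            rows.append((path.name, path.parent.name))
    return rows


# Build a tiny in-memory archive to demonstrate the mapping.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Fire/fire1.png", b"")
    archive.writestr("NoFire/forest1.png", b"")

buffer.seek(0)
print(label_rows(buffer))
```

Each file picks up the name of the folder that contains it, which is why a file placed in the wrong folder ends up with the wrong label.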
Zip File Structure
Image Classification Example
Imagine you're building a fire detection model that identifies whether an image contains a fire. You would organize your images like this:
ForestFireImages.zip/
├── Fire/
│ ├── fire1.png
│ ├── fire2.jpg
│ ├── fire3.png
│ └── fire4.jpg
└── NoFire/
  ├── forest1.png
  ├── forest2.jpg
  ├── trees1.png
  └── trees2.jpg
Text Classification Example
For a sentiment analysis model, organize your text files like this:
SentimentData.zip/
├── Positive/
│ ├── review1.txt
│ ├── review2.txt
│ └── review3.txt
└── Negative/
  ├── complaint1.txt
  ├── complaint2.txt
  └── complaint3.txt
Supported File Types
Image Files
- JPG/JPEG - Standard image format
- PNG - High-quality images with transparency
- GIF - Animated or static images
Text Files
- TXT - Plain text files
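Before zipping, it can help to check each file's extension against the lists above. The sketch below is a local pre-check only; the helper name `file_kind` is hypothetical, and treating extensions as case-insensitive is an assumption, since the source does not say how The Forge matches them.

```python
from pathlib import PurePosixPath

# Extensions taken from the lists above.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}
TEXT_EXTENSIONS = {".txt"}


def file_kind(filename):
    """Return "image", "text", or None for an unsupported extension."""
    # Assumption: matching ignores case, so "photo.PNG" counts as PNG.
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in TEXT_EXTENSIONS:
        return "text"
    return None


for name in ["Fire/fire1.png", "Positive/review1.txt", "notes.docx"]:
    print(name, "->", file_kind(name))
```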
Best Practices
File Organization
- Consistent Naming - Use clear, descriptive file names
- Proper Folders - Create separate folders for each category
- Clean Structure - Keep files directly inside their category folder and avoid nested subfolders
- File Formats - Use standard, supported file formats
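A quick way to confirm the layout follows these practices is to check how deep each path in the zip sits. The following is a hedged sketch: `find_structure_problems` is a hypothetical helper, and it assumes category folders are at the top level of the archive, as in the examples above, so every file should be exactly one folder deep.

```python
from pathlib import PurePosixPath


def find_structure_problems(member_names):
    """Return (path, reason) pairs for entries that break the one-folder-per-category layout."""
    problems = []
    for name in member_names:
        if name.endswith("/"):
            continue  # skip directory entries
        depth = len(PurePosixPath(name).parts)
        if depth < 2:
            problems.append((name, "not inside a category folder"))
        elif depth > 2:
            problems.append((name, "inside a nested subfolder"))
    return problems


names = ["Fire/fire1.png", "Fire/night/fire2.jpg", "stray.png", "NoFire/"]
for path, reason in find_structure_problems(names):
    print(f"{path}: {reason}")
```

The list of names can come from `zipfile.ZipFile(...).namelist()` when checking a real archive.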
Data Quality
- High Quality - Use clear, high-resolution images
- Consistent Size - Keep similar file sizes when possible
- Clean Data - Remove corrupted or duplicate files
- Balanced Classes - Aim for a roughly equal number of files in each category
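Two of these checks are easy to automate locally: counting files per category and finding exact duplicates. The sketch below uses hypothetical helper names (`class_counts`, `find_duplicates`) and compares SHA-256 hashes, so it only catches byte-for-byte copies, not resized or re-encoded versions of the same image.

```python
import hashlib
from collections import Counter, defaultdict


def class_counts(rows):
    """Count files per label from (filename, label) pairs."""
    return Counter(label for _, label in rows)


def find_duplicates(files):
    """Group file names whose contents are byte-for-byte identical.

    `files` maps a file name to its raw bytes.
    """
    by_digest = defaultdict(list)
    for name, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(name)
    return [names for names in by_digest.values() if len(names) > 1]


rows = [("fire1.png", "Fire"), ("fire2.jpg", "Fire"), ("forest1.png", "NoFire")]
print(class_counts(rows))

files = {"fire1.png": b"same", "fire1_copy.png": b"same", "forest1.png": b"other"}
print(find_duplicates(files))
```

A large gap between the per-category counts is a signal to collect more examples for the smaller category before uploading.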
Security
- Data Privacy - Ensure sensitive data is properly handled
- Access Control - Limit access to authorized users only
- Backup - Keep copies of your original data
- Compliance - Follow data protection regulations
Getting Started
Ready to process your data? Follow these steps:
- Organize Your Data - Create folders for each category
- Name Your Files - Use clear, descriptive names
- Create Zip File - Compress your organized folders
- Upload to The Forge - Let automated processing begin
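The zip step can be scripted with Python's standard library. This is a minimal sketch, assuming your category folders live directly under one local directory; `zip_dataset` is a hypothetical helper, and the upload itself is not shown because the source does not describe an upload API.

```python
import zipfile
from pathlib import Path


def zip_dataset(source_dir, zip_path):
    """Write every file under source_dir into zip_path, keeping category folders at the top level."""
    source = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file_path in sorted(source.rglob("*")):
            if file_path.is_file():
                # Store paths relative to source_dir, e.g. "Fire/fire1.png".
                archive.write(file_path, file_path.relative_to(source).as_posix())
    return zip_path
```

For example, `zip_dataset("ForestFireImages", "ForestFireImages.zip")` would place `Fire/` and `NoFire/` at the root of the archive. Write the zip outside the source directory so the archive does not try to include itself.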
Pro Tip
Because labels come directly from your folder names, a file in the wrong folder becomes a mislabeled example. Take time to structure your folders and files before uploading.