Zip folder layout
Organize each zip archive so that Lake's unstructured ingest can label files automatically from folder names. The processed dataset includes a filename column and a label column.
Before you start
- Ingest via Lake → Add data → Unstructured (guide)
- Top-level folder names become class labels
How labeling works
- Filename: the name of each file inside the zip
- Label: the name of the top-level folder that contains the file
- Folder names: each top-level folder name becomes the class label for every file in that folder
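The labeling rule above can be sketched in a few lines of Python. This is an illustration of the rule only, not Lake's ingest code: the function names `label_rows` and `demo_archive` are hypothetical, and it assumes the filename column holds the file's base name and that a file sitting at the zip root has no label.

```python
import io
import zipfile
from pathlib import PurePosixPath


def label_rows(zip_source):
    """Return (filename, label) pairs for every file in the archive.

    Assumption: the label is the first path component and the filename
    is the base name. Files at the zip root get the label None.
    """
    rows = []
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries are not files
            parts = PurePosixPath(info.filename).parts
            label = parts[0] if len(parts) > 1 else None
            rows.append((parts[-1], label))
    return rows


def demo_archive():
    """Build a tiny in-memory zip with two class folders."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("Fire/fire1.png", b"placeholder")
        zf.writestr("NoFire/forest1.png", b"placeholder")
    buf.seek(0)
    return buf


print(label_rows(demo_archive()))
# [('fire1.png', 'Fire'), ('forest1.png', 'NoFire')]
```

Running a check like this locally before uploading shows which label each file would receive under the top-level-folder rule.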
Image classification example
Fire detection with two classes:
ForestFireImages.zip/
├── Fire/
│   ├── fire1.png
│   ├── fire2.jpg
│   └── fire3.png
└── NoFire/
    ├── forest1.png
    ├── forest2.jpg
    └── trees1.png
Text classification example
Sentiment analysis:
SentimentData.zip/
├── Positive/
│   ├── review1.txt
│   └── review2.txt
└── Negative/
    ├── complaint1.txt
    └── complaint2.txt
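One way to produce an archive with this shape is to zip the contents of a local folder rather than the folder itself, so the class folders land at the zip root. The sketch below uses Python's standard `zipfile` module. `build_labeled_zip` is a hypothetical helper name, and the temporary directory only stands in for a real dataset folder.

```python
import tempfile
import zipfile
from pathlib import Path


def build_labeled_zip(source_dir, zip_path):
    """Zip source_dir so its class folders sit at the archive root."""
    source_dir = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # arcname is relative to source_dir, so the entry is stored
                # as "Positive/review1.txt" with no wrapping parent folder.
                zf.write(path, arcname=path.relative_to(source_dir).as_posix())
    return zip_path


# Demo: create a throwaway dataset folder, zip it, and list the entries.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "SentimentData"
    for rel in ("Positive/review1.txt", "Negative/complaint1.txt"):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("sample text")
    archive = build_labeled_zip(root, Path(tmp) / "SentimentData.zip")
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()

print(names)
```

Write the zip outside `source_dir`, as the demo does. Otherwise the half-written archive can be picked up as one of its own members.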
Supported file types
Images
- JPG / JPEG
- PNG
- GIF
Text
- TXT (plain text)
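A quick pre-upload check against this list might look like the following sketch. The extension set mirrors the types listed above. Matching is case-insensitive, which is an assumption about how ingest treats names such as `IMG.PNG`, and `unsupported_files` is a hypothetical helper name.

```python
from pathlib import PurePosixPath

# Extensions from the supported types listed above. Case-insensitive
# matching is an assumption, not documented ingest behavior.
SUPPORTED = {".jpg", ".jpeg", ".png", ".gif", ".txt"}


def unsupported_files(names):
    """Return the archive member names whose extension is not in SUPPORTED."""
    return [
        name for name in names
        if not name.endswith("/")  # ignore directory entries
        and PurePosixPath(name).suffix.lower() not in SUPPORTED
    ]


names = ["Fire/fire1.png", "Fire/fire2.JPG", "NoFire/notes.pdf", "NoFire/"]
print(unsupported_files(names))
# ['NoFire/notes.pdf']
```

The list of names can come from `zipfile.ZipFile(path).namelist()` for an existing archive.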
Best practices
Organization
- Use one folder per class at the zip root
- Avoid deep nesting; keep the structure flat
- Use clear, consistent file names
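The first two points can be checked mechanically. The sketch below flags any file that is not exactly one folder below the zip root. `layout_problems` is a hypothetical helper, and the check encodes the recommendations above rather than any documented ingest validation.

```python
from pathlib import PurePosixPath


def layout_problems(names):
    """Flag members that are not exactly one folder below the zip root."""
    problems = []
    for name in names:
        if name.endswith("/"):
            continue  # directory entries carry no data
        depth = len(PurePosixPath(name).parts) - 1  # folders above the file
        if depth == 0:
            problems.append((name, "file at zip root, no class folder"))
        elif depth > 1:
            problems.append((name, f"nested {depth} folders deep"))
    return problems


names = [
    "Fire/fire1.png",
    "readme.txt",
    "ForestFireImages/Fire/fire2.jpg",
]
for name, reason in layout_problems(names):
    print(f"{name}: {reason}")
```

The third entry shows a common slip: zipping the parent folder itself wraps every class folder in one extra level, so under the top-level-folder rule all files would share a single label.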
Data quality
- Remove corrupted or duplicate files
- Balance classes when possible (similar counts per folder)
- Use readable images and clean text
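Two of these checks are easy to automate before upload: counting files per class folder and finding byte-identical duplicates. The sketch below is one possible approach using only the standard library. `quality_report` is a hypothetical helper, and hashing catches exact copies only, not resized or re-encoded images.

```python
import hashlib
import io
import zipfile
from collections import Counter, defaultdict
from pathlib import PurePosixPath


def quality_report(zip_source):
    """Return (files per top-level folder, groups of byte-identical files)."""
    counts = Counter()
    by_hash = defaultdict(list)
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # First path component is the class folder (for a file at the
            # zip root it is the file name itself).
            counts[PurePosixPath(info.filename).parts[0]] += 1
            digest = hashlib.sha256(zf.read(info)).hexdigest()
            by_hash[digest].append(info.filename)
    duplicates = [group for group in by_hash.values() if len(group) > 1]
    return dict(counts), duplicates


# Demo archive: two identical positive reviews and one negative review.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("Positive/review1.txt", b"great")
    zf.writestr("Positive/review2.txt", b"great")
    zf.writestr("Negative/complaint1.txt", b"bad")
buf.seek(0)

counts, duplicates = quality_report(buf)
print(counts)
print(duplicates)
```

For corrupted members, `ZipFile.testzip()` returns the name of the first file whose CRC check fails, or `None` if all members read cleanly. It does not verify that an image actually decodes.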
Next steps
- Create your zip following the layout above
- Run unstructured ingest
- Curate silver or publish gold for ML
Pro tip
Well-organized zips produce cleaner bronze tables and faster paths to gold for ML.