Zip folder layout

Organize zip archives so Lake unstructured ingest can label files automatically. Processed datasets include filename and label columns.

Before you start
  • Ingest via Lake > Add data > Unstructured (guide)
  • Top-level folder names become class labels

How labeling works

  • Filename — name of each file inside the zip
  • Label — taken from the top-level folder that contains the file
  • Folder names — become class labels for all files in that folder
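
The labeling rule above can be sketched in Python with the standard zipfile module. This is only an illustration of the rule, not the ingest implementation: the helper name label_rows is hypothetical, and it assumes the filename column holds the base name of each file (the ingest may record the full path inside the zip instead) and that files sitting at the zip root get no label.

```python
import io
import zipfile
from pathlib import PurePosixPath


def label_rows(zip_file):
    """Return (filename, label) pairs for files under a top-level folder.

    Hypothetical helper: it mirrors the documented rule, not the ingest code.
    """
    rows = []
    with zipfile.ZipFile(zip_file) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            parts = PurePosixPath(info.filename).parts
            if len(parts) < 2:
                # Assumption: a file at the zip root has no folder to label it.
                continue
            # Label = top-level folder; filename = base name (assumption).
            rows.append((parts[-1], parts[0]))
    return rows


# Build a tiny in-memory archive to demonstrate the rule.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Fire/fire1.png", b"")
    archive.writestr("NoFire/forest1.png", b"")
    archive.writestr("readme.txt", b"")

rows = label_rows(buffer)
print(rows)
```

Running the helper on your own zip before upload gives a quick preview of the labels each file is likely to receive.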

Image classification example

Fire detection with two classes:

ForestFireImages.zip/
├── Fire/
│   ├── fire1.png
│   ├── fire2.jpg
│   └── fire3.png
└── NoFire/
    ├── forest1.png
    ├── forest2.jpg
    └── trees1.png

Text classification example

Sentiment analysis:

SentimentData.zip/
├── Positive/
│   ├── review1.txt
│   └── review2.txt
└── Negative/
    ├── complaint1.txt
    └── complaint2.txt

Supported file types

Images

  • JPG / JPEG
  • PNG
  • GIF

Text

  • TXT (plain text)
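
A pre-upload check against the list above might look like the following sketch. The extension set is taken from this page; the function name unsupported_members is hypothetical, and treating extensions as case-insensitive is an assumption, since the page does not say how the ingest handles names such as photo.PNG.

```python
from pathlib import PurePosixPath

# Extensions listed on this page. Case-insensitive matching is an assumption.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".txt"}


def unsupported_members(member_names):
    """Return zip member names whose extension is not in the supported set."""
    flagged = []
    for name in member_names:
        if name.endswith("/"):
            continue  # directory entry, not a file
        if PurePosixPath(name).suffix.lower() not in SUPPORTED_EXTENSIONS:
            flagged.append(name)
    return flagged


flagged = unsupported_members(
    ["Fire/fire1.PNG", "Fire/notes.pdf", "NoFire/", "NoFire/forest1.jpeg"]
)
print(flagged)
```

The member names can come from zipfile.ZipFile(path).namelist(), so the check runs without extracting the archive.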

Best practices

Organization

  • Use one folder per class at the zip root
  • Avoid deep nesting — keep structure flat
  • Use clear, consistent file names

Data quality

  • Remove corrupted or duplicate files
  • Balance classes when possible (similar counts per folder)
  • Use readable images and clean text
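
Two of the data quality checks above, class balance and duplicate detection, can be approximated before zipping. The sketch below is illustrative only: class_counts and find_duplicates are hypothetical helpers, duplicates are detected by exact byte content (SHA-256), and near-duplicate images would not be caught.

```python
import hashlib
from collections import Counter, defaultdict


def class_counts(paths):
    """Count files per top-level folder, given paths like 'Fire/fire1.png'."""
    return Counter(path.split("/", 1)[0] for path in paths if "/" in path)


def find_duplicates(files):
    """Group paths whose contents are byte-for-byte identical.

    `files` maps a path to its raw bytes.
    """
    by_hash = defaultdict(list)
    for path, data in files.items():
        by_hash[hashlib.sha256(data).hexdigest()].append(path)
    return [sorted(group) for group in by_hash.values() if len(group) > 1]


# Toy data standing in for real file contents.
files = {
    "Fire/fire1.png": b"aaa",
    "Fire/fire2.jpg": b"bbb",
    "Fire/fire3.png": b"aaa",  # same bytes as fire1.png
    "NoFire/forest1.png": b"ccc",
}

counts = class_counts(files)
duplicates = find_duplicates(files)
print(dict(counts))
print(duplicates)
```

A large gap between the per-folder counts suggests rebalancing before ingest, and each duplicate group is a candidate for trimming to a single file.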

Next steps

  1. Create your zip following the layout above
  2. Run unstructured ingest
  3. Curate silver or publish gold for ML
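
Step 1 can be scripted. The sketch below assumes a local directory that already contains one folder per class, and it writes each file with a path relative to that directory so the class folders land at the zip root. The function name build_zip and the sample file contents are hypothetical; only the Python standard library is used, and the output path must sit outside the source directory.

```python
import tempfile
import zipfile
from pathlib import Path


def build_zip(source_dir, zip_path):
    """Zip source_dir so that its class folders sit at the zip root."""
    source_dir = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # The relative path keeps 'Positive/review1.txt' style names.
                archive.write(path, path.relative_to(source_dir).as_posix())
    return zip_path


# Demonstrate with a throwaway dataset in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "dataset"
    (root / "Positive").mkdir(parents=True)
    (root / "Negative").mkdir()
    (root / "Positive" / "review1.txt").write_text("Great product")
    (root / "Negative" / "complaint1.txt").write_text("Arrived broken")

    zip_path = build_zip(root, Path(tmp) / "SentimentData.zip")
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()

print(names)
```

Zipping the contents of the dataset directory, rather than the directory itself, avoids an extra wrapper folder that would otherwise become the only top-level folder in the archive.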

Pro tip

Well-organized zips produce cleaner bronze tables and faster paths to gold for ML.