The Forge

The Forge is an automated data processing pipeline that lets users build powerful feature vectors or datasets. These feature vectors can be stored in a traditional database and are also available in Heimdall's Data Science suite.

Every dataset comes with two standard columns: filename and label. The filename column holds the name of each individual file as it appears in the zip file. To populate the label column, sort your files into folders. Each folder name becomes the label for all the files stored in that folder.

The example below shows how to structure your zip file. Imagine that you are building a classification model to identify whether or not there is a fire in an image. To achieve this, you would need to train the model on both images with fire and images without fire. When creating your zip file, put all your images with fire into one folder and name it as you wish, in this case "Fire". Then put all your images without fire into another folder and name it as you wish, in this case "NoFire".

Label Creation
ForestFireImages.zip/
├── Fire/
│ ├── fire1.png
│ ├── fire2.png
│ ├── .......
├── NoFire/
│ ├── nofire1.png
│ ├── nofire2.png
│ ├── .......
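
The folder-to-label rule described above can be previewed locally before uploading. The sketch below is a minimal illustration, not The Forge's actual implementation: `preview_rows` is a hypothetical helper, and it assumes that the label is the immediate parent folder and that the filename is the bare file name without its folder.

```python
import io
import zipfile
from pathlib import PurePosixPath


def preview_rows(zip_file):
    """Return (filename, label) pairs implied by the folder layout of a zip."""
    rows = []
    with zipfile.ZipFile(zip_file) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # folder entries hold no data of their own
            path = PurePosixPath(info.filename)
            if len(path.parts) < 2:
                continue  # a file outside any folder has no label
            # Assumption: the immediate parent folder is the label.
            rows.append((path.name, path.parent.name))
    return rows


# Demo on an in-memory zip that mirrors the ForestFireImages layout.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("Fire/fire1.png", b"placeholder")
    archive.writestr("NoFire/nofire1.png", b"placeholder")
demo.seek(0)

for filename, label in preview_rows(demo):
    print(filename, label)
```

Files that sit at the top level of the zip are skipped here because they have no folder to take a label from. How The Forge treats such files is not stated in this document.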

File Specifications

To build an object in The Forge, upload a zip file containing a single type of unstructured data. Currently, The Forge supports text and images, and the zip file must contain either all images or all text files.
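
If your files already sit in one folder per label on disk, the zip file can be produced with Python's standard `zipfile` module. This is a sketch under assumptions: `build_labeled_zip` is a hypothetical helper, the folder and file names are placeholders, and the label folders are placed at the root of the zip as in the Label Creation example.

```python
import tempfile
import zipfile
from pathlib import Path


def build_labeled_zip(source_dir, zip_path):
    """Zip every file under source_dir, keeping its label folder in the path.

    Write zip_path outside source_dir so the archive does not include itself.
    Returns the number of files added.
    """
    source = Path(source_dir)
    count = 0
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file_path in sorted(source.rglob("*")):
            if file_path.is_file():
                # Store "Fire/fire1.png", not the absolute path on disk.
                archive.write(file_path, file_path.relative_to(source).as_posix())
                count += 1
    return count


# Demo on throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "ForestFireImages"
    for label, name in [("Fire", "fire1.png"), ("NoFire", "nofire1.png")]:
        (root / label).mkdir(parents=True)
        (root / label / name).write_bytes(b"placeholder")
    out = Path(tmp) / "ForestFireImages.zip"
    build_labeled_zip(root, out)
    with zipfile.ZipFile(out) as archive:
        print(archive.namelist())
```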

Accepted File Formats
1. Images: .jpeg, .jpg, .png
2. Text: .txt

File Upload Speeds

Upload speed depends on your internet connection. On a high-speed connection, uploads should be quick, sometimes taking less than a second depending on the size of the file.

Mixing File Types

All your files must belong to one file category: either all images or all text files.
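
A quick local check can catch a mixed or unsupported zip before you upload it. The sketch below is illustrative only: `check_zip_category` is a hypothetical helper, the extension sets are copied from the Accepted File Formats list, and matching extensions case-insensitively is an assumption rather than documented behavior of The Forge.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Extensions taken from the Accepted File Formats list.
IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".png"}
TEXT_EXTENSIONS = {".txt"}


def check_zip_category(zip_file):
    """Return "images" or "text" if every file fits one category, else raise ValueError."""
    categories = set()
    with zipfile.ZipFile(zip_file) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # folders only carry labels
            # Assumption: ".PNG" and ".png" are treated alike.
            suffix = PurePosixPath(info.filename).suffix.lower()
            if suffix in IMAGE_EXTENSIONS:
                categories.add("images")
            elif suffix in TEXT_EXTENSIONS:
                categories.add("text")
            else:
                raise ValueError(f"unsupported file: {info.filename}")
    if len(categories) != 1:
        raise ValueError(f"expected one file category, found: {sorted(categories)}")
    return categories.pop()


# Demo on an in-memory zip that wrongly mixes an image with a text file.
mixed = io.BytesIO()
with zipfile.ZipFile(mixed, "w") as archive:
    archive.writestr("Fire/fire1.png", b"placeholder")
    archive.writestr("Fire/notes.txt", b"placeholder")
mixed.seek(0)

try:
    check_zip_category(mixed)
except ValueError as error:
    print("Not ready to upload:", error)
```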

Data Availability

All datasets built in The Forge are available instantly in all the Data Science products. The Forge works best for classification problems where you want to build image or text classification models.

When you import datasets from The Forge, the Data Science process is streamlined because you no longer have to worry about the files or their specifications. We take care of all the heavy lifting, so you can sit back and relax as we build you a cutting-edge classifier.

Once you import your data from The Forge, you can select Label as the target variable and hit Optimize. If you have any questions, refer to our Data Science documentation.

Issues?

If you find issues with the documentation or have suggestions on how to improve the documentation or the project in general, please contact us or send us a tweet @HeimdallML on Twitter!