
Text Analysis

Convert your text data into structured, tabular datasets ready for modeling. Heimdall Read analyzes a piece of text and returns metrics such as length and word counts, part-of-speech counts, top TF-IDF terms, and sentiment.

Get Your API Key

You'll need an API key from the Unstructured API Key tab to use text analysis features.

What You Can Analyze

  • Customer reviews - Understand sentiment and key topics
  • Product descriptions - Extract features and categories
  • Social media posts - Analyze engagement and sentiment
  • Support tickets - Categorize and prioritize issues
  • Survey responses - Extract insights from open-ended questions

API Specifications

Endpoint

POST https://read.heimdallapp.org/read/v1/api/process

Request Headers

  • x-api-key - API key that is issued when the endpoint is configured
  • x-username - Username associated with your account

Request Body

  • text - The text input that you want to analyze

Warning: The text input should not include line breaks or double quotes.
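
Given that restriction, it can help to normalize text before sending it. The following is a minimal sketch: `sanitize_text` is a hypothetical helper, not part of the Heimdall API, and swapping double quotes for single quotes is only one assumed policy (dropping them would work as well).

```python
import re

def sanitize_text(text: str) -> str:
    """Hypothetical helper: remove the characters the warning mentions."""
    # Collapse each run of line breaks (plus surrounding whitespace) into one space.
    text = re.sub(r"\s*[\r\n]+\s*", " ", text)
    # Assumed policy: replace double quotes with single quotes.
    text = text.replace('"', "'")
    return text.strip()

print(sanitize_text('She said "great"\nand left.'))
```

The cleaned string can then be passed as the `text` field of the request body.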

Response Metrics

Heimdall Read provides comprehensive text analysis including:

Basic Metrics

  • length - The number of characters in the text
  • word_count - The number of words in the text
  • sentence_count - The number of sentences in the text

Language Analysis

  • avg_word_length - The average number of characters in words
  • avg_sentence_length - The average number of words in sentences
  • oov_ratio - Proportion of words not found in standard vocabulary
  • oov_ratio_2 - Alternative vocabulary coverage metric

Part-of-Speech Analysis

  • noun_count - Number of nouns in the text
  • verb_count - Number of verbs in the text
  • adjective_count - Number of adjectives in the text
  • adverb_count - Number of adverbs in the text
  • pronoun_count - Number of pronouns in the text
  • stopword_count - Number of common words (the, and, etc.)

Content Analysis

  • tfidf_top1 - Most important term (TF-IDF analysis)
  • tfidf_top2 - Second most important term
  • tfidf_top3 - Third most important term

Sentiment Analysis

  • sentiment - Overall sentiment label: positive, negative, or neutral (the example response returns it in lowercase)
  • compound_sentiment_score - Sentiment intensity (-1 to +1)

Example Response

{
  "length": 1755,
  "word_count": 367,
  "oov_ratio": 0.18256130790190736,
  "oov_ratio_2": 0.33787465940054495,
  "sentence_count": 18,
  "avg_word_length": 3.904632152588556,
  "avg_sentence_length": 20.38888888888889,
  "noun_count": 54,
  "verb_count": 76,
  "adjective_count": 17,
  "adverb_count": 40,
  "pronoun_count": 27,
  "stopword_count": 169,
  "tfidf_top1": "writing",
  "tfidf_top2": "good",
  "tfidf_top3": "just",
  "sentiment": "positive",
  "compound_sentiment_score": 0.7737
}
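
As a quick sanity check, the derived metrics in this example can be related back to the basic counts. The sketch below uses only values copied from the example response. The `avg_sentence_length` relation follows from its definition; treating each OOV ratio as a word count divided by `word_count` is an assumption that the example figures happen to be consistent with, not documented behavior.

```python
# Values copied from the example response above.
example = {
    "word_count": 367,
    "sentence_count": 18,
    "avg_sentence_length": 20.38888888888889,
    "oov_ratio": 0.18256130790190736,
    "oov_ratio_2": 0.33787465940054495,
}

# avg_sentence_length is documented as the average number of words per sentence.
words_per_sentence = example["word_count"] / example["sentence_count"]

# Assumption: each OOV ratio is (number of words) / word_count, so
# multiplying back should give a near-integer number of words.
implied_oov_words = example["oov_ratio"] * example["word_count"]
implied_oov_words_2 = example["oov_ratio_2"] * example["word_count"]

print(words_per_sentence)
print(round(implied_oov_words, 6), round(implied_oov_words_2, 6))
```

Under that assumption, the two ratios correspond to roughly 67 and 124 of the 367 words.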

Sample Request

import requests

url = 'https://read.heimdallapp.org/read/v1/api/process'
headers = {
    'X-api-key': 'YOUR-API-KEY',
    'X-username': 'YOUR-USERNAME'
}

data = {
    "text": "Heimdall is an amazing machine learning platform that makes data science simple and accessible for everyone."
}

response = requests.post(url, headers=headers, json=data)

if response.status_code == 200:
    result = response.json()
    print(f"Sentiment: {result['sentiment']}")
    print(f"Word count: {result['word_count']}")
    print(f"Key terms: {result['tfidf_top1']}, {result['tfidf_top2']}")
else:
    print(f"Error: {response.status_code}")

Use Cases

Customer Review Analysis

# Analyze customer reviews for sentiment and key topics.
# Reuses `requests`, `url`, and `headers` from the sample request.
reviews = [
    "Great product, fast shipping, highly recommend!",
    "Poor quality, arrived damaged, very disappointed.",
    "Average product, nothing special but works fine."
]

for review in reviews:
    response = requests.post(url, headers=headers, json={"text": review})
    response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error body
    data = response.json()
    print(f"Review: {review[:50]}...")
    print(f"Sentiment: {data['sentiment']} (Score: {data['compound_sentiment_score']})")
    print(f"Key terms: {data['tfidf_top1']}, {data['tfidf_top2']}")
    print("---")
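
To turn per-text responses into a tabular dataset, the responses can be collected and written out as rows. The following is a minimal sketch using only the standard library. `responses_to_csv` is a hypothetical helper, and the response fragments are made-up placeholders for illustration, not real API output.

```python
import csv
import io

def responses_to_csv(texts, responses):
    """Write one CSV row per text, with the response fields as columns."""
    # Column order: the input text first, then response keys in first-seen order.
    fieldnames = ["text"]
    for resp in responses:
        for key in resp:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    # restval fills in an empty cell when a response lacks a field.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    for text, resp in zip(texts, responses):
        writer.writerow({"text": text, **resp})
    return buffer.getvalue()

# Placeholder fragments for illustration only (not real API output).
texts = ["Great product, fast shipping.", "Poor quality, arrived damaged."]
responses = [
    {"word_count": 4, "sentiment": "positive", "compound_sentiment_score": 0.6},
    {"word_count": 4, "sentiment": "negative", "compound_sentiment_score": -0.7},
]
print(responses_to_csv(texts, responses))
```

The resulting CSV text can be saved to a file or loaded into whichever modeling tool you use.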

Content Categorization

# Use text metrics to categorize content
def categorize_content(text_metrics):
    if text_metrics['sentiment'] == 'positive' and text_metrics['compound_sentiment_score'] > 0.5:
        return "Highly Positive"
    elif text_metrics['sentiment'] == 'negative' and text_metrics['compound_sentiment_score'] < -0.5:
        return "Highly Negative"
    else:
        return "Neutral"

Error Handling

422 Unprocessable Entity

You will receive a 422 error if the request headers or body are missing or incorrectly structured.

Common issues:

  • Missing required headers
  • Invalid text format (line breaks, double quotes)
  • Malformed JSON request
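
Several of these issues can be caught locally before a request is sent. The following is a minimal sketch of such a check. `preflight_issues` is a hypothetical helper, and its rules are taken only from the header list and the text warning on this page, so passing it does not guarantee the API will accept the request.

```python
def preflight_issues(headers: dict, body: dict) -> list:
    """Return a list of likely problems; an empty list means none were found."""
    issues = []
    # HTTP header names are case-insensitive, so compare in lowercase.
    present = {name.lower() for name in headers}
    for required in ("x-api-key", "x-username"):
        if required not in present:
            issues.append(f"missing header: {required}")

    text = body.get("text")
    if not isinstance(text, str):
        issues.append("body must contain a 'text' string")
    else:
        if "\n" in text or "\r" in text:
            issues.append("text contains line breaks")
        if '"' in text:
            issues.append("text contains double quotes")
    return issues

print(preflight_issues({"x-api-key": "YOUR-API-KEY"}, {"text": 'He said "hi"'}))
```

Sending the body with `json=` in `requests.post`, as in the sample request, leaves the JSON encoding to the library, which avoids hand-built malformed JSON.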

Next Steps

Now that you can analyze text:

  1. Try Image Analysis - Process images with Heimdall Vision
  2. Build ML Models - Use text features in machine learning