Context

This program scores the sentiment of individual words using user reviews from the Yelp academic dataset. It does not rely on VADER or any other pre-existing sentiment analyzer; the scores are derived from the review ratings themselves.

Tools & Frameworks Used

  • Python - A general-purpose programming language widely used for web development and data analysis.
  • JSON - A text-based data format for easy information exchange between systems.
  • NLTK - A Python library for text analysis and processing.
  • CSV - A text file format for storing data with values separated by commas.

Yelp Reviews Sentiment Analysis

The program loads a JSON file of reviews, initially working with a subset before processing the full set. It extracts each review's text and rating, then splits the text into words using NLTK. The words are lemmatized, and stop words and words outside the corpus are filtered out. For each remaining lemma, the program calculates the average rating of the reviews it appears in and discards lemmas that occur in fewer than 10 reviews. The top 500 negative and top 500 positive lemmas, ranked by this average, are saved to a CSV file using a CSV writer.
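
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It assumes the input has one JSON object per line with `text` and `stars` fields, and that a lemma is counted once per review. The function names, the CSV column names, and `simple_lemmatize` are hypothetical; `simple_lemmatize` is only a stand-in for the NLTK tokenization, lemmatization, and filtering steps, which are not reproduced here.

```python
import csv
import io
import json
from collections import defaultdict


def load_reviews(lines):
    """Yield (text, stars) pairs from JSON-lines input (assumed field names)."""
    for line in lines:
        line = line.strip()
        if line:
            review = json.loads(line)
            yield review["text"], review["stars"]


def simple_lemmatize(text):
    """Hypothetical stand-in for NLTK tokenizing, lemmatizing and filtering."""
    return text.lower().split()


def average_rating_per_lemma(reviews, lemmatize, min_reviews=10):
    """Map each lemma to the mean rating of the reviews that contain it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, stars in reviews:
        # A set makes each lemma count once per review (an assumption).
        for lemma in set(lemmatize(text)):
            totals[lemma] += stars
            counts[lemma] += 1
    # Discard lemmas that appear in fewer than min_reviews reviews.
    return {
        lemma: totals[lemma] / counts[lemma]
        for lemma in counts
        if counts[lemma] >= min_reviews
    }


def write_extremes(averages, out_file, top_n=500):
    """Write the top_n lowest- and highest-rated lemmas as CSV rows."""
    ranked = sorted(averages.items(), key=lambda item: item[1])
    negative = ranked[:top_n]          # lowest average rating first
    positive = ranked[::-1][:top_n]    # highest average rating first
    writer = csv.writer(out_file)
    writer.writerow(["lemma", "average_rating", "polarity"])
    for lemma, score in negative:
        writer.writerow([lemma, f"{score:.2f}", "negative"])
    for lemma, score in positive:
        writer.writerow([lemma, f"{score:.2f}", "positive"])
```

To run it on a file, pass an open file handle to `load_reviews`, then pass the resulting averages and a file opened with `newline=""` to `write_extremes`. With fewer than `2 * top_n` surviving lemmas, the negative and positive lists in this sketch will overlap.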

[Image: Yelp Reviews Sentiment Analysis]