Context

This program analyzes a pickled pandas DataFrame of tweets posted by selected Suffolk University students between 2020-01-01 and 2020-07-31. For each month, it identifies the top 20 hashtags. A plain word that matches a hashtag without its '#' is counted together with that hashtag, and case differences are ignored.
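
The counting rule above can be illustrated with a minimal sketch. It is not the project's actual code: the function name count_hashtag_terms and its arguments are hypothetical, and it assumes tweets arrive as plain strings and that the set of hashtags is already known.

```python
import re
from collections import Counter


def count_hashtag_terms(texts, hashtags):
    """Count each hashtag together with plain words that match it, ignoring case.

    Hypothetical helper for illustration only.
    """
    # Normalize the known hashtags: drop a leading '#' and lowercase.
    wanted = {h.lstrip("#").lower() for h in hashtags}
    counts = Counter()
    for text in texts:
        # The pattern yields 'covid' for both '#covid' and the bare word 'covid'.
        for token in re.findall(r"#?(\w+)", text):
            key = token.lower()
            if key in wanted:
                counts[key] += 1
    return counts


tweets = ["#COVID is here", "covid again #Covid", "nothing to see"]
counts = count_hashtag_terms(tweets, ["#covid"])
```

Here "#COVID", "covid", and "#Covid" all land under the single key "covid".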

Tools & Frameworks Used

  • Python - A general-purpose programming language widely used for web development and data analysis.
  • Pandas - A library for data manipulation and analysis, making it simpler to work with structured data.
  • Pickle - A Python module used for serializing and deserializing Python objects, allowing you to save and load complex data structures easily.
  • CSV - A text file format for storing data with values separated by commas.
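
As a rough sketch of how these tools fit together, the snippet below loads a pickled DataFrame with pandas and writes a result next to the input file as CSV. The helper names load_tweets and save_as_csv are hypothetical, and the output naming (same file name with a .csv extension) is an assumption, not something the project confirms.

```python
from pathlib import Path

import pandas as pd


def load_tweets(pickle_path):
    """Load a pickled DataFrame. Only unpickle files from a trusted source."""
    return pd.read_pickle(pickle_path)


def save_as_csv(df, pickle_path):
    """Write df beside the input file, swapping the extension for .csv (assumed naming)."""
    csv_path = Path(pickle_path).with_suffix(".csv")
    df.to_csv(csv_path, index=False)
    return csv_path
```

Pickle files can execute arbitrary code when loaded, so this approach is only appropriate for files you created yourself or otherwise trust.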

Tweet Analysis

This project processes tweet data with the pandas library to count hashtag usage. The gethashtags function retrieves hashtags from both a dedicated column and the tweet text, while getdatabym filters tweets for a given month. The main script reads a user-specified pickle file, counts hashtags for each of months 1 through 7 (January through July), sorts them by frequency, and saves the top 20 hashtags per month to a CSV file named after the input file.
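
The month-by-month flow described above might look roughly like the following sketch. It does not reproduce gethashtags or getdatabym, whose signatures are not shown here. The helper names and the column names "created_at", "hashtags", and "text" are all assumptions, as is the choice to count a hashtag at most once per tweet when it appears in both the column and the text.

```python
import re
from collections import Counter

import pandas as pd


def tweets_for_month(df, month, date_col="created_at"):
    """Return rows whose timestamp falls in the given month (1-12)."""
    return df[pd.to_datetime(df[date_col]).dt.month == month]


def hashtags_in_row(row, tag_col="hashtags", text_col="text"):
    """Collect lowercase hashtags from the dedicated column and the tweet text."""
    tags = set()
    value = row[tag_col]
    # Assumption: the dedicated column holds a list of hashtag strings.
    if isinstance(value, (list, tuple, set)):
        tags.update(str(t).lstrip("#").lower() for t in value)
    tags.update(t.lower() for t in re.findall(r"#(\w+)", str(row[text_col])))
    return tags


def top_hashtags_by_month(df, months=range(1, 8), n=20):
    """Build a table of the n most frequent hashtags for each month."""
    rows = []
    for month in months:
        counts = Counter()
        for _, row in tweets_for_month(df, month).iterrows():
            counts.update(hashtags_in_row(row))
        # most_common sorts by descending frequency.
        for tag, count in counts.most_common(n):
            rows.append({"month": month, "hashtag": tag, "count": count})
    return pd.DataFrame(rows, columns=["month", "hashtag", "count"])
```

Using a set per tweet avoids double counting a hashtag that is listed in the column and also written in the text. If the original script counts every occurrence instead, a list would be the closer match.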