
Reddit Classification Model
Identifying novice investors through Reddit posts for targeted fintech strategies.
Problem Statement
Fintech companies often struggle to distinguish experienced investors from novice users, which leads to inefficient marketing. This project aims to build a classification model that analyzes Reddit posts from r/investing and r/personalfinance, using the subreddit as a proxy label: posts from r/investing stand in for professional investors and posts from r/personalfinance for amateurs. Once novice investors are identified, companies can tailor educational content, products, and services to convert them into active users.
Summary
This project builds a machine learning classification model that categorizes Reddit posts by the subreddit they come from: r/investing (treated as professionals) or r/personalfinance (treated as amateurs). The goal is to let fintech companies target novice investors efficiently with personalized content and products. Several models, including Logistic Regression and Naive Bayes, were trained on text vectorized with CountVectorizer and TF-IDF. Logistic Regression with TF-IDF and Naive Bayes with CountVectorizer performed best, with F1-scores of 0.88 and accuracy of up to 85.1%. These results suggest that text-based classification can support audience segmentation and strategic marketing in real-world applications.
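To make the approach concrete, the following is a minimal, dependency-free sketch of one of the combinations named above: Naive Bayes over raw word counts, scored with accuracy and F1. It is not the project's code. It is a hand-rolled stand-in for what a count vectorizer plus a Naive Bayes classifier does, and it rests on several assumptions: the multinomial variant with add-one smoothing, a simple regex tokenizer, r/personalfinance (the novice class) as the positive class for F1, and a handful of invented example posts. The names `CountNaiveBayes`, `tokenize`, `accuracy`, and `f1` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep alphabetic runs; a crude stand-in for a real tokenizer.
    return re.findall(r"[a-z]+", text.lower())


class CountNaiveBayes:
    """Multinomial Naive Bayes over raw token counts (assumed add-one smoothing)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, texts):
        return [self._predict_one(text) for text in texts]

    def _predict_one(self, text):
        # Tokens never seen in training are ignored.
        tokens = [tok for tok in tokenize(text) if tok in self.vocab]
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / n_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # smoothed log likelihood
            scores[label] = score
        return max(scores, key=scores.get)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred, positive):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented example posts, NOT taken from the project's dataset.
train_texts = [
    "Thoughts on hedging equity exposure with index options",
    "Bond yields and portfolio allocation after the rate decision",
    "How do I start a budget and pay off my credit card",
    "Should I open a savings account or pay off student loans",
]
train_labels = ["investing", "investing", "personalfinance", "personalfinance"]

test_texts = [
    "Rebalancing my portfolio between equity and bond funds",
    "Need help making a budget to pay off my loans",
]
test_labels = ["investing", "personalfinance"]

model = CountNaiveBayes().fit(train_texts, train_labels)
predictions = model.predict(test_texts)
print(predictions)
print("accuracy:", accuracy(test_labels, predictions))
print("F1 (novice class):", f1(test_labels, predictions, positive="personalfinance"))
```

With a real corpus, the same steps would apply to thousands of posts per subreddit with a held-out test split. The scores from this toy data say nothing about the figures reported above.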