Movie Recommendation System
Built a movie recommendation system that uses demographic, content-based, and collaborative filtering to suggest movies based on user preferences.
Overview
A movie recommendation system that combines three filtering techniques to provide personalized movie suggestions. It draws on popularity statistics (ratings and vote counts), movie metadata, and user rating behavior to generate recommendations.
Recommendation Approaches
1. Demographic Filtering
- Recommends movies based on popularity and ratings
- Uses TMDB ratings and vote counts
- Works for new users who have no rating history yet (the cold start problem)
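One common way to turn TMDB ratings and vote counts into a popularity chart is IMDB's weighted rating. The writeup does not name its scoring rule, so this choice is an assumption. The rule shrinks each movie's average toward the global mean when the movie has few votes: WR = (v / (v + m)) * R + (m / (v + m)) * C. Here R is the movie's `vote_average`, v is its `vote_count`, C is the mean rating across all movies, and m is a minimum-vote cutoff. The 90th-percentile default for m and the function name `top_movies` are hypothetical.

```python
import pandas as pd

def top_movies(movies: pd.DataFrame, quantile: float = 0.9, n: int = 10) -> pd.DataFrame:
    """Rank movies by an IMDB-style weighted rating (assumed scoring rule)."""
    C = movies['vote_average'].mean()            # mean rating across all movies
    m = movies['vote_count'].quantile(quantile)  # minimum votes needed to qualify
    qualified = movies[movies['vote_count'] >= m].copy()
    v = qualified['vote_count']
    R = qualified['vote_average']
    # Shrink each movie's average toward C; movies with fewer votes are pulled harder
    qualified['score'] = (v / (v + m)) * R + (m / (v + m)) * C
    return qualified.sort_values('score', ascending=False).head(n)

# Toy data standing in for tmdb_5000_movies.csv
movies = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D'],
    'vote_average': [9.0, 7.5, 8.0, 6.0],
    'vote_count': [10, 5000, 3000, 8000],
})
print(top_movies(movies, quantile=0.25, n=3)[['title', 'score']])
```

Movie A has the highest raw average but only 10 votes, which is below the cutoff, so it is excluded from the chart.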
2. Content-Based Filtering
- Analyzes movie metadata (genres, cast, director, keywords)
- Calculates similarity between movies
- Recommends movies similar to ones the user has liked
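The similarity lookup described above can be sketched with scikit-learn. The sketch assumes each movie already has a text "soup" of metadata tokens. The function name `recommend_similar` and the toy data are hypothetical.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def recommend_similar(df: pd.DataFrame, title: str, n: int = 5) -> list:
    """Return up to n titles most similar to `title` by cosine similarity of metadata tokens."""
    count_matrix = CountVectorizer(stop_words='english').fit_transform(df['soup'])
    sim = cosine_similarity(count_matrix, count_matrix)
    idx = df['title'].tolist().index(title)  # row position of the query movie
    # Sort every movie by similarity to the query, highest first
    ranked = sorted(enumerate(sim[idx]), key=lambda pair: pair[1], reverse=True)
    # Drop the query movie itself and keep the top n
    return [df['title'].iloc[i] for i, _ in ranked if i != idx][:n]

movies = pd.DataFrame({
    'title': ['Alien', 'Aliens', 'Notting Hill'],
    'soup': ['space horror scott weaver scifi',
             'space marines cameron weaver scifi',
             'romance london roberts grant comedy'],
})
print(recommend_similar(movies, 'Alien', n=2))
```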
3. Collaborative Filtering
- User-User collaborative filtering
- Item-Item collaborative filtering
- Matrix factorization using SVD
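Item-item collaborative filtering can be illustrated on a small dense ratings matrix with NumPy. Item similarity comes from the rating columns. A missing rating is then predicted as a similarity-weighted average of the user's ratings on the k most similar items the user has rated.

This is a sketch under several assumptions:
- It uses cosine similarity.
- 0 means "not rated".
- The function name and data are hypothetical.

A real system would use sparse matrices.

```python
import numpy as np

def predict_item_item(R: np.ndarray, user: int, item: int, k: int = 2) -> float:
    """Predict R[user, item] from the user's ratings of the k most similar items.

    R is a users x items matrix where 0 means 'not rated' (toy convention).
    """
    # Cosine similarity between item columns
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for items nobody rated
    sim = (R.T @ R) / np.outer(norms, norms)
    # Candidate neighbors: other items this user has rated
    rated = [j for j in np.flatnonzero(R[user]) if j != item]
    # Keep the k neighbors most similar to the target item
    neighbors = sorted(rated, key=lambda j: sim[item, j], reverse=True)[:k]
    weights = np.array([sim[item, j] for j in neighbors])
    if weights.sum() == 0:
        return 0.0  # no usable neighbors
    ratings = np.array([R[user, j] for j in neighbors])
    return float(weights @ ratings / weights.sum())

R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)
print(round(predict_item_item(R, user=0, item=2), 2))  # 1.73
```

User 0 rated item 3 poorly, and item 3 is by far the most similar item to item 2. The prediction for item 2 therefore comes out low.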
Tech Stack
- Language: Python 3.9+
- Data Processing: Pandas, NumPy
- Machine Learning: scikit-learn, Surprise (SVD)
- Similarity Metrics: Cosine Similarity, Pearson Correlation
- Dataset: TMDB 5000 Movies Dataset
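The two similarity metrics listed are closely related. Pearson correlation is the cosine similarity of two vectors after each is centered on its own mean, so it ignores differences in how generously users rate overall. A quick NumPy check, using made-up rating vectors:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation, computed as the cosine similarity of mean-centered vectors."""
    return cosine(a - a.mean(), b - b.mean())

# Two users with the same relative taste; one rates 2 stars lower across the board
generous = np.array([5.0, 4.0, 5.0, 3.0])
harsh = np.array([3.0, 2.0, 3.0, 1.0])

print(cosine(generous, harsh))             # about 0.987
print(pearson(generous, harsh))            # about 1.0
print(np.corrcoef(generous, harsh)[0, 1])  # NumPy's Pearson, for comparison
```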
Implementation Details
Data Preprocessing
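The `create_soup` function in this section expects `keywords`, `cast`, and `genres` to already be lists of strings, and `director` to be a single string. In the raw TMDB 5000 CSVs these fields are JSON-encoded strings: lists of objects with a `name` field, plus a `crew` list whose entries carry a `job` such as `"Director"`.

The parsing sketch below makes these assumptions:
- The credits file has already been merged into the movies frame.
- The helper names are hypothetical.
- The top-3 cast cutoff is hypothetical.
- Lowercasing and removing spaces from names is one common choice, which keeps "Johnny Depp" and "Johnny Galecki" from sharing the token "johnny".

```python
import json
from typing import List, Optional

import pandas as pd

def names(json_str: str, limit: Optional[int] = None) -> List[str]:
    """Pull the 'name' field out of a TMDB JSON list, keeping at most `limit` entries."""
    return [item['name'] for item in json.loads(json_str)][:limit]

def get_director(crew_json: str) -> str:
    """Return the first crew member whose job is 'Director', or '' if there is none."""
    for member in json.loads(crew_json):
        if member.get('job') == 'Director':
            return member['name']
    return ''

def squash(name: str) -> str:
    """Lowercase and remove spaces so a multi-word name becomes a single token."""
    return name.lower().replace(' ', '')

def parse_metadata(movie_df: pd.DataFrame) -> pd.DataFrame:
    """Convert raw TMDB JSON columns into the lists/strings create_soup expects."""
    movie_df = movie_df.copy()
    for col in ['genres', 'keywords']:
        movie_df[col] = movie_df[col].apply(lambda s: [squash(n) for n in names(s)])
    # Keep only the top-billed cast (3 is a hypothetical cutoff)
    movie_df['cast'] = movie_df['cast'].apply(lambda s: [squash(n) for n in names(s, limit=3)])
    movie_df['director'] = movie_df['crew'].apply(lambda s: squash(get_director(s)))
    return movie_df

raw = pd.DataFrame({
    'genres': ['[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]'],
    'keywords': ['[{"id": 1, "name": "space war"}]'],
    'cast': ['[{"name": "Sam Worthington"}, {"name": "Zoe Saldana"}, '
             '{"name": "Sigourney Weaver"}, {"name": "Stephen Lang"}]'],
    'crew': ['[{"job": "Producer", "name": "Jon Landau"}, {"job": "Director", "name": "James Cameron"}]'],
})
print(parse_metadata(raw).loc[0, ['genres', 'cast', 'director']].tolist())
```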
```python
# Feature extraction from movie metadata
def extract_features(movie_df):
    # Combine genres, keywords, cast, director
    movie_df['soup'] = movie_df.apply(create_soup, axis=1)
    return movie_df

def create_soup(x):
    return ' '.join(x['keywords']) + ' ' + ' '.join(x['cast']) + \
        ' ' + x['director'] + ' ' + ' '.join(x['genres'])
```
Content-Based Similarity
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Create count matrix from the 'soup' column (df is the output of extract_features)
count = CountVectorizer(stop_words='english')
count_matrix = count.fit_transform(df['soup'])

# Compute cosine similarity
cosine_sim = cosine_similarity(count_matrix, count_matrix)
```
Collaborative Filtering
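```python
from surprise import SVD, Dataset, Reader

# Load data (`ratings` is a DataFrame with userId, movieId, and rating columns)
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)

# Build a training set from all ratings and train the SVD model
trainset = data.build_full_trainset()
svd = SVD()
svd.fit(trainset)

# Make predictions: predict() takes raw user and movie ids; .est is the estimated rating
prediction = svd.predict(user_id, movie_id)
```
Evaluation Metrics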
- RMSE (Root Mean Square Error): 0.87
- MAE (Mean Absolute Error): 0.68
- Precision@10: 0.75
- Recall@10: 0.68
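For reference, these metrics are usually computed as follows:
- RMSE and MAE compare predicted ratings against held-out actual ratings.
- Precision@k and Recall@k compare a user's top-k recommendations against the set of items that user found relevant, then average over users.

Relevance is often defined as a rating above some threshold; that definition is an assumption here. The sketch below is a generic implementation on toy inputs, not the project's evaluation code.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length rating sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error between two equal-length rating sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def precision_recall_at_k(recommended, relevant, k=10):
    """Precision@k and Recall@k for a single user.

    Precision divides by the number of items actually recommended (at most k).
    """
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

print(rmse([4, 3, 5], [3.5, 3, 4]))
print(mae([4, 3, 5], [3.5, 3, 4]))
print(precision_recall_at_k(['a', 'b', 'c', 'd'], {'a', 'c', 'x'}, k=4))
```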
Key Features
- Hybrid recommendation approach
- Cold start problem handling
- Explainable recommendations
- Scalable architecture
- Real-time predictions
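The writeup does not say how the approaches are combined. One simple option is a weighted hybrid: min-max normalize each component's scores for the same candidate movies so that neither dominates because of scale, then blend them. The weight `alpha` and the function name below are hypothetical.

```python
import numpy as np

def hybrid_scores(cf: np.ndarray, content: np.ndarray, alpha: float = 0.7) -> np.ndarray:
    """Blend collaborative and content-based scores for the same candidate movies.

    Each vector is min-max normalized to [0, 1]; alpha weights the collaborative part.
    """
    def normalize(x: np.ndarray) -> np.ndarray:
        span = x.max() - x.min()
        return np.zeros_like(x, dtype=float) if span == 0 else (x - x.min()) / span

    return alpha * normalize(cf) + (1 - alpha) * normalize(content)

cf = np.array([4.5, 3.0, 4.0])       # e.g. predicted ratings from SVD
content = np.array([0.1, 0.9, 0.5])  # e.g. cosine similarity to movies the user liked
scores = hybrid_scores(cf, content)
print(np.argsort(-scores))  # candidate indices, best first
```

Normalizing first matters here. Raw SVD predictions sit on the rating scale while cosine similarities sit in [0, 1], so without normalization the collaborative scores would swamp the content scores regardless of `alpha`.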
Challenges & Solutions
Challenge: Cold Start Problem
Solution: Implemented demographic filtering for new users and content-based filtering for new movies.
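The routing described in this solution can be expressed as a small dispatch rule for a (user, movie) pair. This is a sketch under assumptions: the `min_ratings` threshold and the strategy labels are hypothetical stand-ins for the demographic, content-based, and collaborative components.

```python
def choose_strategy(n_user_ratings: int, n_movie_ratings: int, min_ratings: int = 5) -> str:
    """Pick a recommender for a (user, movie) pair based on how much rating data exists.

    min_ratings is a hypothetical threshold; in practice it would be tuned on validation data.
    """
    if n_user_ratings < min_ratings:
        return 'demographic'    # new user: fall back to the popularity / weighted-rating chart
    if n_movie_ratings < min_ratings:
        return 'content'        # new movie: score it by metadata similarity instead of ratings
    return 'collaborative'      # enough history on both sides for the SVD model

print(choose_strategy(0, 250))   # demographic
print(choose_strategy(40, 0))    # content
print(choose_strategy(40, 250))  # collaborative
```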
Challenge: Scalability
Solution: Used efficient data structures and matrix operations. Implemented caching for frequently accessed similarity matrices.
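The project's caching details are not given, so here is one assumed way to combine caching with cheaper matrix operations. A full n x n similarity matrix costs n² entries. Instead, compute one movie's similarity row on demand, keep only its top-k neighbors, and memoize the result with `functools.lru_cache` so popular movies are served from the cache. `np.argpartition` selects the k largest values without sorting the whole row. The feature matrix and function name are hypothetical.

```python
from functools import lru_cache

import numpy as np

# Hypothetical item feature matrix (e.g. rows of a count or TF-IDF matrix), one row per movie
rng = np.random.default_rng(0)
features = rng.random((500, 32))
# L2-normalize rows once so that a dot product equals cosine similarity
normed = features / np.linalg.norm(features, axis=1, keepdims=True)

@lru_cache(maxsize=1024)
def neighbors_of(i: int, k: int = 10) -> tuple:
    """Indices of the k movies most similar to movie i, best first (cached per (i, k))."""
    row = normed @ normed[i]                 # cosine similarity of movie i to every movie
    row[i] = -np.inf                         # exclude the movie itself
    top = np.argpartition(row, -k)[-k:]      # k largest values, in arbitrary order
    return tuple(int(j) for j in top[np.argsort(-row[top])])  # sort only those k

print(neighbors_of(42))
print(neighbors_of.cache_info())
```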
Challenge: Sparse Data
Solution: Applied matrix factorization techniques (SVD) to handle sparse user-item matrices.
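Surprise's `SVD` is not a literal singular value decomposition of a filled-in matrix. It is the Funk-style matrix factorization popularized during the Netflix Prize: it learns user and item latent factors, plus bias terms, by stochastic gradient descent over the observed ratings only, so missing entries never need to be imputed. The stripped-down NumPy version below drops the bias terms, and its hyperparameters are illustrative rather than values from the project.

```python
import numpy as np

def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=1000, seed=0):
    """Learn user factors P and item factors Q so that P[u] @ Q[i] approximates each rating.

    `ratings` holds (user, item, rating) triples; unobserved pairs never enter the loss.
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(0, 0.1, (n_users, k))
    Q = rng.normal(0, 0.1, (n_items, k))
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]
            pu = P[u].copy()  # keep the old user vector for the item update
            # SGD step on (r - p_u.q_i)^2 + reg*(|p_u|^2 + |q_i|^2), constants folded into lr
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q

def train_rmse(ratings, P, Q):
    """RMSE of the model on the ratings it was fit to."""
    return float(np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in ratings])))

# Toy data: 4 users x 4 movies, 12 of 16 ratings observed
ratings = [(0, 0, 5), (0, 1, 4), (0, 3, 1), (1, 0, 4), (1, 1, 5), (1, 2, 1),
           (2, 0, 1), (2, 2, 5), (2, 3, 4), (3, 1, 1), (3, 2, 4), (3, 3, 5)]
P, Q = factorize(ratings, n_users=4, n_items=4)
print(train_rmse(ratings, P, Q))
print(P[0] @ Q[2])  # predicted rating for the unobserved (user 0, movie 2) pair
```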
Results
The hybrid system outperformed individual approaches:
- 15% improvement over pure collaborative filtering
- 20% improvement over pure content-based filtering
- Successfully handled cold start scenarios
What I Learned
This project taught me:
- Different recommendation system approaches
- Working with large datasets efficiently
- Feature engineering for machine learning
- Evaluation metrics for recommender systems
- Handling real-world data challenges
Future Enhancements
- Deep learning-based recommendations (Neural Collaborative Filtering)
- Incorporate user temporal patterns
- Add diversity in recommendations
- Implement A/B testing framework
- Build a web interface using Flask/FastAPI