---
language:
- en
tags:
- tabular-classification
- sklearn
- real-estate
- airbnb
- random-forest
license: mit
---

Loom Video - https://www.loom.com/share/d952bab219c4444589ddaef174c0e34d

# 🏠 Buenos Aires Airbnb Price Tier Classifier

![Python](https://img.shields.io/badge/Python-3.8%2B-blue)
![Library](https://img.shields.io/badge/Library-Scikit--Learn-orange)
![Task](https://img.shields.io/badge/Task-Classification-green)

## 📌 Project Overview

This project focuses on predicting the price tier (**Budget, Standard, or Luxury**) of Airbnb listings in Buenos Aires. Instead of a simple price prediction, we engineered advanced features using **Unsupervised Learning (K-Means Clustering)** to capture neighborhood characteristics and listing types, which were then fed into a **Random Forest Classifier**.

## 📊 Dataset & Features

The dataset contains Airbnb listings from Buenos Aires.

**Key Features Used:**
- `latitude`, `longitude`: Spatial coordinates.
- `minimum_nights`: Rental policy.
- `availability_365`: Professionalism indicator.
- `number_of_reviews`: Popularity.
- **Engineered Features:**
  - `cluster_id`: Generated via K-Means to group similar listings.
  - `dist_to_centroid`: Distance of listing from its cluster center.

## 🧠 Methodology

### 1. Unsupervised Learning (Clustering)

Before classification, we applied **K-Means Clustering** to identify hidden market segments.

- **Algorithm:** K-Means
- **Features for Clustering:** Location, Price, Availability.
- **Insight:** We identified distinct groups such as "Budget & High Traffic", "Luxury/Professional", and "Long-term Residential".
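
The two engineered features can be sketched as follows. This is a minimal illustration on synthetic data, not the project's actual pipeline: the number of clusters (`k=3`), the use of `StandardScaler`, the exact clustering columns, and the choice to measure distance in scaled units are all assumptions, since the README does not state them.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the listings table (values are made up for illustration).
rng = np.random.default_rng(42)
n = 300
listings = pd.DataFrame({
    "latitude": rng.uniform(-34.70, -34.53, n),
    "longitude": rng.uniform(-58.53, -58.35, n),
    "price": rng.lognormal(mean=4.0, sigma=0.6, size=n),
    "availability_365": rng.integers(0, 366, n),
})

# Assumption: these columns stand in for "Location, Price, Availability".
cluster_cols = ["latitude", "longitude", "price", "availability_365"]

# Assumption: standardize so no single column dominates the Euclidean distance.
scaler = StandardScaler()
X = scaler.fit_transform(listings[cluster_cols])

# Assumption: k=3; the number of clusters used in the project is not stated.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
listings["cluster_id"] = kmeans.fit_predict(X)

# Distance from each listing to the center of its own cluster (scaled units).
own_centers = kmeans.cluster_centers_[listings["cluster_id"].to_numpy()]
listings["dist_to_centroid"] = np.linalg.norm(X - own_centers, axis=1)

print(listings[["cluster_id", "dist_to_centroid"]].describe())
```

Because price is among the clustering inputs, `cluster_id` carries information about the target. To score a new listing, the fitted scaler and K-Means model would need to be reused to compute both features in the same way.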

![image](https://cdn-uploads.huggingface.co/production/uploads/6909a3eb5351e90362100740/eqNPsVjmdngMHCDXE3UcY.png)

![image](https://cdn-uploads.huggingface.co/production/uploads/6909a3eb5351e90362100740/la-A6XrLnAgHZ1jvSN2-N.png)

![image](https://cdn-uploads.huggingface.co/production/uploads/6909a3eb5351e90362100740/Nf5cXazlrk-n4AcuOjQFc.png)

![image](https://cdn-uploads.huggingface.co/production/uploads/6909a3eb5351e90362100740/TPsdl8IrWMVBh4DqyEnPR.png)

![image](https://cdn-uploads.huggingface.co/production/uploads/6909a3eb5351e90362100740/yaE7_VIC0kvqhthZz64Cf.png)

![image](https://cdn-uploads.huggingface.co/production/uploads/6909a3eb5351e90362100740/K7A9muyp0xXoXz4B_HDjw.png)

![image](https://cdn-uploads.huggingface.co/production/uploads/6909a3eb5351e90362100740/R3lm4tMkqCYv07HOWjKJw.png)

![image](https://cdn-uploads.huggingface.co/production/uploads/6909a3eb5351e90362100740/YmApWN8wRIBE4waIwPBQl.png)

### 2. Classification Task

We transformed the continuous `price` target into 3 balanced classes using **Quantile Binning**:

- **Class 0:** Low / Budget (Bottom 33%)
- **Class 1:** Medium / Standard (Middle 33%)
- **Class 2:** High / Luxury (Top 33%)

### 3. Model Selection

We trained and evaluated three models:

1. Logistic Regression (Baseline)
2. Gradient Boosting
3. **Random Forest (Winner)** 🏆

## 🏆 Model Performance

The **Random Forest Classifier** was selected as the best model. It demonstrated superior ability to handle non-linear relationships (especially location data) and achieved the best balance between Precision and Recall.

- **Selected Model:** Random Forest Classifier
- **Metric:** Weighted F1-Score

![image](https://cdn-uploads.huggingface.co/production/uploads/6909a3eb5351e90362100740/ZQHknI3Wo9oVGfjLgavqQ.png)

## 🚀 How to Use the Model

You can download the model using the `huggingface_hub` library and use it in Python:

```python
import pickle

import pandas as pd
from huggingface_hub import hf_hub_download

# 1. Download the model
model_path = hf_hub_download(
    repo_id="Orib24/Buenos_Aires_Airbnb_Data",
    filename="airbnb_price_classifier.pkl",
)

# 2. Load the model
with open(model_path, "rb") as f:
    model = pickle.load(f)

# 3. Prepare Data (Example)
# Ensure you have the same features, in the same order:
# [minimum_nights, number_of_reviews, availability_365, latitude, longitude, cluster_id, dist_to_centroid]
sample_data = pd.DataFrame(
    [[2, 50, 360, -34.58, -58.42, 1, 0.5]],
    columns=[
        "minimum_nights",
        "number_of_reviews",
        "availability_365",
        "latitude",
        "longitude",
        "cluster_id",
        "dist_to_centroid",
    ],
)

# 4. Predict
prediction = model.predict(sample_data)
print(f"Predicted Class: {prediction[0]}")  # Output: 0, 1, or 2
```