---
language:
- en
tags:
- tabular-classification
- sklearn
- real-estate
- airbnb
- random-forest
license: mit
---
Loom video: https://www.loom.com/share/d952bab219c4444589ddaef174c0e34d

# 🏠 Buenos Aires Airbnb Price Tier Classifier

![Python](https://img.shields.io/badge/Python-3.8%2B-blue)
![Library](https://img.shields.io/badge/Library-Scikit--Learn-orange)
![Task](https://img.shields.io/badge/Task-Classification-green)

## 📌 Project Overview
This project predicts the price tier (**Budget, Standard, or Luxury**) of Airbnb listings in Buenos Aires.
Rather than predicting the price directly, we engineered features with **Unsupervised Learning (K-Means Clustering)** to capture neighborhood characteristics and listing types, then fed them into a **Random Forest Classifier**.

## 📊 Dataset & Features
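The dataset contains Airbnb listings from Buenos Aires.

**Key Features Used:**
- `latitude`, `longitude`: Spatial coordinates of the listing.
- `minimum_nights`: Minimum stay required, reflecting the rental policy.
- `availability_365`: Days the listing is available over the next year, used as an indicator of how professionally it is managed.
- `number_of_reviews`: Total review count, used as a popularity signal.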
- **Engineered Features:**
    - `cluster_id`: Generated via K-Means to group similar listings.
    - `dist_to_centroid`: Distance of listing from its cluster center.

## 🧠 Methodology

### 1. Unsupervised Learning (Clustering)
Before classification, we applied **K-Means Clustering** to identify hidden market segments.
- **Algorithm:** K-Means
- **Features for Clustering:** Location, Price, Availability.
- **Insight:** We identified distinct groups such as "Budget & High Traffic", "Luxury/Professional", and "Long-term Residential".
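
The clustering step and the two engineered features can be sketched roughly as follows. This is a minimal illustration rather than the original training code: the data is synthetic, and the choice of `n_clusters=3`, the standardization step, and the column names `price` and `cluster_cols` are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the listings table (values are made up for illustration).
rng = np.random.default_rng(0)
n = 300
listings = pd.DataFrame({
    "latitude": rng.uniform(-34.70, -34.53, n),
    "longitude": rng.uniform(-58.53, -58.35, n),
    "price": rng.lognormal(mean=10.0, sigma=0.6, size=n),
    "availability_365": rng.integers(0, 366, n),
})

# Clustering inputs: location, price, availability.
cluster_cols = ["latitude", "longitude", "price", "availability_365"]

# Standardize so that no single column dominates the Euclidean distance (assumed step).
X = StandardScaler().fit_transform(listings[cluster_cols])

# n_clusters=3 is an assumption; the value of k actually used is not stated here.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# cluster_id: index of the centroid each listing was assigned to.
listings["cluster_id"] = kmeans.labels_

# dist_to_centroid: transform() returns the distance to every centroid,
# so select the column of the assigned cluster for each row.
distances = kmeans.transform(X)
listings["dist_to_centroid"] = distances[np.arange(n), kmeans.labels_]

# Per-cluster averages help attach a readable name to each segment.
profile = listings.groupby("cluster_id")[cluster_cols].mean()
print(profile.round(2))
```

To produce `cluster_id` and `dist_to_centroid` for a new listing, the same fitted scaler and K-Means object would have to be reused, so in practice they would need to be saved alongside the classifier.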


![image](https://cdn-uploads.huggingface.co/production/uploads/6909a3eb5351e90362100740/eqNPsVjmdngMHCDXE3UcY.png)

![image](https://cdn-uploads.huggingface.co/production/uploads/6909a3eb5351e90362100740/la-A6XrLnAgHZ1jvSN2-N.png)


![image](https://cdn-uploads.huggingface.co/production/uploads/6909a3eb5351e90362100740/Nf5cXazlrk-n4AcuOjQFc.png)

![image](https://cdn-uploads.huggingface.co/production/uploads/6909a3eb5351e90362100740/TPsdl8IrWMVBh4DqyEnPR.png)

![image](https://cdn-uploads.huggingface.co/production/uploads/6909a3eb5351e90362100740/yaE7_VIC0kvqhthZz64Cf.png)

![image](https://cdn-uploads.huggingface.co/production/uploads/6909a3eb5351e90362100740/K7A9muyp0xXoXz4B_HDjw.png)

![image](https://cdn-uploads.huggingface.co/production/uploads/6909a3eb5351e90362100740/R3lm4tMkqCYv07HOWjKJw.png)

![image](https://cdn-uploads.huggingface.co/production/uploads/6909a3eb5351e90362100740/YmApWN8wRIBE4waIwPBQl.png)

### 2. Classification Task
We transformed the continuous `price` target into 3 balanced classes using **Quantile Binning**:
- **Class 0:** Low / Budget (Bottom 33%)
- **Class 1:** Medium / Standard (Middle 33%)
- **Class 2:** High / Luxury (Top 33%)
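
As an illustration of quantile binning, the sketch below uses `pandas.qcut` on synthetic prices. The data and the `tier_names` mapping are assumptions for demonstration only, not the original preprocessing code.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed prices (made up for illustration).
rng = np.random.default_rng(42)
prices = pd.Series(rng.lognormal(mean=10.0, sigma=0.7, size=900), name="price")

# q=3 cuts at the 1/3 and 2/3 quantiles; labels=False returns integer codes 0, 1, 2.
price_tier = pd.qcut(prices, q=3, labels=False)

# Hypothetical mapping from class index to a readable tier name.
tier_names = {0: "Budget", 1: "Standard", 2: "Luxury"}
tier_counts = price_tier.map(tier_names).value_counts()
print(tier_counts)
```

On real listings, many identical prices can make the bins slightly unequal, and `pd.qcut` raises an error if two bin edges coincide unless `duplicates="drop"` is passed.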

### 3. Model Selection
We trained and evaluated three models:
1. Logistic Regression (Baseline)
2. Gradient Boosting
3. **Random Forest (Winner)** 🏆
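
A comparison of this kind might look like the sketch below. It runs on synthetic data from `make_classification`, and the hyperparameters, the 75/25 split, and the scaling pipeline for the baseline are all assumptions, so the scores it prints say nothing about the actual results on the Airbnb data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 3-class problem with 7 features (stand-in for the real feature table).
X, y = make_classification(
    n_samples=600, n_features=7, n_informative=5, n_redundant=0,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The three candidate models; hyperparameters here are illustrative defaults.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Fit each model and score it on the held-out split with weighted F1.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test), average="weighted")

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: weighted F1 = {score:.3f}")
```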

## 🏆 Model Performance
The **Random Forest Classifier** was selected as the best model. Of the three candidates, it handled non-linear relationships best (especially in the location data) and achieved the best balance between precision and recall.

- **Selected Model:** Random Forest Classifier
- **Metric:** Weighted F1-Score
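
For reference, the weighted F1-score is the average of the per-class F1-scores, with each class weighted by its share of the true labels. The sketch below checks this against scikit-learn on a small made-up set of labels; the arrays are illustrative and are not model outputs.

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy labels for the three tiers (0 = Budget, 1 = Standard, 2 = Luxury).
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 2, 2, 2, 2, 0, 2])

# F1 for each class separately, and the number of true samples per class.
per_class_f1 = f1_score(y_true, y_pred, average=None)
support = np.bincount(y_true)

# Weighted F1 = sum over classes of (support / total) * F1.
manual_weighted_f1 = float(np.sum(per_class_f1 * support / support.sum()))
sklearn_weighted_f1 = f1_score(y_true, y_pred, average="weighted")

print(f"manual:  {manual_weighted_f1:.4f}")
print(f"sklearn: {sklearn_weighted_f1:.4f}")
```

When the classes are the same size, as quantile binning aims for, the weighted F1-score coincides with the macro-averaged F1-score.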


![image](https://cdn-uploads.huggingface.co/production/uploads/6909a3eb5351e90362100740/ZQHknI3Wo9oVGfjLgavqQ.png)


## 🚀 How to Use the Model

You can download the model using the `huggingface_hub` library and use it in Python:

```python
import pickle

import pandas as pd
from huggingface_hub import hf_hub_download

# 1. Download the model file from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="Orib24/Buenos_Aires_Airbnb_Data",
    filename="airbnb_price_classifier.pkl",
)

# 2. Load the model (scikit-learn must be installed; only unpickle files you trust)
with open(model_path, "rb") as f:
    model = pickle.load(f)

# 3. Prepare data (example)
# Ensure you provide the same features the model was trained on.
feature_columns = [
    "minimum_nights", "number_of_reviews", "availability_365",
    "latitude", "longitude", "cluster_id", "dist_to_centroid",
]
sample_data = pd.DataFrame([[2, 50, 360, -34.58, -58.42, 1, 0.5]], columns=feature_columns)

# 4. Predict
prediction = model.predict(sample_data)
print(f"Predicted Class: {prediction[0]}")  # Output: 0 (Budget), 1 (Standard), or 2 (Luxury)
```