🏠 Egypt Real Estate Price Predictor
A machine learning model trained on 18,838 Egyptian property listings to predict real estate prices across Egypt.
Model Details
| Property | Value |
|---|---|
| Algorithm | XGBoost (gradient boosting) |
| Target | log1p(price in EGP) |
| R² Score (log scale) | 0.647 |
| Median APE | 27.5% |
| Training samples | 18,838 |
| Cities covered | 16 Egyptian governorates |
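
The card does not say exactly how the two metrics in the table were computed. The sketch below uses the conventional definitions: R² on `log1p`-transformed prices, and median absolute percentage error on the original EGP scale. The function names `log_r2` and `median_ape` are illustrative and not part of the released model.

```python
import numpy as np


def log_r2(y_true_egp, y_pred_egp):
    """R² computed on log1p-transformed prices (the scale the model is trained on)."""
    t = np.log1p(np.asarray(y_true_egp, dtype=float))
    p = np.log1p(np.asarray(y_pred_egp, dtype=float))
    ss_res = np.sum((t - p) ** 2)          # residual sum of squares
    ss_tot = np.sum((t - t.mean()) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)


def median_ape(y_true_egp, y_pred_egp):
    """Median absolute percentage error on the EGP scale, in percent."""
    t = np.asarray(y_true_egp, dtype=float)
    p = np.asarray(y_pred_egp, dtype=float)
    return float(np.median(np.abs(p - t) / t) * 100)
```

Read this way, a median APE of 27.5% means that half of the evaluated listings were predicted within 27.5% of their listed price, and half were further off.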
Features Used
| Feature | Description |
|---|---|
| size_sqm | Property area in square meters |
| bedrooms_clean | Number of bedrooms |
| bathrooms | Number of bathrooms |
| city | Governorate (Cairo, Giza, North Coast, etc.) |
| property_type | Apartment, Villa, Chalet, Duplex, etc. |
| is_installment | Payment method (0=Cash, 1=Installments) |
| city_median_price | Median price in that city (target encoding) |
| type_median_price | Median price for that property type |
| bed_bath_ratio | Bedrooms divided by bathrooms (bathrooms floored at 1) |
| rooms_total | Total room count (bedrooms + bathrooms) |
| size_per_room | Average square meters per room (size_sqm / rooms_total) |
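
The engineered columns can be derived from the raw ones with a few pandas operations. The following is a minimal sketch, not the author's training code: the `price` column name and the helper `add_engineered_features` are assumptions, and the card does not say whether the medians were taken on raw or log prices (raw is assumed here). The returned dictionary mirrors the `city_median`, `type_median`, and `global_median` keys read from `model_metadata.json` in the Usage section.

```python
import pandas as pd


def add_engineered_features(df: pd.DataFrame, price_col: str = "price"):
    """Return a copy of df with the derived columns, plus the median lookup tables."""
    out = df.copy()
    rooms = out["bedrooms_clean"] + out["bathrooms"]
    out["rooms_total"] = rooms
    # Floor the denominators at 1 to avoid division by zero
    out["bed_bath_ratio"] = out["bedrooms_clean"] / out["bathrooms"].clip(lower=1)
    out["size_per_room"] = out["size_sqm"] / rooms.clip(lower=1)

    # Target encoding: per-group median price (assumed to use raw EGP prices)
    meta = {
        "city_median": out.groupby("city")[price_col].median().to_dict(),
        "type_median": out.groupby("property_type")[price_col].median().to_dict(),
        "global_median": float(out[price_col].median()),
    }
    out["city_median_price"] = out["city"].map(meta["city_median"])
    out["type_median_price"] = out["property_type"].map(meta["type_median"])
    return out, meta
```

When reproducing this, the medians should be computed on the training split only, so that held-out prices do not leak into the encoded features.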
Cities Covered
Cairo · Giza · North Coast · Red Sea · Alexandria · Suez · Qalyubia · South Sinai · Matrouh · Al Daqahlya · Aswan · Asyut · Damietta · Luxor · Sharqia · Kafr El Sheikh
Property Types
Apartment · Villa · Chalet · Duplex · Townhouse · Twin House · Penthouse · Hotel Apartment · Other
Pipeline
The model uses a sklearn.pipeline.Pipeline with:
- ColumnTransformer: StandardScaler for numeric features, OrdinalEncoder for categorical features
- XGBRegressor: 1000 estimators, learning rate 0.02, max_depth=8
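
A pipeline with this shape could be assembled roughly as follows. This is a sketch under assumptions, not the released training script: the split of columns into numeric and categorical, the handling of unseen categories (`unknown_value=-1`), and the helper names `build_preprocessor` and `build_pipeline` are all illustrative. Only the three XGBoost hyperparameters come from the description above.

```python
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Assumed column split; is_installment is treated as numeric here
NUMERIC = [
    "size_sqm", "bedrooms_clean", "bathrooms", "is_installment",
    "bed_bath_ratio", "rooms_total", "city_median_price",
    "type_median_price", "size_per_room",
]
CATEGORICAL = ["city", "property_type"]


def build_preprocessor() -> ColumnTransformer:
    """Scale numeric columns and integer-encode categorical ones."""
    return ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        # Assumption: categories unseen at fit time are encoded as -1
        ("cat", OrdinalEncoder(handle_unknown="use_encoded_value",
                               unknown_value=-1), CATEGORICAL),
    ])


def build_pipeline() -> Pipeline:
    """Preprocessor followed by an XGBoost regressor."""
    # Imported here so the preprocessing half works without xgboost installed
    from xgboost import XGBRegressor

    return Pipeline([
        ("prep", build_preprocessor()),
        ("model", XGBRegressor(n_estimators=1000, learning_rate=0.02,
                               max_depth=8)),
    ])
```

Because the target is log1p(price), such a pipeline would be fit with `pipeline.fit(X, np.log1p(y))`, and its predictions converted back with `np.expm1`.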
Usage
import json

import joblib
import numpy as np
import pandas as pd

# Load the trained pipeline and the lookup tables saved alongside it
model = joblib.load("real_estate_model.pkl")
with open("model_metadata.json") as f:
    meta = json.load(f)


def predict_price(area_sqm, bedrooms, bathrooms, city, property_type, is_installment=0):
    cm = meta["city_median"]
    tm = meta["type_median"]
    global_med = meta["global_median"]  # fallback for unseen cities or property types
    rooms = bedrooms + bathrooms
    row = pd.DataFrame([{
        "size_sqm": area_sqm,
        "bedrooms_clean": bedrooms,
        "bathrooms": bathrooms,
        "is_installment": is_installment,
        "bed_bath_ratio": bedrooms / max(bathrooms, 1),
        "rooms_total": rooms,
        "city_median_price": cm.get(city, global_med),
        "type_median_price": tm.get(property_type, global_med),
        "size_per_room": area_sqm / max(rooms, 1),
        "city": city,
        "property_type": property_type,
    }])
    # The model predicts log1p(price), so invert with expm1 to get EGP
    log_price = model.predict(row)[0]
    price = float(np.expm1(log_price))  # plain float so the result is JSON-serializable
    return {
        "estimated_price_egp": round(price),
        # Fixed multipliers: a heuristic band, not a statistical confidence interval
        "price_low_egp": round(price * 0.90),
        "price_high_egp": round(price * 1.15),
        "price_per_sqm_egp": round(price / area_sqm),
    }


# Example
result = predict_price(
    area_sqm=150,
    bedrooms=3,
    bathrooms=2,
    city="Cairo",
    property_type="Apartment",
)
print(result)
# {'estimated_price_egp': 9800000, 'price_low_egp': 8820000, ...}
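
The 0.90 and 1.15 multipliers in `predict_price` give a narrower band than the reported median APE of 27.5% would suggest. As a rough alternative, the sketch below inverts the APE definition to get the range that should contain the listed price for about half of listings, assuming the reported error carries over to new data. The helper `typical_price_band` is hypothetical and not part of the released code.

```python
MEDIAN_APE = 0.275  # from the Model Details table


def typical_price_band(predicted_price_egp: float, ape: float = MEDIAN_APE) -> dict:
    """Range of listed prices consistent with an absolute percentage error <= ape.

    |pred - actual| / actual <= ape  is equivalent to
    pred / (1 + ape) <= actual <= pred / (1 - ape).
    """
    return {
        "band_low_egp": round(predicted_price_egp / (1 + ape)),
        "band_high_egp": round(predicted_price_egp / (1 - ape)),
    }
```

The band is asymmetric around the prediction because the error is measured relative to the actual price, not the predicted one.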
Data Source
Listings were scraped from PropertyFinder Egypt. The data was then cleaned, outliers were removed, and the engineered features described under Features Used were added.
Limitations
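- Prices are listing (asking) prices, not final sale prices
- High variance is inherent to real estate; factors such as view, floor, and finish are not captured by the features
- North Coast / Red Sea listings are seasonal/resort properties, so their prices may not be comparable to year-round residential listings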