# Hugging Face Hub Upload Instructions

## Files Ready for Upload

All files are in the `hf_upload/` directory:

```
hf_upload/
├── model.safetensors          # Projection head weights (512 KB)
├── config.json                 # Model configuration
├── modeling_bgem3_projection.py # Model class definition
├── training_info.json          # Training metrics and details
└── README.md                   # Model Card
```
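
Before starting, it can help to confirm that nothing is missing from the directory. The following is a minimal sketch using only the standard library; the helper name `missing_upload_files` is illustrative and not part of any tool, and the file names are copied from the listing above.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = (
    "model.safetensors",
    "config.json",
    "modeling_bgem3_projection.py",
    "training_info.json",
    "README.md",
)


def missing_upload_files(folder: str) -> list[str]:
    """Return the expected file names that are absent from `folder`."""
    root = Path(folder)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_upload_files("hf_upload")` from the project root should return an empty list when the directory is complete.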

## Step-by-Step Upload Process

### 1. Install Hugging Face CLI (if not already installed)

```bash
pip install huggingface_hub
```

### 2. Login to Hugging Face

```bash
huggingface-cli login
```

Enter your Hugging Face token when prompted. The token needs write access to create repositories and upload files. Create or copy one at: https://huggingface.co/settings/tokens

### 3. Create Repository

```bash
huggingface-cli repo create bge-m3-vietnamese-rental-projection --type model
```

This creates a new model repository: `https://huggingface.co/YOUR_USERNAME/bge-m3-vietnamese-rental-projection`

### 4. Upload Files

#### Option A: Using huggingface-cli (Recommended)

```bash
cd hf_upload

# Upload all files at once
huggingface-cli upload YOUR_USERNAME/bge-m3-vietnamese-rental-projection . . --repo-type model
```

#### Option B: Using Git

```bash
cd hf_upload

# Enable Git LFS once per machine (Hub repos track *.safetensors with LFS)
git lfs install

# Clone the empty repo
git clone https://huggingface.co/YOUR_USERNAME/bge-m3-vietnamese-rental-projection
cd bge-m3-vietnamese-rental-projection

# Copy files
cp ../model.safetensors .
cp ../config.json .
cp ../modeling_bgem3_projection.py .
cp ../training_info.json .
cp ../README.md .

# Commit and push
git add .
git commit -m "Initial upload: BGE-M3 Vietnamese rental projection head"
git push
```

#### Option C: Using Python

```python
from huggingface_hub import HfApi

api = HfApi()

# Upload each file
api.upload_file(
    path_or_fileobj="model.safetensors",
    path_in_repo="model.safetensors",
    repo_id="YOUR_USERNAME/bge-m3-vietnamese-rental-projection",
    repo_type="model",
)

# Repeat for other files...
```
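
Rather than repeating the call by hand, the remaining files can be uploaded in a loop. This is a sketch under stated assumptions: `upload_all_files` is a hypothetical helper, and `api` is expected to be a `huggingface_hub.HfApi` instance that is already authenticated via step 2.

```python
from pathlib import Path


def upload_all_files(api, folder: str, repo_id: str) -> list[str]:
    """Upload every regular, non-hidden file directly inside `folder`.

    `api` is expected to be a `huggingface_hub.HfApi` instance.
    Returns the names of the uploaded files.
    """
    uploaded = []
    for path in sorted(Path(folder).iterdir()):
        # Skip subdirectories and dotfiles such as a stray .git folder.
        if not path.is_file() or path.name.startswith("."):
            continue
        api.upload_file(
            path_or_fileobj=str(path),
            path_in_repo=path.name,
            repo_id=repo_id,
            repo_type="model",
        )
        uploaded.append(path.name)
    return uploaded


# Usage (requires `pip install huggingface_hub` and a prior login):
# from huggingface_hub import HfApi
# upload_all_files(HfApi(), "hf_upload", "YOUR_USERNAME/bge-m3-vietnamese-rental-projection")
```

Each `upload_file` call creates its own commit. If a single commit is preferred, `HfApi.upload_folder(folder_path=..., repo_id=..., repo_type="model")` uploads the whole directory at once.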

### 5. Update README.md

Make these edits before running step 4, so that the uploaded Model Card is already correct:

1. Replace `your-username` with your actual Hugging Face username (it appears twice)
2. Update the citation section with your name
3. Add your contact information if desired
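
The username replacement can also be scripted. The sketch below assumes the placeholder is the literal string `your-username`, as stated in the list above; the function names are illustrative.

```python
from pathlib import Path

PLACEHOLDER = "your-username"  # placeholder string named in step 5


def fill_username(text: str, username: str) -> tuple[str, int]:
    """Replace the placeholder and report how many occurrences were found."""
    count = text.count(PLACEHOLDER)
    return text.replace(PLACEHOLDER, username), count


def update_readme(path: str, username: str) -> int:
    """Rewrite the README in place and return the number of replacements."""
    readme = Path(path)
    new_text, count = fill_username(readme.read_text(encoding="utf-8"), username)
    readme.write_text(new_text, encoding="utf-8")
    return count
```

A return value other than 2 from `update_readme("hf_upload/README.md", "YOUR_USERNAME")` suggests the README differs from what this guide expects and is worth checking by hand.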

### 6. Verify Upload

After uploading, visit:
```
https://huggingface.co/YOUR_USERNAME/bge-m3-vietnamese-rental-projection
```

You should see:
- ✅ Model Card (README.md) displayed
- ✅ Files tab shows all 5 uploaded files (an auto-generated `.gitattributes` may also be listed)
- ✅ Model can be loaded with `from_pretrained()` (tested in step 7)
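
The file check can also be done programmatically. This is a minimal sketch, assuming `api` is a `huggingface_hub.HfApi` instance; `missing_from_repo` is a hypothetical helper built on `HfApi.list_repo_files`.

```python
def missing_from_repo(api, repo_id: str, expected: list[str]) -> list[str]:
    """Return the names in `expected` that the Hub does not list for `repo_id`."""
    remote = set(api.list_repo_files(repo_id=repo_id, repo_type="model"))
    return [name for name in expected if name not in remote]


# Usage (requires `pip install huggingface_hub` and network access):
# from huggingface_hub import HfApi
# missing_from_repo(
#     HfApi(),
#     "YOUR_USERNAME/bge-m3-vietnamese-rental-projection",
#     ["model.safetensors", "config.json", "modeling_bgem3_projection.py",
#      "training_info.json", "README.md"],
# )
```

An empty list means every expected file is present in the repository.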

### 7. Test Download (Important!)

```python
import sys

from transformers import AutoTokenizer

# Make the local modeling_bgem3_projection.py importable
sys.path.insert(0, "path/to/hf_upload")

# Import model class
from modeling_bgem3_projection import BGEM3ProjectionModel, BGEM3ProjectionConfig

# Load from Hub
config = BGEM3ProjectionConfig.from_pretrained(
    "YOUR_USERNAME/bge-m3-vietnamese-rental-projection"
)
model = BGEM3ProjectionModel.from_pretrained(
    "YOUR_USERNAME/bge-m3-vietnamese-rental-projection",
    config=config,
    trust_remote_code=True
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")

# Test encoding
texts = ["Phòng trọ Quận 10, 25m², giá 5tr"]
embeddings = model.encode(texts)
print(f"Embeddings shape: {embeddings.shape}")  # Should be [1, 128]
```

## Troubleshooting

### Issue: "trust_remote_code" error

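**Solution**: Pass `trust_remote_code=True` to `from_pretrained()`. The repository ships a custom model class (`modeling_bgem3_projection.py`), and Transformers will not run code from a repository unless you opt in.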

### Issue: Weight loading warnings

The warnings about encoder weights not being initialized are **expected**. We only upload projection head weights; the encoder is loaded from BAAI/bge-m3 separately.
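
To confirm what the checkpoint actually contains, the tensor names can be read straight from the file header without loading any weights. The sketch below relies on the safetensors layout (an 8-byte little-endian header length followed by a JSON header) and uses only the standard library. The actual tensor names depend on `modeling_bgem3_projection.py`, which is not shown here.

```python
import json
import struct


def safetensors_tensor_names(path: str) -> list[str]:
    """List the tensor names stored in a .safetensors file.

    The file starts with an 8-byte little-endian unsigned integer giving the
    length of a JSON header; the header maps tensor names to dtype, shape,
    and byte offsets.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return sorted(name for name in header if name != "__metadata__")
```

If `safetensors_tensor_names("hf_upload/model.safetensors")` lists only projection head parameters, the warnings about uninitialized encoder weights are consistent with the explanation above.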

### Issue: NumPy version error

**Solution**: Use `pip install "numpy<2.0"` if you encounter TensorFlow compatibility issues.
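
To see whether the pin applies to your environment, the installed NumPy version can be checked without importing NumPy itself. This is a small sketch using the standard library's `importlib.metadata`; the function names are illustrative.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Extract the leading integer of a version string such as '1.26.4'."""
    return int(version.split(".")[0])


def numpy_is_2_or_newer() -> bool:
    """True if the installed NumPy is 2.x or newer; False if older or absent."""
    try:
        return major_version(metadata.version("numpy")) >= 2
    except metadata.PackageNotFoundError:
        return False
```

If `numpy_is_2_or_newer()` returns `True` and the error appears, the `numpy<2.0` pin above is the fix to try.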

## Additional Configuration

### Add Model Tags

You can add tags to your model page for better discoverability. In the README.md front matter:

```yaml
---
language:
- vi
tags:
- sentence-transformers
- vietnamese
- rental
- real-estate
- bge-m3
---
```
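
The Hub reads this metadata only when the front matter sits at the very top of `README.md`, between two `---` lines. A minimal sketch for checking that placement follows; `extract_front_matter` is a hypothetical helper and does not parse or validate the YAML itself.

```python
from typing import Optional


def extract_front_matter(readme_text: str) -> Optional[str]:
    """Return the text between the leading '---' delimiters, or None.

    None is returned when the file does not start with '---' or when the
    closing delimiter is missing.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i])
    return None
```

A `None` result for the contents of `hf_upload/README.md` means the tags block is missing or is not the first thing in the file.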

### Add to a Collection

Consider adding your model to Vietnamese NLP or real estate collections on Hugging Face.

## License

The model is released under the MIT License. Make sure this is acceptable for your use case.

## Support

For issues or questions:
- Open an issue on the model repository
- Contact Hugging Face support
- Check Hugging Face documentation: https://huggingface.co/docs

---

**Ready to upload!** 🚀

Follow the steps above and your model will be publicly available for the community to use.