Add library name and pipeline tag (#1)
- Add library name and pipeline tag (97a9cdad3ecac3f51efa7f7e4185176af28c93c5)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md
CHANGED
@@ -1,7 +1,9 @@
 ---
-license: apache-2.0
 datasets:
 - mikewang/PVD-160K
+license: apache-2.0
+library_name: transformers
+pipeline_tag: image-to-text
 ---
 
 <h1 align="center"> Text-Based Reasoning About Vector Graphics </h1>
@@ -19,7 +21,6 @@ datasets:
 
 </p>
 
-
 We observe that current *large multimodal models (LMMs)* still struggle with seemingly straightforward reasoning tasks that require precise perception of low-level visual details, such as identifying spatial relations or solving simple mazes. In particular, this failure mode persists in question-answering tasks about vector graphics—images composed purely of 2D objects and shapes.
 
 
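As a rough illustration of what the updated front matter carries, the sketch below reads the `---`-delimited block with a small hand-rolled parser and prints the two new keys. The helper `read_front_matter` and the `README_AFTER` string are illustrative assumptions, not part of this repository or of any Hub API, and the parser only handles the flat `key: value` and `- item` shapes that appear in this diff; a real YAML parser would be the more robust choice.

```python
def read_front_matter(text):
    """Parse a leading '---' delimited block into a dict (flat keys and simple lists only)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            # closing delimiter: the metadata block ends here
            break
        if stripped.startswith("- "):
            # list item: attach it to the most recent key if that key opened a list
            if isinstance(meta.get(current_key), list):
                meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # a key with no inline value is assumed to start a list
            meta[current_key] = value if value else []
    return meta


# Hypothetical copy of the README header as it looks after this change.
README_AFTER = """---
datasets:
- mikewang/PVD-160K
license: apache-2.0
library_name: transformers
pipeline_tag: image-to-text
---

<h1 align="center"> Text-Based Reasoning About Vector Graphics </h1>
"""

meta = read_front_matter(README_AFTER)
print(meta["library_name"], meta["pipeline_tag"])  # prints: transformers image-to-text
```

The `license` and `datasets` values are unchanged by the commit; only their order in the block differs, which the parsed dictionary does not depend on.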