fromthesky committed on
Commit dda3ae4 · 1 Parent(s): e0f1b57

Updated readme.

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -88,8 +88,8 @@ print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))

  - `reference_rope`: If set to `True`, RoPE implementation implemented in the original paper is used. This is the case for model pretrained in this repo. If set to `False`, RoPE implementation from the Huggingface Transformers library is used.

- - `output_pldrllm_attentions=True` returns the deductive outputs and learnable parameters of power law graph attention module as tuple containing:
- the output of the residual metric learner (metric tensor, $\textbf{A}$), output ($\textbf{A}_{LM}$) after application of iSwiGLU on metric tensor, learned exponents of potential tensor, learned weights for energy-curvature tensor, learned bias for energy-curvature tensor, energy-curvature tensor ($\textbf{G}_{LM}$), and attention weights.
+ - `output_pldr_attentions=True` returns the deductive outputs and learnable parameters of power law graph attention module as tuple containing:
+ the output of the residual metric learner (metric tensor, **A**), output (**A<sub>LM</sub>**) after application of iSwiGLU on metric tensor, learned exponents of potential tensor, learned weights for energy-curvature tensor, learned bias for energy-curvature tensor, energy-curvature tensor (**G<sub>LM</sub>**), and attention weights.

  See config.json for other model configuration details.

@@ -105,6 +105,7 @@ We also have a fork of transformers library with PLDR-LLM model support for futu
  ```python
  git clone https://github.com/burcgokden/transformers
  cd transformers
+ git checkout add_PLDR_LLM
  pip install -e ".[dev]"
  ```
  - Static cache is not supported for models with `custom_G_type=None`.
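
The README text changed in this commit lists seven items in the tuple returned when `output_pldr_attentions=True` is set. As a minimal sketch of how a caller might give those positions readable names, the helper below wraps such a tuple in a `NamedTuple`. The class name, field names, and function name are hypothetical and are not part of the PLDR-LLM code; only the order of the items is taken from the README, and whether the model returns one such tuple per layer is not stated there.

```python
from typing import Any, NamedTuple, Sequence


class PLDRAttentionOutputs(NamedTuple):
    """Hypothetical container; field order follows the README description."""

    metric_tensor: Any             # A, output of the residual metric learner
    metric_tensor_lm: Any          # A_LM, after iSwiGLU is applied to A
    potential_exponents: Any       # learned exponents of the potential tensor
    energy_curvature_weights: Any  # learned weights for the energy-curvature tensor
    energy_curvature_bias: Any     # learned bias for the energy-curvature tensor
    energy_curvature_tensor: Any   # G_LM
    attention_weights: Any


def unpack_pldr_attentions(outputs: Sequence[Any]) -> PLDRAttentionOutputs:
    """Name the seven positional entries of a PLDR attention tuple."""
    expected = len(PLDRAttentionOutputs._fields)
    if len(outputs) != expected:
        raise ValueError(f"expected {expected} entries, got {len(outputs)}")
    return PLDRAttentionOutputs(*outputs)


# Placeholder strings stand in for tensors so the sketch runs without a model.
example = unpack_pldr_attentions(("A", "A_LM", "exponents", "W", "b", "G_LM", "attn"))
```

With real model outputs, the same helper would be applied to whichever tuple the model returns for this flag; how that tuple is reached from the model output is an assumption to check against the repository code.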