Committed by ireneisdoomed
Commit 6abe9cc · verified · Parent: 96ba28e

chore: update model base model for 26.06.0-dev2 run

Files changed (7)
  1. .gitattributes +1 -0
  2. README.md +70 -0
  3. classifier.skops +3 -0
  4. config.json +42 -0
  5. requirements.txt +2 -0
  6. test.parquet +3 -0
  7. train.parquet +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ classifier.skops filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,70 @@
+ ---
+ language: en
+ library_name: sklearn
+ license: mit
+ tags:
+ - sklearn
+ - tabular-classification
+ - genomics
+ - gwas
+ - gene-prioritization
+ ---
+ # Locus-to-Gene (L2G) Model
+
+ The locus-to-gene (L2G) model prioritises likely causal genes at each GWAS locus based on genetic and functional genomics features.
+
+ ## Model Description
+
+ This is a **Gradient Boosting Classifier** (XGBoost) trained to predict causal genes at GWAS loci.
+
+ Predictions are limited to protein-coding genes with available feature data.
+
+ **Key Features:**
+ - **Distance**: distance from credible-set variants to the gene (footprint and TSS)
+ - **Molecular QTL Colocalization**: evidence from expression, protein and splicing QTL studies
+ - **Variant Pathogenicity**: VEP (Variant Effect Predictor) scores
+
+ ## Usage
+
+ ```python
+ from gentropy.method.l2g.model import LocusToGeneModel
+ from gentropy.common.session import Session
+
+ # Load model from Hugging Face Hub
+ session = Session()
+ model = LocusToGeneModel.load_from_hub(
+     session=session,
+     hf_model_id="opentargets/locus_to_gene"
+ )
+
+ # Make predictions on your L2G feature matrix
+ predictions = model.predict(your_feature_matrix, session)
+ ```
+
+ ## Training
+
+ - **Architecture**: XGBoost Gradient Boosting Classifier
+ - **Training Data**: Curated positive/negative gene-locus pairs from Open Targets
+ - **Evaluation Metric**: Area under the precision-recall curve (AUCPR)
+
+ ## Citation
+
+ If you use this model, please cite:
+
+ ```bibtex
+ @article{mountjoy2021open,
+   title={An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci},
+   author={Mountjoy, Edward and Schmidt, Ellen M. and Carmona, Miguel and others},
+   journal={Nature Genetics},
+   volume={53},
+   pages={1527--1533},
+   year={2021},
+   doi={10.1038/s41588-021-00945-5}
+ }
+ ```
+
+ ## More Information
+
+ - **Repository**: [opentargets/gentropy](https://github.com/opentargets/gentropy)
+ - **Documentation**: [L2G Method Docs](https://opentargets.github.io/gentropy/python_api/methods/l2g/_l2g/)
+ - **Developer**: Open Targets
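The README describes the model as prioritising likely causal genes at each GWAS locus. A minimal sketch of that downstream step, assuming you already hold one score per (`studyLocusId`, `geneId`) pair, for example the classifier's predicted probability. `rank_genes_per_locus` and the example identifiers are hypothetical and not part of gentropy:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_genes_per_locus(
    rows: Iterable[Tuple[str, str, float]],
) -> Dict[str, List[Tuple[str, float]]]:
    """Group (studyLocusId, geneId, score) rows by locus; sort each locus's genes by score, highest first."""
    by_locus: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for study_locus_id, gene_id, score in rows:
        by_locus[study_locus_id].append((gene_id, score))
    # Each locus is ranked independently; ties are broken by geneId so the order is deterministic.
    return {
        locus: sorted(genes, key=lambda pair: (-pair[1], pair[0]))
        for locus, genes in by_locus.items()
    }


# Hypothetical scores: identifiers and values are made up for illustration.
example_rows = [
    ("locus_1", "GENE_A", 0.12),
    ("locus_1", "GENE_B", 0.87),
    ("locus_2", "GENE_C", 0.40),
    ("locus_1", "GENE_D", 0.33),
]
ranked = rank_genes_per_locus(example_rows)
for locus, genes in ranked.items():
    top_gene, top_score = genes[0]
    print(f"{locus}: top gene {top_gene} (score {top_score})")
```

Keeping the full sorted list per locus, rather than only the top gene, leaves room to inspect runner-up candidates.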
classifier.skops ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:754b9258155d42e744486e0ae172f59fd4e188ab6e78a4189654728998dbfa5e
+ size 229078
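`classifier.skops`, like `test.parquet` and `train.parquet` below, is committed as a Git LFS pointer: three `key value` lines giving the spec version, the object's SHA-256 (`oid`) and its size in bytes. The real file is fetched separately, so it can be useful to confirm that a download matches its pointer. A minimal standard-library sketch; `parse_lfs_pointer` and `matches_pointer` are illustrative names, and the pointer text is assumed to be copied without the diff's `+ ` prefixes:

```python
import hashlib
import os
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse a Git LFS pointer file ("key value" per line) into a dict."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: str, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and SHA-256 recorded in the pointer."""
    pointer = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap check first: the byte size must match exactly.
    if os.path.getsize(path) != int(pointer["size"]):
        return False
    # Hash the file in chunks so large files are not read into memory at once.
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


# Pointer for classifier.skops, copied from this commit.
CLASSIFIER_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:754b9258155d42e744486e0ae172f59fd4e188ab6e78a4189654728998dbfa5e
size 229078
"""
```

For an intact local copy of the model file, `matches_pointer("classifier.skops", CLASSIFIER_POINTER)` should return `True`.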
config.json ADDED
@@ -0,0 +1,42 @@
+ {
+   "sklearn": {
+     "columns": [
+       "studyLocusId",
+       "geneId",
+       "goldStandardSet",
+       "eQtlColocClppMaximum",
+       "pQtlColocClppMaximum",
+       "sQtlColocClppMaximum",
+       "eQtlColocH4Maximum",
+       "pQtlColocH4Maximum",
+       "sQtlColocH4Maximum",
+       "eQtlColocClppMaximumNeighbourhood",
+       "pQtlColocClppMaximumNeighbourhood",
+       "sQtlColocClppMaximumNeighbourhood",
+       "eQtlColocH4MaximumNeighbourhood",
+       "pQtlColocH4MaximumNeighbourhood",
+       "sQtlColocH4MaximumNeighbourhood",
+       "distanceSentinelFootprint",
+       "distanceSentinelFootprintNeighbourhood",
+       "distanceFootprintMean",
+       "distanceFootprintMeanNeighbourhood",
+       "distanceTssMean",
+       "distanceTssMeanNeighbourhood",
+       "distanceSentinelTss",
+       "distanceSentinelTssNeighbourhood",
+       "vepMaximum",
+       "vepMaximumNeighbourhood",
+       "vepMean",
+       "vepMeanNeighbourhood",
+       "e2gMean",
+       "e2gMeanNeighbourhood",
+       "geneCount500kb",
+       "proteinGeneCount500kb",
+       "credibleSetConfidence",
+       "transPQtlColocH4Maximum",
+       "transPQtlColocH4MaximumNeighbourhood"
+     ],
+     "sklearn_version": "1.7.2"
+   },
+   "task": "tabular-classification"
+ }
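`config.json` records the 34 column names and the scikit-learn version (1.7.2) the classifier was saved with. The list mixes identifier columns (`studyLocusId`, `geneId`) and `goldStandardSet` with numeric features. A minimal sketch that splits the list and orders one row of feature values to match it, assuming (this commit does not say so) that `goldStandardSet` is the training label, that every other non-identifier column is a model input in the listed order, and that a missing feature can be filled with 0.0. The helper names are hypothetical:

```python
import json
from typing import Dict, List, Mapping, Sequence

# Assumptions: these two columns identify a row, and goldStandardSet
# holds the training label rather than a model input.
ID_COLUMNS = ("studyLocusId", "geneId")
LABEL_COLUMN = "goldStandardSet"


def split_columns(config: Mapping) -> Dict[str, List[str]]:
    """Split config["sklearn"]["columns"] into id, label and feature columns, keeping the listed order."""
    columns = config["sklearn"]["columns"]
    return {
        "ids": [c for c in columns if c in ID_COLUMNS],
        "label": [c for c in columns if c == LABEL_COLUMN],
        "features": [c for c in columns if c not in ID_COLUMNS and c != LABEL_COLUMN],
    }


def to_feature_vector(
    row: Mapping[str, float], features: Sequence[str], fill: float = 0.0
) -> List[float]:
    """Order a {feature: value} mapping to match `features`; absent features get `fill` (assumed 0.0)."""
    return [float(row.get(name, fill)) for name in features]


# A trimmed copy of config.json (only a few of its columns) for illustration.
example_config = json.loads(
    """
    {
      "sklearn": {
        "columns": ["studyLocusId", "geneId", "goldStandardSet",
                    "eQtlColocClppMaximum", "distanceSentinelTss", "vepMaximum"],
        "sklearn_version": "1.7.2"
      },
      "task": "tabular-classification"
    }
    """
)
parts = split_columns(example_config)
print(parts["features"])
print(to_feature_vector({"vepMaximum": 0.66, "distanceSentinelTss": 0.9}, parts["features"]))
```

Driving the order from the config, rather than hard-coding it, keeps the vector aligned if a later run adds or reorders features.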
requirements.txt ADDED
@@ -0,0 +1,2 @@
+ scikit-learn==1.7.2
+ skops
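`requirements.txt` pins `scikit-learn==1.7.2`, matching `sklearn_version` in `config.json`, and leaves `skops` unpinned. scikit-learn does not guarantee that an estimator saved under one version loads and behaves identically under another, so checking the environment before loading `classifier.skops` can save debugging time. A minimal standard-library sketch; it handles only the plain `name` and `name==version` lines used here, not the full requirements syntax, and `parse_pins` / `pin_mismatches` are hypothetical helpers:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, List, Optional


def parse_pins(text: str) -> Dict[str, Optional[str]]:
    """Map package name -> pinned version (None if unpinned). Handles only `name` and `name==x.y.z`."""
    pins: Dict[str, Optional[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        name, sep, pinned = line.partition("==")
        pins[name.strip()] = pinned.strip() if sep else None
    return pins


def pin_mismatches(
    pins: Dict[str, Optional[str]],
    lookup: Callable[[str], str] = version,
) -> List[str]:
    """List missing packages and version mismatches; `lookup` defaults to importlib.metadata.version."""
    problems: List[str] = []
    for name, pinned in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if pinned is not None and installed != pinned:
            problems.append(f"{name}: installed {installed}, expected {pinned}")
    return problems


# Contents of requirements.txt from this commit.
REQUIREMENTS = "scikit-learn==1.7.2\nskops\n"
```

Calling `pin_mismatches(parse_pins(REQUIREMENTS))` with no `lookup` queries the current environment; an empty list means both requirements are satisfied.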
test.parquet ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d8f6bad02336095ebac5c87dd59f3e5daae6ff637fa3b502d6b404c0ab5e09d9
+ size 801932
train.parquet ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b0dbeb6d1139a07de77ea8522656d780ee1f337c2eb2651c7cb8001e42b60664
+ size 4714932
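The README names the area under the precision-recall curve (AUCPR) as the evaluation metric, and the commit ships separate `train.parquet` and `test.parquet` splits, presumably so the metric can be computed on held-out pairs. A minimal, dependency-free sketch of one common AUCPR estimator, average precision, defined as the sum over score thresholds of (R_n - R_{n-1}) * P_n, which is the definition scikit-learn documents for `average_precision_score`. Whether the L2G training code uses exactly this estimator is not stated here, so treat it as illustrative:

```python
from typing import Sequence


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Sum over thresholds of (R_n - R_{n-1}) * P_n, with scores visited from high to low."""
    if len(y_true) != len(y_score):
        raise ValueError("y_true and y_score must have the same length")
    n_pos = sum(1 for y in y_true if y)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive labels")
    pairs = sorted(zip(y_score, y_true), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every example tied at this score so ties form a single threshold.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Toy labels and scores (not taken from test.parquet): 1 marks a positive gene-locus pair.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(f"AUCPR (average precision): {average_precision(labels, scores):.4f}")
```

Average precision avoids the linear interpolation of the trapezoidal rule, which can overstate the area under a precision-recall curve.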