Commit f867e3f (verified) by BroLaurens · 1 parent: 9e7248c

Update readme

  1. README.md +62 -3
README.md CHANGED
@@ -1,3 +1,62 @@
- ---
- license: apache-2.0
- ---
+ # finer-distilbert
+
+ ## Model description
+
+ **finer-distilbert** is a DistilBERT model fine-tuned for **Named Entity Recognition**. It is a proof-of-concept model trained to recognize the top 4 entity types in the nlpaueb/finer-139 dataset. Because of limited time, the model has not undergone any hyperparameter tuning. The model's output follows the **IOB2** annotation scheme of the original training dataset. The label ids are as follows:
+ ```
+ 0: O
+ 1: B-DebtInstrumentBasisSpreadOnVariableRate1
+ 2: B-DebtInstrumentFaceAmount
+ 3: I-DebtInstrumentFaceAmount
+ 4: I-LineOfCreditFacilityMaximumBorrowingCapacity
+ 5: B-DebtInstrumentInterestRateStatedPercentage
+ 6: I-DebtInstrumentBasisSpreadOnVariableRate1
+ 7: I-DebtInstrumentInterestRateStatedPercentage
+ 8: B-LineOfCreditFacilityMaximumBorrowingCapacity
+ ```
+
+ ## Running the model
+ A basic example of how to run the model and obtain the predicted label for each token of each input text:
+
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForTokenClassification
+
+ # Label ids and their names, for reference
+ int2str = {
+     0: 'O',
+     1: 'B-DebtInstrumentBasisSpreadOnVariableRate1',
+     2: 'B-DebtInstrumentFaceAmount',
+     3: 'I-DebtInstrumentFaceAmount',
+     4: 'I-LineOfCreditFacilityMaximumBorrowingCapacity',
+     5: 'B-DebtInstrumentInterestRateStatedPercentage',
+     6: 'I-DebtInstrumentBasisSpreadOnVariableRate1',
+     7: 'I-DebtInstrumentInterestRateStatedPercentage',
+     8: 'B-LineOfCreditFacilityMaximumBorrowingCapacity',
+ }
+
+ str2int = {v: k for k, v in int2str.items()}
+
+ # Load the model and the tokenizer
+ model = AutoModelForTokenClassification.from_pretrained(
+     "brolaurens/finer-distilbert", num_labels=len(int2str), id2label=int2str, label2id=str2int
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased", model_max_length=512)
+
+ # Input texts
+ texts = [
+     "Of the amount drawn, $ 3,721,583 was used to pay the principal amount of $ 3,700,000 and accrued interest of $ 21,583 due under the Company 's Loan Agreement with Capital Preservation Solutions, LLC entered into on September 4, 2015."
+ ]
+
+ # Tokenize the input; padding and truncation allow batches of texts with different lengths
+ model_input = tokenizer(texts, return_tensors='pt', padding=True, truncation=True)
+
+ # Run the model without gradient tracking and take the highest-scoring label id per token
+ with torch.no_grad():
+     predictions = model(**model_input).logits.argmax(dim=2)
+
+ # Map label ids to label names (one label per wordpiece token, including [CLS] and [SEP])
+ predicted_labels = [[int2str[x] for x in t] for t in predictions.tolist()]
+ ```
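
Because the predicted labels follow the IOB2 scheme, consecutive `B-`/`I-` tags of the same type can be merged into entity spans. The sketch below is not part of the committed README; it shows one possible way to do this in plain Python. The function name `iob2_to_spans` is hypothetical, and the example tokens and labels are made up for illustration rather than taken from actual model output.

```python
from typing import List, Tuple


def iob2_to_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, List[str]]]:
    """Group IOB2-labelled tokens into (entity_type, tokens) spans."""
    spans: List[Tuple[str, List[str]]] = []
    current_type = None
    current_tokens: List[str] = []

    for token, label in zip(tokens, labels):
        # "B-Foo" -> ("B", "-", "Foo"); "O" -> ("O", "", "")
        prefix, _, entity_type = label.partition("-")
        if prefix == "B":
            # A B- tag always starts a new span, closing any open one.
            if current_type is not None:
                spans.append((current_type, current_tokens))
            current_type, current_tokens = entity_type, [token]
        elif prefix == "I" and entity_type == current_type:
            # An I- tag of the same type continues the open span.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span:
            # close the open span (if any) and skip this token.
            if current_type is not None:
                spans.append((current_type, current_tokens))
            current_type, current_tokens = None, []

    # Close a span that runs to the end of the sequence.
    if current_type is not None:
        spans.append((current_type, current_tokens))
    return spans


# Illustrative input only (hypothetical tokens and labels).
tokens = ["amount", "of", "$", "3", ",", "700", ",", "000", "and"]
labels = ["O", "O", "O",
          "B-DebtInstrumentFaceAmount",
          "I-DebtInstrumentFaceAmount",
          "I-DebtInstrumentFaceAmount",
          "I-DebtInstrumentFaceAmount",
          "I-DebtInstrumentFaceAmount",
          "O"]
spans = iob2_to_spans(tokens, labels)
print(spans)
```

For real inputs, the tokens matching `predicted_labels[0]` could be obtained with `tokenizer.convert_ids_to_tokens(model_input["input_ids"][0])`. Special tokens such as `[CLS]` and `[SEP]`, and wordpiece continuations prefixed with `##`, would likely need separate handling before or after grouping.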