Libraries

How to use MinhPhuc0804/me5-512-docling-checkthat-task1-v1.2 with sentence-transformers:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("MinhPhuc0804/me5-512-docling-checkthat-task1-v1.2")

sentences = [
    "query: Thrilled to spot our study in @BJSM_BMJ on injury incidence & burden in youth football, taking into account the immature skeleton from a big cohort during 4 back-to-back seasons @aspetar @RoaldBahr",
    "passage: title: Longitudinal study of six seasons of match injuries in elite female rugby union\nabstract: ObjectiveTo establish match injury rates and patterns in elite female rugby union players in England.We conducted a six-season (2011/2012-2013/2014 and 2017/2018-2019/2020) prospective cohort study of time-loss match injuries in elite-level female players in the English Premiership competition. A 24-hour time-loss definition was used.Five-hundred and thirty-four time-loss injuries were recorded during 13 680 hours of match exposure. Injury incidence was 39 injuries per 1000 hours (95% CIs 36 to 42) with a mean severity of 48 days (95% CIs 42 to 54) and median severity of 20 days (IQR: 7-57). Concussion was the most common specific injury diagnosis (five concussions per 1000 hours, 95% CIs 4 to 6). The tackle event was associated with the greatest burden of injury (615 days absence per 1000 hours 95% CIs 340 to 1112), with 'being tackled' specifically causing the most injuries (28% of all injuries) and concussions (22% of all concussions).This is the first multiple-season study of match injuries in elite women's rugby union players. Match injury incidence was similar to that previously reported within international women's rugby union. Injury prevention strategies centred on the tackle would focus on high-burden injuries, which are associated with substantial player time-loss and financial costs to teams as well as the high-priority area of concussions.",
    "passage: title: Single, Dual, and Triple Use of Cigarettes, e-Cigarettes, and Snus among Adolescents in the Nordic Countries\nabstract: New tobacco and nicotine products have emerged on the market in recent years. Most research has concerned only one product at a time, usually e-cigarettes, while little is known about the multiple use of tobacco and nicotine products among adolescents. We examined single, dual, and triple use of cigarettes, e-cigarettes, and snus among Nordic adolescents, using data of 15–16-year-olds (n = 16,125) from the European School Survey Project on Alcohol and other Drugs (ESPAD) collected in 2015 and 2019 from Denmark, Finland, Iceland, Norway, Sweden, and the Faroe Islands. Country-specific lifetime use of any of these products ranged between 40% and 50%, and current use between 17% and 31%. Cigarettes were the most common product in all countries except for Iceland, where e-cigarettes were remarkably more common. The proportion of dual and triple users was unexpectedly high among both experimental (24%–49%) and current users (31–42%). Triple use was less common than dual use. The users’ patterns varied somewhat between the countries, and Iceland differed substantially from the other countries, with a high proportion of single e-cigarette users. More knowledge on the patterns of multiple use of tobacco and nicotine products and on the potential risk and protective factors is needed for targeted intervention and prevention efforts.",
    "passage: title: Injury incidence and burden in a youth elite football academy: a four-season prospective study of 551 players aged from under 9 to under 19 years\nabstract: Objective Investigate the incidence and burden of injuries by age group in youth football (soccer) academy players during four consecutive seasons. Methods All injuries that caused time-loss or required medical attention (as per consensus definitions) were prospectively recorded in 551 youth football players from under 9 years to under 19 years. Injury incidence (II) and burden (IB) were calculated as number of injuries per squad season (s-s), as well as for type, location and age groups. Results A total of 2204 injuries were recorded. 40% (n=882) required medical attention and 60% (n=1322) caused time-loss. The total time-loss was 25 034 days. A squad of 25 players sustained an average of 30 time-loss injuries (TLI) per s-s with an IB of 574 days lost per s-s. Compared with the other age groups, U-16 players had the highest TLI incidence per s-s (95% CI lower-upper): II= 59 (52 to 67); IB=992 days; (963 to 1022) and U-18 players had the greatest burden per s-s: II= 42.1 (36.1 to 49.1); IB= 1408 days (1373 to 1444). Across the cohort of players, contusions (II=7.7/s-s), sprains (II=4.9/s-s) and growth-related injuries (II=4.3/s-s) were the most common TLI. Meniscus/cartilage injuries had the greatest injury severity (95% CI lower-upper): II= 0.4 (0.3 to 0.7), IB= 73 days (22 to 181). The burden (95% CI lower-upper) of physeal fractures (II= 0.8; 0.6 to 1.2; IB= 58 days; 33 to 78) was double than non-physeal fractures. Summary At this youth football academy, each squad of 25 players averaged 30 injuries per season which resulted in 574 days lost."
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([4, 4])

me5-512-docling-checkthat-task1-v1.2 / README.md

MinhPhuc0804

Automated push: fine-tuned on CT26 (max_tokens=512)

add7329 verified about 2 months ago


	---
	tags:
	- sentence-transformers
	- sentence-similarity
	- feature-extraction
	- generated_from_trainer
	- dataset_size:17319
	- loss:TripletMNRLCombinedLoss
	base_model: intfloat/multilingual-e5-large-instruct
	widget:
	- source_sentence: 'query: Thrilled to spot our study in @BJSM_BMJ on injury incidence
	& burden in youth football, taking into account the immature skeleton from a big
	cohort during 4 back-to-back seasons @aspetar @RoaldBahr'
	sentences:
	- 'passage: title: Longitudinal study of six seasons of match injuries in elite
	female rugby union

	abstract: ObjectiveTo establish match injury rates and patterns in elite female
	rugby union players in England.We conducted a six-season (2011/2012-2013/2014
	and 2017/2018-2019/2020) prospective cohort study of time-loss match injuries
	in elite-level female players in the English Premiership competition. A 24-hour
	time-loss definition was used.Five-hundred and thirty-four time-loss injuries
	were recorded during 13 680 hours of match exposure. Injury incidence was 39 injuries
	per 1000 hours (95% CIs 36 to 42) with a mean severity of 48 days (95% CIs 42
	to 54) and median severity of 20 days (IQR: 7-57). Concussion was the most common
	specific injury diagnosis (five concussions per 1000 hours, 95% CIs 4 to 6). The
	tackle event was associated with the greatest burden of injury (615 days absence
	per 1000 hours 95% CIs 340 to 1112), with ''being tackled'' specifically causing
	the most injuries (28% of all injuries) and concussions (22% of all concussions).This
	is the first multiple-season study of match injuries in elite women''s rugby union
	players. Match injury incidence was similar to that previously reported within
	international women''s rugby union. Injury prevention strategies centred on the
	tackle would focus on high-burden injuries, which are associated with substantial
	player time-loss and financial costs to teams as well as the high-priority area
	of concussions.'
	- 'passage: title: Single, Dual, and Triple Use of Cigarettes, e-Cigarettes, and
	Snus among Adolescents in the Nordic Countries

	abstract: New tobacco and nicotine products have emerged on the market in recent
	years. Most research has concerned only one product at a time, usually e-cigarettes,
	while little is known about the multiple use of tobacco and nicotine products
	among adolescents. We examined single, dual, and triple use of cigarettes, e-cigarettes,
	and snus among Nordic adolescents, using data of 15–16-year-olds (n = 16,125)
	from the European School Survey Project on Alcohol and other Drugs (ESPAD) collected
	in 2015 and 2019 from Denmark, Finland, Iceland, Norway, Sweden, and the Faroe
	Islands. Country-specific lifetime use of any of these products ranged between
	40% and 50%, and current use between 17% and 31%. Cigarettes were the most common
	product in all countries except for Iceland, where e-cigarettes were remarkably
	more common. The proportion of dual and triple users was unexpectedly high among
	both experimental (24%–49%) and current users (31–42%). Triple use was less common
	than dual use. The users’ patterns varied somewhat between the countries, and
	Iceland differed substantially from the other countries, with a high proportion
	of single e-cigarette users. More knowledge on the patterns of multiple use of
	tobacco and nicotine products and on the potential risk and protective factors
	is needed for targeted intervention and prevention efforts.'
	- 'passage: title: Injury incidence and burden in a youth elite football academy:
	a four-season prospective study of 551 players aged from under 9 to under 19 years

	abstract: Objective Investigate the incidence and burden of injuries by age group
	in youth football (soccer) academy players during four consecutive seasons. Methods
	All injuries that caused time-loss or required medical attention (as per consensus
	definitions) were prospectively recorded in 551 youth football players from under
	9 years to under 19 years. Injury incidence (II) and burden (IB) were calculated
	as number of injuries per squad season (s-s), as well as for type, location and
	age groups. Results A total of 2204 injuries were recorded. 40% (n=882) required
	medical attention and 60% (n=1322) caused time-loss. The total time-loss was 25
	034 days. A squad of 25 players sustained an average of 30 time-loss injuries
	(TLI) per s-s with an IB of 574 days lost per s-s. Compared with the other age
	groups, U-16 players had the highest TLI incidence per s-s (95% CI lower-upper):
	II= 59 (52 to 67); IB=992 days; (963 to 1022) and U-18 players had the greatest
	burden per s-s: II= 42.1 (36.1 to 49.1); IB= 1408 days (1373 to 1444). Across
	the cohort of players, contusions (II=7.7/s-s), sprains (II=4.9/s-s) and growth-related
	injuries (II=4.3/s-s) were the most common TLI. Meniscus/cartilage injuries had
	the greatest injury severity (95% CI lower-upper): II= 0.4 (0.3 to 0.7), IB= 73
	days (22 to 181). The burden (95% CI lower-upper) of physeal fractures (II= 0.8;
	0.6 to 1.2; IB= 58 days; 33 to 78) was double than non-physeal fractures. Summary
	At this youth football academy, each squad of 25 players averaged 30 injuries
	per season which resulted in 574 days lost.'
	- source_sentence: 'query: @DrCanuckMD Pathetic loser. There''s proof they work. It''s
	funny how the folks who refuse to wear them are the same ones claiming they don''t
	work.'
	sentences:
	- 'passage: title: Impact of community masking on COVID-19: A cluster-randomized
	trial in Bangladesh

	abstract: Persuading people to mask Even in places where it is obligatory, people
	tend to optimistically overstate their compliance for mask wearing. How then can
	we persuade more of the population at large to act for the greater good? Abaluck
	et al . undertook a large, cluster-randomized trial in Bangladesh involving hundreds
	of thousands of people (although mostly men) over a 2-month period. Colored masks
	of various construction were handed out free of charge, accompanied by a range
	of mask-wearing promotional activities inspired by marketing research. Using a
	grassroots network of volunteers to help conduct the study and gather data, the
	authors discovered that mask wearing averaged 13.3% in villages where no interventions
	took place but increased to 42.3% in villages where in-person interventions were
	introduced. Villages where in-person reinforcement of mask wearing occurred also
	showed a reduction in reporting COVID-like illness, particularly in high-risk
	individuals. —CA'
	- 'passage: . Analysis of survey data found that on the third day before policy
	introduction, 44% of participants reported “often” or “always” wearing a mask;
	on the fourth day after, 100% reported “always” doing so.


	title: The introduction of a mandatory mask policy was associated with significantly
	reduced COVID-19 cases in a major metropolitan city

	No potentially confounding factors were associated with the observed change in
	growth rates. Conclusions The mandatory mask use policy substantially increased
	public use of masks and was associated with a significant decline in new COVID-19
	cases after introduction of the policy. This study strongly supports the use of
	masks for controlling epidemics in the broader community.'
	- 'passage: title: Food and soft drink industry has too much influence over US dietary
	guidelines, report says

	abstract: A powerful, industry funded group is playing an “outsized role” in steering
	the development of new US dietary guidelines and must have its influence curbed
	to protect public health, a pressure group has urged.


	In a report published this week to coincide with Coca-Cola’s annual meeting of
	shareholders,1 the campaign group Corporate Accountability noted that over half
	of people appointed to the US 2020 Dietary Guidelines Advisory Committee had ties
	to the International Life Sciences Institute (ILSI), whose funders include Coke
	and other global corporations.


	ILSI was set up by a Coca-Cola executive 40 years ago in the US and operates throughout
	the world. It is a not-for-profit organisation and …'
	- source_sentence: 'query: The output of many crops in the US is curbed by a shortage
	of pollinators, and most of the pollination that''s occurring is thanks to wild
	pollinators. Compelling evidence that we need to help wild pollinators!'
	sentences:
	- 'passage: title: Elapsed time since BNT162b2 vaccine and risk of SARS-CoV-2 infection
	in a large cohort

	abstract: Israel was among the first countries to launch a large-scale COVID-19
	vaccination campaign, and quickly vaccinated its population, achieving early control
	over the spread of the virus. However, the number of COVID-19 cases is now rapidly
	increasing, which may indicate that vaccine protection decreases over time. To
	determine whether time elapsed since the second BNT162b2 messenger RNA (mRNA)
	vaccine (Pfizer-BioNTech) injection is significantly associated with the risk
	of post-vaccination COVID-19 infection. This is a retrospective cohort study performed
	in a large state-mandated health care organization in Israel. All fully vaccinated
	adults who have received a RT-PCR test between May 15, 2021 and July 26, 2021,
	at least two weeks after their second vaccine injection were included. Patients
	with a history of past COVID-19 infection were excluded. Positive result for the
	RT-PCR test. The cohort included 33,993 fully vaccinated adults, 49% women, with
	a mean age of 47 years (SD, 17 years), who received an RT-PCR test for SARS-CoV-2
	during the study period. The median time between the second dose of the vaccine
	and the RT-PCR test was 146 days, interquartile range [121-167] days. 608 (1.8%)
	patients had positive test results. There was a significantly higher rate of positive
	results among patients who received their second vaccine dose at least 146 days
	before the RT-PCR test compared to patients who have received their vaccine less
	than 146 days before: odds ratio for infection was 3.00 for patients aged over
	60 (95% CI 1.86-5.11); 2.29 for patients aged between 40 and 59 (95% CI 1.67-3.17);
	and 1.74 for patients aged between 18 and 39 (95% CI 1.27-2.37); P<0.001 in each
	age group.'
	- 'passage: title: Crop production in the USA is frequently limited by a lack of
	pollinators

	abstract: Most of the world''s crops depend on pollinators, so declines in both
	managed and wild bees raise concerns about food security. However, the degree
	to which insect pollination is actually limiting current crop production is poorly
	understood, as is the role of wild species (as opposed to managed honeybees) in
	pollinating crops, particularly in intensive production areas. We established
	a nationwide study to assess the extent of pollinator limitation in seven crops
	at 131 locations situated across major crop-producing areas of the USA. We found
	that five out of seven crops showed evidence of pollinator limitation. Wild bees
	and honeybees provided comparable amounts of pollination for most crops, even
	in agriculturally intensive regions. We estimated the nationwide annual production
	value of wild pollinators to the seven crops we studied at over $1.5 billion;
	the value of wild bee pollination of all pollinator-dependent crops would be much
	greater. Our findings show that pollinator declines could translate directly into
	decreased yields or production for most of the crops studied, and that wild species
	contribute substantially to pollination of most study crops in major crop-producing
	regions.'
	- 'passage: title: Historical decrease in agricultural landscape diversity is associated
	with shifts in bumble bee species occurrence

	abstract: Abstract Agricultural intensification is a key suspect among putative
	drivers of recent insect declines, but an explicit link between historical change
	in agricultural land cover and insect occurrence is lacking. Determining whether
	agriculture impacts beneficial insects (e.g. pollinators), is crucial to enhancing
	agricultural sustainability. Here, we combine large spatiotemporal sets of historical
	bumble bee and agricultural records to show that increasing cropland extent and
	decreasing crop richness were associated with declines in over 50% of bumble bee
	species in the agriculturally intensive Midwest, USA. Critically, we found that
	high crop diversity was associated with a higher occurrence of many species pre‐1950
	even in agriculturally dominated areas, but that current agricultural landscapes
	are devoid of high crop diversity. Our findings suggest that insect conservation
	and agricultural production may be compatible, with increasing on‐farm and landscape‐level
	crop diversity predicted to have positive effects on bumble bees.'
	- source_sentence: 'query: @user Masern sind hier passenderer Vergleich. Die Beeinträchtigung
	des Immunsystems sollte, gegenüber Aids, umkehrbar sein'
	sentences:
	- 'passage: title: Long-term measles-induced immunomodulation increases overall
	childhood infectious disease mortality

	abstract: Extra dividends from measles vaccine Vaccination against measles has
	many benefits, not only lifelong protection against this potentially serious virus.
	Mina et al. analyzed data collected since mass vaccination began in high-income
	countries when measles was common. Measles vaccination is associated with less
	mortality from other childhood infections. Measles is known to cause transient
	immunosuppression, but close inspection of the mortality data suggests that it
	disables immune memory for 2 to 3 years. Vaccination thus does more than safeguard
	children against measles; it also stops other infections taking advantage of measles-induced
	immune damage. Science , this issue p. 694'
	- 'passage: title: Association of BCG, DTP, and measles containing vaccines with
	childhood mortality: systematic review

	abstract: <b>Objectives</b> To evaluate the effects on non-specific and all
	cause mortality, in children under 5, of Bacillus Calmette-Guérin (BCG), diphtheria-tetanus-pertussis
	(DTP), and standard titre measles containing vaccines (MCV); to examine internal
	validity of the studies; and to examine any modifying effects of sex, age, vaccine
	sequence, and co-administration of vitamin A. <b>Design</b> Systematic review,
	including assessment of risk of bias, and meta-analyses of similar studies. <b>Study
	eligibility criteria</b> Clinical trials, cohort studies, and case-control
	studies of the effects on mortality of BCG, whole cell DTP, and standard titre
	MCV in children under 5. <b>Data sources</b> Searches of Medline, Embase,
	Global Index Medicus, and the WHO International Clinical Trials Registry Platform,
	supplemented by contact with experts in the field. To avoid overlap in children
	studied across the included articles, findings from non-overlapping birth cohorts
	were identified. <b>Results</b> Results from 34 birth cohorts were identified.
	Most evidence was from observational studies, with some from short term clinical
	trials. Most studies reported on all cause (rather than non-specific) mortality.
	Receipt of BCG vaccine was associated with a reduction in all cause mortality:
	the average relative risks were 0.70 (95% confidence interval 0.49 to 1.01) from
	five clinical trials and 0.47 (0.32 to 0.69) from nine observational studies at
	high risk of bias. Receipt of DTP (almost always with oral polio vaccine) was
	associated with a possible increase in all cause mortality on average (relative
	risk 1.38, 0.92 to 2.08) from 10 studies at high risk of bias; this effect seemed
	stronger in girls than in boys.'
	- 'passage: title: L’effet des dictées métacognitives-interactives sur la compétence
	à orthographier les homophones grammaticaux en rédaction

	abstract: Les homophones grammaticaux sont souvent présentés en paires dans les
	exercices, au risque de conforter leur confusion. Dans un projet mené au Québec
	dans des classes du primaire et du secondaire (482 élèves), la phrase dictée du
	jour et la dictée zéro faute ont été expérimentées pendant sept mois. Les effets
	de ces dictées métalinguistiques-interactives sur la compétence des élèves à orthographier
	les homophones grammaticaux font l''objet de cet article. Les résultats montrent
	des effets positifs pour l''ensemble du groupe ; une analyse plus fine révèle
	à qui elles profitent le plus.'
	- source_sentence: 'query: It’s been obvious for ages that mRNA vaccines constituted
	a 3+ dose series. A 3‑dose series is very effective. The fourth dose is still
	better and ought to be made available. Why does Canada still label partially (2
	dose) vaccinated as “fully vaccinated”?'
	sentences:
	- 'passage: title: Protection against omicron severe disease 0-7 months after BNT162b2
	booster

	abstract: Abstract Following a rise in cases due to the delta variant and evidence
	of waning immunity after 2 doses of the BNT162b2 vaccine, Israel began administering
	a third BNT162b2 dose (booster) in July 2021. Recent studies showed that the 3rd
	dose provides a much lower protection against infection with the omicron variant
	compared to the delta variant and that this protection wanes quickly. In this
	study, we used data from Israel to estimate the protection of the 3rd dose against
	severe disease up to 7 months from receiving the booster dose. The analysis shows
	that protection conferred by the 3rd dose against omicron did not wane over a
	7-month period and that a 4th dose further increased protection, with a severe
	disease rate approximately 3-fold lower than in the 3-dose cohorts.'
	- 'passage: title: Neurovascular injury with complement activation and inflammation
	in COVID-19

	abstract: The underlying mechanisms by which severe acute respiratory syndrome
	coronavirus 2 (SARS-CoV-2) leads to acute and long-term neurological manifestations
	remains obscure. We aimed to characterize the neuropathological changes in patients
	with coronavirus disease 2019 and determine the underlying pathophysiological
	mechanisms. In this autopsy study of the brain, we characterized the vascular
	pathology, the neuroinflammatory changes and cellular and humoral immune responses
	by immunohistochemistry. All patients died during the first wave of the pandemic
	from March to July 2020. All patients were adults who died after a short duration
	of the infection, some had died suddenly with minimal respiratory involvement.
	Infection with SARS-CoV-2 was confirmed on ante-mortem or post-mortem testing.
	Descriptive analysis of the pathological changes and quantitative analyses of
	the infiltrates and vascular changes were performed. All patients had multifocal
	vascular damage as determined by leakage of serum proteins into the brain parenchyma.
	This was accompanied by widespread endothelial cell activation. Platelet aggregates
	and microthrombi were found adherent to the endothelial cells along vascular lumina.
	Immune complexes with activation of the classical complement pathway were found
	on the endothelial cells and platelets. Perivascular infiltrates consisted of
	predominantly macrophages and some CD8+ T cells. Only rare CD4+ T cells and CD20+
	B cells were present. Astrogliosis was also prominent in the perivascular regions.
	Microglial nodules were predominant in the hindbrain, which were associated with
	focal neuronal loss and neuronophagia. Antibody-mediated cytotoxicity directed
	against the endothelial cells is the most likely initiating event that leads to
	vascular leakage, platelet aggregation, neuroinflammation and neuronal injury.
	Therapeutic modalities directed against immune complexes should be considered.'
	- 'passage: title: A fourth dose of the mRNA-1273 SARS-CoV-2 vaccine improves serum
	neutralization against the delta variant in kidney transplant recipients

	abstract: Abstract In immunocompetent subjects, the effectiveness of SARS-CoV-2
	vaccines against the delta variant appears three- to five-fold lower than that
	observed against the alpha variant. Additionally, three doses of SARS-CoV-2 mRNA-based
	vaccines might be unable to elicit a sufficient immune response against any variant
	in immunocompromised kidney transplant recipients. This study describes the kinetics
	of the neutralizing antibody (NAbs) response against the delta strain before and
	after a fourth dose of a mRNA vaccine in 67 kidney transplant recipients who had
	experienced a weak antibody response after three doses. While only 16% of patients
	harbored NAbs against the delta strain prior to the fourth injection – this percentage
	raised to 66% afterwards. We also found that, after the fourth dose, the NAbs
	titer increased significantly (p=0.0001) from <7.5 (IQR : <7.5−15.1) to
	47.1 (IQR <7.5−284.2). Collectively, our data indicate that a fourth dose of
	the mRNA-1273 vaccine in kidney transplant recipients with a weak antibody response
	after three previous doses improves serum neutralization against the delta variant.'
	pipeline_tag: sentence-similarity
	library_name: sentence-transformers
	metrics:
	- cosine_accuracy@1
	- cosine_accuracy@3
	- cosine_accuracy@5
	- cosine_accuracy@10
	- cosine_precision@1
	- cosine_precision@3
	- cosine_precision@5
	- cosine_precision@10
	- cosine_recall@1
	- cosine_recall@3
	- cosine_recall@5
	- cosine_recall@10
	- cosine_ndcg@10
	- cosine_mrr@10
	- cosine_map@100
	model-index:
	- name: SentenceTransformer based on intfloat/multilingual-e5-large-instruct
	  results:
	  - task:
	      type: information-retrieval
	      name: Information Retrieval
	    dataset:
	      name: 10 percent dev split
	      type: 10-percent-dev-split
	    metrics:
	    - type: cosine_accuracy@1
	      value: 0.4748051948051948
	      name: Cosine Accuracy@1
	    - type: cosine_accuracy@3
	      value: 0.6581818181818182
	      name: Cosine Accuracy@3
	    - type: cosine_accuracy@5
	      value: 0.7148051948051948
	      name: Cosine Accuracy@5
	    - type: cosine_accuracy@10
	      value: 0.7781818181818182
	      name: Cosine Accuracy@10
	    - type: cosine_precision@1
	      value: 0.4748051948051948
	      name: Cosine Precision@1
	    - type: cosine_precision@3
	      value: 0.2193939393939394
	      name: Cosine Precision@3
	    - type: cosine_precision@5
	      value: 0.14296103896103896
	      name: Cosine Precision@5
	    - type: cosine_precision@10
	      value: 0.07781818181818181
	      name: Cosine Precision@10
	    - type: cosine_recall@1
	      value: 0.4748051948051948
	      name: Cosine Recall@1
	    - type: cosine_recall@3
	      value: 0.6581818181818182
	      name: Cosine Recall@3
	    - type: cosine_recall@5
	      value: 0.7148051948051948
	      name: Cosine Recall@5
	    - type: cosine_recall@10
	      value: 0.7781818181818182
	      name: Cosine Recall@10
	    - type: cosine_ndcg@10
	      value: 0.6259059611181643
	      name: Cosine Ndcg@10
	    - type: cosine_mrr@10
	      value: 0.5771395588538446
	      name: Cosine Mrr@10
	    - type: cosine_map@100
	      value: 0.5824203727726155
	      name: Cosine Map@100
	---

	# SentenceTransformer based on intfloat/multilingual-e5-large-instruct

	This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct). It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for retrieval. In the examples in this card, queries carry a `query: ` prefix and passages a `passage: ` prefix.

	## Model Details

	### Model Description
	- Model Type: Sentence Transformer
	- Base model: [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct) <!-- at revision 274baa43b0e13e37fafa6428dbc7938e62e5c439 -->
	- Maximum Sequence Length: 512 tokens
	- Output Dimensionality: 1024 dimensions
	- Similarity Function: Cosine Similarity
	- Supported Modality: Text
	<!-- - Training Dataset: Unknown -->
	<!-- - Language: Unknown -->
	<!-- - License: Unknown -->

	### Model Sources

	- Documentation: [Sentence Transformers Documentation](https://sbert.net)
	- Repository: [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
	- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

	### Full Model Architecture

	```
	SentenceTransformer(
	  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'XLMRobertaModel'})
	  (1): Pooling({'embedding_dimension': 1024, 'pooling_mode': 'mean', 'include_prompt': True})
	  (2): Normalize({})
	)
	```
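Modules (1) and (2) turn the per-token vectors from the transformer into a single unit-length sentence vector. The following is a rough NumPy sketch of that computation, not the library's own code; the function name `mean_pool_and_normalize` and the toy inputs are illustrative assumptions.

```python
import numpy as np


def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Masked mean over tokens, followed by L2 normalization.

    token_embeddings: array of shape (batch, seq_len, dim)
    attention_mask:   array of shape (batch, seq_len), 1 for real tokens, 0 for padding
    """
    tokens = np.asarray(token_embeddings, dtype=np.float64)
    mask = np.asarray(attention_mask, dtype=np.float64)[..., None]

    # Sum only the real tokens, then divide by how many there were.
    summed = (tokens * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    pooled = summed / counts

    # Scale each sentence vector to unit length.
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)


# Toy input (hypothetical): 2 sequences, 3 token positions, 4 dimensions.
# The last position of the first sequence is padding.
rng = np.random.default_rng(0)
toy_tokens = rng.normal(size=(2, 3, 4))
toy_mask = np.array([[1, 1, 0], [1, 1, 1]])

sentence_embeddings = mean_pool_and_normalize(toy_tokens, toy_mask)
print(sentence_embeddings.shape)  # (2, 4)
```

Because the output vectors have unit length, the cosine similarity between two embeddings equals their dot product.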

	## Usage

	### Direct Usage (Sentence Transformers)

	First install the Sentence Transformers library:

	```bash
	pip install -U sentence-transformers
	```
	Then you can load this model and run inference.
	```python
	from sentence_transformers import SentenceTransformer

	# Download from the 🤗 Hub
	model = SentenceTransformer("MinhPhuc0804/me5-512-docling-checkthat-task1-v1.2")
	# Run inference
	sentences = [
	'query: It’s been obvious for ages that mRNA vaccines constituted a 3+ dose series. A 3‑dose series is very effective. The fourth dose is still better and ought to be made available. Why does Canada still label partially (2 dose) vaccinated as “fully vaccinated”?',
	'passage: title: Protection against omicron severe disease 0-7 months after BNT162b2 booster\nabstract: Abstract Following a rise in cases due to the delta variant and evidence of waning immunity after 2 doses of the BNT162b2 vaccine, Israel began administering a third BNT162b2 dose (booster) in July 2021. Recent studies showed that the 3rd dose provides a much lower protection against infection with the omicron variant compared to the delta variant and that this protection wanes quickly. In this study, we used data from Israel to estimate the protection of the 3rd dose against severe disease up to 7 months from receiving the booster dose. The analysis shows that protection conferred by the 3rd dose against omicron did not wane over a 7-month period and that a 4th dose further increased protection, with a severe disease rate approximately 3-fold lower than in the 3-dose cohorts.',
	'passage: title: A fourth dose of the mRNA-1273 SARS-CoV-2 vaccine improves serum neutralization against the delta variant in kidney transplant recipients\nabstract: Abstract In immunocompetent subjects, the effectiveness of SARS-CoV-2 vaccines against the delta variant appears three- to five-fold lower than that observed against the alpha variant. Additionally, three doses of SARS-CoV-2 mRNA-based vaccines might be unable to elicit a sufficient immune response against any variant in immunocompromised kidney transplant recipients. This study describes the kinetics of the neutralizing antibody (NAbs) response against the delta strain before and after a fourth dose of a mRNA vaccine in 67 kidney transplant recipients who had experienced a weak antibody response after three doses. While only 16% of patients harbored NAbs against the delta strain prior to the fourth injection – this percentage raised to 66% afterwards. We also found that, after the fourth dose, the NAbs titer increased significantly (p=0.0001) from <7.5 (IQR : <7.5−15.1) to 47.1 (IQR <7.5−284.2). Collectively, our data indicate that a fourth dose of the mRNA-1273 vaccine in kidney transplant recipients with a weak antibody response after three previous doses improves serum neutralization against the delta variant.',
	]
	embeddings = model.encode(sentences)
	print(embeddings.shape)
	# [3, 1024]

	# Get the similarity scores for the embeddings
	similarities = model.similarity(embeddings, embeddings)
	print(similarities)
	# tensor([[1.0000, 0.8132, 0.2067],
	# [0.8132, 1.0000, 0.1794],
	# [0.2067, 0.1794, 1.0000]])
	```
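
The example inputs above prefix each tweet with `query: ` and each paper with `passage: title: ...` followed by a newline and `abstract: ...`. The following is a minimal sketch of helpers that build strings in that layout. The function names `format_query` and `format_passage` are hypothetical (they are not part of this model or of Sentence Transformers), and the example tweet is made up for illustration.

```python
def format_query(tweet: str) -> str:
    """Prefix a tweet the way the example queries above are prefixed."""
    return f"query: {tweet.strip()}"


def format_passage(title: str, abstract: str) -> str:
    """Build a passage string in the title/abstract layout used above."""
    return f"passage: title: {title.strip()}\nabstract: {abstract.strip()}"


# Made-up tweet; the title is taken from the example passages above.
query = format_query("Does a third vaccine dose still protect after 7 months?")
passage = format_passage(
    "Protection against omicron severe disease 0-7 months after BNT162b2 booster",
    "Abstract Following a rise in cases due to the delta variant ...",
)
print(query)
print(passage)
```

The resulting strings can be passed to `model.encode` in the same way as the `sentences` list above.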

	## Evaluation

	### Metrics

	#### Information Retrieval

	* Dataset: `10-percent-dev-split`
	* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.sentence_transformer.evaluation.InformationRetrievalEvaluator)

	| Metric              | Value  |
	|:--------------------|:-------|
	| cosine_accuracy@1   | 0.4748 |
	| cosine_accuracy@3   | 0.6582 |
	| cosine_accuracy@5   | 0.7148 |
	| cosine_accuracy@10  | 0.7782 |
	| cosine_precision@1  | 0.4748 |
	| cosine_precision@3  | 0.2194 |
	| cosine_precision@5  | 0.143  |
	| cosine_precision@10 | 0.0778 |
	| cosine_recall@1     | 0.4748 |
	| cosine_recall@3     | 0.6582 |
	| cosine_recall@5     | 0.7148 |
	| cosine_recall@10    | 0.7782 |
	| cosine_ndcg@10      | 0.6259 |
	| cosine_mrr@10       | 0.5771 |
	| cosine_map@100      | 0.5824 |
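
In the table, accuracy@k equals recall@k and precision@k equals accuracy@k divided by k. That pattern is what one would expect if every query has exactly one relevant passage; this is an inference from the numbers, not something the card states. Under that assumption, the metrics can be sketched from the 1-based rank of the relevant passage. The function name `ir_metrics` and the example ranks are hypothetical.

```python
import math


def ir_metrics(ranks, k):
    """Compute accuracy/precision/recall/MRR/nDCG at k.

    `ranks` holds, per query, the 1-based rank of its single relevant
    passage, or None if that passage was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    mrr = sum(1.0 / r for r, h in zip(ranks, hits) if h) / n
    # With one relevant passage the ideal DCG is 1, so nDCG = 1 / log2(rank + 1).
    ndcg = sum(1.0 / math.log2(r + 1) for r, h in zip(ranks, hits) if h) / n
    return {
        "accuracy": accuracy,
        "precision": accuracy / k,  # at most one hit among the top k
        "recall": accuracy,         # one relevant passage per query
        "mrr": mrr,
        "ndcg": ndcg,
    }


# Illustrative ranks for four queries (made up, not taken from the dev split).
example_ranks = [1, 3, None, 12]
metrics = ir_metrics(example_ranks, k=10)
print(metrics)
```

Here two of the four queries have their relevant passage in the top 10, so accuracy@10 and recall@10 are both 0.5 and precision@10 is 0.05.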

	## Training Details

	### Training Dataset

	#### Unnamed Dataset

	* Size: 17,319 training samples
	* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>sentence_2</code>
	* Approximate statistics based on the first 1000 samples:
	|         | sentence_0 | sentence_1 | sentence_2 |
	|:--------|:-----------|:-----------|:-----------|
	| type    | string | string | string |
	| details | <ul><li>min: 25 tokens</li><li>mean: 58.26 tokens</li><li>max: 105 tokens</li></ul> | <ul><li>min: 28 tokens</li><li>mean: 311.97 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 20 tokens</li><li>mean: 320.42 tokens</li><li>max: 512 tokens</li></ul> |
	* Samples:
	| sentence_0 | sentence_1 | sentence_2 |
	|:-----------|:-----------|:-----------|
	| <code>query: I was fact-checked when I covered this topic for @user last year. Since the story back then was, apparently, that damp strips of fabric dangling over people's faces for hours on end couldn't possibly spawn anything nasty - because Science™!!</code> | <code>passage: title: Bacterial and fungal isolation from face masks under the COVID-19 pandemic<br>abstract: Abstract The COVID-19 pandemic has led people to wear face masks daily in public. Although the effectiveness of face masks against viral transmission has been extensively studied, there have been few reports on potential hygiene issues due to bacteria and fungi attached to the face masks. We aimed to (1) quantify and identify the bacteria and fungi attaching to the masks, and (2) investigate whether the mask-attached microbes could be associated with the types and usage of the masks and individual lifestyles. We surveyed 109 volunteers on their mask usage and lifestyles, and cultured bacteria and fungi from either the face-side or outer-side of their masks. The bacterial colony numbers were greater on the face-side than the outer-side; the fungal colony numbers were fewer on the face-side than the outer-side. A longer mask usage significantly increased the fungal colony numbers but not ...</code> | <code>passage: is very low.<br><br>title: Do facemasks protect against <scp>COVID</scp>‐19?<br>Symptomatic health-care workers should not return to work until they have been tested and found to be negative for COVID-19. The public might wear masks to avoid infection or to protect others. During the 2009 pandemic of H1N1 influenza (swine flu), encouraging the public to wash their hands reduced the incidence of infection significantly whereas wearing facemasks did not.5 There is no good evidence that facemasks protect the public against infection with respiratory viruses, including COVID-19.6 However, absence of proof of an effect is not the same as proof of absence of an effect. During the pandemics caused by swine flu and by the coronaviruses which caused SARS and MERS, many people in Asia and elsewhere walked around wearing surgical or homemade cotton masks to protect themselves. One danger of doing this is the illusion of protection. Surgical facemasks are designed to be discarded after single use....</code> |
	| <code>query: @user If just the US government had some National Institution of Health entity which could’ve been showcasing, studying and verifying this type of advantage from data years earlier .. to apply immediately without political spin</code> | <code>passage: title: Chloroquine is a potent inhibitor of SARS coronavirus infection and spread<br>abstract: Abstract Background Severe acute respiratory syndrome (SARS) is caused by a newly discovered coronavirus (SARS-CoV). No effective prophylactic or post-exposure therapy is currently available. Results We report, however, that chloroquine has strong antiviral effects on SARS-CoV infection of primate cells. These inhibitory effects are observed when the cells are treated with the drug either before or after exposure to the virus, suggesting both prophylactic and therapeutic advantage. In addition to the well-known functions of chloroquine such as elevations of endosomal pH, the drug appears to interfere with terminal glycosylation of the cellular receptor, angiotensin-converting enzyme 2. This may negatively influence the virus-receptor binding and abrogate the infection, with further ramifications by the elevation of vesicular pH, resulting in the inhibition of infection and spread of SAR...</code> | <code>passage: title: A National Medical Response to Crisis — The Legacy of World War II<br>abstract: A National Medical Response to Crisis World War II’s massive casualties were mitigated by lives saved as a result of medical care. Many of the advances made would persist long after the war conclud...</code> |
	| <code>query: UNDENIABLE EVIDENCE OF MY SPIKE PROTEIN TRIGGERED WIDESPREAD AMYLOIDOSES THEORY. IT. IS. OCCURRING.</code> | <code>passage: title: Amyloidogenesis of SARS-CoV-2 Spike Protein<br>abstract: ABSTRACT SARS-CoV-2 infection is associated with a surprising number of morbidities. Uncanny similarities with amyloid-disease associated blood coagulation and fibrinolytic disturbances together with neurologic and cardiac problems led us to investigate the amyloidogenicity of the SARS-CoV-2 Spike protein (S-protein). Amyloid fibril assays of peptide library mixtures and theoretical predictions identified seven amyloidogenic sequences within the S-protein. All seven peptides in isolation formed aggregates during incubation at 37°C. Three 20-amino acid long synthetic Spike peptides (sequence 191-210, 599-618, 1165-1184) fulfilled three amyloid fibril criteria: nucleation dependent polymerization kinetics by ThT, Congo red positivity and ultrastructural fibrillar morphology. Full-length folded S-protein did not form amyloid fibrils, but amyloid-like fibrils with evident branching were formed during 24 hours of S-protei...</code> | <code>passage: title: Amyloidogenesis of SARS-CoV-2 Spike Protein<br>abstract: SARS-CoV-2 infection is associated with a surprising number of morbidities. Uncanny similarities with amyloid-disease associated blood coagulation and fibrinolytic disturbances together with neurologic and cardiac problems led us to investigate the amyloidogenicity of the SARS-CoV-2 spike protein (S-protein). Amyloid fibril assays of peptide library mixtures and theoretical predictions identified seven amyloidogenic sequences within the S-protein. All seven peptides in isolation formed aggregates during incubation at 37 °C. Three 20-amino acid long synthetic spike peptides (sequence 192–211, 601–620, 1166–1185) fulfilled three amyloid fibril criteria: nucleation dependent polymerization kinetics by ThT, Congo red positivity, and ultrastructural fibrillar morphology. Full-length folded S-protein did not form amyloid fibrils, but amyloid-like fibrils with evident branching were formed during 24 h of S-protein coincubat...</code> |
	* Loss: <code>__main__.TripletMNRLCombinedLoss</code>
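
The loss is a custom class defined in the training script (`__main__`), so its definition is not part of this card. Judging only by its name, it may combine a triplet objective with a multiple-negatives ranking (MNRL) objective over the `sentence_0`, `sentence_1`, and `sentence_2` columns. The sketch below shows one plausible combination in plain Python; it is not the author's implementation, and the function `combined_loss` and the values of `scale`, `margin`, and `alpha` are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def combined_loss(anchors, positives, negatives, scale=20.0, margin=0.2, alpha=0.5):
    """Hypothetical mix of an in-batch ranking loss and a triplet margin loss."""
    n = len(anchors)
    candidates = positives + negatives  # all positives and negatives in the batch
    ranking, triplet = 0.0, 0.0
    for i, a in enumerate(anchors):
        # Ranking term: softmax cross-entropy with positives[i] as the target.
        logits = [scale * cosine(a, c) for c in candidates]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        ranking += log_sum_exp - logits[i]
        # Triplet term: the positive should beat the paired negative by `margin`.
        triplet += max(0.0, margin - cosine(a, positives[i]) + cosine(a, negatives[i]))
    return alpha * ranking / n + (1.0 - alpha) * triplet / n


# Toy 2-d embeddings: a well-separated batch versus one with roles swapped.
anchors = [[1.0, 0.0], [0.0, 1.0]]
close = [[1.0, 0.0], [0.0, 1.0]]
far = [[-1.0, 0.0], [0.0, -1.0]]
good_loss = combined_loss(anchors, positives=close, negatives=far)
bad_loss = combined_loss(anchors, positives=far, negatives=close)
print(good_loss < bad_loss)
```

In this toy batch the loss is close to zero when positives align with their anchors and large when positives and negatives are swapped, which is the behaviour either component would encourage on its own.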

	### Training Hyperparameters
	#### Non-Default Hyperparameters

	- `per_device_train_batch_size`: 48
	- `per_device_eval_batch_size`: 48
	- `num_train_epochs`: 20
	- `fp16`: True
	- `multi_dataset_batch_sampler`: round_robin
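
As a quick check on how these settings relate to the step counts in the training logs, the arithmetic can be sketched as follows. It assumes a single training device; `gradient_accumulation_steps` is 1 and `dataloader_drop_last` is False according to the full hyperparameter list, so the last partial batch counts as a step.

```python
import math

num_samples = 17_319   # training samples reported in the Training Dataset section
batch_size = 48        # per_device_train_batch_size
num_epochs = 20        # num_train_epochs
num_devices = 1        # assumption: one device

# The final, smaller batch is kept, hence the ceiling.
steps_per_epoch = math.ceil(num_samples / (batch_size * num_devices))
total_steps = steps_per_epoch * num_epochs
print(steps_per_epoch, total_steps)  # 361 7220
```

Both values agree with the training logs, where epoch 1.0 ends at step 361 and epoch 20.0 at step 7220.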

	#### All Hyperparameters
	<details><summary>Click to expand</summary>

	- `overwrite_output_dir`: False
	- `do_predict`: False
	- `prediction_loss_only`: True
	- `per_device_train_batch_size`: 48
	- `per_device_eval_batch_size`: 48
	- `per_gpu_train_batch_size`: None
	- `per_gpu_eval_batch_size`: None
	- `gradient_accumulation_steps`: 1
	- `eval_accumulation_steps`: None
	- `torch_empty_cache_steps`: None
	- `learning_rate`: 5e-05
	- `weight_decay`: 0.0
	- `adam_beta1`: 0.9
	- `adam_beta2`: 0.999
	- `adam_epsilon`: 1e-08
	- `max_grad_norm`: 1
	- `num_train_epochs`: 20
	- `max_steps`: -1
	- `lr_scheduler_type`: linear
	- `lr_scheduler_kwargs`: {}
	- `warmup_ratio`: 0.0
	- `warmup_steps`: 0
	- `log_level`: passive
	- `log_level_replica`: warning
	- `log_on_each_node`: True
	- `logging_nan_inf_filter`: True
	- `save_safetensors`: True
	- `save_on_each_node`: False
	- `save_only_model`: False
	- `restore_callback_states_from_checkpoint`: False
	- `no_cuda`: False
	- `use_cpu`: False
	- `use_mps_device`: False
	- `seed`: 42
	- `data_seed`: None
	- `jit_mode_eval`: False
	- `use_ipex`: False
	- `bf16`: False
	- `fp16`: True
	- `fp16_opt_level`: O1
	- `half_precision_backend`: auto
	- `bf16_full_eval`: False
	- `fp16_full_eval`: False
	- `tf32`: None
	- `local_rank`: 0
	- `ddp_backend`: None
	- `tpu_num_cores`: None
	- `tpu_metrics_debug`: False
	- `debug`: []
	- `dataloader_drop_last`: False
	- `dataloader_num_workers`: 0
	- `dataloader_prefetch_factor`: None
	- `past_index`: -1
	- `disable_tqdm`: False
	- `remove_unused_columns`: True
	- `label_names`: None
	- `load_best_model_at_end`: False
	- `ignore_data_skip`: False
	- `fsdp`: []
	- `fsdp_min_num_params`: 0
	- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
	- `fsdp_transformer_layer_cls_to_wrap`: None
	- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
	- `parallelism_config`: None
	- `deepspeed`: None
	- `label_smoothing_factor`: 0.0
	- `optim`: adamw_torch_fused
	- `optim_args`: None
	- `adafactor`: False
	- `group_by_length`: False
	- `length_column_name`: length
	- `ddp_find_unused_parameters`: None
	- `ddp_bucket_cap_mb`: None
	- `ddp_broadcast_buffers`: False
	- `dataloader_pin_memory`: True
	- `dataloader_persistent_workers`: False
	- `skip_memory_metrics`: True
	- `use_legacy_prediction_loop`: False
	- `push_to_hub`: False
	- `resume_from_checkpoint`: None
	- `hub_model_id`: None
	- `hub_strategy`: every_save
	- `hub_private_repo`: None
	- `hub_always_push`: False
	- `hub_revision`: None
	- `gradient_checkpointing`: False
	- `gradient_checkpointing_kwargs`: None
	- `include_inputs_for_metrics`: False
	- `include_for_metrics`: []
	- `eval_do_concat_batches`: True
	- `fp16_backend`: auto
	- `push_to_hub_model_id`: None
	- `push_to_hub_organization`: None
	- `mp_parameters`:
	- `auto_find_batch_size`: False
	- `full_determinism`: False
	- `torchdynamo`: None
	- `ray_scope`: last
	- `ddp_timeout`: 1800
	- `torch_compile`: False
	- `torch_compile_backend`: None
	- `torch_compile_mode`: None
	- `include_tokens_per_second`: False
	- `include_num_input_tokens_seen`: False
	- `neftune_noise_alpha`: None
	- `optim_target_modules`: None
	- `batch_eval_metrics`: False
	- `eval_on_start`: False
	- `use_liger_kernel`: False
	- `liger_kernel_config`: None
	- `eval_use_gather_object`: False
	- `average_tokens_across_devices`: False
	- `prompts`: None
	- `batch_sampler`: batch_sampler
	- `multi_dataset_batch_sampler`: round_robin
	- `router_mapping`: {}
	- `learning_rate_mapping`: {}

	</details>
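
The list above gives `learning_rate` 5e-05, `lr_scheduler_type` linear, and `warmup_steps` 0. A minimal sketch of what a linear schedule with these settings looks like is shown below. The function `linear_lr` is hypothetical, the total of 7220 steps is taken from the last row of the training logs, and the exact per-step values in the actual run may differ.

```python
def linear_lr(step, base_lr=5e-5, total_steps=7220, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, midpoint, and end of training.
for step in (0, 3610, 7220):
    print(step, linear_lr(step))
```

With no warmup, the rate starts at its full value and reaches half of it at step 3610, the end of epoch 10.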

	### Training Logs
	| Epoch   | Step | Training Loss | 10-percent-dev-split_cosine_ndcg@10 |
	|:-------:|:----:|:-------------:|:-----------------------------------:|
	| 1.0     | 361  | -             | 0.6980                              |
	| 1.3850  | 500  | 1.6273        | -                                   |
	| 2.0     | 722  | -             | 0.7033                              |
	| 2.7701  | 1000 | 0.9528        | -                                   |
	| 3.0     | 1083 | -             | 0.7110                              |
	| 4.0     | 1444 | -             | 0.6994                              |
	| 4.1551  | 1500 | 0.6268        | -                                   |
	| 5.0     | 1805 | -             | 0.6933                              |
	| 5.5402  | 2000 | 0.4279        | -                                   |
	| 6.0     | 2166 | -             | 0.6883                              |
	| 6.9252  | 2500 | 0.3117        | -                                   |
	| 7.0     | 2527 | -             | 0.6620                              |
	| 8.0     | 2888 | -             | 0.6707                              |
	| 8.3102  | 3000 | 0.2262        | -                                   |
	| 9.0     | 3249 | -             | 0.6671                              |
	| 9.6953  | 3500 | 0.1799        | -                                   |
	| 10.0    | 3610 | -             | 0.6579                              |
	| 11.0    | 3971 | -             | 0.6470                              |
	| 11.0803 | 4000 | 0.139         | -                                   |
	| 12.0    | 4332 | -             | 0.6469                              |
	| 12.4654 | 4500 | 0.1094        | -                                   |
	| 13.0    | 4693 | -             | 0.6415                              |
	| 13.8504 | 5000 | 0.0911        | -                                   |
	| 14.0    | 5054 | -             | 0.6439                              |
	| 15.0    | 5415 | -             | 0.6284                              |
	| 15.2355 | 5500 | 0.0755        | -                                   |
	| 16.0    | 5776 | -             | 0.6272                              |
	| 16.6205 | 6000 | 0.0664        | -                                   |
	| 17.0    | 6137 | -             | 0.6290                              |
	| 18.0    | 6498 | -             | 0.6253                              |
	| 18.0055 | 6500 | 0.0573        | -                                   |
	| 19.0    | 6859 | -             | 0.6275                              |
	| 19.3906 | 7000 | 0.052         | -                                   |
	| 20.0    | 7220 | -             | 0.6259                              |
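
The logs show the training loss falling steadily while the dev nDCG@10 peaks early and then declines. Since `load_best_model_at_end` is False and the nDCG@10 in the Evaluation section (0.6259) equals the epoch-20 value, the published metrics appear to describe the final checkpoint rather than the best one. A small sketch for locating the best epoch from the logged values:

```python
# (epoch, dev nDCG@10) pairs copied from the training log table.
dev_ndcg = [
    (1, 0.6980), (2, 0.7033), (3, 0.7110), (4, 0.6994), (5, 0.6933),
    (6, 0.6883), (7, 0.6620), (8, 0.6707), (9, 0.6671), (10, 0.6579),
    (11, 0.6470), (12, 0.6469), (13, 0.6415), (14, 0.6439), (15, 0.6284),
    (16, 0.6272), (17, 0.6290), (18, 0.6253), (19, 0.6275), (20, 0.6259),
]

best_epoch, best_score = max(dev_ndcg, key=lambda pair: pair[1])
final_epoch, final_score = dev_ndcg[-1]
print(best_epoch, best_score)              # 3 0.711
print(round(best_score - final_score, 4))  # 0.0851
```

On this dev split, the epoch-3 checkpoint scores about 0.085 nDCG@10 higher than the final one, so evaluating earlier checkpoints may be worthwhile when reusing this training setup.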


	### Training Time
	- Training: 2.2 hours

	### Framework Versions
	- Python: 3.12.6
	- Sentence Transformers: 5.4.1
	- Transformers: 4.56.0
	- PyTorch: 2.8.0+cu129
	- Accelerate: 1.10.1
	- Datasets: 4.8.5
	- Tokenizers: 0.22.0

	## Citation

	### BibTeX

	#### Sentence Transformers
	```bibtex
	@inproceedings{reimers-2019-sentence-bert,
	title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
	author = "Reimers, Nils and Gurevych, Iryna",
	booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
	month = "11",
	year = "2019",
	publisher = "Association for Computational Linguistics",
	url = "https://arxiv.org/abs/1908.10084",
	}
	```
