phanerozoic committed · Commit caba6d4 · verified · 1 parent: ff1f053

Drop unreproducible design-distance multiple; correct undatabased non-host rate to 99%

Files changed (2)
  1. README.md +3 -3
  2. TOOL.md +1 -1
README.md CHANGED
@@ -213,9 +213,9 @@ the hardest identity task at 0.26, stays above; missense pathogenicity, the stro
  threshold, function tasks (red) below, with DNA and protein on the same axis.*
 
  Generation makes the separation concrete. Coordinate ascent on the host head designs sequences with
- host_score up to 22, against a natural ceiling of about 6, and these designs sit in a region of
- composition space roughly 3.6 times the natural spread away from any real sequence, a region that
- maximizes the detector but holds no biology. The 8B language model rates those same designs as
+ host_score up to 22, against a natural ceiling of about 6, and these designs sit far outside the
+ region of composition space occupied by any natural class, where the detector saturates but no
+ biology lives. The 8B language model rates those same designs as
  increasingly unnatural, so composition and neural naturalness are separable axes rather than two views of
  one thing. `ADVERSARIAL.md` covers the order-(k-1) evasion boundary and why a language model resists it;
  `TOOL.md` covers footprint, throughput, read length, and operating modes.
TOOL.md CHANGED
@@ -42,7 +42,7 @@ conservative retention for enrichment).
 
  - **Strong:** human against bacterial and viral sequence, the clinically dominant contrast.
  - **Reference-free advantage:** on sequence absent from every database, Kraken2 classifies 0% and
- BLAST 6.6%, while this filter calls 100% (97% as non-host). It is the only option when the
+ BLAST 6.6%, while this filter calls 100% (99% as non-host). It is the only option when the
  sequence has no database match, for example environmental or divergent material.
  - **Not for:** discriminating closely related mammals (human vs mouse/rat is weak by composition);
  use it for host-vs-microbe, not for separating vertebrate species.