Drop unreproducible design-distance multiple; correct undatabased non-host rate to 99%
README.md
CHANGED
@@ -213,9 +213,9 @@ the hardest identity task at 0.26, stays above; missense pathogenicity, the stro
 threshold, function tasks (red) below, with DNA and protein on the same axis.*
 
 Generation makes the separation concrete. Coordinate ascent on the host head designs sequences with
-host_score up to 22, against a natural ceiling of about 6, and these designs sit
-composition space
-
+host_score up to 22, against a natural ceiling of about 6, and these designs sit far outside the
+region of composition space occupied by any natural class, where the detector saturates but no
+biology lives. The 8B language model rates those same designs as
 increasingly unnatural, so composition and neural naturalness are separable axes rather than two views of
 one thing. `ADVERSARIAL.md` covers the order-(k-1) evasion boundary and why a language model resists it;
 `TOOL.md` covers footprint, throughput, read length, and operating modes.
TOOL.md
CHANGED
@@ -42,7 +42,7 @@ conservative retention for enrichment).
 
 - **Strong:** human against bacterial and viral sequence, the clinically dominant contrast.
 - **Reference-free advantage:** on sequence absent from every database, Kraken2 classifies 0% and
-BLAST 6.6%, while this filter calls 100% (
+BLAST 6.6%, while this filter calls 100% (99% as non-host). It is the only option when the
 sequence has no database match, for example environmental or divergent material.
 - **Not for:** discriminating closely related mammals (human vs mouse/rat is weak by composition);
 use it for host-vs-microbe, not for separating vertebrate species.