About MarkerSeek
MarkerSeek is an open tool and database for discovering DNA-barcoding markers from plastid genomes. It aligns annotated GenBank plastomes, scans nucleotide diversity, scores candidate regions with transparent diagnostics, and designs in-silico primer pairs — turning whole plastomes into ranked, primer-ready, publication-ready markers.
What MarkerSeek does
MarkerSeek compares annotated chloroplast or plastid GenBank records, aligns homologous sequences with MAFFT, estimates nucleotide diversity (π), detects high-polymorphism hotspots, scores candidate markers from diagnostic and primer-design evidence, and writes reproducible tables, figures, and JSON payloads.
The question it sets out to answer: which plastome regions combine high information content, reliable alignment, species-level resolution, and practical primer-design potential, all at once?
Browse the pre-computed catalogue of ready-made markers for thousands of plant genera, or upload your own annotated plastomes and run the full pipeline in the browser.
A command-line version is also available for batch and reproducible workflows.
Why plastid genomes
Plastid genomes are widely used in plant systematics because they are usually compact, largely collinear in gene order, and recoverable from genome-skimming data. Classical plant DNA barcoding depends on finding loci that are variable enough to separate closely related species and are flanked by conserved sequence suitable for robust PCR amplification. MarkerSeek targets that problem at whole-plastome scale.
- The nucleotide diversity index π is the mean pairwise nucleotide difference per valid aligned site. Peaks in π mark mutational hotspots such as intergenic spacers or rapidly evolving coding intervals.
- A high π value alone is not enough: a good marker should also have reliable alignment, conserved flanks, species-level resolution, low estimated misclassification risk, and a working primer pair.
- MarkerSeek combines π, feature-level diagnostics, and in-silico primer evidence into a single ranked candidate-marker table.
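As an illustration of the π definition above, the following minimal sketch computes the mean pairwise difference per valid aligned site for a small alignment. It assumes that a site is "valid" for a pair only when both sequences carry an unambiguous A, C, G, or T there (pairwise deletion of gaps and ambiguity codes). How MarkerSeek itself treats such sites is not stated here, and the function name `nucleotide_diversity` is hypothetical.

```python
from itertools import combinations

VALID_BASES = set("ACGT")  # assumption: gaps and ambiguity codes are not valid


def nucleotide_diversity(seqs):
    """Mean over all sequence pairs of (differences / sites valid in both)."""
    if len(seqs) < 2:
        raise ValueError("need at least two aligned sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")

    per_pair = []
    for a, b in combinations(seqs, 2):
        valid = diffs = 0
        for x, y in zip(a.upper(), b.upper()):
            if x in VALID_BASES and y in VALID_BASES:
                valid += 1
                diffs += x != y
        if valid:  # skip pairs that share no valid site
            per_pair.append(diffs / valid)

    return sum(per_pair) / len(per_pair) if per_pair else float("nan")


alignment = ["ACGTAC", "ACGTTC", "AC-TAC"]
pi = nucleotide_diversity(alignment)
```

For this toy alignment the three pairwise values are 1/6, 0, and 1/5, so π is their mean.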
How it works
One reproducible pipeline runs from annotated GenBank inputs to ranked, primer-ready markers.
- A sliding-window π scan across the whole plastome pinpoints mutational hotspots and assigns each to a gene or intergenic spacer.
- Reference-anchored identity tracks visualise conservation and divergence across every sample at a glance.
- Diversity, resolution, barcoding gap, conserved flanks, and the remaining diagnostics are normalised and weighted into one interpretable score.
- Primer pairs are designed with primer3 in conserved flanks; in-silico PCR then checks unique amplification on the reference and universality across samples.
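The sliding-window scan in the first step can be sketched roughly as follows. This is an illustration under stated assumptions, not MarkerSeek's implementation: the input is a list of per-column diversity values, the window and step sizes are placeholders, a hotspot is taken to be any window whose mean exceeds the overall mean by `z` standard deviations, and the functions `window_scan`, `hotspots`, and `label` are hypothetical names.

```python
from statistics import mean, pstdev


def window_scan(per_site, window=600, step=200):
    """Return (start, end, mean value) per window; 0-based, end-exclusive."""
    if not per_site:
        return []
    out = []
    last_start = max(len(per_site) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = per_site[start:start + window]
        out.append((start, start + len(chunk), mean(chunk)))
    return out


def hotspots(windows, z=2.0):
    """Windows whose value exceeds mean + z * SD (threshold is an assumption)."""
    values = [w[2] for w in windows]
    cutoff = mean(values) + z * pstdev(values)
    return [w for w in windows if w[2] > cutoff]


def label(window, features):
    """Name of the annotated interval that contains the window midpoint."""
    mid = (window[0] + window[1]) // 2
    for start, end, name in features:
        if start <= mid < end:
            return name
    return "unannotated"


# Toy data: a short variable stretch between two conserved regions.
per_site = [0.0] * 10 + [0.5] * 4 + [0.0] * 10
features = [(0, 10, "rbcL"), (10, 14, "trnH-psbA"), (14, 24, "psbA")]

wins = window_scan(per_site, window=4, step=2)
hits = hotspots(wins)
names = [label(w, features) for w in hits]
```

Labelling by midpoint is the simplest choice; a window that straddles a gene boundary could instead be assigned to the feature with the largest overlap.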
The MarkerSeek score
Each candidate marker receives a transparent 0–100 score: a normalised weighted sum of ten diagnostics. Before weighting, each diagnostic is clipped to a defined range and oriented so that higher is always better; the Direction column below gives the direction of the raw value.
| Diagnostic | Direction | Weight |
|---|---|---|
| Nucleotide diversity (π) | higher better | 0.18 |
| Species resolution | higher better | 0.15 |
| Flanking conservation | higher better | 0.12 |
| Variable-site density | higher better | 0.10 |
| Alignment reliability | higher better | 0.10 |
| Barcoding gap | higher better | 0.10 |
| Nearest-neighbour discrimination | higher better | 0.10 |
| Indel density | higher better | 0.05 |
| Missing / ambiguous ratio | lower better | 0.05 |
| Length suitability | higher better | 0.05 |
Some diagnostics — barcoding gap and nearest-neighbour discrimination among them — can only be computed when a species is represented by two or more samples. For the typical NCBI layout of one plastome per species, MarkerSeek reports those metrics as NA rather than substituting a default, drops them from the weight set, and re-normalises the remaining weights so the score stays a valid 0–100 quantity scaled to the available evidence.
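The scoring rule and the NA handling described above can be sketched as follows. The weights come from the table; everything else is an assumption made for illustration: the dictionary keys, the function name `marker_score`, the use of `None` for NA, and in particular the clip ranges, which are not given here and must be supplied by the caller.

```python
# Weights from the table above; the key names are hypothetical.
WEIGHTS = {
    "pi": 0.18,
    "species_resolution": 0.15,
    "flank_conservation": 0.12,
    "variable_site_density": 0.10,
    "alignment_reliability": 0.10,
    "barcoding_gap": 0.10,
    "nn_discrimination": 0.10,
    "indel_density": 0.05,
    "missing_ratio": 0.05,
    "length_suitability": 0.05,
}
LOWER_IS_BETTER = {"missing_ratio"}


def marker_score(values, ranges):
    """values: name -> raw value or None (NA); ranges: name -> (lo, hi)."""
    total = weight_sum = 0.0
    for name, weight in WEIGHTS.items():
        raw = values.get(name)
        if raw is None:
            continue  # NA: drop the diagnostic and its weight
        lo, hi = ranges[name]
        unit = (min(max(raw, lo), hi) - lo) / (hi - lo)  # clip, scale to 0..1
        if name in LOWER_IS_BETTER:
            unit = 1.0 - unit  # orient so that higher is better
        total += weight * unit
        weight_sum += weight
    # Dividing by the retained weight re-normalises to the available evidence.
    return 100.0 * total / weight_sum if weight_sum else float("nan")


unit_ranges = {name: (0.0, 1.0) for name in WEIGHTS}  # placeholder ranges
partial = marker_score({"pi": 0.5, "species_resolution": 1.0}, unit_ranges)
```

With only two diagnostics available, the score is 100 × (0.18 × 0.5 + 0.15 × 1.0) / (0.18 + 0.15), about 72.7, rather than being dragged down by the eight missing terms.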
Primer design and in-silico PCR
When enabled, MarkerSeek designs and validates primer pairs only for the curated candidate-marker set — not the whole genome — keeping the step tractable on modest servers.
- Conserved flanking windows are identified from the genome-wide alignment, and primer3 designs candidate pairs against the reference sequence.
- Each pair is validated by full in-silico PCR on the reference to confirm a unique, length-conformant amplicon.
- Universality is checked across the non-reference samples with a fast anchor-strict, body-fuzzy binding search; the 3′ anchor must match exactly while the primer body tolerates limited mismatch.
- Pairs are ranked by a primer score that blends primer3 penalty, cross-species amplification success, and amplicon information content.
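One way to picture the anchor-strict, body-fuzzy search is the sketch below. It is a plain-Python illustration, not MarkerSeek's code: the anchor length of 5 bases and the limit of 2 body mismatches are assumed values, and `binding_sites`, `revcomp`, and `amplicons` are hypothetical helpers. Primers are written 5′→3′, so the 3′ anchor is the last `anchor` bases of the primer.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def binding_sites(primer, template, anchor=5, max_body_mismatch=2):
    """0-based starts where the 3' anchor matches exactly and the body nearly."""
    primer, template = primer.upper(), template.upper()
    body, tail = primer[:-anchor], primer[-anchor:]
    hits = []
    pos = template.find(tail)  # exact search for the 3' anchor
    while pos != -1:
        start = pos - len(body)
        if start >= 0:
            site = template[start:pos]
            mismatches = sum(a != b for a, b in zip(body, site))
            if mismatches <= max_body_mismatch:
                hits.append(start)
        pos = template.find(tail, pos + 1)
    return hits


def amplicons(fwd, rev, template, **kw):
    """(start, end) of each predicted product, end-exclusive, forward strand."""
    length = len(template)
    f_starts = binding_sites(fwd, template, **kw)
    # The reverse primer binds the opposite strand; map its sites back.
    r_ends = [length - s for s in binding_sites(rev, revcomp(template), **kw)]
    min_len = len(fwd) + len(rev)
    return [(s, e) for s in f_starts for e in r_ends if e - s >= min_len]


fwd = "ACGTACGTAGCTAGCTAGGA"
rev = revcomp("GATCGATCGGATTACAGGCT")
template = "TTT" + fwd + "C" * 10 + "GATCGATCGGATTACAGGCT" + "AAA"
products = amplicons(fwd, rev, template)
```

Searching for the exact anchor first keeps the scan cheap, because the mismatch count is evaluated only at the few positions where the 3′ end already matches. A pair would count as unique on the reference when `amplicons` returns exactly one product of acceptable length.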
The pre-computed catalogue
MarkerSeek ships a browsable database of ready-made markers, so you can find candidate regions without uploading anything. The catalogue currently covers:
- 2,112 genera across 348 families and 128 orders.
- 16,822 indexed plastomes.
- 38,303 candidate markers and 192,055 primer pairs.
How to cite
If MarkerSeek supports your analysis, please cite the MarkerSeek paper and the underlying methods and software listed below. The tool and database are available at www.bioseqhub.cn/markerseek.
Gao C. MarkerSeek: plastome DNA-barcoding marker discovery. Qingdao University. Software and database, github.com/gaochengwen/MarkerSeek.
References
MarkerSeek builds on established DNA-barcoding theory and on widely used alignment, primer-design, and population-genetics methods.
- Hebert PDN, Cywinska A, Ball SL, deWaard JR. Biological identifications through DNA barcodes. Proceedings of the Royal Society B. 2003;270:313–321.
- Meyer CP, Paulay G. DNA barcoding: error rates based on comprehensive sampling. PLoS Biology. 2005;3:e422.
- Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M, Rozen SG. Primer3: new capabilities and interfaces. Nucleic Acids Research. 2012;40:e115.
- Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular Biology and Evolution. 2013;30:772–780.
- Nei M, Li WH. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proceedings of the National Academy of Sciences USA. 1979;76:5269–5273.
Contact
Questions, bug reports, and feedback are welcome.