Help - MarkerSeek

Overview

MarkerSeek compares annotated chloroplast or plastid GenBank records, aligns homologous sequences, estimates nucleotide diversity, detects high-Pi hotspots, scores candidate markers, and prepares result tables and figures for downstream validation.

Primary question

Which plastome regions combine high information content, reliable alignment, interpretable haplotypes, and practical primer design potential?

Figure: Example MarkerSeek Pi plot. Example diversity plot from the permanent Salvia demo.

Input requirements

Use consistently annotated GenBank files from the same broad plastome coordinate system. The selected reference controls coordinates, feature labels, and region assignment in the output.

Upload between 2 and 20 files with .gb, .gbk, or .genbank extensions.
Total upload size is limited to 20 MB.
Choose a reference file when you want reproducible labels and coordinates. If no reference is selected, MarkerSeek uses the first sorted input file.
Species names are inferred from sample names and used for species-level diagnostics, haplotype interpretation, and PCA grouping.
The bundled example uses files in test_data/ and the reference Salvia_chinensis.gb.
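
The upload rules above can be checked before submission with a short script. This is a minimal sketch, not MarkerSeek's own validation code: the function name check_uploads is hypothetical, 20 MB is assumed to mean 20 * 1024 * 1024 bytes, extension matching is assumed to be case-insensitive, and "first sorted input file" is assumed to mean plain string sorting of file names.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".gb", ".gbk", ".genbank"}
MIN_FILES, MAX_FILES = 2, 20
MAX_TOTAL_BYTES = 20 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_uploads(sizes_by_name, reference=None):
    """Check a mapping of file name -> size in bytes against the upload rules.

    Returns (problems, reference_name). Hypothetical helper for illustration.
    """
    problems = []
    names = sorted(sizes_by_name)  # assumption: plain string sort order
    if not MIN_FILES <= len(names) <= MAX_FILES:
        problems.append(f"expected {MIN_FILES}-{MAX_FILES} files, got {len(names)}")
    for name in names:
        if Path(name).suffix.lower() not in ALLOWED_EXTENSIONS:
            problems.append(f"unsupported extension: {name}")
    if sum(sizes_by_name.values()) > MAX_TOTAL_BYTES:
        problems.append("total upload size exceeds 20 MB")
    if reference is None:
        # No reference selected: fall back to the first sorted input file.
        reference = names[0] if names else None
    elif reference not in sizes_by_name:
        problems.append(f"reference is not among the uploads: {reference}")
    return problems, reference


# Illustrative call; sample_b.gbk and both sizes are made-up values.
problems, reference = check_uploads(
    {"Salvia_chinensis.gb": 160_000, "sample_b.gbk": 150_000}
)
```

Passing the reference explicitly avoids depending on sort order, which is the reason the help text recommends choosing one for reproducible labels and coordinates.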

Analysis parameters

Default values are suitable for an initial plastome scan. Adjust window sizes and hotspot selection only when your sampling scale or expected marker length requires it.

Parameter	Purpose	Practical guidance
Hotspot window / step	Controls the sliding-window Pi scan used to detect local diversity peaks.	Use smaller windows for short candidate markers; use larger windows for stable genome-wide scans.
Hotspot mode / value	Selects top percentage, top N, or an explicit Pi threshold.	Top-percent is useful for exploratory analyses because it adapts to each dataset.
Similarity window / step	Controls the pairwise similarity figure.	Keep the similarity step smaller than the window to avoid sparse tracks.
Primer design	Runs primer3, in-silico PCR, amplicon alignment, and primer scoring.	Enable when conserved flanks and experimental validation are part of the study.
MAFFT threads	Limits alignment worker threads.	Choose a value that fits the server CPU allocation.
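
To make the hotspot parameters concrete, the sliding-window Pi scan and the three hotspot modes can be sketched as follows. This is an illustration under stated assumptions, not MarkerSeek's implementation: Pi is taken as the mean number of pairwise differences per valid site, a column is treated as valid only when every sample has A, C, G, or T, and the mode names top_percent, top_n, and threshold are hypothetical labels for the three options in the table.

```python
import math
from itertools import combinations

VALID_BASES = set("ACGT")  # assumption: gaps and ambiguity codes invalidate a column


def window_pi(seqs, start, end):
    """Return (pi, valid_sites) for alignment columns [start, end)."""
    valid_cols = [i for i in range(start, end)
                  if all(s[i] in VALID_BASES for s in seqs)]
    pairs = list(combinations(seqs, 2))
    if not valid_cols or not pairs:
        return None, len(valid_cols)
    differences = sum(a[i] != b[i] for a, b in pairs for i in valid_cols)
    return differences / (len(pairs) * len(valid_cols)), len(valid_cols)


def sliding_pi(seqs, window, step):
    """Scan an alignment (equal-length uppercase strings) window by window."""
    length = len(seqs[0])
    rows = []
    for start in range(0, max(length - window, 0) + 1, step):
        end = min(start + window, length)
        pi, valid_sites = window_pi(seqs, start, end)
        rows.append({"start": start, "end": end, "pi": pi, "valid_sites": valid_sites})
    return rows


def select_hotspots(rows, mode, value):
    """Pick hotspot windows by top percentage, top N, or explicit Pi threshold."""
    scored = sorted((r for r in rows if r["pi"] is not None),
                    key=lambda r: r["pi"], reverse=True)
    if mode == "top_percent":
        return scored[:max(1, math.ceil(len(scored) * value / 100))]
    if mode == "top_n":
        return scored[:int(value)]
    if mode == "threshold":
        return [r for r in scored if r["pi"] >= value]
    raise ValueError(f"unknown hotspot mode: {mode}")


# Toy alignment of three samples; the last window contains one gapped column.
alignment = ["ACGTACGT", "ACGTACGA", "ACGTAC-A"]
rows = sliding_pi(alignment, window=4, step=2)
hotspots = select_hotspots(rows, mode="top_n", value=1)
```

The sketch shows why top-percent adapts to each dataset: it ranks windows relative to one another, whereas an explicit threshold selects nothing when overall diversity is low.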

Web workflow

After submission, the server stores inputs, queues the analysis, writes output files, and exposes a stable result URL that remains valid until the retention period expires.

1. Upload and configure: Select GenBank records, choose a reference, and set hotspot, similarity, and primer options.

2. Run and monitor: The submitted job page refreshes while queued or running and shows estimated completion time.

3. Review results: Inspect figures, the Candidate markers table, feature detail panels, and downloadable files.

Output files

Downloadable outputs are designed to support both exploratory inspection and reproducible reporting. The TSV files should be treated as the authoritative data tables.

File	Description
pi_windows.tsv	Sliding-window Pi values, valid-site counts, region labels, overlap labels, and hotspot flags.
candidate_marker_features.tsv	Feature table containing coordinates, Pi, variable and indel sites, conserved flanks, species diagnostics, alignment reliability, primer availability, and MarkerSeek score.
haplotype_assignments.tsv	Feature-level haplotype assignment for each sample.
sample_metadata.tsv	Sanitized sample names, inferred species labels, and source paths.
primers.tsv	Primer candidates and primer scores when primer design is enabled.
primer_amplicons.fasta	Successful in-silico PCR products by primer and sample.
primer_amplicons_alignment.fasta	MAFFT alignments of primer amplicon groups.
pi_plot.{png,pdf}	Genome-wide nucleotide diversity figure with hotspot labels.
similarity_plot.{png,pdf}	Pairwise similarity tracks against the reference.
feature_payload/*.json	Interactive web payloads for candidate-marker detail panels.

Candidate markers table

The web table is a compact view of the hotspot labels marked in the Pi plot plus the highest-scoring remaining MarkerSeek candidates. It prioritizes the fields most useful during marker screening.

Displayed field	Interpretation
Label and region	Reference-derived marker name and plastome region.
Length (bp)	Reference-coordinate marker length; displayed as an integer.
Pi	Feature-level nucleotide diversity; displayed with three significant digits.
Alignment reliability	Fraction of columns passing gap, ambiguity, and entropy filters.
MarkerSeek score	Composite 0-100 ranking score within the run.
Primer available	Whether a primer pair passed the configured design and in-silico validation steps.
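
The alignment reliability field can be read as a per-column filter followed by a simple fraction. The sketch below illustrates that idea only: the thresholds (max_gap, max_ambiguous, max_entropy) are hypothetical placeholders, and the use of Shannon entropy in bits is an assumption rather than a documented MarkerSeek setting.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the unambiguous bases in one column."""
    bases = [b for b in column if b in "ACGT"]
    if not bases:
        return 0.0
    total = len(bases)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(bases).values())


def alignment_reliability(seqs, max_gap=0.2, max_ambiguous=0.1, max_entropy=1.5):
    """Fraction of columns passing gap, ambiguity, and entropy filters.

    All three thresholds are hypothetical values chosen for illustration.
    """
    n_cols = len(seqs[0])
    if n_cols == 0:
        return None
    passing = 0
    for i in range(n_cols):
        column = [s[i].upper() for s in seqs]
        gap_fraction = column.count("-") / len(column)
        ambiguous_fraction = sum(b not in "ACGT-" for b in column) / len(column)
        if (gap_fraction <= max_gap
                and ambiguous_fraction <= max_ambiguous
                and column_entropy(column) <= max_entropy):
            passing += 1
    return passing / n_cols


# Toy alignment: column 3 fails the ambiguity filter, column 5 the gap filter.
reliability = alignment_reliability(["ACGT-A", "ACGT-A", "ACNTTA", "ACGTTA"])
```

A value near 1 means most columns are clean enough to trust the Pi and haplotype estimates; a low value suggests the apparent diversity may come from gaps or misalignment.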

Figure: Example MarkerSeek similarity plot. Pairwise similarity helps interpret whether a candidate region is localized and alignable.

Feature detail page

Open a candidate marker to inspect evidence at the feature level. The detail page is intended to answer whether the marker is diverse, interpretable, and practical for primer-based validation.

Pi Curve

Shows local Pi values across the selected marker. SNP and indel positions are marked on the curve; the legend is placed away from the plotting region.

Alignment Viewer

Displays the alignment with many bases per line so that variable sites are easier to scan, while preserving sample names and base-specific colors.

Haplotype Network

Nodes are sized by haplotype frequency. Detailed labels such as frequency and species membership appear on hover to avoid covering the graph.
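
The haplotype frequencies that size the nodes can be derived by grouping identical aligned sequences. This is a minimal sketch under simplifying assumptions: sequences must match exactly (gaps and ambiguity codes are treated as ordinary characters), and the H1, H2, ... labels and the function name assign_haplotypes are hypothetical, not MarkerSeek's naming.

```python
from collections import defaultdict


def assign_haplotypes(aligned):
    """Group samples with identical aligned sequences into haplotypes.

    aligned maps sample name -> aligned sequence.
    Returns (assignments, frequencies).
    """
    groups = defaultdict(list)
    for sample, seq in aligned.items():
        groups[seq.upper()].append(sample)
    # Most frequent haplotype first; ties broken by sequence for determinism.
    ordered = sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))
    assignments, frequencies = {}, {}
    for index, (_, samples) in enumerate(ordered, start=1):
        label = f"H{index}"
        frequencies[label] = len(samples)
        for sample in samples:
            assignments[sample] = label
    return assignments, frequencies


# Sample names s1-s3 are made up for illustration.
assignments, frequencies = assign_haplotypes(
    {"s1": "ACGT", "s2": "ACGT", "s3": "ACGA"}
)
```

Real data usually needs a policy for missing bases before grouping, since a single N would otherwise split an otherwise identical haplotype.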

Species Separation

Projects samples using a pairwise distance matrix so species grouping and outliers can be assessed visually.
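
The distance matrix behind such a projection can be sketched as below. The choice of uncorrected p-distance, and the rule that only columns where both samples have A, C, G, or T are compared, are assumptions for illustration; the projection step itself is not shown.

```python
VALID_BASES = set("ACGT")


def p_distance(a, b):
    """Proportion of differing sites among columns valid in both sequences."""
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID_BASES and y in VALID_BASES:
            compared += 1
            differences += x != y
    # None signals that no comparable site exists for this pair.
    return differences / compared if compared else None


def distance_matrix(aligned):
    """Return (sample_names, square matrix) for a mapping name -> aligned sequence."""
    names = sorted(aligned)
    matrix = [[p_distance(aligned[row], aligned[col]) for col in names]
              for row in names]
    return names, matrix


# Toy aligned sequences; names a, b, c are placeholders.
names, matrix = distance_matrix({"a": "ACGTAC", "b": "ACGTAT", "c": "AC-TTT"})
```

Pairs that return None have no valid distance, which is one way a dataset can end up lacking enough pairwise distances for species-level diagnostics.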

Primer design and validation

When enabled, MarkerSeek searches conserved flanks, calls primer3, tests each primer pair by in-silico PCR across samples, and scores successful amplicons.

Primer availability depends on conserved flanks, primer3 constraints, tolerated mismatches, and cross-sample amplification success.
primers.tsv reports primer sequences, melting temperatures, GC content, primer3 penalty, amplicon length summaries, success rate, and primer score.
A high candidate-marker score does not guarantee primer success; primer screening is a separate practical constraint.
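
The in-silico PCR step and the cross-sample success rate can be illustrated with a simplified search. This sketch assumes substitution-only mismatches counted over the whole primer, a hypothetical max_amplicon limit, and made-up primer and template sequences; it does not reproduce MarkerSeek's or primer3's actual matching rules.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(template, primer, max_mismatches):
    """Start positions where primer matches with at most max_mismatches substitutions."""
    hits = []
    for start in range(len(template) - len(primer) + 1):
        window = template[start:start + len(primer)]
        if sum(a != b for a, b in zip(window, primer)) <= max_mismatches:
            hits.append(start)
    return hits


def in_silico_pcr(template, forward, reverse, max_mismatches=1, max_amplicon=2000):
    """Return the first amplicon found on the forward strand, or None."""
    template = template.upper()
    forward_sites = find_sites(template, forward.upper(), max_mismatches)
    # The reverse primer anneals to the opposite strand, so search its reverse complement.
    reverse_sites = find_sites(template, reverse_complement(reverse), max_mismatches)
    for f in forward_sites:
        for r in reverse_sites:
            end = r + len(reverse)
            if f + len(forward) <= r and end - f <= max_amplicon:
                return template[f:end]
    return None


def success_rate(samples, forward, reverse, **options):
    """Fraction of samples that yield an amplicon, plus the products themselves."""
    products = {name: in_silico_pcr(seq, forward, reverse, **options)
                for name, seq in samples.items()}
    return sum(p is not None for p in products.values()) / len(products), products


# Made-up templates: s2 has one mismatch in the forward site, s3 lacks the reverse site.
samples = {
    "s1": "TTTT" + "ACGTACGTAA" + "GGGGGCCCCC" + "TTGACCATGA" + "AAAA",
    "s2": "TTTT" + "ACGTACCTAA" + "GGGGGCCCCC" + "TTGACCATGA" + "AAAA",
    "s3": "TTTT" + "ACGTACGTAA" + "GGGGGCCCCC" + "AAAAAAAAAA" + "AAAA",
}
rate, products = success_rate(samples, "ACGTACGTAA", "TCATGGTCAA")
```

Lowering max_mismatches to 0 would drop s2 as well, which shows how the tolerated-mismatch setting directly changes the reported success rate.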

Reporting recommendations

For manuscript-style reporting, describe the input dataset, reference choice, sliding-window settings, hotspot rule, primer-design status, and how missing within-species diagnostics were handled.

Report item	Recommended content
Dataset	Number of plastomes, taxon coverage, file source, and selected reference.
Analysis settings	Hotspot window and step, hotspot selection mode, similarity window and step, and primer-design settings.
Candidate evidence	Label, region, length, Pi, variable sites, haplotype count, alignment reliability, primer availability, and MarkerSeek score.
Missing diagnostics	State that within-species metrics were reported as NA when the dataset lacked intraspecific data (no species represented by two or more samples) or lacked enough valid within-species pairwise distances.

FAQ

Why are some diagnostics NA?

nearest_neighbor_discrimination and barcoding_gap require intraspecific data: at least one species must be represented by two or more samples, because MarkerSeek must compare distances among samples from the same species. For example, barcoding gap uses the maximum intraspecific distance, and nearest-neighbor discrimination checks whether a sample's nearest neighbors include a conspecific sample. These metrics are reported as NA when every species has only one sample, or when the alignment does not provide enough valid within-species pairwise distances to estimate the statistic.

Related within-species diagnostics, including intraspecific divergence and misclassification risk, follow the same rule. In candidate_marker_features.tsv, these cells remain NA; the web table and this FAQ provide the reason without adding extra columns to the data file.
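
The NA rule can be illustrated with a small sketch. The formulas here are assumptions based on common usage, not MarkerSeek's documented definitions: barcoding gap is taken as the minimum interspecific distance minus the maximum intraspecific distance, and nearest-neighbor discrimination as the fraction of evaluable samples whose nearest neighbors include a conspecific. None stands in for NA.

```python
def barcoding_gap(distances, species):
    """distances: {(sample_a, sample_b): distance or None}; species: sample -> species."""
    intra, inter = [], []
    for (a, b), d in distances.items():
        if d is None:
            continue  # no valid distance for this pair
        (intra if species[a] == species[b] else inter).append(d)
    if not intra or not inter:
        return None  # NA: no within-species (or between-species) comparison exists
    return min(inter) - max(intra)


def nearest_neighbor_discrimination(distances, species):
    neighbours = {}
    for (a, b), d in distances.items():
        if d is None:
            continue
        neighbours.setdefault(a, []).append((d, b))
        neighbours.setdefault(b, []).append((d, a))
    evaluated = correct = 0
    for sample, others in neighbours.items():
        if not any(species[o] == species[sample] for _, o in others):
            continue  # sample has no conspecific with a valid distance
        nearest = min(d for d, _ in others)
        evaluated += 1
        correct += any(d == nearest and species[o] == species[sample]
                       for d, o in others)
    return correct / evaluated if evaluated else None  # None stands for NA


# Made-up distances: species A has two samples, species B has one.
species = {"a1": "A", "a2": "A", "b1": "B"}
distances = {("a1", "a2"): 0.01, ("a1", "b1"): 0.05, ("a2", "b1"): 0.06}
gap = barcoding_gap(distances, species)
nnd = nearest_neighbor_discrimination(distances, species)

# With one sample per species, both diagnostics are NA.
singletons = {"a1": "A", "b1": "B", "c1": "C"}
singleton_distances = {("a1", "b1"): 0.05, ("a1", "c1"): 0.07, ("b1", "c1"): 0.04}
gap_na = barcoding_gap(singleton_distances, singletons)
nnd_na = nearest_neighbor_discrimination(singleton_distances, singletons)
```

In the singleton case no intraspecific distance exists, so both functions return None, mirroring the NA cells in candidate_marker_features.tsv.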

Why are some rows present in the downloadable table but not in Candidate markers?

The downloadable feature table includes the broader feature inventory. The web Candidate markers table focuses on Pi-plot hotspot labels and top-scoring candidates with interactive detail payloads.

How long are jobs retained?

Ordinary jobs are retained for 7 days. The bundled example job is permanent.

Can I use the score as the only selection criterion?

No. Use the score as a ranking aid, then inspect Pi shape, alignment reliability, haplotypes, species separation, and primer results before selecting markers for validation.
