Ortiz, Edgardo M. [1], Schaefer, Hanno [1].

Improved and automated marker recovery from targeted capture data.

Following our probe design pipeline, we developed an efficient assembly pipeline for data obtained from captured libraries. Instead of dividing the reads according to their probes of origin, our approach co-assembles the entire set of reads with an optimized de novo assembler and then extracts coding sequences from the contigs to reconstruct the proteins used to design the probes. The extraction of coding sequences tolerates sequencing errors and frameshifts. Once extraction is complete, the markers are collected across samples, and frameshifts are masked and annotated so that translated alignments can be built with MACSE and MAFFT. The pipeline recovers an amount of data that compares favorably with current methods and detects more paralogous sequences, which are also included in the alignments for further visual examination. We present the results of the pipeline applied to a Cucurbitales dataset captured with the Angiosperms353 probe set.
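The step in which markers are collected across samples can be illustrated with a minimal Python sketch. It is not the pipeline's actual code: the function names `collect_markers` and `to_fasta`, the nested-dictionary input, and the sample and marker labels are all hypothetical, chosen only to show how per-sample extractions might be regrouped into one unaligned FASTA per marker before alignment.

```python
from collections import defaultdict


def collect_markers(per_sample):
    """Regroup {sample: {marker: sequence}} into {marker: {sample: sequence}}.

    Hypothetical helper: the input layout is an assumption for illustration.
    """
    by_marker = defaultdict(dict)
    for sample, markers in per_sample.items():
        for marker, seq in markers.items():
            by_marker[marker][sample] = seq
    return dict(by_marker)


def to_fasta(records):
    """Format {name: sequence} as FASTA text, with names in sorted order."""
    return "".join(f">{name}\n{seq}\n" for name, seq in sorted(records.items()))


# Made-up sample and marker names; the second sample lacks one marker.
extracted = {
    "sampleA": {"marker1": "ATGGCT", "marker2": "ATGTTT"},
    "sampleB": {"marker1": "ATGGCA"},
}
per_marker = collect_markers(extracted)
marker1_fasta = to_fasta(per_marker["marker1"])
```

Each per-marker FASTA produced this way would then be the input to an aligner; a sample that lacks a marker is simply absent from that marker's file.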

1 - Technical University of Munich, Ecology & Ecosystem Management, Plant Biodiversity Research, Emil-Ramann Strasse 2, Freising, BY, D-85354, Germany

target capture

