Tutorial
Workflow
The NaPDoS bioinformatic pipeline is shown in the following diagram. The web interface to this pipeline is divided into five consecutive steps. Click on the link for each step to get detailed instructions.
Web Interface Steps
Preliminary Candidate Screening
Basic Procedures
Advanced Settings
Default parameter settings are recommended for routine use. However, in some cases, users may wish to boost sensitivity by choosing less stringent HMM or BLAST criteria, or shorter minimum sequence lengths. Users should be aware that these adjustments may increase false positive predictions. Conversely, selectivity can be improved by using lower e-value cutoffs and longer minimum match lengths, at the cost of decreased sensitivity.
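The trade-off described above can be illustrated with a minimal Python sketch. It is not NaPDoS code: the `Hit` record, the `filter_hits` function, and every cutoff value shown are hypothetical and chosen only for illustration; they are not the NaPDoS defaults.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one HMM or BLAST match."""
    query_id: str
    e_value: float
    match_length: int  # aligned length in amino acids


def filter_hits(hits, max_e_value, min_match_length):
    """Keep hits at or below the e-value cutoff and at or above the length cutoff."""
    return [
        h for h in hits
        if h.e_value <= max_e_value and h.match_length >= min_match_length
    ]


# Invented example hits, not real NaPDoS output.
hits = [
    Hit("seq1", 1e-40, 410),
    Hit("seq2", 1e-6, 220),
    Hit("seq3", 1e-3, 90),
]

# Stricter cutoffs: fewer hits pass (more selective, less sensitive).
strict = filter_hits(hits, max_e_value=1e-8, min_match_length=200)

# Looser cutoffs: more hits pass (more sensitive, more false positives possible).
relaxed = filter_hits(hits, max_e_value=1e-2, min_match_length=50)

print([h.query_id for h in strict])
print([h.query_id for h in relaxed])
```

With these invented values, only the strongest hit survives the strict cutoffs, while all three pass the relaxed ones.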
HMM search
For genomic sequences only, preliminary domain candidate information based on Hidden Markov Model (HMM) search is displayed on a separate page. Users may find this information helpful in estimating the total number and positions of PKS/NRPS operons present in a genomic or metagenomic sequence set. However, these initial results should be interpreted with some caution, for the following reasons:
BLAST search
A BLAST search is performed against curated reference database examples to identify matches to known PKS/NRPS pathways. Some suggested guidelines for interpreting BLAST scores are presented below. To proceed with further analysis, one or more candidate sequences must be selected using check boxes. Three different output options are available:
In some cases, the number of candidate matches on this page may be fewer than the number reported on the earlier genomic summary page, reflecting differences between HMM and BLAST stringencies used for the analysis.
Tree Construction
Selected candidate sequences plus their BLAST matches are trimmed and inserted into a manually curated reference alignment, keeping the original reference alignment intact. This alignment is used to build a tree, which is often more useful than BLAST results alone in predicting whether pathway products for candidate domains are likely to be similar to or different from previously known examples [4].
Tree output options
Newick format output
(hctox1_C2_dual:1.21247,(hctox5_C3_dual:1.58329,(hctox1_C3_dual:0.94480,hctox4_C3_dual:1.08446)0.842:0.19209)0.855:0.17790,(cyclo1_C12_dual:1.37115,((NC_013790.1_3_5_1279_1556:0.76822,surfa4_C3_LCL:0.60447)1.000:1.11061,(syrin1_C2_dual:0.42670,(syrin1_C8_dual:0.45307,(syrin1_C4_dual:0.04019,syrin1_C3_dual:0.04876)1.000:0.37152)0.909:0.19372)0.995:0.59611)0.884:0.23364)0.761:0.11321);
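As one illustration of how Newick output might be post-processed, the Python sketch below pulls leaf names and their branch lengths out of the example tree with a regular expression. It is not part of NaPDoS: the function name `leaf_branch_lengths` is hypothetical, and the approach assumes unquoted labels that contain no parentheses, colons, commas, or semicolons. Whitespace is stripped first so that line breaks introduced by copying the string do not matter.

```python
import re

# Example tree from this tutorial, split across lines for readability.
NEWICK = (
    "(hctox1_C2_dual:1.21247,(hctox5_C3_dual:1.58329,"
    "(hctox1_C3_dual:0.94480,hctox4_C3_dual:1.08446)0.842:0.19209)"
    "0.855:0.17790,(cyclo1_C12_dual:1.37115,"
    "((NC_013790.1_3_5_1279_1556:0.76822,surfa4_C3_LCL:0.60447)"
    "1.000:1.11061,(syrin1_C2_dual:0.42670,(syrin1_C8_dual:0.45307,"
    "(syrin1_C4_dual:0.04019,syrin1_C3_dual:0.04876)1.000:0.37152)"
    "0.909:0.19372)0.995:0.59611)0.884:0.23364)0.761:0.11321);"
)

# A leaf label directly follows "(" or ","; internal-node support values
# follow ")" and are therefore skipped by this pattern.
LEAF_PATTERN = re.compile(r"[(,]([^():,;]+):([0-9.eE+-]+)")


def leaf_branch_lengths(newick_text):
    """Return {leaf_name: branch_length} for an unquoted Newick string."""
    compact = "".join(newick_text.split())  # drop all whitespace
    return {name: float(length) for name, length in LEAF_PATTERN.findall(compact)}


lengths = leaf_branch_lengths(NEWICK)
for name in sorted(lengths):
    print(f"{name}\t{lengths[name]}")
```

For anything beyond a quick inspection, a dedicated tree parser such as Biopython's `Bio.Phylo` module is likely a sturdier choice than a regular expression.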
SVG format output
Interpreting Results
BLAST hits for KS or C domains with more than 85-90% identity at the amino acid level indicate that the query domains may be associated with the production of the same compound as the reference pathway, or a similar one. If you detect domains with less than 80% identity to any characterized domain in the NaPDoS database, a BLAST search against the NCBI nr database is recommended. Although we will update the database regularly, the NaPDoS database does not contain all characterized biosynthetic pathways. If this search does not find any known domain with more than 85% identity, the biosynthetic gene cluster has most likely not yet been characterized. In these cases it is possible that the encoded compound is new.
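These guidelines can be summarized as a small decision function. The Python sketch below is illustrative only: the function name `interpret_identity` and the returned labels are hypothetical, and the handling of the 80-85% range, which the guidelines do not address, is an assumption.

```python
def interpret_identity(percent_identity: float) -> str:
    """Map amino acid percent identity to a suggested interpretation."""
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must be between 0 and 100")
    if percent_identity > 85.0:
        # Guideline: may be associated with the same or a similar compound.
        return "possibly same or similar compound as reference pathway"
    if percent_identity < 80.0:
        # Guideline: follow up with a BLAST search against NCBI nr.
        return "search NCBI nr for a closer characterized domain"
    # Assumption: the 80-85% range is not covered by the guidelines,
    # so it is flagged for manual inspection here.
    return "borderline: inspect tree placement"


for identity in (92.0, 83.0, 61.5):
    print(identity, "->", interpret_identity(identity))
```

The 85% cutoff is the lower end of the 85-90% range given above; a more conservative reading would use 90%.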
Constructing a phylogenetic tree can classify the domains in ways that the best BLAST hits do not necessarily reveal. This classification can be informative in terms of predicting the type of compound produced. The domain classes have been defined based on the clades observed in the reference trees [6].
References
1. Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K et al: The Pfam protein families database. Nucleic Acids Res 2010, 38(Database issue):D211-222.
2. Yadav G, Gokhale RS, Mohanty D: Towards prediction of metabolic products of polyketide synthases: an in silico analysis. PLoS Comput Biol 2009, 5(4):e1000351.
3. Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004, 32(5):1792-1797.
4. Jenke-Kodama H, Sandmann A, Muller R, Dittmann E: Evolutionary implications of bacterial polyketide synthases. Mol Biol Evol 2005, 22(10):2027-2039.
5. Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 2003, 52(5):696-704.
6. Ziemert N, Podell S, Penn K, Badger JH, Allen E, Jensen PR: The Natural Product Domain Seeker NaPDoS: a phylogeny based bioinformatic tool to classify secondary metabolite gene diversity. PLoS One 2012, 7(3):e34064. PubMed PMID: 22479523.