




版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進行舉報或認(rèn)領(lǐng)
文檔簡介
1、Transcript analysis and reconstructionBrazil 2001GenesWhy are there only a few tens of thousands of genes in the human genome?How do genes express themselves to manufacture the proteome?How can available sequence information be processed in order to deliver understanding of gene expression?Genomic e
2、xpressionWithin eukaryotes, genes have shared basic characteristics. They have single or multiple exons and introns distributed along the gene in coding and non-coding regions with5 Flanking region with transcription regulation signalsTranscription initiation start site (5)Initiation codon for prote
3、in coding sequenceExon-intron boundaries with splice site signals at the boundariesTermination codon for protein coding sequence3 signals for regulation and polyadenylationGCboxCAATTATAGCboxTranscription Initiation SiteExon 1Initiation CodonGTTranscription Initiation SiteAGGTIntron 1Intron 2Exon 2GC
4、boxGCboxCAATTATATranscription InitiationExon 1Initiation CodonGT AGGT AGIntron 1Intron 2Exon 2Exon 3Stop CodonAATAAPoly (A) addition site5 Flanking regionPre-mRNAMature mRNAGene ExpressionTranscription products can vary.Transcription initiation at the start site (TSS)Exon lengthExon prescence/absenc
5、e in the mature transcriptAlternate transcription termination and polyadenylationAlternative donor and acceptor splice sitesAlternative polyadenylationExon skippingExamples of alternative splicingGCboxGCboxCAATTATATranscription InitiationExon 1Initiation CodonGTAGIntron 1Exon 3Stop CodonAATAAPoly (A
6、) addition site (s)Exon 2 SKIP3 Flanking regionPre-mRNAMature mRNACapturing expressed transcriptsDatabases - SequencesdbESTSeveral collapsed datasetsTIGR-THCAllgenesUnigeneBodyMapSTACKSeveral more specialisedGenome Sequence as it appears: Expression Capture Serial Analysis of Gene Expression DNA fra
7、gments that act as unique markers of gene transcripts. Assay of numbers of each marker in a set of sequence yields a measure of gene expression Array Laydown of sequence clones to provide an organised series for hybridisationResolution of Captured ExpressionESTSLow resolution, broad capture, provide
8、s template for SAGE and ArraySAGEMedium resolution, need template, noise can be an issue, stoichiometry is revealed but standardisation a problemARRAYHigh resolution, need template, noise, stoichiometric resolution highest, standardisation a problem.35AAAAAPartial cDNATranscripts3EST5ESTClone/Seq ve
9、ctor with CLONEIDForwards andreverse sequencingprimers3 overlapping5 staggered lengthdue to polymerase processitivityWhat is an EST?What potential do ESTs hold? Expression counts Consensus sequences Alternate expression-form characterisation Identification of genes expressed in a pilot gene discover
10、y project Identification of genes specifically expressed in a chosen library or tissueUse of Transcripts in Completed genomes Identification of genes Exon boundaries Alternate transcripts Genomic annotation Expression sites of encoded genes Comparitive genomicsEST data qualityT27784 g609882 | T27784
11、 CLONE_LIB: Human Endothelial cells. LEN: 337 b.p. FILE gbest3.seq 5-PRIME DEFN: EST16067 Homo sapiens cDNA 5 end AAGACCCCCGTCTCTTTAAAAATATATATATTTTAAATATACTTAAATATATATTTCTAATATCTTTAAATATATATATATATTTNAAAGACCAATTTATGGGAGANTTGCACACAGATGTGAAATGAATGTAATCTAATAGANGCCTAATCAGCCCACCATGTTCTCCACTGAAAAATCCTCTTT
12、CTTTGGGGTTTTTCTTTCTTTCTTTTTTGATTTTGCACTGGACGGTGACGTCAGCCATGTACAGGATCCACAGGGGTGGTGTCAAATGCTATTGAAATTNTGTTGAATTGTATACTTTTTCACTTTTTGATAATTAACCATGTAAAAAATGEST is Poor Quality data with contaminantsVectorRepeat MASKIndividual items are prone to error but an entire collection contains valuable genetic inf
13、ormationOverview of clustering and consensus generationPre-pocessingInitialClusteringAssemblyAlignmentProcessingClusterJoiningOutputRepeatsVectorMaskAlignmentsConsensiExpressed FormsTranscript reconstructionWhat is an EST cluster?A cluster is fragmented, EST data and (if known) composite exon transc
14、ript sequence data, consolidated, placed in correct context and indexed by gene such that all expressed data concerning a single gene is in a single index class, and each index class contains the information for only one gene. (Burke, Davison, Hide, Genome Research 1999). Loose and stringent cluster
15、ing Stringent - greater fidelity, lower coverage One pass Shorter consensi Lower inclusion rate of expression-forms Loose - lower fidelity, higher coverage Multi-pass Longer consensus sequences but paralogs need attention Comprehensive inclusion of expression-formsSupervised clustering Template for
16、hybridisation is a transcript composite derived from: A captured full length mRNA A composite exon construct from a genomic sequence An assembled EST cluster consensusClean Short and Tight TIGR-THCUniGeneSTACKLong and Loose Data apprehension and input format. Sources: In-House, Public, Proprietary A
17、ccession / Sequence-run ID Location/orientation Source Clone Source library and conditionsPre-processing Minimum informative length Low complexity regions Removal of common contaminants Vector, Repeats, Mitochondrial, Xenocontaminants XBLAST, Repeatmasker, VecBase and others BLIND masking Pre-cluste
18、ring vs known transcripts (data reduction)Initial clustering Stepwise clustering Multistate. sequence identity annotation verificationAssembly Including chromatograms - SNPs and Paralogs PHRAP and CAP series Multiple assemblies can fragment from one input cluster fidelity alt. forms errorAlignment p
19、rocessing Consensus generation Alternate forms Errors Choosing the correct consensusCluster joining Clone joining Choosing to accept a clone annotation 1 clone ID 2 clone IDs Available parents mRNA (incomplete/alternate) Composite(constructed from Genomic) intronic sequence 2%Output Alignment altern
20、ate expression-forms polymorphisms error assessment Cluster raw cluster membership contextual links Formats: FASTA, GenBank, EMBLAlignment scoring methods: Correct position of sequence elements against each other maximizes some score BLAST and FASTA Heuristic cutoff and identity pairwise alignment f
21、ast EST clustering methods Est sequence is littered with errors, stutters, in-dels and re-arrangements alignment approach is sensitive to these 3 only comparisonNon-alignment based scoring methods: D2-cluster No alignment so a speedup Sensitivity improved by multiplicity measure low weight to low co
22、mplexity very error tolerant transitive closure 96% ID over 100 or 150 bases.Word table()acggtc cggtcaMultiplicity comparison332(d)2= 4TIGR_ASSEMBLER THC_BUILD: BLAST-FASTA id all overlaps and are stored. Tigr-assembler then uses rapid oligo nucleotide comparison and assembles non-repeat overlaps. (
23、95% ID over 40bp) matching constraints on sequence ends minimum sequence id within a sequence group - more fragmented as a result Other TIGR approaches are similar UniGeneUnigene approach Originally 3 only + mRNA common words of length 13 separated by no more than 2 bases. IDAnnotationShared clone I
24、D Genbank, genomic ad dbEST DUST 100bp min MEGABLASTWagner et al. CSH 1999Fragmentation ComparisonMethodologyInputSequencesSingletonGroups%Singleton GroupsTIGR GeneIndex626 163135 14021.83STACK_PACK415 83358 07013.96STACK_PACK analysis of UniGene clusters resulted in a fragmentation ratejust over ha
25、lf of the TIGR index.Alignment Analysis Three subassembliesPotential alternateexpression formOrthologs and Paralogs Orthologs Genes that share the same ancestral gene that perform the same biological function in different species but have diverged in sequence makeup due to selective evolution Paralo
26、gs Genes within the same genome that share an ancestral gene that perform diverse biological functions.Needs Functional assignments Expression states of alternate forms and their sites of expression Exon level resolution of expression Representative forms for application to arrays Physical gene loca
27、tions Relationship to diseaseExploration Availability of genomic sequence and partial transcription products means characterisation of alternate transcription can begin in earnest. Contribution to variation of expressed products and effects on biology are likely to be significantHow to trap useful g
28、enome sequence to manufacture a genome virtually?Gene level approachTrap Expressed Sequence Tags1.8 M tags, 35-100K genesCombine to form virtual genesAnnotate and analyse these genesCorrelate with phenotype(s) = diseaseUnderstand the expression basis of diseaseReconstruction of transcriptsDerive und
29、erstanding of expressed gene productsUse of expressed sequence data requires complex processingProcessed datasets are badly neededCapture a first glimpse of a genomes activitesGenomic level sequence is the final state, but its products can provide powerful information very early.Characterize underly
30、ing gene structureExon boundaries are difficult to define accurately and consistentlyAssess effect of an intervention on gene expression productsA rough EST profile is a quick identifier of key expression productsAssociate isoforms with expression states Expression forms vary, how and when? What doe
31、s a full length cDNA really mean?Why is transcript data a problem?Transcript DataFull length cDNAGenBank has many entries that confuse full length with complete Coding SequencePartial cDNARedundant partial cDNA sequencesExon CompositeAll confirmed exons combined to form a complete transcriptExpresse
32、d Sequence TagSingle pass sequenceGenome Survey SequenceSingle pass sequence Small genomes contain more coding sequences in GSS than larger genomesGenome Sequence:Characterizing underlying gene structureFanfare fragmentFirst Pass AnnotatedExon boundariesPredictedCross species conservationTranscript
33、confirmationComposite exon transcriptHow do you define a transcript?STACKing approachDistill quality from quantityAccurate consensus sequence representationIdentify expression variation, both spatial and developmentalFacilitate better understanding of gene expressionExon-level gene expression profil
34、eIntegration of expression with genome sequenceConfirm and discover expressed exonsProvide gene candidacy deliveryIntegrate with phenotypeSTACKPACKSQL DatabaseData abstractionApplication managementProcess SchedulerUser Interface- C+, MySQL, HTML, JavastackPACK SchemaALL alternate expression forms ar
35、e saved and accessible.WebProbe - View by clonelink accessionEntering a project name and cluster accession number displays the clonelink Consensus View.Clonelink cluster IDCluster IDContig IDInput EST accession numbersIn all views, the full cluster family tree is shown in the panel on the left. Link
36、 to corresponding UniGene entryAlignment and Analysis PHRAP Alignment first alignment created all ESTs in one alignment Alignment Analysis CRAW used to look for subassemblies Identifies potential alternate expression forms CRAW Alignment Final alignment for each subassembly Consensus Analysis Statis
37、tics used to select best consensus Notes degree of matching between EST & consensusThe Value of Cluster DataMicroarray StudiesClusters represent unique forms associated with a specific stateGene DiscoveryUnique transcripts revealed in association with expression libraries especially in little studie
38、d organismsFunctional AnnotationVirtual genes can be searched against the database to provide functional annotation of the products of a genomeExpressed Gene StructureExons boundaries are revealed by transcript confirmationHow to trap useful genome sequence to manufacture a genome virtually? Gene le
39、vel approach Trap Expressed Sequence Tags Combine to reconstruct virtual genes Maufacture a substrate for microarray studies Annotate and analyse these genes Compare between species Species-specific characteristics Reveal genes under selectionRaw ESTsClustersAlignmentsConsensiJoined Consensipredict CDSProtein FragmentsNNNNNDetection of virulence genes in malarial pathogensRahlston MullerReconstruction
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負(fù)責(zé)。
- 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時也不承擔(dān)用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。
最新文檔
- 江油離婚協(xié)議書
- 工程轉(zhuǎn)包委托協(xié)議書
- 漢富控股協(xié)議書
- 實踐團隊任務(wù)協(xié)議書
- 小區(qū)倉儲出售協(xié)議書
- 神明合同協(xié)議書
- 就業(yè)勞務(wù)基地協(xié)議書
- 泰國基金協(xié)議書
- 離婚流程協(xié)議書
- 照明合同協(xié)議書
- 電機控制與調(diào)速技術(shù)課件 項目四 步進電動機控制與調(diào)速技術(shù)
- 2024版保險合同法律適用與條款解釋3篇
- 【MOOC】人格與精神障礙-學(xué)做自己的心理醫(yī)生-暨南大學(xué) 中國大學(xué)慕課MOOC答案
- 外科經(jīng)典換藥術(shù)
- 2024年支氣管哮喘臨床診療指南:課件精講
- 《滑翔傘模擬器控制系統(tǒng)的設(shè)計與研究》
- 公務(wù)員考試題庫及答案4000題
- 專題04 物質(zhì)結(jié)構(gòu)與性質(zhì)-2024年高考真題和模擬題化學(xué)分類匯編(解析版)
- 林權(quán)投資合作協(xié)議范本
- 中醫(yī)康復(fù)治療技術(shù)習(xí)題+參考答案
- 新疆大學(xué)答辯模板課件模板
評論
0/150
提交評論