




版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進行舉報或認領(lǐng)
文檔簡介
1、Grammatical ComplexityOf Symbolic Sequences:A Brief IntroductonBailin HaoT-Life Research Center, Fudan UniversityInstitute of Theoretical Physics, Academia SinicaThe Santa Fe Institute, New Mexico, USAhttp:/ Paradigms in TheoreticalDescription of Nature Deterministic: based on periodicities and recu
2、rrences, from Kepler to Yang-Mills Stochastic: based on randomness, from Brownian motion to MSR field theory of hydrodynamics and molecular motors Fractal, self-similar, scale invariant: from phase transitions and critical phenomena to chaotic dynamics Finiteness is the unifying Physics: languages語言
3、學語言學(language 而非 philology)方法方法 統(tǒng)計語言學 “字”的頻度和關(guān)聯(lián) Zipf 定律 代數(shù)語言學:生成語法和語法復雜性 串行生成:Chomsky體系 平行生成:Lindenmayer 體系(來自發(fā)育生物學) 可因式化語言(Factorizable language)自然語言與遺傳語言自然語言與遺傳語言相似處:相似處:多義性 冗余度 容錯和糾錯 長程關(guān)聯(lián) 均基于離散的排列組合系統(tǒng)有某些語法,但不能完全生成方言、個體差異性演化、突變、滅絕歷史“垃圾”、古語、“化石”外來語、橫向交換相異處:相異處: 標點符號和間隔不同 兩種語言的相互作用 二維、三維的相互作用 重復序列的數(shù)
4、目和作用An Observation u d c s b tcharge, mass, flavor, charm, p n e charge, mass, spin, magnetic momentum, H C N O P atomic number, ion radius, valence, affinity, H2O NO CO2 molecular weight, polarity, a c g t A D E F G H W Y VBRCA1 PDGFA PROGRAMME:Coarse-Grained Description of NatureUse of Symbols and
5、 Symbolic StringsLanguageGrammar and Complexity (Chomsky, Lindenmayer, etc.)So far this programme has been best realized in the study of dynamics by using Symbolic Dynamics.There have been preliminary attempts in analyzing biological sequences.It may not be a coincidence that the two systems in the
6、universe that most impress us with their open-ended complex design life and mind are based on discrete combinatorial systems. Many biologists believe that if inheritance were not discrete, evolution as we know it could not have taken place.S. Pinker, The Language Instinct (1995)Simple ExamplesAt the
7、 level of words: DOG GOD At sentence level: Dog bites Man Man bites DogN C EGF (Epidermal GF)N C Chymotrypsin (胰凝乳蛋白酶) N C Urokinase (UK) (尿激酶)N C Factor IX (凝血因子IX, X-mas抗血友病因子)N C Plasminogen (纖維蛋白融酶原) 幾種絲氨酸蛋白酶的幾種絲氨酸蛋白酶的domain組合組合 B.Alberts 等,Mol.Biology of the Cell 第三版 1994. P.123Ca 結(jié)合蛋白含3個-s-s-G
8、C 語法復雜性 字母表 例1. = a, c, g, t 例2. = A, C, D W, Y 例3. = a, z, A, Z, +, , 字母表中各種字母組成的一切字母串 (包括空串) * *的任何子集是基于的一種語言語法 = 字母表,初始字母,產(chǎn)生規(guī)則 基于該語法的語言Classification of Formal LanguagesChomsky Hierarchy Sequential production rulesLindenmayer SystemsParallel production rulesGenerative Grammar S Sentence NP Noun P
9、hraseVP Verb PhraseAdj AdjectiveArt ArticleS if S then SS either S or SNon-Terminal and Terminal SymbolsN boy | girl | scientist | V sees | believes | loves | eats | Adj young | good | beautiful | Art a | one | theS NP VPVP V NPNP (Art) Adj* NChomsky 語法層次 N 非終結(jié)字母集(工作用符號) T 終結(jié)字母集 S N 起始字母 P = 生成規(guī)則(x
10、y)的集合 x, y 為字母串 關(guān)于 x, y 的不同規(guī)定導致不同語法 語法 G = (N, T, P, S)0 類語法 x (N T)* N(N T)* y (N T)*至少含有一個非終結(jié)字母1 類語法 上下文有關(guān)語法 x = t1 a t2 t1, t2 T* a N 2 類語法 上下文無關(guān)語法 x = a N 3 類語法 正規(guī)語法 x = a y = b 或 bc a, c N b = 空 或 b T形式語言的Chomsky層次層語言計算機存儲要求0遞歸可數(shù)REL圖靈機(萬能計算機)無根1上下文有關(guān)CSL線性有界自動機比例於輸入字長2上下文無關(guān)CFL下推自動機下推區(qū)(堆棧)3正規(guī)RGL有
11、限自動機不要求 RLRRRLRRab(i)(ii)RLabcbcdA transfer function (a, R) = bA Finite State Automaton(FSA)FSA: Finite State Automata Deterministic FSA Non-Deterministic SFA Equivalence of DFSA and NDFSA: subset construction Minimal DFSA Myhill-Nerode theorem (1958): number of nodes in minDFSAA Pushdown AutomatonP
12、ushdown list StackFirst In Last Out (FILO)A Turing MachineAlan M. Turing (1912-1954)FSA + R/W tapeChurch-Turing Thesis (1936):Any effective (mechanical) computation canbe carried out by a Turing machineTerminals = a, b, cNon-terminal = A, BSequential rules: B aBAc | abc bA bb cA Ac B abc B aBAc aabc
13、Ac aabAcc B abAc aaBAcAc aaBAAc aaabcAAc aaabAcAc aaabbAcc Example: ai b ici | i0 CSL Rules to Generate Gene-Like Sequences( according to David Searls )gene upstream transcript downstreamtranscript 5-untranslated-region start-codon coding-region 3-untranslated-regioncoding-region codon coding-region
14、 | stop-codon | splice | coding regioncodon lys | asn | thr | met | glu | his | pro | asp | ala | gly | tyr | trp | phe | leu | ile | ser | arg | gln | val | cysstart-codon metstop-codon taa | tag | tga leu tt purine | ct base (6)ser ag pyrimidine | tc base (6)arg ag purine | cg base (6)val gt base
15、pro cc base (4)ala gc base gly gg base (4)thr ac base (4) ile at pyrimidine | ata (3)lys aa purine asn aa pyrimidine (2)gln ca purine his ca pyrimidine (2)glu ga purine cys tg pyrimidine (2)phe tt pyrimidine tyr ta pyrimidine (2)asp ga pyrimidine (2)met atg trp tggbase m a | c | g | t purine a | gpr
16、imidine c | tsplice intron intron gt | intron-body | agsplice a a intron splice c c intronsplice t t intron splice g g intron a splice intron a c splice intron c t splice intron t g splice intron gupstream enhancer promotor enhancer enhancer promotor silencer isolator These rules are capable to gene
17、rate an unlimitedset of gene-like sequences, mostly biological nonsense. They may be used to recognize gene-like segments in long DNA sequences.Syntax versus Semantics: texts vs. grammar. Physics behind this coarse-grained description:stereochemistry, interaction between proteins andDNA chains, meta
18、llic ions etc.Symbolic Dynamics Languages19911999謝惠民定理和猜測謝惠民定理和猜測 單峰映射揉序列對應(yīng)的語言中的單峰映射揉序列對應(yīng)的語言中的正規(guī)語言只有周期序列和最終周正規(guī)語言只有周期序列和最終周期序列兩種期序列兩種 如何走向比正規(guī)語言高一級的上如何走向比正規(guī)語言高一級的上下文無關(guān)語言?下文無關(guān)語言? 猜測:單峰映射揉序列對應(yīng)的語猜測:單峰映射揉序列對應(yīng)的語言中沒有上下文無關(guān)語言言中沒有上下文無關(guān)語言Subintervals determined by the periodic kneadingSequence (RLRRC) Order of
19、visits in the periodic kneadingSequence (RLLRC)Transformations of subintervals a c + d (on reading L) b d (on reading R) c b + c (on reading R) d a (on reading R)InputLRRRqabcdd1100c1010b0010a0001Transfer FunctionsRLac, dbdcb, cdaRLa,b,c,d a,b,c,d c,dc,da,b,ca,b,cb,c,dc,db,c,da,b,c,dStefan matrix fo
20、r 256P in Feigenbaum cascadeStefan matrix for F13=233; Case (a)Stefan matrix for F13=233. Case (b)Stefan matrix for F13=233. Case (c)Stefan matrix for F13=233. Case (d)Symbolic Dynamics Languages19911999 br bl ar al albr blar Alphabet: S = ar, al, br, bl Production rules: Initial symbol (axiom) = ar
21、 Grammar: G = (S, P, ) Language: L (G) S*Development of Anabaena catenula (串珠藻項圈藻屬) br ar ar albr bl al al blarP = Lindenmayer SystemsParallel production rules. Finer classification D0L Deterministic, no interaction, i.e., context-free0L non-deterministic, no interaction IL non-deterministic, with I
22、nteraction, i.e., context sensitiveT0L with Table of production rulesTIL E0L Extended to non-terminal symbolsET0L EIL REL of ChomskyRGL Regular CFL Context-FreeCSL Context-Sensitive REL Recursively Enumerable CSL CFL RGL FINDOLRELChomsky LindenmayerIndexed0:REL1:CSLINDET0LE0L2:CFL3:RGLILT0L0LD0LEILL
23、 = aibici | i 0 CSLG = (S, T, ) = abc S = a, b, c T = t1, t2 T1 = a aa, b bb, c cc T2 = a , b , c T0LExample a la Lindenmayer Dyck language: A language of nested parentheses Many types of parentheses Finite depth of nesting Context-free languageOur case: Only 3 types of parentheses Shallow nesting C
24、onjecture (Xie): may be regular language模糊語言學 形式推廣不難:Z .G .Yu (喻祖國2001) 如何定量地引用生物知識 Consensus 序列和權(quán)重矩陣隨機語法 隱馬可夫鏈 = 隨機正規(guī)語法 更高階的隨機語法? Factorizable Languages Symbolic dynamics leads to factorizable languages A complete genome defines a factorizable langauge An amino acid sequence with unique reconstruct
25、ion (at certain K) defines a factorizable language Modeling in Biology Cells Tissues Organs “Systems”: circulation, respiration, reproduction, neural, sensory, musclular, etc. Organisms, population, ecosystems Animals versus plants Plant development, morphology, physiology and pathologyModeling of Plant MorphologyBy using L-System P. Prusinkiewicz, J. Hanan, Lindenmayer Systems, Fractals, and Plants, LN in Biomath., vol. 79, Springer, 1989 P. Prusinkiewicz, A. Lindenmayer, The Algorithmic Beau
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預覽,若沒有圖紙預覽就沒有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負責。
- 6. 下載文件中如有侵權(quán)或不適當內(nèi)容,請與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。
最新文檔
- 汽車轉(zhuǎn)讓與售后服務(wù)及品牌推廣協(xié)議
- 餐飲品牌加盟商扶持與發(fā)展服務(wù)合同
- 環(huán)衛(wèi)安全措施方案
- 企業(yè)能耗目標規(guī)劃方案
- 醫(yī)療產(chǎn)品工藝方案(3篇)
- 倉儲租賃及倉儲保險合作協(xié)議范本
- 醫(yī)學考試搞笑考試題及答案
- 臨水方案監(jiān)理意見
- 生態(tài)農(nóng)業(yè)園區(qū)土地租賃經(jīng)營合同
- 服裝銷售計劃方案
- 2017版高中生物課程標準考試試題及答案
- 中醫(yī)夏令營課程
- 國家開放大學《管理學基礎(chǔ)》網(wǎng)上課程形考任務(wù)1-4附參考答案
- 三級醫(yī)院評審標準實施細則(2023 年版)
- (高清版)TSG 09-2025 缺陷特種設(shè)備召回管理規(guī)則
- 集團企業(yè)IT項目規(guī)劃調(diào)研方案
- 對公貸款業(yè)務(wù)培訓
- 2025春季學期國開電大本科《商務(wù)英語3》一平臺在線形考(綜合測試)試題及答案
- 【初中信息】開啟物聯(lián)網(wǎng)之門課件 2024-2025學年人教版(2024)初中信息科技八年級全一冊
- 2025年國家公務(wù)員考試行測常識題庫及答案(共120題)
- 本科畢業(yè)論文完整范文(滿足查重要求)鄉(xiāng)村振興背景下大學生農(nóng)村創(chuàng)業(yè)的困境及對策
評論
0/150
提交評論