




Advanced Artificial Intelligence: Knowledge Discovery
2022/7/15

Overview
Knowledge discovery systems built on top of databases combine statistics, rough sets, fuzzy mathematics, machine learning, expert systems, and other learning methods to distill abstract knowledge from large volumes of data. They reveal the intrinsic relationships and underlying regularities of the real world hidden behind the data and make knowledge acquisition automatic. This is a challenging research topic with broad application prospects.

Outline
- Origins and application areas of KDD
- Definition of KDD
- The steps of KDD
- KDD software
- Conferences and journals in the KDD field

Evolution of Database Technology: from data management to data analysis
- 1960s: Data collection, database creation, IMS and network DBMS.
- 1970s: Relational data model, relational DBMS implementation.
- 1980s: RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.).
- 1990s: Data mining and data warehousing, multimedia databases, and Web technology.

Motivations: "Necessity is the Mother of Invention"
- Data explosion problem: automated data collection tools, mature database technology, and the internet lead to tremendous amounts of data stored in databases, data warehouses, and other information repositories.
- "We are drowning in information, but starving for knowledge!" (John Naisbitt)
- Data warehousing and data mining:
  - On-line analytical processing
  - Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases.

Important dates of data mining
- 1989: IJCAI Workshop on KDD
  - Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, eds., 1991)
- 1991-1994: Workshops on KDD
  - Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, eds., 1996)
- 1995-1998: AAAI Int. Conf. on KDD and DM (KDD95-98)
  - Journal of Data Mining and Knowledge Discovery (1997)
- 1998: ACM SIGKDD
- 1999: SIGKDD99 Conf.

Knowledge discovery in databases
The term appeared in 1989. Fayyad's definition (1996): KDD is "the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data".

The virtuous cycle
(Figure: a cycle with the stages "Identify problem or opportunity", "Act on knowledge", and "Measure effect of action", with the labels Problem, Knowledge, Strategy, and Results.)

Application Areas and Opportunities
- Marketing: segmentation, customer targeting, ...
- Finance: investment support, portfolio management
- Banking & Insurance: credit and policy approval
- Security: fraud detection
- Science and medicine: hypothesis discovery, prediction, classification, diagnosis
- Manufacturing: process modeling, quality control, resource allocation
- Engineering: simulation and analysis, pattern recognition, signal processing
- Internet: smart search engines, web marketing

The KDD process
(Figure: Data Sources -> Data Consolidation -> Consolidated Data (Warehouse) -> Selection and Preprocessing -> Prepared Data -> Data Mining -> Patterns & Models, e.g. p(x)=0.02 -> Interpretation and Evaluation -> Knowledge.)

What is KDD? A process!
Data mining is a major component of the KDD process: the automated discovery of patterns and the development of predictive and explanatory models.

The steps of the KDD process
- Learning the application domain: relevant prior knowledge and goals of the application
- Data consolidation: creating a target data set
- Selection and preprocessing
  - Data cleaning (may take 60% of effort!)
  - Data reduction and projection: find useful features, dimensionality/variable reduction, invariant representation
- Choosing the functions of data mining: summarization, classification, regression, association, clustering
- Choosing the mining algorithm(s)
- Data mining: search for patterns of interest
- Interpretation and evaluation: analysis of results (visualization, transformation, removing redundant patterns, ...)
- Use of discovered knowledge

Data consolidation and preparation
- Garbage in, garbage out: the quality of the results relates directly to the quality of the data.
- 50%-70% of KDD process effort is spent on data consolidation and preparation.
- This is a major justification for a corporate data warehouse.

Data consolidation
From data sources to a consolidated data repository.
(Figure: RDBMS, legacy DBMS, and flat files feed "Data Consolidation and Cleansing", which fills the warehouse. Labels on the warehouse side: Object/Relation DBMS, Multidimensional DBMS, Deductive Database, Flat files, External.)

Data consolidation
- Determine a preliminary list of attributes
- Consolidate data into a working database
  - Internal and external sources
- Eliminate or estimate missing values
- Remove outliers (obvious exceptions)
- Determine prior probabilities of categories and deal with volume bias

Data selection and preprocessing
- Generate a set of examples
  - choose a sampling method
  - consider sample complexity
  - deal with volume bias issues
- Reduce attribute dimensionality
  - remove redundant and/or correlating attributes
  - combine attributes (sum, multiply, difference)
- Reduce attribute value ranges
  - group symbolic discrete values
  - quantize continuous numeric values
- Transform data
  - de-correlate and normalize values
  - map time-series data to a static representation
- OLAP and visualization tools play a key role

Data mining tasks and methods
- Automated exploration/discovery
  - e.g. discovering new market segments
  - clustering analysis
- Prediction/classification
  - e.g. forecasting gross sales given current factors
  - regression, neural networks, genetic algorithms, decision trees
- Explanation/description
  - e.g. characterizing customers by demographics and purchase history
  - decision trees, association rules
(Figure: a scatter plot over x1 and x2, a curve f(x) over x, and a rule fragment "if age 35 and e $35k then ." whose comparison operators were lost in extraction.)

Automated exploration and discovery
Clustering: partitioning a set of data into a set of classes, called clusters, whose members share some interesting common properties.
- Distance-based numerical clustering
  - metric grouping of examples (K-NN)
  - graphical visualization can be used
- Bayesian clustering
  - search for the number of classes that results in the best fit of a probability distribution to the data
  - AutoClass (NASA) is one of the best examples

Prediction and classification
- Learning a predictive model
- Classification of a new case/sample
- Many methods:
  - Artificial neural networks
  - Inductive decision tree and rule systems
  - Genetic algorithms
  - Nearest neighbor clustering algorithms
  - Statistical (parametric and non-parametric)

Generalization and regression
- The objective of learning is to achieve good generalization to new, unseen cases.
- Generalization can be defined as a mathematical interpolation or regression over a set of training points.
- Models can be validated with a previously unseen test set or with cross-validation methods.
(Figure: a curve f(x) over x.)

Summarizing: inductive modeling = learning
Objective: develop a general model or hypothesis from specific examples.
- Function approximation (curve fitting)
- Classification (concept learning, pattern recognition)
(Figure: a curve f(x) over x, and a plot over x1 and x2 with classes A and B.)

Explanation and description
- Learn a generalized hypothesis (model) from selected data
- Description/interpretation of the model provides new knowledge
- Methods:
  - Inductive decision tree and rule systems
  - Association rule systems
  - Link analysis

Exception/deviation detection
- Generate a model of normal activity
- Deviation from the model causes an alert
- Methods:
  - Artificial neural networks
  - Inductive decision tree and rule systems
  - Statistical methods
  - Visualization tools

- Outlier and exception data analysis
- Time-series analysis (trend and deviation): regression, sequential patterns, similar sequences, trend and deviation, e.g., stock analysis
- Similarity-based pattern-directed analysis
- Full vs. partial periodicity analysis
- Other pattern-directed or statistical analysis

Are all the discovered patterns interesting?
- A data mining system or query may generate thousands of patterns, and not all of them are interesting.
- Interestingness measures: a pattern is interesting if it is
  - easily understood by humans
  - valid on new or test data with some degree of certainty
  - potentially useful
  - novel, or validates some hypothesis that a user seeks to confirm
- Objective vs. subjective interestingness measures
  - Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
  - Subjective: based on users' beliefs in the data, e.g., unexpectedness, novelty, etc.

Completeness vs. optimization
- Find all the interesting patterns: completeness.
  - Can a data mining system find all the interesting patterns?
- Search for only interesting patterns: optimization.
  - Can a data mining system find only the interesting patterns?
- Approaches
  - First generate all the patterns and then filter out the uninteresting ones.
  - Generate only the interesting patterns: mining query optimization.

Evaluation
- Statistical validation and significance testing
- Qualitative review by experts in the field
- Pilot surveys to evaluate model accuracy

Interpretation
In
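
As an illustration of the objective interestingness measures named in the slides (support and confidence) and of the "generate all the patterns, then filter" approach, here is a minimal Python sketch. It is not taken from the slides: the toy transactions, the function names, and the threshold values are all hypothetical, and only single-item rules of the form a -> b are enumerated to keep it short.

```python
from itertools import combinations

# Hypothetical toy transactions, for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(A and C together) / support(A); assumes support(A) > 0."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def generate_rules(transactions):
    """Step 1: enumerate every single-item rule a -> b (all patterns)."""
    items = sorted(set().union(*transactions))
    for a, b in combinations(items, 2):
        yield a, b
        yield b, a


def interesting_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Step 2: keep only the rules that meet both (assumed) thresholds."""
    kept = []
    for a, b in generate_rules(transactions):
        s = support({a, b}, transactions)
        if s < min_support:
            continue  # too rare; this also guarantees support({a}) > 0
        c = confidence({a}, {b}, transactions)
        if c >= min_confidence:
            kept.append((a, b, s, c))
    return kept


if __name__ == "__main__":
    for a, b, s, c in interesting_rules(TRANSACTIONS):
        print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

Generating every rule first is simple but scales poorly as the number of items grows, which is why the slides contrast it with generating only the interesting patterns through mining query optimization.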