Data Mining
Introduction to Data Mining
Franco Perduca, Factory Software

Agenda
What is data mining; Data mining in SQL 2005; Algorithms; Demo.

What is data mining
Mining: act of excavation in the earth from which ore or minerals can be extracted.
Data mining: act of excavation in the data from which patterns can be extracted.
It draws on several fields: databases, statistics, artificial intelligence.

What is data mining
Will this student go to university? The answer is based on gender, income, parental encouragement, IQ, etc.
E.g., if ParentEncouragement=Yes and IQ>100, then College=Yes.
Classification (prediction)
Similar problems: spam email, credit analysis.

What is data mining
How old is a person? The answer is based on Hobby, MaritalStatus, NumberOfChildren, Income, HouseOwnership, NumberOfCars, …
E.g., if MaritalStatus=Yes, Age = 20 + 4*NumberOfChildren + 0.0001*Income + …
Regression (prediction)
Similar problems: What will MSFT shares be worth next week? (stock prediction) How much revenue can a customer bring in? (marketing)

What is data mining
Who visits my website? Visitors are grouped by "visit" patterns, demographic data, etc. E.g., who reads the news, which product types they look at, etc.
Segmentation (clustering)
Similar problems: types of customers in supermarkets and retail chains (target marketing); groups of documents (text categorization).

What is data mining
Which products are bought together with a digital camera? The answer is based on previous purchases (shopping carts). E.g., flash memory, batteries, a printer.
Association analysis (recommendation, market basket analysis, collaborative filtering)
Similar problems: online stores such as Amazon.com, Barnes & Noble, etc.; shop-window layout.

What is data mining
Does this network packet come from a virus attack? The answer is based on network packet patterns.
Anomaly detection (outlier detection)
Similar questions: anomalous checking-account transactions? Insurance damage claims?
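The student example is a classification rule of the form "if ParentEncouragement=Yes and IQ>100, then College=Yes". A minimal sketch of how such a threshold can be learned from labeled data is a decision stump: try each candidate IQ cut-off and keep the one that classifies the most training rows correctly. The training rows, field names and helper functions below are hypothetical, and the ParentEncouragement split is fixed by hand; this is not the Microsoft_Decision_Trees algorithm, only the simplest instance of the same idea.

```python
from typing import NamedTuple


class Student(NamedTuple):
    parent_encouragement: bool  # ParentEncouragement = Yes/No (hypothetical field)
    iq: int
    college: bool               # known outcome, available only in the training data


def learn_iq_threshold(training):
    """Decision stump: among encouraged students, choose the IQ cut-off
    that classifies the most training rows correctly."""
    encouraged = [s for s in training if s.parent_encouragement]
    candidates = sorted({s.iq for s in encouraged})

    def correct(threshold):
        # Count rows where "iq > threshold" agrees with the known label.
        return sum((s.iq > threshold) == s.college for s in encouraged)

    return max(candidates, key=correct)


def predict_college(parent_encouragement, iq, threshold):
    # Rule shape from the slide: ParentEncouragement=Yes and IQ > threshold -> College=Yes
    return parent_encouragement and iq > threshold


# Hypothetical training data.
training = [
    Student(True, 95, False),
    Student(True, 100, False),
    Student(True, 110, True),
    Student(True, 120, True),
    Student(False, 130, False),
]
threshold = learn_iq_threshold(training)
print(threshold)                               # 100
print(predict_college(True, 105, threshold))   # True
print(predict_college(False, 140, threshold))  # False
```

A real decision tree repeats this search recursively over every attribute (gender, income, IQ, …) instead of testing a single hand-picked split.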
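The age example is a linear regression: Age is predicted as a weighted sum of the input attributes. A minimal sketch, assuming only two predictors (NumberOfChildren and Income) and hypothetical rows that follow the slide's formula exactly, fits the coefficients by ordinary least squares in closed form, so the fit should recover 20, 4 and 0.0001. Function names and data are illustrative, not part of SQL 2005.

```python
def fit_two_predictor_ols(x1, x2, y):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2, solved in closed form
    with centred sums and Cramer's rule on the 2x2 normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # zero if the two predictors are collinear
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


def predict_age(coefficients, children, income):
    b0, b1, b2 = coefficients
    return b0 + b1 * children + b2 * income


# Hypothetical married respondents (the slide's formula applies when MaritalStatus=Yes).
children = [0, 1, 2, 3, 1]
income = [30000, 50000, 40000, 60000, 80000]
age = [23, 29, 32, 38, 32]

coefficients = fit_two_predictor_ols(children, income, age)
print(*(round(c, 6) for c in coefficients))                 # 20.0 4.0 0.0001
print(round(predict_age(coefficients, 2, 50000), 6))        # 33.0
```

With more attributes the same normal equations grow into a matrix system; the two-predictor case keeps the arithmetic visible.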
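Segmentation groups visitors that behave alike without any predefined label. As a simple stand-in, the sketch below uses plain k-means (not necessarily the method the SQL 2005 Clustering algorithm applies): each visitor is assigned to the nearest centroid, each centroid moves to the mean of its visitors, and the loop stops when nothing changes. The two features (news pages and product pages viewed per visit) and the data are hypothetical.

```python
import math


def kmeans(points, k, max_iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its points, until the centroids stop changing."""
    centroids = [tuple(p) for p in points[:k]]  # deterministic start: the first k points

    def nearest(p):
        return min(range(k), key=lambda i: math.dist(p, centroids[i]))

    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # New centroid = mean of its members; an empty cluster keeps its old centroid.
        updated = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p) for p in points]


# Hypothetical visitors: (news pages, product pages) per visit.
visitors = [(9, 1), (8, 2), (1, 9), (2, 8), (9, 2), (1, 8)]
centroids, labels = kmeans(visitors, 2)
print(labels)  # [0, 0, 1, 1, 0, 1]  -> news readers vs. product browsers
```

The number of segments k is chosen up front here; starting from the first k points keeps the example deterministic, whereas practical implementations usually try several random starts.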
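The digital camera question is answered by association rules such as "camera → flash memory", usually scored by support (how often both items appear together, over all baskets) and confidence (how often the item appears in baskets that contain the camera). The sketch below simply counts single-item rules for one anchor product over hypothetical shopping carts; it illustrates the two measures, not the internals of the SQL 2005 Association Rules algorithm.

```python
from collections import Counter


def items_bought_with(baskets, anchor, min_confidence=0.5):
    """Return (item, support, confidence) for rules 'anchor -> item'."""
    with_anchor = [set(b) for b in baskets if anchor in b]
    counts = Counter(item for b in with_anchor for item in b if item != anchor)
    rules = []
    for item, n in counts.items():
        support = n / len(baskets)         # share of all baskets containing anchor and item
        confidence = n / len(with_anchor)  # share of anchor baskets that also contain item
        if confidence >= min_confidence:
            rules.append((item, support, confidence))
    # Strongest rules first; ties broken alphabetically for a stable order.
    return sorted(rules, key=lambda r: (-r[2], r[0]))


# Hypothetical shopping carts.
baskets = [
    {"camera", "flash memory", "batteries"},
    {"camera", "flash memory", "printer"},
    {"camera", "batteries"},
    {"camera", "flash memory", "batteries", "printer"},
    {"printer", "paper"},
]
rules = items_bought_with(baskets, "camera")
print(rules)
# [('batteries', 0.6, 0.75), ('flash memory', 0.6, 0.75), ('printer', 0.4, 0.5)]
```

Algorithms such as Apriori avoid this brute-force counting by pruning itemsets whose support is already below a minimum, which matters once catalogs and baskets get large.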
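Anomaly detection flags observations that are far from normal behavior. A minimal sketch, assuming a single numeric feature (packets per second from one host, hypothetical) and a history known to be normal, scores new values by their z-score against that baseline and flags anything more than three standard deviations away. This is a basic statistical stand-in, not how SQL 2005 implements anomaly detection.

```python
import statistics


def fit_baseline(history):
    """Learn 'normal' behavior as the mean and sample standard deviation."""
    return statistics.mean(history), statistics.stdev(history)


def is_anomalous(value, baseline, threshold=3.0):
    mean, std = baseline
    # z-score: distance from the mean in standard deviations (std must be > 0).
    return abs(value - mean) / std > threshold


# Hypothetical packets-per-second readings during normal traffic.
history = [100, 102, 98, 101, 99, 100, 103, 97]
baseline = fit_baseline(history)
print(is_anomalous(101, baseline))  # False
print(is_anomalous(500, baseline))  # True
```

Fitting the baseline on clean history keeps a single extreme value from inflating the standard deviation and hiding itself, which can happen when the suspicious values are included in the statistics.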
(Anomaly detection applied to account transactions or insurance claims is known as fraud detection.)

Data mining in SQL 2005
7 data mining algorithms: Decision Trees (classification and regression), Naïve Bayes, Clustering, Neural Network, Association Rules, Sequence Clustering, Time Series.
12 viewers, including Dependency Network, Attribute Discrimination, Cluster Profiles, Lift Chart, …

Data mining in SQL 2005
[Table: choice of algorithm for each task (Classification, Regression, Segmentation, Association Analysis, Anomaly Detection, Sequence Analysis, Time Series); √ marks first or second choice.]

Data mining in SQL 2005
[Chart: relative business value and business knowledge versus usability (easy to difficult) for Reports (Static), Reports (Ad hoc), OLAP and SQL 2005 Data Mining.]

Data mining in SQL 2005
Programming interfaces: OLE DB for DM and DMX; native XMLA support (XMLA standard, Web services); the ADOMD.NET object model. Industry standard, usable from native COM applications and from .NET applications.

Data mining in SQL 2005
Case: the entity the model is built on, based on its behavior. It is not what we want to predict (the output).
Examples:
Will the student go to university?
What is the price of MSFT? Case: “(MSFT) stock”.
What does the customer buy?
Cases for the university and purchase questions: “student” and “customer”.

Data mining in SQL 2005
Workflow in the Data Mining Management System (DMMS): define a model; train the model on training data; test the mining model on test data; use the model for prediction (input data in, predictions out).

Data mining in SQL 2005
[Diagram: project cycle Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment around the data, with the supporting SQL 2005 tools: SSIS, SSAS (OLAP), DSV, SSAS (Data Mining), SSRS, flexible APIs.]

DEMO

Data mining in SQL 2005
Creating models from code
Analysis Management Objects (AMO): object model for administration and model creation.
Data Mining Extensions (DMX):

    -- Define the model; BikeBuyer is the column to predict
    CREATE MINING MODEL TargetMailDT
    (
        CustID      LONG KEY,
        Gender      TEXT DISCRETE,
        CommuteDist TEXT DISCRETE,
        Education   LONG CONTINUOUS,
        …
        BikeBuyer   LONG DISCRETE PREDICT
    )
    USING Microsoft_Decision_Trees

    -- Train the model; source columns are matched to model columns by position
    INSERT INTO TargetMailDT (CustID, Gender, CommuteDist, Education, …, BikeBuyer)
    OPENQUERY([My Data Source],
        'SELECT CustID, Gender, ComDist, Education, …, BikeBuyer FROM …')

Data mining in SQL 2005
Creating session models
Analysis Management Objects (AMO): object model for administration and model creation.
Data Mining Extensions (DMX): a session model exists only for the duration of the client session, and it works on live data too, because the application can pass the training rows in as the @InputRowset parameter.

    CREATE SESSION MINING MODEL TargetMailDT
    (
        CustID      LONG KEY,
        Gender      TEXT DISCRETE,
        CommuteDist TEXT DISCRETE,
        Education   LONG CONTINUOUS,
        …
        BikeBuyer   LONG DISCRETE PREDICT
    )
    USING Microsoft_Decision_Trees

    -- Train the session model from rows supplied by the application
    INSERT INTO TargetMailDT (CustID, Gender, CommuteDist, Education, …, BikeBuyer)
    @InputRowset

Data mining in SQL 2005
[Diagram: Your Application uses ADOMD.NET/OLE DB to reach Local Analysis Services (msmdlocal), which hosts the mining model (stored in a model file) and the Decision Tree/Clustering algorithms; the application retrieves App Data from the Data Source.]

Questions?

© 2004 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.