Data Mining
Introduction to Data Mining
Franco Perduca
Factory Software
[email protected]
Agenda
What is data mining?
Data mining in SQL 2005
Algorithms
Demo
What is data mining?
Mining: the act of excavating the earth to extract ore or minerals
Data mining: the act of excavating data to extract patterns
Spans several fields: databases, statistics, artificial intelligence
What is data mining?
Will this student go to university?
Based on gender, income, parental encouragement, IQ, etc.
E.g., if ParentEncouragement=Yes and IQ>100, then College=Yes
Classification (prediction)
Similar questions:
Spam email
Credit analysis
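
The classification rule on this slide can be written out as a small Python sketch. The `Student` class and the sample values are hypothetical, introduced only for illustration. A real classifier would learn such rules from training data rather than have them hard-coded.

```python
from dataclasses import dataclass


@dataclass
class Student:
    parent_encouragement: bool  # ParentEncouragement=Yes on the slide
    iq: int


def predict_college(student: Student) -> bool:
    """Apply the single rule from the slide: encouragement AND IQ > 100."""
    return student.parent_encouragement and student.iq > 100


# Hypothetical sample cases, not taken from the presentation.
students = [Student(True, 115), Student(True, 95), Student(False, 130)]
predictions = [predict_college(s) for s in students]
print(predictions)  # [True, False, False]
```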
What is data mining?
How old is a person?
Based on Hobby, MaritalStatus, NumberOfChildren, Income, HouseOwnership, NumberOfCars, …
E.g., if MaritalStatus=Yes, Age = 20 + 4*NumberOfChildren + 0.0001*Income + …
Regression (prediction)
Similar questions:
What will MSFT shares be worth next week? (stock prediction)
How much revenue can a customer bring in? (marketing)
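
As a rough illustration of the regression formula above, the sketch below evaluates only the terms that appear on the slide. The terms hidden behind the trailing "…" are not reproduced, and the function name and sample inputs are assumptions. The slide gives no formula for the unmarried case, so the sketch returns `None` there.

```python
from typing import Optional


def predict_age(married: bool, number_of_children: int, income: float) -> Optional[float]:
    """Evaluate the visible part of the slide's formula (elided terms omitted)."""
    if not married:
        return None  # the slide only gives the MaritalStatus=Yes branch
    return 20 + 4 * number_of_children + 0.0001 * income


# Hypothetical input: married, two children, income of 50,000.
age = predict_age(married=True, number_of_children=2, income=50_000)
print(round(age, 1))  # 33.0
```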
What is data mining?
Who visits my site?
Group visitors by browsing patterns, demographic data, etc.
E.g., who reads the news, which product types they look at, etc.
Segmentation (clustering)
Similar questions:
Types of customers in supermarkets and retail chains (target marketing)
Groups of documents (text categorization)
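
To make segmentation concrete, here is a minimal k-means sketch in Python. It is plain textbook k-means and is not claimed to match the clustering algorithm shipped with SQL Server 2005. The visitor data (news pages viewed, product pages viewed) is invented for the example.

```python
from math import dist


def kmeans(points, k, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    centroids = [points[i] for i in range(k)]  # naive deterministic start
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical visitors: (news pages viewed, product pages viewed).
visits = [(9, 1), (1, 8), (8, 2), (2, 9), (10, 0), (0, 10)]
labels, centroids = kmeans(visits, k=2)
print(labels)     # [0, 1, 0, 1, 0, 1]
print(centroids)  # [(9.0, 1.0), (1.0, 9.0)]
```

Seeding the centroids with the first k points keeps the example deterministic. Practical implementations usually use a randomized or smarter initialization.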
What is data mining?
Which products are bought together with a digital camera?
Based on previous purchases (shopping cart)
E.g., flash memory, batteries, printer.
Association analysis (recommendation, market basket analysis, collaborative filtering)
Similar applications:
Online stores such as Amazon.com, Barnes & Noble, etc.
Shop window arrangement
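
The market-basket idea can be sketched by counting how often other items appear in baskets that contain the camera. The baskets below are made up, and `bought_with` is a hypothetical helper. The ratio it computes is usually called the confidence of the rule "camera → other item".

```python
from collections import Counter

# Hypothetical shopping carts, not data from the presentation.
baskets = [
    {"digital camera", "flash memory", "batteries"},
    {"digital camera", "flash memory"},
    {"digital camera", "printer"},
    {"batteries", "printer"},
]


def bought_with(item, baskets):
    """Return {other_item: confidence} for rules of the form item -> other_item."""
    with_item = [b for b in baskets if item in b]
    counts = Counter(other for b in with_item for other in b if other != item)
    return {other: n / len(with_item) for other, n in counts.items()}


rules = bought_with("digital camera", baskets)
for other, confidence in sorted(rules.items(), key=lambda kv: -kv[1]):
    print(f"{other}: {confidence:.2f}")
```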
What is data mining?
Does this network packet come from a virus attack?
Based on network packet patterns
Anomaly detection (outlier detection)
Similar questions:
Anomalous checking-account transactions
Insurance claims? (fraud detection)
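
One simple way to illustrate outlier detection is a z-score test, which flags values far from the mean in units of standard deviation. This is only one of many possible techniques and is not stated in the slides. The packet sizes and the threshold of 2.0 are assumptions chosen for the example.

```python
from statistics import mean, pstdev


def outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # all values identical: nothing stands out
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical packet sizes in bytes; one is clearly unusual.
packet_sizes = [60, 62, 58, 61, 59, 60, 63, 57, 1500]
print(outliers(packet_sizes))  # [1500]
```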
Data mining in SQL 2005
7 DM algorithms
Decision Trees (classification and regression)
Naïve Bayes
Clustering
Neural Network
Association Rules
Sequence Clustering
Time Series
12 viewers
Dependency Network
Attribute Discrimination
Cluster Profiles
Lift Chart
…
Data mining in SQL 2005
[Table: which algorithm to use for each task. Rows: Classification, Regression, Segmentation, Association Analysis, Anomaly Detection, Sequence Analysis, Time Series. Cells hold check marks for first choice and second choice.]
Data mining in SQL 2005
[Figure: Reports (Static), Reports (Ad hoc), OLAP, and SQL 2005 Data Mining. Axis labels: Business Knowledge, Relative Business Value, Usability (Easy to Difficult).]
Data mining in SQL 2005
OLE DB for DM, DMX
Native XMLA support
XMLA Standard
Web Services
Object model for ADOMD.Net
Industry standard
COM native applications
.NET applications
Data mining in SQL 2005
Case:
The entity whose behavior the model is built on.
Not what we want to predict (= output)
Examples:
Will the student go to university?
Case: "student"
What is the price of MSFT?
Case: "(MSFT) stock"
What does the customer buy?
Case: "customer"
Data mining in SQL 2005
Define a model
Train the model
Test the model
Prediction using the model
[Diagram labels: Training Data, Test Data, Prediction Input Data, Data Mining Management System (DMMS), Mining Model]
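
The define, train, test, predict cycle can be sketched with a toy model in Python. `MajorityByAttribute` is a hypothetical class written for this illustration and is not part of any SQL Server API. It simply predicts the most common target value seen for each value of one input attribute. The column names are borrowed from the DMX example in this deck, and the rows are invented.

```python
from collections import Counter, defaultdict


class MajorityByAttribute:
    """Toy model: predicts the most common label seen for an attribute value."""

    def __init__(self, attribute, target):  # 1. define the model
        self.attribute, self.target = attribute, target
        self.table, self.default = {}, None

    def train(self, cases):  # 2. train the model
        votes = defaultdict(Counter)
        for case in cases:
            votes[case[self.attribute]][case[self.target]] += 1
        self.table = {value: c.most_common(1)[0][0] for value, c in votes.items()}
        self.default = Counter(c[self.target] for c in cases).most_common(1)[0][0]

    def predict(self, case):  # 4. predict (falls back to the overall majority)
        return self.table.get(case[self.attribute], self.default)

    def accuracy(self, cases):  # 3. test the model
        return sum(self.predict(c) == c[self.target] for c in cases) / len(cases)


# Hypothetical training and test rows.
training = [
    {"CommuteDist": "0-1 Miles", "BikeBuyer": 1},
    {"CommuteDist": "0-1 Miles", "BikeBuyer": 1},
    {"CommuteDist": "0-1 Miles", "BikeBuyer": 0},
    {"CommuteDist": "10+ Miles", "BikeBuyer": 0},
    {"CommuteDist": "10+ Miles", "BikeBuyer": 0},
]
test = [
    {"CommuteDist": "0-1 Miles", "BikeBuyer": 1},
    {"CommuteDist": "10+ Miles", "BikeBuyer": 0},
    {"CommuteDist": "10+ Miles", "BikeBuyer": 1},
]

model = MajorityByAttribute("CommuteDist", "BikeBuyer")
model.train(training)
test_accuracy = model.accuracy(test)
prediction = model.predict({"CommuteDist": "0-1 Miles"})
print(prediction)  # 1
```

Keeping the test rows separate from the training rows is the point of the "Test the model" step: accuracy measured on the training data alone would be overly optimistic.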
Data mining in SQL 2005
[Diagram: process cycle around Data, with the phases Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment. Tool labels: SSIS, SSAS (OLAP), SSAS (Data Mining), SSRS, DSV, Flexible APIs.]
DEMO
Data mining in SQL 2005
Creating models in code
Analysis Management Objects
Object model for administering/creating models
Data Mining Extensions (DMX)

CREATE MINING MODEL TargetMailDT
(
    CustID      LONG KEY,
    Gender      TEXT DISCRETE,
    CommuteDist TEXT DISCRETE,
    Education   LONG CONTINUOUS,
    …
    BikeBuyer   LONG DISCRETE PREDICT
)
USING Microsoft_Decision_Trees

INSERT INTO TargetMailDT
    (CustID, Gender, CommuteDist, Education, …, BikeBuyer)
OPENQUERY([My Data Source],
    'SELECT CustID, Gender, ComDist, Education, … BikeBuyer')
Data mining in SQL 2005
Creating session models
Analysis Management Objects
Object model for administering/creating models
Data Mining Extensions (DMX)
Also on live data!

CREATE SESSION MINING MODEL TargetMailDT
(
    CustID      LONG KEY,
    Gender      TEXT DISCRETE,
    CommuteDist TEXT DISCRETE,
    Education   LONG CONTINUOUS,
    …
    BikeBuyer   LONG DISCRETE PREDICT
)
USING Microsoft_Decision_Trees

INSERT INTO TargetMailDT
    (CustID, Gender, CommuteDist, Education, …, BikeBuyer)
@InputRowset
Data mining in SQL 2005
[Diagram labels: Your Application (App, Data), ADOMD.Net/OLE DB, Local Analysis Services (msmdlocal), Mining Model, Model File, Decision Tree/Clustering algorithms, Data Source, Retrieve Data]
Questions?
© 2004 Microsoft Corporation. All rights reserved.
This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.