The Business Value of XML Data Management – DB2 9.5 pureXML
IBM Software Group
The Business Value of XML
Data Management – DB2 9.5 pureXML®
Summer/Fall 2008
Data Management Solutions
Information Management
© 2008 IBM Corporation
Agenda
▪ Where and why XML data is used
▪ What’s driving the need for XML data management?
▪ Industry solutions and application scenarios
▪ DB2 9 early feedback and competitive comparison
You will store XML!
Resistance is Futile.
XML is Everywhere!
▪ Integration of diverse data sources
▪ Information exchange between applications & organizations
▪ eForms and workflow processing
▪ Content and document management
▪ Message-based transactions, web services, SOA
▪ XML documents as business objects / transaction records (digital signatures, auditing, regulatory compliance)
▪ XML as the better data model (for multi-valued, hierarchical and complex data)
Who uses XML? Everybody!
▪ Financial
– ACORD: XML for insurance (http://www.acord.org/standards/lifexml.aspx)
– FIXML: Financial Information eXchange protocol (http://www.fixprotocol.org/cgi-bin/Spec.cgi?menu=4)
– FpML: Financial Products Markup Language (http://www.fpml.org/spec/index.asp)
– FUNDSML: Funds Markup Language (http://www.funds-xml.org/html/download.htm)
– XBRL: eXtensible Business Reporting Language (http://www.xbrl.org/r)
▪ Life Sciences
– AGAVE: Architecture for Genomic Annotation, Visualization and Exchange (http://www.lifecde.com/products/agave/)
– BSML: Bioinformatic Sequence Markup Language (http://www.bsml.org/resources/default.asp)
– CML: Chemical Markup Language (http://www.xml-cml.org/)
▪ Publication etc.
– SportML: Sport Markup Language (http://www.sportsml.com/specifications.php)
– NewsML: News Markup Language (http://www.newsml.org/pages/spec_main.php)
– XBITS: XML Book Industry Transaction Standards (http://www.xmlbits.org/docs.asp)
– XPRL: eXtensible Public Relations Language (http://www.xprl.org/)
▪ Other
– LandML: Land Development Markup Language (http://www.landxml.org/spec.htm)
– MODA-ML: Middleware tOols and Documents to Enhance the textile/clothing supply chain through xML (http://www.moda-ml.net/modaml/repository/schema/V20031/default.asp?lingua=en)
– MatML: Materials Property Data Markup Language (http://www.matml.org/schema.htm)
– JXDM: Global Justice XML Data Model (http://it.ojp.gov/jxdm/3.0/index.html)
– ebXML: Electronic Business using eXtensible Markup Language (http://www.ebxml.org/specs/)
– ...
XML Example: Financial Data (FIXML)
▪ Buying 1000 shares of IBM stock
Old FIX protocol:
8=FIX.4.2^9=251^35=D^49=AFUNDMGR^56=ABROKER^34=2^52=20030615-01:14:49
^11=12345^1=111111^63=0^64=20030621^21=3^110=1000^111=50000^55=IBM
^48=459200101^22=1^54=1^60=20030615-01:14:49^38=5000^40=1^44=15.75
^15=USD^59=0^10=127
New FIXML protocol:
▪ Extensible
▪ Lower application development & maintenance cost
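To see why the old tag=value format is costly to work with, here is a minimal Python sketch that splits such a message into fields. The `parse_fix` helper and the `FIELD_NAMES` lookup are hypothetical names introduced for illustration; the sample message is an abbreviated subset of the slide's message, and "^" stands in for the field delimiter exactly as it does on the slide. The tag meanings listed are the standard FIX ones for these six tags.

```python
# Hypothetical lookup: a reader of raw FIX must know these tag numbers by heart.
FIELD_NAMES = {
    "35": "MsgType",
    "55": "Symbol",
    "54": "Side",
    "38": "OrderQty",
    "44": "Price",
    "15": "Currency",
}


def parse_fix(message, delimiter="^"):
    """Split a tag=value message into a {tag: value} dict."""
    fields = {}
    for pair in message.split(delimiter):
        if not pair:
            continue  # tolerate empty segments such as a trailing delimiter
        tag, _, value = pair.partition("=")
        fields[tag] = value
    return fields


# Abbreviated subset of the message shown on the slide.
sample = "8=FIX.4.2^35=D^55=IBM^54=1^38=5000^40=1^44=15.75^15=USD"
order = parse_fix(sample)

for tag, name in FIELD_NAMES.items():
    print(f"{name} (tag {tag}): {order.get(tag)}")
```

The numeric tags carry no meaning on their own, so every consumer needs its own lookup table. Self-describing element and attribute names are what the FIXML side of the slide adds.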
Analyst reports
▪ Gartner Group
– 4Q07 insurance industry survey:
• 75% of firms have implemented XML standards
• 74% of firms believe XML implementations have yielded business and technology benefits
• More than 50% of firms base XML initiatives wholly or partially on ACORD
– 1Q08 DITA (Darwin Information Typing Architecture) assessment
• “DITA can reduce the time and cost required to create content, while increasing the value of content it identifies. These will drive enterprises to adopt DITA-aware applications.”
▪ The Burton Group (3Q07)
– “XQuery is likely to become for XML content what SQL is for relational data.”
Industry standards and mandates
▪ Single Euro Payments Area (SEPA)
– Major European payments initiative with global implications
– Mandates XML-based message formats, based on ISO 20022, for exchange of payments data between banks in the Euro zone
– Expected to reach “critical mass” by end of 2010
▪ OTC (over the counter) derivatives processing
– International Swaps and Derivatives Association (ISDA) survey (2005): 75% of large firms responding to the survey use FpML (Financial Products Markup Language)
– Operations Management Group (OMG), consisting of buy-side firms and service providers, endorses FpML use as a means to support “operational scalability” for trading in credit derivatives (1Q08)
Where is your XML?
▪ In files: storage not managed and not secure
▪ In LOBs: content and business value locked up
▪ Shredded to tables: complex and fragile mapping
▪ In an XML-only DB: scalability & integration concerns
Why XML data management?
▪ XML is pervasive
– Integration and messaging applications
• OTC derivatives (FpML)
• Single Euro Payments Area (SEPA, UNIFI / ISO 20022)
• Securities trading (FIXML)
• Insurance (ACORD)
– E-Forms, Web applications
– SOA engagements
▪ XML data is a corporate asset
– Messages = business artifacts/transactions (e.g., order, trade)
– Customer profiles, behavior patterns
– Audit trails for regulatory compliance
– Archiving
▪ XML must be managed, shared, analyzed & protected like any other important data

<?xml version="1.0"?>
<purchaseOrder id="12345" secretKey="4x%$^">
  <customer id="A6789">
    <name>John Smith Co</name>
    <address>
      <street>1234 W. Main St</street>
      <city>Toledo</city>
      <state>OH</state>
      <zip>95141</zip>
    </address>
  </customer>
  <itemList>
    <item>
      <partNo>A54</partNo>
      <quantity>12</quantity>
    </item>
    <item>
      <partNo>985</partNo>
      <quantity>1</quantity>
    </item>
  </itemList>
</purchaseOrder>
XML in the Database
▪ Data that’s inherently hierarchical or nested in nature
– Example: Medical data, Bill-of-materials, etc., OO & Multivalue
▪ Data sets with sparsely populated attributes
– Example: FIXML, FpML, Customer profiles
▪ Schema evolution
– Example: Frequently changing services/products/processes
▪ Variable schemas, many schemas
– Example: Data integration, consolidation of diverse data sources
▪ Combining structured & unstructured data
– Example: CM, Life Sciences, News & Media
XML-Enabled Databases: Two Main Options
[Diagram: two ways an XML-enabled database stores XML documents]
▪ CLOB/Varchar: XML documents are stored in a Varchar or CLOB column; selected elements/attributes are extracted into side tables (regular tables for faster lookup)
▪ Shredding ("decomposition"): a shredder with a fixed mapping decomposes each XML document into regular relational tables
Shredding: A simple case
<DEPARTMENT deptid="15" deptname="Sales">
<EMPLOYEE>
<EMPNO>10</EMPNO>
<FIRSTNAME>CHRISTINE</FIRSTNAME>
<LASTNAME>SMITH</LASTNAME>
<PHONE>408-463-4963</PHONE>
<SALARY>52750.00</SALARY>
</EMPLOYEE>
<EMPLOYEE>
<EMPNO>27</EMPNO>
<FIRSTNAME>MICHAEL</FIRSTNAME>
<LASTNAME>THOMPSON</LASTNAME>
<PHONE>406-463-1234</PHONE>
<SALARY>41250.00</SALARY>
</EMPLOYEE>
</DEPARTMENT>
Department
DEPTID  DEPTNAME
15      Sales

Employee
DEPTID  EMPNO  FIRSTNAME  LASTNAME  PHONE         SALARY
15      27     MICHAEL    THOMPSON  406-463-1234  41250
15      10     CHRISTINE  SMITH     408-463-4963  52750
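The shredding step for this simple case can be sketched in Python with the standard library. This is only an illustration of the idea of a fixed mapping; the `shred` function and its tuple layout are assumptions made for this sketch and are not DB2's shredder.

```python
import xml.etree.ElementTree as ET

DEPARTMENT_XML = """
<DEPARTMENT deptid="15" deptname="Sales">
  <EMPLOYEE>
    <EMPNO>10</EMPNO>
    <FIRSTNAME>CHRISTINE</FIRSTNAME>
    <LASTNAME>SMITH</LASTNAME>
    <PHONE>408-463-4963</PHONE>
    <SALARY>52750.00</SALARY>
  </EMPLOYEE>
  <EMPLOYEE>
    <EMPNO>27</EMPNO>
    <FIRSTNAME>MICHAEL</FIRSTNAME>
    <LASTNAME>THOMPSON</LASTNAME>
    <PHONE>406-463-1234</PHONE>
    <SALARY>41250.00</SALARY>
  </EMPLOYEE>
</DEPARTMENT>
""".strip()


def shred(xml_text):
    """Map one DEPARTMENT document to a Department row and Employee rows.

    The mapping is fixed: each element name is tied to exactly one column.
    """
    root = ET.fromstring(xml_text)
    dept_id = int(root.get("deptid"))
    dept_row = (dept_id, root.get("deptname"))

    employee_rows = []
    for emp in root.findall("EMPLOYEE"):
        employee_rows.append((
            dept_id,
            int(emp.findtext("EMPNO")),
            emp.findtext("FIRSTNAME"),
            emp.findtext("LASTNAME"),
            emp.findtext("PHONE"),
            float(emp.findtext("SALARY")),
        ))
    return dept_row, employee_rows


dept_row, employee_rows = shred(DEPARTMENT_XML)
print(dept_row)
for row in employee_rows:
    print(row)
```

Every column in the target tables is hard-wired to one element name, which is what makes the mapping simple here and fragile once the schema changes.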
Shredding: A schema change…
"Employees are now allowed to have multiple phone numbers…"
<DEPARTMENT deptid="15" deptname="Sales">
<EMPLOYEE>
<EMPNO>10</EMPNO>
<FIRSTNAME>CHRISTINE</FIRSTNAME>
<LASTNAME>SMITH</LASTNAME>
<PHONE>408-463-4963</PHONE>
<PHONE>415-010-1234</PHONE>
<SALARY>52750.00</SALARY>
</EMPLOYEE>
<EMPLOYEE>
<EMPNO>27</EMPNO>
<FIRSTNAME>MICHAEL</FIRSTNAME>
<LASTNAME>THOMPSON</LASTNAME>
<PHONE>406-463-1234</PHONE>
<SALARY>41250.00</SALARY>
</EMPLOYEE>
</DEPARTMENT>
Requires (costly!):
• Normalization of existing data!
• Modification of the mapping
• Change of applications

Department
DEPTID  DEPTNAME
15      Sales

Employee
DEPTID  EMPNO  FIRSTNAME  LASTNAME  PHONE         SALARY
15      27     MICHAEL    THOMPSON  406-463-1234  41250
15      10     CHRISTINE  SMITH     408-463-4963  52750

Phone
EMPNO  PHONE
27     406-463-1234
10     415-010-1234
10     408-463-4963
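A short Python sketch of why the schema change hurts a fixed mapping. It assumes a one-column-per-element mapping like the simple case; the variable names are illustrative only. A lookup that expects a single PHONE element silently keeps just the first number, so the mapping has to be rewritten to emit rows for a separate Phone table.

```python
import xml.etree.ElementTree as ET

# Employee 10 after the schema change: two PHONE elements.
EMPLOYEE_XML = """
<EMPLOYEE>
  <EMPNO>10</EMPNO>
  <FIRSTNAME>CHRISTINE</FIRSTNAME>
  <LASTNAME>SMITH</LASTNAME>
  <PHONE>408-463-4963</PHONE>
  <PHONE>415-010-1234</PHONE>
  <SALARY>52750.00</SALARY>
</EMPLOYEE>
""".strip()

emp = ET.fromstring(EMPLOYEE_XML)
empno = int(emp.findtext("EMPNO"))

# Old mapping: one PHONE column per employee. findtext returns only the
# first matching element, so the second number is dropped without an error.
old_phone_column = emp.findtext("PHONE")

# New mapping: one row per phone number in a separate Phone table.
phone_rows = [(empno, phone.text) for phone in emp.findall("PHONE")]

print("old mapping keeps:", old_phone_column)
print("new Phone rows:", phone_rows)
```

Storing the document intact avoids this rewrite, because the second PHONE element is simply part of the stored XML.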
DB2 9 Technology Leadership
▪ Native XML hierarchical storage
– No shredding, no CLOBs, no BLOBs required
– Optimized for XPath and XQuery processing
▪ High performance
– Superior indexing technology
– No parsing of XML data at query runtime
▪ Fully integrated XML and relational processing
– Seamlessly query various types of data at once
– No internal translation of XQuery into SQL
▪ Schema flexibility
– Changes don’t force unload / reload of data
– Multiple schemas allowed per XML column
DB2 pureXML Detailed Features List
▪ XML data type for columns
– Stored as native XML, with options for inlining and compression
▪ Language bindings for XML type in programming languages
– COBOL, C, Java, etc.
▪ XML indexes
▪ An XML schema/DTD repository
– Support for multiple schemas, schema validation triggers, check constraints, compatible schema evolution
▪ Support for XQuery as a primary language as well as:
– Sub-document update (transform function)
– Support for SQL within XQuery
– Support for XQuery with SQL
– Support for new SQL/XML functions
▪ XSLT support
▪ Performance, scale, and everything else you expect from a DBMS
DB2 pureXML Detailed Features List (continued)
▪ XML Import, Export and Load
▪ XML Runstats (w/ initial optimizer support)
▪ XML type support in stored procedures
▪ XML type supported by HADR
▪ .NET add-in to support DB2 XML type
▪ JDBC/ODBC support for XQuery, JDBC 4.0 (JSR 221) in 9.5
▪ XML type for CLI, Embedded SQL in C++ and COBOL, PHP, Ruby on Rails and Perl
▪ Queue Replication support
▪ Federation support for XML
▪ and more…
Benefits of managing XML data with DB2 9
▪ Lower Development Costs
– Reduced code and development complexity
– Improved developer productivity
→ Speed up solution development and gain cost savings
▪ Greater Business Agility
– Easily accommodate changes to data and schemas
– Update applications rapidly and reduce maintenance costs
→ Respond quickly to dynamic conditions and get faster time to value
▪ Improved Business Insight
– Access to “hidden gems” (data) in unexploited documents
– Unprecedented application performance
→ Gain competitive advantage through better and quicker information
XML Data Needs Relational Maturity
▪ XML Data Needs Protection
– Backup and recovery features to ensure continuity
– Data is protected using database security
▪ Simplified XML Data Access
– Centrally store and access difficult-to-retrieve data
– SQL or XQuery can be used to retrieve data
– Join XML data with its related relational data
▪ Search Speed
– Search documents quickly and efficiently using the proven search optimization engine of a mature database
▪ Optimize Existing Investments
– Use existing technology infrastructure and skills to store and manage both relational and XML
XML Usage Scenarios …
XML Usage Scenarios
1. Industry standards and data exchange applications
2. Web services, SOA data transport and message persistence
3. Business object / transaction record
4. Integration of diverse data sources
5. Forms and workflow processing
6. Document storage and querying
7. XML Feeds and Web 2.0 Syndication
8. Mapping XML in relational applications
9. Better data model for certain types of data
10. Rapid application prototyping and development
… and many more!
XML - the foundation for SOA and Web Services
▪ XML is the transport for messages and data in SOA
▪ XML DBs can provide SOA data services
▪ SOA messages/data often need to be persisted
– Temporary Cache
– Audit Logs
– Compliance Records
– Insight
[Diagram: XML messages exchanged between a Service Requestor and a Service Provider]
XML Transaction Records / Business Objects
▪ Transactions being conducted as XML
– Within SOA environments
– Between value chain members
→ Need to store the transaction record and query later
▪ Many business objects being represented as XML
– Purchase orders
– Invoices
– Insurance policies
→ Need to store XML business objects intact
Integration of Diverse Data Sources
▪ XML database as integration hub
– XML schema flexibility → integrate data with differing formats
– XQuery language → excellent for joining different data sources
▪ Integration using SOA environments
– Services Oriented Integration (SOI)
[Diagram: applications, services, employee/customer portals, suppliers, distributors, partners and agencies exchanging XML documents through DB2 9]
Forms and their processing
▪ Forms exist for virtually all types of goods and services
– Insurance applications, bank loans, tax filings, …
▪ Paper forms are being replaced by electronic forms in XML format:
– IBM Lotus Forms
▪ Store the entire form (XML document) as a whole in an XML database rather than shredding it into relational columns
[Diagram: an application form stored as XML in DB2 9; labels include Broker, Status, Audit and Insight]
XML Feeds and Syndication
▪ Syndication is the heartbeat of Web 2.0
▪ RSS/ATOM feeds are encapsulated as XML
▪ Use an XML database for serving and storing feeds
▪ E.g. stock ticker feeds, inventory feeds, etc.
[Diagram: an ATOM/RSS provider and an ATOM/RSS reader exchanging XML through web servers, with DB2 9 storing the feeds]
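Because a feed is just an XML document, its entries can be read with ordinary XML tooling before or after they are stored. A minimal Python sketch, assuming an Atom feed; the feed content below is made up for illustration (an inventory feed, as in the slide's example), and only the Atom namespace URI is a fixed, real value.

```python
import xml.etree.ElementTree as ET

# Real Atom namespace; the feed itself is hypothetical sample data.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

FEED_XML = """
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Inventory updates</title>
  <entry>
    <title>Part A54 restocked</title>
    <updated>2008-07-01T09:00:00Z</updated>
  </entry>
  <entry>
    <title>Part 985 low stock</title>
    <updated>2008-07-01T10:30:00Z</updated>
  </entry>
</feed>
""".strip()

feed = ET.fromstring(FEED_XML)

# Collect (title, updated) for every entry in document order.
entries = [
    (
        entry.findtext("atom:title", namespaces=ATOM_NS),
        entry.findtext("atom:updated", namespaces=ATOM_NS),
    )
    for entry in feed.findall("atom:entry", ATOM_NS)
]

for title, updated in entries:
    print(updated, title)
```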
Mapping XML for relational applications
“Shredding” may be OK if:
▪ “Simple” data / schema not complicated
▪ Stable schema, no evolution
▪ XML is merely a transport, i.e. XML structure not relevant
▪ Existing SQL apps have only relational APIs
– E.g. BI apps, reporting tools
→ DB2 Annotated Schema Shredding

<product id="129">
  <name>Acme</name>
  <price>12.99</price>
</product>

DB2 9
ID   Name  Price
129  Acme  12.99
…    …     …
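The product mapping above can be sketched end to end in Python. This is an assumption-laden illustration: SQLite (from the standard library) stands in for the relational database, and the hand-written `shred_product` function stands in for the mapping; neither represents DB2's annotated schema shredding itself.

```python
import sqlite3
import xml.etree.ElementTree as ET

PRODUCT_XML = """
<product id="129">
  <name>Acme</name>
  <price>12.99</price>
</product>
""".strip()


def shred_product(xml_text):
    """Hypothetical fixed mapping: one product document -> one (id, name, price) row."""
    product = ET.fromstring(xml_text)
    return (
        int(product.get("id")),
        product.findtext("name"),
        float(product.findtext("price")),
    )


# SQLite is only a stand-in for the relational target table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.execute("INSERT INTO product VALUES (?, ?, ?)", shred_product(PRODUCT_XML))

# An existing SQL-only application (e.g. a reporting tool) sees plain rows.
rows = conn.execute("SELECT id, name, price FROM product WHERE price < 20").fetchall()
print(rows)
conn.close()
```

This works well precisely because the document is flat and its schema is stable, which are the conditions the slide lists.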
XML as a better data model
▪ XML provides a better data model for many new apps
– Flexibility, schema versatility, hierarchical nature
▪ Semi-structured or unstructured data
– E.g. healthcare records, biological data, contracts, insurance claims, etc.
▪ Inherently hierarchical, nested or complex data
– E.g. manuals, books, catalogs, bills of materials, land records, etc.
▪ Data with changing or evolving schemas
– E.g. forms, changing industry standard documents, new product versions, etc.
▪ Data with null, multiple or unknown values
– E.g. phone numbers (home, office, mobile) in patient records, etc.
→ A pureXML database is a natural choice for XML data
pureXML for Rapid Application Prototyping and Development
▪ Represent multiple elements as a single object
▪ e.g.: Purchase Order
▪ Relational:
– Many tables: Customer, Product, Shipping, …
– Normalization
– Foreign key relationships
– Insert involves many columns
– Complex queries with joins
– Conform to column definition
▪ XML:
– Single Purchase Order column
– Easily access individual elements
→ Write less code with pureXML
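To illustrate what "easily access individual elements" means for a purchase order kept as one object, here is a minimal Python sketch using path expressions from the standard library. The document is a trimmed, hypothetical purchase order; in DB2 the equivalent access would be expressed in XQuery or SQL/XML against an XML column, which this sketch does not attempt to reproduce.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase order held as a single XML value.
PURCHASE_ORDER_XML = """
<purchaseOrder id="12345">
  <customer id="A6789">
    <name>John Smith Co</name>
  </customer>
  <itemList>
    <item><partNo>A54</partNo><quantity>12</quantity></item>
    <item><partNo>985</partNo><quantity>1</quantity></item>
  </itemList>
</purchaseOrder>
""".strip()

order = ET.fromstring(PURCHASE_ORDER_XML)

# One path expression per question; no joins across Customer/Item tables.
customer_name = order.findtext("customer/name")
quantities = {
    item.findtext("partNo"): int(item.findtext("quantity"))
    for item in order.iterfind("itemList/item")
}
total_units = sum(quantities.values())

print(customer_name, quantities, total_units)
```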
A DB2 customer experience…
Profile
▪ One of Norway’s largest providers of insurance and financial services. Early adopter of SOA, Web Services and XML
Challenge
▪ Improve cost effectiveness, speed time to market, increase product customization.
Benefits
Task                                                  Before                     With DB2
Development of search & retrieval business processes  CLOB: 8 hrs; Shred: 2 hrs  30 min.
Add field to schema                                   1 week                     5 min.
Relative lines of I/O code (65% reduction)            100                        35
Queries                                               24 - 36 hrs                20 sec - 10 min
Search preparation                                    Shred: 1 week              ½ day

“Development time using the XML native store is overall radically improved over shredding. Also, shredding often results in complex mappings, which mean that the developer needs deep competence in constructing SQL.”
Thore Thomassen, Senior Enterprise Architect
See “Managing XML for Maximum Return,” IBM White Paper, www.ibm.com/db2/xml
Proof-of-concept at North American securities firm
▪ Goal: investigate XML databases to simplify and enable SOA
▪ POC Objectives
– Evaluate the loading and querying speed of DB2 v9
– Use realistic data and queries to ensure valid results
– Gain experience with XQuery and SQL/XML
▪ Initial Expectations:
– Having evaluated some of the other products in the market we were not very optimistic as to what kind of performance figures to expect
Source: charts presented by securities firm at IOD Conference, Oct. 2006
Results
▪ Majority of POC objectives met with little or no additional tuning due to DB2 9’s new autonomic self-tuning and self-managing capabilities
▪ All tests required less than 16GB of memory
▪ Requirement to load approximately 500,000 XML documents per day achieved in less than 1 hour on DB2 9
▪ All transactional queries completed sub-second (per record)
– Retrieve XML doc for any specific trade (by trade number)
– Retrieve all trades for a counterparty (by counterparty_ptynbr)
– Retrieve all trades by trade create time
– Retrieve all trades by maturity date range
– Retrieve trades for a given acquire day range, and trade number range
Source: charts presented by securities firm at IOD Conference, Oct. 2006
DB2 9 Architecture
XML Integrated in All Facets of DB2!
▪ XML Developer: “I see a sophisticated XML repository that also supports SQL.”
▪ SQL Developer: “I see a sophisticated RDBMS that also supports XML.”
[Diagram labels: familiar programming models; familiar tooling; new storage model; new indexing, optimization; new query support; reliability, scalability, high performance]
New XML applications benefit from
• Ability to seamlessly leverage relational investment
• Proven infrastructure that provides enterprise-class capabilities
DB2 9 pureXML® Storage vs. the Competition
[Comparison chart: storage approaches (pureXML hybrid, CLOB, XML DB, shred; also labeled "industry bundles") rated on information fidelity, integration, schema flexibility, performance/scale, programming models and manageability; ratings not recoverable]