PwC Technology Forecast: A quarterly journal, 2012, Issue 1

Reshaping the workforce with the new analytics

[Cover photo: Mike Driscoll, CEO, Metamarkets]

Contents
06 The third wave of customer analytics
30 The art and science of new analytics technology
44 Natural language processing and social media intelligence
58 Building the foundation for a data science culture
Acknowledgments
Advisory
Principal & Technology Leader
Tom DeGarmo
US Thought Leadership
Partner-in-Charge
Tom Craren
Strategic Marketing
Natalie Kontra
Jordana Marx
Center for Technology
& Innovation
Managing Editor
Bo Parker
Editors
Vinod Baya
Alan Morrison
Contributors
Galen Gruman
Steve Hamby and Orbis Technologies
Bud Mathaisel
Uche Ogbuji
Bill Roberts
Brian Suda
Editorial Advisors
Larry Marion
Copy Editor
Lea Anne Bantsari
Transcriber
Dawn Regan
US studio
Design Lead
Tatiana Pechenik
Designer
Peggy Fresenburg
Illustrators
Don Bernhardt
James Millefolie
Production
Jeff Ginsburg
Online
Managing Director Online Marketing
Jack Teuber
Designer and Producer
Scott Schmidt
Animator
Roger Sano
Reviewers
Jeff Auker
Ken Campbell
Murali Chilakapati
Oliver Halter
Matt Moore
Rick Whitney
Special thanks
Cate Corcoran
WIT Strategy
Nisha Pathak
Metamarkets
Lisa Sheeran
Sheeran/Jager Communication
Industry perspectives
During the preparation of this
publication, we benefited greatly
from interviews and conversations
with the following executives:
Kurt J. Bilafer
Regional Vice President, Analytics,
Asia Pacific Japan
SAP
Jonathan Chihorek
Vice President, Global Supply Chain
Systems
Ingram Micro
Zach Devereaux
Chief Analyst
Nexalogy Environics
Mike Driscoll
Chief Executive Officer
Metamarkets
Elissa Fink
Chief Marketing Officer
Tableau Software
Kaiser Fung
Adjunct Professor
New York University
Jonathan Newman
Senior Director, Enterprise Web & EMEA
eSolutions
Ingram Micro
Ashwin Rangan
Chief Information Officer
Edwards Lifesciences
Seth Redmore
Vice President, Marketing and Product
Management
Lexalytics
Vince Schiavone
Co-founder and Executive Chairman
ListenLogic
Jon Slade
Global Online and Strategic Advertising
Sales Director
Financial Times
Claude Théoret
President
Nexalogy Environics
Saul Zambrano
Senior Director,
Customer Energy Solutions
Pacific Gas & Electric
Kent Kushar
Chief Information Officer
E. & J. Gallo Winery
Josée Latendresse
Owner
Latendresse Groupe Conseil
Mario Leone
Chief Information Officer
Ingram Micro
Jock Mackinlay
Director, Visual Analysis
Tableau Software
The right data +
the right resolution =
a new culture
of inquiry
Tom DeGarmo
US Technology Consulting Leader
[email protected]
Message from the editor
James Balog (http://www.jamesbalog.com/) may have more influence
on the global warming debate than
any scientist or politician. By using
time-lapse photographic essays of
shrinking glaciers, he brings art and
science together to produce striking
visualizations of real changes to
the planet. In 60 seconds, Balog
shows changes to glaciers that take
place over a period of many years—
introducing forehead-slapping
insight to a topic that can be as
difficult to see as carbon dioxide.
Part of his success can be credited to
creating the right perspective. If the
photographs had been taken too close
to or too far away from the glaciers,
the insight would have been lost. Data
at the right resolution is the key.
Glaciers are immense, at times more
than a mile deep. Amyloid particles
that are the likely cause of Alzheimer’s
disease sit at the other end of the size
spectrum. Scientists’ understanding
of the role of amyloid particles in
Alzheimer’s has relied heavily on
technologies such as scanning tunneling
microscopes.2 These devices generate
visual data at sufficient resolution
so that scientists can fully explore
the physical geometry of amyloid
particles in relation to the brain’s
neurons. Once again, data at the right
resolution together with the ability to
visually understand a phenomenon
are moving science forward.
Science has long focused on data-driven
understanding of phenomena. It’s
called the scientific method. Enterprises
also use data for the purposes of
understanding their business outcomes
and, more recently, the effectiveness and
efficiency of their business processes.
But because running a business is not the
same as running a science experiment,
there has long been a divergence
between analytics as applied to science
and the methods and processes that
define analytics in the enterprise.

2 Davide Brambilla, et al., “Nanotechnologies for Alzheimer’s disease: diagnosis, therapy, and safety issues,” Nanomedicine: Nanotechnology, Biology and Medicine 7, no. 5 (2011): 521–540.
This difference partly has been a
question of scale and instrumentation.
Even a large science experiment (setting
aside the Large Hadron Collider) will
introduce sufficient control around the
inquiry of interest to limit the amount of
data collected and analyzed. Any large
enterprise comprises tens of thousands
of moving parts, from individual
employees to customers to suppliers to
products and services. Measuring and
retaining the data on all aspects of an
enterprise over all relevant periods of
time are still extremely challenging,
even with today’s IT capacities.
But targeting the most important
determinants of success in an enterprise
context for greater instrumentation—
often customer information—can be and
is being done today. And with Moore’s
Law continuing to pay dividends, this
instrumentation will expand in the
future. In the process, and with careful
attention to the appropriate resolution
of the data being collected, enterprises
that have relied entirely on the art of
management will increasingly blend in
the science of advanced analytics. Not
surprisingly, the new role emerging in
the enterprise to support these efforts
is often called a “data scientist.”
This issue of the Technology Forecast
examines advanced analytics through
this lens of increasing instrumentation.
PwC’s view is that the flow of data
at this new, more complete level of
resolution travels in an arc beginning
with big data techniques (including
NoSQL and in-memory databases),
through advanced statistical packages
(from the traditional SPSS and SAS
to open source offerings such as R),
to analytic visualization tools that put
interactive graphics in the control of
business unit specialists. This arc is
positioning the enterprise to establish
a new culture of inquiry, where
decisions are driven by analytical
precision that rivals scientific insight.
The first article, “The third wave of
customer analytics,” on page 06 reviews
the impact of basic computing trends
on emerging analytics technologies.
Enterprises have an unprecedented
opportunity to reshape how business
gets done, especially when it comes
to customers. The second article,
“The art and science of new analytics
technology,” on page 30 explores the
mix of different techniques involved
in making the insights gained from
analytics more useful, relevant, and
visible. Some of these techniques are
clearly in the data science realm, while
others are more art than science. The
third article, “Natural language processing
and social media intelligence,” on
page 44 reviews many different
language analytics techniques in use
for social media and considers how
combinations of these can be most
effective. “How CIOs can build the
foundation for a data science culture,”
on page 58, considers new analytics as
an unusually promising opportunity
for CIOs. In the best case scenario,
the IT organization can become the
go-to group, and the CIO can become
the true information leader again.
This issue also includes interviews
with executives who are using new
analytics technologies and with subject
matter experts who have been at the
forefront of development in this area:
• Mike Driscoll of Metamarkets
considers how NoSQL and other
analytics methods are improving
query speed and providing
greater freedom to explore.
• Jon Slade of the Financial Times
(FT.com) discusses the benefits
of cloud analytics for online
ad placement and pricing.
• Jock Mackinlay of Tableau Software
describes the techniques behind
interactive visualization and
how more of the workforce can
become engaged in analytics.
• Ashwin Rangan of Edwards
Lifesciences highlights new
ways that medical devices can
be instrumented and how new
business models can evolve.
Please visit pwc.com/techforecast
to find these articles and other issues
of the Technology Forecast online.
If you would like to receive future
issues of this quarterly publication as
a PDF attachment, you can sign up at
pwc.com/techforecast/subscribe.
As always, we welcome your feedback
and your ideas for future research
and analysis topics to cover.
[Photo: Bahrain World Trade Center gets approximately 15% of its power from these wind turbines]
The third wave of
customer analytics
These days, there’s only one way to scale the
analysis of customer-related information to
increase sales and profits—by tapping the data
and human resources of the extended enterprise.
By Alan Morrison and Bo Parker
As director of global online and
strategic advertising sales for FT.com,
the online face of the Financial Times,
Jon Slade says he “looks at the 6 billion
ad impressions [that FT.com offers]
each year and works out which one
is worth the most for any particular
client who might buy.” This activity
previously required labor-intensive
extraction methods from a multitude
of databases and spreadsheets. Slade
made the process much faster and
vastly more effective after working
with Metamarkets, a company that
offers a cloud-based, in-memory
analytics service called Druid.
“Before, the sales team would send
an e-mail to ad operations for an
inventory forecast, and it could take
a minimum of eight working hours
and as long as two business days to
get an answer,” Slade says. Now, with
a direct interface to the data, it takes
a mere eight seconds, freeing up the
ad operations team to focus on more
strategic issues. The parallel processing,
in-memory technology, the interface,
and many other enhancements led to
better business results, including
double-digit growth in ad yields and a
15 to 20 percent improvement in the
accuracy of its ad impression supply metrics.
The technology trends behind
FT.com’s improvements in advertising
operations—more accessible data;
faster, less-expensive computing; new
software tools; and improved user
interfaces—are driving a new era in
analytics use at large companies around
the world, in which enterprises make
decisions with a precision comparable
to scientific insight. The new analytics
uses a rigorous scientific method,
including hypothesis formation and
testing, with science-oriented statistical
packages and visualization tools. It is
spawning business unit “data scientists”
who are replacing the centralized
analytics units of the past. These trends
will accelerate, and business leaders
who embrace the new analytics will be
able to create cultures of inquiry that
lead to better decisions throughout
their enterprises. (See Figure 1.)

Figure 1: How better customer analytics capabilities are affecting enterprises

• More computing speed, storage, and ability to scale: Processing power and memory keep increasing, the ability to leverage massive parallelization continues to expand in the cloud, and the cost per processed bit keeps falling.
• More time and better tools: Data scientists are seeking larger data sets and iterating more to refine their questions and find better answers. Visualization capabilities and more intuitive user interfaces are making it possible for most people in the workforce to do at least basic exploration.
• More data sources: Social media data is the most prominent example of the many large data clouds emerging that can help enterprises understand their customers better. These clouds augment data that business units have direct access to internally now, which is also growing.
• More focus on key metrics: A core single metric can be a way to rally the entire organization’s workforce, especially when that core metric is informed by other metrics generated with the help of effective modeling.
• Better access to results: Whether an enterprise is a gaming or an e-commerce company that can instrument its own digital environment, or a smart grid utility that generates, slices, dices, and shares energy consumption analytics for its customers and partners, better analytics are going direct to the customer as well as other stakeholders. And they’re being embedded where users can more easily find them.

Together, these trends lead to a broader culture of inquiry. Visualization and user interface improvements have made it possible to spread ad hoc analytics capabilities across the workplace to every user role. At the same time, data scientists—people who combine a creative ability to generate useful hypotheses with the savvy to simulate and model a business as it’s changing—have never been in more demand than now. The result is less guesswork, less bias, more awareness, and better decisions. The benefits of a broader culture of inquiry include new opportunities, a workforce that shares a better understanding of customer needs and can capitalize on the opportunities, and reduced risk. Enterprises that understand the trends described here and capitalize on them will be able to change company culture and improve how they attract and retain customers.
This issue of the Technology Forecast
explores the impact of the new
analytics and this culture of inquiry.
This first article examines the essential
ingredients of the new analytics, using
several examples. The other articles
in this issue focus on the technologies
behind these capabilities (see the
article, “The art and science of new
analytics technology,” on page 30)
and identify the main elements of a
CIO strategic framework for effectively
taking advantage of the full range of
analytics capabilities (see the article,
“How CIOs can build the foundation for
a data science culture,” on page 58).
More computing speed,
storage, and ability to scale
Basic computing trends are providing
the momentum for a third wave
in analytics that PwC calls the new
analytics. Processing power and
memory keep increasing, the ability
to leverage massive parallelization
continues to expand in the cloud, and
the cost per processed bit keeps falling.
FT.com benefited from all of these
trends. Slade needs multiple computer
screens on his desk just to keep up. His
job requires a deep understanding of
the readership and which advertising
suits them best. Ad impressions—
appearances of ads on web pages—
are the currency of high-volume media
industry websites. The impressions
need to be priced based on the reader
segments most likely to see them and
click through. Chief executives in
France, for example, would be a reader
segment FT.com would value highly.
“The trail of data that users create
when they look at content on a website
like ours is huge,” Slade says. “The
real challenge has been trying to
understand what information is useful
to us and what we do about it.”
FT.com’s analytics capabilities were
a challenge, too. “The way that data
was held—the demographics data, the
behavior data, the pricing, the available
inventory—was across lots of different
databases and spreadsheets,” Slade
says. “We needed an almost witchcraft-like algorithm to provide answers to
‘How many impressions do I have?’ and
‘How much should I charge?’ It was an
extremely labor-intensive process.”
FT.com saw a possible solution when
it first talked to Metamarkets about
an initial concept, which evolved as
they collaborated. Using Metamarkets’
analytics platform, FT.com could
quickly iterate and investigate
numerous questions to improve its
decision-making capabilities. “Because
our technology is optimized for the
cloud, we can harness the processing
power of tens, hundreds, or thousands
of servers depending on our customers’
data and their specific needs,” states
Mike Driscoll, CEO of Metamarkets.
“We can ask questions over billions
of rows of data in milliseconds. That
kind of speed combined with data
science and visualization helps business
users understand and consume
information on top of big data sets.”
Decades ago, in the first wave of
analytics, small groups of specialists
managed computer systems, and even
smaller groups of specialists looked for
answers in the data. Businesspeople
typically needed to ask the specialists
to query and analyze the data. As
enterprise data grew, collected from
enterprise resource planning (ERP)
systems and other sources, IT stored the
more structured data in warehouses so
analysts could assess it in an integrated
form. When business units began to
ask for reports from collections of data
relevant to them, data marts were born,
but IT still controlled all the sources.
The second wave of analytics saw
variations of centralized top-down data
collection, reporting, and analysis. In
the 1980s, grassroots decentralization
began to counter that trend as the PC
era ushered in spreadsheets and other
methods that quickly gained widespread
use—and often a reputation for misuse.
Data warehouses and marts continue
to store a wealth of helpful data.
In both waves, the challenge for
centralized analytics was to respond to
business needs when the business units
themselves weren’t sure what findings
they wanted or clues they were seeking.
The third wave does that by giving
access and tools to those who act
on the findings. New analytics taps
the expertise of the broad business
ecosystem to address the lack of
responsiveness from central analytics
units. (See Figure 2.) Speed, storage,
and scale improvements, with the
help of cloud co-creation, have
made this decentralized analytics
possible. The decentralized analytics
innovation has evolved faster than
the centralized variety, and PwC
expects this trend to continue.

Figure 2: The three waves of analytics and the impact of decentralization. Cloud computing accelerates decentralization of the analytics function. Analytics functions in enterprises were all centralized in the beginning, but not always responsive to business needs; PCs and then the web and an increasingly interconnected business ecosystem have provided more responsive alternatives; and the trend toward decentralization continues as business units, customers, and other stakeholders collaborate to diagnose and work on problems of mutual interest in the cloud. (Stages: central IT generated, self-service, data in the cloud, cloud co-creation.)

“In the middle of looking at some data,
you can change your mind about what
question you’re asking. You need to be
able to head toward that new question
on the fly,” says Jock Mackinlay,
director of visual analysis at Tableau
Software, one of the vendors of the new
visualization front ends for analytics.
“No automated system is going to keep
up with the stream of human thought.”
More time and better tools
Big data techniques—including NoSQL1
and in-memory databases, advanced
statistical packages (from SPSS and
SAS to open source offerings such as R),
visualization tools that put interactive
graphics in the control of business
unit specialists, and more intuitive
user interfaces—are crucial to the new
analytics. They make it possible for
many people in the workforce to do
some basic exploration. They allow
business unit data scientists to use larger
data sets and to iterate more as they test
hypotheses, refine questions, and find
better answers to business problems.
1 See “Making sense of Big Data,” Technology Forecast 2010, Issue 3, http://www.pwc.com/us/en/technologyforecast/2010/issue3/index.jhtml, for more information on Hadoop and other NoSQL databases.
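To make that iteration concrete, here is a minimal sketch of the kind of exploratory hypothesis testing these tools enable, written in Python with pandas and SciPy. The file name, columns, and the churn hypothesis itself are hypothetical placeholders rather than anything drawn from the examples in this article.

```python
# A minimal sketch of iterative hypothesis testing on a customer data set.
# The file name and column names are hypothetical placeholders.
import pandas as pd
from scipy import stats

events = pd.read_csv("customer_events.csv")   # one row per customer interaction
weekly = (events
          .assign(week=pd.to_datetime(events["timestamp"]).dt.to_period("W"))
          .groupby(["customer_id", "week"])
          .size()
          .rename("visits")
          .reset_index())

# Hypothesis: churned customers visited less often, on average, than retained ones.
churned_ids = events.loc[events["churned"] == 1, "customer_id"]
churned = weekly[weekly["customer_id"].isin(churned_ids)]
retained = weekly[~weekly["customer_id"].isin(churned_ids)]
t_stat, p_value = stats.ttest_ind(churned["visits"], retained["visits"], equal_var=False)
print(f"visit-frequency difference: t={t_stat:.2f}, p={p_value:.4f}")

# If the result is inconclusive, refine the question and iterate,
# for example by segmenting customers by acquisition channel and repeating the test.
```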
Case study
How the E. & J. Gallo Winery
matches outbound shipments
to retail customers
E. & J. Gallo Winery, one of the world’s
largest producers and distributors
of wines, recognizes the need to
precisely identify its customers for
two reasons: some local and state
regulations mandate restrictions on
alcohol distribution, and marketing
brands to individuals requires
knowing customer preferences.
“The majority of all wine is consumed
within four hours and five miles
of being purchased, so this makes
it critical that we know which
products need to be marketed and
distributed by specific destination,”
says Kent Kushar, Gallo’s CIO.
Gallo knows exactly how its products
move through distributors, but
tracking beyond them is less clear.
Some distributors are state liquor
control boards, which supply the
wine products to retail outlets and
other end customers. Some sales are
through military post exchanges, and
in some cases there are restrictions and
regulations because they are offshore.
Gallo has a large compliance
department to help it manage the
regulatory environment in which Gallo
products are sold, but Gallo wants
to learn more about the customers
who eventually buy and consume
those products, and to learn from
them information to help create
new products tailored to local tastes.
Gallo sometimes cannot obtain point-of-sale
data from retailers to complete
the match of what goes out to what is
sold. Syndicated data, from sources
such as Information Resources, Inc.
(IRI), serves as the matching link
between distribution and actual
consumption. This results in the
accumulation of more than 1GB of
data each day as source information
for compliance and marketing.
Years ago, Gallo’s senior management
understood that customer analytics
would be increasingly important. The
company’s most recent investments are
extensions of what it wanted to do 25
years ago but was limited by availability
of data and tools. Since 1998, Gallo
IT has been working on advanced
data warehouses, analytics tools, and
visualization. Gallo was an early adopter
of visualization tools and created IT
subgroups within brand marketing to
leverage the information gathered.
The success of these early efforts has
spurred Gallo to invest even more
in analytics. “We went from step
function growth to logarithmic growth
of analytics; we recently reinvested
heavily in new appliances, a new
system architecture, new ETL [extract,
transform, and load] tools, and new
ways our SQL calls were written; and
we began to coalesce unstructured
data with our traditional structured
consumer data,” says Kushar.
“Recognizing the power of these
capabilities has resulted in our taking a
10-year horizon approach to analytics,”
he adds. “Our successes with analytics
to date have changed the way we
think about and use analytics.”
The result is that Gallo no longer relies
on a single instance database, but has
created several large purpose-specific
databases. “We have also created
new service level agreements for our
internal customers that give them
faster access and more timely analytics
and reporting,” Kushar says. Internal
customers for Gallo IT include supply
chain, sales, finance, distribution,
and the web presence design team.
Data scientists are nonspecialists
who follow a scientific method of
iterative and recursive analysis with a
practical result in mind. Even without
formal training, some business users
in finance, marketing, operations,
human capital, or other departments
already have the skills, experience,
and mind-set to be data scientists.
Others can be trained. The teaching of
the discipline is an obvious new focus
for the CIO. (See the article, “How
CIOs can build the foundation for a
data science culture,” on page 58.)
Visualization tools have been especially
useful for Ingram Micro, a technology
products distributor, which uses them
to choose optimal warehouse locations
around the globe. Warehouse location is
a strategic decision, and Ingram Micro
can run many what-if scenarios before it
decides. One business result is shorter-term warehouse leases that give Ingram
Micro more flexibility as supply chain
requirements shift due to cost and time.
“Ensuring we are at the efficient frontier
for our distribution is essential in this
fast-paced and tight-margin business,”
says Jonathan Chihorek, vice president
of global supply chain systems at Ingram
Micro. “Because of the complexity,
size, and cost consequences of these
warehouse location decisions, we run
extensive models of where best to
locate our distribution centers at least
once a year, and often twice a year.”
Modeling has become easier thanks
to mixed-integer linear programming
optimization tools that crunch large
and diverse data sets encompassing
many factors. “A major improvement
came from the use of fast 64-bit
processors and solid-state drives that
reduced scenario run times from
six to eight hours down to a fraction
of that,” Chihorek says. “Another
breakthrough for us has been improved
visualization tools, such as spider and
bathtub diagrams that help our analysts
choose the efficient frontier curve
from a complex array of data sets that
otherwise look like lists of numbers.”
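As a rough illustration of the kind of optimization involved, the sketch below sets up a toy facility-location model with the open source PuLP library. The sites, costs, and demands are invented placeholders and bear no relation to Ingram Micro’s actual model, which also weighs lead times, labor, leases, and utilities.

```python
# A toy warehouse-location model in the spirit described above, using PuLP.
# All numbers are made-up placeholders, not Ingram Micro data.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, value

sites = ["Memphis", "Rotterdam", "Singapore"]
regions = ["NA", "EU", "APAC"]
open_cost = {"Memphis": 900, "Rotterdam": 1100, "Singapore": 1000}   # fixed cost per site
ship_cost = {("Memphis", "NA"): 2, ("Memphis", "EU"): 6, ("Memphis", "APAC"): 8,
             ("Rotterdam", "NA"): 6, ("Rotterdam", "EU"): 2, ("Rotterdam", "APAC"): 7,
             ("Singapore", "NA"): 8, ("Singapore", "EU"): 7, ("Singapore", "APAC"): 2}
demand = {"NA": 120, "EU": 90, "APAC": 140}

prob = LpProblem("warehouse_location", LpMinimize)
open_site = {s: LpVariable(f"open_{s}", cat=LpBinary) for s in sites}
flow = {(s, r): LpVariable(f"flow_{s}_{r}", lowBound=0) for s in sites for r in regions}

# Objective: fixed costs of opened sites plus shipping costs.
prob += lpSum(open_cost[s] * open_site[s] for s in sites) + \
        lpSum(ship_cost[s, r] * flow[s, r] for s in sites for r in regions)
for r in regions:                                  # meet every region's demand
    prob += lpSum(flow[s, r] for s in sites) == demand[r]
for s in sites:                                    # ship only from opened sites
    prob += lpSum(flow[s, r] for r in regions) <= sum(demand.values()) * open_site[s]

prob.solve()
print([s for s in sites if value(open_site[s]) > 0.5])
```

Running many what-if scenarios then amounts to re-solving the model with different cost and demand inputs and comparing the resulting site choices.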
Analytics tools were once the province
of experts. They weren’t intuitive,
and they took a long time to learn.
Those who were able to use them
tended to have deep backgrounds
in mathematics, statistical analysis,
or some scientific discipline. Only
companies with dedicated teams of
specialists could make use of these
tools. Over time, academia and the
business software community have
collaborated to make analytics tools
more user-friendly and more accessible
to people who aren’t steeped in the
mathematical expressions needed to
query and get good answers from data.
Products from QlikTech, Tableau
Software, and others immerse users in
fully graphical environments because
most people gain understanding more
quickly from visual displays of numbers
rather than from tables. “We allow
users to get quickly to a graphical view
of the data,” says Tableau Software’s
Mackinlay. “To begin with, they’re
using drag and drop for the fields
in the various blended data sources
they’re working with. The software
interprets the drag and drop as algebraic
expressions, and that gets compiled
into a query database. But users don’t
need to know all that. They just need
to know that they suddenly get to
see their data in a visual form.”
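A highly simplified sketch of the underlying idea follows; it is not Tableau’s actual implementation. The point is only that a drag-and-drop “shelf” specification can be translated mechanically into a grouped database query. The table and field names are hypothetical.

```python
# Translate a drag-and-drop "shelf" specification into a SQL query.
# Illustrative only; table and field names are hypothetical.
def shelf_to_sql(table, rows, columns, measures):
    """rows/columns: dimension fields on shelves; measures: (field, aggregate) pairs."""
    dims = list(rows) + list(columns)
    select = dims + [f"{agg}({field}) AS {agg}_{field}" for field, agg in measures]
    sql = f"SELECT {', '.join(select)} FROM {table}"
    if dims:
        sql += f" GROUP BY {', '.join(dims)}"
    return sql

# Dragging "region" and "segment" onto the shelves and SUM(ad_impressions)
# into the view might compile to a query like this:
print(shelf_to_sql("ad_inventory",
                   rows=["region"], columns=["segment"],
                   measures=[("ad_impressions", "SUM")]))
# SELECT region, segment, SUM(ad_impressions) AS SUM_ad_impressions
#   FROM ad_inventory GROUP BY region, segment
```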
Tableau Software itself is a prime
example of how these tools are
changing the enterprise. “Inside
Tableau we use Tableau everywhere,
from the receptionist who’s keeping
track of conference room utilization
to the salespeople who are monitoring
their pipelines,” Mackinlay says.
These tools are also enabling
more finance, marketing, and
operational executives to become
data scientists, because they help
them navigate the data thickets.
Figure 3: Improving the signal-to-noise ratio in social media monitoring. Social media is a high-noise environment, but there are ways to reduce the noise and focus on significant conversations and illuminating, helpful dialogue. An initial set of relevant terms is used to cut back on the noise dramatically, a first step toward uncovering useful conversations. With proper guidance, machines can do millions of correlations, clustering words by context and meaning. Visualization tools then present “lexical maps” to help the enterprise unearth instances of useful customer dialog. (The example word clouds include terms such as work boots, leather, safety, rugged, construction, style, and price.)
Source: Nexalogy Environics and PwC, 2012
More data sources
The huge quantities of data in the
cloud and the availability of enormous
low-cost processing power can help
enterprises analyze various business
problems—including efforts to
understand customers better, especially
through social media. These external
clouds augment data that business units
already have direct access to internally.
Ingram Micro uses large, diverse data
sets for warehouse location modeling,
Chihorek says. Among them: size,
weight, and other physical attributes
of products; geographic patterns of
consumers and anticipated demand
for product categories; inbound and
outbound transportation hubs, lead
times, and costs; warehouse lease and
operating costs, including utilities;
and labor costs—to name a few.
Social media can also augment
internal data for enterprises willing to
learn how to use it. Some companies
ignore social media because so much
of the conversation seems trivial,
but they miss opportunities.
Consider a North American apparel
maker that was repositioning a brand
of shoes and boots. The manufacturer
was mining conventional business data
for insights about brand status, but
it had not conducted any significant
analysis of social media conversations
about its products, according to Josée
Latendresse, who runs Latendresse
Groupe Conseil, which was advising
the company on its repositioning
effort. “We were neglecting the
wealth of information that we could
find via social media,” she says.
To expand the analysis, Latendresse
brought in technology and expertise
from Nexalogy Environics, a company
that analyzes the interest graph implied
in online conversations—that is, the
connections between people, places, and
things. (See “Transforming collaboration
with social tools,” Technology Forecast
2011, Issue 3, for more on interest
graphs.) Nexalogy Environics studied
millions of correlations in the interest
graph and selected fewer than 1,000
relevant conversations from 90,000 that
mentioned the products. In the process,
Nexalogy Environics substantially
increased the “signal” and reduced
the “noise” in the social media about
the manufacturer. (See Figure 3.)
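The general pattern, independent of any vendor’s proprietary method, can be sketched in a few lines: filter posts against a seed vocabulary, then count which terms co-occur so related conversations can be grouped. The seed terms and posts below are invented for illustration.

```python
# A minimal sketch of the two-step idea: filter posts by seed terms,
# then group surviving conversations by shared vocabulary.
# This is not Nexalogy's method; the terms and posts are placeholders.
from collections import defaultdict
from itertools import combinations

seed_terms = {"boots", "leather", "safety", "rugged", "construction"}

def relevant(post):
    words = set(post.lower().split())
    return len(words & seed_terms) >= 2          # crude relevance threshold

def cooccurrence(posts):
    counts = defaultdict(int)
    for post in posts:
        words = sorted(set(post.lower().split()) & seed_terms)
        for pair in combinations(words, 2):
            counts[pair] += 1                    # which seed terms appear together
    return counts

posts = [
    "These leather boots hold up on the construction site",
    "Cute boots love the color",
    "Rugged safety boots that still look good off-road",
]
kept = [p for p in posts if relevant(p)]
print(len(kept), "of", len(posts), "posts kept")
print(sorted(cooccurrence(kept).items(), key=lambda kv: -kv[1])[:3])
```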
Figure 4: Adding social media analysis techniques suggests other changes to the BI process

Here’s one example of how the larger business intelligence (BI) process might change with the addition of social media analysis.

One apparel maker started with its conventional BI analysis cycle. Conventional BI techniques used by an apparel company client ignored social media and required lots of data cleansing. The results often lacked insight.
1. Develop questions
2. Collect data
3. Clean data
4. Analyze data
5. Present results

Then it added social media and targeted focus groups to the mix. The company’s revised approach added several elements such as social media analysis and expanded others, but kept the focus group phase near the beginning of the cycle. The company was able to mine new insights from social media conversations about market segments that hadn’t occurred to the company to target before.
1. Develop questions
2. Refine conventional BI (collect, clean, and analyze data)
3. Conduct focus groups (retailers and end users)
4. Select conversations
5. Analyze social media
6. Present results

Then it tuned the process for maximum impact. The company’s current approach places focus groups near the end, where they can inform new questions more directly. This approach also stresses how the results get presented to executive leadership.
1. Develop questions
2. Refine conventional BI (collect, clean, and analyze data)
3. Select conversations
4. Analyze social media
5. Present results
6. Tailor results to audience (new step)
7. Conduct focus groups (retailers and end users)
What Nexalogy Environics discovered
suggested the next step for the brand
repositioning. “The company wasn’t
marketing to people who were blogging
about its stuff,” says Claude Théoret,
president of Nexalogy Environics.
The shoes and boots were designed
for specific industrial purposes, but
the blogging influencers noted their
fashion appeal and their utility when
riding off-road on all-terrain vehicles
and in other recreational settings.
“That’s a whole market segment
the company hadn’t discovered.”
Latendresse used the analysis to
help the company expand and
refine its intelligence process more
generally. “The key step,” she says,
“is to define the questions that you
want to have answered. You will
definitely be surprised, because
the system will reveal customer
attitudes you didn’t anticipate.”
Following the social media analysis
(SMA), Latendresse saw the retailer
and its user focus groups in a new
light. The analysis “had more complete
results than the focus groups did,” she
says. “You could use the focus groups
afterward to validate the information
evident in the SMA.” The revised
intelligence development process
now places focus groups closer to the
end of the cycle. (See Figure 4.)
Figure 5: The benefits of big data analytics: A carrier example

By analyzing billions of call records, carriers are able to obtain early warning of groups of subscribers likely to switch services. Here is how it works:
1. Carrier notes big peaks in churn.*
2. Dataspora brought in to analyze all call records; 14 billion call data records analyzed.
3. The initial analysis debunks some myths and raises new questions discussed with the carrier: Dropped calls/poor service? Merged to family plan? Preferred phone unavailable? Offer by competitor? Financial trouble? Dropped dead? Incarcerated? Friend dropped recently? The carrier’s prime hypothesis is disproved.
4. Further analysis confirms that friends influence other friends’ propensity to switch services. Pattern spotted: those with a relationship to a dropped customer (calls lasting longer than two minutes, more than twice in the previous month) are 500% more likely to drop.
5. Data group deploys a call record monitoring system that issues an alert that identifies at-risk subscribers.
6. Marketers begin campaigns that target at-risk subscriber groups with special offers.

* Churn: the proportion of contractual subscribers who leave during a given time period
Source: Metamarkets and PwC, 2012
Third parties such as Nexalogy
Environics are among the first to
take advantage of cloud analytics.
Enterprises like the apparel maker may
have good data collection methods
but have overlooked opportunities to
mine data in the cloud, especially social
media. As cloud capabilities evolve,
enterprises are learning to conduct more
iteration, to question more assumptions,
and to discover what else they can
learn from data they already have.
More focus on key metrics
One way to start with new analytics is
to rally the workforce around a single
core metric, especially when that core
metric is informed by other metrics
generated with the help of effective
modeling. The core metric and the
model that helps everyone understand
it can steep the culture in the language,
methods, and tools around the
process of obtaining that goal.
A telecom provider illustrates the
point. The carrier was concerned
about big peaks in churn—customers
moving to another carrier—but hadn’t
methodically mined the whole range of
its call detail records to understand the
issue. Big data analysis methods made
a large-scale, iterative analysis possible.
The carrier partnered with Dataspora, a
consulting firm run by Driscoll before he
founded Metamarkets. (See Figure 5.)2
“We analyzed 14 billion call data
records,” Driscoll recalls, “and built a
high-frequency call graph of customers
who were calling each other. We found
that if two subscribers who were friends
spoke more than once for more than
two minutes in a given month and the
first subscriber cancelled their contract
in October, then the second subscriber
became 500 percent more likely to
cancel their contract in November.”
2 For more best practices on methods to address churn,
see Curing customer churn, PwC white paper, http://
www.pwc.com/us/en/increasing-it-effectiveness/
publications/curing-customer-churn.jhtml, accessed
April 5, 2012.
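A small-scale sketch of the friends-of-churners analysis Driscoll describes is shown below, written with pandas. The file names, columns, and thresholds are illustrative assumptions; the production analysis ran over 14 billion records on a distributed cluster rather than on a single machine.

```python
# A small-scale sketch of the "friends of churners" lift analysis.
# Column names and thresholds are illustrative placeholders.
import pandas as pd

calls = pd.read_csv("call_detail_records.csv")   # caller_id, callee_id, month, duration_sec
churn = pd.read_csv("cancellations.csv")         # subscriber_id, cancel_month

# "Friends": pairs who spoke more than once for more than two minutes in a month.
long_calls = calls[calls["duration_sec"] > 120]
pair_counts = (long_calls.groupby(["caller_id", "callee_id", "month"])
               .size().rename("n_calls").reset_index())
friends = pair_counts[pair_counts["n_calls"] >= 2]

# Subscribers whose friend cancelled in the prior month.
oct_churners = set(churn.loc[churn["cancel_month"] == "2011-10", "subscriber_id"])
exposed = set(friends.loc[friends["callee_id"].isin(oct_churners), "caller_id"])

nov_churners = set(churn.loc[churn["cancel_month"] == "2011-11", "subscriber_id"])
all_subs = set(calls["caller_id"]) | set(calls["callee_id"])

def churn_rate(group):
    return len(group & nov_churners) / max(len(group), 1)

lift = churn_rate(exposed) / max(churn_rate(all_subs - exposed), 1e-9)
print(f"churn lift for friends of recent churners: {lift:.1f}x")
```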
Data mining on that scale required
distributed computing across hundreds
of servers and repeated hypothesis
testing. The carrier assumed that
dropped calls might be one reason
why clusters of subscribers were
cancelling contracts, but the Dataspora
analysis disproved that notion,
finding no correlation between
dropped calls and cancellation.
“There were a few steps we took. One
was to get access to all the data and next
do some engineering to build a social
graph and other features that might
be meaningful, but we also disproved
some other hypotheses,” Driscoll says.
Watching what people actually did
confirmed that circles of friends were
cancelling in waves, which led to the
peaks in churn. Intense focus on the key
metric illustrated to the carrier and its
workforce the power of new analytics.
Better access to results
The more pervasive the online
environment, the more common the
sharing of information becomes.
Whether an enterprise is a gaming
or an e-commerce company that
can instrument its own digital
environment, or a smart grid utility
that generates, slices, dices, and
shares energy consumption analytics
for its customers and partners, better
analytics are going direct to the
customer as well as other stakeholders.
And they’re being embedded where
users can more easily find them.
For example, energy utilities preparing
for the smart grid are starting to
invite the help of customers by
putting better data and more broadly
shared operational and customer
analytics at the center of a co-created
energy efficiency collaboration.
Saul Zambrano, senior director of
customer energy solutions at Pacific
Gas & Electric (PG&E), an early
installer of smart meters, points out
that policymakers are encouraging
more third-party access to the usage
data from the meters. “One of the big
policy pushes at the regulatory level
is to create platforms where third
parties can—assuming all privacy
guidelines are met—access this data
to build business models they can
drive into the marketplace,” says
Zambrano. “Grid management and
energy management will be supplied
by both the utilities and third parties.”
Zambrano emphasizes the importance
of customer participation to the energy
efficiency push. The issue he raises is
the extent to which blended operational
and customer data can benefit the
larger ecosystem, by involving millions
of residential and business customers.
“Through the power of information
and presentation, you can start to show
customers different ways that they can
become stewards of energy,” he says.
As a highly regulated business, the
utility industry has many obstacles to
overcome to get to the point where
smart grids begin to reach their
potential, but the vision is clear:
• Show customers a few key
metrics and seasonal trends in
an easy-to-understand form.
• Provide a means of improving those
metrics with a deeper dive into where
they’re spending the most on energy.
• Allow them an opportunity to
benchmark their spending by
providing comparison data (a simple
version of such a benchmark is
sketched below).
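As a minimal sketch of that benchmarking idea, with invented meter readings and peer group rather than anything from PG&E:

```python
# A toy benchmark of a household's monthly usage against similar homes.
# Readings and the peer-group definition are hypothetical placeholders.
from statistics import mean

def benchmark(customer_kwh, peer_kwh):
    """Compare a customer's monthly kWh with the average of comparable homes."""
    peer_avg = mean(peer_kwh)
    pct_diff = 100.0 * (customer_kwh - peer_avg) / peer_avg
    label = "above" if pct_diff > 0 else "below"
    return f"You used {abs(pct_diff):.0f}% {label} similar homes ({customer_kwh} vs {peer_avg:.0f} kWh)."

print(benchmark(612, [540, 575, 610, 498, 630, 560]))
```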
This new kind of data sharing could be a
chance to stimulate an energy efficiency
competition that’s never existed between
homeowners and between business
property owners. It is also an example of
how broadening access to new analytics
can help create a culture of inquiry
throughout the extended enterprise.
Case study
Smart shelving: How the
E. & J. Gallo Winery analytics
team helps its retail partners
Some of the data in the E. & J. Gallo
Winery information architecture is for
production and quality control, not just
customer analytics. More recently, Gallo
has adopted complex event processing
methods on the source information,
so it can look at successes and failures
early in its manufacturing execution
system, sales order management,
and the accounting system that
front ends the general ledger.
Information and information flow are
the lifeblood of Gallo, but it is clearly
a team effort to make the best use
of the information. In this team:

• Supply chain looks at the flows.
• Sales determines what information is needed to match supply and demand.
• R&D undertakes the heavy-duty customer data integration, and it designs pilots for brand consumption.
• IT provides the data and consulting on how to use the information.

Mining the information for patterns and
insights in specific situations requires
the team. A key goal is what Gallo refers
to as demand sensing—to determine
the stimulus that creates demand by
brand and by product. This is not just
a computer task, but is heavily based
on human intervention to determine
what the data reveal (for underlying
trends of specific brands by location),
or to conduct R&D in a test market,
or to listen to the web platforms.

These insights inform a specific design
for “smart shelving,” which is the
placement of products by geography
and location within the store. Gallo
offers a virtual wine shelf design
schematic to retailers, which helps
the retailer design the exact details
of how wine will be displayed—by
brand, by type, and by price. Gallo’s
wine shelf design schematic will help
the retailer optimize sales, not just for
Gallo brands but for all wine offerings.

Before Gallo’s wine shelf design
schematic, wine sales were not a major
source of retail profits for grocery stores,
but now they are the first or second
highest profit generators in those stores.
“Because of information models such as
the wine shelf design schematic, Gallo
has been the wine category captain for
some grocery stores for 11 years in a row
so far,” says Kent Kushar, CIO of Gallo.
Conclusion: A broader
culture of inquiry
This article has explored how
enterprises are embracing the big data,
tools, and science of new analytics
along a path that can lead them to a
broader culture of inquiry, in which
improved visualization and user
interfaces make it possible to spread ad
hoc analytics capabilities to every user
role. This culture of inquiry appears
likely to become the age of the data
scientists—workers who combine
a creative ability to generate useful
hypotheses with the savvy to simulate
and model a business as it’s changing.
It’s logical that utilities are
instrumenting their environments as
a step toward smart grids. The data
they’re generating can be overwhelming,
but that data will also enable the
analytics needed to reduce energy
consumption to meet efficiency and
environmental goals. It’s also logical
that enterprises are starting to hunt
for more effective ways to filter social
media conversations, as apparel makers
have found. The return on investment
for finding a new market segment can
be the difference between long-term
viability and stagnation or worse.
Tackling the new kinds of data being
generated is not the only analytics task
ahead. Like the technology distributor,
enterprises in all industries have
concerns about scaling the analytics
for data they’re accustomed to having
and now have more. Publishers can
serve readers better and optimize ad
sales revenue by tuning their engines
for timing, pricing, and pinpointing
ad campaigns. Telecom carriers can
mine all customer data more effectively
to be able to reduce the expense
of churn and improve margins.
What all of these examples suggest is a
greater need to immerse the extended
workforce—employees, partners, and
customers—in the data and analytical
methods they need. Without a view
into everyday customer behavior,
there’s no leverage for employees to
influence company direction when
markets shift and there are no insights
into improving customer satisfaction.

Table 1: Key elements of a culture of inquiry

• Executive support. How it is manifested: senior executives asking for data to support any opinion or proposed action and using interactive visualization tools themselves. Value to the organization: set the tone for the rest of the organization with examples.
• Data availability. How it is manifested: cloud architecture (whether private or public) and semantically rich data integration methods. Value to the organization: find good ideas from any source.
• Analytics tools. How it is manifested: higher-profile data scientists embedded in the business units. Value to the organization: identify hidden opportunities.
• Interactive visualization. How it is manifested: visual user interfaces and the right tool for the right person. Value to the organization: encourage a culture of inquiry.
• Training. How it is manifested: power users in individual departments. Value to the organization: spread the word and highlight the most effective and user-friendly techniques.
• Sharing. How it is manifested: internal portals or other collaborative environments to publish and discuss inquiries and results. Value to the organization: prove that the culture of inquiry is real.
Computing speed, storage, and scale
make those insights possible, and it is
up to management to take advantage
of what is becoming a co-creative
work environment in all industries—
to create a culture of inquiry.
Of course, managing culture change is
a much bigger challenge than simply
rolling out more powerful analytics
software. It is best to have several
starting points and to continue to find
ways to emphasize the value of analytics
in new scenarios. One way to raise
awareness about the power of new
analytics comes from articulating the
results in a visual form that everyone
can understand. Another is to enable
the broader workforce to work with
the data themselves and to ask them to
develop and share the results of their
own analyses. Still another approach
would be to designate, train, and
compensate the more enthusiastic users
in all units—finance, product groups,
supply chain, human resources, and
so forth—as data scientists. Table 1
presents examples of approaches to
fostering a culture of inquiry.
The arc of all the trends explored
in this article is leading enterprises
toward establishing these cultures
of inquiry, in which decisions can be
informed by an analytical precision
comparable to scientific insight. New
market opportunities, an energized
workforce with a stake in helping to
achieve a better understanding of
customer needs, and reduced risk are
just some of the benefits of a culture of
inquiry. Enterprises that understand
the trends described here and capitalize
on them will be able to improve how
they attract and retain customers.
The nature of cloud-based data science
Mike Driscoll of Metamarkets talks about
the analytics challenges and opportunities
that businesses moving to the cloud face.
Interview conducted by Alan Morrison and Bo Parker
Mike Driscoll
Mike Driscoll is CEO of Metamarkets,
a cloud-based analytics company he
co-founded in San Francisco in 2010.
PwC: What’s your background,
and how did you end up running
a data science startup?
MD: I came to Silicon Valley after
studying computer science and biology
for five years, and trying to reverse
engineer the genome network for
uranium-breathing bacteria. That
was my thesis work in grad school.
There was lots of modeling and causal
inference. If you were to knock this gene
out, could you increase the uptake of the
reduction of uranium from a soluble to
an insoluble state? I was trying all these
simulations and testing with the bugs
to see whether you could achieve that.
PwC: You wanted to clean up
radiation leaks at nuclear plants?
MD: Yes. The Department of
Energy funded the research work
I did. Then I came out here and I
gave up on the idea of building a
biotech company, because I didn’t
think there was enough commercial
viability there from what I’d seen.
I did think I could take this toolkit I’d
developed and apply it to all these other
businesses that have data. That was the
genesis of the consultancy Dataspora.
As we started working with companies
at Dataspora, we found this huge gap
between what was possible and what
companies were actually doing.
Right now the real shift is that
companies are moving from this very
high-latency, coarse-grained era of reporting
into one where they start to have lower
latency, finer granularity, and better
visibility into their operations. They
realize the problem with being walking
amnesiacs, knowing what happened
to their customers in the last 30 days
and then forgetting every 30 days.
Most businesses are just now
figuring out that they have this
wealth of information about their
customers and how their customers
interact with their products.
PwC: On its own, the new
availability of data creates
demand for analytics.
MD: Yes. The absolute number-one
thing driving the current focus in
analytics is the increase in data. What’s
different now from what happened 30
years ago is that analytics is the province
of people who have data to crunch.
What’s causing the data growth? I’ve
called it the attack of the exponentials—
the exponential decline in the cost of
compute, storage, and bandwidth,
and the exponential increase in the
number of nodes on the Internet.
Suddenly the economics of computing
over data has shifted so that almost all
the data that businesses generate is
worth keeping around for its analysis.
PwC: And yet, companies are
still throwing data away.
MD: So many businesses keep only
60 days’ worth of data. The storage
cost is so minimal! Why would you
throw it away? This is the shift at the
big data layer; when these companies
store data, they store it in a very
expensive relational database. There
needs to be different temperatures
of data, and companies need to
put different values on the data—
whether it’s hot or cold, whether it’s
active. Most companies have only one
temperature: they either keep it hot in
a database, or they don’t keep it at all.
PwC: So they could just
keep it in the cloud?
MD: Absolutely. We’re starting to
see the emergence of cloud-based
databases where you say, “I don’t
need to maintain my own database
on the premises. I can just rent some
boxes in the cloud and they can
persist our customer data that way.”
Metamarkets is trying to deliver
DaaS—data science as a service. If a
company doesn’t have analytics as a
core competency, it can use a service
like ours instead. There’s no reason for
companies to be doing a lot of tasks
that they are doing in-house. You need
to pick and choose your battles.
We will see a lot of IT functions
being delivered as cloud-based
services. And now inside of those
cloud-based services, you often
will find an open source stack.
Here at Metamarkets, we’ve drawn
heavily on open source. We have
Hadoop on the bottom of our stack,
and then at the next layer we have our
own in-memory distributed database.
We’re running on Amazon Web Services
and have hundreds of nodes there.
[Venn diagram: three circles labeled data science, critical business questions, and good data. Some companies don’t have all the capabilities they need to create data science value; companies need all three capabilities to excel in creating it.]
PwC: How are companies that do
have data science groups meeting
the challenge? Take the example
of an orphan drug that is proven
to be safe but isn’t particularly
effective for the application it
was designed for. Data scientists
won’t know enough about a broad
range of potential biological
systems for which that drug might
be applicable, but the people
who do have that knowledge
don’t know the first thing about
data science. How do you bring
those two groups together?
MD: My data science Venn diagram
helps illustrate how you bring those
groups together. The diagram has three
circles. [See above.] The first circle is
data science. Data scientists are good
at this. They can take data strings,
perform processing, and transform
them into data structures. They have
great modeling skills, so they can use
something like R or SAS and start to
build a hypothesis that, for example,
if a metric is three standard deviations
above or below the specific threshold
then someone may be more likely to
cancel their membership. And data
scientists are great at visualization.
But companies that have the tools and
expertise may not be focused on a
critical business question. A company
is trying to build what it calls the
technology genome. If you give them
a list of parts in the iPhone, they can
look and see how all those different
parts are related to other parts in
camcorders and laptops. They built
this amazingly intricate graph of the
actual makeup. They’ve collected large
amounts of data. They have PhDs from
Caltech; they have Rhodes scholars;
they have really brilliant people.
But they don’t have any real critical
business questions, like “How is this
going to make me more money?”
The second circle in the diagram is
critical business questions. Some
companies have only the critical business
questions, and many enterprises fall
in this category. For instance, the CEO
says, “We just released a new product
and no one is buying it. Why?”
The third circle is good data. A beverage
company or a retailer has lots of POS
[point of sale] data, but it may not have
the tools or expertise to dig in and figure
out fast enough where a drink was
selling and what demographics it was
selling to, so that the company can react.
On the other hand, sometimes some
web companies or small companies
have critical business questions and
they have the tools and expertise.
But because they have no customers,
they don’t have any data.
PwC: Without the data, they
need to do a simulation.
MD: Right. The intersection in the Venn
diagram is where value is created. When
you think of an e-commerce company
that says, “How do we upsell people
and reduce the number of abandoned
shopping carts?” Well, the company
has 600 million shopping cart flows
that it has collected in the last six
years. So the company says, “All right,
data science group, build a sequential
model that shows what we need to
do to intervene with people who have
abandoned their shopping carts and
get them to complete the purchase.”
PwC: The questioning nature of
business—the culture of inquiry—
seems important here. Some
who lack the critical business
questions don’t ask enough
questions to begin with.
MD: It’s interesting—a lot of businesses
have this focus on real-time data,
and yet it’s not helping them get
answers to critical business questions.
Some companies have invested a
lot in getting real-time monitoring
of their systems, and it’s expensive.
It’s harder to do and more fragile.
A friend of mine worked on the data
team at a web company. That company
developed, with a real effort, a real-time
log monitoring framework where they
can see how many people are logging
in every second with 15-second latency
across the ecosystem. It was hard to keep
up and it was fragile. It broke down and
they kept bringing it up, and then they
realized that they take very few business
actions in real time. So why devote
all this effort to a real-time system?
PwC: In many cases, the data
is going to be fresh enough,
because the nature of the business
doesn’t change that fast.
MD: Real time actually means two
things. The first thing has to do with
the freshness of data. The second
has to do with the query speed.
By query speed, I mean that if you have
a question, how long it takes to answer
a question such as, “What were your top
products in Malaysia around Ramadan?”
PwC: There’s a third one also,
which is the speed to knowledge.
The data could be staring you
in the face, and you could have
incredibly insightful things in
the data, but you’re sitting there
with your eyes saying, “I don’t
know what the message is here.”
MD: That’s right. This is about how fast
can you pull the data and how fast can
you actually develop an insight from it.
For learning about things quickly
enough after they happen, query speed
is really important. This becomes
a challenge at scale. One of the
problems in the big data space is that
databases used to be fast. You used
to be able to ask a question of your
inventory and you’d get an answer
in seconds. SQL was quick when the
scale wasn’t large; you could have an
interactive dialogue with your data.
But now, because we’re collecting
millions and millions of events a
day, data platforms have seen real
performance degradation. Lagging
performance has led to degradation
of insights. Companies literally
are drowning in their data.
In the 1970s, when the intelligence
agencies first got reconnaissance
satellites, there was this proliferation
in the amount of photographic data
they had, and they realized that it
paralyzed their decision making. So to
this point of speed, I think there are a
number of dimensions here. Typically
when things get big, they get slow.
PwC: Isn’t that the problem
the new in-memory database
appliances are intended to solve?
MD: Yes. Our Druid engine on the back
end is directly competitive with those
proprietary appliances. The biggest
difference between those appliances
and what we provide is that we’re cloud
based and are available on Amazon.
We solve the performance problem
in the cloud. Our mantra is visibility
and performance at scale. If your
data and operations are in the cloud,
it does not make sense to have your
analytics on some appliance.
Data in the cloud liberates companies
from some of these physical box
confines and constraints. That means
that your data can be used as inputs to
other types of services. Being a cloud
service really reduces friction. The
coefficient of friction around data has
for a long time been high, and I think
we’re seeing that start to drop. Not
just the scale or amount of data being
collected, but the ease with which data
can interoperate with different services,
both inside your company and out.
I believe that’s where tremendous
value lies.
“Being a cloud service really
reduces friction. The coefficient
of friction around data has for a
long time been high, and I think
we’re seeing that start to drop.”
Online advertising
analytics in the cloud
Jon Slade of the Financial Times describes
the 123-year-old business publication’s
advanced approach to its online ad sales.
Interview conducted by Alan Morrison, Bo Parker, and Bud Mathaisel
Jon Slade
Jon Slade is global online and strategic
advertising sales director at FT.com, the
digital arm of the Financial Times.
PwC: What is your role at the
FT [Financial Times], and
how did you get into it?
JS: I’m the global advertising sales
director for all our digital products.
I’ve been in advertising sales and in
publishing for about 15 years and at
the FT for about 7 years. And about
three and a half years ago I took this
role—after a quick diversion into
landscape gardening, which really gave
me the idea that digging holes for a
living was not what I wanted to do.
PwC: The media business has
changed during that period of
time. How has the business model
at FT.com evolved over the years?
JS: From the user’s perspective, FT.com
is like a funnel, really, where you have
free access at the outer edge of the
funnel, free access for registration in
the middle, and then the subscriber
at the innermost part. The funnel is
based on the volume of consumption.
From an ad sales perspective, targeting
the most relevant person is essential.
So the types of clients that we’re talking
about—companies like PwC, Rolex, or
Audi—are not interested in a scatter
graph approach to advertising. The
advertising business thrives on targeting
advertising very, very specifically.
On the one hand, we have an ad model
that requires very precise, targeted
information. And on the other hand, we
have a metered model of access, which
means we have lots of opportunity to
collect information about our users.
“We have what we call the web app with FT.com.
We’re not available through the iTunes Store
anymore. We use the technology called HTML5,
which essentially allows us to have the same kind
of touch screen interaction as an app would, but
we serve it through a web page.”
PwC: How does a company like the
FT sell digital advertising space?
JS: Every time you view a web page,
you’ll see an advert appear at the top
or the side, and that one appearance
of the ad is what we call an ad
impression. We usually sell those in
groups of 1,000 ad impressions.
Over a 12-month period, our total
user base, including our 250,000
paying subscribers, generates about
6 billion advertising impressions
across FT.com. That’s the currency
that is bought and sold around
advertising in the online world.
In essence, my job is to look at those ad
impressions and work out which one of
those ad impressions is worth the most
for any one particular client. And we
have about 2,000 advertising campaigns
a year that run across FT.com.
Impressions generated have different
values to different advertisers. So
we need to separate all the strands
out of those 6 billion ad impressions
and get as close a picture as we
possibly can to generate the most
revenue from those ad impressions.
PwC: It sounds like you have a
lot of complexity on both the
supply and the demand side. Is
the supply side changing a lot?
JS: Sure. Mobile is changing things
pretty dramatically, actually. About
20 percent of our page views on digital
channels are now generated by a
mobile device or by someone who’s
using a mobile device, which is up
from maybe 1 percent or 2 percent
just three years ago. So that’s a
radically changing picture that we
now need to understand as well.
What are the consumption patterns
around mobile? How many pages are
people consuming? What type of content
are they consuming? What content is
more relevant to a chief executive versus
a finance director versus somebody in
Japan versus somebody in Dubai?
Mobile is a very substantial platform
that we now must look at in much
more detail and with much greater
care than we ever did before.
PwC: Yes, and regarding the
mobile picture, have you seen
any successes in terms of trying
to address that channel in a
new and different way?
JS: Well, just with the FT, we have
what we call the web app with FT.com.
We’re not available through the iTunes
Store anymore. We use the technology
called HTML5, which essentially allows
us to have the same kind of touch
screen interaction as an app would,
but we serve it through a web page.
So a user points the browser on their
iPad or other device to FT.com, and it
takes you straight through to the app.
There’s no downloading of the app;
there’s no content update required. We
can update the infrastructure of the
app very, very easily. We don’t need
to push it out through any third party
such as Apple. We can retain a direct
relationship with our customer.
One or two other publishers are starting
to understand that this is a pretty good
way to push content to mobile devices,
and it’s an approach that we’ve been
very successful with. We’ve had more
than 1.4 million users of our new web
app since we launched it in June 2011.
It’s a very fast-growing opportunity
for us. We see both subscription and
advertising revenue opportunities.
And with FT.com we try to balance
both of those, both subscription
revenue and advertising revenue.
PwC: You chose the web
app after having offered
a native app, correct?
JS: That’s right, yes.
PwC: Could you compare and
contrast the two and what
the pros and cons are?
JS: If we want to change how we
display content in the web app, it’s a
lot easier for us not to need to go to a
new version of the app and push that
through into the native app via an
approval process with a third party.
We can just make any changes at our
end straight away. And as users go
to the web app, those implemented
changes are there for them.
On the back end, it gives us a lot
more agility to develop advertising
opportunities. We can move faster to
take advantage of a growing market,
plus provide far better web-standard
analytics around campaigns—something
that native app providers struggle with.
Big data in online advertising
“Every year, our total user base, including
our 250,000 paying subscribers,
generates about 6 billion advertising
impressions across FT.com.”
One other benefit we’ve seen is that a
far greater number of people use the
web app than ever used the native app.
So an advertiser is starting to get a bit
more scale from the process, I guess. But
it’s just a quicker way to make changes
to the application with the web app.
PwC: How about the demand
side? How are things changing?
You mentioned 6 billion annual
impressions—or opportunities,
we might phrase it.
JS: Advertising online falls into two
distinct areas. There is the scatter graph
type of advertising where size matters.
There are networks that can give you
billions and billions of ad impressions,
and as an advertiser, you throw as many
messages into that mix as you possibly
can. And then you try and work out
over time which ones stuck the best,
and then you try and optimize to that.
That is how a lot of mainstream or
major networks run their businesses.
On the other side, there are very,
very targeted websites that provide
advertisers with real efficiency to reach
only the type of demographic that
they’re interested in reaching, and that’s
very much the side that we fit into.
Over the last two years, there’s been
a shift to the extreme on both sides.
We’ve seen advertisers go much more
toward a very scattered environment,
and equally other advertisers head much
more toward investing more of their
money into a very niche environment.
And then some advertisers seem to
try and play a little bit in the middle.
With the readers and users of FT.com,
particularly in the last three years as
the economic crisis has driven like a
whirlwind around the globe, we’ve
seen what we call a flight to quality.
Users are aware—as are advertisers—
that they could go to a thousand
different places to get their news, but
they don’t really have the time to do
that. They’re going to fewer places and
spending more time within them, and
that’s certainly the experience that
we’ve had with the Financial Times.
PwC: To make a more targeted
environment for advertising,
you need to really learn more
about the users themselves, yes?
JS: Yes. Most of the opt-in really
occurs at the point of registration and
subscription. This is when the user
declares demographic information:
this is who I am, this is the industry
that I work for, and here’s the ZIP
code that I work from. Users who
subscribe provide a little bit more.
Most of the work that we do around
understanding our users better occurs
at the back end. We examine user
actions, and we note that people who
demonstrate this type of behavior
tend to go on and do this type of thing
later in the month or the week or the
session or whatever it might be.
Our back-end analytics allows us to
extract certain groups who exhibit
those behaviors. That’s probably
most of the work that we’re focused
on at the moment. And that applies
not just to the advertising picture
but to our content development
and our site development, too.
If we know, for example, that people
type A-1-7 tend to read companies’
pages between 8 a.m. and 10 a.m.
and they go on to personal finance
at lunchtime, then we can start to
examine those groups and drive the
right type of content toward them more
specifically. It’s an ongoing piece of the
content and advertising optimization.
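As a rough sketch of the kind of back-end grouping Slade describes, the rules, field names, and sample data below are hypothetical; the point is only that behavioral segments fall out of a simple pass over page-view logs.

```python
# Hypothetical sketch: group users by reading behavior, then target content.
from collections import defaultdict

page_views = [
    # (user_id, section, hour_of_day)
    ("u1", "companies", 9), ("u1", "personal-finance", 13),
    ("u2", "markets", 8), ("u2", "companies", 9),
    ("u3", "companies", 10), ("u3", "personal-finance", 12),
]

segments = defaultdict(set)
for user, section, hour in page_views:
    if section == "companies" and 8 <= hour <= 10:
        segments["morning-companies-readers"].add(user)
    if section == "personal-finance" and 12 <= hour <= 14:
        segments["lunchtime-personal-finance"].add(user)

# Users in both groups resemble the behavior pattern described above.
target_group = segments["morning-companies-readers"] & segments["lunchtime-personal-finance"]
print(sorted(target_group))  # drive the right type of content toward these users
```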
PwC: Is this a test to tune and
adjust the kind of environment
that you’ve been able to create?
JS: Absolutely, both in terms of how
our advertising campaigns display and
also the type of content that we display.
If you and I both looked at FT.com right
now, we’d probably see the home page,
and 90 percent of what you would see
would be the same as what I would see.
But about 10 percent of it would not be.
PwC: How does Metamarkets fit
into this big picture? Could you
shine some light on what you’re
doing with them and what the
initial successes have been?
JS: Sure. We’ve been working with
Metamarkets in earnest for more
than a year. The real challenge
that Metamarkets relieves for us
is to understand those 6 billion ad
impressions—who’s generating
them, how many I’m likely to have
tomorrow of any given sort, and how
much I should charge for them.
It gives me that single view in a single
place, in near real time, of what my exact
supply and my exact demand are.
And that is really critical information.
I increasingly feel a little bit like I’m
on a flight deck with the number of
screens around me to understand.
When I got into advertising straight
after my landscape gardening days,
I didn’t even have a screen. I didn’t
have a computer when I started.
Previously, the way that data was
held—the demographics data, the
behavior data, the pricing, the available
inventory—was across lots of different
databases and spreadsheets. We needed
an almost witchcraft-like algorithm
to provide answers to “How many
impressions do I have?” and “How
much should I charge?” It was an
extremely labor-intensive process.
And that approach just didn’t really fit
the need for the industry in which we
work. Media advertising is purchased in
real time now. The impression appears,
and this process goes between three
or four interested parties—one bid
wins out, and the advert is served in
the time it takes to open a web page.
PwC: In general, it seems like
Metamarkets is doing a whole
piece of your workflow rather
than you doing it. Is that a
fair characterization?
JS: Yes. I’ll give you an example. I was
talking to our sales manager in Paris the
other day. I said to him, “If you wanted
to know how many adverts of a certain
size that you have available to you in
Paris next Tuesday that will be created
by chief executives in France, how would
you go about getting that answer?”
Before, the sales team would send an
e-mail to ad operations in London for an
inventory forecast, and it could take the
ad operations team up to eight working
hours to get back to them. It could even
take as long as two business days to get
an answer in times of high volume. Now,
we’ve reduced that turnaround to about
eight seconds of self-service, allowing
our ad operations team time to focus on
more strategic output. That’s the sort of
magnitude of workflow change that this
creates for us—a two-day turnaround
down to about eight seconds.
Now that advertising is purchased
in real time, we really
need to understand what we have
on our supermarket shelves in real
time, too. That’s what Metamarkets
does for us—help us visualize in one
place our supply and demand.
“Before, the sales team would send an
e-mail to ad operations in London for an
inventory forecast, and it could take the
ad operations team up to eight working
hours to get back to them. Now, we’ve
reduced that turnaround to about eight
seconds of self-service.”
PwC: When you were looking
to resolve this problem, were
there a lot of different services
that did this sort of thing?
JS: Not that we came across. I have
to say our conversations with the
Metamarkets team actually started
about something not entirely
different, but certainly not the
product that we’ve come up with
now. Originally we had a slightly
different concept under discussion
that didn’t look at this part at all.
As a company, Metamarkets was
really prepared to say, “We don’t have
something on the shelves. We have
some great minds and some really good
technology, so why don’t we try to figure
out with you what your problem is, and
then we’ll come up with an answer.”
To be honest, we looked around a little
bit at what else is out there, but I don’t
want to buy anything off the shelf.
I want to work with a company that can
understand what I’m after, go away,
and come back with the answer to that
plus, plus, plus. And that seems to be
the way Metamarkets has developed.
Other vendors clearly do something
similar or close, but most of what I’ve
seen comes off the shelf. And we are—
we’re quite annoying to work with, I
would say. We’re not really a cookie-cutter business. You can slice and
dice those 6 billion ad impressions in
thousands and thousands of ways,
and you can’t always predict how a
client or a customer or a colleague is
going to want to split up that data.
So rather than just say, “The only way
you can do it is this way, and here’s the
off-the-shelf solution,” we really wanted
something that put the power in the
hands of the user. And that seems to be
what we’ve created here. The credit is
entirely with Metamarkets, I have to say.
We just said, “Help, we have a problem,”
and they said, “OK, here’s a good
answer.” So the credit for all the clever
stuff behind this should go with them.
PwC: So there continues to be a
lot of back and forth between FT
and Metamarkets as your needs
change and the demand changes?
JS: Yes. We have at least a weekly call.
The Metamarkets team visits us in
London about once a month, or we meet
in New York if I’m there. And there’s
a lot of back and forth. What seems
to happen is that every time we give
it to one of the ultimate end users—
one of the sales managers around the
world—you can see the lights on in
their head about the potential for it.
And without fail they’ll say, “That’s
brilliant, but how about this and this?”
Or, “Could we use it for this?” Or, “How
about this for an intervention?” It’s
great. It’s really encouraging to see
a product being taken up by internal
customers with the enthusiasm that it is.
We very much see this as an iterative
project. We don’t see it as necessarily
having a specific end in sight. We
think there’s always more that we
can add into this. It’s pretty close
to a partnership really, rather than a
straight vendor and supplier relationship.
It is a genuine partnership, I think.
“So rather than just say, ‘The only way
you can do it is this way, and here’s the
off-the-shelf solution,’ we really wanted
something that put the power in the
hands of the user.”
Supply accuracy in online advertising
“Accuracy of supply is upward of 15 percent better
than what we’ve seen before.”
PwC: How is this actually
translated into the bottom line—
yield and advertising dollars?
JS: It would probably be a little hard for
me to share with you any percentages
or specifics, but I can say that it is
driving up the yields we achieve. It
is double-digit growth on yield as a
result of being able to understand
our supply and demand better.
The degree of accuracy of supply
that it provides for us is upward of
15 percent better than what we’ve
seen before. I can’t quantify the
difference that it’s made to workflows,
but it’s significant. To go from a two-day
turnaround on a simple request
to eight seconds is significant.
PwC: Given our research focus,
we have lots of friends in the
publishing business, and many
of them talked to us about the
decline in return from impression
advertising. It’s interesting.
Your story seems to be pushing
in the different direction.
JS: Yes. I’ve noticed that entirely.
Whenever I talk to a buying customer,
they always say, “Everybody else
is getting cheaper, so how come
you’re getting more expensive?”
I completely hear that. What I would say
is we are getting better at understanding
the attribution model. Ultimately,
what these impressions create for a
client or a customer is not just how
many visits will readers make to your
website, but how much money will
they spend when they get there.
Now that piece is still pretty much
embryonic, but we’re certainly making
the right moves in that direction. We’ve
found that putting a price up is accepted.
Essentially what an increase in yield
implies is that you put your price up.
It’s been accepted because we’ve been
able to offer a much tighter specific
segmentation of that audience. Whereas,
when people are buying on a spray basis
across large networks, deservedly there
is significant price pressure on that.
Equally, if we understand our supply
and demand picture in a much more
granular sense, we know when it’s a
good time to walk away from a deal
or whether we’re being too bullish in
the deal. That pricing piece is critical,
and we’re looking to get to a real-time
dynamic pricing model in 2012.
And Metamarkets is certainly along
the right lines to help us with that.
PwC: A lot of our clients are very
conservative organizations,
and they might be reluctant
to subscribe to a cloud service
like Metamarkets, offered by
a company that has not been
around for a long time. I’m
assuming that the FT had
to make the decision to go
on this different route and
that there was quite a bit of
consideration of these factors.
JS: Endless legal diligence would be
one way to put it—back and forth a lot.
We have 2,000 employees worldwide,
so we still have a fairly entrepreneurial
attitude toward suppliers. Of course we
do the legal diligence, and of course
we do the contractual diligence, and of
course we look around to see what else is
available. But if you have a good instinct
about working with somebody, then
we’re the size of organization where that
instinct can still count for something.
And I think that was the case with
Metamarkets. We felt that we were
talking on the same page here.
We almost could put words in one
another’s mouths and the sentence
would still kind of form. So it felt
very good from the beginning.
If we look at what’s happening in the
digital publishing world, some of the
most exciting things are happening
with very small startup businesses,
and all of the big web powers now,
such as Facebook and Amazon,
were startups 8 or 10 years ago.
We believe in that mentality. We
believe in a personality in business.
Metamarkets represented that to us
very well. And yes, there’s a little bit of a
risk, but it has paid off. So we’re happy.
The art and science
of new analytics
technology
Left-brain analysis connects with
right-brain creativity.
By Alan Morrison
The new analytics is the art and science
of turning the invisible into the visible.
It’s about finding “unknown unknowns,”
as former US Secretary of Defense
Donald Rumsfeld famously called them,
and learning at least something about
them. It’s about detecting opportunities
and threats you hadn’t anticipated, or
finding people you didn’t know existed
who could be your next customers. It’s
about learning what’s really important,
rather than what you thought was
important. It’s about identifying,
committing, and following through on
what your enterprise must change most.
Achieving that kind of visibility
requires a mix of techniques. Some
of these are new, while others aren’t.
Some are clearly in the realm of data
science because they make possible
more iterative and precise analysis
of large, mixed data sets. Others, like
visualization and more contextual
search, are as much art as science.
This article explores some of the newer
technologies that make feasible the case
studies and the evolving cultures of
inquiry described in “The third wave of
customer analytics” on page 06. These
technologies include the following:
• In-memory technology—Reducing
response time and expanding
the reach of business intelligence
(BI) by extending the use of main
(random access) memory
• Interactive visualization—Merging
the user interface and the presentation
of results into one responsive
visual analytics environment
• Statistical rigor—Bringing more of
the scientific method and evidence
into corporate decision making
• Associative search—Navigating to
specific names and terms by browsing
the nearby context (see the sidebar,
“Associative search,” on page 41)
A companion piece to this article,
“Natural language processing and
social media intelligence,” on page 44,
reviews the methods that vendors use
for the needle-in-a-haystack challenge
of finding the most relevant social
media conversations about particular
products and services. Because social
media is such a major data source for
exploratory analytics and because
natural language processing (NLP)
techniques are so varied, this topic
demands its own separate treatment.
Figure 1: Addressable analytics footprint for in-memory technology
In-memory technology augmented traditional business intelligence (BI) and predictive analytics to begin with, but its footprint
will expand over the forecast period to become the base for corporate apps, where it will blur the boundary between
transactional systems and data warehousing. Longer term, more of a 360-degree view of the customer can emerge.
2011–2014: BI → + ERP and mobile → + other corporate apps → cross-functional, cross-source analytics
In-memory technology
Enterprises exploring the latest
in-memory technology soon come
to realize that the technology’s
fundamental advantage—expanding the
capacity of main memory (solid-state
memory that’s directly accessible) and
reducing reliance on disk drive storage
to reduce latency—can be applied in
many different ways. Some of those
applications offer the advantage of being
more feasible over the short term. For
example, accelerating conventional BI
is a short-term goal, one that’s been
feasible for several years through earlier
products that use in-memory capability
from some BI providers, including
MicroStrategy, QlikTech QlikView,
TIBCO Spotfire, and Tableau Software.
Longer term, the ability of platforms
such as Oracle Exalytics, SAP HANA,
and the forthcoming SAS in-memory
Hadoop-based platform1 to query
across a wide range of disparate data
sources will improve. “Previously,
1 See Doug Henschen, “SAS Prepares Hadoop-Powered In-Memory BI Platform,” InformationWeek,
February 14, 2012, http://www.informationweek.com/
news/hardware/grid_cluster/232600767, accessed
February 15, 2012. SAS, which also claims interactive
visualization capabilities in this appliance, expects to
make this appliance available by the end of June 2012.
users were limited to BI suites such
as BusinessObjects to push the
information to mobile devices,” says
Murali Chilakapati, a manager in PwC’s
Information Management practice and
a HANA implementer. “Now they’re
going beyond BI. I think in-memory
is one of the best technologies that
will help us to work toward a better
overall mobile analytics experience.”
The full vision includes more cross-functional, cross-source analytics, but
this will require extensive organizational
and technological change. The
fundamental technological change is
already happening, and in time richer
applications based on these changes will
emerge and gain adoption. (See Figure
1.) “Users can already create a mashup
of various data sets and technology
to determine if there is a correlation,
a trend,” says Kurt J. Bilafer, regional
vice president of analytics at SAP.
To understand how in-memory advances
will improve analytics, it will help to
consider the technological advantages
of hardware and software, and how
they can be leveraged in new ways.
What in-memory technology does
For decades, business analytics has
been plagued by slow response
times (also known as latency), a
problem that in-memory technology
helps to overcome. Latency is due
to input/output bottlenecks in
a computer system’s data path.
These bottlenecks can be alleviated
by using six approaches:
• Move the traffic through more
paths (parallelization)
• Increase the speed of any
single path (transmission)
• Reduce the time it takes to
switch paths (switching)
• Reduce the time it takes to
store bits (writing)
• Reduce the time it takes to
retrieve bits (reading)
• Reduce computation
time (processing)

Figure 2: Memory swapping
Swapping data from RAM to disk introduces latency that in-memory systems designs can now avoid.
To process and store data properly
and cost-effectively, computer
systems swap data from one kind of
memory to another a lot. Each time
they do, they encounter latency in
transmitting, switching, writing,
or reading bits. (See Figure 2.)
Contrast this swapping requirement with
processing alone. Processing is much
faster because so much of it is on-chip or
directly interconnected. The processing
function always outpaces multitiered
memory handling. If these systems
can keep more data “in memory”
or directly accessible to the central
processing units (CPUs), they can avoid
swapping and increase efficiency by
accelerating inputs and outputs.
Less swapping reduces the need for
duplicative reading, writing, and
moving data. The ability to load and
work on whole data sets in main
memory—that is, all in random
access memory (RAM) rather than
frequently reading it from and writing
it to disk—makes it possible to bypass
many input/output bottlenecks.
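A toy comparison makes the cost of swapping visible: summing values that are already resident in RAM versus re-reading them from a file on every pass, a crude stand-in for a working set that keeps spilling to disk. The sizes and timings are illustrative only.

```python
# Illustrative sketch: a working set held in RAM avoids repeated disk reads.
import os
import tempfile
import time

values = list(range(1_000_000))

# Write the data set out once, as the disk-resident copy.
path = os.path.join(tempfile.mkdtemp(), "values.txt")
with open(path, "w") as f:
    f.write("\n".join(map(str, values)))

def total_from_disk():
    with open(path) as f:              # re-read (swap in) the data every time
        return sum(int(line) for line in f)

def total_in_memory():
    return sum(values)                 # data already resident in main memory

for fn in (total_from_disk, total_in_memory):
    start = time.perf_counter()
    fn()
    print(fn.__name__, f"{time.perf_counter() - start:.3f}s")
```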
Systems have needed to do a lot of
swapping, in part, because faster
storage media were expensive. That’s
why organizations have relied heavily
on high-capacity, cheaper disks for
storage. As transistor density per square
millimeter of chip area has risen, the
cost per bit to use semiconductor (or
solid-state) memory has dropped and
the ability to pack more bits in a given
chip’s footprint has increased. It is now
more feasible to use semiconductor
memory in more places where it
can help most, and thereby reduce
reliance on high-latency disks.
Of course, the solid-state memory used
in direct access applications, dynamic
random access memory (DRAM), is
volatile. To avoid the higher risk of
data loss from expanding the use of
DRAM, in-memory database systems
incorporate a persistence layer with
backup, restore, and transaction
logging capability. Distributed caching
systems or in-memory data grids
such as Gigaspaces XAP data grid,
memcached, and Oracle Coherence—
which cache (or keep in a handy place)
lots of data in DRAM to accelerate
website performance—refer to this
same technique as write-behind caching.
These systems update databases on
disk asynchronously from the writes
to DRAM, so the rest of the system
doesn’t need to wait for the disk write
process to complete before performing
another write. (See Figure 3.)

Figure 3: Write-behind caching
Write-behind caching makes writes to disk independent of other write functions.
Source: Gigaspaces and PwC, 2010 and 2012
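A minimal sketch of the write-behind idea, not any vendor’s implementation: writes land in an in-memory store and return immediately, while a background thread drains them to slower storage.

```python
# Minimal write-behind cache sketch: callers write to RAM and return at once;
# a background worker persists the queued writes to a file asynchronously.
import queue
import threading

class WriteBehindCache:
    def __init__(self, path):
        self.memory = {}                     # the in-memory (DRAM) store
        self.pending = queue.Queue()         # writes waiting to be persisted
        self.path = path
        threading.Thread(target=self._flush_loop, daemon=True).start()

    def put(self, key, value):
        self.memory[key] = value             # fast path: no disk wait
        self.pending.put((key, value))       # persistence happens later

    def _flush_loop(self):
        while True:
            key, value = self.pending.get()  # blocks until a write arrives
            with open(self.path, "a") as f:  # slow path, off the caller's thread
                f.write(f"{key}={value}\n")
            self.pending.task_done()

cache = WriteBehindCache("backing_store.log")
for i in range(100):
    cache.put(f"k{i}", i)                    # returns without waiting on disk
cache.pending.join()                         # for the demo, wait for persistence
```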
How the technology benefits
the analytics function
The additional speed of improved
in-memory technology makes possible
more analytics iterations within a
given time. When an entire BI suite
is contained in main memory, there
are many more opportunities to query
the data. Ken Campbell, a director in
PwC’s information and enterprise data
management practice, notes: “Having
a big data set in one location gives you
more flexibility.” T-Mobile, one of SAP’s
customers for HANA, claims that reports
that previously took hours to generate
now take seconds. HANA did require
extensive tuning for this purpose.2
Appliances with this level of main
memory capacity started to appear
in late 2010, when SAP first offered
HANA to select customers. Oracle soon
followed by announcing its Exalytics
In-Memory Machine at OpenWorld
in October 2011. Other vendors well
known in BI, data warehousing, and
database technology are not far behind.
Taking full advantage of in-memory
technology depends on hardware and
software, which requires extensive
supplier/provider partnerships even
before any thoughts of implementation.
Rapid expansion of in-memory
hardware. Increases in memory bit
density (number of bits stored in a
square millimeter) aren’t qualitatively
new; the difference now is quantitative.
What seems to be a step-change in
in-memory technology has actually
been a gradual change in solid-state
memory over many years.
Beginning in 2011, vendors could install
at least a terabyte of main memory,
usually DRAM, in a single appliance.
Besides adding DRAM, vendors are
also incorporating large numbers of
multicore processors in each appliance.
The Exalytics appliance, for example,
includes four 10-core processors.3
The networking capabilities of the
new appliances are also improved.
2 Chris Kanaracus, “SAP’s HANA in-memory database
will run ERP this year,” IDG News Service, via InfoWorld,
January 25, 2012, http://www.infoworld.com/d/
applications/saps-hana-in-memory-database-will-runerp-year-185040, accessed February 5, 2012.
3 Oracle Exalytics In-Memory Machine: A Brief
Introduction, Oracle white paper, October 2011, http://
www.oracle.com/us/solutions/ent-performance-bi/
business-intelligence/exalytics-bi-machine/overview/
exalytics-introduction-1372418.pdf, accessed
February 1, 2012.
Exalytics has two 40Gbps InfiniBand
connections for low-latency database
server connections and two 10 Gigabit
Ethernet connections, in addition to
lower-speed Ethernet connections.
Effective data transfer rates are
somewhat lower than the stated raw
speeds. InfiniBand connections became
more popular for high-speed data
center applications in the late 2000s.
With each succeeding generation,
InfiniBand’s effective data transfer
rate has come closer to the raw rate.
Fourteen data rate or FDR InfiniBand,
which has a raw data lane rate of more
than 14Gbps, became available in 2011.4
Improvements in in-memory
databases. In-memory databases are
quite fast because they are designed to
run entirely in main memory. In 2005,
Oracle bought TimesTen, a high-speed,
in-memory database provider serving
the telecom and trading industries.
With the help of memory technology
improvements, by 2011, Oracle claimed
that entire BI system implementations,
such as Oracle BI server, could be held
in main memory. Federated databases—
multiple autonomous databases that
can be run as one—are also possible.
“I can federate data from five physical
databases in one machine,” says PwC
Applied Analytics Principal Oliver Halter.
In 2005, SAP bought P*Time, a
highly parallelized online transaction
processing (OLTP) database, and
has blended its in-memory database
capabilities with those of TREX and
MaxDB to create the HANA in-memory
database appliance. HANA includes
stores for both row (optimal for
transactional data with many fields)
and column (optimal for analytical data
with fewer fields), with capabilities
for both structured and less structured
data. HANA will become the base for
the full range of SAP’s applications,
with SAP porting its enterprise resource
planning (ERP) module to HANA
beginning in the fourth quarter of
2012, followed by other modules.5

4 See “What is FDR InfiniBand?” at the InfiniBand Trade
Association site (http://members.infinibandta.org/
kwspub/home/7423_FDR_FactSheet.pdf, accessed
February 10, 2012) for more information on InfiniBand
availability.

Use case examples
Business process advantages
of in-memory technology
In-memory technology makes it
possible to run in minutes queries
that previously took hours, which
has numerous implications.
Running queries faster implies the
ability to accelerate data-intensive
business processes substantially.
Take the case of supply chain
optimization in the electronics
industry. Sometimes it can take 30
hours or more to run a query from a
business process to identify and fill
gaps in TV replenishment at a retailer,
for example. A TV maker using an
in-memory appliance component in
this process could reduce the query
time to under an hour, allowing the
maker to reduce considerably the time
it takes to respond to supply shortfalls.
Or consider the new ability to
incorporate into a process more
predictive analytics with the help of
in-memory technology. Analysts could
identify new patterns of fraud in tax
return data in ways they hadn’t been able
to before, making it feasible to provide
investigators more helpful leads, which
in turn could make them more effective
in finding and tracking down the most
potentially harmful perpetrators before
their methods become widespread.
Competitive advantage in these cases
hinges on blending effective strategy,
means, and execution together, not
just buying the new technology and
installing it. In these examples, the
challenge becomes not one of simply
using a new technology, but using it
effectively. How might the TV maker
anticipate shortfalls in supply more
readily? What algorithms might be most
effective in detecting new patterns of
tax return fraud? At its best, in-memory
technology could trigger many creative
ideas for process improvement.
Better compression. In-memory
appliances use columnar compression,
which stores similar data together to
improve compression efficiency. Oracle
claims a columnar compression capability
of 5x, so physical capacity of 1TB is
equivalent to having 5TB available.
Other columnar database management
system (DBMS) providers such as
EMC/Greenplum, IBM/Netezza, and
HP/Vertica have refined their own
columnar compression capabilities
over the years and will be able to apply
these to their in-memory appliances.
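The intuition behind columnar compression can be seen with simple run-length encoding: values from a single column compress well when stored together, because long runs of repeated values collapse to one entry and a count. The data below is illustrative only and does not represent any vendor’s implementation.

```python
# Illustrative sketch: run-length encode a repetitive column.
from itertools import groupby

# A "region" column as it might appear across many rows of a table.
region_column = ["EMEA"] * 4 + ["APAC"] * 3 + ["EMEA"] * 2

def run_length_encode(values):
    # Store each run once, with a count, instead of repeating the value.
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]

encoded = run_length_encode(region_column)
print(encoded)                                   # [('EMEA', 4), ('APAC', 3), ('EMEA', 2)]
print(len(region_column), "values ->", len(encoded), "runs")
```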
5 Chris Kanaracus, op. cit.
More adaptive and efficient caching
algorithms. Because main memory
is still limited physically, appliances
continue to make extensive use of
advanced caching techniques that
increase the effective amount of
main memory. The newest caching
algorithms—lists of computational
procedures that specify which data
to retain in memory—solve an old
problem: tables that get dumped
from memory when they should be
maintained in the cache. “The caching
strategy for the last 20 years relies
on least frequently used algorithms,”
Halter says. “These algorithms aren’t
always the best approaches.” The term
least frequently used refers to how these
algorithms discard the data that hasn’t
been used a lot, at least not lately.
The method is good in theory, but in
practice these algorithms can discard
data such as fact tables (for example, a
list of countries) that the system needs
at hand. The algorithms haven’t been
smart enough to recognize less used
but clearly essential fact tables that
could be easily cached in main memory
because they are often small anyway.
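A stripped-down least-frequently-used cache shows the failure mode Halter describes: a small, rarely touched lookup table is the first thing evicted, even though keeping it in memory would be cheap. The table names and sizes here are hypothetical.

```python
# Minimal least-frequently-used (LFU) eviction sketch.
class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}       # table name -> data
        self.hits = {}        # table name -> access count

    def get(self, name):
        if name in self.items:
            self.hits[name] += 1
            return self.items[name]
        return None

    def put(self, name, data):
        if len(self.items) >= self.capacity:
            victim = min(self.hits, key=self.hits.get)  # evict least frequently used
            del self.items[victim]
            del self.hits[victim]
            print("evicted:", victim)
        self.items[name] = data
        self.hits[name] = 1

cache = LFUCache(capacity=2)
cache.put("country_list", ["US", "DE", "JP"])    # small but essential lookup table
cache.put("daily_orders", list(range(1000)))
for _ in range(50):
    cache.get("daily_orders")                    # the hot table racks up hits
cache.put("weekly_returns", list(range(500)))    # country_list is evicted first
```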
Generally speaking, progress has
been made on many fronts to improve
in-memory technology. Perhaps most
importantly, system designers have
been able to overcome some of the
hardware obstacles preventing the
direct connections the data requires
so it can be processed. That’s a
fundamental first step of a multistep process. Although the hardware,
caching techniques, and some software
exist, the software refinement and
expansion that’s closer to the bigger
vision will take years to accomplish.
Figure 4: Data blending
In self-service BI software, the end user can act as an analyst. Tableau recognizes identical fields in different data sets (here, a sales database and a territory spreadsheet that share a State field). Simple drag and drop replaces days of programming. You can combine, filter, and even perform calculations among different data sources right in the Tableau window.
Source: Tableau Software, 2011. Derived from a video at http://www.tableausoftware.com/videos/data-integration
Self-service BI and
interactive visualization
One of BI’s big challenges is to make it
easier for a variety of end users to ask
questions of the data and to do so in an
iterative way. Self-service BI tools put
a larger number of functions within
reach of everyday users. These tools
can also simplify a larger number of
tasks in an analytics workflow. Many
tools—QlikView, Tableau, and TIBCO
Spotfire, to name a few—take some
advantage of the new in-memory
technology to reduce latency. But
equally important to BI innovation
are interfaces that meld visual ways
of blending and manipulating the
data with how it’s displayed and
how the results are shared.
In the most visually capable BI tools,
the presentation of data becomes just
another feature of the user interface.
Figure 4 illustrates how Tableau,
for instance, unifies data blending,
analysis, and dashboard sharing within
one person’s interactive workflow.
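As a rough stand-in for the blending step in Figure 4, the same idea expressed with pandas rather than Tableau: two sources that share a State field are joined and aggregated in a few lines. The column names and values are illustrative.

```python
# Illustrative data blending: join a "sales database" extract with a
# "territory spreadsheet" on their shared State field, then aggregate.
import pandas as pd

sales = pd.DataFrame({
    "customer": ["A", "B", "C", "D"],
    "state": ["WA", "OR", "WA", "CA"],
    "profit": [120.0, 80.0, 60.0, 200.0],
})
territories = pd.DataFrame({
    "state": ["WA", "OR", "CA"],
    "territory": ["West-North", "West-North", "West-South"],
})

blended = sales.merge(territories, on="state")           # blend on the common field
profit_by_territory = blended.groupby("territory")["profit"].sum()
print(profit_by_territory)
```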
How interactive visualization works
One important element that’s been
missing from BI and analytics platforms
is a way to bridge human language in the
user interface to machine language more
effectively. User interfaces have included
features such as drag and drop for
decades, but drag and drop historically
has been linked to only a single
application function—moving a file
from one folder to another, for example.
Figure 5: Bridging human, visual, and machine language
1. To the user, results come from a simple drag and drop, which encourages experimentation and further inquiry.
2. Behind the scenes, complex algebra actually makes the motor run. Hiding all the complexities of the VizQL computations saves time and frees the user to focus on the results of the query, rather than the construction of the query.
Source: Chris Stolte, Diane Tang, and Pat Hanrahan, “Computer systems and methods for the query and visualization of multidimensional databases,” United States Patent 7089266, Stanford University, 2006, http://www.freepatentsonline.com/7089266.html, accessed February 12, 2012.
To query the data, users have resorted
to typing statements in languages
such as SQL that take time to learn.
What a tool such as Tableau does
differently is to make manipulating
the data through familiar techniques
(like drag and drop) part of an ongoing
dialogue with the database extracts
that are in active memory. By doing so,
the visual user interface offers a more
seamless way to query the data layer.
Tableau uses what it calls Visual Query
Language (VizQL) to create that
dialogue. What the user sees on the
screen, VizQL encodes into algebraic
expressions that machines interpret and
execute in the data. VizQL uses table
algebra developed for this approach
that maps rows and columns to the
x- and y-axes and layers to the z-axis.6
6 See Chris Stolte, Diane Tang, and Pat Hanrahan,
“Polaris: A System for Query, Analysis, and Visualization
of Multidimensional Databases,” Communications of
the ACM, November 2008, 75–76, http://
mkt.tableausoftware.com/files/Tableau-CACM-Nov2008-Polaris-Article-by-Stolte-Tang-Hanrahan.pdf,
accessed February 10, 2012, for more information on
the table algebra Tableau uses.
Jock Mackinlay, director of visual
analysis at Tableau Software, puts it
this way: “The algebra is a crisp way to
give the hardware a way to interpret
the data views. That leads to a really
simple user interface.” (See Figure 5.)
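VizQL itself is proprietary, so the following is only a loose, hypothetical illustration of the general idea of compiling a visual specification into a query; it does not reflect Tableau’s actual algebra or implementation.

```python
# Hypothetical illustration: turn a drag-and-drop style "specification"
# (which fields sit on which shelf) into an aggregate query string.
def spec_to_sql(spec, table):
    dims = spec.get("columns", []) + spec.get("rows", [])
    measures = [f"SUM({m}) AS {m}" for m in spec.get("measures", [])]
    select_list = ", ".join(dims + measures)
    group_by = ", ".join(dims)
    return f"SELECT {select_list} FROM {table} GROUP BY {group_by}"

# The user drags Region to columns, Year to rows, and Sales into the view.
spec = {"columns": ["region"], "rows": ["year"], "measures": ["sales"]}
print(spec_to_sql(spec, table="orders"))
# SELECT region, year, SUM(sales) AS sales FROM orders GROUP BY region, year
```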
The benefits of interactive visualization
Psychologists who study how humans
learn have identified two types: left-brain
thinkers, who are more analytical,
logical, and linear in their thinking, and
right-brain thinkers, who take a more
synthetic parts-to-wholes approach
that can be more visual and focused on
relationships among elements. Visually
oriented learners make up a substantial
portion of the population, and adopting
tools more friendly to them can be the
difference between creating a culture
of inquiry, in which different thinking
styles are applied to problems, and
making do with an isolated group of
statisticians. (See the article, “How
CIOs can build the foundation for a data
science culture,” on page 58.)

The new class of visually interactive,
self-service BI tools can engage parts of
the workforce—including right-brain
thinkers—who may not have been
previously engaged with analytics.

At Seattle Children’s Hospital, the
director of knowledge management,
Ted Corbett, initially brought Tableau
into the organization. Since then,
according to Elissa Fink, chief marketing
officer of Tableau Software, its use has
spread to include these functions:
• Facilities optimization—
Making the best use of scarce
operating room resources
• Inventory optimization—
Reducing the tendency for nurses
to hoard or stockpile supplies by
providing visibility into what’s
available hospital-wide
• Test order reporting—Ensuring tests
ordered in one part of the hospital
aren’t duplicated in another part
• Financial aid identification
and matching—Expediting a
match between needy parents
whose children are sick and
a financial aid source

Good visualizations without
normalized data

Business analytics software generally
assumes that the underlying data is
reasonably well designed, providing
powerful tools for visualization and the
exploration of scenarios. Unfortunately,
well-designed, structured information
is a rarity in some domains.
Interactive tools can help refine a
user’s questions and combine data,
but often demand a reasonably
normalized schematic framework.

Zepheira’s Freemix product, the
foundation of the Viewshare.org project
from the US Library of Congress,
works with less-structured data, even
comma-separated values (CSV) files
with no headers. Rather than assuming
the data is already set up for the
analytical processing that machines
can undertake, the Freemix designers
concluded that the machine needs help
from the user to establish context, and
made generating that context feasible
for even an unsophisticated user.

Freemix walks the user through
the process of adding context to
the data by using annotations and
augmentation. It then provides plug-ins
to normalize fields, and it enhances
data with new, derived fields (from
geolocation or entity extraction, for
example). These capabilities help the
user display and analyze data quickly,
even when given only ragged inputs.
The proliferation of iPad devices, other
tablets, and social networking inside the
enterprise could further encourage the
adoption of this class of tools. TIBCO
Spotfire for iPad 4.0, for example,
integrates with Microsoft SharePoint
and tibbr, TIBCO’s social tool.7
7 Chris Kanaracus, “Tibco ties Spotfire business
intelligence to SharePoint, Tibbr social network,”
InfoWorld, November 14, 2011, http://
www.infoworld.com/d/business-intelligence/tibco-tiesspotfire-business-intelligence-sharepoint-tibbr-socialnetwork-178907, accessed February 10, 2012.
The QlikTech QlikView 11 also integrates
with Microsoft SharePoint and is based on
an HTML5 web application architecture
suitable for tablets and other handhelds.8
Bringing more statistical
rigor to business decisions
Sports continue to provide examples
of the broadening use of statistics. In
the United States several years ago,
Billy Beane and the Oakland Athletics
baseball team, as documented in
Moneyball by Michael Lewis, hired
statisticians to help with recruiting
and line-up decisions, using previously
little-noticed player metrics. Beane
had enough success with his method
that it is now copied by most teams.
8 Erica Driver, “QlikView Supports Multiple
Approaches to Social BI,” QlikCommunity, June
24, 2011, http://community.qlikview.com/blogs/
theqlikviewblog/2011/06/24/with-qlikview-you-cantake-various-approaches-to-social-bi, and Chris
Mabardy, “QlikView 11—What’s New On Mobile,”
QlikCommunity, October 19, 2011, http://
community.qlikview.com/blogs/theqlikviewblog
/2011/10/19, accessed February 10, 2012.
“There are certain
statistical principles
and concepts that lie
underneath all the
sophisticated methods.
You can get a lot out
of or you can go far
without having to do
complicated math.”
—Kaiser Fung,
New York University
In 2012, there’s a debate over whether
US football teams should more seriously
consider the analyses of academics such
as Tobias Moskowitz, an economics
professor at the University of Chicago,
who co-authored a book called
Scorecasting. He analyzed 7,000 fourth-down
decisions and outcomes, including
field positions after punts and various
other factors. His conclusion? Teams
should punt far less than they do.
This conclusion contradicts the common
wisdom among football coaches: even
with a 75 percent chance of making a
first down when there’s just two yards to
go, coaches typically choose to punt on
fourth down. Contrarians, such as Kevin
Kelley of Pulaski Academy in Little Rock,
Arkansas, have proven Moskowitz right.
Since 2003, Kelley has gone for it on fourth
down (in various yardage situations)
500 times and has a 49 percent success
rate. Pulaski Academy has won the
state championship three times
since Kelley became head coach.9
Addressing the human factor
As in the sports examples, statistical
analysis applied to business can
surface findings that contradict long-held
assumptions. But the basic
principles aren’t complicated. “There
are certain statistical principles
and concepts that lie underneath
all the sophisticated methods. You
can get a lot out of or you can go far
without having to do complicated
math,” says Kaiser Fung, an adjunct
professor at New York University.
Simply looking at variability is an
example. Fung considers variability
a neglected factor in comparison to
averages. If you run a
9 Seth Borenstein, “Unlike Patriots, NFL slow to
embrace ‘Moneyball’,” Seattle Times, February
3, 2012, http://seattletimes.nwsource.com/html/
sports/2017409917_apfbnsuperbowlanalytics.html,
accessed February 10, 2012.
theme park and can reduce the longest
wait times for rides, that is a clear way
to improve customer satisfaction, and it
may pay off more and be less expensive
than reducing the average wait time.
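The point is easy to see with a few hypothetical wait times: two ride queues can share the same average while one has far greater variability and a much worse worst case.

```python
# Hypothetical wait times (minutes) for two rides with the same average.
import statistics

ride_a = [10, 12, 11, 9, 13, 11, 12, 10]   # consistent waits
ride_b = [2, 3, 35, 2, 4, 30, 3, 9]        # same mean, wild swings

for name, waits in (("Ride A", ride_a), ("Ride B", ride_b)):
    print(name,
          "mean:", round(statistics.mean(waits), 1),
          "stdev:", round(statistics.stdev(waits), 1),
          "longest:", max(waits))
```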
Much of the utility of statistics is to
confront old thinking habits with valid
findings that may seem counterintuitive
to those who aren't accustomed to
working with statistics or acting on
their findings. Counterintuitive but
valid findings that tie to practical
business metrics get people's attention.
To counter old thinking habits,
businesses need to raise the profiles of
statisticians, scientists, and engineers
who are versed in statistical methods,
and make their work more visible. That
in turn may help raise the visibility of
statistical analysis itself, for example by
embedding statistical software in the
day-to-day business software environment.
R: Statistical software’s
open source evolution
Until recently, statistical software
packages were in a group by themselves.
College students who took statistics
classes used a particular package, and
the language it used was quite different
from programming languages such as
Java. Those students had to learn not
only a statistical language, but also other
programming languages. Those who
didn’t have this breadth of knowledge
of languages faced limitations in what
they could do. Others who were versed
in Python or Java but not a statistical
package were similarly limited.
What’s happened since then is the
proliferation of R, an open source
statistical programming language
that lends itself to more uses in
business environments. R has become
popular in universities and now has
thousands of ancillary open source
applications in its ecosystem. In its latest
incarnations, it has become part of the
fabric of big data and more visually
oriented analytics environments.
R in open source big data
environments. Statisticians have
typically worked with small data
sets on their laptops, but now they
can work with R directly on top of
Hadoop, an open source cluster
computing environment.10 Revolution
Analytics, which offers a commercial
R distribution, created a Hadoop
interface for R in 2011, so R users will
not be required to use MapReduce or
Java.11 The result is a big data analytics
capability for R statisticians and
programmers that didn’t exist before,
one that requires no additional skills.
R convertible to SQL and part of
the Oracle big data environment.
In January 2012, Oracle announced
Oracle R Enterprise, its own
distribution of R, which is bundled
with a Hadoop distribution in its big
data appliance. With that distribution,
R users can run their analyses in the
Oracle 11G database. Oracle claims
performance advantages when
running in its own database.12
10 See “Making sense of Big Data,” Technology Forecast
2010, Issue 3, http://www.pwc.com/us/en/technologyforecast/2010/issue3/index.jhtml, and Architecting the
data layer for analytic applications, PwC white paper,
Spring 2011, http://www.pwc.com/us/en/increasingit-effectiveness/assets/pwc-data-architecture.pdf,
accessed April 5, 2012, to learn more about Hadoop
and other NoSQL databases.
11 Timothy Prickett Morgan, “Revolution speeds stats on
Hadoop clusters,” The Register, September 27, 2011,
http://www.theregister.co.uk/2011/09/27/revolution_r_
hadoop_integration/, accessed February 10, 2012.
12 Doug Henschen, “Oracle Analytics Package Expands
In-Database Processing Options,” InformationWeek,
February 8, 2012, http://informationweek.com/news/
software/bi/232600448, accessed February 10, 2012.
Associative search

Particularly for the kinds of enterprise
databases used in business intelligence,
simple keyword search goes only so
far. Keyword searches often come
up empty for semantic reasons—the
users doing the searching can’t guess
the term in a database that comes
closest to what they’re looking for.

To address this problem, self-service BI
tools such as QlikView offer associative
search. Associative search allows
users to select two or more fields and
search occurrences in both to find
references to a third concept or name.
With the help of this technique, users
can gain unexpected insights and
make discoveries by clearly seeing
how data is associated—sometimes
for the very first time. They ask a
stream of questions by making a series
of selections, and they instantly see
all the fields in the application filter
themselves based on their selections.
At any time, users can see not only what
data is associated—but what data is
not related. The data related to their
selections is highlighted in white while
unrelated data is highlighted in gray.
In the case of QlikView’s associative
search, users type relevant words or
phrases in any order and get quick,
associative results. They can search
across the entire data set, and with
search boxes on individual lists, users
can confine the search to just that
field. Users can conduct both direct
and indirect searches. For example,
if a user wanted to identify a sales
rep but couldn’t remember the sales
rep’s name—just details about the
person, such as that he sells fish to
customers in the Nordic region—
the user could search on the sales
rep list box for “Nordic” and “fish”
to narrow the search results to just
sellers who meet those criteria.
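A toy version of the “Nordic” and “fish” example shows the mechanics: search terms are matched against any field of each record, and only the sales reps associated with the matching rows survive the filter. The data below is made up.

```python
# Toy associative search: keep records in which every term appears in some field,
# then see which values of a chosen field (sales rep) remain associated.
orders = [
    {"rep": "Lena Berg",  "region": "Nordic", "product": "fish"},
    {"rep": "Lena Berg",  "region": "Nordic", "product": "timber"},
    {"rep": "Marco Ruiz", "region": "Iberia", "product": "fish"},
    {"rep": "Ola Nilsen", "region": "Nordic", "product": "fish"},
]

def associative_search(records, terms):
    terms = [t.lower() for t in terms]
    return [r for r in records
            if all(any(t in str(v).lower() for v in r.values()) for t in terms)]

matches = associative_search(orders, ["Nordic", "fish"])
associated_reps = sorted({r["rep"] for r in matches})                      # stay highlighted
excluded_reps = sorted({r["rep"] for r in orders} - set(associated_reps))  # grayed out
print("associated:", associated_reps)
print("unrelated:", excluded_reps)
```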
Integrating interactive visualization
with R. One of the newest capabilities
involving R is its integration with
interactive visualization.13 R is best
known for its statistical analysis
capabilities, not its interface. However,
interactive visualization tools such
as Omniscope are beginning to
offer integration with R, improving
the interface significantly.
The resulting integration makes it
possible to preview data from various
sources, drag and drop from those
sources and individual R statistical
operations, and drag and connect
to combine and display results.
Users can view results in either a
data manager view or a graph view
and refine the visualization within
either or both of those views.
13 See Steve Miller, “Omniscope and R,” Information
Management, February 7, 2012, http://
www.information-management.com/blogs/datascience-agile-BI-visualization-Visokio-10021894-1.html
and the R Statistics/Omniscope 2.7 video,
http://www.visokio.com/featured-videos, accessed
February 8, 2012.
R has benefitted greatly from its status
in the open source community, and
this has brought it into a mainstream
data analysis environment. There
is potential now for more direct
collaboration between the analysts and
the statisticians. Better visualization
and tablet interfaces imply an
ability to convey statistically based
information more powerfully and
directly to an executive audience.
Conclusion: No lack of vision,
resources, or technology
The new analytics certainly doesn’t lack
for ambition, vision, or technological
innovation. SAP intends to base
its new applications architecture
on the HANA in-memory database
appliance. Oracle envisions running
whole application suites in memory,
starting with BI. Others that offer BI
or columnar database products have
similar visions. Tableau Software and
others in interactive visualization
continue to refine and expand a visual
language that allows even casual users
to extract, analyze, and display data in a few
drag-and-drop steps. More enterprises
are keeping their customer data
longer, so they can mine the historical
record more effectively. Sensors
are embedded in new places daily,
generating ever more data to analyze.
There is clear promise in harnessing
the power of a larger proportion of the
whole workforce with one aspect or
another of the new analytics. But that’s
not the only promise. There’s also the
promise of more data and more insight
about the data for staff already fully
engaged in BI, because of processes
that are instrumented closer to the
action; the parsing and interpretation
of prose, not just numbers; the speed at which questions about the data can be asked and answered; the ability to
establish whether a difference is random
error or real and repeatable; and the
active engagement with analytics that
interactive visualization makes possible.
These changes can enable a company to
be highly responsive to its environment,
guided by a far more accurate
understanding of that environment.
There are so many different ways now to
optimize pieces of business processes, to
reach out to new customers, to debunk
old myths, and to establish realities
that haven’t been previously visible. Of
course, the first steps are essential—
putting the right technologies in place
can set organizations in motion toward
a culture of inquiry and engage those
who haven’t been fully engaged.
Natural language
processing and social
media intelligence
Mining insights from social media data requires
more than sorting and counting words.
By Alan Morrison and Steve Hamby
Most enterprises are more than eager
to further develop their capabilities in
social media intelligence (SMI)—the
ability to mine the public social media
cloud to glean business insights and act
on them. They understand the essential
value of finding customers who discuss
products and services candidly in public
forums. The impact SMI can have goes
beyond basic market research and test
marketing. In the best cases, companies
can uncover clues to help them revisit
product and marketing strategies.
“Ideally, social media can function
as a really big focus group,” says Jeff
Auker, a director in PwC’s Customer
Impact practice. Enterprises, which
spend billions on focus groups, spent
nearly $1.6 billion in 2011 on social
media marketing, according to Forrester
Research. That number is expected to
grow to nearly $5 billion by 2016.1
1 Shar VanBoskirk, US Interactive Marketing Forecast,
2011 To 2016, Forrester Research report, August 24,
2011, http://www.forrester.com/rb/Research/us_
interactive_marketing_forecast%2C_2011_to_2016/q/
id/59379/t/2, accessed February 12, 2012.
Auker cites the example of a media
company’s use of SocialRep,2 a tool
that uses a mix of natural language
processing (NLP) techniques to scan
social media. Preliminary scanning for
the company, which was looking for a
gentler approach to countering piracy,
led to insights about how motivations
for movie piracy differ by geography. “In
India, it’s the grinding poverty. In Eastern
Europe, it’s the underlying socialist
culture there, which is, ‘my stuff is your
stuff.’ There, somebody would buy a
film and freely copy it for their friends.
In either place, though, intellectual
property rights didn’t hold the same
moral sway that they did in some
other parts of the world,” Auker says.
This article explores the primary
characteristics of NLP, which is the
key to SMI, and how NLP is applied
to social media analytics. The article
considers what’s in the realm of the
possible when mining social media
text, and how informed human
analysis becomes essential when
interpreting the conversations that
machines are attempting to evaluate.
2 PwC has joint business relationships with SocialRep,
ListenLogic, and some of the other vendors mentioned
in this publication.
Natural language processing:
Its components and social
media applications
NLP technologies for SMI are just
emerging. When used well, they serve
as a more targeted, semantically based
complement to pure statistical analysis,
which is more scalable and able to
tackle much larger data sets. While
statistical analysis looks at the relative
frequencies of word occurrences and
the relationships between words,
NLP tries to achieve deeper insights
into the meanings of conversations.
The best NLP tools can provide a level
of competitive advantage, but it’s a
challenging area for both users and
vendors. “It takes very rare skill sets
in the NLP community to figure this
stuff out,” Auker says. “It’s incredibly
processing and storage intensive,
and it takes awhile. If you used pure
NLP to tell me everything that’s
going on, by the time you indexed all
the conversations, it might be days
or weeks later. By then, the whole
universe isn’t what it used to be.”
First-generation social media monitoring
tools provided some direct business
value, but they also left users with more
questions than answers. And context was
a key missing ingredient. Rick Whitney,
a director in PwC’s Customer Impact
practice, makes the following distinction
between the first- and second-generation SMI tools: “Without good
NLP, the first-generation tools don’t
give you that same context,” he says.
What constitutes good NLP is open
to debate, but it’s clear that some
of the more useful methods blend
different detailed levels of analysis
and sophisticated filtering, while
others stay attuned to the full context
of the conversations to ensure that
novel and interesting findings that
inadvertently could be screened
out make it through the filters.
Types of NLP
NLP consists of several subareas of
computer-assisted language analysis,
ways to help scale the extraction of
meaning from text or speech. NLP
software has been used for several
years to mine data from unstructured
data sources, and the software had its
origins in the intelligence community.
During the past few years, the locus
has shifted to social media intelligence
and marketing, with literally
hundreds of vendors springing up.
NLP techniques span a wide range,
from analysis of individual words and
entities, to relationships and events, to
phrases and sentences, to document-level analysis. (See Figure 1.) The
primary techniques include these:
Word or entity (individual
element) analysis
• Word sense disambiguation—
Identifies the most likely meaning of
ambiguous words based on context
and related words in the text. For
example, it will determine if the word
“bank” refers to a financial institution,
the edge of a body of water, the act of
relying on something, or one of the
word’s many other possible meanings.
• Named entity recognition
(NER)—Identifies proper nouns.
Capitalization analysis can help with
NER in English, for instance, but
capitalization varies by language
and is entirely absent in some.
• Entity classification—Assigns
categories to recognized entities.
For example, “John Smith” might
be classified as a person, whereas
“John Smith Agency” might be
classified as an organization, or more
specifically “insurance company.”
Figure 1: The varied paths to meaning in text analytics. Machines need to review many different kinds of clues—documents, metadata, lexical graphs, words, sentences, and social graphs—to be able to deliver meaningful results to users.
• Part of speech (POS) tagging—
Assigns a part of speech (such as
noun, verb, or adjective) to every
word to form a foundation for
phrase- or sentence-level analysis.
Relationship and event analysis

• Relationship analysis—Determines relationships within and across sentences. For example, “John’s wife Sally …” implies a symmetric relationship of spouse.

• Event analysis—Determines the type of activity based on the verb and entities that have been assigned to a classification. For example, an event “BlogPost” may have two types associated with it—a blog post about a company versus a blog post about its competitors—even though a single verb “blogged” initiated the two events. Event analysis can also define relationships between entities in a sentence or phrase; the phrase “Sally shot John” might establish a relationship between John and Sally of murder, where John is also categorized as the murder victim.

• Co-reference resolution—Identifies words that refer to the same entity. For example, in these two sentences—“John bought a gun. He fired the gun when he went to the shooting range.”—the “He” in the second sentence refers to “John” in the first sentence; therefore, the events in the second sentence are about John.
Syntactic (phrase and sentence
construction) analysis
• Syntactic parsing—Generates a parse
tree, or the structure of sentences and
phrases within a document, which
can lead to helpful distinctions at the
document level. Syntactic parsing
often involves the concept of sentence
segmentation, which builds on
tokenization, or word segmentation,
in which words are discovered within
a string of characters. In English and
other languages, words are separated
by spaces, but this is not true in some
languages (for instance, Chinese).
• Language services—Range
from translation to parsing and
extracting in native languages.
For global organizations, these
services are a major differentiator
because of the different techniques
required for different languages.
Document analysis
• Summarization and topic identification—Topic identification captures in a few words the topic of an entire document or subsection; summarization, by contrast, provides a longer summary of a document or subsection.
• Sentiment analysis—Recognizes
subjective information in a
document that can be used to
identify “polarity” or distinguish
between entirely opposite entities
and topics. This analysis is often
used to determine trends in public
opinion, but it also has other uses,
such as determining confidence
in facts extracted using NLP.
• Metadata analysis—Identifies and
analyzes the document source, users,
dates, and times created or modified.
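As a minimal sketch of how several of these techniques show up in practice, the snippet below uses the open source spaCy library (our choice for illustration; it is not named in this article) to run tokenization, part-of-speech tagging, named entity recognition, and sentence segmentation over one example sentence.

# Assumes spaCy and its small English model are installed:
#   pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("John Smith blogged that the John Smith Agency mishandled his claim.")

print([(ent.text, ent.label_) for ent in doc.ents])  # named entity recognition and classification
print([(tok.text, tok.pos_) for tok in doc])         # part-of-speech tags for every token
print([sent.text for sent in doc.sents])             # sentence segmentation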
NLP applications require the use of
several of these techniques together.
Some of the most compelling NLP
applications for social media analytics
include enhanced extraction, filtered
keyword search, social graph analysis,
and predictive and sentiment analysis.
Enhanced extraction
NLP tools are being used to mine both
the text and the metadata in social
media. For example, the inTTENSITY
Social Media Command Center (SMCC)
integrates Attensity Analyze with
Inxight ThingFinder—both established
tools—to provide a parser for social
media sources that include metadata
and text. The inTTENSITY solution
uses Attensity Analyze for predicate
analysis to provide relationship
and event analysis, and it uses
ThingFinder for noun identification.
Filtered keyword search
Many keyword search methods exist.
Most require lists of keywords to be
defined and generated. Documents
containing those words are matched.
WordStream is one of the prominent
tools in keyword search for SMI. It
provides several ways for enterprises
to filter keyword searches.
Social graph analysis
Social graphs assist in the study
of a subject of interest, such as a
customer, employee, or brand.
These graphs can be used to:
• Determine key influencers in
each major node section
• Discover if one aspect of the brand
needs more attention than others
• Identify threats and opportunities
based on competitors and industry
• Provide a model for
collaborative brainstorming
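A toy version of the first use above, finding candidate influencers, can be sketched with the open source NetworkX library (an illustrative choice, not a tool named here), using simple degree centrality as a stand-in for far richer influence models.

import networkx as nx

# Hypothetical interest graph: people connected to a brand and to one another.
G = nx.Graph()
G.add_edges_from([
    ("ana", "brand_x"), ("ben", "brand_x"), ("cal", "brand_x"),
    ("ana", "ben"), ("dee", "cal"), ("eli", "cal"),
])

# Degree centrality as a crude influence proxy; real tools weigh context too.
centrality = nx.degree_centrality(G)
people = [n for n in centrality if n != "brand_x"]
ranked = sorted(people, key=centrality.get, reverse=True)
print(ranked[:3])  # candidate key influencers around brand_x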
Many NLP-based social graph tools
extract and classify entities and
relationships in accordance with a
defined ontology or graph. But some
social media graph analytics vendors,
such as Nexalogy Environics, rely on
more flexible approaches outside
standard NLP. “NLP rests upon what
we call static ontologies—for example,
the English language represented in
a network of tags on about 30,000
concepts could be considered a static
ontology,” Claude Théoret, president
of Nexalogy Environics, explains.
“The problem is that the moment
you hit something that’s not in the
ontology, then there’s no way of
figuring out what the tags are.”
In contrast, Nexalogy Environics
generates an ontology for each data
set, which makes it possible to capture
meaning missed by techniques that
are looking just for previously defined
terms. “That’s why our stuff is not
quite real time,” he says, “because
the amount of number crunching
you have to do is huge and there’s no
human intervention whatsoever.” (For
an example of Nexalogy’s approach,
see the article, “The third wave of
customer analytics,” on page 06.)
Predictive analysis and early warning
Predictive analysis can take many forms,
and NLP can be involved, or it might not
be. Predictive modeling and statistical
analysis can be used effectively without
the help of NLP to analyze a social
network and find and target influencers
in specific areas. Before he came to
PwC, Mark Paich, a director in the
firm’s advisory service, did some agent-based modeling3 for a Los Angeles–based manufacturer that hoped to change public attitudes about its products. “We had data on what products people had from the competitors and what products people had from this particular firm. And we also had some survey data about attitudes that people had toward the product. We were able to say something about what type of people, according to demographic characteristics, had different attitudes.”

3 Agent-based modeling is a means of understanding the behavior of a system by simulating the behavior of individual actors, or agents, within that system. For more on agent-based modeling, see the article “Embracing unpredictability” and the interview with Mark Paich, “Using simulation tools for strategic decision making,” in Technology Forecast 2010, Issue 1, http://www.pwc.com/us/en/technology-forecast/winter2010/index.jhtml, accessed February 14, 2012.
Paich’s agent-based modeling
effort matched attitudes with the
manufacturer’s product types. “We
calibrated the model on the basis of
some fairly detailed geographic data
to get a sense as to whose purchases
influenced whose purchases,” Paich
says. “We didn’t have direct data that
said, ‘I influence you.’ We made some
assumptions about what the network
would look like, based on studies of
who talks to whom. Birds of a feather
flock together, so people in the same age
groups who have other things in common
tend to talk to each other. We got a
decent approximation of what a network
might look like, and then we were
able to do some statistical analysis.”
That statistical analysis helped with
the influencer targeting. According
to Paich, “It said that if you want to
sell more of this product, here are the
key neighborhoods. We identified
the key neighborhood census tracts
you want to target to best exploit
the social network effect.”
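For readers unfamiliar with the technique, here is a deliberately tiny agent-based sketch of word-of-mouth influence on a made-up network; the model Paich describes was calibrated on real demographic, survey, and geographic data.

import random

random.seed(7)
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}  # "birds of a feather" links
attitude = {0: 0.9, 1: 0.2, 2: 0.4, 3: 0.1}               # favorability toward the product

for _ in range(20):
    agent = random.choice(list(neighbors))
    peer = random.choice(neighbors[agent])
    # Each interaction nudges an agent part of the way toward a neighbor's attitude.
    attitude[agent] += 0.3 * (attitude[peer] - attitude[agent])

print({a: round(v, 2) for a, v in attitude.items()})  # attitudes drift toward connected peers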
Nearly all SMI products provide
some form of timeline analysis of
social media traffic with historical
analysis and trending predictions.
Predictive modeling is helpful when
the level of specificity needed is high
(as in the Los Angeles manufacturer’s
example), and it’s essential when
the cost of a wrong decision is high.4
But in other cases, less formal social
media intelligence collection and
analysis are often sufficient. When it
comes to brand awareness, NLP can
help provide context surrounding
a spike in social media traffic about
a brand or a competitor’s brand.
Sentiment analysis
Even when overall social media traffic
is within expected norms or predicted
trends, the difference between positive,
neutral, and negative sentiment can
stand out. Sentiment analysis can
suggest whether a brand, customer
support, or a service is better or
worse than normal. Correlating
sentiment to recent changes in
product assembly, for example,
could provide essential feedback.
That spike could be a key data point
to initiate further action or research
to remediate a problem before it gets
worse or to take advantage of a market
opportunity before a competitor does.
(See the article, “The third wave of
customer analytics,” on page 06.)
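A bare-bones lexicon scorer, sketched below with made-up word lists, illustrates the purely statistical end of the spectrum; production sentiment analysis also has to weigh negation, sarcasm, and context, which is where NLP earns its keep.

POSITIVE = {"love", "great", "fast", "reliable"}
NEGATIVE = {"broken", "slow", "terrible", "refund"}

def sentiment(post):
    # Positive minus negative word counts; crude, but enough to flag a spike.
    words = [w.strip(".,!?") for w in post.lower().split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

posts = ["Love the new firmware, so fast", "Screen arrived broken, want a refund"]
print([(p, sentiment(p)) for p in posts])  # one positive post, one negative post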
Because social media is typically faster
than other data sources in delivering
early indications, it’s becoming a
preferred means of identifying trends.
Most customer sentiment analysis
today is conducted only with statistical
analysis. Government intelligence
agencies have led with more advanced
methods that include semantic analysis.
In the US intelligence community,
media intelligence generally provides
early indications of events important
to US interests, such as assessing
the impact of terrorist activities on
voting in countries the United States
is aiding, or mining social media for
early indications of a disease outbreak.
In these examples, social media
prove to be one of the fastest, most
accurate sources for this analysis.
4 For more information on best practices for the use of
predictive analytics, see Putting predictive analytics to
work, PwC white paper, January 2012, http://
www.pwc.com/us/en/increasing-it-effectiveness/
publications/predictive-analytics-to-work.jhtml,
accessed February 14, 2012.
Many companies mine social media to
determine who the key influencers are
for a particular product. But mining
the context of the conversations via
interest graph analysis is important.
“As Clay Shirky pointed out in 2003,
influence is only influential within
a context,” Théoret says.
Table 1: A few NLP best practices

Strategy: Mine the aggregated data.
Description: Many tools monitor individual accounts. Clearly enterprises need more than individual account monitoring.
Benefits: Scalability and efficiency of the mining effort are essential.

Strategy: Segment the interest graph in a meaningful way.
Description: Regional segmentation, for instance, is important because of differences in social media adoption by country.
Benefits: Orkut is larger than Facebook in Brazil, for instance, and Qzone is larger in China. Global companies need global social graph data.

Strategy: Conduct deep parsing.
Description: Deep parsing takes advantage of a range of NLP extraction techniques rather than just one.
Benefits: Multiple extractors that use the best approaches in individual areas—such as verb analysis, sentiment analysis, named entity recognition, language services, and so forth—provide better results than the all-in-one approach.

Strategy: Align internal models to the social model.
Description: After mining the data for social graph clues, the implicit model that results should be aligned to the models used for other data sources.
Benefits: With aligned customer models, enterprises can correlate social media insights with logistics problems and shipment delays, for example. Social media serves in this way as an early warning or feedback mechanism.

Strategy: Take advantage of alternatives to mainstream NLP.
Description: Approaches outside the mainstream can augment mainstream tools.
Benefits: Tools that take a bottom-up approach and surface more flexible ontologies, for example, can reveal insights other tools miss.
NLP-related best practices
After considering the breadth of NLP,
one key takeaway is to make effective
use of a blend of methods. Too simple
an approach can’t eliminate noise
sufficiently or help users get to answers
that are available. Too complicated an
approach can filter out information
that companies really need to have.
Some tools classify many different
relevant contexts. ListenLogic, for
example, combines lexical, semantic,
and statistical analysis, as well as models
the company has developed to establish
specific industry context. “Our models
are built on seeds from analysts with
years of experience in each industry.
We can put in the word ‘Escort’ or
‘Suburban,’ and then behind that put
a car brand such as ‘Ford’ or ‘Chevy,’”
says Vince Schiavone, co-founder and
executive chairman of ListenLogic.
“The models combined could be strings
of 250 filters of various types.” The
models fall into five categories:
• Direct concept filtering—Filtering
based on the language of social media
• Ontological—Models describing
specific clients and their product lines
• Action—Activity associated
with buyers of those products
• Persona—Classes of social
media users who are posting
• Topic—Discovery algorithms for
new topics and topic focusing
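A stylized sketch of that layered-filter idea, with hypothetical word lists that bear no relation to ListenLogic's proprietary models, might chain simple predicates so that only posts passing every layer survive:

posts = [
    "Test drove the new Ford Escort today, loving it",
    "My escort to the gala wore blue",
    "Thinking about trading in my Chevy Suburban",
]

concept = lambda p: any(m in p.lower() for m in ("escort", "suburban"))         # direct concept filter
ontology = lambda p: any(b in p.lower() for b in ("ford", "chevy"))             # client/product ontology
action = lambda p: any(w in p.lower() for w in ("drove", "trading", "bought"))  # buyer activity

filters = [concept, ontology, action]  # real chains may run to hundreds of filters
matches = [p for p in posts if all(f(p) for f in filters)]
print(matches)  # keeps the two automotive posts, drops the gala "escort"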
Other tools, including those from
Nexalogy Environics, take a bottom-up
approach, using a data set as it comes
and, with the help of several proprietary, universally applicable algorithms,
processing it with an eye toward categorization on the fly. Equally important,
Nexalogy’s analysts provide interpretations of the data that might not be evident
to customers using the same tool. Both
kinds of tools have strengths and weaknesses. Table 1 summarizes some of the
key best practices when collecting SMI.
Conclusion: A machine-assisted
and iterative process, rather
than just processing alone
Good analysis requires consideration
of a number of different clues and
quite a bit of back-and-forth. It’s not
a linear process. Some of that process
can be automated, and certainly it’s in
a company’s interest to push the level of
automation. But it’s also essential not to
put too much faith in a tool or assume
that some kind of automated service
will lead to insights that are truly game
changing. It’s much more likely that the
tool provides a way into some far more
extensive investigation, which could
lead to some helpful insights, which
then must be acted upon effectively.
One of the most promising aspects of
NLP adoption is the acknowledgment
that structuring the data is necessary to
help machines interpret it. Developers
have gone to great lengths to see how
much knowledge they can extract with
the help of statistical analysis methods,
and it still has legs. Search engine
companies, for example, have taken pure
statistical analysis to new levels, making
it possible to pair a commonly used
phrase in one language with a phrase
in another based on some observation
of how frequently those phrases are
used. So statistically based processing
is clearly useful. But it’s equally clear
from seeing so many opaque social
media analyses that it’s insufficient.
Structuring textual data, as with
numerical data, is important. Enterprises
cannot get to the web of data if the data
is not in an analysis-friendly form—a
database of sorts. But even when
something materializes resembling a
better described and structured web,
not everything in the text of a social
media conversation will be clear.
The hope is to glean useful clues and
starting points from which individuals
can begin their own explorations.
Perhaps one of the more telling
trends in social media is the rise of
online word-of-mouth marketing
and other similar approaches that
borrow from anthropology. So-called
social ethnographers are monitoring
how online business users behave,
and these ethnographers are using
NLP-based tools to land them in a
neighborhood of interest and help them
zoom in once there. The challenge is
how to create a new social science of
online media, one in which the tools
are integrated with the science.
An in-memory appliance
to explore graph data
YarcData’s uRiKA analytics appliance,1
announced at O’Reilly’s Strata data
science conference in March 2012, is
designed to analyze the relationships
between nodes in large graph data sets.
To accomplish this feat, the system
can take advantage of as much as
512TB of DRAM and 8,192 processors
with over a million active threads.
In-memory appliances like these allow
very large data sets to be stored and
analyzed in active or main memory,
avoiding memory swapping to disk
that introduces lots of latency. It’s
possible to load full business intelligence
(BI) suites, for example, into RAM to
speed up the response time as much
as 100 times. (See “What in-memory
technology does” on page 33 for more
information on in-memory appliances.)
With compression, it’s apparent that
analysts can query true big data (data
sets of greater than 1PB) directly in main
memory with appliances of this size.
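A quick back-of-the-envelope check of that claim, assuming a roughly 2:1 compression ratio (our assumption, not a figure published by YarcData):

petabyte_in_tb = 1024      # 1 PB expressed in terabytes
dram_tb = 512              # uRiKA's maximum DRAM cited above
assumed_compression = 2.0  # illustrative ratio only
print(petabyte_in_tb / assumed_compression <= dram_tb)  # True: ~1 PB can fit in memory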
Besides the sheer size of the system,
uRiKA differs from other appliances
because it’s designed to analyze graph
data (edges and nodes) that take the
form of subject-verb-object triples.
This kind of graph data can describe
relationships between people, places,
and things scalably. Flexible and richly
described data relationships constitute
an additional data dimension users can
mine, so it’s now possible, for example,
to query for patterns evident in the
graphs that aren’t evident otherwise,
whether unknown or purposely hidden.2
1 The YarcData uRiKA Graph Appliance: Big Data
Relationship Analytics, Cray white paper, http://www.
yarcdata.com/productbrief.html, March 2012, accessed
April 3, 2012.
But mining graph data, as YarcData
(a unit of Cray) explains, demands
a system that can process graphs
without relying on caching, because
mining graphs requires exploring
many alternative paths individually
with the help of millions of threads—
a very memory- and processor-intensive task. Putting the full graph
in a single random access memory
space makes it possible to query it and
retrieve results in a timely fashion.
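To make the subject-verb-object idea concrete, here is a toy, single-threaded triple store with wildcard pattern matching; uRiKA's contribution is doing this across billions of triples in parallel, in memory.

# A handful of hypothetical triples; real graphs hold billions.
triples = {
    ("alice", "follows", "bob"),
    ("bob", "follows", "carol"),
    ("carol", "works_at", "acme"),
}

def match(s=None, p=None, o=None):
    # Return every triple consistent with the pattern; None acts as a wildcard.
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Whom does bob follow, and where do they work?
for _, _, person in match(s="bob", p="follows"):
    print(person, match(s=person, p="works_at"))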
The first customers for uRiKA are
government agencies and medical
research institutes like the Mayo Clinic,
but it’s evident that social media
analytics developers and users would
also benefit from this kind of appliance.
Mining the social graph and the larger
interest graph (the relationships
between people, places, and things)
is just beginning.3 Claude Théoret
of Nexalogy Environics has pointed
out that crunching the relationships
between nodes at web scale hasn’t
previously been possible. Analyzing
the nodes themselves only goes so far.
2 Michael Feldman, “Cray Parlays Supercomputing Technology Into Big Data Appliance,” Datanami, March 2, 2012, http://www.datanami.com/datanami/2012-03-02/cray_parlays_supercomputing_technology_into_big_data_appliance.html, accessed April 3, 2012.

3 See “The collaboration paradox,” Technology Forecast 2011, Issue 3, http://www.pwc.com/us/en/technologyforecast/2011/issue3/features/feature-socialinformation-paradox.jhtml#, for more information on the interest graph.
The payoff from
interactive
visualization
Jock Mackinlay of Tableau Software
discusses how more of the workforce
has begun to use analytics tools.
Interview conducted by Alan Morrison
Jock Mackinlay
Jock Mackinlay is the director of visual
analysis at Tableau Software.
PwC: When did you come
to Tableau Software?
JM: I came to Tableau in 2004 out of
the research world. I spent a long time
at Xerox Palo Alto Research Center
working with some excellent people—
Stuart Card and George Robertson, who
are both recently retired. We worked
in the area of data visualization for a
long time. Before that, I was at Stanford
University and did a PhD in the same
area—data visualization. I received
a Technical Achievement Award for
that entire body of work from the
IEEE organization in 2009. I’m one
of the lucky few people who had the
opportunity to take his research out into
the world into a successful company.
PwC: Our readers might
appreciate some context on
the whole area of interactive
visualization. Is the innovation
in this case task automation?
JM: There’s a significant limit to
how we can automate. It’s extremely
difficult to understand what a person’s
task is and what’s going on in their
head. When I finished my dissertation,
I chose a mixture of automated
techniques plus giving humans a lot
of power over thinking with data.
And that’s the Tableau philosophy
too. We want to provide people with
good defaulting as best we can but
also make it easy for people to make
adjustments as their tasks change.
When users are in the middle of looking
at some data, they might change their
minds about what questions they’re
asking. They need to head toward
that new question on the fly. No
automated system is going to keep up
with the stream of human thought.
PwC: Humans often don’t know
themselves what question they’re
ultimately interested in.
JM: Yes, it’s an iterative exploration
process. You cannot know up front
what question a person may want to ask
today. No amount of pre-computation
or work by an IT department is
going to be able to anticipate all the
possible ways people might want to
work with data. So you need to have
a flexible, human-centered approach
to give people a maximal ability to
take advantage of data in their jobs.
PwC: What did your research
uncover that helps?
JM: Part of the innovation of the
dissertation at Stanford was that the
algebra enables a simple drag-and-drop interface that anyone can use.
They drag fields and place them in
rows and columns or whatnot. Their
actions actually specify an algebraic
expression that gets compiled into
a query database. But they don’t
need to know all that. They just need
to know that they suddenly get to
see their data in a visual form.
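A cartoon version of that idea (our own sketch, not Tableau's internal algebra) might map a row-and-column "shelf" specification to a database query:

# Hypothetical shelf specification: profit on rows, order date on columns.
spec = {"rows": ["profit"], "columns": ["order_date"], "aggregate": "SUM"}

def compile_to_sql(spec, table="sales"):
    measure = spec["rows"][0]
    dimension = spec["columns"][0]
    agg = spec["aggregate"]
    return (f"SELECT {dimension}, {agg}({measure}) AS {measure} "
            f"FROM {table} GROUP BY {dimension} ORDER BY {dimension}")

print(compile_to_sql(spec))
# SELECT order_date, SUM(profit) AS profit FROM sales GROUP BY order_date ORDER BY order_date

The point of the drag-and-drop layer is that users never see the query; the tool chooses a sensible default view for whatever comes back.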
PwC: One of the issues we run
into is that user interfaces are
often rather cryptic. Users must
be well versed in the tool from the
designer’s perspective. What have
you done to make it less cryptic,
to make what’s happening more
explicit, so that users don’t
present results that they think
are answering their questions
in some way but they’re not?
JM: The user experience in Tableau
is that you connect to your data and
you see the fields on the side. You can
drag out the fields and drop them
on row, column, color, size, and so
forth. And then the tool generates
the graphical views, so users can
see the data visualization. They’re
probably familiar with their data.
Most people are if they’re working
with data that they care about.
The graphical view by default codifies
the best practices for putting data in the
view. For example, if the user dragged
out a profit and date measure, because
it’s a date field, we would automatically
generate a line mark and give that user
a trend line view because that’s best
practice for profit varying over time.
If instead they dragged out product and
profit, we would give them a bar graph
view because that’s an appropriate
way to show that information. If they
selected a geographic field, they’ll
get a map view because that’s an
appropriate way to show geography.
We work hard to make it a rapid
exploration process, because not only
are tables and numbers difficult for
humans to process, but also because
a slow user experience will interrupt
cognition and users can’t answer the
questions. Instead, they’re spending
the time trying to make the tool work.
The whole idea is to make the tool
an extension of your hand. You don’t
think about the hammer. You just think
about the job of building a house.
PwC: Are there categories of
more structured data that would
lend themselves to this sort of
approach? Most of this data
presumably has been processed
to the point where it could be
fed into Tableau relatively
easily and then worked with
once it’s in the visual form.
JM: At a high level, that’s accurate.
One of the other key innovations of the
dissertation out of Stanford by Chris
Stolte and Pat Hanrahan was that they
built a system that could compile those
algebraic expressions into queries on
databases. So Tableau is good with any
information that you would find in a
database, both SQL databases and MDX
databases. Or, in other words, both
relational databases and cube databases.
But there is other data that doesn’t
necessarily fall into that form. It is just
data that’s sitting around in text files or
in spreadsheets and hasn’t quite got into
a database. Tableau can access that data
pretty well if it has a basic table structure
to it. A couple of releases ago, we
introduced what we call data blending.
A lot of people have lots of data in
lots of databases or tables. They
might be text files. They might be
Microsoft Access files. They might
be in SQL or Hyperion Essbase. But
whatever it is, their questions often
span across those tables of data.
Normally, the way to address that is to create a federated database that joins the tables together, which is a six-month or greater IT effort. It’s difficult to query across multiple data tables from multiple databases. Data blending is a way—in a lightweight drag-and-drop way—to bring in data from multiple sources.

Imagine you have a spreadsheet that you’re using to keep track of some information about your products, and you have your company-wide data mart that has a lot of additional information about those products. And you want to combine them. You can direct connect Tableau to the data mart and build a graphical view.

Then you can connect to your spreadsheet, and maybe you build a view about products. Or maybe you have your budget in your spreadsheet and you would like to compare the actuals to the budget you’re keeping in your spreadsheet. It’s a simple drag-and-drop operation or a simple calculation to do that.

So, you asked me this big question about structured to unstructured data.

PwC: That’s right.

JM: We have functionality that allows you to generate additional structure for data that you might have brought in. One of the features gives you the ability—in a lightweight way—to combine fields that are related to each other, which we call grouping. At a fundamental level, it’s a way you can build up a hierarchical structure out of a flat dimension easily by grouping fields together. We also have some lightweight support for those hierarchies.

We’ve also connected Tableau to Hadoop. Do you know about it?

PwC: We wrote about Hadoop in 2010. We did a full issue on it as a matter of fact.1

1 See “Making sense of Big Data,” Technology Forecast 2010, Issue 3, http://www.pwc.com/us/en/technologyforecast/2010/issue3/index.jhtml, for more information.

JM: We’re using a connector to Hadoop that Cloudera built that allows us to write SQL and then access data via the Hadoop architecture.

The key problem about it from a human performance point of view is that there’s high latency. It takes a long time for the programs to run and process the data. We’re interested in helping people answer their questions at the speed of their thought. And so latency is a killer.

In particular, whenever we do demos on stage, we like to look for real data sets. We found one from Kiva, the online micro-lending organization. Kiva published the huge XML file describing all of the organization’s lenders and all of the recipients of the loans. This is an XML file, so it’s not your normal structured data set. It’s also big, with multiple years and lots of details for each.
We processed that XML file in Hadoop and used our connector, which has string functions. We used those string functions to reach inside the XML and pull out what would be all the structured data about the lenders, their location, the amount, and the borrower right down to their photographs. And we built a graphical view in Tableau. We sliced and diced it first and then built some graphical views for the demo.

We used the connection to process the XML file and build a Tableau extract file. That file runs on top of our data engine, which is a high-performance columnar database system. Once we had the data in the Tableau extract format, it was drag and drop at human speed.
We’re heading down this vector, but
this is where we are right now in terms
of being able to process less-structured
information into a form that you could
then use Tableau on effectively.
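As a simplified stand-in for the string-function extraction Mackinlay describes, using Python's standard library and a tiny made-up loan record rather than Kiva's real schema, the step from raw XML to structured rows looks roughly like this:

import xml.etree.ElementTree as ET

xml = """<loans>
  <loan><lender>Jane</lender><country>Peru</country><amount>250</amount></loan>
  <loan><lender>Ravi</lender><country>Kenya</country><amount>400</amount></loan>
</loans>"""

rows = [
    {"lender": loan.findtext("lender"),
     "country": loan.findtext("country"),
     "amount": int(loan.findtext("amount"))}
    for loan in ET.fromstring(xml).iter("loan")
]
print(rows)  # structured rows, ready for a columnar extract and drag-and-drop analysis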
PwC: Interesting stuff. What
about in-memory databases
and how large they’re getting?
JM: Anytime there’s a technology that
can process data at fast rates, whether
it’s in-memory technology, columnar
databases, or what have you, we’re
excited. From its inception, Tableau
involved direct connecting to databases
and making it easy for anybody to be
able to work with it. We’re not just
about self-analytics; we’re also about
data storytelling. That can have as
much impact on the executive team
as directly being able themselves
to answer their own questions.
PwC: Is more of the workforce
doing the analysis now?
JM: I just spent a week at the
Tableau Customer Conference,
and the people that I meet are
extremely diverse. They’re not just
the hardcore analysts who know
about SPSS and R. They come from
all different sizes of companies
and nonprofits and on and on.
And the people at the customer conferences are pretty passionate. I think part of the passion is the realization that you can actually work with data. It doesn’t have to be this horribly arduous process. You can rapidly have a conversation with your data and answer your questions.

Inside Tableau, we use Tableau everywhere—from the receptionist who’s tracking utilization of all the conference rooms to the sales team that’s monitoring their pipeline. My major job at Tableau is on the team that does forward product direction. Part of that work is to make the product easier to use. I love that I have authentic users all over the company and I can ask them, “Would this feature help?”

So yes, I think the focus on the workforce is essential. The trend here is that data is being collected by our computers almost unmanned, no supervision necessary. It’s the process of utilizing that data that is the game changer. And the only way you’re going to do that is to put the data in the hands of the individuals inside your organization.
How CIOs can build
the foundation for a
data science culture
Helping to establish a new culture of
inquiry can be a way for these executives to
reclaim a leadership role in information.
By Bud Mathaisel and Galen Gruman
The new analytics requires that CIOs
and IT organizations find new ways to
engage with their business partners.
For all the strategic opportunities new
analytics offers the enterprise, it also
threatens the relevance of the CIO. The
threat comes from the fact that the CIO’s
business partners are being sold data
analytics services and software outside
normal IT procurement channels,
which cuts out of the process the very
experts who can add real value.
Perhaps the vendors’ user-centric view
is based on the premise that only users
in functional areas can understand
which data and conclusions from its
analysis are meaningful. Perhaps the
CIO and IT have not demonstrated
the value they can offer, or they have
dwelled too much on controlling
security or costs to the detriment of
showing the value IT can add. Or
perhaps only the user groups have the
funding to explore new analytics.
Whatever the reasons, CIOs must rise
above them and find ways to provide
important capabilities for new analytics
while enjoying the thrill of analytics
discovery, if only vicariously. The IT
organization can become the go-to
group, and the CIO can become the
true information leader. Although it is
a challenge, the new analytics is also
an opportunity because it is something
within the CIO’s scope of responsibility
more than nearly any other development
in information technology.
The new analytics needs to be treated
as a long-term collaboration between
IT and business partners—similar to
the relationship PwC has advocated1
for the general consumerization-of-IT
phenomenon invoked by mobility,
social media, and cloud services. This
tight collaboration can be a win for
the business and for the CIO. The
new analytics is a chance for the CIO
to shine, reclaim the “I” leadership
in CIO, and provide a solid footing
for a new culture of inquiry.
1 The consumerization of IT: The next-generation CIO,
PwC white paper, November 2011, http://
www.pwc.com/us/en/technology-innovation-center/
consumerization-information-technology-transformingcio-role.jhtml, accessed February 1, 2012.
The many ways for CIOs to
be new analytics leaders
In businesses that provide information
products or services—such as
healthcare, finance, and some utilities—
there is a clear added value from having
the CIO directly contribute to the use
of new analytics. Consider Edwards
Lifesciences, where hemodynamic
(blood circulation) modeling has
benefited from the convergence of
new data with new tools to which
the CIO contributes. New digitally
enabled medical devices, which are
capable of generating a continuous
flow of data, provide the opportunity
to measure, analyze, establish pattern
boundaries, and suggest diagnoses.
“In addition, a personal opportunity
arises because I get to present our
newest product, the EV1000, directly to
our customers alongside our business
team,” says Ashwin Rangan, CIO of
Edwards Lifesciences. Rangan leverages
his understanding of the underlying
technologies, and, as CIO, he helps
provision the necessary information
infrastructure. As CIO, he also has
credibility with customers when he
talks to them about the information
capabilities of Edwards’ products.
For CIOs whose businesses are not in
information products or services, there’s
still a reason to engage in the new
analytics beyond the traditional areas
of enablement and of governance, risk,
and compliance (GRC). That reason
is to establish long-term relationships
with the business partners. In this
partnership, the business users decide
which analytics are meaningful,
and the IT professionals consult
with them on the methods involved,
including provisioning the data and
tools. These CIOs may be less visible
outside the enterprise, but they
have a crucial role to play internally
to jointly explore opportunities for
analytics that yield useful results.
E. & J. Gallo Winery takes this
approach. Its senior management
understood the need for detailed
customer analytics. “IT has partnered
successfully with Gallo’s marketing,
sales, R&D, and distribution to
leverage the capabilities of information
from multiple sources. IT is not the
focus of the analytics; the business
is,” says Kent Kushar, Gallo’s CIO.
“After working together with the
business partners for years, Gallo’s
IT recently reinvested heavily in
updated infrastructure and began
to coalesce unstructured data with
the traditional structured consumer
data.” (See “How the E. & J. Gallo
Winery matches outbound shipments
to retail customers” on page 11.)
Regardless of the CIO’s relationship
with the business, many technical
investments IT makes are the
foundation for new analytics. A CIO
can often leverage this traditional
role to lead new analytics from
behind the scenes. But doing even
that—rather than leading from the
front as an advocate for business-valuable analytics—demands new
skills, new data architectures,
and new tools from IT.
At Ingram Micro, a technology
distributor, CIO Mario Leone views
a well-integrated IT architecture as
a critical service to business partners
to support the company’s diverse and
dynamic sales model and what Ingram
Micro calls the “frontier” analysis
of distribution logistics. “IT designs
the modular and scalable backplane
architecture to deliver real-time and
relevant analytics,” he says. On one
side of the backplane are multiple
data sources, primarily delivered
through partner interactions; on the
flip side of the backplane are analytics
tools and capabilities, including such
new features as pattern recognition, optimization, and visualization. Taken together, the flow of multiple data streams from different points and advanced tools for business users can permit more sophisticated and iterative analyses that give greater insight into product mix offerings, changing customer buying patterns, and electronic channel delivery preferences. The backplane is a convergence point of those data into a coherent repository. (See Figure 1.)

Figure 1: A CIO’s situationally specific roles. The backplane links inputs (multiple data sources) to outputs such as marketing, sales, distribution, and research and development. CIO #1 focuses on the inputs when production innovation, for example, is at a premium; CIO #2 focuses on the outputs when sales or marketing, for example, is the major concern.
Given these multiple ways for CIOs to engage in the new analytics—and the self-interest for doing so—the next issue is how to do it. After interviewing leading CIOs and other industry experts, PwC offers the following recommendations.

Enable the data scientist

One course of action is to strategically plan and provision the data and infrastructure for the new sources of data and new tools (discussed in the next section). However, the bigger challenge is to invoke the productive capability of the users. This challenge poses several questions:
• Analytics capabilities have been
pursued for a long time, but several
hurdles have hindered the attainment
of the goal (such as difficult-to-use
tools, limited data, and too much
dependence on IT professionals).
CIOs must ask: which of these
impediments are eased by the new
capabilities and which remain?
• How can CIOs do this without
knowing in advance which users
will harvest the capabilities?
• As analytics moves more broadly
through the organization, there may
be too few people trained to analyze
and present data-driven conclusions.
Who will be fastest up the learning
curve of what to analyze, of how
to obtain and process data, and of
how to discover useful insights?
What the enterprise needs is the data
scientist—actually, several of them.
A data scientist follows a scientific
method of iterative and recursive
analysis, with a practical result in
mind. Examples are easy to identify:
an outcome that improves revenue,
profitability, operations or supply chain
efficiency, R&D, financing, business
strategy, the use of human capital,
and so forth. There is no sure way of
knowing in advance where or when
this insight will arrive, so it cannot
be tackled in assembly line fashion
with predetermined outcomes.
The analytic approach involves trial and
error and accepts that there will be dead
ends, although a data scientist can even
draw a useful conclusion—“this doesn’t
work”—from a dead end. Even without
formal training, some business users
have the suitable skills, experience,
and mind-set. Others need to be
trained and encouraged to think like a
scientist but behave like a—choose the
function—financial analyst, marketer,
sales analyst, operations quality
analyst, or whatever. When it comes
to repurposing parts of the workforce,
it’s important to anticipate obstacles or
frequent objections and consider ways
to overcome them. (See Table 1.)
Josée Latendresse of Latendresse
Groupe Conseil says one of her clients,
an apparel manufacturer based in
Quebec, has been hiring PhDs to serve
in this function. “They were able to
know the factors and get very, very fine
analysis of the information,” she says.
Gallo has tasked statisticians in IT, R&D,
sales, and supply chain to determine
what information to analyze, the
questions to ask, the hypotheses to test,
and where to go after that, Kushar says.
The CIO has the opportunity to help
identify the skills needed and then
help train and support data scientists,
who may not reside in IT. CIOs should
work with the leaders of each business
function to answer the questions:
Where would information insights pay
the highest dividends? Who are the
likely candidates in their functions to
be given access to these capabilities,
as well as the training and support?
Many can gain or sharpen analytic skills.
The CIO is in the best position to ensure
that the skills are developed and honed.
The CIO must first provision the
tools and data, but the data analytics
requires the CIO and IT team to
assume more responsibility for the
effectiveness of the resources than in
the past. Kushar says Gallo has a team
within IT dedicated to managing and
proliferating business intelligence
tools, training, and processes.
When major systems were deployed
in the past, CIOs did their best to train
users and support them, but CIOs
only indirectly took responsibility
for the users’ effectiveness. In data
analytics, the responsibility is more
directly correlated: the investments
are not worth making unless IT steps
up to enhance the users’ performance.
Training should be comprehensive
and go beyond teaching the tools to
helping users establish an hypothesis,
iteratively discover and look for insights
from results that don’t match the
hypothesis, understand the limitations
of the data, and share the results with
others (crowdsourcing, for example)
who may see things the user does not.
Table 1: Barriers to adoption of analytics and ways to address them

Barrier: Too difficult to use
Solution: Ensure the tool and data are user friendly; use published application programming interfaces (APIs) against data warehouses; seed user groups with analytics-trained staff; offer frequent training broadly; establish an analytics help desk.

Barrier: Refusal to accept facts and resulting analysis, thereby discounting analytics
Solution: Require a 360-degree perspective and pay attention to dissenters; establish a culture of fact finding, inquiry, and learning.

Barrier: Lack of analytics incentives and performance review criteria
Solution: Make contributions to insights from analytics an explicit part of performance reviews; recognize and reward those who creatively use analytics.
Training should encompass multiple
tools, since part of what enables
discovery is the proper pairing of tool,
person, and problem; these pairings
vary from problem to problem and
person to person. You want a toolset to
handle a range of analytics, not a single
tool that works only in limited domains
and for specific modes of thinking.
The CIO could also establish and
reinforce a culture of information
inquiry by getting involved in data
analysis trials. This involvement lends
direct and moral support to some
of the most important people in the
organization. For CIOs, the bottom line
is to care for the infrastructure but focus
more on the actual use of information
services. Advanced analytics is adding
insight and power to those services.
Renew the IT infrastructure
for the new analytics
As with all IT investments, CIOs
are accountable for the payback
from analytics. For decades, much
time and money has been spent on
data architectures; identification of
“interesting” data; collecting, filtering,
storing, archiving, securing, processing,
and reporting data; training users;
and the associated software and
hardware in pursuit of the unique
insights that would translate to
improved marketing, increased sales,
improved customer relationships, and
more effective business operations.
Because most enterprises have been
frustrated by the lack of clear payoffs
from large investments in data analysis,
they may be tempted to treat the
new analytics as not really new. This
would be a mistake. As with most
developments in IT, there is something
old, something new, something
borrowed, and possibly something blue
in the new analytics. Not everything is
new, but that doesn’t justify treating
the new analytics as more of the
same. In fact, doing so indicates that
your adoption of the new analytics is
merely applying new tools and perhaps
personnel to your existing activities.
It’s not the tool per se that solves
problems or finds insights—it’s the
people who are able to explore openly
and freely and to think outside the box,
aided by various tools. So don’t just
re-create or refurbish the existing box.
Even if the CIO is skeptical and believes
analytics is in a major hype cycle,
there is still reason to engage. At the
very least, the new analytics extends
IT’s prior initiatives; for example,
the new analytics makes possible
the kind of analytics your company
has needed for decades to enhance
business decisions, such as complex,
real-time events management, or it
makes possible new, disruptive business
opportunities, such as the on-location
promotion of sales to mobile shoppers.
Given limited resources, a portfolio
approach is warranted. The portfolio
should encompass many groups in the
enterprise and the many functions they
perform. It also should encompass the
convergence of multiple data sources
and multiple tools. If you follow
Ingram Micro’s backplane approach,
you get the data convergence side of
the backplane from the combination
of traditional information sources
with new data sources. Traditional
information sources include structured
transaction data from enterprise
resource planning (ERP) and customer
relationship management (CRM)
systems; new data sources include
textual information from social media,
clickstream transactions, web logs,
radio frequency identification (RFID)
sensors, and other forms of unstructured
and/or disparate information.
The analytics tools side of the backplane
arises from the broad availability of
new tools and infrastructure, such as
mobile devices; improved in-memory
systems; better user interfaces
for search; significantly improved
visualization technologies; improved
pattern recognition, optimization, and
analytics software; and the use of the
cloud for storing and processing. (See
the article, “The art and science of new
analytics technology,” on page 30.)
Understanding what remains the
same and what is new is a key to
profiting from the new analytics.
Even for what remains the same,
additional investments are required.
Develop the new analytics
strategic plan
As always, the CIO should start with
a strategic plan. Gallo’s Kushar refers
to the data analytics-specific plan as
a strategic plan for the “enterprise
information fabric,” a reference to all
the crossover threads that form an
identifiable pattern. An important
component of this fabric is the
identification of the uses and users that
have the highest potential for payback.
Places to look for such payback include
areas where the company has struggled,
where traditional or nontraditional
competition is making inroads, and
where the data has not been available
or granular enough until now.
The strategic plan must include the
data scientist talent required and the
technologies in which investments
need to be made, such as hardware and
software, user tools, structured and
unstructured data sources, reporting
and visualization capabilities, and
higher-capacity networks for moving
larger volumes of data. The strategic
planning process brings several benefits:
it updates IT’s knowledge of emerging
capabilities as well as traditional
and new vendors, and it indirectly
informs prospective vendors that the
CIO and IT are not to be bypassed.
Once the vendor channels are known
to be open, the vendors will come.
Criteria for selecting tools may vary
by organization, but the fundamentals
are the same. Tools must efficiently
handle larger volumes within acceptable
response times, be friendly to users and
IT support teams, be sound technically,
meet security standards, and be affordable.
You want a toolset that handles a range
of analytics, not a single tool that works
only in limited domains and for specific
modes of thinking.
The new appliances and tools could each
cost several million dollars, and millions
more to support. The good news is that
some of the tools and infrastructure can
be rented through the cloud and then
tested until the concepts and super-users
have demonstrated their
potential. (See the interview with Mike
Driscoll on page 20.) “All of this doesn’t
have to be done in-house with expensive
computing platforms,” says Edwards’
Rangan. “You can throw it in the cloud
… without investing in tremendous
capital-intensive equipment.”
With an approved strategy, CIOs
can begin to update the IT internal
capabilities. At a minimum, IT must
first provision the new data, tools, and
infrastructure, and then ensure the IT
team is up to speed on the new tools
and capabilities. Gallo’s IT organization,
for example, recently reinvested
heavily in new appliances; system
architecture; extract, transform, and
load (ETL) tools; and ways in which
SQL calls were written, and then began
to coalesce unstructured data with the
traditional structured consumer data.
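As a loose illustration of what that coalescing can look like in code, the minimal sketch below joins invented customer records with invented social media comments and scores the text with a crude keyword rule; the field names, data, and scoring are illustrative assumptions, not Gallo's actual tooling.

```python
# Illustrative only: hypothetical field names and a crude keyword score,
# not Gallo's actual pipeline. Requires the open source pandas library.
import re
import pandas as pd

# Structured consumer data, as it might come from an ERP/CRM extract.
transactions = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "region": ["West", "East", "West"],
    "last_purchase_usd": [54.20, 18.75, 230.00],
})

# Unstructured text, as it might arrive from a social media feed.
comments = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "comment": [
        "Love the new vintage, great value",
        "Shipping was slow and the bottle arrived damaged",
        "Great gift, will definitely buy again",
    ],
})

POSITIVE = {"love", "great", "value", "again"}
NEGATIVE = {"slow", "damaged", "never"}

def naive_sentiment(text):
    """Crude keyword score standing in for a real text analytics step."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return len(words & POSITIVE) - len(words & NEGATIVE)

comments["sentiment"] = comments["comment"].apply(naive_sentiment)

# Coalesce the two sources on a shared key so spend and sentiment
# can be analyzed together.
combined = transactions.merge(
    comments[["customer_id", "sentiment"]], on="customer_id", how="left")
print(combined)
```

The point of the sketch is the join itself: once unstructured text has been reduced to even a rough score, it can sit next to transaction history in the same analytic view.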
Provision data, tools,
and infrastructure
The talent, toolset, and infrastructure
are prerequisites for data analytics.
In the new analytics, CIOs and their
business partners are changing
or extending the following:
• Data sources to include the traditional
enterprise structured information
in core systems such as ERP, CRM,
manufacturing execution systems,
and supply chain, plus newer sources
such as syndicated data (point of sale,
Nielsen, and so on) and unstructured
data from social media and other
sources—all without compromising
the integrity of the production
systems or their data and while
managing data archives efficiently.
• Appliances to include faster
processing and better in-memory
caching. In-memory caching
improves cycle time significantly,
enabling information insights to
follow human thought patterns
closer to their native speeds.
• Software to include newer
data management, analysis,
reporting, and visualization
tools—likely multiple tools, each
tuned to a specific capability.
• Data architectures and flexible
metadata to accommodate multiple
streams of multiple types of data
stored in multiple databases. In
this environment, a single database
architecture is unworkable.
• A cloud computing strategy that
factors in the requirements of newly
expanded analytics capability and how
best to tap external as well as internal
resources. Service-level expectations
should be established for customers
to ensure that these expanded
sources of relevant data are always
online and available in real time.
The adoption of new analytics is
an opportunity for IT to augment
or update the business’s current
capabilities. According to Kushar,
Gallo IT’s latest investments are
extensions of what Gallo wanted to
do 25 years ago but could not due to
limited availability of data and tools.
Of course, each change requires a new
response from IT, and each raises the
perpetual dilemma of how to be selective
with investments (to conserve funds)
while being as broad and heterogeneous
as possible so a larger population
can create analytic insights, which
could come from almost anywhere.
Update IT capabilities:
Leverage the cloud’s capacity
With a strategic plan in place and the
tools provisioned, the next prerequisite
is to ensure that the IT organization is
ready to perform its new or extended
job. One part of this preparation
is the research on tools the team
needs to undertake with vendors,
consultancies, and researchers.
The CIO should consider some
organizational investments to add
to the core human resources in IT,
because once the business users
get traction, IT must be prepared
to meet the increased demands for
technical support. IT will need new
skills and capabilities that include:
• Broader access to all relevant
types of data, including data from
transaction systems and new sources
• Broader use of nontraditional
resources, such as big data
analytics services
• Possible creation of specialized
databases and data warehouses
• Competence in new tools and
techniques, such as database
appliances, column and row
databases, compression techniques,
and NoSQL frameworks
• Support in the use of tools for
reporting and visualization
• Updated approaches for mobile
access to data and analytic results
• New rules and approaches
to data security
• Expanded help desk services
Without a parallel investment in
IT skills, investments in tools and
infrastructure could lie fallow, causing
frustrated users to seek outside help.
For example, without advanced
compression and processing techniques,
performance becomes a significant
problem as databases grow larger and
more varied. That’s an IT challenge
that users would not anticipate, but
it could result in a poor experience
that leads them to third parties that
have solved the issue (even if the users
never knew what the issue was).
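To make the compression point concrete, the sketch below uses fabricated data and no particular database product; it shows how a repetitive, low-cardinality column shrinks under run-length encoding and general-purpose compression.

```python
# Illustrative only: fabricated data, not benchmarks from any real system.
import zlib
from itertools import groupby

# A low-cardinality column such as "region", repeated across a million rows,
# is typical of the data that grows fastest in analytic stores.
column = (["West"] * 400_000) + (["East"] * 350_000) + (["Central"] * 250_000)

raw_bytes = ",".join(column).encode("utf-8")

# Run-length encoding: store each value once with its repeat count.
rle = [(value, sum(1 for _ in run)) for value, run in groupby(column)]
rle_bytes = ",".join(f"{v}:{n}" for v, n in rle).encode("utf-8")

# General-purpose compression over the raw representation.
zlib_bytes = zlib.compress(raw_bytes)

print(f"raw:  {len(raw_bytes):>10,} bytes")
print(f"RLE:  {len(rle_bytes):>10,} bytes")
print(f"zlib: {len(zlib_bytes):>10,} bytes")
```

Fewer bytes on disk means fewer bytes scanned per query, which is a large part of why columnar stores and database appliances keep response times acceptable as volumes grow.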
Most of the IT staff will welcome
the opportunities to learn new tools
and help support new capabilities,
even if the first reaction might be
to fret over any extra work. CIOs
must lead this evolution by being
a source for innovation and trends
in analytics, encouraging adoption,
having the courage to make the
investments, demonstrating trust in
IT teams and users, and ensuring that
execution matches the strategy.
Conclusion
Data analytics is no longer an obscure
science for specialists in the ivory tower.
Increasingly, more analytics power is
available to more people. Thanks to
these new analytics, business users have
been unchained from prior restrictions,
and finding answers is easier, faster,
and less costly. Developing insightful,
actionable analytics is a necessary skill
for every knowledge worker, researcher,
consumer, teacher, and student. It is
driven by a world in which faster insight
is treasured, and it often needs to be real
time to be most effective. Real-time data
that changes quickly invokes a quest
for real-time analytic insights and is not
tolerant of insights from last quarter, last
month, last week, or even yesterday.
Enabling the productive use of
information tools is not a new obligation
for the CIO, but the new analytics
extends that obligation—in some
cases, hyperextends it. Fulfilling that
obligation requires the CIO to partner
with human resources, sales, and
other functional groups to establish
the analytics credentials for knowledge
workers and to take responsibility
for their success. The CIO becomes
a teacher and role model for the
increasing number of data engineers,
both the formal and informal ones.
Certainly, IT must do its part to plan
and provision the raw enabling
capabilities and handle governance,
risk, and compliance (GRC), but
more than ever, data analytics is the
opportunity for the CIO to move out
of the data center and into the front
office. It is the chance for the CIO to
demonstrate information leadership.
How visualization
and clinical decision
support can improve
patient care
Ashwin Rangan details what’s different about
hemodynamic monitoring methods these days.
Interview conducted by Bud Mathaisel and Alan Morrison
Ashwin Rangan
Ashwin Rangan is the CIO of Edwards
Lifesciences, a medical device company.
PwC: What are Edwards
Lifesciences’ main business
intelligence concerns given its role
as a medical device company?
AR: There’s the traditional application
of BI [business intelligence], and
then there’s the instrumentation
part of our business that serves many
different clinicians in the OR and
ICU. We make a hemodynamic [blood
circulation and cardiac function]
monitoring platform that is able to
communicate valuable information
and hemodynamic parameters
to the clinician using a variety of
visualization tools and a rich graphical
user interface. The clinician can use
this information to make treatment
decisions for his or her patients.
PwC: You’ve said that the form
in which the device provides
information adds value for the
clinician or guides the clinician.
What does the monitoring
equipment do in this case?
AR: The EV1000 Clinical Platform
provides information in a more
meaningful way, intended to better
inform the treating clinician and lead to
earlier and better diagnosis and care. In
the critical care setting, the earlier the
clinician can identify an issue, the more
choices the clinician has when treating
the patient. The instrument’s intuitive
screens and physiologic displays are also
ideal for teaching, presenting the various
hemodynamic parameters in the context
of each other. Ultimately, the screens are
intended to offer a more comprehensive
view of the patient’s status in a very
intuitive, user-friendly format.
PwC: How does this approach
compare with the way the
monitoring was done before?
AR: Traditional monitoring historically
presented physiologic information, in
this case hemodynamic parameters, in
the form of a number and in some cases
a trend line. When a parameter would
fall out of the defined target zones,
the clinician would be alerted with an
alarm and would be left to determine
the best course of action based upon
the displayed number or a line.
Comparatively, the EV1000 clinical
platform has the ability to show
physiologic animations and physiologic
decision trees to better inform
and guide the treating clinician,
whether it is a physician or nurse.
PwC: How did the physician
view the information before?
AR: It has been traditional in movies,
for example, to see a patient surrounded
by devices that displayed parameters,
all of which looked like numbers
and jagged lines on a timescale. In
our view, given where we currently
are with the development of our
technology, this is considered more
basic hemodynamic monitoring.
In our experience, the “new-school”
hemodynamic monitoring is a device
that presents the dynamics of the
circulatory system, the dampness of
the lungs, and the cardiac output in real
time in an intuitive display. The only
lag time between what’s happening in
the patient and what’s being reflected
on the monitor is the time between the
analog body and the digital rendering.
PwC: Why is visualization
important to this process?
AR: Before, we tended to want to tell
doctors and nurses to think like engineers
when we constructed these monitors.
Now, we’ve taken inspiration from the
glass display in Minority Report [a 2002
science-fiction movie] and influenced
the design of the EV1000 clinical
platform screens. The EV1000 clinical
platform is unlike any other monitoring
tool because you have the ability to
customize display screens to present
parameters, color codes, time frames
and more according to specific patient
needs and/or clinician preferences, truly
offering the clinician what they need,
when they need it and how they need it.
We are no longer asking clinicians to
translate the next step in their heads. The
goal now is to have the engineer reflect
the data and articulate it in a contextual
and intuitive language for the clinician.
The clinician is already under pressure,
caring for critically ill patients; our goal
is to alleviate unnecessary pressure
and provide not just information but
also guidance, enabling the clinician
to more immediately navigate to
the best therapy decisions.
PwC: Looking toward the
next couple of years and some
of the emerging technical
capability, what do you
think is most promising?
AR: Visualization technologies. The
human ability to discern patterns is not
changing. That gap can only be bridged
by rendering technologies that are visual
in nature. And the visualization varies
depending on the kind of statistics that
people are looking to understand.
Figure 1: Edwards Lifesciences EV1000
wireless monitor. Patton Design helped
develop this monitor, which displays a
range of blood-circulation parameters
very simply. Source: Patton Design, 2012
I think we need to look at this
more broadly and not just print
bar graphs or pie graphs. What is
the visualization that can really be
contextually applicable with different
applications? How do you make it
easier? And more quickly understood?
To have a deeper conversation about
this subject, please contact:
Tom DeGarmo
US Technology Consulting Leader
+1 (267) 330 2658
[email protected]
Bill Abbott
Principal, Applied Analytics
+1 (312) 298 6889
[email protected]
Bo Parker
Managing Director
Center for Technology & Innovation
+1 (408) 817 5733
[email protected]
Oliver Halter
Principal, Applied Analytics
+1 (312) 298 6886
[email protected]
Robert Scott
Global Consulting Technology Leader
+1 (416) 815 5221
[email protected]
Comments or requests?
Please visit www.pwc.com/techforecast or send
e-mail to [email protected]
This publication is printed on McCoy Silk. It is a Forest Stewardship Council™ (FSC®) certified stock
containing 10% postconsumer waste (PCW) fiber and manufactured with 100% certified renewable energy.
By using postconsumer recycled fiber in lieu of virgin fiber:
6 trees were preserved for the future
16 lbs of waterborne waste were not created
2,426 gallons of wastewater flow were saved
268 lbs of solid waste were not generated
529 lbs net of greenhouse gases were prevented
4,046,000 BTUs of energy were not consumed
Photography
Catherine Hall: Cover, pages 06, 20
Getty Images: pages 30, 44, 58
PwC (www.pwc.com) provides industry-focused assurance, tax and advisory services to build public trust and
enhance value for its clients and their stakeholders. More than 155,000 people in 153 countries across our
network share their thinking, experience and solutions to develop fresh perspectives and practical advice.
© 2012 PricewaterhouseCoopers LLP, a Delaware limited liability partnership. All rights reserved. PwC refers
to the US member firm, and may sometimes refer to the PwC network. Each member firm is a separate legal
entity. Please see www.pwc.com/structure for further details. This content is for general information purposes
only, and should not be used as a substitute for consultation with professional advisors. NY-12-0340
www.pwc.com/techforecast
Subtext
Culture of inquiry
A business environment focused on asking better
questions, getting better answers to those questions,
and using the results to inform continual improvement.
A culture of inquiry infuses the skills and capabilities
of data scientists into business units and compels a
collaborative effort to find answers to critical business
questions. It also engages the workforce at large—
whether or not the workforce is formally versed in data
analysis methods—in enterprise discovery efforts.
In-memory
A method of running entire databases in random
access memory (RAM) without direct reliance on disk
storage. In this scheme, large amounts of dynamic
random access memory (DRAM) constitute the
operational memory, and an indirect backup method
called write-behind caching is the only disk function.
Running databases or entire suites in memory speeds
up queries by eliminating the need to perform disk
writes and reads for immediate database operations.
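A rough sketch of the write-behind idea follows. It is a teaching toy, not how any particular in-memory product is implemented: reads and writes are served entirely from RAM, and a background timer lets the disk copy catch up later.

```python
# Minimal sketch of write-behind caching; real in-memory databases are far
# more sophisticated about durability, ordering, and failure recovery.
import json
import threading

class WriteBehindStore:
    def __init__(self, path, flush_interval=5.0):
        self._path = path
        self._data = {}          # the "database" lives entirely in RAM
        self._dirty = set()      # keys changed since the last flush
        self._lock = threading.Lock()
        self._interval = flush_interval
        self._schedule_flush()

    def put(self, key, value):
        with self._lock:
            self._data[key] = value
            self._dirty.add(key)          # no disk write on the hot path

    def get(self, key):
        with self._lock:
            return self._data.get(key)    # served from memory

    def flush(self):
        with self._lock:
            if self._dirty:
                with open(self._path, "w") as f:
                    json.dump(self._data, f)   # disk catches up later
                self._dirty.clear()

    def _schedule_flush(self):
        timer = threading.Timer(self._interval, self._tick)
        timer.daemon = True
        timer.start()

    def _tick(self):
        self.flush()
        self._schedule_flush()

store = WriteBehindStore("snapshot.json", flush_interval=2.0)
store.put("order:1001", {"sku": "ABC", "qty": 3})
print(store.get("order:1001"))   # read returns immediately from RAM
store.flush()                    # in a long-running process the timer does this
```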
Interactive
visualization
The blending of a graphical user interface for
data analysis with the presentation of the results,
which makes possible more iterative analysis
and broader use of the analytics tool.
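A bare-bones example of the idea, assuming the open source matplotlib and NumPy libraries and invented data: the window that presents the histogram also carries the slider that re-runs the analysis, so exploration and presentation live in one view.

```python
# Minimal sketch of interactive visualization: the control that drives the
# analysis lives in the same view as the presented result.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider

rng = np.random.default_rng(7)
values = rng.normal(100, 25, size=500)   # fabricated metric, e.g. order sizes

fig, ax = plt.subplots()
plt.subplots_adjust(bottom=0.25)         # leave room for the slider

def draw(threshold):
    ax.clear()
    ax.hist(values[values >= threshold], bins=30)
    ax.set_title(f"Orders at or above {threshold:.0f}")
    fig.canvas.draw_idle()

slider_ax = plt.axes([0.15, 0.1, 0.7, 0.04])
threshold_slider = Slider(slider_ax, "threshold", 0, 200, valinit=50)
threshold_slider.on_changed(draw)        # moving the slider redraws the chart

draw(50)
plt.show()
```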
Natural language
processing (NLP)
Methods of modeling and enabling machines to extract
meaning and context from human speech or writing,
with the goal of improving overall text analytics results.
The linguistics focus of NLP complements purely
statistical methods of text analytics that can range from
the very simple (such as pattern matching in word
counting functions) to the more sophisticated (pattern
recognition or “fuzzy” matching of various kinds).
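The contrast can be made concrete with a few lines of standard-library Python; the posts and spelling variants are invented, and the example stops far short of real linguistic processing.

```python
# Illustrative contrast between simple counting and "fuzzy" matching;
# real NLP adds linguistic structure (parsing, entities, sentiment, etc.).
import re
from collections import Counter
from difflib import get_close_matches

posts = [
    "Battery life on this tablet is amazing",
    "batery life could be better, screen is great",
    "Great screen, terrible batterylife",
]

# Simple end of the spectrum: pattern matching and word counting.
words = re.findall(r"[a-z]+", " ".join(posts).lower())
print(Counter(words).most_common(5))

# A step up: fuzzy matching catches near-misses of a term of interest.
for word in set(words):
    if get_close_matches(word, ["battery"], cutoff=0.8):
        print("variant of 'battery':", word)
```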