Our world is awash in data. Every action that takes place in a company, with customers, and in the marketplace creates it. When we buy products, interact with services, and collaborate with colleagues, information is captured. Each time we visit the doctor, drive our cars, get on an airplane, or take a photo, we produce more.
In the right hands, data can drive new insights and powerfully informed decisions. When combined with advances in artificial intelligence and machine learning, data can be transformational, and this makes it valuable.
Because data is so valuable, an entirely new set of disciplines has emerged to capture, store, and work with the modern deluge. Collectively grouped under the label of "Data Science," these disciplines and their practitioners are positioned to help use information in new, exciting, and impactful ways, such as:
- aiding doctors in the diagnosis of disease
- decreasing the time required to safely transition new medications to market
- streamlining the transportation system so we can move people more quickly and efficiently
- modeling disease outbreaks and helping to contain dangerous pathogens
Yet, for all of its importance and promise, Data Science remains poorly defined and widely misunderstood. In this article, we'll attempt to clear some of the fog. We will look at:
- how an increasing volume and diversity of data drives the need for Data Science capabilities
- what Data Science is and the steps required to build a data-driven organization so you can use its tools effectively
- the roles within the Data Science landscape and how they work together
- why organizations need Data Science in order to stay relevant and competitive
The Modern Information Flood
The modern world generates an enormous, ever-increasing amount of information from a huge variety of sources.
And There's More ...
Data comes not only from people and mobile devices. There is an entire universe of additional information produced by sensors and Internet of Things (IoT) devices. These include satellites, weather and scientific monitoring systems, automated transport systems such as drones and self-driving cars, and many other sources.
All of this data encodes detail about our world and the ways in which we interact with it. It represents an inexhaustible stream of potential insight and action, an analytic pot of gold.
What is Data Science?
This is where Data Science enters the picture. Data Science is the art of turning data into action. It's the set of tools and techniques we use to mine and refine crude information into insight and intelligence.
Data Scientists, Engineers, and Analysts produce data products, which are in turn used to support decision making. These products are intended to answer questions such as: "Where should I invest my ad dollars to increase profit? How can I improve compliance while reducing costs? How can I improve communication and collaboration to better build products?"
As these questions become better understood, the data products built to supply answers take many forms, such as:
- recommendation engines capable of suggesting movies, books, articles, or other types of content based on a person's interests and needs
- forecasting systems which attempt to predict the weather or fluctuations in the stock market and other financial markets
- machine learning models that are able to help locate and take action on variations in industrial processes
- diagnostic engines capable of helping to automate the diagnosis of disease, ensure that the highest-risk patients receive prompt treatment, and safeguard public health by monitoring disease outbreaks
- customer management systems which can improve a company's impact by helping to target advertising, improve interactions, and personalize products
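To make the idea of a data product a little more concrete, here is a minimal sketch of the first item in the list, a recommendation engine. It is a toy under stated assumptions: the catalog, its tags, and the function names `jaccard` and `recommend` are all hypothetical, and tag overlap is just one simple way to match content to a person's interests. Real systems typically draw on far richer signals.

```python
# Hypothetical catalog: titles mapped to descriptive tags (illustrative data only).
CATALOG = {
    "Space Odyssey": {"sci-fi", "space", "classic"},
    "Galaxy Wars": {"sci-fi", "space", "action"},
    "Baking Basics": {"cooking", "how-to"},
    "Quiet Garden": {"gardening", "how-to"},
}


def jaccard(a: set, b: set) -> float:
    """Fraction of tags shared by two sets (0.0 = none, 1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(interests: set, catalog: dict, top_n: int = 2) -> list:
    """Rank catalog titles by how well their tags overlap a person's interests."""
    scored = [(jaccard(interests, tags), title) for title, tags in catalog.items()]
    # Drop items with no overlap; sort by highest score, then alphabetically.
    relevant = [(score, title) for score, title in scored if score > 0]
    relevant.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in relevant[:top_n]]


print(recommend({"sci-fi", "space", "action"}, CATALOG))
# ['Galaxy Wars', 'Space Odyssey']
```

Even at this scale the shape of a data product is visible: raw information about content and people goes in, and a suggestion that someone can act on comes out.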
Almost anything can potentially become a data product, meaning there is a near-infinite number of use cases. As a result, Data Science teams have an enormous scope of responsibility. Additionally, because data products are built from many moving parts involving the entirety of the organization, they require investments in software, data infrastructure, operations, and teams.
No company becomes a "Data Science", "Machine Learning", or "Artificial Intelligence" company overnight. Strictly speaking, "Data Science" isn't really practiced by individuals.
Building Data Science Capabilities
The effective practice of Data Science is something built over time. It is a rich discipline at the border of software development, business and domain knowledge, operations and engineering, and inquiry. It requires you to collect data, structure and update it, seek understanding through exploration, and enrich the raw information before you are able to utilize it in machine learning and artificial intelligence applications. Because of this, building capability is about nurturing teams and cultures.
Most of the time, it's impossible to find "unicorns," those rare individuals who have skills across all of the required domains to guide a project through all the phases required for success. Instead, it's better to create blended teams that are able to work together. Broadly speaking, there are four roles within Data Science -- software developers, business analysts, data engineers, and data scientists -- all of which play an important part in a successful data-driven organization.
While each of the roles unlocks new capabilities for the others, you can achieve value at every level of the pyramid.
- Using software developers to create applications that capture information about customer engagement can be very valuable, even if you don't have data engineers to combine it with CRM data.
- Creating a unified customer portfolio through ETL pipelines built by data engineers clarifies relationships and interactions, even if the data isn't being mined in aggregate by business analysts or data scientists.
- Analytics and dashboards built by data analysts using business intelligence tools provide strategic insight, even if you aren't creating machine learning systems to take immediate action.
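The three layers described above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the event records, the CRM extract, and the helper names `unify` and `actions_by_segment` are made up for the example, and an in-memory join stands in for what would normally be a proper ETL pipeline and business intelligence tool.

```python
from collections import defaultdict

# Layer 1 (software developers): the application captures engagement events.
# All records below are made-up illustrative data.
events = [
    {"customer_id": 1, "action": "view"},
    {"customer_id": 1, "action": "purchase"},
    {"customer_id": 2, "action": "view"},
    {"customer_id": 3, "action": "view"},
    {"customer_id": 3, "action": "view"},
]

# A hypothetical CRM extract keyed by customer id.
crm = {
    1: {"name": "Ada", "segment": "enterprise"},
    2: {"name": "Grace", "segment": "startup"},
    3: {"name": "Linus", "segment": "startup"},
}


def unify(events, crm):
    """Layer 2 (data engineers): join each event with its CRM record."""
    unified = []
    for event in events:
        customer = crm.get(event["customer_id"])
        if customer is None:
            continue  # skip events that have no matching CRM record
        unified.append({**event, **customer})
    return unified


def actions_by_segment(unified):
    """Layer 3 (analysts): count events per customer segment and action."""
    counts = defaultdict(lambda: defaultdict(int))
    for row in unified:
        counts[row["segment"]][row["action"]] += 1
    return {segment: dict(actions) for segment, actions in counts.items()}


summary = actions_by_segment(unify(events, crm))
print(summary)
# {'enterprise': {'view': 1, 'purchase': 1}, 'startup': {'view': 3}}
```

Each step is useful on its own: the raw `events` already record engagement, the unified rows tie that engagement to known customers, and the summary is the kind of figure a dashboard would display.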
Data Science Benefits
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
--- Zen of Python
While we've seen many implied benefits of Data Science, it is often best to be explicit: Data Science tools, processes, and capabilities are transformative and will produce the future leaders of every industry. They empower better decisions, allow for more effective and faster responses to customers and the market, and provide ways to test decisions and dynamically adapt to the results.
Our world is swimming in information. By interconnecting society through the Internet and mobile devices, we have created the ability to empirically describe our world in entirely new ways. Because that pipeline of information elucidates our shared reality, it is an extremely valuable resource. Data Science is about turning crude data into something useful so that we can gain insight and make better decisions as individuals, organizations, and a human family.
We put Data Science to work by building data products, a highly varied set of systems intended to help us answer specific questions. Building such products requires a diverse team -- software developers, business analysts, data engineers, and data scientists -- working together. It also requires investment in an organization's capabilities to collect data; manage data flow; explore and transform; aggregate, enrich, and label; and experiment, learn, and optimize.
The potential for what Data Science can achieve is near limitless. It all comes down to having and using the right information. Because of this, the benefits of such investment are enormous and will produce the future leaders in nearly every industry, allowing us to do things that were previously unimaginable.