The potential of artificial intelligence (AI) and machine learning (ML) seems almost limitless in its ability to uncover and exploit new sources of customer, product, service, operational, environmental, and societal value. If your organization is to compete in the economy of the future, then AI should be at the center of your business operations.
Kearney’s study titled “Impact of analytics in 2020” highlights the untapped profitability and business impact available to organizations looking for reasons to accelerate their investments in data science (AI/ML) and data management:
- Explorers could improve profitability by 20% if they were as effective as leaders.
- Followers could improve profitability by 55% if they were as effective as leaders.
- Laggards could improve profitability by 81% if they were as effective as leaders.
The impact on business, operations, and society could be staggering, except for one significant organizational challenge: data. None other than the godfather of AI, Andrew Ng, has noted that data and data governance are what hinder organizations and society from realizing the potential of AI and machine learning:
“The model and code for many applications is basically a solved problem. Now that the models have reached a certain level, we need to make the data work as well.” – Andrew Ng
Data is the foundation for training AI and machine learning models. High-quality, reliable data delivered through highly efficient and scalable pipelines is what allows AI to produce these compelling business and operational results. Just as a healthy heart needs oxygen and reliable blood flow, AI/ML engines need a constant stream of clean, accurate, rich, and reliable data.
For example, one CIO has a team of 500 data engineers managing more than 15,000 extract, transform, and load (ETL) jobs. Those jobs collect, move, aggregate, standardize, and reconcile data across hundreds of special-purpose data repositories (data marts, data warehouses, data lakes, and data lakehouses). The team performs these tasks across the organization’s operational and customer-facing systems under ridiculously tight service-level agreements (SLAs) to support a growing number of diverse data consumers. It seems Rube Goldberg was destined to be a data architect (Figure 1).
Reducing the debilitating spaghetti architecture of one-off, special-purpose, static ETL programs that move, clean, flatten, and transform data dramatically shortens the “time to insight” that organizations need to fully exploit the unique economic characteristics of data, “the most valuable resource in the world” according to The Economist.
The emergence of intelligent data pipelines
The goal of a data pipeline is to automate and scale the common, repetitive tasks of collecting, transforming, moving, and integrating data. A well-designed data pipeline strategy can speed up and automate the processing associated with collecting, cleaning, transforming, enriching, and moving data to downstream systems and applications. As data volume, variety, and velocity continue to grow, the need for data pipelines that scale linearly across cloud and hybrid cloud environments becomes increasingly critical to business operations.
A data pipeline is a set of data processing activities that combine operational and business logic to perform advanced sourcing, transformation, and loading of data. A data pipeline can run on a schedule, in real time (streaming), or when triggered by a predefined rule or set of conditions.
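To make the definition concrete, here is a minimal Python sketch of a pipeline as an ordered chain of stages. Everything in it is an illustrative assumption rather than any product's API: the `Pipeline` class, the `clean` and `to_usd` stages, and the record fields (`customer`, `amount`, `fx_rate`) are hypothetical names. Because the stages are chained lazily, the same pipeline could be fed a scheduled batch (a list) or a stream (a generator), and `run_when` stands in for a rule-based trigger.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional

Record = dict
Stage = Callable[[Iterable[Record]], Iterator[Record]]


def clean(records: Iterable[Record]) -> Iterator[Record]:
    """Operational logic (hypothetical): drop incomplete rows, normalize names."""
    for r in records:
        if r.get("amount") is None:
            continue
        yield {**r, "customer": r["customer"].strip().lower()}


def to_usd(records: Iterable[Record]) -> Iterator[Record]:
    """Business logic (hypothetical): convert amounts to a common currency."""
    for r in records:
        yield {**r, "amount_usd": round(r["amount"] * r.get("fx_rate", 1.0), 2)}


@dataclass
class Pipeline:
    stages: List[Stage]

    def run(self, records: Iterable[Record]) -> Iterator[Record]:
        """Chain the stages lazily; works for a batch list or a streaming generator."""
        stream: Iterable[Record] = records
        for stage in self.stages:
            stream = stage(stream)
        return iter(stream)

    def run_when(
        self,
        records: Iterable[Record],
        condition: Callable[[List[Record]], bool],
    ) -> Optional[List[Record]]:
        """Rule-based trigger: run only if the condition holds for the batch."""
        batch = list(records)
        if not condition(batch):
            return None
        return list(self.run(batch))


pipeline = Pipeline([clean, to_usd])
raw = [
    {"customer": "  ACME ", "amount": 10.0, "fx_rate": 1.1},
    {"customer": "Globex", "amount": None},  # incomplete, dropped by clean()
    {"customer": "Initech", "amount": 5.0},
]
loaded = list(pipeline.run(raw))
```

A scheduler would simply call `pipeline.run` on each new batch, while a streaming source would pass a generator instead of a list; the stages themselves do not change.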
In addition, logic and algorithms can be built into the data pipeline to create an intelligent data pipeline. Intelligent pipelines are reusable and extensible economic assets that can be specialized for source systems and perform the data transformations necessary to support the unique data and analytical requirements for the target system or application.
As machine learning and AutoML become more prevalent, data pipelines are becoming more intelligent. Data pipelines can move data between advanced data enrichment and transformation modules, where neural networks and machine learning algorithms can create more complex data transformations and enrichments. This includes segmentation, regression analysis, clustering, and the creation of extended indices and propensity scores.
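As a rough illustration of an enrichment module, the sketch below adds a propensity score and a segment label to each record as it passes through. The logistic form is a common way to express a propensity score, but the coefficients, thresholds, and field names (`visits_30d`, `cart_adds_30d`) are made up for this example; in practice a trained model would supply them.

```python
import math
from typing import Iterable, Iterator

# Hypothetical coefficients for illustration only; a real pipeline would
# load these from a model fitted on historical data.
WEIGHTS = {"visits_30d": 0.30, "cart_adds_30d": 0.80}
BIAS = -3.0


def propensity(record: dict) -> float:
    """Logistic score in (0, 1) computed from the weighted features."""
    z = BIAS + sum(w * record.get(k, 0) for k, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def segment(score: float) -> str:
    """Bucket a score into a coarse segment (thresholds are assumptions)."""
    if score >= 0.7:
        return "high"
    if score >= 0.3:
        return "medium"
    return "low"


def enrich(records: Iterable[dict]) -> Iterator[dict]:
    """Enrichment stage: attach the score and segment to every record."""
    for r in records:
        s = propensity(r)
        yield {**r, "propensity": round(s, 3), "segment": segment(s)}


customers = [
    {"id": 1, "visits_30d": 10, "cart_adds_30d": 2},
    {"id": 2, "visits_30d": 0, "cart_adds_30d": 0},
    {"id": 3, "visits_30d": 5, "cart_adds_30d": 1},
]
enriched = list(enrich(customers))
```

Downstream systems then receive the derived `propensity` and `segment` fields alongside the raw data, rather than each having to recompute them.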
Finally, AI can be integrated into data pipelines so that they can continually learn and adapt based on source systems, required data transformations and enrichments, and evolving business and operational requirements of target systems and applications.
For example, an intelligent healthcare data pipeline can analyze the grouping of diagnosis-related group (DRG) codes to ensure the consistency and completeness of DRG submissions and to detect fraud as the DRG data moves from the source system to the analytical systems.
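A heavily simplified sketch of such in-flight checks might look like the following. It assumes three-digit DRG codes and hypothetical field names (`claim_id`, `drg`, `billed`), and it uses a crude "far above the median for that DRG" rule as a stand-in for fraud detection; a real system would rely on far richer models and reference data.

```python
import re
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

DRG_PATTERN = re.compile(r"^\d{3}$")  # assumption: three-digit DRG codes
REQUIRED = ("claim_id", "drg", "billed")  # hypothetical required fields


def check_claims(claims: List[dict], ratio: float = 3.0) -> List[Tuple[str, str]]:
    """Return (claim_id, issue) pairs for incomplete, malformed, or outlier claims."""
    issues: List[Tuple[str, str]] = []
    valid: List[dict] = []
    for c in claims:
        missing = [f for f in REQUIRED if c.get(f) in (None, "")]
        if missing:  # completeness check
            issues.append((c.get("claim_id"), "missing: " + ", ".join(missing)))
        elif not DRG_PATTERN.match(str(c["drg"])):  # consistency check
            issues.append((c["claim_id"], "malformed DRG code"))
        else:
            valid.append(c)

    # Crude anomaly rule: billed amount far above the median for the same DRG.
    by_drg: Dict[str, List[float]] = defaultdict(list)
    for c in valid:
        by_drg[c["drg"]].append(c["billed"])
    for c in valid:
        if c["billed"] > ratio * median(by_drg[c["drg"]]):
            issues.append((c["claim_id"], "billed amount far above DRG median"))
    return issues


claims = [
    {"claim_id": "A1", "drg": "470", "billed": 12000},
    {"claim_id": "A2", "drg": "470", "billed": 13000},
    {"claim_id": "A3", "drg": "470", "billed": 11000},
    {"claim_id": "A4", "drg": "470", "billed": 60000},
    {"claim_id": "A5", "drg": "47", "billed": 9000},
    {"claim_id": "A6", "drg": "", "billed": 5000},
]
issues = check_claims(claims)
```

Running the checks inside the pipeline means flagged claims can be routed for review before they ever reach the analytical systems.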
Realizing business value
The challenge for CIOs is to unlock the business value of their data: to apply that data to the business in ways that achieve quantifiable financial impact.
The ability to deliver high-quality, reliable data to the right data consumer at the right time, in support of more timely and accurate decisions, will be a key differentiator for today’s data-rich businesses. A Rube Goldberg system of ELT scripts and disparate, special-purpose analytic repositories prevents organizations from achieving this goal.
Learn more about intelligent data pipelines in Modern Enterprise Data Pipelines (e-book) from Dell Technologies.
This content was produced by Dell Technologies. This was not written by the editors of the MIT Technology Review.