Microsoft today launched Microsoft Fabric, a new end-to-end data and analytics platform (not to be confused with Azure Service Fabric). The new platform centers on Microsoft’s OneLake data lake but can also pull in data from Amazon S3 and (soon) Google Cloud Platform. It includes integration tools, a Spark-based data engineering platform, a real-time analytics platform and, thanks to the newly improved Power BI, an easy-to-use visualization and AI-based analytics tool. There is also a new no-code developer experience that allows users to monitor their data in real time and trigger actions and notifications based on it. All of these tools are tightly integrated and, of course, Microsoft will also integrate its AI Copilot into Fabric.
“Over the last five to 10 years, there has been a pretty massive level of innovation — which is great and that’s awesome because there’s lots of new technologies out there — but it’s also caused a lot of fragmentation of the modern data stack,” Arun Ulag, Microsoft’s corporate VP for Azure Data, told me. “There’s literally hundreds — if not thousands — of products and open source technologies and solutions that customers have to make sense of.” He also noted that a lot of the data and analytics products tend to keep their data in silos. “When I talk to customers, one of the messages I hear consistently is that they’re tired of paying this integration tax,” he said.
So Microsoft looked at the core data analytics workloads (data integration, engineering, warehousing, data science, real-time analytics and business intelligence) and at how it could build a unified experience around them. To do this, the team decided to focus on a unified compute infrastructure and a single data lake.
“There’s a unified compute infrastructure; there’s a unified data lake. There’s a unified product experience for all your data professionals, so that they can really collaborate deeply. [There’s] unified governance so that IT can manage this and create sources of truth that everybody can trust, and really a unified platform that both IT and business share — and the unified business model. There’s literally just one thing to buy, and it allows customers to save a lot of costs, which, especially in today’s environment, is really important,” said Ulag.
Ulag noted that he personally demoed Fabric to 100 of the Fortune 500 over the course of the last year and that many enterprises are excited about it because it greatly simplifies their data infrastructure for them without locking them into a single cloud vendor. In part, that’s because the team decided to build the central data lake around the open-source Apache Parquet format, a column-oriented file format for data storage and retrieval.
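To make "column-oriented" concrete, here is a minimal Python sketch of the idea. It uses plain lists and made-up sales records, not Parquet's actual on-disk encoding or any Fabric API; the point is only that a query needing one field can read that field's values without touching the rest of each record.

```python
# Illustrative only: plain Python lists, not Parquet's real file layout.
# The records and field names below are hypothetical.
rows = [
    {"region": "EMEA", "units": 120, "revenue": 2400.0},
    {"region": "APAC", "units": 80, "revenue": 1900.0},
    {"region": "AMER", "units": 150, "revenue": 3100.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# A query that needs a single field reads only that column's values.
total_revenue = sum(columns["revenue"])
print(total_revenue)
```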
Microsoft also decided to build Fabric around a multi-cloud approach, with built-in support for data in Amazon S3 and Google Cloud Storage (coming soon).
The company also decided to simplify the pricing model, which focuses on the compute infrastructure and centers on a common Fabric compute unit. Cost is, of course, a major driver for enterprise tech buying decisions today and that often means consolidating vendors. Most businesses today cobble their data and analytics systems together with the help of multiple vendors. That creates an integration challenge and also adds cost.
“This wastage is something that Fabric addresses because it creates a unified compute model,” explained Ulag. “Overnight, it might do a lot of data engineering and data science, maybe data integration. In the morning, the same compute flows to maybe BI and SQL as people walked into the office. Because all compute is virtualized, all compute is serverless in Fabric, it really allows you to reuse the capacity that you purchased. That is attractive for [enterprises].”
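The arithmetic behind that argument can be sketched in a few lines of Python. The demand figures below are invented for illustration and are not Fabric pricing or real usage data: each workload gets a demand value per time slot, and the sketch compares buying capacity for every workload's own peak against buying for the peak of the combined load.

```python
# Hypothetical demand per workload, in arbitrary compute units,
# across four time slots (overnight, early morning, morning, midday).
demand = {
    "data_engineering": [8, 6, 1, 1],
    "data_science": [4, 3, 1, 0],
    "bi_and_sql": [0, 1, 7, 9],
}


def siloed_capacity(demand_by_workload):
    """Capacity needed if each workload is provisioned for its own peak."""
    return sum(max(series) for series in demand_by_workload.values())


def pooled_capacity(demand_by_workload):
    """Capacity needed if all workloads share a single pool."""
    totals = [sum(slot) for slot in zip(*demand_by_workload.values())]
    return max(totals)


siloed = siloed_capacity(demand)
pooled = pooled_capacity(demand)
print(siloed, pooled)
```

With these made-up numbers the shared pool needs 12 units rather than 21, because the overnight and daytime peaks do not coincide. How much a real customer saves would depend entirely on how its workloads overlap.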
Another advantage, Microsoft argues, is that this single unified system makes it easier to manage data access and governance (using Microsoft Purview). If an employee with the appropriate access rights wants to analyze highly confidential employee salary data, for example, and export it to Excel or Power BI, the service will ensure that the documents created with this data inherit the same confidentiality label and the rules associated with it. Based on these rules, it will also automatically encrypt these files so that even if they leak outside of the company, nobody would be able to access them.
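The inheritance behavior described here can be modeled with a small Python sketch. Everything in it is hypothetical (the `Label` and `Artifact` classes, the `export` function and the file names); it is not the Purview API, just an illustration of a derived document carrying its source's label and encryption rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """Hypothetical sensitivity label with a single rule attached."""
    name: str
    encrypt: bool


@dataclass
class Artifact:
    """Hypothetical dataset or exported document."""
    name: str
    label: Label


def export(source: Artifact, target_name: str) -> Artifact:
    """Create a derived artifact that carries the source's label and rules."""
    return Artifact(name=target_name, label=source.label)


CONFIDENTIAL = Label("Highly Confidential", encrypt=True)
salaries = Artifact("employee_salaries", CONFIDENTIAL)
workbook = export(salaries, "salaries.xlsx")

print(workbook.label.name, workbook.label.encrypt)
```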
While it’s a deeply integrated system, there are a number of moving parts here. Data Factory is an integration service that comes with 150+ pre-built connectors. Microsoft also relies on a number of its Synapse-branded data tools to provide the data engineering and data science tooling that data scientists use to, for example, build AI models. Meanwhile, Power BI will sit at the other end of this spectrum and allow business analysts and other users to gain insights from all of this data, while the new no-code Data Activator service will allow users to automatically trigger specific actions based on the incoming real-time data.
This is Microsoft in 2023, so there is, of course, a Copilot in Microsoft Fabric that will make it easier for users to build data pipelines, generate code, build machine learning models and more. This Copilot isn’t available yet, though, so it remains to be seen how useful it will be.
Fabric is currently in public preview. In a somewhat unusual move by Microsoft’s standards, anyone can try the service without even having to provide their credit card information. Starting July 1, Fabric will be enabled for all Power BI tenants.