5 Things to Know About Microsoft Fabric
Microsoft has recently introduced a new offering in data storage and management, called Microsoft Fabric. Fabric integrates several features, including a data lake, multiple compute engines, unified data governance, and a wide application surface area. This blog post gives an overview of these core components, along with the pricing model.
OneLake: The Central Data Lake of Microsoft Fabric
OneLake is a key component of Microsoft Fabric, operating as a unified logical data lake. It's constructed on the robust foundation of Azure Data Lake Storage Gen2. Each Fabric tenant is provisioned with a OneLake instance, making it an integral part of the system much like a "OneDrive for data."
OneLake stores all data as a single copy, in Delta tables backed by Parquet files. The Delta format, also used by Databricks, extends Parquet with a transaction log that provides ACID (Atomicity, Consistency, Isolation, Durability) guarantees. Further, with the new Shortcuts feature, users can virtualize data from other cloud sources such as Amazon S3, extending OneLake's reach beyond data stored in Fabric itself.
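To make the ACID point more concrete, here is a minimal Python sketch of the general idea behind a commit log layered over immutable data files. It is a toy model, not the actual Delta protocol or any Fabric API: the class ToyTableLog, its methods, and the file names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTableLog:
    """Toy commit log (hypothetical): each commit is a list of (op, file_name) actions."""
    commits: list = field(default_factory=list)

    def commit(self, actions):
        # Validate everything first, then append once: a commit lands as a whole or not at all.
        for op, _ in actions:
            if op not in ("add", "remove"):
                raise ValueError(f"unknown action: {op}")
        self.commits.append(list(actions))
        return len(self.commits) - 1  # version number of the new commit

    def live_files(self, version=None):
        # Replay commits up to `version` to find the data files a reader should see.
        upto = len(self.commits) if version is None else version + 1
        files = set()
        for actions in self.commits[:upto]:
            for op, name in actions:
                if op == "add":
                    files.add(name)
                else:
                    files.discard(name)
        return sorted(files)


log = ToyTableLog()
v0 = log.commit([("add", "part-000.parquet"), ("add", "part-001.parquet")])
# Rewrite part-000 in a single commit: a reader sees either the old file or the new one.
v1 = log.commit([("remove", "part-000.parquet"), ("add", "part-002.parquet")])
```

Because the data files themselves are never modified in place, a reader that replays the log up to a given version always gets a consistent set of files, which is also what makes reading an older version of the table possible in this model.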
Versatility in Compute Options
OneLake is designed within Microsoft Fabric to support a range of compute engines, encompassing T-SQL, Spark, KQL, and Analysis Services. This versatility allows users to select the compute engine most suited to their particular needs, promoting adaptability in data operations.
Emphasizing Data Governance with One Security Model
Fabric has taken a define-once, enforce-everywhere approach to data management and governance. User-defined security rules are stored with the data, so the same rules apply no matter which compute engine reads it. This method aligns with the concept of a "data mesh," fostering logical organization and management of data, thereby enabling different business groups within an organization to have control over their own data.
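As a rough illustration of what it means for a security definition to live with the data rather than with each engine, consider the following Python sketch. Everything in it is hypothetical (TablePolicy, enforce, and the two stand-in "engines"); it does not reflect how Fabric implements its security model, only the shape of the idea.

```python
from dataclasses import dataclass


@dataclass
class TablePolicy:
    """Hypothetical policy stored alongside a table: role -> set of readable columns."""
    readable_columns: dict


def enforce(policy, role, rows):
    # Single shared enforcement step: unknown roles see no columns at all.
    allowed = policy.readable_columns.get(role, set())
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]


# Two stand-in "engines" that both defer to the same policy instead of defining their own.
def sql_engine_select(policy, role, rows):
    return enforce(policy, role, rows)


def notebook_engine_read(policy, role, rows):
    return enforce(policy, role, rows)


policy = TablePolicy({"analyst": {"name", "region"}, "hr": {"name", "region", "salary"}})
rows = [{"name": "Ada", "region": "EU", "salary": 100}]

via_sql = sql_engine_select(policy, "analyst", rows)
via_notebook = notebook_engine_read(policy, "analyst", rows)
```

The design point is that the policy is defined once and both access paths return the same filtered view; adding another engine does not require restating the rules.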
Broad Application Scope Across Workloads
The application scope within Fabric supports a broad set of functionality across data engineering, data analysis, and data science. It includes support for Data Factory for visual ELT/ETL, Synapse Data Engineering for complex transformations using SQL and Spark, Synapse Data Science for machine learning development, Real-Time Analytics for streaming data processing using KQL, and Synapse Data Warehousing for SQL operations over columnar databases.
In addition to these applications supporting established workloads, Fabric integrates AI-assist features through Copilot, which uses Large Language Models to help with SQL writing and report generation. Fabric also introduces Data Activator, a no-code tool designed to trigger actions, such as sending messages to Teams or Outlook, when data meets specified conditions, similar to Reverse ETL tools like Hightouch and Census. Note that each of the features mentioned may be in different phases of release.
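Data Activator itself is no-code, but the underlying pattern (watch data, check a condition, fire an action) is easy to sketch in Python. The Rule class, the evaluate function, and the notify action below are hypothetical names invented for this example; the "notification" is just appended to a list rather than sent to Teams or Outlook.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Hypothetical trigger rule: run `action` whenever `condition` holds for an event."""
    name: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], None]


def evaluate(rules, event):
    # Check every rule against the incoming event and return the names of those that fired.
    fired = []
    for rule in rules:
        if rule.condition(event):
            rule.action(event)
            fired.append(rule.name)
    return fired


sent = []  # stand-in for an outgoing message channel


def notify(event):
    sent.append(f"Low stock for {event['sku']}: {event['stock']} left")


rules = [Rule("low-stock", lambda e: e["stock"] < 10, notify)]

fired_low = evaluate(rules, {"sku": "A1", "stock": 3})
fired_ok = evaluate(rules, {"sku": "B2", "stock": 50})
```

Only the first event crosses the threshold, so only that event produces a message.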
Flexible Pricing Model
Fabric's pricing model includes an organizational license, which can be either premium per user or capacity, in addition to individual licenses, which are either free or pro. Capacity is billed in one of two modes: per second, or on a monthly or yearly basis. This pricing model appears to cater to a wide range of organizational needs and budgets, and it is subject to change over time.
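Choosing between per-second and monthly billing comes down to how many hours the capacity actually runs. The Python sketch below works through that arithmetic. The rates used are made-up placeholders, not Fabric prices, and the function names are hypothetical; substitute real figures from the current price list before drawing conclusions.

```python
def per_second_cost(rate_per_hour, active_seconds):
    # Per-second billing: pay only for the seconds the capacity is running.
    return active_seconds * rate_per_hour / 3600


def break_even_hours(rate_per_hour, flat_monthly_price):
    # Hours of use per month at which per-second billing costs the same as the flat price.
    return flat_monthly_price / rate_per_hour


# Placeholder numbers for illustration only (NOT actual Fabric pricing).
HYPOTHETICAL_RATE_PER_HOUR = 2.0
HYPOTHETICAL_FLAT_MONTHLY = 900.0

cost_100_hours = per_second_cost(HYPOTHETICAL_RATE_PER_HOUR, 100 * 3600)
threshold = break_even_hours(HYPOTHETICAL_RATE_PER_HOUR, HYPOTHETICAL_FLAT_MONTHLY)
```

With these placeholder values, a capacity that runs fewer hours per month than the threshold is cheaper on per-second billing, and one that runs more is cheaper on the flat monthly price.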
With a centralized data lake underpinning a wide, integrated, and adaptable feature set, Microsoft Fabric presents a new "bundled" approach to end-to-end enterprise data analytics. With OneLake's single-copy storage strategy, a wide set of supported compute engines, unified data governance mechanisms, and a comprehensive application scope, Microsoft Fabric is positioned to offer a flexible solution that is adaptable to the complex reality of data within modern organizations.