The evolving modern data platform
John Gamble
Director, Professional Services
Introduction
At C5, we believe that the role of the Data Platform within an organisation is evolving, particularly with the introduction of Fabric as a service.
A data platform is no longer just there to provide advanced reporting and analytic capabilities; it is now the central hub of the organisation, where all systems feed data in and extract as needed to remain up to date, and where reporting and analytics are just one of multiple workloads available.
Other workloads include:
- Data integration: all systems feed data in and share it downstream to other systems as needed.
- Data science: the ability to supply and curate data for AI / ML applications.
- Support for data governance and management activities, e.g. as seen in the integration of Microsoft Fabric with Purview.
- Data discovery activities.
- Supply of data to third parties.
Exceeding data management limitations with Microsoft Fabric
Latest iteration of the Microsoft Data Platform: Historically, Microsoft’s data platform was based around SQL Server, data warehousing principles (e.g. Kimball) and OLAP-based analytic technologies. This was all on-premise, and it changed with the introduction of Cloud technologies.
Overall, whilst the platforms delivered great value, the on-premise approach:
- Was expensive in terms of infrastructure which you had to provision, upgrade, manage and decommission.
- Suffered from a flexibility paradox. Whilst the outputs of Data Warehouses and especially OLAP-based technologies (such as Analysis Services) were incredibly flexible, the applications were extremely inflexible in terms of data ingest. It was costly to store and process data, so anything not needed wasn’t included. This forced a design where, despite the outputs of data platforms being very flexible, inputs into data platforms were inherently inflexible and unlikely to change.
- Was originally hard to integrate with data sources that weren’t also on-premise, although tooling has improved here in recent times.
Then vs Now
Cloud provided exceptionally cheap storage and on-demand compute power, and combined these changed the way data could be ingested, stored and processed. Data modelling was (and is) still critically important, but with Cloud we can ingest as much data as we like and then process it for different workloads. Previously this was extremely expensive from an infrastructure perspective.
Extract, Transform and Load (ETL) routines became Extract, Load, Transform (ELT). Rather than only extracting and transforming what we need, we now extract and load everything, transforming when we need to, conscious that we’ve already got all the data.
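The difference between the two approaches can be sketched in a few lines of Python. This is a minimal illustration only, using an in-memory SQLite database as a stand-in for a data platform; the table names, column names and sample rows are all hypothetical and are not taken from any Fabric or Synapse API.

```python
import sqlite3

# Hypothetical source rows: (order_id, customer, amount, note)
SOURCE_ROWS = [
    (1, "acme", 120.0, "call back"),
    (2, "acme", 80.0, "priority"),
    (3, "globex", 50.0, ""),
]


def etl(conn, rows):
    """ETL: transform before loading, so only the aggregate is stored."""
    totals = {}
    for _order_id, customer, amount, _note in rows:
        totals[customer] = totals.get(customer, 0.0) + amount
    conn.execute("CREATE TABLE etl_sales (customer TEXT, total REAL)")
    conn.executemany("INSERT INTO etl_sales VALUES (?, ?)", sorted(totals.items()))


def elt(conn, rows):
    """ELT: load every raw column first, then transform on demand."""
    conn.execute(
        "CREATE TABLE raw_orders "
        "(order_id INTEGER, customer TEXT, amount REAL, note TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", rows)
    # The transformation is just a query over the raw data already loaded.
    conn.execute(
        "CREATE VIEW elt_sales AS "
        "SELECT customer, SUM(amount) AS total FROM raw_orders GROUP BY customer"
    )


conn = sqlite3.connect(":memory:")
etl(conn, SOURCE_ROWS)
elt(conn, SOURCE_ROWS)

query = "SELECT customer, total FROM {} ORDER BY customer"
etl_result = conn.execute(query.format("etl_sales")).fetchall()
elt_result = conn.execute(query.format("elt_sales")).fetchall()

# A question nobody planned for (how many orders carry a note?) can only be
# answered on the ELT side, because the ETL side discarded that column.
noted_orders = conn.execute(
    "SELECT COUNT(*) FROM raw_orders WHERE note <> ''"
).fetchone()[0]
```

Both routes produce the same sales totals, but only the ELT route keeps the raw rows available for workloads that were not anticipated when the pipeline was built.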
With this change in mindset, a new generation of tooling was created, which culminated in the Platform as a Service (PaaS) offering known as Azure Synapse Analytics. This combined a new set of tools including Spark, Databricks, Azure Data Lake and Azure Data Factory to produce “Data Lake House” environments which were hugely scalable yet much cheaper to run in practice.
Synapse, alongside Power BI (a SaaS tool), very quickly became the default platform for Microsoft Cloud-based analytics. Fabric is the latest iteration and addresses a number of points about the combination of Synapse and Power BI. These are:
Seamless Infrastructure Management
Fabric is a pure SaaS product rather than a combination of PaaS (Synapse, Azure Data Lake, Azure Data Factory) and SaaS (Power BI). With Fabric there is no platform infrastructure to provision and manage; it’s all done for you.
Simplified tooling and licensing
Microsoft excels in connecting its applications, making it easier for organisations that rely on Microsoft tools to achieve seamless integration than it is when mixing vendors such as Microsoft and Oracle. For example, Dataverse now offers direct “shortcuts” into OneLake, enhancing data access and management. Fabric simplifies licensing with a single cost for core services such as OneLake and Data Factory, allowing organisations to start with smaller SKUs and scale as needed. Additionally, Fabric streamlines the developer experience by consolidating multiple tools, including Data Factory, Power BI, Notebooks, Dataflows Gen2 and Data Lake, into a single environment, eliminating the need for multiple logins and enhancing productivity.
Tight integration with Power BI
Power BI is now part of Microsoft Fabric. Whilst still licensed separately for lower-priced SKUs, Power BI is very tightly integrated into the Fabric suite of products. This makes it easier to get data into Power BI and to publish reporting through it.
Focus on innovation
Microsoft has delivered continuous innovation since Fabric’s initial release in 2023. It has publicly stated that, whilst Synapse and on-premise data platforms will continue to be supported and in some scenarios enhanced, all of its investment will now be in Fabric. There have been monthly releases of new features and bug fixes since the initial preview release last year. Fabric went into general release in Nov 2023. OneLake, described below, is part of this.
Shortcuts and OneLake
OneLake is a new feature only available in Fabric. It is a single lake that spans all of your Fabric workspaces, which means that as soon as data is in the Lake it’s available to all processing capabilities, subject to normal access controls. A new feature called Shortcuts can create dynamic links to other cloud providers’ data stores (e.g. AWS, Google Cloud), and Microsoft is investing heavily in replicating its own software data stores (e.g. Dynamics / Azure SQL) into the Lake. This all means it’s much easier, in theory, to ingest data into the Lake for processing.
Ready to transform your organisation’s data strategy?
At C5, we specialise in implementing Fabric as a service to help you establish a central hub for your data platform. Embrace the evolution of data management and unlock the full potential of your data assets.
Contact us today to learn how we can support your journey towards a more integrated, efficient, and insightful data ecosystem.