With the release of dataflows in Power BI, you being an intended user, may have a few queries in mind: what exactly are dataflows, how are they different from datasets, how should I take advantage of them, and more. Through this blog, I will attempt to address these queries you have and make an idea of dataflows accessible to you.
What are dataflows?
With the advancement of Power BI, you can create a collection of data called dataflow. It is an online data storage and collection tool. With its help, you can add and edit entities and brings semantic understanding and consistency to data across many sources, by mapping it to standard CDM entities.
Use of dataflows in an organization:
Main reasons to introduce dataflows:
1. It helps organizations to unify data from disparate sources and prepare it for modeling.
With the help of dataflows, organizations can directly link their data from different data sources to Power BI with just a few steps. Organizations can map their data to the Common Data Model or create their custom entities. Organizations can then use these entities as building blocks to create reports, dashboards, and apps and distribute them to users across their organization.
Further, organizations use Dataflows to transform and add value to big data by defining data source connections. With the help of a large variety of data connectors provided by Power BI and PowerApps, organizations can directly map their data from the data source.
Once you load the data in dataflow, the creation of reports and dashboards in the Power BI desktop becomes easy.
2. Self-service data prep in Power BI while dealing with large datasets:
As data volume grows, so does the need to have insights into it.
With the introduction to self-service data prep for large data in Power BI, it is now easy to have insights into any collection of data instantly. Dataflows help organizations combine data from various data sources and perform the task.
Organizations can now have their Data from multiple sources stored in Azure Data Lake Storage Gen2. Organizations can manage dataflows in workspaces by using the Power BI service. They can also map data to standard entities in the Common Data Model, which gives them the flexibility to modify data. It further helps its users work upon existing entities to customize them.
3. Different data source with a different schedule of refresh:
Dataflows play a vital role, especially when our data contain two tables with unique schedule options. Dataflows help build mechanisms that can schedule refresh according to the organization plan. Dataflows can run, extract, and load data entities in workspaces. They allow the transformation of the data process on a different schedule for every query or table.
4. An online data collection and storage tool
Collection: Dataflows use Power Query to connect to the data at the source and transform that data when needed. You can access the data through either a cloud service (such as Dynamics 365) or a PC/Network via an on-premise gateway.
Storage: Dataflows stores data in a table in the cloud so you can use it directly inside Power BI, to be more specific from Power BI Desktop.
5. Transformation of large data volume
While handling past or obsolete data, let us say a two-year-old sales record, you could come across files containing many rows or columns. Such data normally take hours to refresh. Dataflow proves to be the quicker option in such cases. You can seamlessly switch from the database to the dataflow option. You may further make use of the incremental refresh settings option in dataflow, which ensures that only updated data get refreshed.
Dataflow and Azure Data lake integration:
With the use of dataflows, users and organizations can connect data from disparate sources, and prepare it for modeling.
For using Azure Data Lake storage for dataflows, you need to have an Azure subscription and have the Data Lake Storage Gen2 Feature enabled.
After setting the Azure account, you must go to the dataflow settings tab of the Power BI admin portal wherein you must select Connect Your Azure Data Lake Storage Gen2 button. Once you do that, the system will ask you for the user credentials to initiate the Azure Data Lake. Finally, click the connect button to begin.
After the completion of this step, azure data lake storage is ready to use in Power BI.
Connect to different data sources for Power BI dataflows
With Power BI dataflows, we can connect to multiple data sources to create dataflows or add new entities to an existing dataflow.
Step1: To create the dataflow, click on the +Create menu button and select the dataflow option. You will now see several options related to entity creation. If your dataflow already exists, you can select the Add Entity or select Get Data in the dataflow authoring tool.
Step2: Now, select the data sources from the dialog box and search for the data source categories from the given options. Broadly, there are five categories from which you could choose your data: file, database, power platform, azure, and online platform.
Step 3: After successful sign in, click the next button to continue. It will open a Power Query Online where you can perform the basic editing of your data. Finally, load the model and save the dataflow.
Summary
In this blog, I have explained multiple features of dataflow to show you how it has the upper hand over datasets. However, a lot depends on the user's choice, especially the choice of the environment which the user prefers.
There are situations in which dataflow enables you, the user, to have better control over and insight into your business data.
By using the standard data model, schema, as defined by the Common Data Model, dataflows import the user's business data and help in reshaping and combining data across different data sources and have them ready for modeling and creation of dashboards and reports in a short period, which earlier used to take months, or years, to create.
However, there are a few constraints that restrict the use of dataflows:
· One of the primary constraints that the dataflow faces is the type of available data, namely: date/time, decimal number, text, whole number, date/time/zone, true/false, date, time, and others.
· To access Power BI dataflow, the user either needs access to the admin or the other members of the workspace wherein the dataflow resides or needs access to the workspace of contributor or viewer where the data is visible.