{"id":562,"date":"2025-01-12T15:04:49","date_gmt":"2025-01-12T12:04:49","guid":{"rendered":"https:\/\/ilhamidemirci.com\/?p=562"},"modified":"2025-01-12T15:09:28","modified_gmt":"2025-01-12T12:09:28","slug":"data-wrangler-in-microsoft-fabric","status":"publish","type":"post","link":"https:\/\/ilhamidemirci.com\/index.php\/2025\/01\/12\/data-wrangler-in-microsoft-fabric\/","title":{"rendered":"Data Wrangler in Microsoft Fabric"},"content":{"rendered":"\n<p class=\"has-black-color has-text-color\">Data analysts, engineers, or scientists\u2026 No matter what title we work under, the most exhausting process for all of us is making data ready for processing. Even if we just want to visualize the data using Power BI, we first need to dive into Power Query to make the data modelable. Similarly, if we plan to build models with machine learning, we must first explore and prepare the data. That\u2019s why data preprocessing, organizing, cleaning, and transformation become the most critical phases of data analytics processes.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/750\/1*X94KVXbR1Hxe_MuD9RoiRw.png\" alt=\"\"\/><\/figure>\n\n\n\n<p class=\"has-black-color has-text-color\">The <strong>\u201cData Wrangler\u201d<\/strong> tool, part of the Data Science layer in Microsoft Fabric, assists your data preparation processes by detecting unexpected anomalies in your datasets (e.g., outliers, missing or incorrect information, duplicate rows, logical errors) without requiring you to write any code.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">Like Power Query, the Data Wrangler simplifies data transformation processes. However, it takes this a step further by generating and displaying pandas or PySpark code for every operation. This not only transforms your data but also provides learning opportunities for those without coding experience.<\/p>\n\n\n\n<p>Let\u2019s get started!<\/p>\n\n\n\n<h4 class=\"wp-block-heading has-black-color has-text-color\"><strong>1. Creating Pandas or Spark DataFrames<\/strong><\/h4>\n\n\n\n<p class=\"has-black-color has-text-color\">You can make the Data Wrangler tool operational by converting any file format in your Lakehouse (that can be transformed into a pandas or Spark DataFrame) into a dataframe.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># read csv file<br>df = spark.read.format(\"csv\").option(\"header\", \"true\").load(\"Files\/example_dataset.csv\")<br><br># df now is a Spark DataFrame containing CSV data from \"Files\/Raw Data.csv\".<br>display(df)<br><br>print(df.head())<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\"># create Pandas DataFrame from Spark df<br>pandas_df = df.toPandas()<br><br>print(pandas_df.head())<\/pre>\n\n\n\n<p class=\"has-black-color has-text-color\">The Data Wrangler supports both pandas and Spark DataFrames.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/750\/1*QVW4wzxcUNl4wiCVfdakqw.png\" alt=\"\"\/><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>2. Data Wrangler Interface<\/strong><\/h4>\n\n\n\n<p class=\"has-black-color has-text-color\">The tool features a user-friendly interface and consists of five panels. The \u201cOperations\u201d panel (1), where you can apply predefined operations; the main data view (2), where you can examine the data in row and column format; the panel (3) displaying applied operations and allowing you to delete them; the panel (4) showcasing the Python code generated for each operation; and the summary panel (5), containing statistics for the entire dataset or selected columns.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/750\/1*PxRYLZFt_E_bPihRJS39rg.png\" alt=\"\"\/><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>3. Example Operations<\/strong><\/h4>\n\n\n\n<p>The no-code feature allows you to generate statistics for your data and easily make your dataset processable (modelable) thanks to its intuitive interface. Here are a few examples:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>You can sort, filter, rename columns, or most importantly, change the data type of a column.<\/li>\n<\/ol>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/750\/1*S2uKVOkNrWVZ2ZolDSMX7Q.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>1.1. For the operations you perform, the existing state <em>(red)<\/em> and the proposed changes <em>(green)<\/em> are displayed in separate columns. This allows you to review the changes before applying them.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/750\/1*ojr_BTrs8xNSlETorXYiJg.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>2. You can create a new column based on a reference column.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/750\/1*k7wOqOXczNRbUvtz9nvsGg.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>3. You can perform <em>normalization <\/em>by scaling the values in the dataset to a specific range (e.g., 0 to 1 or 1 to 5).<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/750\/1*OrIbgMGdCRTOtdYDUlEURw.png\" alt=\"\"\/><\/figure>\n\n\n\n<p class=\"has-black-color has-text-color\">4. You can perform \u201cone-hot encoding\u201d by creating a separate column for each unique value of a categorical variable <em>(e.g., \u201cGender\u201d)<\/em>.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/750\/1*A25ycdX0qOVk83tmjg5Yew.png\" alt=\"\"\/><\/figure>\n\n\n\n<p class=\"has-black-color has-text-color\"><strong>Note:<\/strong> As shown in the image above, it is important to note that you cannot manually edit the code generated by the Data Wrangler tool. However, you can transfer the code <em>(in pandas or PySpark format)<\/em> to MS Fabric notebooks to carry out your work.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/750\/1*eMVUVhd9AdvKQV33l0Pulw.png\" alt=\"\"\/><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">4. All Operations<\/h4>\n\n\n\n<p class=\"has-black-color has-text-color\">The Data Wrangler includes a total of 27 operations organized under 8 main categories.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/750\/1*JBY9CsnvewUjZBs97HeA4Q.png\" alt=\"\"\/><\/figure>\n\n\n\n<p><strong>&#8211; Find and Replace<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Drop duplicate rows:<\/strong> Removes duplicate rows, retaining only one instance of rows with identical values.<\/li>\n\n\n\n<li><strong>Drop missing values:<\/strong> Removes rows with missing values. This operation can lead to data loss, so it should be used carefully.<\/li>\n\n\n\n<li><strong>Fill missing values:<\/strong> Fills missing values in the dataset. You can use a specific constant value or the column\u2019s mean\/median value to fill the gaps.<\/li>\n\n\n\n<li><strong>Find and replace:<\/strong> Finds a specific value and replaces it with another. For example, it can be used to replace \u201cMale\u201d with \u201cM\u201d.<\/li>\n<\/ul>\n\n\n\n<p><strong>&#8211; Format<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Capitalize First Character:<\/strong> Converts the first character of text values to uppercase.<\/li>\n\n\n\n<li><strong>Convert Text to Lowercase:<\/strong> Converts all text to lowercase.<\/li>\n\n\n\n<li><strong>Convert Text to Uppercase:<\/strong> Converts all text to uppercase.<\/li>\n\n\n\n<li><strong>String Transform by Example:<\/strong> Transforms text based on an example input.<\/li>\n\n\n\n<li><strong>DateTime Formatting by Example:<\/strong> Converts date and time values into a specific format.<\/li>\n\n\n\n<li><strong>Split Text:<\/strong> Splits text into multiple columns based on a specified delimiter (e.g., space, comma).<\/li>\n\n\n\n<li><strong>Strip Whitespace:<\/strong> Removes leading and trailing whitespace from text values.<\/li>\n<\/ul>\n\n\n\n<p><strong>&#8211; Formulas<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Multi-label Binarizer:<\/strong> Splits categorical data into multiple new columns and assigns a 0 or 1 value for each category.<\/li>\n\n\n\n<li><strong>One-hot Encode:<\/strong> Separates the values in a single categorical column into individual columns for each category and encodes them as 0\/1.<\/li>\n\n\n\n<li><strong>Calculate Text Length:<\/strong> Calculates the character length of each value in a text column.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/750\/1*ZhT-f8vCe3NhYD0W_cE-Uw.png\" alt=\"\"\/><\/figure>\n\n\n\n<p><strong>&#8211; Numeric<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Round:<\/strong> Rounds a numeric value to the nearest integer.<\/li>\n\n\n\n<li><strong>Round Down (floor):<\/strong> Rounds numeric values down to the nearest smaller integer.<\/li>\n\n\n\n<li><strong>Round Up (ceiling):<\/strong> Rounds numeric values up to the nearest larger integer.<\/li>\n\n\n\n<li><strong>Scale by Min\/Max Values:<\/strong> Scales all numeric values in a column to a specific range (e.g., 0 to 1).<\/li>\n<\/ul>\n\n\n\n<p><strong>&#8211; Schema<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Change Column Type:<\/strong> Changes the data type of a specific column.<\/li>\n\n\n\n<li><strong>Clone Column:<\/strong> Creates a copy of an existing column.<\/li>\n\n\n\n<li><strong>Drop Columns:<\/strong> Removes specific column(s) from the dataset.<\/li>\n\n\n\n<li><strong>Rename Column:<\/strong> Changes the name of a specific column.<\/li>\n\n\n\n<li><strong>Select Columns:<\/strong> Selects only specific columns from the dataset.<\/li>\n<\/ul>\n\n\n\n<p><strong>&#8211; Sort and Filter<\/strong><\/p>\n\n\n\n<p>Used for sorting and filtering data.<\/p>\n\n\n\n<p><strong>&#8211; Group by and Aggregate<\/strong><\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">Used to group data by one or more columns and perform calculations for each group.<\/p>\n\n\n\n<p><strong>&#8211; New Column by Example<\/strong><\/p>\n\n\n\n<p>Creating a new column using an example. This operation automatically performs transformations in the column based on the user\u2019s example inputs.<\/p>\n\n\n\n<p><strong>Thank you for reading!<\/strong><br><em>You can also access the <\/em><a href=\"https:\/\/ilhamidemirci.medium.com\/microsoft-fabric-veri-haz\u0131rlay\u0131c\u0131-7cd5bec6048e\" rel=\"noreferrer noopener\" target=\"_blank\"><em>Turkish version of the article<\/em><\/a><em>.<\/em><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">References<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/learn.microsoft.com\/en-us\/fabric\/data-science\/data-wrangler\" rel=\"noreferrer noopener\" target=\"_blank\">https:\/\/learn.microsoft.com\/en-us\/fabric\/data-science\/data-wrangler<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/learn.microsoft.com\/en-us\/fabric\/data-science\/data-wrangler-spark\" rel=\"noreferrer noopener\" target=\"_blank\">https:\/\/learn.microsoft.com\/en-us\/fabric\/data-science\/data-wrangler-spark<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/aayushgupta9125.substack.com\/p\/accelerate-data-prep-with-data-wrangler?r=1btttw&amp;utm_campaign=post&amp;utm_medium=web&amp;triedRedirect=true\" rel=\"noreferrer noopener\" target=\"_blank\">https:\/\/aayushgupta9125.substack.com\/p\/accelerate-data-prep-with-data-wrangler?r=1btttw&amp;utm_campaign=post&amp;utm_medium=web&amp;triedRedirect=true<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.youtube.com\/watch?v=EzSStWnyLkc\" rel=\"noreferrer noopener\" target=\"_blank\">https:\/\/www.youtube.com\/watch?v=EzSStWnyLkc<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.udemy.com\/share\/10avrQ3@_vJHVws-wv3EnC24pH8DAdN6WjPgAAi3To7WinhPkGszBGo3t1ES6P6rQN0dMXO5\/\" rel=\"noreferrer noopener\" target=\"_blank\">https:\/\/www.udemy.com\/share\/10avrQ3@_vJHVws&#8211; wv3EnC24pH8DAdN6WjPgAAi3To7WinhPkGszBGo3t1ES6P6rQN0dMXO5\/<\/a><\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>In my previous article I talked about how you can easily move and organize your folders on the desktop to lake houses with OneLake File Explorer. Please check it out.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/medium.com\/@ilhamidemirci\/onelake-the-onedrive-for-data-f2abb210c2cb\" target=\"_blank\" rel=\"noreferrer noopener\"><em>OneLake: The OneDrive For Data<\/em><\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Data analysts, engineers, or scientists\u2026 No matter what title we work under, the most exhausting process for all of us is making data ready for processing. Even if we just want to visualize the data using Power BI, we first need to dive into Power Query to make the data modelable. Similarly, if we plan&hellip; <br \/> <a class=\"read-more\" href=\"https:\/\/ilhamidemirci.com\/index.php\/2025\/01\/12\/data-wrangler-in-microsoft-fabric\/\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-562","post","type-post","status-publish","format-standard","hentry","category-blog"],"_links":{"self":[{"href":"https:\/\/ilhamidemirci.com\/index.php\/wp-json\/wp\/v2\/posts\/562","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ilhamidemirci.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ilhamidemirci.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ilhamidemirci.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ilhamidemirci.com\/index.php\/wp-json\/wp\/v2\/comments?post=562"}],"version-history":[{"count":2,"href":"https:\/\/ilhamidemirci.com\/index.php\/wp-json\/wp\/v2\/posts\/562\/revisions"}],"predecessor-version":[{"id":568,"href":"https:\/\/ilhamidemirci.com\/index.php\/wp-json\/wp\/v2\/posts\/562\/revisions\/568"}],"wp:attachment":[{"href":"https:\/\/ilhamidemirci.com\/index.php\/wp-json\/wp\/v2\/media?parent=562"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ilhamidemirci.com\/index.php\/wp-json\/wp\/v2\/categories?post=562"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ilhamidemirci.com\/index.php\/wp-json\/wp\/v2\/tags?post=562"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}