Excel Beyond Excel: The Power of Power Query

Today’s businesses thrive on data. Data is being produced at an unprecedented rate and comes in almost too many different formats, from different sources, and with varying levels of quality. As a result, extracting useful information from this sea of data can sometimes feel like looking for a needle in a haystack. This is where Microsoft Excel’s Power Query comes into play. Power Query is a data connection technology that enables you to discover, connect, combine, and refine data across a wide variety of sources.

Bear with me here, folks. There won’t be as many screenshots or visuals in this post as you might be used to. Why? Well, it’s a bit tricky to make compelling visuals out of highlighting a ribbon button or showcasing a right-click context menu. But I assure you, the substance is here.

Now, remember how I mentioned in an earlier post that you could get to the Power Query Editor by using the “Get Data” option in the ribbon’s Data tab? You still can. But, let’s shake things up a bit – you can also launch the Editor directly. It’s a bit like choosing to cook your dinner from scratch rather than ordering a takeaway – it might seem more effort, but there’s a certain satisfaction in doing it all yourself, especially the first time you’re confident when looking at the blank ‘advanced editor’ screen.

The beauty (and beast) of Power Query is knowing its capabilities and planning how to use them effectively. The latter comes with practice and a splash of patience, but the former? Well, that’s what we’re here to delve into.

My master plan is to kick things off with a broad overview and then gradually transition into more in-depth project walkthroughs as I build more content on the site. Imagine it as building your personal portfolio of data and automation wizardry, so that you’re ready when it’s time to follow along on full projects. We’ll dig into these features in more detail along the way, but before we soar high, we have to brave the journey of a thousand steps.

Worried? Don’t be. After all, I’m your guide on this adventure and, not to brag, but this is kind of my jam. So let’s jump in, shall we?

Jeff

Importing Data with Power Query

Before we can begin to manipulate and analyze data, we first need to get it into our Excel workbook. Here, Power Query shines with its ability to import data from an impressive range of sources. In this section, we’ll take a closer look at some of the most common data sources you may find yourself working with.

Importing from a File

If your data resides in a local file, such as a CSV or an Excel workbook, Power Query can effortlessly import it. After you navigate to the file and select it, Power Query displays a preview of the data. You then have the option to choose specific worksheets or ranges, if applicable, before loading the data into Excel.

For example, suppose you have a CSV file with monthly sales data. With Power Query, you can directly import this data into your workbook, saving you from the tedious task of manually copying and pasting, and setting up a query that can be refreshed in the process. If you’re going for the refreshable option, just make sure that the file will be updated in the same place and with the same name going forward.

Importing from SQL

Many businesses store their data in SQL databases due to their efficient and flexible nature. Power Query allows you to write SQL scripts to pull data directly from such databases. Whether your data is stored in SQL Server, MySQL, Oracle, or another SQL-based database, you can import it directly into your Excel workbook using Power Query.

Let’s say you have an SQL database with detailed customer information. You could write a SQL query to pull only the data for customers from a specific region or those who made purchases over a certain amount. With Power Query, this filtered data can then be loaded directly into your workbook by selecting “From SQL Server Database” in the “Get Data” menu, and following the prompts in the import wizard.

Importing from a Webpage

In our digital age, vast amounts of data are accessible directly from web pages. Power Query can connect to these web pages and scrape data from HTML tables.

For example, you might want to import a table listing world population data from a webpage. Power Query makes it relatively easy to connect to the page, preview the available tables, and import the one(s) you need.

While we’ve only covered a few here, Power Query supports a myriad of other data sources, including APIs, XML files, JSON files, and more. Each data source type comes with unique options and settings, allowing you to fine-tune your data import process to your specific needs.

The Art of Data Transformation

Data transformation is a key aspect of data analysis, and Power Query offers an array of tools to refine, shape, and cleanse your data to prepare it for analysis. In this section, we’ll explore how to remove duplicate records, fill in missing values, split columns, and create calculated and custom columns—tasks that can make your data much more manageable.

I know some of you are thinking “Can’t we just do that with Excel? Why bother with Power Query?”, and I totally get that. The difference isn’t in what the end result looks like, but instead in the fact that Power Query works by querying your data and keeping a list of each transformation step you take after that to get to the end result. Since that’s a core part of its functionality, you shouldn’t be surprised when I say that each of those steps is followed every time you refresh the table that results from your Power Query import. If you chose to do the cleanup or transformation in excel, it would have to be done every time, either by a human or a macro. Power query makes it as simple as right-click… Refresh.

Removing Duplicates

In many cases, your dataset may contain duplicate entries that can distort your analysis. Power Query provides a simple and effective solution. You can select one or more columns and then use the “Remove Duplicates” function from the ribbon. The function will treat combinations of values across the selected columns as unique records and remove any duplicates.

For example, suppose you have a sales dataset with multiple entries for the same transaction. By selecting all relevant columns (like transaction ID, product ID, etc.) and using the “Remove Duplicates” function, you can ensure each transaction only appears once in your data.

Filling Missing Values

Real-world data often comes with gaps. Whether due to errors in data collection or other issues, missing values can cause problems in your analyses. Power Query offers two functions to fill missing values: “Fill Up” and “Fill Down”. They copy values from the cell above or below into the empty cell.

Let’s say you have a dataset of monthly revenue, but some of the months’ names are missing. If you know the data is in chronological order, and the blanks should have the same month name as the filled rows above them, use the “Fill Down” function to fill these gaps.

Splitting Columns

Sometimes, you may find that a single column in your dataset actually contains multiple pieces of information (Super common for things like comments, or in the information security world. Ask an InfoSec guy how neatly NIST stores CPE associations on its CVE API sometime; guaranteed free entertainment – you can thank me later). Power Query allows you to split such columns into multiple columns. Similar to Excel’s “Text to Columns” feature, you can split by delimiter, like a comma or a space, or by the number of characters.

For example, suppose you have a dataset with a full name column. You can use Power Query’s split column function to separate the full names into first and last names.

Adding Calculated and Custom Columns

Beyond the direct manipulation of existing columns, Power Query empowers you to add custom columns based on calculations or transformations of other column values. These calculated columns allow you to extend your data with new insights and analytics.

The ‘Add Column’ menu offers various options, including ‘Add Custom Column’. Here, you can write your own formula using Power Query’s M formula language.

I’ll link the M Documentation here, because I still reference it. It’s a pretty solid resource, but given the ‘build and forget’ nature of working with Power Query, you’ll rarely practice enough to memorize all of the functions.

For instance, if you have a sales dataset with ‘Quantity’ and ‘Price’ columns, you could add a calculated column named ‘Total Sales’ with the formula [Quantity] * [Price].

If you’re less confident with the M formula language, don’t fret! Power Query’s ‘Column from Examples’ feature lets you create a new column based on example output. You simply type a few examples of the desired results in a couple of rows, and Power Query will try to deduce the pattern and generate a formula.

Digging Deeper: Advanced Functions in Power Query

Power Query is not just about simple data cleaning or transformation. Its capabilities extend to advanced functions that can significantly enhance your data manipulation and preparation processes. Let’s look at a couple of these functions: merging queries and appending queries.

Merging Queries

Merging queries allows you to pull in data from different sources or tables and consolidate them into a single, useful table. The operation is similar to a JOIN operation in SQL.

For example, imagine you have two tables: one listing product sales by a unique product ID, and another providing product details corresponding to each unique product ID. Using Power Query, you could merge these two queries to create a unified table, ensuring that each sale entry now includes all relevant product details.

Power Query provides several kinds of merges, including inner, left outer, right outer, and full outer. Each of these merge types can be likened to their SQL JOIN counterparts, providing flexibility depending on what data you want to retain in your final table.

Appending Queries

While merging queries combines tables side by side, appending queries stacks one table on top of another, much like a UNION operation in SQL. This function is useful when you have data with the same structure but split across multiple tables or sources.

Consider a situation where you have sales data for different years in separate tables. With the same columns in each (such as product ID, quantity, and revenue), you could append these queries to compile all your sales data into a single table.

Remember that the key to successfully appending queries is that the structure—namely, the columns—of the tables should match.

Conclusion: Power Query – A Universal Data Management Tool

As we wrap up this brief journey through Power Query, it’s essential to take a step back and acknowledge the broader landscape. Power Query isn’t just an Excel tool—it’s a data manipulation language that spans across multiple Microsoft products, including Power BI and Analysis Services. This means that once you’ve honed your skills in Power Query within Excel, you’ve actually equipped yourself with a set of data transformation skills that apply to a wide range of tools in the Microsoft ecosystem.

The techniques and processes you’ve learned about here aren’t confined to a single software application; they’re transferable and scalable, much like the data you work with. This universality is part of what makes Power Query so powerful and valuable. The ability to extract, transform, load, and automate data processes in a consistent manner, regardless of the platform you’re using, is a game-changer in today’s data-driven world.

What’s more, Power Query’s interface is designed to be accessible and user-friendly, making it a fantastic entry point for those new to data transformation. As we’ve discussed, it offers robust features like simple importing, intuitive transformations, and advanced functions like merging and appending. Yet, it also provides more complex operations such as creating custom columns and calculated fields—making it flexible enough to handle more complex, bespoke data manipulation tasks.

And let’s not forget about the automation capabilities Power Query brings to the table. In an increasingly fast-paced business environment, automation is becoming a crucial factor in maintaining efficiency and accuracy. By setting up automated data transformation processes, you save valuable time, reduce the risk of human error, and create consistent, dependable data outputs. It’s another step towards creating a well-oiled data management machine.

By mastering Power Query, you’re not just improving your Excel game—you’re laying the groundwork for broader data management and automation skills that can be applied across a variety of tools. You’re investing in your own skillset, boosting your value as a data professional, and paving the way for seamless data transformations, no matter where your data journey takes you.

Whether you’re a seasoned data analyst, a business professional looking to improve your reporting capabilities, or someone beginning your journey into data transformation, Power Query is a potent tool in your data management arsenal. Embrace its capabilities, explore its features, and harness its power to turn raw data into valuable insights.

Leave a comment