Power Query Pivot Swap
I recently was given a CSV file that listed users and the groups they belonged to, from an Active Directory dump.
The person asked me if it I could transform it to show the list of groups and the users that belonged to them.
I thought I could, but I wasn't really sure how hard or easy it would be. It turned out to be easier than I thought. I call it a Pivot Swap (leave a comment at the bottom if you have a better name!)
If you're the type who doesn't care about the process, and just wants the answer up front, here is the Power Query:
let
Source = Excel.CurrentWorkbook(){[Name="Survey"]}[Content],
#"Split Column by Commas" = Table.SplitColumn(Source,"Career Interests",Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),
{"Career Interests.1", "Career Interests.2", "Career Interests.3", "Career Interests.4", "Career Interests.5"
, "Career Interests.6", "Career Interests.7", "Career Interests.8"}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Split Column by Commas", {"Respondent"}, "Attribute", "Career Interest"),
#"Removed Attribute Column" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),
#"Grouped Rows by Concat Respondents" = Table.Group(#"Removed Attribute Column", {"Career Interest"},
{{"Respondents", each Text.Combine([Respondent],", "), type text}})
in
#"Grouped Rows by Concat Respondents"
I've reproduced the scenario using Survey respondents and their career interests. The first column is the name of the respondent, and the second column contains a list of careers they are interested in, separated by commas.
The desired result is to have a column that contains the career interests and the second column containing a list of respondents that are interested in them (separated by commas).
The first step is to split the second column by commas. That's pretty easy.
Note that the wizard will determine how many commas there are in all the rows and create new columns accordingly. If I needed this to be something that would need to run again, and there might be a different maximum number of columns, I'd probably insert some fancy dynamic column stuff that I would learn by reading Chris Webb's blog post "Dynamic Column Selection In Custom Columns In Power BI/Power Query/M" but here I'm just leaving it simple.
Ok, now to the kinda cool part - the Unpivot operation. We're going to select the Respondent column, right click, and choose Unpivot Other Columns. This is nice because we don't have to worry about how many career interests were listed by each respondent.
Lovely! Now we have a (much longer) table that repeats the Respondent for each of the other columns it had a value in, and puts the column name in the new Attribute column, and the value in the Career Interest column (the default wizard will use the name Value).
We don't really care about the order in which they listed their career interests, so we don't need the Attribute column. That's simple to remove.
Next is the REALLY COOL part. What I want to do is group the Career Interest column and aggregate the Respondents. This is the part that isn't so obvious. If you select the Career Interest column, right-click and choose Group By, the dialog box only gives you numerical aggregations, like Count, Sum, Avg, etc. There's no option for text concatenation.
Ok, we can overcome that! For now, choose Count rows.
Notice that the formula bar shows:
= Table.Group(#"Removed Attribute Column", {"Career Interest"}, {{"Respondents", each Table.RowCount(_), type number}})
Here's the magic: after you close the dialog box, edit the formula bar to use the Text.Combine function instead of Table.RowCount, and give it the [Respondent] column and ", " as parameters.
= Table.Group(#"Removed Attribute Column", {"Career Interest"}, {{"Respondents", each Text.Combine([Respondent],", "), type text}})
It's pretty simple once you know how, right?
My original estimate I gave for the Active Directory CSV file was "about 4 hours". It turned out to be about a half hour, and most of that was just searching for a way to aggregate the text column. This question on the Microsoft forums helped me (moral of the story, find someone else's solution; and, the Microsoft forums are great!).
Maybe you have another way? Post it in the comments or, as Andy Leonard says, blog about it yourself.
Mike