Using Power Query to match number of particular spells to characters
When I first saw the new Maven Magic Challenge description, I was drawn to the words “share a…visual that captures the MAGIC“. And so, when I started looking at the data I had little real interest in the budgets, box office and run times. It was all about the characters, their journey, and the spells they cast… both in the movies and on the movie goers.
Connecting the Places and Spells to the Characters
So, I started to look at how I could connect the story characters and their dialogues to the magical places around the Potter universe, and the data as it was arranged allowed me to do that without too much hassle.
I then came to look at the data associated with the spells. There was no path in the dataset as provided that allowed me to directly link the spells both to the characters, the dialogues and the places. All it would allow me to do is look at them in isolation, which was no good!!
The challenge of this data set was officially set – how can I link the spells to the dialogue, and then onto the characters and places, and how can I pull this weaving story into a single page visual???
I came up with two methods to try and achieve this
My first method would be to count the number of spells cast by a character. The next would be to also identify the type and name of each individual spell cast.
It was time to cast those “spells”…..Powerus Querio!!
Method 1 – Calculating the Number of Spells Cast by a Character
Step 1 – Create a List of Spells
I needed to create a separate list containing only the incantations from the spell file. So, I imported a copy of the “Spells” csv file. I then selected the Incantation column and the Transform tab, selected Convert to List (shown below) and muttered “Reducio” under my breath.
Voila! The file was reduced to a single list of the spell incantations, as shown below. I would use this list as a reference to look at instances where the incantations were mentioned in the character dialogue. Keep reading to find out more!
Step 2 – Searching the Dialogue for Incantations
The next step was to search the Dialogue for any of the line items from my newly acquired list. I imported the Dialogue csv, then set to work deploying a combination of List.Count and Splitter.SplitTextByAnyDelimeter. The line of M code I used was:
= Table.AddColumn(#"Changed Type", "No. of Spells - Splitter", each List.Count(Splitter.SplitTextByAnyDelimiter(Spells)([Dialogue])) -1)
This would count the number of times a spell was mentioned in the line of dialogue. Let me explain a little how this works, starting from the inside out.
This code allows us to work through the lines of dialogue, and when it meets a spell from our list, it effectively stops and splits that line into a list. This new mini list will contain a separate line for each time a spell is encountered, and will have a final line to dump the text from the dialogue that does not appear in the spell list.
This new mini list does not capture which spell it has encountered, only that it has encountered one. In fact, it returns a blank line each time it finds a spell. Confused? I was a little. The below picture may help visualize it.
List.Count – Wrap Around
The Text Splitter does not do a whole lot on it’s own. Therefore, the next piece of the puzzle was to wrap it in a List.Count. This would count the number of items in each mini list we have created. However, as the above shows, it would count once for the spell, and once for the remaining text.
The next action would be to add “-1”. The result would be to give us a count of the number of times an incantation from our list had been used in the dialogue. See below example highlighting where a count has been made of an incantation.
Step 3 – Tidy Up
The final step in this initial tidy up is simply to filter the column for values greater than zero. You have now identified all the lines of dialogue that contain an incantation, as well the numbers of incantations used – sometimes there was more than 1 – see below.
Not Satisfied Yet!!!!!!
Method 2 – Extracting the name of the spells cast by each character
Step 1 – Custom Column with List
Create a list of Spells in the same way as Method 1, Step 1 above – Easy as!
Step 2 – Add Spells List to Dialogue Line Items
Add a custom column to your dialogue sheet in Power Query called “Find Spell”, and simply add “= Spells” and hit ok. You will see a new column added containing the word List in each cell. This list contains the full list of spells.
Step 3 – Expanded List
This next step may not be recommended for very large data sets, like the Maven Taxi Challenge, but it performs fine here.
We will now expand the list. This is done by clicking on the double arrows and selecting “Expand to New Rows” as shown below.
This will then duplicate each line of dialogue to equal the number of spells in our list. Below is an example, where we see the line of dialogue repeated, with each spell listed beside it. The reason for doing this will become clear below.
Step 4 – Find the Spell
Next we can use a Boolean calculation in a custom column to find if our line of dialogue contains the spell that is living in the “Find Spell” column.
I have used Text.Contains([Dialogue],[Find Spell]) . This will look at the text in the Dialogue column and see if it contains the text shown in the corresponding Find Spell column. If it finds it, it will return TRUE, and if not it will return FALSE. See the below example, where if I can recall correctly Hermione repairs Harry’s glasses with an Occulus Reparo spell!!
Step 5 – Filter for TRUE values
Once the above step is taken, it is a matter of applying a simple filter on your custom column to only show TRUE values. Looking at the below, I can now see only the lines of dialogue that contain a spell, and now also the name of the actual spell that has been used!! This was what I originally had in mind.
Using this Information
As the Dialogue data sheet contains the movie, chapter, place and character IDs, I can now link the spells to these people and locations!
The next step is to now build the visualisation I had in mind to work with this new insight.
Watch this space…..
15 thoughts on “Maven Challenge – Harry Potter Scripts”
It’s actually a great and useful piece of information. I’m happy that
you shared this useful info with us. Please keep us up to date
like this. Thank you for sharing.
some really fantastic content on this website , thanks for contribution.
It is in point of fact a great and useful piece of info.
I am glad that you shared this useful info with us.
Please stay us informed like this. Thank you for sharing.
Great blog you have here but I was curious if you
knew of any discussion boards that cover the same topics discussed here?
I’d really like to be a part of group where I can get advice from other experienced individuals that share the same interest.
If you have any recommendations, please let me know. Cheers!
You can follow Maven Analytics on LinkedIn.
I also just shared a new blog on the Super Bowl challenge, sharing my approach to creating that dash report.
Are you looking for technical support, or interested in the planning and strategy that goes on behind the scenes?
Nice post. I learn something totally new and challenging on websites I stumbleupon on a daily basis. It will always be helpful to read articles from other authors and practice a little something from their sites.
LikeLiked by 1 person
Hi, good work on this, i just want to share my thought how i did it, please reply with your opinion. what possible issue we might face. I did the inner left join between dialogue[Dialogue] and spells[Incantation] with fuzzy logic threshold of .99 in powerbi. i got the same result, i just want to confirm whether it has any possible flaws in doing so.
Thanks for your comment. I am not an expert in use of fuzzy logic, but I think you will get the same result as it is very unlikely there will be spelling mistakes or typos etc that would give an incorrect match.
Did it pick up lines where there was more than one spell/incantation spoken in that dialogue line?
It’s something I will look into a bit more now – thanks for pointing it out👍
There is definately a lot to learn about this topic. I really like all the points you made.
I blog often and I really thank you for your content. The article has truly peaked my interest. I’m going to bookmark your site and keep checking for new details about once a week. I opted in for your RSS feed too.
Thanks Lyndon, let me know any topics you are interested in for data analysis
Hi Gerard, thx for sharing, I will try this step by step later myself to learn how it works!
Thanks, hope it works out. There should be a link to a YouTube video that shows the walkthrough I the blog. If not, and you are interested, I can send you a link 👍