Maven Challenge – Net Promoter Score and Likert Analysis – Airline Survey

Another Maven challenge, another blog post!

However, rather than capturing a summary of a particular strategy, this one covers the development of a set of visuals that are well suited to representing the results of survey data and finding the categories with the largest proportions of positive and negative attitudes.

The Challenge

The overall challenge was to analyse a survey response data set from around 130,000 passengers who have travelled on Maven Airlines to determine what areas were contributing to the satisfaction rate falling below 50%, and what data-backed steps could be taken to make improvements.

The Data

The data contained various personal information on each passenger, as well as details about their journeys and class of travel.

In addition, there were a number of questions asking passengers to rate particular services on a scale of 1 to 5, with 1 being the worst service and 5 the best (see the extract from the data dictionary below).

This type of survey question is typically known as a Likert scale question, named after Rensis Likert, a famous social scientist. Here a 5-point scale has been used, but you may also encounter 7- or 10-point scales. You may also come across non-numeric versions (e.g. “very likely” to “not likely at all”).

These types of scales allow for a little more nuance in the sentiment surrounding a survey response where the answer may not be a binary yes/no.

Analysis of Likert Scales

I had read about Likert scales and survey data when working on the remote working challenge. Although I didn’t fully use it there, I was now keen to learn more about it and look at ways it could be visually represented.

One method I had previously come across in my endless trawling of YouTube tutorials and browsing other challenges such as Workout Wednesday and Makeover Monday was to show the results in a horizontal bar chart, highlighting the divergence between positive and negative results.

Below are a few different ways of presenting data using the scales. This ranges from a simple stacked bar chart, to basing it around a neutral zero value, to extracting neutral values, and all the way to a full deconstruction.

Each method has its own particular advantages depending on what you are trying to present, but they are all effective at showing the relative distribution of a spectrum of responses.

Net Promoter Score

A child of the Likert scale is the Net Promoter Score (NPS). Strictly speaking, this is used on survey results with a 0-10 scale (see below), and it is calculated by subtracting the % of detractor scores (0-6) from the % of promoter scores (9-10).

If you return a positive result, then people are on balance enthusiastic about that aspect of your brand or company, whereas a negative result could be damaging to your company.

What I wanted to show

Taking this into account, I wished to show a version of the bar chart type analysis of the Likert data shown above, along with the corresponding NPS for each category – basically a tidied up version of the below.

However, in this case we only have a scale of 1-5. Therefore I opted to re-scale the analysis based on the following:

  • 1-2 = Negative
  • 3 = Neutral
  • 4-5 = Positive

Again, if I strictly followed the NPS chart it should probably be 1-3 as negative, 4 as neutral and 5 as positive, but I would argue we are looking for “satisfaction” rather than “evangelist promoters”, so treating 4-5 as positive is good enough for me.
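Under this 1-5 rescaling, the score is just % positive minus % negative. Here is that arithmetic sketched in Python, purely to illustrate the maths (the report itself uses DAX; the function name and sample responses below are my own):

```python
from collections import Counter

def nps_from_likert(responses):
    """Rescaled NPS from 1-5 Likert responses:
    1-2 = negative, 3 = neutral, 4-5 = positive."""
    counts = Counter(responses)
    total = len(responses)
    pct_positive = (counts[4] + counts[5]) / total
    pct_negative = (counts[1] + counts[2]) / total
    # NPS = % positive minus % negative, expressed as a whole number
    return round((pct_positive - pct_negative) * 100)

# Six positives, two neutrals, two negatives out of ten responses
print(nps_from_likert([5, 5, 4, 4, 4, 4, 3, 3, 2, 1]))  # 60% - 20% -> 40
```

Neutral scores (3) drop out of the score entirely; they only matter for the bar chart segments.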

Research on building the Likert bar chart and NPS

I did some “research” (aka Google and YouTube searching) on how to put together these charts, specifically in Power BI. Although I didn’t search for hours, I didn’t manage to find anything useful as a guide.

Stepping in to the rescue came two invaluable sources of information:

  1. The Big Book of Dashboards (Wexler, Shaffer and Cotgreave)
  2. Andy Kriebel’s Tableau YouTube video

The book provided some great advice on visually what does and doesn’t work in these types of charts, and the video provided an overview of the steps and measures to allow me to build the charts.

Apply the techniques to Power BI

Now it was a matter of translating the textbook advice and Tableau tutorial over to Power BI desktop. This was done in several steps.

1. Power Query Work

First I imported the dataset into Power Query, then created a duplicate of the table for this particular work.

Next, I selected the column headers for all the columns containing the Likert categories. After that, I navigated to the Transform ribbon at the top, selected “Unpivot Columns”, and then “Unpivot Selected Columns”.

This resulted in the layout below. There is a column called “Attribute”, which holds all the Likert categories, and another column called “Value”, which holds the corresponding Likert scale numbers – effectively all the categories and their corresponding values are in two columns.

While I was in Power Query, I also created a “sentiment score”, which was -1 for “neutral or dissatisfied” and +1 for “satisfied”. I never used this in my final report, but I did use it in some experimental small-multiple visuals shown below, effectively showing the 1-5 rating in each category while mirroring the satisfied and unsatisfied passengers.

I found it showed some interesting patterns, but it took quite a close inspection to glean them, so I discarded it for my final report in favour of the stacked bar chart.

2. Measures

Now that the rejigging of the data was done in Power Query, I moved on to creating the measures that would help me build the below draft visual. These included measures to:

  • Count the positive scores (4-5)
  • Count the negative scores (1-2)
  • Count the neutral scores (3) – this was done in two parts, as they straddle the zero line
  • Calculate the % positive
  • Calculate the % negative
  • Calculate the % neutral
  • Calculate the NPS


The fairly simple count DAX measures were as follows:

Note that the below calculation is multiplied by -1 so that it will appear on the negative side of the axis.

Next, as mentioned, the neutral scores straddle the zero line, therefore the calculation is split into two: one for the positive side, and one for the negative side.
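To illustrate why the split works, here is the segment arithmetic sketched in Python (illustrative names only – the report itself uses DAX measures): halving the neutral share and signing the negative values gives four segments that stack into a diverging bar centred on zero.

```python
def diverging_segments(pct_negative, pct_neutral, pct_positive):
    """Build the four chart segments for one attribute: the neutral
    share is split into two halves so it straddles the zero line, and
    negative-side values are signed so they plot left of the axis."""
    return {
        "% negative": -pct_negative,
        "% neutral (left)": -pct_neutral / 2,
        "% neutral (right)": pct_neutral / 2,
        "% positive": pct_positive,
    }

# 20% negative, 20% neutral, 60% positive
print(diverging_segments(0.2, 0.2, 0.6))
```

The two neutral halves cancel each other out, so the segments sum to % positive minus % negative – which is exactly the NPS figure used later.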

All Selected Attributes

In order to transform these count measures into % values, I needed a denominator. Typically, you could use the ALL() DAX function so that you are measuring as a portion of the total of a given column, but as I had created an unpivoted table with multiple categories, it required something a little more refined.

Instead of ALL(), I opted for ALLEXCEPT() with several columns referenced – see below:

Percentage Values

Now this was done, all that was left to do was to create a few simple DIVIDE() functions to finalise the percentage values – simply dividing each count by the All Selected Attributes measure:

Again, remember we need two values for the neutral (positive and negative).


Finally, after all those measures were created, I could come to finalising the NPS. Looking back at the calculation below, I recreated a version using my measures. (Note that I add my positive and negative scores together, as my negatives are stored as negative numbers.)
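That final step, sketched in Python for illustration (the sign convention matches the measures described above – the function name is my own):

```python
def nps_grouped(pct_positive, pct_negative_signed):
    """Because the negative share is already stored as a negative
    number (for the diverging chart), the NPS becomes an addition
    rather than a subtraction."""
    return round((pct_positive + pct_negative_signed) * 100)

# 60% positive, 20% negative (stored as -0.20)
print(nps_grouped(0.6, -0.2))  # -> 40
```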

Rank NPS – (Bonus Calculation)

I almost forgot – there is a bonus calculation that I needed in order to sort my Likert chart by the NPS. I used the RANKX function to rank the attributes by NPS in descending order, as shown below. I will show you a little trick on how to apply this in the next few paragraphs.
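For illustration, the ranking logic can be sketched in Python (RANKX itself is DAX; the attribute names and NPS values below are made up):

```python
def rank_by_nps(nps_by_attribute):
    """RANKX-style 1-based ranking of attributes, highest NPS first."""
    ordered = sorted(nps_by_attribute, key=nps_by_attribute.get, reverse=True)
    return {attribute: rank for rank, attribute in enumerate(ordered, start=1)}

print(rank_by_nps({"Wifi service": -30, "Seat comfort": 20, "Food and drink": 5}))
```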

3. Building the Visuals

Ok, now on to turning all those measures into something visual.

First I selected the stacked bar chart. Then I added Attribute to the Y-axis. Following that, I added the 4 % measures in the order shown below. It looks a little strange, but you add the negative values first, going from least negative to most negative, then add the positive values after that.

Finally, drop the Rank NPS measure into the tooltips.

By adding the Rank NPS to the tooltip, it allows you to select it as a sorting option for your chart. To do this, select the ellipsis (…) at the top right of the chart, then select the sort axis as shown below. This will allow you to rank both your Likert chart and NPS chart by the NPS scorings, and this allows you to have common y-axis categories across both charts.

The NPS chart is simply put together by adding Attribute to the Y-axis and the NPS Grouped score to the X-axis, as shown below.

Once the charts have been built, it is a matter of applying your preferred colour palettes, fonts and formats in the formatting pane.

4. Final product

Once I was happy with my general visual content, I brought them into my final dashboard, and applied final touches to axis labels, legends and covering titles shown below.

I then applied this set of visuals twice: once for analysing business class passengers, and once for analysing economy and economy plus passengers. This formed the most detailed analysis in my report, allowing me to drill down from those initial headline percentages to the issues that were affecting passenger satisfaction.

Let me know in the comments below, or via feedback on LinkedIn, if this kind of content was of use, or if there are other ways you might have gone about executing this.

Thanks for reading.

Maven Challenge – Harry Potter Scripts

Using Power Query to match number of particular spells to characters

When I first saw the new Maven Magic Challenge description, I was drawn to the words “share a…visual that captures the MAGIC”. And so, when I started looking at the data I had little real interest in the budgets, box office and run times. It was all about the characters, their journey, and the spells they cast… both in the movies and on the movie goers.

Connecting the Places and Spells to the Characters

So, I started to look at how I could connect the story characters and their dialogues to the magical places around the Potter universe, and the data as it was arranged allowed me to do that without too much hassle.

I then came to look at the data associated with the spells. There was no path in the dataset as provided that allowed me to directly link the spells to the characters, the dialogues and the places. All it would allow me to do was look at them in isolation, which was no good!!

The challenge of this data set was officially set – how can I link the spells to the dialogue, and then onto the characters and places, and how can I pull this weaving story into a single page visual???

I came up with two methods to try and achieve this.

My first method would be to count the number of spells cast by a character. The next would be to also identify the type and name of each individual spell cast.

It was time to cast those “spells”…..Powerus Querio!!

Method 1 – Calculating the Number of Spells Cast by a Character

Step 1 – Create a List of Spells

I needed to create a separate list containing only the incantations from the spell file. So, I imported a copy of the “Spells” csv file. I then selected the Incantation column, went to the Transform tab, selected Convert to List (shown below) and muttered “Reducio” under my breath.

Voila! The file was reduced to a single list of the spell incantations, as shown below. I would use this list as a reference to look at instances where the incantations were mentioned in the character dialogue. Keep reading to find out more!

Step 2 – Searching the Dialogue for Incantations

The next step was to search the Dialogue for any of the line items from my newly acquired list. I imported the Dialogue csv, then set to work deploying a combination of List.Count and Splitter.SplitTextByAnyDelimiter. The line of M code I used was:

= Table.AddColumn(#"Changed Type", "No. of Spells - Splitter", each List.Count(Splitter.SplitTextByAnyDelimiter(Spells)([Dialogue])) -1)

This would count the number of times a spell was mentioned in the line of dialogue. Let me explain a little how this works, starting from the inside out.


This code allows us to work through the lines of dialogue, and when it meets a spell from our list, it effectively stops and splits that line into a list. This new mini list will contain a separate line for each time a spell is encountered, and will have a final line to dump the text from the dialogue that does not appear in the spell list.

This new mini list does not capture which spell it has encountered, only that it has encountered one. In fact, it returns a blank line each time it finds a spell. Confused? I was a little. The below picture may help visualize it.

List.Count – Wrap Around

The text splitter does not do a whole lot on its own. Therefore, the next piece of the puzzle was to wrap it in List.Count. This counts the number of items in each mini list we have created. However, as the above shows, it would count once for each spell, plus once for the remaining text.

The next action was to add “-1”. The result gives us a count of the number of times an incantation from our list has been used in the dialogue. See the example below highlighting where an incantation has been counted.
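To see why the split-and-count trick works outside of M, here is the same logic sketched in Python (re.split plays the role of Splitter.SplitTextByAnyDelimiter; the sample spells and function name are my own):

```python
import re

def count_spells(dialogue, spells):
    """Mimics List.Count(Splitter.SplitTextByAnyDelimiter(Spells)([Dialogue])) - 1:
    splitting the dialogue on every spell produces one extra list item
    per match, so the count of parts minus one is the number of spells."""
    pattern = "|".join(re.escape(spell) for spell in spells)
    parts = re.split(pattern, dialogue)
    return len(parts) - 1

print(count_spells("Oculus Reparo!", ["Oculus Reparo", "Lumos"]))  # -> 1
print(count_spells("No magic here", ["Oculus Reparo", "Lumos"]))   # -> 0
```

A line with no spells splits into a single part, giving a count of zero, which is exactly why filtering for values greater than zero works in the tidy-up step.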

Step 3 – Tidy Up

The final step in this initial tidy up is simply to filter the column for values greater than zero. You have now identified all the lines of dialogue that contain an incantation, as well as the number of incantations used – sometimes there was more than one – see below.

Not Satisfied Yet!!!!!!

Method 2 – Extracting the name of the spells cast by each character

Step 1 – Custom Column with List

Create a list of Spells in the same way as Method 1, Step 1 above – Easy as!

Step 2 – Add Spells List to Dialogue Line Items

Add a custom column to your dialogue sheet in Power Query called “Find Spell”, simply enter “= Spells”, and hit OK. You will see a new column added containing the word List in each cell. Each of these contains the full list of spells.

Step 3 – Expanded List

This next step may not be recommended for very large data sets, like the Maven Taxi Challenge, but it performs fine here.

We will now expand the list. This is done by clicking on the double arrows and selecting “Expand to New Rows” as shown below.

This will then duplicate each line of dialogue to equal the number of spells in our list. Below is an example, where we see the line of dialogue repeated, with each spell listed beside it. The reason for doing this will become clear below.

Step 4 – Find the Spell

Next we can use a Boolean calculation in a custom column to find if our line of dialogue contains the spell that is living in the “Find Spell” column.

I have used Text.Contains([Dialogue],[Find Spell]). This looks at the text in the Dialogue column and checks whether it contains the text shown in the corresponding Find Spell column. If it finds it, it will return TRUE, and if not it will return FALSE. See the below example where, if I recall correctly, Hermione repairs Harry’s glasses with an Oculus Reparo spell!!


Step 5 – Filter for TRUE values

Once the above step is taken, it is a matter of applying a simple filter on your custom column to show only TRUE values. Looking at the below, I can now see only the lines of dialogue that contain a spell, and also the name of the actual spell that has been used!! This was what I originally had in mind.
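The whole of Method 2 boils down to a cross join followed by a contains filter. Sketched in Python for illustration (the `in` operator stands in for Text.Contains, which is also case-sensitive by default; the function name is my own):

```python
def find_spells(dialogue_lines, spells):
    """Mimic the expand-then-filter approach: pair every dialogue line
    with every spell (the expanded rows), then keep only the pairs
    where the line actually contains the spell (Text.Contains = TRUE)."""
    return [(line, spell)
            for line in dialogue_lines
            for spell in spells
            if spell in line]  # case-sensitive containment check

print(find_spells(["Oculus Reparo.", "Hello Harry"],
                  ["Oculus Reparo", "Lumos"]))
```

Unlike Method 1, the surviving rows carry the spell name itself, which is what allows the spells to be linked back to characters and places.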

Using this Information

As the Dialogue data sheet contains the movie, chapter, place and character IDs, I can now link the spells to these people and locations!

Next Step

The next step is to now build the visualisation I had in mind to work with this new insight.

Watch this space…..

Expecto Patronum!

Power Query – Nested If/And/Or Statements

This is a brief follow on to my earlier blog on the Maven Analytics NYC Taxi challenge, and cleaning well over 25 million line items of data in Power Query, before developing a Power BI dashboard. In particular, I am looking at amending an “If/And” statement to become a nested “If/Or/And” statement.

One of the cleaning steps involved looking at values in multiple columns to check that if they were ALL negative, then we should change each value to a positive (multiply by -1).

Original M Code

To do the above, my original M code involved adding a new column which returned a statement based on a set of IF/AND conditions:

= Table.AddColumn(#"Changed Type", "negative_charges", 
each if [fare_amount] < 0 and 
[extra] < 0 and 
[mta_tax] < 0 and 
[improvement_surcharge] < 0 and 
[congestion_surcharge] < 0 then 
"all negative" else 
"ok")

This effectively adds a new column called “negative charges”, with each row value being returned as either “all negative” or “ok” based on the if statements.

So, if the fare amount, extra, mta tax, improvement surcharge AND congestion surcharge were ALL negative for that particular journey, then the result would be “all negative”.

However, if one or more were greater than or equal to zero, the row would be deemed “ok”. This would then allow us to amend values only in the rows calculated as “all negative”.
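The check itself is a simple all-of test. Sketched in Python for illustration (the field names match the data dictionary; the function name is my own):

```python
def negative_charges(row):
    """'all negative' only when every charge component is below zero;
    a single zero or positive component makes the row 'ok'."""
    fields = ["fare_amount", "extra", "mta_tax",
              "improvement_surcharge", "congestion_surcharge"]
    return "all negative" if all(row[f] < 0 for f in fields) else "ok"

print(negative_charges({"fare_amount": -5, "extra": -1, "mta_tax": -0.5,
                        "improvement_surcharge": -0.3,
                        "congestion_surcharge": -2.5}))  # -> all negative
```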

Minor Problem

It was pointed out to me however, that the [congestion surcharge] may also contain ‘null’ values. These would not be considered in the above calculation. Typically, we may consider ‘null’ to equal zero, therefore it would not matter; however for these surcharges, ‘null’ may mean that the surcharge was not being applied at that time.

Therefore, I had to look at amending the M code to accommodate this.

Updated M Code Solution

In the end, the amendment is not so complicated: you can create a nested conditional statement by wrapping each condition in parentheses – see the amended M code below. This allows you to check for multiple Or and And conditions and return the appropriate result.

= Table.AddColumn(#"Changed Type", "All Negative Check", 
each if ([fare_amount] < 0 or [fare_amount] = null) and 
([extra] < 0 or [extra] = null) and
([mta_tax] < 0 or [mta_tax] = null) and
([improvement_surcharge] < 0 or [improvement_surcharge] = null) and 
([congestion_surcharge] < 0 or [congestion_surcharge] = null) then 
"all negative" else 
"ok")

If you want to see this in action, I have created a short walkthrough on my YouTube channel here.

Maven Challenge – Power Query Mega Clean – Taxi Challenge


This is a walkthrough of the cleaning I undertook on in excess of 25 million lines of data as part of the #maventaxichallenge, the monthly data visualization challenge set up by Maven Analytics. This month the challenge involved looking at detailed NYC Taxi data between 2017 and 2020 and developing a usable single page dashboard to analyze weekly trends.

Details of the challenge can be found here. This includes the data files provided and the requirements for cleaning and presenting the data. I have also created a video walkthrough of the below if you want to see it in live action.


My plan was fairly simple but structured:

  1. Write a clear list of the cleaning steps linked back to the data dictionary
  2. Create a Data Sample
  3. Apply the cleaning steps to the sample and verify
  4. Copy out the M Code for each step
  5. Load the full data set and apply the M Code for each step

Step 1 – Cleaning List

So, the following were the steps required, and what that translated into using the data dictionary:

Cleaning Request → Data Dictionary Translation

  • Let’s stick to trips that were NOT sent via “store and forward”
    → store_and_fwd_flag = “N”
  • I’m only interested in street-hailed trips paid by card or cash, with a standard rate
    → trip_type = 1, payment_type = 1 or 2, RatecodeID = 1
  • We can remove any trips with dates before 2017 or after 2020
    → lpep_pickup_datetime between 01/01/2017 and 31/12/2020, lpep_dropoff_datetime between 01/01/2017 and 31/12/2020
  • …along with any trips with pickups or drop-offs in unknown zones
    → PULocationID <> blank/empty, DOLocationID <> blank/empty
  • Let’s assume any trips with no recorded passengers had 1 passenger
    → if passenger_count = 0 or null, then replace it with 1
  • If a pickup date/time is AFTER the drop-off date/time, let’s swap them
    → if lpep_pickup_datetime > lpep_dropoff_datetime then lpep_dropoff_datetime else lpep_pickup_datetime; if lpep_dropoff_datetime < lpep_pickup_datetime then lpep_pickup_datetime else lpep_dropoff_datetime
  • We can remove trips lasting longer than a day, and any trips which show both a distance and fare amount of zero
    → add a duration column, then filter out values >= 24 hours; if trip_distance AND fare_amount = 0, then filter out those rows
  • If you notice any records where the fare, taxes and surcharges are ALL negative, please make them positive
    → if fare_amount < 0 and mta_tax < 0 and extra < 0 and improvement_surcharge < 0 and congestion_surcharge < 0 then “all negative” else “ok”, then apply the “trick” replacement to change the values (see below for more detail)
  • For any trips that have a fare amount but have a trip distance of 0, calculate the distance this way: (Fare amount – 2.5)/2.5
    → if fare_amount > 0 and trip_distance = 0 then (fare_amount – 2.5) / 2.5 else trip_distance
  • For any trips that have a trip distance but have a fare amount of 0, calculate the fare amount this way: 2.5 + (trip distance x 2.5)
    → if trip_distance > 0 and fare_amount = 0 then 2.5 + (trip_distance x 2.5) else fare_amount

Cleaning Requests and Action Steps
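The last two imputation rules are simple linear formulas. Sketched in Python for illustration (the function name is my own; the rules themselves come from the challenge brief above):

```python
def impute_missing(fare_amount, trip_distance):
    """Fill a zero trip distance or zero fare from the other value:
    distance = (fare - 2.5) / 2.5, fare = 2.5 + distance * 2.5."""
    if fare_amount > 0 and trip_distance == 0:
        trip_distance = (fare_amount - 2.5) / 2.5
    elif trip_distance > 0 and fare_amount == 0:
        fare_amount = 2.5 + trip_distance * 2.5
    return fare_amount, trip_distance

print(impute_missing(10, 0))  # fare of 10 implies a distance of 3.0
print(impute_missing(0, 4))   # distance of 4 implies a fare of 12.5
```

The two formulas are inverses of each other, which is a reassuring sanity check on the brief.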

Step 2 – Create Data Sample

It would be next to impossible to apply the steps to over 25 million lines of data and then easily verify that all the filters, additions, replacements and modifications had taken place and you got the results you were looking for.

A much more digestible method is to create a sample data set based on the data we were provided, ensuring that at least one example of each cleaning step scenario listed above is included.

To do this, I took the smallest file (2020), and performed a series of deduplications on values across the locations, passenger numbers, etc. so that I was able to get a small sample of approximately 50 varied line items.

Then, in order to recreate some of the cleaning scenarios, I made minor adjustments to some values, e.g. swapping the drop-off and pick-up times. Above is the resulting sample data set.

Step 3 – Apply Cleaning Steps to Sample and Verify

Next, I created a new Power BI file, uploaded the sample data set and then moved to edit the set in Power Query.

Sample data in Power Query

After performing the usual checks that data types (text, date, time, numbers, etc.) had been applied, it was then a case of applying each cleaning step and working through. For example, the first step became: =Table.SelectRows(#"Changed Type", each [store_and_fwd_flag] = "N")

M Code Replace Hack

There is one hack that is really worth highlighting here, as it will save some added columns and processing time in your Power Query tasks, especially in larger data sets. The below gallery shows a snapshot of each step, but here is a brief description:

  • Create a “Dummy Replace” step for the column you wish to replace values in.
    • Select the Transform ribbon, and click on Replace Values (picture 1)
    • Choose two dummy values to replace that would not be values in your column. As an example, I chose to replace 5000 with 6000 (picture 2), where the actual values in that column are mostly single digits.
    • Click OK, and you will see the M code in the formula box at the top (picture 3)
  • Create a “Dummy Custom Column” step to allow you to write the conditional statement you want to insert into your Dummy Replace code
    • Select the Add Column ribbon, and click on Custom Column (picture 4)
    • Use IntelliSense to help you write the conditional statement you wish to create (picture 5)
    • Copy the statement, and click cancel. This way you are not creating an additional column.
  • Return to your Dummy Replace step and perform the following:
    • For the first value (5000), amend this value to refer to the column you want to replace values in, and precede it with “each”.
    • In this example, I replace 5000 with each [fare_amount]
    • For the second value (6000), amend this value to refer to the conditional statement you want to apply to the selected column, and again precede it with “each”.
    • In this example, I replace 6000 with each if [negative_charges] = “all negative” then [fare_amount]*-1 else [fare_amount]
  • Once you hit return, the replacement of values occurs within the column, without the need to create an additional column. This will increase efficiency for any refreshes.

If you are interested, the “normal language” logic behind this step is:

  • For each row value in the column fare_amount, if the corresponding row in the negative_charges column is equal to “all negative”, then multiply the fare_amount by -1, changing it from a negative to a positive value. Otherwise, just keep the fare_amount value as it is (no change).
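That “normal language” logic, sketched in Python for illustration (the real replacement happens in-place in M, but the effect on the rows is the same; the function name is my own):

```python
def apply_negative_fix(rows):
    """In-place replace logic: flip the sign of fare_amount only on
    rows flagged 'all negative'; leave every other row untouched."""
    for row in rows:
        if row["negative_charges"] == "all negative":
            row["fare_amount"] *= -1
    return rows

rows = [{"negative_charges": "all negative", "fare_amount": -7.5},
        {"negative_charges": "ok", "fare_amount": 5.0}]
print(apply_negative_fix(rows))
```

The same pattern is repeated for each of the other charge columns (extra, mta_tax, and so on), one Replace Values step per column.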

Step 4 – Copy out M Code

Once you have gone through each of the steps on your sample set and verified that they have had the desired effect on your line items, you will now have the list of steps under Applied Steps on the right hand side of the Power Query interface. A good tip is to rename these steps based on your needs (e.g. Change Negative Values or Swap Dates). This will help when you want to copy the steps for use in the full data set.

You can see a list of this information in the Advanced Editor window (see below). This can be copied and pasted out and saved for future use.

If you click on each step you will see the corresponding M code just above your data table.

Advanced Editor showing M Code Steps

There is another method that I learned from the Curbal YouTube channel, which is pretty powerful. I won’t repeat all the steps here, but here is a link to the tutorial.

Step 5 – Apply to Full Data Set

Now that you have all your steps and code written, tested and verified, it is a pretty straightforward proposition to apply them to your full data set.

An easy way to add steps is to right-click on a step and select “Insert Step After” (see below). This will allow you to paste in the M code that you have saved. One tip: check the step reference – it will refer to the name of the preceding step, so make sure it matches when you paste in the first step. For the following steps, your copied code from the sample data should work as-is, provided you use the same names for your steps.

Insert Step

Once you have completed all your steps, you are done – all that’s left is to select Close & Apply.

The “trial and error” approach is removed, which means that when Power BI applies the updates to the 25+ million line items, you can be reasonably confident you will not have to revisit your Power Query. This is important here, as with such large data sets the updates can sometimes take hours to complete depending on your computer’s processing capabilities.

You are now free to move on to the next part of the challenge and concentrate on creating your DAX calculations and a nice neat dashboard.

If I have time to finish my dashboard on this one, I will add it on a future post. As always, any comments or queries, please let me know in the comments below!