2 axis responsive scatter plot

Scatter plots are very useful components for visualizing data. A scatter plot lets you map values along two axes, typically a horizontal x axis and a vertical y axis, which makes it well suited to finding correlations in data. That brings us to a very important scientific mantra:

Correlation does not imply causation (Wikipedia)

We can observe patterns and speculate about relations between two variables, but we can’t conclude that one is the cause of the other. To do so is a statistical fallacy.

Observing the relation between variables is still a worthwhile exercise and can sometimes lead to interesting questions. Scatter plots are great for exploring new datasets and for finding patterns. In this tutorial you will learn how to get the data into this scatter plot, which maps data along two axes, X and Y.

The scatter plot we will make shows data about real estate transactions of recreational properties in most Norwegian counties. The Y axis shows the number of transactions, while the X axis shows the average price per county. Note! The counties in this dataset are the old Norwegian counties, not the new merged ones such as Viken and Vestland.

You can download the files here!

One note on the data: two counties are missing. Nord-Trøndelag and Sør-Trøndelag have merged into a single county, now simply called Trøndelag, and there is no fresh data from either of the old counties or from the new combined one.

Data binding, the really really important bit.

In order for this to work you must bind the data to the elements that are visualized on screen. Here is how that works with the included CSV file. The data file can be seen by clicking this link. When publishing data files in WordPress the file must be saved as a .txt file, but its content is still formatted as CSV, so it can be read as such. Here are ten rows of the data. There are about 12000 more rows in the full dataset.

"03 Akershus and Oslo","Total purchase price, transfer of title (NOK million)","2018K3",336
"03 Akershus and Oslo","Total purchase price, transfer of title (NOK million)","2018K4",156
"03 Akershus and Oslo","Total purchase price, transfer of title (NOK million)","2019K1",118
"03 Akershus and Oslo","Total purchase price, transfer of title (NOK million)","2019K2",231
"03 Akershus and Oslo","Total purchase price, transfer of title (NOK million)","2019K3",273
"03 Akershus and Oslo","Purchase price per transfer of title (NOK 1 000)","2000K1",655
"03 Akershus and Oslo","Purchase price per transfer of title (NOK 1 000)","2000K2",587
"03 Akershus and Oslo","Purchase price per transfer of title (NOK 1 000)","2000K3",593
"03 Akershus and Oslo","Purchase price per transfer of title (NOK 1 000)","2000K4",676
"03 Akershus and Oslo","Purchase price per transfer of title (NOK 1 000)","2001K1",823

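So let’s decode this with our heads first. Each line starts with a string containing a county number and a county name, in this case “03 Akershus and Oslo”. After that, separated by commas, come three more fields. The second field says what was measured, and the ten rows above contain two different measures. The third field is the quarter, for example “2018K3”, and the last field is the value itself.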
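To make the fields of a row concrete, here is a minimal sketch of how one line could be split by hand. The function name parseLine is hypothetical, and the property names region, contents, quarter and transfers are borrowed from the snippets later in this article. The actual script may well load the file differently, and this sketch does not handle escaped quotes inside a field.

```javascript
// Hypothetical helper: splits one line of the file into its four fields.
// Commas inside double quotes belong to the field and are not separators.
function parseLine(line) {
  const fields = [];
  let current = "";
  let inQuotes = false;
  for (const ch of line) {
    if (ch === '"') {
      inQuotes = !inQuotes;        // entering or leaving a quoted field
    } else if (ch === "," && !inQuotes) {
      fields.push(current);        // a separator outside quotes ends the field
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);            // the last field has no trailing comma
  return {
    region: fields[0],
    contents: fields[1],
    quarter: fields[2],
    transfers: fields[3]           // still a string at this point
  };
}

const row = parseLine('"03 Akershus and Oslo","Purchase price per transfer of title (NOK 1 000)","2000K1",655');
// row.region is "03 Akershus and Oslo", row.quarter is "2000K1", row.transfers is "655"
```

Note that the value comes out as a string, which is why it has to be converted to a number before it can be plotted.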

The script generates elements in the setup function and alters them in the reDraw function. The reDraw function is triggered when the screen size changes, so the redrawn visualization adapts to the new size. In the script you can see a for loop starting on line 258, inside the loadData function. It loops through the entire file and picks out the variables we need for the visualization. The aim is to compare, for each county, the total number of recreational properties sold on one axis with the average purchase price on the other. You can see it in action above. First we need an empty array to store the data we want:

var dataList = []; 

Then we need one object for each region to store the data in. To detect regions in the data file we set up a variable that holds the region of the current line. When this variable changes to a region that is not yet represented by an object, we make an object for that region where we can keep its data properties.

 var currentRegion = "dummyRegionName";

This dataset is organized by region, so we know that all data from one region comes consecutively. This simplifies the code. We name the variable ‘currentRegion’ and use it to check whether the region changes. It is given the value “dummyRegionName” just to have an initial value and avoid an error. The variable will be set to a real region on the first line.
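The consecutive-region idea can be tried out on its own with a few rows. This is only a sketch: the function name listRegions is invented for illustration, the second region in the sample rows is a made-up placeholder, and the code assumes that rows from the same region always sit next to each other.

```javascript
// Hypothetical sketch: collect one object per region from rows that are
// already grouped by region.
function listRegions(rows) {
  const regions = [];
  let currentRegion = "dummyRegionName";   // placeholder, never matches a real region
  for (let i = 0; i < rows.length; i++) {
    if (currentRegion !== rows[i].region) {
      // A new region starts on this row
      currentRegion = rows[i].region;
      regions.push({ region: currentRegion });
    }
  }
  return regions;
}

const sampleRows = [
  { region: "03 Akershus and Oslo" },
  { region: "03 Akershus and Oslo" },
  { region: "99 Example county" }          // made-up placeholder region
];
const found = listRegions(sampleRows);
// found holds two objects, one for each region
```

If the file were not sorted by region, this approach would create a second object whenever a region reappeared, and a lookup by region name would be needed instead.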

This is the line that checks for a new region:

 if (currentRegion != data[i].region){

So if currentRegion does not equal the region on the current line in the dataset, we do the following:

//The region has changed
ireg = i;
//sets currentRegion
currentRegion = data[ireg].region;
//This cuts away a number from the region text string so we only get the region name
var regionString = currentRegion.substring(3);
//This creates a new object for the region and gives it one property
var dObj = {region: regionString};
//The region object is pushed into an array of region objects
 dataList.push(dObj);

The region string starts with a number and a blank space. To get just the name we can use substring(3). This returns a piece of the original string, starting at the fourth character (index 3, since counting starts at zero). In this case the original string is “03 Akershus and Oslo” and substring(3) gives “Akershus and Oslo”.

You can pass a second number to the substring method to say where the piece should stop; the character at that index is not included. In this case we simply want everything from the fourth character onward, so one number is enough.

The script then creates an object called dObj to store the values we want, and pushes that object into the dataList array. Now we can start fishing for the data we want. This if statement checks two variables on each line:
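A few calls show the difference between the one-number and two-number forms. The variable names are only for illustration and do not appear in the script.

```javascript
const original = "03 Akershus and Oslo";

// One argument: everything from index 3 (the fourth character) to the end
const nameOnly = original.substring(3);        // "Akershus and Oslo"

// Two arguments: from index 0 up to, but not including, index 2
const countyNumber = original.substring(0, 2); // "03"

// Two arguments: from index 3 up to, but not including, index 11
const firstWord = original.substring(3, 11);   // "Akershus"
```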

 if (data[i].quarter == "2018K3" && data[i].contents == "Transfers, total"){

It checks for a certain quarter and also if the value is what we want. The logical operator && means that both conditions must be met for the script to execute the code within the curly brackets.

 data[i].transfers = parseInt(data[i].transfers, 10);
 //Both conditions must hold: the value is a real number and it is not zero
 if (!isNaN(data[i].transfers) && data[i].transfers != 0){
     dObj.yVal = data[i].transfers;
  } else {
     dObj = null;
  }

This code first runs a parseInt operation to ensure we are dealing with a number and not a string. Then there is another if statement, which checks for missing values in the dataset. When parseInt cannot read a number it returns NaN (“not a number”), and isNaN detects that; the ! in front means “not”. The second condition uses the not equal to operator, !=, to rule out zero. The two conditions are joined with && rather than the either-or operator ||, because with || the test would pass for every value: anything is different from at least one of zero and a missing value. If we have an actual value it is stored in the yVal property of dObj, which means that this value will place the visual marker on the y axis. If the value is missing or zero, on the other hand, dObj is set to null, which means nothing. So now we have the code for the values on the y axis.
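A missing-value check is easy to get subtly wrong, so it can help to try it in isolation. The helper below, hypothetically named isUsableValue, is a sketch and not part of the script. It relies on two facts about JavaScript: parseInt returns NaN when it cannot read a number, and NaN has to be detected with isNaN because it is not equal to anything, not even itself.

```javascript
// Hypothetical helper: true only for a real, non-zero number.
function isUsableValue(raw) {
  const value = parseInt(raw, 10);   // base 10; gives NaN for "" and for null
  return !isNaN(value) && value !== 0;
}

isUsableValue("336");   // true
isUsableValue("0");     // false, zero transfers
isUsableValue("");      // false, nothing to parse
isUsableValue(null);    // false, parseInt(null, 10) is NaN
```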

Setting the value on the x axis is very similar:

if (data[i].quarter == "2018K3" && data[i].contents == "totPrice"){
    data[i].transfers = parseInt(data[i].transfers, 10);
    if (dObj){
        dObj.xVal = data[i].transfers;
    }
    //parseInt gives NaN, not null, when the value is missing
    if (data[i].transfers == 0 || isNaN(data[i].transfers)){
        dataList.pop();
        dObj = null;
    }
}

This does the same thing, but rather than fishing for “Transfers, total” it looks for “totPrice”. It also removes dObj from the dataList array if the value is zero or missing. That way there is no element to visualize in the next stage, and cleaning the data here avoids display errors later. If you simply want to use the component you are almost there. There is one more thing to adapt to your specific dataset.

The tooltip must make sense for the user

We have the data, but it is important to remind the user what the data represents. The code related to the tooltip starts on line 165. To change the visible text of the tooltip, go to lines 186 and 187 and alter the text to suit your needs. Make sure you don’t mess up the formatting: removing any of the characters “ , ( { and so on can cause a big mess.

If you want to display more lines in the tooltip you can simply add them along with additional variables. In that case you might also want to adjust the negative ySwitch value on line 175. This value moves the tooltip further up when the mouse is below a certain position. A taller tooltip will require a bigger negative value. Also be aware that the tooltip can change shape on narrow screens, where its text wraps onto more lines.
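The script's own tooltip code is not reproduced here, but the idea of moving the tooltip up can be pictured with a small sketch. Everything in it is an assumption made for illustration: the function name tooltipY, the parameter names other than ySwitch, and all the numbers.

```javascript
// Hypothetical sketch of moving a tooltip above the pointer near the bottom.
// mouseY:   vertical pointer position in pixels, measured from the top
// flipLine: position below which the tooltip should move up
// ySwitch:  negative offset applied when the tooltip moves up
function tooltipY(mouseY, flipLine, ySwitch) {
  if (mouseY > flipLine) {
    return mouseY + ySwitch;   // ySwitch is negative, so this moves the tooltip up
  }
  return mouseY;               // enough room below, keep the tooltip at the pointer
}

tooltipY(100, 300, -80);   // 100, the pointer is high on the screen
tooltipY(350, 300, -80);   // 270, the pointer is low, so the tooltip moves up by 80
```

In a setup like this, a tooltip with an extra line of text would need a larger negative offset, for instance -120 rather than -80, to stay clear of the pointer.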

If you have managed to adapt the component as described, you can use it. There are further ways to adapt it, such as altering colours, but those are not covered in this article.