Scatter plot with regular intervals

A scatter plot is great for visualizing correlations in datasets. Some datasets have measurements taken at regular intervals, which results in the data being aligned along one axis. This typically happens when one axis is time-based. If such a dataset is visualized in a scatter plot, the result can be overlapping data points that are hard to read. This article demonstrates a scatter plot where the markers are flat, wide and transparent to make the chart more readable.

NOTE: Please contact me if you wish to use this component. It works, but it is still in development, and several variables have to be adjusted manually to make it show data correctly. This will be fixed in future updates.

This is the graph. It shows the number of people, categorized by education level, who went to the opera in certain years. You can hover over the markers to see the data.

A scatter plot is, in my opinion, often a more correct way to visualize data than a curve graph. Drawing a line between two data points is a powerful visual statement that can be misleading. Where a scatter plot draws each measurement as an individual point, a curve graph implies that everything between the two measurements follows a straight or curved line. The truth is that there is no measurement between the two points, and we have no clue what the value could be. The Norwegian political party Høyre (the Conservative Party) used a highly misleading graph to brag about cuts in CO2 emissions during their time in government. They simply cut out some data points showing peak emissions during that period so they could display a nice descending curve.

Using a scatter plot instead would simply have shown the removed data as missing.

Let’s get on to this scatter plot with flat markers. When a scatter plot has regular intervals, the data points are aligned vertically, and this can make a scatter plot with circle markers hard to read. To address this, here is a scatter plot that uses flat markers, allowing more points to be stacked vertically.

This scatter plot is still a bit rough around the edges and requires a fair bit of customization to suit each dataset. It can still be used, and when the adjustments below are made it shows the data well.

One such adjustment is that the width and x positions of the flat markers have to be set manually to fit the size of the dataset. In the provided example, each flat marker's width is set to 5% of the width of the entire graph. This also means that the x position of each marker must be shifted by negative half of the marker's width, so that the marker is centered on its data point. This is done on line 280 like this:

    .attr("x", function(d){ return(-(width / 40) + xScale(d.year))})

Because each marker is 5% of the total width, the line above shifts the x position by the negative of the width divided by 40, which equals 2.5% of the total width. This will be handled in a better way once this component reaches a more mature stage of development.
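One way to avoid keeping the 5% and the divisor 40 in sync by hand is to derive both the marker width and the offset from a single fraction. The following is only a minimal sketch of that idea, not the component's actual code: the helper name flatMarkerGeometry and its parameters are hypothetical, and it assumes the marker should stay centered on the scaled x value.

```javascript
// Hypothetical helper (not part of the component): derives the marker width
// and the centered x position from one fraction of the chart width.
function flatMarkerGeometry(chartWidth, widthFraction) {
  var markerWidth = chartWidth * widthFraction;
  return {
    markerWidth: markerWidth,
    // Shift left by half the marker width so the marker is centered
    // on the scaled x value (the same effect as -(width / 40) for 5%).
    x: function (scaledX) {
      return scaledX - markerWidth / 2;
    }
  };
}

// Possible usage with the names from the line above (width, xScale, d.year):
//   var geom = flatMarkerGeometry(width, 0.05);
//   .attr("width", geom.markerWidth)
//   .attr("x", function (d) { return geom.x(xScale(d.year)); })
```

With this approach, changing the fraction to suit a larger or smaller dataset would update the width and the offset together.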

Another variable that has to be manually adjusted is the ySwitch variable starting on line 227. This variable adjusts the y position of the tooltip. There are several values to take into consideration. First of all, the height of the tooltip might vary depending on the dataset. As it is now, the text in the tooltip is generated from all data points from all years within the same category. As a result, the height of the tooltip changes, because each year adds a new line to the tooltip. Another adjustment is whether to place the tooltip above or below the marker. If the marker is high on the chart, the tooltip should be placed below the marker, and the opposite if the marker is towards the bottom of the chart. This is done on line 233.

NOTE: If the tooltip's y position makes it cover the marker, it will block the mouse event that makes the tooltip appear and that turns the relevant markers red.

    if (d3.event.pageY < 210) {
        ySwitch = 20;
    }

The ySwitch variable is adjusted if the pointer is further up on the screen, placing the tooltip below the mouse pointer.
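The placement rule could also be written as a small function that takes the tooltip height into account, so the offset does not need to be retuned when the number of tooltip lines changes. This is a rough sketch under assumptions, not the component's code: the function name tooltipYOffset and its parameters are hypothetical, and the "place above" branch is a guess at the default behavior, since only the threshold check is shown above.

```javascript
// Hypothetical helper (not part of the component): returns a vertical offset
// for the tooltip relative to the pointer's y position.
function tooltipYOffset(pointerY, tooltipHeight, threshold, gap) {
  if (pointerY < threshold) {
    // Pointer is near the top of the page: place the tooltip below it.
    return gap;
  }
  // Assumption: otherwise place the tooltip above the pointer, moved up by
  // its full height plus a gap so it does not cover the marker.
  return -(tooltipHeight + gap);
}

// Possible usage inside the mouse handler, with the values from the snippet above:
//   ySwitch = tooltipYOffset(d3.event.pageY, tooltipHeight, 210, 20);
```

Keeping a gap between the tooltip and the pointer in both branches is meant to avoid the situation described in the note above, where the tooltip covers the marker and blocks the hover event.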