Advanced Worksheet

Package and Data Loading

As mentioned in the session setup, load the following packages using the library() function. Additionally, as we will be using a data set with large numbers, set scipen to 999 using the options() function.


  options(scipen = 999)
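The package list is not reproduced on this sheet. As a minimal sketch, assuming the session relies on ggplot2 for plotting and readr for read_csv() (both are assumptions based on the functions used later), the loading step could look like this:

```r
# Assumed packages: ggplot2 provides ggplot(); readr provides read_csv().
# Install them first with install.packages() if they are missing.
library(ggplot2)
library(readr)
```

Loading the tidyverse package instead would attach both of these in one call.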

As mentioned previously, this session uses a sample of the Online Gaming Anxiety Data. This subset is available as part of the .zip file downloadable from the GitHub repo. It can then be loaded using the following script:

gamingdata_samp5000 <- read_csv("data/gamingdata_samp5000.csv")

Section 1: ggplot2 vs plot

Exercise 1: Plot SPIN_T against Age using both the plot() and ggplot() functions, then discuss which has more potential for displaying data clearly.

Exercise 2: Expand the plot to group these points by Gender. Which approach provides more information, and which is easier to achieve? Remember, you’ll need to recode gamingdata_samp5000$Gender as a factor using the function:

gamingdata_samp5000$Gender <- as.factor(gamingdata_samp5000$Gender)
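To compare the two approaches, here is a rough sketch. It uses a small synthetic data frame called toy (a hypothetical stand-in with made-up values, not the real gaming data) so that it runs without the .csv file; swap in gamingdata_samp5000 for the actual exercise.

```r
library(ggplot2)

# Synthetic stand-in for gamingdata_samp5000 (all values are made up).
set.seed(1)
toy <- data.frame(
  Age    = sample(18:40, 60, replace = TRUE),
  SPIN_T = sample(0:68, 60, replace = TRUE),
  Gender = rep(c("Female", "Male", "Other"), times = c(25, 25, 10))
)
toy$Gender <- as.factor(toy$Gender)

# Base R: colours come from the factor codes and the legend is added by hand.
plot(toy$Age, toy$SPIN_T, col = toy$Gender, xlab = "Age", ylab = "SPIN_T")
legend("topright", legend = levels(toy$Gender),
       col = seq_along(levels(toy$Gender)), pch = 1)

# ggplot2: mapping colour to Gender builds the legend automatically.
p <- ggplot(data = toy, mapping = aes(x = Age, y = SPIN_T, colour = Gender)) +
  geom_point()
print(p)
```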

Section 2: Basic Plotting Using ggplot

Exercise 3: Change the size parameter to Hours to see if there is a trend between the number of hours played, age and total SPIN score.

Exercise 4: Change the geom from geom_point() to geom_jitter() and set the alpha parameter to 0.5 to see the relationship more clearly.

Exercise 5: Change the x and y axis labels and the title to something suitable for the plot, using the labs() addition.
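One possible shape for the finished Section 2 plot is sketched here. The toy data frame is a hypothetical stand-in with made-up values, and the label text is only a suggestion.

```r
library(ggplot2)

# Synthetic stand-in for gamingdata_samp5000 (all values are made up).
set.seed(2)
toy <- data.frame(
  Age    = sample(18:40, 80, replace = TRUE),
  SPIN_T = sample(0:68, 80, replace = TRUE),
  Hours  = sample(1:60, 80, replace = TRUE),
  Gender = factor(rep(c("Female", "Male", "Other"), times = c(35, 35, 10)))
)

p <- ggplot(toy, aes(x = Age, y = SPIN_T, colour = Gender, size = Hours)) +
  geom_jitter(alpha = 0.5) +                  # jitter and transparency reduce overplotting
  labs(x = "Age (years)",
       y = "Total SPIN score",
       title = "SPIN score by age, gender and hours played")
print(p)
```

Because alpha is a fixed value rather than a variable, it sits outside aes().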

Section 3: Different geoms, different plots

Exercise 6: Using geom_bar(), plot the number of cases in each gender.

Exercise 7: Expand upon the geom_bar() plot to show the average (mean) number of Hours played by each gender.
Hint: Make sure you add stat = "summary" to your geom_bar() function.

Exercise 8: Expanding again on the plot created in Exercise 6, convert the Platform variable into a factor with as.factor() (as you did for Gender), then set the fill value to Platform.
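The three bar plots in this section can be sketched as follows, again on a hypothetical toy data frame with made-up values. The Platform conversion mirrors the Gender recode shown earlier; fun = "mean" is added here to make the summary function explicit, which is an addition to the hint above.

```r
library(ggplot2)

# Synthetic stand-in for gamingdata_samp5000 (all values are made up).
set.seed(3)
toy <- data.frame(
  Gender   = factor(rep(c("Female", "Male", "Other"), times = c(40, 40, 10))),
  Platform = rep(c("PC", "Console", "Mobile"), length.out = 90),
  Hours    = sample(1:60, 90, replace = TRUE)
)

# Convert Platform into a factor, as done for Gender.
toy$Platform <- as.factor(toy$Platform)

# Exercise 6 style: count of cases per gender.
p_count <- ggplot(toy, aes(x = Gender)) + geom_bar()

# Exercise 7 style: mean Hours per gender.
p_mean <- ggplot(toy, aes(x = Gender, y = Hours)) +
  geom_bar(stat = "summary", fun = "mean")

# Exercise 8 style: counts per gender, bars split by Platform.
p_fill <- ggplot(toy, aes(x = Gender, fill = Platform)) + geom_bar()

print(p_fill)
```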

Exercise 9: By adding the appropriate limits function (xlim(), ylim() or lims()) to your ggplot() code, set the upper limit of your plot to 5,000.
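As a small illustration of the limits functions, the sketch below applies ylim() to a count bar plot. It assumes the limit belongs on the y (count) axis and uses a hypothetical toy data frame with made-up values.

```r
library(ggplot2)

# Synthetic stand-in for gamingdata_samp5000 (all values are made up).
toy <- data.frame(
  Gender = factor(rep(c("Female", "Male", "Other"), times = c(40, 40, 10)))
)

# ylim(0, 5000) fixes the y axis from 0 to 5,000.
# lims(y = c(0, 5000)) is an equivalent way of writing the same limit.
p_lim <- ggplot(toy, aes(x = Gender)) +
  geom_bar() +
  ylim(0, 5000)
print(p_lim)
```

Note that these functions drop any data falling outside the limits, so keep the lower limit at 0 for bar plots.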

Section 4: Customization Part 1

Exercise 10: Using all the customization skills we have already discussed (labs(), lims() and geoms), examine the relationship between an individual's Age, number of Hours playing, their employment and what platform they use. Once completed, add one of the following themes to the end of your code. Which looks best (to you)?

  • theme_bw()
  • theme_minimal()
  • theme_void()
  • theme_dark()
  • theme_classic()

Hint: Consider using the mapping aesthetics such as colour, size, shape, alpha etc, to map the categorical variables.
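A rough sketch of how a theme is added is given here. The toy data frame and the employment column name Work are assumptions made for illustration (check the column names in your own data), and theme_minimal() is just one of the listed options.

```r
library(ggplot2)

# Synthetic stand-in for gamingdata_samp5000 (all values are made up;
# the column name Work is an assumption for the employment variable).
set.seed(4)
toy <- data.frame(
  Age      = sample(18:40, 90, replace = TRUE),
  Hours    = sample(1:60, 90, replace = TRUE),
  Work     = factor(rep(c("Employed", "Student", "Unemployed"), each = 30)),
  Platform = factor(rep(c("PC", "Console", "Mobile"), length.out = 90))
)

p <- ggplot(toy, aes(x = Age, y = Hours, colour = Work, shape = Platform)) +
  geom_jitter(alpha = 0.6) +
  labs(x = "Age (years)", y = "Hours played per week",
       title = "Hours played by age, employment and platform") +
  theme_minimal()   # themes go at the end of the plot code
print(p)
```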

Section 5: Layering different geoms

Exercise 11: Using the code developed in Section 2 (Age, Hours and total SPIN score), add an additional geom_jitter() layer which plots the GAD score.

Exercise 12: Expanding on the code from Exercise 11, replace the colour variable (currently Gender) with a colour to differentiate between SPIN and GAD. You can add back in Gender as a shape variable (see here for options).

Exercise 13: A direct swap between the variable Gender and a colour ("blue" or "red") does not produce a usable legend. As such, remove the colour parameter from the mapping = aes() component.

Exercise 14: Although this corrects the colours, there is still no usable legend. To add a colour-specific legend, you must return the colour = parameter to inside mapping = aes() and add scale_colour_identity(). To apply this, use the given template to specify what the colours indicate. Additionally, correct the axis names so they represent the contents of the plot.
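The layering and legend steps from Exercises 11 to 14 could be sketched as follows. This is not the worksheet's own template: the toy data frame is a hypothetical stand-in with made-up values, GAD_T is an assumed column name for the GAD score, and the legend labels are suggestions.

```r
library(ggplot2)

# Synthetic stand-in for gamingdata_samp5000 (all values are made up;
# GAD_T is an assumed name for the total GAD score column).
set.seed(5)
toy <- data.frame(
  Age    = sample(18:40, 80, replace = TRUE),
  SPIN_T = sample(0:68, 80, replace = TRUE),
  GAD_T  = sample(0:21, 80, replace = TRUE),
  Hours  = sample(1:60, 80, replace = TRUE),
  Gender = factor(rep(c("Female", "Male", "Other"), times = c(35, 35, 10)))
)

p <- ggplot(toy, aes(x = Age, size = Hours, shape = Gender)) +
  # One layer per score; the colour names inside aes() are used literally
  # because of scale_colour_identity() below.
  geom_jitter(aes(y = SPIN_T, colour = "blue"), alpha = 0.5) +
  geom_jitter(aes(y = GAD_T, colour = "red"), alpha = 0.5) +
  scale_colour_identity(name = "Measure",
                        breaks = c("blue", "red"),
                        labels = c("SPIN", "GAD"),
                        guide = "legend") +
  labs(x = "Age (years)", y = "Total score")
print(p)
```

By default scale_colour_identity() draws no legend, which is why guide = "legend" is set explicitly.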

Exercise 15: Colours and legends can also be used for continuous variables. Return to just a single variable (SPIN_T), and specify the colour parameter as the hours variable (Hours). Is this clearer than when Hours was specified as the size variable?

Exercise 16: To improve the clarity of the scale, you can use the function scale_colour_gradient() to specify the colours of the low and high points as well as any specific breaks. Add this function to the previous plot (Ex. 15), using the template provided. Think about which breaks would work well within this function.
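A minimal sketch of a continuous colour scale follows. It is not the worksheet's own template; the toy data frame holds made-up values, and the colours and breaks are placeholder choices to experiment with.

```r
library(ggplot2)

# Synthetic stand-in for gamingdata_samp5000 (all values are made up).
set.seed(6)
toy <- data.frame(
  Age    = sample(18:40, 80, replace = TRUE),
  SPIN_T = sample(0:68, 80, replace = TRUE),
  Hours  = sample(1:60, 80, replace = TRUE)
)

p <- ggplot(toy, aes(x = Age, y = SPIN_T, colour = Hours)) +
  geom_jitter(alpha = 0.5) +
  # low and high set the ends of the gradient; breaks set the legend ticks.
  scale_colour_gradient(low = "yellow", high = "red",
                        breaks = c(0, 20, 40, 60))
print(p)
```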

Exercise 17: An alternative to layering multiple plots on top of each other is to use facet_wrap() or facet_grid() to produce graphs side by side. Using a faceting function, you can split the graphs up by a specific variable. Try using facet_wrap() to split the graph produced in Ex.14, by Gender.

Exercise 18: Alternatively, you can split by a factor not included in the graph at the point of creation. This time, try using facet_grid() to split the same graph by Platform.
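Both faceting functions can be sketched on a simple scatter plot. The toy data frame is a hypothetical stand-in with made-up values; apply the same additions to your own plot.

```r
library(ggplot2)

# Synthetic stand-in for gamingdata_samp5000 (all values are made up).
set.seed(7)
toy <- data.frame(
  Age      = sample(18:40, 90, replace = TRUE),
  SPIN_T   = sample(0:68, 90, replace = TRUE),
  Gender   = factor(rep(c("Female", "Male", "Other"), times = c(40, 40, 10))),
  Platform = factor(rep(c("PC", "Console", "Mobile"), length.out = 90))
)

base <- ggplot(toy, aes(x = Age, y = SPIN_T, colour = Gender)) +
  geom_jitter(alpha = 0.5)

# One panel per gender, wrapped into rows and columns as needed.
p_wrap <- base + facet_wrap(~ Gender)

# One column of panels per platform, a variable not mapped in the plot itself.
p_grid <- base + facet_grid(cols = vars(Platform))

print(p_wrap)
print(p_grid)
```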

Section 6: Publishing Plots from ggplot2

Exercise 19a: Using the function ggsave(), export your last plot (using the parameter last_plot()). Use the template provided to structure your export.

Exercise 19b: Alternatively, you can assign a specific graph to a variable (for example, "plot_to_print") and pass it in place of last_plot(). Additionally, you can change parameters such as the height and width, dpi, scale and type of export accordingly. Try to export the plot you made for Ex.16 as a JPEG (.jpeg), with a dpi of 500 and a size of 5 cm x 10 cm.
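An export call along these lines could look like the sketch below. It is not the worksheet's own template: the toy data, the output location (a temporary folder) and the reading of 5 cm x 10 cm as width by height are all assumptions.

```r
library(ggplot2)

# Synthetic stand-in for gamingdata_samp5000 (all values are made up).
set.seed(8)
toy <- data.frame(
  Age    = sample(18:40, 50, replace = TRUE),
  SPIN_T = sample(0:68, 50, replace = TRUE)
)

plot_to_print <- ggplot(toy, aes(x = Age, y = SPIN_T)) + geom_point()

# Hypothetical output path; replace with a path inside your project folder.
out_file <- file.path(tempdir(), "plot_to_print.jpeg")

# plot = last_plot() is the default; here a stored plot is passed instead.
ggsave(filename = out_file, plot = plot_to_print, device = "jpeg",
       width = 5, height = 10, units = "cm", dpi = 500)
```

Leaving out the plot argument saves whatever last_plot() returns, which matches Exercise 19a.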

Exercise 20: Alternatively, you can also use the export function in the RStudio UI to export your plots. To do this, click Export in your Plots tab (if plots are produced there) and export accordingly. If you are using the accompanying worksheet (which is a .R script), try saving any of your previous plots as .pdf files following the instructions in RStudio.