Course Syllabus

What you’ll learn

Section 1: Data in R
Identify the components of RStudio
Identify the subjects and types of variables in R
Summarise and visualise univariate data, including histograms and box plots

Section 2: Visualising relationships
Produce plots in ggplot2 in R to illustrate the relationship between pairs of variables
Understand which type of plot to use for different variables
Identify methods to deal with large datasets

Section 3: Manipulating and joining data
Organise different data types, including strings, dates and times
Filter subjects in a data frame, select individual variables, group data by variables and calculate summary statistics
Join separate data frames into a single data frame
Learn how to implement these methods in MapReduce

Section 4: Transforming data and dimension reduction
Transform data so that it is more appropriate for modelling
Use various methods to transform variables, including q-q plots and Box-Cox transformation, so that they are distributed normally
Reduce the number of variables using PCA
Learn how to apply these techniques when modelling data with linear models

Section 5: Summarising data
Estimate model parameters, both point and interval estimates
Differentiate between the statistical concepts of parameters and statistics
Use statistical summaries to infer population characteristics
Utilise strings
Learn about k-mers in genomics and their relationship to perfect hash functions as an example of text manipulation

Section 6: Introduction to Java
Use complex data structures
Implement your own data structures to organise data
Explain the differences between classes and objects
Motivate object-orientation

Section 7: Graphs
Encode directed and undirected graphs in different data structures, such as matrices and adjacency lists
Execute basic algorithms, such as depth-first search and breadth-first search

Section 8: Probability
Determine the probability of events occurring when the probability distribution is discrete
How to approximate

Section 9: Hashing
Apply hash functions on basic data structures in Java
Implement your own hash functions and execute these as well as built-in ones
Differentiate good from bad hash functions based on the concept of collisions

Section 10: Bringing it all together
Understand the context of big data in programming