For this project, you will choose an interesting dataset and analyze it using the statistical techniques learned in class. You will present your findings to the class at the end of the semester and display them on a webpage that can be used as part of a portfolio. All code must be uploaded to GitHub.

The Dataset:

Choose any interesting dataset that meets these criteria:

Sources for finding your dataset:

GitHub

GitHub is a free, widely used website for storing code, with a focus on tracking changes and encouraging collaboration. Students can apply for free private repositories.
GitHub resources:

Project Presentation: Thursday 6 December in class (slides due by 10am on Blackboard)

Each person will give a 2-minute presentation about their project using slides. The presentation should introduce the dataset and describe the results of at least one of the analyses. You may use 2 or 3 slides in PDF format. The first slide should be a title slide with the title of your project and your name. Slides are due on Blackboard by 10am on Dec. 6, 2018.

Project milestones:

  1. Thursday 13 September: Submit your GitHub username, setting up an account if necessary. If you wish to keep your code private, please add me (megan073) to the repository or project. If you create a GitHub project for your code, also submit the project name.
  2. Thursday 20 September: Add your dataset to GitHub in a folder called 'data'. Create a webpage for your project, and on it write (i) a paragraph describing your data and citing its source, and (ii) 3-5 questions that you hope to answer with the data. (The questions are just to get you thinking about your data. You do not necessarily need to answer them during the project.) The webpage can be created with GitHub Pages or on any other service, like Google Sites.
  3. Thursday 27 September: In R, compute the mean and standard deviation for at least three numerical data columns. You will need to use read.csv to read your data file into an R data frame. Post your code on GitHub, and post the means and standard deviations on your webpage. You do not need to upload anything to Blackboard unless you have not yet submitted your webpage or GitHub username.
  4. Thursday 4 October: For each numerical data column, make one or two plots of the data, using the graph(s) that best represent the data. Post the code on GitHub and the graphs on your webpage. Write a paragraph summarizing the main features of the data distributions. If you say that a distribution is skewed or shows notable kurtosis, confirm this with quantitative calculations (include them in your GitHub code).
  5. Thursday 15 November (moved from 25 October): Create a bar chart or a Pareto chart of your qualitative data column. Upload your code to GitHub, and post the graph on your webpage, along with a short paragraph describing what you notice in the graph. Submit something on Blackboard (either the chart, or a comment that you did it), so that I know to check your GitHub and webpage.
  6. Thursday 1 November: Compute confidence intervals for the means of your three numerical data columns. You may choose the confidence level. Upload your code to GitHub, and post the confidence intervals on your webpage. Submit something on Blackboard (either the intervals, or a comment that you did it), so that I know to check your GitHub and webpage.
  7. Thursday 29 November (moved from 15 November): Decide on and test 3 hypotheses about your data. You can choose the alpha. Upload your code to GitHub. On your webpage, write a paragraph describing what you tested and summarizing the results of your tests. Submit something on Blackboard (either the results, or a comment that you did it), so that I know to check your GitHub and webpage.
  8. Thursday 29 November: Perform a statistical analysis of your choice on your data. This can be computing a regression line, computing one of the previous analyses on a subset of the data, an analysis of variance (ANOVA) test, or any other statistical test. Write a paragraph briefly describing the analysis and the results on your website and upload the code to GitHub. Submit a sentence on Blackboard describing the analysis.
  9. Thursday 13 December: Complete project due. You may include up to three additional analyses for extra credit. For each analysis, write a paragraph briefly describing the analysis and the results on your website and upload the code to GitHub. If you made a bar or Pareto chart for Milestone 5, this will count as one of the three analyses. On Blackboard, submit a list of the additional analyses.
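Milestones 3, 4 and 6 ask for means and standard deviations, quantitative checks of skewness and kurtosis, and confidence intervals for the means. The R code below is a minimal sketch of those calculations using only base R. It assumes R's built-in `mtcars` data frame as a stand-in for your own file; the path `data/your_data.csv`, the column names `mpg`, `hp` and `wt`, and the helper names `skewness` and `excess_kurtosis` are placeholders, not part of the assignment. The helpers use the simple moment-based formulas; packages such as `moments` or `e1071` offer similar functions, some with small-sample corrections or without subtracting 3 from kurtosis, so their numbers may differ slightly.

```r
# Hypothetical path: with your own data you would use something like
# dat <- read.csv("data/your_data.csv")
dat <- mtcars  # built-in dataset, used here only as a stand-in

cols <- c("mpg", "hp", "wt")  # three numerical columns (placeholder names)

# Milestone 3: mean and standard deviation of each column
means <- sapply(dat[cols], mean, na.rm = TRUE)
sds   <- sapply(dat[cols], sd,   na.rm = TRUE)

# Milestone 4: moment-based sample skewness (0 for a symmetric sample)
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Milestone 4: moment-based excess kurtosis (about 0 for normal data)
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

skews <- sapply(dat[cols], skewness)
kurts <- sapply(dat[cols], excess_kurtosis)

# Milestone 6: t-based interval, mean +/- t(1 - alpha/2, n - 1) * s / sqrt(n)
conf_level <- 0.95
ci <- t(sapply(dat[cols], function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  margin <- qt(1 - (1 - conf_level) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(lower = mean(x) - margin, upper = mean(x) + margin)
}))

# One row per column: mean, sd, skewness, excess kurtosis, CI bounds
summary_table <- data.frame(mean = means, sd = sds, skewness = skews,
                            excess_kurtosis = kurts, ci)
print(round(summary_table, 3))
```

Calling `t.test(x, conf.level = 0.95)$conf.int` returns the same interval and is a convenient way to check a hand-computed one.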
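For Milestone 5, a Pareto chart is a bar chart whose categories are sorted from most to least frequent, usually drawn with a cumulative line. Below is a minimal base-R sketch. It again uses `mtcars` as a stand-in and treats its `cyl` column as the qualitative column; both are hypothetical choices for illustration only.

```r
dat <- mtcars  # stand-in; replace with your own data frame from read.csv

# Count each category and sort from most to least frequent
counts <- sort(table(dat$cyl), decreasing = TRUE)
total <- sum(counts)
cum_pct <- 100 * cumsum(counts) / total

# Leave room on the right for a second (percentage) axis
par(mar = c(5, 4, 4, 4) + 0.1)

# Bars on the left axis; the y-range runs up to the total count so the
# cumulative line fits on the same scale
mids <- barplot(counts, ylim = c(0, total),
                xlab = "Cylinders", ylab = "Count",
                main = "Pareto chart of cylinder count")

# Cumulative counts as a line, labeled in percent on the right-hand axis
lines(mids, cumsum(counts), type = "b", pch = 16)
axis(4, at = seq(0, total, length.out = 5),
     labels = paste0(seq(0, 100, by = 25), "%"))

print(round(cum_pct, 1))
```

For a plain bar chart, `barplot(table(dat$cyl))` is enough; sorting the counts and adding the cumulative line is what turns it into a Pareto chart.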
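For Milestones 7 and 8, base R's `t.test`, `aov` and `lm` cover several common choices. The sketch below runs on the `mtcars` stand-in. The three hypotheses (mean `mpg` equals 20; mean `mpg` is the same for both values of `am`; mean `mpg` is the same across `cyl` groups), the alpha of 0.05, and the regression of `mpg` on `wt` are illustrative assumptions, not required tests.

```r
dat <- mtcars  # stand-in; replace with your own data frame from read.csv
alpha <- 0.05  # chosen significance level (your choice per Milestone 7)

# Hypothesis 1: one-sample t-test, H0: mean mpg = 20
one_sample <- t.test(dat$mpg, mu = 20)

# Hypothesis 2: Welch two-sample t-test, H0: mean mpg equal for am = 0 and am = 1
two_sample <- t.test(mpg ~ am, data = dat)

# Hypothesis 3: one-way ANOVA, H0: mean mpg equal across cylinder groups
anova_fit <- aov(mpg ~ factor(cyl), data = dat)
anova_p <- summary(anova_fit)[[1]][["Pr(>F)"]][1]

# Milestone 8 option: simple linear regression of mpg on weight
reg <- lm(mpg ~ wt, data = dat)

# Compare each p-value with alpha to reach a decision
p_values <- c(one_sample = one_sample$p.value,
              two_sample = two_sample$p.value,
              anova = anova_p)
decisions <- ifelse(p_values < alpha, "reject H0", "fail to reject H0")
print(data.frame(p_value = signif(p_values, 3), decision = decisions))
print(coef(reg))  # intercept and slope of the regression line
```

`summary(reg)` additionally reports R-squared and a test of whether the slope differs from zero, which can be summarized in the paragraph on your webpage.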