CMP 464/CMP 788:
Topics Course: Data Science
A project is required for the course. You are encouraged to work in teams of up to 3 people. For the project, you choose a topic and a question or set of related questions that you would like to address.
The project specifications below are for undergraduates (CMP 464 students). Graduate students (CMP 788 students) must complete these project specifications and, in addition, either (i) use a third data set in their analysis in a meaningful way or (ii) analyze their two data sets in at least two different meaningful ways.
The project is broken down into smaller pieces that must be submitted by the deadlines below. For details of each milestone, see the links. The project is worth 20% of the final grade. The point breakdown is listed in the right-hand column.
| Deadline | Milestone | Points |
|---|---|---|
| Monday, April 3, 11:59pm | Proposal | 10 |
| Wednesday, April 19, 11:59pm | Timeline | 10 |
| Monday, April 24, 11:59pm | Data Collection | 20 |
| Monday, May 1, 11:59pm | Analysis | 20 |
| Monday, May 8, 11:59pm | Visualization | 20 |
| Monday, May 15, 11:59pm | Draft Presentation Slide | 10 |
| Monday, May 15, 11:59pm | Complete Project | 75 |
| Tuesday, May 16, noon | Updated Presentation Slide | 10 |
| Wednesday, May 17, in class | Project Presentations | 25 |
For the proposal milestone, you must submit a short statement that includes:
- Other team member names (if any),
- Proposed topics or questions you would like to address,
- Why you are interested in this topic (in your own words; if you submit the same text as another team member, the points will be split between you),
- If part of a team, what you plan to contribute to the effort, that is, on what particular aspect of the project you will take the lead,
- What data sets you plan to use (must use at least 2 distinct data sets per team member in meaningful ways in your project), and
- What techniques and tools you are thinking about using.
Note: teams are not required. Think carefully about team formation since your grade will be an average of all of your efforts, and significantly more overall work is required for team projects.
For the timeline milestone, you must submit your plan of attack to complete this project on time, including what you will have completed by the check-ins for Data Collection, Analysis, and Visualization. You should view the timeline as a contract that specifies the "deliverables" at each milestone.
For the data collection milestone, you must submit:
- a list (with links) of all data sources used,
- for each data source:
  - a description of what you are using from that source and why,
  - how you extracted the data from the source (i.e., detail how you downloaded or scraped the data, as well as any file processing needed), and
  - any issues you had or foresee with the analysis of the data.
For the analysis milestone, you must submit:
- the results of the initial analysis for each data set, and
- details of at least one data set per group member (that is, each group member submits the details of a different data set used).
For the visualization milestone, you must submit:
- draft images that you plan to use in your project presentation, and
- a detailed description of the creation of at least one image per group member (that is, each group member submits the details of a different image from the project).
The presentation on the last day of class has two parts. The first part consists of a "sneak preview" of your project, in which your group speaks for 90 seconds about what you did. After every group has given its sneak preview, each group will display its project on a lab computer (see below for more details).
For the sneak preview, every group submits 2 slides:
- Slide 1: the front image and title from your project website, as well as the names of the group members
- Slide 2: discoveries and conclusions (with images)
The project must be submitted as a webpage (use Google Sites or another pre-built site builder if you are not comfortable writing HTML). The project website must include:
- a front image and title that summarizes your project,
- an overview paragraph on what you did: What was your underlying hypothesis? What data, methods, and tools did you use to test and explore it?
- links to your GitHub repository with your code for the project,
- a team section describing how each person contributed to the project,
- a data section with a paragraph giving details for each data set,
- a techniques section with a paragraph giving details for each technique (process, tool, etc.) used,
- a citations section with links to all data sources, code sources, and publications used.
The project presentations are on the last day of class and consist of two parts:
- a 90-second sneak preview of your project (slide requirements described above), and
- an interactive demo (on lab computers) where each group member explains the overall process of creating the project as well as their individual contributions.
Group work is encouraged. However, groups should accomplish proportionally more than those working individually.
Half of the points are awarded for the work-in-progress milestones during the semester, and half are awarded for the final project and presentation.
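As a quick arithmetic check of the half-and-half split, the point values from the deadline table can be summed in a short Python sketch. The grouping is an assumption, not something stated above: both presentation slides are counted as milestones, and only the Complete Project and Project Presentations are counted as final work. The `grade_contribution` helper is hypothetical and assumes project points scale linearly to the 20% of the final grade.

```python
# Point values copied from the deadline table.
# ASSUMPTION: both presentation slides count as work-in-progress milestones.
MILESTONE_POINTS = {
    "Proposal": 10,
    "Timeline": 10,
    "Data Collection": 20,
    "Analysis": 20,
    "Visualization": 20,
    "Draft Presentation Slide": 10,
    "Updated Presentation Slide": 10,
}

# ASSUMPTION: the "final project and presentation" half is these two items.
FINAL_POINTS = {
    "Complete Project": 75,
    "Project Presentations": 25,
}

PROJECT_WEIGHT_PERCENT = 20  # the project is worth 20% of the final grade

milestone_total = sum(MILESTONE_POINTS.values())
final_total = sum(FINAL_POINTS.values())
project_total = milestone_total + final_total


def grade_contribution(points_earned: float) -> float:
    """Hypothetical helper: percentage points of the final grade earned,
    assuming project points scale linearly to the 20% weight."""
    return points_earned / project_total * PROJECT_WEIGHT_PERCENT


if __name__ == "__main__":
    print("Milestones:", milestone_total)
    print("Final project and presentation:", final_total)
    print("Project total:", project_total)
    print("Contribution for 150 points:", grade_contribution(150))
```

Under this grouping, each half comes to 100 of the 200 project points, which matches the statement above.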