DS 440: Data Sciences Capstone Course
Overview
This course provides a data sciences problem-solving experience, addressing realistic data science dilemmas for which solutions require teamwork and collaboration.
Logistics
- Time: Tuesday/Thursday 04:35 - 05:50PM
- Location: Willard Bldg 373
- Course Website: https://jinghuichen.github.io/DS440-25Spring/
- Canvas: https://psu.instructure.com/courses/2374580
Instructor Team
- Instructor: Jinghui Chen
- Office hours: by appointment (email the instructor to set up a time)
- TA: Tianrong Zhang (Email: tbz5156@psu.edu)
- Office hours: Mon 1:30-2:30pm @Westgate E301
Course Objectives
- Learn basic techniques for developing a research problem with a client including: formulating an initial problem informed by available data and client goals, developing an analysis plan, executing the plan, and communicating results.
- Identify fruitful and fruitless project directions and adjust accordingly.
- Learn how to communicate effectively with your client.
Course Materials
- No textbook is required for this course. However, the following (free online) textbooks may be helpful:
- Deep Learning (Ian Goodfellow, Yoshua Bengio, Aaron Courville)
- The Elements of Statistical Learning - Data Mining, Inference, and Prediction, Second Edition (Trevor Hastie, Robert Tibshirani, Jerome Friedman)
- Pattern Recognition and Machine Learning (Christopher Bishop)
Capstone Project
Students (working in teams) will have one semester to pick, pitch, plan, perform, and present a project of their choosing. This course will guide students through this process. Still, students are expected to be self-motivated and take an active role in pushing their projects forward. Students who expect instructors to “tell them what to do next” will fail this course. It’s one thing to request feedback; it’s another to wait for instruction. The former is welcomed, and the latter is not tolerated.
Students are not completely on their own. Throughout the semester there will be small assignments and regular check-ins with instructors to help them manage their time.
- Weekly Discussion: Each week teams will meet with instructors and report on progress, pitfalls, and plans. Students are expected to work as a team but also highlight efforts and value added by each team member as an individual. These meetings will be short, 10 min each typically, so be concise and come prepared. Students who do not prepare for these meetings will struggle in the course.
- Progress Reports: Approximately every other week students will prepare progress reports. These written reports should be about 2-3 pages in length. Most progress reports will be completed as teams; some will be completed as individuals. Most progress reports will respond to a prompt designed to help students plan.
- Team Meetings: Teams should meet weekly on their own. Each team member should keep notes of meetings. Meetings should start with a review of work completed and then decide on the next steps for the project, including explicitly delegating which team member will accomplish which task. Students may be asked to describe these meetings, so keep good notes.
- Final Reports and Presentations: The final project in this class is a report (in the form of a journal/conference article) describing the project. Students will also be asked to present their report orally to the class. Both the presentation and the report are to be completed as a team.
Teams: Students may choose their own teams (typically 3-4 students per team). Those students without a team will be randomly paired.
Topics: In general, students can pick any topic related to data science that interests them. We encourage students to pick “fun and doable” project topics. But please note that DS440 is a capstone project course, so we hold projects to a higher standard than a typical ML/DS course project. Simply applying an existing algorithm to some dataset (e.g., predicting house prices using linear regression) is not satisfactory.
Your project should aim to either:
- solve a new/realistic data science problem with no current solutions.
- solve an existing data science problem for which the current solutions are not good enough.
- advance our understanding of existing data science concepts, or create new/improved benchmarks for measuring the success of data science tasks.
And you need to convince the client that your topic is meaningful:
- If you solve a new problem, why is it new? Why do the current strategies not work? Why might your idea work?
- If you solve an existing problem with a new solution, how do previous solutions perform? Why do they not work well? Why might your idea improve upon them?
- If you want to advance our understanding of certain concepts, what is the current mainstream opinion on these concepts? Is there any initial evidence that our current understanding is wrong or incomplete?
- If you want to create a new benchmark for a certain task, why is measuring the success of this task currently hard? Why can existing strategies not measure the success of this task well?
Hints:
- Attendance is mandatory at all times. Failure to show up in classes/discussions/meetings will almost certainly end with a failing grade in the course unless there are truly extenuating circumstances.
- Students should come prepared for each and every interaction with instructors. It is your chance to seek suggestions and feedback. Students should also be prepared for questions about their own projects during weekly discussions.
- Projects do not always need a positive result to be successful. A project that clearly demonstrates a negative result (e.g., that method X cannot be used to do Y) can still be informative if you can illustrate the reason with detailed evidence.
- Team members should endeavor to “make each other look good” in the eyes of the client (i.e., instructor). However, if there is a substantial or intractable personnel issue, this should be brought to the attention of the instructors at an early stage of the project. Teamwork will be evaluated through peer evaluations.
Workload Expectation
Consistent with University policies for 3 credit hours, this course requires about 9 hours per week by each person outside of scheduled class times. Please plan accordingly. It is critical that you establish regular times when your team can meet outside of class, since many activities are team-based. Even when things do not work, you can get credit if you can document the effort you put in and it seems reasonable that that time was used effectively. So, please keep documentation of the time you put into your project. This is good practice for the future.
Note: This course expects that deliverables are provided on time and completely. “Not having enough time” is not acceptable and can result in failure in the course. If students find they don’t have enough time then it is a failure of project planning.
Late Submission Policy
- All assignments are due on the due date at 11:59 pm (Eastern Time).
- Students can submit late with a penalty of a 10% deduction for every 24 hours late (up to 4 days).
- After 4 days, late submissions are no longer accepted.
- Extensions can be granted for special cases (email the instructor).
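The late penalty arithmetic can be sketched in Python as follows. This is only an illustration, not the official grading procedure: the function name is hypothetical, and it assumes that any started 24-hour period counts as a full day, that the deduction is taken as a percentage of the earned score, and that a submission more than 4 days late receives zero. The syllabus does not state these details, so confirm them with the instructor.

```python
import math

PENALTY_PERCENT_PER_DAY = 10  # from the policy: 10% deduction per 24 hours late
MAX_LATE_DAYS = 4             # from the policy: no submissions after 4 days


def late_adjusted_score(raw_score, hours_late):
    """Return the score after the late deduction (hypothetical helper)."""
    if hours_late <= 0:
        return raw_score  # on time: no penalty
    # Assumption: a partially elapsed 24-hour period counts as a full day.
    days_late = math.ceil(hours_late / 24)
    if days_late > MAX_LATE_DAYS:
        return 0.0  # assumption: a submission that is not accepted scores zero
    # Assumption: the deduction is a percentage of the earned score.
    return raw_score * (100 - PENALTY_PERCENT_PER_DAY * days_late) / 100


if __name__ == "__main__":
    for hours in (0, 1, 25, 96, 97):
        print(hours, late_adjusted_score(90, hours))
```

Under these assumptions, a report that earned 90 points and arrived 25 hours late would be treated as two days late and lose 20% of its score.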
Generative AI Policy
- You may not use ChatGPT or other GenAI tools to write any of your assignments. Everything you submit should be written in your own words. You will fail the class if your progress reports or final reports are written by Generative AI tools.
- However, if your project aims to study GenAI tools themselves (e.g., studying certain capabilities) or uses generative AI as a tool (e.g., generating data for your project), you are allowed and encouraged to use them in your project.
Grading Policy
Grades will be computed based on the following factors:
- Project Execution (attendance, discussion, etc.) 30%
- Interim Progress Reports 30%
- Final Project Presentation 20%
- Final Reports 10%
- Teamwork & Peer Review 10%
Final grade cutoff:
- A [93%, 100%]
- A- [90%, 93%)
- B+ [87%, 90%)
- B [83%, 87%)
- B- [80%, 83%)
- C+ [77%, 80%)
- C [70%, 77%)
- D [60%, 70%)
- F [0%, 60%)
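As a rough illustration of how the weights and cutoffs above combine, here is a minimal Python sketch. The weights and cutoffs are copied from this syllabus; the dictionary keys, function names, and the assumption that each component is scored on a 0-100 scale are hypothetical, and the instructor's actual computation (e.g., any rounding) may differ.

```python
# Component weights in percent, as listed under Grading Policy.
WEIGHTS = {
    "project_execution": 30,
    "progress_reports": 30,
    "final_presentation": 20,
    "final_report": 10,
    "teamwork_peer_review": 10,
}

# (inclusive lower bound, letter), checked from highest to lowest.
CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"),
    (80, "B-"), (77, "C+"), (70, "C"), (60, "D"),
]


def final_percentage(scores):
    """Weighted average of component scores (assumed to be on a 0-100 scale)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(percentage):
    """Map a final percentage to a letter using the half-open intervals above."""
    for lower_bound, letter in CUTOFFS:
        if percentage >= lower_bound:
            return letter
    return "F"


if __name__ == "__main__":
    example = {
        "project_execution": 95,
        "progress_reports": 88,
        "final_presentation": 90,
        "final_report": 85,
        "teamwork_peer_review": 100,
    }
    pct = final_percentage(example)
    print(pct, letter_grade(pct))
```

Because each interval is closed on the left and open on the right, a final percentage of exactly 90 maps to A-, while anything just below 90 maps to B+.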
Tentative Schedule
Progress reports are typically due by 11:59 pm on the Friday of the week indicated. Exact deadlines are listed on the corresponding assignment pages.
# | Date | Topics | Due Dates |
---|---|---|---|
1 | 01/14/25 | Course Introduction | |
2 | 01/16/25 | Advice on Selecting Topics | |
3 | 01/21/25 | Topic Discussion 1 | |
4 | 01/23/25 | Topic Discussion 2 | |
5 | 01/28/25 | Week 3 Discussion 1 | |
6 | 01/30/25 | Week 3 Discussion 2 | Progress Report 1 Due |
7 | 02/04/25 | Week 4 Discussion 1 | |
8 | 02/06/25 | Week 4 Discussion 2 | |
9 | 02/11/25 | Week 5 Discussion 1 | |
10 | 02/13/25 | Week 5 Discussion 2 | Progress Report 2 Due |
11 | 02/18/25 | Week 6 Discussion 1 | |
12 | 02/20/25 | Week 6 Discussion 2 | Peer Review 1 Due |
13 | 02/25/25 | Week 7 Discussion 1 | |
14 | 02/27/25 | Week 7 Discussion 2 | Progress Report 3 Due |
15 | 03/04/25 | Week 8 Discussion 1 | |
16 | 03/06/25 | Week 8 Discussion 2 | |
– | 03/11/25 | Spring Break | |
– | 03/13/25 | Spring Break | |
17 | 03/18/25 | Week 10 Discussion 1 | |
18 | 03/20/25 | Week 10 Discussion 2 | Progress Report 4 Due |
19 | 03/25/25 | Week 11 Discussion 1 | |
20 | 03/27/25 | Week 11 Discussion 2 | Peer Review 2 Due |
21 | 04/01/25 | Week 12 Discussion 1 | |
22 | 04/03/25 | Week 12 Discussion 2 | Progress Report 5 Due |
23 | 04/08/25 | Week 13 Discussion 1 | |
24 | 04/10/25 | Week 13 Discussion 2 | |
25 | 04/15/25 | Week 14 Discussion 1 | |
26 | 04/17/25 | Week 14 Discussion 2 | Progress Report 6 Due |
27 | 04/22/25 | Presentation/Slide Review 1 | |
28 | 04/24/25 | Presentation/Slide Review 2 | |
29 | 04/29/25 | Project Presentation 1 | |
30 | 05/01/25 | Project Presentation 2 | Peer Review 3 Due |
– | 05/05/25 | Final Project Report Due | |
The instructor reserves the right to make any changes.
ACADEMIC INTEGRITY STATEMENT
Academic integrity is the pursuit of scholarly activity in an open, honest and responsible manner. Academic integrity is a basic guiding principle for all academic activity at The Pennsylvania State University, and all members of the University community are expected to act in accordance with this principle. Consistent with this expectation, the University’s Code of Conduct states that all students should act with personal integrity, respect other students’ dignity, rights and property, and help create and maintain an environment in which all can succeed through the fruits of their efforts.
Academic integrity includes a commitment by all members of the University community not to engage in or tolerate acts of falsification, misrepresentation or deception. Such acts of dishonesty violate the fundamental ethical principles of the University community and compromise the worth of work completed by others.
DISABILITY ACCOMMODATION STATEMENT
Penn State welcomes students with disabilities into the University’s educational programs. Every Penn State campus has an office for students with disabilities. The Student Disability Resources (SDR) website provides contact information for every Penn State campus (http://equity.psu.edu/sdr/disability-coordinator). For further information, please visit the Student Disability Resources website (http://equity.psu.edu/sdr/).
In order to receive consideration for reasonable accommodations, you must contact the appropriate disability services office at the campus where you are officially enrolled, participate in an intake interview, and provide documentation: See documentation guidelines (http://equity.psu.edu/sdr/guidelines). If the documentation supports your request for reasonable accommodations, your campus disability services office will provide you with an accommodation letter. Please share this letter with your instructors and discuss the accommodations with them as early as possible. You must follow this process for every semester that you request accommodations.
COUNSELING AND PSYCHOLOGICAL SERVICES STATEMENT
Many students at Penn State face personal challenges or have psychological needs that may interfere with their academic progress, social development, or emotional well-being. The university offers a variety of confidential services to help you through difficult times, including individual and group counseling, crisis intervention, consultations, online chats, and mental health screenings. These services are provided by staff who welcome all students and embrace a philosophy respectful of clients’ cultural and religious backgrounds, and sensitive to differences in race, ability, gender identity and sexual orientation.
Counseling and Psychological Services at University Park (CAPS) (http://studentaffairs.psu.edu/counseling/): 814-863-0395
Counseling and Psychological Services at Commonwealth Campuses (https://senate.psu.edu/faculty/counseling-services-at-commonwealth-campuses/)
Penn State Crisis Line (24 hours/7 days/week): 877-229-6400
Crisis Text Line (24 hours/7 days/week): Text LIONS to 741741
EDUCATIONAL EQUITY/REPORT BIAS STATEMENTS
Consistent with University Policy AD29, students who believe they have experienced or observed a hate crime, an act of intolerance, discrimination, or harassment that occurs at Penn State are urged to report these incidents as outlined on the University’s Report Bias webpage (http://equity.psu.edu/reportbias/).