The rapidly expanding collection of massive amounts of data is leading to transformations across broad segments of industry, science, and society. These changes have sparked great demand for individuals with skills in managing and analyzing complex data sets. Such skills are interdisciplinary, involving ideas typically associated with computing, information processing, mathematics, and statistics as well as the development of new methodologies spanning these fields. Our major in Data Science (offered jointly with the Dietrich School of Arts & Sciences Departments of Mathematics and Statistics) will enable students to participate in this data revolution.
This undergraduate major allows students to gain critical skill sets that span key areas of statistics, computing, and mathematics, with foundational training providing literacy in the four areas (data, algorithmic, mathematical, and statistical) that every student needs in order to master data science. Students will develop expertise that connects theory to the solution of real-world problems, and will be able to specialize their studies toward a more specific career focus. Completing this major will prepare students to work as data science professionals or to pursue graduate study in a direction involving data in a significant way.
Major Requirements
Foundational Skills - 31 credits
The foundational courses provide students with fundamental knowledge across four "literacies": data, algorithmic, mathematical, and statistical. Courses in this area will help students develop baseline computational capabilities, will teach students to think about data in a statistical framework, and will introduce students to fundamental mathematical concepts arising in data analysis. These courses are drawn from three main disciplines (CS/IS, Math, and Statistics) and include an introductory course in the fundamental skills of working with data (Python/R programming, exploratory data analysis, data visualization):
- STAT 1061 - DATA SCIENCE FOUNDATIONS
- CS 0401 - INTERMEDIATE PROGRAMMING USING JAVA
- CS 0445 - ALGORITHMS AND DATA STRUCTURES 1
- MATH 0220 - ANALYTIC GEOMETRY AND CALCULUS 1
- MATH 0230 - ANALYTIC GEOMETRY AND CALCULUS 2
- MATH 0280 - INTRO TO MATRICES & LINEAR ALG / MATH 1180 - LINEAR ALGEBRA 1
- MATH 0480 - APPLIED DISCRETE MATHEMATICS / CS 0441 - DISCRETE STRUCTURES FOR CS
- STAT 1151 - INTRODUCTION TO PROBABILITY / STAT 1631 - INTERMEDIATE PROBABILITY
- STAT 1152 - INTRODUCTION TO MATHEMATICAL STATISTICS / STAT 1632
- Notes:
- Mathematically oriented students should take MATH 1180 rather than MATH 0280.
- Mathematically advanced students may replace STAT 1151 / STAT 1152 with STAT 1631 / STAT 1632, in consultation with their advisor.
- Students wishing to pursue graduate studies or mathematical directions related to data science are advised to also take MATH 0240.
Expertise - 18 credits
This is where students become data scientists, integrating skills from the foundational areas to develop expertise in the realm of data. Skills will be developed in the description and analysis of data in terms of sources of variability and key relationships, the development of algorithms and data handling skills to extract and interpret information from complex data sets, as well as in the visualization and communication of results. The critical issue of the ethical use of data will also be addressed in the context of data science.
- STAT 1261 - PRINCIPLES OF DATA SCIENCE
- STAT 1361 - STATISTICAL LEARNING AND DATA SCIENCE
- OR
- CS 1675 - INTRODUCTION TO MACHINE LEARNING
- CS 1501 - ALGORITHMS AND DATA STRUCTURES 2
- CS 1656 - INTRODUCTION TO DATA SCIENCE
- MATH 1101 - AN INTRODUCTION TO OPTIMIZATION
- CS 0590 - SOCIAL IMPLICATIONS OF COMPUTING TECHNOLOGY
Specializations
Students within the data science major will have the opportunity to pursue an area of specialization through the selection of elective courses in a targeted direction relating to data analytics, computer systems, modeling, or data science in context. Students seeking a focus are advised to select all three courses from the same category, but students may also choose courses across categories to suit their interests. The specialization course groupings are as follows.
- Computer Systems: Students pursuing this specialization will gain depth of knowledge in the development, deployment, and analysis of the complex computer and information systems necessary for tackling large-scale data science problems.
- Data Analytics: Students pursuing a data analytics specialization will enhance their ability to make sound inferences and decisions using the science and art of learning from data: specifically, the design, collection, analysis, and interpretation of data in an uncertain world, and the communication of findings.
- Data Science in Context: Students pursuing this specialization will gain depth of knowledge in both the technical and organizational aspects of the management, curation, description, preservation, and application of digital datasets of varying sizes in specific business, professional, or scientific contexts. We expect the collection of courses within the specialization to expand as more domain-specific data science courses begin to be offered across campus.
- Modeling: Students pursuing a modeling specialization will enhance their ability to develop and harness theoretical tools to characterize structure within data and to represent and analyze processes that may underlie this structure.
Capstone - 3 credits
Data science is a hands-on field. Comprehensive training in data science requires substantive experience working on a problem outside of the realm of usual classroom experiences, with the complications of messy data, ambiguity, and lack of clear structure that characterize "real-world" scenarios. This experience should include work with others with diverse skill sets as well as communication with non-specialists. The capstone course will provide students with such an experience. In the short term, the capstone course requirements can be fulfilled through completion of CMPINF 1981 - PROJECT STUDIO; MATH 1103 - MATHEMATICAL PROBLEMS IN BUSINESS, INDUSTRY, AND GOVERNMENT; or STAT 1961 (Statistical Data Science in Action). In this case, students will be advised to select a capstone course based on their specializations but ultimately will be able to choose from among the full list of capstone options. In addition, the capstone requirement may be satisfied via a faculty-guided research project that is relevant to data science, subject to approval by the Data Science program director(s). After we have more experience in integrating and coordinating our courses across the three units, we will consider a unified cross-listed capstone course if that is deemed more desirable.
For full major requirement details, visit the Data Science course catalog.