Semester: Spring Semester, 2020
Department: MA Program of Computer Science, First Year; PhD Program of Computer Science, First Year; MA Program of Computer Science, Second Year; PhD Program of Computer Science, Second Year
Course Name: Data Science
Instructor: CHANG JIA-MING
Credit: 3.0
Course Type: Elective
Prerequisite
Course Objective
Course Description
Course Schedule

Week01 - Sep. 18

Introduction

What is data science? Big data? Deep learning?

Three components: data, modeling, evaluation

Data science platforms

  • Why choose the R programming language?

  • Integrated development environment for R: RStudio

Supporting Materials

  1. Chapters 1, 2, Appendix A

    Number of hours invested per week = 6 hours


Week02 - Sep. 25

Documentation and deployment of your code

Version control with GitHub

Supporting Materials

  1. Chapters 10, 11

    Number of hours invested per week = 6 hours



Week03 - Oct. 02

How to evaluate output?

Specificity, sensitivity, recall, F-score

Receiver operating characteristic (ROC) curve, AUC

Statistical significance: p-value, false discovery rate

Supporting Materials

  1. Chapter 5

  2. ROCR package: Visualizing classifier performance in R

    Number of hours invested per week = 6 hours

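To make the Week03 definitions concrete, here is a minimal sketch that counts the four confusion-matrix cells and derives sensitivity (the same quantity as recall), specificity, precision and the F1 score (the F-score with equal weight on precision and recall). It is written in Python rather than the R used in class, the function name `confusion_metrics` is hypothetical, and it assumes both classes and at least one positive prediction are present (otherwise a denominator is zero).

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, precision and F1 for 0/1 labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # recall / true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}

# Toy labels: 4 positives, 6 negatives; one missed positive and one false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(confusion_metrics(y_true, y_pred))
```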

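A ROC curve plots the true positive rate against the false positive rate as the score threshold is lowered, and the AUC is the area beneath it; in R the ROCR package handles this. The Python sketch below (hypothetical helper names, toy scores) builds the curve by sweeping thresholds, integrates it with the trapezoid rule, and cross-checks the result against the pairwise definition of AUC: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs, lowering the threshold through each distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

def auc_pairwise(scores, labels):
    """P(score of a random positive > score of a random negative), ties counting 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(auc_trapezoid(roc_points(scores, labels)), auc_pairwise(scores, labels))
```

On this toy data both functions give 0.75; the pairwise version costs O(n²) comparisons and is only meant as a check.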
Week04 - Oct. 09

How to perform evaluation?

Cross-validation

Bootstrap and jackknife sampling

Bias, variance, overfitting

Supporting Materials

  1. Chapter 6.2

    Number of hours invested per week = 6 hours

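The idea behind k-fold cross-validation fits in a few lines: shuffle the rows, split them into k folds, train on the other k - 1 folds, measure error on the held-out fold, and pool the errors. This Python sketch assumes a regression setting scored by mean squared error; `k_fold_indices`, `cross_val_mse` and the trivial `fit_mean` baseline are hypothetical names for illustration.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of (nearly) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_mse(xs, ys, fit, k=5, seed=0):
    """k-fold CV estimate of test MSE; fit(train_x, train_y) must return a predict(x) function."""
    sq_errors = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        predict = fit(train_x, train_y)  # train on the other k - 1 folds
        sq_errors.extend((ys[i] - predict(xs[i])) ** 2 for i in fold)  # score the held-out fold
    return sum(sq_errors) / len(sq_errors)

def fit_mean(train_x, train_y):
    """Hypothetical baseline model: always predict the training-set mean of y."""
    m = sum(train_y) / len(train_y)
    return lambda x: m

xs = list(range(20))
ys = [2 * x + 1 for x in xs]
print(cross_val_mse(xs, ys, fit_mean, k=5))
```

Setting k equal to the number of rows gives leave-one-out cross-validation.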

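Bootstrap and jackknife resampling both estimate the variability of a statistic without needing a closed-form formula. The sketch below (Python, hypothetical function names, toy data) estimates the standard error of the sample mean both ways. For the mean, the jackknife estimate reduces exactly to s/√n, which makes a handy sanity check; the bootstrap result lands close to the plug-in value √((n - 1)/n) · s/√n and varies slightly with the random seed.

```python
import random
import statistics

def bootstrap_se(data, stat, n_boot=2000, seed=0):
    """Bootstrap standard error: resample n points with replacement, recompute stat each time."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)]
    return statistics.stdev(replicates)

def jackknife_se(data, stat):
    """Jackknife standard error: recompute stat with each point left out once."""
    n = len(data)
    replicates = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    mean_rep = sum(replicates) / n
    return ((n - 1) / n * sum((r - mean_rep) ** 2 for r in replicates)) ** 0.5

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(bootstrap_se(data, statistics.mean), jackknife_se(data, statistics.mean))
```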
Week05 - Oct. 16

Feature selection/extraction/reduction

Principal component analysis (PCA), correspondence analysis (CA)

Probabilistic latent semantic analysis

  • Maximum likelihood estimation

  • Expectation–maximization algorithm

Supporting Materials

  1. A tutorial on principal component analysis, by Jonathon Shlens

  2. Correspondence Analysis and Related Methods, by Michael Greenacre

  3. Multivariate statistics, by Michael Greenacre

    Number of hours invested per week = 6 hours

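For PCA, the first principal component is the leading eigenvector of the covariance matrix of the centered data. The Shlens tutorial derives PCA through the eigendecomposition and the SVD; the pure-Python sketch below instead uses power iteration, a simpler stand-in that only recovers the first component. The function name and the toy data (points scattered tightly around the line y = 2x) are assumptions for illustration.

```python
def first_principal_component(rows, n_iter=200):
    """Leading eigenvector of the sample covariance matrix, found by power iteration.

    Returns (unit direction, share of total variance explained). Assumes the
    start vector is not orthogonal to that eigenvector and that the top two
    eigenvalues differ.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # start vector
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]  # w = C v
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the matching eigenvalue (variance along v).
    eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    total_var = sum(cov[j][j] for j in range(d))  # trace = sum of all eigenvalues
    return v, eigval / total_var

rows = [(x, 2 * x + (0.1 if x % 2 == 0 else -0.1)) for x in range(10)]
direction, explained = first_principal_component(rows)
print(direction, explained)
```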

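PLSA is fitted by maximum likelihood with the expectation–maximization (EM) algorithm. Since PLSA itself needs word-document count data, the sketch below shows the same E-step/M-step pattern on a simpler, swapped-in model: a two-component one-dimensional Gaussian mixture. The function names, starting values, fixed iteration count and data are illustrative assumptions, not part of the course material.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, mu=(0.0, 1.0), var=(1.0, 1.0), weight=0.5, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture; returns (means, variances, weight of component 1)."""
    mu1, mu2 = mu
    v1, v2 = var
    w1 = weight
    n = len(xs)
    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) that each x came from component 1.
        resp = []
        for x in xs:
            a = w1 * normal_pdf(x, mu1, v1)
            b = (1 - w1) * normal_pdf(x, mu2, v2)
            resp.append(a / (a + b))
        # M-step: maximum likelihood estimates, each point weighted by its responsibility.
        n1 = sum(resp)
        n2 = n - n1
        mu1 = sum(r * x for r, x in zip(resp, xs)) / n1
        mu2 = sum((1 - r) * x for r, x in zip(resp, xs)) / n2
        v1 = sum(r * (x - mu1) ** 2 for r, x in zip(resp, xs)) / n1
        v2 = sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, xs)) / n2
        w1 = n1 / n
    return (mu1, mu2), (v1, v2), w1

xs = [-2.2, -2.0, -1.9, -2.1, -1.8, 3.0, 3.2, 2.9, 3.1, 2.8]
print(em_gmm_1d(xs))
```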
Week06 - Oct. 23

Exploring/managing data

Probabilistic and ideal-data models

Character/parsimony-based methods

Supporting Materials

  1. Chapters 3, 4

    Number of hours invested per week = 6 hours


Week07 - Oct. 30

Visualization (1/2)

Charts, graphs, networks, maps

Interactive visualizations: Shiny apps

Supporting Materials

  1. Simple Graphs with R

  2. Basic Graphs by Quick-R

    Number of hours invested per week = 6 hours


Week08 - Nov. 06

Visualization (2/2)

Workflow: scripts

Exploratory Data Analysis

Workflow: projects

Data import

Supporting Materials

  • R for Data Science

    1. Ch. 6. Workflow: scripts

    2. Ch. 7. Exploratory Data Analysis

    3. Ch. 8. Workflow: projects

    4. Ch. 11. Data import

    Number of hours invested per week = 18 hours


Week09 - Nov. 13

Midterm

Closed book, except for one A4 sheet of notes


Week10 - Nov. 20

Unsupervised learning

Clustering analysis

Association rules

Supporting Materials

  1. Chapters 6, 8

    Number of hours invested per week = 6 hours

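k-means is one of the most common clustering methods. The Python sketch below implements Lloyd's iterations: assign every point to its nearest center, move each center to the mean of its points, and stop when the centers no longer change. To keep the result deterministic it starts from a farthest-point (maximin) initialization instead of random starts or k-means++; that choice, the helper names and the toy points are assumptions. Like any k-means run, it can still stop at a local optimum on less well separated data.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns (centers, clusters)."""
    centers = farthest_point_init(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # New center = mean of its cluster (keep the old center if a cluster ends up empty).
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[c]
            for c, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments can no longer change
            break
        centers = new_centers
    return centers, clusters

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, clusters = kmeans(pts, 2)
print(centers)
```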

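An association rule X → Y is usually judged by its support (how often X and Y occur together), confidence (support of X ∪ Y divided by support of X) and lift (confidence divided by the support of Y). The sketch below computes these on a made-up set of shopping baskets and finds frequent item pairs by brute force; the Apriori algorithm (available in R through the arules package) prunes candidate itemsets instead of enumerating them all. The function names are hypothetical.

```python
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    s = set(itemset)
    return sum(1 for t in transactions if s <= t) / len(transactions)

def rule_metrics(transactions, lhs, rhs):
    """Support, confidence and lift of the rule lhs -> rhs."""
    sup_rule = support(transactions, set(lhs) | set(rhs))
    confidence = sup_rule / support(transactions, lhs)
    lift = confidence / support(transactions, rhs)
    return sup_rule, confidence, lift

def frequent_pairs(transactions, min_support):
    """Brute-force check of every item pair against a minimum support."""
    items = sorted(set().union(*transactions))
    result = {}
    for pair in combinations(items, 2):
        s = support(transactions, pair)
        if s >= min_support:
            result[pair] = s
    return result

baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
print(rule_metrics(baskets, {"milk", "diapers"}, {"beer"}))
print(frequent_pairs(baskets, 0.6))
```

A lift above 1 means the items co-occur more often than they would if they were independent.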
Week11 - Nov. 27

Supervised learning (1/6)

Memorization methods

Supporting Materials

  1. Chapter 6

    Number of hours invested per week = 6 hours

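Memorization methods keep the training data and consult it directly at prediction time. k-nearest neighbors is a typical example: a new point gets the majority label among its k closest training points. This minimal Python sketch uses squared Euclidean distance and toy two-class data; the function name is hypothetical, and ties in the vote are not handled specially.

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to query (squared Euclidean distance)."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_x, train_y, (1, 1)), knn_predict(train_x, train_y, (4, 4)))
```

There is no training step at all; the cost is paid at prediction time, when every query is compared against every stored point.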

Week12 - Dec. 04

Supervised learning (2/6)

Linear regression

Supporting Materials

  1. PSDR: Chapter 7.1

  2. ISLR: Chapter 3

    Number of hours invested per week = 6 hours

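For a single predictor, least squares has a closed form: the slope is S_xy / S_xx and the intercept is ȳ - slope · x̄, where S_xy and S_xx are the centered cross-product and squared sums. The sketch below computes both plus R² (one minus the residual sum of squares over the total sum of squares). The function name and data are hypothetical; with several predictors the fit is done with matrix algebra instead, for example by R's lm().

```python
def fit_simple_ols(xs, ys):
    """Least-squares line y = b0 + b1*x; returns (b0, b1, R^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot

print(fit_simple_ols([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))
```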

Week13 - Dec. 11

Supervised learning (3/6)

Logistic regression

Supporting Materials

  1. PSDR: Chapter 7.2

  2. ISLR: Chapter 4

    Number of hours invested per week = 6 hours

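Logistic regression models P(y = 1 | x) = 1 / (1 + e^(-(b0 + b1·x))) and chooses b0 and b1 by maximum likelihood. R's glm(..., family = binomial) solves this with iteratively reweighted least squares; the Python sketch below uses plain batch gradient descent on the average log-loss instead, which is slower but shows the update rule. The learning rate, iteration count, function names and data are illustrative assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, lr=0.1, n_iter=5000):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by batch gradient descent on the mean log-loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of the log-loss w.r.t. b0 + b1*x
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]  # overlapping classes, so the MLE is finite
b0, b1 = fit_logistic(xs, ys)
print(b0, b1, -b0 / b1)  # -b0/b1 is the x where the predicted probability is 0.5
```

On perfectly separable data the likelihood has no finite maximum and the coefficients would keep growing, which is why the toy classes overlap.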

Week14 - Dec. 18

Supervised learning (4/6)

Generalized additive models

Supporting Materials

  1. PSDR: Chapter 9.2

  2. ISLR: Chapter 7

    Number of hours invested per week = 6 hours


Week15 - Dec. 25

Supervised learning (5/6)

Decision trees & random forests

Supporting Materials

  1. PSDR: Chapter 9.1

  2. ISLR: Chapter 8

    Number of hours invested per week = 6 hours

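A classification tree is grown by repeatedly choosing the split that most reduces impurity; a random forest averages many such trees, each trained on a bootstrap sample with a random subset of features considered at each split. The sketch below covers only the core step: scoring candidate thresholds on one numeric feature by the size-weighted Gini impurity of the two child nodes. Gini is one common criterion (entropy is another), and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity: 1 - sum over classes of p_c^2 (0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Return (weighted Gini, threshold) of the best split 'x <= threshold' on one feature."""
    best = (float("inf"), None)
    for thr in sorted(set(xs))[:-1]:  # splitting at the largest value would leave the right side empty
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[0]:
            best = (score, thr)
    return best

print(gini(["a", "b"]))
print(best_split([1, 2, 3, 4, 5, 6], ["a", "a", "a", "b", "b", "b"]))
```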

Week16 - Jan. 01, 2019 (Holiday)



Week17 - Jan. 08

Supervised learning (6/6)

Kernel methods

Support vector machines (SVM)

Supporting Materials

  1. PSDR: Chapter 9.3, 9.4

  2. ISLR: Chapter 9

  3. Support vector machines and kernel methods: status and challenges, by Chih-Jen Lin

  4. Talks about Machine Learning, by Chih-Jen Lin

    Number of hours invested per week = 24 hours

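Kernel methods replace inner products with a kernel function K(x, z), which lets a linear algorithm draw non-linear boundaries. SVMs are the main example this week, but a full SVM solver is too long to sketch here, so the Python code below uses a simpler swapped-in method, the kernel perceptron, with a Gaussian (RBF) kernel. It separates the XOR pattern, which no straight line can. The function names, gamma = 1 and the data are assumptions for illustration.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def decision_value(alpha, xs, ys, kernel, x):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x); only points with alpha_i > 0 contribute."""
    return sum(a * yi * kernel(xi, x) for a, xi, yi in zip(alpha, xs, ys) if a)

def train_kernel_perceptron(xs, ys, kernel, max_epochs=100):
    """Dual perceptron: alpha_j counts how often point j was misclassified during training."""
    alpha = [0] * len(xs)
    for _ in range(max_epochs):
        mistakes = 0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            if yj * decision_value(alpha, xs, ys, kernel, xj) <= 0:
                alpha[j] += 1
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: training data separated
            break
    return alpha

# XOR: labels +1/-1 that no straight line in the plane can separate.
xs = [(0, 0), (1, 1), (0, 1), (1, 0)]
ys = [-1, -1, 1, 1]
alpha = train_kernel_perceptron(xs, ys, rbf_kernel)
print([1 if decision_value(alpha, xs, ys, rbf_kernel, x) > 0 else -1 for x in xs])
```

Like an SVM, the trained model is expressed entirely through kernel evaluations against training points; unlike an SVM, it stops at the first separating solution rather than maximizing the margin.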

Week18 - Jan. 15

Final project presentation




Teaching Methods
Teaching Assistant

  • Prepare assignments

  • Grade assignments

  • Maintain content in Moodle

  • Answer students' questions


Requirement/Grading

  • Homework: 60%

  • Midterm: 15%

  • Final project: 25%

  • Attendance/Participation (bonus): ≤ 10%


Textbook & Reference

  • Required

  1. PSDR: Practical Data Science with R, by Zumel, N. & Mount, J. (Manning, 2014). ISBN-10: 1617291560

  2. ISLR: An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

  3. R for Data Science: Visualize, Model, Transform, Tidy, and Import Data, by Hadley Wickham & Garrett Grolemund (1st Edition)

  • Other references

  1. How to Measure Anything Workbook: Finding the Value of Intangibles in Business

  2. Additional material (credit: Thomas M. Carsey, carsey@unc.edu)

  3. Data Mining with R: Learning with Case Studies, by Torgo, http://www.dcc.fc.up.pt/~ltorgo/DataMiningWithR/

  4. An Introduction to Data Science, Version 3, by Stanton, http://jsresearch.net/

  5. Machine Learning with R, by Lantz, http://www.packtpub.com/machine-learning-with-r/book

  6. A Simple Introduction to Data Science, by Burlingame and Nielsen, http://newstreetcommunications.com/businesstechnical/a_simple_introduction_to_data_science

  7. Ethics of Big Data, by Davis, http://shop.oreilly.com/product/0636920021872.do

  8. Privacy and Big Data, by Craig and Ludloff, http://shop.oreilly.com/product/0636920020103.do

  9. Doing Data Science: Straight Talk from the Frontline, by O’Neil and Schutt, http://shop.oreilly.com/product/0636920028529.do

  10. Springer Textbooks Use R! Series, http://www.springer.com/series/6991

  11. Online search tool Rseek, http://www.rseek.org/

  12. The Odum Institute’s online course, http://www.odum.unc.edu/odum/contentSubpage.jsp?nodeid=670


Course URL
http://www.changlabtw.com/1082-datascience.html