Yale University FES 611a - Fall 2018 - Syllabus

Data Science for Social Research: An Introduction

Dr. Justin Farrell

Wednesdays 4:00-6:50pm ~ Kroon G01
Office Hours: By appointment
TA: Elle Brunsdale (elle.brunsdale[at]yale.edu)
TA: Zihan Zhuo (zihan.zhuo[at]yale.edu)
Open Office Hours for Data/Project Help: Wednesdays 1-3pm (Kroon 3rd floor)

BOOT CAMPS: [Schedule] [Completion Form]
DISCUSSION LEADER: [Sign-up]

Overview

This seminar provides an introduction to a rapidly growing and promising area of social scientific research that has accompanied the explosion of data in our digital age, as nearly every aspect of life is now connected (e.g., mobile phones, smart devices, social media) and digitized (book archives, government records, websites, communication). Students are introduced to various techniques and software for collecting, cleaning, and analyzing data at large scales, especially text data (e.g., machine learning, topic modeling, location extraction, semantic networks). Strong emphasis is placed on integrating these methods into actual research, in hopes of moving new or ongoing student papers toward publication. The course is in a seminar format, with a focus on reading and discussing cutting-edge research, as well as interacting with invited guests from industry and academia. An overarching goal of the course is to incubate and launch new interdisciplinary collaborative projects at Yale that integrate data science techniques to solve important problems.

Learning Objectives

Course Requirements and Grading

1. Discussion Questions: (1) After completing the readings for each week, every student is required to submit three thoughtful discussion questions. These can be critical, clarifying, methodological, etc. I will review the questions with the TA and use them to stimulate discussion, together with the students who wrote them. (2) For weeks when we have an outside speaker, students will prepare two questions to ask the speaker. All questions are due to the TA by noon on Wednesday. Late discussion questions will not be accepted, and students will receive zero credit for that week. [20% of grade]

2. Team Presentation & Discussion Leader: Once during the semester, each student will work as part of a small team to present on the readings and lead the class discussion. These presentations should (1) introduce and summarize each reading, (2) develop discussion points to stimulate class dialogue and debate, and (3) link the readings with student research projects, ideas for creative data collection, important policy questions, theoretical puzzles, or current events. While presenters should summarize the main points of each reading, it is more important that they raise interesting questions for the class, provide broader context for the readings, present puzzles, and engage the broader themes of the course. Lastly, students should find and present two concrete examples of published research (e.g., academic; reputable news media; or firms like Google, etc.) relevant to the topic for that week. [40% of grade]

3. Class Participation: Because this is a seminar and there is no final project, students are strongly encouraged to attend all class sessions and to be fully engaged as we discuss the readings and interact with outside speakers. Class participation will be enhanced if students integrate concrete research projects into our discussions and pose new ideas that might foster collaboration and new ventures. [20% of grade]

4. Data Skills Boot Camps (4 required): Because skill levels and areas of interest vary widely among students in this course, we will rely on boot camps to teach concrete data science skills. Each student is required to take 4 boot camps of their choosing (most last between 1 and 3 hours). I will provide a detailed schedule of available data skills boot camps in class and online. [20% of grade]

Persons with Disabilities

Your experience in this class is important to me. If you have already established accommodations with the Resource Office on Disabilities, please communicate your approved accommodations to me at your earliest convenience so we can discuss your needs in this course. If you have not yet established services through ROD, but have a temporary health condition or permanent disability that requires accommodations (conditions include but are not limited to: mental health, attention-related, learning, vision, hearing, physical or health impacts), you are welcome to contact ROD at 203-432-2324 to make an appointment.

Schedule

1. Course Introduction and Discussion (8/29)
2. Large-Scale Data Science: Promises and Pitfalls (9/5)
3. Data Skills Boot Camp (9/12)
4. Monsoon of Data: Creativity & Opportunity (9/19)
5. Automated Text Analysis & NLP (9/26)
6. Data Skills Boot Camp (10/3)
7. Data Skills Boot Camp (10/10)
8. Fall Break (10/17)
9. Social Network Analysis (10/24)
10. Data Skills Boot Camp (10/31)
11. Geospatial (11/7)
12. Data Skills Boot Camp (11/14)
13. Thanksgiving Break (11/21)
14. Visualization (11/28)
15. Blockchain Technology (12/5)

Readings

Course Introduction and Discussion (8/29)

Course introduction. What this course is, and what it is not.

Large-Scale Data Science: Promises and Pitfalls (9/5)

What is computational social science? “Big Data” hubris. Overview of types of data and analyses.

Data Skills Boot Camp (9/12)

Monsoon of Data: Creativity & Opportunity (9/19)

Sources of data. Digital trace data. Webscraping. Digital archives. Social Media.

Automated Text Analysis & NLP (9/26)

Introduction to automated text analysis, machine learning, natural language processing.

Data Skills Boot Camp (10/3)

Which Program?: R, Python, GUIs. Picking a program that works best for your needs.

Data Skills Boot Camp (10/10)

Ethics. Businesses and data. “Total institutions.” Professional options for Data Science.

Fall Break (10/17)

Social Network Analysis (10/24)

Connections between everything and everyone (and how to harness them).

Data Skills Boot Camp (10/31)

Invaluable tools for research and writing: Establish an efficient software workflow.

Geospatial (11/7)

Spatial data science for politics and social problems.

Data Skills Boot Camp (11/14)

Thanksgiving Break (11/21)

Visualization with R and Tableau (11/28) - Meet @ CSSSI in Kline Tower (computer room C27)

Data exploration. Revealing new findings. Communicating creatively. Networks and Texts.

Blockchain Technology and Looking Toward the Future (12/5)

Understanding the blockchain. And, looking toward the future.