Yale University FES 611a - Fall 2018 - Syllabus
Data Science for Social Research: An Introduction
Dr. Justin Farrell
Wednesdays 4:00-6:50pm ~ Kroon G01
Office Hours: By appointment
TA: Elle Brunsdale (elle.brunsdale[at]yale.edu)
TA: Zihan Zhuo (zihan.zhuo[at]yale.edu)
Open Office Hours for Data/Project Help: Wednesdays 1-3pm (Kroon 3rd floor)
BOOT CAMPS: [Schedule] [Completion Form]
DISCUSSION LEADER: [Sign-up]
This seminar provides an introduction to a rapidly growing and promising area of social scientific research that has accompanied the explosion of data in our digital age, as nearly every aspect of life is now connected (e.g., mobile phones, smart devices, social media) and digitized (book archives, government records, websites, communication). Students are introduced to various techniques and software for collecting, cleaning, and analyzing data at large scales, especially text data (e.g., machine learning, topic modeling, location extraction, semantic networks). Strong emphasis is placed on integrating these methods into actual research, in hopes of moving new or ongoing student papers toward publication. The course is in a seminar format, with a focus on reading and discussing cutting-edge research, as well as interacting with invited guests from industry and academia. An overarching goal of the course is to incubate and launch new interdisciplinary collaborative projects at Yale that integrate data science techniques to solve important problems.
- Gain a comprehensive understanding of data science as an interdisciplinary field.
- Be able to creatively apply digital data to answer real-world puzzles.
- Benefit from the seminar and project-oriented format of this course by launching potential collaborations with other students and faculty.
- Build computational skills pertinent to specific research questions.
Course Requirements and Grading
1. Discussion Questions: (1) After completing the readings for each week, every student is required to submit three thoughtful discussion questions. These can be critical, clarifying, methodological, etc. I will review the questions with the TA and use these questions to stimulate discussion in conjunction with the person who wrote the question. (2) For the weeks we have an outside speaker, students will prepare two questions to ask our speaker. All questions are due to the TA by noon on Wednesday. Late discussion questions will not be accepted and students will receive zero credit for that week. [20% of grade]
2. Team Presentation & Discussion Leader: Once during the semester, each student will work as part of a small team to present on the readings and lead the discussion. These presentations should (1) introduce and summarize each reading, (2) develop discussion points to stimulate class dialogue and debate, and (3) link the readings with such things as student research projects, ideas for creative data collection, important policy questions, theoretical puzzles, or current events. So while presenters should summarize the main points of the readings, it is more important that they raise interesting questions for the class, provide broader context for the readings, present puzzles, and engage the broader themes of the course. Lastly, students should find and present two concrete examples of published research (e.g., academic; reputable news media; or firms like Google) relevant to the topic for that week. [40% of grade]
3. Class Participation: Because this is a seminar, and there is no final project, students are strongly encouraged to attend all class sessions, and be fully engaged as we discuss the readings and engage with outside speakers. Class participation will be enhanced if students are able to integrate concrete research projects into our discussions, and to pose new ideas that might foster collaboration and new ventures. [20% of grade]
4. Data Skills Boot Camps (4 required): Because students in this course have vastly different skill levels and areas of interest, we will rely on boot camps to teach concrete skills relevant to data science. Each student is required to take 4 boot camps of their choosing (most last between 1 and 3 hours). I will provide a detailed schedule of available data skills boot camps in class and online. [20% of grade]
Persons with Disabilities
Your experience in this class is important to me. If you have already established accommodations with the Resource Office on Disabilities, please communicate your approved accommodations to me at your earliest convenience so we can discuss your needs in this course. If you have not yet established services through ROD, but have a temporary health condition or permanent disability that requires accommodations (conditions include but are not limited to: mental health, attention-related, learning, vision, hearing, physical or health impacts), you are welcome to contact ROD at 203-432-2324 to make an appointment.
1. Course Introduction and Discussion (8/29)
2. Large-Scale Data Science: Promises and Pitfalls (9/5)
3. Data Skills Boot Camp (9/12)
4. Monsoon of Data: Creativity & Opportunity (9/19)
5. Automated Text Analysis & NLP (9/26)
6. Data Skills Boot Camp (10/3)
7. Data Skills Boot Camp (10/10)
8. Fall Break (10/17)
9. Social Network Analysis (10/24)
10. Data Skills Boot Camp (10/31)
11. Geospatial (11/7)
12. Data Skills Boot Camp (11/14)
13. Thanksgiving Break (11/21)
14. Visualization (11/28)
15. Blockchain Technology (12/5)
Course Introduction and Discussion (8/29)
Course introduction. What this course is, and what it is not.
Large-Scale Data Science: Promises and Pitfalls (9/5)
What is computational social science? “Big Data” hubris. Overview of types of data and analyses.
- Lazer et al. 2009. “Computational Social Science.” Science.
- Video: Gary King Discusses Big Data. [Link]
- Video: Matt Salganik “Introduction to Computational Social Science” from the 2018 Summer Institute in Computational Social Science [Link]
- Farrell, Justin. 2016. “Corporate funding and ideological polarization about climate change.” Proceedings of the National Academy of Sciences. 113(1),92-97.
Data Skills Boot Camp (9/12)
- Salganik, Matthew. 2017. “Introduction” Bit by Bit: Social Research in the Digital Age. Princeton University Press. [Link]
- Watts, Duncan J. 2013. “Computational Social Science: Exciting Progress and Future Directions.” National Academy of Engineering.
Monsoon of Data: Creativity & Opportunity (9/19)
Sources of data. Digital trace data. Webscraping. Digital archives. Social Media.
- King, Gary. 2011. “Ensuring the Data-Rich Future of the Social Sciences.” Science.
- Bail, Chris. What is Digital Trace Data? [Link]
- Bail, Chris. Strengths and weaknesses of digital trace data? from the 2018 Summer Institute in Computational Social Science [Link]
- Peruse the brand new (and fantastic!) Google Dataset Search [Link]
- Application Programming Interfaces (API) in R [Link] If you are new to R, do your best to follow along.
- Think creatively, and outside of the box, about different types of data, starting with some of these:
  - JSTOR Text Mining [Link 1] [Link 2]
  - Chris Bail’s list of text sources [Link]
  - Light pollution as data? [Link]
  - Regulations.gov (e.g. 1.7 million Public Comment Letters for National Monuments) [Link]
  - HathiTrust Digital Library [Link]
  - TripAdvisor Hotel Reviews [Link]
  - Google Books ngrams [Link]
  - Google Search Trends [Link]
- Outside Guest: Dr. Jennifer Marlon - Yale FES
Automated Text Analysis & NLP (9/26)
Introduction to automated text analysis, machine learning, natural language processing.
- Grimmer, Justin and Brandon Stewart. 2013. “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.” Political Analysis.
- Dictionary-Based Text Analysis in R. Chris Bail. [Link]
- Topic Modeling. Chris Bail. [Link]
- Farrell, Justin. 2015. pages 196-216 of Chapter 4 “Between Good and Evil” in The Battle for Yellowstone. Princeton University Press.
- Peruse this NLP resource put together by the Yale Computer Science Department, Professor Dragomir Radev, and the LILY Lab: [Link]
- Outside Guest: Fred Raucci - Unilever, Manager Supply Chain Analytics
Data Skills Boot Camp (10/3)
Which Program?: R, Python, GUIs. Picking a program that works best for your needs.
- Willems, Karlijn. “Choosing R or Python for data analysis? An infographic” [Link]
- “Why R?” University of Chicago - Computing for the Social Sciences. [Link]
- Great interactive tools for beginning R: Try R [Link] – RStudio ‘learnr’ [Link]
- Options using Graphical User Interfaces are improving. For each one of these, read/watch an overview, and consider their strengths and weaknesses:
Data Skills Boot Camp (10/10)
Ethics. Businesses and data. “Total institutions.” Professional options for Data Science.
- Jasny, B.R. et al. 2017. “Fostering reproducibility in industry-academia research.” Science.
- Corey, Michael. “A sociologist working at Facebook” [Link]
- Salganik, Matt. “Ethics: Principles-Based Approach.” [Link]
- Baum, Matthew and David Lazer. “Google and Facebook aren’t fighting fake news with the right weapons.” Los Angeles Times, Op-ed, 2017. [Link]
Fall Break (10/17)
Social Network Analysis (10/24)
Connections between everything and everyone (and how to harness them)
- Christakis, Nicholas & James Fowler. 2009. Connected: The Surprising Power of Social Networks and How They Shape Our Lives. Chapter 1.
- Video: “Charting Culture” from Nature journal. [Link]
- Easley, David and Jon Kleinberg. 2010. Networks, Crowds, and Markets. Chapter 2.
- Borgatti, Stephen P., and Daniel S. Halgin. 2011. “Analyzing Affiliation Networks.” The SAGE Handbook of Social Network Analysis. pp. 417–33.
- Farrell, Justin. 2016. “Network Structure and Influence of the Climate Change Counter-Movement.” Nature Climate Change.
- Just for fun: Game of Thrones social network
- Outside Guest: John Brandt - Data Driven Yale
Data Skills Boot Camp (10/31)
Invaluable tools for research and writing: Establish an efficient software workflow.
- Healy, Kieran. “The Plain Person’s Guide to Plain Text Social Science” [Link]
Geospatial (11/7)
Spatial data science for politics and social problems.
- Elwood, S., Goodchild, M. F., & Sui, D. Z. (2012). Researching volunteered geographic information: Spatial data, geographic research, and new social practice. Annals of the Association of American Geographers, 102(3), 571-590.
- Dunn, C. E. (2007). Participatory GIS—a people’s GIS? Progress in Human Geography, 31(5), 616-637.
- Outside Guest: Joshua Dull - Digital Scholarship Support Specialist, Yale University Library, Overview and tutorial on Text Mining, 4-5pm
- Outside Guest: Charlie Bettigole - UHPSI at Yale FES - Geospatial Research
Data Skills Boot Camp (11/14)
Thanksgiving Break (11/21)
Visualization with R and Tableau (11/28) - Meet @ CSSSI in Kline Tower (computer room C27)
Data exploration. Revealing new findings. Communicating creatively. Networks and Texts.
- Healy, Kieran and James Moody. 2014. “Data Visualization in Sociology” Annual Review of Sociology 40:105-28.
- Peruse the draft of Kieran Healy’s online book, which includes R tutorials. [Link]
- Wagenmakers, Eric-Jan and Quentin F. Gronau. “A Compendium of Clean Graphs in R” [Link]
- “Top Ten Dos and Don’ts for Charts and Graphs” University of Arkansas Library. [Link]
- Look through all of these creative network visualizations (and bring a few favorites to discuss!): [Link]
Blockchain Technology and Looking Toward the Future (12/5)
Understanding the blockchain. And, looking toward the future.
- Overview: (1) 2018. “What is Blockchain” (6 min video), Center for International Governance Innovation, [Link] (2) 2016. “Blockchain 101 - A Visual Demo.” MIT. [Link]
- Chapron, Guillaume. 2017. “The environment needs cryptogovernance.” Nature. [Link]
- Forde, Brian. 2017. “Using Blockchain to Keep Public Data Public.” Harvard Business Review. [Link]
- Bartling, Sonke and Benedikt Fecher. 2016. “Could Blockchain provide the technical fix to solve science’s reproducibility crisis?” LSE Blog. [Link]
- Salganik, Matthew. 2017. “Chapter 7: The Future” Bit by Bit: Social Research in the Digital Age. Princeton University Press. [Link]