LING 5050: Technical tools for linguists, May session 2020 (May 13 - June 8)


Every weekday 10:45-12:35 on Zoom
Instructor: Marie-Catherine de Marneffe
Office hours: Email me for an appointment

Description

Current linguistic research often requires working with large amounts of data (corpus or experimental data). This course offers practical training in standard computational tools for handling different kinds of data in linguistic research. Students will learn computational techniques to access, search, and format linguistic datasets, including text corpora, speech and audio, structured representations, and experimental measurements. The course will also cover data exploration and visualization.

Prerequisites

No prior programming experience is required: the course will cover introductory scripting in Python, R, and Praat. The course is designed to be hands-on, and students will have the opportunity to work on the problem sets during the class sessions.

Course goals, learning objectives/outcomes

  1. Students will gain hands-on experience gathering, formatting, and manipulating data.
  2. Students will learn to use corpus, field, and experimental data, as well as to combine data from multiple sources.
  3. Students will learn to work with existing computational tools.
  4. At the end of the course, students will be able to process massive amounts of linguistic data.

The course is designed to stand alone, but also to provide an introduction to the graduate Computational Linguistics sequence. It is not a prerequisite for the Computational Linguistics courses, but it is helpful for students who lack prior experience with computational tools.

Content topic list

  • Accessing and navigating corpora
  • Linguistic data manipulation and visualization
  • Automatic processing of structured linguistic representations
  • R scripting
  • Praat scripting

Syllabus

    The schedule page and this one serve as the syllabus for this course.

    Unit 1: Basic data manipulations
    Unit 2: Reading text and counting words
    Unit 3: R and Praat scripting
    Unit 4: Dealing with linguistic structured representations
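
To give a sense of the kind of short script the "reading text and counting words" unit works toward, here is a minimal Python sketch. It is an illustration only, not course material: the function name count_words, the sample sentence, and the simple definition of a "word" (a run of letters or apostrophes) are all assumptions made for this example.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercased word to its frequency.

    Assumption for this sketch: a "word" is any run of letters or
    apostrophes. Real corpora usually need more careful tokenization.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample text; in practice this would be read from a corpus file.
sample = "The cat saw the dog. The dog ran."
counts = count_words(sample)

# Print the three most frequent words with their counts.
for word, freq in counts.most_common(3):
    print(word, freq)
```

The same function can be applied to the contents of a text file by reading the file into a string first.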

    Course requirements

    There will be three assignments to turn in. Each one will require students to write a short program to perform some analysis of a dataset (for instance, assignment 1 is to write a Python program measuring utterance lengths by men and women in a section of the Fisher corpus). Students will work on the assignments both in class and at home, and will be encouraged to work collaboratively in small groups, but everyone has to turn in their own assignment.

    Periodically, there will be short day-to-day assignments. These are strongly recommended but do not have to be turned in.

    There will, however, be three short answers that need to be turned in. The goal of these short answers is to make sure that you are staying caught up with the material.

    The course has been designed to be hands-on, and attendance is therefore strongly recommended. Because the course is online this year, the notes are written so that you can follow them on your own if you cannot connect to the Zoom meeting. You are nonetheless expected to spend the normal class time each day on the notes of the day, even if you cannot attend the Zoom meeting!

    A tentative schedule for the month is posted on the schedule page. Readings and assignments may change. Deadlines will be announced in class and are also posted on Carmen.

    Grading

    This is a 3-credit course, graded on a letter-grade (A, B, C, D, E) basis. Students are expected to attend class meetings, complete readings and assignments, and actively participate in class discussions.

    Homework (75%): Three homework assignments will be due by the beginning of class. They will be turned in through Carmen. No late homework will be accepted.

    Participation (10%): Attendance in class is strongly recommended. Active participation is required, since it will help you understand the material. Because the course is online this year, if you cannot connect to the Zoom meeting, you are required to post on Carmen what you did on the day you missed the meeting and whether you have questions about the material.

    Short answers (15%): Every week, you will reply to a short prompt, reflecting on what you have learned: what you have mastered or are still struggling with. The goal of these short answers is to evaluate your progress and make sure you are keeping up with the materials.

    Grades will be assigned using the standard OSU scale.

    Zoom

    The class sessions will take place every weekday on Zoom. The expectation is that you will attend the Zoom meetings. As noted above, if you cannot connect to the meeting, you are nonetheless expected to spend the class time following the notes of the day and to turn in a short explanation of what you accomplished that day, along with any questions you may have.
    To ensure that the Zoom meetings go as smoothly as possible, we will use the following practices:
  • Given that you will be typing on your computer, you should keep yourself muted.
  • Use the "raise hand" option to ask questions, and I will call on you. When it is your turn to speak, unmute yourself. When you are done talking, mute yourself again.
  • Given that it is harder for me to see each of you on Zoom, we will use the yes/no buttons to reply to some of my straightforward questions.
  • At times, I will use the breakout rooms feature to have you work in smaller groups.

    Computers

    The course usually takes place in our Computational Linguistics lab, where we make sure that all software and packages needed for the course are available. Since you will be using your own computer this year, there are detailed notes on Carmen to help you download and install the software. Email me if you are having issues with any of this *prior* to May 13!

    Website and Carmen

    Materials for each unit will be posted on the website and on Carmen, as will the slides presented in class. Datasets which cannot be made publicly available will be on Carmen. Assignments will need to be turned in through Carmen.

    Note that email from Carmen is sent to your official email address (Name.Number@osu.edu). You should read email sent to your official OSU account on a daily basis.

    Make-up Policy

    If you know you won't be able to make a deadline, talk to me before you miss the deadline! Once the deadline has passed, barring a very good reason, no late homework will be accepted.

    Policy on Academic Misconduct

    It is the responsibility of the Committee on Academic Misconduct to investigate or establish procedures for the investigation of all reported cases of student academic misconduct. The term “academic misconduct” includes all forms of student academic misconduct wherever committed; illustrated by, but not limited to, cases of plagiarism and dishonest practices in connection with examinations. Instructors shall report all instances of alleged academic misconduct to the committee (Faculty Rule 3335-5-487). For additional information, see the Code of Student Conduct (link to the Code).

    Students with Disabilities

    The University strives to make all learning experiences as accessible as possible. If you anticipate or experience academic barriers based on your disability (including mental health, chronic or temporary medical conditions), please let me know immediately so that we can privately discuss options. To establish reasonable accommodations, I may request that you register with Student Life Disability Services. After registration, make arrangements with me as soon as possible to discuss your accommodations so that they may be implemented in a timely fashion. SLDS contact information: slds@osu.edu; 614-292-3307; slds.osu.edu; 098 Baker Hall, 113 W. 12th Avenue.