Tutorial on Annotating Education Language Data

Welcome to this tutorial on using `edu-convokit <https://github.com/rosewang2008/edu-convokit>`__ to annotate your education language data! Annotation is a critical step in understanding your data, and it is important to do it correctly and consistently across datasets. edu-convokit is designed to help you do just that.

Annotation is useful because:

- It creates descriptive statistics about your data, which can help you understand the data.
- It quantifies the language used by your students and educators, which can help you understand the language.
- It measures the interaction between the student and the educator, which can help you understand the interaction.

edu-convokit is designed to support these purposes.

📚 Learning Objectives

In this tutorial, you will learn how to use Annotator to annotate your data. Some of the annotations we’ll cover include:

- Talk Time: We will annotate the amount of time the student and educator talk.
- Student Reasoning: We will annotate use of reasoning in the student’s speech.
- Teacher Focusing Questions: We will annotate the use of focusing questions by the educator.
- Conversational Uptake: We will annotate instances of high conversational uptake by the educator.

For other annotations, please refer to the documentation for more information. If you want to add your own annotations, please make a pull request to the repo.

Without further ado, let’s get started!

Installation

First, install edu-convokit:

[ ]:
!pip install git+https://github.com/rosewang2008/edu-convokit.git
[ ]:
from edu_convokit.annotation import Annotator

# We're going to standardize the text with TextPreprocessor.
# In this tutorial, we're going to assume you're familiar with TextPreprocessor.
# For the tutorial on TextPreprocessor, see: https://colab.research.google.com/drive/1a-EwYwkNYHSNcNThNTXe6DNpsis0bpQK
from edu_convokit.preprocessors import TextPreprocessor

# For helping us flexibly load data
from edu_convokit import utils
WARNING:root:Since the GPL-licensed package `unidecode` is not installed, using Python's `unicodedata` package which yields worse results.

📑 Data

Let’s load the data we’ll be working with. We’re going to be using a transcript from the TalkMoves dataset.

We’re also going to use TextPreprocessor to anonymize and pre-process the data. This is optional, but recommended.

This tutorial assumes you’re familiar with TextPreprocessor. If you are not, please refer to the `TextPreprocessor tutorial <https://colab.research.google.com/drive/1a-EwYwkNYHSNcNThNTXe6DNpsis0bpQK>`__ first.

[ ]:
!wget "https://raw.githubusercontent.com/rosewang2008/edu-convokit/master/data/talkmoves/Boats and Fish 2_Grade 4.xlsx"

data_fname = "Boats and Fish 2_Grade 4.xlsx"
df = utils.load_data(data_fname) # Handles loading data from different file types including: .csv, .xlsx, .json

# We're going to first standardize the text with TextPreprocessor as done from our last tutorial: anonymize and merge utterances from the same speaker
processor = TextPreprocessor()
TEXT_COLUMN = "Sentence"
SPEAKER_COLUMN = "Speaker"
known_names = ["David", "Meredith", "Beth"]
known_replacement_names = [f"[STUDENT_{i}]" for i in range(len(known_names))]

# Anonymize text
df = processor.anonymize_known_names(
    df=df,
    text_column=TEXT_COLUMN,
    names=known_names,
    replacement_names=known_replacement_names,
    target_text_column=TEXT_COLUMN
)

# Anonymize speakers
df = processor.anonymize_known_names(
    df=df,
    text_column=SPEAKER_COLUMN,
    target_text_column=SPEAKER_COLUMN,
    names=known_names,
    replacement_names=known_replacement_names
)

# Merge utterances
df = processor.merge_utterances_from_same_speaker(
    df=df,
    text_column=TEXT_COLUMN,
    speaker_column=SPEAKER_COLUMN,
    target_text_column=TEXT_COLUMN
)

# Show
df.head()
    Sentence                                           Speaker
0   I'm wondering which is bigger, one half or two...  T
1   Try the purples. Get three purples. It doesn’t...  [STUDENT_0]
2   What was it? Two thirds?                           [STUDENT_1]
3   It would be like brown or something like that.     [STUDENT_0]
4   Ok                                                 [STUDENT_1]
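
To make the two preprocessing steps above concrete, here is a minimal sketch in plain pandas of what anonymizing known names and merging consecutive utterances from the same speaker can look like. It is not edu-convokit's implementation: the helper replace_names, the toy transcript, and the whole-word regex matching are assumptions made for illustration.

```python
import re

import pandas as pd

# Names and placeholders mirror the ones used in the cell above.
names = ["David", "Meredith", "Beth"]
placeholders = [f"[STUDENT_{i}]" for i in range(len(names))]


def replace_names(text: str) -> str:
    """Hypothetical helper: swap each known name for its placeholder (whole words only)."""
    for name, placeholder in zip(names, placeholders):
        text = re.sub(rf"\b{re.escape(name)}\b", placeholder, text)
    return text


# Toy transcript, invented for this sketch.
toy = pd.DataFrame({
    "Speaker": ["T", "David", "David", "Meredith"],
    "Sentence": ["Yes, David and Meredith?", "What do you have?", "Two thirds?", "Ok"],
})

# Anonymize both the text and the speaker columns.
for column in ["Sentence", "Speaker"]:
    toy[column] = toy[column].map(replace_names)

# Give every run of consecutive rows from the same speaker one turn id,
# then join the sentences within each turn.
turn_id = (toy["Speaker"] != toy["Speaker"].shift()).cumsum().rename("turn")
merged = (
    toy.groupby(turn_id)
    .agg(Speaker=("Speaker", "first"), Sentence=("Sentence", lambda parts: " ".join(parts)))
    .reset_index(drop=True)
)

print(merged)
```

The two consecutive rows from the first student collapse into a single turn, which is the shape the annotators below expect.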

📝 Annotating Talk Time

Let’s start by annotating the amount of time the student and educator talk. We will define talk time as the number of words in TEXT_COLUMN. However, if you have metadata about the length of the audio, you can also use that to annotate talk time. Please refer to the documentation for more information.

[ ]:
annotator = Annotator()

# The talktime values will be populated in this column
TALK_TIME_COLUMN = "talktime"

df = annotator.get_talktime(
    df=df,
    text_column=TEXT_COLUMN,
    analysis_unit="words",
    output_column=TALK_TIME_COLUMN
)

df.head()
    Sentence                                           Speaker      talktime
0   I'm wondering which is bigger, one half or two...  T            54
1   Try the purples. Get three purples. It doesn’t...  [STUDENT_0]  12
2   What was it? Two thirds?                           [STUDENT_1]  5
3   It would be like brown or something like that.     [STUDENT_0]  9
4   Ok                                                 [STUDENT_1]  1

🎉 We can see with a single function call, we’ve added our first annotation – talktime – to our data!
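
For intuition, a word-count definition of talk time can be sketched in plain pandas. This is an illustration only: it assumes whitespace tokenization, which may differ from what edu-convokit does internally, and the toy transcript and the column name talktime_words are made up for the example.

```python
import pandas as pd

# Toy transcript, invented for this sketch.
toy = pd.DataFrame({
    "Speaker": ["T", "[STUDENT_0]", "[STUDENT_1]", "T"],
    "Sentence": [
        "Which is bigger, one half or two thirds?",
        "I think one half",
        "Ok",
        "Why do you think so?",
    ],
})

# Talk time as the number of whitespace-separated tokens per utterance.
toy["talktime_words"] = toy["Sentence"].str.split().str.len()

# Share of all words spoken by each speaker.
share = toy.groupby("Speaker")["talktime_words"].sum() / toy["talktime_words"].sum()

print(toy)
print(share)
```

Once the column exists, per-speaker totals or shares are a single groupby away, which is one reason talk time makes a convenient first descriptive statistic.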

All the other annotations work in a similar way. Let’s continue!

📝 Annotating Student Reasoning

Next, let’s annotate the student’s reasoning. Under the hood, we’re using a model trained on students’ math reasoning from prior work. So…

💡 Note:

- This model is trained on math reasoning, so it may not work well on other subjects.
- This model runs slowly on CPU, so we recommend using a GPU. If you have a GPU, this library will automatically use it.
- This model is trained on student utterances. edu-convokit has a simple way to only annotate student utterances, which we’ll see below.

[ ]:
# The reasoning annotations will be populated in this column
STUDENT_REASONING_COLUMN = "student_reasoning"

df = annotator.get_student_reasoning(
    df=df,
    speaker_column=SPEAKER_COLUMN,
    text_column=TEXT_COLUMN,
    output_column=STUDENT_REASONING_COLUMN,
    # Since this model is only trained on _student_ utterances,
    # we can explicitly pass in the speaker names associated with students.
    # It will only annotate utterances from these speakers.
    speaker_value=known_replacement_names,
)

df.head()
WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
    For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
    Sentence                                           Speaker      talktime  student_reasoning
0   I'm wondering which is bigger, one half or two...  T            54        NaN
1   Try the purples. Get three purples. It doesn’t...  [STUDENT_0]  12        0.0
2   What was it? Two thirds?                           [STUDENT_1]  5         NaN
3   It would be like brown or something like that.     [STUDENT_0]  9         0.0
4   Ok                                                 [STUDENT_1]  1         NaN

🎉 Great! We’ve added our second annotation – student_reasoning – to our data!

💡 Note:

- student_reasoning is NaN for the educator’s utterances, as desired.
- For annotated student utterances, student_reasoning is either 1.0 or 0.0. 1.0 means the model thinks the student is using reasoning, and 0.0 means the model thinks the student is not using reasoning.
- Some short student utterances are also NaN in the output above (for example, rows 2 and 4), which suggests the annotator skips utterances below a minimum length.
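
The NaN pattern above follows from annotating only a subset of speakers. Here is a minimal sketch of that idea in plain pandas. It is not how edu-convokit is implemented: toy_classifier is a hypothetical keyword rule standing in for the trained model, and the toy transcript and the column name toy_reasoning are invented for illustration.

```python
import numpy as np
import pandas as pd


def toy_classifier(text: str) -> float:
    """Hypothetical stand-in for a trained model: flags utterances containing 'because'."""
    return 1.0 if "because" in text.lower() else 0.0


# Toy transcript, invented for this sketch.
toy = pd.DataFrame({
    "Speaker": ["T", "[STUDENT_0]", "[STUDENT_1]"],
    "Sentence": ["Which is bigger?", "One half, because it covers more.", "Ok"],
})

student_names = ["[STUDENT_0]", "[STUDENT_1]"]

# Only rows spoken by the listed speakers are annotated; everything else stays NaN.
is_student = toy["Speaker"].isin(student_names)
toy["toy_reasoning"] = np.nan
toy.loc[is_student, "toy_reasoning"] = toy.loc[is_student, "Sentence"].map(toy_classifier)

print(toy)
```

Keeping unannotated rows as NaN rather than 0.0 matters later: averages over the column then describe only the utterances the model actually scored.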

💡 Are you wondering whether there’s an easy way to view examples of the student’s reasoning?

edu-convokit has a simple way to do this with our Analyzers. This is covered in the `tutorial on Analyzers <https://colab.research.google.com/drive/1xfrq5Ka3FZH7t9l87u4sa_oMlmMvuTfe>`__. For now, let’s continue annotating!

📝 Annotating Teacher Focusing Questions

Let’s annotate the educator’s use of focusing questions. Under the hood, we’re using a model trained on focusing questions in math classrooms from prior work. So…

💡 Note:

- This model is trained on math classroom data, so it may not work well on other subjects.
- This model runs slowly on CPU, so we recommend using a GPU. If you have a GPU, this library will automatically use it.
- This model is trained on teacher utterances. edu-convokit has a simple way to only annotate teacher utterances, similar to the one we saw above for student utterances.

[ ]:
# The focusing questions annotation will be populated in this column
FOCUSING_QUESTIONS_COLUMN = "focusing_questions"

df = annotator.get_focusing_questions(
    df=df,
    speaker_column=SPEAKER_COLUMN,
    text_column=TEXT_COLUMN,
    output_column=FOCUSING_QUESTIONS_COLUMN,
    # Since this model is only trained on _teacher_ utterances,
    # we can explicitly pass in the speaker names associated with the teacher.
    speaker_value=['T']
)

df.head()
WARNING:root:Note: This model was trained on teacher focusing questions, so it should be used on teacher utterances.
    For more details on the model, see https://aclanthology.org/2022.bea-1.27.pdf
    Sentence                                           Speaker      talktime  student_reasoning  focusing_questions
0   I'm wondering which is bigger, one half or two...  T            54        NaN                0.0
1   Try the purples. Get three purples. It doesn’t...  [STUDENT_0]  12        0.0                NaN
2   What was it? Two thirds?                           [STUDENT_1]  5         NaN                NaN
3   It would be like brown or something like that.     [STUDENT_0]  9         0.0                NaN
4   Ok                                                 [STUDENT_1]  1         NaN                NaN

🎉 Great! We’ve added our third annotation – focusing_questions – to our data!

💡 Note:

- focusing_questions is NaN for the student utterances.
- Similar to before, focusing_questions is either 1.0 or 0.0. 1.0 means the model thinks the educator is using a focusing question, and 0.0 means the model thinks the educator is not using a focusing question.
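
Because the annotation is a 0.0/1.0 column with NaN for rows that were not scored, it can be summarized with ordinary pandas operations. The sketch below uses a made-up frame with invented values (not output from the model) to compute the share of annotated utterances that were flagged and to pull out the flagged examples.

```python
import numpy as np
import pandas as pd

# Toy annotated frame; the values are invented for illustration.
toy = pd.DataFrame({
    "Speaker": ["T", "[STUDENT_0]", "T", "T"],
    "Sentence": [
        "Take out your rods.",
        "One half",
        "How did you figure that out?",
        "Ok, next problem.",
    ],
    "focusing_questions": [0.0, np.nan, 1.0, 0.0],
})

# Keep only rows that were actually annotated.
annotated = toy.dropna(subset=["focusing_questions"])

# Fraction of annotated utterances flagged as focusing questions.
rate = annotated["focusing_questions"].mean()

# The flagged utterances themselves, for a quick qualitative check.
examples = annotated.loc[annotated["focusing_questions"] == 1.0, "Sentence"].tolist()

# Number of rows the annotator skipped.
n_unannotated = int(toy["focusing_questions"].isna().sum())

print(rate, examples, n_unannotated)
```

Reading a few flagged utterances alongside the rate is a cheap sanity check before trusting the aggregate number.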

📝 Annotating Conversational Uptake

Let’s annotate the educator’s conversational uptake of the student. Under the hood, we’re using a model from prior work. It measures whether the educator builds on the student’s contribution.

So…

💡 Note:

- This model runs slowly on CPU, so we recommend using a GPU. If you have a GPU, this library will automatically use it.
- This model is trained on teacher utterances following student utterances. edu-convokit has a simple way to only annotate these teacher utterances, similar to the function calls we saw before.

[ ]:
UPTAKE_COLUMN = "uptake"

df = annotator.get_uptake(
    df=df,
    speaker_column=SPEAKER_COLUMN,
    text_column=TEXT_COLUMN,
    output_column=UPTAKE_COLUMN,
    # We want to specify the first speaker to be the students.
    speaker1=known_replacement_names,
    # We want to specify the second speaker to be the teacher
    speaker2='T'
)
WARNING:root:Note: This model was trained on teacher's uptake of student's utterances. So, speaker1 should be the student and speaker2 should be the teacher.
    For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
WARNING:root:Note: It's recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
[ ]:
df.head(20)
    Sentence                                           Speaker      talktime  student_reasoning  focusing_questions  uptake
0   I'm wondering which is bigger, one half or two...  T            54        NaN                0.0                 NaN
1   Try the purples. Get three purples. It doesn’t...  [STUDENT_0]  12        0.0                NaN                 NaN
2   What was it? Two thirds?                           [STUDENT_1]  5         NaN                NaN                 NaN
3   It would be like brown or something like that.     [STUDENT_0]  9         0.0                NaN                 NaN
4   Ok                                                 [STUDENT_1]  1         NaN                NaN                 NaN
5   We’re not doing the one third, we’re doing two...  [STUDENT_0]  14        0.0                NaN                 NaN
6   First we’ve got to find out what a third of it...  [STUDENT_1]  18        0.0                NaN                 NaN
7   One third?                                         [STUDENT_0]  2         NaN                NaN                 NaN
8   What’s third of an orange? Let’s start a diffe...  [STUDENT_1]  21        0.0                NaN                 NaN
9   Alright, yeah, I was thinking of that way before   [STUDENT_0]  9         0.0                NaN                 NaN
10  And you can take the take the red, and the lig...  [STUDENT_1]  35        0.0                NaN                 NaN
11  She asked, which is bigger, one half or two th...  [STUDENT_0]  10        0.0                NaN                 NaN
12  One half or two thirds? Now take six of the ones   [STUDENT_1]  11        0.0                NaN                 NaN
13  Yeah, I know, and put ‘em up to there, and tha...  [STUDENT_0]  30        0.0                NaN                 NaN
14  Now take six of the ones Which is bigger?          T            9         NaN                0.0                 1.0
15  One half                                           [STUDENT_2]  2         NaN                NaN                 NaN
16  I think one half is...                             [STUDENT_0]  5         NaN                NaN                 NaN
17  Yes, [STUDENT_0] and [STUDENT_1]?                  T            4         NaN                0.0                 0.0
18  What do you have?                                  [STUDENT_0]  4         NaN                NaN                 NaN
19  Well [STUDENT_1] and                               [STUDENT_0]  1         NaN                NaN                 NaN
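
In the output above, uptake is only populated for teacher utterances that directly follow a student utterance (rows 14 and 17). A sketch of how such rows can be identified in plain pandas is shown here. This is an assumption about the pairing logic based on that output, not edu-convokit's actual code, and the column name uptake_eligible is made up.

```python
import pandas as pd

student_names = ["[STUDENT_0]", "[STUDENT_1]", "[STUDENT_2]"]

# Toy speaker sequence, invented for this sketch.
toy = pd.DataFrame({
    "Speaker": ["T", "[STUDENT_0]", "[STUDENT_1]", "T", "[STUDENT_2]", "[STUDENT_0]", "T"],
})

# A teacher turn is eligible when the immediately preceding turn is a student's.
previous_speaker = toy["Speaker"].shift(1)
toy["uptake_eligible"] = (toy["Speaker"] == "T") & previous_speaker.isin(student_names)

print(toy)
```

The opening teacher turn has no preceding student utterance, so it is not eligible, which matches the NaN in row 0 of the real output.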

🎉 Great, we finished our last annotation of the tutorial!

With these annotations, we can now do some analysis on our data.

We can save our annotated data to a file, which we’ll use in the next `tutorial on Analyzers <https://colab.research.google.com/drive/1xfrq5Ka3FZH7t9l87u4sa_oMlmMvuTfe>`__.

[ ]:
df.to_csv("annotated_data.csv", index=False)
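
As a quick check that the annotations survive the round trip, the saved file can be read back with pandas. The sketch below uses a small made-up frame and a temporary directory rather than the tutorial's annotated_data.csv, so it can run on its own; the file name annotated_toy.csv is hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Toy annotated frame; the values are invented for illustration.
toy = pd.DataFrame({
    "Speaker": ["T", "[STUDENT_0]"],
    "talktime": [9, 2],
    "uptake": [1.0, np.nan],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "annotated_toy.csv")
    # index=False keeps the row index out of the file, as in the cell above.
    toy.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# Missing annotations are written as empty fields and come back as NaN.
print(reloaded)
```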

📝 Conclusion and Where to Go From Here

In this tutorial, we learned how to use Annotator to annotate our data. With one simple function call, we were able to annotate:

- Talk Time
- Student Reasoning
- Teacher Focusing Questions
- Conversational Uptake

What are some natural next steps?

- You can annotate with other features. Please refer to the documentation for an exhaustive list of features. Or, you can add your own features by making a pull request to the repo.
- You can analyze your data with `edu-convokit’s Analyzer <https://colab.research.google.com/drive/1xfrq5Ka3FZH7t9l87u4sa_oMlmMvuTfe>`__.

  - For a tutorial on Analyzer, please refer to this tutorial.
  - For the documentation on Analyzer, please refer to this documentation.

If you have any questions, please feel free to reach out to us on `edu-convokit’s GitHub <https://github.com/rosewang2008/edu-convokit>`__.

👋 Happy exploring your data with edu-convokit!
