Tutorial on edu-convokit for the TalkMoves dataset

Welcome to the tutorial on edu-convokit for the TalkMoves dataset. This tutorial will walk you through the process of using edu-convokit to pre-process, annotate and analyze the TalkMoves dataset.

If you are looking for a tutorial on the individual components of edu-convokit, please refer to the following tutorials to get started:

- Text Pre-processing Colab
- Annotation Colab
- Analysis Colab

This tutorial will use all of the components!

Installation

Let’s start by installing edu-convokit and importing the necessary modules.

[ ]:
!pip install git+https://github.com/rosewang2008/edu-convokit.git

Collecting git+https://github.com/rosewang2008/edu-convokit.git
  Cloning https://github.com/rosewang2008/edu-convokit.git to /tmp/pip-req-build-dgphjpe_
  Running command git clone --filter=blob:none --quiet https://github.com/rosewang2008/edu-convokit.git /tmp/pip-req-build-dgphjpe_
  Resolved https://github.com/rosewang2008/edu-convokit.git to commit 1e094c8836a3e3112cc1f996f5f12aeff013777c
  Preparing metadata (setup.py) ... done
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (4.66.1)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (1.23.5)
Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (1.11.4)
Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (3.8.1)
Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (2.1.0+cu121)
Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (4.35.2)
Collecting clean-text (from edu-convokit==0.0.1)
  Downloading clean_text-0.6.0-py3-none-any.whl (11 kB)
Requirement already satisfied: openpyxl in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (3.1.2)
Requirement already satisfied: spacy in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (3.6.1)
Requirement already satisfied: gensim in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (4.3.2)
Collecting num2words==0.5.10 (from edu-convokit==0.0.1)
  Downloading num2words-0.5.10-py3-none-any.whl (101 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 101.6/101.6 kB 1.5 MB/s eta 0:00:00
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (1.2.2)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (3.7.1)
Requirement already satisfied: seaborn in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (0.12.2)
Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (1.5.3)
Collecting docopt>=0.6.2 (from num2words==0.5.10->edu-convokit==0.0.1)
  Downloading docopt-0.6.2.tar.gz (25 kB)
  Preparing metadata (setup.py) ... done
Collecting emoji<2.0.0,>=1.0.0 (from clean-text->edu-convokit==0.0.1)
  Downloading emoji-1.7.0.tar.gz (175 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 175.4/175.4 kB 5.1 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting ftfy<7.0,>=6.0 (from clean-text->edu-convokit==0.0.1)
  Downloading ftfy-6.1.3-py3-none-any.whl (53 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.4/53.4 kB 4.1 MB/s eta 0:00:00
Requirement already satisfied: smart-open>=1.8.1 in /usr/local/lib/python3.10/dist-packages (from gensim->edu-convokit==0.0.1) (6.4.0)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (1.2.0)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (4.46.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (1.4.5)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (23.2)
Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (9.4.0)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (3.1.1)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (2.8.2)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk->edu-convokit==0.0.1) (8.1.7)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk->edu-convokit==0.0.1) (1.3.2)
Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages (from nltk->edu-convokit==0.0.1) (2023.6.3)
Requirement already satisfied: et-xmlfile in /usr/local/lib/python3.10/dist-packages (from openpyxl->edu-convokit==0.0.1) (1.1.0)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->edu-convokit==0.0.1) (2023.3.post1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn->edu-convokit==0.0.1) (3.2.0)
Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.11 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (3.0.12)
Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (1.0.5)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (1.0.10)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (2.0.8)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (3.0.9)
Requirement already satisfied: thinc<8.2.0,>=8.1.8 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (8.1.12)
Requirement already satisfied: wasabi<1.2.0,>=0.9.1 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (1.1.2)
Requirement already satisfied: srsly<3.0.0,>=2.4.3 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (2.4.8)
Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (2.0.10)
Requirement already satisfied: typer<0.10.0,>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (0.9.0)
Requirement already satisfied: pathy>=0.10.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (0.10.3)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (2.31.0)
Requirement already satisfied: pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (1.10.13)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (3.1.2)
Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (67.7.2)
Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (3.3.0)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (3.13.1)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (4.5.0)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (1.12)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (3.2.1)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (2023.6.0)
Requirement already satisfied: triton==2.1.0 in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (2.1.0)
Requirement already satisfied: huggingface-hub<1.0,>=0.16.4 in /usr/local/lib/python3.10/dist-packages (from transformers->edu-convokit==0.0.1) (0.19.4)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers->edu-convokit==0.0.1) (6.0.1)
Requirement already satisfied: tokenizers<0.19,>=0.14 in /usr/local/lib/python3.10/dist-packages (from transformers->edu-convokit==0.0.1) (0.15.0)
Requirement already satisfied: safetensors>=0.3.1 in /usr/local/lib/python3.10/dist-packages (from transformers->edu-convokit==0.0.1) (0.4.1)
Requirement already satisfied: wcwidth<0.3.0,>=0.2.12 in /usr/local/lib/python3.10/dist-packages (from ftfy<7.0,>=6.0->clean-text->edu-convokit==0.0.1) (0.2.12)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.7->matplotlib->edu-convokit==0.0.1) (1.16.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy->edu-convokit==0.0.1) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy->edu-convokit==0.0.1) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy->edu-convokit==0.0.1) (2.0.7)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy->edu-convokit==0.0.1) (2023.11.17)
Requirement already satisfied: blis<0.8.0,>=0.7.8 in /usr/local/lib/python3.10/dist-packages (from thinc<8.2.0,>=8.1.8->spacy->edu-convokit==0.0.1) (0.7.11)
Requirement already satisfied: confection<1.0.0,>=0.0.1 in /usr/local/lib/python3.10/dist-packages (from thinc<8.2.0,>=8.1.8->spacy->edu-convokit==0.0.1) (0.1.4)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->spacy->edu-convokit==0.0.1) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->edu-convokit==0.0.1) (1.3.0)
Building wheels for collected packages: edu-convokit, docopt, emoji
  Building wheel for edu-convokit (setup.py) ... done
  Created wheel for edu-convokit: filename=edu_convokit-0.0.1-py3-none-any.whl size=25946 sha256=bacc5ae8cec78f73dd6432b9a641058237be062d59c7dcfcac080e9a19077bf3
  Stored in directory: /tmp/pip-ephem-wheel-cache-a92ctwua/wheels/29/43/ec/d2472df0eb2af8f1e7d67d0710a4b3eb93fe983b15f8d7b841
  Building wheel for docopt (setup.py) ... done
  Created wheel for docopt: filename=docopt-0.6.2-py2.py3-none-any.whl size=13706 sha256=19f3926503485ba42f4fb35754933106263ea928f13b10e358a34f5f263f839a
  Stored in directory: /root/.cache/pip/wheels/fc/ab/d4/5da2067ac95b36618c629a5f93f809425700506f72c9732fac
  Building wheel for emoji (setup.py) ... done
  Created wheel for emoji: filename=emoji-1.7.0-py3-none-any.whl size=171033 sha256=0024d11da3567b1c7f328fd06e05831297bd61be31635baec2d057a050286c56
  Stored in directory: /root/.cache/pip/wheels/31/8a/8c/315c9e5d7773f74b33d5ed33f075b49c6eaeb7cedbb86e2cf8
Successfully built edu-convokit docopt emoji
Installing collected packages: emoji, docopt, num2words, ftfy, clean-text, edu-convokit
Successfully installed clean-text-0.6.0 docopt-0.6.2 edu-convokit-0.0.1 emoji-1.7.0 ftfy-6.1.3 num2words-0.5.10
[ ]:
from edu_convokit.preprocessors import TextPreprocessor
from edu_convokit.annotation import Annotator
from edu_convokit.analyzers import (
    QualitativeAnalyzer,
    QuantitativeAnalyzer,
    LexicalAnalyzer,
    TemporalAnalyzer
)
# For helping us load data
from edu_convokit import utils

import os
import tqdm
WARNING:root:Since the GPL-licensed package `unidecode` is not installed, using Python's `unicodedata` package which yields worse results.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.

📑 Data

Let’s download the dataset under raw_data/. Note that we’re only downloading a subsample of the dataset for this tutorial; this cuts down the annotation time. If you would like to annotate the entire dataset, feel free to upload the full dataset to this Colab!

[ ]:
# We will put the data here:
DATA_DIR = "raw_data"
!mkdir -p $DATA_DIR

# We will put the annotated data here:
ANNOTATIONS_DIR = "annotations"
!mkdir -p $ANNOTATIONS_DIR

# Download the data
!wget "https://raw.githubusercontent.com/rosewang2008/edu-convokit/master/data/talkmoves.zip"

# Unzip the data
!unzip -n -q talkmoves.zip -d $DATA_DIR

# Data directory is then raw_data/talkmoves
DATA_DIR = "raw_data/talkmoves"
--2023-12-30 11:46:56--  https://raw.githubusercontent.com/rosewang2008/edu-convokit/master/data/talkmoves.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 346774 (339K) [application/zip]
Saving to: ‘talkmoves.zip’

talkmoves.zip       100%[===================>] 338.65K  --.-KB/s    in 0.02s

2023-12-30 11:46:56 (17.2 MB/s) - ‘talkmoves.zip’ saved [346774/346774]
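
If you would rather annotate the full dataset, one option is to upload your own copy to Colab and point DATA_DIR at it. A minimal sketch (the zip and folder names below are placeholders for whatever you upload):

[ ]:
# Optional: run on your own copy of the full dataset instead of the subsample.
# The file/folder names below are placeholders for whatever you upload.
from google.colab import files
files.upload()                                 # e.g. upload full_talkmoves.zip
!unzip -n -q full_talkmoves.zip -d raw_data    # shell commands work in Colab cells
DATA_DIR = "raw_data/full_talkmoves"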

[ ]:
# We'll set the important variables specific to this dataset. If you open one of the files, you'll see that the
# speaker and text columns are defined as:
TEXT_COLUMN = "Sentence"
SPEAKER_COLUMN = "Speaker"

# We will also define the annotation columns.
# For the purposes of this tutorial, we will only be using talktime, student_reasoning, and uptake.
TALK_TIME_COLUMN = "talktime"
STUDENT_REASONING_COLUMN = "student_reasoning"
UPTAKE_COLUMN = "uptake"
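
As a quick sanity check, we can peek at one transcript and confirm that these column names exist. A minimal sketch using the helpers imported above:

[ ]:
# Sketch: peek at the first valid transcript to confirm the column names above.
sample_files = [f for f in os.listdir(DATA_DIR) if utils.is_valid_file_extension(f)]
sample_df = utils.load_data(os.path.join(DATA_DIR, sample_files[0]))
print(sample_df.columns.tolist())
print(sample_df[[SPEAKER_COLUMN, TEXT_COLUMN]].head())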

One thing that will be important is knowing how the teacher/tutor and student are represented in the dataset. Let’s load some examples and see how they are represented.

[ ]:
files = os.listdir(DATA_DIR)
files = [os.path.join(DATA_DIR, f) for f in files if utils.is_valid_file_extension(f)]

df = utils.merge_dataframes_in_list(files)
[ ]:
# Randomly show 10 rows
df.sample(10)
|     | Unnamed: 0 | TimeStamp | Turn  | Speaker        | Sentence                                           | Teacher Tag                   | Student Tag                                   |
|-----|------------|-----------|-------|----------------|----------------------------------------------------|-------------------------------|-----------------------------------------------|
| 5   | NaN        | NaN       | 1.0   | T/R1           | Do you remember it looks like this.                | 1 - None                      | NaN                                           |
| 94  | 94.0       | NaN       | 49.0  | T              | How many of you disagree?                          | 3 - Getting Students to Relate | NaN                                          |
| 53  | 53.0       | NaN       | NaN   | Erik and Brian | Yeah                                               | NaN                           | 2 - Relating to Another Student               |
| 135 | 135.0      | NaN       | 86.0  | Mark           | If, if the blue was one whole, what would the ...  | NaN                           | 3 - Asking for More Information               |
| 193 | 193.0      | NaN       | 85.0  | T              | Or the people who aren't sure want to tell us...   | 2 - Keeping Everyone Together | NaN                                           |
| 93  | 93.0       | NaN       | 17.0  | T              | Joey?                                              | 2 - Keeping Everyone Together | NaN                                           |
| 34  | 34.0       | NaN       | 11.0  | T              | I want to call the white rod one half.             | 1 - None                      | NaN                                           |
| 42  | 42.0       | NaN       | 31.0  | Alan           | [Puts three light green rods on top of the blu...  | NaN                           | 5 - Providing Evidence / Explaining Reasoning |
| 46  | NaN        | NaN       | 31.0  | T/R1           | Do the number names change?                        | 2 - Keeping Everyone Together | NaN                                           |
| 839 | 839.0      | NaN       | 481.0 | T              | Okay, but why, how could she be sure?              | 3 - Getting Students to Relate | NaN                                          |

The students and teachers are represented inconsistently. Let’s look at all of the speaker names in the dataset:

[ ]:
speaker_names = df[SPEAKER_COLUMN].unique()

print("Speaker names: ", speaker_names)

Speaker names:  ['T' 'David' 'Meredith' 'Beth' 'Meredith and David' 'T 2' 'Danielle' 'T2'
 'Gregory' 'Michael' 'Andrew' 'Laura' 'Jessica' 'Audra' 'Kelly' 'Brian'
 'Jessica and Audra' 'SS' 'Erik' 'Mark' 'Graham' 'Others' nan 'S' 'BRYAN'
 'DANIEL' 'ANDREW' 'CYNTHIA' 'SAURABH' 'STUDENT 1' 'MS. Liu' 'JAKE'
 'ASHANK' 'Alan' 'ALYSSA' 'SN' 'KEVIN' 'SI' 'Amy' 'Jackie' 'PARTNER'
 'TIMOTHY' 'Jacquelyn' 'T/R1' 'Students' 'Student' 'LINDA FISHER'
 'DEBORAH' 'Jason' 'CHARLOTTE' 'Jeff' 'Michelle' 'Milin' 'Stephanie'
 'Stephanie & Jeff' 'Michelle & Milin' 'Blonde' 'Milin,' '~23' 'All'
 'Michelle & Jeff' 'R2' 'CECILIO DIMAS' 'STUDENT' 'Erik and Brian'
 'SAMUEL' 'OSI' 'CLAIRE']

It seems like the teacher’s speaker names start with T, while all of the other names belong to students. Let’s split the names into two groups: teacher and student.

[ ]:
# Let's remove nan speakers
speaker_names = [_ for _ in speaker_names if str(_) != "nan"]

# And let's make sure the names are interpreted as strings
speaker_names = [str(_) for _ in speaker_names]

# Collect the teacher names; they all start with the letter T
TEACHER_START_LETTER = "T"
TEACHER_SPEAKER = [_ for _ in speaker_names if _.startswith(TEACHER_START_LETTER)]
STUDENT_SPEAKER = [_ for _ in speaker_names if _ not in TEACHER_SPEAKER]

print("Teacher speaker: ", TEACHER_SPEAKER)
print("Student speaker: ", STUDENT_SPEAKER)
Teacher speaker:  ['T', 'T 2', 'T2', 'TIMOTHY', 'T/R1']
Student speaker:  ['David', 'Meredith', 'Beth', 'Meredith and David', 'Danielle', 'Gregory', 'Michael', 'Andrew', 'Laura', 'Jessica', 'Audra', 'Kelly', 'Brian', 'Jessica and Audra', 'SS', 'Erik', 'Mark', 'Graham', 'Others', 'S', 'BRYAN', 'DANIEL', 'ANDREW', 'CYNTHIA', 'SAURABH', 'STUDENT 1', 'MS. Liu', 'JAKE', 'ASHANK', 'Alan', 'ALYSSA', 'SN', 'KEVIN', 'SI', 'Amy', 'Jackie', 'PARTNER', 'Jacquelyn', 'Students', 'Student', 'LINDA FISHER', 'DEBORAH', 'Jason', 'CHARLOTTE', 'Jeff', 'Michelle', 'Milin', 'Stephanie', 'Stephanie & Jeff', 'Michelle & Milin', 'Blonde', 'Milin,', '~23', 'All', 'Michelle & Jeff', 'R2', 'CECILIO DIMAS', 'STUDENT', 'Erik and Brian', 'SAMUEL', 'OSI', 'CLAIRE']

There are some names in the teacher list that actually belong to students. Let’s just manually fix that:

[ ]:
FALSE_POSITIVE_NAMES = ["TIMOTHY"]

# Remove from the teacher speaker list
TEACHER_SPEAKER = [_ for _ in TEACHER_SPEAKER if _ not in FALSE_POSITIVE_NAMES]

# Add to the student speaker list
STUDENT_SPEAKER.extend(FALSE_POSITIVE_NAMES)

print("Teacher speaker: ", TEACHER_SPEAKER)
print("Student speaker: ", STUDENT_SPEAKER)
Teacher speaker:  ['T', 'T 2', 'T2', 'T/R1']
Student speaker:  ['David', 'Meredith', 'Beth', 'Meredith and David', 'Danielle', 'Gregory', 'Michael', 'Andrew', 'Laura', 'Jessica', 'Audra', 'Kelly', 'Brian', 'Jessica and Audra', 'SS', 'Erik', 'Mark', 'Graham', 'Others', 'S', 'BRYAN', 'DANIEL', 'ANDREW', 'CYNTHIA', 'SAURABH', 'STUDENT 1', 'MS. Liu', 'JAKE', 'ASHANK', 'Alan', 'ALYSSA', 'SN', 'KEVIN', 'SI', 'Amy', 'Jackie', 'PARTNER', 'Jacquelyn', 'Students', 'Student', 'LINDA FISHER', 'DEBORAH', 'Jason', 'CHARLOTTE', 'Jeff', 'Michelle', 'Milin', 'Stephanie', 'Stephanie & Jeff', 'Michelle & Milin', 'Blonde', 'Milin,', '~23', 'All', 'Michelle & Jeff', 'R2', 'CECILIO DIMAS', 'STUDENT', 'Erik and Brian', 'SAMUEL', 'OSI', 'CLAIRE', 'TIMOTHY']

πŸ“ Text Pre-Processing and Annotation

Let’s first preprocess and annotate the dataset with edu-convokit. The following section will:

- Read each file in the dataset and preprocess it using edu-convokit’s TextPreprocessor. We’ll need to anonymize the names and merge the utterances by speaker.
- Then, annotate the file using edu-convokit’s Annotator for talktime, student reasoning, and uptake.
- Finally, save the annotated file as a csv under annotations/.

Let’s get started!

[ ]:
# First, let's create the replacement names for the teacher and student speakers
TEACHER_REPLACEMENT_NAMES = ["[TEACHER]"] * len(TEACHER_SPEAKER)

# We will replace the student names with [STUDENT_0], [STUDENT_1], etc.
# This will approximately preserve the unique identity of each student, while also anonymizing them.
STUDENT_REPLACEMENT_NAMES = [f"[STUDENT_{i}]" for i in range(len(STUDENT_SPEAKER))]
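
To make the mapping concrete, here is a small illustrative printout of which pseudonym a few of the speakers will receive:

[ ]:
# Illustrative only: show how a few raw speaker names map to their pseudonyms.
for name, pseudonym in list(zip(STUDENT_SPEAKER, STUDENT_REPLACEMENT_NAMES))[:3]:
    print(f"{name} -> {pseudonym}")
print(f"{TEACHER_SPEAKER[0]} -> {TEACHER_REPLACEMENT_NAMES[0]}")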

[ ]:
# Initialize the preprocessor and annotator
processor = TextPreprocessor()
annotator = Annotator()

# This takes about 50 minutes on Colab, CPU
# Though this time varies depending on bandwidth
for filename in tqdm.tqdm(os.listdir(DATA_DIR)):
    # Skip anything that isn't a transcript file we can load.
    if not utils.is_valid_file_extension(filename):
        continue
    df = utils.load_data(os.path.join(DATA_DIR, filename))

    # Preprocess the data. Let's anonymize the names in the speaker column.
    df = processor.anonymize_known_names(
        df=df,
        text_column=SPEAKER_COLUMN,
        # We're going to directly replace the text in the speaker column with the anonymized text.
        target_text_column=SPEAKER_COLUMN,
        names=TEACHER_SPEAKER + STUDENT_SPEAKER,
        replacement_names=TEACHER_REPLACEMENT_NAMES + STUDENT_REPLACEMENT_NAMES
    )

    # Now let's anonymize the names in the text column.
    df = processor.anonymize_known_names(
        df=df,
        text_column=TEXT_COLUMN,
        target_text_column=TEXT_COLUMN,
        names=TEACHER_SPEAKER + STUDENT_SPEAKER,
        replacement_names=TEACHER_REPLACEMENT_NAMES + STUDENT_REPLACEMENT_NAMES
    )

    # Now let's merge the utterances of the same speaker together and directly update the dataframe.
    df = processor.merge_utterances_from_same_speaker(
        df=df,
        text_column=TEXT_COLUMN,
        speaker_column=SPEAKER_COLUMN,
        target_text_column=TEXT_COLUMN
    )

    # Now we're going to annotate the data.
    df = annotator.get_talktime(
        df=df,
        text_column=TEXT_COLUMN,
        output_column=TALK_TIME_COLUMN
    )

    df = annotator.get_student_reasoning(
        df=df,
        text_column=TEXT_COLUMN,
        speaker_column=SPEAKER_COLUMN,
        output_column=STUDENT_REASONING_COLUMN,
        # We just want to annotate the student utterances, so we specify the speaker value as the anonymized student names.
        speaker_value=STUDENT_REPLACEMENT_NAMES
    )

    df = annotator.get_uptake(
        df=df,
        text_column=TEXT_COLUMN,
        speaker_column=SPEAKER_COLUMN,
        output_column=UPTAKE_COLUMN,
        # We want to annotate the teacher's uptake of the student's utterances.
        # So we're looking for instances where the student first speaks, then the teacher speaks.
        speaker1=STUDENT_REPLACEMENT_NAMES,
        speaker2=TEACHER_REPLACEMENT_NAMES
    )

    # And we're done! Let's now save the annotated data as a csv file.
    filename = filename.split(".")[0] + ".csv"
    df.to_csv(os.path.join(ANNOTATIONS_DIR, filename), index=False)
  0%|          | 0/30 [00:00<?, ?it/s]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
    For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
WARNING:root:Note: This model was trained on teacher's uptake of student's utterances. So, speaker1 should be the student and speaker2 should be the teacher.
    For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
WARNING:root:Note: It's recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
100%|██████████| 30/30 [05:26<00:00, 10.87s/it]

For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf

WARNING:root:Note: It’s recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 27/30 [05:05<00:34, 11.56s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.

For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf

WARNING:root:Note: This model was trained on teacher’s uptake of student’s utterances. So, speaker1 should be the student and speaker2 should be the teacher.

For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf

WARNING:root:Note: It’s recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 28/30 [05:15<00:22, 11.21s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.

For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf

WARNING:root:Note: This model was trained on teacher’s uptake of student’s utterances. So, speaker1 should be the student and speaker2 should be the teacher.

For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf

WARNING:root:Note: It’s recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 29/30 [05:20<00:09, 9.32s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.

For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf

WARNING:root:Note: This model was trained on teacher’s uptake of student’s utterances. So, speaker1 should be the student and speaker2 should be the teacher.

For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf

WARNING:root:Note: It’s recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker. 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 30/30 [05:26<00:00, 10.87s/it] end{sphinxVerbatim}

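One recurring warning above recommends merging consecutive utterances from the same speaker before running the annotation models, and points to edu_convokit.text_preprocessing.merge_utterances_from_same_speaker for doing so. The snippet below is only a plain-pandas sketch of what that merge means (concatenating consecutive rows from the same speaker); in practice you would use the edu-convokit helper covered in the Text Pre-processing Colab.

[ ]:
# Plain-pandas sketch of "merging utterances from the same speaker":
# consecutive rows with the same speaker are concatenated into a single row.
# This is for intuition only; use the edu-convokit helper referenced in the
# warnings above for real pre-processing.
import pandas as pd

def merge_consecutive_speaker_rows(df, speaker_column, text_column):
    # A new "run" starts whenever the speaker changes from the previous row.
    run_id = (df[speaker_column] != df[speaker_column].shift()).cumsum()
    merged = (
        df.groupby(run_id.rename("run_id"))
          .agg({speaker_column: "first", text_column: " ".join})
          .reset_index(drop=True)
    )
    return merged

# Toy example (hypothetical rows, not taken from the TalkMoves transcripts):
toy = pd.DataFrame({
    "speaker": ["[TEACHER]", "[TEACHER]", "[STUDENT_1]"],
    "text": ["Okay.", "Who wants to share?", "I got two thirds."],
})
print(merge_consecutive_speaker_rows(toy, "speaker", "text"))
# The two consecutive teacher rows come out as one merged utterance.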

Analysis

Now that we have annotated the dataset, let’s analyze it using edu-convokit’s analyzers. We’ll be doing the following:

- We’ll use QualitativeAnalyzer to look at some examples of the talktime, student reasoning and uptake annotations.
- We’ll use QuantitativeAnalyzer to look at the aggregate statistics of the talktime, student reasoning and uptake annotations.
- We’ll use LexicalAnalyzer to compare the students’ and the teacher’s vocabulary.
- We’ll use TemporalAnalyzer to look at the temporal trends of the talktime, student reasoning and uptake annotations.

These classes, along with constants like ANNOTATIONS_DIR, SPEAKER_COLUMN and the feature columns, were set up earlier in the tutorial; a minimal import sketch follows below.
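If you are jumping straight to this section, the imports would look roughly like the following. The exact import path is an assumption based on the module layout shown in the warnings above (edu_convokit/analyzers/...), so double-check the Analysis Colab if your version differs.

[ ]:
# Minimal import sketch (assumed layout: the analyzer classes live under
# edu_convokit.analyzers, as suggested by the module paths in the warnings above;
# adjust if your version of the package differs).
from edu_convokit.analyzers import (
    QualitativeAnalyzer,
    QuantitativeAnalyzer,
    LexicalAnalyzer,
    TemporalAnalyzer,
)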

Let’s get started!!!

πŸ” Qualitative Analysis

[ ]:
# We're going to look at examples from the entire dataset.
qualitative_analyzer = QualitativeAnalyzer(data_dir=ANNOTATIONS_DIR)

# Examples of talktime. Will show random examples from the dataset.
qualitative_analyzer.print_examples(
    speaker_column=SPEAKER_COLUMN,
    text_column=TEXT_COLUMN,
    feature_column=TALK_TIME_COLUMN,
)
talktime: 54
>> [TEACHER]: Okay, that’s really great. Okay, you are really wonderful. We have about five more minutes where we can do some  problem solving [STUDENT_3] then put these things away. This is what  I would like you to do. I would like you to take a turn to  make a problem that will challenge your partner…

talktime: 54
>> [TEACHER]: I'm wondering which is bigger, one half or two thirds. Now  before you model it you might think in your head, before you begin  to model it what you is bigger [STUDENT_3] if so, if one is bigger, by how  much. Why don’t you work with your partner [STUDENT_3] see what you can  do.

talktime: 54
>> [TEACHER]: Let me write this  down. This, what you are saying here is so important, here. Let me  see if I can write this down. You're saying that you're calling the red,  you're giving red the number name, right? The length of the red,  right? We'll give it the number name, what did you say?

talktime: 4
>> [STUDENT_15]: I made a problem.

talktime: 4
>> [STUDENT_15]: You don’t know mine.

talktime: 4
>> [TEACHER]: What would be what?

talktime: 6
>> [TEACHER]: And ask your partner that problem,

talktime: 6
>> [TEACHER] [TEACHER]: This looks interesting. Are you experimenting?

talktime: 6
>> [TEACHER] [TEACHER]: You have to make it convincing.

[ ]:
# Examples of student reasoning. Let's look at positive examples:
qualitative_analyzer.print_examples(
    speaker_column=SPEAKER_COLUMN,
    text_column=TEXT_COLUMN,
    feature_column=STUDENT_REASONING_COLUMN,
    feature_value=1.0,
)
student_reasoning: 1.0
>> [STUDENT_29]: [Puts three light green rods on top of the blue rod]  There,  that would, if you look down it would equal up to a blue.

student_reasoning: 1.0
>> [STUDENT_15]: And I know that that’s half of [He points to the orange rod],  [STUDENT_3] I know that yellow is half of orange, which is ten.

student_reasoning: 1.0
>> [STUDENT_1]: one half by one sixth. Cause if you put six ones up to a whole

[ ]:
# We can also look at negative examples:
qualitative_analyzer.print_examples(
    speaker_column=SPEAKER_COLUMN,
    text_column=TEXT_COLUMN,
    feature_column=STUDENT_REASONING_COLUMN,
    feature_value=0.0,
)
student_reasoning: 0.0
>> [STUDENT_15]: You just don’t know the problem I just made up. You don’t  know the problem I made up.

student_reasoning: 0.0
>> [STUDENT_15]: I have a problem already. I have a problem. I have a  problem, remember it? If a light green, um, no, no.

student_reasoning: 0.0
>> [STUDENT_15]: I have one for you, too. [[TEACHER] [TEACHER] walks over]  I have a  problem.

[ ]:
# Examples of uptake.
qualitative_analyzer.print_examples(
    speaker_column=SPEAKER_COLUMN,
    text_column=TEXT_COLUMN,
    feature_column=UPTAKE_COLUMN,
    # I want to look at positive examples of uptake (uptake = 1.0)
    feature_value=1.0,
    # ... and look at the previous student utterance (show_k_previous_lines = 1).
    # This is interesting because it will show us how the teacher is responding to the student's utterance.
    show_k_previous_lines=1,
)
uptake: 1.0
[STUDENT_15]: If a light green was one third, what would be a whole?
>> [TEACHER]: What would be what?

uptake: 1.0
[STUDENT_15]: If a light green was one third, what would be a whole?
>> [TEACHER]: One, what would one be.

uptake: 1.0
[STUDENT_29]: [Puts three light green rods on top of the blue rod]  There,  that would, if you look down it would equal up to a blue.
>> [TEACHER]: Hold on, I’m a little confused. Tell me again. Six ones? You called  this one? What are you calling these?

πŸ“Š Quantitative Analysis

[ ]:
quantitative_analyzer = QuantitativeAnalyzer(data_dir=ANNOTATIONS_DIR)

# Let's create a speaker mapping to shorten the speaker names. Teacher -> T, Student{i} -> S{i}
speaker_mapping = {
    **{name: "T" for name in TEACHER_REPLACEMENT_NAMES},
    **{name: f"S{i}" for i, name in enumerate(STUDENT_REPLACEMENT_NAMES)}
}

# For figure formatting because there are a lot of students
import matplotlib.pyplot as plt
plt.figure(figsize=(len(TEACHER_REPLACEMENT_NAMES + STUDENT_REPLACEMENT_NAMES)/2, 6))

# Let's plot the talk time ratio between the speakers.
quantitative_analyzer.plot_statistics(
    feature_column=TALK_TIME_COLUMN,
    speaker_column=SPEAKER_COLUMN,
    # Proportion of talk time for each speaker.
    value_as="prop",
    label_mapping=speaker_mapping
)

# We can also print the statistics:
quantitative_analyzer.print_statistics(
    feature_column=TALK_TIME_COLUMN,
    speaker_column=SPEAKER_COLUMN,
    # Proportion of talk time for each speaker.
    value_as="prop"
)
_images/tutorial_talkmoves_26_0.png
talktime

Proportion statistics
                                       count      mean       std       min       25%       50%       75%       max
speaker
[STUDENT_0]                              3.0  0.082532  0.055057  0.038502  0.051667  0.064833  0.104548  0.144262
[STUDENT_10]                             1.0  0.052941       NaN  0.052941  0.052941  0.052941  0.052941  0.052941
[STUDENT_11]                             2.0  0.050219  0.010940  0.042484  0.046351  0.050219  0.054087  0.057955
[STUDENT_12]                             6.0  0.103688  0.088372  0.004112  0.022533  0.114360  0.180103  0.195356
[STUDENT_14]                             5.0  0.006621  0.006594  0.000435  0.000936  0.004612  0.012336  0.014786
[STUDENT_15]                            11.0  0.149584  0.111150  0.002649  0.083878  0.131579  0.187356  0.379571
[STUDENT_15] [STUDENT_3] [STUDENT_12]    1.0  0.000801       NaN  0.000801  0.000801  0.000801  0.000801  0.000801
[STUDENT_16]                             2.0  0.013095  0.004014  0.010256  0.011676  0.013095  0.014514  0.015933
[STUDENT_17]                             3.0  0.016949  0.017947  0.001677  0.007066  0.012454  0.024586  0.036717
[STUDENT_18]                             1.0  0.000419       NaN  0.000419  0.000419  0.000419  0.000419  0.000419
[STUDENT_19]                            13.0  0.158800  0.186038  0.004929  0.051454  0.079588  0.146341  0.584345
[STUDENT_1]                              6.0  0.144681  0.167141  0.013691  0.025632  0.059223  0.273155  0.377049
[STUDENT_1] [STUDENT_3] [STUDENT_0]      1.0  0.005464       NaN  0.005464  0.005464  0.005464  0.005464  0.005464
[STUDENT_20]                             1.0  0.040346       NaN  0.040346  0.040346  0.040346  0.040346  0.040346
[STUDENT_21]                             2.0  0.170696  0.017246  0.158501  0.164599  0.170696  0.176794  0.182891
[STUDENT_22]                             6.0  0.085113  0.077821  0.044715  0.045282  0.050777  0.072001  0.241883
[STUDENT_23]                             2.0  0.087668  0.080432  0.030794  0.059231  0.087668  0.116106  0.144543
[STUDENT_24]                             4.0  0.090441  0.077632  0.031746  0.048115  0.062901  0.105227  0.204214
[STUDENT_25]                             2.0  0.161549  0.145949  0.058347  0.109948  0.161549  0.213149  0.264750
[STUDENT_26]                             1.0  0.011345       NaN  0.011345  0.011345  0.011345  0.011345  0.011345
[STUDENT_27]                             3.0  0.076598  0.056412  0.030794  0.045092  0.059390  0.099500  0.139610
[STUDENT_28]                             3.0  0.092144  0.015845  0.077796  0.083641  0.089485  0.099317  0.109149
[STUDENT_29]                             5.0  0.126505  0.055537  0.042045  0.104058  0.142857  0.158890  0.184676
[STUDENT_2]                              3.0  0.009829  0.007439  0.002186  0.006221  0.010256  0.013651  0.017045
[STUDENT_30]                             1.0  0.064877       NaN  0.064877  0.064877  0.064877  0.064877  0.064877
[STUDENT_31]                             4.0  0.066166  0.090450  0.002237  0.018617  0.031213  0.078761  0.200000
[STUDENT_32]                             3.0  0.089839  0.021829  0.077236  0.077236  0.077236  0.096140  0.115044
[STUDENT_33]                             2.0  0.069106  0.000000  0.069106  0.069106  0.069106  0.069106  0.069106
[STUDENT_34]                             2.0  0.023347  0.029145  0.002738  0.013043  0.023347  0.033652  0.043956
[STUDENT_35]                             3.0  0.054487  0.073891  0.001965  0.012241  0.022517  0.080748  0.138980
[STUDENT_36]                             1.0  0.011364       NaN  0.011364  0.011364  0.011364  0.011364  0.011364
[STUDENT_37]                             2.0  0.055127  0.010618  0.047619  0.051373  0.055127  0.058881  0.062635
[STUDENT_38]                             1.0  0.012500       NaN  0.012500  0.012500  0.012500  0.012500  0.012500
[STUDENT_39]                             1.0  0.005682       NaN  0.005682  0.005682  0.005682  0.005682  0.005682
[STUDENT_40]                             1.0  0.021669       NaN  0.021669  0.021669  0.021669  0.021669  0.021669
[STUDENT_41]                             1.0  0.048957       NaN  0.048957  0.048957  0.048957  0.048957  0.048957
[STUDENT_42]                             1.0  0.047352       NaN  0.047352  0.047352  0.047352  0.047352  0.047352
[STUDENT_43]                             2.0  0.026003  0.016571  0.014286  0.020144  0.026003  0.031862  0.037721
[STUDENT_44]                             1.0  0.132927       NaN  0.132927  0.132927  0.132927  0.132927  0.132927
[STUDENT_45]                             1.0  0.094036       NaN  0.094036  0.094036  0.094036  0.094036  0.094036
[STUDENT_45] & [STUDENT_44]              1.0  0.000145       NaN  0.000145  0.000145  0.000145  0.000145  0.000145
[STUDENT_45] & [STUDENT_46]              1.0  0.000871       NaN  0.000871  0.000871  0.000871  0.000871  0.000871
[STUDENT_46]                             1.0  0.133507       NaN  0.133507  0.133507  0.133507  0.133507  0.133507
[STUDENT_46],                            1.0  0.001016       NaN  0.001016  0.001016  0.001016  0.001016  0.001016
[STUDENT_47]                             1.0  0.193441       NaN  0.193441  0.193441  0.193441  0.193441  0.193441
[STUDENT_47] & [STUDENT_44]              1.0  0.000726       NaN  0.000726  0.000726  0.000726  0.000726  0.000726
[STUDENT_4]                              4.0  0.092072  0.110127  0.025918  0.028582  0.043212  0.106702  0.255947
[STUDENT_50]                             1.0  0.000145       NaN  0.000145  0.000145  0.000145  0.000145  0.000145
[STUDENT_53]                             1.0  0.000871       NaN  0.000871  0.000871  0.000871  0.000871  0.000871
[STUDENT_55]                             1.0  0.001887       NaN  0.001887  0.001887  0.001887  0.001887  0.001887
[STUDENT_56]                             1.0  0.735250       NaN  0.735250  0.735250  0.735250  0.735250  0.735250
[STUDENT_59]                             1.0  0.007937       NaN  0.007937  0.007937  0.007937  0.007937  0.007937
[STUDENT_5]                              1.0  0.019981       NaN  0.019981  0.019981  0.019981  0.019981  0.019981
[STUDENT_60]                             1.0  0.004762       NaN  0.004762  0.004762  0.004762  0.004762  0.004762
[STUDENT_61]                             1.0  0.004762       NaN  0.004762  0.004762  0.004762  0.004762  0.004762
[STUDENT_62]                             2.0  0.137668  0.115241  0.056180  0.096924  0.137668  0.178412  0.219156
[STUDENT_6]                              8.0  0.041322  0.042725  0.002649  0.009801  0.029687  0.059497  0.130818
[STUDENT_7]                              1.0  0.021569       NaN  0.021569  0.021569  0.021569  0.021569  0.021569
[STUDENT_8]                              2.0  0.004774  0.000643  0.004320  0.004547  0.004774  0.005001  0.005229
[STUDENT_9]                              6.0  0.047136  0.046222  0.005298  0.014495  0.040520  0.054591  0.131373
[STUDENT_9] [STUDENT_3] [STUDENT_10]     1.0  0.026797       NaN  0.026797  0.026797  0.026797  0.026797  0.026797
[TEACHER]                               27.0  0.577200  0.194013  0.252459  0.448436  0.586710  0.700936  0.936508
[TEACHER] [TEACHER]                      2.0  0.206389  0.017239  0.194199  0.200294  0.206389  0.212484  0.218579
[TEACHER]/R1                             2.0  0.722468  0.036595  0.696591  0.709529  0.722468  0.735406  0.748344
~23                                      1.0  0.002612       NaN  0.002612  0.002612  0.002612  0.002612  0.002612
<Figure size 640x480 with 0 Axes>
[ ]:

# For figure formatting because there are a lot of students
import matplotlib.pyplot as plt
plt.figure(figsize=(len(TEACHER_REPLACEMENT_NAMES + STUDENT_REPLACEMENT_NAMES)/2, 6))

# What about the student reasoning? How often does the student use reasoning?
quantitative_analyzer.plot_statistics(
    feature_column=STUDENT_REASONING_COLUMN,
    speaker_column=SPEAKER_COLUMN,
    # We change this to "avg" because we're now looking at within-speaker statistics.
    value_as="avg",
    # We can set the y-axis limits to [0, 1] because the student reasoning column is a binary column.
    yrange=(0, 1),
    label_mapping=speaker_mapping
)

# We can also print the statistics:
quantitative_analyzer.print_statistics(
    feature_column=STUDENT_REASONING_COLUMN,
    speaker_column=SPEAKER_COLUMN,
    value_as="avg"
)
/usr/local/lib/python3.10/dist-packages/edu_convokit/analyzers/quantitative_analyzer.py:51: RuntimeWarning: invalid value encountered in double_scalars
  f"prop_speaker_{feature_column}": speaker_df[feature_column].sum() / feature_sum,
_images/tutorial_talkmoves_27_1.png
student_reasoning

Average statistics
                                       count      mean       std       min       25%       50%       75%       max
speaker
[STUDENT_0]                              3.0  0.714286  0.494872  0.142857  0.571429  1.000000  1.000000  1.000000
[STUDENT_10]                             1.0  0.333333       NaN  0.333333  0.333333  0.333333  0.333333  0.333333
[STUDENT_11]                             2.0  0.250000  0.353553  0.000000  0.125000  0.250000  0.375000  0.500000
[STUDENT_12]                             5.0  0.466667  0.361325  0.000000  0.333333  0.500000  0.500000  1.000000
[STUDENT_14]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_15]                             9.0  0.373016  0.363437  0.000000  0.071429  0.285714  0.500000  1.000000
[STUDENT_15] [STUDENT_3] [STUDENT_12]    0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_16]                             2.0  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
[STUDENT_17]                             2.0  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
[STUDENT_18]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_19]                            12.0  0.250000  0.349964  0.000000  0.000000  0.000000  0.437500  1.000000
[STUDENT_1]                              6.0  0.352381  0.377063  0.000000  0.053571  0.307143  0.475000  1.000000
[STUDENT_1] [STUDENT_3] [STUDENT_0]      0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_20]                             1.0  0.000000       NaN  0.000000  0.000000  0.000000  0.000000  0.000000
[STUDENT_21]                             2.0  0.750000  0.353553  0.500000  0.625000  0.750000  0.875000  1.000000
[STUDENT_22]                             6.0  0.222222  0.403687  0.000000  0.000000  0.000000  0.250000  1.000000
[STUDENT_23]                             2.0  0.500000  0.707107  0.000000  0.250000  0.500000  0.750000  1.000000
[STUDENT_24]                             4.0  0.562500  0.515388  0.000000  0.187500  0.625000  1.000000  1.000000
[STUDENT_25]                             2.0  0.700000  0.424264  0.400000  0.550000  0.700000  0.850000  1.000000
[STUDENT_26]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_27]                             3.0  0.666667  0.288675  0.500000  0.500000  0.500000  0.750000  1.000000
[STUDENT_28]                             3.0  0.833333  0.288675  0.500000  0.750000  1.000000  1.000000  1.000000
[STUDENT_29]                             5.0  0.428182  0.445552  0.000000  0.090909  0.250000  0.800000  1.000000
[STUDENT_2]                              2.0  0.500000  0.707107  0.000000  0.250000  0.500000  0.750000  1.000000
[STUDENT_30]                             1.0  0.000000       NaN  0.000000  0.000000  0.000000  0.000000  0.000000
[STUDENT_31]                             3.0  0.277778  0.254588  0.000000  0.166667  0.333333  0.416667  0.500000
[STUDENT_32]                             3.0  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
[STUDENT_33]                             2.0  1.000000  0.000000  1.000000  1.000000  1.000000  1.000000  1.000000
[STUDENT_34]                             1.0  1.000000       NaN  1.000000  1.000000  1.000000  1.000000  1.000000
[STUDENT_35]                             2.0  0.392857  0.151523  0.285714  0.339286  0.392857  0.446429  0.500000
[STUDENT_36]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_37]                             2.0  0.833333  0.235702  0.666667  0.750000  0.833333  0.916667  1.000000
[STUDENT_38]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_39]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_40]                             1.0  0.000000       NaN  0.000000  0.000000  0.000000  0.000000  0.000000
[STUDENT_41]                             1.0  0.333333       NaN  0.333333  0.333333  0.333333  0.333333  0.333333
[STUDENT_42]                             1.0  1.000000       NaN  1.000000  1.000000  1.000000  1.000000  1.000000
[STUDENT_43]                             2.0  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
[STUDENT_44]                             1.0  0.146341       NaN  0.146341  0.146341  0.146341  0.146341  0.146341
[STUDENT_45]                             1.0  0.189189       NaN  0.189189  0.189189  0.189189  0.189189  0.189189
[STUDENT_45] & [STUDENT_44]              0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_45] & [STUDENT_46]              0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_46]                             1.0  0.153846       NaN  0.153846  0.153846  0.153846  0.153846  0.153846
[STUDENT_46],                            0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_47]                             1.0  0.096154       NaN  0.096154  0.096154  0.096154  0.096154  0.096154
[STUDENT_47] & [STUDENT_44]              0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_4]                              4.0  0.221154  0.259675  0.000000  0.000000  0.192308  0.413462  0.500000
[STUDENT_50]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_53]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_55]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_56]                             1.0  0.000000       NaN  0.000000  0.000000  0.000000  0.000000  0.000000
[STUDENT_59]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_5]                              1.0  0.000000       NaN  0.000000  0.000000  0.000000  0.000000  0.000000
[STUDENT_60]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_61]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_62]                             2.0  0.250000  0.353553  0.000000  0.125000  0.250000  0.375000  0.500000
[STUDENT_6]                              6.0  0.472222  0.452360  0.000000  0.083333  0.416667  0.875000  1.000000
[STUDENT_7]                              1.0  0.000000       NaN  0.000000  0.000000  0.000000  0.000000  0.000000
[STUDENT_8]                              0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_9]                              6.0  0.305556  0.400231  0.000000  0.000000  0.166667  0.458333  1.000000
[STUDENT_9] [STUDENT_3] [STUDENT_10]     0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[TEACHER]                                0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[TEACHER] [TEACHER]                      0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[TEACHER]/R1                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
~23                                      0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN


/usr/local/lib/python3.10/dist-packages/edu_convokit/analyzers/quantitative_analyzer.py:51: RuntimeWarning: invalid value encountered in double_scalars
  f"prop_speaker_{feature_column}": speaker_df[feature_column].sum() / feature_sum,
<Figure size 640x480 with 0 Axes>

Note that the teacher has no student_reasoning values because we did not annotate the teacher’s utterances for student reasoning. We can easily remove the teacher from the plot by dropping the NaN values:

[ ]:

# For figure formatting because there are a lot of students
import matplotlib.pyplot as plt
plt.figure(figsize=(len(TEACHER_REPLACEMENT_NAMES + STUDENT_REPLACEMENT_NAMES)/2, 6))

quantitative_analyzer.plot_statistics(
    feature_column=STUDENT_REASONING_COLUMN,
    speaker_column=SPEAKER_COLUMN,
    value_as="avg",
    yrange=(0, 1),
    dropna=True,
    label_mapping=speaker_mapping
)
/usr/local/lib/python3.10/dist-packages/edu_convokit/analyzers/quantitative_analyzer.py:51: RuntimeWarning: invalid value encountered in double_scalars
  f"prop_speaker_{feature_column}": speaker_df[feature_column].sum() / feature_sum,
/usr/local/lib/python3.10/dist-packages/edu_convokit/analyzers/quantitative_analyzer.py:51: RuntimeWarning: invalid value encountered in double_scalars
  f"prop_speaker_{feature_column}": speaker_df[feature_column].sum() / feature_sum,
_images/tutorial_talkmoves_29_1.png
<Figure size 640x480 with 0 Axes>
[ ]:
# Finally, let's look at the teacher's uptake of the students' utterances.
quantitative_analyzer.plot_statistics(
    feature_column=UPTAKE_COLUMN,
    speaker_column=SPEAKER_COLUMN,
    value_as="avg",
    yrange=(0, 1),
    dropna=True
)
_images/tutorial_talkmoves_30_0.png
<Figure size 640x480 with 0 Axes>

πŸ’¬ Lexical Analysis

[ ]:
lexical_analyzer = LexicalAnalyzer(data_dir=ANNOTATIONS_DIR)

# Cast the text column to str (guards against non-string values, e.g. empty cells)
lexical_analyzer._df[TEXT_COLUMN] = lexical_analyzer._df[TEXT_COLUMN].astype(str)

# Let's look at the most common words per speaker in the dataset.
lexical_analyzer.print_word_frequency(
    text_column=TEXT_COLUMN,
    speaker_column=SPEAKER_COLUMN,
    # We want to look at the top 10 words per speaker.
    topk=10,
    # Let's also format the text (e.g., remove punctuation, lowercase the text, etc.)
    run_text_formatting=True
)

Top Words By Speaker
[TEACHER]
student: 695
one: 343
okay: 229
think: 170
would: 134
red: 101
right: 91
want: 87
two: 83
see: 82


[STUDENT_15]
one: 75
student: 58
two: 34
green: 23
yeah: 21
would: 20
half: 20
rod: 20
three: 20
call: 20


[TEACHER] [TEACHER]
one: 13
two: 10
okay: 9
student: 9
let: 8
interesting: 6
see: 5
twelfths: 4
hear: 3
clever: 3


[STUDENT_29]
one: 28
would: 16
student: 12
green: 11
rod: 10
three: 9
considered: 9
thirds: 9
red: 7
fifth: 6


[STUDENT_0]
one: 12
would: 8
two: 6
third: 4
student: 4
put: 4
green: 4
purples: 3
three: 3
yeah: 3


[STUDENT_1]
student: 43
two: 33
one: 30
put: 17
rod: 14
rods: 13
take: 11
bigger: 11
orange: 10
yeah: 10


[STUDENT_2]
one: 4
whole: 2
third: 2
half: 1
green: 1
would: 1
blue: 1
student: 1
put: 1
three: 1


[STUDENT_1] [STUDENT_3] [STUDENT_0]
well: 1
bigger: 1
one: 1


[STUDENT_14]
yeah: 8
one: 6
sixth: 5
two: 4
yes: 3
hmm: 2
thirds: 2
twelfths: 2
red: 1
well: 1


[STUDENT_6]
one: 20
student: 16
bigger: 10
would: 10
two: 8
like: 7
red: 7
yeah: 6
sixth: 6
equal: 5


[STUDENT_16]
bigger: 2
boat: 2
children: 2
well: 1
thinking: 1
sizes: 1
made: 1
sea: 1
monster: 1
biggest: 1


[STUDENT_17]
rod: 5
orange: 4
red: 2
well: 1
fit: 1
would: 1
take: 1
white: 1
student: 1
stick: 1


[STUDENT_18]
yes: 1


[STUDENT_7]
student: 2
one: 2
yeah: 1
working: 1
write: 1
example: 1
see: 1
bigger: 1
half: 1
third: 1


[STUDENT_8]
one: 3
half: 2
third: 1
red: 1


[STUDENT_9]
one: 30
student: 27
third: 9
half: 9
would: 8
rod: 7
green: 7
purple: 6
red: 6
think: 6


[STUDENT_10]
saw: 4
student: 4
half: 3
third: 3
two: 3
agree: 2
bigger: 2
comes: 1
overhead: 1
places: 1


[STUDENT_11]
student: 5
well: 3
red: 3
like: 3
one: 3
third: 2
put: 2
tallest: 2
black: 2
green: 2


[STUDENT_12]
one: 34
student: 27
well: 22
half: 22
like: 22
two: 16
sixth: 15
split: 12
three: 11
third: 10


[STUDENT_9] [STUDENT_3] [STUDENT_10]
one: 3
three: 2
girls: 1
agreeing: 1
demonstrates: 1
said: 1
yeah: 1
put: 1
reds: 1
green: 1


[STUDENT_34]
student: 5
brown: 4
took: 4
purple: 4
tried: 2
another: 2
looked: 2
half: 2
orange: 1
red: 1


[STUDENT_19]
student: 39
cups: 27
yeah: 14
one: 12
make: 12
like: 12
flix: 11
pay: 11
sense: 11
teacher: 10


[STUDENT_22]
student: 23
cups: 10
cream: 8
teacher: 7
chocolate: 6
chocolates: 5
equals: 4
kept: 4
think: 4
going: 3


[STUDENT_36]
really: 1
get: 1
one: 1


[STUDENT_24]
student: 9
cindy: 6
cups: 5
valerie: 5
yeah: 3
row: 3
given: 3
would: 3
said: 3
columns: 2


[STUDENT_27]
student: 8
cups: 7
would: 5
think: 3
thought: 3
teacher: 3
chocolate: 3
wrote: 2
ingredients: 2
like: 2


[STUDENT_62]
student: 17
cup: 9
cream: 9
would: 7
cups: 6
teacher: 5
chocolate: 4
thought: 3
whole: 3
recipe: 3


[STUDENT_44]
blue: 26
student: 23
red: 22
one: 18
teacher: 15
yeah: 15
different: 9
know: 8
see: 8
make: 7


[STUDENT_45]
teacher: 28
would: 20
student: 15
towers: 9
see: 8
high: 8
like: 7
add: 6
make: 5
keep: 5


[STUDENT_46]
student: 27
teacher: 19
blue: 19
put: 19
red: 18
one: 16
could: 15
see: 15
top: 12
yeah: 9


[STUDENT_47]
blue: 48
student: 47
red: 27
one: 22
okay: 20
like: 17
right: 15
could: 14
yeah: 13
teacher: 13


[STUDENT_47] & [STUDENT_44]
towers: 1
yeah: 1


[STUDENT_45] & [STUDENT_46]
teacher: 1
colors: 1


[STUDENT_50]
teacher: 1


[STUDENT_46],
draw: 1
think: 1


~23
student: 3
easier: 1
maybe: 1
like: 1
shelly: 1
pattern: 1
put: 1
different: 1
category: 1


[STUDENT_53]
right: 1
yeah: 1
okay: 1


[STUDENT_45] & [STUDENT_44]
teacher: 1


[STUDENT_55]
sure: 1
find: 1
show: 1
little: 1
bit: 1


[STUDENT_59]
empty: 1
spaces: 1
boxes: 1


[STUDENT_60]
student: 1


[STUDENT_61]
simplest: 1
form: 1


[STUDENT_43]
cups: 5
student: 4
columns: 2
basically: 2
teacher: 2
chocolate: 2
two: 1
empty: 1
three: 1
together: 1


[STUDENT_31]
like: 8
people: 7
got: 6
low: 4
right: 3
student: 3
scores: 3
everything: 3
histogram: 3
think: 2


[STUDENT_32]
cups: 6
chocolate: 5
candy: 4
student: 3
cream: 3
teacher: 3
anthony: 2
makes: 2
first: 2
mixes: 2


[STUDENT_23]
student: 6
cups: 4
chocolate: 3
knew: 2
teacher: 2
yeah: 1
cup: 1
cream: 1
equals: 1
total: 1


[STUDENT_21]
student: 7
cups: 6
cream: 4
teacher: 4
chocolate: 4
every: 3
cup: 3
would: 2
think: 2
total: 2


[STUDENT_4]
two: 19
one: 18
student: 14
bigger: 12
half: 10
sixth: 10
would: 9
thirds: 8
fourths: 7
three: 7


[STUDENT_37]
one: 7
student: 6
four: 2
line: 2
add: 2
would: 2
third: 2
put: 2
side: 2
ten: 2


[STUDENT_33]
cups: 6
wrong: 2
thought: 2
chocolate: 2
student: 2
cream: 2
means: 2
question: 2
said: 2


[STUDENT_56]
student: 18
rent: 9
movie: 8
would: 7
gonna: 7
cost: 6
flix: 6
pay: 6
plus: 6
online: 5


[STUDENT_25]
one: 7
student: 6
look: 3
would: 3
well: 2
prices: 2
like: 2
first: 2
flix: 2
gonna: 2


[STUDENT_26]
excellent: 1
first: 1
one: 1


[STUDENT_28]
student: 12
would: 9
like: 7
chocolate: 6
cups: 6
teacher: 5
get: 5
candies: 4
every: 4
one: 3


[STUDENT_5]
one: 2
two: 2
three: 2
eighteenth: 1
twelfths: 1
four: 1
five: 1
six: 1
seven: 1
eight: 1


[STUDENT_35]
one: 14
green: 10
student: 8
light: 8
bigger: 7
half: 6
yeah: 5
six: 5
thought: 4
third: 4


[STUDENT_40]
hear: 2
think: 2
diagram: 1
well: 1
know: 1
say: 1


[STUDENT_41]
student: 4
thought: 2
cups: 2
cream: 2
multiplied: 2
another: 2
one: 2
answer: 2
misunderstood: 1
two: 1


[STUDENT_42]
cups: 4
answer: 2
total: 2
think: 1
person: 1
got: 1
mixed: 1
thinking: 1
times: 1
teacher: 1


[STUDENT_30]
candies: 2
cindy: 2
reading: 1
valerie: 1
shares: 1
box: 1
gave: 1
student: 1
candy: 1
every: 1


[TEACHER]/R1
student: 66
one: 42
number: 34
two: 32
name: 30
think: 23
names: 19
candy: 17
bar: 16
agree: 15


[STUDENT_38]
one: 2
third: 2
tentatively: 1
yes: 1
mmm: 1
hmm: 1


[STUDENT_39]
yeah: 1
dark: 1
blue: 1


[STUDENT_20]
problem: 2
think: 1
person: 1
answered: 1
read: 1
whole: 1


[STUDENT_15] [STUDENT_3] [STUDENT_12]
yeah: 1



It’s a bit hard to see how the students’ and the teacher’s vocabularies compare. Let’s run a log-odds analysis to see which words are more likely to be used by the students or by the teacher.

[ ]:
# This returns the merged dataframe of the annotated files in DATA_DIR.
df = lexical_analyzer.get_df()

# We want to split the dataframe into two groups: one for the students and one for the teacher.
student_df = df[df[SPEAKER_COLUMN].isin(STUDENT_REPLACEMENT_NAMES)]
teacher_df = df[df[SPEAKER_COLUMN].isin(TEACHER_REPLACEMENT_NAMES)]

# Now we can run the log-odds analysis:
lexical_analyzer.plot_log_odds(
    df1=student_df,
    df2=teacher_df,
    text_column1=TEXT_COLUMN,
    text_column2=TEXT_COLUMN,
    # Let's name the df groups to show on the plot
    group1_name="Student",
    group2_name="Teacher",
    # Let's also run the text formatting
    run_text_formatting=True,
)

_images/tutorial_talkmoves_34_0.png
<Figure size 640x480 with 0 Axes>
[ ]:
# We might also be interested in other n-grams. Let's rerun the log-odds analysis on bigrams.
lexical_analyzer.plot_log_odds(
    df1=student_df,
    df2=teacher_df,
    text_column1=TEXT_COLUMN,
    text_column2=TEXT_COLUMN,
    group1_name="Student",
    group2_name="Teacher",
    run_text_formatting=True,
    # n-grams:
    run_ngrams=True,
    n=2,
    topk=10,
    logodds_factor=0.5
)
_images/tutorial_talkmoves_35_0.png
<Figure size 640x480 with 0 Axes>

πŸ“ˆ Temporal Analysis

Let’s look at the temporal trends of the talktime, student reasoning and uptake annotations!

[ ]:
# Limit the analysis to 3 transcripts so the individual speakers remain visible in the plots.
temporal_analyzer = TemporalAnalyzer(data_dir=ANNOTATIONS_DIR, max_transcripts=3)

# First let's look at the talk time ratio between the speakers over time.
temporal_analyzer.plot_temporal_statistics(
    feature_column=TALK_TIME_COLUMN,
    speaker_column=SPEAKER_COLUMN,
    value_as="prop",
    # Let's create 10 bins for the x-axis.
    num_bins=10,
    label_mapping=speaker_mapping
)
_images/tutorial_talkmoves_37_0.png
<Figure size 640x480 with 0 Axes>
[ ]:
# Now student reasoning over time.
temporal_analyzer.plot_temporal_statistics(
    feature_column=STUDENT_REASONING_COLUMN,
    speaker_column=SPEAKER_COLUMN,
    value_as="avg",
    # Let's create 10 bins for the x-axis.
    num_bins=10,
    label_mapping=speaker_mapping
)
/usr/local/lib/python3.10/dist-packages/edu_convokit/analyzers/temporal_analyzer.py:57: RuntimeWarning: invalid value encountered in double_scalars
  f"prop_speaker_{feature_column}": speaker_df[feature_column].sum() / feature_sum,
_images/tutorial_talkmoves_38_1.png
<Figure size 640x480 with 0 Axes>
[ ]:
# Finally, let's look at the teacher's uptake of the students' utterances over time.
temporal_analyzer.plot_temporal_statistics(
    feature_column=UPTAKE_COLUMN,
    speaker_column=SPEAKER_COLUMN,
    value_as="avg",
    # Let's create 10 bins for the x-axis.
    num_bins=10,
    label_mapping=speaker_mapping
)
/usr/local/lib/python3.10/dist-packages/edu_convokit/analyzers/temporal_analyzer.py:57: RuntimeWarning: invalid value encountered in double_scalars
  f"prop_speaker_{feature_column}": speaker_df[feature_column].sum() / feature_sum,
_images/tutorial_talkmoves_39_1.png
<Figure size 640x480 with 0 Axes>

πŸ“š Conclusions and Next Steps

Great! From this tutorial, we learned how to use edu-convokit to pre-process, annotate and analyze the TalkMoves dataset. We saw how a few simple building blocks in edu-convokit can be combined to analyze the dataset and gain insights into the data from several perspectives (qualitative, quantitative, lexical and temporal).

Other resources you can check out include:

- Tutorial on edu-convokit for the NCTE dataset
- Tutorial on edu-convokit for the Amber dataset
- `edu-convokit documentation <https://edu-convokit.readthedocs.io/en/latest/index.html>`__
- `edu-convokit GitHub repository <https://github.com/rosewang2008/edu-convokit/tree/main>`__

If you have any questions, please feel free to reach out to us on `edu-convokit’s GitHub <https://github.com/rosewang2008/edu-convokit>`__.

πŸ‘‹ Happy exploring your data with edu-convokit!

[ ]: