Tutorial on edu-convokit for the TalkMoves dataset
Welcome to the tutorial on edu-convokit for the TalkMoves dataset. This tutorial will walk you through the process of using edu-convokit to pre-process, annotate, and analyze the TalkMoves dataset.
If you are looking for a tutorial on the individual components of edu-convokit, please refer to the following tutorials to get started:

- Text Pre-processing Colab
- Annotation Colab
- Analysis Colab
This tutorial will use all of the components!
Installation
Let's start by installing edu-convokit and importing the necessary modules.
[ ]:
!pip install git+https://github.com/rosewang2008/edu-convokit.git
Collecting git+https://github.com/rosewang2008/edu-convokit.git
Cloning https://github.com/rosewang2008/edu-convokit.git to /tmp/pip-req-build-dgphjpe_
Running command git clone --filter=blob:none --quiet https://github.com/rosewang2008/edu-convokit.git /tmp/pip-req-build-dgphjpe_
Resolved https://github.com/rosewang2008/edu-convokit.git to commit 1e094c8836a3e3112cc1f996f5f12aeff013777c
Preparing metadata (setup.py) ... done
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (4.66.1)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (1.23.5)
Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (1.11.4)
Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (3.8.1)
Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (2.1.0+cu121)
Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (4.35.2)
Collecting clean-text (from edu-convokit==0.0.1)
Downloading clean_text-0.6.0-py3-none-any.whl (11 kB)
Requirement already satisfied: openpyxl in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (3.1.2)
Requirement already satisfied: spacy in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (3.6.1)
Requirement already satisfied: gensim in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (4.3.2)
Collecting num2words==0.5.10 (from edu-convokit==0.0.1)
Downloading num2words-0.5.10-py3-none-any.whl (101 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 101.6/101.6 kB 1.5 MB/s eta 0:00:00
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (1.2.2)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (3.7.1)
Requirement already satisfied: seaborn in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (0.12.2)
Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (1.5.3)
Collecting docopt>=0.6.2 (from num2words==0.5.10->edu-convokit==0.0.1)
Downloading docopt-0.6.2.tar.gz (25 kB)
Preparing metadata (setup.py) ... done
Collecting emoji<2.0.0,>=1.0.0 (from clean-text->edu-convokit==0.0.1)
Downloading emoji-1.7.0.tar.gz (175 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 175.4/175.4 kB 5.1 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting ftfy<7.0,>=6.0 (from clean-text->edu-convokit==0.0.1)
Downloading ftfy-6.1.3-py3-none-any.whl (53 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.4/53.4 kB 4.1 MB/s eta 0:00:00
Requirement already satisfied: smart-open>=1.8.1 in /usr/local/lib/python3.10/dist-packages (from gensim->edu-convokit==0.0.1) (6.4.0)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (1.2.0)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (4.46.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (1.4.5)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (23.2)
Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (9.4.0)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (3.1.1)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (2.8.2)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk->edu-convokit==0.0.1) (8.1.7)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk->edu-convokit==0.0.1) (1.3.2)
Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages (from nltk->edu-convokit==0.0.1) (2023.6.3)
Requirement already satisfied: et-xmlfile in /usr/local/lib/python3.10/dist-packages (from openpyxl->edu-convokit==0.0.1) (1.1.0)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->edu-convokit==0.0.1) (2023.3.post1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn->edu-convokit==0.0.1) (3.2.0)
Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.11 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (3.0.12)
Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (1.0.5)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (1.0.10)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (2.0.8)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (3.0.9)
Requirement already satisfied: thinc<8.2.0,>=8.1.8 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (8.1.12)
Requirement already satisfied: wasabi<1.2.0,>=0.9.1 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (1.1.2)
Requirement already satisfied: srsly<3.0.0,>=2.4.3 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (2.4.8)
Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (2.0.10)
Requirement already satisfied: typer<0.10.0,>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (0.9.0)
Requirement already satisfied: pathy>=0.10.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (0.10.3)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (2.31.0)
Requirement already satisfied: pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (1.10.13)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (3.1.2)
Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (67.7.2)
Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (3.3.0)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (3.13.1)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (4.5.0)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (1.12)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (3.2.1)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (2023.6.0)
Requirement already satisfied: triton==2.1.0 in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (2.1.0)
Requirement already satisfied: huggingface-hub<1.0,>=0.16.4 in /usr/local/lib/python3.10/dist-packages (from transformers->edu-convokit==0.0.1) (0.19.4)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers->edu-convokit==0.0.1) (6.0.1)
Requirement already satisfied: tokenizers<0.19,>=0.14 in /usr/local/lib/python3.10/dist-packages (from transformers->edu-convokit==0.0.1) (0.15.0)
Requirement already satisfied: safetensors>=0.3.1 in /usr/local/lib/python3.10/dist-packages (from transformers->edu-convokit==0.0.1) (0.4.1)
Requirement already satisfied: wcwidth<0.3.0,>=0.2.12 in /usr/local/lib/python3.10/dist-packages (from ftfy<7.0,>=6.0->clean-text->edu-convokit==0.0.1) (0.2.12)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.7->matplotlib->edu-convokit==0.0.1) (1.16.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy->edu-convokit==0.0.1) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy->edu-convokit==0.0.1) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy->edu-convokit==0.0.1) (2.0.7)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy->edu-convokit==0.0.1) (2023.11.17)
Requirement already satisfied: blis<0.8.0,>=0.7.8 in /usr/local/lib/python3.10/dist-packages (from thinc<8.2.0,>=8.1.8->spacy->edu-convokit==0.0.1) (0.7.11)
Requirement already satisfied: confection<1.0.0,>=0.0.1 in /usr/local/lib/python3.10/dist-packages (from thinc<8.2.0,>=8.1.8->spacy->edu-convokit==0.0.1) (0.1.4)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->spacy->edu-convokit==0.0.1) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->edu-convokit==0.0.1) (1.3.0)
Building wheels for collected packages: edu-convokit, docopt, emoji
Building wheel for edu-convokit (setup.py) ... done
Created wheel for edu-convokit: filename=edu_convokit-0.0.1-py3-none-any.whl size=25946 sha256=bacc5ae8cec78f73dd6432b9a641058237be062d59c7dcfcac080e9a19077bf3
Stored in directory: /tmp/pip-ephem-wheel-cache-a92ctwua/wheels/29/43/ec/d2472df0eb2af8f1e7d67d0710a4b3eb93fe983b15f8d7b841
Building wheel for docopt (setup.py) ... done
Created wheel for docopt: filename=docopt-0.6.2-py2.py3-none-any.whl size=13706 sha256=19f3926503485ba42f4fb35754933106263ea928f13b10e358a34f5f263f839a
Stored in directory: /root/.cache/pip/wheels/fc/ab/d4/5da2067ac95b36618c629a5f93f809425700506f72c9732fac
Building wheel for emoji (setup.py) ... done
Created wheel for emoji: filename=emoji-1.7.0-py3-none-any.whl size=171033 sha256=0024d11da3567b1c7f328fd06e05831297bd61be31635baec2d057a050286c56
Stored in directory: /root/.cache/pip/wheels/31/8a/8c/315c9e5d7773f74b33d5ed33f075b49c6eaeb7cedbb86e2cf8
Successfully built edu-convokit docopt emoji
Installing collected packages: emoji, docopt, num2words, ftfy, clean-text, edu-convokit
Successfully installed clean-text-0.6.0 docopt-0.6.2 edu-convokit-0.0.1 emoji-1.7.0 ftfy-6.1.3 num2words-0.5.10
[ ]:
from edu_convokit.preprocessors import TextPreprocessor
from edu_convokit.annotation import Annotator
from edu_convokit.analyzers import (
QualitativeAnalyzer,
QuantitativeAnalyzer,
LexicalAnalyzer,
TemporalAnalyzer
)
# For helping us load data
from edu_convokit import utils
import os
import tqdm
WARNING:root:Since the GPL-licensed package `unidecode` is not installed, using Python's `unicodedata` package which yields worse results.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.
Data
Let's download the dataset under raw_data/. Note that we're only downloading a subsample of the dataset for this tutorial; this cuts down the annotation time. If you would like to annotate the entire dataset, feel free to upload it to this Colab!
[ ]:
# We will put the data here:
DATA_DIR = "raw_data"
!mkdir -p $DATA_DIR
# We will put the annotated data here:
ANNOTATIONS_DIR = "annotations"
!mkdir -p $ANNOTATIONS_DIR
# Download the data
!wget "https://raw.githubusercontent.com/rosewang2008/edu-convokit/master/data/talkmoves.zip"
# Unzip the data
!unzip -n -q talkmoves.zip -d $DATA_DIR
# Data directory is then raw_data/talkmoves
DATA_DIR = "raw_data/talkmoves"
--2023-12-30 11:46:56-- https://raw.githubusercontent.com/rosewang2008/edu-convokit/master/data/talkmoves.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 346774 (339K) [application/zip]
Saving to: 'talkmoves.zip'

talkmoves.zip       100%[===================>] 338.65K  --.-KB/s    in 0.02s

2023-12-30 11:46:56 (17.2 MB/s) - 'talkmoves.zip' saved [346774/346774]
[ ]:
# We'll set the important variables specific to this dataset. If you open one of the files, you'll see that the
# speaker and text columns are defined as:
TEXT_COLUMN = "Sentence"
SPEAKER_COLUMN = "Speaker"
# We will also define the annotation columns.
# For the purposes of this tutorial, we will only be using talktime, student_reasoning, and uptake.
TALK_TIME_COLUMN = "talktime"
STUDENT_REASONING_COLUMN = "student_reasoning"
UPTAKE_COLUMN = "uptake"
One important thing to know is how the teacher/tutor and students are represented in the dataset. Let's load some examples and see how they are represented.
[ ]:
files = os.listdir(DATA_DIR)
files = [os.path.join(DATA_DIR, f) for f in files if utils.is_valid_file_extension(f)]
df = utils.merge_dataframes_in_list(files)
[ ]:
# Randomly show 10 rows
df.sample(10)
|     | Unnamed: 0 | TimeStamp | Turn | Speaker | Sentence | Teacher Tag | Student Tag |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 5   | NaN | NaN | 1.0 | T/R1 | Do you remember it looks like this. | 1 - None | NaN |
| 94  | 94.0 | NaN | 49.0 | T | How many of you disagree? | 3 - Getting Students to Relate | NaN |
| 53  | 53.0 | NaN | NaN | Erik and Brian | Yeah | NaN | 2 - Relating to Another Student |
| 135 | 135.0 | NaN | 86.0 | Mark | If, if the blue was one whole, what would the ... | NaN | 3 - Asking for More Information |
| 193 | 193.0 | NaN | 85.0 | T | Or the people who aren't sure want to tell us... | 2 - Keeping Everyone Together | NaN |
| 93  | 93.0 | NaN | 17.0 | T | Joey? | 2 - Keeping Everyone Together | NaN |
| 34  | 34.0 | NaN | 11.0 | T | I want to call the white rod one half. | 1 - None | NaN |
| 42  | 42.0 | NaN | 31.0 | Alan | [Puts three light green rods on top of the blu... | NaN | 5 - Providing Evidence / Explaining Reasoning |
| 46  | NaN | NaN | 31.0 | T/R1 | Do the number names change? | 2 - Keeping Everyone Together | NaN |
| 839 | 839.0 | NaN | 481.0 | T | Okay, but why, how could she be sure? | 3 - Getting Students to Relate | NaN |
The students and teachers are represented inconsistently in the dataset. Let's look at all the speakers in the dataset:
[ ]:
speaker_names = df[SPEAKER_COLUMN].unique()
print("Speaker names: ", speaker_names)
Speaker names: ['T' 'David' 'Meredith' 'Beth' 'Meredith and David' 'T 2' 'Danielle' 'T2'
'Gregory' 'Michael' 'Andrew' 'Laura' 'Jessica' 'Audra' 'Kelly' 'Brian'
'Jessica and Audra' 'SS' 'Erik' 'Mark' 'Graham' 'Others' nan 'S' 'BRYAN'
'DANIEL' 'ANDREW' 'CYNTHIA' 'SAURABH' 'STUDENT 1' 'MS. Liu' 'JAKE'
'ASHANK' 'Alan' 'ALYSSA' 'SN' 'KEVIN' 'SI' 'Amy' 'Jackie' 'PARTNER'
'TIMOTHY' 'Jacquelyn' 'T/R1' 'Students' 'Student' 'LINDA FISHER'
'DEBORAH' 'Jason' 'CHARLOTTE' 'Jeff' 'Michelle' 'Milin' 'Stephanie'
'Stephanie & Jeff' 'Michelle & Milin' 'Blonde' 'Milin,' '~23' 'All'
'Michelle & Jeff' 'R2' 'CECILIO DIMAS' 'STUDENT' 'Erik and Brian'
'SAMUEL' 'OSI' 'CLAIRE']
It seems like the teacher's speaker names start with T, and the student speaker names are all the other names. Let's split the names into two groups: teacher and student.
[ ]:
# Let's remove nan speakers
speaker_names = [_ for _ in speaker_names if str(_) != "nan"]
# And let's make sure the names are interpreted as strings
speaker_names = [str(_) for _ in speaker_names]
# Create a regex for the teacher names
TEACHER_START_LETTER = "T"
TEACHER_SPEAKER = [_ for _ in speaker_names if _.startswith(TEACHER_START_LETTER)]
STUDENT_SPEAKER = [_ for _ in speaker_names if _ not in TEACHER_SPEAKER]
print("Teacher speaker: ", TEACHER_SPEAKER)
print("Student speaker: ", STUDENT_SPEAKER)
Teacher speaker: ['T', 'T 2', 'T2', 'TIMOTHY', 'T/R1']
Student speaker: ['David', 'Meredith', 'Beth', 'Meredith and David', 'Danielle', 'Gregory', 'Michael', 'Andrew', 'Laura', 'Jessica', 'Audra', 'Kelly', 'Brian', 'Jessica and Audra', 'SS', 'Erik', 'Mark', 'Graham', 'Others', 'S', 'BRYAN', 'DANIEL', 'ANDREW', 'CYNTHIA', 'SAURABH', 'STUDENT 1', 'MS. Liu', 'JAKE', 'ASHANK', 'Alan', 'ALYSSA', 'SN', 'KEVIN', 'SI', 'Amy', 'Jackie', 'PARTNER', 'Jacquelyn', 'Students', 'Student', 'LINDA FISHER', 'DEBORAH', 'Jason', 'CHARLOTTE', 'Jeff', 'Michelle', 'Milin', 'Stephanie', 'Stephanie & Jeff', 'Michelle & Milin', 'Blonde', 'Milin,', '~23', 'All', 'Michelle & Jeff', 'R2', 'CECILIO DIMAS', 'STUDENT', 'Erik and Brian', 'SAMUEL', 'OSI', 'CLAIRE']
There are some names in the teacher list that actually belong to students. Let's manually fix that:
[ ]:
FALSE_POSITIVE_NAMES = ["TIMOTHY"]
# Remove from the teacher speaker list
TEACHER_SPEAKER = [_ for _ in TEACHER_SPEAKER if _ not in FALSE_POSITIVE_NAMES]
# Add to the student speaker list
STUDENT_SPEAKER.extend(FALSE_POSITIVE_NAMES)
print("Teacher speaker: ", TEACHER_SPEAKER)
print("Student speaker: ", STUDENT_SPEAKER)
Teacher speaker: ['T', 'T 2', 'T2', 'T/R1']
Student speaker: ['David', 'Meredith', 'Beth', 'Meredith and David', 'Danielle', 'Gregory', 'Michael', 'Andrew', 'Laura', 'Jessica', 'Audra', 'Kelly', 'Brian', 'Jessica and Audra', 'SS', 'Erik', 'Mark', 'Graham', 'Others', 'S', 'BRYAN', 'DANIEL', 'ANDREW', 'CYNTHIA', 'SAURABH', 'STUDENT 1', 'MS. Liu', 'JAKE', 'ASHANK', 'Alan', 'ALYSSA', 'SN', 'KEVIN', 'SI', 'Amy', 'Jackie', 'PARTNER', 'Jacquelyn', 'Students', 'Student', 'LINDA FISHER', 'DEBORAH', 'Jason', 'CHARLOTTE', 'Jeff', 'Michelle', 'Milin', 'Stephanie', 'Stephanie & Jeff', 'Michelle & Milin', 'Blonde', 'Milin,', '~23', 'All', 'Michelle & Jeff', 'R2', 'CECILIO DIMAS', 'STUDENT', 'Erik and Brian', 'SAMUEL', 'OSI', 'CLAIRE', 'TIMOTHY']
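Before moving on, it's worth sanity-checking the split. A minimal sketch (using an abbreviated sample of the names above, not the full dataset) confirms the two groups are disjoint after the manual fix:

```python
# Abbreviated sample of the speaker lists above
teacher_speaker = ['T', 'T 2', 'T2', 'T/R1']
student_speaker = ['David', 'Erik and Brian', 'SS', 'TIMOTHY']

# After the manual fix, no name appears in both groups, and every
# remaining teacher name still starts with "T".
assert not set(teacher_speaker) & set(student_speaker)
assert all(name.startswith('T') for name in teacher_speaker)
print("Speaker split looks consistent.")
```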
Text Pre-Processing and Annotation

Let's first preprocess and annotate the dataset with edu-convokit. The following section will:

- Read each file in the dataset and preprocess it using edu-convokit's TextPreprocessor. We'll need to anonymize the names and merge the utterances by speaker.
- Then, annotate the file using edu-convokit's Annotator for talk time, student reasoning, and uptake.
- Finally, save the annotated file under annotations/.

Let's get started!
[ ]:
# First, let's create the replacement names for the teacher and student speakers
TEACHER_REPLACEMENT_NAMES = ["[TEACHER]"] * len(TEACHER_SPEAKER)
# We will replace the student names with [STUDENT_0], [STUDENT_1], etc.
# This will approximately preserve the unique identity of each student, while also anonymizing them.
STUDENT_REPLACEMENT_NAMES = [f"[STUDENT_{i}]" for i in range(len(STUDENT_SPEAKER))]
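To make the replacement scheme concrete, here is a minimal sketch (with hypothetical names, independent of edu-convokit) of the name-to-placeholder mapping this produces: every teacher alias collapses to one tag, while each student keeps a distinct index.

```python
# Hypothetical speaker lists for illustration
teachers = ["T", "T2"]
students = ["David", "Meredith"]

teacher_repl = ["[TEACHER]"] * len(teachers)
student_repl = [f"[STUDENT_{i}]" for i in range(len(students))]

# Pair each original name with its anonymized placeholder
mapping = dict(zip(teachers + students, teacher_repl + student_repl))
print(mapping)
# {'T': '[TEACHER]', 'T2': '[TEACHER]', 'David': '[STUDENT_0]', 'Meredith': '[STUDENT_1]'}
```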
[ ]:
# Initialize the preprocessor and annotator
processor = TextPreprocessor()
annotator = Annotator()
# This takes about 50 minutes on a Colab CPU,
# though the exact time varies with hardware and download bandwidth
for filename in tqdm.tqdm(os.listdir(DATA_DIR)):
if utils.is_valid_file_extension(filename):
df = utils.load_data(os.path.join(DATA_DIR, filename))
# Preprocess the data. Let's anonymize the names in the speaker column.
df = processor.anonymize_known_names(
df=df,
text_column=SPEAKER_COLUMN,
# We're going to directly replace the text in the speaker column with the anonymized text.
target_text_column=SPEAKER_COLUMN,
names=TEACHER_SPEAKER + STUDENT_SPEAKER,
replacement_names=TEACHER_REPLACEMENT_NAMES + STUDENT_REPLACEMENT_NAMES
)
# Now let's anonymize the names in the text column.
df = processor.anonymize_known_names(
df=df,
text_column=TEXT_COLUMN,
target_text_column=TEXT_COLUMN,
names=TEACHER_SPEAKER + STUDENT_SPEAKER,
replacement_names=TEACHER_REPLACEMENT_NAMES + STUDENT_REPLACEMENT_NAMES
)
# Now let's merge the utterances of the same speaker together and directly update the dataframe.
df = processor.merge_utterances_from_same_speaker(
df=df,
text_column=TEXT_COLUMN,
speaker_column=SPEAKER_COLUMN,
target_text_column=TEXT_COLUMN
)
# Now we're going to annotate the data.
df = annotator.get_talktime(
df=df,
text_column=TEXT_COLUMN,
output_column=TALK_TIME_COLUMN
)
df = annotator.get_student_reasoning(
df=df,
text_column=TEXT_COLUMN,
speaker_column=SPEAKER_COLUMN,
output_column=STUDENT_REASONING_COLUMN,
# We only want to annotate the student utterances, so we pass the student replacement names as the speaker value.
speaker_value=STUDENT_REPLACEMENT_NAMES
)
df = annotator.get_uptake(
df=df,
text_column=TEXT_COLUMN,
speaker_column=SPEAKER_COLUMN,
output_column=UPTAKE_COLUMN,
# We want to annotate the teacher's uptake of the student's utterances.
# So we're looking for instances where the student first speaks, then the teacher speaks.
speaker1=STUDENT_REPLACEMENT_NAMES,
speaker2=TEACHER_REPLACEMENT_NAMES
)
# And we're done! Let's now save the annotated data as a csv file.
filename = filename.split(".")[0] + ".csv"
df.to_csv(os.path.join(ANNOTATIONS_DIR, filename), index=False)
  0%|          | 0/30 [00:00<?, ?it/s]
WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
WARNING:root:Note: This model was trained on teacher's uptake of student's utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
WARNING:root:Note: It's recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 57%|ββββββ | 17/30 [02:54<02:11, 10.15s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 60%|ββββββ | 18/30 [02:58<01:41, 8.48s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 63%|βββββββ | 19/30 [03:05<01:28, 8.09s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 67%|βββββββ | 20/30 [03:21<01:43, 10.31s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 70%|βββββββ | 21/30 [03:26<01:18, 8.73s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 73%|ββββββββ | 22/30 [03:33<01:05, 8.21s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 77%|ββββββββ | 23/30 [04:31<02:42, 23.28s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 80%|ββββββββ | 24/30 [04:39<01:51, 18.63s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 83%|βββββββββ | 25/30 [04:52<01:23, 16.74s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 87%|βββββββββ | 26/30 [04:59<00:56, 14.03s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 90%|βββββββββ | 27/30 [05:05<00:34, 11.56s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 93%|ββββββββββ| 28/30 [05:15<00:22, 11.21s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 97%|ββββββββββ| 29/30 [05:20<00:09, 9.32s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker. 100%|ββββββββββ| 30/30 [05:26<00:00, 10.87s/it] </pre>
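The recurring warning recommends merging consecutive utterances from the same speaker before running the annotation models; edu-convokit exposes this as `text_preprocessing.merge_utterances_from_same_speaker` (see the Text Pre-processing Colab for the exact call). As a minimal sketch of what that step does, here is the same idea in plain pandas; the `speaker`/`text` column names are illustrative, not the library's API:

```python
import pandas as pd

def merge_same_speaker(df, speaker_col="speaker", text_col="text"):
    """Merge consecutive rows spoken by the same speaker into one utterance."""
    # A new group starts whenever the speaker changes from the previous row.
    group = (df[speaker_col] != df[speaker_col].shift()).cumsum()
    merged = (
        df.groupby(group)
          .agg({speaker_col: "first", text_col: " ".join})
          .reset_index(drop=True)
    )
    return merged

transcript = pd.DataFrame({
    "speaker": ["teacher", "student", "student", "teacher"],
    "text": ["What is 3 x 4?", "Twelve.",
             "Because 3 + 3 + 3 + 3 is 12.", "Nice reasoning!"],
})
merged = merge_same_speaker(transcript)
# The two consecutive student rows collapse into a single utterance.
```

Merging first matters here because the reasoning and uptake models score whole conversational turns, not transcript fragments.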
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 27%|βββ | 8/30 [01:35<03:12, 8.77s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 30%|βββ | 9/30 [01:42<02:49, 8.10s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 33%|ββββ | 10/30 [01:49<02:32, 7.64s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 37%|ββββ | 11/30 [01:56<02:21, 7.46s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 40%|ββββ | 12/30 [02:02<02:06, 7.01s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 43%|βββββ | 13/30 [02:12<02:14, 7.93s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 47%|βββββ | 14/30 [02:21<02:15, 8.45s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 50%|βββββ | 15/30 [02:32<02:17, 9.18s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 53%|ββββββ | 16/30 [02:41<02:08, 9.17s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 57%|ββββββ | 17/30 [02:54<02:11, 10.15s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 60%|ββββββ | 18/30 [02:58<01:41, 8.48s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 63%|βββββββ | 19/30 [03:05<01:28, 8.09s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 67%|βββββββ | 20/30 [03:21<01:43, 10.31s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 70%|βββββββ | 21/30 [03:26<01:18, 8.73s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 73%|ββββββββ | 22/30 [03:33<01:05, 8.21s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 77%|ββββββββ | 23/30 [04:31<02:42, 23.28s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 80%|ββββββββ | 24/30 [04:39<01:51, 18.63s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 83%|βββββββββ | 25/30 [04:52<01:23, 16.74s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 87%|βββββββββ | 26/30 [04:59<00:56, 14.03s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 90%|βββββββββ | 27/30 [05:05<00:34, 11.56s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 93%|ββββββββββ| 28/30 [05:15<00:22, 11.21s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
- WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
- 97%|ββββββββββ| 29/30 [05:20<00:09, 9.32s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
- WARNING:root:Note: This model was trained on teacherβs uptake of studentβs utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
WARNING:root:Note: Itβs recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker. 100%|ββββββββββ| 30/30 [05:26<00:00, 10.87s/it]
Analysis
Now that we have annotated the dataset, let's analyze it using edu-convokit's Analyzer classes. We'll be doing the following:
- We'll use QualitativeAnalyzer to look at some examples of the talktime, student reasoning and uptake annotations.
- We'll use QuantitativeAnalyzer to look at the aggregate statistics of the talktime, student reasoning and uptake annotations.
- We'll use LexicalAnalyzer to compare the student's and tutor's vocabulary.
- We'll use TemporalAnalyzer to look at the temporal trends of the talktime, student reasoning and uptake annotations.
Let's get started!
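The analysis cells below rely on names defined earlier in the notebook, during pre-processing and annotation. If you are skimming from this point, the assumed setup looks roughly like this. The import line is edu-convokit's analyzer entry point, but the column names and directory value below are illustrative assumptions; they should match whatever you used when annotating.

```python
# Sketch of the setup assumed by the cells below (values are illustrative):
# from edu_convokit.analyzers import QualitativeAnalyzer, QuantitativeAnalyzer

SPEAKER_COLUMN = "speaker"                      # who said the utterance
TEXT_COLUMN = "text"                            # the utterance itself
TALK_TIME_COLUMN = "talktime"                   # talk time (word count) per utterance
STUDENT_REASONING_COLUMN = "student_reasoning"  # 1.0 if labeled as student reasoning
UPTAKE_COLUMN = "uptake"                        # 1.0 if teacher uptake was detected
ANNOTATIONS_DIR = "talkmoves/annotations"       # wherever you saved the annotated data
```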
Qualitative Analysis
[ ]:
# We're going to look at examples from the entire dataset.
qualitative_analyzer = QualitativeAnalyzer(data_dir=ANNOTATIONS_DIR)
# Examples of talktime. Will show random examples from the dataset.
qualitative_analyzer.print_examples(
speaker_column=SPEAKER_COLUMN,
text_column=TEXT_COLUMN,
feature_column=TALK_TIME_COLUMN,
)
talktime: 54
>> [TEACHER]: Okay, that's really great. Okay, you are really wonderful. We have about five more minutes where we can do some problem solving [STUDENT_3] then put these things away. This is what I would like you to do. I would like you to take a turn to make a problem that will challenge your partner…
talktime: 54
>> [TEACHER]: I'm wondering which is bigger, one half or two thirds. Now before you model it you might think in your head, before you begin to model it what you is bigger [STUDENT_3] if so, if one is bigger, by how much. Why don't you work with your partner [STUDENT_3] see what you can do.
talktime: 54
>> [TEACHER]: Let me write this down. This, what you are saying here is so important, here. Let me see if I can write this down. You're saying that you're calling the red, you're giving red the number name, right? The length of the red, right? We'll give it the number name, what did you say?
talktime: 4
>> [STUDENT_15]: I made a problem.
talktime: 4
>> [STUDENT_15]: You don't know mine.
talktime: 4
>> [TEACHER]: What would be what?
talktime: 6
>> [TEACHER]: And ask your partner that problem,
talktime: 6
>> [TEACHER] [TEACHER]: This looks interesting. Are you experimenting?
talktime: 6
>> [TEACHER] [TEACHER]: You have to make it convincing.
[ ]:
# Examples of student reasoning. Let's look at positive examples:
qualitative_analyzer.print_examples(
speaker_column=SPEAKER_COLUMN,
text_column=TEXT_COLUMN,
feature_column=STUDENT_REASONING_COLUMN,
feature_value=1.0,
)
student_reasoning: 1.0
>> [STUDENT_29]: [Puts three light green rods on top of the blue rod] There, that would, if you look down it would equal up to a blue.
student_reasoning: 1.0
>> [STUDENT_15]: And I know that that's half of [He points to the orange rod], [STUDENT_3] I know that yellow is half of orange, which is ten.
student_reasoning: 1.0
>> [STUDENT_1]: one half by one sixth. Cause if you put six ones up to a whole
[ ]:
# We can also look at negative examples:
qualitative_analyzer.print_examples(
speaker_column=SPEAKER_COLUMN,
text_column=TEXT_COLUMN,
feature_column=STUDENT_REASONING_COLUMN,
feature_value=0.0,
)
student_reasoning: 0.0
>> [STUDENT_15]: You just don't know the problem I just made up. You don't know the problem I made up.
student_reasoning: 0.0
>> [STUDENT_15]: I have a problem already. I have a problem. I have a problem, remember it? If a light green, um, no, no.
student_reasoning: 0.0
>> [STUDENT_15]: I have one for you, too. [[TEACHER] [TEACHER] walks over] I have a problem.
[ ]:
# Examples of uptake.
qualitative_analyzer.print_examples(
speaker_column=SPEAKER_COLUMN,
text_column=TEXT_COLUMN,
feature_column=UPTAKE_COLUMN,
# I want to look at positive examples of uptake (uptake = 1.0)
feature_value=1.0,
# ... and look at the previous student utterance (show_k_previous_lines = 1).
# This is interesting because it will show us how the teacher is responding to the student's utterance.
show_k_previous_lines=1,
)
uptake: 1.0
[STUDENT_15]: If a light green was one third, what would be a whole?
>> [TEACHER]: What would be what?
uptake: 1.0
[STUDENT_15]: If a light green was one third, what would be a whole?
>> [TEACHER]: One, what would one be.
uptake: 1.0
[STUDENT_29]: [Puts three light green rods on top of the blue rod] There, that would, if you look down it would equal up to a blue.
>> [TEACHER]: Hold on, I'm a little confused. Tell me again. Six ones? You called this one? What are you calling these?
Quantitative Analysis
[ ]:
quantitative_analyzer = QuantitativeAnalyzer(data_dir=ANNOTATIONS_DIR)
# Let's create a speaker mapping to shorten the speaker names. Teacher -> T, Student{i} -> S{i}
speaker_mapping = {
    **{name: "T" for name in TEACHER_REPLACEMENT_NAMES},
    **{name: f"S{i}" for i, name in enumerate(STUDENT_REPLACEMENT_NAMES)}
}
# For figure formatting because there are a lot of students
import matplotlib.pyplot as plt
plt.figure(figsize=(len(TEACHER_REPLACEMENT_NAMES + STUDENT_REPLACEMENT_NAMES)/2, 6))
# Let's plot the talk time ratio between the speakers.
quantitative_analyzer.plot_statistics(
feature_column=TALK_TIME_COLUMN,
speaker_column=SPEAKER_COLUMN,
# Proportion of talk time for each speaker.
value_as="prop",
label_mapping=speaker_mapping
)
# We can also print the statistics:
quantitative_analyzer.print_statistics(
feature_column=TALK_TIME_COLUMN,
speaker_column=SPEAKER_COLUMN,
# Proportion of talk time for each speaker.
value_as="prop"
)
talktime
Proportion statistics
count mean std min 25% 50% 75% max
speaker
[STUDENT_0] 3.0 0.082532 0.055057 0.038502 0.051667 0.064833 0.104548 0.144262
[STUDENT_10] 1.0 0.052941 NaN 0.052941 0.052941 0.052941 0.052941 0.052941
[STUDENT_11] 2.0 0.050219 0.010940 0.042484 0.046351 0.050219 0.054087 0.057955
[STUDENT_12] 6.0 0.103688 0.088372 0.004112 0.022533 0.114360 0.180103 0.195356
[STUDENT_14] 5.0 0.006621 0.006594 0.000435 0.000936 0.004612 0.012336 0.014786
[STUDENT_15] 11.0 0.149584 0.111150 0.002649 0.083878 0.131579 0.187356 0.379571
[STUDENT_15] [STUDENT_3] [STUDENT_12] 1.0 0.000801 NaN 0.000801 0.000801 0.000801 0.000801 0.000801
[STUDENT_16] 2.0 0.013095 0.004014 0.010256 0.011676 0.013095 0.014514 0.015933
[STUDENT_17] 3.0 0.016949 0.017947 0.001677 0.007066 0.012454 0.024586 0.036717
[STUDENT_18] 1.0 0.000419 NaN 0.000419 0.000419 0.000419 0.000419 0.000419
[STUDENT_19] 13.0 0.158800 0.186038 0.004929 0.051454 0.079588 0.146341 0.584345
[STUDENT_1] 6.0 0.144681 0.167141 0.013691 0.025632 0.059223 0.273155 0.377049
[STUDENT_1] [STUDENT_3] [STUDENT_0] 1.0 0.005464 NaN 0.005464 0.005464 0.005464 0.005464 0.005464
[STUDENT_20] 1.0 0.040346 NaN 0.040346 0.040346 0.040346 0.040346 0.040346
[STUDENT_21] 2.0 0.170696 0.017246 0.158501 0.164599 0.170696 0.176794 0.182891
[STUDENT_22] 6.0 0.085113 0.077821 0.044715 0.045282 0.050777 0.072001 0.241883
[STUDENT_23] 2.0 0.087668 0.080432 0.030794 0.059231 0.087668 0.116106 0.144543
[STUDENT_24] 4.0 0.090441 0.077632 0.031746 0.048115 0.062901 0.105227 0.204214
[STUDENT_25] 2.0 0.161549 0.145949 0.058347 0.109948 0.161549 0.213149 0.264750
[STUDENT_26] 1.0 0.011345 NaN 0.011345 0.011345 0.011345 0.011345 0.011345
[STUDENT_27] 3.0 0.076598 0.056412 0.030794 0.045092 0.059390 0.099500 0.139610
[STUDENT_28] 3.0 0.092144 0.015845 0.077796 0.083641 0.089485 0.099317 0.109149
[STUDENT_29] 5.0 0.126505 0.055537 0.042045 0.104058 0.142857 0.158890 0.184676
[STUDENT_2] 3.0 0.009829 0.007439 0.002186 0.006221 0.010256 0.013651 0.017045
[STUDENT_30] 1.0 0.064877 NaN 0.064877 0.064877 0.064877 0.064877 0.064877
[STUDENT_31] 4.0 0.066166 0.090450 0.002237 0.018617 0.031213 0.078761 0.200000
[STUDENT_32] 3.0 0.089839 0.021829 0.077236 0.077236 0.077236 0.096140 0.115044
[STUDENT_33] 2.0 0.069106 0.000000 0.069106 0.069106 0.069106 0.069106 0.069106
[STUDENT_34] 2.0 0.023347 0.029145 0.002738 0.013043 0.023347 0.033652 0.043956
[STUDENT_35] 3.0 0.054487 0.073891 0.001965 0.012241 0.022517 0.080748 0.138980
[STUDENT_36] 1.0 0.011364 NaN 0.011364 0.011364 0.011364 0.011364 0.011364
[STUDENT_37] 2.0 0.055127 0.010618 0.047619 0.051373 0.055127 0.058881 0.062635
[STUDENT_38] 1.0 0.012500 NaN 0.012500 0.012500 0.012500 0.012500 0.012500
[STUDENT_39] 1.0 0.005682 NaN 0.005682 0.005682 0.005682 0.005682 0.005682
[STUDENT_40] 1.0 0.021669 NaN 0.021669 0.021669 0.021669 0.021669 0.021669
[STUDENT_41] 1.0 0.048957 NaN 0.048957 0.048957 0.048957 0.048957 0.048957
[STUDENT_42] 1.0 0.047352 NaN 0.047352 0.047352 0.047352 0.047352 0.047352
[STUDENT_43] 2.0 0.026003 0.016571 0.014286 0.020144 0.026003 0.031862 0.037721
[STUDENT_44] 1.0 0.132927 NaN 0.132927 0.132927 0.132927 0.132927 0.132927
[STUDENT_45] 1.0 0.094036 NaN 0.094036 0.094036 0.094036 0.094036 0.094036
[STUDENT_45] & [STUDENT_44] 1.0 0.000145 NaN 0.000145 0.000145 0.000145 0.000145 0.000145
[STUDENT_45] & [STUDENT_46] 1.0 0.000871 NaN 0.000871 0.000871 0.000871 0.000871 0.000871
[STUDENT_46] 1.0 0.133507 NaN 0.133507 0.133507 0.133507 0.133507 0.133507
[STUDENT_46], 1.0 0.001016 NaN 0.001016 0.001016 0.001016 0.001016 0.001016
[STUDENT_47] 1.0 0.193441 NaN 0.193441 0.193441 0.193441 0.193441 0.193441
[STUDENT_47] & [STUDENT_44] 1.0 0.000726 NaN 0.000726 0.000726 0.000726 0.000726 0.000726
[STUDENT_4] 4.0 0.092072 0.110127 0.025918 0.028582 0.043212 0.106702 0.255947
[STUDENT_50] 1.0 0.000145 NaN 0.000145 0.000145 0.000145 0.000145 0.000145
[STUDENT_53] 1.0 0.000871 NaN 0.000871 0.000871 0.000871 0.000871 0.000871
[STUDENT_55] 1.0 0.001887 NaN 0.001887 0.001887 0.001887 0.001887 0.001887
[STUDENT_56] 1.0 0.735250 NaN 0.735250 0.735250 0.735250 0.735250 0.735250
[STUDENT_59] 1.0 0.007937 NaN 0.007937 0.007937 0.007937 0.007937 0.007937
[STUDENT_5] 1.0 0.019981 NaN 0.019981 0.019981 0.019981 0.019981 0.019981
[STUDENT_60] 1.0 0.004762 NaN 0.004762 0.004762 0.004762 0.004762 0.004762
[STUDENT_61] 1.0 0.004762 NaN 0.004762 0.004762 0.004762 0.004762 0.004762
[STUDENT_62] 2.0 0.137668 0.115241 0.056180 0.096924 0.137668 0.178412 0.219156
[STUDENT_6] 8.0 0.041322 0.042725 0.002649 0.009801 0.029687 0.059497 0.130818
[STUDENT_7] 1.0 0.021569 NaN 0.021569 0.021569 0.021569 0.021569 0.021569
[STUDENT_8] 2.0 0.004774 0.000643 0.004320 0.004547 0.004774 0.005001 0.005229
[STUDENT_9] 6.0 0.047136 0.046222 0.005298 0.014495 0.040520 0.054591 0.131373
[STUDENT_9] [STUDENT_3] [STUDENT_10] 1.0 0.026797 NaN 0.026797 0.026797 0.026797 0.026797 0.026797
[TEACHER] 27.0 0.577200 0.194013 0.252459 0.448436 0.586710 0.700936 0.936508
[TEACHER] [TEACHER] 2.0 0.206389 0.017239 0.194199 0.200294 0.206389 0.212484 0.218579
[TEACHER]/R1 2.0 0.722468 0.036595 0.696591 0.709529 0.722468 0.735406 0.748344
~23 1.0 0.002612 NaN 0.002612 0.002612 0.002612 0.002612 0.002612
<Figure size 640x480 with 0 Axes>
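As a sanity check on what `value_as="prop"` reports, here is a minimal sketch of computing per-speaker talk-time proportions for a single transcript with plain pandas. This is an illustrative reimplementation, not edu-convokit's actual code:

```python
import pandas as pd

# Toy transcript: one row per utterance, with its talk time (word count).
df = pd.DataFrame({
    "speaker":  ["[TEACHER]", "[STUDENT_1]", "[TEACHER]", "[STUDENT_1]"],
    "talktime": [54, 6, 40, 20],
})

# Each speaker's share of the total talk time in this transcript.
prop = df.groupby("speaker")["talktime"].sum() / df["talktime"].sum()
# [STUDENT_1] -> (6 + 20) / 120, [TEACHER] -> (54 + 40) / 120
```

In the table above, this per-transcript proportion is computed for every transcript in ANNOTATIONS_DIR and then summarized per speaker, which is why each row reports a count, mean, std and quartiles.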
[ ]:
# For figure formatting because there are a lot of students
import matplotlib.pyplot as plt
plt.figure(figsize=(len(TEACHER_REPLACEMENT_NAMES + STUDENT_REPLACEMENT_NAMES)/2, 6))
# What about the student reasoning? How often does the student use reasoning?
quantitative_analyzer.plot_statistics(
feature_column=STUDENT_REASONING_COLUMN,
speaker_column=SPEAKER_COLUMN,
# We change this to "avg" because we're now looking at within-speaker statistics.
value_as="avg",
# We can set the y-axis limits to [0, 1] because the student reasoning column is a binary column.
yrange=(0, 1),
label_mapping=speaker_mapping
)
# We can also print the statistics:
quantitative_analyzer.print_statistics(
feature_column=STUDENT_REASONING_COLUMN,
speaker_column=SPEAKER_COLUMN,
value_as="avg"
)
/usr/local/lib/python3.10/dist-packages/edu_convokit/analyzers/quantitative_analyzer.py:51: RuntimeWarning: invalid value encountered in double_scalars
f"prop_speaker_{feature_column}": speaker_df[feature_column].sum() / feature_sum,
student_reasoning
Average statistics
count mean std min 25% 50% 75% max
speaker
[STUDENT_0] 3.0 0.714286 0.494872 0.142857 0.571429 1.000000 1.000000 1.000000
[STUDENT_10] 1.0 0.333333 NaN 0.333333 0.333333 0.333333 0.333333 0.333333
[STUDENT_11] 2.0 0.250000 0.353553 0.000000 0.125000 0.250000 0.375000 0.500000
[STUDENT_12] 5.0 0.466667 0.361325 0.000000 0.333333 0.500000 0.500000 1.000000
[STUDENT_14] 0.0 NaN NaN NaN NaN NaN NaN NaN
[STUDENT_15] 9.0 0.373016 0.363437 0.000000 0.071429 0.285714 0.500000 1.000000
[STUDENT_15] [STUDENT_3] [STUDENT_12] 0.0 NaN NaN NaN NaN NaN NaN NaN
[STUDENT_16] 2.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[STUDENT_17] 2.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[STUDENT_18] 0.0 NaN NaN NaN NaN NaN NaN NaN
[STUDENT_19] 12.0 0.250000 0.349964 0.000000 0.000000 0.000000 0.437500 1.000000
[STUDENT_1] 6.0 0.352381 0.377063 0.000000 0.053571 0.307143 0.475000 1.000000
[STUDENT_1] [STUDENT_3] [STUDENT_0] 0.0 NaN NaN NaN NaN NaN NaN NaN
[STUDENT_20] 1.0 0.000000 NaN 0.000000 0.000000 0.000000 0.000000 0.000000
[STUDENT_21] 2.0 0.750000 0.353553 0.500000 0.625000 0.750000 0.875000 1.000000
[STUDENT_22] 6.0 0.222222 0.403687 0.000000 0.000000 0.000000 0.250000 1.000000
[STUDENT_23] 2.0 0.500000 0.707107 0.000000 0.250000 0.500000 0.750000 1.000000
[STUDENT_24] 4.0 0.562500 0.515388 0.000000 0.187500 0.625000 1.000000 1.000000
[STUDENT_25] 2.0 0.700000 0.424264 0.400000 0.550000 0.700000 0.850000 1.000000
[STUDENT_26] 0.0 NaN NaN NaN NaN NaN NaN NaN
[STUDENT_27] 3.0 0.666667 0.288675 0.500000 0.500000 0.500000 0.750000 1.000000
[STUDENT_28] 3.0 0.833333 0.288675 0.500000 0.750000 1.000000 1.000000 1.000000
[STUDENT_29] 5.0 0.428182 0.445552 0.000000 0.090909 0.250000 0.800000 1.000000
[STUDENT_2] 2.0 0.500000 0.707107 0.000000 0.250000 0.500000 0.750000 1.000000
[STUDENT_30] 1.0 0.000000 NaN 0.000000 0.000000 0.000000 0.000000 0.000000
[STUDENT_31] 3.0 0.277778 0.254588 0.000000 0.166667 0.333333 0.416667 0.500000
[STUDENT_32] 3.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[STUDENT_33] 2.0 1.000000 0.000000 1.000000 1.000000 1.000000 1.000000 1.000000
[STUDENT_34] 1.0 1.000000 NaN 1.000000 1.000000 1.000000 1.000000 1.000000
[STUDENT_35] 2.0 0.392857 0.151523 0.285714 0.339286 0.392857 0.446429 0.500000
[STUDENT_36] 0.0 NaN NaN NaN NaN NaN NaN NaN
[STUDENT_37] 2.0 0.833333 0.235702 0.666667 0.750000 0.833333 0.916667 1.000000
[STUDENT_38] 0.0 NaN NaN NaN NaN NaN NaN NaN
[STUDENT_39] 0.0 NaN NaN NaN NaN NaN NaN NaN
[STUDENT_40] 1.0 0.000000 NaN 0.000000 0.000000 0.000000 0.000000 0.000000
[STUDENT_41] 1.0 0.333333 NaN 0.333333 0.333333 0.333333 0.333333 0.333333
[STUDENT_42] 1.0 1.000000 NaN 1.000000 1.000000 1.000000 1.000000 1.000000
[STUDENT_43] 2.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[STUDENT_44] 1.0 0.146341 NaN 0.146341 0.146341 0.146341 0.146341 0.146341
[STUDENT_45] 1.0 0.189189 NaN 0.189189 0.189189 0.189189 0.189189 0.189189
[STUDENT_45] & [STUDENT_44] 0.0 NaN NaN NaN NaN NaN NaN NaN
[STUDENT_45] & [STUDENT_46] 0.0 NaN NaN NaN NaN NaN NaN NaN
[STUDENT_46] 1.0 0.153846 NaN 0.153846 0.153846 0.153846 0.153846 0.153846
[STUDENT_46], 0.0 NaN NaN NaN NaN NaN NaN NaN
[STUDENT_47] 1.0 0.096154 NaN 0.096154 0.096154 0.096154 0.096154 0.096154
[STUDENT_47] & [STUDENT_44] 0.0 NaN NaN NaN NaN NaN NaN NaN
[STUDENT_4] 4.0 0.221154 0.259675 0.000000 0.000000 0.192308 0.413462 0.500000
[STUDENT_50] 0.0 NaN NaN NaN NaN NaN NaN NaN
[STUDENT_53] 0.0 NaN NaN NaN NaN NaN NaN NaN
[STUDENT_55] 0.0 NaN NaN NaN NaN NaN NaN NaN
[STUDENT_56] 1.0 0.000000 NaN 0.000000 0.000000 0.000000 0.000000 0.000000
[STUDENT_59] 0.0 NaN NaN NaN NaN NaN NaN NaN
[STUDENT_5] 1.0 0.000000 NaN 0.000000 0.000000 0.000000 0.000000 0.000000
[STUDENT_60] 0.0 NaN NaN NaN NaN NaN NaN NaN
[STUDENT_61] 0.0 NaN NaN NaN NaN NaN NaN NaN
[STUDENT_62] 2.0 0.250000 0.353553 0.000000 0.125000 0.250000 0.375000 0.500000
[STUDENT_6] 6.0 0.472222 0.452360 0.000000 0.083333 0.416667 0.875000 1.000000
[STUDENT_7] 1.0 0.000000 NaN 0.000000 0.000000 0.000000 0.000000 0.000000
[STUDENT_8] 0.0 NaN NaN NaN NaN NaN NaN NaN
[STUDENT_9] 6.0 0.305556 0.400231 0.000000 0.000000 0.166667 0.458333 1.000000
[STUDENT_9] [STUDENT_3] [STUDENT_10] 0.0 NaN NaN NaN NaN NaN NaN NaN
[TEACHER] 0.0 NaN NaN NaN NaN NaN NaN NaN
[TEACHER] [TEACHER] 0.0 NaN NaN NaN NaN NaN NaN NaN
[TEACHER]/R1 0.0 NaN NaN NaN NaN NaN NaN NaN
~23 0.0 NaN NaN NaN NaN NaN NaN NaN
/usr/local/lib/python3.10/dist-packages/edu_convokit/analyzers/quantitative_analyzer.py:51: RuntimeWarning: invalid value encountered in double_scalars
f"prop_speaker_{feature_column}": speaker_df[feature_column].sum() / feature_sum,
<Figure size 640x480 with 0 Axes>
Note that the teacher has no student_reasoning
score because we did not annotate the teacher's utterances for student reasoning. We can easily remove the teacher from the plot by dropping NaN values:
[ ]:
# For figure formatting because there are a lot of students
import matplotlib.pyplot as plt
plt.figure(figsize=(len(TEACHER_REPLACEMENT_NAMES + STUDENT_REPLACEMENT_NAMES)/2, 6))
quantitative_analyzer.plot_statistics(
feature_column=STUDENT_REASONING_COLUMN,
speaker_column=SPEAKER_COLUMN,
value_as="avg",
yrange=(0, 1),
dropna=True,
label_mapping=speaker_mapping
)
/usr/local/lib/python3.10/dist-packages/edu_convokit/analyzers/quantitative_analyzer.py:51: RuntimeWarning: invalid value encountered in double_scalars
f"prop_speaker_{feature_column}": speaker_df[feature_column].sum() / feature_sum,
/usr/local/lib/python3.10/dist-packages/edu_convokit/analyzers/quantitative_analyzer.py:51: RuntimeWarning: invalid value encountered in double_scalars
f"prop_speaker_{feature_column}": speaker_df[feature_column].sum() / feature_sum,
<Figure size 640x480 with 0 Axes>
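If you prefer the raw numbers over a plot, the same kind of per-speaker average can be computed directly with pandas. Below is a minimal sketch on a toy transcript; the column names and values are illustrative, not the notebook's actual constants or data:

```python
import numpy as np
import pandas as pd

# Toy transcript: one row per utterance, with a 0/1 student-reasoning label.
# The teacher's utterances are unannotated (NaN), as in the tutorial.
df = pd.DataFrame({
    "speaker": ["[TEACHER]", "[STUDENT_1]", "[STUDENT_1]", "[STUDENT_2]"],
    "student_reasoning": [np.nan, 1, 0, 1],
})

# Average student reasoning per speaker; speakers with no labels come out
# as NaN and can be dropped, mirroring dropna=True in plot_statistics.
avg = df.groupby("speaker")["student_reasoning"].mean().dropna()
print(avg)
```

This mirrors why `dropna=True` removes the teacher from the plot: an all-NaN group has a NaN mean.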
[ ]:
# Finally, let's look at the teacher's uptake of the students' utterances.
quantitative_analyzer.plot_statistics(
feature_column=UPTAKE_COLUMN,
speaker_column=SPEAKER_COLUMN,
value_as="avg",
yrange=(0, 1),
dropna=True
)
<Figure size 640x480 with 0 Axes>
Lexical Analysis
[ ]:
lexical_analyzer = LexicalAnalyzer(data_dir=ANNOTATIONS_DIR)
# Cast the text column to str, in case some cells were parsed as non-string types.
lexical_analyzer._df[TEXT_COLUMN] = lexical_analyzer._df[TEXT_COLUMN].astype(str)
# Let's look at the most common words per speaker in the dataset.
lexical_analyzer.print_word_frequency(
text_column=TEXT_COLUMN,
speaker_column=SPEAKER_COLUMN,
# We want to look at the top 10 words per speaker.
topk=10,
# Let's also format the text (e.g., remove punctuation, lowercase the text, etc.)
run_text_formatting=True
)
Top Words By Speaker
[TEACHER]
student: 695
one: 343
okay: 229
think: 170
would: 134
red: 101
right: 91
want: 87
two: 83
see: 82
[STUDENT_15]
one: 75
student: 58
two: 34
green: 23
yeah: 21
would: 20
half: 20
rod: 20
three: 20
call: 20
[TEACHER] [TEACHER]
one: 13
two: 10
okay: 9
student: 9
let: 8
interesting: 6
see: 5
twelfths: 4
hear: 3
clever: 3
[STUDENT_29]
one: 28
would: 16
student: 12
green: 11
rod: 10
three: 9
considered: 9
thirds: 9
red: 7
fifth: 6
[STUDENT_0]
one: 12
would: 8
two: 6
third: 4
student: 4
put: 4
green: 4
purples: 3
three: 3
yeah: 3
[STUDENT_1]
student: 43
two: 33
one: 30
put: 17
rod: 14
rods: 13
take: 11
bigger: 11
orange: 10
yeah: 10
[STUDENT_2]
one: 4
whole: 2
third: 2
half: 1
green: 1
would: 1
blue: 1
student: 1
put: 1
three: 1
[STUDENT_1] [STUDENT_3] [STUDENT_0]
well: 1
bigger: 1
one: 1
[STUDENT_14]
yeah: 8
one: 6
sixth: 5
two: 4
yes: 3
hmm: 2
thirds: 2
twelfths: 2
red: 1
well: 1
[STUDENT_6]
one: 20
student: 16
bigger: 10
would: 10
two: 8
like: 7
red: 7
yeah: 6
sixth: 6
equal: 5
[STUDENT_16]
bigger: 2
boat: 2
children: 2
well: 1
thinking: 1
sizes: 1
made: 1
sea: 1
monster: 1
biggest: 1
[STUDENT_17]
rod: 5
orange: 4
red: 2
well: 1
fit: 1
would: 1
take: 1
white: 1
student: 1
stick: 1
[STUDENT_18]
yes: 1
[STUDENT_7]
student: 2
one: 2
yeah: 1
working: 1
write: 1
example: 1
see: 1
bigger: 1
half: 1
third: 1
[STUDENT_8]
one: 3
half: 2
third: 1
red: 1
[STUDENT_9]
one: 30
student: 27
third: 9
half: 9
would: 8
rod: 7
green: 7
purple: 6
red: 6
think: 6
[STUDENT_10]
saw: 4
student: 4
half: 3
third: 3
two: 3
agree: 2
bigger: 2
comes: 1
overhead: 1
places: 1
[STUDENT_11]
student: 5
well: 3
red: 3
like: 3
one: 3
third: 2
put: 2
tallest: 2
black: 2
green: 2
[STUDENT_12]
one: 34
student: 27
well: 22
half: 22
like: 22
two: 16
sixth: 15
split: 12
three: 11
third: 10
[STUDENT_9] [STUDENT_3] [STUDENT_10]
one: 3
three: 2
girls: 1
agreeing: 1
demonstrates: 1
said: 1
yeah: 1
put: 1
reds: 1
green: 1
[STUDENT_34]
student: 5
brown: 4
took: 4
purple: 4
tried: 2
another: 2
looked: 2
half: 2
orange: 1
red: 1
[STUDENT_19]
student: 39
cups: 27
yeah: 14
one: 12
make: 12
like: 12
flix: 11
pay: 11
sense: 11
teacher: 10
[STUDENT_22]
student: 23
cups: 10
cream: 8
teacher: 7
chocolate: 6
chocolates: 5
equals: 4
kept: 4
think: 4
going: 3
[STUDENT_36]
really: 1
get: 1
one: 1
[STUDENT_24]
student: 9
cindy: 6
cups: 5
valerie: 5
yeah: 3
row: 3
given: 3
would: 3
said: 3
columns: 2
[STUDENT_27]
student: 8
cups: 7
would: 5
think: 3
thought: 3
teacher: 3
chocolate: 3
wrote: 2
ingredients: 2
like: 2
[STUDENT_62]
student: 17
cup: 9
cream: 9
would: 7
cups: 6
teacher: 5
chocolate: 4
thought: 3
whole: 3
recipe: 3
[STUDENT_44]
blue: 26
student: 23
red: 22
one: 18
teacher: 15
yeah: 15
different: 9
know: 8
see: 8
make: 7
[STUDENT_45]
teacher: 28
would: 20
student: 15
towers: 9
see: 8
high: 8
like: 7
add: 6
make: 5
keep: 5
[STUDENT_46]
student: 27
teacher: 19
blue: 19
put: 19
red: 18
one: 16
could: 15
see: 15
top: 12
yeah: 9
[STUDENT_47]
blue: 48
student: 47
red: 27
one: 22
okay: 20
like: 17
right: 15
could: 14
yeah: 13
teacher: 13
[STUDENT_47] & [STUDENT_44]
towers: 1
yeah: 1
[STUDENT_45] & [STUDENT_46]
teacher: 1
colors: 1
[STUDENT_50]
teacher: 1
[STUDENT_46],
draw: 1
think: 1
~23
student: 3
easier: 1
maybe: 1
like: 1
shelly: 1
pattern: 1
put: 1
different: 1
category: 1
[STUDENT_53]
right: 1
yeah: 1
okay: 1
[STUDENT_45] & [STUDENT_44]
teacher: 1
[STUDENT_55]
sure: 1
find: 1
show: 1
little: 1
bit: 1
[STUDENT_59]
empty: 1
spaces: 1
boxes: 1
[STUDENT_60]
student: 1
[STUDENT_61]
simplest: 1
form: 1
[STUDENT_43]
cups: 5
student: 4
columns: 2
basically: 2
teacher: 2
chocolate: 2
two: 1
empty: 1
three: 1
together: 1
[STUDENT_31]
like: 8
people: 7
got: 6
low: 4
right: 3
student: 3
scores: 3
everything: 3
histogram: 3
think: 2
[STUDENT_32]
cups: 6
chocolate: 5
candy: 4
student: 3
cream: 3
teacher: 3
anthony: 2
makes: 2
first: 2
mixes: 2
[STUDENT_23]
student: 6
cups: 4
chocolate: 3
knew: 2
teacher: 2
yeah: 1
cup: 1
cream: 1
equals: 1
total: 1
[STUDENT_21]
student: 7
cups: 6
cream: 4
teacher: 4
chocolate: 4
every: 3
cup: 3
would: 2
think: 2
total: 2
[STUDENT_4]
two: 19
one: 18
student: 14
bigger: 12
half: 10
sixth: 10
would: 9
thirds: 8
fourths: 7
three: 7
[STUDENT_37]
one: 7
student: 6
four: 2
line: 2
add: 2
would: 2
third: 2
put: 2
side: 2
ten: 2
[STUDENT_33]
cups: 6
wrong: 2
thought: 2
chocolate: 2
student: 2
cream: 2
means: 2
question: 2
said: 2
[STUDENT_56]
student: 18
rent: 9
movie: 8
would: 7
gonna: 7
cost: 6
flix: 6
pay: 6
plus: 6
online: 5
[STUDENT_25]
one: 7
student: 6
look: 3
would: 3
well: 2
prices: 2
like: 2
first: 2
flix: 2
gonna: 2
[STUDENT_26]
excellent: 1
first: 1
one: 1
[STUDENT_28]
student: 12
would: 9
like: 7
chocolate: 6
cups: 6
teacher: 5
get: 5
candies: 4
every: 4
one: 3
[STUDENT_5]
one: 2
two: 2
three: 2
eighteenth: 1
twelfths: 1
four: 1
five: 1
six: 1
seven: 1
eight: 1
[STUDENT_35]
one: 14
green: 10
student: 8
light: 8
bigger: 7
half: 6
yeah: 5
six: 5
thought: 4
third: 4
[STUDENT_40]
hear: 2
think: 2
diagram: 1
well: 1
know: 1
say: 1
[STUDENT_41]
student: 4
thought: 2
cups: 2
cream: 2
multiplied: 2
another: 2
one: 2
answer: 2
misunderstood: 1
two: 1
[STUDENT_42]
cups: 4
answer: 2
total: 2
think: 1
person: 1
got: 1
mixed: 1
thinking: 1
times: 1
teacher: 1
[STUDENT_30]
candies: 2
cindy: 2
reading: 1
valerie: 1
shares: 1
box: 1
gave: 1
student: 1
candy: 1
every: 1
[TEACHER]/R1
student: 66
one: 42
number: 34
two: 32
name: 30
think: 23
names: 19
candy: 17
bar: 16
agree: 15
[STUDENT_38]
one: 2
third: 2
tentatively: 1
yes: 1
mmm: 1
hmm: 1
[STUDENT_39]
yeah: 1
dark: 1
blue: 1
[STUDENT_20]
problem: 2
think: 1
person: 1
answered: 1
read: 1
whole: 1
[STUDENT_15] [STUDENT_3] [STUDENT_12]
yeah: 1
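As a sanity check, this style of top-k word count can be reproduced with a few lines of standard Python. A rough sketch on a toy transcript; the stopword list and cleaning below are deliberately simplified relative to what `run_text_formatting=True` does in edu-convokit:

```python
import re
from collections import Counter

# Toy utterances per speaker (illustrative, not the TalkMoves data).
utterances = {
    "[TEACHER]": ["Okay, what do you think?", "I think one student is right."],
    "[STUDENT_1]": ["One and two rods.", "Yeah, one rod!"],
}

# A tiny stand-in stopword list (a real pipeline would use a fuller one).
STOPWORDS = {"what", "do", "you", "is", "i", "and", "a", "the"}

def top_words(texts, k=3):
    """Return the k most common content words across a list of utterances."""
    words = []
    for text in texts:
        # Lowercase, keep alphabetic tokens, drop stopwords.
        tokens = re.findall(r"[a-z]+", text.lower())
        words.extend(t for t in tokens if t not in STOPWORDS)
    return Counter(words).most_common(k)

for speaker, texts in utterances.items():
    print(speaker, top_words(texts))
```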
It's a bit hard to see how the students' and the teacher's vocabularies compare from raw counts alone. Let's run a log-odds analysis to see which words are more characteristic of the students versus the teacher.
[ ]:
# This returns the merged dataframe of the annotated files in DATA_DIR.
df = lexical_analyzer.get_df()
# We want to create two groups of df: one for the student and one for the tutor.
student_df = df[df[SPEAKER_COLUMN].isin(STUDENT_REPLACEMENT_NAMES)]
teacher_df = df[df[SPEAKER_COLUMN].isin(TEACHER_REPLACEMENT_NAMES)]
# Now we can run the log-odds analysis:
lexical_analyzer.plot_log_odds(
df1=student_df,
df2=teacher_df,
text_column1=TEXT_COLUMN,
text_column2=TEXT_COLUMN,
# Let's name the df groups to show on the plot
group1_name="Student",
group2_name="Teacher",
# Let's also run the text formatting
run_text_formatting=True,
)
<Figure size 640x480 with 0 Axes>
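Conceptually, `plot_log_odds` measures how strongly each word is associated with one group versus the other. Here is a generic smoothed log-odds computation from scratch to illustrate the idea; this is a simplified estimator and not necessarily the exact weighting edu-convokit implements, and the counts are made up:

```python
import math
from collections import Counter

def log_odds(counts1, counts2, alpha=0.5):
    """Smoothed log-odds ratio of each word between two count dicts.
    Positive scores lean toward group 1; negative scores toward group 2."""
    n1, n2 = sum(counts1.values()), sum(counts2.values())
    vocab = set(counts1) | set(counts2)
    v = len(vocab)
    scores = {}
    for w in vocab:
        # Add-alpha smoothing so unseen words don't produce log(0).
        p1 = (counts1.get(w, 0) + alpha) / (n1 + alpha * v)
        p2 = (counts2.get(w, 0) + alpha) / (n2 + alpha * v)
        scores[w] = math.log(p1 / (1 - p1)) - math.log(p2 / (1 - p2))
    return scores

# Toy word counts loosely inspired by the frequency lists above.
student = Counter({"one": 30, "yeah": 10, "rod": 8})
teacher = Counter({"one": 30, "okay": 20, "think": 15})
scores = log_odds(student, teacher)
# Expect "yeah" to lean toward the students and "okay" toward the teacher.
```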
[ ]:
# We might also be interested in other n-grams. Let's look at the top 10 bigrams per speaker.
lexical_analyzer.plot_log_odds(
df1=student_df,
df2=teacher_df,
text_column1=TEXT_COLUMN,
text_column2=TEXT_COLUMN,
group1_name="Student",
group2_name="Teacher",
run_text_formatting=True,
# n-grams:
run_ngrams=True,
n=2,
topk=10,
logodds_factor=0.5
)
<Figure size 640x480 with 0 Axes>
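The bigrams themselves are easy to produce with standard Python. A small sketch of the kind of n-gram extraction that `run_ngrams=True` enables (the tokenization here is just whitespace splitting):

```python
def ngrams(tokens, n=2):
    """Return the list of n-grams (as space-joined strings) in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "one third is bigger than one sixth".split()
print(ngrams(tokens))
# ['one third', 'third is', 'is bigger', 'bigger than', 'than one', 'one sixth']
```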
Temporal Analysis
Let's look at the temporal trends of the talk time, student reasoning, and uptake annotations!
[ ]:
# Limit to 3 transcripts so the per-speaker plots stay readable
temporal_analyzer = TemporalAnalyzer(data_dir=ANNOTATIONS_DIR, max_transcripts=3)
# First let's look at the talk time ratio between the speakers over time.
temporal_analyzer.plot_temporal_statistics(
feature_column=TALK_TIME_COLUMN,
speaker_column=SPEAKER_COLUMN,
value_as="prop",
# Let's create 10 bins for the x-axis.
num_bins=10,
label_mapping=speaker_mapping
)
<Figure size 640x480 with 0 Axes>
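Temporal binning like this can be sketched directly with pandas: cut each transcript's utterances into equal-width position bins, then aggregate a feature within each bin. The data and column names below are illustrative (a word count stands in for talk time), not the actual TalkMoves columns:

```python
import pandas as pd

# Toy transcript: 20 utterances alternating between the teacher and a student.
df = pd.DataFrame({
    "speaker": ["[TEACHER]", "[STUDENT_1]"] * 10,
    "n_words": [10, 2] * 10,
})

# Assign each utterance to one of num_bins equal-width position bins,
# mirroring the num_bins argument of plot_temporal_statistics.
num_bins = 10
df["bin"] = pd.cut(df.index, bins=num_bins, labels=False)

# Proportion of words each speaker contributes within each bin
# (the value_as="prop" view).
df["prop"] = df["n_words"] / df.groupby("bin")["n_words"].transform("sum")
prop = df.groupby(["bin", "speaker"])["prop"].sum()
print(prop.unstack())
```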
[ ]:
# Now student reasoning over time.
temporal_analyzer.plot_temporal_statistics(
feature_column=STUDENT_REASONING_COLUMN,
speaker_column=SPEAKER_COLUMN,
value_as="avg",
# Let's create 10 bins for the x-axis.
num_bins=10,
label_mapping=speaker_mapping
)
/usr/local/lib/python3.10/dist-packages/edu_convokit/analyzers/temporal_analyzer.py:57: RuntimeWarning: invalid value encountered in double_scalars
f"prop_speaker_{feature_column}": speaker_df[feature_column].sum() / feature_sum,
<Figure size 640x480 with 0 Axes>
[ ]:
# Finally, let's look at the tutor's uptake of the student's utterances over time.
temporal_analyzer.plot_temporal_statistics(
feature_column=UPTAKE_COLUMN,
speaker_column=SPEAKER_COLUMN,
value_as="avg",
# Let's create 10 bins for the x-axis.
num_bins=10,
label_mapping=speaker_mapping
)
/usr/local/lib/python3.10/dist-packages/edu_convokit/analyzers/temporal_analyzer.py:57: RuntimeWarning: invalid value encountered in double_scalars
f"prop_speaker_{feature_column}": speaker_df[feature_column].sum() / feature_sum,
<Figure size 640x480 with 0 Axes>
Conclusions and Next Steps
Great! In this tutorial, we learned how to use edu-convokit
to preprocess, annotate, and analyze the TalkMoves dataset. We saw how the simple primitives built into edu-convokit
can be used to analyze the dataset and gain insights into the data from various perspectives (qualitative, quantitative, lexical, and temporal).
Other resources you can check out include: - Tutorial on edu-convokit for the NCTE dataset - Tutorial on edu-convokit for the Amber dataset - edu-convokit documentation (https://edu-convokit.readthedocs.io/en/latest/index.html) - edu-convokit GitHub repository (https://github.com/rosewang2008/edu-convokit/tree/main)
If you have any questions, please feel free to reach out to us on edu-convokit's GitHub (https://github.com/rosewang2008/edu-convokit).
Happy exploring your data with edu-convokit!