Tutorial on edu-convokit for the TalkMoves dataset

Welcome to the tutorial on edu-convokit for the TalkMoves dataset. This tutorial will walk you through the process of using edu-convokit to pre-process, annotate and analyze the TalkMoves dataset.

If you are looking for a tutorial on the individual components of edu-convokit, please refer to the following tutorials to get started:

- Text Pre-processing Colab
- Annotation Colab
- Analysis Colab

This tutorial will use all of the components!

Installation

Let’s start by installing edu-convokit and importing the necessary modules.

[ ]:
!pip install git+https://github.com/rosewang2008/edu-convokit.git

Collecting git+https://github.com/rosewang2008/edu-convokit.git
  Cloning https://github.com/rosewang2008/edu-convokit.git to /tmp/pip-req-build-dgphjpe_
  Running command git clone --filter=blob:none --quiet https://github.com/rosewang2008/edu-convokit.git /tmp/pip-req-build-dgphjpe_
  Resolved https://github.com/rosewang2008/edu-convokit.git to commit 1e094c8836a3e3112cc1f996f5f12aeff013777c
  Preparing metadata (setup.py) ... done
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (4.66.1)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (1.23.5)
Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (1.11.4)
Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (3.8.1)
Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (2.1.0+cu121)
Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (4.35.2)
Collecting clean-text (from edu-convokit==0.0.1)
  Downloading clean_text-0.6.0-py3-none-any.whl (11 kB)
Requirement already satisfied: openpyxl in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (3.1.2)
Requirement already satisfied: spacy in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (3.6.1)
Requirement already satisfied: gensim in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (4.3.2)
Collecting num2words==0.5.10 (from edu-convokit==0.0.1)
  Downloading num2words-0.5.10-py3-none-any.whl (101 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 101.6/101.6 kB 1.5 MB/s eta 0:00:00
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (1.2.2)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (3.7.1)
Requirement already satisfied: seaborn in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (0.12.2)
Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (1.5.3)
Collecting docopt>=0.6.2 (from num2words==0.5.10->edu-convokit==0.0.1)
  Downloading docopt-0.6.2.tar.gz (25 kB)
  Preparing metadata (setup.py) ... done
Collecting emoji<2.0.0,>=1.0.0 (from clean-text->edu-convokit==0.0.1)
  Downloading emoji-1.7.0.tar.gz (175 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 175.4/175.4 kB 5.1 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting ftfy<7.0,>=6.0 (from clean-text->edu-convokit==0.0.1)
  Downloading ftfy-6.1.3-py3-none-any.whl (53 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.4/53.4 kB 4.1 MB/s eta 0:00:00
Requirement already satisfied: smart-open>=1.8.1 in /usr/local/lib/python3.10/dist-packages (from gensim->edu-convokit==0.0.1) (6.4.0)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (1.2.0)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (4.46.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (1.4.5)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (23.2)
Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (9.4.0)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (3.1.1)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (2.8.2)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk->edu-convokit==0.0.1) (8.1.7)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk->edu-convokit==0.0.1) (1.3.2)
Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages (from nltk->edu-convokit==0.0.1) (2023.6.3)
Requirement already satisfied: et-xmlfile in /usr/local/lib/python3.10/dist-packages (from openpyxl->edu-convokit==0.0.1) (1.1.0)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->edu-convokit==0.0.1) (2023.3.post1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn->edu-convokit==0.0.1) (3.2.0)
Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.11 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (3.0.12)
Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (1.0.5)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (1.0.10)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (2.0.8)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (3.0.9)
Requirement already satisfied: thinc<8.2.0,>=8.1.8 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (8.1.12)
Requirement already satisfied: wasabi<1.2.0,>=0.9.1 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (1.1.2)
Requirement already satisfied: srsly<3.0.0,>=2.4.3 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (2.4.8)
Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (2.0.10)
Requirement already satisfied: typer<0.10.0,>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (0.9.0)
Requirement already satisfied: pathy>=0.10.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (0.10.3)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (2.31.0)
Requirement already satisfied: pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (1.10.13)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (3.1.2)
Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (67.7.2)
Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (3.3.0)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (3.13.1)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (4.5.0)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (1.12)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (3.2.1)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (2023.6.0)
Requirement already satisfied: triton==2.1.0 in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (2.1.0)
Requirement already satisfied: huggingface-hub<1.0,>=0.16.4 in /usr/local/lib/python3.10/dist-packages (from transformers->edu-convokit==0.0.1) (0.19.4)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers->edu-convokit==0.0.1) (6.0.1)
Requirement already satisfied: tokenizers<0.19,>=0.14 in /usr/local/lib/python3.10/dist-packages (from transformers->edu-convokit==0.0.1) (0.15.0)
Requirement already satisfied: safetensors>=0.3.1 in /usr/local/lib/python3.10/dist-packages (from transformers->edu-convokit==0.0.1) (0.4.1)
Requirement already satisfied: wcwidth<0.3.0,>=0.2.12 in /usr/local/lib/python3.10/dist-packages (from ftfy<7.0,>=6.0->clean-text->edu-convokit==0.0.1) (0.2.12)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.7->matplotlib->edu-convokit==0.0.1) (1.16.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy->edu-convokit==0.0.1) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy->edu-convokit==0.0.1) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy->edu-convokit==0.0.1) (2.0.7)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy->edu-convokit==0.0.1) (2023.11.17)
Requirement already satisfied: blis<0.8.0,>=0.7.8 in /usr/local/lib/python3.10/dist-packages (from thinc<8.2.0,>=8.1.8->spacy->edu-convokit==0.0.1) (0.7.11)
Requirement already satisfied: confection<1.0.0,>=0.0.1 in /usr/local/lib/python3.10/dist-packages (from thinc<8.2.0,>=8.1.8->spacy->edu-convokit==0.0.1) (0.1.4)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->spacy->edu-convokit==0.0.1) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->edu-convokit==0.0.1) (1.3.0)
Building wheels for collected packages: edu-convokit, docopt, emoji
  Building wheel for edu-convokit (setup.py) ... done
  Created wheel for edu-convokit: filename=edu_convokit-0.0.1-py3-none-any.whl size=25946 sha256=bacc5ae8cec78f73dd6432b9a641058237be062d59c7dcfcac080e9a19077bf3
  Stored in directory: /tmp/pip-ephem-wheel-cache-a92ctwua/wheels/29/43/ec/d2472df0eb2af8f1e7d67d0710a4b3eb93fe983b15f8d7b841
  Building wheel for docopt (setup.py) ... done
  Created wheel for docopt: filename=docopt-0.6.2-py2.py3-none-any.whl size=13706 sha256=19f3926503485ba42f4fb35754933106263ea928f13b10e358a34f5f263f839a
  Stored in directory: /root/.cache/pip/wheels/fc/ab/d4/5da2067ac95b36618c629a5f93f809425700506f72c9732fac
  Building wheel for emoji (setup.py) ... done
  Created wheel for emoji: filename=emoji-1.7.0-py3-none-any.whl size=171033 sha256=0024d11da3567b1c7f328fd06e05831297bd61be31635baec2d057a050286c56
  Stored in directory: /root/.cache/pip/wheels/31/8a/8c/315c9e5d7773f74b33d5ed33f075b49c6eaeb7cedbb86e2cf8
Successfully built edu-convokit docopt emoji
Installing collected packages: emoji, docopt, num2words, ftfy, clean-text, edu-convokit
Successfully installed clean-text-0.6.0 docopt-0.6.2 edu-convokit-0.0.1 emoji-1.7.0 ftfy-6.1.3 num2words-0.5.10
[ ]:
from edu_convokit.preprocessors import TextPreprocessor
from edu_convokit.annotation import Annotator
from edu_convokit.analyzers import (
    QualitativeAnalyzer,
    QuantitativeAnalyzer,
    LexicalAnalyzer,
    TemporalAnalyzer
)
# For helping us load data
from edu_convokit import utils

import os
import tqdm
WARNING:root:Since the GPL-licensed package `unidecode` is not installed, using Python's `unicodedata` package which yields worse results.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.

📑 Data

Let’s download the dataset under raw_data/. Note that we’re only downloading a subsample of the dataset for this tutorial; this cuts down the annotation time. If you would like to annotate the entire dataset, feel free to upload the full dataset to this Colab!

[ ]:
# We will put the data here:
DATA_DIR = "raw_data"
!mkdir -p $DATA_DIR

# We will put the annotated data here:
ANNOTATIONS_DIR = "annotations"
!mkdir -p $ANNOTATIONS_DIR

# Download the data
!wget "https://raw.githubusercontent.com/rosewang2008/edu-convokit/master/data/talkmoves.zip"

# Unzip the data
!unzip -n -q talkmoves.zip -d $DATA_DIR

# Data directory is then raw_data/talkmoves
DATA_DIR = "raw_data/talkmoves"
--2023-12-30 11:46:56--  https://raw.githubusercontent.com/rosewang2008/edu-convokit/master/data/talkmoves.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 346774 (339K) [application/zip]
Saving to: ‘talkmoves.zip’

talkmoves.zip       100%[===================>] 338.65K  --.-KB/s    in 0.02s

2023-12-30 11:46:56 (17.2 MB/s) - ‘talkmoves.zip’ saved [346774/346774]
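
If you would rather annotate the full dataset, one option is to upload your own copy to Colab and point DATA_DIR at it. A minimal sketch (the zip and folder names below are placeholders for whatever you upload):

[ ]:
# Optional: run on your own copy of the full dataset instead of the subsample.
# The file/folder names below are placeholders for whatever you upload.
from google.colab import files
files.upload()                                 # e.g. upload full_talkmoves.zip
!unzip -n -q full_talkmoves.zip -d raw_data    # shell commands work in Colab cells
DATA_DIR = "raw_data/full_talkmoves"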

[ ]:
# We'll set the important variables specific to this dataset. If you open one of the files, you'll see that the
# speaker and text columns are defined as:
TEXT_COLUMN = "Sentence"
SPEAKER_COLUMN = "Speaker"

# We will also define the annotation columns.
# For the purposes of this tutorial, we will only be using talktime, student_reasoning, and uptake.
TALK_TIME_COLUMN = "talktime"
STUDENT_REASONING_COLUMN = "student_reasoning"
UPTAKE_COLUMN = "uptake"
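
As a quick sanity check, we can peek at one transcript and confirm that these column names exist. A minimal sketch using the helpers imported above:

[ ]:
# Sketch: peek at the first valid transcript to confirm the column names above.
sample_files = [f for f in os.listdir(DATA_DIR) if utils.is_valid_file_extension(f)]
sample_df = utils.load_data(os.path.join(DATA_DIR, sample_files[0]))
print(sample_df.columns.tolist())
print(sample_df[[SPEAKER_COLUMN, TEXT_COLUMN]].head())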

One thing that will be important is knowing how the teacher/tutor and student are represented in the dataset. Let’s load some examples and see how they are represented.

[ ]:
files = os.listdir(DATA_DIR)
files = [os.path.join(DATA_DIR, f) for f in files if utils.is_valid_file_extension(f)]

df = utils.merge_dataframes_in_list(files)
[ ]:
# Randomly show 10 rows
df.sample(10)
|     | Unnamed: 0 | TimeStamp | Turn  | Speaker        | Sentence                                           | Teacher Tag                   | Student Tag                                   |
|-----|------------|-----------|-------|----------------|----------------------------------------------------|-------------------------------|-----------------------------------------------|
| 5   | NaN        | NaN       | 1.0   | T/R1           | Do you remember it looks like this.                | 1 - None                      | NaN                                           |
| 94  | 94.0       | NaN       | 49.0  | T              | How many of you disagree?                          | 3 - Getting Students to Relate | NaN                                          |
| 53  | 53.0       | NaN       | NaN   | Erik and Brian | Yeah                                               | NaN                           | 2 - Relating to Another Student               |
| 135 | 135.0      | NaN       | 86.0  | Mark           | If, if the blue was one whole, what would the ...  | NaN                           | 3 - Asking for More Information               |
| 193 | 193.0      | NaN       | 85.0  | T              | Or the people who aren't sure want to tell us...   | 2 - Keeping Everyone Together | NaN                                           |
| 93  | 93.0       | NaN       | 17.0  | T              | Joey?                                              | 2 - Keeping Everyone Together | NaN                                           |
| 34  | 34.0       | NaN       | 11.0  | T              | I want to call the white rod one half.             | 1 - None                      | NaN                                           |
| 42  | 42.0       | NaN       | 31.0  | Alan           | [Puts three light green rods on top of the blu...  | NaN                           | 5 - Providing Evidence / Explaining Reasoning |
| 46  | NaN        | NaN       | 31.0  | T/R1           | Do the number names change?                        | 2 - Keeping Everyone Together | NaN                                           |
| 839 | 839.0      | NaN       | 481.0 | T              | Okay, but why, how could she be sure?              | 3 - Getting Students to Relate | NaN                                          |

The students and teachers are represented inconsistently. Let’s look at all of the speaker names in the dataset:

[ ]:
speaker_names = df[SPEAKER_COLUMN].unique()

print("Speaker names: ", speaker_names)

Speaker names:  ['T' 'David' 'Meredith' 'Beth' 'Meredith and David' 'T 2' 'Danielle' 'T2'
 'Gregory' 'Michael' 'Andrew' 'Laura' 'Jessica' 'Audra' 'Kelly' 'Brian'
 'Jessica and Audra' 'SS' 'Erik' 'Mark' 'Graham' 'Others' nan 'S' 'BRYAN'
 'DANIEL' 'ANDREW' 'CYNTHIA' 'SAURABH' 'STUDENT 1' 'MS. Liu' 'JAKE'
 'ASHANK' 'Alan' 'ALYSSA' 'SN' 'KEVIN' 'SI' 'Amy' 'Jackie' 'PARTNER'
 'TIMOTHY' 'Jacquelyn' 'T/R1' 'Students' 'Student' 'LINDA FISHER'
 'DEBORAH' 'Jason' 'CHARLOTTE' 'Jeff' 'Michelle' 'Milin' 'Stephanie'
 'Stephanie & Jeff' 'Michelle & Milin' 'Blonde' 'Milin,' '~23' 'All'
 'Michelle & Jeff' 'R2' 'CECILIO DIMAS' 'STUDENT' 'Erik and Brian'
 'SAMUEL' 'OSI' 'CLAIRE']

It seems like the teacher’s speaker names start with T, while all of the other names belong to students. Let’s split the names into two groups: teacher and student.

[ ]:
# Let's remove nan speakers
speaker_names = [_ for _ in speaker_names if str(_) != "nan"]

# And let's make sure the names are interpreted as strings
speaker_names = [str(_) for _ in speaker_names]

# Collect the teacher names; they all start with the letter T
TEACHER_START_LETTER = "T"
TEACHER_SPEAKER = [_ for _ in speaker_names if _.startswith(TEACHER_START_LETTER)]
STUDENT_SPEAKER = [_ for _ in speaker_names if _ not in TEACHER_SPEAKER]

print("Teacher speaker: ", TEACHER_SPEAKER)
print("Student speaker: ", STUDENT_SPEAKER)
Teacher speaker:  ['T', 'T 2', 'T2', 'TIMOTHY', 'T/R1']
Student speaker:  ['David', 'Meredith', 'Beth', 'Meredith and David', 'Danielle', 'Gregory', 'Michael', 'Andrew', 'Laura', 'Jessica', 'Audra', 'Kelly', 'Brian', 'Jessica and Audra', 'SS', 'Erik', 'Mark', 'Graham', 'Others', 'S', 'BRYAN', 'DANIEL', 'ANDREW', 'CYNTHIA', 'SAURABH', 'STUDENT 1', 'MS. Liu', 'JAKE', 'ASHANK', 'Alan', 'ALYSSA', 'SN', 'KEVIN', 'SI', 'Amy', 'Jackie', 'PARTNER', 'Jacquelyn', 'Students', 'Student', 'LINDA FISHER', 'DEBORAH', 'Jason', 'CHARLOTTE', 'Jeff', 'Michelle', 'Milin', 'Stephanie', 'Stephanie & Jeff', 'Michelle & Milin', 'Blonde', 'Milin,', '~23', 'All', 'Michelle & Jeff', 'R2', 'CECILIO DIMAS', 'STUDENT', 'Erik and Brian', 'SAMUEL', 'OSI', 'CLAIRE']

There are some names in the teacher list that actually belong to students. Let’s just manually fix that:

[ ]:
FALSE_POSITIVE_NAMES = ["TIMOTHY"]

# Remove from the teacher speaker list
TEACHER_SPEAKER = [_ for _ in TEACHER_SPEAKER if _ not in FALSE_POSITIVE_NAMES]

# Add to the student speaker list
STUDENT_SPEAKER.extend(FALSE_POSITIVE_NAMES)

print("Teacher speaker: ", TEACHER_SPEAKER)
print("Student speaker: ", STUDENT_SPEAKER)
Teacher speaker:  ['T', 'T 2', 'T2', 'T/R1']
Student speaker:  ['David', 'Meredith', 'Beth', 'Meredith and David', 'Danielle', 'Gregory', 'Michael', 'Andrew', 'Laura', 'Jessica', 'Audra', 'Kelly', 'Brian', 'Jessica and Audra', 'SS', 'Erik', 'Mark', 'Graham', 'Others', 'S', 'BRYAN', 'DANIEL', 'ANDREW', 'CYNTHIA', 'SAURABH', 'STUDENT 1', 'MS. Liu', 'JAKE', 'ASHANK', 'Alan', 'ALYSSA', 'SN', 'KEVIN', 'SI', 'Amy', 'Jackie', 'PARTNER', 'Jacquelyn', 'Students', 'Student', 'LINDA FISHER', 'DEBORAH', 'Jason', 'CHARLOTTE', 'Jeff', 'Michelle', 'Milin', 'Stephanie', 'Stephanie & Jeff', 'Michelle & Milin', 'Blonde', 'Milin,', '~23', 'All', 'Michelle & Jeff', 'R2', 'CECILIO DIMAS', 'STUDENT', 'Erik and Brian', 'SAMUEL', 'OSI', 'CLAIRE', 'TIMOTHY']

πŸ“ Text Pre-Processing and Annotation

Let’s first preprocess and annotate the dataset with edu-convokit. The following section will:

- Read each file in the dataset and preprocess it using edu-convokit’s TextPreprocessor. We’ll need to anonymize the names and merge the utterances by speaker.
- Then, annotate the file using edu-convokit’s Annotator for talktime, student reasoning, and uptake.
- Finally, save the annotated file as a csv under annotations/.

Let’s get started!

[ ]:
# First, let's create the replacement names for the teacher and student speakers
TEACHER_REPLACEMENT_NAMES = ["[TEACHER]"] * len(TEACHER_SPEAKER)

# We will replace the student names with [STUDENT_0], [STUDENT_1], etc.
# This will approximately preserve the unique identity of each student, while also anonymizing them.
STUDENT_REPLACEMENT_NAMES = [f"[STUDENT_{i}]" for i in range(len(STUDENT_SPEAKER))]
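
To make the mapping concrete, here is a small illustrative printout of which pseudonym a few of the speakers will receive:

[ ]:
# Illustrative only: show how a few raw speaker names map to their pseudonyms.
for name, pseudonym in list(zip(STUDENT_SPEAKER, STUDENT_REPLACEMENT_NAMES))[:3]:
    print(f"{name} -> {pseudonym}")
print(f"{TEACHER_SPEAKER[0]} -> {TEACHER_REPLACEMENT_NAMES[0]}")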

[ ]:
# Initialize the preprocessor and annotator
processor = TextPreprocessor()
annotator = Annotator()

# This takes about 50 minutes on Colab, CPU
# Though this time varies depending on bandwidth
for filename in tqdm.tqdm(os.listdir(DATA_DIR)):
    # Skip anything that isn't a transcript file we can load.
    if not utils.is_valid_file_extension(filename):
        continue
    df = utils.load_data(os.path.join(DATA_DIR, filename))

    # Preprocess the data. Let's anonymize the names in the speaker column.
    df = processor.anonymize_known_names(
        df=df,
        text_column=SPEAKER_COLUMN,
        # We're going to directly replace the text in the speaker column with the anonymized text.
        target_text_column=SPEAKER_COLUMN,
        names=TEACHER_SPEAKER + STUDENT_SPEAKER,
        replacement_names=TEACHER_REPLACEMENT_NAMES + STUDENT_REPLACEMENT_NAMES
    )

    # Now let's anonymize the names in the text column.
    df = processor.anonymize_known_names(
        df=df,
        text_column=TEXT_COLUMN,
        target_text_column=TEXT_COLUMN,
        names=TEACHER_SPEAKER + STUDENT_SPEAKER,
        replacement_names=TEACHER_REPLACEMENT_NAMES + STUDENT_REPLACEMENT_NAMES
    )

    # Now let's merge the utterances of the same speaker together and directly update the dataframe.
    df = processor.merge_utterances_from_same_speaker(
        df=df,
        text_column=TEXT_COLUMN,
        speaker_column=SPEAKER_COLUMN,
        target_text_column=TEXT_COLUMN
    )

    # Now we're going to annotate the data.
    df = annotator.get_talktime(
        df=df,
        text_column=TEXT_COLUMN,
        output_column=TALK_TIME_COLUMN
    )

    df = annotator.get_student_reasoning(
        df=df,
        text_column=TEXT_COLUMN,
        speaker_column=SPEAKER_COLUMN,
        output_column=STUDENT_REASONING_COLUMN,
        # We just want to annotate the student utterances, so we specify the speaker value as the anonymized student names.
        speaker_value=STUDENT_REPLACEMENT_NAMES
    )

    df = annotator.get_uptake(
        df=df,
        text_column=TEXT_COLUMN,
        speaker_column=SPEAKER_COLUMN,
        output_column=UPTAKE_COLUMN,
        # We want to annotate the teacher's uptake of the student's utterances.
        # So we're looking for instances where the student first speaks, then the teacher speaks.
        speaker1=STUDENT_REPLACEMENT_NAMES,
        speaker2=TEACHER_REPLACEMENT_NAMES
    )

    # And we're done! Let's now save the annotated data as a csv file.
    filename = filename.split(".")[0] + ".csv"
    df.to_csv(os.path.join(ANNOTATIONS_DIR, filename), index=False)
  0%|          | 0/30 [00:00<?, ?it/s]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
    For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
WARNING:root:Note: This model was trained on teacher's uptake of student's utterances. So, speaker1 should be the student and speaker2 should be the teacher.
    For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
WARNING:root:Note: It's recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
100%|██████████| 30/30 [05:26<00:00, 10.87s/it]

For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf

WARNING:root:Note: It’s recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 27/30 [05:05<00:34, 11.56s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.

For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf

WARNING:root:Note: This model was trained on teacher’s uptake of student’s utterances. So, speaker1 should be the student and speaker2 should be the teacher.

For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf

WARNING:root:Note: It’s recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 28/30 [05:15<00:22, 11.21s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.

For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf

WARNING:root:Note: This model was trained on teacher’s uptake of student’s utterances. So, speaker1 should be the student and speaker2 should be the teacher.

For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf

WARNING:root:Note: It’s recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 29/30 [05:20<00:09, 9.32s/it]WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.

For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf

WARNING:root:Note: This model was trained on teacher’s uptake of student’s utterances. So, speaker1 should be the student and speaker2 should be the teacher.

For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf

WARNING:root:Note: It’s recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker. 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 30/30 [05:26<00:00, 10.87s/it] end{sphinxVerbatim}

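One recurring warning above recommends merging consecutive utterances from the same speaker before running the annotation models, and points to edu_convokit.text_preprocessing.merge_utterances_from_same_speaker for doing so. The snippet below is only a plain-pandas sketch of what that merge means (concatenating consecutive rows from the same speaker); in practice you would use the edu-convokit helper covered in the Text Pre-processing Colab.

[ ]:
# Plain-pandas sketch of "merging utterances from the same speaker":
# consecutive rows with the same speaker are concatenated into a single row.
# This is for intuition only; use the edu-convokit helper referenced in the
# warnings above for real pre-processing.
import pandas as pd

def merge_consecutive_speaker_rows(df, speaker_column, text_column):
    # A new "run" starts whenever the speaker changes from the previous row.
    run_id = (df[speaker_column] != df[speaker_column].shift()).cumsum()
    merged = (
        df.groupby(run_id.rename("run_id"))
          .agg({speaker_column: "first", text_column: " ".join})
          .reset_index(drop=True)
    )
    return merged

# Toy example (hypothetical rows, not taken from the TalkMoves transcripts):
toy = pd.DataFrame({
    "speaker": ["[TEACHER]", "[TEACHER]", "[STUDENT_1]"],
    "text": ["Okay.", "Who wants to share?", "I got two thirds."],
})
print(merge_consecutive_speaker_rows(toy, "speaker", "text"))
# The two consecutive teacher rows come out as one merged utterance.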

Analysis

Now that we have annotated the dataset, let’s analyze it using edu-convokit’s analyzers. We’ll be doing the following:

- We’ll use QualitativeAnalyzer to look at some examples of the talktime, student reasoning and uptake annotations.
- We’ll use QuantitativeAnalyzer to look at the aggregate statistics of the talktime, student reasoning and uptake annotations.
- We’ll use LexicalAnalyzer to compare the students’ and the teacher’s vocabulary.
- We’ll use TemporalAnalyzer to look at the temporal trends of the talktime, student reasoning and uptake annotations.

These classes, along with constants like ANNOTATIONS_DIR, SPEAKER_COLUMN and the feature columns, were set up earlier in the tutorial; a minimal import sketch follows below.
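If you are jumping straight to this section, the imports would look roughly like the following. The exact import path is an assumption based on the module layout shown in the warnings above (edu_convokit/analyzers/...), so double-check the Analysis Colab if your version differs.

[ ]:
# Minimal import sketch (assumed layout: the analyzer classes live under
# edu_convokit.analyzers, as suggested by the module paths in the warnings above;
# adjust if your version of the package differs).
from edu_convokit.analyzers import (
    QualitativeAnalyzer,
    QuantitativeAnalyzer,
    LexicalAnalyzer,
    TemporalAnalyzer,
)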

Let’s get started!!!

πŸ” Qualitative Analysis

[ ]:
# We're going to look at examples from the entire dataset.
qualitative_analyzer = QualitativeAnalyzer(data_dir=ANNOTATIONS_DIR)

# Examples of talktime. Will show random examples from the dataset.
qualitative_analyzer.print_examples(
    speaker_column=SPEAKER_COLUMN,
    text_column=TEXT_COLUMN,
    feature_column=TALK_TIME_COLUMN,
)
talktime: 54
>> [TEACHER]: Okay, that’s really great. Okay, you are really wonderful. We have about five more minutes where we can do some  problem solving [STUDENT_3] then put these things away. This is what  I would like you to do. I would like you to take a turn to  make a problem that will challenge your partner…

talktime: 54
>> [TEACHER]: I'm wondering which is bigger, one half or two thirds. Now  before you model it you might think in your head, before you begin  to model it what you is bigger [STUDENT_3] if so, if one is bigger, by how  much. Why don’t you work with your partner [STUDENT_3] see what you can  do.

talktime: 54
>> [TEACHER]: Let me write this  down. This, what you are saying here is so important, here. Let me  see if I can write this down. You're saying that you're calling the red,  you're giving red the number name, right? The length of the red,  right? We'll give it the number name, what did you say?

talktime: 4
>> [STUDENT_15]: I made a problem.

talktime: 4
>> [STUDENT_15]: You don’t know mine.

talktime: 4
>> [TEACHER]: What would be what?

talktime: 6
>> [TEACHER]: And ask your partner that problem,

talktime: 6
>> [TEACHER] [TEACHER]: This looks interesting. Are you experimenting?

talktime: 6
>> [TEACHER] [TEACHER]: You have to make it convincing.

[ ]:
# Examples of student reasoning. Let's look at positive examples:
qualitative_analyzer.print_examples(
    speaker_column=SPEAKER_COLUMN,
    text_column=TEXT_COLUMN,
    feature_column=STUDENT_REASONING_COLUMN,
    feature_value=1.0,
)
student_reasoning: 1.0
>> [STUDENT_29]: [Puts three light green rods on top of the blue rod]  There,  that would, if you look down it would equal up to a blue.

student_reasoning: 1.0
>> [STUDENT_15]: And I know that that’s half of [He points to the orange rod],  [STUDENT_3] I know that yellow is half of orange, which is ten.

student_reasoning: 1.0
>> [STUDENT_1]: one half by one sixth. Cause if you put six ones up to a whole

[ ]:
# We can also look at negative examples:
qualitative_analyzer.print_examples(
    speaker_column=SPEAKER_COLUMN,
    text_column=TEXT_COLUMN,
    feature_column=STUDENT_REASONING_COLUMN,
    feature_value=0.0,
)
student_reasoning: 0.0
>> [STUDENT_15]: You just don’t know the problem I just made up. You don’t  know the problem I made up.

student_reasoning: 0.0
>> [STUDENT_15]: I have a problem already. I have a problem. I have a  problem, remember it? If a light green, um, no, no.

student_reasoning: 0.0
>> [STUDENT_15]: I have one for you, too. [[TEACHER] [TEACHER] walks over]  I have a  problem.

[ ]:
# Examples of uptake.
qualitative_analyzer.print_examples(
    speaker_column=SPEAKER_COLUMN,
    text_column=TEXT_COLUMN,
    feature_column=UPTAKE_COLUMN,
    # I want to look at positive examples of uptake (uptake = 1.0)
    feature_value=1.0,
    # ... and look at the previous student utterance (show_k_previous_lines = 1).
    # This is interesting because it will show us how the teacher is responding to the student's utterance.
    show_k_previous_lines=1,
)
uptake: 1.0
[STUDENT_15]: If a light green was one third, what would be a whole?
>> [TEACHER]: What would be what?

uptake: 1.0
[STUDENT_15]: If a light green was one third, what would be a whole?
>> [TEACHER]: One, what would one be.

uptake: 1.0
[STUDENT_29]: [Puts three light green rods on top of the blue rod]  There,  that would, if you look down it would equal up to a blue.
>> [TEACHER]: Hold on, I’m a little confused. Tell me again. Six ones? You called  this one? What are you calling these?

πŸ“Š Quantitative Analysis

[ ]:
quantitative_analyzer = QuantitativeAnalyzer(data_dir=ANNOTATIONS_DIR)

# Let's create a speaker mapping to shorten the speaker names. Teacher -> T, Student{i} -> S{i}
speaker_mapping = {
    **{name: "T" for name in TEACHER_REPLACEMENT_NAMES},
    **{name: f"S{i}" for i, name in enumerate(STUDENT_REPLACEMENT_NAMES)}
}

# For figure formatting because there are a lot of students
import matplotlib.pyplot as plt
plt.figure(figsize=(len(TEACHER_REPLACEMENT_NAMES + STUDENT_REPLACEMENT_NAMES)/2, 6))

# Let's plot the talk time ratio between the speakers.
quantitative_analyzer.plot_statistics(
    feature_column=TALK_TIME_COLUMN,
    speaker_column=SPEAKER_COLUMN,
    # Proportion of talk time for each speaker.
    value_as="prop",
    label_mapping=speaker_mapping
)

# We can also print the statistics:
quantitative_analyzer.print_statistics(
    feature_column=TALK_TIME_COLUMN,
    speaker_column=SPEAKER_COLUMN,
    # Proportion of talk time for each speaker.
    value_as="prop"
)
_images/tutorial_talkmoves_26_0.png
talktime

Proportion statistics
                                       count      mean       std       min       25%       50%       75%       max
speaker
[STUDENT_0]                              3.0  0.082532  0.055057  0.038502  0.051667  0.064833  0.104548  0.144262
[STUDENT_10]                             1.0  0.052941       NaN  0.052941  0.052941  0.052941  0.052941  0.052941
[STUDENT_11]                             2.0  0.050219  0.010940  0.042484  0.046351  0.050219  0.054087  0.057955
[STUDENT_12]                             6.0  0.103688  0.088372  0.004112  0.022533  0.114360  0.180103  0.195356
[STUDENT_14]                             5.0  0.006621  0.006594  0.000435  0.000936  0.004612  0.012336  0.014786
[STUDENT_15]                            11.0  0.149584  0.111150  0.002649  0.083878  0.131579  0.187356  0.379571
[STUDENT_15] [STUDENT_3] [STUDENT_12]    1.0  0.000801       NaN  0.000801  0.000801  0.000801  0.000801  0.000801
[STUDENT_16]                             2.0  0.013095  0.004014  0.010256  0.011676  0.013095  0.014514  0.015933
[STUDENT_17]                             3.0  0.016949  0.017947  0.001677  0.007066  0.012454  0.024586  0.036717
[STUDENT_18]                             1.0  0.000419       NaN  0.000419  0.000419  0.000419  0.000419  0.000419
[STUDENT_19]                            13.0  0.158800  0.186038  0.004929  0.051454  0.079588  0.146341  0.584345
[STUDENT_1]                              6.0  0.144681  0.167141  0.013691  0.025632  0.059223  0.273155  0.377049
[STUDENT_1] [STUDENT_3] [STUDENT_0]      1.0  0.005464       NaN  0.005464  0.005464  0.005464  0.005464  0.005464
[STUDENT_20]                             1.0  0.040346       NaN  0.040346  0.040346  0.040346  0.040346  0.040346
[STUDENT_21]                             2.0  0.170696  0.017246  0.158501  0.164599  0.170696  0.176794  0.182891
[STUDENT_22]                             6.0  0.085113  0.077821  0.044715  0.045282  0.050777  0.072001  0.241883
[STUDENT_23]                             2.0  0.087668  0.080432  0.030794  0.059231  0.087668  0.116106  0.144543
[STUDENT_24]                             4.0  0.090441  0.077632  0.031746  0.048115  0.062901  0.105227  0.204214
[STUDENT_25]                             2.0  0.161549  0.145949  0.058347  0.109948  0.161549  0.213149  0.264750
[STUDENT_26]                             1.0  0.011345       NaN  0.011345  0.011345  0.011345  0.011345  0.011345
[STUDENT_27]                             3.0  0.076598  0.056412  0.030794  0.045092  0.059390  0.099500  0.139610
[STUDENT_28]                             3.0  0.092144  0.015845  0.077796  0.083641  0.089485  0.099317  0.109149
[STUDENT_29]                             5.0  0.126505  0.055537  0.042045  0.104058  0.142857  0.158890  0.184676
[STUDENT_2]                              3.0  0.009829  0.007439  0.002186  0.006221  0.010256  0.013651  0.017045
[STUDENT_30]                             1.0  0.064877       NaN  0.064877  0.064877  0.064877  0.064877  0.064877
[STUDENT_31]                             4.0  0.066166  0.090450  0.002237  0.018617  0.031213  0.078761  0.200000
[STUDENT_32]                             3.0  0.089839  0.021829  0.077236  0.077236  0.077236  0.096140  0.115044
[STUDENT_33]                             2.0  0.069106  0.000000  0.069106  0.069106  0.069106  0.069106  0.069106
[STUDENT_34]                             2.0  0.023347  0.029145  0.002738  0.013043  0.023347  0.033652  0.043956
[STUDENT_35]                             3.0  0.054487  0.073891  0.001965  0.012241  0.022517  0.080748  0.138980
[STUDENT_36]                             1.0  0.011364       NaN  0.011364  0.011364  0.011364  0.011364  0.011364
[STUDENT_37]                             2.0  0.055127  0.010618  0.047619  0.051373  0.055127  0.058881  0.062635
[STUDENT_38]                             1.0  0.012500       NaN  0.012500  0.012500  0.012500  0.012500  0.012500
[STUDENT_39]                             1.0  0.005682       NaN  0.005682  0.005682  0.005682  0.005682  0.005682
[STUDENT_40]                             1.0  0.021669       NaN  0.021669  0.021669  0.021669  0.021669  0.021669
[STUDENT_41]                             1.0  0.048957       NaN  0.048957  0.048957  0.048957  0.048957  0.048957
[STUDENT_42]                             1.0  0.047352       NaN  0.047352  0.047352  0.047352  0.047352  0.047352
[STUDENT_43]                             2.0  0.026003  0.016571  0.014286  0.020144  0.026003  0.031862  0.037721
[STUDENT_44]                             1.0  0.132927       NaN  0.132927  0.132927  0.132927  0.132927  0.132927
[STUDENT_45]                             1.0  0.094036       NaN  0.094036  0.094036  0.094036  0.094036  0.094036
[STUDENT_45] & [STUDENT_44]              1.0  0.000145       NaN  0.000145  0.000145  0.000145  0.000145  0.000145
[STUDENT_45] & [STUDENT_46]              1.0  0.000871       NaN  0.000871  0.000871  0.000871  0.000871  0.000871
[STUDENT_46]                             1.0  0.133507       NaN  0.133507  0.133507  0.133507  0.133507  0.133507
[STUDENT_46],                            1.0  0.001016       NaN  0.001016  0.001016  0.001016  0.001016  0.001016
[STUDENT_47]                             1.0  0.193441       NaN  0.193441  0.193441  0.193441  0.193441  0.193441
[STUDENT_47] & [STUDENT_44]              1.0  0.000726       NaN  0.000726  0.000726  0.000726  0.000726  0.000726
[STUDENT_4]                              4.0  0.092072  0.110127  0.025918  0.028582  0.043212  0.106702  0.255947
[STUDENT_50]                             1.0  0.000145       NaN  0.000145  0.000145  0.000145  0.000145  0.000145
[STUDENT_53]                             1.0  0.000871       NaN  0.000871  0.000871  0.000871  0.000871  0.000871
[STUDENT_55]                             1.0  0.001887       NaN  0.001887  0.001887  0.001887  0.001887  0.001887
[STUDENT_56]                             1.0  0.735250       NaN  0.735250  0.735250  0.735250  0.735250  0.735250
[STUDENT_59]                             1.0  0.007937       NaN  0.007937  0.007937  0.007937  0.007937  0.007937
[STUDENT_5]                              1.0  0.019981       NaN  0.019981  0.019981  0.019981  0.019981  0.019981
[STUDENT_60]                             1.0  0.004762       NaN  0.004762  0.004762  0.004762  0.004762  0.004762
[STUDENT_61]                             1.0  0.004762       NaN  0.004762  0.004762  0.004762  0.004762  0.004762
[STUDENT_62]                             2.0  0.137668  0.115241  0.056180  0.096924  0.137668  0.178412  0.219156
[STUDENT_6]                              8.0  0.041322  0.042725  0.002649  0.009801  0.029687  0.059497  0.130818
[STUDENT_7]                              1.0  0.021569       NaN  0.021569  0.021569  0.021569  0.021569  0.021569
[STUDENT_8]                              2.0  0.004774  0.000643  0.004320  0.004547  0.004774  0.005001  0.005229
[STUDENT_9]                              6.0  0.047136  0.046222  0.005298  0.014495  0.040520  0.054591  0.131373
[STUDENT_9] [STUDENT_3] [STUDENT_10]     1.0  0.026797       NaN  0.026797  0.026797  0.026797  0.026797  0.026797
[TEACHER]                               27.0  0.577200  0.194013  0.252459  0.448436  0.586710  0.700936  0.936508
[TEACHER] [TEACHER]                      2.0  0.206389  0.017239  0.194199  0.200294  0.206389  0.212484  0.218579
[TEACHER]/R1                             2.0  0.722468  0.036595  0.696591  0.709529  0.722468  0.735406  0.748344
~23                                      1.0  0.002612       NaN  0.002612  0.002612  0.002612  0.002612  0.002612
<Figure size 640x480 with 0 Axes>
[ ]:

# For figure formatting because there are a lot of students
import matplotlib.pyplot as plt
plt.figure(figsize=(len(TEACHER_REPLACEMENT_NAMES + STUDENT_REPLACEMENT_NAMES)/2, 6))

# What about the student reasoning? How often does the student use reasoning?
quantitative_analyzer.plot_statistics(
    feature_column=STUDENT_REASONING_COLUMN,
    speaker_column=SPEAKER_COLUMN,
    # We change this to "avg" because we're now looking at within-speaker statistics.
    value_as="avg",
    # We can set the y-axis limits to [0, 1] because the student reasoning column is a binary column.
    yrange=(0, 1),
    label_mapping=speaker_mapping
)

# We can also print the statistics:
quantitative_analyzer.print_statistics(
    feature_column=STUDENT_REASONING_COLUMN,
    speaker_column=SPEAKER_COLUMN,
    value_as="avg"
)
/usr/local/lib/python3.10/dist-packages/edu_convokit/analyzers/quantitative_analyzer.py:51: RuntimeWarning: invalid value encountered in double_scalars
  f"prop_speaker_{feature_column}": speaker_df[feature_column].sum() / feature_sum,
_images/tutorial_talkmoves_27_1.png
student_reasoning

Average statistics
                                       count      mean       std       min       25%       50%       75%       max
speaker
[STUDENT_0]                              3.0  0.714286  0.494872  0.142857  0.571429  1.000000  1.000000  1.000000
[STUDENT_10]                             1.0  0.333333       NaN  0.333333  0.333333  0.333333  0.333333  0.333333
[STUDENT_11]                             2.0  0.250000  0.353553  0.000000  0.125000  0.250000  0.375000  0.500000
[STUDENT_12]                             5.0  0.466667  0.361325  0.000000  0.333333  0.500000  0.500000  1.000000
[STUDENT_14]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_15]                             9.0  0.373016  0.363437  0.000000  0.071429  0.285714  0.500000  1.000000
[STUDENT_15] [STUDENT_3] [STUDENT_12]    0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_16]                             2.0  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
[STUDENT_17]                             2.0  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
[STUDENT_18]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_19]                            12.0  0.250000  0.349964  0.000000  0.000000  0.000000  0.437500  1.000000
[STUDENT_1]                              6.0  0.352381  0.377063  0.000000  0.053571  0.307143  0.475000  1.000000
[STUDENT_1] [STUDENT_3] [STUDENT_0]      0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_20]                             1.0  0.000000       NaN  0.000000  0.000000  0.000000  0.000000  0.000000
[STUDENT_21]                             2.0  0.750000  0.353553  0.500000  0.625000  0.750000  0.875000  1.000000
[STUDENT_22]                             6.0  0.222222  0.403687  0.000000  0.000000  0.000000  0.250000  1.000000
[STUDENT_23]                             2.0  0.500000  0.707107  0.000000  0.250000  0.500000  0.750000  1.000000
[STUDENT_24]                             4.0  0.562500  0.515388  0.000000  0.187500  0.625000  1.000000  1.000000
[STUDENT_25]                             2.0  0.700000  0.424264  0.400000  0.550000  0.700000  0.850000  1.000000
[STUDENT_26]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_27]                             3.0  0.666667  0.288675  0.500000  0.500000  0.500000  0.750000  1.000000
[STUDENT_28]                             3.0  0.833333  0.288675  0.500000  0.750000  1.000000  1.000000  1.000000
[STUDENT_29]                             5.0  0.428182  0.445552  0.000000  0.090909  0.250000  0.800000  1.000000
[STUDENT_2]                              2.0  0.500000  0.707107  0.000000  0.250000  0.500000  0.750000  1.000000
[STUDENT_30]                             1.0  0.000000       NaN  0.000000  0.000000  0.000000  0.000000  0.000000
[STUDENT_31]                             3.0  0.277778  0.254588  0.000000  0.166667  0.333333  0.416667  0.500000
[STUDENT_32]                             3.0  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
[STUDENT_33]                             2.0  1.000000  0.000000  1.000000  1.000000  1.000000  1.000000  1.000000
[STUDENT_34]                             1.0  1.000000       NaN  1.000000  1.000000  1.000000  1.000000  1.000000
[STUDENT_35]                             2.0  0.392857  0.151523  0.285714  0.339286  0.392857  0.446429  0.500000
[STUDENT_36]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_37]                             2.0  0.833333  0.235702  0.666667  0.750000  0.833333  0.916667  1.000000
[STUDENT_38]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_39]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_40]                             1.0  0.000000       NaN  0.000000  0.000000  0.000000  0.000000  0.000000
[STUDENT_41]                             1.0  0.333333       NaN  0.333333  0.333333  0.333333  0.333333  0.333333
[STUDENT_42]                             1.0  1.000000       NaN  1.000000  1.000000  1.000000  1.000000  1.000000
[STUDENT_43]                             2.0  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
[STUDENT_44]                             1.0  0.146341       NaN  0.146341  0.146341  0.146341  0.146341  0.146341
[STUDENT_45]                             1.0  0.189189       NaN  0.189189  0.189189  0.189189  0.189189  0.189189
[STUDENT_45] & [STUDENT_44]              0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_45] & [STUDENT_46]              0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_46]                             1.0  0.153846       NaN  0.153846  0.153846  0.153846  0.153846  0.153846
[STUDENT_46],                            0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_47]                             1.0  0.096154       NaN  0.096154  0.096154  0.096154  0.096154  0.096154
[STUDENT_47] & [STUDENT_44]              0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_4]                              4.0  0.221154  0.259675  0.000000  0.000000  0.192308  0.413462  0.500000
[STUDENT_50]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_53]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_55]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_56]                             1.0  0.000000       NaN  0.000000  0.000000  0.000000  0.000000  0.000000
[STUDENT_59]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_5]                              1.0  0.000000       NaN  0.000000  0.000000  0.000000  0.000000  0.000000
[STUDENT_60]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_61]                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_62]                             2.0  0.250000  0.353553  0.000000  0.125000  0.250000  0.375000  0.500000
[STUDENT_6]                              6.0  0.472222  0.452360  0.000000  0.083333  0.416667  0.875000  1.000000
[STUDENT_7]                              1.0  0.000000       NaN  0.000000  0.000000  0.000000  0.000000  0.000000
[STUDENT_8]                              0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[STUDENT_9]                              6.0  0.305556  0.400231  0.000000  0.000000  0.166667  0.458333  1.000000
[STUDENT_9] [STUDENT_3] [STUDENT_10]     0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[TEACHER]                                0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[TEACHER] [TEACHER]                      0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
[TEACHER]/R1                             0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
~23                                      0.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN


/usr/local/lib/python3.10/dist-packages/edu_convokit/analyzers/quantitative_analyzer.py:51: RuntimeWarning: invalid value encountered in double_scalars
  f"prop_speaker_{feature_column}": speaker_df[feature_column].sum() / feature_sum,
<Figure size 640x480 with 0 Axes>

Note that the teacher has no student_reasoning values because we did not annotate the teacher’s utterances for student reasoning. We can easily remove the teacher from the plot by dropping the NaN values:

[ ]:

# For figure formatting because there are a lot of students
import matplotlib.pyplot as plt
plt.figure(figsize=(len(TEACHER_REPLACEMENT_NAMES + STUDENT_REPLACEMENT_NAMES)/2, 6))

quantitative_analyzer.plot_statistics(
    feature_column=STUDENT_REASONING_COLUMN,
    speaker_column=SPEAKER_COLUMN,
    value_as="avg",
    yrange=(0, 1),
    dropna=True,
    label_mapping=speaker_mapping
)
/usr/local/lib/python3.10/dist-packages/edu_convokit/analyzers/quantitative_analyzer.py:51: RuntimeWarning: invalid value encountered in double_scalars
  f"prop_speaker_{feature_column}": speaker_df[feature_column].sum() / feature_sum,
/usr/local/lib/python3.10/dist-packages/edu_convokit/analyzers/quantitative_analyzer.py:51: RuntimeWarning: invalid value encountered in double_scalars
  f"prop_speaker_{feature_column}": speaker_df[feature_column].sum() / feature_sum,
_images/tutorial_talkmoves_29_1.png
<Figure size 640x480 with 0 Axes>
[ ]:
# Finally, let's look at the teacher's uptake of the students' utterances.
quantitative_analyzer.plot_statistics(
    feature_column=UPTAKE_COLUMN,
    speaker_column=SPEAKER_COLUMN,
    value_as="avg",
    yrange=(0, 1),
    dropna=True
)
_images/tutorial_talkmoves_30_0.png
<Figure size 640x480 with 0 Axes>

πŸ’¬ Lexical Analysis

[ ]:
lexical_analyzer = LexicalAnalyzer(data_dir=ANNOTATIONS_DIR)

# Cast the text column to str (guards against non-string values, e.g. empty cells)
lexical_analyzer._df[TEXT_COLUMN] = lexical_analyzer._df[TEXT_COLUMN].astype(str)

# Let's look at the most common words per speaker in the dataset.
lexical_analyzer.print_word_frequency(
    text_column=TEXT_COLUMN,
    speaker_column=SPEAKER_COLUMN,
    # We want to look at the top 10 words per speaker.
    topk=10,
    # Let's also format the text (e.g., remove punctuation, lowercase the text, etc.)
    run_text_formatting=True
)

Top Words By Speaker
[TEACHER]
student: 695
one: 343
okay: 229
think: 170
would: 134
red: 101
right: 91
want: 87
two: 83
see: 82


[STUDENT_15]
one: 75
student: 58
two: 34
green: 23
yeah: 21
would: 20
half: 20
rod: 20
three: 20
call: 20


[TEACHER] [TEACHER]
one: 13
two: 10
okay: 9
student: 9
let: 8
interesting: 6
see: 5
twelfths: 4
hear: 3
clever: 3


[STUDENT_29]
one: 28
would: 16
student: 12
green: 11
rod: 10
three: 9
considered: 9
thirds: 9
red: 7
fifth: 6


[STUDENT_0]
one: 12
would: 8
two: 6
third: 4
student: 4
put: 4
green: 4
purples: 3
three: 3
yeah: 3


[STUDENT_1]
student: 43
two: 33
one: 30
put: 17
rod: 14
rods: 13
take: 11
bigger: 11
orange: 10
yeah: 10


[STUDENT_2]
one: 4
whole: 2
third: 2
half: 1
green: 1
would: 1
blue: 1
student: 1
put: 1
three: 1


[STUDENT_1] [STUDENT_3] [STUDENT_0]
well: 1
bigger: 1
one: 1


[STUDENT_14]
yeah: 8
one: 6
sixth: 5
two: 4
yes: 3
hmm: 2
thirds: 2
twelfths: 2
red: 1
well: 1


[STUDENT_6]
one: 20
student: 16
bigger: 10
would: 10
two: 8
like: 7
red: 7
yeah: 6
sixth: 6
equal: 5


[STUDENT_16]
bigger: 2
boat: 2
children: 2
well: 1
thinking: 1
sizes: 1
made: 1
sea: 1
monster: 1
biggest: 1


[STUDENT_17]
rod: 5
orange: 4
red: 2
well: 1
fit: 1
would: 1
take: 1
white: 1
student: 1
stick: 1


[STUDENT_18]
yes: 1


[STUDENT_7]
student: 2
one: 2
yeah: 1
working: 1
write: 1
example: 1
see: 1
bigger: 1
half: 1
third: 1


[STUDENT_8]
one: 3
half: 2
third: 1
red: 1


[STUDENT_9]
one: 30
student: 27
third: 9
half: 9
would: 8
rod: 7
green: 7
purple: 6
red: 6
think: 6


[STUDENT_10]
saw: 4
student: 4
half: 3
third: 3
two: 3
agree: 2
bigger: 2
comes: 1
overhead: 1
places: 1


[STUDENT_11]
student: 5
well: 3
red: 3
like: 3
one: 3
third: 2
put: 2
tallest: 2
black: 2
green: 2


[STUDENT_12]
one: 34
student: 27
well: 22
half: 22
like: 22
two: 16
sixth: 15
split: 12
three: 11
third: 10


[STUDENT_9] [STUDENT_3] [STUDENT_10]
one: 3
three: 2
girls: 1
agreeing: 1
demonstrates: 1
said: 1
yeah: 1
put: 1
reds: 1
green: 1


[STUDENT_34]
student: 5
brown: 4
took: 4
purple: 4
tried: 2
another: 2
looked: 2
half: 2
orange: 1
red: 1


[STUDENT_19]
student: 39
cups: 27
yeah: 14
one: 12
make: 12
like: 12
flix: 11
pay: 11
sense: 11
teacher: 10


[STUDENT_22]
student: 23
cups: 10
cream: 8
teacher: 7
chocolate: 6
chocolates: 5
equals: 4
kept: 4
think: 4
going: 3


[STUDENT_36]
really: 1
get: 1
one: 1


[STUDENT_24]
student: 9
cindy: 6
cups: 5
valerie: 5
yeah: 3
row: 3
given: 3
would: 3
said: 3
columns: 2


[STUDENT_27]
student: 8
cups: 7
would: 5
think: 3
thought: 3
teacher: 3
chocolate: 3
wrote: 2
ingredients: 2
like: 2


[STUDENT_62]
student: 17
cup: 9
cream: 9
would: 7
cups: 6
teacher: 5
chocolate: 4
thought: 3
whole: 3
recipe: 3


[STUDENT_44]
blue: 26
student: 23
red: 22
one: 18
teacher: 15
yeah: 15
different: 9
know: 8
see: 8
make: 7


[STUDENT_45]
teacher: 28
would: 20
student: 15
towers: 9
see: 8
high: 8
like: 7
add: 6
make: 5
keep: 5


[STUDENT_46]
student: 27
teacher: 19
blue: 19
put: 19
red: 18
one: 16
could: 15
see: 15
top: 12
yeah: 9


[STUDENT_47]
blue: 48
student: 47
red: 27
one: 22
okay: 20
like: 17
right: 15
could: 14
yeah: 13
teacher: 13


[STUDENT_47] & [STUDENT_44]
towers: 1
yeah: 1


[STUDENT_45] & [STUDENT_46]
teacher: 1
colors: 1


[STUDENT_50]
teacher: 1


[STUDENT_46],
draw: 1
think: 1


~23
student: 3
easier: 1
maybe: 1
like: 1
shelly: 1
pattern: 1
put: 1
different: 1
category: 1


[STUDENT_53]
right: 1
yeah: 1
okay: 1


[STUDENT_45] & [STUDENT_44]
teacher: 1


[STUDENT_55]
sure: 1
find: 1
show: 1
little: 1
bit: 1


[STUDENT_59]
empty: 1
spaces: 1
boxes: 1


[STUDENT_60]
student: 1


[STUDENT_61]
simplest: 1
form: 1


[STUDENT_43]
cups: 5
student: 4
columns: 2
basically: 2
teacher: 2
chocolate: 2
two: 1
empty: 1
three: 1
together: 1


[STUDENT_31]
like: 8
people: 7
got: 6
low: 4
right: 3
student: 3
scores: 3
everything: 3
histogram: 3
think: 2


[STUDENT_32]
cups: 6
chocolate: 5
candy: 4
student: 3
cream: 3
teacher: 3
anthony: 2
makes: 2
first: 2
mixes: 2


[STUDENT_23]
student: 6
cups: 4
chocolate: 3
knew: 2
teacher: 2
yeah: 1
cup: 1
cream: 1
equals: 1
total: 1


[STUDENT_21]
student: 7
cups: 6
cream: 4
teacher: 4
chocolate: 4
every: 3
cup: 3
would: 2
think: 2
total: 2


[STUDENT_4]
two: 19
one: 18
student: 14
bigger: 12
half: 10
sixth: 10
would: 9
thirds: 8
fourths: 7
three: 7


[STUDENT_37]
one: 7
student: 6
four: 2
line: 2
add: 2
would: 2
third: 2
put: 2
side: 2
ten: 2


[STUDENT_33]
cups: 6
wrong: 2
thought: 2
chocolate: 2
student: 2
cream: 2
means: 2
question: 2
said: 2


[STUDENT_56]
student: 18
rent: 9
movie: 8
would: 7
gonna: 7
cost: 6
flix: 6
pay: 6
plus: 6
online: 5


[STUDENT_25]
one: 7
student: 6
look: 3
would: 3
well: 2
prices: 2
like: 2
first: 2
flix: 2
gonna: 2


[STUDENT_26]
excellent: 1
first: 1
one: 1


[STUDENT_28]
student: 12
would: 9
like: 7
chocolate: 6
cups: 6
teacher: 5
get: 5
candies: 4
every: 4
one: 3


[STUDENT_5]
one: 2
two: 2
three: 2
eighteenth: 1
twelfths: 1
four: 1
five: 1
six: 1
seven: 1
eight: 1


[STUDENT_35]
one: 14
green: 10
student: 8
light: 8
bigger: 7
half: 6
yeah: 5
six: 5
thought: 4
third: 4


[STUDENT_40]
hear: 2
think: 2
diagram: 1
well: 1
know: 1
say: 1


[STUDENT_41]
student: 4
thought: 2
cups: 2
cream: 2
multiplied: 2
another: 2
one: 2
answer: 2
misunderstood: 1
two: 1


[STUDENT_42]
cups: 4
answer: 2
total: 2
think: 1
person: 1
got: 1
mixed: 1
thinking: 1
times: 1
teacher: 1


[STUDENT_30]
candies: 2
cindy: 2
reading: 1
valerie: 1
shares: 1
box: 1
gave: 1
student: 1
candy: 1
every: 1


[TEACHER]/R1
student: 66
one: 42
number: 34
two: 32
name: 30
think: 23
names: 19
candy: 17
bar: 16
agree: 15


[STUDENT_38]
one: 2
third: 2
tentatively: 1
yes: 1
mmm: 1
hmm: 1


[STUDENT_39]
yeah: 1
dark: 1
blue: 1


[STUDENT_20]
problem: 2
think: 1
person: 1
answered: 1
read: 1
whole: 1


[STUDENT_15] [STUDENT_3] [STUDENT_12]
yeah: 1



It’s a bit hard to see how the students’ and the teacher’s vocabularies compare. Let’s run a log-odds analysis to see which words are more likely to be used by the students or by the teacher.

[ ]:
# This returns the merged dataframe of the annotated files in DATA_DIR.
df = lexical_analyzer.get_df()

# We want to split the dataframe into two groups: one for the students and one for the teacher.
student_df = df[df[SPEAKER_COLUMN].isin(STUDENT_REPLACEMENT_NAMES)]
teacher_df = df[df[SPEAKER_COLUMN].isin(TEACHER_REPLACEMENT_NAMES)]

# Now we can run the log-odds analysis:
lexical_analyzer.plot_log_odds(
    df1=student_df,
    df2=teacher_df,
    text_column1=TEXT_COLUMN,
    text_column2=TEXT_COLUMN,
    # Let's name the df groups to show on the plot
    group1_name="Student",
    group2_name="Teacher",
    # Let's also run the text formatting
    run_text_formatting=True,
)

_images/tutorial_talkmoves_34_0.png
<Figure size 640x480 with 0 Axes>
[ ]:
# We might also be interested in other n-grams. Let's rerun the log-odds analysis on bigrams.
lexical_analyzer.plot_log_odds(
    df1=student_df,
    df2=teacher_df,
    text_column1=TEXT_COLUMN,
    text_column2=TEXT_COLUMN,
    group1_name="Student",
    group2_name="Teacher",
    run_text_formatting=True,
    # n-grams:
    run_ngrams=True,
    n=2,
    topk=10,
    logodds_factor=0.5
)
_images/tutorial_talkmoves_35_0.png
<Figure size 640x480 with 0 Axes>

πŸ“ˆ Temporal Analysis

Let’s look at the temporal trends of the talktime, student reasoning and uptake annotations!

[ ]:
# Limit the analysis to 3 transcripts so the individual speakers remain visible in the plots.
temporal_analyzer = TemporalAnalyzer(data_dir=ANNOTATIONS_DIR, max_transcripts=3)

# First let's look at the talk time ratio between the speakers over time.
temporal_analyzer.plot_temporal_statistics(
    feature_column=TALK_TIME_COLUMN,
    speaker_column=SPEAKER_COLUMN,
    value_as="prop",
    # Let's create 10 bins for the x-axis.
    num_bins=10,
    label_mapping=speaker_mapping
)
_images/tutorial_talkmoves_37_0.png
<Figure size 640x480 with 0 Axes>
[ ]:
# Now student reasoning over time.
temporal_analyzer.plot_temporal_statistics(
    feature_column=STUDENT_REASONING_COLUMN,
    speaker_column=SPEAKER_COLUMN,
    value_as="avg",
    # Let's create 10 bins for the x-axis.
    num_bins=10,
    label_mapping=speaker_mapping
)
/usr/local/lib/python3.10/dist-packages/edu_convokit/analyzers/temporal_analyzer.py:57: RuntimeWarning: invalid value encountered in double_scalars
  f"prop_speaker_{feature_column}": speaker_df[feature_column].sum() / feature_sum,
_images/tutorial_talkmoves_38_1.png
<Figure size 640x480 with 0 Axes>
[ ]:
# Finally, let's look at the teacher's uptake of the students' utterances over time.
temporal_analyzer.plot_temporal_statistics(
    feature_column=UPTAKE_COLUMN,
    speaker_column=SPEAKER_COLUMN,
    value_as="avg",
    # Let's create 10 bins for the x-axis.
    num_bins=10,
    label_mapping=speaker_mapping
)
/usr/local/lib/python3.10/dist-packages/edu_convokit/analyzers/temporal_analyzer.py:57: RuntimeWarning: invalid value encountered in double_scalars
  f"prop_speaker_{feature_column}": speaker_df[feature_column].sum() / feature_sum,
_images/tutorial_talkmoves_39_1.png
<Figure size 640x480 with 0 Axes>

πŸ“š Conclusions and Next Steps

Great! From this tutorial, we learned how to use edu-convokit to pre-process, annotate and analyze the TalkMoves dataset. We saw how a few simple building blocks in edu-convokit can be combined to analyze the dataset and gain insights into the data from several perspectives (qualitative, quantitative, lexical and temporal).

Other resources you can check out include:

- Tutorial on edu-convokit for the NCTE dataset
- Tutorial on edu-convokit for the Amber dataset
- `edu-convokit documentation <https://edu-convokit.readthedocs.io/en/latest/index.html>`__
- `edu-convokit GitHub repository <https://github.com/rosewang2008/edu-convokit/tree/main>`__

If you have any questions, please feel free to reach out to us on `edu-convokit’s GitHub <https://github.com/rosewang2008/edu-convokit>`__.

πŸ‘‹ Happy exploring your data with edu-convokit!

[ ]: