Tutorial on Annotating Education Language Data
Welcome to this tutorial on using `edu-convokit <https://github.com/rosewang2008/edu-convokit>`__ to annotate your education language data! Annotation is a critical step in understanding your data, and it is important to do it correctly and consistently across datasets. edu-convokit is designed to help you do just that.
Annotation is useful because:
- It produces descriptive statistics that summarize your data.
- It quantifies the language used by your students and educators.
- It measures the interaction between the student and the educator.
edu-convokit supports all three purposes.
Learning Objectives
In this tutorial, you will learn how to use Annotator to annotate your data. The annotations we'll cover include:
- Talk Time: the amount of time the student and educator talk.
- Student Reasoning: the use of reasoning in the student's speech.
- Teacher Focusing Questions: the educator's use of focusing questions.
- Conversational Uptake: instances of high conversational uptake by the educator.
For other annotations, please refer to the documentation. If you want to add your own annotations, please make a pull request to the repo.
Without further ado, let's get started!
Installation
First, install edu-convokit:
[ ]:
!pip install git+https://github.com/rosewang2008/edu-convokit.git
Collecting git+https://github.com/rosewang2008/edu-convokit.git
Cloning https://github.com/rosewang2008/edu-convokit.git to /tmp/pip-req-build-s81zucpt
Running command git clone --filter=blob:none --quiet https://github.com/rosewang2008/edu-convokit.git /tmp/pip-req-build-s81zucpt
Resolved https://github.com/rosewang2008/edu-convokit.git to commit 8eb087b51abfa36a7031bf1de4e3dc40d8848186
Preparing metadata (setup.py) ... done
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (4.66.1)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (1.23.5)
Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (1.11.4)
Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (3.8.1)
Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (2.1.0+cu121)
Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (4.35.2)
Collecting clean-text (from edu-convokit==0.0.1)
Downloading clean_text-0.6.0-py3-none-any.whl (11 kB)
Requirement already satisfied: openpyxl in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (3.1.2)
Requirement already satisfied: spacy in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (3.6.1)
Requirement already satisfied: gensim in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (4.3.2)
Collecting num2words==0.5.10 (from edu-convokit==0.0.1)
Downloading num2words-0.5.10-py3-none-any.whl (101 kB)
ββββββββββββββββββββββββββββββββββββββββ 101.6/101.6 kB 1.9 MB/s eta 0:00:00
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (1.2.2)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (3.7.1)
Requirement already satisfied: seaborn in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (0.12.2)
Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from edu-convokit==0.0.1) (1.5.3)
Collecting docopt>=0.6.2 (from num2words==0.5.10->edu-convokit==0.0.1)
Downloading docopt-0.6.2.tar.gz (25 kB)
Preparing metadata (setup.py) ... done
Collecting emoji<2.0.0,>=1.0.0 (from clean-text->edu-convokit==0.0.1)
Downloading emoji-1.7.0.tar.gz (175 kB)
ββββββββββββββββββββββββββββββββββββββββ 175.4/175.4 kB 6.5 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting ftfy<7.0,>=6.0 (from clean-text->edu-convokit==0.0.1)
Downloading ftfy-6.1.3-py3-none-any.whl (53 kB)
ββββββββββββββββββββββββββββββββββββββββ 53.4/53.4 kB 4.0 MB/s eta 0:00:00
Requirement already satisfied: smart-open>=1.8.1 in /usr/local/lib/python3.10/dist-packages (from gensim->edu-convokit==0.0.1) (6.4.0)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (1.2.0)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (4.46.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (1.4.5)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (23.2)
Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (9.4.0)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (3.1.1)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib->edu-convokit==0.0.1) (2.8.2)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk->edu-convokit==0.0.1) (8.1.7)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk->edu-convokit==0.0.1) (1.3.2)
Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages (from nltk->edu-convokit==0.0.1) (2023.6.3)
Requirement already satisfied: et-xmlfile in /usr/local/lib/python3.10/dist-packages (from openpyxl->edu-convokit==0.0.1) (1.1.0)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->edu-convokit==0.0.1) (2023.3.post1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn->edu-convokit==0.0.1) (3.2.0)
Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.11 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (3.0.12)
Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (1.0.5)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (1.0.10)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (2.0.8)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (3.0.9)
Requirement already satisfied: thinc<8.2.0,>=8.1.8 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (8.1.12)
Requirement already satisfied: wasabi<1.2.0,>=0.9.1 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (1.1.2)
Requirement already satisfied: srsly<3.0.0,>=2.4.3 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (2.4.8)
Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (2.0.10)
Requirement already satisfied: typer<0.10.0,>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (0.9.0)
Requirement already satisfied: pathy>=0.10.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (0.10.3)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (2.31.0)
Requirement already satisfied: pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (1.10.13)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (3.1.2)
Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (67.7.2)
Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in /usr/local/lib/python3.10/dist-packages (from spacy->edu-convokit==0.0.1) (3.3.0)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (3.13.1)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (4.5.0)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (1.12)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (3.2.1)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (2023.6.0)
Requirement already satisfied: triton==2.1.0 in /usr/local/lib/python3.10/dist-packages (from torch->edu-convokit==0.0.1) (2.1.0)
Requirement already satisfied: huggingface-hub<1.0,>=0.16.4 in /usr/local/lib/python3.10/dist-packages (from transformers->edu-convokit==0.0.1) (0.19.4)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers->edu-convokit==0.0.1) (6.0.1)
Requirement already satisfied: tokenizers<0.19,>=0.14 in /usr/local/lib/python3.10/dist-packages (from transformers->edu-convokit==0.0.1) (0.15.0)
Requirement already satisfied: safetensors>=0.3.1 in /usr/local/lib/python3.10/dist-packages (from transformers->edu-convokit==0.0.1) (0.4.1)
Requirement already satisfied: wcwidth<0.3.0,>=0.2.12 in /usr/local/lib/python3.10/dist-packages (from ftfy<7.0,>=6.0->clean-text->edu-convokit==0.0.1) (0.2.12)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.7->matplotlib->edu-convokit==0.0.1) (1.16.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy->edu-convokit==0.0.1) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy->edu-convokit==0.0.1) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy->edu-convokit==0.0.1) (2.0.7)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy->edu-convokit==0.0.1) (2023.11.17)
Requirement already satisfied: blis<0.8.0,>=0.7.8 in /usr/local/lib/python3.10/dist-packages (from thinc<8.2.0,>=8.1.8->spacy->edu-convokit==0.0.1) (0.7.11)
Requirement already satisfied: confection<1.0.0,>=0.0.1 in /usr/local/lib/python3.10/dist-packages (from thinc<8.2.0,>=8.1.8->spacy->edu-convokit==0.0.1) (0.1.4)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->spacy->edu-convokit==0.0.1) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->edu-convokit==0.0.1) (1.3.0)
Building wheels for collected packages: edu-convokit, docopt, emoji
Building wheel for edu-convokit (setup.py) ... done
Created wheel for edu-convokit: filename=edu_convokit-0.0.1-py3-none-any.whl size=24897 sha256=75b716d442ceadb81276002313317e704cbbe06fb62edd60a030a7b046e9640e
Stored in directory: /tmp/pip-ephem-wheel-cache-jx3bvie0/wheels/29/43/ec/d2472df0eb2af8f1e7d67d0710a4b3eb93fe983b15f8d7b841
Building wheel for docopt (setup.py) ... done
Created wheel for docopt: filename=docopt-0.6.2-py2.py3-none-any.whl size=13706 sha256=ced3a42c8c907ca91870ff912e53bf52cf37f098a8c290821ce4318edfdb98e9
Stored in directory: /root/.cache/pip/wheels/fc/ab/d4/5da2067ac95b36618c629a5f93f809425700506f72c9732fac
Building wheel for emoji (setup.py) ... done
Created wheel for emoji: filename=emoji-1.7.0-py3-none-any.whl size=171033 sha256=3c0f7527379333472f637243ab34d271bb000da7d4ef44284076515a8f8b647e
Stored in directory: /root/.cache/pip/wheels/31/8a/8c/315c9e5d7773f74b33d5ed33f075b49c6eaeb7cedbb86e2cf8
Successfully built edu-convokit docopt emoji
Installing collected packages: emoji, docopt, num2words, ftfy, clean-text, edu-convokit
Successfully installed clean-text-0.6.0 docopt-0.6.2 edu-convokit-0.0.1 emoji-1.7.0 ftfy-6.1.3 num2words-0.5.10
[ ]:
from edu_convokit.annotation import Annotator
# We're going to standardize the text with TextPreprocessor.
# In this tutorial, we're going to assume you're familiar with TextPreprocessor.
# For the tutorial on TextPreprocessor, see: https://colab.research.google.com/drive/1a-EwYwkNYHSNcNThNTXe6DNpsis0bpQK
from edu_convokit.preprocessors import TextPreprocessor
# For helping us flexibly load data
from edu_convokit import utils
WARNING:root:Since the GPL-licensed package `unidecode` is not installed, using Python's `unicodedata` package which yields worse results.
Data
Let's load the data we'll be working with. We're going to use a transcript from the TalkMoves dataset.
We're also going to use TextPreprocessor to anonymize and pre-process the data. This is optional, but recommended.
This tutorial assumes you're familiar with TextPreprocessor. If you are not, please refer to the TextPreprocessor tutorial first: https://colab.research.google.com/drive/1a-EwYwkNYHSNcNThNTXe6DNpsis0bpQK
[ ]:
!wget "https://raw.githubusercontent.com/rosewang2008/edu-convokit/master/data/talkmoves/Boats and Fish 2_Grade 4.xlsx"
data_fname = "Boats and Fish 2_Grade 4.xlsx"
df = utils.load_data(data_fname) # Handles loading data from different file types including: .csv, .xlsx, .json
# We're going to first standardize the text with TextPreprocessor as done from our last tutorial: anonymize and merge utterances from the same speaker
processor = TextPreprocessor()
TEXT_COLUMN = "Sentence"
SPEAKER_COLUMN = "Speaker"
known_names = ["David", "Meredith", "Beth"]
known_replacement_names = [f"[STUDENT_{i}]" for i in range(len(known_names))]
# Anonymize text
df = processor.anonymize_known_names(
    df=df,
    text_column=TEXT_COLUMN,
    names=known_names,
    replacement_names=known_replacement_names,
    target_text_column=TEXT_COLUMN
)
# Anonymize speakers
df = processor.anonymize_known_names(
    df=df,
    text_column=SPEAKER_COLUMN,
    target_text_column=SPEAKER_COLUMN,
    names=known_names,
    replacement_names=known_replacement_names
)
# Merge utterances
df = processor.merge_utterances_from_same_speaker(
    df=df,
    text_column=TEXT_COLUMN,
    speaker_column=SPEAKER_COLUMN,
    target_text_column=TEXT_COLUMN
)
# Show
df.head()
--2023-12-30 10:29:37-- https://raw.githubusercontent.com/rosewang2008/edu-convokit/master/data/talkmoves/Boats%20and%20Fish%202_Grade%204.xlsx
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10528 (10K) [application/octet-stream]
Saving to: ‘Boats and Fish 2_Grade 4.xlsx’
Boats and Fish 2_Gr 100%[===================>] 10.28K --.-KB/s in 0.001s
2023-12-30 10:29:38 (14.2 MB/s) - ‘Boats and Fish 2_Grade 4.xlsx’ saved [10528/10528]
| | Sentence | Speaker |
|---|---|---|
| 0 | I'm wondering which is bigger, one half or two... | T |
| 1 | Try the purples. Get three purples. It doesn't... | [STUDENT_0] |
| 2 | What was it? Two thirds? | [STUDENT_1] |
| 3 | It would be like brown or something like that. | [STUDENT_0] |
| 4 | Ok | [STUDENT_1] |
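For intuition about what the two preprocessing steps above do, here is a minimal, dependency-free sketch. It is not edu-convokit's implementation: the helper names anonymize and merge_consecutive are hypothetical, and both whole-word regex replacement and space-joined merging are assumptions about reasonable behavior. The transcript rows are made up for illustration.

```python
import re
from itertools import groupby


def anonymize(text, names, replacements):
    """Replace each known name with its placeholder (whole-word match)."""
    for name, replacement in zip(names, replacements):
        text = re.sub(rf"\b{re.escape(name)}\b", replacement, text)
    return text


def merge_consecutive(rows):
    """Merge adjacent (speaker, text) rows that share the same speaker."""
    merged = []
    for speaker, group in groupby(rows, key=lambda row: row[0]):
        merged.append((speaker, " ".join(text for _, text in group)))
    return merged


names = ["David", "Meredith", "Beth"]
placeholders = [f"[STUDENT_{i}]" for i in range(len(names))]

# Hypothetical transcript rows, not taken from the dataset.
rows = [
    ("T", "David, what did you find?"),
    ("David", "I used the purple rods."),
    ("David", "Meredith helped me."),
]

# Anonymize both the speaker label and the utterance text, then merge.
rows = [
    (anonymize(speaker, names, placeholders), anonymize(text, names, placeholders))
    for speaker, text in rows
]
merged = merge_consecutive(rows)

for speaker, text in merged:
    print(f"{speaker}: {text}")
```

Merging after anonymization mirrors the order used in the cell above, so the merged rows are already keyed by the placeholder speaker names.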
Annotating Talk Time
Let's start by annotating the amount of time the student and educator talk. We will define talk time as the number of words in TEXT_COLUMN. However, if you have metadata about the length of the audio, you can also use that to annotate talk time. Please refer to the documentation for more information.
[ ]:
annotator = Annotator()
# The talktime values will be populated in this column
TALK_TIME_COLUMN = "talktime"
df = annotator.get_talktime(
    df=df,
    text_column=TEXT_COLUMN,
    analysis_unit="words",
    output_column=TALK_TIME_COLUMN
)
df.head()
| | Sentence | Speaker | talktime |
|---|---|---|---|
| 0 | I'm wondering which is bigger, one half or two... | T | 54 |
| 1 | Try the purples. Get three purples. It doesn't... | [STUDENT_0] | 12 |
| 2 | What was it? Two thirds? | [STUDENT_1] | 5 |
| 3 | It would be like brown or something like that. | [STUDENT_0] | 9 |
| 4 | Ok | [STUDENT_1] | 1 |
With a single function call, we've added our first annotation, talktime, to our data!
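To make the word-based definition of talk time concrete, here is a small sketch that counts whitespace-separated tokens and sums them per speaker. Splitting on whitespace is an assumption about how words are counted (the library's tokenization may differ), and talktime_in_words is a hypothetical helper. The three utterances are the ones shown in full in the table above.

```python
from collections import defaultdict


def talktime_in_words(text):
    """Count whitespace-separated tokens in an utterance."""
    return len(text.split())


# (speaker, utterance) pairs that appear untruncated in the table above.
rows = [
    ("[STUDENT_1]", "What was it? Two thirds?"),
    ("[STUDENT_0]", "It would be like brown or something like that."),
    ("[STUDENT_1]", "Ok"),
]

per_utterance = [talktime_in_words(text) for _, text in rows]

# Sum talk time per speaker.
totals = defaultdict(int)
for (speaker, _), n_words in zip(rows, per_utterance):
    totals[speaker] += n_words

print(per_utterance)
print(dict(totals))
```

For these three utterances, the whitespace count agrees with the talktime values in the table (5, 9, and 1).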
All the other annotations work in a similar way. Let's continue!
Annotating Student Reasoning
Next, let's annotate the student's reasoning. Under the hood, we're using a model trained on students' math reasoning from prior work. So, note that:
- This model is trained on math reasoning, so it may not work well on other subjects.
- This model runs slowly on CPU, so we recommend using a GPU. If you have a GPU, this library will automatically use it.
- This model is trained on student utterances. edu-convokit has a simple way to annotate only student utterances, which we'll see below.
[ ]:
# The reasoning annotations will be populated in this column
STUDENT_REASONING_COLUMN = "student_reasoning"
df = annotator.get_student_reasoning(
    df=df,
    speaker_column=SPEAKER_COLUMN,
    text_column=TEXT_COLUMN,
    output_column=STUDENT_REASONING_COLUMN,
    # Since this model is only trained on _student_ utterances,
    # we can explicitly pass in the speaker names associated to students.
    # It will only annotate utterances from these speakers.
    speaker_value=known_replacement_names,
)
df.head()
WARNING:root:Note: This model was trained on student reasoning, so it should be used on student utterances.
For more details on the model, see https://arxiv.org/pdf/2211.11772.pdf
| | Sentence | Speaker | talktime | student_reasoning |
|---|---|---|---|---|
| 0 | I'm wondering which is bigger, one half or two... | T | 54 | NaN |
| 1 | Try the purples. Get three purples. It doesn't... | [STUDENT_0] | 12 | 0.0 |
| 2 | What was it? Two thirds? | [STUDENT_1] | 5 | NaN |
| 3 | It would be like brown or something like that. | [STUDENT_0] | 9 | 0.0 |
| 4 | Ok | [STUDENT_1] | 1 | NaN |
Great! We've added our second annotation, student_reasoning, to our data!
Note:
- student_reasoning is NaN for the educator's utterances, as desired.
- For the student utterances that the model labels, student_reasoning is either 1.0 or 0.0: 1.0 means the model thinks the student is using reasoning, and 0.0 means the model thinks the student is not using reasoning.
- Some short student utterances (for example, rows 2 and 4 above) are also NaN, which suggests that the model skips very short utterances.
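The speaker filtering described above can be pictured with a short sketch: only rows from the target speakers are passed to a classifier, and every other row is left as NaN. This is an illustration, not the library's code. annotate_for_speakers is a hypothetical helper, the rows are made up, and placeholder_classifier is a deliberately naive stand-in for the trained model.

```python
import math


def annotate_for_speakers(rows, target_speakers, classify):
    """Label rows from target speakers; leave all other rows as NaN."""
    labels = []
    for speaker, text in rows:
        if speaker in target_speakers:
            labels.append(float(classify(text)))
        else:
            labels.append(math.nan)
    return labels


def placeholder_classifier(text):
    # Stand-in for a trained model: flags utterances containing "because".
    return "because" in text.lower()


# Hypothetical rows, not taken from the dataset.
rows = [
    ("T", "Which is bigger, one half or two thirds?"),
    ("[STUDENT_0]", "Two thirds, because it covers more of the rod."),
    ("[STUDENT_1]", "First we have to find out what a third of it is."),
]

labels = annotate_for_speakers(
    rows, {"[STUDENT_0]", "[STUDENT_1]"}, placeholder_classifier
)
print(labels)
```

Using NaN rather than 0.0 for skipped rows keeps "not annotated" distinct from "annotated as negative", which matters when computing rates later.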
Are you wondering whether there's an easy way to view examples of the student's reasoning? edu-convokit has a simple way to do this with its Analyzers. This will be covered in the tutorial on Analyzers. For now, let's continue annotating!
Annotating Teacher Focusing Questions
Let's annotate the educator's use of focusing questions. Under the hood, we're using a model trained on focusing questions in math classrooms from prior work. So, note that:
- This model is trained on math classroom data, so it may not work well on other subjects.
- This model runs slowly on CPU, so we recommend using a GPU. If you have a GPU, this library will automatically use it.
- This model is trained on teacher utterances. edu-convokit has a simple way to annotate only teacher utterances, similar to the one we saw above for student utterances.
[ ]:
# The focusing questions annotation will be populated in this column
FOCUSING_QUESTIONS_COLUMN = "focusing_questions"
df = annotator.get_focusing_questions(
    df=df,
    speaker_column=SPEAKER_COLUMN,
    text_column=TEXT_COLUMN,
    output_column=FOCUSING_QUESTIONS_COLUMN,
    # Since this model is only trained on _teacher_ utterances,
    # we can explicitly pass in the speaker names associated to the teacher.
    speaker_value=['T']
)
df.head()
WARNING:root:Note: This model was trained on teacher focusing questions, so it should be used on teacher utterances.
For more details on the model, see https://aclanthology.org/2022.bea-1.27.pdf
| | Sentence | Speaker | talktime | student_reasoning | focusing_questions |
|---|---|---|---|---|---|
| 0 | I'm wondering which is bigger, one half or two... | T | 54 | NaN | 0.0 |
| 1 | Try the purples. Get three purples. It doesn't... | [STUDENT_0] | 12 | 0.0 | NaN |
| 2 | What was it? Two thirds? | [STUDENT_1] | 5 | NaN | NaN |
| 3 | It would be like brown or something like that. | [STUDENT_0] | 9 | 0.0 | NaN |
| 4 | Ok | [STUDENT_1] | 1 | NaN | NaN |
Great! We've added our third annotation, focusing_questions, to our data!
Note:
- focusing_questions is NaN for the student utterances.
- Similar to before, focusing_questions is either 1.0 or 0.0 for the educator's utterances: 1.0 means the model thinks the educator is using a focusing question, and 0.0 means the model thinks the educator is not using a focusing question.
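Because unannotated rows are NaN, any summary of a binary annotation column should skip them rather than count them as zeros. The sketch below computes the share of 1.0 labels among annotated rows. positive_rate is a hypothetical helper and the column values are made up for illustration.

```python
import math


def positive_rate(labels):
    """Fraction of 1.0 labels among non-NaN entries; None if nothing was labeled."""
    labeled = [value for value in labels if not math.isnan(value)]
    if not labeled:
        return None
    return sum(labeled) / len(labeled)


# Hypothetical focusing_questions column: NaN marks student rows.
focusing_questions = [0.0, math.nan, math.nan, 1.0, math.nan, 0.0, 0.0]

rate = positive_rate(focusing_questions)
print(rate)
```

Returning None when no row was annotated avoids a division by zero and makes the "no data" case explicit to the caller.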
Annotating Conversational Uptake
Let's annotate the educator's conversational uptake of the student. Under the hood, we're using a model from prior work. It measures whether the educator builds on the contribution of the student's utterance. So, note that:
- This model runs slowly on CPU, so we recommend using a GPU. If you have a GPU, this library will automatically use it.
- This model is trained on teacher utterances following student utterances. edu-convokit has a simple way to annotate only these teacher utterances, similar to the function calls we saw before.
[ ]:
UPTAKE_COLUMN = "uptake"
df = annotator.get_uptake(
    df=df,
    speaker_column=SPEAKER_COLUMN,
    text_column=TEXT_COLUMN,
    output_column=UPTAKE_COLUMN,
    # We want to specify the first speaker to be the students.
    speaker1=known_replacement_names,
    # We want to specify the second speaker to be the teacher
    speaker2='T'
)
WARNING:root:Note: This model was trained on teacher's uptake of student's utterances. So, speaker1 should be the student and speaker2 should be the teacher.
For more details on the model, see https://arxiv.org/pdf/2106.03873.pdf
WARNING:root:Note: It's recommended that you merge utterances from the same speaker before running this model. You can do that with edu_convokit.text_preprocessing.merge_utterances_from_same_speaker.
[ ]:
df.head(20)
| | Sentence | Speaker | talktime | student_reasoning | focusing_questions | uptake |
|---|---|---|---|---|---|---|
| 0 | I'm wondering which is bigger, one half or two... | T | 54 | NaN | 0.0 | NaN |
| 1 | Try the purples. Get three purples. It doesn't... | [STUDENT_0] | 12 | 0.0 | NaN | NaN |
| 2 | What was it? Two thirds? | [STUDENT_1] | 5 | NaN | NaN | NaN |
| 3 | It would be like brown or something like that. | [STUDENT_0] | 9 | 0.0 | NaN | NaN |
| 4 | Ok | [STUDENT_1] | 1 | NaN | NaN | NaN |
| 5 | We're not doing the one third, we're doing two... | [STUDENT_0] | 14 | 0.0 | NaN | NaN |
| 6 | First we've got to find out what a third of it... | [STUDENT_1] | 18 | 0.0 | NaN | NaN |
| 7 | One third? | [STUDENT_0] | 2 | NaN | NaN | NaN |
| 8 | What's third of an orange? Let's start a diffe... | [STUDENT_1] | 21 | 0.0 | NaN | NaN |
| 9 | Alright, yeah, I was thinking of that way before | [STUDENT_0] | 9 | 0.0 | NaN | NaN |
| 10 | And you can take the take the red, and the lig... | [STUDENT_1] | 35 | 0.0 | NaN | NaN |
| 11 | She asked, which is bigger, one half or two th... | [STUDENT_0] | 10 | 0.0 | NaN | NaN |
| 12 | One half or two thirds? Now take six of the ones | [STUDENT_1] | 11 | 0.0 | NaN | NaN |
| 13 | Yeah, I know, and put 'em up to there, and tha... | [STUDENT_0] | 30 | 0.0 | NaN | NaN |
| 14 | Now take six of the ones Which is bigger? | T | 9 | NaN | 0.0 | 1.0 |
| 15 | One half | [STUDENT_2] | 2 | NaN | NaN | NaN |
| 16 | I think one half is... | [STUDENT_0] | 5 | NaN | NaN | NaN |
| 17 | Yes, [STUDENT_0] and [STUDENT_1]? | T | 4 | NaN | 0.0 | 0.0 |
| 18 | What do you have? | [STUDENT_0] | 4 | NaN | NaN | NaN |
| 19 | Well | [STUDENT_1] and [STUDENT_0] | 1 | NaN | NaN | NaN |
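In the table above, uptake is only populated for teacher rows that directly follow a student row (rows 14 and 17), while the opening teacher row has no preceding student turn and stays NaN. The sketch below shows one way such (student utterance, teacher reply) pairs could be selected before scoring. It is an assumption about the selection logic, not the library's code: student_teacher_pairs is a hypothetical helper, and the rows are made up.

```python
def student_teacher_pairs(rows, student_speakers, teacher_speaker):
    """Return (index, student_text, teacher_text) for each teacher turn
    that directly follows a student turn."""
    pairs = []
    for i in range(1, len(rows)):
        prev_speaker, prev_text = rows[i - 1]
        speaker, text = rows[i]
        if speaker == teacher_speaker and prev_speaker in student_speakers:
            pairs.append((i, prev_text, text))
    return pairs


# Hypothetical rows, not taken from the dataset.
rows = [
    ("T", "Which is bigger, one half or two thirds?"),
    ("[STUDENT_0]", "I think one half is smaller."),
    ("T", "Why do you think one half is smaller?"),
    ("[STUDENT_1]", "Ok"),
    ("[STUDENT_0]", "Because two thirds covers more."),
]

pairs = student_teacher_pairs(rows, {"[STUDENT_0]", "[STUDENT_1]"}, "T")
for index, student_text, teacher_text in pairs:
    print(index, "|", student_text, "->", teacher_text)
```

This pairing is also why merging consecutive utterances from the same speaker beforehand is helpful: each teacher reply is then compared against the student's full preceding turn.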
Great, we finished the last annotation of the tutorial!
With these annotations, we can now do some analysis on our data.
We can save our annotated data to a file, which we'll use in the next tutorial on Analyzers.
[ ]:
df.to_csv("annotated_data.csv", index=False)
Conclusion and Where to Go From Here
In this tutorial, we learned how to use Annotator to annotate our data. With one simple function call per annotation, we were able to annotate:
- Talk Time
- Student Reasoning
- Teacher Focusing Questions
- Conversational Uptake
What are some natural next steps?
- You can annotate with other features. Please refer to the documentation for an exhaustive list of features. Or, you can add your own features by making a pull request to the repo.
- You can analyze your data with `edu-convokit's Analyzer <https://colab.research.google.com/drive/1xfrq5Ka3FZH7t9l87u4sa_oMlmMvuTfe>`__. For more details, see the Analyzer tutorial and the Analyzer documentation.
If you have any questions, please feel free to reach out to us on `edu-convokit's GitHub <https://github.com/rosewang2008/edu-convokit>`__.
Happy exploring your data with edu-convokit!