Annotation

This page documents the annotation module, which includes both traditional and recent neural methods for annotating text.

Annotator

class edu_convokit.annotation.Annotator

Annotator class for edu-convokit. Contains methods for annotating data.

__init__()

get_focusing_questions(df: DataFrame, text_column: str, output_column: str, speaker_column: str | None = None, speaker_value: str | List[str] | None = None) → DataFrame

Get focusing question predictions for a dataframe.

Parameters:
  • df (pd.DataFrame) – dataframe to analyze

  • text_column (str) – name of column containing text to analyze

  • output_column (str) – name of column to store result

  • speaker_column (str) – name of column containing speaker names. Only required if speaker_value is not None.

  • speaker_value (str or list) – if speaker_column is not None, only get predictions for this speaker.

Returns:

dataframe with focusing question predictions

Return type:

df (pd.DataFrame)
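
A minimal usage sketch based only on the signature above. The sample transcript, the column names (`speaker`, `text`, `focusing_question`), and the wrapper `annotate_focusing_questions` are assumptions made for illustration. The `Annotator` import is deferred into the function so that the data setup runs even where edu-convokit is not installed.

```python
import pandas as pd

# Hypothetical transcript; the column names are illustrative, not required by the library.
df = pd.DataFrame(
    {
        "speaker": ["teacher", "student", "teacher"],
        "text": [
            "What do you notice about these two fractions?",
            "They have the same denominator.",
            "How does that help you compare them?",
        ],
    }
)


def annotate_focusing_questions(transcript: pd.DataFrame) -> pd.DataFrame:
    """Call get_focusing_questions as documented above, for teacher rows only."""
    # Imported here so the rest of the sketch runs without edu-convokit installed.
    from edu_convokit.annotation import Annotator

    annotator = Annotator()
    return annotator.get_focusing_questions(
        transcript,
        text_column="text",
        output_column="focusing_question",  # hypothetical output column name
        speaker_column="speaker",
        speaker_value="teacher",
    )
```

get_student_reasoning, get_student_talk_moves, and get_teacher_talk_moves take the same five arguments, so the same calling pattern should carry over to them.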

get_math_density(df: DataFrame, text_column: str, output_column: str, count_type: str = 'total', result_type: str = 'total') → DataFrame

Get math density for a dataframe, following the implementation described in https://edworkingpapers.com/sites/default/files/ai23-855.pdf

Parameters:
  • df (pd.DataFrame) – dataframe to analyze

  • text_column (str) – name of column containing text to analyze

  • output_column (str) – name of column to store result

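  • count_type (str) – how to count math terms. Choose from “total”, “unique”. Defaults to “total”.

  • result_type (str) – how to report the result. Choose from “total”, “proportion”. Defaults to “total”.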

Returns:

dataframe with math density analysis

Return type:

df (pd.DataFrame)
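
The `count_type` and `result_type` options are only named above, so the sketch below shows one plausible reading of them: count math-term occurrences in an utterance, either every occurrence or each distinct term once, and report either the count or the count divided by the number of words. The vocabulary `MATH_TERMS`, the tokenizer, and the helper `math_density` are hypothetical; the library's own term list and normalization may differ.

```python
import re

# Hypothetical vocabulary; the library's actual math-term list is not shown on this page.
MATH_TERMS = {"fraction", "denominator", "numerator", "equation", "sum"}


def math_density(text: str, count_type: str = "total", result_type: str = "total") -> float:
    """Count math terms in one utterance under the assumed option semantics."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [word for word in words if word in MATH_TERMS]

    if count_type == "total":
        count = len(hits)       # every occurrence counts
    elif count_type == "unique":
        count = len(set(hits))  # each distinct term counts once
    else:
        raise ValueError(f"unknown count_type: {count_type!r}")

    if result_type == "total":
        return float(count)
    if result_type == "proportion":
        # Assumption: the proportion is taken over all words in the utterance.
        return count / len(words) if words else 0.0
    raise ValueError(f"unknown result_type: {result_type!r}")


utterance = "The fraction has a denominator and the other fraction has a numerator"
total_hits = math_density(utterance)                         # "fraction" counted twice
unique_hits = math_density(utterance, count_type="unique")   # "fraction" counted once
share = math_density(utterance, result_type="proportion")    # hits divided by word count
```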

get_student_reasoning(df: DataFrame, text_column: str, output_column: str, speaker_column: str | None = None, speaker_value: str | List[str] | None = None) → DataFrame

Get student reasoning predictions for a dataframe.

Parameters:
  • df (pd.DataFrame) – dataframe to analyze

  • text_column (str) – name of column containing text to analyze

  • output_column (str) – name of column to store result

  • speaker_column (str) – name of column containing speaker names. Only required if speaker_value is not None.

  • speaker_value (str or list) – if speaker_column is not None, only get predictions for this speaker.

Returns:

dataframe with student reasoning predictions

Return type:

df (pd.DataFrame)

get_student_talk_moves(df: DataFrame, text_column: str, output_column: str, speaker_column: str | None = None, speaker_value: str | List[str] | None = None) → DataFrame

Get student talk move predictions for a dataframe.

Parameters:
  • df (pd.DataFrame) – dataframe to analyze

  • text_column (str) – name of column containing text to analyze

  • output_column (str) – name of column to store result

  • speaker_column (str) – name of column containing speaker names. Only required if speaker_value is not None.

  • speaker_value (str or list) – if speaker_column is not None, only get predictions for this speaker.

Returns:

dataframe with student talk move predictions

Return type:

df (pd.DataFrame)

get_talktime(df: DataFrame, text_column: str | None = None, analysis_unit: str = 'words', representation: str = 'frequency', time_start_column: str | None = None, time_end_column: str | None = None, output_column: str = 'talktime_analysis') → DataFrame

Analyze the talk time of speakers in a dataframe. Returns the original dataframe and a new dataframe with the talk time analysis.

Parameters:
  • df (pd.DataFrame) – dataframe to analyze

  • text_column (str) – name of column containing text to analyze. Only required if analysis_unit is words or sentences.

  • analysis_unit (str) – unit to analyze. Choose from “words”, “sentences”, “timestamps”.

  • representation (str) – representation of talk time. Choose from “frequency”, “proportion”.

  • time_start_column (str) – name of column containing start time. Only required if analysis_unit is timestamps.

  • time_end_column (str) – name of column containing end time. Only required if analysis_unit is timestamps.

  • output_column (str) – name of column to store result.

Returns:

dataframe with talk time analysis

Return type:

df (pd.DataFrame)
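
To make the `analysis_unit` and `representation` options concrete, here is a small pandas sketch of word-based talk time, stored per row either as a word count or as a share of all words in the transcript. It illustrates the idea and is not the library's implementation: the whitespace tokenization, the per-row layout, and the helper `talktime_by_words` are assumptions.

```python
import pandas as pd


def talktime_by_words(
    df: pd.DataFrame,
    text_column: str,
    representation: str = "frequency",
    output_column: str = "talktime_analysis",
) -> pd.DataFrame:
    """Return a copy of df with a per-row word count or word share."""
    # Whitespace tokenization; missing text counts as zero words.
    counts = df[text_column].fillna("").str.split().str.len()

    out = df.copy()
    if representation == "frequency":
        out[output_column] = counts
    elif representation == "proportion":
        total = counts.sum()
        out[output_column] = counts / total if total else 0.0
    else:
        raise ValueError(f"unknown representation: {representation!r}")
    return out


# Hypothetical three-turn transcript with 7, 2, and 1 words.
transcript = pd.DataFrame(
    {"text": ["What is one half plus one quarter", "Three quarters", "Right"]}
)
frequency = talktime_by_words(transcript, "text")
proportion = talktime_by_words(transcript, "text", representation="proportion")
```

Returning a copy leaves the input dataframe unmodified, which keeps the sketch easy to test; whether the library mutates its input is not stated on this page.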

get_teacher_talk_moves(df: DataFrame, text_column: str, output_column: str, speaker_column: str | None = None, speaker_value: str | List[str] | None = None) → DataFrame

Get teacher talk move predictions for a dataframe.

Parameters:
  • df (pd.DataFrame) – dataframe to analyze

  • text_column (str) – name of column containing text to analyze

  • output_column (str) – name of column to store result

  • speaker_column (str) – name of column containing speaker names. Only required if speaker_value is not None.

  • speaker_value (str or list) – if speaker_column is not None, only get predictions for this speaker.

Returns:

dataframe with teacher talk move predictions

Return type:

df (pd.DataFrame)

get_uptake(df: DataFrame, text_column: str, output_column: str, speaker_column: str, speaker1: str | List[str], speaker2: str | List[str], result_type: str = 'binary') → DataFrame

Get uptake predictions for a dataframe, following the implementation in https://huggingface.co/ddemszky/uptake-model/blob/main/handler.py

Parameters:
  • df (pd.DataFrame) – dataframe to analyze

  • text_column (str) – name of column containing text to analyze

  • output_column (str) – name of column to store result

  • speaker_column (str) – name of column containing speaker names.

  • speaker1 (str or list) – name(s) of the student speaker(s)

  • speaker2 (str or list) – name(s) of the teacher speaker(s)

  • result_type (str) – format of the prediction. Choose from “raw”, “binary”. Defaults to “binary”.

Returns:

dataframe with uptake predictions

Return type:

df (pd.DataFrame)
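
Because `speaker1` is the student and `speaker2` is the teacher, uptake is presumably scored on a teacher utterance together with the student utterance before it. The sketch below shows only that pairing step in pandas, under the assumption that a pair is a `speaker2` row immediately preceded by a `speaker1` row. The helper `student_teacher_pairs` and the sample data are hypothetical, and the model scoring itself is not reproduced.

```python
from typing import List, Union

import pandas as pd


def student_teacher_pairs(
    df: pd.DataFrame,
    text_column: str,
    speaker_column: str,
    speaker1: Union[str, List[str]],
    speaker2: Union[str, List[str]],
) -> pd.DataFrame:
    """Pair each speaker2 (teacher) row with an immediately preceding speaker1 (student) row."""
    students = [speaker1] if isinstance(speaker1, str) else list(speaker1)
    teachers = [speaker2] if isinstance(speaker2, str) else list(speaker2)

    prev_speaker = df[speaker_column].shift(1)
    prev_text = df[text_column].shift(1)
    # Keep rows where a teacher speaks right after a student.
    mask = prev_speaker.isin(students) & df[speaker_column].isin(teachers)

    return pd.DataFrame(
        {"student_text": prev_text[mask], "teacher_text": df.loc[mask, text_column]}
    )


# Hypothetical transcript with two student-then-teacher exchanges.
transcript = pd.DataFrame(
    {
        "speaker": ["teacher", "student", "teacher", "student", "student", "teacher"],
        "text": [
            "What is 3 times 4?",
            "It is 12.",
            "Good, how did you get 12?",
            "I added.",
            "I added 4 three times.",
            "So repeated addition gives 12.",
        ],
    }
)
pairs = student_teacher_pairs(transcript, "text", "speaker", "student", "teacher")
```

The result keeps the index of the teacher row, so scores computed on these pairs could be written back to the matching rows of the original dataframe.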