Code
Notebook
Quick statistical audit for Elevator Small Talk Corpus
Loads the mock training file (corpus/train.jsonl), checks row counts, describes numeric columns, and plots pair probability by split.
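# Download and unpack the archive so corpus/train.jsonl exists on disk
kaggle datasets download -d awkward-nlp/elevator-small-talk-corpus --unzip
python - <<'PY'
import pandas as pd
# train.jsonl is JSON Lines (one object per line), so use read_json, not read_csv
rows = pd.read_json("corpus/train.jsonl", lines=True)
print(len(rows))  # row count
print(rows.describe(include="all"))
PY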
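The remaining audit steps named in the description (row counts and pair probability by split) might look roughly like the sketch below. It runs on a three-row in-memory sample rather than the real file. The column names `split` and `pair_probability`, the sample rows, and the `audit` helper are all assumptions made for illustration, not the corpus schema.

```python
import io

import pandas as pd

# Hypothetical stand-in for corpus/train.jsonl. The column names "split" and
# "pair_probability" are assumed for illustration, not taken from the corpus.
SAMPLE_JSONL = """\
{"utterance": "Going up?", "split": "train", "pair_probability": 0.75}
{"utterance": "Nice weather.", "split": "train", "pair_probability": 0.25}
{"utterance": "Long day?", "split": "dev", "pair_probability": 0.25}
"""


def audit(rows: pd.DataFrame) -> dict:
    """Collect row counts, a numeric summary, and mean pair probability per split."""
    return {
        "n_rows": len(rows),
        "rows_per_split": rows["split"].value_counts().to_dict(),
        # describe() with no arguments summarizes numeric columns only
        "numeric_summary": rows.describe(),
        "mean_prob_by_split": rows.groupby("split")["pair_probability"].mean().to_dict(),
    }


rows = pd.read_json(io.StringIO(SAMPLE_JSONL), lines=True)
report = audit(rows)
print(report["n_rows"])
print(report["rows_per_split"])
print(report["mean_prob_by_split"])
```

If matplotlib is installed, `pd.Series(report["mean_prob_by_split"]).plot.bar()` would turn the per-split means into the bar chart the description mentions.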
Script
Download and inspect files
Follows the Kaggle-style command-line flow that reviewers expect before a real dataset release: list the files, download and unzip them, then check what landed on disk.
kaggle datasets files awkward-nlp/elevator-small-talk-corpus
kaggle datasets download -d awkward-nlp/elevator-small-talk-corpus -p data --unzip
ls -lh data/
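For reviewers who prefer to do the inspection step from Python, here is a minimal sketch of an `ls -lh`-style inventory using only the standard library. The helper names `human_size` and `list_files` are hypothetical, and the `data` directory is taken from the `ls` command above. The size formatting only approximates what `ls -lh` prints.

```python
from pathlib import Path


def human_size(n_bytes: int) -> str:
    """Format a byte count in powers of 1024, roughly like `ls -lh`."""
    size = float(n_bytes)
    for unit in ("B", "K", "M"):
        if size < 1024:
            return f"{size:.0f}{unit}" if unit == "B" else f"{size:.1f}{unit}"
        size /= 1024
    return f"{size:.1f}G"


def list_files(root: str) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for every file under root, sorted by path."""
    base = Path(root)
    return sorted(
        (p.relative_to(base).as_posix(), p.stat().st_size)
        for p in base.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    data_dir = Path("data")  # directory name taken from the `ls -lh data/` step
    if data_dir.is_dir():
        for rel_path, size in list_files(str(data_dir)):
            print(f"{human_size(size):>8}  {rel_path}")
    else:
        print("data/ not found; run the download step first")
```

Unlike `ls -lh data/`, this walks subdirectories too, so nested files are listed with their relative paths.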