Watch how embeddings evolve across training epochs — from random noise to
meaningful semantic structure. Switch epochs above to compare.
Raw Text → Tokenize → Subsample → Skip-Gram → Neg. Sampling → Embeddings
Training Epoch
Epoch 10
— words loaded
🗺 Explore Space
≈ Similarity
⬡ Neighbors
⊕ Analogy
📐 How It Works
⚠ Could not load the embedding files.
Make sure all three JSON files are in the same folder as this HTML file: embeddings_epoch2.json · embeddings_epoch5.json · embeddings_epoch10.json
Run export_embeddings.py in your notebook to generate them.
Meanwhile the demo runs with mock embeddings.
2D Embedding Space · PCA Projection
Vocabulary: —
Embed dim: —
PCA var: —
Dataset: WikiText-2
Method: Skip-Gram + NS
💡 Switch epochs above to watch clusters form. Hover a word for its label.
Early epochs look scattered — later epochs show tight semantic groupings.
Cosine Similarity
Cosine similarity measures the angle between two embedding vectors.
1.0 = same direction, 0 = orthogonal, −1 = opposite.
Watch how scores change across epochs as the model learns.
Current Epoch Result
Enter two words above and click Compute.
Across All Epochs
How does similarity for the same word pair evolve as training progresses?
Compute a similarity above to see the comparison.
Score Guide
0.7 – 1.0 → Strongly related
0.3 – 0.7 → Moderately related
0.0 – 0.3 → Weakly related
−1.0 – 0.0 → Dissimilar
sim(A, B) = (A · B) / (|A| × |B|)
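As a rough illustration, the formula and the Score Guide can be sketched in plain Python. The function names `cosine_similarity` and `score_label` are hypothetical, and assigning each boundary value (0.7, 0.3, 0.0) to the higher band is an assumption, since the guide's ranges overlap at their edges.

```python
import math

def cosine_similarity(a, b):
    """Return (a · b) / (|a| × |b|) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def score_label(score):
    """Map a score to a Score Guide band (boundaries go to the higher band: an assumption)."""
    if score >= 0.7:
        return "Strongly related"
    if score >= 0.3:
        return "Moderately related"
    if score >= 0.0:
        return "Weakly related"
    return "Dissimilar"

# Illustrative vectors only, not real embeddings.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(score_label(same_direction), score_label(orthogonal), score_label(opposite))
```

Because the vectors are divided by their lengths, only direction matters: scaling either vector leaves the score unchanged.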
Nearest Neighbors
Words closest in embedding space — i.e. words that appear in similar contexts.
Switch epochs above to watch neighbors stabilize as the model trains.
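A minimal sketch of a nearest-neighbor lookup, assuming the embeddings are held in a dictionary from word to vector and ranked by cosine similarity. The function name `nearest_neighbors` and the 2-dimensional toy vectors are hypothetical and chosen only for illustration; the demo's own implementation may differ.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest_neighbors(word, embeddings, k=5):
    """Return the k words most similar to `word`, best first, as (word, score) pairs."""
    query = embeddings[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != word  # a word is trivially its own nearest neighbor
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy vectors, not output from the trained model.
toy = {
    "king": [0.9, 0.1],
    "queen": [0.8, 0.2],
    "apple": [0.1, 0.9],
    "pear": [0.2, 0.8],
}
for neighbor, score in nearest_neighbors("king", toy, k=2):
    print(f"{neighbor}: {score:.3f}")
```

This brute-force scan compares the query against every word, which is fine for a small vocabulary; larger vocabularies usually call for an approximate index.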
Neighbors
Vector Arithmetic · Analogy Reasoning
Solve A − B + C = ? Word relationships are encoded as directions in embedding space.
With more training epochs, the model's analogy predictions typically become more accurate.
video − mtv + music ≈ ?
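One common way to answer such a query is to build the target vector A − B + C and rank the remaining words by cosine similarity to it. The sketch below assumes that approach; the function name `analogy`, the exclusion of the three input words, and the 2-dimensional toy vectors are all hypothetical and do not come from the trained model.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length (an assumption)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def analogy(a, b, c, embeddings, k=3):
    """Rank words by similarity to vec(a) - vec(b) + vec(c), skipping the inputs."""
    target = [
        x - y + z
        for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    inputs = {a, b, c}
    scored = [
        (word, cosine(target, vec))
        for word, vec in embeddings.items()
        if word not in inputs
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy space: first axis ~ "royalty", second axis ~ "gender".
toy = {
    "king": [1.0, 1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "queen": [1.0, -1.0],
    "prince": [1.0, 0.8],
    "apple": [-1.0, 0.2],
}
print(analogy("king", "man", "woman", toy))
```

Excluding the input words matters in practice, because the target vector often stays closest to one of the words used to build it.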
Top Predictions
📖 What are Embeddings?
Each word maps to a dense vector of real numbers. Words with similar meanings or usage patterns end up with similar vectors — proximity in vector space = semantic relatedness.
word → [0.23, −0.87, ..., 0.64] ∈ ℝ¹⁰⁰
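In code, this mapping is often just a lookup table: a word-to-row index plus a matrix with one 100-dimensional row per word. The sketch below shows the untrained starting point, where every row is random noise. The helper names `init_embeddings` and `lookup`, the three-word vocabulary, and the initialization range of ±0.5/dim are assumptions for illustration, not details taken from this demo.

```python
import random

EMBED_DIM = 100  # matches the 100-dimensional vectors described above

def init_embeddings(vocab, dim=EMBED_DIM, seed=0):
    """Build a word -> row index map and a randomly initialized embedding matrix."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    word_to_index = {word: i for i, word in enumerate(vocab)}
    bound = 0.5 / dim  # small uniform range: an assumed initialization choice
    matrix = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in vocab]
    return word_to_index, matrix

def lookup(word, word_to_index, matrix):
    """Return the embedding row for `word`."""
    return matrix[word_to_index[word]]

# Hypothetical three-word vocabulary.
word_to_index, matrix = init_embeddings(["music", "video", "mtv"])
vector = lookup("music", word_to_index, matrix)
print(len(vector), vector[:3])
```

Training then adjusts these rows so that words used in similar contexts drift toward each other.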
🪟 Skip-Gram Model
Given a center word, predict surrounding context words within a sliding window of size k. This forces the model to learn which words share similar neighborhoods.
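The training examples this produces can be sketched as (center, context) pairs. The function name `skipgram_pairs` is hypothetical, and the sketch assumes "window of size k" means up to k tokens on each side of the center word; some implementations instead sample a smaller window per position.

```python
def skipgram_pairs(tokens, k=2):
    """Return (center, context) pairs for every token and its neighbors within k positions."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - k)                # clip the window at the start of the text
        stop = min(len(tokens), i + k + 1)   # and at the end
        for j in range(start, stop):
            if j != i:                       # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

for center, context in skipgram_pairs(["the", "cat", "sat"], k=1):
    print(center, "->", context)
```

Words that occur among the same neighbors generate many similar pairs, which is what pushes their vectors together during training.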