Lucas Lingle

Upgrading to TPU v6e
All the way from TPU v3
Oct 1 • Lucas Lingle

September 2025

Predicting Game Club Attendance
A Small-Data Challenge
Sep 4 • Lucas Lingle

August 2025

Scaling Laws for Optimal LR and Batch Size - Part XIII
Scaling Laws for the Lion Optimizer
Aug 27 • Lucas Lingle
Scaling Laws for Optimal LR and Batch Size - Part XII
Scaling Laws for the Muon Optimizer
Aug 24 • Lucas Lingle
Scaling Laws for Optimal LR and Batch Size - Part XI
Is the loss surface w.r.t. LR truly convex?
Aug 6 • Lucas Lingle

July 2025

Scaling Laws for Optimal LR and Batch Size - Part X
Trying out the OLMo 1 architecture
Jul 21 • Lucas Lingle
QK-LayerNorm or no?
A medium-scale ablation study
Jul 19 • Lucas Lingle
Scaling Laws for Optimal LR and Batch Size - Part IX
How much scaling laws data do we really need?
Jul 18 • Lucas Lingle
Scaling Laws for Optimal LR and Batch Size - Part VIII
Trying out the Gemma 3 architecture
Jul 17 • Lucas Lingle
Stability Across Time
On the activation growth rate of different features
Jul 14 • Lucas Lingle
Scaling Laws for Optimal LR and Batch Size - Part VII
Redoing everything with FineWeb, QK-LayerNorm, Independent Weight Decay
Jul 14 • Lucas Lingle
Polar Express or Classic Muon?
A medium-scale ablation of the Muon coefficients
Jul 8 • Lucas Lingle
© 2025 Lucas Lingle