Lucas Lingle
Upgrading to TPU v6e
All the way from TPU v3
Oct 1 • Lucas Lingle
September 2025
Predicting Game Club Attendance
A Small-Data Challenge
Sep 4 • Lucas Lingle
August 2025
Scaling Laws for Optimal LR and Batch Size - Part XIII
Scaling Laws for the Lion Optimizer
Aug 27 • Lucas Lingle
Scaling Laws for Optimal LR and Batch Size - Part XII
Scaling Laws for the Muon Optimizer
Aug 24 • Lucas Lingle
Scaling Laws for Optimal LR and Batch Size - Part XI
Is the loss surface w.r.t. LR truly convex?
Aug 6 • Lucas Lingle
July 2025
Scaling Laws for Optimal LR and Batch Size - Part X
Trying out the OLMo 1 architecture
Jul 21 • Lucas Lingle
QK-LayerNorm or no?
A medium-scale ablation study
Jul 19 • Lucas Lingle
Scaling Laws for Optimal LR and Batch Size - Part IX
How much scaling laws data do we really need?
Jul 18 • Lucas Lingle
Scaling Laws for Optimal LR and Batch Size - Part VIII
Trying out the Gemma 3 architecture
Jul 17 • Lucas Lingle
Stability Across Time
On the activation growth rate of different features
Jul 14 • Lucas Lingle
Scaling Laws for Optimal LR and Batch Size - Part VII
Redoing everything with FineWeb, QK-LayerNorm, Independent Weight Decay
Jul 14 • Lucas Lingle
Polar Express or Classic Muon?
A medium-scale ablation of the Muon coefficients
Jul 8 • Lucas Lingle