README.md · 59844835f91fac2a8245f74a4b691548f323694f · gitlab@mail.com / kokoro-ruslan

further lowered LR multiplier, improved LR calculation on resume (#81) · 59844835

Igor Shmukler authored Mar 11, 2026

* further lowered LR multiplier, improved LR calculation on resume

* updated config again

* pass RoPE relative position type to the decoder init

* fixed regex

* fixed OneCycle LR internal step counter

* fixed weight decay on biases

* tuning model configuration

* fixed sample length desync bug

* isolated stop-head optimization pressure

* extended training analysis script

* renamed analysis script

* updated README

* minor enhancements for training report

* spread heavy batches more evenly across the epoch to prevent clustering

* fixed a bug in dynamic batching code

* stop token parameters tuning

* lowered encoder FFN spike clip norm

* enhanced training analysis script

* updates to diagnostics script

* training analysis script improvements

* added post-step max weight-norm clamp for decoder.layers.0.ff.linear1.weight

* bumped patch version number

* fixed tests

* removed short analysis script; the full one is better
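
The post-step max weight-norm clamp listed above could look roughly like the sketch below. This is plain Python for illustration, not the repository's actual code: the function name `clamp_weight_norm`, the flat-list weight representation, and the `max_norm` value are all assumptions. The idea is that after each optimizer step, the weight is rescaled if its L2 norm exceeds a fixed ceiling.

```python
import math


def clamp_weight_norm(weights, max_norm):
    """Rescale `weights` in place so that its L2 norm is at most `max_norm`.

    `weights` is a flat list of floats standing in for a weight tensor
    (hypothetical representation). Returns the norm measured before clamping.
    """
    norm = math.sqrt(sum(w * w for w in weights))
    if norm > max_norm and norm > 0.0:
        # Uniform rescale keeps the weight direction, only shrinks magnitude.
        scale = max_norm / norm
        for i, w in enumerate(weights):
            weights[i] = w * scale
    return norm


# Hypothetical usage: call once after each optimizer step.
layer_weights = [3.0, 4.0]
pre_clamp_norm = clamp_weight_norm(layer_weights, max_norm=2.5)
```

Because the rescale is uniform, the clamp changes only the magnitude of the weight, not its direction, and it is a no-op whenever the norm is already within the limit.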
