    further lowered LR multiplier, improved LR calculation on resume (#81) · 59844835
    Igor Shmukler authored
    * further lowered LR multiplier, improved LR calculation on resume
    
    * updated config again
    
    * pass RoPE relative position type to the decoder init
    
    * fixed regex
    
    * fixed OneCycle LR internal step counter
    
    * fixed weight decay on biases
    
    * tuning model configuration
    
    * fixed sample length desync bug
    
    * isolated stop-head optimization pressure
    
    * extended training analysis script
    
    * renamed analysis script
    
    * updated README
    
    * minor enhancements for training report
    
    * spread heavy batches more evenly across the epoch to prevent clustering
    
    * fixed a bug in dynamic batching code
    
    * stop token parameters tuning
    
    * lowered encoder FFN spike clip norm
    
    * enhanced training analysis script
    
    * updates to diagnostics script
    
    * training analysis script improvements
    
    * added post-step max weight-norm clamp for decoder.layers.0.ff.linear1.weight
    
    * bumped patch version number
    
    * fixed tests
    
    * removed short analysis script, full one is better