Discussion about this post

Paul Gibbons

I've heard this story told in different ways, but this was the best-written explanation so far. Questions: How is the problem of local minima handled during gradient descent? How does test-time compute affect the RL/SL process? (Is it another layer or step, or is it a feature of inference once the model is queried?)

Rainbow Roxy

Love this perspective; what if these incredible pattern systems could truly reason beyond just statistical structures?

