• Pennomi@lemmy.world
    4 months ago

    The open paper they published details the algorithms and techniques used to train it, and it’s been replicated by researchers already.

    • legolas@fedit.pl
      4 months ago

      So are these techniques really so novel and groundbreaking? Will we now see a burst of DeepSeek-like models everywhere? Because that's what absolutely should happen if the whole story is true. I would assume there are dozens or even hundreds of companies in the USA, especially in finance and dedicated AI research, that possess a similar number of chips (surely more) than the Chinese team claims to have trained their model on.

        • Aatube@kbin.melroy.org
          4 months ago

          Note that s1 is transparently a distilled model rather than one trained from scratch, meaning it inherits knowledge from an existing model (Gemini 2.0 in this case) and doesn't need to relearn that knowledge the way from-scratch training does. It's still an important result, but the training resources aren't directly comparable.
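
          The distillation idea described above can be sketched in miniature: a student is trained on a teacher's outputs instead of on original labeled data, so the teacher's expensive training never has to be repeated. This is a toy illustration with linear models, not the s1 paper's actual setup; all names and numbers here are made up for the example.

```python
# Toy sketch of distillation: the student never sees ground-truth labels,
# only the teacher's soft predictions. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# "Teacher": a fixed, already-trained linear scorer standing in for a big model.
teacher_w = np.array([2.0, -1.0, 0.5])

def teacher_predict(X):
    return 1 / (1 + np.exp(-X @ teacher_w))  # soft probabilities

# Unlabeled inputs; the teacher provides the training signal.
X = rng.normal(size=(500, 3))
soft_targets = teacher_predict(X)

# Student: trained from scratch, but only to match the teacher's outputs
# (gradient descent on cross-entropy against the soft targets).
w = np.zeros(3)
lr = 0.5
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ w))
    w -= lr * X.T @ (p - soft_targets) / len(X)

# The student converges toward the teacher's weights without ever touching
# the teacher's original training data or compute.
print(np.round(w, 2))
```

          The point of the sketch is the cost asymmetry: the student's training loop is tiny compared to whatever produced `teacher_w`, which mirrors why distilled-model training budgets aren't comparable to from-scratch ones.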