• AliasAKA@lemmy.world
    7 months ago

    They don’t, but with quantization and distillation, plus clever use of fast SSD storage (they published a paper on this exact topic last year), you can get a really decent model running on-device. People are already doing this with models like OpenHermes and Mistral. Granted, those are 7B models, but I could easily see Apple doubling the RAM, optimizing models with the techniques from the paper I mentioned above, and getting 40B models running entirely locally. If the start of the network is good, a 40B model could handle the vast majority of users’ Siri queries without ever reaching out to the server.
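
    To put rough numbers on the quantization point, here is a back-of-the-envelope sketch of how much memory the weights alone would need at different precisions. The helper name `weight_gb` and the bit widths are my own illustrative choices, not anything from Apple's paper, and the estimate ignores the KV cache, activations, and the per-block scale factors that real quantization formats add.

```python
# Rough estimate of weight storage for a model at a given precision.
# Assumption: every parameter is stored at the same bit width, with no
# extra overhead for quantization scales or runtime buffers.

def weight_gb(num_params: int, bits_per_param: int) -> float:
    """Return approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    models = {"7B": 7_000_000_000, "40B": 40_000_000_000}
    for name, params in models.items():
        for bits in (16, 8, 4):
            size = weight_gb(params, bits)
            print(f"{name} at {bits}-bit: about {size:.1f} GB of weights")
```

    By this estimate a 7B model drops from 14 GB at 16-bit to 3.5 GB at 4-bit, and a 40B model at 4-bit needs about 20 GB for weights, which is why more RAM or streaming weights from fast storage would have to be part of the picture.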

    For what it’s worth, according to their WWDC note, they’re basically trying to do this.