Like every Big Tech company these days, Meta has its own flagship generative AI model, called Llama. Llama is somewhat unique among major models in that it's "open," meaning developers can download and use it however they please (with certain limitations). That's in contrast to models like Anthropic's Claude, Google's Gemini, xAI's Grok, and most of OpenAI's ChatGPT models, which can only be accessed via APIs.
In the interest of giving developers choice, however, Meta has also partnered with vendors, including AWS, Google Cloud, and Microsoft Azure, to make cloud-hosted versions of Llama available. In addition, the company publishes tools, libraries, and recipes in its Llama cookbook to help developers fine-tune, evaluate, and adapt the models to their domain. With newer generations like Llama 3 and Llama 4, these capabilities have expanded to include native multimodal support and broader cloud rollouts.
Here's everything you need to know about Meta's Llama, from its capabilities and editions to where you can use it. We'll keep this post updated as Meta releases upgrades and introduces new dev tools to support the model's use.
What is Llama?
Llama is a family of models, not just one. The latest version is Llama 4; it was released in April 2025 and includes three models:
- Scout: 17 billion active parameters, 109 billion total parameters, and a context window of 10 million tokens.
- Maverick: 17 billion active parameters, 400 billion total parameters, and a context window of 1 million tokens.
- Behemoth: Not yet released, but it will have 288 billion active parameters and 2 trillion total parameters.
(In data science, tokens are subdivided bits of raw data, like the syllables "fan," "tas," and "tic" in the word "fantastic.")
A model's context, or context window, refers to the input data (e.g., text) that the model considers before generating output (e.g., additional text). Long context can prevent models from "forgetting" the content of recent docs and data, and from veering off topic and extrapolating wrongly. However, longer context windows can also cause a model to "forget" certain safety guardrails and to mirror the conversation more readily, which has led some users toward delusional thinking.
For reference, the 10-million-token context window that Llama 4 Scout promises roughly equals the text of about 80 average novels. Llama 4 Maverick's 1-million-token context window equals about eight novels.
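The novel comparison is easy to sanity-check. Assuming a typical novel runs about 100,000 words and English text averages roughly 1.25 tokens per word (both rough rules of thumb, not figures from Meta), the arithmetic works out like this:

```python
# Back-of-envelope check of the context window comparisons above.
# Both constants are rough assumptions, not figures from Meta.
WORDS_PER_NOVEL = 100_000   # length of a typical novel
TOKENS_PER_WORD = 1.25      # common rule of thumb for English text

tokens_per_novel = WORDS_PER_NOVEL * TOKENS_PER_WORD  # 125,000 tokens

scout_window = 10_000_000    # Llama 4 Scout's context window
maverick_window = 1_000_000  # Llama 4 Maverick's context window

print(scout_window / tokens_per_novel)     # 80.0 novels
print(maverick_window / tokens_per_novel)  # 8.0 novels
```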
All of the Llama 4 models were trained on "large amounts of unlabeled text, image, and video data" to give them "broad visual understanding," as well as on 200 languages, according to Meta.
Llama 4 Scout and Maverick are Meta's first open-weight natively multimodal models. They're built using a "mixture-of-experts" (MoE) architecture, which reduces computational load and improves efficiency in training and inference. Scout, for example, has 16 experts, and Maverick has 128 experts.
Llama 4 Behemoth includes 16 experts, and Meta is referring to it as a teacher for the smaller models.
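To illustrate the idea behind MoE, here is a toy sketch of top-1 routing in plain Python. It is illustrative only, not Meta's implementation; real MoE layers route inside a neural network using learned gating weights. The point it demonstrates is that a router scores every expert but only the winner runs, so per-token compute stays flat no matter how many experts the model carries in total:

```python
NUM_EXPERTS = 16  # Scout-like expert count; Maverick uses 128

def make_expert(i):
    # Each "expert" stands in for a feed-forward sub-network;
    # here it just scales its input by (i + 1).
    return lambda x: x * (i + 1)

experts = [make_expert(i) for i in range(NUM_EXPERTS)]

def route(token_value, router_scores):
    # Top-1 routing: score every expert, but run only the winner.
    best = max(range(NUM_EXPERTS), key=lambda i: router_scores[i])
    return experts[best](token_value)

# One token, one set of router scores: only expert 3 executes.
scores = [0.0] * NUM_EXPERTS
scores[3] = 1.0
print(route(2.0, scores))  # 8.0 (2.0 scaled by expert 3's factor of 4)
```

Adding more experts grows the model's total parameter count (and its capacity) without growing the work done per token, which is why Maverick can hold 400 billion total parameters while activating only 17 billion.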
Llama 4 builds on the Llama 3 series, which included the 3.1 and 3.2 models widely used for instruction-tuned applications and cloud deployment.
What can Llama do?
Like other generative AI models, Llama can perform a range of different assistive tasks, like coding and answering basic math questions, as well as summarizing documents in at least 12 languages (Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese). Most text-based workloads (think analyzing large files like PDFs and spreadsheets) are within its purview, and all Llama 4 models support text, image, and video input.
Llama 4 Scout is designed for longer workflows and massive data analysis. Maverick is a generalist model that is better at balancing reasoning power and response speed, and is suitable for coding, chatbots, and technical assistants. And Behemoth is designed for advanced research, model distillation, and STEM tasks.
Llama models, including Llama 3.1, can be configured to leverage third-party applications, tools, and APIs to perform tasks. They are trained to use Brave Search for answering questions about recent events; the Wolfram Alpha API for math- and science-related queries; and a Python interpreter for validating code. However, these tools require proper configuration and are not automatically enabled out of the box.
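"Proper configuration" means the application, not the model, supplies and executes the tools: the model emits a structured request, and the app runs it and feeds the result back into the conversation. The sketch below shows the general shape of that dispatch loop; the JSON shape and the stub functions are hypothetical placeholders, not Meta's actual tool-call format or the real Brave Search and Wolfram Alpha APIs:

```python
import json

# Hypothetical sketch of the dispatch loop an app wraps around a
# Llama-style tool call. The JSON shape is illustrative only.

def brave_search(query):       # stand-in for a real Brave Search call
    return f"results for {query!r}"

def wolfram_alpha(expression): # stand-in for the Wolfram Alpha API
    return f"answer to {expression!r}"

TOOLS = {"brave_search": brave_search, "wolfram_alpha": wolfram_alpha}

def dispatch(model_output):
    # The model emits a structured request; the app executes it and
    # feeds the result back into the next turn of the conversation.
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

print(dispatch('{"name": "brave_search", "arguments": {"query": "Llama 4"}}'))
```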
Where can I use Llama?
If you're looking to simply chat with Llama, it's powering the Meta AI chatbot experience on Facebook Messenger, WhatsApp, Instagram, Oculus, and Meta.ai in 40 countries. Fine-tuned versions of Llama are used in Meta AI experiences in over 200 countries and territories.
The Llama 4 models Scout and Maverick are available on Llama.com and from Meta's partners, including the AI developer platform Hugging Face. Behemoth is still in training. Developers building with Llama can download, use, or fine-tune the model across most of the popular cloud platforms. Meta claims it has more than 25 partners hosting Llama, including Nvidia, Databricks, Groq, Dell, and Snowflake. And while "selling access" to Meta's openly available models isn't Meta's business model, the company makes some money through revenue-sharing agreements with model hosts.
Some of these partners have built additional tools and services on top of Llama, including tools that let the models reference proprietary data and enable them to run at lower latencies.
Importantly, the Llama license constrains how developers can deploy the model: App developers with more than 700 million monthly users must request a special license from Meta, which the company will grant at its discretion.
In May 2025, Meta launched a new program to incentivize startups to adopt its Llama models. Llama for Startups gives companies support from Meta's Llama team and access to potential funding.
Alongside Llama, Meta provides tools intended to make the model "safer" to use:
- Llama Guard, a moderation framework.
- Prompt Guard, a filter for malicious prompts.
- CyberSecEval, a cybersecurity risk-assessment suite.
- Llama Firewall, a security guardrail designed to enable building secure AI systems.
- Code Shield, which provides support for inference-time filtering of insecure code produced by LLMs.
Llama Guard tries to detect potentially problematic content either fed into or generated by a Llama model, including content relating to criminal activity, child exploitation, copyright violations, hate, self-harm, and sexual abuse.
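In practice, a moderation layer like this sits on both sides of the model: the user's input is classified before it reaches the model, and the model's reply is classified before it reaches the user. The sketch below shows that general pattern only; the classify() stub and category names are placeholders, not Llama Guard's real labels or API:

```python
# Illustrative sketch of input/output moderation around a chat model.
# The classifier and category names are placeholders, not Llama Guard's.
BLOCKED_CATEGORIES = {"criminal_activity", "self_harm"}  # app-configurable

def classify(text):
    # Stand-in for a call to a moderation model; returns a category
    # label, or None if the text looks safe.
    if "pick a lock" in text.lower():
        return "criminal_activity"
    return None

def guarded_chat(user_input, model_fn):
    # Screen the input, run the model, then screen the output.
    if classify(user_input) in BLOCKED_CATEGORIES:
        return "[input blocked]"
    reply = model_fn(user_input)
    if classify(reply) in BLOCKED_CATEGORIES:
        return "[output blocked]"
    return reply

print(guarded_chat("how do I pick a lock?", lambda s: s))  # [input blocked]
```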
That said, it's clearly not a silver bullet: Meta's own previous guidelines allowed the chatbot to engage in sensual and romantic chats with minors, and some reports show those turned into sexual conversations. Developers can customize the categories of blocked content and apply the blocks to all the languages Llama supports.
Like Llama Guard, Prompt Guard can block text intended for Llama, but only text meant to "attack" the model and get it to behave in undesirable ways. Meta claims that Prompt Guard can defend against explicitly malicious prompts (i.e., jailbreaks that attempt to get around Llama's built-in safety filters) in addition to prompts that contain "injected inputs." The Llama Firewall works to detect and prevent risks like prompt injection, insecure code, and risky tool interactions. And Code Shield helps mitigate insecure code suggestions and offers secure command execution for seven programming languages.
As for CyberSecEval, it's less a tool than a collection of benchmarks to measure model security. CyberSecEval can assess the risk a Llama model poses (at least according to Meta's criteria) to app developers and end users in areas like "automated social engineering" and "scaling offensive cyber operations."
Llama's limitations

Llama comes with certain risks and limitations, like all generative AI models. For example, while its most recent model has multimodal features, those are mainly limited to the English language for now.
Zooming out, Meta used a dataset of pirated e-books and articles to train its Llama models. A federal judge recently sided with Meta in a copyright lawsuit brought against the company by 13 book authors, ruling that the use of copyrighted works for training fell under "fair use." However, if Llama regurgitates a copyrighted snippet and someone uses it in a product, they could potentially be infringing on copyright and be liable.
Meta also controversially trains its AI on Instagram and Facebook posts, photos, and captions, and makes it difficult for users to opt out.
Programming is another area where it's wise to tread lightly when using Llama. That's because Llama might, perhaps more so than its generative AI counterparts, produce buggy or insecure code. On LiveCodeBench, a benchmark that tests AI models on competitive coding problems, Meta's Llama 4 Maverick model achieved a score of 40%. That's compared to 85% for OpenAI's GPT-5 high and 83% for xAI's Grok 4 Fast.
As always, it's best to have a human expert review any AI-generated code before incorporating it into a service or software.
Finally, as with other AI models, Llama models are still guilty of generating plausible-sounding but false or misleading information, whether that's in coding, legal guidance, or emotional conversations with AI personas.
This was originally published on September 8, 2024, and is updated regularly with new information.