Fri Mar 20 2026

Overview

This is a quick introduction to the Hugging Face ecosystem to get you started. Hugging Face hosts a large collection of pretrained models, including LLMs, to choose from. It also provides the transformers library, which covers the full workflow for transformer models: creating models, training them, and running inference. To start building your first model, install the library using pip install transformers.

To use a transformer model for inference, we need two components: a tokenizer and a model. Transformer models do not take raw text as input; they operate only on numbers. A tokenizer maps the text to sequences of integer token IDs according to its vocabulary and rules. The model then takes the tokenized sequences and produces the output.

graph LR
    input["Input Text"] --> Tokenizer 
    Tokenizer--"Numeric sequence"--> Model
    Model --> output["Output"]

Each model comes with its own tokenizer. It is important to use the tokenizer that was trained with the model, because different models use different vocabularies and tokenization rules, so the same text maps to different token IDs.
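
To make the mismatch concrete, here is a minimal sketch using two toy whitespace tokenizers. It is an illustration only: real tokenizers use subword algorithms, and the helper names (build_vocab, encode, decode) and both vocabularies are made up for this example.

```python
# Toy illustration only: real tokenizers use subword algorithms,
# not whitespace splitting. Both vocabularies below are made up.

def build_vocab(words):
    """Assign each word an integer id in order of first appearance."""
    vocab = {"<unk>": 0}
    for word in words:
        vocab.setdefault(word, len(vocab))
    return vocab

def encode(text, vocab):
    """Map whitespace-separated words to ids, using <unk> for unknown words."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

def decode(ids, vocab):
    """Map ids back to words using the given vocabulary."""
    inverse = {idx: word for word, idx in vocab.items()}
    return " ".join(inverse[i] for i in ids)

# Same words, different id assignments, as with two unrelated models.
vocab_a = build_vocab(["the", "cat", "sat", "on", "mat"])
vocab_b = build_vocab(["mat", "on", "sat", "cat", "the"])

text = "the cat sat on the mat"
ids_a = encode(text, vocab_a)
ids_b = encode(text, vocab_b)

print(ids_a)                   # [1, 2, 3, 4, 1, 5]
print(ids_b)                   # [5, 4, 3, 2, 5, 1]
print(decode(ids_a, vocab_b))  # mat on sat cat mat the
```

Decoding the IDs from one vocabulary with the other yields scrambled text, which is roughly what a model sees when it is fed IDs from the wrong tokenizer.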

Models differ in their architecture. The architecture defines the components the model is built from and the operations those components perform to produce the output. The transformer architectures available in Hugging Face fall into the following categories.

Autoregressive models: These are decoder-only transformer models. They are trained to predict the next token from the previous tokens only. A causal attention mask hides the later tokens so that the model sees only the previous ones. The typical application for these models is text generation.
Autoencoding models: These models are trained without a causal attention mask, so they see the full input text. They are trained to reconstruct the original text from an input in which a portion has been corrupted. They are suitable for tasks that need an understanding of the whole input, such as classification.
Sequence-to-sequence models: Seq-to-seq models use the complete transformer architecture, i.e., both the encoder and the decoder components. They are trained for tasks such as translation, summarization, and question answering.
Multimodal models: These models are trained on other modalities, such as images, audio, and video, in addition to text data.
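
The difference between the causal mask of an autoregressive model and the full visibility of an autoencoding model can be sketched in plain Python. This is a conceptual illustration, not how any library builds its masks; the function names causal_mask and full_mask are made up here.

```python
def causal_mask(seq_len):
    """Entry [i][j] is 1 if position i may attend to position j (only j <= i)."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]

def full_mask(seq_len):
    """Every position may attend to every other position."""
    return [[1] * seq_len for _ in range(seq_len)]

tokens = ["The", "cat", "sat", "down"]

print("Autoregressive (causal) visibility:")
for token, row in zip(tokens, causal_mask(len(tokens))):
    print(f"{token:>5}: {row}")

print("Autoencoding (full) visibility:")
for token, row in zip(tokens, full_mask(len(tokens))):
    print(f"{token:>5}: {row}")
```

Each row shows which positions a token can see: the causal mask is lower triangular, while the full mask is all ones.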

Following is a short list of models in each category.

Autoregressive
- GPT - An open-source model trained on the BookCorpus dataset.
- GPT-2 - Trained on the WebText dataset.
- GPT-3 - A closed model available only through the OpenAI API.
- InstructGPT - A closed model available only through the OpenAI API.
- Llama
- Gemma2
- SmolLM2 - A superior alternative to GPT-2, trained on 2 trillion tokens of diverse data. It is suited to instruction following, knowledge, and reasoning tasks. It comes in three sizes: 135M, 360M, and 1.7B parameters.
- DeepSeek-V2
Autoencoding
- BERT
- RoBERTa
- DistilBERT
Sequence-to-sequence
- BART
- T5
Multimodal
- MMBT

Loading Pretrained Models

Each architecture in Hugging Face has a corresponding class that covers a family of models. OpenAI released its model weights up to GPT-2. The first OpenAI model is available through the class OpenAIGPTModel. Its checkpoint is openai-gpt, which can be loaded using the from_pretrained() class method. Its successor, the GPT2Model class, covers several variants:

gpt2 - The smallest version in the GPT-2 series. It is a 124M parameter model.
gpt2-medium - A 355M parameter model.
gpt2-large - A 774M parameter version of GPT-2.
gpt2-xl - The largest variant in the GPT-2 series. It is a 1.5B parameter model.
roberta-base-openai-detector - A related checkpoint rather than a GPT-2 variant: a RoBERTa base model fine-tuned on the output of gpt2-xl to detect GPT-2 generated text.
roberta-large-openai-detector - A larger version of roberta-base-openai-detector.
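
The parameter counts of the GPT-2 variants can be reproduced with a little arithmetic. The sketch below counts the weights of a GPT-2 style base model without a task head. The helper gpt2_param_count is made up for this example, and the layer sizes used (768 wide with 12 layers for gpt2, 1024 wide with 24 layers for gpt2-medium) are taken as assumptions from the published configurations, as is the feed-forward inner size of 4 * n_embd.

```python
def gpt2_param_count(n_embd, n_layer, vocab_size=50257, n_positions=1024):
    """Count parameters of a GPT-2 style base model (no task head).

    Assumes the feed-forward inner size is 4 * n_embd.
    """
    d = n_embd
    embeddings = vocab_size * d + n_positions * d  # token + position embeddings
    attention = (d * 3 * d + 3 * d) + (d * d + d)   # fused QKV projection + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)     # two feed-forward layers with biases
    layer_norms = 2 * (2 * d)                       # two LayerNorms per block (weight + bias)
    per_block = attention + mlp + layer_norms
    final_layer_norm = 2 * d
    return embeddings + n_layer * per_block + final_layer_norm

gpt2_small = gpt2_param_count(n_embd=768, n_layer=12)
gpt2_medium = gpt2_param_count(n_embd=1024, n_layer=24)

print(f"gpt2:        {gpt2_small:,}")   # 124,439,808
print(f"gpt2-medium: {gpt2_medium:,}")  # 354,823,168
```

The results round to the 124M and 355M figures quoted for these checkpoints.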

Example snippet to load the model

from transformers import GPT2Model

checkpoint = 'gpt2-medium'
model = GPT2Model.from_pretrained(checkpoint)

In addition to the base models above, Hugging Face provides classes that attach a task-specific head on top of the same base model, so that the model can be fine-tuned for various downstream tasks. The classes built on the GPT-2 base model are as follows.

GPT2LMHeadModel - GPT2Model with a language modeling head on top. The weights of the head are tied to the input embedding weights.
GPT2DoubleHeadsModel - GPT2Model with a language modeling head and a multiple-choice classification head on top.
GPT2ForQuestionAnswering - GPT2Model with a span classification head on top for extractive question answering tasks such as the Stanford Question Answering Dataset (SQuAD).
GPT2ForSequenceClassification - GPT2Model with a sequence classification head on top. It uses the last token for the classification.
GPT2ForTokenClassification - GPT2Model with a token classification head on top for tasks such as Named Entity Recognition (NER).

Since each of the above classes uses the same GPT-2 base model, the checkpoints listed above for the GPT-2 variants can be loaded into any of them. Head weights that are not stored in the checkpoint are newly initialized and need fine-tuning before use.

Using Auto classes

While most models have their own classes and can be loaded by calling from_pretrained() on the class, Hugging Face also provides an Auto class API that loads models without naming the model class explicitly. The auto class AutoModel instantiates the appropriate model class from the model's repo_id. A repo_id has the form username_or_org/repo_name. For example, the GPT-2 model can also be loaded as follows.

from transformers import AutoModel

repo_id = 'openai-community/gpt2'
model = AutoModel.from_pretrained(repo_id)
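
Conceptually, an Auto class is a lookup from the model type recorded in a checkpoint's configuration to the class that implements it. The following is a toy sketch of that dispatch idea under that assumption. It is not the transformers implementation, and every name in it (ToyGPT2Model, ToyBertModel, MODEL_REGISTRY, ToyAutoModel) is hypothetical.

```python
# Hypothetical stand-ins for real model classes; not the transformers implementation.
class ToyGPT2Model:
    def __init__(self, config):
        self.config = config

class ToyBertModel:
    def __init__(self, config):
        self.config = config

# Registry mapping a config's "model_type" to the class that implements it.
MODEL_REGISTRY = {"gpt2": ToyGPT2Model, "bert": ToyBertModel}

class ToyAutoModel:
    """Picks the concrete class from the config instead of asking the caller to name it."""

    @classmethod
    def from_config(cls, config):
        model_type = config["model_type"]
        try:
            model_class = MODEL_REGISTRY[model_type]
        except KeyError:
            raise ValueError(f"Unrecognized model_type: {model_type!r}") from None
        return model_class(config)

model = ToyAutoModel.from_config({"model_type": "gpt2", "n_layer": 12})
print(type(model).__name__)  # ToyGPT2Model
```

The caller only supplies the configuration; the registry decides which class gets instantiated.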

The other auto classes, each associated with a task, are listed below for reference.

Natural Language Processing
- AutoModelForCausalLM
- AutoModelForMaskedLM
- AutoModelForMaskGeneration
- AutoModelForSeq2SeqLM
- AutoModelForSequenceClassification
- AutoModelForMultipleChoice
- AutoModelForNextSentencePrediction
- AutoModelForTokenClassification
- AutoModelForQuestionAnswering
- AutoModelForTextEncoding
Computer Vision
- AutoModelForDepthEstimation
- AutoModelForImageClassification
- AutoModelForVideoClassification
- AutoModelForKeypointDetection
- AutoModelForKeypointMatching
- AutoModelForMaskedImageModeling
- AutoModelForObjectDetection
- AutoModelForImageSegmentation
- AutoModelForImageToImage
- AutoModelForSemanticSegmentation
- AutoModelForInstanceSegmentation
- AutoModelForUniversalSegmentation
- AutoModelForZeroShotImageClassification
- AutoModelForZeroShotObjectDetection
Audio
- AutoModelForAudioClassification
- AutoModelForAudioFrameClassification
- AutoModelForCTC
- AutoModelForSpeechSeq2Seq
- AutoModelForAudioXVector
- AutoModelForTextToSpectrogram
- AutoModelForTextToWaveform
- AutoModelForAudioTokenization
Multimodal
- AutoModelForMultimodalLM
- AutoModelForTableQuestionAnswering
- AutoModelForDocumentQuestionAnswering
- AutoModelForVisualQuestionAnswering
- AutoModelForImageTextToText
Time Series
- AutoModelForTimeSeriesPrediction

SmolLM2 can be loaded using the auto class as below.

from transformers import AutoModelForCausalLM

repo_id = 'HuggingFaceTB/SmolLM2-135M-Instruct'
model = AutoModelForCausalLM.from_pretrained(repo_id)

Creating Custom Models

Hugging Face provides a config class for each model class. The config class defines the configuration of the corresponding model and can be used to create a model with a custom configuration or architecture. GPT2Model comes with its own config class, GPT2Config. It can be used to define, for example, a GPT-2 model with a different context size, number of attention layers, activation function, etc.

from transformers import GPT2Config

config_args = {
    'n_embd': 840,  # dimension of the embedding layer and hidden states
    'n_layer': 10,  # number of hidden layers
    'n_head': 10,   # number of attention heads in each attention layer
}
config = GPT2Config(**config_args)
print(config)

Output

GPT2Config {
  "activation_function": "gelu_new",
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_embd": 840,
  "n_head": 10,
  "n_inner": null,
  "n_layer": 10,
  "n_positions": 1024,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "transformers_version": "4.57.1",
  "use_cache": true,
  "vocab_size": 50257
}

Once the configuration is defined, the model can be instantiated by passing the config to the model class as model = GPT2Model(config). A model created this way has randomly initialized weights.
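
When choosing custom values, note that multi-head attention splits the hidden state evenly across the heads, so n_embd should be a multiple of n_head. The small check below is a sketch of that constraint; the helper head_dim is made up for this example, and the assumption is that a GPT-2 style model rejects configurations that do not divide evenly.

```python
def head_dim(n_embd, n_head):
    """Return the per-head width, or raise if the embedding cannot be split evenly."""
    if n_embd % n_head != 0:
        raise ValueError(f"n_embd ({n_embd}) must be divisible by n_head ({n_head})")
    return n_embd // n_head

print(head_dim(840, 10))  # 84, the custom configuration above
print(head_dim(768, 12))  # 64, the default GPT-2 configuration
```

Running such a check before building the model catches an invalid combination early, before any weights are allocated.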

The Auto API also includes a config class. To fetch the configuration of a pretrained model, we can use AutoConfig.

from transformers import AutoConfig

repo_id = "google-bert/bert-base-uncased"
config = AutoConfig.from_pretrained(repo_id)

Note that AutoConfig cannot be instantiated directly with the __init__() method; it is created through from_pretrained(). Besides a repo_id, from_pretrained() also accepts a local path to a directory containing the configuration file, or the path to a specific JSON file.

With a config object in place, we can create a custom model by passing it to the from_config() method of a task-specific model class. The resulting model has the architecture described by the config but no pretrained weights. For instance, to create a sequence classifier from the above config object, we can do the following.

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_config(config)
