Inner Workings of BERT / GPT

Foundations

Transformers Architecture

  • Self-attention mechanism
  • Multi-head attention
  • Positional encoding
  • Layer normalization
  • Residual connections
  • Feed-forward networks
  • Encoder vs Decoder structure
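
The bullets above map onto a few dozen lines of PyTorch. Below is a minimal sketch of scaled dot-product attention and a multi-head wrapper; the dimensions, layer sizes, and class names are illustrative assumptions, not a reference implementation.

    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def scaled_dot_product_attention(q, k, v, mask=None):
        # q, k, v: (batch, heads, seq_len, head_dim)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = F.softmax(scores, dim=-1)     # attention distribution over positions
        return weights @ v                      # weighted sum of the value vectors

    class MultiHeadAttention(nn.Module):
        def __init__(self, d_model=512, num_heads=8):
            super().__init__()
            assert d_model % num_heads == 0
            self.num_heads = num_heads
            self.head_dim = d_model // num_heads
            self.qkv = nn.Linear(d_model, 3 * d_model)   # joint Q, K, V projection
            self.out = nn.Linear(d_model, d_model)

        def forward(self, x, mask=None):
            b, t, d = x.shape
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            # split each projection into (batch, heads, seq_len, head_dim)
            q, k, v = (z.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
                       for z in (q, k, v))
            out = scaled_dot_product_attention(q, k, v, mask)
            out = out.transpose(1, 2).contiguous().view(b, t, d)   # re-merge the heads
            return self.out(out)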

Tokenization & Input Embeddings

  • Byte Pair Encoding (BPE) or WordPiece
  • Token type (segment) embeddings, especially in BERT
  • Positional embeddings
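
A short sketch of subword tokenization with the HuggingFace Transformers library. The checkpoint names (bert-base-uncased, gpt2) are public hub models used only as examples, and the exact subword splits noted in the comments may vary.

    from transformers import AutoTokenizer

    # WordPiece (BERT) and byte-level BPE (GPT-2) both split rare words into subword units.
    bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    gpt2_tok = AutoTokenizer.from_pretrained("gpt2")
    print(bert_tok.tokenize("tokenization"))   # e.g. ['token', '##ization'] (## marks a continuation piece)
    print(gpt2_tok.tokenize("tokenization"))   # byte-level BPE pieces

    enc = bert_tok("How are you?", "I am fine.")
    print(enc["token_type_ids"])               # segment IDs: 0 for sentence A, 1 for sentence B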

BERT (Bidirectional Encoder Representations from Transformers)

Architecture Overview

  • Stack of transformer encoders only
  • Input format: [CLS] sentence A [SEP] sentence B [SEP], padded with [PAD]
  • Embedding types: token, segment, positional
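
A sketch of the encoder-only stack and the [CLS] / [SEP] / [PAD] input format, again using the public bert-base-uncased checkpoint as an illustrative example.

    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    # Sentence pair, padded to a fixed length: [CLS] A [SEP] B [SEP] [PAD] ...
    batch = tokenizer("The movie was long.", "I still enjoyed it.",
                      padding="max_length", max_length=16, return_tensors="pt")
    print(tokenizer.convert_ids_to_tokens(batch["input_ids"][0].tolist()))

    with torch.no_grad():
        out = model(**batch)
    print(model.config.num_hidden_layers)   # 12 stacked transformer encoder blocks
    print(out.last_hidden_state.shape)      # (batch, seq_len, hidden_size) = (1, 16, 768)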

Training Objective

  • Masked Language Modeling (MLM)
  • Next Sentence Prediction (NSP)
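
A quick way to see the MLM objective in action is the fill-mask pipeline (the checkpoint name is illustrative); NSP is a separate binary classifier on the [CLS] vector during pre-training.

    from transformers import pipeline

    # Masked Language Modeling: the model fills in the [MASK] token from bidirectional context.
    fill = pipeline("fill-mask", model="bert-base-uncased")
    for pred in fill("The capital of France is [MASK]."):
        print(pred["token_str"], round(pred["score"], 3))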

Pre-training vs Fine-tuning

  • Transfer learning in BERT
  • Fine-tuning for specific tasks (e.g., QA, NER, sentiment)
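
Transfer learning in practice: load the pre-trained encoder, attach a randomly initialised task head, and train on labelled data. A minimal sketch for binary sentiment classification; the sentences and labels are made up for illustration.

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    batch = tokenizer(["great movie", "terrible plot"], padding=True, return_tensors="pt")
    labels = torch.tensor([1, 0])              # 1 = positive, 0 = negative (toy labels)
    loss = model(**batch, labels=labels).loss  # cross-entropy on the classification head
    loss.backward()                            # one gradient step of a fine-tuning loop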

Variants

  • RoBERTa (drops NSP, trains longer on more data)
  • DistilBERT (smaller, faster)
  • ALBERT (parameter sharing)
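
All three variants expose the same loading interface in HuggingFace Transformers, which makes them easy to compare; the checkpoint names below are the public hub IDs.

    from transformers import AutoModel

    bert    = AutoModel.from_pretrained("bert-base-uncased")
    roberta = AutoModel.from_pretrained("roberta-base")
    distil  = AutoModel.from_pretrained("distilbert-base-uncased")
    albert  = AutoModel.from_pretrained("albert-base-v2")
    for name, m in [("bert", bert), ("roberta", roberta), ("distilbert", distil), ("albert", albert)]:
        print(name, sum(p.numel() for p in m.parameters()) // 1_000_000, "M parameters")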

GPT (Generative Pre-trained Transformer)

Architecture Overview

  • Stack of transformer decoders only
  • Autoregressive model (predict next token)
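
The decoder-only, autoregressive behaviour comes from a causal mask: position i may only attend to positions at or before i. A minimal sketch:

    import torch

    seq_len = 5
    causal_mask = torch.tril(torch.ones(seq_len, seq_len))   # lower-triangular matrix of ones
    print(causal_mask)
    # Feeding this mask into the attention sketch above (scores where mask == 0 set to -inf)
    # hides all future tokens, so the model can only predict the next token from the past.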

Training Objective

  • Causal Language Modeling (CLM)
  • Next-token prediction with unidirectional context
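
Causal Language Modeling in the HuggingFace API: passing the input IDs as labels makes the library shift them by one position internally and compute the next-token cross-entropy (gpt2 is the public checkpoint name, used here as an example).

    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tokenizer("GPT predicts the next token.", return_tensors="pt").input_ids
    out = model(input_ids=ids, labels=ids)   # labels are shifted internally for next-token prediction
    print(out.loss)                          # average cross-entropy over the predicted next tokens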

Fine-tuning and In-context Learning

  • GPT-2/3/4: zero-shot, few-shot, chain-of-thought prompting
  • Reinforcement Learning from Human Feedback (RLHF), used to align models such as ChatGPT and GPT-4
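
In-context learning is purely a prompting pattern: the task is described and demonstrated inside the prompt, and the model continues it. A sketch with the small public gpt2 checkpoint, which is far too small to translate reliably; the point is the prompt structure, and the wording is illustrative.

    from transformers import pipeline

    generate = pipeline("text-generation", model="gpt2")
    prompt = ("Translate English to French.\n"
              "sea otter -> loutre de mer\n"
              "cheese -> ")
    print(generate(prompt, max_new_tokens=5, do_sample=False)[0]["generated_text"])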

Attention Visualization & Interpretation

  • Attention heads and what they learn
  • Visualization tools (e.g., BertViz)
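
Tools like BertViz plot the per-layer, per-head attention matrices that the model already returns; extracting them is a one-flag change (the checkpoint name is illustrative).

    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

    inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
    with torch.no_grad():
        attentions = model(**inputs).attentions   # tuple with one tensor per layer
    print(len(attentions), attentions[0].shape)    # 12 layers, each (batch, heads, seq_len, seq_len)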

Memory and Computational Cost

  • Model size (parameters), GPU/TPU requirements
  • Memory-efficient training (e.g., FlashAttention)
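
A back-of-the-envelope sketch relating parameter count to memory: the weights alone take parameters times bytes-per-parameter, and training adds gradients and optimizer state on top. Full fp32 precision (4 bytes per parameter) is assumed below.

    from transformers import AutoModel

    model = AutoModel.from_pretrained("bert-base-uncased")
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{n_params / 1e6:.0f}M parameters")
    print(f"~{n_params * 4 / 1e9:.2f} GB for the weights alone in fp32 (no gradients or optimizer state)")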

Limitations and Biases

  • Overfitting, hallucination, adversarial examples
  • Mitigating bias in language models

Implementing Transformer Blocks from Scratch

  • Building attention, positional encoding, etc. in PyTorch or TensorFlow
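
A compact sketch of one encoder block plus sinusoidal positional encoding in PyTorch; the hyper-parameters (d_model=512, 8 heads, d_ff=2048) follow the original Transformer paper but are otherwise arbitrary choices.

    import math
    import torch
    import torch.nn as nn

    def sinusoidal_positional_encoding(seq_len, d_model):
        # PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
        pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(seq_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        return pe

    class TransformerEncoderBlock(nn.Module):
        def __init__(self, d_model=512, num_heads=8, d_ff=2048, dropout=0.1):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, num_heads, dropout=dropout, batch_first=True)
            self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)
            self.drop = nn.Dropout(dropout)

        def forward(self, x, attn_mask=None):
            a, _ = self.attn(x, x, x, attn_mask=attn_mask)   # self-attention: Q = K = V = x
            x = self.norm1(x + self.drop(a))                  # residual connection + layer norm
            x = self.norm2(x + self.drop(self.ffn(x)))        # residual connection + layer norm
            return x

    # Quick shape check: batch of 2 sequences, 10 tokens, model width 512.
    x = torch.randn(2, 10, 512) + sinusoidal_positional_encoding(10, 512)
    print(TransformerEncoderBlock()(x).shape)   # torch.Size([2, 10, 512])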

Using Pre-trained Models

  • HuggingFace Transformers: loading and fine-tuning BERT/GPT models
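
Loading and fine-tuning with the high-level Trainer API, sketched on the public SST-2 subset of GLUE; the dataset choice, subset size, and hyper-parameters are illustrative rather than prescriptive.

    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    dataset = load_dataset("glue", "sst2")
    def tokenize(batch):
        return tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=128)
    dataset = dataset.map(tokenize, batched=True)

    args = TrainingArguments(output_dir="bert-sst2", num_train_epochs=1,
                             per_device_train_batch_size=16)
    trainer = Trainer(model=model, args=args,
                      train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
                      eval_dataset=dataset["validation"])
    trainer.train()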

Benchmarking and Evaluation

  • Metrics for classification (BERT), generation (GPT)
  • Datasets (GLUE, SQuAD, LAMBADA, etc.)
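
A sketch of typical evaluation calls: classification metrics for BERT-style fine-tuning and perplexity for GPT-style generation. The `evaluate` library and the toy predictions below are illustrative.

    import math
    import evaluate

    # Classification-style evaluation (e.g. GLUE tasks with a fine-tuned BERT).
    accuracy = evaluate.load("accuracy")
    f1 = evaluate.load("f1")
    preds, refs = [1, 0, 1, 1], [1, 0, 0, 1]
    print(accuracy.compute(predictions=preds, references=refs))
    print(f1.compute(predictions=preds, references=refs))

    # Generation-style evaluation: perplexity is exp(average next-token cross-entropy loss).
    avg_loss = 3.2                 # would come from running a GPT model on held-out text
    print(math.exp(avg_loss))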
