Large Language Models (LLMs)

Introduction

Exploring LLMs

Large Language Models (LLMs) are advanced artificial intelligence systems designed to understand, generate, and manipulate human-like text. These models are trained on vast amounts of textual data, allowing them to capture complex patterns in language and perform a wide range of natural language processing tasks.

Key characteristics of LLMs include:

Scale: LLMs typically have billions of parameters, allowing them to capture intricate linguistic patterns.
Versatility: They can perform various tasks such as text generation, translation, summarization, and question-answering without task-specific training.
Context understanding: LLMs can comprehend and maintain context over long sequences of text.
Transfer learning: Knowledge acquired from pre-training can be applied to new, unseen tasks.

LLMs have revolutionized natural language processing and have found applications in diverse fields, from content creation to scientific research.

Classification of LLMs

LLMs can be classified based on various factors, including model size, capabilities, and intended use. Here's an updated classification reflecting the most recent models as of August 2024:

Frontier Models
- Examples: GPT-4o, Claude 3.5 Sonnet, Grok-2, Gemini 1.5 Pro and Flash, Mistral Large 2, DeepSeek V2
- Characteristics:
  - Massive scale (hundreds of billions of parameters or undisclosed)
  - State-of-the-art performance across a wide range of tasks
  - Often multimodal (text, image, video, voice)
  - Typically proprietary with API access
Large Open-Source Models
- Examples: Llama 3.1 (405B), Mixtral 8x22B (141B), DBRX (132B), Nemotron-4 (340B)
- Characteristics:
  - Tens to hundreds of billions of parameters
  - Open-source, allowing for research and customization
  - Strong performance on many tasks
  - Suitable for fine-tuning and deployment
Medium-sized Models
- Examples: Jamba (52B), Command R (35B)
- Characteristics:
  - Tens of billions of parameters
  - Balance between performance and computational requirements
  - Often available in both open-source and API formats
Efficient Small Models
- Examples: Mistral 7B (7.3B), Gemma (2B, 7B), Phi-3 (3.8B)
- Characteristics:
  - Less than 10 billion parameters
  - Designed for efficiency and faster inference
  - Can rival or outperform larger models on specific tasks
  - Suitable for edge devices or resource-constrained environments
  - Can run locally on many GPU-equipped laptops and PCs with tools such as Ollama or LM Studio
Specialized Models
- Examples: Sora (for video generation), domain-specific variants of larger models
- Characteristics:
  - Focused on specific tasks or domains (e.g., code generation, video creation)
  - May have custom architectures or training approaches
  - Often integrated into larger AI ecosystems or products

This classification reflects the rapid evolution of LLMs, with a trend towards:

Increasing model sizes for frontier models
Improved efficiency in smaller models
Greater availability of open-source options
Specialization for specific tasks or industries

It's important to note that the field of LLMs is rapidly evolving, with new models and breakthroughs emerging frequently.
The capabilities and relative performance of these models can change quickly, and users should refer to the most recent benchmarks and evaluations when choosing a model for a specific application.

Understanding Parameters

Parameters in LLMs are the adjustable values that the model learns during training. They are essentially the "knowledge" of the model, stored as numerical values. Some examples of parameters include:

Weights in neural network layers
Bias terms
Embedding vectors for words or tokens
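
As a rough illustration of how these pieces add up, the sketch below counts the parameters of a toy model with one embedding table and one fully connected layer. The sizes (a vocabulary of 1,000 tokens, 64-dimensional embeddings, 128 output units) are made-up values chosen for readability, not taken from any real model.

```python
def embedding_params(vocab_size: int, embed_dim: int) -> int:
    """One embedding vector of length embed_dim per token in the vocabulary."""
    return vocab_size * embed_dim


def linear_params(in_features: int, out_features: int) -> int:
    """One weight per input/output pair, plus one bias term per output."""
    weights = in_features * out_features
    biases = out_features
    return weights + biases


# Hypothetical toy sizes, chosen only to keep the arithmetic readable.
VOCAB_SIZE = 1_000
EMBED_DIM = 64
HIDDEN_UNITS = 128

total = embedding_params(VOCAB_SIZE, EMBED_DIM) + linear_params(EMBED_DIM, HIDDEN_UNITS)
print(f"Total parameters: {total:,}")  # Total parameters: 72,320
```

Even this tiny model has tens of thousands of parameters; real LLMs stack many much larger layers, which is how the totals reach billions.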

The number of parameters often correlates with the model's capacity to learn and perform complex tasks. However, more parameters also mean increased computational requirements for training and inference.

Model	Parameter Count
Llama 3.1 8B	8,000,000,000
Llama 3.1 70B	70,000,000,000
Llama 3.1 405B	405,000,000,000

Note: The parameter counts are approximations based on the model names. The actual counts may vary slightly.
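
To make the link between parameter count and hardware requirements concrete, here is a back-of-the-envelope sketch that estimates the memory needed just to hold the weights. It assumes each parameter is stored in a fixed number of bytes (2 for 16-bit, 1 for 8-bit, 0.5 for 4-bit quantization) and ignores activations, caches, and other runtime overhead, so real requirements will be higher.

```python
# Assumed storage cost per parameter for common precisions (in bytes).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(param_count: int, precision: str = "fp16") -> float:
    """Approximate memory (in GB, 10**9 bytes) needed to store the weights alone."""
    return param_count * BYTES_PER_PARAM[precision] / 1e9


# Approximate parameter counts, derived from the model names.
MODELS = {
    "Llama 3.1 8B": 8_000_000_000,
    "Llama 3.1 70B": 70_000_000_000,
    "Llama 3.1 405B": 405_000_000_000,
}

for name, params in MODELS.items():
    sizes = ", ".join(
        f"{precision}: {weight_memory_gb(params, precision):.1f} GB"
        for precision in BYTES_PER_PARAM
    )
    print(f"{name} -> {sizes}")
```

Under these assumptions, an 8B model needs roughly 16 GB at 16-bit precision but only about 4 GB at 4-bit, which is why quantized small models are practical on consumer hardware.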

Online Chatbots

Online chatbots powered by LLMs have become increasingly popular, offering conversational AI capabilities to users worldwide. Here are some prominent examples:

OpenAI's ChatGPT
- Based on GPT-3.5 and GPT-4 models
- Available through https://chat.openai.com
- Offers free tier with limitations and paid subscriptions (ChatGPT Plus)
- Known for versatile conversational abilities and task completion
- TIP: Microsoft Copilot is built on OpenAI's GPT models, the same family that powers ChatGPT
Anthropic's Claude
- Available through https://claude.ai and API access
- Emphasizes ethical AI and safety considerations
- Offers both free (with limitations) and paid tiers
Google's Gemini
- Powered by Gemini Flash/Pro and other Google AI technologies
- Free Flash version available through https://gemini.google.com/ to use with a Google account
- Pro version only with a subscription
- Integrates with other Google services (e.g., Gmail, Google Drive, Google Sheets, ...) for enhanced functionality
DeepSeek Chat
- Developed by DeepSeek, a Chinese AI company
- Offers both English and Chinese language support
- Free to use with registration at https://www.deepseek.com

These chatbots typically offer:

Limited free use with restrictions on features or usage quotas
Paid subscriptions for enhanced capabilities, longer conversations, and priority access

Special Chatbots

Several specialized chatbots have emerged, focusing on specific use cases or offering unique features:

Copilot with Microsoft Edge

Log in with your Education account and click the Copilot icon in the top right corner
With an educational account, Copilot uses:
- GPT-4 for chat
- DALL-E 3 for image generation

Prompt: Explain the content on this page

Perplexity AI

Combines LLM capabilities with real-time web search
Free tier available through https://www.perplexity.ai
Focuses on providing up-to-date information and citations
Shows references to relevant web pages and YouTube videos
Results depend on the Focus setting (All, Academic, Math, Writing, Video and Social)
Free tier has 5 Pro searches per day and 3 PDF uploads per day

Prompt:

For my JavaScript course: create a page about all types of loops. 
- Start with a short introduction
- Go over each type of loop and discuss it
- For each loop, provide an exercise for my students to solve
- End with a brief overview showing the differences between the types of loops.

Morphic

Morphic is totally free for the base model
Available through https://www.morphic.sh
It's an alternative to Perplexity

Prompt:

JavaScript: what is the difference between var, let and const. 
Explain, give some examples and write a short conclusion on when to use var, let or const.

Storm

Storm creates high-quality Wikipedia-like articles based on a given prompt
Created by Stanford University: https://github.com/stanford-oval/storm
You can install Storm on your own server or use it online for free: https://storm.genie.stanford.edu/
This is an agent-based tool that uses Claude 3.5 Sonnet to generate the response
(See BrainSTORMing Process for more details about the agents)

Prompt: Create an article about the history of NATO
Elaboration: For educational purposes

Websim

Available through https://websim.ai
At first glance, a nice gadget to make a game with only one prompt
But it's not very useful for educational purposes...
Websim also uses the Claude 3.5 Sonnet model and has even better artifacts than Claude!
Generated code can be downloaded (HTML, CSS and JavaScript)
Some examples:
- 3D Roller Coaster: https://websim.ai/@CatKitty19232/3d-roller-coaster-simulator
- SoundFont Keyboard: https://websim.ai/app/soundfont-keyboard
- 3D Torus Viewer: https://websim.ai/c/r2ZbL7pIsv5HEs5H4
- Interactive TV Room With Live Channels: https://websim.ai/c/98NmnjrCmWGnL4aX1
Try it yourself and see what happens...

Experiment 1

Starting with the prompt: hallow-breakout.game
Re-defining the game with 5 extra prompts
Result: https://websim.ai/@patrick/hallow-breakout-game

Experiment 2

Click on the Home icon to start a new project

Search for tailwind glow effect
Drag the image you like to use on the URL area of Websim
Press Enter to start the creation

These specialized chatbots often cater to niche markets or specific professional needs, complementing general-purpose LLM chatbots.

LLM Leaderboard

LLM leaderboards are platforms that compare and rank different language models based on their performance across various tasks. They provide valuable insights into the current state of LLM technology.

LLM leaderboards are:

Benchmarking platforms for comparing model performance
Tools for tracking progress in LLM development
Resources for researchers and practitioners to evaluate models
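
At its simplest, a leaderboard is a table of per-benchmark scores reduced to a single ranking. The sketch below ranks models by their unweighted average score. The model names and scores are placeholders invented for illustration, and real leaderboards may normalize or weight benchmarks differently.

```python
from statistics import mean

# Placeholder scores (percent correct) for hypothetical models; not real results.
SCORES = {
    "model-a": {"MMLU": 70.0, "GSM8K": 60.0, "ARC": 80.0},
    "model-b": {"MMLU": 65.0, "GSM8K": 75.0, "ARC": 85.0},
    "model-c": {"MMLU": 50.0, "GSM8K": 40.0, "ARC": 60.0},
}


def rank_models(scores: dict) -> list:
    """Return (model, average score) pairs, best average first."""
    averages = {model: mean(results.values()) for model, results in scores.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


for position, (model, average) in enumerate(rank_models(SCORES), start=1):
    print(f"{position}. {model}: {average:.1f}")
```

Note that a single average hides per-task differences: in this made-up data, model-a beats model-b on MMLU yet ranks below it overall.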

Popular LLM leaderboards include

Common abbreviations and terms in LLM leaderboards:

ARC
AI2 Reasoning Challenge, a multiple-choice question-answering dataset of grade-school science questions designed to test reasoning abilities. It is split into an easy set and a challenge set; the challenge set contains questions that simple retrieval or word-association methods fail to answer.
BBH
Big Bench Hard, a subset of more challenging tasks from the Big Bench benchmark. This collection focuses on particularly difficult problems that push the limits of language models' capabilities, often requiring advanced reasoning, knowledge application, and problem-solving skills.
GPQA
Graduate-Level Google-Proof Q&A, a benchmark of very difficult multiple-choice questions in biology, physics, and chemistry. The questions are written by domain experts and designed so that the answers are hard to find even with unrestricted web search.
GSM8K
Grade School Math 8K, a dataset of about 8,500 grade school-level math word problems. This benchmark tests a model's ability to understand natural language descriptions of mathematical problems, perform the necessary calculations, and provide step-by-step solutions.
HumanEval
A code-generation benchmark from OpenAI consisting of 164 hand-written Python programming problems. Despite the name, it does not rely on human judges: a generated solution counts as correct when it passes the unit tests that accompany the problem.
IFEval
Instruction Following Evaluation, a benchmark for assessing how well models follow instructions. Its prompts contain automatically verifiable instructions (for example, a minimum word count or a required keyword), so compliance can be checked programmatically.
Math
Mathematics benchmark, testing mathematical problem-solving abilities across various difficulty levels and mathematical domains. This can include arithmetic, algebra, geometry, calculus, and more advanced topics, assessing both computational accuracy and problem-solving strategies.
MMLU
Massive Multitask Language Understanding, a broad benchmark covering various subjects including science, mathematics, humanities, and more. This comprehensive test evaluates a model's general knowledge and reasoning abilities across a wide range of academic and professional domains.

Benchmarking datasets

Most of these benchmarks are based on predefined datasets. These datasets are carefully curated to ensure they accurately reflect the types of tasks and challenges the models are expected to handle. The datasets are typically standardized to allow for consistent and fair comparisons across different models and over time.

However, it's important to note that while the core datasets may remain relatively stable to ensure continuity in evaluations, they can also evolve and be updated over time.
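
Because the dataset is fixed, scoring a model comes down to comparing its answers with the reference answers. The following sketch computes exact-match accuracy over a tiny, invented question set. `toy_model` is a hypothetical stand-in for a real model call, and real benchmarks use more careful answer extraction and normalization.

```python
from typing import Callable

# Tiny invented dataset: each item pairs a question with its reference answer.
DATASET = [
    {"question": "What is 2 + 3?", "answer": "5"},
    {"question": "What is 10 - 4?", "answer": "6"},
    {"question": "What is 3 * 3?", "answer": "9"},
    {"question": "What is 8 / 2?", "answer": "4"},
]


def normalize(text: str) -> str:
    """Ignore surrounding whitespace and letter case when comparing answers."""
    return text.strip().lower()


def exact_match_accuracy(model: Callable[[str], str], dataset: list[dict]) -> float:
    """Fraction of questions where the model's answer equals the reference."""
    correct = sum(
        normalize(model(item["question"])) == normalize(item["answer"])
        for item in dataset
    )
    return correct / len(dataset)


def toy_model(question: str) -> str:
    """Hypothetical stand-in for an LLM: canned answers, one of them wrong."""
    canned = {
        "What is 2 + 3?": "5",
        "What is 10 - 4?": " 6 ",
        "What is 3 * 3?": "6",  # deliberately incorrect
        "What is 8 / 2?": "4",
    }
    return canned.get(question, "")


print(f"Accuracy: {exact_match_accuracy(toy_model, DATASET):.0%}")  # Accuracy: 75%
```

Running every model against the same `DATASET` with the same scoring function is what makes the resulting numbers comparable.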

LLM Comparison

Comparing LLMs across different applications helps users and developers choose the most suitable model for their needs. Here's a brief comparison for chat and programming applications:

Chat Comparison

Model	Conversational Ability	Contextual Understanding	Multilingual Support	Multimodal Capabilities
GPT-4o	Excellent	Exceptional	Extensive	Advanced
Claude 3.5	Excellent	Very High	Strong	Advanced
Gemini 1.5	Excellent	Very High	Strong	Advanced
Llama 3.1	Very Good	High	Strong	Limited
Mistral Large	Very Good	High	Strong	Limited

Programming Comparison

Model	Code Generation	Debugging	Documentation	Multi-language Support	AI-assisted Development
GPT-4o	Exceptional	Excellent	Excellent	Extensive	Advanced
Claude 3.5	Excellent	Very Good	Excellent	Strong	Advanced
Gemini 1.5	Excellent	Very Good	Very Good	Strong	Advanced
Code Llama	Excellent	Excellent	Very Good	Extensive	Very Good
GitHub Copilot	Very Good	Very Good	Very Good	Strong	Excellent

These comparisons are based on the latest available information as of August 2024. However, it's crucial to note that:

The field of LLMs is rapidly evolving, and new models or updates can change these rankings quickly.
Performance can vary significantly based on specific tasks, contexts, and how the models are fine-tuned or implemented.
Many of these models receive regular updates, which can improve their capabilities over time.
Actual performance may vary in real-world applications, and users should conduct their own evaluations for specific use cases.
Some models, like GPT-4o and Gemini 1.5, are relatively new and their full capabilities are still being explored by the community.

OpenAI's Path to AGI

OpenAI has outlined a 5-step plan for achieving Artificial General Intelligence (AGI).
As of 2024, we are transitioning from Level 1 to Level 2.
Read more: https://www.tomsguide.com/ai/chatgpt/openai-has-5-steps-to-agi-and-were-only-a-third-of-the-way-there

Level 1. Chatbots (Current Level)

AI with natural conversational abilities
Examples: GPT-3.5, GPT-4o, Gemini 1.5 Pro, Claude 3.5 Sonnet
Capabilities: Complex conversations, some memory, limited reasoning

Level 2. Reasoners (Emerging)

Human-level problem-solving across broad topics
Frontier models approaching this level
Expected in upcoming models like GPT-4.5?, Strawberry?, Orion?, Claude 3.5 Opus?

Level 3. Agents (In Development)

Independent action-taking AI systems
Content creation without direct human input
Some companies building agentic systems

Level 4. Innovators (Future Goal)

AI aiding in invention and expanding human knowledge
Creation of novel ideas and solutions
Initial steps: OpenAI's partnership with Los Alamos National Laboratory

Level 5. Organizations (Final AGI Stage)

AI capable of running entire organizations independently
Requires broad intelligence and systemic understanding
Not yet achieved

While rapid progress is being made, the path to AGI involves overcoming significant technical and ethical challenges. The timeline remains uncertain, and ongoing discussions about responsible AI development are crucial.

Abacus: ChatLLM

Abacus ChatLLM is a versatile platform that offers a range of capabilities for working with large language models. Here are some of its key features and possibilities:

Access to Multiple LLMs:
ChatLLM provides access to a wide range of state-of-the-art language models, including GPT-4o, GPT-4o mini, Claude 3.5 Sonnet, Gemini 1.5 Pro, Llama 3.1 405B, and Abacus Smaug.
Search LLM:
Search LLM is a Perplexity clone that returns links to relevant resources, images, videos, and news articles.
Web Search Integration:
Users can perform web searches directly through the ChatLLM interface, enhancing the AI's ability to provide up-to-date information.
Image Interpretation:
Like GPT-4o, ChatLLM can interpret images, describe their content, and answer follow-up questions about them.
Image Generation:
The platform includes capabilities for AI-powered image generation, expanding its utility beyond text-based interactions. Images are created with DALL-E and Flux 1 Pro.
Code Execution:
ChatLLM can write, execute, and analyze code, making it a powerful tool for developers and data scientists.
It also has "artifacts", like Anthropic's Claude 3.5 Sonnet and Gemini 1.5 Pro.
Document Interaction:
Users can chat with PDFs and other document types, facilitating easy information extraction and analysis from various sources.
Custom Chatbot Creation:
The platform allows for the development of customized chatbots tailored to specific needs or knowledge bases.

Pricing $10/month

ChatLLM Teams is priced at only $10 per user per month for unlimited access to all those frontier models.
Individual pricing of the separate services, for comparison:
- ChatGPT Plus: $20 per month
- Anthropic: $20 per month
- Google Gemini Advanced: €21.99 per month
The first month is offered free of charge.
A minimum subscription of 2 months is required.
Please follow this link if you want to try it out for 2 months.
(Then I get a small reduction on my next payment 😉 )

Prompt:

Create a JavaScript function that gets the real-time information about the Belgian railway. 
Show a dashboard page with the actual information about the trains departing from Geel. 
You can use the internet to search for actual information about the API.

Belgium Railway Dashboard

The making of ...

Almost the entire page is built with ChatLLM and the Claude Sonnet 3.5 model. Here are the steps in the process:

Step 1: create an outline for this page
Step 2: write a basic system prompt on how the page should look
Step 3: write the subtitles for each part of the page
Step 4: let the LLM optimize your basic system prompt and the subtitles
Step 5: use the LLM to write the content for each part of the page
Step 6: ask the LLM to create some images that illustrate the content

URL: https://apps.abacus.ai/chatllm/57832bef0/?convoId=6486a2bce

Prompt:

You are a specialist in course development and write detailed texts in concise but easily understandable language for non-specialists.

Write a detailed chapter on Large Language models using the structure provided. Add additional topics if you think they are important to the topic.
Search the Internet for recent sources first.
For references to websites, always put the link on the title.
Display the result in a markdown code block!
Make sure the markdown contains no syntax errors.

Large Language Models (LLMs)

Introduction

Exploring LLMs

Classification of LLMs

Understanding Parameters

Online Chatbots

Special Chatbots

Copilot with Microsoft Edge

Perplexity AI

Morphic

Storm

Websim

Experiment 1

Experiment 2

LLM Leaderboard

Popular LLM leaderboards include

Common abbreviations and terms in LLM leaderboards:

LLM Comparison

Chat Comparison

Programming Comparison

OpenAI's Path to AGI

Level 1. Chatbots (Current Level)

Level 2. Reasoners (Emerging)

Level 3. Agents (In Development)

Level 4. Innovators (Future Goal)

Level 5. Organizations (Final AGI Stage)

Abacus: ChatLLM

The making of ...
