AI Upskilling #001 - Introduction to Generative AI
+ Launch of my Online Shop (30% discount code included)
Hello TWFers!
I am super happy & excited to start this new series on The Weekend Freelancer called “AI Upskilling”. The goal of this series is to share my process of learning new skills in the GenAI world, both to upgrade my own skills & to help the many readers who are trying to do the same.
Before we start this new journey, there is another journey that I recently started!
Yes! It’s my new Shop on Buy Me a Coffee. Earlier you could only support my work there, but now you can also pick up the different products I will create. Currently, you will find the following 3 products that you can get for yourself or your close ones -
Apart from that, if you would like to work with me 1-to-1, you can book a Consultation call. There is also the option of a monthly Membership, where I will share my personal freelancing experiences using my Analytics skills.
Here is the surprise for you! As this is the first week of my shop, I am giving away a 30% discount on all of my products! To claim it, just use the discount code (VALID FOR 1 WEEK) which you will find at the end of this post!
I hope you guys will find value in the products and give me your crucial feedback!
That’s it for the little promotion. Now let’s get back to GenAI…
What will be covered in this series?
Well, literally everything I learn on this journey. This series is going to be a sort of knowledge dump on GenAI topics. I already have a decent idea of the topics I want to learn & their practical aspects. So here is a generic list of things you can expect from this series -
Core AI and Machine Learning Foundations
Generative AI Techniques
Practical Applications and Customization
Tools, Frameworks and Implementation
Ethical, Societal and Emerging Topics
By the end of this series, my goal is to have a portfolio of GenAI projects that validates our knowledge & skills in GenAI - for me & for everyone following along.
With that said, I hope you are ready to begin this journey. Let’s get going then -
The AI Universe
To understand GenAI we will need to zoom out & understand the AI universe first. Here is a quick visual for reference -
Let’s talk about each element of this Universe then -
Artificial Intelligence (AI)
AI is the broadest concept, referring to the simulation of human intelligence in machines. It enables computers to perform tasks that would normally require human intelligence, such as problem-solving, understanding language, recognizing patterns, and making decisions.
Example: A chess-playing computer, a voice assistant like Siri, or self-driving cars.
Machine Learning (ML)
ML is a subset of AI that focuses on giving machines the ability to learn from data without being explicitly programmed. It involves algorithms that allow computers to identify patterns and make decisions based on past data.
Example: Email spam filters that get better over time, recommendation engines like those used by Netflix.
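To make the “learning from data without being explicitly programmed” idea concrete, here is a toy sketch in Python: a perceptron that learns to flag spam-like emails from examples. All the numbers and the two features (count of spammy words, number of links) are made up purely for illustration - real spam filters use far richer features and models.

```python
# Toy "training data": (spammy_word_count, link_count) -> 1 = spam, 0 = not spam.
# These numbers are invented for illustration only.
data = [
    ((5.0, 4.0), 1), ((6.0, 5.0), 1), ((4.5, 3.5), 1),
    ((0.5, 0.0), 0), ((1.0, 1.0), 0), ((0.2, 0.5), 0),
]

def train_perceptron(data, epochs=20, lr=0.1):
    """Learn weights from examples instead of hand-coding rules."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), label in data:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = label - pred          # 0 if correct, +/-1 if wrong
            w[0] += lr * err * x1       # nudge the weights toward the answer
            w[1] += lr * err * x2
            b += lr * err
    return w, b

w, b = train_perceptron(data)

def classify(x1, x2):
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0

print(classify(5.5, 4.5))  # a spam-looking email -> 1
print(classify(0.3, 0.2))  # a normal-looking email -> 0
```

Notice that we never wrote an explicit rule like “more than 3 links means spam” - the rule emerged from the data, which is exactly the point of ML.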
Deep Learning (DL)
DL is a specialized subset of ML that focuses on neural networks with many layers (called deep neural networks). DL algorithms aim to mimic the human brain’s ability to process complex data through hierarchical learning.
Example: Facial recognition systems, self-driving car vision systems, advanced voice assistants like Google Assistant.
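The “many layers” idea is easy to see in code. Below is a minimal sketch of a forward pass through a 2-hidden-layer network in plain Python - the weights are hand-picked (a real network learns them from data), and the point is only to show how each layer transforms the previous layer’s output, building up a hierarchy.

```python
def relu(v):
    """Common DL activation: keep positives, zero out negatives."""
    return [max(0.0, x) for x in v]

def dense(x, weights, biases):
    """One fully connected layer: each output is a weighted sum of all inputs."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

# Hypothetical hand-set weights; in practice these are learned via backprop.
x = [0.5, -1.2, 3.0]                                            # input features
h1 = relu(dense(x, [[0.2, -0.5, 0.1], [0.4, 0.1, -0.3]], [0.0, 0.1]))
h2 = relu(dense(h1, [[0.7, -0.2], [0.3, 0.9]], [0.05, -0.1]))   # deeper layer
out = dense(h2, [[1.0, -1.0]], [0.0])                           # final score
print(out)
```

Stack dozens or hundreds of such layers (plus tricks like convolutions and attention) and you get the networks behind face recognition and voice assistants.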
Generative AI (Gen AI)
Gen AI is a subset of DL, focusing specifically on models that can produce new content rather than just classify or predict. It includes techniques that generate images, text, audio, or video based on patterns learned from input data.
Example: ChatGPT generating human-like text responses, DALL·E generating realistic images from text descriptions, or AI models creating synthetic faces.
In the coming editions we will also take an in-depth look at each of these elements, as it’s a super important step in upskilling ourselves in GenAI.
The 2 kinds of models…
To better understand Generative AI, it’s important to contrast two primary types of models in machine learning: discriminative models and generative models.
1. Discriminative Models
Discriminative models focus on distinguishing or classifying between different categories of data. Given an input, they predict a label or class by modeling the decision boundary between classes. These models are typically used for tasks like classification, where the goal is to assign a category to a given input.
Discriminative models answer the question:
"Given this image of an animal (input), is it a cat or a dog (output)?"
2. Generative Models
Generative models, in contrast, aim to model how data is generated. They try to understand the underlying distribution of the data and can generate new instances that resemble the training data.
Generative models answer the question:
"Given that this is what a cat looks like (based on training data), can you generate a new, realistic-looking image of a cat?"
The difference between LLMs & LIMs
Large Language Models (LLMs)
LLMs are deep learning models, typically based on Transformer architectures, designed to process and generate human-like text. These models are trained on vast amounts of text data and can perform a variety of language-based tasks, such as text generation, translation, summarization, answering questions, and more.
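The core trick behind LLMs - predict the next token given what came before - can be sketched at toy scale. Below is a character-level bigram “language model” in plain Python: instead of a Transformer with billions of parameters, it uses a simple lookup table of which character tends to follow which, but the generate-one-token-at-a-time loop is the same in spirit.

```python
import random
from collections import defaultdict

# Tiny "training corpus" - invented for illustration.
corpus = "the cat sat on the mat and the cat ran"

# Count how often each character follows each other character.
counts = defaultdict(lambda: defaultdict(int))
for a, b in zip(corpus, corpus[1:]):
    counts[a][b] += 1

def generate(start, n, seed=0):
    """Extend `start` by n characters, sampling each next char
    in proportion to how often it followed the previous one."""
    rng = random.Random(seed)
    out = start
    for _ in range(n):
        followers = counts[out[-1]]
        chars, weights = zip(*followers.items())
        out += rng.choices(chars, weights=weights)[0]
    return out

print(generate("th", 20))
```

Swap the bigram table for a Transformer trained on trillions of tokens, and characters for subword tokens, and you have the basic recipe behind GPT-style models.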
Large Image Models (LIMs)
LIMs are large-scale models trained to process, interpret, and generate images. Similar to LLMs, LIMs are built on deep learning architectures, but they specialize in image-related tasks, including image classification, segmentation, object detection, and image generation.
Here is a quick comparison between the 2 on different metrics -
Apart from these, there are models, such as CLIP (Contrastive Language–Image Pre-training) and DALL·E, that combine the capabilities of both LLMs and LIMs to create multi-modal models. These models can process both text and image data, enabling them to perform tasks like generating images from text prompts or associating images with corresponding text descriptions.
What are Foundational models?
Foundational Large Language Models (LLMs) are deep learning models trained on vast amounts of text data using architectures like Transformers. These models serve as the basis for a wide range of natural language processing (NLP) tasks and can be fine-tuned for specific applications.
The term foundational refers to their versatility, as they are pre-trained on extensive datasets and can be adapted for multiple downstream tasks like language translation, summarization, question answering, and more.
Let’s look at foundational LLMs from prominent AI organizations like OpenAI, Google, Meta, and Anthropic.
OpenAI: GPT Series
Key LLMs: GPT-3 and GPT-4
OpenAI's GPT (Generative Pretrained Transformer) series is among the most widely known and used LLMs. These models are designed for generating human-like text based on input prompts and are built using the Transformer architecture.
GPT-3 (released in 2020) has 175 billion parameters and is capable of performing a variety of text-related tasks without task-specific training.
GPT-4 (released in 2023) improved upon GPT-3 in terms of reasoning, coherence, and multimodal understanding. It can process both text and image inputs (in certain versions) and provides more nuanced and accurate responses.
Google
Key LLMs: BERT, T5, PaLM & Gemini
Google has developed a number of foundational models, each with specific innovations:
BERT (2018): Bidirectional language model; excelled at text understanding by looking at both sides of a sentence. Still powers many core search functions.
T5 (2019): A text-to-text transformer that makes every NLP task into a text-generation task. Hugely flexible for various language tasks.
PaLM (2022): A large-scale, highly powerful language model with 540 billion parameters, excelling at complex reasoning and multitasking. It brought in the Pathways system for efficient task switching.
Gemini (2023): The latest and most advanced, Gemini represents a leap into multimodal AI that can handle images and text, built with safety, alignment, and advanced reasoning capabilities at its core.
Meta (Facebook AI Research)
Key LLMs: RoBERTa, OPT, LLaMA
Meta (formerly Facebook) has contributed several foundational LLMs, pushing the boundaries of efficiency, scale, and multilingual capabilities:
RoBERTa (2019): An optimized version of BERT, RoBERTa improved on the pretraining process by training on larger datasets for longer durations, resulting in better performance across a variety of NLP benchmarks.
OPT (Open Pretrained Transformer, 2022): Meta’s answer to OpenAI’s GPT-3, the OPT series ranges from 125 million to 175 billion parameters. The goal was to create a highly scalable model with similar capabilities to GPT-3, but with open-access architecture.
LLaMA (Large Language Model Meta AI, 2023): LLaMA is Meta’s state-of-the-art language model, designed to be more efficient than models like GPT-3 while maintaining competitive performance. The model is focused on being accessible and more resource-efficient, with parameter sizes ranging from 7B to 65B.
Anthropic
Key LLM: Claude
Anthropic is an AI safety research company focused on building LLMs that prioritize safety, alignment, and ethical AI development. Their primary model, Claude, is named after Claude Shannon, a pioneer in information theory.
Claude 1 and Claude 2 are advanced LLMs designed to be safer and more interpretable compared to existing models. While specifics about parameter size and architecture are not always publicly shared, Claude is seen as an alternative to OpenAI's GPT models, focusing on safety and user alignment.
Each organization has developed foundational LLMs tailored to its goals and principles. OpenAI’s GPT models focus on general-purpose text generation and reasoning, Google’s BERT and PaLM models are central to search and language understanding, Meta's RoBERTa and LLaMA emphasize efficiency and scalability, and Anthropic’s Claude prioritizes ethical AI. All of these models play critical roles in advancing natural language processing across diverse industries and applications.
In the next edition, we will look at the evolution of LLMs & how they are trained.
& Finally, here is the discount code for my Shop - TWFWEEK1 (The code is valid until 15 Sep, Hurry!)
& If it’s your first time here, TheWeekendFreelancer currently has the following ongoing series -
Tools 🛠️, Maths 📈, Domain 🌐, Trends 📻, My Notes 📝, AI Upskilling 📚 & TWF Bytes 🌟
See you soon,
Raghunandan 🎯