I wish every AI Engineer could watch this.
Five levels of LLM apps. Consider this a framework to help you decide where you can use LLMs. There are a lot of myths around what LLMs can do, what they cannot do, and where you should use them today, so I decided to put together this material, in which I'll take you through a mental framework: based on the depth to which you go with an LLM, you can decide where an LLM fits. We're first going to see the different levels of LLM apps that I've put together, and then a slight extension of that; I've got two different documents to take you through it. This will give you an idea of how LLMs are being used today and how you can use them in your own applications.

To start with, imagine a pyramid. It's a very simple pyramid structure, and as with any pyramid, the top, the peak, is our aspirational goal, while the bottom is the easiest thing we can do. As with everything else, you have to slowly climb toward the top so that you can eventually hit the aspirational goal.
start with where do we use llms first Q
and A a question and answering engine
what do I mean by that it is quite
simple for us to understand so question
and answering engine is a system where
you have an llm and all you are going to
ask the llm is a question so you send a
prompt and the llm takes the prompt and
gives you an answer that is it that is
the entire transaction that you have
between an llm send a prompt get send it
to the llm get an answer llm large
language models are nothing but
sophisticated next word prediction
engines and they have been fine-tuned
with something called instruction so the
instruction fine tune models that means
they can take a human instruction and
get you an answer back for example if I
ask a question for this what is the
capital of India then the llm would
process this and then llm has
information about how to answer it and
then it will give me the answer back the
capital of India is New Delhi that's all
what you're going to do with this thing
so first level question and answering
now you might wonder at this point that
where can you use question and answering
as an llm engine this is the first thing
that people built like when llm started
even back in the day gp22 level people
started building simply Q&A bots so all
you want to do is ask a question give an
answer could be a homework could be a
general knowledge question could be
something about the world could be about
science could be about anything ask a
question get an answer as simple as that
it's a very three-step process ask a
question or send a prompt take the llm
to process it give me the answer back
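To make that concrete, here is a minimal sketch of a level-one Q&A call. It assumes the OpenAI Python SDK (v1+); the model name is just a placeholder, and any chat-completion API would look similar.

```python
# Minimal level-1 Q&A: one prompt in, one answer out, no memory at all.
# Sketch assumes the OpenAI Python SDK (v1+); the model name is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "What is the capital of India?"}],
)
print(response.choices[0].message.content)  # e.g. "The capital of India is New Delhi."
```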
A very simple application. Now, what you're going to do is add something to that application, and that is how you actually build a conversational chatbot. To understand this better, I'd like to take you to my second document, which will give you a clearer picture. Whenever we talk about LLMs, there's one important thing to understand: we have crossed the stage where an LLM is simply a large language model; we have more than that. To capture this, I use five dimensions: a prompt, a short-term memory, external knowledge, tools, and extended tools. If you think of the levels as your horizontal axis, these are your verticals, the different dimensions you can add to an LLM: a prompt, a short-term memory, a long-term memory (external data), tools, and extended tools. Let me give you an example of each so you can understand this better.
A prompt is just "What is the capital of India?" That's all a prompt is: you send it, the LLM understands it and gives the answer back. Short-term memory is when you have conversation history or other material held in the LLM's context; that is what we call ICL, in-context learning. Whatever you stuff inside the context window, the LLM can use, and that is your short-term memory. For instance, you give a few few-shot examples, like "What is the capital of the US?" with the answer "Washington, DC," and with a bunch of examples like that, the LLM knows how it's expected to answer. Next you have external data: you take data from, say, Wikipedia and provide it to the LLM. That is your long-term memory, because short-term memory is like a computer's RAM; it gets reset every time you reset the conversation or the session. Then tools: you let the LLM use tools like a calculator, the internet, a Python terminal, and so on. And extended tools is when you expand well beyond that. I hope you now have an understanding of the five dimensions we have with LLMs: a prompt; short-term memory, or in-context memory; long-term memory, meaning external knowledge or custom data; tools like calculators and a Python REPL; and extended tools that go much beyond what we currently have. Those are the different dimensions.
Now, coming to what we wanted to see: the chatbot. How do you turn a Q&A bot into a chatbot? It's very simple, and at this point you may have already guessed it. You take a prompt and give it to the LLM: "What is the capital of India?" The LLM answers "New Delhi." That's what happens in a simple Q&A bot. How do you make it a conversational bot, a chatbot? By adding a new dimension called short-term memory. And how do you do that? You keep everything you're conversing about in the chat's conversation history. What this gives the LLM the ability to do is this: when you ask "What is the capital of India?" it says "New Delhi," and then you can simply follow up with "What are some famous cuisines there?" At that point the LLM understands you're talking about New Delhi, because that conversation is stored in its short-term memory, the in-context memory, so it can do in-context learning and give you the right response back. That is how you move up the pyramid: you take a Q&A bot, give it a new dimension called history, and make it a chatbot that can converse.
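Here is the same sketch upgraded with short-term memory. Again this assumes the OpenAI SDK with a placeholder model name; the only real change from the Q&A sketch is that we resend the whole history on every turn.

```python
# Chatbot sketch: the same Q&A call, plus short-term memory.
# Every turn is kept in `history` and resent each time, so the LLM can
# resolve follow-ups like "what are some famous cuisines there?".
from openai import OpenAI

client = OpenAI()
history = []  # the conversation history IS the short-term memory

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=history,     # prompt + everything said so far
    )
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

print(chat("What is the capital of India?"))         # "New Delhi..."
print(chat("What are some famous cuisines there?"))  # "there" resolves via history
```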
Now, chatbots have applications everywhere you turn: you've got chatbots in customer support, chatbots on websites, chatbots for education (you've seen a lot of demos from Khan Academy). The chatbot is quite versatile; it has a purpose in almost every business or domain you can think of. People were using chatbots, but a chatbot by itself is not enough. Why? We already know the answer; can you pause and answer if you know it? The reason a chatbot is not enough for a lot of use cases is that it stops at short-term memory. You need long-term memory, external memory. For example: I ask "What is the capital of India?" and it says New Delhi. "What are the famous cuisines there?" It gives me an answer. Quite valid; the LLM is doing its job. But now let's say I'm an organization; take Apple, for example. I ask "Who is the CEO of Apple?" Of course the internet has that information, so it will say Tim Cook. Quite easy. Now if I ask "Who is the manager of the team handling the iPhone 16?", will it answer? No. Well, it might answer, because it hallucinates a lot, but the answer won't be correct. That has become a big bottleneck in a lot of enterprise use cases, because you don't just need internet knowledge or the knowledge the LLM has baked in; you need more than that. That is the custom-knowledge or external-knowledge dimension you need to make your LLM something more than a chatbot, and that is where a new technique called RAG comes into the picture: retrieval-augmented generation. You use the knowledge you provide (call it long-term memory: the documents, the internet, every source you have around), you route that knowledge to the LLM, and you make the LLM leverage it.
And now, at this point, you might have guessed it: first we had only the prompt, one dimension; then we added short-term memory, two dimensions; now we have external knowledge, three dimensions. The LLM sits at the center of three different things: the prompt, the short-term memory, and the long-term memory. To make this clearer, let me take you through what a RAG system looks like. You have the LLM at the center, and your data is available somewhere, in different forms. It could be in a database; most organizations keep data in structured RDBMS databases. Then you have documents, which are unstructured: PDFs, HTML files, internal portals, and so on. Then you have APIs: say you're a sales team, so your data probably lives in a CRM like Salesforce, and you need a programmatic call to fetch it and get the answer back. So your data can live in these different places: a structured database like an RDBMS; unstructured documents such as PDFs and HTML files, anything you have locally; and programmatic access, whether you're a marketing team needing data from Google Ads, a sales team needing data from Salesforce, or a company heavily on AWS that needs billing and cost data.
So you use one of these methods (structured parsing, unstructured parsing, or a programmatic call), take all the input data, and create an index. An index is what Google builds at every moment: there are all these websites, and Google creates an index so it's easier for it to traverse them when somebody asks a question. That's how Google became popular; before Google, people used something totally different. Google came up with the PageRank algorithm, and at the foundation of PageRank you have this index, with different parameters, of course. We're definitely not building Google, but an index is what we're building: it makes it easy to know what is inside the data. Now a user comes in and asks a question: "Who is the manager of the iPhone 16 team?" That question goes to the index, and the retrieval system picks out only the relevant information. The index might have information about all the teams (iPhone 16, Apple Vision Pro, billing, accounting, procurement, marketing, and so on), but what you're interested in is only the piece you asked about: the iPhone 16 manager. This retrieval step takes just the relevant information from the index and matches it against your query. Finally, it sends the LLM both the prompt you asked and the data it extracted, and the LLM gives the answer back to the user.
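Here is a toy end-to-end sketch of that flow. A real index would use embeddings and semantic search (for example via LlamaIndex, mentioned later); to keep this self-contained, a crude keyword-overlap retriever stands in for that, and the "documents" are hypothetical internal records.

```python
# Toy RAG sketch: retrieve the relevant chunk, stuff it into the prompt.
# Keyword overlap stands in for real embedding-based semantic search,
# and these documents are hypothetical internal records.
from openai import OpenAI

documents = [
    "iPhone 16 team: the engineering manager is Jane Doe (hypothetical record).",
    "Vision Pro team: the engineering manager is John Roe (hypothetical record).",
    "Procurement: quarterly billing is handled by the finance group.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Score each chunk by shared words -- a stand-in for similarity search.
    words = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

query = "Who is the manager of the iPhone 16 team?"
context = "\n".join(retrieve(query))

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{
        "role": "user",
        "content": f"Answer using only this context:\n{context}\n\nQuestion: {query}",
    }],
)
print(response.choices[0].message.content)
```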
This is quite different from the chatbot application, and I'll give you an example of why. In a chatbot, all you have is a question plus memory. (Sometimes you might add a kind of long-term memory by doing user profiling; ignore that for now, you don't need it here.) You have a question that you send as the prompt, the memory also goes into the prompt, because that's the only way in, and the LLM answers the question and you get the answer back. Now you might ask me: "Hey, why do I need to put my data into external storage and create an index? Why can't I just keep it in memory?" If you have that question at this point, it's a very important question and you're thinking in the right direction. The reason we cannot do that, or could not in the early days of LLMs, is an important factor called the context window.
The internal memory, that is, the short-term memory plus the question, is bounded by the context window of the particular LLM. An LLM might have a 4K context window, which was quite common, or 8K, and Gemini-class LLMs now go up to 1 million tokens, but there is always a context window. Now look at what you're actually sending. You ask question one and answer one comes back; then question two and answer two. By the time you get to question three, you're not sending just question three; you're actually sending all of it. Say question one is 2K tokens, the answer is 1K, then another 2K question, a 1K answer, and a 2K question. I'm exaggerating the sizes, but by the third turn of conversation you're at 2 + 1 + 2 + 1 + 2 = 8K tokens. So if you have an 8K-token model, at this point it runs out of context; it can no longer hold the conversation in its short-term memory.
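The arithmetic is worth spelling out, because every turn is resent in full. A quick sanity check with the (exaggerated) token counts from the example:

```python
# Each turn is resent, so the context grows with the WHOLE conversation,
# not just the latest question. Token counts are the example's round numbers.
turns = [("Q1", 2000), ("A1", 1000), ("Q2", 2000), ("A2", 1000), ("Q3", 2000)]
total = sum(tokens for _, tokens in turns)
print(total)  # 8000 -- an 8K-context model is already full by question three
```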
And that is exactly why you need RAG, retrieval-augmented generation: the indexed knowledge is not bound by the conversation. Of course you still keep a conversation going, but you don't have to stuff everything inside your question; you keep it in your index, because you already indexed it, and only the relevant bit comes back to you. Now you might ask how that's possible, and that takes you down a separate, tangential path about semantics: semantic search, embeddings, and all the rest, which is out of scope here. If you want to go deep, you should read about RAG; LlamaIndex is an excellent library to read about it. They have a really good developer-relations team and a lot of articles, and you should definitely read about LlamaIndex and RAG if you want advanced RAG. But I hope you get the point.
Going back to the system we put together: what do we have? We have a Q&A system at the front, which just takes an input and gives an output, nothing else. Then you have the chatbot: the input plus the history go in together (that's the short-term memory), you get the output, and the output also feeds back into the input; that's how you keep the conversation history. Then you have RAG, retrieval-augmented generation. The reason it's called that is that you have a retrieval component, you augment the LLM with it, and then you generate the response back. And the applications are enormous. There are a lot of startups in 2024, as I record this, doing nothing but RAG. If you can build a RAG solution today, in 2024, you can probably even raise funding, or run a good, successful SaaS; there are a lot of companies making really good, solid money out of it. I'll give you an example: SiteGPT. If you go to SiteGPT, it says make AI your expert customer support agent, and I understand this is a product making a lot of money, hundreds of thousands of dollars. At its foundation it is RAG: it takes all the information available on your website and indexes it (we call that data ingestion), and once the index is set, when you ask a question it just gives you an answer back. That's it. It's not just a normal chatbot; it's a chatbot that can answer based on your existing data. So if you're breaking into LLMs today, I would strongly encourage you to build a RAG system; that should be your default. If you're a university student watching this, or an early-career professional, I'd say you should build a couple of RAG examples.
There are a lot of nuances in RAG: how do you improve indexing, for example by changing chunking; what kinds of algorithms do you use for embedding; which models work well with RAG; is it better if the relevant text lands at the top of the prompt, at the bottom, or in the middle? There are a lot of components to RAG. It's not just the simple picture we usually discuss on this channel; you can go into advanced RAG, and I'd strongly encourage you to spend some time on it. Next, we're going to get into something quite exciting and interesting.
But before we do that, I'd like to quickly show you one more thing that not a lot of people discuss when we talk about LLMs. It's not really RAG; it just uses short-term memory, no long-term memory, but it has its own potential: using large language models for classical NLP downstream tasks. For example, say you want to build a text-classification system. You give it a sentence, "The movie was complete crap." Is it positive or negative? Classically, you'd train a text-classification model just to figure that out. Or another example: you have a review, "The movie was amazing and the actress was exceptional," and you build a model that says what kind of review this is: about the movie, the theater, the director, or the actor (here, the actor). That's text classification in classical NLP, and there are lots of other classical NLP tasks. What you can do is skip building your own custom model, a BERT-based model or an XGBoost-based model, and use LLMs for these classical NLP problems, because large language models have really good in-context learning. With the context you have available, plus a few few-shot examples, or tree-of-thought or chain-of-thought prompting, you can make a large language model a good zero-shot classifier, and the same applies to a lot of other tasks as well. This is something not many people are exploring, and I'd encourage you to explore it if you work on classical NLP problems like labeling, text classification, or entity recognition; you can leverage LLMs. Now, whether you want an LLM-based solution is a different topic. I'm not saying to go looking for a nail because you have a hammer; I'm just saying this is a good opportunity wherever you don't want to build models. Of course, if you can build your own model, it will probably be cheaper than making calls to LLMs and getting answers back. But for summarization, text classification, and entity recognition, I think LLMs are exceptional zero-shot performers on downstream tasks, and you should definitely leverage them.
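A minimal sketch of what that looks like, again assuming the OpenAI SDK with a placeholder model name. No training at all; the instruction in the prompt is the whole "model," and you could paste few-shot examples into it to improve accuracy.

```python
# Sketch: an LLM as a zero-shot text classifier -- no BERT/XGBoost training,
# just an instruction. Add few-shot examples to the prompt to improve it.
from openai import OpenAI

client = OpenAI()

def classify_sentiment(review: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{
            "role": "user",
            "content": "Classify this movie review as exactly one word, "
                       f"'positive' or 'negative'.\n\nReview: {review}",
        }],
    )
    return response.choices[0].message.content.strip().lower()

print(classify_sentiment("The movie was complete crap."))        # negative
print(classify_sentiment("The movie was amazing, the actress "
                         "was exceptional."))                    # positive
```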
Now, with this, we have arrived at RAG, and we already know what RAG is. We're entering a very interesting phase: what everybody is obsessed with, what everybody loves. Agents. In the very recent announcements from Google and Microsoft, and previously OpenAI, you would have seen two important things as a common trend. One is multimodality. What does it mean? It simply means that instead of just chatting with text, you can chat with images, you can ask questions in voice and it can respond back in speech, you can send videos. So one important trend you're seeing is multimodality. The second important trend you see everywhere is agents: multi-agent setups, where you have multiple agents you can summon to do certain tasks, and they'll do them for you, just like the Men in Black (MIB): they have a purpose, and they carry out particular tasks. But before I jump into agents, I want to introduce you to another important concept called function calling. Function calling is the precursor to LLM agents.
In function calling, you have a prompt, you have short-term memory, sometimes you need external memory and sometimes you don't, but you give the LLM the ability to work with external tools, and you do that through what's called function calling. Function calling, to be honest, is a terrible name, because you're not calling any function here; you're not making the LLM call anything at all. All you're doing is forcing the LLM to give you a structured response back, so that you can make the call yourself. Let me give you an example of what function calling is. Let's say you have a weather API; actually, everybody uses the weather API for this, so I'm going to skip it. Let's say you have a currency converter. What does a currency converter need? You need an input currency, an output currency, a date, and an amount. Technically those are the four things: what amount you want to convert, what the input currency is, what the output currency is, and what date you want the conversion for. Let's keep it as a simple API.
Now, typically, if you go to an LLM and say "What is USD to INR today?", first of all the LLM may not understand what "today" is. It might know USD, it might know INR, but its memory is frozen, because a large language model is a snapshot: its knowledge is frozen at, say, September 2023 or thereabouts. So it cannot give you the latest information. You can't really do this with RAG either; well, you sort of can, by ingesting fresh knowledge every day and keeping it in memory, but that's not very efficient. Expand this to the stock market and daily data doesn't even matter, because everything changes every minute and every second; you need something instant. So what do you do? You call an API. If you're a programmer, that's what you'd naturally do. Now, what do you need in order to call the API? At the end of the day I want to call currency_converter(input, output, date, amount); I need to make a call like that, with four arguments, and those need to be solid inputs. It can't be "United States dollar" one time, "USD" another time, and "US dollar" a third time; that will not work. You need a specific format for everything: the amount should be a number, the date should be a date object. So you need to force the LLM to give you a response in a particular shape; otherwise the LLM will throw anything at you. Ask "What is USD in INR?" and it might reply "USD to INR as of September 2023 was...". You have to force, or guide, the LLM to produce a particular type of output, and somehow everybody has universally agreed that format is going to be JSON. Except Anthropic, which absolutely loves XML: if you use Anthropic you use XML; with any other model you use JSON. So you're forcing the LLM to give you a structured response back, a JSON that helps you make the function call; you then call the function with that JSON. A guided response into JSON is what everybody calls function calling. You don't necessarily call the function in function calling; you get an output that helps you make the function call yourself. Clear?
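Here is what that looks like in code, as a sketch of the OpenAI tools API (v1+). The function name and parameters are illustrative; the point is that the LLM never calls anything, it only emits structured JSON arguments that we use to make the real API call. (Real code should also handle the case where the model answers in plain text instead of requesting a tool.)

```python
# Function-calling sketch: force the LLM to emit structured JSON arguments
# that *we* then use to call the real currency API ourselves.
import json
from openai import OpenAI

tools = [{
    "type": "function",
    "function": {
        "name": "convert_currency",  # hypothetical function name
        "description": "Convert an amount between currencies on a given date.",
        "parameters": {
            "type": "object",
            "properties": {
                "input_currency":  {"type": "string", "description": "ISO code, e.g. USD"},
                "output_currency": {"type": "string", "description": "ISO code, e.g. INR"},
                "date":            {"type": "string", "description": "YYYY-MM-DD"},
                "amount":          {"type": "number"},
            },
            "required": ["input_currency", "output_currency", "date", "amount"],
        },
    },
}]

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "What is 100 USD in INR today?"}],
    tools=tools,
)
call = response.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)   # JSON with a guaranteed structure
print(call.function.name, args)              # now *you* call the real API with these
```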
Now, that is exactly the precursor to agents, because with a function call you have the ability to call a function, and agents are nothing but a bunch of function calls stitched together with tools. So what do we have in agents? A bunch of function calls, plus tools. I'd like to introduce you to a very interesting system that can help you understand agents better. If you've been in the AI world a while, you'd probably recognize it immediately: this was the workflow of something called Baby AGI. Baby AGI was quite popular back in the day, and "back in the day" here means less than a year ago, maybe a bit more. A function call, as I said, is the foundation of agents. But what is an agent? If you've seen our pyramid, you know the agent sits right near the top, close to our aspirational goal. How do you define an agent? It's simple. First of all, the chatbot and the RAG system all end in text, or output in some modality: text, images, video. They produce that and they're done. What you achieve with an agent is something absolutely stunning: you don't stop at a text response, you stop at an action. You trigger an action. That's what agents are, simply: you take LLMs, connect them with tools, and give them a purpose or goal. That is your agent, and that is exactly what Baby AGI did back in the day. There are multiple agent systems now, but if you look at Baby AGI, a wonderful framework, you can see there's a task, something that has to happen; there are certain tools, like a vector DB and others; and every agent has a purpose: one executes, one returns results, each does its particular job, and they share a goal. So you have tools, purposes and goals, and LLMs, all working together toward a common goal. That is your agent.
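To make "function calls stitched with tools" concrete, here is a minimal agent loop. The LLM is a scripted stub so the sketch is self-contained and runnable; in practice the decision step would be a real function-calling LLM request, as in the sketch above.

```python
# A minimal agent loop: "a bunch of function calls stitched with tools".
# stub_llm stands in for a real function-calling LLM.
def calculator(expression: str) -> float:
    return eval(expression)  # toy tool; never eval untrusted input in real code

TOOLS = {"calculator": calculator}

def stub_llm(transcript: list[str]):
    # First request a tool call; once a result is back, produce the final answer.
    if not any("TOOL RESULT" in line for line in transcript):
        return ("tool", "calculator", {"expression": "19 * 21"})
    return ("final", f"The answer is {transcript[-1].split(': ')[1]}", None)

def run_agent(goal: str, max_steps: int = 5) -> str:
    transcript = [f"GOAL: {goal}"]
    for _ in range(max_steps):
        kind, payload, args = stub_llm(transcript)
        if kind == "final":
            return payload                   # the agent ends in an answer/action
        result = TOOLS[payload](**args)      # execute the requested tool
        transcript.append(f"TOOL RESULT: {result}")
    return "stopped: step budget exhausted"

print(run_agent("What is 19 * 21?"))  # The answer is 399
```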
There are multiple agent frameworks that are quite popular these days: CrewAI, LangGraph, AutoGen. In most of them you'll see the same pattern: first you define a role, then you define a goal (a role and a goal), then you say which LLM you want to use as the backend engine, and you've put together a single agent. Then you put several of these together as a team, and that is your multi-agent setup.
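As an illustration of that role/goal pattern, here is a rough multi-agent sketch in the style of CrewAI. Treat the exact parameter names as assumptions and check the CrewAI docs for the current interface; the roles, goals, and task descriptions are made up for the example.

```python
# Multi-agent sketch using CrewAI's role/goal pattern (interface may vary
# by version -- check the CrewAI docs; roles and tasks here are illustrative).
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Collect key facts about the topic",
    backstory="You dig up accurate, up-to-date information.",
)
writer = Agent(
    role="Writer",
    goal="Turn the research into a short blog post",
    backstory="You write clear, engaging summaries.",
)

research = Task(description="Research LLM agents.",
                expected_output="A list of facts.", agent=researcher)
write = Task(description="Write a 200-word post from the research.",
             expected_output="A blog post.", agent=writer)

crew = Crew(agents=[researcher, writer], tasks=[research, write])  # the "team"
print(crew.kickoff())  # the agents work toward the shared goal in turn
```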
With agents, people are doing amazing things. You can make an agent book your ticket; you can make an agent read something, distill it, create a note, and publish the blog post. You can summon these agents to do a lot of things. Personally, agents are where I spend most of my reading time, because it's becoming quite obvious that agents are the next frontier in how we take LLMs forward. There are a lot of different directions, but personally I'm most interested in automation, and I think agents are going to be the next big thing; honestly, they're already a big thing. Google has its own agent projects under various names, OpenAI has its own agents, and every time you talk to a company, the conversation turns to agents, because you want to summon these agents and connect these LLMs to that extra dimension. The dimension we're connecting here is tools: you take LLMs, you have the function-calling ability, and once you connect them to tools, you're unlocking the potential of something immense. That is what you call agents. I'm not going deep into agents here, because I'm hoping this becomes a series, depending on how you all like it, and in that series my next focus is going to be agents. So: agents sit quite close to the top, and that takes us almost to the end of the video, which is our aspirational goal, what we're all trying to move toward: the LLM OS.
This is inspired by Andrej Karpathy, who created this amazing diagram. What's happening here? It's about putting the LLM at the center of an operating system. If you go back in time, the computer was created for simple calculation: you want to add a and b, you keep a as one and b as two, and you add them. That's roughly how computing started, way back in the day. Then computation kept growing and kept getting less expensive, with more compute, until we got the computers we have today. Karpathy is arguing: can we have a similar vision for LLMs? The vision is that you keep the LLM at the center. Around it you have RAM, which is the short-term memory, the context window. Then you have long-term memory, the disk or file system, which can be used with RAG. Then you have the agent structure, with tools; you connect it to the internet; and you connect it to other LLMs, like a multi-agent or peripheral setup. And then you have your peripheral devices for audio and video. Can we put together a system with all these pieces working toward a common goal? That would ideally become your large language model operating system. This is very much a vision at this point. There are certain implementations available, but they're based on current understanding: mostly LLMs plus function calling plus agents, multi-agent setups, more tools. The current LLM OSes are not a radically different architecture altogether. That's why, even in the framework I've created, the LLM OS is still developing: it's everything we've got (the tools, the extended tools, the peripheral tools, with long-term memory and short-term memory), taking just one input from the user, then running itself and executing things on its own. I think that's the future we're heading toward; I'm not sure when we'll get there. If somebody says "AGI," for me, today, it could look something like this, like Baby AGI. I don't trust AGI as a concept arriving anytime soon, but leaving consciousness and all of that aside, I'd say the LLM OS sits at the top, where we can expect something closer to AGI to happen, and all of these levels lead us up to there.
I wanted to keep this video brief, but it's already going to run more than half an hour. I wanted it to be a crash course, so that if you don't know anything about LLM apps, maybe you haven't taken any course, this helps you see how the future, the LLM OS, is coming together and what led us up to there. Let me know in the comment section if you like this kind of content, and I'll put together more. This took me a lot of time: creating the framework, designing it, and arranging it into a thought process that makes it understandable, and it's basically what a lot of LLM courses offer. I'm definitely looking forward to hearing more feedback, and if you like this kind of format, subscribe to the channel. See you in another video. Happy prompting!