I wish every AI Engineer could watch this.
Five levels of LLM apps. Consider this a framework to help you decide where you can use LLMs. There are a lot of myths around what LLMs can do, what they cannot do, and where you should use them today, so I decided to put together this material, in which I'll take you through a mental framework: based on the depth to which you go with an LLM, you can decide where an LLM fits. We're first going to see the different levels of LLM apps that I've put together, and then a slight extension of that; I've got two different documents to take you through it. This will give you an idea of how LLMs are being used today and how you can use them in your own applications.

To start with, imagine a pyramid. It's a very simple pyramid structure, and as with any pyramid, the top, the peak, is our aspirational goal, while the bottom is the easiest thing we can do. As with everything else, you have to slowly climb toward the top so that you can eventually hit the aspirational goal.
start with where do we use llms first Q
and A a question and answering engine
what do I mean by that it is quite
simple for us to understand so question
and answering engine is a system where
you have an llm and all you are going to
ask the llm is a question so you send a
prompt and the llm takes the prompt and
gives you an answer that is it that is
the entire transaction that you have
between an llm send a prompt get send it
to the llm get an answer llm large
language models are nothing but
sophisticated next word prediction
engines and they have been fine-tuned
with something called instruction so the
instruction fine tune models that means
they can take a human instruction and
get you an answer back for example if I
ask a question for this what is the
capital of India then the llm would
process this and then llm has
information about how to answer it and
then it will give me the answer back the
capital of India is New Delhi that's all
what you're going to do with this thing
so first level question and answering
now you might wonder at this point that
where can you use question and answering
as an llm engine this is the first thing
that people built like when llm started
even back in the day gp22 level people
started building simply Q&A bots so all
you want to do is ask a question give an
answer could be a homework could be a
general knowledge question could be
something about the world could be about
science could be about anything ask a
question get an answer as simple as that
it's a very three-step process ask a
question or send a prompt take the llm
to process it give me the answer back
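To make that concrete, here is a minimal sketch of a level-one Q&A call. It assumes the OpenAI Python SDK (v1+); the model name is just a placeholder, and any chat-completion API would look similar.

```python
# Minimal level-1 Q&A: one prompt in, one answer out, no memory at all.
# Sketch assumes the OpenAI Python SDK (v1+); the model name is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "What is the capital of India?"}],
)
print(response.choices[0].message.content)  # e.g. "The capital of India is New Delhi."
```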
A very simple application. Now, what you're going to do is add something to that application, and that is how you actually build a conversational chatbot. To understand this better, I'd like to take you to my second document, which will give you a clearer picture. Whenever we talk about LLMs, there's one important thing to understand: we have crossed the stage where an LLM is simply a large language model; we have more than that. To capture this, I use five dimensions: a prompt, a short-term memory, external knowledge, tools, and extended tools. If you think of the levels as your horizontal axis, these are your verticals, the different dimensions you can add to an LLM: a prompt, a short-term memory, a long-term memory (external data), tools, and extended tools. Let me give you an example of each so you can understand this better.
A prompt is just "What is the capital of India?" That's all a prompt is: you send it, the LLM understands it and gives the answer back. Short-term memory is when you have conversation history or other material held in the LLM's context; that is what we call ICL, in-context learning. Whatever you stuff inside the context window, the LLM can use, and that is your short-term memory. For instance, you give a few few-shot examples, like "What is the capital of the US?" with the answer "Washington, DC," and with a bunch of examples like that, the LLM knows how it's expected to answer. Next you have external data: you take data from, say, Wikipedia and provide it to the LLM. That is your long-term memory, because short-term memory is like a computer's RAM; it gets reset every time you reset the conversation or the session. Then tools: you let the LLM use tools like a calculator, the internet, a Python terminal, and so on. And extended tools is when you expand well beyond that. I hope you now have an understanding of the five dimensions we have with LLMs: a prompt; short-term memory, or in-context memory; long-term memory, meaning external knowledge or custom data; tools like calculators and a Python REPL; and extended tools that go much beyond what we currently have. Those are the different dimensions.
Now, coming to what we wanted to see: the chatbot. How do you turn a Q&A bot into a chatbot? It's very simple, and at this point you may have already guessed it. You take a prompt and give it to the LLM: "What is the capital of India?" The LLM answers "New Delhi." That's what happens in a simple Q&A bot. How do you make it a conversational bot, a chatbot? By adding a new dimension called short-term memory. And how do you do that? You keep everything you're conversing about in the chat's conversation history. What this gives the LLM the ability to do is this: when you ask "What is the capital of India?" it says "New Delhi," and then you can simply follow up with "What are some famous cuisines there?" At that point the LLM understands you're talking about New Delhi, because that conversation is stored in its short-term memory, the in-context memory, so it can do in-context learning and give you the right response back. That is how you move up the pyramid: you take a Q&A bot, give it a new dimension called history, and make it a chatbot that can converse.
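Here is the same sketch upgraded with short-term memory. Again this assumes the OpenAI SDK with a placeholder model name; the only real change from the Q&A sketch is that we resend the whole history on every turn.

```python
# Chatbot sketch: the same Q&A call, plus short-term memory.
# Every turn is kept in `history` and resent each time, so the LLM can
# resolve follow-ups like "what are some famous cuisines there?".
from openai import OpenAI

client = OpenAI()
history = []  # the conversation history IS the short-term memory

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=history,     # prompt + everything said so far
    )
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

print(chat("What is the capital of India?"))         # "New Delhi..."
print(chat("What are some famous cuisines there?"))  # "there" resolves via history
```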
Now, chatbots have applications everywhere you turn: you've got chatbots in customer support, chatbots on websites, chatbots for education (you've seen a lot of demos from Khan Academy). The chatbot is quite versatile; it has a purpose in almost every business or domain you can think of. People were using chatbots, but a chatbot by itself is not enough. Why? We already know the answer; can you pause and answer if you know it? The reason a chatbot is not enough for a lot of use cases is that it stops at short-term memory. You need long-term memory, external memory. For example: I ask "What is the capital of India?" and it says New Delhi. "What are the famous cuisines there?" It gives me an answer. Quite valid; the LLM is doing its job. But now let's say I'm an organization; take Apple, for example. I ask "Who is the CEO of Apple?" Of course the internet has that information, so it will say Tim Cook. Quite easy. Now if I ask "Who is the manager of the team handling the iPhone 16?", will it answer? No. Well, it might answer, because it hallucinates a lot, but the answer won't be correct. That has become a big bottleneck in a lot of enterprise use cases, because you don't just need internet knowledge or the knowledge the LLM has baked in; you need more than that. That is the custom-knowledge or external-knowledge dimension you need to make your LLM something more than a chatbot, and that is where a new technique called RAG comes into the picture: retrieval-augmented generation. You use the knowledge you provide (call it long-term memory: the documents, the internet, every source you have around), you route that knowledge to the LLM, and you make the LLM leverage it.
And now, at this point, you might have guessed it: first we had only the prompt, one dimension; then we added short-term memory, two dimensions; now we have external knowledge, three dimensions. The LLM sits at the center of three different things: the prompt, the short-term memory, and the long-term memory. To make this clearer, let me take you through what a RAG system looks like. You have the LLM at the center, and your data is available somewhere, in different forms. It could be in a database; most organizations keep data in structured RDBMS databases. Then you have documents, which are unstructured: PDFs, HTML files, internal portals, and so on. Then you have APIs: say you're a sales team, so your data probably lives in a CRM like Salesforce, and you need a programmatic call to fetch it and get the answer back. So your data can live in these different places: a structured database like an RDBMS; unstructured documents such as PDFs and HTML files, anything you have locally; and programmatic access, whether you're a marketing team needing data from Google Ads, a sales team needing data from Salesforce, or a company heavily on AWS that needs billing and cost data.
So you use one of these methods (structured parsing, unstructured parsing, or a programmatic call), take all the input data, and create an index. An index is what Google builds at every moment: there are all these websites, and Google creates an index so it's easier for it to traverse them when somebody asks a question. That's how Google became popular; before Google, people used something totally different. Google came up with the PageRank algorithm, and at the foundation of PageRank you have this index, with different parameters, of course. We're definitely not building Google, but an index is what we're building: it makes it easy to know what is inside the data. Now a user comes in and asks a question: "Who is the manager of the iPhone 16 team?" That question goes to the index, and the retrieval system picks out only the relevant information. The index might have information about all the teams (iPhone 16, Apple Vision Pro, billing, accounting, procurement, marketing, and so on), but what you're interested in is only the piece you asked about: the iPhone 16 manager. This retrieval step takes just the relevant information from the index and matches it against your query. Finally, it sends the LLM both the prompt you asked and the data it extracted, and the LLM gives the answer back to the user.
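Here is a toy end-to-end sketch of that flow. A real index would use embeddings and semantic search (for example via LlamaIndex, mentioned later); to keep this self-contained, a crude keyword-overlap retriever stands in for that, and the "documents" are hypothetical internal records.

```python
# Toy RAG sketch: retrieve the relevant chunk, stuff it into the prompt.
# Keyword overlap stands in for real embedding-based semantic search,
# and these documents are hypothetical internal records.
from openai import OpenAI

documents = [
    "iPhone 16 team: the engineering manager is Jane Doe (hypothetical record).",
    "Vision Pro team: the engineering manager is John Roe (hypothetical record).",
    "Procurement: quarterly billing is handled by the finance group.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Score each chunk by shared words -- a stand-in for similarity search.
    words = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

query = "Who is the manager of the iPhone 16 team?"
context = "\n".join(retrieve(query))

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{
        "role": "user",
        "content": f"Answer using only this context:\n{context}\n\nQuestion: {query}",
    }],
)
print(response.choices[0].message.content)
```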
This is quite different from the chatbot application, and I'll give you an example of why. In a chatbot, all you have is a question plus memory. (Sometimes you might add a kind of long-term memory by doing user profiling; ignore that for now, you don't need it here.) You have a question that you send as the prompt, the memory also goes into the prompt, because that's the only way in, and the LLM answers the question and you get the answer back. Now you might ask me: "Hey, why do I need to put my data into external storage and create an index? Why can't I just keep it in memory?" If you have that question at this point, it's a very important question and you're thinking in the right direction. The reason we cannot do that, or could not in the early days of LLMs, is an important factor called the context window.
The internal memory, that is, the short-term memory plus the question, is bounded by the context window of the particular LLM. An LLM might have a 4K context window, which was quite common, or 8K, and Gemini-class LLMs now go up to 1 million tokens, but there is always a context window. Now look at what you're actually sending. You ask question one and answer one comes back; then question two and answer two. By the time you get to question three, you're not sending just question three; you're actually sending all of it. Say question one is 2K tokens, the answer is 1K, then another 2K question, a 1K answer, and a 2K question. I'm exaggerating the sizes, but by the third turn of conversation you're at 2 + 1 + 2 + 1 + 2 = 8K tokens. So if you have an 8K-token model, at this point it runs out of context; it can no longer hold the conversation in its short-term memory.
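The arithmetic is worth spelling out, because every turn is resent in full. A quick sanity check with the (exaggerated) token counts from the example:

```python
# Each turn is resent, so the context grows with the WHOLE conversation,
# not just the latest question. Token counts are the example's round numbers.
turns = [("Q1", 2000), ("A1", 1000), ("Q2", 2000), ("A2", 1000), ("Q3", 2000)]
total = sum(tokens for _, tokens in turns)
print(total)  # 8000 -- an 8K-context model is already full by question three
```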
And that is exactly why you need RAG, retrieval-augmented generation: the indexed knowledge is not bound by the conversation. Of course you still keep a conversation going, but you don't have to stuff everything inside your question; you keep it in your index, because you already indexed it, and only the relevant bit comes back to you. Now you might ask how that's possible, and that takes you down a separate, tangential path about semantics: semantic search, embeddings, and all the rest, which is out of scope here. If you want to go deep, you should read about RAG; LlamaIndex is an excellent library to read about it. They have a really good developer-relations team and a lot of articles, and you should definitely read about LlamaIndex and RAG if you want advanced RAG. But I hope you get the point.
Going back to the system we put together: what do we have? We have a Q&A system at the front, which just takes an input and gives an output, nothing else. Then you have the chatbot: the input plus the history go in together (that's the short-term memory), you get the output, and the output also feeds back into the input; that's how you keep the conversation history. Then you have RAG, retrieval-augmented generation. The reason it's called that is that you have a retrieval component, you augment the LLM with it, and then you generate the response back. And the applications are enormous. There are a lot of startups in 2024, as I record this, doing nothing but RAG. If you can build a RAG solution today, in 2024, you can probably even raise funding, or run a good, successful SaaS; there are a lot of companies making really good, solid money out of it. I'll give you an example: SiteGPT. If you go to SiteGPT, it says make AI your expert customer support agent, and I understand this is a product making a lot of money, hundreds of thousands of dollars. At its foundation it is RAG: it takes all the information available on your website and indexes it (we call that data ingestion), and once the index is set, when you ask a question it just gives you an answer back. That's it. It's not just a normal chatbot; it's a chatbot that can answer based on your existing data. So if you're breaking into LLMs today, I would strongly encourage you to build a RAG system; that should be your default. If you're a university student watching this, or an early-career professional, I'd say you should build a couple of RAG examples.
There are a lot of nuances in RAG: how do you improve indexing, for example by changing chunking; what kinds of algorithms do you use for embedding; which models work well with RAG; is it better if the relevant text lands at the top of the prompt, at the bottom, or in the middle? There are a lot of components to RAG. It's not just the simple picture we usually discuss on this channel; you can go into advanced RAG, and I'd strongly encourage you to spend some time on it. Next, we're going to get into something quite exciting and interesting.
But before we do that, I'd like to quickly show you one more thing that not a lot of people discuss when we talk about LLMs. It's not really RAG; it just uses short-term memory, no long-term memory, but it has its own potential: using large language models for classical NLP downstream tasks. For example, say you want to build a text-classification system. You give it a sentence, "The movie was complete crap." Is it positive or negative? Classically, you'd train a text-classification model just to figure that out. Or another example: you have a review, "The movie was amazing and the actress was exceptional," and you build a model that says what kind of review this is: about the movie, the theater, the director, or the actor (here, the actor). That's text classification in classical NLP, and there are lots of other classical NLP tasks. What you can do is skip building your own custom model, a BERT-based model or an XGBoost-based model, and use LLMs for these classical NLP problems, because large language models have really good in-context learning. With the context you have available, plus a few few-shot examples, or tree-of-thought or chain-of-thought prompting, you can make a large language model a good zero-shot classifier, and the same applies to a lot of other tasks as well. This is something not many people are exploring, and I'd encourage you to explore it if you work on classical NLP problems like labeling, text classification, or entity recognition; you can leverage LLMs. Now, whether you want an LLM-based solution is a different topic. I'm not saying to go looking for a nail because you have a hammer; I'm just saying this is a good opportunity wherever you don't want to build models. Of course, if you can build your own model, it will probably be cheaper than making calls to LLMs and getting answers back. But for summarization, text classification, and entity recognition, I think LLMs are exceptional zero-shot performers on downstream tasks, and you should definitely leverage them.
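A minimal sketch of what that looks like, again assuming the OpenAI SDK with a placeholder model name. No training at all; the instruction in the prompt is the whole "model," and you could paste few-shot examples into it to improve accuracy.

```python
# Sketch: an LLM as a zero-shot text classifier -- no BERT/XGBoost training,
# just an instruction. Add few-shot examples to the prompt to improve it.
from openai import OpenAI

client = OpenAI()

def classify_sentiment(review: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{
            "role": "user",
            "content": "Classify this movie review as exactly one word, "
                       f"'positive' or 'negative'.\n\nReview: {review}",
        }],
    )
    return response.choices[0].message.content.strip().lower()

print(classify_sentiment("The movie was complete crap."))        # negative
print(classify_sentiment("The movie was amazing, the actress "
                         "was exceptional."))                    # positive
```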
Now, with this, we have arrived at RAG, and we already know what RAG is. We're entering a very interesting phase: what everybody is obsessed with, what everybody loves. Agents. In the very recent announcements from Google and Microsoft, and previously OpenAI, you would have seen two important things as a common trend. One is multimodality. What does it mean? It simply means that instead of just chatting with text, you can chat with images, you can ask questions in voice and it can respond back in speech, you can send videos. So one important trend you're seeing is multimodality. The second important trend you see everywhere is agents: multi-agent setups, where you have multiple agents you can summon to do certain tasks, and they'll do them for you, just like the Men in Black (MIB): they have a purpose, and they carry out particular tasks. But before I jump into agents, I want to introduce you to another important concept called function calling. Function calling is the precursor to LLM agents.
In function calling, you have a prompt, you have short-term memory, sometimes you need external memory and sometimes you don't, but you give the LLM the ability to work with external tools, and you do that through what's called function calling. Function calling, to be honest, is a terrible name, because you're not calling any function here; you're not making the LLM call anything at all. All you're doing is forcing the LLM to give you a structured response back, so that you can make the call yourself. Let me give you an example of what function calling is. Let's say you have a weather API; actually, everybody uses the weather API for this, so I'm going to skip it. Let's say you have a currency converter. What does a currency converter need? You need an input currency, an output currency, a date, and an amount. Technically those are the four things: what amount you want to convert, what the input currency is, what the output currency is, and what date you want the conversion for. Let's keep it as a simple API.
Now, typically, if you go to an LLM and say "What is USD to INR today?", first of all the LLM may not understand what "today" is. It might know USD, it might know INR, but its memory is frozen, because a large language model is a snapshot: its knowledge is frozen at, say, September 2023 or thereabouts. So it cannot give you the latest information. You can't really do this with RAG either; well, you sort of can, by ingesting fresh knowledge every day and keeping it in memory, but that's not very efficient. Expand this to the stock market and daily data doesn't even matter, because everything changes every minute and every second; you need something instant. So what do you do? You call an API. If you're a programmer, that's what you'd naturally do. Now, what do you need in order to call the API? At the end of the day I want to call currency_converter(input, output, date, amount); I need to make a call like that, with four arguments, and those need to be solid inputs. It can't be "United States dollar" one time, "USD" another time, and "US dollar" a third time; that will not work. You need a specific format for everything: the amount should be a number, the date should be a date object. So you need to force the LLM to give you a response in a particular shape; otherwise the LLM will throw anything at you. Ask "What is USD in INR?" and it might reply "USD to INR as of September 2023 was...". You have to force, or guide, the LLM to produce a particular type of output, and somehow everybody has universally agreed that format is going to be JSON. Except Anthropic, which absolutely loves XML: if you use Anthropic you use XML; with any other model you use JSON. So you're forcing the LLM to give you a structured response back, a JSON that helps you make the function call; you then call the function with that JSON. A guided response into JSON is what everybody calls function calling. You don't necessarily call the function in function calling; you get an output that helps you make the function call yourself. Clear?
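Here is what that looks like in code, as a sketch of the OpenAI tools API (v1+). The function name and parameters are illustrative; the point is that the LLM never calls anything, it only emits structured JSON arguments that we use to make the real API call. (Real code should also handle the case where the model answers in plain text instead of requesting a tool.)

```python
# Function-calling sketch: force the LLM to emit structured JSON arguments
# that *we* then use to call the real currency API ourselves.
import json
from openai import OpenAI

tools = [{
    "type": "function",
    "function": {
        "name": "convert_currency",  # hypothetical function name
        "description": "Convert an amount between currencies on a given date.",
        "parameters": {
            "type": "object",
            "properties": {
                "input_currency":  {"type": "string", "description": "ISO code, e.g. USD"},
                "output_currency": {"type": "string", "description": "ISO code, e.g. INR"},
                "date":            {"type": "string", "description": "YYYY-MM-DD"},
                "amount":          {"type": "number"},
            },
            "required": ["input_currency", "output_currency", "date", "amount"],
        },
    },
}]

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "What is 100 USD in INR today?"}],
    tools=tools,
)
call = response.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)   # JSON with a guaranteed structure
print(call.function.name, args)              # now *you* call the real API with these
```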
Now, that is exactly the precursor to agents, because with a function call you have the ability to call a function, and agents are nothing but a bunch of function calls stitched together with tools. So what do we have in agents? A bunch of function calls, plus tools. I'd like to introduce you to a very interesting system that can help you understand agents better. If you've been in the AI world a while, you'd probably recognize it immediately: this was the workflow of something called Baby AGI. Baby AGI was quite popular back in the day, and "back in the day" here means less than a year ago, maybe a bit more. A function call, as I said, is the foundation of agents. But what is an agent? If you've seen our pyramid, you know the agent sits right near the top, close to our aspirational goal. How do you define an agent? It's simple. First of all, the chatbot and the RAG system all end in text, or output in some modality: text, images, video. They produce that and they're done. What you achieve with an agent is something absolutely stunning: you don't stop at a text response, you stop at an action. You trigger an action. That's what agents are, simply: you take LLMs, connect them with tools, and give them a purpose or goal. That is your agent, and that is exactly what Baby AGI did back in the day. There are multiple agent systems now, but if you look at Baby AGI, a wonderful framework, you can see there's a task, something that has to happen; there are certain tools, like a vector DB and others; and every agent has a purpose: one executes, one returns results, each does its particular job, and they share a goal. So you have tools, purposes and goals, and LLMs, all working together toward a common goal. That is your agent.
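To make "function calls stitched with tools" concrete, here is a minimal agent loop. The LLM is a scripted stub so the sketch is self-contained and runnable; in practice the decision step would be a real function-calling LLM request, as in the sketch above.

```python
# A minimal agent loop: "a bunch of function calls stitched with tools".
# stub_llm stands in for a real function-calling LLM.
def calculator(expression: str) -> float:
    return eval(expression)  # toy tool; never eval untrusted input in real code

TOOLS = {"calculator": calculator}

def stub_llm(transcript: list[str]):
    # First request a tool call; once a result is back, produce the final answer.
    if not any("TOOL RESULT" in line for line in transcript):
        return ("tool", "calculator", {"expression": "19 * 21"})
    return ("final", f"The answer is {transcript[-1].split(': ')[1]}", None)

def run_agent(goal: str, max_steps: int = 5) -> str:
    transcript = [f"GOAL: {goal}"]
    for _ in range(max_steps):
        kind, payload, args = stub_llm(transcript)
        if kind == "final":
            return payload                   # the agent ends in an answer/action
        result = TOOLS[payload](**args)      # execute the requested tool
        transcript.append(f"TOOL RESULT: {result}")
    return "stopped: step budget exhausted"

print(run_agent("What is 19 * 21?"))  # The answer is 399
```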
There are multiple agent frameworks that are quite popular these days: CrewAI, LangGraph, AutoGen. In most of them you'll see the same pattern: first you define a role, then you define a goal (a role and a goal), then you say which LLM you want to use as the backend engine, and you've put together a single agent. Then you put several of these together as a team, and that is your multi-agent setup.
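As an illustration of that role/goal pattern, here is a rough multi-agent sketch in the style of CrewAI. Treat the exact parameter names as assumptions and check the CrewAI docs for the current interface; the roles, goals, and task descriptions are made up for the example.

```python
# Multi-agent sketch using CrewAI's role/goal pattern (interface may vary
# by version -- check the CrewAI docs; roles and tasks here are illustrative).
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Collect key facts about the topic",
    backstory="You dig up accurate, up-to-date information.",
)
writer = Agent(
    role="Writer",
    goal="Turn the research into a short blog post",
    backstory="You write clear, engaging summaries.",
)

research = Task(description="Research LLM agents.",
                expected_output="A list of facts.", agent=researcher)
write = Task(description="Write a 200-word post from the research.",
             expected_output="A blog post.", agent=writer)

crew = Crew(agents=[researcher, writer], tasks=[research, write])  # the "team"
print(crew.kickoff())  # the agents work toward the shared goal in turn
```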
With agents, people are doing amazing things. You can make an agent book your ticket; you can make an agent read something, distill it, create a note, and publish the blog post. You can summon these agents to do a lot of things. Personally, agents are where I spend most of my reading time, because it's becoming quite obvious that agents are the next frontier in how we take LLMs forward. There are a lot of different directions, but personally I'm most interested in automation, and I think agents are going to be the next big thing; honestly, they're already a big thing. Google has its own agent projects under various names, OpenAI has its own agents, and every time you talk to a company, the conversation turns to agents, because you want to summon these agents and connect these LLMs to that extra dimension. The dimension we're connecting here is tools: you take LLMs, you have the function-calling ability, and once you connect them to tools, you're unlocking the potential of something immense. That is what you call agents. I'm not going deep into agents here, because I'm hoping this becomes a series, depending on how you all like it, and in that series my next focus is going to be agents. So: agents sit quite close to the top, and that takes us almost to the end of the video, which is our aspirational goal, what we're all trying to move toward: the LLM OS.
This is inspired by Andrej Karpathy, who created this amazing diagram. What's happening here? It's about putting the LLM at the center of an operating system. If you go back in time, the computer was created for simple calculation: you want to add a and b, you keep a as one and b as two, and you add them. That's roughly how computing started, way back in the day. Then computation kept growing and kept getting less expensive, with more compute, until we got the computers we have today. Karpathy is arguing: can we have a similar vision for LLMs? The vision is that you keep the LLM at the center. Around it you have RAM, which is the short-term memory, the context window. Then you have long-term memory, the disk or file system, which can be used with RAG. Then you have the agent structure, with tools; you connect it to the internet; and you connect it to other LLMs, like a multi-agent or peripheral setup. And then you have your peripheral devices for audio and video. Can we put together a system with all these pieces working toward a common goal? That would ideally become your large language model operating system. This is very much a vision at this point. There are certain implementations available, but they're based on current understanding: mostly LLMs plus function calling plus agents, multi-agent setups, more tools. The current LLM OSes are not a radically different architecture altogether. That's why, even in the framework I've created, the LLM OS is still developing: it's everything we've got (the tools, the extended tools, the peripheral tools, with long-term memory and short-term memory), taking just one input from the user, then running itself and executing things on its own. I think that's the future we're heading toward; I'm not sure when we'll get there. If somebody says "AGI," for me, today, it could look something like this, like Baby AGI. I don't trust AGI as a concept arriving anytime soon, but leaving consciousness and all of that aside, I'd say the LLM OS sits at the top, where we can expect something closer to AGI to happen, and all of these levels lead us up to there.
I wanted to keep this video brief, but it's already going to run more than half an hour. I wanted it to be a crash course, so that if you don't know anything about LLM apps, maybe you haven't taken any course, this helps you see how the future, the LLM OS, is coming together and what led us up to there. Let me know in the comment section if you like this kind of content, and I'll put together more. This took me a lot of time: creating the framework, designing it, and arranging it into a thought process that makes it understandable, and it's basically what a lot of LLM courses offer. I'm definitely looking forward to hearing more feedback, and if you like this kind of format, subscribe to the channel. See you in another video. Happy prompting!