Arun Gupta

Monday, 18 March 2024

GEN AI






Here we are going to learn an introduction to AI concepts. We will discuss what is actually meant by AI, what we can do with the help of AI, and we will go through the different Azure AI services. This is the main agenda behind module number one. So let's get started.


     

     So this term was actually coined in 1956. Try to understand: the term was just coined over there; nothing was built, only theoretically they explained what is actually meant by artificial intelligence. And over there they explained that artificial intelligence is nothing but this: if any machine or any software has the capability to work or mimic like a human brain. For example, if you have any machine like a robot, a computer, or any other machine, or if you have any application or software which has the capability to work or mimic like a human brain, that is artificial intelligence.


Generative means machines have the capability to generate something. If I ask you, can you write one essay on artificial intelligence, or can you write an essay on nature? Definitely you will do that; you will write an essay. If I ask you, can you sketch something for me? Definitely you will sketch for me,

because our brain has the capability to generate something. So under artificial intelligence, generative AI is one of the workloads, or you can say scenarios, through which machines have the capability to generate something. Now, generation happens in different scenarios.

That is, it is going to generate some text, some images, audio files, video files, music files, and 3D objects. Machines also have the capability to generate some sort of code, so this is called generative AI. In very simple words, I can say that if a machine has the capability to generate some content
with the help of input provided by the user, called a prompt, then it is called generative AI. Another workload that comes under AI is agents and automation. Now, what is actually meant by agents and automation? It is the extended version of generative AI.

In agents and automation, machines are going to perform tasks on behalf of us. That means there is no need for human interaction over there. Let us go with one example. I'm assuming that our office timing is 9 a.m. to 5 p.m., and my daily task is that I have to see
what type of mails come in from the different clients, and I have to reply to those mails. Now team, I have one question for you people. See, I'm getting the emails from 9 a.m. to 5 p.m. if the client is Indian. But sometimes what happens, I'm getting the emails at 2 a.m., 3 a.m., 1 a.m. Is it possible to reply to that email at that instant?
No, because the time zone is different. So what we can do: we can create such a type of agent, which will look into that email, which will read that email, which will understand the requirement, and it will look into our earlier emails, that is, how we previously dealt with that corresponding client, and reply accordingly. The third workload is natural language processing: the machine has to understand our language and we have to understand the machine's language, and that bridge is nothing but natural language processing, which is also called NLP.

NLP is the bridge between humans and machines, which is going to convert machine language into human-understandable language and vice versa. Then the fourth workload is computer vision, which is going to work with images and videos.

We have to provide captions to the images; this type of stuff can be done with the help of computer vision. In video analysis, we can find out the length of a video, and we can find out the sentiment of the video, whether that video is positive, negative, or neutral. So these terms come under computer vision.

And another workload is information extraction. That means it is going to extract information from different resources: websites, the local machine, different servers, different databases, data warehouses, or different services. So it is going to extract the information and save that information very appropriately, and whenever that information is required, it is going to send that information. And the last workload is machine learning. Machines are going to learn from the data, machines are going to learn from the provided information, and machines are going to find out trends and patterns.



It should be fair enough; that means it should not be biased toward anything. For example, suppose you created one AI product which is going to find out whether a person is eligible for a loan or not. So it is going to find out the result, yes or no. What type of input do we have to provide? We have to provide information about that person: the age of the person, annual income of the person, gender of the person, location of the person, and such types of things. Now remember, when you are creating that AI product, you train your AI product, your AI model, your machine learning model.
Now, remember, out of 1000 samples, suppose you consider 700 samples of female candidates and only 300 samples of male candidates. Now, try to understand what will happen here. You have not trained your model in good proportion: you trained your model with 700 samples of female candidates
and 300 samples of male candidates. So if you observe carefully, you introduced some bias; you did not provide equal weightage at the time of training. So what will happen? This corresponding AI product will learn very deeply
about the female candidates and very little about the male candidates. Now, suppose you have to test that corresponding AI product and you provide it about 10 samples for the testing purpose; remember, it will show bias with respect to the female candidates. Are you getting my point?
Because that model learned very much about the female candidates.


Our brain has the capability to analyse this data. We have these three columns: A, B, and Y. As I told you, A and B are my inputs, Y is the output. By looking into this, for example, you found that Y is equal to A into B. Now you can put any value of A and any value of B, and you can find out the predicted value as Y.
Any value might be present: in lakh terms, thousand terms, hundred terms, unit terms, no problem. Now try to understand how our brain found out that corresponding relation. Just now I put only three columns; suppose I put here 20 columns, a large number of columns. Now, try to understand: machine learning is similar. What you did just now, machine learning does the same. The machine is going to learn from the provided data what types of trends, patterns, and expressions are present over there, and it is going to find out one common equation.
And that equation is nothing but the model equation.
So, when the algorithm finds out that Y is equal to A into B, or when the algorithm finds out that Y is equal to A plus B plus C, something like that, that is called the model equation, and once you have that model equation, you can do any type of prediction.
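To make this concrete, here is a minimal sketch in Python (the numbers are made up for illustration) of spotting the relation Y = A into B from a table and then using that model equation for prediction:

import numpy as np

# Toy data: A and B are the inputs, Y is the output
A = np.array([2, 5, 10, 100])
B = np.array([3, 4, 7, 2])
Y = np.array([6, 20, 70, 200])

# The pattern our brain (or an algorithm) discovers: Y = A * B
assert np.array_equal(A * B, Y)

# Once the model equation is known, we can predict for unseen inputs
def predict(a, b):
    return a * b

print(predict(12, 9))   # 108, a value never present in the table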



So features are nothing but known characteristics of something, and the label is nothing but your output target variable, the thing we want to predict. So you can see: predicting the ice cream sales for a given day. Now, if I'm considering what type of data we have: we have the day of the week, we have month-related information, we have temperature-related information, and many more. Now, depending upon the day of the week, month, and temperature, we are going to find out the number of ice creams sold. So the features are day of week, month, and temperature,

and the label is the number of ice creams sold.
OK, so it will find out the equation: Y, that is the number of ice creams sold, is a function of day of week, month, and temperature.
You can see it.

f(x) = y: y is nothing but the label, and that label depends upon the function of x, that is, day of week, month, and temperature.
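As a rough sketch of f(x) = y (the numbers below are invented purely for illustration), a regression model can be fitted on these three features to predict the label:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical features x = (day_of_week, month, temperature in Celsius)
X = np.array([
    [1, 6, 30],    # a Monday in June, 30 degrees
    [6, 6, 35],    # a Saturday in June, 35 degrees
    [3, 12, 12],   # a Wednesday in December, 12 degrees
    [7, 7, 38],    # a Sunday in July, 38 degrees
])
y = np.array([120, 310, 25, 400])   # label: ice creams sold that day

model = LinearRegression().fit(X, y)   # learn f such that f(x) ~ y
print(model.predict([[5, 8, 33]]))     # predicted sales for a new day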




Now, when I'm considering machine learning, machine learning is basically divided into two: supervised machine learning, and the next one is unsupervised machine learning. Now, what is actually meant by supervised machine learning? In supervised machine learning, the training data includes known labels. Simply remember: if you have any data and in that data labels are present, then it is called supervised machine learning; and if you have any data in which labels are not present, then it is called unsupervised machine learning. The label is nothing but Y. That means if A, B, C are provided and Y is also provided, then it is supervised; if only A, B, C are provided, then it is unsupervised.

Now, supervised machine learning is divided into two: regression and classification. Regression is where your label or target variable consists of a continuous numeric value, like the total number of ice creams sold: we don't know what value it will take, someone will say 5, 10, 20, 50, 100, 1000, a million. So always remember: if your label or target variable is a continuous numeric value, then it is called regression, and we have to use regression algorithms over there.

Classification: if your target variable, that label, consists of a finite number of classes or a finite number of categories, like yes/no, true/false, zero/one, good/better/best, then it is called classification, and over there we have to use a
classification algorithm. Now, this classification is again divided into two. You can see here, binary classification is there; let me highlight that. In binary classification, only two classes are present, like yes/no, zero/one, low/high, likewise. And the other type of classification is multi-class classification: over there, multiple classes are present, like good/better/best, or grades A, B, C, D, E, likewise, or, you can see, predicting the species of a penguin based on its measurements, like species A, B, C.

So this is the idea behind supervised machine learning. Always remember: if we have the target variable, the label, then we have to use a supervised machine learning algorithm. Again, these are divided into regression and classification: if the target variable consists of continuous values, we have to go with regression; if the target variable consists of only classes or categories, we have to go with classification. And suppose you don't have the target variable inside your data: then we have to go with a clustering algorithm, where similar items are grouped together. You can see here, different plants are present, and depending upon their similarity, whether it is shape-related similarity or colour-related similarity, they are kept in one group. So we separate the plants into groups based on common characteristics. This comes under unsupervised machine learning. If you have any questions, please let me know.
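Here is a quick sketch of the three cases, using scikit-learn on tiny made-up data: a continuous label gives regression, a class label gives classification, and no label at all gives clustering:

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[1.0], [2.0], [3.0], [4.0]])      # features only

# Supervised, continuous label -> regression
y_cont = np.array([10.0, 20.0, 30.0, 40.0])
print(LinearRegression().fit(X, y_cont).predict([[5.0]]))

# Supervised, class label -> classification
y_class = np.array([0, 0, 1, 1])                # e.g. no/yes
print(LogisticRegression().fit(X, y_class).predict([[5.0]]))

# No label -> clustering: group similar samples together
print(KMeans(n_clusters=2, n_init=10).fit_predict(X))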

So, moving ahead with this, as I mentioned, there is a difference between an algorithm and a model. You can see here, we have the data, something like this: X1, X2, and X3 are my inputs, my features, and Y is the output, or label. Now, before applying this data, what I will do: I will cut this data, I will separate it. Some samples, the majority, you can consider about 70% to 80% of the samples, I'm keeping for the training purpose; the remaining 20% to 30% of the samples I'm keeping for validation, that is, the testing purpose. So only the training data, the data meant for the training purpose, will I pass to the algorithm. We know that an algorithm is nothing but a number of systematic steps, which is going to generalise the relationship between X and Y.


As a function: so it will find out the relationship between X and Y, Y = f(X), and when it finds out that equation Y = f(X), that is called the model equation. Now, once your model equation is ready, team, try to understand: when we provide the test data to the model, it will find out some result Y. Now, you can compare this predicted value with the actual value, and from this predicted and actual value you will get the evaluation of the model; that means whether your model is good or bad. So always remember: once the model equation is ready, we have to provide the test data for the evaluation purpose. Then the model will find out the Y-cap value, that is, the predicted value. This value we have to compare with the actual value, that is, the data kept aside for the testing purpose, and from that comparison we can evaluate the model. Remember: if the difference between the actual value and the predicted value is more, the error is more; if the difference between the actual value and the predicted value is less, the error is very less, that is, the model is good.

So, this is the idea behind model training and evaluation. Now, as we have seen, there are three types of algorithms that come under machine learning. The first one is regression, where our target variable is continuous. One algorithm is linear regression. This linear regression algorithm works on the principle of the straight-line equation Y = mX + c, where m is the slope and c is the constant term. Now, if you provide the information to that algorithm, it will find out such a best-fit line. That line is nothing but your model equation, Y = f(X). Now, you can provide any value of X, and it will find out the value of Y. You can compare this value with the test data, and you can evaluate.

Now, for the evaluation purpose, we can use error terms. Error is nothing but actual value minus predicted value. We can find out errors like the mean absolute error (the mean of the absolute value of actual minus predicted), the mean squared error (the mean of the square of actual minus predicted), then the root mean squared error (the square of actual minus predicted, averaged, and then the square root), and the coefficient of determination, that is, R squared: it tells us how much of the variation in Y is explained by the model.

Now, suppose we have to solve a problem of classification; then we have to go with such a sigmoid curve, where two classes are present, either zero or one. So any sample that lies in one region will be treated as a zero, and any sample that lies in the other region is treated as a one. You can see the example: logistic regression. For evaluation metrics, we can use accuracy; again, it compares the actual value and the predicted value, but here the values are zero, one, or whatever class values are present. Recall is another term, then precision and the F1 score. Precision and recall mainly focus on the true positive values, that is, the actual value is positive and the predicted value is also positive; how correctly those were predicted is represented by recall and precision. The combination of recall and precision is nothing but the F1 score.

Then, suppose we want to solve a problem of clustering; then we can go with an algorithm like K-means clustering. Remember, all the clustering algorithms work on the principle of distance.
That means, if the distance is less, the samples are kept in one group; if the distance is more, they are kept in another group. Now, suppose we have such algorithms, like K-means clustering, hierarchical clustering, or DBSCAN clustering: how can we evaluate them? We can do the evaluation on the basis of distance itself, for example, how close the samples are to their own cluster compared to the other clusters.
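To make those evaluation measures concrete, here is a sketch with scikit-learn on made-up actual and predicted values; the silhouette score at the end is one common distance-based measure for clustering:

import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error, r2_score,
                             accuracy_score, precision_score, recall_score,
                             f1_score, silhouette_score)
from sklearn.cluster import KMeans

# Regression metrics: actual vs predicted continuous values
y_true = np.array([100.0, 150.0, 200.0])
y_pred = np.array([110.0, 140.0, 195.0])
mse = mean_squared_error(y_true, y_pred)
print(mean_absolute_error(y_true, y_pred),    # MAE
      mse,                                    # MSE
      np.sqrt(mse),                           # RMSE
      r2_score(y_true, y_pred))               # coefficient of determination

# Classification metrics: actual vs predicted classes (0/1)
c_true = np.array([1, 0, 1, 1, 0])
c_pred = np.array([1, 0, 0, 1, 0])
print(accuracy_score(c_true, c_pred), precision_score(c_true, c_pred),
      recall_score(c_true, c_pred), f1_score(c_true, c_pred))

# Clustering: no labels, so judge by distances within and between groups
X = np.array([[1, 1], [1, 2], [8, 8], [8, 9]])
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(silhouette_score(X, labels))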







After that, there is one concept called deep learning. Deep learning is nothing but, you can say, a neural network. Now try to understand: inside our brain, a neural network is present; neurons are present. For example, I'm putting forward one scenario so that you will get the idea.

Our brain is going to find out that this cup of coffee is very hot. Now team, try to understand: to that brain, so many things are connected.
So many things are connected, like touch, the eyes, the ears, and the nose.
Now tell me, out of such a scenario, who is responsible for sending the information to the brain? The ears and nose are connected with the brain, with whatever brain functionalities are there, but in this scenario they are not taking part; they are not responsible for providing the input. So all of these are connected to the number of neurons which are present inside our brain, and the brain is going to process that. Now, depending upon that processing, it is going to tell us to wait for some time. So inside our brain, these things are happening. Is it possible to make such a type of artificial neural network? The answer is yes. And this is made by using the input, that is X, and the weight value W. Weight is nothing but who is more weighted, who is more responsible for that. In this scenario, the eyes and touch have more weightage, so we are going to assign more weightage to them; the ears and nose are less responsible for that, so we will assign very little weightage to them. Now, when we do the wrap-up of this, that is X and W, we wrap them into one activation function, which makes one neuron. So one neuron is made by using the inputs and weights, and a number of such neurons connected with each other makes the neural network; that is called an artificial neural network.

Remember, in an artificial neural network, basically three types of layers are present. One is the input layer, which has the capability to take the input from the scenario, the input from the data. An output layer is present, which has the capability to find out, or predict, the output. And in between, a number of hidden layers are present; the number of hidden layers depends upon the application. If the application is complex, the number of hidden layers increases. If the application is not so complex,
the number of hidden layers is less. Okay, now if you observe carefully, we can't see what type of things are happening inside our brain; that is why these in-between layers are called hidden layers. You can see in this example, I'm predicting the species of a penguin by passing the input parameters, whether it is length-related information or weight-related information, likewise. So I'm putting in the values X1, X2, X3, and X4; these are four features. The feature values are provided, weights are assigned, then the hidden layers are going to process that, and finally we are getting the output in a probabilistic way. So for species 0 the probability is 0.2, for species 1 the probability is 0.7, and for species 2 the probability is 0.1. Now, if you observe carefully, species 1 acquires the maximum probability, that is 0.7, so the final answer will be species 1. Such things happen in deep learning, which is the extended version of machine learning.
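As a minimal sketch of that penguin example in PyTorch (the layer sizes and measurements are made up, and the network is untrained, so the probabilities are arbitrary; it only shows the input/hidden/output structure and the pick-the-maximum-probability step):

import torch
import torch.nn as nn
import torch.nn.functional as F

class PenguinNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 16)    # 4 input features -> hidden layer
        self.out = nn.Linear(16, 3)       # hidden layer -> 3 species scores

    def forward(self, x):
        x = F.relu(self.hidden(x))               # activation wraps inputs * weights
        return F.softmax(self.out(x), dim=-1)    # one probability per species

x = torch.tensor([[39.1, 18.7, 181.0, 3750.0]])  # X1..X4 for one penguin
probs = PenguinNet()(x)
print(probs, probs.argmax(dim=-1))               # species with maximum probability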



Now, in Azure AI Foundry, different models are available. LLM models are available; LLM in the sense of large language model. So whatever you are using right now, like generative AI applications, for example ChatGPT or Copilot, behind them LLM models are present, and through agents you can achieve your automation.

Then, for natural language processing purposes, the Azure AI Language service is available, through which you can do language translation, language recognition, and question answering; you can create a chatbot, likewise. Then the Azure AI Translator service is present, which is going to translate one language into another language. And the Azure AI Speech service is present, which is going to convert your speech into text format.

Then a computer vision service is available: with the Azure AI Vision service you can deal with images and video files, and one dedicated Face service is available. This service is basically used to deal with faces. That means, suppose you enter into one premises, for example an airport; there are different face recognisers present, some applications are present which are going to identify your face. Depending upon that face, it is going to find out or collect some information and store that information, likewise. So under computer vision, the Face service is available.

Under the information extraction workload, different services are available, like the Azure AI Document Intelligence service. Let us consider one scenario. Suppose inside your organisation, many similar types of documents are present, and you have to analyse those documents manually. Definitely it will take much time to do the analysis manually. So what we can do: we can use that Azure AI Document Intelligence service, where we can upload the documents, and once the analysis starts, it will find out the appropriate extracted information from those documents. Then we have Azure AI Content Understanding, which will understand your content if you are uploading some books or some large corpus over there. This Content Understanding service has the capability to understand the content.





When I'm taking up the name generative AI: we know that in the last months of 2022, as ChatGPT came into the picture, the whole era of artificial intelligence changed. Because as these GPT models came into the picture, people were asking so many questions, and the models were generating the responses. What do we have to understand behind this? What type of things are present?

So, when I'm considering generative AI, behind generative AI one concept is present, called the LLM, the large language model. Now, what is actually meant by this large language model? The name itself specifies that these models have already learned from a large amount of data, a very massive amount of data. For example, I hope you remember some models you have seen, say 7B or 18B; that means in these models 7 billion or 18 billion parameters are present, so you can imagine how big or complex the model is. These parameters are nothing but deep learning parameters: these models are made by using deep learning architectures.

Now, these language models have the capability to learn from data which is present in language format, image format, audio format, video format, and many more. Once these models have learned from the data, they are going to create one sort of vocabulary, and once the vocabulary is created in an appropriate format, you can ask any question to that model. If the vocabulary is created, it has the capability to generate the response for you.

Now, let us try to understand which types of responses they are going to generate. As I mentioned previously, they have the capability to generate a response in text format, in image format, in video format, audio format, music format; they have the capability to generate a response in 3D-object format, and also in code format. So if you observe carefully, they have a very good capability to generate responses in different, different formats.

OK, so here some examples are provided. You can see the first example is natural language generation. When you ask any question, like "write a cover letter for a job application", to this LLM model, as I mentioned, this model has already learned from a large amount of data and has already created a good vocabulary. Depending upon your question, which is called a prompt (in generative AI, whatever question is asked by the user is treated as a prompt), it is going to generate the response. You can see the quality of the response: "Dear ..., please find enclosed my application for the role of ...". So the model, that gen AI tool, has the capability to generate the response in text format. Suppose that model supports images also, and you say, "create a logo for a florist business": you can see how impressively it is going to generate the image for us.
Or if you put an input like this, "write Python code to add 2 numbers", it will be treated as code generation, and it will create the code for us. Try to understand, there is one concept called a multimodal model. Multimodal models are one part of the LLM family.


Now, when I open Copilot, if I ask such a question, "write a cover letter for a job application", it will write the application. If I ask this question, "create a logo for a florist business", it will create an image. So try to understand: if any model, any single model, has the capability to work with different data types, like



text, images, video, audio, music, 3D objects, or code, then it is called a multimodal model. Nowadays there are so many multimodal models present, through which we can find the best result by using a single model.

Now, let us try to understand this scenario in a deeper way. You can see there are two types of language models: small language models are present, and large language models are present. Now, these language models work on the principle of the transformer architecture. What is actually meant by the transformer architecture? In the transformer architecture, two blocks are present: the first block is called the encoder, and the second block is called the decoder. The decoder is responsible for the generation purpose; the encoder is responsible for creating the vocabulary, or understanding the input.

Now, try to understand with this example, with this diagram. Suppose you have a very large amount of data; the data is present in audio format, music format, text format, or image format, anything. You are going to pass this data to the encoder. Now, remember, in this encoder architecture, some blocks are present. The first block is called a tokenizer. What is meant by a tokenizer? Suppose you are uploading many text documents over there; how will the machine understand? This tokenizer is going to separate your whole sentence into small words; it is going to separate your paragraph into small chunks, or tokens. So the tokenizer is responsible for converting your bigger data, you can say paragraphs or sentences, into small single words. Now, once it is tokenized, it is going to assign one numeric value to each token.

Fine; after the tokenizer, what happens? There is one concept called the embedding model. This is so important: remember, if you are dealing with the transformer architecture, embedding is so important. Embedding is going to encode, or convert, your text tokens into vector format, like this. So we can see dog, cat, puppy, and skateboard. What will the tokenizer do? The tokenizer will just give dog the value one; it will give cat the value two; for puppy, it will give the value three; and for skateboard, it will give a value something like four. But from these values, one, two, three, four, the machine will not understand the meaning. How will the machine understand the meaning? For this purpose, we need the embedding. Remember, there are dedicated embedding models present, alongside the LLM models. This embedding model has the capability to convert your tokens into vector format, something like this.

Now, what is the role of this vector, this representation of tokens in vector format? These embedding vectors encapsulate the semantic meaning, the semantic attributes, in multiple dimensions. Try to understand: how will the machine understand that dog and cat are very close to each other, dog and puppy are very close to each other, and dog and skateboard are far away from each other? Because of this embedding vector. This embedding vector will represent your tokens in multi-dimensional format, and it will find out which words are very close to each other and which words are far away from each other, and it will create a good vocabulary over there.
Now, once the good vocabulary is created, you can ask any question over there. You can see like this: "when my dog was". The input is provided; now it will find out which words are very close to dog, which words are far away from dog, which words are similar to dog, and depending upon that, it will find out the result as puppy: "when my dog was a puppy", like this. Okay, so language models work on this principle. They are going to understand the meaning first, and depending upon that understanding, they are going to generate, to create, the response. They are just going to do the completion in a very sequential and very accurate way.
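Here is a small sketch of that idea: a few invented 3-dimensional embedding vectors (real models use hundreds of dimensions), with cosine similarity as the closeness measure:

import numpy as np

# Invented embeddings: dog, cat, and puppy point in a similar direction
emb = {
    "dog":        np.array([10.3, 4.5, 1.2]),
    "cat":        np.array([10.3, 4.9, 1.4]),
    "puppy":      np.array([ 9.7, 4.3, 1.1]),
    "skateboard": np.array([-8.1, 2.0, 7.9]),
}

def cosine(a, b):
    # close to 1.0 means similar meaning; near 0 or negative means unrelated
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

for word in ["cat", "puppy", "skateboard"]:
    print("dog vs", word, "=", round(cosine(emb["dog"], emb[word]), 3))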



So, as I mentioned, in the encoder the first step is the tokenizer. Try to understand in the next slide: the tokenizer is the first step in training the transformer model, which is going to decompose the training text into small chunks called tokens. I considered one example sentence: "I heard a dog bark loudly at a cat". Now you can see each corresponding word is separated: I is separated, heard is separated, a is separated, dog is separated, then bark, loudly, at, cat; all are separated, and each word is assigned one numeric value so that the machine will understand. Remember, machines have their own language, which is in numeric format, and we humans have our own language, which is non-numeric in nature. So when we are writing something in our language, the machine will be confused initially, and the tokenizer will help there: it is going to convert your language into machine-understandable, numeric format.

The sentence "I heard a dog bark loudly at a cat" is represented as 1, 2, 3, 4, 5, 6, 7, 3, 8. Three is repeating because "a" is repeating in that sentence. Similarly, suppose I have to represent the sentence "I heard a cat": I can write it as 1, 2, 3, 8. So this is the idea behind the tokenizer.

After the tokenizer, the embedding comes into the picture, which is going to find out the relationship between the tokens in vector format. So the semantic relationship between tokens is encoded in vectors, known as embeddings. You can see the token values are 4, 8, 9, 10, but the embedding values are vectors, like this: a three-dimensional representation of each word. Now from that, the machine will get the idea of which words are very close to each other and which words are far away from each other.
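A toy tokenizer that reproduces exactly this numbering scheme (each new word gets the next free id, and repeats reuse their id) might look like this:

def tokenize(text, vocab):
    ids = []
    for word in text.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab) + 1   # assign the next numeric value
        ids.append(vocab[word])
    return ids

vocab = {}
print(tokenize("I heard a dog bark loudly at a cat", vocab))
# [1, 2, 3, 4, 5, 6, 7, 3, 8] -- 3 repeats because "a" repeats
print(tokenize("I heard a cat", vocab))
# [1, 2, 3, 8]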





And finally, we have the attention mechanism. When we are considering the LLM model, there is one attention mechanism, which is based on deep learning algorithms, on neural networks, only. So attention is nothing but this: it is going to capture the strength of the relationship between
tokens using the attention technique. Simply remember, attention is going to find out which words are the focus, which words are so much more important. So what is the goal of this attention mechanism? It is going to predict the token that comes after a specific sequence. Now, in my case, I'm going to
predict the token after dog. So I represented "I heard a dog" as vectors, and I assigned "heard" and "dog" more weight; that means these are the more important words here. Now it will show several possible tokens that can come after dog.


The most probable token is added to the sequence, in this case bark. So if I write the sentence "I heard a dog", it will find out that the most appropriate next word is bark.

On what basis is it going to find that out? As we have seen, it finds out the relationship between each word, and depending upon that relationship, it will find out which word is most probable, or very close, to come next.
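A toy version of that attention step (with invented 2-dimensional vectors) can be sketched as a softmax over dot products with the last token, so that related words like "heard" receive more weight with respect to "dog":

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

tokens = ["I", "heard", "a", "dog"]
vectors = np.array([
    [0.1, 0.0],   # I
    [0.9, 0.7],   # heard
    [0.0, 0.1],   # a
    [1.0, 0.8],   # dog
])

# How strongly each token relates to the final token, "dog"
weights = softmax(vectors @ vectors[-1])
for t, w in zip(tokens, weights):
    print(t, round(float(w), 2))   # "heard" and "dog" dominate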

Now, as I mentioned, there are two types of language models present: the LLM, that is the large language model, and the other one is the SLM, that is the small language model. Remember, depending upon your application, you choose which language model to use, because we have to pay for these models.







These are not freely available; very few models are freely available as open-source models, but for most of the models we have to pay. So depending upon our application, we have to select the appropriate model, whether it is an LLM or an SLM. So we must know the difference between LLM and SLM. Let us try to understand.

LLM models are trained with a sheer, very large volume of data. They are going to learn from text data, image data, likewise, and they consist of billions of parameters. That means you can imagine how big the neural network present to learn such data is. Whereas SLMs are trained with very focused text data. For example, if you have to build one application with respect to your business only, you do not require terminology from other domains at all. So in that case, you can provide your own data and train your model; you can select in that case an SLM, that is a small language model, because it consists of very focused text data and very few parameters.

Let us jump back to the LLM. LLMs show comprehensive language generation capabilities in multiple contexts, because they have already learned from a very large amount of data and have a massive vocabulary. But SLMs are focused on language generation capability in a very specialised context, because their vocabulary is limited. An LLM's size is very large, and it impacts performance and portability: when we want to port that model onto another platform, it is very difficult. But SLMs are very fast, because of the limited vocabulary, and portable.

LLM models are time-consuming and also expensive, because for the training purpose we have to put in some cost in terms of compute as well as training data, and it is also going to take some time for fine-tuning, suppose you have to tune that model with your own data. Whereas SLMs are much faster; they are not so expensive, because we train them with a very small amount of data in a very small time duration, and if we have to fine-tune that model with our data, it is very easy, or not so time-consuming.

You can see some examples. LLMs are GPT-4.5 and GPT-5; then we have Mistral 7B, that is, 7 billion parameters; Llama 3 and Llama 2 also come under there. In the SLMs, we have models like Microsoft's Phi family and Hugging Face's GPT-Neo. So depending upon our application, or which type of problem we are solving, we have to select either an LLM model or an SLM model.



Now you can see how it actually works. Suppose you are using any generative AI tool, whether you are using ChatGPT, Copilot, Gemini, or any other. Remember, as I mentioned, these models have already learned from a massive amount of data; they have a good vocabulary. Now, it is in our hands how we ask the question.

If we ask the right question, an appropriate question, definitely it will give us the appropriate output. So our task is to provide clean, consistent, reliable, and accurate information, so that we can expect the desired output, and this is called...

prompt engineering. So, while drafting or writing any prompt, we have to consider some parameters, and these parameters are specified in this corresponding slide. You can see, I'm asking one question to the gen AI application; this gen AI application will send my query to the language model.

And finally, I will get the result. But as I mentioned, we have to provide clean, consistent, reliable, and accurate input as the prompt. So, first, always start with a specific goal for what you want the output to be. You can see in this case: summarise the key considerations for adopting

Copilot described in this document. So you just have to provide the specific goal: I have to summarise this,

or summarise the key considerations.

Second point: provide a source to ground the response in a specific scope of information. You can provide some sort of document information if you have it.

Third point: you can add some context to maximise the appropriateness and relevance of the response. You can see in this case: for a corporate executive. So if you provide this "for a corporate executive", the machine, that LLM model, will get more of an idea:

the user provided this context, and the user is tending toward this corporate executive angle. Fourth point: you have to set clear expectations for the response. In this case, you can see: format the summary as no more than six bullet points with a professional tone. So here they specified

a maximum of 6 bullet points and a professional tone. And the fifth one, the last one: suppose you put in your prompt, but you are not getting the desired output. Always remember, we have to do iteration.

So you have to iterate, based on the previous prompts and responses, to refine the result.
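Putting those five points together, a prompt might be assembled like this (a sketch; the wording and the document reference are placeholders, not an actual API call):

# The prompt-engineering elements from the slide, composed into one prompt
goal = "Summarise the key considerations for adopting Copilot described in this document."
grounding = "Use only the attached document as your source."       # ground the response
context = "The summary is for a corporate executive."              # audience context
expectations = "Format the summary as no more than 6 bullet points, with a professional tone."

prompt = "\n".join([goal, grounding, context, expectations])
print(prompt)
# If the response is not right the first time, refine the prompt and send it again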

In many cases, what will happen? You will not get the result on the first attempt. So we have to take care over there: we have to iterate on that corresponding input and see the result. So this is the idea behind prompt engineering. Now, the next part is the agent. What are agents? Let us try to understand this scenario.










So this LLM is going to perform the task on our behalf. It is going to understand your Teams messages; it is going to understand your WhatsApp, Slack, and Jira messages, and it will try to do the automation. So you can see here: when the LLM is going to

connect with different tools, it is going to retrieve some information, it is going to take or make some action, it is going to deal with memory, and depending upon that it will give you the final output; then it becomes an agent. Let us try to understand: agents are generative AI applications, because

the base behind the agent is the LLM. So these are generative AI applications that can respond to user input, or assess a situation autonomously, and take the appropriate action. That means, on behalf of us, they are going to perform the task. They are going to send the messages,

they are going to read the data, they are going to do the automation. Now, when we have to create an agent, always remember, first we need the LLM model, because without the LLM model we can't do anything. So the LLM, that model, powers the reasoning and language understanding.

It will understand the scenario; it will understand your Teams messages and everything. Then you can define some instructions, whether as a system message or otherwise: you can specify how your LLM model should work and in which role it is. Suppose you are a programme coordinator:

you can specify to the LLM, as an instruction or system message, "you are a programme coordinator", so that the LLM model will learn, and give the response, in that way, from that angle only. So you can provide some instructions that define the agent's goals, behaviour, and constraints, whatever

things you want over there. And finally, it is going to connect with different tools: let the agent retrieve knowledge, take or make an action, or connect with different memory. So team, remember, in very simple terms: when an LLM is going to connect with a tool, and depending upon that, it is going to take or make some action,

then it is called an agent. Agents are basically used for the automation purpose. Right now, in organisations, in the industry, agents are used mainly for automation.
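In very simple Python, the agent pattern looks something like the sketch below. Everything here is a hypothetical stand-in: llm() fakes the model's decision, and the two tools imitate an inbox connector and an outgoing reply; a real agent would call an actual model and real connectors.

SYSTEM_MESSAGE = "You are a programme coordinator."   # instructions / role

def llm(system, memory, observation):
    # Stand-in for the model that powers reasoning: unread mail -> reply
    if "unread mail" in observation:
        return {"tool": "send_reply",
                "args": {"text": "Thanks, we will respond in detail by 9 a.m."}}
    return {"tool": None}

def read_inbox():
    return "unread mail from client at 2 a.m."        # tool: retrieve knowledge

def send_reply(text):
    print("Sending reply:", text)                     # tool: take an action

memory = []                                           # the agent's memory
observation = read_inbox()
decision = llm(SYSTEM_MESSAGE, memory, observation)
if decision["tool"] == "send_reply":
    send_reply(**decision["args"])
memory.append((observation, decision))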



If the temperature value is high, it will consider many more words; its radius is going to increase, it is going to consider so many more words, and then...

Yeah, both are very similar; I will explain top P to you. So if the temperature value is higher, you will get more unusual words; that means you will get a more creative answer. So in which case can we use that? Suppose you have to write one poem, which is imaginary: you have to write one poem on rain. Suppose you have to create one application through which you are going to write some poems, or some imaginative, creative types of things: we always have to put the temperature value high. Now, top P is very similar to the temperature value, but the small difference is that it is going to consider the top probabilities. So suppose I consider one base word, and I have to find out the words related to it, and I put the probability threshold at 0.1. Can you tell me how many words I will get there? Just tell me, high or low.




If the probability threshold is less, it means I will get more words.

Try to understand. See, I have my base word, and I have to find out those words whose probability is greater than 0.1. Definitely, so many words will be there. But if I put the restriction that I need only those words whose probability is more than 0.9,

how many words will be there? Very few.

No problem. Let us try to understand. I have one word, good.



Exactly. So, see, for good, I'm putting better here, with probability 0.8.

For best, I'm putting the probability at something like 0.8 as well.

For wow, I'm putting the probability at 0.6.

And for high, I'm putting a smaller probability, still above 0.2.

OK, now suppose I put the top P value at 0.21: how many words will be considered?

All 4.
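Here is a sketch of both knobs on a toy next-word distribution after "good" (the scores are invented). One note: in the standard "nucleus sampling" definition, top P keeps the smallest set of words whose cumulative probability reaches P, rather than applying a per-word cut-off:

import numpy as np

words = np.array(["better", "best", "wow", "high"])
logits = np.array([2.0, 1.9, 1.2, 0.4])   # made-up model scores after "good"

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Temperature: higher T flattens the distribution, so rarer, more
# "creative" words get a real chance of being picked
for T in (0.5, 1.0, 2.0):
    print("T =", T, np.round(softmax(logits / T), 2))

# Top P (nucleus sampling): keep the smallest set of words whose
# cumulative probability reaches P, then sample only from that set
p = 0.9
probs = softmax(logits)
order = np.argsort(probs)[::-1]                  # most probable first
kept = order[np.cumsum(probs[order]) <= p]
kept = order[:len(kept) + 1]                     # include the word crossing P
print("kept for sampling:", words[kept])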














What are Azure AI Foundry projects? Under Azure AI Foundry, you can create one resource, and in that resource various other components are present. So team, I think one person asked the question: suppose you have to do the integration of different Azure AI services with an agent. We can do that with the help of Azure AI Foundry. Here in Azure AI Foundry, different models are present; by using those models, you can create your gen AI application. Suppose you have to work with agents: different Azure AI Foundry agents are available. We can take those AI agents and different Azure AI services, like language translation, the document service, the vision service, and we can do the integration of that. That means if you want to create a small agent, we can create that, and if you have to create one agent which is going to solve problems of real complexity, we can do that also. So, with the help of Azure AI Foundry, we can create gen AI solutions and agentic AI solutions by integration.

Now, when I open Azure AI Foundry, different models are available. When I deploy any model, I can test that model with the help of a playground. So a chat playground is there; similarly, other playgrounds are present, like an image playground and audio-type playgrounds. Now, when we have to deal with Azure AI Foundry, we can find the best model which is going to suit our need. How can we find that out? Always remember, when you enter into the Foundry, there is one option called compare models. We can compare multiple models, like GPT-4o, GPT-4o mini, GPT-4.5, GPT-5, likewise, and from that comparison we can find out which is the best model for our scenario, for our business statement. So you can see such types of models available: OpenAI models like GPT-5, Microsoft's Phi models, and popular third-party models are also available.

Once you select the appropriate model, after comparison or from your previous business knowledge, then you can deploy the model you want to use in your application. And once the model is deployed, with the help of the playground you can test your model and see how it's working. In this, you can add your data also, you can play with your parameters also, and you can add some examples if you need them.

Now, in Azure AI Foundry we can create agents also. To create any agent, what we have to do: we have to specify the agent name; we have to specify the model deployment, which model you are going to use for the agent creation purpose; and the knowledge tools. You have to specify the list of tools where that LLM model, the agent, is going to connect, like a website or your local machine, likewise. Then, what different action tools are present? That is, what types of actions is it going to take: whether it is going to send email, Outlook will be there; whether it is going to send messages, WhatsApp will be there, Messenger will be there, likewise. And if you want to connect one agent to another agent, you can connect different agents also. Now, once your agent is ready, again you can test your agent in the playground.

So you can see here one screenshot: when you are asking the question, it is going to generate the response, but actually it is one of the agents. Based on this, we have one small exercise, so we will see how we can explore generative AI in the Azure AI Foundry portal.

With the help of NLP, as I mentioned, we can make a bridge between humans and machines. So, for example, if you have one newspaper, one text, or any document, and you upload that document to the machine, initially the machine will not understand your document, because the machine's language is different and our language is different. So we have to do some sort of adaptation; we have to do some sort of pre-processing. That means we have to convert our raw text, let me highlight this, we have to convert our raw text into machine-understandable format. So you can see here, I'm applying tokenization.

With the help of tokenization, whatever raw text is present is converted into one structured format where, for every token, one numeric value is assigned, called a token value, and the whole process is called tokenization. In pre-processing, other steps are also available: we can remove the unwanted white spaces, we can remove the special symbols, likewise. So tokenization is one of the steps in, or after, the pre-processing. Now, once the corresponding information or input is present in machine-understandable format, then we can pass that information for the training purpose.

Now, for the training, there are different models available; so many models are available. Depending upon our application, we have to select the model. Now, what models are available? Let me put some models here so that you will get the idea, so...

The first model I'm putting here.

Just a moment, thank you. So let me write here some models. T5 is one of the models we can use for natural language processing; with the help of this model we can perform so many things. Then we have another model, which is called BERT (Bidirectional Encoder Representations from Transformers), which is again one of the good models through which we can perform so many NLP-related tasks.

Now, once we select any model (these are the basic and very simple models for natural language processing), this model is going to help us to perform so many tasks over there. So which types of tasks can we perform? You can see here: speech-to-text and text-to-speech conversion are possible. Now, remember, for this purpose we need some sort of model, such as the T5 model and the BERT model, which are massive models used for natural language processing. There are not only these two models; other different models are available when we have to play with the NLP concept.

Now, after this basic introduction to NLP, we got the idea that with the help of NLP we are going to perform some sort of stuff like text analysis, opinion mining, machine translation, summarisation, and conversational AI, and before that some steps are present, like pre-processing, converting raw text into an appropriate format, and tokenization, converting your natural-language data into machine-understandable format; and then it is going to do the training.

Now, there are different techniques we can use for language modelling and text analysis purposes. These techniques are basically divided into two parts: statistical techniques, and the other one is semantic modelling, which is also called dynamic techniques. Now, with the help of statistical techniques, we can perform classification. For example, you have to find out whether your email is spam or not; that means just two classes are present, so we can do the classification. For this purpose, we can use logistic regression,
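As a small sketch of the tokenization step described above with a real BERT tokenizer (this assumes the Hugging Face transformers package is installed; the model name is just one common choice):

from transformers import AutoTokenizer

# Load the vocabulary of a pre-trained BERT model
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "I heard a dog bark loudly at a cat"
print(tokenizer.tokenize(text))   # raw text -> tokens
print(tokenizer.encode(text))     # tokens -> numeric token values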




















And sometimes it hides that information for security purposes. With the help of the Azure AI Language service, we can also do summarization, meaning we can condense very long content into a short version with the same meaning.
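A rough sketch of calling summarization from Python follows, assuming the azure-ai-textanalytics package; <endpoint> and <key> are placeholders for your own Language resource values.

#A hedged sketch of extractive summarization with the Azure AI Language service
#(assumes: pip install azure-ai-textanalytics; <endpoint> and <key> are placeholders)
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(endpoint="<endpoint>", credential=AzureKeyCredential("<key>"))
documents = ["<very long content to summarise goes here>"]

poller = client.begin_extract_summary(documents)   # long-running operation
for result in poller.result():
    if not result.is_error:
        print(" ".join(sentence.text for sentence in result.sentences))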

Azure also provides a Translator service. We know that translation is nothing but converting one language into another. We can do text translation, converting small paragraphs and sentences into the appropriate language, or I can say the destination language, and we can do document translation as well.
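Here is a rough sketch of text translation through the Azure AI Translator REST API, assuming the requests package; <key>, <region>, and the French target language are placeholders chosen for illustration.

#A hedged sketch of text translation with the Azure AI Translator REST API
#(<key> and <region> are placeholders for your own Translator resource)
import requests

url = "https://api.cognitive.microsofttranslator.com/translate"
params = {"api-version": "3.0", "from": "en", "to": "fr"}   # "fr" = example destination language
headers = {
    "Ocp-Apim-Subscription-Key": "<key>",
    "Ocp-Apim-Subscription-Region": "<region>",
    "Content-Type": "application/json",
}
body = [{"text": "Hello, how are you?"}]

response = requests.post(url, params=params, headers=headers, json=body)
print(response.json()[0]["translations"][0]["text"])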
































































a similar type of file or document, and you can create a model. See, not much training time is required, because we just have to upload the file or document and train on it.



Then you can go with the Azure AI Content Understanding service. There, under the Azure AI Foundry content understanding services, you can upload images, and you can analyse video as well. So suppose you upload a video and you want to find out the...



























Building the AI Lunar Lander - Complete Code

#Part 0 - Installing the required packages and importing the libraries
!pip install gymnasium
!pip install "gymnasium[atari, accept-rom-license]"
!apt-get install -y swig
!pip install gymnasium[box2d]

import os
import random
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from collections import deque, namedtuple
#Part 1 - Building the AI
#Creating the architecture of the Neural Network
class Network(nn.Module):

  def __init__(self, state_size, action_size, seed = 42):
    super(Network, self).__init__()
    self.seed = torch.manual_seed(seed)
    self.fc1 = nn.Linear(state_size, 64)
    self.fc2 = nn.Linear(64, 64)
    self.fc3 = nn.Linear(64, action_size)

  def forward(self, state):
    x = self.fc1(state)
    x = F.relu(x)
    x = self.fc2(x)
    x = F.relu(x)
    return self.fc3(x)
#Part 2 - Training the AI
#Setting up the environment
import gymnasium as gym
env = gym.make('LunarLander-v2')
state_shape = env.observation_space.shape
state_size = env.observation_space.shape[0]
number_actions = env.action_space.n
print('State shape: ', state_shape)
print('State size: ', state_size)
print('Number of actions: ', number_actions)
#Initializing the hyperparameters
learning_rate = 5e-4
minibatch_size = 100
discount_factor = 0.99
replay_buffer_size = int(1e5)
interpolation_parameter = 1e-3
#Implementing Experience Replay
class ReplayMemory(object):

  def __init__(self, capacity):
    self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    self.capacity = capacity
    self.memory = []

  def push(self, event):
    self.memory.append(event)
    if len(self.memory) > self.capacity:
      del self.memory[0]

  def sample(self, batch_size):
    experiences = random.sample(self.memory, k = batch_size)
    states = torch.from_numpy(np.vstack([e[0] for e in experiences if e is not None])).float().to(self.device)
    actions = torch.from_numpy(np.vstack([e[1] for e in experiences if e is not None])).long().to(self.device)
    rewards = torch.from_numpy(np.vstack([e[2] for e in experiences if e is not None])).float().to(self.device)
    next_states = torch.from_numpy(np.vstack([e[3] for e in experiences if e is not None])).float().to(self.device)
    dones = torch.from_numpy(np.vstack([e[4] for e in experiences if e is not None]).astype(np.uint8)).float().to(self.device)
    return states, next_states, actions, rewards, dones

#Implementing the DQN class
class Agent():

  def __init__(self, state_size, action_size):
    self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    self.state_size = state_size
    self.action_size = action_size
    self.local_qnetwork = Network(state_size, action_size).to(self.device)
    self.target_qnetwork = Network(state_size, action_size).to(self.device)
    self.optimizer = optim.Adam(self.local_qnetwork.parameters(), lr = learning_rate)
    self.memory = ReplayMemory(replay_buffer_size)
    self.t_step = 0

  def step(self, state, action, reward, next_state, done):
    self.memory.push((state, action, reward, next_state, done))
    self.t_step = (self.t_step + 1) % 4
    if self.t_step == 0:
      if len(self.memory.memory) > minibatch_size:
        experiences = self.memory.sample(minibatch_size)
        self.learn(experiences, discount_factor)

  def act(self, state, epsilon = 0.):
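    # ε-greedy policy: with probability 1 − ε exploit the greedy action, otherwise explore at random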
    state = torch.from_numpy(state).float().unsqueeze(0).to(self.device)
    self.local_qnetwork.eval()
    with torch.no_grad():
      action_values = self.local_qnetwork(state)
    self.local_qnetwork.train()
    if random.random() > epsilon:
      return np.argmax(action_values.cpu().data.numpy())
    else:
      return random.choice(np.arange(self.action_size))

  def learn(self, experiences, discount_factor):
    states, next_states, actions, rewards, dones = experiences
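    # Bellman target from the frozen target network: y = r + γ · max_a' Q_target(s', a'), zeroed at terminal states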
    next_q_targets = self.target_qnetwork(next_states).detach().max(1)[0].unsqueeze(1)
    q_targets = rewards + discount_factor * next_q_targets * (1 - dones)
    q_expected = self.local_qnetwork(states).gather(1, actions)
    loss = F.mse_loss(q_expected, q_targets)
    self.optimizer.zero_grad()
    loss.backward()
    self.optimizer.step()
    self.soft_update(self.local_qnetwork, self.target_qnetwork, interpolation_parameter)

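  # Polyak soft update: θ_target ← τ·θ_local + (1 − τ)·θ_target, with τ = interpolation_parameter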
  def soft_update(self, local_model, target_model, interpolation_parameter):
    for target_param, local_param in zip(target_model.parameters(), local_model.parameters()):
      target_param.data.copy_(interpolation_parameter * local_param.data + (1.0 - interpolation_parameter) * target_param.data)
#Initializing the DQN agent
agent = Agent(state_size, number_actions)
#Training the DQN agent
number_episodes = 2000
maximum_number_timesteps_per_episode = 1000
epsilon_starting_value  = 1.0
epsilon_ending_value  = 0.01
epsilon_decay_value  = 0.995
epsilon = epsilon_starting_value
scores_on_100_episodes = deque(maxlen = 100)

for episode in range(1, number_episodes + 1):
  state, _ = env.reset()
  score = 0
  for t in range(maximum_number_timesteps_per_episode):
    action = agent.act(state, epsilon)
    next_state, reward, done, _, _ = env.step(action)
    agent.step(state, action, reward, next_state, done)
    state = next_state
    score += reward
    if done:
      break
  scores_on_100_episodes.append(score)
  epsilon = max(epsilon_ending_value, epsilon_decay_value * epsilon)
  print('\rEpisode {}\tAverage Score: {:.2f}'.format(episode, np.mean(scores_on_100_episodes)), end = "")
  if episode % 100 == 0:
    print('\rEpisode {}\tAverage Score: {:.2f}'.format(episode, np.mean(scores_on_100_episodes)))
  if np.mean(scores_on_100_episodes) >= 200.0:
    print('\nEnvironment solved in {:d} episodes!\tAverage Score: {:.2f}'.format(episode - 100, np.mean(scores_on_100_episodes)))
    torch.save(agent.local_qnetwork.state_dict(), 'checkpoint.pth')
    break
#Part 3 - Visualizing the results
import glob
import io
import base64
import imageio
from IPython.display import HTML, display

def show_video_of_model(agent, env_name):
    env = gym.make(env_name, render_mode='rgb_array')
    state, _ = env.reset()
    done = False
    frames = []
    while not done:
        frame = env.render()
        frames.append(frame)
        action = agent.act(state)
        state, reward, done, _, _ = env.step(action.item())
    env.close()
    imageio.mimsave('video.mp4', frames, fps=30)

show_video_of_model(agent, 'LunarLander-v2')

def show_video():
    mp4list = glob.glob('*.mp4')
    if len(mp4list) > 0:
        mp4 = mp4list[0]
        video = io.open(mp4, 'r+b').read()
        encoded = base64.b64encode(video)
        display(HTML(data='''<video alt="test" autoplay
                loop controls style="height: 400px;">
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii'))))
    else:
        print("Could not find video")

show_video()

Building the AI Pac-Man - Complete Code









#Part 0 - Installing the required packages and importing the libraries
!pip install gymnasium
!pip install "gymnasium[atari, accept-rom-license]"
!apt-get install -y swig
!pip install gymnasium[box2d]
import os
import random
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from collections import deque
from torch.utils.data import DataLoader, TensorDataset
#Part 1 - Building the AI-Creating the architecture of the Neural Network
class Network(nn.Module):

  def __init__(self, action_size, seed = 42):
    super(Network, self).__init__()
    self.seed = torch.manual_seed(seed)
    self.conv1 = nn.Conv2d(3, 32, kernel_size = 8, stride = 4)
    self.bn1 = nn.BatchNorm2d(32)
    self.conv2 = nn.Conv2d(32, 64, kernel_size = 4, stride = 2)
    self.bn2 = nn.BatchNorm2d(64)
    self.conv3 = nn.Conv2d(64, 64, kernel_size = 3, stride = 1)
    self.bn3 = nn.BatchNorm2d(64)
    self.conv4 = nn.Conv2d(64, 128, kernel_size = 3, stride = 1)
    self.bn4 = nn.BatchNorm2d(128)
    self.fc1 = nn.Linear(10 * 10 * 128, 512)
    self.fc2 = nn.Linear(512, 256)
    self.fc3 = nn.Linear(256, action_size)

  def forward(self, state):
    x = F.relu(self.bn1(self.conv1(state)))
    x = F.relu(self.bn2(self.conv2(x)))
    x = F.relu(self.bn3(self.conv3(x)))
    x = F.relu(self.bn4(self.conv4(x)))
    x = x.view(x.size(0), -1)
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    return self.fc3(x)
#Part 2 - Training the AI-Setting up the environment
import gymnasium as gym
env = gym.make('MsPacmanDeterministic-v0', full_action_space = False)
state_shape = env.observation_space.shape
state_size = env.observation_space.shape[0]
number_actions = env.action_space.n
print('State shape: ', state_shape)
print('State size: ', state_size)
print('Number of actions: ', number_actions)
#Initializing the hyperparameters
learning_rate = 5e-4
minibatch_size = 64
discount_factor = 0.99
#Preprocessing the frames
from PIL import Image
from torchvision import transforms

def preprocess_frame(frame):
  frame = Image.fromarray(frame)
  preprocess = transforms.Compose([transforms.Resize((128, 128)), transforms.ToTensor()])
  return preprocess(frame).unsqueeze(0)

#Implementing the DCQN class
class Agent():

  def __init__(self, action_size):
    self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    self.action_size = action_size
    self.local_qnetwork = Network(action_size).to(self.device)
    self.target_qnetwork = Network(action_size).to(self.device)
    self.optimizer = optim.Adam(self.local_qnetwork.parameters(), lr = learning_rate)
    self.memory = deque(maxlen = 10000)

  def step(self, state, action, reward, next_state, done):
    state = preprocess_frame(state)
    next_state = preprocess_frame(next_state)
    self.memory.append((state, action, reward, next_state, done))
    if len(self.memory) > minibatch_size:
      experiences = random.sample(self.memory, k = minibatch_size)
      self.learn(experiences, discount_factor)

  def act(self, state, epsilon = 0.):
    state = preprocess_frame(state).to(self.device)
    self.local_qnetwork.eval()
    with torch.no_grad():
      action_values = self.local_qnetwork(state)
    self.local_qnetwork.train()
    if random.random() > epsilon:
      return np.argmax(action_values.cpu().data.numpy())
    else:
      return random.choice(np.arange(self.action_size))

  def learn(self, experiences, discount_factor):
    states, actions, rewards, next_states, dones = zip(*experiences)
    states = torch.from_numpy(np.vstack(states)).float().to(self.device)
    actions = torch.from_numpy(np.vstack(actions)).long().to(self.device)
    rewards = torch.from_numpy(np.vstack(rewards)).float().to(self.device)
    next_states = torch.from_numpy(np.vstack(next_states)).float().to(self.device)
    dones = torch.from_numpy(np.vstack(dones).astype(np.uint8)).float().to(self.device)
    next_q_targets = self.target_qnetwork(next_states).detach().max(1)[0].unsqueeze(1)
    q_targets = rewards + discount_factor * next_q_targets * (1 - dones)
    q_expected = self.local_qnetwork(states).gather(1, actions)
    loss = F.mse_loss(q_expected, q_targets)
    self.optimizer.zero_grad()
    loss.backward()
    self.optimizer.step()
#Initializing the DCQN agent
agent = Agent(number_actions)
#Training the DCQN agent
number_episodes = 2000
maximum_number_timesteps_per_episode = 10000
epsilon_starting_value  = 1.0
epsilon_ending_value  = 0.01
epsilon_decay_value  = 0.995
epsilon = epsilon_starting_value
scores_on_100_episodes = deque(maxlen = 100)

for episode in range(1, number_episodes + 1):
  state, _ = env.reset()
  score = 0
  for t in range(maximum_number_timesteps_per_episode):
    action = agent.act(state, epsilon)
    next_state, reward, done, _, _ = env.step(action)
    agent.step(state, action, reward, next_state, done)
    state = next_state
    score += reward
    if done:
      break
  scores_on_100_episodes.append(score)
  epsilon = max(epsilon_ending_value, epsilon_decay_value * epsilon)
  print('\rEpisode {}\tAverage Score: {:.2f}'.format(episode, np.mean(scores_on_100_episodes)), end = "")
  if episode % 100 == 0:
    print('\rEpisode {}\tAverage Score: {:.2f}'.format(episode, np.mean(scores_on_100_episodes)))
  if np.mean(scores_on_100_episodes) >= 500.0:
    print('\nEnvironment solved in {:d} episodes!\tAverage Score: {:.2f}'.format(episode - 100, np.mean(scores_on_100_episodes)))
    torch.save(agent.local_qnetwork.state_dict(), 'checkpoint.pth')
    break
#Part 3 - Visualizing the results
import glob
import io
import base64
import imageio
from IPython.display import HTML, display

def show_video_of_model(agent, env_name):
    env = gym.make(env_name, render_mode='rgb_array')
    state, _ = env.reset()
    done = False
    frames = []
    while not done:
        frame = env.render()
        frames.append(frame)
        action = agent.act(state)
        state, reward, done, _, _ = env.step(action)
    env.close()
    imageio.mimsave('video.mp4', frames, fps=30)

show_video_of_model(agent, 'MsPacmanDeterministic-v0')

def show_video():
    mp4list = glob.glob('*.mp4')
    if len(mp4list) > 0:
        mp4 = mp4list[0]
        video = io.open(mp4, 'r+b').read()
        encoded = base64.b64encode(video)
        display(HTML(data='''<video alt="test" autoplay
                loop controls style="height: 400px;">
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii'))))
    else:
        print("Could not find video")

show_video()





Building the AI KungFuMaster - Complete Code

#Part 0 - Installing the required packages and importing the libraries
!pip install gymnasium
!pip install "gymnasium[atari, accept-rom-license]"
!apt-get install -y swig
!pip install gymnasium[box2d]


import cv2
import math
import random
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torch.multiprocessing as mp
import torch.distributions as distributions
from torch.distributions import Categorical
import gymnasium as gym
from gymnasium import ObservationWrapper
from gymnasium.spaces import Box

#Part 1 - Building the AI Creating the architecture of the Neural Network
class Network(nn.Module):

  def __init__(self, action_size):
    super(Network, self).__init__()
    self.conv1 = torch.nn.Conv2d(in_channels = 4,  out_channels = 32, kernel_size = (3,3), stride = 2)
    self.conv2 = torch.nn.Conv2d(in_channels = 32, out_channels = 32, kernel_size = (3,3), stride = 2)
    self.conv3 = torch.nn.Conv2d(in_channels = 32, out_channels = 32, kernel_size = (3,3), stride = 2)
    self.flatten = torch.nn.Flatten()
    self.fc1  = torch.nn.Linear(512, 128)
    self.fc2a = torch.nn.Linear(128, action_size)
    self.fc2s = torch.nn.Linear(128, 1)

  def forward(self, state):
    x = self.conv1(state)
    x = F.relu(x)
    x = self.conv2(x)
    x = F.relu(x)
    x = self.conv3(x)
    x = F.relu(x)
    x = self.flatten(x)
    x = self.fc1(x)
    x = F.relu(x)
    action_values = self.fc2a(x)
    state_value = self.fc2s(x)[0]
    return action_values, state_value

#Part 2 - Training the AI Setting up the environment
class PreprocessAtari(ObservationWrapper):

  def __init__(self, env, height = 42, width = 42, crop = lambda img: img, dim_order = 'pytorch', color = False, n_frames = 4):
    super(PreprocessAtari, self).__init__(env)
    self.img_size = (height, width)
    self.crop = crop
    self.dim_order = dim_order
    self.color = color
    self.frame_stack = n_frames
    n_channels = 3 * n_frames if color else n_frames
    obs_shape = {'tensorflow': (height, width, n_channels), 'pytorch': (n_channels, height, width)}[dim_order]
    self.observation_space = Box(0.0, 1.0, obs_shape)
    self.frames = np.zeros(obs_shape, dtype = np.float32)

  def reset(self):
    self.frames = np.zeros_like(self.frames)
    obs, info = self.env.reset()
    self.update_buffer(obs)
    return self.frames, info

  def observation(self, img):
    img = self.crop(img)
    img = cv2.resize(img, self.img_size)
    if not self.color:
      if len(img.shape) == 3 and img.shape[2] == 3:
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = img.astype('float32') / 255.
    if self.color:
      self.frames = np.roll(self.frames, shift = -3, axis = 0)
    else:
      self.frames = np.roll(self.frames, shift = -1, axis = 0)
    if self.color:
      self.frames[-3:] = img
    else:
      self.frames[-1] = img
    return self.frames

  def update_buffer(self, obs):
    self.frames = self.observation(obs)

def make_env():
  env = gym.make("KungFuMasterDeterministic-v0", render_mode = 'rgb_array')
  env = PreprocessAtari(env, height = 42, width = 42, crop = lambda img: img, dim_order = 'pytorch', color = False, n_frames = 4)
  return env

env = make_env()

state_shape = env.observation_space.shape
number_actions = env.action_space.n
print("State shape:", state_shape)
print("Number actions:", number_actions)
print("Action names:", env.env.env.get_action_meanings())

#Initializing the hyperparameters
learning_rate = 1e-4
discount_factor = 0.99
number_environments = 10
#Implementing the A3C class
class Agent():

  def __init__(self, action_size):
    self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    self.action_size = action_size
    self.network = Network(action_size).to(self.device)
    self.optimizer = torch.optim.Adam(self.network.parameters(), lr = learning_rate)

  def act(self, state):
    if state.ndim == 3:
      state = [state]
    state = torch.tensor(state, dtype = torch.float32, device = self.device)
    action_values, _ = self.network(state)
    policy = F.softmax(action_values, dim = -1)
    return np.array([np.random.choice(len(p), p = p) for p in policy.detach().cpu().numpy()])

  def step(self, state, action, reward, next_state, done):
    batch_size = state.shape[0]
    state = torch.tensor(state, dtype = torch.float32, device = self.device)
    next_state = torch.tensor(next_state, dtype = torch.float32, device = self.device)
    reward = torch.tensor(reward, dtype = torch.float32, device = self.device)
    done = torch.tensor(done, dtype = torch.bool, device = self.device).to(dtype = torch.float32)
    action_values, state_value = self.network(state)
    _, next_state_value = self.network(next_state)
    target_state_value = reward + discount_factor * next_state_value * (1 - done)
    advantage = target_state_value - state_value
    probs = F.softmax(action_values, dim = -1)
    logprobs = F.log_softmax(action_values, dim = -1)
    entropy = -torch.sum(probs * logprobs, axis = -1)
    batch_idx = np.arange(batch_size)
    logp_actions = logprobs[batch_idx, action]
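    # Actor: policy-gradient loss weighted by the (detached) advantage, plus a small entropy bonus for exploration
    # Critic: mean-squared error between the predicted state value and the bootstrapped TD target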
    actor_loss = -(logp_actions * advantage.detach()).mean() - 0.001 * entropy.mean()
    critic_loss = F.mse_loss(target_state_value.detach(), state_value)
    total_loss = actor_loss + critic_loss
    self.optimizer.zero_grad()
    total_loss.backward()
    self.optimizer.step()
#Initializing the A3C agent
agent = Agent(number_actions)

#Evaluating our A3C agent on a certain number of episodes
def evaluate(agent, env, n_episodes = 1):
  episodes_rewards = []
  for _ in range(n_episodes):
    state, _ = env.reset()
    total_reward = 0
    while True:
      action = agent.act(state)
      state, reward, done, truncated, _ = env.step(action[0])
      total_reward += reward
      if done or truncated:
        break
    episodes_rewards.append(total_reward)
  return episodes_rewards
#Managing multiple environments simultaneously
class EnvBatch:

  def __init__(self, n_envs = 10):
    self.envs = [make_env() for _ in range(n_envs)]

  def reset(self):
    _states = []
    for env in self.envs:
      _states.append(env.reset()[0])
    return np.array(_states)

  def step(self, actions):
    next_states, rewards, dones, infos, _ = map(np.array, zip(*[env.step(a) for env, a in zip(self.envs, actions)]))
    for i in range(len(self.envs)):
      if dones[i]:
        next_states[i] = self.envs[i].reset()[0]
    return next_states, rewards, dones, infos
#Training the A3C agent
import tqdm

env_batch = EnvBatch(number_environments)
batch_states = env_batch.reset()

with tqdm.trange(0, 3001) as progress_bar:
  for i in progress_bar:
    batch_actions = agent.act(batch_states)
    batch_next_states, batch_rewards, batch_dones, _ = env_batch.step(batch_actions)
    batch_rewards *= 0.01
    agent.step(batch_states, batch_actions, batch_rewards, batch_next_states, batch_dones)
    batch_states = batch_next_states
    if i % 1000 == 0:
      print("Average agent reward: ", np.mean(evaluate(agent, env, n_episodes = 10)))

#Part 3 - Visualizing the results
import glob
import io
import base64
import imageio
from IPython.display import HTML, display

def show_video_of_model(agent, env):
  state, _ = env.reset()
  done = False
  frames = []
  while not done:
    frame = env.render()
    frames.append(frame)
    action = agent.act(state)
    state, reward, done, _, _ = env.step(action[0])
  env.close()
  imageio.mimsave('video.mp4', frames, fps=30)

show_video_of_model(agent, env)

def show_video():
    mp4list = glob.glob('*.mp4')
    if len(mp4list) > 0:
        mp4 = mp4list[0]
        video = io.open(mp4, 'r+b').read()
        encoded = base64.b64encode(video)
        display(HTML(data='''<video alt="test" autoplay
                loop controls style="height: 400px;">
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii'))))
    else:
        print("Could not find video")

show_video()



 
