Arun Gupta

Monday, 18 March 2024

GEN AI






Here we are going to learn an introduction to the concept of AI. We will discuss what AI actually means, what we can do with the help of AI, and then we will go through the different Azure AI services. That is the main agenda behind module number one. So let's get started.


     

So this term was actually coined in 1956. Try to understand: the term was just coined there; nothing existed yet. They only explained theoretically what is meant by artificial intelligence, and what they explained is this: artificial intelligence is nothing but the capability of a machine or a piece of software to work or mimic like a human brain. For example, if you have any machine, like a robot, a computer, or any other machine, or if you have any application or software which has the capability to work or mimic like a human brain, that is artificial intelligence.


Generative means machines have the capability to generate something. If I ask you, can you write one essay on artificial intelligence, or can you write an essay on nature? Definitely you will do that; you will write an essay. If I ask you, can you sketch something for me? Definitely you will sketch it for me,

because our brain has the capability to generate something. So under artificial intelligence, generative AI is one of the workloads, or one of the scenarios, through which machines get the capability to generate something. Now, generation happens in different scenarios.

That is, it can generate text, images, audio files, video files, music files, and 3D objects. Machines also have the capability to generate code. So this is called generative AI. In very simple words, I can say that if a machine has the capability to generate some content with the help of input provided by the user, called a prompt, then it is called generative AI. Another workload that comes under AI is agents and automation. Now, what is actually meant by agents and automation? It is the extended version of generative AI.

In agents and automation, machines perform tasks on our behalf. That means there is no need for human interaction there. Let us go with one example. I'm assuming that our office timing is 9 a.m. to 5 p.m., and my daily task is to see what type of mails come in from the different clients and reply to those mails. Now, team, I have one question for you. See, I'm getting the emails from 9 a.m. to 5 p.m. if the client is Indian. But sometimes I'm getting emails at 2 a.m., 3 a.m., 1 a.m. Is it possible to reply to that email at that instant?
No, because the time zone is different. So what we can do is create an agent which will look into that email, read it, understand the requirement, and look into our earlier emails, that is, how we previously dealt with that corresponding client, and reply on our behalf. The third workload is natural language processing. The machine has to understand our language and we have to understand the machine's language, and that bridge is nothing but natural language processing, also called NLP.

NLP is a bridge between humans and machines, which converts machine language into human-understandable language and vice versa. Then the fourth workload is computer vision, which works with images and videos.

We have to provide captions to the images; this type of stuff can be done with the help of computer vision. In video analysis, we can find out the length of a video, and we can find out the sentiment of the video, whether it is positive, negative, or neutral. These tasks come under computer vision.

Another workload is information extraction. That means extracting information from different resources: websites, the local machine, different servers, different databases, data warehouses, or different services. It extracts the information, saves it appropriately, and whenever that information is required, it sends it. The last workload is machine learning. Machines learn from the data, machines learn from the provided information, and machines find out trends and patterns.



it should be fair. That means it should not be biased toward anything. For example, suppose you created one AI product which finds out whether a person is eligible for a loan or not, so it gives the result yes or no. What type of input do we have to provide? We have to provide information about that person: the age of the person, the annual income, the gender, the location, and so on. Now remember, when you are creating that AI product, you train your AI model or machine learning model.
Now, suppose out of 1000 samples, you consider 700 samples of female candidates and only 300 samples of male candidates. Try to understand what will happen here. You did not train your model in good proportion: you trained your model with 700 samples of female candidates and 300 samples of male candidates. So if you observe carefully, you introduced some bias. You did not provide equal weightage at the time of training. So what will happen? This corresponding AI product will learn very deeply about the female candidates and much less about the male candidates. Now suppose you have to test that corresponding AI product and you provide about 10 samples for the testing purpose; remember, it will be biased toward female candidates. Are you getting my point?
Because that model learned much more about the female candidates.
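As a toy illustration of this imbalance (the 700/300 numbers are from the example above; the rest is a hypothetical sketch): a degenerate model that simply predicts the majority class looks deceptively accurate on the imbalanced data while being useless for the minority class.

```python
# Hypothetical training set: 700 'female' samples, 300 'male' samples.
labels = ["female"] * 700 + ["male"] * 300

# A degenerate "model" that only learned the majority class.
def predict(_sample):
    return "female"

# Overall accuracy looks fine (0.7), but accuracy on male samples is 0.0.
correct = sum(predict(x) == x for x in labels)
accuracy = correct / len(labels)
print(accuracy)
```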


Our brain has the capability to analyse this data. We have these three columns: A, B, and Y. As I told you, A and B are my inputs and Y is the output. By looking into this, for example, you found that Y is equal to A times B. Now you can put any value of A and any value of B, and you can find out the predicted value of Y.
Any value might be present, in the lakhs, thousands, hundreds, or units, no problem. Now try to understand how our brain found that corresponding relation. Suppose instead of only three columns I put 20 columns here, a large number of columns. Now, try to understand: machine learning is similar. What you did just now, machine learning does the same. The machine learns from the provided data what trends, patterns, and expressions are present there, and it finds out one common equation.
And that equation is nothing but the model equation.
So, when the algorithm finds out that Y is equal to A times B, or when the algorithm finds out that Y is equal to A plus B plus C, something like that, that is called the model equation, and once you have that model equation, you can do any type of prediction.
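The idea of searching for a relation that fits every row can be sketched in a few lines; the data table and the two candidate formulas here are just the ones mentioned above.

```python
# Toy data table: columns A, B and output Y (here Y happens to be A * B).
rows = [(2, 3, 6), (4, 5, 20), (10, 10, 100)]

# Candidate "model equations" to test against every row.
candidates = {
    "Y = A * B": lambda a, b: a * b,
    "Y = A + B": lambda a, b: a + b,
}

for name, f in candidates.items():
    if all(f(a, b) == y for a, b, y in rows):
        model = f
        print("model equation found:", name)

# Once the model equation is found, we can predict for unseen inputs.
print(model(7, 8))  # 56
```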



So features are nothing but known characteristics of something, and the label is your output target variable, the thing we want to predict. You can see the example: predicting the ice cream sales for a given day. Now, what type of data do we have? We have the day of the week, month-related information, temperature-related information, and more. Depending upon the day of week, month, and temperature, we are going to find out the number of ice creams sold. So, the features are day of week, month, and temperature,

and the label is the number of ice creams sold.
OK, so it will find out the equation: Y, the number of ice creams sold, is a function of day of week, month, and temperature.
You can see it.

f(X) = Y: Y is nothing but the label, and that label depends upon the function of X, that is, day of week, month, and temperature.
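A minimal sketch of this idea, with made-up numbers: to keep it short, fit a one-feature linear model (sales as a function of temperature only) using the usual least-squares formulas.

```python
# Hypothetical data: daily temperature (°C) and ice creams sold.
temps = [20, 25, 30, 35]
sales = [40, 50, 60, 70]  # deliberately a perfect line: sales = 2 * temp

n = len(temps)
mean_x = sum(temps) / n
mean_y = sum(sales) / n

# Least-squares slope m and intercept c for the line y = m*x + c.
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(temps, sales)) \
    / sum((x - mean_x) ** 2 for x in temps)
c = mean_y - m * mean_x

def predict(temp):
    return m * temp + c

print(predict(28))  # 56.0
```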




Now, machine learning is basically divided into two: supervised machine learning and unsupervised machine learning. What is actually meant by supervised machine learning? In supervised machine learning, the training data includes a known label. Simply remember: if you have data in which labels are present, then it is supervised machine learning, and if you have data in which labels are not present, then it is unsupervised machine learning. The label is nothing but Y. That means if A, B, C are provided and Y is also provided, it is supervised; if only A, B, C are provided, it is unsupervised.

Now, supervised machine learning is divided into two: regression and classification. If your label or target variable consists of continuous numeric values, for example the total number of ice creams sold, where we don't know the value in advance, it may be 5, 10, 20, 50, 100, 1000, or a million, then it is called regression. So always remember: if your label or target variable is a continuous numeric value, it is called regression, and we have to use regression algorithms there. Classification: if your target variable or label consists of a finite number of classes or categories, like yes/no, true/false, zero/one, or good/better/best, then it is called classification, and there we have to use a
classification algorithm. Now, classification is again divided into two. First we have binary classification: only two classes are present, like yes/no, zero/one, or low/high. The other type is multi-class classification: multiple classes are present, like good/better/best, grades A, B, C, D, E, or predicting the species of a penguin based on its measurements, species A, B, C, and so on.

So this is the idea behind supervised machine learning. Always remember: if we have a target variable, a label, then we use a supervised machine learning algorithm, and these are divided into regression and classification. If the target variable consists of continuous values, we go with regression; if it consists of only classes or categories, we go with classification. And if you don't have a target variable inside your data, we go with a clustering algorithm, where similar items are grouped together. You can see here different plants are present, and depending upon their similarity, whether shape-related or colour-related, they are kept in one group. So separating plants into groups based on common characteristics comes under unsupervised machine learning. If you have any questions, please let me know.

Moving ahead, as I mentioned, there is a difference between an algorithm and a model. You can see here we have the data, something like this: X1, X2, and X3 are my inputs, my features, and Y is the output or label.
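The decision rule just described (label present? continuous or categorical?) can be sketched as a tiny helper; the function name and its inputs are illustrative only.

```python
# Pick the family of ML algorithm from two properties of the data,
# following the rule described above.
def choose_family(has_label, label_is_continuous=None):
    if not has_label:
        return "clustering (unsupervised)"
    if label_is_continuous:
        return "regression (supervised)"
    return "classification (supervised)"

print(choose_family(True, True))    # ice cream sales: a continuous label
print(choose_family(True, False))   # loan yes/no: a categorical label
print(choose_family(False))         # no label at all
```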

Now, before applying this data, I will split it. Most of the samples, about 70-80%, I keep for the training purpose; the remaining 20-30% I keep for validation, that is, the testing purpose. Only the training data is passed to the algorithm. We know that an algorithm is nothing but a number of systematic steps which generalise the relationship between X and Y


as a function. It will find out the relationship between X and Y, Y = f(X), and when it finds that equation, it is called the model equation. Now, once your model equation is ready, team, try to understand: when we provide the test data to the model, it will find some result Y. You can compare this predicted value with the actual value, and from the predicted and actual values you get the evaluation of the model, that is, whether your model is good or bad. So always remember: once the model equation is ready, we provide the test data for evaluation; the model finds the Y-hat value, that is, the predicted value; this value we compare with the actual value, the data kept aside for testing; and from that comparison we evaluate the model. Remember: if the difference between the actual value and the predicted value is more, the error is more. If the difference is less, the error is very small, which means the model is good.

So this is the idea behind model training and evaluation. Now, as we have seen, three types of algorithms come under machine learning. The first is regression, where our target variable is continuous. One algorithm is linear regression. This linear regression algorithm works on the principle of the straight-line equation Y = mX + c, where m is the slope and c is the constant term. If you provide the information to that algorithm, it will find the best-fit line. That line is nothing but your model equation, Y = f(X). You can provide any value of X and it will find the value of Y. You can compare this value with the test data, and you can evaluate. For the evaluation purpose, we can use the error term. Error is nothing but actual value minus predicted value.
We can measure the error with the mean absolute error (the mean of the absolute value of actual minus predicted), the mean squared error (the mean of the square of actual minus predicted), the root mean squared error (the square root of the mean squared error), and the coefficient of determination, R². That means we can say how much change happens in Y when we make a small change in X; the variation in X that is represented in Y is called the coefficient of determination.

Now, suppose we have to solve a classification problem. Then we go with a sigmoid curve where two classes are present, either zero or one. If a sample lies in one region, it is treated as zero; a sample which lies in the other region is treated as one. An example is logistic regression. For evaluation metrics, we can use accuracy; again, it compares actual and predicted values, but here the values are zero, one, or class values. Recall is another term, along with precision and the F1 score. Precision and recall mainly focus on the true positive values, that is, the actual value is positive and the predicted value is also positive; how correctly we predicted is represented by recall and precision, and the combination of recall and precision is nothing but the F1 score.

Then, suppose we want to solve a clustering problem. We can go with an algorithm like K-means clustering. Remember, all clustering algorithms work on the principle of distance: if the distance is less, samples are kept in one group; if the distance is more, they are kept in another group. Now, if we have algorithms like K-means clustering, hierarchical clustering, or DBSCAN clustering, how can we evaluate them?
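The regression and classification metrics just listed can be computed by hand; here is a minimal stdlib-only sketch with made-up actual and predicted values.

```python
import math

# --- Regression metrics on hypothetical actual vs. predicted values ---
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 39]

n = len(actual)
errors = [a - p for a, p in zip(actual, predicted)]
mae = sum(abs(e) for e in errors) / n   # mean absolute error
mse = sum(e ** 2 for e in errors) / n   # mean squared error
rmse = math.sqrt(mse)                   # root mean squared error
mean_a = sum(actual) / n
# Coefficient of determination: 1 - (residual sum of squares / total sum of squares)
r2 = 1 - sum(e ** 2 for e in errors) / sum((a - mean_a) ** 2 for a in actual)

# --- Classification metrics on hypothetical 0/1 labels ---
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```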
So we can do the evaluation on the basis of distance.
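A minimal sketch of the distance principle behind K-means, with made-up 1-D points and two arbitrary starting centroids (just a few assignment-and-update rounds, to keep it short).

```python
# 1-D points that visibly form two groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids = [0.0, 15.0]  # arbitrary starting centroids

for _ in range(5):  # a few assignment/update rounds
    # Assign each point to the nearest centroid (the distance principle).
    groups = [[], []]
    for p in points:
        nearest = min(range(2), key=lambda i: abs(p - centroids[i]))
        groups[nearest].append(p)
    # Move each centroid to the mean of its group.
    centroids = [sum(g) / len(g) for g in groups]

print(centroids)  # the two group means
```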







After that, there is one concept called deep learning. Deep learning is nothing but a neural network. Now try to understand: inside our brain, a neural network is present; neurons are present. For example, I'm describing one scenario so that you will get the idea.

Our brain finds out that this cup of coffee is very hot. Now, team, try to understand that so many things are connected to that brain.
So many things are connected, like touch.
Now tell me, out of these, who is responsible for sending the information to the brain? The ear and nose are connected with the brain, but in this scenario they are not responsible for providing the input. The eyes and touch are connected to a number of neurons which are present inside our brain, and the brain processes that. Depending upon that processing, it tells us to wait for some time. Inside our brain these things are happening. Is it possible to make such an artificial neural network? The answer is yes, and it is made by using the input, that is X, and the weight value W. Weight is nothing but who is more responsible for the result. In this scenario, the eyes and touch have more weightage, so we assign more weightage to them; the ear and nose are less responsible, so we assign very little weightage to them. Now, when we wrap up this X and W inside one activation function, that makes one neuron. So one neuron is made using inputs and weights, and a number of such neurons connected with each other makes the neural network, which is called an artificial neural network. Remember, in an artificial neural network, basically three types of layers are present. One is the input layer, which has the capability to take the input from the scenario, the input from the data. Then an
output layer is present, which has the capability to predict the output, and in between, a number of hidden layers are present, depending upon the application. If the application is complex, the number of hidden layers increases; if the application is not so complex, the number of hidden layers is less. Okay, now if you observe carefully, we can't see what is happening inside our brain, so these in-between layers are called hidden layers. You can see in this example, I'm predicting the species of the penguin by passing the input parameters, whether length-related information or weight-related information. So, putting in the values X1, X2, X3, and X4, these four feature values are provided with assigned weights. Then the hidden layers process that, and finally we get the output in a probabilistic way. So for species 0 the probability is 0.2, for species 1 the probability is 0.7, and for species 2 the probability is 0.1. If you observe carefully, species 1 acquires the maximum probability, 0.7, so the final answer will be species 1. Such things happen in deep learning, which is the extended version of machine learning.
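A minimal forward-pass sketch of that penguin example: four inputs, one hidden layer, a softmax output giving class probabilities, then picking the most probable species. All weights here are made-up numbers for illustration, not a trained model.

```python
import math

def softmax(zs):
    # Turn raw scores into probabilities that sum to 1.
    exps = [math.exp(z) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# Four feature values (X1..X4) for one penguin, made up for illustration.
x = [0.5, 1.0, 0.2, 0.8]

# Made-up weights: 4 inputs -> 3 hidden neurons, with a simple ReLU.
w_hidden = [[0.1, 0.4, -0.2, 0.3],
            [-0.3, 0.2, 0.5, 0.1],
            [0.2, -0.1, 0.3, 0.4]]
hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]

# Made-up weights: 3 hidden neurons -> 3 species scores.
w_out = [[0.3, -0.2, 0.5],
         [0.6, 0.4, -0.1],
         [-0.4, 0.1, 0.2]]
scores = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]

probs = softmax(scores)            # probabilities for species 0, 1, 2
species = probs.index(max(probs))  # the most probable species wins
print(probs, species)
```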



Now, in the Azure AI Foundry, different models are available, including LLMs, that is, large language models. Behind whatever generative AI applications you are using right now, for example ChatGPT or Copilot, LLM models are present, and through agents you can achieve your automation. Then, for natural language processing purposes, the Azure AI Language service is available, through which you can do language translation, language recognition, and question answering, and you can create a chatbot. The Azure AI Translator service is present, which translates one language into another. And the Azure AI Speech service is present, which converts your speech into text format.

Then computer vision services are available. Through the Azure AI Vision service you can deal with images and video files, and one dedicated Face service is available. This service is basically used to deal with faces. Suppose you enter some premises, for example an airport; there are different face recognisers present, applications which identify your face. Depending upon that face, the system finds or collects some information and stores it. So under computer vision, the Face service is available.

Under the information extraction workload, different services are available, like the Azure AI Document Intelligence service. Let us consider one scenario. Suppose inside your organization many documents of a similar type are present, and you have to analyse those documents manually. Definitely it will take much time to do the analysis manually. So what we can do is use the Azure AI Document Intelligence service, where we can upload the documents, and once the analysis starts, it finds the appropriate, extracted information from those documents.
Then we have Azure AI Content Understanding, which will understand your content if you are uploading some books or a large corpus there. This Content Understanding service has the capability to understand the content.





When I take the name generative AI, we know that in the last months of 2022, as ChatGPT came into the picture, the whole era of artificial intelligence changed. Because as these GPT models came into the picture, people were asking so many questions and the models were generating responses. What do we have to understand behind this? What is present there? When I consider generative AI, behind it one concept is present called the LLM, the large language model. Now, what is actually meant by a large language model? The name itself specifies that these models have already learned from a large, very massive amount of data. For example, I hope you remember some models you have seen labelled, say, 7B or 18B. That means these models have 7 billion or 18 billion parameters, so you can imagine how big or complex a model it is. These parameters are nothing but deep learning parameters; these models are made using deep learning architectures. Now, these language models have the capability to learn from data which is present in language format, image format, audio format, video format, and more. Once these models have learned from the data, they create one sort of vocabulary, and once the vocabulary is created in an appropriate format, you can ask any question of that model. If the vocabulary is created, the model has the capability to generate the response for you. Now, let us try to understand which types of responses they can generate. As I mentioned previously, they have the capability to generate responses in text format.

They also have the capability to generate responses in image format, video format, audio format, music format, 3D object format, and code format. So if you observe carefully, they have a very good capability to generate responses in many different formats.

OK, so here some examples are provided. You can see the first example is natural language generation. When you ask this LLM model a question like "write a cover letter for a job application", the model, which has already learned from a large amount of data and created a good vocabulary, generates a response depending upon your question, which is called a prompt. So, in generative AI, whatever question is asked by the user is treated as a prompt. When you put such a prompt there, the model generates the response accordingly. You can see the quality of the response: "Dear ___, please find enclosed my application for the role of ___." So the model, or that GenAI tool, has the capability to generate the response in text format. Suppose that model also supports images, and you say "create a logo for a florist business"; you can see how impressively it generates the image for us.

Or if you put in input like "write a Python code to add 2 numbers", it is treated as code generation and the model creates the code for us. Try to understand, there is one concept called a multimodal model. Multimodal models are one part of the LLM family.


Now, when I open Copilot and ask a question like "write a cover letter for a job application", it will write the application. If I ask "create a logo for a florist business", it will create an image. So try to understand: if any single model has the capability to work with different data types,



text, images, video, audio, music, 3D objects, or code, then it is called a multimodal model. Nowadays, there are so many multimodal models present, through which we can get the best result using a single model.

Now, let us try to understand this scenario in a deeper way. You can see there are two types of language models: small language models and large language models. These language models work on the principle of the transformer architecture. Now, what is meant by the transformer architecture? In the transformer architecture, two blocks are present: the first block is called the encoder, and the second block is called the decoder. The decoder is responsible for the generation; the encoder is responsible for creating the vocabulary, that is, understanding the input.

Now, try to understand with this diagram. Suppose you have a very large amount of data, present in audio format, music format, text format, and so on; you pass this data to the encoder. Remember, in this encoder architecture some blocks are present. The first block is called the tokenizer. What is meant by a tokenizer? Suppose you are uploading many text documents; how will the machine understand them? The tokenizer separates your whole sentence, your paragraph, into small words, or chunks, called tokens. So the tokenizer is responsible for converting your larger data, paragraphs or sentences, into small single words. Once the text is tokenized, one numeric value is assigned to each token. Fine, and after the tokenizer, there is one concept called the embedding model. This is very important.

Remember, if you are dealing with the transformer architecture, embedding is very important. Embedding encodes, or converts, your text tokens into vector format, like this. We can see dog, cat, puppy, and skateboard. What will the tokenizer do? The tokenizer will just give dog the value one, cat the value two, puppy the value three, and skateboard the value four. But from these values, one, two, three, four, the machine will not understand the meaning. How will the machine understand the meaning? For this purpose we need embedding. Remember, some embedding models are also present, like LLM models. The embedding model has the capability to convert your tokens into vector format. Now, what is the role of representing tokens in vector format? These embedding vectors encapsulate the semantic meaning, the semantic attributes, in multiple dimensions. Try to understand: how will the machine understand that dog and cat are very close to each other, that dog and puppy are very close to each other, and that dog and skateboard are far away from each other? Because of the embedding vectors. The embedding vectors represent your tokens in a multi-dimensional format, and from this the model finds out which words are very close to each other and which words are far away from each other, and it creates a good vocabulary. Once the good vocabulary is created, you can ask any question. You can see it like this: "when my dog was". The input is provided; now the model finds which words are very close to "dog", which words are far away, which words are similar, and depending upon that, it finds the result "puppy": "when my dog was a puppy". OK, so language models work on this principle. They understand the meaning first.
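A minimal sketch of that idea, with made-up 2-D vectors (real embedding models learn vectors with hundreds or thousands of dimensions; these numbers are purely illustrative): cosine similarity shows "dog" close to "puppy" and "cat", and far from "skateboard".

```python
import math

# Hypothetical 2-D embedding vectors; real ones are learned, high-dimensional.
embeddings = {
    "dog":        [0.9, 0.1],
    "cat":        [0.8, 0.2],
    "puppy":      [0.95, 0.05],
    "skateboard": [0.1, 0.9],
}

def cosine(u, v):
    # Cosine similarity: 1.0 means same direction, near 0 means unrelated.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

dog = embeddings["dog"]
sims = {w: cosine(dog, v) for w, v in embeddings.items() if w != "dog"}
print(sims)  # 'puppy' and 'cat' score high, 'skateboard' much lower
```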
Depending upon that understanding, they are going to generate, or create, the response. They are just going to do the completion in a very sequential and accurate way.



So, as I mentioned, in the encoder the first step is the tokenizer. Try to understand from the next slide: the tokenizer is the first step in training the transformer model, and it is going to decompose the training text into small chunks called tokens. I considered one example sentence: 'I heard a dog bark loudly at a cat.' Now you can see each corresponding word is separated: 'I' is separated, 'heard' is separated, 'a' is separated, 'dog' is separated, 'bark', 'loudly', 'at', 'cat', all are separated, and for each word it is going to assign one numeric value so that the machine will understand. Remember, machines have their own language, which is in numeric format, and we humans have our own language, which is non-numeric in nature. So when we write something in our language, the machine will be confused initially, and the tokenizer helps there: it is going to convert your language into a machine-understandable, numeric format.

The sentence 'I heard a dog bark loudly at a cat' is represented as 1, 2, 3, 4, 5, 6, 7, 3, 8. Three is repeating because 'a' is repeating in that sentence. Similarly, suppose I have to represent the sentence 'I heard a cat'; I can write it as 1, 2, 3, 8. So this is the idea behind the tokenizer. After the tokenization, the embedding comes into the picture, which is going to find out the relationship between each token in vector format. So the semantic relationship between tokens is encoded in vectors, known as embeddings. You can see the token values are 4, 8, 9, 10, but the embedding values are vectors like this: a three-dimensional representation of each word. Now from that, the machine will get the idea of which words are very close to each other and which words are far away from each other.
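The tokenizer idea above can be sketched in a few lines. This is a toy word-level tokenizer, not the subword tokenizer a real transformer uses, but it reproduces the 1, 2, 3, 4, 5, 6, 7, 3, 8 numbering from the slide:

```python
def tokenize(text, vocab=None):
    """Assign each unique word an integer ID, reusing the ID for repeated words."""
    if vocab is None:
        vocab = {}
    ids = []
    for word in text.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab) + 1  # IDs start at 1, as in the slide
        ids.append(vocab[word])
    return ids, vocab

ids, vocab = tokenize("I heard a dog bark loudly at a cat")
print(ids)  # [1, 2, 3, 4, 5, 6, 7, 3, 8] -- 'a' repeats as 3
```

With the same vocabulary, "I heard a cat" then comes out as [1, 2, 3, 8], exactly as in the lecture example.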





And finally, we have the attention mechanism. When we are considering the LLM models, there is one attention mechanism, which is based on deep learning algorithms, or neural networks, only. Attention is nothing but capturing the strength of the relationship between tokens using the attention technique. Simply remember: attention is going to find out which words are the focus ones, which words are so much important. So what is the goal of this attention mechanism? It is going to predict the token after a specific sequence. Now, in my case, I am going to predict the token after 'dog', so I represented 'I heard a dog' as vectors, and 'heard' and 'dog' are given more weight. That means these words are the more important ones. Now it will show that several possible tokens can come after 'dog'.


The most probable token is added to the sequence, in this case 'bark'. So if I write the sentence 'I heard a dog', it will find that the most appropriate next word is 'bark'.

On which basis is it going to find that out? As we have seen, it is going to find out the relationship between each word, and depending upon that relationship, it will find out which word is most probable, or very close to that.
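The prediction step above can be sketched like this. The candidate words and their scores are made-up numbers; in a real model, attention over the weighted tokens ('heard', 'dog') produces such scores, and a softmax turns them into probabilities:

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up "next word" scores for the context "I heard a dog ..."
# (illustrative numbers only, not produced by a real model).
candidate_scores = {"bark": 3.0, "run": 1.5, "sleep": 1.0, "skateboard": -1.0}

probs = softmax(list(candidate_scores.values()))
for word, p in zip(candidate_scores, probs):
    print(f"{word}: {p:.3f}")
# 'bark' gets the highest probability, so it is appended to the sequence.
```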

Now, as I mentioned, there are two types of language models present: the LLM, that is the large language model, and another one is the SLM, that is the small language model. Remember, you have to choose the language model depending upon your application, because we have to pay for these models.







These are not freely available; very few models are freely available as open source. For most of the models, we have to pay. So depending upon our application, we have to select the appropriate model, whether it is an LLM or an SLM, and for that we must know the difference between LLM and SLM. Let us try to understand. The LLM models are trained with a sheer, large volume of data. They are going to learn from text data, image data, and the like, and they consist of billions of parameters; that means you can imagine how big a neural network is present to learn such data. Whereas the SLMs are trained with very focused text data. For example, if you have to build one application with respect to your business only, you do not require terminology from other domains. In that case, you can provide your own data and train your model, so you can select the SLM, that is the small language model, because it consists of very focused text data and very few parameters. Let us jump back to the LLM. LLMs have comprehensive language generation capabilities in multiple contexts, because they have already learned from a very large amount of data and have a massive vocabulary.
But the SLMs are focused on language generation capability in very specialised content, because their vocabulary is limited. The LLM's size is very large, and it is going to impact the performance and portability: when we want to port that model onto another platform, it is very difficult. But the SLMs are very fast, because of the limited vocabulary, and portable. LLM models are time consuming and also expensive, because for the training purpose we have to put in some cost in terms of compute as well as training data, and it is also going to take some time for the fine-tuning purpose, suppose we have to tune that model with our own data. Whereas SLMs are much faster and not so expensive, because we train them with a very small amount of data, in a very small time duration also. And suppose we have to fine-tune that model with our data, then it is very easy, or not so time consuming. You can see some examples of the LLMs: GPT-4.5, GPT-5; then we have Mistral 7B, that is, 7 billion parameters are there; and Llama 3 and Llama 2 also come under there. In the SLMs, Microsoft's Phi models and Hugging Face models like GPT-Neo are the small language models. So depending upon our application, or which type of problem we are solving, we have to select either an LLM model or an SLM model.



Now, you can see how it actually works. Suppose you are using any generative AI tool, whether ChatGPT, Copilot, Gemini, or any other. Remember, as I mentioned, these models have already learned from a massive amount of data; they have a good vocabulary. Now, it is in our hands how we ask the question.

If we ask the right, appropriate question, definitely it will give us the appropriate output. So our task is to provide clean, consistent, reliable, and accurate information so that we can expect the desired output, and this is called...

Prompt engineering. So, while drafting or writing any prompt, we have to consider some parameters, and these parameters are specified in the corresponding slide. You can see I'm asking one question to the GenAI application. This GenAI application will send my query to the language model.

And finally, I will get the result. But as I mentioned, we have to provide clean, consistent, reliable, and accurate input as a prompt. So always start with a specific goal for what you want the output to be. You can see in this case: summarise the key considerations for adopting

Copilot described in this document. So you just have to provide the specific goal: I have to summarise this,

or summarise the key considerations.

Second point: provide a source to ground the response in a specific scope of information. You can provide some sort of document information, if you have it.

Third point: you can add some context to maximise the response's appropriateness and relevance. You can see in this case: for a corporate executive. So if you provide this 'for a corporate executive', the machine, or that LLM model, will get a better idea:

the user provided this context, and the user is tending toward this corporate executive. Fourth point: you have to set clear expectations for the response. In this case, you can see: format the summary as no more than six bullet points with a professional tone. So here they specified

maximum 6 bullet points and a professional tone. And the fifth and last one: suppose you put in your prompt but you are not getting the desired output; always remember, we have to do the iteration.

So you have to iterate based on the previous prompts and responses to refine the result.
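The five considerations above can be put together into one prompt, something like this sketch (the goal, document, and audience come from the slide's example; the assembly itself is just illustrative):

```python
# Hypothetical prompt assembled from the five considerations above.
goal = "Summarise the key considerations for adopting Copilot described in this document."
source = "Ground your answer only in the attached document."
context = "The summary is for a corporate executive."
expectations = "Format the summary as no more than 6 bullet points, in a professional tone."

prompt = "\n".join([goal, source, context, expectations])
print(prompt)
# If the first response is not what we want, we refine this prompt and iterate.
```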

In many cases, what will happen? You will not get the result in the first attempt. So we have to take care over there: we have to iterate on that corresponding input and see the result. So this is the idea behind prompt engineering. Now, the next part is nothing but agents. Now, what are agents? Let us try to understand this scenario.










So this LLM is going to perform the task on our behalf. It is going to understand your Teams messages; it is going to understand your WhatsApp, Slack, and Jira messages; and it will try to make the automation. So you can see here, when the LLM is going to

connect with different tools, when it is going to retrieve some information, take or make some action, and deal with memory, and depending upon that it gives you the final output, then it becomes the agent. Let us try to understand: agents are generative AI applications, because

the base behind the agent is the LLM. So these are generative AI applications that can respond to user input, or assess some situation autonomously, and take the appropriate action. That means, on behalf of us, they are going to perform the task; they are going to send the messages,

they are going to read the data, they are going to do the automation. Now, when we have to create the agent, always remember, first we need the LLM model, because without the LLM model we can't do anything. So the LLM, or that model, powers the reasoning and language understanding.

It will understand the scenario; it will understand your Teams messages and everything. Then you can define some instructions, whether as a system message or otherwise; you can specify how your LLM model should work and which role it is in. Suppose you are a programme coordinator:

you can specify to the LLM, as an instruction or system message, 'you are a programme coordinator', so that the LLM model will learn, or give the response, in that way, from that angle only. So you can provide some instructions that define the agent's goals, behaviour, and constraints, and what type of

things it can do over there. And finally, it is going to connect with different tools: let the agent retrieve the knowledge, take or make the action, or connect with the memory. So, team, remember in very simple terms: when the LLM is going to connect with tools, and depending upon that it is going to take or make some action,

then it is called an agent. Agents are basically used for the automation purpose. Right now in organisations, in the industry, agents are used mainly for the automation purpose.
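The agent loop described above can be sketched as follows. Everything here is a stand-in: fake_llm and read_messages are hypothetical fakes, not a real model or SDK; the point is only the control flow of model, tool, memory, and answer:

```python
# A minimal, hypothetical agent loop: model -> decide tool -> act -> remember -> respond.

def fake_llm(prompt, memory):
    """Stand-in for a real language model deciding what to do next."""
    if "unread" in prompt:
        return {"tool": "read_messages"}
    return {"tool": None, "answer": "Done. Memory: " + "; ".join(memory)}

def read_messages():
    """Stand-in tool: pretend to fetch Teams/Slack messages."""
    return ["Standup moved to 10am", "PR #42 needs review"]

def run_agent(user_input):
    memory = []
    for _ in range(5):  # cap the loop so it always terminates
        decision = fake_llm(user_input, memory)
        if decision["tool"] == "read_messages":
            memory.extend(read_messages())   # take an action, store the result
            user_input = "summarise"         # next turn works from memory
        else:
            return decision["answer"]

print(run_agent("check my unread messages"))
```

Real agent frameworks add instructions (system messages), multiple tools, and agent-to-agent connections on top of exactly this loop.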



If the temperature value is high, it will consider so many words; its radius is going to increase, it is going to consider so many words, and then...
Yeah, both are very much similar. I will explain to you about top P. So if the temperature value is more, you will get more uncommon words; that means you will get a more creative answer. So in which case can we use that? Suppose you have to write one poem, which is imaginary; you have to write one poem on rain. And suppose you have to create one application through which you are going to write some poems, or some imaginative, creative type of things; there we always have to put the temperature value high. Now, top P is very much similar to the temperature value, but the small difference is that it is going to consider the top probabilities. So suppose I consider the word 'moray' and I have to find the words related to 'moray', and I put the probability as 0.1. Can you tell me how many words I will get there? Just tell me, high or low.




If the probability threshold is less, it means I will get more words.

Try to understand. See, my base word is 'moray', and I have to find those words whose probability is greater than 0.1. Definitely, so many words will be there. But if I put the restriction that I need only those words whose probability is more than 0.9,

how many words will be there? Very few.

No problem. Let us try to understand. I have one word, good.



Exactly. So, see, for 'good', I'm putting here 'better' with probability 0.8.

For 'best', I'm putting the probability as something like 0.8 for you.

For 'wow', I'm putting the probability as 0.6.

For 'high', I'm putting some probability as well.

OK, now suppose I put this top P value as 0.21; how many words will be considered?

All 4.
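One side note here: most real APIs implement top P slightly differently from a fixed per-word cutoff; they keep the smallest set of the most probable words whose cumulative probability reaches P (nucleus sampling), and temperature rescales the distribution before that. A minimal sketch, with made-up words and scores:

```python
import math

def softmax(logits, temperature=1.0):
    """Higher temperature flattens the distribution (more creative picks)."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(word_probs, p=0.9):
    """Nucleus sampling as commonly implemented: keep the smallest set of
    most-probable words whose cumulative probability reaches p."""
    ranked = sorted(word_probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for word, prob in ranked:
        kept.append(word)
        cumulative += prob
        if cumulative >= p:
            break
    return kept

# Made-up scores for the candidate words from the discussion above.
logits = {"better": 2.0, "best": 1.9, "wow": 1.2, "high": 0.5}
probs = dict(zip(logits, softmax(list(logits.values()), temperature=1.0)))
print(top_p_filter(probs, p=0.5))   # only the most probable word(s)
print(top_p_filter(probs, p=0.99))  # almost all candidates
```

Either way, the intuition from the lecture holds: these parameters control how wide the set of candidate words is, and therefore how creative or focused the output feels.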














What is an Azure AI Foundry project? Under Azure AI Foundry, you can create one resource, and in that resource different parameters are present. So, team, I think one person asked the question: suppose you have to do the integration of different Azure AI services with the agent; we can do that with the help of Azure AI Foundry. Here in Azure AI Foundry, different models are present; by using those models, you can create your gen AI application. Suppose you have to work with agents: there are different Azure AI Foundry agents available. We can take those AI agents and different Azure AI services, like language translation, document service, and vision service, and we can do the integration of that. That means, suppose you want to create a small agent, we can create that also; and suppose you have to create one agent which is going to solve problems of high complexity, we can do that also. That means, with the help of Azure AI Foundry, we can create gen AI solutions, and we can create agentic AI solutions by integrating.

Now, when I open Azure AI Foundry, there are different models available. When I deploy any model, I can test that model with the help of the playground. So a chat playground is there; similarly, other playgrounds are present, like an image playground and audio-type playgrounds. Now, when we have to deal with Azure AI Foundry, we can find the best model which is going to suit our need. How can we find that out? Always remember, when you enter the Foundry, there is one option called compare models. We can compare multiple models, like GPT-4o, GPT-4o mini, GPT-4.5, GPT-5, likewise, and from that comparison we can find out which is the best model for our scenario, for our business statement. So you can see such types of models are available, like the OpenAI model GPT-5.
Microsoft's model Phi-4 and popular third-party models are also available. Once you select the appropriate model, after comparison or from your previous business knowledge, you can deploy the model you want to use in your application. And once the model is deployed, with the help of the Playground you can test your model and see how it's working. Here, you can add your own data also, you can play with your parameters also, and you can add some examples if you need them.

Now, in Azure AI Foundry, we can create agents also. To create any agent, what we have to do is specify the agent name; we have to specify the model deployment, that is, which model you are going to use for the agent creation purpose; and the knowledge tools: you have to specify the list of tools that the LLM model or agent is going to connect to over there, like a website or files from your local machine, likewise. Then, what different action tools are present? That is, what type of actions is it going to take: whether it is going to send the email, Outlook will be there; whether it is going to send the messages, WhatsApp will be there, Messenger will be there, likewise. And if you want to connect one agent to another agent, you can connect different agents also. Now once your agent is ready, again, you can test your agent in the playground. So, you can see here one screenshot: when you are asking the question, it is going to generate the response, but actually it is one of the agents. Depending upon this, we have one small exercise, so we will see about that: how we can explore generative AI in the Azure AI Foundry portal.

With the help of NLP, as I mentioned, we can make a bridge between humans and machines. So, for example, if you have one newspaper, one text, or any document, and you uploaded that document to the machine, initially the machine will not understand your document, because the machine's language is different and our language is different. So we have to do some sort of adaptation, some sort of pre-processing. That means we have to convert our raw text, let me highlight this, we have to convert our raw text into machine-understandable format. So you can see here, I'm applying tokenization. With the help of tokenization, whatever raw text is present is going to be converted into one structured format where, for every token, one numeric value is assigned, called a token value, and the whole process is called tokenization. In the pre-processing, other steps are also available: we can remove the unwanted white spaces over there, we can remove the special symbols over there, likewise. So one of the steps in the pre-processing, or after the pre-processing, is tokenization. Now, once the corresponding information or input is present in machine-understandable format, we can pass that information for the training purpose. Now, in the training, there are different models available; so many models are available, and depending upon our application, we have to select the model. Now, what are the models available? Let me put some models here, so that you will get the idea.
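The pre-processing and tokenization steps above can be sketched like this (a toy cleanup, not what any particular NLP library does):

```python
import re

def preprocess(raw_text):
    """Basic pre-processing: strip special symbols and unwanted whitespace."""
    text = re.sub(r"[^A-Za-z0-9\s]", " ", raw_text)  # remove special symbols
    text = re.sub(r"\s+", " ", text).strip()          # collapse extra whitespace
    return text.lower()

def to_token_ids(text):
    """Map each unique word to a numeric token value."""
    vocab = {}
    return [vocab.setdefault(w, len(vocab) + 1) for w in text.split()]

raw = "  Hello,   world!! NLP   bridges humans &   machines. "
clean = preprocess(raw)
print(clean)               # hello world nlp bridges humans machines
print(to_token_ids(clean)) # numeric, machine-understandable format
```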

First model I'm putting here.

Just one second, thank you. So let me write here some models. T5 is one of the models we can use for natural language processing; with the help of this model, we can perform so many things. Then we have another model, which is called BERT, Bidirectional Encoder Representations from Transformers, which is again one of the good models through which we can perform so many NLP-related tasks. These are basic and very well-known models for natural language processing. Now, once we select any model, this model is going to help us perform so many tasks over there. So which type of tasks can we perform? Here we can see: speech-to-text and text-to-speech conversion is possible. Now, remember, for this purpose we need some sort of model, like the T5 model and the BERT model, which are massive models used for natural language processing. There are not only these two models; other different models are available when we have to play with the NLP concept.

Now, after this basic introduction to NLP, we got the idea that with the help of NLP we are going to perform some sort of stuff like text analysis, opinion mining, machine translation, summarization, and conversational AI, and before that some steps are present, like pre-processing, converting raw text into an appropriate format, and tokenization, converting your natural language data into machine-understandable format, and then it is going to do the training. Now, there are different techniques we can use for the language modelling purpose and the text analysis purpose. These techniques are basically divided into two parts: one is called the statistical techniques, and another one is semantic modelling, which is also called the dynamic techniques. Now, with the help of the statistical techniques, we can perform classification. For example, you have to find out whether your email is spam or not; that means just two classes are present. We can do the classification.
For this purpose, we can use logistic regression,


Along with logistic regression, we can use the Naive Bayes algorithm. So these are the static methods. Why static? Because it will understand some keywords present in your email, present in your data, and then find out whether it is spam or not, whether it is positive or negative. Likewise, another method that comes under the static techniques is TF-IDF, which is also called term frequency and inverse document frequency. Team, try to understand: this method is basically used for the purpose of finding out which words are so much important. So suppose you have three documents and you want to find out the term frequency and inverse document frequency. How is it going to calculate? Term frequency is nothing but how many times the term occurred in one document. So in document number one the term occurred 5 times, in document number 2 the term occurred 4 times, and in document number 3 the term occurred 0 times. This is called the term frequency. Now, what is the document frequency? In how many documents is that term present? In this case, two documents consist of that term, so the document frequency will be two, and since there are three documents, the inverse document frequency will be three by two (in practice we usually take the log of this ratio). Term frequency multiplied by inverse document frequency gives the TF-IDF value, which is basically used for the purpose of finding out which words are so much important over there.
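The calculation above can be checked in code. The three tiny documents below are made-up so that the term counts match the lecture numbers (5, 4, and 0 occurrences), and the IDF uses the usual log form:

```python
import math

# Three tiny documents: the term appears 5 times in doc 1,
# 4 times in doc 2, and 0 times in doc 3.
docs = [
    "spam spam spam spam spam offer",
    "spam spam spam spam hello",
    "meeting notes agenda",
]
term = "spam"

n_docs = len(docs)
df = sum(1 for d in docs if term in d.split())  # document frequency = 2
idf = math.log(n_docs / df)                     # inverse document frequency = log(3/2)

for i, d in enumerate(docs, start=1):
    tf = d.split().count(term)                  # term frequency per document
    print(f"doc {i}: tf={tf}, tf-idf={tf * idf:.3f}")
```

A term that appears often in one document but in few documents overall gets a high TF-IDF score, which is exactly how "important words" are picked out.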
These are static methods, because these rules, whatever things we are using, classification, TF-IDF, are some specific rule base, I can say. But when we are saying something, we don't know beforehand what we will say over there, what we are going to write over there. And for this purpose, we have to go with some dynamic models, which are called the semantic modelling techniques. Semantic in the sense that they are going to understand the meaning of that particular sentence, the meaning of that particular paragraph. So under semantic modelling, 2 methods are present. One is called the transformer model, which is going to represent the language tokens as vectors, and these vectors are going to find out the relationship between each word. So suppose I say the sentence 'my name is Suraj'; these transformer models are going to find out the relationship between each word, like 'my' and 'name', 'my' and 'is', 'my' and 'Suraj', likewise. And another method present over there under semantic modelling is called the attention mechanism. The attention mechanism is very much similar to the transformer model, but here it is going to focus on the next word: what will be the next appropriate word? So the attention mechanism mainly finds out the focus words, which words are so much important, which words are making an influence on another word. For this purpose, we can use the attention mechanism. Now, once we have these techniques, we can work with other parameters also. In the next slide, you can see that scenario: we can go with speech processing. Once our data is present in the appropriate format, then we can do the text-to-speech conversion and speech-to-text conversion.




text-to-speech conversion and speech-to-text conversion. So when we are converting text into speech format, it is called speech synthesis. You can see text is going to be converted into speech format, called speech synthesis. Here, in step number one, we have to convert our text into its tokenized format, that is, assigning a numeric value to each unique token. Then it is going to map the tokens to corresponding phonemes. And finally, with the help of these phonemes, it is going to generate the audio signal. Exactly the opposite thing happens in speech recognition. Speech recognition is nothing but converting speech into text format. First, it will capture the audio signal; then that audio signal is chopped, or broken, into corresponding phonemes, and those phonemes are mapped to the text tokens. So this is the mechanism in speech processing, in speech synthesis and speech recognition.
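The three synthesis steps above can be sketched as a toy pipeline. The phoneme dictionary is a made-up lookup, not a real grapheme-to-phoneme model, and the "audio" is just a stand-in string:

```python
# Toy sketch of the speech-synthesis pipeline:
# text -> tokens -> phonemes -> (pretend) audio.
PHONEME_DICT = {
    "dog": ["D", "AO", "G"],
    "cat": ["K", "AE", "T"],
}

def synthesise(text):
    tokens = text.lower().split()                    # step 1: tokenize the text
    phonemes = []
    for t in tokens:
        phonemes.extend(PHONEME_DICT.get(t, ["?"]))  # step 2: map tokens to phonemes
    audio = "~".join(phonemes)                       # step 3: stand-in for a waveform
    return tokens, phonemes, audio

tokens, phonemes, audio = synthesise("dog cat")
print(phonemes)  # ['D', 'AO', 'G', 'K', 'AE', 'T']
```

Speech recognition runs the same pipeline in reverse: audio is chopped into phonemes, and the phonemes are mapped back to text tokens.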




Personally identifiable information 
Now, what is meant by personally identifiable information? For example, if you are putting in some information like your mobile number, credit card number, or any other personal information like your email address, then there are some methods, algorithms, present which are going to identify that personally identifiable information, and sometimes hide it for the security purpose. With the help of the Azure AI Language service, we can also do summarization; that means we can summarise the data: very long content we can convert into a short one with the same meaning.

Now, again, Azure provides one Translator service. We know that translation is nothing but converting one language into another language. We can do text translation, converting small paragraphs or sentences into the appropriate, or I can say destination, language. Then we can do document translation also: the whole document is going to be translated into another language. And we can do custom translation. Someone asked me: suppose we have to deal with multiple languages; over there we can go with the customization option. Some sentences are present in the Hindi language, some sentences are present in the English language, or some sentences are present in the Spanish language, and we have to do the translation. Definitely that is possible, but we have to do the customization: we have to provide the data, train that corresponding algorithm, and then we can use it. So such types of services are present under the Azure AI Translator service.
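The PII identification described above can be sketched with simple patterns. Real services like the Azure AI Language service use trained models; the regular expressions below are only a toy illustration:

```python
import re

# Toy PII patterns -- a made-up illustration, not how Azure detects PII.
PII_PATTERNS = {
    "phone": r"\b\d{10}\b",
    "email": r"\b[\w.+-]+@[\w-]+\.[\w.]+\b",
    "credit_card": r"\b\d{4}[- ]\d{4}[- ]\d{4}[- ]\d{4}\b",
}

def redact_pii(text):
    """Replace detected PII with a placeholder, for the security purpose."""
    for label, pattern in PII_PATTERNS.items():
        text = re.sub(pattern, f"[{label.upper()}]", text)
    return text

msg = "Call me on 9876543210 or mail user@example.com"
print(redact_pii(msg))  # Call me on [PHONE] or mail [EMAIL]
```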







So let's go with our next model, which is nothing but computer vision, very interesting model and small one. So we will discuss about computer vision in which basically mainly it is dedicated for the image and video related services.So our focus will be on the computer vision. So here we are going to learn about computer vision concept and how we can use the computer vision in the Azure platform. So when I'm considering the image and image processing, we know that image is going to represent by using matrix format.So, it consists of multiple rows and multiple columns. These are arranged in an appropriate way, like a matrix format Yeah. So here you can see the images looks like this. Now always remember image is going to represent in three ways or image is classified into three types. First one is the binary image. Now what actually mean by binary image? That means each pixel is going to represent either zero or one that is black and white. So this image is also called as a black and black or white image.Then another type of image is present, which is called as a grayscale image. Now in the grayscale image, each pixel is going to represent in the range 0 to 255. That means total 256 shades are present. And third one is the colour image.So, in the colour image, the corresponding scale or pixel value is going to represent from zero to two raised to twenty-four.So here you can see how many combinations are present. Now, if we observe carefully into this image, the minimum value of the pixel is 0, maximum value is 250 for you. So this image is nothing but grayscale image. Now, when I want to do the analysis of image That means I have to find out the meaningful information from that image. So first try to understand what actually mean by image. So an image is an array of number of pixel values. Image is going to represent by using monochrome, that is single channel. 
either a black-and-white image or a grayscale image, or using multiple channels, which is called a colour image. Now, when I have to analyse the image, I have to apply some filters to it, which are also called masks. One of the masks is represented here; you can see what the mask looks like. When I apply this mask to the image, it is applied from left to right and top to bottom, sliding across the image in that style.
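A minimal NumPy sketch of this left-to-right, top-to-bottom mask application; the image and mask values here are made up for illustration:

```python
import numpy as np

# Slide a k x k mask over the image: outer loop moves top to bottom,
# inner loop moves left to right, exactly as described above.
def convolve2d(image, mask):
    h, w = image.shape
    k = mask.shape[0]
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):          # top to bottom
        for j in range(out.shape[1]):      # left to right
            out[i, j] = np.sum(image[i:i+k, j:j+k] * mask)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)   # grayscale pixel values
mask = np.array([[0, 0, 0],
                 [0, 1, 0],
                 [0, 0, 0]], dtype=float)          # identity mask
print(convolve2d(image, mask))  # recovers the 2x2 centre of the image
```

Real filters use other weights (edge detectors, blurs, and, in a CNN, learned values), but the sliding pattern is the same.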




Now, in image processing the most important algorithm is the convolutional neural network, called a CNN. These algorithms have the capability to identify objects present in an image: they can find the faces present in the image, find the people present in the image, extract the foreground part of an image, or remove the background of the image, and so on. Now, when we have to work with a convolutional neural network, we have to follow a sequence of steps, which you can see one by one. In step one, we provide labelled images which are used to train the model; here you can see such images, like apple, banana, cherry, orange, and so on. Then we apply filters, that is, masks like the one we applied in the last slide, and these filters extract the features from the image; you can see the extracted feature maps between steps two and three. In step three, we flatten those feature maps into a vector format so that the machine, the deep learning algorithm, can learn from the image. Then those flattened features are fed into the fully connected neural network; its hidden layers process that input image data, and finally, in step five, we get the output. The output is based on probability: the output layer produces a probability value for each possible class label.
So, if you upload multiple images and have to find the class, it will give you results like apple, banana, orange, and so on. Now, always remember that during training the filters are initially assigned random weights, so initially we will get some error; but with the help of backpropagation, the convolutional neural network adjusts those weights and improves its accuracy. This is the idea behind the convolutional neural network. In simple words: input images are provided with labels; masks are applied for the filtering purpose; the image is converted into a flattened format; that flattened format is provided to the neural network's input layer; the hidden layers process the corresponding image; and the output layer is responsible for producing the output as class probabilities, finding the face- or object-related information in the image.
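As an illustration only, the pipeline above (filters, flattening, fully connected layers, class probabilities) can be sketched in PyTorch, which the code later in these notes also uses; the layer sizes and the four-class setup are made-up choices, not a trained model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy CNN following the steps described above.
class TinyCNN(nn.Module):
    def __init__(self, num_classes=4):       # e.g. apple/banana/cherry/orange
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)  # learned filters
        self.fc = nn.Linear(8 * 16 * 16, num_classes)

    def forward(self, x):
        x = F.relu(self.conv(x))              # step 2: filtering / feature extraction
        x = F.max_pool2d(x, 2)                # downsample 32x32 -> 16x16
        x = x.view(x.size(0), -1)             # step 3: flatten into a vector
        return F.softmax(self.fc(x), dim=1)   # step 5: probability per class

model = TinyCNN()
probs = model(torch.randn(1, 3, 32, 32))      # one random 32x32 RGB "image"
print(probs.shape, float(probs.sum()))        # probabilities sum to 1
```

During training, a loss on these probabilities would be backpropagated to adjust the initially random filter weights, as the text describes.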


When we are working with images, there is one concept called the multi-modal model, and these models have the capability to work with images as well as language. For example, if you upload an image and want to find a caption for it, the image comes under image processing while the caption comes under the language task. So in the multi-modal model, a single model performs multiple tasks: language-related tasks and image-related tasks. For example, if you upload an image and want object detection with bounding boxes, it will find the objects, create the bounding boxes for them, and provide the labels. You can do captioning, where the image is given and the caption is provided in text format. You can do image tagging, and so on. So the multi-modal model is a new approach to modelling that combines a language model and a vision model to encode image and text data together. These models find the semantic relationships between features extracted from images and text extracted from the related captions, and a multi-modal model can be used as the foundation model for more specialised, adapted models. This is the idea behind multi-modal models.
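The semantic-relationship idea can be illustrated with cosine similarity between embedding vectors; real multi-modal models learn these vectors, and the numbers below are purely made up for illustration:

```python
import numpy as np

# Image features and caption text are encoded into the same vector space;
# relatedness is then just similarity between the two vectors.
def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

image_embedding = np.array([0.9, 0.1, 0.3])   # e.g. a photo of a dog
caption_good = np.array([0.8, 0.2, 0.25])     # "a dog playing outside"
caption_bad = np.array([0.1, 0.9, 0.7])       # "a plate of pasta"

print(cosine_similarity(image_embedding, caption_good))  # high
print(cosine_similarity(image_embedding, caption_bad))   # low
```

Captioning, tagging, and retrieval all reduce to finding the text whose vector sits closest to the image's vector.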


Next we will see how we can perform that, and then how we can explore AI-powered information extraction with the help of the Azure platform. Let us first understand the concept of information extraction. Simply remember that information extraction means taking, or digging out, information from different resources. It might be present in the form of text or PDFs, different receipt forms, audio files, image files, video files, and many more. For this purpose, four steps are present. The first is source identification: we have to find the source, that is, determine where the information resides and whether it needs to be digitised. Once we are able to identify the source, the second step is extraction, and for this we can use different methods, many based on machine learning, to understand and extract the data from the digitised content. Once we have extracted the data in raw format using those strategies, we have to do transformation and structuring, because we have to save the data in an appropriate format so that we can use it in future. In transformation and structuring, the extracted data is transformed into a structured format such as JSON, a table format like CSV, or Excel, and so on. Once we have done the transformation and structuring, we store the information and, if needed, integrate it with different resources. In storage and integration, the processed data in structured format is stored in databases, data lakes, or analytics platforms for different purposes, that is, for analysis and filtering.
You can also use it for algorithm building and training purposes. So always remember: when we have to extract information from any resource, we have to follow these four steps. Step one, source identification: identify the resource from which we are going to extract the information. Step two, extraction: apply some methods to extract the data. Step three, once the data is extracted, apply transformation and convert the data into an appropriate structure. And finally, step four, store that information and do the integration.
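A small sketch of step three, transformation and structuring, turning a raw extracted string into JSON and CSV; the raw string and the field names are invented for illustration:

```python
import json
import csv
import io

# Raw output from the extraction step (step 2), as one flat string.
raw_extraction = "vendor=Contoso; date=2025-08-25; amount=19.15"

# Transformation: parse key=value pairs into a structured record.
record = dict(part.split("=") for part in raw_extraction.split("; "))
as_json = json.dumps(record)                 # JSON format

buffer = io.StringIO()                       # CSV (table) format
writer = csv.DictWriter(buffer, fieldnames=record.keys())
writer.writeheader()
writer.writerow(record)

print(as_json)
print(buffer.getvalue())
```

In step four, the JSON or CSV would be written to a database, data lake, or analytics platform for later filtering and training use.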

Next, we have to understand extracting data from images. See, sometimes we have to extract data from images, and if the image contains some text-related information, we have to deal with that. If one image contains text-related information and we have to extract it from that image, then it is called OCR. OCR is optical character recognition, which can extract contact information from scanned business cards or conference badges, capture information from identity documents to complete electronic forms, scan and store receipts, or read other text in a photograph, such as a sign or a storefront, so you can submit it to a translation service, and digitise handwritten notes. That means we can extract information from images too, as we have seen in the last demo. Then you can extract data from different forms as well.


As we have seen in the last demo, you upload the data, and some good algorithms extract the information. Along with that, they provide a field description: for example, it will treat 1234 as an invoice number, John Smith as the name, 123 as the street address, England as the country, and so on. The field name is the key or type of the data entry, and we can assign the field a description, which is possible with the help of the Document Intelligence service. With the Document Intelligence service we can also extract the values, and to those values it assigns descriptions. Now, once you upload the form, it creates some bounding boxes, and these bounding boxes tell you which information was extracted from your data, from your corresponding document.





Now, there is one concept called multimodal data extraction. Multimodal here means a single model extracts data from different resources, where the resource is a combination of documents, images, video, audio, or text. Imagine that you have to extract information from one website, and in that website a mixture of data types is present: some content is in text format, some images are present, some videos are present, some audio files are present. In that case you can apply this multimodal data extraction method. In step one, some content is extracted using an Azure capability called content extraction, and on top of that, some add-on capabilities extract additional content. The second part is field extraction, in which the supported or user-provided fields are found, and the results are passed back to the AI service for confidence scoring and grounding. So if you observe carefully at the output end, we get the structured insight
as well as the content along with the add-ons. So this is the capability of the multimodal model. Now, depending upon this, we have one small exercise showing how this service extracts information from some data; we are going to explore an AI information extraction scenario. I'm going to open the receipt analyzer. When I open the receipt analyzer, you can see that no receipt is selected yet; we have to upload a receipt. I'm selecting receipt number one. Once I select receipt number one, you can see the vendor is Northwind Traders, the date is August 25th, 2025, and the amount is 11.85 USD. Now, one question arises: why does it extract only these three parameters? Because in the back end, in the algorithm we mentioned, the logic is written to extract the vendor name, date, and amount information. Similarly, suppose I upload a second receipt; again it will do the analysis and find that Contoso is the vendor, 25th August 2025 is the date, and 19.15 is the amount. If I go to a third document, you will observe carefully that the structure of the document is a bit different, but the algorithm has the capability to understand it very well: it recognises the vendor as Fourth Coffee, the date as 15th August 2024, and the amount as 6.97 USD. So we can perform such steps with the help of the information extraction method.
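A toy stand-in for this receipt analyzer, using plain-Python rules rather than the trained service; the regexes, the assumed vendor-on-first-line layout, and the sample receipt text are invented for illustration:

```python
import re

# Extract the same three fields the demo extracts: vendor, date, amount.
def analyze_receipt(text):
    vendor = text.splitlines()[0].strip()             # assume first line is the vendor
    date = re.search(r"\d{1,2} \w+ \d{4}", text)      # e.g. "25 August 2025"
    amount = re.search(r"Total:\s*([\d.]+)", text)    # e.g. "Total: 19.15"
    return {
        "vendor": vendor,
        "date": date.group() if date else None,
        "amount": float(amount.group(1)) if amount else None,
    }

receipt = "Contoso\n25 August 2025\nCoffee 3.15\nTotal: 19.15"
print(analyze_receipt(receipt))
```

The real service handles receipts with differing layouts (as the third demo receipt shows) because it uses a trained model rather than fixed rules like these.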


as a digital asset management, or DAM, solution. Another dedicated service is present, called the Azure AI Document Intelligence service, and this service has a very good capability to work with pre-built document models. For example, if you have a million similar documents in the same structure, it has the capability to extract the main information, and along with that it assigns tokens and taglines: this is the vendor address, this is the date, this is the time, and so on. Another service available is called Azure AI Content Understanding, and this service includes multi-modal models: a single model has the capability to work with text, images, audio, video, music, and 3D object files, and it can also create custom analyzers as per our requirements. So these three services are available on the Azure platform for the data extraction purpose. Again, I'm repeating: the Azure AI Vision service, the Document Intelligence service, and Azure AI Content Understanding.




If I upload any document and have to extract the corresponding text from it, we can do that very easily, and it will recognise the language as well. If your document is present in multiple languages, it will recognise each of the different languages too.
After that, if you upload some receipts, there are pre-built models available under the Document Intelligence service, such as invoices, receipts, and IDs, which recognise and extract key-value pairs. As I mentioned, suppose you have to find the merchant name, date, time, total, or tax-related information: we can easily identify that. That is, the Document Intelligence service does not just extract the information, it also gives the field names. And third, we can go with custom models. For example, if your organisation works with many different types of samples, different types of data or receipts, it is not possible to go with one single format, so there you can go with custom models. For this customization, you upload five similar documents and create a model. See, the training time required is not much, because we just have to upload five documents and train on them.





Then you can go with the Azure AI Content Understanding service. There you can upload images under the Azure AI Foundry Content Understanding feature, and you can analyse video as well: suppose you upload any video and you want to find the key information in it.






Along with different data types, Azure AI Search results contain only your data and can include new insights powered by AI. It is offered as a platform-as-a-service solution, meaning Microsoft manages the infrastructure and availability. When we implement this knowledge mining solution, you can see in the diagram that we have some data, and with the help of the text and speech services, the computer vision service, and the document intelligence service, we implement a search solution. Now, why do we have to apply a search solution? See, day by day data is increasing; a massive amount of data is there, and we have to deal with and work with that data. When we need to extract any particular data from a very large volume of data, we have to apply some indexing over it; we have to find a search solution that will locate the exact information within a small amount of time. For this purpose, along with data mining, a search solution plays a very vital role. You can see that data is locked away in documents in PDF format, handwritten format, and image formats, with subfolders present, so it is time-consuming and labour-intensive to find that particular data, and we have to apply knowledge mining, which finds insights at scale. So the Azure AI Search service plays a very vital role, as I mentioned: if you have a massive amount of data, you can use that service and extract the information very quickly. Now, depending upon that, we have one small exercise: we are going to extract data using Content Understanding in Azure AI Foundry, in the Microsoft Foundry portal. So I have to visit Foundry; I'm opening it in another tab.
This is the last exercise of our course. It is not compulsory, but for your understanding I'm going to show you. We have to go into the Azure AI services, so I'm clicking there. We have to go to Microsoft Foundry, and at the very bottom you will be able to see the Azure AI services. Then we have to select Content Understanding here.
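To illustrate the indexing idea behind such a search solution, here is a toy inverted index in plain Python; the documents and terms are invented for illustration:

```python
from collections import defaultdict

# Map each term to the set of documents containing it, so a lookup
# touches only the index instead of scanning the full volume of data.
documents = {
    "doc1": "invoice from contoso for office supplies",
    "doc2": "handwritten notes scanned as pdf",
    "doc3": "contoso receipt stored as pdf",
}

index = defaultdict(set)
for doc_id, text in documents.items():
    for term in text.split():
        index[term].add(doc_id)

def search(term):
    return sorted(index.get(term, set()))

print(search("pdf"))       # which documents mention "pdf"
print(search("contoso"))   # which documents mention "contoso"
```

A real search service adds ranking, AI enrichment of the indexed content, and managed infrastructure on top of this basic structure.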





Building the AI Lunar Lander - Complete Code

#Installing the required packages and importing the libraries
!pip install gymnasium
!pip install "gymnasium[atari, accept-rom-license]"
!apt-get install -y swig
!pip install gymnasium[box2d]

import os
import random
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from collections import deque, namedtuple
#Part 1 - Building the AI
#Creating the architecture of the Neural Network
class Network(nn.Module):

  def __init__(self, state_size, action_size, seed = 42):
    super(Network, self).__init__()
    self.seed = torch.manual_seed(seed)
    self.fc1 = nn.Linear(state_size, 64)
    self.fc2 = nn.Linear(64, 64)
    self.fc3 = nn.Linear(64, action_size)

  def forward(self, state):
    x = self.fc1(state)
    x = F.relu(x)
    x = self.fc2(x)
    x = F.relu(x)
    return self.fc3(x)
#Part 2 - Training the AI
#Setting up the environment
import gymnasium as gym
env = gym.make('LunarLander-v2')
state_shape = env.observation_space.shape
state_size = env.observation_space.shape[0]
number_actions = env.action_space.n
print('State shape: ', state_shape)
print('State size: ', state_size)
print('Number of actions: ', number_actions)
#Initializing the hyperparameters
learning_rate = 5e-4
minibatch_size = 100
discount_factor = 0.99
replay_buffer_size = int(1e5)
interpolation_parameter = 1e-3
#Implementing Experience Replay
class ReplayMemory(object):

  def __init__(self, capacity):
    self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    self.capacity = capacity
    self.memory = []

  def push(self, event):
    self.memory.append(event)
    if len(self.memory) > self.capacity:
      del self.memory[0]

  def sample(self, batch_size):
    experiences = random.sample(self.memory, k = batch_size)
    states = torch.from_numpy(np.vstack([e[0] for e in experiences if e is not None])).float().to(self.device)
    actions = torch.from_numpy(np.vstack([e[1] for e in experiences if e is not None])).long().to(self.device)
    rewards = torch.from_numpy(np.vstack([e[2] for e in experiences if e is not None])).float().to(self.device)
    next_states = torch.from_numpy(np.vstack([e[3] for e in experiences if e is not None])).float().to(self.device)
    dones = torch.from_numpy(np.vstack([e[4] for e in experiences if e is not None]).astype(np.uint8)).float().to(self.device)
    return states, next_states, actions, rewards, dones
#Implementing the DQN class
class Agent():

  def __init__(self, state_size, action_size):
    self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    self.state_size = state_size
    self.action_size = action_size
    self.local_qnetwork = Network(state_size, action_size).to(self.device)
    self.target_qnetwork = Network(state_size, action_size).to(self.device)
    self.optimizer = optim.Adam(self.local_qnetwork.parameters(), lr = learning_rate)
    self.memory = ReplayMemory(replay_buffer_size)
    self.t_step = 0

  def step(self, state, action, reward, next_state, done):
    self.memory.push((state, action, reward, next_state, done))
    self.t_step = (self.t_step + 1) % 4
    if self.t_step == 0:
      if len(self.memory.memory) > minibatch_size:
        experiences = self.memory.sample(100)
        self.learn(experiences, discount_factor)

  def act(self, state, epsilon = 0.):
    state = torch.from_numpy(state).float().unsqueeze(0).to(self.device)
    self.local_qnetwork.eval()
    with torch.no_grad():
      action_values = self.local_qnetwork(state)
    self.local_qnetwork.train()
    if random.random() > epsilon:
      return np.argmax(action_values.cpu().data.numpy())
    else:
      return random.choice(np.arange(self.action_size))

  def learn(self, experiences, discount_factor):
    states, next_states, actions, rewards, dones = experiences
    next_q_targets = self.target_qnetwork(next_states).detach().max(1)[0].unsqueeze(1)
    q_targets = rewards + discount_factor * next_q_targets * (1 - dones)
    q_expected = self.local_qnetwork(states).gather(1, actions)
    loss = F.mse_loss(q_expected, q_targets)
    self.optimizer.zero_grad()
    loss.backward()
    self.optimizer.step()
    self.soft_update(self.local_qnetwork, self.target_qnetwork, interpolation_parameter)

  def soft_update(self, local_model, target_model, interpolation_parameter):
    for target_param, local_param in zip(target_model.parameters(), local_model.parameters()):
      target_param.data.copy_(interpolation_parameter * local_param.data + (1.0 - interpolation_parameter) * target_param.data)
#Initializing the DQN agent
agent = Agent(state_size, number_actions)
#Training the DQN agent
number_episodes = 2000
maximum_number_timesteps_per_episode = 1000
epsilon_starting_value  = 1.0
epsilon_ending_value  = 0.01
epsilon_decay_value  = 0.995
epsilon = epsilon_starting_value
scores_on_100_episodes = deque(maxlen = 100)

for episode in range(1, number_episodes + 1):
  state, _ = env.reset()
  score = 0
  for t in range(maximum_number_timesteps_per_episode):
    action = agent.act(state, epsilon)
    next_state, reward, done, _, _ = env.step(action)
    agent.step(state, action, reward, next_state, done)
    state = next_state
    score += reward
    if done:
      break
  scores_on_100_episodes.append(score)
  epsilon = max(epsilon_ending_value, epsilon_decay_value * epsilon)
  print('\rEpisode {}\tAverage Score: {:.2f}'.format(episode, np.mean(scores_on_100_episodes)), end = "")
  if episode % 100 == 0:
    print('\rEpisode {}\tAverage Score: {:.2f}'.format(episode, np.mean(scores_on_100_episodes)))
  if np.mean(scores_on_100_episodes) >= 200.0:
    print('\nEnvironment solved in {:d} episodes!\tAverage Score: {:.2f}'.format(episode - 100, np.mean(scores_on_100_episodes)))
    torch.save(agent.local_qnetwork.state_dict(), 'checkpoint.pth')
    break
#Part 3 - Visualizing the results
import glob
import io
import base64
import imageio
from IPython.display import HTML, display

def show_video_of_model(agent, env_name):
    env = gym.make(env_name, render_mode='rgb_array')
    state, _ = env.reset()
    done = False
    frames = []
    while not done:
        frame = env.render()
        frames.append(frame)
        action = agent.act(state)
        state, reward, done, _, _ = env.step(action.item())
    env.close()
    imageio.mimsave('video.mp4', frames, fps=30)

show_video_of_model(agent, 'LunarLander-v2')

def show_video():
    mp4list = glob.glob('*.mp4')
    if len(mp4list) > 0:
        mp4 = mp4list[0]
        video = io.open(mp4, 'r+b').read()
        encoded = base64.b64encode(video)
        display(HTML(data='''<video alt="test" autoplay
                loop controls style="height: 400px;">
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii'))))
    else:
        print("Could not find video")

show_video()

Building the AI Pac-Man - Complete Code









#Part 0 - Installing the required packages and importing the libraries
!pip install gymnasium
!pip install "gymnasium[atari, accept-rom-license]"
!apt-get install -y swig
!pip install gymnasium[box2d]
import os
import random
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from collections import deque
from torch.utils.data import DataLoader, TensorDataset
#Part 1 - Building the AI-Creating the architecture of the Neural Network
class Network(nn.Module):

  def __init__(self, action_size, seed = 42):
    super(Network, self).__init__()
    self.seed = torch.manual_seed(seed)
    self.conv1 = nn.Conv2d(3, 32, kernel_size = 8, stride = 4)
    self.bn1 = nn.BatchNorm2d(32)
    self.conv2 = nn.Conv2d(32, 64, kernel_size = 4, stride = 2)
    self.bn2 = nn.BatchNorm2d(64)
    self.conv3 = nn.Conv2d(64, 64, kernel_size = 3, stride = 1)
    self.bn3 = nn.BatchNorm2d(64)
    self.conv4 = nn.Conv2d(64, 128, kernel_size = 3, stride = 1)
    self.bn4 = nn.BatchNorm2d(128)
    self.fc1 = nn.Linear(10 * 10 * 128, 512)
    self.fc2 = nn.Linear(512, 256)
    self.fc3 = nn.Linear(256, action_size)

  def forward(self, state):
    x = F.relu(self.bn1(self.conv1(state)))
    x = F.relu(self.bn2(self.conv2(x)))
    x = F.relu(self.bn3(self.conv3(x)))
    x = F.relu(self.bn4(self.conv4(x)))
    x = x.view(x.size(0), -1)
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    return self.fc3(x)
#Part 2 - Training the AI-Setting up the environment
import gymnasium as gym
env = gym.make('MsPacmanDeterministic-v0', full_action_space = False)
state_shape = env.observation_space.shape
state_size = env.observation_space.shape[0]
number_actions = env.action_space.n
print('State shape: ', state_shape)
print('State size: ', state_size)
print('Number of actions: ', number_actions)
#Initializing the hyperparameters
learning_rate = 5e-4
minibatch_size = 64
discount_factor = 0.99
#Preprocessing the frames
from PIL import Image
from torchvision import transforms

def preprocess_frame(frame):
  frame = Image.fromarray(frame)
  preprocess = transforms.Compose([transforms.Resize((128, 128)), transforms.ToTensor()])
  return preprocess(frame).unsqueeze(0)
#Implementing the DCQN class
class Agent():

  def __init__(self, action_size):
    self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    self.action_size = action_size
    self.local_qnetwork = Network(action_size).to(self.device)
    self.target_qnetwork = Network(action_size).to(self.device)
    self.optimizer = optim.Adam(self.local_qnetwork.parameters(), lr = learning_rate)
    self.memory = deque(maxlen = 10000)

  def step(self, state, action, reward, next_state, done):
    state = preprocess_frame(state)
    next_state = preprocess_frame(next_state)
    self.memory.append((state, action, reward, next_state, done))
    if len(self.memory) > minibatch_size:
      experiences = random.sample(self.memory, k = minibatch_size)
      self.learn(experiences, discount_factor)

  def act(self, state, epsilon = 0.):
    state = preprocess_frame(state).to(self.device)
    self.local_qnetwork.eval()
    with torch.no_grad():
      action_values = self.local_qnetwork(state)
    self.local_qnetwork.train()
    if random.random() > epsilon:
      return np.argmax(action_values.cpu().data.numpy())
    else:
      return random.choice(np.arange(self.action_size))

  def learn(self, experiences, discount_factor):
    states, actions, rewards, next_states, dones = zip(*experiences)
    states = torch.from_numpy(np.vstack(states)).float().to(self.device)
    actions = torch.from_numpy(np.vstack(actions)).long().to(self.device)
    rewards = torch.from_numpy(np.vstack(rewards)).float().to(self.device)
    next_states = torch.from_numpy(np.vstack(next_states)).float().to(self.device)
    dones = torch.from_numpy(np.vstack(dones).astype(np.uint8)).float().to(self.device)
    next_q_targets = self.target_qnetwork(next_states).detach().max(1)[0].unsqueeze(1)
    q_targets = rewards + discount_factor * next_q_targets * (1 - dones)
    q_expected = self.local_qnetwork(states).gather(1, actions)
    loss = F.mse_loss(q_expected, q_targets)
    self.optimizer.zero_grad()
    loss.backward()
    self.optimizer.step()
#Initializing the DCQN agent
agent = Agent(number_actions)
#Training the DCQN agent
number_episodes = 2000
maximum_number_timesteps_per_episode = 10000
epsilon_starting_value  = 1.0
epsilon_ending_value  = 0.01
epsilon_decay_value  = 0.995
epsilon = epsilon_starting_value
scores_on_100_episodes = deque(maxlen = 100)

for episode in range(1, number_episodes + 1):
  state, _ = env.reset()
  score = 0
  for t in range(maximum_number_timesteps_per_episode):
    action = agent.act(state, epsilon)
    next_state, reward, done, _, _ = env.step(action)
    agent.step(state, action, reward, next_state, done)
    state = next_state
    score += reward
    if done:
      break
  scores_on_100_episodes.append(score)
  epsilon = max(epsilon_ending_value, epsilon_decay_value * epsilon)
  print('\rEpisode {}\tAverage Score: {:.2f}'.format(episode, np.mean(scores_on_100_episodes)), end = "")
  if episode % 100 == 0:
    print('\rEpisode {}\tAverage Score: {:.2f}'.format(episode, np.mean(scores_on_100_episodes)))
  if np.mean(scores_on_100_episodes) >= 500.0:
    print('\nEnvironment solved in {:d} episodes!\tAverage Score: {:.2f}'.format(episode - 100, np.mean(scores_on_100_episodes)))
    torch.save(agent.local_qnetwork.state_dict(), 'checkpoint.pth')
    break
#Part 3 - Visualizing the results
import glob
import io
import base64
import imageio
from IPython.display import HTML, display

def show_video_of_model(agent, env_name):
    env = gym.make(env_name, render_mode='rgb_array')
    state, _ = env.reset()
    done = False
    frames = []
    while not done:
        frame = env.render()
        frames.append(frame)
        action = agent.act(state)
        state, reward, done, _, _ = env.step(action)
    env.close()
    imageio.mimsave('video.mp4', frames, fps=30)

show_video_of_model(agent, 'MsPacmanDeterministic-v0')

def show_video():
    mp4list = glob.glob('*.mp4')
    if len(mp4list) > 0:
        mp4 = mp4list[0]
        video = io.open(mp4, 'r+b').read()
        encoded = base64.b64encode(video)
        display(HTML(data='''<video alt="test" autoplay
                loop controls style="height: 400px;">
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii'))))
    else:
        print("Could not find video")

show_video()





Building the AI KungFuMaster - Complete Code

#Part 0 - Installing the required packages and importing the libraries
!pip install gymnasium
!pip install "gymnasium[atari, accept-rom-license]"
!apt-get install -y swig
!pip install gymnasium[box2d]


import cv2
import math
import random
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torch.multiprocessing as mp
import torch.distributions as distributions
from torch.distributions import Categorical
import gymnasium as gym
from gymnasium import ObservationWrapper
from gymnasium.spaces import Box

#Part 1 - Building the AI Creating the architecture of the Neural Network
class Network(nn.Module):

  def __init__(self, action_size):
    super(Network, self).__init__()
    self.conv1 = torch.nn.Conv2d(in_channels = 4,  out_channels = 32, kernel_size = (3,3), stride = 2)
    self.conv2 = torch.nn.Conv2d(in_channels = 32, out_channels = 32, kernel_size = (3,3), stride = 2)
    self.conv3 = torch.nn.Conv2d(in_channels = 32, out_channels = 32, kernel_size = (3,3), stride = 2)
    self.flatten = torch.nn.Flatten()
    self.fc1  = torch.nn.Linear(512, 128)
    self.fc2a = torch.nn.Linear(128, action_size)
    self.fc2s = torch.nn.Linear(128, 1)

  def forward(self, state):
    x = self.conv1(state)
    x = F.relu(x)
    x = self.conv2(x)
    x = F.relu(x)
    x = self.conv3(x)
    x = F.relu(x)
    x = self.flatten(x)
    x = self.fc1(x)
    x = F.relu(x)
    action_values = self.fc2a(x)
    state_value = self.fc2s(x)[0]
    return action_values, state_value

#Part 2 - Training the AI: Setting up the environment
class PreprocessAtari(ObservationWrapper):

  def __init__(self, env, height = 42, width = 42, crop = lambda img: img, dim_order = 'pytorch', color = False, n_frames = 4):
    super(PreprocessAtari, self).__init__(env)
    self.img_size = (height, width)
    self.crop = crop
    self.dim_order = dim_order
    self.color = color
    self.frame_stack = n_frames
    n_channels = 3 * n_frames if color else n_frames
    obs_shape = {'tensorflow': (height, width, n_channels), 'pytorch': (n_channels, height, width)}[dim_order]
    self.observation_space = Box(0.0, 1.0, obs_shape)
    self.frames = np.zeros(obs_shape, dtype = np.float32)

  def reset(self):
    self.frames = np.zeros_like(self.frames)
    obs, info = self.env.reset()
    self.update_buffer(obs)
    return self.frames, info

  def observation(self, img):
    img = self.crop(img)
    img = cv2.resize(img, self.img_size)
    if not self.color:
      if len(img.shape) == 3 and img.shape[2] == 3:
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = img.astype('float32') / 255.
    if self.color:
      self.frames = np.roll(self.frames, shift = -3, axis = 0)
    else:
      self.frames = np.roll(self.frames, shift = -1, axis = 0)
    if self.color:
      self.frames[-3:] = img
    else:
      self.frames[-1] = img
    return self.frames

  def update_buffer(self, obs):
    self.frames = self.observation(obs)

def make_env():
  env = gym.make("KungFuMasterDeterministic-v0", render_mode = 'rgb_array')
  env = PreprocessAtari(env, height = 42, width = 42, crop = lambda img: img, dim_order = 'pytorch', color = False, n_frames = 4)
  return env

env = make_env()

state_shape = env.observation_space.shape
number_actions = env.action_space.n
print("State shape:", state_shape)
print("Number actions:", number_actions)
print("Action names:", env.unwrapped.get_action_meanings())

#Initializing the hyperparameters
learning_rate = 1e-4
discount_factor = 0.99
number_environments = 10
#Implementing the A3C class
class Agent():

  def __init__(self, action_size):
    self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    self.action_size = action_size
    self.network = Network(action_size).to(self.device)
    self.optimizer = torch.optim.Adam(self.network.parameters(), lr = learning_rate)

  def act(self, state):
    if state.ndim == 3:
      state = [state]
    state = torch.tensor(state, dtype = torch.float32, device = self.device)
    action_values, _ = self.network(state)
    policy = F.softmax(action_values, dim = -1)
    return np.array([np.random.choice(len(p), p = p) for p in policy.detach().cpu().numpy()])

  def step(self, state, action, reward, next_state, done):
    batch_size = state.shape[0]
    state = torch.tensor(state, dtype = torch.float32, device = self.device)
    next_state = torch.tensor(next_state, dtype = torch.float32, device = self.device)
    reward = torch.tensor(reward, dtype = torch.float32, device = self.device)
    done = torch.tensor(done, dtype = torch.bool, device = self.device).to(dtype = torch.float32)
    action_values, state_value = self.network(state)
    _, next_state_value = self.network(next_state)
    target_state_value = reward + discount_factor * next_state_value * (1 - done)
    advantage = target_state_value - state_value
    probs = F.softmax(action_values, dim = -1)
    logprobs = F.log_softmax(action_values, dim = -1)
    entropy = -torch.sum(probs * logprobs, axis = -1)
    batch_idx = np.arange(batch_size)
    logp_actions = logprobs[batch_idx, action]
    actor_loss = -(logp_actions * advantage.detach()).mean() - 0.001 * entropy.mean()
    critic_loss = F.mse_loss(target_state_value.detach(), state_value)
    total_loss = actor_loss + critic_loss
    self.optimizer.zero_grad()
    total_loss.backward()
    self.optimizer.step()

#Initializing the A3C agent
agent = Agent(number_actions)

#Evaluating our A3C agent on a certain number of episodes
def evaluate(agent, env, n_episodes = 1):
  episodes_rewards = []
  for _ in range(n_episodes):
    state, _ = env.reset()
    total_reward = 0
    while True:
      action = agent.act(state)
      state, reward, done, truncated, _ = env.step(action[0])
      total_reward += reward
      if done or truncated:
        break
    episodes_rewards.append(total_reward)
  return episodes_rewards
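As a hand-checkable illustration (separate from the listing, with made-up numbers), here is the bootstrapped target and advantage that `Agent.step` above computes, worked out for a single toy transition:

```python
import math

# Toy numbers, chosen only for illustration.
gamma = 0.99                       # same role as discount_factor above
reward, v_s, v_next, done = 1.0, 0.5, 0.8, 0.0

# Bootstrapped critic target: r + gamma * V(s') * (1 - done)
target = reward + gamma * v_next * (1 - done)   # 1.792
# Advantage: how much better the transition went than the critic expected
advantage = target - v_s                        # 1.292

# Actor term: -log pi(a|s) * advantage (suppose the taken action had prob 0.2)
logp = math.log(0.2)
actor_term = -logp * advantage     # positive advantage pushes the action's probability up
# Critic term: squared error the critic minimizes toward the target
critic_term = (target - v_s) ** 2
```

The `detach()` calls in `Agent.step` matter here: the advantage is treated as a constant in the actor loss, and the target as a constant in the critic loss, so each head only receives its own gradient.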
#Managing multiple environments simultaneously
class EnvBatch:

  def __init__(self, n_envs = 10):
    self.envs = [make_env() for _ in range(n_envs)]

  def reset(self):
    _states = []
    for env in self.envs:
      _states.append(env.reset()[0])
    return np.array(_states)

  def step(self, actions):
    # gymnasium's step returns (obs, reward, terminated, truncated, info)
    next_states, rewards, dones, truncateds, infos = map(np.array, zip(*[env.step(a) for env, a in zip(self.envs, actions)]))
    for i in range(len(self.envs)):
      if dones[i] or truncateds[i]:
        next_states[i] = self.envs[i].reset()[0]
    return next_states, rewards, dones, infos
#Training the A3C agent
import tqdm

env_batch = EnvBatch(number_environments)
batch_states = env_batch.reset()

with tqdm.trange(0, 3001) as progress_bar:
  for i in progress_bar:
    batch_actions = agent.act(batch_states)
    batch_next_states, batch_rewards, batch_dones, _ = env_batch.step(batch_actions)
    batch_rewards *= 0.01
    agent.step(batch_states, batch_actions, batch_rewards, batch_next_states, batch_dones)
    batch_states = batch_next_states
    if i % 1000 == 0:
      print("Average agent reward: ", np.mean(evaluate(agent, env, n_episodes = 10)))

#Part 3 - Visualizing the results
import glob
import io
import base64
import imageio
from IPython.display import HTML, display
from gymnasium.wrappers.monitoring.video_recorder import VideoRecorder

def show_video_of_model(agent, env):
  state, _ = env.reset()
  done = False
  frames = []
  while not done:
    frame = env.render()
    frames.append(frame)
    action = agent.act(state)
    state, reward, done, _, _ = env.step(action[0])
  env.close()
  imageio.mimsave('video.mp4', frames, fps=30)

show_video_of_model(agent, env)

def show_video():
    mp4list = glob.glob('*.mp4')
    if len(mp4list) > 0:
        mp4 = mp4list[0]
        video = io.open(mp4, 'r+b').read()
        encoded = base64.b64encode(video)
        display(HTML(data='''<video alt="test" autoplay
                loop controls style="height: 400px;">
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii'))))
    else:
        print("Could not find video")

show_video()
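One detail worth checking in the `Network` class above: `fc1` is declared as `nn.Linear(512, 128)`. That 512 comes from the three stride-2 convolutions shrinking the 42x42 frames down to 4x4 over 32 channels; a quick sketch of the arithmetic (plain Python, no torch needed):

```python
def conv_out(size, kernel=3, stride=2):
    # Output size of a convolution with no padding.
    return (size - kernel) // stride + 1

h = 42
for _ in range(3):       # conv1, conv2, conv3 all use kernel 3, stride 2
    h = conv_out(h)      # 42 -> 20 -> 9 -> 4
flattened = 32 * h * h   # 32 output channels -> 32 * 4 * 4
print(flattened)         # 512
```

If you change the frame size or the conv stack, redo this calculation (or pass a dummy tensor through the conv layers once) and update `fc1` accordingly, or the forward pass will fail with a shape mismatch.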



 
