This material is referenced from (http://www.codeproject.com/Articles/180726/State-management-and-ways-to-handle-Cache-in-a-Web).
Introduction
In this article, I am going to talk about state management, focusing mainly on Web Farm and Web Garden scenarios, as there are already very good articles on the general topic on CodeProject. I have also written a few articles on state management myself. For new readers, I am including the first few sections from my earlier article to give a brief idea of state management. I'll then discuss Web Farms and Web Gardens, and finally the various approaches to handling cache in a Web Farm/Web Garden scenario.
Basics about state management
As we all know, the web is stateless. A web page is recreated every time it is posted back to the server. In traditional web programming, all the information within a page and its controls is wiped out on every postback. To overcome this problem, the ASP.NET framework provides various ways to preserve state at various stages, such as control state, view state, cookies, and session. These fall into two groups: client-side and server-side state management. Please see the image below.
Various options to maintain state
In this article, I will be talking about server-side state management techniques. First, let's talk about a very basic and key concept, the AppDomain, which has been part of the .NET Framework since its first release.
What is AppDomain
An AppDomain can be defined as a lightweight process used for security isolation and availability. An AppDomain is hosted in a process, and a single process can host multiple AppDomains. One AppDomain can therefore go down or restart without affecting the other AppDomains in the same process.
Role of AppDomain in ASP.NET
AppDomain plays a key role in ASP.NET. When ASP.NET receives the first request for an application, the application manager creates an application domain for it. Application domains are important because they isolate the applications on a web server from one another, and every application domain is loaded and unloaded separately. Within each application domain, an instance of the HostingEnvironment class is created, which provides access to information about the application's resources. Here is a pictorial view:
ASP.NET handling the first request
The AppDomain is responsible for all server-side state management: Session (InProc mode), Application objects/variables, and Cache all reside in the AppDomain itself. If the AppDomain goes down, all of this data on the web server is wiped out. Let's have a look:
All server side state management data resides in the AppDomain
Let's now talk about Web Farm and Web Garden.
What is a Web Farm
A Web Farm is used for applications that must be highly available and have more users than a single server can serve. It provides better performance and is easily scalable. In a Web Farm, we have multiple web servers behind a Network Load Balancer (NLB). Whenever a request comes in, it first goes to the load balancer, which checks the available web servers, finds the one that currently has comparatively fewer requests to serve, and passes the request to that web server. Let's have a pictorial overview.
A Web Farm
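The routing rule described above (send each request to the server with the fewest requests in flight) can be sketched in a few lines. This is only a conceptual model in Python, not how any particular load balancer is implemented; the `WebServer` and `LoadBalancer` names and the request counter are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class WebServer:
    """Hypothetical stand-in for one web server behind the load balancer."""
    name: str
    active_requests: int = 0


class LoadBalancer:
    """Routes each request to the server currently serving the fewest requests."""

    def __init__(self, servers):
        self.servers = list(servers)

    def route(self):
        # min() returns the first server with the lowest count,
        # so ties are resolved by list order.
        target = min(self.servers, key=lambda s: s.active_requests)
        target.active_requests += 1
        return target

    def finish(self, server):
        # Called when a server has finished serving a request.
        server.active_requests -= 1


servers = [WebServer("server1"), WebServer("server2"), WebServer("server3")]
balancer = LoadBalancer(servers)
first = balancer.route()   # all servers idle, so the first in the list is chosen
second = balancer.route()  # server1 is now busier, so the next idle server is chosen
```

Note that consecutive requests from the same client can land on different servers, which is exactly why server-side state needs special care in a Web Farm.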
Most large applications are deployed in a Web Farm scenario, since a single server might not be able to handle millions of requests a day. We assign a virtual IP to the load balancer and map the URL to it; the load balancer then decides which web server receives each request.
In this scenario, the InProc session mode does not work; we need to use an OutProc mode. Suppose the first request is served by server1, which stores the session data in its own memory. If the load balancer later finds that server1 is busy handling other requests, it can pass the next request to another server, which obviously will not have the session data, and this can result in bizarre output.
In OutProc mode, session data is not stored in the AppDomain of the web server; we store the data on another server. We'll discuss it later.
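To make the difference concrete, here is a small conceptual sketch in Python. It is not ASP.NET code: the `WebServer` class, its methods, and the plain dictionary standing in for an external session store are all assumptions made for illustration.

```python
class WebServer:
    """Hypothetical web server. Without a shared store, sessions live in its own memory."""

    def __init__(self, name, shared_store=None):
        self.name = name
        self.local_sessions = {}          # InProc-style: private to this server
        self.shared_store = shared_store  # OutProc-style: lives outside the server

    def _sessions(self):
        # "is None" (not truthiness) so that an empty shared store still counts.
        return self.local_sessions if self.shared_store is None else self.shared_store

    def set_value(self, session_id, key, value):
        self._sessions().setdefault(session_id, {})[key] = value

    def get_value(self, session_id, key):
        return self._sessions().get(session_id, {}).get(key)


# InProc-style: the second server never sees what the first one stored.
s1, s2 = WebServer("server1"), WebServer("server2")
s1.set_value("abc", "cart", ["book"])
inproc_result = s2.get_value("abc", "cart")

# OutProc-style: both servers read and write the same external store.
store = {}
s3, s4 = WebServer("server1", store), WebServer("server2", store)
s3.set_value("abc", "cart", ["book"])
shared_result = s4.get_value("abc", "cart")
```

In the first case the lookup on the second server comes back empty; in the second case it returns the stored cart, whichever server handles the request.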
Affinity
There is a setting known as the affinity parameter, which can be set so that the load balancer directs all the requests from one client IP address to the same machine. This allows us to use Session (InProc mode), Application data, and Cache in a Web Farm scenario seamlessly: the application works as if it were deployed on a single server. But this has a few limitations:
- If the server serving the request goes down in between, all the server data would be lost.
- This limits the usefulness of a Web Farm, as the load balancer is confined to sending all requests from the same client machine to a single web server.
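Affinity and its first limitation can be illustrated with a minimal Python sketch. The hashing rule below is an assumption chosen for illustration; real load balancers implement affinity in their own ways. The `pick_server` function and the sample IP address are hypothetical.

```python
import hashlib


def pick_server(client_ip, servers):
    """Map a client IP to one server deterministically (hypothetical affinity rule)."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["server1", "server2", "server3"]

# The same client IP always maps to the same server while the farm is unchanged.
a = pick_server("203.0.113.7", servers)
b = pick_server("203.0.113.7", servers)

# If that server goes down, the client is sent elsewhere, and whatever
# the failed server held in memory for this client is gone.
remaining = [s for s in servers if s != a]
c = pick_server("203.0.113.7", remaining)
```

The second limitation is also visible here: the choice of server depends only on the client IP, never on how busy each server currently is.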
What is a Web Garden
When we deploy our application on IIS (6 and above), we assign an application pool to it. An application pool isolates our application from other applications deployed on the same web server. Normally, an application pool has one worker process (w3wp.exe). An AppDomain is created in this worker process to serve the requests sent by client machines, and all the server data (Session, Cache, static variables, Application variables) is stored within the AppDomain boundary. For performance benefits, however, we can have multiple worker processes in the same application pool, which allows better handling of incoming web requests. These worker processes do not share memory: a new AppDomain is created for the same application in each worker process, and each AppDomain has its own copy of the data. This means that if some session data is stored in one AppDomain's memory and the next request is handled by another worker process, that process won't have the previously stored session data.
Let's have a pictorial overview.
First two application pools show a Web Garden scenario
A Web Garden provides performance benefits on multi-core or multi-processor systems, since the additional worker processes can make use of the other CPUs.
Affinity
Now the question: Do we have some affinity settings in a Web Garden scenario? Yes.
There is a hard-coded client connection affinity to a worker process instance. So for a given client TCP connection, all the HTTP requests will be handled by the same instance of the worker process.
So, as we have seen, when the affinity parameter is set, we can host our application in a Web Farm/Web Garden scenario without worrying about how the web server data is stored and managed. It will work seamlessly, as if our application were hosted on a single server.
However, given the limitations of affinity discussed above, relying on the affinity parameter is not advisable.
More about session management
Let's discuss Session a bit. As we know, Session is stored on the web server. First, let's have a quick look at how Session is stored.
There are two modes for storing Session:
InProc: In this mode, Session values are stored in the AppDomain on the web server where the application is running. As the data is held in server memory, this is highly efficient from a performance point of view. But it is not very scalable or robust: as the number of users increases, your application may have a tough time processing multiple requests and can go down. It also does not work for websites that require high availability.
OutProc: The Session data is not stored in the AppDomain in the web server's memory. Whenever data leaves the web server's memory, it has to be serialized, and then deserialized again before use, so there is a performance overhead. Out of process, the Session can be stored in three ways:
- StateServer: Session information is held in a process known as the ASP.NET state service, which is separate from the ASP.NET worker process. This is a single point of failure. If this service/box goes down, your application will stop working abruptly.
- SQL Server: We store the session in SQL Server. ASP.NET provides default scripts that set up the required session-state database on SQL Server, after which it is ready to store session data. SQL Server can also be clustered across several machines, so if one goes down, user requests can be served from another box.
- Custom approach: ASP.NET gives us the flexibility to write our own custom provider for maintaining and storing Session data. This allows us to store session data where and how we want.
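The serialization cost mentioned for OutProc modes applies to all three options above. The following Python sketch shows the round trip in miniature. It is a conceptual model only: the `StateStore` class is hypothetical, and `pickle` stands in for whatever serializer a real out-of-process store would use.

```python
import pickle


class StateStore:
    """Hypothetical out-of-process session store; it only ever holds serialized bytes."""

    def __init__(self):
        self._data = {}

    def save(self, session_id, session):
        # Everything placed in the session must be serializable.
        self._data[session_id] = pickle.dumps(session)

    def load(self, session_id):
        raw = self._data.get(session_id)
        return {} if raw is None else pickle.loads(raw)


store = StateStore()
session = {"user": "alice", "cart": ["book"]}
store.save("abc", session)

# What comes back is an equal copy, not the same in-memory object.
restored = store.load("abc")
```

Because every request pays for a save and a load, out-of-process session is slower than InProc, and anything that cannot be serialized cannot be kept in the session at all.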
We have already discussed these at length. What about other state management methods like Cache, Application state, and static variables? Can we use Application state in a Web Farm/Web Garden scenario? I have found that many people do not know much about these other state management techniques.
Let's now discuss Cache.
How to handle Cache in a Web Farm/Web Garden scenario
To start with, you must have some basic idea about Cache management in ASP.NET. You can go to the following link which gives you a very good idea about cache management:
Exploring Caching in ASP.NET.
How can we handle Cache, which resides in the AppDomain, in a Web Farm or Web Garden scenario? The simplest answer is to avoid using Cache in such a scenario, but it really depends on the requirements and on the kind of data you are going to keep in your Cache.
If you have truly static data, like country names, it is not going to change every now and again. In that case, it does not matter whether you are in a Web Farm/Web Garden scenario. If the data is not available in an AppDomain, it gets loaded from the database, and since it does not change once loaded, there is nothing more to worry about.
Now I will discuss some specific scenarios:
Scenario 1
You have some data in the file system, for example a lot of frequently used data in your config file. Reading repeatedly from the file would not be a good approach. It's better to keep the data in the Cache and retrieve it from there whenever required.
Approach
This is a very basic scenario: load the data when it is not yet in the Cache and set a dependency on the file. Whenever the file changes, the cache entry is invalidated and reloaded. This holds for all AppDomains across the Web Farm/Web Garden, provided every server sees the same file change (for example, the file is shared or the change is deployed to every server).
Cache with dependency on a file
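The idea of a cache entry that depends on a file can be sketched as follows. This is a conceptual Python model, not ASP.NET's own file dependency mechanism: it simply compares the file's modification time on each read. The `FileDependentCache` class, the `parse_settings` helper, and the `settings.conf` file are all hypothetical.

```python
import os
import tempfile


class FileDependentCache:
    """Caches parsed file content and reloads it when the file's modification time changes."""

    def __init__(self, path, loader):
        self.path = path
        self.loader = loader
        self._value = None
        self._stamp = None
        self.loads = 0  # how many times the file was actually read

    def get(self):
        stamp = os.stat(self.path).st_mtime_ns
        if stamp != self._stamp:
            # First access, or the file changed: the cached value is stale.
            with open(self.path, encoding="utf-8") as f:
                self._value = self.loader(f.read())
            self._stamp = stamp
            self.loads += 1
        return self._value


def parse_settings(text):
    # Parses simple key=value lines.
    return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "settings.conf")
    with open(path, "w", encoding="utf-8") as f:
        f.write("theme=dark\n")

    cache = FileDependentCache(path, parse_settings)
    first = cache.get()   # read from disk
    second = cache.get()  # served from memory

    with open(path, "w", encoding="utf-8") as f:
        f.write("theme=light\n")
    # Bump the modification time explicitly so the change is visible
    # even on filesystems with coarse timestamps.
    stamp = os.stat(path).st_mtime_ns + 1_000_000_000
    os.utime(path, ns=(stamp, stamp))

    third = cache.get()   # file changed, so it is read again
    load_count = cache.loads
```

Each AppDomain would hold its own instance of such a cache, and each one refreshes independently as soon as it notices the file change.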
Scenario 2
Let's say we have some master data that is used frequently by the entire application. It might not be a good approach to get the data from the database every time you need it. It's better to have it in the Cache. One more thing: this master data can be updated by the Admin through an interface.
Approach
I think this is a very basic requirement and the best candidate for using Cache. The data is almost static and will be updated very infrequently. We load the master data the first time it is requested and not found in the cache. There can be an admin interface, which alone can edit the master data and update it in the database. At that point, the Cache needs to be updated for all the AppDomains on all the web servers. In a Web Farm scenario, your data comes from the database, and you can set a dependency on the database so that as soon as the data is updated there, the cache is invalidated on all the web servers and reloaded with the updated data. This is a common scenario and you need not worry about data syncing. Here is a pictorial view.
Cache with dependency on database
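One way such a database dependency can work is for each server to check a version counter that is bumped whenever the data changes. The Python sketch below models that idea with an in-memory SQLite database. It is an assumption made for illustration, not the mechanism ASP.NET itself uses; the table names, the `admin_add_country` function, and the `ServerCache` class are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE countries (name TEXT)")
db.execute("CREATE TABLE change_log (table_name TEXT PRIMARY KEY, version INTEGER)")
db.execute("INSERT INTO change_log VALUES ('countries', 0)")
db.execute("INSERT INTO countries VALUES ('India')")


def admin_add_country(name):
    """Admin update: change the data and bump its version in one transaction."""
    with db:
        db.execute("INSERT INTO countries VALUES (?)", (name,))
        db.execute(
            "UPDATE change_log SET version = version + 1 WHERE table_name = 'countries'"
        )


class ServerCache:
    """Hypothetical per-server cache that reloads when the table's version changes."""

    def __init__(self):
        self._version = None
        self._countries = None

    def get_countries(self):
        (version,) = db.execute(
            "SELECT version FROM change_log WHERE table_name = 'countries'"
        ).fetchone()
        if version != self._version:
            # The cached copy is missing or stale: reload from the database.
            rows = db.execute("SELECT name FROM countries ORDER BY name")
            self._countries = [row[0] for row in rows]
            self._version = version
        return self._countries


server1, server2 = ServerCache(), ServerCache()
before = (server1.get_countries(), server2.get_countries())
admin_add_country("Japan")
after = (server1.get_countries(), server2.get_countries())
```

No server has to notify any other: each one notices the new version on its next read and refreshes its own copy.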
Scenario 3
This is a common scenario: we store some data in the Cache on the server, and it can get updated at any time. How do we handle this in a Web Farm or Web Garden scenario?
Approach 1
As soon as the data in any of the caches gets updated, that server invokes a Web Service call which updates the cache on all the web servers behind the load balancer. For this, we can keep a table in the database that holds the IPs of the web servers in the Web Farm (these are the direct IPs of the web servers, not the load balancer's virtual IP). Let's see this in steps.
- There will be a table in the database with the IPs of all the web servers under the load balancer.
- There will be a Web Service that will get the IPs of all the webservers and update the cache.
- The Web Service will be invoked by any of the webservers behind the load balancer on which the cache will get updated.
With this design, it is also easy to add a new web server under the load balancer to cater to new needs: just add its IP to the table that holds the IPs of the web servers.
Cache handling with the Web Service approach in a Web Farm scenario
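The steps above can be sketched in Python as follows. This is a conceptual model only: a dictionary stands in for the database table of server IPs, and a plain method call stands in for the Web Service call. The `WebServer` and `Farm` classes and the sample IPs are hypothetical.

```python
class WebServer:
    """Hypothetical web server holding its own in-memory cache."""

    def __init__(self, ip):
        self.ip = ip
        self.cache = {}

    def receive_cache_update(self, key, value):
        # Stands in for the web service endpoint each server would expose.
        self.cache[key] = value


class Farm:
    """Stands in for the database table of direct server IPs plus the update service."""

    def __init__(self):
        self.server_table = {}

    def register(self, server):
        # Adding a server to the farm is just adding a row to the table.
        self.server_table[server.ip] = server

    def broadcast_update(self, key, value):
        # Invoked by whichever server saw the change; reaches every listed server.
        for server in self.server_table.values():
            server.receive_cache_update(key, value)


farm = Farm()
server1, server2 = WebServer("10.0.0.1"), WebServer("10.0.0.2")
farm.register(server1)
farm.register(server2)

farm.broadcast_update("rate", 1.5)   # both servers now hold the new value

server3 = WebServer("10.0.0.3")
farm.register(server3)               # a new server only needs a table entry
farm.broadcast_update("rate", 1.7)   # the next update reaches all three
```

A server added later only receives updates made after it was registered, so it would still need to load existing data itself on first use.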
Approach 2
In this case, the cache is not kept on the web servers; it is stored on a separate server. Every web server connects to this machine to get the cached data. You might ask: if the cache is stored on another box, what is the performance cost? I would suggest using a Remoting (TCP) connection to the caching server. Fetching data over it should be fast enough that there is not much difference between having the data in the web server's own memory and having it on a cache server.
Cache handling with Remoting approach in a Web Farm scenario
Approach 3