Reliability and Load Balancing
Resin 3.0

As traffic increases, web sites need to add additional web servers and servlet engines. Distributing the traffic across the servers and coping when a server restarts is the challenge of load balancing.

  • What balances the load?
    • A hardware load-balancer
    • A Resin web server using LoadBalanceServlet
    • A web server (Apache/IIS) with the plugin
  • How is session consistency maintained?

In general, a hardware load-balancer gives the best results, while using Resin or Apache/IIS as the load balancer is a low-cost alternative for medium-sized sites.

  1. Hardware Load Balancing
  2. Using Resin as the Load Balancer
    1. The web-tier server does the load balancing
    2. The backend servers respond to the requests
    3. Starting the servers
  3. Balancing with the Apache/IIS plugin
  4. What about sessions?
  5. Multiple Web Servers
  6. Multiple Web Servers, Single JVM
  7. See Also

Hardware Load Balancing

Sites with a hardware load balancer will generally put one Resin JVM on each server and configure the load balancer to distribute the load across those JVMs. Although it's possible to configure Resin with Apache/IIS in this configuration, it's not necessary and running Resin as the web server reduces the configuration complexity.

The IP-based sticky sessions provided by hardware load balancers should be used to increase efficiency. With IP-based sticky sessions, the load balancer sends every request from a given IP address to the same server. This usually routes each session to the right server, but clients behind firewalls and proxies can present a different IP on each request even though the session is the same, so IP-based sessions are only mostly sticky.

Sites using sessions will configure distributed sessions to make sure the users see the same session values.

A typical configuration will use the same resin.conf for all servers and use the -server flag to start the correct one on each machine:

resin.conf for all servers
<resin xmlns="http://caucho.com/ns/resin">
<server>
  <cluster>
    <srun id='a' host='192.168.0.1' port='6802'/>
    <srun id='b' host='192.168.0.2' port='6802'/>
    <srun id='c' host='192.168.0.3' port='6802'/>
    <srun id='d' host='192.168.0.4' port='6802'/>
  </cluster>

  <persistent-store type="cluster">
    <init path="cluster"/>
  </persistent-store>

  <http id='a' host='192.168.0.1' port='80'/>
  <http id='b' host='192.168.0.2' port='80'/>
  <http id='c' host='192.168.0.3' port='80'/>
  <http id='d' host='192.168.0.4' port='80'/>

  <web-app-default>
    <session-config>
      <use-persistent-store/>
    </session-config>
  </web-app-default>

  ...
</server>
</resin>

On Unix, the servers will generally be started using a startup script. Each server will have a different value for -server and for -pid.

Starting each server on Unix
unix-192.168.0.1> bin/httpd.sh -server a -pid server-a.pid start
unix-192.168.0.2> bin/httpd.sh -server b -pid server-b.pid start
unix-192.168.0.3> bin/httpd.sh -server c -pid server-c.pid start
unix-192.168.0.4> bin/httpd.sh -server d -pid server-d.pid start

On Windows, each server is installed as a service.

Installing each server on Windows
win-192.168.0.1> bin/httpd -install-as resin-a -server a
win-192.168.0.1> net start resin-a

win-192.168.0.2> bin/httpd -install-as resin-b -server b
win-192.168.0.2> net start resin-b

win-192.168.0.3> bin/httpd -install-as resin-c -server c
win-192.168.0.3> net start resin-c

win-192.168.0.4> bin/httpd -install-as resin-d -server d
win-192.168.0.4> net start resin-d

Using Resin as the Load Balancer

Resin includes a LoadBalanceServlet that can balance requests to backend servers. Because it is implemented as a servlet, this configuration is the most flexible. A site might use 192.168.0.1 as the frontend load balancer, and send all requests for /foo to the backend host 192.168.0.10 and all requests to /bar to the backend host 192.168.0.11. Since Resin has an integrated HTTP proxy cache, the web-tier machine can cache results for the backend servers.

Using Resin as the load balancing web server requires a minimum of two configuration files: one for the load balancing server, and one for the backend servers. The front configuration will dispatch to the backend servers, while the backend will actually serve the requests.

The web-tier server does the load balancing

In the following example, there are three servers and two conf files. The first server (192.168.0.1), which uses web-tier.conf, is the load balancer: it has an <http> listener, receives requests from browsers, and dispatches them to the backend servers (192.168.0.10 and 192.168.0.11).

web-tier.conf - used on 192.168.0.1
<resin xmlns="http://caucho.com/ns/resin">
<server>
  <http id="web-a" port="80"/>

  <cache disk-size="1024M" memory-size="256M"/>

  <cluster id="app-tier">
    <srun id="app-a" host="192.168.0.10" port="6800"/>
    <srun id="app-b" host="192.168.0.11" port="6800"/>
  </cluster>

  <host id="">
    <web-app id="/">
      <!-- balance all requests to the servers in cluster a -->
      <servlet>
        <servlet-name>balance-a</servlet-name>
        <servlet-class>com.caucho.servlets.LoadBalanceServlet</servlet-class>
        <init cluster="app-tier"/>
      </servlet>

      <servlet-mapping url-pattern="/*" servlet-name="balance-a"/>
    </web-app>
  </host>
</server>
</resin>

  • The <http> and <srun> configurations must have different values for server-id.
  • Since the <http> configuration has a server-id of "web-a", Resin needs a -server web-a argument when it starts.
  • Each <srun> must have its own server-id so that the backend servers are not started along with the <http> server.
  • Each <srun> must have a host and a port so that the LoadBalanceServlet knows where to find the backend servers.
  • The <srun> ids are important and must match the ids in the backend configuration.

The srun entries are included in web-tier.conf so that the LoadBalanceServlet knows where to find the backend servers. The LoadBalanceServlet selects a backend server using a round-robin policy. Although the round-robin policy is simple, in practice it is as effective as complicated balancing policies. In addition, because it's simple, round-robin is more robust and faster than adaptive policies.
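The round-robin selection with failover can be sketched roughly as follows. This is a simplified illustration, not Resin's actual LoadBalanceServlet code; the Backend class and its "up" flag are hypothetical stand-ins for the real liveness tracking.

```java
import java.util.List;

// Simplified sketch of round-robin backend selection with failover.
// Illustration only, not Resin's actual LoadBalanceServlet code; the
// Backend type and its "up" flag are hypothetical.
public class RoundRobin {
    static class Backend {
        final String host;
        final int port;
        boolean up = true;

        Backend(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    private final List<Backend> backends;
    private int next = 0;

    RoundRobin(List<Backend> backends) {
        this.backends = backends;
    }

    // Pick the next live backend in rotation, skipping servers that are
    // down. Returns null when every backend is unavailable.
    synchronized Backend select() {
        for (int i = 0; i < backends.size(); i++) {
            Backend b = backends.get(next);
            next = (next + 1) % backends.size();
            if (b.up)
                return b;
        }
        return null;
    }

    public static void main(String[] args) {
        RoundRobin lb = new RoundRobin(List.of(
            new Backend("192.168.0.10", 6800),
            new Backend("192.168.0.11", 6800)));

        System.out.println(lb.select().host); // first backend
        System.out.println(lb.select().host); // second backend
        System.out.println(lb.select().host); // wraps around
    }
}
```

When a backend is marked down, the rotation simply skips it, which is why the simple policy stays robust under failure.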

The backend servers respond to the requests

A separate conf file is used to configure all of the backend servers. In this case, there are two backend servers, both configured in the conf file app-tier.conf.

Sites using sessions will configure distributed sessions to make sure the users see the same session values.

app-tier.conf for all backend servers
<resin xmlns="http://caucho.com/ns/resin">
<server>
  <cluster id="app-tier">
    <srun id="app-a" host="192.168.0.10" port="6800"/>
    <srun id="app-b" host="192.168.0.11" port="6800"/>
  </cluster>

  <persistent-store type="cluster">
    <init path="cluster"/>
  </persistent-store>

  <web-app-default>
    <session-config>
      <use-persistent-store/>
    </session-config>
  </web-app-default>

  ...
</server>
</resin>

Starting the servers

Starting each server on Unix
unix-192.168.0.1> bin/httpd.sh -conf conf/web-tier.conf -server web-a -pid front.pid start

unix-192.168.0.10> bin/httpd.sh -conf conf/app-tier.conf -server app-a -pid server-a.pid start
unix-192.168.0.11> bin/httpd.sh -conf conf/app-tier.conf -server app-b -pid server-b.pid start

Installing each server on Windows
win-192.168.0.1> bin/httpd -install-as resin-front -conf conf/web-tier.conf -server web-a
win-192.168.0.1> net start resin-front

win-192.168.0.10> bin/httpd -install-as resin-a -conf conf/app-tier.conf -server app-a
win-192.168.0.10> net start resin-a
win-192.168.0.11> bin/httpd -install-as resin-b -conf conf/app-tier.conf -server app-b
win-192.168.0.11> net start resin-b

Balancing with the Apache/IIS plugin

When using Apache or IIS as the web server, the plugin does the load balancing. It performs the functions of the hardware load balancer or LoadBalanceServlet in the scenarios described above.

To understand how Resin's load balancing works with plugins, it's important to review how the plugin dispatches requests to the backend JVM. The following sequence describes a typical request:

  1. Request arrives at web server (i.e. Apache or IIS).
  2. Plugin (mod_caucho, mod_isapi, etc.) checks whether it's a Resin request.
  3. Plugin selects a backend JVM, i.e. a <srun>:
    • If it's an old session, send it to the owner JVM. (sticky-sessions)
    • If it's a new request, send it to the next <srun>, using a round-robin policy.
  4. Plugin sends the request to the backend JVM with a TCP socket.
  5. Plugin receives the response from the backend JVM with the same TCP socket.

The plugin needs to know which requests should go to Resin, i.e. the servlet-mappings and the JSP files, and it needs to know the TCP host/port addresses of the backend machines, i.e. the <srun> tags. /caucho-status shows all of that information in one table. The plugin obtains this information from a running Resin server.

The plugin controls the load balancing since it needs to decide which JVM to use. Because the plugin is key to load balancing, looking at /caucho-status will tell you exactly how your system is configured. The JVMs are passive, waiting for the next request. From the JVM's perspective, a request from a plugin is identical to an HTTP request, except that it uses a slightly different encoding. In fact, the same JVM can serve both as an srun and as an httpd server listening on port 8080, for example. The dual srun/http configuration can be useful for debugging.
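For example, a single backend JVM can be given both an <srun> listener for the plugin and an <http> listener for direct debugging access. The sketch below follows the configuration style used elsewhere in this page; the host and ports are illustrative:

```xml
<resin xmlns="http://caucho.com/ns/resin">
<server>
  <!-- the plugin connects to this srun port -->
  <cluster>
    <srun id='a' host='192.168.0.10' port='6802'/>
  </cluster>

  <!-- the same JVM also answers plain HTTP on 8080 for debugging -->
  <http id='a' port='8080'/>

  ...
</server>
</resin>
```

Requests to port 8080 bypass the web server and plugin entirely, which makes it easy to tell whether a problem is in the plugin or in the JVM.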

What about sessions?

A session needs to stay on the same JVM that started it. Otherwise, each JVM would only see every second or third request and get confused.

To make sure that sessions stay on the same JVM, Resin encodes the cookie with the host number. In the previous example, the hosts would generate cookies like:

index   cookie prefix
1       axxx
2       bxxx
3       cxxx
On the web server, mod_caucho will decode the cookie and send it to the appropriate host. So bX8ZwooOz would go to host2.
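The decoding step can be sketched as follows. This is an illustration only; the real decoding lives in the plugin's C code. The first character of the session cookie selects the owning srun:

```java
// Sketch: mapping a session cookie's first character back to the srun
// index, as in the table above ('a' -> 1, 'b' -> 2, 'c' -> 3).
// Illustration only; the actual decoding is done by the plugin (C code).
public class StickyDecode {
    // Returns the 1-based srun index for a session id, or -1 if the
    // prefix does not name a configured host.
    static int ownerIndex(String sessionId, int hostCount) {
        if (sessionId == null || sessionId.isEmpty())
            return -1;
        int idx = sessionId.charAt(0) - 'a' + 1;
        return (idx >= 1 && idx <= hostCount) ? idx : -1;
    }

    public static void main(String[] args) {
        System.out.println(ownerIndex("bX8ZwooOz", 3)); // prints 2
    }
}
```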

In the infrequent case that host2 fails, Resin will send the request to host3. The user might lose the session but that's a minor problem compared to showing a connection failure error. To save sessions, you'll need to use distributed sessions. Also take a look at tcp sessions.

The following example is a typical configuration for a distributed site using an external hardware load-balancer, i.e. where each Resin instance acts as the HTTP server. Each server is started with -server a or -server b to grab its specific configuration.

In this example, sessions will only be stored when the server shuts down, either for maintenance or with a new version of the server. This is the most lightweight configuration, and doesn't affect performance significantly. If the hardware or the JVM crashes, however, the sessions will be lost. (If you want to save sessions for hardware or JVM crashes, remove the <save-only-on-shutdown/> flag.)

resin.conf
<resin xmlns="http://caucho.com/ns/resin">
<server>
  <http id='a' port='80'/>

  <http id='b' port='80'/>

  <http id='c' port='80'/>

  <cluster>
    <srun id='a' port='6802' host='192.168.0.1'/>
    <srun id='b' port='6802' host='192.168.0.2'/>
    <srun id='c' port='6802' host='192.168.0.3'/>
  </cluster>

  <persistent-store type="cluster">
    <init path="cluster"/>
  </persistent-store>

  <web-app-default>
    <!-- enable tcp-store for all hosts/web-apps -->
    <session-config>
      <use-persistent-store/>
      <save-only-on-shutdown/>
    </session-config>
  </web-app-default>

  ...
</server>
</resin>

Multiple Web Servers

Many larger sites like to use multiple web servers with a JVM and a web server on each machine. A router will distribute the load between the machines.

In this configuration, the site needs to take control of its own sessions. Because the router distributes the load randomly, any persistent session state needs to be handled by a centralized store, such as a database or Resin's cluster storage.

Even in this configuration, you can use Resin's load balancing to increase reliability. Each web server should choose its own JVM first, but use another machine as a backup.

In this case, you can use the trick that localhost refers to the preferred host. The configuration would look like:

<resin xmlns="http://caucho.com/ns/resin">
<server>
  <cluster id="app-tier">
    <srun id="app-a" host='localhost' port='6802' index='1'/>
    <srun backup="true" id="app-b" host='host1' port='6802' index='2'/>
    <srun backup="true" id="app-c" host='host2' port='6802' index='3'/>
    <srun backup="true" id="app-d" host='host3' port='6802' index='4'/>
  </cluster>
  ...
  
</server>
</resin>

Alternatively, if you're using Apache, you could configure the sruns in the httpd.conf.

host1 httpd.conf
ResinConfigServer host1 6802
ResinConfigServer host2 6802

host2 httpd.conf
ResinConfigServer host1 6802
ResinConfigServer host2 6802

The order must be consistent for all servers so sessions will always go to the correct machine. bXXX must always go to host2.

Multiple Web Servers, Single JVM

Multiple web servers can use the same JVM. For example, a fast plain web server and an SSL web server may only need a single JVM (although a backup would be good). Since the JVM doesn't care where the request comes from, it can treat each request identically.

This simplifies SSL development. A servlet just needs to check the request.isSecure() method to see if the request is SSL or not. Other than that, all requests are handled identically.
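In code, the check is a single method call. The sketch below stubs the request so the example is self-contained; in a real servlet you would receive a javax.servlet.http.HttpServletRequest instead of the hypothetical Request interface:

```java
// Sketch of dispatching on request.isSecure(). SecureCheck.Request stubs
// the one method used here; a real servlet gets an HttpServletRequest.
public class SecureCheck {
    interface Request {
        boolean isSecure(); // true when the request arrived over SSL
    }

    // Choose behavior based on whether the request came in over SSL.
    static String scheme(Request req) {
        return req.isSecure() ? "https" : "http";
    }

    public static void main(String[] args) {
        System.out.println(scheme(() -> true));  // prints https
        System.out.println(scheme(() -> false)); // prints http
    }
}
```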

See Also


Virtual Hosting
Common Tasks
Sessions
Copyright © 1998-2006 Caucho Technology, Inc. All rights reserved.
Resin® is a registered trademark, and HardCore™ and Quercus™ are trademarks of Caucho Technology, Inc.