Dual-server hot standby on Linux

rxy · 2009-07-09, 15:05

Setting up dual-server hot standby on Red Hat Linux 9 is fairly simple. The following packages are required:
heartbeat-1.0.4-2.rh.9.um.1.i386.rpm
heartbeat-pils-1.0.4-2.rh.9.um.1.i386.rpm
heartbeat-stonith-1.0.4-2.rh.9.um.1.i386.rpm
net-snmp-5.0.6-17.i386.rpm

Install them in this order:

1. heartbeat-pils-1.0.4-2.rh.9.um.1.i386.rpm
2. net-snmp-5.0.6-17.i386.rpm
3. heartbeat-stonith-1.0.4-2.rh.9.um.1.i386.rpm
4. heartbeat-1.0.4-2.rh.9.um.1.i386.rpm

#rpm -ivh heartbeat-pils-1.0.4-2.rh.9.um.1.i386.rpm
#rpm -ivh net-snmp-5.0.6-17.i386.rpm
#rpm -ivh heartbeat-stonith-1.0.4-2.rh.9.um.1.i386.rpm
#rpm -ivh heartbeat-1.0.4-2.rh.9.um.1.i386.rpm

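After installation, configure the primary server. heartbeat's configuration files live in /etc/ha.d, but installing from RPM does not create them: copy the three sample files ha.cf, authkeys and haresources from /usr/share/doc/heartbeat-1.0.4 into /etc/ha.d.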
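A minimal sketch of that copy step as a POSIX shell function. copy_ha_samples is a hypothetical helper name; its default paths are the ones given above. The chmod reflects heartbeat's requirement that authkeys be readable by root only (mode 600).

```shell
#!/bin/sh
# copy_ha_samples [DOC_DIR] [HA_DIR]
# Hypothetical helper: copies heartbeat's three sample config files into place.
copy_ha_samples() {
    doc_dir=${1:-/usr/share/doc/heartbeat-1.0.4}
    ha_dir=${2:-/etc/ha.d}

    # Fail early if any sample file is missing.
    for f in ha.cf authkeys haresources; do
        if [ ! -f "$doc_dir/$f" ]; then
            echo "missing sample file: $doc_dir/$f" >&2
            return 1
        fi
    done

    mkdir -p "$ha_dir" || return 1
    cp "$doc_dir/ha.cf" "$doc_dir/authkeys" "$doc_dir/haresources" "$ha_dir/" || return 1

    # authkeys holds the shared secret; heartbeat expects mode 600 on it.
    chmod 600 "$ha_dir/authkeys"
}
```

Run copy_ha_samples as root on both nodes, then edit the copies.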

ha.cf is heartbeat's main configuration file, authkeys holds its authentication (security) settings, and haresources lists the resources heartbeat manages.
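For reference, a minimal sketch that writes the other two files. It assumes rh-9-a (the first node in the ha.cf below) is the preferred node; the floating IP and service passed in (for example 192.168.0.100 and httpd) are placeholders, as is the helper name write_ha_files. In authkeys, "auth 1" selects key 1, and the key line gives the method (crc, md5 or sha1) plus a shared secret. In haresources, each line names the preferred node followed by the resources it runs. Both files should be identical on the two nodes.

```shell
#!/bin/sh
# write_ha_files HA_DIR SECRET VIP SERVICE
# Hypothetical helper: writes a minimal authkeys and haresources into HA_DIR.
write_ha_files() {
    ha_dir=$1 secret=$2 vip=$3 service=$4
    if [ -z "$secret" ] || [ -z "$vip" ] || [ -z "$service" ]; then
        echo "usage: write_ha_files HA_DIR SECRET VIP SERVICE" >&2
        return 1
    fi

    # authkeys: use key 1, signed with sha1 and the shared secret.
    # A restrictive umask keeps a new file from ever being world-readable.
    ( umask 077 && printf 'auth 1\n1 sha1 %s\n' "$secret" > "$ha_dir/authkeys" ) || return 1
    chmod 600 "$ha_dir/authkeys"

    # haresources: preferred node (assumed rh-9-a), then its resources in start order.
    printf 'rh-9-a %s %s\n' "$vip" "$service" > "$ha_dir/haresources"
}
```

For example, write_ha_files /etc/ha.d 'MySecret' 192.168.0.100 httpd, then copy both files unchanged to the other node.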
The annotated ha.cf follows:
ha.cf
#############################################################################################
#
# There are lots of options in this file. All you have to have is a set
# of nodes listed {"node ..."}
# and one of {serial, bcast, mcast, or ucast}
#
# ATTENTION: As the configuration file is read line by line,
# THE ORDER OF DIRECTIVE MATTERS!
#
# In particular, make sure that the timings and udpport
# et al are set before the heartbeat media are defined!
# All will be fine if you keep them ordered as in this
# example.
#
#
# Note on logging:
# If any of debugfile, logfile and logfacility are defined then they
# will be used. If debugfile and/or logfile are not defined and
# logfacility is defined then the respective logging and debug
# messages will be logged to syslog. If logfacility is not defined
# then debugfile and logfile will be used to log messages. If
# logfacility is not defined and debugfile and/or logfile are not
# defined then defaults will be used for debugfile and logfile as
# required and messages will be sent there.
#
# File to write debug messages to
debugfile /var/log/ha-debug 【file that receives heartbeat's debug messages】
#
#
# File to write other messages to
#
logfile /var/log/ha-log 【file for all other heartbeat log messages】
#
#
# Facility to use for syslog()/logger
#
logfacility local0 【also log through syslog using facility local0; optional】
#
#
# A note on specifying "how long" times below...
#
# The default time unit is seconds
# 10 means ten seconds
#
# You can also specify them in milliseconds
# 1500ms means 1.5 seconds
#
#
# keepalive: how long between heartbeats?
#
keepalive 3 【send a keepalive (heartbeat) message every 3 seconds】
#
# deadtime: how long-to-declare-host-dead?
#
deadtime 15 【declare a node dead if no keepalive message arrives from it for 15 seconds】
#
# warntime: how long before issuing "late heartbeat" warning?
# See the FAQ for how to use warntime to tune deadtime.
#
warntime 10 【log a "late heartbeat" warning if no heartbeat has arrived for 10 seconds】
#
#
# Very first dead time (initdead)
#
# On some machines/OSes, etc. the network takes a while to come up
# and start working right after you've been rebooted. As a result
# we have a separate dead time for when things first come up.
# It should be at least twice the normal dead time.
#
initdead 60 【after a reboot a node's network may need extra time to come up, so this initial dead time is set separately from deadtime】
#
#
# nice_failback: determines whether a resource will
# automatically fail back to its "primary" node, or remain
# on whatever node is serving it until that node fails.
#
# The default is "off", which means that it WILL fail
# back to the node which is declared as primary in haresources
#
# "on" means that resources only move to new nodes when
# the nodes they are served on die. This is deemed as a
# "nice" behavior (unless you want to do active-active).
#
nice_failback on 【when a failed primary node recovers it does not take the resources back; it becomes active again only if the node currently serving them fails】

#
# hopfudge maximum hop count minus number of nodes in config
#hopfudge 1
#
#
# Baud rate for serial ports...
# (must precede "serial" directives)
#
#baud 19200
#
# serial serialportname ...
#serial /dev/ttyS0 # Linux
#serial /dev/cuaa0 # FreeBSD
#serial /dev/cua/a # Solaris
#
# What UDP port to use for communication?
# [used by bcast and ucast]
#
#udpport 694
#
# What interfaces to broadcast heartbeats over?
#
#bcast eth1 # Linux
#bcast eth1 eth2 # Linux
#bcast le0 # Solaris
#bcast le1 le2 # Solaris
#
# Set up a multicast heartbeat medium
# mcast [dev] [mcast group] [port] [ttl] [loop]
#
# [dev] device to send/rcv heartbeats on
# [mcast group] multicast group to join (class D multicast address
# 224.0.0.0 - 239.255.255.255)
# [port] udp port to sendto/rcvfrom (no reason to differ
# from the port used for broadcast heartbeats)
# [ttl] the ttl value for outbound heartbeats. This affects
# how far the multicast packet will propagate. (1-255)
# [loop] toggles loopback for outbound multicast heartbeats.
# if enabled, an outbound packet will be looped back and
# received by the interface it was sent on. (0 or 1)
# This field should always be set to 0.
#
#
mcast eth1 225.0.0.22 694 1 0 【send keepalive messages on eth1 to multicast group 225.0.0.22, UDP port 694, TTL 1, loopback off】
#
# Set up a unicast / udp heartbeat medium
# ucast [dev] [peer-ip-addr]
#
# [dev] device to send/rcv heartbeats on
# [peer-ip-addr] IP address of peer to send packets to
#
#ucast eth0 192.168.1.2
#
#
# Watchdog is the watchdog timer. If our own heart doesn't beat for
# a minute, then our machine will reboot.
#
#watchdog /dev/watchdog
#
# "Legacy" STONITH support
# Using this directive assumes that there is one stonith
# device in the cluster. Parameters to this device are
# read from a configuration file. The format of this line is:
#
# stonith <stonith_type> <configfile>
#
# NOTE: it is up to you to maintain this file on each node in the
# cluster!
#
#stonith baytech /etc/ha.d/conf/stonith.baytech
#
# STONITH support
# You can configure multiple stonith devices using this directive.
# The format of the line is:
# stonith_host <hostfrom> <stonith_type> <params...>
# <hostfrom> is the machine the stonith device is attached
# to or * to mean it is accessible from any host.
# <stonith_type> is the type of stonith device (a list of
# supported drivers is in /usr/lib/stonith.)
# <params...> are driver specific parameters. To see the
# format for a particular device, run:
# stonith -l -t <stonith_type>
#
#
# Note that if you put your stonith device access information in
# here, and you make this file publicly readable, you're asking
# for a denial of service attack ;-)
#
#
#stonith_host * baytech 10.0.0.3 mylogin mysecretpassword
#stonith_host ken3 rps10 /dev/ttyS1 kathy 0
#stonith_host kathy rps10 /dev/ttyS1 ken3 0
#
# Tell what machines are in the cluster
# node nodename ... -- must match uname -n
node rh-9-a 【node names; each must be the machine's hostname as reported by uname -n】
node rh-9-b
#
# Less common options...
#
# Treats 10.10.10.254 as a pseudo-cluster-member
#
#ping 10.10.10.254
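The comments in ha.cf state a few rules that are easy to break when editing: initdead should be at least twice deadtime, mcast needs a class D group (224.0.0.0 - 239.255.255.255) with loop set to 0 and a TTL of 1-255, and node names must match uname -n. Below is a minimal sketch of a checker for those rules. check_ha_cf, ha_value and to_ms are hypothetical helper names; the ordering keepalive < warntime < deadtime is an assumption inferred from what those directives mean, and timings are assumed to be whole seconds or whole milliseconds with an "ms" suffix.

```shell
#!/bin/sh
# Hypothetical sanity checker for an ha.cf laid out like the example above.

# ha_value FILE KEY: print the value of the first active (uncommented) KEY line.
ha_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# to_ms VALUE: convert "15" (seconds) or "1500ms" to milliseconds.
to_ms() {
    case "$1" in
        *ms) echo "${1%ms}" ;;
        *)   echo $(( $1 * 1000 )) ;;
    esac
}

# check_ha_cf FILE [HOSTNAME]: print each problem found; return 1 if any.
check_ha_cf() {
    cf=$1
    host=${2:-$(uname -n)}
    status=0

    # This sketch requires all four timing directives to be set explicitly.
    for key in keepalive warntime deadtime initdead; do
        if [ -z "$(ha_value "$cf" "$key")" ]; then
            echo "$key is not set"
            return 1
        fi
    done
    ka=$(to_ms "$(ha_value "$cf" keepalive)")
    wt=$(to_ms "$(ha_value "$cf" warntime)")
    dt=$(to_ms "$(ha_value "$cf" deadtime)")
    id=$(to_ms "$(ha_value "$cf" initdead)")

    # Assumed ordering: keepalive < warntime < deadtime.
    [ "$ka" -lt "$wt" ] || { echo "keepalive should be shorter than warntime"; status=1; }
    [ "$wt" -lt "$dt" ] || { echo "warntime should be shorter than deadtime"; status=1; }
    # From the ha.cf comments: initdead at least twice deadtime.
    [ "$id" -ge $(( 2 * dt )) ] || { echo "initdead should be at least twice deadtime"; status=1; }

    # Node names must match uname -n, so this host should appear on a node line.
    if ! awk -v h="$host" '
            $1 == "node" { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
            END { exit !found }' "$cf"; then
        echo "no node line matches host $host"
        status=1
    fi

    # mcast [dev] [mcast group] [port] [ttl] [loop]: check group, ttl and loop.
    mc=$(awk '$1 == "mcast" { print $3, $5, $6; exit }' "$cf")
    if [ -n "$mc" ]; then
        set -- $mc   # deliberate word splitting: $1=group $2=ttl $3=loop
        octet=${1%%.*}
        if [ "$octet" -lt 224 ] || [ "$octet" -gt 239 ]; then
            echo "mcast group $1 is not a class D address"; status=1
        fi
        if [ "$2" -lt 1 ] || [ "$2" -gt 255 ]; then
            echo "mcast ttl $2 is outside 1-255"; status=1
        fi
        [ "$3" = 0 ] || { echo "mcast loop should be 0"; status=1; }
    fi
    return $status
}
```

Run check_ha_cf /etc/ha.d/ha.cf on each node; since the hostname check uses that node's own uname -n, both nodes should pass with the same file.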

长风大侠 · 2009-12-08, 10:37

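There are two typical ways to set up database hot standby. The standard one connects the two servers to a shared storage device (usually a shared disk array or a storage area network, SAN) and runs cluster software on both; this is called the shared approach. The other relies on software alone and is usually called the pure-software or mirror approach.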

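In the shared approach the database lives on the shared storage device. The active server reads and writes it directly, and after a failover the other server reads the same data from that device.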

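In the pure-software approach, mirroring software replicates the data to the other server in real time, so each server holds its own copy of the same data. If one server fails, service can be switched to the other promptly.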

The pure-software approach has three main advantages:

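1. It avoids the disk array as a single point of failure. Hot standby exists precisely to keep the failure of a single device from interrupting service, yet a shared disk array is itself a new single point of failure. (For example, if each server's reliability is 99.9% and the disk array's is 99.95%, a pure-software pair fails only when both servers fail, giving 1 - (1 - 99.9%) × (1 - 99.9%) = 99.9999%; a shared-storage pair also depends on the disk array, so its reliability is slightly below 99.95%.)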

2. Lower cost: no expensive disk array has to be purchased.

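3. No distance limit: the two servers are not bound by SCSI cable length (Fibre Channel disk arrays are not distance-limited either, but cost far more). Servers can therefore be placed more flexibly, including at physically separate sites for added protection.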
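The reliability example in item 1 can be worked out as follows, assuming server and disk-array failures are independent and that a shared-storage pair needs at least one working server plus the working array. Here R_s = 0.999 is one server's reliability and R_d = 0.9995 is the disk array's:

```latex
\begin{aligned}
R_{\text{mirror}} &= 1 - (1 - R_s)^2 = 1 - (1 - 0.999)^2 = 1 - 10^{-6} = 0.999999 \\
R_{\text{shared}} &= \bigl[1 - (1 - R_s)^2\bigr]\, R_d = 0.999999 \times 0.9995 \approx 0.999499
\end{aligned}
```

Because the array is in series with the server pair, the shared design can never be more reliable than the array alone.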

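The pure-software approach used to be uncommon, partly because the popular cluster products of the time did not support it and partly because the few products that did were not considered reliable enough. With major vendors such as NEC now entering the market, pure-software clustering should gradually become a mainstream option.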

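As for choosing a solution: if the budget is ample and the data volume is large (over 1 TB), shared storage such as a disk array is an option, but pick highly reliable equipment (for example from a well-known brand) and consider a dual-controller configuration. Otherwise the pure-software approach is the better choice, and in that case be sure to use a mature, field-proven product from a major vendor.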
