IBM General Parallel File System (GPFS) and Oracle RAC [ID 302806.1]
Modified 04-JUN-2009  Type BULLETIN  Status PUBLISHED
In this Document
Purpose
Overview.
Scope and Application
IBM General Parallel File System (GPFS) and Oracle RAC
GPFS and Oracle RAC - Supported Software Configurations:
Key GPFS features for Oracle RAC Databases on AIX 5L and Linux on POWER.
GPFS Tuning Requirements for Oracle.
GPFS 2.3 Installation Examples:
Example migration to GPFS 2.3 from a previous GPFS version:
GPFS and Oracle References and Additional Information:
Oracle MetaLink
GPFS
EtherChannel and Link Aggregation with AIX 5L
Oracle and AIX 5L
References
Applies to:
Oracle Server - Enterprise Edition
IBM AIX Based Systems (64-bit)
Purpose
Overview.
GPFS has been verified for use with:
- Oracle9i Real Application Cluster (RAC) and Oracle Database 10g RAC (10.1.0.x and 10.2.0.1) on both AIX 5L v5.3 and v5.2.
- Oracle Database 10g Release 2 (10.2.0.1) RAC on Linux on POWER.
- SUSE Linux Enterprise Server (SLES) 9 for IBM POWER with Service Pack (SP) 2
- Red Hat Enterprise Linux (RHEL) 4 for POWER with Update 1
- Oracle 11g (11.1.0.6) RAC on both AIX 5L v5.3 and v6.1.
Scope and Application
Software support details can be found in the table "GPFS and Oracle RAC - Supported Software Configurations" below.
GPFS is IBM's high-performance, scalable parallel file system for IBM UNIX clusters, capable of supporting multiple terabytes of storage within a single file system.
GPFS is a shared-disk file system where every cluster node can have parallel, concurrent read/write access to the same file. It is designed to provide high-performance I/O by "striping" data across multiple disks accessed from multiple servers. GPFS provides high availability through logging and replication, and can be configured for automatic failover from both disk and node malfunctions.
- GPFS can be used for components of an Oracle Database 10g RAC configuration, including the shared Oracle Home and the Oracle data and log files. GPFS can also be used to complement the Oracle Automatic Storage Management (ASM) feature in Oracle Database 10g by managing the shared Oracle Home.
- Oracle Clusterware binaries should not be placed on GPFS as this reduces cluster functionality while GPFS is recovering, and also limits the ability to perform rolling upgrades of Oracle Clusterware.
- Oracle Clusterware voting disks and the Oracle Cluster Registry (OCR) should not be placed on GPFS, as the I/O freeze during GPFS reconfiguration can lead to node evictions or cause cluster management activities to fail.
- Oracle Database binaries are supported on GPFS. The system should be configured to support multiple ORACLE_HOMEs in order to maintain the ability to perform rolling patch application.
- Oracle Database 10g database files (e.g. data files, trace files, and archive log files) are supported on GPFS.
GPFS 2.1 and 2.2 were previously approved for Oracle RAC, but GPFS 2.3 and later offer several key new features, including:
- Support for AIX 5L v5.3.
- Single-node quorum with tie-breaker disks.
- Single GPFS cluster type.
- More disaster recovery options.
A summary of key GPFS features for Oracle RAC Databases is given below. All Oracle RAC configurations planning to use GPFS should select GPFS 3.1 or 3.2, while existing GPFS and Oracle RAC users should consider upgrading to GPFS 2.3.
This document also includes:
- A summary of HACMP requirements and options with Oracle RAC and/or GPFS.
- GPFS and AIX tuning requirements for Oracle.
- Sample GPFS installation and configuration scenarios.
- Example migration to GPFS 2.3 from a previous GPFS version.
- A list of GPFS references and additional information.
IBM General Parallel File System (GPFS) and Oracle RAC
GPFS and Oracle RAC - Supported Software Configurations:
Operating System | Oracle9i Release 9.2.0.2 for RAC, or higher (HACMP required - see HACMP information below) | Oracle Database 10g (10.1.0.x) for RAC (HACMP not required - see HACMP information below) | Oracle Database 10g (10.2.0.x) for RAC (HACMP not required - see HACMP information below) | Oracle Database 11g (11.1.0.x) for RAC (HACMP not required - see HACMP information below)
---|---|---|---|---
AIX 5L v5.2 ML 04, or later | GPFS 2.1, 2.2, 2.3.0.1 or higher*** | GPFS 2.1, 2.2, 2.3.0.1 or higher*** | GPFS 2.3.0.3 or higher**** | Not Supported
AIX 5L v5.3, or later | GPFS 2.3.0.1 or higher*** | GPFS 2.3.0.1 or higher*** | GPFS 2.3.0.3 or higher**** | GPFS 3.1 or higher
AIX 6.1, or later | Not Supported | Not Supported | GPFS 3.1 or higher | GPFS 3.1 or higher
Linux on POWER (SLES 9 for IBM POWER with SP 2, or higher SP) | Not Supported | Not Supported | GPFS 2.3.0.6 or higher 2.3.0.x, or GPFS 3.1.0.8 or higher 3.1.0.x | Not Supported
Linux on POWER (RHEL 4 for POWER with Update 1, or higher Update) | Not Supported | Not Supported | GPFS 2.3.0.6 or higher 2.3.0.x | Not Supported
**** GPFS 3.x is also certified, so this statement applies to both GPFS 2.x and 3.x.
- The AIX 5L 64 bit kernel is recommended for Oracle RAC and GPFS configurations.
- GPFS 2.3 requires the 2.3.0.1 upgrade (APAR IY63969) - or later.
- GPFS 2.2 requires GPFS PTF 6 (2.2.1.1) - or later.
- Note: GPFS V2.1 was withdrawn from marketing on April 29, 2005.
- Note: GPFS for AIX 5L, V2.2 was withdrawn from marketing on June 30, 2005.
See Note 341507.1 for the latest software requirements for Linux on POWER and Oracle.
Summary of HACMP requirements and options with Oracle RAC, AIX 5L and GPFS.
Oracle9i RAC always requires HACMP.
HACMP is optional for Oracle Database 10g RAC.
HACMP 5.1 and 5.2 are certified with both Oracle9i and Oracle Database 10g on both AIX 5L v5.2 and v5.3. See Note 282036.1 for the latest, complete set of patch requirements for HACMP, AIX 5L and Oracle.
In Oracle9i RAC, HACMP is:
- required as the Oracle9i RAC clusterware.
- required if using shared concurrent volume groups (raw logical volumes managed by HACMP).
- optional for the GPFS 2.2 node set. Can use RPD (RSCT Peer Domain) instead.
- not needed for GPFS 2.3. RPD is also not required for GPFS 2.3.
In Oracle Database 10g RAC, HACMP is:
- optional for Oracle Database 10g RAC CRS.
- If HACMP is configured for Oracle, CRS will use the HACMP node names and numbers.
- If HACMP is configured to provide High Availability for other products, this is compatible with CRS for Oracle Database 10g RAC.
- only required if using shared concurrent volume groups (raw logical volumes managed by HACMP).
- optional for the GPFS 2.2 node set. Can use RPD (RSCT Peer Domain) instead.
- not needed for GPFS 2.3. RPD is also not required for GPFS 2.3.
Therefore, it is possible to have a complete Oracle Database 10g RAC configuration with GPFS 2.3 and no HACMP.
In Oracle Database 11g RAC, HACMP is:
- not required
Key GPFS features for Oracle RAC Databases on AIX 5L and Linux on POWER.
- New single-node quorum support in GPFS 2.3 provides two-node Oracle high availability for all disk subsystems:
- Uses the new quorum type "node quorum with tiebreaker disks", with one or three tie-breaker disks.
- Not dependent on storage architecture - designed to work with all storage.
- See GPFS FAQ (Q1 in Disk Specific Questions) for currently verified storage and a storage support statement.
- Only a single cluster type in GPFS 2.3, which removes the requirement for additional cluster software:
- No more "hacmp", "rpd", "lc" cluster types. The "lc" or "loose cluster" type is now implicit.
- No requirement for HACMP or RSCT for GPFS 2.3
- (Note: HACMP is required for Oracle 9i RAC configurations)
- GPFS concept of "nodesets" is removed, simplifying administration.
- New GPFS "nsd" disk type supports SAN and NAS configurations, or a combination of the two.
- Migration from previous GPFS 2.2 cluster types is fully supported and documented.
- Dynamic support capabilities for Oracle Databases:
- Disks can be dynamically added to or removed from a GPFS file system (a command sketch follows this feature list).
- Automatic rebalancing of the GPFS file system after disks are added or removed.
- Nodes can be dynamically added or removed from a GPFS cluster.
- Optimal performance with Oracle Databases using best practices:
- Use of Direct I/O with Asynchronous I/O is the Oracle default for Oracle data and log files.
- GPFS provides a choice of large block sizes.
- GPFS best practices tuning for Oracle is documented below.
- New mmpmon command to monitor GPFS performance details.
- High Availability and Backup for Oracle Databases:
- GPFS supports hardware RAID configuration.
- GPFS provides its own two- or three-way replication of data, metadata, or both.
- Can implement AIX 5L EtherChannel and IEEE 802.3ad Link Aggregation, or Channel Bonding on Linux, for the GPFS network.
- Compatible with standard file system backup and restore programs.
- Multiple Disaster Recovery options for Oracle RAC Databases using GPFS:
- Synchronous mirroring utilizing GPFS replication.
- Synchronous mirroring utilizing IBM TotalStorage® Enterprise Storage Server® (ESS) Peer-to-Peer Remote Copy (PPRC).
- Asynchronous mirroring utilizing IBM TotalStorage ESS FlashCopy®.
- GPFS mmfsctl command for Disaster Recovery management.
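As an illustration of the dynamic disk capabilities listed above, the following is a minimal sketch (the file system name /dev/oragpfs and the descriptor file /tmp/gpfs/new_disk are assumptions for this sketch, not part of the original note):
mmadddisk /dev/oragpfs -F /tmp/gpfs/new_disk -r     (add the new NSD and rebalance existing data onto it)
mmdeldisk /dev/oragpfs gpfs4nsd -r                  (remove a disk, migrating its data and rebalancing onto the remaining disks)
mmrestripefs /dev/oragpfs -b                        (rebalance the file system manually if required)
Both mmadddisk and mmdeldisk accept the -r option to restripe/rebalance as part of the operation, and the rebalancing runs while the file system remains mounted.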
GPFS Tuning Requirements for Oracle.
AIO and DIO options:
By default, Oracle uses the Asynchronous I/O (AIO) and Direct I/O (DIO) features of AIX to do its own scheduling of I/O directly to disk, bypassing most of the GPFS caching and prefetching facilities. Therefore:
- Do not use the "dio" mount option for the GPFS file system or change the DIO attribute for any Oracle files.
- The Oracle init.ora parameter "filesystemio_options" setting will be ignored for Oracle files on GPFS.
Configuring LUNS for GPFS and Oracle:
If using RAID devices, configure a single LUN for each RAID device. Do not create LUNs across RAID devices for use by GPFS, as this will ultimately result in a significant loss in performance and make the removal of a bad RAID more difficult. GPFS will stripe across the multiple LUNs (RAIDs) using its own optimized method.
GPFS Block Size, Oracle "db_block_size" and "db_file_multiblock_read_count":
For Oracle RAC databases, set the GPFS file system block size, using the "mmcrfs" command and the "-B" option, to a large value using the following guidelines:
- 512 KB is generally suggested.
- 256 KB is suggested when there is significant activity other than Oracle using the file system and many small files exist which are not in the database.
- 1 MB is suggested for file systems of 100 TB or larger.
Set the Oracle "db_block_size" equal to the LUN segment size or a multiple of the LUN pdisk segment size.
Set the Oracle init.ora parameter "db_file_multiblock_read_count" value to pre-fetch one or two full GPFS blocks.
For example, if your GPFS block size is 512 KB and the Oracle block size is 16 KB, set the Oracle "db_file_multiblock_read_count" to either 32 or 64.
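As a worked illustration of these guidelines (the file system name and descriptor file are those used in the installation examples below; the parameter values are assumptions chosen to match the 512 KB / 16 KB case above):
mmcrfs /oragpfs /dev/oragpfs -F /tmp/gpfs/disk_list -B 512K
and, in the init.ora:
db_block_size=16384
db_file_multiblock_read_count=32
With a 512 KB GPFS block size and a 16 KB Oracle block size, a multiblock read count of 32 prefetches exactly one GPFS block (32 x 16 KB = 512 KB); a value of 64 would prefetch two.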
GPFS and AIX 5L Tuning for AIO
GPFS Threads
Use the following guidelines to set the GPFS "worker threads" to allow the maximum parallelism of the Oracle AIO threads, and the GPFS "prefetch threads" to benefit Oracle sequential I/O.
On a 64-bit AIX kernel:
- GPFS worker threads can be <= 548.
- GPFS worker threads + GPFS prefetch threads <= 550.
When requiring GPFS sequential I/O, set the prefetch threads between 50 and 100 (the default is 64), and set the worker threads to have the remainder.
Example:
"mmchconfig prefetchThreads=75"
"mmchconfig worker1Threads=475"
"mmchconfig worker1Threads=475"
On a 32-bit AIX kernel:
- GPFS worker threads can be <= 162.
- GPFS worker threads + GPFS prefetch threads <= 164.
Note:
- The 64-bit AIX kernel is recommended for optimal performance with GPFS and Oracle RAC.
- These changes via the "mmchconfig" command require that GPFS be restarted. Refer to the "mmshutdown" and "mmstartup" commands.
Corresponding tuning of AIX AIO maxservers.
The number of AIX AIO kprocs to create should be approximately the same as the GPFS worker1Threads setting. For the AIO maxservers setting:
- On AIX 5L v5.1 systems, it is the total number of AIO kprocs.
- On AIX 5L v5.2 and v5.3 systems, it is the number of kprocs per CPU. It is suggested to set it slightly larger than worker1Threads divided by the number of CPUs. For example if worker1Threads is set to 500 on a 32-way SMP:
- On an AIX 5.1 system, set maxservers to 640
- On AIX 5L v5.2 and v5.3, "maxservers" is a per CPU parameter. Therefore, 640 AIO kprocs / 32 CPUs per system = 20 for maxservers.
Use the "smit aio" configuration option or the "chdev -l aio0 -a maxservers=<value> -P" command to set the value. System reboot will be required for the changes to take affect.
The free "nmon" performance tool can be used to effectively monitor AIO kproc behavior. The tool can be downloaded from:
http://www-128.ibm.com/developerworks/eserver/articles/analyze_aix/index.html
GPFS and pinned SGA
Oracle databases requiring high performance will usually benefit from running with a pinned Oracle SGA. This is also true when running with GPFS since GPFS uses DIO which requires that the user I/O buffers (in the SGA) be pinned. GPFS would normally pin the I/O buffers on behalf of the application but, if Oracle has already pinned the SGA, GPFS will recognize this and will not duplicate the pinning - saving additional system resources.
Pinning the SGA on AIX 5L requires the following 3 steps.
1. $ /usr/sbin/vmo -r -o v_pinshm=1
2. $ /usr/sbin/vmo -r -o maxpin%=percent_of_real_memory
where percent_of_real_memory = ((size of SGA / size of physical memory) * 100) + 3
3. Set the LOCK_SGA parameter to TRUE in the init.ora.
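As a worked example (the memory sizes are assumptions, not from the original note): with an 8 GB SGA on a server with 32 GB of physical memory, percent_of_real_memory = (8 / 32) * 100 + 3 = 28, so the commands would be:
$ /usr/sbin/vmo -r -o v_pinshm=1
$ /usr/sbin/vmo -r -o maxpin%=28
Set LOCK_SGA=TRUE in the init.ora and reboot, since the "vmo -r" options only take effect at the next boot.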
Other important GPFS file system attributes:
- If the GPFS file system contains a shared Oracle Home or CRS Home, the default value for the maximum number of inodes will probably be insufficient for the Oracle Universal Installer (OUI) installation process. Use a command like the following to increase the inode value:
mmchfs /dev/oragpfs -F 50000
Inode consumption can be verified through the standard AIX system command:
root@raven:64bit /> df -g /oragpfs
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/oragpfs 130.00 120.90 7% 32692 24% /oragpfs
- or through the GPFS command:
root@raven:64bit /> mmdf /dev/oragpfs -F
Inode Information
------------------
Total number of inodes: 139264
Total number of free inodes: 106572
- For Oracle RAC node recovery to work correctly, GPFS must be configured to load automatically at boot time and to mount the GPFS file systems automatically. Use the following two GPFS commands to configure this:
root@raven:64bit /> mmchconfig autoload=yes
root@raven:64bit /> mmchfs /dev/oragpfs -A yes
mmchfs: 6027-1371 Propagating the changes to all affected nodes.
This is an asynchronous process.
GPFS 2.3 Installation Examples:
Two GPFS installation examples are provided. In the first example, the tie-breaker disks are part of the GPFS file system. In the second example, GPFS replication is used and the tie-breaker disks are separate from the GPFS file system.
Example 1: Creating A GPFS file system without replication and with tie-breaker disks that are part of the GPFS file system.
1. Pre-installation steps.
- chdev -l aio0 -a maxservers=20
- Select and configure the GPFS network
- Add GPFS interface names to /.rhosts file
A properly configured .rhosts file must exist in the root user's home directory on each node in the GPFS cluster.
2. Install GPFS software.
- Note that the GPFS fileset names have changed in GPFS 2.3
- GPFS 2.3.0.1 update (APAR IY63969) is mandatory.
root@raven: /> lslpp -l |grep -i gpfs
gpfs.base 2.3.0.1 APPLIED GPFS File Manager
gpfs.msg.en_US 2.3.0.0 COMMITTED GPFS Server Messages - U.S.
gpfs.base 2.3.0.1 APPLIED GPFS File Manager
gpfs.docs.data 2.3.0.1 APPLIED GPFS Server Manpages and
- Add GPFS to $PATH
export PATH=$PATH:/usr/lpp/mmfs/bin
3. Create a 2 node GPFS cluster.
This cluster consists of two nodes "raven" and "star" each with a private interface also configured; "ravenp" and "starp" respectively.
- Create the GPFS node list file.
For Oracle, both nodes will be of type "quorum"
Example: /tmp/gpfs/node_list contains:
ravenp:quorum
starp:quorum
The hostname or IP address must refer to the communications adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address.
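An easy way to confirm this on each node (an illustrative check, not part of the original example) is to resolve the GPFS interface names and compare the results with the addresses configured on the intended adapters (for example in the "netstat -in" output):
host ravenp
host starp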
- Create the GPFS cluster.
Note that the primary and secondary nodes specified in the "mmcrcluster" command are for managing the "cluster configuration" information not the primary and secondary NSD servers. Since this is a 2 node configuration, both nodes will be quorum nodes.
root@raven: /tmp/gpfs> mmcrcluster -n /tmp/gpfs/node_list -p ravenp -s starp
Thu Jan 6 19:17:13 PST 2005: 6027-1664 mmcrcluster: Processing node ravenp
Thu Jan 6 19:17:16 PST 2005: 6027-1664 mmcrcluster: Processing node starp
mmcrcluster: Command successfully completed
mmcrcluster: 6027-1371 Propagating the changes to all affected nodes.
This is an asynchronous process.
- Display cluster configuration results as sanity check:
root@raven: /tmp/gpfs> mmlscluster
GPFS cluster information
========================
GPFS cluster name: ravenp
GPFS cluster id: 10383406012703833913
GPFS UID domain: ravenp
Remote shell command: /usr/bin/rsh
Remote file copy command: /usr/bin/rcp
GPFS cluster configuration servers:
-----------------------------------
Primary server: ravenp
Secondary server: starp
Node number Node name IP address Full node name Remarks
-----------------------------------------------------------------------------------
1 ravenp 144.25.68.193 ravenp quorum node
2 starp 144.25.68.192 starp quorum node
and:
root@raven: /tmp/gpfs> mmlsconfig
Configuration data for cluster ravenp:
---------------------------------------------------
clusterName ravenp
clusterId 10383406012703833913
clusterType lc
multinode yes
autoload no
useDiskLease yes
maxFeatureLevelAllowed 806
File systems in cluster ravenp:
--------------------------------------------
(none)
4. Create the cluster-wide names for the Network Shared Disks (NSDs) to be used by GPFS.
- Create a file with the list of disks to be used by GPFS
Example:
/tmp/gpfs/disk_list
hdisk5
hdisk6
hdisk7
hdisk8
Since hdisk numbers for the same disk can vary from node to node, these are the hdisk names on the node where the configuration is being done. Use the PVID to identify the same hdisk on each node. If necessary, to help identify the same disk on all nodes, use the "chdev" command to assign missing PVIDs as follows:
chdev -l hdisk9 -a pv=yes
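If there is any doubt about hdisk numbering, a simple check (illustrative, not part of the original example) is to search for the same PVID on each node; the hdisk name may differ between nodes, but the PVID identifies the same physical disk:
lspv | grep 000657fc4aa3d756
Here 000657fc4aa3d756 is the PVID shown for hdisk5 in the lspv output later in this example.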
Do not specify the primary and secondary NSD servers in the disk name file since all nodes will be SAN attached in this Oracle configuration.
Make a copy of this file, just in case of problems, since it will be modified by the configuration process.
cp /tmp/gpfs/disk_list /tmp/gpfs/disk_list_bak
In this example, the designated tie-breaker disks will be part of the file system, so they are also included in this file
- Use the "mmcrnsd" command and the GPFS disk descriptor file just created:
root@raven: /tmp/gpfs> mmcrnsd -F /tmp/gpfs/disk_list
mmcrnsd: Processing disk hdisk5
mmcrnsd: Processing disk hdisk6
mmcrnsd: Processing disk hdisk7
mmcrnsd: Processing disk hdisk8
mmcrnsd: 6027-1371 Propagating the changes to all affected nodes.
This is an asynchronous process.
FYI: Note the new contents of the /tmp/gpfs/disk_list file:
# hdisk5
gpfs1nsd:::dataAndMetadata:-1
# hdisk6
gpfs2nsd:::dataAndMetadata:-1
# hdisk7
gpfs3nsd:::dataAndMetadata:-1
# hdisk8
gpfs4nsd:::dataAndMetadata:-1
root@raven: /tmp/gpfs> lspv
hdisk0 00204edabc20fc3f rootvg active
hdisk1 none None
hdisk2 0006580c69653fa6 None
hdisk3 0009005f04c37ea1 raven_vg active
hdisk4 00204eda3be7d2dd raven_raw_vg active
hdisk5 000657fc4aa3d756 None
hdisk6 000657fc4aa3e302 None
hdisk7 000657fc4aa3f26e None
hdisk8 000657fc4aa2a0cc None
Note that no names are displayed in the volume group field of the "lspv" output, since the "DesiredName" was not specified in the disk descriptor file. Note: These are not actual AIX volume groups.
- The mmlsnsd command can be used to identify the NSD-formatted disks:
root@raven: /tmp/gpfs> mmlsnsd
File system Disk name Primary node Backup node
------------------------------------------------ ---------------------------
(free disk) gpfs1nsd (directly attached)
(free disk) gpfs2nsd (directly attached)
(free disk) gpfs3nsd (directly attached)
(free disk) gpfs4nsd (directly attached)
5. Further customize the cluster configuration and designate tie-breaker disks.
- Change GPFS cluster attributes.
mmchconfig prefetchThreads=505
mmchconfig autoload=yes
- Designate the tie-breaker disks.
root@raven: />mmchconfig tiebreakerDisks="gpfs1nsd;gpfs2nsd;gpfs3nsd"
Verifying GPFS is stopped on all nodes ...
mmchconfig: Command successfully completed
mmchconfig: 6027-1371 Propagating the changes to all affected nodes.
This is an asynchronous process.
- Display GPFS configuration.
root@raven: /> mmlsconfig
Configuration data for cluster ravenp:
---------------------------------------------------
clusterName ravenp
clusterId 10383406012703833913
clusterType lc
multinode yes
autoload yes
useDiskLease yes
maxFeatureLevelAllowed 806
prefetchThreads 505
tiebreakerDisks gpfs1nsd;gpfs2nsd;gpfs3nsd
File systems in cluster ravenp:
--------------------------------------------
(none)
6. Start GPFS on all nodes:
root@raven: />mmstartup -a
Thu Jan 20 19:16:46 PST 2005: 6027-1642 mmstartup: Starting GPFS ...
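Before creating the file system, it can be useful to confirm that GPFS is active on both nodes (an illustrative check, not part of the original transcript):
mmgetstate -a
The command reports the GPFS daemon state (for example "active") for each node in the cluster.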
7. Create and mount the GPFS file system:
- Create the GPFS file system using the "mmcrfs" command and the disk descriptor file previously created.
root@raven: /> mmcrfs /oragpfs /dev/oragpfs -F /tmp/gpfs/disk_list -B 1024k -N 2000000 -n 8 -A yes
GPFS: 6027-531 The following disks of oragpfs will be formatted on node ravenp:
gpfs1nsd: size 10485760 KB
gpfs2nsd: size 10485760 KB
gpfs3nsd: size 10485760 KB
gpfs4nsd: size 104857600 KB
GPFS: 6027-540 Formatting file system ...
Creating Inode File
Creating Allocation Maps
Clearing Inode Allocation Map
Clearing Block Allocation Map
Flushing Allocation Maps
GPFS: 6027-535 Disks up to size 310 GB can be added to this file system.
GPFS: 6027-572 Completed creation of file system /dev/oragpfs.
mmcrfs: 6027-1371 Propagating the changes to all affected nodes.
This is an asynchronous process.
- Display GPFS file system attributes for verification:
root@raven: /> mmlsfs oragpfs
flag value description
---- -------------- -----------------------------------------------------
-s roundRobin Stripe method
-f 32768 Minimum fragment size in bytes
-i 512 Inode size in bytes
-I 32768 Indirect block size in bytes
-m 1 Default number of metadata replicas
-M 1 Maximum number of metadata replicas
-r 1 Default number of data replicas
-R 1 Maximum number of data replicas
-j cluster Block allocation type
-D posix File locking semantics in effect
-k posix ACL semantics in effect
-a 1048576 Estimated average file size
-n 8 Estimated number of nodes that will mount file system
-B 1048576 Block size
-Q none Quotas enforced
none Default quotas enabled
-F 139264 Maximum number of inodes
-V 8.01 File system version. Highest supported version: 8.01
-u yes Support for large LUNs?
-z no Is DMAPI enabled?
-d gpfs1nsd;gpfs2nsd;gpfs3nsd;gpfs4nsd Disks in file system
-A yes Automatic mount option
-E yes Exact mtime default mount option
-S no Suppress atime default mount option
-o none Additional mount options
- Mount the GPFS file system:
The GPFS file system is mounted manually the first time using the standard system "mount" command.
root@raven: /> mount /oragpfs
- Allow oracle user access to the GPFS file system:
root@raven: /> chown oracle:dba /oragpfs
Example 2: Creating A GPFS file system using GPFS replication and tie-breaker disks that are not part of the GPFS file system.
1. Pre-installation steps.
- chdev -l aio0 -a maxservers=20
- Select and/or configure GPFS network
- Add GPFS interface names to /.rhosts file
2. Install GPFS software.
- Note that the GPFS fileset names have changed in GPFS 2.3
- GPFS 2.3.0.1 update (APAR IY63969) is mandatory.
root@raven: /> lslpp -l |grep -i gpfs
gpfs.base 2.3.0.1 APPLIED GPFS File Manager
gpfs.msg.en_US 2.3.0.0 COMMITTED GPFS Server Messages - U.S.
gpfs.base 2.3.0.1 APPLIED GPFS File Manager
gpfs.docs.data 2.3.0.1 APPLIED GPFS Server Manpages and
- Add GPFS to $PATH
export PATH=$PATH:/usr/lpp/mmfs/bin
3. Create a 2 node GPFS cluster.
- Create the node list file:
For Oracle, both nodes will be of type "quorum"
Example: /tmp/gpfs/node_list contains:
ravenp:quorum
starp:quorum
The hostname or IP address must refer to the communications adapter over which the GPFS daemons communicate. Alias interfaces are not allowed. Use the original address or a name that is resolved by the host command to that original address. Since this is a 2 node Oracle configuration, both nodes will be quorum nodes.
- Create the GPFS cluster:
Note that the primary and secondary nodes specified in the "mmcrcluster" command are for managing the "cluster configuration" information not the primary and secondary NSD servers.
root@raven: /tmp/gpfs> mmcrcluster -n /tmp/gpfs/node_list -p ravenp -s starp
Thu Jan 6 19:17:13 PST 2005: 6027-1664 mmcrcluster: Processing node ravenp
Thu Jan 6 19:17:16 PST 2005: 6027-1664 mmcrcluster: Processing node starp
mmcrcluster: Command successfully completed
mmcrcluster: 6027-1371 Propagating the changes to all affected nodes.
This is an asynchronous process.
- Display configuration results as sanity check:
root@raven: /tmp/gpfs> mmlscluster
GPFS cluster information
========================
GPFS cluster name: ravenp
GPFS cluster id: 10383406012703833913
GPFS UID domain: ravenp
Remote shell command: /usr/bin/rsh
Remote file copy command: /usr/bin/rcp
GPFS cluster configuration servers:
-----------------------------------
Primary server: ravenp
Secondary server: starp
Node number Node name IP address Full node name Remarks
-----------------------------------------------------------------------------------
1 ravenp 144.25.68.193 ravenp quorum node
2 starp 144.25.68.192 starp quorum node
root@raven: /tmp/gpfs> mmlsconfig
Configuration data for cluster ravenp:
---------------------------------------------------
clusterName ravenp
clusterId 10383406012703833913
clusterType lc
multinode yes
autoload no
useDiskLease yes
maxFeatureLevelAllowed 806
File systems in cluster ravenp:
--------------------------------------------
(none)
4. Create the cluster-wide names for the Network Shared Disks (NSDs) to be used for the GPFS file system.
- Create the GPFS disk descriptor file.
The format for each entry is:
DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup:DesiredName
The "FailureGroup" field is used to indicate that hdisk21 and 22 will be in failure group 1 while hdisk22 and 23 will be in failure group 2.
The DesiredName is specified and will appear in the volume group field when the "lspv" command is used. Note: These are not actual AIX volume groups.
Contents of file for data disks /tmp/gpfs/disk_list_data
hdisk21::::1:fg1a
hdisk22::::1:fg1b
hdisk23::::2:fg2a
hdisk24::::2:fg2b
Make a copy of this file just in case, since it will be modified during the configuration process.
cp /tmp/gpfs/disk_list_data /tmp/gpfs/disk_list_data_bak
root@raven: />mmcrnsd -F /tmp/gpfs/disk_list_data
mmcrnsd: Processing disk hdisk21
mmcrnsd: Processing disk hdisk22
mmcrnsd: Processing disk hdisk23
mmcrnsd: Processing disk hdisk24
mmcrnsd: 6027-1371 Propagating the changes to all affected nodes.
This is an asynchronous process.
- Display the results of the command as a sanity check.
root@raven: />mmlsnsd
File system Disk name Primary node Backup node
---------------------------------------------------------------------------
(free disk) fg1a (directly attached)
(free disk) fg1b (directly attached)
(free disk) fg2a (directly attached)
(free disk) fg2b (directly attached)
root@raven: />lspv
hdisk0 00c5430c0026114d rootvg active
hdisk1 00c5430cbbcfe588 None
hdisk2 00c5430c92db553f None
hdisk10 00c5430c0cdcd979 None
hdisk11 00c5430c0cdcdc32 None
hdisk12 00c5430c0cdcdedc None
hdisk13 00c5430c0cdce198 None
hdisk14 00c5430c0cdce462 None
hdisk15 00c5430c0cdce9da None
hdisk16 00c5430c0cdcec98 None
hdisk17 00c5430c0cdcef4f None
hdisk18 00c5430c0cdcf20e None
hdisk19 00c5430c0cdcf4c9 None
hdisk20 00c5430c0cdcf747 None
hdisk21 00c5430c0cdcf976 fg1a
hdisk22 00c5430c0cdcfbb3 fg1b
hdisk23 00c5430c92db71ea fg2a
hdisk24 00c5430c0cdcfde3 fg2b
5. Create the cluster-wide names for the Network Shared Disks (NSDs) to be used for the tie-breaker disks.
- Create the GPFS disk descriptor file for the tie-breaker disks.
The format for each entry is:
DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup:DesiredName
Note: If the DesiredName is specified, this name will appear in the volume group field when the lspv command is used. These are not actual AIX volume groups.
Contents of the file for the tie-breaker disks, /tmp/gpfs/disk_list_tie:
hdisk18:::::tie1
hdisk19:::::tie2
hdisk20:::::tie3
- NSD format the tie-breaker disks using the GPFS disk descriptor file created for tie-breaker disks.
root@n80:64bit /tmp/gpfs> mmcrnsd -F /tmp/gpfs/disk_list_tie
mmcrnsd: Processing disk hdisk18
mmcrnsd: Processing disk hdisk19
mmcrnsd: Processing disk hdisk20
mmcrnsd: 6027-1371 Propagating the changes to all affected nodes.
This is an asynchronous process.
- Display the results of the command as a sanity check.
root@n80:64bit /tmp/gpfs> mmlsnsd
File system Disk name Primary node Backup node
---------------------------------------------------------------------------
(free disk) fg1a (directly attached)
(free disk) fg1b (directly attached)
(free disk) fg2a (directly attached)
(free disk) fg2b (directly attached)
(free disk) tie1 (directly attached)
(free disk) tie2 (directly attached)
(free disk) tie3 (directly attached)
6. Identify the tie breaker disks to the cluster configuration using the ‘mmchconfig’ command.
root@raven: />mmchconfig tiebreakerDisks="tie1;tie2;tie3"
Verifying GPFS is stopped on all nodes ...
mmchconfig: Command successfully completed
mmchconfig: 6027-1371 Propagating the changes to all affected nodes.
This is an asynchronous process.
- Display tie-breaker disks in new configuration.
root@raven: /> mmlsconfig
Configuration data for cluster ravenp:
-------------------------------------
clusterName ravenp
clusterId 13882357191189485225
clusterType lc
multinode yes
autoload yes
useDiskLease yes
maxFeatureLevelAllowed 806
tiebreakerDisks tie1;tie2;tie3
prefetchThreads 505
File systems in cluster ravenp:
------------------------------
(none)
7. Start GPFS on all nodes.
root@raven: />mmstartup -a
Thu Jan 20 19:16:46 PST 2005: 6027-1642 mmstartup: Starting GPFS ...
8. Create and mount the GPFS file system:
- Create the GPFS file system using the mmcrfs command and the disk descriptor file previously created.
root@raven: /> mmcrfs /oragpfs /dev/oragpfs -F /tmp/gpfs/disk_list_data -B 1024K -n 8 -A yes
GPFS: 6027-531 The following disks of oragpfs will be formatted on node n80:
fg1a: size 17796014 KB
fg1b: size 17796014 KB
fg2a: size 17796014 KB
fg2b: size 17796014 KB
GPFS: 6027-540 Formatting file system ...
Creating Inode File
Creating Allocation Maps
Clearing Inode Allocation Map
Clearing Block Allocation Map
Flushing Allocation Maps
GPFS: 6027-535 Disks up to size 84 GB can be added to this file system.
GPFS: 6027-572 Completed creation of file system /dev/oragpfs.
mmcrfs: 6027-1371 Propagating the changes to all affected nodes.
This is an asynchronous process.
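Note that the mmcrfs command above accepts the file system defaults for replication. A hedged sketch (an assumption, not part of the original transcript) of how two-way replication of both data and metadata across the two failure groups could be requested at creation time is:
mmcrfs /oragpfs /dev/oragpfs -F /tmp/gpfs/disk_list_data -B 1024K -n 8 -A yes -m 2 -M 2 -r 2 -R 2
Here -m/-M set the default and maximum number of metadata replicas and -r/-R the default and maximum number of data replicas, matching the attributes shown by mmlsfs in Example 1.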
- Mount the GPFS file system:
The GPFS file system is mounted manually the first time using the standard system "mount" command.
root@raven: /> mount /oragpfs
- Allow oracle user access to the GPFS file system:
root@raven: /> chown oracle:dba /oragpfs
Example migration to GPFS 2.3 from a previous GPFS version:
The following items should be considered before migrating to GPFS version 2.3:
- In previous GPFS releases, clusters could be configured using one of the following cluster types: sp, hacmp, rpd, or lc. In GPFS 2.3, only the lc cluster type is supported, and it is now implicit.
- In previous GPFS releases, each cluster type supported different disk types such as virtual shared disks, AIX logical volumes, or NSDs. In GPFS 2.3, only the NSD disk type is supported.
- Prior to GPFS 2.3, a GPFS cluster could be divided into a number of nodesets, which determined the nodes of the GPFS cluster on which a GPFS file system would be mounted. In GPFS 2.3, the concept of a nodeset is removed: all nodes in a GPFS 2.3 cluster are automatically members of one and only one nodeset.
This example is based on the migration instructions in the GPFS documentation.
Migration Steps
1. Ensure all disks in the file systems to be migrated are in working order by issuing the "mmlsdisk" command and checking that the disk status is "ready" and the availability is "up".
root@kerma / > mmlsdisk /dev/gpfsdisk
disk driver sector failure holds holds
name type size group metadata data status availability
------------ -------- ------ ------- -------- ----- ------------- ------------
gpfs35lv disk 512 1 yes yes ready up
gpfs36lv disk 512 1 yes yes ready up
gpfs37lv disk 512 1 yes yes ready up
Stop all user activity on the file systems to be migrated and take a backup of critical user data to ensure protection in the event of a failure.
Cleanly unmount all mounted GPFS file systems from all cluster nodes. Do not use the force option to unmount the file system on any node.
2. Shut down the GPFS daemons on all nodes of the cluster.
root@kerma / > mmshutdown -a
Fri Mar 18 14:50:49 PST 2005: 6027-1341 mmshutdown: Starting force unmount of GPFS file systems
Fri Mar 18 14:50:54 PST 2005: 6027-1344 mmshutdown: Shutting down GPFS daemons
jnanag: Shutting down!
kermag: Shutting down!
jnanag: 0513-044 The mmfs Subsystem was requested to stop.
kermag: 0513-044 The mmfs Subsystem was requested to stop.
jnanag: Master did not clean up; attempting cleanup now
jnanag: /var/mmfs/etc/mmfsdown.scr: /usr/bin/lssrc -s mmfs
jnanag: Subsystem Group PID Status
jnanag: mmfs aixmm inoperative
jnanag: /var/mmfs/etc/mmfsdown.scr: /usr/sbin/umount -f -t mmfs
jnanag: Fri Mar 18 14:51:37 2005: GPFS: 6027-311 mmfsd64 is shutting down.
jnanag: Fri Mar 18 14:51:37 2005: Reason for shutdown: mmfsadm shutdown command timed out
Fri Mar 18 14:51:58 PST 2005: 6027-1345 mmshutdown: Finished
3. Export the GPFS file systems (mmexportfs).
This command creates a configuration output file that will be required when finishing the migration, to import your file systems into the new GPFS 2.3 cluster. Preserve this file; it will also be needed if you want to go back to an older version of GPFS.
root@kerma /tmp/gpfs22 > mmexportfs all -o gpfs22.con
mmexportfs: Processing file system gpfsdisk ...
mmexportfs: Processing disks that do not belong to any file system ...
mmexportfs: 6027-1371 Propagating the changes to all affected nodes.
This is an asynchronous process.
4. Delete all existing nodes for each nodeset in the cluster (mmdelnode).
root@kerma /tmp/gpfs22 > mmdelnode -a
Verifying GPFS is stopped on all affected nodes ...
mmdelnode: 6027-1370 Removing old nodeset information from the deleted nodes.
This is an asynchronous process.
If you have more than one nodeset, issue:
root@kerma / > mmdelnode -a -C nodesetid
5. Delete the existing cluster by issuing mmdelcluster (this command exists only in GPFS 2.2 and earlier).
root@kerma /tmp/gpfs22 > mmdelcluster -a
mmdelcluster: 6027-1371 Propagating the changes to all affected nodes.
This is an asynchronous process.
mmdelcluster: Command successfully completed
6. Install the GPFS 2.3 software on all of the cluster nodes:
The GPFS 2.3 install images have been copied to /tmp/gpfslpp on all nodes.
root@kerma / > installp -agXYd /tmp/gpfslpp gpfs
Installation Summary
--------------------
Name Level Part Event Result
-------------------------------------------------------------------------------
gpfs.msg.en_US 2.3.0.1 USR APPLY SUCCESS
gpfs.docs.data 2.3.0.1 SHARE APPLY SUCCESS
gpfs.base 2.3.0.1 USR APPLY SUCCESS
gpfs.base 2.3.0.1 ROOT APPLY SUCCESS
7. Determine which nodes will be quorum nodes in your GPFS cluster and create a new GPFS cluster across all desired cluster nodes (mmcrcluster).
root@kerma /var/mmfs/gen > mmcrcluster -n nodefile -p kermag -s jnanag
Mon Mar 21 11:25:08 PST 2005: 6027-1664 mmcrcluster: Processing node kermag
Mon Mar 21 11:25:10 PST 2005: 6027-1664 mmcrcluster: Processing node jnanag
mmcrcluster: Command successfully completed
mmcrcluster: 6027-1371 Propagating the changes to all affected nodes.
This is an asynchronous process.
Where the contents of the node file are:
root@kerma /var/mmfs/gen > cat nodefile
kermag:quorum
jnanag:quorum
8. Complete the movement of the GPFS file system to your new cluster (mmimportfs).
root@jnana /tmp/gpfs22 > mmimportfs gpfsdisk -i gpfs22.con
mmimportfs: Attempting to unfence the disks. This may take a while ...
mmimportfs: Processing file system gpfsdisk ...
mmimportfs: Processing disk gpfs35lv
mmimportfs: Processing disk gpfs36lv
mmimportfs: Processing disk gpfs37lv
mmimportfs: Committing the changes ...
mmimportfs: The following file systems were successfully imported:
gpfsdisk
mmimportfs: The NSD servers for the following disks from file system gpfsdisk were reset or not defined:
gpfs35lv
gpfs36lv
gpfs37lv
mmimportfs: Use the mmchnsd command to assign NSD servers as needed.
mmimportfs: 6027-1371 Propagating the changes to all affected nodes.
This is an asynchronous process.
9. Start GPFS on all nodes of the cluster (mmstartup).
root@jnana / > mmstartup -a
Mon Mar 21 14:15:22 PST 2005: 6027-1642 mmstartup: Starting GPFS ...
root@jnana / > df /gpfs
Filesystem 512-blocks Free %Used Iused %Iused Mounted on
/dev/gpfsdisk 106756608 106661888 1% 76 1% /gpfs
10. Complete the migration to the new level of GPFS (mmchfs).
Mount the file system if it is not already mounted.
root@jnana / > mount /gpfs
Issue the following command to migrate the file system metadata to the new GPFS 2.3 format.
root@jnana / > mmchfs gpfsdisk -V
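As an illustrative final check (not part of the original example), the file system format version can be displayed to confirm the migration:
mmlsfs gpfsdisk -V
The output should now show the file system version at the GPFS 2.3 level.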
GPFS and Oracle References and Additional Information:
Oracle MetaLink
GPFS
The GPFS FAQ site contains the latest information regarding GPFS including software requirements, supported hardware and supported storage configurations.
GPFS 2.3 FAQ:
GPFS Forum at IBM developerWorks:
GPFS 2.3 Summary of Changes:
GPFS 2.3 Announcement Details: http://www-306.ibm.com/common/ssi/OIX.wss?DocURL=http://d03xhttpcl001g.boulder.ibm.com/common/ssi/rep_ca/4/897/ENUS204-294/index.html&InfoType=AN
GPFS Product Home Page:
GPFS 2.3 Publications:
Direct to man pages for GPFS 2.3 commands:
Migrating to GPFS 2.3 from previous GPFS versions:
Establishing disaster recovery for your GPFS cluster:
EtherChannel and Link Aggregation with AIX 5L
AIX EtherChannel and IEEE 802.3ad Link Aggregation
Oracle and AIX 5L
References