MongoDB Backup/Restore Methodology using IBM® Spectrum Protect™ for Linux on z
Ryan Bertsche
Robert McNamara
Kyle Moser
Dulce Smith
© Copyright IBM Corp. 2016. All rights reserved.
January 7, 2016
Table of Contents
1. Introduction
2. IBM® Spectrum Protect™ background
3. MongoDB background
4. Benefits
5. MongoDB environment setup
6. IBM® Spectrum Protect™ (TSM) resources and setup
7. Implementation
8. Test scenarios
9. Performance considerations
10. Summary
1. Introduction
IBM has been running virtual machines since the 1970s and has customers that run hundreds to thousands of
Linux images as guests on mainframe z/VM LPARs. The IBM Linux on z Test Integration Center in
Poughkeepsie, NY focuses on running a mix of IBM and open source products to prove the versatility of Linux
on z systems.
The focus of this white paper is to back up fairly large instances of MongoDB that are sharded and replicated over numerous servers. We will accomplish this by using IBM® Spectrum Protect™, formerly Tivoli® Storage Manager (TSM), with its client running as a backup and management agent on each of the MongoDB servers. The purpose of this is to show how MongoDB can be integrated with traditional or existing backup tools like IBM® Spectrum Protect™, even at production scale.
This is all possible using the completely free and open-source MongoDB, version 3.0 and higher. Many tools and features embedded directly in MongoDB assist in the process and complement the backup abilities of MongoDB; we will cover them in the next sections.
2. IBM® Spectrum Protect™ background
IBM® Spectrum Protect™ provides storage management solutions for multivendor computer environments. It provides automated, centrally scheduled, policy-managed backup, archive, and space-management capabilities for file servers, workstations, virtual machines, and applications. Furthermore, IBM® Spectrum Protect™ supports systems of all sizes, including virtual machines, file servers, email, databases, Enterprise Resource Planning (ERP) systems, mainframes and desktops. IBM® Spectrum Protect™ does all this from a single environment that expands as data grows.
This white paper covers the backup and recovery functions of IBM® Spectrum Protect™ and shows how they can be used to protect MongoDB data. In addition, there are a few specific reasons we chose IBM® Spectrum Protect™. First, it has a proven track record as a reliable backup manager for large enterprise systems. Many large customers have trusted it for years and may already have it integrated in their enterprise systems. Even for customers that do not yet use IBM® Spectrum Protect™, it removes the need to run separate backup software for each application, which carries significantly more overhead than simply using IBM® Spectrum Protect™ for all data.
There are other important reasons to use IBM® Spectrum Protect™ for a backup solution. For instance, the backup occurs on site and does not depend on cloud services, whose problems could impede a backup or restore and lead to loss of data or time. The data is also more secure: if it stays on site, there is less chance of sensitive data being accessed maliciously. Finally, IBM® Spectrum Protect™ has a robust platform for managing backed-up data. Some applications simply pile data up on a hard drive on another server and require it to be restored manually, or leave you to manage by hand where data is stored and what happens to it over time. IBM® Spectrum Protect™ can be configured with automatic procedures, like writing data to disk and tape, and has numerous archiving features. When it comes to restoring the data, IBM® Spectrum Protect™ agents make it simple to get to the most recent backup for a particular machine and handle the process of putting the data back where it belongs.
3. MongoDB background
MongoDB is an open source database that is considered to be the most popular and fastest growing NoSQL database, mostly because of how well it works in areas where traditional SQL databases have trouble. It is very good at dealing with large sets of unstructured data and has exceptionally good read times on the data that is stored. That, combined with powerful queries that can be written in JavaScript, makes MongoDB a powerful tool for modern applications, like mobile and analytics, that require frequent reads and consumption of data. While it is not a replacement for all SQL applications that store structured data, it does offer a modern solution for massive amounts of unstructured data and mobile traffic.
In addition, MongoDB is designed to be highly scalable and available. These features are built into the design
and structure of a MongoDB environment. MongoDB in a production environment is actually a cluster of
processes running different tasks, usually running on different machines. It consists of three different types of
servers:
a. Config Servers - These servers store metadata about the locations of data. For a production environment, there needs to be exactly three config servers. These are the metadata servers that hold all the important information for the clustered DB. They hold all the information about where data is stored and how much there is.
b. Query Routers - These special instances of MongoDB (mongos) are the interface or gateway between outside applications and the data stored in MongoDB. Requests come into these servers, and the requested data is returned to the application through the query router. No data is stored permanently on these servers, which gives an added layer of security when connecting to the outside world. These instances work by querying the config servers to find where the data is stored, then fetching the data and returning it to the application. They also act as the management interface for cluster-level administration: while all the configuration is actually stored on the config servers, the query routers are the administrative console through which you access all settings and preferences. There can be any number of query routers, but you need at least one for the database to be functional.
c. Shards - This is where the data is actually stored in the system. The purpose of sharding is to scale the NoSQL database horizontally. The data is partitioned among a set of servers (the shards) to keep it consistent and available while avoiding I/O bottlenecks. The data is also automatically balanced across the shards to prevent one server from becoming disproportionately full. The idea is to break up the dataset as logically as possible and spread it across different machines; more machines mean more availability. Additional shards can be added, and the database will redistribute the data evenly, allowing MongoDB to handle even more traffic. In production, however, each shard is not an individual server. Each shard is actually a set of duplicated servers called a replica set.
• Replica Sets - This is the solution to the data redundancy and availability issues that can be faced with large databases, especially those of the NoSQL variety. The goal of this system is to have everything copied, in a synchronized fashion, to the point where the complete failure of an entire server can be handled automatically with no downtime. The members of a replica set hold an election among themselves to pick a single server to be the primary. The primary is responsible for handling all writes for the replica set: it applies each write and records it in an operation log that the secondary members copy. The secondary members then play back that log and apply all operations to their own data. One very interesting feature of replica sets is that while the primary handles all the writes, data can be read from all of the members at the same time, provided the client's read preference allows reads from secondaries. This means that read operations can occur concurrently on the same piece of data across a replica set, which leads to great availability. As far as failover goes, replica sets are designed to automatically detect and fail over for the loss of any server in the set, including the primary. With the loss of a primary, the remaining servers automatically elect a new primary and continue operation.
Figure 1: MongoDB Structural Diagram
As you can tell from the diagram, in a production environment, each shard of a database is also a replica set.
This is so there is always built-in redundancy with the data. There is no place in the MongoDB cluster where
the data exists only once.
The fact is, MongoDB is built for persistent and reliable data, which is why it has native sharding and replica sets. Sharding is primarily focused on the scalability of the data, dividing a database among multiple servers. Replica sets are groups of identical copies of a shard that sit on separate servers for redundancy and speed.
There is also a feature called journaling, in which the database stores pending write operations in a journal file before they are written to the data files. These journal files grow to at most 1GB in size and are only deleted once all the operations in them have been applied or the server has been shut down cleanly. After a dirty shutdown, as soon as the server is turned back on, the journal file is read and played back so that any operations that had not yet been written are committed. Again, this happens automatically once the MongoDB instance on the server is brought back online. In the case of some write errors, MongoDB will actually restart itself in order to fix the issue: it restarts, reads from the journal and then, once the journal reads are completed, deletes the old journal file and creates a new one.
Between journaling, sharding and replica sets, there is a fair amount of automated failover for minor hiccups, such as a primary losing connectivity and going offline. Things would continue to run without flaw, and the downed server could be brought back automatically after a restart. However, on bigger systems that store important data and have security and reliability requirements, these techniques alone aren't quite enough. In this paper, we use IBM® Spectrum Protect™ as a solution to make MongoDB's built-in failover features more robust, enabling fully scheduled backups for not just MongoDB but an entire enterprise ecosystem.
In an enterprise system, you need to have full backups of all of your data for record keeping, audits and disaster
recovery amongst other things. MongoDB’s redundancy is more about availability and does not provide a good
way to handle these large, long-term backups. By using IBM® Spectrum Protect™, we will be able to merge
MongoDB with existing enterprise infrastructure that already has backup policies in place.
In our backup procedure, IBM® Spectrum Protect™ uses the LVM snapshot feature built into Red Hat (and most other major Linux distributions) to copy the filesystem holding our MongoDB files on a replicated server. This happens in a series of steps that are described in detail in the following sections. What should be pointed out now is that all the journals and database files need to be in the same logical volume, so that a single snapshot captures a consistent copy of both without having to snapshot unnecessary amounts of unrelated data. This is best practice for MongoDB anyway, so it is assumed that this basic rule is followed. In the end, the snapshot is compressed and stored by IBM® Spectrum Protect™ according to its own storage procedures.
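Because the snapshot covers a single logical volume, it can be worth confirming before the first backup that the journal really sits on the same filesystem as the data files. The following is a minimal sketch, not part of the tested procedure: it compares the device numbers reported by GNU `stat`, and it assumes the journal is in the `journal` subdirectory of the dbPath, which is MongoDB's default layout. The function names are hypothetical.

```bash
#!/bin/bash
# Sketch: confirm that MongoDB's data files and journal share one filesystem,
# so that a single LVM snapshot captures both.

# same_filesystem PATH1 PATH2
# Returns 0 when both paths (following symlinks) are on the same device.
same_filesystem() {
    local dev1 dev2
    dev1=$(stat -L -c %d -- "$1") || return 2
    dev2=$(stat -L -c %d -- "$2") || return 2
    [ "$dev1" = "$dev2" ]
}

# check_mongo_layout DBPATH
# Assumes the journal lives in DBPATH/journal, MongoDB's default location.
check_mongo_layout() {
    local dbpath=$1
    if same_filesystem "$dbpath" "$dbpath/journal"; then
        echo "OK: $dbpath and $dbpath/journal are on the same filesystem"
    else
        echo "WARNING: $dbpath/journal is missing or on a different filesystem" >&2
        return 1
    fi
}
```

For example, `check_mongo_layout /var/lib/mongo` (assuming the RHEL package's default dbPath) prints an OK line, or a warning with a non-zero exit status. Following symlinks with `stat -L` matters because a journal symlinked to another device would otherwise appear to share the data volume.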
The MongoDB backup process relies on a dedicated replica set member for each shard that is used strictly for backup purposes. For availability in a production environment, each shard's data is held by multiple replicas. For our backup purposes, we want one of those replicas to be predefined as the backup instance. That means it must never become the primary, and MongoDB's priority 0 member setting makes this possible. The backup member continues to operate as a functional member of the replica set while it is being backed up. However, the backup procedure will consume some resources on the backup server, and we don't want to slow down the cluster, so the production environment and backup schedule should be designed with that in mind.
This backup procedure imposes a few minimum requirements on the MongoDB cluster. These coincide with the minimum requirements for a production environment set forth by the MongoDB documentation.
4. Benefits
The following are some of the benefits IBM® Spectrum Protect™ offers as a backup and restore tool to
enhance MongoDB’s capabilities:
• Single backup manager for all applications, keeping complexity and overhead lower
• Robust and configurable storage mediums and procedures
• Separate server to do backup
• Cataloging of backups
• Backup changed data only
• Built-in scheduling
• Single control point
• Database for logging, configuration, statistics, metadata
In the next sections, we will show you how some of these features are utilized to make MongoDB a more robust
database.
5. MongoDB environment setup
1. Set up your replicated MongoDB servers, according to best practices, with replicas and shards.
2. Our servers had the following minimum configurations:
• 5GB of storage
• 1GB of RAM (more is recommended)
• MongoDB 3.0 or greater should be used
• RHEL 7.1 on z/VM
The backup environment will need to have, at a minimum, the following qualifications:
• There must be 3 configuration servers
• The database is divided into at least 2 shards
   o Those shards are split on an optimized shard key
• There are at least 3 servers running as a replica set for each shard
   o i.e., 1 primary server and 2 secondary servers per shard
• All servers in the cluster are networked by their domain name, and not directly by their IP addresses, for failover purposes
• At least one MongoDB router instance (mongos) is able to handle all requests. There should be enough routers to sufficiently handle incoming and outgoing requests, and more routers should be added as needed.
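The minimums above can be checked mechanically when planning a cluster. The following sketch (the function name is hypothetical) validates a proposed layout against those rules and prints the smallest number of servers it implies: the config servers, every replica of every shard, and the routers.

```bash
#!/bin/bash
# Sketch: check a planned cluster layout against the minimums listed above
# and print the total number of servers it needs.

# validate_topology CONFIG_SERVERS SHARDS REPLICAS_PER_SHARD ROUTERS
# Prints the server count; returns 1 if any minimum is not met.
validate_topology() {
    local config=$1 shards=$2 replicas=$3 routers=$4 ok=0
    if [ "$config" -ne 3 ]; then
        echo "need exactly 3 config servers (got $config)" >&2; ok=1
    fi
    if [ "$shards" -lt 2 ]; then
        echo "need at least 2 shards (got $shards)" >&2; ok=1
    fi
    if [ "$replicas" -lt 3 ]; then
        echo "need at least 3 replica set members per shard (got $replicas)" >&2; ok=1
    fi
    if [ "$routers" -lt 1 ]; then
        echo "need at least 1 mongos router (got $routers)" >&2; ok=1
    fi
    # Config servers + every replica of every shard + routers.
    echo $(( config + shards * replicas + routers ))
    return "$ok"
}
```

With the minimum layout, `validate_topology 3 2 3 1` prints 10; adding a second router and two more shards, `validate_topology 3 4 3 2`, prints 17.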
The procedure for backing up MongoDB is the main topic of this paper. This paper does not cover the
entire procedure for creating or maintaining a production MongoDB instance. For more information on
that and other MongoDB configurations, the official documentation is a great resource.
http://docs.mongodb.org/manual/
6. IBM® Spectrum Protect™ (TSM) resources and setup
1. IBM® Spectrum Protect™ (TSM) Server resources
To demonstrate the consistency and flexibility of the MongoDB Linux on z backup/restore methodology, the following IBM® Spectrum Protect™ (TSM) Server resources were each individually tested for this evaluation:
• System p AIX IBM® Spectrum Protect™ (TSM) Server version 6.3.5.0
• System p AIX IBM® Spectrum Protect™ (TSM) Server version 7.1.3.0
• System z Linux IBM® Spectrum Protect™ (TSM) Server version 6.3.5.0
• System z Linux IBM® Spectrum Protect™ (TSM) Server version 7.1.3.0
2. IBM® Spectrum Protect™ (TSM) Client resources
To demonstrate the consistency of the MongoDB Linux on z backup/restore methodology, the following IBM® Spectrum Protect™ (TSM) Client resources were each individually tested for this evaluation:
• System z Linux IBM® Spectrum Protect™ (TSM) Client 7.1.2.0
• System z Linux IBM® Spectrum Protect™ (TSM) Client 7.1.3.0
• System z Linux IBM® Spectrum Protect™ (TSM) Client 7.1.4.0
3. Install and Configure the IBM® Spectrum Protect™ (TSM) Client
The IBM® Spectrum Protect™ Client, version 7.1.4.0, was installed on each Linux on z MongoDB server. Here are the steps to do this:
a. Copy the install tar file, 7.1.4.0-TIV-TSMBAC-LinuxS390.tar, from your install repository to the
target server:
• Create a folder on the target Mongo server to download the file to:
[root@LTMNGO01 /]# mkdir /tmp/tsm
[root@LTMNGO01 /]#
• Copy the 7.1.4.0-TIV-TSMBAC-LinuxS390.tar file from the source repository (e.g. /images/tsm.clients/7.1.4.0) to the target directory (e.g. /tmp/tsm) using SCP. On the target MongoDB server, LTMNGO01, execute:
[root@LTMNGO01 tsm]# scp -pr 10.20.xx.xxx:/images/tsm.clients/7.1.4.0/7.1.4.0-TIV-TSMBAC-LinuxS390.tar /tmp/tsm
Password:
7.1.4.0-TIV-TSMBAC-LinuxS390.tar          100%  140MB  46.5MB/s   00:03
[root@LTMNGO01 tsm]#
[root@LTMNGO01 tsm]# ls
7.1.4.0-TIV-TSMBAC-LinuxS390.tar
[root@LTMNGO01 tsm]#
b. Untar the 7.1.4.0-TIV-TSMBAC-LinuxS390.tar file located on the target MongoDB server:
[root@LTMNGO01 tsm]# tar -xvpf 7.1.4.0-TIV-TSMBAC-LinuxS390.tar
README_enu.htm
README_api_enu.htm
TIVsm-API64.s390x.rpm
TIVsm-BA.s390x.rpm
TIVsm-JBB.s390x.rpm
TIVsm-APIcit.s390x.rpm
TIVsm-BAcit.s390x.rpm
TIVsm-filepath-7.1.4-0-rhel59.s390x.rpm
TIVsm-filepath-7.1.4-0-rhel64.s390x.rpm
TIVsm-filepath-7.1.4-0-rhel70.s390x.rpm
TIVsm-filepath-7.1.4-0-sles11sp2.s390x.rpm
TIVsm-filepath-source.tar.gz
gskcrypt64-8.0.50.52.linux.s390x.rpm
gskssl64-8.0.50.52.linux.s390x.rpm
c. Install/update the following 4 RPMs:
TIVsm-API64.s390x.rpm
TIVsm-BA.s390x.rpm
gskcrypt64-8.0.50.52.linux.s390x.rpm
gskssl64-8.0.50.52.linux.s390x.rpm
[root@LTMNGO01 tsm]# rpm -i gskcrypt64-8.0.50.52.linux.s390x.rpm
[root@LTMNGO01 tsm]# rpm -i gskssl64-8.0.50.52.linux.s390x.rpm
[root@LTMNGO01 tsm]# rpm -i TIVsm-API64.s390x.rpm
[root@LTMNGO01 tsm]# rpm -i TIVsm-BA.s390x.rpm
[root@LTMNGO01 tsm]#
[root@LTMNGO01 tsm]# rpm -qa | grep gsk
gskssl64-8.0-50.52.s390x
gskcrypt64-8.0-50.52.s390x
[root@LTMNGO01 tsm]#
[root@LTMNGO01 tsm]# rpm -qa | grep TIVsm
TIVsm-API64-7.1.4-0.s390x
TIVsm-BA-7.1.4-0.s390x
[root@LTMNGO01 tsm]#
d. Create and update the /opt/tivoli/tsm/client/ba/bin/dsm.sys and /opt/tivoli/tsm/client/ba/bin/dsm.opt
files.
Note: Anytime you make a change to the dsm.sys or dsm.opt files, issue /etc/init.d/tsm restart for the
IBM® Spectrum Protect™ Client scheduler daemon to re-read these files.
• Change directory:
[root@LTMNGO01 tsm]# cd /opt/tivoli/tsm/client/ba/bin
[root@LTMNGO01 bin]#
• Copy the dsm.opt.smp file to dsm.opt:
[root@LTMNGO01 bin]# cp dsm.opt.smp dsm.opt
[root@LTMNGO01 bin]#
• Copy the dsm.sys.smp file to dsm.sys:
[root@LTMNGO01 bin]# cp dsm.sys.smp dsm.sys
[root@LTMNGO01 bin]#
• List the files:
[root@LTMNGO01 bin]# ls -al dsm.sys dsm.opt
-r--r--r-- 1 root root 834 Dec 14 12:52 dsm.opt
-r--r--r-- 1 root root 971 Dec 14 12:47 dsm.sys
[root@LTMNGO01 bin]#
• Update the dsm.opt file:
- This is what dsm.opt currently has:
[root@LTMNGO01 bin]# cat dsm.opt
************************************************************************
* Tivoli Storage Manager
*
*
*
* Sample Client User Options file for UNIX (dsm.opt.smp)
*
************************************************************************
* This file contains an option you can use to specify the TSM
* server to contact if more than one is defined in your client
* system options file (dsm.sys). Copy dsm.opt.smp to dsm.opt.
* If you enter a server name for the option below, remove the
* leading asterisk (*).
************************************************************************
* SErvername       A server name defined in the dsm.sys file
[root@LTMNGO01 bin]#
- Add the following line at the end of the file (where TSMLINUX is the name of the TSM server):
SErvername TSMLINUX
- Now, it reads:
[root@LTMNGO01 bin]# cat dsm.opt
************************************************************************
* Tivoli Storage Manager
*
*
*
* Sample Client User Options file for UNIX (dsm.opt.smp)
*
************************************************************************
* This file contains an option you can use to specify the TSM
* server to contact if more than one is defined in your client
* system options file (dsm.sys). Copy dsm.opt.smp to dsm.opt.
* If you enter a server name for the option below, remove the
* leading asterisk (*).
************************************************************************
* SErvername       A server name defined in the dsm.sys file
SErvername TSMLINUX
SErvername TSMAIX
[root@LTMNGO01 bin]#
• Update the dsm.sys file:
This is what the sample IBM® Spectrum Protect™ Client dsm.sys file looks like:
[root@ltmngo02 bin]# cat dsm.sys
*****************************************************************
* Tivoli Storage Manager
*
*
* Sample Client System Options file for UNIX (dsm.sys.smp)
*
*****************************************************************
*
* This file contains the minimum options required to get started
* using TSM. Copy dsm.sys.smp to dsm.sys. In the dsm.sys file,
* enter the appropriate values for each option listed below and
* remove the leading asterisk (*) for each one.
* If your client node communicates with multiple TSM servers, be
* sure to add a stanza, beginning with the SERVERNAME option, for
* each additional server.
*****************************************************************
SErvername  server_a
   COMMMethod         TCPip
   TCPPort            1500
   TCPServeraddress   node.domain.company.COM
[root@ltmngo01 bin]#
As an example of a dsm.sys file configured with 2 TSM Servers, the following updated dsm.sys file contains 2 IBM® Spectrum Protect™ servername stanzas, with corresponding entries for both the Linux on z IBM® Spectrum Protect™ 7.1.3.0 Server (TSMLINUX) and the System p AIX TSM 7.1.3.0 Server (TSMAIX). Either or both TSM Servers could be used for MongoDB Server image backup purposes, depending on the data protection requirements of the specific installation. The same stanza configuration rules could be used to add more TSM Servers, should a specific installation require them.
In the following dsm.sys file, the performance-related entries (DISKBUFFSIZE, TCPBUFFSIZE, TCPWINDOWSIZE, TCPNODELAY, USELARGEBUFFERS and TXNBYTELIMIT) are set to high values for optimal MongoDB Server image backup performance. Please note that on a heavily utilized or resource-constrained (i.e., CPU, memory, I/O, network, etc.) MongoDB Server, configuring these performance-related entries with their optimal performance values could negatively impact MongoDB Server performance. In such situations, it would be better to start with the default values for these performance-related entries and evaluate any changes incrementally from there.
[root@LTMNGO01 bin]# cat dsm.sys
SErvername          TSMLINUX
   COMMMethod          TCPIP
   TCPPort             1500
   TCPServeraddress    10.1.1.1
   QUERYSCHEDPERIOD    6
   NODENAME            LTMNGO01
   SCHEDLOGNAME        /tmp/tsm1/dsmsched.log
   ERRORLOGNAME        /tmp/tsm1/dsmerror.log
   SCHEDLOGRETENTION   7 D
   ERRORLOGRETENTION   7 D
   EXCLUDE.DIR         /proc
   PASSWORDACCESS      GENERATE
   COMPRESSION         NO
   DISKBUFFSIZE        1023
   TCPBUFFSIZE         512
   TCPWINDOWSIZE       512
   TCPNODELAY          YES
   USELARGEBUFFERS     YES
   TXNBYTELIMIT        64000

SErvername          TSMAIX
   COMMMethod          TCPIP
   TCPPort             1500
   TCPServeraddress    10.1.1.2
   QUERYSCHEDPERIOD    6
   NODENAME            LTMNGO01
   SCHEDLOGNAME        /tmp/tsm2/dsmsched.log
   ERRORLOGNAME        /tmp/tsm2/dsmerror.log
   SCHEDLOGRETENTION   7 D
   ERRORLOGRETENTION   7 D
   EXCLUDE.DIR         /proc
   PASSWORDACCESS      GENERATE
   COMPRESSION         NO
   DISKBUFFSIZE        1023
   TCPBUFFSIZE         512
   TCPWINDOWSIZE       512
   TCPNODELAY          YES
   USELARGEBUFFERS     YES
   TXNBYTELIMIT        64000
e. Self-register the IBM® Spectrum Protect™ Client with the IBM® Spectrum Protect™ server TSMLINUX
Note: To enable open IBM® Spectrum Protect™ Client registration with the IBM® Spectrum Protect™
Server, the IBM® Spectrum Protect™ Server REGISTRATION setting must be set to OPEN.
Execute the dsmc -se=TSMLINUX command and answer the given questions:
[root@LTMNGO01 bin]# dsmc -se=TSMLINUX
IBM Tivoli Storage Manager
Command Line Backup-Archive Client Interface
Client Version 7, Release 1, Level 4.0
Client date/time: 12/14/2015 13:00:39
(c) Copyright by IBM Corporation and other(s) 1990, 2015. All Rights Reserved.
Node Name: LTMNGO01
Your ID (LTMNGO01) is not currently registered with the server.
Enter the following information to set up a new ID:
Please enter a password:
Enter new password for verification:
Enter contact information: [email protected]
Session established with server TSMLINUX: Linux/s390x
Server Version 7, Release 1, Level 1.300
Server date/time: 12/14/2015 13:00:54 Last access: 12/14/2015 13:00:54
tsm>
At this point, the IBM® Spectrum Protect™ client, version 7.1.4.0, has been installed on the MongoDB
server.
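The dsm.sys and dsm.opt edits from step d can also be scripted when many MongoDB servers need the same client configuration. The sketch below is an illustration, not the procedure used for this paper: the function names are hypothetical, it writes only the core options from the dsm.sys example (the performance-related entries could be appended the same way), and it assumes server names contain no regular-expression special characters.

```bash
#!/bin/bash
# Sketch: write a dsm.sys server stanza and make sure dsm.opt selects that
# server. Option names follow the dsm.sys example above; the function names
# are hypothetical.

# dsm_sys_stanza SERVERNAME ADDRESS NODENAME LOGDIR [PORT]
dsm_sys_stanza() {
    local server=$1 address=$2 node=$3 logdir=$4 port=${5:-1500}
    printf '%-18s %s\n' SErvername "$server"
    # printf reuses its format for each remaining pair of arguments.
    printf '   %-18s %s\n' \
        COMMMethod       TCPip \
        TCPPort          "$port" \
        TCPServeraddress "$address" \
        NODENAME         "$node" \
        PASSWORDACCESS   GENERATE \
        SCHEDLOGNAME     "$logdir/dsmsched.log" \
        ERRORLOGNAME     "$logdir/dsmerror.log"
}

# ensure_dsm_opt_server DSM_OPT SERVERNAME
# Appends "SErvername SERVERNAME" unless an uncommented line already has it,
# so running it twice does not duplicate the entry.
ensure_dsm_opt_server() {
    local file=$1 server=$2
    if ! grep -Eiq "^[[:space:]]*SErvername[[:space:]]+${server}[[:space:]]*\$" "$file"; then
        printf 'SErvername %s\n' "$server" >> "$file"
    fi
}
```

For example, from /opt/tivoli/tsm/client/ba/bin, `dsm_sys_stanza TSMLINUX 10.1.1.1 LTMNGO01 /tmp/tsm1 >> dsm.sys` followed by `ensure_dsm_opt_server dsm.opt TSMLINUX`, and then `/etc/init.d/tsm restart` so that the scheduler re-reads both files.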
4. Install and configure the IBM® Spectrum Protect™ (TSM) client scheduler
To run the IBM® Spectrum Protect™ client scheduler daemon at system start as a service, follow these
steps.
a. Place the following script in /etc/init.d as the file tsm.
#! /bin/bash
#
# tsm          Starts and stops the Tivoli Storage Manager Client scheduler
#
# chkconfig: 2345 99 00
# description: Starts and stops Tivoli Storage Manager Client
#
# config: /opt/tivoli/tsm/client/ba/bin/dsm.sys
# processname: dsmc

# Source function library.
. /etc/init.d/functions

prog="tsm"

# Define environment variables
PATH=$PATH:/opt/tivoli/tsm/client/ba/bin
LANG="en_US"
LOCK="/var/lock/subsys/$prog"
PID="/var/run/$prog.pid"
export LANG

start () {
    echo -n $"Starting $prog: "
    mkdir -p /tmp/tsm
    # Run the scheduler in the background, then record its PID
    dsmc schedule > /tmp/tsm/dsmsched.log 2>/tmp/tsm/dsmerror.log &
    sleep 1    # give dsmc a moment to start before looking it up
    pgrep dsmc > "$PID" && success
    RETVAL=$?
    echo
    [ $RETVAL -eq 0 ] && touch "$LOCK"
    return $RETVAL
}

stop () {
    echo -n $"Stopping $prog: "
    killproc dsmc
    RETVAL=$?
    echo
    [ $RETVAL -eq 0 ] && rm -f "$LOCK"
    [ $RETVAL -eq 0 ] && rm -f "$PID"
    return $RETVAL
}

rhstatus() {
    status dsmc
}

restart() {
    stop
    start
}

# See how we were called.
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        restart
        ;;
    status)
        rhstatus
        ;;
    *)
        echo $"Usage: $0 {start|stop|status|restart}"
        exit 1
esac

exit $?
For additional information, please see this link:
https://adsm.org/forum/index.php?threads/how-to-start-tsm-client-scheduler-in-linux.26676/
b. Issue the following 4 commands to configure the "tsm" service for run levels 3, 4, and 5
chmod 755 /etc/init.d/tsm
chkconfig --add tsm
chkconfig --level 345 tsm on
chkconfig --list
Note: the following are the manual IBM® Spectrum Protect™ client scheduler daemon controls.
To start: /etc/init.d/tsm start
To stop: /etc/init.d/tsm stop
To restart: /etc/init.d/tsm restart
c. Check that the IBM® Spectrum Protect™ client scheduler process is running.
To check that the dsmc scheduler process is running, execute: ps -ef | grep dsmc | grep -v grep
[root@LTMNGO01 bin]# ps -ef | grep dsmc | grep -v grep
root     42685     1  0 15:18 ?        00:00:00 dsmc schedule
[root@LTMNGO01 bin]#
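The checks in steps 3c and 4c can be combined into a small verification helper, for example to run across many servers. This is a sketch with hypothetical function names; it reads `rpm -qa` and `ps -ef` output from standard input so that the logic can be tried without a client installed. The `[d]smc` bracket pattern serves the same purpose as `grep -v grep` above: it keeps the search from matching its own grep process.

```bash
#!/bin/bash
# Sketch: post-install checks for steps 3c and 4c. Both functions read
# command output on standard input; the function names are hypothetical.

# Packages installed in step 3c.
REQUIRED_PKGS="gskcrypt64 gskssl64 TIVsm-API64 TIVsm-BA"

# missing_packages: reads `rpm -qa` output, prints each required package
# that has no installed version, and returns 1 if any are missing.
missing_packages() {
    local installed pkg rc=0
    installed=$(cat)
    for pkg in $REQUIRED_PKGS; do
        # Require a version number right after the name, so that
        # TIVsm-BA does not match TIVsm-BAcit, for example.
        if ! printf '%s\n' "$installed" | grep -q "^${pkg}-[0-9]"; then
            echo "$pkg"
            rc=1
        fi
    done
    return "$rc"
}

# scheduler_running: reads `ps -ef` output and succeeds if a
# "dsmc schedule" process is listed.
scheduler_running() {
    grep -q '[d]smc schedule'
}
```

On a configured server, `rpm -qa | missing_packages` prints nothing and succeeds, and `ps -ef | scheduler_running && echo "scheduler running"` reports the daemon.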
7. Implementation
The goal of our IBM® Spectrum Protect™ agent backup is to provide a robust and reliable backup solution for MongoDB sharded environments that can be run entirely on premises. For customers already using IBM® Spectrum Protect™, or for customers that have more to back up than just MongoDB, this provides a unified solution.
The best way to capture a point-in-time backup is to initiate LVM snapshots simultaneously across all the servers being backed up. The servers continue to operate in the cluster while the snapshot freezes the data being backed up, and after the backup has been completed, each server continues to operate as normal. So, in this backup method, the MongoDB cluster stays available throughout; only the one config server that is paused for the backup is briefly offline. An overview of the entire backup procedure is given below, with the specific commands given in the testing section.
The way IBM® Spectrum Protect™ will back up the data is as follows:
• A script is run to stop the MongoDB balancer across the entire cluster to ensure a quick and accurate backup
• One of the three config servers is shut down for the backup procedure; that way, the cluster metadata cannot change
• The IBM® Spectrum Protect™ backup procedure is called on the designated backup replica members and the stopped config server simultaneously
• Once the backup is complete, the config server is turned back on, and it automatically begins to 'catch up' with the other 2 config servers
• The balancer is resumed, and the MongoDB cluster resumes all normal operations
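The five steps above can be sketched as a single driver script. This is only an outline under stated assumptions, not the paper's tested commands (those appear in the test section): the host names, the `mongod` service name, ssh access from the machine running the script, and the use of `dsmc backup image` on a common mount point are all hypothetical placeholders. With `DRY_RUN=1`, the default, it only prints the commands it would run.

```bash
#!/bin/bash
# Sketch of the backup sequence above. Every host name, the "mongod" service
# name and the mount point are hypothetical placeholders; ssh access from the
# host running this script is assumed. DRY_RUN=1 (the default) only prints.

ROUTER=${ROUTER:-mongos1.example.com}            # any mongos query router
CONFIG_HOST=${CONFIG_HOST:-cfg3.example.com}     # config server to pause
BACKUP_MEMBERS=${BACKUP_MEMBERS:-"rs1-backup.example.com rs2-backup.example.com"}
DATA_MOUNT=${DATA_MOUNT:-/var/lib/mongo}         # mount point of the MongoDB LV

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

backup_cluster() {
    local host pid pids="" rc=0
    # 1. Stop the balancer so no chunks migrate during the backup.
    run mongo --host "$ROUTER" --eval 'sh.stopBalancer()' || return 1
    # 2. Shut down one config server so the cluster metadata cannot change.
    run ssh "$CONFIG_HOST" service mongod stop
    # 3. Back up the backup members and the stopped config server in parallel.
    for host in $BACKUP_MEMBERS $CONFIG_HOST; do
        run ssh "$host" dsmc backup image "$DATA_MOUNT" &
        pids="$pids $!"
    done
    for pid in $pids; do
        wait "$pid" || rc=1
    done
    # 4. Restart the config server; it catches up with the other two.
    run ssh "$CONFIG_HOST" service mongod start
    # 5. Re-enable the balancer and resume normal operations.
    run mongo --host "$ROUTER" --eval 'sh.startBalancer()'
    return "$rc"
}
```

The backups are started in parallel so the snapshots are taken as close to simultaneously as possible, and steps 4 and 5 run even if one backup fails, so a failed backup does not leave the cluster without its balancer or third config server.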
Restoring the backup:
• The backup is restored by using IBM® Spectrum Protect™ commands to restore the entire logical volume to the machine
   o There must be a volume group of the same name on the machine being restored to, with enough space to hold the backup volume
   o Make sure all of the DNS and hostname settings of the new machine match those of the backed-up machine
   o Make sure that the configuration file for this new machine is the same as the one it is being restored from
• Once the command has run, start the MongoDB instance, and it acts just like the original server that was backed up
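The three restore prerequisites can be checked before any data is restored. The sketch below uses hypothetical function names: `gather_and_check` collects the facts with `hostname -f` and the LVM `vgs` command, while `check_restore_target` holds the decision logic so that it can be tested on its own. The required size must be supplied by the caller, for example from the size of the logical volume that was backed up.

```bash
#!/bin/bash
# Sketch: pre-restore checks for the prerequisites listed above.
# Function names are hypothetical; the size to compare against must be
# supplied by the caller (for example, the size of the backed-up volume).

# check_restore_target EXPECTED_HOST ACTUAL_HOST VG_FREE_BYTES NEEDED_BYTES
# VG_FREE_BYTES is empty when the volume group does not exist.
check_restore_target() {
    local expected=$1 actual=$2 free=$3 needed=$4 rc=0
    if [ "$expected" != "$actual" ]; then
        echo "host name is $actual, but the backup came from $expected" >&2
        rc=1
    fi
    if [ -z "$free" ]; then
        echo "volume group not found" >&2
        rc=1
    elif [ "$free" -lt "$needed" ]; then
        echo "volume group has $free bytes free, $needed needed" >&2
        rc=1
    fi
    return "$rc"
}

# same_config SAVED_COPY LIVE_FILE: byte-for-byte comparison of config files.
same_config() {
    cmp -s -- "$1" "$2"
}

# gather_and_check VG EXPECTED_HOST NEEDED_BYTES: collect the facts on this host.
gather_and_check() {
    local vg=$1 expected=$2 needed=$3 free
    free=$(vgs --noheadings --units b --nosuffix -o vg_free "$vg" 2>/dev/null | tr -d ' ')
    check_restore_target "$expected" "$(hostname -f)" "$free" "$needed"
}
```

Every failed condition is reported, not just the first, so a single run shows everything that has to be fixed on the new machine.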
Keeping track of the hostnames and configuration files for an environment can become an increasingly complex task. We handled this by using dynamic configuration files and hostnames generated with Ansible. Alternatively, the MongoDB instances could be deployed with another infrastructure-as-code tool such as Chef, containerized with Docker, or even run as a cloud-based implementation. Either way, the setup and configuration are worth backing up, because having to recreate them can delay the recovery of a system.
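As one way to keep that configuration alongside the data, the following sketch bundles a node's host name and its MongoDB configuration file into a tarball that IBM® Spectrum Protect™ can back up like any other file. The function name is hypothetical, and /etc/mongod.conf is only the usual location for packaged MongoDB installs on RHEL; pass whatever files your deployment uses. Files are copied flat, so their base names must be distinct.

```bash
#!/bin/bash
# Sketch: bundle a node's host name and MongoDB configuration into a tarball
# that can be backed up with the rest of the node. The function name and
# paths are illustrative; files are copied flat, so base names must differ.

# bundle_node_config OUTPUT.tar.gz FILE...
bundle_node_config() {
    local out=$1 stage rc
    shift
    stage=$(mktemp -d) || return 1
    # Record the identity that a restored machine needs to match.
    { hostname -f 2>/dev/null || uname -n; } > "$stage/hostname"
    if ! cp -- "$@" "$stage/"; then
        rm -rf "$stage"
        return 1
    fi
    tar -czf "$out" -C "$stage" .
    rc=$?
    rm -rf "$stage"
    return "$rc"
}
```

For example, `bundle_node_config /root/ltmngo01-config.tar.gz /etc/mongod.conf` followed by `dsmc selective /root/ltmngo01-config.tar.gz` would send the bundle to the TSM server together with the node's other backups.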
Steps for setting up the MongoDB cluster with shards and replicas:
1. Install MongoDB on each server in the cluster (at least 2 shards of 3 replicas each, plus 3 config servers and at least 1 router server, for a total of at least 10 servers)
2. Configure the first replica set
a. Go into the first server of replica set 'RS1'
i. In the bash shell, enter mongo ipaddress:port to enter the MongoDB shell for that instance.
ii. In the MongoDB shell, enter rs.initiate() to initialize the replica set, starting with this server.
iii. In the MongoDB shell, issue rs.add(“ipaddress:port”) for each additional mongos instances
that is part of the replica set. (Note: you do not need to run this command on the MongoDB
instance you are currently in).
b. Set one of the replicas to a priority 0 server to be the backup designated server.
i. Run rs.conf() and rs.status() to get information about your replica sets, and double check that
everything was added correctly. The order the servers are displayed corresponds to their array
number in the set.
ii. Run cfg = rs.conf() to assign the whole replica config to an array variable.
iii. Run cfg.members[0].priority = 0 to set priority of replica set[0] to 0.
iv. Run cfg.members[n] before running to make sure you’re configuring the desired member of
the replica set.
v. Run rs.reconfig(cfg) to make the changes stick.
vi. Confirm the changes stuck with the rs.conf() and rs.status() commands
Note: You cannot change the priority of the current primary to 0. In order to do that, you must issue
the command: rs.stepDown(), first, to force the primary to step down. Then you can set the priority of
the former primary to 0 so it won’t be primary again.
3. With the first replica set up, it is now time to initialize sharding.
a. First, you will need to add the initial replica set RS1 to the cluster as a shard.
i. Run the command mongo ipaddress:27017/admin to connect to the mongos admin shell.
ii. In the mongos shell, run sh.addShard("rs1/ipaddress:port,ipaddress:port") once. In the connection string, list one or more members of the replica set you wish to add as a shard, without spaces after the commas.
Note: When adding a replica set as a shard, you only need to give one replica set member for the entire replica set to be added to the shard.
4. At this point, start the driver to add data to the current collection. This will make it easier to find a shard key once we add the second replica set.
a. To accomplish this, run some test data through your application and write it into MongoDB, pointing the application at your mongos instances.
5. Now that you have some test data, bring up the other replica sets you would like to add to the cluster.
a. For each additional shard you wish to add, you need to bring up one additional replica set.
i. Simply follow the commands in step 2 and bring up as many replica sets as you need shards.
6. Now add all of the additional replica sets we just configured to the shard pool.
a. Make sure to add every replica set to the shard pool.
i. This time, repeat step 3 for each replica set. You only need to log in to the mongos shell once, but repeat step 3-a-ii for every replica set being added as a shard.
b. Next, enable sharding for the particular database we are using.
i. Connect to a mongos admin shell (see step 3-a-i).
ii. Enter the command sh.enableSharding("databaseName").
7. Your database is now ready to have its collections sharded. Once a collection is sharded, its data is distributed among all the shards. First, you must assign a shard key for each collection. This process is more involved and takes some finesse, so it is described in greater detail in the section below.
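Parts of steps 2b, 3 and 6 can also be driven from a script instead of being typed into the shell. The following is a minimal Python sketch, not the procedure we ran. The helper names are hypothetical, and the functions only build documents; nothing connects to MongoDB. The code increments version because, as we understand it, the replSetReconfig command expects the caller to supply a higher version number, whereas the rs.reconfig() shell helper does this for you. Check this against your MongoDB release.

```python
import copy
from typing import Any, Dict, List, Sequence


def with_backup_member(cfg: Dict[str, Any], host: str) -> Dict[str, Any]:
    """Step 2b: copy of a replica set config with `host` set to priority 0."""
    new_cfg = copy.deepcopy(cfg)                  # leave the caller's config untouched
    matches = [m for m in new_cfg["members"] if m["host"] == host]
    if len(matches) != 1:
        raise ValueError(f"expected one member {host!r}, found {len(matches)}")
    matches[0]["priority"] = 0                    # this member can never become primary
    new_cfg["version"] += 1                       # assumption: replSetReconfig wants a higher version
    return new_cfg


def seed_list(replica_set: str, members: Sequence[str]) -> str:
    """Step 3: the "name/host:port,host:port" string passed to addShard."""
    if not members:
        raise ValueError("at least one member is required")
    return f"{replica_set}/{','.join(members)}"   # no spaces after the commas


def sharding_commands(replica_sets: Dict[str, List[str]], database: str) -> List[dict]:
    """Steps 3, 6a and 6b as admin command documents, in order."""
    cmds: List[dict] = [{"addShard": seed_list(name, hosts)}
                        for name, hosts in replica_sets.items()]
    cmds.append({"enableSharding": database})
    return cmds
```

With PyMongo, you could read the current configuration on the primary with client.admin.command("replSetGetConfig")["config"] and apply it with client.admin.command("replSetReconfig", new_cfg). You would send the sharding documents through mongos with client.admin.command(doc).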
8. Test scenarios
To clearly show the steps in the backup process, we will demonstrate the proper method to shard a database consisting of multiple collections. We will demonstrate this with AcmeAir, one of the drivers we used in the testing of our MongoDB backup procedures. The source code and documentation for this driver can be found at https://github.com/acmeair/acmeair-nodejs . AcmeAir is an open source application that is meant to resemble an airline company website, with flight lookup and a booking system. There are customers that have data attached to their login accounts, as well as flights to book, flights with layovers, bookings for flights, and current customer sessions online. It uses a nodeJS environment to run a web application that has traffic driven against it by a JMeter automation driver. Traffic is sent to the web application via its API, and the application uses a nodeJS driver to run traffic against our MongoDB system. The question still remains: how on earth do you shard the data? This is the area where MongoDB could most use improvement, for example with dynamic shard values. Right now, we are going to walk you through indexing and sharding the AcmeAir database.
The AcmeAir database contains six collections, each of which will be sharded at some point (e.g. booking, flight, flightSegment). Sharding is enabled for the database, and then each of the six collections gets its own shard key index and is sharded independently.
First, we need to load the database with some initial data and perform a few runs of the driver to get some test data. The database will be ready to shard but will not yet be actively sharded. We do this because we don’t know what the values in those collections will be, or how they will be distributed. If we knew the expected distribution beforehand, we could shard before adding data to the collection. Then, we will access the MongoDB shell of the primary server with the command:
mongo ipaddress:port
where port is the port on which the primary replica of the database is running.
Then, you will be in the MongoDB shell. Next, type in the command:
use acmeair
This will switch the focus to the correct database we will be working with.
Next, we need to list all the collections in the database with the command:
db.getCollectionInfos()
This will give a list of the collections that exist within the database. In our AcmeAir example, the output from
this command is as follows:
rs1:PRIMARY> db.getCollectionInfos()
[
{
"name" : "airportCodeMapping",
"options" : {
}
},
{
"name" : "booking",
"options" : {
}
},
{
"name" : "customer",
"options" : {
}
},
{
"name" : "customerSession",
"options" : {
}
},
{
"name" : "flight",
"options" : {
}
},
{
"name" : "flightSegment",
"options" : {
}
}
]
Alternatively, you can use the following command for a more compact list of collection names:
db.getCollectionNames()
For us, this resulted in the following output:
[
"airportCodeMapping",
"booking",
"customer",
"customerSession",
"flight",
"flightSegment"
]
After you have the names of the collections, you can find out which fields they are made up of with the command:
db.collectionName.find()
where collectionName is the actual name of the collection, such as airportCodeMapping.
Here is an example of the output of that command:
rs1:PRIMARY> db.airportCodeMapping.find()
{ "_id" : "BOM", "airportName" : "Mumbai" }
{ "_id" : "DEL", "airportName" : "Delhi" }
{ "_id" : "FRA", "airportName" : "Frankfurt" }
{ "_id" : "HKG", "airportName" : "Hong Kong" }
{ "_id" : "LHR", "airportName" : "London" }
{ "_id" : "YUL", "airportName" : "Montreal" }
{ "_id" : "SVO", "airportName" : "Moscow" }
{ "_id" : "JFK", "airportName" : "New York" }
{ "_id" : "CDG", "airportName" : "Paris" }
{ "_id" : "FCO", "airportName" : "Rome" }
{ "_id" : "SIN", "airportName" : "Singapore" }
{ "_id" : "SYD", "airportName" : "Sydney" }
{ "_id" : "IKA", "airportName" : "Tehran" }
{ "_id" : "NRT", "airportName" : "Tokyo" }
MongoDB requires that you choose the shard key field manually; in this case, we choose the airportName field.
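Choosing a shard key is largely a question of how the values of a field are distributed. As a rough aid, the helper below summarizes a candidate field from a sample of documents. It reports how many distinct values the field has, what share of the documents the most common value holds, and how many documents lack the field. This is a minimal sketch using only the standard library. The function name and the metrics are our illustration, not a MongoDB API. In practice, the sample might come from a query such as PyMongo's collection.find().limit(1000).

```python
from collections import Counter
from typing import Any, Dict, Iterable


def shard_key_profile(docs: Iterable[Dict[str, Any]], field: str) -> Dict[str, Any]:
    """Summarize how a candidate shard key field is distributed in sampled documents."""
    values = [doc.get(field) for doc in docs]
    if not values:
        raise ValueError("no documents to profile")
    counts = Counter(v for v in values if v is not None)
    top_count = counts.most_common(1)[0][1] if counts else 0
    return {
        "documents": len(values),
        "distinct": len(counts),                    # chunks can only split between different values
        "top_share": top_count / len(values),       # 1.0 means one value holds everything
        "missing": values.count(None),              # documents without the field
    }
```

In the airportCodeMapping output above, each airportName occurs exactly once, so the profile reports 14 distinct values and a top share of 1/14.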
After you have decided which field to use as the shard key for each collection, it is time to make it happen. The first step in the process is to connect to the mongos admin shell (see step 3-a-i of the setup steps).
Then issue the command use dataBaseName to switch focus to the database of your choice.
The next steps are repeated for each collection in your database. Remember, it is the collections that get sharded, and they are more or less the equivalent of tables in a traditional relational SQL DB. The sharding process essentially indexes a collection on one field and then splits that collection into ranges along that index. With two shards, you can picture this as a split at the halfway point of the index.
Now, the steps:
1. Before you index, check whether a suitable index already exists with the command db.collectionName.getIndexes()
2. Run the command db.collectionName.createIndex( { fieldName : 1 } ). This command indexes the collection collectionName on the field fieldName. The 1 after the field name specifies the ordering, in this case ascending order; -1 would signify descending order.
• Note: Manually indexing the field is only necessary if data is already stored in the collection. If you know the format of the data beforehand, you can skip this manual indexing. When MongoDB shards an empty collection, it automatically creates an index on the shard key.
3. After that, set the shard key with the command sh.shardCollection("database.collection", shardKey). For example:
sh.shardCollection("acmeair.airportCodeMapping", {"airportName": 1})
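Steps 1 to 3 can also be issued as database commands, which is convenient when scripting every collection at once. Below is a minimal Python sketch that only builds the command documents. The createIndexes and shardCollection commands are real MongoDB commands, but the helper and its name are ours. Each pair says which database to run the command against. With PyMongo, that would be client[db_name].command(document), sent through mongos.

```python
from typing import Dict, List, Tuple


def shard_collection_commands(database: str,
                              keys: Dict[str, str]) -> List[Tuple[str, dict]]:
    """(database to run against, command document) pairs, two per collection.

    `keys` maps collection name -> shard key field. The command name is the
    first key in each document, which MongoDB requires.
    """
    cmds: List[Tuple[str, dict]] = []
    for collection, field in keys.items():
        # Same effect as db.<collection>.createIndex({<field>: 1}); 1 = ascending.
        cmds.append((database, {
            "createIndexes": collection,
            "indexes": [{"key": {field: 1}, "name": f"{field}_1"}],
        }))
        # Same effect as sh.shardCollection("<database>.<collection>", {<field>: 1}).
        cmds.append(("admin", {
            "shardCollection": f"{database}.{collection}",
            "key": {field: 1},
        }))
    return cmds
```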
Backup Procedure and scripting
Now that we have a fully set-up MongoDB database, we can start a backup. MongoDB offers multiple storage engines, including MMAPv1 and WiredTiger. For our backup and restore procedure, we chose to use MMAPv1 because it was more stable. However, with our backup procedure, it makes no difference which storage engine you decide to use.

One of the main reasons we settled on an LVM snapshot to get a point-in-time backup of our system is that we otherwise cannot guarantee a perfect moment-in-time backup without completely shutting down each MongoDB backup instance. MongoDB has a method, db.fsyncLock(), which is supposed to stop writes to a database. It turns out, however, that fsyncLock() does not guarantee that WiredTiger actually stops writing. Since that creates a data integrity issue, we use the LVM snapshot instead. That actually makes the process simpler, and our only remaining tasks are to 1) stop the balancer and 2) get the IBM® Spectrum Protect™ agents to back up with some synchronization.
Part of what needs to be scripted is stopping the balancer. The balancer is the process that manages the redistribution of data amongst the shards in a MongoDB cluster, and it takes a cluster-wide lock while it works. For example, if you sharded a collection on the field Last Name, the split between 2 shards might be the letter ‘L’. If MongoDB notices there are significantly more entries in the ‘A-L’ shard than the ‘M-Z’ shard, it may change the split point from ‘L’ to ‘K’. In this case, all the data with a last name starting with ‘L’ would move from shard 1 to shard 2. It makes sense to turn the balancer off because you don’t want it to move data during the backup process. New data will still be placed on the correct shard based on its shard key, but the split point will not change while the balancer is off. That means MongoDB can still accept new writes, as well as serve reads, while the balancer is turned off.
The data can become unbalanced during this time if there is a large influx of data into one range of the shard key. You can reduce this by using more complex shard keys, like a hashed key, or by sharding on a different value altogether. Either way, MongoDB will rebalance itself once the balancer is re-enabled. Be careful to make sure that the balancer is both stopped and disabled, not just one or the other. You may also run into a stuck balancer. We found that the best way to deal with that is to find the mongos instance that is causing the trouble and bring it down gracefully. This is accomplished through the following procedure:
1. Connect to one of the mongos instances through the MongoDB shell
mongo ipaddr:27017
2. Now that you are in the MongoDB shell, switch to the config database, and then check the state of the cluster:
use config
sh.status()
3. Near the top of that data printout will be a field called balancer. The data looks like:
balancer:
Currently enabled: yes
Currently running: yes
Balancer lock taken at Thu Oct 08 2015 16:18:45 GMT-0400 (EDT) by
ltmngo10:27017:1444334119:1804289383:Balancer:846930886
Failed balancer rounds in last 5 attempts: 0
Migration Results for the last 24 hours:
No recent migrations
4. This tells you whether the balancer is enabled, whether it is running, and when it started. The balancer can be disabled and still be running, because disabling it does not stop a balancing round already in progress; it simply waits for that round to finish. To disable the balancer, run the command:
sh.setBalancerState(false)
5. Check again to see when the balancer stops running, and then begin the backup. If the balancer does not stop running, and the lock start time shown above is more than a couple of hours old, you may need to fix it manually.
6. First, try to get MongoDB to take care of it with the command:
sh.stopBalancer()
• This displays live status messages while it shuts down the balancer.
7. In our experience, if that doesn’t work, you will need to shut down the server that holds the lock. The status output above lists which server holds the lock. It should be a mongos server that can be brought down safely and then brought back up without much consequence. From the Bash shell, run:
ps aux | grep mongos
• This will give you the PID of the mongos instance you need to stop.
kill xxxxx
• This sends the default termination signal (SIGTERM) to the process with ID xxxxx.
• Bring it back online and give it some time to time out the lock before you try again (about 15-30 minutes, at most). The default timeout should be 900,000 milliseconds, or 15 minutes.
8. Once the balancer is stopped, the mongod data servers are ready to be backed up, but the config server is not.
• To get a clean config server backup, it makes the most sense to completely shut down one of the config servers. Bringing the service down guarantees that there will be no reads or writes to that configuration, so we get a true moment-in-time backup. There is also the benefit that the config servers won’t attempt any chunk migrations or allow any config changes while one is down, which ensures that the backup will work.
• This is done by stopping the running mongod service on the chosen config server with the command service mongod stop
• Now we are ready to start the IBM® Spectrum Protect™ backup.
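Steps 5 through 7 involve waiting for the balancer and, if that fails, finding the mongos process that holds the lock. A minimal Python sketch of both follows, under these assumptions:

- is_running is a callable you supply that reports whether a balancing round is in progress, for example a wrapper that runs sh.isBalancerRunning() through the mongo shell.
- The 900-second default mirrors the 900,000-millisecond lock timeout mentioned above.
- The parser expects the standard 11-column ps aux layout.

The sleep and clock parameters exist only so the loop can be tested without waiting.

```python
import os
import time
from typing import Callable, List


def wait_for_balancer_idle(is_running: Callable[[], bool], timeout_s: float = 900.0,
                           poll_s: float = 10.0,
                           sleep: Callable[[float], None] = time.sleep,
                           clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll is_running() until it returns False (True) or timeout_s passes (False)."""
    deadline = clock() + timeout_s
    while is_running():
        if clock() >= deadline:
            return False          # still running: move on to steps 6 and 7
        sleep(poll_s)
    return True


def mongos_pids(ps_output: str) -> List[int]:
    """PIDs of mongos processes in `ps aux` output (grep lines and mongod are skipped)."""
    pids: List[int] = []
    for line in ps_output.splitlines():
        fields = line.split(None, 10)          # USER PID ... TIME COMMAND (11 columns)
        if len(fields) < 11 or not fields[1].isdigit():
            continue                            # header or malformed line
        executable = fields[10].split()[0]
        if os.path.basename(executable) == "mongos":
            pids.append(int(fields[1]))
    return pids
```

On most Linux systems, pgrep -x mongos returns the same PIDs without the risk of matching the grep process itself.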
About the Backup process
IBM® Spectrum Protect™ has a built-in command, backup image /dev/vg/logical_volume. On Linux systems, this command backs up the specified logical volume by using Linux’s built-in Logical Volume Manager (LVM). Specifically, LVM creates a snapshot of the logical volume, and IBM® Spectrum Protect™ makes a backup of that snapshot and uploads it to the IBM® Spectrum Protect™ server.
An LVM snapshot is a copy-on-write clone of the logical volume, created in the free space of the volume group. In other words, LVM makes a virtual copy of the logical volume without copying its data. The snapshot initially holds only metadata that refers back to the blocks of the real logical volume, so a snapshot of a static dataset takes almost no storage space. When a block of the original volume is about to be overwritten, its original contents are first copied to the snapshot area, and only then is the new data written to the actual disk. This means you can keep the instance up and running, with new writes happening to the database, while you take a moment-in-time backup that starts the moment you run the command. The snapshot always contains the data exactly as it was on the logical volume when you started the snapshot.
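To make the copy-on-write idea concrete, here is a toy Python model of a block-level snapshot. It is an illustration only, not how LVM is implemented. Real LVM snapshots work on fixed-size chunks and keep their exception store on disk in the volume group's free space. Writes are intercepted by the device mapper rather than by a method call.

```python
from typing import Dict, List


class CowSnapshot:
    """Toy copy-on-write snapshot of a volume modeled as a list of blocks."""

    def __init__(self, origin: List[bytes]):
        self.origin = origin                  # the live logical volume
        self.saved: Dict[int, bytes] = {}     # original contents of changed blocks only

    def write_origin(self, block: int, data: bytes) -> None:
        """Write to the live volume, preserving the old block the first time it changes."""
        if block not in self.saved:
            self.saved[block] = self.origin[block]   # copy before overwrite
        self.origin[block] = data

    def read_snapshot(self, block: int) -> bytes:
        """Read the volume as it was when the snapshot was taken."""
        return self.saved.get(block, self.origin[block])

    def blocks_used(self) -> int:
        """Snapshot space consumed, in blocks: grows only as data changes."""
        return len(self.saved)
```

Writing the same block twice stores its original contents only once. This is why snapshot space depends on how much data changes during the backup, not on the size of the volume.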
It is important to note that the journal files automatically created by MongoDB are also part of this snapshot. The journal files are MongoDB’s method of tracking recent changes before they are written to the data files. This is especially important in a backup process, because MongoDB holds some data in memory before it flushes it to disk in a batch. By also backing up the journal, we have a record of the write operations that have occurred on the database but have not yet been completely written to the data files. The snapshot captures the state of both the database and the journal at the exact same moment, which ensures that no data is lost in the process. When the restored instance starts, MongoDB reads the journal and applies all pending or incomplete updates.
There are, of course, some storage considerations to keep in mind, which may vary from instance to instance. Any original data that is overwritten during the live backup must first be copied into the snapshot. This snapshot space is taken from the free space of the volume group that the logical volume belongs to. You therefore need enough free space in the volume group to cover all changes that may occur during the backup. In the worst case, you would need as much free space in the volume group as the logical volume occupies. Most often, however, far less is needed; the exact amount depends on what percentage of your database is overwritten while the backup runs. In addition, that space can be added to the volume group for the backup and then removed afterward.
The extra disk space needed is offset by the ability to do live backups. We don’t have to stop any of our database servers for a backup, so there should be no significant performance impact while the backup is in progress. The primary servers in the shard will still operate at full speed. The only effect is slower reads for any queries being processed by the backup server. This can be eliminated as well with special replica member modes, such as ‘hidden’, which prevent reads from happening on the backup server.
It’s also worth noting that this is the backup method that MongoDB currently recommends and that many cloud backup companies currently use. MongoDB provides a few backup and dump tools, but by MongoDB’s own admission, they are too slow to scale past a very small database of a couple of gigabytes. IBM® Spectrum Protect™ provides a much cleaner way of implementing the LVM snapshot backup, and it handles the storage and safekeeping of the files on the back-end server.
The actual IBM® Spectrum Protect™ commands are very simple. On all the backup servers, at the same time, issue the command:
dsmc backup image /dev/volumeGroup/data
This is the basic command that initiates the backup from the command line on the client servers, but there are a
few parts and options you should know.
First, dsmc is the IBM® Spectrum Protect™ command-line client program. All of the IBM® Spectrum Protect™ commands that we run from the shell begin with dsmc.
The next two words, backup image, tell IBM® Spectrum Protect™ what to do (backup) and what technique to use (image).
The last part of this command is the logical volume name that you would like to back up. It is important to note
that you must use the device path (/dev/volume_group/volume) and not just use the name of the mount point
(like /data). Using the file system mount point (e.g. /data) will be interpreted differently by IBM® Spectrum
Protect™, and it will try to do a file system backup instead of an LVM logical volume snapshot.
There are also a couple of other options that are useful and worth knowing.
The first one is snapshotcachesize. This option sets the percentage of the total size of the logical volume that IBM® Spectrum Protect™ will reserve from the volume group for the snapshot. So if your /data logical volume is 100GB, passing 10 to this option will cause LVM to reserve 10GB from the volume group for the snapshot. If more than 10GB of the original 100GB changes during the backup, the IBM® Spectrum Protect™ agent will return an error and the backup will fail. For a mission-critical backup that must not fail, you will want free, unallocated space in the volume group equal to 100% of the logical volume's size (in this example: 100GB).
dsmc backup image -snapshotcachesize=10 /dev/volumeGroup/data
Remember that the default is 100%, so if you do not use this flag, the volume group free space will have to be greater than or equal to the actual size of the logical volume that you are using. In our testing, we used a value of about 25 and never came close to running out of space. This should be monitored and set on a case-by-case basis. You will also notice that the flag is joined to the value by an equal sign, which is different from the usual Linux norm.
This is what the command would look like designating a 55% cache size:
dsmc backup image -snapshotcachesize=55 /dev/volumeGroup/data
One last important command-line option is compression. This option enables compression of the backup before it is uploaded to the IBM® Spectrum Protect™ server. We will get into the details of compression shortly, but for now, note that the command is as follows:
dsmc backup image -compression=yes /dev/volumeGroup/data
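The sizing rule and the options above can be wrapped in a small helper that computes a snapshotcachesize value and assembles the command. This is a minimal Python sketch. The safety factor of 2 on the expected change is our assumption, and so is the clamp to 1 to 100, which assumes that is the option's valid range. The function only builds the argument list rather than running dsmc.

```python
import math
from typing import List, Optional


def snapshot_cache_percent(lv_size_gb: float, expected_change_gb: float,
                           safety_factor: float = 2.0) -> int:
    """Value for -snapshotcachesize: expected changes times a safety factor, as a % of the LV."""
    if lv_size_gb <= 0:
        raise ValueError("logical volume size must be positive")
    pct = math.ceil(100 * expected_change_gb * safety_factor / lv_size_gb)
    return max(1, min(100, pct))            # clamp to 1..100 percent (assumed valid range)


def dsmc_backup_image(device: str, cache_percent: Optional[int] = None,
                      compress: bool = False) -> List[str]:
    """argv for `dsmc backup image`, with options placed before the device path."""
    if not device.startswith("/dev/"):
        # A mount point such as /data would be treated as a file system backup.
        raise ValueError("use the device path, e.g. /dev/volumeGroup/data")
    argv = ["dsmc", "backup", "image"]
    if cache_percent is not None:
        argv.append(f"-snapshotcachesize={cache_percent}")   # '=' joins flag and value
    if compress:
        argv.append("-compression=yes")
    argv.append(device)
    return argv
```

Consider a 100GB logical volume expected to see about 12.5GB of changes during the backup. The helper yields 25, in line with the value we used in testing. The resulting list could then be passed to subprocess.run on each backup server.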
That is all there is to backing up our MongoDB cluster. With a properly designed LVM layout and a well-planned configuration, IBM® Spectrum Protect™ can take care of the backup in one command.
You should be aware that this is a point-in-time backup: it is meant to capture a perfect copy of the database at the instant the backup process is started. Even though the MongoDB data servers are live and taking reads and writes, the data that is backed up is the data that was on disk the moment the backup started. Any data that is added or removed between the start and finish of the backup process will be reflected in the current state of the server but not in the backup.
Executing the commands
There are a few different ways to execute the commands that perform the backup process: a few commands issued to MongoDB, plus one IBM® Spectrum Protect™ command. The exact method for scripting this will come down to the scripting preferences and skill set of the database administrator. One very important factor in this scripting is that all of the IBM® Spectrum Protect™ backup processes must start at nearly the same time. For the point-in-time snapshot to work, no data should reach any of the data servers unless the config server will also record it. Such data would be orphaned in the database, with no reference to it in the config server, and would therefore be useless. To avoid that as much as possible, there needs to be reliable synchronization. There are a few methods that we believe will satisfy this need:
• Cron and shell scripts - Scripting this out in a shell script and then running it from synchronized cron jobs on all the backup machines is the classic way to handle Linux administration. You may need to monitor the status of the IBM® Spectrum Protect™ service to make sure that the backup happens at the same time on all the machines and that it completes without errors.
• Issuing the MongoDB commands - Issuing the shutdown and startup commands to the running database can be a little tricky, but there are quite a few ways to do it. You can write the shutdown commands in JavaScript® (the MongoDB shell’s native language) and pass that JavaScript® file to the MongoDB shell through bash. If Python is your preferred scripting language, there is also a Python driver called pymongo, which can do just about anything you need to manage MongoDB. There are also drivers for just about every other language.
• Automation Software - This includes software like Chef, Puppet, Ansible and Salt Stack. Many companies already use these tools to manage their infrastructure, and there are many benefits to using them for MongoDB. MongoDB scales by adding new machines, shards and replicas, so the cluster can get complex quickly, and you don’t want to manage that by hand. Specifically for backups, these tools give you a single point from which to manage the backup, which lets you synchronize all the servers through the automation software. We chose to use Ansible in this paper for building and backing up our MongoDB clusters. Not only was it flexible, but we were also able to go from first-time use to a working MongoDB script in a few days.
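Whichever tool you choose, the core requirement is that the backups start together. The sketch below shows one way to do that from a single controller in Python. A threading.Barrier holds every worker thread until all of them are ready, then releases them at once. The run callable is a placeholder for whatever actually starts the backup on a host, for example subprocess.run(["ssh", host, "dsmc", "backup", "image", "/dev/volumeGroup/data"]).returncode. Network and SSH latency will still add some skew.

```python
import threading
from typing import Callable, Dict, List


def start_together(hosts: List[str], run: Callable[[str], int]) -> Dict[str, int]:
    """Call run(host) for every host in parallel, releasing all calls at the same moment.

    Returns host -> exit status; -1 marks a call that raised an exception.
    """
    if not hosts:
        return {}
    barrier = threading.Barrier(len(hosts))
    results: Dict[str, int] = {}
    lock = threading.Lock()

    def worker(host: str) -> None:
        barrier.wait()                 # block until every worker thread has arrived
        try:
            status = run(host)
        except Exception:              # treat a crash as a failed backup
            status = -1
        with lock:
            results[host] = status

    threads = [threading.Thread(target=worker, args=(h,)) for h in hosts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```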
Restoring the database
Once again, when restoring the database, the IBM® Spectrum Protect™ command is very simple, but it’s
everything else that takes some setup.
1. Assumptions
You are trying to recover from a point-in-time snapshot, either on fresh servers or on servers that previously ran MongoDB whose data is corrupted or lost. Either way, the process is nearly identical.
2. Important Considerations
There are a few design features of MongoDB that will affect how we restore the backups. To review what was already covered:
• There are always three config servers in a production cluster that are essentially identical.
• There are multiple shards that each hold a small slice of the whole MongoDB database.
• Each shard is replicated, which means there are multiple MongoDB instances that hold identical data.
• mongos (router) instances don’t persist any data, and only need a configuration file pointing to the MongoDB config servers to run. The config servers hold the sharding rules and the configuration for most of the cluster. The config servers also hold all the metadata that records where the actual data is stored.
• The mongod instances themselves store the information about their replica set's primary and secondary members.
• MongoDB servers are identified by hostname. So, you need 3 things to fully restore a system: the config file, the data, and the hostname. These facts play a decisive role in deciding how to back up and restore.
There are a few important conclusions we draw from these observations.
First, if a server is identical to another server, you don’t need to back up each server and restore them individually. Instead, take a backup of one of the identical servers. When you restore it, propagate it to all the identical servers in the cluster.
Secondly, since all the configuration files are part of the backup, a well-planned and well-executed restore would bring back the old configurations and require no extra configuration by the end user. In the end, you just power the restored machines on, and everything is just as it was when it was backed up. This makes restorations much quicker and simpler, especially at scale. This can be done by keeping the configuration file in the logical volume with the backup data, or by dynamically creating the configuration files with one of the automation platforms mentioned above.
Finally, because servers are recognized by hostname, it is easy to replace hardware and restore to totally different machines. If you need to swap out some or all of the hardware when doing a restore, it’s as simple as changing the hostnames on the machines to match what was in the original configuration. This way, your entire configuration is intact, and MongoDB can continue to communicate across the cluster over the network.
3. Configuring the Server
You need to get your server into the same, or a very similar, state as it was in before MongoDB crashed. This is why we highly suggest using automation software to create your infrastructure as code, which makes it easy to deploy a structurally identical server. In any case, you need to make sure certain steps are done before running any IBM® Spectrum Protect™ restore features. These are nearly identical to the steps originally taken to set up the server, and they must be taken on all servers that will be part of the restored cluster.
• Make sure that MongoDB is installed and the identical config file is in the right place. This includes making sure all of the directories needed for logging and PID files exist. These files should be owned by the user who will run the mongod instance. If not, use chown to make sure the mongod process can access all of those files.
• Make sure you create a logical volume with at least as much space as the original logical volume that was
backed up (can be determined by the size of the backup file).
lvcreate -L 10G -n data rhel7_system
• There may be a prompt asking whether you want to wipe an existing filesystem signature; respond with yes. The underlying filesystem that we need is actually on the image we will restore.
• IBM® Spectrum Protect™ requires an allocated logical volume to restore the backup to. It will not restore to free volume group space; it needs the shell of the logical volume.
• You cannot restore to a logical volume smaller than the backup image. If you try, the restore will fail. If the logical volume is larger, the restore will still work, but only if you refrain from building a filesystem on the newly created logical volume.
• Suppose you create a filesystem, specifically an XFS filesystem, on the newly created shell logical volume and then try to restore a smaller backup image. The IBM® Spectrum Protect™ agent will try to shrink the filesystem to make the two match. The problem is that XFS does not support shrinking a filesystem, so the restore fails. For this type of restoration from a full LVM snapshot, you do not need to build a filesystem on the logical volume being restored to. You should avoid doing so because of the errors we encountered.
• Make sure that the mount point for attaching the logical volume exists once it is restored. If not, create it with the command mkdir /mountPoint (e.g. mkdir /data)
• Make sure that the IBM® Spectrum Protect™ agent is installed on the machine, in a directory different from the one you will be restoring MongoDB onto (e.g. if IBM® Spectrum Protect™ is installed in /opt, you should not be trying to restore the /opt logical volume)
• If you are using different hardware, make sure you change the hostname to match the hostname in the config file, and that this host belongs to the replica set of the image you are about to restore.
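These checks lend themselves to a preflight script run before dsmc restore image. Below is a minimal Python sketch under some assumptions. The caller supplies the logical volume size, for example from blockdev --getsize64 /dev/rhel7_system/data, and the backup image size. The caller also says whether the volume already carries a filesystem, for example from blkid. The function name and messages are ours.

```python
import os
import socket
from typing import List


def restore_preflight(lv_size_bytes: int, image_size_bytes: int, has_filesystem: bool,
                      mount_point: str, expected_hostname: str,
                      hostname: str = "") -> List[str]:
    """Return the problems found before a restore; an empty list means go ahead."""
    problems: List[str] = []
    if lv_size_bytes < image_size_bytes:
        problems.append("logical volume is smaller than the backup image")
    elif lv_size_bytes > image_size_bytes and has_filesystem:
        # A larger volume only works without a filesystem (XFS cannot shrink).
        problems.append("larger logical volume has a filesystem; recreate it without one")
    if not os.path.isdir(mount_point):
        problems.append(f"mount point {mount_point} is missing; create it with mkdir")
    actual = hostname or socket.gethostname()   # default: this machine's hostname
    if actual != expected_hostname:
        problems.append(f"hostname {actual!r} does not match {expected_hostname!r}")
    return problems
```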
4. Running the restoration command
• If you are restoring to the same machine that the backup was originally taken from, use the command:
dsmc restore image /dev/rhel7_system/data
• If you wish to restore the IBM® Spectrum Protect™ image to a different machine, you will need to
authorize this new machine to access the backup images of the original. To do that, you do the
following:
• Access the server where the backup originated. Enter the IBM® Spectrum Protect™ shell by entering
the following terminal command: dsmc
• Your prompt should now be tsm>
• From here, you can list which nodes have access with the q access command, or view all the backed-up
files with the q files command. What you want to do is run the command set access backup "*"
nodeName, which grants the node named nodeName access to the backup files of the current node.
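• Now that the target restoration server can access the backup, restore the image from the original node
with the following command (the second device path is the destination volume on the target server):
dsmc restore image -fromnode=LTMNGO09 /dev/rhel7_system/data /dev/rhel7_system/data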
• Now that the LVM is restored, mount it on the file system. Run:
mount /dev/mapper/<volumeGroup>-<name> <mountPoint>
(e.g. ‘mount /dev/mapper/rhel7_system-data /data’)
• Once you have done that, you can start the MongoDB instance on that server and try to connect to the
local MongoDB shell with mongo --port <port>. Once you have verified that the system is up and
running in a proper configuration, stop the MongoDB instance.
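The restore sequence above can be captured as a short Python sketch that only builds the command lines and does not run them. The helper names are hypothetical. The dsmc, mount, and mongo invocations are the ones shown in the steps above. The /dev/mapper name follows the standard device-mapper convention of joining the volume group and logical volume names with a hyphen, doubling any hyphen inside either name. Starting mongod itself is left to however your instances are normally launched.

```python
from __future__ import annotations

import shlex
from typing import Optional


def mapper_device(lv_path: str) -> str:
    """Map /dev/<vg>/<lv> to its device-mapper path /dev/mapper/<vg>-<lv>.

    Device-mapper doubles any hyphen that appears inside the VG or LV name.
    """
    vg, lv = lv_path.rstrip("/").split("/")[-2:]
    return f"/dev/mapper/{vg.replace('-', '--')}-{lv.replace('-', '--')}"


def grant_access_cmd(target_node: str) -> list[str]:
    """Run on the ORIGINAL server so target_node may read this node's backups."""
    return ["dsmc", "set", "access", "backup", "*", target_node]


def restore_cmds(lv_path: str, mount_point: str,
                 from_node: Optional[str] = None, port: int = 27017) -> list[list[str]]:
    """Build (but do not run) the restore, mount, and shell-check commands."""
    restore = ["dsmc", "restore", "image"]
    if from_node:
        # Restoring another node's image: source volume, then destination volume.
        restore += [f"-fromnode={from_node}", lv_path, lv_path]
    else:
        # Restoring on the same machine the backup was taken from.
        restore.append(lv_path)
    return [
        restore,
        ["mount", mapper_device(lv_path), mount_point],
        # Start mongod your usual way first, then confirm the local shell connects.
        ["mongo", "--port", str(port)],
    ]


if __name__ == "__main__":
    print(shlex.join(grant_access_cmd("LTMNGO04")))  # hypothetical target node
    for cmd in restore_cmds("/dev/rhel7_system/data", "/data", from_node="LTMNGO09"):
        print(shlex.join(cmd))
```

Running the script prints the commands for the remote-restore case so they can be reviewed before they are pasted into a root shell. shlex.join quotes the * wildcard as '*', which is what a shell user needs to type.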
This is where the restoration process can differ slightly. Because our replica servers are identical, we
took a backup image of only one of the replicas. When restoring, we need to restore that one image to
all of the replicas. The two simplest ways of doing this are to either:
1. Restore to one server using IBM® Spectrum Protect™ and use a copy command to move the files
from the restored server to all its replicas
Or,
2. Install the IBM® Spectrum Protect™ agents on all the machines and simultaneously restore the
image from the IBM® Spectrum Protect™ server to all the replicated servers. This requires
granting every other node in the replica set access to the backup, as described in the previous step.
The advantage of copying files after the restore is that fewer IBM® Spectrum Protect™ agents are
needed. Since we only need backups from one machine, most of the agents would otherwise be installed
strictly to do restores and never any backups. When files are copied from the restored backup server to
the other replica set members, the IBM® Spectrum Protect™ server does less work, and the agents only
need to be installed on the backup servers. Depending on the IBM® Spectrum Protect™ pricing package,
this can be a factor, especially at scale. The drawback is that you have to script a way to copy and
distribute the files from the restored server to the others, so a little more customization is required.
Using the IBM® Spectrum Protect™ restore function for all your servers can be a little simpler.
However, there is more overhead in having an IBM® Spectrum Protect™ agent on every machine in the
cluster. There is also added strain on the IBM® Spectrum Protect™ server, because every node in the
MongoDB cluster restores the LVM image directly from it. For this paper, we used both methods and
found both satisfactory.
Method 1 - Copying the files. Restore to the server with the IBM® Spectrum Protect™ agent as
explained above. Then go into the data directory of the restored volume and copy the data over the
network to the other servers:
scp -r /data/mongo/….. hostname1:/
(e.g. ‘scp -r /data/mongo root@ltmgo04:/data’)
Then run chown on the files when they arrive at each server so that they are owned by the user that
runs MongoDB (by default, mongod). You could also use rsync or another file copying/synchronization
method to move the files between the servers.
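Method 1 can be sketched as a small Python planner that emits one scp and one remote chown per replica. It only builds the commands. The function name, the root login, and the mongod:mongod owner and group are assumptions for illustration. Substitute rsync -a for scp if you prefer.

```python
from __future__ import annotations

import posixpath


def copy_plan(src_dir: str, dest_parent: str, replicas: list[str],
              login: str = "root", owner: str = "mongod",
              group: str = "mongod") -> list[list[str]]:
    """Build scp + remote chown commands to push a restored data dir to replicas."""
    # scp -r SRC host:PARENT creates PARENT/<basename of SRC> on the replica.
    copied = posixpath.join(dest_parent, posixpath.basename(src_dir.rstrip("/")))
    plan = []
    for host in replicas:
        target = f"{login}@{host}"
        plan.append(["scp", "-r", src_dir, f"{target}:{dest_parent}"])
        # Hand ownership of the copied tree to the account that runs mongod.
        plan.append(["ssh", target, "chown", "-R", f"{owner}:{group}", copied])
    return plan
```

For example, copy_plan("/data/mongo", "/data", ["ltmgo04"]) yields the scp command from the example above followed by ssh root@ltmgo04 chown -R mongod:mongod /data/mongo.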
Method 2 - Using IBM® Spectrum Protect™ on all servers. For this method, install IBM® Spectrum
Protect™ on all the servers and run the set access commands given above. They must be run on every
backup machine for every member of its own replica set, so that each member can access that IBM®
Spectrum Protect™ image. Once that is done, use the remote restore method, also given above, to restore
the file system, following all the restore steps above. Then chown the data directory to ensure it is
owned by the user that runs MongoDB.
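For Method 2, the per-replica-set work can be sketched in Python as a plan: the grants to run on the backup node, then restore, mount, and chown on every member. This is a sketch under assumptions. The function name and node names such as LTMNGO10 are hypothetical, the Spectrum Protect node names are assumed to be what -fromnode expects, and the mongod:mongod owner and group are assumed defaults.

```python
from __future__ import annotations


def method2_plan(backup_node: str, members: list[str], lv_path: str,
                 mapper_dev: str, mount_point: str, data_dir: str,
                 owner: str = "mongod", group: str = "mongod"):
    """Return (grants, per_member_cmds) for restoring one image to a replica set.

    grants: dsmc commands to run on backup_node, one per other member.
    per_member_cmds: {node: [restore, mount, chown]} to run on each member.
    """
    others = [m for m in members if m != backup_node]
    grants = [["dsmc", "set", "access", "backup", "*", m] for m in others]
    per_member = {}
    for m in members:
        restore = ["dsmc", "restore", "image"]
        if m != backup_node:
            # Remote restore of the backup node's image onto the same volume path.
            restore += [f"-fromnode={backup_node}", lv_path, lv_path]
        else:
            restore.append(lv_path)
        per_member[m] = [
            restore,
            ["mount", mapper_dev, mount_point],
            # Ensure the data directory belongs to the MongoDB service account.
            ["chown", "-R", f"{owner}:{group}", data_dir],
        ]
    return grants, per_member
```

Keeping the grants separate from the per-member commands mirrors the order of the steps: access must be granted on the backup node before any other member can run its remote restore.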
This restoration process is the same for all MongoDB data servers and the MongoDB configuration
servers. The only restoration the mongos servers need is to put the configuration file on the server they
will run from.
Now that we know how to get all the individual servers back up, we need to make sure we bring the
cluster up in the right sequence.
At this point in the restore, each MongoDB instance should have all its data and should have been
individually tested to make sure it comes up. The MongoDB service should then have been shut off
again, leaving all the MongoDB services down.
From here on, we will start the MongoDB services and keep them running, barring any errors. This
launch sequence for restoring the MongoDB instances is nearly the same as the original startup.
First, start each member of a single replica set. Make sure that there is a primary, that the other
members are secondaries, and that all members can communicate with each other and recognize
themselves as part of the same replica set. Repeat this for each replica set, leaving each one running as
you move on to the next.
Second, once all the replica sets are active, move on to the config servers. These must all be running
before any mongos instance is started. Once all three are up and running, start bringing up the mongos
instances. Never change the hostnames of the config servers. Even if you restored to new config server
machines, change their hostnames to match those in the original mongos configuration.
Start by bringing up one mongos instance and checking its log to make sure it can connect to all the
config server instances. Once it can, connect to the MongoDB shell through that mongos instance.
From there, you can run shard tests and inspect the data to make sure everything is there. If everything
is working, you have successfully restored a MongoDB backup and are running an active cluster. Bring
up as many additional mongos instances as you need and begin using your freshly restored cluster.
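The launch order can be expressed as a small Python sketch. The Cluster type and host names are hypothetical. The function encodes only the ordering described above (every replica set, then all config servers, then mongos) and does not start any process.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cluster:
    replica_sets: dict[str, list[str]]  # replica set name -> member hosts
    config_servers: list[str]
    mongos: list[str]


def startup_order(c: Cluster) -> list[tuple[str, str]]:
    """Return (role, host) pairs in the order they should be started."""
    order: list[tuple[str, str]] = []
    # 1. Each replica set in turn, leaving it running before the next one.
    for rs_name, members in c.replica_sets.items():
        order += [(f"shard member ({rs_name})", h) for h in members]
    # 2. All config servers, before any mongos.
    order += [("config server", h) for h in c.config_servers]
    # 3. mongos last: start the first one, check its log, then the rest.
    order += [("mongos", h) for h in c.mongos]
    return order
```

Because Python dictionaries preserve insertion order, replica sets come up in the order they are listed.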
5. Where to put the data
For several reasons related to the backup procedure, we determined that the data portion of MongoDB should
be kept in a separate volume group and mounted at the /data directory. It is possible to place it in an existing
volume group instead, but it must be its own logical volume. That way no space is wasted backing up
unrelated data, and the volume can be mounted easily at a known location.
6. Testing
For our testing process, we needed to make sure that our backup and restore process worked efficiently
while maintaining data integrity. We also wanted to collect performance metrics to make sure that our
backup servers, as well as the whole MongoDB cluster, continued to operate efficiently.
To do this, we ran two different test drivers against the database. The first was the AcmeAir driver
introduced earlier in the paper. It was chosen because it is a robust test driver for MongoDB that uses multiple
collections, and it has JMeter drivers to vary the amount of traffic generated per second on the database.
That allowed us to track overall cluster performance before, during, and after the backup process.
The second was a custom Python script written specifically to test the data integrity of the database. It
wrote entries from a predefined JSON data set and inserted markers into each entry written to MongoDB:
the order in which the entries were written and the exact timestamp at which each was added. This let us
compare the data in the database after the restore against the numbering and timestamps recorded by the
Python script.
We verified the integrity of the data by comparing the recorded start time of the backup process with the
timestamp of the last entry in the restored MongoDB cluster. If those times matched, we knew the
point-in-time backup had succeeded. In addition, we checked the numbering sequence for gaps, such as
having entries 200 and 202 but not entry 201. Using these techniques, we were able to drive various
workloads against the servers while verifying the integrity of the data.
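The paper does not reproduce its integrity script, so the following is only a minimal Python sketch of the two checks just described. It makes several assumptions: each restored document is a dict with a hypothetical seq field (the write order) and ts field (a datetime), the documents have already been read back from MongoDB by some client, and "matched" means within a tolerance you choose (one second by default here).

```python
from __future__ import annotations

from datetime import datetime, timedelta


def find_gaps(seqs: list[int], first: int = 1) -> list[int]:
    """Return sequence numbers between first and max(seqs) that are missing."""
    present = set(seqs)
    if not present:
        return []
    return [n for n in range(first, max(present) + 1) if n not in present]


def verify_restore(entries: list[dict], backup_start: datetime,
                   tolerance: timedelta = timedelta(seconds=1)) -> dict:
    """Check restored documents for sequence gaps and point-in-time consistency.

    entries: non-empty list of restored documents, each with a hypothetical
    'seq' (write order) and 'ts' (datetime of the insert) field.
    """
    missing = find_gaps([e["seq"] for e in entries])
    last_ts = max(e["ts"] for e in entries)
    return {
        "missing": missing,
        "last_ts": last_ts,
        # Point-in-time success: the newest restored write lines up with the
        # moment the backup started, within the chosen tolerance.
        "matches_backup_start": abs(backup_start - last_ts) <= tolerance,
    }
```

With the example from the text, find_gaps([200, 202], first=200) reports [201] as missing.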
We start with a baseline test: a backup with compression enabled while no traffic is driven to the
MongoDB environment. The graph shows CPU idle percentages. It is followed by the standard output of the
IBM® Spectrum Protect™ backup agent after a backup is invoked, which reports the elapsed time as well as
the compression percentage and network transfer speed.
Figure 2: CPU usage example #1: Backup with compression
Compressed backup starts at 12:15:01 and ends at 12:18:45
Figure 3: CPU usage example #2: Backup without compression
For a direct comparison, this is the same backup procedure on the exact same MongoDB instances with
compression disabled. Non-compressed backup starts at 12:30:01 and ends at 12:33:01
Figure 4: Workload baseline, with 2 drivers pulling from 2 separate shared databases, no backups
Figure 5: During the backup with active workloads and compression
Compressed backup starts at 11:30:01 and ends at 11:33:51
Figure 6: During the backup with active workloads and NO compression
Non-compressed backup starts at 11:45:04 and ends at 11:48:11
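The start and end times in the figure captions can be converted to elapsed times with a few lines of Python. One assumption is made: the non-compressed baseline run (Figure 3) is taken to end at 12:33:01, three minutes after its 12:30:01 start.

```python
from datetime import datetime

# Start/end times from the figure captions (Figure 3 end time assumed 12:33:01).
RUNS = {
    "Fig. 2: no workload, compressed": ("12:15:01", "12:18:45"),
    "Fig. 3: no workload, not compressed": ("12:30:01", "12:33:01"),
    "Fig. 5: active workload, compressed": ("11:30:01", "11:33:51"),
    "Fig. 6: active workload, not compressed": ("11:45:04", "11:48:11"),
}


def elapsed_seconds(start: str, end: str, fmt: str = "%H:%M:%S") -> int:
    """Seconds between two same-day wall-clock times."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


if __name__ == "__main__":
    for label, (start, end) in RUNS.items():
        secs = elapsed_seconds(start, end)
        print(f"{label}: {secs // 60}m {secs % 60:02d}s")
```

For these runs, the compressed backups took 43 to 44 seconds longer than the corresponding non-compressed ones (3m 44s vs. 3m 00s without workload, 3m 50s vs. 3m 07s with workload).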
9. Performance considerations
Considerations to be aware of during the backup process:
Time - During our testing, backing up a consistent amount of MongoDB Server TSM image backup data per server
took between 2.5 and 3.5 minutes, depending on network speed. These backups run concurrently across all shards,
but at a much larger scale the backup window could become an issue. Adding more shards reduces the size of each
individual backup. In addition, time is not a major factor, because we observed no degradation in MongoDB
cluster performance during the backup.
CPU Performance - Additional CPU utilization can be observed during the process. An uncompressed backup saw
%idle drop by 10 to 20 points during the 2.5 to 3.5 minutes it ran. The most extreme case was with compression
turned on: we saw some machines go to 0% idle for short periods (30 seconds) during a compressed backup, which
is certainly not ideal. These machines are live while the backup runs and may be called upon to do lookups, and
such high CPU utilization can cause a bottleneck in MongoDB. However, we noticed no discernible drop in
MongoDB cluster performance during the compressed backup. Compression trades CPU utilization for reduced
network traffic and storage, and which method is best should be decided case by case. It is important to remember
that MongoDB with the MMAPv1 storage engine is not typically bound by CPU utilization; the stressor is more
likely to be memory and disk I/O. Because in our test cases the backup server was a secondary replica that handles
only occasional reads and no direct writes, the CPU cycles used by the backup process should have negligible
impact. By design, the backup runs on a backup server that is under less load than the primary, so it can complete
the backup quickly while remaining available to handle requests during a usage spike and staying current with the
rest of the database, without needing to catch up after the backup.
MongoDB Performance - Our backups leave all the servers, including the backup servers, active and able to
perform normal tasks. This should leave MongoDB performance unchanged except in the most stressful scenarios.
The biggest performance tradeoff during the backup process is disabling one of the config servers, which means
the mongos servers must retrieve all the metadata from 2 rather than 3 servers. However, MongoDB version 3.0
does not scale past 3 config servers, so there is no real performance gain in having more config servers; they are
more of a redundancy measure. Once the config server comes back online, it goes into a sync mode to catch up on
any metadata changes that occurred while it was offline. This normally completes in seconds and should cause no
issues.
Compression vs. non-compression backups
For the System p AIX and System z IBM® Spectrum Protect™ servers, a series of compressed and non-compressed
image backups was performed across the 2 TSM 6.3.5.0 servers, the 2 IBM® Spectrum Protect™ 7.1.3.0 servers,
and 3 levels of TSM 7.1 clients. The compressed image backups consistently required a moderate amount of
additional time to complete; the non-compressed image backups were faster in every TSM/Spectrum Protect server
and client combination.
10. Summary
MongoDB continues to grow in popularity and will find widespread use in Linux on z environments. IBM®
Spectrum Protect™, formerly Tivoli® Storage Manager, is a mature, proven, and strategic world-class data
protection platform that helps organizations of all sizes meet their data protection requirements.
This paper documents methods to back up MongoDB data with IBM® Spectrum Protect™. The IBM®
Spectrum Protect™ backup function is a good option for data protection if the customer already has IBM®
Spectrum Protect™ established in their data center.
The backup procedure reviewed here recommends using IBM® Spectrum Protect™ to create Linux file system
snapshots against a MongoDB replica set member dedicated as a backup server. The minimum requirements to
set up the backup environment are defined above, along with the steps to back up the data and journal files.
The backup can be done concurrently with application writes, with no application downtime.
Elapsed time and CPU utilization measurements were captured during testing using a consistent amount of
MongoDB Server TSM image backup data, consisting of MongoDB data and journal files. Measurements were
taken with compression disabled and enabled on the same amount of source data. CPU usage was very high
during the compression tests, and each customer will have to determine whether this trade-off is justified in their
own environment. Measurements were also taken with and without workload and with one and two data shards.
IBM® Spectrum Protect™ restore of a MongoDB database was also tested and documented. The steps to
prepare the target file systems are included, along with the IBM® Spectrum Protect™ commands to initiate the
restore. The recommended restore option is to 1) recover to one replica set member using IBM® Spectrum
Protect™, and 2) use Linux commands to copy the data to the other members of the replica set. Another option
is to 1) install IBM® Spectrum Protect™ agents on all servers, and 2) use IBM® Spectrum Protect™ to restore to
all of them. This paper therefore provides multiple options, showing how flexible this solution can be and
allowing customers to determine what will work best for their environments.
Copyright IBM Corporation 2016
IBM Systems
Route 100 Somers, New York 10589
U.S.A.
Produced in the United States of America,
01/2016
All Rights Reserved
IBM, IBM logo, ECKD, HiperSockets, z Systems, EC12, z13, System p, Spectrum Protect, Tivoli, and WebSphere
are trademarks or registered trademarks of the International Business Machines Corporation.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United
States, other countries, or both.
Java and all Java based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
All statements regarding IBM’s future direction and intent are subject to change or withdrawal without notice, and
represent goals and objectives only.
Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM
benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending
upon considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the
storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will
achieve throughput improvements equivalent to the performance ratios stated here.