How to Back Up Selected MySQL Databases

I recently had to back up 30 very large databases within a MySQL instance that had a total of 60+ databases. I did NOT want to back up the other databases. In fact, the other databases had very large tables in them and I had very little disk space, so I could not afford to back them up. In this post I will share what I learned.

I’ve simplified the problem to illustrate the idea. My goal in this post is to back up one database when an instance has many, with mixed InnoDB and MyISAM tables.

  • I have 3 databases called dbtest1, dbtest2, and dbtest3, in one instance of MySQL. I only want to back up the tables in database dbtest2.


  • Each database has one InnoDB and one MyISAM table
  • The InnoDB tables are quite large, and I have limited disk space for the backup
  • I am using the file-per-table option for all InnoDB tables

The --databases Option

First I tried the --databases=dbname option in the mysqlbackup command line. This option filters all non-InnoDB files, including FRM, MYD, MYI, etc.

mysqlbackup -u username -p --databases="dbtest2" --backup_dir=yourdirectory backup

This translates to:

  • Back up all the non-InnoDB files (including frm files) in the database I’ve specified (dbtest2)
  • Do NOT back up the non-InnoDB files in any other databases
  • Back up all IBD files in the instance

A full backup of dbtest2 was saved (as outlined above), but the IBD files in all other databases in the instance were also backed up. So this is not the exact option I needed.

The --include Option

Next I added the --include option to filter IBD files in other databases. It’s important to think of the --include option as filtering IBD files (not InnoDB tables, which in my mind include both IBD and FRM files).

mysqlbackup -u username -p --databases="dbtest2" --include="dbtest2" --backup_dir=yourdirectory backup

This translates to:

  • Back up all the non-InnoDB files (including frm files) in the database I’ve specified (dbtest2), as before
  • Do NOT back up the non-InnoDB files in any other databases, as before
  • Back up only the IBD files in the dbtest2 database


This gave me exactly what I wanted: one directory with all the IBD, FRM, MYD, and MYI files for dbtest2. All other databases are completely ignored, so I did not run out of disk!
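The two filtering rules can be modeled in a few lines of Python. This is only a sketch of the behavior as I described it in this post, not how mysqlbackup is implemented. The table file names, the select_files function, and the simplification that the --include pattern is tested against the database name alone are all assumptions made for illustration.

```python
import re

# Hypothetical file listing: (database, filename) pairs, one InnoDB and one
# MyISAM table per database, mirroring the example setup in this post.
FILES = [
    (db, name)
    for db in ("dbtest1", "dbtest2", "dbtest3")
    for name in ("t_innodb.frm", "t_innodb.ibd",
                 "t_myisam.frm", "t_myisam.MYD", "t_myisam.MYI")
]

def select_files(files, databases, include=None):
    """Return the files copied under the rules described in this post.

    databases: names passed to --databases (filters non-IBD files only).
    include:   optional regex passed to --include (filters IBD files only).
    """
    selected = []
    for db, name in files:
        if name.endswith(".ibd"):
            # IBD files are not affected by --databases; only --include limits them.
            if include is None or re.search(include, db):
                selected.append((db, name))
        elif db in databases:
            # FRM, MYD, MYI and other non-IBD files follow --databases.
            selected.append((db, name))
    return selected

only_databases = select_files(FILES, {"dbtest2"})
with_include = select_files(FILES, {"dbtest2"}, include="dbtest2")

print(sorted({db for db, _ in only_databases}))  # ['dbtest1', 'dbtest2', 'dbtest3']
print(sorted({db for db, _ in with_include}))    # ['dbtest2']
```

With --databases alone, the IBD files of dbtest1 and dbtest3 still land in the backup. Adding --include is what removes them.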

Note that the --include syntax does not permit you to list more than one database at a time. However, it does allow you to use regular expression syntax such as:

--include=db.                   (include all databases starting with db followed by any character)

or

--include=dbtest[12]            (include dbtest1 and dbtest2)
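To get a feel for what a pattern will select, you can try it in any regular expression engine before running a backup. The sketch below uses Python's re module as a stand-in. mysqlbackup has its own regex engine and, as I understand it, tests the pattern against the fully qualified table name (db_name.table_name), so treat both the table names here and the use of re.search as assumptions.

```python
import re

# Hypothetical fully qualified table names (db_name.table_name).
tables = ["dbtest1.orders", "dbtest2.orders", "dbtest3.orders", "mysql.user"]

def matching(pattern, names):
    """Return the names that contain a match for the pattern."""
    return [n for n in names if re.search(pattern, n)]

print(matching(r"db.", tables))         # all three dbtest tables
print(matching(r"dbtest[12]", tables))  # dbtest1 and dbtest2 only
```

Because the match is not anchored, a short pattern like db. can select more than you expect, so it is worth testing against your real database names first.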

Compression

Consider trying compression. In my original scenario I also used compression, which was a great help: it shrank my 1300 gigabytes of files by 80% in the backup. Of course the downside to compression is that you must decompress to restore, but it is definitely worth considering if you have space limitations.
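A quick back-of-the-envelope check shows why this matters when disk is tight. The sketch assumes the 80% figure means the backup was 80% smaller than the source files. The 500 GB and 200 GB free-space values are made-up numbers for illustration.

```python
def compressed_size_gb(original_gb, reduction):
    """Size after compression; reduction is the fraction saved (0.80 = 80%)."""
    return original_gb * (1 - reduction)

def fits(original_gb, reduction, free_gb):
    """True if the compressed backup fits in the free space available."""
    return compressed_size_gb(original_gb, reduction) <= free_gb

print(round(compressed_size_gb(1300, 0.80)))  # 260
print(fits(1300, 0.80, free_gb=500))          # True
print(fits(1300, 0.80, free_gb=200))          # False
```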

A Visual Guide to the MySQL Performance Schema

If you haven’t explored the MySQL Performance Schema yet, this is a good place to start.  This is Performance Schema 101, a basic introduction to the MySQL 5.6 performance_schema, which records runtime statistics from the MySQL database. The performance_schema is intended to provide access to useful information about server execution while having minimal impact on server performance.  Performance_schema is the name of both the storage engine and the database itself, and it was initially implemented  in MySQL 5.5. In MySQL 5.6 the engineers added quite a bit of new instrumentation.

The performance_schema database uses views or temporary tables that use little to no persistent disk storage. Memory allocation is all done at server startup, so there is no ongoing memory reallocation or sizing, which is great for performance.

I categorize the performance_schema tables into four areas:

Types of Tables in the Performance Schema

The Setup tables keep track of the logistics: who, what, and how to monitor, and where to save this data. It’s important to understand what is in these tables so you know who and what is being reported on and what can be changed.

Current Events tables contain the most recently collected monitored event.

History tables are similar to Current Events tables, but contain a  history of what has occurred over time.

Object Instances tables record what instances of objects (files, locks, mutexes, etc) are instrumented.

In this post we will take a closer look at the setup tables.  Most tables in the performance_schema are read only, but some of the setup tables allow data manipulation language (DML)  so that  configuration can be changed. Remember, however, that all tables are cleared when the server is shut down.  There are also options you can set in the .cnf file for some setup values, discussed below. There are five setup tables:


Setup tables (note that all table names are prefixed with setup_)

Who to Monitor

The setup_actors table tells MySQL who to monitor. It contains only three columns: HOST, USER, and ROLE (the latter is not currently used). The performance schema looks in the setup_actors table to see if a connecting thread should be monitored. Initially this table is set for monitoring of all foreground threads (all fields are set to %).

You can insert, update, and delete rows, for example if you want to monitor only a specific user, or if you want to exclude a set of users such as batch loading jobs. Change the HOST and USER fields to indicate this. Any changes to this table apply only to new connections.
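Here is a minimal sketch of how that lookup can be pictured. It assumes each of HOST and USER is either a literal value or % meaning "any", and it leaves out the unused ROLE column. The function name and the user and host names are hypothetical, and the real server logic may differ in detail.

```python
def is_monitored(setup_actors, host, user):
    """Return True if (host, user) matches any row; '%' matches anything."""
    def field_matches(pattern, value):
        return pattern == "%" or pattern == value

    return any(field_matches(h, host) and field_matches(u, user)
               for h, u in setup_actors)

default_rows = [("%", "%")]        # initial contents: monitor everyone
only_app = [("%", "app_user")]     # hypothetical: monitor one user only

print(is_monitored(default_rows, "localhost", "batch_loader"))  # True
print(is_monitored(only_app, "localhost", "batch_loader"))      # False
print(is_monitored(only_app, "10.0.0.5", "app_user"))           # True
```

Replacing the default row with rows for specific users is how a set of users, such as batch loading jobs, ends up unmonitored: they simply no longer match anything.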


What tables to monitor   

The setup_objects table contains the tables you are monitoring. By default, all tables are monitored except those in the mysql, performance_schema, and information_schema databases.


If you have a table that you don’t want to monitor, for example a logging table, you can exclude it through use of this table. The default size for this table is 100 rows, but the size can be increased through the performance_schema_setup_objects_size variable.

How to modify:  Insert or update this table.

What instrumentation to collect      

The setup_instruments table indicates what can be monitored by listing the “instruments” currently available in the MySQL Server. Think of the MySQL Server as being equipped with pieces of code, called “instruments”, that perform specific functions or measurements. Each of these coded instruments can collect certain specific information. Each instrument has a NAME that begins with one of four categories: statement, stage, idle, or wait. Instruments use a naming convention (a linear taxonomy) that goes from general on the left to specific on the right, for example:

statement/sql/create_table
wait/io/file/myisam/log

  • Statement indicates a SQL statement (like SELECT) or command (like CONNECT)
  • Stage is a stage of statement processing, like SORTING RESULT
  • Wait is an instrumented wait event, like wait/io/file or wait/lock
  • Idle is when a connection is idle, that is, a socket is idle if it is waiting for a request from the client
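Because the names read from general to specific, a simple prefix is enough to pick out a whole family of instruments. The sketch below shows the idea on the two example names from this post plus idle. The helper functions are hypothetical and exist only to illustrate the naming convention.

```python
def parse_instrument(name):
    """Split an instrument NAME into its top-level category and the rest."""
    parts = name.split("/")
    return {"category": parts[0], "path": parts[1:]}

def select_by_prefix(names, prefix):
    """Return the instruments at or below the given point in the taxonomy."""
    return [n for n in names if n == prefix or n.startswith(prefix + "/")]

names = ["statement/sql/create_table", "wait/io/file/myisam/log", "idle"]

print(parse_instrument("wait/io/file/myisam/log")["category"])  # wait
print(select_by_prefix(names, "wait/io"))
print(select_by_prefix(names, "statement"))
```

The same general-to-specific idea is what makes it practical to enable or disable groups of related instruments together rather than one at a time.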

There are over 500 instruments listed in this table in 5.6. You will not normally need to collect data from every instrument, so each one can be enabled or disabled. Out of the box, fewer than half of these instruments are enabled (ENABLED = ‘YES’). If an instrument is not enabled, no information is collected. When and if you need to investigate in more detail, the other instruments can be turned on by updating the ENABLED column. The TIMED column indicates whether that instrument is timed or not. If it is, the unit of measurement is stored in the setup_timers table.

You can also control instruments at server startup with an option of this form: --performance_schema_instrument='instrument_name=value'

Where To Store It

The setup_consumers table indicates where the monitored data will go. The setup_consumers table has only two columns: NAME and ENABLED.

The global_instrumentation consumer is the highest level and must be enabled for any monitoring to be collected in tables. If the global_instrumentation consumer is enabled, then instrumentation for global states is kept, and the next level, thread_instrumentation, is checked. If thread_instrumentation is enabled, the Performance Schema maintains thread-specific information. Most consumer NAMEs are tables (the events% names listed above) where data will be written. Note that several of the tables are history tables, which could be disabled if you are not interested in keeping history.
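The level-by-level checking can be sketched as a walk up a tree: a consumer only collects data if it and every consumer above it are enabled. The parent relationships below are my reading of the MySQL 5.6 manual and cover only the statement consumers; treat them, and the is_effective helper, as assumptions rather than a full description of the server.

```python
# Parent of each consumer; None marks the top of the hierarchy.
PARENT = {
    "global_instrumentation": None,
    "thread_instrumentation": "global_instrumentation",
    "statements_digest": "global_instrumentation",
    "events_statements_current": "thread_instrumentation",
    "events_statements_history": "events_statements_current",
    "events_statements_history_long": "events_statements_current",
}

def is_effective(consumer, enabled):
    """A consumer collects data only if it and all its ancestors are enabled."""
    node = consumer
    while node is not None:
        if not enabled.get(node, False):
            return False
        node = PARENT[node]
    return True

# Everything switched on except thread_instrumentation.
enabled = {name: True for name in PARENT}
enabled["thread_instrumentation"] = False

for name in PARENT:
    print(name, is_effective(name, enabled))
```

In this example the events_statements consumers are all marked enabled, yet none of them collects anything, because the thread_instrumentation level above them is off.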

We also have digest tables. The statements_digest consumer produces a normalized view of SQL statements, taking out specific values, white space, and comments from similar SQL statements and grouping them together. You can then see what groups of statements are executing and how frequently they run. If statements_digest is enabled, this information gets written to the events_statements_summary_by_digest table.
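To make the idea of normalization concrete, here is a very rough approximation using regular expressions. The server does this with its SQL parser, not with regexes, so this sketch will mishandle many real statements. The function name and sample queries are hypothetical.

```python
import re
from collections import Counter

def rough_digest(sql):
    """Approximate normalization: drop comments, literals, extra whitespace."""
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.S)   # strip /* ... */ comments
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)        # quoted strings -> ?
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)        # numeric literals -> ?
    return re.sub(r"\s+", " ", sql).strip()             # collapse whitespace

statements = [
    "SELECT * FROM t1 WHERE id = 42",
    "SELECT *   FROM t1 /* nightly job */ WHERE id = 7",
    "SELECT * FROM t1 WHERE name = 'bob'",
]

counts = Counter(rough_digest(s) for s in statements)
for digest, n in counts.items():
    print(n, digest)
```

The first two statements collapse into the same digest text, which is exactly the grouping that lets you see how often a family of similar statements runs.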

How to modify: Update this table, or at server startup use an option of the form --performance_schema_consumer_consumer_name=value

At the MySQL Virtual Developers Day, Mark Leith gave a great talk explaining the MySQL performance schema. This talk is now available online:

https://oracle.6connex.com/portal/mysql/login/

Mark also has an excellent blog:

http://www.markleith.co.uk/2012/07/13/monitoring-processes-with-performance-schema-in-mysql-5-6/

Marc Alff’s blog post, On configuring the Performance Schema: http://marcalff.blogspot.com/2013/04/on-configuring-performance-schema.html?m=1

We have excellent documentation available on the performance schema:

http://dev.mysql.com/doc/refman/5.5/en/performance-schema.html

I hope you enjoy exploring the performance_schema.

MySQL and Oracle: Playing in the Same Sandbox

Did you know that approximately 70% of Oracle customers are also using MySQL? The use cases for each database are often different, and sometimes overlap. But the needs of a database are the same: among them are security, integration with other products, and strong monitoring. One of the advantages of the MySQL / Oracle relationship is that Oracle is integrating MySQL with many of its other software products such as:

  • GoldenGate, for real-time heterogeneous replication from/to MySQL and other databases. Real-life GoldenGate use cases include real-time replication of selected MySQL data collected online to a data warehouse in Oracle, Teradata, Netezza, etc.; query offloading from a transactional system built on Oracle, DB2 Z series, SQL Server, etc. to a MySQL query instance; and real-time reporting by replicating a subset of data from corporate applications into a dedicated MySQL data mart. GoldenGate for MySQL is available now.
  • Database Firewall – Are you worried about SQL Injection?  Database firewall acts as your first line of defense by examining all incoming SQL transactions and using a SQL grammar based technology that can categorize millions of SQL statements into a small number of SQL characteristics.  Initially, you use the software to monitor incoming transactions.  It learns about normal activity in your system.  When you are ready to go into defensive mode, Database Firewall uses this SQL whitelist to create policies so you can block, log and notify on any abnormal SQL.  This is available with MySQL now, and is also available for other databases including Oracle, SQL Server, IBM DB2 and Sybase.
  • Audit Vault consolidates audit data streams into a secure repository with built-in reporting for auditors and security personnel. This certification will be phased in with the release of the MySQL audit API and subsequent integration with Audit Vault.
  • Oracle Secure Backup – This product provides encrypted tape backups, vault rotation, and policy driven media management and is integrated with MySQL Enterprise Backup. Secure Backup for MySQL is available now.

These are just a few of the product integrations.  At the Southern California Linux Expo (SCaLE), I had the opportunity to present this topic on MySQL Friday.   You can find my slides at Slideshare:

Playing in the Same Sandbox:  MySQL and Oracle

The Oracle product certifications and integrations will allow you to use a common set of tools for Oracle and MySQL databases, and give MySQL additional security and cost-effective use. You’ll find more information on each of these products here:

Database Firewall

Goldengate

Audit Vault

Secure Backup

Aloha – MySQL Dives into the Thread Pool

By now you have probably heard about the MySQL thread pool plugin and API, but you may not have fully processed the details. Here’s the quick summary:  With the new thread pool plugin, there is now an alternative way to handle connection threads in MySQL Enterprise Edition.  With the plugin, MySQL connection threads are shared like an extraordinarily well managed timeshare in Hawaii.  When one connection is “idle”, asking nothing of and expecting nothing from the database, another connection can use that same thread for its database requests.  Threads are released by each connection as soon as the request is completed and  go back into the pool for re-use – just like the theoretical timeshare is up for grabs on the weeks you are not there.

In the older, and still default, connection thread model, threads are dedicated to a single client for the life of the connection, and there are as many threads as there are clients currently connected to the database. This has some disadvantages when the server workload must scale to handle large numbers of connections, and the overhead can be significant. This occurs for several reasons:

  • Lots of threads use lots of memory and can make the CPU cache ineffective
  • Too many active threads trying to execute in parallel may cause a high level of resource contention and be inappropriate for the amount of parallelism available

The new thread pool plugin offers an alternative thread pool implementation, and focuses on limiting the number of concurrent, short running statements to maximize performance and reduce overhead. By limiting the number of concurrent, short running statements and sharing threads, we can control the number of active threads at any one time. Thread management has been revamped, and by managing these threads in a highly efficient manner, we end up reducing overhead and maintaining performance levels as the number of users increases.

Here are the mechanics:  In the new plugin, threads are organized into groups (16 by default but configurable up to 64 on server startup).  Each group starts with one thread and can increase to a maximum of 4096 threads.  Additional threads are created only when necessary.  Each incoming connection request is assigned to a group by round robin. Each group has one listener thread that listens for incoming statement requests.
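Round-robin assignment is easy to picture with a small simulation. This is only a sketch of the idea described above, with hypothetical connection ids; the server's actual bookkeeping may differ.

```python
from collections import Counter
from itertools import cycle

def assign_round_robin(connection_ids, group_count=16):
    """Map each connection to a thread group, cycling through the groups."""
    groups = cycle(range(group_count))
    return {conn: next(groups) for conn in connection_ids}

# 40 hypothetical connections spread over the default 16 groups.
assignment = assign_round_robin(range(40))
load = Counter(assignment.values())

print(assignment[0], assignment[16], assignment[17])  # 0 0 1
print(load[0], load[15])                              # 3 2
```

The point of the round robin is that connections spread evenly, so no group differs from another by more than one connection.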

When a statement request comes in, it is executed immediately by the group’s listener thread if it is not busy and there are no other statement requests waiting.  If the statement request finishes quickly, the listener thread then efficiently returns to listening and is available to execute the next incoming request, preventing the need for a new thread to be created.   If the request does not finish quickly, it runs to completion but another thread is  created as the new listener.

If the listener thread is busy, the request is queued. There will be a very brief time (configurable with the thread_pool_stall_limit system variable, which defaults to 60 ms) while we wait to see if the currently executing statement will finish quickly or not. If it finishes quickly (under thread_pool_stall_limit), we can re-use this thread for the next request in the queue, eliminating the overhead of creating a new thread or having too many short statements trying to execute in parallel.

You can see how this thread pool design strives to have one thread executing per group at any time. The number of groups (the thread_pool_size variable) is very important, because it approximates the number of short running statements that will be executing concurrently at any one time. Long running statements are prevented from causing other statements to wait, since if they go beyond the thread_pool_stall_limit, another thread will be started and the next request in the queue will execute on it.

Your predominant storage engine will help determine the number of groups you should have.  For InnoDB, between 16 and 36 groups seems to work well in many cases, but for MyISAM set it much lower (4-8).

There are two queues for waiting statements, low and high priority.  The low priority queue contains:

  • all statements for non-transactional storage engines
  • all statements if autocommit is enabled
  • the first statement in  an InnoDB transaction

These statements do not languish in the low priority queue forever since they will get kicked over to the high priority queue when the thread_pool_prio_kickup_timer times them out. However, there is a maximum number of statements that can be moved per time period to keep things under control.

The high priority queue contains

  • any subsequent statements in InnoDB transactions, and
  • any statements kicked up from the low priority queue.
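The queueing rules above boil down to a small decision function. The sketch below follows the lists in this post and leaves out the cap on how many statements can be moved per time period. The function names are hypothetical, and the 1000 ms kickup value in the usage lines is an assumed setting, not something stated here.

```python
def initial_queue(transactional_engine, autocommit, first_in_transaction):
    """Queue a statement joins on arrival, per the rules listed above."""
    if not transactional_engine or autocommit or first_in_transaction:
        return "low"
    return "high"

def current_queue(initial, waited_ms, kickup_timer_ms):
    """Low priority statements are kicked up once they have waited too long."""
    if initial == "low" and waited_ms >= kickup_timer_ms:
        return "high"
    return initial

# A MyISAM statement starts in the low priority queue.
q = initial_queue(transactional_engine=False, autocommit=True,
                  first_in_transaction=False)
print(q)                                              # low
print(current_queue(q, waited_ms=1500, kickup_timer_ms=1000))  # high

# The second statement of an open InnoDB transaction goes straight to high.
print(initial_queue(transactional_engine=True, autocommit=False,
                    first_in_transaction=False))      # high
```

The effect is that transactions already in progress finish ahead of new work, which keeps locks from being held while fresh statements pile in.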

You can find the thread pool plugin and other commercial extensions in MySQL 5.5.16 and above, available on http://support.oracle.com and the Oracle Software Delivery Cloud (https://edelivery.oracle.com). This release contains a plugin library object file which must be placed in the appropriate directory. The server must then be started with the --plugin-load option. Documentation and complete install directions for the plugin can be found at http://dev.mysql.com/doc/refman/5.5/en/thread-pool-plugin.html. There is also a thread pool API available in the Community Edition.

Happy swimming!

Lynn Ferrante has worked with databases in the enterprise for her whole career at MySQL, Oracle, Sybase, and Ingres.  She also worked on an open source project called GenMapp (University of California, San Francisco), and contributed to the development of many database applications in the fields of energy and environment

Monitoring Your MySQL Backup

In California we are always thinking about backups. Living near an earthquake fault line makes this necessary. For me, it is the Hayward Fault (it runs from goal post to goal post in the University of California, Berkeley stadium). We are strongly advised to have backup systems for water, food, and medical emergencies. It’s necessary to monitor your food and water emergency supplies so if the worst happens, you don’t have spoiled food or water as your backup (who knew water expires?). Plan for the worst, hope for the best, but keep an eye on those supplies and replenish them when necessary! And most of all, make sure your good intentions end up as actual physical supplies in the garage!

Backups are also an incredibly critical part of the enterprise environment. It’s all about being able to successfully restore your database when needed. There are many ways to back up your database, but are you monitoring your MySQL backups to make sure they are going to be there when you need them? In this post, we’ll cover the new features in MySQL Enterprise Monitor (MEM) 2.3.5 that help monitor backups.

In the last post I covered some of the new features in MySQL Enterprise Backup (MEB) that allow you to write single file backups, stream these backups to remote servers or other devices, stream to media management software like Oracle Secure Backup, and take advantage of tape encryption capabilities (Steps 1-4 from the previous post). This is the final post in this series, which describes the new Backup Advisor in MEM and the underlying mysql tables (Step 5).

MySQL Enterprise Monitor (MEM) is the monitoring software supplied with the Enterprise Edition subscription of MySQL, and if you are interested in trying it, you can download it for a trial at https://edelivery.oracle.com. I find MEM very helpful because it provides proactive monitoring for your MySQL databases. This allows you to increase your productivity because you automate monitoring and help speed up diagnosis of potential issues.

MEB is a backup tool included with an enterprise subscription from Oracle/MySQL, also available for trial from http://edelivery.oracle.com.  MEB was previously known as InnoDB Hot Backup, and provides hot, non-blocking backups for InnoDB tables, and “warm” backups for MyISAM tables. 

Monitor Your Backup
MEM 2.3.5  (and above) has a new Backup Advisor that can be used to monitor backups (note, MEM 2.3.6 was released in early September).  The Backup Advisor alerts you to backup success or failure, excessive backup lock time, backups that are too old, and failure to use incremental backups.  Here are some screen shots.  The first is  from the Details and Advanced Tabs for the “Backup Succeeded” rule  in MEM:

This rule will let you know how long the backup took to complete and how long locks were held. 

Full backups that are older than a threshold number of days (default is 7) are reported since out of date backups will only cause delays or problems if you ever need to restore from them:

You will also see whether there was any excessive lock time in your backup:


Backups are always a balancing act between performance, storage space, and restore time. Incremental backups back up only the data that has changed since the last backup, so they save storage space and are faster than a full backup. I encourage the use of incremental backups in your backup strategy. MEM will notify you if incremental backups are not enabled:

 

You can customize any of the thresholds to suit your environment.
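The "backup too old" check is simple enough to sketch if you want to reason about what a threshold means for your schedule. This is not MEM's implementation. The function name and the dates are hypothetical, and only the 7 day default comes from this post.

```python
from datetime import datetime, timedelta

def backup_too_old(last_full_backup, now, max_age_days=7):
    """True if the last full backup is older than the threshold."""
    return now - last_full_backup > timedelta(days=max_age_days)

now = datetime(2012, 9, 20, 8, 0)           # hypothetical "current" time
recent = datetime(2012, 9, 16, 2, 0)        # about 4 days old
stale = datetime(2012, 9, 10, 2, 0)         # about 10 days old

print(backup_too_old(recent, now))                  # False
print(backup_too_old(stale, now))                   # True
print(backup_too_old(recent, now, max_age_days=3))  # True
```

The last line shows the effect of tightening the threshold: a backup that passes with the default is flagged once you expect a full backup every 3 days.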

Tables Behind the Curtain

MEM uses the backup progress information written into the mysql.backup_progress table, and status information from the mysql.backup_history table.  You can query these tables to get backup status information if you are not using MEM (but you will not receive the alerts and notifications that MEM provides).

Here I’ve queried from the backup_history table, which keeps a history of the backups I’ve completed:

The backup_progress table shows the state of the backup as it progresses from start to finish:

MySQL Enterprise Backup and Enterprise Monitor new features bring us one step closer to a true enterprise backup environment. Streaming, integration with media management systems, and the ability to take advantage of tape encryption features coupled with the new Backup Advisor in MEM will help achieve that state we all need to plan for but hope to never see – the ability to quickly restore a database when needed.