See Jeff Barr’s blog post:
You can now create MySQL, PostgreSQL, and Oracle Relational Database Service (RDS) database instances with up to 6TB of storage and SQL Server RDS database instances with up to 4TB of storage when using the Provisioned IOPS and General Purpose (SSD) storage types.
Existing MySQL, PostgreSQL, and Oracle RDS database instances can be scaled to these new database storage limits without any downtime.
Read more about this at http://bit.ly/awsrdsstorage
This week Amazon Web Services added transparent data-at-rest encryption to their Relational Database Service (RDS) MySQL and PostgreSQL offerings. No code changes required! Customer-managed keys are used through the AWS Key Management Service, which provides key creation and rotation, usage policies, and key auditing.
You can learn more about this fantastic new feature for both RDS MySQL and RDS PostgreSQL here: bit.ly/rdsencryption
and more about MySQL and PostgreSQL on RDS here: http://bit.ly/rdsoverview
I recently had to back up 30 very large databases within a MySQL instance that had a total of 60+ databases. I did NOT want to back up the other databases. In fact, the other databases had very large tables in them and I had very little disk space, so I could not afford to back them up. In this post I will share what I learned.
I’ve simplified the problem to illustrate the idea. My goal in this post is to back up one database when an instance has many, with mixed InnoDB and MyISAM tables.
- I have 3 databases called dbtest1, dbtest2, and dbtest3 in one instance of MySQL. I only want to back up the tables in database dbtest2
- Each database has one InnoDB and one MyISAM table
- The InnoDB tables are quite large, and I have limited disk space for the backup
- I am using the file-per-table option for all InnoDB tables
The --databases Option
First I tried the --databases=dbname option on the mysqlbackup command line. This option filters all non-InnoDB files, including .frm, .MYD, and .MYI files.
mysqlbackup -u username -p --databases="dbtest2" --backup-dir=yourdirectory backup
This translates to:
- Back up all the non-InnoDB files (including .frm files) in the database I’ve specified (dbtest2)
- Do NOT back up the non-InnoDB files in any other databases
- Back up all .ibd files in the instance
The --include Option
Next I added the --include option to filter .ibd files in other databases. It’s important to think about the --include option as filtering .ibd files (not InnoDB tables, which in my mind include both .ibd and .frm files).
mysqlbackup -u username -p --databases="dbtest2" --include="dbtest2" --backup-dir=yourdirectory backup
This translates to:
- Back up all the non-InnoDB files (including .frm files) in the database I’ve specified (dbtest2), as before
- Do NOT back up the non-InnoDB files in any other databases, as before
- Back up the .ibd files only in the dbtest2 database
This gave me exactly what I wanted: one directory with all the .ibd, .frm, .MYD, and .MYI files for dbtest2. All other databases were completely ignored, so I did not run out of disk space!
Note that the --include syntax does not permit you to list more than one database at a time. However, it does allow you to use regular expression syntax such as:
--include=db. (include all databases named db followed by any single character)
--include=dbtest (the pattern is unanchored, so this matches any database whose name contains dbtest, such as dbtest1, dbtest2, and dbtest3)
Consider trying compression. In my original scenario I also used compression, which was a great help and shrank my 1,300 GB of files by 80% in the backup. Of course the downside to compression is that you must decompress to restore, but it is definitely worth considering if you have space limitations.
If you haven’t explored the MySQL Performance Schema yet, this is a good place to start. This is Performance Schema 101, a basic introduction to the MySQL 5.6 performance_schema, which records runtime statistics from the MySQL database. The performance_schema is intended to provide access to useful information about server execution while having minimal impact on server performance. Performance_schema is the name of both the storage engine and the database itself, and it was initially implemented in MySQL 5.5. In MySQL 5.6 the engineers added quite a bit of new instrumentation.
The performance_schema database uses views or temporary tables that use little to no persistent disk storage. Memory allocation is all done at server startup, so there is no ongoing memory reallocation or sizing, which is great for performance.
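Before digging into the tables, it is worth confirming the Performance Schema is active on your server. A quick sketch (table names as in MySQL 5.6):

```sql
-- Is the Performance Schema enabled? (ON by default in MySQL 5.6)
SHOW VARIABLES LIKE 'performance_schema';

-- Browse the configuration tables it exposes
SHOW TABLES FROM performance_schema LIKE 'setup%';
```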
I categorize the performance_schema tables into four areas:
The Setup tables keep track of the logistics: who, what, and how to monitor, and where to save this data. It's important to understand what is in these tables so you know who and what is being reported on and what can be changed.
Current Events tables contain the most recently collected monitored event.
History tables are similar to Current Events tables, but contain a history of what has occurred over time.
Object Instances tables record what instances of objects (files, locks, mutexes, etc) are instrumented.
In this post we will take a closer look at the setup tables. Most tables in the performance_schema are read only, but some of the setup tables allow data manipulation language (DML) so that configuration can be changed. Remember, however, that all tables are cleared when the server is shut down. There are also options you can set in the .cnf file for some setup values, discussed below. There are five setup tables:
Setup tables (Note all tables are prefixed with setup_ )
Who to monitor
The setup_actors table tells MySQL who to monitor. It contains only three columns: HOST, USER, and ROLE (the latter is not currently used). The performance schema looks in the setup_actors table to see if a connecting thread should be monitored. Initially this table is set to monitor all foreground threads (all fields are set to %).
You can insert, update, and delete rows: for example, if you want to monitor only a specific user, or if you want to exclude a set of users like batch loading jobs, change the HOST and USER fields to indicate this. Any changes to this table apply only to new connections.
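As a sketch, assuming a hypothetical application account appuser connecting from host appserver, you could restrict monitoring to that one account like this:

```sql
-- Replace the catch-all row so only appuser connections are instrumented.
-- (user and host names are hypothetical; adjust to your environment)
UPDATE performance_schema.setup_actors
   SET HOST = 'appserver', USER = 'appuser'
 WHERE HOST = '%' AND USER = '%';

-- Verify; remember this applies only to NEW connections.
SELECT * FROM performance_schema.setup_actors;
```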
What tables to monitor
If you have a table that you don’t want to monitor, for example a logging table, you can exclude it through the setup_objects table. The default size for this table is 100 rows, but the size can be increased through the performance_schema_setup_objects_size variable.
How to modify: Insert or update this table.
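For example, to stop instrumenting a hypothetical logging table mydb.app_log, you could insert a row like this (schema and table names are placeholders):

```sql
-- Disable instrumentation and timing for one table.
INSERT INTO performance_schema.setup_objects
       (OBJECT_TYPE, OBJECT_SCHEMA, OBJECT_NAME, ENABLED, TIMED)
VALUES ('TABLE', 'mydb', 'app_log', 'NO', 'NO');
```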
What instrumentation to collect
The setup_instruments table indicates what can be monitored by listing the “instruments” currently available in the MySQL Server. Think of the MySQL Server as being equipped with pieces of code that perform specified functions or measurements, called “instruments”. Each of these coded instruments can collect certain specific information. Each instrument has a NAME that begins with one of four categories: statement, stage, idle, or wait. Instruments use a naming convention (a linear taxonomy) that goes from general on the left to specific on the right, for example:
- Statement indicates a SQL statement (like SELECT) or command (like CONNECT)
- Stage is a step of statement processing, like SORTING RESULT
- Wait is an instrumented wait event, like wait/io/file or wait/lock
- Idle is when a connection is idle; that is, a socket is waiting for a request from the client
There are over 500 instruments listed in this table in 5.6. You will not normally need to collect on all the instrumentation, so each instrument can be enabled or disabled. Out of the box, fewer than half of these instruments are enabled (ENABLED = 'YES'). If an instrument is not enabled, no information is collected. When and if you need to investigate in more detail, the other instruments can be enabled as needed by updating the ENABLED column. The TIMED column indicates whether that instrument is timed or not. If it is, the unit of measurement is stored in the setup_timers table.
To control an instrument at server startup, use an option of this form: --performance_schema_instrument='instrument_name=value'
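At runtime, the same control is an UPDATE against setup_instruments. A sketch:

```sql
-- Turn on all statement instruments, with timing.
UPDATE performance_schema.setup_instruments
   SET ENABLED = 'YES', TIMED = 'YES'
 WHERE NAME LIKE 'statement/%';

-- Spot-check a few instruments and their current state.
SELECT NAME, ENABLED, TIMED
  FROM performance_schema.setup_instruments
 WHERE NAME LIKE 'wait/io/file/%'
 LIMIT 5;
```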
Where To Store It
The setup_consumers table controls the destinations where event information can be stored. The global_instrumentation consumer is the highest level and must be enabled for any monitoring to be collected in tables. If the global_instrumentation consumer is enabled, then instrumentation for global states is kept, and the next level, thread_instrumentation, is checked. If thread_instrumentation is enabled, the Performance Schema maintains thread-specific information. Most consumer NAMEs correspond to tables (the events_% tables) where data will be written. Note that several of these are history tables, which can be disabled if you are not interested in keeping history.
We also have digest tables. The statements_digest contains a normalized view of SQL statements, taking out specific values, white space, and comments from similar SQL statements and grouping them together. You can then see what groups of statements are executing and how frequently they run. If the statements_digest is enabled, this information gets written to the events_statements_summary_by_digest table.
How to modify: Update the table, or use a startup option of the form performance_schema_consumer_consumer_name=value.
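For instance, if you do not need history, you could switch off those consumers at runtime. A sketch (the LIKE pattern matches the history consumers in 5.6):

```sql
-- Stop writing to the history and long-history tables.
UPDATE performance_schema.setup_consumers
   SET ENABLED = 'NO'
 WHERE NAME LIKE '%history%';

-- Review the full consumer hierarchy.
SELECT * FROM performance_schema.setup_consumers;
```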
At the MySQL Virtual Developers Day, Mark Leith gave a great talk explaining the MySQL Performance Schema. This talk is now available online:
Mark also has an excellent blog:
We have excellent documentation available on the performance schema:
I hope you enjoy exploring the performance_schema.
Did you know that approximately 70% of Oracle customers are also using MySQL? The use cases for each database are often different, and sometimes overlap. But the needs of a database are the same, among them security, integration with other products, and strong monitoring. One of the advantages of the MySQL / Oracle relationship is that Oracle is integrating MySQL with many of its other software products such as:
- GoldenGate, for real-time heterogeneous replication from/to MySQL and other databases. GoldenGate real-life use cases include real-time replication of selected MySQL data collected online to a data warehouse in Oracle, Teradata, Netezza, etc.; query offloading from a transactional system built on Oracle, DB2 z Series, SQL Server, etc. to a MySQL query instance; and real-time reporting by replicating a subset of data from corporate applications into a dedicated MySQL data mart. GoldenGate for MySQL is available now.
- Database Firewall – Are you worried about SQL injection? Database Firewall acts as your first line of defense by examining all incoming SQL transactions, using a SQL grammar-based technology that can categorize millions of SQL statements into a small number of SQL characteristics. Initially, you use the software to monitor incoming transactions while it learns about normal activity in your system. When you are ready to go into defensive mode, Database Firewall uses this SQL whitelist to create policies so you can block, log, and notify on any abnormal SQL. This is available with MySQL now, and is also available for other databases including Oracle, SQL Server, IBM DB2, and Sybase.
- Audit Vault – This product consolidates audit data into a secure repository with built-in reporting for auditors and security personnel. This certification will be phased in with the release of the MySQL audit API and subsequent integration with Audit Vault.
- Oracle Secure Backup – This product provides encrypted tape backups, vault rotation, and policy-driven media management, and is integrated with MySQL Enterprise Backup. Secure Backup for MySQL is available now.
These are just a few of the product integrations. At the Southern California Linux Expo (SCaLE), I had the opportunity to present this topic on MySQL Friday. You can find my slides at Slideshare:
The Oracle product certifications and integrations allow you to use a common set of tools across Oracle and MySQL databases, and give MySQL additional security and cost-effective operation. You’ll find more information on each of these products here:
By now you have probably heard about the MySQL thread pool plugin and API, but you may not have fully processed the details. Here’s the quick summary: With the new thread pool plugin, there is now an alternative way to handle connection threads in MySQL Enterprise Edition. With the plugin, MySQL connection threads are shared like an extraordinarily well managed timeshare in Hawaii. When one connection is “idle”, asking nothing of and expecting nothing from the database, another connection can use that same thread for its database requests. Threads are released by each connection as soon as the request is completed and go back into the pool for re-use – just like the theoretical timeshare is up for grabs on the weeks you are not there.
In the older, and still default, connection thread model, threads are dedicated to a single client for the life of the connection, and there are as many threads as there are clients currently connected to the database. This has some disadvantages when the server workload must scale to handle large numbers of connections, and the overhead can be significant. This occurs for several reasons:
- Lots of threads use lots of memory and can make the CPU cache ineffective
- Too many active threads trying to execute in parallel may cause a high level of resource contention and be inappropriate for the amount of parallelism available
The new thread pool plugin offers an alternative thread pool implementation, and focuses on limiting the number of concurrent, short-running statements to maximize performance and reduce overhead. By limiting the number of concurrent, short-running statements and sharing threads, we can control the number of active threads at any one time. Thread management has been revamped, and by managing these threads in a highly efficient manner, we end up reducing overhead and maintaining performance levels as the number of users increases.
Here are the mechanics: In the new plugin, threads are organized into groups (16 by default but configurable up to 64 on server startup). Each group starts with one thread and can increase to a maximum of 4096 threads. Additional threads are created only when necessary. Each incoming connection request is assigned to a group by round robin. Each group has one listener thread that listens for incoming statement requests.
When a statement request comes in, it is executed immediately by the group’s listener thread if it is not busy and there are no other statement requests waiting. If the statement request finishes quickly, the listener thread then efficiently returns to listening and is available to execute the next incoming request, preventing the need for a new thread to be created. If the request does not finish quickly, it runs to completion but another thread is created as the new listener.
If the listener thread is busy, the request is queued. There will be a very brief wait (configurable with the thread_pool_stall_limit system variable, which defaults to 60 ms) while we see whether the currently executing statement will finish quickly or not. If it finishes quickly (under thread_pool_stall_limit), we can re-use this thread for the next request in the queue, eliminating the overhead of creating a new thread or having too many short statements trying to execute in parallel.
You can see how this thread pool design strives to have one thread executing per group at any time. The number of groups (the thread_pool_size variable) is very important, because it approximates the number of short-running statements that will be executing concurrently at any one time. Long-running statements are prevented from causing other statements to wait, since if they go beyond the thread_pool_stall_limit, another thread will be started and the next request in the queue will execute on it.
Your predominant storage engine will help determine the number of groups you should have. For InnoDB, between 16 and 36 groups seems to work well in many cases, but for MyISAM set it much lower (4-8).
There are two queues for waiting statements, low and high priority. The low priority queue contains:
- all statements for non-transactional storage engines
- all statements when autocommit is enabled
- the first statement in an InnoDB transaction
These statements do not languish in the low priority queue forever: they are kicked over to the high priority queue when the thread_pool_prio_kickup_timer times them out. However, there is a maximum number of statements that can be moved per time period to keep things under control.
The high priority queue contains:
- any subsequent statements in InnoDB transactions, and
- any statements kicked up from the low priority queue.
You can find the thread pool plugin and other commercial extensions in MySQL 5.5.16 and above, available on http://support.oracle.com and the Oracle Software Delivery Cloud https://edelivery.oracle.com . This release contains a plugin library object file, which must be placed in the appropriate directory. The server must then be started with the --plugin-load option. Documentation and complete install directions for the plugin can be found at http://dev.mysql.com/doc/refman/5.5/en/thread-pool-plugin.html. There is also a thread pool API available in the Community Edition.
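Putting the pieces together, a my.cnf fragment for the thread pool might look like the sketch below. The numbers are illustrative starting points, not recommendations; check the documentation for your version (note that thread_pool_stall_limit is specified in units of 10 milliseconds):

```ini
[mysqld]
# Load the thread pool plugin (library file name varies by platform)
plugin-load=thread_pool.so

# Number of thread groups; 16-36 often works well for InnoDB-heavy workloads
thread_pool_size=24

# Stall limit in 10 ms units (6 = 60 ms, the default)
thread_pool_stall_limit=6

# Milliseconds before a low-priority statement is kicked up to the high-priority queue
thread_pool_prio_kickup_timer=1000
```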
Lynn Ferrante has worked with enterprise databases for her whole career, at MySQL, Oracle, Sybase, and Ingres. She also worked on an open source project called GenMAPP (University of California, San Francisco), and contributed to the development of many database applications in the fields of energy and environment.