The Fastest Way to Import Text, XML, and CSV Files into MySQL Tables

By: Morpheus Data

One of the best ways to improve the performance of MySQL databases is to determine the optimal approach for importing data from other sources, such as text files, XML, and CSV files. The key is to correlate the source data with the table structure.

Data is always on the move: from a Web form to an order-processing database, from a spreadsheet to an inventory database, or from a text file to customer list. One of the most common MySQL database operations is importing data from such an external source directly into a table. Data importing is also one of the tasks most likely to create a performance bottleneck.

The basic steps entailed in importing a text file to a MySQL table are covered in a Stack Overflow post from November 2012: first, use the LOAD DATA INFILE command.

The basic MySQL commands for creating a table and importing a text file into the table. Source: Stack Overflow

Note that you may need to enable the parameter “–local-infile=1” to get the command to run. You can also specify which columns the text file loads into:

This MySQL command specifies the columns into which the text file will be imported. Source: Stack Overflow

In this example, the file’s text is placed into variables “@col1, @col2, @col3,” so “myid” appears in column 1, “mydecimal” appears in column 3, and column 2 has a null value.

The table resulting when LOAD DATA is run with the target column specified. Source: Stack Overflow

The fastest way to import XML files into a MySQL table

As Database Journal’s Rob Gravelle explains in a March 17, 2014, article, stored procedures would appear to be the best way to import XML data into MySQL tables, but after version 5.0.7, MySQL’s LOAD XML INFILE and LOAD DATA INFILE statements can’t run within a Stored Procedure. There’s also no way to map XML data to table structures, among other limitations.

However, you can get around most of these limitations if you can target the XML file using a rigid and known structure per proc. The example Gravelle presents uses an XML file whose rows are all contained within an file, and whose columns are represented by a named attribute:

You can use a stored procedure to import XML data into a MySQL table if you specify the table structure beforehand. Source: Database Journal

The table you’re importing to has an int ID and two varchars: because the ID is the primary key, it can’t have nulls or duplicate values; last_name allows duplicates but not nulls; and first_name allows up to 100 characters of nearly any data type.

The MySQL table into which the XML file will be imported has the same three fields as the file. Source: Database Journal

Gravelle’s approach for overcoming MySQL’s import restrictions uses the “proc-friendly” Load_File() and ExtractValue() functions.

MySQL’s XML-import limitations can be overcome by using the Load_file() and ExtractValue() functions. Source: Database Journal

Benchmarking techniques for importing CSV files to MySQL tables

When he tested various ways to import a CSV file into MySQL 5.6 and 5.7, Jaime Crespo discovered a technique that he claims improves the import time for MyISAM by 262 percent to 284 percent, and for InnoDB by 171 percent to 229 percent. The results of his tests are reported in an October 8, 2014, post on Crespo’s MySQL DBA for Hire blog.

Crespo’s test file was more than 3GB in size and had nearly 47 million rows. One of the fastest methods in Crespo’s tests was by grouping queries in a multi-insert statement, which is used by “mysqldump”. Crespo also attempted to improve LOAD DATA performance by augmenting the key_cache_size and by disabling the Performance Schema.

Crespo concludes that the fastest way to load CSV data into a MySQL table without using raw files is to use LOAD DATA syntax. Also, using parallelization for InnoDB boosts import speeds.

Whether you’re a CIO, IT leader or DevOps ninja, you won’t find a more straightforward way to monitor your MySQL, MongoDB, Redis, and ElasticSearch databases than by using the dashboard interface of the Morpheus Data platform as a service (Paas) (check out this infographic to learn more about PaaS). Morpheus is the first and only infrastructure agnostic cloud management solution, as well as the only PaaS solution to support SQL, NoSQL, and in-memory databases. 

You can provision, deploy, and host your databases from a single dashboard. The service includes a free full replica set for each database instance, as well as automatic daily backups of MySQL and Redis databases. Visit the Morpheus site to sign up for a free demo!

Watch this 2-minute intro video to see how Morpheus Data can save you time, money and sanity.