How to generate one million records in SQL

By | May 18, 2023

SQL stands for Structured Query Language. It is used for designing and maintaining structured data in a relational database management system (RDBMS). Structured data are the data that is stored in the form of a table with a structure of rows and columns. SQL allows us to create, access, and manipulate data and the database with the help of query operations.

In this article, we are going to learn How to generate one million records in SQL in our MySQL database. These records are also known as sample data, which we sometimes need to test our APIs or perform some operations on larger datasets. We are going to generate the following type of record data:

  • Random data
  • Sequential data
  • Object data
Note: We are going to use MySQL (8.0.28) and MySQL Workbench. In case the queries are not working for you, you may need to change the function name used to correspond to your MySQL versions.
How to check MySQL version
Fig 1- MySQL Version

Steps to generate millions of records

Before moving further, we first create a database and name it sample_data_db to store millions of records.

MySQL Database
Fig 2- sample_data_db

We are using RECURSIVE, UNION ALL, RAND and LIMIT clauses to generate our sample records.

1. Generate million of random numbers

We will be using the following SQL query to generate one million random records: First, we need to set the cte_max_recursion_depth to 1000000. The cte_max_recursion_depth is a system variable that limits the number of recursion levels for CTEs (Common Table Expressions). The sever terminates the execution of any CTE that crosses the value of this variable. By default, its value is 1000. A recursive common table expression query is used to execute recursive operations within SQL. A recursive CTE query continuously refers to its previous result until it returns an empty result.

In the below query, temp(n) represents the result set table built recursively with a defined column, n. The first SELECT statement is called an anchor member and is used to define the first row of the recursive table. For the first row of recursive table temp, we are using the RAND() function, as RAND() will return values between 0 (inclusive) and 1 (exclusive), so we are multiplying the output with 100 and then taking its floor for ease of reading the numbers. After this, we took UNION ALL, which iteratively added the rows to the temp table. The second SELECT statement is called a recursive member, which will be responsible for building the rows of the temp table on iterations, and this second query will be executed until the execution reaches the LIMIT value of 1000000.

The third and last SELECT statement will fetch all the numbers from the recursive table temp.

SQL
/*By default, cte_max_recusion_depth is 1000, so we are increasing for
this open session to 1 million, as we want to store 1 million records,
This will make the recursion depth 1000000.*/
SET SESSION cte_max_recursion_depth=1000000;

/*Create table random_data inside sample_data_db database*/
/*temp: recursive table*/
CREATE TABLE sample_data_db.random_data AS
WITH RECURSIVE temp(n) AS (
    SELECT FLOOR(RAND() * 100)
    UNION ALL
    SELECT FLOOR(RAND() * 100) FROM temp
    LIMIT 1000000
)
SELECT * FROM temp;
How to generate one million random numbers
Fig 3- SQL Query to generate 1 million random numbers

Below is the count of all the numbers generated and stored in the random_data table. For this, we have used the COUNT() function.

COUNT() function in SQL
Fig 4- Output

2. Generate million of sequential numbers

Likewise, we can generate sequential numbers in the same manner by modifying the above query. In the below query, we are recursively calling the temp(n) table with the defined column “n“. For the first row of this table, we SELECT 1 and then UNION ALL, which will recursively add columns to the temp table. The second SELECT statement is responsible for building the rows of the temp table and the third and last SELECT statement will fetch all the numbers from the recursive table temp.

SQL
/*By default, cte_max_recusion_depth is 1000, so we are increasing for
this open session to 1 million, as we want to store 1 million records,
This will make the recursion depth 1000000.*/
SET SESSION cte_max_recursion_depth=1000000;

/*Create table sequential_data inside sample_data_db database*/
/*temp: recursive table*/
CREATE TABLE sample_data_db.sequential_data AS
WITH RECURSIVE temp(n) AS (
    SELECT 1
    UNION ALL
    SELECT n+1 FROM temp
    LIMIT 1000000
)
SELECT * FROM temp;
How to generate one million sequential data
Fig 5- SQL Query to generate 1 million sequential numbers

Below is the output for the sequential number generated with the help of the above SQL query.Below is the output for the sequential number generated with the help of the above SQL query.

SELECT statement in SQL
Fig 6- Output

3. Generate million of object data

The object data is like an entity with a few fields. For example, an employee or student might have fields like name, age, etc. Here, we are going to generate millions of records for students with fields such as ID, name, and age. As we are generating random data for testing purposes, we have repetitive data in our records.

First, we need to create a table student in which we will fill in a few random records that will be used to generate our millions of repetitive records.

SQL
CREATE TABLE sample_data_db.student(
id INT PRIMARY KEY,
name VARCHAR(10),
age INT );

INSERT INTO sample_data_db.student(id, name, age) VALUES
(1, 'Shivangi', 14),
(2, "Rahul", 15),
(3, "Priya", 14),
(4, "Rohan", 17),
(5, "Shubham", 18),
(6, "Anushka", 17);
How to create table in SQL
Fig 7- Student table

In the below query, we are creating the student_data table, which will hold the millions of repetitive student records from the above-created table, student. We use recursive CTEs to generate a result set table temp(id, idx, name, age) built recursively. The first SELECT statement is used to create the first row of the temp table, which sets the values (1, 1, ‘Bhoomika’, 20) to the first row. After this, we took UNION ALL, which iteratively added the rows to the temp table. The second SELECT statement takes the ID from the first row and increases it by 1. As we have a total of 6 rows in our student table, we are using the RAND() function and performing modulo operations to get values between 1 and 6, then adding 1 to them and assigning them to temporary idx. The third SELECT statement is used to build the name field of the temp() table, and then we are building the age field as we did while generating the random numbers in the above steps.

SQL
/*By default, cte_max_recusion_depth is 1000, so we are increasing for
this open session to 1 million, as we want to store 1 million records,
This will make the recursion depth 1000000.*/
SET SESSION cte_max_recursion_depth=1000000;

/*Create table student_data inside sample_data_db database*/
/*temp: recursive table*/
CREATE TABLE sample_data_db.student_data AS
WITH RECURSIVE temp(id, idx, name, age) AS(
    SELECT 1, 1, 'Bhoomika', 20
    UNION ALL
    SELECT
        temp.id + 1 AS id,
        ABS(RAND() % 6) + 1 AS idx,
        (SELECT name FROM sample_data_db.student WHERE id = idx) AS name,
        FLOOR(ABS(RAND() * (20-10 + 1)) +10 ) AS age
    FROM temp
    LIMIT 1000000
)
SELECT id, name, age FROM temp;
How to generate one million object data in SQL
Fig 8- SQL Query to generate 1 million object data

Below is the output for the object data generated with the help of the above SQL query.

Fig 9- Output

Leave a Reply

Your email address will not be published. Required fields are marked *