When developing and optimizing MySQL databases, we often face the choice of using JSON fields or relational tables to manage data. This article will explore performance optimization methods for using JSON fields and relational tables in MySQL.

JSON fields offer a compact data structure, fewer tables, and fewer join operations. However, parsing JSON incurs significant overhead, index support is limited, and queries may be slower on large datasets.

Relational tables offer a clear relationship model, comprehensive index support, and efficient query optimization. They require additional tables and join operations, but relational databases typically handle these efficiently.

In scenarios with millions and tens of millions of users, what are the advantages and disadvantages of these two approaches in terms of efficient querying and flexibility? Here are the considerations:

Scenario: A membership system that allows members to create organizations and share them with other members. In the Organization struct, the Permissions field records member IDs and roles. Under an organization, members can add courses and units, and each of these can also be shared with other members at its own level.

Query Efficiency of JSON Fields in MySQL

First, let’s consider the case of organizations being shared with other members.

JSON data type:

MySQL 5.7.8 and later support the JSON data type, and the JSON_CONTAINS function can be used to find records that contain a specific sub-object. In the following model, the Permissions field of Organization is stored in a JSON column:


type Permission struct {
    UserID string `json:"user_id"`
    Role   string `json:"role"`
}

type Organization struct {
    Id              string       `gorm:"primaryKey;size:24"`
    Title           string       `gorm:"size:255"`
    Order           int32        `gorm:"size:10"`
    SourceLanguage  string       `gorm:"size:255"`
    TargetLanguage  string       `gorm:"size:255"`
    CreatorId       string       `gorm:"size:24;index;comment:'only note the creator id, not the permission'"`
    Permissions     string       `gorm:"type:json;column:permissions"`
    CreatedAt       time.Time
    UpdatedAt       time.Time
    DeletedAt       gorm.DeletedAt `gorm:"index"`
}

JSON_CONTAINS can then be used to find the organizations that have been shared with a specific member:

SELECT * FROM organizations WHERE JSON_CONTAINS(permissions, '{"user_id": "user123"}');

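Performance considerations: Index: A JSON column cannot be indexed directly. MySQL can index a generated column that extracts a scalar value from the JSON, and MySQL 8.0.17 and later also support multi-valued indexes on JSON arrays, provided the query is written against the indexed expression. Parsing and matching: Without a usable index, MySQL must read and parse the JSON document in every row it examines, so the cost grows with the size of the table.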
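To make the parsing cost concrete, the following Go sketch imitates what an unindexed JSON filter has to do: decode the permissions document of every row and look for the member inside it. It is only an in-memory illustration, not a description of MySQL internals, and the orgRow type and sharedWith function are hypothetical names introduced for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Permission mirrors one element of the permissions JSON array.
type Permission struct {
	UserID string `json:"user_id"`
	Role   string `json:"role"`
}

// orgRow is a hypothetical in-memory stand-in for one organizations row.
type orgRow struct {
	ID          string
	Permissions string // raw JSON, as stored in the permissions column
}

// sharedWith returns the IDs of the rows whose permissions array contains userID.
// Every row is decoded, which is the work a scan without an index cannot avoid.
func sharedWith(rows []orgRow, userID string) ([]string, error) {
	var ids []string
	for _, r := range rows {
		var perms []Permission
		if err := json.Unmarshal([]byte(r.Permissions), &perms); err != nil {
			return nil, fmt.Errorf("row %s: %w", r.ID, err)
		}
		for _, p := range perms {
			if p.UserID == userID {
				ids = append(ids, r.ID)
				break // one match per row is enough
			}
		}
	}
	return ids, nil
}

func main() {
	rows := []orgRow{
		{ID: "org1", Permissions: `[{"user_id":"user123","role":"admin"}]`},
		{ID: "org2", Permissions: `[{"user_id":"user456","role":"view"}]`},
		{ID: "org3", Permissions: `[{"user_id":"user456","role":"editor"},{"user_id":"user123","role":"view"}]`},
	}
	ids, err := sharedWith(rows, "user123")
	if err != nil {
		panic(err)
	}
	fmt.Println(ids) // prints [org1 org3]
}
```

The loop touches every row no matter how few rows match, which is why the cost scales with table size rather than with result size.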

Query Efficiency Using Relational Tables

Data model: Create a relational table (e.g., organization_permissions) to store the relationship between organizations and member permissions.

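-- organization_id uses VARCHAR(24) to match the primary key of organizations.
CREATE TABLE organization_permissions (
    organization_id VARCHAR(24) NOT NULL,
    user_id VARCHAR(255) NOT NULL,
    role VARCHAR(50),
    PRIMARY KEY (organization_id, user_id),
    -- The primary key leads with organization_id, so lookups by user need their own index.
    KEY idx_user_id (user_id),
    FOREIGN KEY (organization_id) REFERENCES organizations (id)
);

-- Organizations shared with a specific member.
SELECT organizations.*
FROM organizations
JOIN organization_permissions ON organizations.id = organization_permissions.organization_id
WHERE organization_permissions.user_id = 'user123';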

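Performance considerations: Index: The composite primary key (organization_id, user_id) does not help a lookup by user_id alone, so an index on the user_id column of organization_permissions is needed to speed up this query. Query optimization: With that index in place, the query optimizer can resolve the join through index lookups instead of scanning the whole table.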
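The effect of an index on user_id can be pictured with a small Go analogy. The sketch below keeps a map from member to organizations next to the rows, so a lookup no longer has to visit every row. The permissionRow and userIndex types are hypothetical, and a Go map is a hash structure rather than the B-tree that InnoDB uses, so this only illustrates the idea of paying at write time to make reads cheap.

```go
package main

import "fmt"

// permissionRow is a hypothetical stand-in for one organization_permissions row.
type permissionRow struct {
	OrganizationID string
	UserID         string
	Role           string
}

// userIndex plays the role of a secondary index on user_id:
// it maps each user to the organizations shared with that user.
type userIndex map[string][]string

// buildUserIndex does the work that index maintenance does on every write.
func buildUserIndex(rows []permissionRow) userIndex {
	idx := make(userIndex)
	for _, r := range rows {
		idx[r.UserID] = append(idx[r.UserID], r.OrganizationID)
	}
	return idx
}

func main() {
	rows := []permissionRow{
		{"org1", "user123", "admin"},
		{"org2", "user456", "view"},
		{"org3", "user123", "editor"},
	}
	idx := buildUserIndex(rows)
	// The lookup goes straight to the matching entries instead of scanning rows.
	fmt.Println(idx["user123"]) // prints [org1 org3]
}
```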

Performance Comparison

JSON field queries:

Pros: Compact data structure, reduced number of tables and join operations. Cons: Significant parsing overhead, limited index support, and potentially lower query efficiency on large datasets.

Relational table queries:

Pros: Clear relationship model, comprehensive index support, and efficient query optimization. Cons: Requires additional tables and join operations, but these operations are typically efficient in relational databases.

Performance at the Million and Ten Million User Levels

Million-level:

The performance difference between JSON field queries and relational table queries is not significant, but relational table queries usually have an advantage, especially with index support. The query optimizer for relational tables can efficiently handle large amounts of data.

Ten million-level:

Parsing overhead for JSON field queries may significantly increase, affecting query performance. Relational table queries usually perform better with index support because indexes can significantly speed up lookup operations.

Recommendation: For systems with a large number of users (millions or tens of millions), it is recommended to use relational tables to manage permission data. The reasons are as follows:

Query performance: The indexes and query optimizer of a relational database can efficiently handle large-scale data, ensuring efficient querying. Data management: Clear relationship models make data management and maintenance simpler. Scalability: It is easier to horizontally and vertically scale to accommodate growing data volume and query demands.

Considerations for Expansion

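If organizations need to support courses and units, with member sharing at each level, the permission model has to cover several entity types. This can be done with one common permission table or with separate permission tables.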

Managing Multiple Types with a Common Permission Table

When permissions are not separated, the main advantage is simplicity: there is only one permission table to manage and maintain. Unified querying: When querying permissions, there is no need to choose a table by entity type, because only one table is queried.

When the number of members is small and organizations, courses, and units all need shared permissions (such as admin, editor, view), you can either create a separate permission table for each entity type or create one common permission table for all of them. For consistency and simplicity of design, a common permission table is recommended in that case.

Index complexity: Since the table contains permission records for all entity types, index design and optimization become more complex, which may affect query performance.

A common Permission table can be designed in which the EntityType field distinguishes the entity types (such as organization, course, unit), the EntityId field stores the ID of the entity, and the Permission field stores the permission type (such as admin, editor, view).

Organization, Course, and Unit tables represent organizations, courses, and units, respectively. Each table has a CreatorId field for recording the ID of the creator.

// Permission represents the permissions a member has on various entities.
type Permission struct {
    Id             string    `gorm:"primaryKey;size:24"`
    MemberId       string    `gorm:"size:24;index"`
    EntityType     string    `gorm:"size:50"` // Type of entity: organization, course, unit
    EntityId       string    `gorm:"size:24;index"`
    Permission     string    `gorm:"size:50"` // Permission type: admin, editor, view
    CreatedAt      time.Time
    UpdatedAt      time.Time
    DeletedAt      gorm.DeletedAt `gorm:"index"`
}

When using a common permission table, it is easy to query a member’s permissions for different entities. For example, to query a member’s permissions in a specific organization:

var permissions []Permission
db.Where("member_id = ? AND entity_type = ? AND entity_id = ?", memberId, "organization", organizationId).Find(&permissions)
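The query above filters on member_id, entity_type, and entity_id together, while the Permission struct indexes member_id and entity_id separately. One option, shown here as a sketch and not as the author's schema, is to give the three fields the same index name so that GORM creates a single composite index. IndexedPermission, idx_member_entity, and gormTag are hypothetical names, and DeletedAt is left out so the example compiles with the standard library only.

```go
package main

import (
	"fmt"
	"reflect"
	"time"
)

// IndexedPermission is a hypothetical variant of Permission whose three
// lookup columns share one composite index named idx_member_entity.
// The priority values set the column order inside the index.
type IndexedPermission struct {
	Id         string `gorm:"primaryKey;size:24"`
	MemberId   string `gorm:"size:24;index:idx_member_entity,priority:1"`
	EntityType string `gorm:"size:50;index:idx_member_entity,priority:2"`
	EntityId   string `gorm:"size:24;index:idx_member_entity,priority:3"`
	Permission string `gorm:"size:50"`
	CreatedAt  time.Time
	UpdatedAt  time.Time
}

// gormTag returns the gorm struct tag of the named field, or "" if the field is absent.
func gormTag(field string) string {
	f, ok := reflect.TypeOf(IndexedPermission{}).FieldByName(field)
	if !ok {
		return ""
	}
	return f.Tag.Get("gorm")
}

func main() {
	// Print the index definition attached to each lookup column.
	for _, name := range []string{"MemberId", "EntityType", "EntityId"} {
		fmt.Printf("%-10s %s\n", name, gormTag(name))
	}
}
```

Putting member_id first suits queries that start from a member. If most lookups start from an entity instead, a different column order may fit better.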

The disadvantage is that in the case of millions of members, a single permission table may contain tens of millions of records, and query and update operations may become slow, resulting in significant performance bottlenecks. Although indexes can speed up queries, maintaining indexes also incurs overhead, affecting write operation performance.

Managing Multiple Types with Separated Permissions

With millions of members, separate permission tables keep each table relatively small. For example, the organization permission table may hold only a few million records. Queries and updates perform better, indexes are more efficient, and concurrent access improves because lock contention is reduced.

However, with tens of millions of members, a single permission table may contain billions of records, which significantly degrades query and update performance. At that scale, database design, index optimization, query optimization, caching, partitioning strategy, and load balancing all need to be considered together. Two partitioning approaches are worth noting. Vertical partitioning: use separate permission tables for different entity types to reduce the data volume and complexity of each table. Horizontal partitioning: split a large table by a field, for example by time, by geography, or by a hash of a key, so that data is spread across multiple partitions and both query and write performance improve.
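Hash-based horizontal partitioning can be sketched in a few lines of Go. The example assumes application-level routing by member ID over 16 partitions. The partition count, the permissions_pNN naming, and the partitionFor function are all assumptions made for illustration, and MySQL can also perform hash partitioning inside the server.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a member ID to one of n partitions using an FNV-1a hash.
// n must be greater than zero.
func partitionFor(memberID string, n uint32) uint32 {
	h := fnv.New32a()
	_, _ = h.Write([]byte(memberID)) // Write on an FNV hash never returns an error
	return h.Sum32() % n
}

func main() {
	const partitions = 16 // assumed partition count
	for _, id := range []string{"user123", "user456", "user789"} {
		// The same member always lands in the same partition.
		fmt.Printf("%s -> permissions_p%02d\n", id, partitionFor(id, partitions))
	}
}
```

Because the mapping depends on the partition count, changing that count later moves most members to a different partition, so the count is worth choosing with growth in mind.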

For the scenario of millions of members considered here, splitting the Permission table is likely to be better for performance and scalability. Creating a separate permission table for each entity type avoids the bottleneck of a single permission table and allows the indexes and queries of each table to be optimized independently. In the following example, Organization, Course, and Unit each get their own permission table: OrganizationPermission, CoursePermission, and UnitPermission. Each permission table records the ID of the relevant entity (OrganizationId, CourseId, UnitId), the ID of the member (MemberId), and the permission type (Permission).


// OrganizationPermission represents the permissions a member has on an organization.
type OrganizationPermission struct {
    OrganizationId string    `gorm:"size:24;index"`
    MemberId       string    `gorm:"size:24;index"`
    Permission     string    `gorm:"size:50"` // Permission type: admin, editor, view
    CreatedAt      time.Time
    UpdatedAt      time.Time
    DeletedAt      gorm.DeletedAt `gorm:"index"`
}

// CoursePermission represents the permissions a member has on a course.
type CoursePermission struct {
    CourseId   string    `gorm:"size:24;index"`
    MemberId   string    `gorm:"size:24;index"`
    Permission string    `gorm:"size:50"` // Permission type: admin, editor, view
    CreatedAt  time.Time
    UpdatedAt  time.Time
    DeletedAt  gorm.DeletedAt `gorm:"index"`
}

// UnitPermission represents the permissions a member has on a unit.
type UnitPermission struct {
    UnitId     string    `gorm:"size:24;index"`
    MemberId   string    `gorm:"size:24;index"`
    Permission string    `gorm:"size:50"` // Permission type: admin, editor, view
    CreatedAt  time.Time
    UpdatedAt  time.Time
    DeletedAt  gorm.DeletedAt `gorm:"index"`
}
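With separate tables, the application has to pick the right table for each entity type, which the common table did not require. A minimal sketch of that routing step follows. The permissionTables map and permissionTable function are hypothetical, and the table names assume GORM's default snake_case plural naming for the three structs above.

```go
package main

import "fmt"

// permissionTables maps an entity type to the table that holds its permissions.
// The names assume GORM's default table naming for the structs above.
var permissionTables = map[string]string{
	"organization": "organization_permissions",
	"course":       "course_permissions",
	"unit":         "unit_permissions",
}

// permissionTable returns the permission table for entityType,
// or an error when the entity type has no permission table.
func permissionTable(entityType string) (string, error) {
	table, ok := permissionTables[entityType]
	if !ok {
		return "", fmt.Errorf("unknown entity type %q", entityType)
	}
	return table, nil
}

func main() {
	for _, entityType := range []string{"organization", "course", "lesson"} {
		table, err := permissionTable(entityType)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(entityType, "->", table)
	}
}
```

Rejecting unknown entity types at this point keeps a mistyped type from silently querying the wrong table.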

In summary, for systems with a large number of users, it is recommended to use relational tables to manage permission data. The indexes and query optimizer of a relational database can efficiently handle large-scale data, ensuring efficient querying. Additionally, the clear relationship model makes data management and maintenance simpler, and it is easier to scale the system.

Through proper database design and optimization, it is possible to ensure good performance and scalability in systems with millions and tens of millions of users.