Database Guidelines for Golang Projects
Introduction
This document provides comprehensive guidelines for database design and interaction within Golang projects. It covers naming conventions, table design principles, SQL writing best practices, transaction management, and performance considerations. Adhering to these guidelines will promote consistency, maintainability, and optimal performance in your database interactions.
Scope
These guidelines primarily focus on MySQL databases, but many principles are applicable to other relational database systems as well. Specific instructions for Redis (caching layer) can be found in separate documentation.
Target Audience
These guidelines are intended for all developers working on Golang backend projects that interact with databases.
Naming Conventions
Strict adherence to naming conventions is mandatory. Review these conventions before designing database resources.
General Considerations:
- Avoid using any MySQL keywords or reserved words for database, table, field, or index names. For example, don't use `type` as a field name; use `user_type` instead.
- Use only English for naming and comments. Avoid non-English characters.
- Keep names concise and descriptive.
Database (DB)
- Format: `lower_case_with_underscores` with suffix `_db`. Examples: `account_db`, `rent_account_db`, `gop_txn_db`, `rent_admin_db`
- Prefix: Use the project code as a prefix. Example: `rent_account_db`
- Region specific: If the database is region-specific, add the region code before `_db`. Example: `rent_admin_vn_db`
- Sharding: For sharded databases, append the shard ID as a suffix after `_db`, padded to 8 digits. Follow the naming conventions in the rent Database Design Guide. Examples: `rent_account_db_00000000`, `log_db_00002018`
- Length: The database name should be shorter than 42 characters, due to DNS restrictions when connecting to the database.
Table
- Format: `lower_case_with_underscores` with suffix `_tab`. Example: `user_tab`
- Sharding: For sharded tables, append the shard ID as a suffix after `_tab`, padded to 8 digits, starting from 0. Examples: `account_tab_00000000`, `order_tab_00001000`, `login_log_tab_00201801`, `audit_log_tab_20180101`
- Length: The table name should be shorter than 48 characters.
Field
- Format: `lower_case_with_underscores`. Examples: `order_id`, `create_time`
- Foreign key reference: If the field references another table's integer primary key, suffix it with `_id`.
Index
- Format: `lower_case_with_underscores`, joined keys with the prefix `idx_`. Examples: `idx_order_id`, `idx_key1_key2`
- Unique index: For unique indexes, use the prefix `uniq_idx_`. Example: `uniq_idx_email`
Table Design Guidelines
These guidelines are essential to follow before creating a database schema.
| Category | Rule | Description |
|---|---|---|
| Charset & Engine | MUST | Charset: `utf8mb4`; engine: InnoDB. `utf8mb4` is essential for supporting a wide range of characters. |
| Table Collation | RECOMMENDED | Use `utf8mb4_unicode_ci` (recommended for most cases) or `latin1_general_ci` (only for legacy systems or specific performance reasons, and after careful consideration). |
| Field Collation | MUST | Use a `*_cs` suffix in the field name if the field requires case-sensitive collation. Example: `email_cs`. Be mindful of the performance implications of case-sensitive collations. |
| Primary Key | MUST | If the primary key is not sequential and data is not deleted frequently, create an auto-incrementing primary key column named `id` of type `BIGINT UNSIGNED NOT NULL AUTO_INCREMENT`. This aids indexing and performance. |
| Foreign Keys | MUST | Avoid foreign keys in MySQL due to performance overhead and complexities in table sharding. Handle relationships in application code. |
| Table Sharding | RECOMMENDED | Use table sharding when data size exceeds 10 GB or the number of records exceeds 10 million. Choose the sharding strategy carefully based on data access patterns. |
| Triggers / Stored Procedures | MUST | Avoid events, triggers, and stored procedures. Complex logic should reside in the application layer for better maintainability and portability. |
| Common Columns | MUST | Columns like `user_id`, `shop_id`, `order_id`, and `item_id` must be defined as `BIGINT(20) NOT NULL`. Use `BIGINT` to accommodate large numbers. |
| Timestamp | MUST | Use `INT UNSIGNED` or `BIGINT UNSIGNED` for UNIX timestamps in seconds. For UNIX timestamps in milliseconds, use `BIGINT UNSIGNED`. Choose the type based on the expected range of values. |
| String Types | MUST | Use `CHAR` for fixed-length strings and `VARCHAR` for variable-length strings. Use `TEXT`, `MEDIUMTEXT`, `LONGTEXT`, or `BLOB` for strings longer than 5000 characters or of highly variable size. Match the type to the expected length and minimize size growth. |
| String Types | RECOMMENDED | The InnoDB page size is 16 KB. Consider separating frequently accessed columns from large, rarely accessed columns to optimize disk I/O. |
| Numeric Types | RECOMMENDED | Prefer numeric types that don't lose precision (`INT`, `BIGINT`, `DECIMAL`) over types that may (`FLOAT`, `DOUBLE`). Prefer `UNSIGNED` types where negative values are not required, to increase the maximum representable positive value. |
| Default Values | RECOMMENDED | Don't set default values unless genuinely necessary, especially when adding new fields to existing tables, to prevent unnecessary schema changes during upgrades/rollbacks. Handle defaults in application logic instead. |
| Nullability | RECOMMENDED | Avoid `NULL` unless it represents a meaningful "unknown" or "missing" state. Use `NOT NULL` with a default value where applicable (except for `TEXT`, `MEDIUMTEXT`, `LONGTEXT`, `BLOB`). This improves query performance and reduces complexity. |
| ENUM Type | MUST | Do not use the `ENUM` type. Use `VARCHAR` with a `CHECK` constraint or a lookup table for better flexibility and maintainability. |
| Add Column | MUST | When adding a new column, don't specify its location (`FIRST` / `AFTER`); append it as the last column. Explicitly specifying column order can cause issues with schema evolution. |
| Indexes | RECOMMENDED | Create indexes on columns with high selectivity (many distinct values). Put the most selective column first in composite indexes. Avoid indexing every column, as this degrades write performance. |
| Indexes | RECOMMENDED | Create prefix indexes for long `VARCHAR`/`TEXT` columns. Example: `INDEX idx_long_text (long_text(100))` indexes only the first 100 characters, saving space and improving performance. |
| JSON Columns | RECOMMENDED | MySQL 5.7+ supports JSON columns. Consider them for semi-structured data, but be aware of the performance implications. Index specific elements within the JSON document using virtual columns. |
| SQL Usage - Queries | MUST | Avoid implicit data-type conversion in `WHERE` clauses; it can prevent indexes from being used. Example: `WHERE id = 123` (integer) is preferred over `WHERE id = '123'` (string). |
| SQL Usage - Queries | MUST | Do not use `INSERT ... ON DUPLICATE KEY UPDATE`; it can cause deadlocks in older MySQL 5.7 releases (e.g., 5.7.26; fixed in later versions). Use `INSERT IGNORE` or separate `SELECT` and `UPDATE` statements to handle conflict resolution instead. |
| SQL Usage - Queries | MUST | Avoid subqueries (especially correlated subqueries), `WHERE NOT IN`, `!=`, `NOT LIKE`, and leading-wildcard patterns such as `WHERE ... LIKE '%abc%'`. These constructs often perform poorly. Explore alternatives such as joins, `EXISTS`, and full-text search. |
| SQL Usage - Queries | MUST | For efficient pagination, avoid `OFFSET` with large values. Use cursor-based pagination (e.g., filtering on the last seen ID or timestamp). Always pair `LIMIT` with an `ORDER BY` clause, and include a unique column in the `ORDER BY` to prevent skipped or duplicated results. |
| SQL Usage - Queries | RECOMMENDED | Always use `ORDER BY` when ordering is logically required; the order of records is not guaranteed without an explicit `ORDER BY` clause. |
| Cache Layer | RECOMMENDED | For user-facing applications, implement a caching layer (e.g., Redis) to reduce database load and improve response times. Implement rate limiting to prevent abuse. Consider read-write separation to direct reads to replicas and writes to the primary instance. |
| Delete/Update | RECOMMENDED | When writing data-manipulation scripts, ensure operations don't exceed 1000 rows per second, maintain a 1-second interval between operations, and keep the MySQL instance's QPS (queries per second) below 3k/s to avoid overloading the server. |
| Transaction Management | MUST | Split large transactions into smaller batches of updates to minimize lock contention and improve concurrency. |
| Transaction Management | RECOMMENDED | Minimize external dependencies (e.g., RPC calls, Codis/Copi2 operations) within transactions to reduce the risk of failures and long-running transactions. |
| Transaction Isolation | RECOMMENDED | Use an appropriate isolation level, adhering to ACID principles. MySQL's default is `REPEATABLE READ` (RR). Consider `READ COMMITTED` for read-heavy workloads and shorter transactions, but be aware of the possibility of non-repeatable reads. |
| Data Archiving | RECOMMENDED | Implement a data archiving strategy to move infrequently accessed data to a separate storage system. This improves the performance of the primary database. |
| Data Masking / Anonymization | MUST | When dealing with sensitive data (e.g., PII), implement data masking or anonymization techniques to protect user privacy and comply with regulations. |
| Monitoring | MUST | Implement robust monitoring of database performance metrics such as query execution time, connection-pool usage, and resource utilization. Set up alerts to notify administrators of potential issues. |
SQL Writing Best Practices
Beyond the table design guidelines, here are some best practices for writing SQL queries:
- Use Prepared Statements: Always use prepared statements to prevent SQL injection vulnerabilities and improve performance (by reusing query execution plans). Most Golang database drivers provide good support for prepared statements.
- Explain Your Queries: Use `EXPLAIN` to analyze query execution plans and identify potential bottlenecks. Pay attention to the indexes being used, the number of rows examined, and the join types.
- Limit Result Sets: Avoid selecting all columns (`SELECT *`) unless absolutely necessary. Specify only the columns you need to reduce data transfer and memory usage.
- Use Joins Wisely: Understand the different types of joins (INNER, LEFT, RIGHT, FULL) and choose the appropriate join for the desired result set. Optimize join performance by ensuring that joined columns are indexed.
- Optimize `WHERE` Clauses: Write efficient `WHERE` clauses that use indexes effectively. Avoid using functions or calculations in the `WHERE` clause that prevent index usage.
- Batch Operations: When inserting, updating, or deleting multiple rows, use batch operations (e.g., `INSERT INTO ... VALUES (...), (...), (...)`) to reduce the number of round trips to the database.
- Connection Pooling: Use connection pooling to manage database connections efficiently. This avoids the overhead of creating a new connection for each query. The `database/sql` package in Go provides built-in connection pooling.
Transaction Management
Transactions are crucial for maintaining data integrity.
- Start and Commit/Rollback Explicitly: Always start transactions explicitly and either commit or roll back based on the outcome of the operations within the transaction.
- Keep Transactions Short: Long-running transactions can lead to lock contention and performance issues. Keep transactions as short as possible by performing only the necessary operations within them.
- Handle Errors Properly: Handle errors within transactions and roll back the transaction if any error occurs.
- Use Appropriate Isolation Levels: Choose the appropriate isolation level based on the application's requirements. The default isolation level (REPEATABLE READ) provides good consistency, but higher isolation levels can impact performance.
Examples
This section provides examples of common SQL patterns and how to optimize them:
Example 1: Efficient Pagination
Inefficient (using `OFFSET`):

```sql
SELECT * FROM users ORDER BY id LIMIT 10 OFFSET 100000;
```