Use replica database for read-only transactions
Some time ago we had a performance problem caused by generating big reports for users. During one of the firehouse meetings, someone suggested generating the reports from the replica database. We solved the issue differently, but the idea got me interested. In this post, I'll configure a Spring Boot-based application to use the replica database for all read-only transactions.
The problem itself turned out to be relatively easy to fix, which is why the idea of routing all reads to the slave database was never implemented (and in this form it might have caused us more harm than good). But I'd never done anything like that, so I decided to see what can be done and how hard it is to use different database connections for the same persistence unit.
First things first - the architecture. There is a monolithic application which is not easy to split, and as always there are other business priorities. We have two database instances - a master and a slave kept as a backup, ready to be used (everything runs on AWS RDS). The idea is to keep writing data to the master database exactly as we do right now and, when the transaction is read-only, execute the select queries against the read replica.
docker-compose
Before I can even begin I need to set up a testing environment. Here is a docker-compose.yaml which will spin up two MariaDB instances - one running as the master and the second as a slave with replication configured:
version: '3'
services:
  maria-master:
    image: "mariadb:10.4"
    environment:
      MYSQL_ROOT_PASSWORD: master
    ports:
      - 3301:3306
    volumes:
      - ./docker/master-init/init-scripts:/docker-entrypoint-initdb.d
      - ./docker/master-init/cnf:/etc/mysql/conf.d
  maria-slave:
    image: "mariadb:10.4"
    environment:
      MYSQL_ROOT_PASSWORD: slave
    ports:
      - 3302:3306
    volumes:
      - ./docker/slave-init/cnf:/etc/mysql/conf.d
      - ./docker/slave-init/init-scripts:/docker-entrypoint-initdb.d
Replication is configured "just to work" - you probably should not use it as your production replication setup ;). The details of the master and replica configuration files can be found in docker-compose.yaml and in the corresponding folders of the GitHub repository.
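I won't reproduce the exact files here, but a minimal master/slave setup mounted into the directories referenced above could look roughly like this (file names, the repl user and its password are illustrative, not copied from the repository):

# docker/master-init/cnf/*.cnf - enable binary logging on the master
[mysqld]
server-id = 1
log-bin = mysql-bin
binlog-format = ROW

# docker/slave-init/cnf/*.cnf - distinct server id, keep the slave read-only
[mysqld]
server-id = 2
read_only = 1

-- docker/master-init/init-scripts/*.sql - create a replication user
CREATE USER 'repl'@'%' IDENTIFIED BY 'repl';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

-- docker/slave-init/init-scripts/*.sql - point the slave at the master and start replicating
CHANGE MASTER TO MASTER_HOST='maria-master', MASTER_PORT=3306,
  MASTER_USER='repl', MASTER_PASSWORD='repl', MASTER_USE_GTID=slave_pos;
START SLAVE;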
Application boilerplate
With the hardest part already done, it's time to configure the application. Starting from pom.xml, I simply used start.spring.io to generate a Maven project for me :) After that, I added the datasource-proxy dependency to log queries together with the information about which database instance executed the SQL. The complete pom.xml is on GitHub.
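For reference, that library is net.ttddyy:datasource-proxy; a dependency entry might look like this (the version below is only an example - check the repository's pom.xml for the one actually used):

<dependency>
    <groupId>net.ttddyy</groupId>
    <artifactId>datasource-proxy</artifactId>
    <version>1.6</version> <!-- example version, see the repository pom.xml -->
</dependency>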
Application configuration
spring.jpa.hibernate.ddl-auto=none
spring.jpa.generate-ddl=false
spring.jpa.database=mysql
app.datasource.master.jdbcUrl=jdbc:mysql://localhost:3301/sample
app.datasource.master.username=root
app.datasource.master.password=master
app.datasource.master.driverClassName=com.mysql.cj.jdbc.Driver
app.datasource.slave.jdbcUrl=jdbc:mysql://localhost:3302/sample
app.datasource.slave.username=root
app.datasource.slave.password=slave
app.datasource.slave.driverClassName=com.mysql.cj.jdbc.Driver
The only important thing here is that I haven't used the default Spring datasource properties, because I want full control over datasource creation.
Before configuring the application, I have to implement something that decides which database should be used and when. The easiest approach is to use the transaction scope to pick the database instance. I quickly located PlatformTransactionManager#getTransaction, which receives all the information I need in its input parameter. Now I just need to figure out how to change the datasource at this point. Transactions are thread-bound in Spring, so I decided to keep it consistent and switch the datasource based on the thread that is currently connecting to the DB. spring-jdbc ships with a datasource that selects the target datasource based on a custom lookup key (it took me some time to find it ;)), and with this we have everything we need.
Starting with the DataSource router:
class MasterReplicaRoutingDataSource extends AbstractRoutingDataSource {
    private static final ThreadLocal<Type> currentDataSource = new ThreadLocal<>();

    MasterReplicaRoutingDataSource(DataSource master, DataSource slave) {
        Map<Object, Object> dataSources = new HashMap<>();
        dataSources.put(Type.MASTER, master);
        dataSources.put(Type.REPLICA, slave);

        super.setTargetDataSources(dataSources);
        super.setDefaultTargetDataSource(master);
    }

    static void setReadonlyDataSource(boolean isReadonly) {
        currentDataSource.set(isReadonly ? Type.REPLICA : Type.MASTER);
    }

    @Override
    protected Object determineCurrentLookupKey() {
        return currentDataSource.get();
    }

    private enum Type {
        MASTER, REPLICA;
    }
}
Note currentDataSource, which is a ThreadLocal variable - it stores information about the datasource that should be used in the current thread. Spring's implementation does all of the heavy lifting.
Next is the transaction manager implementation that will inform the underlying datasource which database connection should be used:
class ReplicaAwareTransactionManager implements PlatformTransactionManager {
    private final PlatformTransactionManager wrapped;

    ReplicaAwareTransactionManager(PlatformTransactionManager wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public TransactionStatus getTransaction(TransactionDefinition definition) throws TransactionException {
        MasterReplicaRoutingDataSource.setReadonlyDataSource(definition.isReadOnly());
        return wrapped.getTransaction(definition);
    }

    @Override
    public void commit(TransactionStatus status) throws TransactionException {
        wrapped.commit(status);
    }

    @Override
    public void rollback(TransactionStatus status) throws TransactionException {
        wrapped.rollback(status);
    }
}
It acts as a very simple proxy to the real transaction manager (transaction management is really complicated stuff and I don't want to mess it up any more than I have to). I'm just letting my datasource know what kind of transaction we are initializing here.
Spring configuration to use master for writes and replica for read-only transactions
With this in place, all I have to do is wire it all up in the Spring context:
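The complete configuration class is on GitHub. As a rough sketch of how the pieces fit together (the class name, the package to scan and the exact datasource-proxy logging options below are illustrative, not copied from the repository), it can look like this:

@Configuration
@EnableTransactionManagement
class DataSourcesConfiguration {

    // connection pools bound to the app.datasource.* properties
    @Bean
    @ConfigurationProperties("app.datasource.master")
    HikariDataSource masterDataSource() {
        return DataSourceBuilder.create().type(HikariDataSource.class).build();
    }

    @Bean
    @ConfigurationProperties("app.datasource.slave")
    HikariDataSource slaveDataSource() {
        return DataSourceBuilder.create().type(HikariDataSource.class).build();
    }

    // wrap each pool in datasource-proxy so every query is logged with the pool name,
    // then hand both to the routing datasource which picks one per transaction
    @Bean
    @Primary
    DataSource routingDataSource() {
        return new MasterReplicaRoutingDataSource(
                loggingProxy("master", masterDataSource()),
                loggingProxy("replica", slaveDataSource()));
    }

    private DataSource loggingProxy(String name, DataSource dataSource) {
        return ProxyDataSourceBuilder.create(dataSource)
                .name(name)
                .logQueryBySlf4j(SLF4JLogLevel.INFO)
                .build();
    }

    @Bean
    LocalContainerEntityManagerFactoryBean entityManagerFactory(DataSource routingDataSource) {
        LocalContainerEntityManagerFactoryBean emf = new LocalContainerEntityManagerFactoryBean();
        emf.setDataSource(routingDataSource);
        emf.setPackagesToScan("com.example.tasks"); // illustrative package
        emf.setJpaVendorAdapter(new HibernateJpaVendorAdapter());
        return emf;
    }

    @Bean
    PlatformTransactionManager transactionManager(EntityManagerFactory entityManagerFactory) {
        // the real work is done by JpaTransactionManager, we only wrap it
        return new ReplicaAwareTransactionManager(new JpaTransactionManager(entityManagerFactory));
    }
}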
Let's focus on the datasource configuration first. At the top there are the Hikari connection pools for the master and slave database instances. Each pool is then wrapped in the SQL statement logging proxy and finally both are passed to the routing DataSource wrapper, which decides which connection should be used. With this, we have a datasource with nice SQL logging capabilities.
Next comes the entity manager factory and transaction manager configuration. It is pretty standard, except that the real JpaTransactionManager is wrapped with the custom implementation, which delegates all the hard work to it.
With all this configured I can write a simple application and see how it all works:
public static void main(String[] args) {
    ConfigurableApplicationContext ctx = SpringApplication.run(ReadFromReplicatApplication.class, args);
    final TaskRepository taskRepository = ctx.getBean(TaskRepository.class);

    taskRepository.save(new Task("first", "Some task 1"));
    log.info("all tasks: {}", taskRepository.findAll());

    taskRepository.save(new Task("second", "Some task 2"));
    log.info("all tasks: {}", taskRepository.findAll());
}
Running it produces log output in which every query is logged together with the name of the datasource that executed it - the inserts go through the master connection pool, while the read-only findAll queries are served by the replica.
Nested transactions
Added on 2.2.2020. Dušan Zahoranský, thank you for raising this.
When working with nested transactions we should consider the following cases:
outer transaction is readonly and inner transaction is readonly
outer transaction is readwrite and inner transaction is readwrite
outer transaction is readonly and inner transaction is readwrite
outer transaction is readwrite and inner transaction is readonly
Of course nesting can go deeper, but the rules stay the same:
We don't want to write to the replica database, as it might cause issues with synchronization.
We don't want to read data from the replica database right after writing to the DB, as synchronization might not have completed yet.
Of the cases above, the first and second are obvious and we don't have to care about them, as the same datasource will be used the whole time.
The configurations where the transaction type changes are the interesting ones. To check what we can do about them, I've implemented two very simple services with nested transactions:
@Component
public class TaskServiceInner {
    private final TaskRepository taskRepository;

    public TaskServiceInner(TaskRepository taskRepository) {
        this.taskRepository = taskRepository;
    }

    @Transactional(readOnly = true)
    public List<Task> findAllReadonly() {
        return taskRepository.findAll();
    }

    @Transactional
    public void save(Task task) {
        taskRepository.saveAndFlush(task);
    }
}

@Component
public class TaskServiceOuter {
    private final TaskServiceInner taskServiceInner;

    public TaskServiceOuter(TaskServiceInner taskServiceInner) {
        this.taskServiceInner = taskServiceInner;
    }

    @Transactional(readOnly = true)
    public List<Task> saveAndFindAllReadonly() {
        taskServiceInner.save(new Task("a task", "some description"));
        return taskServiceInner.findAllReadonly();
    }

    @Transactional
    public List<Task> saveAndFindAllReadwrite() {
        taskServiceInner.save(new Task("a task", "some description"));
        return taskServiceInner.findAllReadonly();
    }
}
Complete TaskServiceInner.java and TaskServiceOuter.java are on GitHub.
As the names suggest, the outer service initializes the outermost transaction type and delegates all the work to the inner service, which uses different transaction types (readonly and readwrite).
Going back to the transaction configuration, we can check what happens when the outer transaction is readonly and the inner one is readwrite by calling outerService.saveAndFindAllReadonly. With MySQL, here are the logs produced:
Name:replica, Time:2, Success:False, Type:Prepared, Batch:False, QuerySize:1, BatchSize:0, Query:["insert into task (created_at, description, title) values (?, ?, ?)"], Params:[(2020-02-02 18:26:18.553,some description,a task)]
SQL Error: 0, SQLState: S1009
ERROR Connection is read-only. Queries leading to data modification are not allowed
This is not good (but not as bad as it might have been).
It tries to do an insert on the replica database; luckily the MySQL driver handles this and throws java.sql.SQLException: Connection is read-only. Queries leading to data modification are not allowed.
The problem is that this might not work with all databases (it depends on the driver - Spring and Hibernate don't handle it) and might vary from driver to driver depending on its internal implementation.
The best option is to test it with your database and/or add a safety check to ReplicaAwareTransactionManager#getTransaction.
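A minimal sketch of such a check - assuming the read-only flag of the already-running transaction can be read from Spring's TransactionSynchronizationManager (the exact implementation in the repository may differ):

@Override
public TransactionStatus getTransaction(TransactionDefinition definition) throws TransactionException {
    boolean isTxActive = TransactionSynchronizationManager.isActualTransactionActive();
    boolean currentTxIsReadonly = TransactionSynchronizationManager.isCurrentTransactionReadOnly();

    // a readwrite transaction requested inside an already running readonly one
    // would silently end up on the replica - fail fast instead
    if (isTxActive && currentTxIsReadonly && !definition.isReadOnly()) {
        throw new CannotCreateTransactionException(
                "Can not request a readwrite transaction from inside a readonly (replica) transaction");
    }

    MasterReplicaRoutingDataSource.setReadonlyDataSource(definition.isReadOnly());
    return wrapped.getTransaction(definition);
}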
It throws an exception when a transaction is already active, the currently selected datasource is the readonly one, and the requested transaction is readwrite. Since the exception is thrown whenever the transaction type changes from readonly to readwrite, you should notice the problem while testing your application.
The next case to consider is when the outer transaction is readwrite and the inner transaction is readonly. We can test it by calling outerService.saveAndFindAllReadwrite:
Name:master, Time:2, Success:True, Type:Prepared, Batch:False, QuerySize:1, BatchSize:0, Query:["insert into task (created_at, description, title) values (?, ?, ?)"], Params:[(2020-02-02 18:35:51.247,some description,a task)]
Name:master, Time:5, Success:True, Type:Prepared, Batch:False, QuerySize:1, BatchSize:0, Query:["select task0_.task_id as task_id1_0_, task0_.created_at as created_2_0_, task0_.description as descript3_0_, task0_.title as title4_0_ from task task0_"], Params:[()]
save & findAll [Task{title='a task', description='some description'}, Task{title='a task', description='some description'}, Task{title='a task', description='some description'}, Task{title='a task', description='some description'}, Task{title='a task', description='some description'}, Task{title='a task', description='some description'}, Task{title='a task', description='some description'}, Task{title='a task', description='some description'}, Task{title='a task', description='some description'}, Task{title='a task', description='some description'}]
Not exactly what I expected, but it works quite well.
Both the insert and the select are executed on the master database, which surprised me, because while debugging I saw getTransaction being called multiple times with different transaction configurations.
After a bit more debugging I noticed that the datasource is acquired only once per transaction, and as a result the master database is used because the outermost transaction is readwrite.
So the master database was used throughout the whole transaction, although under the hood the transaction definition changed a couple of times…
I don't like depending on vendor implementation details like this, and I'd prefer to introduce a safety check just to be sure. Also, I'm not intimately familiar with how the integration between Spring and Hibernate works, so I'd rather add an additional layer of protection in case something changes in future versions of Spring or Hibernate.
Adding one simple if statement to ReplicaAwareTransactionManager#getTransaction should do the trick.
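Again as a sketch (building on the check from the previous section; the repository version may differ): the datasource is only selected when no transaction is running yet, so nested transactions reuse whatever the outermost one picked.

@Override
public TransactionStatus getTransaction(TransactionDefinition definition) throws TransactionException {
    boolean isTxActive = TransactionSynchronizationManager.isActualTransactionActive();

    if (isTxActive && TransactionSynchronizationManager.isCurrentTransactionReadOnly() && !definition.isReadOnly()) {
        throw new CannotCreateTransactionException(
                "Can not request a readwrite transaction from inside a readonly (replica) transaction");
    }

    if (!isTxActive) {
        // pick the datasource only for the outermost transaction;
        // nested transactions keep using whatever was selected here
        MasterReplicaRoutingDataSource.setReadonlyDataSource(definition.isReadOnly());
    }

    return wrapped.getTransaction(definition);
}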
Implementing the above changes should make things safe: once a datasource is picked for the current transaction, it will not be changed no matter how many different transaction types are used inside it.
Drawbacks
The biggest drawback is that with this solution we have no guarantee of when saved data will be replicated to the slave instance - it's eventual consistency (depending on the replication configuration). This means that if you serve one request with two transactions (one saving data and another reading it), you might not be able to read what you've just written. In some cases that might be a serious issue, because the routing is applied globally to all transactions. It can be avoided by introducing some additional annotations, but that's out of scope here :) If you are interested in how to do it, you can check out spring.factories and how to create a proxy for a class.
Summary
The solution might not be the simplest one, but it turned out to be much less complicated than I initially thought. I'm happy we didn't have to deliver this to production - the above solution would require a lot of testing to make sure it works correctly (multithreaded code is always hard). I'm also positively surprised by how easy it was to redirect all read-only transactions to the replica database.
Working samples can be found on my GitHub.