Nov. 8, 2023
Today I learned about Git’s --force-with-lease push option.
To restate the docs in my own words: it lets you rebase and force push with less risk than --force. Suppose you’d pushed your changes to a remote branch, and somebody else pushed new commits to that same branch while you were rebasing. After you finish rebasing, a normal git push will fail because the local refs don’t match the remote ones. Usually people reach for git push --force at this point, because they were just rebasing their own branch. However, that would wipe away the other person’s commits on the remote branch. This is easy to do if you’re working on a branch and don’t realize that somebody else had committed to it. --force-with-lease protects against exactly this: the push is refused if the remote ref doesn’t match your remote-tracking ref, i.e. if the remote has commits you haven’t fetched yet.
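Here’s a minimal sketch of the scenario. The repo paths, branch name, and commit messages are all made up for the demo, and it assumes a reasonably recent git (2.28+ for init.defaultBranch):

```shell
# Two clones of the same remote; "bob" pushes while "alice" rewrites history.
set -e
tmp=$(mktemp -d)
git -c init.defaultBranch=main init -q --bare "$tmp/origin.git"
git -c init.defaultBranch=main init -q "$tmp/alice"
# Helper: run git in one of the working copies with a throwaway identity.
g() { repo=$1; shift; git -C "$tmp/$repo" -c user.email=til@example.com -c user.name=til "$@"; }

echo base > "$tmp/alice/f.txt"
g alice add f.txt
g alice commit -qm "base"
g alice remote add origin "$tmp/origin.git"
g alice push -q -u origin main
git clone -q "$tmp/origin.git" "$tmp/bob"

# Bob pushes a new commit while Alice rewrites her history locally.
echo bob >> "$tmp/bob/f.txt"
g bob commit -qam "bob's commit"
g bob push -q origin main
g alice commit -q --amend -m "rewritten base"

# A plain --force would discard Bob's commit. --force-with-lease is
# rejected because origin/main moved since Alice last fetched.
if g alice push --force-with-lease origin main >/dev/null 2>&1; then
  lease_result=accepted
else
  lease_result=rejected
fi
echo "force-with-lease: $lease_result"
```

The push is rejected with “stale info”, and Bob’s commit survives on the remote. After Alice fetches (and sees Bob’s work), her remote-tracking ref matches again, so a second --force-with-lease would go through — this time as a deliberate overwrite rather than an accident.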
Nov. 7, 2023
Today I learned that SELECT DISTINCT queries can cause efficiency problems.
I was looking at some code that made a few joins across tables and applied DISTINCT to the selected columns. The query was causing high CPU load, and it wasn’t immediately obvious at first glance why. You could imagine the query looked like this (with a few more joins and selects):
SELECT DISTINCT transactions.date, transactions.total_amount, transaction_items.amount, transaction_items.name
FROM transactions
INNER JOIN transaction_items ON transaction_items.transaction_id = transactions.id
WHERE transactions.date BETWEEN '2023-01-01' AND '2023-12-31';
From a high level, to execute this statement the database will filter for the rows we need, grab the requested columns from those rows, and return the unique combinations. That seems simple enough for a database as long as the date column is indexed. However, the important part to consider is what the database must do to ensure all the returned data is unique: it has to hold all of those rows in memory (or a temporary table) and spend CPU time sorting them before removing the duplicates. Without an index that already guarantees uniqueness, that sort has to run over the entire result set, which gets expensive as the row count grows.
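One way I could have spotted this sooner is to ask MySQL for the query plan. This is a sketch against the example query above (same assumed schema); for a DISTINCT across joined tables like this, the plan typically makes the deduplication work visible:

```sql
-- For DISTINCT over joined tables, the Extra column of the plan
-- typically reports "Using temporary" (and often "Using filesort"):
-- every matching row is written to a temporary table, then sorted
-- or hashed to drop the duplicates.
EXPLAIN
SELECT DISTINCT transactions.date, transactions.total_amount,
       transaction_items.amount, transaction_items.name
FROM transactions
INNER JOIN transaction_items ON transaction_items.transaction_id = transactions.id
WHERE transactions.date BETWEEN '2023-01-01' AND '2023-12-31';
```

Seeing "Using temporary" on a query that scans a big date range is a strong hint that the DISTINCT, rather than the joins themselves, is where the CPU time is going.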
Aug. 21, 2023
Today I learned about MySQL transaction isolation levels.
I was trying to debug issues where concurrent processes didn’t seem to be locking rows in the database correctly. A simplified version of the offending code could look something like this (comments added for those unfamiliar with Rails):
# Reader: Assume the record we're processing is not processed
# Query for the row with ID 123 for table some_active_record_classes
some_active_record_class = SomeActiveRecordClass.where(id: 123).first
ActiveRecord::Base.transaction do # Open a database transaction
# Query for associated data from another table processed_items
processed_items = ProcessedItem.where(foreign_key: some_active_record_class.id)
# Lock the row with ID 123
some_active_record_class.lock!
# Requery the database for ID 123 to ensure we have the most up to date data after locking
some_active_record_class.reload
# Early exit if the record is processed
return if some_active_record_class.processed == true
# Simulate long running logic by waiting for 5 seconds
sleep(5)
# Update the DB
some_active_record_class.update!(processed: true, processed_count: processed_items.size)
end # Release the lock, commit, and close the transaction
When running that code concurrently, you’d expect only one process to do the update and the other process(es) to exit immediately after grabbing the lock. However, we were seeing all of the processes update the database. Originally we thought the problem was that we weren’t taking a lock at all, or that we weren’t taking it correctly. We were right on one point of that hypothesis: we were taking a lock, but the logic around it was incorrect.
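Here’s my reading of why, based on the InnoDB documentation (the table and column names below are stand-ins matching the example above). Under MySQL’s default REPEATABLE READ isolation level, plain SELECTs inside a transaction are served from a consistent snapshot established by the transaction’s first read, while locking reads like SELECT ... FOR UPDATE (which is what lock! issues) always see the latest committed version. Roughly what the ORM calls translate to:

```sql
-- Session A, MySQL/InnoDB at the default REPEATABLE READ level:
START TRANSACTION;

-- First read establishes the transaction's consistent snapshot.
SELECT * FROM processed_items WHERE foreign_key = 123;

-- ...meanwhile, Session B commits processed = 1 for row 123...

-- Locking read (what lock! issues): sees the latest committed
-- version, processed = 1.
SELECT * FROM some_active_record_classes WHERE id = 123 FOR UPDATE;

-- Plain read (what a bare reload issues): served from the old
-- snapshot, so it still sees processed = 0.
SELECT * FROM some_active_record_classes WHERE id = 123;

COMMIT;
```

If the early-exit check reads through that plain reload, it can see the stale snapshot value even though the lock itself was acquired correctly — which would match the behavior we were seeing.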