Nov. 8, 2023
Today I learned about Git’s --force-with-lease push option.
To restate the docs in my own words: it lets you rebase and force push with less risk than --force. Suppose you’d pushed your changes to a remote branch, and somebody else pushed new commits to that same branch while you were rebasing. After you finish rebasing, a normal git push will fail because the local refs don’t match the remote ones. Usually people reach for git push --force at this point, because they were just rebasing their own branch. However, that would wipe away the other person’s commits on the remote branch. This is easy to do if you’re working on a branch and don’t realize that somebody else had committed to it. --force-with-lease protects against exactly this: the push is refused if the remote ref doesn’t match your remote-tracking ref, i.e. if the remote has commits you haven’t fetched yet.
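Here’s a minimal sketch of the scenario. The repo paths, branch name, and commit messages are all made up for the demo, and it assumes a reasonably recent git (2.28+ for init.defaultBranch):

```shell
# Two clones of the same remote; "bob" pushes while "alice" rewrites history.
set -e
tmp=$(mktemp -d)
git -c init.defaultBranch=main init -q --bare "$tmp/origin.git"
git -c init.defaultBranch=main init -q "$tmp/alice"
# Helper: run git in one of the working copies with a throwaway identity.
g() { repo=$1; shift; git -C "$tmp/$repo" -c user.email=til@example.com -c user.name=til "$@"; }

echo base > "$tmp/alice/f.txt"
g alice add f.txt
g alice commit -qm "base"
g alice remote add origin "$tmp/origin.git"
g alice push -q -u origin main
git clone -q "$tmp/origin.git" "$tmp/bob"

# Bob pushes a new commit while Alice rewrites her history locally.
echo bob >> "$tmp/bob/f.txt"
g bob commit -qam "bob's commit"
g bob push -q origin main
g alice commit -q --amend -m "rewritten base"

# A plain --force would discard Bob's commit. --force-with-lease is
# rejected because origin/main moved since Alice last fetched.
if g alice push --force-with-lease origin main >/dev/null 2>&1; then
  lease_result=accepted
else
  lease_result=rejected
fi
echo "force-with-lease: $lease_result"
```

The push is rejected with “stale info”, and Bob’s commit survives on the remote. After Alice fetches (and sees Bob’s work), her remote-tracking ref matches again, so a second --force-with-lease would go through — this time as a deliberate overwrite rather than an accident.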
Nov. 7, 2023
Today I learned that SELECT DISTINCT queries can cause efficiency problems.
I was looking at some code that made a few joins across tables and applied DISTINCT to the selected columns. The query was causing high CPU load, and it wasn’t immediately obvious at first glance why. You could imagine the query looked like this (with a few more joins and selects):
SELECT DISTINCT transactions.date, transactions.total_amount, transaction_items.amount, transaction_items.name
FROM transactions
INNER JOIN transaction_items ON transaction_items.transaction_id = transactions.id
WHERE transactions.date BETWEEN '2023-01-01' AND '2023-12-31';
From a high level, to execute this statement the database will filter for the rows we need, grab the requested columns from those rows, and return the unique combinations. That seems simple enough for a database as long as the date column is indexed. However, the important part to consider is what the database must do to ensure all the returned data is unique: it has to hold all of those rows in memory (or a temporary table) and spend CPU time sorting them before removing the duplicates. Without an index that already guarantees uniqueness, that sort has to run over the entire result set, which gets expensive as the row count grows.
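One way I could have spotted this sooner is to ask MySQL for the query plan. This is a sketch against the example query above (same assumed schema); for a DISTINCT across joined tables like this, the plan typically makes the deduplication work visible:

```sql
-- For DISTINCT over joined tables, the Extra column of the plan
-- typically reports "Using temporary" (and often "Using filesort"):
-- every matching row is written to a temporary table, then sorted
-- or hashed to drop the duplicates.
EXPLAIN
SELECT DISTINCT transactions.date, transactions.total_amount,
       transaction_items.amount, transaction_items.name
FROM transactions
INNER JOIN transaction_items ON transaction_items.transaction_id = transactions.id
WHERE transactions.date BETWEEN '2023-01-01' AND '2023-12-31';
```

Seeing "Using temporary" on a query that scans a big date range is a strong hint that the DISTINCT, rather than the joins themselves, is where the CPU time is going.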
Aug. 21, 2023
Today I learned about MySQL transaction isolation levels.
I was trying to debug issues where concurrent processes didn’t seem to be locking rows in the database correctly. A simplified version of the offending code could look something like this (comments added for those unfamiliar with Rails):
# Reader: Assume the record we're processing is not processed
# Query for the row with ID 123 for table some_active_record_classes
some_active_record_class = SomeActiveRecordClass.where(id: 123).first
ActiveRecord::Base.transaction do # Open a database transaction
# Query for associated data from another table processed_items
processed_items = ProcessedItem.where(foreign_key: some_active_record_class.id)
# Lock the row with ID 123
some_active_record_class.lock!
# Requery the database for ID 123 to ensure we have the most up to date data after locking
some_active_record_class.reload
# Early exit if the record is processed
return if some_active_record_class.processed == true
# Simulate long running logic by waiting for 5 seconds
sleep(5)
# Update the DB
some_active_record_class.update!(processed: true, processed_count: processed_items.size)
end # Release the lock, commit, and close the transaction
When running that code concurrently, you’d expect only one process to do the update and the other process(es) to exit immediately after grabbing the lock. However, we were seeing all of the processes update the database. Originally we thought the problem was that we weren’t taking a lock at all, or that we weren’t taking it correctly. We were right on one point of that hypothesis: we were taking a lock, but the logic around it was incorrect.
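Here’s my reading of why, based on the InnoDB documentation (the table and column names below are stand-ins matching the example above). Under MySQL’s default REPEATABLE READ isolation level, plain SELECTs inside a transaction are served from a consistent snapshot established by the transaction’s first read, while locking reads like SELECT ... FOR UPDATE (which is what lock! issues) always see the latest committed version. Roughly what the ORM calls translate to:

```sql
-- Session A, MySQL/InnoDB at the default REPEATABLE READ level:
START TRANSACTION;

-- First read establishes the transaction's consistent snapshot.
SELECT * FROM processed_items WHERE foreign_key = 123;

-- ...meanwhile, Session B commits processed = 1 for row 123...

-- Locking read (what lock! issues): sees the latest committed
-- version, processed = 1.
SELECT * FROM some_active_record_classes WHERE id = 123 FOR UPDATE;

-- Plain read (what a bare reload issues): served from the old
-- snapshot, so it still sees processed = 0.
SELECT * FROM some_active_record_classes WHERE id = 123;

COMMIT;
```

If the early-exit check reads through that plain reload, it can see the stale snapshot value even though the lock itself was acquired correctly — which would match the behavior we were seeing.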