Skip to content

add db cleanup for PyPy#472

Open
mattip wants to merge 2 commits intopython:mainfrom
mattip:sqlachemy-expunge_all
Open

add db cleanup for PyPy#472
mattip wants to merge 2 commits intopython:mainfrom
mattip:sqlachemy-expunge_all

Conversation

@mattip
Copy link
Copy Markdown
Contributor

@mattip mattip commented May 10, 2026

Claude helped me come up with this fix for hundreds of emitted warnings, with the justification:

SQLite reuses primary key IDs after a full table DELETE (it has no AUTOINCREMENT sequence to preserve), the next loop iteration inserts new Person and Address objects with the same IDs (1..N) as the previous run.

On CPython this is harmless: refcounting immediately collects the old objects when new_person is reassigned each iteration, clearing the identity map's weakrefs before the next loop. On PyPy, GC is deferred - the old objects remain alive, so the identity map still holds live entries for IDs 1..N. When the new objects are committed with the same keys, SQLAlchemy's _register_persistent detects the collision and emits:

SAWarning: Identity map already had an identity for (Person, (1,), None), replacing it with newly flushed object. Are there load operations occurring inside of an event handler within the flush?

This warning fires for every inserted row on every benchmark loop, flooding stderr and causing the benchmark to fail on PyPy.

Fix: call session.expunge_all() immediately after the bulk deletes to explicitly clear the stale session state before timing begins.

It would be nice to have someone who understands more about sqlalchemy than me review this for correctness, I am not sure this is the right place for a fix.

# drop rows created by the previous benchmark
session.query(Person).delete(synchronize_session=False)
session.query(Address).delete(synchronize_session=False)
session.expunge_all()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be a more realistic test to use a new session for each loop here

like

for loops in range(loops):
    with DBSession() as session:
         # everything else here

cheaper than calling expunge_all() and makes it clear where we're starting the operation

total_dt = 0.0

for loops in range(loops):
with DBSession() as session:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool. does it work / fix the problem? this "with" syntax for the session was introduced probably long after this test suite was written

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this fixes the warning/crash.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants