After comprehensive analysis and implementation attempts, meaningful performance improvements through parallelization are NOT achievable for the move-file generator, due to JavaScript's single-threaded execution model and the synchronous Nx Tree API.
The existing optimizations (AST caching, file tree caching, pattern analysis, smart file cache) already provide excellent performance (~5-6ms per file in large workspaces).
Baseline benchmarks (before parallelization attempts):
- Small file move: ~1935ms
- Medium file move: ~2076ms
- Large file move: ~2620ms
- Moving 10 small files: ~2022ms (~202ms per file)
- Moving 15 files with glob patterns: ~2057ms (~137ms per file)
- Moving file with 20 imports: ~2128ms
- Moving file with 50 irrelevant files: ~2030ms
- 10+ projects cross-project move: ~42859ms
- 100+ large files: ~5184ms (~51ms per file)
- 50 intra-project dependencies: ~2253ms (~45ms per import)
- Large workspace (15 projects, 450 files): ~2619ms (~5.82ms per file)
Benchmarks after parallelization attempts:
- Small file move: ~1940ms (no change)
- Medium file move: ~2053ms (no change)
- Large file move: ~2653ms (no change)
- Moving 10 small files: ~2023ms (~202ms per file) (no change)
- Moving 15 files with glob patterns: ~2051ms (~137ms per file) (no change)
- Moving file with 20 imports: ~2138ms (no change)
- Moving file with 50 irrelevant files: ~2019ms (no change)
- 10+ projects cross-project move: ~42039ms (no change)
- 100+ large files: ~5339ms (~53ms per file) (no change)
- 50 intra-project dependencies: ~2292ms (~46ms per import) (no change)
- Large workspace (15 projects, 450 files): ~2768ms (~6.2ms per file) (no change)
- All synchronous operations execute sequentially regardless of Promise.all() usage
- The Nx Tree API methods (read, write, delete) are synchronous
- jscodeshift supports async transforms (returns Promise), but our current implementation uses synchronous operations within transforms
- Only async I/O operations benefit from Promise-based concurrency
```typescript
// These operations are synchronous, so Promise.all doesn't help
tree.write(filePath, content); // Synchronous
tree.delete(filePath); // Synchronous
tree.read(filePath, 'utf-8'); // Synchronous
```

Note on jscodeshift async support: While jscodeshift transforms CAN be async (returning a Promise), the bottleneck in our implementation is the synchronous Nx Tree API, not the transform execution itself. Making our transforms async would not improve performance because the tree operations remain synchronous.
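The point can be demonstrated in isolation. In this sketch (illustrative names only, not the generator's code), the synchronous callbacks passed to `.map()` run to completion, in order, before `Promise.all()` ever sees a value:

```typescript
// Sketch: Promise.all() over synchronous callbacks does not parallelize.
// Each callback runs to completion inside .map(), strictly in order,
// before Promise.all() is even invoked.
const order: string[] = [];

function syncWork(id: string): string {
  order.push(`start:${id}`);
  // ...CPU-bound work would happen here...
  order.push(`end:${id}`);
  return id;
}

// .map() executes syncWork('a') fully, then syncWork('b') fully.
const results = ['a', 'b'].map((id) => syncWork(id));

// By this point `order` is already complete -- nothing ran concurrently.
Promise.all(results);
// order: ['start:a', 'end:a', 'start:b', 'end:b']
```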
The current implementation already includes:
- AST Cache: Prevents redundant parsing
- File Tree Cache: Avoids repeated traversals
- Pattern Analysis Cache: Optimizes glob operations
- Smart File Cache: Caches file existence checks
These optimizations already provide near-optimal performance.
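As an illustration of the caching approach (a hypothetical sketch; the names and keying scheme are invented, not taken from the actual implementation), a content-keyed cache avoids re-parsing unchanged files:

```typescript
// Hypothetical content-keyed AST cache -- illustrative only,
// not the generator's actual cache implementation.
const astCache = new Map<string, unknown>();
let parseCount = 0;

// Stand-in for a real parser (e.g., jscodeshift); counts invocations.
function parse(source: string): unknown {
  parseCount++;
  return { source }; // pretend AST
}

function parseWithCache(filePath: string, content: string): unknown {
  const key = `${filePath}:${content}`; // a real cache would hash the content
  let ast = astCache.get(key);
  if (ast === undefined) {
    ast = parse(content);
    astCache.set(key, ast);
  }
  return ast;
}

// Second call with identical content hits the cache; parse runs once.
parseWithCache('libs/ui/src/index.ts', 'export const a = 1;');
parseWithCache('libs/ui/src/index.ts', 'export const a = 1;');
```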
jscodeshift DOES support async transforms, as documented in their codebase:
```javascript
// Async transform example from jscodeshift test fixtures
module.exports = function (fileInfo, api, options) {
  return new Promise((resolve) => {
    setTimeout(() => {
      resolve(
        api
          .jscodeshift(fileInfo.source)
          .findVariableDeclarators('sum')
          .renameTo('addition')
          .toSource(),
      );
    }, 100);
  });
};
```

The jscodeshift Worker uses `await transform()`, enabling async transform execution. However, this doesn't help our use case because:
- The bottleneck is NOT the transform execution - it's the synchronous Nx Tree API calls
- Our transforms already use caching - repeated parsing is already avoided
- Async transforms would still execute sequentially when processing multiple files because each transform must complete before writing to the tree
- The tree operations cannot be made async: `tree.write()`, `tree.read()`, and `tree.delete()` are inherently synchronous
Using Worker Threads for parallelization would require:
- Serializing/deserializing data between threads
- Spinning up and managing worker threads
- Coordinating shared state
For the file sizes involved (< 50KB typically), the overhead exceeds any potential benefits.
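For concreteness, here is a minimal sketch (not the generator's code; the "transform" is a trivial stand-in) of the boilerplate a worker-thread approach would impose per task: spawning a thread, structured-cloning the input across the boundary, and wiring up message and error handling:

```typescript
import { Worker } from 'node:worker_threads';

// Sketch of per-task worker overhead: thread spawn, data serialization
// (workerData is structured-cloned into the worker), and message/error
// coordination. The "transform" here is a trivial stand-in (uppercasing).
function runInWorker(source: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(
      `const { parentPort, workerData } = require('node:worker_threads');
       // workerData arrived via structured clone -- a full copy of the input
       parentPort.postMessage(workerData.toUpperCase());`,
      { eval: true, workerData: source },
    );
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}
```

For a sub-50KB file, the thread spawn plus the two structured clones (input in, result out) typically dwarf the cost of doing the same work inline.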
File: `generator.ts` - `updateImportPathsInDependentProjects()`
Original attempt (reverted):
```typescript
// This was unnecessary - Promise.all() on non-Promise values
const results = await Promise.all(
  projectEntries.map(([name, project]) => {
    const hasImports = checkForImportsInProject(
      tree,
      project,
      sourceImportPath,
    );
    return hasImports ? [name, project] : null;
  }),
);
```

Why it was wrong:
- `checkForImportsInProject()` is synchronous (returns a boolean, not a Promise)
- The map callback doesn't return Promises
- `Promise.all()` on non-Promise values just wraps and unwraps them unnecessarily
- No actual parallelization occurs
Final implementation (simple filter):

```typescript
candidates = projectEntries
  .filter(([, project]) =>
    checkForImportsInProject(tree, project, sourceImportPath),
  )
  .map(([name, project]) => [name, project] as [string, ProjectConfiguration]);
```

Result: Cleaner code without unnecessary Promise wrapping.
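Stubbed out as a runnable sketch (the types, project data, and predicate body are invented for illustration), the refactor is just a synchronous predicate in `.filter()`:

```typescript
// Runnable sketch of the filter refactor with stubbed types and data.
type ProjectConfiguration = { root: string };

// Stand-in for the real synchronous import scan.
function checkForImportsInProject(
  project: ProjectConfiguration,
  importPath: string,
): boolean {
  return project.root.startsWith('libs/');
}

const projectEntries: Array<[string, ProjectConfiguration]> = [
  ['ui', { root: 'libs/ui' }],
  ['app', { root: 'apps/app' }],
];

// Synchronous predicate: no Promise machinery needed.
const candidates = projectEntries.filter(([, project]) =>
  checkForImportsInProject(project, '@org/source'),
);
// candidates contains only the 'ui' entry
```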
File: `generator.ts` - batch move execution
Original attempt (reverted due to safety concerns):
```typescript
// This was UNSAFE - reverted
await Promise.all(
  contexts.map((ctx, i) => executeMove(...))
);
```

Why it was unsafe:
- Multiple files might be moved to the same target project
- `updateProjectSourceFilesCache()` modifies shared cache arrays
- Concurrent array modifications (splice, push) could cause race conditions
- Cache corruption could lead to incorrect import updates
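The hazard is easy to reproduce in miniature. In this sketch (invented names, not the generator's cache), two concurrent "moves" read a shared array, yield at an `await`, and then write back using stale indices, silently losing an entry:

```typescript
// Sketch: interleaved async updates corrupt a shared cache array.
const cache: string[] = ['a.ts', 'b.ts'];

async function unsafeMove(removePath: string, addPath: string): Promise<void> {
  const idx = cache.indexOf(removePath); // read the shared state
  await Promise.resolve();               // yield: the other move runs here
  cache.splice(idx, 1);                  // write using a now-stale index
  cache.push(addPath);
}

async function demo(): Promise<string[]> {
  // Both moves compute their indices before either splices.
  await Promise.all([
    unsafeMove('a.ts', 'x.ts'),
    unsafeMove('b.ts', 'y.ts'),
  ]);
  return cache;
}
// demo() resolves to ['b.ts', 'y.ts']: 'x.ts' was lost, and 'b.ts',
// which should have been removed, survived.
```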
Final implementation (sequential):

```typescript
// Sequential execution to prevent race conditions
for (let i = 0; i < contexts.length; i++) {
  const ctx = contexts[i];
  await executeMove(tree, fileOptions, projects, projectGraph, ctx, true);
}
```

Result: Batch moves remain sequential to ensure cache consistency and correctness.
✓ File content reading (but already cached)
✓ AST parsing (but synchronous, so no benefit)
✓ Import checking across different projects (but synchronous operations, no actual concurrency)
✗ Tree write operations (must maintain consistency)
✗ Cache updates (must be atomic - shared cache arrays modified by multiple operations)
✗ Index file modifications (same file, multiple writes)
✗ Batch file moves (shared cache corruption risk when moving to same target project)
- Pros: True CPU parallelization
- Cons: High overhead (serialization, thread management), complexity
- Verdict: ❌ Rejected - overhead exceeds benefits for file sizes involved
- Pros: Supported by jscodeshift (as of v17.1.2), could enable parallel transform execution
- Cons: Bottleneck is synchronous Tree API, not transform execution; would add complexity without performance gain
- Verdict: ❌ Rejected - wouldn't address the actual bottleneck
Detailed Analysis of Async Transforms:
While jscodeshift supports async transforms (returning Promises), this feature is designed for transforms that need to perform async I/O operations (e.g., fetching data, reading external files). Our transforms don't need async I/O - they:
- Read from the synchronous Tree API (`tree.read()`)
- Parse AST (synchronous, cached)
- Transform AST (synchronous)
- Write to the synchronous Tree API (`tree.write()`)

Making the transform function async wouldn't parallelize these operations because the Tree API calls remain synchronous. The transforms would still execute sequentially when processing multiple files.
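The distinction can be sketched directly (illustrative functions, not jscodeshift's API): declaring a transform async only pays off when it awaits genuinely asynchronous I/O.

```typescript
// Illustrative sketch: when async pays off vs. when it does not.

// Gains nothing from being async: every operation inside is synchronous,
// so the Promise wrapper adds overhead without any concurrency.
async function transformSyncOnly(source: string): Promise<string> {
  return source.replace(/sum/g, 'addition'); // purely synchronous work
}

// Could benefit from async: it awaits real asynchronous I/O
// (simulated here with a timer standing in for a network or file call).
async function transformWithAsyncIO(source: string): Promise<string> {
  const externalData = await new Promise<string>((resolve) =>
    setTimeout(() => resolve('fetched'), 10),
  );
  return `// ${externalData}\n${source}`;
}
```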
- Pros: Prevents blocking event loop
- Cons: Doesn't improve throughput, adds overhead
- Verdict: ❌ Rejected - no performance benefit
- Pros: Could reduce memory usage
- Cons: AST parsing requires full file content
- Verdict: ❌ Rejected - not applicable to AST parsing
The existing implementation is already near-optimal. The combination of:
- AST caching
- File tree caching
- Pattern analysis optimization
- Smart file cache
- Early exit optimizations
- Single-pass AST traversal
...provides excellent performance (~5-6ms per file in large workspaces) that cannot be meaningfully improved through parallelization given JavaScript's single-threaded nature and the synchronous Nx Tree API.
- Keep existing optimizations - they are highly effective
- Monitor cache hit rates - log cache statistics for performance insights
- Profile new use cases - if performance degrades, investigate specific bottlenecks
- Consider parallelization only for truly async I/O-bound operations (none currently exist in this generator)