Parallelization Analysis for move-file Generator

Executive Summary

After comprehensive analysis and implementation attempts, meaningful performance improvements through parallelization are NOT achievable for the move-file generator due to JavaScript's single-threaded nature and the synchronous nature of the Nx Tree API.

The existing optimizations (AST caching, file tree caching, pattern analysis, smart file cache) already provide excellent performance (~5-6ms per file in large workspaces).

Performance Baseline

Benchmark Results (Before Optimization Attempt)

Small file move: ~1935ms
Medium file move: ~2076ms
Large file move: ~2620ms
Moving 10 small files: ~2022ms (~202ms per file)
Moving 15 files with glob patterns: ~2057ms (~137ms per file)
Moving file with 20 imports: ~2128ms
Moving file with 50 irrelevant files: ~2030ms

Stress Test Results (Before Optimization Attempt)

10+ projects cross-project move: ~42859ms
100+ large files: ~5184ms (~51ms per file)
50 intra-project dependencies: ~2253ms (~45ms per import)
Large workspace (15 projects, 450 files): ~2619ms (~5.82ms per file)

Performance After Parallelization Attempts

Benchmark Results (After Optimization Attempt)

Small file move: ~1940ms (no change)
Medium file move: ~2053ms (no change)
Large file move: ~2653ms (no change)
Moving 10 small files: ~2023ms (~202ms per file) (no change)
Moving 15 files with glob patterns: ~2051ms (~137ms per file) (no change)
Moving file with 20 imports: ~2138ms (no change)
Moving file with 50 irrelevant files: ~2019ms (no change)

Stress Test Results (After Optimization Attempt)

10+ projects cross-project move: ~42039ms (no change)
100+ large files: ~5339ms (~53ms per file) (no change)
50 intra-project dependencies: ~2292ms (~46ms per import) (no change)
Large workspace (15 projects, 450 files): ~2768ms (~6.2ms per file) (no change)

Why Parallelization Doesn't Help

1. JavaScript Single-Threaded Execution

All synchronous operations execute sequentially regardless of Promise.all() usage
The Nx Tree API methods (read, write, delete) are synchronous
jscodeshift supports async transforms (returns Promise), but our current implementation uses synchronous operations within transforms
Only async I/O operations benefit from Promise-based concurrency

2. Synchronous Tree Mutations

// These operations are synchronous, so Promise.all doesn't help
tree.write(filePath, content); // Synchronous
tree.delete(filePath); // Synchronous
tree.read(filePath, 'utf-8'); // Synchronous

Note on jscodeshift async support: While jscodeshift transforms CAN be async (returning a Promise), the bottleneck in our implementation is the synchronous Nx Tree API, not the transform execution itself. Making our transforms async would not improve performance because the tree operations remain synchronous.

3. Existing Caching is Highly Effective

The current implementation already includes:

AST Cache: Prevents redundant parsing
File Tree Cache: Avoids repeated traversals
Pattern Analysis Cache: Optimizes glob operations
Smart File Cache: Caches file existence checks

These optimizations already provide near-optimal performance.

4. jscodeshift Async Transform Support

jscodeshift DOES support async transforms, as documented in their codebase:

// Async transform example from jscodeshift test fixtures
module.exports = function (fileInfo, api, options) {
  return new Promise((resolve) => {
    setTimeout(() => {
      resolve(
        api
          .jscodeshift(fileInfo.source)
          .findVariableDeclarators('sum')
          .renameTo('addition')
          .toSource(),
      );
    }, 100);
  });
};

The jscodeshift Worker uses await transform(), enabling async transform execution. However, this doesn't help our use case because:

The bottleneck is NOT the transform execution - it's the synchronous Nx Tree API calls
Our transforms already use caching - repeated parsing is already avoided
Async transforms would still execute sequentially when processing multiple files because each transform must complete before writing to the tree
The tree operations cannot be made async - tree.write(), tree.read(), and tree.delete() are inherently synchronous

5. Worker Thread Overhead

Using Worker Threads for parallelization would require:

Serializing/deserializing data between threads
Spinning up and managing worker threads
Coordinating shared state

For the file sizes involved (< 50KB typically), the overhead exceeds any potential benefits.

What Was Implemented

1. Parallel Project Import Checking (Reverted)

File: generator.ts - updateImportPathsInDependentProjects()

Original attempt (reverted):

// This was unnecessary - Promise.all() on non-Promise values
const results = await Promise.all(
  projectEntries.map(([name, project]) => {
    const hasImports = checkForImportsInProject(
      tree,
      project,
      sourceImportPath,
    );
    return hasImports ? [name, project] : null;
  }),
);

Why it was wrong:

checkForImportsInProject() is synchronous (returns boolean, not Promise)
The map callback doesn't return Promises
Promise.all() on non-Promise values just wraps and unwraps them unnecessarily
No actual parallelization occurs

Final implementation (simple filter):

candidates = projectEntries
  .filter(([, project]) =>
    checkForImportsInProject(tree, project, sourceImportPath),
  )
  .map(([name, project]) => [name, project] as [string, ProjectConfiguration]);

Result: Cleaner code without unnecessary Promise wrapping.

2. Parallel Batch File Moves (Reverted)

File: generator.ts - batch move execution

Original attempt (reverted due to safety concerns):

// This was UNSAFE - reverted
await Promise.all(
  contexts.map((ctx, i) => executeMove(...))
);

Why it was unsafe:

Multiple files might be moved to the same target project
updateProjectSourceFilesCache() modifies shared cache arrays
Concurrent array modifications (splice, push) could cause race conditions
Cache corruption could lead to incorrect import updates

Final implementation (sequential):

// Sequential execution to prevent race conditions
for (let i = 0; i < contexts.length; i++) {
  await executeMove(tree, fileOptions, projects, projectGraph, ctx, true);
}

Result: Batch moves remain sequential to ensure cache consistency and correctness.

Operations Analyzed for Parallelization

Safe for Parallelization (Read-Only)

✓ File content reading (but already cached) ✓ AST parsing (but synchronous, so no benefit) ✓ Import checking across different projects (but synchronous operations, no actual concurrency)

Unsafe for Parallelization (Mutations)

✗ Tree write operations (must maintain consistency) ✗ Cache updates (must be atomic - shared cache arrays modified by multiple operations) ✗ Index file modifications (same file, multiple writes) ✗ Batch file moves (shared cache corruption risk when moving to same target project)

Alternative Approaches Considered

1. Worker Threads for AST Parsing

Pros: True CPU parallelization
Cons: High overhead (serialization, thread management), complexity
Verdict: ❌ Rejected - overhead exceeds benefits for file sizes involved

2. Async jscodeshift Transforms

Pros: Supported by jscodeshift (as of v17.1.2), could enable parallel transform execution
Cons: Bottleneck is synchronous Tree API, not transform execution; would add complexity without performance gain
Verdict: ❌ Rejected - wouldn't address the actual bottleneck

Detailed Analysis of Async Transforms:

While jscodeshift supports async transforms (returning Promises), this feature is designed for transforms that need to perform async I/O operations (e.g., fetching data, reading external files). Our transforms don't need async I/O - they:

Read from the synchronous Tree API (tree.read())
Parse AST (synchronous, cached)
Transform AST (synchronous)
Write to the synchronous Tree API (tree.write())

Making the transform function async wouldn't parallelize these operations because the Tree API calls remain synchronous. The transforms would still execute sequentially when processing multiple files.

3. Batch Processing with Event Loop Yielding

Pros: Prevents blocking event loop
Cons: Doesn't improve throughput, adds overhead
Verdict: ❌ Rejected - no performance benefit

3. Streaming/Incremental Processing

Pros: Could reduce memory usage
Cons: AST parsing requires full file content
Verdict: ❌ Rejected - not applicable to AST parsing

Conclusion

The existing implementation is already near-optimal. The combination of:

AST caching
File tree caching
Pattern analysis optimization
Smart file cache
Early exit optimizations
Single-pass AST traversal

...provides excellent performance (~5-6ms per file in large workspaces) that cannot be meaningfully improved through parallelization given JavaScript's single-threaded nature and the synchronous Nx Tree API.

Recommendations

Keep existing optimizations - they are highly effective
Monitor cache hit rates - log cache statistics for performance insights
Profile new use cases - if performance degrades, investigate specific bottlenecks
Consider parallelization only for - truly async I/O-bound operations (none currently exist in this generator)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Parallelization Analysis for move-file Generator

Executive Summary

Performance Baseline

Benchmark Results (Before Optimization Attempt)

Stress Test Results (Before Optimization Attempt)

Performance After Parallelization Attempts

Benchmark Results (After Optimization Attempt)

Stress Test Results (After Optimization Attempt)

Why Parallelization Doesn't Help

1. JavaScript Single-Threaded Execution

2. Synchronous Tree Mutations

3. Existing Caching is Highly Effective

4. jscodeshift Async Transform Support

5. Worker Thread Overhead

What Was Implemented

1. Parallel Project Import Checking (Reverted)

2. Parallel Batch File Moves (Reverted)

Operations Analyzed for Parallelization

Safe for Parallelization (Read-Only)

Unsafe for Parallelization (Mutations)

Alternative Approaches Considered

1. Worker Threads for AST Parsing

2. Async jscodeshift Transforms

3. Batch Processing with Event Loop Yielding

3. Streaming/Incremental Processing

Conclusion

Recommendations

References

Uh oh!

FilesExpand file tree

PARALLELIZATION_ANALYSIS.md

Latest commit

History

PARALLELIZATION_ANALYSIS.md

File metadata and controls

Parallelization Analysis for move-file Generator

Executive Summary

Performance Baseline

Benchmark Results (Before Optimization Attempt)

Stress Test Results (Before Optimization Attempt)

Performance After Parallelization Attempts

Benchmark Results (After Optimization Attempt)

Stress Test Results (After Optimization Attempt)

Why Parallelization Doesn't Help

1. JavaScript Single-Threaded Execution

2. Synchronous Tree Mutations

3. Existing Caching is Highly Effective

4. jscodeshift Async Transform Support

5. Worker Thread Overhead

What Was Implemented

1. Parallel Project Import Checking (Reverted)

2. Parallel Batch File Moves (Reverted)

Operations Analyzed for Parallelization

Safe for Parallelization (Read-Only)

Unsafe for Parallelization (Mutations)

Alternative Approaches Considered

1. Worker Threads for AST Parsing

2. Async jscodeshift Transforms

3. Batch Processing with Event Loop Yielding

3. Streaming/Incremental Processing

Conclusion

Recommendations

References