Skip to content

Latest commit

 

History

History
260 lines (183 loc) · 9.91 KB

File metadata and controls

260 lines (183 loc) · 9.91 KB

Parallelization Analysis for move-file Generator

Executive Summary

After comprehensive analysis and implementation attempts, meaningful performance improvements through parallelization are NOT achievable for the move-file generator due to JavaScript's single-threaded nature and the synchronous nature of the Nx Tree API.

The existing optimizations (AST caching, file tree caching, pattern analysis, smart file cache) already provide excellent performance (~5-6ms per file in large workspaces).

Performance Baseline

Benchmark Results (Before Optimization Attempt)

  • Small file move: ~1935ms
  • Medium file move: ~2076ms
  • Large file move: ~2620ms
  • Moving 10 small files: ~2022ms (~202ms per file)
  • Moving 15 files with glob patterns: ~2057ms (~137ms per file)
  • Moving file with 20 imports: ~2128ms
  • Moving file with 50 irrelevant files: ~2030ms

Stress Test Results (Before Optimization Attempt)

  • 10+ projects cross-project move: ~42859ms
  • 100+ large files: ~5184ms (~51ms per file)
  • 50 intra-project dependencies: ~2253ms (~45ms per import)
  • Large workspace (15 projects, 450 files): ~2619ms (~5.82ms per file)

Performance After Parallelization Attempts

Benchmark Results (After Optimization Attempt)

  • Small file move: ~1940ms (no change)
  • Medium file move: ~2053ms (no change)
  • Large file move: ~2653ms (no change)
  • Moving 10 small files: ~2023ms (~202ms per file) (no change)
  • Moving 15 files with glob patterns: ~2051ms (~137ms per file) (no change)
  • Moving file with 20 imports: ~2138ms (no change)
  • Moving file with 50 irrelevant files: ~2019ms (no change)

Stress Test Results (After Optimization Attempt)

  • 10+ projects cross-project move: ~42039ms (no change)
  • 100+ large files: ~5339ms (~53ms per file) (no change)
  • 50 intra-project dependencies: ~2292ms (~46ms per import) (no change)
  • Large workspace (15 projects, 450 files): ~2768ms (~6.2ms per file) (no change)

Why Parallelization Doesn't Help

1. JavaScript Single-Threaded Execution

  • All synchronous operations execute sequentially regardless of Promise.all() usage
  • The Nx Tree API methods (read, write, delete) are synchronous
  • jscodeshift supports async transforms (returns Promise), but our current implementation uses synchronous operations within transforms
  • Only async I/O operations benefit from Promise-based concurrency

2. Synchronous Tree Mutations

// These operations are synchronous, so Promise.all doesn't help
tree.write(filePath, content); // Synchronous
tree.delete(filePath); // Synchronous
tree.read(filePath, 'utf-8'); // Synchronous

Note on jscodeshift async support: While jscodeshift transforms CAN be async (returning a Promise), the bottleneck in our implementation is the synchronous Nx Tree API, not the transform execution itself. Making our transforms async would not improve performance because the tree operations remain synchronous.

3. Existing Caching is Highly Effective

The current implementation already includes:

  • AST Cache: Prevents redundant parsing
  • File Tree Cache: Avoids repeated traversals
  • Pattern Analysis Cache: Optimizes glob operations
  • Smart File Cache: Caches file existence checks

These optimizations already provide near-optimal performance.

4. jscodeshift Async Transform Support

jscodeshift DOES support async transforms, as documented in their codebase:

// Async transform example from jscodeshift test fixtures
module.exports = function (fileInfo, api, options) {
  return new Promise((resolve) => {
    setTimeout(() => {
      resolve(
        api
          .jscodeshift(fileInfo.source)
          .findVariableDeclarators('sum')
          .renameTo('addition')
          .toSource(),
      );
    }, 100);
  });
};

The jscodeshift Worker uses await transform(), enabling async transform execution. However, this doesn't help our use case because:

  1. The bottleneck is NOT the transform execution - it's the synchronous Nx Tree API calls
  2. Our transforms already use caching - repeated parsing is already avoided
  3. Async transforms would still execute sequentially when processing multiple files because each transform must complete before writing to the tree
  4. The tree operations cannot be made async - tree.write(), tree.read(), and tree.delete() are inherently synchronous

5. Worker Thread Overhead

Using Worker Threads for parallelization would require:

  • Serializing/deserializing data between threads
  • Spinning up and managing worker threads
  • Coordinating shared state

For the file sizes involved (< 50KB typically), the overhead exceeds any potential benefits.

What Was Implemented

1. Parallel Project Import Checking (Reverted)

File: generator.ts - updateImportPathsInDependentProjects()

Original attempt (reverted):

// This was unnecessary - Promise.all() on non-Promise values
const results = await Promise.all(
  projectEntries.map(([name, project]) => {
    const hasImports = checkForImportsInProject(
      tree,
      project,
      sourceImportPath,
    );
    return hasImports ? [name, project] : null;
  }),
);

Why it was wrong:

  • checkForImportsInProject() is synchronous (returns boolean, not Promise)
  • The map callback doesn't return Promises
  • Promise.all() on non-Promise values just wraps and unwraps them unnecessarily
  • No actual parallelization occurs

Final implementation (simple filter):

candidates = projectEntries
  .filter(([, project]) =>
    checkForImportsInProject(tree, project, sourceImportPath),
  )
  .map(([name, project]) => [name, project] as [string, ProjectConfiguration]);

Result: Cleaner code without unnecessary Promise wrapping.

2. Parallel Batch File Moves (Reverted)

File: generator.ts - batch move execution

Original attempt (reverted due to safety concerns):

// This was UNSAFE - reverted
await Promise.all(
  contexts.map((ctx, i) => executeMove(...))
);

Why it was unsafe:

  • Multiple files might be moved to the same target project
  • updateProjectSourceFilesCache() modifies shared cache arrays
  • Concurrent array modifications (splice, push) could cause race conditions
  • Cache corruption could lead to incorrect import updates

Final implementation (sequential):

// Sequential execution to prevent race conditions
for (let i = 0; i < contexts.length; i++) {
  await executeMove(tree, fileOptions, projects, projectGraph, ctx, true);
}

Result: Batch moves remain sequential to ensure cache consistency and correctness.

Operations Analyzed for Parallelization

Safe for Parallelization (Read-Only)

✓ File content reading (but already cached) ✓ AST parsing (but synchronous, so no benefit) ✓ Import checking across different projects (but synchronous operations, no actual concurrency)

Unsafe for Parallelization (Mutations)

✗ Tree write operations (must maintain consistency) ✗ Cache updates (must be atomic - shared cache arrays modified by multiple operations) ✗ Index file modifications (same file, multiple writes) ✗ Batch file moves (shared cache corruption risk when moving to same target project)

Alternative Approaches Considered

1. Worker Threads for AST Parsing

  • Pros: True CPU parallelization
  • Cons: High overhead (serialization, thread management), complexity
  • Verdict: ❌ Rejected - overhead exceeds benefits for file sizes involved

2. Async jscodeshift Transforms

  • Pros: Supported by jscodeshift (as of v17.1.2), could enable parallel transform execution
  • Cons: Bottleneck is synchronous Tree API, not transform execution; would add complexity without performance gain
  • Verdict: ❌ Rejected - wouldn't address the actual bottleneck

Detailed Analysis of Async Transforms:

While jscodeshift supports async transforms (returning Promises), this feature is designed for transforms that need to perform async I/O operations (e.g., fetching data, reading external files). Our transforms don't need async I/O - they:

  1. Read from the synchronous Tree API (tree.read())
  2. Parse AST (synchronous, cached)
  3. Transform AST (synchronous)
  4. Write to the synchronous Tree API (tree.write())

Making the transform function async wouldn't parallelize these operations because the Tree API calls remain synchronous. The transforms would still execute sequentially when processing multiple files.

3. Batch Processing with Event Loop Yielding

  • Pros: Prevents blocking event loop
  • Cons: Doesn't improve throughput, adds overhead
  • Verdict: ❌ Rejected - no performance benefit

3. Streaming/Incremental Processing

  • Pros: Could reduce memory usage
  • Cons: AST parsing requires full file content
  • Verdict: ❌ Rejected - not applicable to AST parsing

Conclusion

The existing implementation is already near-optimal. The combination of:

  1. AST caching
  2. File tree caching
  3. Pattern analysis optimization
  4. Smart file cache
  5. Early exit optimizations
  6. Single-pass AST traversal

...provides excellent performance (~5-6ms per file in large workspaces) that cannot be meaningfully improved through parallelization given JavaScript's single-threaded nature and the synchronous Nx Tree API.

Recommendations

  1. Keep existing optimizations - they are highly effective
  2. Monitor cache hit rates - log cache statistics for performance insights
  3. Profile new use cases - if performance degrades, investigate specific bottlenecks
  4. Consider parallelization only for - truly async I/O-bound operations (none currently exist in this generator)

References