AI-Powered Code Migration: Python 2 to 3 in One Week
We had 100K lines of Python 2 code. Python 2 reached end of life on January 1, 2020, so the migration was urgent but daunting. I used GPT-4 to automate it.
Results: Migrated in 1 week (vs an estimated 6 months by hand), with 98% of files converted without manual fixes. Here’s how.
The Challenge
Legacy Codebase:
- 100,000 lines of Python 2.7
- 250 files
- 15 dependencies
- No type hints
- Minimal tests (40% coverage)
Manual Migration Estimate: 6 months
Solution: AI-Powered Migration
from openai import OpenAI

client = OpenAI()

def migrate_python2_to_3(code):
    """Migrate Python 2 code to Python 3 using GPT-4."""
    prompt = f"""
Convert this Python 2 code to Python 3.

Python 2 Code:
```python
{code}
```

Requirements:
- Fix print statements → print()
- Update dict methods (.iteritems() → .items())
- Fix integer division (use // only where floor division of ints was intended)
- Update exception syntax
- Fix unicode/str handling
- Update imports (urllib, etc.)
- Add type hints where possible
- Modernize with f-strings
- Use pathlib for file operations
- Add docstrings if missing

Output: Complete Python 3 code with comments explaining changes
"""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2
    )
    return response.choices[0].message.content
Example 1: Print Statements
Python 2:
def greet(name):
    print "Hello, %s!" % name
    print "Welcome to our system"
AI-Migrated Python 3:
def greet(name: str) -> None:
    """Greet user by name."""
    print(f"Hello, {name}!")  # Changed: print statement → function, % → f-string
    print("Welcome to our system")
Example 2: Dictionary Methods
Python 2:
def process_data(data):
    for key, value in data.iteritems():
        print key, value
    keys = data.keys()
    values = data.values()
    items = data.items()
AI-Migrated Python 3:
from typing import Dict, Any

def process_data(data: Dict[str, Any]) -> None:
    """Process dictionary data."""
    # Changed: iteritems() → items()
    for key, value in data.items():
        print(key, value)  # Changed: print statement → function
    # Changed: dict.keys/values/items now return views, not lists
    keys = list(data.keys())
    values = list(data.values())
    items = list(data.items())
Example 3: Exception Handling
Python 2:
def read_file(filename):
    try:
        f = open(filename)
        data = f.read()
        f.close()
        return data
    except IOError, e:
        print "Error:", e
        raise
AI-Migrated Python 3:
from pathlib import Path

# Note: str is a builtin, so it needs no import from typing
def read_file(filename: str) -> str:
    """Read file contents."""
    try:
        # Changed: Use pathlib, which opens and closes the file for us
        return Path(filename).read_text()
    except IOError as e:  # Changed: except E, e → except E as e
        print(f"Error: {e}")  # Changed: print statement, f-string
        raise
Example 4: Unicode Handling
Python 2:
def process_text(text):
    if isinstance(text, unicode):
        text = text.encode('utf-8')
    return text.upper()
AI-Migrated Python 3:
def process_text(text: str) -> str:
    """Process text string."""
    # Changed: In Python 3, str is unicode by default
    # No need for unicode type or encoding
    return text.upper()
Automated Migration Pipeline
from pathlib import Path
import subprocess

# Assumes migrate_python2_to_3() from the listing above is in scope.

class MigrationPipeline:
    def __init__(self, source_dir, output_dir):
        self.source_dir = Path(source_dir)
        self.output_dir = Path(output_dir)
        self.stats = {
            'total_files': 0,
            'migrated': 0,
            'failed': 0,
            'lines_migrated': 0
        }

    def migrate_file(self, file_path):
        """Migrate single Python file."""
        print(f"Migrating {file_path}...")
        # Read Python 2 code
        with open(file_path, 'r') as f:
            py2_code = f.read()
        # Migrate with AI
        py3_code = migrate_python2_to_3(py2_code)
        # Extract code from markdown if needed
        if '```python' in py3_code:
            py3_code = py3_code.split('```python')[1].split('```')[0].strip()
        # Write Python 3 code
        output_path = self.output_dir / file_path.relative_to(self.source_dir)
        output_path.parent.mkdir(parents=True, exist_ok=True)
        with open(output_path, 'w') as f:
            f.write(py3_code)
        # Validate syntax
        try:
            compile(py3_code, str(output_path), 'exec')
            self.stats['migrated'] += 1
            self.stats['lines_migrated'] += len(py3_code.split('\n'))
            return True
        except SyntaxError as e:
            print(f"  ❌ Syntax error: {e}")
            self.stats['failed'] += 1
            return False

    def migrate_all(self):
        """Migrate all Python files."""
        # Skip tests initially. This substring check also skips any path
        # that merely contains "test", so review the list on a real project.
        py_files = [p for p in self.source_dir.rglob('*.py') if 'test' not in str(p)]
        # Count only the files we attempt, so the success rate is meaningful
        self.stats['total_files'] = len(py_files)
        for file_path in py_files:
            self.migrate_file(file_path)
        self.print_stats()

    def run_tests(self):
        """Run tests on migrated code."""
        print("\nRunning tests...")
        result = subprocess.run(
            ['python3', '-m', 'pytest', str(self.output_dir)],
            capture_output=True,
            text=True
        )
        print(result.stdout)
        return result.returncode == 0

    def print_stats(self):
        """Print migration statistics."""
        total = self.stats['total_files']
        # Guard against an empty source directory
        success_rate = self.stats['migrated'] / total * 100 if total else 0.0
        print(f"""
Migration Complete!
Files:
  Total: {total}
  Migrated: {self.stats['migrated']}
  Failed: {self.stats['failed']}
  Success Rate: {success_rate:.1f}%
Lines Migrated: {self.stats['lines_migrated']:,}
""")

# Usage
pipeline = MigrationPipeline('legacy_py2/', 'migrated_py3/')
pipeline.migrate_all()
pipeline.run_tests()
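The pipeline pulls code out of the model's reply by splitting on the first fence marker, which misses replies that use an untagged fence or contain several blocks. A slightly more tolerant version might look like the sketch below. `extract_code` and `FENCE_RE` are hypothetical names, not part of the pipeline above, and picking the longest block is an assumption about which block holds the migrated file.

```python
import re

# Matches a fenced block, optionally tagged "python" or "py".
# re.DOTALL lets "." span newlines so the body can be multi-line.
FENCE_RE = re.compile(r"```(?:python|py)?[ \t]*\n(.*?)```", re.DOTALL)


def extract_code(response_text: str) -> str:
    """Return the longest fenced block, or the whole text if nothing is fenced."""
    blocks = FENCE_RE.findall(response_text)
    if not blocks:
        # The model replied with bare code (or prose); pass it through as-is.
        return response_text.strip()
    return max(blocks, key=len).strip()
```

Whatever extraction is used, the `compile()` check in `migrate_file` remains the backstop: if the wrong span is extracted, it will usually fail to compile.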
Handling Complex Cases
Case 1: Custom Metaclasses:
Python 2:
class MyMeta(type):
    pass

class MyClass(object):
    __metaclass__ = MyMeta
AI-Migrated Python 3:
class MyMeta(type):
    """Custom metaclass."""
    pass

class MyClass(metaclass=MyMeta):  # Changed: __metaclass__ → metaclass=
    """Class using custom metaclass."""
    pass
Case 2: Relative Imports:
Python 2:
# In package/module.py
import utils # Implicit relative import
from helpers import helper_func
AI-Migrated Python 3:
# In package/module.py
from . import utils # Changed: Explicit relative import
from .helpers import helper_func # Changed: Explicit relative import
Case 3: xrange → range:
Python 2:
def process_large_range():
    for i in xrange(1000000):
        process(i)
AI-Migrated Python 3:
def process_large_range() -> None:
    """Process large range efficiently."""
    # Changed: xrange → range (range is lazy in Python 3)
    for i in range(1000000):
        process(i)
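Several of these idioms are still syntactically valid Python 3, so a `compile()` check will not catch them if the model leaves one behind: `xrange(...)` and `.iteritems()` only fail at runtime, and a `__metaclass__` attribute is silently ignored. One way to flag them is to walk the AST of the migrated file, as in this sketch. `find_py2_leftovers` and the two name sets are hypothetical and far from exhaustive, and a legitimate local variable named `unicode` would show up as a false positive.

```python
import ast
from typing import List, Tuple

# Illustrative, non-exhaustive lists of Python 2 names that still parse in Python 3.
PY2_NAMES = {"xrange", "unicode", "raw_input", "basestring"}
PY2_DICT_METHODS = {"iteritems", "iterkeys", "itervalues", "has_key"}


def find_py2_leftovers(source: str) -> List[Tuple[int, str]]:
    """Return (line, description) pairs for Python 2 idioms found in source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id in PY2_NAMES:
            findings.append((node.lineno, f"Python 2 builtin: {node.id}"))
        elif isinstance(node, ast.Attribute) and node.attr in PY2_DICT_METHODS:
            findings.append((node.lineno, f"Python 2 dict method: .{node.attr}()"))
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__metaclass__":
                    findings.append((node.lineno, "__metaclass__ is ignored in Python 3"))
    return sorted(findings)
```

Running this over each migrated file gives a short review list instead of waiting for a `NameError` in production.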
Testing Strategy
from pathlib import Path
import subprocess

class MigrationTester:
    def __init__(self, py2_dir, py3_dir):
        self.py2_dir = py2_dir
        self.py3_dir = py3_dir

    def test_syntax(self):
        """Test all files have valid Python 3 syntax."""
        errors = []
        for file in Path(self.py3_dir).rglob('*.py'):
            try:
                compile(file.read_text(), str(file), 'exec')
            except SyntaxError as e:
                errors.append((file, e))
        assert len(errors) == 0, f"Syntax errors in {len(errors)} files"

    def test_imports(self):
        """Test all imports work."""
        # Put the migrated tree on sys.path, then import the entry module
        code = f'import sys; sys.path.insert(0, {str(self.py3_dir)!r}); import main'
        result = subprocess.run(['python3', '-c', code], capture_output=True)
        assert result.returncode == 0

    def test_behavior(self):
        """Test behavior matches Python 2 version."""
        # Run same test suite on both versions. Compare exit codes rather
        # than raw output: pytest output embeds the interpreter version and
        # timings, so it never matches byte for byte.
        py2_result = self.run_tests_py2()
        py3_result = self.run_tests_py3()
        assert py2_result == py3_result, "Behavior changed!"

    def run_tests_py2(self):
        """Run tests with Python 2 and return pytest's exit code."""
        result = subprocess.run(
            ['python2', '-m', 'pytest', str(self.py2_dir)],
            capture_output=True
        )
        return result.returncode

    def run_tests_py3(self):
        """Run tests with Python 3 and return pytest's exit code."""
        result = subprocess.run(
            ['python3', '-m', 'pytest', str(self.py3_dir)],
            capture_output=True
        )
        return result.returncode
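Comparing two pytest runs is harder than it looks: raw output includes the interpreter version and timings, while a bare exit code hides how many tests passed. A possible middle ground is to compare the outcome counts from pytest's final summary line. The sketch below assumes that line looks like `1 failed, 3 passed in 0.12s`, which can vary with pytest version and flags. `summarize_pytest` is a hypothetical helper, and it expects text, so decode the captured output or pass `text=True` to `subprocess.run`.

```python
import re

# Outcome words as they appear in pytest's summary line; "error" also matches "errors".
OUTCOME_RE = re.compile(r"(\d+) (passed|failed|skipped|xfailed|xpassed|error)")


def summarize_pytest(stdout: str) -> dict:
    """Extract outcome counts from the last non-empty line of pytest output."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        return {}
    return {outcome: int(count) for count, outcome in OUTCOME_RE.findall(lines[-1])}
```

Two runs then count as equivalent when `summarize_pytest(py2_out) == summarize_pytest(py3_out)`, which ignores timing and banner differences but still notices a test that flipped from passing to failing.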
Real Results
Migration Stats:
- Files: 250
- Lines: 100,000
- Time: 1 week
- Success rate: 98%
Breakdown:
- Automatically migrated: 245 files (98%)
- Manual fixes needed: 5 files (2%)
- Syntax errors: 0
- Test failures: 12 (all fixed)
Issues Found and Fixed
Issue 1: Integer Division:
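# Python 2 (AI missed this edge case)
result = 5 / 2  # Returns 2 in Python 2 (integer division)
# In Python 3 this should be
result = 5 // 2  # Floor division, keeps the Python 2 result (2)
# or, if a fractional result is actually wanted
result = 5 / 2  # True division (2.5)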
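Because `/` is valid in both versions but means something different, neither the model nor a syntax check can tell which behavior was intended. A cheap safeguard is to list every `/` in the migrated code for human review. This sketch uses the standard `ast` module; `find_true_divisions` is a hypothetical helper, and it will also flag harmless uses such as `pathlib` path joins.

```python
import ast
from typing import List


def find_true_divisions(source: str) -> List[int]:
    """Return the line numbers of every `/` or `/=` operator in source."""
    lines = set()
    for node in ast.walk(ast.parse(source)):
        # ast.Div is `/`; floor division `//` is ast.FloorDiv and is not flagged.
        if isinstance(node, (ast.BinOp, ast.AugAssign)) and isinstance(node.op, ast.Div):
            lines.add(node.lineno)
    return sorted(lines)


# The difference being reviewed, in Python 3:
true_div = 5 / 2        # 2.5
floor_div = 5 // 2      # 2
negative_floor = -5 // 2  # -3, floor division rounds toward negative infinity
```

Each flagged line is then a yes/no question for a reviewer: were both operands integers in the Python 2 code, and did the caller rely on the truncated result?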
Issue 2: Dictionary Ordering:
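# Python 2 (AI didn't catch this)
d = {'a': 1, 'b': 2}
keys = d.keys()  # Order is arbitrary in Python 2
# Python 3 fix. Plain dicts keep insertion order from Python 3.7 onward,
# so OrderedDict is only needed on older versions or when order must
# affect equality
from collections import OrderedDict
d = OrderedDict([('a', 1), ('b', 2)])  # If order matters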
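For code that depended on whatever order Python 2 happened to produce, it helps to see what Python 3 actually guarantees. The snippet below is a small illustration, assuming Python 3.7 or later. The variable names are only for demonstration.

```python
from collections import OrderedDict

d = {"b": 2, "a": 1}

# Python 3.7+: a plain dict iterates in insertion order.
insertion_order = list(d)

# When output must not depend on how the dict was built, sort explicitly.
sorted_order = sorted(d)

# OrderedDict differs from dict mainly in that equality is order-sensitive.
same_as_dicts = {"a": 1, "b": 2} == {"b": 2, "a": 1}
same_as_ordered = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
```

In practice, tests that compared serialized output against a fixture captured under Python 2 are the usual place this surfaces, and sorting the keys on both sides is often the simplest fix.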
Issue 3: Bytes vs Strings:
# Python 2 (AI partially migrated)
data = urllib.urlopen(url).read() # Returns str
# Python 3 (needed manual fix)
import urllib.request
data = urllib.request.urlopen(url).read() # Returns bytes
text = data.decode('utf-8') # Convert to str
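Hard-coding `'utf-8'` works until a server sends something else. A slightly more defensive version reads the charset from the response headers and falls back to UTF-8, as sketched here. `decode_body` and `fetch_text` are hypothetical helpers, and replacing undecodable bytes rather than raising is a design choice that may not suit every caller.

```python
import urllib.request
from typing import Optional


def decode_body(raw: bytes, charset: Optional[str] = None) -> str:
    """Decode bytes to str, defaulting to UTF-8 and replacing undecodable bytes."""
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_text(url: str) -> str:
    """Fetch a URL and return its body as str."""
    with urllib.request.urlopen(url) as response:
        # The charset comes from the Content-Type header; None if absent.
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)
```

Keeping the decode step in its own function makes the bytes-to-str boundary explicit and easy to test without a network call.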
Cost Analysis
AI Migration:
- API calls: ~1,000
- Tokens: ~10M
- Cost: ~$300
- Time: 1 week
Manual Migration:
- Developer time: 6 months
- Cost: $60,000 (at $10K/month)
- Risk: High (human errors)
Savings: $59,700 and 5.75 months
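The savings figures follow from simple arithmetic on the numbers above, reproduced in the sketch below. The four-weeks-per-month conversion is an assumption, and the AI-side cost counts API spend only, as in the figures above.

```python
MANUAL_MONTHS = 6
MONTHLY_DEV_COST = 10_000   # dollars per developer-month
AI_API_COST = 300           # dollars, API spend only
AI_WEEKS = 1
WEEKS_PER_MONTH = 4         # assumption: rough conversion
TOKENS_USED = 10_000_000

manual_cost = MANUAL_MONTHS * MONTHLY_DEV_COST
cost_savings = manual_cost - AI_API_COST
time_savings_months = MANUAL_MONTHS - AI_WEEKS / WEEKS_PER_MONTH

# Blended price implied by the totals, in dollars per 1,000 tokens.
cost_per_1k_tokens = AI_API_COST / TOKENS_USED * 1000

print(f"Manual cost: ${manual_cost:,}")
print(f"Savings: ${cost_savings:,} and {time_savings_months} months")
print(f"Implied price: ${cost_per_1k_tokens:.2f} per 1K tokens")
```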
Comparison with 2to3
2to3 Tool:
- Success rate: 70%
- Manual fixes: 30%
- No type hints
- No modernization
AI Migration:
- Success rate: 98%
- Manual fixes: 2%
- Adds type hints
- Modernizes code (f-strings, pathlib)
Winner: AI (better quality, fewer manual fixes)
Lessons Learned
- AI excels at patterns - Print, dict methods, etc.
- Edge cases need review - 2% manual fixes
- Test thoroughly - Behavior can change
- Modernize while migrating - Add type hints, f-strings
- Massive time savings - 1 week vs 6 months
Conclusion
AI-powered code migration is transformative. We migrated 100K lines in 1 week, with 98% of files converted automatically.
Key takeaways:
- 98% automated migration success
- 1 week vs 6 months manual
- $300 cost vs $60,000 manual
- Adds modernizations (type hints, f-strings)
- Still needs human review (2%)
Use AI for code migration. Save months of tedious work.