Using AI to translate code


A recent discussion about using AI to rewrite an old program prompted me to wonder about the practical limitations of AI coding. Using AI as a coding assistant comes with risks, but what are the actual limitations (and benefits) of using AI to assist in writing source code?

Translating old code

One area where developers might use AI is to translate code written in an obscure or outdated programming language into a more modern language that today’s developers can understand. When I was an undergraduate physics student in the early 1990s, I learned FORTRAN 77 so I could write my own data analysis programs. So I started there: can AI rewrite old FORTRAN 77 programs into a more modern language like C?

I don’t have an account with Claude, so instead I used Microsoft’s Copilot. I started with this basic prompt:

Rewrite this FORTRAN 77 program in C:

Copilot cheerfully replied that it was ready:

Sure, I’d be happy to help! Please provide the FORTRAN 77 program you want to rewrite in C.

Area of a unit circle

Let’s start with a simple example. I entered a FORTRAN 77 program to calculate the area of a unit circle:

      PROGRAM CIRCLE
      REAL AREA,R
      PARAMETER(PI=3.141592)
      R=1.0
      AREA = PI*R*R
      PRINT *,AREA
      END

The program is not very complex, so it’s a good starting point for testing the limits of AI. The program (called CIRCLE) uses two floating point variables, AREA and R, and a constant value parameter, PI. After initializing the variable R to 1.0, the program calculates the area as π r² and prints the result.

Copilot was able to translate this program, although it used a const float value for PI instead of using a #define statement, as a human programmer might. But the result is still a program that works the same:

Here’s the equivalent C program for your FORTRAN 77 code:

#include <stdio.h>

int main() {
    const float PI = 3.141592;
    float area, r;

    r = 1.0;
    area = PI * r * r;

    printf("%f\n", area);

    return 0;
}

Copilot also summarized the program after showing the updated source code:

This C program calculates the area of a circle with radius 1.0 using the value of π (pi) and prints the result. If you have any questions or need further assistance, feel free to ask!

Copilot translated this FORTRAN 77 program to C

Copilot also presented a series of “buttons” with suggested followup prompts, including “Can you explain the code?” and “What are some improvements for this code?”

I asked it to suggest improvements to the code, and Copilot responded with a list of possible suggestions, plus an updated version of the program:

Here are a few improvements you can make to the C code:

  1. Use double for higher precision:
    • Instead of float, use double for the variables to increase precision.
  2. Define constants using #define:
    • Use #define for defining constants like PI.
  3. Add comments:
    • Include comments to explain the code for better readability.
  4. Check for user input:
    • Allow the user to input the radius instead of hardcoding it.

Here’s the improved version of the code:

#include <stdio.h>

#define PI 3.141592

int main() {
    double area, r;

    // Prompt user for the radius
    printf("Enter the radius of the circle: ");
    scanf("%lf", &r);

    // Calculate the area
    area = PI * r * r;

    // Print the result
    printf("The area of the circle is: %lf\n", area);

    return 0;
}

These changes make the code more flexible, precise, and user-friendly. If you have any other questions or need further assistance, feel free to ask!

That’s not a bad improvement to the program. I was impressed that Copilot even recognized the benefit of using #define for the PI value, since it will never change. That’s effectively the same as FORTRAN’s PARAMETER statement.

Adding a list of numbers

FORTRAN 77 supported a loop structure that modern programmers may find quite odd. The DO statement repeated every statement up to and including the one at the specified line label. Later versions of the language, starting with Fortran 90 (which also allowed source code in lowercase), discouraged ending a loop on anything other than CONTINUE or the newer END DO, but FORTRAN 77 allowed most executable statements to be the last statement in a loop.

For example, these two loops perform the same function, but the second version may be slightly easier for modern programmers to read because of the CONTINUE statement:

      DO 10 I = 1, 10
10    SUM = SUM + I

and:

      DO 10 I = 1, 10
         SUM = SUM + I
 10   CONTINUE

I asked Copilot to translate a FORTRAN 77 program that used the DO loop to add a list of numbers from 1 to 10:

      PROGRAM MAIN
      INTEGER SUM,I
      PRINT *, 'ADD THE NUMBERS FROM 1 TO 10 ...'
      SUM = 0
      DO 10 I = 1, 10
10    SUM = SUM + I
      PRINT *, SUM
      END

Copilot correctly rewrote this in C, using a for loop. This isn’t a very complicated program, but I was impressed that Copilot correctly translated the DO loop:

Here’s the equivalent C program for your FORTRAN 77 code:

#include <stdio.h>

int main() {
    int sum = 0;
    int i;

    printf("ADD THE NUMBERS FROM 1 TO 10 ...\n");

    for (i = 1; i <= 10; i++) {
        sum += i;
    }

    printf("%d\n", sum);

    return 0;
}

This C program calculates the sum of numbers from 1 to 10 and prints the result. If you have any questions or need further assistance, feel free to ask!


Again, Copilot suggested several follow-up prompts, including “can you explain the loop structure?” and “What are some optimizations for this code?” Using the prompt, I asked Copilot to optimize the program for me:

Here are some optimizations and improvements you can make to the C code:

  1. Use a formula for the sum:
    • Instead of using a loop, you can use the formula for the sum of the first n natural numbers: sum = n(n+1)/2. This reduces the time complexity from O(n) to O(1).
  2. Add comments:
    • Include comments to explain the code for better readability.
  3. Use meaningful variable names:
    • Although sum and i are common, using more descriptive names can improve code readability.

Here’s the optimized version of the code:

#include <stdio.h>

int main() {
    int sum;
    int n = 10; // The upper limit of the range

    // Calculate the sum using the formula for the sum of the first n natural numbers
    sum = n * (n + 1) / 2;

    // Print the result
    printf("The sum of numbers from 1 to %d is: %d\n", n, sum);

    return 0;
}

These changes make the code more efficient and easier to understand. If you have any other questions or need further assistance, feel free to ask!

It’s interesting that Copilot recognized this code as being a strict implementation of adding a list of numbers starting from 1, and suggested using a formula instead of a loop. For the specific use case of adding numbers from 1 to n, this works.

The Collatz Conjecture

In mathematics, the Collatz Conjecture, whose sequences are sometimes called “hailstone sequences,” repeatedly applies one of two operations to a positive integer:

  • If the number is odd, multiply by 3 and add 1
  • If the number is even, divide by 2

The conjecture states that for any positive integer, repeating these steps will eventually reach one.

One way to write a FORTRAN 77 program to iterate the Collatz rules is with the computed GOTO statement. Unlike a typical GOTO statement that jumps to a label, old-style FORTRAN also supported a GOTO that accepted a value and a list of labels. If the value was 1, the program would jump to the first label; for 2, the program jumped to the second label; and so on.

This is a particularly tricky GOTO statement for any programmer to navigate. It is made even more difficult by the rule that if the value is out of range for the list (such as zero or a negative value, or the value 5 when there are only four labels in the list) then execution proceeds normally to the next statement.

Thus, the following statement jumps to label 30 only if the remainder of N divided by 2 is 1. In effect, the program jumps to label 30 if N is odd, and continues with the next statement if N is even:

      GOTO(30) MOD(N,2)

I asked Copilot to translate this program into C. The program is a nest of GOTO statements that would have been easier to understand had it used other structures:

      PROGRAMCOLLATZ
      INTEGERN
      PRINT*,'ENTER STARTING N'
      READ*,N
10    PRINT*,N
      IF(N.LE.1) GOTO99
      GOTO(30) MOD(N,2)

20    N=N/2
      GOTO10
30    N=3*N+1
      GOTO10
99    END

I’ll admit it; I was amazed that Copilot not only navigated the algorithm behind the complicated use of GOTO statements, but also correctly interpreted it as the Collatz sequence:

Here’s the equivalent C program for your FORTRAN 77 code:

#include <stdio.h>

int main() {
    int n;

    // Prompt user for the starting number
    printf("ENTER STARTING N\n");
    scanf("%d", &n);

    // Collatz sequence
    while (n > 1) {
        printf("%d\n", n);
        if (n % 2 == 0) {
            n = n / 2;
        } else {
            n = 3 * n + 1;
        }
    }

    // Print the final value (which should be 1)
    printf("%d\n", n);

    return 0;
}

This C program implements the Collatz sequence, starting from a user-provided number and printing each step until it reaches 1. If you have any questions or need further assistance, feel free to ask!


Copilot also suggested several prompts to dig deeper, including “Can you explain the Collatz conjecture?” and “How to handle invalid input values?” It didn’t provide a prompt for “What are some optimizations for this code?” but I asked anyway:

Here are some optimizations and improvements you can make to the C code:

  1. Add comments:
    • Include comments to explain the code for better readability.
  2. Use a function:
    • Encapsulate the Collatz sequence logic in a function to make the code more modular and reusable.
  3. Check for valid input:
    • Ensure the user inputs a positive integer.
  4. Optimize the loop:
    • Print the final value outside the loop to avoid redundancy.

Here’s the optimized version of the code:

#include <stdio.h>

// Function to perform the Collatz sequence
void collatz(int n) {
    while (n > 1) {
        printf("%d\n", n);
        if (n % 2 == 0) {
            n = n / 2;
        } else {
            n = 3 * n + 1;
        }
    }
    // Print the final value (which should be 1)
    printf("%d\n", n);
}

int main() {
    int n;

    // Prompt user for the starting number
    printf("ENTER STARTING N\n");
    if (scanf("%d", &n) != 1 || n <= 0) {
        printf("Please enter a positive integer.\n");
        return 1;
    }

    // Perform the Collatz sequence
    collatz(n);

    return 0;
}

These changes make the code more readable, modular, and robust. If you have any other questions or need further assistance, feel free to ask!

I’m impressed

While these are short demonstration programs, I was impressed that AI did so well translating from an old language like FORTRAN 77 into a more modern language like C. It seems to understand some tricky components of the FORTRAN 77 language and converts them correctly to C program flow.

Copilot advertises a list of supported file formats where you can simply upload a source code file ending in py, js, java, php, c, cpp, cxx, h, go, r, and other recognized programming languages. My input samples were in FORTRAN 77, which is not a recognized file type in Copilot, but prompting the AI with “Rewrite this FORTRAN 77 program in C:” seems to provide the necessary context for Copilot to read the next query as a FORTRAN 77 program.

I remain cautious about using AI for programming, especially for any sensitive projects where I don’t want to risk an AI co-author inserting unwanted code. But this experiment shows that it is at least possible to use AI responsibly to translate from one programming language to another, such as for migrating legacy code. Perhaps an AI wouldn’t fare so well with more complex code that uses more “corner cases” of the language, but using AI might help with the “heavy lifting” of migrating large portions of old code to newer programming languages.
