How to become an AI Engineer in 2024

Learn these skills to thrive as a developer in 2024.

Every time we post a job opening for a Junior AI Engineer at INK Solutions, we receive hundreds of applications within the first 24 hours. It may seem like the competition is too fierce to beat, but I’ll explain how easy it is to become the top candidate.

In the job posting, we include a strict requirement: for your resume to even be considered, you must record a 1-2 minute video introducing yourself. Guess how many candidates do it? Less than 5%. The majority of applicants don’t even read the job posting.

By making sure your proposal is tailored to the company, you already get ahead of 95% of the competition. This will get you noticed. Without the right skill set, however, you still won’t get hired. Let’s explore what skills you need to learn, how to learn them, and how to prove your competence.

Skills to Learn

You might mistakenly assume that machine learning expertise is a requirement. It is a very valuable skill, but most business problems are solved by integrating existing models like Llama or Stable Diffusion. These models are extremely powerful, and integrating them is much cheaper and faster than training one from scratch. For more niche use cases, you can rely on fine-tuning.

That being said, you should understand how models are trained and the basic mathematics behind it (gradient descent, loss functions, etc.). No need to go super in-depth here, just have a high-level understanding.
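If you want a concrete picture of those terms, here is a minimal sketch (a toy example of my own, not taken from any particular course) of gradient descent minimizing a mean-squared-error loss for a one-parameter model:

import numpy as np

# Toy data: y is roughly 3 * x; we try to recover the slope w from the data.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x + np.array([0.1, -0.2, 0.05, 0.0])

w = 0.0              # model parameter, starting from a bad guess
learning_rate = 0.01

for step in range(200):
    predictions = w * x
    loss = np.mean((predictions - y) ** 2)         # mean squared error (the loss function)
    gradient = np.mean(2 * (predictions - y) * x)  # d(loss)/dw
    w -= learning_rate * gradient                  # the gradient descent update

print(round(w, 2))  # converges to roughly 3.0

Real training does exactly this, just with millions of parameters and automatic differentiation instead of a hand-written gradient.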

Machine learning does have its applications. Some projects really do require training a proprietary model. This is something we do sometimes at INK, but only when it’s really necessary, or when it’s a client’s requirement. Building applications on top of existing models is much more common, so we only need a small handful of ML engineers. If that’s something that sparks your interest, go ahead and learn it.

Be an engineer first

Being a good engineer is the foundation without which you can’t excel at your job. If you’re already a software developer, this is great news for you. Architecture design, databases, design patterns and other skills you learnt prior to the AI boom are still relevant.

AI is just a part of software. Writing high-quality code is the foundation.

It’s worth noting that even this field has changed drastically in the past 2 years. Nowadays, you are expected to use code generation tools. Personally, GPT-4 has made me multiple times more productive.

I avoid writing boilerplate code, and I try to delegate whatever I can to GPT. I also use it for brainstorming. I treat it like my employee. The ability to ask the model the right questions will take you very far.

Research suggests that ChatGPT can save you around 2.5 hours a day. For programmers, I can tell you from experience, it saves even more.

One of the methods I use to skyrocket productivity is the following: I create a general structure for the code and add high-level comments describing what it should do. Then I copy-paste the code into ChatGPT Plus and instruct it to implement it while focusing on quality and maintainability.

Here’s the prompt:

Implement the TODOs. Ensure your code is senior level. It must be performant, high-quality and readable.

public class MealPlanGenerationService {

    private final MealRepository mealRepository;

    public List<Meal> generate(Integer numMeals, Integer targetKcal) {
        /**
         * TODO: query meals and output numMeals meals so that total calories
         * add up to targetKcal as closely as possible
         */
    }
}

public record Meal(String title, Integer kcal, List<Ingredient> ingredients) {} ...

And here is the code that was generated:

import java.util.ArrayList;
import java.util.List;

public class MealPlanGenerationService {
    private final MealRepository mealRepository;

    public MealPlanGenerationService(MealRepository mealRepository) {
        this.mealRepository = mealRepository;
    }

    public List<Meal> generate(Integer numMeals, Integer targetKcal) {
        List<Meal> allMeals = mealRepository.findAll();
        return findClosestMealPlan(allMeals, numMeals, targetKcal);
    }

    private List<Meal> findClosestMealPlan(List<Meal> meals, int numMeals, int targetKcal) {
        // Dynamic programming table where dp[i][j] represents the closest we can get to j kcal using i meals
        int[][] dp = new int[numMeals + 1][targetKcal + 1];

        for (int i = 1; i <= numMeals; i++) {
            for (int j = 1; j <= targetKcal; j++) {
                dp[i][j] = dp[i - 1][j]; // Initialize with the value of one less meal
                for (Meal meal : meals) {
                    if (meal.kcal() <= j) {
                        dp[i][j] = Math.max(dp[i][j], dp[i - 1][j - meal.kcal()] + meal.kcal());
                    }
                }
            }
        }

        // Backtrack to find the meals that make up the closest total kcal
        List<Meal> selectedMeals = new ArrayList<>();
        int remainingKcal = targetKcal;
        for (int i = numMeals; i > 0 && remainingKcal > 0; i--) {
            for (Meal meal : meals) {
                if (meal.kcal() <= remainingKcal && dp[i][remainingKcal] == dp[i - 1][remainingKcal - meal.kcal()] + meal.kcal()) {
                    selectedMeals.add(meal);
                    remainingKcal -= meal.kcal();
                    break;
                }
            }
        }

        return selectedMeals;
    }
}

How long would it take you to implement this algorithm yourself? Unless you’re a DSA genius, probably quite a while. Leveraging an LLM, you can have it ready in seconds.

With this technique, you can write senior-level code with just a basic understanding of a programming language or framework.

Prompt Engineering

This leads me to the next skill: prompt engineering. The ability to craft good prompts is one of the most high-leverage skills you can master. The rule “garbage in, garbage out” applies to LLMs more than to anything else.

In this article I will not go into depth, but I can illustrate how important this is.

Consider this example: ask gpt-3.5 a trick question with a plain, single-shot prompt and it will often answer incorrectly. We can improve the result by applying what is called the “Self-Consistency” technique: sample several answers with step-by-step reasoning and keep the one most samples agree on.
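Here is a minimal sketch of Self-Consistency using the openai Python library. The trick question is a made-up stand-in for illustration:

from collections import Counter
from openai import OpenAI

client = OpenAI()

# Illustrative trick question; ask for reasoning and a clearly marked final answer.
question = ("I have 3 apples. Yesterday I ate 2 apples. How many apples do I have now? "
            "Think step by step, then give the final answer on the last line as 'Answer: <number>'.")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": question}],
    n=5,              # five independent samples
    temperature=0.8,  # enough randomness to get diverse reasoning paths
)

# Keep the final answer that most samples agree on.
final_lines = [choice.message.content.strip().splitlines()[-1] for choice in response.choices]
answer, votes = Counter(final_lines).most_common(1)[0]
print(answer, f"({votes}/5 samples agree)")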

This becomes even more significant as the prompt size increases. LLMs might make up (hallucinate) facts in a very confident way, which is dangerous when building Q&A chatbots, for example for customer support.

If you want to learn more about prompt engineering, I recommend a free course by Andrew Ng: https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/

Fine-Tuning

Fine-tuning is basically further training of an existing model. You can do it with open-source models like Llama, Mistral, and Stable Diffusion, or with hosted models like GPT. For instance, you could fine-tune a language model to specialize in sentiment analysis, or to create content for areas such as healthcare or legal literature, by training it on data specific to those fields.

For a real-world example, consider training GPT to recognize and interpret technical jargon and abbreviations used in the IT industry. By feeding it a dataset rich in IT terminology and context, the model becomes more adept at understanding and generating text specific to information technology, making it a useful tool for creating technical documentation.

This is also useful for cases where we want the model to perform a task difficult to explain in a prompt, for example to teach it flirting (yes, it was an actual commercial project I worked on 😆).

Here’s the process for fine-tuning:

  1. Prepare the dataset: 50-200 example conversations.

  2. Start the fine-tuning job: for GPT, it can be done directly through the OpenAI API. The fine-tuned model will then be automatically deployed and available to use (a minimal sketch of this step follows the list).
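Here is roughly what step 2 looks like with the openai Python library; the dataset file name is a made-up placeholder:

from openai import OpenAI

client = OpenAI()

# Upload the training data: a JSONL file with one conversation per line,
# each shaped like {"messages": [{"role": "user", ...}, {"role": "assistant", ...}]}.
training_file = client.files.create(
    file=open("training_conversations.jsonl", "rb"),  # placeholder file name
    purpose="fine-tune",
)

# Start the fine-tuning job; once it finishes, the fine-tuned model is hosted
# by OpenAI and can be used like any other model name.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)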

Find out more about fine-tuning GPT here: https://platform.openai.com/docs/guides/fine-tuning

Python

We usually write applications in Java, but Python is invaluable for data science and for validating ideas. I’m talking about Jupyter (or another notebook interface) here.

It enables you to play with data. As an AI engineer, you should be very comfortable transforming, visualizing, and interpreting data.

Get familiar with the following libraries (a quick example follows the list):

  • pandas

  • numpy

  • matplotlib

  • langchain

  • openai
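To give you a flavor of that kind of work, here is a minimal pandas/matplotlib sketch; the CSV file and its columns are hypothetical:

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical evaluation results with columns: model, latency_ms, accuracy.
df = pd.read_csv("eval_results.csv")

# Transform: average accuracy per model, sorted.
summary = df.groupby("model")["accuracy"].mean().sort_values()

# Visualize: a quick horizontal bar chart.
summary.plot(kind="barh", title="Average accuracy by model")
plt.xlabel("accuracy")
plt.tight_layout()
plt.show()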

What to build?

I’m a believer in learning by doing. By building 2-3 solid applications and putting them in your portfolio, you will automatically become one of the top candidates.

Let’s go over exactly what to build (and how to do it).

Retrieval-Augmented Generation Chatbot

This is probably the most common business application of the GPT API. As powerful as LLMs are, all they do is predict the next word. By using a retrieval-augmented approach, the chatbot can pull relevant information from a database or knowledge base, providing more accurate and contextually appropriate responses.

Consider an example.

For a healthcare company, there are numerous inquiries related to appointments, medical records, insurance queries, and general health questions. Implementing a chatbot can streamline these interactions, ensuring that patients receive timely and accurate responses, while also reducing the workload on human staff.

You can build a chatbot that answers common questions and can execute some simple actions (save an appointment in the database, update customer address etc.).

There are two key components to RAG.

  1. Large Language Model - to generate human-like responses

  2. Knowledge Base - to store additional information in the form of embeddings (vector representations of sentences)

When the user asks the chatbot a question, it pulls relevant information from the knowledge base and provides it to the LLM with instructions on how to respond.
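Here is a minimal sketch of the retrieval step using OpenAI embeddings and plain cosine similarity. The documents are made up, and in a real system the vectors would live in a vector database rather than in memory:

import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts):
    # Turn a list of strings into embedding vectors.
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

# Made-up knowledge base entries.
documents = [
    "XYZ Company offers refunds for products returned within 30 days of purchase.",
    "Our support line is open Monday to Friday, 9am to 5pm.",
    "Appointments can be booked online or by phone.",
]
doc_vectors = embed(documents)

def retrieve(question, k=2):
    # Return the k documents whose embeddings are most similar to the question.
    q = embed([question])[0]
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

relevant = retrieve("How can I get a refund?")
# `relevant` is what gets pasted into the "Relevant knowledge" section of the QA prompt below.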

Here is an example question/answer (QA) prompt that would be passed to the LLM.

You are a customer service chatbot for an XYZ company. Answer the following question.

User's question:
"How can I get a refund?"

Guidelines:
- You are only allowed to answer questions related to the XYZ company.
- If the question can't be answered based on the relevant knowledge, do not answer.

Relevant knowledge:

XYZ Company offers refunds for products returned in their original condition
within 30 days of purchase. Services are eligible for a refund if a cancellation
request is made within 14 days of service completion. Digital products are only
refundable if they are defective or not as described. Refunds will be processed
to the original payment method and may take up to 10 business days. Please note
that shipping costs are non-refundable.

Your answer:

As for executing actions, you could have another prompt (executed before the QA prompt) that categorizes the user’s request as ACTION or QUESTION:

Categorize user's request as ACTION or QUESTION.

ACTION - if user requests to cancel subscription, set appointment or send email
QUESTION - if user asked an actual question, e.g. "Are you open on weekends?"

User: "I would like to set an appointment for Friday"
Category:

In this case, it will be categorized as ACTION. If the user’s request is categorized as an action, a different prompt is sent:

Your task is to select the appropriate action for the user's request.

Example:

User: "Can you you cancel my subscription? Id is product-subscription-123
and verification code is JF1FAE24"
Action: cancel_subscription("product-subscription-123", "JF1FAE24")

Guidelines:
- If you don't have all relevant information, ask for clarification
- If no action matches user's request, say "Sorry, I can't do that"

List of actions:

create_appointment(phone_number, date)
cancel_subscription(subscription_id, verification_code)
send_email(receiver, content)

Chat History:
User: "I would like to set an appointment for Friday"
Assistant: "What is your phone number?"
User: "(555) 555-1234"

Action:

The LLM will respond with create_appointment("(555) 555-1234", "Friday"). Your backend then needs to parse this response and execute the corresponding service when this action is returned, for example saving the appointment in the database.
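Parsing can be as simple as a regular expression, as in this sketch (the action format is the one from the prompt above; a production system might instead use the API's built-in function calling):

import re

ACTION_PATTERN = re.compile(r'^(\w+)\((.*)\)$')

def parse_action(llm_output: str):
    # Turn e.g. create_appointment("(555) 555-1234", "Friday") into a name and argument list.
    match = ACTION_PATTERN.match(llm_output.strip())
    if not match:
        return None  # e.g. the model replied "Sorry, I can't do that"
    name, raw_args = match.groups()
    args = re.findall(r'"([^"]*)"', raw_args)
    return name, args

print(parse_action('create_appointment("(555) 555-1234", "Friday")'))
# ('create_appointment', ['(555) 555-1234', 'Friday'])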

Here’s how it works at a high level: the user’s message first goes through the categorization prompt, then either to the QA prompt (with knowledge retrieved from the vector store) or to the action prompt, and any returned action is parsed and executed by the backend.

Content Factory

Build a machine that pumps out blog posts on a given topic - whatever topic you’re interested in. Let’s assume fitness. The bot will come up with ideas for fitness articles and write them. It should also generate images; OpenAI offers an image generation API.
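Here is a minimal sketch of the generation loop with the openai Python library. The model names and prompts are just one reasonable choice, not a prescription:

from openai import OpenAI

client = OpenAI()
topic = "fitness"

def ask(prompt):
    # One chat completion call; returns the model's text.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# 1. Come up with an idea.
title = ask(f"Suggest one specific, catchy blog post title about {topic}. Reply with the title only.")

# 2. Write the article.
article = ask(f"Write a 600-word blog post titled: {title}")

# 3. Generate a header image with the image API.
image_url = client.images.generate(
    model="dall-e-3",
    prompt=f"Header illustration for a blog post titled: {title}",
    n=1,
    size="1024x1024",
).data[0].url

print(title)
print(image_url)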

The problem with generating content using LLMs is that it doesn’t sound very human. If that’s painfully visible in your application, you can include an instruction in the prompt to write in the style of a certain person, for example “Write it in the style of Alex Hormozi”. If that doesn’t help, you can fine-tune the model to write like somebody you know.

Bonus points if you include a mechanism that allows the user to like or dislike the generated content, so that the next generated items are adjusted accordingly.

Recommendation System

This is a valuable application of AI, money-wise. It’s also one of the reasons why TikTok is so addictive. You can build something similar to impress recruiters.

Example: a WhatsApp bot that sends you memes every day. You can like or dislike a meme, and the algorithm will then adjust itself to your preferences.

How to implement that? Let’s start with the simplest method.

First, you need the database of memes. I’ll leave this part up to you.

For each meme, you would have to generate hashtags. This can be done using OpenAI’s Vision API.

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4-vision-preview",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Generate hashtags for the image."},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://www.reddit.com/media?url=https%3A%2F%2Fi.redd.it%2F7pp55ytrrpy31.png",
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

print(response.choices[0].message.content)

Result:

  1. #CodingHumor

  2. #ProgrammerJokes

  3. #JavaHumor

  4. #NerdLife

  5. #GeekCulture

  6. #JustProgrammerThings

  7. #TechMemes

Now from these hashtags, we need to generate embeddings. OpenAI also offers an API for that.

Whenever a user likes a meme, it is recorded in the database, and the same goes for dislikes. Then, when selecting the next meme to recommend, the backend finds the meme whose embedding has the highest cosine similarity to the user’s liked memes and the lowest to the disliked ones.
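Here is a minimal sketch of that scoring logic. The hashtags, meme IDs, and the exact scoring formula (mean similarity to likes minus mean similarity to dislikes) are my own illustrative choices:

import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text):
    # Embed a hashtag string such as "#CodingHumor #JavaHumor".
    response = client.embeddings.create(model="text-embedding-3-small", input=[text])
    return np.array(response.data[0].embedding)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical state loaded from the database.
liked = [embed("#CodingHumor #ProgrammerJokes"), embed("#JavaHumor #TechMemes")]
disliked = [embed("#CatPictures #Wholesome")]
candidates = {
    "meme_42": embed("#NerdLife #GeekCulture"),
    "meme_43": embed("#FitnessMotivation #GymLife"),
}

def score(vector):
    # Prefer memes close to what the user liked and far from what they disliked.
    return (np.mean([cosine(vector, l) for l in liked])
            - np.mean([cosine(vector, d) for d in disliked]))

next_meme = max(candidates, key=lambda meme_id: score(candidates[meme_id]))
print(next_meme)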

You could also embed the image itself as a matrix of pixels. There is room for experimentation here. Building this will teach you how to apply linear algebra and operations on vectors. It will also impress employers a lot.

Good luck!