Best Practices When Fetching Data from a Database

Let’s look at how we can use different techniques for faster, more maintainable code!

Ensure you are familiar with asynchronous execution before trying out these techniques.

Our focus is primarily on where we’re pulling data from tables performing CRUD operations. We aren’t focusing on queries, but they should not be excluded in thought.

Introduction

Say I have a blog component, and I want it to render a summary of the blog posts. The summary should look like this:

  • Is mint and chocolate the ultimate pair?” by Jordon C. Views: 8,128
  • Volcanic eruption in the ocean!” by Ken T. Views: 28
  • Is fast food cancer-causing?” by Tom H. Views: 17

To fetch the data needed to render our summary here’s a basic example:

async function postSummary() {
  let posts = await fetchPosts();
  let summary = [];
  for (let post of posts) {
    let author = await post.fetchAuthor();
    let views = await post.fetchViews();
    let milestone =
      views.viewCount > 0
      ? await fetchMilestone(views.viewCount)
      : undefined;
    summary.push({
      title: post.title,
      authorName: author.name,
      viewCount: views.viewCount,
      milestone
    });
  }
  return summary;
}

For us, we have to interact with our database 3 times. This likely doesn’t run very fast because we loop in a series, and as we scale using this method, a few extra seconds could have a huge impact on our ability to engage visitors.

Let’s try batching…

Batching

A major improvement would be to make our requests in batches, telling our database to fetch multiple things at once. Let’s say we have a function fetchAllAuthors(posts) that takes a list of posts and returns a list of authors equal to the list of posts. Our code may look like this:

async function postsSummary() {
  let posts = await fetchPosts();
  let authors = await fetchAllAuthors(posts);
  let views = await fetchViews(posts);

  let nonZeroViewCounts = [];
  let nonZeroViewCountIndices = [];
  views.forEach((view, idx) => {
    if (view.viewCount > 0) {
      nonZeroViewCounts.push(view.viewCount);
      nonZeroViewCountIndices.push(idx);
    }
  });
  let nonZeroMilestones = await fetchMilestones(nonZeroViewCounts);
  let milestones = Array(posts.length).fill(undefined);
  for (let i = 0; i < nonZeroMilestones.length; i++) {
    let originalIndex = nonZeroViewCountIndices[i];
    milestones[originalIndex] = nonZeroMilestones[i];
  }

  let summary = [];
  for (let i = 0; i < posts.length; i++) {
    summary.push({
      title: posts[i].title,
      authorName: authors[i].name,
      viewCount: views[i].viewCount,
      milestone: milestones[i],
    });
  }
  return summary;
}

Nothing slows down the speed in software development like a complex system. Complexity is inevitable, but it’s also a considerable speed inhibitor.

Let’s try another technique…

Caching

I recommend that most caches shouldn’t last longer than the web request. The memory is freed when the promise resolves, so it’s easier to cache more without worrying about RAM usage. Scoping to one user’s request reduces the chance of accidental privacy leaks due to data contamination between different users’ data.

We get a huge benefit from caching essentially all reads from our database. This is best implemented at a lower level of the application frameworks so that all of your application code that fetches from your database can benefit. Here are quick examples

function memo(func){
  var cache = {};
    return function(){
      var key = JSON.stringify(arguments);
      if (cache[key]){
        console.log(cache)
        return cache[key];
      }
      else{
        val = func.apply(null, arguments);
        cache[key] = val;
        return val; 
      }
  }
}

Using moize library:

import moize from 'moize';

const method = (a, b) => {
  return a + b;
};

const memoized = moize(method);
memoized(2, 4); // 6
memoized(2, 4); // 6, pulled from cache

We might put a @memoize decorator on the post object’s fetchViews() function, which will make a cache that lives only as long as that post is in memory, or a similar function – so that it runs only once even if we change functions that use it.

Parallelism

Revisiting our first example, our biggest problem is that we were processing the posts array in a series. Using parallelism, we process every post, all resolving in parallel thanks to Promise.all()

async function postsSummary() {
  let posts = await fetchPosts();
  let summary = Promise.all(posts.map((post) => postSummary(post)));
  return summary;
}

async function postSummary(post) {
  let author = await post.fetchAuthor();
  let views = await post.fetchViews();
  let milestone =
    views.viewCount > 0
    ? await fetchMilestone(views.viewCount)
    : undefined;
  return {
    title: post.title,
    authorName: author.name,
    viewCount: views.viewCount,
    milestone,
  };
}

This code is easier to maintain as we use a helper function that only focuses on one post. Also, using a map() to avoid mutation.

How does this code perform? This largely is dependent on how our fetch functions are implemented. If we request both the author and views per post, plus the one to fetch the post initially – if we had 100 posts, we would have around 301 interactions with our database. Many requests will take place concurrently, so this will be much faster than our first example. “Yay asynchronous!” Here’s the downside, each database request has a sizable fixed cost, we’re incurring that cost many times, making it slower than using batches as above.

Optimizing Parallelism

Let’s look at a way to optimize the above example:

async function postSummary(post) {
  let author = await post.fetchAuthor();
  let views = await post.fetchViews();
  let milestone =
    views.viewCount > 0
    ? await fetchMilestone(views.viewCount)
    : undefined;
  return ...;
}

Notice that we wait until the author has been fetched before we begin fetching the post’s stats. We can parallelize these as well. We can try this:

  let [
    author,
    views,
  ] = await Promise.all([
    post.fetchAuthor(),
    post.fetchViews(),
  ]);
  let milestone =
    views.viewCount > 0
    ? await fetchMilestone(views.viewCount)
    : undefined;
  return ...;
}

This is still not optimal. If fetching the author takes longer than fetching the views, we unnecessarily wait for the author before we fetch the milestone.

You only want to fetch views once so we can try something like this:

async function postSummary(post) {
  let [
    author,
    [views, milestone],
  ] = await Promise.all([
    post.fetchAuthor(),
    fetchViewsAndMilestone(),
  ]);
  return ...;

  async function fetchViewsAndMilestone() {
    let views = await post.fetchViews();
    let milestone =
      views.viewCount > 0
      ? await fetchMilestone(views.viewCount)
      : undefined;
    return [views, milestone];
  }
}

This code uses optimal parallelism.

Let’s look back at how our function works without data fetching code:

async function postSummary(post) {
  ...
  return {
    title: post.title,
    authorName: ...,
    viewCount: ...,
    milestone: ...,
  };
}

We should think about each separate as there independent. Let’s write code for this:

async function postSummary(post) {
  async function fetchAuthorName() {
    let author = await post.fetchAuthor();
    return author.name;
  }

  async function fetchViewCount() {
    let views = await post.fetchViews();
    return views.viewCount;
  }

  async function fetchViewCountMilestone() {
    let views = await post.fetchViews();
    return
      views.viewCount > 0
      ? await fetchMilestone(views.viewCount)
      : undefined;
  }

  let [
    authorName,
    viewCount,
    milestone,
  ] = await Promise.all([
    fetchAuthorName(),
    fetchViewCount(),
    fetchViewCountMilestone(),
  ]);

  return { title: post.title, authorName, viewCount, milestone };
}

Although there are more lines of code this is maintainable. The problem is, we fetch the same thing more than once and this is never what we want, I recommend caching within the request.

Conclusion

The above techniques are inspired by one thing: “Simplifying local reasoning.”

By this I mean we shouldn’t keep our entire system in our head, being able to maintain and fix parts in isolation is valuable long-term. We should be able to work on one part without the worry of breaking others. Understanding the principles of each technique will help you write your own implementations.

We are a technology company that specializes in building business platforms. If you have any questions about the techniques discussed in this post or about any of our services, feel free to send us an email!

Photo by Kevin Ku on Unsplash