Writing an API Client

Why write an API client

Most automation flows require interacting with HTTP APIs, and communicating with these APIs requires an HTTP client. A ubiquitous example of a client is a web browser. However, for most automation tasks a web browser is too heavy, so we turn to relatively lightweight clients like cURL or, in Windows land, Invoke-WebRequest.

For automation purposes specifically, we often use the client provided by the language or a popular third-party client. In Python, most developers default to the Requests library. The native urllib.request module is perfectly fine, albeit less ergonomic. I’ll cover how to make a basic HTTP request in the other languages I frequently use later, but for now we will be using Python.
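To make the comparison concrete, here is a minimal sketch of the same GET request both ways. The httpbin.org URL is just a placeholder endpoint, and `build_url` is a hypothetical helper I’ve added for illustration:

```python
import json
from urllib import parse, request


def build_url(base: str, params: dict) -> str:
    # requests does this step for you via its params= argument
    return f"{base}?{parse.urlencode(params)}"


def get_json(url: str) -> dict:
    # the urllib.request equivalent of requests.get(url).json()
    with request.urlopen(url) as resp:
        return json.load(resp)


# the Requests version is a one-liner by comparison:
# import requests
# data = requests.get("https://httpbin.org/get", params={"q": "hello"}).json()
```

Both produce the same parsed JSON; Requests just handles the query encoding, connection pooling, and decoding for you.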

Generally, it is advisable to seek out an official API client provided by the developers of that API. For example, Stripe and Twilio both have official SDKs for various languages. These should be used instead of rolling your own client. It can be desirable to roll your own in certain situations, for example, if the automation script will be deployed in a resource constrained environment.

In this example, we will be creating an API client that can be integrated within other pieces of code.

Why create a dedicated API Client?

  • you could definitely call requests.get('http://example.com/api/xyz') several times in your application, but the repetition adds up
  • you want to stay on the same abstraction level as the rest of the code to better reason about logic. For example, it is clearer to write client.get_active_facilities() vs requests.get('https://some.domain.com/api/v1/facilities?status=active')
  • a dedicated client really helps when you want to communicate with multiple API providers (the facade approach)
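As a sketch of those points, a tiny facade might look like the following. The facilities domain, URL, and response shape are all made up for illustration:

```python
from typing import Any, Dict, List


class FacilityClient:
    """Hypothetical facade over a facilities API."""

    BASE_URL = "https://some.domain.com/api/v1"

    def __init__(self, session):
        # anything with requests.Session's .get(url, params=...) shape works,
        # which is also what makes this easy to test
        self._session = session

    def get_active_facilities(self) -> List[Dict[str, Any]]:
        # the raw HTTP call the rest of the codebase no longer needs to see
        resp = self._session.get(
            f"{self.BASE_URL}/facilities", params={"status": "active"}
        )
        return resp.json()["items"]
```

Callers now read at the domain level (client.get_active_facilities()) and the URL scheme lives in exactly one place.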

We will be creating a client for: https://developers.google.com/blogger (you can find a totally random API with the search: inurl:developers OR inurl:docs AND intext:api)

Aside: Google Dorking

Using search engine advanced features can allow you to narrow and filter huge swaths of information and find specific pages that match certain patterns. For examples, see the Exploit-DB Google Hacking Database for searches that reveal exploitable systems!

Setting up

Refer to the article setting up a python project.

In this article, we will not be going over the intricacies of setting up a python project, but very quickly:

  • Either use pip or pipenv to create your Python environment. There are other alternatives as well, for example Poetry.

I will be covering using pipenv. We will do the following:

  • install requests and google-auth-oauthlib
  • optionally (recommended): install ipython and other tools (we’ll use it for REPLing!)
# latest and greatest python
pipenv --python 3.9

# for debugging and formatting
pipenv install --dev pytest ipython 'black==21.7b0' isort flake8

# the actual dependencies
pipenv install requests google-auth-oauthlib

# and enter the virtual environment (similar to . ./venv/bin/activate)
pipenv shell

You may want to init the git repo at this point as well. You will now have the following files in the repo:

.
├── Pipfile
└── Pipfile.lock

Familiarize yourself with the documentation

Look at:

  • The documentation and models for the API you will be working with (https://developers.google.com/blogger)
  • The documentation for requests

As I’m going through the documentation for blogger, I realize that I’ll need to figure out authentication. Aside from that, I see that the endpoints are simple. The resources are:

  • Blog
  • Post
  • Page
  • Comment

They are arranged as:

                         +-------------+
                         |  BLOG       |
                         +-----+-------+
                     +---------+----------+
                +----v-----+         +----v------+
                | POST     |         | PAGE      |
                +----+-----+         +----+------+
                     +
                 +---v-----+
                 | COMMENT |
                 +---------+

The data model

A blog is a top level entity and it contains one or more pages and posts. Both pages and posts can have comments. We need to keep this in mind when designing a client. Generally, modeling the domain helps you decide what kind of interface you want, for example a more object-oriented post.append(comment) vs a flatter client.create_comment(post_id, comment_content).

Authentication

For the blogger API, it seems like we’ll need to use the following authentication mechanisms to retrieve our data:

  • API key
  • OAuth (larger discussion)

We’ll need to use OAuth, and this means working with GCP credentials. This complicates things.

So we install a helper library to do OAuth for us.

Install: https://google-auth-oauthlib.readthedocs.io/en/latest/index.html

  1. Create credentials on the GCP cloud console (could be any GCP account)
  2. Download the client config / secrets json file
  3. In IPython / python run
# scopes can be found at: https://developers.google.com/identity/protocols/oauth2/scopes#blogger
# instructions at: https://google-auth-oauthlib.readthedocs.io/en/latest/reference/google_auth_oauthlib.flow.html
from google_auth_oauthlib.flow import InstalledAppFlow

iaf = InstalledAppFlow.from_client_secrets_file(
  'path_to_your_client_secrets.json',
  scopes=[
    "https://www.googleapis.com/auth/userinfo.email",
    "https://www.googleapis.com/auth/userinfo.profile",
    "openid",
    "https://www.googleapis.com/auth/blogger"
])

# start a local server to get access to your own data, sign into the account which will contain the blogger data
credentials = iaf.run_local_server(port=8082)

# create the session
session = iaf.authorized_session()

Let’s try getting a blog with requests. The API documentation states that the endpoint for retrieving a blog looks like: https://www.googleapis.com/blogger/v3/blogs/byurl?url={url}, where url is the url of the blog. In IPython, try:

# use session from earlier, this should give you your blog now
blog = session.get('https://www.googleapis.com/blogger/v3/blogs/byurl?url=https://your_blog.com/').json()

# get the posts of the blog by
posts = session.get(f"https://www.googleapis.com/blogger/v3/blogs/{blog['id']}/posts").json()
post, *_ = posts["items"]

# replace the post content by just updating the content field
session.put(f"https://www.googleapis.com/blogger/v3/blogs/{blog['id']}/posts/{post['id']}", json={**post, "content": f"{post['content']}<p>added via API <strong>client</strong></p>"})


# and the same with patching
session.patch(f"https://www.googleapis.com/blogger/v3/blogs/{blog['id']}/posts/{post['id']}", json={"content": f"{post['content']}<p>added via API <strong>client</strong></p>"})

# can we create posts?

At this point you have directly interacted with your blog via an API! The pieces are coming together, but we’ll be creating our own API client that has our desired ergonomics.

What do we want to do now?

  • get our posts
  • add a post
  • update an existing post

Pagination

Pagination is something you should consider, but you can omit it for the initial iteration. Generally, APIs limit the amount of data returned per request so that a single request cannot exhaust server memory and cause a denial of service.
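When you do get to it, Blogger (like many Google APIs) uses a cursor-style page token. A hedged sketch of draining all pages might look like this; double-check the field names (items, nextPageToken, pageToken, maxResults) against the API reference before relying on them:

```python
def iter_all_posts(session, blog_id):
    # follow the nextPageToken cursor until the API stops returning one
    url = f"https://www.googleapis.com/blogger/v3/blogs/{blog_id}/posts"
    params = {"maxResults": 20}
    while True:
        page = session.get(url, params=params).json()
        yield from page.get("items", [])
        token = page.get("nextPageToken")
        if not token:
            return
        params["pageToken"] = token
```

Making it a generator means callers only pay for the pages they actually consume.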

Writing API Client From Scratch

  • Should we use a class or module? This is Python specific.
  • Managing API tokens / credentials
  • Deciding on “ergonomics”

Class or Module?

In Python, a major decision point is whether to use a module that exposes an interface operating on module-level data, or to use an actual class. For example, the following are almost equivalent ways to create an interface that operates on private data:

# using a module
_the_data = {}

def get_from_the_data(key: str) -> str:
  return _the_data[key]

And it is used as:

import the_module

the_module.get_from_the_data('key')

I think this is perfectly valid and works in a lot of cases. However, the drawback is that you now have a singleton data structure that can be globally modified. If that is what you need, then it’s a perfect approach.

The class analogue is simply:

class TheClass:
  def __init__(self):
    self._the_data = {}

  def get_from_the_data(self, key: str) -> str:
    return self._the_data[key]

And it can be used as:

from module_containing_class import TheClass

the_class = TheClass()
the_class.get_from_the_data('key')

What’s better? What are the trade-offs? Great questions that I will not get into in this post.

Managing API Credentials / Tokens

For this, I’ll just offer one tidbit of advice… DO NOT HARDCODE YOUR TOKENS. Just don’t. At a bare minimum just use environment variables.

$ SECRET="fjlas43ioj3=" your_program

And that’s it. In Python, you can access the secrets with os.environ or os.getenv. If your configuration is more complicated, it makes sense to offload the configuration to an editable medium such as an ini or yaml file. I don’t recommend JSON files because editing those is a pain. Python has built-in support for ini files, use it!
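A minimal sketch of that split, with non-secret settings in an ini file and the secret still coming from the environment. The section and key names here are made up; only the blog id value is taken from the API response later in this post:

```python
import configparser
import os

# hypothetical settings.ini contents -- adapt the section/keys to your project
SETTINGS_INI = """
[blogger]
blog_id = 4660844935009290279
"""


def load_settings() -> dict:
    parser = configparser.ConfigParser()
    parser.read_string(SETTINGS_INI)  # or parser.read("settings.ini")
    return {
        "blog_id": parser["blogger"]["blog_id"],
        # the secret is never written to disk
        "secret": os.environ["SECRET"],
    }
```

Failing fast with os.environ["SECRET"] (rather than os.getenv) surfaces a missing secret at startup instead of mid-run.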

Ergonomics

What’s the best way to get posts? Well, in the scenario above, we really just used GET to fetch the posts, which is sufficient. In this case, we’ll want something like get_all_posts.

Stateful?

Stateful - have a blog instance expose methods such as create_post, get_post, or a writable Post stream. This leads to code that looks like blog.posts[0].update_content(new_content)

Functional?

Have the client take in a set of inputs and return outputs. client.get_posts(blog_id) or client.update_post(post_id, title="new title", content="new content")

The difference and our approach

We’ll take the latter approach. We’ll be using the client as part of a pipeline, so functional ergonomics play better here. Furthermore, it is often easier to test.

Modeling

Refer to section above on “data models”. Here we will be creating our domain models.

Let’s model a blog. We know that it has an id, name, description, url, etc.

from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass
class BloggerBlog:
  id: int
  name: str
  description: str
  url: str
  blogger_url: str  # a blogger self_link
  published_at: datetime
  updated_at: datetime

Here is how we’ll represent a post:

@dataclass
class BloggerPost:
  id: int
  blog_id: int
  author_id: int
  title: str
  html_contents: str
  url: str
  blogger_url: str  # a blogger self_link
  published_at: datetime
  updated_at: datetime
  comments: Optional[List["BloggerComment"]] = None

Notice how we slightly change how we name the properties versus how they are given via the API? We are not allowing the API to decide our ergonomics! We choose the domain model the way we want to interact with it.

In the above, I have decided to include a list of comments in the blog post. This will allow me to optionally include comments when retrieving posts for a blog.
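One way to enforce that renaming at the boundary is an adapter on the dataclass itself. Here is a trimmed-down sketch (author and comments omitted), using the field names from the sample API response later in this post:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class BloggerPost:
    id: int
    blog_id: int
    title: str
    html_contents: str
    url: str
    blogger_url: str  # a blogger self_link
    published_at: datetime
    updated_at: datetime

    @classmethod
    def from_api(cls, item: dict) -> "BloggerPost":
        # translate the API's names (content, selfLink, ...) into ours
        return cls(
            id=int(item["id"]),
            blog_id=int(item["blog"]["id"]),
            title=item["title"],
            html_contents=item["content"],
            url=item["url"],
            blogger_url=item["selfLink"],
            published_at=datetime.fromisoformat(item["published"]),
            updated_at=datetime.fromisoformat(item["updated"]),
        )
```

All knowledge of the API's naming lives in from_api; the rest of the code only ever sees our field names.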

Here is what a comment (and a blog author) will look like:

@dataclass
class BlogAuthor:
  id: int
  url: str
  name: str


@dataclass
class BloggerComment:
  id: int
  post_id: int
  blog_id: int
  blogger_url: str
  published_at: datetime
  updated_at: datetime
  content: str
  author: BlogAuthor

With all that, we’ll be able to start developing some tests. Developing tests will allow us to evaluate our design in an iterative fashion.

Testing

For testing, writing unit tests that mock out the correct responses should be fine. We are really testing the “happy” path. If the API breaks, our client will break. If this is not desired, then handling those types of errors should be a part of your testing plan. For the purpose of this post, I’ll skip that and stick with the “happy” path.

Fixtures

We’ll copy the responses we got when manually testing the API and use those as inputs to our tests. The tests we’ll need are asserting that given an API response, our client outputs the correct objects. One way to easily do this is to write the fixtures to a file with Python’s json module. Alternatively, you can use curl + jq.
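For example, after the manual IPython session, something along these lines persists a response for replay (the file paths are illustrative):

```python
import json


def save_fixture(payload, path):
    # dump a captured API response so tests can replay it without the network
    with open(path, "w") as f:
        json.dump(payload, f, indent=2)


def load_fixture(path):
    with open(path) as f:
        return json.load(f)


# shell alternative with curl + jq (URL and paths illustrative):
#   curl -s "$API_URL" -H "Authorization: Bearer $TOKEN" | jq . > tests/fixtures/posts.json
```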

pytest Tests

We’ll be using pytest to write our tests. I find that writing pytest tests is more enjoyable and the plugins more useful, but the standard library’s unittest works just as well.

Here is an example of a test we’ll be writing within your repo’s top-level tests directory.

# tests/test_blogger_client.py
from unittest import mock

import requests

from blogger_client import client

class TestBloggerClient:
  def test_get_posts(self):
    # Given a blogger posts response
    posts_response = fixtures["posts"]  # (1)

    # And the session returns the response
    session = make_session_mock({"/endpoint": {"get": posts_response}})  # (2)

    # When the client returns the list of posts via the authenticated session
    blogger_client = client.BloggerClient(session=session)  # (3)
    posts = blogger_client.get_posts('423423')

    # Then it returns the expected posts
    assert len(posts) == len(posts_response["items"])  # (4)
    assert posts[0].id == posts_response["items"][0]["id"]


def make_session_mock(session_config):
  # create a mock session to inject into our client: # (5)
  # session_config maps url -> {http method: fixture payload}
  def side_effect_for(method):
    def side_effect(url, *args, **kwargs):
      response = mock.MagicMock()
      response.json.return_value = session_config[url][method]
      return response
    return side_effect

  methods = {m for by_method in session_config.values() for m in by_method}
  return mock.MagicMock(
    spec=requests.Session,
    **{f"{method}.side_effect": side_effect_for(method) for method in methods},
  )

There are a couple of things in there to unpack

  1. we use a fixtures repository to maintain the fixtures so that we do not have to load a file for each run. We can use pytest fixtures for this, but for now, we opt for a simpler route
  2. the session is the boundary for things which we do not control, in this case, we are creating a mock session object that returns our expected data
  3. We instantiate our client (although we haven’t coded it yet) and we evaluate the method which we decided upon above to fetch the posts (would fetch_posts have been better? Now is the time to decide!)
  4. finally we assert some rudimentary properties of the expected results
  5. As an aside, you can create your own mock replacement of a request session object. Alternatively you can use the responses library. (I suggest doing the latter, I’ll be going over using test utilities in a later post)

Now that we have our first test, we can start coding up the client and make sure our tests pass.

Aside: TDD

Test Driven Development (TDD) is a paradigm of software development that emphasizes writing tests before the implementation, then writing the code that makes them pass. It pushes you toward more decoupled designs early on because you want the tests to be easy (are you mocking? or injecting dependencies?). My take is that it allows quicker iteration: you avoid constantly running your program and tediously checking its output, and that’s why I like it. The upfront harness setup requires effort, but the benefits are seen immediately after running your first test. Also, it fights “writer’s block”! You can couple a test runner with a watcher program such as watch to execute tests whenever related files change.

The minimal amount of code we need to pass the test is below:

  • create the client instantiation functions
  • create the list_posts method
  • create the BloggerPost class
  • create the API response adapters
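The real implementation lives in the linked commit, but a hedged sketch of the shape (with the model and response parsing simplified) could be:

```python
from dataclasses import dataclass
from typing import List

BASE_URL = "https://www.googleapis.com/blogger/v3"


@dataclass
class BloggerPost:
    id: str
    title: str
    html_contents: str


class BloggerClient:
    def __init__(self, session):
        # the authorized session is injected, which is what the tests exploit
        self._session = session

    def get_posts(self, blog_id: str) -> List[BloggerPost]:
        resp = self._session.get(f"{BASE_URL}/blogs/{blog_id}/posts")
        # the adapter step: API field names in, our field names out
        return [
            BloggerPost(id=item["id"], title=item["title"], html_contents=item["content"])
            for item in resp.json().get("items", [])
        ]
```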

See the changes at this commit

In a similar workflow, you will be able to create additional methods such as listing blog comments. For now, we’ll defer that since the goal is to read/create/update posts. Let’s turn our attention to the next core goal, creating posts.

The following APIs should work for now:

post = client.create_post(blog_id, author_id, title, content)
client.update_post(blog_id, post.id, title=None, content="new content")

To do this, we will once again see which endpoints we’ll need to interact with:

  • POST https://www.googleapis.com/blogger/v3/blogs/{blog_id}/posts to create posts
  • PATCH https://www.googleapis.com/blogger/v3/blogs/{blog_id}/posts/{post_id} to update existing posts' title or content

There is a section in the docs about creating a post via HTTP. It is a bit unclear, but it seems like all that is needed is a title and content. Let’s test to verify this and also capture the output as a fixture:

>>> resp = session.post(f"https://www.googleapis.com/blogger/v3/blogs/{blog['id']}/posts", json={"content": "<h1>hello there</h1>", "title": "new post!"})
>>> resp.json()
{'kind': 'blogger#post',
 'id': '3989436910247577401',
 'status': 'LIVE',
 'blog': {'id': '4660844935009290279'},
 'published': '2021-09-07T19:38:00-07:00',
 'updated': '2021-09-07T19:38:09-07:00',
 'url': 'http://www.sohailkhan.me/2021/09/new-post.html',
 'selfLink': 'https://www.googleapis.com/blogger/v3/blogs/4660844935009290279/posts/3989436910247577401',
 'title': 'new post!',
 'content': '<h1>hello there</h1>',
 'author': {'id': '10424770373055138157',
  'displayName': "Sohail's Tech Blog",
  'url': 'https://www.blogger.com/profile/10424770373055138157',
  'image': {'url': '//www.blogger.com/img/blogger_logo_round_35.png'}},
 'replies': {'totalItems': '0',
  'selfLink': 'https://www.googleapis.com/blogger/v3/blogs/4660844935009290279/posts/3989436910247577401/comments'},
 'readerComments': 'ALLOW',
 'etag': '"dGltZXN0YW1wOiAxNjMxMDY4Njg5OTE3Cm9mZnNldDogLTI1MjAwMDAwCg"'}

Nice. You can see that the docs don’t specify the minimum set of attributes required for creating a post. Sometimes, you just have to experiment. What would happen if we leave out a title? Can we specify the readerComments attribute? All good questions to consider.

Next, I will go over adding the ability to create posts.

But before that, let’s add the tests:

from blogger_client import client

class TestBloggerClient:

  # ... above this we have the tests for listing posts from above
  def test_create_post(self):
    # Given a blogger posts create response
    posts_create_response = fixtures["posts_create"]  # (1)

    # And a blog id with post content
    data = {
      "blog_id": 'my-blog-id',
      "title": 'new post!',
      "html_content": '<h1>hello there</h1>',  # (2)
    }

    # And the session returns the response
    session = make_session_mock({"/endpoint": {"post": posts_create_response}})  # (3)

    # When the client receives a request to create a post
    blogger_client = client.BloggerClient(session=session)
    new_post = blogger_client.create_post(blog_id=data['blog_id'], title=data["title"], html_content=data["html_content"])

    # Then it creates the post
    assert new_post.id == posts_create_response["id"]

From above, there isn’t anything new. Our mock session (#3) is still doing the same thing, except this time it returns our posts create fixture (#1). One thing I want to point out is that I decided to change the API of the function to take in html_content. This is because we are accepting an arbitrary HTML string and it is always good to be explicit. Developing this way can lead you to re-design your API, and that’s fine. That’s the whole point.

Next, we’ll add the minimal amount of code to satisfy the tests, see the commit here. To note, we only need to add a single method. The bulk of the work was done in the earlier commit.
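That single method, sketched in isolation and returning the raw payload for brevity (the committed version adapts the response into a BloggerPost):

```python
class BloggerClient:
    BASE_URL = "https://www.googleapis.com/blogger/v3"

    def __init__(self, session):
        self._session = session

    def create_post(self, blog_id: str, title: str, html_content: str) -> dict:
        # POST the two fields we verified were sufficient in the manual test
        resp = self._session.post(
            f"{self.BASE_URL}/blogs/{blog_id}/posts",
            json={"title": title, "content": html_content},
        )
        return resp.json()
```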

Run the tests now, all green. The ability to update posts will not be covered but should be done in a similar way.
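For reference, a hedged sketch of what the update could look like as a standalone function (in the client it would be a method using self._session), sending only the fields the caller provides via PATCH:

```python
def update_post(session, blog_id, post_id, title=None, content=None):
    # PATCH semantics: only send the fields that should change
    body = {k: v for k, v in {"title": title, "content": content}.items() if v is not None}
    resp = session.patch(
        f"https://www.googleapis.com/blogger/v3/blogs/{blog_id}/posts/{post_id}",
        json=body,
    )
    return resp.json()
```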

In the end, you should now be able to import the client and do work, exactly how you used the client within the tests! If you want to see the full repo, check out the github link.

In a follow up post, we’ll cover:

  • how to expose the client we created as a CLI (we’ll be using invoke)
  • how to make the client installable
  • how to publish the client
  • how to maintain the client

For now, I’ll be demonstrating a simple pipeline in which we may use the client (I’m using another API client that I have made for Trello as well):

import subprocess
from pathlib import Path

# trello client
trello_client = TrelloClient()
publishable_posts = trello_client.list_cards(board_id, list_name='publishable')
for publishable_post in publishable_posts:
  if publishable_post.custom_attributes.is_on_local_computer:
    post_slug = publishable_post.custom_attributes.post_slug
    post_content_filepath = Path(posts_location) / f"{post_slug}.md"

    # you can read in the file content and pass the bytes to stdin as well!
    converted_output = subprocess.run(f'pandoc -f markdown -t html {post_content_filepath}'.split(), capture_output=True, check=True)
    # `client` is the BloggerClient instance from earlier
    client.create_post(blog_id=blog_id, title=publishable_post.title, html_content=converted_output.stdout.decode('utf-8'))

And now you have a blog publishing pipeline that you can extend for automated trello -> blogger publishing! At this point, the sky is the limit.
