API Deployment News

These are the news items I've curated in my monitoring of the API space that are related to deploying APIs and that I thought were worth including in my research. I'm using all of these links to better understand how APIs are being deployed across a diverse range of implementations.

Working To Keep Programming At Edges Of The API Conversation

I'm fascinated by the dominating power of programming languages. There are many ideological forces at play in the technology sector, but the dogma that exists within each programming language community continues to amaze me. The relative absence of programming language dogma within the world of APIs is one of the reasons I feel it has been successful, but alas, other forms of dogma tend to creep in around specific API approaches and philosophies, making API evangelism and adoption always a challenge.

The absence of programming languages from the API design, management, and testing discussion is why these disciplines have been so successful. People in these disciplines have been focused on the language agnostic aspects of just doing business with APIs. It is also one of the reasons the API deployment conversation is still so fragmented, with so many ways of getting things done. When it comes to API deployment, everyone likes to bring their programming language beliefs to the table, letting them affect how we actually deliver APIs, and in my opinion, this is why API gateways have the potential to make a comeback, and even excel when it comes to playing the role of API intermediary, proxy, and gateway.

Programming language dogma is why many groups have so much trouble going API first. They see APIs as code, and have trouble transcending the constraints of their development environment. I've seen many web or HTTP APIs called Java APIs or Python APIs, or reflect a specific language style. It is hard for developers to transcend their primary programming language, learn multiple languages, or think in a language agnostic way. It is not easy for us to think outside of our boxes, consider external views, and empathize with people who operate within other programming or platform dimensions. It is just easier to see the world through our own lenses, making the world of APIs either illogical, or something we need to bend to our way of doing things.

I'm in the process of evolving from my PHP and Node.js realm to a Go reality. I'm not abandoning the PHP world because many of my government and institutional clients still operate in this world, and I'm using Node.js for a lot of the serverless API stuff I'm doing. However, I can't ignore the Go buzz I keep stumbling upon. I also feel like it is time for a paradigm shift, forcing me out of my comfort zone and pushing me to think in a new language. This is something I like to do every five years, shifting my reality, keeping me on my toes, and forcing me to not get too comfortable. I find that this keeps me humble and thinking across programming languages, which helps me realize the power of APIs, and how they transcend programming languages, making data, content, algorithms, and other resources more accessible via the web.


API Gateway As Just A Node And Moving Beyond Backend and Frontend

The more I study where the API management, gateway, and proxy layer is headed, the less I'm seeing a frontend or a backend, and the more I'm seeing just a node. A node that can connect to existing databases, or what is traditionally considered a backend system, but can also just as easily proxy and be a gateway to any existing API. A lot has been said about API gateways deploying APIs from an existing database. There has also been considerable discussion around proxying existing internal APIs or web services, so that we can deploy newer APIs. However, I don't think there has been nearly as much discussion around proxying other 3rd party public APIs, which flips the backend and frontend discussion on its head for me.

After profiling the connector and plugin marketplaces for a handful of the leading API management providers, I am seeing a number of API deployment opportunities for Twilio, Twitter, SendGrid, etc., allowing API providers to publish API facades for commonly used public APIs, obfuscate away the 3rd party provider, and make the API your own. Providers can build a stack of APIs from existing backend systems, private APIs and services, as well as 3rd party public APIs. This shifts gateways and proxies from being API deployment and management gatekeepers for backend systems, to being nodes of connectivity for any system, service, and API that you can get your hands on, changing how we think about designing, deploying, and managing APIs at scale.
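To make this concrete, here is a minimal sketch of the facade idea, using Python with Flask and requests, and a hypothetical upstream API at api.example.com. It is not tied to any specific gateway product or connector marketplace; it just shows a node proxying a 3rd party API and presenting it as your own.

```python
# Minimal API facade sketch: proxy a hypothetical upstream API so consumers
# only ever see this service's surface area. UPSTREAM_BASE and UPSTREAM_KEY
# are assumptions, not any real provider's endpoint or credentials.
import os

import requests
from flask import Flask, Response, request

app = Flask(__name__)

UPSTREAM_BASE = os.environ.get("UPSTREAM_BASE", "https://api.example.com")
UPSTREAM_KEY = os.environ.get("UPSTREAM_KEY", "")


@app.route("/messages", methods=["GET", "POST"])
def messages():
    # Forward the request upstream, adding our own credentials, so the 3rd
    # party provider is obfuscated away behind our facade.
    upstream = requests.request(
        method=request.method,
        url=f"{UPSTREAM_BASE}/messages",
        params=request.args,
        json=request.get_json(silent=True),
        headers={"Authorization": f"Bearer {UPSTREAM_KEY}"},
        timeout=10,
    )
    # Relay the upstream response as if it were our own API.
    return Response(
        upstream.content,
        status=upstream.status_code,
        mimetype=upstream.headers.get("Content-Type", "application/json"),
    )


if __name__ == "__main__":
    app.run(port=8080)
```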

I feel like this conversation is why API deployment is such a hard thing to discuss. It can mean so many different things, and be accomplished in so many ways. It can be driven by a database, delivered strictly using code, or be just about taking an existing API and turning it into something new. I don't think it is something that is well understood amongst developer circles, let alone in the business world. An API gateway can be about integration just as much as it can be about deployment. It can be both simultaneously. This further complicates what APIs are all about, but also makes the API gateway a more powerful concept, continuing to renew this relic of our web services past into something that will accommodate the future.

What I really like about this notion is that it ensures we will all be API providers as well as consumers. The one-sided nature of API providers in recent years has always frustrated me. It has allowed people to stay isolated within their organizations, and not see the world from the API consumer perspective. While API gateways as a node won't bring everyone out of their bubble, it will force some API providers to experience more pain than they have historically, understanding what it takes to not just provide APIs, but to do so in a more reliable, consistent, and friendly way. If we all feel the pain of integration and have to depend on each other, the chances that we will show a little more empathy will increase.


My GraphQL Thoughts After Almost Two Years

It has been almost a year and a half since I first wrote my post questioning whether GraphQL folks just didn't want to do the hard work of API design, in which I also clarified that I was keeping my mind open regarding the approach to delivering APIs. I've covered several GraphQL implementations since then, as well as publishing my post on waiting out the GraphQL assault, to which I received a stupid amount of, "dude you just don't get it!", and "why you gotta be so mean?" responses.

GraphQL Is A Tool In My Toolbox
I’ll start with my official stance on GraphQL. It is a tool in my API toolbox. When evaluating projects, and making recommendations regarding what technology to use, it exists alongside REST, Hypermedia, gRPC, Server-Sent Events (SSE), Websockets, Kafka, and other tools. I’m actively discussing the options with my clients, helping them understand the pros and cons of each approach, and working to help them define their client use cases, and when it makes sense I’m recommending a query language layer like GraphQL. When there are a large number of data points, and a knowledgeable, known group of developers being targeted for building web, mobile, and other client applications, it can make sense. I’m finding GraphQL to be a great option to augment a full stack of web APIs, and in many cases a streaming API, and other event-driven architectural considerations.

GraphQL Does Not Replace REST
Despite what many of the pundits and startups would like to convince everyone of, GraphQL will not replace REST. Sorry, it doesn't reach as wide of an audience as REST does, and it still keeps APIs in the database and data literate club. It does make some very complex things simpler, but it also makes some very simple things more complex. I've reviewed almost five brand new GraphQL APIs lately where the on-boarding time was in the 1-2 hour range, rather than the minutes it takes with many other more RESTful APIs. GraphQL augments RESTful approaches, and in some cases it can replace them, but it will not entirely do away with the simplicity of REST. Sorry, I know you don't want to understand the benefits REST brings to the table, and desperately want there to be a one size fits all solution, but that just isn't the reality on the ground. Insisting that GraphQL will replace REST does everyone more harm than good, hurts the overall API community, and will impede GraphQL adoption–regardless of what you want to believe.

GraphQL Should Embrace The Web
One of my biggest concerns around the future of GraphQL is its lack of focus on the web. If you want to see the adoption you envision, it has to be bigger than Facebook, Apollo, and the handful of cornerstone implementations like Github, Pinterest, and others. If you want to convert others into being believers, then propose the query language to become part of the web, and improve web caching so that it rises to the level of the promises being made about GraphQL. I know that underneath the Facebook umbrella of GraphQL and React it seems like everything revolves around you, but there really is a bigger world out there. There are more mountains to conquer, and Facebook won't be around forever. Many of the folks I'm trying to educate about GraphQL can't help but see the shadow of Facebook, and its lack of respect for the web. GraphQL may move fast, and break some things, but it won't have the longevity all y'all believe so passionately in if you don't change your tune. I've been doing this 30 years now, and have seen a lot of technology come and go. My recommendation is to embrace the web, bake GraphQL into what is already happening, and we'll all be better off for it.

GraphQL Does Not Do What OpenAPI Does
I hear a lot of pushback on my OpenAPI work, telling me that GraphQL does everything that OpenAPI does, and more! No, no it doesn't. By saying that, you are declaring that you don't have a clue what OpenAPI does, that you are just pushing a trend, and that you have very little interest in the viability of the APIs I'm building, or my clients' needs. There is an overlap in what GraphQL delivers and what OpenAPI does, but they both have their advantages, and honestly OpenAPI has a lot more tooling, adoption, and enjoys a more diverse community of support. I know that in your bubble GraphQL and React dominate the landscape, but on my landscape it is just a blip, and there are other tools in the toolbox to consider. Along with the web, the GraphQL community should be embracing the OpenAPI community, understanding the overlap, and reaching out to the OAI to help define the GraphQL layer for the specification. Both communities would benefit from this work, rather than one trying to replace something it doesn't actually understand.

GraphQL Dogma Continues To Get In The Way
I am writing this post because I had another GraphQL believer mess up the chances of it being in a roadmap, because of their overzealous beliefs, and allowing dogma to drive their contribution to the conversation. This is the 3rd time I've had this happen on high profile projects, and while the GraphQL conversation hasn't been ruled out, it has been slowed, pushed back, and taken off the table, due to pushing the topic in unnecessary ways. The conversation unfortunately wasn't about why GraphQL was a good option; it was dominated by GraphQL being THE SOLUTION, and how it was going to fix every web service and API that came before it. This, combined with inadequately addressed concerns regarding IP, and GraphQL being a "Facebook solution", set back the conversations. In each of these situations I stepped in to help regulate the conversations, and answer questions, but the damage was already done, and leadership was turned off to the concept of GraphQL because of overzealous, dogmatic beliefs in this relevant technology. Which brings me back to why I'm pushing back on GraphQL, not because I do not get the value it brings, but because of the rhetoric around how it is being sold and pushed.

No, I Do Get What GraphQL Does
This post will result in the usual number of Tweets, DMs, and emails telling me I just don’t get GraphQL. I do. I’ve installed and played with it, and hacked against several implementations. I get it. It makes sense. I see why folks feel like it is useful. The database guy in me gets why it is a viable solution. It has a place in the API toolbox. However, the rhetoric around it is preventing me from putting it to use in some relevant projects. You don’t need to convince me of its usefulness. I see it. I’m also not interested in convincing y’all of the merits of REST, Hypermedia, gRPC, or the other tools in the toolbox. I’m interested in applying the tools as part of my work, and the tone around GraphQL over the last year or more hasn’t been helping the cause. So, please don’t tell me I don’t get what GraphQL does, I do. I don’t think y’all get the big picture of the API space, and how to help ensure GraphQL is in it for the long haul, and achieving the big picture adoption y’all envision.

Let’s Tone Down The Rhetoric
I'm writing about this because the GraphQL rhetoric got in my way recently. I'm still waiting out the GraphQL assault. I'm not looking to get on the bandwagon, or argue with the GraphQL cult, I'm just looking to deliver on the projects I'm working on. I don't need to be schooled on the benefits of GraphQL, and what the future will hold. I'm just looking to get business done, and support the needs of my clients. I know that the whole aggressive trends stuff works in Silicon Valley, and the startup community. But in government, and some of the financial, insurance, and healthcare spaces where I've been putting GraphQL on the table, the aggressive rhetoric isn't working. It is working against the adoption of the query language. Honestly, it isn't just the rhetoric; I don't feel the community is doing enough to address the wider IP concerns, and the Facebook connection, to help make the sale. Anyways, I'm just looking to vent, and push back on the rhetoric around GraphQL replacing REST. After a year and a half I'm convinced of the utility of GraphQL, however the wider tooling, and messaging around it still hasn't won me over.


OpenAPI Is The Contract For Your Microservice

I've talked about how generating an OpenAPI (fka Swagger) definition from code is still the dominant way that microservice owners are producing this artifact. This is a by-product of developers seeing it as just another JSON artifact in the pipeline, and from it being primarily used to create API documentation, often times using Swagger UI–which is also why it is still called Swagger, and not OpenAPI. I'm continuing my campaign to help the projects I'm consulting on be more successful with their overall microservices strategy, by helping them better understand how they can work in concert by focusing in on OpenAPI, and realizing that it is the central contract for their service.

Each Service Begins With An OpenAPI Contract
There is no reason that microservices should start with writing code. It is expensive, rigid, and time consuming. The contract that a service provides to clients can be hammered out using OpenAPI, and made available to consumers as a machine readable artifact (JSON or YAML), as well as visualized using documentation like Swagger UI, Redoc, and other open source tooling. This means that teams need to put down their IDEs, and begin either handwriting their OpenAPI definitions, or begin using an open source editor like Swagger Editor, Apicurio, API GUI, or even the Postman development environment. The entire surface area of a service can be defined using OpenAPI, and then provided as a mocked version of the service, with documentation for usage by UI and other application developers. All before code has to be written, making microservices development much more agile, flexible, iterative, and cost effective.
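To ground this, here is a small sketch of what a hand-written contract can look like before any code exists. The service name, path, and schema are hypothetical, and the Python at the bottom simply parses the YAML to show it is a machine readable artifact that other stops along the lifecycle can consume.

```python
# Hand-written OpenAPI contract sketch for a hypothetical Orders service,
# parsed with PyYAML to show it is a machine readable artifact.
import yaml  # PyYAML

CONTRACT = """
openapi: 3.0.0
info:
  title: Orders Service
  version: 1.0.0
paths:
  /orders:
    get:
      summary: List orders
      responses:
        '200':
          description: A list of orders
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: '#/components/schemas/Order'
components:
  schemas:
    Order:
      type: object
      properties:
        id:
          type: string
        total:
          type: number
"""

# Parse the contract and print the surface area it defines.
spec = yaml.safe_load(CONTRACT)
for path, operations in spec["paths"].items():
    for verb in operations:
        print(verb.upper(), path)
```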

Mocking Of Each Microservice To Hammer Out The Contract
Each OpenAPI can be used to generate a mock representation of the service using Postman, Stoplight.io, or another OpenAPI-driven mocking solution. There are a number of services and tooling available that take an OpenAPI and generate a mock API, as well as the resulting data. Each service should have the ability to be deployed locally as a mock service by any stakeholder, published and shared with other team members as a mock service, and shared as a demonstration of what the service does, or will do. Mock representations of services will minimize builds, the writing of code, and the refactoring needed to accommodate rapid changes during the API development process. Code shouldn't be generated or crafted until the surface area of an API has been worked out, and reflects the contract that each service will provide.
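Here is a rough sketch of the spec-driven mocking idea, without committing to any single vendor. It assumes the hand-written contract above has been saved as openapi.yaml, and simply serves a canned placeholder response for every path the contract defines; the services mentioned above do this far more completely, including generating realistic data.

```python
# Spec-driven mock sketch: read the OpenAPI contract and serve a canned
# response for every path it defines. Assumes the contract is saved as
# openapi.yaml next to this script.
import yaml  # PyYAML
from flask import Flask, jsonify

app = Flask(__name__)

with open("openapi.yaml") as handle:
    spec = yaml.safe_load(handle)


def make_view(path):
    # Placeholder payload so consumers can build against the contract
    # before any backend exists.
    def view():
        return jsonify({"path": path, "data": [], "mock": True})
    return view


for index, path in enumerate(spec.get("paths", {})):
    # OpenAPI templated paths ({id}) map onto Flask converters (<id>).
    rule = path.replace("{", "<").replace("}", ">")
    app.add_url_rule(rule, endpoint=f"mock_{index}", view_func=make_view(path))

if __name__ == "__main__":
    app.run(port=8080)
```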

OpenAPI Documentation Always Available In The Repository
Each microservice should be self-contained, and always documented. Swagger UI, Redoc, and other API documentation generated from OpenAPI has changed how we deliver API documentation. OpenAPI generated documentation should be available by default within each service's repository, linked from the README, and readily available for running using static website solutions like Github Pages, or running locally through localhost. API documentation isn't just for the microservice owner / steward to use, it is meant for other stakeholders, and potential consumers. API documentation for a service should be always on, always available, and not something that needs to be generated, built, or deployed. API documentation is a default tool that should be present for EVERY microservice, and treated as a first class citizen as part of its evolution.

Bringing An API To Life Using Its OpenAPI Contract
Once an OpenAPI contract has been defined, designed, and iterated upon by the service owner / steward, as well as a handful of potential consumers and clients, it is ready for development. A finished (enough) OpenAPI can be used to generate server side code using a popular language framework, build out as part of an API gateway solution, or using common proxy services and tooling. In some cases the resulting build will be a finished API ready for use, but most of the time it will take some further connecting, refinement, and polishing before it is a production ready API. Regardless, there is no reason for an API to be developed, generated, or built until the OpenAPI contract is ready, providing the required business value each microservice is being designed to deliver. Writing code while an API is still changing is an inefficient use of time in a virtualized API design lifecycle.

OpenAPI-Driven Monitoring, Testing, and Performance
A ready-to-go OpenAPI contract can be used to generate API tests, monitors, and performance tests to ensure that services are meeting their business service level agreements. The details of the OpenAPI contract become the assertions of each test, which can be executed against an API on a regular basis to measure not just the overall availability of an API, but whether or not it is actually meeting the specific, granular business use cases articulated within the OpenAPI contract. Every detail of the OpenAPI becomes the contract for ensuring each microservice is doing what has been promised, and something that can be articulated and shared with humans via documentation, as well as programmatically by other systems, services, and tooling employed to monitor and test according to a wider strategy.
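A bare-bones sketch of what this can look like, assuming the same hypothetical openapi.yaml and a BASE_URL where a build of the service (or its mock) is running. Dedicated monitoring and testing services go much further; this only shows the contract driving the assertions.

```python
# Contract-driven check sketch: use the OpenAPI contract as the source of
# assertions against a running service. BASE_URL and openapi.yaml are
# assumptions.
import requests
import yaml  # PyYAML

BASE_URL = "http://localhost:8080"

with open("openapi.yaml") as handle:
    spec = yaml.safe_load(handle)

failures = []
for path, operations in spec.get("paths", {}).items():
    # Keep the sketch to simple, parameter-free GET operations.
    if "get" not in operations or "{" in path:
        continue
    allowed = [int(code) for code in operations["get"]["responses"] if str(code).isdigit()]
    response = requests.get(BASE_URL + path, timeout=10)
    if response.status_code not in allowed:
        failures.append((path, response.status_code, allowed))

for path, got, allowed in failures:
    print(f"FAIL {path}: got {got}, contract allows {allowed}")
print("All contract checks passed" if not failures else f"{len(failures)} failure(s)")
```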

Empowering Security To Be Directed By The OpenAPI Contract
An OpenAPI provides the full details of the surface area of an API. In addition to being used to generate tests, monitors, and performance checks, it can be used to inform security scanning, fuzzing, and other vital security practices. There are a growing number of services and tooling emerging that allow for building models, policies, and executing security audits based upon OpenAPI contracts. Taking the paths, parameters, definitions, security, and authentication details and using them as actionable inputs ensures security across not just an individual service, but potentially hundreds, or thousands of services being developed across many different teams. OpenAPI is quickly becoming not just the technical and business contract, but also the political contract for how you do business on the web in a secure way.

OpenAPI Provides API Discovery By Default
An OpenAPI describes the entire surface area for the request and response of each API, providing 100% coverage for all interfaces a service will possess. While this OpenAPI definition will be used to generate mocks, code, documentation, tests, monitors, and security scans, serving other stops along the lifecycle, it also provides much needed discovery across groups, and by consumers. Anytime a new application is being developed, teams can search across the team Github, Gitlab, Bitbucket, or Team Foundation Server (TFS), and see what services already exist before they begin planning any new services. Service catalogs, directories, search engines, and other discovery mechanisms can use the OpenAPIs across services to index them, and make them available to other systems, applications, and most importantly to other humans who are looking for services that will help them solve problems.

OpenAPI Delivers The Integration Contract For Clients
OpenAPI definitions can be imported into Postman, Stoplight, and other API design, development, and client tooling, allowing for quick setup of environments, and collaboration on integration across teams. OpenAPIs are also used to generate SDKs, and deploy them using existing continuous integration (CI) pipelines, by companies like APIMATIC. OpenAPIs deliver the client contract we need to learn about an API, get to work developing a new web or mobile application, or manage updates and version changes as part of our existing CI pipelines. OpenAPIs deliver the integration contract needed for all levels of clients, helping teams go from discovery to integration with as little friction as possible. Without this contract in place, on-boarding with one service is time consuming, and doing it across tens, or hundreds of services becomes impossible.

OpenAPI Delivers Governance At Scale Across Teams
Delivering consistent APIs within a single team takes discipline. Delivering consistent APIs across many teams takes governance. OpenAPI provides the building blocks to ensure APIs are defined, designed, mocked, deployed, documented, tested, monitored, performance tested, secured, discovered, and integrated with consistently. The OpenAPI contract is an artifact that governs every stop along the lifecycle, and at scale it becomes the measure for how well each service is delivering across not just tens, but hundreds, or thousands of services, spread across many groups. Without the OpenAPI contract, API governance is non-existent, or at best extremely cumbersome. The OpenAPI contract is not just top down governance telling teams what they should be doing, it is also the bottom up contract that lets the service owners / stewards who are delivering quality services on the ground inform governance, and lead efforts across many teams.

I can't articulate strongly enough the importance of the OpenAPI contract to each microservice, as well as to the overall organizational and project microservice strategy. I know that many folks will dismiss the role that OpenAPI plays, but look at the list of members who govern the specification. Consider that Amazon, Google, and Azure ALL have baked OpenAPI into their microservice delivery services and tooling. OpenAPI isn't a WSDL. An OpenAPI contract is how you will articulate what your microservice will do from inception to deprecation. Make it a priority, don't treat it as just an output from your legacy way of producing code. Roll up your sleeves, spend time editing it by hand, and load it into 3rd party services to see the contract for your microservice in different ways, through different lenses. Eventually you will begin to see it is much more than just another JSON artifact laying around in your repository.


Imagine A Mock Only API Development Reality

Imagine if we didn't actually write code when we initially developed our APIs. What if the bar for putting an API into production was so high that many APIs never actually made it to that level? I spend a lot of time mocking and prototyping APIs, but I don't have a strict maturity model determining when I actually bring an API to life using code. I'd love it if I never actually had to write code for many of my API prototypes, and just mocked them, iterating upon the mocks instead of thinking they needed to actually have a database backend, and a code or gateway front-end.

What if I only executed these stops along the API lifecycle, instead of actually writing any code:

  • Define - Define what I want my API to do, and what types of resources will be involved.
  • Design - Actually design my API and underlying schema using an OpenAPI definition.
  • Mock - Publish a mock representation of my API, complete with virtualized data behind responses.
  • Document - Make sure and publish documentation to a simple developer portal.
  • Plan - An overview of what it will cost, and the model for managing consumption.
  • Testing - Set up tests to ensure each API is up and doing what it was intended to do.
  • Communicate - Actually write stories, and communicate around what an API does.
  • Support - Establish and maintain feedback loops around the API to listen to consumers.
  • Road Map - Gather feedback into a road map, and plan next version, complete with a change log.

In this life cycle I would never actually write any code. I would never actually set up a database backend. Everything would be mocked and virtualized, but seem as real as I could possibly make it. I might even provide different types of virtualized datasets and responses, so people can experience different types of responses or scenarios. I'd work hard to make sure each API was as complete, and functioned as much like a real API, as I possibly could.
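As a small sketch of the virtualized scenarios piece, here is a single mocked endpoint that switches datasets based on a query parameter, so consumers can experience a happy path, an empty result, and an error without any real backend. The endpoint and datasets are made up for illustration.

```python
# Scenario-switching mock sketch: one endpoint, several virtualized
# datasets, no real backend. The /orders path and payloads are hypothetical.
from flask import Flask, jsonify, request

app = Flask(__name__)

SCENARIOS = {
    "default": {"status": 200, "body": [{"id": "1", "total": 9.99}]},
    "empty": {"status": 200, "body": []},
    "error": {"status": 500, "body": {"message": "simulated failure"}},
}


@app.route("/orders")
def orders():
    # Pick the dataset based on ?scenario=, falling back to the happy path.
    scenario = SCENARIOS.get(request.args.get("scenario", "default"), SCENARIOS["default"])
    return jsonify(scenario["body"]), scenario["status"]


if __name__ == "__main__":
    app.run(port=8080)
```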

Now imagine within a company, if I had to do all of this, and before I would ever be allowed to begin developing my API for real, I had to sell my API. Show its value, and get some consumers to buy into my overall API design, and even develop some mock integrations showing the potential. Before my API would be considered for inclusion in the production road map, I’d have to prove that my approach was mature enough in each of the areas outlined above:

  • Define - Robust OpenAPI and JSON schema present.
  • Design - A complete design meeting all governance guidelines.
  • Document - Complete, up to date API document present.
  • Plan - A clear definition of costs, and how consumption will be measured.
  • Testing - A full suite of tests available with 100% coverage.
  • Communicate - Demonstrated ability to communicate around its operations.
  • Support - Demonstrated ability to support the API's operation.
  • Road Map - Clearly articulated road map, and change log for the API.

As an API developer, once I've built the case for my microservice, presented it, and demonstrated its value, then I would be approved to actually begin developing it. Maybe in some cases I would be required to iterate upon it for another version, and engage with a larger audience, before actually being given the license and budget to do it for real. Going well beyond just an API design first reality, and enforcing a maturity model around how we define and communicate around our services, raising the bar for what gets let into production.

Many of the companies, organizations, institutions, and government agencies I’m working with are having a hard time adopting a define and design first strategy, let alone a reality such as this. Imagine how different our production worlds would look if you had all this experience before you were ever given the license to write code. Think about how much worthless code gets written, when we should just be iterating on our ideas. How much of our budget gets wasted on writing code that should never have existed to begin with. It’s a nice thought. However, it is just a fantasy.


Serverless, Like Microservices Is About Understanding Our Dependencies And Constraints

I am not buying into all the hype around the recent serverless evolution in compute. However, like most trends I study, I am seeing some interesting aspects of how people are doing it, and some positive outcomes for teams who are doing it sensibly. I am not going all in on serverless when it comes to deploying or integrating with APIs, but I am using it for some projects, when it makes sense. I find AWS Lambda to be a great way to get in between the AWS API Gateway and backend resources, in order to conduct a little transformation. Keeping serverless as just one of many tools in my API deployment and integration toolbox, yet never going overboard with any single solution.
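As an example of the kind of transformation I am talking about, here is a minimal sketch of a Python Lambda handler sitting between API Gateway (using the Lambda proxy integration) and a backend resource. The backend URL and field names are assumptions, just to show the reshaping that can happen in between.

```python
# Lambda-in-the-middle sketch: receive an API Gateway proxy event, call a
# hypothetical backend, and reshape the legacy response before returning it.
import json
import urllib.request


def handler(event, context):
    # API Gateway's Lambda proxy integration passes query string parameters here.
    params = event.get("queryStringParameters") or {}
    record_id = params.get("id", "")

    # Call a hypothetical backend resource.
    url = f"https://backend.example.com/records/{record_id}"
    with urllib.request.urlopen(url) as response:
        legacy = json.loads(response.read())

    # Reshape legacy field names so consumers never see them.
    transformed = {
        "id": legacy.get("REC_ID"),
        "name": legacy.get("REC_NAME"),
    }

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(transformed),
    }
```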

I put serverless into the same section of my toolbox as I do microservices. Serverless is all about decoupling how I deploy my APIs, helping me keep small, meaningful things behind each of my API paths. Serverless forces me to think through how I am decoupling my backend, and pushes me to identify dependencies, acknowledge constraints, and get more creative in how I write code in this environment. To do one thing, and do it well, I need an intimate understanding of where my data and other backend resources are, and how I can distill a unit of compute down to the smallest unit I possibly can. Identifying, reducing, and being honest about what my dependencies are is essential to serverless working or not working for me.

The challenge I'm having with serverless at the moment is that it can be easy to ignore many of the benefits I get from dependencies, while just assuming all dependencies are bad. Specifically around the frameworks I deploy as part of my API solutions. When coupled with AWS API Gateway this isn't much of a problem, but all by itself, AWS Lambda doesn't have all of the other HTTP, transformation, header, request, and response goodness I enjoy from the API frameworks I've historically depended on. This shows me that understanding our dependencies is important, both good and bad. Not all dependencies are bad, as long as I am aware of them, have a knowledge of what they bring to the table, and it is a stable, healthy, and regularly evaluated relationship. This is a challenge I also come across regularly in microservices as well as serverless journeys: we just throw things out for the sake of size, without thinking deeply about why we are doing something, and whether or not it is worth it.

I can see plenty of scenarios where I will be thinking that a serverless approach is good, but after careful evaluation of dependencies and constraints, I abandon this path as an option. I’m guessing we’ll see a goldilocks style of serverless emerge that helps us find this sweet spot. Similar to what we are finding with the microservices evolution, in that micro isn’t always about small–it is often just about finding the sweet spot between big and small, hot and cold, dependent and independent. Being sensible about how we use the tools in our API toolbox, and realizing most of these approaches are more about the journey, than they are about the actual tool.


Code Generation Of OpenAPI (fka Swagger) Still The Prevailing Approach

Over 50% of the projects I consult on still generate OpenAPI (fka Swagger) from code, rather than the other way around. When I first begin working with any API development group as an advisor, strategist, or governance architect, I always ask, "are you using OpenAPI?" Luckily the answer is almost always yes. The challenge is that most of the time they don't understand the full scope of how to use OpenAPI, and are still opting for the more costly approach–writing code, then generating OpenAPI from annotations. It has been over five years since Jakub Nesetril (@jakubnesetril) of Apiary first decoupled API design from code, but clearly we still have a significant amount of work to do when it comes to API definition and design literacy amongst development groups.

When you study where API services and tooling are headed, it is clear that API deployment, and the actual writing of code, is getting pushed further down in the life cycle. Services like Stoplight.io and Postman are focusing on enabling a design, mock, document, test, and iterate approach, with API definitions (OpenAPI, Postman, etc.) at the core. The actual deployment of an API, whether using open source frameworks, API gateways, or other methods, is coming into the picture further downstream. Progressive API teams are hammering out exactly the API they need without ever writing any code, making sure the API design is dialed in before the more expensive, and often permanent, code gets written and sent to production.

You will see me hammering on this line of API design first messaging on API Evangelist over the next year. Many developers still see OpenAPI (fka Swagger) as being about generating API documentation, not as the central contract that is used across every stop along the API lifecycle. Most do not understand that you can mock instead of deploying, and even provide mock data, errors, and other scenarios, allowing you to prototype applications on top of API designs. It will take a lot of education and awareness building to get API developers up to speed that this is all possible, and to begin the long process of changing behavior on the ground. Teams just aren't used to this way of thinking, but once they understand what is possible, they'll realize what they have been missing.

I need to come up with some good analogies for generating API definitions from code. It really is an inefficient, and a very costly way to get the job done. Another problem is that this approach tends to be programming language focused, which always leaves its mark on the API design. I'm going to be working with both Stoplight.io and Postman to help amplify this aspect of delivering APIs, and how their services and tooling help streamline how we develop our APIs. I'm going to be working with banks, insurance, health care, and other companies to improve how they deliver APIs, shifting things towards a design-first way of doing business. You'll hear the continued drumbeat around all of this on API Evangelist in coming months, as I try to get the attention of folks down in the trenches, and slowly shift behavior towards a better way of getting things done.


Importing OpenAPI Definition To Create An API With AWS API Gateway

I've been learning more about AWS API Gateway, and wanted to share some of what I'm learning with my readers. The AWS API Gateway is a robust way to deploy and manage an API on the AWS platform. The concept of an API gateway has been around for years, but the AWS approach reflects the commoditization of API deployment and management, making it a worthwhile cloud API service to understand in more depth. With the acquisition of all the original API management providers in recent years, as well as Apigee's IPO, API management is now a default element of the major cloud providers. Since AWS is the leading cloud provider, AWS API Gateway will play a significant role in the deployment and management of a growing number of the APIs we see.

Using AWS API Gateway you can deploy a new API, or you can use it to manage an existing API–demonstrating the power of a gateway. What really makes AWS API Gateway reflect where things are going in the space, is the ability to import and define your API using OpenAPI. When you first begin with the new API wizard, you can upload or copy / paste your OpenAPI, defining the surface area of the API, no matter how you are wiring up the backend. OpenAPI is primarily associated with publishing API documentation because of the success of Swagger UI, and secondarily associated with generating SDKs and code samples. However, increasingly the OpenAPI specification is also being used to deploy and define aspects of API management, which is in alignment with the AWS API Gateway approach.

I have server side code that will take an OpenAPI and generate the server side code needed to work with the database, and handle requests and responses using the Slim API framework. I'll keep doing this for many of my APIs, but for some of them I'm going to be adopting an AWS API Gateway approach to help standardize API deployment and management across the APIs I deliver. I have multiple clients right now for whom I am deploying APIs, and helping manage their API operations using AWS, so adopting AWS API Gateway makes sense. One client is already operating on AWS, which dictated that I keep things on AWS, while the other client is brand new and looking for a host, which also makes AWS a good option for setting up, and then passing over control of their account and operations to their on-site manager.

Both of my current projects are using OpenAPI as the central definition for all stops along the API lifecycle. One API is brand new, and the other is about proxying an existing API. Importing the central OpenAPI definition for each project into AWS API Gateway worked well to get both projects going. Next, I am exploring the staging features of AWS API Gateway, and the ability to overwrite or merge the next iteration of the OpenAPI with an existing API, allowing me to evolve each API forward in coming months, as the central definition gets added to. Historically, I haven't been a fan of API gateways, but I have to admit that the AWS API Gateway is somewhat changing my tune. Keeping the service doing one thing, and doing it well, with OpenAPI as the definition, really fits with my view of where API management is headed as the API sector matures.
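For reference, here is a short sketch of that import and update flow using boto3, which mirrors what the console wizard does. The file names, stage name, and the way the REST API id is carried between calls are assumptions on my part.

```python
# AWS API Gateway import / update sketch using boto3. Assumes openapi.json
# and openapi-v2.json exist locally and AWS credentials are configured.
import boto3

client = boto3.client("apigateway")

# Initial import: create a new REST API from the central OpenAPI definition.
with open("openapi.json", "rb") as handle:
    api = client.import_rest_api(failOnWarnings=True, body=handle.read())
print("Created API:", api["id"])

# Later iteration: overwrite (or merge) the API with the next version of the
# OpenAPI definition, then deploy it to a stage.
with open("openapi-v2.json", "rb") as handle:
    client.put_rest_api(
        restApiId=api["id"],
        mode="overwrite",  # or "merge"
        failOnWarnings=True,
        body=handle.read(),
    )
client.create_deployment(restApiId=api["id"], stageName="dev")
```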


Obfuscating The Evolving Code Behind My API

I'm dialing in a set of machine learning APIs that I use to obfuscate and distort the images I use across my storytelling. The code is getting a little more hardened, but there is still so much work ahead when it comes to making sure it does exactly what I need it to do, with only the dials and controls I need–nothing more. I'm the only consumer of my APIs, which I use daily, with updates made to the code along the way, evolving the request and response structures until they meet my needs. Eventually the APIs will be done (enough), and I'll stop messing with them, but that will take a couple more months of pushing forward the code.

While the code for these APIs is far from finished, I find the API helps obfuscate and diffuse the unfinished nature of things. The API maintains a single set of paths, and while I might still evolve the number of parameters it accepts, and the fields it outputs, the overall interface will keep a pretty tight surface area despite the perpetually unfinished backend. I like this aspect of operating APIs, and how they can be used as a facade, allowing you to maintain one narrative on the front-end, even with another occurring behind the scenes. I feel like API facades really fit with my style of coding. I'm not really an A grade programmer, more a B- level one, but I know how to get things to work–if I have a polished facade, things look good.

Honestly, I'm pretty embarrassed by my wrappers for TensorFlow. I'm still figuring everything out, and finding new ways of manipulating and applying TensorFlow models, so my code is rapidly changing, maturing, and not always complete. When it comes to the API interface I am focused on not introducing breaking changes, and maintaining a coherent request and response structure for the image manipulation APIs. Even though the code will at some point be stabilized, with updates becoming less frequent, the code will probably not see the light of day on Github, but the API might eventually be shared beyond just me, providing a simple, usable interface someone else can use.

I wouldn't develop APIs like this outside a controlled group of consumers, but I find being both API provider and consumer pushes me to walk the line between an unfinished, evolving backend and a stable, simple API without any major breaking changes. The fewer fixes I have to make to my API clients, the more successful I've been at pushing forward the backend, while keeping the API in forward motion without the friction. I guess I just feel like the API makes for a nice wrapper for code, acting as a shock absorber for the ups and downs of evolving code, and getting it to do what you need. Providing a nice facade that obfuscates the evolving code behind my API.


Provide An Open Source Threat Information Database And API Then Sell Premium Data Subscriptions

I was doing some API security research and stumbled across vFeed, a "Correlated Vulnerability and Threat Intelligence Database Wrapper", providing a JSON API of vulnerabilities from the vFeed database. The approach is a Python API, not a web API, but I think it provides an interesting blueprint for open source APIs. What I found (somewhat) interesting about the vFeed approach was the fact that they provide an open source API and database, but if you want a production version of the database with all the threat intelligence, you have to pay for it.

I would say their technical and business approach needs a significant amount of work, but I think there is a workable version of it in there. First, I would create Python, PHP, Node.js, Java, Go, and Ruby versions of the API, making sure it is a web API. Next, I would remove the production restriction on the database, allowing anyone to deploy a working edition, just minus all the threat data. There is a lot of value in having an open source set of threat intelligence sharing databases and APIs. Then, after that, get smarter about having a variety of different free and paid data subscriptions, not just a single database–leverage the API presence.

You could also get smarter about how the database and API enable companies to share their threat data, plugging it into a larger network, making some of it free, and some of it paid–with revenue share all around. There should be a suite of open source threat information sharing databases and APIs, and a federated network of API implementations. Complete with a wealth of open data for folks to tap into and learn from, but also with some revenue generating opportunities throughout the long tail, helping companies fund aspects of their API security operations. Budget shortfalls are a big contributor to security incidents, and some revenue generating activity would be positive.

So, not a perfect model, but enough food for thought to warrant a half-assed blog post like this. Smells like an opportunity for someone out there. Threat information sharing is just one dimension of my API security research where I'm looking to evolve the narrative around how APIs can contribute to security in general. However, there is also an opportunity for enabling the sharing of API related security information, using APIs. Maybe also generating revenue along the way, helping feed the development of tooling like this, maybe funding individual implementations and threat information nodes, or possibly even funding more storytelling around the concept of API security as well. ;-)


Looking At The 37 Apache Data Projects

I’m spending time investing in my data, as well as my database API research. I’ll have guides, with accompanying stories coming out over the next couple weeks, but I want to take a moment to publish some of the raw research that I think paints an interesting picture about where things are headed.

When studying what is going on with data and APIs, you can't do any search without stumbling across an Apache project doing something or other with data. I found 37 separate projects at Apache that were data related, and wanted to publish them as a single list I could learn from.

  • Airavata - Apache Airavata is a micro-service architecture based software framework for executing and managing computational jobs and workflows on distributed computing resources including local clusters, supercomputers, national grids, academic and commercial clouds. Airavata is dominantly used to build Web-based science gateways and assist to compose, manage, execute, and monitor large scale applications (wrapped as Web services) and workflows composed of these services.
  • Ambari - Apache Ambari makes Hadoop cluster provisioning, managing, and monitoring dead simple.
  • Apex - Apache Apex is a unified platform for big data stream and batch processing. Use cases include ingestion, ETL, real-time analytics, alerts and real-time actions. Apex is a Hadoop-native YARN implementation and uses HDFS by default. It simplifies development and productization of Hadoop applications by reducing time to market. Key features include Enterprise Grade Operability with Fault Tolerance, State Management, Event Processing Guarantees, No Data Loss, In-memory Performance & Scalability and Native Window Support.
  • Avro - Apache Avro is a data serialization system.
  • Beam - Apache Beam is a unified programming model for both batch and streaming data processing, enabling efficient execution across diverse distributed execution engines and providing extensibility points for connecting to different technologies and user communities.
  • Bigtop - Bigtop is a project for the development of packaging and tests of the Apache Hadoop ecosystem. The primary goal of Bigtop is to build a community around the packaging and interoperability testing of Hadoop-related projects. This includes testing at various levels (packaging, platform, runtime, upgrade, etc…) developed by a community with a focus on the system as a whole, rather than individual projects. In short we strive to be for Hadoop what Debian is to Linux.
  • BookKeeper - BookKeeper is a reliable replicated log service. It can be used to turn any standalone service into a highly available replicated service. BookKeeper is highly available (no single point of failure), and scales horizontally as more storage nodes are added.
  • Calcite - Calcite is a framework for writing data management systems. It converts queries, represented in relational algebra, into an efficient executable form using pluggable query transformation rules. There is an optional SQL parser and JDBC driver. Calcite does not store data or have a preferred execution engine. Data formats, execution algorithms, planning rules, operator types, metadata, and cost model are added at runtime as plugins.
  • CouchDB - Apache CouchDB is a database that completely embraces the web. Store your data with JSON documents. Access your documents with your web browser, via HTTP. Query, combine, and transform your documents with JavaScript. Apache CouchDB works well with modern web and mobile apps. You can even serve web apps directly out of Apache CouchDB. And you can distribute your data, or your apps, efficiently using Apache CouchDB’s incremental replication. Apache CouchDB supports master-master setups with automatic conflict detection.
  • Crunch - The Apache Crunch Java library provides a framework for writing, testing, and running MapReduce pipelines. Its goal is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run.
  • DataFu - Apache DataFu consists of two libraries: Apache DataFu Pig is a collection of useful user-defined functions for data analysis in Apache Pig. Apache DataFu Hourglass is a library for incrementally processing data using Apache Hadoop MapReduce. This library was inspired by the prevalence of sliding window computations over daily tracking data. Computations such as these typically happen at regular intervals (e.g. daily, weekly), and therefore the sliding nature of the computations means that much of the work is unnecessarily repeated. DataFu’s Hourglass was created to make these computations more efficient, yielding sometimes 50-95% reductions in computational resources.
  • Drill - Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google’s Dremel.
  • Edgent - Apache Edgent is a programming model and micro-kernel style runtime that can be embedded in gateways and small footprint edge devices enabling local, real-time, analytics on the continuous streams of data coming from equipment, vehicles, systems, appliances, devices and sensors of all kinds (for example, Raspberry Pis or smart phones). Working in conjunction with centralized analytic systems, Apache Edgent provides efficient and timely analytics across the whole IoT ecosystem: from the center to the edge.
  • Falcon - Apache Falcon is a data processing and management solution for Hadoop designed for data motion, coordination of data pipelines, lifecycle management, and data discovery. Falcon enables end consumers to quickly onboard their data and its associated processing and management tasks on Hadoop clusters.
  • Flink - Flink is an open source system for expressive, declarative, fast, and efficient data analysis. It combines the scalability and programming flexibility of distributed MapReduce-like platforms with the efficiency, out-of-core execution, and query optimization capabilities found in parallel databases.
  • Flume - Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store
  • Giraph - Apache Giraph is an iterative graph processing system built for high scalability. For example, it is currently used at Facebook to analyze the social graph formed by users and their connections.
  • Hama - The Apache Hama is an efficient and scalable general-purpose BSP computing engine which can be used to speed up a large variety of compute-intensive analytics applications.
  • Helix - Apache Helix is a generic cluster management framework used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. Helix automates reassignment of resources in the face of node failure and recovery, cluster expansion, and reconfiguration.
  • Ignite - Apache Ignite In-Memory Data Fabric is designed to deliver uncompromised performance for a wide set of in-memory computing use cases from high performance computing, to the industry most advanced data grid, in-memory SQL, in-memory file system, streaming, and more.
  • Kafka - A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients. Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of co-ordinated consumers. Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees. Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact.
  • Knox - The Apache Knox Gateway is a REST API Gateway for interacting with Hadoop clusters. The Knox Gateway provides a single access point for all REST interactions with Hadoop clusters. In this capacity, the Knox Gateway is able to provide valuable functionality to aid in the control, integration, monitoring and automation of critical administrative and analytical needs of the enterprise.
  • Lens - Lens provides an Unified Analytics interface. Lens aims to cut the Data Analytics silos by providing a single view of data across multiple tiered data stores and optimal execution environment for the analytical query. It seamlessly integrates Hadoop with traditional data warehouses to appear like one.
  • MetaModel - With MetaModel you get a uniform connector and query API to many very different datastore types, including: Relational (JDBC) databases, CSV files, Excel spreadsheets, XML files, JSON files, Fixed width files, MongoDB, Apache CouchDB, Apache HBase, Apache Cassandra, ElasticSearch, OpenOffice.org databases, Salesforce.com, SugarCRM and even collections of plain old Java objects (POJOs). MetaModel isn’t a data mapping framework. Instead we emphasize abstraction of metadata and ability to add data sources at runtime, making MetaModel great for generic data processing applications, less so for applications modeled around a particular domain.
  • Oozie - Oozie is a workflow scheduler system to manage Apache Hadoop jobs. Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).
  • ORC - ORC is a self-describing type-aware columnar file format designed for Hadoop workloads. It is optimized for large streaming reads, but with integrated support for finding required rows quickly. Storing data in a columnar format lets the reader read, decompress, and process only the values that are required for the current query.
  • Parquet - Apache Parquet is a general-purpose columnar storage format, built for Hadoop, usable with any choice of data processing framework, data model, or programming language.
  • Phoenix - Apache Phoenix enables OLTP and operational analytics for Apache Hadoop by providing a relational database layer leveraging Apache HBase as its backing store. It includes integration with Apache Spark, Pig, Flume, Map Reduce, and other products in the Hadoop ecosystem. It is accessed as a JDBC driver and enables querying, updating, and managing HBase tables through standard SQL.
  • REEF - Apache REEF (Retainable Evaluator Execution Framework) is a development framework that provides a control-plane for scheduling and coordinating task-level (data-plane) work on cluster resources obtained from a Resource Manager. REEF provides mechanisms that facilitate resource reuse for data caching, and state management abstractions that greatly ease the development of elastic data processing workflows on cloud platforms that support a Resource Manager service.
  • Samza - Apache Samza provides a system for processing stream data from publish-subscribe systems such as Apache Kafka. The developer writes a stream processing task, and executes it as a Samza job. Samza then routes messages between stream processing tasks and the publish-subscribe systems that the messages are addressed to.
  • Spark - Apache Spark is a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala and Python as well as a rich set of libraries including stream processing, machine learning, and graph analytics.
  • Sqoop - Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
  • Storm - Apache Storm is a distributed real-time computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing real-time computation.
  • Tajo - The main goal of Apache Tajo project is to build an advanced open source data warehouse system in Hadoop for processing web-scale data sets. Basically, Tajo provides SQL standard as a query language. Tajo is designed for both interactive and batch queries on data sets stored on HDFS and other data sources. Without hurting query response times, Tajo provides fault-tolerance and dynamic load balancing which are necessary for long-running queries. Tajo employs a cost-based and progressive query optimization techniques for optimizing running queries in order to avoid the worst query plans.
  • Tez - Apache Tez is an effort to develop a generic application framework which can be used to process arbitrarily complex directed-acyclic graphs (DAGs) of data-processing tasks and also a reusable set of data-processing primitives which can be used by other projects.
  • VXQuery - Apache VXQuery will be a standards compliant XML Query processor implemented in Java. The focus is on the evaluation of queries on large amounts of XML data. Specifically the goal is to evaluate queries on large collections of relatively small XML documents. To achieve this queries will be evaluated on a cluster of shared nothing machines.
  • Zeppelin - Zeppelin is a modern web-based tool for the data scientists to collaborate over large-scale data exploration and visualization projects.

There is a serious amount of overlap between these projects. Not all of these projects have web APIs, while some of them are all about delivering a gateway or aggregate API across projects. There is a lot to process here, but I think listing them out provides an easier way to understand the big data explosion of projects over at Apache.

It is tough to understand what each of these does without actually playing with them, but that is something I just don’t have the time to do, so next up I’ll be doing independent searches for these project names, and finding stories from across the space regarding what folks are doing with these data solutions. That should give me enough to go on when putting them into specific buckets, and finding their place in my data and database API research.


Database To Database Then API, Instead Of Directly To API

I am working with a team to expose a database as an API. With projects like this there can be a lot of anxiety in exposing a database directly as an API. Security is the first one, but in my experience, most of the time security is just cover for anxiety about a messy backend. The group I’m working with has been managing the same database for over a decade, adding on clients, and making the magic happen via a whole bunch of databases and table kung fu. Keeping this monster up and running has been priority number one, and evolving, decentralizing, or decoupling has never quite been a priority.

The database team has learned the hard way, and they have the resources to keep things up and running, but never seem to have them when it comes to refactoring and thinking differently, let alone tackling the delivery of a web API on top of things. There will need to be a significant amount of education and training around REST, and doing APIs properly, before we can move forward, something there really isn’t a lot of time or interest in doing. To help bridge the gap I am suggesting that we do an entirely new API, with its own database, and we focus on database to database communication, since that is what the team knows. We can launch an Amazon RDS instance, with an EC2 instance running the API, and the database team can work directly with RDS (MySQL), which they are already familiar with.
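
To make the database to database piece a little more concrete, here is a rough sketch of what one of these sync jobs could look like, written in Node.js with the mysql2 package. The host names, table, and column names are just placeholders I made up for illustration, not anything from the actual project.

// A minimal database-to-database sync sketch (hypothetical hosts, table, and columns).
// Assumes the mysql2 package: npm install mysql2
const mysql = require('mysql2/promise');

async function syncContacts() {
  // Connection details are placeholders for the legacy database and the new RDS instance.
  const legacy = await mysql.createConnection({ host: 'legacy-db.internal', user: 'sync', password: process.env.LEGACY_PASS, database: 'legacy' });
  const rds = await mysql.createConnection({ host: 'new-api-db.rds.amazonaws.com', user: 'sync', password: process.env.RDS_PASS, database: 'api' });

  // Pull only the fields the new API needs, leaving the messy legacy schema behind.
  const [rows] = await legacy.execute(
    'SELECT id, name, email, updated_at FROM contacts WHERE updated_at > NOW() - INTERVAL 1 DAY'
  );

  // Upsert each row into the clean table that backs the new API.
  for (const row of rows) {
    await rds.execute(
      'INSERT INTO contacts (id, name, email, updated_at) VALUES (?, ?, ?, ?) ' +
      'ON DUPLICATE KEY UPDATE name = VALUES(name), email = VALUES(email), updated_at = VALUES(updated_at)',
      [row.id, row.name, row.email, row.updated_at]
    );
  }

  await legacy.end();
  await rds.end();
}

syncContacts().catch(console.error);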

We can have a dedicated API team handle the new API and database, and the existing team can handle the syncing from database to database. This also keeps the messy, aggregate, overworked database out of reach of the new API. We get an API. The database team’s anxiety levels are lowered. It balances things out a little. Sure, there will still be some work between databases, but the API can be a fresh start, and it won’t be burdened by the legacy. The database to database connection can carry this load. Maybe once this pilot is done, the database team will feel a little better about doing APIs, and be a little more involved with the next one.

I am going to pitch this approach in the coming weeks. I’m not sure if it will be well received, but I’m hoping it will help bridge the new to the old a little bit. I know the database team likes to keep things centralized, which is one reason they have this legacy beast, so there might be some more selling to do on that front. Doing APIs isn’t always about the technical. It is often about the politics of how things get done on the ground. Many organizations have messy databases, which they worry will make them look bad when any of it is exposed as an API. I get it, we are all self-conscious about the way our backends look. However, sometimes we still need to find ways to move things forward, and find compromise. I hope this database to database, then API approach does the trick.


Just Waiting The GraphQL Assault Out

I was reading a story on GraphQL this weekend which I won’t be linking to or citing, because that is what they want, and they do not deserve the attention. It was just (yet) another post hating on REST. As I’ve mentioned before, GraphQL’s primary strength seems to be the endless waves of bros who love to write blog posts hating on REST and web APIs. This particular post shows its absurdity by stating that HTTP is just a bad idea, wait…uh what? Yeah, you know that thing we use for the entire web, apparently it’s just not a good idea when it comes to exchanging data. Ok, buddy.

When it comes to GraphQL, I’m still watching, learning, and will continue evaluating it as a tool in my API toolbox, but when it comes to the argument of GraphQL vs. web APIs I will just be waiting out the current assault as I did with all the other haters. The linked data haters ran out of steam. The hypermedia haters ran out of steam. The GraphQL haters will also run out of steam. All of these technologies are viable tools in our API toolbox, but NONE of them are THE solution. These assaults on “what came before” are just a very tired tactic in the toolbox of startups–you hire young men, give them some cash (which doesn’t last for long), get them all wound up, and let them loose talking trash on the space, selling your warez.

GraphQL has many uses. It is not a replacement for web APIs. It is just one tool in our toolbox. If you follow the advice of any of these web API haters you will wake up in a couple of years with a significant amount of technical debt, and probably also be very busy chasing the next wave of technology being pushed by vendors. My advice is that all API providers learn about the web, gain several years of experience developing web APIs, and learn about linked data, hypermedia, GraphQL, and even gRPC if you have some high performance, high volume needs. Don’t spend much time listening to the haters, as they really don’t deserve your attention. Eventually they will go away, find another job, and find another flavor of technological kool-aid to drink.

In my opinion, there is (almost) always a grain of usefulness with each wave of technology that comes along. The trick is cutting through the bullshit, tuning out the haters, and understanding what is real and what is not when it comes to the vendor noise. You should not be adopting every trend that comes along, but you should be tuning into the conversation and learning. After you do this long enough you will begin to see the patterns and tricks used by folks trying to push their warez. Hating on whatever came before is just one of these tricks. This is why startups hire young, energetic, and usually male voices to lead this charge, as they have no sense of history, and truly believe what they are pushing. Your job as a technologist is to develop the experience necessary to know what is real and what is not, and keep a cool head as the volume gets turned up on each technological assault.


API Foraging And Wildcraft

I was in Colorado this last week at a CA internal gathering listening to my friend Erik Wilde talk about APIs. One concept he touched on was what he called API gardening, where different types of API providers approach the planting, cultivating, and maintenance of their gardens in a variety of different ways. I really like this concept, and will be working my way through the slide deck from his talk to see what else I can learn from his work.

As he was talking I was thinking about a project I had just published to Github, which leverages the Google Sheets API and the Github API to publish data as YAML, so I could publish it as a simple set of static APIs. I’d consider this approach to be more about foraging and wildcrafting than it is about tending to my API garden. My API garden just happens to be spread across a variety of services, often in free and public spaces, in many different regions. I will pay for a service when it is needed, but I’d rather tend smaller, wilder APIs that grow in existing spaces–in the cracks.

I use free services, and build upon the APIs for the services I am already using. I publish calendars to Google Calendar, then pull the data and publish APIs using the Google APIs. I use the Flickr and Instagram APIs for storing, publishing, sharing, and integrating images with my applications. I pull data from the Twitter API and the Reddit API, and store it in Google Sheets. All of this calendar, image, messaging, and link data ultimately gets published to Github as YAML, which is then shared as XML, JSON, Atom, and CSV via static APIs. I am always foraging for data using public services, then planting the seeds, and wildcrafting other APIs–in hopes something will eventually grow.
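
As an example of what this foraging looks like in practice, here is a rough sketch of pulling rows from a published Google Sheet and planting them as YAML in a Jekyll _data folder, using Node.js. The sheet id, field names, and file path are placeholders, and the exact shape of the feed JSON will vary with your spreadsheet, so treat this as a sketch rather than a recipe.

// A rough sketch of foraging data from a published Google Sheet and planting it
// in a Github repository as YAML for Jekyll to serve as a static API.
// Assumes the js-yaml package (npm install js-yaml) and a fetch-capable Node.js runtime (Node 18+).
const fs = require('fs');
const yaml = require('js-yaml');

const SHEET_ID = 'your-sheet-id-here'; // placeholder

async function sheetToYaml() {
  // The public JSON feed for a sheet that has been published to the web.
  const url = `https://spreadsheets.google.com/feeds/list/${SHEET_ID}/default/public/values?alt=json`;
  const data = await (await fetch(url)).json();

  // The feed wraps each row in an entry; which fields you pull depends on your column headers.
  const rows = data.feed.entry.map((entry) => ({
    title: entry.title ? entry.title.$t : '',
    updated: entry.updated ? entry.updated.$t : ''
  }));

  // Write the rows into a local clone of the Github repository, ready to commit and push.
  fs.writeFileSync('_data/curated.yaml', yaml.dump(rows));
}

sheetToYaml().catch(console.error);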

Right now Github is my primary jam, but since I’m just publishing static JSON, XML, YAML, and other media types, it can be hosted anywhere. Since it’s Github, it can be integrated anywhere, directly via the static API paths I generate, or by actually cloning the underlying Github repository. I have a Jekyll instance that I can run on an AWS EC2 instance or in a Docker container, and will be experimenting more with Dropbox, Google Drive, and other free or low cost storage solutions that have a public component. I’m looking to have a whole toolbox of deployment options when it comes to publishing my data and content APIs across the landscape. Most of this is low use, long tail data and content I’m using across my storytelling, so I’m not worried about things being heavily manicured, just about providing fruit in a sustained way for as low a price as I possibly can.


API Deployment Comes In Many Shapes And Sizes

Deploying an API is an interesting concept that I’ve noticed folks struggle with a little bit when I bring it up. My research into API deployment was born back in 2011 and 2012, when my readers would ask me which API management provider would help them deploy an API. How you actually deploy an API varies pretty widely from company to company. Some rely on gateways to deploy an API from an existing backend system. Some hand-craft their own API using open source frameworks like Tyk and deploy it alongside their existing web real estate. Others rely on software-as-a-service solutions like Restlet and Dreamfactory to connect to a data or content source and deploy an API in the clouds.

Many folks I talk with simply see this as developing their APIs. I prefer to break out development into define, design, deploy, and then ultimately manage. In my experience, a properly defined and designed API can be deployed into a variety of environments. The resulting OpenAPI or other definition can be used to generate the server side code necessary to deploy an API, or it can be used in a gateway solution like AWS API Gateway. For me, API deployment isn’t just about the deployment of code behind a single API, it includes the question of where we are deploying the code, acknowledging that there are many places available when it comes to deploying our API resources.
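
To ground that a little, here is a minimal sketch of the sort of server side code that might end up behind a single path from an OpenAPI definition, hand-rolled here in Node.js with Express, though most generated server stubs boil down to something similar. The path and payload are hypothetical, just there to show how thin the deployed code can be once the definition is in place.

// A minimal sketch of the code behind a single OpenAPI-defined path.
// Assumes the express package: npm install express
const express = require('express');
const app = express();

// GET /contacts/{id} from a hypothetical OpenAPI definition.
app.get('/contacts/:id', (req, res) => {
  // A real implementation would look the record up in a backend datastore.
  res.json({ id: req.params.id, name: 'Placeholder Contact' });
});

app.listen(3000, () => console.log('API listening on port 3000'));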

API deployment can be permanent, ephemeral, or maybe just a sandbox. API deployment can be in a Docker container, which by default brings with it APIs for controlling the underlying compute behind the API you are deploying. Most importantly, API deployment should be a planned, well-honed event. APIs should be able to be redeployed, load-balanced, and taken down without friction. It can be easy to settle on one way of deploying APIs, maybe using similar practices to our web deployments, or depending on a single gateway or cloud service. Ideally, API deployment comes in many shapes and sizes, and is something that can occur anywhere in the clouds, on-premise, or on-device, pushing our boundaries when it comes to which platforms we use, which programming languages we speak, and what API deployment means.
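
As a small aside on the Docker point above, here is a sketch of talking to the API that Docker itself exposes for controlling the underlying compute, using Node.js against the default local unix socket. The socket path assumes a standard local Docker install; adjust it for your own environment.

// The Docker Engine exposes an HTTP API over a local unix socket, which you can use
// to inspect the containers your API is running in.
// Assumes Docker is running with its default socket at /var/run/docker.sock.
const http = require('http');

const req = http.request(
  { socketPath: '/var/run/docker.sock', path: '/containers/json', method: 'GET' },
  (res) => {
    let body = '';
    res.on('data', (chunk) => (body += chunk));
    res.on('end', () => {
      // Each entry describes a running container, including the image and status.
      for (const container of JSON.parse(body)) {
        console.log(container.Id.substring(0, 12), container.Image, container.Status);
      }
    });
  }
);

req.on('error', console.error);
req.end();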


Open Sourcing Your API Like VersionEye

I’m always on the hunt for healthy patterns that I would like to see API providers, and API service providers consider when crafting their own strategies. It’s what I do as the API Evangelist. Find common patterns. Understand the good ones, and the bad ones. Tell stories about both, helping folks understand the possibilities, and what they should be thinking about as they plan their operations.

VersionEye, a very useful API that notifies you about security vulnerabilities, license violations, and out-dated dependencies in your Git repositories, has a nice approach to delivering their API, as well as the other components of their stack. You can either use VersionEye in the cloud, or you can deploy it on-premise.

VersionEye also has their entire stack available as Docker images, ready for deployment anywhere you need them. I wanted to have a single post that I can reference when talking about possible open source, on-premise, continuous integration approaches to delivering API solutions that actually have a sensible business model. VersionEye spans the areas that I think API providers should consider investing in, delivering SaaS or on-premise, while also delivering open source solutions, and generating sensible amounts of revenue.

Many APIs I come across do not have an open source version of their API. They may have open source SDKs and other tooling on Github, but rarely does an API provider offer up an open source copy of their API, as well as Docker images. VersionEye’s approach to operating in the cloud and on-premise, while leveraging open source and APIs, as well as dovetailing with existing continuous integration flows, is worth bookmarking. I am feeling like this is the future of API deployment and consumption, but don’t get nervous, there is still plenty of money to be made via the cloud services.


Professional API Deployment Templates

I wrote about the GSA API prototype the other day. It is an API prototype developed by the GSA, designed in alignment with the GSA API design guidelines, and complete with an API portal for delivering documentation and the other essential resources any API deployment will need. The GSA provides us with an important approach to delivering open source blueprints that other federal agencies can fork, reverse engineer, and deploy as their own custom API implementations.

We need more of this in the private sector. We need a whole buffet of APIs that do a variety of different things, in every language, platform, or stack that we can imagine. Need a contact directory API, a document storage API, or a URL shortener API? Here is a forkable, downloadable, open source solution you can put to work immediately. We need the WordPress for APIs. Not the CMS interface WordPress is known for, just a simple API that is open source, and can be easily deployed by anyone in any common hosting or serverless environment. Making the deployment of common API patterns a no-brainer, and something anyone can do anywhere in the cloud, on-premise, or on-device.

Even though these little bundles of API deployment joy would be open source, there would be a significant amount of opportunity in routing folks to add-on and premium services on top of any open source API deployment, as well as providing cloud deployment opportunities for developers–similar to the separation between WordPress.com and WordPress.org. I could see a variety of service providers emerge that could cater to the API deployment needs of various industries. Some could focus on more data specific solutions, while others could focus on content, or even more algorithmic, machine learning-focused API deployment solutions.

If linked data, hypermedia, gRPC, and GraphQL practitioners want to see more adoption of their chosen protocol, they should be publishing, evolving, and maintaining robust, reverse engineer-able, forkable, open source examples of the APIs they think companies, organizations, institutions, and government agencies should be deploying. People emulate what they know, and see out there. From what I can tell, many API developers are pretty lazy, and aren’t always looking to craft the perfect API, but if they had a working example, tailored for exactly their use case (or close enough), I bet they would put it to work. I track a lot of open source API frameworks, and I am going to see if I can find more examples like the GSA prototype out there, where providers aren’t just delivering an API, but a complete package including API design, deployment, management, and a functional portal to act as the front door for API operations.

Delivering professional API deployment templates could prove to be a pretty interesting way to develop new customers who would also be looking for other API services that target other stops along the API lifecycle like API monitoring, testing, and continuous integration and deployment. I’ll keep my eye out for open source API deployment templates that fit the professional definition I’m talking about. I will also give a little nudge to some API service providers, consulting groups, and agencies I know who are targeting the API space, and see if I can’t stimulate the creation of some professional API deployment templates for some interesting use cases.


Containerized Microservices Monitoring Driving API Infrastructure Visualizations

While I track on what is going on with visualizations generated from data, I haven’t seen much that is new and interesting when it comes to API driven visualizations, or specifically visualizations of API infrastructure. This week I came across an interesting example in a post from Netsil about mapping microservices so that you can monitor them. It is a pretty basic visualization of each database, API, and DNS element in your stack, but it does provide a solid example of visualizing not just the deployment of database and API resources, but also DNS, and the other protocols in your stack.

The Netsil microservices visualization is focused on monitoring, but I can see this type of visualization also being applied to design, deployment, management, logging, testing, and any other stop along the API lifecycle. I can see API lifecycle visualization tooling like this becoming more commonplace, and playing more of a role in making API infrastructure more observable. Visualizations are an important part of the storytelling around API operations that moves things beyond just IT and dev team monitoring, making operations observable by all stakeholders.

I’m glad to see service providers moving the needle when it comes to helping visualize API infrastructure. I’d like to see more embeddable solutions deployed to Github emerge as part of API life cycle monitoring. I’d like to see what full life cycle visualizations are possible with my partners, like deployment visualizations from the Tyk and Dreamfactory APIs, management visualizations with the 3Scale APIs, and monitoring and testing visualizations using Runscope. I’ll play around with pulling data from these providers, and publishing it to Github as YAML, which I can then easily make available as JSON or CSV for use in some basic visualizations.

If you think about it, there really should be a wealth of open source dashboard visualizations that could be embedded on any public or private Github repository, for every API service provider out there. API providers should be able to easily map out their API infrastructure using any of the API service providers they are already using to operate their APIs. Think of some of the embeddable API status pages we see out there already, and what Netsil is offering for mapping out infrastructure, but something for every stop along the API life cycle, helping deliver visualizations of API infrastructure no matter which stop you find yourself at.


Internet Connectivity As A Poster Child For How Markets Work Things Out

I have a number of friends who worship markets, and love to tell me that we should be allowing them to just work things out. They truly believe in the magical powers of markets, that they are great equalizers, and that they work out all the world’s problems each day. ALL the folks who tell me this are dudes, with 90% being white dudes. From their privileged vantage point, markets are what brings balance and truth to everything–may the best man win. Survival of the fittest. May the best product win, and all of that delusion.

From my vantage point markets work things out for business leaders. Markets do not work things out for people. Markets don’t care about people with disabilities. Markets don’t see education and healthcare any differently than they see financial products and commodities–they just work to find the most profit they possibly can. Markets work so diligently and blindly towards this goal that they will even do this to their own detriment, while believers think this is just how things should be–the markets decided.

I see Internet connectivity as a great example of markets working things out. We’ve seen the consolidation of network connections into the hands of a few cable and telco giants. These market forces are working things out by squeezing every bit of profit they can out of these networks, completely ignoring the opportunities that open up when networks operate at scale, and openly, to everyone’s benefit. Instead of paying attention to the bigger picture, these Internet gatekeepers are all about squeezing every nickel they can for every bit of bandwidth that is currently being transmitted over the network.

The markets that are working the Internet out do not care if the bits on the network are from a school, a hospital, or you playing an online game and watching videos–they just want to meter and throttle them. They may care just enough to understand where they can possibly charge more because it is a matter of life or death, or it is your child’s education, so you are willing to pay more, but as far as actually equipping our world with quality Internet–they couldn’t care less. Cable providers and telco operators are in the profit-making business, using the network that drives the Internet, even at the cost of the future–this is how short-sighted markets are.

AT&T, Verizon, and Comcast do not care about the United States remaining competitive in a global environment. They care about profits. AT&T, Verizon, and Comcast do not care about folks in rural areas possessing quality broadband to remain competitive with metropolitan areas. They care about profits. In these games, markets may work things out between big companies, deciding who wins and loses, but markets do not work things out for people who live in rural areas, or who depend on the Internet for education and healthcare. Markets do not work things out for people, they work things out for businesses, and the handful of people who operate these businesses.

So, when you tell me that I should trust that markets will work things out, you are showing me that you do not care about people. Except for that handful of business owners whose club you are hoping to someday join. Markets rarely ever work things out for average people, let alone people of color, people with disabilities, and beyond. When you tell me about the magic of markets, you are demonstrating to me that you don’t see these layers of society, which demonstrates your privilege and your lack of empathy for the humans around you, while also demonstrating how truly sad your life must be, because it is lacking in meaningful interactions with a diverse slice of the life we are living on this amazing planet.


Challenges When Aggregating Data Published Across Many Years

My partner in crime is working on a large data aggregation project regarding ed-tech funding. She is publishing data to Google Sheets, and I’m helping her develop Jekyll templates she can fork and expand using Github when it comes to publishing and telling stories around this data across her network of sites. Like API Evangelist, Hack Education runs as a network of Github repositories, with a common template across them–we call the overlap between API Evangelist and Hack Education, Contrafabulists.

One of the smaller projects she is working on as part of her ed-tech funding research involves pulling the grants made by the Gates Foundation since the 1990s. It is similar to my story a couple of weeks ago about my friend David Kernohan, who was looking to pull data from multiple sources and aggregate it into a single, workable project. Audrey is looking to pull data from a single source, but because the data spans almost 20 years, it ends up being a lot like aggregating data from across multiple sources.

A couple of the challenges she is facing while trying to gather the data and aggregate it as a common dataset are:

  • PDF - The enemy of any open data advocate is the PDF, and a portion of her research data is only available in PDF format, which translates into a good deal of manual work.
  • Search - Other portions of the data are available via the web, but obfuscated behind search forms requiring many different searches to occur, with paginated results to navigate.
  • Scraping - The lack of APIs, CSV, XML, and other machine readable results raises the bar when it comes to aggregating and normalizing data across many years, making scraping a consideration, but because of PDFs, and obfuscated HTML pages behind a search, even scraping will have significant costs.
  • Format - Even once you’ve aggregated data from across the many sources, there is a challenge with it being in different formats. Some years are broken down by topic, while others are geographically based. All of this requires a significant amount of overhead to normalize and bring into focus.
  • Manual - Ultimately Audrey has a lot of work ahead of her, manually pulling PDFs and performing searches, then copying and pasting data locally. Then she’ll have to roll up her sleeves to normalize all the data she has aggregated into a single, coherent vision of where the foundation has put its money.

Data research takes time, and is tedious, mind-numbing work. I encounter many projects like hers where I have to make a decision between scraping or manually aggregating and normalizing data–each project will have its own pros and cons. I wish I could help, but it sounds like it will end up being a significant amount of manual labor to establish a coherent set of data in Google Sheets. Once she is done, though, she has all the tools in place to publish it as YAML to Github, and get to work telling stories around the data across her work using Jekyll and Liquid. I’m also helping her make sure she has a JSON representation of each of her data projects, allowing others to build on top of her hard work.

I wish all companies, organizations, institutions, and agencies would think about how they publish their data publicly. It’s easy to think that data stewards have ill intentions when they publish data in a variety of formats like this, but more likely it is just a change of stewardship when it comes to managing and publishing the data. Different folks will have different visions of what sharing data on the web needs to look like, and different tools available to them, and without a clear strategy you’ll end up with a mosaic of published data over the years. Which is why I’m telling her story. I am hoping to possibly influence one or two data stewards, or would-be data stewards, when it comes to the importance of pausing for a moment and thinking through their strategy for standardizing how they store and publish their data online.


Each Airtable Datastore Comes With Complete API and Developer Portal

I see a lot of tools come across my desk each week, and I have to be honest, I don’t always fully get what they are and what they do. There are many reasons why I overlook interesting applications, but the most common reason is that I’m too busy and do not have the time to fully play with a solution. One application I’ve been keeping an eye on as part of my work is Airtable, which, I have to be honest, I didn’t really get at first, or rather I just didn’t notice because I was too busy.

Airtable is part spreadsheet, part database, operating as a simple, easy to use web application, from which, with the push of a button, you can publish an API. You don’t just get an API by default with each Airtable; you also get a pretty robust developer portal for your API, complete with good looking API documentation, allowing you to go from an Airtable (spreadsheet / database) to API and documentation–no coding necessary. Trust me. Try it out; anyone can create an Airtable and publish an API that any developer can visit and quickly understand what is going on.

As a developer, API deployment still feels like it can be a lot of work. Then, once I take off my programmer’s hat and put on my business user hat, I see that there are some very easy to use solutions like Airtable available to me. Knowing how to code is almost slowing me down when it comes to API deployment. Sure, the APIs that Airtable publishes aren’t the perfectly designed, artisanally crafted APIs I make with my bare hands, but they work just as well as mine. Most importantly, they get business done. No coding necessary. Something that anyone can do without the burden of programming.

Airtable provides me another solution that I can recommend my readers and clients consider using when managing their data, which will also allow them to easily deploy an API for developers to build applications against. I also notice that Airtable has a whole API integration part of their platform, which allows you to integrate your Airtables with other APIs–something I will have to write about separately in a future post. I just wanted to make sure and take the time to properly add Airtable to my research, and write a story about them so that they are in my brain, available for recall when people ask me for easy to use solutions that will help them deploy an API.
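
To show just how little code it takes to consume one of these Airtable APIs, here is a rough sketch in Node.js. The base id, table, and field names are placeholders, and the endpoint and authentication details reflect my reading of the developer portal Airtable generates, so check your own base’s documentation for the exact shape.

// A rough sketch of programming against the API published for an Airtable base.
// Assumes a fetch-capable Node.js runtime (Node 18+); base id and table name are placeholders.
const AIRTABLE_API_KEY = process.env.AIRTABLE_API_KEY;
const BASE_ID = 'appXXXXXXXXXXXXXX'; // placeholder base id
const TABLE = 'Contacts'; // placeholder table name

async function listRecords() {
  const response = await fetch(`https://api.airtable.com/v0/${BASE_ID}/${encodeURIComponent(TABLE)}`, {
    headers: { Authorization: `Bearer ${AIRTABLE_API_KEY}` }
  });
  const data = await response.json();

  // Each record carries an id and a fields object keyed by your column names.
  for (const record of data.records) {
    console.log(record.id, record.fields);
  }
}

listRecords().catch(console.error);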


When You Publish A Google Sheet To The Web It Also Becomes An API

When you take any Google Sheet and choose to publish it to the web, you immediately get an API. Well, you get the HTML representation of the spreadsheet (shared with the web), and if you know the right way to ask, you can also get the JSON representation of the spreadsheet–which gives you an interface you can program against in any application.

The articles I curate, and the companies, institutions, organizations, government agencies, and everything else I track on, all live in Google Sheets that are published to the web in this way. When you are viewing any Google Sheet in your browser you are viewing it using a URL like:

https://docs.google.com/spreadsheets/d/[sheet_id]/edit

Of course, [sheet_id] is replaced with the actual id for your sheet, but the URL demonstrates what you will see. Once you publish your Google Sheet to the web you are given a slight variation on that URL:

https://docs.google.com/spreadsheets/d/[sheet_id]/pubhtml

This is the URL you will share with the public, allowing them to view the data you have in your spreadsheet in their browsers. In order to get at a JSON representation of the data you just need to learn the right way to craft the URL using the same sheet id:

https://spreadsheets.google.com/feeds/list/[sheet_id]/default/public/values?alt=json

Ok, one thing I have to come clean on is that the JSON available for each Google Sheet is not the most intuitive JSON you will come across, but once you learn what is going on you can easily consume the data within a spreadsheet using any programming language. Personally, I use a JavaScript library called tabletop.js that quickly helps you make sense of a spreadsheet and get to work using the data in any (JavaScript) application.
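
Since I mentioned tabletop.js, here is a quick sketch of what putting it to work looks like, based on my reading of the library’s documentation, so double check the current docs before leaning on it. The sheet id is a placeholder, and simpleSheet just flattens a single-sheet workbook into an array of row objects keyed by your column headers.

// A quick sketch of using tabletop.js to make sense of a published Google Sheet.
// Assumes the tabletop package: npm install tabletop
const Tabletop = require('tabletop');

Tabletop.init({
  key: 'your-sheet-id-or-published-url-here', // placeholder
  simpleSheet: true, // return rows as a simple array of objects keyed by column header
  callback: (rows) => {
    // Each row comes back as an object, so you can program against it like any other API response.
    rows.forEach((row) => console.log(row));
  }
});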

The fastest, lowest cost way to deploy an API is to put some data in a Google Sheet, and hit publish to the web. Ok, it’s not a full-blown API, it’s just JSON available at a public URL, but it does provide an interface you can program against when developing an application. I take all the data I have in spreadsheets and publish it to Github as YAML, and then make static APIs available using that YAML in XML, CSV, JSON, Atom, or any other format that I need. This takes the load off of Google, creating a cached version at any point in time that runs on Github, in a versioned repository that anyone can fork, or integrate into any workflow.


Being First With Any Technology Trend Is Hard

I first wrote about Iron.io back in 2012. They are an API-first company, and they were the first serverless platform. I’ve known the team since they first reached out back in 2011, and I consider them one of my poster children for why there is more to all of this than just the technology. Iron.io gets the technology side of API deployment, and they saw the need for enabling developers to go serverless, running small scalable scripts in the cloud, and offloading the backend worries to someone who knows what they are doing.

Iron.io is what I’d consider to be a pretty balanced startup, slowly growing, and taking the sensible amounts of funding they needed to grow their business. The primary area where I would say Iron.io has fallen short is storytelling about what they are up to, and generally playing the role of a shiny startup everyone should pay attention to. They are great storytellers, but unfortunately the frequency and amplification of their stories has fallen short, allowing other strong players to fill the void–opening the door for Amazon to take the lion’s share of the conversation when it comes to serverless. It demonstrates that you can rock the technology side of things, but if you don’t also rock the storytelling and more theatrical side of things, there is a good chance you will come in second.

Storytelling is key to all of this. I always love the folks who push back on me saying that nobody cares about these stories, that the markets only care about successful, strong companies–when in reality, IT IS ALL ABOUT STORYTELLING! Amazon’s platform machine is good at storytelling. Not just their serverless group, but the entire platform. They blog, tweet, publish press releases, whisper in reporters’ ears, buy entire newspapers, publish science fiction patents, conduct road shows, and host flagship conferences. Each AWS platform team can tap into this, participate, and benefit from the momentum, helping them dominate the conversation around their particular technical niche.

Being first with any technology trend will always be hard, but it will be even harder if you do not consistently tell stories about what you are doing, and what those who are using your platform are doing with it. Iron.io has been rocking it for five years now, and they are continuing to define what serverless is all about. They just need to turn up the volume a little bit, and keep doing what they are doing. I’ll own a portion of this story, as I probably didn’t do my share to tell more stories about what they are up to, which would have helped amplify their work over the years–something I’m working to correct with a little storytelling here on API Evangelist.


If you think there is a link I should have listed here feel free to tweet it at me, or submit it as a Github issue. Even though I do this full time, I'm still a one-person show, and I miss quite a bit, and depend on my network to help me know what is going on.