rss-bridge 2026-02-27T09:19:00+00:00

Presentation: Platforms for Secure API Connectivity With Architecture as Code

Jim Gough discusses his transition from accidental architect to API program leader and explains how to manage the complexity of secure API connectivity. He introduces the Common Architecture Language Model (CALM), a framework designed to bridge the developer-security gap. By leveraging architecture patterns, his team has worked to cut review cycles of three to six months down toward two-hour automated deployments. By Jim Gough

---

Bio

James Gough is a Distinguished Engineer and API Platform Lead Architect at Morgan Stanley, where he works on API strategy, security, and developer experience. A Java Champion, author, and conference speaker, Jim has contributed to the Java Community Process, co-authored Mastering API Architecture and Optimizing Cloud Native Java (O'Reilly), and leads open-source initiatives within FINOS.

About the conference

Software is changing the world. QCon London empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Jim Gough: Let's get into what we're here to do, which is talk about APIs. Specifically, I want to walk through some experiences I've had working at Morgan Stanley. These range from evolving API connectivity over time to thinking about secure design approaches from the outset, and how we work with developers on that. Then I'll look at what I call architecture patterns. This is a new idea, so I'll walk you through that as well. We're going to do some live demos, and then I'll talk about a potential workflow before we wrap up.

Who Am I?

I'm Jim. I work at Morgan Stanley. I'm a Java Champion, so I've done lots of stuff with open source. If you've used JSR 310, the Date and Time API in Java, I was heavily involved in that. I've written a couple of books. I'm also the developer and architect of Morgan Stanley's API program. Why do I say developer and architect? It's deliberately ambiguous. When I started, I was the only developer on Morgan Stanley's API program, and I've become a bit of an accidental architect. As complexity has increased and we have offered more services across the organization, I've become responsible for more things, and that's what I want to talk about now.

The Complexity of API Connectivity

The complexity of API connectivity. I think building APIs is so simple. Caveat: it's not. With no security, if you've got a consumer and an API service, you can pretty much get that up and running on your laptop in two or three minutes with some modern frameworks. Then authentication and authorization come in, and you need a way to model them. If your users are outside the organization, or they need to cross trust boundaries, it becomes a really big deal. You have to start thinking about how you're going to make that consistent, regardless of what the API services look like in the background.
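To make the authentication and authorization step concrete, here is a minimal sketch of the check an API service or gateway ends up doing. It assumes a hypothetical in-memory token table and a hypothetical scope name (`attendees:read`). A real deployment would validate a signed token or ask the authorization server instead. The distinction that matters is the status code. A 401 means the caller can't be identified. A 403 means the caller is identified but isn't allowed to do this.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Principal:
    subject: str
    scopes: frozenset = field(default_factory=frozenset)


# Hypothetical token table. A real service would verify a signed token (e.g. a JWT)
# or ask the authorization server, not look tokens up in a dict.
TOKENS = {
    "token-abc": Principal("conference-website", frozenset({"attendees:read"})),
    "token-xyz": Principal("advertising-service"),
}


def check_access(headers: dict, required_scope: str) -> int:
    """Return 401 if the caller can't be authenticated, 403 if it is
    authenticated but lacks the scope, and 200 otherwise."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    principal = TOKENS.get(token) if scheme.lower() == "bearer" else None
    if principal is None:
        return 401  # authentication failed: we don't know who this is
    if required_scope not in principal.scopes:
        return 403  # authorization failed: we know who, but they may not do this
    return 200


print(check_access({"Authorization": "Bearer token-abc"}, "attendees:read"))  # 200
print(check_access({"Authorization": "Bearer token-xyz"}, "attendees:read"))  # 403
print(check_access({}, "attendees:read"))                                     # 401
```

Keeping this logic consistent across every service, rather than reimplementing it per API, is the problem that tends to push teams toward API management.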

The first thing you're probably going to say is, API management: there's a tool that does that for us. This introduces some complexity. As an operator, you have to go in and potentially set up some policies, and you've got to be very careful which policies you set up there. API management can do lots of things, which is great, until you want to move it somewhere else. Then it's maybe not so great. This is pretty much the beginning of any journey. You can be profitable just by getting this model correct. It's quite simple.

Typically, as you want to scale and grow, you're going to add some element of automated deployment. Going into that user interface and configuring things seemed easy at the time. Automating it now means you have to learn a whole series of APIs, and those APIs will probably not be available to you in your dev environment. You might use tools like WireMock to mock out these dependencies, which makes things a little bit more complex. We also have to manage the deployment of the API service in the background. Typically you need to make sure these things are coordinated and released at the same time, so that there's zero downtime and zero impact in terms of the specification.

The other thing that will probably happen along the way is that you don't want your API management doing things like service discovery and composition of services. You're probably going to end up deploying some kind of gateway. It might be security-related, or it might just be part of a microservices-based architecture that gives you routing to different endpoints downstream. We're going up that complexity scale quite quickly. Then there are the other things that get in the way: non-functional requirements. These are super important. Think about things like security from the outset. I'm going to talk a bit more about security as we go through the presentation. What's your performance like?

Often, in terms of performance, we think about what is acceptable and about service-level agreements. What I've found working with this is that my internal users' expectations are usually misaligned with the performance characteristics REST APIs will actually have. Even just being clear when communicating that is important. We also need to look at availability. If we were to lose some of our infrastructure, can we still operate in a way that serves our business? Observability is absolutely critical. For example, I was looking at some observability traces to try to figure out what was going wrong with something in production, and it was very quick to narrow down. Then you want things like audit logs, to make sure that you're compliant and that you can see what has happened in your system.
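One way to be clear when communicating performance is to agree on a percentile and a threshold rather than an average. The sketch below uses the nearest-rank percentile method. The latency samples and the 20 ms and 200 ms thresholds are made up for illustration.

```python
import math


def percentile(samples_ms: list, pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of
    samples less than or equal to it."""
    if not samples_ms or not 0 < pct <= 100:
        raise ValueError("need samples and 0 < pct <= 100")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_slo(samples_ms: list, pct: float, threshold_ms: float) -> bool:
    """True if the pct-th percentile latency is within the threshold."""
    return percentile(samples_ms, pct) <= threshold_ms


# Hypothetical latencies for ten calls to GET /attendees, in milliseconds.
samples = [12, 15, 11, 14, 13, 250, 16, 12, 14, 13]

print(sum(samples) / len(samples))   # 37.0
print(percentile(samples, 50))       # 13
print(meets_slo(samples, 90, 20))    # True  (p90 = 16 ms)
print(meets_slo(samples, 99, 200))   # False (p99 = 250 ms)
```

The mean of 37 ms describes no real request. Half of the calls finish in 13 ms or less, while the slowest takes 250 ms. That is why a target like "p90 under 20 ms" communicates more than an average does.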

What general compliance means depends on where you operate and your operating environment. I'll give some hints around that as well. Cost is important too. It's very easy to spin things up, forget that they're running, and suddenly have a massive bill at the end of the year. Ultimately, you want to be able to grow and maintain all of this as your journey progresses.

These supposedly non-functional requirements actually become critical operating requirements. In the work that I do, if I don't have all of these things, I can't run a successful API program. We'd be constantly doing maintenance, hygiene, and the other work required just to get to what good looks like. My complexity goes up again, and there are a lot more things for us to think about. The question I would ask is about the idea that we'll just shift it left. There is a gap between developers and infrastructure. If you shift too far left, you say to developers, here are the things you can do to roll out your API, and here are some abstractions that make things easy.

Then they go, "I don't really know what I'm doing," or "This is too much YAML, too much infrastructure as code." You've shifted too far left without support. As a developer, I quite like getting all the shiny tools, but sometimes operating them is more complex than it needs to be. Then there's the term undifferentiated heavy lifting. If you spend a lot of your time working on infrastructure and plumbing, it distracts from what you're trying to do. In our case, that's delivering products and impacting users in a positive way. This developer-infrastructure gap is something else we need to navigate as we build our systems.

The other one that's really quite prevalent is the developer-security gap. To me, there is often a knowledge and language gap between security professionals and developers. Security concepts are often unfamiliar. When we do a workshop or follow a demo off the internet, we're probably not thinking about the cross-site scripting attack that may hit us later down the line. Yet there is an expectation that developers build secure systems. Stefania gave a really good talk. One of her points was that we don't know how to use the tools and we don't feel we have enough training, even though training exists. There's a mismatch in what people actually get trained on. Ultimately, security ends up becoming a gate, not a guide. If you bring security in right at the end, just before your release, their job in that situation is to say no. That's not what you want to hear when you're trying to release the latest and greatest product.

Secure Design Approaches

Things are really getting quite complicated at this point. I've worked at Morgan Stanley for coming up to eight years. A functional requirement has come in that affects all of the APIs we run today, and we run billions of requests through our platform: we need to change the architecture. When we change the architecture, there are reviews and processes we have to be compliant with. That's part of changing architecture in a regulated industry. So I started to think about how I could get secure design approaches into the way we model systems.

The idea here is to make things a little easier for the developers who use my platform, by trying to answer some of those security questions up front. I've got an example here that I'm going to use in my demo. One of the things we'll be working with in this session is C4. I'm a really big fan of Simon Brown and the C4 model. You can see here that we've got a conference website. It goes through a load balancer to an attendees service in the background, which holds all your details.

Then you've got the attendees store behind that. Somebody has decided to come along and create an advertising service. That's really cool. It's going to look at things in the attendees service and make decisions about what to send you after the conference, maybe to try to get you interested in one of the other upcoming conferences. I needed a way to discuss this with security professionals. (I'm talking in abstract terms now, because we weren't building this; it's just my fictitious system.) What I decided to have a go at, and I must stress this was having a go, was a tool called threat modeling.

If you're involved in the security space, you will have come across this, but I want to explain it the way I think about it. As developers, and possibly even as architects, we think about the happy path: the things that are going to work and how we expect people to use our systems. Here, we are accessing the attendee details for the purpose of advertising, and we're going to filter based on whether the user has accepted the terms and conditions. All sounds good so far. That's not how an attacker thinks. Attackers think quite differently. How can I get hold of the personally identifiable information in your system, and how much money could I make if it were leaked somewhere else? You probably already think like this in your day-to-day life. It's what's called operational security.

An example would be, I quite typically walk with my wallet in my back pocket, but if I'm walking through Covent Garden, I'll often shift it into my left pocket or into my backpack because it's less likely to get lifted from there. I will warn you that working on these things does make you a little bit paranoid, so just be a bit careful. That's a good thing. That's the type of challenge that we want to have.

There is a model called STRIDE. I think STRIDE is really good at helping you categorize threats and ask questions about your architecture. It's an acronym, and spoofing is the first one. Say I attempt to impersonate the advertising service. If I can do that, I can access data and abuse trust. What's the possible mitigation? I could use mTLS and do service identity validation, maybe using SPIFFE, plus certificate pinning, so there are things I could do to mitigate it. If something isn't mitigated, that's called a residual risk. For a residual risk, I would need to apply a different model to talk about the likelihood of it happening. This came up at the security table: you will have some residual risks, but how do you categorize them in terms of how bad they would be? Tampering is another one. Imagine if I could go in and manipulate the request and the response.

Perhaps I could serve up one of my own adverts if I could get in there and tamper with the responses. That might mean I could redirect you. You think you're going to QCon San Francisco, but really you're just contributing to Sam Newman's PayPal account or something similar. Next is repudiation: if something did happen, how can we prove that it happened? Keeping our logs secure, and asking whether somebody could go in and erase that information, is really quite critical. Without that, you don't know what's happening in your system. Some of the worst attacks documented in our wider community are ones where people sat undetected for long periods of time, figuring out exactly what they wanted to do. Nobody could see it in the logs because the entries were being erased over time. This can also happen with what's called an insider threat, where somebody inside your organization has a little more trust than they should have.
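One common technique for making log erasure detectable is a hash chain. Each entry stores the hash of its predecessor, so deleting or editing an entry breaks every link after it. This is a minimal sketch using SHA-256 from the Python standard library. The event fields are hypothetical, and the sketch is not a description of how Morgan Stanley secures its logs.

```python
import hashlib
import json

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes the same way.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list, event: dict) -> None:
    """Add an event that is chained to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": _digest(prev, event)})


def verify(log: list) -> bool:
    """True only if every entry links to its predecessor and its hash matches."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"actor": "advertising-service", "action": "GET /attendees"})
append(log, {"actor": "jim", "action": "POST /attendees"})
append(log, {"actor": "ops", "action": "scale attendees-service"})

print(verify(log))                # True
print(verify(log[:1] + log[2:]))  # False: the middle entry was erased
```

A chain alone doesn't catch truncation from the end: `verify(log[:2])` still passes. Someone with write access could also recompute the whole chain. In practice, the latest hash is therefore anchored somewhere the attacker can't write, such as separate write-once storage.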

Information disclosure is the one we're going to play around with. It's the idea of exfiltrating data, possibly through social engineering. Mitigations include TLS and strong encryption, and encrypting data at rest as well if that's necessary. Next is denial of service. In our architecture, you might have noticed that the attendees service was internal, while the public-facing conference website was coming in from the left-hand side. If we can overload the attendees service, we can maybe bring down somebody's website, so denial of service can come from different angles. Often we talk about accidental denial of service, or lots of requests, or just somebody hitting us from the front.

But if you can hit something internally and make it fail closed, you can disrupt an entire organization's website. The final one is elevation of privilege. Are there roles or configuration that give me access to other systems? Can I go in and have a look at things? In this example, we're going to pick on a couple of things: lateral movement to other services, and the use of something called micro-segmentation. If folks are interested, I'm big into the micro-segmentation, zero-trust approach to architecture at the moment. It's changing the way we think about design. Things are really complicated, so how can we approach this? There are ways we can make it a bit easier and more digestible.
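The STRIDE walk-through can be captured as a simple threat register. This sketch flags residual risks, meaning threats with no mitigation, and ranks them by a likelihood × impact score. The threats are hypothetical ones drawn from the conference example, and the 1 to 5 likelihood and impact ratings are made up. The likelihood × impact scoring is a common convention assumed here. It is not necessarily the model used at the security table.

```python
from dataclasses import dataclass, field

STRIDE = (
    "Spoofing",
    "Tampering",
    "Repudiation",
    "Information disclosure",
    "Denial of service",
    "Elevation of privilege",
)


@dataclass
class Threat:
    category: str      # one of the six STRIDE categories
    description: str
    likelihood: int    # 1 (rare) .. 5 (almost certain), a team judgment
    impact: int        # 1 (negligible) .. 5 (severe), a team judgment
    mitigations: list = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.category not in STRIDE:
            raise ValueError(f"unknown STRIDE category: {self.category}")

    @property
    def score(self) -> int:
        return self.likelihood * self.impact


def residual_risks(threats: list) -> list:
    """Unmitigated threats as (description, score), highest score first."""
    unmitigated = [t for t in threats if not t.mitigations]
    return [(t.description, t.score)
            for t in sorted(unmitigated, key=lambda t: t.score, reverse=True)]


# Hypothetical register for the conference system; ratings are illustrative only.
threats = [
    Threat("Spoofing", "Impersonate the advertising service", 3, 4,
           ["mTLS", "SPIFFE workload identity"]),
    Threat("Tampering", "Rewrite advert responses to redirect attendees", 2, 4),
    Threat("Repudiation", "Erase audit log entries", 2, 5, ["tamper-evident logs"]),
    Threat("Information disclosure", "Exfiltrate attendee PII", 3, 5,
           ["TLS", "encryption at rest"]),
    Threat("Denial of service", "Overload the internal attendees service", 3, 3),
    Threat("Elevation of privilege", "Move laterally to the attendees store", 4, 5),
]

for description, score in residual_risks(threats):
    print(f"{score:>2}  {description}")
```

With these ratings, lateral movement (score 20) comes out on top. That matches where the demo goes next.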

This is the first opportunity for something to go wrong. Let's have a look at an insecure demo. As a developer, I've read all of the guidance and looked at the documentation, and I'm going to try to implement this in a way that works on Kubernetes. I started something up before the talk. Let's look at what's deployed on my Kubernetes cluster inside the conference namespace. I've got an attendees service up and running, and an attendees store running in the background as well. Those restarted just a couple of minutes ago. We've also got the load balancer for the attendees service. You can see that I've got it routing to localhost, so that we can test the website on the other side.

Let's have a little look at what this looks like. My UI skills are terrible, and I haven't built a conference website, so we'll just use the Swagger editor. Let's see where my skills lie. First, let's try out the GET request. We get an empty array back. Now we'll use the POST request, try it out, and put in jim, g. There's not very good validation on this. That's another security problem, I think, but we'll skip over it. The POST request succeeded: I've got a 201 back in the Swagger editor. If I do a GET request, the new attendee appears. Everything is looking good from a functional perspective. I've got my application deployed. All is really cool.
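The missing validation on the POST is worth fixing at the service boundary. Below is a minimal sketch that assumes a hypothetical request body with `name` and `email` fields, because the demo's actual schema isn't shown. It also rejects unexpected fields, so callers can't smuggle extra data into the store.

```python
import re
from dataclasses import dataclass

# Deliberately loose email shape check (something@something.tld), not RFC 5322.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
ALLOWED_FIELDS = {"name", "email"}  # hypothetical schema for POST /attendees


@dataclass(frozen=True)
class Attendee:
    name: str
    email: str


def parse_attendee(body: dict) -> Attendee:
    """Validate a POST /attendees body; raise ValueError listing every problem."""
    errors = []
    name, email = body.get("name"), body.get("email")
    if not isinstance(name, str) or not name.strip() or len(name) > 100:
        errors.append("name must be a non-empty string of at most 100 characters")
    if not isinstance(email, str) or not EMAIL.match(email):
        errors.append("email must look like user@example.com")
    unknown = set(body) - ALLOWED_FIELDS
    if unknown:
        errors.append(f"unexpected fields: {sorted(unknown)}")
    if errors:
        raise ValueError("; ".join(errors))
    return Attendee(name.strip(), email)


print(parse_attendee({"name": "Jim", "email": "jim@example.com"}))
try:
    parse_attendee({"name": "jim", "email": "g"})
except ValueError as err:
    print(f"rejected: {err}")
```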

The next thing I'm going to do is run a container in the cluster. This simulates an elevation of privilege: I've managed to get a container running inside that Kubernetes cluster. I'm going to try to curl the attendees-service like this. It's secure; I can't reach the attendees service. All is good, or is it? I've seen people make exactly this assumption. The name doesn't resolve because this container is running in the default namespace, while the service lives in the conference namespace. Instead of using the service name, I could grab hold of the IP address directly. All of a sudden, what looked like a fairly secure system gives me this lateral movement effect.
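Why did the service name fail while the IP worked? With the default pod DNS configuration, a pod's resolver expands short names using search domains built from the pod's own namespace. It tries `<namespace>.svc.cluster.local`, then `svc.cluster.local`, then `cluster.local`, with `ndots:5`. This sketch models that expansion, assuming the default `cluster.local` cluster domain. It shows that namespace scoping of service names is a naming convenience, not access control. A raw pod IP never goes through DNS, so only a network policy enforced by the cluster's network plugin can block that path.

```python
def dns_candidates(name: str, caller_namespace: str,
                   cluster_domain: str = "cluster.local", ndots: int = 5) -> list:
    """Names a pod's resolver would try, in order, for `name`, assuming the
    default Kubernetes pod DNS config (search domains plus ndots:5)."""
    if name.endswith("."):
        return [name.rstrip(".")]  # already fully qualified: no search list
    search = [f"{caller_namespace}.svc.{cluster_domain}",
              f"svc.{cluster_domain}",
              cluster_domain]
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") < ndots:
        return expanded + [name]   # search list first, bare name last
    return [name] + expanded       # enough dots: try the name as-is first


SERVICE_FQDN = "attendees-service.conference.svc.cluster.local"

# From a pod in "default", the short name never expands to the service's FQDN...
print(SERVICE_FQDN in dns_candidates("attendees-service", "default"))             # False
# ...but a namespace-qualified name does, and a raw pod IP skips DNS entirely.
print(SERVICE_FQDN in dns_candidates("attendees-service.conference", "default"))  # True
```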

As a developer, I think you could be forgiven for not realizing this was going to happen behind the scenes when you built an application that looks secure, especially given the complexity of things like Kubernetes. You've tested it on the happy path. Then somebody has come along, tried something a bit different, and exposed a vulnerability. There are probably other vulnerabilities too; I think it's very broken. It was easy for me to create something custom in this situation, but I haven't created it in the right way. As a developer, it's been really easy for me to do the wrong thing, and what I want is to make it really easy for developers to do the right thing. There are lots of repeated infrastructure steps I've been going through to get this up and running. We're going to quit the demo there, go back to the slides, and start talking about a potential framework for resolving these issues.

Architecture Patterns in Action

A couple of years ago now, I was told that we were going to have to change the way we structured our architecture internally. My big concern was having to go through 80, 90, 100 security reviews. I'd have to explain to all the teams that work with me what that meant for their API services, and work out how we were going to meet the business deadline of going live in production. Quite scary. What was probably quite scary for my boss was when I said, "I'm trying to solve this problem. I don't see anything that exactly solves it, but I see this open-source project that looks cool. Can I spend 70% of my time working on this open-source project to try and solve this problem? Obviously, there's no risk in that."

The surprising answer was, "How are you going to fit the other 70% of your job in?" It's all about bringing in something that will work, at least from my perspective, but that also isn't absolutely critical and core to the way we operate. This is where I got involved with a couple of folks across different financial institutions in FINOS, the Fintech Open Source Foundation, to start working on something called CALM, the Common Architecture Language Model. I'm going to use it as an example. I'm also going to talk about the different approaches we've taken within this project, so you get an idea of what it's for and why we've done things the way we have. It has a core model, and that core model is written in JSON Schema.

If you're familiar with OpenAPI, or with most tools these days, they use a form of JSON Schema. The JSON Schema folks have been absolutely brilliant in their conversations and observations about what we're doing with CALM, and in helping us out when we've got stuck. I also wanted it to be relatively easy to use, so it needed some kind of command line interface, or CLI. Then I wanted patterns: something I can show my developers that they can work with, with a clear line of approach for what they're doing.

Daniel said that everything is boxes and arrows, and that's really important. That's good, because the next few slides are all about how we represent boxes and arrows. We call the boxes nodes. It was one of those things where naming is the hardest part. I wasn't involved in that conversation, because I think it could have got quite philosophical. If we look at our C4 diagram, the conference website, load balancer, attendees service, and store are all going to be represented by nodes in my CALM architecture model. The Kubernetes cluster is also a node, and that's probably a special case; we'll talk more about that. The base schema is quite flexible. We had a great talk from John about standardizing the details within a specification for a messaging system, which makes total sense. In CALM, we've tried to provide points of extension where companies and open-source contributors can add the complexity of their own domain.

Here's an example with the minimum you need to have. You need a unique ID. Unique IDs are used later on, and I'll talk about where they come in. Then you need the node's type, its name, and its description. That captures some very basic information about the box: the same kind of information you'd have at the top level of your C4 diagram, and not much more. Relationships are all about the arrows, and the dotted box around the components. You can see here that I've got a unique ID again. That's consistent across everything we do.
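Here is a minimal sketch of a node-level check, assuming nodes are JSON objects with the four fields listed above. The kebab-case key names and the `node-type` values follow CALM's style as I understand it. Treat them as assumptions and check them against the published CALM JSON Schema. The node list mirrors the conference example.

```python
REQUIRED_NODE_FIELDS = ("unique-id", "node-type", "name", "description")


def check_nodes(nodes: list) -> list:
    """Report missing required fields and duplicate unique IDs."""
    problems = []
    seen = set()
    for i, node in enumerate(nodes):
        for key in REQUIRED_NODE_FIELDS:
            if not node.get(key):
                problems.append(f"node[{i}] missing '{key}'")
        uid = node.get("unique-id")
        if uid in seen:
            problems.append(f"duplicate unique-id '{uid}'")
        elif uid:
            seen.add(uid)
    return problems


# Illustrative nodes for the conference system (field values are hypothetical).
nodes = [
    {"unique-id": "conference-website", "node-type": "webclient",
     "name": "Conference Website", "description": "Public site for attendees"},
    {"unique-id": "load-balancer", "node-type": "network",
     "name": "Load Balancer", "description": "Routes traffic to the attendees service"},
    {"unique-id": "attendees", "node-type": "service",
     "name": "Attendees Service", "description": "Stores and serves attendee details"},
    {"unique-id": "attendees-store", "node-type": "database",
     "name": "Attendees Store", "description": "Persists attendee details"},
    {"unique-id": "k8s-cluster", "node-type": "system",
     "name": "Kubernetes Cluster", "description": "Hosts the conference services"},
]

print(check_nodes(nodes))  # []
```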

Here we have the website-load-balancer connection, which connects the conference website to the load balancer. This is why node naming is so important. We've got some tooling, which I'll show you in the demo, that looks at the nodes you've named in your relationships and makes sure they exist. So validation happens not just in terms of JSON Schema compliance, but also in terms of whether this is a valid semantic model. We actually use Spectral to analyze the JSON documents; some of you may have come across it in the OpenAPI world. The other type is the containment relationship. Here we can see that the Kubernetes cluster node contains the load balancer, the application, and the Postgres database. That containment is also modeled by linking back to each node's unique ID.
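The semantic check described above, that every node a relationship names must exist, can be sketched in a few lines. CALM's own tooling does this with Spectral rules. This stand-in uses plain Python instead. The `connects` and `deployed-in` key layout is an assumption modeled on CALM's relationship types, not copied from its schema.

```python
def dangling_references(architecture: dict) -> list:
    """Return one message per relationship endpoint that names an unknown node."""
    known = {node["unique-id"] for node in architecture.get("nodes", [])}
    problems = []
    for rel in architecture.get("relationships", []):
        kind = rel.get("relationship-type", {})
        referenced = []
        if "connects" in kind:  # an arrow between two boxes
            referenced += [kind["connects"]["source"]["node"],
                           kind["connects"]["destination"]["node"]]
        if "deployed-in" in kind:  # the dotted box around components
            referenced += [kind["deployed-in"]["container"], *kind["deployed-in"]["nodes"]]
        problems += [f"{rel['unique-id']}: unknown node '{node_id}'"
                     for node_id in referenced if node_id not in known]
    return problems


architecture = {
    "nodes": [{"unique-id": uid} for uid in (
        "conference-website", "load-balancer", "attendees", "attendees-store", "k8s-cluster")],
    "relationships": [
        {"unique-id": "website-load-balancer",
         "relationship-type": {"connects": {
             "source": {"node": "conference-website"},
             "destination": {"node": "load-balancer"}}}},
        {"unique-id": "deployed-in-k8s-cluster",
         "relationship-type": {"deployed-in": {
             "container": "k8s-cluster",
             "nodes": ["load-balancer", "attendees", "attendees-store"]}}},
    ],
}

print(dangling_references(architecture))  # []

# A typo in a relationship is caught even though the JSON is structurally valid.
typo = {"unique-id": "attendees-store-link",
        "relationship-type": {"connects": {
            "source": {"node": "attendees"},
            "destination": {"node": "atendees-store"}}}}
broken = {**architecture, "relationships": architecture["relationships"] + [typo]}
print(dangling_references(broken))  # ["attendees-store-link: unknown node 'atendees-store'"]
```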

Nodes and arrows are great, but they don't give me the detail I want about how something interacts with a given system, and there could be different options. For example, your attendees service might be connected to from the website, but also from the advertising service over gRPC. The gRPC interface here is hypothetical. One of the things we're trying to do at the moment is figure out what should go on these interfaces to add the detail and clarity you're after. In this case, the gRPC interface has a service name and an optional reference to your proto definition, so that people pulling documentation later can see which types they can use. It also has optional methods.

On the other hand, the path interface captures the path you're connecting to and, optionally, an OpenAPI specification. The stuff we're working on here is really quite brand new. The things I'm thinking about are: how do overlays fit into this, and how do other things coming out of the OpenAPI community join in with what we're doing? Again, we're not trying to reinvent the wheel here. We're trying to create some glue that sits between the concepts we have in industry. I've seen a couple of slides on buy versus build, and my usual answer is that it's probably a bit of both. You do need something to glue those things together, which is one place where CALM could come in. That's the general flow of things. The interfaces can be selected when you create your architecture, and I'll talk about some of the ways you can do that.
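These interface definitions are still being worked out, so the following is only a hypothetical sketch of the two shapes described. One is a gRPC interface with a service name, an optional proto reference, and optional methods. The other is a path interface with an optional OpenAPI reference. The class names, field names, and file names are invented for illustration. The sketch also shows a relationship selecting one of the interfaces a node exposes.

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class GrpcInterface:
    unique_id: str
    service_name: str
    proto_ref: Optional[str] = None                  # optional link to the .proto definition
    methods: list = field(default_factory=list)      # optional list of RPC method names


@dataclass
class PathInterface:
    unique_id: str
    path: str
    openapi_ref: Optional[str] = None                # optional link to an OpenAPI document


Interface = Union[GrpcInterface, PathInterface]


def select_interface(exposed: list, interface_id: str) -> Interface:
    """Resolve the interface a relationship chooses on its destination node."""
    for iface in exposed:
        if iface.unique_id == interface_id:
            return iface
    raise KeyError(f"node does not expose interface '{interface_id}'")


# The attendees service exposes both shapes: HTTP for the website, gRPC for advertising.
attendees_interfaces = [
    PathInterface("attendees-http", "/attendees", openapi_ref="attendees-openapi.yaml"),
    GrpcInterface("attendees-grpc", "AttendeesService",
                  proto_ref="attendees.proto", methods=["ListAttendees"]),
]

print(select_interface(attendees_interfaces, "attendees-grpc").service_name)  # AttendeesService
```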

Then there's also the element of controls. Controls are the big thing for me, because this is where we start to model some of those non-functional requirements. We do this in a way that supports users and people in industry building out specific domains. I'm not an expert on observability. I do know a bit about performance, but not in all situations. The one I'm going to focus on is security, because security is going to have some core schemas.

For example, let's say I want to build any requirement I specify off the back of a NIST document. I can say that all the controls I create are going to have a document number, for example NIST SP 800-207, plus a title and maybe a status as well. Every control we create in the security domain then builds on that. Depending on your domain, you might have something different. For performance, what are you capturing? Maybe you've got a core schema for measuring latency and throughput, which you can then build on as you craft the controls in that domain.

The only thing we require in the core schema is the idea of a control requirement. The control requirement basically states what your non-functional requirement should be doing, and why. Then the configuration represents any options in the requirement. For example, you might say, I'm going to do micro-segmentation, but you can specify whether you allow ingress and whether you allow egress. That optionality is then targeted to the specific architecture that comes off the back of the pattern. This is the example we'll use in the walkthrough: a micro-segmentation requirement, plus a configuration for how it works. That's how this is modeled. Once we lock everything down, we'll also have to open a few things up, and there will be a couple of other things too.
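To show how a control's configuration could drive infrastructure, here is a sketch that turns a hypothetical micro-segmentation control into Kubernetes `NetworkPolicy` objects. The control has a requirement backed by NIST SP 800-207 plus ingress and egress options. The control's field names are invented. The `NetworkPolicy` structure (`networking.k8s.io/v1`, `podSelector`, `policyTypes`, `ingress`, `egress`) is the standard Kubernetes API. Policies only take effect on clusters whose network plugin enforces them.

```python
import json

# Hypothetical control: field names are illustrative, not CALM's published schema.
control = {
    "requirement": {
        "document": "NIST SP 800-207",
        "title": "Zero Trust Architecture",
        "status": "approved",
        "control-requirement": "Workloads are micro-segmented; only declared flows are allowed.",
    },
    "configuration": {
        "namespace": "conference",
        "allow-ingress-from": [{"app": "load-balancer"}],
        "allow-egress": False,
    },
}


def network_policies(config: dict, app: str) -> list:
    """Translate a micro-segmentation configuration into NetworkPolicy objects."""
    ns = config["namespace"]
    policies = [{
        # Default deny: selects every pod in the namespace and allows nothing in or out.
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": ns},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }, {
        # Re-open only the declared ingress flows to the app. A podSelector without
        # a namespaceSelector matches pods in this namespace only.
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"allow-ingress-to-{app}", "namespace": ns},
        "spec": {
            "podSelector": {"matchLabels": {"app": app}},
            "policyTypes": ["Ingress"],
            "ingress": [{"from": [{"podSelector": {"matchLabels": labels}}
                                  for labels in config["allow-ingress-from"]]}],
        },
    }]
    if config.get("allow-egress"):
        policies.append({
            "apiVersion": "networking.k8s.io/v1",
            "kind": "NetworkPolicy",
            "metadata": {"name": f"allow-egress-from-{app}", "namespace": ns},
            # An empty egress rule ({}) allows all outbound traffic from the app.
            "spec": {"podSelector": {"matchLabels": {"app": app}},
                     "policyTypes": ["Egress"], "egress": [{}]},
        })
    return policies


if __name__ == "__main__":
    for policy in network_policies(control["configuration"], app="attendees"):
        print(json.dumps(policy, indent=2))
```

The default-deny policy covers egress as well. That means the load balancer's outbound calls to the attendees pods, and every pod's DNS lookups, stay blocked until they're explicitly allowed. That is the "open a few things up" step: a real pattern would declare those flows too.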

Before I get into the demo, I want to say where I think CALM fits. I still think C4 is really important; C4 is the beginning of my journey. Think about the diagram I showed earlier: it's how I explain architecture and talk through what each part does. It captures the communication element of architecture for me. I think the container and component diagrams are really key for communication, and possibly the context diagram too if you're speaking to people unfamiliar with your use case. What CALM, and architecture as code, does is put an option between the level 3 and level 4 diagrams: the diagram with your components in it and the diagram with your actual code and UML. That's really quite key, because what usually sits between them is how you deploy this using infrastructure. You'll see where this fits for infrastructure-based deployments. That said, it's not a replacement for infrastructure as code. We're not trying to build an infrastructure-as-code solution here, but you will see how it can integrate with one in the demo.

On its own, CALM doesn't provide a complete developer experience. When you build it into your workflows, and start to think about where things sit, it gives you an option. In my opinion, being able to communicate to a developer what your infrastructure and configuration should look like, in a context they understand, is quite a powerful tool. It really shines in regulated industries or in organizations with high reuse. If you have a platform team, or you work in one, this is something I would consider. Ultimately, I'm trying to embrace the complexity where I work. It really is difficult to work in a complex, regulated environment, but by solving those problems we can look to elevate the developer experience. Solve them once, effectively, and make it easier for folks to adopt the system.

I want to provide really clear paved paths. That matters for us because getting to production, through the various reviews, can take between three and six months. We measured that time within our organization when we first kicked things off. We then set a really ambitious goal of getting it down from six months to two hours. With the work we've been doing here, we've been able to nearly achieve that. It has been iterative improvement, and there are still things we need to do to close the gap. We also want fast feedback loops. I don't want people sitting around waiting for approvals. There's nothing worse than waiting on a human to hear whether you can do something or not, and that's quite demotivating as a developer. We want people to get something up and running with security built in, and ultimately to have self-service deployments.

What you've got then means you can release into production securely, within guardrails (maybe that's the best way to put it), without being stopped from delivering business value. There are going to be situations where you need to upgrade: your API, maybe your service, or some of the components used for routing. That's another thing we want to provide here, the ability to do that easily. Then we want integrated approval processes.

For example, most secure SDLC environments have different gates you need to walk through. If you can automate that using a common language based on industry standards, you're in a good place. Something we haven't quite worked out yet, but that's on the roadmap for this year, is drift detection. Given the information we have in the architecture and the templates we deploy, how do we know whether that's still what exists in production? How do we know that something somebody changed inadvertently hasn't severely knocked out what we're doing?
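Drift detection is still on the roadmap, so this is only one naive way to frame it: recursively diff the configuration declared in the architecture against what is observed in production. The configuration keys and values are hypothetical. Collecting the observed side, for example from the cluster or cloud APIs, is left out.

```python
def drift(declared: dict, observed: dict, path: str = "") -> list:
    """Recursively compare declared configuration against observed configuration."""
    diffs = []
    for key in sorted(set(declared) | set(observed)):
        where = f"{path}.{key}" if path else key
        if key not in observed:
            diffs.append(f"missing in production: {where}")
        elif key not in declared:
            diffs.append(f"undeclared in production: {where}")
        elif isinstance(declared[key], dict) and isinstance(observed[key], dict):
            diffs.extend(drift(declared[key], observed[key], where))
        elif declared[key] != observed[key]:
            diffs.append(f"changed: {where}: {declared[key]!r} -> {observed[key]!r}")
    return diffs


# Hypothetical declared vs observed state for the attendees deployment.
declared = {"attendees": {"image": "attendees:1.4.2", "replicas": 3,
                          "network-policy": "default-deny-all"}}
observed = {"attendees": {"image": "attendees:1.4.2", "replicas": 3,
                          "debug-port": 5005}}

for finding in drift(declared, observed):
    print(finding)
```

Someone quietly removed the network policy and opened a debug port. Both show up, which is exactly the kind of inadvertent change the question above worries about.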

Then, finally, and I think this is probably the most important thing, you want clear documentation. I've worked on a lot of internal tools, and in a lot of organizations, where the documentation is really not very good, and it's really hard to do anything. I'm sure other people can relate to that. When we started out the program here, one of the things we really wanted was documentation with really clear categorization. We used something called the Diátaxis framework, so we have things like concept guides, reference guides, and how-to guides. It's very much driven by what you're trying to do as a user, as a developer.

Demo

Let's actually go into the demo. The reason I had to show you all that is to give you a bit more context, so you understand what the different files are as we step through. I'll explain those as we go. We're also going to drop the advertising service from this part of the demo just to make things a little bit easier, mostly for me. Let's go and have a look. In the background, I was restarting into a new secure container so I can show you how things are working. Here you'll see that we've got the CALM CLI. The CALM CLI has got a couple of commands, and I'm not going to list them all out, because we're going to use them in the order a developer would when they come and look at a given pattern. Let's look at the pattern file. Just to clarify, in our organization the pattern file is my responsibility; I'm the architect and owner of the API program.

At the same time, I don't own everything. Ownership is a really awkward word to use there, because what I actually want to do is bring in my security experts and the people who are going to be responsible for different pieces, help them really understand it, and collectively build what this pattern file looks like. Here's an example: conference-secure-signup.pattern.json. It's got our first node, which has the unique ID attendees and the name Attendees Service. I'm going to show you this with the Attendees Service, and I've deliberately put it here, but this could be any application that runs inside Docker and follows the interface requirements, and you could deploy it in a secure manner using this pattern. Just bear that in mind as we go through. There's quite a lot of flexibility, and also the ability to lock things in, depending on how you craft your pattern.

Inside here, we've got our interface list. On line 101, you've got the container-image-interface. The container-image-interface is where I'm saying I've got an attendees-image, and that's its name. What I'm not specifying in the pattern is the image name. The great thing about this is that when I create the architecture, it's left as an exercise for the user: they're going to have to go in and fill in their image name to make this work. If we scroll down a little further, we've also got the Kubernetes cluster. I'll just try to position this at the top. Line 192 is where I've now got my controls and requirements. What I'm saying within my controls and requirements is that I've got two different things here: the requirement URL and the config URL. The requirement, remember, states that this cluster does micro-segmentation, and its schema specifies how.

Then there's the config, which says how it should be configured. Just to point out, there's the relationship between the load balancer and attendees. That's the same thing we saw in the slides. The difference here is that there's also a permitted-connection HTTP config. That's because without it nothing would work, since everything is set up as default-deny. This adds in that particular behavior. Let's just have a look at the control requirements. This is the micro-segmentation one. The things to note are that it's got a control ID, a name, and a description.

Then it's got the two things that I care about. Now just imagine a requirement or non-functional requirement you might have in your own organization, and think about the type of things you could specify in there that you want to capture in your architecture. You could really go to town on this. Be aware that your developers will then have to fill them in, so maybe what you actually do is provide some pre-canned configurations out of the box, which is, I think, how we're going to do this. I've got permit ingress set to true because traffic is coming from a load balancer. It shouldn't go anywhere else, so permit egress is set to false. I've then also got my protocol, which is HTTP. This is, again, saying what the reason is for connecting and allowing that connection between the two things. That's the basic configuration there.
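A control pairs a requirement (what must hold) with a configuration (how it is satisfied). As a hedged illustration only, the sketch below checks the micro-segmentation config described above against a much-simplified, hypothetical requirement that lists expected fields, types, and allowed values. Real CALM control requirements are JSON Schema documents referenced by URL, which this sketch does not fetch or interpret; the field names mirror the talk, but `MICRO_SEGMENTATION_REQUIREMENT`, the allowed protocol set, and `check_config` are assumptions.

```python
from typing import Any, Optional

# Hypothetical, simplified stand-in for a micro-segmentation requirement (not JSON Schema).
# Each field maps to (expected type, allowed values, or None meaning "any value").
MICRO_SEGMENTATION_REQUIREMENT: dict[str, tuple[type, Optional[set]]] = {
    "permit-ingress": (bool, None),
    "permit-egress": (bool, None),
    "protocol": (str, {"HTTP", "HTTPS"}),
}


def check_config(
    config: dict[str, Any],
    requirement: dict[str, tuple[type, Optional[set]]] = MICRO_SEGMENTATION_REQUIREMENT,
) -> list[str]:
    """Return human-readable problems; an empty list means the config conforms."""
    problems: list[str] = []
    for field, (expected_type, allowed) in requirement.items():
        if field not in config:
            problems.append(f"missing field: {field}")
            continue
        value = config[field]
        if not isinstance(value, expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}, got {type(value).__name__}")
        elif allowed is not None and value not in allowed:
            problems.append(f"{field}: {value!r} not in {sorted(allowed)}")
    return problems


# The configuration described in the talk passes; a malformed one does not.
print(check_config({"permit-ingress": True, "permit-egress": False, "protocol": "HTTP"}))  # []
print(check_config({"permit-ingress": "yes", "protocol": "FTP"}))
```

Shipping pre-canned, already-valid configurations, as suggested above, simply means developers start from a config for which this kind of check already returns no problems.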

The next command I can run is calm generate. The pattern is all mine; that's what I've built. I've then gone to my users and said, here's the pattern. I'll talk about how we give patterns to users. As output, I've got an architecture. What calm generate does is take a pattern as input and create an architecture that your users can then work with.
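Conceptually, generation walks the pattern, copies across anything the pattern fixes as a constant, and leaves a placeholder for anything the user must supply. The sketch below assumes a toy pattern format in which a field is either a constant or a `{"required": <type>}` marker; that format, the `generate` function, and the `[[ NAME ]]` placeholder style are illustrative assumptions, not the CALM schema or the actual `calm generate` implementation.

```python
from typing import Any


def placeholder(name: str) -> str:
    """Render a placeholder the user must replace, e.g. 'image' -> '[[ IMAGE ]]'."""
    return f"[[ {name.upper().replace('-', '_')} ]]"


def generate(pattern: Any, field: str = "value") -> Any:
    """Turn a toy pattern into an architecture skeleton.

    - dicts and lists are walked recursively
    - {"required": <type>} markers become placeholders named after their field
    - anything else is treated as a constant and copied unchanged
    """
    if isinstance(pattern, dict):
        if set(pattern) == {"required"}:
            return placeholder(field)
        return {key: generate(value, key) for key, value in pattern.items()}
    if isinstance(pattern, list):
        return [generate(item, field) for item in pattern]
    return pattern


# Hypothetical toy pattern: the node's id and name are fixed, the image name is left to the user.
pattern = {
    "nodes": [
        {
            "unique-id": "attendees",
            "name": "Attendees Service",
            "interfaces": [{"unique-id": "attendees-image", "image": {"required": "string"}}],
        }
    ]
}

architecture = generate(pattern)
print(architecture["nodes"][0]["interfaces"][0])
# {'unique-id': 'attendees-image', 'image': '[[ IMAGE ]]'}
```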

One feature that we've added in just the last couple of weeks is the idea of pattern options. When we created our first patterns, we realized that if you wanted to change one thing, you had to create an entirely new pattern. That's not a very good experience, so we introduced pattern options to let you make those selections. When you use them, an interactive command-line interface asks you what you're trying to do. What you can see here is that I've got my attendees-image, and these square-bracket placeholders are asking for the image name. As a developer, you would go through and fill out some of these things. You can also set things up as constants. To help my developers out, the URL and the config are already specified for them. I'm not going to get them to do that because I'm feeling kind.
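Pattern options let one pattern offer alternatives instead of forcing a new pattern per variation. A minimal sketch of the resolution step, assuming a hypothetical toy format where an option point is written as `{"options": {...}}` and the user's choices arrive as a dict keyed by dotted path (the real CLI gathers choices interactively, and its pattern format differs); `resolve_options` and the sample option names are illustrative only.

```python
from typing import Any


def resolve_options(pattern: Any, choices: dict[str, str], path: str = "") -> Any:
    """Replace each {"options": {...}} point with the alternative chosen for its path.

    `choices` maps a dotted path (e.g. "relationships.0.protocol") to an option name.
    Raises KeyError if a choice is missing or names an unknown option.
    """
    if isinstance(pattern, dict):
        if set(pattern) == {"options"}:
            chosen = choices[path]
            return resolve_options(pattern["options"][chosen], choices, path)
        return {k: resolve_options(v, choices, f"{path}.{k}" if path else k) for k, v in pattern.items()}
    if isinstance(pattern, list):
        return [resolve_options(v, choices, f"{path}.{i}" if path else str(i)) for i, v in enumerate(pattern)]
    return pattern


# Hypothetical pattern offering two protocol options for one relationship.
pattern = {
    "relationships": [
        {"unique-id": "load-balancer-attendees", "protocol": {"options": {"plain": "HTTP", "tls": "HTTPS"}}}
    ]
}
print(resolve_options(pattern, {"relationships.0.protocol": "tls"}))
# {'relationships': [{'unique-id': 'load-balancer-attendees', 'protocol': 'HTTPS'}]}
```

Failing loudly on a missing choice, rather than silently picking a default, keeps the generated architecture honest about what the developer actually selected.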

Once you've got hold of your architecture, you don't want it going down some pipeline and then breaking two or three hours later. Being able to do upfront validation is the next important step. If I run calm validate on this, it tells me that there are no errors, which I should hope, because I've not really edited anything; otherwise that would be the demo going wrong. It does have warnings, though. What they're telling me is that I've not filled in the placeholders that were available. That's something I would need to do. Now, very much Blue Peter, here's one I made earlier, because my typing will only get worse. I'm using an image from the Mastering API repository, the attendees Quarkus example, to start up this architecture. That image is now filled into the architecture file, so it becomes a complete example. That's all great: you've got an architecture. I'll show you what you can do in terms of visualizing it. What I really want to do, though, is create a working example from this. This is where template bundles come in.
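The validation step separates errors from warnings such as unfilled placeholders. The sketch below covers only the warning half, assuming placeholders take a hypothetical `[[ NAME ]]` form; the real tool also reports errors against the pattern, which this does not attempt, and `placeholder_warnings` is an illustrative name rather than part of the CALM CLI.

```python
import re
from typing import Any

# Assumed placeholder shape: two square brackets around an upper-case name.
PLACEHOLDER = re.compile(r"\[\[\s*[A-Z0-9_]+\s*\]\]")


def placeholder_warnings(doc: Any, path: str = "$") -> list[str]:
    """Walk a JSON-like document and return one warning per unfilled placeholder."""
    warnings: list[str] = []
    if isinstance(doc, dict):
        for key, value in doc.items():
            warnings += placeholder_warnings(value, f"{path}.{key}")
    elif isinstance(doc, list):
        for i, value in enumerate(doc):
            warnings += placeholder_warnings(value, f"{path}[{i}]")
    elif isinstance(doc, str) and PLACEHOLDER.search(doc):
        warnings.append(f"{path}: placeholder not filled in ({doc})")
    return warnings


architecture = {"nodes": [{"unique-id": "attendees", "interfaces": [{"image": "[[ IMAGE ]]"}]}]}
for warning in placeholder_warnings(architecture):
    print(warning)
# $.nodes[0].interfaces[0].image: placeholder not filled in ([[ IMAGE ]])
```

Running a check like this before anything enters a pipeline is what turns a failure hours later into immediate feedback.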

In the background, we added some new features to CALM (this shows how new some of this stuff is) to create bundles that are basically based on Handlebars, but you'll see that they also have this idea of an infrastructure transformer. You can generate most things, and if you want to do something custom or bespoke, you can add it into that file and pretty much generate anything you like. You can see what that ends up looking like in this application deployment example: I've got the application image, and that's what's going to be templatized and placed in here. Let's run that. What that's done is take my architecture and populate the template bundle with all of the information needed to deploy into Kubernetes. Two things have actually happened. The first is that it's created a cluster setup script. Because it's on my laptop, it's a minikube example, and it's got a Calico default-deny on the global namespace, which basically locks everything down on the cluster. Then I've got all of the deployments.
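Template bundles are described as Handlebars-based. As a rough illustration only, the sketch below swaps Handlebars for a tiny `{{ dotted.path }}` substitution built on Python's standard library, to show the idea of populating a deployment template from architecture data. The template text, the `application-image` key, and the image name are hypothetical; this is not the CALM template engine.

```python
import re
from typing import Any

# Matches {{ some.dotted-path }} tokens (a deliberately tiny subset of Handlebars syntax).
TOKEN = re.compile(r"\{\{\s*([\w.-]+)\s*\}\}")


def lookup(data: dict[str, Any], dotted: str) -> Any:
    """Resolve a dotted path such as 'attendees.application-image' in nested dicts."""
    value: Any = data
    for part in dotted.split("."):
        value = value[part]  # KeyError surfaces missing data instead of rendering blanks
    return value


def render(template: str, data: dict[str, Any]) -> str:
    """Replace every {{ path }} token with the value found at that path."""
    return TOKEN.sub(lambda m: str(lookup(data, m.group(1))), template)


template = """\
containers:
  - name: {{ attendees.unique-id }}
    image: {{ attendees.application-image }}
"""
data = {"attendees": {"unique-id": "attendees", "application-image": "attendees-quarkus:1.0"}}
print(render(template, data))
```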

Although the first one is a minikube example, there's no reason it couldn't generate Terraform or some other infrastructure-as-code language. I've then got my application deployment, which is now populated with the configuration for how this application should work, with an imagePullPolicy of IfNotPresent, to hopefully mean that it works without the internet. Then, finally, I've got my permissive rules to allow things to connect together.
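For readers less familiar with the Kubernetes side, here is a hedged sketch of the kind of objects described: a Deployment whose container uses `imagePullPolicy: IfNotPresent`, and a NetworkPolicy that permits ingress to the attendees pods only from load-balancer pods. It builds standard `apps/v1` and `networking.k8s.io/v1` objects as Python dicts. The labels, the `conference` namespace default, port 8080, and the image name are assumptions, and the files generated in the demo may differ (for example, by using Calico resources).

```python
import json


def attendees_deployment(image: str, namespace: str = "conference") -> dict:
    """A minimal apps/v1 Deployment for the attendees service (labels and port are assumed)."""
    labels = {"app": "attendees"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "attendees", "namespace": namespace, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "attendees",
                        "image": image,
                        # Use a locally cached image when present, so the demo can run offline.
                        "imagePullPolicy": "IfNotPresent",
                        "ports": [{"containerPort": 8080}],
                    }]
                },
            },
        },
    }


def allow_from_load_balancer(namespace: str = "conference") -> dict:
    """A networking.k8s.io/v1 NetworkPolicy admitting TCP 8080 traffic only from load-balancer pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-load-balancer-to-attendees", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"app": "attendees"}},
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": [{"podSelector": {"matchLabels": {"app": "load-balancer"}}}],
                "ports": [{"protocol": "TCP", "port": 8080}],
            }],
        },
    }


# kubectl accepts JSON as well as YAML, so this output could be piped to `kubectl apply -f -`.
manifest = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [attendees_deployment("attendees-quarkus:1.0"), allow_from_load_balancer()],
}
print(json.dumps(manifest, indent=2))
```

Under a default-deny that also covers egress, the load balancer would additionally need an egress rule toward the attendees pods; traffic flows only when both sides permit it.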

This is cached; it's what was loading in the background earlier. I'm now using kubectl apply with kustomize to put in place all of the different things we need. First, I'm just going to reset the tunnel to point at my new secure application. We'll go back to our Swagger UI from earlier, and I'm just going to do a GET on attendees to show that it does respond, with an empty array. Not very exciting. Now, if I go back to my walkthrough, this time I'm deliberately spinning something up in the conference namespace, and I can try to do a curl on http://attendees-service/attendees. Nothing happens. That is a good thing, just to clarify: the request is being blocked at the network level.

The network policy rules are stopping anything from connecting to the attendees-service that doesn't have a permissive rule. This starts to look like a micro-segmented cluster. Now as a developer, I've just entered my image name and pressed enter a few times. That's literally what I've been doing up here. That's a good thing. It means that we get to the point where we've got something that's secure and it's really easy for a developer. We've done that bit.
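The behavior in the demo follows from default-deny semantics: a connection is dropped unless some explicit rule permits it. Below is a toy model of that evaluation, assuming connections and rules are identified by simple workload names; real Kubernetes and Calico policies match on labels, namespaces, and ports, which this sketch ignores, and `debug-pod` is a hypothetical name for the pod spun up in the conference namespace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Permit traffic from one workload to another over a protocol (direction matters)."""
    source: str
    destination: str
    protocol: str


def is_allowed(source: str, destination: str, protocol: str, rules: set[Rule]) -> bool:
    """Default deny: only an exact matching permit rule lets traffic through."""
    return Rule(source, destination, protocol) in rules


rules = {Rule("load-balancer", "attendees-service", "HTTP")}
print(is_allowed("load-balancer", "attendees-service", "HTTP", rules))  # True
print(is_allowed("debug-pod", "attendees-service", "HTTP", rules))  # False, like the blocked curl
```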

I wanted to show you another tool that we've been working on, because, after working with JSON Schema for a year, everything just looks fine to me, but people don't like looking at JSON Schema. I don't really know why. You can see in here we're starting to create this idea of a hub that stores architectures and patterns. You can see the JSON Schema here, and if we click on this, it shows a visualization of the architecture laid out as boxes and arrows. What we're building in the background is the idea that all of this is an API behind the scenes. We've got an ADR resource that presents ADRs, so we're going to be able to capture ADRs in the system. We've got architectures. We've got flows as well; flows aren't something I'm talking about in this presentation, but they're all about capturing the business flows of your system. Then I've got patterns and a core schema resource as well.

The last thing I want to show is docify, because if one documentation tool doesn't work for you, you can always present the other one. It generates a website from the architecture. You can see here that I now have my layout, my C4 diagram of what's been deployed and how it's working. If I scroll a little further down, I've also got the controls that are specified. That's pretty cool, because now if I were documenting this for regulatory requirements, I could say, this is why I've done this. You can go and have a look at the details here; there's more information about the given cluster.
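The docify step turns the architecture, including its controls, into browsable documentation. A minimal sketch of that idea, assuming a toy architecture with `name` and `nodes` keys, where each node may carry a `controls` mapping, and emitting Markdown; this is not the `calm docify` output format, and the node ids and control description are hypothetical.

```python
def to_markdown(architecture: dict) -> str:
    """Render nodes and their controls as a small Markdown page."""
    lines = [f"# {architecture['name']}", "", "## Nodes", ""]
    for node in architecture["nodes"]:
        lines.append(f"- **{node['name']}** (`{node['unique-id']}`)")
        # Controls are listed under their node so reviewers see why each one exists.
        for control_id, control in node.get("controls", {}).items():
            lines.append(f"  - control `{control_id}`: {control['description']}")
    return "\n".join(lines) + "\n"


architecture = {
    "name": "Conference secure signup",
    "nodes": [
        {"unique-id": "attendees", "name": "Attendees Service"},
        {
            "unique-id": "k8s-cluster",
            "name": "Kubernetes Cluster",
            "controls": {"security": {"description": "Micro-segmentation with default-deny network policies"}},
        },
    ],
}
print(to_markdown(architecture))
```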

A CALM Workflow

What would a CALM workflow look like? The C4 design and review would be done by architects, developers, and security professionals, to really set out: this is our vision, and this is how we want things to work. The pattern would then capture everything that needs to be documented: how things connect, what controls they have in place, what non-functional requirements they have, and any template bundles if you want to do automatic deployments. You don't have to do all of these things; you can just draw boxes and arrows with CALM. For me, though, patterns are its superpower. You can then publish to CALM Hub and generate architectures off the back of CALM Hub.

Then you can validate the architectures and store them back. CALM Hub becomes the store of all your assets and an integral part of your workflow. Out the other side, you can generate infrastructure as code, template into configuration, or docify into documentation. This starts to look like something where I can say: this is how I want to deploy my applications, I want security reviews around this automatically, and my developers don't have to wait six months. A digestible version of this is what I'm doing internally right now for my new API deployments. We're finding lots of bugs and fixing them in the open-source project, so everything comes full circle.
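The publish, generate, validate, and store-back loop above can be summarized with an in-memory stand-in for the hub. The `InMemoryHub` class, its `publish` and `latest` methods, and the versioning scheme are hypothetical; the real CALM Hub exposes these resources through an API whose endpoints are not covered in the talk.

```python
import copy
from collections import defaultdict
from typing import Any


class InMemoryHub:
    """Hypothetical in-memory store of versioned documents, keyed by (kind, id).

    Example key: ("patterns", "conference-secure-signup").
    """

    def __init__(self) -> None:
        self._versions: dict[tuple[str, str], list[Any]] = defaultdict(list)

    def publish(self, kind: str, doc_id: str, document: Any) -> int:
        """Store a copy of `document` as a new version and return its 1-based version number."""
        versions = self._versions[(kind, doc_id)]
        versions.append(copy.deepcopy(document))
        return len(versions)

    def latest(self, kind: str, doc_id: str) -> Any:
        """Return a copy of the newest version; raise KeyError if nothing was published."""
        versions = self._versions.get((kind, doc_id))
        if not versions:
            raise KeyError(f"{kind}/{doc_id} has not been published")
        return copy.deepcopy(versions[-1])


hub = InMemoryHub()
hub.publish("patterns", "conference-secure-signup", {"nodes": [{"unique-id": "attendees"}]})
# A developer generates an architecture from the pattern, fills it in, validates it, and stores it back.
filled = {"nodes": [{"unique-id": "attendees", "image": "attendees-quarkus:1.0"}]}
version = hub.publish("architectures", "conference-secure-signup", filled)
print(version, hub.latest("architectures", "conference-secure-signup")["nodes"][0]["image"])
# 1 attendees-quarkus:1.0
```

Copying on the way in and out keeps stored versions immutable, which matters if the hub is meant to be the system of record that drift detection or audits compare against.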

Summary

In summary, I think in reality we all work in complex operating environments. Some have more rules than others, but the complexity, and especially the connectivity complexity, is there. Developer productivity is really important: I want quick time to market, and I want to help my users as best I can. Right now, what I see is a sea of tools out there, some from vendors, some not. What I want to be able to say is, ok, that's fine, but I want to coexist with these tools and get the most out of them for my internal use case. I think that's reasonable. The way we've gone about this is by building an industry-standard model. We've got more things to do here: we've got controls and flows that model more detail, though again, we're not forcing you to pin down that level of detail. We've then got the CLI and CALM Hub as more interactive tooling. Then, finally, we've got some documentation.

Despite what I said about great documentation, we need to fix some of ours; that bit's still coming, so bear with us on that one. That's really what I've been working on. My journey has gone from super easy as a developer, to really complex as an accidental architect, writing about some of that with Daniel, and now to looking at problems that scale out beyond just what we are doing as a single organization.

Questions and Answers

Daniel Bryant: How easy is it going to be to retrofit some of this stuff onto existing systems? All the banks I work with have definitely got a lot of mainframes and a lot of older systems. Is this a greenfield-only thing?

Jim Gough: At its core, you can take a legacy system and just say the legacy system is a node. You can put that in your architecture alongside new things. You can then add in the controls that are on your legacy system and just model it that way. We also have a concept within CALM called detailed architecture. What you can do is say, I've got this node; I'm not defining it yet, but it will have a detailed architecture over here when that's available. Basically, you can grow with what you've already got. That's been a big area of analysis for us. There are some contributors who are looking at using AI to analyze what people already have and then import that into the CALM model. There are some interesting things going on around legacy applications and how they would integrate with more detailed patterns and architectures.
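Modelling a legacy system as an ordinary node, with an optional pointer to a detailed architecture to be supplied later, might look like the sketch below. The `detailed-architecture` key echoes the concept named in the answer, but the dict layout, the node ids, the example URL, and the `awaiting_detail` helper are illustrative assumptions rather than the CALM schema.

```python
def awaiting_detail(nodes: list[dict]) -> list[str]:
    """Return the ids of nodes that do not yet link to a detailed architecture."""
    return [
        node["unique-id"]
        for node in nodes
        if not node.get("details", {}).get("detailed-architecture")
    ]


nodes = [
    # Hypothetical mainframe modelled as an opaque node; its existing controls are still recorded.
    {
        "unique-id": "legacy-settlement",
        "node-type": "system",
        "name": "Settlement Mainframe",
        "controls": {"audit-logging": {"description": "Existing audit trail"}},
    },
    {
        "unique-id": "attendees",
        "node-type": "service",
        "name": "Attendees Service",
        "details": {"detailed-architecture": "https://example.com/architectures/attendees.json"},
    },
]
print(awaiting_detail(nodes))  # ['legacy-settlement']
```

A list like this gives a simple progress measure for a retrofit: it shrinks as legacy nodes gain detailed architectures.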

Participant 1: I'm wondering whether flows are going to be able to take advantage of Arazzo since you're trying to lean on JSON Schema and OpenAPI in general as much as possible.

Jim Gough: I think when we started working with OpenAPI, we definitely wanted it to be a standard that just reflects what the API specification of the service is. As the specification has matured, things like Arazzo and overlays have come in. A lot of what we are doing is pulling the OpenAPI spec into a system to then deploy it, whereas I think we need to think a little more about how the glue between architecture as code and some of these new standards will evolve. To answer your question, it's something I've been thinking about. It's not something we have a practical solution to just yet, but definitely watch this space if you're interested. That's what we'll be looking at next.

Participant 1: That'll be interesting when that comes around. I'm also wondering about the possibility of pushing some of this stuff out into OpenAPI, as a vendor extension initially, so it can be utilized in that context so that these things can be expressed in your OpenAPI where it can then be used by all the ecosystem of tooling around that.

Jim Gough: At the moment, we've not thought about that. Within the tooling that I own at Morgan Stanley, we do have a link that goes to our architecture and patterns, but it is done in a slightly different way off the back of our deployment tool. I think that's quite a good suggestion. Again, we're definitely not trying to go over to OpenAPI and go, we want to shoehorn you into what you do and then create things around you. It's more of a, if we have a link one way, you should probably look at doing links the other way.


Recorded at: Feb 27, 2026

Jim Gough

Related Topics: Architecture & Design, API, QCon London 2025, Transcripts, Platforms, QCon Software Development Conference, Security, Architecture, InfoQ


---

[Original source](https://www.infoq.com/presentations/secure-connectivity-api/?utm_campaign=infoq_content&utm_source=infoq&utm_medium=feed&utm_term=global)