Teams, Teamwork and Generative AI as a Team Member





Transcript

Shane Hastie: Good day, folks. This is Shane Hastie for the InfoQ Engineering Culture podcast. Today I’m sitting down with Jeremiah Stone. Jeremiah is the CTO of SnapLogic and comes to us today from California. Jeremiah, welcome. Thanks for taking the time to talk to us.

Jeremiah Stone: Thank you, Shane. I’m delighted to be here.

Shane Hastie: Thank you. My typical starting question with a guest is who’s Jeremiah?

Introductions [00:29]

Jeremiah Stone: Well, I’m a father, I’m a technologist. But more than anything, Shane, I think the essence of who I am is somebody who loves getting as close as possible to really devilishly hard problems with other like-minded people and solving them. So I’m a problem solver, I think, would be the bottom line of who I am and what I really enjoy doing.

Shane Hastie: So tell us a little bit about your background and what got you to the role of CTO at SnapLogic.

Jeremiah Stone: Sure. Well, I started off my career as an undergraduate in mathematics and computer science, and I was fortunate enough at that time to gain a research assistant role at the National Center for Atmospheric Research in Boulder, Colorado. And that’s where I really fell in love with the notion that you could actually improve the world, help it run better, using technology. I had the good fortune to work on a project that was helping to detect clear-air turbulence for airports. And so it was a very tangible, “Wow, if I build this thing, it could save people’s lives”. Very heady stuff as a student, and I was hooked. Over the course of a career that also included time at SAP building enterprise applications and at General Electric building and delivering industrial applications, I found my way back into the world of predictive applications about a decade ago, and I’ve been working in the AI- and ML-enabled application world ever since.

And during that time, all of the big gnarly problems I was dealing with were connecting valuable bits together into larger systems. And I was very frustrated about the fact that you could build a really great individual component of a system, but getting it all to work together in an elegant, ongoing way turned out to be half or more of the work. And that drove me nuts. So I went and found a company that was using AI and machine learning to solve this problem. And that’s how I ended up at SnapLogic: they are crazy like I am and believe that integration can be learned, and that’s what we’re trying to do.

Shane Hastie: One of the things that you mentioned in the chat we were having earlier was focusing on ways to work together to build great products. So what are some of the key things when it comes to the people aspects, the ways that we can and do work together?

The team is the primary unit of value generation in product development [02:34]

Jeremiah Stone: In my experience, having been in many different environments as an individual contributor, as a member of a team, and leading teams, the team is the right unit of operations, first of all. So the very first thing is to set your mindset that, as we think about how to build, manage, monitor and maintain anything, often the first and most important thing I need to deal with in any environment I’m in is to help the groups I work with stop thinking in terms of individual disciplines and start thinking in terms of a team that has different skills and capabilities. I’d say if there’s one great learning from my time, it’s that building a well-gelled team with the right mix of professional capabilities is the single hardest thing to do, and yet the single most valuable thing you can do in order to be successful building technology.

And these days, software development at any rate really consists of a blend of professions, professions that in and of themselves are difficult to master. And that means creating that team is very difficult as well, because you have to find individuals who can, for example, do quality engineering, software development, user experience, documentation, product management and so on. Those are the core disciplines, and then you start to have a team that can actually dig into three disjoint problems: Is it technically feasible? Is it viable from a business point of view? And is it desirable from a human point of view? The center of that Venn diagram is where greatness happens, but that starts with a team. So I’d say my greatest learning is to understand that you have to first build the team before you can build the thing.

Shane Hastie: How do you move people from being a group of individuals to being a team?

Techniques to assist team formation [04:17]

Jeremiah Stone: Man, if I could answer that succinctly, I would be a very, very wise man indeed. I can share some of the learnings I have had over time, and I’m reminded of the book Peopleware by Tom DeMarco and Timothy Lister, where they basically say, “I can’t tell you how to build a great team, but I can tell you the things that will prevent you from building a great team”. I believe they titled that chapter Teamicide. And I think my experience supports DeMarco and Lister’s thesis there, in that the formation of a great team is only something the team can do itself. But there are things that we can do as leaders and as teammates that create the possibility for a great team to form. They had, I think, a list of 10 or 15 things.

There are only a few that I’ve seen happen over and over and over again. First and foremost, creating an environment with shared risk and shared success. Very often when we organize to build technology, we organize humans within their disciplines and then we create a team out of them, which means that if you’re in engineering you report up to the head of engineering, if you’re in product management you report to the head of product, and design reports into design; you can actually create three individual definitions of success. And if we have three individual definitions of success, I don’t see that team forming. So that’s often a thing I’ve had to work at over and over again: saying that when it comes to a performance review, when it comes to how we determine whether we’ve failed or succeeded, let’s define something at the team level, and then the team fails or succeeds together.

So, shared success and shared risk. I also think the way we define delivery turns out to really matter. We’re very fond of setting arbitrary deadlines based on insufficient knowledge. When you launch any kind of effort or project, by the nature of what we do, we don’t know what we had to do to build the thing until we’ve actually built it; therefore, we have no reliable way of estimating the effort of building it. We have all of these interesting techniques like story points and other approaches, but fundamentally it’s unknowable. So I think having a team that is able to set a goal rather than a delivery deadline, and then give updates on progress and challenges against that goal in a timely manner to people who can help them, really matters. If I were to boil down DeMarco and Lister’s entire list, those two things are my most important learnings: shared success and risk, and a goal orientation rather than a deadline or estimated-delivery orientation, a goal target as well as the ability to surface blockers and progress against it.

Shane Hastie: There’s a lot going on right now in the realm of generative AI. It’s very disruptive, and one of the reasons we’re talking is that you are actively involved in looking at the impact of this and how developers in particular are using these tools. But stepping away a little bit from that, at a conceptual level, where are we going as an industry with these generative AI tools?

AI is decreasing the level of deep technical knowledge required to create code and making it more accessible to people who have deep domain knowledge in other areas [07:18]

Jeremiah Stone: We don’t know for sure, but based on what we do know, I think there are two immediate, pretty clear directions we’re heading in. Both of them are consistent with what we’ve been doing as an industry for decades anyway, and that is making the ability to create information or technical products more accessible to individuals who don’t have the same level of advanced training. We’ve done this with integrated development environments. We’ve done this with low-code, no-code tooling. I have a daughter in high school and she’s using a visual programming model that was actually developed here at the University of California, Berkeley. And many of us with kids have seen them develop Scratch programs. So I think part of where this is taking us is decreasing the level of deep technical knowledge required to create code and making it more accessible to people who have deep domain knowledge in other things.

So maybe you have deep domain knowledge in how to run a transportation network, how to run a bus network, and you have an idea for some technology or software that could help you do that. Well, you’re unlikely to have a computer science degree and be able to develop it. But increasingly we are creating the ability for people with less technical training to actually build things. Whether that is what we’ve done with cloud, making the startup cost much lower because we can abstract the infrastructure and use a software interface to create it, or with the programming experience, we’re making it more available. An example of that would be the whole text-to-SQL world, where we want to give people who have great questions for their data, structured relational data, the ability to simply ask … We’ve said this for decades: ask questions of the data.

AI is not significantly increasing developer productivity because writing code is such a small part of the development process [09:04]

Well, now we’re getting to a point where you can actually ask questions of the data, which is kind of interesting. So I think that’s one: increasing accessibility. That seems pretty clear; I feel confident about that. The other one that’s very, very, very hyped right now is increasing the productivity of engineers, productivity being measured, I guess, if we use DORA metrics, by how often people are pushing to production, how often there are change failures and rollbacks, and that sort of thing. I’m less confident about the ability of generative AI to increase developer productivity, predominantly because writing code is only a very small subset of what developers actually do. So I think we can probably see a bit of a levelling of the quality of code produced across the industry. We’re basically helping to bring up people who may not have had the good fortune to work under the tutelage of more senior engineers and learn the tips and tricks that are passed along orally; in our industry there is a lot of oral knowledge handed down from senior engineers to more junior engineers.

I think that might be helpful, but I still think that what we’re not addressing is that fundamental analysis and design component. So if I were to give you the very succinct answer to where we are headed as an industry: one, we’re making the ability to create software more accessible to people who have lower levels of technical skill; and two, I think we are further increasing the, let’s say, first-delivery quality of code produced by professional software developers. Both of those are good things. What I vehemently disagree with is the idea that this is putting developers or coders out of a job anytime soon, and I don’t think we’re going to experience hyper-increases in individual developer productivity across the industry.

Shane Hastie: That’s interesting. It feels to me that there’s almost a contradiction between the two things you were talking about there though. On the one hand, we’ve got people who are not technology experts producing technology products more easily, on the other we’ve got the “quality” of the code that the technologists are producing getting better. Are we saying that these non-technology experts will produce something that won’t be of high quality? Where does it hand off between the two, I suppose?

The distinction between skilled amateurs with AI and professional developers [11:28]

Jeremiah Stone: I look at it a little bit differently. Let’s draw a metaphor to power tools. As a very mediocre DIY weekend carpenter around the house, I can use a power drill to drill a hole and bang together a table, et cetera, and that table can hold up food and support things. And that’s great, because if we didn’t have power tools and I had to use a hand-driven approach, I couldn’t even build the table at all. So I can build something functional and capable. On the other hand, I am not a furniture maker and I cannot make a beautiful piece of furniture that requires significantly higher skill. But if we look at it from a functional perspective, could I do something? And I have; I’ve built furniture and things that we use in the … backyard furniture, and it’s just fine. Does it hold me up? Am I worried about it as a safety hazard? Does it serve its function? Yes, it does.

Is it something that you would look at as art, something an actual master furniture maker would make? No, you would not. And I think that’s the distinction: we’re giving accessibility to create functional software systems to a much larger number of people. The ability to create highly scalable, fault-tolerant systems that are going to run at a global scale and handle the problems you have to earn your way into through success, I think that’s a much harder proposition and requires significantly more machinery. But in much the same way, let’s take the data science domain: the emergence of Python notebooks, Jupyter notebooks, has given people who couldn’t use vi or Emacs, or whatever you needed to build the multiple files to run data science, a much more accessible way in, and they’re able to bring their mathematics knowledge to do something capable. The thing they produce is often an analytic output that’s highly functional.

It would not be confused with a model that could run for billions of requests per minute, et cetera, on a globally distributed network. So that’s how I’d frame it: these are almost two different axes, reliability, scalability and functionality on one hand, and accessibility to build something functional for a given population that may need it on the other.

Shane Hastie: Thank you. A very clear metaphor there. So digging into the research, you mentioned that you are actually doing active research. You want to tell us a little bit about what you’re doing there?

Jeremiah’s research into large language models for code generation [13:47]

Jeremiah Stone: Sure. I’m in the process of finishing a master’s degree in information and data science, and my specific research area has been exactly this domain of large language models for code generation, or program synthesis. So there are two different subdomains there. The first place I started out was what is known as a sequence-to-sequence problem for text to SQL: given a question and a database, to what degree can you generate a SQL statement that will not only execute but actually return the same results as a human expert programmer would produce? That was the first. My current active research project is looking at coding copilots. You’re familiar with GitHub Copilot; there are a few other ones out on the market that are there to assist software developers in various tasks, from code completion to code generation to documentation, analysis and refactoring, or in-context help documentation, et cetera. Those are common skills being applied now.

And my specific domain of research is how to make such programming assistants or models intelligent, or higher performing, on private code bases. Nearly all enterprise code bases were not available to any of these models for training, and the way these models work is that they are statistical text generators that will generate text most similar to the data they were trained on. Well, if none of the private training sets were available, they will not perform well in a given context if you’re trying to create software. So that’s the research I’m working on right now: what is the minimal amount of work, and what are the best techniques, to adapt these models in a private, local way to a private code base?
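As a rough illustration of the kind of local adaptation Jeremiah describes (not his actual method), here is a minimal sketch of parameter-efficient fine-tuning of an open code model on a private repository using the Hugging Face transformers, peft and datasets libraries, so that neither the code nor the tuned weights leave your environment. The model name, file paths and hyperparameters are illustrative assumptions.

```python
# Minimal sketch: adapt a general-purpose code model to a private code base with LoRA.
# Model name, paths and hyperparameters are assumptions for illustration only.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "codellama/CodeLlama-7b-Python-hf"    # any locally mirrored open code model
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token           # causal LMs often ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Attach small trainable LoRA adapters; the base weights stay frozen.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

# Private source files that never leave the local environment (hypothetical path).
dataset = load_dataset("text", data_files={"train": "internal_repo/**/*.py"})["train"]
tokenized = dataset.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
                        batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="local-code-adapter",
                           per_device_train_batch_size=2, num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("local-code-adapter")  # only the small adapter weights are written out
```

The resulting adapter can then be loaded alongside the frozen base model inside an IDE plugin or a self-hosted inference server, which is the deployment shape Jeremiah anticipates later in the conversation.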

Shane Hastie: What’s coming up? What are you learning? Let’s start with the text to SQL. When will I be able to just say, “Tell me how many clients are buying our products”, and magically get an answer?

Jeremiah Stone: That domain is really doing amazingly well. As of right now there is a common benchmark, if your listeners want to look into this more, called the Spider benchmark, which is run by Yale University. It’s an open benchmark with submissions of different approaches. I believe the current state of the art for a generalized model is somewhere in the 80% range for accuracy, maybe 84%, maybe 87%: given a question and the schema of the database, it’s able to return a perfectly accurate SQL statement on its first try. That’s remarkable; if you’re in the mid 80% with a generalized model, that’s actually quite good. What we’re seeing right now is very rapid adoption of text to SQL in database tools, and we can expect that very quickly we will have high-performing, capable interfaces to Redshift or BigQuery or Microsoft SQL, Synapse, et cetera. I think that’s the first place you’ll see it emerge: your ad hoc query builder will now be something you can type your question into.

And we’re already starting to see this in analytical tools, let’s say a Tableau, Power BI, Looker type of thing, and also in bespoke applications, where now not only is the question free text, but the answer can be free text as well, because these models are bidirectional; you can take that numeric answer and give back a textual response. So I think that’s probably the place where most of us will experience really capable large language models in the work environment, for development and development-oriented teams.
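To make the text-to-SQL pattern Jeremiah describes concrete, here is a minimal sketch of the loop: question plus schema in, executable SQL out, using the OpenAI Python client and SQLite. The model name, prompt, schema and database file are illustrative assumptions, not details from the Spider benchmark or any specific product.

```python
# Minimal text-to-SQL sketch: send the schema and a natural-language question to an LLM,
# then run the generated SQL locally. Model, prompt, schema and file names are assumptions.
import sqlite3
from openai import OpenAI

schema = """CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, client_id INTEGER, product TEXT, amount REAL);"""
question = "How many distinct clients are buying our products?"

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "Given this SQLite schema, reply with a single SQL query and nothing else:\n" + schema},
        {"role": "user", "content": question},
    ],
)
sql = response.choices[0].message.content.strip()  # a real tool would validate this before executing

conn = sqlite3.connect("sales.db")       # hypothetical local database matching the schema
print(sql)
print(conn.execute(sql).fetchall())      # e.g. [(42,)], the answer the analyst asked for
```

Benchmarks like Spider score this kind of loop by comparing the generated query against a reference query written by a human expert.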

Shane Hastie: And in the private code base?

Jeremiah Stone: Much earlier days in the private code base world. There are tools like GitHub Copilot that are available for these purposes. Those tools perform well from a syntax perspective: they can generate good Java, they can generate good Python, they can do good documentation. What those tools today are incapable of doing is picking up the correct libraries that you might be using, or your coding conventions. A lot of the time, the way we define success within an engineering team is that we agree on conventions for white space, line length, commenting, these sorts of things, so that we can have transferable work between individuals on the team. That’s an area where, maybe for Python, we can get the PEP 8 style standards or something like that, but not a team-specific coding standard. And that’s where much more active research is in play now: taking and adapting these models to specific code bases. In fact, the performance of these models on generating code is still an incredibly active domain of research.

Most recently, the most exciting development has been the release of Code Llama and Code Llama Python from Meta; very interesting models there. But again, these are general-purpose tools that are not adapted to individual code bases, and I think that does limit, to a certain extent, their utility for engineering teams.

Shane Hastie: So where’s it going?

The emergence of local models trained on private code bases [18:50]

Jeremiah Stone: Based on our research, particularly looking at private enterprise software companies, or at enterprises that are themselves developing software, my intuition is that we are going to see the emergence of local models that are tuned on private code bases, and that this will probably be supported by integrated development environments in the near term, whether that’s JetBrains or Visual Studio, those types of environments, or by independent plugins that you can use to host these. I think we’re headed in that direction specifically because, back to our earlier point, extrapolating from things we already do and care about is a good idea, because they have momentum. And what do we care about? We care about creating good work product. We care about quality and the assessment of that quality. We care about continuity and transferability of work between team members. Those are all good work practices within a team, and support from large language models for those specific coding activities reinforces things we already care about.

So I’m pretty confident that we’ll be heading in that direction. But there is a force we won’t easily overcome, and I experience this myself in my own company. We’re basically AI maximalists; we are all in and we believe in this, and yet we took a very long time to evaluate and approve GitHub Copilot in our environment, because we really care about our intellectual property and our ability to keep it safe and secure and to represent our employees and shareholders. I think that is a warring force against using these tools in teams. If using them means sending my code to a large language model and basically exposing it to the world, then I don’t really feel like doing that. On the other hand, if it were a model that was fully contained within my environment, and I wasn’t teaching Amazon, Google or Microsoft anything about my code, then that might be okay. And that’s where I think we’re headed: these will be some of the first not only self-hosted but self-tuned models used in engineering practice, really, as we talked about earlier, to help improve the quality and consistency of code bases.

Shane Hastie: So as an engineer, I’m not at risk of being replaced by the machine; I’m going to be able to leverage the machine in my context.

Engineers working with locally tuned AI tools rather than being replaced by them [21:19]

Jeremiah Stone: I think that’s right. Andrew McAfee and Erik Brynjolfsson had a book a while ago, or at least an ebook, that said, I think, it’s not the race against the machine, it’s the race with the machine. And it’s the teams that figure out how to team with the machine that will have a high degree of success. I think we’re dealing with something like a Jevons paradox. There’s a lot of fear that these AI models will take away programmers’ jobs, but the flip side is: what is enough software? Is there any team that has too much capacity and an empty backlog? No, every team I’ve ever worked with has an infinite backlog of work and people who don’t even ask them for help anymore because they have no capacity.

So the more readily available this expertise is, the more demand there will be for it. And so I really don’t subscribe to the pessimistic, dire view that this is going to put programmers out of work. Quite the contrary, I think this is only going to create an even more insatiable demand for programmers in the workplace.

Shane Hastie: A topic that I’m personally very interested in, and I’m throwing this one at you without any notice, is the ethics of computing. You’ve mentioned that you’ve worked in high-risk environments, you’ve worked in life- and safety-critical environments. How do we keep things ethical?

Jeremiah Stone: With regard to artificial intelligence or in what context?

Shane Hastie: I’m going to say broader context, what’s happening in our industry from an ethics perspective?

Ethical challenges in the technology industry [22:46]

Jeremiah Stone: I hear a great deal about this topic as well, and there are so many dimensions to it. The fundamental definition of ethics is about respecting the rights of the individual while counterbalancing the needs of society, and the short-form definition is doing the right thing when nobody’s looking. That’s a good definition of what it means to be ethical. There are so many different directions to think about this in our industry, so let’s just take it one step at a time. One area where we’ve had challenging and questionable behavior is utilizing wage arbitrage to take advantage of technologists in lower-wage working environments. That definitely seems to be changing as the living standards and incomes of those individuals increase, and we’re seeing more and more fully autonomous teams in so-called offshore locations.

And I think that’s a broadly good trend, though there is certainly still some unsavoury behaviour: offshoring the product and firing the team in the high-cost location, or making the team in the lower-cost location work unreasonable hours or putting them under unreasonable pressure. I think that still happens, but at least from what I can see, it’s starting to tail off, simply because businesses tend to care about cost, and as cost equalizes we start to see ethical behaviors improve. So I think that’s on a good trend.

On the other hand, I am very concerned about the rise of the surveillance society and state. Whether that is somebody tracking your browsing habits for whatever reason, or your speech and your behaviors, I think we are in a very murky time for that type of ethical behavior currently.

And I’m quite concerned about that more broadly, whether we’re talking about nation states or individuals. We’ve certainly seen some very interesting behaviors, whether it’s Twitter being purchased and turned into X by a self-avowed free-speech maximalist, or other behaviors in terms of closing down the ability to communicate and speak freely. So there are definitely concerns in terms of individual freedoms and surveillance where I don’t see a clear point of view today. I think we should be concerned, and we should be active individually as leaders to push in the right direction and to demand good behavior. We could go on for many hours on this topic, Shane, but those are the two: are we treating the people doing the work decently? And by decently I mean giving them autonomy, giving them support, giving them the capability to contribute in a meaningful way.

I think that’s broadly improving. Even now we see countries that were previously not locations for offshoring coming online, whether it’s Thailand or the Philippines or other places, and they seem to be experiencing very rapid wage growth and leveling off. I don’t think progress in AI and ML will have a massive impact on that, and I hope I’m not proven wrong. I think that’s a good direction. On the other hand, I think the continued low cost of storing information, the continued sensorizing of both the built and digital world, and the purposes people have put that information to should concern us all, and we should be active and thoughtful about how we as individuals, corporations, and the governments that should serve us are dealing with that.

Shane Hastie: Jeremiah, thank you. Very interesting wide-ranging conversation, and I appreciate your willingness to go where it goes. If people want to continue the conversation, where do they find you?

Jeremiah Stone: Thank you. I have enjoyed the conversation, and yes, it was definitely wide-ranging, but enjoyable on my part as well. I can be found on Twitter, or X as I guess we’re meant to call it now, @jeremiahstone, or you can hit me up on LinkedIn; I’m more than happy to connect and collaborate. Those are the two easy ways to find me that I’m active on. I’m grateful for the opportunity to come on today, and I enjoy the conversation you continue to drive with the pod. Thank you.

Mentioned

Spider benchmark that is run by Yale University

Jeremiah’s research into Large Language Models Trained on Code

Jeremiah on Twitter/X and LinkedIn
