D-O-Fs galore
Find the shortest path you can
A slide will do fine

I've now spent about as many years out of academia as I did in it, and I can't help but reflect that the promises the robotic manipulation community was making a half-decade (if not longer) ago haven't been delivered on all that much, at least in terms of deliverables that have impact beyond the next round of paper submissions and grant proposals. Since Covid shook up the world, the news has been churning out story after story of the accelerating need for robotics and automation without really ever getting into just how pervasive automation already is, and how difficult it is to expand on our existing capabilities. I acknowledge that data manipulation and management capabilities have grown by leaps and bounds, despite the (imo) over-hyping of robotic process automation's library of glorified hotkeys. However, I believe that industry is still fairly limited in terms of how machines can physically interact with the world, especially in scenarios and tasks that are frequently changing. In my biased view, when the hardware that provides the coupling between the metaphysical and the physical hardly changes, a rising tide of thought complexity doesn't exactly raise all boats.

I will admit that to push my narrative, I'm conveniently ignoring the autonomous driving space, mostly because I don't think I can fully appreciate the technical challenges and solutions of the past decade, but also: if we take a look at autonomous vehicles' smaller sidewalk brethren, most solutions still rely on remote monitoring and human overrides ([1], [2], [3]), suggesting to me that even with well-structured environments and standards, autonomy outside of controlled facilities is still a ways away. That said, even in significantly more structured scenarios, closed off from the unpredictable nuances of the outside world, I'm surprised at how limited robotics applications can still be.

Hard Things that are Easy for Automation

I often say (with a sad and bemused resignation) that most robots in the physical world are just a collection of motors, and most of us robotics mechanical engineers are just trying to figure out how to package motors. Turns out, we're pretty good at controlling the motion of rotary actuators with a high degree of fidelity and precision. Apparently, machines are excellent at timing and repeatability. In isolation, that makes most robotic systems capable of reaching poses and configurations (for the most part) far more repeatably than anything a human could do. We only have to look as far as the CNC machines forming the backbone of manufacturing for a demonstration of the tolerances and motion resolution we can achieve. As we further develop machines to interact with the physical world and move away from the virtual (sorry, metaverse), one could ask: what are robots but motion systems and gantries?

Broadly speaking, I believe robotic systems are currently best at applications requiring precise timing and positioning. Trigger a bunch of relays with a very particular timing? No problem. Move a linear stage to within 0.050 mm of a prescribed point? Easy for even a low-end 3D printer that missed its funding goal on Kickstarter. When we look at applications where machines are excelling, we see structure and consistency. We can rely on an automated system to reliably complete the same motion a thousand times (and more). Not only that, we can scale that sort of performance to systems both much larger and smaller than a typical person. This level of performance isn't even necessarily reserved for physically-connected closed systems. Warehouse logistics setups like Kiva Systems (Amazon Robotics), Six River Systems, Locus Robotics, and many more have freed motion systems from networks of conveyors and tracks. Aethon and Savioke have deployed similar setups in environments alongside people, and more recently, Dusty Robotics has turned that simple two-wheel mobility system into a source of reference in dirty construction environments.

Arguably, precision in automated systems may not even be a necessary advantage when evaluating their worth. A non-optimal action repeated many times may exceed the benefits of a single, most-appropriate action. The original Roombas (and their cheap brethren today) did not have mapping capability, relying instead on pseudo-random reactive/Braitenberg behaviors to get the job done. Did that result in wasteful, redundant motions that seemed silly at times? Sure, but it did a good enough job, and sometimes good enough is all you need. Dumb brute force is still forceful.
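To make the contrast concrete, the entire "planning" layer of that kind of robot can be sketched in a few lines. This is a toy illustration of the bounce-and-wander idea with a made-up robot interface, not anyone's actual firmware:

```python
import random

def random_bounce_coverage(robot, duration_s=600.0, dt=0.1):
    """Map-free coverage: drive until you hit something, turn a random amount, repeat.

    `robot` is a hypothetical stand-in with bumper_pressed(), rotate(), and
    drive_forward() methods -- the point is how little logic is involved.
    """
    elapsed = 0.0
    while elapsed < duration_s:
        if robot.bumper_pressed():
            # Back away from the obstacle with a random heading change.
            robot.rotate(random.uniform(90.0, 270.0))
        else:
            robot.drive_forward(speed=0.3)
        elapsed += dt
```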

Easy Things that are Hard for Automation

One thing that's probably aggravating to any roboticists/technologists reading up until this point is that I've been conflating automation and robotics as if they're the same. I'll probably double down through the conclusion of this set of ramblings and play jump rope with the line between the two, but I would like to acknowledge that the complexity of a system's output alone doesn't make for a robot. I don't think even the most whimsical of Rube Goldberg machines would qualify as a robot. If it pleases the interwebs, I'd argue that as we move from simple machines to the robots as we know them in sci-fi, we're expanding the range of inputs and outputs, as well as the number of ways we map between those inputs and outputs. I'm personally a bit amused that 'robotic' in the vernacular implies something that's repetitive and monotonous (dumb, even), whereas in an academic context, we're expecting something that reacts to input in some non-trivial and transformative way.

Spatial repeatability in robotic motion systems is great for a closed and connected system, but real-world systems are fluid and constantly changing. Even when they're well-constrained, the definition of the system in question can change as the robotic element engages and disengages with various other elements in the system. In manipulation research, readers should be used to reading about 'hand-arm' vs. 'object-hand-arm' systems, and while the latter may be well defined and highly repeatable in isolation, the transition from the former to the latter remains messy, and I would argue that there is very little expectation of reliability in that process in any application to this day. Even in implementations with fixed geometries/locations known a priori, it's not atypical to see robotic systems leverage prehensile motions against hardstops to minimize the positional error of the 'object' part of the 'object-hand-arm' system. Other strategies include using a gravity-assisted drop-off station to further align the part, or detecting the grasped part's location after a pick attempt and adjusting the drop-off pose accordingly to compensate for the pick error. When the task requirements deviate from a particular nominal motion, the reliable, rigid, repetitive robot now becomes a liability. Robots can sometimes reproduce errors more repeatably than solutions.
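That last compensation strategy is simpler than it might sound: if a sensor can report where the part actually sits in the gripper, the commanded place pose just gets corrected by the inverse of that offset. A minimal planar sketch, where the specific poses and the sensing step are assumed for illustration rather than taken from any particular system:

```python
import numpy as np

def se2(x, y, theta):
    """Homogeneous transform for a planar pose (meters, radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0., 0., 1.]])

# Taught during setup: where the *part* should end up (world frame).
T_world_part_target = se2(0.80, -0.10, np.pi / 2)

# Measured after the pick: where the part actually sits relative to the
# gripper -- here a few millimeters and ~2 degrees of pick error.
T_gripper_part = se2(0.003, -0.001, np.deg2rad(2.0))

# Command the gripper so the part (not the gripper) lands on target:
# T_world_part = T_world_gripper @ T_gripper_part, so solve for the gripper pose.
T_world_gripper_cmd = T_world_part_target @ np.linalg.inv(T_gripper_part)

print(np.round(T_world_gripper_cmd, 4))
```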

The term 'unstructured' gets thrown around a lot in academic papers to suggest that we're making headway in adapting robotic systems to handle a larger range of scenarios, whether through innovative hardware design or more intelligent/complex control schemes. My entire thesis centers on the notion of 'mechanical intelligence' and designing mechanisms with parameters conducive to a varied set of tasks as opposed to a particular one. Truth be told, it's hard to ever say anything is truly, fully 'unstructured.' Academia thrives on finding that sweet spot where tractability, novelty, and simplified dimensionality meet, but a proclamation of prowess in accomplishing the 'unstructured' conveniently leaves out a more thorough discussion of the tradeoff between depth and breadth that was typically made.

Robots and automation always work better with a consistent ground truth and reference, ideally one that is as readily available as possible. Autonomous cars can assume GPS coverage, standardized street markers, and perhaps most importantly: a level, typically asphalt ground. Warehouse logistics robots rely on a grid of visual markers on the ground, and sometimes, the ceiling. Even markerless navigation solutions would need to presume a set of reliable landmarks (which arguably are still just different sorts of fiducial markers). How do you map an area that, from the robot's perspective, is frequently changing?

Perhaps traditional machining concepts illustrate this problem best: what tolerances can you hit with the best, most high-end machines when you can't guarantee your datum? Given a near-net part geometry (say, via some sort of additive manufacturing solution, if that clarifies the example) without reliable reference planes, how should the default cutting paths be adjusted to produce the target part? It's interesting to me that despite the added time and material waste, it's far simpler to machine the desired shape out of a larger, regular billet block than to try and account for a more irregular stock. The best reference is perhaps one the system creates itself, if you have the opportunity to arbitrarily construct and dictate your own closed system, though that's not to say we aren't all actively trying otherwise.

In a way, I suppose you could say it's interesting that robotics can struggle so hard with evaluating system error and correcting for it, given that's the premise for the most basic of control loops. However, it would appear to me that as system scope expands and becomes more complex, the technology required to close a reliable feedback loop becomes more demanding as well. Perhaps it's a problem waiting for a more novel strategy, though I'm more inclined to think that what we're lacking is hardware, and I'd further argue that in the meantime, our best bet is to look for ways of redefining and rescoping the problem through the addition of self-imposed structure.
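For reference, that "most basic of control loops" really is trivial when the system boundary is small and the feedback is clean. The hard part is everything this toy sketch assumes away: a perfect sensor, a well-behaved single-axis plant, a gain pulled out of thin air.

```python
def run_p_loop(setpoint, position=0.0, kp=2.0, dt=0.01, steps=500):
    """Proportional control of a 1-DOF axis toward a setpoint (illustrative only)."""
    for _ in range(steps):
        error = setpoint - position   # measure: compare target to feedback
        velocity = kp * error         # act: command proportional to error
        position += velocity * dt     # plant: integrate the commanded motion
    return position

print(run_p_loop(setpoint=0.05))  # settles at ~0.05 m
```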

Augmentation vs Replacement

Automation also gets talked up a lot as the inevitable path to human workforce replacement, but there's perhaps an overzealous focus on mimicking/copying human capabilities. That's a rather narrow and also unrealistic (or at least incomplete) viewpoint. Trains, planes, and automobiles were never seen as 1-to-1 replacements for human workers, so why should robots be? Referring again to some of the more popular semi-supervised autonomous delivery models, robotics can enable a single operator to handle several concurrent deliveries remotely, and that can be quite a productivity gain even if the human oversight can never be 100% removed. On the flip side, I could argue that the primary failing of chatbots was that they couldn't replace enough of the functionality of a human assistant, or that there wasn't a compatible flow between the autonomous component and the human agent.

One question I've had, however: is there any difference between augmentation and partially automating a portion of the target task? Is this just a matter of semantics? In the case of RPAs, truly mind-numbing and repetitive tasks can be reduced to a negligible number of keystrokes, and in the robotic delivery application, moving between safe waypoints no longer requires direct joysticking or human review. When a task can't be 100% automated, we run the risk of creating new work in transitioning between the human and autonomous workflows. If those workflows can be completely decoupled, they would be considered different tasks, not part of one and the same. Perhaps augmentation is just the scenario where that extra friction and workload is worth the production gains.

Is Your Uptime Up This Week?

Prior to my most current project, I did not have a proper appreciation for what it takes to keep even the most basic of systems running unaided and unmonitored. What do you mean this thing needs to keep running even after the weekly investor demo is over? Downtime means wasted potential and delays on expected ROI. Downtime means there were edge cases that caused unforeseen failures. Downtime means that perhaps not enough of the application scenarios were automated.

As with the viability of automation, I believe uptime and structure are positively correlated. Each new source of variability, no matter how small, represents an increased chance of failure until that particular variation is tested and validated. In my mind, the robotic system can seem more autonomous or 'capable' when the direct empirical validation of a particular scenario is sufficient for a significant range of 'similar' scenarios. I think our goal as roboticists is to figure out the tech necessary to extend that range as much as possible. I know that my current goal as an automation engineer is to thoroughly test as many scenarios as possible and make sure our system doesn't get too adventurous.

Failure is Just a Matter of Time

With great uptime come greater maintenance and upkeep requirements. Nothing lasts forever, and system performance may degrade far sooner than evidence of impending failure becomes apparent in the critical subcomponent. The difference between today's and yesterday's device may be minuscule, but it's not zero, and even with direct oversight/access, it's not always obvious when, say, a bearing begins to seize, an elastic element becomes plastically deformed, or increased stiction in some transmission changes the expected force profile at the output.

I've actually struggled a great deal tracking performance losses over time for the system as a whole. The core robotic arm is typically rock solid, the end effectors remain responsive, the sensors trigger as they should, and the software continues to do exactly what it was written to do, but once we include frequent interfacing with external environments and objects, I lose confidence in the system's capability, and I'm still crossing my fingers when I turn the damn thing on. Most discouragingly, the feedback I need to debug system errors tends to be relatively infrequent, discrete, and agonizingly boolean (success or fail). My problem's further complicated when I can't guarantee the consistency of the inputs or the reporting of the outputs. It's been critical for me to have an extensive library of unit tests and maintenance procedures, not just for system evaluation prior to deployment, but also to attempt preventative care. The problem is similar to rehab in professional sports: the joints and motion may seem fine in individual workouts and on machines, but when it comes to game time, it's always hard to say whether you can reach peak performance.
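Since that boolean outcome is often the only signal available, about the best purely-software backstop is to watch it over a window and flag slow drift before it becomes a hard stop. A hedged sketch of the idea; the window size and threshold are made up, not tuned values from my system:

```python
from collections import deque

class SuccessRateMonitor:
    """Track boolean task outcomes over a sliding window and flag slow degradation."""

    def __init__(self, window=200, alert_threshold=0.95):
        self.outcomes = deque(maxlen=window)
        self.alert_threshold = alert_threshold

    def record(self, success):
        """Log one attempt; return True when the windowed rate drops below threshold."""
        self.outcomes.append(bool(success))
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough history to judge yet
        rate = sum(self.outcomes) / len(self.outcomes)
        return rate < self.alert_threshold

monitor = SuccessRateMonitor()
# Feed it each pick/place result as it happens, e.g.:
# if monitor.record(task_succeeded): flag_for_maintenance()   # hypothetical hooks
```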

One-Trick Ponies Can Have a Lot of Downtime

Despite recent companies like Rapid Robotics, Rios Intelligent Machines, and your more traditional integrators making it easier than ever to set up a robotic workcell for your task, the cell still typically winds up being run like a particular tool for a singular task. There's not much variability in the motion, and certainly not in the type of work. Once the system is handling all of the throughput required of it, it's generally unsuitable for any other task without effectively being torn down and rebuilt. I guess there's a question of whether it's truly downtime if the machine accomplishes all that it was designed to do, but I think it's fair to say that it's not being fully utilized. Arguably, considering the 6-dof industrial arm as a multipurpose, flexible workhorse may only be true up until integration is complete. Is a spork really more functional than a fork if it's only ever used as a fork?

That'll Cost You at Least an Arm, but It Doesn't Have To

Every video or sizzle reel of automation in factories focuses on the rows of high-dof robotic arms (usually in nice Fanuc-yellow, Kuka-orange, or UR-grey). That always seemed a bit off to me, as I'd argue that up until the past decade or so, the arm has been performing the same motion profile, and often a fairly rudimentary one. Given the compromised workspace resolution, payload capacity, and stiffness (thanks, cantilevers), why insist on high-dof serial-chain manipulators when a simpler gantry could be more beneficial? It feels like a scenario where the user elected for a Swiss Army knife to cut down a field of weeds.

Could the reason be that hardware is still just hard, and a robotic arm provides not only some amount of future-proofing for when the task constraints change, but also a standardized motion platform with a higher degree of reliability than most custom gantries? System deployment complexity potentially shifts from a more tedious hardware redesign to updating a cartesian trajectory. As modular, customizable gantry subsystems become more accessible via efforts from companies like Vention, I wonder if we may see fewer, not more, robotic workcells. A quick comparison of the humble coffee vending machines of the last century and the 'intelligent', more advanced kiosks suggests to me that maybe we've gone a bit backwards in some cases.
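That shift is worth making concrete: once the motion lives in data rather than in fixtures, re-tasking the cell is an edit instead of a rebuild. A rough sketch, with a made-up motion call standing in for whatever API the arm vendor actually provides:

```python
# Waypoints live in data, so changing the task means changing this list,
# not redesigning a gantry or fixture. Coordinates are illustrative (meters).
PICK_AND_PLACE_V1 = [
    (0.40, 0.10, 0.30),   # hover over pick
    (0.40, 0.10, 0.05),   # descend to pick
    (0.40, 0.10, 0.30),   # retract
    (0.70, -0.20, 0.30),  # hover over place
    (0.70, -0.20, 0.08),  # descend to place
]

def run_trajectory(arm, waypoints):
    """Step through cartesian waypoints; `arm.move_linear` is a hypothetical stand-in."""
    for xyz in waypoints:
        arm.move_linear(xyz)
```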

Adaptability/Robotics is When Setup Time/Cost = 0

All of this rambling leads me to wonder: is there any functional difference between a general purpose robot with a multitude of capabilities and a readily available set of single-purpose mechanisms purpose-built for each of those capabilities? If not, then how do we consider mechanisms that are increasingly easier and faster to retrofit? Ease of adaptability for a given system has mostly been explored through software, leaving morphological changes for the initial setup phase as opposed to part of the task execution. There's a good number of justifications for that strategy, but imagine other methods of hardware combinatorics as a means of extending capability:

  • Can elements from separate tasks be leveraged together for a third task when not ordinarily in use?
  • Can multiple motion systems interact to produce motions or force profiles neither could on its own?
  • Can more complex affordances and assistive features be automatically loaded/assembled instead of being simply swapped in/out (as with toolchangers)?
  • Is there a reconfiguration procedure (whether via automation or manual intervention) fast and easy enough that it would be compelling to include regularly in an automation system?
  • Can redundant degrees of freedom, and consequently different body/system poses for a given target tool pose, give more functionality than an exactly constrained/actuated system with extra actuators dedicated explicitly to tooling?

I'm certainly biased by my preference for hardware, but it feels like the typical robotics engineer is looking at the 6-dof arm as some sort of magic glue to tie different tooling together. It's a nice multitool that could be either excessive or limiting, depending on the application. I think there's quite a bit of opportunity in improving ease of hardware reconfiguration that can bypass some of the more extreme frustrations of automation today. Why force a square peg, as squishy as it may be, into a round hole when you have the time and opportunity to whittle the peg to the shape you want? You can be as good at making the tool as you are at wielding it.