1 Introduction

Machine learning (ML) is increasingly used in safety-critical applications, thereby creating an acute need for techniques to gain higher assurance in ML-based systems (Russell et al. 2015; Seshia et al. 2016; Amodei et al. 2016). ML has proved particularly effective at the difficult perceptual tasks (e.g., vision) arising in cyber-physical systems like autonomous vehicles which operate in heterogeneous, complex physical environments. Thus, there is a pressing need to tackle several important problems in the design of such ML-based cyber-physical systems, including:

  • training the system to be robust, correctly responding to events that happen only rarely;

  • testing the system under a variety of conditions, especially unusual ones; and

  • debugging the system to understand the root cause of a failure and eliminate it.

The traditional ML approach to these problems is to gather more data from the environment, retraining the system until its performance is adequate. The major difficulty here is that collecting real-world data can be slow and expensive, since it must be preprocessed and correctly labeled before use. Furthermore, it may be difficult or impossible to collect data for corner cases that are rare and even dangerous but nonetheless necessary to train and test against: for example, a car accident. As a result, recent work has investigated training and testing systems with synthetically generated data, which can be produced in bulk with correct labels, giving the designer full control over the distribution of the data (Jaderberg et al. 2014; Gupta et al. 2016; Tobin et al. 2017; Johnson-Roberson et al. 2017).

A challenge to the use of synthetic data is that it can be highly non-trivial to generate meaningful data, since this usually requires modeling complex environments (Seshia et al. 2016). Suppose we wanted to train a neural network on images of cars on a road. If we simply sampled uniformly at random from all possible configurations of, say, 12 cars, we would get data that was at best unrealistic, with cars facing sideways or backward, and at worst physically impossible, with cars intersecting each other. Instead, we want scenes like those in Fig. 1, where the cars are laid out in a consistent and realistic way. Furthermore, we may want scenes that are not only realistic but represent particular scenarios of interest for training or testing, e.g., parked cars, cars passing across the field of view, or bumper-to-bumper traffic as in Fig. 1. In general, we need a way to guide data generation toward scenarios that make sense for our application.

Fig. 1: Three scenes generated from a single \(\sim 20\)-line Scenic program representing bumper-to-bumper traffic

We argue that probabilistic programming languages (PPLs) (Gordon et al. 2014) provide a natural solution to this problem. Using a PPL, the designer of a system can construct distributions representing different input regimes of interest, and sample from these distributions to obtain concrete inputs for training and testing. More generally, the designer can model the system’s environment, with the program becoming a specification of the distribution of environments under which the system is expected to operate correctly with high probability. Such environment models are essential for any formal analysis: in particular, composing the system with the model, we obtain a closed program about which we could potentially prove properties to establish the correctness of the system.

Fig. 2: Spectrum of scenarios, from general to specific

In this paper, we focus on designing and analyzing ML-based cyber-physical systems. We refer to the environment of such a system at any point in time as a scene, a configuration of objects in space (including dynamic agents, such as vehicles) along with their features. We develop a domain-specific scenario description language, Scenic, to specify such environments. Scenic is a probabilistic programming language, and a Scenic scenario defines a distribution over both scenes and the behaviors of the dynamic agents in them over time. As we will see, the syntax of the language is designed to simplify the task of writing complex scenarios, and to enable the use of specialized sampling techniques. In particular, Scenic allows the user to both construct objects in a straightforward imperative style and impose hard and soft constraints declaratively. It also provides readable, concise syntax for spatial and temporal relationships: constructs for common geometric relationships that would otherwise require complex non-linear expressions and constraints, as well as temporal constructs such as parallel and sequential composition and interrupts for building complex dynamic behaviors in a modular way. In addition, Scenic provides a notion of classes allowing properties of objects to be given default values depending on other properties: for example, we can define a Car so that by default it faces in the direction of the road at its position. More broadly, Scenic uses a novel approach to object construction which factors the process into syntactically-independent specifiers which can be combined in arbitrary ways, mirroring the flexibility of natural language. Finally, Scenic provides constructs to generalize simple scenarios by adding noise or by composing multiple scenarios together.

The variety of constructs in Scenic makes it possible to model scenarios anywhere on a spectrum from concrete scenes (i.e. individual test cases) to extremely broad classes of abstract scenarios (see Fig. 2). A scenario can be reached by moving along the spectrum from either end: the top-down approach is to progressively constrain a very general scenario, while the bottom-up approach is to generalize from a concrete example (such as a known failure case), for example by adding random noise. Perhaps most usefully, one can write a scenario in the middle, which is far more general than a single scene with added noise but has much more structure than a completely random scene: for example, the traffic scenario depicted in Fig. 1. We will illustrate all three ways of developing a scenario, which as we will see are useful for different training, testing, and debugging tasks.

Generating concrete scenarios from a Scenic program requires sampling from the probability distribution it implicitly defines. This task is closely related to the inference problem for imperative PPLs with observations (Gordon et al. 2014). While Scenic could be implemented as a library on top of such a language, we found that clarity and concision could be significantly improved with new syntax (specifiers and interrupts in particular) that would be difficult to implement as a library. Furthermore, while Scenic could be translated into existing PPLs, using a new language allows us to impose restrictions that enable domain-specific sampling techniques which are not applicable to general-purpose PPLs. In particular, we develop algorithms which take advantage of the particular structure of distributions arising from Scenic programs to dramatically prune the sample space. We refer to the random generation of concrete scenarios as scenario improvisation, as it is inspired by and closely related to a class of problems known as control improvisation (Fremont et al. 2015; Fremont 2019).

We also integrate Scenic as the environment modeling language for VerifAI, a tool for the formal design and analysis of AI-based systems (Dreossi et al. 2019). VerifAI allows writing system-level specifications in Metric Temporal Logic (Koymans 1990) or as objective functions, and performing falsification, running simulations and monitoring for violations of the specifications. VerifAI provides several search techniques, including active samplers that use feedback from earlier simulations to try to drive the system towards violations. To support these active samplers, each sampled concrete scenario and the corresponding performance of the system with respect to its given specifications are logged in a table. This data can be analyzed (by clustering, principal component analysis, etc.) to determine promising parts of the environment space; an active sampler can intelligently select an unexplored concrete scenario that is likely to induce a violation of a specification. We make these techniques available from Scenic using syntax to define external parameters which are sampled by VerifAI or another external tool. Such parameters need not have a fixed distribution of values: for instance, we can define a prior distribution, but then use cross-entropy optimization (Rubinstein and Kroese 2004) to drive the distribution towards one that is concentrated on values that tend to lead to system failures (Fremont et al. 2020).

We demonstrate the utility of Scenic in training, testing, and debugging ML-based cyber-physical systems, both at the ML component level and at the full system level. Our first case study is on SqueezeDet (Wu et al. 2017), a convolutional neural network for object detection in autonomous cars. For this task, it has been shown (Johnson-Roberson et al. 2017) that good performance on real images can be achieved with networks trained purely on synthetic images from the video game Grand Theft Auto V [GTAV (Rockstar Games 2015)]. We implemented a sampler for Scenic scenarios, using it to generate scenes which were rendered into images by GTAV. Our experiments demonstrate using Scenic to:

  • evaluate the accuracy of the ML model under particular conditions, e.g. in good or bad weather,

  • improve performance in corner cases by emphasizing them during training: we use Scenic to both identify a deficiency in a state-of-the-art car detection data set (Johnson-Roberson et al. 2017) and generate a new training set of equal size but yielding significantly better performance, and

  • debug a known failure case by generalizing it in many directions, exploring sensitivity to different features and developing a more general scenario for retraining: we use Scenic to find an image the network misclassifies, discover the root cause, and fix the bug, in the process improving the network’s performance on its original test set (again, without increasing training set size).

These experiments show that Scenic can be a very useful tool for understanding and improving ML-based perception systems.

Fig. 3: Various domains where we have applied Scenic: reinforcement learning agents for soccer (Azad et al. 2021), ML-based aircraft navigation (Fremont et al. 2020), and autonomous vehicle testing in the real world (Fremont et al. 2020b)

While this case study is performed in the domain of visual perception for autonomous driving, and uses one particular simulator (GTAV), we stress that Scenic is not specific to either. Several other applications where we have successfully used Scenic are shown in Fig. 3; see the cited papers for details. In this paper, we include two additional examples: in Sect. 3 we illustrate a different domain, namely robotic motion planning [using the Webots simulator (Michel 2004)], and in Sect. 7.2.2 we use Scenic and VerifAI to falsify an autonomous agent in the CARLA driving simulator (Dosovitskiy et al. 2017). The latter experiment demonstrates Scenic’s usefulness applied not only to ML-based perception components in isolation but to entire closed-loop cyber-physical systems. In fact, since the conference version of this paper we have successfully applied Scenic in two industrial case studies on large ML-based systems (Fremont et al. 2020, 2020b): an aircraft navigation system from Boeing [tested in the X-Plane flight simulator (Laminar Research 2019)] and the Apollo autonomous driving platform (Baidu 2020) [tested in the LGSVL driving simulator (Rong et al. 2020) and on an actual test track]. Generally, Scenic can produce data of any desired type (e.g. RGB images, LIDAR point clouds, or trajectories from dynamical simulations) by interfacing it to an appropriate simulator. This requires only two steps: (1) writing a small Scenic library defining the types of objects supported by the simulator, as well as the geometry of the workspace; (2) writing an interface layer converting the configurations output by Scenic into the simulator’s input format (and, for dynamic scenarios, transferring simulator state back into Scenic). While the current version of Scenic is primarily concerned with geometry, leaving the details of rendering up to the simulator, the language allows putting distributions on any parameters the simulator exposes: for example, in GTAV the meshes of the various car models are fixed but we can control their overall color. We have also used Scenic to specify distributions over parameters of system dynamics, such as mass.

In summary, the main contributions of this work are:

  • Scenic, a domain-specific probabilistic programming language for describing scenarios: distributions over spatio-temporal configurations of physical objects and agents;

  • a methodology for using PPLs to design and analyze cyber-physical systems, especially those based on ML;

  • domain-specific algorithms for sampling from the distribution defined by a Scenic program;

  • a case study using Scenic to analyze and improve the accuracy of a practical deep neural network used for perception in an autonomous driving context beyond what is achieved by state-of-the-art synthetic data generation methods.

The paper is structured as follows: we begin with an overview of our approach in Sect. 2. Section 3 gives examples highlighting the major features of Scenic for specifying spatial relationships and motivating various choices in its design. We continue in Sect. 4 with a discussion of Scenic’s more advanced features for temporal modeling and scenario composition. In Sect. 5 we describe the syntax of the Scenic language in detail, and in Sect. 6 we discuss its formal semantics and our sampling algorithms. Section 7 describes the setup and results of our car detection case study and other experiments. Finally, we discuss related work in Sect. 8 before concluding in Sect. 9 with a summary and directions for future work.

An early version of this paper appeared as Fremont et al. (2018), extended and published as Fremont et al. (2019). This paper further extends Fremont et al. (2019) by generalizing Scenic to dynamic scenarios (including new spatiotemporal pruning techniques), adding constructs for composing scenarios, and integrating Scenic within the broader VerifAI toolkit. For the Appendices and our implementation code, see Fremont et al. (2020a).

2 Using PPLs to design and analyze ML-based cyber-physical systems

We propose a methodology for training, testing, and debugging ML-based cyber-physical systems using probabilistic programming languages. The core idea is to use PPLs to formalize general operation scenarios, then sample from these distributions to generate concrete environment configurations. Putting these configurations into a simulator, we obtain images or other sensor data which can be used to test and train the system. The general procedure is outlined in Fig. 4. For a demonstration of this paradigm on an industrial system, proceeding from falsification through failure analysis, retraining, and validation, see Fremont et al. (2020). Note that the training/testing datasets need not be purely synthetic: we can generate data to supplement existing real-world data (possibly mitigating a deficiency in the latter, while avoiding overfitting). Furthermore, even for models trained purely on real data, synthetic data can still be useful for testing and debugging, as we will see below. Now we discuss the three design problems from the Introduction in more detail.

Fig. 4: Tool flow using Scenic to train, test, and debug a cyber-physical system

Testing and falsification. The most straightforward problem is that of assessing system performance under different conditions. We can simply write scenarios capturing each condition, generate a test set from each one, and evaluate the performance of the system on these. Note that conditions which occur rarely in the real world present no additional problems: as long as the PPL we use can encode the condition, we can generate as many instances as desired. If we do not have particular conditions in mind, we can write a very general scenario describing the expected operation regime of the system [e.g., the “Operational Design Domain” (ODD) of an autonomous vehicle (Thorn et al. 2018)] and perform falsification, looking for violations of the system’s specification. We can perform such analyses at the level of individual components or of the system as a whole: in Sect. 7.2.1 we test a car-detecting neural network’s sensitivity to weather, while in Sect. 7.2.2, we use the VerifAI toolkit (Dreossi et al. 2019) to falsify a closed-loop AV system, modeling a traffic scenario in Scenic and stating a safety specification for the AV in temporal logic.

Training on rare events. Extending the previous application, we can use this procedure to help ensure the system performs adequately even in unusual circumstances or particularly difficult cases. Writing a scenario capturing these rare events, we can generate instances of them to augment or replace part of the original training set. Emphasizing these instances in the training set can improve the system’s performance in the hard case without impacting performance in the typical case. In Sect. 7.3 we will demonstrate this for car detection, where a hard case is when one car partially overlaps another in the image. We wrote a Scenic program to generate a set of these overlapping images. Training the car-detection network on a state-of-the-art synthetic dataset obtained by randomly driving around inside the simulated world of GTAV and capturing images periodically (Johnson-Roberson et al. 2017), we find its performance is significantly worse on the overlapping images. However, if we keep the training set size fixed but increase the proportion of overlapping images, performance on such images dramatically improves without harming performance on the original generic dataset.

Debugging failures. Finally, we can use the same procedure to help understand and fix bugs in the system. If we find an environment configuration where the system fails, we can write a scenario reproducing that particular configuration. Having the configuration encoded as a program then makes it possible to explore the neighborhood around it in a variety of different directions, leaving some aspects of the scene fixed while varying others. This can give insight into which features of the scene are relevant to the failure, and eventually identify the root cause. The root cause can then itself be encoded into a scenario which generalizes the original failure, allowing retraining without overfitting to the particular counterexample. We will demonstrate this approach in Sect. 7.4, starting from a single misclassification, identifying a general deficiency in the training set, replacing part of the training data to fix the gap, and ultimately achieving higher performance on the original test set.

For all of these applications we need a PPL which can encode a wide range of general and specific environment scenarios. In the next section, we describe the design of a language suited to this purpose.

3 The basic Scenic language

We use Scenic scenarios from our autonomous car case study to motivate and illustrate the main features of the language, focusing on features that make Scenic particularly well-suited for the domain of specifying scenarios for cyber-physical systems. We begin by describing how Scenic can define spatial relationships between objects to model scenarios like “a badly-parked car”; in Sect. 4, we will cover Scenic’s more advanced constructs for temporal dynamics and scenario composition.

Classes, Objects, Geometry, and Distributions. To start, suppose we want scenes of one car viewed from another on the road. We can simply write:

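A sketch of the listing (the module path of the GTAV world model is an assumption):

    model scenic.simulators.gta.model
    ego = Car
    Car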

First, we import Scenic’s world model for the GTAV simulator: a Scenic library containing everything specific to our case study, including the Car class and information about the locations of roads (from now on we suppress this line). Only general geometric concepts are built into Scenic.

The second line creates a Car and assigns it to the special variable ego specifying the ego object which is the reference point for the scenario. In particular, rendered images from the scenario are from the perspective of the ego object (it is a syntax error to leave ego undefined). Finally, the third line creates an additional Car. Note that we have not specified the position or any other properties of the two cars: this means they are inherited from the default values defined in the class Car. Object-orientation is valuable in Scenic since it provides a natural organizational principle for scenarios involving different types of physical objects. It also improves compositionality, since we can define a generic Car model in a library like the GTAV world model and use it in different scenarios. Our definition of Car begins as follows (slightly simplified):

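A sketch of the start of the class definition, using the default-value syntax detailed in Sect. 5:

    class Car:
        position: Point on road
        heading: roadDirection at self.position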

Here, position and heading are properties of a Car object. These properties may be given distributions and constraints, both of which model realistic initial states of the object. road is a region (one of Scenic’s primitive types) defined in the GTAV world model to specify which points in the workspace are on a road. Similarly, roadDirection is a vector field specifying the prevailing traffic direction at such points. The operator \(F \texttt { at } X\) simply gets the direction of the field F at point X, so the default value for a car’s heading is the road direction at its position. The default position, in turn, is a Point on road (we will explain this syntax shortly), which means a uniformly random point on the road.

The ability to make random choices like this is a key aspect of Scenic. Scenic’s probabilistic nature allows it to model real-world stochasticity, for example encoding a distribution for the distance between two cars learned from data. This in turn is essential for our application of PPLs to training perception systems: using randomness, a PPL can generate training data matching the distribution the system will be used under. Scenic provides several basic distributions (and allows more to be defined). For example, we can write

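A sketch of such a line, using notation explained just below:

    Car offset by 0 @ Range(20, 40)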

This creates a car that is 20–40 m ahead of the camera. The notation Range(\(\textit{X}\), \(\textit{Y}\)) creates a uniform distribution over the given continuous range, and \(\textit{X}\) @ \(\textit{Y}\) creates a pair, interpreted here as a vector given by its xy coordinates.

Local Coordinate Systems. Using offset by as above overrides the default position of the Car, leaving the default orientation (along the road) unchanged. Suppose for greater realism we don’t want to require the car to be exactly aligned with the road, but to be within say \(5^\circ \). We could try:

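A sketch of this (not quite correct) attempt:

    Car offset by 0 @ Range(20, 40), facing Range(-5, 5) deg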

where facing overrides the default heading of the Car, but this is not quite what we want, since it sets the orientation of the Car in global coordinates (i.e. within \(5^\circ \) of North). Instead we can use Scenic’s general operator \(\textit{X}\) relative to \(\textit{Y}\), which can interpret vectors and headings in a variety of local coordinate systems:

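A sketch of the corrected listing:

    Car offset by 0 @ Range(20, 40), facing Range(-5, 5) deg relative to roadDirection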

If we want the heading to be relative to the ego car’s orientation, we simply write Range(-5, 5) deg relative to ego.heading instead.

Notice that since roadDirection is a vector field, it defines a coordinate system at each point, and an expression like Range(-5, 5) deg relative to roadDirection does not define a unique heading. The example above works because Scenic knows that this expression depends on a reference position, and automatically uses the position of the Car being defined. This is a feature of Scenic’s system of specifiers, which we explain next.

Readable, Flexible Specifiers. The syntax at \(\textit{X}\) and facing \(\textit{Y}\) for specifying positions and orientations may seem unusual compared to typical constructors in object-oriented languages. There are two reasons why Scenic uses this kind of syntax: first, readability. The second is more subtle and based on the fact that in natural language there are many ways to specify positions and other properties, some of which interact with each other. Consider the following ways one might describe the location of an object:

  1. “is at position X” (absolute position);

  2. “is just left of position X” (position based on orientation);

  3. “is 3 m left of the taxi” (a local coordinate system);

  4. “is one lane left of the taxi” (another local coordinate system);

  5. “appears to be 10 m behind the taxi” (relative to the line of sight);

  6. “is 10 m along the road from the taxi” (following a curved vector field).

These are all fundamentally different from each other: e.g., (3) and (4) differ if the taxi is not parallel to the lane.

Furthermore, these specifications combine other properties of the object in different ways: to place the object “just left of” a position, we must first know the object’s width; whereas if we wanted to face the object “towards” a location, we must instead know its position. There can be chains of such dependencies: “the car is 0.5 m left of the curb” means that the right edge of the car is 0.5 m away from the curb, not the car’s position, which is its center. So the car’s position depends on its width, which in turn depends on its model. In a typical object-oriented language, this might be handled by computing values for position and other properties and passing them to a constructor:

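A sketch of such constructor-style code (Python-like pseudocode; the offsetLeft helper and the list of car models are assumptions):

    model = Uniform(*carModels)
    Car(position=spot.offsetLeft(0.5 + model.width / 2), model=model)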

Notice how model must be used twice, because it determines both the model of the car and (indirectly) its position. This is inelegant and breaks encapsulation because the default model distribution must be used outside of the constructor. The latter problem could be fixed by having a specialized constructor along these lines:

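A sketch of such a specialized constructor (a hypothetical API):

    CarLeftOfBy(spot, 0.5)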

But such constructors would proliferate, since we would need to handle all possible combinations of ways to specify different properties (e.g. do we want to require a specific model? Are we overriding the width provided by the model for this specific car?). Instead of having a multitude of such monolithic constructors, Scenic factors the definition of objects into potentially-interacting but syntactically-independent parts:

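A sketch of the listing (the model name BUS is an assumption):

    Car left of spot by 0.5, with model BUS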

Here left of \(\textit{X}\) by \(\textit{D}\) and with model \(\textit{M}\) are specifiers, which are unordered and together specify the properties of the car. Scenic works out the dependencies between properties (position is provided by left of \(\textit{X}\) by \(\textit{D}\), which depends on width, whose default value depends on model) and evaluates them in the correct order. To use the default model distribution we would simply omit with model \(\textit{M}\); keeping it affects the position appropriately without having to specify the model more than once.

Specifying Multiple Properties Together. Recall that we defined the default position for a Car to be a Point on road: this is an example of another specifier, on \(\textit{region}\), which specifies position to be a uniformly random point in the given region. This specifier illustrates another feature of Scenic, namely that specifiers can specify multiple properties simultaneously. Consider the following scenario, which creates a parked car given a region curb defined in the GTAV world model:

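A sketch of the two-line scenario:

    spot = OrientedPoint on visible curb
    Car left of spot by 0.25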

The operator visible \(\textit{region}\) returns the part of the region that is visible from the ego object. The specifier on visible curb will then set position to be a uniformly random visible point on the curb. We create spot as an OrientedPoint, which is a built-in class that defines a local coordinate system by having both a position and a heading. The on \(\textit{region}\) specifier can also specify heading if the region has a preferred orientation (a vector field) associated with it: in our example, curb is oriented by roadDirection. So spot is, in fact, a uniformly random visible point on the curb, oriented along the road. That orientation then causes the car to be placed 0.25 m left of spot in spot’s local coordinate system, i.e. away from the curb, as desired.

In fact, Scenic makes it easy to elaborate the scenario without needing to alter the code above. Most simply, we could specify a particular model or non-default distribution over models by just adding with model \(\textit{M}\) to the definition of the Car. More interestingly, we could produce a scenario for badly-parked cars by adding two lines:

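A sketch of the extended scenario:

    spot = OrientedPoint on visible curb
    badAngle = Uniform(1.0, -1.0) * Range(10, 20) deg
    Car left of spot by 0.25, facing badAngle relative to roadDirection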

This will yield cars parked 10\(^\circ \)–20\(^\circ \) off from the direction of the curb, as seen in Fig. 5. This illustrates how specifiers greatly enhance Scenic’s flexibility and modularity.

Fig. 5: A scene of a badly-parked car

Declarative Specifications of Hard and Soft Constraints. Notice that in the scenarios above we never explicitly ensured that the two cars will not intersect each other. Despite this, Scenic will never generate such scenes. This is because Scenic enforces several default requirements: all objects must be contained in the workspace, must not intersect each other, and must be visible from the ego object (Footnote 1). Scenic also allows the user to define custom requirements checking arbitrary conditions built from various geometric predicates. For example, the following scenario produces a car headed roughly towards us, while still facing the nominal road direction:

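A sketch of the scenario:

    ego = Car
    car2 = Car offset by Range(-10, 10) @ Range(20, 40), with viewAngle 30 deg
    require car2 can see ego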

Here we have used the \(\textit{X}\) can see \(\textit{Y}\) predicate, which in this case is checking that the ego car is inside the \(30^\circ \) view cone of the second car. If we only need this constraint to hold part of the time, we can use a soft requirement specifying the minimum probability with which it must hold:

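A sketch of the soft version, requiring the condition with probability at least 0.9:

    require[0.9] car2 can see ego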

Hard requirements, called “observations” in other PPLs (see, e.g., Gordon et al. (2014)), are very convenient in our setting because they make it easy to restrict attention to particular cases of interest. They also improve encapsulation, since we can restrict an existing scenario without altering it (we can simply import it in a new Scenic program that includes additional require statements). Finally, soft requirements are useful in ensuring adequate representation of a particular condition when generating a training set: for example, we could require that at least 90% of the images have a car driving on the right side of the road.

Mutations. Scenic provides a simple mutation system that improves compositionality by providing a mechanism to add variety to a scenario without changing its code. This is useful, for example, if we have a scenario encoding a single concrete scene obtained from real-world data and want to quickly generate variations. For instance:

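A sketch (the concrete position and heading are placeholders standing in for values from real-world data):

    ego = Car
    taxi = Car at 120 @ 300, facing 37 deg
    mutate taxi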

This will add Gaussian noise to the position and heading of the taxi, while still enforcing all built-in and custom requirements. The standard deviation of the noise can be scaled by writing, for example, mutate taxi by 2 (which adds twice as much noise), and we will see later that it can be controlled separately for position and heading.

Multiple Domains and Simulators. We conclude this section by illustrating a second application domain, namely testing motion planning algorithms, and also Scenic’s ability to work with different simulators. A robot like a Mars rover able to climb over rocks can have very complex dynamics, with the feasibility of a motion plan depending on exact details of the robot’s hardware and the geometry of the terrain. We can use Scenic to write a scenario generating challenging cases for a planner to solve. Figure 6 shows a scene, visualized using an interface we wrote between Scenic and the Webots robotics simulator (Michel 2004), with a bottleneck between the robot and its goal that forces the planner to consider climbing over a rock.

Fig. 6: Webots scene of a Mars rover in a debris field with a bottleneck

Even within a single application domain, such as autonomous driving, Scenic enables writing cross-platform scenarios that will work without change in multiple simulators. This is made possible by what we call abstract application domains: Scenic world models which define object classes and other world information like our GTAV world model, but which are abstract, simulator-agnostic protocols that can be implemented by models for particular simulators. For example, Scenic includes an abstract domain for autonomous driving, scenic.domains.driving, which loads road networks from standard formats, providing a uniform API for referring to lanes, maneuvers, and other aspects of road geometry. The driving domain also provides generic Car and Pedestrian classes, complete with implementations of common dynamic behaviors (covered in the next section) like lane following. These make it straightforward to implement complex driving scenarios, which are then guaranteed to work in any simulator supporting the driving domain. Figure 7 illustrates this, showing the exact same Scenic code being used to generate scenarios in both the CARLA (Dosovitskiy et al. 2017) and LGSVL (Rong et al. 2020) simulators.

Fig. 7: Scenes sampled from the same Scenic program in CARLA and LGSVL

All of the examples we have seen above illustrate the versatility of Scenic in modeling a wide range of interesting scenarios. Complete Scenic code for the bumper-to-bumper scenario of Fig. 1, the Mars rover scenario of Fig. 6, as well as other scenarios used as examples in this section or in our experiments, along with images of generated scenes, can be found in the Appendix (Fremont et al. 2020a).

4 Dynamic and compositional scenarios

In Sect. 3 we saw the basic constructs Scenic provides for defining objects and their spatial relationships. These constructs suffice for expressing static scenarios like “a badly-parked car”, where Scenic need only define a configuration of objects at one point in time. However, for dynamic scenarios like “a badly-parked car, which pulls into the road as you approach”, we need ways to express temporal properties of objects. In this section, we outline Scenic’s support for dynamic scenarios, as well as for composing multiple scenarios together to produce more complex ones.

4.1 Dynamic scenarios

Agents, Actions, and Behaviors. We call Scenic objects which take actions over time dynamic agents, or simply agents. We can still use all of the syntax described above to define the initial positions, orientations, etc. of such objects. In addition, we specify their dynamic behavior using a built-in property called behavior. Using a behavior defined in Scenic’s driving library, we can write for example:

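A sketch using the FollowLaneBehavior provided by the driving library:

    Car on road, with behavior FollowLaneBehavior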

A behavior defines a sequence of actions for the agent to take, which need not be fixed but can be probabilistic and depend on the state of the agent or other objects. In Scenic, an action is an instantaneous operation executed by an agent, like setting the steering angle of a car or turning on its headlights. Most actions are specific to particular application domains, and so different sets of actions are provided by different simulator interfaces. For example, the Scenic driving domain defines a SetThrottleAction for cars.

To define a behavior, we write a function which runs over the course of the scenario, periodically issuing actions. Scenic uses a discrete notion of time, so at each time step the function specifies zero or more actions for the agent to take. For example, here is a very simplified version of the above:

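A sketch of such a behavior (the control-computation helper computeLaneControls is hypothetical):

    behavior FollowLaneBehavior():
        while True:
            throttle, steering = computeLaneControls(self)  # hypothetical helper
            take SetThrottleAction(throttle), SetSteerAction(steering)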

We intend this behavior to run for the entire scenario, so we use an infinite loop. In each step of the loop, we compute appropriate throttle and steering controls, then use the take statement to take the corresponding actions. When that statement is executed, Scenic pauses the behavior until the next time step of the simulation, whereupon the function resumes and the loop repeats.

Execution of Behaviors. When there are multiple agents, their behaviors run in parallel, as seen in Fig. 8; at each time step, Scenic sends their selected actions to the simulator to be executed and runs the simulation for one step. It then reads back the state of the simulation, updating the position, heading, etc. of each object.

Fig. 8: Diagram showing interaction between Scenic and a simulator during the execution of a dynamic scenario

As behaviors run dynamically during simulations, they can access the current state of the world to decide what actions to take. Consider the following behavior:

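A sketch of the behavior:

    behavior WaitUntilClose(threshold=15):
        while (distance from self to ego) > threshold:
            wait
        do FollowLaneBehavior()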

Here, we repeatedly query the distance from the agent running the behavior (self) to the ego car; as long as it is above a threshold, we use the wait statement to take no action. Once the threshold is met, we start driving by using the do statement to invoke the FollowLaneBehavior we saw above.

Behavior Arguments and Random Parameters. The example above also shows how behaviors may take arguments, like any Scenic function. Here, threshold has default value 15 but can be customized, so we could write for example:

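A sketch (the positions are placeholders):

    car1 = Car ahead of ego by 10, with behavior WaitUntilClose
    car2 = Car ahead of ego by 30, with behavior WaitUntilClose(20)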

Both cars will use the WaitUntilClose behavior, but independent copies of it with thresholds of 15 and 20 respectively.

Unlike in ordinary Scenic code, control flow constructs such as if and while are allowed to depend on random variables inside a behavior. Any distributions defined inside a behavior are sampled at simulation time, not during scene sampling. Consider the following behavior:

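A sketch, laid out so that the sampling of strength falls on line 5 as referenced below:

    behavior Foo():
        threshold = Range(4, 7)
        while True:
            if (distance from self to ego) < threshold:
                strength = TruncatedNormal(0.8, 0.02, 0.5, 1)
                take SetBrakeAction(strength)
            else:
                wait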

Here, the value of threshold is sampled only once, at the beginning of the scenario when the behavior starts running. The value strength, on the other hand, is sampled every time control reaches line 5, so that every time step when the car is braking we use a slightly different braking strength.

Interrupts. It is frequently useful to take an existing behavior and add a complication to it; for example, suppose we want a car that follows a lane, stopping whenever it encounters an obstacle. Scenic provides a concept of interrupts which allows us to reuse the basic FollowLaneBehavior without having to modify it.

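A sketch (the distanceToClosest helper is an assumption):

    try:
        do FollowLaneBehavior()
    interrupt when self.distanceToClosest(Object) < 5:
        take SetBrakeAction(1)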

This try-interrupt statement has the following semantics: at first, the code block after the try (the body) is executed. At the start of every time step during its execution, the condition from each interrupt when clause is checked; if any are true, execution of the body is suspended and we instead begin to execute the corresponding interrupt handler. In the example above, there is only one interrupt, which fires when we come within 5 meters of any object. When that happens, FollowLaneBehavior is paused and we instead apply full braking for one time step. In the next step, we will resume FollowLaneBehavior wherever it left off, unless we are still within 5 meters of an object, in which case the interrupt will fire again.

Successive interrupt when clauses take precedence over those which precede them, and such higher-priority interrupts can fire even during the execution of an earlier interrupt handler. This makes it easy to model a hierarchy of behaviors with different priorities; for example, we could implement a car which drives along a lane, passing slow cars and avoiding collisions, along the following lines:

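A sketch of such a prioritized behavior (the sub-behaviors and the timeToCollision helper are hypothetical):

    behavior Drive():
        try:
            do FollowLaneBehavior()
        interrupt when self.distanceToClosest(Car) < 20:
            do PassingBehavior()
        interrupt when timeToCollision(self) < 3:
            do CollisionAvoidance()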

Here, the car begins by lane following, switching to passing if there is a car or other obstacle too close ahead. During either of those two sub-behaviors, if the time to collision gets too low, we switch to collision avoidance. Once the collision avoidance behavior completes, we will resume whichever behavior was interrupted earlier. If we were executing the passing behavior, it will run to completion (possibly being interrupted again) before we finally resume lane following.

When resuming the interrupted code after an interrupt completes is undesired, the abort statement can be used to exit the entire try-interrupt statement. For example, to run a behavior until a condition is met without resuming it later, we can write:

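A sketch (the condition shown is a placeholder):

    try:
        do FollowLaneBehavior()
    interrupt when self in intersection:
        abort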

This is a common enough use case of interrupts that Scenic provides a shorthand notation:

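A sketch of the equivalent shorthand:

    do FollowLaneBehavior() until self in intersection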

Finally, note that when try-interrupt statements are nested, interrupts of the outer statement take precedence. This makes it easy to build up complex behaviors in a modular way. For example, the Drive behavior we sketched above is relatively complicated, using interrupts to switch between several different sub-behaviors. We would like to be able to put it in a library and reuse it in many different scenarios without modification. Interrupts make this straightforward; for example, if for a particular scenario we want a car that drives normally but suddenly brakes for 5 seconds when it reaches a certain area, we can write:

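A sketch (the region zone and the StopBehavior are hypothetical):

    behavior DriveWithSuddenBrake():
        haveBraked = False
        try:
            do Drive()
        interrupt when self in zone and not haveBraked:
            do StopBehavior() for 5 seconds
            haveBraked = True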

With this behavior, Drive operates as it did before, interrupts firing as appropriate to switch between lane following, passing, and collision avoidance. But during any of these sub-behaviors, if the car enters the given zone it will immediately brake for 5 seconds, then pick up where it left off. This example also shows how behaviors can use local variables to maintain state, enabling the encoding of behaviors which make decisions based on actions taken in the past.

Requirements and Monitors. Just as you can declare spatial constraints on scenes using the require statement, you can also impose constraints on dynamic scenarios. For example, if we don’t want to generate any simulations where two particular cars are simultaneously visible from the ego car, we could write:

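A sketch (car1 and car2 name the two cars in question):

    require always not ((ego can see car1) and (ego can see car2))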

The require always statement enforces that the given condition must hold at every time step of the scenario; if it is ever violated during a simulation, we reject that simulation and sample a new one. Similarly, we can require that a condition hold at some time during the scenario using the require eventually statement:

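A sketch, requiring the ego to eventually reach the intersection (the region name is assumed from the world model):

    require eventually ego in intersection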

To enforce more complex temporal properties, you can define a monitor. Like behaviors, monitors are functions which run in parallel with the scenario and can inspect world state. Here is a monitor for the property “car1 and car2 must both enter the intersection before car3”:

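A sketch, laid out so that the require and wait statements fall on lines 4 and 9 as referenced below:

    monitor CarsEnterBeforeCar3:
        entered1 = entered2 = False
        while not (entered1 and entered2):
            require not (car3 in intersection)
            if car1 in intersection:
                entered1 = True
            if car2 in intersection:
                entered2 = True
            wait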

We use the variables entered1 and entered2 to remember whether we have seen car1 and car2 respectively enter the intersection. The loop will iterate as long as at least one of the cars has not yet entered the intersection, so if car3 enters before either car1 or car2, the requirement on line 4 will fail and we will reject the simulation. Note the necessity of the wait statement on line 9: if we omitted it, the loop could run forever without any time actually passing in the simulation.

Preconditions and Invariants. Even general behaviors designed to be used in multiple scenarios may not operate correctly from all possible starting states: for example, a lane-following behavior assumes that the agent is actually in a lane rather than, say, on a sidewalk. To model such assumptions, Scenic provides a notion of guards for behaviors. Most simply, we can specify one or more preconditions:

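A sketch of a lane-changing behavior with preconditions (the lane and road properties are assumed from the driving domain):

    behavior LaneChange(newLane):
        precondition: self.lane is not newLane
        precondition: self.road is newLane.road
        do FollowLaneBehavior()  # placeholder for the actual lane-change maneuver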

Here, the precondition requires that whenever the behavior is executed by an agent, the agent must not already be in the destination lane but should be on the same road. We can add any number of such preconditions; like ordinary requirements, violating any precondition causes the simulation to be rejected.

Since behaviors can be interrupted, it is possible for a behavior to resume execution in a state it doesn’t expect: imagine a car which is lane following, but then swerves onto the shoulder to avoid an accident; naïvely resuming lane following, we find we are no longer in a lane. To catch such situations, Scenic allows us to define invariants which are checked at every time step during the execution of a behavior, not just when it begins running. These are written similarly to preconditions:

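A sketch with an invariant, which is checked at every time step while the behavior runs (the lane property is assumed from the driving domain):

    behavior FollowLaneBehavior():
        invariant: self.lane is not None
        while True:
            throttle, steering = computeLaneControls(self)  # hypothetical helper
            take SetThrottleAction(throttle), SetSteerAction(steering)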

While by default guard violations cause the simulation to be rejected, in some cases it may be possible to recover by taking additional actions. To enable this kind of design, Scenic signals guard violations by raising a GuardViolation exception which can be caught like any other exception; the simulation is only rejected if the exception propagates out to the top level. So to model the lane-following-with-collision-avoidance behavior suggested above, we could write code like this:

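A sketch (the recovery behavior GetBackOntoRoad and the other sub-behavior names are hypothetical):

    behavior Drive():
        while True:
            try:
                do FollowLaneBehavior()
            interrupt when self.distanceToClosest(Object) < 5:
                do CollisionAvoidance()
            except GuardViolation:
                do GetBackOntoRoad()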

When any object comes within 5 meters, we suspend lane following and switch to collision avoidance. When the latter completes, FollowLaneBehavior will be resumed; if its invariant fails because we are no longer on the road, we catch the resulting exception and run a behavior to restore the invariant. The whole try statement then completes, so the outermost loop iterates and we begin lane following once again.

Terminating the Scenario. By default, scenarios run forever, unless a time limit is specified when running the Scenic tool. However, scenarios can also define termination criteria using the terminate when statement; for example, we could decide to end a scenario as soon as the ego car travels at least a certain distance:

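A sketch (the 100 m threshold is a placeholder):

    startPoint = ego.position
    terminate when (distance from ego to startPoint) >= 100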

Additionally, the terminate statement can be used inside behaviors and monitors: if it is ever executed, the scenario ends. For example, we can use a monitor to terminate the scenario once the ego spends 30 time steps in an intersection:

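A sketch of such a monitor:

    monitor StopAfterTimeInIntersection:
        stepsInIntersection = 0
        while True:
            if ego in intersection:
                stepsInIntersection += 1
            if stepsInIntersection >= 30:
                terminate
            wait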

4.2 Compositional scenarios

Scenic provides facilities for defining multiple scenarios in a single program and composing them in various ways. This enables writing a library of scenarios which can be repeatedly used as building blocks to construct more complex scenarios.

Modular Scenarios. To define a named, reusable scenario, optionally with tunable parameters, Scenic provides the scenario statement. For example, here is a scenario which creates a parked car on the shoulder of the ego’s current lane (assuming there is one), using some APIs from the driving library:

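A sketch (the laneGroup API is assumed from the driving library):

    scenario ParkedCar(gap=0.25):
        precondition: ego.laneGroup._shoulder != None
        setup:
            spot = OrientedPoint on visible ego.laneGroup.curb
            parkedCar = Car left of spot by gap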

The setup block contains Scenic code which executes when the scenario is instantiated, and which can define classes, create objects, declare requirements, etc. as in any of the example scenarios we saw above. Additionally, we can define preconditions and invariants, which operate in the same way as for dynamic behaviors. Having now defined the ParkedCar scenario, we can use it in a more complex scenario, potentially multiple times:

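A sketch of a scenario invoking two copies of ParkedCar in parallel:

    scenario Main():
        setup:
            ego = Car
        compose:
            do ParkedCar(), ParkedCar(gap=0.5)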

Here our scenario itself only creates the ego car; then its compose block orchestrates how to run other modular scenarios. In this case, we invoke two copies of the ParkedCar scenario in parallel, specifying in one case that the gap between the parked car and the curb should be 0.5 m instead of the default 0.25. So the scenario will involve three cars in total, and as usual Scenic will automatically ensure that they are all on the road and do not intersect.

Parallel and Sequential Composition. The scenario above is an example of parallel composition, where we use the do statement to run two scenarios at the same time. We can also use sequential composition, where one scenario begins after another ends. This is done the same way as in behaviors: in fact, the compose block of a scenario is executed in the same way as a monitor, and allows all the same control-flow constructs. For example, we could write a compose block as follows:

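A sketch of the compose block body, laid out so that the first do falls on line 2 as referenced below:

    while True:
        do ParkedCar(gap=0.25) for 30 seconds
        do ParkedCar(gap=0.5) for 30 seconds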

Here, a new parked car is created every 30 s (Footnote 2), with the distance to the curb alternating between 0.25 and 0.5 m. Note that without the for 30 seconds qualifier, we would never get past line 2, since the ParkedCar scenario does not define any termination conditions using terminate when (or terminate) and so runs forever by default. If instead we want to create a new car only when the ego has passed the current one, we can use a do-until statement:

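A sketch; the until condition (the ego no longer seeing the parked car) is an assumed proxy for having passed it:

    while True:
        subScenario = ParkedCar(gap=0.25)
        do subScenario until not (ego can see subScenario.parkedCar)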

Note how we can refer to the parkedCar variable created in the setup block of the ParkedCar scenario as a property of the scenario. Combined with the ability to pass objects as parameters of scenarios, this is convenient for reusing objects across scenarios.

Interrupts, Overriding, and Initial Scenarios. The try-interrupt statement used in behaviors can also be used in compose blocks to switch between scenarios. For example, suppose we already have a scenario where the ego is following a lead car, and want to elaborate it by adding a parked car which suddenly pulls in front of the lead car. We could write a compose block as follows:

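A sketch (both sub-scenario names are hypothetical), laid out so that the do of the following scenario falls on line 3 as referenced below:

    compose:
        try:
            do FollowLeadCar()
        interrupt when (distance from ego to leadCar) < 10:
            do ParkedCarPullsIn(leadCar)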

If the parked-car scenario is defined to end shortly after the parked car finishes entering the lane, the interrupt handler will complete and Scenic will resume executing the following scenario on line 3 (unless the ego is still within 10 m of the lead car).

Suppose that we want the lead car to behave differently while the parked car scenario is running; for example, perhaps the behavior for the lead car defined in the following scenario does not handle a parked car suddenly pulling in. To enable changing the behavior or other properties of an object in a sub-scenario, Scenic provides the override statement, which we can use as follows:

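A sketch (the overriding behavior and the parked car's pull-in behavior are hypothetical):

    scenario ParkedCarPullsIn(leadCar):
        setup:
            override leadCar with behavior CautiousFollowBehavior
            spot = OrientedPoint on visible ego.laneGroup.curb
            parkedCar = Car left of spot by 0.25, with behavior PullIntoLaneBehavior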

Here we override the behavior property of the lead car for the duration of the sub-scenario, reverting it back to its original value (and thereby continuing to execute the old behavior) when the scenario terminates. The override \(\textit{object}\) \(\textit{specifier},\ {\dots }\) statement has the same syntax as an object definition, and can specify any properties of the object except for dynamic properties like position or speed, which can only be indirectly controlled by taking actions.

In order to allow writing scenarios which can both stand on their own and be invoked during another scenario, Scenic provides a special conditional statement testing whether we are inside the initial scenario, i.e., the very first scenario to run.

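A sketch, assuming the conditional is written if initial scenario (here ParkedCar creates its own ego only when run on its own):

    scenario ParkedCar(gap=0.25):
        setup:
            if initial scenario:
                ego = Car
            spot = OrientedPoint on visible ego.laneGroup.curb
            parkedCar = Car left of spot by gap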

Random Selection of Scenarios. For very general scenarios, like “driving through a city, encountering typical human traffic”, we may want a variety of different events and interactions to be possible. We saw above how we can write behaviors for individual agents which choose randomly between possible actions; Scenic allows us to do the same with entire scenarios. Most simply, since scenarios are first-class objects, we can write functions which operate on them, perhaps choosing a scenario from a list of options based on some complex criterion:

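A sketch (the scenario names and selection criterion are hypothetical):

    def chooseScenario(nearIntersection):
        # ordinary code can select among scenarios, since they are first-class values
        if nearIntersection:
            return RedLightRunner()
        return Jaywalker()

A compose block could then execute the selected scenario with, e.g., do chooseScenario(True).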

However, some scenarios may only make sense in certain contexts; for example, a red light runner scenario can take place only at an intersection. To facilitate modeling such situations, Scenic provides variants of the do statement which randomly choose scenarios to run amongst only those whose preconditions are satisfied:

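A sketch of the two variants (scenario names carried over from the sketch above):

    do choose RedLightRunner(), Jaywalker(), ParkedCar()
    do shuffle RedLightRunner(), Jaywalker(), ParkedCar()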

Here, line 1 checks the preconditions of the three given scenarios, then executes one (and only one) of the enabled scenarios. If for example the current road has no shoulder, then ParkedCar will be disabled and we will have a 50/50 chance of executing either RedLightRunner or Jaywalker (assuming their preconditions are satisfied). If none of the three scenarios are enabled, Scenic will reject the simulation. Line 2 is a shuffled variant, where all three scenarios will be executed, but in random order (Footnote 3).

5 Syntax of Scenic

Scenic is an object-oriented PPL, with programs consisting of sequences of statements built with standard imperative constructs including conditionals, loops, functions, and methods (which we do not describe further, focusing on the new elements). Compared to other imperative PPLs, the major restriction of Scenic, made in order to allow more efficient sampling, is that conditional branching may not depend on random variables (except in behaviors). The novel syntax, outlined above, is largely devoted to expressing spatiotemporal relationships in a concise and flexible manner. Figure 9 gives a formal grammar for Scenic, which we now describe in detail.

5.1 Data types

Scenic provides several primitive data types:

  1. Booleans expressing truth values.

  2. Scalars as floating-point numbers, which can be sampled from various distributions (see Table 1).

  3. Vectors representing positions and offsets in space, constructed from coordinates in meters with the syntax \(\textit{X}\) @ \(\textit{Y}\) (Footnote 4).

  4. Headings representing orientations in space. Conveniently, in 2D these are a single angle (in radians, anticlockwise from North). By convention the heading of a local coordinate system is the heading of its y-axis, so, for example, -2 @ 3 means 2 meters left and 3 ahead.

  5. Vector Fields associating an orientation to each point in space: for example, the direction of the shortest path to a destination or (in our case study) the nominal traffic direction.

  6. Regions representing sets of points in space. These can have an associated vector field giving points in the region preferred orientations (e.g. the surface of an object could have normal vectors, so that objects placed randomly on the surface face outward by default).

Fig. 9: Simplified Scenic grammar. \(\textit{Point}\) and \(\textit{OrientedPoint}\) are instances of the corresponding classes. See Table 5 for statements, Fig. 11 for operators, Table 1 for \(\textit{baseDist}\), and Tables 3 and 4 for \(\textit{posSpec}\) and \(\textit{headSpec}\)

Table 1 Built-in distributions

In addition, Scenic provides objects, organized into single-inheritance classes specifying a set of properties their instances must have, together with corresponding default values (see Fig. 9). Default value expressions are evaluated each time an object is created. Thus if a class gives a property the default value Range(1, 5), each instance will have that property drawn independently from this distribution. Default values may use the special syntax self.\(\textit{property}\) to refer to one of the other properties of the object, which is then a dependency of this default value. In our case study, for example, the width and length of a Car are by default derived from its model.

Physical objects in a scene are instances of Object, which is the default superclass when none is specified. Object descends from the two other built-in classes: its superclass is OrientedPoint, which in turn subclasses Point. These represent locations in space, with and without an orientation respectively, and so provide the fundamental properties position and heading. Object extends them by defining a bounding box with the properties width and length, as well as temporal information like velocity and speed. Table 2 lists the properties of these classes and their default values.

Table 2 Properties of the built-in classes Point, OrientedPoint, and Object

To allow cleaner notation, Points and OrientedPoints are automatically interpreted as vectors or headings in contexts expecting these (as shown in Fig. 9). For example, we can write taxi offset by 1 @ 2 and 30 deg relative to taxi instead of taxi.position offset by 1 @ 2 and 30 deg relative to taxi.heading. Ambiguous cases, e.g. taxi relative to limo, are illegal (caught by a simple type system); the more verbose syntax must be used instead.

5.2 Expressions

Scenic’s expressions are mostly straightforward, largely consisting of the arithmetic, boolean, and geometric operators shown in Fig. 11. The meanings of these operators are largely clear from their syntax, so we defer complete definitions of their semantics to the Appendix (Fremont et al. 2020a). Figure 10 illustrates several of the geometric operators (as well as some specifiers, which we will discuss in the next section). Various points to note:

  • \(\textit{X}\) can see \(\textit{Y}\) uses a simple model where a Point can see out to a certain distance, and an OrientedPoint restricts this to the sector along its heading with a certain view angle (see Table 2). An Object is visible iff its bounding box is.

  • \(\textit{X}\) relative to \(\textit{Y}\) interprets \(\textit{X}\) as an offset in a local coordinate system defined by \(\textit{Y}\). Thus -3 @ 0 relative to \(\textit{Y}\) yields 3 m West of \(\textit{Y}\) if \(\textit{Y}\) is a vector, and 3 m left of \(\textit{Y}\) if \(\textit{Y}\) is an OrientedPoint. If defining a heading inside a specifier, either \(\textit{X}\) or \(\textit{Y}\) can be a vector field, interpreted as a heading by evaluating it at the position of the object being specified. So we can write for example facing 30 deg relative to roadDirection.

  • visible \(\textit{region}\) yields the part of the region visible from the ego object, so we can write for example Car on visible road. The form \(\textit{region}\) visible from \(\textit{X}\) uses \(\textit{X}\) instead of the ego.

  • front of \(\textit{Object}\), front left of \(\textit{Object}\), etc. yield the corresponding points on the bounding box of the object, oriented along the object’s heading.

Fig. 10: Various Scenic operators and specifiers applied to the Car object ego and an OrientedPoint P. Instances of OrientedPoint are shown as bold arrows

Fig. 11: Operators by result type

Two types of Scenic expressions are more complex: distributions and object definitions. As in a typical imperative probabilistic programming language, a distribution evaluates to a sample from the distribution. Thus in the following program, y is not uniform over the unit box but rather lies on its diagonal, since x is sampled once and then reused:

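A sketch of the program in question:

    x = Range(0, 1)
    y = x @ x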

For convenience in sampling multiple times from a primitive distribution, Scenic provides a resample(\(\textit{D}\)) function returning an independent sample (Footnote 5) from \(\textit{D}\), one of the distributions in Table 1. Scenic also allows defining custom distributions beyond those in the Table.

The second type of complex Scenic expressions are object definitions. These are the only expressions with a side effect, namely creating an object in the generated scene. More interestingly, properties of objects are specified using the system of specifiers discussed above, which we now detail.

5.3 Specifiers

As shown in the grammar in Fig. 9, an object is created by writing the class name followed by a (possibly empty) comma-separated list of specifiers. The specifiers are combined, possibly adding default specifiers from the class definition, to form a complete specification of all properties of the object. Arbitrary properties (including user-defined properties with no meaning in Scenic) can be specified with the generic specifier with \(\textit{property}\) \(\textit{value}\), while Scenic provides many more specifiers for the built-in properties position and heading, shown in Tables 3 and 4 respectively.

In general, a specifier is a function taking in values for zero or more properties, its dependencies, and returning values for one or more other properties, some of which can be specified optionally, meaning that other specifiers will override them. For example, on \(\textit{region}\) specifies position and optionally specifies heading if the given region has a preferred orientation. If road is such a region, as in our case study, then on road will create an object at a uniformly random position on the road, with the preferred orientation (the road direction) there. But since heading is only specified optionally, we can override it by writing, for example, on road, facing 20 deg.

Table 3 Specifiers for position
Table 4 Specifiers for heading

Specifiers are combined to determine the properties of an object by evaluating them in an order ensuring that their dependencies are always already assigned. If there is no such order or a single property is specified twice, the scenario is ill-formed. The procedure by which the order is found, taking into account properties that are optionally specified and default values, will be described in the next section.

As the semantics of the specifiers in Tables 3 and 4 are largely evident from their syntax, we defer exact definitions to the Appendix (Fremont et al. 2020a). We briefly discuss some of the more complex specifiers, referring to the examples in Fig. 10:

  • behind \(\textit{vector}\) means the object is placed with the midpoint of its front edge at the given vector, and similarly for ahead of \(\textit{vector}\).

  • beyond \(\textit{A}\) by \(\textit{O}\) from \(\textit{B}\) means the position obtained by treating \(\textit{O}\) as an offset in the local coordinate system at \(\textit{A}\) oriented along the line of sight from \(\textit{B}\). In this and other specifiers, if the from \(\textit{B}\) clause is omitted, the ego object is used by default. So for example beyond taxi by 0 @ 3 means 3 m directly behind the taxi as viewed by the camera (see Fig. 10 for another example).

  • The heading optionally specified by left of \(\textit{OrientedPoint}\), etc. is that of the OrientedPoint (thus in Fig. 10, left of P yields an OrientedPoint facing the same way as P). Similarly, the heading optionally specified by the following \(\textit{vectorField}\) specifier is that of the vector field at the specified position.

  • apparently facing \(\textit{H}\) means the object has heading \(\textit{H}\) with respect to the line of sight from the ego. For example, apparently facing 90 deg would orient the object so that the camera views its left side head-on (see the combined example after this list).
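Combining a few of these, a short sketch (positions arbitrary; an ego is assumed to have been defined):

    taxi = Car at 120 @ 70
    Car beyond taxi by 0 @ 3                      # 3 m directly behind the taxi as viewed from ego
    Car left of taxi, apparently facing 90 deg    # the camera views its left side head-on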

5.4 Statements

Finally, we discuss Scenic’s statements, listed in Table 5. Class and object definitions have been discussed above, and variable assignment behaves in the standard way.

Table 5 Statements (excluding if, while, def, import, etc. from Python)

Selecting a world model. The model \(\textit{name}\) statement specifies that the Scenic program is written for the given Scenic world model. It is equivalent to the statement from \(\textit{name}\) import * (as in Python), importing everything from the given Scenic module, but can be overridden from the command line when running the Scenic tool. This enables writing cross-platform scenarios using abstract domains like scenic.domains.driving, then executing them in particular simulators by overriding the model with a more specific module (e.g. scenic.simulators.carla.model).
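For example, a cross-platform program might begin with the abstract driving domain (a sketch; we recall the tool accepting a model-override option on the command line, but treat the exact flag name as illustrative):

    model scenic.domains.driving.model

    # run unchanged in CARLA by overriding the model, e.g.:
    #   scenic example.scenic --model scenic.simulators.carla.model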

Global parameters. The statement param \(\textit{name} \texttt { = } \textit{value},\ {\dots }\) assigns values to global parameters of the scenario. These have no semantics in Scenic but provide a general-purpose way to encode arbitrary global information. For example, in our case study we used parameters time and weather to put distributions on the time of day and the weather conditions during the scene.
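A sketch in the spirit of our case study (the particular names, values, and units here are illustrative, not those of our GTAV model):

    param time = Range(0, 24)                          # hour of day
    param weather = Uniform('CLEAR', 'RAIN', 'SNOW')   # a few discrete weather types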

Behaviors and monitors. The behavior statement (see Fig. 9) defines a dynamic behavior. A behavior definition has the same structure as a function definition, with two differences: (1) it may begin with any number of precondition \(\textit{boolean}\) and invariant \(\textit{boolean}\) lines defining preconditions and invariants; (2) it may use the statements in the second section of Table 5, which are not allowed in ordinary functions. The monitor statement has the same structure as a behavior statement but defines a monitor.
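A minimal sketch of the shape of a behavior (SetSpeedAction is a hypothetical action type; concrete actions are supplied by the world model):

    behavior DriveSlowly():
        precondition: self.speed == 0      # may only start from rest
        invariant: self.speed <= 10        # checked each time the behavior resumes
        while True:
            take SetSpeedAction(5)         # choose an action, then suspend for one time step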

Modular scenarios. The scenario statement (see Fig. 9) defines a modular scenario which can be invoked from another scenario. Scenario definitions begin like behavior definitions, with a name, parameters, preconditions, and invariants. However, the body of a scenario consists of two parts, either of which can be omitted: a setup block and a compose block. The setup block contains code that runs once when the scenario begins to execute, and is a list of statements like a top-level Scenic program.Footnote 6 The compose block orchestrates the execution of sub-scenarios during a dynamic scenario, and may use do and any of the other statements allowed inside behaviors (except take, which only makes sense for an individual agent).
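A structural sketch (the sub-scenario invoked in the compose block is hypothetical):

    scenario TrafficJam():
        setup:
            ego = Car on road                # runs once, like a top-level program
            leadCar = Car ahead of ego by 10
        compose:
            do LeadCarBrakesSuddenly()       # orchestrate sub-scenarios over time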

Requirements. The require \(\textit{boolean}\) statement requires that the given condition hold in all generated scenes (equivalent to observe statements in other probabilistic programming languages; see e.g. Milch et al. (2004); Claret et al. (2013)). The variant require[p] \(\textit{boolean}\) adds a soft requirement that need only hold with some probability p (which must be a constant). We will discuss the semantics of these in the next section. The require always and require eventually variants define requirements that must hold in every and some time step of dynamic simulations respectively.
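Illustrative uses of the four forms (the objects and regions named here are assumed to be defined elsewhere in the program):

    require (distance from taxi to ego) < 30       # hard requirement on every scene
    require[0.75] taxi in intersection             # soft: hold with probability at least 0.75
    require always ego.speed <= 15                 # must hold at every time step
    require eventually ego in intersection         # must hold at some time step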

Mutation. The mutate \(\textit{instance},\ {\dots }\) by \(\textit{number}\) statement adds Gaussian noise with the given standard deviation (default 1) to the position and heading properties of the listed objects (or every Object, if no list is given). For example, mutate taxi by 2 would add twice as much noise as mutate taxi. The noise can be controlled separately for position and heading, as we discuss in the next section.

Termination conditions. The terminate when \(\textit{boolean}\) statement defines a condition which is monitored as in require always, but which when true causes the scenario to end. The terminate statement can be called inside a behavior, monitor, or compose block to end the scenario immediately.

Actions. The take \(\textit{action},\ {\dots }\) statement can be used inside behaviors to select one or more actionsFootnote 7 for the agent to take in the current time step. The wait statement means no actions are taken in this time step (which makes sense inside monitors and compose blocks). When either of these statements is executed, the behavior is suspended until one time step has elapsed; then its invariants are checked (raising an exception if any are violated) and it is resumed.

Invoking other behaviors and scenarios. The do \(\textit{name},\ {\dots }\) statement has the same structure as the take statement, but invokes one or more behaviors (if in a behavior) or scenarios (if in a compose block). It does not return until the sub-behavior/sub-scenario terminates, so multiple time steps may pass (unlike take). Early termination can be enabled by adding a for \(\textit{scalar}\) seconds (or steps) clause, which enforces a maximum time limit, or an until \(\textit{boolean}\) clause, which adds an arbitrary termination criterion. When the do statement returns, the invariants of the calling behavior/scenario are checked as above.
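For example, with the driving domain’s FollowLaneBehavior (the conditions here are arbitrary):

    do FollowLaneBehavior() for 30 seconds             # return after at most 30 simulated seconds
    do FollowLaneBehavior() until ego in intersection  # return once the condition holds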

Interrupts. The try-interrupt statement (see Fig. 9) consists of a try block and one or more interrupt when \(\textit{boolean}\): and except \(\textit{exception}\): blocks, each containing arbitrary lists of statements. As described in Sect. 4.1, when a try-interrupt statement executes, the interrupt conditions are checked at each time step. While none of them are true, the try block executes. When an interrupt condition becomes true, the body of the corresponding interrupt block is executed (with lower blocks preempting those above), suspending any behaviors/scenarios that were executing in the try block until the interrupt handler completes (at which point the invariants of the suspended behavior/scenario are checked as usual). Any exceptions raised in the try block or any interrupt handler can be caught by except blocks as in the Python try statement. Additionally, any block may execute the abort statement to immediately terminate the entire try-interrupt statement.
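A sketch of the control flow, as it would appear inside a behavior (obstacle, mustStop, and AvoidObstacle are hypothetical):

    try:
        do FollowLaneBehavior()                        # normal operation
    interrupt when (distance from self to obstacle) < 10:
        do AvoidObstacle()                             # preempts lane following
    interrupt when self.mustStop:                      # lower blocks preempt those above
        abort                                          # ends the entire try-interrupt statement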

Overrides. The override \(\textit{name}\) \(\textit{specifier},\ {\dots }\) statement may be used inside a scenario definition to override properties of an object during a dynamic scenario. It has the same structure as an object definition, with override and the name of the object replacing the class, so for example given an object taxi we could write override taxi with foo 3 to set its property foo to 3. Dynamic properties read back from the simulator at every time step, like speed, cannot be overridden, since they are controlled using actions and not direct assignments. Properties overridden by a scenario revert to their original values when the scenario terminates. When the behavior property is overridden, the original behavior is suspended, then resumed at the end of the scenario.

6 Semantics and concrete scenario generation

6.1 Semantics of Scenic

The output of a Scenic program has two parts: first, a scene consisting of an assignment to all the properties of each Object defined in the scenario, plus any global parameters defined with param. For dynamic scenarios, this scene forms the initial state of the scenario, which then changes after each time step according to the actions taken by the agents. Since actions and their effects are domain-specific (consider for example the different physics involved for aerial, ground, and underwater vehicles), dynamic Scenic scenarios do not directly define trajectories for objects. Instead, the second part of the output of a Scenic program is a policy, a function mapping the history of past scenes to the choice of actions for the agents in the current time step.Footnote 8 This pair of a scene and a policy is what we mean formally by the concrete scenario generated by a Scenic program.

Since Scenic is a probabilistic programming language, the semantics of a program is actually a distribution over possible outputs, here concrete scenarios. As for other imperative PPLs (with declarative constraints), the semantics can be defined operationally as a typical interpreter for an imperative language but with two differences to handle random sampling and constraints. First, the interpreter makes random choices when evaluating distributions (Saheb-Djahromi 1978). For example, the Scenic statement x = Range(0, 1) updates the state of the interpreter by assigning a value to x drawn from the uniform distribution on the interval (0, 1). In this way every possible run of the interpreter has a probability associated with it. Second, every run where a require statement (the equivalent of an “observation” in other PPLs) is violated gets discarded, and the run probabilities appropriately normalized (see, e.g., Gordon et al. (2014)). For example, adding the statement require x > 0.5 above would yield a uniform distribution for x over the interval (0.5, 1).

In order to support efficient sampling, the Scenic tool does not directly implement an interpreter along the lines above; instead, it compiles a Scenic program into an intermediate representation, an expression forest, which preserves the structure of the distributions defined in the program. The expression forest is a directed acyclic graph where each vertex is a random-valued expression occurring in the program, and edges indicate dependencies between expressions.Footnote 9 For example, in the program Car at x @ y (with x and y random), the forest would have a root node for the position property of the car; that node would have a single child representing the vector x @ y, which in turn would have children representing x and y. The Scenic sampler works by traversing the expression forest in topological order from leaves to roots, sampling a value for each node after values for all of its dependencies have already been determined. This yields the same distribution that would be obtained by simply “running” the Scenic code as usual in imperative PPLs; by rejecting any samples which violate require statements, we obtain a scene distribution conditioned on the requirements being satisfied, as desired. However, the structural information in the expression forest allows us to improve on this simplistic rejection sampling approach by performing transformations on the forest that reduce the probability of rejection while leaving the conditioned distribution the same. These transformations take advantage of the domain-specific syntax of Scenic, using pattern matching to identify subtrees representing certain geometric relationships in the forest and replace them with “pruned” versions that exclude parts of the parameter space which would be guaranteed to violate built-in or user-defined requirements. We describe several such pruning techniques in Sect. 6.2. For clarity, since these techniques do not change the semantics of the program, in the rest of this section and in the Appendix we describe the semantics of Scenic constructs in terms of a simple imperative interpreter.

Scenic uses the standard semantics for assignments, arithmetic, loops, functions, and so forth. Below, we define the semantics of the main constructs unique to Scenic. See the Appendix (Fremont et al. 2020a) for a more formal treatment.

Soft requirements. The require[p] \(\textit{boolean}\) statement is interpreted as an ordinary require statement with probability p and as a no-op otherwise: that is, it is interpreted as a hard requirement that is only checked with probability p. This ensures that the condition will hold with probability at least p in the induced distribution of the Scenic program, as desired.
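Concretely, one can read require[0.9] cond as the following interpreter-level sketch (pseudocode only: as noted in Sect. 6.2, Scenic programs themselves lack conditional control flow outside behaviors; we write the discrete distribution as Options, following the implementation):

    if Options({True: 0.9, False: 0.1}):    # check the requirement with probability 0.9...
        require cond                        # ...and do nothing otherwise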

Specifiers and object definitions. As we saw above, each specifier defines a function mapping values for its dependencies to values for the properties it specifies. When an object of class C is constructed using a set of specifiers S, the object is defined as follows (see the Appendix (Fremont et al. 2020a) for details; a worked example follows the list):

  1. If a property is specified (non-optionally) by multiple specifiers in S, an ambiguity error is raised.

  2. The set of properties P for the new object is found by combining the properties specified by all specifiers in S with the properties inherited from the class C.

  3. Default value specifiers from C are added to S as needed so that each property in P is paired with a unique specifier in S specifying it, with precedence order: non-optional specifier, optional specifier, then default value.

  4. The dependency graph of the specifiers S is constructed. If it is cyclic, an error is raised.

  5. The graph is topologically sorted and the specifiers are evaluated in this order to determine the values of all properties P of the new object.
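As a worked example (a sketch using specifiers from Tables 3 and 4), consider

    Car left of taxi, facing roadDirection

Here facing roadDirection depends on position, since the vector field must be evaluated at the position of the new object, while left of taxi depends only on the object’s width (supplied by a default value specifier). The topological order therefore evaluates left of taxi first, fixing position (and optionally heading); facing roadDirection is evaluated second and overrides the optionally specified heading.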

Mutation. The mutate \(\textit{X}\) by \(\textit{N}\) statement sets the special property mutationScale of \(\textit{X}\) to \(\textit{N}\) (the form mutate \(\textit{X}\) sets it to 1). At the end of evaluation of the Scenic program, but before requirements are checked, Gaussian noise is added to the position and heading properties of objects with nonzero mutationScale. The standard deviation of the noise is the value of the positionStdDev and headingStdDev property respectively (see Table 2), multiplied by mutationScale.

Dynamic constructs. As suggested in Sect. 5.4, behaviors and monitors are coroutines: they usually execute like ordinary functions, but are suspended when they take an action (or wait) until one time step has passed. Scenarios behave similarly: in their compose blocks, using wait causes them to wait for one step, and any sub-scenarios they invoke using do run recursively; scenarios without compose blocks do nothing in a time step other than check whether any of their terminate when conditions have been met or their invariants violated.

The output of the policy of a dynamic Scenic program is defined according to the following procedure:

  1. Run the compose blocks of all currently-running scenarios for one time step. If any require conditions fail, discard the simulation. If instead the top-level scenario finishes its compose block (if any), one of its terminate when conditions is true, or it executes terminate, set a flag to remember this (we use a flag rather than terminating immediately since we need to ensure that all requirements are satisfied before terminating).

  2. Check all require always conditions of currently-running scenarios; if any fail, discard the simulation.

  3. Run all monitors of currently-running scenarios for one time step. As above, discard the simulation if any require conditions fail, and set the terminate flag if the terminate statement is executed.

  4. If the flag is set, check that all require eventually conditions were satisfied at some time step: if so, terminate the simulation; otherwise, discard it.

  5. Run all the behaviors of dynamic agents for one time step, gathering their actions and discarding the simulation or setting the terminate flag as in (3).

  6. Repeat (4) to check the terminate flag.

  7. Return the choice of actions selected by the dynamic agents.

The problem of sampling scenes from the distribution defined by a Scenic program is essentially a special case of the sampling problem for imperative PPLs with observations (since soft requirements can also be encoded as observations). While we could apply general techniques for such problems,Footnote 10 the domain-specific design of Scenic enables specialized sampling methods, which we discuss below. We also note that the scenario generation problem is closely related to control improvisation, an abstract framework capturing various problems requiring synthesis under hard, soft, and randomness constraints (Fremont et al. 2015; Fremont 2019). Scenario improvisation from a Scenic program can be viewed as an extension with a more detailed randomness constraint given by the imperative part of the program.

6.2 Domain-specific sampling techniques

The geometric nature of the constraints in Scenic programs, together with Scenic’s lack of conditional control flow outside behaviors, enables domain-specific sampling techniques inspired by robotic path planning methods. Specifically, we can use ideas for constructing configuration spaces to prune parts of the sample space where the objects being positioned do not fit into the workspace. Furthermore, by combining spatial and temporal constraints, we can prune some initial scenes by proving that they force a requirement to be violated at some future point during a dynamic scenario. We describe several pruning techniques below, deferring formal statements of the algorithms to the Appendix (Fremont et al. 2020a).

Pruning based on containment. The simplest technique applies to any object X whose position is uniform in a region R and which must be contained in a region C (e.g. the road in our case study). If \(\textit{minRadius}\) is a lower bound on the distance from the center of X to its bounding box, then we can restrict R to \(R \cap \textit{erode}(C, \textit{minRadius)}\). This is sound, since if X is centered anywhere not in the restriction, then some point of its bounding box must lie outside of C.
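Here and below, \(\textit{erode}\) and \(\textit{dilate}\) denote the standard morphological operations, which we state for completeness (our notation): \(\textit{erode}(C, r) = \{x \mid B(x, r) \subseteq C\}\) and \(\textit{dilate}(Q, M) = \{x \mid \exists q \in Q.\ \Vert x - q\Vert \le M\}\), where \(B(x, r)\) is the closed disc of radius \(r\) centered at \(x\).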

Pruning based on orientation. The next technique applies to scenarios placing constraints on the relative heading and the maximum distance M between objects X and Y, which are oriented with respect to a vector field that is constant within polygonal regions (such as our roads). For each polygon P, we find all polygons \(Q_i\) satisfying the relative heading constraints with respect to P (up to a perturbation if X and Y need not be exactly aligned to the field), and restrict P to \(P \cap \textit{dilate}(\cup Q_i, M)\). This is also sound: suppose X can be positioned at x in polygon P. Then Y must lie at some y in a polygon Q satisfying the constraints, and since the distance from x to y is at most M, we have \(x \in \textit{dilate}(Q, M)\).

Pruning based on size. In the setting above of objects X and Y aligned to a polygonal vector field (with maximum distance M), we can also prune the space using a lower bound on the width of the configuration. For example, in our bumper-to-bumper scenario we can infer such a bound from the specifiers in the program. We first find all polygons that are not wide enough to fit the configuration according to the bound: call these “narrow”. Then we restrict each narrow polygon P to \(P \cap \textit{dilate}(\cup Q_i, M)\) where \(Q_i\) runs over all polygons except P. To see that this is sound, suppose object X can lie at x in polygon P. If P is not narrow, we do not restrict it; otherwise, object Y must lie at y in some other polygon Q. Since the distance from x to y is at most M, as above we have \(x \in \textit{dilate}(Q, M)\).

Pruning based on reachability. Finally, we can prune initial positions for objects which make it impossible to reach a goal location within the duration of the scenario; for example, a car which travels down a road and then runs a red light must start sufficiently close to an intersection. Suppose an object is required to enter a region R within T time (either by an explicit statement or a precondition of a behavior or scenario guaranteed to eventually execute) and we have an upper bound S on the object’s speed. Then we can prune away all initial positions of the object which do not lie within a distance \(D = S T\) of R, i.e., we can restrict its initial positions to \(\textit{dilate}(R, D)\). If the object is also required to stay within some containing region C (e.g., a road) for the entire duration of the scenario, we can compute a tighter value of D by considering only paths that lie within C.

After pruning the space as described above, our implementation uses rejection sampling, generating scenes from the imperative part of the scenario until all requirements are satisfied. While this samples from exactly the desired distribution, it has the drawback that a huge number of samples may be required to yield a single valid scene (in the worst case, when the requirements have probability zero of being satisfied, the algorithm will not even terminate). However, we found in our experiments that all reasonable scenarios we tried required at most several hundred iterations of rejection sampling, yielding a sample within a few seconds. Furthermore, the pruning methods above could reduce the number of samples needed by a factor of 3 or more (see the Appendix (Fremont et al. 2020a) for details of our experiments). In future work it would be interesting to see whether Markov chain Monte Carlo methods previously used for probabilistic programming (see, e.g., Milch et al. (2004); Nori et al. (2014); Wood et al. (2014)) could be made effective in the case of Scenic.

7 Experiments

We demonstrate the three applications of Scenic discussed in Sect. 2: testing a system under particular conditions, either a perception component in isolation (Sect. 7.2.1) or a dynamic closed-loop system (Sect. 7.2.2); training a system to improve accuracy in hard cases (Sect. 7.3); and debugging failures (Sect. 7.4). We begin by describing the general experimental setup.

7.1 Experimental setup

For our main case study, we generated scenes in the virtual world of the video game Grand Theft Auto V (GTAV) (Rockstar Games 2015). We wrote a Scenic world model defining regions representing the roads and curbs in (part of) this world, as well as a Car class providing two additional propertiesFootnote 11: model, representing the type of car, with a uniform distribution over 13 diverse models provided by GTAV, and color, representing the car color, with a default distribution based on real-world car color statistics (DuPont 2012). In addition, we implemented two global scene parameters: time, representing the time of day, and weather, representing the weather as one of 14 discrete types supported by GTAV (e.g. “clear” or “snow”).

GTAV is closed-source and does not expose any kind of scene description language. Therefore, to import scenes generated by Scenic into GTAV, we wrote a plugin based on DeepGTAV.Footnote 12 The plugin calls internal functions of GTAV to create cars with the desired positions, colors, etc., as well as to set the camera position, time of day, and weather.

Our experiments used SqueezeDet (Wu et al. 2017), a convolutional neural network for real-time object detection in autonomous driving.Footnote 13 We used a batch size of 20 and trained all models for 10,000 iterations unless otherwise noted. Images captured from GTAV with resolution \(1920 \times 1200\) were resized to \(1248 \times 384\), the resolution used by SqueezeDet and the standard KITTI benchmark (Geiger et al. 2012). All models were trained and evaluated on NVIDIA TITAN XP GPUs.

We used standard metrics precision and recall to measure the accuracy of detection on a particular image set. The accuracy is computed based on how well the network predicts the correct bounding box, score, and category of objects in the image set. Details are in the Appendix (Fremont et al. 2020a), but in brief, precision is defined as \(tp / (tp + fp)\) and recall as \(tp / (tp + fn)\), where true positives tp is the number of correct detections, false positives fp is the number of predicted boxes that do not match any ground truth box, and false negatives fn is the number of ground truth boxes that are not detected.
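For example, with hypothetical counts \(tp = 90\), \(fp = 10\), and \(fn = 30\), precision is \(90/(90+10) = 90\%\) while recall is \(90/(90+30) = 75\%\).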

7.2 Testing and falsification

We begin with the most straightforward application of Scenic, namely generating specialized data to test a system under particular conditions. We demonstrate both using a static scenario to test a perception component, and using a dynamic scenario to falsify a closed-loop system.

7.2.1 Testing a perception module

When testing a model, one may be interested in a particular operation regime. For instance, an autonomous car manufacturer may be more interested in certain road conditions (e.g. desert vs. forest roads) depending on where its cars will be mainly used. Scenic provides a systematic way to describe scenarios of interest and construct corresponding test sets.

To demonstrate this, we first wrote very general scenarios describing static scenes of 1–4 cars (not counting the camera), specifying only that the cars face within \(10^\circ \) of the road direction: all other features had their default distributions, e.g. the cars were positioned uniformly at random over the road and the time of day was uniform over an entire 24 h period. We generated 1000 images from each scenario, yielding a training set \(X_\mathrm {generic}\) of 4000 images, and used these to train a model \(M_\mathrm {generic}\) as described in Sect. 7.1. We also generated an additional 50 images from each scenario to obtain a generic test set \(T_\mathrm {generic}\) of 200 images. For all of our scenarios (including in our other experiments), sampling a single scene and rendering an image from it took at most several seconds.

Next, we specialized the general scenarios in opposite directions: scenarios for good road conditions fix the time to noon and the weather to sunny, while those for bad conditions fix midnight and rain; from these we generated specialized test sets \(T_\mathrm {good}\) and \(T_\mathrm {bad}\).

Evaluating \(M_\mathrm {generic}\) on \(T_\mathrm {generic}\), \(T_\mathrm {good}\), and \(T_\mathrm {bad}\), we obtained precisions of 83.1, 85.7, and 72.8%, respectively, and recalls of 92.6, 94.3, and 92.8%. This shows that, as might be expected, the model performs better on bright days than on rainy nights. This suggests there might not be enough examples of rainy nights in the training set, and indeed under our default weather distribution rain is less likely than shine. This illustrates how specialized test sets can highlight the weaknesses and strengths of a particular model. In Sect. 7.3, we go one step further and use Scenic to redesign the training set and improve model performance.

7.2.2 Falsifying a dynamic closed-loop system

Next, we demonstrate how we can use a dynamic Scenic scenario to test a closed-loop system, using VerifAI’s falsification facilities to monitor and analyze counterexamples to a system-level specification. We tested an autonomous agentFootnote 14 in the CARLA (Dosovitskiy et al. 2017) driving simulator, for which we wrote a Scenic world model similar to the one for GTAV. This agent consists of a planner and controller (but no perception components) which implement basic driving behaviors including abiding by traffic lights, lane following, and collision avoidance.

We wrote a Scenic program describing a scenario where the ego vehicle (i.e. the autonomous agent) is performing a right turn at an intersection, yielding to the crossing traffic. As the ego approaches the intersection, the traffic light turns green, but a crossing car runs the red light. The ego vehicle has to decide either to yield or make a right turn. The crossing car executes a reactive behavior where it slows down to maintain a minimum distance with any car in front.

We allowed three environment parameters to vary in this scenario (code for which can be found in the Appendix (Fremont et al. 2020a)):

  • The traffic light’s transition from red to green is triggered when the distance between the ego and the crossing car reaches a threshold, which was uniformly random between 10 and 30 m.

  • The crossing car’s speed was uniformly random between 5 and 12 m/s.

  • The scenario takes place at a random 4-way intersection in the CARLA map. To demonstrate how Scenic programs can be written in a generic, map-agnostic style, we used the same Scenic code on two different CARLA maps (Town05 and Town03).

We formulated a safety specification for the autonomous agent in Metric Temporal Logic, stating that the distance between the agent and the crossing car must be greater than 5 meters at all times. Giving this specification and the Scenic program to VerifAI, we generated 2,000 scenarios for each map. VerifAI monitored each simulation and computed the robustness value \(\rho \) of the MTL specification, which measures how strongly the specification was satisfied (Koymans 1990) (negative values meaning it was violated).

Our results are shown in Fig. 12. On the left, we plot \(\rho \) as a function of the traffic light trigger threshold and the speed of the crossing car. Each dot represents one simulation, with redder colors indicating smaller \(\rho \), i.e., being closer to violating the safety specification. We found a significant number of violations, approximately 21% and 17% of tests on Town05 and Town03 respectively. From the plots we observe broadly similar behavior across the two maps, with the distance when the traffic light switch occurs being the dominant factor controlling failures of the autonomous agent (most failures occurring for values of 15–25 m).

Fig. 12
figure 12

Falsification results in CARLA. Top: Town05; bottom: Town03

On the right side of Fig. 12, we plot the average value of \(\rho \) at each intersection, with color indicating this average and the size of each dot proportional to its variance. We can see that some intersections are much easier or harder for the autonomous agent to handle. Investigating some of the most extreme intersections, we observed that those with 4-lane legs and a turning radius of about 6.5 m caused the agent to fail most frequently. Re-testing the agent at such intersections, we found that this geometry often created a situation where the agent and the crossing car were merging into the same lane simultaneously, instead of one car completing its maneuver before the other.

These results show how we can use Scenic to find scenarios where a closed-loop system violates its specification. In Sect. 7.4, we will further show how Scenic can help us diagnose the root causes of failures and eliminate them through retraining.

7.3 Training on rare events

In the synthetic data setting, we are limited not by data availability but by the cost of training. The natural question is then how to generate a synthetic data set that is as effective as possible given a fixed size. In this section we show that over-representing a type of input that may occur rarely but is difficult for the model can improve performance on the hard case without compromising performance in the typical case. Scenic makes this possible by allowing the user to write a scenario capturing the hard case specifically.

For our car detection task, an obvious hard case is when one car substantially occludes another. We wrote a simple scenario, shown in Fig. 13, which generates such scenes by placing one car behind the other as viewed from the camera, offset left or right so that it is at least partially visible; Fig. 14 shows some of the resulting images. Generating images from this scenario we obtained a training set \(X_\mathrm {overlap}\) of 250 images and a test set \(T_\mathrm {overlap}\) of 200 images.

Fig. 13
figure 13

A scenario where one car partially occludes another. The property roadDeviation is defined in our world model to mean the car’s heading relative to the roadDirection

Fig. 14
figure 14

Two scenes generated from the partial-occlusion scenario

For a baseline training set we used the “Driving in the Matrix” synthetic data set (Johnson-Roberson et al. 2017), which has been shown to yield good car detection performance even on real-world images.Footnote 15 Like our images, the “Matrix” images were rendered in GTAV; however, rather than using a PPL to guide generation, they were produced by allowing the game’s AI to drive around randomly while periodically taking screenshots. We randomly selected 5000 of these images to form a training set \(X_\mathrm {matrix}\), and 200 for a test set \(T_\mathrm {matrix}\). We trained SqueezeDet for 5,000 iterations on \(X_\mathrm {matrix}\), evaluating it on \(T_\mathrm {matrix}\) and \(T_\mathrm {overlap}\). To reduce the effect of jitter during training we used a standard technique (Arlot and Celisse 2010), saving the last 10 models in steps of 10 iterations and picking the one achieving the best total precision and recall. This yielded the results in the first row of Table 6. Although \(X_\mathrm {matrix}\) contains many images of overlapping cars, the precision on \(T_\mathrm {overlap}\) is significantly lower than for \(T_\mathrm {matrix}\), indicating that the network is predicting lower-quality bounding boxes for such cars.Footnote 16

Table 6 Performance of models trained on 5000 images from \(X_\mathrm {matrix}\) or a mixture with \(X_\mathrm {overlap}\), averaged over 8 training runs with random selections of images from \(X_\mathrm {matrix}\)

Next we attempted to improve the effectiveness of the training set by mixing in the difficult images produced with Scenic. Specifically, we replaced a random 5% of \(X_\mathrm {matrix}\) (250 images) with images from \(X_\mathrm {overlap}\), keeping the overall training set size constant. We then retrained the network on the new training set and evaluated it as above. To reduce the dependence on which images were replaced, we averaged over 8 training runs with different random selections of the 250 images to replace. The results are shown in the second row of Table 6. Even altering only 5% of the training set, performance on \(T_\mathrm {overlap}\) significantly improves. Critically, the improvement on \(T_\mathrm {overlap}\) is not paid for by a corresponding decrease on \(T_\mathrm {matrix}\): performance on the original data set remains the same. Thus, by allowing us to specify and generate instances of a difficult case, Scenic enables the generation of more effective training sets than can be obtained through simpler approaches not based on PPLs.

7.4 Debugging failures

In our final experiment, we show how Scenic can be used to generalize a single input on which a model fails, exploring its neighborhood in a variety of different directions and giving insight into which features of the scene are responsible for the failure. The original failure can then be generalized to a broader scenario describing a class of inputs on which the model misbehaves, which can in turn be used for retraining. We selected one scene from our first experiment, shown in Fig. 15, consisting of a single car viewed from behind at a slight angle, which \(M_\mathrm {generic}\) wrongly classified as three cars (thus having 33.3% precision and 100% recall). We wrote several scenarios which left most of the features of the scene fixed but allowed others to vary. Specifically, scenario (1) varied the model and color of the car, (2) left the position and orientation of the car relative to the camera fixed but varied the absolute position, effectively changing the background of the scene, and (3) used the mutation feature of Scenic to add a small amount of noise to the car’s position, heading, and color. For each scenario we generated 150 images and evaluated \(M_\mathrm {generic}\) on them. As seen in Table 7, changing the model and color improved performance the most, suggesting they were most relevant to the misclassification, while local position and orientation were less important and global position (i.e. the background) was least important.

Fig. 15
figure 15

The misclassified image, with the predicted bounding boxes

Table 7 Performance of \(M_\mathrm {generic}\) on different scenarios representing variations of the image in Fig. 15

To investigate these possibilities further, we wrote a second round of variant scenarios, also shown in Table 7. The results confirmed the importance of model and color [compare (2)–(7)], as well as angle [compare (5)–(6)], but also suggested that being close to the camera could be the relevant aspect of the car’s local position. We confirmed this with a final round of scenarios [compare (5) and (8)], which also showed that the effect of car model is small among scenes where the car is close to the camera [compare (4) and (9)].

Having established that car model, closeness to the camera, and view angle all contribute to poor performance of the network, we wrote broader scenarios capturing these features. To avoid overfitting, and since our experiments indicated car model was not very relevant when the car is close to the camera, we decided not to fix the car model. Instead, we specialized the generic one-car scenario from our first experiment to produce only cars close to the camera. We also created a second scenario specializing this further by requiring that the car be viewed at a shallow angle.

Finally, we used these scenarios to retrain \(M_\mathrm {generic}\), hoping to improve performance on its original test set \(T_\mathrm {generic}\) (to better distinguish small differences in performance, we increased the test set size to 400 images). To keep the size of the training set fixed as in the previous experiment, we replaced 400 one-car images in \(X_\mathrm {generic}\) (10% of the whole training set) with images generated from our scenarios. As a baseline, we used images produced with classical image augmentation techniques implemented in imgaug (Jung 2018). Specifically, we modified the original misclassified image by randomly cropping 10–20% on each side, flipping horizontally with probability 50%, and applying Gaussian blur with \(\sigma \in [0.0, 3.0]\).

The results of retraining \(M_\mathrm {generic}\) on the resulting data sets are shown in Table 8. Interestingly, using classical augmentation actually decreased performance, presumably due to overfitting to relatively slight variants of a single image. On the other hand, replacing part of the data set with specialized images of cars close to the camera significantly reduced the number of false positives like the original misclassification (while the improvement for the “shallow angle” scenario was less, perhaps due to overfitting to the restricted angle range). This demonstrates how Scenic can be used to improve performance by generalizing individual failures into scenarios that capture the essence of the problem but are broad enough to prevent overfitting during retraining.

Table 8 Performance of \(M_\mathrm {generic}\) after retraining, replacing 10% of \(X_\mathrm {generic}\) with different data

8 Related work

Synthetic data generation. There has been a large amount of work on generating synthetic data for specific applications, including text recognition (Jaderberg et al. 2014), text localization (Gupta et al. 2016), robotic object grasping (Tobin et al. 2017), and autonomous driving (Johnson-Roberson et al. 2017; Filipowicz et al. 2017). Closely related is work on domain adaptation, which attempts to correct differences between synthetic and real-world input distributions. Domain adaptation has enabled synthetic data to successfully train models for several other applications including 3D object detection (Liebelt et al. 2010; Stark et al. 2010), pedestrian detection (Vazquez et al. 2014), and semantic image segmentation (Ros et al. 2016). Such work provides important context for our paper, showing that models trained exclusively on synthetic data (possibly domain-adapted) can achieve acceptable performance on real-world data. The major difference in our work is that we provide, through Scenic, language-based systematic data generation for any cyber-physical system.

A closely-related area is that of Generative Adversarial Networks (GANs) (Goodfellow et al. 2014a), a particular kind of neural network able to generate realistic synthetic data, which has been used to augment training sets (Liang et al. 2017; Marchesi 2017). The difference with Scenic is that GANs require an initial training set/pretrained model and do not easily incorporate declarative constraints, while Scenic produces synthetic data in an explainable, programmatic fashion requiring only a simulator. At present, achieving precise control over the contents of images generated by GANs is challenging. However, in future it would be interesting to explore using GANs in combination with Scenic, either to improve the realism of the generated data (as in domain adaptation), or more interestingly, using Scenic to generate some of the latent variables of the GAN, thereby providing some level of controllability.

Robustness checking and adversarial ML. Adversarial machine learning (Szegedy et al. 2014) is a field which focuses on the analysis of ML algorithms against adversarial attacks and the design of models robust to such attacks. Some of these methods generate misclassified examples by looking at the model gradient and by finding minimal input perturbations that lead to a misclassification (Szegedy et al. 2013; Goodfellow et al. 2014b; Moosavi-Dezfooli et al. 2016; Nguyen et al. 2015). Other techniques assume the model to be gray/black-box and focus on input modifications or high-level properties of the model (Pei et al. 2017; Dreossi et al. 2017, 2018). Based on these analyses, some works have explored the idea of using adversarial examples (i.e. misclassified examples) to retrain and improve ML models (e.g., Xu et al. 2016; Wong et al. 2016; Goodfellow et al. 2014b; Dreossi et al. 2018). Our work on Scenic is complementary to most prior work on adversarial ML, which usually considers attacks consisting of small pixel-level perturbations to the input images. By contrast, Scenic is part of a line of work on semantic adversarial ML (Dreossi et al. 2018), enabling search through a space of meaningful, high-level features rather than individual pixel values.

Model-based test generation. Techniques using a model to guide test generation have long existed (Broy et al. 2005). A popular approach is to provide example tests, as in mutational fuzz testing (Sutton et al. 2007) and example-based scene synthesis (Fisher et al. 2012). While these methods are easy to use, they do not provide fine-grained control over the generated data. Another approach is to give rules or a grammar specifying how the data can be generated, as in generative fuzz testing (Sutton et al. 2007), procedural generation from shape grammars (Müller et al. 2006), and grammar-based scene synthesis (Jiang et al. 2018). While grammars allow much greater control, they do not easily allow enforcing global properties. This is also true when writing a program in a domain-specific language with nondeterminism (Elmas et al. 2013). Conversely, constraints as in constrained-random verification (Naveh et al. 2006) allow global properties but can be difficult to write. Scenic improves on these methods by simultaneously providing fine-grained control, enforcement of global properties, specification of probability distributions, and simple imperative syntax.

Probabilistic programming languages. The semantics (and to some extent, the syntax) of Scenic are similar to that of other probabilistic programming languages such as Prob (Gordon et al. 2014), Church (Goodman et al. 2008), and BLOG (Milch et al. 2004). In probabilistic programming the focus is usually on inference rather than generation (the main application in our case), and in particular to our knowledge probabilistic programming languages have not previously been used for test generation. However, the most popular inference techniques are based on sampling and so could be directly applied to generate scenes from Scenic programs, as we discussed in Sect. 6.

Several probabilistic programming languages have been used to define generative models of objects and scenes: both general-purpose languages such as WebPPL (Goodman and Stuhlmüller 2014) (see, e.g., Ritchie (2016)) and languages specifically motivated by such applications, namely Quicksand (Ritchie 2014) and Picture (Kulkarni et al. 2015). The latter are in some sense the most closely-related to Scenic, although neither provides specialized syntax or semantics for dealing with geometry or dynamic behaviors (Picture also was used only for inverse rendering, not data generation). The main advantage of Scenic over these languages is that its domain-specific design permits concise representation of complex scenarios and enables specialized sampling techniques.

Scenario description languages for autonomous driving. Recently, formal dynamic scenario description languages have been proposed for the domain of autonomous driving. The Paracosm language (Majumdar et al. 2019) is used to model dynamic scenarios with a reactive and synchronous model of computation. However, it is not a PPL, so it lacks probability distributions and declarative constraints; it also does not provide constructs like Scenic’s interrupts which allow easy customization of generic behavior models. The Measurable Scenario Description Language (M-SDL) (Foretellix 2020) does provide declarative constraints, as well as compositional features similar to those we introduced in this paper. However, compared to both of these languages (which were introduced after the first version of this paper), Scenic has several distinguishing features: (1) it provides a much higher-level, declarative way of specifying geometric constraints; (2) it is fundamentally a probabilistic programming language (as opposed to M-SDL where distributions are optional), and (3) it is not specific to the autonomous driving domain (as demonstrated in Fremont et al. (2019, 2020)).

9 Conclusion

In this paper, we introduced Scenic, a probabilistic programming language for specifying distributions over configurations of physical objects and the behaviors of dynamic agents. We showed how Scenic can be used to generate synthetic data sets useful for a variety of tasks in the design of robust ML-based cyber-physical systems. Specifically, we used Scenic to generate specialized test sets and falsify a system, improve the robustness of a system by emphasizing difficult cases in its training set, and generalize from individual failure cases to broader scenarios suitable for retraining. In particular, by training on hard cases generated by Scenic, we were able to boost the performance of a car detector neural network (given a fixed training set size) significantly beyond what could be achieved by prior synthetic data generation methods (Johnson-Roberson et al. 2017) not based on PPLs.

In future work we plan to conduct experiments applying Scenic to a variety of additional domains, applications, and simulators. As we mentioned in the Introduction, we have already successfully applied Scenic to aircraft (Fremont et al. 2020), and we are currently investigating applications in further domains including underwater vehicles and indoor robots. We also plan to extend the Scenic language itself in several directions, including allowing user-defined specifiers and describing 3D scenes. Finally, we are exploring ways to combine Scenic with automated analyses: in particular, reducing the human burden of writing Scenic programs through algorithms for synthesizing or adapting such programs (e.g. Kim et al. (2020)), and improving the efficiency of falsification by performing white-box analyses of the system.