Extensive literature has been published about well-placement optimization; authors have proposed many different approaches. Real field experience, however, has shown that these techniques are rarely used. In practice, expert engineers usually identify the main factors affecting a future well’s production. They consider these factors in obtaining the best well configurations. These are then evaluated with a full-field reservoir-simulation model. Uncertainty is handled in the process typically in a second phase to test the robustness of the proposed development plan by running a few alternative models corresponding to some low and high cases and, more rarely, on a full Monte Carlo set of models generated by the full prior-model uncertainty.

The reasons the different optimization methods proposed in the literature are not used in operations can vary: difficulty in handling all the operational constraints during optimization, a prohibitive number of simulation runs, relatively poor performance of the optimization methods when considering many complex production and injection wells, and the complexity of the method and its configuration. The approach proposed here is not a proper well-placement optimization method; it is more a set of data-analytics tools aimed at helping reservoir engineers to explore the extremely large domain of possible solutions and, it is hoped, to find better solutions. This paper’s objective is to provide a methodology and some guidelines that can be adapted to each specific field case. The idea is to follow the typical work flow used by reservoir engineers while adding data-analytics techniques to improve the decision-making process.

This work presents a methodology to address the problem of field development planning mainly for greenfields, although a similar work flow can be applied to mature fields. The first step in field development planning is to perform a study to incorporate geological, seismic, and fluid data into a reservoir-simulation model.

A schematic idea of the ideal field development also is presumed to be available. More precisely, the engineer will decide whether to use horizontal or vertical wells and the total number of wells and define an area of investigation for each well and its operational constraints. The area of investigation could be the entire field eventually, but, particularly in offshore fields, a restricted zone can be determined on the basis of platform locations. Once the investigation area is determined for a given well, the second step is to generate possible trajectories.

Well-Trajectories Generation. The search algorithm is based on a systematic brute-force approach. All locations to be tested need to be predefined. In practical terms, this is accomplished by providing rules for a search pattern, and the algorithm will place completions inside the reservoir model on the basis of these rules. The search pattern itself is based on a circular concept with the wellhead at the center. The rules are set while keeping in mind a tradeoff between computational performance, resolution of investigation, desired well design, and drilling constraints. One must define wellhead location, inner and outer investigation radii for start and end of the completion, azimuths of investigation directions, range of dips, maximum completion length, and a target-zone definition. The target zone is defined as a reservoir property cell by cell and indicates in which cells a completion can and cannot be placed. Some examples of possible well schemes for horizontal wells are shown in Fig. 1.

Well-Features Extraction and Computation. The next step concerns the definition of the features used to characterize each well trajectory. Two set of features are distinguished: features computed along the cells crossed by a well trajectory and features computed on connected bodies crossed by trajectories. A connected body in a reservoir is defined as a set of cells connected to each other by at least one cell with a transmissibility higher than a given threshold.

The connected-body notion is central to this approach because it provides information about drainage areas for each well without having to evaluate it with a simulation; however, the calibration of the transmissibility threshold used to compute connected bodies should be conducted according to the drained areas observed by use of a full-field simulation.

The along-well features are computed by summing or averaging the cell properties crossed by the well, whereas, for connected-bodies features, the properties of all connected bodies crossed by the well are summed or averaged. The corresponding features for each well trajectory are then computed for each different model realization. Note that, for very large models with millions of cells, this could be time-consuming; therefore, a selection of a reduced number of geological realizations may be necessary at this stage. For each combination of trajectories, the average of each feature for all the realizations is then computed. At this stage, the decision can be made whether to apply a cutoff on the values of some feature average. For example, the decision can be made not to have wells too close to each other or not to have wells too close to the aquifer.

Clustering of Well Configurations. Once the threshold has been applied, the next step is to select configurations to be evaluated by the reservoir simulator. These configurations should be as different as possible. To accomplish this, a clustering algorithm is applied on the features matrix composed of the features values for each well and configuration. The K-medoid algorithm (a variation of the well-known K-means algorithm) is used for selecting N configurations that are the most distant in the features space and that should allow the machine-learning algorithm to learn the most different well configurations and, it is hoped, be more predictive. The selected configurations then are simulated for all the chosen realizations.

Machine-Learning-Model Construction. Now, all of the ingredients are available to start training a machine-learning model to evaluate new well configurations. The inputs for the model are the different features computed for each well, and the output is the reservoir-simulation output that typically will be cumulative oil production, recovery factor, or the net present value at a certain future time. Note, however, that, if a goal is to predict cumulative production for the field, the features evaluated for each well will have to be used. Usually, between five and 10 features will be considered for each well; therefore, if the field has more than five wells, the number of features can become quite high. Moreover, the cumulative function will be complex and difficult to learn. In order to simplify the learning task, each cumulative production for each well is going to be learned independently. In this way, the problem is simplified and more information is provided to the system because it is informed about the production of each separate well and not only the total field production.

Different machine-learning algorithms can then be tested to find the best model (this study tested neural networks, random forests, and gradient boosting). To qualify the accuracy of each method, the data is divided into a training set and a test set consisting of 80 and 20% of the total dataset, respectively. After the model has been validated, it can be used to evaluate millions of new well configurations. Depending on the number of well configurations, either they all can be evaluated or an optimization method can be used to find the best possible configurations. The machine-learning algorithm is only an approximation of the simulator result; therefore, it will be used to identify a certain number of best solutions (typically a dozen of them) that will then be evaluated with the reservoir simulator.

A novel data-analytics work flow is presented to assist reservoir engineers in finding optimal well locations for greenfields. The method is based on heuristic (expert-based) definitions of geological features that can be used to train machine-learning regression algorithms to predict the final cumulative production of new trajectories by training them with a reasonable number of reservoir simulations (a few thousands). Geological uncertainty provided by different possible geological realizations also is taken into account. The feasibility of this approach has been proved on a synthetic oil field with three horizontal wells to optimize. Application on a real field case is ongoing.