Understanding the WITSML format in the Volve Oilfield dataset

The drilling technical data of the Volve Oilfield Wells is stored in WITSML format and is available under the section Volve\WITSML Realtime drilling data\.

WITSML is the acronym for Well-site Information Transfer Standard Markup Language. It is a standard for transmitting well-site data from the rig to different stakeholders like operating companies, service companies, drilling contractors, application vendors and regulatory agencies.

A description of this section is also available here. The Volve Oilfield dataset contains the following WITSML objects, based on version 1.0 of WITSML:

  • Bottom Hole Assembly (BHA) Run: The bhaRun object is used to capture information about one run of the drill string into and out of the hole. The drill string configuration itself is described in the tubular object, so one configuration may be used for many runs.
  • Well logs: The log object is used to capture the curves on a well log. Here, it represents the GasTime, GlcSettingsLog, TripConn, GenTime, GwdTime, CementData, GenTime2, Hydraulics, Pits and TripTime logs.
  • Message: The message object is used to provide a central location for informative time stamped information about another well related object. These messages can represent alarms, warnings, events, etc.
  • Rig: The rig object is used to capture information about a drilling rig used to drill a wellbore.
  • Trajectory: The trajectory object is used to capture information about a directional survey in a wellbore. It contains many trajectory stations to capture the information about individual survey points.
  • Tubular: The tubular object is used to capture information about the configuration of a drill string.
  • Wellbore Geometry: The wbGeometry object is used to capture information about the configuration of the permanently installed components in a wellbore. It does not define the transient drilling strings or the hanging production components.

(Ref: WITSML version 1.4.1 schema)
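
As a rough sketch of how these object types show up on disk: in the file paths used later in this article, the object type (trajectory, log) appears as a folder name inside each well folder. Assuming the other objects follow the same convention (the folder names bhaRun, message, rig, tubular and wbGeometry are an assumption, and count_witsml_files is a hypothetical helper), the XML files per object type could be counted like this:

```python
from collections import Counter
from pathlib import Path

# Object-type folder names. 'trajectory' and 'log' appear in the paths used
# in this article; the others are ASSUMED to follow the same naming.
OBJECT_TYPES = {"bhaRun", "log", "message", "rig",
                "trajectory", "tubular", "wbGeometry"}


def count_witsml_files(root):
    """Count .xml files under `root`, grouped by the object-type folder in their path."""
    counts = Counter()
    for xml_file in Path(root).rglob("*.xml"):
        # Folder names between `root` and the file itself
        folders = xml_file.relative_to(root).parts[:-1]
        for part in folders:
            if part in OBJECT_TYPES:
                counts[part] += 1
                break  # count each file only once
    return counts
```

Calling count_witsml_files on a well folder returns a Counter keyed by object type, which gives a quick overview of what a well actually contains before any parsing is attempted.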

Potential applications of this dataset

The WITSML data may be used for well deviation calculation, geomechanical analysis, drilling operation activity analysis, prediction of drilling problems such as stuck pipe, kicks and losses, rig performance analytics, benchmarking, non-productive time (NPT) analysis, uncertainty analysis and geo-steering.

This article encourages its readers to take up analyses that demonstrate the above techniques.

Understanding the XML format

Extensible Markup Language (XML) is a simple text-based format for representing structured information: documents, data, configuration. XML documents form a hierarchical structure that looks like a tree.

The first line is the XML declaration. It defines the XML version (1.0) and the encoding used (ISO-8859-1 = Latin-1/West European character set).

The next line is the root element of the document (in WITSML files this is a plural container such as trajectorys or logs), and the following lines describe the child elements of the root.

Image: XML format of the WITSML datatype
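
To make the tree structure concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The XML snippet is a hand-written, heavily simplified stand-in for a trajectory file (its values and uid attributes are illustrative, not taken from the dataset), and walk is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified stand-in for a WITSML trajectory document.
SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<trajectorys>
  <trajectory uid="demo">
    <name>Demo trajectory</name>
    <trajectoryStation uid="s1">
      <md uom="m">0.0</md>
      <tvd uom="m">0.0</tvd>
    </trajectoryStation>
  </trajectory>
</trajectorys>"""


def walk(element, depth=0):
    """Return one indented line per element, depth-first, to show the tree."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(walk(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE)   # the root element is <trajectorys>
print("\n".join(walk(root)))
```

Each level of indentation in the printed output corresponds to one level of nesting in the XML: the root, the trajectory, then its name and station, and finally the measurements inside the station.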

Documentation on the WITSML format is available on Energistics Online (EO). A few important links for understanding the format are:

Alfonso R Reyes, in his blog, provides a detailed lecture on the trajectory data file and goes on to explain its hierarchical data structures.

Access WITSML data

Current Literature

Aaron Olsen in his article WITSML import via Pandas and Python describes the procedure to connect to the Azure storage via Python and provides a method to parse the log and trajectory files into a pandas data frame. He also describes the log and trajectory files, helping the reader understand the data.

Alejandro Primera in his video posted on the Orkahub Energy YouTube channel describes the procedure to read the trajectory file of the WITSML database by parsing it to a pandas data frame and also generating 3D Plotly plots.

Alfonso R Reyes in his blog Exploring drilling data from the Volve dataset with WITSML and R provides the procedure to parse the trajectory XML file using R and develops a couple of functions to analyze the data.

Jonny Corcutt in his post Accessing Equinor’s Volve EDM Data presents a procedure to access the EDM (Engineers Data Model) file in the Volve dataset.

Generic Code to access WITSML data

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.

Here, we provide the procedure to parse a downloaded WITSML datafile. The procedure to access the Volve dataset file is available here. The WITSML datafiles are available under the folder, volve/WITSML Realtime drilling data/.

## Import Libraries
import numpy as np
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D

Trajectory file

## Opening the F-15 well Trajectory file
WITSML_file = r"C:\Users\Ankit Bansal\Desktop\Volve\Volve - Real Time drilling data 13.05.2018\sitecom14.statoil.no\Norway-Statoil-NO 15_$47$_9-F-15\1\trajectory\1.xml"

# Reading the WITSML file
with open(WITSML_file) as f:
    data = f.read()

## Parse the WITSML file using the Beautiful Soup library
# (the 'xml' parser requires the lxml package to be installed)
data_xml = BeautifulSoup(data, 'xml')

# Print the unique tag names found in the file
temp = set(str(tag.name) for tag in data_xml.find_all())
print("\n".join(temp))

Result:
tvd
azi
gravAccelCorUsed
magXAxialCorUsed
magTotalFieldReference
name
trajectorys

columns = ['azi', 'md', 'tvd', 'incl', 'dispNs', 'dispEw']
df = pd.DataFrame()
for col in columns:
    df[col] = [float(x.text) for x in data_xml.find_all(col)]
print(df)
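
The column-by-column loop above assumes that every station contains every tag; if one station lacked, say, dispNs, the columns would no longer line up. A more defensive variant, sketched here with the standard library's xml.etree.ElementTree rather than Beautiful Soup, reads one row per trajectoryStation and fills gaps with None. The sample XML, its namespace URI and the helper names local_name and stations_to_rows are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Illustrative sample: two stations, the second one has no <dispNs>.
SAMPLE = """<trajectorys xmlns="http://www.witsml.org/schemas/1series">
  <trajectory>
    <trajectoryStation>
      <md uom="m">0.0</md><tvd uom="m">0.0</tvd><dispNs uom="m">-3.17</dispNs>
    </trajectoryStation>
    <trajectoryStation>
      <md uom="m">145.9</md><tvd uom="m">145.9</tvd>
    </trajectoryStation>
  </trajectory>
</trajectorys>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def stations_to_rows(xml_text, columns):
    """Return one dict per trajectoryStation; missing values become None."""
    rows = []
    for element in ET.fromstring(xml_text).iter():
        if local_name(element.tag) != "trajectoryStation":
            continue
        # Map each child tag of this station to its text content
        values = {local_name(child.tag): child.text for child in element}
        rows.append({col: float(values[col]) if col in values else None
                     for col in columns})
    return rows


rows = stations_to_rows(SAMPLE, ["md", "tvd", "dispNs"])
print(rows)
```

The resulting list of dictionaries can be passed directly to pd.DataFrame(rows) to obtain the same layout as df, with missing values shown as NaN.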

Result:

         azi           md          tvd      incl       dispNs       dispEw
 0  0.000000     0.000000     0.000000  0.000000    -3.170000     3.530000
 1  0.000000   145.899994   145.899994  0.000000    -3.170000     3.530000
 2  4.427726   153.399994   153.399985  0.002618    -3.172757     3.520578
 3  2.548530   234.199997   234.199627  0.004887    -3.366177     3.529412
 4  2.377313   274.600006   274.597786  0.013614    -3.646544     3.774881
..       ...          ...          ...       ...          ...          ...
75  4.469265  3118.280029  2938.419040  0.382925  -699.230295  -217.451880
76  4.471534  3158.540039  2975.709107  0.390081  -702.866857  -232.185777
77  4.470487  3199.020020  3013.291014  0.371232  -706.461696  -246.789120
78  4.460363  3211.780029  3025.193506  0.366170  -707.585751  -251.248326
79  4.460363  3232.000000  3044.073003  0.366170  -709.391062  -258.259222
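
The azi and incl values never exceed 2π in the rows shown, which suggests they are stored in radians. This is an inference from the numbers only; the uom attribute of the corresponding tags in the file should be checked to confirm it. Under that assumption, a minimal conversion to degrees looks like this (the inputs are the last station from the output above):

```python
import math

# Last station from the output above; ASSUMED to be in radians.
azi_rad = 4.460363
incl_rad = 0.366170

azi_deg = math.degrees(azi_rad)
incl_deg = math.degrees(incl_rad)

print(f"azimuth     = {azi_deg:.1f} deg")
print(f"inclination = {incl_deg:.1f} deg")
```

For the whole data frame, the same conversion can be applied column-wise, for example with np.degrees(df['azi']).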

## Plot the trajectory
fig = plt.figure()
ax = fig.add_subplot(projection='3d')

# define the axis parameters
ax.plot(df['dispNs'], df['dispEw'], df['tvd']*-1, '-r', linewidth = 2)

# format the plot
ax.set_xlabel('dispNs', size=20, labelpad=30)
ax.set_ylabel('dispEw', size=20, labelpad=30)
ax.set_zlabel('tvd', size=20, labelpad=30)
ax.tick_params(labelsize=15)

#set plot aspect ratio
ax.get_proj = lambda: np.dot(Axes3D.get_proj(ax), np.diag([0.5, 0.5,1.5, 1]))

fig.show()

Image: Well trajectory of NO 15/9-F-15

Log file

## Opening the F-15 well log file
WITSML_file = r"C:\Users\Ankit Bansal\Desktop\Volve\Volve - Real Time drilling data 13.05.2018\sitecom14.statoil.no\Norway-Statoil-NO 15_$47$_9-F-15\2\log\1\3\3\00001.xml"

# Reading the WITSML file
with open(WITSML_file) as f:
    data = f.read()

## Parse the WITSML file using the Beautiful Soup library
soup = BeautifulSoup(data, 'xml')

# Collect the unique tag names in the file
temp = set(str(tag.name) for tag in soup.find_all())

# Print the comma-separated list of log mnemonics
print(soup.find_all('mnemonicList')[0].text)

Result:
Depth,LAGMWT,Time,EditFlag,TORQUE,MOTOR_RPM,STRATESUM,MWOUT,LAGMWDIFF,MWIN,BIT_RPM,DXC,MUDRETDEPTH,PUMP,LAGMTEMP,RigActivityCode,MRIN,FLOWOUT,ROP_AVG,LAGMRES,TOTGAS,MROUT,MTIN,LAGMRDIFF,FLOWIN,WOB,ONBOTTOM_TIME,ECDBIT,MTOUT,BIT_DIST,SURF_RPM,LAGMTDIFF

## Process the log files
# Get the names (mnemonics) of the logs in the file
log_names = soup.find_all('mnemonicList')

# Get the units of the logs in the file
unit_names = soup.find_all('unitList')

# Use 'mnemonic - unit' as the column header; this simplifies the pandas dataframe format
header = [i + ' - ' + j for i, j in zip(log_names[0].string.split(","), unit_names[0].string.split(","))]
print(header)

# Define our pandas dataframe: the columns are the header (a concatenation of the mnemonic and the unit),
# and the rows are parsed by looping over every comma-separated list found under a data tag.
data = soup.find_all('data')
df = pd.DataFrame(columns=header,
                  data=[row.string.split(',') for row in data])

# Replace blank values with NaN (np.NaN was removed in NumPy 2.0, so use np.nan)
df = df.replace('', np.nan)
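
All values in df are still strings at this point, and the units are SI (for example ROP_AVG in m/s). As an illustration of the clean-up that usually follows, the sketch below converts one row of raw text to floats where possible and rescales the rate of penetration to metres per hour. It uses plain Python so that it runs without pandas; to_number and the sample row are hypothetical.

```python
def to_number(text):
    """Convert a raw log value to float; blanks become None, other text is kept."""
    text = text.strip()
    if text == "":
        return None
    try:
        return float(text)
    except ValueError:
        return text  # e.g. a timestamp in the Time column


# Hypothetical row, shaped like one comma-separated <data> entry.
header = ["Depth - m", "Time - unitless", "ROP_AVG - m/s", "WOB - N"]
raw_row = "1500.0,2009-01-01T00:00:00Z,0.005,"

row = dict(zip(header, (to_number(v) for v in raw_row.split(","))))

# m/s -> m/h, a more familiar unit for rate of penetration
rop_m_per_h = row["ROP_AVG - m/s"] * 3600
print(row)
print(rop_m_per_h)
```

With pandas, the equivalent per-column conversion is pd.to_numeric(df[col], errors="coerce"), which turns anything non-numeric into NaN.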

Result
['Depth - m', 'LAGMWT - kg/m3', 'Time - unitless', 'EditFlag - unitless', 'TORQUE - N.m', 'MOTOR_RPM - c/s', 'STRATESUM - Hz', 'MWOUT - kg/m3', 'LAGMWDIFF - kg/m3', 'MWIN - kg/m3', 'BIT_RPM - c/s', 'DXC - unitless', 'MUDRETDEPTH - m', 'PUMP - Pa', 'LAGMTEMP - K', 'RigActivityCode - unitless', 'MRIN - ohm.m', 'FLOWOUT - m3/s', 'ROP_AVG - m/s', 'LAGMRES - ohm.m', 'TOTGAS - Euc', 'MROUT - ohm.m', 'MTIN - K', 'LAGMRDIFF - ohm.m', 'FLOWIN - m3/s', 'WOB - N', 'ONBOTTOM_TIME - s', 'ECDBIT - kg/m3', 'MTOUT - K', 'BIT_DIST - m', 'SURF_RPM - c/s', 'LAGMTDIFF - K']

## Plot the logs
fig, (ax1, ax2) = plt.subplots(1,2,figsize=(5,12))
ax1.set_aspect('auto')
ax1.plot(pd.to_numeric(df['ROP_AVG - m/s']), pd.to_numeric(df['Depth - m']), '-g', linewidth = 2)
ax1.set_xlabel('ROP (m/s)', size=20)
ax1.set_ylabel('Depth (m)', size=20)
ax1.tick_params(labelsize=15, rotation = 90)


ax2.set_aspect('auto')
ax2.plot(pd.to_numeric(df['WOB - N']), pd.to_numeric(df['Depth - m']), '-b', linewidth = 2)
ax2.set_xlabel('WOB (N)', size=20)
ax2.set_ylabel('Depth (m)', size=20)
ax2.set_xlim(-10000, 100000)
ax2.tick_params(labelsize=15, rotation = 90)
fig.tight_layout()
fig.show()


