### CDT – The Climate Data Toolbox for MATLAB

Today’s post is about CDT, the Climate Data Toolbox for MATLAB. The development of this new toolbox has been speerheaded by Chad A. Greene, an active long-time contributor to the MATLAB community and scientist at the Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA. CDT has been released and an accompanying paper has just been published (Greene et al., 2019).

CDT is special in many respects. First of all, it contains a versatile set of MATLAB functions for analyzing and displaying climate data, be it gridded, spatiotemporal or time series data. The functions are computationally efficient and quite easy to use. Second, CDT comes with many tutorials that describe not only how to use CDT functions, but also detail mathematical backgrounds, and offer guidance on how to interpret the results in the context of Earth science processes.

As geomorphologist, I am often working with climate data. In fact, my diploma thesis was on dust mobilization in the Sahara and its large-scale meteorological controls (Schwanghart and Schütt, 2008). In retrospective, CDT would have helped me a lot in learning the computational tools required for the analysis. Too bad, CDT was not available at this time.

Give CDT a try. Chad Greene’s website has the documentation online. If you install CDT (download from github or the MATLAB file exchange), this documentation will be available in the MATLAB help browser, too.

**References**

Greene, C.A., Thirumalai, K., Kearney, K.A., Delgado, J.M., Schwanghart, W., Wolfenbarger, N.S., Thyng, K.M., Gwyther, D.E., Gardner, A.S., Blankenship, D.D., 2019. The Climate Data Toolbox for MATLAB. *Geochemistry, Geophysics, Geosystems*, accepted [DOI: 10.1029/2019GC008392]

Schwanghart, W., Schütt, B. (2008): Meteorological causes of Harmattan dust in West Africa. *Geomorphology*, 95, 3-4, 412-428. [DOI: 10.1016/j.geomorph.2007.07.002]

### TopoToolbox dependencies

TopoToolbox is a MATLAB software. MATLAB consists of the core product MATLAB and can be extended by numerous toolboxes that have specific purposes, and that cost extra. TopoToolbox also comes as a toolbox, and is free. When typing ver in the command line, TopoToolbox should be listed together with the other available toolboxes.

TopoToolbox relies mainly on core MATLAB functions, but unfortunately it also requires a few toolboxes. Many universities and academic institutions may have licences for these toolboxes, but if you are working for a private company or some governmental administration, access to these toolboxes may be limited, often due to monetary reasons. Accordingly, usability of TopoToolbox may be limited for many.

Dirk and I are well aware of this problem. Thus, we have designed TopoToolbox in a way that keeps toolbox dependencies at a minimum. However, this is not completely feasible. This is because, first, we only have limited time available for our TopoToolbox work. Writing some functions of other Toolboxes ourself would certainly be possible, but it is likely that they would not run with the same efficiency and speed. It would also mean reinventing the wheel. Second, we could make use of external and free third-party toolboxes and libraries that are available on the MATLAB file exchange and github. However, that would also mean maintaining these dependencies which can be difficult if there are major changes. Of course, MATLAB toolboxes are also changed frequently, but these changes are usually well documented in the release notes. Thus, to secure a positive user-experience we largely refrain from using third-party toolboxes.

What are toolbox dependencies of TopoToolbox?

DEM analysis heavily relies on image processing algorithms. In fact, gridded DEMs are just georeferenced gray-scale images. Thus, TopoToolbox requires the Image Processing (IP) Toolbox which provides access to numerous filtering and morphological algorithms (e.g. image erosion and dilation, gray-weighted distance transforms). If you don’t have the IP toolbox, you will quickly encounter errors that state that a particular function is missing. If you don’t have the IP Toolbox but you want to still use TopoToolbox, I’d recommend using dipimage, an excellent image processing toolbox. But this would mean making numerous changes to the code. I won’t say it is impossible, but it’ll be quite some work.

Then, you should have the Mapping Toolbox. If you don’t have it, reading and writing geocoded data (geotiffs, shapefiles) is a bit tedious. Moreover, some of the functions (e.g. STREAMobj2latlon) would not work because they rely on the Mapping Toolbox functions that transform coordinates from projected to geographic systems and vice versa. TopoToolbox works well without georeferencing, but having the Mapping Toolbox is much more convenient.

Some new functions (crs, smooth (only if output is positive only), quantcarve) need the Optimization Toolbox because they rely on large-scale sparse linear and quadratic programming algorithms. If you don’t have the Optimization Toolbox, you won’t be able to run these functions. There are some third-party optimization products available, but I haven’t looked at alternatives so far. If you know good alternative toolboxes that are free and implement large-scale sparse optimization techniques, please let me know. mnoptim uses the bayesopt function that come with the Statistics and Machine Learning Toolbox.

Finally, some of the functions are coded so that they make use of the Parallel Computing Toolbox. However, they run without this toolbox, too.

All in all, to run TopoToolbox you should definitely have the Image Processing Toolbox and the Mapping Toolbox. Some functions require other toolboxes, but perhaps you don’t really need these functions. Hope I didn’t forget any dependencies. If I did, please let me know.

### Meet us at the EGU 2019

Dirk and I will attend the EGU this year. And as luck would have it, our presentations on point processes on river networks (PICO4.7 | EGU2019-18489) and on identification and ordering of divides (PICO4.6 | EGU2019-14979) are in the same PICO session (**Tue, 09 Apr, 10:55–10:59, PICO spot 4**). Interactive presentation time is afterwards from **11:15-12:30**.

If you want to chat and discuss with us, please come to our interactive presentations. We will highlight some of the upcoming additions to TopoToolbox, and we are curious what you think of them.

### iceTEA – Tools for Exposure Ages from ice margins

*My guest blogger today is Richard Selwyn Jones from Durham University. He developed iceTEA, a set of MATLAB tools to calculate exposure ages. He kindly accepted my invitation to write here about his software. Thanks!
*

iceTEA (Tools for Exposure Ages) is a new toolbox for plotting and analysing cosmogenic-nuclide surface-exposure data. The tools were originally intended for data that are used to reconstruct past glacier and ice sheet geometries, but they can potentially be used for many other geochronology and geomorphology applications of cosmogenic nuclides.

The iceTEA tools are available via an online interface (http://ice-tea.org) and as MATLAB code with a ready-to-use front-end script for each tool (https://github.com/iceTEA-code). While the online version performs all primary analysis and plotting functionality, the code provides the user with greater flexibility to apply the tools for specific needs and also includes some additional options.

There are eight tools, which provide the following functionality: 1) calculate exposure ages from ^{10}Be and ^{26}Al data, 2) plot exposure ages as kernel density estimates and as a horizontal or vertical transect, 3) identify and remove outliers within a dataset, 4) plot nuclide concentrations on a two-isotope diagram and as a function of depth, 5) correct exposure ages for cover of the rock surface, 6) correct ages for changes in elevation through time, and estimate 7) average and 8) continuous rates of change (e.g. ice margin retreat or thinning).

Here is an example of how you can use the code to correct data for long-term uplift:

data_name = 'input_data.xlsx'; % File name used for sample data scaling_model = 'LSD'; % 'DE','DU','LI','ST','LM','LSD','LSDn' % Set elevation correction correction_type = 'rate'; % Select 'model' or 'rate' GIA_model = []; % If 'model', set GIA model to use - 'I5G' or 'I6G' elev_rate = 2; % If 'rate', set rate of elevation change (m/ka) sample_data = get_data(data_name); % Load sample data % Calculate Elevation-corrected Exposure Ages if strcmp(correction_type,'model') elev_input = GIA_model; elseif strcmp(correction_type,'rate') elev_input = elev_rate; end corrected = elev_correct(sample_data,scaling_model,correction_type,elev_input);

The corrected exposure ages can then be plotted as kernel density estimates:

% Plot settings feature = 1; % Data from single feature? yes = 1, no = 0 save_plot = 0; % Save plot? 1 to save as .png and .eps, otherwise 0 mask = []; % Select samples to plot (default is all) time_lim = []; % Optionally set x-axis limits (in ka) weighted = []; % Optionally select weighted (1) or unweighted (0) mean and standard deviation (default is weighted) % Plot figure plot_kernel(corrected.plot.corr,feature,save_plot,mask,time_lim,weighted);

The purpose of these tools is to allow users to explore the spatial and temporal patterns in their data in a consistent and inter-comparable way, and also to initiate discussion of further improvements in the application and analysis of surface-exposure data. We welcome suggestions for additional plotting or analysis tools.

**Reference**

Jones, R.S., Small, D., Cahill, N., Bentley, M.J. and Whitehouse, P.L., 2019. iceTEA: Tools for plotting and analysing cosmogenic-nuclide surface-exposure data from former ice margins. *Quaternary Geochronology*, 51, 72-86. [DOI: 10.1016/j.quageo.2019.01.001]

### Which tolerance should I choose when using knickpointfinder?

Kai Deng from the GFZ Potsdam commented on my previous post on knickpointfinder and asked the question about the parameter ‘tol’:

[…] one question on the parameter “tol”. You said that “the value of tol should reflect uncertainties that are inherent in longitudinal river profile data”. However, I’m still confused about how to translate this sentence into code, or in another word, how can I validate my choice of tol by some criteria based on codes?

Now that’s an excellent question. The ‘tol’-parameter determines how many knickpoints the finder will detect and we might not want to miss a knickpoint. At the same time, we know that river profile data often has some erroneous values and we wouldn’t want to pick knickpoints that are merely artefacts of the data. So, what value of ‘tol’ should we choose?

I wrote that “the value of tol should reflect uncertainties that are inherent in longitudinal river profile data”. More specifically, I would argue that one should choose tolerance values that are higher than the maximum expected error between the measured and the true river profile. With lower tolerance values we would likely increase the false positive rate, i.e., the probability that we choose a knickpoint that is due to an artefact.

Now, the problem that we immediately face is that we don’t know this error. Of course, there are quality measures for most common DEM products but usually these do not apply to values measured along valley bottoms, at least if you take common global DEM products such as SRTM, ASTER GDEM, etc (Schwanghart and Scherler, 2017). Usually, there are also no repeated measurements. Hence, we cannot estimate the error as we would in the case of measurements in the lab.

But there is another approach: What we know for sure is that river profiles should monotonously decrease downstream. At least, this is true for the water surface, yet not necessarily bed elevations (e.g. due to the presence of riffle-pool sequences). The elevation of a pixel on the stream network should thus be either lower or equal to the elevation of its upstream neighbor. If it is not, than the value is probably wrong (for whatever reason). I say ‘probably’, because the upstream pixel might have an erroneously low value, too.

The following code demonstrates how we could determine these errors: Through downstream minima imposition (imposemin) and upstream maxima imposition (cummaxupstream). The shaded areas in the thus generated plot (here I show a zoomed version) show the differences between these values.

DEM = GRIDobj('srtm_bigtujunga30m_utm11.tif'); FD = FLOWobj(DEM); S = STREAMobj(FD,'minarea',1000); S = klargestconncomps(S); zmax = cummaxupstream(S,DEM); zmin = imposemin(S,getnal(S,DEM)); plotdzshaded(S,[zmin zmax]) hold on plotdz(S,DEM,'color','k')

The maximum difference between the elevations is the maximum relative error that we estimate from the profile.

max(zmax-zmin) ans = 43

Now we might want to take this elevation difference as tolerance value for the knickpointfinder. However, this value might be very high if our profile is characterized by a few positive or negative outliers. This would result in only few if any knickpoints. Thus, we could come up with a more robust estimate of the error which is based on quantiles. Robust in this sense means: robust to outliers. And the proposed technique is **constrained regularized smoothing (CRS)** (Schwanghart and Scherler, 2017).

What is constrained regularized smoothing? It is an approach to hydrologically correct DEMs based on quantile regression techniques. While minima or maxima imposition present two extremes of hydrological conditioning – carving and filling, respectively – CRS (and quantile carving, which is similar but doesn’t smooth) is a hybrid technique and returns river longitudinal profiles that run along the tau’s quantile of the data. Why would this be preferred? Simply because we would not want that the shape of the resulting profile is predominantly determined by the errors with the largest magnitude.

Here is how to apply quantile carving using the quantiles 0.1 and 0.9.

z90 = crs(S,DEM,'K',1,'tau',0.9); z10 = crs(S,DEM,'K',1,'tau',0.1); max([z90-z10]) ans = 41.7629

Of course, the maximum difference is lower than the previous value obtained by minima and maxima imposition. But not very much. Let’s see how well the algorithm performs if we add some additional errors.

IX = randlocs(S,10); IX = find(ismember(S.IXgrid,IX)); z = getnal(S,DEM); z(IX) = z(IX) + sign(randn(numel(IX),1))*100; plotdz(S,z)

Let’s compute the difference between carving and filling.

zmin = imposemin(S,z); zmax = cummaxupstream(S,z); max(zmax-zmin) ans = 110

And whatabout our uncertainty estimate based on the CRS algorithm?

z90 = crs(S,z,'K',1,'tau',0.9); z10 = crs(S,z,'K',1,'tau',0.1); d = max(z90-z10) d = 41.7629

The CRS result is not affected at all by few outliers. That means, it enables us to derive a rather robust estimate about the uncertainty of the data.

Moreover, we can use the CRS-corrected profile as input to the knickpointfinder, because we have filtered the spikes from the data. As elevations are often overestimated, you might use directly the node-attribute list z10.

I have shown that you can use an uncertainty estimate derived from your river profile as direct input to knickpointfinder. There are, however, a few things to note:

- Don’t carve or fill your data before using CRS. Otherwise, your uncertainty estimate is flawed.
- The uncertainty estimate may not reliably detect the true uncertainty in your data. Alsways cross-check whether the uncertainty estimate is either too low or too high.
- I haven’t talked about the smoothness parameter ‘K’ which also controls the uncertainty estimates. Larger values of K will likely increase the uncertainty estimate.
- CRS can take a while for very large river networks. Consider estimating the tolerance value from a subset of the data.
- You can also use a node-attribute list of tolerances in knickpointfinder and thus account for spatially varying tolerances. In this case, however, add some offset because the CRS might indicate areas with zero uncertainty and thus overestimate the true precision of the data.

## References

Schwanghart, W., Scherler, D., 2017. Bumps in river profiles: uncertainty assessment and smoothing using quantile regression techniques. *Earth Surface Dynamics*, 5, 821-839. [DOI: 10.5194/esurf-5-821-2017]

### Working with knickpointfinder

I have learnt from numerous comments here on this blog, that the knickpointfinder function seems to be quite popular. However, working with the output of the function often entails some difficulties. So here is a short post about knickpointfinder.

Let’s load and prepare some data first:

DEM = GRIDobj('srtm_bigtujunga30m_utm11.tif'); FD = FLOWobj(DEM); S = STREAMobj(FD,'minarea',1000); S = klargestconncomps(S);

In these first four lines of code, I loaded the DEM of the Big Tujunga catchment, derived flow directions (stored in FD, an instance of FLOWobj), and then extracted the stream network S based on a minimum supporting drainage area of 1000 pixels which is about 0.9 sqkm. In the last line, I clipped the stream network to the largest connected stream network, that is in this case the largest drainage basin.

Now we are ready to run knickpointfinder:

[zs,kp] = knickpointfinder(S,DEM,'tol',30,'split',false);

The function takes the stream network S, the digital elevation model DEM and then a few parameter name-value pairs. knickpointfinder reconstructs to the actual longitudinal profile by iteratively removing a curvature constraint at those locations that have a maximum vertical offset to the actual profile. After few iterations, the vertical offset decreases and ‘tol’ sets the minimum vertical offset where the algorithm should stop. The value of tol should reflect uncertainties that are inherent in longitudinal river profile data (Schwanghart and Scherler, 2017).

‘split’ enables running the function in parallel. The function splits the stream network into connected components (see the function STREAMobj2cell) and sends each to individual workers. Parallel execution, however, is only faster if there are several drainage basins. Since we have only one in this example, I set ‘split’ to false. There are other options to control verbose and visual output and you can explore these options by typing

help knickpointfinder

Now, knickpointfinder returned two output variables: zs and kp. zs is a node-attribute list that contains the reconstructed river profile. You can plot it using plotdz.

plotdz(S,DEM) hold on plotdz(S,zs) hold off

kp is a structure array that holds all kind of information.

kp kp = struct with fields: n: 26 x: [26×1 double] y: [26×1 double] distance: [26×1 double] z: [26×1 double] IXgrid: [26×1 double] order: [26×1 double] dz: [26×1 double] nal: [6985×1 logical]

Admittedly, this is a bit messy and there will be some changes in the future. But this will still have to wait a bit. Now what do the structure fields contain? kp.n is 26. Hence, the algorithm has identified 26 knickpoints. kp.x and kp.y contain the coordinates and kp.z the elevations of the knickpoints. The distance from the outlet is stored in kp.distance. kp.IXgrid contains the linear indices into the DEM. TopoToolbox works a lot with linear indices and thus this output is very crucial. kp.order contains the order with which the knickpoints where identified. However, several knickpoints may have the same order and thus this information is not very meaningful. kp.dz is the vertical offset between the modelled profile at the time when the knickpoint was identified. Finally, kp.nal is a logical vector and a node-attribute list. It allows indexing into other node attribute lists.

Now, here are a few things you might want to do with the structure array kp.

## 1. Plot a map

imageschs(DEM) hold on plot(S,'k'); plot(kp.x,kp.y,'ko','MarkerFaceColor','w') hold off

## 2. Plot a longitudinal river profile

plotdz(S,DEM); hold on plot(kp.distance,kp.z,'ko','MarkerFaceColor','w') hold off

## 3. Plot a longitudinal river profile in chi space

A = flowacc(FD); c = chitransform(S,A,'mn',0.45); plotdz(S,DEM,'distance',c); hold on [~,locb] = ismember(kp.IXgrid,S.IXgrid); ckp = c(locb); plot(ckp,kp.z,'ko','MarkerFaceColor','w') hold off

xlabel('\chi [m]')

## 4. Export as shapefile

MS = STREAMobj2mapstruct(S); shapewrite(MS,'streamnet'); MKP = struct('X',num2cell(kp.x),'Y',num2cell(kp.y),'Geometry','Point'); shapewrite(MKP,'knickpoints');

Ok, that’s a basic work flow that you may apply easily to your own DEM. Give it a try.

## References

Schwanghart, W., Scherler, D., 2017. Bumps in river profiles: uncertainty assessment and smoothing using quantile regression techniques. *Earth Surface Dynamics*, 5, 821-839. [DOI: 10.5194/esurf-5-821-2017]

### Fill missing values

DEMs often have missing values. Reasons include lack of coherence in radar imagery or lack of backscattered signals in LiDAR data. Sometimes, it is just a pixel that is missing, but frequently there are larger contiguous regions with no data. These missing values can be annoying because they inhibit flow routing.

An approach to get rid of these pixels is to fill them with elevation data obtained by another DEM. For example, missing data in some of the first versions of the SRTM were often filled with 1 km resolution GTOPO3 data. Alternatively, one can use techniques that estimate pixel values from the surrounding pixels that have data. Those of you, who have worked with ArcGIS might know the function nibble, which uses nearest neighbor interpolation to estimate missing values.

TopoToolbox also has a function to fill regions with missing values. The function name is inpaintnans because inpainting is a common technique in image processing and restoration. The function takes either another DEM to fill missing values, or uses different techniques to estimate from the surrounding pixels. The option ‘nearest’ is the one that is equal to the tool nibble in ArcGIS.

However, while nearest neighbor interpolation is suited for nominal or ordinal scaled data, it might produce unwanted steps in continuous data such as elevation. Thus, the recommended (and default) approach in TopoToolbox is laplacian interpolation. This type of interpolation is explained nicely by Steve Eddins here and implemented in the function regionfill that is used by the TopoToolbox function inpaintnans. In short, this approach interpolates inward by solving a Dirichlet boundary value problem and thus derives a smooth surface.

The Taiwan dataset that is among the new example files on the DEM repository has some missing values. Here is how to apply inpainting:

DEM = readexample('taiwan');

axislim = [0.1970 0.2384 2.5183 2.5951]*10^6;

subplot(2,2,1);

imageschs(DEM)

hold on

plot([axislim(1) axislim(2) axislim(2) axislim(1) axislim(1)],...

[axislim(3) axislim(3) axislim(4) axislim(4) axislim(3)],...

'r','LineWidth',2)

subplot(2,2,2);

imageschs(DEM,[],'caxis',[0100])

axis(axislim);

subplot(2,2,3);

DEMin = inpaintnans(DEM);

imageschs(DEMin);

hold on

plot([axislim(1) axislim(2) axislim(2) axislim(1) axislim(1)],...

[axislim(3) axislim(3) axislim(4) axislim(4) axislim(3)],...

'r','LineWidth',2)

subplot(2,2,4);

imageschs(DEMin,[],'caxis',[0 100]);

axis(axislim);

It seems that some regions were not inpainted. However, these regions have a connection to the sea. To be filled, regions with no data must be entirely surrounded by pixels with data.

Inpainting can deal with missing values in DEMs. However, there are also other applications. In our recent paper by Amelie Stolle we use laplacian interpolation to reconstruct a model of the undissicted surface of Pokhara Formation in Nepal which allows us to estimate incision depths and rates.

#### Reference

Stolle, A., Schwanghart, W., Andermann, C., Bernhardt, A., Fort, M., Jansen, J.D., Wittmann, H., Merchel, S., Rugel, G., Adhikari, B.R., Korup, O., 2019. Protracted river response to medieval earthquakes. *Earth Surface Processes and Landforms*, 44, 331-341. [DOI: 10.1002/esp.4517]