Visualizing Raw Data
Key Investigators
Josh Siegle (Allen Institute), Jeremy Magland (Flatiron Institute)
Project Description
A crucial (but often ignored) step in the spike sorting process is visual inspection of the raw data before and after pre-processing steps have been performed. This can be done by looking at the raw traces for individual channels, but overall data quality is often easier to assess by looking at the channels x samples matrix displayed as an image. Standard Python plotting libraries do not allow efficient interaction with large datasets, which is one reason why this step may be skipped.
We will attempt to overcome this limitation by using deck.gl, a visualization framework for big datasets that works well in the browser and in Python environments (via the pydeck bindings). Many of its functions assume you’re looking at geospatial data, but for the most part they are general enough to be adapted for ephys. Other frameworks may work as well, but deck.gl seems the best for getting up and running quickly.
Objectives
- Create an interactive visualization of high-channel-count raw + preprocessed data that can run in a browser or a Jupyter notebook
- Overlay sorting results on top of the original data
- Integrate with existing tools, such as SpikeInterface and sortingview
Approach and Plan
- Talk with other hackathon attendees to determine whether there are any unforeseen limitations of deck.gl that make it unsuitable for this application.
- Select a visualization framework (deck.gl or an alternative)
- Create an interactive visualization of raw data. This will likely require an intermediate step in which the data is saved as an image pyramid or other multi-scale format.
- Create an interactive visualization of spike sorting results, with spikes distributed in space and time.
- Combine the raw data visualization and the sorting results visualization (the previous two steps) into one visualization.
- Add raw data plotting functions to SpikeInterface and (time permitting) sortingview.
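The multi-scale intermediate step mentioned in the plan can be illustrated with a minimal sketch. It assumes the data is already in memory as a channels x samples numpy array and that only the time axis is downsampled, by averaging adjacent pairs of samples. The function name `build_pyramid`, the `min_size` parameter, and the choice of averaging are all illustrative assumptions, not the method used in this project; other reductions (for example min/max) may preserve spikes better at coarse zoom levels.

```python
import numpy as np


def build_pyramid(traces, min_size=256):
    """Return a list of progressively downsampled copies of a 2-D array.

    Level 0 is the full-resolution channels x samples matrix. Each later
    level halves the sample (time) axis by averaging adjacent sample pairs.
    Hypothetical helper for illustration only.
    """
    levels = [np.asarray(traces, dtype=np.float32)]
    while levels[-1].shape[1] // 2 >= min_size:
        prev = levels[-1]
        n = (prev.shape[1] // 2) * 2  # drop a trailing odd sample, if any
        # Group samples in pairs along the time axis and average each pair
        pairs = prev[:, :n].reshape(prev.shape[0], n // 2, 2)
        levels.append(pairs.mean(axis=2))
    return levels


# Synthetic stand-in for a recording: 384 channels x 4096 samples
rng = np.random.default_rng(0)
traces = rng.normal(size=(384, 4096)).astype(np.float32)

pyramid = build_pyramid(traces)
shapes = [level.shape for level in pyramid]
```

A viewer would then pick the level whose resolution best matches the current zoom, so that only a screen's worth of pixels is ever transferred.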
Progress
Pre-hackathon
- The figurl-tiled-image repository, which creates a tiled image from a numpy array and makes it accessible through the browser (using kachery-cloud and figurl).
- Proof-of-concept example of Neuropixels data displayed as a deck.gl TileLayer via figurl.
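To make the idea of a tiled image concrete, here is a minimal sketch of how a channels x samples array could be scaled to 8-bit values and cut into fixed-size tiles. This is not the figurl-tiled-image implementation; the helper names `to_uint8` and `iter_tiles`, the 256-pixel tile size, and the zero padding at the edges are assumptions made for illustration.

```python
import numpy as np


def to_uint8(traces, vmin, vmax):
    """Clip values to [vmin, vmax] and rescale them to the 0-255 range."""
    scaled = (np.clip(traces, vmin, vmax) - vmin) / (vmax - vmin)
    return np.round(scaled * 255).astype(np.uint8)


def iter_tiles(image, tile_size=256):
    """Yield ((row, col), tile) pairs covering a 2-D image.

    Tiles at the bottom and right edges are zero-padded to full size.
    """
    n_rows = -(-image.shape[0] // tile_size)  # ceiling division
    n_cols = -(-image.shape[1] // tile_size)
    for r in range(n_rows):
        for c in range(n_cols):
            block = image[r * tile_size:(r + 1) * tile_size,
                          c * tile_size:(c + 1) * tile_size]
            tile = np.zeros((tile_size, tile_size), dtype=image.dtype)
            tile[:block.shape[0], :block.shape[1]] = block
            yield (r, c), tile


# Synthetic stand-in for a recording: 384 channels x 1000 samples
rng = np.random.default_rng(0)
traces = rng.normal(scale=20.0, size=(384, 1000))

image = to_uint8(traces, vmin=-50.0, vmax=50.0)
tiles = dict(iter_tiles(image, tile_size=256))
```

Each tile is addressed by its (row, col) index, which is the kind of key a tile-based viewer requests as the user pans and zooms.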
During the hackathon
- Example of Neuropixels 1.0 data displayed after three different pre-processing steps (filtering, phase shifting, and referencing).
- Code and example of Neuropixels 2.0 data displayed after three different pre-processing steps (centering, filtering, and referencing).
The last example demonstrates how these plotting methods can be used in practice: given a set of SpikeInterface recording extractors, it’s now possible to generate a link to a scrollable, zoomable visualization of the raw data.
Next steps
- The code we have written could be more tightly integrated into SpikeInterface by creating an alternative backend for the plot_timeseries function. Currently, this method displays the raw traces using matplotlib, but there could also be an option to export the traces as a tiled image on figurl.
- We have not yet tried to overlay the sorting results on top of the raw data. This is a more involved project, but one that will be useful for comparing the output of different sorters.
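As a starting point for the overlay, the sketch below shows one way spikes could be expressed in the same coordinate system as the raw data image, with x as the sample index and y as the channel index. It is an untested idea, not something built during the hackathon. The function names, the record layout, and the assumption that each spike already has an associated channel (for example, the peak channel of its unit) are all hypothetical.

```python
def spikes_to_points(spike_samples, spike_channels, unit_ids):
    """Build one scatter record per spike.

    Each record stores [sample_index, channel_index] as its position, so it
    lines up with a channels x samples image, plus the unit it belongs to.
    """
    return [
        {"position": [int(s), int(ch)], "unit": int(u)}
        for s, ch, u in zip(spike_samples, spike_channels, unit_ids)
    ]


def points_in_window(points, start_sample, end_sample):
    """Keep only the spikes that fall inside the visible time window."""
    return [
        p for p in points
        if start_sample <= p["position"][0] < end_sample
    ]


# Three made-up spikes from two units
points = spikes_to_points(
    spike_samples=[10, 300, 4500],
    spike_channels=[5, 120, 383],
    unit_ids=[0, 1, 0],
)
visible = points_in_window(points, start_sample=0, end_sample=1000)
```

Records of this shape could in principle be supplied as the data for a scatterplot-style layer drawn on top of the tiled image, with the unit id mapped to a color.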
Materials
Example data for this project. Includes raw traces for a Neuropixels 1.0 and 2.0 recording, plus Kilosort 2 sorting results.
Background and References
deck.gl TileLayer example - useful for displaying raw data
pydeck ScatterplotLayer example - useful for displaying sorting results