DataLad
Distributed data versioning and management software.
DataLad is a command-line tool for data management and sharing. It can download existing DataLad-prepared datasets, help you share your own data, and track changes to data, giving datasets the kind of version history that source code has.
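The capabilities described above map onto a handful of DataLad subcommands: `clone` and `get` to obtain an existing dataset and its file content, and `create` and `save` to start a dataset of your own and record changes to it. The following is a minimal sketch, not an official recipe. The helper names `plan_workflow` and `run_steps`, the dataset URL, and the directory names are hypothetical and chosen only for illustration. By default the script only prints the commands; it assumes DataLad is installed if you switch off the dry run.

```python
import shlex
import subprocess
from typing import List, Optional, Tuple

# One step = (command as an argument list, working directory or None).
Step = Tuple[List[str], Optional[str]]


def plan_workflow(source: str, clone_dir: str, own_dir: str) -> List[Step]:
    """Return the DataLad calls for a basic consume-and-track workflow.

    `source`, `clone_dir`, and `own_dir` are placeholders supplied by the
    caller; nothing here is specific to a real dataset.
    """
    return [
        # Obtain an existing DataLad dataset (file content is not fetched yet).
        (["datalad", "clone", source, clone_dir], None),
        # Fetch the actual file content inside the clone.
        (["datalad", "get", "."], clone_dir),
        # Start a new, empty dataset for your own data.
        (["datalad", "create", own_dir], None),
        # After adding or changing files, record the new state.
        (["datalad", "save", "-m", "Record current state of the data"], own_dir),
    ]


def run_steps(steps: List[Step], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd, cwd in steps:
        shown = shlex.join(cmd)
        if cwd is not None:
            shown = f"(cd {shlex.quote(cwd)} && {shown})"
        print(shown)
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    # Hypothetical URL and directory names; replace them with real ones.
    plan = plan_workflow("https://example.org/some-dataset.git", "borrowed", "mine")
    run_steps(plan, dry_run=True)
```

Keeping the plan separate from its execution lets you review the exact commands before anything touches the disk or the network.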
Development status
DataLad is production software and is actively maintained.
Innovation
By applying source code best practices to data and building on existing tools such as Git and git-annex, DataLad was able to rapidly deliver a usable system for data management, versioning, and sharing.
Citation information
Halchenko, Y., Meyer, K., Poldrack, B., Solanky, D., Wagner, A., Gors, J., MacFarlane, D., Pustina, D., Sochat, V., Ghosh, S., Mönch, C., Markiewicz, C., Waite, L., Shlyakhter, I., de la Vega, A., Hayashi, S., Häusler, C., Poline, J.-B., Kadelka, T., … Hanke, M. (2021). DataLad: distributed system for joint management of code, data, and their relationship. Journal of Open Source Software, 6(63), 3262. https://doi.org/10.21105/joss.03262
Requisite knowledge to use
- Command line familiarity
- Git familiarity is helpful but not mandatory
- git-annex familiarity is helpful but not mandatory
Requisite technical requirements
- One of the following systems, and familiarity with its package manager:
  - Debian (install with apt)
  - macOS (install with conda or Homebrew)
Links
- Home page: https://datalad.org
- Tutorial: https://handbook.datalad.org/
- Installation: https://handbook.datalad.org/en/latest/intro/installation.html
- Full documentation: http://docs.datalad.org/en/stable/
- How to get help: https://github.com/datalad/datalad/issues
- Testimonials: https://github.com/datalad/datalad/wiki/Testimonials
Representative publications
Wagner, A. S., Waite, L. K., Wierzba, M., Hoffstaedter, F., Waite, A. Q., Poldrack, B., Eickhoff, S. B., & Hanke, M. (2022). FAIRly big: A framework for computationally reproducible processing of large-scale data. Scientific Data, 9(1). https://doi.org/10.1038/s41597-022-01163-2
Zhao, C., Jarecka, D., Covitz, S., Chen, Y., Eickhoff, S. B., Fair, D. A., Franco, A. R., Halchenko, Y. O., Hendrickson, T. J., Hoffstaedter, F., Houghton, A., Kiar, G., Macdonald, A., Mehta, K., Milham, M. P., Salo, T., Hanke, M., Ghosh, S. S., Cieslak, M., & Satterthwaite, T. D. (2024). A reproducible and generalizable software workflow for analysis of large-scale neuroimaging data collections using BIDS Apps. Imaging Neuroscience, 2, 1–19. https://doi.org/10.1162/imag_a_00074
Halchenko, Y. O., Goncalves, M., Ghosh, S., Velasco, P., Visconti di Oleggio Castello, M., Salo, T., Wodder, J. T., Hanke, M., Sadil, P., Gorgolewski, K. J., Ioanas, H.-I., Rorden, C., Hendrickson, T. J., Dayan, M., Houlihan, S. D., Kent, J., Strauss, T., Lee, J., To, I., … Kennedy, D. N. (2024). HeuDiConv — flexible DICOM conversion into structured directory layouts. Journal of Open Source Software, 9(99), 5839. https://doi.org/10.21105/joss.05839