DataLad

Distributed data versioning and management software.

DataLad is a command-line tool for managing and sharing data. It can download existing DataLad-prepared datasets, help you share your own data, and track changes to data over time, which gives you data versioning.
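As a rough illustration of the workflow described above, the following Python sketch builds the command lines for cloning a dataset, fetching one file's content, and saving a change. The `datalad clone`, `datalad get`, and `datalad save -m` subcommands are part of DataLad's command-line interface; the helper functions, the dataset URL, the directory name, and the file path are hypothetical placeholders. The sketch defaults to a dry run, so it only prints the commands and does not require DataLad to be installed.

```python
import shlex
import subprocess


def datalad_command(subcommand, *args):
    """Return the argv list for one datalad subcommand (hypothetical helper)."""
    return ["datalad", subcommand, *args]


def basic_workflow(source_url, dataset_dir, file_path, message):
    """Build the commands to clone a dataset, get one file, and save a change."""
    return [
        datalad_command("clone", source_url, dataset_dir),
        datalad_command("get", file_path),
        datalad_command("save", "-m", message),
    ]


def run_workflow(commands, dataset_dir, dry_run=True):
    """Print each command; execute it only when dry_run is False.

    The first command (clone) runs in the current directory; the
    remaining commands run inside the cloned dataset directory.
    """
    for index, argv in enumerate(commands):
        cwd = None if index == 0 else dataset_dir
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)


# Placeholder values, not a real dataset.
commands = basic_workflow(
    "https://example.com/some-dataset.git",
    "some-dataset",
    "data/file.dat",
    "Update analysis notes",
)
run_workflow(commands, "some-dataset", dry_run=True)
```

Passing `dry_run=False` would actually run the commands, which assumes DataLad is installed and the URL points at a real dataset.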

Development status

DataLad is production software and is actively maintained.

Innovation

By applying source code best practices to data, DataLad builds on existing tools, notably Git and git-annex, to provide a usable system for data management, versioning, and sharing.

Citation information

Halchenko, Y., Meyer, K., Poldrack, B., Solanky, D., Wagner, A., Gors, J., MacFarlane, D., Pustina, D., Sochat, V., Ghosh, S., Mönch, C., Markiewicz, C., Waite, L., Shlyakhter, I., de la Vega, A., Hayashi, S., Häusler, C., Poline, J.-B., Kadelka, T., … Hanke, M. (2021). DataLad: distributed system for joint management of code, data, and their relationship. Journal of Open Source Software, 6(63), 3262. https://doi.org/10.21105/joss.03262

RRID:SCR_003931

Requisite knowledge to use

  • Command line familiarity
  • Git familiarity is helpful but not mandatory
  • git-annex familiarity is helpful but not mandatory

Technical requirements

  • One of the following systems, and familiarity with its package installer
    • Debian (install with apt)
    • macOS (install with conda or Homebrew)
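The installer options listed above can be sketched as a small lookup of install commands. This is an illustrative helper, not part of DataLad: the `INSTALL_COMMANDS` table and `install_command` function are hypothetical, and the package name `datalad` and the `conda-forge` channel are assumptions based on the commonly documented install routes, so check the official installation instructions before running anything.

```python
import shlex

# Assumed install commands for each supported system/installer pair.
INSTALL_COMMANDS = {
    ("debian", "apt"): ["sudo", "apt-get", "install", "datalad"],
    ("macos", "conda"): ["conda", "install", "-c", "conda-forge", "datalad"],
    ("macos", "homebrew"): ["brew", "install", "datalad"],
}


def install_command(system, installer):
    """Return the argv list for installing DataLad on the given system."""
    key = (system.lower(), installer.lower())
    if key not in INSTALL_COMMANDS:
        raise ValueError(f"No install command known for {system} with {installer}")
    # Return a copy so callers cannot modify the lookup table.
    return list(INSTALL_COMMANDS[key])


# Print the command for each listed option without executing it.
for system, installer in INSTALL_COMMANDS:
    print(f"{system} ({installer}): {shlex.join(install_command(system, installer))}")
```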

Links

Representative publications

Wagner, A. S., Waite, L. K., Wierzba, M., Hoffstaedter, F., Waite, A. Q., Poldrack, B., Eickhoff, S. B., & Hanke, M. (2022). FAIRly big: A framework for computationally reproducible processing of large-scale data. Scientific Data, 9(1). https://doi.org/10.1038/s41597-022-01163-2

Zhao, C., Jarecka, D., Covitz, S., Chen, Y., Eickhoff, S. B., Fair, D. A., Franco, A. R., Halchenko, Y. O., Hendrickson, T. J., Hoffstaedter, F., Houghton, A., Kiar, G., Macdonald, A., Mehta, K., Milham, M. P., Salo, T., Hanke, M., Ghosh, S. S., Cieslak, M., & Satterthwaite, T. D. (2024). A reproducible and generalizable software workflow for analysis of large-scale neuroimaging data collections using BIDS Apps. Imaging Neuroscience, 2, 1–19. https://doi.org/10.1162/imag_a_00074

Halchenko, Y. O., Goncalves, M., Ghosh, S., Velasco, P., Visconti di Oleggio Castello, M., Salo, T., Wodder, J. T., Hanke, M., Sadil, P., Gorgolewski, K. J., Ioanas, H.-I., Rorden, C., Hendrickson, T. J., Dayan, M., Houlihan, S. D., Kent, J., Strauss, T., Lee, J., To, I., … Kennedy, D. N. (2024). HeuDiConv — flexible DICOM conversion into structured directory layouts. Journal of Open Source Software, 9(99), 5839. https://doi.org/10.21105/joss.05839