Metadata-Version: 2.4
Name: aext-project-filebrowser-server
Version: 4.26.1
Summary: Anaconda Project FileBrowser server
Author-email: "Anaconda, Inc" <anaconda@anaconda.com>
License-File: LICENSE
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.8
Requires-Dist: aext-shared<5,>=4
Requires-Dist: anaconda-auth>=0.7.1
Requires-Dist: jupyterlab-server<3,>=2.19.0
Requires-Dist: jupyterlab<5,>=4.0.9
Requires-Dist: sqlalchemy>=2.0.29
Requires-Dist: tornado
Requires-Dist: watchdog<5,>=4.0.1
Provides-Extra: dev
Requires-Dist: pytest-env==1.1; extra == 'dev'
Requires-Dist: pytest-tornasync==0.6.0.post2; extra == 'dev'
Requires-Dist: pytest<=8.3.4; extra == 'dev'
Description-Content-Type: text/markdown

# aext_project_filebrowser_server

---

**Table of Contents**

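- [Installation](#installation)
- [Unit tests](#unit-tests)
- [Migrations](#migrations)
- [Design decisions](#design-decisions)

## Installation

```console
pip install aext_project_filebrowser_server
```

## Unit tests

To run all backend unit tests, run the following from your terminal (the `--cov` options come from the `pytest-cov` plugin, which is not part of the `dev` extra):
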
```bash
cd backend_lib/project_filebrowser
pytest -s --cov-report=term-missing:skip-covered "--cov=project_filebrowser"
```

To run a specific test file:

```bash
pytest -s --cov-report=term-missing:skip-covered "--cov=project_filebrowser" tests/FILENAME.py
```

Or a single test within a file (`-k` selects the tests whose names match the given expression):

```bash
pytest -s --cov-report=term-missing:skip-covered "--cov=project_filebrowser" tests/FILENAME.py -k TEST_NAME
```

## Migrations

This project relies on a SQLite database, so schema changes must be managed through database migrations.
The project uses Alembic for this; this section describes how.

### Alembic config files

Because the development environment does not perfectly mirror the configuration of the testing and production environments, the project keeps three Alembic configuration files. The main difference between them is the Python path, which changes with the environment:

- `alembic.dev.ini`: used when running JupyterLab locally for development. Requires an environment variable named `LOCAL` to be set.
- `alembic.prod.ini`: used in production; contains the Python path for the PythonAnywhere servers.
- `alembic.dev.migrations.ini`: used to manage migrations locally during development (create, modify, upgrade, downgrade, etc.).
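The runtime choice between the first two files can be sketched as follows. This is a minimal illustration, not the extension's actual code: it assumes that the mere presence of `LOCAL` in the environment selects the development file (its value is not checked) and that production is the fallback. The helper name `select_alembic_ini` is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Directory holding the .ini files (taken from the migration commands in this README).
CONFIG_DIR = Path("backend_lib/project_filebrowser/aext_project_filebrowser_server")


def select_alembic_ini(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the Alembic config used at runtime (hypothetical helper).

    Assumption: LOCAL being present is enough to pick the dev config.
    alembic.dev.migrations.ini is never chosen here because it is only
    passed manually to the alembic CLI when creating or applying migrations.
    """
    env = os.environ if env is None else env
    name = "alembic.dev.ini" if "LOCAL" in env else "alembic.prod.ini"
    return CONFIG_DIR / name


if __name__ == "__main__":
    print(select_alembic_ini({"LOCAL": "1"}).name)
    print(select_alembic_ini({}).name)
```

Resolving the file in one place keeps the environment check out of the migration code itself; the returned path could then be handed to `alembic.config.Config` if migrations are run programmatically.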

### Creating migrations

From the root of the repository, run:

```bash
alembic -c backend_lib/project_filebrowser/aext_project_filebrowser_server/alembic.dev.migrations.ini revision --autogenerate -m "<name_your_migration>"
```

Always review the generated migration script and adjust it as needed, since autogenerate does not detect every kind of schema change.

### Running migrations

To apply all migrations up to the latest revision (`head`), run:

```bash
alembic -c backend_lib/project_filebrowser/aext_project_filebrowser_server/alembic.dev.migrations.ini upgrade head
```

# Design decisions

Most major engineering decisions were made after dedicated **SPIKES**, all of which are compiled in a [centralized document](https://docs.google.com/document/d/1UwUaSuUYKeSetRoRl829dXdAEpiKK23kLeq6WR0rQ3w/edit?usp=sharing).

## Database

This extension is stateful and therefore persists data and state locally, in the user's storage. SQLite was chosen for several reasons, mainly its support for ACID transactions and its maturity as well-established software. The decision was made after a SPIKE ([TBP-1180](https://anaconda.atlassian.net/browse/TBP-1180)); check it out if you want to know more.

In production, the database is stored in the user's local storage at `/var/www/filebrowser`. `/var/www` is the directory our extensions, such as _panels_, commonly use to store application data.

[SPIKE Link](https://docs.google.com/document/d/1UwUaSuUYKeSetRoRl829dXdAEpiKK23kLeq6WR0rQ3w/edit#heading=h.sh5w653byf6o)
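The atomicity guarantee behind this choice can be illustrated with Python's built-in `sqlite3` module: when a connection is used as a context manager, the open transaction is committed on success and rolled back if an exception escapes. The `projects` table, its columns, and the `save_projects` helper are hypothetical, and an in-memory database stands in for the file under `/var/www/filebrowser`.

```python
import sqlite3

# In-memory database for illustration only; the real file lives under
# /var/www/filebrowser in production.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id TEXT PRIMARY KEY, name TEXT NOT NULL)")


def save_projects(conn: sqlite3.Connection, rows) -> None:
    """Insert all rows in one transaction: either every row is stored or none is."""
    # `with conn` commits on success and rolls back if an exception is raised.
    with conn:
        conn.executemany("INSERT INTO projects (id, name) VALUES (?, ?)", rows)


save_projects(conn, [("p1", "demo"), ("p2", "analysis")])

try:
    # "p1" already exists, so the second insert violates the primary key
    # and the preceding "p3" insert is rolled back with it.
    save_projects(conn, [("p3", "new"), ("p1", "duplicate")])
except sqlite3.IntegrityError:
    pass

count = conn.execute("SELECT COUNT(*) FROM projects").fetchone()[0]
print(count)  # 2
```

The failed batch leaves no partial state behind, which matters when a sync operation writes several related rows at once.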

## Websockets

Because this project keeps data synchronized across different servers, the team added WebSocket support. WebSockets are used only in scenarios where they make sense, so HTTP requests are still used elsewhere.
Past projects showed that polling mechanisms can be costly and slow, and that they hurt the overall UX. WebSockets allow not only a more performant application but also a better experience for users.

[SPIKE Link](https://docs.google.com/document/d/1UwUaSuUYKeSetRoRl829dXdAEpiKK23kLeq6WR0rQ3w/edit#heading=h.kuqidosmn2k4)
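To illustrate the push model that motivates WebSockets over polling, here is a transport-agnostic sketch using only `asyncio`. Each subscriber stands in for a connected WebSocket client: it awaits its own queue and is woken only when the server publishes an event, instead of repeatedly asking for changes. The `EventBroker` class and the `pull_finished` event are hypothetical; the extension's actual WebSocket handlers are not shown here.

```python
import asyncio
from typing import Any, Dict, List


class EventBroker:
    """Hypothetical in-process broker that pushes each event to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: List[asyncio.Queue] = []

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers.append(queue)
        return queue

    def publish(self, event: Dict[str, Any]) -> None:
        # Deliver the event to each subscriber; nobody has to poll for it.
        for queue in self._subscribers:
            queue.put_nowait(event)


async def client(name: str, queue: asyncio.Queue, received: List[str]) -> None:
    # The client waits idle until an event arrives; there is no request loop.
    event = await queue.get()
    received.append(f"{name}:{event['type']}")


async def main() -> List[str]:
    broker = EventBroker()
    received: List[str] = []
    tasks = [
        asyncio.create_task(client(name, broker.subscribe(), received))
        for name in ("a", "b")
    ]
    # Simulate a change detected on the server, e.g. a finished pull.
    broker.publish({"type": "pull_finished"})
    await asyncio.gather(*tasks)
    return sorted(received)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With polling, each client would instead issue a request on a timer and most of those requests would return nothing new; here the work happens only when something actually changes.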

## Schemas

One challenge encountered while developing the plugin is the two-way transformation between the content received from the cloud and the local database. Because the two representations differ, we need an efficient way to convert between them. To address this, the project uses two distinct project schemas: `CloudProject`, which is essentially a wrapper for the Project API endpoint responses, and `LocalProject`, which is suited to local database operations.

Initially, a single `Project` schema handled both sides. This forced the schema to be defined too broadly and made debugging harder. For example, `Project` had to include the `title`, `created_at`, `updated_at`, `metadata`, and `description` fields even though those fields have no column in the corresponding database table. Another frequent source of errors was the `ProjectOwner` schema: the `owner` column in the database expects a string (`owner_id`), but the entire `ProjectOwner` object was often passed to it, writing incorrect information to the database.
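A minimal sketch of the split, using standard-library dataclasses rather than whatever modeling library the project actually uses. Only the schema names, the `owner`/`owner_id` relationship, and the cloud-only fields (`title`, `created_at`, `updated_at`, `metadata`, `description`) come from the text above; the `id` and `name` fields, `ProjectOwner.name`, and the `to_local` method are hypothetical. The point is that the conversion flattens the owner to its ID, so a full `ProjectOwner` can never reach the `owner` column.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict


@dataclass
class ProjectOwner:
    id: str
    name: str  # hypothetical display field


@dataclass
class CloudProject:
    """Wraps a Project API response, including fields with no local column."""
    id: str
    name: str
    owner: ProjectOwner
    title: str
    description: str
    created_at: datetime
    updated_at: datetime
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_local(self) -> "LocalProject":
        # Keep only what the local table stores and flatten the owner to its ID.
        return LocalProject(id=self.id, name=self.name, owner=self.owner.id)


@dataclass
class LocalProject:
    """Mirrors the local projects table: `owner` holds the owner_id string."""
    id: str
    name: str
    owner: str

    def __post_init__(self) -> None:
        # Reject the bug described above: a whole ProjectOwner being stored.
        if not isinstance(self.owner, str):
            raise TypeError("LocalProject.owner must be an owner_id string")


if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    cloud = CloudProject(
        id="p1", name="demo", owner=ProjectOwner(id="u1", name="Ada"),
        title="Demo", description="", created_at=now, updated_at=now,
    )
    print(cloud.to_local())  # LocalProject(id='p1', name='demo', owner='u1')
```

Keeping the conversion in one method means the cloud-only fields are dropped in exactly one place, and the type check turns a silent bad write into an immediate error.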

## Diagrams

### General architecture

![](./diagrams/filebrowser.png)

### Websockets pull operation example

![](./diagrams/pull-projects.png)

### General Dataflow

There are many different data flows within the application; the diagram below represents a general data flow, outlining the components involved in most features.

![](./diagrams/data-flow-fileproject.png)
