# **Mobroute:** Development Guide
This development guide provides various pointers on local development. If
something is missing from this guide, feel free to add it. If you have
a question about something in this guide, please open a ticket on
the ticket tracker or send a note to the mailing list.

## Diagrams
For a baseline understanding of the overall system architecture, the diagrams
on the [diagrams doc page](doc_diagrams.md) can be very helpful.

## Internals: Architecture

**The Mobroute project is composed of 3 separate codebases / smaller projects:**

- [Mobsql](http://git.sr.ht/~mil/mobsql): is a GTFS-to-SQL ETL tool which
  pulls GTFS sources from the Mobility Database into a SQLite database. It
  exposes both a CLI & a Go library (consumed by Mobroute).
- [Mobroute](http://git.sr.ht/~mil/mobroute): is the core routing tool. This
  project interfaces with Mobsql via Mobsql's public Go library and then, once
  the DB is loaded, opens the DB and performs routing calculations (currently
  via the Connection Scan Algorithm). It exposes both a CLI & a Go library
  (consumed by Transito).
- [Transito](http://git.sr.ht/~mil/transito): is a simple Android & Linux GUI
  app for using Mobroute. It allows querying from/to locations via Nominatim
  and performing routing calculations on the go via Mobroute's underlying
  routing mechanism (note this implicitly integrates Mobsql as well).

## Internals: What goes into a Routing Calculation? (i.e. the full data pipeline)
Routing dataflow follows a roughly 7-step process: 3 one-time dataload steps
followed by 4 steps per routing library call. The overall dataflow upon
submitting a routing request looks roughly like this:

One time (via `RTDatabase`, using the `loadmdbgtfs`/`loadcustomgtfs` ops followed by the `compute` op):

- (1) **Dataload: Mobility Database CSV Fetch**
    - An HTTP request is made to fetch the Mobility Database CSV
    - This CSV contains thousands of potential GTFS sources to use
    - The CSV is imported into Mobsql's SQLite database
    - (*Handled by Mobsql*)
- (2) **Dataload: Upstream (Agency) GTFS Fetch & Load**
    - Each feed ID in the routing request is downloaded from its source
      agency in raw ZIP form.
    - Each GTFS ZIP archive is imported into Mobsql's SQLite database, with
      each table mirroring the GTFS spec but with an added `feed_id` column
      referring to the GTFS feed's ID from the Mobility Database. Adding this
      extra column allows 'multisource' storage of GTFS data & routing.
    - (*Handled by Mobsql*)
- (3) **Dataload: GTFS Computed Tables Calculations**
    - The routing algorithm essentially takes several arrays as input params,
      each of which directly correlates with a particular SQL select statement
      pulling data from the SQLite database housing the GTFS data.
    - This routing logic *has the ability to operate directly on the GTFS data*
      (i.e. with no preprocessing) by selecting directly from the [core data
      extraction logic views](https://git.sr.ht/~mil/mobroute/tree/master/item/db/dbschemaextra/schemaextra.go);
      however, that would be quite inefficient.
    - Indexing and precomputation yield massive gains, so several views used
      by the routing API are stored in 'computed tables', which are
      essentially just materialized tables indexed by MDBID.
    - In this step, the views are materialized into tables by naming
      convention: a view named `_vfoo` is materialized into a table named
      `_ctfoo` (`_ct` is short for "computed table").
    - (*Handled by Mobsql; computed tables are specified as ExtraSchema by Mobroute*)
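
The `_vfoo` to `_ctfoo` naming convention can be sketched in Go as follows. This is a minimal, hypothetical illustration: `computedTableName` and `materializeSQL` are not part of Mobsql or Mobroute, and the real materialization (index creation, per-feed handling, etc.) likely differs. It only shows the view-to-table mapping and a plain SQLite `CREATE TABLE ... AS SELECT` snapshot.

```go
package main

import (
	"fmt"
	"strings"
)

// computedTableName maps a view name following the `_v` prefix convention
// (e.g. "_vfoo") to its computed-table name (e.g. "_ctfoo").
// Hypothetical helper for illustration only.
func computedTableName(view string) (string, error) {
	if !strings.HasPrefix(view, "_v") || len(view) == len("_v") {
		return "", fmt.Errorf("view %q does not follow the _v<name> convention", view)
	}
	return "_ct" + strings.TrimPrefix(view, "_v"), nil
}

// materializeSQL returns a SQLite statement that snapshots the view's
// current rows into a plain table. Index creation is omitted in this sketch.
func materializeSQL(view string) (string, error) {
	table, err := computedTableName(view)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("CREATE TABLE %s AS SELECT * FROM %s;", table, view), nil
}

func main() {
	stmt, err := materializeSQL("_vfoo")
	if err != nil {
		panic(err)
	}
	fmt.Println(stmt) // CREATE TABLE _ctfoo AS SELECT * FROM _vfoo;
}
```

Materializing into a plain table is what makes it possible to add indexes that the underlying view cannot have, which is where the precomputation gains come from.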

Routing Library call (via `RTRoute`):

- (1) **Routing Library Function: Prep for Algorithm - Memload SQL selects**
    - This step handles selecting (via SQL) and loading into memory the data
      needed to run the core routing algorithm.
    - "Loader" functions (see `db/load*`) translate SQL selects into
      arrays of structs for each datatype (Connections, Transfers, etc.).
    - At the completion of this step, all needed data is in memory.
- (2) **Routing Library Function: CSA Algorithm Execution**
    - The [CSA](https://i11www.iti.kit.edu/extra/publications/dpsw-isftr-13.pdf)
      is run, passing the arrays loaded into memory in the previous step as
      arguments to the main CSA function entrypoint.
    - The return value is an array of 'connections' representing the most
      efficient route, narrowed down from the input array of connections.
      Each connection is essentially the 'quickest' way to reach its
      destination stop given the input criteria.
- (3) **Routing Library Function: Decoration / Memload SQL Connections Verbose**
    - The array of connections returned by the CSA step is not in a
      user-friendly format and lacks details such as stop names, latitude,
      longitude, and similar.
    - Since the raw GTFS data has all of this information, we go back to the
      DB and pull the same connections (by UID) along with verbose metadata.
- (4) **Routing Library Function: Formatting**
    - The result of the decoration step is formatted into different
      structures by pure functions, depending on user input.
    - The main format is 'legs', which are simply the steps for accomplishing
      a route, such as "walk here" or "take this trip at time x" (in a much
      more verbose and machine-processable format, of course).
    - There is also a 'mapurl' formatter, which translates the route into
      GeoJSON and uses a map URL rendering service that accepts GeoJSON.
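
As a rough illustration of the 'legs' formatting step, the Go sketch below groups consecutive connections that ride the same trip into a single leg using a pure function. The `Connection`, `Leg`, and `toLegs` names are hypothetical and not Mobroute's actual types; Mobroute's real legs also carry walking/transfer legs and much more metadata, which this sketch omits.

```go
package main

import "fmt"

// Connection is a simplified, hypothetical stand-in for a decorated
// connection: one vehicle hop between two consecutive stops.
type Connection struct {
	TripID           string
	FromStop, ToStop string
	DepTime, ArrTime int // seconds
}

// Leg groups consecutive connections that ride the same trip.
type Leg struct {
	TripID           string
	FromStop, ToStop string
	DepTime, ArrTime int
	NumConnections   int
}

// toLegs is a pure function: it only reads its input and returns new values.
func toLegs(conns []Connection) []Leg {
	var legs []Leg
	for _, c := range conns {
		if n := len(legs); n > 0 && legs[n-1].TripID == c.TripID && legs[n-1].ToStop == c.FromStop {
			// Same trip continues from where the current leg ended: extend it.
			legs[n-1].ToStop = c.ToStop
			legs[n-1].ArrTime = c.ArrTime
			legs[n-1].NumConnections++
			continue
		}
		legs = append(legs, Leg{
			TripID: c.TripID, FromStop: c.FromStop, ToStop: c.ToStop,
			DepTime: c.DepTime, ArrTime: c.ArrTime, NumConnections: 1,
		})
	}
	return legs
}

func main() {
	route := []Connection{
		{"T1", "A", "B", 100, 200},
		{"T1", "B", "C", 200, 250},
		{"T2", "C", "D", 300, 350},
	}
	for _, l := range toLegs(route) {
		fmt.Printf("trip %s: %s -> %s (%d-%d)\n", l.TripID, l.FromStop, l.ToStop, l.DepTime, l.ArrTime)
	}
}
```

Keeping formatters as pure functions over the decorated result means new output formats can be added without touching the DB or routing code.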

## Internals: Algorithm

The core of the routing system is based on the Connection Scan Algorithm
methodology. See the following papers for more details:

- [2013 Original CSA Paper](https://i11www.iti.kit.edu/extra/publications/dpsw-isftr-13.pdf)
- [2017 Followup CSA Paper](https://arxiv.org/pdf/1703.05997.pdf)


## Internals: Glossary

The following terms and abbreviations are used in the source code:

| Term | Expansion | Explanation |
| ---- | --------- | ----------- |
| **StopUID** | Stop Unique ID | GTFS archives internally use 'stop ID' to cross-reference stops. Since a single routing, schedule, etc. query in Mobroute can handle multiple GTFS archives, we dynamically create stop UIDs. In the current implementation, stop UIDs are always composed as `{FEEDID}_{STOPID}`. |
| **DRUTCTime** | Date-relative UTC Time | Refers to UTC time (in seconds) relative to the input date for a query. This is used primarily in the connections loader to abstract away timezones. |
| **Feed ID** | | Refers to Mobroute's concept of a single GTFS archive correlating to an ID number. For Mobility Database sources this is always a positive number mapping to the MDB catalog's origin feed ID; for custom feeds this is a negative, user-provided number. |
| **MDBID** | Mobility Database ID | Source ID pulled from the [Mobility Database Catalog](https://github.com/MobilityData/mobility-database-catalogs). |


## Testing: Using the route_tester.sh Script to Test GTFS Feeds
Mobroute works with potentially any GTFS source as specified in the
Mobility Database. While certain sources are tested by CI or known to be good,
it may be helpful (either because a source is untested, or because you're too
lazy to specify routing request parameters) to determine whether a source
"is good and works for routing". The `route_tester.sh` script serves
this purpose: it acts as a smoke-testing script that runs a 'random'
routing request with lax parameters to determine whether a particular MDBID
can route properly. To use this script, first build the mobroute binary
and then run: `./scripts/route_tester.sh MDBID`

For example:
`./scripts/route_tester.sh 1898`

## Testing: Running Unit & Integration Tests

Run (unit) tests:
```sh
./build.sh test
```

Run GTFS-based (integration) tests:
```sh
./build.sh testzipgtfs
```

Run both unit & integration tests:
```sh
./build.sh testall
```

Run an individual unit test package:
```sh
./build.sh test ./dbquery_test
```

## Testing: Debugging Tips

Various debugging tips:

- **Debugging via SQLite DB**:
    - Run `sqlite3 ~/.cache/mobroute/sqlite.db`
    - For example, check that the calendar for a given date (here `20231220`) actually produces dates:
        - `select * from _vcaltoservice where service_date = 20231220 and source = 1898`
- **Clear Cache**:
    - If working with multiple sources and you paused the load process, it
      may be a good idea to clear the cache entirely and try again.
    - Run: `rm -rf ~/.cache/mobroute`


## Testing: Profiling

- Set the env var `MOBROUTE_PPROF` to a file path to write a pprof profile
  to that file.
- Set the env var `MOBROUTE_CFG` to set the global JSON runtime config (e.g.
  to alter Mobroute MDB & Mobsql params).

## Regenerate CLI Documentation `doc_cli.md` page
The CLI documentation page simply lists each subcommand of the `mobroute`
binary along with its usage, equivalent to running each subcommand's help
text. As such, a generator script creates this page. Regenerate the
`doc_cli.md` page as follows:

```sh
./scripts/generate_cliguide.sh > doc/doc_cli.md
```

## Regenerate the master `mobroute_lib.go` file
For end-users we provide the single package `git.sr.ht/~mil/mobroute` from
which all public functionality is exposed. Types and functions in this
package are just aliased from constituent subpackages in `api/`. From a
development standpoint, this aliasing allows subpackages to live in
completely distinct and isolated namespaces, ensuring modularity.

Rather than manually aliasing all public functionality from each
subpackage, we use a generator script to create the master library
file. Regenerate the master `mobroute_lib.go` file as follows:

```sh
./scripts/generate_mobroutelibgo.sh > mobroute_lib.go
```
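
To show what "aliasing" means in Go terms, the sketch below emits alias declarations for a subpackage's exported names. It mirrors the idea of the generator script, not its code: `aliasDecls` and the `apiroute`/`RouteParams` names are hypothetical, and the real `mobroute_lib.go` may re-export functions differently (e.g. via wrapper functions rather than function variables).

```go
package main

import (
	"fmt"
	"strings"
)

// aliasDecls emits Go declarations that re-export a subpackage's exported
// types and functions from a top-level package. Hypothetical helper.
func aliasDecls(pkg string, types, funcs []string) string {
	var b strings.Builder
	for _, t := range types {
		fmt.Fprintf(&b, "type %s = %s.%s\n", t, pkg, t) // type alias: identical type
	}
	for _, f := range funcs {
		fmt.Fprintf(&b, "var %s = %s.%s\n", f, pkg, f) // function value re-export
	}
	return b.String()
}

func main() {
	// "apiroute" and "RouteParams" are illustrative names only.
	fmt.Print(aliasDecls("apiroute", []string{"RouteParams"}, []string{"RTRoute"}))
}
```

Using `type X = pkg.X` (a type alias rather than a new named type) means values flow between the top-level package and the subpackage without any conversion.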

