2 Comments
Rishwanth T

An API call to a vision language model would do the task.

Am I missing something?

Tarik Kranda

Thanks to the DoorDash team for sharing their experience with the community, and thank you for bringing this to us with your explanations. After reading, I had two questions.

1- How much did the ratio of human supervision decrease once the pipeline was in production? What was the business outcome of developing this pipeline?

2- If these two models were connected in tandem (i.e. Model2 -> Model1) to mitigate each other's weaknesses, rather than working in parallel, would this increase the overall number of transcriptions validated by the guardrail model?

Thank you!
