The flow is as follows:
1) Real-time data is ingested into the system via stream-ingestion tools such as Flume, Kafka, or Samza. The choice among these depends on factors such as whether we need guaranteed event delivery and whether the ordering of events matters. That is a topic for another post.
2) Apply the required processing to the incoming data via the Spark Streaming API and make it ready for consumption by third-party applications.
3) Using Spark's REST gateway job server, third-party applications trigger further Spark jobs to process the data and return the results back to the applications.
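The core of step 2 is applying the same transformation to every micro-batch of incoming data. As a rough sketch in plain Python (no Spark here, just an illustration of the micro-batch idea, with a word count standing in for the real processing):

```python
from collections import Counter

def process_batches(batches):
    """Apply a word-count transform to each micro-batch, mimicking
    how Spark Streaming applies the same operation to every RDD
    in a DStream."""
    results = []
    for batch in batches:
        words = (word for line in batch for word in line.split())
        results.append(Counter(words))
    return results

# Each inner list stands in for one micro-batch of ingested lines.
incoming = [
    ["error disk full", "error network down"],
    ["info job done"],
]
for counts in process_batches(incoming):
    print(dict(counts))
```

In real Spark Streaming the same shape appears as a transformation (e.g. a `flatMap` followed by a count) applied to a DStream, with each micro-batch processed on the cluster rather than in a loop.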
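For step 3, a third-party app talks to the job server over plain HTTP. A minimal sketch of building the job-submission request, assuming a spark-jobserver-style API (a `/jobs` endpoint taking `appName` and `classPath` parameters); the host, app name, and job class below are placeholders:

```python
from urllib.parse import urlencode

def job_submit_url(host, app_name, class_path, **params):
    """Build the job-submission URL for a REST job server.
    The /jobs endpoint with appName/classPath parameters follows
    spark-jobserver conventions; adjust for your gateway."""
    query = urlencode({"appName": app_name, "classPath": class_path, **params})
    return f"http://{host}/jobs?{query}"

url = job_submit_url("localhost:8090", "wordcount", "com.example.WordCountJob")
print(url)
# The application would POST to this URL (with curl, requests, etc.)
# and then poll the returned job id for the result.
```

Keeping the URL construction in one helper makes it easy to add extra job parameters (timeouts, sync vs. async mode) without scattering string formatting through the client code.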