Why is Information Retrieval a Scientific Discipline?

Foundations of Science

Abstract

It is relatively easy to state that information retrieval (IR) is a scientific discipline, but it is harder to explain why, because what counts as science is still debated in the philosophy of science. Being able to explain why is crucial if we are to convince others that IR is a science. To give this explanation, we use a recently proposed theory and model of scientific study. The explanation maps the knowledge structure of IR to that of well-known scientific disciplines such as physics. It also identifies the aim, principles and assumptions that IR shares with such disciplines, which constrain scientific investigation in IR in much the same way as in physics. IR and disciplines such as physics are therefore strongly similar in both their knowledge structure and the constraints on their scientific investigations. On the basis of these similarities, IR is considered a scientific discipline.

Acknowledgements

I thank Dr. Edward Dang for running the random search model. I also thank the anonymous reviewers for their constructive, insightful comments.

Author information

Corresponding author

Correspondence to Robert W. P. Luk.

Ethics declarations

Conflict of interest

The corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Luk, R.W.P. Why is Information Retrieval a Scientific Discipline? Found Sci 27, 427–453 (2022). https://doi.org/10.1007/s10699-020-09685-x

  • DOI: https://doi.org/10.1007/s10699-020-09685-x
