What we read in 2024: The year’s highest-selling books

Australians were hungry for more in 2024 from RecipeTin Eats founder Nagi Maehashi, who topped the country’s bestseller list for the second year as her two cookbooks ruled the charts.

Readers devoured Maehashi’s second cookbook, RecipeTin Eats: Tonight, which sold nearly 300,000 copies in less than three months. It was the highest-selling title of the year, according to figures from Nielsen BookScan.

Nagi Maehashi confirmed she has the recipe for success with her second book, Tonight. Credit: James Brickwood

The Sydneysider’s first release, RecipeTin Eats: Dinner, followed with 177,000 sales, after topping the list in 2023. Dinner is now the biggest-selling cookbook in Australia since records began, ahead of Jamie Oliver’s 2012 Jamie’s 15-Minute Meals and Kim McCosker and Rachael Bermingham’s 2010 4 Ingredients. Maehashi, who left a high-flying career in corporate finance to launch the blog RecipeTin Eats in 2014, has since sold more than 895,000 books in Australia, generating $25.5 million.

Pan Macmillan Australia’s publishing director Ingrid Ohlsson said Maehashi (who is a Good Food columnist) was a once-in-a-generation phenomenon.

“There is nothing flash-in-the-pan about Nagi’s success. She is tirelessly and forensically focused on meeting everyday people where they are on their cooking journeys. Her mission is to ‘get us’, not ‘impress us’. This is why we can’t get enough,” Ohlsson said.

Liane Moriarty’s 10th novel Here One Moment was the top fiction title of 2024, and third overall after Maehashi’s cookbooks, selling 158,000 copies, all in the final quarter of the year. The only other Australian novel to crack the top 20 was Trent Dalton’s third novel, 2023’s Lola in the Mirror, with 80,000 copies sold in 2024. Other strong local fiction performers included Dervla McTiernan’s crime novel What Happened to Nina?, Sarah A. Parker’s romantasy When the Moon Hatched and Tim Winton’s dystopian novel Juice.

It Ends With Us was boosted by the tie-in to its controversial film adaptation. Credit: AP

BookTok continued to shape fiction sales, with older releases and genre novels dominating the top 10. Colleen Hoover’s 2016 novel It Ends With Us ranked as the fourth-highest seller, boosted by the release of a new edition tied to the controversial film adaptation starring Blake Lively. Desire still burns bright for romantasy, with Sarah J. Maas’ 2015 A Court of Thorns and Roses and Rebecca Yarros’ 2023 Iron Flame the next highest-selling novels, followed by Freida McFadden’s 2022 thriller The Housemaid. One of the year’s most anticipated literary fiction releases, Sally Rooney’s fourth novel Intermezzo, ranked 19th with 73,000 copies sold, matching the sales of her third novel, Beautiful World, Where Are You, released in 2021.

John Farnham’s hardback The Voice Inside (written with Poppy Stockell) defied sluggish non-fiction sales and cost-of-living challenges to take ninth spot with 93,000 copies sold after the autobiography performed strongly in the lead-up to Christmas.

While genre fiction and cookbooks saw growth, overall sales in the Australian book market dropped in 2024. The volume of titles sold fell by 1.3 per cent, from 69.8 million to 68.9 million, representing a decline in value of about $40 million to $1.29 billion. Sales have dropped for the last two years yet are still tracking above 2021, 2020 and pre-pandemic 2019 levels.
The data captures printed books only.

Australian Publishers Association chief executive Patrizia Di Biase-Dyson highlighted significant growth in the e-book and audiobook markets, noting that more than 150 million library loans were recorded in the latest national borrowing data.

“The average sales price for a book is down, with books providing great value for consumers. Despite big increases in supply chain costs for publishers and retailers, those costs are not being passed on to consumers,” Di Biase-Dyson said.

Di Biase-Dyson said it was promising to see Australian authors hold their ground against international titles, claiming the top three spots on the year’s bestseller list.

“Despite being in a global English-language market, the bestselling three books are from Australian authors, with local publishing teams. This is a real success for the Australian publishing industry, and for readers, who want to see local stories and content on their shelves,” Di Biase-Dyson said.

While Maehashi is not expected to serve up thirds in 2025, there will be significant international releases from Emily Henry, Rebecca Yarros, Taylor Jenkins Reid and Suzanne Collins, as well as notable local releases from Geraldine Brooks, Hannah Kent and Michael Robotham.

What was your favourite book of 2024? Tell us in the comments below, and discover the books we’re excited to read this year here.

Accelerated computing is driving AI innovation in the business world

UNLOCKING THE POTENTIAL OF AI WITH NVIDIA

NVIDIA offers an integrated stack of hardware and software solutions designed to help businesses across industries solve complex challenges. The following case studies illustrate the transformative impact of these tools.

1. NVIDIA GPUs: The cornerstone of NVIDIA’s accelerated computing platform.

Real-world use: FPT Smart Cloud, a member of FPT Corporation, an NVIDIA cloud partner and AI provider in Vietnam, leveraged NVIDIA H100 Tensor Core GPUs and NVIDIA HGX H200 in its AI Factories to develop multilingual AI agents for tasks such as customer service and employee training. Long Chau, a major pharmaceutical chain in Vietnam, used the employee AI agent and saw a 55 per cent improvement in pharmacist knowledge quality while cutting training resources by 30 per cent.

2. NVIDIA AI: A suite of AI services including pre-trained models, training scripts, software development kits and frameworks to speed up AI workflows and improve accuracy, efficiency and performance while reducing cost.

Real-world use: American Express leveraged fraud detection algorithms to monitor all customer transactions globally in real time, detecting fraud in just milliseconds. Using a combination of advanced algorithms — one of which tapped into the NVIDIA AI platform — American Express enhanced model accuracy, advancing the company’s ability to fight fraud and improving security for cardholders.

3. NVIDIA Metropolis: An AI-powered video analytics platform for smart city applications. It processes video and sensor data in real time to improve safety, efficiency and decision-making in areas like traffic management, public safety and building automation.

Real-world use: Vietnam Posts and Telecommunications Group (VNPT) used NVIDIA Metropolis to build a system for monitoring and analysing traffic patterns in Tan An City, Long An Province. Combined with VNPT’s AI models, the initiative achieved an 80 per cent reduction in traffic violations within two months of implementation. Over a 10-month period, it detected 2,400 traffic violations and supported local authorities in tracing and investigating security offenders, reinforcing public safety.

Three grants give boost to COCC initiative to encourage student diversity in science programs

BEND, Ore. (KTVZ) — An initiative to encourage student diversity in science programs at Central Oregon Community College and to promote broader representation in scientific fields in general has gained new momentum, thanks to grant awards from the Bloomfield Family Foundation, the Randall Charitable Trust and the Central Oregon Health Council.

Support comes from a $10,000 grant from the Bloomfield Family Foundation, the second such award from the Portland-based organization, along with a $10,000 grant from the Randall Charitable Trust and $20,000 in aid from the Central Oregon Health Council.

A portion of the funding is being allocated for an outreach program that COCC has developed with a number of Central Oregon rural middle schools, in partnership with the Central Oregon STEM Hub, to bring underrepresented students to the college three times a year for hands-on science learning sessions.

Another portion of the funding will support a one-week summer “bridge” program for incoming college freshmen, to build confidence and connection around science-based programs.

Small stipends will be allocated to faculty who participate in the bridge program or middle school outreach activities, and also to Teresa DeShow, coordinator of the program.

“We’re trying to get students to see, early on, that they can be a scientist,” said DeShow, an assistant professor of biology at COCC. “We looked at the available research to determine best practices for supporting underrepresented students in the sciences, then developed a program tailored for Central Oregon based off of what we found.” 

COCC’s long-running summer high school symposiums, designed for district Latinx, Native American and Black students and incorporating academic samplers, have helped serve as a model.

According to a 2021 report from the Pew Research Center, there are significant gaps in workforce diversity when it comes to the STEM fields of science, technology, engineering and math. For instance, the study found that Hispanic workers make up 17% of total employment across all occupations, but just 8% of all STEM workers.

For more information, contact Zak Boone, COCC’s vice president of college advancement and executive director of the COCC Foundation, at [email protected] or 541-383-7212.

Intellectuals, scientists urged to contribute to national development

Party General Secretary Tô Lâm shakes hands with former President Trương Tấn Sang at a meeting with former Party and State leaders and distinguished senior officials on Thursday in HCM City. — VNA/VNS Photo Thống Nhất

HÀ NỘI — Party General Secretary Tô Lâm urged intellectuals and scientists to strive to fulfil their responsibilities and missions in the new era, contributing alongside the Party, people and armed forces to achieve the country’s strategic goals.

Speaking on Thursday morning in HCM City, at a meeting with former Party and State leaders and distinguished senior officials, as well as scientists and intellectuals, he urged them to have high aspirations.

The meeting was attended by former Politburo members and former Presidents Nguyễn Minh Triết and Trương Tấn Sang, former Politburo member and former Prime Minister Nguyễn Tấn Dũng, former Politburo member and former Permanent Secretary of the Party Central Committee Lê Hồng Anh, as well as other former leaders of the Party and State.

Lâm also told the intellectual community to educate, nurture and support the younger generations to advance, becoming a powerful driving force for building and protecting the nation, while contributing to shaping the future of humanity and global civilisation. He highlighted the importance of building a network connecting domestic and international experts, scientists and Vietnamese intellectuals abroad.

“The Party, State and people have great faith and expectations in intellectuals and scientists – the vanguard who play a pivotal role in driving innovation and breakthroughs, aiming for the rapid and sustainable development of our country in the new era,” he said.

He added that despite many internal and external challenges, the Party, people and armed forces had made significant efforts and achieved important accomplishments. In 2024, the country successfully met its economic and social development goals and made critical strides in Party-building and the development of the political system.

Notably, the contributions of former leaders, intellectuals, scientists and artists had been immense, he said. Their intellectual efforts, dedication and valuable experience had played a crucial role in policy formulation and the introduction of breakthrough solutions across all sectors.

Creating a space for cultural development

Party General Secretary Tô Lâm (eighth from left) and delegates at a meeting with former Party and State leaders and distinguished senior officials on Thursday in HCM City. — VNA/VNS Photo Thống Nhất

He emphasised that since the establishment of the Party and the Government, the Party and State had always paid particular attention to the artistic community. Numerous resolutions, mechanisms and policies had been enacted to create favourable conditions and a conducive space for the development of arts and culture. In response, the artistic community had grown increasingly mature, making significant contributions to the revolutionary cause of the Party.

Revolutionary artists had become the cultural vanguard of the Party, a key force in shaping the new cultural landscape, promoting the development of a socialist-oriented cultural industry, nurturing the spiritual life of the people and contributing to preserving the rich cultural heritage of the nation.

“Their work has helped to elevate the national identity, glorify the country’s beauty and contribute to global civilisation,” he added.

Therefore, he said, artists should continue to contribute by creating works of significant artistic and intellectual value, celebrating the ideals of truth, goodness and beauty, serving the people and supporting the revolutionary cause of the Party and the nation.

He revealed that the Party would soon issue a resolution on the development of culture and arts and would study the formulation of a National Strategy for building a cultural era. The National Assembly, Government and relevant agencies would collaborate to address legal, policy, financial and investment barriers, creating the resources and space for artists to freely create and produce while combating distorted, regressive and anti-cultural ideologies.

Also at the meeting, Lâm said the goals of building a strong, prosperous, democratic, equitable and civilised Việt Nam were not only the aspirations of individuals but also the collective mission of the entire nation. He expressed his hope for continued support and valuable contributions from former leaders, intellectuals, scientists and artists – individuals who had always been a source of inspiration, motivation and spiritual strength on the nation’s development journey. — VNS

Family releases name of deceased night cleaner found at Syracuse Academy of Science

SYRACUSE, N.Y. — The deceased man found at the Syracuse Academy of Science Charter School has been identified as Brian Deforge by his sister, Kathryn Duvergel.

Deforge’s body was found inside the cafeteria of the Academy of Science on the morning of Jan. 7 by school staff. He worked as a night cleaner at the school.

A vigil was held for Deforge at the Syracuse Academy of Science at 1001 Park Ave. on Thursday. Around 30 people, including family, friends and staff, showed up to remember him. They brought candles, flowers, Deforge’s picture and stuffed animals.

Crews from the Syracuse Fire Department responded to the report of his body being found on Jan. 7, and their carbon monoxide detectors began going off inside the cafeteria. No official cause of death has been provided by the Syracuse Police Department as of Wednesday afternoon. Police are investigating his death and the CO leak found at the school.

18 data science tools to consider using in 2025

The increasing volume and complexity of enterprise data as well as its central role in decision-making and strategic planning are driving organizations to invest in the people, processes and technologies they need to gain valuable business insights from their data assets. That includes a variety of tools commonly used in data science applications.

In the latest version of an annual survey that’s now conducted by the Data & AI Leadership Exchange, an education and advisory firm, investments in data and AI initiatives were cited as a top priority by 90.5% of chief data officers and other IT and business executives from 125 large organizations. Looking ahead, 98.4% expect spending increases on such initiatives this year, according to a report titled “2025 AI & Data Leadership Executive Benchmark Survey” that was published in December 2024.
The survey — done in partnership with DataIQ, which runs a community for data leaders — also found that 93.7% of the organizations got measurable business value from their data and AI investments in 2024. Nearly half — 46.4% — reported either significant value or a high degree of transformational value.
As data science teams build their portfolios of enabling technologies to help achieve those business goals, they can choose from a wide selection of tools and platforms. Here’s a rundown of 18 top data science tools, listed in alphabetical order, with details on their features and capabilities as well as some potential limitations. The list was compiled by Informa TechTarget editors based on research of available technologies plus market analysis from firms such as Forrester Research and Gartner.

1. Apache Spark
Apache Spark is an open source data processing and analytics engine that can handle large amounts of data — upward of several petabytes, according to proponents. Spark’s ability to rapidly process data has fueled significant growth in the use of the platform since it was created in 2009, making the Spark project one of the largest open source communities among big data technologies.
Due to its speed, Spark is well suited for continuous intelligence applications powered by near-real-time processing of streaming data. However, as a general-purpose distributed processing engine, Spark is equally suited for extract, transform and load uses as well as other SQL batch jobs. In fact, Spark initially was touted as a faster alternative to the MapReduce engine for batch processing in Hadoop clusters.
Spark is still often used with Hadoop but can also run standalone against other file systems and data stores. It features an extensive set of developer libraries and APIs, including a machine learning library and support for key programming languages, making it easier for data scientists to quickly put the platform to work.
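To give a flavor of how data scientists typically drive Spark from Python, here is a minimal PySpark sketch, assuming a local Spark installation and a hypothetical sales.csv file with region and amount columns:

```python
from pyspark.sql import SparkSession

# Start (or reuse) a local Spark session
spark = SparkSession.builder.appName("sales-demo").getOrCreate()

# Read a CSV file into a distributed DataFrame (sales.csv is a placeholder)
df = spark.read.csv("sales.csv", header=True, inferSchema=True)

# A simple SQL-style batch aggregation, executed across the cluster
df.groupBy("region").sum("amount").show()

spark.stop()
```

The same DataFrame API scales unchanged from a laptop to a multi-node cluster, which is a large part of Spark's appeal for both batch and streaming work.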

2. D3
Another open source tool, D3 is a JavaScript library for creating custom data visualizations in a web browser. D3, which stands for Data-Driven Documents, uses web standards such as HTML, Scalable Vector Graphics and CSS instead of its own graphical vocabulary. The technology’s developers describe it as a dynamic and flexible tool that requires a minimum amount of effort to generate visual representations of data.
Also referred to as D3.js, the tool lets visualization designers bind data to documents via the Document Object Model and then use DOM manipulation methods to make data-driven transformations to the documents. First released in 2011, it can be used to design various types of data visualizations and supports features such as interaction, animation, annotation and quantitative analysis.
D3 includes more than 30 modules and 1,000 visualization methods, making it complicated to learn. In addition, many data scientists don’t have JavaScript skills. As a result, they might be more comfortable with a commercial visualization tool such as Tableau, leaving D3 to be used more by data visualization developers and specialists who are also members of data science teams.

3. IBM SPSS
IBM SPSS is a family of software for managing and analyzing complex statistical data. It includes two primary products: SPSS Statistics, a statistical analysis, data visualization and reporting tool, and SPSS Modeler, a data science and predictive analytics platform with a drag-and-drop UI and machine learning capabilities.
SPSS Statistics covers every step of the analytics process, from planning to model deployment, and enables users to clarify relationships between variables, create clusters of data points, identify trends and make predictions, among other capabilities. It can access common structured data types and offers a combination of a menu-driven UI, its own command syntax, and the ability to integrate R and Python extensions. It also has features for automating procedures and import-export ties to SPSS Modeler.
Created by SPSS Inc. in 1968, initially with the name Statistical Package for the Social Sciences, the statistical analysis software was acquired by IBM in 2009, along with the predictive modeling platform, which SPSS had previously bought. While the product family is officially called IBM SPSS, the software is still usually known simply as SPSS.

4. Julia
Julia is an open source programming language used for numerical computing as well as machine learning and other kinds of data science applications. In a 2012 blog post announcing Julia, its four creators said they set out to design one language that addressed all their needs. A big goal was to avoid having to write programs in one language and convert them to another for execution.
To that end, Julia combines the convenience of a high-level dynamic language with performance that’s comparable to statically typed languages, such as C and Java. Users don’t have to define data types in programs, but an option allows them to do so. The use of a multiple dispatch approach at runtime also helps to boost execution speed.
Julia 1.0 became available in 2018, nine years after work began on the language; the current release line is Julia 1.11. The documentation for Julia notes that because its compiler differs from the interpreters in data science languages like Python and R, new users “may find that Julia’s performance is unintuitive at first.” But, it claims, “once you understand how Julia works, it is easy to write code that is nearly as fast as C.”

5. Jupyter Notebook/JupyterLab
An open source web application, Jupyter Notebook enables interactive collaboration among data scientists, data engineers, mathematicians, researchers and other users. It’s a computational notebook tool that can be used to create, edit and share code as well as explanatory text, images and other information. For example, Jupyter users can add software code, computations, comments, data visualizations and rich media representations of computation results to a single document, known as a notebook, which can then be shared with and revised by colleagues.
As a result, notebooks “can serve as a complete computational record” of interactive sessions among the members of data science teams, according to Jupyter Notebook’s documentation. The notebook documents are JSON files that have version control capabilities. In addition, a Notebook Viewer service lets users render notebooks as static webpages for viewing by users who don’t have Jupyter installed on their systems.
Jupyter Notebook’s roots are in the programming language Python. It was originally part of the open source IPython interactive toolkit project before being split off in 2014. The loose combination of Julia, Python and R gave Jupyter its name. Along with supporting those three languages, Jupyter has modular kernels for dozens of others. The open source project also includes JupyterLab, a newer web-based UI that’s more flexible and extensible than the original one.

6. Keras
Keras is a programming interface that enables data scientists to access and use machine learning platforms more easily. It’s an open source deep learning API and framework written in Python that runs on top of TensorFlow, PyTorch and JAX. Keras initially supported multiple back ends but was tied exclusively to TensorFlow starting with its 2.4.0 release in 2020. However, multiplatform support was restored in Keras 3.0, a full rewrite released in late 2023.
As a high-level API, Keras was designed to drive easy and fast experimentation that requires less coding than other deep learning options. The goal is to accelerate the implementation of machine learning models — in particular, deep learning neural networks — through a “quick and easy” development process, as the Keras documentation puts it. In addition, models can be run on any of the supported platforms without code changes.
The Keras framework includes a sequential interface for creating relatively simple linear stacks of layers with inputs and outputs as well as a functional API for building more complex graphs of layers or writing deep learning models from scratch.
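As an illustration of that sequential interface, here is a minimal sketch, assuming Keras 3 with a TensorFlow, PyTorch or JAX back end installed; the training data is random placeholder input:

```python
import numpy as np
import keras
from keras import layers

# Sequential interface: a simple linear stack of layers
model = keras.Sequential([
    keras.Input(shape=(20,)),                  # 20 input features
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),     # binary classification output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Placeholder data standing in for a real data set
X = np.random.rand(256, 20)
y = np.random.randint(0, 2, size=(256,))
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```

Because Keras abstracts the back end, the same model definition runs on any of the supported platforms without code changes.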

7. Matlab
Developed and sold by software vendor MathWorks since 1984, Matlab is a high-level programming language and analytics environment for numerical computing, mathematical modeling and data visualization. It’s primarily used by conventional engineers and scientists to analyze data, design algorithms and develop embedded systems for wireless communications, industrial control, signal processing and other applications, often in concert with Simulink, a companion tool that offers model-based design and simulation capabilities.
While Matlab isn’t as widely used in data science applications as languages such as Python, R and Julia, it does support machine learning and deep learning, predictive modeling, big data analytics, computer vision, and other work done by data scientists. Data types and high-level functions built into the platform are designed to speed up exploratory data analysis and data preparation in analytics applications.
Considered relatively easy to learn and use, Matlab — short for “matrix laboratory” — includes prebuilt applications but also lets users build their own. It also has a library of add-on toolboxes with discipline-specific software and hundreds of built-in functions, including the ability to visualize data in 2D and 3D plots.

8. Matplotlib
Matplotlib is an open source Python plotting library that’s used to read, import and visualize data in analytics applications. Data scientists and other users can create static, animated and interactive data visualizations with Matplotlib, using it in Python scripts, the Python and IPython shells, Jupyter Notebook, JupyterLab, web application servers, and various GUI toolkits.
The library’s large code base can be challenging to master, but it’s organized in a hierarchical structure that’s designed to enable users to build visualizations mostly with high-level commands. The top component in the hierarchy is pyplot, a module that provides a “state-machine environment” and a set of simple plotting functions like those in Matlab.
First released in 2003, Matplotlib also includes an object-oriented interface that can be used together with pyplot or on its own. It supports low-level commands for more complex data plotting. The library is primarily focused on creating 2D visualizations but offers an add-on toolkit with 3D plotting features.
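A small sketch of the high-level pyplot workflow described above, assuming only Matplotlib and NumPy are installed:

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample data: one period of a sine wave
x = np.linspace(0, 2 * np.pi, 100)

# The object-oriented interface: a Figure containing one Axes
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

plt.show()  # render in the active GUI or notebook back end
```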

9. NumPy
Short for Numerical Python, NumPy is an open source Python library that’s used widely in scientific computing, engineering, and data science and machine learning applications. The library consists of multidimensional array objects and routines for processing those arrays to enable various mathematical and logic functions. It also supports linear algebra, random number generation and other operations.
One of NumPy’s core components is the N-dimensional array, or ndarray, which represents a collection of items of the same type and size. An associated data-type object describes the format of the data elements in an array. The same data can be shared by multiple ndarrays, and data changes made in one can be viewed in another.
NumPy was created in 2006 by combining and modifying elements of two earlier libraries. The NumPy website touts it as “the universal standard for working with numerical data in Python.” It is generally considered one of the most useful libraries for Python because of its numerous built-in functions. It’s also known for its speed, partly resulting from the use of optimized C code at its core. In addition, various other Python libraries are built on top of NumPy.
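A brief sketch of the ndarray behavior described above, including the shared-data property:

```python
import numpy as np

# An ndarray holds items of one type and size; dtype describes the elements
a = np.arange(12, dtype=np.float64).reshape(3, 4)
print(a.dtype, a.shape)  # float64 (3, 4)

# Slices are views that share the underlying data, not copies
row = a[0]
row[0] = 99.0
print(a[0, 0])  # 99.0 -- the change is visible through the original array

# Vectorized math and linear algebra run in optimized C code
print(a.mean(axis=0))
print(a @ a.T)  # matrix product
```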

10. Pandas
Another popular open source Python library, pandas is typically used for data analysis and manipulation. Built on top of NumPy, it features two primary data structures: Series, a one-dimensional array, and DataFrame, a two-dimensional structure for data manipulation with integrated indexing. Both can accept data from NumPy ndarrays and other inputs, and a DataFrame can incorporate multiple Series objects.
Created in 2008, pandas has built-in data visualization capabilities; exploratory data analysis functions; and support for file formats and languages that include CSV, SQL, HTML and JSON. Additionally, it provides features such as intelligent data alignment, integrated handling of missing data, flexible reshaping and pivoting of data sets, data aggregation and transformation, and the ability to quickly merge and join data sets, according to the pandas website.
The developers of pandas say their goal is to make it “the fundamental high-level building block for doing practical, real-world data analysis in Python.” Key code paths in pandas are written in C or the Cython superset of Python to optimize its performance. The library can be used with various kinds of analytical and statistical data, including tabular, time series and labeled matrix data sets.
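The following minimal sketch shows the DataFrame structure and the integrated missing-data handling mentioned above; the city and sales columns are invented for illustration:

```python
import numpy as np
import pandas as pd

# A DataFrame is a 2D labeled structure; each column is a Series
df = pd.DataFrame({
    "city": ["Sydney", "Melbourne", "Sydney", "Perth"],
    "sales": [120.0, 95.5, np.nan, 60.0],
})

# Integrated missing-data handling: fill the gap with the column mean
df["sales"] = df["sales"].fillna(df["sales"].mean())

# Aggregation: total sales per city
print(df.groupby("city")["sales"].sum())
```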

11. Python
Python is the most widely used programming language for data science and machine learning and one of the most popular languages overall. The Python open source project’s website describes it as “an interpreted, object-oriented, high-level programming language with dynamic semantics,” built-in data structures, and dynamic typing and binding capabilities. The site also touts Python’s simple syntax, saying it’s easy to learn and its emphasis on readability reduces the cost of program maintenance.
The multipurpose language can be used for a wide range of tasks, including data analysis, data visualization, AI, natural language processing and robotic process automation. Developers can create web, mobile and desktop applications in Python too. In addition to object-oriented programming, it supports procedural, functional and other types plus extensions written in C or C++.
Python is used not only by data scientists, programmers and network engineers but also by workers outside of computing disciplines, from accountants to mathematicians and scientists, who are often drawn to its user-friendly nature. Python 3.x is the language’s current production line; support for the 2.x line ended in 2020.
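A tiny sketch of the dynamic typing and built-in data structures the project’s website highlights:

```python
# Built-in data structures and dynamic typing: no declarations required
inventory = {"apples": 3, "pears": 5}  # a dict, created inline
inventory["plums"] = 2                 # keys can be added at runtime

# A list comprehension: the concise, readable syntax Python is known for
low_stock = [name for name, count in inventory.items() if count < 4]
print(low_stock)  # ['apples', 'plums']
```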

12. PyTorch
An open source framework used to build and train deep learning models based on neural networks, PyTorch is touted by its proponents for supporting fast and flexible experimentation as well as a seamless transition to production deployment. The Python library was designed to be easier to use than Torch, a precursor machine learning framework that’s based on the Lua programming language. PyTorch also provides more flexibility and speed than Torch, according to its creators.
First released publicly in 2017, PyTorch uses arraylike tensors to encode model inputs, outputs and parameters. Its tensors are similar to the multidimensional arrays supported by NumPy, but PyTorch adds built-in support for running models on GPUs. NumPy arrays can be converted into tensors for processing in PyTorch and vice versa.
The library includes various functions and techniques, including an automatic differentiation package named torch.autograd, a module for building neural networks, a TorchServe tool for deploying PyTorch models, and deployment support for iOS and Android devices. In addition to the primary Python API, PyTorch offers a C++ API that can be used as a separate front-end interface or to create extensions for Python applications.
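A minimal sketch of the tensors, autograd and NumPy interoperability described above, assuming a CPU-only PyTorch install:

```python
import numpy as np
import torch

# Tensors encode inputs, outputs and parameters; autograd tracks gradients
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
y = (x ** 2).sum()
y.backward()
print(x.grad)  # dy/dx = 2x for each element

# NumPy arrays convert to tensors and back; from_numpy shares memory
a = np.ones((2, 2), dtype=np.float32)
t = torch.from_numpy(a)
print(t.numpy())
```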

13. R
The R programming language is an open source environment designed for statistical computing and graphics applications as well as data manipulation, analysis and visualization. Many data scientists, academic researchers and statisticians use R to retrieve, cleanse, analyze and present data, making it one of the most popular languages for data science and advanced analytics.
The open source project is supported by The R Foundation, and thousands of user-created packages with libraries of code that enhance R’s functionality are available. One example is ggplot2, a well-known package for creating graphics that’s part of a collection of R-based data science tools named tidyverse. In addition, multiple vendors offer integrated development environments and commercial code libraries for R.
R is an interpreted language like Python and has a reputation for being relatively intuitive. It was created in the 1990s as an alternative version of S, a statistical programming language that was developed in the 1970s. R’s name is both a play on S and a reference to the first letter of the names of its two creators.

14. SAS
SAS is an integrated software suite for statistical analysis, advanced analytics, BI and data management. Developed and sold by software vendor SAS Institute Inc., the platform helps users integrate, cleanse, prepare and manipulate data. They can then analyze it using different statistical and data science techniques. SAS can be used for various tasks from basic BI and data visualization to risk management, operational analytics, data mining, predictive analytics and machine learning.
The development of SAS started in 1966 at North Carolina State University. Use of the technology began to grow in the early 1970s, and SAS Institute was founded in 1976 as an independent company. The software was initially built for use by statisticians — SAS was short for Statistical Analysis System. But over time, it was expanded to include a broad set of functionality and became one of the most widely used analytics suites in both commercial enterprises and academia.
Development and marketing are now focused primarily on SAS Viya, a cloud-based version of the platform that was launched in 2016 and redesigned to be cloud-native in 2020.

15. Scikit-learn
Scikit-learn is an open source machine learning library for Python that’s built on the SciPy and NumPy scientific computing libraries as well as Matplotlib for plotting data. It supports both supervised and unsupervised machine learning and includes numerous algorithms and models, called estimators in scikit-learn parlance. Additionally, it provides functionality for model fitting, selection and evaluation and for data preprocessing and transformation.
Initially called scikits.learn, the library started as a Google Summer of Code project in 2007, and the first public release became available in 2010. The first part of its name is short for “SciPy toolkit” and is also used by other SciPy add-on packages. Scikit-learn primarily works on numeric data that’s stored in NumPy arrays or SciPy sparse matrices.
The library’s suite of tools also enables various other tasks, such as data set loading and the creation of workflow pipelines that combine data transformer objects and estimators. But scikit-learn has some limits due to design constraints. For example, it doesn’t support deep learning, reinforcement learning or GPUs. The library’s website also says its developers “only consider well-established algorithms for inclusion.”
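A short sketch of the estimator and pipeline concepts described above, using synthetic data in place of a real data set:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic numeric data standing in for a real NumPy-backed data set
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A pipeline chains a data transformer with an estimator
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)          # model fitting
print(clf.score(X_test, y_test))   # model evaluation on held-out data
```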

16. SciPy
SciPy is another open source Python library that supports scientific computing uses. Short for Scientific Python, it features a set of mathematical algorithms and high-level commands and classes for data manipulation and visualization. It includes more than a dozen subpackages that contain algorithms and utilities for functions such as data optimization, integration and interpolation as well as algebraic equations, differential equations, image processing and statistics.
The SciPy library is built on top of NumPy and can operate on NumPy arrays. But SciPy delivers additional array computing tools and provides specialized data structures, including sparse matrices and K-dimensional trees, to extend beyond NumPy’s capabilities.
SciPy predates NumPy; it was created in 2001 by combining different add-on modules built for the Numeric library that was one of NumPy’s predecessors. Like NumPy, SciPy uses compiled code to optimize performance. In its case, most of the performance-critical parts of the library are written in C, C++ or Fortran.
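Two of the subpackages mentioned above, optimization and interpolation, in a minimal sketch:

```python
import numpy as np
from scipy import interpolate, optimize

# Optimization: minimize a one-dimensional function
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)
print(result.x)  # approximately 2.0

# Interpolation: fit a cubic interpolant through sample points
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = xs ** 2
f = interpolate.interp1d(xs, ys, kind="cubic")
print(f(1.5))  # estimated value between samples
```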

17. TensorFlow
TensorFlow is an open source machine learning platform developed by Google that’s particularly popular for implementing deep learning neural networks. The platform takes inputs in the form of tensors that are akin to NumPy multidimensional arrays and then uses a graph structure to flow the data through a list of computational operations specified by developers. It also offers an eager execution programming environment that runs operations individually without graphs, which provides more flexibility for research and debugging machine learning models.
Google made TensorFlow open source in 2015, and Release 1.0.0 became available in 2017. TensorFlow uses Python as its core programming language and incorporates Keras as a high-level API for building and training models. Alternatively, a TensorFlow.js library enables model development in JavaScript, and custom operations — ops, for short — can be built in C++.
The platform also includes a TensorFlow Extended, or TFX, module for end-to-end deployment of production machine learning pipelines. In addition, it supports LiteRT, a runtime tool for mobile and IoT devices that formerly was named TensorFlow Lite. TensorFlow models can be trained and run on CPUs, GPUs and Google’s special-purpose Tensor Processing Units.
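A brief sketch contrasting eager execution with graph tracing via tf.function, assuming a standard TensorFlow install:

```python
import tensorflow as tf

# Eager execution: operations on tensors run immediately
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
print(tf.reduce_sum(a))  # computed without building a graph

# tf.function traces the Python code into a graph for repeated execution
@tf.function
def matmul_twice(x):
    return tf.matmul(x, x)

print(matmul_twice(a))
```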

18. Weka
Weka is an open source workbench that provides a collection of machine learning algorithms for use in data mining tasks. Weka’s algorithms, called classifiers, can be applied directly to data sets without any programming via a GUI or a command-line interface that offers additional functionality. They can also be implemented through a Java API.
The workbench can be used for classification, clustering, regression, and association rule mining applications. It also includes a set of data preprocessing and visualization tools. In addition, Weka supports integration with R, Python, Spark and other libraries like scikit-learn. For deep learning uses, an add-on package combines it with the Eclipse Deeplearning4j library.
Weka is free software licensed under the GNU General Public License. It was developed at the University of Waikato in New Zealand starting in 1992. An initial version was rewritten in Java to create the current workbench, which was first released in 1999. Weka stands for the Waikato Environment for Knowledge Analysis. It is also the name of a flightless bird native to New Zealand that the technology’s developers say has “an inquisitive nature.”

Data science and machine learning platforms
Commercially licensed platforms that provide integrated functionality for machine learning, AI and other data science applications are also available from numerous software vendors. The product offerings are diverse. They include machine learning operations hubs, automated machine learning platforms and full-function analytics suites, with some combining MLOps, AutoML and analytics capabilities. Many platforms incorporate some of the data science tools listed above.
Matlab and SAS can also be counted among the data science platforms. Other prominent platform options for data science teams include the following technologies:

Altair RapidMiner.
Alteryx AI Platform for Enterprise Analytics.
Amazon SageMaker.
Anaconda.
Azure Machine Learning.
BigML.
Databricks Data Intelligence Platform.
Dataiku.
DataRobot.
Domino.
Google Cloud Vertex AI.
H2O AI Cloud.
IBM Watson Studio.
Knime.
Qubole.
Saturn Cloud.

Some platforms are also available in free open source or community editions. Examples include Dataiku and H2O. Knime combines an open source analytics platform with a commercial Knime Hub software package that supports team-based collaboration and workflow automation, deployment and management.
Editor’s note: Informa TechTarget editors updated this article in January 2025 for timeliness and to add new information.
Mary K. Pratt is an award-winning freelance journalist with a focus on covering enterprise IT and cybersecurity management.