Indexed Arrays in Python: Series and DataFrame

Pandas provides Series (1D indexed arrays) and DataFrame (2D indexed arrays) for structured data (like tables with headers).

Installing Pandas

Install via pip:

code.python
python -m pip install pandas

Pandas Series

A Series is a 1D array with an index (labels for rows).

Creation:

code.python
>>> class="text-secondary">import pandas class="text-secondary">as pd
>>> ser = pd.Series([10, 20, 30, 40])  # Default index: 0–3
>>> ser = pd.Series({'a': 10, 'b': 20})  # Index class="text-secondary">from dict keys: 'a', 'b'
>>> ser = pd.Series(np.arange(10, 50, 10), index=['a', 'b', 'c', 'd'])  # Custom index

Description:

ser.shape: (4,) (length 4)

ser.index: Index(['a', 'b', 'c', 'd'])

ser.values: array([10, 20, 30, 40])

ser.head(2): First 2 elements

ser.tail(2): Last 2 elements

Indexing/Slicing:

By label: ser['b'] (20), ser[['a', 'd']] (Series with labels 'a' and 'd')

By position: ser.iloc[0] (10), ser.iloc[[0, 3]] (elements at positions 0 and 3)

Slicing: ser['a':'c'] (labels 'a' to 'c'), ser.iloc[1:] (positions 1 to end)

Boolean Indexing:

code.python
>>> ser[ser.values <= 20]  # Elements ≤ 20: a(10), b(20)
>>> ser[ser.index != 'a']  # Elements with index ≠ 'a': b(20), c(30), d(40)

Pandas DataFrame

A DataFrame is a 2D array with row and column indices (like a spreadsheet).

Creation:

code.python
>>> data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(data)  # Default indices: 02(rows), 02(columns)
>>> df = pd.DataFrame(data, index=['a', 'b', 'c';], columns=['A', 'B', 'C'])  # Custom indices
>>> data = {'a': [1, 2, 3], 'b': [4, 5, 6]}
>>> df = pd.DataFrame(data)  # Dict: keys = columns, values = column data

Description:

df.shape: (3, 3) (3 rows, 3 columns)

df.index: Row indices

df.columns: Column indices

df.values: 2D array of data

df.info(): Summary (data types, non-null counts)

df.describe(): Descriptive stats (count, mean, std, min, quartiles, max)

Indexing/Slicing:

By column: df['A'] (Series of column 'A'), df[['A', 'C']] (DataFrame with columns 'A' and 'C')

By row: df.loc['a'] (Series of row 'a'), df.iloc[0] (row at position 0)

Slicing: df.loc['a':'c', 'A':'B'] (rows 'a'–'c', columns 'A'–'B')

Boolean Indexing:

code.python
>>> df[df['B';] >= 3]  # Rows where column 'B' ≥ 3
>>> df[(df['A'] >= 2) & (df['C'] == 9)]  # Rows where 'A' ≥ 2 and 'C' = 9
>>> df[df['B'].between(4, 9)]  # Rows where 'B' is between 4 and 9

Exchanging Data Between Excel and Pandas

Reading Excel Data: Use pd.read_excel(). Key parameters:

io: File path.

sheet_name: Sheet name/index (default: 0, first sheet).

header: Row number for column labels (default: 0, first row).

index_col: Column name/index for row labels.

usecols: Columns to import (e.g., usecols=['A', 'C']).

Example:

code.python
>>> df = pd.read_excel('D:\身份证号.xlsx', engine=&#x27;openpyxl')  # Read first sheet
>>> df = pd.read_excel('D:\身份证号.xlsx', sheet_name=[0, 1], index_col='工号', engine='openpyxl')  # Read multiple sheets

Writing Excel Data: Use df.to_excel(). Example:

code.python
>>> df0.to_excel('D:\\new_file.xlsx')  # Save DataFrame to Excel

Using xlwings with Pandas: Read/write multiple DataFrames to the same sheet:

code.python
>>> class="text-secondary">import xlwings class="text-secondary">as xw
>>> app = xw.App(visible=True, add_book=False)
>>> bk = app.books.open('D:\\身份证号.xlsx')
>>> sht1 = bk.sheets[0]
>>> sht2 = bk.sheets[1]
>>> sht3 = bk.sheets.add(after=bk.sheets.count, name='多DataFrame')
>>> df1 = sht1.range('A1:E6').options(pd.DataFrame).value  # Read sheet 1 class="text-secondary">as DataFrame
>>> df2 = sht2.range('A1:E6').options(pd.DataFrame).value  # Read sheet 2 class="text-secondary">as DataFrame
>>> sht3.range('A1').value = df1  # Write df1 to sheet 3
>>> sht3.range('A8').value = df2  # Write df2 below df1